
3D · MoDELS · Controller · state-of-the-art · Reviewer ·

December 2, 2023

Diffusion Handles: Enabling 3D Edits for Diffusion Models by Lifting Activations to 3D

Karran Pandey,Paul Guerrero,Matheus Gadelha,Yannick Hold-Geoffroy,Karan Singh,Niloy Mitra

Diffusion Handles is a novel approach to enabling 3D object edits on diffusion images. We accomplish these edits using existing pre-trained diffusion models and 2D image depth estimation, without any fine-tuning or 3D object retrieval. The edited results remain plausible, photo-real, and preserve object identity. Diffusion Handles addresses a critically missing facet of generative image-based creative design and significantly advances the state-of-the-art in generative image editing. Our key insight is to lift diffusion activations for an object to 3D using a proxy depth, 3D-transform the depth and associated activations, and project them back to image space. The diffusion process applied to the manipulated activations, with identity control, produces plausible edited images showing complex 3D occlusion and lighting effects. We evaluate Diffusion Handles quantitatively, on a large synthetic data benchmark, and qualitatively, by a user study, showing our output to be more plausible and better than prior art at both 3D editing and identity control.
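The lift, transform, and project steps named in the abstract can be illustrated geometrically. The following is only a minimal sketch, not the authors' implementation: it assumes a simple pinhole camera with hypothetical intrinsics `f`, `cx`, `cy`, treats each activation as an opaque `feature` value attached to a pixel, and ignores occlusion handling and the diffusion process itself. All function names are illustrative.

```python
import math


def unproject(u, v, depth, f, cx, cy):
    """Lift pixel (u, v) with the given depth to a 3D camera-space point.

    Assumes a pinhole camera (hypothetical focal length f, principal point cx, cy).
    """
    x = (u - cx) * depth / f
    y = (v - cy) * depth / f
    return (x, y, depth)


def project(point, f, cx, cy):
    """Project a 3D camera-space point back to pixel coordinates.

    Returns None for points at or behind the camera plane.
    """
    x, y, z = point
    if z <= 0:
        return None
    return (f * x / z + cx, f * y / z + cy)


def rotate_y(point, angle):
    """Rotate a 3D point about the camera's vertical (y) axis by angle radians."""
    x, y, z = point
    c, s = math.cos(angle), math.sin(angle)
    return (c * x + s * z, y, -s * x + c * z)


def warp_samples(samples, transform, f, cx, cy):
    """Lift (u, v, depth, feature) samples to 3D, transform them, and reproject.

    Returns a list of (u', v', depth', feature); the feature is carried along
    unchanged, and samples that end up behind the camera are dropped.
    """
    warped = []
    for u, v, depth, feature in samples:
        moved = transform(unproject(u, v, depth, f, cx, cy))
        pixel = project(moved, f, cx, cy)
        if pixel is not None:
            warped.append((pixel[0], pixel[1], moved[2], feature))
    return warped


if __name__ == "__main__":
    # One sample with a placeholder feature, pushed 2 units away from the camera.
    samples = [(10.0, 20.0, 2.0, "feat_a")]
    push_back = lambda p: (p[0], p[1], p[2] + 2.0)
    print(warp_samples(samples, push_back, f=100.0, cx=32.0, cy=32.0))
```

In this sketch the transform is any function from a 3D point to a 3D point, so translations, rotations such as `rotate_y`, or their compositions can be passed in without changing the warp itself.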

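Related content

3D is short for "Three Dimensions". It refers to three dimensions or three coordinates, that is, length, width, and height. In other words, it describes something solid, as opposed to a flat plane (2D) that has only length and width.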

Learning · Unsupervised · Unsupervised Learning · Performer · 3D ·

January 26, 2024

DynaVol: Unsupervised Learning for Dynamic Scenes through Object-Centric Voxelization

Yanpeng Zhao,Siyu Gao,Yunbo Wang,Xiaokang Yang

Unsupervised learning of object-centric representations in dynamic visual scenes is challenging. Unlike most previous approaches that learn to decompose 2D images, we present DynaVol, a 3D scene generative model that unifies geometric structures and object-centric learning in a differentiable volume rendering framework. The key idea is to perform object-centric voxelization to capture the 3D nature of the scene, which infers the probability distribution over objects at individual spatial locations. These voxel features evolve over time through a canonical-space deformation function, forming the basis for global representation learning via slot attention. The voxel features and global features are complementary and are both leveraged by a compositional NeRF decoder for volume rendering. DynaVol remarkably outperforms existing approaches for unsupervised dynamic scene decomposition. Once trained, the explicitly meaningful voxel features enable additional capabilities that 2D scene decomposition methods cannot achieve: it is possible to freely edit the geometric shapes or manipulate the motion trajectories of the objects.

Dataset · LIDAR · Analysis · 3D Reconstruction · 3D ·

January 25, 2024

GauU-Scene: A Scene Reconstruction Benchmark on Large Scale 3D Reconstruction Dataset Using Gaussian Splatting

Butian Xiong,Zhuo Li,Zhen Li

from arxiv, submitted to IJCAI 2024, 8 pages

We introduce a novel large-scale scene reconstruction benchmark using the newly developed 3D representation approach, Gaussian Splatting, on our expansive U-Scene dataset. U-Scene encompasses over one and a half square kilometres, featuring a comprehensive RGB dataset coupled with LiDAR ground truth. For data acquisition, we employed the Matrix 300 drone equipped with the high-accuracy Zenmuse L1 LiDAR, enabling precise rooftop data collection. This dataset, which offers a unique blend of urban and academic environments for advanced spatial analysis, covers more than 1.5 km$^2$. Our evaluation of U-Scene with Gaussian Splatting includes a detailed analysis across various novel viewpoints. We also juxtapose these results with those derived from our accurate point cloud dataset, highlighting significant differences that underscore the importance of combining multi-modal information.

Image Restoration · Scaling · MoDELS · Sampling Method · Pivotal (company) ·

January 24, 2024

Scaling Up to Excellence: Practicing Model Scaling for Photo-Realistic Image Restoration In the Wild

Fanghua Yu,Jinjin Gu,Zheyuan Li,Jinfan Hu,Xiangtao Kong,Xintao Wang,Jingwen He,Yu Qiao,Chao Dong

We introduce SUPIR (Scaling-UP Image Restoration), a groundbreaking image restoration method that harnesses generative prior and the power of model scaling up. Leveraging multi-modal techniques and advanced generative prior, SUPIR marks a significant advance in intelligent and realistic image restoration. As a pivotal catalyst within SUPIR, model scaling dramatically enhances its capabilities and demonstrates new potential for image restoration. We collect a dataset comprising 20 million high-resolution, high-quality images for model training, each enriched with descriptive text annotations. SUPIR provides the capability to restore images guided by textual prompts, broadening its application scope and potential. Moreover, we introduce negative-quality prompts to further improve perceptual quality. We also develop a restoration-guided sampling method to suppress the fidelity issue encountered in generative-based restoration. Experiments demonstrate SUPIR's exceptional restoration effects and its novel capacity to manipulate restoration through textual prompts.

CLIP · Understandability · Image Retrieval · Peak · Learning ·

January 24, 2024

Enhancing Image Retrieval : A Comprehensive Study on Photo Search using the CLIP Mode

Naresh Kumar Lahajal,Harini S

Photo search, the task of retrieving images based on textual queries, has witnessed significant advancements with the introduction of the CLIP (Contrastive Language-Image Pretraining) model. CLIP leverages a vision-language pre-training approach, wherein it learns a shared representation space for images and text, enabling cross-modal understanding. This model demonstrates the capability to understand the semantic relationships between diverse image and text pairs, allowing for efficient and accurate retrieval of images based on natural language queries. By training on a large-scale dataset containing images and their associated textual descriptions, CLIP achieves remarkable generalization, providing a powerful tool for tasks such as zero-shot learning and few-shot classification. This abstract summarizes the foundational principles of CLIP and highlights its potential impact on advancing the field of photo search, fostering a seamless integration of natural language understanding and computer vision for improved information retrieval in multimedia applications.
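Retrieval in a shared image-text embedding space usually reduces to a nearest-neighbour search. The sketch below is an illustration under stated assumptions, not the paper's code: it assumes the text query and each photo have already been embedded by some encoder (not shown), uses cosine similarity as the ranking score, and the names `cosine_similarity` and `rank_images` as well as the toy vectors are hypothetical.

```python
import math


def cosine_similarity(a, b):
    """Cosine similarity between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def rank_images(query_embedding, image_embeddings, top_k=None):
    """Return image names ordered from most to least similar to the query.

    image_embeddings maps an image name to its precomputed embedding vector.
    """
    scored = sorted(
        image_embeddings.items(),
        key=lambda item: cosine_similarity(query_embedding, item[1]),
        reverse=True,
    )
    names = [name for name, _ in scored]
    return names if top_k is None else names[:top_k]


if __name__ == "__main__":
    # Toy 2D vectors made up for illustration; real embeddings have far more dimensions.
    photos = {"beach.jpg": [0.9, 0.1], "forest.jpg": [0.0, 1.0], "city.jpg": [-1.0, 0.0]}
    print(rank_images([1.0, 0.0], photos, top_k=2))
```

Because only the ranking step is shown, the same function works with embeddings from any encoder that places images and text in one vector space.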

Analysis · Prompt · Generative AI · AI · AIM ·

January 24, 2024

No Longer Trending on Artstation: Prompt Analysis of Generative AI Art

Jon McCormack,Maria Teresa Llano,Stephen James Krol,Nina Rajcic

from arxiv, Paper accepted for EvoMUSART 2024, Aberystwyth, Wales, United Kingdom, 3-5 April 2024

Image generation using generative AI is rapidly becoming a major new source of visual media, with billions of AI generated images created using diffusion models such as Stable Diffusion and Midjourney over the last few years. In this paper we collect and analyse over 3 million prompts and the images they generate. Using natural language processing, topic analysis and visualisation methods we aim to understand collectively how people are using text prompts, the impact of these systems on artists, and more broadly on the visual cultures they promote. Our study shows that prompting focuses largely on surface aesthetics, reinforcing cultural norms, popular conventional representations and imagery. We also find that many users focus on popular topics (such as making colouring books, fantasy art, or Christmas cards), suggesting that the dominant use for the systems analysed is recreational rather than artistic.

FAST · Model Evaluation · Inference · Vision · MoDELS ·

January 24, 2024

VIPTR: A Vision Permutable Extractor for Fast and Efficient Scene Text Recognition

Xianfu Cheng,Weixiao Zhou,Xiang Li,Xiaoming Chen,Jian Yang,Tongliang Li,Zhoujun Li

from arxiv, 9 pages, 3 figures, 6 tables

Scene Text Recognition (STR) is a challenging task that involves recognizing text within images of natural scenes. Although current state-of-the-art models for STR exhibit high performance, they typically suffer from low inference efficiency due to their reliance on hybrid architectures comprised of visual encoders and sequence decoders. In this work, we propose the VIsion Permutable extractor for fast and efficient scene Text Recognition (VIPTR), which achieves an impressive balance between high performance and rapid inference speeds in the domain of STR. Specifically, VIPTR leverages a visual-semantic extractor with a pyramid structure, characterized by multiple self-attention layers, while eschewing the traditional sequence decoder. This design choice results in a lightweight and efficient model capable of handling inputs of varying sizes. Extensive experimental results on various standard datasets for both Chinese and English scene text recognition validate the superiority of VIPTR. Notably, the VIPTR-T (Tiny) variant delivers highly competitive accuracy on par with other lightweight models and achieves SOTA inference speeds. Meanwhile, the VIPTR-L (Large) variant attains greater recognition accuracy, while maintaining a low parameter count and favorable inference speed. Our proposed method provides a compelling solution for the STR challenge, which blends high accuracy with efficiency and greatly benefits real-world applications requiring fast and reliable text recognition. The code is publicly available at //github.com/cxfyxl/VIPTR.

Vectorization · Processing (programming language) · GPU · Networking · Approximation ·

January 22, 2024

PatternPortrait: Draw Me Like One of Your Scribbles

Sabine Wieluch,Friedhelm Schwenker

This paper introduces a process for generating abstract portrait drawings from pictures. Their unique style is created by utilizing single freehand pattern sketches as references to generate unique patterns for shading. The method involves extracting facial and body features from images and transforming them into vector lines. A key aspect of the research is the development of a graph neural network architecture designed to learn sketch stroke representations in vector form, enabling the generation of diverse stroke variations. The combination of these two approaches creates joyful abstract drawings that are realized via a pen plotter. The presented process garnered positive feedback from an audience of approximately 280 participants.

MoDELS · ChatGPT · BERT · Language Modeling · Transformation ·

February 18, 2023

A Comprehensive Survey on Pretrained Foundation Models: A History from BERT to ChatGPT

Ce Zhou,Qian Li,Chen Li,Jun Yu,Yixin Liu,Guangjing Wang,Kai Zhang,Cheng Ji,Qiben Yan,Lifang He,Hao Peng,Jianxin Li,Jia Wu,Ziwei Liu,Pengtao Xie,Caiming Xiong,Jian Pei,Philip S. Yu,Lichao Sun

from arxiv, 97 pages, 16 figures

Pretrained Foundation Models (PFMs) are regarded as the foundation for various downstream tasks with different data modalities. A pretrained foundation model, such as BERT, GPT-3, MAE, DALL-E, and ChatGPT, is trained on large-scale data, which provides a reasonable parameter initialization for a wide range of downstream applications. The idea of pretraining behind PFMs plays an important role in the application of large models. Unlike previous methods that apply convolution and recurrent modules for feature extraction, the generative pre-training (GPT) method applies the Transformer as the feature extractor and is trained on large datasets with an autoregressive paradigm. Similarly, BERT applies Transformers to train on large datasets as a contextual language model. Recently, ChatGPT has shown promising success on large language models; it applies an autoregressive language model with zero-shot or few-shot prompting. With the extraordinary success of PFMs, AI has made waves in a variety of fields over the past few years. Considerable methods, datasets, and evaluation metrics have been proposed in the literature, and the need is rising for an updated survey. This study provides a comprehensive review of recent research advancements, current and future challenges, and opportunities for PFMs in text, image, graph, as well as other data modalities. We first review the basic components and existing pretraining in natural language processing, computer vision, and graph learning. We then discuss other advanced PFMs for other data modalities and unified PFMs considering the data quality and quantity. Besides, we discuss relevant research about the fundamentals of the PFM, including model efficiency and compression, security, and privacy. Finally, we lay out key implications, future research directions, challenges, and open problems.

Video Captioning (Video Caption) · INFORMS · Performer · Distillation · Extensibility ·

March 31, 2020

Spatio-Temporal Graph for Video Captioning with Knowledge Distillation

Boxiao Pan,Haoye Cai,De-An Huang,Kuan-Hui Lee,Adrien Gaidon,Ehsan Adeli,Juan Carlos Niebles

from arxiv, CVPR 2020

Video captioning is a challenging task that requires a deep understanding of visual scenes. State-of-the-art methods generate captions using either scene-level or object-level information but without explicitly modeling object interactions. Thus, they often fail to make visually grounded predictions, and are sensitive to spurious correlations. In this paper, we propose a novel spatio-temporal graph model for video captioning that exploits object interactions in space and time. Our model builds interpretable links and is able to provide explicit visual grounding. To avoid unstable performance caused by the variable number of objects, we further propose an object-aware knowledge distillation mechanism, in which local object information is used to regularize global scene features. We demonstrate the efficacy of our approach through extensive experiments on two benchmarks, showing our approach yields competitive performance with interpretable predictions.

Visual Question Answering · Question Answering · MoDELS · Identifiable · Attention Mechanism ·

February 15, 2018

Learning to Count Objects in Natural Images for Visual Question Answering

Yan Zhang,Jonathon Hare,Adam Prügel-Bennett

from arxiv, Published in ICLR 2018

Visual Question Answering (VQA) models have struggled with counting objects in natural images so far. We identify a fundamental problem due to soft attention in these models as a cause. To circumvent this problem, we propose a neural network component that allows robust counting from object proposals. Experiments on a toy task show the effectiveness of this component and we obtain state-of-the-art accuracy on the number category of the VQA v2 dataset without negatively affecting other categories, even outperforming ensemble models with our single model. On a difficult balanced pair metric, the component gives a substantial improvement in counting over a strong baseline by 6.6%.


