
from arxiv, Copyright 2023 IEEE. Personal use of this material is permitted. Permission from IEEE must be obtained for all other uses, in any current or future media, including reprinting/republishing this material for advertising or promotional purposes, creating new collective works, for resale or redistribution to servers or lists, or reuse of any copyrighted component of this work in other works

We present a novel pipeline for learning the conditional distribution of a building roof mesh given pixels from an aerial image, under the assumption that roof geometry follows a set of regular patterns. Unlike alternative methods that require multiple images of the same object, our approach estimates 3D roof meshes from a single image. The approach employs PolyGen, a deep generative transformer architecture for 3D meshes. We apply this model in a new domain and investigate its sensitivity to image resolution. We propose a novel metric to evaluate the inferred meshes, and our results show that the model is robust even at lower resolutions, while qualitatively producing realistic representations for out-of-distribution samples.
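PolyGen generates a mesh as a sequence of discrete tokens, so a roof mesh first has to be turned into such a sequence. The sketch below follows our reading of the original PolyGen preprocessing, not anything stated in this abstract. It assumes coordinates are already normalized to [-0.5, 0.5], quantizes them to 8 bits, sorts vertices by z, then y, then x, and remaps face indices to the sorted order. The helper names and the toy gable-roof mesh are hypothetical.

```python
import numpy as np


def quantize_vertices(verts: np.ndarray, n_bits: int = 8) -> np.ndarray:
    """Map float coordinates in [-0.5, 0.5] to integers in [0, 2**n_bits - 1]."""
    n_levels = 2 ** n_bits
    q = np.round((verts + 0.5) * (n_levels - 1))
    return np.clip(q, 0, n_levels - 1).astype(np.int64)


def to_token_sequence(verts: np.ndarray, faces: list[list[int]], n_bits: int = 8):
    """Quantize, sort vertices by (z, y, x) and remap faces to the sorted indices."""
    q = quantize_vertices(verts, n_bits)
    # np.lexsort uses the LAST key as the primary key: z first, then y, then x.
    order = np.lexsort((q[:, 0], q[:, 1], q[:, 2]))
    new_index = np.empty_like(order)
    new_index[order] = np.arange(len(order))
    faces_remapped = [[int(new_index[v]) for v in face] for face in faces]
    # Flatten each sorted vertex as (z, y, x) into one token stream.
    tokens = q[order][:, [2, 1, 0]].reshape(-1)
    return tokens, faces_remapped


# Hypothetical toy gable roof (x, y, z): two ridge points first, then four eave corners.
verts = np.array([
    [-0.5, 0.0, 0.5], [0.5, 0.0, 0.5],        # ridge
    [-0.5, -0.5, -0.5], [0.5, -0.5, -0.5],    # south eaves
    [0.5, 0.5, -0.5], [-0.5, 0.5, -0.5],      # north eaves
])
faces = [[2, 3, 1, 0], [4, 5, 0, 1], [2, 5, 0], [3, 4, 1]]  # two slopes, two gables
tokens, sorted_faces = to_token_sequence(verts, faces)
print(tokens.tolist())
print(sorted_faces)
```

Sorting puts the low eave vertices before the ridge, so an autoregressive model sees a deterministic, geometry-driven order for every roof.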

Related content

Learning


Target domain · Knowledge · Kernelization · Performer · Few-shot learning

9 May 2023

Few-shot Image Generation via Adaptation-Aware Kernel Modulation

Yunqing Zhao,Keshigeyan Chandrasegaran,Milad Abdollahzadeh,Ngai-Man Cheung

from arxiv, The Thirty-Sixth Annual Conference on Neural Information Processing Systems (NeurIPS 2022), 14 pages

Few-shot image generation (FSIG) aims to learn to generate new and diverse samples given an extremely limited number of samples from a domain, e.g., 10 training samples. Recent work has addressed the problem using a transfer learning approach, leveraging a GAN pretrained on a large-scale source domain dataset and adapting that model to the target domain based on very limited target domain samples. Central to recent FSIG methods are knowledge preserving criteria, which aim to select a subset of the source model's knowledge to be preserved in the adapted model. However, a major limitation of existing methods is that their knowledge preserving criteria consider only the source domain/source task and fail to consider the target domain/adaptation task when selecting the source model's knowledge, casting doubt on their suitability for setups with different proximity between source and target domains. Our work makes two contributions. As our first contribution, we revisit recent FSIG works and their experiments. Our important finding is that, in setups where the assumption of close proximity between source and target domains is relaxed, existing state-of-the-art (SOTA) methods that consider only the source domain/source task in knowledge preserving perform no better than a baseline fine-tuning method. To address this limitation, as our second contribution, we propose Adaptation-Aware kernel Modulation (AdAM) to address general FSIG across different degrees of source-target domain proximity. Extensive experimental results show that the proposed method consistently achieves SOTA performance across source/target domains of different proximity, including challenging setups where the source and target domains are further apart. Project Page: //yunqing-me.github.io/AdAM/

MoDELS · Origin · Shaping · Learning · Prompt

8 May 2023

ReGeneration Learning of Diffusion Models with Rich Prompts for Zero-Shot Image Translation

Yupei Lin,Sen Zhang,Xiaojun Yang,Xiao Wang,Yukai Shi

from arxiv, //yupeilin2388.github.io/publication/ReDiffuser

Large-scale text-to-image models have demonstrated an impressive ability to synthesize diverse and high-fidelity images. However, these models suffer from several limitations. First, they require the user to provide precise and contextually relevant descriptions of the desired image modifications. Second, current models can significantly alter the original image content during editing. In this paper, we explore ReGeneration learning in an image-to-image diffusion model (ReDiffuser), which preserves the content of the original image without human prompting and automatically discovers the required editing direction within the text embedding space. To ensure consistent shape preservation during image editing, we propose cross-attention guidance based on regeneration learning. This approach enhances the expression of target-domain features while preserving the original shape of the image. In addition, we introduce a cooperative update strategy that efficiently preserves the original shape of an image, thereby improving the quality and consistency of shape preservation throughout the editing process. Our method leverages an existing pretrained text-to-image diffusion model without any additional training. Extensive experiments show that the proposed method outperforms existing work on both real and synthetic image editing.

Projection · Identifiable · motivation · Performer · Origin

8 May 2023

Motion Detection in Diffraction Tomography by Common Circle Methods

Michael Quellmalz,Peter Elbau,Otmar Scherzer,Gabriele Steidl

from arxiv, 35 pages, 13 figures

The method of common lines is a well-established reconstruction technique in cryogenic electron microscopy (cryo-EM), which can be used to extract the relative orientations of an object given tomographic projection images from different directions. In this paper, we address an analogous problem in optical diffraction tomography. Based on the Fourier diffraction theorem, we show that rigid motions of the object, i.e., rotations and translations, can be determined by detecting common circles in the Fourier-transformed data. We introduce two methods to identify common circles. The first is motivated by the common line approach for projection images and detects the relative orientation by parameterizing the common circles in the two images. The second assumes a smooth motion over time and calculates the angular velocity of the rotational motion via an infinitesimal version of the common circle method. Interestingly, using the stereographic projection, both methods can be reformulated as common line methods; however, in contrast to those used in cryo-EM, these lines are not constrained to pass through the origin, which allows for a full reconstruction of the relative orientations. Numerical proof-of-concept examples demonstrate the performance of our reconstruction methods.
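Why stereographic projection turns common circles into common lines can be checked with a generic geometric fact. Projecting the unit sphere from a pole maps every circle through that pole to a straight line. For a plane $n \cdot p = d$ through the north pole ($n_z = d$), the image is the line $n_x u + n_y v = d$, which passes through the origin only when $d = 0$, i.e. when the circle also passes through the south pole. The sketch below verifies this numerically. Projecting from the north pole onto $z = 0$ is our assumption; the paper's exact projection setup is not given in the abstract, and this is not the authors' reconstruction method.

```python
import numpy as np


def stereographic(p: np.ndarray) -> np.ndarray:
    """Project points on the unit sphere from the north pole (0, 0, 1) onto the plane z = 0."""
    return p[:, :2] / (1.0 - p[:, 2:3])


def circle_on_sphere(n: np.ndarray, d: float, num: int = 400) -> np.ndarray:
    """Sample the circle where the plane {p : n . p = d} cuts the unit sphere (|n| = 1, |d| < 1)."""
    center = d * n
    radius = np.sqrt(1.0 - d ** 2)
    # Orthonormal basis (e1, e2) of the plane, built from any vector not parallel to n.
    helper = np.array([1.0, 0.0, 0.0]) if abs(n[0]) < 0.9 else np.array([0.0, 1.0, 0.0])
    e1 = np.cross(n, helper)
    e1 /= np.linalg.norm(e1)
    e2 = np.cross(n, e1)
    t = np.linspace(0.0, 2.0 * np.pi, num, endpoint=False)
    return center + radius * (np.outer(np.cos(t), e1) + np.outer(np.sin(t), e2))


def projected_line_residual(n: np.ndarray, d: float) -> float:
    """Max deviation of the projected circle from the line n_x*u + n_y*v = d."""
    pts = circle_on_sphere(n, d)
    pts = pts[1.0 - pts[:, 2] > 1e-3]  # drop samples at or right next to the pole itself
    uv = stereographic(pts)
    return float(np.max(np.abs(uv @ n[:2] - d)))


# A plane through the north pole N satisfies n . N = n_z = d, so its circle maps to a line.
n = np.array([0.3, -0.5, 0.4])
n /= np.linalg.norm(n)
print(projected_line_residual(n, n[2]))   # ~0: a line, not through the origin (d != 0)

# A circle through both poles (n_z = d = 0) maps to a line through the origin.
m = np.array([1.0, 1.0, 0.0]) / np.sqrt(2.0)
print(projected_line_residual(m, 0.0))    # ~0: the line u + v = 0
```

Substituting the inverse projection $p = (2u, 2v, r^2 - 1)/(r^2 + 1)$ into $n \cdot p = d$ gives $(n_z - d) r^2 + 2(n_x u + n_y v) - (n_z + d) = 0$, so the $r^2$ term vanishes exactly when the circle contains the pole.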

Pruning · Learning · Performer · Round · Networking

5 May 2023

Learn how to Prune Pixels for Multi-view Neural Image-based Synthesis

Marta Milovanović,Enzo Tartaglione,Marco Cagnazzo,Félix Henry

Image-based rendering techniques are at the core of immersive user experiences, as they generate novel views from a set of input images. Since they perform well in terms of both objective and subjective quality, the research community devotes great effort to improving them. However, the large volume of data needed for rendering at the receiver's side hinders their use in bandwidth-limited environments and in real-time applications. We present LeHoPP, a method for input pixel pruning that examines the importance of each input pixel with respect to the rendered view and discards irrelevant pixels. Even without retraining the image-based rendering network, our approach achieves a good trade-off between synthesis quality and pixel rate. When tested in the general neural rendering framework and compared with other pruning baselines, LeHoPP gains between 0.9 dB and 3.6 dB on average.
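Once every input pixel has an importance score, pruning reduces to keeping the highest-scoring fraction, and quality is then compared in dB. The sketch assumes the importance map is already given, whereas LeHoPP learns it. It also assumes that the reported dB gains are PSNR, the usual quality measure in view synthesis. `prune_mask`, `psnr` and the toy inputs are hypothetical.

```python
import numpy as np


def prune_mask(importance: np.ndarray, keep_ratio: float) -> np.ndarray:
    """Keep the `keep_ratio` fraction of input pixels with the highest importance scores."""
    flat = importance.ravel()
    k = int(round(keep_ratio * flat.size))
    mask = np.zeros(flat.size, dtype=bool)
    if k > 0:
        # argpartition puts the indices of the k largest scores in the last k slots.
        mask[np.argpartition(flat, -k)[-k:]] = True
    return mask.reshape(importance.shape)


def psnr(reference: np.ndarray, rendered: np.ndarray, max_val: float = 1.0) -> float:
    """Peak signal-to-noise ratio in dB for images with values in [0, max_val]."""
    mse = np.mean((reference.astype(np.float64) - rendered.astype(np.float64)) ** 2)
    return float("inf") if mse == 0 else float(10.0 * np.log10(max_val ** 2 / mse))


importance = np.arange(16, dtype=float).reshape(4, 4)  # hypothetical per-pixel scores
mask = prune_mask(importance, keep_ratio=0.25)
print(mask.sum(), mask[3].all())   # 4 pixels kept, all from the highest-scoring row

reference = np.zeros((4, 4))
rendered = np.full((4, 4), 0.1)
print(psnr(reference, rendered))   # about 20 dB (MSE = 0.01)
```

Under the PSNR assumption, and at an equal pixel rate, a gain of 0.9 dB means the mean squared error shrinks by a factor of $10^{0.09} \approx 1.23$, and 3.6 dB by $10^{0.36} \approx 2.29$.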

Block · Controller · MoDELS · Denoising · Noise

5 May 2023

Guided Image Synthesis via Initial Image Editing in Diffusion Model

Jiafeng Mao,Xueting Wang,Kiyoharu Aizawa

Diffusion models can generate high-quality images by denoising pure Gaussian noise. While previous research has focused primarily on improving control over image generation by adjusting the denoising process, we propose a novel direction: manipulating the initial noise to control the generated image. Through experiments on Stable Diffusion, we show that blocks of pixels in the initial latent image have a preference for generating specific content, and that modifying these blocks can significantly influence the generated image. In particular, we show that modifying part of the initial image affects the corresponding region of the generated image while leaving other regions unaffected, which is useful for repainting tasks. Furthermore, we find that the generation preferences of pixel blocks are determined primarily by their values rather than their positions. By moving pixel blocks that tend to generate user-desired content to user-specified regions, our approach achieves state-of-the-art performance in layout-to-image generation. Our results highlight the flexibility and power of initial image manipulation for controlling the generated image.
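Moving a pixel block of the initial latent to a user-chosen region is plain array surgery before sampling starts. The sketch exchanges two non-overlapping blocks. Swapping, rather than copying and refilling, is our own design choice: a permutation of i.i.d. Gaussian entries is still i.i.d. Gaussian. The latent shape `(4, 64, 64)` assumes Stable Diffusion at 512x512, and `swap_blocks` is a hypothetical helper, not the authors' code.

```python
import numpy as np


def swap_blocks(latent: np.ndarray, src: tuple[int, int], dst: tuple[int, int], size: int) -> np.ndarray:
    """Return a copy of a (C, H, W) latent with two non-overlapping size x size blocks exchanged."""
    _, h, w = latent.shape
    (sy, sx), (dy, dx) = src, dst
    for y, x in (src, dst):
        if not (0 <= y <= h - size and 0 <= x <= w - size):
            raise ValueError("block falls outside the latent")
    if abs(sy - dy) < size and abs(sx - dx) < size:
        raise ValueError("blocks overlap")
    out = latent.copy()
    # Read from the untouched original so the two assignments cannot interfere.
    out[:, dy:dy + size, dx:dx + size] = latent[:, sy:sy + size, sx:sx + size]
    out[:, sy:sy + size, sx:sx + size] = latent[:, dy:dy + size, dx:dx + size]
    return out


rng = np.random.default_rng(0)
z = rng.standard_normal((4, 64, 64))  # assumed Stable Diffusion latent shape for 512x512
z_edit = swap_blocks(z, src=(0, 0), dst=(40, 40), size=16)
```

Converted to a `torch` tensor with a batch dimension, such an edited latent could be passed to a Hugging Face `diffusers` `StableDiffusionPipeline` through its `latents` argument instead of letting the pipeline sample fresh noise.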

MoDELS · state-of-the-art · Projection · Scenario · Transformation

4 May 2023

DreamPose: Fashion Image-to-Video Synthesis via Stable Diffusion

Johanna Karras,Aleksander Holynski,Ting-Chun Wang,Ira Kemelmacher-Shlizerman

from arxiv, Project page: //grail.cs.washington.edu/projects/dreampose/

We present DreamPose, a diffusion-based method for generating animated fashion videos from still images. Given an image and a sequence of human body poses, our method synthesizes a video containing both human and fabric motion. To achieve this, we transform a pretrained text-to-image model (Stable Diffusion) into a pose-and-image guided video synthesis model, using a novel fine-tuning strategy, a set of architectural changes to support the added conditioning signals, and techniques to encourage temporal consistency. We fine-tune on a collection of fashion videos from the UBC Fashion dataset. We evaluate our method on a variety of clothing styles and poses, and demonstrate that our method produces state-of-the-art results on fashion video animation. Video results are available on our project page.

3D · Continuity · Estimation/Estimator · MoDELS · Regularization term

8 March 2022

Recovering 3D Human Mesh from Monocular Images: A Survey

Yating Tian,Hongwen Zhang,Yebin Liu,Limin Wang

from arxiv, Survey paper on monocular 3D human mesh recovery, Project page: //github.com/tinatiansjz/hmr-survey

Estimating human pose and shape from monocular images is a long-standing problem in computer vision. Since the release of statistical body models, 3D human mesh recovery has attracted increasing attention. With the same goal of obtaining well-aligned and physically plausible mesh results, two paradigms have been developed to overcome challenges in the 2D-to-3D lifting process: i) an optimization-based paradigm, where different data terms and regularization terms are exploited as optimization objectives; and ii) a regression-based paradigm, where deep learning techniques are embraced to solve the problem in an end-to-end fashion. Meanwhile, continuous efforts are devoted to improving the quality of 3D mesh labels for a wide range of datasets. Though remarkable progress has been achieved in the past decade, the task remains challenging due to flexible body motions, diverse appearances, complex environments, and insufficient in-the-wild annotations. To the best of our knowledge, this is the first survey to focus on the task of monocular 3D human mesh recovery. We start with an introduction to body models and then elaborate on recovery frameworks and training objectives, providing in-depth analyses of their strengths and weaknesses. We also summarize datasets, evaluation metrics, and benchmark results. Finally, we discuss open issues and future directions, in the hope of motivating researchers and facilitating their research in this area. A regularly updated project page can be found at //github.com/tinatiansjz/hmr-survey.
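The optimization-based paradigm minimizes a data term plus a regularization term. A linear toy makes the structure concrete: $E(\theta) = \tfrac{1}{2}\lVert A\theta - x_{2D}\rVert^2 + \tfrac{\lambda}{2}\lVert\theta\rVert^2$, solved by gradient descent. The matrix `A` is a hypothetical linearized stand-in for "body model plus camera projection". Real fitting methods (e.g. SMPLify-style fitting to 2D joints) are nonlinear and use richer priors, so this only illustrates how the two terms interact.

```python
import numpy as np


def fit_parameters(A, x2d, lam, steps=3000):
    """Minimize 0.5*||A theta - x2d||^2 (data term) + 0.5*lam*||theta||^2 (regularizer)."""
    theta = np.zeros(A.shape[1])
    lr = 1.0 / (np.linalg.norm(A, 2) ** 2 + lam)  # 1/L, L = Lipschitz constant of the gradient
    for _ in range(steps):
        grad = A.T @ (A @ theta - x2d) + lam * theta
        theta -= lr * grad
    return theta


rng = np.random.default_rng(0)
A = rng.standard_normal((20, 5))           # hypothetical linearized model: 10 joints x (u, v)
theta_true = rng.standard_normal(5)
x2d = A @ theta_true + 0.05 * rng.standard_normal(20)  # noisy 2D keypoint observations

theta_weak = fit_parameters(A, x2d, lam=0.01)
theta_strong = fit_parameters(A, x2d, lam=100.0)
# Closed-form check for this linear toy: (A^T A + lam I) theta = A^T x2d.
closed = np.linalg.solve(A.T @ A + 0.01 * np.eye(5), A.T @ x2d)
print(np.allclose(theta_weak, closed), np.linalg.norm(theta_strong) < np.linalg.norm(theta_weak))
```

A larger `lam` pulls the estimate toward the prior (here, zero), trading fit to the 2D evidence for plausibility. That trade-off is what the regularization terms control in the optimization-based paradigm.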

Graph · Link prediction · Networking · Representation learning · Learned

15 June 2019

Dynamic Graph Representation Learning via Self-Attention Networks

Aravind Sankar,Yanhong Wu,Liang Gou,Wei Zhang,Hao Yang

Learning latent representations of nodes in graphs is an important and ubiquitous task with widespread applications such as link prediction, node classification, and graph visualization. Previous methods for graph representation learning mainly focus on static graphs; however, many real-world graphs are dynamic and evolve over time. In this paper, we present Dynamic Self-Attention Network (DySAT), a novel neural architecture that operates on dynamic graphs and learns node representations that capture both structural properties and temporal evolutionary patterns. Specifically, DySAT computes node representations by jointly employing self-attention layers along two dimensions: structural neighborhood and temporal dynamics. We conduct link prediction experiments on two classes of graphs: communication networks and bipartite rating networks. Our experimental results show that DySAT achieves significant performance gains over several state-of-the-art graph embedding baselines.
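The temporal dimension of self-attention can be sketched for a single node: each snapshot's embedding attends over that node's embeddings at the same and earlier snapshots. The causal mask, the single head, and the absence of position embeddings are simplifying assumptions. The structural (neighborhood) attention is omitted, and the input embeddings and weight matrices are hypothetical.

```python
import numpy as np


def temporal_self_attention(X, Wq, Wk, Wv):
    """Single-head masked self-attention over one node's embeddings across T snapshots.

    X: (T, d_in). Row t may only attend to rows 0..t, so no future snapshot leaks in.
    """
    Q, K, V = X @ Wq, X @ Wk, X @ Wv
    T = X.shape[0]
    scores = Q @ K.T / np.sqrt(K.shape[1])
    future = np.triu(np.ones((T, T), dtype=bool), k=1)  # True where j > t
    scores = np.where(future, -np.inf, scores)
    scores -= scores.max(axis=1, keepdims=True)         # numerical stability; diagonal is finite
    weights = np.exp(scores)
    weights /= weights.sum(axis=1, keepdims=True)
    return weights @ V, weights


rng = np.random.default_rng(0)
T, d_in, d_out = 5, 8, 4
X = rng.standard_normal((T, d_in))  # hypothetical per-snapshot structural embeddings of one node
Wq, Wk, Wv = (rng.standard_normal((d_in, d_out)) for _ in range(3))
out, weights = temporal_self_attention(X, Wq, Wk, Wv)
print(np.round(weights, 3))         # lower-triangular; each row sums to 1
```

Because the mask zeroes attention to later snapshots, perturbing the last snapshot leaves all earlier outputs unchanged, which is what makes such representations usable for predicting future links.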

Interpretability · GAN · GANs · Better · Generative adversarial networks

8 December 2018

GAN Dissection: Visualizing and Understanding Generative Adversarial Networks

David Bau,Jun-Yan Zhu,Hendrik Strobelt,Bolei Zhou,Joshua B. Tenenbaum,William T. Freeman,Antonio Torralba

from arxiv, 18 pages, 19 figures

Generative Adversarial Networks (GANs) have recently achieved impressive results for many real-world applications, and many GAN variants have emerged with improvements in sample quality and training stability. However, they have not been well visualized or understood. How does a GAN represent our visual world internally? What causes the artifacts in GAN results? How do architectural choices affect GAN learning? Answering such questions could enable us to develop new insights and better models. In this work, we present an analytic framework to visualize and understand GANs at the unit-, object-, and scene-level. We first identify a group of interpretable units that are closely related to object concepts using a segmentation-based network dissection method. Then, we quantify the causal effect of interpretable units by measuring the ability of interventions to control objects in the output. We examine the contextual relationship between these units and their surroundings by inserting the discovered object concepts into new images. We show several practical applications enabled by our framework, from comparing internal representations across different layers, models, and datasets, to improving GANs by locating and removing artifact-causing units, to interactively manipulating objects in a scene. We provide open source interpretation tools to help researchers and practitioners better understand their GAN models.
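The two core measurements, matching a unit to an object concept and intervening on units, can be sketched with plain arrays. The sketch scores a unit by the IoU between its thresholded activation map and a concept segmentation, and ablates units by zeroing their channels. The fixed quantile threshold is our simplifying assumption; the paper's per-unit threshold selection may differ. The toy "tree" mask and the helper names are hypothetical.

```python
import numpy as np


def unit_concept_iou(activation, concept_mask, quantile=0.99):
    """IoU between a unit's thresholded activation and a binary concept segmentation.

    activation:   (N, H, W) unit activations, already upsampled to the mask resolution.
    concept_mask: (N, H, W) boolean mask of where the concept appears in the N images.
    """
    threshold = np.quantile(activation, quantile)  # keep only the unit's strongest responses
    unit_mask = activation > threshold
    inter = np.logical_and(unit_mask, concept_mask).sum()
    union = np.logical_or(unit_mask, concept_mask).sum()
    return float(inter / union) if union else 0.0


def ablate_units(features, units):
    """Return a copy of a (C, H, W) feature map with the listed channels forced to zero."""
    out = features.copy()
    out[list(units)] = 0.0
    return out


# Toy check: a unit that fires exactly where a hypothetical 3x3 "tree" patch is.
concept = np.zeros((1, 10, 10), dtype=bool)
concept[0, :3, :3] = True
act = concept.astype(float)
print(unit_concept_iou(act, concept, quantile=0.9))   # 1.0: the unit matches the concept

feats = np.ones((3, 2, 2))
print(ablate_units(feats, [1])[1].sum())              # 0.0: channel 1 removed
```

In a full pipeline, the ablated feature map would be passed through the rest of the generator. The causal effect is then the drop in how much of the concept a segmenter finds in the output.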

Attribute space · Diversity · Pair · MoDELS · Training data

2 August 2018

Diverse Image-to-Image Translation via Disentangled Representations

Hsin-Ying Lee,Hung-Yu Tseng,Jia-Bin Huang,Maneesh Kumar Singh,Ming-Hsuan Yang

from arxiv, ECCV 2018 (Oral). Project page: //vllab.ucmerced.edu/hylee/DRIT/ Code: //github.com/HsinYingLee/DRIT/

Image-to-image translation aims to learn the mapping between two visual domains. There are two main challenges for many applications: 1) the lack of aligned training pairs and 2) multiple possible outputs from a single input image. In this work, we present an approach based on disentangled representations for producing diverse outputs without paired training images. To achieve diversity, we propose to embed images onto two spaces: a domain-invariant content space capturing shared information across domains and a domain-specific attribute space. Our model takes the encoded content features extracted from a given input and the attribute vectors sampled from the attribute space to produce diverse outputs at test time. To handle unpaired training data, we introduce a novel cross-cycle consistency loss based on disentangled representations. Qualitative results show that our model can generate diverse and realistic images on a wide range of tasks without paired training data. For quantitative comparisons, we measure realism with a user study and diversity with a perceptual distance metric. We apply the proposed model to domain adaptation and show competitive performance compared with the state of the art on the MNIST-M and LineMod datasets.
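The cross-cycle consistency loss can be traced with toy vectors. Content codes are swapped between an image from each domain, then swapped back, and the reconstructions are compared with the inputs under an L1 penalty. To keep the sketch short, one content encoder and one attribute encoder serve both domains. The "images" are 4-vectors whose first half is content and second half is attribute, and the generator simply concatenates the codes. All of these are hypothetical stand-ins for the learned, domain-specific networks, and the paper's other losses are left out.

```python
import numpy as np


def cross_cycle_loss(x, y, enc_c, enc_a, gen_x, gen_y):
    """L1 cross-cycle consistency: swap content codes twice and compare with the inputs."""
    # First translation: exchange content between the two domains.
    u = gen_x(enc_c(y), enc_a(x))   # x's attributes, y's content
    v = gen_y(enc_c(x), enc_a(y))   # y's attributes, x's content
    # Second translation: exchange back; a disentangled model recovers x and y.
    x_hat = gen_x(enc_c(v), enc_a(u))
    y_hat = gen_y(enc_c(u), enc_a(v))
    return float(np.abs(x - x_hat).mean() + np.abs(y - y_hat).mean())


def content_code(img):
    return img[:2]


def attribute_code(img):
    return img[2:]


def leaky_content_code(img):
    # Content code contaminated by attribute information: disentanglement fails.
    return img[:2] + 0.5 * img[2:]


def generate(c, a):
    return np.concatenate([c, a])


x = np.array([1.0, 2.0, 0.5, 0.25])
y = np.array([3.0, 4.0, 1.0, 2.0])
print(cross_cycle_loss(x, y, content_code, attribute_code, generate, generate))        # 0.0
print(cross_cycle_loss(x, y, leaky_content_code, attribute_code, generate, generate))  # > 0
```

The loss vanishes only when content and attribute codes separate cleanly. This is why it can supervise unpaired training: no aligned (x, y) pair is ever needed.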