东京热加勒比中文无码,国产精品午夜福利鲁丝片在线,亚洲香蕉日日精品一区2区三区,无码AV免费永久免费永久专区,欧美精品久久久久久精品爆乳

Recent work has showcased the significant potential of diffusion models in pose-guided person image synthesis. However, owing to the inconsistency in pose between the source and target images, synthesizing an image with a distinct pose, relying exclusively on the source image and target pose information, remains a formidable challenge. This paper presents Progressive Conditional Diffusion Models (PCDMs) that incrementally bridge the gap between person images under the target and source poses through three stages. Specifically, in the first stage, we design a simple prior conditional diffusion model that predicts the global features of the target image by mining the global alignment relationship between pose coordinates and image appearance. Then, the second stage establishes a dense correspondence between the source and target images using the global features from the previous stage, and an inpainting conditional diffusion model is proposed to further align and enhance the contextual features, generating a coarse-grained person image. In the third stage, we propose a refining conditional diffusion model to utilize the coarsely generated image from the previous stage as a condition, achieving texture restoration and enhancing fine-detail consistency. The three-stage PCDMs work progressively to generate the final high-quality and high-fidelity synthesized image. Both qualitative and quantitative results demonstrate the consistency and photorealism of our proposed PCDMs under challenging scenarios.The code and model will be available at //github.com/muzishen/PCDMs.

相關內容

MoDELS

關注 43

ACM/IEEE第23屆模型驅動工程語言和系統國際會議，是模型驅動軟件和系統工程的首要會議系列，由ACM-SIGSOFT和IEEE-TCSE支持組織。自1998年以來，模型涵蓋了建模的各個方面，從語言和方法到工具和應用程序。模特的參加者來自不同的背景，包括研究人員、學者、工程師和工業專業人士。MODELS 2019是一個論壇，參與者可以圍繞建模和模型驅動的軟件和系統交流前沿研究成果和創新實踐經驗。今年的版本將為建模社區提供進一步推進建模基礎的機會，并在網絡物理系統、嵌入式系統、社會技術系統、云計算、大數據、機器學習、安全、開源等新興領域提出建模的創新應用以及可持續性。官網鏈接： · 數據集 · 值域 · 回合 · Networks ·

2023 年 12 月 1 日

HouseCat6D -- A Large-Scale Multi-Modal Category Level 6D Object Perception Dataset with Household Objects in Realistic Scenarios

HyunJun Jung,Guangyao Zhai,Shun-Cheng Wu,Patrick Ruhkamp,Hannah Schieber,Giulia Rizzoli,Pengyuan Wang,Hongcheng Zhao,Lorenzo Garattoni,Sven Meier,Daniel Roth,Nassir Navab,Benjamin Busam

Estimating 6D object poses is a major challenge in 3D computer vision. Building on successful instance-level approaches, research is shifting towards category-level pose estimation for practical applications. Current category-level datasets, however, fall short in annotation quality and pose variety. Addressing this, we introduce HouseCat6D, a new category-level 6D pose dataset. It features 1) multi-modality with Polarimetric RGB and Depth (RGBD+P), 2) encompasses 194 diverse objects across 10 household categories, including two photometrically challenging ones, and 3) provides high-quality pose annotations with an error range of only 1.35 mm to 1.74 mm. The dataset also includes 4) 41 large-scale scenes with comprehensive viewpoint and occlusion coverage, 5) a checkerboard-free environment, and 6) dense 6D parallel-jaw robotic grasp annotations. Additionally, we present benchmark results for leading category-level pose estimation networks.

Learning · 三維重建 · 全 · Processing（編程語言） · 多樣性 ·

2023 年 11 月 30 日

Learning One-Shot 4D Head Avatar Synthesis using Synthetic Data

Yu Deng,Duomin Wang,Xiaohang Ren,Xingyu Chen,Baoyuan Wang

from arxiv, Project page: //yudeng.github.io/Portrait4D/

Existing one-shot 4D head synthesis methods usually learn from monocular videos with the aid of 3DMM reconstruction, yet the latter is evenly challenging which restricts them from reasonable 4D head synthesis. We present a method to learn one-shot 4D head synthesis via large-scale synthetic data. The key is to first learn a part-wise 4D generative model from monocular images via adversarial learning, to synthesize multi-view images of diverse identities and full motions as training data; then leverage a transformer-based animatable triplane reconstructor to learn 4D head reconstruction using the synthetic data. A novel learning strategy is enforced to enhance the generalizability to real images by disentangling the learning process of 3D reconstruction and reenactment. Experiments demonstrate our superiority over the prior art.

統計量 · MoDELS · 相同 · 相互獨立的 · 樣例 ·

2023 年 11 月 30 日

Unconditionally Secure Commitments with Quantum Auxiliary Inputs

Tomoyuki Morimae,Barak Nehoran,Takashi Yamakawa

from arxiv, 42 pages

We show the following unconditional results on quantum commitments in two related yet different models: 1. We revisit the notion of quantum auxiliary-input commitments introduced by Chailloux, Kerenidis, and Rosgen (Comput. Complex. 2016) where both the committer and receiver take the same quantum state, which is determined by the security parameter, as quantum auxiliary inputs. We show that computationally-hiding and statistically-binding quantum auxiliary-input commitments exist unconditionally, i.e., without relying on any unproven assumption, while Chailloux et al. assumed a complexity-theoretic assumption, ${\bf QIP}\not\subseteq{\bf QMA}$. On the other hand, we observe that achieving both statistical hiding and statistical binding at the same time is impossible even in the quantum auxiliary-input setting. To the best of our knowledge, this is the first example of unconditionally proving computational security of any form of (classical or quantum) commitments for which statistical security is impossible. As intermediate steps toward our construction, we introduce and unconditionally construct post-quantum sparse pseudorandom distributions and quantum auxiliary-input EFI pairs which may be of independent interest. 2. We introduce a new model which we call the common reference quantum state (CRQS) model where both the committer and receiver take the same quantum state that is randomly sampled by an efficient setup algorithm. We unconditionally prove that there exist statistically hiding and statistically binding commitments in the CRQS model, circumventing the impossibility in the plain model. We also discuss their applications to zero-knowledge proofs, oblivious transfers, and multi-party computations.

穩健性 · MoDELS · Performer · state-of-the-art · ML ·

2023 年 11 月 30 日

Towards Calibrated Robust Fine-Tuning of Vision-Language Models

Changdae Oh,Mijoo Kim,Hyesu Lim,Junhyeok Park,Euiseog Jeong,Zhi-Qi Cheng,Kyungwoo Song

from arxiv, NeurIPS 2023 Workshop on Distribution Shifts (DistShift), camera-ready

While fine-tuning unlocks the potential of a pre-trained model for a specific task, it compromises the model's ability to generalize to out-of-distribution (OOD) datasets. To mitigate this, robust fine-tuning aims to ensure performance on OOD datasets as well as on an in-distribution (ID) dataset for which the model is being tuned. However, another criterion for reliable machine learning (ML), confidence calibration, has been overlooked despite its increasing demand for real-world high-stakes ML applications (e.g., autonomous driving and medical diagnosis). For the first time, we raise concerns about the calibration of fine-tuned vision-language models (VLMs) under distribution shift by showing that naive fine-tuning and even state-of-the-art robust fine-tuning methods hurt the calibration of pre-trained VLMs, especially on OOD datasets. To address this issue, we provide a simple approach, called calibrated robust fine-tuning (CaRot), that incentivizes calibration and robustness on both ID and OOD datasets. Empirical results on ImageNet-1K distribution shift evaluation verify the effectiveness of our method.

估計/估計量 · 可約的 · Graph Transformer · MoDELS · state-of-the-art ·

2023 年 11 月 29 日

Pose Anything: A Graph-Based Approach for Category-Agnostic Pose Estimation

Or Hirschorn,Shai Avidan

Traditional 2D pose estimation models are limited by their category-specific design, making them suitable only for predefined object categories. This restriction becomes particularly challenging when dealing with novel objects due to the lack of relevant training data. To address this limitation, category-agnostic pose estimation (CAPE) was introduced. CAPE aims to enable keypoint localization for arbitrary object categories using a single model, requiring minimal support images with annotated keypoints. This approach not only enables object pose generation based on arbitrary keypoint definitions but also significantly reduces the associated costs, paving the way for versatile and adaptable pose estimation applications. We present a novel approach to CAPE that leverages the inherent geometrical relations between keypoints through a newly designed Graph Transformer Decoder. By capturing and incorporating this crucial structural information, our method enhances the accuracy of keypoint localization, marking a significant departure from conventional CAPE techniques that treat keypoints as isolated entities. We validate our approach on the MP-100 benchmark, a comprehensive dataset comprising over 20,000 images spanning more than 100 categories. Our method outperforms the prior state-of-the-art by substantial margins, achieving remarkable improvements of 2.16% and 1.82% under 1-shot and 5-shot settings, respectively. Furthermore, our method's end-to-end training demonstrates both scalability and efficiency compared to previous CAPE approaches.

MoDELS · Learning · 可理解性 · 可辨認的 · state-of-the-art ·

2023 年 11 月 29 日

One-Shot Open Affordance Learning with Foundation Models

Gen Li,Deqing Sun,Laura Sevilla-Lara,Varun Jampani

We introduce One-shot Open Affordance Learning (OOAL), where a model is trained with just one example per base object category, but is expected to identify novel objects and affordances. While vision-language models excel at recognizing novel objects and scenes, they often struggle to understand finer levels of granularity such as affordances. To handle this issue, we conduct a comprehensive analysis of existing foundation models, to explore their inherent understanding of affordances and assess the potential for data-limited affordance learning. We then propose a vision-language framework with simple and effective designs that boost the alignment between visual features and affordance text embeddings. Experiments on two affordance segmentation benchmarks show that the proposed method outperforms state-of-the-art models with less than 1% of the full training data, and exhibits reasonable generalization capability on unseen objects and affordances.

可約的 · Networking · 變換 · MoDELS · Backbone ·

2023 年 11 月 29 日

Learning Stackable and Skippable LEGO Bricks for Efficient, Reconfigurable, and Variable-Resolution Diffusion Modeling

Huangjie Zheng,Zhendong Wang,Jianbo Yuan,Guanghan Ning,Pengcheng He,Quanzeng You,Hongxia Yang,Mingyuan Zhou

Diffusion models excel at generating photo-realistic images but come with significant computational costs in both training and sampling. While various techniques address these computational challenges, a less-explored issue is designing an efficient and adaptable network backbone for iterative refinement. Current options like U-Net and Vision Transformer often rely on resource-intensive deep networks and lack the flexibility needed for generating images at variable resolutions or with a smaller network than used in training. This study introduces LEGO bricks, which seamlessly integrate Local-feature Enrichment and Global-content Orchestration. These bricks can be stacked to create a test-time reconfigurable diffusion backbone, allowing selective skipping of bricks to reduce sampling costs and generate higher-resolution images than the training data. LEGO bricks enrich local regions with an MLP and transform them using a Transformer block while maintaining a consistent full-resolution image across all bricks. Experimental results demonstrate that LEGO bricks enhance training efficiency, expedite convergence, and facilitate variable-resolution image generation while maintaining strong generative performance. Moreover, LEGO significantly reduces sampling time compared to other methods, establishing it as a valuable enhancement for diffusion models.

AI · Extensibility · INTERACT · 道德化 · Notability ·

2023 年 11 月 29 日

Social AI Improves Well-Being Among Female Young Adults

Ebony Zhang,Xiaoding Lu

The rise of language models like ChatGPT has introduced Social AI as a new form of entertainment, particularly among young adults who engage with AI-powered agents. This paper investigates the effects of these interactions on users' social and mental well-being, a subject that has incited extensive debate among both the public and scholars. Our study involved a survey of 5,260 users of Chai, a Social AI Platform. The findings indicate significant benefits, with notable variations across demographics. Female users, in particular, reported the most substantial improvements: 43.4% strongly agreed that Social AI positively impacted their mental health, exceeding male users by 10.5%. In managing social anxieties, 38.9% of females strongly agreed on a positive impact, compared to 30.0% for males and 27.1% for other genders. Historically, new media and technology have often been met with groundless moral panic, with societal figures raising concerns without substantial evidence of harm. Our research indicates the importance of approaching such claims with caution and emphasizes the necessity of an evidence-based perspective in discussions about the behavioral effects of emerging technologies.

Pyramid · MoDELS · Extensibility · state-of-the-art · Performer ·

2022 年 12 月 1 日

Frido: Feature Pyramid Diffusion for Complex Scene Image Synthesis

Wan-Cyuan Fan,Yen-Chun Chen,Dongdong Chen,Yu Cheng,Lu Yuan,Yu-Chiang Frank Wang

from arxiv, AAAI 2023

Diffusion models (DMs) have shown great potential for high-quality image synthesis. However, when it comes to producing images with complex scenes, how to properly describe both image global structures and object details remains a challenging task. In this paper, we present Frido, a Feature Pyramid Diffusion model performing a multi-scale coarse-to-fine denoising process for image synthesis. Our model decomposes an input image into scale-dependent vector quantized features, followed by a coarse-to-fine gating for producing image output. During the above multi-scale representation learning stage, additional input conditions like text, scene graph, or image layout can be further exploited. Thus, Frido can be also applied for conditional or cross-modality image synthesis. We conduct extensive experiments over various unconditioned and conditional image generation tasks, ranging from text-to-image synthesis, layout-to-image, scene-graph-to-image, to label-to-image. More specifically, we achieved state-of-the-art FID scores on five benchmarks, namely layout-to-image on COCO and OpenImages, scene-graph-to-image on COCO and Visual Genome, and label-to-image on COCO. Code is available at //github.com/davidhalladay/Frido.

生成模型 · MoDELS · 表示學習 · 學成 · 可辨認的 ·

2021 年 6 月 9 日

Generative Models as a Data Source for Multiview Representation Learning

Ali Jahanian,Xavier Puig,Yonglong Tian,Phillip Isola

Generative models are now capable of producing highly realistic images that look nearly indistinguishable from the data on which they are trained. This raises the question: if we have good enough generative models, do we still need datasets? We investigate this question in the setting of learning general-purpose visual representations from a black-box generative model rather than directly from data. Given an off-the-shelf image generator without any access to its training data, we train representations from the samples output by this generator. We compare several representation learning methods that can be applied to this setting, using the latent space of the generator to generate multiple "views" of the same semantic content. We show that for contrastive methods, this multiview data can naturally be used to identify positive pairs (nearby in latent space) and negative pairs (far apart in latent space). We find that the resulting representations rival those learned directly from real data, but that good performance requires care in the sampling strategy applied and the training method. Generative models can be viewed as a compressed and organized copy of a dataset, and we envision a future where more and more "model zoos" proliferate while datasets become increasingly unwieldy, missing, or private. This paper suggests several techniques for dealing with visual representation learning in such a future. Code is released on our project page: //ali-design.github.io/GenRep/