
We present FashionComposer for compositional fashion image generation. Unlike previous methods, FashionComposer is highly flexible. It takes multi-modal input (i.e., text prompt, parametric human model, garment image, and face image) and supports personalizing the appearance, pose, and figure of the human and assigning multiple garments in one pass. To achieve this, we first develop a universal framework capable of handling diverse input modalities. We construct scaled training data to enhance the model's robust compositional capabilities. To accommodate multiple reference images (garments and faces) seamlessly, we organize these references in a single image as an "asset library" and employ a reference UNet to extract appearance features. To inject the appearance features into the correct pixels in the generated result, we propose subject-binding attention. It binds the appearance features from different "assets" with the corresponding text features. In this way, the model can understand each asset according to its semantics, supporting arbitrary numbers and types of reference images. As a comprehensive solution, FashionComposer also supports many other applications such as human album generation and diverse virtual try-on tasks.
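
To make the subject-binding idea concrete, below is a minimal PyTorch sketch of cross-attention in which each reference asset's appearance tokens are concatenated behind the text embedding that describes it, so attending to a garment's words also attends to that garment's appearance features. The module name, tensor layout, and the use of one pooled text token per asset are illustrative assumptions, not the paper's released implementation.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class SubjectBindingAttention(nn.Module):
    """Toy cross-attention: each asset's appearance tokens are placed
    directly behind the text token describing that asset, so the two
    are "bound" in the key/value sequence. Illustrative only."""

    def __init__(self, dim: int, heads: int = 8):
        super().__init__()
        assert dim % heads == 0
        self.heads = heads
        self.to_q = nn.Linear(dim, dim, bias=False)
        self.to_k = nn.Linear(dim, dim, bias=False)
        self.to_v = nn.Linear(dim, dim, bias=False)
        self.proj = nn.Linear(dim, dim)

    def forward(self, latents, text_tokens, asset_tokens):
        # latents:      (B, N, C)    image latent tokens being denoised
        # text_tokens:  (B, K, C)    one pooled text embedding per asset
        # asset_tokens: (B, K, M, C) appearance tokens per asset
        B, K, M, C = asset_tokens.shape
        # Bind: interleave each asset's text token with its appearance tokens.
        bound = torch.cat([text_tokens.unsqueeze(2), asset_tokens], dim=2)  # (B, K, 1+M, C)
        context = bound.reshape(B, K * (1 + M), C)

        def split(t):  # split into attention heads
            return t.reshape(B, -1, self.heads, C // self.heads).transpose(1, 2)

        q, k, v = split(self.to_q(latents)), split(self.to_k(context)), split(self.to_v(context))
        out = F.scaled_dot_product_attention(q, k, v)        # (B, H, N, C/H)
        out = out.transpose(1, 2).reshape(B, -1, C)
        return self.proj(out)

# toy usage
attn = SubjectBindingAttention(dim=64, heads=8)
out = attn(torch.randn(2, 256, 64), torch.randn(2, 3, 64), torch.randn(2, 3, 16, 64))
print(out.shape)  # torch.Size([2, 256, 64])
```

In the paper, the appearance tokens would come from the reference UNet run over the single "asset library" image.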

Related content

The ACM SIGACCESS Conference on Computers and Accessibility (ASSETS) is the premier forum for research on the design, evaluation, use, and education of computing for people with disabilities and older adults. Submissions of original, high-quality work on topics in computing and accessibility are welcome. This year, for the first time, ASSETS has expanded its scope to include original, high-quality research on topics related to accessible computing education.
December 27, 2024
 DeepSeek-AI,Aixin Liu,Bei Feng,Bing Xue,Bingxuan Wang,Bochao Wu,Chengda Lu,Chenggang Zhao,Chengqi Deng,Chenyu Zhang,Chong Ruan,Damai Dai,Daya Guo,Dejian Yang,Deli Chen,Dongjie Ji,Erhang Li,Fangyun Lin,Fucong Dai,Fuli Luo,Guangbo Hao,Guanting Chen,Guowei Li,H. Zhang,Han Bao,Hanwei Xu,Haocheng Wang,Haowei Zhang,Honghui Ding,Huajian Xin,Huazuo Gao,Hui Li,Hui Qu,J. L. Cai,Jian Liang,Jianzhong Guo,Jiaqi Ni,Jiashi Li,Jiawei Wang,Jin Chen,Jingchang Chen,Jingyang Yuan,Junjie Qiu,Junlong Li,Junxiao Song,Kai Dong,Kai Hu,Kaige Gao,Kang Guan,Kexin Huang,Kuai Yu,Lean Wang,Lecong Zhang,Lei Xu,Leyi Xia,Liang Zhao,Litong Wang,Liyue Zhang,Meng Li,Miaojun Wang,Mingchuan Zhang,Minghua Zhang,Minghui Tang,Mingming Li,Ning Tian,Panpan Huang,Peiyi Wang,Peng Zhang,Qiancheng Wang,Qihao Zhu,Qinyu Chen,Qiushi Du,R. J. Chen,R. L. Jin,Ruiqi Ge,Ruisong Zhang,Ruizhe Pan,Runji Wang,Runxin Xu,Ruoyu Zhang,Ruyi Chen,S. S. Li,Shanghao Lu,Shangyan Zhou,Shanhuang Chen,Shaoqing Wu,Shengfeng Ye,Shengfeng Ye,Shirong Ma,Shiyu Wang,Shuang Zhou,Shuiping Yu,Shunfeng Zhou,Shuting Pan,T. Wang,Tao Yun,Tian Pei,Tianyu Sun,W. L. Xiao,Wangding Zeng,Wanjia Zhao,Wei An,Wen Liu,Wenfeng Liang,Wenjun Gao,Wenqin Yu,Wentao Zhang,X. Q. Li,Xiangyue Jin,Xianzu Wang,Xiao Bi,Xiaodong Liu,Xiaohan Wang,Xiaojin Shen,Xiaokang Chen,Xiaokang Zhang,Xiaosha Chen,Xiaotao Nie,Xiaowen Sun,Xiaoxiang Wang,Xin Cheng,Xin Liu,Xin Xie,Xingchao Liu,Xingkai Yu,Xinnan Song,Xinxia Shan,Xinyi Zhou,Xinyu Yang,Xinyuan Li,Xuecheng Su,Xuheng Lin,Y. K. Li,Y. Q. Wang,Y. X. Wei,Y. X. Zhu,Yang Zhang,Yanhong Xu,Yanhong Xu,Yanping Huang,Yao Li,Yao Zhao,Yaofeng Sun,Yaohui Li,Yaohui Wang,Yi Yu,Yi Zheng,Yichao Zhang,Yifan Shi,Yiliang Xiong,Ying He,Ying Tang,Yishi Piao,Yisong Wang,Yixuan Tan,Yiyang Ma,Yiyuan Liu,Yongqiang Guo,Yu Wu,Yuan Ou,Yuchen Zhu,Yuduan Wang,Yue Gong,Yuheng Zou,Yujia He,Yukun Zha,Yunfan Xiong,Yunxian Ma,Yuting Yan,Yuxiang Luo,Yuxiang You,Yuxuan Liu,Yuyang Zhou,Z. F. Wu,Z. Z. Ren,Zehui Ren,Zhangli Sha,Zhe Fu,Zhean Xu,Zhen Huang,Zhen Zhang,Zhenda Xie,Zhengyan Zhang,Zhewen Hao,Zhibin Gou,Zhicheng Ma,Zhigang Yan,Zhihong Shao,Zhipeng Xu,Zhiyu Wu,Zhongyu Zhang,Zhuoshu Li,Zihui Gu,Zijia Zhu,Zijun Liu,Zilin Li,Ziwei Xie,Ziyang Song,Ziyi Gao,Zizheng Pan

We present DeepSeek-V3, a strong Mixture-of-Experts (MoE) language model with 671B total parameters, of which 37B are activated for each token. To achieve efficient inference and cost-effective training, DeepSeek-V3 adopts Multi-head Latent Attention (MLA) and DeepSeekMoE architectures, which were thoroughly validated in DeepSeek-V2. Furthermore, DeepSeek-V3 pioneers an auxiliary-loss-free strategy for load balancing and sets a multi-token prediction training objective for stronger performance. We pre-train DeepSeek-V3 on 14.8 trillion diverse and high-quality tokens, followed by Supervised Fine-Tuning and Reinforcement Learning stages to fully harness its capabilities. Comprehensive evaluations reveal that DeepSeek-V3 outperforms other open-source models and achieves performance comparable to leading closed-source models. Despite its excellent performance, DeepSeek-V3 requires only 2.788M H800 GPU hours for its full training. In addition, its training process is remarkably stable. Throughout the entire training process, we did not experience any irrecoverable loss spikes or perform any rollbacks. The model checkpoints are available at https://github.com/deepseek-ai/DeepSeek-V3.
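
For intuition, the sketch below shows one way an auxiliary-loss-free balancing scheme can work: a per-expert bias steers top-k expert selection but not the gating weights, and is nudged after each step according to observed load. The update rule, hyper-parameters, and normalization are assumptions for illustration rather than DeepSeek-V3's exact recipe.

```python
import torch

def route_with_bias(scores, bias, k):
    """Pick top-k experts using bias-adjusted scores, but keep the original
    (unbiased) scores as gating weights. Sketch of an auxiliary-loss-free
    balancing scheme; details are assumptions."""
    # scores: (tokens, experts) nonnegative affinities (e.g. sigmoid outputs); bias: (experts,)
    topk = torch.topk(scores + bias, k, dim=-1).indices            # selection uses the bias
    gates = torch.zeros_like(scores).scatter(1, topk, scores.gather(1, topk))
    gates = gates / gates.sum(dim=-1, keepdim=True).clamp_min(1e-9)
    return topk, gates

def update_bias(bias, topk, num_experts, gamma=1e-3):
    """Nudge biases toward balanced expert load after each step."""
    load = torch.bincount(topk.flatten(), minlength=num_experts).float()
    # overloaded experts get a lower bias, underloaded ones a higher bias
    return bias - gamma * torch.sign(load - load.mean())

# toy usage: 16 tokens, 8 experts, top-2 routing
scores = torch.sigmoid(torch.randn(16, 8))
bias = torch.zeros(8)
topk, gates = route_with_bias(scores, bias, k=2)
bias = update_bias(bias, topk, num_experts=8)
```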

Data plays a fundamental role in the training of Large Language Models (LLMs). Effective data management, particularly in the formulation of a well-suited training dataset, holds significance for enhancing model performance and improving training efficiency during pretraining and supervised fine-tuning phases. Despite the considerable importance of data management, the current research community still falls short in providing a systematic analysis of the rationale behind management strategy selection, its consequential effects, methodologies for evaluating curated datasets, and the ongoing pursuit of improved strategies. Consequently, the exploration of data management has attracted increasing attention in the research community. This survey provides a comprehensive overview of current research in data management within both the pretraining and supervised fine-tuning stages of LLMs, covering various noteworthy aspects of data management strategy design: data quantity, data quality, domain/task composition, etc. Looking toward the future, we extrapolate existing challenges and outline promising directions for development in this field. Therefore, this survey serves as a guiding resource for practitioners aspiring to construct powerful LLMs through effective data management practices. The collection of the latest papers is available at https://github.com/ZigeW/data_management_LLM.
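
As a toy illustration of the domain/task-composition lever the survey discusses, the snippet below samples a pretraining batch according to configurable domain mixture weights; the domains, documents, and weights are made up and not recommendations from the survey.

```python
import random

# Hypothetical pretraining mix: domain -> (documents, sampling weight).
corpus = {
    "web":  (["web doc 1", "web doc 2"], 0.60),
    "code": (["def f(): ..."],           0.25),
    "math": (["x^2 + 1 = 0"],            0.15),
}

def sample_batch(corpus, batch_size, seed=0):
    """Draw documents so expected domain proportions match the mixture weights."""
    rng = random.Random(seed)
    domains = list(corpus)
    weights = [corpus[d][1] for d in domains]
    batch = []
    for _ in range(batch_size):
        d = rng.choices(domains, weights=weights, k=1)[0]
        batch.append((d, rng.choice(corpus[d][0])))
    return batch

print(sample_batch(corpus, 4))
```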

The rapid advances in Vision Transformer (ViT) refresh the state-of-the-art performances in various vision tasks, overshadowing the conventional CNN-based models. This ignites a few recent striking-back research in the CNN world showing that pure CNN models can achieve as good performance as ViT models when carefully tuned. While encouraging, designing such high-performance CNN models is challenging, requiring non-trivial prior knowledge of network design. To this end, a novel framework termed Mathematical Architecture Design for Deep CNN (DeepMAD) is proposed to design high-performance CNN models in a principled way. In DeepMAD, a CNN network is modeled as an information processing system whose expressiveness and effectiveness can be analytically formulated by their structural parameters. Then a constrained mathematical programming (MP) problem is proposed to optimize these structural parameters. The MP problem can be easily solved by off-the-shelf MP solvers on CPUs with a small memory footprint. In addition, DeepMAD is a pure mathematical framework: no GPU or training data is required during network design. The superiority of DeepMAD is validated on multiple large-scale computer vision benchmark datasets. Notably on ImageNet-1k, only using conventional convolutional layers, DeepMAD achieves 0.7% and 1.5% higher top-1 accuracy than ConvNeXt and Swin on Tiny level, and 0.8% and 0.9% higher on Small level.
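
The snippet below is a toy constrained program in the spirit of DeepMAD, not the paper's actual entropy/effectiveness formulas: it maximizes a simple log-width "expressiveness" proxy under an assumed parameter budget, solved on CPU with an off-the-shelf solver.

```python
import numpy as np
from scipy.optimize import minimize

depths = np.array([2, 2, 6, 2])   # fixed blocks per stage (assumed)
budget = 5e6                       # parameter budget (assumed)

def neg_expressiveness(w):
    # toy proxy: deeper stages with wider channels score higher
    return -np.sum(depths * np.log(w))

def param_count(w):
    # rough 3x3-conv cost per stage: depth * 9 * width^2
    return np.sum(depths * 9 * w ** 2)

res = minimize(
    neg_expressiveness,
    x0=np.full(4, 64.0),
    method="SLSQP",
    bounds=[(16, 1024)] * 4,
    constraints=[{"type": "ineq", "fun": lambda w: budget - param_count(w)}],
)
print("optimized per-stage widths:", np.round(res.x))
```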

Diffusion models (DMs) have shown great potential for high-quality image synthesis. However, when it comes to producing images with complex scenes, how to properly describe both image global structures and object details remains a challenging task. In this paper, we present Frido, a Feature Pyramid Diffusion model performing a multi-scale coarse-to-fine denoising process for image synthesis. Our model decomposes an input image into scale-dependent vector quantized features, followed by a coarse-to-fine gating for producing image output. During the above multi-scale representation learning stage, additional input conditions like text, scene graph, or image layout can be further exploited. Thus, Frido can also be applied to conditional or cross-modality image synthesis. We conduct extensive experiments over various unconditioned and conditional image generation tasks, ranging from text-to-image synthesis, layout-to-image, scene-graph-to-image, to label-to-image. More specifically, we achieve state-of-the-art FID scores on five benchmarks, namely layout-to-image on COCO and OpenImages, scene-graph-to-image on COCO and Visual Genome, and label-to-image on COCO. Code is available at https://github.com/davidhalladay/Frido.
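
The following sketch shows the coarse-to-fine control flow in miniature: each finer scale is denoised conditioned on the upsampled output of the coarser one. The dummy denoiser and tensor shapes are placeholders; Frido's scale-dependent VQ features and learned gating are omitted.

```python
import torch
import torch.nn.functional as F

def toy_denoise(z, cond, steps=4):
    """Stand-in for one scale's diffusion denoiser: here just a few
    smoothing steps pulled toward the (upsampled) coarser condition."""
    for _ in range(steps):
        z = 0.9 * z + 0.1 * cond
    return z

def coarse_to_fine_sample(shapes, device="cpu"):
    """Illustrative coarse-to-fine loop over latent resolutions."""
    out = None
    for h, w in shapes:                                   # coarse -> fine
        z = torch.randn(1, 4, h, w, device=device)
        cond = torch.zeros_like(z) if out is None else F.interpolate(
            out, size=(h, w), mode="bilinear", align_corners=False)
        out = toy_denoise(z, cond)
    return out

sample = coarse_to_fine_sample([(8, 8), (16, 16), (32, 32)])
print(sample.shape)   # torch.Size([1, 4, 32, 32])
```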

Diffusion models have shown incredible capabilities as generative models; indeed, they power the current state-of-the-art models on text-conditioned image generation such as Imagen and DALL-E 2. In this work we review, demystify, and unify the understanding of diffusion models across both variational and score-based perspectives. We first derive Variational Diffusion Models (VDM) as a special case of a Markovian Hierarchical Variational Autoencoder, where three key assumptions enable tractable computation and scalable optimization of the ELBO. We then prove that optimizing a VDM boils down to learning a neural network to predict one of three potential objectives: the original source input from any arbitrary noisification of it, the original source noise from any arbitrarily noisified input, or the score function of a noisified input at any arbitrary noise level. We then dive deeper into what it means to learn the score function, and connect the variational perspective of a diffusion model explicitly with the Score-based Generative Modeling perspective through Tweedie's Formula. Lastly, we cover how to learn a conditional distribution using diffusion models via guidance.
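
As a pointer to the connection the paper develops, the standard DDPM-style identities below relate the three prediction targets via Tweedie's formula (using the usual alpha-bar notation, which may differ from the paper's):

```latex
% Forward marginal: q(x_t \mid x_0) = \mathcal{N}\big(x_t;\ \sqrt{\bar\alpha_t}\,x_0,\ (1-\bar\alpha_t)\mathbf{I}\big)
% Tweedie's formula: \mathbb{E}[\sqrt{\bar\alpha_t}\,x_0 \mid x_t] = x_t + (1-\bar\alpha_t)\,\nabla_{x_t}\log p(x_t)
\[
  x_0 \;=\; \frac{x_t + (1-\bar\alpha_t)\,\nabla_{x_t}\log p(x_t)}{\sqrt{\bar\alpha_t}},
  \qquad
  \nabla_{x_t}\log p(x_t) \;=\; -\frac{\epsilon}{\sqrt{1-\bar\alpha_t}}.
\]
```

So a network may equivalently be trained to predict the clean input, the added noise, or the score, which is the equivalence the paper spells out.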

We present CoDEx, a set of knowledge graph completion datasets extracted from Wikidata and Wikipedia that improve upon existing knowledge graph completion benchmarks in scope and level of difficulty. In terms of scope, CoDEx comprises three knowledge graphs varying in size and structure, multilingual descriptions of entities and relations, and tens of thousands of hard negative triples that are plausible but verified to be false. To characterize CoDEx, we contribute thorough empirical analyses and benchmarking experiments. First, we analyze each CoDEx dataset in terms of logical relation patterns. Next, we report baseline link prediction and triple classification results on CoDEx for five extensively tuned embedding models. Finally, we differentiate CoDEx from the popular FB15K-237 knowledge graph completion dataset by showing that CoDEx covers more diverse and interpretable content, and is a more difficult link prediction benchmark. Data, code, and pretrained models are available at https://bit.ly/2EPbrJs.
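
For concreteness, here is a small NumPy sketch of the filtered ranking metrics (MRR and Hits@k) typically reported for link prediction on benchmarks like CoDEx; the toy data and shape conventions are assumptions.

```python
import numpy as np

def filtered_ranks(scores, true_idx, known_mask):
    """Rank of the true entity among all candidates, after filtering out
    other known-true triples (the standard 'filtered' protocol).
    scores: (num_queries, num_entities); known_mask flags competing true answers."""
    s = scores.copy()
    s[known_mask] = -np.inf
    true_scores = scores[np.arange(len(scores)), true_idx]
    return 1 + (s > true_scores[:, None]).sum(axis=1)

def mrr_hits(ranks, k=10):
    ranks = np.asarray(ranks, dtype=float)
    return (1.0 / ranks).mean(), (ranks <= k).mean()

# toy usage: random scores for 5 queries over 50 candidate entities
rng = np.random.default_rng(0)
scores = rng.normal(size=(5, 50))
true_idx = rng.integers(0, 50, size=5)
mask = np.zeros_like(scores, dtype=bool)   # no competing true triples here
print(mrr_hits(filtered_ranks(scores, true_idx, mask)))
```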

We present CURL: Contrastive Unsupervised Representations for Reinforcement Learning. CURL extracts high-level features from raw pixels using contrastive learning and performs off-policy control on top of the extracted features. CURL outperforms prior pixel-based methods, both model-based and model-free, on complex tasks in the DeepMind Control Suite and Atari Games showing 1.9x and 1.6x performance gains at the 100K environment and interaction steps benchmarks respectively. On the DeepMind Control Suite, CURL is the first image-based algorithm to nearly match the sample-efficiency and performance of methods that use state-based features.
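
Below is a minimal sketch of the kind of contrastive objective CURL builds on: an InfoNCE loss with a bilinear critic, where each query's positive key comes from a momentum encoder and the other keys in the batch serve as negatives. Encoder details, augmentations, and hyper-parameters are omitted, and the exact layout here is an assumption.

```python
import torch
import torch.nn.functional as F

def curl_infonce(q, k, W):
    """InfoNCE loss with a bilinear critic. q, k: (B, D) query/key features;
    W: (D, D) learned bilinear weight. Keys are treated as momentum-encoder
    outputs, so gradients do not flow through them."""
    k = k.detach()
    logits = q @ W @ k.t()                                      # (B, B) similarities
    logits = logits - logits.max(dim=1, keepdim=True).values    # numerical stability
    labels = torch.arange(q.size(0), device=q.device)           # positives on the diagonal
    return F.cross_entropy(logits, labels)

# toy usage
B, D = 8, 50
q, k = torch.randn(B, D), torch.randn(B, D)
W = torch.randn(D, D, requires_grad=True)
loss = curl_infonce(q, k, W)
loss.backward()
```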

We present a new method to learn video representations from large-scale unlabeled video data. Ideally, this representation will be generic and transferable, directly usable for new tasks such as action recognition and zero or few-shot learning. We formulate unsupervised representation learning as a multi-modal, multi-task learning problem, where the representations are shared across different modalities via distillation. Further, we introduce the concept of loss function evolution by using an evolutionary search algorithm to automatically find optimal combination of loss functions capturing many (self-supervised) tasks and modalities. Thirdly, we propose an unsupervised representation evaluation metric using distribution matching to a large unlabeled dataset as a prior constraint, based on Zipf's law. This unsupervised constraint, which is not guided by any labeling, produces similar results to weakly-supervised, task-specific ones. The proposed unsupervised representation learning results in a single RGB network and outperforms previous methods. Notably, it is also more effective than several label-based methods (e.g., ImageNet), with the exception of large, fully labeled video datasets.
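
To illustrate the loss-evolution idea in miniature, the sketch below runs a tiny evolutionary search over weights for combining several self-supervised losses; the fitness callback stands in for the paper's unsupervised (e.g. Zipf-based distribution-matching) proxy and is purely a placeholder.

```python
import random

def evolve_loss_weights(loss_names, fitness, generations=20, pop=8, seed=0):
    """Tiny evolutionary search over loss-combination weights:
    keep the fitter half, mutate it to refill the population."""
    rng = random.Random(seed)
    population = [[rng.random() for _ in loss_names] for _ in range(pop)]
    for _ in range(generations):
        parents = sorted(population, key=fitness, reverse=True)[: pop // 2]
        children = [
            [max(0.0, w + rng.gauss(0, 0.1)) for w in rng.choice(parents)]
            for _ in range(pop - len(parents))
        ]
        population = parents + children
    return max(population, key=fitness)

# placeholder fitness: prefer weights summing close to 1 (illustrative only)
best = evolve_loss_weights(
    ["rgb", "flow", "audio"],
    fitness=lambda w: -abs(sum(w) - 1.0),
)
print(best)
```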

Deep Learning has enabled remarkable progress over the last years on a variety of tasks, such as image recognition, speech recognition, and machine translation. One crucial aspect for this progress are novel neural architectures. Currently employed architectures have mostly been developed manually by human experts, which is a time-consuming and error-prone process. Because of this, there is growing interest in automated neural architecture search methods. We provide an overview of existing work in this field of research and categorize them according to three dimensions: search space, search strategy, and performance estimation strategy.
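
The three dimensions show up even in the simplest NAS setup; the sketch below uses plain random search over a toy search space with a stub performance estimator, one of many strategies the survey categorizes.

```python
import random

# The survey's three dimensions, in miniature:
#   search space          -> the dict of candidate choices below
#   search strategy       -> plain random search
#   performance estimation -> a cheap proxy callback (stubbed here)
SEARCH_SPACE = {
    "depth": [8, 14, 20],
    "width": [32, 64, 128],
    "op":    ["conv3x3", "conv5x5", "sep_conv"],
}

def random_search(estimate, trials=20, seed=0):
    rng = random.Random(seed)
    best, best_score = None, float("-inf")
    for _ in range(trials):
        arch = {k: rng.choice(v) for k, v in SEARCH_SPACE.items()}
        score = estimate(arch)        # e.g. few-epoch training or a zero-cost proxy
        if score > best_score:
            best, best_score = arch, score
    return best, best_score

# stub estimator: prefer deeper/wider candidates (illustrative only)
print(random_search(lambda a: a["depth"] + a["width"]))
```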

We present Generative Adversarial Capsule Network (CapsuleGAN), a framework that uses capsule networks (CapsNets) instead of the standard convolutional neural networks (CNNs) as discriminators within the generative adversarial network (GAN) setting, while modeling image data. We provide guidelines for designing CapsNet discriminators and the updated GAN objective function, which incorporates the CapsNet margin loss, for training CapsuleGAN models. We show that CapsuleGAN outperforms convolutional-GAN at modeling image data distribution on the MNIST dataset of handwritten digits, evaluated on the generative adversarial metric and at semi-supervised image classification.
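
For reference, here is a sketch of the CapsNet margin loss (Sabour et al., 2017) of the kind CapsuleGAN folds into its discriminator objective; how it is combined with the GAN loss is not shown, and the toy shapes are assumptions.

```python
import torch
import torch.nn.functional as F

def capsule_margin_loss(v, targets, m_pos=0.9, m_neg=0.1, lam=0.5):
    """CapsNet margin loss. v: (B, num_classes, caps_dim) output capsule
    vectors; targets: one-hot (B, num_classes). Present classes are pushed
    to have long capsules, absent classes short ones."""
    lengths = v.norm(dim=-1)                                   # ||v_k||
    pos = targets * F.relu(m_pos - lengths) ** 2
    neg = lam * (1.0 - targets) * F.relu(lengths - m_neg) ** 2
    return (pos + neg).sum(dim=1).mean()

# toy usage: 4 samples, 2 capsule classes, 16-dim capsules
v = torch.randn(4, 2, 16)
targets = F.one_hot(torch.randint(0, 2, (4,)), num_classes=2).float()
print(capsule_margin_loss(v, targets))
```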
