
Model merging (e.g., via interpolation or task arithmetic) fuses multiple models trained on different tasks to generate a multi-task solution. The technique has been proven successful in previous studies, where the models are trained on similar tasks and with the same initialization. In this paper, we expand on this concept to a multimodal setup by merging transformers trained on different modalities. Furthermore, we pursue a novel goal: merging the vision, language, and cross-modal transformers of a modality-specific architecture to create a parameter-efficient modality-agnostic architecture. Through comprehensive experiments, we systematically investigate the key factors impacting model performance after merging, including initialization, merging mechanisms, and model architectures. Our analysis leads to an effective training recipe for matching the performance of the modality-agnostic baseline (i.e., pre-trained from scratch) via model merging. Our code is available at: //github.com/ylsung/vl-merging
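The two merging mechanisms named in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the toy "state dicts", the function names `interpolate` and `task_arithmetic`, and the coefficients `alpha` and `lam` are assumptions made for illustration, and the sketch assumes all models share identical parameter names and shapes.

```python
import numpy as np

def interpolate(theta_a, theta_b, alpha=0.5):
    """Linear interpolation of two parameter dicts with identical keys and shapes."""
    return {k: alpha * theta_a[k] + (1.0 - alpha) * theta_b[k] for k in theta_a}

def task_arithmetic(theta_init, finetuned, lam=1.0):
    """Add the scaled sum of task vectors (finetuned - init) to the shared initialization."""
    merged = {}
    for k, w0 in theta_init.items():
        task_vectors = [theta[k] - w0 for theta in finetuned]
        merged[k] = w0 + lam * sum(task_vectors)
    return merged

# Toy parameters standing in for a shared initialization and two
# modality-specific models trained from it (hypothetical values).
init = {"w": np.array([1.0, 1.0, 1.0])}
vision = {"w": np.array([1.0, 0.0, 2.0])}
language = {"w": np.array([0.0, 4.0, 2.0])}

merged_interp = interpolate(vision, language, alpha=0.5)
merged_arith = task_arithmetic(init, [vision, language], lam=1.0)
```

With two models and `lam=0.5`, task arithmetic reduces to plain averaging, so the two mechanisms differ only for other scaling coefficients or when more models are merged.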

Related content

Language modeling · MoDELS · Vision · Performance

June 13, 2023

MultiModal-GPT: A Vision and Language Model for Dialogue with Humans

Tao Gong, Chengqi Lyu, Shilong Zhang, Yudong Wang, Miao Zheng, Qian Zhao, Kuikun Liu, Wenwei Zhang, Ping Luo, Kai Chen

from arxiv, 10 pages, 8 figures

We present a vision and language model named MultiModal-GPT to conduct multi-round dialogue with humans. MultiModal-GPT can follow various instructions from humans, such as generating a detailed caption, counting the number of objects of interest, and answering general questions from users. MultiModal-GPT is parameter-efficiently fine-tuned from OpenFlamingo, with Low-Rank Adapters (LoRA) added to both the cross-attention and the self-attention parts of the language model. We first construct instruction templates with vision and language data for multi-modality instruction tuning to make the model understand and follow human instructions. We find that the quality of training data is vital for dialogue performance: even a small amount of data containing short answers can lead the model to give short responses to any instruction. To further enhance MultiModal-GPT's ability to chat with humans, we use language-only instruction-following data to train MultiModal-GPT jointly. The joint training of language-only and visual-language instructions with the same instruction template effectively improves dialogue performance. Various demos show MultiModal-GPT's ability to hold continuous dialogue with humans. Code, dataset, and demo are at //github.com/open-mmlab/Multimodal-GPT
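The low-rank adaptation mentioned in the abstract can be sketched generically as follows. This is a NumPy illustration of the standard LoRA update (a frozen weight plus a scaled low-rank product), not code from MultiModal-GPT or OpenFlamingo; the dimensions, `rank`, `alpha`, and the function name `lora_forward` are assumptions chosen for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)
d_in, d_out, rank, alpha = 16, 8, 2, 4.0   # hypothetical sizes and scaling

W = rng.normal(size=(d_out, d_in))          # frozen pretrained projection
A = rng.normal(size=(rank, d_in)) * 0.01    # trainable down-projection
B = np.zeros((d_out, rank))                 # trainable up-projection, zero-initialized

def lora_forward(x):
    """Frozen linear layer plus the scaled low-rank update (alpha / rank) * B @ A."""
    return x @ W.T + (alpha / rank) * (x @ A.T @ B.T)

x = rng.normal(size=(4, d_in))
y = lora_forward(x)

trainable = A.size + B.size   # parameters updated during fine-tuning
frozen = W.size               # parameters left untouched
```

Because `B` starts at zero, the adapted layer initially reproduces the frozen layer exactly, and only `A` and `B` (48 values here, versus 128 in `W`) would receive gradient updates.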

FT · Performer · Continuity · Data augmentation · Analysis

June 13, 2023

Rethink the Effectiveness of Text Data Augmentation: An Empirical Analysis

Zhengxiang Shi, Aldo Lipani

from arxiv, Accepted at ESANN 2023

In recent years, language models (LMs) have made remarkable progress in advancing the field of natural language processing (NLP). However, the impact of data augmentation (DA) techniques on the fine-tuning (FT) performance of these LMs has been a topic of ongoing debate. In this study, we evaluate the effectiveness of three different FT methods in conjunction with back-translation across an array of 7 diverse NLP tasks, including classification and regression types, covering single-sentence and sentence-pair tasks. Contrary to prior assumptions that DA does not contribute to the enhancement of LMs' FT performance, our findings reveal that continued pre-training on augmented data can effectively improve FT performance on the downstream tasks. In the most favourable case, continued pre-training improves the performance of FT by more than 10% in the few-shot learning setting. Our findings highlight the potential of DA as a powerful tool for bolstering LMs' performance.

Model selection · Estimation/Estimator · MoDELS · Analysis · Extensibility

June 13, 2023

Empirical Analysis of Model Selection for Heterogeneous Causal Effect Estimation

Divyat Mahajan, Ioannis Mitliagkas, Brady Neal, Vasilis Syrgkanis

from arxiv, Preprint. Under Review

We study the problem of model selection in causal inference, specifically for the case of conditional average treatment effect (CATE) estimation under binary treatments. Unlike model selection in machine learning, there is no perfect analogue of cross-validation, as we do not observe the counterfactual potential outcome for any data point. To address this, a variety of proxy metrics have been proposed in the literature that depend on auxiliary nuisance models estimated from the observed data (propensity score model, outcome regression model). However, the effectiveness of these metrics has only been studied on synthetic datasets, as we can access the counterfactual data for them. We conduct an extensive empirical analysis to judge the performance of these metrics introduced in the literature, and novel ones introduced in this work, where we utilize the latest advances in generative modeling to incorporate multiple realistic datasets. Our analysis suggests novel model selection strategies based on careful hyperparameter tuning of CATE estimators and causal ensembling.
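The difficulty described in the abstract can be made concrete with a small synthetic sketch. It fits a T-learner (one outcome regression per treatment arm, a standard CATE estimator that is not necessarily one studied in the paper) and scores it against the true effect, which is only possible here because the data are simulated. The data-generating process, the helper names `fit_linear` and `predict`, and the choice of linear models are all assumptions for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 2000
x = rng.normal(size=n)                     # single covariate
t = rng.integers(0, 2, size=n)             # randomized binary treatment
tau_true = 2.0 * x                         # true CATE, known only because data are simulated
y = x + t * tau_true + rng.normal(scale=0.1, size=n)

def fit_linear(x, y):
    """Least-squares fit of y = b0 + b1 * x; returns (b0, b1)."""
    X = np.column_stack([np.ones_like(x), x])
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)
    return coef

def predict(coef, x):
    return coef[0] + coef[1] * x

# T-learner: separate outcome regressions for treated and control units.
mu1 = fit_linear(x[t == 1], y[t == 1])
mu0 = fit_linear(x[t == 0], y[t == 0])
tau_hat = predict(mu1, x) - predict(mu0, x)

# Root mean squared error against the true effect: this oracle score
# cannot be computed on real data, where tau_true is never observed.
pehe = np.sqrt(np.mean((tau_hat - tau_true) ** 2))
```

On observational data the last line has no counterpart, which is why the proxy metrics discussed above rely on estimated nuisance models instead.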

MoDELS · Generative models · Automator · Extensibility · SimPLe

June 12, 2023

MovieFactory: Automatic Movie Creation from Text using Large Generative Models for Language and Images

Junchen Zhu, Huan Yang, Huiguo He, Wenjing Wang, Zixi Tuo, Wen-Huang Cheng, Lianli Gao, Jingkuan Song, Jianlong Fu

In this paper, we present MovieFactory, a powerful framework to generate cinematic-picture (3072×1280), film-style (multi-scene), and multi-modality (with sound) movies on demand from natural-language input. As the first fully automated movie generation model to the best of our knowledge, our approach empowers users to create captivating movies with smooth transitions using simple text inputs, surpassing existing methods that produce soundless videos limited to a single scene of modest quality. To facilitate this distinctive functionality, we leverage ChatGPT to expand user-provided text into detailed sequential scripts for movie generation. Then we bring scripts to life visually and acoustically through vision generation and audio retrieval. To generate videos, we extend the capabilities of a pretrained text-to-image diffusion model through a two-stage process. First, we employ spatial finetuning to bridge the gap between the pretrained image model and the new video dataset. Subsequently, we introduce temporal learning to capture object motion. In terms of audio, we leverage sophisticated retrieval models to select and align audio elements that correspond to the plot and visual content of the movie. Extensive experiments demonstrate that our MovieFactory produces movies with realistic visuals, diverse scenes, and seamlessly fitting audio, offering users a novel and immersive experience. Generated samples can be found on YouTube or Bilibili (1080P).

MoDELS · Less · Extensibility · Scaling · Natural language processing

June 11, 2023

Language Versatilists vs. Specialists: An Empirical Revisiting on Multilingual Transfer Ability

Jiacheng Ye, Xijia Tao, Lingpeng Kong

Multilingual transfer ability, which reflects how well models fine-tuned on one source language can be applied to other languages, has been well studied in multilingual pre-trained models (e.g., BLOOM). However, such ability has not been investigated for English-centric models (e.g., LLaMA). To fill this gap, we study the following research questions. First, does multilingual transfer ability exist in English-centric models, and how does it compare with multilingual pretrained models? Second, does it appear only when English is the source language for the English-centric model? Third, how does it vary across tasks? We take multilingual reasoning ability as our focus and conduct extensive experiments across four types of reasoning tasks. We find that the multilingual pretrained model does not always outperform an English-centric model. Furthermore, English appears to be a less suitable source language, and the choice of source language becomes less important when the English-centric model scales up. In addition, different types of tasks exhibit different multilingual transfer abilities. These findings demonstrate that English-centric models not only possess multilingual transfer ability but may even surpass the transferability of multilingual pretrained models if well-trained. By revealing these strengths and weaknesses, the experiments also provide valuable insights into enhancing multilingual reasoning abilities for English-centric models.

Quantum machine learning · Machine Learning · Learning · Unitary matrix · Taxonomy

June 10, 2023

An Empirical Study of Bugs in Quantum Machine Learning Frameworks

Pengzhan Zhao, Xiongfei Wu, Junjie Luo, Zhuo Li, Jianjun Zhao

from arxiv, This paper will appear in the proceedings of the 2023 IEEE International Conference on Quantum Software (QSW 2023), July 2-8, 2023

Quantum computing has emerged as a promising domain for the machine learning (ML) area, offering significant computational advantages over classical counterparts. With the growing interest in quantum machine learning (QML), ensuring the correctness and robustness of the software platforms used to develop QML programs is critical. A necessary step for ensuring the reliability of such platforms is to understand the bugs they typically suffer from. To address this need, this paper presents the first comprehensive study of bugs in QML frameworks. We inspect 391 real-world bugs collected from 22 open-source repositories of nine popular QML frameworks. Our main results are threefold: 1) we find that 28% of the bugs are quantum-specific, such as erroneous unitary matrix implementation, calling for dedicated approaches to find and prevent them; 2) we manually distill a taxonomy of five symptoms and nine root causes of bugs in QML platforms; 3) we summarize four critical challenges for QML framework developers. The study results provide researchers with insights into how to ensure QML framework quality and present several actionable suggestions for QML framework developers to improve their code quality.

Transform · Better · Vision · Attention · MoDELS

June 9, 2023

ViT-CX: Causal Explanation of Vision Transformers

Weiyan Xie, Xiao-Hui Li, Caleb Chen Cao, Nevin L. Zhang

from arxiv, IJCAI2023 Camera-ready

Despite the popularity of Vision Transformers (ViTs) and eXplainable AI (XAI), only a few explanation methods have been designed specifically for ViTs thus far. They mostly use attention weights of the [CLS] token on patch embeddings and often produce unsatisfactory saliency maps. This paper proposes a novel method for explaining ViTs called ViT-CX. It is based on patch embeddings, rather than attentions paid to them, and their causal impacts on the model output. Other characteristics of ViTs such as causal overdetermination are also considered in the design of ViT-CX. The empirical results show that ViT-CX produces more meaningful saliency maps and does a better job of revealing all important evidence for the predictions than previous methods. The explanations generated by ViT-CX also show significantly better faithfulness to the model. The code and appendix are available at //github.com/vaynexie/CausalX-ViT.

MoDELS · Model averaging · Functional · Linear · Extensibility

June 9, 2023

Model Averaging by Cross-validation for Partially Linear Functional Additive Models

Shishi Liu, Jingxiao Zhang

In this paper, we propose a model averaging approach for addressing model uncertainty in the context of partially linear functional additive models. These models are designed to describe the relation between a response and mixed types of predictors by incorporating both the parametric effect of scalar variables and the additive effect of a functional variable. The proposed model averaging scheme assigns weights to candidate models based on the minimization of a multi-fold cross-validation criterion. Furthermore, we establish the asymptotic optimality of the resulting estimator in terms of achieving the lowest possible squared prediction error loss under model misspecification. Extensive simulation studies and an application to a near-infrared spectra dataset are presented to support and illustrate our method.
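The weighting scheme described above, choosing model weights by minimizing a multi-fold cross-validation criterion, can be sketched in a much simpler setting. The example below swaps the paper's partially linear functional additive models for two ordinary polynomial regressions and replaces a general constrained optimizer with a grid search over a single weight; the data, the helper names `design` and `cv_predictions`, and the grid resolution are assumptions for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)
n, k_folds = 200, 5
x = rng.uniform(-1, 1, size=n)
y = 1.0 + 2.0 * x + 0.5 * x**2 + rng.normal(scale=0.3, size=n)

def design(x, degree):
    """Polynomial design matrix with columns 1, x, ..., x**degree."""
    return np.column_stack([x**p for p in range(degree + 1)])

def cv_predictions(x, y, degree, folds):
    """Out-of-fold predictions of a least-squares polynomial fit."""
    preds = np.empty_like(y)
    for test_idx in folds:
        train = np.ones(len(y), dtype=bool)
        train[test_idx] = False
        coef, *_ = np.linalg.lstsq(design(x[train], degree), y[train], rcond=None)
        preds[test_idx] = design(x[test_idx], degree) @ coef
    return preds

folds = np.array_split(rng.permutation(n), k_folds)
p1 = cv_predictions(x, y, 1, folds)   # candidate 1: linear
p2 = cv_predictions(x, y, 3, folds)   # candidate 2: cubic

# Weights (w, 1 - w) on the simplex; pick w minimizing the CV squared error.
grid = np.linspace(0.0, 1.0, 101)
cv_loss = np.array([np.mean((y - (w * p1 + (1 - w) * p2)) ** 2) for w in grid])
w_best = grid[np.argmin(cv_loss)]
```

Since the grid contains the endpoints 0 and 1, the averaged predictor's cross-validation loss is never worse than that of either candidate alone. With more than two candidates, the grid search would be replaced by a quadratic program over the weight simplex.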

MoDELS · DDPM · Generative models · Processing (programming language) · Vision

September 6, 2022

A Survey on Generative Diffusion Model

Hanqun Cao, Cheng Tan, Zhangyang Gao, Guangyong Chen, Pheng-Ann Heng, Stan Z. Li

Deep learning shows great potential in generation tasks thanks to deep latent representations. Generative models are classes of models that can generate observations randomly with respect to certain implied parameters. Recently, diffusion models have become a rising class of generative models by virtue of their powerful generative ability, and great achievements have been reached. Applications beyond computer vision, speech generation, bioinformatics, and natural language processing remain to be explored in this field. However, the diffusion model has the inherent drawback of a slow generation process, which has motivated many works on improving it. This survey summarizes the field of diffusion models. We first state the main problem with two landmark works, DDPM and DSM. Then, we present a diverse range of advanced techniques to speed up diffusion models: training schedule, training-free sampling, mixed-modeling, and score & diffusion unification. Regarding existing models, we also provide a benchmark of FID score, IS, and NLL according to specific NFE. Moreover, applications of diffusion models are introduced, including computer vision, sequence modeling, audio, and AI for science. Finally, we summarize the field together with its limitations and further directions.
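As background for the DDPM formulation that the survey takes as a starting point, the forward (noising) process admits the closed form x_t = sqrt(ᾱ_t) x_0 + sqrt(1 - ᾱ_t) ε with ε drawn from a standard normal and ᾱ_t the cumulative product of (1 - β_s). The sketch below assumes the linear β schedule from 1e-4 to 0.02 over T = 1000 steps used in the original DDPM paper; the function name `q_sample` and the toy input are assumptions for illustration.

```python
import numpy as np

T = 1000
betas = np.linspace(1e-4, 0.02, T)       # linear noise schedule (assumed, as in DDPM)
alpha_bar = np.cumprod(1.0 - betas)      # cumulative signal retention per step

def q_sample(x0, t, rng):
    """Draw x_t from q(x_t | x_0) in one shot; t is a 0-based step index."""
    noise = rng.normal(size=x0.shape)
    x_t = np.sqrt(alpha_bar[t]) * x0 + np.sqrt(1.0 - alpha_bar[t]) * noise
    return x_t, noise

rng = np.random.default_rng(0)
x0 = np.ones((4, 4))                     # toy "image"
x_early, _ = q_sample(x0, 0, rng)        # almost identical to x0
x_late, _ = q_sample(x0, T - 1, rng)     # almost pure Gaussian noise
```

Sampling reverses this chain one step at a time, so the number of network evaluations grows with T; this is the slow generation process that the acceleration techniques listed in the abstract aim to reduce.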

MoDELS · Generative models · Processing (programming language) · Taxonomy · Signal Processing

September 2, 2022

Diffusion Models: A Comprehensive Survey of Methods and Applications

Ling Yang, Zhilong Zhang, Shenda Hong

from arxiv, 23 pages

Diffusion models are a class of deep generative models that have shown impressive results on various tasks and rest on solid theoretical foundations. Although diffusion models have achieved better sample quality and diversity than other state-of-the-art models, they still suffer from costly sampling procedures and sub-optimal likelihood estimation. Recent studies have shown great enthusiasm for improving the performance of diffusion models. In this article, we present the first comprehensive review of existing variants of diffusion models. Specifically, we provide the first taxonomy of diffusion models and categorize their variants into three types, namely sampling-acceleration enhancement, likelihood-maximization enhancement, and data-generalization enhancement. We also introduce in detail five other generative models (i.e., variational autoencoders, generative adversarial networks, normalizing flows, autoregressive models, and energy-based models), and clarify the connections between diffusion models and these generative models. Then we make a thorough investigation into the applications of diffusion models, including computer vision, natural language processing, waveform signal processing, multi-modal modeling, molecular graph generation, time series modeling, and adversarial purification. Furthermore, we propose new perspectives pertaining to the development of this class of generative models.