
Instruction tuning is a prevalent strategy for aligning Multimodal Large Language Models (MLLMs) with human instructions and adapting them to new tasks. Nevertheless, MLLMs must keep up with users' evolving knowledge and demands, so how to retain existing skills while acquiring new knowledge remains an open question. In this paper, we present a comprehensive benchmark, Continual Instruction tuNing (CoIN), to assess existing MLLMs under the sequential instruction tuning paradigm. CoIN comprises 10 commonly used datasets spanning 8 task categories, ensuring a diverse range of instructions and tasks. The trained model is evaluated from two aspects, Instruction Following and General Knowledge, which measure alignment with human intention and the knowledge preserved for reasoning, respectively. Experiments on CoIN demonstrate that current powerful MLLMs still suffer from catastrophic forgetting, and that the failure lies mainly in intention alignment rather than in knowledge forgetting. To this end, we introduce MoELoRA into MLLMs, which is effective at retaining previous instruction alignment. Experimental results on CoIN consistently show that this method reduces forgetting.
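
As a rough illustration of the MoELoRA idea referenced above, the sketch below wraps a frozen linear projection with several LoRA experts selected by a learned router; the class name, expert count, and rank are illustrative assumptions rather than the CoIN reference implementation.

```python
# Minimal sketch of a MoELoRA-style layer (mixture of LoRA experts with a
# learned router). Names and hyperparameters are illustrative assumptions.
import torch
import torch.nn as nn


class MoELoRALinear(nn.Module):
    def __init__(self, base: nn.Linear, num_experts: int = 4, rank: int = 8, alpha: float = 16.0):
        super().__init__()
        self.base = base                      # frozen pretrained projection
        for p in self.base.parameters():
            p.requires_grad = False
        in_f, out_f = base.in_features, base.out_features
        self.lora_A = nn.Parameter(torch.randn(num_experts, rank, in_f) * 0.01)
        self.lora_B = nn.Parameter(torch.zeros(num_experts, out_f, rank))
        self.router = nn.Linear(in_f, num_experts)   # token-wise gating
        self.scale = alpha / rank

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (batch, seq, in_features)
        gates = torch.softmax(self.router(x), dim=-1)                # (b, s, E)
        delta = torch.einsum("bsi,eri->bser", x, self.lora_A)        # (b, s, E, r)
        delta = torch.einsum("bser,eor->bseo", delta, self.lora_B)   # (b, s, E, out)
        delta = (gates.unsqueeze(-1) * delta).sum(dim=2)             # (b, s, out)
        return self.base(x) + self.scale * delta


# Usage: wrap a frozen projection and train only the experts and router.
layer = MoELoRALinear(nn.Linear(64, 64), num_experts=4, rank=8)
out = layer(torch.randn(2, 5, 64))
print(out.shape)  # torch.Size([2, 5, 64])
```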

Related Content

This study explores the use of Large Language Models (LLMs) for automatic evaluation of knowledge graph (KG) completion models. Historically, validating information in KGs has been a challenging task, requiring large-scale human annotation at prohibitive cost. With the emergence of general-purpose generative AI and LLMs, it is now plausible that human-in-the-loop validation could be replaced by a generative agent. We introduce a framework for consistency and validation when using generative models to validate knowledge graphs. Our framework is based upon recent open-source developments for structural and semantic validation of LLM outputs, and upon flexible approaches to fact checking and verification, supported by the capacity to reference external knowledge sources of any kind. The design is easy to adapt and extend, and can be used to verify any kind of graph-structured data through a combination of model-intrinsic knowledge, user-supplied context, and agents capable of external knowledge retrieval.
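
As a rough illustration of the kind of structurally validated LLM judgment such a framework supports, the sketch below asks a generic chat backend to label a triple and then checks the reply against a fixed JSON schema; the `call_llm` stand-in, the prompt, and the schema are hypothetical, not the framework's actual interface.

```python
# Minimal sketch of LLM-assisted triple validation with structural checks on
# the model's output. `call_llm` is a hypothetical stand-in for any backend.
import json
from typing import Callable

PROMPT = (
    "Given the triple (subject, predicate, object) and the context below, "
    'answer with JSON: {{"verdict": "supported" | "refuted" | "unknown", '
    '"evidence": "<short explanation>"}}.\n'
    "Triple: ({s}, {p}, {o})\nContext: {ctx}"
)

def validate_triple(s: str, p: str, o: str, ctx: str,
                    call_llm: Callable[[str], str]) -> dict:
    """Ask the LLM to judge a triple, then structurally validate the reply."""
    raw = call_llm(PROMPT.format(s=s, p=p, o=o, ctx=ctx))
    try:
        parsed = json.loads(raw)
    except json.JSONDecodeError:
        return {"verdict": "unknown", "evidence": "unparseable model output"}
    if parsed.get("verdict") not in {"supported", "refuted", "unknown"}:
        return {"verdict": "unknown", "evidence": "schema violation"}
    return parsed

# Usage with a dummy backend that always answers "supported".
dummy = lambda prompt: '{"verdict": "supported", "evidence": "stated in context"}'
print(validate_triple("Marie Curie", "born_in", "Warsaw",
                      "Marie Curie was born in Warsaw in 1867.", dummy))
```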

The rapidly evolving multimodal Large Language Models (LLMs) urgently require new benchmarks to uniformly evaluate their performance on understanding and textually describing music. However, due to semantic gaps between Music Information Retrieval (MIR) algorithms and human understanding, discrepancies between professionals and the public, and low precision of annotations, existing music description datasets cannot serve as benchmarks. To this end, we present MuChin, the first open-source music description benchmark in Chinese colloquial language, designed to evaluate the performance of multimodal LLMs in understanding and describing music. We established the Caichong Music Annotation Platform (CaiMAP) that employs an innovative multi-person, multi-stage assurance method, and recruited both amateurs and professionals to ensure the precision of annotations and alignment with popular semantics. Utilizing this method, we built a dataset with multi-dimensional, high-precision music annotations, the Caichong Music Dataset (CaiMD), and carefully selected 1,000 high-quality entries to serve as the test set for MuChin. Based on MuChin, we analyzed the discrepancies between professionals and amateurs in terms of music description, and empirically demonstrated the effectiveness of annotated data for fine-tuning LLMs. Ultimately, we employed MuChin to evaluate existing music understanding models on their ability to provide colloquial descriptions of music. All data related to the benchmark, along with the scoring code and detailed appendices, have been open-sourced (//github.com/CarlWangChina/MuChin/).
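
For a flavor of how a model's colloquial description might be compared against a reference annotation, here is a generic token-overlap F1 sketch; MuChin's released scoring code defines its own criteria, so this is only an illustrative stand-in.

```python
# Generic token-level F1 between a predicted description and a reference.
from collections import Counter


def token_f1(prediction: str, reference: str) -> float:
    pred, ref = prediction.split(), reference.split()
    overlap = sum((Counter(pred) & Counter(ref)).values())
    if overlap == 0:
        return 0.0
    precision, recall = overlap / len(pred), overlap / len(ref)
    return 2 * precision * recall / (precision + recall)


# Usage (whitespace tokenization; Chinese text would need a word segmenter).
print(token_f1("an upbeat pop song with bright synths",
               "a bright upbeat pop track driven by synths"))
```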

We introduce XFT, a simple yet powerful training scheme that merges upcycled Mixture-of-Experts (MoE) models to unleash the performance limit of instruction-tuned code Large Language Models (LLMs). While vanilla sparse upcycling fails to improve instruction tuning, XFT introduces a shared expert mechanism with a novel routing weight normalization strategy into sparse upcycling, which significantly boosts instruction tuning. After fine-tuning the upcycled MoE model, XFT introduces a learnable model merging mechanism to compile the upcycled MoE model back into a dense model, achieving upcycled MoE-level performance with only dense-model compute. By applying XFT to a 1.3B model, we create a new state-of-the-art tiny code LLM (<3B) with 67.1 and 64.6 pass@1 on HumanEval and HumanEval+, respectively. With the same data and model architecture, XFT improves supervised fine-tuning (SFT) by 13% on HumanEval+, along with consistent improvements from 2% to 13% on MBPP+, MultiPL-E, and DS-1000, demonstrating its generalizability. XFT is fully orthogonal to existing techniques such as Evol-Instruct and OSS-Instruct, opening a new dimension for improving code instruction tuning. Code is available at //github.com/ise-uiuc/xft.
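
To illustrate the learnable-merging step in isolation, the sketch below collapses several expert weight matrices into a single dense matrix through learnable mixing coefficients; XFT's actual shared-expert routing and merging mechanism are more involved, so treat this as a conceptual sketch only.

```python
# Learnable merging sketch: W = sum_e c_e * W_e, with c learned via softmax.
import torch
import torch.nn as nn


class LearnableMerge(nn.Module):
    """Merge E expert weight matrices into one dense matrix."""

    def __init__(self, expert_weights: torch.Tensor):
        super().__init__()
        # expert_weights: (E, out_features, in_features), kept frozen
        self.register_buffer("experts", expert_weights)
        self.logits = nn.Parameter(torch.zeros(expert_weights.shape[0]))

    def merged_weight(self) -> torch.Tensor:
        coeffs = torch.softmax(self.logits, dim=0)           # (E,)
        return torch.einsum("e,eoi->oi", coeffs, self.experts)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return x @ self.merged_weight().T                    # dense-model compute


# Usage: 4 experts of shape (out=32, in=64) merged into one 32x64 matrix.
merge = LearnableMerge(torch.randn(4, 32, 64))
y = merge(torch.randn(8, 64))
print(y.shape)  # torch.Size([8, 32])
```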

We introduce X-Adapter, a universal upgrader that enables pretrained plug-and-play modules (e.g., ControlNet, LoRA) to work directly with an upgraded text-to-image diffusion model (e.g., SDXL) without further retraining. We achieve this goal by training an additional network to control the frozen upgraded model with new text-image data pairs. In detail, X-Adapter keeps a frozen copy of the old model to preserve the connectors of different plugins. Additionally, X-Adapter adds trainable mapping layers that bridge the decoders of models from different versions for feature remapping. The remapped features are used as guidance for the upgraded model. To enhance the guidance ability of X-Adapter, we employ a null-text training strategy for the upgraded model. After training, we also introduce a two-stage denoising strategy to align the initial latents of X-Adapter and the upgraded model. Thanks to these strategies, X-Adapter demonstrates universal compatibility with various plugins and also enables plugins of different versions to work together, thereby expanding the functionality of the diffusion community. To verify the effectiveness of the proposed method, we conduct extensive experiments, and the results show that X-Adapter can facilitate wider application of existing plugins with the upgraded foundational diffusion model.
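
The sketch below illustrates the feature-remapping idea in isolation: a trainable mapping layer projects old-model decoder features and adds them as guidance to the upgraded model's decoder features. The channel sizes and additive fusion are illustrative assumptions, not X-Adapter's exact design.

```python
# Sketch of a trainable mapping layer bridging old- and new-model decoders.
import torch
import torch.nn as nn


class MappingLayer(nn.Module):
    def __init__(self, old_channels: int, new_channels: int):
        super().__init__()
        self.proj = nn.Sequential(
            nn.Conv2d(old_channels, new_channels, kernel_size=1),
            nn.SiLU(),
            nn.Conv2d(new_channels, new_channels, kernel_size=3, padding=1),
        )

    def forward(self, old_feat: torch.Tensor, new_feat: torch.Tensor) -> torch.Tensor:
        # Remap old-model decoder features and inject them as additive guidance.
        guidance = self.proj(old_feat)
        if guidance.shape[-2:] != new_feat.shape[-2:]:
            guidance = nn.functional.interpolate(
                guidance, size=new_feat.shape[-2:], mode="bilinear", align_corners=False
            )
        return new_feat + guidance


# Usage: bridge a 320-channel old-decoder block to a 640-channel new-decoder block.
bridge = MappingLayer(320, 640)
fused = bridge(torch.randn(1, 320, 32, 32), torch.randn(1, 640, 64, 64))
print(fused.shape)  # torch.Size([1, 640, 64, 64])
```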

By employing massive Mobile AI-Generated Content (AIGC) Service Providers (MASPs) with powerful models, high-quality AIGC services can become accessible to resource-constrained end users. However, this advancement, referred to as mobile AIGC, also introduces a significant challenge: users must download large AIGC outputs from the MASPs, leading to substantial bandwidth consumption and potential transmission failures. In this paper, we apply cross-modal Generative Semantic Communications (G-SemCom) in mobile AIGC to overcome wireless bandwidth constraints. Specifically, we utilize a series of cross-modal attention maps to indicate the correlation between user prompts and each part of the AIGC outputs. In this way, the MASP can analyze the prompt context and efficiently filter the most semantically important content. Only semantic information is transmitted, from which users can recover the entire AIGC output at high quality while saving mobile bandwidth. Since the transmitted information not only preserves the semantics but also prompts the recovery, we formulate a joint semantic encoding and prompt engineering problem to optimize the bandwidth allocation among users. In particular, we present a human-perceptual metric named Joint Perpetual Similarity and Quality (JPSQ), which fuses two learning-based measurements of semantic similarity and aesthetic quality. Furthermore, we develop the Attention-aware Deep Diffusion (ADD) algorithm, which learns attention maps and leverages the diffusion process to enhance environment exploration. Extensive experiments demonstrate that our proposal reduces the bandwidth consumption of mobile users by 49.4% on average, with almost no perceptual difference in AIGC output quality. Moreover, the ADD algorithm outperforms baseline DRL methods, achieving a 1.74x higher overall reward.
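
As a toy illustration of attention-guided semantic filtering, the sketch below keeps only the image patches most attended by the prompt tokens; the attention map is random here, whereas in the described system it would come from the generator's cross-modal attention.

```python
# Attention-guided filtering sketch: keep the top-k most attended patches.
import numpy as np


def select_semantic_patches(attn_map: np.ndarray, keep_ratio: float = 0.5) -> np.ndarray:
    """Return a boolean mask over image patches, keeping the most attended ones."""
    scores = attn_map.mean(axis=0)                 # average over prompt tokens
    k = max(1, int(keep_ratio * scores.size))
    threshold = np.sort(scores)[-k]
    return scores >= threshold


# Usage: 8 prompt tokens attending over a 16x16 grid of image patches.
rng = np.random.default_rng(0)
attn = rng.random((8, 256))
mask = select_semantic_patches(attn, keep_ratio=0.3)
print(mask.sum(), "of", mask.size, "patches transmitted")
```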

This article presents Persistence Administered Collective Navigation (PACNav) as an approach for achieving decentralized collective navigation of Unmanned Aerial Vehicle (UAV) swarms. The technique is inspired by the flocking and collective navigation behavior observed in natural swarms, such as cattle herds, bird flocks, and even large groups of humans. PACNav relies solely on local observations of relative positions of UAVs, making it suitable for large swarms deprived of communication capabilities and external localization systems. We introduce the novel concepts of path persistence and path similarity, which allow each swarm member to analyze the motion of others. PACNav is grounded on two main principles: (1) UAVs with little variation in motion direction exhibit high path persistence and are considered reliable leaders by other UAVs; (2) groups of UAVs that move in a similar direction demonstrate high path similarity, and such groups are assumed to contain a reliable leader. The proposed approach also incorporates a reactive collision avoidance mechanism to prevent collisions with swarm members and environmental obstacles. The method is validated through simulated and real-world experiments conducted in a natural forest.
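
The sketch below illustrates how path persistence and path similarity can be computed from locally observed position histories, using mean cosine agreement of headings; the exact formulations in PACNav may differ, so this only conveys the idea.

```python
# Heading-based persistence and similarity from (T, 2) position histories.
import numpy as np


def headings(positions: np.ndarray) -> np.ndarray:
    """Unit heading vectors from a (T, 2) position history."""
    deltas = np.diff(positions, axis=0)
    norms = np.linalg.norm(deltas, axis=1, keepdims=True)
    return deltas / np.maximum(norms, 1e-9)


def path_persistence(positions: np.ndarray) -> float:
    """Close to 1 when consecutive headings barely change."""
    h = headings(positions)
    return float(np.mean(np.sum(h[1:] * h[:-1], axis=1)))  # mean cosine of turns


def path_similarity(pos_a: np.ndarray, pos_b: np.ndarray) -> float:
    """Mean cosine similarity between two UAVs' headings over time."""
    ha, hb = headings(pos_a), headings(pos_b)
    t = min(len(ha), len(hb))
    return float(np.mean(np.sum(ha[:t] * hb[:t], axis=1)))


# Usage: a straight path is highly persistent; two parallel paths are similar.
t = np.linspace(0, 10, 50)
straight = np.stack([t, 0.1 * t], axis=1)
parallel = straight + np.array([0.0, 1.0])
print(round(path_persistence(straight), 3), round(path_similarity(straight, parallel), 3))
```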

We introduce Groma, a Multimodal Large Language Model (MLLM) with grounded and fine-grained visual perception ability. Beyond holistic image understanding, Groma is adept at region-level tasks such as region captioning and visual grounding. Such capabilities are built upon a localized visual tokenization mechanism, where an image input is decomposed into regions of interest and subsequently encoded into region tokens. By integrating region tokens into user instructions and model responses, we seamlessly enable Groma to understand user-specified region inputs and ground its textual output to images. Besides, to enhance the grounded chat ability of Groma, we curate a visually grounded instruction dataset by leveraging the powerful GPT-4V and visual prompting techniques. Compared with MLLMs that rely on the language model or an external module for localization, Groma consistently demonstrates superior performance on standard referring and grounding benchmarks, highlighting the advantages of embedding localization into image tokenization. Project page: //groma-mllm.github.io/.
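
As a toy illustration of interleaving region tokens with text, the sketch below references user-specified boxes through placeholder tokens in the prompt; the token names and box format are assumptions, not Groma's actual vocabulary or tokenizer.

```python
# Sketch of region-token interleaving for grounded instructions.
from dataclasses import dataclass
from typing import List


@dataclass
class Region:
    box: tuple          # (x1, y1, x2, y2) in normalized image coordinates
    token: str          # e.g. "<r1>"


def build_grounded_prompt(instruction: str, regions: List[Region]) -> str:
    """Interleave region tokens with the text instruction."""
    region_block = " ".join(f"{r.token}{r.box}" for r in regions)
    return f"<image> regions: {region_block}\nUser: {instruction}"


# Usage: ask about one user-specified region.
regions = [Region(box=(0.12, 0.30, 0.45, 0.80), token="<r1>")]
print(build_grounded_prompt("What is the person in <r1> holding?", regions))
```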

This paper presents a demonstration of a developed prototype showcasing a way to preserve the Intangible Cultural Heritage of Uttarakhand, India. Aipan is a traditional art form practiced in the Kumaon region of the state of Uttarakhand. It is typically used to decorate floors and walls at places of worship or the entrances of homes, and it is considered auspicious for beginning any work or event. The art carries great social, cultural, and religious significance and is passed from generation to generation. However, in the present era of modernization and technological advancement, this art form stands on the verge of extinction. This study presents a humble attempt to preserve this vanishing art form through the use of Virtual Reality (VR). Ethnographic studies were conducted in the Almora, Nainital, and Haldwani regions of Uttarakhand to trace the origins of the art form and to gain a deeper understanding of it. A total of ten (N = 10) Aipan designers were interviewed. These studies revealed several interesting insights about the art form's potential to be incorporated into a VR experience.

We present the Evolving Graph Fourier Transform (EFT), the first invertible spectral transform that captures evolving representations on temporal graphs. Our work is motivated by the inadequacy of existing methods at capturing evolving graph spectra, which are also computationally expensive because the temporal dimension must be handled on top of the graph vertex domain. We view the problem as an optimization over the Laplacian of the continuous-time dynamic graph. Additionally, we propose pseudo-spectrum relaxations that decompose the transformation process, making it highly computationally efficient. EFT adeptly captures the evolving graph's structural and positional properties, making it effective for downstream tasks on evolving graphs. As a reference implementation, we develop a simple neural model induced with EFT for capturing evolving graph spectra. We empirically validate our theoretical findings on a number of large-scale and standard temporal graph benchmarks and demonstrate that our model achieves state-of-the-art performance.
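
For context, the sketch below shows the classical per-snapshot graph Fourier transform that EFT generalizes; EFT's pseudo-spectrum relaxations are designed precisely to avoid this naive eigendecomposition cost at every time step.

```python
# Per-snapshot graph Fourier transform via Laplacian eigendecomposition.
import numpy as np


def graph_fourier_transform(adj: np.ndarray, signal: np.ndarray):
    """Return (eigenvalues, GFT coefficients) of a signal on one graph snapshot."""
    deg = np.diag(adj.sum(axis=1))
    laplacian = deg - adj
    eigvals, eigvecs = np.linalg.eigh(laplacian)   # symmetric Laplacian
    return eigvals, eigvecs.T @ signal             # project onto the eigenbasis


# Usage: a 2-snapshot temporal graph on 4 nodes with a changing edge set.
snapshots = [
    np.array([[0, 1, 0, 0], [1, 0, 1, 0], [0, 1, 0, 1], [0, 0, 1, 0]], dtype=float),
    np.array([[0, 1, 1, 0], [1, 0, 1, 0], [1, 1, 0, 1], [0, 0, 1, 0]], dtype=float),
]
x = np.array([1.0, 0.0, -1.0, 0.5])
for t, adj in enumerate(snapshots):
    vals, coeffs = graph_fourier_transform(adj, x)
    print(f"t={t}: spectrum={np.round(vals, 2)}, coeffs={np.round(coeffs, 2)}")
```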

Reinforcement Learning from Human Feedback (RLHF) has become a dominant strategy for aligning Language Models (LMs) with human values and goals. The key to the strategy is learning a reward model ($\varphi$) that reflects the latent reward model of humans. While this strategy has proven effective, its training methodology requires a large amount of human preference annotation (usually on the order of tens of thousands of examples) to train $\varphi$. Such large-scale annotation is justifiable when it is a one-time effort and the reward model is universally applicable. However, human goals are subjective and task-dependent, requiring task-specific preference annotations, which can be impractical to obtain. To address this challenge, we propose a novel approach to infuse domain knowledge into $\varphi$, which reduces the amount of preference annotation required ($21\times$), avoids the Alignment Tax, and provides some interpretability. We validate our approach on E-Commerce Opinion Summarization, with a significant reduction in dataset size (to just $940$ samples) while advancing the SOTA ($\sim4$-point ROUGE-L improvement; preferred by humans over the SOTA $68\%$ of the time). Our contributions include a novel Reward Modeling technique and two new datasets: PromptOpinSumm (supervised data for Opinion Summarization) and OpinPref (a gold-standard human preference dataset). The proposed methodology opens up avenues for efficient RLHF, making it more adaptable to applications with varying human values. We release the artifacts (Code: github.com/efficient-rlhf. PromptOpinSumm: hf.co/prompt-opin-summ. OpinPref: hf.co/opin-pref) for use under the MIT License.
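
For reference, the sketch below shows the standard Bradley-Terry preference objective that reward-model training builds on; the paper's domain-knowledge infusion into $\varphi$ is not reproduced here, and the feature encoder is a stand-in.

```python
# Standard preference-based reward-model training with the Bradley-Terry loss.
import torch
import torch.nn as nn


class RewardModel(nn.Module):
    """Tiny scorer over pooled text features; stands in for an LM-based scorer."""

    def __init__(self, feat_dim: int = 128):
        super().__init__()
        self.scorer = nn.Sequential(nn.Linear(feat_dim, 64), nn.ReLU(), nn.Linear(64, 1))

    def forward(self, feats: torch.Tensor) -> torch.Tensor:
        return self.scorer(feats).squeeze(-1)      # scalar reward per example


def preference_loss(rm: RewardModel, chosen: torch.Tensor, rejected: torch.Tensor) -> torch.Tensor:
    # Bradley-Terry: maximize log-sigmoid of the reward margin.
    margin = rm(chosen) - rm(rejected)
    return -torch.nn.functional.logsigmoid(margin).mean()


# Usage with random features standing in for encoded summaries.
rm = RewardModel()
opt = torch.optim.Adam(rm.parameters(), lr=1e-3)
chosen, rejected = torch.randn(16, 128), torch.randn(16, 128)
loss = preference_loss(rm, chosen, rejected)
loss.backward()
opt.step()
print(float(loss))
```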
