In this work, we introduce Libra, a prototype model with a decoupled vision system on a large language model (LLM). The decoupled vision system decouples inner-modal modeling and cross-modal interaction, yielding unique visual information modeling and effective cross-modal comprehension. Libra is trained through discrete auto-regressive modeling on both vision and language inputs. Specifically, we incorporate a routed visual expert with a cross-modal bridge module into a pretrained LLM to route the vision and language flows during attention computing to enable different attention patterns in inner-modal modeling and cross-modal interaction scenarios. Experimental results demonstrate that the dedicated design of Libra achieves a strong MLLM baseline that rivals existing works in the image-to-text scenario with merely 50 million training data, providing a new perspective for future multimodal foundation models. Code is available at //github.com/YifanXu74/Libra.

Related content

MoDELS

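The ACM/IEEE 23rd International Conference on Model Driven Engineering Languages and Systems is the premier conference series for model-driven software and systems engineering, organized with the support of ACM SIGSOFT and IEEE TCSE. Since 1998, MODELS has covered all aspects of modeling, from languages and methods to tools and applications. MODELS attendees come from diverse backgrounds, including researchers, academics, engineers, and industrial professionals. MODELS 2019 is a forum where participants can exchange cutting-edge research results and innovative practical experience around modeling and model-driven software and systems. This year's edition will give the modeling community an opportunity to further advance the foundations of modeling and to propose innovative applications of modeling in emerging areas such as cyber-physical systems, embedded systems, socio-technical systems, cloud computing, big data, machine learning, security, open source, and sustainability. Official website:

Language modeling · MoDELS · Comprehensibility · Range ·

June 25, 2024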

PatentEval: Understanding Errors in Patent Generation

You Zuo,Kim Gerdes,Eric Villemonte de La Clergerie,Benoît Sagot

In this work, we introduce a comprehensive error typology specifically designed for evaluating two distinct tasks in machine-generated patent texts: claims-to-abstract generation, and the generation of the next claim given previous ones. We have also developed a benchmark, PatentEval, for systematically assessing language models in this context. Our study includes a comparative analysis, annotated by humans, of various models. These range from those specifically adapted during training for tasks within the patent domain to the latest general-purpose large language models (LLMs). Furthermore, we explored and evaluated some metrics to approximate human judgments in patent text evaluation, analyzing the extent to which these metrics align with expert assessments. These approaches provide valuable insights into the capabilities and limitations of current language models in the specialized field of patent text generation.

Performer · Code · MoDELS · Correlation coefficient · Language modeling ·

June 21, 2024

DistiLRR: Transferring Code Repair for Low-Resource Programming Languages

Kyle Wong,Alfonso Amayuelas,Liangming Pan,William Yang Wang

Large language models (LLMs) have shown remarkable performance on code generation tasks. A recent application of LLMs for code generation is iterative code repair, where a model fixes an incorrect program by rationalizing about errors and generating a new program. However, code repair is primarily studied on high-resource languages like Python, and the framework's efficacy is under-explored on low-resource languages. To apply code repair for low-resource languages, we propose Distilling Low-Resource Repairs (DistiLRR), an approach that transfers the reasoning and code generation ability from a teacher model to a student model. Our results show that DistiLRR consistently outperforms baselines on low-resource languages, but has similar performance on high-resource languages. To investigate this behavior, we perform a further analysis and find that the correlation between rationale quality and code correctness is weaker than previously perceived. We hypothesize this weakness is magnified in low-resource settings where base models lack deep knowledge of a programming language, leading to wavering benefits of code repair between high-resource and low-resource languages.

Language modeling · Functional · Analysis · Integration · MoDELS ·

June 20, 2024

QuST-LLM: Integrating Large Language Models for Comprehensive Spatial Transcriptomics Analysis

Chao Hui Huang

from arxiv, 12 pages, 7 figures

In this paper, we introduce QuST-LLM, an innovative extension of QuPath that utilizes the capabilities of large language models (LLMs) to analyze and interpret spatial transcriptomics (ST) data. This tool effectively simplifies the intricate and high-dimensional nature of ST data by offering a comprehensive workflow that includes data loading, region selection, gene expression analysis, and functional annotation. QuST-LLM employs LLMs to transform complex ST data into understandable and detailed biological narratives based on gene ontology annotations, thereby significantly improving the interpretability of ST data. Consequently, users can interact with their own ST data using natural language. Hence, QuST-LLM provides researchers with a potent functionality to unravel the spatial and functional complexities of tissues, fostering novel insights and advancements in biomedical research.

Multimodal · Language modeling · MoDELS · Vision · SimPLe ·

June 20, 2024

Xmodel-VLM: A Simple Baseline for Multimodal Vision Language Model

Wanting Xu,Yang Liu,Langping He,Xucheng Huang,Ling Jiang

We introduce Xmodel-VLM, a cutting-edge multimodal vision language model. It is designed for efficient deployment on consumer GPU servers. Our work directly confronts a pivotal industry issue by grappling with the prohibitive service costs that hinder the broad adoption of large-scale multimodal systems. Through rigorous training, we have developed a 1B-scale language model from the ground up, employing the LLaVA paradigm for modal alignment. The result, which we call Xmodel-VLM, is a lightweight yet powerful multimodal vision language model. Extensive testing across numerous classic multimodal benchmarks has revealed that despite its smaller size and faster execution, Xmodel-VLM delivers performance comparable to that of larger models. Our model checkpoints and code are publicly available on GitHub at //github.com/XiaoduoAILab/XmodelVLM.

MoDELS · Estimation/Estimator · Markovian · state-of-the-art · Round ·

June 18, 2024

Bayesian Consistency for Long Memory Processes: A Semiparametric Perspective

Clara Grazian

In this work, we will investigate a Bayesian approach to estimating the parameters of long memory models. Long memory, characterized by the phenomenon of hyperbolic autocorrelation decay in time series, has garnered significant attention. This is because, in many situations, the assumption of short memory, such as the Markovianity assumption, can be deemed too restrictive. Applications for long memory models can be readily found in fields such as astronomy, finance, and environmental sciences. However, current parametric and semiparametric approaches to modeling long memory present challenges, particularly in the estimation process. In this study, we will introduce various methods applied to this problem from a Bayesian perspective, along with a novel semiparametric approach for deriving the posterior distribution of the long memory parameter. Additionally, we will establish the asymptotic properties of the model. An advantage of this approach is that it allows the implementation of state-of-the-art efficient algorithms for nonparametric Bayesian models.

Comprehensibility · MoDELS · Dataset · state-of-the-art · Image captioning ·

June 18, 2024

VRSBench: A Versatile Vision-Language Benchmark Dataset for Remote Sensing Image Understanding

Xiang Li,Jian Ding,Mohamed Elhoseiny

from arxiv, Submitted for consideration at a conference

We introduce a new benchmark designed to advance the development of general-purpose, large-scale vision-language models for remote sensing images. Although several vision-language datasets in remote sensing have been proposed to pursue this goal, existing datasets are typically tailored to single tasks, lack detailed object information, or suffer from inadequate quality control. Exploring these improvement opportunities, we present a Versatile vision-language Benchmark for Remote Sensing image understanding, termed VRSBench. This benchmark comprises 29,614 images, with 29,614 human-verified detailed captions, 52,472 object references, and 123,221 question-answer pairs. It facilitates the training and evaluation of vision-language models across a broad spectrum of remote sensing image understanding tasks. We further evaluated state-of-the-art models on this benchmark for three vision-language tasks: image captioning, visual grounding, and visual question answering. Our work aims to significantly contribute to the development of advanced vision-language models in the field of remote sensing. The data and code can be accessed at //github.com/lx709/VRSBench.

MoDELS · Model evaluation · Federated learning · Learning · Notability ·

June 18, 2024

GPT-FL: Generative Pre-trained Model-Assisted Federated Learning

Tuo Zhang,Tiantian Feng,Samiul Alam,Dimitrios Dimitriadis,Sunwoo Lee,Mi Zhang,Shrikanth S. Narayanan,Salman Avestimehr

In this work, we propose GPT-FL, a generative pre-trained model-assisted federated learning (FL) framework. At its core, GPT-FL leverages generative pre-trained models to generate diversified synthetic data. These generated data are used to train a downstream model on the server, which is then fine-tuned with private client data under the standard FL framework. We show that GPT-FL consistently outperforms state-of-the-art FL methods in terms of model test accuracy, communication efficiency, and client sampling efficiency. Through comprehensive ablation analysis across various data modalities, we discover that the downstream model generated by synthetic data plays a crucial role in controlling the direction of gradient diversity during FL training, which enhances convergence speed and contributes to the notable accuracy boost observed with GPT-FL. Also, regardless of whether the target data falls within or outside the domain of the pre-trained generative model, GPT-FL consistently achieves significant performance gains, surpassing the results obtained by models trained solely with FL or synthetic data. The code is available at //github.com/AvestimehrResearchGroup/GPT-FL.

Multimodal · Optimizer · MoDELS · Language modeling · Large language model ·

June 17, 2024

mDPO: Conditional Preference Optimization for Multimodal Large Language Models

Fei Wang,Wenxuan Zhou,James Y. Huang,Nan Xu,Sheng Zhang,Hoifung Poon,Muhao Chen

Direct preference optimization (DPO) has been shown to be an effective method for large language model (LLM) alignment. Recent works have attempted to apply DPO to multimodal scenarios but have found it challenging to achieve consistent improvement. Through a comparative experiment, we identify the unconditional preference problem in multimodal preference optimization, where the model overlooks the image condition. To address this problem, we propose mDPO, a multimodal DPO objective that prevents the over-prioritization of language-only preferences by also optimizing image preference. Moreover, we introduce a reward anchor that forces the reward to be positive for chosen responses, thereby avoiding the decrease in their likelihood -- an intrinsic problem of relative preference optimization. Experiments on two multimodal LLMs of different sizes and three widely used benchmarks demonstrate that mDPO effectively addresses the unconditional preference problem in multimodal preference optimization and significantly improves model performance, particularly in reducing hallucination.
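
As a rough illustration of why a relative objective can let the likelihood of the chosen response fall, here is a minimal sketch of the standard DPO loss for a single preference pair. It is not the mDPO objective: the image-preference term and the reward anchor described in the abstract are not shown, and the function name, argument names, and the default `beta` are assumptions made for this example.

```python
import math


def dpo_loss(logp_chosen: float, logp_rejected: float,
             ref_logp_chosen: float, ref_logp_rejected: float,
             beta: float = 0.1) -> float:
    """Standard DPO loss for one (chosen, rejected) pair.

    Each argument is a sequence log-probability under the policy
    (logp_*) or under the frozen reference model (ref_logp_*).
    """
    # Implicit rewards: scaled log-ratio of policy to reference.
    reward_chosen = beta * (logp_chosen - ref_logp_chosen)
    reward_rejected = beta * (logp_rejected - ref_logp_rejected)
    margin = reward_chosen - reward_rejected
    # -log(sigmoid(margin)) == softplus(-margin), written in a
    # numerically stable form.
    x = -margin
    return max(x, 0.0) + math.log1p(math.exp(-abs(x)))
```

Because the loss depends only on the difference between the two rewards, lowering both log-probabilities by the same amount leaves it unchanged: `dpo_loss(-1.0, -2.0, -1.0, -1.0)` and `dpo_loss(-5.0, -6.0, -1.0, -1.0)` agree up to floating-point error, even though the chosen response is far less likely in the second case.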

Positional encoding · INFORMS · Harbin Institute of Technology (HIT) · Better · MoDELS ·

June 17, 2024

Two Stones Hit One Bird: Bilevel Positional Encoding for Better Length Extrapolation

Zhenyu He,Guhao Feng,Shengjie Luo,Kai Yang,Liwei Wang,Jingjing Xu,Zhi Zhang,Hongxia Yang,Di He

from arxiv, 17 pages, 7 figures, 8 tables; ICML 2024 Camera Ready version; Code: //github.com/zhenyuhe00/BiPE

In this work, we leverage the intrinsic segmentation of language sequences and design a new positional encoding method called Bilevel Positional Encoding (BiPE). For each position, our BiPE blends an intra-segment encoding and an inter-segment encoding. The intra-segment encoding identifies the locations within a segment and helps the model capture the semantic information therein via absolute positional encoding. The inter-segment encoding specifies the segment index, models the relationships between segments, and aims to improve extrapolation capabilities via relative positional encoding. Theoretical analysis shows this disentanglement of positional information makes learning more effective. The empirical results also show that our BiPE has superior length extrapolation capabilities across a wide range of tasks in diverse text modalities.
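
The two-level indexing described above can be sketched as follows. This is only an illustration under stated assumptions, not the authors' implementation: segments are assumed to end at separator tokens, the function name `bilevel_positions` and the default separator set are hypothetical, and the step that turns the two indices into absolute and relative encodings is not shown.

```python
from typing import FrozenSet, List, Sequence, Tuple


def bilevel_positions(
    tokens: Sequence[str],
    separators: FrozenSet[str] = frozenset({".", "\n"}),
) -> List[Tuple[int, int]]:
    """Return (intra_segment_position, segment_index) for each token.

    A separator token closes the current segment, so the token after
    it starts a new segment at intra-segment position 0.
    """
    positions: List[Tuple[int, int]] = []
    segment_index = 0
    intra_position = 0
    for token in tokens:
        positions.append((intra_position, segment_index))
        if token in separators:
            # Assumed segmentation rule: a separator ends the segment.
            segment_index += 1
            intra_position = 0
        else:
            intra_position += 1
    return positions
```

For the tokens `["a", "b", ".", "c", "d"]`, the first three tokens get intra-segment positions 0, 1, 2 in segment 0, and `"c"` restarts at position 0 in segment 1. The intra-segment index stays bounded by the segment length, while only the segment index grows with the sequence.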

Speech recognition · MoDELS · Multimodal · Spotlight · Model evaluation ·

June 15, 2024

AVR: Synergizing Foundation Models for Audio-Visual Humor Detection

Sarthak Sharma,Orchid Chetia Phukan,Drishti Singh,Arun Balaji Buduru,Rajesh Sharma

from arxiv, Accepted to INTERSPEECH 2024 Show & Tell Demonstrations

In this work, we present AVR, an application for audio-visual humor detection. While humor detection has traditionally centered around textual analysis, recent advancements have spotlighted multimodal approaches. However, these methods lean on textual cues as a modality, necessitating the use of ASR systems for transcribing the audio data. This heavy reliance on ASR accuracy can pose challenges in real-world applications. To address this bottleneck, we propose an innovative audio-visual humor detection system that circumvents textual reliance, eliminating the need for ASR models. Instead, the proposed approach hinges on the intricate interplay between audio and visual content for effective humor detection.