好诱人的搜子好爽免费观看-亚洲日韩中文字幕一级乱码在线播放不卡

Although fine-tuning Large Language Models (LLMs) with multilingual data can rapidly enhance the multilingual capabilities of LLMs, they still exhibit a performance gap between the dominant language (e.g., English) and non-dominant ones due to the imbalance of training data across languages. To further enhance the performance of non-dominant languages, we propose ShifCon, a Shift-based Contrastive framework that aligns the internal forward process of other languages toward that of the dominant one. Specifically, it shifts the representations of non-dominant languages into the dominant language subspace, allowing them to access relatively rich information encoded in the model parameters. The enriched representations are then shifted back into their original language subspace before generation. Moreover, we introduce a subspace distance metric to pinpoint the optimal layer area for shifting representations and employ multilingual contrastive learning to further enhance the alignment of representations within this area. Experiments demonstrate that our ShifCon framework significantly enhances the performance of non-dominant languages, particularly for low-resource ones. Further analysis offers extra insights to verify the effectiveness of ShifCon and propel future research

相關內容

contrastive

關注 1

cache · MoDELS · 模型評估 · 語言模型化 · Processing（編程語言） ·

2024 年 12 月 19 日

DroidSpeak: KV Cache Sharing for Cross-LLM Communication and Multi-LLM Serving

Yuhan Liu,Yuyang Huang,Jiayi Yao,Zhuohan Gu,Kuntai Du,Hanchen Li,Yihua Cheng,Junchen Jiang,Shan Lu,Madan Musuvathi,Esha Choukse

Large Language Models (LLMs) are increasingly employed in complex workflows, where different LLMs and fine-tuned variants collaboratively address complex tasks. However, these systems face significant inefficiencies due to redundant context processing of the shared context. We propose DroidSpeak, a framework that optimizes context sharing between fine-tuned LLMs derived from the same foundational model. DroidSpeak identifies critical layers in the KV cache and selectively recomputes them, enabling effective reuse of intermediate data while maintaining high accuracy. Our approach balances computational efficiency and task fidelity, significantly reducing inference latency and throughput bottlenecks. Experiments on diverse datasets and model pairs demonstrate that DroidSpeak achieves up to 3x higher throughputs and 2.6x faster prefill times with negligible accuracy loss compared to full recomputation.

多樣性 · 設計 · Harmony · 得分 · Elevate ·

2024 年 12 月 19 日

HSEvo: Elevating Automatic Heuristic Design with Diversity-Driven Harmony Search and Genetic Algorithm Using LLMs

Pham Vu Tuan Dat,Long Doan,Huynh Thi Thanh Binh

from arxiv, 18 pages, 12 figures

Automatic Heuristic Design (AHD) is an active research area due to its utility in solving complex search and NP-hard combinatorial optimization problems in the real world. The recent advancements in Large Language Models (LLMs) introduce new possibilities by coupling LLMs with evolutionary computation to automatically generate heuristics, known as LLM-based Evolutionary Program Search (LLM-EPS). While previous LLM-EPS studies obtained great performance on various tasks, there is still a gap in understanding the properties of heuristic search spaces and achieving a balance between exploration and exploitation, which is a critical factor in large heuristic search spaces. In this study, we address this gap by proposing two diversity measurement metrics and perform an analysis on previous LLM-EPS approaches, including FunSearch, EoH, and ReEvo. Results on black-box AHD problems reveal that while EoH demonstrates higher diversity than FunSearch and ReEvo, its objective score is unstable. Conversely, ReEvo's reflection mechanism yields good objective scores but fails to optimize diversity effectively. With this finding in mind, we introduce HSEvo, an adaptive LLM-EPS framework that maintains a balance between diversity and convergence with a harmony search algorithm. Through experimentation, we find that HSEvo achieved high diversity indices and good objective scores while remaining cost-effective. These results underscore the importance of balancing exploration and exploitation and understanding heuristic search spaces in designing frameworks in LLM-EPS.

數據集 · 機器閱讀理解 · 情景 · 模型評估 · 自然語言處理 ·

2024 年 12 月 18 日

2M-BELEBELE: Highly Multilingual Speech and American Sign Language Comprehension Dataset

Marta R. Costa-jussà,Bokai Yu,Pierre Andrews,Belen Alastruey,Necati Cihan Camgoz,Joe Chuang,Jean Maillard,Christophe Ropers,Arina Turkantenko,Carleigh Wood

We introduce the first highly multilingual speech and American Sign Language (ASL) comprehension dataset by extending BELEBELE. Our dataset covers 74 spoken languages at the intersection of BELEBELE and FLEURS, and one sign language (ASL). We evaluate 2M-BELEBELE dataset for both 5-shot and zero-shot settings and across languages, the speech comprehension accuracy is ~ 2-3% average lower compared to reading comprehension.

MoDELS · SimPLe · 多峰值 · INFORMS · Notability ·

2024 年 12 月 18 日

XFormParser: A Simple and Effective Multimodal Multilingual Semi-structured Form Parser

Xianfu Cheng,Hang Zhang,Jian Yang,Xiang Li,Weixiao Zhou,Fei Liu,Kui Wu,Xiangyuan Guan,Tao Sun,Xianjie Wu,Tongliang Li,Zhoujun Li

from arxiv, 15 pages, 8 figures, 8 tables

In the domain of Document AI, parsing semi-structured image form is a crucial Key Information Extraction (KIE) task. The advent of pre-trained multimodal models significantly empowers Document AI frameworks to extract key information from form documents in different formats such as PDF, Word, and images. Nonetheless, form parsing is still encumbered by notable challenges like subpar capabilities in multilingual parsing and diminished recall in industrial contexts in rich text and rich visuals. In this work, we introduce a simple but effective \textbf{M}ultimodal and \textbf{M}ultilingual semi-structured \textbf{FORM} \textbf{PARSER} (\textbf{XFormParser}), which anchored on a comprehensive Transformer-based pre-trained language model and innovatively amalgamates semantic entity recognition (SER) and relation extraction (RE) into a unified framework. Combined with Bi-LSTM, the performance of multilingual parsing is significantly improved. Furthermore, we develop InDFormSFT, a pioneering supervised fine-tuning (SFT) industrial dataset that specifically addresses the parsing needs of forms in various industrial contexts. XFormParser has demonstrated its unparalleled effectiveness and robustness through rigorous testing on established benchmarks. Compared to existing state-of-the-art (SOTA) models, XFormParser notably achieves up to 1.79\% F1 score improvement on RE tasks in language-specific settings. It also exhibits exceptional cross-task performance improvements in multilingual and zero-shot settings. The codes, datasets, and pre-trained models are publicly available at //github.com/zhbuaa0/xformparser.

MoDELS · 語言模型化 · INFORMS · 端到端 · 大語言模型 ·

2024 年 12 月 18 日

InstructSeg: Unifying Instructed Visual Segmentation with Multi-modal Large Language Models

Cong Wei,Yujie Zhong,Haoxian Tan,Yingsen Zeng,Yong Liu,Zheng Zhao,Yujiu Yang

Boosted by Multi-modal Large Language Models (MLLMs), text-guided universal segmentation models for the image and video domains have made rapid progress recently. However, these methods are often developed separately for specific domains, overlooking the similarities in task settings and solutions across these two areas. In this paper, we define the union of referring segmentation and reasoning segmentation at both the image and video levels as Instructed Visual Segmentation (IVS). Correspondingly, we propose InstructSeg, an end-to-end segmentation pipeline equipped with MLLMs for IVS. Specifically, we employ an object-aware video perceiver to extract temporal and object information from reference frames, facilitating comprehensive video understanding. Additionally, we introduce vision-guided multi-granularity text fusion to better integrate global and detailed text information with fine-grained visual guidance. By leveraging multi-task and end-to-end training, InstructSeg demonstrates superior performance across diverse image and video segmentation tasks, surpassing both segmentation specialists and MLLM-based methods with a single model. Our code is available at //github.com/congvvc/InstructSeg.

多峰值 · MoDELS · 可辨認的 · AI · 語言模型化 ·

2024 年 12 月 18 日

Mobilizing Waldo: Evaluating Multimodal AI for Public Mobilization

Manuel Cebrian,Petter Holme,Niccolo Pescetelli

from arxiv, 11 pages, 2 figures, 2 tables

Advancements in multimodal Large Language Models (LLMs), such as OpenAI's GPT-4o, offer significant potential for mediating human interactions across various contexts. However, their use in areas such as persuasion, influence, and recruitment raises ethical and security concerns. To evaluate these models ethically in public influence and persuasion scenarios, we developed a prompting strategy using "Where's Waldo?" images as proxies for complex, crowded gatherings. This approach provides a controlled, replicable environment to assess the model's ability to process intricate visual information, interpret social dynamics, and propose engagement strategies while avoiding privacy concerns. By positioning Waldo as a hypothetical agent tasked with face-to-face mobilization, we analyzed the model's performance in identifying key individuals and formulating mobilization tactics. Our results show that while the model generates vivid descriptions and creative strategies, it cannot accurately identify individuals or reliably assess social dynamics in these scenarios. Nevertheless, this methodology provides a valuable framework for testing and benchmarking the evolving capabilities of multimodal LLMs in social contexts.

Analysis · MoDELS · 樣本 · 有向 · 相互獨立的 ·

2024 年 12 月 18 日

GLACIAL: Granger and Learning-based Causality Analysis for Longitudinal Imaging Studies

Minh Nguyen,Gia H. Ngo,Mert R. Sabuncu

from arxiv, Accepted for publication at the Journal of Machine Learning for Biomedical Imaging (MELBA) //melba-journal.org/2024:028

The Granger framework is useful for discovering causal relations in time-varying signals. However, most Granger causality (GC) methods are developed for densely sampled timeseries data. A substantially different setting, particularly common in medical imaging, is the longitudinal study design, where multiple subjects are followed and sparsely observed over time. Longitudinal studies commonly track several biomarkers, which are likely governed by nonlinear dynamics that might have subject-specific idiosyncrasies and exhibit both direct and indirect causes. Furthermore, real-world longitudinal data often suffer from widespread missingness. GC methods are not well-suited to handle these issues. In this paper, we propose an approach named GLACIAL (Granger and LeArning-based CausalIty Analysis for Longitudinal studies) to fill this methodological gap by marrying GC with a multi-task neural forecasting model. GLACIAL treats subjects as independent samples and uses the model's average prediction accuracy on hold-out subjects to probe causal links. Input dropout and model interpolation are used to efficiently learn nonlinear dynamic relationships between a large number of variables and to handle missing values respectively. Extensive simulations and experiments on a real longitudinal medical imaging dataset show GLACIAL beating competitive baselines and confirm its utility. Our code is available at //github.com/mnhng/GLACIAL.

Prompt · MoDELS · 語言模型化 · 大語言模型 · Learning ·

2024 年 12 月 17 日

Causal Prompting: Debiasing Large Language Model Prompting based on Front-Door Adjustment

Congzhi Zhang,Linhai Zhang,Jialong Wu,Yulan He,Deyu Zhou

from arxiv, Accepted by AAAI 2025

Despite the notable advancements of existing prompting methods, such as In-Context Learning and Chain-of-Thought for Large Language Models (LLMs), they still face challenges related to various biases. Traditional debiasing methods primarily focus on the model training stage, including approaches based on data augmentation and reweighting, yet they struggle with the complex biases inherent in LLMs. To address such limitations, the causal relationship behind the prompting methods is uncovered using a structural causal model, and a novel causal prompting method based on front-door adjustment is proposed to effectively mitigate LLMs biases. In specific, causal intervention is achieved by designing the prompts without accessing the parameters and logits of LLMs. The chain-of-thought generated by LLM is employed as the mediator variable and the causal effect between input prompts and output answers is calculated through front-door adjustment to mitigate model biases. Moreover, to accurately represent the chain-of-thoughts and estimate the causal effects, contrastive learning is used to fine-tune the encoder of chain-of-thought by aligning its space with that of the LLM. Experimental results show that the proposed causal prompting approach achieves excellent performance across seven natural language processing datasets on both open-source and closed-source LLMs.

Extensibility · 多樣性 · HTTPS · MoDELS · 語言模型化 ·

2024 年 12 月 17 日

OmniEval: An Omnidirectional and Automatic RAG Evaluation Benchmark in Financial Domain

Shuting Wang,Jiejun Tan,Zhicheng Dou,Ji-Rong Wen

As a typical and practical application of Large Language Models (LLMs), Retrieval-Augmented Generation (RAG) techniques have gained extensive attention, particularly in vertical domains where LLMs may lack domain-specific knowledge. In this paper, we introduce an omnidirectional and automatic RAG benchmark, OmniEval, in the financial domain. Our benchmark is characterized by its multi-dimensional evaluation framework, including (1) a matrix-based RAG scenario evaluation system that categorizes queries into five task classes and 16 financial topics, leading to a structured assessment of diverse query scenarios; (2) a multi-dimensional evaluation data generation approach, which combines GPT-4-based automatic generation and human annotation, achieving an 87.47\% acceptance ratio in human evaluations on generated instances; (3) a multi-stage evaluation system that evaluates both retrieval and generation performance, result in a comprehensive evaluation on the RAG pipeline; and (4) robust evaluation metrics derived from rule-based and LLM-based ones, enhancing the reliability of assessments through manual annotations and supervised fine-tuning of an LLM evaluator. Our experiments demonstrate the comprehensiveness of OmniEval, which includes extensive test datasets and highlights the performance variations of RAG systems across diverse topics and tasks, revealing significant opportunities for RAG models to improve their capabilities in vertical domains. We open source the code of our benchmark in \href{//github.com/RUC-NLPIR/OmniEval}{//github.com/RUC-NLPIR/OmniEval}.

MoDELS · 數據集 · 語言模型化 · 原點 · 相似度 ·

2024 年 12 月 17 日

Detecting Document-level Paraphrased Machine Generated Content: Mimicking Human Writing Style and Involving Discourse Features

Yupei Li,Manuel Milling,Lucia Specia,Bj?rn W. Schuller

The availability of high-quality APIs for Large Language Models (LLMs) has facilitated the widespread creation of Machine-Generated Content (MGC), posing challenges such as academic plagiarism and the spread of misinformation. Existing MGC detectors often focus solely on surface-level information, overlooking implicit and structural features. This makes them susceptible to deception by surface-level sentence patterns, particularly for longer texts and in texts that have been subsequently paraphrased. To overcome these challenges, we introduce novel methodologies and datasets. Besides the publicly available dataset Plagbench, we developed the paraphrased Long-Form Question and Answer (paraLFQA) and paraphrased Writing Prompts (paraWP) datasets using GPT and DIPPER, a discourse paraphrasing tool, by extending artifacts from their original versions. To address the challenge of detecting highly similar paraphrased texts, we propose MhBART, an encoder-decoder model designed to emulate human writing style while incorporating a novel difference score mechanism. This model outperforms strong classifier baselines and identifies deceptive sentence patterns. To better capture the structure of longer texts at document level, we propose DTransformer, a model that integrates discourse analysis through PDTB preprocessing to encode structural features. It results in substantial performance gains across both datasets -- 15.5\% absolute improvement on paraLFQA, 4\% absolute improvement on paraWP, and 1.5\% absolute improvement on M4 compared to SOTA approaches.