Despite the impressive capabilities of large language models (LLMs) across diverse applications, they still suffer from trustworthiness issues, such as hallucinations and misalignments. Retrieval-augmented language models (RAG) have been proposed to enhance the credibility of generations by grounding them in external knowledge, but the theoretical understanding of their generation risks remains unexplored. In this paper, we answer: 1) whether RAG can indeed lead to low generation risks, 2) how to provide provable guarantees on the generation risks of RAG and vanilla LLMs, and 3) what sufficient conditions enable RAG models to reduce generation risks. We propose C-RAG, the first framework to certify generation risks for RAG models. Specifically, we provide conformal risk analysis for RAG models and certify an upper confidence bound of generation risks, which we refer to as conformal generation risk. We also provide theoretical guarantees on conformal generation risks for general bounded risk functions under test distribution shifts. We prove that RAG achieves a lower conformal generation risk than that of a single LLM when the quality of the retrieval model and transformer is non-trivial. Our intensive empirical results demonstrate the soundness and tightness of our conformal generation risk guarantees across four widely used NLP datasets on four state-of-the-art retrieval models.
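To make the notion of an upper confidence bound on a bounded generation risk more concrete, the following is a minimal sketch. It is not the C-RAG certificate from the paper: it swaps in the simpler one-sided Hoeffding inequality, assumes i.i.d. calibration samples with risks in [0, 1], and does not handle distribution shift. The function name `hoeffding_risk_upper_bound` and the calibration values are hypothetical.

```python
import math


def hoeffding_risk_upper_bound(risks, delta=0.1):
    """Upper bound on the expected risk that holds with probability >= 1 - delta.

    Assumption: `risks` are i.i.d. draws of a risk function bounded in [0, 1],
    measured on a held-out calibration set.
    """
    n = len(risks)
    if n == 0:
        raise ValueError("need at least one calibration sample")
    if not 0.0 < delta < 1.0:
        raise ValueError("delta must lie strictly between 0 and 1")
    if any(r < 0.0 or r > 1.0 for r in risks):
        raise ValueError("every risk must lie in [0, 1]")

    empirical_risk = sum(risks) / n
    # One-sided Hoeffding slack: P(true - empirical >= t) <= exp(-2 * n * t^2).
    slack = math.sqrt(math.log(1.0 / delta) / (2.0 * n))
    return min(1.0, empirical_risk + slack)


# Hypothetical calibration risks (e.g. 1 - answer quality) for two systems.
vanilla_bound = hoeffding_risk_upper_bound([0.6, 0.5, 0.7, 0.4] * 50)
rag_bound = hoeffding_risk_upper_bound([0.3, 0.2, 0.4, 0.1] * 50)
```

Under these assumptions, comparing `vanilla_bound` with `rag_bound` mirrors the kind of comparison the abstract describes, although the paper's own bound is derived differently.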

Related content


Agent · INTERACT · Knowledge · Large language models · INFORMS

April 11, 2024

WESE: Weak Exploration to Strong Exploitation for LLM Agents

Xu Huang, Weiwen Liu, Xiaolong Chen, Xingmei Wang, Defu Lian, Yasheng Wang, Ruiming Tang, Enhong Chen

Recently, large language models (LLMs) have demonstrated remarkable potential as intelligent agents. However, existing research mainly focuses on enhancing the agent's reasoning or decision-making abilities through well-designed prompt engineering or task-specific fine-tuning, ignoring the procedure of exploration and exploitation. When addressing complex tasks within open-world interactive environments, these methods exhibit limitations. First, the lack of global information about the environment leads to greedy decisions, resulting in sub-optimal solutions. Second, irrelevant information acquired from the environment not only introduces noise, but also incurs additional cost. This paper proposes a novel approach, Weak Exploration to Strong Exploitation (WESE), to enhance LLM agents in solving open-world interactive tasks. Concretely, WESE decouples the exploration and exploitation processes, employing a cost-effective weak agent to perform exploration tasks for global knowledge. A knowledge graph-based strategy is then introduced to store the acquired knowledge and extract task-relevant knowledge, improving the stronger agent's success rate and efficiency on the exploitation task. Our approach is flexible enough to incorporate diverse tasks, and obtains significant improvements in both success rates and efficiency across four interactive benchmarks.

MoDELS · Multimodal · Comprehensibility · Integration · Processing (programming language)

April 8, 2024

MA-LMM: Memory-Augmented Large Multimodal Model for Long-Term Video Understanding

Bo He, Hengduo Li, Young Kyun Jang, Menglin Jia, Xuefei Cao, Ashish Shah, Abhinav Shrivastava, Ser-Nam Lim

from arxiv, Accepted at CVPR 2024

With the success of large language models (LLMs), integrating vision models into LLMs to build vision-language foundation models has attracted growing interest recently. However, existing LLM-based large multimodal models (e.g., Video-LLaMA, VideoChat) can only take in a limited number of frames for short video understanding. In this study, we mainly focus on designing an efficient and effective model for long-term video understanding. Instead of trying to process more frames simultaneously like most existing work, we propose to process videos in an online manner and store past video information in a memory bank. This allows our model to reference historical video content for long-term analysis without exceeding LLMs' context length constraints or GPU memory limits. Our memory bank can be seamlessly integrated into current multimodal LLMs in an off-the-shelf manner. We conduct extensive experiments on various video understanding tasks, such as long-video understanding, video question answering, and video captioning, and our model achieves state-of-the-art performance across multiple datasets. Code available at //boheumd.github.io/MA-LMM/.
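The online processing with a bounded memory bank described above can be pictured with a small sketch. The abstract does not say how the bank stays within its budget, so the compression rule used here (averaging the two most similar adjacent entries once capacity is exceeded) is an assumption made for illustration, as are the class name `MemoryBank` and the toy feature vectors.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length feature vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


class MemoryBank:
    """Fixed-capacity store of per-frame features, filled one frame at a time."""

    def __init__(self, capacity):
        if capacity < 1:
            raise ValueError("capacity must be at least 1")
        self.capacity = capacity
        self.entries = []

    def add(self, feature):
        """Append the newest frame feature, compressing if over capacity."""
        self.entries.append(list(feature))
        if len(self.entries) > self.capacity:
            self._compress()

    def _compress(self):
        # Assumed rule: merge the most similar pair of temporally adjacent
        # entries by averaging, so redundant neighbouring frames collapse first.
        best = max(
            range(len(self.entries) - 1),
            key=lambda i: cosine(self.entries[i], self.entries[i + 1]),
        )
        merged = [
            (a + b) / 2
            for a, b in zip(self.entries[best], self.entries[best + 1])
        ]
        self.entries[best:best + 2] = [merged]


# Frames arrive online; the bank never holds more than `capacity` entries.
bank = MemoryBank(capacity=2)
for frame_feature in ([1, 0], [1, 0], [0, 1]):
    bank.add(frame_feature)
```

Because the bank length is capped, the amount of history passed to the language model stays constant no matter how long the video runs, which is the property the abstract emphasizes.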

MoDELS · Performer · Large language models · Language modeling · Chatbot

April 4, 2024

ChipNeMo: Domain-Adapted LLMs for Chip Design

Mingjie Liu, Teodor-Dumitru Ene, Robert Kirby, Chris Cheng, Nathaniel Pinckney, Rongjian Liang, Jonah Alben, Himyanshu Anand, Sanmitra Banerjee, Ismet Bayraktaroglu, Bonita Bhaskaran, Bryan Catanzaro, Arjun Chaudhuri, Sharon Clay, Bill Dally, Laura Dang, Parikshit Deshpande, Siddhanth Dhodhi, Sameer Halepete, Eric Hill, Jiashang Hu, Sumit Jain, Ankit Jindal, Brucek Khailany, George Kokai, Kishor Kunal, Xiaowei Li, Charley Lind, Hao Liu, Stuart Oberman, Sujeet Omar, Ghasem Pasandi, Sreedhar Pratty, Jonathan Raiman, Ambar Sarkar, Zhengjiang Shao, Hanfei Sun, Pratik P Suthar, Varun Tej, Walker Turner, Kaizhe Xu, Haoxing Ren

from arxiv, Updated results for ChipNeMo-70B model

ChipNeMo aims to explore the applications of large language models (LLMs) for industrial chip design. Instead of directly deploying off-the-shelf commercial or open-source LLMs, we adopt the following domain adaptation techniques: domain-adaptive tokenization, domain-adaptive continued pretraining, model alignment with domain-specific instructions, and domain-adapted retrieval models. We evaluate these methods on three selected LLM applications for chip design: an engineering assistant chatbot, EDA script generation, and bug summarization and analysis. Our evaluations demonstrate that domain-adaptive pretraining of language models can lead to superior performance in domain-related downstream tasks compared to their base LLaMA2 counterparts, without degradation in generic capabilities. In particular, our largest model, ChipNeMo-70B, outperforms the highly capable GPT-4 on two of our use cases, namely engineering assistant chatbot and EDA script generation, while exhibiting competitive performance on bug summarization and analysis. These results underscore the potential of domain-specific customization for enhancing the effectiveness of large language models in specialized applications.

tuning · MoDELS · Language modeling · Performer · Agent

April 4, 2024

CMAT: A Multi-Agent Collaboration Tuning Framework for Enhancing Small Language Models

Xuechen Liang, Meiling Tao, Tianyu Shi, Yiting Xie

Open large language models (LLMs) have significantly advanced the field of natural language processing, showcasing impressive performance across various tasks. Despite these advancements, their effective operation still relies heavily on human input to accurately guide the dialogue flow, with agent tuning being a crucial optimization technique that involves human adjustments to the model for better response to such guidance. Addressing this dependency, our work introduces the TinyAgent model, trained on a meticulously curated high-quality dataset. We also present the Collaborative Multi-Agent Tuning (CMAT) framework, an innovative system designed to augment language agent capabilities through adaptive weight updates based on environmental feedback. This framework fosters collaborative learning and real-time adaptation among multiple intelligent agents, enhancing their context-awareness and long-term memory. In this research, we propose a new communication agent framework that integrates multi-agent systems with environmental feedback mechanisms, offering a scalable method to explore cooperative behaviors. Notably, our TinyAgent-7B model exhibits performance on par with GPT-3.5, despite having fewer parameters, signifying a substantial improvement in the efficiency and effectiveness of LLMs.

CP · Conformer · Language modeling · Large language models · MoDELS

April 4, 2024

API Is Enough: Conformal Prediction for Large Language Models Without Logit-Access

Jiayuan Su, Jing Luo, Hongwei Wang, Lu Cheng

This study aims to address the pervasive challenge of quantifying uncertainty in large language models (LLMs) without logit-access. Conformal Prediction (CP), known for its model-agnostic and distribution-free features, is a desirable approach for various LLMs and data distributions. However, existing CP methods for LLMs typically assume access to the logits, which are unavailable for some API-only LLMs. In addition, logits are known to be miscalibrated, potentially leading to degraded CP performance. To tackle these challenges, we introduce a novel CP method that (1) is tailored for API-only LLMs without logit-access; (2) minimizes the size of prediction sets; and (3) ensures a statistical guarantee of the user-defined coverage. The core idea of this approach is to formulate nonconformity measures using both coarse-grained (i.e., sample frequency) and fine-grained uncertainty notions (e.g., semantic similarity). Experimental results on both close-ended and open-ended Question Answering tasks show that our approach mostly outperforms the logit-based CP baselines.
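As a rough illustration of how a nonconformity measure can be built from sample frequency alone, here is a minimal split conformal prediction sketch. It is not the paper's method: it uses only the coarse-grained frequency signal (no semantic similarity), and the function names, the score `1 - relative frequency`, and the toy calibration data are all assumptions. It also restricts prediction sets to answers that were actually sampled, so a question whose true answer never appears among the samples cannot be covered.

```python
import math
from collections import Counter


def frequency_scores(samples):
    """Nonconformity of each distinct sampled answer: 1 - relative frequency."""
    counts = Counter(samples)
    total = len(samples)
    return {answer: 1.0 - count / total for answer, count in counts.items()}


def calibrate_threshold(calibration, alpha=0.1):
    """Conformal quantile of calibration scores for target coverage 1 - alpha.

    `calibration` is a list of (samples, true_answer) pairs, where `samples`
    are answers drawn repeatedly from the API for one question.
    """
    # A true answer that was never sampled receives the maximal score 1.0.
    scores = sorted(
        frequency_scores(samples).get(truth, 1.0)
        for samples, truth in calibration
    )
    n = len(scores)
    rank = math.ceil((n + 1) * (1 - alpha))
    if rank > n:
        # Too few calibration points for this alpha: accept every candidate.
        return float("inf")
    return scores[rank - 1]


def prediction_set(samples, threshold):
    """All sampled answers whose nonconformity does not exceed the threshold."""
    return {a for a, s in frequency_scores(samples).items() if s <= threshold}


# Toy calibration data (hypothetical): four questions, four samples each.
calibration = [
    (["a", "a", "a", "b"], "a"),
    (["x", "y", "y", "y"], "y"),
    (["p", "q", "p", "q"], "p"),
    (["m", "m", "m", "m"], "m"),
]
threshold = calibrate_threshold(calibration, alpha=0.5)
answers = prediction_set(["a", "a", "a", "b"], threshold)
```

Only repeated API calls are needed to compute these scores, which is why a frequency-based measure suits the logit-free setting the abstract targets.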

Large language models · Language modeling · MoDELS · HTTPS · Mathematics

April 3, 2024

ChatGLM-Math: Improving Math Problem-Solving in Large Language Models with a Self-Critique Pipeline

Yifan Xu, Xiao Liu, Xinghan Liu, Zhenyu Hou, Yueyan Li, Xiaohan Zhang, Zihan Wang, Aohan Zeng, Zhengxiao Du, Wenyi Zhao, Jie Tang, Yuxiao Dong

Large language models (LLMs) have shown excellent mastery of human language, but still struggle in real-world applications that require mathematical problem-solving. While many strategies and datasets to enhance LLMs' mathematics have been developed, it remains a challenge to simultaneously maintain and improve both language and mathematical capabilities in deployed LLM systems. In this work, we tailor the Self-Critique pipeline, which addresses the challenge in the feedback learning stage of LLM alignment. We first train a general Math-Critique model from the LLM itself to provide feedback signals. Then, we sequentially employ rejective fine-tuning and direct preference optimization over the LLM's own generations for data collection. Based on ChatGLM3-32B, we conduct a series of experiments on both academic datasets and our newly created challenging dataset, MathUserEval. Results show that our pipeline significantly enhances the LLM's mathematical problem-solving while still improving its language ability, outperforming LLMs that could be two times larger. Related techniques have been deployed to ChatGLM (//chatglm.cn), an online serving LLM. The related evaluation dataset and scripts are released at //github.com/THUDM/ChatGLM-Math.

Graph · U-Net · Neural Networks · Convolution · Networking

April 2, 2024

MAgNET: A Graph U-Net Architecture for Mesh-Based Simulations

Saurabh Deshpande, Stéphane P. A. Bordas, Jakub Lengiewicz

In many cutting-edge applications, high-fidelity computational models prove to be too slow for practical use and are therefore replaced by much faster surrogate models. Recently, deep learning techniques have increasingly been utilized to accelerate such predictions. To enable learning on large-dimensional and complex data, specific neural network architectures have been developed, including convolutional and graph neural networks. In this work, we present a novel encoder-decoder geometric deep learning framework called MAgNET, which extends the well-known convolutional neural networks to accommodate arbitrary graph-structured data. MAgNET consists of innovative Multichannel Aggregation (MAg) layers and graph pooling/unpooling layers, forming a graph U-Net architecture that is analogous to convolutional U-Nets. We demonstrate the predictive capabilities of MAgNET in surrogate modeling for non-linear finite element simulations in the mechanics of solids.

INTERACT · Large language models · MoDELS · HTTPS · Training data

April 1, 2024

LLM Attributor: Interactive Visual Attribution for LLM Generation

Seongmin Lee, Zijie J. Wang, Aishwarya Chakravarthy, Alec Helbling, ShengYun Peng, Mansi Phute, Duen Horng Chau, Minsuk Kahng

from arxiv, 8 pages, 3 figures, For a video demo, see //youtu.be/mIG2MDQKQxM

While large language models (LLMs) have shown remarkable capability to generate convincing text across diverse domains, concerns around their potential risks have highlighted the importance of understanding the rationale behind text generation. We present LLM Attributor, a Python library that provides interactive visualizations for training data attribution of an LLM's text generation. Our library offers a new way to quickly attribute an LLM's text generation to training data points to inspect model behaviors, enhance its trustworthiness, and compare model-generated text with user-provided text. We describe the visual and interactive design of our tool and highlight usage scenarios for LLaMA2 models fine-tuned with two different datasets: online articles about recent disasters and finance-related question-answer pairs. Thanks to LLM Attributor's broad support for computational notebooks, users can easily integrate it into their workflow to interactively visualize attributions of their models. For easier access and extensibility, we open-source LLM Attributor at //github.com/poloclub/LLM-Attribution. The video demo is available at //youtu.be/mIG2MDQKQxM.

Machine Learning · Distributed machine learning · Learning · Performer · Cluster

March 29, 2024

TACOS: Topology-Aware Collective Algorithm Synthesizer for Distributed Machine Learning

William Won, Midhilesh Elavazhagan, Sudarshan Srinivasan, Ajaya Durg, Samvit Kaul, Swati Gupta, Tushar Krishna

The surge of artificial intelligence, specifically large language models, has led to the rapid development of large-scale machine learning training clusters. Collective communications within these clusters tend to be heavily bandwidth-bound, necessitating techniques to optimally utilize the available network bandwidth. This puts the routing algorithm for the collective at the forefront of determining the performance. Unfortunately, communication libraries used in distributed machine learning today are limited by a fixed set of routing algorithms. This constrains collective performance within the domain of next-generation training clusters that employ intricate, heterogeneous, and asymmetric large-scale topologies. Further, the emergence of irregular topologies attributed to runtime phenomena such as device failures serves to compound the complexity of the challenge. To this end, this paper introduces TACOS, an automated synthesizer that generates topology-aware collective algorithms for common distributed machine learning collectives across arbitrary input network topologies. TACOS was able to synthesize an All-Reduce algorithm for a heterogeneous 512-NPU system in just 6.09 minutes while achieving a performance improvement of up to 4.27x over state-of-the-art prior work. TACOS exhibits high scalability, with synthesis time scaling quadratically with the number of NPUs. In contrast to prior works' NP-hard approaches, TACOS with 40K NPUs completes in 2.52 hours.
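For readers unfamiliar with collective algorithms, the sketch below simulates the classic ring All-Reduce, one well-known example of a fixed algorithm tied to a single topology. It is not TACOS and does not synthesize anything; it only shows what an All-Reduce schedule looks like on an assumed homogeneous ring. The function name `ring_all_reduce` and the toy data are hypothetical.

```python
def ring_all_reduce(data):
    """Simulate a sum All-Reduce over n nodes arranged in a ring.

    data[i] is node i's local vector, already split into n scalar chunks.
    Returns the per-node vectors after the collective; the input is not modified.
    """
    n = len(data)
    if any(len(vec) != n for vec in data):
        raise ValueError("each node must hold exactly n chunks")
    buf = [list(vec) for vec in data]

    # Phase 1, reduce-scatter: after n - 1 steps node i holds the full sum
    # of chunk (i + 1) % n.
    for step in range(n - 1):
        # Snapshot all outgoing messages first so sends within a step are
        # simultaneous, as they would be on real hardware.
        sends = [((i - step) % n, buf[i][(i - step) % n]) for i in range(n)]
        for i, (chunk, value) in enumerate(sends):
            buf[(i + 1) % n][chunk] += value

    # Phase 2, all-gather: each fully reduced chunk travels once around the ring.
    for step in range(n - 1):
        sends = [
            ((i + 1 - step) % n, buf[i][(i + 1 - step) % n]) for i in range(n)
        ]
        for i, (chunk, value) in enumerate(sends):
            buf[(i + 1) % n][chunk] = value

    return buf


# Three nodes, three chunks each (toy values).
local = [[1, 2, 3], [10, 20, 30], [100, 200, 300]]
reduced = ring_all_reduce(local)
```

Every node only ever talks to its ring successor here, which works well on a symmetric ring but leaves links unused on the heterogeneous or irregular topologies the abstract describes.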

Language modeling · Performer · Agent · MoDELS · Learning

May 19, 2023

Introspective Tips: Large Language Model for In-Context Decision Making

Liting Chen, Lu Wang, Hang Dong, Yali Du, Jie Yan, Fangkai Yang, Shuang Li, Pu Zhao, Si Qin, Saravan Rajmohan, Qingwei Lin, Dongmei Zhang

from arxiv, 22 pages, 4 figures

The emergence of large language models (LLMs) has substantially influenced natural language processing, demonstrating exceptional results across various tasks. In this study, we employ "Introspective Tips" to help LLMs self-optimize their decision-making. By introspectively examining trajectories, the LLM refines its policy by generating succinct and valuable tips. Our method enhances the agent's performance in both few-shot and zero-shot learning situations by considering three essential scenarios: learning from the agent's past experiences, integrating expert demonstrations, and generalizing across diverse games. Importantly, we accomplish these improvements without fine-tuning the LLM parameters; rather, we adjust the prompt to generalize insights from the three aforementioned situations. Our framework not only supports but also emphasizes the advantage of employing LLMs in in-context decision-making. Experiments involving over 100 games in TextWorld illustrate the superior performance of our approach.