国产综合欧美日韩激情在线_国产真实乱人伦视频在线观看_就去色成人网_三年片在线看免费观看_成人国产亚洲精品2区_久久精品黄色夫妻视频_手机免费中文字幕第二区

Nathaniel Li,Alexander Pan,Anjali Gopal,Summer Yue,Daniel Berrios,Alice Gatti,Justin D. Li,Ann-Kathrin Dombrowski,Shashwat Goel,Long Phan,Gabriel Mukobi,Nathan Helm-Burger,Rassin Lababidi,Lennart Justen,Andrew B. Liu,Michael Chen,Isabelle Barrass,Oliver Zhang,Xiaoyuan Zhu,Rishub Tamirisa,Bhrugu Bharathi,Adam Khoja,Zhenqi Zhao,Ariel Herbert-Voss,Cort B. Breuer,Samuel Marks,Oam Patel,Andy Zou,Mantas Mazeika,Zifan Wang,Palash Oswal,Weiran Lin,Adam A. Hunt,Justin Tienken-Harder,Kevin Y. Shih,Kemper Talley,John Guan,Russell Kaplan,Ian Steneker,David Campbell,Brad Jokubaitis,Alex Levinson,Jean Wang,William Qian,Kallol Krishna Karmakar,Steven Basart,Stephen Fitz,Mindy Levine,Ponnurangam Kumaraguru,Uday Tupakula,Vijay Varadharajan,Ruoyu Wang,Yan Shoshitaishvili,Jimmy Ba,Kevin M. Esvelt,Alexandr Wang,Dan Hendrycks

from arxiv, See the project page at //wmdp.ai

The White House Executive Order on Artificial Intelligence highlights the risks of large language models (LLMs) empowering malicious actors in developing biological, cyber, and chemical weapons. To measure these risks of malicious use, government institutions and major AI labs are developing evaluations for hazardous capabilities in LLMs. However, current evaluations are private, preventing further research into mitigating risk. Furthermore, they focus on only a few, highly specific pathways for malicious use. To fill these gaps, we publicly release the Weapons of Mass Destruction Proxy (WMDP) benchmark, a dataset of 3,668 multiple-choice questions that serve as a proxy measurement of hazardous knowledge in biosecurity, cybersecurity, and chemical security. WMDP was developed by a consortium of academics and technical consultants, and was stringently filtered to eliminate sensitive information prior to public release. WMDP serves two roles: first, as an evaluation for hazardous knowledge in LLMs, and second, as a benchmark for unlearning methods to remove such hazardous knowledge. To guide progress on unlearning, we develop RMU, a state-of-the-art unlearning method based on controlling model representations. RMU reduces model performance on WMDP while maintaining general capabilities in areas such as biology and computer science, suggesting that unlearning may be a concrete path towards reducing malicious use from LLMs. We release our benchmark and code publicly at //wmdp.ai
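The abstract describes RMU only as "based on controlling model representations." The sketch below assumes the formulation commonly attributed to RMU, computed at a single hidden layer. Activations on forget-set (hazardous) text are pushed toward a fixed random direction `u` scaled by a coefficient `c`. Activations on retain-set (benign) text are kept close to those of a frozen copy of the model, and a weight `alpha` balances the two terms. Only the loss is computed. The random arrays stand in for real model activations, and the function name, shapes, and the values `c=6.5` and `alpha=100.0` are illustrative assumptions rather than the authors' code or settings.

```python
import numpy as np

def rmu_style_loss(h_forget, h_retain, h_retain_frozen, u, c, alpha):
    """Sketch of an RMU-style unlearning loss at one hidden layer (illustrative only).

    h_forget:        (n_tokens, d) activations of the model being updated, on forget-set text
    h_retain:        (n_tokens, d) activations of the model being updated, on retain-set text
    h_retain_frozen: (n_tokens, d) activations of the frozen original model, same retain text
    u:               (d,) fixed random unit vector; c scales it, alpha weights the retain term
    """
    # Forget term: squared distance to the scaled random direction c * u, averaged over tokens.
    forget = float(np.mean(np.sum((h_forget - c * u) ** 2, axis=-1)))
    # Retain term: squared distance to the frozen model's activations, averaged over tokens.
    retain = float(np.mean(np.sum((h_retain - h_retain_frozen) ** 2, axis=-1)))
    return forget + alpha * retain, forget, retain

# Toy usage: random arrays stand in for real model activations.
rng = np.random.default_rng(0)
d = 16
u = rng.random(d)
u /= np.linalg.norm(u)                      # random direction, normalized to unit length
h_frozen = rng.normal(size=(4, d))
total, forget, retain = rmu_style_loss(
    rng.normal(size=(4, d)), h_frozen.copy(), h_frozen, u, c=6.5, alpha=100.0
)
# retain is 0.0 here because the updated and frozen retain activations are identical.
```

Minimizing the forget term with respect to the updated model's weights scrambles the representations that later layers would use for hazardous content. The retain term anchors general capabilities, which matches the trade-off the abstract reports.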

Related content


Inference · MoDELS · Decoding · Large language models · FAST

June 24, 2024

Towards Fast Multilingual LLM Inference: Speculative Decoding and Specialized Drafters

Euiin Yi,Taehyeon Kim,Hongseok Jeung,Du-Seong Chang,Se-Young Yun

Large language models (LLMs) have revolutionized natural language processing and broadened their applicability across diverse commercial applications. However, deploying these models is constrained by high inference time in multilingual settings. To mitigate this challenge, this paper explores a training recipe for the assistant (draft) model in speculative decoding, which drafts future tokens that are then verified by the target LLM. We show that language-specific draft models, optimized through a targeted pretrain-and-finetune strategy, bring a substantial inference-time speedup compared to previous methods. We validate these models across various languages in terms of inference time, out-of-domain speedup, and GPT-4o evaluation.
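The paper's contribution is how the drafter is trained. The loop that a drafter plugs into can be sketched as greedy speculative decoding. The cheap draft model proposes `k` tokens, the target keeps the longest prefix it agrees with and then contributes one token of its own, so the output matches running the target alone. This greedy verification rule is an assumption for illustration; the paper may use a sampling-based acceptance rule. The lambda "models" over digits are hypothetical stand-ins for real LLMs.

```python
from typing import Callable, List, Tuple

NextToken = Callable[[List[int]], int]   # greedy next-token function: prefix -> token id

def speculative_decode(target: NextToken, draft: NextToken, prompt: List[int],
                       max_new_tokens: int, k: int = 4) -> Tuple[List[int], int]:
    """Greedy speculative decoding: returns (tokens, number of verification rounds)."""
    out, rounds = list(prompt), 0
    while len(out) - len(prompt) < max_new_tokens:
        # 1) The small draft model proposes k tokens autoregressively.
        proposal: List[int] = []
        for _ in range(k):
            proposal.append(draft(out + proposal))
        # 2) The target checks them. A real system scores all k+1 prefixes in ONE batched
        #    forward pass; calling the target per prefix here keeps the sketch simple.
        accepted: List[int] = []
        for tok in proposal:
            expected = target(out + accepted)
            if expected != tok:
                accepted.append(expected)   # first disagreement: keep the target's token, drop the rest
                break
            accepted.append(tok)
        else:
            accepted.append(target(out + accepted))  # all k accepted: target adds a bonus token
        out.extend(accepted)
        rounds += 1
    return out[: len(prompt) + max_new_tokens], rounds

# Toy "models" over digits 0-9: the draft agrees with the target except right after token 4.
target = lambda seq: (seq[-1] * 3 + 1) % 10
draft = lambda seq: 0 if seq[-1] == 4 else (seq[-1] * 3 + 1) % 10
tokens, rounds = speculative_decode(target, draft, prompt=[1], max_new_tokens=12)
```

Here 12 tokens are produced in 4 target verification rounds instead of 12 sequential target steps. A drafter that agrees with the target more often, such as a language-specific one, raises the number of tokens accepted per round.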

MoDELS · Processing (programming language) · Scalar · Identifiable · Pairwise

June 24, 2024

Towards Comprehensive Preference Data Collection for Reward Modeling

Yulan Hu,Qingyang Li,Sheng Ouyang,Ge Chen,Kaihui Chen,Lijun Mei,Xucheng Ye,Fuzheng Zhang,Yong Liu

Reinforcement Learning from Human Feedback (RLHF) facilitates the alignment of large language models (LLMs) with human preferences, thereby enhancing the quality of responses generated. A critical component of RLHF is the reward model, which is trained on preference data and outputs a scalar reward during the inference stage. However, the collection of preference data still lacks thorough investigation. Recent studies indicate that preference data is collected either by AI or humans, where chosen and rejected instances are identified among pairwise responses. We question whether this process effectively filters out noise and ensures sufficient diversity in collected data. To address these concerns, for the first time, we propose a comprehensive framework for preference data collection, decomposing the process into four incremental steps: Prompt Generation, Response Generation, Response Filtering, and Human Labeling. This structured approach ensures the collection of high-quality preferences while reducing reliance on human labor. We conducted comprehensive experiments based on the data collected at different stages, demonstrating the effectiveness of the proposed data collection method.
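The abstract states that the reward model is trained on pairwise preference data and outputs a scalar reward. A common training objective for this, assumed here and not necessarily the one used in this paper, is the Bradley-Terry pairwise loss `-log sigmoid(r_chosen - r_rejected)`. The sketch computes that loss from reward scores. The reward values are made up.

```python
import math
from typing import Sequence

def softplus(x: float) -> float:
    """Numerically stable log(1 + exp(x))."""
    return max(x, 0.0) + math.log1p(math.exp(-abs(x)))

def pairwise_reward_loss(r_chosen: Sequence[float], r_rejected: Sequence[float]) -> float:
    """Mean Bradley-Terry loss, -log sigmoid(r_chosen - r_rejected), over preference pairs."""
    if len(r_chosen) != len(r_rejected) or not r_chosen:
        raise ValueError("need equally many chosen and rejected rewards")
    # -log(sigmoid(m)) == softplus(-m): small when the chosen response scores higher.
    return sum(softplus(rr - rc) for rc, rr in zip(r_chosen, r_rejected)) / len(r_chosen)

# Scalar rewards a reward model might output for three (chosen, rejected) pairs.
loss = pairwise_reward_loss([2.0, 0.5, 1.0], [0.0, 0.5, 3.0])
```

Noisy pairs, where the "chosen" response is actually worse, produce large loss terms such as the third pair above. This is one reason the filtering steps the paper proposes matter for reward-model quality.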

Multimodal · Analysis · Agent · Language modeling · Wearable devices

June 20, 2024

LLaSA: Large Multimodal Agent for Human Activity Analysis Through Wearable Sensors

Sheikh Asif Imran,Mohammad Nur Hossain Khan,Subrata Biswas,Bashima Islam

from arxiv, Under review at ARR (for EMNLP 2024)

Integrating inertial measurement units (IMUs) with large language models (LLMs) advances multimodal AI by enhancing human activity understanding. We introduce SensorCaps, a dataset of 26,288 IMU-derived activity narrations, and OpenSQA, an instruction-following dataset with 257,562 question-answer pairs. Combining LIMU-BERT and Llama, we develop LLaSA, a Large Multimodal Agent capable of interpreting and responding to activity and motion analysis queries. Our evaluation demonstrates LLaSA's effectiveness in activity classification and question answering, highlighting its potential in healthcare, sports science, and human-computer interaction. These contributions advance sensor-aware language models and open new research avenues. Our code repository and datasets are available at //github.com/BASHLab/LLaSA.
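The abstract does not say how LIMU-BERT and Llama are connected. One common design for such multimodal agents, assumed here, is a learned linear projection that maps sensor-encoder outputs into the LLM's token-embedding space and prepends them to the question's embeddings. The sketch shows only that wiring. The function name, sizes, and random weights are hypothetical.

```python
import numpy as np

def build_llm_inputs(imu_features, text_embeddings, W_proj, b_proj):
    """Prepend projected sensor features to the text token embeddings.

    imu_features:    (n_windows, d_sensor) outputs of an IMU encoder (a LIMU-BERT-like model)
    text_embeddings: (n_text_tokens, d_model) LLM token embeddings for the question
    W_proj, b_proj:  learned linear projection, shapes (d_sensor, d_model) and (d_model,)
    """
    sensor_tokens = imu_features @ W_proj + b_proj       # map into the LLM embedding space
    return np.concatenate([sensor_tokens, text_embeddings], axis=0)

# Toy sizes; real encoder and LLM widths would differ.
rng = np.random.default_rng(0)
d_sensor, d_model = 8, 32
imu = rng.normal(size=(5, d_sensor))
text = rng.normal(size=(12, d_model))
W, b = rng.normal(size=(d_sensor, d_model)) * 0.02, np.zeros(d_model)
inputs = build_llm_inputs(imu, text, W, b)   # shape (17, 32), fed to the LLM as one sequence
```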

Language modeling · Large language models · MoDELS · Optimizer · Estimation/Estimator

June 20, 2024

ReaLHF: Optimized RLHF Training for Large Language Models through Parameter Reallocation

Zhiyu Mei,Wei Fu,Kaiwei Li,Guangju Wang,Huanchen Zhang,Yi Wu

from arxiv, 13 pages (15 pages with references), 13 figures

Reinforcement Learning from Human Feedback (RLHF) stands as a pivotal technique in empowering large language model (LLM) applications. Since RLHF involves diverse computational workloads and intricate dependencies among multiple LLMs, directly adopting parallelization techniques from supervised training can result in sub-optimal performance. To overcome this limitation, we propose a novel approach named parameter ReaLlocation, which dynamically redistributes LLM parameters in the cluster and adapts parallelization strategies during training. Building upon this idea, we introduce ReaLHF, a pioneering system capable of automatically discovering and running efficient execution plans for RLHF training given the desired algorithmic and hardware configurations. ReaLHF formulates the execution plan for RLHF as an augmented dataflow graph. Based on this formulation, ReaLHF employs a tailored search algorithm with a lightweight cost estimator to discover an efficient execution plan. Subsequently, the runtime engine deploys the selected plan by effectively parallelizing computations and redistributing parameters. We evaluate ReaLHF on the LLaMA-2 models with up to $4\times70$ billion parameters and 128 GPUs. The experiment results showcase ReaLHF's substantial speedups of $2.0\times$ to $10.6\times$ compared to baselines. Furthermore, the execution plans generated by ReaLHF exhibit an average of $26\%$ performance improvement over heuristic approaches based on Megatron-LM. The source code of ReaLHF is publicly available at //github.com/openpsi-project/ReaLHF .
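The idea of "a search algorithm with a lightweight cost estimator" can be illustrated at toy scale. The sketch exhaustively enumerates data, tensor, and pipeline parallel degrees that use all GPUs, discards layouts whose model shard does not fit in memory, and keeps the cheapest. Exhaustive enumeration stands in for ReaLHF's tailored search. Every constant and the cost formula itself are hypothetical and are not ReaLHF's estimator. Parameter redistribution between RLHF stages is not modeled.

```python
from itertools import product
from typing import Optional, Tuple

Plan = Tuple[int, int, int]  # (data, tensor, pipeline) parallel degrees

GPU_MEM_GB = 80          # hypothetical per-GPU memory
BYTES_PER_PARAM = 18     # hypothetical: weights + grads + optimizer state in mixed precision
GPU_FLOPS = 1e14         # hypothetical sustained FLOP/s per GPU
MICRO_BATCHES = 8        # hypothetical pipeline micro-batches

def estimate_cost(plan: Plan, params_b: float, tokens: float) -> float:
    """Toy cost model (seconds per training step); NOT ReaLHF's estimator."""
    dp, tp, pp = plan
    if params_b * BYTES_PER_PARAM / (tp * pp) > GPU_MEM_GB:
        return float("inf")                                   # model shard does not fit
    compute = 6 * params_b * 1e9 * tokens / (dp * tp * pp * GPU_FLOPS)
    tp_comm = 0.05 * (tp - 1)                                 # crude tensor-parallel all-reduce cost
    pp_bubble = compute * (pp - 1) / MICRO_BATCHES            # idle time in the pipeline
    return compute + tp_comm + pp_bubble

def search_plan(n_gpus: int, params_b: float, tokens: float) -> Tuple[Optional[Plan], float]:
    """Score every (dp, tp, pp) layout that uses all GPUs; keep the cheapest."""
    best, best_cost = None, float("inf")
    for tp, pp in product([1, 2, 4, 8], repeat=2):
        if n_gpus % (tp * pp):
            continue
        plan = (n_gpus // (tp * pp), tp, pp)
        cost = estimate_cost(plan, params_b, tokens)
        if cost < best_cost:
            best, best_cost = plan, cost
    return best, best_cost

plan, cost = search_plan(n_gpus=128, params_b=70, tokens=1e6)
```

With these made-up constants, a 70B-parameter model on 128 GPUs gets `(dp, tp, pp) = (8, 8, 2)`. Memory forces `tp * pp >= 16`, and the toy model penalizes pipeline bubbles more than tensor-parallel communication.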

MoDELS · Learning · Data availability · Multimodal · Range

June 18, 2024

Synergizing Foundation Models and Federated Learning: A Survey

Shenghui Li,Fanghua Ye,Meng Fang,Jiaxu Zhao,Yun-Hin Chan,Edith C.-H. Ngai,Thiemo Voigt

The recent development of Foundation Models (FMs), represented by large language models, vision transformers, and multimodal models, has been making a significant impact on both academia and industry. Compared with small-scale models, FMs have a much stronger demand for high-volume data during the pre-training phase. Although general FMs can be pre-trained on data collected from open sources such as the Internet, domain-specific FMs need proprietary data, posing a practical challenge regarding the amount of data available due to privacy concerns. Federated Learning (FL) is a collaborative learning paradigm that breaks the barrier of data availability from different participants. Therefore, it provides a promising solution to customize and adapt FMs to a wide range of domain-specific tasks using distributed datasets whilst preserving privacy. This survey paper discusses the potentials and challenges of synergizing FL and FMs and summarizes core techniques, future directions, and applications. A periodically updated paper collection on FM-FL is available at //github.com/lishenghui/awesome-fm-fl.
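The core mechanism that lets FL "break the barrier of data availability" is that clients share model updates, not data. The canonical aggregation rule is FedAvg, which takes a dataset-size-weighted average of client parameters. It is shown below as a general illustration, not as a technique specific to this survey. The client parameters and sizes are hypothetical.

```python
from typing import List, Sequence

def fed_avg(client_params: Sequence[Sequence[float]], client_sizes: Sequence[int]) -> List[float]:
    """FedAvg aggregation: average client parameter vectors, weighted by local dataset size.

    Only parameters reach the server; each client's raw data stays local.
    """
    if not client_params or len(client_params) != len(client_sizes):
        raise ValueError("need at least one client and one dataset size per client")
    total = sum(client_sizes)
    dim = len(client_params[0])
    return [
        sum(params[i] * n for params, n in zip(client_params, client_sizes)) / total
        for i in range(dim)
    ]

# Two hypothetical clients: the one with 3x more data pulls the average toward itself.
global_params = fed_avg([[1.0, 2.0], [3.0, 4.0]], client_sizes=[1, 3])   # [2.5, 3.5]
```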

Optimizer · Language modeling · MoDELS · CRAFT · Prompt

June 17, 2024

Optimizing Instructions and Demonstrations for Multi-Stage Language Model Programs

Krista Opsahl-Ong,Michael J Ryan,Josh Purtell,David Broman,Christopher Potts,Matei Zaharia,Omar Khattab

from arxiv, Krista and Michael contributed equally to this work

Language Model Programs, i.e. sophisticated pipelines of modular language model (LM) calls, are increasingly advancing NLP tasks, but they require crafting prompts that are jointly effective for all modules. We study prompt optimization for LM programs, i.e. how to update these prompts to maximize a downstream metric without access to module-level labels or gradients. To make this tractable, we factorize our problem into optimizing the free-form instructions and few-shot demonstrations of every module and introduce several strategies to craft task-grounded instructions and navigate credit assignment across modules. Our strategies include (i) program- and data-aware techniques for proposing effective instructions, (ii) a stochastic mini-batch evaluation function for learning a surrogate model of our objective, and (iii) a meta-optimization procedure in which we refine how LMs construct proposals over time. Using these insights we develop MIPRO, a novel optimizer that outperforms baselines on five of six diverse LM programs using a best-in-class open-source model (Llama-3-8B), by as much as 12.9% in accuracy. We will release our new optimizers and benchmark in DSPy at //github.com/stanfordnlp/dspy
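Two ideas from the abstract are easy to show at toy scale: scoring candidate prompt configurations on random mini-batches instead of the full training set, and assigning a program-level score back to each module's choice when no module-level labels exist. The sketch below is plain random search with a per-module mean-score table standing in for a surrogate. It is NOT MIPRO, which fits a Bayesian surrogate and proposes instructions with an LM. Candidates here are just indices, and the metric is hypothetical.

```python
import random
from collections import defaultdict
from typing import Callable, Dict, Sequence, Tuple

Config = Tuple[int, ...]  # index of the chosen candidate (instruction + demos) for each module

def minibatch_prompt_search(
    n_candidates: Sequence[int],
    trainset: Sequence[object],
    program_metric: Callable[[Config, object], float],
    n_trials: int = 200,
    batch_size: int = 4,
    seed: int = 0,
) -> Config:
    """Random search over per-module prompt candidates, scored on random mini-batches."""
    rng = random.Random(seed)
    totals: Dict[Tuple[int, int], float] = defaultdict(float)   # (module, candidate) -> summed score
    counts: Dict[Tuple[int, int], int] = defaultdict(int)
    for _ in range(n_trials):
        config = tuple(rng.randrange(n) for n in n_candidates)
        batch = rng.sample(list(trainset), batch_size)   # cheap, noisy estimate of the full metric
        score = sum(program_metric(config, ex) for ex in batch) / batch_size
        # No module-level labels: credit the program-level score to each module's choice.
        for module, cand in enumerate(config):
            totals[(module, cand)] += score
            counts[(module, cand)] += 1
    best = []
    for module, n in enumerate(n_candidates):
        tried = [c for c in range(n) if counts[(module, c)] > 0]
        best.append(max(tried, key=lambda c: totals[(module, c)] / counts[(module, c)]))
    return tuple(best)

# Hypothetical 2-module program: candidate 2 is best for module 0, candidate 1 for module 1.
metric = lambda config, ex: ((config[0] == 2) + (config[1] == 1)) / 2
best = minibatch_prompt_search([3, 2], trainset=list(range(20)), program_metric=metric)
```

The per-module table assumes each module's contribution is roughly independent of the others. That holds for the additive toy metric but generally not for real programs, which is why a learned surrogate is preferable in practice.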

Comprehensibility · MoDELS · Processing (programming language) · Automatic question answering · Tokenizer

June 17, 2024

NovelQA: Benchmarking Question Answering on Documents Exceeding 200K Tokens

Cunxiang Wang,Ruoxi Ning,Boqi Pan,Tonghui Wu,Qipeng Guo,Cheng Deng,Guangsheng Bao,Xiangkun Hu,Zheng Zhang,Qian Wang,Yue Zhang

The rapid advancement of Large Language Models (LLMs) has introduced a new frontier in natural language processing, particularly in understanding and processing long-context information. However, evaluating these models' long-context abilities remains a challenge due to the limitations of current benchmarks. To address this gap, we introduce NovelQA, a benchmark specifically designed to test the capabilities of LLMs on extended texts. Constructed from English novels, NovelQA offers a unique blend of complexity, length, and narrative coherence, making it an ideal tool for assessing deep textual understanding in LLMs. This paper presents the design and construction of NovelQA, highlighting its manual annotation and diverse question types. Our evaluation of long-context LLMs on NovelQA reveals significant insights into the models' performance, particularly emphasizing the challenges they face with multi-hop reasoning, detail-oriented questions, and extremely long inputs averaging more than 200,000 tokens. The results underscore the need for further advancements in LLMs to improve their long-context comprehension.

Language modeling · tuning · MoDELS · Large language models · Performer

June 17, 2024

DB-GPT-Hub: Towards Open Benchmarking Text-to-SQL Empowered by Large Language Models

Fan Zhou,Siqiao Xue,Danrui Qi,Wenhui Shi,Wang Zhao,Ganglin Wei,Hongyang Zhang,Caigai Jiang,Gangwei Jiang,Zhixuan Chu,Faqiang Chen

Large language models (LLMs) have become the dominant paradigm for the challenging task of text-to-SQL. LLM-empowered text-to-SQL methods are typically categorized into prompting-based and tuning-based approaches. Compared with prompting-based methods, benchmarking fine-tuned LLMs for text-to-SQL is important yet under-explored, partly owing to the prohibitively high computational cost. In this paper, we present DB-GPT-Hub, an open benchmark suite for LLM-empowered text-to-SQL that primarily focuses on tuning LLMs at large scales. The proposed benchmark consists of: 1. a standardized and comprehensive evaluation of text-to-SQL tasks by fine-tuning medium- to large-sized open LLMs; 2. a modularized, easy-to-extend codebase supporting mainstream LLMs and experimental scenarios, which prioritizes fine-tuning methods but can easily be extended to prompt-based settings. Our work investigates the potential gains and performance boundaries of tuning approaches compared with prompting approaches, and explores optimal solutions tailored to specific scenarios. We hope DB-GPT-Hub, along with these findings, enables further research and broad applications that would otherwise be difficult owing to the absence of a dedicated open benchmark. The project code has been released at //github.com/eosphoros-ai/DB-GPT-Hub.
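A standardized text-to-SQL evaluation typically includes execution accuracy: a predicted query counts as correct if running it returns the same rows as the gold query. The sketch uses Python's built-in `sqlite3` with a hypothetical one-table schema. It compares results as unordered multisets, which is a simplification (queries with `ORDER BY` would need an ordered comparison). DB-GPT-Hub's own evaluation scripts are not reproduced here.

```python
import sqlite3
from collections import Counter
from typing import Sequence, Tuple

def execution_match(conn: sqlite3.Connection, predicted_sql: str, gold_sql: str) -> bool:
    """True if both queries run and return the same multiset of rows (order ignored)."""
    try:
        predicted = conn.execute(predicted_sql).fetchall()
    except sqlite3.Error:
        return False  # invalid SQL from the model counts as wrong
    gold = conn.execute(gold_sql).fetchall()
    return Counter(predicted) == Counter(gold)

def execution_accuracy(conn: sqlite3.Connection, pairs: Sequence[Tuple[str, str]]) -> float:
    """Fraction of (predicted, gold) query pairs whose results match."""
    return sum(execution_match(conn, p, g) for p, g in pairs) / len(pairs)

# Hypothetical database and model outputs.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE singer (name TEXT, age INTEGER);
INSERT INTO singer VALUES ('Ann', 31), ('Bo', 25), ('Cy', 40);
""")
pairs = [
    ("SELECT name FROM singer WHERE age > 30", "SELECT name FROM singer WHERE age >= 31"),
    ("SELECT COUNT(*) FROM singer", "SELECT COUNT(name) FROM singer"),
    ("SELECT nme FROM singer", "SELECT name FROM singer"),   # typo: no such column
]
acc = execution_accuracy(conn, pairs)   # 2 of 3 predictions match
```

Execution-based scoring accepts semantically equivalent queries with different surface forms, such as the first two pairs, which exact string matching would reject.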

NLP · Language modeling · Large language models · MoDELS · Taxonomy

May 21, 2024

Large Language Models Meet NLP: A Survey

Libo Qin,Qiguang Chen,Xiachong Feng,Yang Wu,Yongheng Zhang,Yinghui Li,Min Li,Wanxiang Che,Philip S. Yu

While large language models (LLMs) like ChatGPT have shown impressive capabilities in Natural Language Processing (NLP) tasks, their potential in this field has not yet been systematically investigated. This study aims to address this gap by exploring the following questions: (1) How are LLMs currently applied to NLP tasks in the literature? (2) Have traditional NLP tasks already been solved with LLMs? (3) What is the future of LLMs for NLP? To answer these questions, we take the first step toward a comprehensive overview of LLMs in NLP. Specifically, we first introduce a unified taxonomy comprising (1) parameter-frozen application and (2) parameter-tuning application, offering a unified perspective for understanding the current progress of LLMs in NLP. Furthermore, we summarize the new frontiers and the associated challenges, aiming to inspire further groundbreaking advancements. We hope this work offers valuable insights into the potential and limitations of LLMs in NLP, while also serving as a practical guide for building effective LLMs in NLP.

Prompt · MoDELS · TOOLS · Continuity · INTERACT

November 21, 2023

Prompting Frameworks for Large Language Models: A Survey

Xiaoxia Liu,Jingyi Wang,Jun Sun,Xiaohan Yuan,Guoliang Dong,Peng Di,Wenhai Wang,Dongxia Wang

Since the launch of ChatGPT, a powerful AI chatbot developed by OpenAI, large language models (LLMs) have made significant advancements in both academia and industry, bringing about a fundamental engineering paradigm shift in many areas. While LLMs are powerful, it is also crucial to make the best use of that power, and here the "prompt" plays a core role. However, the booming LLMs themselves, including excellent APIs like ChatGPT, have several inherent limitations: 1) temporal lag of training data, and 2) the lack of physical capabilities to perform external actions. Recently, we have observed a trend of using prompt-based tools to better harness LLMs for downstream tasks, but systematic literature and standardized terminology are lacking, partly due to the rapid evolution of this field. Therefore, in this work, we survey related prompting tools and promote the concept of the "Prompting Framework" (PF), i.e. the framework for managing, simplifying, and facilitating interaction with large language models. We define the lifecycle of the PF as a hierarchical structure, from bottom to top, namely: Data Level, Base Level, Execute Level, and Service Level. We also systematically depict the overall landscape of the emerging PF field and discuss potential future research and challenges. To continuously track the developments in this area, we maintain a repository at //github.com/lxx0628/Prompting-Framework-Survey, which can be a useful resource-sharing platform for both academia and industry in this field.