
MoDELS · Language Modeling · TOOLS · Model Performance · Performance

September 26, 2024

Quality Matters: Evaluating Synthetic Data for Tool-Using LLMs

Shadi Iskander, Nachshon Cohen, Zohar Karnin, Ori Shapira, Sofia Tolmach

Training large language models (LLMs) for external tool usage is a rapidly expanding field, with recent research focusing on generating synthetic data to address the shortage of available data. However, the absence of systematic data quality checks poses complications for properly training and testing models. To that end, we propose two approaches for assessing the reliability of data for training LLMs to use external tools. The first approach uses intuitive, human-defined correctness criteria. The second approach uses a model-driven assessment with in-context evaluation. We conduct a thorough evaluation of data quality on two popular benchmarks, followed by an extrinsic evaluation that showcases the impact of data quality on model performance. Our results demonstrate that models trained on high-quality data outperform those trained on unvalidated data, even when trained with a smaller quantity of data. These findings empirically support the significance of assessing and ensuring the reliability of training data for tool-using LLMs.
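The first approach in the abstract above, human-defined correctness criteria, can be pictured as rule-based checks on each synthetic tool call. The sketch below is only an illustration under stated assumptions: the `ToolSpec` class, the two rules (required parameters present, no parameters outside the spec), and the example record layout are hypothetical and are not taken from the paper.

```python
from dataclasses import dataclass, field


@dataclass
class ToolSpec:
    """Hypothetical description of one tool: its name and its parameters."""
    name: str
    required: set[str]
    optional: set[str] = field(default_factory=set)


def check_tool_call(call: dict, specs: dict[str, ToolSpec]) -> list[str]:
    """Return a list of rule violations; an empty list means the call passes."""
    spec = specs.get(call.get("tool"))
    if spec is None:
        return [f"unknown tool: {call.get('tool')!r}"]

    problems = []
    given = set(call.get("arguments", {}))

    # Assumed rule 1: every required parameter must be supplied.
    missing = spec.required - given
    if missing:
        problems.append(f"missing required parameters: {sorted(missing)}")

    # Assumed rule 2: no parameter may fall outside the tool's spec.
    unexpected = given - spec.required - spec.optional
    if unexpected:
        problems.append(f"parameters not in the spec: {sorted(unexpected)}")

    return problems


def filter_dataset(examples: list[dict], specs: dict[str, ToolSpec]) -> list[dict]:
    """Keep only the examples whose tool call violates none of the rules."""
    return [ex for ex in examples if not check_tool_call(ex["call"], specs)]
```

Filtering this way trades quantity for quality, which is the trade-off the abstract reports on: a smaller validated set versus a larger unvalidated one.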

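Related content

MoDELS

The ACM/IEEE 23rd International Conference on Model Driven Engineering Languages and Systems (MODELS) is the premier conference series for model-driven software and systems engineering, organized with the support of ACM SIGSOFT and IEEE TCSE. Since 1998, MODELS has covered all aspects of modeling, from languages and methods to tools and applications. Its attendees come from diverse backgrounds, including researchers, academics, engineers, and industrial professionals. MODELS 2019 is a forum where participants can exchange cutting-edge research results and innovative practical experience around modeling and model-driven software and systems. This year's edition will give the modeling community an opportunity to further advance the foundations of modeling and to propose innovative applications of modeling in emerging areas such as cyber-physical systems, embedded systems, socio-technical systems, cloud computing, big data, machine learning, security, open source, and sustainability.

Networking · MoDELS · Large Language Models · Functional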

November 4, 2024

The LLM Language Network: A Neuroscientific Approach for Identifying Causally Task-Relevant Units

Badr AlKhamissi, Greta Tuckute, Antoine Bosselut, Martin Schrimpf

from arXiv, preprint

Large language models (LLMs) exhibit remarkable capabilities on not just language tasks, but also various tasks that are not linguistic in nature, such as logical reasoning and social inference. In the human brain, neuroscience has identified a core language system that selectively and causally supports language processing. We here ask whether similar specialization for language emerges in LLMs. We identify language-selective units within 18 popular LLMs, using the same localization approach that is used in neuroscience. We then establish the causal role of these units by demonstrating that ablating LLM language-selective units -- but not random units -- leads to drastic deficits in language tasks. Correspondingly, language-selective LLM units are more aligned to brain recordings from the human language system than random units. Finally, we investigate whether our localization method extends to other cognitive domains: while we find specialized networks in some LLMs for reasoning and social capabilities, there are substantial differences among models. These findings provide functional and causal evidence for specialization in large language models, and highlight parallels with the functional organization in the brain.

MoDELS · Learning · Language Modeling · TOOLS · TEAM

November 4, 2024

ACC-Debate: An Actor-Critic Approach to Multi-Agent Debate

Andrew Estornell, Jean-Francois Ton, Yuanshun Yao, Yang Liu

Large language models (LLMs) have demonstrated a remarkable ability to serve as general-purpose tools for various language-based tasks. Recent works have demonstrated that the efficacy of such models can be improved through iterative dialog between multiple models, frequently referred to as multi-agent debate (MAD). While debate shows promise as a means of improving model efficacy, most works in this area treat debate as an emergent behavior, rather than a learned behavior. In doing so, current debate frameworks rely on collaborative behaviors to have been sufficiently trained into off-the-shelf models. To address this limitation, we propose ACC-Debate, an Actor-Critic based learning framework to produce a two-agent team specialized in debate. We demonstrate that ACC-Debate outperforms SotA debate techniques on a wide array of benchmarks.

MoDELS · Weight · Tensor · Reducible · Language Modeling

November 3, 2024

TRAWL: Tensor Reduced and Approximated Weights for Large Language Models

Yiran Luo, Het Patel, Yu Fu, Dawon Ahn, Jia Chen, Yue Dong, Evangelos E. Papalexakis

from arXiv, 4 pages. Submitted to NAACL 2025 and under review

Recent research has shown that pruning large-scale language models for inference is an effective approach to improving model efficiency, significantly reducing model weights with minimal impact on performance. Interestingly, pruning can sometimes even enhance accuracy by removing noise that accumulates during training, particularly through matrix decompositions. However, recent work has primarily focused on single matrix decompositions or lower precision techniques, which may fail to fully capture structural patterns. To address these limitations, we introduce TRAWL (Tensor Reduced and Approximated Weights for Large Language Models), a technique that applies tensor decomposition across multiple weight matrices to effectively denoise LLMs by capturing global structural patterns. Our experiments show that TRAWL improves model performance by up to 16% over baseline models on benchmark datasets, without requiring additional data, training, or fine-tuning.

CSS · Confidence · Annotation · Large Language Models · Reviewer

November 1, 2024

LLM Confidence Evaluation Measures in Zero-Shot CSS Classification

David Farr, Iain Cruickshank, Nico Manzonelli, Nicholas Clark, Kate Starbird, Jevin West

Assessing classification confidence is critical for leveraging large language models (LLMs) in automated labeling tasks, especially in the sensitive domains presented by Computational Social Science (CSS) tasks. In this paper, we make three key contributions: (1) we propose an uncertainty quantification (UQ) performance measure tailored for data annotation tasks, (2) we compare, for the first time, five different UQ strategies across three distinct LLMs and CSS data annotation tasks, and (3) we introduce a novel UQ aggregation strategy that effectively identifies low-confidence LLM annotations and disproportionately uncovers data incorrectly labeled by the LLMs. Our results demonstrate that our proposed UQ aggregation strategy improves upon existing methods and can be used to significantly improve human-in-the-loop data annotation processes.

MoDELS · Language Modeling · TSE · Large Language Models · Design

October 31, 2024

Thought Space Explorer: Navigating and Expanding Thought Space for Large Language Model Reasoning

Jinghan Zhang, Fengran Mo, Xiting Wang, Kunpeng Liu

Recent advances in large language models (LLMs) have demonstrated their potential in handling complex reasoning tasks, which are usually achieved by constructing a thought chain to guide the model to solve the problem with multi-step thinking. However, existing methods often remain confined to previously explored solution spaces and thus overlook the critical blind spot within LLMs' cognitive range. To address these issues, we design the Thought Space Explorer (TSE), a novel framework to expand and optimize thought structures to guide LLMs to explore their blind spots of thinking. By generating new reasoning steps and branches based on the original thought structure with various designed strategies, TSE broadens the thought space and alleviates the impact of blind spots for LLM reasoning. Experimental results on multiple levels of reasoning tasks demonstrate the efficacy of TSE. We also conduct extensive analysis to understand how structured and expansive thought can contribute to unleashing the potential of LLM reasoning capabilities.

MoDELS · HTTPS · Language Modeling · Scaling · Range

October 31, 2024

GlotCC: An Open Broad-Coverage CommonCrawl Corpus and Pipeline for Minority Languages

Amir Hossein Kargaran, François Yvon, Hinrich Schütze

from arXiv, NeurIPS 2024

The need for large text corpora has increased with the advent of pretrained language models and, in particular, the discovery of scaling laws for these models. Most available corpora have sufficient data only for languages with large dominant communities. However, there is no corpus available that (i) covers a wide range of minority languages; (ii) is generated by an open-source reproducible pipeline; and (iii) is rigorously cleaned from noise, making it trustworthy to use. We present GlotCC, a clean, document-level, 2TB general domain corpus derived from CommonCrawl, covering more than 1000 languages. We make GlotCC and the system used to generate it - including the pipeline, language identification model, and filters - available to the research community. Corpus v. 1.0 https://huggingface.co/datasets/cis-lmu/GlotCC-v1, Pipeline v. 3.0 https://github.com/cisnlp/GlotCC.

Agent · Optimizer · contrastive · TOOLS · Large Language Models

October 31, 2024

AvaTaR: Optimizing LLM Agents for Tool Usage via Contrastive Reasoning

Shirley Wu, Shiyu Zhao, Qian Huang, Kexin Huang, Michihiro Yasunaga, Kaidi Cao, Vassilis N. Ioannidis, Karthik Subbian, Jure Leskovec, James Zou

from arXiv, NeurIPS 2024 main conference

Large language model (LLM) agents have demonstrated impressive capabilities in utilizing external tools and knowledge to boost accuracy and reduce hallucinations. However, developing prompting techniques that enable LLM agents to effectively use these tools and knowledge remains a heuristic and labor-intensive task. Here, we introduce AvaTaR, a novel and automated framework that optimizes an LLM agent to effectively leverage provided tools, improving performance on a given task. During optimization, we design a comparator module to iteratively deliver insightful and comprehensive prompts to the LLM agent by contrastively reasoning between positive and negative examples sampled from training data. We demonstrate AvaTaR on four complex multimodal retrieval datasets featuring textual, visual, and relational information, and three general question-answering (QA) datasets. We find AvaTaR consistently outperforms state-of-the-art approaches across all seven tasks, exhibiting strong generalization ability when applied to novel cases and achieving an average relative improvement of 14% on the Hit@1 metric for the retrieval datasets and 13% for the QA datasets. Code and dataset are available at https://github.com/zou-group/avatar.
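For readers unfamiliar with the metric quoted in the abstract above, here is a minimal sketch of Hit@k and of a relative improvement, using the common definition (a query counts as a hit if any of its top-k ranked items is relevant). The abstract does not spell out its exact evaluation protocol, so the function names and data layout are assumptions for illustration only.

```python
def hit_at_k(ranked_ids, relevant_ids, k=1):
    """Return 1.0 if any of the top-k ranked items is relevant, else 0.0."""
    relevant = set(relevant_ids)
    return 1.0 if any(item in relevant for item in ranked_ids[:k]) else 0.0


def mean_hit_at_k(queries, k=1):
    """Average Hit@k over a list of (ranked_ids, relevant_ids) pairs."""
    if not queries:
        return 0.0
    return sum(hit_at_k(ranked, rel, k) for ranked, rel in queries) / len(queries)


def relative_improvement(new_score, baseline_score):
    """Gain of new_score over baseline_score, as a fraction of the baseline."""
    return (new_score - baseline_score) / baseline_score
```

Under this definition, a relative improvement of 14% means the new score is 1.14 times the baseline score, not 14 percentage points higher.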

MoDELS · Language Modeling · Constraints · Large Language Models · Knowledge

October 31, 2024

ERBench: An Entity-Relationship based Automatically Verifiable Hallucination Benchmark for Large Language Models

Jio Oh, Soyeon Kim, Junseok Seo, Jindong Wang, Ruochen Xu, Xing Xie, Steven Euijong Whang

Large language models (LLMs) have achieved unprecedented performances in various applications, yet evaluating them is still challenging. Existing benchmarks are either manually constructed or are automatic, but lack the ability to evaluate the thought process of LLMs with arbitrary complexity. We contend that utilizing existing relational databases based on the entity-relationship (ER) model is a promising approach for constructing benchmarks as they contain structured knowledge that can be used to question LLMs. Unlike knowledge graphs, which are also used to evaluate LLMs, relational databases have integrity constraints that can be used to better construct complex in-depth questions and verify answers: (1) functional dependencies can be used to pinpoint critical keywords that an LLM must know to properly answer a given question containing certain attribute values; and (2) foreign key constraints can be used to join relations and construct multi-hop questions, which can be arbitrarily long and used to debug intermediate answers. We thus propose ERBench, which uses these integrity constraints to convert any database into an LLM benchmark. ERBench supports continuous evaluation as databases change, multimodal questions, and various prompt engineering techniques. In our experiments, we construct LLM benchmarks using databases of multiple domains and make an extensive comparison of contemporary LLMs. We show how ERBench can properly evaluate any LLM by not only checking for answer correctness, but also effectively verifying the rationales by looking for the right keywords.
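The two uses of integrity constraints described in the abstract above can be sketched on a toy schema. Everything in this example is hypothetical: the `movies` and `directors` tables, their contents, the question templates, and the keyword-matching `verify` function are illustrative assumptions, not ERBench's actual data or implementation.

```python
# Toy relational data. movies.director_id is a foreign key into directors,
# and (title, year) is assumed to functionally determine the director.
movies = [
    {"title": "Film A", "year": 2001, "director_id": 1},
    {"title": "Film B", "year": 2010, "director_id": 2},
]
directors = {
    1: {"name": "Director One", "birth_country": "Country X"},
    2: {"name": "Director Two", "birth_country": "Country Y"},
}


def fd_question(movie):
    """Use the functional dependency (title, year) -> director.

    The determined value becomes the keyword a correct answer must mention.
    """
    question = f"Who directed the {movie['year']} movie '{movie['title']}'?"
    keywords = [directors[movie["director_id"]]["name"]]
    return question, keywords


def multi_hop_question(movie):
    """Follow the foreign key to join movies with directors (two hops)."""
    director = directors[movie["director_id"]]
    question = (
        f"In which country was the director of the {movie['year']} movie "
        f"'{movie['title']}' born?"
    )
    # The intermediate answer (the name) and the final answer are both kept,
    # so a rationale can be checked hop by hop.
    keywords = [director["name"], director["birth_country"]]
    return question, keywords


def verify(response, keywords):
    """Accept a response only if it mentions every expected keyword."""
    text = response.lower()
    return all(keyword.lower() in text for keyword in keywords)
```

Keeping the intermediate keyword in the multi-hop case is what allows a response with the right final answer but the wrong rationale to be rejected.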

Language Modeling · Performer · Agent · MoDELS · Learning

May 19, 2023

Introspective Tips: Large Language Model for In-Context Decision Making

Liting Chen, Lu Wang, Hang Dong, Yali Du, Jie Yan, Fangkai Yang, Shuang Li, Pu Zhao, Si Qin, Saravan Rajmohan, Qingwei Lin, Dongmei Zhang

from arXiv, 22 pages, 4 figures

The emergence of large language models (LLMs) has substantially influenced natural language processing, demonstrating exceptional results across various tasks. In this study, we employ "Introspective Tips" to facilitate LLMs in self-optimizing their decision-making. By introspectively examining trajectories, the LLM refines its policy by generating succinct and valuable tips. Our method enhances the agent's performance in both few-shot and zero-shot learning situations by considering three essential scenarios: learning from the agent's past experiences, integrating expert demonstrations, and generalizing across diverse games. Importantly, we accomplish these improvements without fine-tuning the LLM parameters; rather, we adjust the prompt to generalize insights from the three aforementioned situations. Our framework not only supports but also emphasizes the advantage of employing LLMs in in-context decision-making. Experiments involving over 100 games in TextWorld illustrate the superior performance of our approach.

Distillation · BERT · Language Modeling · Performer · Understandability

September 23, 2019

TinyBERT: Distilling BERT for Natural Language Understanding

Xiaoqi Jiao, Yichun Yin, Lifeng Shang, Xin Jiang, Xiao Chen, Linlin Li, Fang Wang, Qun Liu

from arXiv, 13 pages, 2 figures, 9 tables

Language model pre-training, such as BERT, has significantly improved the performance of many natural language processing tasks. However, pre-trained language models are usually computationally expensive and memory intensive, so it is difficult to execute them effectively on some resource-restricted devices. To accelerate inference and reduce model size while maintaining accuracy, we first propose a novel transformer distillation method, a knowledge distillation (KD) method specially designed for transformer-based models. By leveraging this new KD method, the rich knowledge encoded in a large teacher BERT can be well transferred to a small student TinyBERT. Moreover, we introduce a new two-stage learning framework for TinyBERT, which performs transformer distillation at both the pre-training and task-specific learning stages. This framework ensures that TinyBERT can capture both the general-domain and task-specific knowledge of the teacher BERT. TinyBERT is empirically effective and achieves results comparable to BERT on GLUE datasets, while being 7.5x smaller and 9.4x faster at inference. TinyBERT is also significantly better than state-of-the-art baselines, even with only about 28% of the parameters and 31% of the inference time of the baselines.
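The abstract above does not give the form of its distillation objective, so the following is only a sketch of the generic knowledge-distillation idea it builds on: training a student to match a teacher's output distribution through a temperature-softened cross-entropy. It is not TinyBERT's transformer-specific method, and the function names and the temperature parameter are assumptions for illustration.

```python
import math


def softmax(logits, temperature=1.0):
    """Convert logits to probabilities; higher temperature gives a softer result."""
    scaled = [x / temperature for x in logits]
    peak = max(scaled)  # subtract the maximum for numerical stability
    exps = [math.exp(x - peak) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def soft_cross_entropy(student_logits, teacher_logits, temperature=1.0):
    """Cross-entropy of the student's distribution against the teacher's.

    For a fixed teacher, the value is smallest when the student's
    softened distribution equals the teacher's.
    """
    p_teacher = softmax(teacher_logits, temperature)
    p_student = softmax(student_logits, temperature)
    return -sum(pt * math.log(ps) for pt, ps in zip(p_teacher, p_student))
```

In a training loop this value would be computed per example and minimized with respect to the student's parameters while the teacher stays frozen.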
