
Cluster · MoDELS · Language Modeling · Large Language Models · GPUs ·

May 28, 2024

Pipette: Automatic Fine-grained Large Language Model Training Configurator for Real-World Clusters

Jinkyu Yim,Jaeyong Song,Yerim Choi,Jaebeen Lee,Jaewon Jung,Hongsun Jang,Jinho Lee

from arxiv, published at DATE 2024

Training large language models (LLMs) is known to be challenging because of the huge computational and memory capacity requirements. To address these issues, it is common to use a cluster of GPUs with 3D parallelism, which splits a model along the data batch, pipeline stage, and intra-layer tensor dimensions. However, the use of 3D parallelism produces the additional challenge of finding the optimal number of ways on each dimension and mapping the split models onto the GPUs. Several previous studies have attempted to automatically find the optimal configuration, but many of these lacked several important aspects. For instance, the heterogeneous nature of the interconnect speeds is often ignored. While the peak bandwidths for the interconnects are usually made equal, the actual attained bandwidth varies per link in real-world clusters. Combined with the critical path modeling that does not properly consider the communication, they easily fall into sub-optimal configurations. In addition, they often fail to consider the memory requirement per GPU, often recommending solutions that could not be executed. To address these challenges, we propose Pipette, which is an automatic fine-grained LLM training configurator for real-world clusters. By devising better performance models along with the memory estimator and fine-grained individual GPU assignment, Pipette achieves faster configurations that satisfy the memory constraints. We evaluated Pipette on large clusters to show that it provides a significant speedup over the prior art. The implementation of Pipette is available at //github.com/yimjinkyu1/date2024_pipette.
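As a rough illustration of the search space the abstract refers to, the sketch below lists every way to split a GPU count across the data, pipeline, and tensor dimensions. It is not Pipette's algorithm: the function name `enumerate_3d_configs` is hypothetical, and the sketch ignores the per-link bandwidth and per-GPU memory constraints that the abstract says a real configurator must model.

```python
def enumerate_3d_configs(num_gpus):
    """Return every (dp, pp, tp) triple whose product equals num_gpus.

    dp = data-parallel ways, pp = pipeline stages, tp = tensor-parallel ways.
    Hypothetical helper for illustration only.
    """
    divisors = [d for d in range(1, num_gpus + 1) if num_gpus % d == 0]
    configs = []
    for dp in divisors:
        for pp in divisors:
            # pp must divide what is left after choosing dp.
            if (num_gpus // dp) % pp == 0:
                tp = num_gpus // (dp * pp)
                configs.append((dp, pp, tp))
    return configs


if __name__ == "__main__":
    for dp, pp, tp in enumerate_3d_configs(8):
        print(f"dp={dp} pp={pp} tp={tp}")
```

Each triple is only a candidate. Choosing among them, and deciding which physical GPU hosts which shard, is the part that needs a performance model and a memory estimator.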

Related content

MoDELS · Controller · Performer · Large Language Models · Learning ·

July 9, 2024

ICLGuard: Controlling In-Context Learning Behavior for Applicability Authorization

Wai Man Si,Michael Backes,Yang Zhang

In-context learning (ICL) is a recent advancement in the capabilities of large language models (LLMs). This feature allows users to perform a new task without updating the model. Concretely, users can address tasks at inference time by conditioning on a few input-label pair demonstrations along with the test input. This differs from the conventional fine-tuning paradigm and offers more flexibility. However, this capability also introduces potential issues. For example, users may use the model on any data without restriction, such as performing tasks with improper or sensitive content, which might violate the model policy or conflict with the model owner's interests. As a model owner, it is crucial to establish a mechanism to control the model's behavior under ICL, depending on the model owner's requirements for various content. To this end, we introduce the concept of "applicability authorization" tailored for LLMs, particularly for ICL behavior, and propose a simple approach, ICLGuard. It is a fine-tuning framework designed to allow the model owner to regulate ICL behavior on different data. ICLGuard preserves the original LLM and fine-tunes only a minimal set of additional trainable parameters to "guard" the LLM. Empirical results show that the guarded LLM can deactivate its ICL ability on target data without affecting its ICL ability on other data and its general functionality across all data.

Performer · Data Selection · Large Language Models · MoDELS · AIM ·

July 9, 2024

Entropy Law: The Story Behind Data Compression and LLM Performance

Mingjia Yin,Chuhan Wu,Yufei Wang,Hao Wang,Wei Guo,Yasheng Wang,Yong Liu,Ruiming Tang,Defu Lian,Enhong Chen

Data is the cornerstone of large language models (LLMs), but not all data is useful for model learning. Carefully selected data can better elicit the capabilities of LLMs with much less computational overhead. Most methods concentrate on evaluating the quality of individual samples in data selection, while the combinatorial effects among samples are neglected. Even if each sample is of perfect quality, their combinations may be suboptimal in teaching LLMs due to their intrinsic homogeneity or contradiction. In this paper, we aim to uncover the underlying relationships between LLM performance and data selection. Inspired by the information compression nature of LLMs, we uncover an "entropy law" that connects LLM performance with data compression ratio and first-epoch training loss, which reflect the information redundancy of a dataset and the mastery of inherent knowledge encoded in this dataset, respectively. Through both theoretical deduction and empirical evaluation, we find that model performance is negatively correlated to the compression ratio of training data, which usually yields a lower training loss. Based on the findings of the entropy law, we propose an efficient and universal data selection method named ZIP for training LLMs, which aims to prioritize data subsets exhibiting a low compression ratio. Based on a multi-stage algorithm that selects diverse data in a greedy manner, we can obtain a good data subset with satisfactory diversity. Extensive experiments have been conducted to validate the entropy law and the superiority of ZIP across different LLM backbones and alignment stages. We also present an interesting application of entropy law that can detect potential performance risks at the beginning of model training.
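The idea of preferring subsets with a low compression ratio can be sketched in a few lines. This is a simplified single-stage greedy loop, not the multi-stage ZIP algorithm from the paper. It assumes the compression ratio is uncompressed size divided by compressed size (so a lower ratio suggests less redundancy) and uses `zlib` as a stand-in compressor; the function names are hypothetical.

```python
import zlib


def compression_ratio(texts):
    """Uncompressed bytes divided by zlib-compressed bytes (assumed definition)."""
    raw = "\n".join(texts).encode("utf-8")
    if not raw:
        return 1.0
    return len(raw) / len(zlib.compress(raw, 9))


def greedy_select(pool, k):
    """Pick up to k samples, each time adding the one that keeps the
    compression ratio of the selected set lowest (simplified sketch)."""
    selected = []
    remaining = list(pool)
    while remaining and len(selected) < k:
        best = min(remaining, key=lambda t: compression_ratio(selected + [t]))
        selected.append(best)
        remaining.remove(best)
    return selected


if __name__ == "__main__":
    pool = [
        "a" * 200,
        "a" * 200,
        "The quick brown fox jumps over the lazy dog 1234567890",
    ]
    print(greedy_select(pool, 2))
```

Recompressing the whole selected set for every candidate is quadratic in the pool size, which is fine for a toy pool but is presumably one reason a practical method would use a staged procedure.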

Agent · INTERACT · Large Language Models · Notability · INFORMS ·

July 9, 2024

FinCon: A Synthesized LLM Multi-Agent System with Conceptual Verbal Reinforcement for Enhanced Financial Decision Making

Yangyang Yu,Zhiyuan Yao,Haohang Li,Zhiyang Deng,Yupeng Cao,Zhi Chen,Jordan W. Suchow,Rong Liu,Zhenyu Cui,Denghui Zhang,Zhaozhuo Xu,Koduvayur Subbalakshmi,Guojun Xiong,Yueru He,Jimin Huang,Dong Li,Qianqian Xie

from arxiv, LLM Applications, LLM Agents, Financial Technology, Quantitative Finance, Algorithmic Trading, Cognitive Science

Large language models (LLMs) have demonstrated notable potential in conducting complex tasks and are increasingly utilized in various financial applications. However, high-quality sequential financial investment decision-making remains challenging. These tasks require multiple interactions with a volatile environment for every decision, demanding sufficient intelligence to maximize returns and manage risks. Although LLMs have been used to develop agent systems that surpass human teams and yield impressive investment returns, opportunities to enhance multi-sourced information synthesis and optimize decision-making outcomes through timely experience refinement remain unexplored. Here, we introduce the FinCon, an LLM-based multi-agent framework with CONceptual verbal reinforcement tailored for diverse FINancial tasks. Inspired by effective real-world investment firm organizational structures, FinCon utilizes a manager-analyst communication hierarchy. This structure allows for synchronized cross-functional agent collaboration towards unified goals through natural language interactions and equips each agent with greater memory capacity than humans. Additionally, a risk-control component in FinCon enhances decision quality by episodically initiating a self-critiquing mechanism to update systematic investment beliefs. The conceptualized beliefs serve as verbal reinforcement for the future agent's behavior and can be selectively propagated to the appropriate node that requires knowledge updates. This feature significantly improves performance while reducing unnecessary peer-to-peer communication costs. Moreover, FinCon demonstrates strong generalization capabilities in various financial tasks, including single stock trading and portfolio management.

MoDELS · Language Modeling · AIM · Continuity · Performance ·

July 8, 2024

Igea: a Decoder-Only Language Model for Biomedical Text Generation in Italian

Tommaso Mario Buonocore,Simone Rancati,Enea Parimbelli

from arxiv, 6 pages, 1 figure, 3 tables

The development of domain-specific language models has significantly advanced natural language processing applications in various specialized fields, particularly in biomedicine. However, the focus has largely been on English-language models, leaving a gap for less-resourced languages such as Italian. This paper introduces Igea, the first decoder-only language model designed explicitly for biomedical text generation in Italian. Built on the Minerva model and continually pretrained on a diverse corpus of Italian medical texts, Igea is available in three model sizes: 350 million, 1 billion, and 3 billion parameters. The models aim to balance computational efficiency and performance, addressing the challenges of managing the peculiarities of medical terminology in Italian. We evaluate Igea using a mix of in-domain biomedical corpora and general-purpose benchmarks, highlighting its efficacy and retention of general knowledge even after the domain-specific training. This paper discusses the model's development and evaluation, providing a foundation for future advancements in Italian biomedical NLP.

Diversity · Large Language Models · Datasets · Distillation · MoDELS ·

July 8, 2024

SynthesizRR: Generating Diverse Datasets with Retrieval Augmentation

Abhishek Divekar,Greg Durrett

from arxiv, Code available at //github.com/amazon-science/synthesizrr

It is often desirable to distill the capabilities of large language models (LLMs) into smaller student models due to compute and memory constraints. One way to do this for classification tasks is via dataset synthesis, which can be accomplished by generating examples of each label from the LLM. Prior approaches to synthesis use few-shot prompting, which relies on the LLM's parametric knowledge to generate usable examples. However, this leads to issues of repetition, bias towards popular entities, and stylistic differences from human text. In this work, we propose Synthesize by Retrieval and Refinement (SynthesizRR), which uses retrieval augmentation to introduce variety into the dataset synthesis process: as retrieved passages vary, the LLM is seeded with different content to generate its examples. We empirically study the synthesis of six datasets, covering topic classification, sentiment analysis, tone detection, and humor, requiring complex synthesis strategies. We find that SynthesizRR greatly improves lexical and semantic diversity, similarity to human-written text, and distillation performance, when compared to 32-shot prompting and four prior approaches. We release our extensive codebase at //github.com/amazon-science/synthesizrr

Pruning · Reducible · MoDELS · Language Modeling · Large Language Models ·

July 8, 2024

Pruning Large Language Models to Intra-module Low-rank Architecture with Transitional Activations

Bowen Shen,Zheng Lin,Daren Zha,Wei Liu,Jian Luan,Bin Wang,Weiping Wang

from arxiv, Findings of ACL 2024

Structured pruning fundamentally reduces computational and memory overheads of large language models (LLMs) and offers a feasible solution for end-side LLM deployment. Structurally pruned models remain dense and high-precision, highly compatible with further tuning and compression. However, as the coarse-grained structured pruning poses large damage to the highly interconnected model, achieving a high compression ratio for scaled-up LLMs remains a challenge. In this paper, we introduce a task-agnostic structured pruning approach coupled with a compact Transformer architecture design. The proposed approach, named TransAct, reduces transitional activations inside multi-head attention (MHA) and multi-layer perceptron (MLP) modules, while preserving the inter-module activations that are sensitive to perturbations. Hence, the LLM is pruned into an intra-module low-rank architecture, significantly reducing weights, KV Cache and attention computation. TransAct is implemented on the LLaMA model and evaluated on downstream benchmarks. Results verify the optimality of our approach at high compression with respect to both efficiency and performance. Further, ablation studies reveal the strength of activation-guided iterative pruning and provide experimental analysis on the redundancy of MHA and MLP modules.

Knowledge · Graph · Integration · Knowledge Graph · MoDELS ·

July 7, 2024

EffiQA: Efficient Question-Answering with Strategic Multi-Model Collaboration on Knowledge Graphs

Zixuan Dong,Baoyun Peng,Yufei Wang,Jia Fu,Xiaodong Wang,Yongxue Shan,Xin Zhou

from arxiv, 10 pages, 4 figures, 3 tables

While large language models (LLMs) have shown remarkable capabilities in natural language processing, they struggle with complex, multi-step reasoning tasks involving knowledge graphs (KGs). Existing approaches that integrate LLMs and KGs either underutilize the reasoning abilities of LLMs or suffer from prohibitive computational costs due to tight coupling. To address these limitations, we propose a novel collaborative framework named EffiQA that can strike a balance between performance and efficiency via an iterative paradigm. EffiQA consists of three stages: global planning, efficient KG exploration, and self-reflection. Specifically, EffiQA leverages the commonsense capability of LLMs to explore potential reasoning pathways through global planning. Then, it offloads semantic pruning to a small plug-in model for efficient KG exploration. Finally, the exploration results are fed to LLMs for self-reflection to further improve the global planning and efficient KG exploration. Empirical evidence on multiple KBQA benchmarks shows EffiQA's effectiveness, achieving an optimal balance between reasoning accuracy and computational costs. We hope the proposed new framework will pave the way for efficient, knowledge-intensive querying by redefining the integration of LLMs and KGs, fostering future research on knowledge-based question answering.

Language Modeling · Performer · Agent · MoDELS · Learning ·

May 19, 2023

Introspective Tips: Large Language Model for In-Context Decision Making

Liting Chen,Lu Wang,Hang Dong,Yali Du,Jie Yan,Fangkai Yang,Shuang Li,Pu Zhao,Si Qin,Saravan Rajmohan,Qingwei Lin,Dongmei Zhang

from arxiv, 22 pages, 4 figures

The emergence of large language models (LLMs) has substantially influenced natural language processing, demonstrating exceptional results across various tasks. In this study, we employ "Introspective Tips" to facilitate LLMs in self-optimizing their decision-making. By introspectively examining trajectories, the LLM refines its policy by generating succinct and valuable tips. Our method enhances the agent's performance in both few-shot and zero-shot learning situations by considering three essential scenarios: learning from the agent's past experiences, integrating expert demonstrations, and generalizing across diverse games. Importantly, we accomplish these improvements without fine-tuning the LLM parameters; rather, we adjust the prompt to generalize insights from the three aforementioned situations. Our framework not only supports but also emphasizes the advantage of employing LLMs in in-context decision-making. Experiments involving over 100 games in TextWorld illustrate the superior performance of our approach.

Task-oriented Dialogue Systems · INFORMS · Graph · Networking · entity ·

August 11, 2020

KBGN: Knowledge-Bridge Graph Network for Adaptive Vision-Text Reasoning in Visual Dialogue

Xiaoze Jiang,Siyi Du,Zengchang Qin,Yajing Sun,Jing Yu

from arxiv, Accepted by the 28th ACM International Conference on Multimedia (ACM MM 2020)

Visual dialogue is a challenging task that needs to extract implicit information from both visual (image) and textual (dialogue history) contexts. Classical approaches pay more attention to the integration of the current question, vision knowledge and text knowledge, neglecting the heterogeneous semantic gaps between the cross-modal information. In the meantime, the concatenation operation has become the de facto standard for cross-modal information fusion, although it has a limited ability in information retrieval. In this paper, we propose a novel Knowledge-Bridge Graph Network (KBGN) model that uses a graph to bridge the cross-modal semantic relations between vision and text knowledge in fine granularity, and retrieves the required knowledge via an adaptive information selection mode. Moreover, the reasoning clues for visual dialogue can be clearly drawn from intra-modal entities and inter-modal bridges. Experimental results on VisDial v1.0 and VisDial-Q datasets demonstrate that our model outperforms existing models with state-of-the-art results.

Processing (programming language) · MoDELS · NLP · Taxonomy · Language Representation ·

March 18, 2020

Pre-trained Models for Natural Language Processing: A Survey

Xipeng Qiu,Tianxiang Sun,Yige Xu,Yunfan Shao,Ning Dai,Xuanjing Huang

from arxiv, Invited Review of Science China Technological Sciences

Recently, the emergence of pre-trained models (PTMs) has brought natural language processing (NLP) to a new era. In this survey, we provide a comprehensive review of PTMs for NLP. We first briefly introduce language representation learning and its research progress. Then we systematically categorize existing PTMs based on a taxonomy with four perspectives. Next, we describe how to adapt the knowledge of PTMs to the downstream tasks. Finally, we outline some potential directions of PTMs for future research. This survey is purposed to be a hands-on guide for understanding, using, and developing PTMs for various NLP tasks.


北京阿比特科技有限公司

Registered address: Room 301-191, 3rd Floor, Building 2, No. 18 Yangfangdian Road, Haidian District, Beijing
