亚洲男人的天堂2018av,欧美草比,久久久久久免费视频精选,国色天香在线看免费,久久久久亚洲av成人片仓井空

<tr id='4qzlm'><strong id='4qzlm'></strong><small id='4qzlm'></small><button id='4qzlm'></button><li id='4qzlm'><noscript id='4qzlm'><big id='4qzlm'></big><dt id='4qzlm'></dt></noscript></li></tr><ol id='4qzlm'><option id='4qzlm'><table id='4qzlm'><blockquote id='4qzlm'><tbody id='4qzlm'></tbody></blockquote></table></option></ol><u id='4qzlm'></u><kbd id='4qzlm'><kbd id='4qzlm'></kbd></kbd>

<code id='4qzlm'><strong id='4qzlm'></strong></code>

<fieldset id='4qzlm'></fieldset>

<span id='4qzlm'></span>

<ins id='4qzlm'></ins>

<acronym id='4qzlm'><em id='4qzlm'></em><td id='4qzlm'><div id='4qzlm'></div></td></acronym><address id='4qzlm'><big id='4qzlm'><big id='4qzlm'></big><legend id='4qzlm'></legend></big></address>

<i id='4qzlm'><div id='4qzlm'><ins id='4qzlm'></ins></div></i>

<i id='4qzlm'></i>

·

Performer · 縮放 · MoDELS · 語言模型化 · 知識 (knowledge) ·

2024 年 1 月 11 日

DeepSeekMoE: Towards Ultimate Expert Specialization in Mixture-of-Experts Language Models

Damai Dai,Chengqi Deng,Chenggang Zhao,R. X. Xu,Huazuo Gao,Deli Chen,Jiashi Li,Wangding Zeng,Xingkai Yu,Y. Wu,Zhenda Xie,Y. K. Li,Panpan Huang,Fuli Luo,Chong Ruan,Zhifang Sui,Wenfeng Liang

In the era of large language models, Mixture-of-Experts (MoE) is a promising architecture for managing computational costs when scaling up model parameters. However, conventional MoE architectures like GShard, which activate the top-$K$ out of $N$ experts, face challenges in ensuring expert specialization, i.e. each expert acquires non-overlapping and focused knowledge. In response, we propose the DeepSeekMoE architecture towards ultimate expert specialization. It involves two principal strategies: (1) finely segmenting the experts into $mN$ ones and activating $mK$ from them, allowing for a more flexible combination of activated experts; (2) isolating $K_s$ experts as shared ones, aiming at capturing common knowledge and mitigating redundancy in routed experts. Starting from a modest scale with 2B parameters, we demonstrate that DeepSeekMoE 2B achieves comparable performance with GShard 2.9B, which has 1.5 times the expert parameters and computation. In addition, DeepSeekMoE 2B nearly approaches the performance of its dense counterpart with the same number of total parameters, which set the upper bound of MoE models. Subsequently, we scale up DeepSeekMoE to 16B parameters and show that it achieves comparable performance with LLaMA2 7B, with only about 40% of computations. Further, our preliminary efforts to scale up DeepSeekMoE to 145B parameters consistently validate its substantial advantages over the GShard architecture, and show its performance comparable with DeepSeek 67B, using only 28.5% (maybe even 18.2%) of computations.

相關內容

Performer

Processing（編程語言） · MoDELS · 可約的 · Performer · INFORMS ·

2024 年 2 月 23 日

Where Visual Speech Meets Language: VSP-LLM Framework for Efficient and Context-Aware Visual Speech Processing

Jeong Hun Yeo,Seunghee Han,Minsu Kim,Yong Man Ro

In visual speech processing, context modeling capability is one of the most important requirements due to the ambiguous nature of lip movements. For example, homophenes, words that share identical lip movements but produce different sounds, can be distinguished by considering the context. In this paper, we propose a novel framework, namely Visual Speech Processing incorporated with LLMs (VSP-LLM), to maximize the context modeling ability by bringing the overwhelming power of LLMs. Specifically, VSP-LLM is designed to perform multi-tasks of visual speech recognition and translation, where the given instructions control the type of task. The input video is mapped to the input latent space of a LLM by employing a self-supervised visual speech model. Focused on the fact that there is redundant information in input frames, we propose a novel deduplication method that reduces the embedded visual features by employing visual speech units. Through the proposed deduplication and Low Rank Adaptors (LoRA), VSP-LLM can be trained in a computationally efficient manner. In the translation dataset, the MuAViC benchmark, we demonstrate that VSP-LLM can more effectively recognize and translate lip movements with just 15 hours of labeled data, compared to the recent translation model trained with 433 hours of labeld data.

MoDELS · CoT · 大語言模型 · 語言模型化 · Prompt ·

2024 年 2 月 23 日

Learning to Check: Unleashing Potentials for Self-Correction in Large Language Models

Che Zhang,Zhenyang Xiao,Chengcheng Han,Yixin Lian,Yuejian Fang

Large language models (LLMs) have made significant strides in reasoning capabilities, with ongoing efforts to refine their reasoning through self-correction. However, recent studies suggest that self-correction can be limited or even counterproductive without external accurate knowledge, raising questions about the limits and effectiveness of self-correction. In this paper, we aim to enhance LLM's self-checking capabilities by meticulously designing training data, thereby improving the accuracy of self-correction. We conduct a detailed analysis of error types in mathematical reasoning and develop a tailored prompt, termed "Step CoT Check". Then we construct a checking-correction dataset for training models. After integrating the original CoT data and checking-correction data for training, we observe that models could improve their self-checking capabilities, thereby enhancing their self-correction capacity and eliminating the need for external feedback or ground truth labels to ascertain the endpoint of correction. We compare the performance of models fine-tuned with the "Step CoT Check" prompt against those refined using other promps within the context of checking-correction data. The "Step CoT Check" outperforms the other two check formats in model with lager parameters, providing more precise feedback thus achieving a higher rate of correctness. For reproducibility, all the datasets and codes are provided in //github.com/bammt/Learn-to-check.

MoDELS · Learning · 可理解性 · 組合性 · 置信度 ·

2024 年 2 月 22 日

ConceptBed: Evaluating Concept Learning Abilities of Text-to-Image Diffusion Models

Maitreya Patel,Tejas Gokhale,Chitta Baral,Yezhou Yang

from arxiv, Accepted at AAAI'24 | Project page: //conceptbed.github.io

The ability to understand visual concepts and replicate and compose these concepts from images is a central goal for computer vision. Recent advances in text-to-image (T2I) models have lead to high definition and realistic image quality generation by learning from large databases of images and their descriptions. However, the evaluation of T2I models has focused on photorealism and limited qualitative measures of visual understanding. To quantify the ability of T2I models in learning and synthesizing novel visual concepts (a.k.a. personalized T2I), we introduce ConceptBed, a large-scale dataset that consists of 284 unique visual concepts, and 33K composite text prompts. Along with the dataset, we propose an evaluation metric, Concept Confidence Deviation (CCD), that uses the confidence of oracle concept classifiers to measure the alignment between concepts generated by T2I generators and concepts contained in target images. We evaluate visual concepts that are either objects, attributes, or styles, and also evaluate four dimensions of compositionality: counting, attributes, relations, and actions. Our human study shows that CCD is highly correlated with human understanding of concepts. Our results point to a trade-off between learning the concepts and preserving the compositionality which existing approaches struggle to overcome. The data, code, and interactive demo is available at: //conceptbed.github.io/

大語言模型 · MoDELS · Nuance · Performer · 線性的 ·

2024 年 2 月 22 日

CriticBench: Benchmarking LLMs for Critique-Correct Reasoning

Zicheng Lin,Zhibin Gou,Tian Liang,Ruilin Luo,Haowei Liu,Yujiu Yang

The ability of Large Language Models (LLMs) to critique and refine their reasoning is crucial for their application in evaluation, feedback provision, and self-improvement. This paper introduces CriticBench, a comprehensive benchmark designed to assess LLMs' abilities to critique and rectify their reasoning across a variety of tasks. CriticBench encompasses five reasoning domains: mathematical, commonsense, symbolic, coding, and algorithmic. It compiles 15 datasets and incorporates responses from three LLM families. Utilizing CriticBench, we evaluate and dissect the performance of 17 LLMs in generation, critique, and correction reasoning, i.e., GQC reasoning. Our findings reveal: (1) a linear relationship in GQC capabilities, with critique-focused training markedly enhancing performance; (2) a task-dependent variation in correction effectiveness, with logic-oriented tasks being more amenable to correction; (3) GQC knowledge inconsistencies that decrease as model size increases; and (4) an intriguing inter-model critiquing dynamic, where stronger models are better at critiquing weaker ones, while weaker models can surprisingly surpass stronger ones in their self-critique. We hope these insights into the nuanced critique-correct reasoning of LLMs will foster further research in LLM critique and self-improvement.

TOOLS · 回合 · Agent · Middleware · 大語言模型 ·

2024 年 2 月 22 日

Middleware for LLMs: Tools Are Instrumental for Language Agents in Complex Environments

Yu Gu,Yiheng Shu,Hao Yu,Xiao Liu,Yuxiao Dong,Jie Tang,Jayanth Srinivasa,Hugo Latapie,Yu Su

from arxiv, 16 pages, 8 figures, 4 tables

The applications of large language models (LLMs) have expanded well beyond the confines of text processing, signaling a new era where LLMs are envisioned as generalist language agents capable of operating within complex real-world environments. These environments are often highly expansive, making it impossible for the LLM to process them within its short-term memory. Motivated by recent research on extending the capabilities of LLMs with tools, this paper investigates the intriguing potential of tools to augment LLMs in handling such complexity. To this end, we design customized tools to aid in the proactive exploration within these massive environments. Such tools can serve as a middleware layer shielding the LLM from environmental complexity. In two representative complex environments -- knowledge bases (KBs) and databases -- we demonstrate the significant potential of augmenting language agents with tools in complex environments. Notably, equipped with these tools, GPT-4 achieves 2.8X the performance of the best baseline in tasks requiring access to database content and 2.2X in KB tasks. Our findings illuminate the path for advancing language agents in complex real-world applications.

語言模型化 · API · 泛函 · 大語言模型 · MoDELS ·

2024 年 2 月 22 日

Reverse Chain: A Generic-Rule for LLMs to Master Multi-API Planning

Yinger Zhang,Hui Cai,Xeirui Song,Yicheng Chen,Rui Sun,Jing Zheng

While enabling large language models to implement function calling (known as APIs) can greatly enhance the performance of Large Language Models (LLMs), function calling is still a challenging task due to the complicated relations between different APIs, especially in a context-learning setting without fine-tuning. This paper introduces ``Reverse Chain'', a controllable, target-driven approach designed to empower LLMs with the capability to operate external APIs only via prompts. Recognizing that most LLMs have limited tool-use capabilities, Reverse Chain limits LLMs to executing simple tasks, e.g., API Selection and Argument Completion. Furthermore, to manage a controllable multi-function calling, Reverse Chain adopts a generic rule based on a backward reasoning process. This rule determines when to do API selection or Argument completion. To evaluate the multi-tool-use capability of LLMs, we have released a compositional multi-tool task dataset, available at \url{//anonymous.4open.science/r/reverse-chain-8681}. Extensive numerical experiments validate the remarkable proficiency of Reverse Chain in managing multiple API calls.

詞元分析器 · Attention · MoDELS · Stable Diffusion · CC ·

2024 年 2 月 21 日

ToDo: Token Downsampling for Efficient Generation of High-Resolution Images

Ethan Smith,Nayan Saxena,Aninda Saha

Attention mechanism has been crucial for image diffusion models, however, their quadratic computational complexity limits the sizes of images we can process within reasonable time and memory constraints. This paper investigates the importance of dense attention in generative image models, which often contain redundant features, making them suitable for sparser attention mechanisms. We propose a novel training-free method ToDo that relies on token downsampling of key and value tokens to accelerate Stable Diffusion inference by up to 2x for common sizes and up to 4.5x or more for high resolutions like 2048x2048. We demonstrate that our approach outperforms previous methods in balancing efficient throughput and fidelity.

MoDELS · 評論員 · 語言模型化 · 可約的 · Continuity ·

2023 年 12 月 19 日

Parameter-Efficient Fine-Tuning Methods for Pretrained Language Models: A Critical Review and Assessment

Lingling Xu,Haoran Xie,Si-Zhao Joe Qin,Xiaohui Tao,Fu Lee Wang

from arxiv, 20 pages, 4 figures

With the continuous growth in the number of parameters of transformer-based pretrained language models (PLMs), particularly the emergence of large language models (LLMs) with billions of parameters, many natural language processing (NLP) tasks have demonstrated remarkable success. However, the enormous size and computational demands of these models pose significant challenges for adapting them to specific downstream tasks, especially in environments with limited computational resources. Parameter Efficient Fine-Tuning (PEFT) offers an effective solution by reducing the number of fine-tuning parameters and memory usage while achieving comparable performance to full fine-tuning. The demands for fine-tuning PLMs, especially LLMs, have led to a surge in the development of PEFT methods, as depicted in Fig. 1. In this paper, we present a comprehensive and systematic review of PEFT methods for PLMs. We summarize these PEFT methods, discuss their applications, and outline future directions. Furthermore, we conduct experiments using several representative PEFT methods to better understand their effectiveness in parameter efficiency and memory efficiency. By offering insights into the latest advancements and practical applications, this survey serves as an invaluable resource for researchers and practitioners seeking to navigate the challenges and opportunities presented by PEFT in the context of PLMs.

語言模型化 · Performer · Agent · MoDELS · Learning ·

2023 年 5 月 19 日

Introspective Tips: Large Language Model for In-Context Decision Making

Liting Chen,Lu Wang,Hang Dong,Yali Du,Jie Yan,Fangkai Yang,Shuang Li,Pu Zhao,Si Qin,Saravan Rajmohan,Qingwei Lin,Dongmei Zhang

from arxiv, 22 pages, 4 figures

The emergence of large language models (LLMs) has substantially influenced natural language processing, demonstrating exceptional results across various tasks. In this study, we employ ``Introspective Tips" to facilitate LLMs in self-optimizing their decision-making. By introspectively examining trajectories, LLM refines its policy by generating succinct and valuable tips. Our method enhances the agent's performance in both few-shot and zero-shot learning situations by considering three essential scenarios: learning from the agent's past experiences, integrating expert demonstrations, and generalizing across diverse games. Importantly, we accomplish these improvements without fine-tuning the LLM parameters; rather, we adjust the prompt to generalize insights from the three aforementioned situations. Our framework not only supports but also emphasizes the advantage of employing LLM in in-contxt decision-making. Experiments involving over 100 games in TextWorld illustrate the superior performance of our approach.

任務對話系統 · INFORMS · 圖 · Networking · entity ·

2020 年 8 月 11 日

KBGN: Knowledge-Bridge Graph Network for Adaptive Vision-Text Reasoning in Visual Dialogue

Xiaoze Jiang,Siyi Du,Zengchang Qin,Yajing Sun,Jing Yu

from arxiv, Accepted by the 28th ACM International Conference on Multimedia (ACM MM 2020)

Visual dialogue is a challenging task that needs to extract implicit information from both visual (image) and textual (dialogue history) contexts. Classical approaches pay more attention to the integration of the current question, vision knowledge and text knowledge, despising the heterogeneous semantic gaps between the cross-modal information. In the meantime, the concatenation operation has become de-facto standard to the cross-modal information fusion, which has a limited ability in information retrieval. In this paper, we propose a novel Knowledge-Bridge Graph Network (KBGN) model by using graph to bridge the cross-modal semantic relations between vision and text knowledge in fine granularity, as well as retrieving required knowledge via an adaptive information selection mode. Moreover, the reasoning clues for visual dialogue can be clearly drawn from intra-modal entities and inter-modal bridges. Experimental results on VisDial v1.0 and VisDial-Q datasets demonstrate that our model outperforms exiting models with state-of-the-art results.

閱讀: 0 點贊: 0

小貼士

登錄享

相關主題

語言模型化

知識 (knowledge)

北京阿比特科技有限公司

注冊地址：北京市海淀區羊坊店路18號2幢3層301-191

<tr id='GoDx7'><strong id='EMRCQ'></strong><small id='3Ubqp'></small><button id='FRohZ'></button><li id='Yqed8'><noscript id='HaNEe'><big id='zLqU8'></big><dt id='YJrp6'></dt></noscript></li></tr><ol id='lyrcM'><option id='NaphL'><table id='Cezpf'><blockquote id='qjzAz'><tbody id='D9myq'></tbody></blockquote></table></option></ol><u id='rDqtA'></u><kbd id='6z9mN'><kbd id='A9cBu'></kbd></kbd>

<code id='QsdU3'><strong id='u8e3p'></strong></code>

<fieldset id='RMTRx'></fieldset>

<span id='nxc6X'></span>

<ins id='reP4v'></ins>

<acronym id='cTep0'><em id='eM2tZ'></em><td id='68Abi'><div id='sfbpx'></div></td></acronym><address id='ECtEI'><big id='C1G8P'><big id='P84xv'></big><legend id='QmwbS'></legend></big></address>

<i id='v8ZxW'><div id='OfA1m'><ins id='bqJ1X'></ins></div></i>

<i id='G4obd'></i>