
Large language models (LLMs) have demonstrated remarkable performance and tremendous potential across a wide range of tasks. However, deploying these models has been challenging because of their astronomical number of parameters, which demands large memory capacity and high memory bandwidth. In this paper, we propose an effective approach that makes the deployment of LLMs more efficient. We support an automatic INT4 weight-only quantization flow and design a special LLM runtime with highly optimized kernels to accelerate LLM inference on CPUs. We demonstrate the general applicability of our approach on popular LLMs including Llama2, Llama, and GPT-NeoX, and showcase the extreme inference efficiency on CPUs. The code is publicly available at: //github.com/intel/intel-extension-for-transformers.
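To make the idea of INT4 weight-only quantization concrete, here is a minimal sketch of symmetric, group-wise quantization in plain Python. It is not the authors' flow or runtime: the function names, the group size, and the choice of a symmetric scheme are all assumptions made for illustration, and a real implementation would also pack two 4-bit values per byte and use optimized kernels.

```python
from typing import List, Tuple


def quantize_int4(weights: List[float], group_size: int = 4) -> Tuple[List[int], List[float]]:
    """Quantize weights to signed 4-bit integers in [-8, 7], one scale per group.

    Hypothetical helper: symmetric quantization, scale = max|w| / 7 per group.
    """
    quantized: List[int] = []
    scales: List[float] = []
    for start in range(0, len(weights), group_size):
        group = weights[start:start + group_size]
        max_abs = max(abs(w) for w in group)
        # Avoid division by zero for an all-zero group.
        scale = max_abs / 7.0 if max_abs > 0 else 1.0
        scales.append(scale)
        for w in group:
            # Round to the nearest integer level and clamp to the INT4 range.
            quantized.append(max(-8, min(7, round(w / scale))))
    return quantized, scales


def dequantize_int4(quantized: List[int], scales: List[float], group_size: int = 4) -> List[float]:
    """Recover approximate float weights from INT4 values and per-group scales."""
    return [q * scales[i // group_size] for i, q in enumerate(quantized)]


if __name__ == "__main__":
    w = [0.7, -0.3, 0.0, 0.1, 1.4, -1.4, 0.2, 0.05]
    q, s = quantize_int4(w)
    print(q, s)
    print(dequantize_int4(q, s))
```

Only the weights are quantized here; activations stay in floating point, which is what "weight-only" refers to. The per-group scale bounds the rounding error of each weight by half a quantization step for that group.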

Related content

Large language models


A large language model is a deep learning model trained on massive amounts of text data. It can not only generate natural-language text but also understand the meaning of text in depth and handle a variety of natural-language tasks such as summarization, question answering, and translation. In 2023, large language models and their applications in artificial intelligence became a focus of technology research worldwide. Their growth in scale has been especially striking: parameter counts have jumped from just over a billion at the outset to a trillion today. More parameters let a model capture the subtleties of human language more finely and understand its complexity more deeply. Over the past year, large language models improved markedly in absorbing new knowledge, decomposing complex tasks, and aligning images with text. As the technology matures, it will keep expanding its range of applications, providing more intelligent and personalized services and further improving how people live and work.


January 29, 2024

Corrective Retrieval Augmented Generation

Shi-Qi Yan,Jia-Chen Gu,Yun Zhu,Zhen-Hua Ling

Large language models (LLMs) inevitably exhibit hallucinations since the accuracy of generated texts cannot be secured solely by the parametric knowledge they encapsulate. Although retrieval-augmented generation (RAG) is a practicable complement to LLMs, it relies heavily on the relevance of retrieved documents, raising concerns about how the model behaves if retrieval goes wrong. To this end, we propose the Corrective Retrieval Augmented Generation (CRAG) to improve the robustness of generation. Specifically, a lightweight retrieval evaluator is designed to assess the overall quality of retrieved documents for a query, returning a confidence degree based on which different knowledge retrieval actions can be triggered. Since retrieval from static and limited corpora can only return sub-optimal documents, large-scale web searches are utilized as an extension for augmenting the retrieval results. Besides, a decompose-then-recompose algorithm is designed for retrieved documents to selectively focus on key information and filter out irrelevant information in them. CRAG is plug-and-play and can be seamlessly coupled with various RAG-based approaches. Experiments on four datasets covering short- and long-form generation tasks show that CRAG can significantly improve the performance of RAG-based approaches.
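The confidence-triggered behavior described above can be sketched as a small routing function. The abstract does not specify the actions, the thresholds, or the evaluator, so everything named here is a hypothetical stand-in: `overlap_score` is a toy word-overlap scorer in place of the lightweight retrieval evaluator, and the three action labels and the `upper`/`lower` thresholds are illustrative assumptions.

```python
from typing import Callable, List


def overlap_score(query: str, doc: str) -> float:
    """Toy relevance score: fraction of query words that appear in the document.

    Hypothetical placeholder for a learned retrieval evaluator.
    """
    q_words = set(query.lower().split())
    d_words = set(doc.lower().split())
    return len(q_words & d_words) / len(q_words) if q_words else 0.0


def choose_action(
    query: str,
    docs: List[str],
    scorer: Callable[[str, str], float] = overlap_score,
    upper: float = 0.6,  # assumed threshold: retrieval looks reliable
    lower: float = 0.2,  # assumed threshold: retrieval looks unreliable
) -> str:
    """Pick a knowledge-retrieval action from the best document confidence."""
    best = max((scorer(query, d) for d in docs), default=0.0)
    if best >= upper:
        return "use_retrieved"   # keep and refine the retrieved documents
    if best <= lower:
        return "web_search"      # discard them and fall back to web search
    return "combine"             # uncertain: use both sources


if __name__ == "__main__":
    print(choose_action("capital of france", ["paris is the capital of france"]))
    print(choose_action("capital of france", ["bananas are yellow"]))
```

Because the routing depends only on a scalar confidence, a sketch like this can sit in front of any retrieval pipeline, which is the sense in which the abstract calls the method plug-and-play.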


January 28, 2024

Parity Games on Temporal Graphs

Pete Austin,Sougata Bose,Patrick Totzke

Temporal graphs are a popular modelling mechanism for dynamic complex systems that extend ordinary graphs with discrete time. Simply put, time progresses one unit per step and the availability of edges can change with time. We consider the complexity of solving $\omega$-regular games played on temporal graphs where the edge availability is ultimately periodic and fixed a priori. We show that solving parity games on temporal graphs is decidable in PSPACE, only assuming the edge predicate itself is in PSPACE. A matching lower bound already holds for what we call punctual reachability games on static graphs, where one player wants to reach the target at a given, binary encoded, point in time. We further study syntactic restrictions that imply more efficient procedures. In particular, if the edge predicate is in $P$ and is monotonically increasing for one player and decreasing for the other, then the complexity of solving games is only polynomially increased compared to static graphs.
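As an illustration of a punctual reachability game with time-dependent edges, the sketch below computes by backward induction the set of vertices from which player 0 can force the token onto the target at exactly the deadline. It is not the paper's procedure: it iterates over every time step, so its running time is linear in the deadline and therefore exponential in the deadline's binary encoding. The function name, the data layout, and the convention that a vertex with no available edge before the deadline is losing for player 0 are assumptions.

```python
from typing import Callable, Dict, List, Set


def punctual_winning_region(
    owner: Dict[str, int],                    # vertex -> 0 (reacher) or 1 (opponent)
    edges: Dict[str, List[str]],              # vertex -> possible successors
    available: Callable[[str, str, int], bool],  # edge predicate at time t
    target: str,
    deadline: int,
) -> Set[str]:
    """Vertices from which player 0 wins when the play starts at time 0."""
    win: Set[str] = {target}  # winning positions at time == deadline
    for t in range(deadline - 1, -1, -1):
        current: Set[str] = set()
        for v in owner:
            succ = [w for w in edges.get(v, []) if available(v, w, t)]
            if not succ:
                continue  # assumed convention: being stuck early loses for player 0
            if owner[v] == 0:
                ok = any(w in win for w in succ)   # reacher picks one good move
            else:
                ok = all(w in win for w in succ)   # opponent cannot escape
            if ok:
                current.add(v)
        win = current
    return win


if __name__ == "__main__":
    owner = {"a": 0, "b": 1, "g": 0}
    edges = {"a": ["b", "g"], "b": ["g", "a"], "g": ["g"]}
    # Periodic availability: edge a -> g exists only at odd times.
    avail = lambda u, v, t: (t % 2 == 1) if (u, v) == ("a", "g") else True
    print(punctual_winning_region(owner, edges, avail, "g", 2))
```

In the example, the edge from `a` to `g` is available only at odd times, so whether `a` is winning depends on the parity of the remaining time, which a static-graph analysis would not capture.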


January 26, 2024

Multimodality in Group Communication Research

Robin Lange,Brooke Foucault Welles,Gyanendra Sharma,Richard J. Radke,Javier O. Garcia,Christoph Riedl

from arxiv, 27 pages, 3 figures

Team interactions are often multisensory, requiring members to pick up on verbal, visual, spatial and body language cues. Multimodal research, research that captures multiple modes of communication such as audio and visual signals, is therefore integral to understanding these multisensory group communication processes. This type of research has gained traction in biomedical engineering and neuroscience, but the extent to which communication and management researchers conduct multimodal research is unclear. Our study finds that despite its utility, multimodal research is underutilized in the communication and management literatures. This paper then covers introductory guidelines for creating new multimodal research, including considerations for sensors, data integration and ethics.


January 26, 2024

ChemDFM: Dialogue Foundation Model for Chemistry

Zihan Zhao,Da Ma,Lu Chen,Liangtai Sun,Zihao Li,Hongshen Xu,Zichen Zhu,Su Zhu,Shuai Fan,Guodong Shen,Xin Chen,Kai Yu

from arxiv, 10 pages, 12 figures, 13 tables. Under Review

Large language models (LLMs) have established great success in the general domain of natural language processing. Their emerging task generalization and free-form dialogue capabilities can greatly help to design Chemical General Intelligence (CGI) to assist real-world research in chemistry. However, the existence of specialized language and knowledge in the field of chemistry, such as the highly informative SMILES notation, hinders the performance of general-domain LLMs in chemistry. To this end, we develop ChemDFM, the first LLM towards CGI. ChemDFM-13B is trained on 34B tokens from chemical literature, textbooks, and instructions as well as various data from the general domain. Therefore, it can store, understand, and reason over chemical knowledge and languages while still possessing advanced free-form language comprehension capabilities. Extensive quantitative evaluation shows that ChemDFM can significantly outperform the representative open-sourced LLMs. Moreover, ChemDFM can also surpass GPT-4 on a great portion of chemical tasks, despite the significant size difference. Further qualitative evaluations demonstrate the efficiency and effectiveness of ChemDFM in real-world research scenarios. We will open-source the ChemDFM model soon.


January 25, 2024

An Adaptive Placement and Parallelism Framework for Accelerating RLHF Training

Youshao Xiao,Weichang Wu,Zhenglei Zhou,Fagui Mao,Shangchun Zhao,Lin Ju,Lei Liang,Xiaolu Zhang,Jun Zhou

Recently, ChatGPT- or InstructGPT-like large language models (LLMs) have made a significant impact in the AI world. Many works have attempted to reproduce InstructGPT's complex training pipeline, namely Reinforcement Learning with Human Feedback (RLHF). However, the mainstream distributed RLHF training methods typically adopt a fixed model placement strategy, referred to as the Flattening strategy. This strategy treats all four interdependent models involved in RLHF as a single entity, distributing them across all devices and applying parallelism techniques designed for a single model, regardless of the different workloads inherent to each model. As a result, this strategy exacerbates the generation bottlenecks in RLHF training and degrades the overall training efficiency. To address these issues, we propose an adaptive model placement framework that offers two flexible model placement strategies. The Interleaving strategy helps reduce memory redundancy and communication costs of RLHF training by placing models without dependencies on exclusive devices with careful orchestration. On the other hand, the Separation strategy improves the throughput of model training by separating the training and inference runtime of the RLHF pipeline with additional shadow models. Furthermore, our framework provides a simple user interface and allows for the agile allocation of models across devices in a fine-grained manner for various training scenarios, involving models of varying sizes and devices of different scales. Extensive experiments have demonstrated that our Interleaving and Separation strategies can achieve notable improvements of up to 11x compared to the current SOTA approaches. The results highlight the effectiveness and adaptability of our approaches in accelerating the training of distributed RLHF.


November 21, 2023

Prompting Frameworks for Large Language Models: A Survey

Xiaoxia Liu,Jingyi Wang,Jun Sun,Xiaohan Yuan,Guoliang Dong,Peng Di,Wenhai Wang,Dongxia Wang

Since the launch of ChatGPT, a powerful AI chatbot developed by OpenAI, large language models (LLMs) have made significant advancements in both academia and industry, bringing about a fundamental engineering paradigm shift in many areas. While LLMs are powerful, it is also crucial to make the best use of their power, where the "prompt" plays a core role. However, the booming LLMs themselves, including excellent APIs like ChatGPT, have several inherent limitations: 1) temporal lag of training data, and 2) the lack of physical capabilities to perform external actions. Recently, we have observed a trend of using prompt-based tools to better harness the power of LLMs for downstream tasks, but there is a lack of systematic literature and standardized terminology, partly due to the rapid evolution of this field. Therefore, in this work, we survey related prompting tools and promote the concept of the "Prompting Framework" (PF), i.e., the framework for managing, simplifying, and facilitating interaction with large language models. We define the lifecycle of the PF as a hierarchical structure, from bottom to top, namely: Data Level, Base Level, Execute Level, and Service Level. We also systematically depict the overall landscape of the emerging PF field and discuss potential future research and challenges. To continuously track the developments in this area, we maintain a repository at //github.com/lxx0628/Prompting-Framework-Survey, which can be a useful resource-sharing platform for both academia and industry in this field.


September 2, 2023

Explainability for Large Language Models: A Survey

Haiyan Zhao,Hanjie Chen,Fan Yang,Ninghao Liu,Huiqi Deng,Hengyi Cai,Shuaiqiang Wang,Dawei Yin,Mengnan Du

Large language models (LLMs) have demonstrated impressive capabilities in natural language processing. However, their internal mechanisms are still unclear and this lack of transparency poses unwanted risks for downstream applications. Therefore, understanding and explaining these models is crucial for elucidating their behaviors, limitations, and social impacts. In this paper, we introduce a taxonomy of explainability techniques and provide a structured overview of methods for explaining Transformer-based language models. We categorize techniques based on the training paradigms of LLMs: traditional fine-tuning-based paradigm and prompting-based paradigm. For each paradigm, we summarize the goals and dominant approaches for generating local explanations of individual predictions and global explanations of overall model knowledge. We also discuss metrics for evaluating generated explanations, and discuss how explanations can be leveraged to debug models and improve performance. Lastly, we examine key challenges and emerging opportunities for explanation techniques in the era of LLMs in comparison to conventional machine learning models.


July 20, 2021

Adaptive Transfer Learning on Graph Neural Networks

Xueting Han,Zhenhuan Huang,Bang An,Jing Bai

Graph neural networks (GNNs) are widely used to learn powerful representations of graph-structured data. Recent work demonstrates that transferring knowledge from self-supervised tasks to downstream tasks can further improve graph representation. However, there is an inherent gap between self-supervised tasks and downstream tasks in terms of optimization objective and training data. Conventional pre-training methods may not be effective enough for knowledge transfer since they do not make any adaptation for downstream tasks. To solve such problems, we propose a new transfer learning paradigm on GNNs which can effectively leverage self-supervised tasks as auxiliary tasks to help the target task. Our methods adaptively select and combine different auxiliary tasks with the target task in the fine-tuning stage. We design an adaptive auxiliary loss weighting model to learn the weights of auxiliary tasks by quantifying the consistency between auxiliary tasks and the target task. In addition, we learn the weighting model through meta-learning. Our methods can be applied to various transfer learning approaches; they perform well not only in multi-task learning but also in pre-training and fine-tuning. Comprehensive experiments on multiple downstream tasks demonstrate that the proposed methods can effectively combine auxiliary tasks with the target task and significantly improve performance compared to state-of-the-art methods.
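One simple way to picture weighting auxiliary losses by their consistency with the target task is sketched below. The abstract does not say how consistency is measured or how the weighting model is parameterized, so this sketch assumes gradient cosine similarity clipped at zero as the consistency signal and uses it directly as the weight. It omits the learned weighting model and the meta-learning step entirely, and all function names are hypothetical.

```python
import math
from typing import Dict, List


def cosine(a: List[float], b: List[float]) -> float:
    """Cosine similarity of two gradient vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b) if norm_a > 0 and norm_b > 0 else 0.0


def auxiliary_weights(target_grad: List[float], aux_grads: Dict[str, List[float]]) -> Dict[str, float]:
    """Assumed rule: weight = max(0, cosine(target gradient, auxiliary gradient))."""
    return {name: max(0.0, cosine(target_grad, g)) for name, g in aux_grads.items()}


def combined_loss(target_loss: float, aux_losses: Dict[str, float], weights: Dict[str, float]) -> float:
    """Target loss plus the weighted sum of auxiliary losses."""
    return target_loss + sum(weights[name] * loss for name, loss in aux_losses.items())


if __name__ == "__main__":
    grads = {"aligned": [2.0, 0.0], "orthogonal": [0.0, 3.0], "opposed": [-1.0, 0.0]}
    w = auxiliary_weights([1.0, 0.0], grads)
    print(w)
    print(combined_loss(1.0, {"aligned": 0.5, "orthogonal": 0.5, "opposed": 0.5}, w))
```

Under this assumed rule, an auxiliary task whose gradient points the same way as the target task's gradient contributes fully, while orthogonal or conflicting tasks are switched off.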


December 16, 2020

Hierarchical Graph Capsule Network

Jinyu Yang,Peilin Zhao,Yu Rong,Chaochao Yan,Chunyuan Li,Hehuan Ma,Junzhou Huang

from arxiv, AAAI 2021

Graph Neural Networks (GNNs) draw their strength from explicitly modeling the topological information of structured data. However, existing GNNs have limited capability to capture the hierarchical graph representation, which plays an important role in graph classification. In this paper, we propose a hierarchical graph capsule network (HGCN) that can jointly learn node embeddings and extract graph hierarchies. Specifically, disentangled graph capsules are established by identifying heterogeneous factors underlying each node, such that their instantiation parameters represent different properties of the same entity. To learn the hierarchical representation, HGCN characterizes the part-whole relationship between lower-level capsules (part) and higher-level capsules (whole) by explicitly considering the structure information among the parts. Experimental studies demonstrate the effectiveness of HGCN and the contribution of each component.


June 7, 2020

Principal Neighbourhood Aggregation for Graph Nets

Gabriele Corso,Luca Cavalleri,Dominique Beaini,Pietro Liò,Petar Veličković

Graph Neural Networks (GNNs) have been shown to be effective models for different predictive tasks on graph-structured data. Recent work on their expressive power has focused on isomorphism tasks and countable feature spaces. We extend this theoretical framework to include continuous features - which occur regularly in real-world input domains and within the hidden layers of GNNs - and we demonstrate the requirement for multiple aggregation functions in this context. Accordingly, we propose Principal Neighbourhood Aggregation (PNA), a novel architecture combining multiple aggregators with degree-scalers (which generalize the sum aggregator). Finally, we compare the capacity of different models to capture and exploit the graph structure via a novel benchmark containing multiple tasks taken from classical graph theory, alongside existing benchmarks from real-world domains, all of which demonstrate the strength of our model. With this work, we hope to steer some of the GNN research towards new aggregation methods which we believe are essential in the search for powerful and robust models.
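The combination of several aggregators with degree-based scalers can be sketched for a single node with scalar neighbor features. The abstract does not list the aggregators or the scaler formula. The choices below (mean, max, min and standard deviation as aggregators; identity, amplification and attenuation scalers based on log(d + 1) divided by a normalizer `delta`) are assumptions made for illustration, and the surrounding message-passing layer is left out.

```python
import math
from typing import List


def pna_aggregate(neighbor_feats: List[float], delta: float) -> List[float]:
    """Return every (scaler x aggregator) combination for one node.

    neighbor_feats: scalar features of the node's neighbors (degree d >= 1).
    delta: assumed normalizer, e.g. the average of log(d + 1) over training nodes.
    """
    d = len(neighbor_feats)
    if d == 0:
        raise ValueError("node must have at least one neighbor")
    mean = sum(neighbor_feats) / d
    variance = sum((x - mean) ** 2 for x in neighbor_feats) / d
    aggregators = [mean, max(neighbor_feats), min(neighbor_feats), math.sqrt(variance)]
    s = math.log(d + 1) / delta
    scalers = [1.0, s, 1.0 / s]  # identity, amplification, attenuation
    # 3 scalers x 4 aggregators = 12 output values, scaler-major order.
    return [scale * agg for scale in scalers for agg in aggregators]


if __name__ == "__main__":
    print(pna_aggregate([1.0, 2.0, 3.0], delta=math.log(4)))
```

With the identity scaler and the mean aggregator the sketch reduces to ordinary mean aggregation. Multiplying the mean by a scaler that grows with the degree moves it toward a sum-like aggregator, which is one way to read the abstract's remark that degree-scalers generalize the sum aggregator.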