嘘嘘中国免费观看网站_欧美亚洲视频一区二区三区老牛_香蕉国产精品偷在线_香蕉网久久综合影院_国产91不卡精品视频_狂撞无码人妻在线播放视频_天堂AV国产AV在线AV

Large Language Models (LLMs) and Multimodal Large language models (MLLMs) have taken the world by storm with impressive abilities in complex reasoning and linguistic comprehension. Meanwhile there are plethora of works related to Vietnamese Large Language Models, the lack of high-quality resources in multimodality limits the progress of Vietnamese MLLMs. In this paper, we pioneer in address this by introducing LaVy, a state-of-the-art Vietnamese MLLM, and we also introduce LaVy-Bench benchmark designated for evaluating MLLMs's understanding on Vietnamese visual language tasks. All code and model weights are public at //github.com/baochi0212/LaVy

相關內容

多峰值(zhi)

關注 2

語言模型化 · Performer · BASIC · MoDELS · 大語言模型 ·

2024 年 5 月 21 日

CFBenchmark: Chinese Financial Assistant Benchmark for Large Language Model

Yang Lei,Jiangtong Li,Dawei Cheng,Zhijun Ding,Changjun Jiang

from arxiv, 12 pages, 4 figures

Large language models (LLMs) have demonstrated great potential in the financial domain. Thus, it becomes important to assess the performance of LLMs in the financial tasks. In this work, we introduce CFBenchmark, to evaluate the performance of LLMs for Chinese financial assistant. The basic version of CFBenchmark is designed to evaluate the basic ability in Chinese financial text processing from three aspects~(\emph{i.e.} recognition, classification, and generation) including eight tasks, and includes financial texts ranging in length from 50 to over 1,800 characters. We conduct experiments on several LLMs available in the literature with CFBenchmark-Basic, and the experimental results indicate that while some LLMs show outstanding performance in specific tasks, overall, there is still significant room for improvement in basic tasks of financial text processing with existing models. In the future, we plan to explore the advanced version of CFBenchmark, aiming to further explore the extensive capabilities of language models in more profound dimensions as a financial assistant in Chinese. Our codes are released at //github.com/TongjiFinLab/CFBenchmark.

Chiplet · MoDELS · Performer · 語言模型化 · Continuity ·

2024 年 5 月 20 日

Chiplet Cloud: Building AI Supercomputers for Serving Large Generative Language Models

Huwan Peng,Scott Davidson,Richard Shi,Shuaiwen Leon Song,Michael Taylor

Large language models (LLMs) such as OpenAI's ChatGPT and Google's Gemini have demonstrated unprecedented capabilities of autoregressive AI models across multiple tasks triggering disruptive technology innovations around the world. However, as models continue to grow the cost to serve these models also continues to grow threatening the democratization of LLMs. To address this issue, we propose Chiplet Cloud, a chiplet-based ASIC LLM-supercomputer architecture whose goal is to optimize the total cost of ownership (TCO) per generated token. This architecture is a highly parameterizable ASIC and server-level architecture leveraging thousands of replicated accelerator modules collaborating to scale-up the performance of LLMs at cloud-scale. To determine specific parameterizations of the Chiplet Cloud architecture, we implemented a two-phase hardware-software co-design methodology that can search the massive design space and fine tune the architecture across a collection of LLMs based on an accurate inference simulation. A common bottleneck for LLMs is the memory access performance therefore we introduce CC-MEM, a scalable on-chip memory system for Chiplet Cloud architectures. Using the CC-MEM, Chiplet Clouds can be built using only SRAMs for design points where the power and performance of memory access is critical. The CC-MEM also includes a compression decoder module to add support for sparse models without impacting the compute units using a Store-as-Compressed, Load-as-Dense mechanism. We evaluate Chiplet Cloud architectures across eight popular LLMs. Using fine tuned Chiplet Cloud servers we are able to achieve $97\times$ and $18\times$ improvement in TCO/Token over rented GPU and TPU clouds, or a $8.3\times$ and $3.7\times$ improvement over fabricated GPU and TPU clouds respectively. Chiplet Cloud can also support $1.7\times$ larger models with a sparsity of 60\%.

MoDELS · 多峰值 · 縮放 · state-of-the-art · 設計 ·

2024 年 5 月 20 日

Imp: Highly Capable Large Multimodal Models for Mobile Devices

Zhenwei Shao,Zhou Yu,Jun Yu,Xuecheng Ouyang,Lihao Zheng,Zhenbiao Gai,Mingyang Wang,Jiajun Ding

from arxiv, 19 pages, 6 figures

By harnessing the capabilities of large language models (LLMs), recent large multimodal models (LMMs) have shown remarkable versatility in open-world multimodal understanding. Nevertheless, they are usually parameter-heavy and computation-intensive, thus hindering their applicability in resource-constrained scenarios. To this end, several lightweight LMMs have been proposed successively to maximize the capabilities under constrained scale (e.g., 3B). Despite the encouraging results achieved by these methods, most of them only focus on one or two aspects of the design space, and the key design choices that influence model capability have not yet been thoroughly investigated. In this paper, we conduct a systematic study for lightweight LMMs from the aspects of model architecture, training strategy, and training data. Based on our findings, we obtain Imp -- a family of highly capable LMMs at the 2B-4B scales. Notably, our Imp-3B model steadily outperforms all the existing lightweight LMMs of similar size, and even surpasses the state-of-the-art LMMs at the 13B scale. With low-bit quantization and resolution reduction techniques, our Imp model can be deployed on a Qualcomm Snapdragon 8Gen3 mobile chip with a high inference speed of about 13 tokens/s.

MoDELS · 語言模型化 · 大語言模型 · 數據集 · Performer ·

2024 年 5 月 17 日

FOLIO: Natural Language Reasoning with First-Order Logic

Simeng Han,Hailey Schoelkopf,Yilun Zhao,Zhenting Qi,Martin Riddell,Wenfei Zhou,James Coady,David Peng,Yujie Qiao,Luke Benson,Lucy Sun,Alex Wardle-Solano,Hannah Szabo,Ekaterina Zubova,Matthew Burtell,Jonathan Fan,Yixin Liu,Brian Wong,Malcolm Sailor,Ansong Ni,Linyong Nan,Jungo Kasai,Tao Yu,Rui Zhang,Alexander R. Fabbri,Wojciech Kryscinski,Semih Yavuz,Ye Liu,Xi Victoria Lin,Shafiq Joty,Yingbo Zhou,Caiming Xiong,Rex Ying,Arman Cohan,Dragomir Radev

Large language models (LLMs) have achieved remarkable performance on a variety of natural language understanding tasks. However, existing benchmarks are inadequate in measuring the complex logical reasoning capabilities of a model. We present FOLIO, a human-annotated, logically complex and diverse dataset for reasoning in natural language (NL), equipped with first-order logic (FOL) annotations. FOLIO consists of 1,430 examples (unique conclusions), each paired with one of 487 sets of premises used to deductively reason for the validity of each conclusion. The logical correctness of the premises and conclusions is ensured by their FOL annotations, which are automatically verified by an FOL inference engine. In addition to the main NL reasoning task, NL-FOL pairs in FOLIO constitute a new NL-FOL translation dataset. Our experiments on FOLIO systematically evaluate the FOL reasoning ability of supervised fine-tuning on medium-sized language models. For both NL reasoning and NL-FOL translation, we benchmark multiple state-of-the-art language models. Our results show that a subset of FOLIO presents a challenge for one of the most capable {Large Language Model (LLM)} publicly available, GPT-4.

語言模型化 · 大語言模型 · MoDELS · Performer · 數據集 ·

2024 年 5 月 16 日

LFED: A Literary Fiction Evaluation Dataset for Large Language Models

Linhao Yu,Qun Liu,Deyi Xiong

The rapid evolution of large language models (LLMs) has ushered in the need for comprehensive assessments of their performance across various dimensions. In this paper, we propose LFED, a Literary Fiction Evaluation Dataset, which aims to evaluate the capability of LLMs on the long fiction comprehension and reasoning. We collect 95 literary fictions that are either originally written in Chinese or translated into Chinese, covering a wide range of topics across several centuries. We define a question taxonomy with 8 question categories to guide the creation of 1,304 questions. Additionally, we conduct an in-depth analysis to ascertain how specific attributes of literary fictions (e.g., novel types, character numbers, the year of publication) impact LLM performance in evaluations. Through a series of experiments with various state-of-the-art LLMs, we demonstrate that these models face considerable challenges in effectively addressing questions related to literary fictions, with ChatGPT reaching only 57.08% under the zero-shot setting. The dataset will be publicly available at //github.com/tjunlp-lab/LFED.git

可約的 · 知識 (knowledge) · MoDELS · INFORMS · 信息先驗 ·

2024 年 5 月 15 日

The WMDP Benchmark: Measuring and Reducing Malicious Use With Unlearning

Nathaniel Li,Alexander Pan,Anjali Gopal,Summer Yue,Daniel Berrios,Alice Gatti,Justin D. Li,Ann-Kathrin Dombrowski,Shashwat Goel,Long Phan,Gabriel Mukobi,Nathan Helm-Burger,Rassin Lababidi,Lennart Justen,Andrew B. Liu,Michael Chen,Isabelle Barrass,Oliver Zhang,Xiaoyuan Zhu,Rishub Tamirisa,Bhrugu Bharathi,Adam Khoja,Zhenqi Zhao,Ariel Herbert-Voss,Cort B. Breuer,Samuel Marks,Oam Patel,Andy Zou,Mantas Mazeika,Zifan Wang,Palash Oswal,Weiran Lin,Adam A. Hunt,Justin Tienken-Harder,Kevin Y. Shih,Kemper Talley,John Guan,Russell Kaplan,Ian Steneker,David Campbell,Brad Jokubaitis,Alex Levinson,Jean Wang,William Qian,Kallol Krishna Karmakar,Steven Basart,Stephen Fitz,Mindy Levine,Ponnurangam Kumaraguru,Uday Tupakula,Vijay Varadharajan,Ruoyu Wang,Yan Shoshitaishvili,Jimmy Ba,Kevin M. Esvelt,Alexandr Wang,Dan Hendrycks

from arxiv, See the project page at //wmdp.ai

The White House Executive Order on Artificial Intelligence highlights the risks of large language models (LLMs) empowering malicious actors in developing biological, cyber, and chemical weapons. To measure these risks of malicious use, government institutions and major AI labs are developing evaluations for hazardous capabilities in LLMs. However, current evaluations are private, preventing further research into mitigating risk. Furthermore, they focus on only a few, highly specific pathways for malicious use. To fill these gaps, we publicly release the Weapons of Mass Destruction Proxy (WMDP) benchmark, a dataset of 3,668 multiple-choice questions that serve as a proxy measurement of hazardous knowledge in biosecurity, cybersecurity, and chemical security. WMDP was developed by a consortium of academics and technical consultants, and was stringently filtered to eliminate sensitive information prior to public release. WMDP serves two roles: first, as an evaluation for hazardous knowledge in LLMs, and second, as a benchmark for unlearning methods to remove such hazardous knowledge. To guide progress on unlearning, we develop RMU, a state-of-the-art unlearning method based on controlling model representations. RMU reduces model performance on WMDP while maintaining general capabilities in areas such as biology and computer science, suggesting that unlearning may be a concrete path towards reducing malicious use from LLMs. We release our benchmark and code publicly at //wmdp.ai

MoDELS · Performer · 語言模型化 · 基準 · Extensibility ·

2024 年 5 月 14 日

SpeechVerse: A Large-scale Generalizable Audio Language Model

Nilaksh Das,Saket Dingliwal,Srikanth Ronanki,Rohit Paturi,David Huang,Prashant Mathur,Jie Yuan,Dhanush Bekal,Xing Niu,Sai Muralidhar Jayanthi,Xilai Li,Karel Mundnich,Monica Sunkara,Sundararajan Srinivasan,Kyu J Han,Katrin Kirchhoff

from arxiv, Single Column, 13 page

Large language models (LLMs) have shown incredible proficiency in performing tasks that require semantic understanding of natural language instructions. Recently, many works have further expanded this capability to perceive multimodal audio and text inputs, but their capabilities are often limited to specific fine-tuned tasks such as automatic speech recognition and translation. We therefore develop SpeechVerse, a robust multi-task training and curriculum learning framework that combines pre-trained speech and text foundation models via a small set of learnable parameters, while keeping the pre-trained models frozen during training. The models are instruction finetuned using continuous latent representations extracted from the speech foundation model to achieve optimal zero-shot performance on a diverse range of speech processing tasks using natural language instructions. We perform extensive benchmarking that includes comparing our model performance against traditional baselines across several datasets and tasks. Furthermore, we evaluate the model's capability for generalized instruction following by testing on out-of-domain datasets, novel prompts, and unseen tasks. Our empirical experiments reveal that our multi-task SpeechVerse model is even superior to conventional task-specific baselines on 9 out of the 11 tasks.

多峰值 · Extensibility · Agent · AI Agent · 多樣性 ·

2024 年 2 月 23 日

Large Multimodal Agents: A Survey

Junlin Xie,Zhihong Chen,Ruifei Zhang,Xiang Wan,Guanbin Li

from arxiv, 15 pages, 4 figures

Large language models (LLMs) have achieved superior performance in powering text-based AI agents, endowing them with decision-making and reasoning abilities akin to humans. Concurrently, there is an emerging research trend focused on extending these LLM-powered AI agents into the multimodal domain. This extension enables AI agents to interpret and respond to diverse multimodal user queries, thereby handling more intricate and nuanced tasks. In this paper, we conduct a systematic review of LLM-driven multimodal agents, which we refer to as large multimodal agents ( LMAs for short). First, we introduce the essential components involved in developing LMAs and categorize the current body of research into four distinct types. Subsequently, we review the collaborative frameworks integrating multiple LMAs , enhancing collective efficacy. One of the critical challenges in this field is the diverse evaluation methods used across existing studies, hindering effective comparison among different LMAs . Therefore, we compile these evaluation methodologies and establish a comprehensive framework to bridge the gaps. This framework aims to standardize evaluations, facilitating more meaningful comparisons. Concluding our review, we highlight the extensive applications of LMAs and propose possible future research directions. Our discussion aims to provide valuable insights and guidelines for future research in this rapidly evolving field. An up-to-date resource list is available at //github.com/jun0wanan/awesome-large-multimodal-agents.

大語言模型 · Performer · MoDELS · LLaMA · 語言模型化 ·

2024 年 2 月 9 日

Large Language Models: A Survey

Shervin Minaee,Tomas Mikolov,Narjes Nikzad,Meysam Chenaghlu,Richard Socher,Xavier Amatriain,Jianfeng Gao

from arxiv, arXiv admin note: substantial text overlap with arXiv:2401.14423

Large Language Models (LLMs) have drawn a lot of attention due to their strong performance on a wide range of natural language tasks, since the release of ChatGPT in November 2022. LLMs' ability of general-purpose language understanding and generation is acquired by training billions of model's parameters on massive amounts of text data, as predicted by scaling laws \cite{kaplan2020scaling,hoffmann2022training}. The research area of LLMs, while very recent, is evolving rapidly in many different ways. In this paper, we review some of the most prominent LLMs, including three popular LLM families (GPT, LLaMA, PaLM), and discuss their characteristics, contributions and limitations. We also give an overview of techniques developed to build, and augment LLMs. We then survey popular datasets prepared for LLM training, fine-tuning, and evaluation, review widely used LLM evaluation metrics, and compare the performance of several popular LLMs on a set of representative benchmarks. Finally, we conclude the paper by discussing open challenges and future research directions.

Prompt · MoDELS · TOOLS · Continuity · INTERACT ·

2023 年 11 月 21 日

Prompting Frameworks for Large Language Models: A Survey

Xiaoxia Liu,Jingyi Wang,Jun Sun,Xiaohan Yuan,Guoliang Dong,Peng Di,Wenhai Wang,Dongxia Wang

Since the launch of ChatGPT, a powerful AI Chatbot developed by OpenAI, large language models (LLMs) have made significant advancements in both academia and industry, bringing about a fundamental engineering paradigm shift in many areas. While LLMs are powerful, it is also crucial to best use their power where "prompt'' plays a core role. However, the booming LLMs themselves, including excellent APIs like ChatGPT, have several inherent limitations: 1) temporal lag of training data, and 2) the lack of physical capabilities to perform external actions. Recently, we have observed the trend of utilizing prompt-based tools to better utilize the power of LLMs for downstream tasks, but a lack of systematic literature and standardized terminology, partly due to the rapid evolution of this field. Therefore, in this work, we survey related prompting tools and promote the concept of the "Prompting Framework" (PF), i.e. the framework for managing, simplifying, and facilitating interaction with large language models. We define the lifecycle of the PF as a hierarchical structure, from bottom to top, namely: Data Level, Base Level, Execute Level, and Service Level. We also systematically depict the overall landscape of the emerging PF field and discuss potential future research and challenges. To continuously track the developments in this area, we maintain a repository at //github.com/lxx0628/Prompting-Framework-Survey, which can be a useful resource sharing platform for both academic and industry in this field.