99视频在线播放喷射,国产高清一区二区在线影院,亚洲日韩精品欧美1区2区3区在线观看,一本久道国产在线线观看,中文字幕无码专区日韩精品

Despite the impressive performance across numerous tasks, large language models (LLMs) often fail in solving simple decision-making tasks due to the misalignment of the knowledge in LLMs with environments. On the contrary, reinforcement learning (RL) agents learn policies from scratch, which makes them always align with environments but difficult to incorporate prior knowledge for efficient explorations. To narrow the gap, we propose TWOSOME, a novel general online framework that deploys LLMs as decision-making agents to efficiently interact and align with embodied environments via RL without requiring any prepared datasets or prior knowledge of the environments. Firstly, we query the joint probabilities of each valid action with LLMs to form behavior policies. Then, to enhance the stability and robustness of the policies, we propose two normalization methods and summarize four prompt design principles. Finally, we design a novel parameter-efficient training architecture where the actor and critic share one frozen LLM equipped with low-rank adapters (LoRA) updated by PPO. We conduct extensive experiments to evaluate TWOSOME. i) TWOSOME exhibits significantly better sample efficiency and performance compared to the conventional RL method, PPO, and prompt tuning method, SayCan, in both classical decision-making environment, Overcooked, and simulated household environment, VirtualHome. ii) Benefiting from LLMs' open-vocabulary feature, TWOSOME shows superior generalization ability to unseen tasks. iii) Under our framework, there is no significant loss of the LLMs' original ability during online PPO finetuning.

相關內容

回合

關注 3

多峰值 · 離散化 · MoDELS · 大語言模型 · 語言模型化 ·

2024 年 3 月 7 日

AnyGPT: Unified Multimodal LLM with Discrete Sequence Modeling

Jun Zhan,Junqi Dai,Jiasheng Ye,Yunhua Zhou,Dong Zhang,Zhigeng Liu,Xin Zhang,Ruibin Yuan,Ge Zhang,Linyang Li,Hang Yan,Jie Fu,Tao Gui,Tianxiang Sun,Yugang Jiang,Xipeng Qiu

from arxiv, 28 pages, 16 figures, under review, work in progress

We introduce AnyGPT, an any-to-any multimodal language model that utilizes discrete representations for the unified processing of various modalities, including speech, text, images, and music. AnyGPT can be trained stably without any alterations to the current large language model (LLM) architecture or training paradigms. Instead, it relies exclusively on data-level preprocessing, facilitating the seamless integration of new modalities into LLMs, akin to the incorporation of new languages. We build a multimodal text-centric dataset for multimodal alignment pre-training. Utilizing generative models, we synthesize the first large-scale any-to-any multimodal instruction dataset. It consists of 108k samples of multi-turn conversations that intricately interweave various modalities, thus equipping the model to handle arbitrary combinations of multimodal inputs and outputs. Experimental results demonstrate that AnyGPT is capable of facilitating any-to-any multimodal conversation while achieving performance comparable to specialized models across all modalities, proving that discrete representations can effectively and conveniently unify multiple modalities within a language model. Demos are shown in //junzhan2000.github.io/AnyGPT.github.io/

估計/估計量 · Analysis · 統計量 · 線性的 · 可理解性 ·

2024 年 3 月 7 日

Extract Mechanisms from Heterogeneous Effects: Identification Strategy for Mediation Analysis

Jiawei Fu

Understanding causal mechanisms is essential for explaining and generalizing empirical phenomena. Causal mediation analysis offers statistical techniques to quantify mediation effects. However, existing methods typically require strong identification assumptions or sophisticated research designs. We develop a new identification strategy that simplifies these assumptions, enabling the simultaneous estimation of causal and mediation effects. The strategy is based on a novel decomposition of total treatment effects, which transforms the challenging mediation problem into a simple linear regression problem. The new method establishes a new link between causal mediation and causal moderation. We discuss several research designs and estimators to increase the usability of our identification strategy for a variety of empirical studies. We demonstrate the application of our method by estimating the causal mediation effect in experiments concerning common pool resource governance and voting information. Additionally, we have created statistical software to facilitate the implementation of our method.

Markov · INFORMS · Learning · 回合 · 情景 ·

2024 年 3 月 6 日

Markov Persuasion Processes: Learning to Persuade from Scratch

Francesco Bacchiocchi,Francesco Emanuele Stradi,Matteo Castiglioni,Alberto Marchesi,Nicola Gatti

In Bayesian persuasion, an informed sender strategically discloses information to a receiver so as to persuade them to undertake desirable actions. Recently, a growing attention has been devoted to settings in which sender and receivers interact sequentially. Recently, Markov persuasion processes (MPPs) have been introduced to capture sequential scenarios where a sender faces a stream of myopic receivers in a Markovian environment. The MPPs studied so far in the literature suffer from issues that prevent them from being fully operational in practice, e.g., they assume that the sender knows receivers' rewards. We fix such issues by addressing MPPs where the sender has no knowledge about the environment. We design a learning algorithm for the sender, working with partial feedback. We prove that its regret with respect to an optimal information-disclosure policy grows sublinearly in the number of episodes, as it is the case for the loss in persuasiveness cumulated while learning. Moreover, we provide a lower bound for our setting matching the guarantees of our algorithm.

語言模型化 · MoDELS · 詞元分析器 · binary · Performer ·

2024 年 3 月 6 日

Arithmetic with Language Models: from Memorization to Computation

Davide Maltoni,Matteo Ferrara

A better understanding of the emergent computation and problem-solving capabilities of recent large language models is of paramount importance to further improve them and broaden their applicability. This work investigates how a language model, trained to predict the next token, can perform arithmetic computations generalizing beyond training data. Binary addition and multiplication constitute a good testbed for this purpose, since they require a very small vocabulary and exhibit relevant input/output discontinuities making smooth input interpolation ineffective for novel data. We successfully trained a light language model to learn these tasks and ran a number of experiments to investigate the extrapolation capabilities and internal information processing. Our findings support the hypothesis that the language model works as an Encoding-Regression-Decoding machine where the computation takes place in the value space once the input token representation is mapped to an appropriate internal representation.

優化器 · 正例 · MoDELS · Notability · 可約的 ·

2024 年 3 月 6 日

Negating Negatives: Alignment without Human Positive Samples via Distributional Dispreference Optimization

Shitong Duan,Xiaoyuan Yi,Peng Zhang,Tun Lu,Xing Xie,Ning Gu

Large language models (LLMs) have revolutionized the role of AI, yet also pose potential risks of propagating unethical content. Alignment technologies have been introduced to steer LLMs towards human preference, gaining increasing attention. Despite notable breakthroughs in this direction, existing methods heavily rely on high-quality positive-negative training pairs, suffering from noisy labels and the marginal distinction between preferred and dispreferred response data. Given recent LLMs' proficiency in generating helpful responses, this work pivots towards a new research focus: achieving alignment using solely human-annotated negative samples, preserving helpfulness while reducing harmfulness. For this purpose, we propose Distributional Dispreference Optimization (D$^2$O), which maximizes the discrepancy between the generated responses and the dispreferred ones to effectively eschew harmful information. We theoretically demonstrate that D$^2$O is equivalent to learning a distributional instead of instance-level preference model reflecting human dispreference against the distribution of negative responses. Besides, D$^2$O integrates an implicit Jeffrey Divergence regularization to balance the exploitation and exploration of reference policies and converges to a non-negative one during training. Extensive experiments demonstrate that our method achieves comparable generation quality and surpasses the latest baselines in producing less harmful and more informative responses with better training stability and faster convergence.

多峰值 · 語言模型化 · 大語言模型 · MoDELS · 推斷 ·

2024 年 3 月 5 日

Feast Your Eyes: Mixture-of-Resolution Adaptation for Multimodal Large Language Models

Gen Luo,Yiyi Zhou,Yuxin Zhang,Xiawu Zheng,Xiaoshuai Sun,Rongrong Ji

Despite remarkable progress, existing multimodal large language models (MLLMs) are still inferior in granular visual recognition. Contrary to previous works, we study this problem from the perspective of image resolution, and reveal that a combination of low- and high-resolution visual features can effectively mitigate this shortcoming. Based on this observation, we propose a novel and efficient method for MLLMs, termed Mixture-of-Resolution Adaptation (MRA). In particular, MRA adopts two visual pathways for images with different resolutions, where high-resolution visual information is embedded into the low-resolution pathway via the novel mixture-of-resolution adapters (MR-Adapters). This design also greatly reduces the input sequence length of MLLMs. To validate MRA, we apply it to a recent MLLM called LLaVA, and term the new model LLaVA-HR. We conduct extensive experiments on 11 vision-language (VL) tasks, which show that LLaVA-HR outperforms existing MLLMs on 8 VL tasks, e.g., +9.4% on TextVQA. More importantly, both training and inference of LLaVA-HR remain efficient with MRA, e.g., 20 training hours and 3$\times$ inference speed than LLaVA-1.5. Source codes are released at: //github.com/luogen1996/LLaVA-HR.

MoDELS · 狀態空間 · 語言模型化 · 大語言模型 · INFORMS ·

2024 年 3 月 5 日

DenseMamba: State Space Models with Dense Hidden Connection for Efficient Large Language Models

Wei He,Kai Han,Yehui Tang,Chengcheng Wang,Yujie Yang,Tianyu Guo,Yunhe Wang

Large language models (LLMs) face a daunting challenge due to the excessive computational and memory requirements of the commonly used Transformer architecture. While state space model (SSM) is a new type of foundational network architecture offering lower computational complexity, their performance has yet to fully rival that of Transformers. This paper introduces DenseSSM, a novel approach to enhance the flow of hidden information between layers in SSMs. By selectively integrating shallowlayer hidden states into deeper layers, DenseSSM retains fine-grained information crucial for the final output. Dense connections enhanced DenseSSM still maintains the training parallelizability and inference efficiency. The proposed method can be widely applicable to various SSM types like RetNet and Mamba. With similar model size, DenseSSM achieves significant improvements, exemplified by DenseRetNet outperforming the original RetNet with up to 5% accuracy improvement on public benchmarks. code is avalaible at //github.com/WailordHe/DenseSSM

MoDELS · Automator · HTTPS · CASE · Projection ·

2024 年 3 月 5 日

AlloyInEcore: Embedding of First-Order Relational Logic into Meta-Object Facility for Automated Model Reasoning

Ferhat Erata,Arda Goknil,Ivan Kurtev,Bedir Tekinerdogan

from arxiv, Published in ESEC/FSE 2018: Proceedings of the 2018 26th ACM Joint Meeting on European Software Engineering Conference and Symposium on the Foundations of Software Engineering

We present AlloyInEcore, a tool for specifying metamodels with their static semantics to facilitate automated, formal reasoning on models. Software development projects require that software systems be specified in various models (e.g., requirements models, architecture models, test models, and source code). It is crucial to reason about those models to ensure the correct and complete system specifications. AlloyInEcore allows the user to specify metamodels with their static semantics, while, using the semantics, it automatically detects inconsistent models, and completes partial models. It has been evaluated on three industrial case studies in the automotive domain (//modelwriter.github.io/AlloyInEcore/).

語言模型化 · MoDELS · 知識 (knowledge) · 大語言模型 · CC ·

2024 年 3 月 5 日

Query of CC: Unearthing Large Scale Domain-Specific Knowledge from Public Corpora

Zhaoye Fei,Yunfan Shao,Linyang Li,Zhiyuan Zeng,Conghui He,Hang Yan,Dahua Lin,Xipeng Qiu

from arxiv, We have released the full data (total of 735GB) in //huggingface.co/datasets/Query-of-CC/knowledge_pile_full and partial data (about 40GB) in //huggingface.co/datasets/Query-of-CC/knowledge_pile

Large language models have demonstrated remarkable potential in various tasks, however, there remains a significant scarcity of open-source models and data for specific domains. Previous works have primarily focused on manually specifying resources and collecting high-quality data on specific domains, which significantly consume time and effort. To address this limitation, we propose an efficient data collection method $\textit{Query of CC}$ based on large language models. This method bootstraps seed information through a large language model and retrieves related data from public corpora. It not only collects knowledge-related data for specific domains but unearths the data with potential reasoning procedures. Through the application of this method, we have curated a high-quality dataset called KNOWLEDGE PILE, encompassing four major domains, including stem and humanities sciences, among others. Experimental results demonstrate that KNOWLEDGE PILE significantly improves the performance of large language models in mathematical and knowledge-related reasoning ability tests. To facilitate academic sharing, we open-source our dataset and code, providing valuable support to the academic community.

大語言模型 · 語言模型化 · MoDELS · AI · INTERACT ·

2023 年 12 月 23 日

Towards Efficient Generative Large Language Model Serving: A Survey from Algorithms to Systems

Xupeng Miao,Gabriele Oliaro,Zhihao Zhang,Xinhao Cheng,Hongyi Jin,Tianqi Chen,Zhihao Jia

In the rapidly evolving landscape of artificial intelligence (AI), generative large language models (LLMs) stand at the forefront, revolutionizing how we interact with our data. However, the computational intensity and memory consumption of deploying these models present substantial challenges in terms of serving efficiency, particularly in scenarios demanding low latency and high throughput. This survey addresses the imperative need for efficient LLM serving methodologies from a machine learning system (MLSys) research perspective, standing at the crux of advanced AI innovations and practical system optimizations. We provide in-depth analysis, covering a spectrum of solutions, ranging from cutting-edge algorithmic modifications to groundbreaking changes in system designs. The survey aims to provide a comprehensive understanding of the current state and future directions in efficient LLM serving, offering valuable insights for researchers and practitioners in overcoming the barriers of effective LLM deployment, thereby reshaping the future of AI.