
Large language models (LLMs) enhanced by retrieval-augmented generation (RAG) provide effective solutions in various application scenarios. However, developers face challenges in integrating RAG-enhanced LLMs into software systems, due to the lack of interface specifications, the requirements imposed by the software context, and complicated system management. In this paper, we manually studied 100 open-source applications that incorporate RAG-enhanced LLMs, along with their issue reports. We found that more than 98% of the applications contain multiple integration defects that harm software functionality, efficiency, and security. We also generalized 19 defect patterns and proposed guidelines for tackling them. We hope this work can aid LLM-enabled software development and motivate future research.
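To make the kind of integration point this study examines concrete, below is a minimal sketch of a RAG-enhanced LLM call site with a couple of defensive checks. The retriever and LLM client are hypothetical placeholders passed in as callables, not the API of any specific framework, and the checks are only illustrative of the functionality and efficiency concerns discussed above.

```python
# Minimal, illustrative RAG call site. `retrieve` and `generate` are
# hypothetical stand-ins for a retriever and an LLM client.
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class RagConfig:
    top_k: int = 4                  # how many retrieved chunks to include
    max_context_chars: int = 4000   # guard against overlong prompts


def answer_with_rag(
    question: str,
    retrieve: Callable[[str, int], List[str]],
    generate: Callable[[str], str],
    cfg: RagConfig = RagConfig(),
) -> str:
    """Retrieve context, build a prompt, and call the LLM."""
    chunks = retrieve(question, cfg.top_k)
    if not chunks:
        # Fall back explicitly instead of silently prompting without context.
        return generate(f"Answer from general knowledge: {question}")
    context = "\n\n".join(chunks)[: cfg.max_context_chars]
    prompt = f"Context:\n{context}\n\nQuestion: {question}\nAnswer:"
    return generate(prompt)
```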

Related content


The latest regularized Neural Radiance Field (NeRF) approaches produce poor geometry and view extrapolation for large-scale sparse-view scenes such as ETH3D. Density-based approaches tend to be under-constrained, while surface-based approaches tend to miss details. In this paper, we take a density-based approach, sampling patches instead of individual rays so as to better incorporate monocular depth and normal estimates, as well as patch-based photometric consistency constraints between training views and sampled virtual views. Loosely constraining densities based on estimated depth aligned to sparse points further improves geometric accuracy. While maintaining similar view synthesis quality, our approach significantly improves geometric accuracy on the ETH3D benchmark, e.g., increasing the F1@2cm score by 4x-8x compared to other regularized density-based approaches, with much lower training and inference time than other approaches.
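As a minimal sketch of the depth-alignment step described above, the snippet below fits a per-image scale and shift that maps a monocular depth map onto sparse (e.g., SfM) depths, then applies a "loose" penalty that only activates outside a tolerance band. The array names, tolerance, and loss form are illustrative assumptions, not the paper's implementation.

```python
# Align monocular depth to sparse depths with a least-squares scale/shift,
# then loosely constrain rendered depth against the aligned estimate.
import numpy as np


def align_depth(mono_depth: np.ndarray, sparse_depth: np.ndarray, mask: np.ndarray):
    """Solve min_{s,t} || s * d_mono + t - d_sparse ||^2 over pixels with
    sparse depth, then return the aligned monocular depth map."""
    d = mono_depth[mask].reshape(-1)
    z = sparse_depth[mask].reshape(-1)
    A = np.stack([d, np.ones_like(d)], axis=1)       # [N, 2]
    (s, t), *_ = np.linalg.lstsq(A, z, rcond=None)   # scale and shift
    return s * mono_depth + t


def loose_depth_loss(rendered_depth, aligned_depth, tol=0.05):
    """Penalize rendered depth only when it strays beyond a tolerance band
    around the aligned estimate (a loose, not hard, constraint)."""
    err = np.abs(rendered_depth - aligned_depth)
    return np.mean(np.maximum(err - tol * aligned_depth, 0.0))
```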

In recent years, large language models (LLMs) have garnered significant attention due to their superior performance on complex reasoning tasks. However, recent studies show that their reasoning capabilities diminish markedly when problem descriptions contain irrelevant information, even with the use of advanced prompting techniques. To further investigate this issue, a dataset of primary school mathematics problems containing irrelevant information, named GSMIR, was constructed. Testing prominent LLMs and prompting techniques on this dataset revealed that while LLMs can identify irrelevant information, they do not effectively mitigate the interference it causes once it has been identified. To address this shortcoming, a novel automatic construction method, ATF, is proposed to enhance the ability of LLMs to identify and self-mitigate the influence of irrelevant information. The method operates in two steps: first, analysis of the irrelevant information, followed by its filtering. Experimental results demonstrate that ATF significantly improves the reasoning performance of LLMs and prompting techniques on the GSMIR dataset, even in the presence of irrelevant information.
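The two-step structure (analyze, then filter) can be sketched as a simple prompting pipeline. The snippet below is illustrative only: `call_llm` is a hypothetical stand-in for any chat-completion client, and the prompt wording is an assumption rather than the paper's exact templates.

```python
# Illustrative two-stage pipeline in the spirit of ATF: analyze irrelevant
# information, filter it out, then solve the cleaned problem.
from typing import Callable


def atf_solve(problem: str, call_llm: Callable[[str], str]) -> str:
    # Step 1: analysis of irrelevant information.
    analysis = call_llm(
        "Identify any sentences in the following math problem that are "
        f"irrelevant to answering the question:\n{problem}"
    )
    # Step 2: filtering, then solve on the cleaned problem.
    filtered = call_llm(
        "Rewrite the problem below, removing the irrelevant sentences listed "
        f"in the analysis.\nProblem:\n{problem}\nAnalysis:\n{analysis}"
    )
    return call_llm(f"Solve step by step:\n{filtered}")
```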

The Mixture of Experts (MoE) framework has become a popular architecture for large language models due to its superior performance over dense models. However, training MoEs from scratch in a large-scale regime is prohibitively expensive. Existing methods mitigate this by pre-training multiple dense expert models independently and using them to initialize an MoE. This is done by using the experts' feed-forward network (FFN) weights to initialize the MoE's experts while merging the other parameters. However, this approach limits the reuse of dense model parameters to only the FFN layers, thereby constraining the advantages of "upcycling" these models into MoEs. We propose BAM (Branch-Attend-Mix), a simple yet effective method that addresses this shortcoming. BAM makes full use of specialized dense models by not only using their FFNs to initialize the MoE layers but also leveraging the experts' attention parameters fully, initializing them into a soft variant of Mixture of Attention (MoA) layers. We explore two methods for upcycling attention parameters: 1) initializing separate attention experts from the dense models, including all attention parameters, for the best model performance; and 2) sharing key and value parameters across all experts for better inference efficiency. To further improve efficiency, we adapt a parallel attention transformer architecture for MoEs, which allows the attention experts and FFN experts to be computed concurrently. Our experiments on seed models ranging from 590 million to 2 billion parameters demonstrate that BAM surpasses baselines in both perplexity and downstream task performance, within the same computational and data constraints.
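The upcycling step can be pictured as copying weights from several dense checkpoints into per-expert modules. The sketch below uses simplified placeholder modules (not the paper's code) and approximates the shared key/value variant by reusing a single attention module across experts; treat it as an assumption-laden illustration of the initialization pattern only.

```python
# Sketch: initialize per-expert FFN and attention modules from dense models.
import copy
import torch.nn as nn


class DenseBlock(nn.Module):
    """Stand-in for one transformer block of a specialized dense model."""
    def __init__(self, d=64):
        super().__init__()
        self.attn = nn.MultiheadAttention(d, num_heads=4, batch_first=True)
        self.ffn = nn.Sequential(nn.Linear(d, 4 * d), nn.GELU(), nn.Linear(4 * d, d))


def upcycle(dense_blocks, share_kv: bool = False):
    """Copy FFNs into FFN experts and attention weights into attention experts.

    share_kv=True loosely mirrors the cheaper variant that shares key/value
    parameters across experts (approximated here by reusing one attention
    module for all experts).
    """
    ffn_experts = nn.ModuleList(copy.deepcopy(b.ffn) for b in dense_blocks)
    if share_kv:
        shared = copy.deepcopy(dense_blocks[0].attn)
        attn_experts = nn.ModuleList(shared for _ in dense_blocks)
    else:
        attn_experts = nn.ModuleList(copy.deepcopy(b.attn) for b in dense_blocks)
    return ffn_experts, attn_experts


experts = [DenseBlock() for _ in range(4)]
ffn_experts, attn_experts = upcycle(experts, share_kv=True)
```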

We develop a large language model (LLM) based automatic speech recognition (ASR) system that can be contextualized by providing keywords as prior information in text prompts. We adopt a decoder-only architecture, using our in-house LLM, PLaMo-100B, pre-trained from scratch on datasets dominated by Japanese and English text, as the decoder. We adopt a pre-trained Whisper encoder as the audio encoder; the audio embeddings from the audio encoder are projected into the text embedding space by an adapter layer and concatenated with the text embeddings converted from the text prompts to form the inputs to the decoder. By providing keywords as prior information in the text prompts, we can contextualize our LLM-based ASR system, without modifying the model architecture, to accurately transcribe ambiguous words in the input audio. Experimental results demonstrate that providing keywords to the decoder can significantly improve the recognition performance of rare and ambiguous words.
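The input construction described above can be sketched in a few lines: audio features are mapped into the decoder's text embedding space by a small adapter and concatenated with the embeddings of a keyword prompt. The dimensions and module names below are illustrative assumptions, not PLaMo or Whisper APIs.

```python
# Sketch: project audio features into the text embedding space and prepend a
# tokenized keyword prompt before feeding the LLM decoder.
import torch
import torch.nn as nn

d_audio, d_text, vocab = 512, 1024, 32000
adapter = nn.Linear(d_audio, d_text)           # adapter layer (audio -> text space)
text_embed = nn.Embedding(vocab, d_text)       # decoder's token embedding table

audio_feats = torch.randn(1, 300, d_audio)     # e.g., speech-encoder output frames
prompt_ids = torch.randint(0, vocab, (1, 12))  # tokenized keyword prompt

decoder_inputs = torch.cat(
    [text_embed(prompt_ids), adapter(audio_feats)], dim=1
)  # [1, 12 + 300, d_text], consumed by the decoder, conditioned on the keywords
```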


Large language models (LLMs) have exhibited remarkable proficiency across a diverse array of natural language processing (NLP) tasks. However, adapting LLMs to downstream applications typically necessitates computationally intensive and memory-demanding fine-tuning procedures. To mitigate these burdens, parameter-efficient fine-tuning (PEFT) techniques have emerged as a promising approach to tailor LLMs with minimal computational overhead. While PEFT methods offer substantial advantages, they do not fully address the pervasive issue of bias propagation from pre-training data. In this work, we introduce Bias-Aware Low-Rank Adaptation (BA-LoRA), a novel PEFT method designed to counteract bias inheritance. BA-LoRA incorporates three distinct regularization terms: (1) a consistency regularizer, (2) a diversity regularizer, and (3) a singular value decomposition regularizer. These regularizers collectively aim to improve the generative models' consistency, diversity, and generalization capabilities during the fine-tuning process. Through extensive experiments on a variety of natural language understanding (NLU) and natural language generation (NLG) tasks, employing prominent LLMs such as LLaMA, Mistral, and Gemma, we demonstrate that BA-LoRA surpasses the performance of LoRA and its state-of-the-art variants. Moreover, our method effectively mitigates the deleterious effects of pre-training bias, leading to more reliable and robust model outputs. The code is available at //github.com/cyp-jlu-ai/BA-LoRA.
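The abstract names three regularizers but not their formulas, so the sketch below is only one plausible way such terms could be attached to a LoRA fine-tuning loss: a KL term to the base model for consistency, an entropy-based term for diversity, and a nuclear-norm penalty on the low-rank update for the SVD-related term. All three concrete forms are assumptions for illustration, not BA-LoRA's definitions.

```python
# Hedged sketch: add three auxiliary regularizers to a fine-tuning loss.
import torch
import torch.nn.functional as F


def ba_lora_style_loss(task_loss, logits, base_logits, lora_A, lora_B,
                       lam_c=0.1, lam_d=0.01, lam_s=0.001):
    # (1) consistency: keep the fine-tuned distribution close to the base model's
    consistency = F.kl_div(
        F.log_softmax(logits, dim=-1), F.softmax(base_logits, dim=-1),
        reduction="batchmean",
    )
    # (2) diversity: discourage collapsed, low-entropy output distributions
    probs = F.softmax(logits, dim=-1)
    diversity = (probs * probs.clamp_min(1e-9).log()).sum(dim=-1).mean()
    # (3) SVD-related term: penalize the nuclear norm of the low-rank update B @ A
    svd_reg = torch.linalg.matrix_norm(lora_B @ lora_A, ord="nuc")
    return task_loss + lam_c * consistency + lam_d * diversity + lam_s * svd_reg
```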

Varied approaches for aligning language models have been proposed, including supervised fine-tuning, RLHF, and direct optimization methods such as DPO. Although DPO has rapidly gained popularity due to its straightforward training process and competitive results, it remains an open question whether there are practical advantages to using a discriminator, such as a reward model, to evaluate responses. We propose D2PO, discriminator-guided DPO, an approach for the online setting where preferences are collected throughout learning. As we collect gold preferences, we use them not only to train our policy but also to train a discriminative response evaluation model that silver-labels additional synthetic data for policy training. Exploring this approach across a set of diverse tasks, including a realistic chat setting, we find that it leads to higher-quality outputs than DPO with the same data budget, and to greater efficiency in terms of preference data requirements. Furthermore, we show the conditions under which silver labeling is most helpful: it is most effective when training the policy with DPO, outperforming traditional PPO, and it benefits from maintaining a discriminator separate from the policy model.
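The silver-labeling step can be pictured as follows: a discriminator trained on gold preferences scores pairs of policy samples, and the higher-scoring response is treated as the preferred one for additional DPO training data. The sampling and scoring functions below are hypothetical stand-ins, so this is only a sketch of the loop's data flow, not the authors' training code.

```python
# Sketch: use a trained discriminator to silver-label synthetic preference pairs.
from typing import Callable, List, Tuple


def silver_label(
    prompts: List[str],
    sample_pair: Callable[[str], Tuple[str, str]],   # policy generates two responses
    score: Callable[[str, str], float],              # discriminator's reward for (prompt, response)
) -> List[Tuple[str, str, str]]:
    """Return (prompt, chosen, rejected) triples for further DPO training."""
    triples = []
    for p in prompts:
        a, b = sample_pair(p)
        chosen, rejected = (a, b) if score(p, a) >= score(p, b) else (b, a)
        triples.append((p, chosen, rejected))
    return triples
```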

This paper presents and evaluates a new retrieval-augmented generation (RAG) and large language model (LLM)-based artificial intelligence (AI) technique: rubric-enabled generative artificial intelligence (REGAI). REGAI uses rubrics, which can be created manually or automatically by the system, to enhance the performance of LLMs for evaluation purposes. REGAI improves on the performance of both classical LLMs and RAG-based LLM techniques. This paper describes REGAI, presents data regarding its performance, and discusses several possible application areas for the technology.
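To illustrate the general idea of rubric-conditioned evaluation, the snippet below folds a rubric's criteria into the prompt sent to an LLM judge. The rubric format and `call_llm` client are assumptions made for the sketch, not REGAI's actual interface.

```python
# Sketch: grade an answer against a rubric via an LLM judge.
from typing import Callable, Dict


def evaluate_with_rubric(answer: str, rubric: Dict[str, str],
                         call_llm: Callable[[str], str]) -> str:
    criteria = "\n".join(f"- {name}: {desc}" for name, desc in rubric.items())
    prompt = (
        "Grade the answer below against each rubric criterion, then give an "
        f"overall score.\nRubric:\n{criteria}\n\nAnswer:\n{answer}"
    )
    return call_llm(prompt)
```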

The rapid advancement and widespread deployment of foundation model (FM) based systems have revolutionized numerous applications across various domains. However, their fast-growing capabilities and autonomy have also raised significant concerns about responsible AI and AI safety. Recently, there has been increasing attention toward implementing guardrails to ensure that the runtime behavior of FM-based systems is safe and responsible. Given the early stage of FMs and their applications (such as agents), the design of guardrails has not yet been systematically studied. It remains underexplored which software qualities should be considered when designing guardrails and how these qualities can be ensured from a software architecture perspective. Therefore, in this paper, we present a taxonomy for guardrails to classify and compare their characteristics and design options. Our taxonomy is organized into three main categories: the motivation behind adopting runtime guardrails, the quality attributes to consider, and the design options available. It provides structured and concrete guidance for making architectural design decisions when designing guardrails and highlights the trade-offs arising from those decisions.

Contemporary large language models (LLMs) primarily rely on next-token prediction for inference, which significantly limits their processing speed. In this paper, we introduce a novel inference methodology termed next-sentence prediction, aimed at enhancing the inference efficiency of LLMs. We present the Sentence Variational Autoencoder (SentenceVAE), a small model consisting of a Sentence Encoder and a Sentence Decoder. The encoder condenses the information within a sentence into a single token, while the decoder reconstructs this compressed representation back into its original sentential form. By integrating SentenceVAE into the input and output layers of LLMs, we develop Sentence-level LLMs (SLLMs) that employ a sentence-by-sentence inference approach, markedly accelerating inference. SentenceVAE also maintains the integrity of the original semantic content by segmenting the text into sentences, thereby improving accuracy while boosting inference speed. Compared to published LLMs, SLLMs process fewer tokens over equivalent context lengths, significantly reducing the memory demands of self-attention computation and facilitating the handling of longer contexts. Our experimental findings on the Wanjuan dataset reveal that this method accelerates inference by 204~365%, reduces perplexity (PPL) to 46~75% of its original value, and decreases memory overhead by 86~91% for the same context length, compared to the token-by-token method. Moreover, the benefits of this approach become even more pronounced as model parameters increase.
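A skeleton of the sentence-level compression idea is sketched below: an encoder pools a sentence's token embeddings into a single "sentence token", and a decoder expands it back into per-position token logits. Layer counts, pooling choice, and query-based decoding are illustrative assumptions rather than the authors' released architecture.

```python
# Skeleton of a SentenceVAE-style encoder/decoder pair (illustrative only).
import torch
import torch.nn as nn


class SentenceEncoder(nn.Module):
    def __init__(self, d_model=512):
        super().__init__()
        layer = nn.TransformerEncoderLayer(d_model, nhead=8, batch_first=True)
        self.encoder = nn.TransformerEncoder(layer, num_layers=2)

    def forward(self, token_embs):              # [B, T, D] -> [B, 1, D]
        h = self.encoder(token_embs)
        return h.mean(dim=1, keepdim=True)      # pool tokens into one sentence token


class SentenceDecoder(nn.Module):
    def __init__(self, d_model=512, vocab=32000, max_len=64):
        super().__init__()
        self.queries = nn.Parameter(torch.randn(max_len, d_model))
        layer = nn.TransformerDecoderLayer(d_model, nhead=8, batch_first=True)
        self.decoder = nn.TransformerDecoder(layer, num_layers=2)
        self.lm_head = nn.Linear(d_model, vocab)

    def forward(self, sentence_token):          # [B, 1, D] -> [B, max_len, vocab]
        q = self.queries.unsqueeze(0).expand(sentence_token.size(0), -1, -1)
        return self.lm_head(self.decoder(q, sentence_token))
```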
