Recently, natural language generation (NLG) evaluation has shifted from a single-aspect to a multi-aspect paradigm, allowing for a more accurate assessment. Large language models (LLMs) achieve superior performance on various NLG evaluation tasks. However, current work often employs the LLM to evaluate different aspects independently, which largely ignores the rich correlation between aspects. To fill this research gap, we propose an NLG evaluation metric called CoAScore. Powered by LLMs, CoAScore utilizes multi-aspect knowledge through a CoA (Chain-of-Aspects) prompting framework when assessing the quality of a certain aspect. Specifically, for a given aspect to evaluate, we first prompt the LLM to generate a chain of aspects that are relevant to the target aspect and could be useful for the evaluation. We then collect evaluation scores for each generated aspect, and finally leverage the knowledge of these aspects to improve the evaluation of the target aspect. We evaluate CoAScore across five NLG evaluation tasks (e.g., summarization, dialog response generation, etc.) and nine aspects (e.g., overall quality, relevance, coherence, etc.). Our experimental findings highlight that, in comparison to individual aspect evaluation, CoAScore exhibits a higher correlation with human judgments. With this improvement, CoAScore significantly outperforms existing unsupervised evaluation metrics, whether for assessing overall quality or other aspects. We also conducted extensive ablation studies to validate the effectiveness of the three stages within the CoAScore framework, and case studies to show how the LLM performs in these stages. Our code and scripts are available.
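The three stages described in this abstract (generate a chain of related aspects, score each one, then score the target aspect with those scores as context) can be sketched roughly as follows. This is a minimal illustration, not the authors' released code: the `llm` callable, the prompt wording, the 1 to 5 scale, and the `coa_score` name are all assumptions made for the example.

```python
from typing import Callable, Dict, List


def coa_score(llm: Callable[[str], str], source: str, output: str,
              target_aspect: str, n_aspects: int = 3) -> float:
    """Three-stage Chain-of-Aspects scoring with hypothetical prompts."""
    # Stage 1: ask for a chain of aspects relevant to the target aspect.
    chain_reply = llm(
        f"List {n_aspects} aspects that help judge '{target_aspect}', one per line."
    )
    aspects: List[str] = [a.strip() for a in chain_reply.splitlines() if a.strip()]
    aspects = aspects[:n_aspects]

    # Stage 2: score each generated aspect independently.
    aspect_scores: Dict[str, float] = {}
    for aspect in aspects:
        reply = llm(
            f"Rate the {aspect} of the output from 1 to 5. Reply with a number.\n"
            f"Source: {source}\nOutput: {output}"
        )
        aspect_scores[aspect] = float(reply.strip())

    # Stage 3: score the target aspect with the related scores as context.
    context = "\n".join(f"- {a}: {s}" for a, s in aspect_scores.items())
    final_reply = llm(
        f"Scores on related aspects:\n{context}\n"
        f"Rate the {target_aspect} of the output from 1 to 5. Reply with a number.\n"
        f"Source: {source}\nOutput: {output}"
    )
    return float(final_reply.strip())
```

Any function that maps a prompt string to a reply string can be passed as `llm`; a real deployment would also need more robust parsing of the numeric replies than the bare `float()` used here.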

Related content

Large language models

Large language models are deep learning models trained on massive amounts of text data. They can not only generate natural language text but also understand its meaning in depth and handle a variety of natural language tasks, such as text summarization, question answering, and translation. In 2023, large language models and their applications in artificial intelligence became a focus of global technology research. Their growth in scale has been especially striking, with parameter counts jumping from the initial billion-plus to one trillion today. The increase in parameters allows the models to capture the subtleties of human language more finely and to understand its complexity more deeply. Over the past year, large language models have improved markedly at absorbing new knowledge, decomposing complex tasks, and aligning images with text. As the technology matures, it will continue to expand its range of applications, providing people with more intelligent and personalized services and further improving how they live and work.

Biased · Language modeling · Large language models · MoDELS · Identifiable

February 6, 2024

The Instinctive Bias: Spurious Images lead to Hallucination in MLLMs

Tianyang Han,Qing Lian,Rui Pan,Renjie Pi,Jipeng Zhang,Shizhe Diao,Yong Lin,Tong Zhang

Large language models (LLMs) have recently experienced remarkable progress, where the advent of multi-modal large language models (MLLMs) has endowed LLMs with visual capabilities, leading to impressive performances in various multi-modal tasks. However, those powerful MLLMs such as GPT-4V still fail spectacularly when presented with certain image and text inputs. In this paper, we identify a typical class of inputs that baffles MLLMs, which consist of images that are highly relevant but inconsistent with answers, causing MLLMs to suffer from hallucination. To quantify the effect, we propose CorrelationQA, the first benchmark that assesses the hallucination level given spurious images. This benchmark contains 7,308 text-image pairs across 13 categories. Based on the proposed CorrelationQA, we conduct a thorough analysis on 9 mainstream MLLMs, illustrating that they universally suffer from this instinctive bias to varying degrees. We hope that our curated benchmark and evaluation results aid in better assessments of the MLLMs' robustness in the presence of misleading images. The resource is available in //github.com/MasaiahHan/CorrelationQA.

Graph · MoDELS · Performer · Graph Transformer · Integration

February 6, 2024

Pard: Permutation-Invariant Autoregressive Diffusion for Graph Generation

Lingxiao Zhao,Xueying Ding,Leman Akoglu

from arxiv, Diffusion Model on Graphs

Graph generation has been dominated by autoregressive models due to their simplicity and effectiveness, despite their sensitivity to ordering. Yet diffusion models have garnered increasing attention, as they offer comparable performance while being permutation-invariant. Current graph diffusion models generate graphs in a one-shot fashion, but they require extra features and thousands of denoising steps to achieve optimal performance. We introduce PARD, a Permutation-invariant Auto Regressive Diffusion model that integrates diffusion models with autoregressive methods. PARD harnesses the effectiveness and efficiency of the autoregressive model while maintaining permutation invariance without ordering sensitivity. Specifically, we show that contrary to sets, elements in a graph are not entirely unordered and there is a unique partial order for nodes and edges. With this partial order, PARD generates a graph in a block-by-block, autoregressive fashion, where each block's probability is conditionally modeled by a shared diffusion model with an equivariant network. To ensure efficiency while being expressive, we further propose a higher-order graph transformer, which integrates transformer with PPGN. Like GPT, we extend the higher-order graph transformer to support parallel training of all blocks. Without any extra features, PARD achieves state-of-the-art performance on molecular and non-molecular datasets, and scales to large datasets like MOSES containing 1.9M molecules.

Conformer · MoDELS · Language modeling · Large language models · Reducible

February 5, 2024

C-RAG: Certified Generation Risks for Retrieval-Augmented Language Models

Mintong Kang,Nezihe Merve Gürel,Ning Yu,Dawn Song,Bo Li

Despite the impressive capabilities of large language models (LLMs) across diverse applications, they still suffer from trustworthiness issues, such as hallucinations and misalignments. Retrieval-augmented language models (RAG) have been proposed to enhance the credibility of generations by grounding external knowledge, but the theoretical understanding of their generation risks remains unexplored. In this paper, we answer: 1) whether RAG can indeed lead to low generation risks, 2) how to provide provable guarantees on the generation risks of RAG and vanilla LLMs, and 3) what sufficient conditions enable RAG models to reduce generation risks. We propose C-RAG, the first framework to certify generation risks for RAG models. Specifically, we provide conformal risk analysis for RAG models and certify an upper confidence bound of generation risks, which we refer to as conformal generation risk. We also provide theoretical guarantees on conformal generation risks for general bounded risk functions under test distribution shifts. We prove that RAG achieves a lower conformal generation risk than that of a single LLM when the quality of the retrieval model and transformer is non-trivial. Our intensive empirical results demonstrate the soundness and tightness of our conformal generation risk guarantees across four widely-used NLP datasets on four state-of-the-art retrieval models.
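To make the idea of certifying "an upper confidence bound of generation risks" concrete, here is a small sketch that swaps in the classical Hoeffding inequality, which is not the conformal analysis used in C-RAG. It assumes i.i.d. calibration samples and a risk function bounded in [0, 1], and it does not handle the distribution shifts the abstract mentions. The function name and the sample data are hypothetical.

```python
import math
from typing import Sequence


def hoeffding_risk_upper_bound(risks: Sequence[float], delta: float = 0.05) -> float:
    """Upper bound on the expected risk that holds with probability >= 1 - delta.

    Assumes the per-sample risks are i.i.d. and lie in [0, 1]
    (a simplification; not the bound derived in the paper).
    """
    if not risks:
        raise ValueError("need at least one calibration sample")
    if not 0.0 < delta < 1.0:
        raise ValueError("delta must be in (0, 1)")
    if any(r < 0.0 or r > 1.0 for r in risks):
        raise ValueError("risks must lie in [0, 1]")
    n = len(risks)
    empirical_risk = sum(risks) / n
    # Hoeffding: P(true_risk > empirical_risk + t) <= exp(-2 * n * t^2).
    slack = math.sqrt(math.log(1.0 / delta) / (2.0 * n))
    return min(1.0, empirical_risk + slack)


# Hypothetical calibration risks for a vanilla LLM and a retrieval-augmented one.
vanilla_bound = hoeffding_risk_upper_bound([0.6] * 200)
rag_bound = hoeffding_risk_upper_bound([0.3] * 200)
```

With the same number of calibration samples the slack term is identical, so the lower empirical risk of the retrieval-augmented system translates directly into a lower certified bound in this toy setting.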

MoDELS · Sample · state-of-the-art · Denoising · Point cloud

February 5, 2024

DexDiffuser: Generating Dexterous Grasps with Diffusion Models

Zehang Weng,Haofei Lu,Danica Kragic,Jens Lundell

from arxiv, 8 pages

We introduce DexDiffuser, a novel dexterous grasping method that generates, evaluates, and refines grasps on partial object point clouds. DexDiffuser includes the conditional diffusion-based grasp sampler DexSampler and the dexterous grasp evaluator DexEvaluator. DexSampler generates high-quality grasps conditioned on object point clouds by iterative denoising of randomly sampled grasps. We also introduce two grasp refinement strategies: Evaluator-Guided Diffusion (EGD) and Evaluator-based Sampling Refinement (ESR). Our simulation and real-world experiments on the Allegro Hand consistently demonstrate that DexDiffuser outperforms the state-of-the-art multi-finger grasp generation method FFHNet, with a grasp success rate that is, on average, 21.71-22.20% higher.

Large language models · MoDELS · Inference · API · state-of-the-art

February 2, 2024

APIServe: Efficient API Support for Large-Language Model Inferencing

Reyna Abhyankar,Zijian He,Vikranth Srivatsa,Hao Zhang,Yiying Zhang

Large language models are increasingly integrated with external tools and APIs like ChatGPT plugins to extend their capability beyond language-centric tasks. However, today's LLM inference systems are designed for standalone LLMs. They treat API calls as new requests, causing unnecessary recomputation of already computed contexts, which accounts for 37-40% of total model forwarding time. This paper presents APIServe, the first LLM inference framework targeting API-augmented LLMs. APIServe minimizes the GPU resource waste caused by API calls and dedicates saved memory for serving more requests. APIServe improves the overall serving throughput by 1.6x and completes 2x more requests per second compared to the state-of-the-art LLM inference systems.

tuning · Large language models · MoDELS · Language modeling · Generalization theory

February 2, 2024

LLM-Detector: Improving AI-Generated Chinese Text Detection with Open-Source LLM Instruction Tuning

Rongsheng Wang,Haoming Chen,Ruizhe Zhou,Han Ma,Yaofei Duan,Yanlan Kang,Songhua Yang,Baoyu Fan,Tao Tan

from arxiv, 17 pages, 13 tables, 7 figures

ChatGPT and other general large language models (LLMs) have achieved remarkable success, but they have also raised concerns about the misuse of AI-generated texts. Existing AI-generated text detection models, such as those based on BERT and RoBERTa, are prone to in-domain over-fitting, leading to poor out-of-domain (OOD) detection performance. In this paper, we first collected Chinese text responses generated by human experts and 9 types of LLMs in response to questions from multiple domains, and further created a dataset that mixed human-written sentences and sentences polished by LLMs. We then proposed LLM-Detector, a novel method for both document-level and sentence-level text detection through Instruction Tuning of LLMs. Our method leverages the wealth of knowledge LLMs acquire during pre-training, enabling them to detect the text they generate. Instruction tuning aligns the model's responses with the user's expected text detection tasks. Experimental results show that previous methods struggle with sentence-level AI-generated text detection and OOD detection. In contrast, our proposed method not only significantly outperforms baseline methods in both sentence-level and document-level text detection but also demonstrates strong generalization capabilities. Furthermore, since LLM-Detector is trained based on open-source LLMs, it is easy to customize for deployment.

Large language models · Language modeling · MoDELS · Graph · INFORMS

February 1, 2024

Large Language Models on Graphs: A Comprehensive Survey

Bowen Jin,Gang Liu,Chi Han,Meng Jiang,Heng Ji,Jiawei Han

from arxiv, 24 pages

Large language models (LLMs), such as GPT4 and LLaMA, are creating significant advancements in natural language processing, due to their strong text encoding/decoding ability and newly found emergent capability (e.g., reasoning). While LLMs are mainly designed to process pure texts, there are many real-world scenarios where text data is associated with rich structure information in the form of graphs (e.g., academic networks, and e-commerce networks) or scenarios where graph data is paired with rich textual information (e.g., molecules with descriptions). Besides, although LLMs have shown their pure text-based reasoning ability, it is underexplored whether such ability can be generalized to graphs (i.e., graph-based reasoning). In this paper, we provide a systematic review of scenarios and techniques related to large language models on graphs. We first summarize potential scenarios of adopting LLMs on graphs into three categories, namely pure graphs, text-attributed graphs, and text-paired graphs. We then discuss detailed techniques for utilizing LLMs on graphs, including LLM as Predictor, LLM as Encoder, and LLM as Aligner, and compare the advantages and disadvantages of different schools of models. Furthermore, we discuss the real-world applications of such methods and summarize open-source codes and benchmark datasets. Finally, we conclude with potential future research directions in this fast-growing field. The related source can be found at //github.com/PeterGriffinJin/Awesome-Language-Model-on-Graphs.

Performer · MoDELS · Inference · Mixture of experts · Language modeling

February 1, 2024

BlackMamba: Mixture of Experts for State-Space Models

Quentin Anthony,Yury Tokpanov,Paolo Glorioso,Beren Millidge

State-space models (SSMs) have recently demonstrated performance competitive with transformers on large-scale language modeling benchmarks while achieving linear time and memory complexity as a function of sequence length. Mamba, a recently released SSM, shows impressive performance in both language modeling and long sequence processing tasks. Simultaneously, mixture-of-expert (MoE) models have shown remarkable performance while significantly reducing the compute and latency costs of inference at the expense of a larger memory footprint. In this paper, we present BlackMamba, a novel architecture that combines the Mamba SSM with MoE to obtain the benefits of both. We demonstrate that BlackMamba performs competitively against both Mamba and transformer baselines, and outperforms them in inference and training FLOPs. We fully train and open-source 340M/1.5B and 630M/2.8B BlackMamba models on 300B tokens of a custom dataset. We show that BlackMamba inherits the benefits of both SSM and MoE architectures, combining linear-complexity generation from SSM with cheap and fast inference from MoE. We release all weights, checkpoints, and inference code open-source. Inference code at: //github.com/Zyphra/BlackMamba

Language modeling · Performer · Agent · MoDELS · Learning

May 19, 2023

Introspective Tips: Large Language Model for In-Context Decision Making

Liting Chen,Lu Wang,Hang Dong,Yali Du,Jie Yan,Fangkai Yang,Shuang Li,Pu Zhao,Si Qin,Saravan Rajmohan,Qingwei Lin,Dongmei Zhang

from arxiv, 22 pages, 4 figures

The emergence of large language models (LLMs) has substantially influenced natural language processing, demonstrating exceptional results across various tasks. In this study, we employ "Introspective Tips" to facilitate LLMs in self-optimizing their decision-making. By introspectively examining trajectories, the LLM refines its policy by generating succinct and valuable tips. Our method enhances the agent's performance in both few-shot and zero-shot learning situations by considering three essential scenarios: learning from the agent's past experiences, integrating expert demonstrations, and generalizing across diverse games. Importantly, we accomplish these improvements without fine-tuning the LLM parameters; rather, we adjust the prompt to generalize insights from the three aforementioned situations. Our framework not only supports but also emphasizes the advantage of employing LLMs in in-context decision-making. Experiments involving over 100 games in TextWorld illustrate the superior performance of our approach.

Language representation · Knowledge neurons · MoDELS · Graph · Knowledge graph

September 17, 2019

K-BERT: Enabling Language Representation with Knowledge Graph

Weijie Liu,Peng Zhou,Zhe Zhao,Zhiruo Wang,Qi Ju,Haotang Deng,Ping Wang

from arxiv, 8 pages, 20190917

Pre-trained language representation models, such as BERT, capture a general language representation from large-scale corpora, but lack domain-specific knowledge. When reading a domain text, experts make inferences with relevant knowledge. For machines to achieve this capability, we propose a knowledge-enabled language representation model (K-BERT) with knowledge graphs (KGs), in which triples are injected into the sentences as domain knowledge. However, too much knowledge incorporation may divert the sentence from its correct meaning, which is called the knowledge noise (KN) issue. To overcome KN, K-BERT introduces soft-position and a visible matrix to limit the impact of knowledge. K-BERT can easily inject domain knowledge into the models by being equipped with a KG, without pre-training by itself, because it is capable of loading model parameters from the pre-trained BERT. Our investigation reveals promising results in twelve NLP tasks. Especially in domain-specific tasks (including finance, law, and medicine), K-BERT significantly outperforms BERT, which demonstrates that K-BERT is an excellent choice for solving knowledge-driven problems that require experts.
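The soft-position and visible-matrix idea mentioned in this abstract can be illustrated with a simplified sketch. It assumes single-token entities, triples supplied as a plain dictionary, and a boolean matrix instead of additive attention masks. The function name and data layout are hypothetical and are not taken from the authors' implementation.

```python
from typing import Dict, List, Tuple


def inject_triples(tokens: List[str],
                   triples: Dict[str, List[Tuple[str, str]]]):
    """Flatten a sentence plus KG branches; return tokens, soft positions, visibility."""
    flat: List[str] = []
    soft_pos: List[int] = []
    branch_id: List[int] = []   # -1 marks a token of the original sentence
    anchor: List[int] = []      # flat index of the entity a branch hangs on
    next_branch = 0
    for pos, tok in enumerate(tokens):
        trunk_index = len(flat)
        flat.append(tok)
        soft_pos.append(pos)
        branch_id.append(-1)
        anchor.append(-1)
        for relation, obj in triples.get(tok, []):
            # Branch tokens continue the position count from their entity.
            for offset, t in enumerate((relation, obj), start=1):
                flat.append(t)
                soft_pos.append(pos + offset)
                branch_id.append(next_branch)
                anchor.append(trunk_index)
            next_branch += 1

    n = len(flat)
    visible = [[False] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            if branch_id[i] == -1 and branch_id[j] == -1:
                visible[i][j] = True                 # sentence tokens see each other
            elif branch_id[i] == branch_id[j]:
                visible[i][j] = True                 # same injected branch
            elif branch_id[i] != -1 and branch_id[j] == -1:
                visible[i][j] = anchor[i] == j       # branch sees only its entity
            elif branch_id[i] == -1 and branch_id[j] != -1:
                visible[i][j] = anchor[j] == i       # entity sees its own branch
    return flat, soft_pos, visible


flat, soft_pos, visible = inject_triples(
    ["Cook", "is", "visiting", "Beijing", "now"],
    {"Cook": [("CEO", "Apple")], "Beijing": [("capital", "China")]},
)
```

In this toy example "Apple" is visible to "Cook" but not to "visiting" or "China", which is how the injected knowledge is kept from altering the meaning of unrelated parts of the sentence.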