This paper introduces the Decomposed Requirements Following Ratio (DRFR), a new metric for evaluating Large Language Models' (LLMs) ability to follow instructions. Addressing a gap in current methodologies, DRFR breaks down complex instructions into simpler criteria, facilitating a detailed analysis of LLMs' compliance with various aspects of tasks. Alongside this metric, we present InFoBench, a benchmark comprising 500 diverse instructions and 2,250 decomposed questions across multiple constraint categories. Our experiments compare DRFR with traditional scoring methods and explore annotation sources, including human experts, crowd-sourced workers, and GPT-4. The findings demonstrate DRFR's higher reliability and the effectiveness of using GPT-4 as a cost-efficient annotator. The evaluation of several advanced LLMs using this framework reveals their strengths and areas needing improvement, particularly in complex instruction-following. This study contributes a novel metric and benchmark, offering insights for future LLM development and evaluation.
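To make the metric concrete, here is a minimal sketch of how a DRFR-style score can be computed, assuming it is the fraction of decomposed yes/no criteria a response satisfies (the function name and data layout are ours; the paper's exact aggregation may differ):

```python
# Minimal DRFR-style computation (assumed aggregation: fraction of
# satisfied decomposed criteria across all instructions).

def drfr(annotations: list[list[bool]]) -> float:
    """annotations[i][j] is True iff the response to instruction i
    satisfies decomposed criterion j."""
    satisfied = sum(sum(per_instruction) for per_instruction in annotations)
    total = sum(len(per_instruction) for per_instruction in annotations)
    return satisfied / total if total else 0.0

# Example: two instructions with 3 and 2 decomposed criteria.
print(drfr([[True, True, False], [True, True]]))  # 0.8
```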
In this paper, we present a new method, called PC-JeDi, to efficiently generate jets in High Energy Physics. This method utilises score-based diffusion models in conjunction with transformers, which are well suited to generating jets as particle clouds due to their permutation equivariance. PC-JeDi achieves competitive performance with current state-of-the-art methods across several metrics that evaluate the quality of the generated jets. Although it is slower than other models, owing to the large number of forward passes required by diffusion models, it is still substantially faster than traditional detailed simulation. Furthermore, PC-JeDi uses conditional generation to produce jets with a desired mass and transverse momentum for two different particle types: top quarks and gluons.
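To illustrate the mechanics only (not the authors' implementation), the sketch below pairs a permutation-equivariant transformer score network, i.e., a TransformerEncoder with no positional encodings over the particle axis, with an Euler-Maruyama reverse-diffusion loop; the VE-style noise schedule, shapes, and hyperparameters are illustrative assumptions, and with an untrained network the samples are of course just noise:

```python
# Toy sketch of score-based particle-cloud generation (not PC-JeDi itself).
import torch
import torch.nn as nn

class ScoreNet(nn.Module):
    def __init__(self, feat_dim=3, d_model=64):
        super().__init__()
        self.inp = nn.Linear(feat_dim + 1, d_model)  # +1 for diffusion time
        enc = nn.TransformerEncoderLayer(d_model, nhead=4, batch_first=True)
        self.body = nn.TransformerEncoder(enc, num_layers=2)  # no positional
        self.out = nn.Linear(d_model, feat_dim)              # encoding: equivariant

    def forward(self, x, t):                    # x: (batch, particles, feat)
        t_feat = t.expand(x.shape[0], x.shape[1], 1)
        return self.out(self.body(self.inp(torch.cat([x, t_feat], -1))))

@torch.no_grad()
def sample(model, n_particles=30, steps=200, sigma=5.0):
    x = torch.randn(1, n_particles, 3) * sigma   # start from wide noise
    ts = torch.linspace(1.0, 1e-3, steps)
    dt = ts[0] - ts[1]
    for t in ts:
        # VE-style diffusion coefficient g(t)^2 for sigma(t) = sigma**t
        g2 = (sigma ** (2 * t)) * 2 * torch.log(torch.tensor(sigma))
        score = model(x, t.view(1, 1, 1))
        x = x + g2 * score * dt + torch.sqrt(g2 * dt) * torch.randn_like(x)
    return x

jets = sample(ScoreNet())
print(jets.shape)  # torch.Size([1, 30, 3])
```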
This paper investigates the ultra-reliable and low-latency communication (URLLC) performance of the IRS-aided MIMO system. Upper and lower bounds on the optimal average error probability (OAEP) are derived for coding rates within 1/sqrt(Mn) of the capacity, where n and M represent the blocklength and the number of transmit antennas, respectively. To achieve this goal, a new central limit theorem (CLT) for the mutual information density over the IRS-aided MIMO system is derived in the asymptotic regime where the blocklength, the IRS size, and the number of antennas go to infinity at the same pace. The CLT is then utilized to derive closed-form upper and lower bounds for the OAEP. Based on this analysis, a gradient-based algorithm is proposed to minimize the lower bound of the OAEP by optimizing the phase shifts of the IRS. Simulation results validate the fit of the CLT and the effectiveness of the proposed algorithm in optimizing the theoretical bound, as well as the performance of a practical LDPC code.
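For intuition, such error-probability bounds typically take the normal-approximation form familiar from finite-blocklength theory; the schematic below (with capacity C, channel dispersion V, rate R, and blocklength n) is a generic illustration, not the paper's exact bound:

```latex
% Generic normal approximation for the optimal error probability
% (the paper's IRS-aided MIMO bounds are more involved):
\epsilon^{*}(n,R) \;\approx\; Q\!\left(\frac{n\,(C-R)}{\sqrt{nV}}\right),
\qquad Q(x) = \int_{x}^{\infty} \frac{1}{\sqrt{2\pi}}\, e^{-t^{2}/2}\,\mathrm{d}t .
```

With R within a constant multiple of 1/sqrt(Mn) of C, the argument of Q stays bounded, so the error probability converges to a nondegenerate value rather than to 0 or 1, which is why this rate regime is the interesting one to analyze.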
The remarkable success of Large Language Models (LLMs) and instruction tuning drives the evolution of Vision Language Models (VLMs) towards versatile general-purpose models. Yet, it remains unexplored whether current VLMs genuinely possess object-level image understanding, i.e., the ability to answer 'what objects are in the image?' or 'which object corresponds to a specified bounding box?'. Our findings reveal that the image understanding capabilities of current VLMs are strongly correlated with their zero-shot performance on vision language (VL) tasks. This suggests that prioritizing basic image understanding is crucial for VLMs to excel at VL tasks. To enhance object-level image understanding, we propose Crayon Large Language and Vision mOdel (CoLLaVO), which incorporates instruction tuning with Crayon Prompt, a new visual prompt tuning scheme based on panoptic color maps. Furthermore, we present a learning strategy, Dual QLoRA, that preserves object-level image understanding during visual instruction tuning, thereby achieving a significant leap on numerous VL benchmarks in a zero-shot setting.
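As a rough illustration of how a panoptic-map-based visual prompt could be injected (our reading, not CoLLaVO's actual Crayon Prompt implementation), the sketch below adds a per-patch class embedding, derived from a patch-level panoptic map, to the vision encoder's patch embeddings:

```python
# Illustrative sketch (not CoLLaVO's code): turn a patch-level panoptic
# class grid into additive "crayon" prompt embeddings.
import torch
import torch.nn as nn

class CrayonPrompt(nn.Module):
    def __init__(self, num_classes=134, d_model=768):
        super().__init__()
        self.class_emb = nn.Embedding(num_classes, d_model)

    def forward(self, patch_emb, panoptic_ids):
        # patch_emb: (B, num_patches, d_model); panoptic_ids: (B, num_patches)
        return patch_emb + self.class_emb(panoptic_ids)

B, P, D = 2, 256, 768
prompt = CrayonPrompt()
out = prompt(torch.randn(B, P, D), torch.randint(0, 134, (B, P)))
print(out.shape)  # torch.Size([2, 256, 768])
```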
This study examines the tendency to cite older work across 20 fields of study over 43 years (1980--2023). We put NLP's propensity to cite older work in the context of these 20 other fields to analyze whether NLP shows similar temporal citation patterns to these other fields over time or whether differences can be observed. Our analysis, based on a dataset of approximately 240 million papers, reveals a broader scientific trend: many fields have markedly declined in citing older works (e.g., psychology, computer science). We term this decline a 'citation age recession', analogous to how economists define periods of reduced economic activity. The trend is strongest in NLP and ML research (-12.8% and -5.5% in citation age from previous peaks). Our results suggest that citing more recent works is not directly driven by the growth in publication rates (-3.4% across fields; -5.2% in humanities; -5.5% in formal sciences) -- even when controlling for an increase in the volume of papers. Our findings raise questions about the scientific community's engagement with past literature, particularly for NLP, and the potential consequences of neglecting older but relevant research. The data and a demo showcasing our results are publicly available.
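For concreteness, the arithmetic behind such an analysis can be as simple as the following sketch, which assumes the age of a citation is the citing paper's year minus the cited paper's year, averaged per citing year (the toy data and layout are ours):

```python
# Mean citation age per citing year, computed from (citing, cited) pairs.
from collections import defaultdict

citations = [  # (citing_year, cited_year); toy data
    (2020, 2018), (2020, 1995), (2023, 2022), (2023, 2021),
]

by_year = defaultdict(list)
for citing, cited in citations:
    by_year[citing].append(citing - cited)

for year in sorted(by_year):
    ages = by_year[year]
    print(year, sum(ages) / len(ages))  # 2020 13.5, then 2023 1.5
```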
This paper explores abstractive text summarization through the lens of SEASON (Salience Allocation as Guidance for Abstractive SummarizatiON), a model designed to enhance summarization by leveraging salience allocation techniques. The study evaluates SEASON's efficacy by comparing it with prominent models such as BART, PEGASUS, and ProphetNet, all fine-tuned for various text summarization tasks. The assessment uses diverse datasets, including CNN/DailyMail, SAMSum, and Financial-news-based Event-Driven Trading (EDT), with a specific focus on a financial dataset containing a substantial volume of news articles from 2020/03/01 to 2021/05/06. The paper employs evaluation metrics such as ROUGE, METEOR, BERTScore, and MoverScore to assess the performance of these fine-tuned models in generating abstractive summaries. The analysis of these metrics offers thorough insight into the strengths and weaknesses demonstrated by each model in summarizing the news, dialogue, and financial text datasets. The results presented in this paper not only contribute to the evaluation of the SEASON model's effectiveness but also illuminate the intricacies of salience allocation techniques across various types of datasets.
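Two of the reported metrics can be computed with common open-source packages; a small hedged example follows (requires `pip install rouge-score bert-score`; the sentences are invented, and dataset loading and model decoding are elided):

```python
# Compute ROUGE and BERTScore for a single reference/candidate pair.
from rouge_score import rouge_scorer
from bert_score import score as bert_score

reference = "The central bank raised interest rates by 25 basis points."
candidate = "Interest rates were raised 25 basis points by the central bank."

scorer = rouge_scorer.RougeScorer(["rouge1", "rouge2", "rougeL"], use_stemmer=True)
print(scorer.score(reference, candidate))

# Downloads a pretrained model on first use.
P, R, F1 = bert_score([candidate], [reference], lang="en")
print(F1.mean().item())
```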
This paper introduces the batch-parallel Compressed Packed Memory Array (CPMA), a compressed, dynamic, ordered set data structure based on the Packed Memory Array (PMA). Traditionally, batch-parallel sets are built on pointer-based data structures such as trees because pointer-based structures enable fast parallel unions via pointer manipulation. Compared with cache-optimized trees, PMAs have historically been slower to update but faster to scan. The batch-parallel CPMA overcomes this tradeoff between updates and scans by optimizing for cache-friendliness. On average, the CPMA achieves 3x faster batch-insert throughput and 4x faster range-query throughput compared with compressed PaC-trees, a state-of-the-art batch-parallel set library based on cache-optimized trees. We further evaluate the CPMA against compressed PaC-trees and Aspen, a state-of-the-art graph-streaming system, on a real-world application of dynamic-graph processing. The CPMA is on average 1.2x faster on a suite of graph algorithms and 2x faster on batch inserts when compared with compressed PaC-trees. Furthermore, the CPMA is on average 1.3x faster on graph algorithms and 2x faster on batch inserts compared with Aspen.
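To convey the underlying PMA idea (greatly simplified and sequential; the actual CPMA adds compression and batch-parallel rebalancing, and a real PMA rebalances only a local window rather than respreading everything on each insert), here is a toy sketch:

```python
# Toy packed memory array: elements kept sorted in an array with gaps
# (None) so scans stay contiguous and cache-friendly; grow and respread
# when density gets too high.
import bisect

class ToyPMA:
    def __init__(self, capacity=16):
        self.cells = [None] * capacity

    def _elements(self):
        return [x for x in self.cells if x is not None]

    def insert(self, key):
        elems = self._elements()
        bisect.insort(elems, key)
        if 2 * len(elems) > len(self.cells):        # density too high: grow
            self.cells = [None] * (2 * len(self.cells))
        stride = len(self.cells) / len(elems)       # respread with even gaps
        self.cells = [None] * len(self.cells)
        for i, x in enumerate(elems):
            self.cells[int(i * stride)] = x

    def scan(self):                                 # one contiguous pass
        return self._elements()

pma = ToyPMA()
for k in [5, 1, 9, 3]:
    pma.insert(k)
print(pma.scan())  # [1, 3, 5, 9]
```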
This paper introduces a novel mathematical model for Molecular Communication (MC) systems that utilizes the First Arrival Position (FAP) as a fundamental mode of information transmission. We address two critical challenges: the characterization of the FAP density and the establishment of capacity bounds for channels with vertically-drifted FAP. Our method relates macroscopic Partial Differential Equation (PDE) models to microscopic Stochastic Differential Equation (SDE) models, resulting in a precise expression that links the FAP density to an elliptic-type Green's function. This formula is distinguished by its wide applicability across arbitrary spatial dimensions, arbitrary drift directions, and various receiver geometries. We demonstrate the practicality of our model through case studies with 2D and 3D planar receivers. The accuracy of our formula is also validated by particle-based simulations. Building on this, the explicit FAP density forms enable us to establish closed-form upper and lower bounds for the capacity of vertically-drifted FAP channels under a second-moment constraint, significantly advancing the understanding of FAP channels in MC systems.
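Schematically, the stated link can be read as follows (our hedged rendering; the paper's normalization and adjoint conventions may differ): the FAP density on the receiver boundary is a normal derivative of the elliptic Green's function of the drift-diffusion operator, evaluated from the source position x_0,

```latex
% Schematic FAP density / Green's function link (hedged reading):
f_{\mathrm{FAP}}(y) \;=\; -\,\frac{\partial G(x_0, y)}{\partial n_y},
\qquad y \in \partial\Omega_{\mathrm{rx}} .
```

As a sanity check, for pure Brownian motion and a planar receiver in 2D this reduces to the classical Cauchy hitting density f(y) = \lambda / (\pi(\lambda^2 + y^2)) for a source at distance \lambda from the receiver plane.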
The paper introduces the Signed Generalized Random Dot Product Graph (SGRDPG) model, a variant of the Generalized Random Dot Product Graph (GRDPG) in which, in addition, edges can be positive or negative. The setting is extended to a multiplex version, where all layers have the same collection of nodes and follow the SGRDPG. The only feature the layers of the network share is that they can be partitioned into groups with common subspace structures, while otherwise the matrices of connection probabilities can all be different. This setting is extremely flexible and includes a variety of existing multiplex network models as particular cases. The paper fulfills two objectives. First, it shows that retaining the signs of the edges in the process of network construction leads to better precision of estimation and clustering and is hence beneficial for tackling real-world problems such as the analysis of brain networks. Second, by employing novel algorithms, our paper achieves accuracy equivalent or superior to that attained in simpler multiplex network models. In addition to theoretical guarantees, both of these features are demonstrated using numerical simulations and a real data example.
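For readers unfamiliar with the GRDPG, the underlying edge model is standard and shown below; the signed variant additionally attaches a +/- sign to each realized edge, with the exact sign mechanism as specified in the paper:

```latex
% Standard GRDPG edge model: latent positions x_i \in \mathbb{R}^d and
% signature matrix I_{p,q} = \mathrm{diag}(1,\dots,1,-1,\dots,-1), p+q=d;
% edges occur independently with
\Pr(A_{ij} = 1) \;=\; x_i^{\top} I_{p,q}\, x_j .
```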
This paper presents an exhaustive quantitative and qualitative evaluation of Large Language Models (LLMs) for Knowledge Graph (KG) construction and reasoning. We employ eight distinct datasets that span entity, relation, and event extraction, link prediction, and question answering. Empirically, our findings suggest that GPT-4 outperforms ChatGPT in the majority of tasks and even surpasses fine-tuned models on certain reasoning and question-answering datasets. Moreover, our investigation extends to the potential generalization ability of LLMs for information extraction, culminating in the presentation of the Virtual Knowledge Extraction task and the development of the VINE dataset. Drawing on these empirical findings, we further propose AutoKG, a multi-agent-based approach employing LLMs for KG construction and reasoning, which aims to chart the future of this field and offer exciting opportunities for advancement. We anticipate that our research can provide invaluable insights for future undertakings in the field of KG. Code and datasets will be available at https://github.com/zjunlp/AutoKG.
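As an illustration of the kind of building block such a multi-agent pipeline coordinates (the prompt wording and the helper `extract_triples` are our assumptions, not the paper's code; requires `pip install openai` and an API key):

```python
# Prompt an LLM to extract (head, relation, tail) triples from text.
import json
from openai import OpenAI

client = OpenAI()  # assumes OPENAI_API_KEY is set in the environment

def extract_triples(text: str) -> list[list[str]]:
    prompt = (
        "Extract (head, relation, tail) triples from the text below. "
        "Answer with a JSON list of 3-element lists only.\n\n" + text
    )
    resp = client.chat.completions.create(
        model="gpt-4", messages=[{"role": "user", "content": prompt}]
    )
    # Brittle on purpose: a production system would validate the output.
    return json.loads(resp.choices[0].message.content)

print(extract_triples("Marie Curie was born in Warsaw and won the Nobel Prize."))
```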
"FlowQA: Grasping Flow in History for Conversational Machine Comprehension." Hsin-Yuan Huang, Eunsol Choi, Wen-tau Yih. ICLR 2019.
Conversational machine comprehension requires a deep understanding of the conversation history. To enable conventional single-turn models to encode that history comprehensively, the authors introduce the Flow mechanism, which incorporates the intermediate representations generated while answering previous questions through an alternating parallel processing structure. Compared with previous approaches that take the prior questions/answers as input, Flow integrates the latent semantics of the dialogue history more deeply. It also outperforms the best models in all three domains of SCONE, improving accuracy by 2.6% (a minimal sketch of the Flow idea follows after the repository link below).
GitHub repository: https://github.com/momohuang/FlowQA
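A minimal sketch of the Flow idea described above (illustrative, not the FlowQA code): at each context position, a recurrent unit runs across dialogue turns, so reasoning state from earlier questions flows into later ones; the GRU choice and residual fusion here are our assumptions:

```python
# Flow-style pass: treat each context position as a batch item and the
# turn axis as time, so hidden state propagates across questions.
import torch
import torch.nn as nn

turns, positions, d = 4, 50, 128          # 4 questions over a 50-token context
h = torch.randn(turns, positions, d)      # per-turn, per-position encodings

flow = nn.GRU(d, d, batch_first=True)
h_flow, _ = flow(h.permute(1, 0, 2))      # (positions, turns, d)
h = h + h_flow.permute(1, 0, 2)           # residual fusion (our choice)
print(h.shape)  # torch.Size([4, 50, 128])
```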