
Retrieval-Augmented Generation (RAG) systems have been actively studied and deployed across various industries to query domain-specific knowledge bases. However, evaluating these systems presents unique challenges due to the scarcity of domain-specific queries and corresponding ground truths, as well as the lack of systematic approaches to diagnosing the causes of failure cases -- whether they stem from knowledge deficits or from issues related to system robustness. To address these challenges, we introduce GRAMMAR (GRounded And Modular Methodology for Assessment of RAG), an evaluation framework comprising two key elements: 1) a data generation process that leverages relational databases and LLMs to efficiently produce scalable query-answer pairs; this method separates query logic from linguistic variations for enhanced debugging capabilities; and 2) an evaluation framework that differentiates knowledge gaps from robustness issues and enables the identification of defective modules. Our empirical results underscore the limitations of current reference-free evaluation approaches and the reliability of GRAMMAR in accurately identifying model vulnerabilities.
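As a hedged sketch of the data-generation idea, the snippet below builds query-answer pairs from a toy relational database while keeping the query logic (SQL) separate from its linguistic surface forms; the schema, the templates, and the `generate_pairs` helper are illustrative assumptions, not the authors' implementation.

```python
import sqlite3

# Toy relational knowledge base (hypothetical schema, for illustration only).
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE employees (name TEXT, department TEXT);
INSERT INTO employees VALUES ('Alice', 'Finance'), ('Bob', 'Legal');
""")

# Query logic (SQL) is kept separate from linguistic variations, so the same
# ground-truth answer can be paired with many phrasings (an LLM could expand
# this list of paraphrases).
TEMPLATES = [
    ("SELECT department FROM employees WHERE name = ?",
     ["Which department does {name} work in?",
      "What is {name}'s department?"]),
]

def generate_pairs(names):
    """Yield (question, ground-truth answer) pairs grounded in the database."""
    for sql, phrasings in TEMPLATES:
        for name in names:
            answer = conn.execute(sql, (name,)).fetchone()[0]
            for phrasing in phrasings:
                yield phrasing.format(name=name), answer

for q, a in generate_pairs(["Alice", "Bob"]):
    print(q, "->", a)
```

Because every paraphrase of a template shares one SQL ground truth, a failure on one phrasing but not another points to a robustness issue, while a failure on all phrasings points to a knowledge gap.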

Related content


Recently, perception based on the Bird's-Eye View (BEV) representation has drawn increasing attention, and BEV representation is promising as the foundation for next-generation Autonomous Vehicle (AV) perception. However, most existing BEV solutions either require considerable resources for on-vehicle inference or deliver modest performance. This paper proposes a simple yet effective framework, termed Fast-BEV, which is capable of performing faster BEV perception on on-vehicle chips. Toward this goal, we first empirically find that the BEV representation can be sufficiently powerful without an expensive transformer-based transformation or depth representation. Our Fast-BEV consists of five parts: (1) a lightweight, deployment-friendly view transformation that quickly transfers 2D image features to 3D voxel space; (2) a multi-scale image encoder that leverages multi-scale information for better performance; (3) an efficient BEV encoder specifically designed to speed up on-vehicle inference; (4) a strong data augmentation strategy for both image and BEV space to avoid over-fitting; and (5) a multi-frame feature fusion mechanism to leverage temporal information. In experiments on a 2080Ti platform, our R50 model runs at 52.6 FPS with 47.3% NDS on the nuScenes validation set, running faster than the BEVDepth-R50 model (41.3 FPS, 47.5% NDS) and the BEVDet4D-R50 model (30.2 FPS, 45.7% NDS) at comparable accuracy. Our largest model (R101@900x1600) establishes a competitive 53.5% NDS on the nuScenes validation set. We further develop a benchmark with considerable accuracy and efficiency on currently popular on-vehicle chips. The code is released at: //github.com/Sense-GVT/Fast-BEV.
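To illustrate why a transformer-free view transformation can be cheap, here is a minimal sketch of a look-up-table (LUT) style 2D-to-3D feature gather; all shapes are toy values and the random LUT stands in for one precomputed from camera calibration, so this is an assumption-laden illustration rather than Fast-BEV's actual code.

```python
import numpy as np

# Hypothetical shapes; the real model fuses multiple cameras and a calibrated
# projection. Here a precomputed LUT maps each 3D voxel to a 2D feature-map
# pixel, so the view transformation reduces to a single gather operation.
C, H, W = 64, 32, 88          # 2D image feature map (channels, height, width)
X, Y, Z = 100, 100, 4         # BEV voxel grid

feat_2d = np.random.randn(C, H, W).astype(np.float32)

# LUT: for every voxel, the flattened pixel index it projects to, or -1 if it
# falls outside the image. In Fast-BEV this table is precomputed once from
# camera intrinsics/extrinsics, so no depth network is needed at runtime.
lut = np.random.randint(-1, H * W, size=(X * Y * Z,))

def view_transform(feat_2d, lut):
    flat = feat_2d.reshape(C, -1)                    # (C, H*W)
    vox = np.zeros((C, lut.size), dtype=np.float32)  # (C, X*Y*Z)
    valid = lut >= 0
    vox[:, valid] = flat[:, lut[valid]]              # one gather, no attention
    return vox.reshape(C, X, Y, Z)

voxels = view_transform(feat_2d, lut)
print(voxels.shape)  # (64, 100, 100, 4)
```

The design choice is that all geometry is baked into the table offline, so on-vehicle inference pays only for a memory gather rather than a per-frame attention or depth computation.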

In recent years, with the rapid advancement of large language models (LLMs), strong empathetic response capability has become a crucial requirement, and managing and understanding large-scale empathetic dialogue datasets has gained increasing importance. However, empathetic models are typically trained without any data quality selection, leading to inefficient data usage and wasted computational resources; using raw data can also result in low performance in empathetic dialogues. In this work, we present Efficient-Empathy, a sensibility- and rationality-score-based data selection algorithm that automatically selects high-quality sensibility and rationality data while discarding low-quality data. With only the sensibility data (59% of the full dataset), our trained sensibility model efficiently achieves state-of-the-art (SoTA) performance. Furthermore, the sensibility model demonstrates SoTA performance across multiple data-selection hyperparameter settings, showcasing the robustness of our method. By integrating sensibility and rationality data with a Mixture-of-Experts (MoE) structure, we achieve even higher performance, demonstrating the effectiveness of our Efficient-Empathy algorithm.
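A minimal sketch of score-based data selection under stated assumptions: each dialogue sample is assumed to already carry a sensibility score (e.g., from an LLM judge, not shown), the 59% keep-ratio echoes the figure in the abstract, and the `select_by_score` helper is a hypothetical stand-in for the paper's algorithm.

```python
def select_by_score(samples, scores, keep_ratio=0.59):
    """Keep the top `keep_ratio` fraction of samples, ranked by score."""
    ranked = sorted(zip(scores, samples), key=lambda p: p[0], reverse=True)
    cutoff = int(len(ranked) * keep_ratio)
    return [sample for _, sample in ranked[:cutoff]]

# Toy dialogue IDs with pre-computed sensibility scores.
samples = ["dlg_a", "dlg_b", "dlg_c", "dlg_d", "dlg_e"]
sensibility = [0.91, 0.42, 0.77, 0.15, 0.66]
print(select_by_score(samples, sensibility))  # -> ['dlg_a', 'dlg_c']
```

Sweeping `keep_ratio` corresponds to the multiple data-selection hyperparameter settings mentioned above.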

This research presents a novel method for predicting service degradation (SD) in computer networks by leveraging early flow features. Our approach focuses on the observable (O) segments of network flows, particularly analyzing Packet Inter-Arrival Time (PIAT) values and other derived metrics, to infer the behavior of non-observable (NO) segments. Through a comprehensive evaluation, we identify an optimal O/NO split threshold of 10 observed delay samples, balancing prediction accuracy and resource utilization. Evaluating models including Logistic Regression, XGBoost, and Multi-Layer Perceptron, we find that XGBoost outperforms the others, achieving an F1-score of 0.74, a balanced accuracy of 0.84, and an AUROC of 0.97. Our findings highlight the effectiveness of incorporating comprehensive early flow features and the potential of our method to offer a practical solution for monitoring network traffic in resource-constrained environments. By preemptively addressing potential SD, this approach enhances user experience and network performance, providing the basis for a robust framework for maintaining high-quality network services.
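The following sketch mirrors the setup under explicit assumptions: each flow is summarized by its first 10 observed delay samples (matching the O/NO split above) and fed to an XGBoost classifier; the data and the labeling rule here are synthetic stand-ins, not the paper's dataset or feature pipeline.

```python
import numpy as np
from xgboost import XGBClassifier          # the paper's best-performing model
from sklearn.model_selection import train_test_split
from sklearn.metrics import f1_score

# Synthetic stand-in data: each flow is represented by its first 10 observed
# PIAT (packet inter-arrival time) samples; the SD label is a toy rule.
rng = np.random.default_rng(0)
X = rng.exponential(scale=1.0, size=(1000, 10))   # early-flow features
y = (X.mean(axis=1) > 1.0).astype(int)            # toy service-degradation label

X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)
clf = XGBClassifier(n_estimators=100, max_depth=4, eval_metric="logloss")
clf.fit(X_tr, y_tr)
print("F1:", f1_score(y_te, clf.predict(X_te)))
```

Because only the first few delay samples are used, the classifier can flag a flow long before the non-observable segment completes, which is the basis of the preemptive monitoring claim.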

Successful deployment of Deep Neural Networks (DNNs) requires their validation with an adequate test set to ensure a sufficient degree of confidence in test outcomes. Although well-established test adequacy assessment techniques have been proposed for DNNs, their application within a comprehensive methodology for accurately predicting the fault detection ability of test sets, and thus assessing test adequacy, remains to be investigated. In this paper, we propose and evaluate TEASMA, a comprehensive and practical methodology designed to accurately assess the adequacy of test sets for DNNs. In practice, TEASMA allows engineers to decide whether they can trust high-accuracy test results and thus validate the DNN before its deployment. Based on a DNN model's training set, TEASMA provides a procedure to build accurate, DNN-specific prediction models of the Fault Detection Rate (FDR) of a test set from an existing adequacy metric, thus enabling its assessment. We evaluated TEASMA with four state-of-the-art test adequacy metrics: Distance-based Surprise Coverage (DSC), Likelihood-based Surprise Coverage (LSC), Input Distribution Coverage (IDC), and Mutation Score (MS). Our extensive empirical evaluation across multiple DNN models and input sets such as ImageNet reveals a strong linear correlation between the predicted and actual FDR values derived from MS, DSC, and IDC, with minimum R^2 values of 0.94 for MS and 0.90 for DSC and IDC. Furthermore, when relying on regression analysis and MS, the average Root Mean Square Error (RMSE) between actual and predicted FDR values across all subjects is only 0.09, compared to 0.17 for DSC and 0.18 for IDC, demonstrating MS's superior accuracy. Overall, these results suggest that TEASMA provides a reliable basis for confidently deciding whether to trust test results for DNN models.
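As an illustration of the prediction-model step, here is a minimal sketch that fits a regression model from an adequacy metric (a synthetic stand-in for Mutation Score) to FDR and reports R^2 and RMSE; the data and the linear relation are generated for demonstration only and do not reproduce the paper's subjects.

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.metrics import r2_score, mean_squared_error

# Hypothetical data: adequacy scores of many sampled test subsets and their
# actual Fault Detection Rates, as TEASMA would collect from the training set
# before fitting a DNN-specific prediction model.
rng = np.random.default_rng(1)
ms = rng.uniform(0.2, 1.0, size=(200, 1))          # adequacy metric (toy MS)
fdr = 0.9 * ms[:, 0] + rng.normal(0, 0.05, 200)    # toy linear relation

model = LinearRegression().fit(ms, fdr)
pred = model.predict(ms)
print("R^2 :", r2_score(fdr, pred))                 # correlation strength
print("RMSE:", mean_squared_error(fdr, pred) ** 0.5)  # prediction error
```

Once fitted, such a model lets an engineer translate a test set's adequacy score into an expected FDR and decide whether the test results can be trusted.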

Legal systems worldwide are inundated with exponential growth in cases and documents. There is an imminent need to develop NLP and ML techniques for automatically processing and understanding legal documents to streamline the legal system. However, evaluating and comparing the various NLP models designed specifically for the legal domain is challenging. This paper addresses this challenge by proposing IL-TUR: a Benchmark for Indian Legal Text Understanding and Reasoning. IL-TUR contains monolingual (English, Hindi) and multilingual (9 Indian languages) domain-specific tasks that address different aspects of the legal system from the point of view of understanding and reasoning over Indian legal documents. We present baseline models (including LLM-based ones) for each task, outlining the gap between the models and the ground truth. To foster further research in the legal domain, we provide a leaderboard (available at: //exploration-lab.github.io/IL-TUR/) where the research community can upload and compare legal text understanding systems.

Integrated satellite, aerial, and terrestrial networks (ISATNs) represent a sophisticated convergence of diverse communication technologies to ensure seamless connectivity across different altitudes and platforms. This paper explores the transformative potential of integrating Large Language Models (LLMs) into ISATNs, leveraging advanced Artificial Intelligence (AI) and Machine Learning (ML) capabilities to enhance these networks. We outline the current architecture of ISATNs and highlight the significant role LLMs can play in optimizing data flow, signal processing, and network management, advancing 5G/6G communication technologies through predictive algorithms and real-time decision-making. A comprehensive analysis of ISATN components is conducted, assessing how LLMs can effectively address traditional data transmission and processing bottlenecks. The paper then examines the network management challenges within ISATNs, emphasizing the necessity for sophisticated resource allocation strategies, traffic routing, and security management to ensure seamless connectivity and optimal performance under varying conditions. Furthermore, we discuss the technical challenges and limitations associated with integrating LLMs into ISATNs, such as data integration for LLM processing, scalability issues, latency in decision-making processes, and the design of robust, fault-tolerant systems. The study also identifies key future research directions for fully harnessing LLM capabilities in ISATNs, which is crucial for enhancing network reliability, optimizing performance, and achieving a truly interconnected and intelligent global network system.

We present a comprehensive study of answer quality evaluation in Retrieval-Augmented Generation (RAG) applications using vRAG-Eval, a novel grading system designed to assess correctness, completeness, and honesty. We further map the grades on these quality aspects into a binary score indicating an accept or reject decision, mirroring the intuitive "thumbs-up" or "thumbs-down" gesture commonly used in chat applications. This approach suits factual business settings where a clear-cut decision is essential. We apply vRAG-Eval to two Large Language Models (LLMs), evaluating the quality of answers generated by a vanilla RAG application. Comparing these evaluations with human expert judgments, we find substantial alignment between GPT-4's assessments and those of human experts, reaching 83% agreement on accept or reject decisions. This study highlights the potential of LLMs as reliable evaluators in closed-domain, closed-ended settings, particularly when human evaluations require significant resources.
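A hedged sketch of the grade-to-binary mapping described above; the 1-4 scale, the threshold, and the strict all-aspects rule are illustrative assumptions, since the abstract does not spell out the exact mapping.

```python
def to_binary(grades, threshold=3):
    """grades: dict mapping each quality aspect to a score on an assumed
    1-4 scale; accept only if every aspect clears the threshold, giving a
    strict thumbs-up/thumbs-down decision."""
    aspects = ("correctness", "completeness", "honesty")
    return all(grades[a] >= threshold for a in aspects)

print(to_binary({"correctness": 4, "completeness": 3, "honesty": 4}))  # True
print(to_binary({"correctness": 4, "completeness": 2, "honesty": 4}))  # False
```

Collapsing graded judgments this way makes LLM and human verdicts directly comparable, which is how an agreement figure such as the 83% reported above can be computed.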

The integration of unmanned aerial vehicles (UAVs) for surveillance or monitoring applications into fifth-generation (5G) New Radio (NR) cellular networks is an intriguing problem that has recently attracted considerable interest in both academia and industry. For efficient spectrum usage, we consider a recently proposed sky-ground nonorthogonal multiple access (NOMA) scheme, where a cellular-connected UAV acting as an aerial user (AU) and a static terrestrial user (TU) are paired to simultaneously transmit their uplink signals to a base station (BS) in the same time-frequency resource blocks. In such a case, due to the highly dynamic nature of the UAV, the signal transmitted by the AU experiences both time dispersion due to multipath propagation effects and frequency dispersion caused by Doppler shifts. On the other hand, for a static ground network, frequency dispersion of the signal transmitted by the TU is negligible and only multipath effects have to be taken into account. To decode the superposed signals at the BS through successive interference cancellation, accurate estimates of both the AU and TU channels are needed. In this paper, we propose channel estimation procedures that suitably exploit the different circular/noncircular modulation formats (modulation diversity) and the different almost-cyclostationarity features (Doppler diversity) of the AU and TU by means of widely linear time-varying processing. Our estimation approach is semi-blind, since the Doppler shifts and time delays of the AU are estimated from the received data only, whereas the remaining relevant parameters of the AU and TU channels are acquired by also relying on the available training symbols, which are transmitted by the AU and TU in a nonorthogonal manner.
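As a hedged illustration (our notation and discrete-time model are assumptions, not necessarily the paper's), the superposed uplink signal at the BS can be written as

```latex
r(n) = \underbrace{\sum_{p=1}^{P} h_{\mathrm{AU},p}\,
         e^{\,j 2\pi \nu_p n}\, x_{\mathrm{AU}}(n-\tau_p)}_{\text{AU: delays and Doppler shifts}}
     + \underbrace{\sum_{q=1}^{Q} h_{\mathrm{TU},q}\,
         x_{\mathrm{TU}}(n-\tau_q)}_{\text{TU: multipath only}}
     + w(n)
```

where the AU component is doubly dispersive (multipath delays tau_p plus Doppler shifts nu_p induced by UAV mobility), the TU component is time-invariant over the estimation window, and w(n) is additive noise. In this picture, the semi-blind procedure estimates nu_p and tau_p from the received data alone, then acquires the remaining channel gains with the help of the nonorthogonally transmitted training symbols.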

Combinatorial Optimization (CO) problems are fundamentally important in numerous practical applications across diverse industries, characterized by enormous solution spaces and time-sensitive response requirements. Despite significant advancements made by recent neural solvers, their limited expressiveness does not conform well to the multi-modal nature of CO landscapes. While some research has pivoted towards diffusion models, these require simulating a Markov chain with many steps to produce a sample, which is time-consuming and does not meet the efficiency requirements of real applications, especially at scale. We propose DISCO, an efficient DIffusion Solver for Combinatorial Optimization problems that excels in both solution quality and inference speed. DISCO's efficacy is two-pronged: first, it achieves rapid denoising of solutions through an analytically solvable form, allowing direct sampling from the solution space with very few reverse-time steps and thereby drastically reducing inference time. Second, DISCO enhances solution quality by restricting the sampling space to a more constrained, meaningful domain guided by solution residues, while still preserving the inherent multi-modality of the output probability distributions. DISCO achieves state-of-the-art results on very large Traveling Salesman Problems with 10000 nodes and on challenging Maximal Independent Set benchmarks, with per-instance denoising time up to 44.8 times faster. By further combining a divide-and-conquer strategy, DISCO can be generalized to solve arbitrary-scale problem instances off the shelf, even outperforming models trained specifically on the corresponding scales.
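To make the few-step idea concrete, below is a minimal DDIM-style deterministic sampler over a continuous relaxation of an edge heatmap; the noise schedule, the step grid, and the dummy `eps_model` are assumptions for illustration, standing in for DISCO's learned denoiser and its analytically solvable form rather than reproducing them.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 16                                   # toy number of TSP edge variables

def eps_model(x_t, t):
    # Stand-in for a learned noise predictor conditioned on the CO instance.
    return x_t * 0.1

alphas_bar = np.linspace(0.999, 0.01, 1000)   # toy cumulative noise schedule
steps = [999, 750, 500, 250, 0]               # very few reverse-time steps

x = rng.standard_normal(n)                    # start from pure noise
for t_cur, t_nxt in zip(steps[:-1], steps[1:]):
    a_cur, a_nxt = alphas_bar[t_cur], alphas_bar[t_nxt]
    eps = eps_model(x, t_cur)
    x0 = (x - np.sqrt(1 - a_cur) * eps) / np.sqrt(a_cur)   # predicted clean x
    x = np.sqrt(a_nxt) * x0 + np.sqrt(1 - a_nxt) * eps     # deterministic jump

heatmap = 1 / (1 + np.exp(-x))                # probabilities over edges
print(heatmap.round(2))
```

The point of the sketch is the step grid: because each update jumps directly between distant noise levels instead of simulating every Markov-chain step, only a handful of denoiser calls are needed per instance.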

We introduce a multi-task setup for identifying and classifying entities, relations, and coreference clusters in scientific articles. We create SciERC, a dataset that includes annotations for all three tasks, and develop a unified framework called Scientific Information Extractor (SciIE) with shared span representations. The multi-task setup reduces cascading errors between tasks and leverages cross-sentence relations through coreference links. Experiments show that our multi-task model outperforms previous models in scientific information extraction without using any domain-specific features. We further show that the framework supports the construction of a scientific knowledge graph, which we use to analyze information in the scientific literature.
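A small sketch of the span enumeration that shared span representations build on: every task (entities, relations, coreference) scores the same candidate spans, so their representations need only be computed once. The `max_width` value and the toy sentence are illustrative assumptions.

```python
def enumerate_spans(tokens, max_width=4):
    """Return all (start, end) token spans up to max_width tokens,
    with end exclusive; each span gets one shared representation."""
    return [(i, j) for i in range(len(tokens))
                   for j in range(i + 1, min(i + max_width, len(tokens)) + 1)]

tokens = "We introduce the SciIE framework".split()
spans = enumerate_spans(tokens)
print(len(spans), spans[:5])
```

Sharing these span representations across tasks is what lets a coreference link discovered by one task head inform relation extraction in another sentence, reducing cascading errors.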
