We present a comprehensive study of answer quality evaluation in Retrieval-Augmented Generation (RAG) applications using vRAG-Eval, a novel grading system designed to assess correctness, completeness, and honesty. We further map the grades for these quality aspects to a binary score indicating an accept or reject decision, mirroring the intuitive "thumbs-up" or "thumbs-down" gesture commonly used in chat applications. This approach suits factual business contexts where a clear decision is essential. Our assessment applies vRAG-Eval to two Large Language Models (LLMs), evaluating the quality of answers generated by a vanilla RAG application. We compare these evaluations with human expert judgments and find substantial alignment between GPT-4's assessments and those of human experts, reaching 83% agreement on accept or reject decisions. This study highlights the potential of LLMs as reliable evaluators in closed-domain, closed-ended settings, particularly when human evaluations require significant resources.
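The mapping from a graded score to an accept or reject decision, and the agreement rate between two evaluators, can be sketched as follows. This is a minimal illustration rather than the vRAG-Eval implementation: the 1-to-5 grade scale, the acceptance threshold of 4, and the sample grades are all assumptions, since the abstract does not specify them.

```python
from typing import Sequence


def to_binary(grade: int, threshold: int = 4) -> bool:
    """Map a graded score to accept (True) or reject (False).

    The 1-5 scale and the threshold of 4 are assumptions for illustration.
    """
    return grade >= threshold


def agreement_rate(a: Sequence[bool], b: Sequence[bool]) -> float:
    """Fraction of items on which two evaluators made the same decision."""
    if len(a) != len(b):
        raise ValueError("both evaluators must grade the same items")
    if not a:
        raise ValueError("at least one graded item is required")
    matches = sum(x == y for x, y in zip(a, b))
    return matches / len(a)


# Hypothetical grades for six answers (not data from the paper).
llm_grades = [5, 4, 2, 3, 5, 1]
human_grades = [5, 3, 2, 4, 4, 1]

llm_decisions = [to_binary(g) for g in llm_grades]
human_decisions = [to_binary(g) for g in human_grades]

rate = agreement_rate(llm_decisions, human_decisions)
print(f"accept/reject agreement: {rate:.0%}")  # 4 of 6 decisions match
```

Raw agreement of this kind is easy to interpret but does not correct for chance agreement; a chance-corrected statistic could be reported alongside it.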

Related content

Large language models

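Large language models are deep learning models trained on massive amounts of text data. They can not only generate natural-language text but also understand its meaning in depth and handle a variety of natural language tasks, such as text summarization, question answering, and translation. In 2023, large language models and their applications in artificial intelligence became a focus of technology research worldwide. Their growth in scale has been especially striking: parameter counts have jumped from just over a billion at the outset to one trillion today. More parameters allow a model to capture the subtleties of human language more finely and to understand its complexity more deeply. Over the past year, large language models have improved markedly in absorbing new knowledge, decomposing complex tasks, and aligning images with text. As the technology matures, its range of applications will keep expanding, providing more intelligent and personalized services and further improving how people live and work.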

Optimizer · Prompt · Language modeling · MoDELS · Large language models

December 19, 2024

A Comparative Study of DSPy Teleprompter Algorithms for Aligning Large Language Models Evaluation Metrics to Human Evaluation

Bhaskarjit Sarmah, Kriti Dutta, Anna Grigoryan, Sachin Tiwari, Stefano Pasquali, Dhagash Mehta

from arxiv, 7 pages, 10 tables, two-column format

We argue that the Declarative Self-improving Python (DSPy) optimizers are a way to align large language model (LLM) prompts and their evaluations with human annotations. We present a comparative analysis of five teleprompter algorithms, namely, Cooperative Prompt Optimization (COPRO), Multi-Stage Instruction Prompt Optimization (MIPRO), BootstrapFewShot, BootstrapFewShot with Optuna, and K-Nearest Neighbor Few Shot, within the DSPy framework with respect to their ability to align with human evaluations. As a concrete example, we focus on optimizing the prompt to align hallucination detection (using an LLM as a judge) with human-annotated ground truth labels for a publicly available benchmark dataset. Our experiments demonstrate that optimized prompts can outperform various benchmark methods in detecting hallucination, and that certain teleprompters outperform the others in at least these experiments.

Robustness · INFORMS · Neural Networks · ReLU · PDE

December 19, 2024

Numerical Robustness of PINNs for Multiscale Transport Equations

Alexander Jesser, Kai Krycki, Ryan G. McClarren, Martin Frank

We investigate the numerical solution of multiscale transport equations using Physics Informed Neural Networks (PINNs) with ReLU activation functions. To this end, we study the analogy between PINNs and Least-Squares Finite Elements (LSFE), which lies in their shared approach of reformulating the PDE solution as the minimization of a quadratic functional. We prove that in the diffusive regime, the correct limit is not reached, in agreement with known results for first-order LSFE. A diffusive scaling is introduced that can be applied to overcome this, again in full agreement with theoretical results for LSFE. We provide numerical results in the case of slab geometry that support our theoretical findings.

MoDELS · Learning · Code · Automator · CASES

December 18, 2024

Reinforcement Learning from Automatic Feedback for High-Quality Unit Test Generation

Benjamin Steenhoek, Michele Tufano, Neel Sundaresan, Alexey Svyatkovskiy

from arxiv, Accepted to DeepTest 2025 (ICSE Workshop). arXiv admin note: text overlap with arXiv:2310.02368

Software testing is a crucial but time-consuming aspect of software development, and recently, Large Language Models (LLMs) have gained popularity for automated test case generation. However, because LLMs are trained on vast amounts of open-source code, they often generate test cases that do not adhere to best practices and may even contain test smells (anti-patterns). To address this issue, we propose Reinforcement Learning from Static Quality Metrics (RLSQM), wherein we utilize Reinforcement Learning to generate high-quality unit tests based on static analysis-based quality metrics. First, we analyze LLM-generated tests and show that LLMs frequently do generate undesirable test smells -- up to 37% of the time. Then, we implement a lightweight static analysis-based reward model and train LLMs using this reward model to optimize for five code quality metrics. Our experimental results demonstrate that the RL-optimized Codex model consistently generated higher-quality test cases than the base LLM, improving quality metrics by up to 23%, and generated nearly 100% syntactically correct code. RLSQM also outperformed GPT-4 on all code quality metrics, in spite of training a substantially cheaper Codex model. We provide insights into how to reliably utilize RL to improve test generation quality and show that RLSQM is a significant step towards enhancing the overall efficiency and reliability of automated software testing. Our data are available at //doi.org/10.6084/m9.figshare.25983166.
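The idea of a reward derived from static analysis can be sketched in a few lines. This is an illustrative toy, not the RLSQM reward model: the function name `static_quality_reward`, the two checks, and the reward values are assumptions, and the abstract does not list the five metrics the authors actually optimize. The sketch scores Python test code using the standard `ast` module and only recognizes plain `assert` statements.

```python
import ast


def static_quality_reward(test_source: str) -> float:
    """Score a generated test using static checks only (no execution).

    Hypothetical scoring, chosen for illustration:
      -2.0 if the code does not parse,
      +1.0 if it contains a plain assert, -0.5 if it does not,
      -0.5 if it contains conditional or loop logic (a common test smell).
    """
    try:
        tree = ast.parse(test_source)
    except SyntaxError:
        return -2.0

    nodes = list(ast.walk(tree))
    has_assert = any(isinstance(n, ast.Assert) for n in nodes)
    has_branching = any(isinstance(n, (ast.If, ast.For, ast.While)) for n in nodes)

    reward = 1.0 if has_assert else -0.5
    if has_branching:
        reward -= 0.5
    return reward


good_test = "def test_add():\n    assert 1 + 1 == 2\n"
smelly_test = "def test_add():\n    if True:\n        pass\n"
broken_test = "def test(:"

for name, src in [("good", good_test), ("smelly", smelly_test), ("broken", broken_test)]:
    print(name, static_quality_reward(src))
```

Because such a reward needs no test execution, it is cheap enough to evaluate on every sampled generation during training.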

3D · Modality · Dataset · Principle · Round

December 18, 2024

MobiFuse: A High-Precision On-device Depth Perception System with Multi-Data Fusion

Jinrui Zhang, Deyu Zhang, Tingting Long, Wenxin Chen, Ju Ren, Yunxin Liu, Yudong Zhao, Yaoxue Zhang, Youngki Lee

We present MobiFuse, a high-precision depth perception system on mobile devices that combines dual RGB and Time-of-Flight (ToF) cameras. To achieve this, we leverage physical principles from various environmental factors to propose the Depth Error Indication (DEI) modality, characterizing the depth error of ToF and stereo-matching. Furthermore, we employ a progressive fusion strategy, merging geometric features from ToF and stereo depth maps with depth error features from the DEI modality to create precise depth maps. Additionally, we create a new ToF-Stereo depth dataset, RealToF, to train and validate our model. Our experiments demonstrate that MobiFuse outperforms baselines, reducing depth measurement errors by up to 77.7%. It also generalizes well across diverse datasets and proves effective in two downstream tasks: 3D reconstruction and 3D segmentation. The demo video of MobiFuse in real-life scenarios is available at the de-identified YouTube link (//youtu.be/jy-Sp7T1LVs).

Convolution · Stride · Layer · Neural Networks · Networking

December 18, 2024

USEFUSE: Utile Stride for Enhanced Performance in Fused Layer Architecture of Deep Neural Networks

Muhammad Sohail Ibrahim, Muhammad Usman, Jeong-A Lee

Convolutional Neural Networks (CNNs) are crucial in various applications, but their deployment on resource-constrained edge devices poses challenges. This study presents Sum-of-Products (SOP) units for convolution, which utilize low-latency left-to-right bit-serial arithmetic to minimize response time and enhance overall performance. The study proposes a methodology for fusing multiple convolution layers to reduce off-chip memory communication and increase overall performance. An effective mechanism detects and skips inefficient convolutions after ReLU layers, minimizing power consumption without compromising accuracy. Furthermore, efficient tile movement guarantees uniform access to the fusion pyramid. An analysis demonstrates that the utile stride strategy improves operational intensity. Two designs cater to varied demands: one focuses on minimal response time for mission-critical applications, and another focuses on resource-constrained devices with comparable latency. This approach notably reduces redundant computations, improving the efficiency of CNN deployment on edge devices.

Integration · Learning · MoDELS · Language modeling · Federated learning

December 18, 2024

Federated Learning and RAG Integration: A Scalable Approach for Medical Large Language Models

Jincheol Jung, Hongju Jeong, Eui-Nam Huh

This study analyzes the performance of domain-specific Large Language Models (LLMs) for the medical field by integrating Retrieval-Augmented Generation (RAG) systems within a federated learning framework. Leveraging the inherent advantages of federated learning, such as preserving data privacy and enabling distributed computation, this research explores the integration of RAG systems with models trained under varying client configurations to optimize performance. Experimental results demonstrate that the federated learning-based models integrated with RAG systems consistently outperform their non-integrated counterparts across all evaluation metrics. This study highlights the potential of combining federated learning and RAG systems for developing domain-specific LLMs in the medical field, providing a scalable and privacy-preserving solution for enhancing text generation capabilities.

Language modeling · Large language models · MoDELS · Integration · Model evaluation

April 17, 2024

A Survey on Retrieval-Augmented Text Generation for Large Language Models

Yizheng Huang, Jimmy Huang

from arxiv, Ongoing work

Retrieval-Augmented Generation (RAG) merges retrieval methods with deep learning advancements to address the static limitations of large language models (LLMs) by enabling the dynamic integration of up-to-date external information. This methodology, focusing primarily on the text domain, provides a cost-effective solution to the generation of plausible but incorrect responses by LLMs, thereby enhancing the accuracy and reliability of their outputs through the use of real-world data. As RAG grows in complexity and incorporates multiple concepts that can influence its performance, this paper organizes the RAG paradigm into four categories: pre-retrieval, retrieval, post-retrieval, and generation, offering a detailed perspective from the retrieval viewpoint. It outlines RAG's evolution and discusses the field's progression through the analysis of significant studies. Additionally, the paper introduces evaluation methods for RAG, addressing the challenges faced and proposing future research directions. By offering an organized framework and categorization, the study aims to consolidate existing research on RAG, clarify its technological underpinnings, and highlight its potential to broaden the adaptability and applications of LLMs.

Language modeling · MoDELS · Knowledge · Distillation · Large language models

February 20, 2024

A Survey on Knowledge Distillation of Large Language Models

Xiaohan Xu, Ming Li, Chongyang Tao, Tao Shen, Reynold Cheng, Jinyang Li, Can Xu, Dacheng Tao, Tianyi Zhou

from arxiv, 43 pages

This survey presents an in-depth exploration of knowledge distillation (KD) techniques within the realm of Large Language Models (LLMs), spotlighting the pivotal role of KD in transferring sophisticated capabilities from proprietary giants such as GPT-4 to accessible, open-source models like LLaMA and Mistral. Amidst the evolving AI landscape, this work elucidates the critical disparities between proprietary and open-source LLMs, demonstrating how KD serves as an essential conduit for imbuing the latter with the former's advanced functionalities and nuanced understandings. Our survey is meticulously structured around three foundational pillars: algorithm, skill, and verticalization -- providing a comprehensive examination of KD mechanisms, the enhancement of specific cognitive abilities, and their practical implications across diverse fields. Crucially, the survey navigates the intricate interplay between data augmentation (DA) and KD, illustrating how DA emerges as a powerful paradigm within the KD framework to bolster LLMs' performance. By leveraging DA to generate context-rich, skill-specific training data, KD transcends traditional boundaries, enabling open-source models to approximate the contextual adeptness, ethical alignment, and deep semantic insights characteristic of their proprietary counterparts. This work aims to provide an insightful guide for researchers and practitioners, offering a detailed overview of current methodologies in knowledge distillation and proposing future research directions. By bridging the gap between proprietary and open-source LLMs, this survey underscores the potential for more accessible, efficient, and sustainable AI solutions, fostering a more inclusive and equitable landscape in AI advancements. An associated Github repository is available at //github.com/Tebmer/Awesome-Knowledge-Distillation-of-LLMs.

MoDELS · Vision · Diversity · Extensibility · Performer

January 16, 2024

Forging Vision Foundation Models for Autonomous Driving: Challenges, Methodologies, and Opportunities

Xu Yan, Haiming Zhang, Yingjie Cai, Jingming Guo, Weichao Qiu, Bin Gao, Kaiqiang Zhou, Yue Zhao, Huan Jin, Jiantao Gao, Zhen Li, Lihui Jiang, Wei Zhang, Hongbo Zhang, Dengxin Dai, Bingbing Liu

from arxiv, Github Repo: //github.com/zhanghm1995/Forge_VFM4AD

The rise of large foundation models, trained on extensive datasets, is revolutionizing the field of AI. Models such as SAM, DALL-E2, and GPT-4 showcase their adaptability by extracting intricate patterns and performing effectively across diverse tasks, thereby serving as potent building blocks for a wide range of AI applications. Autonomous driving, a vibrant front in AI applications, remains challenged by the lack of dedicated vision foundation models (VFMs). The scarcity of comprehensive training data, the need for multi-sensor integration, and the diverse task-specific architectures pose significant obstacles to the development of VFMs in this field. This paper delves into the critical challenge of forging VFMs tailored specifically for autonomous driving, while also outlining future directions. Through a systematic analysis of over 250 papers, we dissect essential techniques for VFM development, including data preparation, pre-training strategies, and downstream task adaptation. Moreover, we explore key advancements such as NeRF, diffusion models, 3D Gaussian Splatting, and world models, presenting a comprehensive roadmap for future research. To empower researchers, we have built and maintained //github.com/zhanghm1995/Forge_VFM4AD, an open-access repository constantly updated with the latest advancements in forging VFMs for autonomous driving.

Vision · Graph · Transformation · Networking · GPU

September 27, 2022

A Survey on Graph Neural Networks and Graph Transformers in Computer Vision: A Task-Oriented Perspective

Chaoqi Chen, Yushuang Wu, Qiyuan Dai, Hong-Yu Zhou, Mutian Xu, Sibei Yang, Xiaoguang Han, Yizhou Yu

from arxiv, Preprint

Graph Neural Networks (GNNs) have gained momentum in graph representation learning and boosted the state of the art in a variety of areas, such as data mining (e.g., social network analysis and recommender systems), computer vision (e.g., object detection and point cloud learning), and natural language processing (e.g., relation extraction and sequence learning), to name a few. With the emergence of Transformers in natural language processing and computer vision, graph Transformers embed a graph structure into the Transformer architecture to overcome the limitations of local neighborhood aggregation while avoiding strict structural inductive biases. In this paper, we present a comprehensive review of GNNs and graph Transformers in computer vision from a task-oriented perspective. Specifically, we divide their applications in computer vision into five categories according to the modality of input data, i.e., 2D natural images, videos, 3D data, vision + language, and medical images. In each category, we further divide the applications according to a set of vision tasks. Such a task-oriented taxonomy allows us to examine how each task is tackled by different GNN-based approaches and how well these approaches perform. Based on the necessary preliminaries, we provide the definitions and challenges of the tasks, in-depth coverage of the representative approaches, as well as discussions regarding insights, limitations, and future directions.