
This paper presents a comparative analysis of hallucination detection systems for AI, focusing on automatic summarization and question answering tasks for Large Language Models (LLMs). We evaluate different hallucination detection systems using the diagnostic odds ratio (DOR) and cost-effectiveness metrics. Our results indicate that although advanced models can perform better, they come at a much higher cost. We also demonstrate that an ideal hallucination detection system must maintain its performance across different model sizes. Our findings highlight the importance of choosing a detection system aligned with specific application needs and resource constraints. Future research will explore hybrid systems and automated identification of underperforming components to enhance AI reliability and efficiency in detecting and mitigating hallucinations.
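
As a concrete illustration of the headline metric, the sketch below computes the diagnostic odds ratio from a detector's confusion-matrix counts; the function name and example counts are ours for illustration, not the paper's.

```python
# Minimal sketch: diagnostic odds ratio (DOR) for a binary hallucination detector.
# DOR = (TP * TN) / (FP * FN); higher values indicate stronger discrimination.

def diagnostic_odds_ratio(tp: int, fp: int, fn: int, tn: int) -> float:
    if fp == 0 or fn == 0:
        raise ValueError("DOR is undefined when FP or FN is zero")
    return (tp * tn) / (fp * fn)

# Illustrative counts: the detector catches 80/100 hallucinations and
# false-alarms on 10/100 faithful outputs.
print(diagnostic_odds_ratio(tp=80, fp=10, fn=20, tn=90))  # 36.0
```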

Related Content

This paper explores the potential of recurrent neural networks (RNNs) and other subquadratic architectures as competitive alternatives to transformer-based models in low-resource language modeling scenarios. We utilize HGRN2 (Qin et al., 2024), a recently proposed RNN-based architecture, and comparatively evaluate its effectiveness against transformer-based baselines and other subquadratic architectures (LSTM, xLSTM, Mamba). Our experimental results show that BABYHGRN, our HGRN2 language model, outperforms transformer-based models in both the 10M and 100M word tracks of the challenge, as measured by performance on the BLiMP, EWoK, GLUE, and BEAR benchmarks. Further, we show the positive impact of knowledge distillation. Our findings challenge the prevailing focus on transformer architectures and indicate the viability of RNN-based models, particularly in resource-constrained environments.
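
Since the abstract credits knowledge distillation without detailing it, here is a minimal sketch of the standard logit-distillation loss (soft teacher targets blended with the hard cross-entropy objective); the temperature and mixing weight are illustrative defaults, not the paper's settings.

```python
import torch
import torch.nn.functional as F

def distillation_loss(student_logits, teacher_logits, labels, T=2.0, alpha=0.5):
    """Logits: (N, vocab) with token positions flattened; labels: (N,)."""
    # Soft targets: KL divergence between temperature-softened distributions,
    # scaled by T^2 to keep gradient magnitudes comparable.
    soft = F.kl_div(
        F.log_softmax(student_logits / T, dim=-1),
        F.softmax(teacher_logits / T, dim=-1),
        reduction="batchmean",
    ) * (T * T)
    # Hard targets: ordinary cross-entropy against the gold tokens.
    hard = F.cross_entropy(student_logits, labels)
    return alpha * soft + (1 - alpha) * hard
```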

Stablecoins are digital assets designed to maintain a stable value, typically pegged to traditional currencies. Despite their growing prominence, many stablecoins have struggled to consistently meet stability expectations, and their underlying mechanisms often remain opaque and challenging to analyze. This paper focuses on the DAI stablecoin, which combines crypto-collateralization and algorithmic mechanisms. We propose a formal logic-based framework for representing the policies and operations of DAI, implemented in Prolog and released as open-source software. Our framework enables detailed analysis and simulation of DAI's stability mechanisms, providing a foundation for understanding its robustness and identifying potential vulnerabilities.
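
To make the rule-based flavor of such a framework concrete, here is a toy stability rule rendered in Python rather than the paper's Prolog: a vault may mint additional DAI only while it stays above its liquidation ratio. The threshold and prices are illustrative, not the protocol's actual parameters.

```python
LIQUIDATION_RATIO = 1.5  # illustrative 150% threshold for a crypto-collateralized vault

def collateralization(collateral: float, price_usd: float, dai_debt: float) -> float:
    """Ratio of collateral value to outstanding DAI debt."""
    return (collateral * price_usd) / dai_debt

def can_mint(collateral, price_usd, dai_debt, extra_dai) -> bool:
    # Minting is permitted only if the post-mint ratio stays above the threshold.
    return collateralization(collateral, price_usd, dai_debt + extra_dai) >= LIQUIDATION_RATIO

print(can_mint(10, 2000, 10_000, 2_000))  # 20000/12000 ~= 1.67 -> True
print(can_mint(10, 2000, 10_000, 4_000))  # 20000/14000 ~= 1.43 -> False
```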

Deep learning-based trajectory prediction models for autonomous driving often struggle to generalize to out-of-distribution (OOD) scenarios, sometimes performing worse than simple rule-based models. To address this limitation, we propose a novel framework, Adaptive Prediction Ensemble (APE), which integrates deep learning and rule-based prediction experts. A learned routing function, trained concurrently with the deep learning model, dynamically selects the most reliable prediction based on the input scenario. Our experiments on large-scale datasets, including the Waymo Open Motion Dataset (WOMD) and Argoverse, demonstrate improved zero-shot generalization across datasets. We show that our method outperforms individual prediction models and other variants, particularly in long-horizon prediction and in scenarios with a high proportion of OOD data. This work highlights the potential of hybrid approaches for robust and generalizable motion prediction in autonomous driving. More details can be found on the project page: //sites.google.com/view/ape-generalization.
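
The routing idea can be sketched as follows: a small learned scorer rates each expert's reliability for the current scene, and the highest-scoring expert's trajectory is returned. Module names, feature dimensions, and the two-expert setup below are our assumptions, not the paper's specification.

```python
import torch
import torch.nn as nn

class PredictionRouter(nn.Module):
    """Scores prediction experts (e.g., learned vs. rule-based) per scenario."""

    def __init__(self, scene_dim: int, num_experts: int = 2):
        super().__init__()
        self.scorer = nn.Sequential(
            nn.Linear(scene_dim, 64), nn.ReLU(), nn.Linear(64, num_experts)
        )

    def forward(self, scene_feat, expert_trajs):
        # scene_feat: (B, scene_dim); expert_trajs: (B, num_experts, T, 2)
        weights = self.scorer(scene_feat).softmax(dim=-1)   # reliability per expert
        best = weights.argmax(dim=-1)                       # pick the most reliable one
        return expert_trajs[torch.arange(len(best)), best]  # (B, T, 2)
```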

This paper explores image modeling from the frequency space and introduces DCTdiff, an end-to-end diffusion generative paradigm that efficiently models images in the discrete cosine transform (DCT) space. We investigate the design space of DCTdiff and reveal the key design factors. Experiments on different frameworks (UViT, DiT), generation tasks, and various diffusion samplers demonstrate that DCTdiff outperforms pixel-based diffusion models regarding generative quality and training efficiency. Remarkably, DCTdiff can seamlessly scale up to high-resolution generation without using the latent diffusion paradigm. Finally, we illustrate several intriguing properties of DCT image modeling. For example, we provide a theoretical proof of why "image diffusion can be seen as spectral autoregression", bridging the gap between diffusion and autoregressive models. The effectiveness of DCTdiff and the introduced properties suggest a promising direction for image modeling in the frequency space. The code is at //github.com/forever208/DCTdiff.
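
A minimal sketch of the representation DCTdiff works in: mapping an image to DCT coefficients and back is exact, so a generative model can operate entirely in frequency space. The block-wise design of DCTdiff itself is omitted here.

```python
import numpy as np
from scipy.fft import dctn, idctn

img = np.random.rand(64, 64)            # stand-in for a grayscale image
coeffs = dctn(img, norm="ortho")        # pixel space -> DCT (frequency) space
recon = idctn(coeffs, norm="ortho")     # frequency space -> pixel space

print(np.allclose(img, recon))          # True: the transform is lossless and invertible
```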

Knowledge distillation (KD) remains challenging due to the opaque nature of the knowledge transfer process from a Teacher to a Student, making it difficult to diagnose issues in the distillation process. To address this, we propose UniCAM, a novel gradient-based visual explanation method that effectively interprets the knowledge learned during KD. Our experimental results demonstrate that, with the guidance of the Teacher's knowledge, the Student model becomes more efficient, learning more relevant features while discarding those that are not relevant. We refer to the features learned with the Teacher's guidance as distilled features and to the features irrelevant to the task and ignored by the Student as residual features. Distilled features focus on key aspects of the input, such as textures and parts of objects. In contrast, residual features exhibit more diffused attention, often targeting irrelevant areas, including the backgrounds of the target objects. In addition, we propose two novel metrics: the feature similarity score (FSS) and the relevance score (RS), which quantify the relevance of the distilled knowledge. Experiments on the CIFAR10, ASIRRA, and Plant Disease datasets demonstrate that UniCAM and the two metrics offer valuable insights into the KD process.
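
As a hedged sketch of what a feature similarity score could look like, the snippet below pools teacher and student feature maps and takes their cosine similarity; the paper's exact FSS definition may differ, and matching channel counts are assumed.

```python
import torch
import torch.nn.functional as F

def feature_similarity_score(f_teacher: torch.Tensor, f_student: torch.Tensor) -> torch.Tensor:
    """f_*: (B, C, H, W) feature maps with matching C; returns mean cosine similarity."""
    t = F.adaptive_avg_pool2d(f_teacher, 1).flatten(1)  # (B, C) pooled descriptor
    s = F.adaptive_avg_pool2d(f_student, 1).flatten(1)  # (B, C) pooled descriptor
    return F.cosine_similarity(t, s, dim=1).mean()
```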

Deep learning solutions are instrumental in cybersecurity, harnessing their ability to analyze vast datasets, identify complex patterns, and detect anomalies. However, malevolent actors can exploit these same capabilities to orchestrate sophisticated attacks, posing significant challenges to defenders and traditional security measures. Adversarial attacks, particularly those targeting vulnerabilities in deep learning models, present a nuanced and substantial threat to cybersecurity. Our study examines adversarial learning threats such as Data Poisoning, Test Time Evasion, and Reverse Engineering, specifically as they impact Network Intrusion Detection Systems, and explores the intricacies of these attacks and their countermeasures to deepen understanding of network security challenges amidst adversarial threats. The intersection of adversarial attacks and defenses within network traffic data, coupled with advances in machine learning and deep learning techniques, remains a relatively underexplored domain. Our research lays the groundwork for strengthening defense mechanisms against the breaches of network security and privacy that adversarial attacks can cause. Through in-depth analysis, we identify domain-specific research gaps, such as the scarcity of real-life attack data and the limited evaluation of AI-based solutions on network traffic, and we highlight these challenges to stimulate future research toward resilient network defense strategies.

In the computer vision and machine learning communities, as well as in many other research domains, rigorous evaluation of any new method, including classifiers, is essential. One key component of the evaluation process is the ability to compare and rank methods. However, ranking classifiers and accurately comparing their performances, especially when taking application-specific preferences into account, remains challenging. For instance, commonly used evaluation tools like the Receiver Operating Characteristic (ROC) and Precision/Recall (PR) spaces display performance based on only two scores. Hence, they are inherently limited in their ability to compare classifiers across a broader range of scores, and they cannot establish a clear ranking among classifiers. In this paper, we present a novel, versatile tool, named the Tile, that organizes an infinite family of ranking scores in a single 2D map for two-class classifiers, including common evaluation scores such as the accuracy, the true positive rate, the positive predictive value, Jaccard's coefficient, and all F-beta scores. Furthermore, we study the properties of the underlying ranking scores, such as the influence of the priors and the correspondences with the ROC space, and show how any other score can be characterized by comparing it to the Tile. Overall, we demonstrate that the Tile is a powerful tool that captures all these rankings in a single visualization and makes them interpretable.
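
For reference, the scores the Tile organizes are all functions of a single confusion matrix; below is a brief sketch (with illustrative counts) of the ones named above.

```python
def scores(tp, fp, fn, tn):
    return {
        "accuracy": (tp + tn) / (tp + fp + fn + tn),
        "true_positive_rate": tp / (tp + fn),
        "positive_predictive_value": tp / (tp + fp),
        "jaccard": tp / (tp + fp + fn),
    }

def f_beta(tp, fp, fn, beta):
    """F-beta score: weights recall beta times as much as precision."""
    b2 = beta * beta
    return (1 + b2) * tp / ((1 + b2) * tp + b2 * fn + fp)

print(scores(tp=40, fp=10, fn=5, tn=45))
print(f_beta(40, 10, 5, beta=1.0))  # F1 ~= 0.842
```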

Knowledge graph reasoning (KGR), which aims to deduce new facts from existing facts based on logic rules mined from knowledge graphs (KGs), has become a fast-growing research direction. It has been proven to significantly benefit the usage of KGs in many AI applications, such as question answering and recommendation systems. According to the graph types, existing KGR models can be roughly divided into three categories, i.e., static models, temporal models, and multi-modal models. Early works in this domain mainly focus on static KGR and tend to directly apply general knowledge graph embedding models to the reasoning task. However, these models are not suitable for more complex but practical tasks, such as inductive static KGR, temporal KGR, and multi-modal KGR. Multiple works addressing these tasks have been developed recently, but no survey paper or open-source repository comprehensively summarizes and discusses models in this important direction. To fill the gap, we conduct a survey of knowledge graph reasoning, tracing its development from static to temporal and then to multi-modal KGs. Concretely, the preliminaries, summaries of KGR models, and typical datasets are introduced and discussed in turn. Moreover, we discuss the challenges and potential opportunities. The corresponding open-source repository is shared on GitHub: //github.com/LIANGKE23/Awesome-Knowledge-Graph-Reasoning.

In pace with developments in the research field of artificial intelligence, knowledge graphs (KGs) have attracted a surge of interest from both academia and industry. As a representation of semantic relations between entities, KGs have proven particularly relevant for natural language processing (NLP), experiencing rapid spread and wide adoption in recent years. Given the increasing amount of research work in this area, several KG-related approaches have been surveyed in the NLP research community. However, a comprehensive study that categorizes established topics and reviews the maturity of individual research streams remains absent to this day. Contributing to closing this gap, we systematically analyzed 507 papers from the literature on KGs in NLP. Our survey encompasses a multifaceted review of tasks, research types, and contributions. As a result, we present a structured overview of the research landscape, provide a taxonomy of tasks, summarize our findings, and highlight directions for future work.

We propose a novel attention gate (AG) model for medical imaging that automatically learns to focus on target structures of varying shapes and sizes. Models trained with AGs implicitly learn to suppress irrelevant regions in an input image while highlighting salient features useful for a specific task. This eliminates the need for the explicit external tissue/organ localisation modules used in cascaded convolutional neural networks (CNNs). AGs can be easily integrated into standard CNN architectures such as the U-Net model with minimal computational overhead while increasing model sensitivity and prediction accuracy. The proposed Attention U-Net architecture is evaluated on two large CT abdominal datasets for multi-class image segmentation. Experimental results show that AGs consistently improve the prediction performance of U-Net across different datasets and training sizes while preserving computational efficiency. The code for the proposed architecture is publicly available.
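
A minimal sketch of an additive attention gate in this style is shown below; it assumes the gating signal has already been resampled to the skip connection's spatial size, and the channel sizes are illustrative.

```python
import torch
import torch.nn as nn

class AttentionGate(nn.Module):
    def __init__(self, gate_ch: int, skip_ch: int, inter_ch: int):
        super().__init__()
        self.w_g = nn.Conv2d(gate_ch, inter_ch, kernel_size=1)  # project gating signal
        self.w_x = nn.Conv2d(skip_ch, inter_ch, kernel_size=1)  # project skip features
        self.psi = nn.Sequential(nn.Conv2d(inter_ch, 1, kernel_size=1), nn.Sigmoid())

    def forward(self, g, x):
        # Additive attention: coefficients near 1 keep salient regions,
        # coefficients near 0 suppress irrelevant ones.
        alpha = self.psi(torch.relu(self.w_g(g) + self.w_x(x)))  # (B, 1, H, W)
        return x * alpha                                         # gate the skip connection

gate = AttentionGate(gate_ch=256, skip_ch=128, inter_ch=64)
out = gate(torch.randn(1, 256, 32, 32), torch.randn(1, 128, 32, 32))
print(out.shape)  # torch.Size([1, 128, 32, 32])
```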
