This article proposes a test procedure that can be used to test ML models and ML-based systems independently of the actual training process. In this way, typical quality statements about these models and systems, such as accuracy and precision, can be verified independently, taking into account their black-box character and the inherent stochastic properties of ML models and their training data. The article presents first results from a set of test experiments and suggests extensions to existing test methods that reflect the stochastic nature of ML models and ML-based systems.

Related content

MoDELS · Paper · Principle · SimPLe · HTTPS

March 12, 2024

Joint Modeling of Longitudinal Measurements and Time-to-event Outcomes Using BUGS

Taban Baghfalaki, Mojtaba Ganjali, Antoine Barbieri, Reza Hashemi, Hélène Jacqmin-Gadda

from arxiv, 43 pages, 10 figures

The objective of this paper is to provide an introduction to the principles of Bayesian joint modeling of longitudinal measurements and time-to-event outcomes, as well as model implementation using the BUGS language syntax. This syntax can be executed directly using OpenBUGS or by utilizing convenient functions to invoke OpenBUGS and JAGS from R software. In this paper, all details of joint models are provided, ranging from simple to more advanced models. The presentation starts with the joint modeling of a Gaussian longitudinal marker and a time-to-event outcome. The implementation of the Bayesian paradigm of the model is reviewed. Strategies for simulating data from the joint model (JM) are also discussed. A proportional hazard model with various forms of baseline hazards, along with a discussion of all possible association structures between the two sub-models, is taken into consideration. The paper covers joint models with multivariate longitudinal measurements, zero-inflated longitudinal measurements, competing risks, and time-to-event outcomes with a cure fraction. The models are illustrated by the analyses of several real data sets. All simulated and real data and code are available at \url{//github.com/tbaghfalaki/JM-with-BUGS-and-JAGS}.

Estimation/Estimator · Kernelization · Minimax · Independent · Statistic

March 12, 2024

The Minimax Rate of HSIC Estimation for Translation-Invariant Kernels

Florian Kalinke, Zoltan Szabo

Kernel techniques are among the most influential approaches in data science and statistics. Under mild conditions, the reproducing kernel Hilbert space associated to a kernel is capable of encoding the independence of $M\ge 2$ random variables. Probably the most widespread independence measure relying on kernels is the so-called Hilbert-Schmidt independence criterion (HSIC; also referred to as distance covariance in the statistics literature). Despite various existing HSIC estimators designed since its introduction close to two decades ago, the fundamental question of the rate at which HSIC can be estimated is still open. In this work, we prove that the minimax optimal rate of HSIC estimation on $\mathbb R^d$ for Borel measures containing the Gaussians with continuous bounded translation-invariant characteristic kernels is $\mathcal O\!\left(n^{-1/2}\right)$. Specifically, our result implies the optimality in the minimax sense of many of the most-frequently used estimators (including the U-statistic, the V-statistic, and the Nyström-based one) on $\mathbb R^d$.
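To make the V-statistic estimator mentioned in this abstract concrete, here is a minimal sketch for the two-variable case ($M = 2$) with scalar samples. It assumes a Gaussian kernel (one example of a continuous bounded translation-invariant characteristic kernel) and computes $\frac{1}{n^2}\operatorname{tr}(KHLH)$ with $H = I - \frac{1}{n}\mathbf{1}\mathbf{1}^\top$. The function names and the `bandwidth` parameter are illustrative assumptions, not taken from the paper.

```python
import math
import random


def gaussian_gram(xs, bandwidth=1.0):
    """Gram matrix of the Gaussian kernel k(a, b) = exp(-(a - b)^2 / (2 * bandwidth^2))."""
    return [[math.exp(-((a - b) ** 2) / (2.0 * bandwidth ** 2)) for b in xs] for a in xs]


def center(gram):
    """Double-center a symmetric Gram matrix, i.e. compute H K H with H = I - (1/n) 1 1^T."""
    n = len(gram)
    row_means = [sum(row) / n for row in gram]
    total_mean = sum(row_means) / n
    return [
        [gram[i][j] - row_means[i] - row_means[j] + total_mean for j in range(n)]
        for i in range(n)
    ]


def hsic_v_statistic(xs, ys, bandwidth=1.0):
    """Biased (V-statistic) HSIC estimate: (1 / n^2) * trace(K H L H)."""
    n = len(xs)
    k_centered = center(gaussian_gram(xs, bandwidth))
    l_gram = gaussian_gram(ys, bandwidth)
    # trace(K H L H) equals the elementwise sum of (H K H) * L because L is symmetric.
    return sum(k_centered[i][j] * l_gram[i][j] for i in range(n) for j in range(n)) / n ** 2


if __name__ == "__main__":
    rng = random.Random(0)  # fixed seed so the comparison is reproducible
    xs = [rng.gauss(0.0, 1.0) for _ in range(200)]
    independent = [rng.gauss(0.0, 1.0) for _ in range(200)]
    print("dependent pair:  ", hsic_v_statistic(xs, xs))
    print("independent pair:", hsic_v_statistic(xs, independent))
```

The estimate should be clearly larger for the dependent pair than for the independent one. This sketch costs $O(n^2)$ time and memory, which is fine for illustration but not for large samples.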

Design · Identifiable · Lecture notes · AVS · prototype

March 11, 2024

Designing for Projection-based Communication between Autonomous Vehicles and Pedestrians

Trung Thanh Nguyen, Kai Hollander, Marius Hoggenmueller, Callum Parker, Martin Tomitsch

Recent studies have investigated new approaches for communicating an autonomous vehicle's (AV) intent and awareness to pedestrians. This paper adds to this body of work by presenting the design and evaluation of in-situ projections on the road. Our design combines common traffic light patterns with aesthetic visual elements. We describe the iterative design process and the prototyping methods used in each stage. The final design concept was represented as a virtual reality simulation and evaluated with 18 participants in four different street crossing scenarios, which included three scenarios that simulated various degrees of system errors. We found that different design elements were able to support participants' confidence in their decision even when the AV failed to correctly detect their presence. We also identified elements in our design that needed to be more clearly communicated. Based on these findings, the paper presents a series of design recommendations for projection-based communication between AVs and pedestrians.

Image captioning · Object detection · Performer · MoDELS · Learning

March 10, 2024

Transformer based Multitask Learning for Image Captioning and Object Detection

Debolena Basak, P. K. Srijith, Maunendra Sankar Desarkar

from arxiv, Accepted at PAKDD 2024

In several real-world scenarios like autonomous navigation and mobility, image captioning and object detection play a crucial role in obtaining a better visual understanding of the surroundings. This work introduces a novel multitask learning framework that combines image captioning and object detection into a joint model. We propose TICOD, a Transformer-based Image Captioning and Object Detection model, for jointly training both tasks by combining the losses obtained from image captioning and object detection networks. By leveraging joint training, the model benefits from the complementary information shared between the two tasks, leading to improved performance for image captioning. Our approach utilizes a transformer-based architecture that enables end-to-end network integration for image captioning and object detection and performs both tasks jointly. We evaluate the effectiveness of our approach through comprehensive experiments on the MS-COCO dataset. Our model outperforms the baselines from image captioning literature by achieving a 3.65% improvement in BERTScore.

Optimizer · Design · Processing (programming language) · Optimization · Extensibility

March 9, 2024

Collaborative and Distributed Bayesian Optimization via Consensus: Showcasing the Power of Collaboration for Optimal Design

Xubo Yue, Raed Al Kontar, Albert S. Berahas, Yang Liu, Blake N. Johnson

from arxiv, 41 pages

Optimal design is a critical yet challenging task within many applications. This challenge arises from the need for extensive trial and error, often done through simulations or running field experiments. Fortunately, sequential optimal design, also referred to as Bayesian optimization when using surrogates with a Bayesian flavor, has played a key role in accelerating the design process through efficient sequential sampling strategies. However, a key opportunity exists nowadays. The increased connectivity of edge devices sets forth a new collaborative paradigm for Bayesian optimization, one in which different clients collaboratively borrow strength from each other by effectively distributing their experimentation efforts to improve and fast-track their optimal design process. To this end, we bring the notion of consensus to Bayesian optimization, where clients agree (i.e., reach a consensus) on their next-to-sample designs. Our approach provides a generic and flexible framework that can incorporate different collaboration mechanisms. In light of this, we propose transitional collaborative mechanisms where clients initially rely more on each other to maneuver through the early stages with scant data and then, at the late stages, focus on their own objectives to get client-specific solutions. Theoretically, we show sub-linear growth in regret for our proposed framework. Empirically, through simulated datasets and a real-world collaborative sensor design experiment, we show that our framework can effectively accelerate and improve the optimal design process and benefit all participants.

Prompt · Performer · Oracle · Analysis · Optimizer

March 8, 2024

Improving Probability-based Prompt Selection Through Unified Evaluation and Analysis

Sohee Yang, Jonghyeon Kim, Joel Jang, Seonghyeon Ye, Hyunji Lee, Minjoon Seo

from arxiv, TACL 2024 (Pre-MIT Press publication version)

Previous works in prompt engineering for large language models have introduced different gradient-free probability-based prompt selection methods that aim to choose the optimal prompt among the candidates for a given task but have failed to provide a comprehensive and fair comparison between each other. In this paper, we propose a unified framework to interpret and evaluate the existing probability-based prompt selection methods by performing extensive experiments on 13 common and diverse NLP tasks. We find that each of the existing methods can be interpreted as some variant of the method that maximizes mutual information between the input and the predicted output (MI). Utilizing this finding, we develop several other combinatorial variants of MI and increase the effectiveness of the oracle prompt selection method from 87.79% to 94.98%, measured as the ratio of the performance of the selected prompt to that of the optimal oracle prompt. Furthermore, considering that all the methods rely on the output probability distribution of the model that might be biased, we propose a novel calibration method called Calibration by Marginalization (CBM) that is orthogonal to the existing methods and helps increase the prompt selection effectiveness of the best method to 96.85%, achieving 99.44% of the oracle prompt F1 without calibration.
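The idea of choosing the prompt that maximizes mutual information between the input and the predicted output can be sketched as follows. This is an illustrative assumption about how such a score might be computed, not the paper's implementation: it uses the standard plug-in estimate $I(X;Y) \approx H\big(\tfrac{1}{N}\sum_x p(y \mid x)\big) - \tfrac{1}{N}\sum_x H\big(p(y \mid x)\big)$ over the model's output distributions for each prompt. The function names and the toy distributions are hypothetical.

```python
import math


def entropy(dist):
    """Shannon entropy (in nats) of a discrete probability distribution."""
    return -sum(p * math.log(p) for p in dist if p > 0)


def mutual_information(cond_dists):
    """Plug-in MI estimate from per-input output distributions p(y | x).

    cond_dists: list of distributions, one per input, all over the same labels.
    Returns H(mean of p(y | x)) - mean of H(p(y | x)).
    """
    n = len(cond_dists)
    num_labels = len(cond_dists[0])
    marginal = [sum(d[k] for d in cond_dists) / n for k in range(num_labels)]
    return entropy(marginal) - sum(entropy(d) for d in cond_dists) / n


def select_prompt(dists_by_prompt):
    """Return the prompt name whose output distributions give the highest MI estimate."""
    return max(dists_by_prompt, key=lambda name: mutual_information(dists_by_prompt[name]))


if __name__ == "__main__":
    # Toy, made-up output distributions for two candidate prompts on two inputs.
    candidates = {
        "prompt_a": [[0.9, 0.1], [0.1, 0.9]],  # confident and varied across inputs
        "prompt_b": [[0.5, 0.5], [0.5, 0.5]],  # uninformative
    }
    print(select_prompt(candidates))
```

A prompt scores highly when the model is confident on individual inputs while its predictions remain spread across labels overall. No labeled data is needed, which fits the gradient-free setting the abstract describes.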

MoDELS · Large language models · Performer · Nuance · Linear

March 8, 2024

CriticBench: Benchmarking LLMs for Critique-Correct Reasoning

Zicheng Lin, Zhibin Gou, Tian Liang, Ruilin Luo, Haowei Liu, Yujiu Yang

from arxiv, Corrected computation errors in Tables 1, 7-11; updated corresponding figs

The ability of Large Language Models (LLMs) to critique and refine their reasoning is crucial for their application in evaluation, feedback provision, and self-improvement. This paper introduces CriticBench, a comprehensive benchmark designed to assess LLMs' abilities to critique and rectify their reasoning across a variety of tasks. CriticBench encompasses five reasoning domains: mathematical, commonsense, symbolic, coding, and algorithmic. It compiles 15 datasets and incorporates responses from three LLM families. Utilizing CriticBench, we evaluate and dissect the performance of 17 LLMs in generation, critique, and correction reasoning, i.e., GQC reasoning. Our findings reveal: (1) a linear relationship in GQC capabilities, with critique-focused training markedly enhancing performance; (2) a task-dependent variation in correction effectiveness, with logic-oriented tasks being more amenable to correction; (3) GQC knowledge inconsistencies that decrease as model size increases; and (4) an intriguing inter-model critiquing dynamic, where stronger models are better at critiquing weaker ones, while weaker models can surprisingly surpass stronger ones in their self-critique. We hope these insights into the nuanced critique-correct reasoning of LLMs will foster further research in LLM critique and self-improvement.

Robustness · MoDELS · NLP · Reducible · Performer

March 8, 2024

The Impact of Quantization on the Robustness of Transformer-based Text Classifiers

Seyed Parsa Neshaei, Yasaman Boreshban, Gholamreza Ghassem-Sani, Seyed Abolghasem Mirroshandel

Transformer-based models have made remarkable advancements in various NLP areas. Nevertheless, these models often exhibit vulnerabilities when confronted with adversarial attacks. In this paper, we explore the effect of quantization on the robustness of Transformer-based models. Quantization usually involves mapping a high-precision real number to a lower-precision value, aiming at reducing the size of the model at hand. To the best of our knowledge, this work is the first study of the effect of quantization on the robustness of NLP models. In our experiments, we evaluate the impact of quantization on BERT and DistilBERT models in text classification using the SST-2, Emotion, and MR datasets. We also evaluate the performance of these models against the TextFooler, PWWS, and PSO adversarial attacks. Our findings show that quantization significantly improves (by an average of 18.68%) the adversarial accuracy of the models. Furthermore, we compare the effect of quantization with that of the adversarial training approach on robustness. Our experiments indicate that quantization increases the robustness of the model by 18.80% on average compared to adversarial training, without imposing any extra computational overhead during training. Therefore, our results highlight the effectiveness of quantization in improving the robustness of NLP models.
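As a rough illustration of what "mapping a high-precision real number to a lower-precision value" can look like, the sketch below applies symmetric 8-bit quantization with a single shared scale to a list of floats. The abstract does not say which quantization scheme the authors use, so this generic scheme and the function names are assumptions for illustration only.

```python
def quantize_int8(weights):
    """Map floats to integers in [-127, 127] using one shared scale (symmetric scheme)."""
    max_abs = max(abs(w) for w in weights)
    # Guard against an all-zero input, where any scale works.
    scale = max_abs / 127.0 if max_abs > 0 else 1.0
    quantized = [max(-127, min(127, round(w / scale))) for w in weights]
    return quantized, scale


def dequantize(quantized, scale):
    """Recover approximate float values from the integer codes."""
    return [q * scale for q in quantized]


if __name__ == "__main__":
    weights = [0.5, -1.0, 0.25, 0.0]
    codes, scale = quantize_int8(weights)
    restored = dequantize(codes, scale)
    print(codes, scale)
    print([abs(w - r) for w, r in zip(weights, restored)])  # per-weight rounding error
```

Each weight is stored in 8 bits instead of 32, and the reconstruction error per weight is bounded by roughly half the scale. That small perturbation of the weights is the kind of change whose effect on adversarial robustness the paper measures.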

Knowledge · Language modeling · MoDELS · NLU · Learning

November 17, 2022

A Survey of Knowledge-Enhanced Pre-trained Language Models

Linmei Hu, Zeyi Liu, Ziwang Zhao, Lei Hou, Liqiang Nie, Juanzi Li

Pre-trained Language Models (PLMs), which are trained on large text corpora via self-supervised learning, have yielded promising performance on various tasks in Natural Language Processing (NLP). However, although PLMs with huge numbers of parameters can effectively capture rich knowledge learned from massive training text and benefit downstream tasks at the fine-tuning stage, they still have some limitations, such as poor reasoning ability due to the lack of external knowledge. Research has been dedicated to incorporating knowledge into PLMs to tackle these issues. In this paper, we present a comprehensive review of Knowledge-Enhanced Pre-trained Language Models (KE-PLMs) to provide a clear insight into this thriving field. We introduce appropriate taxonomies for Natural Language Understanding (NLU) and Natural Language Generation (NLG), respectively, to highlight these two main tasks of NLP. For NLU, we divide the types of knowledge into four categories: linguistic knowledge, text knowledge, knowledge graph (KG), and rule knowledge. The KE-PLMs for NLG are categorized into KG-based and retrieval-based methods. Finally, we point out some promising future directions of KE-PLMs.

entity · Performer · Graph · Knowledge graph · MoDELS

June 4, 2019

Learning Attention-based Embeddings for Relation Prediction in Knowledge Graphs

Deepak Nathani, Jatin Chauhan, Charu Sharma, Manohar Kaul

from arxiv, accepted as long paper in ACL 2019

The recent proliferation of knowledge graphs (KGs) coupled with incomplete or partial information, in the form of missing relations (links) between entities, has fueled a lot of research on knowledge base completion (also known as relation prediction). Several recent works suggest that convolutional neural network (CNN) based models generate richer and more expressive feature embeddings and hence also perform well on relation prediction. However, we observe that these KG embeddings treat triples independently and thus fail to cover the complex and hidden information that is inherently implicit in the local neighborhood surrounding a triple. To this end, our paper proposes a novel attention-based feature embedding that captures both entity and relation features in any given entity's neighborhood. Additionally, we encapsulate relation clusters and multi-hop relations in our model. Our empirical study offers insights into the efficacy of our attention-based model, and we show marked performance gains in comparison to state-of-the-art methods on all datasets.