91婷婷国产精选国产色-日韩精品人妻中文字幕有码网址

Most machine learning models operate under the assumption that the training, testing and deployment data is independent and identically distributed (i.i.d.). This assumption doesn't generally hold true in a natural setting. Usually, the deployment data is subject to various types of distributional shifts. The magnitude of a model's performance is proportional to this shift in the distribution of the dataset. Thus it becomes necessary to evaluate a model's uncertainty and robustness to distributional shifts to get a realistic estimate of its expected performance on real-world data. Present methods to evaluate uncertainty and model's robustness are lacking and often fail to paint the full picture. Moreover, most analysis so far has primarily focused on classification tasks. In this paper, we propose more insightful metrics for general regression tasks using the Shifts Weather Prediction Dataset. We also present an evaluation of the baseline methods using these metrics.

相關內容

穩(wen)健性

關注 3

估計/估計量 · state-of-the-art · MoDELS · Performer · Better ·

2022 年 1 月 11 日

Training-Free Uncertainty Estimation for Dense Regression: Sensitivity as a Surrogate

Lu Mi,Hao Wang,Yonglong Tian,Hao He,Nir Shavit

from arxiv, In proceedings of the 36th AAAI Conference on Artificial Intelligence

Uncertainty estimation is an essential step in the evaluation of the robustness for deep learning models in computer vision, especially when applied in risk-sensitive areas. However, most state-of-the-art deep learning models either fail to obtain uncertainty estimation or need significant modification (e.g., formulating a proper Bayesian treatment) to obtain it. Most previous methods are not able to take an arbitrary model off the shelf and generate uncertainty estimation without retraining or redesigning it. To address this gap, we perform a systematic exploration into training-free uncertainty estimation for dense regression, an unrecognized yet important problem, and provide a theoretical construction justifying such estimations. We propose three simple and scalable methods to analyze the variance of outputs from a trained network under tolerable perturbations: infer-transformation, infer-noise, and infer-dropout. They operate solely during the inference, without the need to re-train, re-design, or fine-tune the models, as typically required by state-of-the-art uncertainty estimation methods. Surprisingly, even without involving such perturbations in training, our methods produce comparable or even better uncertainty estimation when compared to training-required state-of-the-art methods.

INTERACT · MoDELS · 聯合分布 · INFORMS · Performer ·

2022 年 1 月 10 日

Evaluating Bayesian Model Visualisations

Sebastian Stein,John H. Williamson

Probabilistic models inform an increasingly broad range of business and policy decisions ultimately made by people. Recent algorithmic, computational, and software framework development progress facilitate the proliferation of Bayesian probabilistic models, which characterise unobserved parameters by their joint distribution instead of point estimates. While they can empower decision makers to explore complex queries and to perform what-if-style conditioning in theory, suitable visualisations and interactive tools are needed to maximise users' comprehension and rational decision making under uncertainty. In this paper, propose a protocol for quantitative evaluation of Bayesian model visualisations and introduce a software framework implementing this protocol to support standardisation in evaluation practice and facilitate reproducibility. We illustrate the evaluation and analysis workflow on a user study that explores whether making Boxplots and Hypothetical Outcome Plots interactive can increase comprehension or rationality and conclude with design guidelines for researchers looking to conduct similar studies in the future.

估計/估計量 · 采樣法 · 樣本 · Performer · 假陽性 ·

2022 年 1 月 10 日

Estimating SARS-CoV-2 Seroprevalence

Samuel Rosin,Bonnie E. Shook-Sa,Stephen R. Cole,Michael G. Hudgens

from arxiv, Main text: 18 pages, 4 figures. Appendix: 16 pages, 10 figures. Preprint

Governments and public health authorities use seroprevalence studies to guide their responses to the COVID-19 pandemic. These seroprevalence surveys estimate the proportion of persons within a given population who have detectable antibodies to SARS-CoV-2. However, serologic assays are prone to misclassification error due to false positives and negatives, and non-probability sampling methods may induce selection bias. In this paper, we consider nonparametric and parametric prevalence estimators that address both challenges by leveraging validation data and assuming equal probabilities of sample inclusion within covariate-defined strata. Both estimators are shown to be consistent and asymptotically normal, and consistent variance estimators are derived. Simulation studies are presented comparing the finite sample performance of the estimators over a range of assay characteristics and sampling scenarios. The methods are used to estimate SARS-CoV-2 seroprevalence in asymptomatic individuals in Belgium and North Carolina.

穩健性 · 估計/估計量 · Extensibility · 學成 · 深度學習 ·

2022 年 1 月 5 日

Uncertainty Baselines: Benchmarks for Uncertainty & Robustness in Deep Learning

Zachary Nado,Neil Band,Mark Collier,Josip Djolonga,Michael W. Dusenberry,Sebastian Farquhar,Qixuan Feng,Angelos Filos,Marton Havasi,Rodolphe Jenatton,Ghassen Jerfel,Jeremiah Liu,Zelda Mariet,Jeremy Nixon,Shreyas Padhy,Jie Ren,Tim G. J. Rudner,Faris Sbahi,Yeming Wen,Florian Wenzel,Kevin Murphy,D. Sculley,Balaji Lakshminarayanan,Jasper Snoek,Yarin Gal,Dustin Tran

High-quality estimates of uncertainty and robustness are crucial for numerous real-world applications, especially for deep learning which underlies many deployed ML systems. The ability to compare techniques for improving these estimates is therefore very important for research and practice alike. Yet, competitive comparisons of methods are often lacking due to a range of reasons, including: compute availability for extensive tuning, incorporation of sufficiently many baselines, and concrete documentation for reproducibility. In this paper we introduce Uncertainty Baselines: high-quality implementations of standard and state-of-the-art deep learning methods on a variety of tasks. As of this writing, the collection spans 19 methods across 9 tasks, each with at least 5 metrics. Each baseline is a self-contained experiment pipeline with easily reusable and extendable components. Our goal is to provide immediate starting points for experimentation with new methods or applications. Additionally we provide model checkpoints, experiment outputs as Python notebooks, and leaderboards for comparing results. Code available at //github.com/google/uncertainty-baselines.

標注 · 在線 · Continuity · INFORMS · 估計/估計量 ·

2022 年 1 月 5 日

Online Adaptation to Label Distribution Shift

Ruihan Wu,Chuan Guo,Yi Su,Kilian Q. Weinberger

Machine learning models often encounter distribution shifts when deployed in the real world. In this paper, we focus on adaptation to label distribution shift in the online setting, where the test-time label distribution is continually changing and the model must dynamically adapt to it without observing the true label. Leveraging a novel analysis, we show that the lack of true label does not hinder estimation of the expected test loss, which enables the reduction of online label shift adaptation to conventional online learning. Informed by this observation, we propose adaptation algorithms inspired by classical online learning techniques such as Follow The Leader (FTL) and Online Gradient Descent (OGD) and derive their regret bounds. We empirically verify our findings under both simulated and real world label distribution shifts and show that OGD is particularly effective and robust to a variety of challenging label shift scenarios.

估計/估計量 · 可約的 · 方差 · 無偏 · 統計量 ·

2022 年 1 月 5 日

Reducing bias and variance in quantile estimates with an exponential model

Rohit Pandey

Percentiles and more generally, quantiles are commonly used in various contexts to summarize data. For most distributions, there is exactly one quantile that is unbiased. For distributions like the Gaussian that have the same mean and median, that becomes the medians. There are different ways to estimate quantiles from finite samples described in the literature and implemented in statistics packages. It is possible to leverage the memory-less property of the exponential distribution and design high quality estimators that are unbiased and have low variance and mean squared errors. Naturally, these estimators out-perform the ones in statistical packages when the underlying distribution is exponential. But, they also happen to generalize well when that assumption is violated.

MoDELS · Performer · Processing（編程語言） · 學成 · 穩健性 ·

2021 年 9 月 3 日

Learning Neural Models for Natural Language Processing in the Face of Distributional Shift

Paul Michel

from arxiv, PhD thesis

The dominating NLP paradigm of training a strong neural predictor to perform one task on a specific dataset has led to state-of-the-art performance in a variety of applications (eg. sentiment classification, span-prediction based question answering or machine translation). However, it builds upon the assumption that the data distribution is stationary, ie. that the data is sampled from a fixed distribution both at training and test time. This way of training is inconsistent with how we as humans are able to learn from and operate within a constantly changing stream of information. Moreover, it is ill-adapted to real-world use cases where the data distribution is expected to shift over the course of a model's lifetime. The first goal of this thesis is to characterize the different forms this shift can take in the context of natural language processing, and propose benchmarks and evaluation metrics to measure its effect on current deep learning architectures. We then proceed to take steps to mitigate the effect of distributional shift on NLP models. To this end, we develop methods based on parametric reformulations of the distributionally robust optimization framework. Empirically, we demonstrate that these approaches yield more robust models as demonstrated on a selection of realistic problems. In the third and final part of this thesis, we explore ways of efficiently adapting existing models to new domains or tasks. Our contribution to this topic takes inspiration from information geometry to derive a new gradient update rule which alleviate catastrophic forgetting issues during adaptation.

文本分類 · Extensibility · domain shift · Neural Networks · MoDELS ·

2021 年 7 月 15 日

Uncertainty-Aware Reliable Text Classification

Yibo Hu,Latifur Khan

from arxiv, KDD 2021

Deep neural networks have significantly contributed to the success in predictive accuracy for classification tasks. However, they tend to make over-confident predictions in real-world settings, where domain shifting and out-of-distribution (OOD) examples exist. Most research on uncertainty estimation focuses on computer vision because it provides visual validation on uncertainty quality. However, few have been presented in the natural language process domain. Unlike Bayesian methods that indirectly infer uncertainty through weight uncertainties, current evidential uncertainty-based methods explicitly model the uncertainty of class probabilities through subjective opinions. They further consider inherent uncertainty in data with different root causes, vacuity (i.e., uncertainty due to a lack of evidence) and dissonance (i.e., uncertainty due to conflicting evidence). In our paper, we firstly apply evidential uncertainty in OOD detection for text classification tasks. We propose an inexpensive framework that adopts both auxiliary outliers and pseudo off-manifold samples to train the model with prior knowledge of a certain class, which has high vacuity for OOD samples. Extensive empirical experiments demonstrate that our model based on evidential uncertainty outperforms other counterparts for detecting OOD examples. Our approach can be easily deployed to traditional recurrent neural networks and fine-tuned pre-trained transformers.

鏈路預測 · entity · 二元關系 · binary · 負例 ·

2021 年 4 月 21 日

Link Prediction on N-ary Relational Data Based on Relatedness Evaluation

Saiping Guan,Xiaolong Jin,Jiafeng Guo,Yuanzhuo Wang,Xueqi Cheng

from arxiv, Accepted to TKDE 2021

With the overwhelming popularity of Knowledge Graphs (KGs), researchers have poured attention to link prediction to fill in missing facts for a long time. However, they mainly focus on link prediction on binary relational data, where facts are usually represented as triples in the form of (head entity, relation, tail entity). In practice, n-ary relational facts are also ubiquitous. When encountering such facts, existing studies usually decompose them into triples by introducing a multitude of auxiliary virtual entities and additional triples. These conversions result in the complexity of carrying out link prediction on n-ary relational data. It has even proven that they may cause loss of structure information. To overcome these problems, in this paper, we represent each n-ary relational fact as a set of its role and role-value pairs. We then propose a method called NaLP to conduct link prediction on n-ary relational data, which explicitly models the relatedness of all the role and role-value pairs in an n-ary relational fact. We further extend NaLP by introducing type constraints of roles and role-values without any external type-specific supervision, and proposing a more reasonable negative sampling mechanism. Experimental results validate the effectiveness and merits of the proposed methods.

SSL · 學成 · Performer · Taxonomy · 情景 ·

2021 年 4 月 1 日

A Realistic Evaluation of Semi-Supervised Learning for Fine-Grained Classification

Jong-Chyi Su,Zezhou Cheng,Subhransu Maji

from arxiv, CVPR 2021 (oral)

We evaluate the effectiveness of semi-supervised learning (SSL) on a realistic benchmark where data exhibits considerable class imbalance and contains images from novel classes. Our benchmark consists of two fine-grained classification datasets obtained by sampling classes from the Aves and Fungi taxonomy. We find that recently proposed SSL methods provide significant benefits, and can effectively use out-of-class data to improve performance when deep networks are trained from scratch. Yet their performance pales in comparison to a transfer learning baseline, an alternative approach for learning from a few examples. Furthermore, in the transfer setting, while existing SSL methods provide improvements, the presence of out-of-class is often detrimental. In this setting, standard fine-tuning followed by distillation-based self-training is the most robust. Our work suggests that semi-supervised learning with experts on realistic datasets may require different strategies than those currently prevalent in the literature.