Subpopulation shift widely exists in many real-world machine learning applications; it refers to the training and test distributions containing the same subpopulation groups but differing in subpopulation frequencies. Importance reweighting is a common way to handle the subpopulation shift issue by imposing constant or adaptive sampling weights on each sample in the training dataset. However, recent studies have recognized that most of these approaches fail to improve performance over empirical risk minimization, especially when applied to over-parameterized neural networks. In this work, we propose a simple yet practical framework, called uncertainty-aware mixup (Umix), to mitigate the overfitting issue in over-parameterized models by reweighting the "mixed" samples according to the sample uncertainty. A training-trajectory-based uncertainty estimate is maintained in the proposed Umix for each sample to flexibly characterize the subpopulation distribution. We also provide insightful theoretical analysis to verify that Umix achieves better generalization bounds than prior works. Further, we conduct extensive empirical studies across a wide range of tasks to validate the effectiveness of our method both qualitatively and quantitatively.
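
A minimal sketch of the reweighted-mixup idea described above, assuming per-sample uncertainty scores have already been estimated from earlier training trajectories; the `uncertainty` tensor, loss weighting, and model interface are illustrative placeholders rather than the authors' exact formulation:

```python
import torch
import torch.nn.functional as F

def umix_step(model, x, y, uncertainty, alpha=1.0):
    """One sketched uncertainty-aware mixup step.

    x: (B, ...) inputs, y: (B,) integer labels,
    uncertainty: (B,) per-sample uncertainty scores in [0, 1]
    estimated from earlier training trajectories (assumed given).
    """
    lam = torch.distributions.Beta(alpha, alpha).sample().item()
    perm = torch.randperm(x.size(0))

    # Mix inputs as in standard mixup.
    x_mix = lam * x + (1.0 - lam) * x[perm]

    logits = model(x_mix)
    loss_a = F.cross_entropy(logits, y, reduction="none")
    loss_b = F.cross_entropy(logits, y[perm], reduction="none")

    # Reweight each mixed pair by the mixed sample uncertainty, so that
    # under-represented, high-uncertainty subpopulations receive larger weight.
    w = lam * uncertainty + (1.0 - lam) * uncertainty[perm]
    loss = (w * (lam * loss_a + (1.0 - lam) * loss_b)).mean()
    return loss
```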

Related Content

In many real-world problems, the training data and test data have different distributions. This situation is commonly referred to as dataset shift. The most common settings for dataset shift considered in the literature are {\em covariate shift} and {\em target shift}. Importance weighting (IW) correction is a universal method for correcting the bias present in learning scenarios under dataset shift. The question one may ask is: does IW correction work equally well for different dataset shift scenarios? By investigating the generalization properties of weighted kernel ridge regression (W-KRR) under covariate and target shifts, we show that the answer is negative, except when the IW is bounded and the model is well-specified. In the latter cases, minimax optimal rates are achieved by importance weighted kernel ridge regression (IW-KRR) in both covariate and target shift scenarios. Slightly relaxing the boundedness condition on the IW, we show that IW-KRR still achieves the optimal rates under target shift while leading to slower rates for covariate shift. In the case of model misspecification, we show that the performance of W-KRR under covariate shift can be substantially improved by designing an alternative reweighting function. The distinction between misspecified and well-specified scenarios does not seem to be crucial in learning problems under target shift.
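
A minimal sketch of importance weighted kernel ridge regression in closed form; the Gaussian kernel and the source of the weights are assumptions for illustration (in practice the weights would come from density-ratio estimates under covariate or target shift):

```python
import numpy as np

def gaussian_kernel(A, B, bandwidth=1.0):
    d2 = ((A[:, None, :] - B[None, :, :]) ** 2).sum(-1)
    return np.exp(-d2 / (2.0 * bandwidth ** 2))

def iw_krr_fit(X, y, w, lam=1e-2, bandwidth=1.0):
    """Importance weighted kernel ridge regression.

    Minimizes sum_i w_i (f(x_i) - y_i)^2 + lam * n * ||f||_H^2, whose
    representer solution satisfies (W K + lam * n * I) alpha = W y
    for the kernel expansion f(.) = sum_j alpha_j k(x_j, .).
    """
    n = X.shape[0]
    K = gaussian_kernel(X, X, bandwidth)
    W = np.diag(w)
    alpha = np.linalg.solve(W @ K + lam * n * np.eye(n), W @ y)
    return alpha

def iw_krr_predict(X_train, X_test, alpha, bandwidth=1.0):
    return gaussian_kernel(X_test, X_train, bandwidth) @ alpha
```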

Despite their successes in many fields, it has been found that neural networks are difficult to make both accurate and robust, i.e., highly accurate networks are often vulnerable. Various empirical and analytic studies have substantiated that there is more or less a trade-off between the accuracy and robustness of neural networks. If this property is inherent, applications based on neural networks are vulnerable to untrustworthy predictions. To explore and understand this issue more deeply, in this study we show that the accuracy-robustness trade-off is an intrinsic property whose underlying mechanism is closely related to the uncertainty principle in quantum mechanics. By relating the loss function in neural networks to the wave function in quantum mechanics, we show that the inputs and their conjugates cannot be resolved by a neural network simultaneously. This work thus provides an insightful explanation for the inevitability of the accuracy-robustness dilemma in general deep networks from an entirely new perspective, and furthermore reveals the possibility of studying various properties of neural networks with the mature mathematical tools of quantum physics.

$T_{1\rho}$ mapping is a promising quantitative MRI technique for the non-invasive assessment of tissue properties. Learning-based approaches can map $T_{1\rho}$ from a reduced number of $T_{1\rho}$-weighted images, but require significant amounts of high-quality training data. Moreover, existing methods do not provide the confidence level of the $T_{1\rho}$ estimation. To address these problems, we propose a self-supervised learning neural network that learns a $T_{1\rho}$ mapping using the relaxation constraint in the learning process. Epistemic uncertainty and aleatoric uncertainty are modelled for the $T_{1\rho}$ quantification network to provide a Bayesian confidence estimate of the $T_{1\rho}$ mapping. The uncertainty estimation can also regularize the model to prevent it from learning imperfect data. We conducted experiments on $T_{1\rho}$ data collected from 52 patients with non-alcoholic fatty liver disease. The results showed that our method outperformed existing methods for $T_{1\rho}$ quantification of the liver using as few as two $T_{1\rho}$-weighted images. Our uncertainty estimation provides a feasible way of modelling the confidence of self-supervised $T_{1\rho}$ estimation, and it is consistent with the reality of liver $T_{1\rho}$ imaging.
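
A minimal sketch of how aleatoric and epistemic uncertainty could be combined in a regression-style quantification network, following the common heteroscedastic-loss and Monte-Carlo-dropout recipe; the architecture, channel counts, and loss are generic illustrations, not the exact network used for $T_{1\rho}$ mapping:

```python
import torch
import torch.nn as nn

class UncertaintyHead(nn.Module):
    """Predicts a parameter map and its log-variance (aleatoric uncertainty)."""
    def __init__(self, in_ch=2, hidden=64):
        super().__init__()
        self.backbone = nn.Sequential(
            nn.Conv2d(in_ch, hidden, 3, padding=1), nn.ReLU(),
            nn.Dropout2d(0.1),                    # kept active at test time for MC dropout
            nn.Conv2d(hidden, 2, 3, padding=1),   # output channels: [prediction, log_var]
        )

    def forward(self, x):
        out = self.backbone(x)
        return out[:, :1], out[:, 1:]

def heteroscedastic_loss(pred, log_var, target):
    # Attenuates the squared error where the predicted aleatoric variance is large.
    return (0.5 * torch.exp(-log_var) * (pred - target) ** 2 + 0.5 * log_var).mean()

def mc_epistemic_std(model, x, n_samples=20):
    # Epistemic uncertainty from the spread of stochastic forward passes.
    model.train()  # keep dropout active
    preds = torch.stack([model(x)[0] for _ in range(n_samples)])
    return preds.std(dim=0)
```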

Oversight AI is an emerging concept in radiology in which the AI forms a symbiosis with radiologists by continuously supporting their decision-making. Recent advances in vision-language pre-training shed light on the long-standing problems of oversight AI through the joint understanding of visual and textual concepts and their semantic correspondences. However, there have been limited successes in applying vision-language pre-training in the medical domain, as the current vision-language models and learning strategies for photographic images and captions are not optimal for processing medical data, which are usually limited in amount and diversity. To address this, here we present medical X-VL, a self-supervised model tailored for efficient vision-language pre-training that exploits cross-attention in the common feature space of radiological images and reports in a symmetric manner. We experimentally demonstrate that the pre-trained medical X-VL model outperforms the current state-of-the-art models on various vision-language tasks in medical domains. We finally demonstrate practical clinical uses of our oversight AI for monitoring human errors and for diagnosing newly emerging diseases, which suggests the potential of an oversight AI model for widespread applicability across different medical applications.

Conformal inference is a powerful tool for quantifying the uncertainty around predictions made by black-box models (e.g., neural nets, random forests). Formally, this methodology guarantees that if the training and test data are exchangeable (e.g., i.i.d.), then we can construct a prediction set $C$ for the target $Y$ such that $P(Y \in C) \geq 1-\alpha$ for any target level $\alpha$. In this article, we extend this methodology to an online prediction setting where the distribution generating the data is allowed to vary over time. To account for the non-exchangeability, we develop a protective layer that lies on top of conformal inference and gradually re-calibrates its predictions to adapt to the observed changes in the environment. Our methods are highly flexible and can be used in combination with any predictive algorithm that produces estimates of the target or its conditional distribution, without any assumptions on the size or type of the distribution shift. We test our techniques on two real-world datasets aimed at predicting stock market volatility and COVID-19 case counts and find that they are robust and adaptive to real-world distribution shifts.
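
A minimal sketch of online re-calibration of the miscoverage level, in the spirit of the adaptive conformal inference update of Gibbs and Candès; the score function, quantile rule, and update constant are illustrative, and the paper's protective layer may differ in its details:

```python
import numpy as np

def online_adaptive_conformal(scores, alpha=0.1, gamma=0.01):
    """Adaptively tune the miscoverage level as observations arrive.

    scores[t] is the conformity score of observation t (e.g. |y_t - y_hat_t|).
    Returns the sequence of radii used to build the prediction sets.
    """
    alpha_t = alpha
    history, radii = [], []
    for s in scores:
        # Quantile of past scores at the current (adapted) level.
        q = np.quantile(history, 1 - alpha_t) if history else np.inf
        radii.append(q)
        err = float(s > q)                  # 1 if the set missed the new target
        alpha_t += gamma * (alpha - err)    # shrink sets after hits, widen after misses
        alpha_t = float(np.clip(alpha_t, 1e-4, 1 - 1e-4))
        history.append(s)
    return np.array(radii)
```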

Compared to on-policy policy gradient techniques, off-policy model-free deep reinforcement learning (RL) that uses previously gathered data can improve sampling efficiency. However, off-policy learning becomes challenging when the discrepancy between the distribution of the policy of interest and those of the policies that collected the data increases. Although the well-studied importance sampling and off-policy policy gradient techniques were proposed to compensate for this discrepancy, they usually require a collection of long trajectories, which increases the computational complexity and induces additional problems such as vanishing/exploding gradients or discarding many useful experiences. Moreover, their generalization to continuous action domains is strictly limited as they require action probabilities, which makes them unsuitable for deterministic policies. To overcome these limitations, we introduce a novel policy similarity measure to mitigate the effects of such discrepancy. Our method offers an adequate single-step off-policy correction without any probability estimates, and theoretical results show that it can achieve a contraction mapping with a unique fixed point, which allows "safe" off-policy learning. An extensive set of empirical results indicates that our algorithm substantially improves the state of the art and attains higher returns in fewer steps than competing methods by efficiently scheduling the learning rate in Q-learning and policy optimization.

The notion of uncertainty is of major importance in machine learning and constitutes a key element of machine learning methodology. In line with the statistical tradition, uncertainty has long been perceived as almost synonymous with standard probability and probabilistic predictions. Yet, due to the steadily increasing relevance of machine learning for practical applications and related issues such as safety requirements, new problems and challenges have recently been identified by machine learning scholars, and these problems may call for new methodological developments. In particular, this includes the importance of distinguishing between (at least) two different types of uncertainty, often referred to as aleatoric and epistemic. In this paper, we provide an introduction to the topic of uncertainty in machine learning as well as an overview of attempts so far at handling uncertainty in general and formalizing this distinction in particular.
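
A minimal sketch of how the aleatoric/epistemic distinction is often operationalized in practice with a deep ensemble: disagreement between ensemble members reflects epistemic uncertainty, while the expected per-member entropy captures the aleatoric part. This is one common decomposition, not the only formalization discussed in the survey:

```python
import torch

def ensemble_uncertainties(models, x):
    """Decompose predictive uncertainty for a classification ensemble.

    total (predictive entropy) = aleatoric (expected entropy)
                               + epistemic (mutual information).
    """
    with torch.no_grad():
        probs = torch.stack([torch.softmax(m(x), dim=-1) for m in models])  # (M, B, C)
    mean_p = probs.mean(dim=0)
    total = -(mean_p * mean_p.clamp_min(1e-12).log()).sum(-1)
    aleatoric = -(probs * probs.clamp_min(1e-12).log()).sum(-1).mean(0)
    epistemic = total - aleatoric
    return total, aleatoric, epistemic
```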

Clustering is one of the most fundamental and widespread techniques in exploratory data analysis. Yet, the basic approach to clustering has not really changed: a practitioner hand-picks a task-specific clustering loss to optimize and fits the given data to reveal the underlying cluster structure. Some types of losses, such as k-means, its non-linear version kernelized k-means (centroid based), and DBSCAN (density based), are popular choices due to their good empirical performance on a range of applications. However, every so often the clustering output using these standard losses fails to reveal the underlying structure, and the practitioner has to custom-design their own variation. In this work we take an intrinsically different approach to clustering: rather than fitting a dataset to a specific clustering loss, we train a recurrent model that learns how to cluster. The model uses as training pairs examples of datasets (as input) and their corresponding cluster identities (as output). By providing multiple types of training datasets as inputs, our model has the ability to generalize well on unseen datasets (new clustering tasks). Our experiments reveal that by training on simple synthetically generated datasets or on existing real datasets, we can achieve better clustering performance on unseen real-world datasets when compared with standard benchmark clustering techniques. Our meta-clustering model works well even for small datasets, where the usual deep learning models tend to perform worse.

Model-agnostic meta-learners aim to acquire meta-learned parameters from similar tasks in order to adapt to novel tasks from the same distribution with few gradient updates. With their flexibility in the choice of models, these frameworks demonstrate appealing performance in a variety of domains such as few-shot image classification and reinforcement learning. However, one important limitation of such frameworks is that they seek a common initialization shared across the entire task distribution, substantially limiting the diversity of the task distributions that they are able to learn from. In this paper, we augment model-agnostic meta-learning (MAML) with the capability to identify the mode of tasks sampled from a multimodal task distribution and adapt quickly through gradient updates. Specifically, we propose a multimodal MAML (MMAML) framework, which is able to modulate its meta-learned prior parameters according to the identified mode, allowing for more efficient fast adaptation. We evaluate the proposed model on a diverse set of few-shot learning tasks, including regression, image classification, and reinforcement learning. The results not only demonstrate the effectiveness of our model in modulating the meta-learned prior in response to the characteristics of tasks but also show that training on a multimodal distribution can produce an improvement over unimodal training.
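
A minimal sketch of mode-conditioned modulation of a meta-learned prior, using FiLM-style scaling and shifting of hidden features conditioned on the support set; the task encoder, modulation form, and dimensions are illustrative assumptions rather than the exact MMAML implementation:

```python
import torch
import torch.nn as nn

class ModulatedLearner(nn.Module):
    """Meta-learned base network whose hidden layer is modulated per task."""
    def __init__(self, in_dim=1, hidden=40, out_dim=1):
        super().__init__()
        self.fc1 = nn.Linear(in_dim, hidden)
        self.fc2 = nn.Linear(hidden, out_dim)
        # Task encoder: embeds the support set and emits modulation parameters.
        self.task_encoder = nn.Sequential(
            nn.Linear(in_dim + out_dim, hidden), nn.ReLU(),
            nn.Linear(hidden, 2 * hidden),       # gamma and beta
        )

    def forward(self, x, support_x, support_y):
        # Identify the task mode from the support set (mean-pooled embedding).
        emb = self.task_encoder(torch.cat([support_x, support_y], dim=-1)).mean(0)
        gamma, beta = emb.chunk(2, dim=-1)
        h = torch.relu(self.fc1(x))
        h = gamma * h + beta                     # modulate the meta-learned prior
        return self.fc2(h)
```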

With the rapid increase of large-scale, real-world datasets, it becomes critical to address the problem of long-tailed data distribution (i.e., a few classes account for most of the data, while most classes are under-represented). Existing solutions typically adopt class re-balancing strategies such as re-sampling and re-weighting based on the number of observations for each class. In this work, we argue that as the number of samples increases, the additional benefit of a newly added data point will diminish. We introduce a novel theoretical framework to measure data overlap by associating with each sample a small neighboring region rather than a single point. The effective number of samples is defined as the volume of samples and can be calculated by a simple formula $(1-\beta^{n})/(1-\beta)$, where $n$ is the number of samples and $\beta \in [0,1)$ is a hyperparameter. We design a re-weighting scheme that uses the effective number of samples for each class to re-balance the loss, thereby yielding a class-balanced loss. Comprehensive experiments are conducted on artificially induced long-tailed CIFAR datasets and large-scale datasets including ImageNet and iNaturalist. Our results show that when trained with the proposed class-balanced loss, the network is able to achieve significant performance gains on long-tailed datasets.
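
A minimal sketch of the effective-number re-weighting described above; the class counts and β value are example inputs, and the normalization (scaling weights to sum to the number of classes) follows common practice rather than anything mandated by the formula itself:

```python
import numpy as np

def class_balanced_weights(samples_per_class, beta=0.9999):
    """Weights proportional to 1 / E_n with E_n = (1 - beta^n) / (1 - beta)."""
    n = np.asarray(samples_per_class, dtype=np.float64)
    effective_num = (1.0 - np.power(beta, n)) / (1.0 - beta)
    weights = 1.0 / effective_num
    return weights / weights.sum() * len(n)     # normalize to sum to #classes

# Example: a long-tailed 3-class problem.
print(class_balanced_weights([5000, 500, 50], beta=0.999))
```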
