We present a new finite-sample analysis of M-estimators of location in $\mathbb{R}^d$ using the influence function as a tool. In particular, we show that the deviations of an M-estimator can be controlled through its influence function (or its score function). We then use concentration inequalities for M-estimators to investigate robust estimation of the mean in high dimension under adversarial corruption, for both bounded and unbounded score functions. For a sample of size $n$ and covariance matrix $\Sigma$, we attain the minimax rate $\sqrt{\mathrm{Tr}(\Sigma)/n}+\sqrt{\|\Sigma\|_{op}\log(1/\delta)/n}$ with probability larger than $1-\delta$ in a heavy-tailed setting. One major advantage of our approach over other recently proposed ones is that our estimator is tractable and fast to compute even in very high dimension, with a complexity of $O(nd\log(\mathrm{Tr}(\Sigma)))$, where $\Sigma$ is the covariance matrix of the inliers. In practice, the code that we make available for this article proves to be very fast.
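To make the notion of an M-estimator of location with a bounded score function concrete, here is a minimal sketch of a Huber-type estimator computed by iteratively reweighted averaging. It is not the estimator or the code of the article: the function name `huber_location`, the threshold `beta`, the coordinate-wise median start, and the synthetic corrupted sample are all assumptions made for illustration.

```python
import numpy as np

def huber_location(X, beta, n_iter=100, tol=1e-8):
    """Huber-type M-estimator of location via iteratively reweighted means.

    X: (n, d) array of samples.
    beta: threshold of the score function (assumed tuning parameter).
    """
    X = np.asarray(X, dtype=float)
    mu = np.median(X, axis=0)  # robust starting point (coordinate-wise median)
    for _ in range(n_iter):
        r = np.linalg.norm(X - mu, axis=1)
        # Weight psi(r) / r with psi(r) = min(r, beta):
        # 1 inside the ball of radius beta, beta / r outside.
        w = np.where(r <= beta, 1.0, beta / np.maximum(r, 1e-12))
        new_mu = (w[:, None] * X).sum(axis=0) / w.sum()
        converged = np.linalg.norm(new_mu - mu) < tol
        mu = new_mu
        if converged:
            break
    return mu

# Synthetic example: 500 standard Gaussian inliers plus 25 identical gross outliers.
rng = np.random.default_rng(0)
inliers = rng.standard_normal((500, 5))
outliers = np.full((25, 5), 50.0)
X = np.vstack([inliers, outliers])

est = huber_location(X, beta=3.0)
naive = X.mean(axis=0)  # the empirical mean is pulled toward the outliers
```

Each pass costs $O(nd)$, and points far from the current estimate receive a weight proportional to the inverse of their distance, which is what limits the influence of the outliers.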

Related content

Covariance matrix

In probability theory and statistics, the covariance matrix (also known as the auto-covariance matrix, dispersion matrix, variance matrix, or variance-covariance matrix) is a square matrix giving the covariance between each pair of elements of a given random vector. The diagonal of the matrix contains the variances, that is, the covariance of each element with itself.
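As a small illustration of this definition, the following sketch computes the sample covariance matrix of a toy data set by hand. The data values are made up for the example.

```python
import numpy as np

# Four observations of a 2-dimensional random vector (made-up values).
X = np.array([[1.0, 2.0],
              [2.0, 4.1],
              [3.0, 5.9],
              [4.0, 8.2]])

Xc = X - X.mean(axis=0)              # center each coordinate
cov = Xc.T @ Xc / (X.shape[0] - 1)   # unbiased sample covariance matrix, shape (2, 2)

variances = np.diag(cov)             # diagonal entries are the per-coordinate variances
```

The result agrees with `np.cov(X, rowvar=False)`, and the matrix is symmetric because the covariance of element i with element j equals that of j with i.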

Estimation/estimator · Statistic · Linear · Mutually independent · Smoothing

November 23, 2021

Properties of linear spectral statistics of frequency-smoothed estimated spectral coherence matrix of high-dimensional Gaussian time series

Philippe Loubaton,Alexis Rosuel

from arxiv, Previously this version appeared as arXiv:2007.08806 which was submitted here as a new work by accident

The asymptotic behaviour of Linear Spectral Statistics (LSS) of the smoothed periodogram estimator of the spectral coherency matrix of a complex Gaussian high-dimensional time series $(\mathbf{y}_n)_{n \in \mathbb{Z}}$ with independent components is studied under the asymptotic regime where the sample size $N$ converges towards $+\infty$ while the dimension $M$ of $\mathbf{y}$ and the smoothing span of the estimator grow to infinity at the same rate in such a way that $\frac{M}{N} \rightarrow 0$. It is established that, at each frequency, the estimated spectral coherency matrix is close to the sample covariance matrix of an independent, identically $\mathcal{N}_{\mathbb{C}}(0,\mathbf{I}_M)$ distributed sequence, and that its empirical eigenvalue distribution converges towards the Marchenko-Pastur distribution. This allows us to conclude that each LSS has a deterministic behaviour that can be evaluated explicitly. Using concentration inequalities, it is shown that the order of magnitude of the supremum over the frequencies of the deviation of each LSS from its deterministic approximation is $\frac{1}{M} + \frac{\sqrt{M}}{N}+ (\frac{M}{N})^{3}$, where $N$ is the sample size. Numerical simulations support our results.
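The Marchenko-Pastur behaviour mentioned above can be checked numerically in the classical setting of a sample covariance matrix built from independent complex Gaussian vectors. The sketch below covers only that reference case, not the smoothed periodogram estimator studied in the paper; the dimension, the number of samples `n` (which stands in for the smoothing span), and the seed are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)
M, n = 200, 800          # dimension and number of independent samples (illustrative)
c = M / n                # aspect ratio of the Marchenko-Pastur law

# Rows are i.i.d. complex Gaussian vectors with identity covariance.
Y = (rng.standard_normal((n, M)) + 1j * rng.standard_normal((n, M))) / np.sqrt(2.0)
S = Y.conj().T @ Y / n   # sample covariance matrix, Hermitian of size M x M

eigs = np.linalg.eigvalsh(S)

# Edges of the Marchenko-Pastur support for ratio c < 1 and unit variance.
lower = (1.0 - np.sqrt(c)) ** 2
upper = (1.0 + np.sqrt(c)) ** 2
```

For these sizes the smallest and largest eigenvalues should fall close to `lower` and `upper`, and the average eigenvalue should be close to 1.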

Bayesian inference · Reducible · Inference · Estimation/estimator · Biased

November 23, 2021

Removing the mini-batching error in Bayesian inference using Adaptive Langevin dynamics

Inass Sekkat,Gabriel Stoltz

Bayesian inference makes it possible to obtain useful information on the parameters of models, either in computational statistics or more recently in the context of Bayesian Neural Networks. The computational cost of usual Monte Carlo methods for sampling a posteriori laws in Bayesian inference scales linearly with the number of data points. One option to reduce it to a fraction of this cost is to resort to mini-batching in conjunction with unadjusted discretizations of Langevin dynamics, in which case only a random fraction of the data is used to estimate the gradient. However, this leads to additional noise in the dynamics and hence a bias on the invariant measure which is sampled by the Markov chain. We advocate using the so-called Adaptive Langevin (AdL) dynamics, which is a modification of standard inertial Langevin dynamics with a dynamical friction which automatically corrects for the increased noise arising from mini-batching. We investigate the practical relevance of the assumptions underpinning Adaptive Langevin (constant covariance for the estimation of the gradient), which are not satisfied in typical models of Bayesian inference, and quantify the bias induced by mini-batching in this case. We also show how to extend AdL in order to systematically reduce the bias on the posterior distribution by considering a dynamical friction depending on the current value of the parameter to sample.

Performer · Approximation · Functional · Understandability · Deep Q-network

November 23, 2021

Understanding the Impact of Data Distribution on Q-learning with Function Approximation

Pedro P. Santos,Francisco S. Melo,Alberto Sardinha,Diogo S. Carvalho

In this work, we study the interplay between the data distribution and Q-learning-based algorithms with function approximation. We provide a theoretical and empirical analysis as to why different properties of the data distribution can contribute to regulating sources of algorithmic instability. First, we revisit theoretical bounds on the performance of approximate dynamic programming algorithms. Second, we provide a novel four-state MDP that highlights the impact of the data distribution on the performance of a Q-learning algorithm with function approximation, both in online and offline settings. Finally, we experimentally assess the impact of the data distribution properties on the performance of an offline deep Q-network algorithm. Our results show that: (i) the data distribution needs to possess certain properties in order to robustly learn in an offline setting, namely low distance to the distributions induced by optimal policies of the MDP and high coverage over the state-action space; and (ii) high entropy data distributions can contribute to mitigating sources of algorithmic instability.

Estimation/estimator · Performer · Score · Integration · Reducible

November 22, 2021

Density Ratio Estimation via Infinitesimal Classification

Kristy Choi,Chenlin Meng,Yang Song,Stefano Ermon

from arxiv, First two authors contributed equally

Density ratio estimation (DRE) is a fundamental machine learning technique for comparing two probability distributions. However, existing methods struggle in high-dimensional settings, as it is difficult to accurately compare probability distributions based on finite samples. In this work we propose DRE-$\infty$, a divide-and-conquer approach to reduce DRE to a series of easier subproblems. Inspired by Monte Carlo methods, we smoothly interpolate between the two distributions via an infinite continuum of intermediate bridge distributions. We then estimate the instantaneous rate of change of the bridge distributions indexed by time (the "time score"), a quantity defined analogously to data (Stein) scores, with a novel time score matching objective. Crucially, the learned time scores can then be integrated to compute the desired density ratio. In addition, we show that traditional (Stein) scores can be used to obtain integration paths that connect regions of high density in both distributions, improving performance in practice. Empirically, we demonstrate that our approach performs well on downstream tasks such as mutual information estimation and energy-based modeling on complex, high-dimensional datasets.
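The claim that time scores can be integrated to recover a density ratio can be illustrated on a toy case where everything is known in closed form. The sketch below assumes a bridge of unit-variance Gaussians whose mean moves linearly from `mu0` to `mu1`; the time score is written analytically instead of being learned, so this shows only the integration step, not the method of the paper. All names and values are illustrative.

```python
import numpy as np

mu0, mu1 = 0.0, 2.0  # means of the two unit-variance Gaussians (illustrative choice)

def time_score(x, t):
    """d/dt log p_t(x) for the bridge p_t = N((1 - t) * mu0 + t * mu1, 1)."""
    mu_t = (1.0 - t) * mu0 + t * mu1
    return (x - mu_t) * (mu1 - mu0)

def log_ratio_by_integration(x, n_steps=1001):
    """Approximate log p_1(x) - log p_0(x) by integrating the time score over [0, 1]."""
    ts = np.linspace(0.0, 1.0, n_steps)
    vals = time_score(x, ts)
    dt = ts[1] - ts[0]
    # Trapezoidal rule: full weight inside, half weight at the two endpoints.
    return dt * (vals.sum() - 0.5 * (vals[0] + vals[-1]))

def log_ratio_exact(x):
    """Closed-form log p_1(x) - log p_0(x) for the two Gaussians."""
    return -0.5 * (x - mu1) ** 2 + 0.5 * (x - mu0) ** 2

x = 0.7
approx = log_ratio_by_integration(x)
exact = log_ratio_exact(x)
```

Because the time score is linear in `t` here, the trapezoidal rule reproduces the exact log ratio up to floating-point error; with a learned time score, the integral would only be approximate.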

Estimation/estimator · MoDELS · Learned · Domain shift · Deep learning

November 21, 2021

Calibrated Diffusion Tensor Estimation

Davood Karimi,Simon K. Warfield,Ali Gholipour

It is highly desirable to know how uncertain a model's predictions are, especially for models that are complex and hard to understand as in deep learning. Although there has been a growing interest in using deep learning methods in diffusion-weighted MRI, prior works have not addressed the issue of model uncertainty. Here, we propose a deep learning method to estimate the diffusion tensor and compute the estimation uncertainty. Data-dependent uncertainty is computed directly by the network and learned via loss attenuation. Model uncertainty is computed using Monte Carlo dropout. We also propose a new method for evaluating the quality of predicted uncertainties. We compare the new method with the standard least-squares tensor estimation and bootstrap-based uncertainty computation techniques. Our experiments show that when the number of measurements is small the deep learning method is more accurate and its uncertainty predictions are better calibrated than the standard methods. We show that the estimation uncertainties computed by the new method can highlight the model's biases, detect domain shift, and reflect the strength of noise in the measurements. Our study shows the importance and practical value of modeling prediction uncertainties in deep learning-based diffusion MRI analysis.

Squared loss · Sample complexity · Predictor/decision function · Square matrix · Linear

November 21, 2021

The Sample Complexity of Learning Linear Predictors with the Squared Loss

Ohad Shamir

from arxiv, Revised discussion to clarify that the lower bound is currently not fully matched by algorithms which must return linear predictors

In this short note, we provide a sample complexity lower bound for learning linear predictors with respect to the squared loss. Our focus is on an agnostic setting, where no assumptions are made on the data distribution. This contrasts with standard results in the literature, which either make distributional assumptions, refer to specific parameter settings, or use other performance measures.

Approximation · Piecewise · Integration · Functional · Performance

November 20, 2021

Study of Polar Codes Based on Piecewise Gaussian Approximation

R. M. Oliveira,R. C. de Lamare

from arxiv, 9 figures, 6 pages

In this paper, we investigate the construction of polar codes by Gaussian approximation (GA) and develop an approach based on piecewise Gaussian approximation (PGA). In particular, with the piecewise approach we obtain a function that replaces the original GA function with a more accurate approximation, which results in a significant gain in performance. The proposed PGA construction of polar codes is presented in its integral form as well as in an alternative approximation that does not rely on the integral form. Simulation results show that the proposed PGA construction outperforms the standard GA for several examples of polar codes and rates.

Estimation/estimator · Covariance matrix · Identifiable · Training data · Performer

November 19, 2021

High-Dimensional Covariance Shrinkage for Signal Detection

Benjamin D. Robinson,Robert Malina,Alfred O. Hero III

In this paper, we consider the problem of determining the presence of a given signal in a high-dimensional observation with unknown covariance matrix by using an adaptive matched filter. Traditionally such filters are formed from the sample covariance matrix of some given training data, but, as is well-known, the performance of such filters is poor when the number of training data $n$ is not much larger than the data dimension $p$. We thus seek a covariance estimator to replace sample covariance. To account for the fact that $n$ and $p$ may be of comparable size, we adopt the "large-dimensional asymptotic model" in which $n$ and $p$ go to infinity in a fixed ratio. Under this assumption, we identify a covariance estimator that is asymptotically detection-theoretic optimal within a general shrinkage class inspired by C. Stein, and we give consistent estimates for conditional false-alarm and detection rate of the corresponding adaptive matched filter.
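To illustrate why a shrinkage estimator helps when $n$ is not much larger than $p$, here is a minimal sketch that shrinks the sample covariance linearly toward a scaled identity and plugs it into an adaptive matched filter statistic. This is a generic linear shrinkage with a hand-picked weight, not the optimal estimator identified in the paper; the function names, `rho=0.3`, the real-valued data, and the signal vector are all assumptions for illustration.

```python
import numpy as np

def shrink_covariance(S, rho):
    """Linear shrinkage of S toward a scaled identity.

    rho in [0, 1] is a hypothetical, hand-picked shrinkage weight.
    """
    p = S.shape[0]
    target = (np.trace(S) / p) * np.eye(p)
    return (1.0 - rho) * S + rho * target

def amf_statistic(x, s, cov):
    """Matched filter statistic (s^T C^{-1} x)^2 / (s^T C^{-1} s) for real data."""
    cinv_s = np.linalg.solve(cov, s)  # C^{-1} s, using that C is symmetric
    return (cinv_s @ x) ** 2 / (cinv_s @ s)

rng = np.random.default_rng(1)
p, n = 50, 40                         # fewer training samples than dimensions
train = rng.standard_normal((n, p))
S = train.T @ train / n               # sample covariance (zero mean assumed), rank at most n
S_shrunk = shrink_covariance(S, rho=0.3)

s = np.ones(p) / np.sqrt(p)           # known unit-norm signal (illustrative)
x = rng.standard_normal(p) + 5.0 * s  # observation containing the signal
stat = amf_statistic(x, s, S_shrunk)
```

With $n < p$ the sample covariance is singular and cannot be inverted, whereas the shrunk matrix has all eigenvalues bounded below by `rho * trace(S) / p` and keeps the same trace.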

Batch Size · Optimizer · SGD · Adaptive learning · Learned

November 19, 2021

Adaptive Learning of the Optimal Batch Size of SGD

Motasem Alfarra,Slavomir Hanzely,Alyazeed Albasyoni,Bernard Ghanem,Peter Richtarik

from arxiv, Accepted to the 12th Annual Workshop on Optimization for Machine Learning (OPT2020)

Recent advances in the theoretical understanding of SGD led to a formula for the optimal batch size minimizing the number of effective data passes, i.e., the number of iterations times the batch size. However, this formula is of no practical value as it depends on the knowledge of the variance of the stochastic gradients evaluated at the optimum. In this paper we design a practical SGD method capable of learning the optimal batch size adaptively throughout its iterations for strongly convex and smooth functions. Our method does this provably, and in our experiments with synthetic and real data robustly exhibits nearly optimal behaviour; that is, it works as if the optimal batch size were known a priori. Further, we generalize our method to several new batch strategies not considered in the literature before, including a sampling suitable for distributed implementations.

Loss function (machine learning) · Functional · Loss · Precision/accuracy · Hyperparameter

October 23, 2020

A Ranking-based, Balanced Loss Function Unifying Classification and Localisation in Object Detection

Kemal Oksuz,Baris Can Cam,Emre Akbas,Sinan Kalkan

from arxiv, To appear in NeurIPS 2020 as spotlight

We propose average Localisation-Recall-Precision (aLRP), a unified, bounded, balanced and ranking-based loss function for both classification and localisation tasks in object detection. aLRP extends the Localisation-Recall-Precision (LRP) performance metric (Oksuz et al., 2018), inspired by how Average Precision (AP) Loss extends precision to a ranking-based loss function for classification (Chen et al., 2020). aLRP has the following distinct advantages: (i) aLRP is the first ranking-based loss function for both classification and localisation tasks. (ii) Thanks to using ranking for both tasks, aLRP naturally enforces high-quality localisation for high-precision classification. (iii) aLRP provides provable balance between positives and negatives. (iv) Compared to on average $\sim$6 hyperparameters in the loss functions of state-of-the-art detectors, aLRP Loss has only one hyperparameter, which we did not tune in practice. On the COCO dataset, aLRP Loss improves on its ranking-based predecessor, AP Loss, by up to around $5$ AP points, achieves $48.9$ AP without test time augmentation and outperforms all one-stage detectors. Code available at: //github.com/kemaloksuz/aLRPLoss