Statistical divergences (SDs), which quantify the dissimilarity between probability distributions, are a basic constituent of statistical inference and machine learning. A modern method for estimating those divergences relies on parametrizing an empirical variational form by a neural network (NN) and optimizing over parameter space. Such neural estimators are abundantly used in practice, but corresponding performance guarantees are partial and call for further exploration. In particular, there is a fundamental tradeoff between the two sources of error involved: approximation and empirical estimation. While the former needs the NN class to be rich and expressive, the latter relies on controlling complexity. We explore this tradeoff for an estimator based on a shallow NN by means of non-asymptotic error bounds, focusing on four popular $\mathsf{f}$-divergences -- Kullback-Leibler, chi-squared, squared Hellinger, and total variation. Our analysis relies on non-asymptotic function approximation theorems and tools from empirical process theory. The bounds reveal the tension between the NN size and the number of samples, and make it possible to characterize the scaling rates of both that ensure consistency. For compactly supported distributions, we further show that neural estimators with a slightly different NN growth-rate are near minimax rate-optimal, achieving the parametric convergence rate up to logarithmic factors.
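To make "optimizing an empirical variational form over parameter space" concrete, here is a minimal sketch for the Kullback-Leibler case. It assumes the Donsker-Varadhan form $\sup_f \mathbb{E}_P[f] - \log \mathbb{E}_Q[e^f]$, which may differ from the variational form the paper analyzes, and it replaces the shallow NN with a hypothetical one-parameter critic family $f_a(t) = a t$ searched over a grid. The distributions, sample size, and function names are illustrative assumptions, not taken from the paper.

```python
import numpy as np

def dv_objective(f_x, f_y):
    """Empirical Donsker-Varadhan objective: mean f(X) - log mean exp(f(Y))."""
    return f_x.mean() - np.log(np.mean(np.exp(f_y)))

def kl_estimate_linear_critic(x, y, slopes):
    """Maximize the empirical objective over critics f_a(t) = a * t by grid search.

    A stand-in for training a neural network: the 'parameter space' here is
    just the grid of slopes.
    """
    values = [dv_objective(a * x, a * y) for a in slopes]
    best = int(np.argmax(values))
    return values[best], slopes[best]

rng = np.random.default_rng(0)
n = 200_000
x = rng.normal(1.0, 1.0, n)  # samples from P = N(1, 1) (illustrative choice)
y = rng.normal(0.0, 1.0, n)  # samples from Q = N(0, 1) (illustrative choice)

slopes = np.linspace(0.0, 2.0, 41)
kl_hat, a_hat = kl_estimate_linear_critic(x, y, slopes)
print(f"estimated KL: {kl_hat:.3f}, best slope: {a_hat:.2f}")
```

For this Gaussian pair the population objective equals $a - a^2/2$, so the maximizer is $a = 1$ and the true divergence is $0.5$. A richer critic class reduces approximation error for harder distribution pairs but needs more samples to control estimation error, which is the tradeoff the abstract describes.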

Related content

Estimation / Estimator

Estimation/Estimator · Variance · Biased · Statistic · Linear

November 30, 2021

Martingale product estimators for sensitivity analysis in computational statistical physics

Petr Plechac,Gabriel Stoltz,Ting Wang

from arxiv, 34 pages, 4 figures

We introduce a new class of estimators for the linear response of steady states of stochastic dynamics. We generalize the likelihood ratio approach and formulate the linear response as a product of two martingales, hence the name "martingale product estimators". We present a systematic derivation of the martingale product estimator, and show how to construct such an estimator so that its bias is consistent with the weak order of the numerical scheme that approximates the underlying stochastic differential equation. Motivated by the estimation of transport properties in molecular systems, we present a rigorous numerical analysis of the bias and variance for these new estimators in the case of Langevin dynamics. We prove that the variance is uniformly bounded in time and derive a specific form of the estimator for second-order splitting schemes for Langevin dynamics. For comparison, we also study the bias and variance of a Green-Kubo estimator, motivated, in part, by its variance growing linearly in time. The presented analysis shows that the new martingale product estimators, having uniformly bounded variance in time, offer a competitive alternative to the traditional Green-Kubo estimator. On illustrative numerical tests, we compare the new estimators with results obtained by the Green-Kubo method.

Estimation/Estimator · Comprehensibility · Processing (programming language) · MoDELS · Integration

November 30, 2021

Analysis-aware defeaturing: problem setting and a posteriori estimation

Annalisa Buffa,Ondine Chanon,Rafael Vázquez

from arxiv, 37 pages, 15 figures, 4 tables

Defeaturing consists in simplifying geometrical models by removing the geometrical features that are considered not relevant for a given simulation. Feature removal and simplification of computer-aided design models enables faster simulations for engineering analysis problems, and simplifies the meshing problem that is otherwise often unfeasible. The effects of defeaturing on the analysis are then neglected and, as of today, there are very few strategies to quantitatively evaluate such an impact. A good understanding of the effects of this process is an important step for automatic integration of design and analysis. We formalize the process of defeaturing by understanding its effect on the solution of the Poisson equation defined on the geometrical model of interest containing a single feature, with Neumann boundary conditions on the feature itself. We derive an a posteriori estimator of the energy error between the solutions of the exact and the defeatured geometries in $\mathbb{R}^n$, $n\in\{2,3\}$, that is simple, reliable and efficient up to oscillations. The dependence of the estimator upon the size of the features is explicit.

Estimation/Estimator · Linear · Functional · Transform · Identifiable

November 29, 2021

Linear functional estimation under multiplicative measurement errors

Sergio Brenner Miguel,Fabienne Comte,Jan Johannes

from arxiv, 25 pages

We study the non-parametric estimation of the value $\theta(f)$ of a linear functional evaluated at an unknown density function $f$ with support on $\mathbb{R}_+$ based on an i.i.d. sample with multiplicative measurement errors. The proposed estimation procedure combines the estimation of the Mellin transform of the density $f$ and a regularisation of the inverse of the Mellin transform by a spectral cut-off. In order to bound the mean squared error we distinguish several scenarios characterised through different decays of the upcoming Mellin transforms and the smoothness of the linear functional. In fact, we identify scenarios where a non-trivial choice of the upcoming tuning parameter is necessary and propose a data-driven choice based on a Goldenshluger-Lepski method. Additionally, we show minimax-optimality of the estimator over Mellin-Sobolev spaces.

Estimation/Estimator · Maximum a posteriori estimation · Maximum a posteriori · Statistic · Functional

November 29, 2021

Γ-convergence of Onsager-Machlup functionals. Part I: With applications to maximum a posteriori estimation in Bayesian inverse problems

Birzhan Ayanbayev,Ilja Klebanov,Han Cheng Lie,T. J. Sullivan

from arxiv, 30 pages

The Bayesian solution to a statistical inverse problem can be summarised by a mode of the posterior distribution, i.e. a MAP estimator. The MAP estimator essentially coincides with the (regularised) variational solution to the inverse problem, seen as minimisation of the Onsager-Machlup functional of the posterior measure. An open problem in the stability analysis of inverse problems is to establish a relationship between the convergence properties of solutions obtained by the variational approach and by the Bayesian approach. To address this problem, we propose a general convergence theory for modes that is based on the $\Gamma$-convergence of Onsager-Machlup functionals, and apply this theory to Bayesian inverse problems with Gaussian and edge-preserving Besov priors. Part II of this paper considers more general prior distributions.

Continuity · Estimation/Estimator · Statistic · Maximum likelihood · Maximum likelihood estimation

November 27, 2021

Nonparametric estimation of continuous DPPs with kernel methods

Michaël Fanuel,Rémi Bardenet

from arxiv, 26 pages, 7 figures. To appear at NeurIPS 2021

Determinantal point processes (DPPs) are statistical models for repulsive point patterns. Both sampling and inference are tractable for DPPs, a rare feature among models with negative dependence that explains their popularity in machine learning and spatial statistics. Parametric and nonparametric inference methods have been proposed in the finite case, i.e. when the point patterns live in a finite ground set. In the continuous case, only parametric methods have been investigated, while nonparametric maximum likelihood for DPPs -- an optimization problem over trace-class operators -- has remained an open question. In this paper, we show that a restricted version of this maximum likelihood (MLE) problem falls within the scope of a recent representer theorem for nonnegative functions in an RKHS. This leads to a finite-dimensional problem, with strong statistical ties to the original MLE. Moreover, we propose, analyze, and demonstrate a fixed point algorithm to solve this finite-dimensional problem. Finally, we also provide a controlled estimate of the correlation kernel of the DPP, thus providing more interpretability.

Estimation/Estimator · Statistic · MoDELS · Inference · Visual identity system

November 25, 2021

Variational Gibbs inference for statistical model estimation from incomplete data

Vaidotas Simkus,Benjamin Rhodes,Michael U. Gutmann

Statistical models are central to machine learning with broad applicability across a range of downstream tasks. The models are typically controlled by free parameters that are estimated from data by maximum-likelihood estimation. However, when faced with real-world datasets many of the models run into a critical issue: they are formulated in terms of fully-observed data, whereas in practice the datasets are plagued with missing data. The theory of statistical model estimation from incomplete data is conceptually similar to the estimation of latent-variable models, where powerful tools such as variational inference (VI) exist. However, in contrast to standard latent-variable models, parameter estimation with incomplete data often requires estimating exponentially-many conditional distributions of the missing variables, hence making standard VI methods intractable. We address this gap by introducing variational Gibbs inference (VGI), a new general-purpose method to estimate the parameters of statistical models from incomplete data. We validate VGI on a set of synthetic and real-world estimation tasks, estimating important machine learning models, VAEs and normalising flows, from incomplete data. The proposed method, whilst general-purpose, achieves competitive or better performance than existing model-specific estimation methods.

Differential entropy · Estimation/Estimator · Continuity · INFORMS · Probability density function

November 24, 2021

On the Estimation of Information Measures of Continuous Distributions

Georg Pichler,Pablo Piantanida,Günther Koliander

from arxiv, 20 pages

The estimation of information measures of continuous distributions based on samples is a fundamental problem in statistics and machine learning. In this paper, we analyze estimates of differential entropy in $K$-dimensional Euclidean space, computed from a finite number of samples, when the probability density function belongs to a predetermined convex family $\mathcal{P}$. First, estimating differential entropy to any accuracy is shown to be infeasible if the differential entropy of densities in $\mathcal{P}$ is unbounded, clearly showing the necessity of additional assumptions. Subsequently, we investigate sufficient conditions that enable confidence bounds for the estimation of differential entropy. In particular, we provide confidence bounds for simple histogram based estimation of differential entropy from a fixed number of samples, assuming that the probability density function is Lipschitz continuous with known Lipschitz constant and known, bounded support. Our focus is on differential entropy, but we provide examples that show that similar results hold for mutual information and relative entropy as well.
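As a rough illustration of the "simple histogram based estimation" mentioned above, the following sketch computes a plug-in histogram estimate of differential entropy in one dimension on a known bounded support. It produces only a point estimate and does not implement the paper's confidence bounds. The function name, bin count, and test distribution are assumptions chosen for illustration.

```python
import numpy as np

def histogram_entropy(samples, low, high, bins):
    """Plug-in histogram estimate of differential entropy (in nats) on [low, high].

    The density is approximated as piecewise constant: p_i / width in bin i,
    where p_i is the fraction of samples in that bin.
    """
    counts, edges = np.histogram(samples, bins=bins, range=(low, high))
    width = edges[1] - edges[0]
    p = counts / counts.sum()
    p = p[p > 0]  # empty bins contribute zero to the sum
    return -np.sum(p * np.log(p / width))

rng = np.random.default_rng(0)
samples = rng.uniform(0.0, 2.0, 100_000)  # Uniform(0, 2), true entropy log(2)
h_hat = histogram_entropy(samples, 0.0, 2.0, bins=50)
print(f"estimate: {h_hat:.4f}, true value: {np.log(2.0):.4f}")
```

The bin count controls a bias-variance tradeoff: finer bins track the density more closely but leave fewer samples per bin.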

Mutually independent · Approximation · INFORMS · Minimizer · Tractable

November 24, 2021

On the Exponential Approximation of Type II Error Probability of Distributed Test of Independence

Sebastian Espinosa,Jorge F. Silva,Pablo Piantanida

This paper studies the distributed binary test of statistical independence under communication (information bits) constraints. While testing independence is very relevant in various applications, distributed independence testing is particularly useful for event detection in sensor networks where data correlation often occurs among observations of devices in the presence of a signal of interest. By focusing on the case of two devices because of its tractability, we begin by investigating conditions on Type I error probability restrictions under which the minimum Type II error admits an exponential behavior with the sample size. Then, we study the finite sample-size regime of this problem. We derive new upper and lower bounds for the gap between the minimum Type II error and its exponential approximation under different setups, including restrictions imposed on the vanishing Type I error probability. Our theoretical results shed light on the sample-size regimes at which approximations of the Type II error probability via error exponents become informative enough in the sense of predicting well the actual error probability. We finally discuss an application of our results where the gap is evaluated numerically, and we show that exponential approximations are not only tractable but also a valuable proxy for the Type II probability of error in the finite-length regime.

Overfitting · SimPLe · Principle · Model evaluation · Statistic

March 16, 2021

Deep learning: a statistical viewpoint

Peter L. Bartlett,Andrea Montanari,Alexander Rakhlin

The remarkable practical success of deep learning has revealed some major surprises from a theoretical perspective. In particular, simple gradient methods easily find near-optimal solutions to non-convex optimization problems, and despite giving a near-perfect fit to training data without any explicit effort to control model complexity, these methods exhibit excellent predictive accuracy. We conjecture that specific principles underlie these phenomena: that overparametrization allows gradient methods to find interpolating solutions, that these methods implicitly impose regularization, and that overparametrization leads to benign overfitting. We survey recent theoretical progress that provides examples illustrating these principles in simpler settings. We first review classical uniform convergence results and why they fall short of explaining aspects of the behavior of deep learning methods. We give examples of implicit regularization in simple settings, where gradient methods lead to minimal norm functions that perfectly fit the training data. Then we review prediction methods that exhibit benign overfitting, focusing on regression problems with quadratic loss. For these methods, we can decompose the prediction rule into a simple component that is useful for prediction and a spiky component that is useful for overfitting but, in a favorable setting, does not harm prediction accuracy. We focus specifically on the linear regime for neural networks, where the network can be approximated by a linear model. In this regime, we demonstrate the success of gradient flow, and we consider benign overfitting with two-layer networks, giving an exact asymptotic analysis that precisely demonstrates the impact of overparametrization. We conclude by highlighting the key challenges that arise in extending these insights to realistic deep learning settings.

Likelihood · Estimation/Estimator · Maximum likelihood estimation · Maximum likelihood · MoDELS

September 24, 2018

Implicit Maximum Likelihood Estimation

Ke Li,Jitendra Malik

from arxiv, 21 pages, 4 figures. In the interest of promoting discussion, we make the reviews available at //people.eecs.berkeley.edu/~ke.li/papers/imle_reviews.pdf

Implicit probabilistic models are models defined naturally in terms of a sampling procedure and often induce a likelihood function that cannot be expressed explicitly. We develop a simple method for estimating parameters in implicit models that does not require knowledge of the form of the likelihood function or any derived quantities, but can be shown to be equivalent to maximizing likelihood under some conditions. Our result holds in the non-asymptotic parametric setting, where both the capacity of the model and the number of data examples are finite. We also demonstrate encouraging experimental results.