宁毅静平公主小说免费阅读,亚洲日本午夜一区二区三区,欧美一级免费大片

Mixtures of product distributions are a powerful device for learning about heterogeneity within data populations. In this class of latent structure models, de Finetti's mixing measure plays the central role for describing the uncertainty about the latent parameters representing heterogeneity. In this paper posterior contraction theorems for de Finetti's mixing measure arising from finite mixtures of product distributions will be established, under the setting the number of exchangeable sequences of observed variables increases while sequence length(s) may be either fixed or varied. The role of both the number of sequences and the sequence lengths will be carefully examined. In order to obtain concrete rates of convergence, a first-order identifiability theory for finite mixture models and a family of sharp inverse bounds for mixtures of product distributions will be developed via a harmonic analysis of such latent structure models. This theory is applicable to broad classes of probability kernels composing the mixture model of product distributions for both continuous and discrete domain $\mathfrak{X}$. Examples of interest include the case the probability kernel is only weakly identifiable in the sense of Ho and Nguyen (2016), the case where the kernel is itself a mixture distribution as in hierarchical models, and the case the kernel may not have a density with respect to a dominating measure on an abstract domain $\mathfrak{X}$ such as Dirichlet processes.

相關內容

可交換的

關注 0

正則化項 · 置信度 · 估計/估計量 · 可辨認的 · 直徑 ·

2021 年 11 月 17 日

On Adaptive Confidence Sets for the Wasserstein Distances

Neil Deo,Thibault Randrianarisoa

from arxiv, 37 pages, 3 appendices

In the density estimation model, we investigate the problem of constructing adaptive honest confidence sets with radius measured in Wasserstein distance $W_p$, $p\geq1$, and for densities with unknown regularity measured on a Besov scale. As sampling domains, we focus on the $d-$dimensional torus $\mathbb{T}^d$, in which case $1\leq p\leq 2$, and $\mathbb{R}^d$, for which $p=1$. We identify necessary and sufficient conditions for the existence of adaptive confidence sets with diameters of the order of the regularity-dependent $W_p$-minimax estimation rate. Interestingly, it appears that the possibility of such adaptation of the diameter depends on the dimension of the underlying space. In low dimensions, $d\leq 4$, adaptation to any regularity is possible. In higher dimensions, adaptation is possible if and only if the underlying regularities belong to some interval of width at least $d/(d-4)$. This contrasts with the usual $L_p-$theory where, independently of the dimension, adaptation requires regularities to lie in a small fixed-width window. For configurations allowing these adaptive sets to exist, we explicitly construct confidence regions via the method of risk estimation, centred at adaptive estimators. Those are the first results in a statistical approach to adaptive uncertainty quantification with Wasserstein distances. Our analysis and methods extend more globally to weak losses such as Sobolev norm distances with negative smoothness indices.

隨機森林 · 子采樣 · 規范化的 · 監督學習 · Performer ·

2021 年 11 月 16 日

Asymptotic Distributions and Rates of Convergence for Random Forests via Generalized U-statistics

Wei Peng,Tim Coleman,Lucas Mentch

from arxiv, 76 pages, 7 figure

Random forests remain among the most popular off-the-shelf supervised learning algorithms. Despite their well-documented empirical success, however, until recently, few theoretical results were available to describe their performance and behavior. In this work we push beyond recent work on consistency and asymptotic normality by establishing rates of convergence for random forests and other supervised learning ensembles. We develop the notion of generalized U-statistics and show that within this framework, random forest predictions can potentially remain asymptotically normal for larger subsample sizes than previously established. We also provide Berry-Esseen bounds in order to quantify the rate at which this convergence occurs, making explicit the roles of the subsample size and the number of trees in determining the distribution of random forest predictions.

Continuity · 二階導數 · 泛函 · 近似 · 數值分析 ·

2021 年 11 月 16 日

The convergence of the Regula Falsi method

Trung Nguyen

Regula Falsi, or the method of false position, is a numerical method for finding an approximate solution to f(x) = 0 on a finite interval [a, b], where f is a real-valued continuous function on [a, b] and satisfies f(a)f(b) < 0. Previous studies proved the convergence of this method under certain assumptions about the function f, such as both the first and second derivatives of f do not change the sign on the interval [a, b]. In this paper, we remove those assumptions and prove the convergence of the method for all continuous functions.

可辨認的 · 估計/估計量 · CASE · MoDELS · 高斯混合（模型） ·

2021 年 11 月 16 日

Sharp Analysis of Expectation-Maximization for Weakly Identifiable Models

Raaz Dwivedi,Nhat Ho,Koulik Khamaru,Martin J. Wainwright,Michael I. Jordan,Bin Yu

from arxiv, 30 pages, 4 figures. The first three authors contributed equally to this work. To appear in AISTATS 2020

We study a class of weakly identifiable location-scale mixture models for which the maximum likelihood estimates based on $n$ i.i.d. samples are known to have lower accuracy than the classical $n^{- \frac{1}{2}}$ error. We investigate whether the Expectation-Maximization (EM) algorithm also converges slowly for these models. We provide a rigorous characterization of EM for fitting a weakly identifiable Gaussian mixture in a univariate setting where we prove that the EM algorithm converges in order $n^{\frac{3}{4}}$ steps and returns estimates that are at a Euclidean distance of order ${ n^{- \frac{1}{8}}}$ and ${ n^{-\frac{1} {4}}}$ from the true location and scale parameter respectively. Establishing the slow rates in the univariate setting requires a novel localization argument with two stages, with each stage involving an epoch-based argument applied to a different surrogate EM operator at the population level. We demonstrate several multivariate ($d \geq 2$) examples that exhibit the same slow rates as the univariate case. We also prove slow statistical rates in higher dimensions in a special case, when the fitted covariance is constrained to be a multiple of the identity.

概率密度函數 · 泛函 · 近似 · CC · 可約的 ·

2021 年 11 月 13 日

Computing f-Divergences and Distances of High-Dimensional Probability Density Functions -- Low-Rank Tensor Approximations

Alexander Litvinenko,Youssef Marzouk,Hermann G. Matthies,Marco Scavino,Alessio Spantini

from arxiv, 53 pages, 1 Diagram, 2 Figures, 9 Tables

Very often, in the course of uncertainty quantification tasks or data analysis, one has to deal with high-dimensional random variables (RVs). A high-dimensional RV can be described by its probability density (pdf) and/or by the corresponding probability characteristic functions (pcf), or by a polynomial chaos (PCE) or similar expansion. Here the interest is mainly to compute characterisations like the entropy, or relations between two distributions, like their Kullback-Leibler divergence. These are all computed from the pdf, which is often not available directly, and it is a computational challenge to even represent it in a numerically feasible fashion in case the dimension $d$ is even moderately large. In this regard, we propose to represent the density by a high order tensor product, and approximate this in a low-rank format. We show how to go from the pcf or functional representation to the pdf. This allows us to reduce the computational complexity and storage cost from an exponential to a linear. The characterisations such as entropy or the $f$-divergences need the possibility to compute point-wise functions of the pdf. This normally rather trivial task becomes more difficult when the pdf is approximated in a low-rank tensor format, as the point values are not directly accessible any more. The data is considered as an element of a high order tensor space. The considered algorithms are independent of the representation of the data as a tensor. All that we require is that the data can be considered as an element of an associative, commutative algebra with an inner product. Such an algebra is isomorphic to a commutative sub-algebra of the usual matrix algebra, allowing the use of matrix algorithms to accomplish the mentioned tasks.

向量化 · 均值 · 高斯分布 · 正則化項 · CASE ·

2021 年 11 月 13 日

Multi-Normex Distributions for the Sum of Random Vectors. Rates of Convergence

Marie Kratz,Evgeny Prokopenko

We build a sharp approximation of the whole distribution of the sum of iid heavy-tailed random vectors, combining mean and extreme behaviors. It extends the so-called 'normex' approach from a univariate to a multivariate framework. We propose two possible multi-normex distributions, named $d$-Normex and MRV-Normex. Both rely on the Gaussian distribution for describing the mean behavior, via the CLT, while the difference between the two versions comes from using the exact distribution or the EV theorem for the maximum. The main theorems provide the rate of convergence for each version of the multi-normex distributions towards the distribution of the sum, assuming second order regular variation property for the norm of the parent random vector when considering the MRV-normex case. Numerical illustrations and comparisons are proposed with various dependence structures on the parent random vector, using QQ-plots based on geometrical quantiles.

平滑 · 頻率主義學派 · 規范化的 · MoDELS · 貝葉斯估計 ·

2021 年 11 月 12 日

Wasserstein convergence in Bayesian deconvolution models

Judith Rousseau,Catia Scricciolo

We study the reknown deconvolution problem of recovering a distribution function from independent replicates (signal) additively contaminated with random errors (noise), whose distribution is known. We investigate whether a Bayesian nonparametric approach for modelling the latent distribution of the signal can yield inferences with asymptotic frequentist validity under the $L^1$-Wasserstein metric. When the error density is ordinary smooth, we develop two inversion inequalities relating either the $L^1$ or the $L^1$-Wasserstein distance between two mixture densities (of the observations) to the $L^1$-Wasserstein distance between the corresponding distributions of the signal. This smoothing inequality improves on those in the literature. We apply this general result to a Bayesian approach bayes on a Dirichlet process mixture of normal distributions as a prior on the mixing distribution (or distribution of the signal), with a Laplace or Linnik noise. In particular we construct an \textit{adaptive} approximation of the density of the observations by the convolution of a Laplace (or Linnik) with a well chosen mixture of normal densities and show that the posterior concentrates at the minimax rate up to a logarithmic factor. The same prior law is shown to also adapt to the Sobolev regularity level of the mixing density, thus leading to a new Bayesian estimation method, relative to the Wasserstein distance, for distributions with smooth densities.

測試誤差 · 有偏 · Networking · Neural Networks · 學成 ·

2021 年 11 月 12 日

A Random Matrix Perspective on Mixtures of Nonlinearities for Deep Learning

Ben Adlam,Jake Levinson,Jeffrey Pennington

One of the distinguishing characteristics of modern deep learning systems is that they typically employ neural network architectures that utilize enormous numbers of parameters, often in the millions and sometimes even in the billions. While this paradigm has inspired significant research on the properties of large networks, relatively little work has been devoted to the fact that these networks are often used to model large complex datasets, which may themselves contain millions or even billions of constraints. In this work, we focus on this high-dimensional regime in which both the dataset size and the number of features tend to infinity. We analyze the performance of random feature regression with features $F=f(WX+B)$ for a random weight matrix $W$ and random bias vector $B$, obtaining exact formulae for the asymptotic training and test errors for data generated by a linear teacher model. The role of the bias can be understood as parameterizing a distribution over activation functions, and our analysis directly generalizes to such distributions, even those not expressible with a traditional additive bias. Intriguingly, we find that a mixture of nonlinearities can improve both the training and test errors over the best single nonlinearity, suggesting that mixtures of nonlinearities might be useful for approximate kernel methods or neural network architecture design.

變分自編碼 · 高斯分布 · 潛在 · MoDELS · 規范化的 ·

2018 年 9 月 26 日

Hyperspherical Variational Auto-Encoders

Tim R. Davidson,Luca Falorsi,Nicola De Cao,Thomas Kipf,Jakub M. Tomczak

from arxiv, GitHub repository: //github.com/nicola-decao/s-vae-tf, Blogpost: //nicola-decao.github.io/s-vae

The Variational Auto-Encoder (VAE) is one of the most used unsupervised machine learning models. But although the default choice of a Gaussian distribution for both the prior and posterior represents a mathematically convenient distribution often leading to competitive results, we show that this parameterization fails to model data with a latent hyperspherical structure. To address this issue we propose using a von Mises-Fisher (vMF) distribution instead, leading to a hyperspherical latent space. Through a series of experiments we show how such a hyperspherical VAE, or $\mathcal{S}$-VAE, is more suitable for capturing data with a hyperspherical latent structure, while outperforming a normal, $\mathcal{N}$-VAE, in low dimensions on other data types.

MoDELS · SimPLe · CC · 模型評估 · 高斯混合（模型） ·

2018 年 2 月 24 日

The Search Problem in Mixture Models

Avik Ray,Joe Neeman,Sujay Sanghavi,Sanjay Shakkottai

We consider the task of learning the parameters of a {\em single} component of a mixture model, for the case when we are given {\em side information} about that component, we call this the "search problem" in mixture models. We would like to solve this with computational and sample complexity lower than solving the overall original problem, where one learns parameters of all components. Our main contributions are the development of a simple but general model for the notion of side information, and a corresponding simple matrix-based algorithm for solving the search problem in this general setting. We then specialize this model and algorithm to four common scenarios: Gaussian mixture models, LDA topic models, subspace clustering, and mixed linear regression. For each one of these we show that if (and only if) the side information is informative, we obtain parameter estimates with greater accuracy, and also improved computation complexity than existing moment based mixture model algorithms (e.g. tensor methods). We also illustrate several natural ways one can obtain such side information, for specific problem instances. Our experiments on real data sets (NY Times, Yelp, BSDS500) further demonstrate the practicality of our algorithms showing significant improvement in runtime and accuracy.