
Discrepancy measures between probability distributions, often termed statistical distances, are ubiquitous in probability theory, statistics and machine learning. To combat the curse of dimensionality when estimating these distances from data, recent work has proposed smoothing out local irregularities in the measured distributions via convolution with a Gaussian kernel. Motivated by the scalability of this framework to high dimensions, we investigate the structural and statistical behavior of the Gaussian-smoothed $p$-Wasserstein distance $\mathsf{W}_p^{(\sigma)}$, for arbitrary $p\geq 1$. After establishing basic metric and topological properties of $\mathsf{W}_p^{(\sigma)}$, we explore the asymptotic statistical behavior of $\mathsf{W}_p^{(\sigma)}(\hat{\mu}_n,\mu)$, where $\hat{\mu}_n$ is the empirical distribution of $n$ independent observations from $\mu$. We prove that $\mathsf{W}_p^{(\sigma)}$ enjoys a parametric empirical convergence rate of $n^{-1/2}$, which contrasts with the $n^{-1/d}$ rate for unsmoothed $\mathsf{W}_p$ when $d \geq 3$. Our proof relies on controlling $\mathsf{W}_p^{(\sigma)}$ by a $p$th-order smooth Sobolev distance $\mathsf{d}_p^{(\sigma)}$ and deriving the limit distribution of $\sqrt{n}\,\mathsf{d}_p^{(\sigma)}(\hat{\mu}_n,\mu)$, for all dimensions $d$. As applications, we provide asymptotic guarantees for two-sample testing and minimum distance estimation using $\mathsf{W}_p^{(\sigma)}$, with experiments for $p=2$ using a maximum mean discrepancy formulation of $\mathsf{d}_2^{(\sigma)}$.
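As a minimal numerical illustration of the smoothed distance (my own toy setup, not the paper's experiments): convolving a measure with $\mathcal{N}(0,\sigma^2)$ amounts to adding independent Gaussian noise to its samples, and in one dimension $\mathsf{W}_p$ can be approximated via empirical quantile functions. The large reference sample below stands in for the population $\mu$, so the printed values should roughly display the $n^{-1/2}$ decay.

```python
import numpy as np

def wasserstein_p_1d(x, y, p=2, grid=2000):
    """Approximate p-Wasserstein distance between two 1-D samples via
    empirical quantile functions evaluated on a uniform grid in (0, 1)."""
    q = (np.arange(grid) + 0.5) / grid
    return (np.mean(np.abs(np.quantile(x, q) - np.quantile(y, q)) ** p)) ** (1.0 / p)

rng = np.random.default_rng(0)
sigma, p = 1.0, 2
reference = rng.standard_normal(200_000)           # large proxy sample for the population mu

for n in [100, 1_000, 10_000]:
    sample = rng.standard_normal(n)                # sample defining the empirical measure
    # Smoothing with N(0, sigma^2) is simulated by adding independent Gaussian noise.
    smoothed_sample = sample + sigma * rng.standard_normal(n)
    smoothed_ref = reference + sigma * rng.standard_normal(reference.size)
    print(n, wasserstein_p_1d(smoothed_sample, smoothed_ref, p))
```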

Related Content

We propose approximate gradient ascent algorithms for the risk-sensitive reinforcement learning control problem in on-policy as well as off-policy settings. We consider episodic Markov decision processes, and model the risk using the distortion risk measure (DRM) of the cumulative discounted reward. Our algorithms estimate the DRM using order statistics of the cumulative rewards, and calculate approximate gradients from the DRM estimates using a smoothed functional-based gradient estimation scheme. We derive non-asymptotic bounds that establish the convergence of our proposed algorithms to an approximate stationary point of the DRM objective.
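A hedged sketch of the two ingredients the abstract names, an order-statistics DRM estimate and a smoothed-functional (random-perturbation) gradient estimate; the distortion function, the return simulator, and all constants below are illustrative placeholders rather than the paper's algorithm.

```python
import numpy as np

def drm_estimate(rewards, g):
    """Distortion risk measure estimate from order statistics of nonnegative
    cumulative rewards: sum_i X_(i) * [g((m-i+1)/m) - g((m-i)/m)]."""
    x = np.sort(np.asarray(rewards))
    m = len(x)
    k = np.arange(1, m + 1)
    weights = g((m - k + 1) / m) - g((m - k) / m)
    return float(np.dot(x, weights))

def sf_gradient(theta, simulate_returns, g, delta=0.1, m=500, rng=None):
    """Two-point smoothed-functional gradient estimate of the DRM objective:
    perturb theta along a random unit direction u and difference two estimates."""
    rng = np.random.default_rng() if rng is None else rng
    u = rng.standard_normal(theta.shape)
    u /= np.linalg.norm(u)
    rho_plus = drm_estimate(simulate_returns(theta + delta * u, m), g)
    rho_minus = drm_estimate(simulate_returns(theta - delta * u, m), g)
    return theta.size * (rho_plus - rho_minus) / (2 * delta) * u

# Placeholder problem: exponential returns whose scale depends on theta,
# with a CVaR-type distortion that weights only the upper tail.
g = lambda s: np.minimum(s / 0.3, 1.0)
simulate = lambda th, m: np.random.default_rng(1).exponential(1 + th.sum() ** 2, size=m)
theta = np.zeros(3)
print(sf_gradient(theta, simulate, g))
```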

Let $\mathcal{G}$ be a directed graph with vertices $1,2,\ldots, 2N$. Let $\mathcal{T}=(T_{i,j})_{(i,j)\in\mathcal{G}}$ be a family of contractive similitudes. For every $1\leq i\leq N$, let $i^+:=i+N$. For $1\leq i,j\leq N$, we define $\mathcal{M}_{i,j}=\{(i,j),(i,j^+),(i^+,j),(i^+,j^+)\}\cap\mathcal{G}$. We assume that $T_{\widetilde{i},\widetilde{j}}=T_{i,j}$ for every $(\widetilde{i},\widetilde{j})\in \mathcal{M}_{i,j}$. Let $K$ denote the Mauldin-Williams fractal determined by $\mathcal{T}$. Let $\chi=(\chi_i)_{i=1}^{2N}$ be a positive probability vector and $P$ a row-stochastic matrix which serves as an incidence matrix for $\mathcal{G}$. We denote by $\nu$ the Markov-type measure associated with $\chi$ and $P$. Let $\Omega=\{1,\ldots,2N\}$ and $G_\infty=\{\sigma\in\Omega^{\mathbb{N}}:(\sigma_i,\sigma_{i+1})\in\mathcal{G}, \;i\geq 1\}$. Let $\pi$ be the natural projection from $G_\infty$ to $K$ and $\mu=\nu\circ\pi^{-1}$. We consider the following two cases: 1. $\mathcal{G}$ has two strongly connected components, each consisting of $N$ vertices; 2. $\mathcal{G}$ is strongly connected. Under some assumptions on $\mathcal{G}$ and $\mathcal{T}$, for case 1, we determine the exact value $s_r$ of the quantization dimension $D_r(\mu)$ for $\mu$ and prove that the $s_r$-dimensional lower quantization coefficient is always positive, but the upper one can be infinite; we establish a necessary and sufficient condition for the upper quantization coefficient for $\mu$ to be finite; for case 2, we determine $D_r(\mu)$ in terms of a pressure-like function and prove that the $D_r(\mu)$-dimensional upper and lower quantization coefficients are both positive and finite.
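For readers unfamiliar with the notation, the standard Graf-Luschgy definitions behind the quantities above (recalled here for convenience; they are not part of the abstract itself) read as follows. The $n$-th quantization error of order $r$ for a Borel probability measure $\mu$ on $\mathbb{R}^q$ is $e_{n,r}(\mu)=\inf\big\{\big(\int \min_{a\in\alpha}\|x-a\|^r\,d\mu(x)\big)^{1/r}:\alpha\subset\mathbb{R}^q,\ \mathrm{card}(\alpha)\leq n\big\}$. The upper and lower quantization dimensions of order $r$ are $\overline{D}_r(\mu)=\limsup_{n\to\infty}\frac{\log n}{-\log e_{n,r}(\mu)}$ and $\underline{D}_r(\mu)=\liminf_{n\to\infty}\frac{\log n}{-\log e_{n,r}(\mu)}$; when they coincide, the common value is denoted $D_r(\mu)$. For $s>0$, the $s$-dimensional upper and lower quantization coefficients are $\limsup_{n\to\infty} n^{r/s}\, e_{n,r}^{\,r}(\mu)$ and $\liminf_{n\to\infty} n^{r/s}\, e_{n,r}^{\,r}(\mu)$.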

Min-max optimization problems arise in several key machine learning setups, including adversarial learning and generative modeling. In their general form, in the absence of convexity/concavity assumptions, finding pure equilibria of the underlying two-player zero-sum game is computationally hard [Daskalakis et al., 2021]. In this work we focus instead on finding mixed equilibria, and consider the associated lifted problem in the space of probability measures. By adding entropic regularization, our main result establishes global convergence towards the global equilibrium by using simultaneous gradient ascent-descent with respect to the Wasserstein metric -- a dynamics that admits an efficient particle discretization in high dimensions, as opposed to entropic mirror descent. We complement this positive result with a related entropy-regularized loss which is not bilinear but still convex-concave in the Wasserstein geometry, and for which simultaneous dynamics do not converge yet timescale separation does. Taken together, these results showcase the benign geometry of bilinear games in the space of measures, enabling particle dynamics with global qualitative convergence guarantees.
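A hedged particle sketch in the spirit of the simultaneous Wasserstein gradient ascent-descent described above, on a toy convex-concave (not bilinear) payoff with entropic regularization; the payoff, step size, and noise scale are illustrative choices, not the paper's. Each player is represented by a cloud of particles, and the entropic term is realized as Langevin-style Gaussian noise in the updates.

```python
import numpy as np

rng = np.random.default_rng(0)
d, n, tau, lam, eta, steps = 2, 512, 0.05, 0.5, 0.05, 2000

# Toy convex-concave payoff f(x, y) = <x, y> + (lam/2)|x|^2 - (lam/2)|y|^2.
# The min player controls the law of x, the max player the law of y;
# each law is represented by a cloud of n particles.
X = rng.standard_normal((n, d))
Y = rng.standard_normal((n, d))

for _ in range(steps):
    mx, my = X.mean(axis=0), Y.mean(axis=0)
    grad_X = my + lam * X            # d/dx of E_nu[f(x, y)] at each x-particle
    grad_Y = mx - lam * Y            # d/dy of E_mu[f(x, y)] at each y-particle
    # Simultaneous descent (x) / ascent (y); the Gaussian noise of scale
    # sqrt(2 * tau * eta) implements the entropic term as Langevin diffusion.
    X = X - eta * grad_X + np.sqrt(2 * tau * eta) * rng.standard_normal(X.shape)
    Y = Y + eta * grad_Y + np.sqrt(2 * tau * eta) * rng.standard_normal(Y.shape)

print("x-particle mean:", X.mean(axis=0))   # both means drift toward the equilibrium at 0
print("y-particle mean:", Y.mean(axis=0))
```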

The empirical optimal transport (OT) cost between two probability measures from random data is a fundamental quantity in transport based data analysis. In this work, we derive novel guarantees for its convergence rate when the involved measures are different, possibly supported on different spaces. Our central observation is that the statistical performance of the empirical OT cost is determined by the less complex measure, a phenomenon we refer to as lower complexity adaptation of empirical OT. For instance, under Lipschitz ground costs, we find that the empirical OT cost based on $n$ observations converges at least with rate $n^{-1/d}$ to the population quantity if one of the two measures is concentrated on a $d$-dimensional manifold, while the other can be arbitrary. For semi-concave ground costs, we show that the upper bound for the rate improves to $n^{-2/d}$. Similarly, our theory establishes the general convergence rate $n^{-1/2}$ for semi-discrete OT. All of these results are valid in the two-sample case as well, meaning that the convergence rate is still governed by the simpler of the two measures. On a conceptual level, our findings therefore suggest that the curse of dimensionality only affects the estimation of the OT cost when both measures exhibit a high intrinsic dimension. Our proofs are based on the dual formulation of OT as a maximization over a suitable function class $\mathcal{F}_c$ and the observation that the $c$-transform of $\mathcal{F}_c$ under bounded costs has the same uniform metric entropy as $\mathcal{F}_c$ itself.
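As a small illustration of the empirical quantity under study (my own toy setup, not the paper's analysis): for two samples of equal size with uniform weights, the empirical OT cost reduces to an optimal assignment problem, which scipy's Hungarian solver handles directly. The data below, a curve (roughly one-dimensional) embedded in $\mathbb{R}^{10}$ versus a full-dimensional Gaussian cloud, merely mimic the low-versus-high intrinsic dimension setting.

```python
import numpy as np
from scipy.optimize import linear_sum_assignment
from scipy.spatial.distance import cdist

def empirical_ot_cost(x, y, cost="sqeuclidean"):
    """Empirical OT cost between two equal-size samples with uniform weights;
    with uniform marginals the optimal plan is a permutation (assignment)."""
    C = cdist(x, y, metric=cost)
    rows, cols = linear_sum_assignment(C)
    return C[rows, cols].mean()

rng = np.random.default_rng(0)
d, n = 10, 500
t = rng.uniform(0, 1, size=(n, 1))
# Sample concentrated on a one-dimensional curve embedded in R^d.
x = np.hstack([np.cos(2 * np.pi * t), np.sin(2 * np.pi * t), np.zeros((n, d - 2))])
# Sample from a genuinely d-dimensional measure.
y = rng.standard_normal((n, d))
print("empirical OT cost (squared Euclidean):", empirical_ot_cost(x, y))
```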

This paper studies the approximation capacity of ReLU neural networks with a norm constraint on the weights. We prove upper and lower bounds on the approximation error of these networks for smooth function classes. The lower bound is derived through the Rademacher complexity of neural networks, which may be of independent interest. We apply these approximation bounds to analyze the convergence of regression using norm-constrained neural networks and distribution estimation by GANs. In particular, we obtain convergence rates for over-parameterized neural networks. It is also shown that GANs can achieve the optimal rate of learning probability distributions when the discriminator is a properly chosen norm-constrained neural network.
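As a hedged illustration of the object studied (the specific norm and the rescaling rule are my own choices, not the paper's construction): a two-layer ReLU network trained by SGD while the path-norm-like quantity $\sum_k |a_k|\,\|(w_k,b_k)\|_2$ is kept below a budget $M$.

```python
import numpy as np

def relu(z):
    return np.maximum(z, 0.0)

class NormConstrainedReLUNet:
    """Two-layer ReLU network f(x) = sum_k a_k * relu(w_k . x + b_k) whose
    weight norm sum_k |a_k| * ||(w_k, b_k)||_2 is kept below a budget M by
    rescaling the outer weights (a simple feasibility step, not a projection)."""

    def __init__(self, d, width, M, rng):
        self.W = rng.standard_normal((width, d)) / np.sqrt(d)
        self.b = np.zeros(width)
        self.a = rng.standard_normal(width) / np.sqrt(width)
        self.M = M
        self._rescale()

    def _rescale(self):
        norm = np.sum(np.abs(self.a) * np.sqrt(np.sum(self.W ** 2, axis=1) + self.b ** 2))
        if norm > self.M:
            self.a *= self.M / norm

    def __call__(self, X):
        return relu(X @ self.W.T + self.b) @ self.a

    def sgd_step(self, X, y, lr=1e-2):
        """One least-squares gradient step followed by the norm-rescaling step."""
        H = relu(X @ self.W.T + self.b)                  # hidden activations
        err = H @ self.a - y                             # residuals
        grad_a = H.T @ err / len(y)
        mask = (H > 0).astype(float)
        grad_W = ((mask * err[:, None]) * self.a).T @ X / len(y)
        grad_b = ((mask * err[:, None]) * self.a).sum(axis=0) / len(y)
        self.a -= lr * grad_a
        self.W -= lr * grad_W
        self.b -= lr * grad_b
        self._rescale()

rng = np.random.default_rng(0)
net = NormConstrainedReLUNet(d=1, width=64, M=5.0, rng=rng)
X = rng.uniform(-1, 1, size=(256, 1))
y = np.sin(3 * X[:, 0])                                  # smooth target function
for _ in range(2000):
    net.sgd_step(X, y, lr=0.05)
print("train MSE:", np.mean((net(X) - y) ** 2))
```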

Clustering is an essential task in unsupervised learning. It tries to automatically separate instances into coherent subsets. As one of the most well-known clustering algorithms, k-means assigns sample points at the boundary to a unique cluster, but it does not utilize information about the sample distribution or density. By comparison, it can be more beneficial to consider the probability of each sample belonging to a possible cluster. To this end, this paper generalizes k-means to model the distribution of clusters. Our clustering algorithm thus models the distribution of distances to centroids above a threshold with the Generalized Pareto Distribution (GPD) from Extreme Value Theory (EVT). Specifically, we propose the concept of centroid margin distance, use the GPD to establish a probability model for each cluster, and perform clustering based on the covering probability function derived from the GPD. This GPD k-means thus casts the clustering algorithm in a probabilistic framework. Correspondingly, we also introduce a naive baseline, dubbed Generalized Extreme Value (GEV) k-means. GEV fits the distribution of block maxima, whereas the GPD fits the distribution of distances to the centroid exceeding a sufficiently large threshold, leading to more stable performance for GPD k-means. Notably, GEV k-means can also estimate cluster structure and thus performs reasonably well compared with classical k-means. Extensive experiments on synthetic and real datasets demonstrate that GPD k-means outperforms its competitors. The code is released at //github.com/sixiaozheng/EVT-K-means.
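A hedged sketch of the GPD idea described above (the threshold rule, the scoring rule, and the toy data are my own simplifications, not the released implementation): run ordinary k-means, fit a Generalized Pareto Distribution to each cluster's distance-to-centroid exceedances over a high quantile, and score new points by the resulting tail probability.

```python
import numpy as np
from sklearn.cluster import KMeans
from scipy.stats import genpareto

rng = np.random.default_rng(0)
X = np.vstack([rng.normal(loc=c, scale=0.5, size=(300, 2)) for c in ([0, 0], [4, 0], [0, 4])])

k = 3
km = KMeans(n_clusters=k, n_init=10, random_state=0).fit(X)
centers, labels = km.cluster_centers_, km.labels_

# For each cluster, fit a GPD to distance-to-centroid exceedances over a
# high empirical quantile (the 70% quantile here is an arbitrary choice).
models = []
for j in range(k):
    d = np.linalg.norm(X[labels == j] - centers[j], axis=1)
    u = np.quantile(d, 0.7)
    xi, _, beta = genpareto.fit(d[d > u] - u, floc=0)   # shape xi, scale beta
    models.append((u, xi, beta))

def tail_score(x, j):
    """Probability-style score of point x under cluster j's tail model:
    1 inside the bulk, decaying according to the fitted GPD beyond u."""
    u, xi, beta = models[j]
    dist = np.linalg.norm(x - centers[j])
    return 1.0 if dist <= u else genpareto.sf(dist - u, xi, loc=0, scale=beta)

x_new = np.array([1.0, 0.5])
print([tail_score(x_new, j) for j in range(k)])          # soft, probability-based scores
```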

This paper is devoted to the numerical analysis of a piecewise constant discontinuous Galerkin method for time fractional subdiffusion problems. The regularity of the weak solution is first established using a variational approach and the Mittag-Leffler function. Then several optimal error estimates are derived with low regularity data. Finally, numerical experiments are conducted to verify the theoretical results.
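For a concrete sense of the model problem (this is a standard L1 finite-difference time-stepping sketch for a scalar Caputo subdiffusion equation, explicitly not the piecewise constant DG method analysed in the paper): the scheme below solves $D_t^\alpha u=-\lambda u$, whose exact solution is the Mittag-Leffler function $E_\alpha(-\lambda t^\alpha)$.

```python
import math
import numpy as np

def solve_subdiffusion_l1(alpha, lam, T, N, u0=1.0):
    """Standard L1 time stepping for the scalar Caputo model problem
    D_t^alpha u = -lam * u,  u(0) = u0, on N uniform time steps."""
    dt = T / N
    kappa = dt ** (-alpha) / math.gamma(2 - alpha)
    # L1 weights b_j = (j+1)^{1-alpha} - j^{1-alpha}, j = 0, ..., N-1
    b = np.arange(1, N + 1) ** (1 - alpha) - np.arange(0, N) ** (1 - alpha)
    u = np.empty(N + 1)
    u[0] = u0
    for n in range(1, N + 1):
        # history term: kappa * sum_{j=1}^{n-1} b_j * (u^{n-j} - u^{n-j-1})
        hist = kappa * sum(b[j] * (u[n - j] - u[n - j - 1]) for j in range(1, n))
        u[n] = (kappa * b[0] * u[n - 1] - hist) / (kappa * b[0] + lam)
    return u

def mittag_leffler(alpha, z, terms=80):
    """Truncated series for E_alpha(z), adequate for moderate |z|."""
    return sum(z ** k / math.gamma(alpha * k + 1) for k in range(terms))

alpha, lam, T = 0.5, 1.0, 1.0
u = solve_subdiffusion_l1(alpha, lam, T, N=1000)
print("numerical u(T):              ", u[-1])
print("exact E_alpha(-lam*T^alpha): ", mittag_leffler(alpha, -lam * T ** alpha))
```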

Arising from Optimal Transport, the Wasserstein distance has gained importance in Machine Learning due to its appealing geometrical properties and the increasing availability of efficient approximations. In this work, we consider the problem of estimating the Wasserstein distance between two probability distributions when observations are polluted by outliers. To that end, we investigate how to leverage Medians of Means (MoM) estimators to robustify the estimation of the Wasserstein distance. Exploiting the dual Kantorovich formulation of the Wasserstein distance, we introduce and discuss novel MoM-based robust estimators whose consistency is studied under a data contamination model and for which convergence rates are provided. These MoM estimators make Wasserstein Generative Adversarial Networks (WGANs) robust to outliers, as demonstrated by an empirical study on two benchmarks, CIFAR10 and Fashion MNIST. Finally, we discuss how to combine MoM with the entropy-regularized approximation of the Wasserstein distance and propose a simple MoM-based re-weighting scheme that could be used in conjunction with the Sinkhorn algorithm.
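A hedged one-dimensional sketch of the median-of-means idea applied to Wasserstein estimation (the block scheme, the contamination model, and all constants are illustrative, not the paper's estimators): partition each sample into blocks, compute a plug-in Wasserstein estimate per block pair, and report the median.

```python
import numpy as np

def w1_1d(x, y):
    """Plug-in 1-Wasserstein distance between equal-size 1-D samples."""
    return np.mean(np.abs(np.sort(x) - np.sort(y)))

def mom_w1(x, y, n_blocks=101, rng=None):
    """Median-of-means-style estimate: median of per-block W1 estimates."""
    rng = np.random.default_rng() if rng is None else rng
    xb = np.array_split(rng.permutation(x), n_blocks)
    yb = np.array_split(rng.permutation(y), n_blocks)
    return np.median([w1_1d(a[: min(len(a), len(b))], b[: min(len(a), len(b))])
                      for a, b in zip(xb, yb)])

rng = np.random.default_rng(0)
n = 2000
x = rng.normal(0.0, 1.0, n)
y = rng.normal(0.5, 1.0, n)
y[: n // 50] = 50.0                       # 2% gross outliers contaminate one sample
print("plug-in W1:", w1_1d(x, y))         # inflated by the outliers
print("MoM W1    :", mom_w1(x, y))        # much less affected by the contamination
```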

The statistical finite element method (StatFEM) is an emerging probabilistic method that allows observations of a physical system to be synthesised with the numerical solution of a PDE intended to describe it in a coherent statistical framework, to compensate for model error. This work presents a new theoretical analysis of the statistical finite element method, demonstrating that it has similar convergence properties to the finite element method on which it is based. Our results constitute a bound on the Wasserstein-2 distance between the ideal prior and posterior and the StatFEM approximation thereof, and show that this distance converges at the same mesh-dependent rate as finite element solutions converge to the true solution. Several numerical examples are presented to demonstrate our theory, including an example which tests the robustness of StatFEM when extended to nonlinear quantities of interest.
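Since StatFEM priors and posteriors are Gaussian, the Wasserstein-2 distance appearing in such bounds has a closed form for finite-dimensional Gaussians; the sketch below (a generic toy example, not the paper's infinite-dimensional setting) evaluates it with scipy.

```python
import numpy as np
from scipy.linalg import sqrtm

def w2_gaussians(m1, S1, m2, S2):
    """Closed-form Wasserstein-2 distance between N(m1, S1) and N(m2, S2):
    W2^2 = |m1 - m2|^2 + tr(S1 + S2 - 2 * (S1^{1/2} S2 S1^{1/2})^{1/2})."""
    r1 = sqrtm(S1)
    cross = sqrtm(r1 @ S2 @ r1)
    val = np.sum((m1 - m2) ** 2) + np.trace(S1 + S2 - 2 * np.real(cross))
    return float(np.sqrt(max(val, 0.0)))

# Toy example: two Gaussians on a coarse "nodal" grid, standing in for a
# discretised prior and a perturbation of it.
rng = np.random.default_rng(0)
A = rng.standard_normal((5, 5))
S1 = A @ A.T + np.eye(5)
S2 = S1 + 0.1 * np.eye(5)
m1, m2 = np.zeros(5), 0.2 * np.ones(5)
print("W2 distance:", w2_gaussians(m1, S1, m2, S2))
```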

We consider Ising models on the hypercube with a general interaction matrix $J$, and give a polynomial time sampling algorithm when all but $O(1)$ eigenvalues of $J$ lie in an interval of length one, a situation which occurs in many models of interest. This was previously known for the Glauber dynamics when *all* eigenvalues fit in an interval of length one; however, a single outlier can force the Glauber dynamics to mix torpidly. Our general result implies the first polynomial time sampling algorithms for low-rank Ising models such as Hopfield networks with a fixed number of patterns and Bayesian clustering models with low-dimensional contexts, and greatly improves the polynomial time sampling regime for the antiferromagnetic/ferromagnetic Ising model with inconsistent field on expander graphs. It also improves on previous approximation algorithm results based on the naive mean-field approximation in variational methods and statistical physics. Our approach is based on a new fusion of ideas from the MCMC and variational inference worlds. As part of our algorithm, we define a new nonconvex variational problem which allows us to sample from an exponential reweighting of a distribution by a negative definite quadratic form, and show how to make this procedure provably efficient using stochastic gradient descent. On top of this, we construct a new simulated tempering chain (on an extended state space arising from the Hubbard-Stratonovich transform) which overcomes the obstacle posed by large positive eigenvalues, and combine it with the SGD-based sampler to solve the full problem.
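For context, a hedged sketch of the baseline the abstract contrasts with: single-site Glauber dynamics for an Ising model $\mu(x)\propto\exp(\tfrac{1}{2}x^\top J x + h^\top x)$ on $\{-1,+1\}^n$. The interaction matrix below is an arbitrary rank-one-spike-plus-noise example, not one of the paper's models; the paper's point is precisely that Glauber alone can mix torpidly when $J$ has large outlier eigenvalues.

```python
import numpy as np

def glauber_sample(J, h, steps, rng):
    """Single-site Glauber dynamics for mu(x) ~ exp(0.5 * x^T J x + h^T x)
    on {-1, +1}^n; assumes J is symmetric with zero diagonal."""
    n = len(h)
    x = rng.choice([-1.0, 1.0], size=n)
    for _ in range(steps):
        i = rng.integers(n)
        field = J[i] @ x + h[i]                  # local field at site i (J[i, i] = 0)
        p_plus = 1.0 / (1.0 + np.exp(-2.0 * field))
        x[i] = 1.0 if rng.random() < p_plus else -1.0
    return x

rng = np.random.default_rng(0)
n = 100
u = rng.standard_normal(n) / np.sqrt(n)
J = 0.8 * np.outer(u, u) + 0.1 * rng.standard_normal((n, n)) / np.sqrt(n)
J = (J + J.T) / 2.0
np.fill_diagonal(J, 0.0)                         # rank-one spike plus noise, symmetric, zero diagonal
h = 0.05 * rng.standard_normal(n)
x = glauber_sample(J, h, steps=50_000, rng=rng)
print("magnetization:", x.mean())
```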
