精品亚洲中文一区二区三区,中文字幕无线在线视频观看

In the density estimation model, we investigate the problem of constructing adaptive honest confidence sets with radius measured in Wasserstein distance $W_p$, $p\geq1$, and for densities with unknown regularity measured on a Besov scale. As sampling domains, we focus on the $d-$dimensional torus $\mathbb{T}^d$, in which case $1\leq p\leq 2$, and $\mathbb{R}^d$, for which $p=1$. We identify necessary and sufficient conditions for the existence of adaptive confidence sets with diameters of the order of the regularity-dependent $W_p$-minimax estimation rate. Interestingly, it appears that the possibility of such adaptation of the diameter depends on the dimension of the underlying space. In low dimensions, $d\leq 4$, adaptation to any regularity is possible. In higher dimensions, adaptation is possible if and only if the underlying regularities belong to some interval of width at least $d/(d-4)$. This contrasts with the usual $L_p-$theory where, independently of the dimension, adaptation requires regularities to lie in a small fixed-width window. For configurations allowing these adaptive sets to exist, we explicitly construct confidence regions via the method of risk estimation, centred at adaptive estimators. Those are the first results in a statistical approach to adaptive uncertainty quantification with Wasserstein distances. Our analysis and methods extend more globally to weak losses such as Sobolev norm distances with negative smoothness indices.

相關內容

正則化項

關注 0

周期的 · Integration · 近似 · 無限 · 數值分析 ·

2022 年 1 月 21 日

The Galerkin analysis for the random periodic solution of semilinear stochastic evolution equations

Yue Wu,Chenggui Yuan

In this paper we study the numerical method for approximating the random periodic solution of semiliear stochastic evolution equations. The main challenge lies in proving a convergence over an infinite time horizon while simulating infinite-dimensional objects. We first show the existence and uniqueness of the random periodic solution to the equation as the limit of the pull-back flows of the equation, and observe that its mild form is well-defined in the intersection of a family of decreasing Hilbert spaces. Then we propose a Galerkin-type exponential integrator scheme and establish its convergence rate of the strong error to the mild solution, where the order of convergence directly depends on the space (among the family of Hilbert spaces) for the initial point to live. We finally conclude with the best order of convergence that is arbitrarily close to 0.5.

Lipschitz · 學成 · 圖 · INFORMS · 離散化 ·

2022 年 1 月 21 日

Analysis and algorithms for $\ell_p$-based semi-supervised learning on graphs

Mauricio Flores Rios,Jeff Calder,Gilad Lerman

This paper addresses theory and applications of $\ell_p$-based Laplacian regularization in semi-supervised learning. The graph $p$-Laplacian for $p>2$ has been proposed recently as a replacement for the standard ($p=2$) graph Laplacian in semi-supervised learning problems with very few labels, where Laplacian learning is degenerate. In the first part of the paper we prove new discrete to continuum convergence results for $p$-Laplace problems on $k$-nearest neighbor ($k$-NN) graphs, which are more commonly used in practice than random geometric graphs. Our analysis shows that, on $k$-NN graphs, the $p$-Laplacian retains information about the data distribution as $p\to \infty$ and Lipschitz learning ($p=\infty$) is sensitive to the data distribution. This situation can be contrasted with random geometric graphs, where the $p$-Laplacian forgets the data distribution as $p\to \infty$. We also present a general framework for proving discrete to continuum convergence results in graph-based learning that only requires pointwise consistency and monotonicity. In the second part of the paper, we develop fast algorithms for solving the variational and game-theoretic $p$-Laplace equations on weighted graphs for $p>2$. We present several efficient and scalable algorithms for both formulations, and present numerical results on synthetic data indicating their convergence properties. Finally, we conduct extensive numerical experiments on the MNIST, FashionMNIST and EMNIST datasets that illustrate the effectiveness of the $p$-Laplacian formulation for semi-supervised learning with few labels. In particular, we find that Lipschitz learning ($p=\infty$) performs well with very few labels on $k$-NN graphs, which experimentally validates our theoretical findings that Lipschitz learning retains information about the data distribution (the unlabeled data) on $k$-NN graphs.

估計/估計量 · 離散化 · 方陣 · 噪聲 · Processing（編程語言） ·

2022 年 1 月 20 日

Least squares estimators for discretely observed stochastic processes driven by small fractional noise

S. Nakajima,S. Nakamura,Y. Shimizu

We study the problem of parameter estimation for discretely observed stochastic differential equations driven by small fractional noise. Under some conditions, we obtain strong consistency and rate of convergence of the least square estimator(LSE) when small dispersion coefficient converges to 0 and sample size converges to infty.

推斷 · 統計量 · 確切的 · Performer · 置信度 ·

2022 年 1 月 20 日

Exact Statistical Inference for the Wasserstein Distance by Selective Inference

Vo Nguyen Le Duy,Ichiro Takeuchi

In this paper, we study statistical inference for the Wasserstein distance, which has attracted much attention and has been applied to various machine learning tasks. Several studies have been proposed in the literature, but almost all of them are based on asymptotic approximation and do not have finite-sample validity. In this study, we propose an exact (non-asymptotic) inference method for the Wasserstein distance inspired by the concept of conditional Selective Inference (SI). To our knowledge, this is the first method that can provide a valid confidence interval (CI) for the Wasserstein distance with finite-sample coverage guarantee, which can be applied not only to one-dimensional problems but also to multi-dimensional problems. We evaluate the performance of the proposed method on both synthetic and real-world datasets.

估計/估計量 · Processing（編程語言） · 統計量 · 推斷 · 離散化 ·

2022 年 1 月 19 日

Adaptive inference for small diffusion processes based on sampled data

Tetsuya Kawai,Masayuki Uchida

from arxiv, 38 pages, 3 figures

We consider parametric estimation and tests for multi-dimensional diffusion processes with a small dispersion parameter $\varepsilon$ from discrete observations. For parametric estimation of diffusion processes, the main target is to estimate the drift parameter and the diffusion parameter. In this paper, we propose two types of adaptive estimators for both parameters and show their asymptotic properties under $\varepsilon\to0$, $n\to\infty$ and the balance condition that $(\varepsilon n^\rho)^{-1} =O(1)$ for some $\rho>0$. Using these adaptive estimators, we also introduce consistent adaptive testing methods and prove that test statistics for adaptive tests have asymptotic distributions under null hypothesis. In simulation studies, we examine and compare asymptotic behaviors of the two kinds of adaptive estimators and test statistics. Moreover, we treat the SIR model which describes a simple epidemic spread for a biological application.

估計/估計量 · Sphering · 相互獨立的 · 優化器 · 線性的 ·

2022 年 1 月 19 日

The minimal spherical dispersion

Joscha Prochno,Daniel Rudolf

from arxiv, 10 pages, 1 figure

We prove upper and lower bounds on the minimal spherical dispersion, improving upon previous estimates obtained by Rote and Tichy [Spherical dispersion with an application to polygonal approximation of curves, Anz. \"Osterreich. Akad. Wiss. Math.-Natur. Kl. 132 (1995), 3--10]. In particular, we see that the inverse $N(\varepsilon,d)$ of the minimal spherical dispersion is, for fixed $\varepsilon>0$, linear in the dimension $d$ of the ambient space. We also derive upper and lower bounds on the expected dispersion for points chosen independently and uniformly at random from the Euclidean unit sphere. In terms of the corresponding inverse $\widetilde{N}(\varepsilon,d)$, our bounds are optimal with respect to the dependence on $\varepsilon$.

MoDELS · 可約的 · 高斯混合（模型） · Neural Networks · 類別 ·

2021 年 1 月 15 日

Fundamental Tradeoffs in Distributionally Adversarial Training

Mohammad Mehrabi,Adel Javanmard,Ryan A. Rossi,Anup Rao,Tung Mai

from arxiv, 23 pages, 3 figures

Adversarial training is among the most effective techniques to improve the robustness of models against adversarial perturbations. However, the full effect of this approach on models is not well understood. For example, while adversarial training can reduce the adversarial risk (prediction error against an adversary), it sometimes increase standard risk (generalization error when there is no adversary). Even more, such behavior is impacted by various elements of the learning problem, including the size and quality of training data, specific forms of adversarial perturbations in the input, model overparameterization, and adversary's power, among others. In this paper, we focus on \emph{distribution perturbing} adversary framework wherein the adversary can change the test distribution within a neighborhood of the training data distribution. The neighborhood is defined via Wasserstein distance between distributions and the radius of the neighborhood is a measure of adversary's manipulative power. We study the tradeoff between standard risk and adversarial risk and derive the Pareto-optimal tradeoff, achievable over specific classes of models, in the infinite data limit with features dimension kept fixed. We consider three learning settings: 1) Regression with the class of linear models; 2) Binary classification under the Gaussian mixtures data model, with the class of linear classifiers; 3) Regression with the class of random features model (which can be equivalently represented as two-layer neural network with random first-layer weights). We show that a tradeoff between standard and adversarial risk is manifested in all three settings. We further characterize the Pareto-optimal tradeoff curves and discuss how a variety of factors, such as features correlation, adversary's power or the width of two-layer neural network would affect this tradeoff.

流形 · 近似 · 數據點 · 線性的 · 維數災難 ·

2019 年 3 月 7 日

Manifold Approximation by Moving Least-Squares Projection (MMLS)

Barak Sober,David Levin

In order to avoid the curse of dimensionality, frequently encountered in Big Data analysis, there was a vast development in the field of linear and nonlinear dimension reduction techniques in recent years. These techniques (sometimes referred to as manifold learning) assume that the scattered input data is lying on a lower dimensional manifold, thus the high dimensionality problem can be overcome by learning the lower dimensionality behavior. However, in real life applications, data is often very noisy. In this work, we propose a method to approximate $\mathcal{M}$ a $d$-dimensional $C^{m+1}$ smooth submanifold of $\mathbb{R}^n$ ($d \ll n$) based upon noisy scattered data points (i.e., a data cloud). We assume that the data points are located "near" the lower dimensional manifold and suggest a non-linear moving least-squares projection on an approximating $d$-dimensional manifold. Under some mild assumptions, the resulting approximant is shown to be infinitely smooth and of high approximation order (i.e., $O(h^{m+1})$, where $h$ is the fill distance and $m$ is the degree of the local polynomial approximation). The method presented here assumes no analytic knowledge of the approximated manifold and the approximation algorithm is linear in the large dimension $n$. Furthermore, the approximating manifold can serve as a framework to perform operations directly on the high dimensional data in a computationally efficient manner. This way, the preparatory step of dimension reduction, which induces distortions to the data, can be avoided altogether.

判別器 · GANs · 模式崩潰 · IPM · 多樣性 ·

2018 年 6 月 27 日

Approximability of Discriminators Implies Diversity in GANs

Yu Bai,Tengyu Ma,Andrej Risteski

While Generative Adversarial Networks (GANs) have empirically produced impressive results on learning complex real-world distributions, recent work has shown that they suffer from lack of diversity or mode collapse. The theoretical work of Arora et al.~\cite{AroraGeLiMaZh17} suggests a dilemma about GANs' statistical properties: powerful discriminators cause overfitting, whereas weak discriminators cannot detect mode collapse. In contrast, we show in this paper that GANs can in principle learn distributions in Wasserstein distance (or KL-divergence in many cases) with polynomial sample complexity, if the discriminator class has strong distinguishing power against the particular generator class (instead of against all possible generators). For various generator classes such as mixture of Gaussians, exponential families, and invertible neural networks generators, we design corresponding discriminators (which are often neural nets of specific architectures) such that the Integral Probability Metric (IPM) induced by the discriminators can provably approximate the Wasserstein distance and/or KL-divergence. This implies that if the training is successful, then the learned distribution is close to the true distribution in Wasserstein distance or KL divergence, and thus cannot drop modes. Our preliminary experiments show that on synthetic datasets the test IPM is well correlated with KL divergence, indicating that the lack of diversity may be caused by the sub-optimality in optimization instead of statistical inefficiency.

正則化項 · 對抗自編碼 · Better · 泛化理論 · MoDELS ·

2018 年 3 月 12 日

Wasserstein Auto-Encoders

Ilya Tolstikhin,Olivier Bousquet,Sylvain Gelly,Bernhard Schoelkopf

from arxiv, Fixed a typo in Algorithm 2

We propose the Wasserstein Auto-Encoder (WAE)---a new algorithm for building a generative model of the data distribution. WAE minimizes a penalized form of the Wasserstein distance between the model distribution and the target distribution, which leads to a different regularizer than the one used by the Variational Auto-Encoder (VAE). This regularizer encourages the encoded training distribution to match the prior. We compare our algorithm with several other techniques and show that it is a generalization of adversarial auto-encoders (AAE). Our experiments show that WAE shares many of the properties of VAEs (stable training, encoder-decoder architecture, nice latent manifold structure) while generating samples of better quality, as measured by the FID score.