In many modern applications, sample points are no longer vectors in a real vector space but points of much more complex structures, which may be represented as points in a space with a certain underlying geometric structure, namely a manifold. Manifold learning is an emerging field for learning this underlying structure. The study of manifold learning can be split into two main branches: dimension reduction and manifold fitting. With the aim of combining statistics and geometry, we address the problem of manifold fitting in the ambient space. Inspired by the relation between the eigenvalues of the Laplace-Beltrami operator and the geometry of a manifold, we aim to find a small set of points that preserves the geometry of the underlying manifold. Building on this relationship, we extend the idea of subsampling to sample points in high-dimensional space and employ the Moving Least Squares (MLS) approach to approximate the underlying manifold. We analyze the two core steps of our proposed method theoretically and also provide bounds for the MLS approach. Our simulation results and theoretical analysis demonstrate the superiority of our method in estimating the underlying manifold.
Let ${\mathcal M}\subset {\mathbb R}^n$ be a $C^2$-smooth compact submanifold of dimension $d$. Assume that the volume of ${\mathcal M}$ is at most $V$ and the reach (i.e., the normal injectivity radius) of ${\mathcal M}$ is greater than $\tau$. Moreover, let $\mu$ be a probability measure on ${\mathcal M}$ whose density on ${\mathcal M}$ is a strictly positive Lipschitz-smooth function. Let $x_j\in {\mathcal M}$, $j=1,2,\dots,N$, be $N$ independent random samples from the distribution $\mu$. Also, let $\xi_j$, $j=1,2,\dots, N$, be independent random samples from a Gaussian random variable in ${\mathbb R}^n$ having covariance $\sigma^2I$, where $\sigma$ is less than a certain specified function of $d, V$ and $\tau$. We assume that we are given the data points $y_j=x_j+\xi_j$, $j=1,2,\dots,N$, modelling random points of ${\mathcal M}$ with measurement noise. We develop an algorithm which produces from these data, with high probability, a $d$-dimensional submanifold ${\mathcal M}_o\subset {\mathbb R}^n$ whose Hausdorff distance to ${\mathcal M}$ is less than $Cd\sigma^2/\tau$ and whose reach is greater than $c{\tau}/d^6$ with universal constants $C,c > 0$. The number $N$ of random samples required depends almost linearly on $n$, polynomially on $\sigma^{-1}$ and exponentially on $d$.
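The data model $y_j=x_j+\xi_j$ can be made concrete with a small simulation. The sketch below is only an illustration of the sampling setup, not of the fitting algorithm: it assumes that ${\mathcal M}$ is the unit circle ($d=1$, reach $1$) embedded in the first two coordinates of ${\mathbb R}^n$, that $\mu$ is uniform, and that the noise has mean zero. The function names `sample_noisy_circle` and `distance_to_circle` are hypothetical.

```python
import math
import random


def sample_noisy_circle(num_points, sigma, ambient_dim=3, seed=0):
    """Draw y_j = x_j + xi_j with x_j uniform on a unit circle in R^n.

    Assumptions (not from the source): the manifold is the unit circle
    in the first two coordinates, and the Gaussian noise has mean zero.
    """
    rng = random.Random(seed)
    clean, noisy = [], []
    for _ in range(num_points):
        angle = rng.uniform(0.0, 2.0 * math.pi)
        # x_j lies exactly on the circle; remaining coordinates are zero.
        x = [math.cos(angle), math.sin(angle)] + [0.0] * (ambient_dim - 2)
        # xi_j ~ N(0, sigma^2 I): independent noise in every coordinate.
        y = [coord + rng.gauss(0.0, sigma) for coord in x]
        clean.append(x)
        noisy.append(y)
    return clean, noisy


def distance_to_circle(point):
    """Euclidean distance from a point in R^n to the embedded unit circle."""
    radial = math.hypot(point[0], point[1])
    off_plane_sq = sum(c * c for c in point[2:])
    return math.sqrt((radial - 1.0) ** 2 + off_plane_sq)
```

Calling `sample_noisy_circle(1000, 0.05)` and applying `distance_to_circle` to the noisy points gives a quick check of how far the observations $y_j$ sit from ${\mathcal M}$ for a chosen $\sigma$.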
An important challenge in statistical analysis lies in controlling the estimation bias when handling the ever-increasing data size and model complexity. For example, approximate methods are increasingly used to address the analytical and/or computational challenges when implementing standard estimators, but they often lead to inconsistent estimators. Consistent estimators can therefore be difficult to obtain, especially for complex models and/or in settings where the number of parameters diverges with the sample size. We propose a general simulation-based estimation framework that allows one to construct consistent and bias-corrected estimators for parameters of increasing dimensions. The key advantage of the proposed framework is that it only requires computing a simple inconsistent estimator multiple times. The resulting Just Identified iNdirect Inference estimator (JINI) enjoys nice properties, including consistency, asymptotic normality, and better finite-sample bias correction than alternative methods. We further provide a simple algorithm to construct the JINI in a computationally efficient manner. Therefore, the JINI is especially useful in settings where standard methods may be challenging to apply, for example, in the presence of misclassification and rounding. We consider comprehensive simulation studies and analyze an alcohol consumption data example to illustrate the excellent performance and usefulness of the method.
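To illustrate the idea of correcting an estimator by recomputing it on simulated data, the following minimal sketch solves a just-identified matching equation with an iterative bootstrap-style update. It is an assumption that this resembles the algorithm referred to above. The toy problem is also an assumption: a Gaussian variance with the divisor-$n$ estimator as the initial estimator, which is biased but, unlike the estimators targeted above, still consistent. All function names are hypothetical.

```python
import random


def naive_variance(sample):
    """Biased initial estimator: sample variance with divisor n."""
    n = len(sample)
    mean = sum(sample) / n
    return sum((x - mean) ** 2 for x in sample) / n


def mean_simulated_estimate(theta, noise_sets):
    """Average of the initial estimator over samples simulated under theta."""
    scale = theta ** 0.5
    estimates = [
        naive_variance([scale * z for z in noise]) for noise in noise_sets
    ]
    return sum(estimates) / len(estimates)


def matching_estimate(sample, num_sim=200, num_iter=60, seed=1):
    """Find theta whose simulated initial estimates match the observed one."""
    rng = random.Random(seed)
    n = len(sample)
    # Reuse the same standard-normal draws at every iteration (common
    # random numbers), so the update below is a deterministic map.
    noise_sets = [
        [rng.gauss(0.0, 1.0) for _ in range(n)] for _ in range(num_sim)
    ]
    target = naive_variance(sample)
    theta = target  # start from the uncorrected estimate
    for _ in range(num_iter):
        # Shift theta by the gap between the observed estimate and the
        # average estimate obtained on data simulated under theta.
        theta = theta + target - mean_simulated_estimate(theta, noise_sets)
    return theta, target, noise_sets
```

In this toy case the only ingredient beyond a simulator is the initial estimator itself, evaluated `num_sim` times per iteration, which mirrors the "compute a simple estimator multiple times" structure described above.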
We introduce Universal Solution Manifold Network (USM-Net), a novel surrogate model, based on Artificial Neural Networks (ANNs), which applies to differential problems whose solution depends on physical and geometrical parameters. Our method employs a mesh-less architecture, thus overcoming the limitations associated with image segmentation and mesh generation required by traditional discretization methods. Indeed, we encode geometrical variability through scalar landmarks, such as coordinates of points of interest. In biomedical applications, these landmarks can be inexpensively processed from clinical images. Our approach is non-intrusive and modular, as we select a data-driven loss function. The latter can also be modified by considering additional constraints, thus leveraging available physical knowledge. Our approach can also accommodate a universal coordinate system, which supports the USM-Net in learning the correspondence between points belonging to different geometries, boosting prediction accuracy on unobserved geometries. Finally, we present two numerical test cases in computational fluid dynamics involving variable Reynolds numbers as well as computational domains of variable shape. The results show that our method allows for inexpensive but accurate approximations of velocity and pressure, avoiding computationally expensive image segmentation, mesh generation, or re-training for every new instance of physical parameters and shape of the domain.
This paper studies how well generative adversarial networks (GANs) learn probability distributions from finite samples. Our main results establish the convergence rates of GANs under a collection of integral probability metrics defined through H\"older classes, including the Wasserstein distance as a special case. We also show that GANs are able to adaptively learn data distributions that have low-dimensional structure or H\"older densities, when the network architectures are chosen properly. In particular, for distributions concentrated around a low-dimensional set, we show that the learning rates of GANs do not depend on the high ambient dimension, but on the lower intrinsic dimension. Our analysis is based on a new oracle inequality decomposing the estimation error into the generator and discriminator approximation errors and the statistical error, which may be of independent interest.
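As a small illustration of the Wasserstein special case, the sketch below computes the Wasserstein-1 distance between two one-dimensional empirical distributions of equal size. It relies on two standard facts: Wasserstein-1 is the integral probability metric over 1-Lipschitz functions, and in one dimension it reduces to the mean absolute difference of order statistics. The restriction to one dimension and the function names are assumptions made for brevity.

```python
def wasserstein1_empirical(xs, ys):
    """Wasserstein-1 distance between two equal-size 1-D empirical samples."""
    if len(xs) != len(ys) or not xs:
        raise ValueError("samples must be non-empty and of equal size")
    # In 1-D the optimal coupling matches sorted values in order.
    pairs = zip(sorted(xs), sorted(ys))
    return sum(abs(x - y) for x, y in pairs) / len(xs)


def mean_shift_lower_bound(xs, ys):
    """IPM value for the single 1-Lipschitz test function f(x) = x.

    Any fixed test function in the class gives a lower bound on the
    supremum that defines the metric.
    """
    return abs(sum(xs) / len(xs) - sum(ys) / len(ys))
```

The second function shows the metric's variational form: a single discriminator, here $f(x)=x$, can only underestimate the distance, which is why the discriminator class matters in the analysis.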
We present a new approach for efficient exploration which leverages a low-dimensional encoding of the environment learned with a combination of model-based and model-free objectives. Our approach uses intrinsic rewards that are based on the distance to nearest neighbors in the low-dimensional representational space to gauge novelty. We then leverage these intrinsic rewards for sample-efficient exploration with planning routines in representational space for hard exploration tasks with sparse rewards. One key element of our approach is the use of information-theoretic principles to shape our representations so that our novelty reward goes beyond pixel similarity. We test our approach on a number of maze tasks, as well as a control problem, and show that our exploration approach is more sample-efficient than strong baselines.
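A nearest-neighbor novelty score of the kind described above can be sketched as follows. The exact form of the intrinsic reward is not given here, so averaging the Euclidean distances to the $k$ nearest stored encodings is an assumption, as are the function name `knn_novelty` and the convention of returning zero for an empty memory.

```python
import math


def knn_novelty(encoding, memory, k=2):
    """Average Euclidean distance from an encoding to its k nearest
    neighbors among previously visited encodings.

    Assumption: a plain average of distances; other weightings are possible.
    """
    if not memory:
        return 0.0  # no reference points yet, so no novelty signal
    distances = sorted(math.dist(encoding, past) for past in memory)
    nearest = distances[:k]
    return sum(nearest) / len(nearest)
```

States whose encodings fall far from everything in `memory` receive a larger score, so an agent that maximizes it is pushed toward unvisited regions of the representational space.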
One of the most important problems in system identification and statistics is how to estimate the unknown parameters of a given model. Optimization methods and specialized procedures, such as Expectation-Maximization (EM), can be used when the likelihood function can be computed. For situations where one can only simulate from a parametric model, but the likelihood is difficult or impossible to evaluate, a technique known as the Two-Stage (TS) Approach can be applied to obtain reliable parametric estimates. Unfortunately, there is currently a lack of theoretical justification for TS. In this paper, we propose a statistical decision-theoretical derivation of TS, which leads to Bayesian and Minimax estimators. We also show how to apply the TS approach to models for independent and identically distributed samples, by computing quantiles of the data as a first step, and using a linear function as the second stage. The proposed method is illustrated via numerical simulations.
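The two stages for i.i.d. samples can be sketched on a toy model. Everything specific below is an assumption made for illustration: the exponential model with mean `theta`, the use of a single quantile (the median) as the first-stage summary, the uniform range from which training parameters are drawn, and all function names. The second stage is an ordinary least-squares line fitted on simulated (parameter, summary) pairs.

```python
import random
import statistics


def first_stage(sample):
    """Compress the sample into a quantile summary (here: the median)."""
    return statistics.median(sample)


def simulate(theta, n, rng):
    """Simulate n i.i.d. draws from an exponential model with mean theta."""
    return [rng.expovariate(1.0 / theta) for _ in range(n)]


def fit_second_stage(num_train=500, n=200, seed=0):
    """Fit theta ~ intercept + slope * summary on simulated data."""
    rng = random.Random(seed)
    thetas = [rng.uniform(0.5, 5.0) for _ in range(num_train)]
    summaries = [first_stage(simulate(t, n, rng)) for t in thetas]
    mean_s = statistics.fmean(summaries)
    mean_t = statistics.fmean(thetas)
    # Closed-form ordinary least squares with one regressor.
    sxx = sum((s - mean_s) ** 2 for s in summaries)
    sxy = sum((s - mean_s) * (t - mean_t) for s, t in zip(summaries, thetas))
    slope = sxy / sxx
    intercept = mean_t - slope * mean_s
    return intercept, slope


def two_stage_estimate(sample, intercept, slope):
    """Apply both stages to observed data; no likelihood is evaluated."""
    return intercept + slope * first_stage(sample)
```

Note that the likelihood never appears: the estimator is built entirely from the ability to simulate the model, which is the setting the TS approach is meant for.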
We prove linear convergence of gradient descent to a global minimum for the training of deep residual networks with constant layer width and smooth activation function. We further show that the trained weights, as a function of the layer index, admit a scaling limit which is H\"older continuous as the depth of the network tends to infinity. The proofs are based on non-asymptotic estimates of the loss function and of norms of the network weights along the gradient descent path. We illustrate the relevance of our theoretical results to practical settings using detailed numerical experiments on supervised learning problems.
Models for dependent data are distinguished by their targets of inference. Marginal models are useful when interest lies in quantifying associations averaged across a population of clusters. When the functional form of a covariate-outcome association is unknown, flexible regression methods are needed to allow for potentially non-linear relationships. We propose a novel marginal additive model (MAM) for modelling cluster-correlated data with non-linear population-averaged associations. The proposed MAM is a unified framework for estimation and uncertainty quantification of a marginal mean model, combined with inference for between-cluster variability and cluster-specific prediction. We propose a fitting algorithm that enables efficient computation of standard errors and corrects for estimation of penalty terms. We demonstrate the proposed methods in simulations and in application to (i) a longitudinal study of beaver foraging behaviour, and (ii) a spatial analysis of Loa loa infection in West Africa. R code for implementing the proposed methodology is available at //github.com/awstringer1/mam.
Likelihood-based, or explicit, deep generative models use neural networks to construct flexible high-dimensional densities. This formulation directly contradicts the manifold hypothesis, which states that observed data lies on a low-dimensional manifold embedded in high-dimensional ambient space. In this paper we investigate the pathologies of maximum-likelihood training in the presence of this dimensionality mismatch. We formally prove that degenerate optima are achieved wherein the manifold itself is learned but not the distribution on it, a phenomenon we call manifold overfitting. We propose a class of two-step procedures consisting of a dimensionality reduction step followed by maximum-likelihood density estimation, and prove that they recover the data-generating distribution in the nonparametric regime, thus avoiding manifold overfitting. We also show that these procedures enable density estimation on the manifolds learned by implicit models, such as generative adversarial networks, hence addressing a major shortcoming of these models. Several recently proposed methods are instances of our two-step procedures; we thus unify, extend, and theoretically justify a large class of models.
Sufficient dimension reduction (SDR) is a successful tool in regression models. It offers a feasible way to analyze the nonlinear nature of regression problems. This paper introduces the \textbf{itdr} R package that provides several functions based on integral transformation methods to estimate the SDR subspaces in a comprehensive and user-friendly manner. In particular, the \textbf{itdr} package includes the Fourier method (FM) and the convolution method (CM) of estimating the SDR subspaces such as the central mean subspace (CMS) and the central subspace (CS). In addition, the \textbf{itdr} package facilitates the recovery of the CMS and the CS by using the iterative Hessian transformation (IHT) method and the Fourier transformation approach for inverse dimension reduction method (invFM), respectively. Moreover, the use of the package is illustrated with three datasets. Furthermore, this is the first package that implements integral transformation methods to estimate SDR subspaces. Hence, the \textbf{itdr} package may provide a huge contribution to research in the SDR field.