We study the convergence properties, in Hellinger and related distances, of nonparametric density estimators based on measure transport. These estimators represent the measure of interest as the pushforward of a chosen reference distribution under a transport map, with the map selected via a maximum likelihood objective (equivalently, by minimizing an empirical Kullback-Leibler loss) or a penalized version thereof. We establish concentration inequalities for a general class of penalized measure transport estimators, by combining techniques from M-estimation with analytical properties of the transport-based density representation. We then demonstrate the implications of our theory for the case of triangular Knothe-Rosenblatt (KR) transports on the $d$-dimensional unit cube, and show that both penalized and unpenalized versions of such estimators achieve minimax optimal convergence rates over H\"older classes of densities. Specifically, we establish optimal rates for unpenalized nonparametric maximum likelihood estimation over bounded H\"older-type balls, and then for certain Sobolev-penalized estimators and sieved wavelet estimators.
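As a minimal illustration of the transport-based likelihood objective (a one-dimensional toy, not the paper's KR construction), the sketch below models the data density as the pullback of a standard-normal reference through an increasing map $T$ and fits $T$ by maximizing the empirical log-likelihood via the change-of-variables formula; the exponential-slope parameterization of $T$ is a hypothetical choice made only to guarantee monotonicity.

```python
# A 1-D toy of transport-based maximum likelihood density estimation
# (illustrative parameterization; not the paper's KR construction).
import numpy as np
from scipy.optimize import minimize
from scipy.stats import norm

rng = np.random.default_rng(0)
x = rng.beta(2.0, 5.0, size=500)          # toy data on [0, 1]

def neg_log_lik(theta):
    a, c0, c1 = theta
    # Increasing map T(x) = a + int_0^x exp(c0 + c1 t) dt, so that
    # T'(x) = exp(c0 + c1 x) > 0 and monotonicity holds by construction.
    if abs(c1) > 1e-8:
        Tx = a + np.exp(c0) * np.expm1(c1 * x) / c1
    else:
        Tx = a + np.exp(c0) * x
    log_det = c0 + c1 * x                  # log T'(x)
    # Change of variables: log p(x) = log phi(T(x)) + log T'(x).
    return -np.sum(norm.logpdf(Tx) + log_det)

res = minimize(neg_log_lik, x0=np.zeros(3), method="BFGS")
print("fitted transport parameters:", res.x)
```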
Full Waveform Inversion (FWI) is a successful and well-established inverse method for reconstructing material models from measured wave signals. In the field of seismic exploration, FWI has proven particularly successful in the reconstruction of smoothly varying material deviations. In contrast, non-destructive testing (NDT) often requires the detection and characterization of sharp defects in a specimen. If the contrast between materials is low, FWI can be successfully applied to these problems as well. However, the method is not yet fully suitable for imaging defects such as voids, which are characterized by a high contrast in the material parameters. In this paper, we introduce a dimensionless scaling function $\gamma$ to model voids in the forward and inverse scalar wave equation problem. Depending on which material parameters this function $\gamma$ scales, different modeling approaches are presented, leading to three formulations of mono-parameter FWI and one formulation of two-parameter FWI. The resulting problems are solved by first-order optimization, where the gradient is computed by an adjoint state method. The corresponding Fr\'echet kernels are derived for each approach, and the associated minimization is performed using an L-BFGS algorithm. A comparison between the different approaches shows that scaling the density with $\gamma$ is the most promising way to parameterize voids in the forward and inverse problem. Finally, in order to handle arbitrarily complex geometries known a priori, this approach is combined with an immersed boundary method, the finite cell method (FCM).
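For orientation, the following sketch reproduces the adjoint-state-plus-L-BFGS workflow on a drastically simplified 1-D frequency-domain (Helmholtz-type) problem; the paper's time-domain scalar wave equation, the $\gamma$-scaling formulations, and the FCM coupling are not modeled, and all grid sizes and parameter values are illustrative.

```python
# A highly simplified 1-D frequency-domain analogue of the FWI workflow:
# misfit gradient by the adjoint-state method, minimized with L-BFGS.
import numpy as np
from scipy.optimize import minimize

n, h, omega = 80, 1.0 / 80, 12.0
L = (np.diag(-2.0 * np.ones(n)) + np.diag(np.ones(n - 1), 1)
     + np.diag(np.ones(n - 1), -1)) / h**2

def forward(m):
    # Helmholtz-type system A(m) u = f with Dirichlet boundaries.
    A = L + omega**2 * np.diag(m)
    f = np.zeros(n); f[5] = 1.0            # point source
    return np.linalg.solve(A, f), A

m_true = np.ones(n); m_true[40:50] = 1.5   # high-contrast "defect"
d_obs, _ = forward(m_true)                 # synthetic observed data

def misfit_and_grad(m):
    u, A = forward(m)
    r = u - d_obs                          # data residual
    lam = np.linalg.solve(A.T, -r)         # adjoint solve
    grad = omega**2 * lam * u              # adjoint-state gradient
    return 0.5 * np.dot(r, r), grad

res = minimize(misfit_and_grad, np.ones(n), jac=True, method="L-BFGS-B")
print("relative model error:",
      np.linalg.norm(res.x - m_true) / np.linalg.norm(m_true))
```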
In this paper, we investigate the matrix estimation problem in the multi-response regression model with measurement errors. A nonconvex error-corrected estimator, combining an amended loss function with the nuclear norm regularizer, is proposed to estimate the matrix parameter. Under a (near) low-rank assumption, we then analyze the statistical and computational properties of global solutions of this nonconvex regularized estimator from a general point of view. On the statistical side, we establish a nonasymptotic recovery bound for any global solution of the nonconvex estimator, under a restricted strong convexity condition on the loss function. On the computational side, we solve the nonconvex optimization problem via the proximal gradient method, which is proved to converge to a near-global solution at a linear rate. In addition, we verify sufficient conditions for the general results to hold, in order to obtain probabilistic consequences for specific types of measurement errors, including additive noise and missing data. Finally, the theoretical consequences are demonstrated by several numerical experiments on corrupted errors-in-variables multi-response regression models. Simulation results show excellent agreement with our theory under high-dimensional scaling.
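The computational step can be illustrated with a minimal proximal gradient iteration for a nuclear-norm-penalized least-squares problem; note that this sketch uses a clean (uncorrupted) loss rather than the paper's error-corrected loss, so it shows only the generic algorithmic template, with arbitrary problem sizes and penalty level.

```python
# Proximal gradient for nuclear-norm penalized multi-response regression
# (generic template; the paper's measurement-error correction is omitted).
import numpy as np

rng = np.random.default_rng(1)
n, p, q, r = 200, 30, 10, 2
B_true = rng.standard_normal((p, r)) @ rng.standard_normal((r, q))  # low rank
X = rng.standard_normal((n, p))
Y = X @ B_true + 0.1 * rng.standard_normal((n, q))

def prox_nuclear(B, tau):
    # Proximal map of tau * ||.||_*: soft-threshold the singular values.
    U, s, Vt = np.linalg.svd(B, full_matrices=False)
    return U @ np.diag(np.maximum(s - tau, 0.0)) @ Vt

lam = 0.1
step = 1.0 / np.linalg.norm(X.T @ X / n, 2)   # 1 / Lipschitz constant
B = np.zeros((p, q))
for _ in range(300):
    grad = X.T @ (X @ B - Y) / n              # gradient of the smooth loss
    B = prox_nuclear(B - step * grad, step * lam)

print("rank of estimate:", np.linalg.matrix_rank(B, tol=1e-6),
      " estimation error:", np.linalg.norm(B - B_true))
```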
Traditional nonparametric estimation methods often suffer slow convergence rates in high dimensions and require unrealistically large datasets for reliable conclusions. We develop an approach based on mixed gradients, either observed or estimated, to estimate the function at near-parametric convergence rates. The novel approach and computational algorithm could lead to methods useful to practitioners in many areas of science and engineering. Our theoretical results reveal a behavior universal to this class of nonparametric estimation problems. We explore a general setting involving tensor product spaces and build upon the smoothing spline analysis of variance (SS-ANOVA) framework. For $d$-dimensional models under full interaction, the optimal rates with gradient information on $p$ covariates are identical to those for the $(d-p)$-interaction models without gradients; the models are therefore immune to the "curse of interaction". For additive models, the optimal rates using gradient information are root-$n$, thus achieving the "parametric rate". We demonstrate aspects of the theoretical results through synthetic and real data applications.
State estimation is an essential part of autonomous systems. Integrating the Ultra-Wideband (UWB) technique has been shown to correct long-term estimation drift and to bypass the complexity of loop closure detection. However, few works in robotics adopt UWB as a stand-alone state estimation solution. The primary purpose of this work is to investigate planar pose estimation using only UWB range measurements and to study the estimator's statistical efficiency. We prove a key property of a two-step scheme: a consistent estimator can be refined to be asymptotically efficient by a single Gauss-Newton iteration. Building on this result, we design the GN-ULS estimator and evaluate it through simulations and collected datasets. GN-ULS attains millimeter and sub-degree accuracy on our static datasets and centimeter and degree accuracy on our dynamic datasets, demonstrating the possibility of using only UWB for real-time state estimation.
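A minimal planar sketch of the two-step idea (position only; the paper's full GN-ULS estimator also handles orientation): a consistent closed-form fix obtained by linearizing the squared ranges, refined by one Gauss-Newton step on the nonlinear range residuals. Anchor layout and noise level are arbitrary.

```python
# Two-step range-only localization: consistent linear fix + one GN step.
import numpy as np

rng = np.random.default_rng(2)
anchors = np.array([[0., 0.], [10., 0.], [0., 10.], [10., 10.], [5., -3.]])
p_true = np.array([3.0, 4.0])
d = np.linalg.norm(anchors - p_true, axis=1) + 0.05 * rng.standard_normal(5)

# Step 1: consistent closed-form estimate from the squared-range equations
# d_j^2 - d_0^2 = -2 (a_j - a_0) . p + ||a_j||^2 - ||a_0||^2.
A = 2.0 * (anchors[1:] - anchors[0])
b = (d[0]**2 - d[1:]**2
     + np.sum(anchors[1:]**2, axis=1) - np.sum(anchors[0]**2))
p0 = np.linalg.lstsq(A, b, rcond=None)[0]

# Step 2: one Gauss-Newton iteration on the nonlinear range residuals.
diff = p0 - anchors
rng_hat = np.linalg.norm(diff, axis=1)
r = rng_hat - d                      # residuals
J = diff / rng_hat[:, None]          # Jacobian of the range model
p1 = p0 - np.linalg.solve(J.T @ J, J.T @ r)

print("initial error:", np.linalg.norm(p0 - p_true),
      " one-step GN error:", np.linalg.norm(p1 - p_true))
```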
We study Bayesian density estimation for data living in an offset of an unknown submanifold of Euclidean space. In this setting, we introduce a new notion of anisotropic H\"older smoothness for the underlying density and obtain posterior rates that are minimax optimal and adaptive to the regularity of the density, to the intrinsic dimension of the manifold, and to the size of the offset, provided the latter is not too small -- while still being allowed to go to zero. Our Bayesian procedure, based on location-scale mixtures of Gaussians, appears to be convenient to implement and yields good practical results, even for quite singular data.
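As a small numerical illustration of location-scale Gaussian mixtures on near-manifold data (a circle with a small offset), the sketch below uses an EM maximum-likelihood fit as a stand-in for the paper's Bayesian posterior; the component count and noise level are arbitrary.

```python
# Gaussian mixture density estimate for data near a 1-D manifold in R^2
# (EM fit standing in for the paper's Bayesian mixture posterior).
import numpy as np
from sklearn.mixture import GaussianMixture

rng = np.random.default_rng(3)
theta = rng.uniform(0, 2 * np.pi, 1000)
X = np.c_[np.cos(theta), np.sin(theta)] + 0.05 * rng.standard_normal((1000, 2))

gm = GaussianMixture(n_components=20, covariance_type="full").fit(X)
print("mean log-density on the circle:  ",
      gm.score(np.c_[np.cos(theta[:100]), np.sin(theta[:100])]))
print("log-density at the off-manifold origin:",
      gm.score_samples(np.zeros((1, 2)))[0])
```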
We develop the theory of a metric, which we call the $\nu$-based Wasserstein metric and denote by $W_\nu$, on the set of probability measures $\mathcal P(X)$ on a domain $X \subseteq \mathbb{R}^m$. This metric is based on a slight refinement of the notion of generalized geodesics with respect to a base measure $\nu$ and is relevant in particular for the case when $\nu$ is singular with respect to $m$-dimensional Lebesgue measure; it is also closely related to the concept of linearized optimal transport. The $\nu$-based Wasserstein metric is defined in terms of an iterated variational problem involving optimal transport to $\nu$; we also characterize it in terms of integrals of the classical Wasserstein distance between the conditional probabilities and through limits of certain multi-marginal optimal transport problems. As we vary the base measure $\nu$, the $\nu$-based Wasserstein metric interpolates between the usual quadratic Wasserstein distance and a metric associated with the uniquely defined generalized geodesics obtained when $\nu$ is sufficiently regular. When $\nu$ concentrates on a lower dimensional submanifold of $\mathbb{R}^m$, we prove that the variational problem in the definition of the $\nu$-based Wasserstein distance has a unique solution. We establish geodesic convexity of the usual class of functionals and of the set of source measures $\mu$ such that optimal transport between $\mu$ and $\nu$ satisfies a strengthening of the generalized nestedness condition introduced in \cite{McCannPass20}. We finally introduce a slight variant of the dual metric mentioned above in order to prove convergence of an iterative scheme to solve a variational problem arising in game theory.
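The linearized-OT construction that $W_\nu$ is closely related to admits a simple discrete sketch: transport a base sample $\nu$ optimally onto two target samples and compare the resulting maps in $L^2(\nu)$. This illustrates only the linearized comparison, not the paper's iterated variational definition of $W_\nu$; the point clouds below are arbitrary.

```python
# Discrete linearized OT: compare two measures through their optimal
# assignments from a common base sample nu (quadratic ground cost).
import numpy as np
from scipy.optimize import linear_sum_assignment
from scipy.spatial.distance import cdist

rng = np.random.default_rng(4)
nu = rng.standard_normal((200, 2))             # base sample
mu1 = rng.standard_normal((200, 2)) + [2, 0]   # first target sample
mu2 = rng.standard_normal((200, 2)) + [0, 2]   # second target sample

def ot_map(src, tgt):
    # Optimal assignment between equal-size samples for squared cost.
    _, col = linear_sum_assignment(cdist(src, tgt, "sqeuclidean"))
    return tgt[col]                            # image of each source point

T1, T2 = ot_map(nu, mu1), ot_map(nu, mu2)
lin_ot = np.sqrt(np.mean(np.sum((T1 - T2) ** 2, axis=1)))
print("linearized OT distance between mu1 and mu2:", lin_ot)
```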
The matching principles behind optimal transport (OT) play an increasingly important role in machine learning, a trend which can be observed when OT is used to disambiguate datasets in applications (e.g. single-cell genomics) or used to improve more complex methods (e.g. balanced attention in transformers or self-supervised learning). To scale to more challenging problems, there is a growing consensus that OT requires solvers that can operate on millions, not thousands, of points. The low-rank optimal transport (LOT) approach advocated in \cite{scetbon2021lowrank} holds several promises in that regard, and was shown to complement more established entropic regularization approaches, being able to insert itself in more complex pipelines, such as quadratic OT. LOT restricts the search for low-cost couplings to those that have low nonnegative rank, yielding linear-time algorithms in cases of interest. However, these promises can only be fulfilled if the LOT approach is seen as a legitimate contender to entropic regularization when compared on properties of interest, where the scorecard typically includes theoretical properties (statistical complexity and relation to other methods) or practical aspects (debiasing, hyperparameter tuning, initialization). We target each of these areas in this paper in order to cement the impact of low-rank approaches in computational OT.
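For context, the entropic-regularization baseline that LOT is benchmarked against can be sketched in a few lines of Sinkhorn iterations; the LOT solver itself (which factors couplings through a low nonnegative rank) is not implemented here, and the regularization strength and sample sizes are arbitrary.

```python
# Plain Sinkhorn iterations for entropy-regularized OT (the baseline the
# abstract contrasts with low-rank OT; use log-domain updates for small eps).
import numpy as np

rng = np.random.default_rng(5)
x = rng.standard_normal((300, 2))
y = rng.standard_normal((300, 2)) + 1.0
C = ((x[:, None, :] - y[None, :, :]) ** 2).sum(-1)   # quadratic cost
a = b = np.full(300, 1.0 / 300)                      # uniform marginals
eps = 0.5
K = np.exp(-C / eps)

u = np.ones(300)
for _ in range(500):                  # Sinkhorn fixed-point iterations
    v = b / (K.T @ u)
    u = a / (K @ v)

P = u[:, None] * K * v[None, :]       # entropic optimal coupling
print("entropic transport cost:", np.sum(P * C))
```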
Background: Instrumental variables (IVs) can be used to provide evidence as to whether a treatment X has a causal effect on an outcome Y. Even if the instrument Z satisfies the three core IV assumptions of relevance, independence and the exclusion restriction, further assumptions are required to identify the average causal effect (ACE) of X on Y. Sufficient assumptions for this include: homogeneity in the causal effect of X on Y; homogeneity in the association of Z with X; and no effect modification (NEM). Methods: We describe the NO Simultaneous Heterogeneity (NOSH) assumption, which requires the heterogeneity in the X-Y causal effect to be mean independent of (i.e., uncorrelated with) both Z and the heterogeneity in the Z-X association. This holds, for example, if there are no common modifiers of the X-Y effect and the Z-X association, and the X-Y effect is additive linear. We illustrate NOSH using simulations and by re-examining selected published studies. Results: When NOSH holds, the Wald estimand equals the ACE even if both homogeneity assumptions and NEM (which we demonstrate to be special cases of, and therefore stronger than, NOSH) are violated. Conclusions: NOSH is sufficient for identifying the ACE using IVs. Since NOSH is weaker than existing sufficient assumptions, ACE identification may be more plausible than previously anticipated.
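The central claim is easy to probe by simulation: generate heterogeneous Z-X and X-Y effects that are mutually independent (a NOSH-type condition), so that both homogeneity assumptions fail, and check that the Wald ratio Cov(Z,Y)/Cov(Z,X) still recovers the ACE. The data-generating process below is a hypothetical illustration, not one of the paper's simulations.

```python
# Wald ratio under heterogeneous effects satisfying a NOSH-type condition.
import numpy as np

rng = np.random.default_rng(6)
n = 200_000
Z = rng.binomial(1, 0.5, n)               # instrument
U = rng.standard_normal(n)                # unmeasured confounder
alpha = rng.gamma(2.0, 1.0, n)            # heterogeneous Z->X effects
beta = 1.0 + rng.standard_normal(n)       # heterogeneous X->Y effects, ACE = 1
X = alpha * Z + U + rng.standard_normal(n)
Y = beta * X + U + rng.standard_normal(n)

# beta is independent of both Z and alpha, so the Wald ratio equals the ACE.
wald = np.cov(Z, Y)[0, 1] / np.cov(Z, X)[0, 1]
print("Wald estimate:", wald, "  sample-average causal effect:", beta.mean())
```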
We revisit the method of mixtures, also known as the Laplace method, to study the concentration phenomenon in generic exponential families. Combining properties of the Bregman divergence associated with the log-partition function of the family with the method of mixtures for supermartingales, we establish a generic bound controlling the Bregman divergence between the parameter of the family and a finite-sample estimate of that parameter. Our bound is time-uniform and involves a quantity that extends the classical information gain to exponential families, which we call the Bregman information gain. For the practitioner, we instantiate this novel bound for several classical families, e.g., Gaussian, Bernoulli, Exponential, Weibull, Pareto, Poisson and Chi-square, yielding explicit forms of the confidence sets and of the Bregman information gain. We further compare the resulting confidence bounds numerically to state-of-the-art alternatives for time-uniform concentration, and show that this novel method yields competitive results. Finally, we highlight the benefit of our concentration bounds on some illustrative applications.
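In the Bernoulli family, for instance, the Bregman divergence of the log-partition function is the binary KL divergence, and already a fixed-$n$ Chernoff argument yields a KL-shaped confidence set; the sketch below computes such a set by root-finding. (The paper's time-uniform threshold involving the Bregman information gain is slightly larger than the fixed-$n$ threshold used here.)

```python
# KL-shaped (Bregman) confidence set for a Bernoulli mean:
# {p : kl(phat, p) <= log(2/delta) / n}, solved by root-finding.
import numpy as np
from scipy.optimize import brentq

def kl(p, q):
    p = np.clip(p, 1e-12, 1 - 1e-12)
    q = np.clip(q, 1e-12, 1 - 1e-12)
    return p * np.log(p / q) + (1 - p) * np.log((1 - p) / (1 - q))

n, phat, delta = 500, 0.31, 0.05
thr = np.log(2 / delta) / n              # fixed-n Chernoff threshold

lo = brentq(lambda q: kl(phat, q) - thr, 1e-12, phat)
hi = brentq(lambda q: kl(phat, q) - thr, phat, 1 - 1e-12)
print(f"95% KL confidence set for p: [{lo:.4f}, {hi:.4f}]")
```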
We study the problem of estimating an unknown parameter in a distributed and online manner. Existing work on distributed online learning typically either focuses on asymptotic analysis or provides bounds on regret; however, such results may not directly translate into bounds on the error of the learned model after a finite number of time-steps. In this paper, we propose a distributed online estimation algorithm which enables each agent in a network to improve its estimation accuracy by communicating with neighbors. We provide non-asymptotic bounds on the estimation error, leveraging the statistical properties of the underlying model. Our analysis demonstrates a trade-off between estimation error and communication costs, and allows us to determine a time at which communication can be stopped (because of its cost) while still meeting a desired estimation accuracy. We also provide a numerical example to validate our results.
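A toy consensus-plus-innovations loop conveys the communication/accuracy trade-off: each agent averages its neighbors' estimates (communication) and corrects with its own noisy observation of the unknown parameter. This is a generic scheme for illustration, not the paper's algorithm; the network, weight matrix, and step sizes are arbitrary.

```python
# Consensus + innovations for distributed online estimation of a scalar.
import numpy as np

rng = np.random.default_rng(7)
theta = 5.0                                  # unknown parameter
W = np.array([[.5, .5, 0., 0.],              # doubly stochastic weights
              [.5, .25, .25, 0.],            # for a 4-agent path graph
              [0., .25, .25, .5],
              [0., 0., .5, .5]])
x = np.zeros(4)                              # initial local estimates
for t in range(1, 201):
    y = theta + rng.standard_normal(4)       # local noisy observations
    x = W @ x + (1.0 / t) * (y - x)          # consensus + innovation step

print("final estimates:", x, " errors:", np.abs(x - theta))
```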