Choosing models from a hypothesis space is a frequent task in approximation theory and inverse problems. Cross-validation is a classical tool in the learner's repertoire to compare the goodness of fit for different reconstruction models. Much work has been dedicated to computing this quantity in a fast manner but tackling its theoretical properties occurs to be difficult. So far, most optimality results are stated in an asymptotic fashion. In this paper we propose a concentration inequality on the difference of cross-validation score and the risk functional with respect to the squared error. This gives a pre-asymptotic bound which holds with high probability. For the assumptions we rely on bounds on the uniform error of the model which allow for a broadly applicable framework. We support our claims by applying this machinery to Shepard's model, where we are able to determine precise constants of the concentration inequality. Numerical experiments in combination with fast algorithms indicate the applicability of our results.
We are interested in the optimization of convex domains under a PDE constraint. Due to the difficulties of approximating convex domains in $\mathbb{R}^3$, the restriction to rotationally symmetric domains is used to reduce shape optimization problems to a two-dimensional setting. For the optimization of an eigenvalue arising in a problem of optimal insulation, the existence of an optimal domain is proven. An algorithm is proposed that can be applied to general shape optimization problems under the geometric constraints of convexity and rotational symmetry. The approximated optimal domains for the eigenvalue problem in optimal insulation are discussed.
We consider an elliptic linear-quadratic parameter estimation problem with a finite number of parameters. An adaptive finite element method driven by an a posteriori error estimator for the error in the parameters is presented. Unlike prior results in the literature, our estimator, which is composed of standard energy error residual estimators for the state equation and suitable co-state problems, reflects the faster convergence of the parameter error compared to the (co)-state variables. We show optimal convergence rates of our method; in particular and unlike prior works, we prove that the estimator decreases with a rate that is the sum of the best approximation rates of the state and co-state variables. Experiments confirm that our method matches the convergence rate of the parameter error.
We introduce a framework for obtaining tight mixing times for Markov chains based on what we call restricted modified log-Sobolev inequalities. Modified log-Sobolev inequalities (MLSI) quantify the rate of relative entropy contraction for the Markov operator, and are notoriously difficult to establish. However, infinitesimally close to stationarity, entropy contraction becomes equivalent to variance contraction, a.k.a. a Poincare inequality, which is significantly easier to establish through, e.g., spectral analysis. Motivated by this observation, we study restricted modified log-Sobolev inequalities that guarantee entropy contraction not for all starting distributions, but for those in a large neighborhood of the stationary distribution. We show how to sample from the hardcore and Ising models on $n$-node graphs that have a constant $\delta$ relative gap to the tree-uniqueness threshold, in nearly-linear time $\widetilde O_{\delta}(n)$. Notably, our bound does not depend on the maximum degree $\Delta$, and is therefore optimal even for high-degree graphs. This improves on prior mixing time bounds of $\widetilde O_{\delta, \Delta}(n)$ and $\widetilde O_{\delta}(n^2)$, established via (non-restricted) modified log-Sobolev and Poincare inequalities respectively. We further show that optimal concentration inequalities can still be achieved from the restricted form of modified log-Sobolev inequalities. To establish restricted entropy contraction, we extend the entropic independence framework of Anari, Jain, Koehler, Pham, and Vuong to the setting of distributions that are spectrally independent under a restricted set of external fields. We also develop an orthogonal trick that might be of independent interest: utilizing Bernoulli factories we show how to implement Glauber dynamics updates on high-degree graphs in $O(1)$ time, assuming standard adjacency array representation of the graph.
We investigate identifying the boundary of a domain from sample points in the domain. We introduce new estimators for the normal vector to the boundary, distance of a point to the boundary, and a test for whether a point lies within a boundary strip. The estimators can be efficiently computed and are more accurate than the ones present in the literature. We provide rigorous error estimates for the estimators. Furthermore we use the detected boundary points to solve boundary-value problems for PDE on point clouds. We prove error estimates for the Laplace and eikonal equations on point clouds. Finally we provide a range of numerical experiments illustrating the performance of our boundary estimators, applications to PDE on point clouds, and tests on image data sets.
A \emph{strong coreset} for the mean queries of a set $P$ in ${\mathbb{R}}^d$ is a small weighted subset $C\subseteq P$, which provably approximates its sum of squared distances to any center (point) $x\in {\mathbb{R}}^d$. A \emph{weak coreset} is (also) a small weighted subset $C$ of $P$, whose mean approximates the mean of $P$. While computing the mean of $P$ can be easily computed in linear time, its coreset can be used to solve harder constrained version, and is in the heart of generalizations such as coresets for $k$-means clustering. In this paper, we survey most of the mean coreset construction techniques, and suggest a unified analysis methodology for providing and explaining classical and modern results including step-by-step proofs. In particular, we collected folklore and scattered related results, some of which are not formally stated elsewhere. Throughout this survey, we present, explain, and prove a set of techniques, reductions, and algorithms very widespread and crucial in this field. However, when put to use in the (relatively simple) mean problem, such techniques are much simpler to grasp. The survey may help guide new researchers unfamiliar with the field, and introduce them to the very basic foundations of coresets, through a simple, yet fundamental, problem. Experts in this area might appreciate the unified analysis flow, and the comparison table for existing results. Finally, to encourage and help practitioners and software engineers, we provide full open source code for all presented algorithms.
NASA's Global Ecosystem Dynamics Investigation (GEDI) is a key climate mission whose goal is to advance our understanding of the role of forests in the global carbon cycle. While GEDI is the first space-based LIDAR explicitly optimized to measure vertical forest structure predictive of aboveground biomass, the accurate interpretation of this vast amount of waveform data across the broad range of observational and environmental conditions is challenging. Here, we present a novel supervised machine learning approach to interpret GEDI waveforms and regress canopy top height globally. We propose a probabilistic deep learning approach based on an ensemble of deep convolutional neural networks(CNN) to avoid the explicit modelling of unknown effects, such as atmospheric noise. The model learns to extract robust features that generalize to unseen geographical regions and, in addition, yields reliable estimates of predictive uncertainty. Ultimately, the global canopy top height estimates produced by our model have an expected RMSE of 2.7 m with low bias.
In order to avoid the curse of dimensionality, frequently encountered in Big Data analysis, there was a vast development in the field of linear and nonlinear dimension reduction techniques in recent years. These techniques (sometimes referred to as manifold learning) assume that the scattered input data is lying on a lower dimensional manifold, thus the high dimensionality problem can be overcome by learning the lower dimensionality behavior. However, in real life applications, data is often very noisy. In this work, we propose a method to approximate $\mathcal{M}$ a $d$-dimensional $C^{m+1}$ smooth submanifold of $\mathbb{R}^n$ ($d \ll n$) based upon noisy scattered data points (i.e., a data cloud). We assume that the data points are located "near" the lower dimensional manifold and suggest a non-linear moving least-squares projection on an approximating $d$-dimensional manifold. Under some mild assumptions, the resulting approximant is shown to be infinitely smooth and of high approximation order (i.e., $O(h^{m+1})$, where $h$ is the fill distance and $m$ is the degree of the local polynomial approximation). The method presented here assumes no analytic knowledge of the approximated manifold and the approximation algorithm is linear in the large dimension $n$. Furthermore, the approximating manifold can serve as a framework to perform operations directly on the high dimensional data in a computationally efficient manner. This way, the preparatory step of dimension reduction, which induces distortions to the data, can be avoided altogether.
Implicit probabilistic models are models defined naturally in terms of a sampling procedure and often induces a likelihood function that cannot be expressed explicitly. We develop a simple method for estimating parameters in implicit models that does not require knowledge of the form of the likelihood function or any derived quantities, but can be shown to be equivalent to maximizing likelihood under some conditions. Our result holds in the non-asymptotic parametric setting, where both the capacity of the model and the number of data examples are finite. We also demonstrate encouraging experimental results.
In this work, we consider the distributed optimization of non-smooth convex functions using a network of computing units. We investigate this problem under two regularity assumptions: (1) the Lipschitz continuity of the global objective function, and (2) the Lipschitz continuity of local individual functions. Under the local regularity assumption, we provide the first optimal first-order decentralized algorithm called multi-step primal-dual (MSPD) and its corresponding optimal convergence rate. A notable aspect of this result is that, for non-smooth functions, while the dominant term of the error is in $O(1/\sqrt{t})$, the structure of the communication network only impacts a second-order term in $O(1/t)$, where $t$ is time. In other words, the error due to limits in communication resources decreases at a fast rate even in the case of non-strongly-convex objective functions. Under the global regularity assumption, we provide a simple yet efficient algorithm called distributed randomized smoothing (DRS) based on a local smoothing of the objective function, and show that DRS is within a $d^{1/4}$ multiplicative factor of the optimal convergence rate, where $d$ is the underlying dimension.
In this paper, we study the optimal convergence rate for distributed convex optimization problems in networks. We model the communication restrictions imposed by the network as a set of affine constraints and provide optimal complexity bounds for four different setups, namely: the function $F(\xb) \triangleq \sum_{i=1}^{m}f_i(\xb)$ is strongly convex and smooth, either strongly convex or smooth or just convex. Our results show that Nesterov's accelerated gradient descent on the dual problem can be executed in a distributed manner and obtains the same optimal rates as in the centralized version of the problem (up to constant or logarithmic factors) with an additional cost related to the spectral gap of the interaction matrix. Finally, we discuss some extensions to the proposed setup such as proximal friendly functions, time-varying graphs, improvement of the condition numbers.