We analyze the mixing time of Metropolized Hamiltonian Monte Carlo (HMC) with the leapfrog integrator to sample from a distribution on $\mathbb{R}^d$ whose log-density is smooth, has Lipschitz Hessian in Frobenius norm and satisfies isoperimetry. We bound the gradient complexity to reach $\epsilon$ error in total variation distance from a warm start by $\tilde O(d^{1/4}\text{polylog}(1/\epsilon))$ and demonstrate the benefit of choosing the number of leapfrog steps to be larger than 1. To surpass previous analysis on Metropolis-adjusted Langevin algorithm (MALA) that has $\tilde{O}(d^{1/2}\text{polylog}(1/\epsilon))$ dimension dependency in Wu et al. (2022), we reveal a key feature in our proof that the joint distribution of the location and velocity variables of the discretization of the continuous HMC dynamics stays approximately invariant. This key feature, when shown via induction over the number of leapfrog steps, enables us to obtain estimates on moments of various quantities that appear in the acceptance rate control of Metropolized HMC. Moreover, to deal with another bottleneck on the HMC proposal distribution overlap control in the literature, we provide a new approach to upper bound the Kullback-Leibler divergence between push-forwards of the Gaussian distribution through HMC dynamics initialized at two different points. Notably, our analysis does not require log-concavity or independence of the marginals, and only relies on an isoperimetric inequality. To illustrate the applicability of our result, several examples of natural functions that fall into our framework are discussed.
This article studies the convergence properties of trans-dimensional MCMC algorithms when the total number of models is finite. It is shown that, for reversible and some non-reversible trans-dimensional Markov chains, under mild conditions, geometric convergence is guaranteed if the Markov chains associated with the within-model moves are geometrically ergodic. This result is proved in an $L^2$ framework using the technique of Markov chain decomposition. While the technique was previously developed for reversible chains, this work extends it to the point that it can be applied to some commonly used non-reversible chains. Under geometric convergence, a central limit theorem holds for ergodic averages, even in the absence of Harris ergodicity. This allows for the construction of simultaneous confidence intervals for features of the target distribution. This procedure is rigorously examined in a trans-dimensional setting, and special attention is given to the case where the asymptotic covariance matrix in the central limit theorem is singular. The theory and methodology herein are applied to reversible jump algorithms for two Bayesian models: a robust autoregression with unknown model order, and a probit regression with variable selection.
Using techniques developed recently in the field of compressed sensing we prove new upper bounds for general (nonlinear) sampling numbers of (quasi-)Banach smoothness spaces in $L^2$. In particular, we show that in relevant cases such as mixed and isotropic weighted Wiener classes or Sobolev spaces with mixed smoothness, sampling numbers in $L^2$ can be upper bounded by best $n$-term trigonometric widths in $L^\infty$. We describe a recovery procedure from $m$ function values based on $\ell^1$-minimization (basis pursuit denoising). With this method, a significant gain in the rate of convergence compared to recently developed linear recovery methods is achieved. In this deterministic worst-case setting we see an additional speed-up of $m^{-1/2}$ (up to log factors) compared to linear methods in case of weighted Wiener spaces. For their quasi-Banach counterparts even arbitrary polynomial speed-up is possible. Surprisingly, our approach allows to recover mixed smoothness Sobolev functions belonging to $S^r_pW(\mathbb{T}^d)$ on the $d$-torus with a logarithmically better rate of convergence than any linear method can achieve when $1 < p < 2$ and $d$ is large. This effect is not present for isotropic Sobolev spaces.
The paper addresses an optimal ensemble control problem for nonlocal continuity equations on the space of probability measures. We admit the general nonlinear cost functional, and an option to directly control the nonlocal terms of the driving vector field. For this problem, we design a descent method based on Pontryagin's maximum principle (PMP). To this end, we derive a new form of PMP with a decoupled Hamiltonian system. Specifically, we extract the adjoint system of linear nonlocal balance laws on the space of signed measures and prove its well-posedness. As an implementation of the designed descent method, we propose an indirect deterministic numeric algorithm with backtracking. We prove the convergence of the algorithm and illustrate its modus operandi by treating a simple case involving a Kuramoto-type model of a population of interacting oscillators.
In this paper, we investigate the properties of standard and multilevel Monte Carlo methods for weak approximation of solutions of stochastic differential equations (SDEs) driven by the infinite-dimensional Wiener process and Poisson random measure with the Lipschitz payoff function. The error of the truncated dimension randomized numerical scheme, which is determined by two parameters, i.e grid density $n \in \mathbb{N}_{+}$ and truncation dimension parameter $M \in \mathbb{N}_{+},$ is of the order $n^{-1/2}+\delta(M)$ such that $\delta(\cdot)$ is positive and decreasing to $0$. The paper introduces the complexity model and provides proof for the upper complexity bound of the multilevel Monte Carlo method which depends on two increasing sequences of parameters for both $n$ and $M.$ The complexity is measured in terms of upper bound for mean-squared error and compared with the complexity of the standard Monte Carlo algorithm. The results from numerical experiments as well as Python and CUDA C implementation are also reported.
Practitioners in diverse fields such as healthcare, economics and education are eager to apply machine learning to improve decision making. The cost and impracticality of performing experiments and a recent monumental increase in electronic record keeping has brought attention to the problem of evaluating decisions based on non-experimental observational data. This is the setting of this work. In particular, we study estimation of individual-level causal effects, such as a single patient's response to alternative medication, from recorded contexts, decisions and outcomes. We give generalization bounds on the error in estimated effects based on distance measures between groups receiving different treatments, allowing for sample re-weighting. We provide conditions under which our bound is tight and show how it relates to results for unsupervised domain adaptation. Led by our theoretical results, we devise representation learning algorithms that minimize our bound, by regularizing the representation's induced treatment group distance, and encourage sharing of information between treatment groups. We extend these algorithms to simultaneously learn a weighted representation to further reduce treatment group distances. Finally, an experimental evaluation on real and synthetic data shows the value of our proposed representation architecture and regularization scheme.
This paper proposes a non-centered parameterization based infinite-dimensional mean-field variational inference (NCP-iMFVI) approach for solving the hierarchical Bayesian inverse problems. This method can generate available estimates from the approximated posterior distribution efficiently. To avoid the mutually singular obstacle that occurred in the infinite-dimensional hierarchical approach, we propose a rigorous theory of the non-centered variational Bayesian approach. Since the non-centered parameterization weakens the connection between the parameter and the hyper-parameter, we can introduce the hyper-parameter to all terms of the eigendecomposition of the prior covariance operator. We also show the relationships between the NCP-iMFVI and infinite-dimensional hierarchical approaches with centered parameterization. The proposed algorithm is applied to three inverse problems governed by the simple smooth equation, the Helmholtz equation, and the steady-state Darcy flow equation. Numerical results confirm our theoretical findings, illustrate the efficiency of solving the iMFVI problem formulated by large-scale linear and nonlinear statistical inverse problems, and verify the mesh-independent property.
The theory of Koopman operators allows to deploy non-parametric machine learning algorithms to predict and analyze complex dynamical systems. Estimators such as principal component regression (PCR) or reduced rank regression (RRR) in kernel spaces can be shown to provably learn Koopman operators from finite empirical observations of the system's time evolution. Scaling these approaches to very long trajectories is a challenge and requires introducing suitable approximations to make computations feasible. In this paper, we boost the efficiency of different kernel-based Koopman operator estimators using random projections (sketching). We derive, implement and test the new "sketched" estimators with extensive experiments on synthetic and large-scale molecular dynamics datasets. Further, we establish non asymptotic error bounds giving a sharp characterization of the trade-offs between statistical learning rates and computational efficiency. Our empirical and theoretical analysis shows that the proposed estimators provide a sound and efficient way to learn large scale dynamical systems. In particular our experiments indicate that the proposed estimators retain the same accuracy of PCR or RRR, while being much faster.
This note addresses the question of optimally estimating a linear functional of an object acquired through linear observations corrupted by random noise, where optimality pertains to a worst-case setting tied to a symmetric, convex, and closed model set containing the object. It complements the article "Statistical Estimation and Optimal Recovery" published in the Annals of Statistics in 1994. There, Donoho showed (among other things) that, for Gaussian noise, linear maps provide near-optimal estimation schemes relatively to a performance measure relevant in Statistical Estimation. Here, we advocate for a different performance measure arguably more relevant in Optimal Recovery. We show that, relatively to this new measure, linear maps still provide near-optimal estimation schemes even if the noise is merely log-concave. Our arguments, which make a connection to the deterministic noise situation and bypass properties specific to the Gaussian case, offer an alternative to parts of Donoho's proof.
The majority of fault-tolerant distributed algorithms are designed assuming a nominal corruption model, in which at most a fraction $f_n$ of parties can be corrupted by the adversary. However, due to the infamous Sybil attack, nominal models are not sufficient to express the trust assumptions in open (i.e., permissionless) settings. Instead, permissionless systems typically operate in a weighted model, where each participant is associated with a weight and the adversary can corrupt a set of parties holding at most a fraction $f_w$ of total weight. In this paper, we suggest a simple way to transform a large class of protocols designed for the nominal model into the weighted model. To this end, we formalize and solve three novel optimization problems, which we collectively call the weight reduction problems, that allow us to map large real weights into small integer weights while preserving the properties necessary for the correctness of the protocols. In all cases, we manage to keep the sum of the integer weights to be at most linear in the number of parties, resulting in extremely efficient protocols for the weighted model. Moreover, we demonstrate that, on weight distributions that emerge in practice, the sum of the integer weights tends to be far from the theoretical worst-case and, often even smaller than the number of participants. While, for some protocols, our transformation requires an arbitrarily small reduction in resilience (i.e., $f_w = f_n - \epsilon$), surprisingly, for many important problems we manage to obtain weighted solutions with the same resilience ($f_w = f_n$) as nominal ones. Notable examples include asynchronous consensus, verifiable secret sharing, erasure-coded distributed storage and broadcast protocols.
Monte Carlo methods represent a cornerstone of computer science. They allow to sample high dimensional distribution functions in an efficient way. In this paper we consider the extension of Automatic Differentiation (AD) techniques to Monte Carlo process, addressing the problem of obtaining derivatives (and in general, the Taylor series) of expectation values. Borrowing ideas from the lattice field theory community, we examine two approaches. One is based on reweighting while the other represents an extension of the Hamiltonian approach typically used by the Hybrid Monte Carlo (HMC) and similar algorithms. We show that the Hamiltonian approach can be understood as a change of variables of the reweighting approach, resulting in much reduced variances of the coefficients of the Taylor series. This work opens the door to find other variance reduction techniques for derivatives of expectation values.