We consider a family of unadjusted generalized HMC samplers, which includes standard position HMC samplers and discretizations of the underdamped Langevin process. A detailed analysis and optimization of the parameters is conducted in the Gaussian case; it shows that partial velocity refreshment improves the convergence rate from $1/\kappa$ to $1/\sqrt{\kappa}$ in terms of the condition number $\kappa$, compared with classical full refreshment. A similar effect is observed empirically for two related algorithms, namely Metropolis-adjusted gHMC and kinetic piecewise-deterministic Markov processes. Then, a stochastic gradient version of the samplers is considered, for which dimension-free convergence rates are established for log-concave smooth targets over a large range of parameters, gathering in a unified framework previous results on position HMC and underdamped Langevin and extending them to HMC with inertia.
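As a rough illustration of the sampler family described in this abstract, the following minimal sketch runs an unadjusted gHMC chain on a Gaussian target with diagonal covariance. The function names, the parameterization of the partial refreshment by a damping factor `eta`, and the choice of a leapfrog integrator are assumptions made for illustration and are not taken from the paper. Under these assumptions, `eta = 0` corresponds to full velocity refreshment (position HMC), while a single leapfrog step with `eta` close to 1 resembles a discretization of the underdamped Langevin process.

```python
import math
import random


def ghmc_gaussian(precisions, n_iters, step, n_leapfrog, eta, seed=0):
    """Unadjusted gHMC sketch targeting N(0, diag(1 / precisions)).

    eta in [0, 1) is a hypothetical damping parameter: the velocity is
    partially refreshed as v <- eta * v + sqrt(1 - eta^2) * noise.
    """
    rng = random.Random(seed)
    x = [0.0 for _ in precisions]
    v = [rng.gauss(0.0, 1.0) for _ in precisions]
    noise_scale = math.sqrt(1.0 - eta * eta)
    samples = []
    for _ in range(n_iters):
        # Partial velocity refreshment (eta = 0 gives full refreshment).
        v = [eta * vi + noise_scale * rng.gauss(0.0, 1.0) for vi in v]
        # Leapfrog steps for U(x) = 0.5 * sum_i precisions[i] * x[i]^2,
        # whose gradient in coordinate i is precisions[i] * x[i].
        for _ in range(n_leapfrog):
            v = [vi - 0.5 * step * p * xi for vi, p, xi in zip(v, precisions, x)]
            x = [xi + step * vi for xi, vi in zip(x, v)]
            v = [vi - 0.5 * step * p * xi for vi, p, xi in zip(v, precisions, x)]
        # No Metropolis correction: the chain is unadjusted, hence biased.
        samples.append(list(x))
    return samples


def variance(values):
    """Empirical (population) variance of a list of floats."""
    mean = sum(values) / len(values)
    return sum((val - mean) ** 2 for val in values) / len(values)


if __name__ == "__main__":
    # Condition number kappa = 100 (precisions 1 and 100).
    chain = ghmc_gaussian([1.0, 100.0], n_iters=50000, step=0.1,
                          n_leapfrog=1, eta=0.9)
    print(variance([s[0] for s in chain]), variance([s[1] for s in chain]))
```

Because no accept/reject step is applied, the empirical variances differ slightly from the target values 1 and 0.01; the size of this bias depends on the step size.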
Let $f \colon \mathcal{M} \to \mathbb{R}$ be a Lipschitz and geodesically convex function defined on a $d$-dimensional Riemannian manifold $\mathcal{M}$. Does there exist a first-order deterministic algorithm which (a) uses at most $O(\mathrm{poly}(d) \log(\epsilon^{-1}))$ subgradient queries to find a point with target accuracy $\epsilon$, and (b) requires only $O(\mathrm{poly}(d))$ arithmetic operations per query? In convex optimization, the classical ellipsoid method achieves this. After detailing related work, we provide an ellipsoid-like algorithm with query complexity $O(d^2 \log^2(\epsilon^{-1}))$ and per-query complexity $O(d^2)$ for the limited case where $\mathcal{M}$ has constant curvature (hemisphere or hyperbolic space). We then detail possible approaches and corresponding obstacles for designing an ellipsoid-like method for general Riemannian manifolds.
In this paper, we derive results about the limiting distribution of the empirical magnetization vector and the maximum likelihood (ML) estimates of the natural parameters in the tensor Curie-Weiss Potts model. Our results reveal surprising new phase transition phenomena, including the existence of a smooth curve in the interior of the parameter plane on which the magnetization vector and the ML estimates have mixture limiting distributions, the latter comprising both continuous and discrete components, and a superefficiency phenomenon of the ML estimates, which stipulates an $N^{-3/4}$ rate of convergence of the estimates to some non-Gaussian distribution at certain special points of one type and an $N^{-5/6}$ rate of convergence to some other non-Gaussian distribution at another special point of a different type. The last case can arise only for one particular value of the tuple of the tensor interaction order and the number of colors. These results are then used to derive asymptotic confidence intervals for the natural parameters at all points where consistent estimation is possible.
Least-squares programming is a popular tool in robotics due to its simplicity and the availability of open-source solvers. However, certain problems like sparse programming in the $\ell_0$- or $\ell_1$-norm for time-optimal control are not equivalently solvable. In this work, we propose a non-linear hierarchical least-squares program (NL-HLSP) for time-optimal control of non-linear discrete dynamic systems. We use a continuous approximation of the Heaviside step function with an additional term that avoids vanishing gradients. We use a simple discretization method that keeps states and controls piecewise constant between discretization steps. This way, we obtain an NL-HLSP that is comparatively easy to implement, in contrast to direct transcription approaches to optimal control. We show that the NL-HLSP indeed recovers the discrete time-optimal control in the limit for resting goal points. We confirm the results in simulation for linear and non-linear control scenarios.
In this work, we consider space-time goal-oriented a posteriori error estimation for parabolic problems. Temporal and spatial discretizations are based on Galerkin finite elements of continuous and discontinuous type. The main objectives are the development and analysis of space-time estimators, in which the localization is based on a weak form employing a partition-of-unity. The resulting error indicators are used for temporal and spatial adaptivity. Our developments are substantiated with several numerical examples.
Multi-distribution learning is a natural generalization of PAC learning to settings with multiple data distributions. There remains a significant gap between the known upper and lower bounds for PAC-learnable classes. In particular, though we understand the sample complexity of learning a class of VC dimension $d$ on $k$ distributions to be $O(\epsilon^{-2} \ln(k)(d + k) + \min\{\epsilon^{-1} dk, \epsilon^{-4} \ln(k) d\})$, the best lower bound is $\Omega(\epsilon^{-2}(d + k \ln(k)))$. We discuss recent progress on this problem and some hurdles that are fundamental to the use of game dynamics in statistical learning.
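To get a feel for the gap mentioned in this abstract, the expressions inside the $O(\cdot)$ and $\Omega(\cdot)$ bounds can be evaluated numerically. The sketch below does only that: the function names are hypothetical, and the hidden constants of the asymptotic notation are ignored, so the numbers indicate how the two expressions scale rather than actual sample sizes.

```python
import math


def upper_bound_expr(eps, d, k):
    """Expression inside the O(.) upper bound, constants ignored."""
    first = eps ** -2 * math.log(k) * (d + k)
    second = min(d * k / eps, eps ** -4 * math.log(k) * d)
    return first + second


def lower_bound_expr(eps, d, k):
    """Expression inside the Omega(.) lower bound, constants ignored."""
    return eps ** -2 * (d + k * math.log(k))


if __name__ == "__main__":
    eps, d, k = 0.1, 10, 5
    print(upper_bound_expr(eps, d, k), lower_bound_expr(eps, d, k))
```

For $\epsilon = 0.1$, $d = 10$, $k = 5$, the two expressions evaluate to roughly 2914 and 1805 respectively.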
Convergence rate analyses of random walk Metropolis-Hastings Markov chains on general state spaces have largely focused on establishing sufficient conditions for geometric ergodicity or on analysis of mixing times. Geometric ergodicity is a key sufficient condition for the Markov chain Central Limit Theorem and allows rigorous approaches to assessing Monte Carlo error. The sufficient conditions for geometric ergodicity of the random walk Metropolis-Hastings Markov chain are refined and extended, which allows the analysis of previously inaccessible settings such as Bayesian Poisson regression. The key technical innovation is the development of explicit drift and minorization conditions for random walk Metropolis-Hastings, which allows explicit upper and lower bounds on the geometric rate of convergence. Further, lower bounds on the geometric rate of convergence are also developed using spectral theory. To date, the existing sufficient conditions for geometric ergodicity have not provided explicit constraints on the geometric rate of convergence because the method used only implies the existence of drift and minorization conditions. The theoretical results are applied to random walk Metropolis-Hastings algorithms for a class of exponential families and generalized linear models that address Bayesian regression problems.
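For readers unfamiliar with the setting, the following minimal sketch shows a random walk Metropolis-Hastings chain for a one-parameter Bayesian Poisson regression. It only illustrates the kind of algorithm the abstract analyzes. The log link, the Gaussian prior with standard deviation `prior_sd`, the Gaussian proposal, the toy data, and all function names are assumptions chosen for illustration and are not taken from the paper.

```python
import math
import random


def log_posterior(beta, xs, ys, prior_sd):
    """Unnormalized log posterior: Poisson likelihood with log link
    (rate_i = exp(x_i * beta)) plus a N(0, prior_sd^2) prior on beta.
    The log(y_i!) terms are constant in beta and are dropped."""
    log_lik = sum(y * x * beta - math.exp(x * beta) for x, y in zip(xs, ys))
    log_prior = -0.5 * (beta / prior_sd) ** 2
    return log_lik + log_prior


def rwm(xs, ys, n_iters, proposal_sd, prior_sd=10.0, seed=0):
    """Random walk Metropolis-Hastings with a symmetric Gaussian proposal.

    Returns the chain of beta values and the empirical acceptance rate.
    """
    rng = random.Random(seed)
    beta = 0.0
    current = log_posterior(beta, xs, ys, prior_sd)
    chain, accepted = [], 0
    for _ in range(n_iters):
        proposal = beta + proposal_sd * rng.gauss(0.0, 1.0)
        candidate = log_posterior(proposal, xs, ys, prior_sd)
        # Symmetric proposal: accept with probability min(1, ratio).
        # 1 - random() lies in (0, 1], so the logarithm is always defined.
        if math.log(1.0 - rng.random()) < candidate - current:
            beta, current = proposal, candidate
            accepted += 1
        chain.append(beta)
    return chain, accepted / n_iters


if __name__ == "__main__":
    xs = [0.0, 0.5, 1.0, 1.5, 2.0]   # toy covariates (made up)
    ys = [1, 2, 3, 4, 7]             # toy counts (made up)
    chain, rate = rwm(xs, ys, n_iters=20000, proposal_sd=0.3)
    kept = chain[2000:]              # discard burn-in
    print(sum(kept) / len(kept), rate)
```

The proposal scale `proposal_sd` is the main tuning parameter here; the sketch makes no claim about the convergence rate of the resulting chain.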
Active domain adaptation (ADA) aims to improve the model adaptation performance by incorporating active learning (AL) techniques to label a maximally informative subset of target samples. Conventional AL methods do not consider the existence of domain shift and hence fail to identify the truly valuable samples in the context of domain adaptation. To accommodate active learning and domain adaptation, two naturally different tasks, in a collaborative framework, we advocate that a customized learning strategy for the target data is the key to the success of ADA solutions. We present Divide-and-Adapt (DiaNA), a new ADA framework that partitions the target instances into four categories with stratified transferable properties. With a novel data subdivision protocol based on uncertainty and domainness, DiaNA can accurately recognize the most gainful samples. While sending the informative instances for annotation, DiaNA employs tailored learning strategies for the remaining categories. Furthermore, we propose an informativeness score that unifies the data partitioning criteria. This enables the use of a Gaussian mixture model (GMM) to automatically sample unlabeled data into the proposed four categories. Thanks to the "divide-and-adapt" spirit, DiaNA can handle data with large variations of domain gap. In addition, we show that DiaNA can generalize to different domain adaptation settings, such as unsupervised domain adaptation (UDA), semi-supervised domain adaptation (SSDA), source-free domain adaptation (SFDA), etc.
In inverse problems, one attempts to infer spatially variable functions from indirect measurements of a system. To practitioners of inverse problems, the concept of "information" is familiar when discussing key questions such as which parts of the function can be inferred accurately and which cannot. For example, it is generally understood that we can identify system parameters accurately only close to detectors, or along ray paths between sources and detectors, because we have "the most information" for these places. Although referenced in many publications, the "information" that is invoked in such contexts is not a well understood and clearly defined quantity. Herein, we present a definition of information density that is based on the variance of coefficients as derived from a Bayesian reformulation of the inverse problem. We then discuss three areas in which this information density can be useful in practical algorithms for the solution of inverse problems, and illustrate the usefulness in one of these areas -- how to choose the discretization mesh for the function to be reconstructed -- using numerical experiments.
Imitation learning (IL) seeks to teach agents specific tasks through expert demonstrations. One of the key approaches to IL is to define a distance between agent and expert and to find an agent policy that minimizes that distance. Optimal transport (OT) methods have been widely used in imitation learning as they provide ways to measure meaningful distances between agent and expert trajectories. However, the problem of how to optimally combine multiple expert demonstrations has not been widely studied. The standard method is to simply concatenate state(-action) trajectories, which is problematic when trajectories are multi-modal. We propose an alternative method that uses a multi-marginal optimal transport distance and enables the combination of multiple and diverse state trajectories in the OT sense, providing a more sensible geometric average of the demonstrations. Our approach enables an agent to learn from several experts. We analyze its efficiency on OpenAI Gym control environments and demonstrate that the standard method is not always optimal.
For every $g\geq 2$ we distinguish real period matrices of real Riemann surfaces of topological type $(g,0,0)$ from those of topological type $(g,k,1)$, with $k$ equal to one or two for $g$ even or odd respectively (Theorem B). To that end, we exhibit new invariants of real principally polarized abelian varieties of orthosymmetric type (Theorem A.1). As a direct application, we obtain an exhaustive criterion to decide the existence of real points on a real Riemann surface, requiring only one of its real period matrices and the evaluation of the sign of at most one (real) theta constant (Theorem C). Some of our real algebro-geometric tools first appeared in the framework of nonlinear integrable partial differential equations.