In applications of imprecise probability, analysts must compute lower (or upper) expectations, defined as the infimum of an expectation over a set of parameter values. Monte Carlo methods consistently approximate expectations at fixed parameter values, but can be costly to implement in grid search to locate minima over large subsets of the parameter space. We investigate the use of stochastic iterative root-finding methods for efficiently computing lower expectations. In two examples we illustrate the use of various stochastic approximation methods, and demonstrate their superior performance in comparison to grid search.
In this paper, we consider a Bayesian inverse problem modeled by elliptic partial differential equations (PDEs). Specifically, we propose a data-driven and model-based approach to accelerate the Hamiltonian Monte Carlo (HMC) method in solving large-scale Bayesian inverse problems. The key idea is to exploit (model-based) and construct (data-based) the intrinsic approximate low-dimensional structure of the underlying problem which consists of two components - a training component that computes a set of data-driven basis to achieve significant dimension reduction in the solution space, and a fast solving component that computes the solution and its derivatives for a newly sampled elliptic PDE with the constructed data-driven basis. Hence we achieve an effective data and model-based approach for the Bayesian inverse problem and overcome the typical computational bottleneck of HMC - repeated evaluation of the Hamiltonian involving the solution (and its derivatives) modeled by a complex system, a multiscale elliptic PDE in our case. We present numerical examples to demonstrate the accuracy and efficiency of the proposed method.
Stochastic gradient methods (SGMs) have been extensively used for solving stochastic problems or large-scale machine learning problems. Recent works employ various techniques to improve the convergence rate of SGMs for both convex and nonconvex cases. Most of them require a large number of samples in some or all iterations of the improved SGMs. In this paper, we propose a new SGM, named PStorm, for solving nonconvex nonsmooth stochastic problems. With a momentum-based variance reduction technique, PStorm can achieve the optimal complexity result $O(\varepsilon^{-3})$ to produce a stochastic $\varepsilon$-stationary solution, if a mean-squared smoothness condition holds and $\Theta(\varepsilon^{-1})$ samples are available for the initial update. Different from existing optimal methods, PStorm can still achieve a near-optimal complexity result $\tilde{O}(\varepsilon^{-3})$ by using only one or $O(1)$ samples in every update. With this property, PStorm can be applied to online learning problems that favor real-time decisions based on one or $O(1)$ new observations. In addition, for large-scale machine learning problems, PStorm can generalize better by small-batch training than other optimal methods that require large-batch training and the vanilla SGM, as we demonstrate on training a sparse fully-connected neural network and a sparse convolutional neural network.
Physics-based character animation has seen significant advances in recent years with the adoption of Deep Reinforcement Learning (DRL). However, DRL-based learning methods are usually computationally expensive and their performance crucially depends on the choice of hyperparameters. Tuning hyperparameters for these methods often requires repetitive training of control policies, which is even more computationally prohibitive. In this work, we propose a novel Curriculum-based Multi-Fidelity Bayesian Optimization framework (CMFBO) for efficient hyperparameter optimization of DRL-based character control systems. Using curriculum-based task difficulty as fidelity criterion, our method improves searching efficiency by gradually pruning search space through evaluation on easier motor skill tasks. We evaluate our method on two physics-based character control tasks: character morphology optimization and hyperparameter tuning of DeepMimic. Our algorithm significantly outperforms state-of-the-art hyperparameter optimization methods applicable for physics-based character animation. In particular, we show that hyperparameters optimized through our algorithm result in at least 5x efficiency gain comparing to author-released settings in DeepMimic.
We study high-dimensional Bayesian linear regression with product priors. Using the nascent theory of non-linear large deviations (Chatterjee and Dembo,2016), we derive sufficient conditions for the leading-order correctness of the naive mean-field approximation to the log-normalizing constant of the posterior distribution. Subsequently, assuming a true linear model for the observed data, we derive a limiting infinite dimensional variational formula for the log normalizing constant of the posterior. Furthermore, we establish that under an additional "separation" condition, the variational problem has a unique optimizer, and this optimizer governs the probabilistic properties of the posterior distribution. We provide intuitive sufficient conditions for the validity of this "separation" condition. Finally, we illustrate our results on concrete examples with specific design matrices.
Estimation and prediction in high dimensional multivariate factor stochastic volatility models is an important and active research area because such models allow a parsimonious representation of multivariate stochastic volatility. Bayesian inference for factor stochastic volatility models is usually done by Markov chain Monte Carlo methods, often by particle Markov chain Monte Carlo, which are usually slow for high dimensional or long time series because of the large number of parameters and latent states involved. Our article makes two contributions. The first is to propose fast and accurate variational Bayes methods to approximate the posterior distribution of the states and parameters in factor stochastic volatility models. The second contribution is to extend this batch methodology to develop fast sequential variational updates for prediction as new observations arrive. The methods are applied to simulated and real datasets and shown to produce good approximate inference and prediction compared to the latest particle Markov chain Monte Carlo approaches, but are much faster.
A fundamental problem in numerical analysis and approximation theory is approximating smooth functions by polynomials. A much harder version under recent consideration is to enforce bounds constraints on the approximating polynomial. In this paper, we consider the problem of approximating functions by polynomials whose Bernstein coefficients with respect to a given degree satisfy such bounds, which implies such bounds on the approximant. We frame the problem as an inequality-constrained optimization problem and give an algorithm for finding the Bernstein coefficients of the exact solution. Additionally, our method can be modified slightly to include equality constraints such as mass preservation. It also extends naturally to multivariate polynomials over a simplex.
We present three alternative derivations of the method of characteristics (MOC) for a second order nonlinear hyperbolic partial differential equation. The MOC gives rise to two mutually coupled systems of ordinary differential equations. As a special case we consider the Monge-Amp\`ere equation, for which we solve the system of ODE's using explicit one-step methods (Euler, Runge-Kutta) and spline interpolation. Numerical examples demonstrate the performance of the methods.
This paper concerns a convex, stochastic zeroth-order optimization (S-ZOO) problem, where the objective is to minimize the expectation of a cost function and its gradient is not accessible directly. To solve this problem, traditional optimization techniques mostly yield query complexities that grow polynomially with dimensionality, i.e., the number of function evaluations is a polynomial function of the number of decision variables. Consequently, these methods may not perform well in solving massive-dimensional problems arising in many modern applications. Although more recent methods can be provably dimension-insensitive, almost all of them work with arguably more stringent conditions such as everywhere sparse or compressible gradient. Thus, prior to this research, it was unknown whether dimension-insensitive S-ZOO is possible without such conditions. In this paper, we give an affirmative answer to this question by proposing a sparsity-inducing stochastic gradient-free (SI-SGF) algorithm. It is proved to achieve dimension-insensitive query complexity in both convex and strongly convex cases when neither gradient sparsity nor gradient compressibility is satisfied. Our numerical results demonstrate the strong potential of the proposed SI-SGF compared with existing alternatives.
The numerical solution of differential equations can be formulated as an inference problem to which formal statistical approaches can be applied. However, nonlinear partial differential equations (PDEs) pose substantial challenges from an inferential perspective, most notably the absence of explicit conditioning formula. This paper extends earlier work on linear PDEs to a general class of initial value problems specified by nonlinear PDEs, motivated by problems for which evaluations of the right-hand-side, initial conditions, or boundary conditions of the PDE have a high computational cost. The proposed method can be viewed as exact Bayesian inference under an approximate likelihood, which is based on discretisation of the nonlinear differential operator. Proof-of-concept experimental results demonstrate that meaningful probabilistic uncertainty quantification for the unknown solution of the PDE can be performed, while controlling the number of times the right-hand-side, initial and boundary conditions are evaluated. A suitable prior model for the solution of the PDE is identified using novel theoretical analysis of the sample path properties of Mat\'{e}rn processes, which may be of independent interest.
Stochastic gradient Markov chain Monte Carlo (SGMCMC) has become a popular method for scalable Bayesian inference. These methods are based on sampling a discrete-time approximation to a continuous time process, such as the Langevin diffusion. When applied to distributions defined on a constrained space, such as the simplex, the time-discretisation error can dominate when we are near the boundary of the space. We demonstrate that while current SGMCMC methods for the simplex perform well in certain cases, they struggle with sparse simplex spaces; when many of the components are close to zero. However, most popular large-scale applications of Bayesian inference on simplex spaces, such as network or topic models, are sparse. We argue that this poor performance is due to the biases of SGMCMC caused by the discretization error. To get around this, we propose the stochastic CIR process, which removes all discretization error and we prove that samples from the stochastic CIR process are asymptotically unbiased. Use of the stochastic CIR process within a SGMCMC algorithm is shown to give substantially better performance for a topic model and a Dirichlet process mixture model than existing SGMCMC approaches.