An efficient simulation-based methodology is proposed for the rolling window estimation of state space models, called particle rolling Markov chain Monte Carlo (MCMC) with double block sampling. In our method, which is based on Sequential Monte Carlo (SMC), particles are sequentially updated to approximate the posterior distribution for each window by learning new information and discarding old information from observations. The particles are refreshed with an MCMC algorithm when the importance weights degenerate. To avoid degeneracy, which is crucial for reducing the computation time, we introduce a block sampling scheme and generate multiple candidates with an algorithm based on conditional SMC. The theoretical discussion justifies the proposed methodology by showing that its nested structure can be expressed as SMC sampling on an augmented space. The computational performance is evaluated in illustrative examples, showing that the posterior distributions of the model parameters are accurately estimated. The proofs and additional discussions (algorithms and experimental results) are provided in the Supplementary Material.
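The abstract above does not say how weight degeneracy is detected. A common diagnostic in SMC is the effective sample size (ESS) of the normalized importance weights, and the sketch below assumes that criterion. The function names and the 0.5 threshold are hypothetical choices for illustration, not taken from the paper.

```python
import math


def normalize_log_weights(log_weights):
    """Return normalized importance weights from unnormalized log-weights."""
    m = max(log_weights)  # subtract the max for numerical stability
    w = [math.exp(lw - m) for lw in log_weights]
    total = sum(w)
    return [x / total for x in w]


def effective_sample_size(weights):
    """ESS = 1 / sum(w_i^2) for normalized weights; ranges from 1 to N."""
    return 1.0 / sum(w * w for w in weights)


def needs_refresh(log_weights, threshold_fraction=0.5):
    """Hypothetical rule: refresh particles when ESS falls below a fraction of N."""
    weights = normalize_log_weights(log_weights)
    return effective_sample_size(weights) < threshold_fraction * len(weights)
```

Equal weights give an ESS equal to the number of particles, while a single dominant weight drives the ESS toward 1, which is the situation in which an MCMC refresh step would typically be triggered.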
Nonlinear state-space models are powerful tools to describe dynamical structures in complex time series. In a streaming setting where data are processed one sample at a time, simultaneous inference of the state and its nonlinear dynamics has posed significant challenges in practice. We develop a novel online learning framework, leveraging variational inference and sequential Monte Carlo, which enables flexible and accurate Bayesian joint filtering. Our method provides an approximation of the filtering posterior which can be made arbitrarily close to the true filtering distribution for a wide class of dynamics models and observation models. Specifically, the proposed framework can efficiently approximate a posterior over the dynamics using sparse Gaussian processes, allowing for an interpretable model of the latent dynamics. Constant time complexity per sample makes our approach amenable to online learning scenarios and suitable for real-time applications.
Dimensionality reduction (DR) techniques help analysts to understand patterns in high-dimensional spaces. These techniques, often represented by scatter plots, are employed in diverse science domains and facilitate similarity analysis among clusters and data samples. For datasets containing many granularities or when analysis follows the information visualization mantra, hierarchical DR techniques are the most suitable approach since they present major structures beforehand and details on demand. However, current hierarchical DR techniques do not fully address the problems identified in the literature because they do not preserve the projection mental map across hierarchical levels or are not suitable for most data types. This work presents HUMAP, a novel hierarchical dimensionality reduction technique designed to be flexible in preserving local and global structures and to preserve the mental map throughout hierarchical exploration. We provide empirical evidence of our technique's superiority compared with current hierarchical approaches and show two case studies to demonstrate its strengths.
The Wasserstein distance, rooted in optimal transport (OT) theory, is a popular discrepancy measure between probability distributions with various applications to statistics and machine learning. Despite their rich structure and demonstrated utility, Wasserstein distances are sensitive to outliers in the considered distributions, which hinders applicability in practice. Inspired by the Huber contamination model, we propose a new outlier-robust Wasserstein distance $\mathsf{W}_p^\varepsilon$ which allows for $\varepsilon$ outlier mass to be removed from each contaminated distribution. Our formulation amounts to a highly regular optimization problem that lends itself better to analysis than previously considered frameworks. Leveraging this, we conduct a thorough theoretical study of $\mathsf{W}_p^\varepsilon$, encompassing characterization of optimal perturbations, regularity, duality, and statistical estimation and robustness results. In particular, by decoupling the optimization variables, we arrive at a simple dual form for $\mathsf{W}_p^\varepsilon$ that can be implemented via an elementary modification to standard, duality-based OT solvers. We illustrate the benefits of our framework via applications to generative modeling with contaminated datasets.
We present Bayesian techniques for solving inverse problems which involve mean-square convergent random approximations of the forward map. Noisy approximations of the forward map arise in several fields, such as multiscale problems and probabilistic numerical methods. In these fields, a random approximation can enhance the quality or the efficiency of the inference procedure, but entails additional theoretical and computational difficulties due to the randomness of the forward map. A standard technique to address this issue is to combine Monte Carlo averaging with Markov chain Monte Carlo samplers, as for example in the pseudo-marginal Metropolis--Hastings methods. In this paper, we consider mean-square convergent random approximations, and quantify how Monte Carlo errors propagate from the forward map to the solution of the inverse problem. Moreover, we review and describe simple techniques to solve such inverse problems, and compare their performance in a series of numerical experiments.
Principal Component Analysis (PCA) is a transform for finding the principal components (PCs) that represent features of random data. PCA also provides a reconstruction of the original data from the PCs. We consider an extension of PCA which allows us to improve the associated accuracy and diminish the numerical load, in comparison with known techniques. This is achieved through the special structure of the proposed transform, which contains two matrices $T_0$ and $T_1$, and a special transformation $\mathcal{f}$ of the so-called auxiliary random vector $\mathbf w$. For this reason, we call it the three-term PCA. In particular, we show that the three-term PCA always exists, i.e., it is applicable to the case of singular data. Both rigorous theoretical justification of the three-term PCA and simulations with real-world data are provided.
Efficient and reliable generation of global path plans is necessary for safe execution and deployment of autonomous systems. In order to generate planning graphs which adequately resolve the topology of a given environment, many sampling-based motion planners resort to coarse, heuristically-driven strategies which often fail to generalize to new and varied surroundings. Further, many of these approaches are not designed to contend with partial observability. We posit that such uncertainty in environment geometry can, in fact, help \textit{drive} the sampling process in generating feasible and probabilistically safe planning graphs. We propose a method for Probabilistic Roadmaps which relies on particle-based Variational Inference to efficiently cover the posterior distribution over feasible regions in configuration space. Our approach, Stein Variational Probabilistic Roadmap (SV-PRM), results in sample-efficient generation of planning graphs and large improvements over traditional sampling approaches. We demonstrate the approach on a variety of challenging planning problems, including real-world probabilistic occupancy maps and high-dof manipulation problems common in robotics.
We propose a novel method for sampling and optimization tasks based on a stochastic interacting particle system. We explain how this method can be used for the following two goals: (i) generating approximate samples from a given target distribution; (ii) optimizing a given objective function. The approach is derivative-free and affine invariant, and is therefore well-suited for solving inverse problems defined by complex forward models: (i) allows generation of samples from the Bayesian posterior and (ii) allows determination of the maximum a posteriori estimator. We investigate the properties of the proposed family of methods in terms of various parameter choices, both analytically and by means of numerical simulations. The analysis and numerical simulation establish that the method has potential for general purpose optimization tasks over Euclidean space; contraction properties of the algorithm are established under suitable conditions, and computational experiments demonstrate wide basins of attraction for various specific problems. The analysis and experiments also demonstrate the potential for the sampling methodology in regimes in which the target distribution is unimodal and close to Gaussian; indeed we prove that the method recovers a Laplace approximation to the measure in certain parametric regimes and provide numerical evidence that this Laplace approximation attracts a large set of initial conditions in a number of examples.
We develop an Explore-Exploit Markov chain Monte Carlo algorithm ($\operatorname{Ex^2MCMC}$) that combines multiple global proposals and local moves. The proposed method is massively parallelizable and extremely computationally efficient. We prove $V$-uniform geometric ergodicity of $\operatorname{Ex^2MCMC}$ under realistic conditions and compute explicit bounds on the mixing rate showing the improvement brought by the multiple global moves. We show that $\operatorname{Ex^2MCMC}$ allows fine-tuning of exploitation (local moves) and exploration (global moves) via a novel approach to proposing dependent global moves. Finally, we develop an adaptive scheme, $\operatorname{FlEx^2MCMC}$, that learns the distribution of global moves using normalizing flows. We illustrate the efficiency of $\operatorname{Ex^2MCMC}$ and its adaptive versions on many classical sampling benchmarks. We also show that these algorithms improve the quality of sampling GANs as energy-based models.
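To illustrate the general idea of alternating multiple global proposals with a local move, here is a minimal one-dimensional sketch. It is not the authors' algorithm. It assumes an importance-resampling step over independent global proposals followed by a random-walk Metropolis step, and every name and setting (the standard normal target, `PROPOSAL_STD`, `n_proposals`, `step_size`) is a hypothetical choice.

```python
import math
import random

PROPOSAL_STD = 2.0  # hypothetical global proposal: N(0, PROPOSAL_STD^2)


def log_target(x):
    """Unnormalized log-density of the target; a standard normal in this toy."""
    return -0.5 * x * x


def log_proposal(x):
    """Unnormalized log-density of the global proposal (constants cancel)."""
    return -0.5 * (x / PROPOSAL_STD) ** 2


def global_step(x, n_proposals, rng):
    """Exploration: resample among the current state and fresh global proposals."""
    candidates = [x] + [rng.gauss(0.0, PROPOSAL_STD) for _ in range(n_proposals)]
    log_w = [log_target(c) - log_proposal(c) for c in candidates]
    m = max(log_w)  # stabilize the exponentials
    weights = [math.exp(lw - m) for lw in log_w]
    return rng.choices(candidates, weights=weights, k=1)[0]


def local_step(x, step_size, rng):
    """Exploitation: one random-walk Metropolis move around the current state."""
    y = x + rng.gauss(0.0, step_size)
    accept_prob = math.exp(min(0.0, log_target(y) - log_target(x)))
    return y if rng.random() < accept_prob else x


def run_chain(n_steps, n_proposals=10, step_size=0.5, seed=0):
    """Alternate one global step and one local step, recording each state."""
    rng = random.Random(seed)
    x = 0.0
    samples = []
    for _ in range(n_steps):
        x = global_step(x, n_proposals, rng)
        x = local_step(x, step_size, rng)
        samples.append(x)
    return samples
```

The candidate weights within one global step do not depend on each other, so they can be computed in parallel. The sketch uses independent proposals, whereas the abstract also mentions dependent global moves and a learned proposal, neither of which is shown here.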
We propose a general and scalable approximate sampling strategy for probabilistic models with discrete variables. Our approach uses gradients of the likelihood function with respect to its discrete inputs to propose updates in a Metropolis-Hastings sampler. We show empirically that this approach outperforms generic samplers in a number of difficult settings including Ising models, Potts models, restricted Boltzmann machines, and factorial hidden Markov models. We also demonstrate the use of our improved sampler for training deep energy-based models on high dimensional discrete data. This approach outperforms variational auto-encoders and existing energy-based models. Finally, we give bounds showing that our approach is near-optimal in the class of samplers which propose local updates.
We develop an approach to risk minimization and stochastic optimization that provides a convex surrogate for variance, allowing near-optimal and computationally efficient trading between approximation and estimation error. Our approach builds off of techniques for distributionally robust optimization and Owen's empirical likelihood, and we provide a number of finite-sample and asymptotic results characterizing the theoretical performance of the estimator. In particular, we show that our procedure comes with certificates of optimality, achieving (in some scenarios) faster rates of convergence than empirical risk minimization by virtue of automatically balancing bias and variance. We give corroborating empirical evidence showing that in practice, the estimator indeed trades between variance and absolute performance on a training sample, improving out-of-sample (test) performance over standard empirical risk minimization for a number of classification problems.
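As a rough illustration of trading approximation error against estimation error, the sketch below computes a plain variance-penalized empirical risk: the sample mean of the losses plus a multiple of the estimated standard error. This quantity is generally non-convex in the model parameters, so it is the kind of objective a convex surrogate would stand in for, not the surrogate itself. The function name and the `penalty` constant are hypothetical.

```python
import math


def variance_penalized_risk(losses, penalty=1.0):
    """Empirical mean loss plus penalty * sqrt(sample variance / n).

    Requires at least two loss values so the sample variance is defined.
    """
    n = len(losses)
    if n < 2:
        raise ValueError("need at least two losses to estimate the variance")
    mean = sum(losses) / n
    var = sum((loss - mean) ** 2 for loss in losses) / (n - 1)  # unbiased estimate
    return mean + penalty * math.sqrt(var / n)
```

Two models with the same average training loss are ranked differently under this criterion: the one whose per-example losses vary less receives the lower penalized risk.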