We consider the problem of reducing the dimensions of parameters and data in non-Gaussian Bayesian inference problems. Our goal is to identify an "informed" subspace of the parameters and an "informative" subspace of the data so that a high-dimensional inference problem can be approximately reformulated in low-to-moderate dimensions, thereby improving the computational efficiency of many inference techniques. To do so, we exploit gradient evaluations of the log-likelihood function. Furthermore, we use an information-theoretic analysis to derive a bound on the posterior error due to parameter and data dimension reduction. This bound relies on logarithmic Sobolev inequalities, and it reveals the appropriate dimensions of the reduced variables. We compare our method with classical dimension reduction techniques, such as principal component analysis and canonical correlation analysis, on applications ranging from mechanics to image processing.
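A minimal sketch of the kind of gradient-based construction this abstract alludes to: estimate the matrix H = E[∇ log L(x) ∇ log L(x)^T] by Monte Carlo over parameter samples and take its leading eigenvectors as the informed parameter subspace, with the eigenvalue decay suggesting the reduced dimension. The function names, the plain Monte Carlo averaging, and the use of prior samples are illustrative assumptions, not the authors' exact construction.

```python
import numpy as np

def informed_subspace(grad_log_lik, prior_samples, rank):
    """Estimate an informed parameter subspace from log-likelihood gradients.

    grad_log_lik  : callable returning the gradient of the log-likelihood at a
                    parameter vector (hypothetical user-supplied function).
    prior_samples : (n, d) array of parameter samples, e.g. drawn from the prior.
    rank          : number of leading directions to keep.
    """
    grads = np.stack([grad_log_lik(x) for x in prior_samples])   # (n, d)
    H = grads.T @ grads / len(prior_samples)                     # Monte Carlo estimate of E[g g^T]
    eigval, eigvec = np.linalg.eigh(H)                           # eigenvalues in ascending order
    order = np.argsort(eigval)[::-1]
    return eigvec[:, order[:rank]], eigval[order]                # informed basis and full spectrum
```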
Offline estimators are often inadequate for real-time applications. However, many online estimators are obtained only by sacrificing statistical efficiency. This paper presents a general framework for understanding and constructing efficient nonparametric estimators for online problems. Statistically, we choose the long-run variance as an exemplary estimand and derive the first set of sufficient conditions for O(1)-time or O(1)-space updates, which enables the systematic generation of estimators. Our asymptotic theory shows that the generated estimators dominate existing alternatives. Computationally, we introduce mini-batch estimation to accelerate online estimators for real-time applications. Implementation issues such as automatic selection of optimal parameters are discussed. Practically, we demonstrate how to use our framework with recent developments in change point detection, causal inference, and stochastic approximation. We also illustrate the strength of our estimators in classical problems such as Markov chain Monte Carlo convergence diagnostics and confidence interval construction.
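For orientation, a minimal sketch of a classical O(1)-memory online long-run variance estimator based on non-overlapping batch means with a fixed batch size m. This is only a baseline of the type the abstract discusses; the paper's estimators use more refined batching schemes that are not reproduced here.

```python
class OnlineBatchMeansLRV:
    """O(1)-memory long-run variance estimator via non-overlapping batch means
    (fixed batch size m; illustrative baseline only)."""

    def __init__(self, m):
        self.m = m            # batch size
        self.n = 0            # observations seen
        self.batch_sum = 0.0  # sum within the current (partial) batch
        self.bm_sum = 0.0     # sum of completed batch means
        self.bm_sumsq = 0.0   # sum of squared completed batch means
        self.k = 0            # number of completed batches

    def update(self, x):
        self.n += 1
        self.batch_sum += x
        if self.n % self.m == 0:            # a batch just completed
            bm = self.batch_sum / self.m
            self.bm_sum += bm
            self.bm_sumsq += bm * bm
            self.batch_sum = 0.0
            self.k += 1

    def estimate(self):
        if self.k < 2:
            return float("nan")
        mean_bm = self.bm_sum / self.k
        var_bm = (self.bm_sumsq - self.k * mean_bm ** 2) / (self.k - 1)
        return self.m * var_bm              # long-run variance estimate
```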
Approximate Bayesian computation (ABC) is commonly used for parameter estimation and model comparison for intractable simulator-based models whose likelihood function cannot be evaluated. In this paper we instead investigate the feasibility of ABC as a generic approximate method for predictive inference, in particular for computing the posterior predictive distribution of future observations or missing data of interest. We consider three complementary ABC approaches for this goal, each based on different assumptions regarding which predictive density of the intractable model can be sampled from. Particular attention is given to the case where only simulation from the joint density of the observed and future data given the model parameters is available for inference, and it is shown that the ideal summary statistic in this setting is minimal predictive sufficient rather than merely minimal sufficient (in the ordinary sense). An ABC prediction approach that takes advantage of a certain latent variable representation is also investigated. We additionally show how common ABC sampling algorithms can be used in the predictive settings considered. Our main results are first illustrated using simple time-series models that facilitate analytical treatment, and then using two common intractable dynamic models.
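A minimal sketch of the simplest rejection-ABC variant of the joint-simulation setting described above: simulate observed and future data together given a prior draw of the parameters, accept when the simulated observed summary is close to the observed summary, and keep the accepted future-data draws as approximate posterior predictive samples. The function names and the plain rejection scheme are assumptions for illustration; the paper also covers other ABC samplers.

```python
import numpy as np

def abc_predictive(simulate_joint, sample_prior, s_obs, summary, n_sim, eps):
    """Rejection-ABC draws from an approximate posterior predictive.

    simulate_joint(theta) -> (y_obs_sim, y_future_sim): one draw from the joint
        density of observed and future data given theta (assumed available).
    sample_prior() -> one prior draw of theta.
    s_obs   : summary statistic of the actually observed data.
    summary : function mapping simulated observed data to its summary.
    """
    predictive_draws = []
    for _ in range(n_sim):
        theta = sample_prior()
        y_sim, y_future = simulate_joint(theta)
        if np.linalg.norm(summary(y_sim) - s_obs) <= eps:
            predictive_draws.append(y_future)   # keep the accepted future-data draw
    return np.asarray(predictive_draws)
```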
The integration of discrete algorithmic components in deep learning architectures has numerous applications. Recently, Implicit Maximum Likelihood Estimation (IMLE, Niepert, Minervini, and Franceschi 2021), a class of gradient estimators for discrete exponential family distributions, was proposed by combining implicit differentiation through perturbation with the path-wise gradient estimator. However, due to the finite difference approximation of the gradients, it is especially sensitive to the choice of the finite difference step size, which needs to be specified by the user. In this work, we present Adaptive IMLE (AIMLE), the first adaptive gradient estimator for complex discrete distributions: it adaptively identifies the target distribution for IMLE by trading off the density of gradient information with the degree of bias in the gradient estimates. We empirically evaluate our estimator on synthetic examples, as well as on Learning to Explain, Discrete Variational Auto-Encoders, and Neural Relational Inference tasks. In our experiments, we show that our adaptive gradient estimator can produce faithful estimates while requiring orders of magnitude fewer samples than other gradient estimators.
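A toy sketch of the fixed-step-size IMLE idea whose step-size sensitivity AIMLE addresses, reduced to a single categorical variable with perturb-and-MAP via Gumbel noise: the target logits are shifted by the downstream gradient scaled by lam, and the gradient estimate is a finite difference of MAP states under shared perturbations. Every simplification here (single variable, the helper names, the sample average) is ours, not the authors' formulation.

```python
import numpy as np

def onehot_map(logits):
    z = np.zeros_like(logits)
    z[np.argmax(logits)] = 1.0
    return z

def imle_grad(theta, dL_dz, lam, n_samples=32, rng=None):
    """Toy IMLE-style gradient estimate for one categorical variable.

    theta : unnormalized log-probabilities (logits).
    dL_dz : gradient of the downstream loss with respect to the one-hot sample z.
    lam   : finite-difference step size; adapting this value is AIMLE's contribution.
    """
    rng = rng or np.random.default_rng()
    target = theta - lam * dL_dz                 # target logits shifted by the downstream gradient
    grad = np.zeros_like(theta)
    for _ in range(n_samples):
        eps = rng.gumbel(size=theta.shape)       # shared perturbation (perturb-and-MAP)
        grad += (onehot_map(theta + eps) - onehot_map(target + eps)) / lam
    return grad / n_samples
```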
Parallel Markov Chain Monte Carlo (pMCMC) algorithms generate clouds of proposals at each step to efficiently resolve a target probability distribution. We build a rigorous foundational framework for pMCMC algorithms that situates these methods within a unified `extended phase space' measure-theoretic formalism. Drawing on our recent work that provides a comprehensive theory for reversible single proposal methods, we herein derive general criteria for multiproposal acceptance mechanisms which yield unbiased chains on general state spaces. Our formulation encompasses a variety of methodologies, including proposal cloud resampling and Hamiltonian methods, while providing a basis for the derivation of novel algorithms. In particular, we obtain a top-down picture for a class of methods arising from `conditionally independent' proposal structures. As an immediate application, we identify several new algorithms including a multiproposal version of the popular preconditioned Crank-Nicolson (pCN) sampler suitable for high- and infinite-dimensional target measures which are absolutely continuous with respect to a Gaussian base measure. To supplement our theoretical results, we carry out a selection of numerical case studies that evaluate the efficacy of these novel algorithms. First, noting that the true potential of pMCMC algorithms arises from their natural parallelizability, we provide a limited parallelization study using TensorFlow and a graphics processing unit to scale pMCMC algorithms that leverage as many as 100k proposals at each step. Second, we use our multiproposal pCN algorithm (mpCN) to resolve a selection of problems in Bayesian statistical inversion for partial differential equations motivated by fluid measurement. These examples provide preliminary evidence of the efficacy of mpCN for high-dimensional target distributions featuring complex geometries and multimodal structures.
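The mpCN algorithm itself is not reproduced here; as a reference point, the following is a minimal sketch of the classical single-proposal pCN step for a target whose density with respect to a standard Gaussian base measure is proportional to exp(-phi(x)). The function name and the standard-normal base measure are illustrative assumptions.

```python
import numpy as np

def pcn_step(x, phi, beta, rng):
    """One preconditioned Crank-Nicolson step for a target proportional to
    exp(-phi(x)) with respect to a standard Gaussian base measure.
    (Classical single-proposal pCN; the paper's mpCN uses many proposals per step.)
    """
    xi = rng.standard_normal(x.shape)                   # draw from the Gaussian base measure
    proposal = np.sqrt(1.0 - beta ** 2) * x + beta * xi # prior-preserving autoregressive proposal
    log_accept = phi(x) - phi(proposal)                 # acceptance ratio depends only on phi
    if np.log(rng.uniform()) < log_accept:
        return proposal
    return x
```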
An important problem across disciplines is the discovery of interventions that produce a desired outcome. When the space of possible interventions is large, making an exhaustive search infeasible, experimental design strategies are needed. In this context, encoding the causal relationships between the variables, and thus the effect of interventions on the system, is critical in order to identify desirable interventions efficiently. We develop an iterative causal method to identify optimal interventions, as measured by the discrepancy between the post-interventional mean of the distribution and a desired target mean. We formulate an active learning strategy that uses the samples obtained so far from different interventions to update the belief about the underlying causal model, as well as to identify samples that are most informative about optimal interventions and thus should be acquired in the next batch. The approach employs a Bayesian update for the causal model and prioritizes interventions using a carefully designed, causally informed acquisition function. This acquisition function is evaluated in closed form, allowing for efficient optimization. The resulting algorithms are theoretically grounded with information-theoretic bounds and provable consistency results. We illustrate the method on both synthetic data and real-world biological data, namely gene expression data from Perturb-CITE-seq experiments, to identify optimal perturbations that induce a specific cell state transition; the proposed causal approach achieves better sample efficiency than several baselines. In both cases we observe that the causally informed acquisition function notably outperforms existing criteria, allowing optimal interventions to be designed with significantly fewer experiments.
Mixture models are widely used to fit complex and multimodal datasets. In this paper we study mixtures with high-dimensional sparse latent parameter vectors and consider the problem of support recovery of those vectors. While parameter learning in mixture models is well studied, the sparsity constraint remains relatively unexplored. Sparsity of parameter vectors is a natural constraint in a variety of settings, and support recovery is a major step towards parameter estimation. We provide efficient algorithms for support recovery whose sample complexity depends only logarithmically on the dimensionality of the latent space. Our algorithms are quite general: they apply to (1) mixtures of many canonical distributions, including uniform, Poisson, Laplace, and Gaussian distributions, and (2) mixtures of linear regressions and linear classifiers with Gaussian covariates, under different assumptions on the unknown parameters. In most of these settings our results are the first guarantees for the problem, while in the rest they improve upon existing work.
Bayesian model selection provides a powerful framework for objectively comparing models directly from observed data, without reference to ground truth data. However, Bayesian model selection requires the computation of the marginal likelihood (model evidence), which is computationally challenging, prohibiting its use in many high-dimensional Bayesian inverse problems. With Bayesian imaging applications in mind, in this work we present the proximal nested sampling methodology to objectively compare alternative Bayesian imaging models for applications that use images to inform decisions under uncertainty. The methodology is based on nested sampling, a Monte Carlo approach specialised for model comparison, and exploits proximal Markov chain Monte Carlo techniques to scale efficiently to large problems and to tackle models that are log-concave and not necessarily smooth (e.g., involving l_1 or total-variation priors). The proposed approach can be applied computationally to problems of dimension O(10^6) and beyond, making it suitable for high-dimensional inverse imaging problems. It is validated on large Gaussian models, for which the likelihood is available analytically, and subsequently illustrated on a range of imaging problems where it is used to analyse different choices of dictionary and measurement model.
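Proximal nested sampling is built on the standard nested sampling identity for the evidence. The sketch below shows only that generic quadrature, with the constrained-prior sampling step left abstract (in the proximal methodology that step is carried out with proximal MCMC); the function names, the number of live points, and the omission of the final live-point correction are assumptions made for brevity.

```python
import numpy as np

def nested_sampling_log_evidence(log_lik, sample_prior, sample_constrained,
                                 n_live=500, n_iter=5000):
    """Basic nested sampling estimate of the log marginal likelihood.

    sample_prior() -> one prior draw; sample_constrained(L) -> one prior draw
    whose log-likelihood exceeds L (left abstract here).
    """
    live = [sample_prior() for _ in range(n_live)]
    live_logl = np.array([log_lik(x) for x in live])
    log_z, log_x_prev = -np.inf, 0.0                        # accumulated evidence, log prior volume
    for i in range(1, n_iter + 1):
        worst = int(np.argmin(live_logl))                   # lowest-likelihood live point
        log_x = -i / n_live                                 # expected log prior volume remaining
        log_w = np.log(np.exp(log_x_prev) - np.exp(log_x))  # shell width
        log_z = np.logaddexp(log_z, live_logl[worst] + log_w)
        live[worst] = sample_constrained(live_logl[worst])  # replace under the likelihood constraint
        live_logl[worst] = log_lik(live[worst])
        log_x_prev = log_x
    return log_z   # remaining live-point contribution omitted for brevity
```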
This paper proposes a generalization of Gaussian mixture models in which the mixture weight is allowed to behave as an unknown function of time. This model successfully captures the features of the data, as demonstrated on simulated and real datasets, and can be useful in studies such as clustering, change-point detection, and process control. In order to estimate the mixture weight function, we propose two new Bayesian nonlinear dynamic approaches for polynomial models, which can be extended to other problems involving polynomial nonlinear dynamic models. One of the methods, called here component-wise Metropolis-Hastings, applies the Metropolis-Hastings algorithm to each local level component of the state equation. It is more general and can be used in any situation where the observation and state equations are nonlinearly connected. The other method tends to be faster, but is applied specifically to binary data (using the probit link function). The performance of these estimation methods, in the context of the proposed dynamic Gaussian mixture model, is evaluated on simulated datasets. An application to an array Comparative Genomic Hybridization (aCGH) dataset from glioblastoma cancer also illustrates our proposal, highlighting the ability of the method to detect chromosome aberrations.
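A minimal sketch of the data-generating idea: a two-component Gaussian mixture whose weight varies smoothly with time. The logistic form of the weight, the component means, and the sample size below are purely illustrative; the paper estimates the weight function with a dynamic model rather than fixing its form.

```python
import numpy as np

rng = np.random.default_rng(0)
T = 500
t = np.arange(T)
w = 1.0 / (1.0 + np.exp(-(t - T / 2) / 50))        # illustrative time-varying mixture weight w(t)
component = rng.uniform(size=T) < w                # pick component 2 with probability w(t)
y = np.where(component,
             rng.normal(loc=3.0, scale=1.0, size=T),   # component 2
             rng.normal(loc=0.0, scale=1.0, size=T))   # component 1
```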
Statistical inference for high-dimensional regression models has been extensively studied owing to its wide applications, ranging from genomics and neuroscience to economics. In practice, there are often unmeasured confounders associated with both the response and the covariates, which invalidates standard debiasing methods. This paper focuses on a generalized linear regression framework with hidden confounding and proposes a debiasing approach that addresses this high-dimensional problem by adjusting for the effects induced by the unmeasured confounders. We establish consistency and asymptotic normality for the proposed debiased estimator. The finite-sample performance of the proposed method is demonstrated via extensive numerical studies and an application to a genetic dataset.
This paper studies E-Bayesian (expectation of the Bayesian) estimation of the parameter of the Lomax distribution under different loss functions. For each loss function, we derive the Bayes estimator of the parameter and then take its expectation over the hyperparameters to obtain the E-Bayesian estimator. To measure the estimation error, we introduce the E-MSE (expected mean squared error) and give formulas for both the E-Bayesian estimators and the E-MSE. Using Markov chain Monte Carlo methods, we analyze the performance of the proposed estimators, with results compared on the basis of the E-MSE. Real data sets are then analyzed for illustration, and Kolmogorov-Smirnov tests are conducted to check whether the Lomax distribution is appropriate for these data. For the real data we also compute maximum likelihood estimates and compare them with the E-Bayesian estimates. Finally, we compare the Bayesian and E-Bayesian estimation methods under three different loss functions.
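A minimal sketch of the E-Bayesian construction under squared error loss, assuming a known scale, a gamma(a, b) prior on the Lomax shape parameter, and a Uniform(0, c) hyperprior on the rate b; these prior choices and the function name are illustrative assumptions, and the paper also treats other loss functions. The E-Bayesian estimate is simply the Bayes estimate averaged over the hyperprior, computed here both on a grid and in closed form as a self-check.

```python
import numpy as np

def lomax_ebayes_shape(x, lam, a, c, n_grid=1000):
    """E-Bayesian estimate of the Lomax shape parameter alpha with known scale lam,
    under squared error loss.

    Illustrative assumptions: gamma(a, b) prior on alpha (rate b) and a
    Uniform(0, c) hyperprior on b.
    """
    x = np.asarray(x, dtype=float)
    n = x.size
    T = np.sum(np.log1p(x / lam))                 # sufficient statistic sum log(1 + x_i/lam)
    bayes = lambda b: (a + n) / (b + T)           # posterior gamma(a + n, b + T) mean
    b_grid = np.linspace(0.0, c, n_grid)
    e_bayes_grid = np.mean(bayes(b_grid))         # average over the Uniform(0, c) hyperprior
    e_bayes_closed = (a + n) / c * np.log((T + c) / T)   # same expectation in closed form
    return e_bayes_grid, e_bayes_closed
```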