Spike-and-slab and horseshoe regression are arguably the most popular Bayesian variable selection approaches for linear regression models. However, their performance can deteriorate if outliers and heteroskedasticity are present in the data, which are common features in many real-world statistics and machine learning applications. In this work, we propose a Bayesian nonparametric approach to linear regression that performs variable selection while accounting for outliers and heteroskedasticity. Our proposed model is an instance of a Dirichlet process scale mixture model with the advantage that we can derive the full conditional distributions of all parameters in closed form, hence producing an efficient Gibbs sampler for posterior inference. Moreover, we show how to extend the model to account for heavy-tailed response variables. The performance of the model is tested against competing algorithms on synthetic and real-world datasets.
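One plausible reading of the model structure described above, written as a minimal sketch (the spike-and-slab form for the coefficients and the specific base measure are our illustrative assumptions, not necessarily the paper's exact choices):
\[
y_i \mid \beta, \lambda_i \sim \mathcal{N}(x_i^{\top}\beta,\; \lambda_i \sigma^2), \qquad \lambda_i \mid G \overset{\text{iid}}{\sim} G, \qquad G \sim \mathrm{DP}(\alpha, G_0),
\]
\[
\beta_j \mid \gamma_j \sim \gamma_j\, \mathcal{N}(0, \tau^2) + (1-\gamma_j)\, \delta_0, \qquad \gamma_j \sim \mathrm{Bernoulli}(\pi),
\]
where the observation-specific scales $\lambda_i$ absorb outliers and heteroskedasticity while the indicators $\gamma_j$ drive variable selection.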
We develop a Bayesian nonparametric autoregressive model to flexibly estimate general transition densities exhibiting nonlinear lag dependence. Our approach is related to Bayesian density regression using Dirichlet process mixtures, with the Markovian likelihood defined through the conditional distribution obtained from the mixture. This results in a Bayesian nonparametric extension of a mixtures-of-experts model formulation. We address computational challenges to posterior sampling that arise from the Markovian structure in the likelihood. The base model is illustrated with synthetic data from a classical model for population dynamics, as well as a series of waiting times between eruptions of Old Faithful Geyser. We study inferences available through the base model before extending the methodology to include automatic relevance detection among a pre-specified set of lags. Inference for global and local lag selection is explored with additional simulation studies, and the methods are illustrated through analysis of an annual time series of pink salmon abundance in a stream in Alaska. We further explore and compare transition density estimation performance for alternative configurations of the proposed model.
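As a minimal sketch of the conditional construction described above, assume for illustration a single lag and a DP mixture of bivariate Gaussians on consecutive pairs $(y_t, y_{t-1})$ (this kernel choice is our assumption); the implied transition density then takes the mixtures-of-experts form
\[
f(y_t \mid y_{t-1}) \;=\; \sum_{h} q_h(y_{t-1})\, \mathcal{N}\!\left(y_t \;\middle|\; \mu_{h,1} + \rho_h\,(y_{t-1}-\mu_{h,2}),\; \sigma_h^2\right), \qquad q_h(y_{t-1}) \;\propto\; w_h\, \mathcal{N}(y_{t-1} \mid \mu_{h,2}, \tau_h^2),
\]
where $\rho_h$, $\sigma_h^2$, and $\tau_h^2$ are the conditional slope, conditional variance, and lag-marginal variance implied by the $h$-th bivariate component, and the covariate-dependent weights $q_h$ act as gating functions.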
Many datasets are collected automatically, and are thus easily contaminated by outliers. To overcome this issue there has recently been renewed interest in robust estimation. However, most robust estimation methods are designed for specific models. In regression, methods have notably been developed for estimating the regression coefficients in generalized linear models, while other approaches have been proposed, e.g.\ for robust inference in beta regression or in sample selection models. In this paper, we propose Maximum Mean Discrepancy optimization as a universal framework for robust regression. We prove non-asymptotic error bounds, showing that our estimators are robust to Huber-type contamination. We also provide a (stochastic) gradient algorithm for computing these estimators, whose implementation requires only the ability to sample from the model and to compute the gradient of its log-likelihood function. Finally, we illustrate the proposed approach with a set of simulations.
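A minimal sketch of how such an estimator might be computed, assuming a Gaussian linear model $y \mid x \sim \mathcal{N}(x^{\top}\beta, \sigma^2)$ and a Gaussian kernel on the response (these modelling choices, and all function and variable names, are illustrative assumptions rather than the authors' implementation); note that each update uses only draws from the model and the score of its log-likelihood, as stated above.

    import numpy as np

    # Illustrative sketch: MMD-based robust linear regression fitted by a
    # score-function stochastic gradient, assuming y | x ~ N(x'beta, sigma2)
    # and a Gaussian kernel on the response.

    def gaussian_kernel(a, b, bandwidth=1.0):
        return np.exp(-(a - b) ** 2 / (2 * bandwidth ** 2))

    def mmd_sgd(X, y, sigma2=1.0, n_model_samples=20, lr=1e-2, n_iter=2000, seed=0):
        rng = np.random.default_rng(seed)
        n, d = X.shape
        beta = np.zeros(d)
        for _ in range(n_iter):
            i = rng.integers(n)                          # pick one observed pair (x_i, y_i)
            mean_i = X[i] @ beta
            y_sim = mean_i + np.sqrt(sigma2) * rng.standard_normal(n_model_samples)
            # gradient of the Gaussian log-likelihood at each simulated draw (the "score")
            scores = ((y_sim - mean_i) / sigma2)[:, None] * X[i][None, :]
            # score-function estimate of grad MMD^2: model-vs-model term minus model-vs-data term
            k_model = gaussian_kernel(y_sim[:, None], y_sim[None, :]).mean(axis=1)
            k_data = gaussian_kernel(y_sim, y[i])
            grad = 2 * ((k_model - k_data)[:, None] * scores).mean(axis=0)
            beta -= lr * grad
        return beta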
In this work we investigate a variation of the online kernelized ridge regression algorithm in the setting of $d$-dimensional adversarial nonparametric regression. We derive regret upper bounds over the classes of Sobolev spaces $W_{p}^{\beta}(\mathcal{X})$, $p\geq 2$, $\beta>\frac{d}{p}$. The upper bounds are supported by a minimax regret analysis, which reveals that in the cases $\beta> \frac{d}{2}$ or $p=\infty$ these rates are (essentially) optimal. Finally, we compare the performance of the kernelized ridge regression forecaster to known nonparametric forecasters in terms of regret rates and computational complexity, as well as to excess risk rates in the setting of statistical (i.i.d.) nonparametric regression.
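For concreteness, a minimal sketch of such a kernelized ridge regression forecaster, assuming a Gaussian (RBF) kernel and a naive full refit at every round (the kernel choice, the refit strategy, and all names are our simplifications).

    import numpy as np

    # Minimal sketch of an online kernelized ridge regression forecaster:
    # at round t it predicts with the ridge solution fitted to rounds 1..t-1,
    # then observes y_t and updates its data set.

    def rbf(a, b, gamma=1.0):
        return np.exp(-gamma * np.sum((a - b) ** 2))

    def online_kernel_ridge(stream, lam=1.0, gamma=1.0):
        xs, ys, preds = [], [], []
        for x_t, y_t in stream:
            if xs:
                K = np.array([[rbf(a, b, gamma) for b in xs] for a in xs])
                k_t = np.array([rbf(x_t, b, gamma) for b in xs])
                alpha = np.linalg.solve(K + lam * np.eye(len(xs)), np.array(ys))
                preds.append(float(k_t @ alpha))         # predict before y_t is revealed
            else:
                preds.append(0.0)                        # default prediction at the first round
            xs.append(x_t)                               # then observe y_t and update
            ys.append(y_t)
        return preds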
We develop a post-selective Bayesian framework to jointly and consistently estimate parameters within group-sparse regression models selected automatically by an indispensable class of learning algorithms, e.g. the Group LASSO, the overlapping Group LASSO, the sparse Group LASSO, etc. Uncertainty estimates for the selected parameters are unreliable in the absence of adjustments for selection bias. The application of state-of-the-art tools to the group-sparse problem is limited, however, to (i) estimation strictly tailored to real-valued projections onto very specific selected subspaces, and (ii) selection events admitting representations as linear inequalities in the data variables. Our Bayesian methods address these gaps by deriving an adjustment factor, in a tractable analytic form, that eliminates bias from the selection of promising groups. At a very nominal price for this adjustment, experiments on simulated data and the Human Connectome Project demonstrate the efficacy of our methods at joint estimation of group-sparse parameters learned from data.
Variable selection is crucial in high-dimensional omics-based analyses, since it is biologically reasonable to assume only a subset of non-noisy features contributes to the data structures. However, the task is particularly hard in an unsupervised setting, and a priori ad hoc variable selection is still a very frequent approach, despite the evident drawbacks and lack of reproducibility. We propose a Bayesian variable selection approach for rank-based transcriptomic analysis. Making use of data rankings instead of the actual continuous measurements increases the robustness of conclusions when compared to classical statistical methods, and embedding variable selection into the inferential tasks allows complete reproducibility. Specifically, we develop a novel extension of the Bayesian Mallows model for variable selection that allows for a full probabilistic analysis, leading to coherent quantification of uncertainties. We test our approach on simulated data using several data generating procedures, demonstrating the versatility and robustness of the method under different scenarios. We then use the novel approach to analyse genome-wide RNAseq gene expression data from ovarian cancer samples: several genes that affect cancer development are correctly detected in a completely unsupervised fashion, showing the method's usefulness in the context of signature discovery for cancer genomics. Moreover, the ability to quantify uncertainty plays a key role in the subsequent biological investigation.
For general parametric regression models with covariates contaminated by normal measurement errors, this paper proposes an accelerated version of the classical simulation extrapolation algorithm to estimate the unknown parameters in the regression function. By applying the conditional expectation directly to the target function, the proposed algorithm removes the simulation step, generating an estimation equation either for immediate use or for extrapolation, and thus significantly reducing the computational time. Large sample properties of the resulting estimator, including consistency and asymptotic normality, are thoroughly discussed. The wide applicability of the proposed estimation procedure is illustrated by examples, simulation studies, and a real data analysis.
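For context, here is a minimal sketch of the classical SIMEX procedure whose simulation step the proposed algorithm removes, written for a single error-prone covariate in a simple linear model (the $\lambda$ grid, the quadratic extrapolant, and all names are illustrative assumptions).

    import numpy as np

    # Sketch of classical simulation-extrapolation (SIMEX) for the slope of a
    # simple linear model, with covariate W measured with N(0, sigma_u2) error.

    def simex_slope(W, y, sigma_u2, lambdas=(0.5, 1.0, 1.5, 2.0), B=50, seed=0):
        rng = np.random.default_rng(seed)
        lam_grid = [0.0]
        est = [np.polyfit(W, y, 1)[0]]                   # naive slope at lambda = 0
        for lam in lambdas:
            slopes = []
            for _ in range(B):                           # simulation step: add extra noise
                W_b = W + np.sqrt(lam * sigma_u2) * rng.standard_normal(len(W))
                slopes.append(np.polyfit(W_b, y, 1)[0])
            lam_grid.append(lam)
            est.append(np.mean(slopes))
        coefs = np.polyfit(lam_grid, est, 2)             # extrapolation step: quadratic in lambda
        return np.polyval(coefs, -1.0)                   # extrapolate back to lambda = -1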
Subspace-valued functions arise in a wide range of problems, including parametric reduced order modeling (PROM). In PROM, each parameter point can be associated with a subspace, which is used for Petrov-Galerkin projections of large system matrices. Previous efforts to approximate such functions use interpolations on manifolds, which can be inaccurate and slow. To tackle this, we propose a novel Bayesian nonparametric model for subspace prediction: the Gaussian Process Subspace regression (GPS) model. This method is extrinsic and intrinsic at the same time: with multivariate Gaussian distributions on the Euclidean space, it induces a joint probability model on the Grassmann manifold, the set of fixed-dimensional subspaces. The GPS adopts a simple yet general correlation structure, and a principled approach for model selection. Its predictive distribution admits an analytical form, which allows for efficient subspace prediction over the parameter space. For PROM, the GPS provides a probabilistic prediction at a new parameter point that retains the accuracy of local reduced models, at a computational complexity that does not depend on system dimension, and thus is suitable for online computation. We give four numerical examples to compare our method to subspace interpolation, as well as two methods that interpolate local reduced models. Overall, GPS is the most data efficient, more computationally efficient than subspace interpolation, and gives smooth predictions with uncertainty quantification.
Interactions among multiple time series of positive random variables are crucial in diverse financial applications, from spillover effects to volatility interdependence. A popular model in this setting is the vector Multiplicative Error Model (vMEM), which posits a linear iterative structure on the dynamics of the conditional mean, perturbed by a multiplicative innovation term. A main limitation of the vMEM, however, is its restrictive assumption on the distribution of the random innovation term. We address this major shortcoming with a Bayesian semiparametric approach that models the innovation vector as an infinite location-scale mixture of multidimensional kernels with support on the positive orthant. Computational complications arising from the constraints to the positive orthant are avoided through the formulation of a slice sampler on the parameter-extended unconstrained version of the model. The method is applied to simulated and real data, and a flexible specification is obtained that outperforms the classical ones in terms of fit and predictive power.
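To fix ideas, a minimal simulation sketch of the vMEM dynamics referred to above, with a unit-mean Gamma innovation standing in for the semiparametric mixture developed in the paper (the Gamma choice and all names are illustrative assumptions).

    import numpy as np

    # Minimal sketch of vMEM dynamics: x_t = mu_t * eps_t (elementwise),
    # mu_t = omega + A x_{t-1} + B mu_{t-1}, with a positive, unit-mean innovation.

    def simulate_vmem(omega, A, B, T, seed=0):
        rng = np.random.default_rng(seed)
        d = len(omega)
        x = np.zeros((T, d))
        mu = omega.copy()
        x_prev = omega.copy()
        for t in range(T):
            mu = omega + A @ x_prev + B @ mu
            eps = rng.gamma(shape=5.0, scale=1.0 / 5.0, size=d)   # positive, unit mean
            x[t] = mu * eps
            x_prev = x[t]
        return x

For instance, simulate_vmem(np.array([0.1, 0.1]), 0.2 * np.eye(2), 0.6 * np.eye(2), T=500) produces a bivariate positive-valued series with persistent conditional means.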
The difficulty in specifying rewards for many real-world problems has led to an increased focus on learning rewards from human feedback, such as demonstrations. However, there are often many different reward functions that explain the human feedback, leaving agents with uncertainty over what the true reward function is. While most policy optimization approaches handle this uncertainty by optimizing for expected performance, many applications demand risk-averse behavior. We derive a novel policy gradient-style robust optimization approach, PG-BROIL, that optimizes a soft-robust objective that balances expected performance and risk. To the best of our knowledge, PG-BROIL is the first policy optimization algorithm robust to a distribution of reward hypotheses which can scale to continuous MDPs. Results suggest that PG-BROIL can produce a family of behaviors ranging from risk-neutral to risk-averse and outperforms state-of-the-art imitation learning algorithms when learning from ambiguous demonstrations by hedging against uncertainty, rather than seeking to uniquely identify the demonstrator's reward function.
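As a rough sketch of the kind of soft-robust objective described above (our reading of the abstract, not the authors' exact formulation): a convex combination of the expected return over posterior reward hypotheses and the conditional value at risk (CVaR) of that return distribution, which a policy-gradient method can then maximise.

    import numpy as np

    # Rough sketch of a soft-robust objective: a convex combination of expected
    # return over posterior reward hypotheses and its conditional value at risk.

    def soft_robust_objective(returns, lam=0.5, alpha=0.95):
        returns = np.asarray(returns, dtype=float)       # one return estimate per reward hypothesis
        expected = returns.mean()
        var = np.quantile(returns, 1.0 - alpha)          # cutoff for the worst (1 - alpha) tail
        cvar = returns[returns <= var].mean()            # mean of returns at or below the cutoff
        return lam * expected + (1.0 - lam) * cvar

Setting lam = 1 recovers risk-neutral expected performance, while lam = 0 weights only the CVaR tail, matching the range of behaviors from risk-neutral to risk-averse described above.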
Discrete random structures are important tools in Bayesian nonparametrics, and the resulting models have proven effective in density estimation, clustering, topic modeling and prediction, among others. In this paper, we consider nested processes and study the dependence structures they induce. Dependence ranges between homogeneity, corresponding to full exchangeability, and maximum heterogeneity, corresponding to (unconditional) independence across samples. The popular nested Dirichlet process is shown to degenerate to the fully exchangeable case when there are ties across samples at the observed or latent level. To overcome this drawback, inherent to nesting general discrete random measures, we introduce a novel class of latent nested processes. These are obtained by adding common and group-specific completely random measures and then normalising to yield dependent random probability measures. We provide results on the partition distributions induced by latent nested processes, and develop a Markov chain Monte Carlo sampler for Bayesian inference. A test for distributional homogeneity across groups is obtained as a by-product. The results and their inferential implications are showcased on synthetic and real data.
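A minimal sketch of the construction described above, in our notation: given a common completely random measure $\mu_0$ and group-specific completely random measures $\mu_1,\dots,\mu_d$, the group-level random probability measures are obtained by addition and normalisation,
\[
\tilde{p}_\ell \;=\; \frac{\mu_\ell + \mu_0}{\mu_\ell(\mathbb{X}) + \mu_0(\mathbb{X})}, \qquad \ell = 1,\dots,d,
\]
so that the shared component $\mu_0$ induces dependence across groups while the idiosyncratic components $\mu_\ell$ preserve heterogeneity.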