When performing Bayesian computations in practice, one is often faced with the challenge that the constituent model components and/or the data are only available in a distributed fashion, e.g. due to privacy concerns or sheer volume. While various methods have been proposed for performing posterior inference in such federated settings, these either make very strong assumptions on the data and/or model or otherwise introduce significant bias when the local posteriors are combined to form an approximation of the target posterior. By leveraging recently developed methods for Markov Chain Monte Carlo (MCMC) based on Piecewise Deterministic Markov Processes (PDMPs), we develop a computation -- and communication -- efficient family of posterior inference algorithms (Fed-PDMC) which provides asymptotically exact approximations of the full posterior over a large class of Bayesian models, allowing heterogenous model and data contributions from each client. We show that communication between clients and the server preserves the privacy of the individual data sources by establishing differential privacy guarantees. We quantify the performance of Fed-PDMC over a class of illustrative analytical case-studies and demonstrate its efficacy on a number of synthetic examples along with realistic Bayesian computation benchmarks.
Practitioners are often left tuning Metropolis-Hastings algorithms by trial and error or using optimal scaling guidelines to avoid poor empirical performance. We develop lower bounds on the convergence rates of geometrically ergodic accept-reject-based Markov chains (i.e. Metropolis-Hastings, non-reversible Metropolis-Hastings) to study their computational complexity. If the target density concentrates with a parameter $n$ (e.g. Bayesian posterior concentration, Laplace approximations), we show the convergence rate can tend to $1$ exponentially fast if the tuning parameters do not depend carefully on $n$. We show this is the case for random-walk Metropolis in Bayesian logistic regression with Zellner's g-prior when the dimension and sample size $d/n \to \gamma \in (0, 1)$. We focus on more general target densities using a special class of Metropolis-Hastings algorithms with a Gaussian proposal (e.g. random walk and Metropolis-adjusted Langevin algorithms) where we give more general conditions. An application to flat prior Bayesian logistic regression as $n \to \infty$ is studied. We also develop lower bounds in the Wasserstein distances which have become popular in the convergence analysis of high-dimensional MCMC algorithms with similar conclusions.
Despite the significant interest and progress in reinforcement learning (RL) problems with adversarial corruption, current works are either confined to the linear setting or lead to an undesired $\tilde{O}(\sqrt{T}\zeta)$ regret bound, where $T$ is the number of rounds and $\zeta$ is the total amount of corruption. In this paper, we consider the contextual bandit with general function approximation and propose a computationally efficient algorithm to achieve a regret of $\tilde{O}(\sqrt{T}+\zeta)$. The proposed algorithm relies on the recently developed uncertainty-weighted least-squares regression from linear contextual bandit \citep{he2022nearly} and a new weighted estimator of uncertainty for the general function class. In contrast to the existing analysis that heavily relies on the linear structure, we develop a novel technique to control the sum of weighted uncertainty, thus establishing the final regret bounds. We then generalize our algorithm to the episodic MDP setting and first achieve an additive dependence on the corruption level $\zeta$ in the scenario of general function approximation. Notably, our algorithms achieve regret bounds either nearly match the performance lower bound or improve the existing methods for all the corruption levels and in both known and unknown $\zeta$ cases.
Penalized likelihood models are widely used to simultaneously select variables and estimate model parameters. However, the existence of weak signals can lead to inaccurate variable selection, biased parameter estimation, and invalid inference. Thus, identifying weak signals accurately and making valid inferences are crucial in penalized likelihood models. We develop a unified approach to identify weak signals and make inferences in penalized likelihood models, including the special case when the responses are categorical. To identify weak signals, we use the estimated selection probability of each covariate as a measure of the signal strength and formulate a signal identification criterion. To construct confidence intervals, we propose a two-step inference procedure. Extensive simulation studies show that the proposed procedure outperforms several existing methods. We illustrate the proposed method by applying it to the Practice Fusion diabetes data set.
High-dimensional data can often display heterogeneity due to heteroscedastic variance or inhomogeneous covariate effects. Penalized quantile and expectile regression methods offer useful tools to detect heteroscedasticity in high-dimensional data. The former is computationally challenging due to the non-smooth nature of the check loss, and the latter is sensitive to heavy-tailed error distributions. In this paper, we propose and study (penalized) robust expectile regression (retire), with a focus on iteratively reweighted $\ell_1$-penalization which reduces the estimation bias from $\ell_1$-penalization and leads to oracle properties. Theoretically, we establish the statistical properties of the retire estimator under two regimes: (i) low-dimensional regime in which $d \ll n$; (ii) high-dimensional regime in which $s\ll n\ll d$ with $s$ denoting the number of significant predictors. In the high-dimensional setting, we carefully characterize the solution path of the iteratively reweighted $\ell_1$-penalized retire estimation, adapted from the local linear approximation algorithm for folded-concave regularization. Under a mild minimum signal strength condition, we show that after as many as $\log(\log d)$ iterations the final iterate enjoys the oracle convergence rate. At each iteration, the weighted $\ell_1$-penalized convex program can be efficiently solved by a semismooth Newton coordinate descent algorithm. Numerical studies demonstrate the competitive performance of the proposed procedure compared with either non-robust or quantile regression based alternatives.
We propose to use L\'evy {\alpha}-stable distributions for constructing priors for Bayesian inverse problems. The construction is based on Markov fields with stable-distributed increments. Special cases include the Cauchy and Gaussian distributions, with stability indices {\alpha} = 1, and {\alpha} = 2, respectively. Our target is to show that these priors provide a rich class of priors for modelling rough features. The main technical issue is that the {\alpha}-stable probability density functions do not have closed-form expressions in general, and this limits their applicability. For practical purposes, we need to approximate probability density functions through numerical integration or series expansions. Current available approximation methods are either too time-consuming or do not function within the range of stability and radius arguments needed in Bayesian inversion. To address the issue, we propose a new hybrid approximation method for symmetric univariate and bivariate {\alpha}-stable distributions, which is both fast to evaluate, and accurate enough from a practical viewpoint. Then we use approximation method in the numerical implementation of {\alpha}-stable random field priors. We demonstrate the applicability of the constructed priors on selected Bayesian inverse problems which include the deconvolution problem, and the inversion of a function governed by an elliptic partial differential equation. We also demonstrate hierarchical {\alpha}-stable priors in the one-dimensional deconvolution problem. We employ maximum-a-posterior-based estimation at all the numerical examples. To that end, we exploit the limited-memory BFGS and its bounded variant for the estimator.
We consider a sequential decision making task where we are not allowed to evaluate parameters that violate an a priori unknown (safety) constraint. A common approach is to place a Gaussian process prior on the unknown constraint and allow evaluations only in regions that are safe with high probability. Most current methods rely on a discretization of the domain and cannot be directly extended to the continuous case. Moreover, the way in which they exploit regularity assumptions about the constraint introduces an additional critical hyperparameter. In this paper, we propose an information-theoretic safe exploration criterion that directly exploits the GP posterior to identify the most informative safe parameters to evaluate. Our approach is naturally applicable to continuous domains and does not require additional hyperparameters. We theoretically analyze the method and show that we do not violate the safety constraint with high probability and that we explore by learning about the constraint up to arbitrary precision. Empirical evaluations demonstrate improved data-efficiency and scalability.
GTFLAT, as a game theory-based add-on, addresses an important research question: How can a federated learning algorithm achieve better performance and training efficiency by setting more effective adaptive weights for averaging in the model aggregation phase? The main objectives for the ideal method of answering the question are: (1) empowering federated learning algorithms to reach better performance in fewer communication rounds, notably in the face of heterogeneous scenarios, and last but not least, (2) being easy to use alongside the state-of-the-art federated learning algorithms as a new module. To this end, GTFLAT models the averaging task as a strategic game among active users. Then it proposes a systematic solution based on the population game and evolutionary dynamics to find the equilibrium. In contrast with existing approaches that impose the weights on the participants, GTFLAT concludes a self-enforcement agreement among clients in a way that none of them is motivated to deviate from it individually. The results reveal that, on average, using GTFLAT increases the top-1 test accuracy by 1.38%, while it needs 21.06% fewer communication rounds to reach the accuracy.
It is common to address the curse of dimensionality in Markov decision processes (MDPs) by exploiting low-rank representations. This motivates much of the recent theoretical study on linear MDPs. However, most approaches require a given representation under unrealistic assumptions about the normalization of the decomposition or introduce unresolved computational challenges in practice. Instead, we consider an alternative definition of linear MDPs that automatically ensures normalization while allowing efficient representation learning via contrastive estimation. The framework also admits confidence-adjusted index algorithms, enabling an efficient and principled approach to incorporating optimism or pessimism in the face of uncertainty. To the best of our knowledge, this provides the first practical representation learning method for linear MDPs that achieves both strong theoretical guarantees and empirical performance. Theoretically, we prove that the proposed algorithm is sample efficient in both the online and offline settings. Empirically, we demonstrate superior performance over existing state-of-the-art model-based and model-free algorithms on several benchmarks.
Large machine learning models with improved predictions have become widely available in the chemical sciences. Unfortunately, these models do not protect the privacy necessary within commercial settings, prohibiting the use of potentially extremely valuable data by others. Encrypting the prediction process can solve this problem by double-blind model evaluation and prohibits the extraction of training or query data. However, contemporary ML models based on fully homomorphic encryption or federated learning are either too expensive for practical use or have to trade higher speed for weaker security. We have implemented secure and computationally feasible encrypted machine learning models using oblivious transfer enabling and secure predictions of molecular quantum properties across chemical compound space. However, we find that encrypted predictions using kernel ridge regression models are a million times more expensive than without encryption. This demonstrates a dire need for a compact machine learning model architecture, including molecular representation and kernel matrix size, that minimizes model evaluation costs.
In recent years, mobile devices have gained increasingly development with stronger computation capability and larger storage. Some of the computation-intensive machine learning and deep learning tasks can now be run on mobile devices. To take advantage of the resources available on mobile devices and preserve users' privacy, the idea of mobile distributed machine learning is proposed. It uses local hardware resources and local data to solve machine learning sub-problems on mobile devices, and only uploads computation results instead of original data to contribute to the optimization of the global model. This architecture can not only relieve computation and storage burden on servers, but also protect the users' sensitive information. Another benefit is the bandwidth reduction, as various kinds of local data can now participate in the training process without being uploaded to the server. In this paper, we provide a comprehensive survey on recent studies of mobile distributed machine learning. We survey a number of widely-used mobile distributed machine learning methods. We also present an in-depth discussion on the challenges and future directions in this area. We believe that this survey can demonstrate a clear overview of mobile distributed machine learning and provide guidelines on applying mobile distributed machine learning to real applications.