We investigate the application of ensemble transform approaches to Bayesian inference for logistic regression problems. Our approach relies on appropriate extensions of the popular ensemble Kalman filter and the feedback particle filter to the cross-entropy loss function and is based on a well-established homotopy approach to Bayesian inference. The arising finite-particle evolution equations, as well as their mean-field limits, are affine-invariant. Furthermore, the proposed methods can be implemented in a gradient-free manner in the case of nonlinear logistic regression, and the data can be randomly subsampled, similarly to mini-batching in stochastic gradient descent. We also propose a closely related SDE-based sampling method which is again affine-invariant and can easily be made gradient-free. Numerical examples demonstrate the effectiveness of the proposed methodologies.
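To make the homotopy idea concrete, here is a minimal, hypothetical sketch of a tempered (homotopy) ensemble Kalman update for logistic regression; the simple tempering rule, the Gaussian prior, and all variable names are our own illustrative choices, not the paper's exact scheme.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def enkf_logistic_homotopy(X, y, J=100, n_steps=50, seed=0):
    """Gradient-free homotopy EnKF sketch for logistic regression.
    Illustrative only: moves an ensemble from a N(0, I) prior towards
    an approximate posterior over artificial time t in [0, 1]."""
    rng = np.random.default_rng(seed)
    n, d = X.shape
    U = rng.standard_normal((J, d))            # ensemble of parameter particles
    dt = 1.0 / n_steps                         # homotopy step size
    for _ in range(n_steps):
        G = sigmoid(U @ X.T)                   # (J, n) predicted probabilities
        Ubar, Gbar = U.mean(axis=0), G.mean(axis=0)
        C_ug = (U - Ubar).T @ (G - Gbar) / J   # (d, n) cross-covariance
        C_gg = (G - Gbar).T @ (G - Gbar) / J   # (n, n) prediction covariance
        R = np.eye(n) / dt                     # inflated "observation" covariance
        K = C_ug @ np.linalg.inv(C_gg + R)     # Kalman-type gain, (d, n)
        U = U + (y - G) @ K.T                  # pull particles towards the data
    return U
```

Note that the update uses only ensemble statistics of the forward map, which is what makes the scheme gradient-free; subsampling the rows of X and y inside the loop would mimic mini-batching.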
The Ensemble Kalman inversion (EKI), proposed by Iglesias et al. for the solution of Bayesian inverse problems of type $y=A u^\dagger +\varepsilon$, with $u^\dagger$ being an unknown parameter and $y$ a given datum, is a powerful tool usually derived from a sequential Monte Carlo point of view. It describes the dynamics of an ensemble of particles $\{u^j(t)\}_{j=1}^J$, whose initial empirical measure is sampled from the prior, evolving over an artificial time $t$ towards an approximate solution of the inverse problem. Using spectral techniques, we provide a complete description of the deterministic dynamics of EKI and their asymptotic behavior in parameter space. In particular, we analyze the dynamics of deterministic EKI and mean-field EKI. We demonstrate that the Bayesian posterior can only be recovered in the mean-field limit and not with finite sample sizes or deterministic EKI. Furthermore, we show that -- even in the deterministic case -- residuals in parameter space do not decrease monotonically in the Euclidean norm, and we suggest a problem-adapted norm in which monotonicity can be proved. Finally, we derive a system of ordinary differential equations governing the spectrum and eigenvectors of the covariance matrix.
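For orientation, the deterministic EKI dynamics studied in this linear setting are commonly written as the coupled ODE system below (standard in the EKI literature; the notation $\Gamma$ for the covariance of the noise $\varepsilon$ is our assumption):

$$\frac{\mathrm{d}u^j}{\mathrm{d}t} = -C(u)\, A^\top \Gamma^{-1}\big(A u^j - y\big), \qquad C(u) = \frac{1}{J}\sum_{k=1}^{J}\big(u^k - \bar{u}\big)\big(u^k - \bar{u}\big)^\top,$$

where $\bar{u} = \frac{1}{J}\sum_{k=1}^{J} u^k$ is the ensemble mean. The empirical covariance $C(u)$ couples the particles, and its evolution is what the spectral ODEs for the eigenvalues and eigenvectors describe.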
In the statistical literature, sparse modeling is the standard approach to achieve improvements in prediction tasks and interpretability. Alternatively, in the seminal paper "Statistical Modeling: The Two Cultures," Breiman (2001) advocated for the adoption of algorithmic approaches that generate ensembles to achieve prediction accuracy superior to single-model methods, at the cost of interpretability. In a recent important and critical paper, Rudin (2019) argued that black-box algorithmic approaches should be avoided for high-stakes decisions and that the tradeoff between accuracy and interpretability is a myth. In response to this recent change in philosophy, we generalize best subset selection (BSS) to best split selection (BSpS), a data-driven approach aimed at finding the optimal split of predictor variables among the models of an ensemble. The proposed methodology results in an ensemble of sparse and diverse models that provide possible mechanisms explaining the relationship between the predictors and the response. The high computational cost of BSpS motivates the need for computationally tractable ways to approximate the exhaustive search, and we benchmark one such recent proposal by Christidis et al. (2020) based on a multi-convex relaxation. Our objective with this article is to motivate research in this exciting new field, which has great potential for data analysis tasks involving high-dimensional data.
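For small problems, the exhaustive search behind BSpS can be written down directly; the toy sketch below (our construction, with ordinary least squares per model and equal-weight averaging as illustrative assumptions) makes the combinatorial cost explicit.

```python
import numpy as np
from itertools import product
from sklearn.linear_model import LinearRegression

def best_split_selection(X, y, n_models=2):
    """Toy exhaustive BSpS: assign each predictor to one of n_models
    (or exclude it, coded -1) and keep the assignment minimizing the
    ensemble residual sum of squares. Cost is (n_models + 1)^p."""
    n, p = X.shape
    best_rss, best_split = np.inf, None
    for assign in product(range(-1, n_models), repeat=p):
        preds, used = np.zeros(n), 0
        for m in range(n_models):
            cols = [j for j in range(p) if assign[j] == m]
            if not cols:
                continue
            used += 1
            fit = LinearRegression().fit(X[:, cols], y)
            preds += fit.predict(X[:, cols])
        if used == 0:
            continue
        preds /= used                      # equal-weight ensemble average
        rss = np.sum((y - preds) ** 2)
        if rss < best_rss:
            best_rss, best_split = rss, assign
    return best_split, best_rss
```

The exponential number of assignments is exactly what motivates computationally tractable relaxations such as the multi-convex approach benchmarked here.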
We consider random matrix ensembles on the Hermitian matrices that are heavy tailed, in particular such that not all moments exist, and that are invariant under the conjugate action of the unitary group. The latter property entails that the eigenvectors are Haar distributed and, therefore, factorise from the eigenvalue statistics. We prove a classification of stable ensembles of this kind, represented in terms of the matrices themselves, their eigenvalues, and their diagonal entries, with the help of the classification of the multivariate stable distributions and harmonic analysis on symmetric matrix spaces. Moreover, we identify necessary and sufficient conditions for their domains of attraction. To illustrate our findings we discuss, for instance, elliptically invariant random matrix ensembles and P\'olya ensembles. As a byproduct we generalise the derivative principle on the Hermitian matrices to general tempered distributions.
In this work we study the orbit recovery problem over $SO(3)$, where the goal is to recover a band-limited function on the sphere from noisy measurements of randomly rotated copies of it. This is a natural abstraction for the problem of recovering the three-dimensional structure of a molecule through cryo-electron tomography. Symmetries play an important role: recovering the function up to rotation is equivalent to solving a system of polynomial equations that comes from the invariant ring associated with the group action. Prior work investigated this system through computational algebra tools up to a certain size. However, many statistical and algorithmic questions remain: How many moments suffice for recovery, or equivalently, at what degree do the invariant polynomials generate the full invariant ring? And is it possible to algorithmically solve this system of polynomial equations? We revisit these problems from the perspective of smoothed analysis, whereby we perturb the coefficients of the function in the basis of spherical harmonics. Our main result is a quasi-polynomial time algorithm for orbit recovery over $SO(3)$ in this model. We analyze a popular heuristic called frequency marching that exploits the layered structure of the system of polynomial equations by setting up a system of {\em linear} equations to solve for the higher-order frequencies, assuming the lower-order ones have already been found. The main questions are: Do these systems have a unique solution? And how fast can the errors compound? Our main technical contribution is in bounding the condition number of these algebraically structured linear systems. Thus smoothed analysis provides a compelling model in which we can expand the types of group actions we can handle in orbit recovery, beyond the finite and/or abelian case.
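The layered structure that frequency marching exploits is easiest to see in the simpler abelian analogue of multi-reference alignment over $\mathbb{Z}/N$, where the bispectrum gives linear relations for each new frequency; the sketch below is that toy analogue (our construction), not the $SO(3)$ algorithm itself.

```python
import numpy as np

def frequency_marching_mra(bispectrum, fhat1, n_freq):
    """Toy frequency marching for multi-reference alignment over Z/N.
    bispectrum[k1, k2] = fhat[k1] * fhat[k2] * conj(fhat[k1 + k2]).
    Given fhat[1], each level k yields k - 1 linear relations for
    conj(fhat[k]); we average them, a least-squares-style step."""
    fhat = np.zeros(n_freq + 1, dtype=complex)
    fhat[1] = fhat1
    for k in range(2, n_freq + 1):
        estimates = [bispectrum[j, k - j] / (fhat[j] * fhat[k - j])
                     for j in range(1, k)]
        fhat[k] = np.conj(np.mean(estimates))
    return fhat

# usage: build the exact bispectrum of a random signal and march upward
rng = np.random.default_rng(0)
true = rng.standard_normal(8) + 1j * rng.standard_normal(8)
B = np.array([[true[k1] * true[k2] * np.conj(true[(k1 + k2) % 8])
               for k2 in range(8)] for k1 in range(8)])
recovered = frequency_marching_mra(B, true[1], n_freq=6)  # matches true[1:7]
```

With noisy invariants, the per-level division is where errors can compound, which is why bounding the condition number of the analogous linear systems is the crux of the $SO(3)$ analysis.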
Many popular specifications for Vector Autoregressions (VARs) with multivariate stochastic volatility are not invariant to the way the variables are ordered, due to the use of a Cholesky decomposition for the error covariance matrix. We show that the order invariance problem in existing approaches is likely to become more serious in large VARs. We propose the use of a specification which avoids this Cholesky decomposition. We show that the presence of multivariate stochastic volatility allows for identification of the proposed model and prove that it is invariant to ordering. We develop a Markov Chain Monte Carlo algorithm which allows for Bayesian estimation and prediction. In exercises involving artificial and real macroeconomic data, we demonstrate that the choice of variable ordering can have non-negligible effects on empirical results. In a macroeconomic forecasting exercise involving VARs with 20 variables, we find that our order-invariant approach leads to the best forecasts and that some choices of variable ordering can lead to poor forecasts under a conventional, non-order-invariant approach.
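The root of the ordering problem can be demonstrated in a few lines: the Cholesky factor of a covariance matrix is not equivariant under permutations of the variables, so any structure imposed through it depends on the ordering (a toy numerical demonstration, not the paper's VAR model).

```python
import numpy as np

# a valid 3x3 error covariance matrix
Sigma = np.array([[1.0, 0.5, 0.2],
                  [0.5, 2.0, 0.3],
                  [0.2, 0.3, 1.5]])
L = np.linalg.cholesky(Sigma)

# reorder the variables with a permutation matrix and refactor
P = np.eye(3)[[2, 0, 1]]
L_perm = np.linalg.cholesky(P @ Sigma @ P.T)

# if the factorization were order-invariant, permuting the factor would
# recover the factor of the permuted covariance; in general it does not
print(np.allclose(P @ L @ P.T, L_perm))   # False
```

Any identification scheme built on the triangular factor therefore inherits this dependence on the variable ordering, which is precisely what the proposed specification avoids.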
Reinforcement learning algorithms can solve dynamic decision-making and optimal control problems. With continuous-valued state and input variables, reinforcement learning algorithms must rely on function approximators to represent the value function and policy mappings. Commonly used numerical approximators, such as neural networks or basis function expansions, have two main drawbacks: they are black-box models offering little insight into the mappings learned, and they require extensive trial-and-error tuning of their hyper-parameters. In this paper, we propose a new approach to constructing smooth value functions in the form of analytic expressions by using symbolic regression. We introduce three off-line methods for finding value functions based on a state-transition model: symbolic value iteration, symbolic policy iteration, and a direct solution of the Bellman equation. The methods are illustrated on four nonlinear control problems: velocity control under friction, one-link and two-link pendulum swing-up, and magnetic manipulation. The results show that the value functions yield well-performing policies and are compact, mathematically tractable, and easy to plug into other algorithms. This makes them potentially suitable for further analysis of the closed-loop system. A comparison with an alternative approach based on neural networks shows that our method outperforms it.
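The three off-line methods share the same backbone; the sketch below shows symbolic value iteration in that spirit (our reading of the abstract, not the authors' code), with `fit_symbolic` standing in for any symbolic-regression routine that returns a callable analytic expression.

```python
import numpy as np

def symbolic_value_iteration(f, reward, states, actions, fit_symbolic,
                             gamma=0.95, n_iters=30):
    """Alternate Bellman backups on sampled states with a symbolic
    regression fit of the value function. f is the state-transition
    model x' = f(x, u); fit_symbolic(states, targets) is a hypothetical
    interface returning an analytic value-function approximation."""
    V = lambda x: 0.0                          # initial value function
    for _ in range(n_iters):
        # Bellman backup: best one-step return under the current V
        targets = np.array([
            max(reward(x, u) + gamma * V(f(x, u)) for u in actions)
            for x in states
        ])
        V = fit_symbolic(states, targets)      # new analytic expression
    return V
```

In practice, a genetic-programming regressor (e.g., wrapping gplearn's SymbolicRegressor) could play the role of `fit_symbolic`; the payoff is that the returned value function is a compact formula rather than a black-box network.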
Learning how to effectively control unknown dynamical systems is crucial for intelligent autonomous systems. This task becomes a significant challenge when the underlying dynamics are changing with time. Motivated by this challenge, this paper considers the problem of controlling an unknown Markov jump linear system (MJS) to optimize a quadratic objective. By taking a model-based perspective, we consider identification-based adaptive control for MJSs. We first provide a system identification algorithm for MJSs to learn the dynamics in each mode, as well as the Markov transition matrix underlying the evolution of the mode switches, from a single trajectory of the system states, inputs, and modes. Through mixing-time arguments, the sample complexity of this algorithm is shown to be $\mathcal{O}(1/\sqrt{T})$. We then propose an adaptive control scheme that performs system identification together with certainty equivalent control to adapt the controllers in an episodic fashion. Combining our sample complexity results with recent perturbation results for certainty equivalent control, we prove that when the episode lengths are appropriately chosen, the proposed adaptive control scheme achieves $\mathcal{O}(\sqrt{T})$ regret, which can be improved to $\mathcal{O}(\mathrm{polylog}(T))$ with partial knowledge of the system. Our proof strategy introduces innovations to handle Markovian jumps and a weaker notion of stability common in MJSs. Our analysis provides insights into system theoretic quantities that affect learning accuracy and control performance. Numerical simulations are presented to further reinforce these insights.
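A minimal sketch of the identification step, assuming the mode sequence is observed (consistent with the abstract; the estimator details are our illustrative choices): per-mode least squares for the dynamics and empirical counts for the transition matrix.

```python
import numpy as np

def identify_mjs(X, U, modes, n_modes):
    """Identify a Markov jump linear system x_{t+1} = A_{w_t} x_t + B_{w_t} u_t
    from one trajectory. X: (T+1, n) states, U: (T, m) inputs,
    modes: (T,) observed mode labels in {0, ..., n_modes - 1}."""
    n, m = X.shape[1], U.shape[1]
    A_hat = np.zeros((n_modes, n, n))
    B_hat = np.zeros((n_modes, n, m))
    for s in range(n_modes):
        idx = np.where(modes == s)[0]
        if len(idx) == 0:
            continue                           # mode never visited
        Z = np.hstack([X[idx], U[idx]])        # regressors [x_t, u_t]
        Y = X[idx + 1]                         # next states
        Theta, *_ = np.linalg.lstsq(Z, Y, rcond=None)
        A_hat[s], B_hat[s] = Theta[:n].T, Theta[n:].T
    # empirical Markov transition matrix from observed mode switches
    T_hat = np.zeros((n_modes, n_modes))
    for s, s_next in zip(modes[:-1], modes[1:]):
        T_hat[s, s_next] += 1.0
    T_hat /= np.maximum(T_hat.sum(axis=1, keepdims=True), 1.0)
    return A_hat, B_hat, T_hat
```

Certainty equivalent control then treats (A_hat, B_hat, T_hat) as the truth when computing the mode-dependent controllers and re-estimates them at the end of each episode.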
This work considers the question of how convenient access to copious data impacts our ability to learn causal effects and relations. In what ways is learning causality in the era of big data different from -- or the same as -- the traditional one? To answer this question, this survey provides a comprehensive and structured review of both traditional and frontier methods for learning causal effects and relations, along with the connections between causality and machine learning. This work points out on a case-by-case basis how big data facilitates, complicates, or motivates each approach.
Matter evolved under the influence of gravity from minuscule density fluctuations. Non-perturbative structure formed hierarchically over all scales and developed the non-Gaussian features known as the Cosmic Web. Fully understanding the structure formation of the Universe is one of the holy grails of modern astrophysics. Astrophysicists survey large volumes of the Universe and employ a large ensemble of computer simulations to compare with the observed data in order to extract the full information of our own Universe. However, evolving trillions of galaxies over billions of years even with the simplest physics is a daunting task. We build a deep neural network, the Deep Density Displacement Model (hereafter D$^3$M), to predict the non-linear structure formation of the Universe from simple linear perturbation theory. Our extensive analysis demonstrates that D$^3$M outperforms second-order perturbation theory (hereafter 2LPT), the commonly used fast approximate simulation method, in point-wise comparison, 2-point correlation, and 3-point correlation. We also show that D$^3$M is able to accurately extrapolate far beyond its training data and predict structure formation for significantly different cosmological parameters. Our study proves, for the first time, that deep learning is a practical and accurate alternative to approximate simulations of the gravitational structure formation of the Universe.
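A minimal stand-in for the learning setup (the actual D$^3$M uses a 3-D U-Net; the plain convolutional stack below is our simplification) regresses the nonlinear displacement field from the linear-theory input on a voxel grid.

```python
import torch
import torch.nn as nn

class DisplacementCNN(nn.Module):
    """Simplified D^3M-style model: maps the 3-component linear
    (perturbation-theory) displacement field on an N^3 grid to the
    nonlinear displacement field from a full N-body simulation."""
    def __init__(self, hidden=32):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv3d(3, hidden, kernel_size=3, padding=1), nn.ReLU(),
            nn.Conv3d(hidden, hidden, kernel_size=3, padding=1), nn.ReLU(),
            nn.Conv3d(hidden, 3, kernel_size=3, padding=1),
        )

    def forward(self, x):                    # x: (batch, 3, N, N, N)
        return self.net(x)

model = DisplacementCNN()
x = torch.randn(1, 3, 32, 32, 32)            # toy linear displacement input
pred = model(x)                              # predicted nonlinear displacement
loss = nn.MSELoss()(pred, torch.randn_like(pred))  # target: N-body output
```

Predicting displacements rather than densities keeps the learning target close to the perturbative input, which plausibly helps the extrapolation behaviour reported in the abstract.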
Learning embedding functions, which map semantically related inputs to nearby locations in a feature space, supports a variety of classification and information retrieval tasks. In this work, we propose a novel, generalizable and fast method to define a family of embedding functions that can be used as an ensemble to give improved results. Each embedding function is learned by randomly bagging the training labels into small subsets. We show experimentally that these embedding ensembles create effective embedding functions. The ensemble output defines a metric space that improves state-of-the-art performance for image retrieval on CUB-200-2011, Cars-196, In-Shop Clothes Retrieval and VehicleID.
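A sketch of the label-bagging step, as we read the abstract (the group sizes, the meta-class coding, and concatenation at test time are illustrative assumptions): each ensemble member sees a random coarse relabeling of the training classes, and the member embeddings are combined afterwards.

```python
import numpy as np

def bag_labels(labels, n_learners, n_groups, seed=0):
    """For each ensemble member, randomly merge the original training
    classes into n_groups meta-classes; every member is then trained
    as an embedding on its own coarse labels."""
    rng = np.random.default_rng(seed)
    classes = np.unique(labels)
    assignments = []
    for _ in range(n_learners):
        meta = rng.integers(0, n_groups, size=len(classes))
        lookup = dict(zip(classes, meta))
        assignments.append(np.array([lookup[c] for c in labels]))
    return assignments   # one coarse-label vector per ensemble member

# usage: 8 learners, each trained on a random 4-way grouping of classes
coarse = bag_labels(np.array(["cat", "dog", "car", "bus", "cat"]), 8, 4)
```

Because each member solves a different coarse classification problem, their embeddings make diverse errors, and combining them yields the improved metric space reported above.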