Probabilistic solvers for ordinary differential equations (ODEs) have emerged as an efficient framework for uncertainty quantification and inference on dynamical systems. In this work, we explain the mathematical assumptions and detailed implementation schemes behind solving high-dimensional ODEs with a probabilistic numerical algorithm. This has previously been infeasible because each solver step requires matrix-matrix operations, but it is crucial for scientifically relevant problems -- most importantly, the solution of discretised partial differential equations. In a nutshell, efficient high-dimensional probabilistic ODE solutions build either on independence assumptions or on Kronecker structure in the prior model. We evaluate the resulting efficiency on a range of problems, including the probabilistic numerical simulation of a differential equation with millions of dimensions.
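To see why Kronecker structure in the prior makes the per-step cost independent of the ODE dimension, consider the covariance prediction step of the underlying Gauss--Markov filter. A minimal NumPy sketch, under the assumption that the covariance factorises as $C = I_d \otimes K$ (the variable names are ours, not the paper's):

```python
import numpy as np

q, d = 3, 4                          # derivative coordinates q, ODE dimension d
rng = np.random.default_rng(0)
Phi = rng.standard_normal((q, q))    # per-coordinate transition matrix
K = np.diag([1.0, 2.0, 3.0])         # small q x q covariance factor

# Naive step: build A = I_d kron Phi explicitly and form A C A^T.
# This costs O(d^3 q^3) and is infeasible for d in the millions.
A = np.kron(np.eye(d), Phi)
C = np.kron(np.eye(d), K)
C_dense = A @ C @ A.T

# Kronecker step: (I kron Phi)(I kron K)(I kron Phi)^T = I kron (Phi K Phi^T),
# so only the q x q factor is propagated -- O(q^3), independent of d.
K_new = Phi @ K @ Phi.T
assert np.allclose(C_dense, np.kron(np.eye(d), K_new))
```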
We prove axiomatic characterizations of several important multiwinner rules within the class of approval-based committee choice rules. These are voting rules that return a set of (fixed-size) committees. In particular, we provide axiomatic characterizations of Proportional Approval Voting, the Chamberlin--Courant rule, and other Thiele methods. These rules share the important property that they satisfy an axiom called consistency, which is crucial in our characterizations.
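To make the Thiele family concrete: a Thiele method with weight sequence $w(1), w(2), \ldots$ gives each voter a score of $w(1) + \cdots + w(k)$ when $k$ of their approved candidates sit on the committee; PAV uses $w(j) = 1/j$ and Chamberlin--Courant uses $w(1) = 1$ and $w(j) = 0$ for $j > 1$. A small brute-force sketch (the function names are ours):

```python
from itertools import combinations

def thiele_score(committee, ballots, w):
    """Sum over voters of w(1) + ... + w(k), where k is the number of
    committee members the voter approves of."""
    return sum(sum(w(j) for j in range(1, len(ballot & committee) + 1))
               for ballot in ballots)

pav = lambda j: 1 / j                 # Proportional Approval Voting
cc  = lambda j: 1 if j == 1 else 0    # Chamberlin--Courant

def best_committees(candidates, ballots, size, w):
    """All size-k committees with maximal Thiele score (ties are possible,
    hence a set of committees is returned)."""
    scored = {frozenset(c): thiele_score(frozenset(c), ballots, w)
              for c in combinations(candidates, size)}
    top = max(scored.values())
    return [set(c) for c, s in scored.items() if s == top]

ballots = [{'a', 'b'}, {'a', 'b'}, {'a', 'b'}, {'c'}]
print(best_committees('abc', ballots, 2, pav))  # PAV elects {'a', 'b'}
print(best_committees('abc', ballots, 2, cc))   # CC ties {'a','c'} and {'b','c'}
```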
The logistic and probit link functions are the most common choices for regression models with a binary response. However, these choices are not robust to the presence of outliers/unexpected observations. The robit link function, which is the inverse CDF of the Student's $t$-distribution, provides a robust alternative to the probit and logistic link functions. A multivariate normal prior for the regression coefficients is the standard choice for Bayesian inference in robit regression models. The resulting posterior density is intractable, and a Data Augmentation (DA) Markov chain is used to generate approximate samples from the desired posterior distribution. Establishing geometric ergodicity for this DA Markov chain is important, as it provides theoretical guarantees for the asymptotic validity of MCMC standard errors for desired posterior expectations/quantiles. Previous work [Roy (2012)] established geometric ergodicity of this robit DA Markov chain assuming (i) the sample size $n$ dominates the number of predictors $p$, and (ii) an additional constraint which requires the sample size to be bounded above by a fixed constant which depends on the design matrix $X$. In particular, modern high-dimensional settings where $n < p$ are not considered. In this work, we show that the robit DA Markov chain is trace-class (i.e., the eigenvalues of the corresponding Markov operator are summable) for arbitrary choices of the sample size $n$, the number of predictors $p$, the design matrix $X$, and the prior mean and variance parameters. The trace-class property implies geometric ergodicity. Moreover, this property allows us to conclude that the sandwich robit chain (obtained by inserting an inexpensive extra step between the two steps of the DA chain) is strictly better than the robit DA chain in an appropriate sense.
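For intuition about the two steps of the DA chain, here is a minimal sketch of the standard construction (a truncated-$t$ draw for the latent response with its Gamma mixing weight, then a Gaussian draw for the coefficients); the variable names are ours and the details may differ from the paper's exact setup:

```python
import numpy as np
from scipy import stats

def robit_da_step(beta, X, y, nu, prior_mean, prior_prec, rng):
    """One iteration of the (standard) DA chain for robit regression.
    Step 1: draw latent z_i from t_nu(x_i' beta, 1) truncated to the
            half-line matching y_i, then the Gamma mixing weight lam_i.
    Step 2: draw beta from its Gaussian full conditional."""
    n, p = X.shape
    mu = X @ beta
    # Truncated-t draws via the inverse-CDF method.
    u = rng.uniform(size=n)
    F0 = stats.t.cdf(-mu, df=nu)                     # P(z_i <= 0)
    q = np.where(y == 1, F0 + u * (1 - F0), u * F0)  # restrict to z>0 or z<=0
    z = mu + stats.t.ppf(q, df=nu)
    lam = rng.gamma((nu + 1) / 2, 2 / (nu + (z - mu) ** 2))
    # Gaussian full conditional for beta under the N(prior_mean, prior_prec^-1) prior.
    prec = X.T @ (lam[:, None] * X) + prior_prec
    mean = np.linalg.solve(prec, X.T @ (lam * z) + prior_prec @ prior_mean)
    return rng.multivariate_normal(mean, np.linalg.inv(prec))

# Toy usage with simulated data and a vague prior:
rng = np.random.default_rng(1)
X = rng.standard_normal((100, 3)); beta_true = np.array([1.0, -1.0, 0.5])
y = (X @ beta_true + rng.standard_t(df=4, size=100) > 0).astype(int)
beta = np.zeros(3)
for _ in range(500):
    beta = robit_da_step(beta, X, y, nu=4.0, prior_mean=np.zeros(3),
                         prior_prec=0.01 * np.eye(3), rng=rng)
```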
In this work, we introduce a new and efficient solution approach for the problem of decision making under uncertainty, which can be formulated as decision making in a belief space, over a possibly high-dimensional state space. Typically, solving a decision problem requires identifying the optimal action from a set of candidates, according to some objective. We claim that one can often generate and solve an analogous yet simplified decision problem, which can be solved more efficiently. A wisely chosen simplification can lead to the same action selection, or to one for which the maximal loss in optimality is bounded. Furthermore, such simplification is separated from the state inference and does not compromise its accuracy, since the selected action is eventually applied to the original state. First, we present the concept for general decision problems and provide a theoretical framework for a coherent formulation of the approach. We then practically apply these ideas to decision problems in the belief space, which can be simplified by considering a sparse approximation of their initial belief. The scalable belief sparsification algorithm we provide yields solutions that are guaranteed to be consistent with the original problem. We demonstrate the benefits of the approach in the solution of a realistic active-SLAM problem, reducing computation time significantly with no loss in solution quality. This work is both fundamental and practical, and it admits numerous possible extensions.
Existing frameworks for probabilistic inference assume the inferential target is the posited statistical model's parameter. In machine learning applications, however, often there is no statistical model, so the quantity of interest is not a model parameter but a statistical functional. In this paper, we develop a generalized inferential model framework for cases when this functional is a risk minimizer or solution to an estimating equation. We construct a data-dependent possibility measure for uncertainty quantification and inference whose computation is based on the bootstrap. We then prove that this new generalized inferential model provides approximately valid inference in the sense that the plausibility values assigned to hypotheses about the unknowns are asymptotically well-calibrated in a frequentist sense. Among other things, this implies that confidence regions for the underlying functional derived from our new generalized inferential model are approximately valid. The method is shown to perform well in classical examples, including quantile regression, and in a personalized medicine application.
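For a scalar functional, one natural bootstrap-based possibility contour (a sketch of the general idea, not necessarily the authors' exact construction) converts the bootstrap distribution of the plug-in estimator into plausibility values through a two-sided tail probability:

```python
import numpy as np

def plausibility(theta_grid, data, functional, n_boot=2000, seed=0):
    """Bootstrap possibility contour for a scalar functional.
    pl(theta) = 1 - |2 F*(theta) - 1|, where F* is the bootstrap CDF of
    the plug-in estimator: pl peaks at the median bootstrap estimate and
    decays toward 0 in the tails."""
    rng = np.random.default_rng(seed)
    n = len(data)
    boot = np.array([functional(rng.choice(data, size=n, replace=True))
                     for _ in range(n_boot)])
    F = np.searchsorted(np.sort(boot), theta_grid, side="right") / n_boot
    return 1.0 - np.abs(2.0 * F - 1.0)

# Toy example: the median (a quantile functional) of skewed data.
data = np.random.default_rng(1).exponential(size=200)
grid = np.linspace(0.3, 1.2, 50)
pl = plausibility(grid, data, np.median)
# {theta : pl(theta) >= alpha} is an approximate 100(1-alpha)% confidence region.
```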
Let $P$ be a linear differential operator over $\mathcal{D} \subset \mathbb{R}^d$ and $U = (U_x)_{x \in \mathcal{D}}$ a second order stochastic process. In the first part of this article, we prove a new simple necessary and sufficient condition for all the trajectories of $U$ to satisfy the partial differential equation (PDE) $P(U) = 0$. This condition is formulated in terms of the covariance kernel of $U$. The novelty of this result is that the equality $P(U) = 0$ is understood in the sense of distributions, a functional analysis framework particularly well suited to the study of PDEs. This theorem provides valuable insights for the second part of this article, which is dedicated to performing "physically informed" machine learning on data that solves the homogeneous three-dimensional free-space wave equation. We perform Gaussian Process Regression (GPR), a kernel-based Bayesian approach to machine learning, on this data. To do so, we put Gaussian process (GP) priors over the wave equation's initial conditions and propagate them through the wave equation. We obtain explicit formulas for the covariance kernel of the corresponding stochastic process; this kernel can then be used for GPR. We explore two particular cases: radial symmetry and a point source. For the former, we derive convolution-free GPR formulas; for the latter, we show a direct link between GPR and the classical triangulation method for point source localization used e.g. in GPS systems. Additionally, this Bayesian framework gives rise to a new answer to the ill-posed inverse problem of reconstructing initial conditions for the wave equation from finite dimensional data, and simultaneously provides a way of estimating physical parameters from this data as in [Raissi et al., 2017]. We finish by showcasing this physically informed GPR on a number of practical examples.
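For reference, once an explicit kernel $k$ for the solution process is available, the GPR step relies on the standard posterior formulas, stated here for noisy observations $y$ of the process at points $X$ (these are the generic GPR equations, not the paper's wave-specific kernels):
\[
m(x_*) = k(x_*, X)\bigl(k(X, X) + \sigma^2 I\bigr)^{-1} y, \qquad
c(x_*, x_*') = k(x_*, x_*') - k(x_*, X)\bigl(k(X, X) + \sigma^2 I\bigr)^{-1} k(X, x_*').
\]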
The Partially Observable Markov Decision Process (POMDP) is a powerful framework for capturing decision-making problems that involve state and transition uncertainty. However, most current POMDP planners cannot effectively handle the very high-dimensional observations they often encounter in the real world (e.g. image observations in robotic domains). In this work, we propose Visual Tree Search (VTS), a learning and planning procedure that combines generative models learned offline with online model-based POMDP planning. VTS bridges offline model training and online planning by utilizing a set of deep generative observation models to predict and evaluate the likelihood of image observations in a Monte Carlo tree search planner. We show that VTS is robust to different observation noises and, since it utilizes online, model-based planning, can adapt to different reward structures without the need to re-train. This new approach outperforms a baseline state-of-the-art on-policy planning algorithm while using significantly less offline training time.
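The coupling between the learned generative models and the planner can be pictured as a particle belief update in which a learned conditional density scores candidate states against the received image; a minimal sketch with Gaussian stand-ins (`transition` and `obs_likelihood` are hypothetical placeholders for the learned networks):

```python
import numpy as np

def belief_update(particles, action, obs, transition, obs_likelihood, rng):
    """Particle belief update as used inside a tree-search node:
    propagate each state particle through the (learned) dynamics, weight
    it by the (learned) likelihood of the observation, and resample."""
    prop = np.array([transition(s, action, rng) for s in particles])
    w = np.array([obs_likelihood(obs, s) for s in prop])
    w = w / w.sum()
    idx = rng.choice(len(prop), size=len(prop), p=w)
    return prop[idx]

# Toy stand-ins: 1-D linear dynamics and a Gaussian "observation model".
rng = np.random.default_rng(0)
transition = lambda s, a, rng: s + a + 0.1 * rng.standard_normal()
obs_likelihood = lambda o, s: np.exp(-0.5 * (o - s) ** 2)
belief = rng.standard_normal(100)              # initial particle set
belief = belief_update(belief, action=0.5, obs=0.7,
                       transition=transition, obs_likelihood=obs_likelihood,
                       rng=rng)
```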
We design a new algorithm for solving parametric systems having finitely many complex solutions for generic values of the parameters. More precisely, let $f = (f_1, \ldots, f_m)\subset \mathbb{Q}[y][x]$ with $y = (y_1, \ldots, y_t)$ and $x = (x_1, \ldots, x_n)$, let $V\subset \mathbb{C}^{t+n}$ be the algebraic set defined by $f$, and let $\pi$ be the projection $(y, x) \to y$. Under the assumptions that $f$ admits finitely many complex roots for generic values of $y$ and that the ideal generated by $f$ is radical, we solve the following problem. On input $f$, we compute semi-algebraic formulas defining semi-algebraic subsets $S_1, \ldots, S_l$ of the $y$-space such that $\cup_{i=1}^l S_i$ is dense in $\mathbb{R}^t$ and the number of real points in $V\cap \pi^{-1}(\eta)$ is invariant when $\eta$ varies over each $S_i$. This algorithm exploits properties of some well chosen monomial bases in the algebra $\mathbb{Q}(y)[x]/I$, where $I$ is the ideal generated by $f$ in $\mathbb{Q}(y)[x]$, and the specialization property of the so-called Hermite matrices. This allows us to obtain compact representations of the sets $S_i$ by means of semi-algebraic formulas encoding the signature of a symmetric matrix. When $f$ satisfies extra genericity assumptions, we derive complexity bounds on the number of arithmetic operations in $\mathbb{Q}$ and the degree of the output polynomials. Let $d$ be the maximal degree of the $f_i$'s and let $D = n(d-1)d^n$; we prove that, for a generic $f=(f_1,\ldots,f_n)$, one can compute these semi-algebraic formulas with $\widetilde{O}\left(\binom{t+D}{t}\,2^{3t}\,n^{2t+1}\,d^{3nt+2(n+t)+1}\right)$ operations in $\mathbb{Q}$ and that the polynomials involved have degree bounded by $D$. We report on practical experiments which illustrate the efficiency of our algorithm on generic systems and systems from applications. It allows us to solve problems which are out of reach of the state of the art.
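To see how a Hermite matrix signature counts real solutions, consider the classical univariate case (a simplified illustration of the principle, not the paper's multivariate parametric algorithm): the Hankel matrix of power sums of the roots of $f$ has rank equal to the number of distinct complex roots of $f$ and signature equal to the number of distinct real roots. A sympy sketch (the function name is ours):

```python
import sympy as sp

def hermite_signature(f, x):
    """Signature of the univariate Hermite (Hankel) matrix of f.
    Entry (i, j) is the power sum s_{i+j} = sum over roots r of r^(i+j);
    the signature equals the number of distinct real roots of f."""
    n = int(sp.degree(f, x))
    roots = sp.Poly(f, x).all_roots()
    s = [sp.simplify(sum(r ** k for r in roots)) for k in range(2 * n - 1)]
    H = sp.Matrix(n, n, lambda i, j: s[i + j])
    # Signature = (# positive eigenvalues) - (# negative eigenvalues).
    evs = [sp.re(sp.N(ev)) for ev, m in H.eigenvals().items() for _ in range(m)]
    return sum(1 for e in evs if e > 1e-9) - sum(1 for e in evs if e < -1e-9)

x = sp.symbols('x')
print(hermite_signature(x**3 - x, x))      # 3 distinct real roots -> 3
print(hermite_signature(x**2 + 1, x))      # no real roots -> 0
```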
Community detection, a fundamental task in network analysis, aims to partition a network into multiple sub-structures to help reveal their latent functions. Community detection has been extensively studied in and broadly applied to many real-world network problems. Classical approaches to community detection typically utilize probabilistic graphical models and adopt a variety of prior knowledge to infer community structures. As the problems that network methods try to solve and the network data to be analyzed become increasingly sophisticated, new approaches have been proposed and developed, particularly those that utilize deep learning and convert networked data into low-dimensional representations. Despite all the recent advances, there is still a lack of a clear understanding of the theoretical and methodological underpinnings of community detection, which will be critically important for the future development of network analysis. In this paper, we develop and present a unified architecture of network community-finding methods to characterize the state of the art of the field of community detection. Specifically, we provide a comprehensive review of existing community detection methods and introduce a new taxonomy that divides them into two categories, namely probabilistic graphical models and deep learning. We then discuss in detail the main idea behind each method in the two categories. Furthermore, to promote future development of community detection, we release several benchmark datasets from several problem domains and highlight their applications to various network analysis tasks. We conclude with a discussion of the challenges of the field and suggestions for possible directions of future research.
Spatio-temporal forecasting has numerous applications in analyzing wireless, traffic, and financial networks. Classical statistical models often fall short in handling the complexity and high non-linearity present in time-series data. Recent advances in deep learning allow for better modelling of spatial and temporal dependencies. While most of these models focus on obtaining accurate point forecasts, they do not characterize the prediction uncertainty. In this work, we consider the time-series data as a random realization from a nonlinear state-space model and target Bayesian inference of the hidden states for probabilistic forecasting. We use particle flow as the tool for approximating the posterior distribution of the states, as it has been shown to be highly effective in complex, high-dimensional settings. Thorough experimentation on several real-world time-series datasets demonstrates that our approach provides better characterization of uncertainty while maintaining accuracy comparable to state-of-the-art point forecasting methods.
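The measurement-update step can be illustrated with the exact Daum--Huang flow for a linear-Gaussian measurement model $z = Hx + v$, $v \sim N(0, R)$ (a simplified stand-in for the flows applied to the nonlinear state-space models in the paper; the function name is ours):

```python
import numpy as np

def exact_flow_update(particles, m, P, H, R, z, n_steps=30):
    """Exact Daum--Huang particle flow for a linear-Gaussian measurement
    z = H x + v, v ~ N(0, R), and prior N(m, P): migrate particles from
    the prior to the posterior by Euler-integrating
        dx/dlam = A(lam) x + b(lam),   lam: 0 -> 1."""
    x = particles.copy()
    HPHt = H @ P @ H.T
    PHt = P @ H.T
    for k in range(n_steps):
        lam = (k + 0.5) / n_steps            # midpoint of each pseudo-time step
        A = -0.5 * PHt @ np.linalg.solve(lam * HPHt + R, H)
        b = (np.eye(len(m)) + 2 * lam * A) @ (
            (np.eye(len(m)) + lam * A) @ PHt @ np.linalg.solve(R, z) + A @ m)
        x = x + (x @ A.T + b) / n_steps      # Euler step for all particles
    return x

# Toy usage: 2-D Gaussian prior, identity measurement with noise.
rng = np.random.default_rng(0)
P = np.eye(2); m = np.zeros(2); H = np.eye(2); R = 0.25 * np.eye(2)
parts = rng.multivariate_normal(m, P, size=500)
post = exact_flow_update(parts, m, P, H, R, z=np.array([1.0, -0.5]))
```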
Robust estimation is much more challenging in high dimensions than in one dimension: most techniques either lead to intractable optimization problems or to estimators that can tolerate only a tiny fraction of errors. Recent work in theoretical computer science has shown that, in appropriate distributional models, it is possible to robustly estimate the mean and covariance with polynomial-time algorithms that can tolerate a constant fraction of corruptions, independent of the dimension. However, the sample and time complexity of these algorithms is prohibitively large for high-dimensional applications. In this work, we address both of these issues by establishing sample complexity bounds that are optimal up to logarithmic factors, as well as by giving various refinements that allow the algorithms to tolerate a much larger fraction of corruptions. Finally, we show on both synthetic and real data that our algorithms have state-of-the-art performance and make high-dimensional robust estimation a realistic possibility.
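The flavor of these algorithms can be conveyed by the basic spectral filter for robust mean estimation (a simplified sketch of the general technique, not the paper's exact refinements): while the empirical covariance has an abnormally large top eigenvalue, project onto the corresponding eigenvector and discard the most extreme points along that direction.

```python
import numpy as np

def filter_mean(X, eps, n_iter=20):
    """Basic filtering sketch for robust mean estimation: while the
    empirical covariance has a large top eigenvalue, project onto its
    top eigenvector and drop the most extreme eps-fraction of points."""
    X = X.copy()
    for _ in range(n_iter):
        mu = X.mean(axis=0)
        cov = np.cov(X, rowvar=False)
        evals, evecs = np.linalg.eigh(cov)
        if evals[-1] < 1.5:                   # heuristic stop for this toy
            return mu                         # (clean identity-covariance data)
        v = evecs[:, -1]                      # direction of largest variance
        scores = np.abs((X - mu) @ v)
        keep = scores <= np.quantile(scores, 1 - eps)
        X = X[keep]
    return X.mean(axis=0)

# Toy demo: isotropic Gaussian data with a 5% cluster of far-away outliers.
rng = np.random.default_rng(0)
X = np.vstack([rng.standard_normal((950, 20)),
               rng.standard_normal((50, 20)) + 8.0])
print(np.linalg.norm(filter_mean(X, eps=0.05)))  # far closer to 0 than X.mean
```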