Gaussian variational inference and the Laplace approximation are popular alternatives to Markov chain Monte Carlo that formulate Bayesian posterior inference as an optimization problem, enabling the use of simple and scalable stochastic optimization algorithms. However, a key limitation of both methods is that the solution to the optimization problem is typically not tractable to compute; even in simple settings the problem is nonconvex. Thus, recently developed statistical guarantees -- which all involve the (data) asymptotic properties of the global optimum -- are not reliably obtained in practice. In this work, we provide two major contributions: a theoretical analysis of the asymptotic convexity properties of variational inference with a Gaussian family and the maximum a posteriori (MAP) problem required by the Laplace approximation; and two algorithms -- consistent Laplace approximation (CLA) and consistent stochastic variational inference (CSVI) -- that exploit these properties to find the optimal approximation in the asymptotic regime. Both CLA and CSVI involve a tractable initialization procedure that finds the local basin of the optimum, and CSVI further includes a scaled gradient descent algorithm that provably stays locally confined to that basin. Experiments on nonconvex synthetic and real-data examples show that compared with standard variational and Laplace approximations, both CSVI and CLA improve the likelihood of obtaining the global optimum of their respective optimization problems.
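To make the optimization problem behind the Laplace approximation concrete, the following is a minimal sketch (not the CLA or CSVI procedures above): a MAP point is found by local optimization, and the posterior is approximated by a Gaussian whose covariance is the inverse Hessian of the negative log posterior at that point. The log posterior used here is a hypothetical placeholder.

    # Minimal sketch of a Laplace approximation (not the CLA/CSVI algorithms above).
    # The negative log posterior below is a hypothetical stand-in; any smooth density works.
    import numpy as np
    from scipy.optimize import minimize

    def neg_log_posterior(theta):
        # hypothetical nonconvex negative log posterior: Gaussian prior plus a bump
        return 0.5 * np.sum(theta**2) - np.log(1e-3 + np.exp(-np.sum((theta - 2.0)**2)))

    def laplace_approximation(theta0, eps=1e-4):
        # 1) MAP estimate by local optimization (may only find a local optimum when nonconvex)
        theta_map = minimize(neg_log_posterior, theta0, method="BFGS").x
        d = theta_map.size
        # 2) Hessian of the negative log posterior at the MAP, by central finite differences
        H = np.zeros((d, d))
        for i in range(d):
            for j in range(d):
                e_i, e_j = np.eye(d)[i] * eps, np.eye(d)[j] * eps
                H[i, j] = (neg_log_posterior(theta_map + e_i + e_j)
                           - neg_log_posterior(theta_map + e_i - e_j)
                           - neg_log_posterior(theta_map - e_i + e_j)
                           + neg_log_posterior(theta_map - e_i - e_j)) / (4 * eps**2)
        # 3) Gaussian approximation N(theta_map, H^{-1})
        return theta_map, np.linalg.inv(H)

    mean, cov = laplace_approximation(np.zeros(2))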
Statistical techniques are needed to analyse data structures with complex dependencies so that clinically useful information can be extracted. Individual-specific networks, which capture dependencies in complex biological systems, are often summarized by graph-theoretical features. These features, which lend themselves to outcome modelling, can be subject to high variability due to arbitrary decisions in network inference and noise. Correlation-based adjacency matrices often need to be sparsified before meaningful graph-theoretical features can be extracted, requiring the data analyst to determine an optimal threshold. To address this issue, we propose to incorporate a flexible weighting function over the full range of possible thresholds to capture the variability of graph-theoretical features over the threshold domain. The potential of this approach, which extends concepts from functional data analysis to a graph-theoretical setting, is explored in a plasmode simulation study using real functional magnetic resonance imaging (fMRI) data from the Autism Brain Imaging Data Exchange (ABIDE) Preprocessed initiative. The simulations show that our modelling approach yields accurate estimates of the functional form of the weight function, improves inference efficiency, and achieves a comparable or reduced root mean square prediction error relative to competitor modelling approaches. This holds in settings where complex functional forms underlie the outcome-generating process and a universal threshold value is employed. We demonstrate the practical utility of our approach by using resting-state fMRI data to predict biological age in children. Our study establishes the flexible modelling approach as a statistically principled, serious competitor to ad-hoc methods, with superior performance.
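As a minimal illustration of the thresholding issue and the proposed remedy, the sketch below evaluates a graph-theoretical feature over a grid of sparsification thresholds and aggregates it with a weight function. The data, the choice of global efficiency as the feature, and the weight function are illustrative assumptions; in the proposed approach the weight function is estimated from data.

    # Minimal sketch: a graph feature evaluated over a grid of sparsification thresholds,
    # then aggregated with a weighting function over the threshold domain.
    # The correlation matrix, feature choice, and weight function are illustrative assumptions.
    import numpy as np
    import networkx as nx

    rng = np.random.default_rng(0)
    X = rng.standard_normal((100, 20))          # hypothetical time series: 100 scans x 20 regions
    C = np.abs(np.corrcoef(X, rowvar=False))    # correlation-based adjacency (absolute values)
    np.fill_diagonal(C, 0.0)

    thresholds = np.linspace(0.0, 1.0, 21)
    features = []
    for t in thresholds:
        G = nx.from_numpy_array((C >= t).astype(float))   # sparsify at threshold t
        features.append(nx.global_efficiency(G))          # example graph-theoretical feature
    features = np.array(features)

    # Instead of picking one "optimal" threshold, aggregate over all thresholds
    # with a weight function w(t); here an arbitrary example (in the paper, w is estimated).
    w = np.exp(-((thresholds - 0.5) ** 2) / 0.05)
    w /= w.sum()
    weighted_feature = np.sum(w * features)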
Graph convolutional networks and their variants have shown significant promise in 3D human pose estimation. Despite their success, most of these methods only consider spatial correlations between body joints and do not take into account temporal correlations, thereby limiting their ability to capture relationships in the presence of occlusions and inherent ambiguity. To address this potential weakness, we propose a spatio-temporal network architecture composed of a joint-mixing multi-layer perceptron block that facilitates communication among different joints and a graph weighted Jacobi network block that enables communication among various feature channels. The major novelty of our approach lies in a new weighted Jacobi feature propagation rule obtained through graph filtering with implicit fairing. We leverage temporal information from the 2D pose sequences, and integrate weight modulation into the model to enable untangling of the feature transformations of distinct nodes. We also employ adjacency modulation with the aim of learning meaningful correlations beyond defined linkages between body joints by altering the graph topology through a learnable modulation matrix. Extensive experiments on two benchmark datasets demonstrate the effectiveness of our model, outperforming recent state-of-the-art methods for 3D human pose estimation.
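As background for the graph filtering ingredient, the following is a generic sketch of implicit fairing on a graph, solving (I + lambda * L) Y = X with damped (weighted) Jacobi iterations. It illustrates the general mechanism only and does not reproduce the paper's propagation rule, weight/adjacency modulation, or network architecture.

    # Generic sketch of graph filtering with implicit fairing, solved by damped (weighted)
    # Jacobi iterations. Illustrative only; the paper's weighted Jacobi propagation rule,
    # modulation matrices, and architecture are not reproduced here.
    import numpy as np

    def implicit_fairing_jacobi(A, X, lam=1.0, omega=0.7, num_iters=20):
        """Approximately solve (I + lam * L) Y = X with L = D - A, via damped Jacobi."""
        D = np.diag(A.sum(axis=1))
        L = D - A                                  # combinatorial graph Laplacian
        M = np.eye(A.shape[0]) + lam * L
        Minv_diag = 1.0 / np.diag(M)
        Y = X.copy()
        for _ in range(num_iters):
            residual = X - M @ Y
            Y = Y + omega * (Minv_diag[:, None] * residual)   # damped Jacobi update
        return Y

    # toy skeleton graph: 4 joints in a chain, 3 feature channels per joint
    A = np.array([[0, 1, 0, 0],
                  [1, 0, 1, 0],
                  [0, 1, 0, 1],
                  [0, 0, 1, 0]], dtype=float)
    X = np.random.randn(4, 3)
    Y = implicit_fairing_jacobi(A, X)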
By a semi-Lagrangian change of coordinates, the hydrostatic Euler equations describing free-surface sheared flows are rewritten as a system of quasilinear equations, whose stability conditions can be determined by analysing its hyperbolic structure. This new system is quasilinear in time and the horizontal variables and no longer involves vertical derivatives. However, the coefficients of the horizontal derivatives include an integral operator acting on the new vertical variable. The spectrum of these operators is studied in detail; in particular, it includes a continuous part. Riemann invariants are then determined as conserved quantities along the characteristic curves. Examples of solutions are provided, in particular stationary solutions and solutions blowing up in finite time. Finally, we propose an exact multi-layer $\mathbb{P}_0$-discretization, which could be used to solve this semi-Lagrangian system numerically, and analyze the eigenvalues of the corresponding discretized operator to investigate the hyperbolic nature of the approximated system.
We study a family of distances between functions of a single variable. These distances are examples of integral probability metrics, and have been used previously for comparing probability measures. Special cases include the Earth Mover's Distance and the Kolmogorov Metric. We examine their properties for general signals, proving that they are robust to a broad class of perturbations and that the distance between one-dimensional tomographic projections of a two-dimensional function is bounded by the size of the difference in projection angles. We also establish error bounds for approximating the metric from finite samples, and prove that these approximations are robust to additive Gaussian noise. The results are illustrated in numerical experiments.
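Two members of this family can be made concrete for densities on the line via their cumulative functions, as in the minimal sketch below; the grid, signals, and normalization are illustrative assumptions rather than the paper's general setting.

    # Minimal sketch: two members of this family for densities on the line, computed from
    # samples on a grid via cumulative functions. Grid and signals are illustrative choices.
    import numpy as np

    def cdf_on_grid(f, dx):
        return np.cumsum(f) * dx

    def earth_movers_1d(f, g, dx):
        # Wasserstein-1 distance between 1D densities = L1 distance between their CDFs
        return np.sum(np.abs(cdf_on_grid(f, dx) - cdf_on_grid(g, dx))) * dx

    def kolmogorov_metric(f, g, dx):
        # Kolmogorov metric = sup-norm distance between the CDFs
        return np.max(np.abs(cdf_on_grid(f, dx) - cdf_on_grid(g, dx)))

    x = np.linspace(-5, 5, 1001)
    dx = x[1] - x[0]
    f = np.exp(-0.5 * x**2); f /= f.sum() * dx          # standard normal density
    g = np.exp(-0.5 * (x - 0.3)**2); g /= g.sum() * dx  # shifted normal density
    print(earth_movers_1d(f, g, dx), kolmogorov_metric(f, g, dx))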
We introduce and analyze a hybridizable discontinuous Galerkin (HDG) method for the dual-porosity-Stokes problem. This coupled problem describes the interaction between free flow in macrofractures/conduits, governed by the Stokes equations, and flow in microfractures/matrix, governed by a dual-porosity model. We prove that the HDG method is strongly conservative and well-posed, and we give an a priori error analysis showing the dependence on the problem parameters. Our theoretical findings are corroborated by numerical examples.
We aim to efficiently compute spreading speeds of reaction-diffusion-advection (RDA) fronts in divergence-free random flows under the Kolmogorov-Petrovsky-Piskunov (KPP) nonlinearity. We study a stochastic interacting particle method (IPM) for the reduced principal eigenvalue (Lyapunov exponent) problem of an associated linear advection-diffusion operator with spatially random coefficients. The Fourier representation of the random advection field and the Feynman-Kac (FK) formula for the principal eigenvalue (Lyapunov exponent) form the foundation of our method, implemented as a genetic evolution algorithm. The particles undergo advection-diffusion, and mutation/selection through a fitness function originating in the FK semigroup. We analyze convergence of the algorithm based on operator splitting and present numerical results on representative flows such as the 2D cellular flow and the 3D Arnold-Beltrami-Childress (ABC) flow under random perturbations. The 2D examples serve as a consistency check against semi-Lagrangian computation. The 3D results demonstrate that the IPM, being mesh-free and self-adaptive, is simple to implement and efficient for computing front spreading speeds in the advection-dominated regime for high-dimensional random flows on unbounded domains, where no truncation is needed.
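The genetic mutation/selection mechanism can be illustrated with a minimal sketch of a Feynman-Kac particle estimator for the principal eigenvalue of a toy operator kappa*Laplacian + v.grad + c on a periodic box; the flow, potential, and discretization below are placeholders and do not reproduce the paper's KPP front-speed setup.

    # Minimal sketch of a Feynman-Kac genetic particle estimator for the principal
    # eigenvalue (Lyapunov exponent) of kappa*Laplacian + v.grad + c on a periodic box.
    # The flow v and potential c are toy placeholders, not the paper's KPP setting.
    import numpy as np

    rng = np.random.default_rng(0)
    kappa, dt, T, N = 0.1, 1e-2, 20.0, 2000
    steps = int(T / dt)

    def v(x):  # toy 2D cellular flow (illustrative)
        return np.stack([-np.sin(x[:, 0]) * np.cos(x[:, 1]),
                          np.cos(x[:, 0]) * np.sin(x[:, 1])], axis=1)

    def c(x):  # toy potential (illustrative)
        return np.cos(x[:, 0]) + np.cos(x[:, 1])

    x = rng.uniform(0, 2 * np.pi, size=(N, 2))
    log_growth = 0.0
    for _ in range(steps):
        # mutation: advection-diffusion move of each particle
        x = x + v(x) * dt + np.sqrt(2 * kappa * dt) * rng.standard_normal(x.shape)
        x %= 2 * np.pi
        # selection: Feynman-Kac weights from the potential, then multinomial resampling
        w = np.exp(c(x) * dt)
        log_growth += np.log(w.mean())
        x = x[rng.choice(N, size=N, p=w / w.sum())]

    principal_eigenvalue = log_growth / T   # long-time growth rate of the FK semigroup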
We study a class of Gaussian processes for which the posterior mean, for a particular choice of data, replicates a truncated Taylor expansion of any order. The data consist of derivative evaluations at the expansion point, and the prior covariance kernel belongs to the class of Taylor kernels, which can be written in a certain power series form. We discuss and prove some results on maximum likelihood estimation of parameters of Taylor kernels. The proposed framework is a special case of Gaussian process regression based on data that are orthogonal in the reproducing kernel Hilbert space of the covariance kernel.
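The replication property can be checked directly for the Taylor kernel k(x, y) = exp(x*y): with derivative data at the expansion point 0, the Gram matrix of the derivative functionals is diagonal and the posterior mean reduces to the truncated Taylor polynomial. The sketch below verifies this numerically for f(x) = exp(x); the kernel and target function are illustrative choices.

    # Minimal sketch: GP regression with the Taylor kernel k(x, y) = sum_n x^n y^n / n!
    # (i.e., exp(x*y)) and derivative data at 0 reproduces the truncated Taylor expansion.
    import numpy as np
    from math import factorial

    m = 5                                                   # observations f^(0)(0), ..., f^(m-1)(0)
    a = np.array([1.0 / factorial(n) for n in range(m)])    # power series coefficients of exp(x*y)
    y = np.ones(m)                                          # derivatives of f(x) = exp(x) at 0

    # Gram matrix of derivative functionals: K[i, j] = d^i_x d^j_y k(0, 0) = a_i (i!)^2 if i == j
    K = np.diag([a[i] * factorial(i) ** 2 for i in range(m)])

    def posterior_mean(x):
        # cross-covariances k_*(x)[j] = d^j_y k(x, 0) = a_j j! x^j
        k_star = np.array([a[j] * factorial(j) * x ** j for j in range(m)])
        return k_star @ np.linalg.solve(K, y)

    x = 0.3
    print(posterior_mean(x))                                      # GP posterior mean at x
    print(sum(x ** n / factorial(n) for n in range(m)))           # truncated Taylor expansion of exp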
The aim of this article is to infer the connectivity structures of brain regions before and during an epileptic seizure. Our contributions are fourfold. First, we propose a 6N-dimensional stochastic differential equation for modelling the activity of N coupled populations of neurons in the brain. This model further develops the (single-population) stochastic Jansen and Rit neural mass model, which describes human electroencephalography (EEG) rhythms, in particular signals with epileptic activity. Second, we construct a reliable and efficient numerical scheme for simulating the model, extending a splitting procedure proposed for one neural population. Third, we propose an adapted Sequential Monte Carlo Approximate Bayesian Computation algorithm for simulation-based inference of both the relevant real-valued model parameters and the {0,1}-valued network parameters, the latter describing the coupling directions among the N modelled neural populations. Fourth, after illustrating and validating the proposed statistical approach on different types of simulated data, we apply it to a set of multi-channel EEG data recorded before and during an epileptic seizure. The real-data experiments suggest, for example, a larger activation in each neural population and a stronger connectivity in the left brain hemisphere during seizure.
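The simulation-based principle underlying the inference step can be illustrated with a plain rejection-ABC sketch (not the adapted SMC-ABC algorithm proposed here): propose parameters from the prior, simulate data, and keep proposals whose summary statistics are close to those of the observed data. The simulator, summaries, prior, and acceptance rule below are illustrative placeholders.

    # Minimal sketch of rejection ABC; the SDE model, summaries, and prior are placeholders.
    import numpy as np

    rng = np.random.default_rng(1)

    def simulate(theta, n=500):
        # placeholder simulator: an Ornstein-Uhlenbeck-type path standing in for the SDE model
        x = np.zeros(n)
        for t in range(1, n):
            x[t] = x[t - 1] - theta * x[t - 1] * 0.01 + 0.1 * rng.standard_normal()
        return x

    def summary(x):
        # placeholder summary statistics: standard deviation and lag-1 autocorrelation
        return np.array([x.std(), np.corrcoef(x[:-1], x[1:])[0, 1]])

    s_obs = summary(simulate(theta=0.5))          # "observed" data generated with theta = 0.5

    thetas = rng.uniform(0.0, 2.0, size=2000)     # proposals from a uniform prior
    dists = np.array([np.linalg.norm(summary(simulate(th)) - s_obs) for th in thetas])
    accepted = thetas[dists <= np.quantile(dists, 0.05)]   # keep the closest 5% of proposals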
Block majorization-minimization (BMM) is a simple iterative algorithm for nonconvex constrained optimization that sequentially minimizes majorizing surrogates of the objective function in each block coordinate while the other coordinates are held fixed. BMM encompasses a large class of optimization algorithms, such as block coordinate descent and its proximal-point variant, expectation-maximization, and block projected gradient descent. We establish that for general constrained nonconvex optimization, BMM with strongly convex surrogates can produce an $\epsilon$-stationary point within $O(\epsilon^{-2}(\log \epsilon^{-1})^{2})$ iterations and asymptotically converges to the set of stationary points. Furthermore, we propose a trust-region variant of BMM that can handle surrogates that are only convex and still obtain the same iteration complexity and asymptotic stationarity. These results hold robustly even when the convex sub-problems are solved inexactly, as long as the optimality gaps are summable. As an application, we show that a regularized version of the celebrated multiplicative update algorithm of Lee and Seung for nonnegative matrix factorization has iteration complexity $O(\epsilon^{-2}(\log \epsilon^{-1})^{2})$. The same result holds for a wide class of regularized nonnegative tensor decomposition algorithms as well as the classical block projected gradient descent algorithm. These theoretical results are validated through various numerical experiments.
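For reference, the classical Lee-Seung multiplicative updates for nonnegative matrix factorization, which the above analysis builds on, are sketched below for the Frobenius objective ||X - WH||_F^2; the regularized variant analyzed in the paper adds terms not shown here.

    # Minimal sketch of the classical Lee-Seung multiplicative updates for NMF,
    # minimizing ||X - W H||_F^2 over W, H >= 0 (the regularized variant is not shown).
    import numpy as np

    def nmf_multiplicative(X, r, num_iters=200, eps=1e-10):
        m, n = X.shape
        rng = np.random.default_rng(0)
        W = rng.random((m, r))
        H = rng.random((r, n))
        for _ in range(num_iters):
            H *= (W.T @ X) / (W.T @ W @ H + eps)    # update block H with W held fixed
            W *= (X @ H.T) / (W @ H @ H.T + eps)    # update block W with H held fixed
        return W, H

    X = np.abs(np.random.default_rng(1).standard_normal((50, 40)))
    W, H = nmf_multiplicative(X, r=5)
    print(np.linalg.norm(X - W @ H))                # reconstruction error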
The goal of explainable Artificial Intelligence (XAI) is to generate human-interpretable explanations, but there are no computationally precise theories of how humans interpret AI-generated explanations. The lack of theory means that validation of XAI must be done empirically, on a case-by-case basis, which prevents systematic theory-building in XAI. We propose a psychological theory of how humans draw conclusions from saliency maps, the most common form of XAI explanation, which for the first time allows for precise prediction of explainee inference conditioned on explanation. Our theory posits that, absent an explanation, humans expect the AI to make decisions similar to their own, and that they interpret an explanation by comparing it to the explanations they themselves would give. Comparison is formalized via Shepard's universal law of generalization in a similarity space, a classic theory from cognitive science. A pre-registered user study on AI image classifications with saliency map explanations demonstrates that our theory quantitatively matches participants' predictions of the AI.
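A minimal sketch of the comparison step: similarity between the explanation a person would themselves give and the AI's saliency map decays exponentially with distance in a similarity space (Shepard's law), and the resulting similarities are normalized into predicted choice probabilities. The maps, distance, and choice rule below are illustrative assumptions, not the fitted model from the study.

    # Minimal sketch of comparison via Shepard's universal law of generalization.
    # Saliency maps, distance, and choice rule are illustrative placeholders.
    import numpy as np

    def shepard_similarity(human_map, ai_map):
        # exponential generalization gradient over a distance in similarity space
        d = np.linalg.norm(human_map.ravel() - ai_map.ravel(), ord=1)
        return np.exp(-d)

    # hypothetical 4x4 saliency maps: the human's own explanation vs. AI explanations
    # associated with two candidate labels
    human = np.array([[0, 0, 1, 1], [0, 0, 1, 1], [0, 0, 0, 0], [0, 0, 0, 0]], dtype=float)
    ai_label_a = human + 0.1
    ai_label_b = 1.0 - human

    sims = np.array([shepard_similarity(human, ai_label_a),
                     shepard_similarity(human, ai_label_b)])
    predicted_probs = sims / sims.sum()   # predicted probabilities the participant assigns to the labels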