We develop sampling algorithms to fit Bayesian hierarchical models whose computational complexity scales linearly with the number of observations and the number of parameters in the model. We focus on crossed random effects and nested multilevel models, which are ubiquitous in applied sciences. The posterior dependence in both classes is sparse: in crossed random effects models it resembles a random graph, whereas in nested multilevel models it is tree-structured. For each class we identify a framework for scalable computation, building on previous work. Methods for crossed models are based on extensions of appropriately designed collapsed Gibbs samplers, where we introduce the idea of local centering; methods for nested models are based on sparse linear algebra and data augmentation. We provide a theoretical analysis of the proposed algorithms in some simplified settings, including a comparison with previously proposed methodologies and an average-case analysis based on random graph theory. Numerical experiments, including two challenging real-data analyses on predicting electoral results and real estate prices, show drastic improvements in performance over off-the-shelf Hamiltonian Monte Carlo.
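For concreteness, a canonical two-factor crossed random effects model of the kind the abstract refers to can be written as (notation illustrative, not taken from the paper):
\[
y_{ij} = \mu + a_i + b_j + \varepsilon_{ij}, \qquad a_i \sim N(0, \sigma_a^2), \quad b_j \sim N(0, \sigma_b^2), \quad \varepsilon_{ij} \sim N(0, \sigma_e^2),
\]
where the posterior dependence between the $a_i$ and $b_j$ is encoded by the bipartite graph of observed cells $(i, j)$, which for sparse designs resembles a random graph.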
Numerical models are now widely used in most engineering fields to simulate the behaviour of complex systems, such as power plants or wind turbines in the energy sector. These models are nevertheless affected by uncertainties of different natures (numerical, epistemic), which can undermine the reliability of their predictions. We develop here a new method for quantifying conditional parameter uncertainty within a chain of two numerical models in the context of multiphysics simulation. More precisely, we aim to calibrate the parameters $\theta$ of the second model of the chain conditionally on the value of the parameters $\lambda$ of the first model, assuming the probability distribution of $\lambda$ is known. This conditional calibration is carried out using the available experimental data of the second model. In doing so, we aim to quantify as accurately as possible the impact of the uncertainty in $\lambda$ on the uncertainty in $\theta$. To perform this conditional calibration, we set out a nonparametric Bayesian formalism to estimate the functional dependence between $\theta$ and $\lambda$, denoted by $\theta(\lambda)$. First, each component of $\theta(\lambda)$ is assumed to be a realization of a Gaussian process prior. Then, if the second model is linear in $\theta(\lambda)$, the Bayesian machinery allows us to compute the posterior predictive distribution of $\theta(\lambda)$ analytically for any set of realizations of $\lambda$. The effectiveness of the proposed method is illustrated on several analytical examples.
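As background for the analytic step: when the prior on a component of $\theta(\lambda)$ is a Gaussian process and the data enter linearly with Gaussian noise, the posterior predictive at a new $\lambda_*$ has the familiar Gaussian process regression form (shown here in its standard textbook version; the exact operators depend on the paper's model chain):
\[
m(\lambda_*) = k(\lambda_*, \Lambda) \left[ K + \sigma^2 I \right]^{-1} y, \qquad
v(\lambda_*) = k(\lambda_*, \lambda_*) - k(\lambda_*, \Lambda) \left[ K + \sigma^2 I \right]^{-1} k(\Lambda, \lambda_*),
\]
where $\Lambda$ collects the training realizations of $\lambda$, $K = k(\Lambda, \Lambda)$, and $\sigma^2$ is the observation-noise variance.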
Network motifs are recurrent, small-scale patterns of interactions observed frequently in a system. They shed light on the interplay between the topology and the dynamics of complex networks across various domains. In this work, we focus on the problem of counting occurrences of small sub-hypergraph patterns in very large hypergraphs, where higher-order interactions connect arbitrary numbers of system units. We show how directly exploiting higher-order structures speeds up the counting process compared to traditional data mining techniques for exact motif discovery. Moreover, with hyperedge sampling, performance improves further at the cost of small errors in the estimated motif frequencies. We evaluate our method on several real-world datasets describing face-to-face interactions, co-authorship, and human communication. We show that our approximate algorithm extracts higher-order motifs faster and at a larger scale, beyond the computational limits of an exact approach.
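To illustrate the flavour of sampling-based estimation, here is a minimal sketch (not the paper's algorithm); the pattern counted, pairs of overlapping hyperedges, is a deliberately simple stand-in for a motif:

```python
import random

def estimate_intersecting_pairs(hyperedges, num_samples=10_000, seed=0):
    """Estimate the number of hyperedge pairs sharing at least one node.

    Sampling-based stand-in for motif-frequency estimation: draw pairs
    uniformly at random, measure the hit rate, and rescale by the total
    number of pairs, giving an unbiased estimate of the exact count.
    """
    rng = random.Random(seed)
    sets = [frozenset(e) for e in hyperedges]
    n = len(sets)
    total_pairs = n * (n - 1) // 2
    hits = 0
    for _ in range(num_samples):
        i, j = rng.sample(range(n), 2)
        if sets[i] & sets[j]:
            hits += 1
    return total_pairs * hits / num_samples

# Example: three hyperedges, exactly one overlapping pair.
print(estimate_intersecting_pairs([{1, 2, 3}, {3, 4}, {5, 6}]))  # ~1.0
```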
We present a reduced basis stochastic Galerkin method for partial differential equations with random inputs. In this method, the reduced basis methodology is integrated into the stochastic Galerkin method, resulting in a significant reduction in the cost of solving the Galerkin system. To reduce the main cost of matrix-vector manipulation involved in our reduced basis stochastic Galerkin approach, the secant method is applied to identify the number of reduced basis functions. We present a general mathematical framework of the methodology, validate its accuracy and demonstrate its efficiency with numerical experiments.
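For reference, the generic secant iteration reads as below; how it is coupled to an error indicator to select the number of reduced basis functions is specific to the paper, so the usage note at the end is only a placeholder.

```python
def secant(f, x0, x1, tol=1e-10, max_iter=50):
    """Generic secant iteration for solving f(x) = 0."""
    for _ in range(max_iter):
        f0, f1 = f(x0), f(x1)
        if f1 == f0:  # flat secant; cannot proceed
            break
        x2 = x1 - f1 * (x1 - x0) / (f1 - f0)
        if abs(x2 - x1) < tol:
            return x2
        x0, x1 = x1, x2
    return x1

# Hypothetical use: locate where an error indicator err(n) crosses a
# tolerance, then round up to choose the reduced basis size n.
```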
We develop a numerical method for the Westervelt equation, an important equation in nonlinear acoustics, in a form where the attenuation is represented by a class of operators that are non-local in time. A semi-discretisation in time based on the trapezoidal rule and A-stable convolution quadrature is stated and analysed. Existence and regularity analysis of the continuous equations informs the stability and error analysis of the semi-discrete system. The error analysis includes consideration of the singularity at $t = 0$, which is addressed by a correction in the numerical scheme. Extensive numerical experiments confirm the theory.
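In standard (Lubich) convolution quadrature, a non-local-in-time term is discretized as (generic form; the specific attenuation kernel for the Westervelt equation is as in the paper):
\[
\int_0^{t_n} k(t_n - s)\, g(s)\, \mathrm{d}s \;\approx\; \sum_{j=0}^{n} \omega_{n-j}\, g(t_j), \qquad \sum_{j \ge 0} \omega_j \zeta^j = K\!\left(\frac{\delta(\zeta)}{\tau}\right),
\]
where $K$ is the Laplace transform of $k$, $\tau$ is the time step, and $\delta(\zeta) = 2(1-\zeta)/(1+\zeta)$ for the trapezoidal rule.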
Effective application of mathematical models to interpret biological data and make accurate predictions often requires that model parameters are identifiable. Approaches to assess the so-called structural identifiability of models are well established for ordinary differential equation models, yet there are no commonly adopted approaches for assessing the structural identifiability of the partial differential equation (PDE) models that are needed to capture the spatial features inherent in many phenomena. The differential algebra approach to structural identifiability has recently been demonstrated to be applicable to several specific PDE models. In this brief article, we present a general methodology for performing structural identifiability analysis on partially observed linear reaction-advection-diffusion (RAD) PDE models. We show that the differential algebra approach can always, in theory, be applied to linear RAD models. Moreover, despite the perceived complexity introduced by the addition of advection and diffusion terms, spatial analogues of non-spatial models cannot be less structurally identifiable than their non-spatial counterparts. Finally, we show that our approach can also be applied to a class of non-linear PDE models that are linear in the unobserved variables, and we conclude by discussing future possibilities and the computational cost of performing structural identifiability analysis on more general PDE models in mathematical biology.
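A prototypical partially observed linear RAD model of the kind discussed, in one spatial dimension and with illustrative notation, is
\[
\partial_t u = D\, \partial_x^2 u - a\, \partial_x u + r\, u, \qquad y(x, t) = C\, u(x, t),
\]
where only the output $y$ is measured and the identifiability question is which of $D$, $a$, $r$ (and $C$) can be recovered from it.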
Langevin dynamics are widely used to sample high-dimensional, non-Gaussian distributions whose densities are known up to a normalizing constant. In particular, there is strong interest in unadjusted Langevin algorithms (ULA), which directly discretize Langevin dynamics to estimate expectations over the target distribution. We study the use of transport maps that approximately normalize a target distribution as a way to precondition and accelerate the convergence of Langevin dynamics. We show that in continuous time, applying a transport map to Langevin dynamics yields a Riemannian manifold Langevin dynamics (RMLD) with a metric defined by the transport map. We also show that applying a transport map to an irreversibly perturbed ULA results in a geometry-informed irreversible perturbation (GiIrr) of the original dynamics. These connections suggest more systematic ways of learning metrics and perturbations, and they also yield alternative discretizations of the RMLD described by the map, which we study. Under appropriate conditions, these discretized processes can be endowed with non-asymptotic bounds describing convergence to the target distribution in 2-Wasserstein distance. Illustrative numerical results complement our theoretical claims.
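A minimal numerical sketch of the preconditioning idea, using a fixed linear transport map on a toy Gaussian target (illustrative only; the paper's maps are learned and generally nonlinear):

```python
import numpy as np

# Toy target: a correlated 2-D Gaussian standing in for a generic density
# known up to a normalizing constant.
Sigma = np.array([[1.0, 0.9], [0.9, 1.0]])
Sigma_inv = np.linalg.inv(Sigma)

def grad_log_pi(x):
    return -Sigma_inv @ x

# Fixed linear transport map T(y) = L y, with L the Cholesky factor of
# Sigma: it pushes a standard normal onto the target, so ULA in y-space
# sees a well-conditioned (here exactly standard normal) pullback density.
L = np.linalg.cholesky(Sigma)

def grad_log_pullback(y):
    # For a linear map the log-Jacobian is constant, so the pullback
    # gradient reduces to L^T grad log pi(L y).
    return L.T @ grad_log_pi(L @ y)

def preconditioned_ula(n_steps=5_000, h=0.1, seed=0):
    rng = np.random.default_rng(seed)
    y = np.zeros(2)
    xs = []
    for _ in range(n_steps):
        y = y + h * grad_log_pullback(y) + np.sqrt(2.0 * h) * rng.standard_normal(2)
        xs.append(L @ y)  # push each y-sample back to x-space
    return np.array(xs)

samples = preconditioned_ula()
print(samples.mean(axis=0))  # should be close to the target mean (0, 0)
```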
The complexity class Quantum Statistical Zero-Knowledge ($\mathsf{QSZK}$) captures the computational difficulty of the time-bounded quantum state testing problem with respect to the trace distance, known as the Quantum State Distinguishability Problem (QSDP), introduced by Watrous (FOCS 2002). However, QSDP is known to be in $\mathsf{QSZK}$ only within the constant polarizing regime, mirroring its classical counterpart shown by Sahai and Vadhan (JACM 2003) via the polarization lemma (error reduction for SDP). Recently, Berman, Degwekar, Rothblum, and Vasudevan (TCC 2019) extended the $\mathsf{SZK}$ containment for SDP beyond the polarizing regime via time-bounded distribution testing problems with respect to the triangular discrimination and the Jensen-Shannon divergence. Our work introduces proper quantum analogs of these problems by defining quantum counterparts of the triangular discrimination. We investigate whether the quantum analogs behave similarly to their classical counterparts and examine the limitations of existing approaches to polarization with respect to quantum distances. These new $\mathsf{QSZK}$-complete problems improve the $\mathsf{QSZK}$ containments for QSDP beyond the polarizing regime and establish a simple $\mathsf{QSZK}$-hardness proof for the quantum entropy difference problem (QEDP) defined by Ben-Aroya, Schwartz, and Ta-Shma (ToC 2010). Furthermore, we prove that QSDP with exponentially small errors is in $\mathsf{PP}$, while the same problem without error is in $\mathsf{NQP}$.
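For orientation, the classical triangular discrimination between distributions $P$ and $Q$ is
\[
\mathrm{TD}(P, Q) = \sum_x \frac{\left( P(x) - Q(x) \right)^2}{P(x) + Q(x)},
\]
and the quantum counterpart whose definition the abstract announces is the paper's contribution rather than a standard object.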
Run-times of quantum algorithms are often studied via an asymptotic, worst-case analysis. Whilst useful, such an analysis can fall short: it is not uncommon for algorithms with a large worst-case run-time to perform well on instances of practical interest. To remedy this, it is necessary to resort to run-time analyses of a more empirical nature, which for sufficiently small input sizes can be performed on a quantum device or a simulation thereof. For larger input sizes, alternative approaches are required. In this paper we consider an approach that combines classical emulation with detailed complexity bounds that include all constants. We simulate quantum algorithms by running classical versions of the sub-routines, whilst simultaneously collecting information about what the run-time of the quantum routine would have been had it been run instead. To do this accurately and efficiently for very large input sizes, we describe an estimation procedure and prove that it yields upper bounds on the true expected complexity of the quantum algorithms. We apply our method to some simple quantum speedups of classical heuristic algorithms for solving the well-studied MAX-$k$-SAT optimization problem. This requires rigorous bounds (including all constants) on the expected- and worst-case complexities of two important quantum sub-routines: Grover search with an unknown number of marked items, and quantum maximum-finding. These improve upon existing results and may be of broader interest. Amongst other results, we found that the classical heuristic algorithms we studied did not admit significant quantum speedups despite the existence of a theoretical per-step speedup. This suggests that an empirical analysis such as the one we implement here already yields insights beyond those available from an asymptotic analysis alone.
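As background on the first sub-routine: the classic result of Boyer, Brassard, Høyer, and Tapp (1998) is that Grover search succeeds with an expected number of oracle queries
\[
O\!\left( \sqrt{N / t} \right)
\]
when $t$ out of $N$ items are marked, even if $t$ is unknown in advance; the paper's contribution lies in sharpening such bounds to include all constants.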
We hypothesize that, due to the greedy nature of learning in multi-modal deep neural networks, these models tend to rely on just one modality while under-fitting the others. Such behavior is counter-intuitive and hurts the models' generalization, as we observe empirically. To estimate the model's dependence on each modality, we compute the gain in accuracy when the model has access to that modality in addition to another one. We refer to this gain as the conditional utilization rate. In our experiments, we consistently observe an imbalance in conditional utilization rates between modalities, across multiple tasks and architectures. Since the conditional utilization rate cannot be computed efficiently during training, we introduce a proxy for it based on the pace at which the model learns from each modality, which we refer to as the conditional learning speed. We propose an algorithm that balances the conditional learning speeds between modalities during training and demonstrate that it indeed addresses the issue of greedy learning. The proposed algorithm improves the model's generalization on three datasets: Colored MNIST, Princeton ModelNet40, and NVIDIA Dynamic Hand Gesture.
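In symbols (our notation, directly formalizing the description above), the conditional utilization rate of modality $m_1$ given modality $m_2$ is
\[
u(m_1 \mid m_2) = A(\{m_1, m_2\}) - A(\{m_2\}),
\]
where $A(M)$ denotes the model's accuracy when it has access to the set of modalities $M$.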
We derive information-theoretic generalization bounds for supervised learning algorithms based on the information contained in predictions rather than in the output of the training algorithm. These bounds improve over the existing information-theoretic bounds, are applicable to a wider range of algorithms, and solve two key challenges: (a) they give meaningful results for deterministic algorithms and (b) they are significantly easier to estimate. We show experimentally that the proposed bounds closely follow the generalization gap in practical scenarios for deep learning.
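For context, the weight-based bounds being improved upon have the form of Xu and Raginsky (2017): for a $\sigma$-subgaussian loss and $n$ training examples,
\[
\left| \mathbb{E}\left[ \mathrm{gen}(W, S) \right] \right| \le \sqrt{\frac{2\sigma^2}{n}\, I(W; S)},
\]
where $W$ is the output of the training algorithm and $S$ is the training set; the abstract's bounds replace the information in $W$ with the information contained in the predictions.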