The continuous-time Markov chain (CTMC) is the mathematical workhorse of evolutionary biology. Learning CTMC model parameters using modern, gradient-based methods requires the derivative of the matrix exponential evaluated at the CTMC's infinitesimal generator (rate) matrix. Motivated by the derivative's extreme computational complexity as a function of state space cardinality, recent work demonstrates the surprising effectiveness of a naive, first-order approximation for a host of problems in computational biology. In response to this empirical success, we obtain rigorous deterministic and probabilistic bounds for the error accrued by the naive approximation and establish a "blessing of dimensionality" result that is universal for a large class of rate matrices with random entries. Finally, we apply the first-order approximation within surrogate-trajectory Hamiltonian Monte Carlo for the analysis of the early spread of SARS-CoV-2 across 44 geographic regions that comprise a state space of unprecedented dimensionality for unstructured (flexible) CTMC models within evolutionary biology.
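For concreteness, one natural "naive" first-order approximation treats the rate matrix $Q$ and a perturbation direction $E = \partial Q/\partial\theta$ as if they commuted, giving $\partial e^{Qt}/\partial\theta \approx t\,E\,e^{Qt}$. The sketch below is our own illustration of that idea, not necessarily the paper's exact construction; it compares the surrogate to the exact Fr\'echet derivative computed by SciPy.

    import numpy as np
    from scipy.linalg import expm_frechet

    # Hypothetical small CTMC generator Q: non-negative off-diagonal rates, zero row sums.
    rng = np.random.default_rng(0)
    n = 5
    Q = rng.random((n, n))
    np.fill_diagonal(Q, 0.0)
    Q -= np.diag(Q.sum(axis=1))

    # Perturbation direction for a single rate, keeping row sums at zero.
    E = np.zeros((n, n))
    E[0, 1], E[0, 0] = 1.0, -1.0

    t = 0.1
    P, dP_exact = expm_frechet(Q * t, E * t)  # exact derivative of expm(Qt) in direction Et
    dP_naive = (t * E) @ P                    # naive surrogate assuming Q and E commute
    print(np.linalg.norm(dP_exact - dP_naive) / np.linalg.norm(dP_exact))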
To minimize the average of a set of log-convex functions, the stochastic Newton method iteratively updates its estimate using subsampled versions of the full objective's gradient and Hessian. We contextualize this optimization problem as sequential Bayesian inference on a latent state-space model with a discriminatively specified observation process. Applying Bayesian filtering then yields a novel optimization algorithm that considers the entire history of gradients and Hessians when forming an update. We establish matrix-based conditions under which the effect of older observations diminishes over time, in a manner analogous to Polyak's heavy-ball momentum. We illustrate various aspects of our approach with an example and review other relevant innovations for the stochastic Newton method.
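For orientation, the basic subsampled (stochastic) Newton update, written in generic notation and before any filtering of past information, reads
\[
\theta_{k+1} \;=\; \theta_k \;-\; \widehat{H}_k^{-1}\,\widehat{g}_k,
\qquad
\widehat{g}_k = \frac{1}{|S_k|}\sum_{i\in S_k}\nabla f_i(\theta_k),
\qquad
\widehat{H}_k = \frac{1}{|S_k|}\sum_{i\in S_k}\nabla^2 f_i(\theta_k),
\]
where $S_k$ is the current mini-batch; the filtering perspective above replaces the single-batch estimates $(\widehat{g}_k,\widehat{H}_k)$ with estimates that weight the entire history of such observations.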
We consider the solution of the biharmonic equation in mixed form, discretized by Hybrid High-Order (HHO) methods. The two resulting second-order elliptic problems can be decoupled via the introduction of a new unknown corresponding to the boundary value of the solution of the first Laplacian problem. This technique yields a global linear problem that can be solved iteratively via a Krylov-type method. More precisely, at each iteration of the scheme, two second-order elliptic problems have to be solved and a normal derivative on the boundary has to be computed. In this work, we specialize this scheme to the HHO discretization. To this end, an explicit technique to compute the discrete normal derivative of an HHO solution of a Laplacian problem is proposed. Moreover, we show that the resulting discrete scheme is well-posed. Finally, a new preconditioner is designed to speed up the convergence of the Krylov method. Numerical experiments assessing the performance of the proposed iterative algorithm on both two- and three-dimensional test cases are presented.
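For reference, the mixed (split) form alluded to above can be written, in the spirit of the classical Ciarlet-Raviart decomposition (notation ours), as
\[
\Delta^2 u = f \ \text{in } \Omega
\quad\longleftrightarrow\quad
\begin{cases}
-\Delta w = f & \text{in } \Omega,\\
-\Delta u = w & \text{in } \Omega,
\end{cases}
\]
so that, once the boundary trace $\lambda := w|_{\partial\Omega}$ of the first Laplacian solution is known, the two second-order problems decouple; the Krylov iteration then acts on $\lambda$, and each operator application amounts to solving the two Laplacian problems and evaluating a discrete normal derivative on $\partial\Omega$.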
This paper introduces Concurrent Valuation Algebras (CVAs), a novel extension of ordered valuation algebras (OVAs). CVAs include two combine operators representing parallel and sequential products, adhering to a weak exchange law. This development offers theoretical and practical benefits for the specification and modelling of concurrent and distributed systems. Being presheaves on a space of domains, CVAs enable localised specifications, supporting modularity, compositionality, and the ability to represent large and complex systems. Furthermore, CVAs align with lattice-based refinement reasoning and are compatible with established methodologies such as Hoare and Rely-Guarantee logics. The flexibility of CVAs is explored through three trace models, illustrating distinct paradigms of concurrent/distributed computing, interrelated by morphisms. The paper also highlights the potential to incorporate a powerful local computation framework from valuation algebras for model checking in concurrent and distributed systems. The foundational results presented have been verified with the proof assistant Isabelle/HOL.
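For intuition, writing $\parallel$ for the parallel product and $\mathbin{;}$ for the sequential product, the weak exchange law is usually stated in the form familiar from concurrent Kleene algebra (the CVA axiomatisation may differ in detail),
\[
(a \parallel b) \mathbin{;} (c \parallel d) \;\le\; (a \mathbin{;} c) \parallel (b \mathbin{;} d),
\]
where, reading the order as inclusion of behaviours, the left-hand side imposes a synchronisation barrier between the two phases and therefore admits no more behaviours than the right-hand side.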
Numerical methods such as the Finite Element Method (FEM) have been successfully adapted to exploit the computational power of GPU accelerators. However, much of the effort around applying FEM on GPUs has focused on high-order FEM, owing to its higher arithmetic intensity and order of accuracy. For applications such as the simulation of subsurface processes, high levels of heterogeneity result in high-resolution grids characterized by highly discontinuous (cell-wise) material property fields. Moreover, due to the significant uncertainties in the characterization of the domain of interest, e.g., geologic reservoirs, the benefits of high-order accuracy are reduced, and low-order methods are typically employed. In this study, we present a strategy for implementing highly performant low-order matrix-free FEM operator kernels in the context of the conjugate gradient (CG) method. Performance results for matrix-free Laplace and isotropic elasticity operator kernels are presented and are shown to compare favorably to matrix-based SpMV operators on V100, A100, and MI250X GPUs.
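To make the matrix-free idea concrete, the sketch below applies a discrete Laplacian as a callback inside CG, so no sparse matrix is ever assembled. It uses a simple finite-difference stencil on a uniform grid as a CPU/NumPy stand-in for the low-order FEM GPU kernels discussed in the paper; the structure (operator-as-callback inside CG) is the point, not the kernel itself.

    import numpy as np

    def apply_laplace_2d(u, n, h):
        """Matrix-free application of the 5-point Laplacian on an n x n grid
        with homogeneous Dirichlet data; u is a flat vector of interior values."""
        U = u.reshape(n, n)
        Up = np.pad(U, 1)  # zero Dirichlet boundary
        out = (4.0 * Up[1:-1, 1:-1] - Up[:-2, 1:-1] - Up[2:, 1:-1]
               - Up[1:-1, :-2] - Up[1:-1, 2:]) / h**2
        return out.ravel()

    def cg(apply_A, b, tol=1e-8, maxit=500):
        """Conjugate gradients with the operator supplied as a callback."""
        x = np.zeros_like(b)
        r = b - apply_A(x)
        p = r.copy()
        rs = r @ r
        for _ in range(maxit):
            Ap = apply_A(p)
            alpha = rs / (p @ Ap)
            x += alpha * p
            r -= alpha * Ap
            rs_new = r @ r
            if np.sqrt(rs_new) < tol:
                break
            p = r + (rs_new / rs) * p
            rs = rs_new
        return x

    n, h = 64, 1.0 / 65
    b = np.ones(n * n)
    x = cg(lambda v: apply_laplace_2d(v, n, h), b)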
A standard approach to solving ordinary differential equations, when they describe dynamical systems, is to adopt a Runge-Kutta or related scheme. Such schemes, however, are not applicable to the large class of equations which do not constitute dynamical systems. First, in several physical systems we encounter integro-differential equations with memory terms, where the time derivative of a state variable at a given time depends on all past states of the system. Second, there are equations whose solutions do not have well-defined Taylor series expansions. The Maxey-Riley-Gatignol equation, which describes the dynamics of an inertial particle in nonuniform and unsteady flow, presents both challenges. We use it as a test bed to address the questions we raise, but our method may be applied to all equations of this class. We show that the Maxey-Riley-Gatignol equation can be embedded into an extended Markovian system, constructed by introducing a new dynamical co-evolving state variable that encodes the memory of past states. We develop a Runge-Kutta algorithm for the resultant Markovian system. The form of the kernels involved in deriving the Runge-Kutta scheme necessitates the use of an expansion in powers of $t^{1/2}$. Our approach naturally inherits the benefits of standard time-integrators, namely a constant memory storage cost, a linear growth of operational effort with simulation time, and the ability to restart a simulation with the final state as the new initial condition.
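The generic mechanism can be illustrated on a simpler kernel. For an exponential memory kernel, the integro-differential equation
\[
\dot{x}(t) \;=\; F\!\big(x(t)\big) \;+\; \int_0^t e^{-\lambda (t-s)}\, G\!\big(x(s)\big)\, ds
\]
is equivalent to the memoryless (Markovian) system $\dot{x} = F(x) + m$, $\dot{m} = -\lambda m + G(x)$ with $m(0)=0$, obtained by introducing a co-evolving variable $m(t)$ that stores the memory term. The Maxey-Riley-Gatignol history force instead involves the Basset kernel $(t-s)^{-1/2}$, which is what makes the embedding above nontrivial and forces the expansion in powers of $t^{1/2}$.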
One major problem in the study of numerical semigroups is determining the growth of the semigroup tree. In the present work, infinite chains of numerical semigroups in the semigroup tree, first introduced in Bras-Amor\'os and Bulygin (2009), are studied. Computational results show that these chains are rare, but without them the tree would not be infinite. It is proved that for each genus $g\geq 5$ there are more semigroups of that genus not belonging to infinite chains than semigroups belonging to them. The reference Bras-Amor\'os and Bulygin (2009) presented a characterization of the semigroups that belong to infinite chains in terms of the coprimality of the left elements of the semigroup, as well as a result on the cardinality of the set of infinite chains to which a numerical semigroup belongs, stated in terms of the primality of the greatest common divisor of these left elements. We revisit these results and fix an imprecision on the cardinality of the set of infinite chains to which a semigroup belongs in the case when the greatest common divisor of the left elements is a prime number. We then look at infinite chains in subtrees with fixed multiplicity. When the multiplicity is a prime number, there is only one infinite chain in the tree of semigroups with that multiplicity. When the multiplicity is $4$ or $6$, we prove a self-replication behavior in the subtree and derive formulas for the number of semigroups of a given genus belonging to infinite chains, for multiplicity $4$ and $6$ respectively.
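As a small illustration of the coprimality condition mentioned above (an illustrative helper of our own, taking "left elements" to mean the elements of the semigroup smaller than its Frobenius number), the condition can be checked directly from a generating set:

    from math import gcd
    from functools import reduce

    def left_elements(gens, bound=200):
        """Elements of the numerical semigroup generated by gens that are smaller
        than its Frobenius number (the largest gap). Assumes gcd(gens) == 1 and
        that bound exceeds the Frobenius number."""
        reachable = {0}
        for n in range(1, bound):
            if any(n - g in reachable for g in gens):
                reachable.add(n)
        gaps = [n for n in range(bound) if n not in reachable]
        frobenius = max(gaps)
        return sorted(x for x in reachable if 0 < x < frobenius)

    # e.g. the semigroup generated by 4, 6 and 9: are its left elements coprime?
    L = left_elements([4, 6, 9])
    print(L, reduce(gcd, L) == 1)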
Quantization summarizes continuous distributions by calculating a discrete approximation. Among the widely adopted methods for data quantization is Lloyd's algorithm, which partitions the space into Vorono\"i cells, which can be seen as clusters, and constructs a discrete distribution based on their centroids and probability masses. Lloyd's algorithm estimates the optimal centroids in a minimal expected distance sense, but this approach poses significant challenges when data evaluation is costly and concerns rare events: the single cluster associated with the absence of any event then takes the majority of the probability mass. In this context, a metamodel is required, and adapted sampling methods are necessary to increase the precision of the computations on the rare clusters.
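For reference, a minimal implementation of the batch Lloyd iteration the abstract starts from (not of the metamodel-assisted, rare-event-aware variant it motivates) looks as follows:

    import numpy as np

    def lloyd(samples, k, iters=50, seed=0):
        """Plain batch Lloyd iteration: returns the centroids and the probability
        mass of each Voronoi cell, estimated from the samples."""
        rng = np.random.default_rng(seed)
        centroids = samples[rng.choice(len(samples), size=k, replace=False)]
        for _ in range(iters):
            dists = np.linalg.norm(samples[:, None, :] - centroids[None, :, :], axis=2)
            labels = dists.argmin(axis=1)          # assign each sample to its nearest centroid
            for j in range(k):
                members = samples[labels == j]
                if len(members):
                    centroids[j] = members.mean(axis=0)  # recompute the cell centroid
        masses = np.bincount(labels, minlength=k) / len(samples)
        return centroids, masses

    X = np.random.default_rng(1).normal(size=(5000, 2))
    centroids, masses = lloyd(X, k=8)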
We study the scaling limits of stochastic gradient descent (SGD) with constant step-size in the high-dimensional regime. We prove limit theorems for the trajectories of summary statistics (i.e., finite-dimensional functions) of SGD as the dimension goes to infinity. Our approach allows one to choose the summary statistics that are tracked, the initialization, and the step-size. It yields both ballistic (ODE) and diffusive (SDE) limits, with the limit depending dramatically on these choices. We identify a critical scaling regime for the step-size, below which the effective ballistic dynamics matches gradient flow for the population loss, but at which a new correction term appears and changes the phase diagram. Around the fixed points of this effective dynamics, the corresponding diffusive limits can be quite complex and even degenerate. We demonstrate our approach on popular examples including estimation for spiked matrix and tensor models and classification via two-layer networks for binary and XOR-type Gaussian mixture models. These examples exhibit surprising phenomena, including multimodal timescales to convergence as well as convergence to sub-optimal solutions with probability bounded away from zero from random (e.g., Gaussian) initializations. At the same time, we demonstrate the benefit of overparametrization by showing that the latter probability goes to zero as the second-layer width grows.
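In generic notation (ours, written here in the online/streaming form), the object of study is the constant step-size SGD iteration
\[
X_{k+1} \;=\; X_k \;-\; \delta\, \nabla_x \ell\big(X_k;\, Y_{k+1}\big),
\]
with a fresh sample $Y_{k+1}$ at each step, together with a finite family of summary statistics $u = (u_1,\dots,u_m)\colon \mathbb{R}^d \to \mathbb{R}^m$; the limit theorems describe the trajectories $t \mapsto u\big(X_{\lfloor t/\delta\rfloor}\big)$ as the dimension $d \to \infty$ with the step-size $\delta = \delta(d)$ at or below the critical scaling.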
Graphical models have demonstrated their effectiveness in numerous tasks ranging from biological analysis to recommender systems. However, graphical models with hub nodes are computationally difficult to fit, particularly when the dimension of the data is large. To estimate hub graphical models efficiently, we introduce a two-phase algorithm. The proposed algorithm first generates a good initial point via a dual alternating direction method of multipliers (ADMM), and then warm-starts a semismooth Newton (SSN) based augmented Lagrangian method (ALM) to compute a solution that is accurate enough for practical tasks. The sparsity structure of the generalized Jacobian ensures that the algorithm can obtain a good solution very efficiently. Comprehensive experiments on both synthetic and real data show that it clearly outperforms existing state-of-the-art algorithms. In particular, in some high-dimensional tasks, it can save more than 70\% of the execution time while still achieving a high-quality estimate.
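A common convex formulation of the hub graphical model (given here as one representative instance, since the abstract does not fix the formulation) decomposes the precision matrix into a sparse part $Z$ and a part $V$ with only a few nonzero (hub) columns:
\[
\min_{\Theta \succ 0,\; \Theta = Z + V + V^{\top}} \;\; -\log\det\Theta + \operatorname{tr}(S\Theta)
+ \lambda_1 \|Z\|_1 + \lambda_2 \|V\|_1 + \lambda_3 \sum_{j} \|V_{\cdot j}\|_2,
\]
where $S$ is the sample covariance, the group penalty on the columns of $V$ selects the hubs, and diagonal entries are typically excluded from the penalties; the two-phase ADMM-then-SSN-ALM strategy targets problems of this type.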
The goal of explainable Artificial Intelligence (XAI) is to generate human-interpretable explanations, but there are no computationally precise theories of how humans interpret AI-generated explanations. This lack of theory means that the validation of XAI must be done empirically, on a case-by-case basis, which prevents systematic theory-building in XAI. We propose a psychological theory of how humans draw conclusions from saliency maps, the most common form of XAI explanation, which for the first time allows for precise prediction of explainee inference conditioned on the explanation. Our theory posits that, absent an explanation, humans expect the AI to make decisions similar to their own, and that they interpret an explanation by comparison to the explanations they themselves would give. Comparison is formalized via Shepard's universal law of generalization in a similarity space, a classic theory from cognitive science. A pre-registered user study on AI image classifications with saliency map explanations demonstrates that our theory quantitatively matches participants' predictions of the AI.
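In its classic form, Shepard's law states that the probability of generalising from stimulus $x$ to stimulus $y$ decays exponentially with their distance in psychological (similarity) space,
\[
g(x, y) \;=\; \exp\!\big(-c\, d(x, y)\big), \qquad c > 0,
\]
which is the comparison mechanism the theory above instantiates: the AI's explanation is judged against the explanation the participant would have given, with similarity computed in such a space.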