We analyze the Wasserstein distance ($W$-distance) between two probability distributions associated with two multidimensional jump-diffusion processes. Specifically, we analyze a temporally decoupled squared $W_2$-distance, which provides both upper and lower bounds associated with the discrepancies in the drift, diffusion, and jump amplitude functions between the two jump-diffusion processes. Then, we propose a temporally decoupled squared $W_2$-distance method for efficiently reconstructing unknown jump-diffusion processes from data using parameterized neural networks. We further show its performance can be enhanced by utilizing prior information on the drift function of the jump-diffusion process. The effectiveness of our proposed reconstruction method is demonstrated across several examples and applications.
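As a rough illustration of the quantity named in the abstract above, the following sketch assumes that the temporally decoupled squared $W_2$-distance is the time integral of the squared $W_2$-distance between the two marginal distributions at each time, approximated here by a Riemann sum. It is restricted to one dimension and to equal-size empirical samples, where the squared $W_2$-distance reduces to the mean squared difference of sorted samples. The function names and the data layout (`traj[i]` holds the samples at time $t_i$) are hypothetical and not taken from the paper.

```python
def w2_squared_1d(xs, ys):
    """Squared W2 distance between two equal-size 1D empirical samples.

    In 1D the optimal coupling matches sorted samples, so the squared
    distance is the mean squared difference of the order statistics.
    """
    if len(xs) != len(ys):
        raise ValueError("this sketch assumes equal sample sizes")
    xs_sorted, ys_sorted = sorted(xs), sorted(ys)
    return sum((a - b) ** 2 for a, b in zip(xs_sorted, ys_sorted)) / len(xs)


def time_decoupled_w2_squared(traj_a, traj_b, dt):
    """Riemann-sum approximation of the time integral of W2^2 over time.

    traj_a[i] and traj_b[i] are lists of samples observed at time t_i
    (assumed layout); dt is the uniform time step (assumed).
    """
    if len(traj_a) != len(traj_b):
        raise ValueError("trajectories must share the same time grid")
    return dt * sum(w2_squared_1d(a, b) for a, b in zip(traj_a, traj_b))


# Example: two time points, two samples each.
ground_truth = [[0.0, 1.0], [0.0, 2.0]]
reconstructed = [[1.0, 2.0], [2.0, 0.0]]
loss = time_decoupled_w2_squared(ground_truth, reconstructed, dt=0.5)
```

Because each time slice is compared independently, the loss needs only marginal samples at each time rather than matched trajectories.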
We present a novel and mathematically transparent approach to function approximation and the training of large, high-dimensional neural networks, based on the approximate least-squares solution of associated Fredholm integral equations of the first kind by Ritz-Galerkin discretization, Tikhonov regularization and tensor-train methods. Practical applications to supervised learning problems of regression and classification type confirm that the resulting algorithms are competitive with state-of-the-art neural network-based methods.
The stochastic description of chemical reaction networks with the kinetic chemical master equation (CME) is important for studying biological cells, but it suffers from the curse of dimensionality: The amount of data to be stored grows exponentially with the number of chemical species and thus exceeds the capacity of common computational devices for realistic problems. Therefore, time-dependent model order reduction techniques such as the dynamical low-rank approximation are desirable. In this paper we propose a dynamical low-rank algorithm for the kinetic CME using binary tree tensor networks. The dimensionality of the problem is reduced in this approach by hierarchically dividing the reaction network into partitions. Only reactions that cross partitions are subject to an approximation error. We demonstrate by two numerical examples (a 5-dimensional lambda phage model and a 20-dimensional reaction cascade) that the proposed method drastically reduces memory consumption and shows improved computational performance and better accuracy compared to a Monte Carlo method.
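The exponential storage growth described in the abstract above can be made concrete with a small count. The sketch below compares the number of entries in a full probability tensor with a rough upper bound for a binary tree tensor network. It makes simplifying assumptions that are not taken from the paper: one leaf per species, `n` admissible copy numbers per species, and a uniform rank `r` on every tree edge. The function names are hypothetical.

```python
def full_storage(n, d):
    """Entries of the full probability tensor: n states for each of d species."""
    return n ** d


def tree_storage_upper_bound(n, d, r):
    """Rough upper bound on entries of a binary tree tensor network.

    Assumptions (illustrative only): one leaf per species storing an
    n-by-r basis matrix, and d - 1 internal nodes (a binary tree with
    d leaves), each storing at most an r-by-r-by-r connecting tensor.
    """
    leaves = d * n * r
    internal = (d - 1) * r ** 3
    return leaves + internal


# Example: 20 species, 10 states each, rank 5 on every edge.
full = full_storage(10, 20)
tree = tree_storage_upper_bound(10, 20, 5)
```

Under these assumptions the full tensor grows exponentially in the number of species, while the tree bound grows linearly in it for fixed rank.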
We consider the discretization of the $1d$-integral Dirichlet fractional Laplacian by $hp$-finite elements. We present quadrature schemes to set up the stiffness matrix and load vector that preserve the exponential convergence of $hp$-FEM on geometric meshes. The schemes are based on Gauss-Jacobi and Gauss-Legendre rules. We show that taking a number of quadrature points slightly exceeding the polynomial degree is enough to preserve root exponential convergence. The total number of algebraic operations to set up the system is $\mathcal{O}(N^{5/2})$, where $N$ is the problem size. Numerical examples illustrate the analysis. We also extend our analysis to the fractional Laplacian in higher dimensions for $hp$-finite element spaces based on shape regular meshes.
Efficient and accurate algorithms are necessary to reconstruct particles in the highly granular detectors anticipated at the High-Luminosity Large Hadron Collider and the Future Circular Collider. We study scalable machine learning models for event reconstruction in electron-positron collisions based on a full detector simulation. Particle-flow reconstruction can be formulated as a supervised learning task using tracks and calorimeter clusters. We compare a graph neural network and kernel-based transformer and demonstrate that we can avoid quadratic operations while achieving realistic reconstruction. We show that hyperparameter tuning significantly improves the performance of the models. The best graph neural network model shows improvement in the jet transverse momentum resolution by up to 50% compared to the rule-based algorithm. The resulting model is portable across Nvidia, AMD and Habana hardware. Accurate and fast machine-learning based reconstruction can significantly improve future measurements at colliders.
The class of subweibull distributions has recently been shown to generalize the important properties of subexponential and subgaussian random variables. We describe alternative characterizations of subweibull distributions and detail the conditions under which their tail behavior is preserved after exponential tilting.
A fundamental quantity of interest in Shannon theory, classical or quantum, is the error exponent of a given channel $W$ and rate $R$: the constant $E(W,R)$ which governs the exponential decay of decoding error when using ever larger optimal codes of fixed rate $R$ to communicate over ever more (memoryless) instances of a given channel $W$. Nearly matching lower and upper bounds are well-known for classical channels. Here I show a lower bound on the error exponent of communication over arbitrary classical-quantum (CQ) channels which matches Dalai's sphere-packing upper bound [IEEE TIT 59, 8027 (2013)] for rates above a critical value, exactly analogous to the case of classical channels. Unlike the classical case, however, the argument does not proceed via a refined analysis of a suitable decoder, but instead by leveraging a bound by Hayashi on the error exponent of the cryptographic task of privacy amplification [CMP 333, 335 (2015)]. This bound is then related to the coding problem via tight entropic uncertainty relations and Gallager's method of constructing capacity-achieving parity-check codes for arbitrary channels. Along the way, I find a lower bound on the error exponent of the task of compression of classical information relative to quantum side information that matches the sphere-packing upper bound of Cheng et al. [IEEE TIT 67, 902 (2021)]. In turn, the polynomial prefactors to the sphere-packing bound found by Cheng et al. may be translated to the privacy amplification problem, sharpening a recent result by Li, Yao, and Hayashi [IEEE TIT 69, 1680 (2023)], at least for linear randomness extractors.
It is a notorious open question whether integer programs (IPs), with an integer coefficient matrix $M$ whose subdeterminants are all bounded by a constant $\Delta$ in absolute value, can be solved in polynomial time. We answer this question in the affirmative if we further require that, by removing a constant number of rows and columns from $M$, one obtains a submatrix $A$ that is the transpose of a network matrix. Our approach focuses on the case where $A$ arises from $M$ after removing $k$ rows only, where $k$ is a constant. We achieve our result in two main steps, the first related to the theory of IPs and the second related to graph minor theory. First, we derive a strong proximity result for the case where $A$ is a general totally unimodular matrix: Given an optimal solution of the linear programming relaxation, an optimal solution to the IP can be obtained by finding a constant number of augmentations by circuits of $[A\; I]$. Second, for the case where $A$ is the transpose of a network matrix, we reformulate the problem as a maximum constrained integer potential problem on a graph $G$. We observe that if $G$ is $2$-connected, then it has no rooted $K_{2,t}$-minor for $t = \Omega(k \Delta)$. We leverage this to obtain a tree-decomposition of $G$ into highly structured graphs for which we can solve the problem locally. This allows us to solve the global problem via dynamic programming.
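To make the bounded-subdeterminant condition in the abstract above concrete, the following sketch computes the largest absolute value of any square subdeterminant of a small integer matrix by brute force. It is an illustration of the definition only, with hypothetical function names; it enumerates all square submatrices, so it is exponential in the matrix size and is not part of the paper's algorithm. A matrix is totally unimodular exactly when this value is at most 1.

```python
from itertools import combinations


def det_int(m):
    """Exact integer determinant by cofactor expansion along the first row."""
    n = len(m)
    if n == 1:
        return m[0][0]
    total = 0
    for j in range(n):
        # Minor obtained by deleting row 0 and column j.
        minor = [row[:j] + row[j + 1:] for row in m[1:]]
        total += (-1) ** j * m[0][j] * det_int(minor)
    return total


def max_abs_subdeterminant(m):
    """Largest |det| over all square submatrices of m (brute force)."""
    rows, cols = len(m), len(m[0])
    best = 0
    for k in range(1, min(rows, cols) + 1):
        for r in combinations(range(rows), k):
            for c in combinations(range(cols), k):
                sub = [[m[i][j] for j in c] for i in r]
                best = max(best, abs(det_int(sub)))
    return best


# Example: a 2-by-2 matrix with a subdeterminant of absolute value 2.
delta = max_abs_subdeterminant([[1, 1], [-1, 1]])
```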
\v{C}ech persistence diagrams (PDs) are topological descriptors routinely used to capture the geometry of complex datasets. They are commonly compared using the Wasserstein distances $OT_{p}$; however, the extent to which PDs are stable with respect to these metrics remains poorly understood. We partially close this gap by focusing on the case where datasets are sampled on an $m$-dimensional submanifold of $\mathbb{R}^{d}$. Under this manifold hypothesis, we show that convergence with respect to the $OT_{p}$ metric happens exactly when $p > m$. We also provide improvements upon the bottleneck stability theorem in this case and prove new laws of large numbers for the total $\alpha$-persistence of PDs. Finally, we show how these theoretical findings shed new light on the behavior of the feature maps on the space of PDs that are used in ML-oriented applications of Topological Data Analysis.
Computational models of syntax are predominantly text-based. Here we propose that the most basic syntactic operations can be modeled directly from raw speech in a fully unsupervised way. We focus on one of the most ubiquitous and elementary properties of syntax -- concatenation. We introduce spontaneous concatenation: a phenomenon where convolutional neural networks (CNNs) trained on acoustic recordings of individual words start generating outputs with two or even three words concatenated without ever accessing data with multiple words in the input. We replicate this finding in several independently trained models with different hyperparameters and training data. Additionally, networks trained on two words learn to embed words into novel unobserved word combinations. To our knowledge, this is a previously unreported property of CNNs trained in the ciwGAN/fiwGAN setting on raw speech and has implications both for our understanding of how these architectures learn and for modeling syntax and its evolution from raw acoustic inputs.
Variance reduction for causal inference in the presence of network interference is often achieved through either outcome modeling, which is typically analyzed under unit-randomized Bernoulli designs, or clustered experimental designs, which are typically analyzed without strong parametric assumptions. In this work, we study the intersection of these two approaches and consider the problem of estimation in low-order outcome models using data from a general experimental design. Our contributions are threefold. First, we present an estimator of the total treatment effect (also called the global average treatment effect) in a low-degree outcome model when the data are collected under general experimental designs, generalizing previous results for Bernoulli designs. We refer to this estimator as the pseudoinverse estimator and give bounds on its bias and variance in terms of properties of the experimental design. Second, we evaluate these bounds for the case of cluster randomized designs with both Bernoulli and complete randomization. For clustered Bernoulli randomization, we find that our estimator is always unbiased and that its variance scales like the smaller of the variance obtained from a low-order assumption and the variance obtained from cluster randomization, showing that combining these variance reduction strategies is preferable to using either individually. For clustered complete randomization, we find a notable bias-variance trade-off mediated by specific features of the clustering. Third, when choosing a clustered experimental design, our bounds can be used to select a clustering from a set of candidate clusterings. Across a range of graphs and clustering algorithms, we show that our method consistently selects clusterings that perform well on a range of response models, suggesting that our bounds are useful to practitioners.