Given a pointed metric space $(X,\mathsf{dist}, w)$ on $n$ points, its Gromov's approximating tree is a 0-hyperbolic pseudo-metric space $(X,\mathsf{dist}_T)$ such that $\mathsf{dist}(x,w)=\mathsf{dist}_T(x,w)$ and $\mathsf{dist}(x, y)-2 \delta \log_2n \leq \mathsf{dist}_T (x, y) \leq \mathsf{dist}(x, y)$ for all $x, y \in X$ where $\delta$ is the Gromov hyperbolicity of $X$. On the other hand, the all pairs bottleneck paths (APBP) problem asks, given an undirected graph with some capacities on its edges, to find the maximal path capacity between each pair of vertices. In this note, we prove: $\bullet$ Computing Gromov's approximating tree for a metric space with $n+1$ points from its matrix of distances reduces to solving the APBP problem on an connected graph with $n$ vertices. $\bullet$ There is an explicit algorithm that computes Gromov's approximating tree for a graph from its adjacency matrix in quadratic time. $\bullet$ Solving the APBP problem on a weighted graph with $n$ vertices reduces to finding Gromov's approximating tree for a metric space with $n+1$ points from its distance matrix.
The computation of off-diagonal blocks of matrix functions $f(T)$, where $T$ is block triangular, poses a challenging problem in scientific computing. We present a novel algorithm that exploits the structure of block triangular matrices, generalizing the algorithm of Al-Mohy and Higham for computing the Fr\'echet derivative of the matrix exponential. This work has significant applications in fields such as exponential integrators for solving systems of first-order differential equations, Hamiltonian linear systems in control theory, and option pricing in finance. Our approach introduces a linear operator that maps off-diagonal blocks of $T$ into their counterparts in $f(T)$. By studying the algebraic properties of the operator, we establish a comprehensive computational framework, paving the way to extend existing Fr\'echet derivative algorithms of matrix functions to more general settings. For the matrix exponential, in particular, the algorithm employs the scaling and squaring method with diagonal Pad\'e approximants to $\exp(x)$, with parameters chosen based on a rigorous backward error analysis, which notably does not depend on the norm of the off-diagonal blocks. The numerical experiment demonstrates that our algorithm surpasses existing algorithms in terms of accuracy and efficiency, making it highly valuable for a wide range of applications.
We present \textbf{H}ybrid-\textbf{A}utoregressive \textbf{IN}ference Tr\textbf{AN}sducers (HAINAN), a novel architecture for speech recognition that extends the Token-and-Duration Transducer (TDT) model. Trained with randomly masked predictor network outputs, HAINAN supports both autoregressive inference with all network components and non-autoregressive inference without the predictor. Additionally, we propose a novel semi-autoregressive inference paradigm that first generates an initial hypothesis using non-autoregressive inference, followed by refinement steps where each token prediction is regenerated using parallelized autoregression on the initial hypothesis. Experiments on multiple datasets across different languages demonstrate that HAINAN achieves efficiency parity with CTC in non-autoregressive mode and with TDT in autoregressive mode. In terms of accuracy, autoregressive HAINAN outperforms TDT and RNN-T, while non-autoregressive HAINAN significantly outperforms CTC. Semi-autoregressive inference further enhances the model's accuracy with minimal computational overhead, and even outperforms TDT results in some cases. These results highlight HAINAN's flexibility in balancing accuracy and speed, positioning it as a strong candidate for real-world speech recognition applications.
We introduce $5/2$- and $7/2$-order $L^2$-accurate randomized Runge-Kutta-Nystr\"{o}m methods, tailored for approximating Hamiltonian flows within non-reversible Markov chain Monte Carlo samplers, such as unadjusted Hamiltonian Monte Carlo and unadjusted kinetic Langevin Monte Carlo. We establish quantitative $5/2$-order $L^2$-accuracy upper bounds under gradient and Hessian Lipschitz assumptions on the potential energy function. The numerical experiments demonstrate the superior efficiency of the proposed unadjusted samplers on a variety of well-behaved, high-dimensional target distributions.
Nutrient load simulators are large, deterministic, models that simulate the hydrodynamics and biogeochemical processes in aquatic ecosystems. They are central tools for planning cost efficient actions to fight eutrophication since they allow scenario predictions on impacts of nutrient load reductions to, e.g., harmful algal biomass growth. Due to being computationally heavy, the uncertainties related to these predictions are typically not rigorously assessed though. In this work, we developed a novel Bayesian computational approach for estimating the uncertainties in predictions of the Finnish coastal nutrient load model FICOS. First, we constructed a likelihood function for the multivariate spatiotemporal outputs of the FICOS model. Then, we used Bayes optimization to locate the posterior mode for the model parameters conditional on long term monitoring data. After that, we constructed a space filling design for FICOS model runs around the posterior mode and used it to train a Gaussian process emulator for the (log) posterior density of the model parameters. We then integrated over this (approximate) parameter posterior to produce probabilistic predictions for algal biomass and chlorophyll a concentration under alternative nutrient load reduction scenarios. Our computational algorithm allowed for fast posterior inference and the Gaussian process emulator had good predictive accuracy within the highest posterior probability mass region. The posterior predictive scenarios showed that the probability to reach the EUs Water Framework Directive objectives in the Finnish Archipelago Sea is generally low even under large load reductions.
Symbolic regression (SR) is a powerful machine learning approach that searches for both the structure and parameters of algebraic models, offering interpretable and compact representations of complex data. Unlike traditional regression methods, SR explores progressively complex feature spaces, which can uncover simple models that generalize well, even from small datasets. Among SR algorithms, the Sure Independence Screening and Sparsifying Operator (SISSO) has proven particularly effective in the natural sciences, helping to rediscover fundamental physical laws as well as discover new interpretable equations for materials property modeling. However, its widespread adoption has been limited by performance inefficiencies and the challenges posed by its FORTRAN-based implementation, especially in modern computing environments. In this work, we introduce TorchSISSO, a native Python implementation built in the PyTorch framework. TorchSISSO leverages GPU acceleration, easy integration, and extensibility, offering a significant speed-up and improved accuracy over the original. We demonstrate that TorchSISSO matches or exceeds the performance of the original SISSO across a range of tasks, while dramatically reducing computational time and improving accessibility for broader scientific applications.
To effectively study complex causal systems, it is often useful to construct representations that simplify parts of the system by discarding irrelevant details while preserving key features. The Information Bottleneck (IB) method is a widely used approach in representation learning that compresses random variables while retaining information about a target variable. Traditional methods like IB are purely statistical and ignore underlying causal structures, making them ill-suited for causal tasks. We propose the Causal Information Bottleneck (CIB), a causal extension of the IB, which compresses a set of chosen variables while maintaining causal control over a target variable. This method produces representations which are causally interpretable, and which can be used when reasoning about interventions. We present experimental results demonstrating that the learned representations accurately capture causality as intended.
Common practice in modern machine learning involves fitting a large number of parameters relative to the number of observations. These overparameterized models can exhibit surprising generalization behavior, e.g., ``double descent'' in the prediction error curve when plotted against the raw number of model parameters, or another simplistic notion of complexity. In this paper, we revisit model complexity from first principles, by first reinterpreting and then extending the classical statistical concept of (effective) degrees of freedom. Whereas the classical definition is connected to fixed-X prediction error (in which prediction error is defined by averaging over the same, nonrandom covariate points as those used during training), our extension of degrees of freedom is connected to random-X prediction error (in which prediction error is averaged over a new, random sample from the covariate distribution). The random-X setting more naturally embodies modern machine learning problems, where highly complex models, even those complex enough to interpolate the training data, can still lead to desirable generalization performance under appropriate conditions. We demonstrate the utility of our proposed complexity measures through a mix of conceptual arguments, theory, and experiments, and illustrate how they can be used to interpret and compare arbitrary prediction models.
Significant work has been done on computing the ``average'' optimal solution value for various $\mathsf{NP}$-complete problems using the Erd\"{o}s-R\'{e}nyi model to establish \emph{critical thresholds}. Critical thresholds define narrow bounds for the optimal solution of a problem instance such that the probability that the solution value lies outside these bounds vanishes as the instance size approaches infinity. In this paper, we extend the Erd\"{o}s-R\'{e}nyi model to general hypergraphs on $n$ vertices and $M$ hyperedges. We consider the problem of determining critical thresholds for the largest cardinality matching, and we show that for $M=o(1.155^n)$ the size of the maximum cardinality matching is almost surely 1. On the other hand, if $M=\Theta(2^n)$ then the size of the maximum cardinality matching is $\Omega(n^{\frac12-\gamma})$ for an arbitrary $\gamma >0$. Lastly, we address the gap where $\Omega(1.155^n)=M=o(2^n)$ empirically through computer simulations.
Union volume estimation is a classical algorithmic problem. Given a family of objects $O_1,\ldots,O_n \subseteq \mathbb{R}^d$, we want to approximate the volume of their union. In the special case where all objects are boxes (also known as hyperrectangles) this is known as Klee's measure problem. The state-of-the-art algorithm [Karp, Luby, Madras '89] for union volume estimation and Klee's measure problem in constant dimension $d$ computes a $(1+\varepsilon)$-approximation with constant success probability by using a total of $O(n/\varepsilon^2)$ queries of the form (i) ask for the volume of $O_i$, (ii) sample a point uniformly at random from $O_i$, and (iii) query whether a given point is contained in $O_i$. We show that if one can only interact with the objects via the aforementioned three queries, the query complexity of [Karp, Luby, Madras '89] is indeed optimal, i.e., $\Omega(n/\varepsilon^2)$ queries are necessary. Our lower bound already holds for estimating the union of equiponderous axis-aligned polygons in $\mathbb{R}^2$, and even if the algorithm is allowed to inspect the coordinates of the points sampled from the polygons, and still holds when a containment query can ask containment of an arbitrary (not sampled) point. Guided by the insights of the lower bound, we provide a more efficient approximation algorithm for Klee's measure problem improving the $O(n/\varepsilon^2)$ time to $O((n+\frac{1}{\varepsilon^2}) \cdot \log^{O(d)}n)$. We achieve this improvement by exploiting the geometry of Klee's measure problem in various ways: (1) Since we have access to the boxes' coordinates, we can split the boxes into classes of boxes of similar shape. (2) Within each class, we show how to sample from the union of all boxes, by using orthogonal range searching. And (3) we exploit that boxes of different classes have small intersection, for most pairs of classes.
The {\em Spanning Tree Congestion} problem is an easy-to-state NP-hard problem: given a graph $G$, construct a spanning tree $T$ of $G$ minimizing its maximum edge congestion where the congestion of an edge $e\in T$ is the number of edges $uv$ in $G$ such that the unique path between $u$ and $v$ in $T$ passes through $e$; the optimum value for a given graph $G$ is denoted $STC(G)$. It is known that {\em every} spanning tree is an $n/2$-approximation. A long-standing problem is to design a better approximation algorithm. Our contribution towards this goal is an $O(\Delta\cdot\log^{3/2}n)$-approximation algorithm for the minimum congestion spanning tree problem where $\Delta$ is the maximum degree in $G$. For graphs with maximum degree bounded by polylog of the number of vertices, this is an exponential improvement over the previous best approximation. For graphs with maximum degree bounded by $o(n/\log^{3/2}n)$, we get $o(n)$-approximation; this is the largest class of graphs that we know of, for which sublinear approximation is known for this problem. Our main tool for the algorithm is a new lower bound on the spanning tree congestion which is of independent interest. We prove that for every graph $G$, $STC(G)\geq \Omega(hb(G)/\Delta)$ where $hb(G)$ denotes the maximum bisection width over all subgraphs of $G$.