We provide a perfect sampling algorithm for the hard-sphere model on subsets of $\mathbb{R}^d$ with expected running time linear in the volume, under the assumption of strong spatial mixing. Many perfect and approximate sampling algorithms have been devised for the hard-sphere model; ours is efficient for a range of parameters for which only efficient approximate samplers were previously known, and it is faster than these approximate approaches. Our methods also extend to the more general setting of Gibbs point processes interacting via finite-range, repulsive potentials.
The computation of off-diagonal blocks of matrix functions $f(T)$, where $T$ is block triangular, poses a challenging problem in scientific computing. We present a novel algorithm that exploits the structure of block triangular matrices, generalizing the algorithm of Al-Mohy and Higham for computing the Fr\'echet derivative of the matrix exponential. This problem has significant applications in fields such as exponential integrators for solving systems of first-order differential equations, Hamiltonian linear systems in control theory, and option pricing in finance. Our approach introduces a linear operator that maps off-diagonal blocks of $T$ into their counterparts in $f(T)$. By studying the algebraic properties of this operator, we establish a comprehensive computational framework, paving the way to extend existing Fr\'echet derivative algorithms for matrix functions to more general settings. For the matrix exponential, in particular, the algorithm employs the scaling and squaring method with diagonal Pad\'e approximants to $\exp(x)$, with parameters chosen based on a rigorous backward error analysis that, notably, does not depend on the norm of the off-diagonal blocks. Numerical experiments demonstrate that our algorithm surpasses existing algorithms in both accuracy and efficiency, making it highly valuable for a wide range of applications.
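For intuition about the operator in question, take $f = \exp$ and block triangular $T$ with diagonal blocks $A, B$ and off-diagonal block $C$: the $(1,2)$ block of $\exp(T)$ is the image of $C$ under a linear operator determined by $A$ and $B$. The sketch below verifies this linearity numerically via a dense reference computation; it is an illustration only, not the paper's structured algorithm, and the matrix sizes and entries are arbitrary choices for the demo.

```python
import numpy as np
from scipy.linalg import expm

rng = np.random.default_rng(0)
n = 4
A = rng.standard_normal((n, n))
B = rng.standard_normal((n, n))
C = rng.standard_normal((n, n))

# Block triangular T = [[A, C], [0, B]]
T = np.block([[A, C], [np.zeros((n, n)), B]])
X = expm(T)[:n, n:]          # off-diagonal block of exp(T)

# The map C -> X is linear for fixed A, B: doubling C doubles X.
T2 = np.block([[A, 2 * C], [np.zeros((n, n)), B]])
X2 = expm(T2)[:n, n:]
print(np.allclose(X2, 2 * X))   # True (up to rounding)
```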
We introduce $5/2$- and $7/2$-order $L^2$-accurate randomized Runge-Kutta-Nystr\"{o}m methods, tailored for approximating Hamiltonian flows within non-reversible Markov chain Monte Carlo samplers, such as unadjusted Hamiltonian Monte Carlo and unadjusted kinetic Langevin Monte Carlo. We establish quantitative $5/2$-order $L^2$-accuracy upper bounds under gradient and Hessian Lipschitz assumptions on the potential energy function. Numerical experiments demonstrate the superior efficiency of the proposed unadjusted samplers on a variety of well-behaved, high-dimensional target distributions.
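For intuition, here is a minimal sketch of the randomized-midpoint idea underlying randomized Runge-Kutta-Nystr\"{o}m integrators, applied to the Hamiltonian system $\dot q = p$, $\dot p = -\nabla U(q)$. The quadratic potential, step size, and simple stage construction are illustrative assumptions; this does not reproduce the paper's $5/2$- and $7/2$-order schemes.

```python
import numpy as np

def grad_U(q):
    return q    # quadratic potential U(q) = |q|^2 / 2 (assumed toy target)

def randomized_midpoint_step(q, p, h, rng):
    """One randomized step for the system q' = p, p' = -grad_U(q)."""
    u = rng.uniform()                    # random evaluation time in (0, h)
    q_mid = q + u * h * p                # Euler prediction of q(u * h)
    p_mid = p - u * h * grad_U(q)        # Euler prediction of p(u * h)
    q_new = q + h * p_mid                # full step driven by the state at
    p_new = p - h * grad_U(q_mid)        # the random midpoint
    return q_new, p_new

rng = np.random.default_rng(1)
q, p = np.ones(2), np.zeros(2)
for _ in range(1000):
    q, p = randomized_midpoint_step(q, p, h=0.1, rng=rng)
print(q, p)    # the trajectory remains bounded for this step size
```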
Nutrient load simulators are large, deterministic models that simulate the hydrodynamics and biogeochemical processes in aquatic ecosystems. They are central tools for planning cost-efficient actions to fight eutrophication, since they allow scenario predictions of the impacts of nutrient load reductions on, e.g., harmful algal biomass growth. Because these models are computationally heavy, however, the uncertainties related to their predictions are typically not rigorously assessed. In this work, we developed a novel Bayesian computational approach for estimating the uncertainties in predictions of the Finnish coastal nutrient load model FICOS. First, we constructed a likelihood function for the multivariate spatiotemporal outputs of the FICOS model. Then, we used Bayesian optimization to locate the posterior mode for the model parameters conditional on long-term monitoring data. After that, we constructed a space-filling design for FICOS model runs around the posterior mode and used it to train a Gaussian process emulator for the (log) posterior density of the model parameters. We then integrated over this (approximate) parameter posterior to produce probabilistic predictions for algal biomass and chlorophyll a concentration under alternative nutrient load reduction scenarios. Our computational algorithm allowed for fast posterior inference, and the Gaussian process emulator had good predictive accuracy within the highest posterior probability mass region. The posterior predictive scenarios showed that the probability of reaching the EU's Water Framework Directive objectives in the Finnish Archipelago Sea is generally low even under large load reductions.
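The workflow can be illustrated on a toy problem. The sketch below is a stand-in for the FICOS setup, with a hypothetical two-parameter log posterior, but it follows the same steps: evaluate the expensive log posterior on a space-filling design, train a Gaussian process emulator on the log-density values, and integrate over the approximate posterior by self-normalized importance sampling.

```python
import numpy as np
from scipy.stats import qmc
from sklearn.gaussian_process import GaussianProcessRegressor
from sklearn.gaussian_process.kernels import RBF

def log_post(theta):            # stand-in for an expensive simulator run
    return -0.5 * np.sum((theta - 0.3) ** 2, axis=-1) / 0.05

# Space-filling (Sobol) design over the region around the mode, here [0, 1]^2.
design = qmc.Sobol(d=2, seed=0).random(64)
y = log_post(design)

gp = GaussianProcessRegressor(kernel=RBF(length_scale=0.2)).fit(design, y)

# Self-normalized importance sampling with a uniform proposal.
rng = np.random.default_rng(0)
theta = rng.uniform(size=(10_000, 2))
logw = gp.predict(theta)
w = np.exp(logw - logw.max())
w /= w.sum()
post_mean = w @ theta           # emulator-based posterior mean
print(post_mean)                # close to the true mode [0.3, 0.3]
```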
By selecting different filter functions, spectral algorithms can generate various regularization methods for solving statistical inverse problems within the learning-from-samples framework. This paper combines distributed spectral algorithms with Sobolev kernels to tackle the functional linear regression problem. The design and mathematical analysis of the algorithms require only that the functional covariates are observed at discrete sample points. Furthermore, the hypothesis function spaces of the algorithms are the Sobolev spaces generated by the Sobolev kernels, which offer both strong approximation capability and flexibility. By establishing regularity conditions for the target function and the functional covariate, we derive matching upper and lower bounds for the convergence of the distributed spectral algorithms in the Sobolev norm. This demonstrates that the proposed regularity conditions are reasonable and that the convergence analysis under these conditions is tight, capturing the essential characteristics of functional linear regression. The analytical techniques and estimates developed in this paper also improve upon existing results in the literature.
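The algorithmic core, spectral filtering combined with divide-and-conquer averaging, can be sketched in a simplified scalar kernel regression setting (the paper's actual setting is functional linear regression with discretely observed covariates): each machine applies a filter function, here the Tikhonov filter $g_\lambda(\sigma) = 1/(\sigma + \lambda)$, to its local kernel eigensystem, and the local estimators are averaged. The kernel, filter, and data below are illustrative assumptions.

```python
import numpy as np

def sobolev_kernel(s, t):
    # Reproducing kernel of a first-order Sobolev space on [0, 1]
    return 1.0 + np.minimum(s, t)

def local_estimate(x, y, lam, filt):
    K = sobolev_kernel(x[:, None], x[None, :])
    evals, evecs = np.linalg.eigh(K / len(x))
    g = filt(np.maximum(evals, 0.0), lam)       # filter the eigenvalues
    alpha = evecs @ (g * (evecs.T @ y)) / len(x)
    return lambda t: sobolev_kernel(t[:, None], x[None, :]) @ alpha

tikhonov = lambda e, lam: 1.0 / (e + lam)       # one choice of filter

rng = np.random.default_rng(0)
f_true = lambda t: np.sin(2 * np.pi * t)
machines = []
for _ in range(5):                              # 5 machines, 100 samples each
    x = rng.uniform(size=100)
    y = f_true(x) + 0.1 * rng.standard_normal(100)
    machines.append(local_estimate(x, y, lam=1e-3, filt=tikhonov))

t = np.linspace(0, 1, 5)
f_hat = np.mean([m(t) for m in machines], axis=0)   # distributed average
print(np.c_[f_true(t), f_hat])
```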
We present the first polynomial-time algorithm to exactly compute the number of labeled chordal graphs on $n$ vertices. Our algorithm solves a more general problem: given $n$ and $\omega$ as input, it computes the number of $\omega$-colorable labeled chordal graphs on $n$ vertices, using $O(n^7)$ arithmetic operations. A standard sampling-to-counting reduction then yields a polynomial-time exact sampler that generates an $\omega$-colorable labeled chordal graph on $n$ vertices uniformly at random. Our counting algorithm improves upon the previous best result by Wormald (1985), which computes the number of labeled chordal graphs on $n$ vertices in time exponential in $n$. An implementation of the polynomial-time counting algorithm gives the number of labeled chordal graphs on up to $30$ vertices in less than three minutes on a standard desktop computer. Previously, the number of labeled chordal graphs was only known for graphs on up to $15$ vertices. In addition, we design two approximation algorithms: (1) an approximate counting algorithm that computes a $(1\pm\varepsilon)$-approximation of the number of $n$-vertex labeled chordal graphs, and (2) an approximate sampling algorithm that generates a random labeled chordal graph according to a distribution whose total variation distance from the uniform distribution is at most $\varepsilon$. The approximate counting algorithm runs in $O(n^3\log{n}\log^7(1/\varepsilon))$ time, and the approximate sampling algorithm runs in $O(n^3\log{n}\log^7(1/\varepsilon))$ expected time.
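For comparison, a brute-force counter, exponential in $n$ unlike the paper's $O(n^7)$-operation algorithm, can verify the first few terms of the sequence. The sketch below tests chordality of every labeled graph and is feasible only for very small $n$.

```python
from itertools import combinations
import networkx as nx

def count_labeled_chordal(n):
    pairs = list(combinations(range(n), 2))
    count = 0
    for mask in range(1 << len(pairs)):          # all labeled graphs on n vertices
        g = nx.Graph()
        g.add_nodes_from(range(n))
        g.add_edges_from(p for i, p in enumerate(pairs) if mask >> i & 1)
        if nx.is_chordal(g):
            count += 1
    return count

for n in range(1, 6):
    print(n, count_labeled_chordal(n))           # 1, 2, 8, 61, 822
```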
Given two sets $R$ and $B$ of $n$ points in the plane, we present efficient algorithms to find a two-line linear classifier that best separates the "red" points in $R$ from the "blue" points in $B$ and is robust to outliers. More precisely, we find a region $\mathcal{W}_B$ bounded by two lines, that is, a halfplane, strip, wedge, or double wedge, containing (most of) the blue points $B$ and few red points. Our running times range from an optimal $O(n\log n)$ to roughly $O(n^3)$, depending on the type of region $\mathcal{W}_B$ and on whether we wish to minimize only red outliers, only blue outliers, or both.
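As a baseline, the halfplane case admits a simple cubic-time brute force (far slower than the algorithms in the paper): an optimal separating line can be assumed to pass through two input points, so one can enumerate all such lines and count the outliers on each side. The point sets below are illustrative.

```python
from itertools import combinations

def side(a, b, p):
    # Sign of the cross product: which side of line a-b the point p lies on
    return (b[0] - a[0]) * (p[1] - a[1]) - (b[1] - a[1]) * (p[0] - a[0])

def best_halfplane(red, blue):
    pts = red + blue
    best = len(pts)
    for a, b in combinations(pts, 2):
        for sgn in (1, -1):
            # Blue side: sgn * side(...) >= 0.  Outliers are blue points on
            # the red side plus red points on the blue side.
            out = sum(1 for p in blue if sgn * side(a, b, p) < 0)
            out += sum(1 for p in red if sgn * side(a, b, p) >= 0)
            best = min(best, out)
    return best

red = [(0, 0), (1, 0), (5, 5)]       # one red point deep in blue territory
blue = [(4, 4), (5, 4), (4, 5)]
print(best_halfplane(red, blue))     # 1 outlier: the stray red point
```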
In an error-correcting code, a sender encodes a message $x \in \{ 0, 1 \}^k$ such that it is still decodable by a receiver on the other end of a noisy channel. In the setting of \emph{error-correcting codes with feedback}, after sending each bit, the sender learns what was received at the other end and can tailor future messages accordingly. While the unique decoding radius of feedback codes has long been known to be $\frac13$, the list decoding capabilities of feedback codes are not well understood. In this paper, we provide the first nontrivial bounds on the list decoding radius of feedback codes for lists of size $\ell$. For $\ell = 2$, we fully determine the $2$-list decoding radius to be $\frac37$. For larger values of $\ell$, we show an upper bound of $\frac12 - \frac{1}{2^{\ell + 2} - 2}$ and show that the techniques used in the $\ell = 2$ case cannot match this upper bound in general.
A property $\Pi$ on a finite set $U$ is \emph{monotone} if for every $X \subseteq U$ satisfying $\Pi$, every superset $Y \subseteq U$ of $X$ also satisfies $\Pi$. Many combinatorial properties can be seen as monotone properties. The problem of finding a minimum subset of $U$ satisfying $\Pi$ is a central problem in combinatorial optimization. Although many approximate/exact algorithms have been developed to solve this kind of problem for numerous properties, a solution obtained by these algorithms is often unsuitable for real-world applications due to the difficulty of building accurate mathematical models of real-world problems. A promising approach to overcoming this difficulty is to \emph{enumerate} multiple small solutions rather than to \emph{find} a single small solution. To this end, given a weight function $w: U \to \mathbb N$ and an integer $k$, we devise algorithms that \emph{approximately} enumerate all minimal subsets of $U$ with weight at most $k$ satisfying $\Pi$ for various monotone properties $\Pi$, where "approximate enumeration" means that the algorithms output all minimal subsets satisfying $\Pi$ whose weight is at most $k$, and may also output some minimal subsets satisfying $\Pi$ whose weight exceeds $k$ but is at most $ck$ for some constant $c \ge 1$. These algorithms allow us to efficiently enumerate minimal vertex covers, minimal dominating sets in bounded-degree graphs, minimal feedback vertex sets, minimal hitting sets in bounded-rank hypergraphs, etc., of weight at most $k$ with constant approximation factors.
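As a concrete instantiation, the sketch below enumerates minimal vertex covers of weight at most $k$ by brute force, exponential in $|U|$ and exact rather than approximate, purely to make the enumerated objects tangible; the graph and weights are illustrative.

```python
from itertools import combinations

def minimal_vertex_covers_up_to(vertices, edges, w, k):
    def is_cover(s):
        return all(u in s or v in s for u, v in edges)
    for r in range(len(vertices) + 1):
        for sub in combinations(vertices, r):
            s = set(sub)
            if not is_cover(s):
                continue
            # Minimality: removing any single vertex must break the cover.
            if all(not is_cover(s - {v}) for v in s):
                if sum(w[v] for v in s) <= k:
                    yield s

edges = [(0, 1), (1, 2), (2, 3), (3, 0)]      # a 4-cycle
w = {0: 1, 1: 2, 2: 1, 3: 2}
for cover in minimal_vertex_covers_up_to(range(4), edges, w, k=3):
    print(sorted(cover))                       # {0, 2}, weight 2 <= 3
```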
Significant work has been done on computing the ``average'' optimal solution value for various $\mathsf{NP}$-complete problems using the Erd\H{o}s-R\'{e}nyi model to establish \emph{critical thresholds}. Critical thresholds define narrow bounds for the optimal solution of a problem instance such that the probability that the solution value lies outside these bounds vanishes as the instance size approaches infinity. In this paper, we extend the Erd\H{o}s-R\'{e}nyi model to general hypergraphs on $n$ vertices and $M$ hyperedges. We consider the problem of determining critical thresholds for the maximum cardinality matching, and we show that for $M=o(1.155^n)$ the size of the maximum cardinality matching is almost surely $1$. On the other hand, if $M=\Theta(2^n)$ then the size of the maximum cardinality matching is $\Omega(n^{\frac12-\gamma})$ for arbitrary $\gamma >0$. Lastly, we address the gap where $M=\Omega(1.155^n)$ and $M=o(2^n)$ empirically through computer simulations.
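A simulation in this spirit can be sketched as follows, under the assumption (consistent with $1.155 \approx (4/3)^{1/2}$) that each hyperedge contains each vertex independently with probability $1/2$; greedy matching gives a lower bound on the maximum matching size, and the parameters below are illustrative.

```python
import numpy as np

def greedy_matching_size(n, M, rng):
    used = np.zeros(n, dtype=bool)
    size = 0
    for _ in range(M):
        e = rng.random(n) < 0.5          # random hyperedge as a vertex mask
        if e.any() and not (e & used).any():
            used |= e                     # disjoint from the matching so far
            size += 1
    return size

rng = np.random.default_rng(0)
n = 20
for M in (10, 100, 10_000):
    sizes = [greedy_matching_size(n, M, rng) for _ in range(50)]
    print(M, np.mean(sizes))             # grows slowly with M
```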
Union volume estimation is a classical algorithmic problem. Given a family of objects $O_1,\ldots,O_n \subseteq \mathbb{R}^d$, we want to approximate the volume of their union. In the special case where all objects are boxes (also known as hyperrectangles), this is known as Klee's measure problem. The state-of-the-art algorithm [Karp, Luby, Madras '89] for union volume estimation and Klee's measure problem in constant dimension $d$ computes a $(1+\varepsilon)$-approximation with constant success probability using a total of $O(n/\varepsilon^2)$ queries of the form (i) ask for the volume of $O_i$, (ii) sample a point uniformly at random from $O_i$, and (iii) query whether a given point is contained in $O_i$. We show that if one can only interact with the objects via these three queries, the query complexity of [Karp, Luby, Madras '89] is indeed optimal, i.e., $\Omega(n/\varepsilon^2)$ queries are necessary. Our lower bound already holds for estimating the union of equiponderous (equal-volume) axis-aligned polygons in $\mathbb{R}^2$, and it persists even if the algorithm may inspect the coordinates of the points sampled from the polygons and may ask containment queries for arbitrary (not necessarily sampled) points. Guided by the insights of the lower bound, we provide a more efficient approximation algorithm for Klee's measure problem, improving the $O(n/\varepsilon^2)$ time to $O((n+\frac{1}{\varepsilon^2}) \cdot \log^{O(d)}n)$. We achieve this improvement by exploiting the geometry of Klee's measure problem in various ways: (1) Since we have access to the boxes' coordinates, we can split the boxes into classes of boxes of similar shape. (2) Within each class, we show how to sample from the union of all boxes using orthogonal range searching. (3) We exploit that, for most pairs of classes, boxes from different classes have small intersection.
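For reference, the [Karp, Luby, Madras '89] estimator itself is short: sample a box with probability proportional to its volume, sample a uniform point from it, and count the sample only if the chosen box is the first one containing the point. The sketch below implements it for two overlapping boxes (an illustrative instance) using exactly the three allowed query types.

```python
import numpy as np

rng = np.random.default_rng(0)
boxes = [(np.array([0.0, 0.0]), np.array([1.0, 1.0])),   # (lower, upper)
         (np.array([0.5, 0.5]), np.array([1.5, 1.5]))]

vol = np.array([np.prod(hi - lo) for lo, hi in boxes])   # query (i): volumes
V = vol.sum()

def klm_estimate(samples):
    hits = 0
    idx = rng.choice(len(boxes), size=samples, p=vol / V)
    for i in idx:
        lo, hi = boxes[i]
        x = rng.uniform(lo, hi)          # query (ii): uniform point in box i
        # Query (iii): is i the first box containing x ("canonical" sample)?
        first = next(j for j, (l, h) in enumerate(boxes)
                     if np.all(x >= l) and np.all(x <= h))
        hits += (first == i)
    return V * hits / samples            # E[hits/samples] = vol(union) / V

print(klm_estimate(100_000))   # true union volume: 1 + 1 - 0.25 = 1.75
```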