We present a new algorithm for automatically bounding the Taylor remainder series. In the special case of a scalar function $f: \mathbb{R} \to \mathbb{R}$, our algorithm takes as input a reference point $x_0$, trust region $[a, b]$, and integer $k \ge 1$, and returns an interval $I$ such that $f(x) - \sum_{i=0}^{k-1} \frac {1} {i!} f^{(i)}(x_0) (x - x_0)^i \in I (x - x_0)^k$ for all $x \in [a, b]$. As in automatic differentiation, the function $f$ is provided to the algorithm in symbolic form, and must be composed of known atomic functions. At a high level, our algorithm has two steps. First, for a variety of commonly-used elementary functions (e.g., $\exp$, $\log$), we use recently-developed theory to derive sharp polynomial upper and lower bounds on the Taylor remainder series. We then recursively combine the bounds for the elementary functions using an interval arithmetic variant of Taylor-mode automatic differentiation. Our algorithm can make efficient use of machine learning hardware accelerators, and we provide an open source implementation in JAX. We then turn our attention to applications. Most notably, in a companion paper we use our new machinery to create the first universal majorization-minimization optimization algorithms: algorithms that iteratively minimize an arbitrary loss using a majorizer that is derived automatically, rather than by hand. We also show that our automatically-derived bounds can be used for verified global optimization and numerical integration, and to prove sharper versions of Jensen's inequality.
Distributed maximization of a submodular function in the MapReduce model has received much attention, culminating in two frameworks that allow a centralized algorithm to be run in the MR setting without loss of approximation, as long as the centralized algorithm satisfies a certain consistency property - which had only been shown to be satisfied by the standard greedy and continous greedy algorithms. A separate line of work has studied parallelizability of submodular maximization in the adaptive complexity model, where each thread may have access to the entire ground set. For the size-constrained maximization of a monotone and submodular function, we show that several sublinearly adaptive algorithms satisfy the consistency property required to work in the MR setting, which yields highly practical parallelizable and distributed algorithms. Also, we develop the first linear-time distributed algorithm for this problem with constant MR rounds. Finally, we provide a method to increase the maximum cardinality constraint for MR algorithms at the cost of additional MR rounds.
We consider the problem of identifying, from statistics, a distribution of discrete random variables $X_1,\ldots,X_n$ that is a mixture of $k$ product distributions. The best previous sample complexity for $n \in O(k)$ was $(1/\zeta)^{O(k^2 \log k)}$ (under a mild separation assumption parameterized by $\zeta$). The best known lower bound was $\exp(\Omega(k))$. It is known that $n\geq 2k-1$ is necessary and sufficient for identification. We show, for any $n\geq 2k-1$, how to achieve sample complexity and run-time complexity $(1/\zeta)^{O(k)}$. We also extend the known lower bound of $e^{\Omega(k)}$ to match our upper bound across a broad range of $\zeta$. Our results are obtained by combining (a) a classic method for robust tensor decomposition, (b) a novel way of bounding the condition number of key matrices called Hadamard extensions, by studying their action only on flattened rank-1 tensors.
We present a generalisation of the theory of quantitative algebras of Mardare, Panangaden and Plotkin where (i) the carriers of quantitative algebras are not restricted to be metric spaces and can be arbitrary fuzzy relations or generalised metric spaces, and (ii) the interpretations of the algebraic operations are not required to be nonexpansive. Our main results include: a novel sound and complete proof system, the proof that free quantitative algebras always exist, the proof of strict monadicity of the induced Free-Forgetful adjunction, the result that all monads (on fuzzy relations) that lift finitary monads (on sets) admit a quantitative equational presentation.
The hitting set problem is a well-known NP-hard optimization problem in which, given a set of elements and a collection of subsets, the goal is to find the smallest selection of elements, such that each subset contains at least one element in the selection. Many geometric set systems enjoy improved approximation ratios, which have recently been shown to be tight with respect to the shallow cell complexity of the set system. The algorithms that exploit the cell complexity, however, tend to be involved and computationally intensive. This paper shows that a slightly improved asymptotic approximation ratio for the hitting set problem can be attained using a much simpler algorithm: solve the linear programming relaxation, take one initial random sample from the set of elements with probabilities proportional to the LP-solution, and, while there is an unhit set, take an additional sample from it proportional to the LP-solution. Our algorithm is a simple generalization of the elegant net-finder algorithm by Nabil Mustafa. To analyze this algorithm for the hitting set problem, we generalize the classic Packing Lemma, and the more recent Shallow Packing Lemma, to the setting of weighted epsilon-nets.
The theory underlying robust distributed learning algorithms, designed to resist adversarial machines, matches empirical observations when data is homogeneous. Under data heterogeneity however, which is the norm in practical scenarios, established lower bounds on the learning error are essentially vacuous and greatly mismatch empirical observations. This is because the heterogeneity model considered is too restrictive and does not cover basic learning tasks such as least-squares regression. We consider in this paper a more realistic heterogeneity model, namely (G,B)-gradient dissimilarity, and show that it covers a larger class of learning problems than existing theory. Notably, we show that the breakdown point under heterogeneity is lower than the classical fraction 1/2. We also prove a new lower bound on the learning error of any distributed learning algorithm. We derive a matching upper bound for a robust variant of distributed gradient descent, and empirically show that our analysis reduces the gap between theory and practice.
We present the first $\varepsilon$-differentially private, computationally efficient algorithm that estimates the means of product distributions over $\{0,1\}^d$ accurately in total-variation distance, whilst attaining the optimal sample complexity to within polylogarithmic factors. The prior work had either solved this problem efficiently and optimally under weaker notions of privacy, or had solved it optimally while having exponential running times.
Join-preserving maps on the discrete time scale $\omega^+$, referred to as time warps, have been proposed as graded modalities that can be used to quantify the growth of information in the course of program execution. The set of time warps forms a simple distributive involutive residuated lattice -- called the time warp algebra -- that is equipped with residual operations relevant to potential applications. In this paper, we show that although the time warp algebra generates a variety that lacks the finite model property, it nevertheless has a decidable equational theory. We also describe an implementation of a procedure for deciding equations in this algebra, written in the OCaml programming language, that makes use of the Z3 theorem prover.
An introductory exposition of the virtual element method (VEM) is provided. The intent is to make this method more accessible to those unfamiliar with VEM. Familiarity with the finite element method for solving 2D linear elasticity problems is assumed. Derivations relevant to successful implementation are covered. Some theory is covered, but the focus here is on implementation and results. Examples are given that illustrate the utility of the method. Numerical results are provided to help researchers implement and verify their own results.
The problem of function approximation by neural dynamical systems has typically been approached in a top-down manner: Any continuous function can be approximated to an arbitrary accuracy by a sufficiently complex model with a given architecture. This can lead to high-complexity controls which are impractical in applications. In this paper, we take the opposite, constructive approach: We impose various structural restrictions on system dynamics and consequently characterize the class of functions that can be realized by such a system. The systems are implemented as a cascade interconnection of a neural stochastic differential equation (Neural SDE), a deterministic dynamical system, and a readout map. Both probabilistic and geometric (Lie-theoretic) methods are used to characterize the classes of functions realized by such systems.
We study the existence of optimal and p-optimal proof systems for classes in the Boolean hierarchy over $\mathrm{NP}$. Our main results concern $\mathrm{DP}$, i.e., the second level of this hierarchy: If all sets in $\mathrm{DP}$ have p-optimal proof systems, then all sets in $\mathrm{coDP}$ have p-optimal proof systems. The analogous implication for optimal proof systems fails relative to an oracle. As a consequence, we clarify such implications for all classes $\mathcal{C}$ and $\mathcal{D}$ in the Boolean hierarchy over $\mathrm{NP}$: either we can prove the implication or show that it fails relative to an oracle. Furthermore, we show that the sets $\mathrm{SAT}$ and $\mathrm{TAUT}$ have p-optimal proof systems, if and only if all sets in the Boolean hierarchy over $\mathrm{NP}$ have p-optimal proof systems which is a new characterization of a conjecture studied by Pudl\'ak.