Interpolators are unstable. For example, the minimum $\ell_2$ norm least squares interpolator exhibits unbounded test errors when dealing with noisy data. In this paper, we study how ensembling stabilizes and thus improves the generalization performance, measured by the out-of-sample prediction risk, of an individual interpolator. We focus on bagged linear interpolators, as bagging is a popular randomization-based ensemble method that can be implemented in parallel. We introduce the multiplier-bootstrap-based bagged least squares estimator, which can be formulated as an average of sketched least squares estimators. The proposed multiplier bootstrap encompasses the classical bootstrap with replacement as a special case, along with a more intriguing variant which we call the Bernoulli bootstrap. Focusing on the proportional regime where the sample size scales proportionally with the feature dimensionality, we investigate the out-of-sample prediction risks of the sketched and bagged least squares estimators in both underparameterized and overparameterized regimes. Our results reveal the statistical roles of sketching and bagging. In particular, sketching modifies the aspect ratio and shifts the interpolation threshold of the minimum $\ell_2$ norm estimator. However, the risk of the sketched estimator remains unbounded around the interpolation threshold due to excessive variance. In stark contrast, bagging effectively mitigates this variance, leading to a bounded limiting out-of-sample prediction risk. To further understand this stabilizing effect, we establish that bagging acts as a form of implicit regularization, substantiated by the equivalence of the bagged estimator with its explicitly regularized counterpart. We also discuss several extensions.
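As a rough sketch of the objects involved (the notation below is ours and purely illustrative), in the overparameterized regime the minimum $\ell_2$ norm interpolator of data $(X, y)$ is
\[
\hat\beta_{\min} \;=\; X^\top (X X^\top)^{-1} y \;\in\; \arg\min\bigl\{\|\beta\|_2 : X\beta = y\bigr\},
\]
and a multiplier-bootstrap bagged estimator averages $B$ reweighted least squares fits,
\[
\hat\beta_{\mathrm{bag}} \;=\; \frac{1}{B}\sum_{b=1}^{B} \hat\beta^{(b)}, \qquad \hat\beta^{(b)} \in \arg\min_{\beta}\; \sum_{i=1}^{n} w_i^{(b)}\bigl(y_i - x_i^\top \beta\bigr)^2,
\]
where the random multipliers $w_i^{(b)} \ge 0$ are drawn independently across $b$. Multinomial multipliers recover the classical bootstrap with replacement, while $w_i^{(b)} \in \{0,1\}$ corresponds to the Bernoulli bootstrap, in which case each $\hat\beta^{(b)}$ is a least squares fit on a sketched (subsampled) design.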
The prevailing statistical approach to analyzing persistence diagrams is concerned with filtering out topological noise. In this paper, we adopt a different viewpoint and aim at estimating the actual distribution of a random persistence diagram, which captures both topological signal and noise. In this direction, Chazal and Divol (2019) proved that, under general conditions, the expected value of a random persistence diagram is a measure admitting a Lebesgue density, called the persistence intensity function. In this paper, we are concerned with estimating the persistence intensity function and a novel, normalized version of it -- called the persistence density function. We present a class of kernel-based estimators based on an i.i.d. sample of persistence diagrams and derive estimation rates in the supremum norm. As a direct corollary, we obtain uniform consistency rates for estimating linear representations of persistence diagrams, including Betti numbers and persistence surfaces. Interestingly, the persistence density function delivers stronger statistical guarantees than the persistence intensity function.
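For concreteness, one natural estimator of this kind (written in our own notation, not necessarily the exact estimator analyzed in the paper) smooths the points of all observed diagrams:
\[
\hat p_h(u) \;=\; \frac{1}{m}\sum_{j=1}^{m}\,\sum_{x \in D_j} \frac{1}{h^{2}}\, K\!\left(\frac{u - x}{h}\right),
\]
where $D_1,\dots,D_m$ are the i.i.d. persistence diagrams viewed as finite multisets of (birth, death) points, $K$ is a smoothing kernel, and $h>0$ is a bandwidth. Normalizing each diagram by its number of points would instead target a density rather than an intensity.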
Quantum computing promises transformational gains for solving some problems, but little to none for others. For anyone hoping to use quantum computers now or in the future, it is important to know which problems will benefit. In this paper, we introduce a framework for answering this question both intuitively and quantitatively. The underlying structure of the framework is a race between quantum and classical computers, where their relative strengths determine when each wins. While classical computers operate faster, quantum computers can sometimes run more efficient algorithms. Whether the speed advantage or the algorithmic advantage dominates determines whether a problem will benefit from quantum computing. Our analysis reveals that many problems, particularly those of small to moderate size that can be important for typical businesses, will not benefit from quantum computing. Conversely, larger problems, or those with particularly large algorithmic gains, will benefit from near-term quantum computing. Since very large algorithmic gains are rare in practice and are theorized to be rare even in principle, our analysis suggests that the benefits from quantum computing will flow either to users of these rare cases or to practitioners processing very large data.
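The race admits a simple break-even schematic (our own illustrative notation): if a classical machine performs operations at speed $s_c$ and runs an algorithm of cost $f(n)$, while a quantum machine operates at the slower speed $s_q < s_c$ but runs an algorithm of cost $g(n) \ll f(n)$, then the quantum approach wins on inputs of size $n$ precisely when
\[
\frac{g(n)}{s_q} \;<\; \frac{f(n)}{s_c},
\]
i.e., when the algorithmic gain $f(n)/g(n)$ exceeds the hardware speed gap $s_c/s_q$. Small and moderate $n$ therefore favor the faster classical machine, and only sufficiently large problems, or sufficiently large algorithmic gains, tip the balance.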
We address the computation of the degrees of minors of a noncommutative symbolic matrix of the form \[ A[c] := \sum_{k=1}^m A_k t^{c_k} x_k, \] where $A_k$ are matrices over a field $\mathbb{K}$, $x_k$ are noncommutative variables, $c_k$ are integer weights, and $t$ is a commuting variable specifying the degree. This problem extends the noncommutative Edmonds problem (Ivanyos et al. 2017) and can be used to formulate various combinatorial optimization problems. Extending the work of Hirai (2018) and Hirai and Ikeda (2022), we provide novel duality theorems and a polyhedral characterization for the maximum degrees of minors of $A[c]$ of all sizes, and develop a strongly polynomial-time algorithm for computing them. This algorithm can be viewed as a unified algebraization of the classical Hungarian method for bipartite matching and the weight-splitting algorithm for linear matroid intersection. As applications, we provide polynomial-time algorithms for weighted fractional linear matroid matching and linear optimization over rank-2 Brascamp-Lieb polytopes.
Boundary value problems involving elliptic PDEs such as the Laplace and the Helmholtz equations are ubiquitous in physics and engineering. Many such problems have alternative formulations as integral equations that are mathematically more tractable than their PDE counterparts. However, the integral equation formulation poses the challenge of solving the dense linear systems that arise upon discretization. In cases where iterative methods converge rapidly, existing methods that draw on fast summation schemes such as the Fast Multipole Method are highly efficient and well established. More recently, linear-complexity direct solvers that sidestep convergence issues by directly computing an invertible factorization have been developed. However, their storage and compute costs are high, which limits their ability to solve large-scale problems in practice. In this work, we introduce a distributed-memory parallel algorithm based on an existing direct solver named ``strong recursive skeletonization factorization.'' The analysis of its parallel scalability applies generally to a class of existing methods that exploit the so-called strong admissibility. Specifically, we apply low-rank compression to certain off-diagonal matrix blocks in a way that minimizes data movement. Given a compression tolerance, our method constructs an approximate factorization of a discretized integral operator (a dense matrix), which can be used to solve linear systems efficiently in parallel. Compared to iterative algorithms, our method is particularly suitable for problems involving ill-conditioned matrices or multiple right-hand sides. Large-scale numerical experiments are presented to demonstrate the performance of our implementation in the Julia language.
Suppose a finite, unweighted, combinatorial graph $G = (V,E)$ is the union of several (degree-)regular graphs that are then connected by a few additional edges. $G$ will then have only a small number of vertices $v \in V$ with the property that some neighbor $w$, with $(v,w) \in E$, has higher degree $\mbox{deg}(w) > \mbox{deg}(v)$. We prove the converse statement: if a graph has few vertices having a neighbor with higher degree and satisfies a mild regularity condition, then, by adding and removing a few edges, the graph can be turned into a disjoint union of (distance-)regular graphs. The number of edge operations depends on the maximum degree and the number of vertices with a higher-degree neighbor, but is independent of $|V|$.
Bidirectional typing is a discipline in which the typing judgment is decomposed explicitly into inference and checking modes, making it possible to control the flow of type information in typing rules and to specify algorithmically how they should be used. Bidirectional typing has been fruitfully studied, and bidirectional systems have been developed for many type theories. However, the formal development of bidirectional typing has so far remained confined to specific theories, with general guidelines remaining informal. In this work, we give a generic account of bidirectional typing for a general class of dependent type theories. This is done by first giving a general definition of type theories (or, equivalently, a logical framework), for which we define declarative and bidirectional type systems. We then show, in a theory-independent fashion, that the two systems are equivalent. This equivalence is then exploited to establish the decidability of typing for weakly normalizing theories, yielding a generic type-checking algorithm that has been implemented in a prototype and used in practice with many theories.
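To illustrate the mode discipline in a familiar setting (a textbook simply typed fragment, not the general framework of the paper), the judgment splits into an inference form $\Gamma \vdash t \Rightarrow A$ and a checking form $\Gamma \vdash t \Leftarrow A$:
\[
\frac{(x : A) \in \Gamma}{\Gamma \vdash x \Rightarrow A}
\qquad
\frac{\Gamma \vdash t \Rightarrow A \to B \quad \Gamma \vdash u \Leftarrow A}{\Gamma \vdash t\, u \Rightarrow B}
\qquad
\frac{\Gamma, x : A \vdash t \Leftarrow B}{\Gamma \vdash \lambda x.\, t \Leftarrow A \to B}
\qquad
\frac{\Gamma \vdash t \Rightarrow A \quad A \equiv B}{\Gamma \vdash t \Leftarrow B}
\]
Variables and applications infer their types, abstractions are checked against a given type, and the final rule switches from inference to checking up to conversion; type information thus flows deterministically through the rules, which is what makes them directly implementable.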
New lower-order $H(\textrm{div})$-conforming finite elements for symmetric tensors are constructed in arbitrary dimension. The space of shape functions is defined by enriching the symmetric quadratic polynomial space with the $(d+1)$-order normal-normal face bubble space. The reduced counterpart has only $d(d+1)^2$ degrees of freedom. In two dimensions, basis functions are given explicitly in terms of barycentric coordinates. Lower-order conforming finite element elasticity complexes, starting from the Bell element, are developed in two dimensions. These finite elements for symmetric tensors are applied to devise robust mixed finite element methods for the linear elasticity problem, which possess uniform error estimates with respect to the Lam\'{e} coefficient $\lambda$ and superconvergence for the displacement. Numerical results are provided to verify the theoretical convergence rates.
Symmetry is a cornerstone of much of mathematics, and many probability distributions possess symmetries characterized by their invariance to a collection of group actions. Thus, many mathematical and statistical methods rely on such symmetry and ostensibly fail if it is broken. This work considers under what conditions a sequence of probability measures asymptotically gains such symmetry or invariance to a collection of group actions. Considering the many symmetries of the Gaussian distribution, this work effectively proposes a non-parametric type of central limit theorem: a Lipschitz function of a high-dimensional random vector will be asymptotically invariant to the actions of certain compact topological groups. Applications of this include a partial law of the iterated logarithm for uniformly random points in an $\ell_p^n$-ball and an asymptotic equivalence between classical parametric statistical tests and their randomization counterparts even when invariance assumptions are violated.
Tukey's depth (or halfspace depth) is a widely used measure of centrality for multivariate data. However, exact computation of Tukey's depth is known to be a hard problem in high dimensions. As a remedy, randomized approximations of Tukey's depth have been proposed. In this paper we explore when such randomized algorithms return a good approximation of Tukey's depth. We study the case when the data are sampled from a log-concave isotropic distribution. We prove that, if the algorithm is required to run in time polynomial in the dimension, then it correctly approximates the maximal depth $1/2$ and depths close to zero. On the other hand, for any point of intermediate depth, any good approximation requires exponential complexity.
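For reference, the Tukey (halfspace) depth of a point $x \in \mathbb{R}^d$ with respect to a distribution $P$ is the smallest probability mass of a closed halfspace containing $x$,
\[
D(x; P) \;=\; \inf_{\|u\|_2 = 1} P\bigl(\{ z \in \mathbb{R}^d : \langle u, z - x \rangle \ge 0 \}\bigr),
\]
and the sample version replaces $P$ with the empirical measure of the data. Randomized approximations typically replace the infimum over all directions $u$ by a minimum over a random, polynomially sized set of directions.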
We describe a new dependent-rounding algorithmic framework for bipartite graphs. Given a fractional assignment $y$ of values to the edges of a graph $G = (U \cup V, E)$, the algorithms return an integral solution $Y$ such that each right-node $v \in V$ has at most one neighboring edge $f$ with $Y_f = 1$, and the variables $Y_e$ also satisfy broad nonpositive-correlation properties. In particular, for any edges $e_1, e_2$ sharing a left-node $u \in U$, the variables $Y_{e_1}, Y_{e_2}$ have strong negative-correlation properties, i.e., the expectation of $Y_{e_1} Y_{e_2}$ is significantly below $y_{e_1} y_{e_2}$. This algorithm is based on generating negatively correlated exponential random variables and using them in a contention-resolution scheme inspired by an algorithm of Im & Shadloo (2020). Our algorithm gives stronger and much more flexible negative-correlation properties. Dependent-rounding schemes with negative-correlation properties have been used in approximation algorithms for job scheduling on unrelated machines to minimize weighted completion times (Bansal, Srinivasan, & Svensson (2021), Im & Shadloo (2020), Im & Li (2023)). Using our new dependent-rounding algorithm, among other improvements, we obtain a $1.4$-approximation for this problem, significantly improving over the prior $1.45$-approximation ratio of Im & Li (2023).
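Schematically, writing $\varepsilon > 0$ for the scheme-dependent correlation gain left unquantified above, the rounded vector $Y \in \{0,1\}^E$ satisfies
\[
\sum_{f \ni v} Y_f \;\le\; 1 \quad \text{for each } v \in V, \qquad \mathbb{E}\bigl[Y_{e_1} Y_{e_2}\bigr] \;\le\; (1 - \varepsilon)\, y_{e_1} y_{e_2} \quad \text{for distinct } e_1, e_2 \text{ sharing a left-node } u \in U.
\]
The first condition is the matching-type constraint on right-nodes, and the second makes precise the sense in which $\mathbb{E}[Y_{e_1} Y_{e_2}]$ sits strictly below $y_{e_1} y_{e_2}$; it is this multiplicative slack, beyond mere nonpositive correlation, that the scheduling application exploits.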