We initiate the study of the algorithmic problem of certifying lower bounds on the discrepancy of random matrices: given an input matrix $A \in \mathbb{R}^{m \times n}$, output a value that is a lower bound on $\mathsf{disc}(A) = \min_{x \in \{\pm 1\}^n} ||Ax||_\infty$ for every $A$, but is close to the typical value of $\mathsf{disc}(A)$ with high probability over the choice of a random $A$. This problem is important because of its connections to conjecturally-hard average-case problems such as negatively-spiked PCA, the number-balancing problem and refuting random constraint satisfaction problems. We give the first polynomial-time algorithms with non-trivial guarantees for two main settings. First, when the entries of $A$ are i.i.d. standard Gaussians, it is known that $\mathsf{disc} (A) = \Theta (\sqrt{n}2^{-n/m})$ with high probability. Our algorithm certifies that $\mathsf{disc}(A) \geq \exp(- O(n^2/m))$ with high probability. As an application, this formally refutes a conjecture of Bandeira, Kunisky, and Wein on the computational hardness of the detection problem in the negatively-spiked Wishart model. Second, we consider the integer partitioning problem: given $n$ uniformly random $b$-bit integers $a_1, \ldots, a_n$, certify the non-existence of a perfect partition, i.e. certify that $\mathsf{disc} (A) \geq 1$ for $A = (a_1, \ldots, a_n)$. Under the scaling $b = \alpha n$, it is known that the probability of the existence of a perfect partition undergoes a phase transition from 1 to 0 at $\alpha = 1$; our algorithm certifies the non-existence of perfect partitions for some $\alpha = O(n)$. We also give efficient non-deterministic algorithms with significantly improved guarantees. Our algorithms involve a reduction to the Shortest Vector Problem.
In this work, we give a statistical characterization of the $\gamma$-regret for arbitrary structured bandit problems, the regret which arises when comparing against a benchmark that is $\gamma$ times the optimal solution. The $\gamma$-regret emerges in structured bandit problems over a function class $\mathcal{F}$ where finding an exact optimum of $f \in \mathcal{F}$ is intractable. Our characterization is given in terms of the $\gamma$-DEC, a statistical complexity parameter for the class $\mathcal{F}$, which is a modification of the constrained Decision-Estimation Coefficient (DEC) of Foster et al., 2023 (and closely related to the original offset DEC of Foster et al., 2021). Our lower bound shows that the $\gamma$-DEC is a fundamental limit for any model class $\mathcal{F}$: for any algorithm, there exists some $f \in \mathcal{F}$ for which the $\gamma$-regret of that algorithm scales (nearly) with the $\gamma$-DEC of $\mathcal{F}$. We provide an upper bound showing that there exists an algorithm attaining a nearly matching $\gamma$-regret. Due to significant challenges in applying the prior results on the DEC to the $\gamma$-regret case, both our lower and upper bounds require novel techniques and a new algorithm.
A numerical procedure providing guaranteed two-sided bounds on the effective coefficients of elliptic partial differential operators is presented. The upper bounds are obtained in a standard manner through the variational formulation of the problem and by applying the finite element method. To obtain the lower bounds we formulate the dual variational problem and introduce appropriate approximation spaces employing the finite element method as well. We deal with the 3D setting, which has been rarely considered in the literature so far. The theoretical justification of the procedure is presented and supported with illustrative examples.
In inverse problems, one attempts to infer spatially variable functions from indirect measurements of a system. To practitioners of inverse problems, the concept of "information" is familiar when discussing key questions such as which parts of the function can be inferred accurately and which cannot. For example, it is generally understood that we can identify system parameters accurately only close to detectors, or along ray paths between sources and detectors, because we have "the most information" for these places. Although referenced in many publications, the "information" that is invoked in such contexts is not a well understood and clearly defined quantity. Herein, we present a definition of information density that is based on the variance of coefficients as derived from a Bayesian reformulation of the inverse problem. We then discuss three areas in which this information density can be useful in practical algorithms for the solution of inverse problems, and illustrate the usefulness in one of these areas -- how to choose the discretization mesh for the function to be reconstructed -- using numerical experiments.
Partial differential equations (PDEs) are important tools to model physical systems and including them into machine learning models is an important way of incorporating physical knowledge. Given any system of linear PDEs with constant coefficients, we propose a family of Gaussian process (GP) priors, which we call EPGP, such that all realizations are exact solutions of this system. We apply the Ehrenpreis-Palamodov fundamental principle, which works as a non-linear Fourier transform, to construct GP kernels mirroring standard spectral methods for GPs. Our approach can infer probable solutions of linear PDE systems from any data such as noisy measurements, or pointwise defined initial and boundary conditions. Constructing EPGP-priors is algorithmic, generally applicable, and comes with a sparse version (S-EPGP) that learns the relevant spectral frequencies and works better for big data sets. We demonstrate our approach on three families of systems of PDEs, the heat equation, wave equation, and Maxwell's equations, where we improve upon the state of the art in computation time and precision, in some experiments by several orders of magnitude.
The problem of low rank approximation is ubiquitous in science. Traditionally this problem is solved in unitary invariant norms such as Frobenius or spectral norm due to existence of efficient methods for building approximations. However, recent results reveal the potential of low rank approximations in Chebyshev norm, which naturally arises in many applications. In this paper we tackle the problem of building optimal rank-1 approximations in the Chebyshev norm. We investigate the properties of alternating minimization algorithm for building the low rank approximations and demonstrate how to use it to construct optimal rank-1 approximation. As a result we propose an algorithm that is capable of building optimal rank-1 approximations in Chebyshev norm for small matrices.
This paper is devoted to the statistical and numerical properties of the geometric median, and its applications to the problem of robust mean estimation via the median of means principle. Our main theoretical results include (a) an upper bound for the distance between the mean and the median for general absolutely continuous distributions in R^d, and examples of specific classes of distributions for which these bounds do not depend on the ambient dimension d; (b) exponential deviation inequalities for the distance between the sample and the population versions of the geometric median, which again depend only on the trace-type quantities and not on the ambient dimension. As a corollary, we deduce improved bounds for the (geometric) median of means estimator that hold for large classes of heavy-tailed distributions. Finally, we address the error of numerical approximation, which is an important practical aspect of any statistical estimation procedure. We demonstrate that the objective function minimized by the geometric median satisfies a "local quadratic growth" condition that allows one to translate suboptimality bounds for the objective function to the corresponding bounds for the numerical approximation to the median itself, and propose a simple stopping rule applicable to any optimization method which yields explicit error guarantees. We conclude with the numerical experiments including the application to estimation of mean values of log-returns for S&P 500 data.
We propose a novel approach to the linear viscoelastic problem of shear-deformable geometrically exact beams. The generalized Maxwell model for one-dimensional solids is here efficiently extended to the case of arbitrarily curved beams undergoing finite displacement and rotations. High efficiency is achieved by combining a series of distinguishing features, that are i) the formulation is displacement-based, therefore no additional unknowns, other than incremental displacements and rotations, are needed for the internal variables associated with the rate-dependent material; ii) the governing equations are discretized in space using the isogeometric collocation method, meaning that elements integration is totally bypassed; iii) finite rotations are updated using the incremental rotation vector, leading to two main benefits: minimum number of rotation unknowns (the three components of the incremental rotation vector) and no singularity problems; iv) the same $\rm SO(3)$-consistent linearization of the governing equations and update procedures as for non-rate-dependent linear elastic material can be used; v) a standard second-order accurate time integration scheme is made consistent with the underlying geometric structure of the kinematic problem. Moreover, taking full advantage of the isogeometric analysis features, the formulation permits accurately representing beams and beam structures with highly complex initial shape and topology, paving the way for a large number of potential applications in the field of architectured materials, meta-materials, morphing/programmable objects, topological optimizations, etc. Numerical applications are finally presented in order to demonstrate attributes and potentialities of the proposed formulation.
We study the operator norm discrepancy of i.i.d. random matrices, initiating the matrix-valued analog of a long line of work on the $\ell^{\infty}$ norm discrepancy of i.i.d. random vectors. First, we give a new analysis of the matrix hyperbolic cosine algorithm of Zouzias (2011), a matrix version of an online vector discrepancy algorithm of Spencer (1977) studied for average-case inputs by Bansal and Spencer (2020), for the case of i.i.d. random matrix inputs. We both give a general analysis and extract concrete bounds on the discrepancy achieved by this algorithm for matrices with independent entries and positive semidefinite matrices drawn from Wishart distributions. Second, using the first moment method, we give lower bounds on the discrepancy of random matrices, in particular showing that the matrix hyperbolic cosine algorithm achieves optimal discrepancy up to logarithmic terms in several cases. We both treat the special case of the Gaussian orthogonal ensemble and give a general result for low-rank matrix distributions that we apply to orthogonally invariant random projections.
We adopt an information-theoretic framework to analyze the generalization behavior of the class of iterative, noisy learning algorithms. This class is particularly suitable for study under information-theoretic metrics as the algorithms are inherently randomized, and it includes commonly used algorithms such as Stochastic Gradient Langevin Dynamics (SGLD). Herein, we use the maximal leakage (equivalently, the Sibson mutual information of order infinity) metric, as it is simple to analyze, and it implies both bounds on the probability of having a large generalization error and on its expected value. We show that, if the update function (e.g., gradient) is bounded in $L_2$-norm and the additive noise is isotropic Gaussian noise, then one can obtain an upper-bound on maximal leakage in semi-closed form. Furthermore, we demonstrate how the assumptions on the update function affect the optimal (in the sense of minimizing the induced maximal leakage) choice of the noise. Finally, we compute explicit tight upper bounds on the induced maximal leakage for other scenarios of interest.
As soon as abstract mathematical computations were adapted to computation on digital computers, the problem of efficient representation, manipulation, and communication of the numerical values in those computations arose. Strongly related to the problem of numerical representation is the problem of quantization: in what manner should a set of continuous real-valued numbers be distributed over a fixed discrete set of numbers to minimize the number of bits required and also to maximize the accuracy of the attendant computations? This perennial problem of quantization is particularly relevant whenever memory and/or computational resources are severely restricted, and it has come to the forefront in recent years due to the remarkable performance of Neural Network models in computer vision, natural language processing, and related areas. Moving from floating-point representations to low-precision fixed integer values represented in four bits or less holds the potential to reduce the memory footprint and latency by a factor of 16x; and, in fact, reductions of 4x to 8x are often realized in practice in these applications. Thus, it is not surprising that quantization has emerged recently as an important and very active sub-area of research in the efficient implementation of computations associated with Neural Networks. In this article, we survey approaches to the problem of quantizing the numerical values in deep Neural Network computations, covering the advantages/disadvantages of current methods. With this survey and its organization, we hope to have presented a useful snapshot of the current research in quantization for Neural Networks and to have given an intelligent organization to ease the evaluation of future research in this area.