The computation of f(A)b, the action of a matrix function on a vector, is a task arising in many areas of scientific computing. In many applications, the matrix A is sparse but so large that only a rather small number of Krylov basis vectors can be stored. Here we discuss a new approach to overcome these limitations by randomized sketching combined with an integral representation of f(A)b. Two different approximations are introduced, one based on a sketched FOM approximation and the other based on a sketched GMRES approximation. The convergence of the latter method is analyzed for Stieltjes functions of positive real matrices. We also derive a closed-form expression for the sketched FOM approximant and bound its distance to the full FOM approximant. Numerical experiments demonstrate the potential of the presented sketching approaches.
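For orientation, below is a minimal NumPy/SciPy sketch of the standard (unsketched) Arnoldi/FOM approximant $f(A)b \approx \beta\, V_m f(H_m) e_1$ that the sketched variants build on; the function names, the choice $f = \exp$, and the basis size `m` are illustrative assumptions, not the paper's sketched algorithms.

```python
import numpy as np
from scipy.linalg import expm

def arnoldi(A, b, m):
    """Arnoldi process: orthonormal Krylov basis V_m and Hessenberg matrix H_m."""
    n = b.size
    V = np.zeros((n, m + 1))
    H = np.zeros((m + 1, m))
    beta = np.linalg.norm(b)
    V[:, 0] = b / beta
    for j in range(m):
        w = A @ V[:, j]
        for i in range(j + 1):              # modified Gram-Schmidt orthogonalization
            H[i, j] = V[:, i] @ w
            w = w - H[i, j] * V[:, i]
        H[j + 1, j] = np.linalg.norm(w)
        V[:, j + 1] = w / H[j + 1, j]
    return V[:, :m], H[:m, :m], beta

def fom_fAb(A, b, m, f=expm):
    """Full (unsketched) FOM approximant  f(A) b  ~=  beta * V_m f(H_m) e_1."""
    V, H, beta = arnoldi(A, b, m)
    return beta * (V @ f(H)[:, 0])          # f(H) e_1 is the first column of f(H)
```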
We give a short argument that yields a new lower bound on the number of subsampled rows from a bounded, orthonormal matrix necessary to form a matrix with the restricted isometry property. We show that a matrix formed by uniformly subsampling rows of an $N \times N$ Walsh matrix contains a $K$-sparse vector in the kernel, unless the number of subsampled rows is $\Omega(K \log K \log (N/K))$ -- our lower bound applies whenever $\min(K, N/K) > \log^C N$. Containing a sparse vector in the kernel precludes not only the restricted isometry property, but more generally the application of those matrices for uniform sparse recovery.
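Purely as an illustration of the measurement ensemble in question, uniformly subsampled rows of an orthonormal Hadamard matrix with the usual rescaling can be formed as follows; the sizes `N` and `m` are arbitrary choices, and SciPy's `hadamard` returns the Sylvester ordering, which differs from the Walsh ordering only by a row permutation and is therefore equivalent under uniform subsampling.

```python
import numpy as np
from scipy.linalg import hadamard

N, m = 1024, 200                      # ambient dimension (power of two) and number of sampled rows
H = hadamard(N) / np.sqrt(N)          # orthonormal Hadamard matrix, entries +-1/sqrt(N)
rows = np.random.default_rng(0).choice(N, size=m, replace=False)   # uniform row subsampling
Phi = np.sqrt(N / m) * H[rows, :]     # rescaled so that Phi^T Phi = I in expectation
```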
We study the question of local testability of low (constant) degree functions from a product domain $S_1 \times \dots \times {S}_n$ to a field $\mathbb{F}$, where the ${S_i} \subseteq \mathbb{F}$ can be arbitrary constant-sized sets. We show that this family is locally testable when the grid is "symmetric". That is, if ${S_i} = {S}$ for all $i$, there is a probabilistic algorithm using constantly many queries that distinguishes whether $f$ has a polynomial representation of degree at most $d$ or is $\Omega(1)$-far from having this property. In contrast, we show that there exist asymmetric grids with $|{S}_1| =\dots= |{S}_n| = 3$ for which testing requires $\omega_n(1)$ queries, thereby establishing that even in the context of polynomials, local testing depends on the structure of the domain and not just the distance of the underlying code. The low-degree testing problem has been studied extensively over the years, and a wide variety of tools have been applied to propose and analyze tests. Our work introduces yet another new connection in this rich field, by building low-degree tests out of tests for "junta-degrees". A function $f : {S}_1 \times \dots \times {S}_n \to {G}$, for an abelian group ${G}$, is said to be a junta-degree-$d$ function if it is a sum of $d$-juntas. We derive our low-degree test by giving a new local test for junta-degree-$d$ functions. For the analysis of our tests, we deduce a small-set expansion theorem for spherical noise over large grids, which may be of independent interest.
We show that the sparsified block elimination algorithm for solving undirected Laplacian linear systems from [Kyng-Lee-Peng-Sachdeva-Spielman STOC'16] directly works for directed Laplacians. Given access to a sparsification algorithm that, on graphs with $n$ vertices and $m$ edges, takes time $\mathcal{T}_{\rm S}(m)$ to output a sparsifier with $\mathcal{N}_{\rm S}(n)$ edges, our algorithm solves a directed Eulerian system on $n$ vertices and $m$ edges to $\epsilon$ relative accuracy in time $$ O(\mathcal{T}_{\rm S}(m) + {\mathcal{N}_{\rm S}(n)\log {n}\log(n/\epsilon)}) + \tilde{O}(\mathcal{T}_{\rm S}(\mathcal{N}_{\rm S}(n)) \log n), $$ where the $\tilde{O}(\cdot)$ notation hides $\log\log(n)$ factors. By previous results, this implies improved runtimes for linear systems in strongly connected directed graphs, PageRank matrices, and asymmetric M-matrices. When combined with slower constructions of smaller Eulerian sparsifiers based on short cycle decompositions, it also gives a solver that runs in $O(n \log^{5}n \log(n / \epsilon))$ time after $O(n^2 \log^{O(1)} n)$ pre-processing. At the core of our analyses are constructions of augmented matrices whose Schur complements encode error matrices.
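As a small, purely illustrative example of the systems being solved (not of the sparsified block elimination solver itself): under one common convention the directed Laplacian is $L = D_{\rm out} - A^\top$, and for an Eulerian digraph (in-degree equals out-degree at every vertex) both $L\mathbf{1} = 0$ and $\mathbf{1}^\top L = 0$, so $Lx = b$ is solvable whenever $b \perp \mathbf{1}$. The graph, right-hand side, and dense pseudoinverse solve below are illustrative assumptions.

```python
import numpy as np

# Small Eulerian digraph (in-degree = out-degree everywhere):
# a directed 4-cycle 0->1->2->3->0 plus the 2-cycle 0->2->0.
n = 4
A = np.zeros((n, n))
for (u, v) in [(0, 1), (1, 2), (2, 3), (3, 0), (0, 2), (2, 0)]:
    A[u, v] += 1.0

L = np.diag(A.sum(axis=1)) - A.T       # directed Laplacian, L = D_out - A^T convention
b = np.array([1.0, -1.0, 2.0, -2.0])   # right-hand side orthogonal to the all-ones vector

x = np.linalg.lstsq(L, b, rcond=None)[0]   # dense pseudoinverse solve as a stand-in for a fast solver
print(np.allclose(L @ x, b))               # True: b lies in the range of L
```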
A circuit $\mathcal{C}$ samples a distribution $\mathbf{X}$ with an error $\epsilon$ if the statistical distance between the output of $\mathcal{C}$ on the uniform input and $\mathbf{X}$ is $\epsilon$. We study the hardness of sampling a uniform distribution over the set of $n$-bit strings of Hamming weight $k$, denoted by $\mathbf{U}^n_k$, for _decision forests_, i.e., circuits in which every output bit is computed by a decision tree of the inputs. For every $k$ there is an $O(\log n)$-depth decision forest sampling $\mathbf{U}^n_k$ with an inverse-polynomial error [Viola 2012, Czumaj 2015]. We show that for every $\epsilon > 0$ there exists $\tau$ such that for decision depth $\tau \log (n/k) / \log \log (n/k)$, the error for sampling $\mathbf{U}_k^n$ is at least $1-\epsilon$. Our result is based on the recent robust sunflower lemma [Alweiss, Lovett, Wu, Zhang 2021, Rao 2019]. Our second result is about matching a set of $n$-bit strings with the image of a $d$-_local_ circuit, i.e., one in which each output bit depends on at most $d$ input bits. We study the set of all $n$-bit strings whose Hamming weight is at least $n/2$. We improve the previously known locality lower bound from $\Omega(\log^* n)$ [Beyersdorff, Datta, Krebs, Mahajan, Scharfenberger-Fabian, Sreenivasaiah, Thomas and Vollmer, 2013] to $\Omega(\sqrt{\log n})$, leaving only a quartic gap from the best upper bound of $O(\log^2 n)$.
We present a new approach for computing compact sketches that can be used to approximate the inner product between pairs of high-dimensional vectors. Based on the Weighted MinHash algorithm, our approach admits strong accuracy guarantees that improve on the guarantees of popular linear sketching approaches for inner product estimation, such as CountSketch and Johnson-Lindenstrauss projection. Specifically, while our method admits guarantees that exactly match linear sketching for dense vectors, it yields significantly lower error for sparse vectors with limited overlap between non-zero entries. Such vectors arise in many applications involving sparse data. They are also important in increasingly popular dataset search applications, where inner product sketches are used to estimate data covariance, conditional means, and other quantities involving columns in unjoined tables. We complement our theoretical results by showing that our approach empirically outperforms existing linear sketches and unweighted hashing-based sketches for sparse vectors.
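As a point of reference for the comparison above, here is a minimal sketch of the CountSketch baseline for inner product estimation (one of the linear sketches the new method is compared against, not the WMH-based approach itself); the dimension `d`, sketch size `m`, and the toy sparse vectors are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)
d, m = 10_000, 256                        # vector dimension and sketch size
buckets = rng.integers(0, m, size=d)      # shared hash: coordinate -> bucket
signs = rng.choice([-1.0, 1.0], size=d)   # shared random signs

def countsketch(x):
    """Linear CountSketch of x: signed sum of coordinates within each bucket."""
    return np.bincount(buckets, weights=signs * x, minlength=m)

# Sparse vectors with limited overlap between their supports.
a = np.zeros(d); a[rng.choice(d, 50, replace=False)] = 1.0
b = np.zeros(d); b[rng.choice(d, 50, replace=False)] = 1.0

estimate = countsketch(a) @ countsketch(b)   # unbiased estimate of <a, b>
print(estimate, a @ b)
```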
Random smoothing data augmentation is a unique form of regularization that can prevent overfitting by introducing noise to the input data, encouraging the model to learn more generalized features. Despite its success in various applications, there has been a lack of systematic study on the regularization ability of random smoothing. In this paper, we aim to bridge this gap by presenting a framework for random smoothing regularization that can adaptively and effectively learn a wide range of ground truth functions belonging to the classical Sobolev spaces. Specifically, we investigate two underlying function spaces: the Sobolev space of low intrinsic dimension, which includes the Sobolev space in $D$-dimensional Euclidean space or on low-dimensional sub-manifolds as special cases, and the mixed smooth Sobolev space with a tensor structure. By interpreting random smoothing regularization as a family of novel convolution-based smoothing kernels, we attain optimal convergence rates in these settings using a kernel gradient descent algorithm with either early stopping or weight decay. Notably, our estimator adapts to the structural assumptions on the underlying data and avoids the curse of dimensionality. This is achieved through various choices of injected noise distribution, such as Gaussian, Laplace, or general polynomial noise, allowing broad adaptation to these structural assumptions. The convergence rate depends only on the effective dimension, which may be significantly smaller than the actual data dimension. We conduct numerical experiments on simulated data to validate our theoretical results.
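For concreteness, here is a minimal sketch of the kind of random smoothing augmentation studied here: fresh Gaussian or Laplace noise is injected into the inputs at every gradient descent step, with a fixed step budget playing the role of early stopping. The toy regression task, the random Fourier feature model, and the noise scale `sigma` are all illustrative assumptions, not the paper's estimator.

```python
import numpy as np

rng = np.random.default_rng(1)

def smooth_augment(X, sigma=0.05, noise="gaussian"):
    """Random smoothing augmentation: perturb every input with fresh i.i.d. noise."""
    if noise == "gaussian":
        return X + sigma * rng.standard_normal(X.shape)
    if noise == "laplace":
        return X + rng.laplace(scale=sigma, size=X.shape)
    raise ValueError(noise)

# Toy 1-D regression target fitted by gradient descent on random Fourier features.
n, D = 200, 300
X = rng.uniform(0.0, 1.0, size=(n, 1))
y = np.sin(2 * np.pi * X[:, 0]) + 0.1 * rng.standard_normal(n)

W = 5.0 * rng.standard_normal((1, D))           # random feature frequencies
phi = lambda Z: np.cos(Z @ W) / np.sqrt(D)      # random Fourier feature map

theta = np.zeros(D)
lr, steps = 0.5, 2000                           # fixed step budget acts as early stopping
for _ in range(steps):
    F = phi(smooth_augment(X))                  # fresh smoothing noise in every step
    theta -= lr * (F.T @ (F @ theta - y)) / n   # gradient step on the squared loss
```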
A modification of Newton's method for solving systems of $n$ nonlinear equations is presented. The new matrix-free method relies on a given decomposition of the invertible Jacobian of the residual into invertible sparse local Jacobians according to the chain rule of differentiation. It is motivated in the context of local Jacobians with bandwidth $2m+1$ for $m\ll n$. A reduction of the computational cost by $\mathcal{O}(\frac{n}{m})$ can be observed. Supporting run time measurements are presented for the tridiagonal case, showing a reduction of the computational cost by $\mathcal{O}(n).$ Generalization yields the combinatorial Matrix-Free Newton Step problem. We prove its NP-completeness and present algorithmic components for building methods for its approximate solution. Inspired by adjoint Algorithmic Differentiation, the new method shares several challenges with the latter, including the DAG Reversal problem. Further challenges are due to combinatorial problems in sparse linear algebra, such as Bandwidth or Directed Elimination Ordering.
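For orientation in the tridiagonal setting mentioned above, the following is a minimal sketch of a standard Newton iteration in which each linear solve exploits the bandwidth via `scipy.linalg.solve_banded`; the particular residual (a discretized cubic reaction term) is an illustrative assumption and this is the conventional baseline, not the paper's matrix-free method.

```python
import numpy as np
from scipy.linalg import solve_banded

n = 100
h = 1.0 / (n + 1)

def F(x):
    """Residual with tridiagonal Jacobian: discrete Laplacian plus a cubic reaction term."""
    r = 2.0 * x + h**2 * (x**3 - 1.0)
    r[1:] -= x[:-1]
    r[:-1] -= x[1:]
    return r

def jac_banded(x):
    """Tridiagonal Jacobian of F in scipy's (upper, main, lower) banded storage."""
    ab = np.zeros((3, n))
    ab[0, 1:] = -1.0                        # superdiagonal
    ab[1, :] = 2.0 + 3.0 * h**2 * x**2      # main diagonal
    ab[2, :-1] = -1.0                       # subdiagonal
    return ab

x = np.zeros(n)                             # initial guess
for _ in range(20):                         # plain Newton iteration
    dx = solve_banded((1, 1), jac_banded(x), -F(x))
    x += dx
    if np.linalg.norm(dx) < 1e-12:
        break
```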
Artificial intelligence (AI) has become a part of everyday conversation and our lives. It is considered the new electricity that is revolutionizing the world. AI attracts heavy investment in both industry and academia. However, there is also a lot of hype in the current AI debate. AI based on so-called deep learning has achieved impressive results in many problems, but its limits are already visible. AI has been under research since the 1940s, and the field has seen many ups and downs due to cycles of over-expectation and the disappointments that followed. The purpose of this book is to give a realistic picture of AI, its history, its potential, and its limitations. We believe that AI is a helper, not a ruler of humans. We begin by describing what AI is and how it has evolved over the decades. After the fundamentals, we explain the importance of massive data for the current mainstream of artificial intelligence. The most common representations, methods, and machine learning approaches used in AI are covered. In addition, the main application areas are introduced. Computer vision has been central to the development of AI. The book provides a general introduction to computer vision, and includes an exposition of the results and applications of our own research. Emotions are central to human intelligence, but they have so far seen little use in AI. We present the basics of emotional intelligence and our own research on the topic. We discuss super-intelligence that transcends human understanding, explaining why such an achievement seems impossible on the basis of present knowledge, and how AI could be improved. Finally, we summarize the current state of AI and what should be done in the future. In the appendix, we look at the development of AI education, especially from the perspective of course contents at our own university.
The last decade has witnessed an experimental revolution in data science and machine learning, epitomised by deep learning methods. Indeed, many high-dimensional learning tasks previously thought to be beyond reach -- such as computer vision, playing Go, or protein folding -- are in fact feasible with appropriate computational scale. Remarkably, the essence of deep learning is built from two simple algorithmic principles: first, the notion of representation or feature learning, whereby adapted, often hierarchical, features capture the appropriate notion of regularity for each task, and second, learning by local gradient-descent type methods, typically implemented as backpropagation. While learning generic functions in high dimensions is a cursed estimation problem, most tasks of interest are not generic, and come with essential pre-defined regularities arising from the underlying low-dimensionality and structure of the physical world. This text is concerned with exposing these regularities through unified geometric principles that can be applied throughout a wide spectrum of applications. Such a 'geometric unification' endeavour, in the spirit of Felix Klein's Erlangen Program, serves a dual purpose: on one hand, it provides a common mathematical framework to study the most successful neural network architectures, such as CNNs, RNNs, GNNs, and Transformers. On the other hand, it gives a constructive procedure to incorporate prior physical knowledge into neural architectures and provides a principled way to build future architectures yet to be invented.
A core capability of intelligent systems is the ability to quickly learn new tasks by drawing on prior experience. Gradient (or optimization) based meta-learning has recently emerged as an effective approach for few-shot learning. In this formulation, meta-parameters are learned in the outer loop, while task-specific models are learned in the inner loop using only a small amount of data from the current task. A key challenge in scaling these approaches is the need to differentiate through the inner loop learning process, which can impose considerable computational and memory burdens. By drawing upon implicit differentiation, we develop the implicit MAML algorithm, which depends only on the solution to the inner level optimization and not the path taken by the inner loop optimizer. This effectively decouples the meta-gradient computation from the choice of inner loop optimizer. As a result, our approach is agnostic to the choice of inner loop optimizer and can gracefully handle many gradient steps without vanishing gradients or memory constraints. Theoretically, we prove that implicit MAML can compute accurate meta-gradients with a memory footprint that is, up to small constant factors, no more than that required to compute a single inner loop gradient, and with no overall increase in the total computational cost. Experimentally, we show that these benefits of implicit MAML translate into empirical gains on few-shot image recognition benchmarks.
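A minimal numerical illustration of the implicit meta-gradient underlying this approach, on a toy quadratic inner problem where everything is available in closed form; the matrices, the regularization strength `lam`, and the outer loss are illustrative assumptions, not the paper's benchmark setup. For the proximally regularized inner problem $\phi^*(\theta) = \arg\min_\phi L_{\rm in}(\phi) + \tfrac{\lambda}{2}\|\phi - \theta\|^2$, implicit differentiation gives the meta-gradient $\frac{d L_{\rm out}(\phi^*)}{d\theta} = \big(I + \lambda^{-1}\nabla^2 L_{\rm in}(\phi^*)\big)^{-1} \nabla L_{\rm out}(\phi^*)$, which depends only on the solution $\phi^*$ and not on the optimization path.

```python
import numpy as np

rng = np.random.default_rng(0)
d, lam = 5, 1.0

# Toy quadratic inner loss  L_in(phi) = 0.5 phi^T A phi - b^T phi  (Hessian A known exactly).
M = rng.standard_normal((d, d))
A = M @ M.T + np.eye(d)
b = rng.standard_normal(d)
phi_target = rng.standard_normal(d)          # outer loss  L_out(phi) = 0.5 ||phi - phi_target||^2

def inner_solution(theta):
    """Exact minimizer of  L_in(phi) + (lam/2) ||phi - theta||^2."""
    return np.linalg.solve(A + lam * np.eye(d), b + lam * theta)

def implicit_meta_grad(theta):
    """Implicit meta-gradient  (I + Hessian/lam)^{-1} grad L_out(phi*)  -- uses only phi*."""
    phi_star = inner_solution(theta)
    return np.linalg.solve(np.eye(d) + A / lam, phi_star - phi_target)

# Finite-difference check of  d L_out(phi*(theta)) / d theta.
theta = rng.standard_normal(d)
eps = 1e-6
outer = lambda t: 0.5 * np.sum((inner_solution(t) - phi_target) ** 2)
fd = np.array([(outer(theta + eps * e) - outer(theta - eps * e)) / (2 * eps) for e in np.eye(d)])
print(np.allclose(implicit_meta_grad(theta), fd, atol=1e-5))
```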