The high efficiency of a recently proposed method for computing with Gaussian processes relies on expanding a (translationally invariant) covariance kernel into complex exponentials, with frequencies lying on a Cartesian equispaced grid. Here we provide rigorous error bounds for this approximation for two popular kernels -- Mat\'ern and squared exponential -- in terms of the grid spacing and size. The kernel error bounds are uniform over a hypercube centered at the origin. Our tools include a split into aliasing and truncation errors, and bounds on sums of Gaussians or modified Bessel functions over various lattices. For the Mat\'ern case, motivated by a numerical study, we conjecture a stronger Frobenius-norm bound on the covariance matrix error for randomly distributed data points. Lastly, we prove bounds on, and study numerically, the ill-conditioning of the linear systems arising in such regression problems.
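As a concrete illustration of the expansion discussed above, here is a minimal 1-D sketch (not the paper's method or its error bounds) that approximates the squared-exponential kernel by complex exponentials on an equispaced frequency grid, via the trapezoid rule applied to the kernel's spectral density; the length scale ell and the grid parameters h, M are assumed demo values.

```python
# Approximate k(r) = exp(-r^2 / (2 ell^2)) by a sum of complex exponentials
# on an equispaced frequency grid, using the trapezoid rule on its
# spectral density s(w) = sqrt(2 pi) ell exp(-ell^2 w^2 / 2).
import numpy as np

ell = 0.5                      # kernel length scale (assumed value)
h, M = 0.4, 60                 # grid spacing and half-width (assumed values)
omega = h * np.arange(-M, M + 1)            # equispaced frequency grid
weights = (h / (2 * np.pi)) * np.sqrt(2 * np.pi) * ell * np.exp(-0.5 * (ell * omega) ** 2)

r = np.linspace(-1, 1, 9)                   # test points in a centered interval
k_exact = np.exp(-r ** 2 / (2 * ell ** 2))
k_approx = (weights[None, :] * np.exp(1j * r[:, None] * omega[None, :])).sum(axis=1).real

print(np.max(np.abs(k_exact - k_approx)))   # uniform error over the test points
```

Shrinking the spacing h controls the aliasing error while enlarging M controls the truncation error, mirroring the error split described in the abstract.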
We study the problem of learning nonparametric distributions in a finite mixture, and establish tight bounds on the sample complexity for learning the component distributions in such models. Namely, we are given i.i.d. samples from a pdf $f$ where $$ f=w_1f_1+w_2f_2, \quad w_1+w_2=1, \quad w_1,w_2>0 $$ and we are interested in learning each component $f_i$. Without any assumptions on $f_i$, this problem is ill-posed. In order to identify the components $f_i$, we assume that each $f_i$ can be written as a convolution of a Gaussian and a compactly supported density $\nu_i$ with $\text{supp}(\nu_1)\cap \text{supp}(\nu_2)=\emptyset$. Our main result shows that $(\frac{1}{\varepsilon})^{\Omega(\log\log \frac{1}{\varepsilon})}$ samples are required for estimating each $f_i$. The proof relies on a quantitative Tauberian theorem that yields a fast rate of approximation with Gaussians, which may be of independent interest. To show this is tight, we also propose an algorithm that uses $(\frac{1}{\varepsilon})^{O(\log\log \frac{1}{\varepsilon})}$ samples to estimate each $f_i$. Unlike existing approaches to learning latent variable models based on moment-matching and tensor methods, our proof instead involves a delicate analysis of an ill-conditioned linear system via orthogonal functions. Combining these bounds, we conclude that the optimal sample complexity of this problem lies strictly between polynomial and exponential, which is uncommon in learning theory.
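For readers who want to see the generative model concretely, the following toy sampler (an illustration only; the weight, the uniform densities $\nu_i$, and the unit Gaussian are assumed choices) draws from $f = w_1 f_1 + w_2 f_2$ with each $f_i$ a Gaussian convolved with a compactly supported $\nu_i$ whose supports are disjoint.

```python
# Toy sampler for the two-component deconvolution mixture in the abstract.
import numpy as np

rng = np.random.default_rng(0)
n, w1 = 10_000, 0.3                           # sample size and weight (assumed)
z = rng.random(n) < w1                        # latent component labels
loc = np.where(z, rng.uniform(-1.0, -0.5, n), # nu_1 uniform on [-1, -0.5]
                  rng.uniform( 0.5,  1.0, n)) # nu_2 uniform on [ 0.5,  1.0]
x = loc + rng.standard_normal(n)              # convolve with a unit Gaussian
```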
In this paper, we study the graph induced by the $\textit{2-swap}$ permutation on words with a fixed Parikh vector. A $2$-swap is defined as a pair of positions $s = (i, j)$ where the word $w$ induced by the swap $s$ on $v$ is $v[1] v[2] \dots v[i - 1] v[j] v[i+1] \dots v[j - 1] v[i] v[j + 1] \dots v[n]$. With these swaps, we define the $\textit{Configuration Graph}$ $G(P)$ over a given Parikh vector $P$. Each vertex in $G(P)$ corresponds to a unique word with the Parikh vector $P$, with an edge between any pair of words $v$ and $w$ if there exists a swap $s$ such that $v \circ s = w$. We establish several key combinatorial properties of this graph, including its exact diameter, its clique number, and the relationships between its subgraphs. Additionally, we show that for every vertex in the graph there exists a Hamiltonian path starting at that vertex. Finally, we provide an algorithm enumerating these paths from a given input word of length $n$ with a delay of at most $O(\log n)$ between outputting edges, after $O(n \log n)$ preprocessing.
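A brute-force sketch of the definitions (0-indexed positions, unlike the 1-indexed notation above; the Parikh vector $P = (2, 1)$ over $\{a, b\}$ is an assumed toy choice) that applies a 2-swap and builds the configuration graph $G(P)$:

```python
# Enumerate all words with Parikh vector P = (2, 1) and connect words
# that differ by a single 2-swap.
from itertools import permutations, combinations

def swap(v, i, j):                    # the 2-swap s = (i, j), 0-indexed here
    w = list(v)
    w[i], w[j] = w[j], w[i]
    return "".join(w)

words = sorted(set("".join(p) for p in permutations("aab")))
edges = {(v, swap(v, i, j)) for v in words
         for i, j in combinations(range(len(v)), 2) if swap(v, i, j) != v}
print(words)
print(edges)
```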
Quality diversity~(QD) is a branch of evolutionary computation that has gained increasing interest in recent years. The Map-Elites QD approach defines a feature space, i.e., a partition of the search space, and stores the best solution for each cell of this space. We study a simple QD algorithm in the context of pseudo-Boolean optimisation on the ``number of ones'' feature space, where the $i$th cell stores the best solution amongst those with a number of ones in $[(i-1)k, ik-1]$. Here $k$ is a granularity parameter with $1 \leq k \leq n+1$. We give a tight bound on the expected time until all cells are covered for arbitrary fitness functions and for all $k$, and analyse the expected optimisation time of QD on \textsc{OneMax} and other problems whose structure aligns favourably with the feature space. On combinatorial problems we show that QD efficiently finds a ${(1-1/e)}$-approximation when maximising any monotone submodular function under a single uniform cardinality constraint. Defining the feature space as the number of connected components of a connected graph, we show that QD finds a minimum spanning tree in expected polynomial time.
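A minimal sketch of a Map-Elites-style QD loop on \textsc{OneMax} with the ``number of ones'' feature space; the use of standard bit mutation, the budget, and the parameter values are assumptions for the demo, not the exact algorithm analysed in the paper.

```python
# Map-Elites-style loop: one cell per band of |x|_1 values of width k,
# each cell keeping the best OneMax solution seen in that band.
import random

n, k = 20, 3
cells = {}                                    # cell index -> best bit string found

def cell(x):                                  # |x|_1 in [(i-1)k, ik-1] -> cell i-1
    return sum(x) // k

def onemax(x):
    return sum(x)

x0 = [random.randint(0, 1) for _ in range(n)]
cells[cell(x0)] = x0
for _ in range(100_000):
    parent = random.choice(list(cells.values()))
    child = [b ^ (random.random() < 1 / n) for b in parent]   # standard bit mutation
    c = cell(child)
    if c not in cells or onemax(child) > onemax(cells[c]):
        cells[c] = child                      # keep the best solution per cell
```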
In the negative perceptron problem we are given $n$ data points $({\boldsymbol x}_i,y_i)$, where ${\boldsymbol x}_i$ is a $d$-dimensional vector and $y_i\in\{+1,-1\}$ is a binary label. The data are not linearly separable and hence we content ourselves with finding a linear classifier with the largest possible \emph{negative} margin. In other words, we want to find a unit norm vector ${\boldsymbol \theta}$ that maximizes $\min_{i\le n}y_i\langle {\boldsymbol \theta},{\boldsymbol x}_i\rangle$. This is a non-convex optimization problem (it is equivalent to finding a maximum norm vector in a polytope), and we study its typical properties under two random models for the data. We consider the proportional asymptotics in which $n,d\to \infty$ with $n/d\to\delta$, and prove upper and lower bounds on the maximum margin $\kappa_{\text{s}}(\delta)$ or -- equivalently -- on its inverse function $\delta_{\text{s}}(\kappa)$. In other words, $\delta_{\text{s}}(\kappa)$ is the overparametrization threshold: for $n/d\le \delta_{\text{s}}(\kappa)-\varepsilon$ a classifier achieving vanishing training error exists with high probability, while for $n/d\ge \delta_{\text{s}}(\kappa)+\varepsilon$ it does not. Our bounds on $\delta_{\text{s}}(\kappa)$ match to the leading order as $\kappa\to -\infty$. We then analyze a linear programming algorithm to find a solution, and characterize the corresponding threshold $\delta_{\text{lin}}(\kappa)$. We observe a gap between the interpolation threshold $\delta_{\text{s}}(\kappa)$ and the linear programming threshold $\delta_{\text{lin}}(\kappa)$, raising the question of the behavior of other algorithms.
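One natural linear program in this spirit replaces the unit $\ell_2$-norm constraint with an $\ell_\infty$ box, so that maximising the margin $\kappa$ becomes an LP; this is a hedged sketch, and the LP analysed in the paper may differ in its details.

```python
# Maximise kappa subject to y_i <theta, x_i> >= kappa and -1 <= theta_j <= 1.
import numpy as np
from scipy.optimize import linprog

rng = np.random.default_rng(1)
n, d = 300, 100                                # overparametrization ratio n/d = 3 (assumed)
X = rng.standard_normal((n, d)) / np.sqrt(d)
y = rng.choice([-1.0, 1.0], size=n)

# variables v = (theta_1, ..., theta_d, kappa); linprog minimises, so c = -kappa
c = np.concatenate([np.zeros(d), [-1.0]])
A_ub = np.hstack([-(y[:, None] * X), np.ones((n, 1))])   # -y_i <theta, x_i> + kappa <= 0
b_ub = np.zeros(n)
bounds = [(-1, 1)] * d + [(None, None)]
res = linprog(c, A_ub=A_ub, b_ub=b_ub, bounds=bounds)
theta, kappa = res.x[:d], res.x[d]
print("LP margin (before l2 normalization):", kappa)
```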
Many algorithms which exactly solve hard problems require branching on more or less complex structures in order to do their job. Those who design such algorithms often find themselves doing a meticulous analysis of numerous different cases in order to identify these structures and design suitable branching rules, all by hand. This process tends to be error-prone, and the resulting algorithm may be difficult to implement in practice. In this work, we aim to automate a part of this process and focus on the simplicity of the resulting implementation. We showcase our approach on the following problem. For a constant $d$, the $d$-Path Vertex Cover problem ($d$-PVC) is as follows: given an undirected graph and an integer $k$, find a subset of at most $k$ vertices of the graph such that their deletion results in a graph not containing a path on $d$ vertices as a subgraph. We develop a fully automated framework to generate parameterized branching algorithms for the problem and obtain algorithms outperforming those previously known for $3 \le d \le 8$. For example, we show that $5$-PVC can be solved in $O(2.7^k \cdot n^{O(1)})$ time.
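To make the problem concrete, here is the textbook $d^k$ branching for $d$-PVC (find a path on $d$ vertices and branch on deleting each of its vertices); this illustrates the problem only, not the automatically generated algorithms from the paper, and it uses networkx purely as a graph container.

```python
# Decide whether G has a d-path vertex cover of size at most k by d-way branching.
import networkx as nx

def find_path(G, d):
    """Return some path on d vertices as a list, or None (simple DFS)."""
    def dfs(path):
        if len(path) == d:
            return path
        for u in G.neighbors(path[-1]):
            if u not in path:
                r = dfs(path + [u])
                if r:
                    return r
        return None
    for v in G.nodes:
        r = dfs([v])
        if r:
            return r
    return None

def d_pvc(G, d, k):
    path = find_path(G, d)
    if path is None:                      # no path on d vertices remains
        return True
    if k == 0:
        return False
    # any d-path must lose at least one vertex: branch on all d choices
    return any(d_pvc(nx.restricted_view(G, [v], []), d, k - 1) for v in path)
```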
Effective resistances are ubiquitous in graph algorithms and network analysis. In this work, we study sublinear time algorithms to approximate the effective resistance of an adjacent pair $s$ and $t$. We consider the classical adjacency list model for local algorithms. While recent works have provided sublinear time algorithms for expander graphs, we prove several lower bounds for general graphs of $n$ vertices and $m$ edges: (1) $\Omega(n)$ queries are needed to obtain a $1.01$-approximation of the effective resistance of an adjacent pair $s$ and $t$, even for graphs in which every vertex except $s$ and $t$ has degree at most $3$; (2) for graphs of degree at most $d$ and any parameter $\ell$, $\Omega(m/\ell)$ queries are needed to obtain a $c \cdot \min\{d, \ell\}$-approximation, where $c>0$ is a universal constant. Moreover, we complement the first lower bound by providing a sublinear time $(1+\epsilon)$-approximation algorithm for graphs in which every vertex except $s$ and $t$ has degree $2$. One of our technical ingredients is a bound on the expansion of a graph in terms of the smallest non-trivial eigenvalue of its Laplacian matrix after removing edges. We discover a new lower bound on the eigenvalues of perturbed graphs (resp. perturbed matrices) by incorporating the effective resistance of the removed edge (resp. the leverage scores of the removed rows), which may be of independent interest.
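For reference, the exact (linear-algebra, not sublinear-time) quantity being approximated: the effective resistance of a pair $(s, t)$ equals $(e_s - e_t)^\top L^{+} (e_s - e_t)$ for the graph Laplacian $L$. A small sketch with an assumed toy graph:

```python
# Exact effective resistance via the pseudoinverse of the graph Laplacian.
import numpy as np

def effective_resistance(A, s, t):
    """A: dense adjacency matrix of a connected graph; s, t: vertex indices."""
    L = np.diag(A.sum(axis=1)) - A
    Lp = np.linalg.pinv(L)
    chi = np.zeros(len(A)); chi[s], chi[t] = 1.0, -1.0
    return chi @ Lp @ chi

# toy graph: K4 minus the edge {0, 3}; the queried pair (0, 1) is adjacent
A = np.array([[0,1,1,0],[1,0,1,1],[1,1,0,1],[0,1,1,0]], float)
print(effective_resistance(A, 0, 1))
```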
To integrate large systems of nonlinear differential equations in time, we consider a variant of nonlinear waveform relaxation (also known as dynamic iteration or Picard-Lindel\"of iteration), where at each iteration a linear inhomogeneous system of differential equations has to be solved. This is done by the exponential block Krylov subspace (EBK) method. Thus, we have an inner-outer iterative method, where iterative approximations are determined over a certain time interval, with no time stepping involved. This approach has recently been shown to be efficient as a time-parallel integrator within the PARAEXP framework. In this paper, the convergence behavior of this method is assessed theoretically and practically. We examine the efficiency of the method by testing it on the nonlinear Burgers and Liouville-Bratu-Gelfand equations and comparing its performance with that of conventional time-stepping integrators.
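A minimal sketch of the outer Picard-Lindel\"of waveform relaxation on a single time interval, with the inner solve replaced by simple quadrature rather than the EBK method used in the paper; the scalar Bratu-type test problem is an assumed toy example.

```python
# Iterate y_{k+1}(t) = y0 + int_0^t f(s, y_k(s)) ds over a whole trajectory
# at once (no time stepping), using the composite trapezoid rule.
import numpy as np

def picard_waveform(f, y0, T, n_nodes=200, n_iters=25):
    t = np.linspace(0.0, T, n_nodes)
    y = np.tile(y0, (n_nodes, 1))                 # initial guess: constant trajectory
    for _ in range(n_iters):
        rhs = np.array([f(ti, yi) for ti, yi in zip(t, y)])
        integral = np.concatenate([[np.zeros_like(y0)],
                                   np.cumsum(0.5 * (rhs[1:] + rhs[:-1])
                                             * np.diff(t)[:, None], axis=0)])
        y = y0 + integral                         # updated trajectory on all nodes
    return t, y

# scalar Bratu-type test: y' = exp(y), y(0) = 0 (assumed toy problem)
t, y = picard_waveform(lambda s, v: np.exp(v), np.array([0.0]), T=0.5)
```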
Single-use anion-exchange resins can reduce hazardous chromates to safe levels in drinking water. However, since most process control strategies monitor effluent concentrations, detection of any chromate leakage leads to premature resin replacement. Furthermore, variations in the inlet chromate concentration and other process conditions make process control a challenging step. In this work, we capture the uncertainty in the process conditions by incorporating an It\^o process (Brownian motion with drift) into a stochastic optimal control strategy. The ion exchange process is modeled using the method of moments, which captures the process dynamics; these are then formulated into mathematical objectives representing the desired chromate removal. We solve the resulting models as an optimal control problem via Pontryagin's maximum principle. The objectives enable successful control via flow rate adjustments, leading to higher chromate extraction. This approach maximizes the capacity of the resin and the efficiency of the column in removing toxic compounds from water while accounting for deviations in the process conditions.
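A minimal sketch of the uncertainty model alone: an It\^o process $dC = \mu\, dt + \sigma\, dW$ for the inlet chromate concentration, simulated by the Euler-Maruyama scheme; the drift, volatility, and initial value are assumed demo numbers, and the control formulation itself is not reproduced here.

```python
# Euler-Maruyama simulation of Brownian motion with drift.
import numpy as np

rng = np.random.default_rng(0)
mu, sigma, c0 = 0.02, 0.1, 1.0        # drift, volatility, initial value (assumed)
T, n = 10.0, 1000
dt = T / n
c = np.empty(n + 1); c[0] = c0
for i in range(n):
    c[i + 1] = c[i] + mu * dt + sigma * np.sqrt(dt) * rng.standard_normal()
```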
A Random Vector Functional Link (RVFL) network is a depth-2 neural network with random inner weights and biases. As only the outer weights of such architectures need to be learned, the learning process boils down to a linear optimization task, allowing one to sidestep the pitfalls of nonconvex optimization problems. In this paper, we prove that an RVFL with ReLU activation functions can approximate Lipschitz continuous functions provided its hidden layer is exponentially wide in the input dimension. Although it has previously been established that such approximation can be achieved in the $L_2$ sense, we prove it for the $L_\infty$ approximation error and Gaussian inner weights. To the best of our knowledge, our result is the first of this kind. We give a nonasymptotic lower bound for the number of hidden layer nodes, depending on, among other things, the Lipschitz constant of the target function, the desired accuracy, and the input dimension. Our method of proof is rooted in probability theory and harmonic analysis.
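A minimal RVFL sketch matching the setup above: frozen Gaussian inner weights and biases, ReLU activations, and only the outer weights fitted by least squares; the target function, width, and sample size are assumed demo choices, and the empirical sup-norm error printed is not the paper's bound.

```python
# RVFL: random (frozen) first layer, linear least squares for the second.
import numpy as np

rng = np.random.default_rng(0)
d, width, n = 2, 500, 2000
W = rng.standard_normal((width, d))        # random Gaussian inner weights (frozen)
b = rng.standard_normal(width)             # random biases (frozen)

X = rng.uniform(-1, 1, (n, d))
y = np.linalg.norm(X, axis=1)              # a Lipschitz target, f(x) = |x| (assumed)

H = np.maximum(X @ W.T + b, 0.0)           # ReLU hidden features
a, *_ = np.linalg.lstsq(H, y, rcond=None)  # only the outer weights are learned
print(np.max(np.abs(H @ a - y)))           # empirical sup-norm error on the sample
```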
Given an $n$ by $n$ matrix $A$ and an $n$-vector $b$, along with a rational function $R(z) := D(z )^{-1} N(z)$, we show how to find the optimal approximation to $R(A) b$ from the Krylov space, $\mbox{span}( b, Ab, \ldots , A^{k-1} b)$, using the basis vectors produced by the Arnoldi algorithm. To find this optimal approximation requires running $\max \{ \mbox{deg} (D) , \mbox{deg} (N) \} - 1$ extra Arnoldi steps and solving a $k + \max \{ \mbox{deg} (D) , \mbox{deg} (N) \}$ by $k$ least squares problem. Here {\em optimal} is taken to mean optimal in the $D(A )^{*} D(A)$-norm. As in the case of linear systems, we show that eigenvalues alone cannot provide information about the convergence behavior of this algorithm, and we discuss other possible error bounds for highly nonnormal matrices.
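A brute-force check of the optimality notion (not the efficient Arnoldi-based algorithm of the paper): the $D(A)^{*} D(A)$-norm-optimal approximation to $R(A)b$ from the Krylov space solves $\min_x \|D(A)x - N(A)b\|_2$ over $x \in \mbox{span}(b, Ab, \ldots, A^{k-1}b)$; the rational function and problem sizes below are assumed demo choices.

```python
# Verify the D(A)* D(A)-norm optimality criterion by explicit least squares.
import numpy as np

rng = np.random.default_rng(0)
n, k = 50, 8
A = rng.standard_normal((n, n)) / np.sqrt(n)
b = rng.standard_normal(n)

# R(z) = D(z)^{-1} N(z) with, e.g., D(z) = z^2 + 4 and N(z) = z + 1 (assumed)
D_A = A @ A + 4 * np.eye(n)
N_A = A + np.eye(n)

K = np.empty((n, k)); K[:, 0] = b
for j in range(1, k):
    K[:, j] = A @ K[:, j - 1]
Q, _ = np.linalg.qr(K)                      # orthonormal Krylov basis (for stability)

c, *_ = np.linalg.lstsq(D_A @ Q, N_A @ b, rcond=None)
x_opt = Q @ c                               # optimal in the D(A)* D(A)-norm
err = x_opt - np.linalg.solve(D_A, N_A @ b)
print(np.linalg.norm(D_A @ err))            # error measured in the weighted norm
```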