In the kernel density estimation (KDE) problem one is given a kernel $K(x, y)$ and a dataset $P$ of points in a Euclidean space, and must prepare a data structure that can quickly answer density queries: given a point $q$, output a $(1+\epsilon)$-approximation to $\mu:=\frac1{|P|}\sum_{p\in P} K(p, q)$. The classical approach to KDE is the celebrated fast multipole method of [Greengard and Rokhlin]. The fast multipole method combines a basic space partitioning approach with a multidimensional Taylor expansion, which yields a $\approx \log^d (n/\epsilon)$ query time (exponential in the dimension $d$). A recent line of work initiated by [Charikar and Siminelakis] achieved polynomial dependence on $d$ via a combination of random sampling and randomized space partitioning, with [Backurs et al.] giving an efficient data structure with query time $\approx \mathrm{poly}{\log(1/\mu)}/\epsilon^2$ for smooth kernels. Quadratic dependence on $\epsilon$, inherent to the sampling methods, is prohibitively expensive for small $\epsilon$. This issue is addressed by quasi-Monte Carlo methods in numerical analysis. The high level idea in quasi-Monte Carlo methods is to replace random sampling with a discrepancy based approach -- an idea recently applied to coresets for KDE by [Phillips and Tai]. The work of Phillips and Tai gives a space efficient data structure with query complexity $\approx 1/(\epsilon \mu)$. This is polynomially better in $1/\epsilon$, but exponentially worse in $1/\mu$. We achieve the best of both: a data structure with $\approx \mathrm{poly}{\log(1/\mu)}/\epsilon$ query time for smooth kernel KDE. Our main insight is a new way to combine discrepancy theory with randomized space partitioning inspired by, but significantly more efficient than, that of the fast multipole methods. We hope that our techniques will find further applications to linear algebra for kernel matrices.
Let $n$ be the size of a parameterized problem and $k$ the parameter. We present kernels for Feedback Vertex Set, Path Contraction and Cluster Editing/Deletion whose sizes are all polynomial in $k$ and that are computable in polynomial time and with $O(\rm{poly}(k) \log n)$ bits (of working memory). By using kernel cascades, we obtain the best known kernels in polynomial time with $O(\rm{poly}(k) \log n)$ bits.
The problem of minimizing the maximum of $N$ convex, Lipschitz functions plays significant roles in optimization and machine learning. It has a series of results, with the most recent one requiring $O(N\epsilon^{-2/3} + \epsilon^{-8/3})$ queries to a first-order oracle to compute an $\epsilon$-suboptimal point. On the other hand, quantum algorithms for optimization are rapidly advancing with speedups shown on many important optimization problems. In this paper, we conduct a systematic study for quantum algorithms and lower bounds for minimizing the maximum of $N$ convex, Lipschitz functions. On one hand, we develop quantum algorithms with an improved complexity bound of $\tilde{O}(\sqrt{N}\epsilon^{-5/3} + \epsilon^{-8/3})$. On the other hand, we prove that quantum algorithms must take $\tilde{\Omega}(\sqrt{N}\epsilon^{-2/3})$ queries to a first order quantum oracle, showing that our dependence on $N$ is optimal up to poly-logarithmic factors.
Diffusion models are a remarkably effective way of learning and sampling from a distribution $p(x)$. In posterior sampling, one is also given a measurement model $p(y \mid x)$ and a measurement $y$, and would like to sample from $p(x \mid y)$. Posterior sampling is useful for tasks such as inpainting, super-resolution, and MRI reconstruction, so a number of recent works have given algorithms to heuristically approximate it; but none are known to converge to the correct distribution in polynomial time. In this paper we show that posterior sampling is \emph{computationally intractable}: under the most basic assumption in cryptography -- that one-way functions exist -- there are instances for which \emph{every} algorithm takes superpolynomial time, even though \emph{unconditional} sampling is provably fast. We also show that the exponential-time rejection sampling algorithm is essentially optimal under the stronger plausible assumption that there are one-way functions that take exponential time to invert.
The optimization problem associated to fitting two-layer ReLU networks having $d$~inputs, $k$~neurons, and labels generated by a target network, is considered. Two types of infinite families of spurious minima, giving one minimum per $d$, were recently found. The loss at minima belonging to the first type converges to zero as $d$ increases. In the second type, the loss remains bounded away from zero. That being so, how may one avoid minima belonging to the latter type? Fortunately, such minima are never detected by standard optimization methods. Motivated by questions concerning the nature of this phenomenon, we develop methods to study distinctive analytic properties of hidden minima. By existing analyses, the Hessian spectrum of both types agree modulo $O(d^{-1/2})$-terms -- not promising. Thus, rather, our investigation proceeds by studying curves along which the loss is minimized or maximized, generally referred to as tangency arcs. We prove that apparently far removed group representation-theoretic considerations concerning the arrangement of subspaces invariant to the action of subgroups of $S_d$, the symmetry group over $d$ symbols, relative to ones fixed by the action yield a precise description of all finitely many admissible types of tangency arcs. The general results used for the loss function reveal that arcs emanating from hidden minima differ, characteristically, by their structure and symmetry, precisely on account of the $O(d^{-1/2})$-eigenvalue terms absent in previous work, indicating in particular the subtlety of the analysis. The theoretical results, stated and proved for o-minimal structures, show that the set comprising all tangency arcs is topologically sufficiently tame to enable a numerical construction of tangency arcs and so compare how minima, both types, are positioned relative to adjacent critical points.
We study best-of-both-worlds algorithms for $K$-armed linear contextual bandits. Our algorithms deliver near-optimal regret bounds in both the adversarial and stochastic regimes, without prior knowledge about the environment. In the stochastic regime, we achieve the polylogarithmic rate $\frac{(dK)^2\mathrm{poly}\log(dKT)}{\Delta_{\min}}$, where $\Delta_{\min}$ is the minimum suboptimality gap over the $d$-dimensional context space. In the adversarial regime, we obtain either the first-order $\widetilde{O}(dK\sqrt{L^*})$ bound, or the second-order $\widetilde{O}(dK\sqrt{\Lambda^*})$ bound, where $L^*$ is the cumulative loss of the best action and $\Lambda^*$ is a notion of the cumulative second moment for the losses incurred by the algorithm. Moreover, we develop an algorithm based on FTRL with Shannon entropy regularizer that does not require the knowledge of the inverse of the covariance matrix, and achieves a polylogarithmic regret in the stochastic regime while obtaining $\widetilde{O}\big(dK\sqrt{T}\big)$ regret bounds in the adversarial regime.
We characterize type isomorphisms in the multiplicative-additive fragment of linear logic (MALL), and thus in *-autonomous categories with finite products, extending a result for the multiplicative fragment by Balat and Di Cosmo. This yields a much richer equational theory involving distributivity and cancellation laws. The unit-free case is obtained by relying on the proof-net syntax introduced by Hughes and Van Glabbeek. We use the sequent calculus to extend our results to full MALL, including all units, thanks to a study of cut-elimination and rule commutations.
We construct explicit pseudorandom generators that fool $n$-variate polynomials of degree at most $d$ over a finite field $\mathbb{F}_q$. The seed length of our generators is $O(d \log n + \log q)$, over fields of size exponential in $d$ and characteristic at least $d(d-1)+1$. Previous constructions such as Bogdanov's (STOC 2005) and Derksen and Viola's (FOCS 2022) had either suboptimal seed length or required the field size to depend on $n$. Our approach follows Bogdanov's paradigm while incorporating techniques from Lecerf's factorization algorithm (J. Symb. Comput. 2007) and insights from the construction of Derksen and Viola regarding the role of indecomposability of polynomials.
A Bayesian nonparametric method of James, Lijoi \& Prunster (2009) used to predict future values of observations from normalized random measures with independent increments is modified to a class of models based on negative binomial processes for which the increments are not independent, but are independent conditional on an underlying gamma variable. Like in James et al., the new algorithm is formulated in terms of two variables, one a function of the past observations, and the other an updating by means of a new observation. We outline an application of the procedure to population genetics, for the construction of realisations of genealogical trees and coalescents from samples of alleles.
We consider composition orderings for linear functions of one variable. Given $n$ linear functions $f_1,\dots,f_n$ and a constant $c$, the objective is to find a permutation $\sigma$ that minimizes/maximizes $f_{\sigma(n)}\circ\dots\circ f_{\sigma(1)}(c)$. It was first studied in the area of time-dependent scheduling, and known to be solvable in $O(n\log n)$ time if all functions are nondecreasing. In this paper, we present a complete characterization of optimal composition orderings for this case, by regarding linear functions as two-dimensional vectors. We also show several interesting properties on optimal composition orderings such as the equivalence between local and global optimality. Furthermore, by using the characterization above, we provide a fixed-parameter tractable (FPT) algorithm for the composition ordering problem for general linear functions, with respect to the number of decreasing linear functions. We next deal with matrix multiplication orderings as a generalization of composition of linear functions. Given $n$ matrices $M_1,\dots,M_n\in\mathbb{R}^{m\times m}$ and two vectors $w,y\in\mathbb{R}^m$, where $m$ denotes a positive integer, the objective is to find a permutation $\sigma$ that minimizes/maximizes $w^\top M_{\sigma(n)}\dots M_{\sigma(1)} y$. The problem is also viewed as a generalization of flow shop scheduling through a limit. By this extension, we show that the multiplication ordering problem for $2\times 2$ matrices is solvable in $O(n\log n)$ time if all the matrices are simultaneously triangularizable and have nonnegative determinants, and FPT with respect to the number of matrices with negative determinants, if all the matrices are simultaneously triangularizable. As the negative side, we finally prove that three possible natural generalizations are NP-hard: 1) when $m=2$, 2) when $m\geq 3$, and 3) the target version of the problem.
We study the task of efficiently sampling from a Gibbs distribution $d \pi^* = e^{-h} d {vol}_g$ over a Riemannian manifold $M$ via (geometric) Langevin MCMC; this algorithm involves computing exponential maps in random Gaussian directions and is efficiently implementable in practice. The key to our analysis of Langevin MCMC is a bound on the discretization error of the geometric Euler-Murayama scheme, assuming $\nabla h$ is Lipschitz and $M$ has bounded sectional curvature. Our error bound matches the error of Euclidean Euler-Murayama in terms of its stepsize dependence. Combined with a contraction guarantee for the geometric Langevin Diffusion under Kendall-Cranston coupling, we prove that the Langevin MCMC iterates lie within $\epsilon$-Wasserstein distance of $\pi^*$ after $\tilde{O}(\epsilon^{-2})$ steps, which matches the iteration complexity for Euclidean Langevin MCMC. Our results apply in general settings where $h$ can be nonconvex and $M$ can have negative Ricci curvature. Under additional assumptions that the Riemannian curvature tensor has bounded derivatives, and that $\pi^*$ satisfies a $CD(\cdot,\infty)$ condition, we analyze the stochastic gradient version of Langevin MCMC, and bound its iteration complexity by $\tilde{O}(\epsilon^{-2})$ as well.