Union volume estimation is a classical algorithmic problem. Given a family of objects $O_1,\ldots,O_n \subseteq \mathbb{R}^d$, we want to approximate the volume of their union. In the special case where all objects are boxes (also known as hyperrectangles) this is known as Klee's measure problem. The state-of-the-art algorithm [Karp, Luby, Madras '89] for union volume estimation and Klee's measure problem in constant dimension $d$ computes a $(1+\varepsilon)$-approximation with constant success probability by using a total of $O(n/\varepsilon^2)$ queries of the form (i) ask for the volume of $O_i$, (ii) sample a point uniformly at random from $O_i$, and (iii) query whether a given point is contained in $O_i$. We show that if one can only interact with the objects via the aforementioned three queries, the query complexity of [Karp, Luby, Madras '89] is indeed optimal, i.e., $\Omega(n/\varepsilon^2)$ queries are necessary. Our lower bound already holds for estimating the union of equiponderous axis-aligned polygons in $\mathbb{R}^2$, and even if the algorithm is allowed to inspect the coordinates of the points sampled from the polygons, and still holds when a containment query can ask containment of an arbitrary (not sampled) point. Guided by the insights of the lower bound, we provide a more efficient approximation algorithm for Klee's measure problem improving the $O(n/\varepsilon^2)$ time to $O((n+\frac{1}{\varepsilon^2}) \cdot \log^{O(d)}n)$. We achieve this improvement by exploiting the geometry of Klee's measure problem in various ways: (1) Since we have access to the boxes' coordinates, we can split the boxes into classes of boxes of similar shape. (2) Within each class, we show how to sample from the union of all boxes, by using orthogonal range searching. And (3) we exploit that boxes of different classes have small intersection, for most pairs of classes.
In our work, we consider the problem of computing a vector $x \in Z^n$ of minimum $\|\cdot\|_p$-norm such that $a^\top x \not= a_0$, for any vector $(a,a_0)$ from a given subset of $Z^n$ of size $m$. In other words, we search for a vector of minimum norm that avoids a given finite set of hyperplanes, which is natural to call as the $\textit{Hyperplanes Avoiding Problem}$. This problem naturally appears as a subproblem in Barvinok-type algorithms for counting integer points in polyhedra. More precisely, it appears when one needs to evaluate certain rational generating functions in an avoidable critical point. We show that: 1) With respect to $\|\cdot\|_1$, the problem admits a feasible solution $x$ with $\|x\|_1 \leq (m+n)/2$, and show that such solution can be constructed by a deterministic polynomial-time algorithm with $O(n \cdot m)$ operations. Moreover, this inequality is the best possible. This is a significant improvement over the previous randomized algorithm, which computes $x$ with a guaranty $\|x\|_{1} \leq n \cdot m$. The original approach of A.~Barvinok can guarantee only $\|x\|_1 = O\bigl((n \cdot m)^n\bigr)$; 2) The problem is NP-hard with respect to any norm $\|\cdot\|_p$, for $p \in \bigl(R_{\geq 1} \cup \{\infty\}\bigr)$. 3) As an application, we show that the problem to count integer points in a polytope $P = \{x \in R^n \colon A x \leq b\}$, for given $A \in Z^{m \times n}$ and $b \in Q^m$, can be solved by an algorithm with $O\bigl(\nu^2 \cdot n^3 \cdot \Delta^3 \bigr)$ operations, where $\nu$ is the maximum size of a normal fan triangulation of $P$, and $\Delta$ is the maximum value of rank-order subdeterminants of $A$. It refines the previous state-of-the-art $O\bigl(\nu^2 \cdot n^4 \cdot \Delta^3\bigr)$-time algorithm.
We propose a two-stage memory retrieval dynamics for modern Hopfield models, termed $\mathtt{U\text{-}Hop}$, with enhanced memory capacity. Our key contribution is a learnable feature map $\Phi$ which transforms the Hopfield energy function into kernel space. This transformation ensures convergence between the local minima of energy and the fixed points of retrieval dynamics within the kernel space. Consequently, the kernel norm induced by $\Phi$ serves as a novel similarity measure. It utilizes the stored memory patterns as learning data to enhance memory capacity across all modern Hopfield models. Specifically, we accomplish this by constructing a separation loss $\mathcal{L}_\Phi$ that separates the local minima of kernelized energy by separating stored memory patterns in kernel space. Methodologically, $\mathtt{U\text{-}Hop}$ memory retrieval process consists of: (Stage I) minimizing separation loss for a more uniform memory (local minimum) distribution, followed by (Stage II) standard Hopfield energy minimization for memory retrieval. This results in a significant reduction of possible metastable states in the Hopfield energy function, thus enhancing memory capacity by preventing memory confusion. Empirically, with real-world datasets, we demonstrate that $\mathtt{U\text{-}Hop}$ outperforms all existing modern Hopfield models and state-of-the-art similarity measures, achieving substantial improvements in both associative memory retrieval and deep learning tasks. Code is available at //github.com/MAGICS-LAB/UHop ; future updates are on arXiv:2404.03827
We first establish the unique ergodicity of the stochastic theta method (STM) with $\theta \in [1/2, 1]$ for monotone SODEs, without growth restriction on the coefficients, driven by nondegenerate multiplicative noise. The main ingredient of the arguments lies in constructing new Lyapunov functions involving the coefficients, the stepsize, and $\theta$ and deriving a minorization condition for the STM. We then generalize the arguments to the Galerkin-based full discretizations for a class of monotone SPDEs driven by infinite-dimensional nondegenerate multiplicative trace-class noise. Applying these results to the stochastic Allen--Cahn equation indicates that its Galerkin-based full discretizations are uniquely ergodic for any interface thickness. Numerical experiments verify our theoretical results.
Oja's algorithm for Streaming Principal Component Analysis (PCA) for $n$ data-points in a $d$ dimensional space achieves the same sin-squared error $O(r_{\mathsf{eff}}/n)$ as the offline algorithm in $O(d)$ space and $O(nd)$ time and a single pass through the datapoints. Here $r_{\mathsf{eff}}$ is the effective rank (ratio of the trace and the principal eigenvalue of the population covariance matrix $\Sigma$). Under this computational budget, we consider the problem of sparse PCA, where the principal eigenvector of $\Sigma$ is $s$-sparse, and $r_{\mathsf{eff}}$ can be large. In this setting, to our knowledge, \textit{there are no known single-pass algorithms} that achieve the minimax error bound in $O(d)$ space and $O(nd)$ time without either requiring strong initialization conditions or assuming further structure (e.g., spiked) of the covariance matrix. We show that a simple single-pass procedure that thresholds the output of Oja's algorithm (the Oja vector) can achieve the minimax error bound under some regularity conditions in $O(d)$ space and $O(nd)$ time. We present a nontrivial and novel analysis of the entries of the unnormalized Oja vector, which involves the projection of a product of independent random matrices on a random initial vector. This is completely different from previous analyses of Oja's algorithm and matrix products, which have been done when the $r_{\mathsf{eff}}$ is bounded.
Optimization over the set of matrices $X$ that satisfy $X^\top B X = I_p$, referred to as the generalized Stiefel manifold, appears in many applications involving sampled covariance matrices such as the canonical correlation analysis (CCA), independent component analysis (ICA), and the generalized eigenvalue problem (GEVP). Solving these problems is typically done by iterative methods that require a fully formed $B$. We propose a cheap stochastic iterative method that solves the optimization problem while having access only to random estimates of $B$. Our method does not enforce the constraint in every iteration; instead, it produces iterations that converge to critical points on the generalized Stiefel manifold defined in expectation. The method has lower per-iteration cost, requires only matrix multiplications, and has the same convergence rates as its Riemannian optimization counterparts that require the full matrix $B$. Experiments demonstrate its effectiveness in various machine learning applications involving generalized orthogonality constraints, including CCA, ICA, and the GEVP.
A single-index model (SIM) is a function of the form $\sigma(\mathbf{w}^{\ast} \cdot \mathbf{x})$, where $\sigma: \mathbb{R} \to \mathbb{R}$ is a known link function and $\mathbf{w}^{\ast}$ is a hidden unit vector. We study the task of learning SIMs in the agnostic (a.k.a. adversarial label noise) model with respect to the $L^2_2$-loss under the Gaussian distribution. Our main result is a sample and computationally efficient agnostic proper learner that attains $L^2_2$-error of $O(\mathrm{OPT})+\epsilon$, where $\mathrm{OPT}$ is the optimal loss. The sample complexity of our algorithm is $\tilde{O}(d^{\lceil k^{\ast}/2\rceil}+d/\epsilon)$, where $k^{\ast}$ is the information-exponent of $\sigma$ corresponding to the degree of its first non-zero Hermite coefficient. This sample bound nearly matches known CSQ lower bounds, even in the realizable setting. Prior algorithmic work in this setting had focused on learning in the realizable case or in the presence of semi-random noise. Prior computationally efficient robust learners required significantly stronger assumptions on the link function.
We give a comprehensive description of Wasserstein gradient flows of maximum mean discrepancy (MMD) functionals $\mathcal F_\nu := \text{MMD}_K^2(\cdot, \nu)$ towards given target measures $\nu$ on the real line, where we focus on the negative distance kernel $K(x,y) := -|x-y|$. In one dimension, the Wasserstein-2 space can be isometrically embedded into the cone $\mathcal C(0,1) \subset L_2(0,1)$ of quantile functions leading to a characterization of Wasserstein gradient flows via the solution of an associated Cauchy problem on $L_2(0,1)$. Based on the construction of an appropriate counterpart of $\mathcal F_\nu$ on $L_2(0,1)$ and its subdifferential, we provide a solution of the Cauchy problem. For discrete target measures $\nu$, this results in a piecewise linear solution formula. We prove invariance and smoothing properties of the flow on subsets of $\mathcal C(0,1)$. For certain $\mathcal F_\nu$-flows this implies that initial point measures instantly become absolutely continuous, and stay so over time. Finally, we illustrate the behavior of the flow by various numerical examples using an implicit Euler scheme, which is easily computable by a bisection algorithm. For continuous targets $\nu$, also the explicit Euler scheme can be employed, although with limited convergence guarantees.
We study the query complexity of the metric Steiner Tree problem, where we are given an $n \times n$ metric on a set $V$ of vertices along with a set $T \subseteq V$ of $k$ terminals, and the goal is to find a tree of minimum cost that contains all terminals in $T$. The query complexity for the related minimum spanning tree (MST) problem is well-understood: for any fixed $\varepsilon > 0$, one can estimate the MST cost to within a $(1+\varepsilon)$-factor using only $\tilde{O}(n)$ queries, and this is known to be tight. This implies that a $(2 + \varepsilon)$-approximate estimate of Steiner Tree cost can be obtained with $\tilde{O}(k)$ queries by simply applying the MST cost estimation algorithm on the metric induced by the terminals. Our first result shows that any (randomized) algorithm that estimates the Steiner Tree cost to within a $(5/3 - \varepsilon)$-factor requires $\Omega(n^2)$ queries, even if $k$ is a constant. This lower bound is in sharp contrast to an upper bound of $O(nk)$ queries for computing a $(5/3)$-approximate Steiner Tree, which follows from previous work by Du and Zelikovsky. Our second main result, and the main technical contribution of this work, is a sublinear query algorithm for estimating the Steiner Tree cost to within a strictly better-than-$2$ factor, with query complexity $\tilde{O}(n^{12/7} + n^{6/7}\cdot k)=\tilde{O}(n^{13/7})=o(n^2)$. We complement this result by showing an $\tilde{\Omega}(n + k^{6/5})$ query lower bound for any algorithm that estimates Steiner Tree cost to a strictly better than $2$ factor. Thus $\tilde{\Omega}(n^{6/5})$ queries are needed to just beat $2$-approximation when $k = \Omega(n)$; a sharp contrast to MST cost estimation where a $(1+o(1))$-approximate estimate of cost is achievable with only $\tilde{O}(n)$ queries.
The paper concerns the $d$-dimensional stochastic approximation recursion, $$ \theta_{n+1}= \theta_n + \alpha_{n + 1} f(\theta_n, \Phi_{n+1}) $$ where $ \{ \Phi_n \}$ is a stochastic process on a general state space, satisfying a conditional Markov property that allows for parameter-dependent noise. The main results are established under additional conditions on the mean flow and a version of the Donsker-Varadhan Lyapunov drift condition known as (DV3): {(i)} An appropriate Lyapunov function is constructed that implies convergence of the estimates in $L_4$. {(ii)} A functional central limit theorem (CLT) is established, as well as the usual one-dimensional CLT for the normalized error. Moment bounds combined with the CLT imply convergence of the normalized covariance $\textsf{E} [ z_n z_n^T ]$ to the asymptotic covariance in the CLT, where $z_n{=:} (\theta_n-\theta^*)/\sqrt{\alpha_n}$. {(iii)} The CLT holds for the normalized version $z^{\text{PR}}_n{=:} \sqrt{n} [\theta^{\text{PR}}_n -\theta^*]$, of the averaged parameters $\theta^{\text{PR}}_n {=:} n^{-1} \sum_{k=1}^n\theta_k$, subject to standard assumptions on the step-size. Moreover, the covariance in the CLT coincides with the minimal covariance of Polyak and Ruppert. {(iv)} An example is given where $f$ and $\bar{f}$ are linear in $\theta$, and $\Phi$ is a geometrically ergodic Markov chain but does not satisfy (DV3). While the algorithm is convergent, the second moment of $\theta_n$ is unbounded and in fact diverges. {\bf This arXiv version 3 represents a major extension of the results in prior versions.} The main results now allow for parameter-dependent noise, as is often the case in applications to reinforcement learning.
The Euclidean algorithm is the oldest algorithms known to mankind. Given two integral numbers $a_1$ and $a_2$, it computes the greatest common divisor (gcd) of $a_1$ and $a_2$ in a very elegant way. From a lattice perspective, it computes a basis of the sum of two one-dimensional lattices $a_1 \mathbb{Z}$ and $a_2 \mathbb{Z}$ as $\gcd(a_1,a_2) \mathbb{Z} = a_1 \mathbb{Z} + a_2 \mathbb{Z}$. In this paper, we show that the classical Euclidean algorithm can be adapted in a very natural way to compute a basis of a general lattice $L(A_1, \ldots , A_n)$ given vectors $A_1, \ldots , A_n \in \mathbb{Z}^d$ with $n> \mathrm{rank}(a_1, \ldots ,a_d)$. Similar to the Euclidean algorithm, our algorithm is very easy to describe and implement and can be written within 12 lines of pseudocode. As our main result, we obtain an algorithm to compute a lattice basis for given vectors $A_1, \ldots , A_n \in \mathbb{Z}^d$ in time (counting bit operations) $LS + \tilde{O}((n-d)d^2 \cdot \log(||A||)$, where $LS$ is the time required to obtain the exact fractional solution of a certain system of linear equalities. The analysis of the running time of our algorithms relies on fundamental statements on the fractionality of solutions of linear systems of equations. So far, the fastest algorithm for lattice basis computation was due to Storjohann and Labhan [SL96] having a running time of $\tilde{O}(nd^\omega\log ||A||)$. For current upper bounds of $LS$, our algorithm has a running time improvement of a factor of at least $d^{0.12}$ over [SL96]. Our algorithm is therefore the first general algorithmic improvement to this classical problem in nearly 30 years. At last, we present a postprocessing procedure which yields an improved size bound of $\sqrt{d} ||A||$ for vectors of the resulting basis matrix.