Matrix sketching, aimed at approximating a matrix $\boldsymbol{A} \in \mathbb{R}^{N\times d}$ consisting of vector streams of length $N$ with a smaller sketching matrix $\boldsymbol{B} \in \mathbb{R}^{\ell\times d}, \ell \ll N$, has garnered increasing attention in fields such as large-scale data analytics and machine learning. A well-known deterministic matrix sketching method is the Frequent Directions algorithm, which achieves the optimal $O\left(\frac{d}{\varepsilon}\right)$ space bound and provides a covariance error guarantee of $\varepsilon = \lVert \boldsymbol{A}^\top \boldsymbol{A} - \boldsymbol{B}^\top \boldsymbol{B} \rVert_2/\lVert \boldsymbol{A} \rVert_F^2$. The matrix sketching problem becomes particularly interesting in the context of sliding windows, where the goal is to approximate the matrix $\boldsymbol{A}_W$, formed by input vectors over the most recent $N$ time units. However, despite recent efforts, whether achieving the optimal $O\left(\frac{d}{\varepsilon}\right)$ space bound on sliding windows is possible has remained an open question. In this paper, we introduce the DS-FD algorithm, which achieves the optimal $O\left(\frac{d}{\varepsilon}\right)$ space bound for matrix sketching over row-normalized, sequence-based sliding windows. We also present matching upper and lower space bounds for time-based and unnormalized sliding windows, demonstrating the generality and optimality of \dsfd across various sliding window models. This conclusively answers the open question regarding the optimal space bound for matrix sketching over sliding windows. Furthermore, we conduct extensive experiments with both synthetic and real-world datasets, validating our theoretical claims and thus confirming the correctness and effectiveness of our algorithm, both theoretically and empirically.
We consider temporal numeric planning problems $\Pi$ expressed in PDDL2.1 level 3, and show how to produce SMT formulas $(i)$ whose models correspond to valid plans of $\Pi$, and $(ii)$ that extend the recently proposed planning with patterns approach from the numeric to the temporal case. We prove the correctness and completeness of the approach and show that it performs very well on 10 domains with required concurrency.
We present {\em generative clustering} (GC) for clustering a set of documents, $\mathrm{X}$, by using texts $\mathrm{Y}$ generated by large language models (LLMs) instead of by clustering the original documents $\mathrm{X}$. Because LLMs provide probability distributions, the similarity between two documents can be rigorously defined in an information-theoretic manner by the KL divergence. We also propose a natural, novel clustering algorithm by using importance sampling. We show that GC achieves the state-of-the-art performance, outperforming any previous clustering method often by a large margin. Furthermore, we show an application to generative document retrieval in which documents are indexed via hierarchical clustering and our method improves the retrieval accuracy.
Central to rough path theory is the signature transform of a path, an infinite series of tensors given by the iterated integrals of the underlying path. The signature poses an effective way to capture sequentially ordered information, thanks both to its rich analytic and algebraic properties as well as its universality when used as a basis to approximate functions on path space. Whilst a truncated version of the signature can be efficiently computed using Chen's identity, there is a lack of efficient methods for computing a sparse collection of iterated integrals contained in high levels of the signature. We address this problem by leveraging signature kernels, defined as the inner product of two signatures, and computable efficiently by means of PDE-based methods. By forming a filter in signature space with which to take kernels, one can effectively isolate specific groups of signature coefficients and, in particular, a singular coefficient at any depth of the transform. We show that such a filter can be expressed as a linear combination of suitable signature transforms and demonstrate empirically the effectiveness of our approach. To conclude, we give an example use case for sparse collections of signature coefficients based on the construction of N-step Euler schemes for sparse CDEs.
We introduce a novel approach for detecting distribution shifts that negatively impact the performance of machine learning models in continuous production environments, which requires no access to ground truth data labels. It builds upon the work of Podkopaev and Ramdas [2022], who address scenarios where labels are available for tracking model errors over time. Our solution extends this framework to work in the absence of labels, by employing a proxy for the true error. This proxy is derived using the predictions of a trained error estimator. Experiments show that our method has high power and false alarm control under various distribution shifts, including covariate and label shifts and natural shifts over geography and time.
We study frequency domain electromagnetic scattering at a bounded, penetrable, and inhomogeneous obstacle $ \Omega \subset \mathbb{R}^3 $. From the Stratton-Chu integral representation, we derive a new representation formula when constant reference coefficients are given for the interior domain. The resulting integral representation contains the usual layer potentials, but also volume potentials on $\Omega$. Then it is possible to follow a single-trace approach to obtain boundary integral equations perturbed by traces of compact volume integral operators with weakly singular kernels. The coupled boundary and volume integral equations are discretized with a Galerkin approach with usual Curl-conforming and Div-conforming finite elements on the boundary and in the volume. Compression techniques and special quadrature rules for singular integrands are required for an efficient and accurate method. Numerical experiments provide evidence that our new formulation enjoys promising properties.
We consider the problem of releasing a sparse histogram under $(\varepsilon, \delta)$-differential privacy. The stability histogram independently adds noise from a Laplace or Gaussian distribution to the non-zero entries and removes those noisy counts below a threshold. Thereby, the introduction of new non-zero values between neighboring histograms is only revealed with probability at most $\delta$, and typically, the value of the threshold dominates the error of the mechanism. We consider the variant of the stability histogram with Gaussian noise. Recent works ([Joseph and Yu, COLT '24] and [Lebeda, SOSA '25]) reduced the error for private histograms using correlated Gaussian noise. However, these techniques can not be directly applied in the very sparse setting. Instead, we adopt Lebeda's technique and show that adding correlated noise to the non-zero counts only allows us to reduce the magnitude of noise when we have a sparsity bound. This, in turn, allows us to use a lower threshold by up to a factor of $1/2$ compared to the non-correlated noise mechanism. We then extend our mechanism to a setting without a known bound on sparsity. Additionally, we show that correlated noise can give a similar improvement for the more practical discrete Gaussian mechanism.
Let $P$ be a set of $n$ points in $\mathbb{R}^d$, and let $\varepsilon,\psi \in (0,1)$ be parameters. Here, we consider the task of constructing a $(1+\varepsilon)$-spanner for $P$, where every edge might fail (independently) with probability $1-\psi$. For example, for $\psi=0.1$, about $90\%$ of the edges of the graph fail. Nevertheless, we show how to construct a spanner that survives such a catastrophe with near linear number of edges. The measure of reliability of the graph constructed is how many pairs of vertices lose $(1+\varepsilon)$-connectivity. Surprisingly, despite the spanner constructed being of near linear size, the number of failed pairs is close to the number of failed pairs if the underlying graph was a clique. Specifically, we show how to construct such an exact dependable spanner in one dimension of size $O(\tfrac{n}{\psi} \log n)$, which is optimal. Next, we build an $(1+\varepsilon)$-spanners for a set $P \subseteq \mathbb{R}^d$ of $n$ points, of size $O( C n \log n )$, where $C \approx 1/\bigl(\varepsilon^{d} \psi^{4/3}\bigr)$. Surprisingly, these new spanners also have the property that almost all pairs of vertices have a $\leq 4$-hop paths between them realizing this short path.
Noisy matrix completion has attracted significant attention due to its applications in recommendation systems, signal processing and image restoration. Most existing works rely on (weighted) least squares methods under various low-rank constraints. However, minimizing the sum of squared residuals is not always efficient, as it may ignore the potential structural information in the residuals.In this study, we propose a novel residual spectral matching criterion that incorporates not only the numerical but also locational information of residuals. This criterion is the first in noisy matrix completion to adopt the perspective of low-rank perturbation of random matrices and exploit the spectral properties of sparse random matrices. We derive optimal statistical properties by analyzing the spectral properties of sparse random matrices and bounding the effects of low-rank perturbations and partial observations. Additionally, we propose algorithms that efficiently approximate solutions by constructing easily computable pseudo-gradients. The iterative process of the proposed algorithms ensures convergence at a rate consistent with the optimal statistical error bound. Our method and algorithms demonstrate improved numerical performance in both simulated and real data examples, particularly in environments with high noise levels.
It is important to detect anomalous inputs when deploying machine learning systems. The use of larger and more complex inputs in deep learning magnifies the difficulty of distinguishing between anomalous and in-distribution examples. At the same time, diverse image and text data are available in enormous quantities. We propose leveraging these data to improve deep anomaly detection by training anomaly detectors against an auxiliary dataset of outliers, an approach we call Outlier Exposure (OE). This enables anomaly detectors to generalize and detect unseen anomalies. In extensive experiments on natural language processing and small- and large-scale vision tasks, we find that Outlier Exposure significantly improves detection performance. We also observe that cutting-edge generative models trained on CIFAR-10 may assign higher likelihoods to SVHN images than to CIFAR-10 images; we use OE to mitigate this issue. We also analyze the flexibility and robustness of Outlier Exposure, and identify characteristics of the auxiliary dataset that improve performance.
We investigate a lattice-structured LSTM model for Chinese NER, which encodes a sequence of input characters as well as all potential words that match a lexicon. Compared with character-based methods, our model explicitly leverages word and word sequence information. Compared with word-based methods, lattice LSTM does not suffer from segmentation errors. Gated recurrent cells allow our model to choose the most relevant characters and words from a sentence for better NER results. Experiments on various datasets show that lattice LSTM outperforms both word-based and character-based LSTM baselines, achieving the best results.