We study the $k$-th nearest neighbor distance function from a finite point-set in $\mathbb{R}^d$. We provide a Morse-theoretic framework to analyze the topology of its sublevel sets. In particular, we present a simple combinatorial-geometric characterization of critical points and their indices, along with detailed information about the possible changes in homology at the critical levels. We conclude by computing the expected number of critical points for a homogeneous Poisson process. Our results provide new insights and tools for the analysis of persistent homology in order-$k$ Delaunay mosaics and random $k$-fold coverage.
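For concreteness, the function under study can be written down explicitly. For a finite point-set $P \subset \mathbb{R}^d$ (our notation for this illustration), the $k$-th nearest neighbor distance function is
\[
d_k(x) \;=\; \min_{\substack{Q \subseteq P \\ |Q| = k}} \; \max_{p \in Q} \, \lVert x - p \rVert,
\]
i.e., the distance from $x$ to its $k$-th nearest point of $P$. The sublevel set $\{x : d_k(x) \le r\}$ is precisely the set of points covered by at least $k$ of the balls of radius $r$ centered at the points of $P$, which is the link to random $k$-fold coverage mentioned above.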
Let $S$ be a set of $n$ sites in the plane, where every site $s \in S$ has an associated radius $r_s > 0$. Let $\mathcal{D}(S)$ be the disk intersection graph defined by $S$, i.e., the graph with vertex set $S$ and an edge between two distinct sites $s, t \in S$ if and only if the disks with centers $s$, $t$ and radii $r_s$, $r_t$ intersect. Our goal is to design data structures that maintain the connectivity structure of $\mathcal{D}(S)$ as sites are inserted into or deleted from $S$.
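As a point of reference (the static problem, not the dynamic data structures the abstract is about): two disks intersect exactly when the distance between their centers is at most the sum of their radii, and the connectivity of $\mathcal{D}(S)$ can then be read off a union-find structure. A minimal Python sketch of this brute-force baseline:
\begin{verbatim}
import math
from itertools import combinations

def disks_intersect(p, q, rp, rq):
    # Disks intersect iff the center distance is at most rp + rq.
    return math.dist(p, q) <= rp + rq

class UnionFind:
    def __init__(self, n):
        self.parent = list(range(n))
    def find(self, x):
        while self.parent[x] != x:
            self.parent[x] = self.parent[self.parent[x]]
            x = self.parent[x]
        return x
    def union(self, x, y):
        self.parent[self.find(x)] = self.find(y)

def connected_components(sites, radii):
    # O(n^2) rebuild from scratch; the point of a dynamic structure
    # is to avoid exactly this recomputation on every update.
    uf = UnionFind(len(sites))
    for i, j in combinations(range(len(sites)), 2):
        if disks_intersect(sites[i], sites[j], radii[i], radii[j]):
            uf.union(i, j)
    return {uf.find(i) for i in range(len(sites))}
\end{verbatim}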
Recently, \citeauthor*{akbari2021locality}~(ICALP 2023) studied the locality of graph problems in distributed, sequential, dynamic, and online settings from a unified point of view. They designed a novel deterministic algorithm with locality $O(\log n)$ for properly 3-coloring bipartite graphs in the $\mathsf{Online}$-$\mathsf{LOCAL}$ model. In this work, we establish the optimality of that algorithm by showing a \textit{tight} deterministic $\Omega(\log n)$ locality lower bound, which holds even on grids. We complement this result as follows:
\begin{enumerate}
\item We show a higher, and tight, $\Omega(\sqrt{n})$ lower bound for 3-coloring toroidal and cylindrical grids.
\item Considering the generalization of $3$-coloring bipartite graphs to $(k+1)$-coloring $k$-partite graphs, we show that the problem also has $O(\log n)$ locality when the input is a $k$-partite graph that admits a \emph{locally inferable unique coloring}. This special class of $k$-partite graphs covers several fundamental graph classes, such as $k$-trees and triangular grids. Moreover, for this special class of graphs, we show a tight $\Omega(\log n)$ locality lower bound.
\item For general $k$-partite graphs with $k \geq 3$, we prove that the problem of $(2k-2)$-coloring $k$-partite graphs exhibits a locality of $\Omega(n)$ in the $\mathsf{Online}$-$\mathsf{LOCAL}$ model, matching the round complexity of the same problem in the $\mathsf{LOCAL}$ model recently shown by \citeauthor*{coiteux2023no}~(STOC 2024). Consequently, the problem of $(k+1)$-coloring $k$-partite graphs admits a locality lower bound of $\Omega(n)$ when $k \geq 3$, contrasting sharply with the $\Theta(\log n)$ locality of the case $k=2$.
\end{enumerate}
Posterior sampling, i.e., using the exponential mechanism to sample from the posterior distribution, provides $\varepsilon$-pure differential privacy (DP) guarantees and does not suffer from the potentially unbounded privacy breach introduced by $(\varepsilon,\delta)$-approximate DP. In practice, however, one needs to apply approximate sampling methods such as Markov chain Monte Carlo (MCMC), thus re-introducing the unappealing $\delta$-approximation error into the privacy guarantees. To bridge this gap, we propose the Approximate SAmple Perturbation (abbr. ASAP) algorithm, which perturbs an MCMC sample with noise proportional to its Wasserstein-infinity ($W_\infty$) distance from a reference distribution that satisfies pure DP or pure Gaussian DP (i.e., $\delta=0$). We then leverage a Metropolis-Hastings algorithm to generate the sample and prove that the algorithm converges in $W_\infty$ distance. We show that by combining our new techniques with a localization step, we obtain the first nearly linear-time algorithm that achieves the optimal rates in the DP-ERM problem with strongly convex and smooth losses.
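The Metropolis-Hastings component referenced above is the standard building block; the abstract's contribution lies in the $W_\infty$ convergence analysis and the calibrated perturbation, which we do not attempt to reproduce here. A generic random-walk Metropolis-Hastings sketch in Python, with an assumed log-density log_p:
\begin{verbatim}
import numpy as np

def metropolis_hastings(log_p, x0, n_steps, step_size=0.1, rng=None):
    # Generic random-walk Metropolis-Hastings targeting exp(log_p).
    rng = np.random.default_rng() if rng is None else rng
    x = np.asarray(x0, dtype=float)
    lp = log_p(x)
    for _ in range(n_steps):
        proposal = x + step_size * rng.standard_normal(x.shape)
        lp_prop = log_p(proposal)
        # Accept with probability min(1, p(proposal)/p(x)).
        if np.log(rng.random()) < lp_prop - lp:
            x, lp = proposal, lp_prop
    return x
\end{verbatim}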
We study the complexity of a fundamental algorithm for fairly allocating indivisible items, the round-robin algorithm. For $n$ agents and $m$ items, we show that the algorithm can be implemented in time $O(nm\log(m/n))$ in the worst case. If the agents' preferences are uniformly random, we establish an improved (expected) running time of $O(nm + m\log m)$. On the other hand, assuming comparison queries between items, we prove that $\Omega(nm + m\log m)$ queries are necessary to implement the algorithm, even when randomization is allowed. We also derive bounds in noise models where the answers to queries are incorrect with some probability. Our proofs involve novel applications of tools from multi-armed bandits and information theory, as well as posets and linear extensions.
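The algorithm under study is simple to state: agents take turns in a fixed cyclic order, and on each turn an agent takes a most-preferred item among those remaining; the results above concern how cheaply these selections can be implemented with comparisons. A naive Python sketch with an assumed value oracle (it makes no attempt to achieve the stated running times):
\begin{verbatim}
def round_robin(n_agents, items, value):
    # value(agent, item) -> float; agents pick in cyclic order 0..n-1.
    remaining = set(items)
    bundles = [[] for _ in range(n_agents)]
    turn = 0
    while remaining:
        agent = turn % n_agents
        # Naive scan for the favorite remaining item: O(m) per turn.
        best = max(remaining, key=lambda item: value(agent, item))
        bundles[agent].append(best)
        remaining.remove(best)
        turn += 1
    return bundles
\end{verbatim}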
In this paper, the key objects of interest are the sequential covariance matrices $\mathbf{S}_{n,t}$ and their largest eigenvalues. Here, the matrix $\mathbf{S}_{n,t}$ is computed as the empirical covariance associated with the observations $\{\mathbf{x}_1,\ldots,\mathbf{x}_{\lfloor nt \rfloor}\}$, for $t\in [0,1]$. The observations $\mathbf{x}_1,\ldots,\mathbf{x}_n$ are assumed to be i.i.d. $p$-dimensional vectors with zero mean, and a covariance matrix that is a fixed-rank perturbation of the identity matrix. Treating $\{ \mathbf{S}_{n,t}\}_{t \in [0,1]}$ as a matrix-valued stochastic process indexed by $t$, we study the behavior of the largest eigenvalues of $\mathbf{S}_{n,t}$, as $t$ varies, with $n$ and $p$ increasing simultaneously so that $p/n \to y \in (0,1)$. As a key contribution of this work, we establish the weak convergence of the stochastic process corresponding to the sample spiked eigenvalues, provided their population counterparts exceed the critical phase-transition threshold. Our analysis of the limiting process is fully comprehensive, revealing, in general, non-Gaussian limiting processes. As an application, we consider a class of change-point problems where the interest is in detecting structural breaks in the covariance caused by a change in magnitude of the spiked eigenvalues. For this purpose, we propose two different maximal statistics corresponding to the centered spiked eigenvalues of the sequential covariances. We show the existence of limiting null distributions for these statistics, and we prove consistency of the tests under fixed alternatives. Moreover, we compare the behavior of the proposed tests through a simulation study.
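Concretely, the tracked object is straightforward to compute for finite data. A numpy sketch, where (as in the abstract) the observations are taken to have zero mean so no centering is applied, and the normalization of $\mathbf{S}_{n,t}$ by $\lfloor nt \rfloor$ is our own choice for illustration:
\begin{verbatim}
import numpy as np

def top_eigenvalue_path(X, grid):
    # X: (n, p) array of centered observations; grid: values of t in (0, 1].
    # S_{n,t} is taken here as the empirical covariance of the first
    # floor(n t) observations (this normalization is an assumption).
    n, p = X.shape
    path = []
    for t in grid:
        m = max(int(np.floor(n * t)), 1)
        S = X[:m].T @ X[:m] / m
        path.append(np.linalg.eigvalsh(S)[-1])  # largest eigenvalue
    return np.array(path)
\end{verbatim}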
We study a fundamental problem in computational geometry, the planar two-center problem. In this problem, the input is a set $S$ of $n$ points in the plane, and the goal is to find two smallest congruent disks whose union contains all points of $S$. A longstanding open problem has been to obtain an $O(n\log n)$-time algorithm for planar two-center, matching the $\Omega(n\log n)$ lower bound given by Eppstein [SODA'97]. Researchers have devoted considerable effort to this question over the decades; the previous best algorithm, given by Wang [SoCG'20], solves the problem in $O(n\log^2 n)$ time. In this paper, we present an $O(n\log n)$-time (deterministic) algorithm for planar two-center, which completely resolves this open problem.
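For contrast with the $O(n\log n)$ goal, the naive baseline is worth stating: restrict both centers to input points and try all pairs. This runs in $O(n^3)$ time and is only a 2-approximation of the optimal radius; it is our illustration, not the paper's algorithm.
\begin{verbatim}
import math
from itertools import combinations

def two_center_baseline(points):
    # Try every pair of input points as the two centers; the radius
    # needed is the max over points of the distance to the nearer
    # center. A 2-approximation, not an exact two-center algorithm.
    best_r, best_pair = math.inf, None
    for c1, c2 in combinations(points, 2):
        r = max(min(math.dist(p, c1), math.dist(p, c2)) for p in points)
        if r < best_r:
            best_r, best_pair = r, (c1, c2)
    return best_r, best_pair
\end{verbatim}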
We give a new algorithm for learning mixtures of $k$ Gaussians (with identity covariance in $\mathbb{R}^n$) to TV error $\varepsilon$, with quasi-polynomial ($O(n^{\text{poly log}\left(\frac{n+k}{\varepsilon}\right)})$) time and sample complexity, under a minimum weight assumption. Unlike previous approaches, most of which are algebraic in nature, our approach is analytic and relies on the framework of diffusion models. Diffusion models are a modern paradigm for generative modeling, which typically rely on learning the score function (gradient log-pdf) along a process transforming a pure noise distribution, in our case a Gaussian, to the data distribution. Despite their dazzling performance in tasks such as image generation, there are few end-to-end theoretical guarantees that they can efficiently learn nontrivial families of distributions; we give some of the first such guarantees. We proceed by deriving higher-order Gaussian noise sensitivity bounds for the score functions of a Gaussian mixture to show that they can be inductively learned using piecewise polynomial regression (up to poly-logarithmic degree), and combine this with known convergence results for diffusion models. Our results extend to continuous mixtures of Gaussians where the mixing distribution is supported on a union of $k$ balls of constant radius. In particular, this applies to the case of Gaussian convolutions of distributions on low-dimensional manifolds, or more generally sets with small covering number.
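For identity-covariance mixtures, the score function being learned has a closed form, which makes the ``gradient log-pdf'' description concrete (a standard fact, in our notation): for $p(x) = \sum_{i=1}^{k} w_i \, \mathcal{N}(x; \mu_i, I_n)$,
\[
\nabla \log p(x) \;=\; \sum_{i=1}^{k} \frac{w_i \, \mathcal{N}(x; \mu_i, I_n)}{\sum_{j=1}^{k} w_j \, \mathcal{N}(x; \mu_j, I_n)} \,(\mu_i - x),
\]
a softmax-weighted average of the directions toward the component means, where the weights are the posterior probabilities of the components given $x$. Along the noising process the means and weights evolve, but the score keeps this form, which is what makes low-degree piecewise polynomial approximation plausible.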
Exponential histograms, with bins of the form $\left\{ \left(\rho^{k-1},\rho^{k}\right]\right\}_{k\in\mathbb{Z}}$, for $\rho>1$, straightforwardly summarize the quantiles of streaming data sets (Masson et al. 2019). While they guarantee the relative accuracy of their estimates, they appear to use only about $\log n$ values to summarize $n$ inputs. We study four aspects of exponential histograms -- size, accuracy, occupancy, and largest gap size -- when the inputs are i.i.d. $\mathrm{Exp}\left(\lambda\right)$ or i.i.d. $\mathrm{Pareto}\left(\nu,\beta\right)$, taking $\mathrm{Exp}\left(\lambda\right)$ (resp. $\mathrm{Pareto}\left(\nu,\beta\right)$) to represent light-tailed (resp. heavy-tailed) distributions. We show that, in these settings, the size grows like $\log n$ and follows a Gumbel distribution as $n$ grows large. We bound the missing mass to the right of the histogram and the mass of its final bin, and we show that occupancy grows apace with size. Finally, we approximate the size of the largest run of consecutive empty bins. Our study gives a deeper and broader view of this low-memory approach to quantile estimation.
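The data structure itself is a few lines: a positive input $x$ lands in the bin indexed $k = \lceil \log_\rho x \rceil$, and a quantile is estimated by walking the bin counts in index order. A minimal Python sketch (our simplification of the structure in Masson et al. 2019; positive inputs only):
\begin{verbatim}
import math
from collections import Counter

class ExpHistogram:
    def __init__(self, rho):
        assert rho > 1
        self.rho = rho
        self.bins = Counter()  # bin k covers (rho^(k-1), rho^k]
        self.n = 0

    def insert(self, x):       # requires x > 0
        k = math.ceil(math.log(x, self.rho))
        self.bins[k] += 1
        self.n += 1

    def quantile(self, q):
        # Walk bins in index order until rank q*n is reached; the
        # bin's upper edge is within a factor rho of the true value.
        rank, seen = q * self.n, 0
        for k in sorted(self.bins):
            seen += self.bins[k]
            if seen >= rank:
                return self.rho ** k
        return None
\end{verbatim}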
We consider the task of locally correcting, and locally list-correcting, multivariate linear functions over the domain $\{0,1\}^n$ over arbitrary fields and, more generally, Abelian groups. Such functions form error-correcting codes of relative distance $1/2$, and we give local-correction algorithms correcting up to nearly a $1/4$-fraction of errors with $\widetilde{\mathcal{O}}(\log n)$ queries. This query complexity is optimal up to $\mathrm{poly}(\log\log n)$ factors. We also give local list-correcting algorithms correcting a $(1/2 - \varepsilon)$-fraction of errors with $\widetilde{\mathcal{O}}_{\varepsilon}(\log n)$ queries. These results may be viewed as natural generalizations of the classical work of Goldreich and Levin, which addresses the special case where the underlying group is $\mathbb{Z}_2$. By extending to the case where the underlying group is, say, the reals, we give the first non-trivial locally correctable codes (LCCs) over the reals (with query complexity sublinear in the dimension, also known as the message length). The central challenge in constructing the local corrector is constructing ``nearly balanced vectors'' over $\{-1,1\}^n$ that span $1^n$ -- we show how to construct $\mathcal{O}(\log n)$ vectors that do so, with the entries of each vector summing to $\pm 1$. The challenge in the local-list-correction algorithms, given the local corrector, is principally combinatorial, i.e., in proving that the number of linear functions within any Hamming ball of radius $(1/2-\varepsilon)$ is $\mathcal{O}_{\varepsilon}(1)$. Obtaining this general result for every Abelian group requires integrating a variety of known methods with some new combinatorial ingredients analyzing the structural properties of codewords that lie within small Hamming balls.
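For intuition, the $\mathbb{Z}_2$ special case has a classical 2-query self-corrector: for linear $f$, $f(x) = f(x \oplus r) \oplus f(r)$ for every shift $r$, so querying random shifts and taking a majority vote corrects any point whenever the oracle errs on less than a $1/4$-fraction of inputs. A Python sketch of that classical corrector (not the new algorithms over general groups):
\begin{verbatim}
import random
from collections import Counter

def self_correct_z2(oracle, x, trials=25):
    # oracle: possibly corrupted table of a linear f: {0,1}^n -> {0,1}.
    # Uses f(x) = f(x XOR r) XOR f(r); each vote is wrong with
    # probability < 1/2 when the corruption rate is below 1/4.
    n = len(x)
    votes = Counter()
    for _ in range(trials):
        r = [random.randint(0, 1) for _ in range(n)]
        shifted = [xi ^ ri for xi, ri in zip(x, r)]
        votes[oracle(shifted) ^ oracle(r)] += 1
    return votes.most_common(1)[0][0]
\end{verbatim}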
We propose an exact algorithm for the Graph Burning Problem ($\texttt{GBP}$), an NP-hard optimization problem that models the spread of influence in social networks. Given a graph $G$ with vertex set $V$, the objective is to find a sequence of $k$ vertices in $V$, namely $v_1, v_2, \dots, v_k$, such that $k$ is minimum and $\bigcup_{i = 1}^{k} \{u\! \in\! V\! : d(u, v_i) \leq k - i\} = V$, where $d(u,v)$ denotes the distance between $u$ and $v$. We formulate the problem as a set-covering integer programming model and design a row generation algorithm for the $\texttt{GBP}$. Our method exploits the fact that a very small number of covering constraints often suffices to solve the integer model, allowing the corresponding rows to be generated on demand. To date, the most efficient exact algorithm for the $\texttt{GBP}$, denoted here by $\texttt{GDCA}$, is able to obtain optimal solutions for graphs with up to 14,000 vertices within two hours of execution. In comparison, our algorithm finds provably optimal solutions approximately 236 times faster, on average, than $\texttt{GDCA}$. For larger graphs, memory space becomes a limiting factor for $\texttt{GDCA}$. Our algorithm, however, solves real-world instances with almost 200,000 vertices in less than 35 seconds, increasing the size of graphs for which optimal solutions are known by a factor of 14.
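To make the formulation concrete, one natural set-covering model for the decision version (is there a burning sequence of length $k$?) uses binary variables $x_{v,i}$ indicating that $v$ is the $i$-th fire source; this is our paraphrase of a standard model, and the paper's exact formulation and row-generation details may differ:
\[
\sum_{v \in V} x_{v,i} = 1 \quad (1 \leq i \leq k), \qquad
\sum_{i=1}^{k} \; \sum_{v \,:\, d(u,v) \leq k-i} x_{v,i} \;\geq\; 1 \quad (u \in V), \qquad
x_{v,i} \in \{0,1\}.
\]
The second family of constraints, one covering row per vertex $u$, is typically huge; the point of row generation is that only a few of these rows need to be added on demand before the model is solved to optimality.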