For a set of points in $\mathbb{R}^d$, the Euclidean $k$-means problems consists of finding $k$ centers such that the sum of distances squared from each data point to its closest center is minimized. Coresets are one the main tools developed recently to solve this problem in a big data context. They allow to compress the initial dataset while preserving its structure: running any algorithm on the coreset provides a guarantee almost equivalent to running it on the full data. In this work, we study coresets in a fully-dynamic setting: points are added and deleted with the goal to efficiently maintain a coreset with which a k-means solution can be computed. Based on an algorithm from Henzinger and Kale [ESA'20], we present an efficient and practical implementation of a fully dynamic coreset algorithm, that improves the running time by up to a factor of 20 compared to our non-optimized implementation of the algorithm by Henzinger and Kale, without sacrificing more than 7% on the quality of the k-means solution.
We study various aspects of Dyck words appearing in binary sequences, where $0$ is treated as a left parenthesis and $1$ as a right parenthesis. We show that binary words that are $7/3$-power-free have bounded nesting level, but this no longer holds for larger repetition exponents. We give an explicit characterization of the factors of the Thue-Morse word that are Dyck, and show how to count them. We also prove tight upper and lower bounds on $f(n)$, the number of Dyck factors of Thue-Morse of length $2n$.
We provide an algorithm that maintains, against an adaptive adversary, a $(1-\varepsilon)$-approximate maximum matching in $n$-node $m$-edge general (not necessarily bipartite) undirected graph undergoing edge deletions with high probability with (amortized) $O(\mathrm{poly}(\varepsilon^{-1}, \log n))$ time per update. We also obtain the same update time for maintaining a fractional approximate weighted matching (and hence an approximation to the value of the maximum weight matching) and an integral approximate weighted matching in dense graphs. Our unweighted result improves upon the prior state-of-the-art which includes a $\mathrm{poly}(\log{n}) \cdot 2^{O(1/\varepsilon^2)}$ update time [Assadi-Bernstein-Dudeja 2022] and an $O(\sqrt{m} \varepsilon^{-2})$ update time [Gupta-Peng 2013], and our weighted result improves upon the $O(\sqrt{m}\varepsilon^{-O(1/\varepsilon)}\log{n})$ update time due to [Gupta-Peng 2013]. To obtain our results, we generalize a recent optimization approach to dynamic algorithms from [Jambulapati-Jin-Sidford-Tian 2022]. We show that repeatedly solving entropy-regularized optimization problems yields a lazy updating scheme for fractional decremental problems with a near-optimal number of updates. To apply this framework we develop optimization methods compatible with it and new dynamic rounding algorithms for the matching polytope.
In this paper, we consider the problem of maintaining a $(1-\varepsilon)$-approximate maximum weight matching in a dynamic graph $G$, while the adversary makes changes to the edges of the graph. In the fully dynamic setting, where both edge insertions and deletions are allowed, Gupta and Peng gave an algorithm for this problem with an update time of $\tilde{O}_{\varepsilon}(\sqrt{m})$. We study a natural relaxation of this problem, namely the decremental model, where the adversary is only allowed to delete edges. For the cardinality version of this problem in general (possibly, non-bipartite) graphs, Assadi, Bernstein, and Dudeja gave a decremental algorithm with update time $O_{\varepsilon}(\text{poly}(\log n))$. However, beating $\tilde{O}_{\varepsilon}(\sqrt{m})$ update time remained an open problem for the \emph{weighted} version in \emph{general graphs}. In this paper, we bridge the gap between unweighted and weighted general graphs for the decremental setting. We give a $O_{\varepsilon}(\text{poly}(\log n))$ update time algorithm that maintains a $(1-\varepsilon)$-approximate maximum weight matching under adversarial deletions. Like the decremental algorithm of Assadi, Bernstein, and Dudeja, our algorithm is randomized, but works against an adaptive adversary. It also matches the time bound for the cardinality version upto dependencies on $\varepsilon$ and a $\log R$ factor, where $R$ is the ratio between the maximum and minimum edge weight in $G$.
We study the seeded domino problem, the recurring domino problem and the $k$-SAT problem on finitely generated groups. These problems are generalization of their original versions on $\mathbb{Z}^2$ that were shown to be undecidable using the domino problem. We show that the seeded and recurring domino problems on a group are invariant under changes in the generating set, are many-one reduced from the respective problems on subgroups, and are positive equivalent to the problems on finite index subgroups. This leads to showing that the recurring domino problem is decidable for free groups. Coupled with the invariance properties, we conjecture that the only groups in which the seeded and recurring domino problems are decidable are virtually free groups. In the case of the $k$-SAT problem, we introduce a new generalization that is compatible with decision problems on finitely generated groups. We show that the subgroup membership problem many-one reduces to the $2$-SAT problem, that in certain cases the $k$-SAT problem many one reduces to the domino problem, and finally that the domino problem reduces to $3$-SAT for the class of scalable groups.
We give a stochastic optimization algorithm that solves a dense $n\times n$ real-valued linear system $Ax=b$, returning $\tilde x$ such that $\|A\tilde x-b\|\leq \epsilon\|b\|$ in time: $$\tilde O((n^2+nk^{\omega-1})\log1/\epsilon),$$ where $k$ is the number of singular values of $A$ larger than $O(1)$ times its smallest positive singular value, $\omega < 2.372$ is the matrix multiplication exponent, and $\tilde O$ hides a poly-logarithmic in $n$ factor. When $k=O(n^{1-\theta})$ (namely, $A$ has a flat-tailed spectrum, e.g., due to noisy data or regularization), this improves on both the cost of solving the system directly, as well as on the cost of preconditioning an iterative method such as conjugate gradient. In particular, our algorithm has an $\tilde O(n^2)$ runtime when $k=O(n^{0.729})$. We further adapt this result to sparse positive semidefinite matrices and least squares regression. Our main algorithm can be viewed as a randomized block coordinate descent method, where the key challenge is simultaneously ensuring good convergence and fast per-iteration time. In our analysis, we use theory of majorization for elementary symmetric polynomials to establish a sharp convergence guarantee when coordinate blocks are sampled using a determinantal point process. We then use a Markov chain coupling argument to show that similar convergence can be attained with a cheaper sampling scheme, and accelerate the block coordinate descent update via matrix sketching.
We describe a $\frac{4}{3}$-approximation algorithm for the traveling salesman problem in which the distances between points are induced by graph-theoretical distances in an unweighted graph. The algorithm is based on finding a minimum cost perfect matching on the odd degree vertices of a carefully computed 2-edge-connected spanning subgraph.
We study a truthful two-facility location problem in which a set of agents have private positions on the line of real numbers and known approval preferences over two facilities. Given the locations of the two facilities, the cost of an agent is the total distance from the facilities she approves. The goal is to decide where to place the facilities from a given finite set of candidate locations so as to (a) approximately optimize desired social objectives, and (b) incentivize the agents to truthfully report their private positions. We focus on the class of deterministic strategyproof mechanisms and pinpoint the ones with the best possible approximation ratio in terms of the social cost (i.e., the total cost of the agents) and the max cost. In particular, for the social cost, we show a tight bound of $1+\sqrt{2}$ when the preferences of the agents are homogeneous (i.e., all agents approve both facilities), and a tight bound of $3$ when the preferences might be heterogeneous. For the max cost, we show tight bounds of $2$ and $3$ for homogeneous and heterogeneous preferences, respectively.
Given a set of $n$ points in the Euclidean plane, the $k$-MinSumRadius problem asks to cover this point set using $k$ disks with the objective of minimizing the sum of the radii of the disks. After a long line of research on related problems, it was finally discovered that this problem admits a polynomial time algorithm [GKKPV~'12]; however, the running time of this algorithm is $O(n^{881})$, and its relevance is thereby mostly of theoretical nature. A practically and structurally interesting special case of the $k$-MinSumRadius problem is that of small $k$. For the $2$-MinSumRadius problem, a near-quadratic time algorithm with expected running time $O(n^2 \log^2 n \log^2 \log n)$ was given over 30 years ago [Eppstein~'92]. We present the first improvement of this result, namely, a near-linear time algorithm to compute the $2$-MinSumRadius that runs in expected $O(n \log^2 n \log^2 \log n)$ time. We generalize this result to any constant dimension $d$, for which we give an $O(n^{2-1/(\lceil d/2\rceil + 1) + \varepsilon})$ time algorithm. Additionally, we give a near-quadratic time algorithm for $3$-MinSumRadius in the plane that runs in expected $O(n^2 \log^2 n \log^2 \log n)$ time. All of these algorithms rely on insights that uncover a surprisingly simple structure of optimal solutions: we can specify a linear number of lines out of which one separates one of the clusters from the remaining clusters in an optimal solution.
We give a fully dynamic algorithm maintaining a $(1-\varepsilon)$-approximate directed densest subgraph in $\tilde{O}(\log^3(n)/\varepsilon^6)$ amortized time or $\tilde{O}(\log^4(n)/\varepsilon^7)$ per edge update (where $\tilde{O}$ hides $\log\log$ factors), based on earlier work by Chekuri and Quanrud [arXiv:2210.02611]. This result improves on earlier work done by Sawlani and Wang [arXiv:1907.03037], which guarantees $O(\log^5(n)/\varepsilon^7)$ worst case time for edge insertions and deletions.
Recent work by Dhulipala, Liu, Raskhodnikova, Shi, Shun, and Yu~\cite{DLRSSY22} initiated the study of the $k$-core decomposition problem under differential privacy. They show that approximate $k$-core numbers can be output while guaranteeing differential privacy, while only incurring a multiplicative error of $(2 +\eta)$ (for any constant $\eta >0$) and additive error of $\poly(\log(n))/\eps$. In this paper, we revisit this problem. Our main result is an $\eps$-edge differentially private algorithm for $k$-core decomposition which outputs the core numbers with no multiplicative error and $O(\text{log}(n)/\eps)$ additive error. This improves upon previous work by a factor of 2 in the multiplicative error, while giving near-optimal additive error. With a little additional work, this implies improved algorithms for densest subgraph and low out-degree ordering under differential privacy. For low out-degree ordering, we give an $\eps$-edge differentially private algorithm which outputs an implicit orientation such that the out-degree of each vertex is at most $d+O(\log{n}/{\eps})$, where $d$ is the degeneracy of the graph. This improves upon the best known guarantees for the problem by a factor of $4$ and gives near-optimal additive error. For densest subgraph, we give an $\eps$-edge differentially private algorithm outputting a subset of nodes that induces a subgraph of density at least ${D^*}/{2}-O(\text{log}(n)/\eps)$, where $D^*$ is the density for the optimal subgraph.