We revisit the task of computing the edit distance in sublinear time. In the $(k,K)$-gap edit distance problem the task is to distinguish whether the edit distance of two strings is at most $k$ or at least $K$. It has been established by Goldenberg, Krauthgamer and Saha (FOCS '19), with improvements by Kociumaka and Saha (FOCS '20), that the $(k,k^2)$-gap problem can be solved in time $\widetilde O(n/k+\operatorname{poly}(k))$. One of the most natural questions in this line of research is whether the $(k,k^2)$-gap is best possible for the running time $\widetilde O(n/k+\operatorname{poly}(k))$. In this work we answer this question by significantly improving the gap. Specifically, we show that in time $O(n/k+\operatorname{poly}(k))$ we can even solve the $(k,k^{1+o(1)})$-gap problem. This is the first algorithm that breaks the $(k,k^2)$-gap in this running time. Our algorithm is almost optimal in the following sense: In the low distance regime ($k\le n^{0.19}$) our running time becomes $O(n/k)$, which matches a known $n/k^{1+o(1)}$ lower bound for the $(k,k^{1+o(1)})$-gap problem up to lower order factors. Our result also reveals a surprising similarity between Hamming distance and edit distance in the low distance regime: for both, the $(k,k^{1+o(1)})$-gap problem has time complexity $n/k^{1\pm o(1)}$ for small $k$. In contrast to previous work, which employed a subsampled variant of the Landau-Vishkin algorithm, we instead build upon the algorithm of Andoni, Krauthgamer and Onak (FOCS '10). We first simplify their approach and then show how to effectively prune their computation tree in order to obtain a sublinear-time algorithm in the given time bound. Towards that, we use a variety of structural insights on the (local and global) patterns that can emerge during this process and design appropriate property testers to effectively detect these patterns.
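As a point of reference for the gap problem above, the following is a minimal sketch of a $(k,K)$-gap decider built on the classical $O(nk)$ banded dynamic program; it runs in linear, not sublinear, time and is not the algorithm of the paper. Function names are illustrative.

```python
def bounded_edit_distance(a: str, b: str, k: int) -> int:
    """Classical O(n*k) banded DP (Ukkonen-style): returns ed(a, b) if it is <= k, and k + 1 otherwise."""
    n, m = len(a), len(b)
    if abs(n - m) > k:
        return k + 1
    INF = k + 1
    prev = [j if j <= k else INF for j in range(m + 1)]
    for i in range(1, n + 1):
        cur = [INF] * (m + 1)
        if i <= k:
            cur[0] = i
        for j in range(max(1, i - k), min(m, i + k) + 1):
            cost = 0 if a[i - 1] == b[j - 1] else 1
            cur[j] = min(prev[j - 1] + cost,            # match / substitution
                         prev[j] + 1, cur[j - 1] + 1)   # deletion / insertion
        prev = cur
    return min(prev[m], INF)

def gap_edit_distance(a: str, b: str, k: int, K: int) -> str:
    """(k, K)-gap decision: outputs 'close' if ed(a, b) <= k and 'far' if ed(a, b) >= K;
    for distances strictly between k and K either answer is acceptable."""
    return "close" if bounded_edit_distance(a, b, k) <= k else "far"
```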
Let a polytope $P$ be defined by a system $A x \leq b$. We consider the problem of counting the number of integer points inside $P$, assuming that $P$ is $\Delta$-modular; the polytope $P$ is called $\Delta$-modular if all sub-determinants of $A$ of order equal to the rank of $A$ are bounded by $\Delta$ in absolute value. We present a new FPT-algorithm, parameterized by $\Delta$ and by the maximal number of vertices of $P$, where the maximum is taken over all right-hand-side vectors $b$. We show that our algorithm is more efficient for $\Delta$-modular problems than the approach of A. Barvinok et al. Unlike that approach, we do not directly compute the short rational generating function for $P \cap \mathbb{Z}^n$, which is commonly used for the considered problem. Instead, we use the dynamic programming principle to compute a particular representation of it in the form of an exponential series that depends on a single variable. We do not rely on Barvinok's unimodular sign decomposition technique at all. Using our new complexity bound, we consider different special cases that may be of independent interest. For example, we give FPT-algorithms for counting the number of integer points in $\Delta$-modular simplices and similar polytopes that have $n + O(1)$ facets. As a special case, for any fixed $m$, we give an FPT-algorithm to count solutions of the unbounded $m$-dimensional $\Delta$-modular subset-sum problem.
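To make the counting problem concrete, here is a brute-force reference counter for tiny instances; it is exponential in the dimension and unrelated to the FPT-algorithm described above. The data format and the example simplex are illustrative.

```python
import itertools
import numpy as np

def count_integer_points(A, b, box):
    """Brute-force reference count of |{x in Z^n : A x <= b}| inside the given
    coordinate box [(lo_1, hi_1), ..., (lo_n, hi_n)].  Exponential in n; only a
    sanity check for tiny instances, not the FPT-algorithm described above."""
    A, b = np.asarray(A), np.asarray(b)
    ranges = [range(lo, hi + 1) for lo, hi in box]
    return sum(1 for x in itertools.product(*ranges)
               if np.all(A @ np.array(x) <= b))

# Example: the 2-dimensional simplex {x >= 0, x_1 + x_2 <= 3} has 10 integer points.
A = [[-1, 0], [0, -1], [1, 1]]
b = [0, 0, 3]
print(count_integer_points(A, b, [(0, 3), (0, 3)]))  # -> 10
```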
Given access to a hypergraph through a subset query oracle in the query model, we give sublinear-time algorithms for Hitting-Set with almost tight parameterized query complexity. In parameterized query complexity, we bound the number of queries to the oracle in terms of the parameter $k$, the size of the Hitting-Set. The subset query oracle we use in this paper is called the Generalized $d$-partite Independent Set query oracle (GPIS); it was introduced by Bishnu et al. (ISAAC'18). GPIS is a generalization to hypergraphs of the Bipartite Independent Set query oracle (BIS) introduced by Beame et al. (ITCS'18 and TALG'20) for estimating the number of edges in graphs. Formally, GPIS is defined as follows: the GPIS oracle for a $d$-uniform hypergraph $\mathcal{H}$ takes as input $d$ pairwise disjoint non-empty subsets $A_1, \ldots, A_d$ of vertices of $\mathcal{H}$ and answers whether there is a hyperedge in $\mathcal{H}$ that intersects each set $A_i$, where $i \in \{1, \, 2, \, \ldots, d\}$. For $d=2$, the GPIS oracle is nothing but the BIS oracle. We show that $d$-Hitting-Set, the hitting set problem for $d$-uniform hypergraphs, can be solved using $\widetilde{\mathcal{O}}_d(k^{d} \log n)$ GPIS queries. Additionally, we show that $d$-Decision-Hitting-Set, the decision version of $d$-Hitting-Set, can be solved with $\widetilde{\mathcal{O}}_d\left( \min \left\{ k^d\log n, k^{2d^2} \right\} \right)$ GPIS queries. We complement these parameterized upper bounds with an almost matching parameterized lower bound: any algorithm that solves $d$-Decision-Hitting-Set requires $\Omega \left( \binom{k+d}{d} \right)$ GPIS queries.
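The GPIS oracle defined above is easy to simulate when the hypergraph is known explicitly; the following sketch does exactly that (the data format is an assumption), which can be handy for testing query algorithms.

```python
def gpis(hyperedges, parts):
    """GPIS oracle for a d-uniform hypergraph, as defined above: given d pairwise
    disjoint non-empty vertex sets A_1, ..., A_d, answer whether some hyperedge
    intersects every A_i.  `hyperedges` is an iterable of vertex sets and
    `parts` is the list [A_1, ..., A_d]."""
    parts = [set(A) for A in parts]
    return any(all(e & A for A in parts) for e in map(set, hyperedges))

# Example: 3-uniform hypergraph with hyperedges {1,2,3} and {2,4,5}.
H = [{1, 2, 3}, {2, 4, 5}]
print(gpis(H, [{1}, {2}, {3}]))    # True  ({1,2,3} meets all three parts)
print(gpis(H, [{1}, {4}, {6}]))    # False
```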
We investigate random matrices whose entries are obtained by applying a nonlinear kernel function to pairwise inner products between $n$ independent data vectors, drawn uniformly from the unit sphere in $\mathbb{R}^d$. This study is motivated by applications in machine learning and statistics, where these kernel random matrices and their spectral properties play significant roles. We establish the weak limit of the empirical spectral distribution of these matrices in a polynomial scaling regime, where $d, n \to \infty$ such that $n / d^\ell \to \kappa$, for some fixed $\ell \in \mathbb{N}$ and $\kappa \in (0, \infty)$. Our findings generalize an earlier result by Cheng and Singer, who examined the same model in the linear scaling regime (with $\ell = 1$). Our work reveals an equivalence principle: the spectrum of the random kernel matrix is asymptotically equivalent to that of a simpler matrix model, constructed as a linear combination of a (shifted) Wishart matrix and an independent matrix sampled from the Gaussian orthogonal ensemble. The aspect ratio of the Wishart matrix and the coefficients of the linear combination are determined by $\ell$ and the expansion of the kernel function in the orthogonal Hermite polynomial basis. Consequently, the limiting spectrum of the random kernel matrix can be characterized as the free additive convolution between a Marchenko-Pastur law and a semicircle law. We also extend our results to cases with data vectors sampled from isotropic Gaussian distributions instead of spherical distributions.
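A minimal numerical sketch of the model above, for readers who want to look at the empirical spectrum; the precise scaling of the inner products, the overall matrix normalization, and the treatment of the diagonal are assumptions here and may differ from the paper's conventions.

```python
import numpy as np

# Illustrative instance of the kernel random matrix model described above.
rng = np.random.default_rng(0)
d, n = 40, 1600                                   # n ~ d^2, i.e. the ell = 2 regime
X = rng.standard_normal((n, d))
X /= np.linalg.norm(X, axis=1, keepdims=True)     # rows drawn uniformly from the unit sphere

f = np.tanh                                       # a nonlinear kernel function
K = f(np.sqrt(d) * (X @ X.T))                     # assumed normalization of the inner products
np.fill_diagonal(K, 0.0)                          # keep off-diagonal entries only (assumption)

eigs = np.linalg.eigvalsh(K / np.sqrt(n))         # empirical spectrum, to be compared with the
                                                  # free convolution of Marchenko-Pastur and semicircle
```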
Beame et al. [ITCS 2018 & TALG 2021] introduced the Bipartite Independent Set (BIS) and Independent Set (IS) oracles for accessing an unknown, simple, unweighted and undirected graph, and used them to solve the edge estimation problem. The introduction of these oracles set off a series of works in a short span of time that either resolved open questions posed by Beame et al. or generalized their work, as in Dell and Lapinskas [STOC 2018], Dell, Lapinskas and Meeks [SODA 2020], Bhattacharya et al. [ISAAC 2019 & Theory Comput. Syst. 2021], and Chen et al. [SODA 2020]. Edge estimation can be done with polylogarithmically many BIS queries, whereas with IS queries it requires sub-linearly many, but more than polylogarithmically many, queries. Chen et al. improved the upper bound of Beame et al. for edge estimation using IS queries and also showed an almost matching lower bound. Among the open questions posed by Beame et al. in their introductory work was whether structures of higher order than edges, such as triangles and cliques, can be estimated using BIS queries. In this work, we completely resolve the query complexity of estimating triangles using the BIS oracle. While doing so, we prove a lower bound for an even stronger query oracle called the Edge Emptiness (EE) oracle, recently introduced by Assadi, Chakrabarty and Khanna [ESA 2021] to test graph connectivity.
Uniform cost-distance Steiner trees minimize the sum of the total length and the weighted path lengths from a dedicated root to the other terminals. They are applied when the tree is intended for signal transmission, e.g., in chip design or telecommunication networks. They are a special case of general cost-distance Steiner trees, where different distance functions are used for the total length and the path lengths. We improve the best published approximation factor for the uniform cost-distance Steiner tree problem from 2.39 to 2.05. If the minimum-length Steiner tree problem can be approximated arbitrarily well, our algorithm achieves an approximation factor arbitrarily close to $ 1 + \frac{1}{\sqrt{2}} $. This bound is tight in the following sense: we also prove that the gap between optimum solutions and the lower bound used by our algorithm and by all previous approximation algorithms for this problem is $ 1 + \frac{1}{\sqrt{2}} $. Similarly to previous approaches, we start with an approximate minimum-length Steiner tree and split it into subtrees that are later re-connected. To improve the approximation factor, we split the tree into subtrees more carefully, taking the cost structure into account, and we significantly enhance the analysis.
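The uniform cost-distance objective itself is simple to state in code; the following sketch evaluates it for a given tree. The input format (an edge list with lengths, a root, and a map from terminals to weights) is an assumption made for illustration.

```python
from heapq import heappush, heappop

def uniform_cost_distance(tree_edges, root, terminal_weight):
    """Objective of a uniform cost-distance Steiner tree, as described above:
    total edge length plus the weighted lengths of the root-terminal paths.
    `tree_edges` is a list of (u, v, length); `terminal_weight` maps each
    terminal to its weight."""
    adj, total_length = {}, 0.0
    for u, v, l in tree_edges:
        adj.setdefault(u, []).append((v, l))
        adj.setdefault(v, []).append((u, l))
        total_length += l
    # Distances from the root inside the tree (Dijkstra; a simple DFS would also do on a tree).
    dist, heap = {root: 0.0}, [(0.0, root)]
    while heap:
        d, u = heappop(heap)
        if d > dist.get(u, float("inf")):
            continue
        for v, l in adj.get(u, []):
            if d + l < dist.get(v, float("inf")):
                dist[v] = d + l
                heappush(heap, (d + l, v))
    return total_length + sum(w * dist[t] for t, w in terminal_weight.items())
```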
We consider the multiwinner election problem, where the goal is to choose a committee of $k$ candidates given the voters' utility functions. We allow arbitrary additional constraints on the chosen committee, and we allow the voters' utilities to belong to a very general class of set functions called $\beta$-self-bounding. When $\beta=1$, this class includes XOS (and hence submodular and additive) utilities. We define a novel generalization of core stability called the restrained core to handle constraints, and we consider multiplicative approximations on the utility under this notion. Our main result is the following: if a smooth version of Nash Welfare is globally optimized over committees within the constraints, the resulting committee lies in the $e^{\beta}$-approximate restrained core for $\beta$-self-bounding utilities and arbitrary constraints. As a result, we obtain the first constant approximation for stability with arbitrary additional constraints even for additive utilities (a factor of $e$), and the first analysis of the stability of Nash Welfare with XOS functions even with no constraints. We complement this positive result by showing that the $c$-approximate restrained core can be empty for $c<16/15$ even for approval utilities and one additional constraint. Furthermore, the exponential dependence on $\beta$ in the approximation is unavoidable for $\beta$-self-bounding functions even with no constraints. We next present improved and tight approximation results for simpler classes of utility functions and simpler types of constraints. We also present an extension of the restrained core to extended justified representation with constraints and show an existence result for matroid constraints. We finally generalize our results to the setting with arbitrary-size candidates and no additional constraints. Our techniques are different from previous analyses and are of independent interest.
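As a minimal illustration of the objective being optimized, the following sketch brute-forces a smoothed Nash Welfare maximizer over feasible size-$k$ committees, assuming additive utilities; the exact smoothing used in the paper may differ, and all names are illustrative.

```python
from itertools import combinations
from math import log

def smooth_nash_welfare(committee, utilities, smoothing=1.0):
    """A smoothed Nash Welfare objective: sum_i log(smoothing + u_i(committee)).
    Additive utilities (one dict per voter, candidate -> value) are assumed
    purely for illustration; the paper's smoothing may differ."""
    return sum(log(smoothing + sum(u[c] for c in committee)) for u in utilities)

def best_committee(candidates, utilities, k, feasible=lambda S: True):
    """Brute-force maximization of the smoothed Nash Welfare over all feasible
    size-k committees (exponential; only meant to make the objective concrete)."""
    return max((S for S in combinations(candidates, k) if feasible(frozenset(S))),
               key=lambda S: smooth_nash_welfare(S, utilities))
```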
In this paper we discuss potentially practical ways to produce expander graphs with good spectral properties and a compact description. We focus on several classes of uniform and bipartite expander graphs defined as random Schreier graphs of the general linear group over the finite field of size two. We perform numerical experiments and show that such constructions produce spectral expanders that can be useful for practical applications. To find a theoretical explanation of the observed experimental results, we use the method of moments to prove upper bounds for the expected second largest eigenvalue of the random Schreier graphs used in our constructions. We focus on bounds whose asymptotic behaviour is difficult to study but which yield non-trivial conclusions for relatively small graphs with parameters matching our numerical experiments (e.g., with fewer than $2^{200}$ vertices and degree at least logarithmic in the number of vertices).
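For readers who want to reproduce a small-scale version of such experiments, the following sketch builds a random Schreier graph of $GL(n, 2)$ acting on the nonzero vectors of $\mathbb{F}_2^n$ (an assumed choice of action; the paper's constructions may use a different one) with $s$ random generators used symmetrically, and reports the second largest eigenvalue of the normalized adjacency matrix.

```python
import numpy as np

def random_invertible_gf2(n, rng):
    """Sample an invertible n x n matrix over GF(2) by rejection."""
    while True:
        M = rng.integers(0, 2, size=(n, n))
        R, r = M.copy(), 0
        for c in range(n):                       # Gaussian elimination over GF(2)
            piv = next((i for i in range(r, n) if R[i, c]), None)
            if piv is None:
                continue
            R[[r, piv]] = R[[piv, r]]
            for i in range(n):
                if i != r and R[i, c]:
                    R[i] ^= R[r]
            r += 1
        if r == n:
            return M

def schreier_second_eigenvalue(n, s, seed=0):
    """Random Schreier graph of GL(n, 2) acting on the nonzero vectors of F_2^n,
    with s random generators used symmetrically (degree 2s).  Returns the second
    largest eigenvalue of the normalized adjacency matrix."""
    rng = np.random.default_rng(seed)
    vecs = [np.array([(x >> i) & 1 for i in range(n)]) for x in range(1, 2 ** n)]
    index = {tuple(v): i for i, v in enumerate(vecs)}
    N = len(vecs)
    A = np.zeros((N, N))
    for _ in range(s):
        M = random_invertible_gf2(n, rng)
        for i, v in enumerate(vecs):
            j = index[tuple((M @ v) % 2)]
            A[i, j] += 1                         # edge v -> M v
            A[j, i] += 1                         # and the reverse edge (generator inverse)
    eigs = np.sort(np.linalg.eigvalsh(A / (2 * s)))
    return eigs[-2]

print(schreier_second_eigenvalue(n=8, s=4))      # 255 vertices, degree 8
```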
We study principal component analysis (PCA), where given a dataset in $\mathbb{R}^d$ from a distribution, the task is to find a unit vector $v$ that approximately maximizes the variance of the distribution after being projected along $v$. Despite being a classical task, standard estimators fail drastically if the data contains even a small fraction of outliers, motivating the problem of robust PCA. Recent work has developed computationally-efficient algorithms for robust PCA that either take super-linear time or have sub-optimal error guarantees. Our main contribution is to develop a nearly-linear time algorithm for robust PCA with near-optimal error guarantees. We also develop a single-pass streaming algorithm for robust PCA with memory usage nearly-linear in the dimension.
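For context, the following is the standard non-robust estimator for the PCA task described above (power iteration on the empirical second moment); it is precisely the kind of baseline that fails under outliers and is not the robust algorithm of the paper.

```python
import numpy as np

def top_direction(X, iters=100, seed=0):
    """Standard (non-robust) PCA via power iteration: returns a unit vector v
    approximately maximizing the projected variance (1/n) * sum_i <v, x_i>^2.
    Even a small fraction of outliers in X can skew this direction arbitrarily."""
    rng = np.random.default_rng(seed)
    n, d = X.shape
    v = rng.standard_normal(d)
    v /= np.linalg.norm(v)
    for _ in range(iters):
        v = X.T @ (X @ v) / n        # one step of power iteration on X^T X / n
        v /= np.linalg.norm(v)
    return v
```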
We make two contributions to the study of theory combination in satisfiability modulo theories. The first is a table of examples for the combinations of the most common model-theoretic properties in theory combination, namely stable infiniteness, smoothness, convexity, finite witnessability, and strong finite witnessability (and therefore politeness and strong politeness as well). All of our examples are sharp, in the sense that we also prove that no such theories exist over simpler signatures. This table significantly advances the current understanding of the various properties and their interactions. The most remarkable example in this table is a theory over a single sort that is polite but not strongly polite (until now, such a theory was only known to exist over two-sorted signatures). The second contribution is a new combination theorem showing that, in order to apply polite theory combination, it is sufficient for one theory to be stably infinite and strongly finitely witnessable, thus showing that smoothness is not a critical property in this combination method. This result has the potential to greatly simplify the process of showing which theories can be used in polite combination, as showing stable infiniteness is considerably simpler than showing smoothness.
Subset selection tasks arise in recommendation systems and search engines, where the goal is to select a subset of items that maximizes the value for the user. The values of subsets often display diminishing returns, and hence submodular functions have been used to model them. If the inputs defining the submodular function are known, then existing algorithms can be used. In many applications, however, the inputs have been observed to have social biases that reduce the utility of the output subset. Hence, interventions to improve the utility are desired. Prior works focus on maximizing linear functions -- a special case of submodular functions -- and show that fairness constraint-based interventions can not only ensure proportional representation but also achieve near-optimal utility in the presence of biases. We study the maximization of a family of submodular functions that capture functions arising in the aforementioned applications. Our first result is that, unlike linear functions, constraint-based interventions cannot guarantee any constant fraction of the optimal utility for this family of submodular functions. Our second result is an algorithm for submodular maximization that, under mild assumptions, provably outputs subsets with near-optimal utility for this family while proportionally representing items from each group. In empirical evaluation, with both synthetic and real-world data, we observe that this algorithm improves the utility of the output subset for this family of submodular functions over baselines.
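As an illustration of the kind of constraint-based intervention discussed above (not the paper's algorithm), the following greedy sketch first fills per-group quotas and then completes the committee by marginal gain; it assumes the quotas sum to at most $k$ and that each group contains at least its quota of items. All names and the input format are illustrative.

```python
def greedy_proportional(items, groups, quotas, f, k):
    """Greedy sketch of a proportional-representation intervention for a monotone
    submodular set function f (called on lists of items).  `groups` maps
    item -> group and `quotas` maps group -> minimum number of selected items."""
    chosen = []
    def pick_best(pool):
        best = max(pool, key=lambda i: f(chosen + [i]) - f(chosen))
        chosen.append(best)
    # Phase 1: satisfy the proportional-representation quotas group by group.
    for g, q in quotas.items():
        for _ in range(q):
            pick_best([i for i in items if groups[i] == g and i not in chosen])
    # Phase 2: fill the remaining slots with the best marginal gains overall.
    while len(chosen) < k:
        pick_best([i for i in items if i not in chosen])
    return chosen
```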