A dynamic graph algorithm is a data structure that answers queries about a property of the current graph while supporting graph modifications such as edge insertions and deletions. Prior work has shown strong conditional lower bounds for general dynamic graphs, yet graph families that arise in practice often exhibit structural properties that the existing lower-bound constructions do not possess. We study three specific graph families that are ubiquitous, namely constant-degree graphs, power-law graphs, and expander graphs, and give the first conditional lower bounds for them. Our results show that even when we restrict our attention to one of these graph classes, no algorithm for fundamental graph problems, such as distance computation or approximation or maximum matching, can simultaneously achieve sub-polynomial update time and query time. For example, we show that the same lower bounds as for general graphs hold for maximum matching and $(s,t)$-distance in constant-degree graphs, power-law graphs, or expanders. Namely, in an $m$-edge graph, there exists no dynamic algorithm with both $O(m^{1/2 - \epsilon})$ update time and $O(m^{1 - \epsilon})$ query time, for any small $\epsilon > 0$. Note that for $(s,t)$-distance the trivial dynamic algorithm achieves an almost matching upper bound of constant update time and $O(m)$ query time. We prove similar bounds for the other graph families and for other fundamental problems such as densest subgraph detection and perfect matching.
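To make the near-matching upper bound concrete, here is a minimal sketch (our own Python, with a hypothetical class name) of the trivial dynamic $(s,t)$-distance algorithm mentioned above: each update simply records the edge change in constant time, and each query runs a BFS over the current graph in $O(n+m)$ time.

```python
from collections import deque, defaultdict

class TrivialDynamicDistance:
    """Trivial dynamic (s,t)-distance: O(1) updates, O(n + m) BFS per query.
    Minimal sketch for unweighted, undirected graphs; the class name is ours."""

    def __init__(self):
        self.adj = defaultdict(set)  # adjacency sets of the current graph

    def insert_edge(self, u, v):
        self.adj[u].add(v)
        self.adj[v].add(u)

    def delete_edge(self, u, v):
        self.adj[u].discard(v)
        self.adj[v].discard(u)

    def distance(self, s, t):
        """BFS from s; returns the hop distance to t, or None if disconnected."""
        if s == t:
            return 0
        dist = {s: 0}
        queue = deque([s])
        while queue:
            u = queue.popleft()
            for w in self.adj[u]:
                if w not in dist:
                    dist[w] = dist[u] + 1
                    if w == t:
                        return dist[w]
                    queue.append(w)
        return None
```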
In applications of group testing in networks, e.g., identifying individuals who are infected by a disease spreading over a network, exploiting correlation among network nodes offers fundamental opportunities for reducing the number of tests needed. We model and analyze group testing on $n$ correlated nodes whose interactions are specified by a graph $G$. We model correlation through an edge-faulty random graph formed from $G$ in which each edge is dropped with probability $1-r$, and all nodes in the same component have the same state. We consider three classes of graphs: cycles and trees, $d$-regular graphs, and stochastic block models (SBM), and obtain lower and upper bounds on the number of tests needed to identify the defective nodes. Our results are expressed relative to the number of tests needed when the nodes are independent, and are given in terms of $n$, $r$, and the target error. In particular, we quantify the fundamental improvement that exploiting correlation offers by the ratio between the total number of nodes $n$ and the equivalent number of independent nodes in a classic group testing algorithm. The lower bounds are derived by exhibiting a strong dependence of the number of tests needed on the expected number of components. In this regard, we establish a new approximation for the distribution of component sizes in "$d$-regular trees", which may be of independent interest and leads to a lower bound on the expected number of components in $d$-regular graphs. The upper bounds are obtained by forming dense subgraphs in which nodes are more likely to be in the same state. When $G$ is a cycle or a tree, we show an improvement by a factor of $\log(1/r)$. For the grid, a graph with almost $2n$ edges, the improvement is by a factor of $(1-r)\log(1/r)$, indicating a drastic improvement compared to trees. When $G$ has a larger number of edges, as in the SBM, the improvement can scale with $n$.
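The correlation model is easy to simulate. The sketch below (our own, using the networkx library; the function name and parameters are hypothetical) draws one realization of the edge-faulty random graph: each edge of $G$ is kept independently with probability $r$, and all nodes in the same surviving component receive the same defective/healthy state.

```python
import random
import networkx as nx

def sample_states(G, r, p_defective, seed=None):
    """Sample node states under the edge-faulty correlation model.

    Each edge of G is kept with probability r (dropped with probability 1 - r);
    every node in a surviving component gets the same Bernoulli(p_defective)
    state. Returns a dict mapping node -> True (defective) / False (healthy).
    """
    rng = random.Random(seed)
    H = nx.Graph()
    H.add_nodes_from(G.nodes())
    H.add_edges_from(e for e in G.edges() if rng.random() < r)

    states = {}
    for component in nx.connected_components(H):
        state = rng.random() < p_defective
        for node in component:
            states[node] = state
    return states

# Example: a cycle on 20 nodes with moderate correlation.
states = sample_states(nx.cycle_graph(20), r=0.7, p_defective=0.1, seed=0)
```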
It is a common phenomenon that, for high-dimensional and nonparametric statistical models, rate-optimal estimators balance squared bias and variance. Although this balancing is widely observed, little is known about whether methods exist that could avoid the trade-off between bias and variance. We propose a general strategy to obtain lower bounds on the variance of any estimator with bias smaller than a prespecified bound. This shows to what extent the bias-variance trade-off is unavoidable and allows us to quantify the loss of performance for methods that do not obey it. The approach is based on a number of abstract lower bounds for the variance involving the change of expectation with respect to different probability measures as well as information measures such as the Kullback-Leibler or $\chi^2$-divergence. In the second part of the article, the abstract lower bounds are applied to several statistical models, including the Gaussian white noise model, a boundary estimation problem, the Gaussian sequence model, and the high-dimensional linear regression model. For these specific statistical applications, different types of bias-variance trade-offs occur that vary considerably in their strength. For the trade-off between integrated squared bias and integrated variance in the Gaussian white noise model, we propose to combine the general strategy for lower bounds with a reduction technique. This allows us to reduce the original problem to a lower bound on the bias-variance trade-off for estimators with additional symmetry properties in a simpler statistical model. In the Gaussian sequence model, different phase transitions of the bias-variance trade-off occur. Although there is a non-trivial interplay between bias and variance, the rates of the squared bias and the variance do not have to be balanced in order to achieve the minimax estimation rate.
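As background for the terminology (a standard identity, not a result of the article), the trade-off refers to the decomposition of the mean squared error of an estimator $\widehat\theta$ of a real-valued parameter $\theta$,
$$\mathbb{E}_\theta\big[(\widehat\theta - \theta)^2\big] \;=\; \underbrace{\big(\mathbb{E}_\theta[\widehat\theta] - \theta\big)^2}_{\text{squared bias}} \;+\; \underbrace{\operatorname{Var}_\theta\big(\widehat\theta\big)}_{\text{variance}},$$
so a lower bound on the variance of every estimator whose bias stays below a prespecified level directly constrains how small the overall risk can be.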
Recent progress has been made by the learning theory community in characterizing the generalization error of gradient methods for general convex losses. In this work, we focus on how training longer might affect generalization in smooth stochastic convex optimization (SCO) problems. We first provide tight lower bounds for general non-realizable SCO problems. Furthermore, existing upper bound results suggest that the sample complexity can be improved by assuming the loss is realizable, i.e., an optimal solution simultaneously minimizes all the data points. However, this improvement is compromised when the training time is long, and lower bounds are lacking. Our paper examines this observation by providing excess risk lower bounds for gradient descent (GD) and stochastic gradient descent (SGD) in two realizable settings: (1) realizable with $T = O(n)$, and (2) realizable with $T = \Omega(n)$, where $T$ denotes the number of training iterations and $n$ is the size of the training dataset. These bounds are novel and informative in characterizing the relationship between $T$ and $n$. In the first, small-training-horizon case, our lower bounds almost tightly match the corresponding upper bounds and provide the first optimality certificates for them. However, for the realizable case with $T = \Omega(n)$, a gap exists between the lower and upper bounds. We conjecture that the gap can be closed by improving the upper bounds, and this conjecture is supported by our analyses in one-dimensional and linear regression scenarios.
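For concreteness, and in our own (standard) notation rather than the paper's, the excess risk of the iterate $\widehat w$ returned by GD or SGD after $T$ steps on a sample $z_1,\dots,z_n \sim \mathcal{D}$ is
$$\mathbb{E}\big[F(\widehat w)\big] - \min_{w} F(w), \qquad F(w) := \mathbb{E}_{z \sim \mathcal{D}}\big[f(w; z)\big],$$
and realizability means that some $w^\star$ minimizes every per-sample loss simultaneously, i.e., $w^\star \in \arg\min_w f(w; z_i)$ for all $i = 1, \dots, n$.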
We study the sample complexity of identifying an approximate equilibrium for two-player zero-sum $n\times 2$ matrix games. That is, in a sequence of repeated game plays, how many rounds must the two players play before reaching an approximate equilibrium (e.g., Nash)? We derive instance-dependent bounds that define an ordering over game matrices, capturing the intuition that the dynamics of some games converge faster than others. Specifically, we consider a stochastic observation model such that when the two players choose actions $i$ and $j$, respectively, they both observe each other's played actions and a stochastic observation $X_{ij}$ such that $\mathbb E[X_{ij}] = A_{ij}$. To our knowledge, our work gives the first instance-dependent lower bounds on the number of rounds the players must play before reaching an approximate equilibrium, in the sense that the number of rounds depends on the specific properties of the game matrix $A$ as well as the desired accuracy. We also prove a converse statement: there exist player strategies that achieve this lower bound.
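As a point of reference (in our own notation, not taken from the paper), a pair of mixed strategies $(\widehat x, \widehat y)$ is an $\epsilon$-approximate Nash equilibrium of the zero-sum game with payoff matrix $A \in \mathbb{R}^{n \times 2}$ if the duality gap is at most $\epsilon$:
$$\max_{x \in \Delta_n} x^\top A\, \widehat y \;-\; \min_{y \in \Delta_2} \widehat x^\top A\, y \;\le\; \epsilon,$$
so the players must estimate the relevant entries $A_{ij}$ from the noisy observations $X_{ij}$ accurately enough to certify this gap.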
We investigate the parameterized complexity of several problems formalizing cluster identification in graphs. In other words, we ask whether a graph contains a large enough and sufficiently connected subgraph. We study three relaxations of CLIQUE: $s$-CLUB and $s$-CLIQUE, in which the relaxation concerns the distances within the cluster and within the original graph, respectively, and $\gamma$-COMPLETE SUBGRAPH, in which the relaxation concerns the minimum degree in the cluster. As these three problems are known to be NP-hard, we study their parameterized complexity. We prove that $s$-CLUB and $s$-CLIQUE are NP-hard even when restricted to graphs of degeneracy $\le 3$ whenever $s \ge 3$, and to graphs of degeneracy $\le 2$ whenever $s \ge 5$, which is strictly stronger than W[1]-hardness parameterized by the degeneracy. We also show that these problems are solvable in polynomial time on graphs of degeneracy $1$. Concerning $\gamma$-COMPLETE SUBGRAPH, we prove that it is W[1]-hard parameterized by both the degeneracy (which implies W[1]-hardness parameterized by the number of vertices in the $\gamma$-complete subgraph) and the number of elements outside the $\gamma$-complete subgraph.
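For readers unfamiliar with these relaxations, the following standard definitions (our phrasing, not quoted from the paper) may help: a vertex set $S \subseteq V(G)$ is an $s$-club if the induced subgraph $G[S]$ has diameter at most $s$; an $s$-clique if $\operatorname{dist}_G(u,v) \le s$ for all $u,v \in S$, with distances measured in the original graph; and a $\gamma$-complete subgraph if every vertex of $G[S]$ has degree at least $\gamma(|S|-1)$ within $S$. The associated decision problems ask for such a set with at least $k$ vertices.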
We study the hidden-action principal-agent problem in an online setting. In each round, the principal posts a contract that specifies the payment to the agent based on each outcome. The agent then makes a strategic choice of action that maximizes her own utility, but the action is not directly observable by the principal. The principal observes the outcome and receives utility from the agent's choice of action. Based on past observations, the principal dynamically adjusts the contracts with the goal of maximizing her utility. We introduce an online learning algorithm and provide an upper bound on its Stackelberg regret. We show that when the contract space is $[0,1]^m$, the Stackelberg regret is upper bounded by $\widetilde O(\sqrt{m} \cdot T^{1-1/(2m+1)})$ and lower bounded by $\Omega(T^{1-1/(m+2)})$, where $\widetilde O$ omits logarithmic factors. This result shows that a number of samples exponential in $m$ is both sufficient and necessary to learn a near-optimal contract, resolving an open problem on the hardness of online contract design. Moreover, when contracts are restricted to some subset $\mathcal{F} \subset [0,1]^m$, we define an intrinsic dimension of $\mathcal{F}$ that depends on the covering number of the spherical code in the space and bound the regret in terms of this intrinsic dimension. When $\mathcal{F}$ is the family of linear contracts, we show that the Stackelberg regret grows exactly as $\Theta(T^{2/3})$. The contract design problem is challenging because the utility function is discontinuous. Bounding the discretization error in this setting has been an open problem. In this paper, we identify a limited set of directions in which the utility function is continuous, allowing us to design a new discretization method and bound its error. This approach enables the first upper bound with no restrictions on the contract and action spaces.
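For orientation (our own standard formulation, which may differ in details from the paper's), write $u(p)$ for the principal's expected utility when she posts contract $p$ and the agent best responds; the Stackelberg regret of the posted contracts $p_1, \dots, p_T$ is then
$$\mathrm{Reg}(T) \;=\; T \cdot \sup_{p \in \mathcal{F}} u(p) \;-\; \sum_{t=1}^{T} \mathbb{E}\big[u(p_t)\big],$$
i.e., the cumulative shortfall relative to the best fixed contract the principal could have committed to in hindsight.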
We contribute to a better understanding of the class of functions that can be represented by a neural network with ReLU activations and a given architecture. Using techniques from mixed-integer optimization, polyhedral theory, and tropical geometry, we provide a mathematical counterbalance to the universal approximation theorems, which suggest that a single hidden layer is sufficient for learning any function. In particular, we investigate whether the class of exactly representable functions strictly increases as more layers are added (with no restrictions on size). As a by-product of our investigations, we settle an old conjecture about piecewise linear functions by Wang and Sun (2005) in the affirmative. We also present upper bounds on the sizes of neural networks required to represent functions with logarithmic depth.
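As a small illustration of exact (not approximate) representation, the following sketch, a minimal numpy example of our own, expresses $\max(x,y)$ with a single hidden layer of four ReLU units, using the identity $\max(x,y) = \tfrac{1}{2}\big(\mathrm{ReLU}(x+y) - \mathrm{ReLU}(-x-y) + \mathrm{ReLU}(x-y) + \mathrm{ReLU}(y-x)\big)$; the depth questions studied here concern which functions provably require more hidden layers than this.

```python
import numpy as np

def relu(z):
    return np.maximum(z, 0.0)

# One hidden layer with 4 ReLU units computing max(x, y) exactly.
W1 = np.array([[ 1.0,  1.0],   # x + y
               [-1.0, -1.0],   # -x - y
               [ 1.0, -1.0],   # x - y
               [-1.0,  1.0]])  # y - x
w2 = np.array([0.5, -0.5, 0.5, 0.5])  # output weights, no bias needed

def net_max(x, y):
    """Exact max(x, y) via a one-hidden-layer ReLU network."""
    h = relu(W1 @ np.array([x, y]))
    return float(w2 @ h)

assert np.isclose(net_max(3.0, -7.0), 3.0)
assert np.isclose(net_max(-2.5, 4.0), 4.0)
```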
We revisit the task of computing the edit distance in sublinear time. In the $(k,K)$-gap edit distance problem the task is to distinguish whether the edit distance of two strings is at most $k$ or at least $K$. It has been established by Goldenberg, Krauthgamer and Saha (FOCS '19), with improvements by Kociumaka and Saha (FOCS '20), that the $(k,k^2)$-gap problem can be solved in time $\widetilde O(n/k+\operatorname{poly}(k))$. One of the most natural questions in this line of research is whether the $(k,k^2)$-gap is best-possible for the running time $\widetilde O(n/k+\operatorname{poly}(k))$. In this work we answer this question by significantly improving the gap. Specifically, we show that in time $O(n/k+\operatorname{poly}(k))$ we can even solve the $(k,k^{1+o(1)})$-gap problem. This is the first algorithm that breaks the $(k,k^2)$-gap in this running time. Our algorithm is almost optimal in the following sense: In the low distance regime ($k\le n^{0.19}$) our running time becomes $O(n/k)$, which matches a known $n/k^{1+o(1)}$ lower bound for the $(k,k^{1+o(1)})$-gap problem up to lower order factors. Our result also reveals a surprising similarity of Hamming distance and edit distance in the low distance regime: For both, the $(k,k^{1+o(1)})$-gap problem has time complexity $n/k^{1\pm o(1)}$ for small $k$. In contrast to previous work, which employed a subsampled variant of the Landau-Vishkin algorithm, we instead build upon the algorithm of Andoni, Krauthgamer and Onak (FOCS '10). We first simplify their approach and then show how to effectively prune their computation tree in order to obtain a sublinear-time algorithm in the given time bound. To that end, we use a variety of structural insights on the (local and global) patterns that can emerge during this process and design appropriate property testers to effectively detect these patterns.
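For context only (this is classical background, not the sublinear algorithm of this work), the following sketch decides whether the edit distance of two strings is at most $k$ with a standard banded dynamic program in $O(nk)$ time, the kind of exact routine that gap algorithms avoid running on the full input.

```python
def edit_distance_at_most(a, b, k):
    """Return the edit distance of a and b if it is <= k, else None.
    Classical O(len(a) * k) banded DP; background only."""
    n, m = len(a), len(b)
    if abs(n - m) > k:
        return None
    INF = k + 1
    # prev[j] = edit distance between a[:i] and b[:j], restricted to a band.
    prev = list(range(m + 1))
    for i in range(1, n + 1):
        lo, hi = max(0, i - k), min(m, i + k)
        cur = [INF] * (m + 1)
        if lo == 0:
            cur[0] = i
        for j in range(max(1, lo), hi + 1):
            cost = 0 if a[i - 1] == b[j - 1] else 1
            cur[j] = min(prev[j] + 1,        # delete from a
                         cur[j - 1] + 1,     # insert into a
                         prev[j - 1] + cost) # match / substitute
        prev = cur
    return prev[m] if prev[m] <= k else None
```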
Mining subgraphs with interesting structural properties from networks (or graphs) is a computationally challenging task. In this paper, we propose two algorithms for enumerating all connected induced subgraphs of a given cardinality from a network (a connected undirected graph). The first algorithm is a variant of a previous well-known algorithm; it enumerates all connected induced subgraphs of cardinality $k$ in a bottom-up manner. We present the data structures that lead to unit-time element checking and linear space. In contrast to previous algorithms, which work either in a bottom-up manner or in a reverse-search manner, we then propose an algorithm that enumerates all connected induced subgraphs of cardinality $k$ in a top-down manner. The correctness and complexity of the top-down algorithm are theoretically analyzed and proven. In the experiments, we evaluate the efficiency of the algorithms using a set of real-world networks from various fields. Experimental results show that the bottom-up variant outperforms the state-of-the-art algorithms for enumerating connected induced subgraphs of small cardinality, and the top-down algorithm achieves an order-of-magnitude speedup over the state-of-the-art algorithms for enumerating connected induced subgraphs of large cardinality.
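To illustrate the flavor of bottom-up enumeration (a textbook-style sketch of our own, not either algorithm from the paper), the following routine grows connected vertex sets from an anchor vertex one neighbor at a time and reports every connected induced subgraph of cardinality $k$ exactly once.

```python
def connected_induced_subgraphs(adj, k):
    """Yield every connected induced subgraph of cardinality k exactly once.

    adj: dict mapping each vertex to the set of its neighbors (undirected).
    Bottom-up sketch: sets are grown from an anchor vertex through neighbors,
    using only vertices larger than the anchor; `forbidden` records skipped
    sibling candidates so that no set is reported twice.
    """
    def extend(anchor, current, frontier, forbidden):
        if len(current) == k:
            yield frozenset(current)
            return
        candidates = sorted(v for v in frontier
                            if v > anchor and v not in current and v not in forbidden)
        for idx, v in enumerate(candidates):
            yield from extend(anchor,
                              current | {v},
                              frontier | adj[v],
                              forbidden | set(candidates[:idx]))

    for anchor in sorted(adj):
        yield from extend(anchor, {anchor}, set(adj[anchor]), set())

# Example: the path 0-1-2-3 has exactly three connected induced subgraphs of size 2.
adj = {0: {1}, 1: {0, 2}, 2: {1, 3}, 3: {2}}
assert len(list(connected_induced_subgraphs(adj, 2))) == 3
```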
The matrix sensing problem is an important low-rank optimization problem that has found a wide range of applications, such as matrix completion, phase synchronization/retrieval, robust PCA, and power system state estimation. In this work, we focus on the general matrix sensing problem with linear measurements that are corrupted by random noise. We investigate the scenario where the search rank $r$ is equal to the true rank $r^*$ of the unknown ground truth (the exactly parametrized case), as well as the scenario where $r$ is greater than $r^*$ (the overparametrized case). We quantify the role of the restricted isometry property (RIP) in shaping the landscape of the non-convex factorized formulation and in assisting the success of local search algorithms. First, we develop a global guarantee on the maximum distance between an arbitrary local minimizer of the non-convex problem and the ground truth under the assumption that the RIP constant is smaller than $1/(1+\sqrt{r^*/r})$. We then present a local guarantee for problems with an arbitrary RIP constant, which states that any local minimizer is either considerably close to the ground truth or far away from it. More importantly, we prove that this noisy, overparametrized problem exhibits the strict saddle property, which leads to the global convergence of the perturbed gradient descent algorithm in polynomial time. The results of this work provide a comprehensive understanding of the geometric landscape of the matrix sensing problem in the noisy and overparametrized regime.
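For reference, in our own (standard) notation rather than the paper's, the non-convex factorized formulation is the Burer-Monteiro-style problem
$$\min_{X \in \mathbb{R}^{n \times r}} \; \sum_{i=1}^{m} \big( \langle A_i, X X^\top \rangle - b_i \big)^2, \qquad b_i = \langle A_i, M^* \rangle + w_i,$$
where $M^* \succeq 0$ is the rank-$r^*$ ground truth, the $w_i$ model random measurement noise, and the measurement operator satisfies the RIP with constant $\delta$ if $(1-\delta)\|M\|_F^2 \le \sum_{i=1}^m \langle A_i, M \rangle^2 \le (1+\delta)\|M\|_F^2$ for all matrices $M$ of rank at most $2r$.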