Throughput is a main performance objective in communication networks. This paper considers a fundamental maximum throughput routing problem -- the all-or-nothing multicommodity flow (ANF) problem -- in arbitrary directed graphs and in the practically relevant but challenging setting where demands can be (much) larger than the edge capacities. Hence, in addition to assigning requests to valid flows for each routed commodity, an admission control mechanism is required which prevents overloading the network when routing commodities. We make several contributions. On the theoretical side we obtain substantially improved bi-criteria approximation algorithms for this NP-hard problem. We present two non-trivial linear programming relaxations and show how to convert their fractional solutions into integer solutions via randomized rounding. One is an exponential-size formulation (solvable in polynomial time using a separation oracle) that considers a ``packing'' view and allows a more flexible approach, while the other is a generalization of the compact LP formulation of Liu et al. (INFOCOM'19) that allows for easy solving via standard LP solvers. We obtain a polynomial-time randomized algorithm that yields an arbitrarily good approximation on the weighted throughput while violating the edge capacity constraints by only a small multiplicative factor. We also describe a deterministic rounding algorithm by derandomization, using the method of pessimistic estimators. We complement our theoretical results with a proof of concept empirical evaluation.
We present a new local-search algorithm for the $k$-median clustering problem. We show that local optima for this algorithm give a $(2.836+\epsilon)$-approximation; our result improves upon the $(3+\epsilon)$-approximate local-search algorithm of Arya et al. [STOC 01]. Moreover, a computer-aided analysis of a natural extension suggests that this approach may lead to an improvement over the best-known approximation guarantee for the problem. The new ingredient in our algorithm is the use of a potential function based on both the closest and second-closest facilities to each client. Specifically, the potential is the sum over all clients, of the distance of the client to its closest facility, plus (a small constant times) the truncated distance to its second-closest facility. We move from one solution to another only if the latter can be obtained by swapping a constant number of facilities, and has a smaller potential than the former. This refined potential allows us to avoid the bad local optima given by Arya et al. for the local-search algorithm based only on the cost of the solution.
A Level Ancestory query LA($u$, $d$) asks for the the ancestor of the node $u$ at a depth $d$. We present a simple solution, which pre-processes the tree in $O(n)$ time with $O(n)$ extra space, and answers the queries in $O(\log\ {n})$ time. Though other optimal algorithms exist, this is a simple enough solution that could be taught and implemented easily.
Frequency estimation is one of the most fundamental problems in streaming algorithms. Given a stream $S$ of elements from some universe $U=\{1 \ldots n\}$, the goal is to compute, in a single pass, a short sketch of $S$ so that for any element $i \in U$, one can estimate the number $x_i$ of times $i$ occurs in $S$ based on the sketch alone. Two state of the art solutions to this problems are the Count-Min and Count-Sketch algorithms. The frequency estimator $\tilde{x}$ produced by Count-Min, using $O(1/\varepsilon \cdot \log n)$ dimensions, guarantees that $\|\tilde{x}-x\|_{\infty} \le \varepsilon \|x\|_1$ with high probability, and $\tilde{x} \ge x$ holds deterministically. Also, Count-Min works under the assumption that $x \ge 0$. On the other hand, Count-Sketch, using $O(1/\varepsilon^2 \cdot \log n)$ dimensions, guarantees that $\|\tilde{x}-x\|_{\infty} \le \varepsilon \|x\|_2$ with high probability. A natural question is whether it is possible to design the best of both worlds sketching method, with error guarantees depending on the $\ell_2$ norm and space comparable to Count-Sketch, but (like Count-Min) also has the no-underestimation property. Our main set of results shows that the answer to the above question is negative. We show this in two incomparable computational models: linear sketching and streaming algorithms. We also study the complementary problem, where the sketch is required to not over-estimate, i.e., $\tilde{x} \le x$ should hold always.
Branchwidth determines how graphs, and more generally, arbitrary connectivity (basically symmetric and submodular) functions could be decomposed into a tree-like structure by specific cuts. We develop a general framework for designing fixed-parameter tractable (FPT) 2-approximation algorithms for branchwidth of connectivity functions. The first ingredient of our framework is combinatorial. We prove a structural theorem establishing that either a sequence of particular refinement operations could decrease the width of a branch decomposition or that the width of the decomposition is already within a factor of 2 from the optimum. The second ingredient is an efficient implementation of the refinement operations for branch decompositions that support efficient dynamic programming. We present two concrete applications of our general framework. $\bullet$ An algorithm that for a given $n$-vertex graph $G$ and integer $k$ in time $2^{2^{O(k)}} n^2$ either constructs a rank decomposition of $G$ of width at most $2k$ or concludes that the rankwidth of $G$ is more than $k$. It also yields a $(2^{2k+1}-1)$-approximation algorithm for cliquewidth within the same time complexity, which in turn, improves to $f(k)n^2$ the running times of various algorithms on graphs of cliquewidth $k$. Breaking the "cubic barrier" for rankwidth and cliquewidth was an open problem in the area. $\bullet$ An algorithm that for a given $n$-vertex graph $G$ and integer $k$ in time $2^{O(k)} n$ either constructs a branch decomposition of $G$ of width at most $2k$ or concludes that the branchwidth of $G$ is more than $k$. This improves over the 3-approximation that follows from the recent treewidth 2-approximation of Korhonen [FOCS 2021].
In this paper, we develop deterministic fully dynamic algorithms for computing approximate distances in a graph with worst-case update time guarantees. In particular we obtain improved dynamic algorithms that, given an unweighted and undirected graph $G=(V,E)$ undergoing edge insertions and deletions, and a parameter $0 < \epsilon \leq 1$, maintain $(1+\epsilon)$-approximations of the $st$ distance of a single pair of nodes, the distances from a single source to all nodes ("SSSP"), the distances from multiple sources to all nodes ("MSSP''), or the distances between all nodes ("APSP"). Our main result is a deterministic algorithm for maintaining $(1+\epsilon)$-approximate single-source distances with worst-case update time $O(n^{1.529})$ (for the current best known bound on the matrix multiplication coefficient $\omega$). This matches a conditional lower bound by [BNS, FOCS 2019]. We further show that we can go beyond this SSSP bound for the problem of maintaining approximate $st$ distances by providing a deterministic algorithm with worst-case update time $O(n^{1.447})$. This even improves upon the fastest known randomized algorithm for this problem. At the core, our approach is to combine algebraic distance maintenance data structures with near-additive emulator constructions. This also leads to novel dynamic algorithms for maintaining $(1+\epsilon, \beta)$-emulators that improve upon the state of the art, which might be of independent interest. Our techniques also lead to improvements for randomized approximate diameter maintenance.
This dissertation studies a fundamental open challenge in deep learning theory: why do deep networks generalize well even while being overparameterized, unregularized and fitting the training data to zero error? In the first part of the thesis, we will empirically study how training deep networks via stochastic gradient descent implicitly controls the networks' capacity. Subsequently, to show how this leads to better generalization, we will derive {\em data-dependent} {\em uniform-convergence-based} generalization bounds with improved dependencies on the parameter count. Uniform convergence has in fact been the most widely used tool in deep learning literature, thanks to its simplicity and generality. Given its popularity, in this thesis, we will also take a step back to identify the fundamental limits of uniform convergence as a tool to explain generalization. In particular, we will show that in some example overparameterized settings, {\em any} uniform convergence bound will provide only a vacuous generalization bound. With this realization in mind, in the last part of the thesis, we will change course and introduce an {\em empirical} technique to estimate generalization using unlabeled data. Our technique does not rely on any notion of uniform-convergece-based complexity and is remarkably precise. We will theoretically show why our technique enjoys such precision. We will conclude by discussing how future work could explore novel ways to incorporate distributional assumptions in generalization bounds (such as in the form of unlabeled data) and explore other tools to derive bounds, perhaps by modifying uniform convergence or by developing completely new tools altogether.
This paper addresses the problem of formally verifying desirable properties of neural networks, i.e., obtaining provable guarantees that neural networks satisfy specifications relating their inputs and outputs (robustness to bounded norm adversarial perturbations, for example). Most previous work on this topic was limited in its applicability by the size of the network, network architecture and the complexity of properties to be verified. In contrast, our framework applies to a general class of activation functions and specifications on neural network inputs and outputs. We formulate verification as an optimization problem (seeking to find the largest violation of the specification) and solve a Lagrangian relaxation of the optimization problem to obtain an upper bound on the worst case violation of the specification being verified. Our approach is anytime i.e. it can be stopped at any time and a valid bound on the maximum violation can be obtained. We develop specialized verification algorithms with provable tightness guarantees under special assumptions and demonstrate the practical significance of our general verification approach on a variety of verification tasks.
In this work, we consider the distributed optimization of non-smooth convex functions using a network of computing units. We investigate this problem under two regularity assumptions: (1) the Lipschitz continuity of the global objective function, and (2) the Lipschitz continuity of local individual functions. Under the local regularity assumption, we provide the first optimal first-order decentralized algorithm called multi-step primal-dual (MSPD) and its corresponding optimal convergence rate. A notable aspect of this result is that, for non-smooth functions, while the dominant term of the error is in $O(1/\sqrt{t})$, the structure of the communication network only impacts a second-order term in $O(1/t)$, where $t$ is time. In other words, the error due to limits in communication resources decreases at a fast rate even in the case of non-strongly-convex objective functions. Under the global regularity assumption, we provide a simple yet efficient algorithm called distributed randomized smoothing (DRS) based on a local smoothing of the objective function, and show that DRS is within a $d^{1/4}$ multiplicative factor of the optimal convergence rate, where $d$ is the underlying dimension.
Many resource allocation problems in the cloud can be described as a basic Virtual Network Embedding Problem (VNEP): finding mappings of request graphs (describing the workloads) onto a substrate graph (describing the physical infrastructure). In the offline setting, the two natural objectives are profit maximization, i.e., embedding a maximal number of request graphs subject to the resource constraints, and cost minimization, i.e., embedding all requests at minimal overall cost. The VNEP can be seen as a generalization of classic routing and call admission problems, in which requests are arbitrary graphs whose communication endpoints are not fixed. Due to its applications, the problem has been studied intensively in the networking community. However, the underlying algorithmic problem is hardly understood. This paper presents the first fixed-parameter tractable approximation algorithms for the VNEP. Our algorithms are based on randomized rounding. Due to the flexible mapping options and the arbitrary request graph topologies, we show that a novel linear program formulation is required. Only using this novel formulation the computation of convex combinations of valid mappings is enabled, as the formulation needs to account for the structure of the request graphs. Accordingly, to capture the structure of request graphs, we introduce the graph-theoretic notion of extraction orders and extraction width and show that our algorithms have exponential runtime in the request graphs' maximal width. Hence, for request graphs of fixed extraction width, we obtain the first polynomial-time approximations. Studying the new notion of extraction orders we show that (i) computing extraction orders of minimal width is NP-hard and (ii) that computing decomposable LP solutions is in general NP-hard, even when restricting request graphs to planar ones.
We consider the task of learning the parameters of a {\em single} component of a mixture model, for the case when we are given {\em side information} about that component, we call this the "search problem" in mixture models. We would like to solve this with computational and sample complexity lower than solving the overall original problem, where one learns parameters of all components. Our main contributions are the development of a simple but general model for the notion of side information, and a corresponding simple matrix-based algorithm for solving the search problem in this general setting. We then specialize this model and algorithm to four common scenarios: Gaussian mixture models, LDA topic models, subspace clustering, and mixed linear regression. For each one of these we show that if (and only if) the side information is informative, we obtain parameter estimates with greater accuracy, and also improved computation complexity than existing moment based mixture model algorithms (e.g. tensor methods). We also illustrate several natural ways one can obtain such side information, for specific problem instances. Our experiments on real data sets (NY Times, Yelp, BSDS500) further demonstrate the practicality of our algorithms showing significant improvement in runtime and accuracy.