We consider the problem of designing fundamental graph algorithms in the model of Massive Parallel Computation (MPC). The input to the problem is an undirected graph $G$ with $n$ vertices and $m$ edges, and with $D$ denoting the maximum diameter of any connected component of $G$. We consider the MPC with low local space, allowing each machine to store only $\Theta(n^\delta)$ words for an arbitrary constant $\delta > 0$, and with linear global space (the number of machines times the local space per machine), that is, with optimal space utilization. In a recent breakthrough, Andoni et al. (FOCS'18) and Behnezhad et al. (FOCS'19) designed parallel randomized algorithms that, in $O(\log D + \log \log n)$ rounds on an MPC with low local space, determine all connected components of an input graph, improving upon the classic bound of $O(\log n)$ derived from earlier work on PRAM algorithms. In this paper, we show that asymptotically identical bounds can also be achieved deterministically: we present a deterministic low-local-space MPC algorithm that in $O(\log D + \log \log n)$ rounds determines all connected components of the input graph.
We address the problem of computing a Steiner arborescence on a directed hypercube. The directed hypercube enjoys a special connectivity structure on its node set $\{0,1\}^m$, but its size, exponential in $m$, renders traditional Steiner tree algorithms inefficient. Although the problem was known to be NP-complete, its parameterized complexity was unknown. With applications in evolutionary tree reconstruction algorithms and in incremental algorithms for computing a property on multiple input graphs, any algorithm for this problem would open up new ways to study these applications. In this paper, we present the first algorithms, to the best of our knowledge, that prove the problem to be fixed-parameter tractable (FPT) with respect to two natural parameters: the number of input terminals and the penalty of the arborescence. Given a directed $m$-dimensional hypercube rooted at the zero node, and a set of input terminals $R$ that needs to be spanned by the Steiner arborescence, we prove that the problem is FPT with respect to the penalty parameter $q$, by providing a randomized algorithm that computes an optimal arborescence $T$ in $O\left(q^4 4^{q(q+1)} + q\left|R\right|m^2\right)$ time with probability at least $4^{-q}$. If we trade the exact solution for an additively approximate one, we can design a parameterized approximation algorithm with better running time: it computes an arborescence $T$ with cost at most $OPT+(\left|R\right|-4)(q_{opt}-1)$ in $O(\left|R\right|m^2+1.2738^{q_{opt}})$ time. We also present a dynamic programming algorithm that computes an optimal arborescence in $O(3^{\left|R\right|}\left|R\right|m)$ time, thus proving that the problem is FPT in the parameter $\left|R\right|$.
Inference of directed relations given some unspecified interventions, that is, interventions whose targets are not known, is important yet challenging. For instance, it is of high interest to unravel the regulatory roles of genes with inherited genetic variants like single-nucleotide polymorphisms (SNPs), which can act as unspecified interventions because they regulate some unknown genes. In this article, we test hypothesized directed relations with unspecified interventions. First, we derive conditions to yield an identifiable model. Unlike classical inference, hypothesis testing requires identifying the ancestral relations and relevant interventions for each hypothesis-specific primary variable, a task referred to as causal discovery. Towards this end, we propose a peeling algorithm to establish a hierarchy of primary variables as nodes, starting with leaf nodes at the hierarchy's bottom, for which we derive a difference-of-convex (DC) algorithm for nonconvex minimization. Moreover, we prove that the peeling algorithm yields consistent causal discovery, and that the DC algorithm is a low-order polynomial algorithm capable of finding a global minimizer almost surely under the data generating distribution. Second, we propose a modified likelihood ratio test, eliminating nuisance parameters to increase power. To enhance finite-sample performance, we integrate the modified likelihood ratio test with a data perturbation scheme that accounts for the uncertainty of identifying ancestral relations and relevant interventions. Also, we show that the distribution of a data-perturbation test statistic converges to the target distribution in high dimensions. Numerical examples demonstrate the utility and effectiveness of the proposed methods, including an application to infer gene regulatory networks.
The Gromov-Hausdorff distance ($d_\mathrm{GH}$) provides a natural way of quantifying the dissimilarity between two given metric spaces. It is known that computing $d_\mathrm{GH}$ between two finite metric spaces is NP-hard, even in the case of finite ultrametric spaces, which are highly structured metric spaces in the sense that they satisfy the so-called \emph{strong triangle inequality}. Ultrametric spaces naturally arise in many applications such as hierarchical clustering, phylogenetics, genomics, and even linguistics. By exploiting the special structures of ultrametric spaces, (1) we identify a one-parameter family $\{d_\mathrm{GH}^{(p)}\}_{p\in[1,\infty]}$ of distances defined in a flavor similar to the Gromov-Hausdorff distance on the collection of finite ultrametric spaces, and in particular $d_\mathrm{GH}^{(1)} =d_\mathrm{GH}$. The extreme case when $p=\infty$, which we also denote by $u_\mathrm{GH}$, turns out to be an ultrametric on the collection of ultrametric spaces. Whereas for all $p\in[1,\infty)$, $d_\mathrm{GH}^{(p)}$ yields NP-hard problems, we prove that, surprisingly, $u_\mathrm{GH}$ can be computed in polynomial time. The proof is based on a structural theorem for $u_\mathrm{GH}$ established in this paper; (2) inspired by the structural theorem for $u_\mathrm{GH}$, and by carefully leveraging properties of ultrametric spaces, we also establish a structural theorem for $d_\mathrm{GH}$ when restricted to ultrametric spaces. This structural theorem allows us to identify special families of ultrametric spaces on which $d_\mathrm{GH}$ is computationally tractable. These families are determined by properties related to the doubling constant of the metric space. Based on these families, we devise a fixed-parameter tractable (FPT) algorithm for computing the exact value of $d_\mathrm{GH}$ between ultrametric spaces. We believe ours is the first such algorithm to be identified.
Random 2-cell embeddings of a given graph $G$ are obtained by choosing a random local rotation around every vertex. We analyze the expected number of faces, $\mathbb{E}[F_G]$, of such an embedding, which is equivalent to studying its average genus. So far, tight results are known for two families called monopoles and dipoles. We extend the dipole result to a more general family called multistars, i.e., loopless multigraphs in which there is a vertex incident with all the edges. In particular, we show that the expected number of faces of every multistar with $n$ nonleaf edges lies in an interval of length $2/(n + 1)$ centered at the expected number of faces of an $n$-edge dipole. This allows us to derive bounds on $\mathbb{E}[F_G]$ for any given graph $G$ in terms of vertex degrees. We conjecture that $\mathbb{E}[F_G ] \le O(n)$ for any simple $n$-vertex graph $G$.
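The dipole quantity at the center of this result is easy to simulate directly. The sketch below (our own illustration, not the paper's method) counts the faces of an $n$-edge dipole embedding by tracing the orbits of the standard face-tracing rule on a rotation system, and estimates $\mathbb{E}[F]$ by Monte Carlo over uniformly random rotations at the two vertices.

```python
import random

def count_faces(rot_u, rot_v):
    """Faces of the 2-cell embedding of an n-edge dipole (two vertices u, v
    joined by edges 0..n-1) determined by the cyclic edge orders rot_u, rot_v."""
    n = len(rot_u)
    succ_u = {rot_u[i]: rot_u[(i + 1) % n] for i in range(n)}
    succ_v = {rot_v[i]: rot_v[(i + 1) % n] for i in range(n)}
    # Darts (d, e): direction d = 0 means traversing edge e from u to v, 1 means v to u.
    unvisited = {(d, e) for d in (0, 1) for e in range(n)}
    faces = 0
    while unvisited:
        dart = next(iter(unvisited))
        while dart in unvisited:
            unvisited.remove(dart)
            d, e = dart
            # Arriving at the head vertex via edge e, leave along the rotation
            # successor of e at that vertex, back toward the other vertex.
            dart = (1 - d, succ_v[e] if d == 0 else succ_u[e])
        faces += 1  # one orbit of the face-tracing rule = one face
    return faces

def expected_faces(n, trials=2000, seed=0):
    """Monte Carlo estimate of E[F] for the n-edge dipole."""
    rng = random.Random(seed)
    total = 0
    for _ in range(trials):
        rot_u = list(range(n)); rng.shuffle(rot_u)
        rot_v = list(range(n)); rng.shuffle(rot_v)
        total += count_faces(rot_u, rot_v)
    return total / trials
```

By Euler's formula ($2 - n + F = 2 - 2g$ for the dipole), every face count satisfies $F \equiv n \pmod 2$ and $1 \le F \le n$, which gives a quick sanity check on the simulation.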
Finite-state transducers (FSTs) are frequently used in speech recognition. Transducer composition is an essential operation for combining different sources of information at different granularities. However, composition is also one of the more computationally expensive operations. Due to the heterogeneous structure of FSTs, existing parallel algorithms for composition are suboptimal in efficiency, generality, or both. We propose an algorithm for parallel composition and implement it on graphics processing units. We benchmark our parallel algorithm on the composition of random graphs and the composition of graphs commonly used in speech recognition. The parallel composition scales better with the size of the input graphs and, for large graphs, can be as much as 10 to 30 times faster than a sequential CPU algorithm.
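For intuition about the operation being parallelized, composition is a product construction: states of the composed machine are pairs of states, and an arc of the first transducer pairs with an arc of the second when the output label of the former matches the input label of the latter. The sketch below is a minimal sequential, epsilon-free illustration (our own, not the paper's parallel algorithm); arcs are tuples `(src, ilabel, olabel, dst)`.

```python
from collections import deque

def compose(arcsA, startA, finalsA, arcsB, startB, finalsB):
    """Epsilon-free FST composition by BFS over reachable state pairs."""
    byA, byB = {}, {}
    for a in arcsA:
        byA.setdefault(a[0], []).append(a)
    for b in arcsB:
        byB.setdefault(b[0], []).append(b)
    start = (startA, startB)
    out_arcs, finals = [], set()
    seen, queue = {start}, deque([start])
    while queue:
        sA, sB = queue.popleft()
        if sA in finalsA and sB in finalsB:
            finals.add((sA, sB))
        for (_, i, o, dA) in byA.get(sA, []):
            for (_, i2, o2, dB) in byB.get(sB, []):
                if o == i2:  # output of A must match input of B
                    dst = (dA, dB)
                    out_arcs.append(((sA, sB), i, o2, dst))
                    if dst not in seen:
                        seen.add(dst)
                        queue.append(dst)
    return out_arcs, start, finals
```

The nested loop over arc pairs, whose work varies wildly across state pairs, is exactly the load-balancing difficulty a GPU implementation must address.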
For extensive coverage areas, multi-beam high throughput satellite (MB-HTS) communication is a promising technology that plays a crucial role in delivering broadband services to many users with diverse Quality of Service (QoS) requirements. This paper focuses on MB-HTS systems where all beams reuse the same spectrum. In particular, we propose a novel user scheduling and power allocation design capable of providing guarantees in terms of the individual QoS requirements while maximizing the system throughput under a limited power budget. Precoding is employed in the forward link to mitigate mutual interference at the users in multiple-access scenarios over different coherence time intervals. The combinatorial optimization structure of user scheduling requires an extremely high cost to obtain the global optimum, even when a reduced number of users fits into a time slot. Therefore, we propose a heuristic algorithm yielding a good trade-off between performance and computational complexity, applicable to a static operation framework of geostationary (GEO) satellite networks. Although the power allocation problem is a signomial program, which is non-convex in standard form, its solution can be lower bounded by the global optimum of a geometric program with a hidden convex structure. A local solution to the joint user scheduling and power allocation problem is consequently obtained by a successive optimization approach. Numerical results demonstrate the effectiveness of our algorithms on large-scale systems by providing better QoS satisfaction combined with outstanding overall system throughput.
Inspired by the developments in quantum computing, building quantum-inspired classical hardware to solve computationally hard problems has been receiving increasing attention. By introducing systematic sparsification techniques, we propose and demonstrate a massively parallel architecture, termed sIM or the sparse Ising Machine. Exploiting the sparsity of the resultant problem graphs, the sIM achieves ideal parallelism: the key figure of merit, flips per second, scales linearly with the total number of probabilistic bits (p-bits) in the system. This makes sIM up to 6 orders of magnitude faster than a CPU implementing standard Gibbs sampling. When compared to optimized implementations on TPUs and GPUs, the sIM delivers a measured speedup of roughly 5-18x. In benchmark combinatorial optimization problems such as integer factorization, the sIM can reliably factor semi-primes of up to 32 bits, far larger than previous attempts by D-Wave and other probabilistic solvers. Strikingly, the sIM beats competition-winning SAT solvers (by roughly 4-700x in runtime to reach 95% accuracy) in solving hard instances of the 3SAT problem. A surprising observation is that even when the asynchronous sampling is made inexact by simultaneous updates using faster clocks, sIM can find the correct ground state with further speedup. The problem encoding and sparsification techniques we introduce can be readily applied to other Ising Machines (classical and quantum), and the asynchronous architecture we present can be used to scale the demonstrated 5,000$-$10,000 p-bits to 1,000,000 or more through CMOS or emerging nanodevices.
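For reference, the Gibbs dynamics that a p-bit network emulates can be sketched in a few lines of sequential software; this is a minimal toy model (function names and instances are ours), whereas the sIM performs these local updates asynchronously in massively parallel hardware.

```python
import math
import random

def gibbs_pbits(J, h, beta, sweeps, seed=0):
    """Sequential Gibbs sampling of an Ising model with states m_i in {-1, +1}.
    J: dict {(i, j): coupling} with each symmetric pair listed once; h: biases."""
    n = len(h)
    nbrs = [[] for _ in range(n)]
    for (i, j), w in J.items():
        nbrs[i].append((j, w))
        nbrs[j].append((i, w))
    rng = random.Random(seed)
    m = [rng.choice((-1, 1)) for _ in range(n)]
    for _ in range(sweeps):
        for i in range(n):  # in the sIM these updates run asynchronously in parallel
            field = h[i] + sum(w * m[j] for j, w in nbrs[i])
            # p-bit update: set m_i = +1 with probability (1 + tanh(beta * field)) / 2
            m[i] = 1 if rng.random() < 0.5 * (1 + math.tanh(beta * field)) else -1
    return m
```

On a sparse graph each update touches only a few neighbors, which is why flips per second can scale linearly with the number of p-bits once updates of non-adjacent spins are run concurrently.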
We consider a contextual bandit problem with a combinatorial action set and time-varying base arm availability. At the beginning of each round, the agent observes the set of available base arms and their contexts and then selects an action that is a feasible subset of the set of available base arms to maximize its cumulative reward in the long run. We assume that the mean outcomes of base arms are samples from a Gaussian Process indexed by the context set ${\cal X}$, and the expected reward is Lipschitz continuous in expected base arm outcomes. For this setup, we propose an algorithm called Optimistic Combinatorial Learning and Optimization with Kernel Upper Confidence Bounds (O'CLOK-UCB) and prove that it incurs $\tilde{O}(K\sqrt{T\overline{\gamma}_{T}} )$ regret with high probability, where $\overline{\gamma}_{T}$ is the maximum information gain associated with the set of base arm contexts that appeared in the first $T$ rounds and $K$ is the maximum cardinality of any feasible action over all rounds. To dramatically speed up the algorithm, we also propose a variant of O'CLOK-UCB that uses sparse GPs. Finally, we experimentally show that both algorithms exploit inter-base arm outcome correlation and vastly outperform the previous state-of-the-art UCB-based algorithms in realistic setups.
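To illustrate the kind of index such a UCB-based algorithm computes, the sketch below forms GP posterior means and variances for base arm contexts and greedily selects a top-$K$ action. It assumes a 1-D context set, an RBF kernel, and a reward equal to the sum of base arm outcomes; these are illustrative choices of ours, not details fixed by the paper.

```python
import math

def solve(A, b):
    """Gaussian elimination with partial pivoting, for small dense systems."""
    n = len(A)
    M = [row[:] + [bi] for row, bi in zip(A, b)]
    for c in range(n):
        p = max(range(c, n), key=lambda r: abs(M[r][c]))
        M[c], M[p] = M[p], M[c]
        for r in range(c + 1, n):
            f = M[r][c] / M[c][c]
            for k in range(c, n + 1):
                M[r][k] -= f * M[c][k]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (M[r][n] - sum(M[r][k] * x[k] for k in range(r + 1, n))) / M[r][r]
    return x

def gp_ucb_indices(X, y, X_new, beta=2.0, ell=1.0, noise=0.1):
    """UCB index mu(x) + beta * sigma(x) from a GP posterior with RBF kernel."""
    k = lambda a, b: math.exp(-(a - b) ** 2 / (2 * ell ** 2))
    n = len(X)
    K = [[k(X[i], X[j]) + (noise ** 2 if i == j else 0.0) for j in range(n)]
         for i in range(n)]
    alpha = solve(K, y)
    out = []
    for x in X_new:
        ks = [k(x, xi) for xi in X]
        mu = sum(a * b for a, b in zip(ks, alpha))
        v = solve(K, ks)
        var = max(k(x, x) - sum(a * b for a, b in zip(ks, v)), 0.0)
        out.append(mu + beta * math.sqrt(var))
    return out

def select_action(ucb, K_max):
    """Greedy top-K selection, valid when the reward is the sum of outcomes."""
    return sorted(range(len(ucb)), key=lambda i: -ucb[i])[:K_max]
```

A context far from all observed ones gets a large posterior variance and hence a large index, which is the exploration mechanism the regret analysis relies on.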
In this work, we consider the distributed optimization of non-smooth convex functions using a network of computing units. We investigate this problem under two regularity assumptions: (1) the Lipschitz continuity of the global objective function, and (2) the Lipschitz continuity of local individual functions. Under the local regularity assumption, we provide the first optimal first-order decentralized algorithm called multi-step primal-dual (MSPD) and its corresponding optimal convergence rate. A notable aspect of this result is that, for non-smooth functions, while the dominant term of the error is in $O(1/\sqrt{t})$, the structure of the communication network only impacts a second-order term in $O(1/t)$, where $t$ is time. In other words, the error due to limits in communication resources decreases at a fast rate even in the case of non-strongly-convex objective functions. Under the global regularity assumption, we provide a simple yet efficient algorithm called distributed randomized smoothing (DRS) based on a local smoothing of the objective function, and show that DRS is within a $d^{1/4}$ multiplicative factor of the optimal convergence rate, where $d$ is the underlying dimension.
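The local smoothing behind DRS replaces $f$ by $f_\gamma(x) = \mathbb{E}[f(x + \gamma Z)]$ for a random perturbation $Z$; a (sub)gradient of $f$ at a perturbed point is then an unbiased stochastic gradient of the smooth surrogate $f_\gamma$. A minimal Monte Carlo sketch with Gaussian perturbations (ours, with illustrative function names):

```python
import math
import random

def smoothed(f, x, gamma, n_samples=20000, seed=0):
    """Monte Carlo estimate of f_gamma(x) = E[f(x + gamma * Z)], Z ~ N(0, I)."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(n_samples):
        z = [rng.gauss(0.0, 1.0) for _ in x]
        total += f([xi + gamma * zi for xi, zi in zip(x, z)])
    return total / n_samples

def smoothed_subgrad(grad_f, x, gamma, n_samples=1000, seed=0):
    """Average of (sub)gradients at perturbed points: an unbiased stochastic
    gradient of the smoothed objective f_gamma at x."""
    rng = random.Random(seed)
    g = [0.0] * len(x)
    for _ in range(n_samples):
        z = [rng.gauss(0.0, 1.0) for _ in x]
        gi = grad_f([xi + gamma * zi for xi, zi in zip(x, z)])
        g = [a + b for a, b in zip(g, gi)]
    return [a / n_samples for a in g]
```

For example, smoothing $f(x) = |x|$ at $x = 0$ gives $f_\gamma(0) = \gamma\,\mathbb{E}|Z| = \gamma\sqrt{2/\pi}$, which the estimator above recovers.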
Latent Dirichlet Allocation (LDA) is a topic model widely used in natural language processing and machine learning. Most approaches to training the model rely on iterative algorithms, which makes it difficult to run LDA on big corpora that are best analyzed in parallel and distributed computational environments. Indeed, current approaches to parallel inference either do not converge to the correct posterior or require storage of large dense matrices in memory. We present a novel sampler that overcomes both problems, and we show that this sampler is faster, both empirically and theoretically, than previous Gibbs samplers for LDA. We do so by employing a novel P\'olya-urn-based approximation in the sparse partially collapsed sampler for LDA. We prove that the approximation error vanishes with data size, making our algorithm asymptotically exact, a property of importance for large-scale topic models. In addition, we show, via an explicit example, that -- contrary to popular belief in the topic modeling literature -- partially collapsed samplers can be more efficient than fully collapsed samplers. We conclude by comparing the performance of our algorithm with that of other approaches on well-known corpora.