We consider the problem of efficiently scheduling jobs with precedence constraints on a set of identical machines in the presence of a uniform communication delay. In this setting, if two precedence-constrained jobs $u$ and $v$, with ($u \prec v$), are scheduled on different machines, then $v$ must start at least $\rho$ time units after $u$ completes. The scheduling objective is to minimize makespan, i.e. the total time between when the first job starts and the last job completes. The focus of this paper is to provide an efficient approximation algorithm with near-linear running time. We build on the algorithm of Lepere and Rapine [STACS 2002] for this problem to give an $O\left(\frac{\ln \rho}{\ln \ln \rho} \right)$-approximation algorithm that runs in $\tilde{O}(|V| + |E|)$ time.
Dynamic time warping distance (DTW) is a widely used distance measure between time series $x, y \in \Sigma^n$. It was shown by Abboud, Backurs, and Williams that in the \emph{binary case}, where $|\Sigma| = 2$, DTW can be computed in time $O(n^{1.87})$. We improve this running time $O(n)$. Moreover, if $x$ and $y$ are run-length encoded, then there is an algorithm running in time $\tilde{O}(k + \ell)$, where $k$ and $\ell$ are the number of runs in $x$ and $y$, respectively. This improves on the previous best bound of $O(k\ell)$ due to Dupont and Marteau.
We consider the problem of coded distributed computing where a large linear computational job, such as a matrix multiplication, is divided into $k$ smaller tasks, encoded using an $(n,k)$ linear code, and performed over $n$ distributed nodes. The goal is to reduce the average execution time of the computational job. We provide a connection between the problem of characterizing the average execution time of a coded distributed computing system and the problem of analyzing the error probability of codes of length $n$ used over erasure channels. Accordingly, we present closed-form expressions for the execution time using binary random linear codes and the best execution time any linear-coded distributed computing system can achieve. It is also shown that there exist \textit{good} binary linear codes that not only attain (asymptotically) the best performance that any linear code (not necessarily binary) can achieve but also are numerically stable against the inevitable rounding errors in practice. We then develop a low-complexity algorithm for decoding Reed-Muller (RM) codes over erasure channels. Our decoder only involves additions, subtractions, {and inversion of relatively small matrices of dimensions at most $\log n+1$}, and enables coded computation over real-valued data. Extensive numerical analysis of the fundamental results as well as RM- and polar-coded computing schemes demonstrate the excellence of the RM-coded computation in achieving close-to-optimal performance while having a low-complexity decoding and explicit construction. The proposed framework in this paper enables efficient designs of distributed computing systems given the rich literature in the channel coding theory.
Introduction of PMUs to cyber-physical system pro-vides accurate data acquisition, while posing additional risk of being the victim of cyber attack. Both False Data Injection Attack (FDIA) and GPS-spoofing or timing attack can provide malicious data to the cyber system, though these two attacks require different post-attack contingency plan. Thus accurate detection of timing attack and separating it from conventional FDIA has become a very important research area. In this article, a successful detection of timing attack mechanism is proposed. Firstly, a method to distinguish timing attack and FDIA using unwrapped phase angle data is developed. Secondly, utilizing low rank Henkel matrix property to differentiate timing attack from electrical events is also presented. Finally, an experimental validation of proposed model is performed on IEEE 13 bus system using simulated GPS-spoofing attack. It can be observed that the timing attack can increase the rank 1 approximation error of Henkel matrix of unwrapped angles by 700% for 3 sec delay in GPS time-stamp. The rank 1 approximation error is increased by 500% for 2 sec delay and the increase is insignificant for 1sec delay attack. FDIA doesn't show any significant change in the low rank approximation profile of Henkel matrix.
In this paper, we present a polynomial-time algorithm for the maximum clique problem, which implies P = NP. Our algorithm is based on a continuous game-theoretic representation of this problem and at its heart lies a discrete-time dynamical system. The rule of our dynamical system depends on a parameter such that if this parameter is equal to the maximum-clique size, the iterates of our dynamical system are guaranteed to converge to a maximum clique.
The distributed subgraph detection asks, for a fixed graph $H$, whether the $n$-node input graph contains $H$ as a subgraph or not. In the standard CONGEST model of distributed computing, the complexity of clique/cycle detection and listing has received a lot of attention recently. In this paper we consider the induced variant of subgraph detection, where the goal is to decide whether the $n$-node input graph contains $H$ as an \emph{induced} subgraph or not. We first show a $\tilde{\Omega}(n)$ lower bound for detecting the existence of an induced $k$-cycle for any $k\geq 4$ in the CONGEST model. This lower bound is tight for $k=4$, and shows that the induced variant of $k$-cycle detection is much harder than the non-induced version. This lower bound is proved via a reduction from two-party communication complexity. We complement this result by showing that for $5\leq k\leq 7$, this $\tilde{\Omega}(n)$ lower bound cannot be improved via the two-party communication framework. We then show how to prove stronger lower bounds for larger values of $k$. More precisely, we show that detecting an induced $k$-cycle for any $k\geq 8$ requires $\tilde{\Omega}(n^{2-\Theta{(1/k)}})$ rounds in the CONGEST model, nearly matching the known upper bound $\tilde{O}(n^{2-\Theta{(1/k)}})$ of the general $k$-node subgraph detection (which also applies to the induced version) by Eden, Fiat, Fischer, Kuhn, and Oshman~[DISC 2019]. Finally, we investigate the case where $H$ is the diamond (the diamond is obtained by adding an edge to a 4-cycle, or equivalently removing an edge from a 4-clique), and show non-trivial upper and lower bounds on the complexity of the induced version of diamond detecting and listing.
We consider the problem of scheduling to minimize mean response time in M/G/1 queues where only estimated job sizes (processing times) are known to the scheduler, where a job of true size $s$ has estimated size in the interval $[\beta s, \alpha s]$ for some $\alpha \geq \beta > 0$. We evaluate each scheduling policy by its approximation ratio, which we define to be the ratio between its mean response time and that of Shortest Remaining Processing Time (SRPT), the optimal policy when true sizes are known. Our question: is there a scheduling policy that (a) has approximation ratio near 1 when $\alpha$ and $\beta$ are near 1, (b) has approximation ratio bounded by some function of $\alpha$ and $\beta$ even when they are far from 1, and (c) can be implemented without knowledge of $\alpha$ and $\beta$? We first show that naively running SRPT using estimated sizes in place of true sizes is not such a policy: its approximation ratio can be arbitrarily large for any fixed $\beta < 1$. We then provide a simple variant of SRPT for estimated sizes that satisfies criteria (a), (b), and (c). In particular, we prove its approximation ratio approaches 1 uniformly as $\alpha$ and $\beta$ approach 1. This is the first result showing this type of convergence for M/G/1 scheduling. We also study the Preemptive Shortest Job First (PSJF) policy, a cousin of SRPT. We show that, unlike SRPT, naively running PSJF using estimated sizes in place of true sizes satisfies criteria (b) and (c), as well as a weaker version of (a).
This paper studies a classic maximum entropy sampling problem (MESP), which aims to select the most informative principal submatrix of a prespecified size from a covariance matrix. MESP has been widely applied to many areas, including healthcare, power system, manufacturing and data science. By investigating its Lagrangian dual and primal characterization, we derive a novel convex integer program for MESP and show that its continuous relaxation yields a near-optimal solution. The results motivate us to study an efficient sampling algorithm and develop its approximation bound for MESP, which improves the best-known bound in literature. We then provide an efficient deterministic implementation of the sampling algorithm with the same approximation bound. By developing new mathematical tools for the singular matrices and analyzing the Lagrangian dual of the proposed convex integer program, we investigate the widely-used local search algorithm and prove its first-known approximation bound for MESP. The proof techniques further inspire us with an efficient implementation of the local search algorithm. Our numerical experiments demonstrate that these approximation algorithms can efficiently solve medium-sized and large-scale instances to near-optimality. Our proposed algorithms are coded and released as open-source software. Finally, we extend the analyses to the A-Optimal MESP (A-MESP), where the objective is to minimize the trace of the inverse of the selected principal submatrix.
We develop a methodology to automatically compute worst-case performance bounds for a class of decentralized algorithms that optimize the average of local functions distributed across a network. We extend the recently proposed PEP approach to decentralized optimization. This approach allows computing the exact worst-case performance and worst-case instance of centralized algorithms by solving an SDP. We obtain an exact formulation when the network matrix is given, and a relaxation when considering entire classes of network matrices characterized by their spectral range. We apply our methodology to the decentralized (sub)gradient method, obtain a nearly tight worst-case performance bound that significantly improves over the literature, and gain insights into the worst communication networks for a given spectral range.
We introduce a notion of "simulation" for labelled graphs, in which edges of the simulated graph are realized by regular expressions in the simulating graph, and prove that the tiling problem (aka "domino problem") for the simulating graph is at least as difficult as that for the simulated graph. We apply this to the Cayley graph of the "lamplighter group" $L=\mathbb Z/2\wr\mathbb Z$, and more generally to "Diestel-Leader graphs". We prove that these graphs simulate the plane, and thus deduce that the seeded tiling problem is unsolvable on the group $L$. We note that $L$ does not contain any plane in its Cayley graph, so our undecidability criterion by simulation covers cases not covered by Jeandel's criterion based on translation-like action of a product of finitely generated infinite groups. Our approach to tiling problems is strongly based on categorical constructions in graph theory.
We show that for the problem of testing if a matrix $A \in F^{n \times n}$ has rank at most $d$, or requires changing an $\epsilon$-fraction of entries to have rank at most $d$, there is a non-adaptive query algorithm making $\widetilde{O}(d^2/\epsilon)$ queries. Our algorithm works for any field $F$. This improves upon the previous $O(d^2/\epsilon^2)$ bound (SODA'03), and bypasses an $\Omega(d^2/\epsilon^2)$ lower bound of (KDD'14) which holds if the algorithm is required to read a submatrix. Our algorithm is the first such algorithm which does not read a submatrix, and instead reads a carefully selected non-adaptive pattern of entries in rows and columns of $A$. We complement our algorithm with a matching query complexity lower bound for non-adaptive testers over any field. We also give tight bounds of $\widetilde{\Theta}(d^2)$ queries in the sensing model for which query access comes in the form of $\langle X_i, A\rangle:=tr(X_i^\top A)$; perhaps surprisingly these bounds do not depend on $\epsilon$. We next develop a novel property testing framework for testing numerical properties of a real-valued matrix $A$ more generally, which includes the stable rank, Schatten-$p$ norms, and SVD entropy. Specifically, we propose a bounded entry model, where $A$ is required to have entries bounded by $1$ in absolute value. We give upper and lower bounds for a wide range of problems in this model, and discuss connections to the sensing model above.