We show the equivalence of two distributed computing models, reconfigurable broadcast networks (RBN) and asynchronous shared-memory systems (ASMS), that were introduced independently. Both RBN and ASMS are systems in which a collection of anonymous, finite-state processes run the same protocol. In RBN, the processes communicate by selective broadcast: a process can broadcast a message, which is received by all of its neighbors, and the set of neighbors of a process can change arbitrarily over time. In ASMS, the processes communicate by shared memory: a process can either write to or read from a shared register. Our main result is that RBN and ASMS can simulate each other, i.e., they are equivalent with respect to parameterized reachability: given two (possibly infinite) sets of configurations $C$ and $C'$, defined by upper and lower bounds on the number of processes in each state, decide whether some configuration in $C$ can reach some configuration in $C'$. Using this simulation equivalence, we transfer results from RBN to ASMS and vice versa. Finally, we show that RBN and ASMS can simulate a third distributed model called immediate observation (IO) nets. Moreover, for a slightly stronger notion of simulation (which is satisfied by all the simulations given in this paper), we show that IO nets cannot simulate RBN.
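To make the RBN semantics concrete, here is a small Python sketch (the toy protocol, its states, and its message are our own illustrative assumptions, not from the paper) of the counting semantics of a single broadcast step: since neighbourhoods can be reconfigured arbitrarily, any subset of the remaining processes may receive a broadcast message.

```python
from collections import Counter
from itertools import product

# Hypothetical toy protocol (an assumption for illustration):
# broadcast[q] = (message, successor state) when a process in q broadcasts
# receive[(q, m)] = successor state when a process in q receives m
broadcast = {"idle": ("ping", "done")}
receive = {("idle", "ping"): "done"}

def broadcast_steps(config):
    """Yield all configurations reachable in one broadcast step."""
    for q in list(config):
        if q not in broadcast:
            continue
        msg, q2 = broadcast[q]
        rest = config - Counter([q])
        # only states with a receive rule for msg can react; the scheduler
        # may disconnect any subset of them, so enumerate receiver counts
        reactive = [s for s in rest if (s, msg) in receive]
        for counts in product(*[range(rest[s] + 1) for s in reactive]):
            new = Counter(rest)
            for s, k in zip(reactive, counts):
                new[s] -= k
                new[receive[(s, msg)]] += k
            new[q2] += 1
            yield +new          # unary + drops states with count zero

for succ in broadcast_steps(Counter({"idle": 3})):
    print(dict(succ))
```

Representing configurations as multisets of states (counters) is exactly what makes the parameterized reachability question well defined for arbitrarily many processes.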
We revisit on-average algorithmic stability of gradient descent (GD) for training overparameterised shallow neural networks, and we prove new generalisation and excess risk bounds without the neural tangent kernel (NTK) or Polyak-Łojasiewicz (PL) assumptions. In particular, we show oracle-type bounds which reveal that the generalisation and excess risk of GD are controlled by an interpolating network with the shortest GD path from initialisation (in a sense, an interpolating network with the smallest relative norm). While this was known for kernelised interpolants, our proof applies directly to networks trained by GD, without intermediate kernelisation. At the same time, by relaxing the oracle inequalities developed here we recover existing NTK-based risk bounds in a straightforward way, which demonstrates that our analysis is tighter. Finally, unlike most NTK-based analyses, we focus on regression with label noise and show that GD with early stopping is consistent.
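As an illustration of the training setup studied above, the following minimal sketch (the architecture, step size, and stopping rule are illustrative choices, not the paper's exact setting) runs GD on a shallow ReLU network for regression with label noise and stops early using a held-out set.

```python
# Minimal sketch: GD on a shallow ReLU network with a fixed outer layer,
# noisy regression labels, and validation-based early stopping.
import numpy as np

rng = np.random.default_rng(0)
n, d, m = 200, 5, 256                       # samples, input dim, width
X = rng.normal(size=(n, d))
y = np.sin(X[:, 0]) + 0.3 * rng.normal(size=n)    # labels with noise
Xtr, ytr, Xva, yva = X[:150], y[:150], X[150:], y[150:]

W = rng.normal(size=(d, m)) / np.sqrt(d)          # trained hidden weights
a = rng.choice([-1.0, 1.0], size=m) / np.sqrt(m)  # fixed outer weights

def predict(W, X):
    return np.maximum(X @ W, 0.0) @ a

lr, best, best_W, patience = 0.1, np.inf, W, 0
for t in range(5000):
    H = np.maximum(Xtr @ W, 0.0)
    r = H @ a - ytr                          # residuals
    # gradient of 0.5 * mean squared error w.r.t. hidden weights
    G = Xtr.T @ ((r[:, None] * a[None, :]) * (H > 0)) / len(ytr)
    W = W - lr * G
    val = np.mean((predict(W, Xva) - yva) ** 2)
    if val < best:
        best, best_W, patience = val, W.copy(), 0
    else:
        patience += 1
        if patience > 100:                   # early stopping
            break

print(f"stopped at t={t}, best validation MSE={best:.4f}")
```

Early stopping is what keeps the GD path from initialisation short, which is precisely the quantity the oracle bounds above are phrased in terms of.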
Deciding whether a diagram of a knot can be untangled with a given number of moves (given as part of the input) is known to be NP-complete. In this paper we determine the parameterized complexity of this problem with respect to a natural parameter called the defect. Roughly speaking, the defect measures the efficiency of the moves used in a shortest untangling sequence of Reidemeister moves. We show that the II$^-$ moves in a shortest untangling sequence can essentially be performed greedily. Using this, we show that the problem belongs to W[P] when parameterized by the defect. We also show that the problem is W[P]-hard, by a reduction from Minimum Axiom Set.
We consider the Gathering problem for $n$ autonomous mobile robots with persistent memory, called lights, under an asynchronous scheduler (ASYNC). It is well known that Gathering is impossible in basic common models when robots have no lights, even if the system is semi-synchronous (SSYNC) or centralized (only one robot is active at each time). It is known that Gathering can be solved by robots with 10 colors of lights in ASYNC. This result is obtained by combining two results: (1) a simulation of SSYNC robots with $k$ colors by ASYNC robots with $5k$ colors, and (2) a Gathering algorithm for SSYNC robots with 2 colors. In this paper, we improve on this result by reducing the number of colors: we show that Gathering can be solved by ASYNC robots with 3 colors of lights. We also show how to construct a simulation of any unfair SSYNC algorithm using $k$ colors by ASYNC robots with $3k$ colors, where an unfair scheduler need not activate every robot infinitely often. Combining this simulation with the Gathering algorithm for SSYNC robots with 2 colors, we obtain a Gathering algorithm for ASYNC robots with 6 colors. Our main result is then obtained by reducing the number of colors from 6 to 3.
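For readers unfamiliar with the model, the following Python skeleton (our own illustration; the toy movement rule is not a Gathering algorithm, let alone the 3-color one) shows the Look-Compute-Move cycle of luminous robots under an SSYNC scheduler, where an arbitrary non-empty subset of robots is activated in each round.

```python
# Skeleton of the Look-Compute-Move model with lights under SSYNC:
# all activated robots take a snapshot first, then move atomically.
import random

def toy_protocol(me, snapshot):
    """Hypothetical rule: move to the centroid and keep color 'A'.
    A real Gathering algorithm with 3 colors is far more involved."""
    xs = [p for p, _ in snapshot]
    centroid = sum(xs) / len(xs)
    return centroid, "A"            # (destination, new light color)

def ssync_round(positions, colors, protocol):
    active = random.sample(range(len(positions)),
                           k=random.randint(1, len(positions)))
    snapshot = list(zip(positions, colors))   # every active robot looks
    for i in active:                          # then all of them move
        positions[i], colors[i] = protocol(i, snapshot)

positions = [0.0, 4.0, 10.0]
colors = ["A", "A", "A"]
for _ in range(20):
    ssync_round(positions, colors, toy_protocol)
print(positions)    # robots converge toward a point under this toy rule
```

Under ASYNC, by contrast, a robot may move based on an arbitrarily outdated snapshot, which is exactly what the color-based simulations above have to compensate for.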
Large ML models and datasets have necessitated the use of multi-GPU systems for distributed model training. To harness the power offered by multi-GPU systems, it is critical to eliminate bottlenecks in inter-GPU communication, a problem made challenging by the heterogeneous nature of interconnects. In this work, we present TACCL, a synthesizer for collective communication primitives for large-scale multi-GPU systems. TACCL encodes a profiled topology and input size into a synthesis problem to generate optimized communication algorithms. TACCL is built on top of the standard NVIDIA Collective Communication Library (NCCL), allowing it to be a drop-in replacement for GPU communication in frameworks like PyTorch with minimal changes. TACCL generates algorithms for communication primitives like Allgather, Alltoall, and Allreduce that are up to $3\times$ faster than NCCL. Using TACCL's algorithms speeds up the end-to-end training of an internal mixture-of-experts model by $17\%$. By decomposing the optimization problem into parts and leveraging the symmetry in multi-GPU topologies, TACCL synthesizes collectives for up to 80 GPUs in less than 3 minutes, at least two orders of magnitude faster than other state-of-the-art synthesis-based collective communication libraries.
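To illustrate what a collective communication algorithm looks like, here is the textbook ring Allgather schedule in Python; this is the classic hand-designed baseline over which a synthesizer like TACCL searches for topology-aware alternatives, not a TACCL-generated schedule.

```python
# Textbook ring Allgather: in step s, every rank forwards to its right
# neighbour the chunk it received in step s-1; after n-1 steps each of
# the n ranks holds all n chunks.
def ring_allgather(buffers):
    n = len(buffers)
    out = [[None] * n for _ in range(n)]
    for r in range(n):
        out[r][r] = buffers[r]            # each rank starts with its chunk
    for step in range(n - 1):
        prev = [row[:] for row in out]    # all sends in a step are parallel
        for r in range(n):
            src = (r - step) % n          # chunk rank r forwards this step
            out[(r + 1) % n][src] = prev[r][src]
    return out

print(ring_allgather(["a", "b", "c", "d"]))
```

On a homogeneous ring this schedule is bandwidth-optimal, but on heterogeneous interconnects (e.g. mixed NVLink and PCIe paths) a fixed ring can be far from the best routing, which is the gap a topology-profiling synthesizer targets.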
This paper studies the Group Influence with Minimum Cost problem, which aims to find a seed set of smallest cost that influences all target groups, where each user is associated with a cost and a group is influenced if the total score of the influenced users belonging to the group is at least a certain threshold. As the group-influence function is neither submodular nor supermodular, theoretical bounds on the quality of solutions returned by the well-known greedy approach may not be guaranteed. To address this challenge, we propose a bi-criteria polynomial-time approximation algorithm whose guarantee holds with high probability. At the heart of the algorithm is a novel concept of group reachable reverse samples, which helps speed up the estimation of the group influence function. Finally, extensive experiments conducted on real social networks show that our proposed algorithm outperforms the state-of-the-art algorithms in terms of both objective value and running time.
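The following deliberately simplified sketch (influence is modelled as plain graph reachability rather than the paper's sampling-based estimator; all data are toy assumptions) shows the overall shape of a greedy, cost-scaled seed selection that runs until every group reaches its threshold.

```python
# Greedy min-cost seed selection for thresholded group influence,
# with influence simplified to deterministic reachability.
def reachable(graph, seeds):
    seen, stack = set(seeds), list(seeds)
    while stack:
        u = stack.pop()
        for v in graph[u]:
            if v not in seen:
                seen.add(v); stack.append(v)
    return seen

def coverage(influenced, groups, score, thr):
    # credit is capped at each group's threshold, so the objective stops
    # rewarding a group once it is already influenced
    return sum(min(thr[g], sum(score[v] for v in groups[g] & influenced))
               for g in groups)

def greedy_min_cost(graph, cost, score, groups, thr):
    seeds, target = set(), sum(thr.values())
    while coverage(reachable(graph, seeds), groups, score, thr) < target:
        base = coverage(reachable(graph, seeds), groups, score, thr)
        gain = lambda v: (coverage(reachable(graph, seeds | {v}),
                                   groups, score, thr) - base) / cost[v]
        best = max((v for v in graph if v not in seeds), key=gain)
        if gain(best) <= 0:       # no candidate makes progress; give up
            break
        seeds.add(best)
    return seeds

graph = {1: [2], 2: [3], 3: [], 4: [3]}
print(greedy_min_cost(graph, cost={1: 2, 2: 1, 3: 1, 4: 1},
                      score={1: 1, 2: 1, 3: 2, 4: 1},
                      groups={"g": {3, 4}}, thr={"g": 2}))   # -> {2}
```

The capped objective is exactly where submodularity breaks: crossing a group's threshold can create a jump in marginal gain, which is why plain greedy has no guarantee and a bi-criteria analysis is needed.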
Given a digraph $G$, a set $X\subseteq V(G)$ is said to be an absorbing set (resp. dominating set) if every vertex in the graph is either in $X$ or is an in-neighbour (resp. out-neighbour) of a vertex in $X$. A set $S\subseteq V(G)$ is said to be an independent set if no two vertices in $S$ are adjacent in $G$. A kernel (resp. solution) of $G$ is an independent and absorbing (resp. dominating) set in $G$. We explore the algorithmic complexity of these problems in the well-known class of interval digraphs. A digraph $G$ is an interval digraph if a pair of intervals $(S_u,T_u)$ can be assigned to each vertex $u$ of $G$ such that $(u,v)\in E(G)$ if and only if $S_u\cap T_v\neq\emptyset$. Many different subclasses of interval digraphs have been defined and studied in the literature by restricting the kinds of pairs of intervals that can be assigned to the vertices. We observe that several of these classes, like interval catch digraphs, interval nest digraphs, adjusted interval digraphs and chronological interval digraphs, are subclasses of the more general class of reflexive interval digraphs, which arise when we require that the two intervals assigned to a vertex intersect. We show that all the problems mentioned above are efficiently solvable, in most cases even linear-time solvable, in the class of reflexive interval digraphs, but are APX-hard even on the very restricted class of interval digraphs called point-point digraphs, where the two intervals assigned to each vertex are required to be degenerate, i.e., each consists of a single point. The results we obtain improve and generalize several existing algorithms and structural results for subclasses of reflexive interval digraphs.
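The defining condition is easy to operationalize; the following short Python sketch (with made-up intervals) builds an interval digraph from the pairs $(S_u, T_u)$ and checks reflexivity on a toy example.

```python
# Build an interval digraph: arc (u, v) whenever S_u intersects T_v.
def overlaps(a, b):
    return a[0] <= b[1] and b[0] <= a[1]   # closed intervals intersect

def interval_digraph(pairs):
    """pairs: {u: (S_u, T_u)} with each interval given as (lo, hi)."""
    return {(u, v) for u, (S, _) in pairs.items()
                   for v, (_, T) in pairs.items() if overlaps(S, T)}

# Reflexive interval digraphs require S_u and T_u to intersect, so every
# vertex carries a loop; point-point digraphs use degenerate intervals.
pairs = {"a": ((0, 2), (1, 3)), "b": ((2, 4), (3, 5))}
E = interval_digraph(pairs)
print(sorted(E))                           # [('a','a'), ('b','a'), ('b','b')]
assert all((u, u) in E for u in pairs)     # this example is reflexive
```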
This is a paper in the intersection of time series analysis and complexity theory that presents new results on permutation complexity in general and permutation entropy in particular. In this context, permutation complexity refers to the characterization of time series by means of ordinal patterns (permutations), entropic measures, decay rates of missing ordinal patterns, and more. Since the inception of this ``ordinal'' methodology, its practical application to any type of scalar time series and real-valued processes has proven to be simple and useful. However, the theoretical aspects have remained limited to noiseless deterministic series and dynamical systems, the main obstacle being the super-exponential growth of visible permutations with length when randomness (also in the form of observational noise) is present in the data. To overcome this difficulty, we take a new approach through complexity classes, which are defined precisely by the growth of visible permutations with length, regardless of the deterministic or noisy nature of the data. We consider three major classes: exponential, sub-factorial and factorial. The next step is to adapt the concept of Z-entropy to each of these classes; we call the resulting quantity permutation entropy because it coincides with the conventional permutation entropy on the exponential class. Z-entropies are a family of group entropies, each of them extensive on a given complexity class. The result is a unified approach to the ordinal analysis of deterministic and random processes, from dynamical systems to white noise, with new concepts and tools. Numerical simulations show that permutation entropy discriminates time series from all complexity classes.
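For concreteness, the following sketch computes the classical permutation entropy, i.e., the exponential-class case that the group entropies discussed above generalise (the window length and example signals are arbitrary choices).

```python
# Classical permutation entropy: extract ordinal patterns of length L
# and compute the normalised Shannon entropy of their distribution.
import math
import random
from collections import Counter

def ordinal_patterns(x, L):
    """Map each window of length L to the permutation that sorts it."""
    return [tuple(sorted(range(L), key=lambda i: x[t + i]))
            for t in range(len(x) - L + 1)]

def permutation_entropy(x, L):
    counts = Counter(ordinal_patterns(x, L))
    n = sum(counts.values())
    H = -sum((c / n) * math.log(c / n) for c in counts.values())
    return H / math.log(math.factorial(L))    # normalise to [0, 1]

deterministic = [math.sin(0.3 * t) for t in range(1000)]
noise = [random.random() for _ in range(1000)]
print(permutation_entropy(deterministic, 4))  # low: few visible patterns
print(permutation_entropy(noise, 4))          # near 1: all L! patterns
```

A deterministic signal makes only a few of the $L!$ patterns visible, while white noise makes all of them visible; the classes in the paper stratify processes precisely by how this count of visible patterns grows with $L$.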
We present an algorithm for the repair of parameterized systems. The repair problem is, for a given process implementation, to find a refinement such that a given property is satisfied by the resulting parameterized system, and deadlocks are avoided. Our algorithm employs a constraint-based search for candidate repairs, and uses a parameterized model checker to determine their correctness and to update the constraint system in case errors are reachable. We apply this algorithm to systems that can be represented as well-structured transition systems (WSTS), including disjunctive systems, pairwise rendezvous systems, and broadcast protocols. Moreover, we show that parameterized deadlock detection can be decided in NEXPTIME for disjunctive systems, vastly improving on the best known upper bound, and that it is in general undecidable for broadcast protocols.
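Schematically, the constraint-based search works as a counterexample-guided loop; the sketch below (with a stubbed-out solver and model checker, and a purely hypothetical toy instantiation) shows the control flow, not the paper's actual implementation.

```python
# Counterexample-guided repair loop: propose a candidate from the current
# constraints, model-check it, and block the error path if one is found.
def repair(initial_constraints, candidate_from, model_check, block):
    constraints = list(initial_constraints)
    while True:
        candidate = candidate_from(constraints)   # solve constraint system
        if candidate is None:
            return None                           # no repair exists
        counterexample = model_check(candidate)   # parameterized check
        if counterexample is None:
            return candidate                      # correct for all sizes
        # refine: forbid every candidate enabling this error path
        constraints.append(block(candidate, counterexample))

# Toy instantiation (hypothetical): candidates are sets of allowed actions;
# the "spec" forbids action 2 and the "checker" reports it as the error.
cands = [{0, 1, 2}, {0, 1}, {0, 2}, {1, 2}, {0}, {1}, {2}, set()]
candidate_from = lambda cs: next((c for c in cands
                                  if all(f(c) for f in cs)), None)
model_check = lambda c: 2 if 2 in c else None
block = lambda c, cex: (lambda c2, bad=cex: bad not in c2)
print(repair([], candidate_from, model_check, block))   # -> {0, 1}
```

In the paper's setting the checker is a parameterized model checker for WSTS, so a single counterexample rules out the candidate for every system size at once.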
The emerging Industrial Internet of Things (IIoT) is driving an ever-increasing demand for low-latency services to massive numbers of devices over wireless channels. As a result, assuring the quality-of-service (QoS) of a large number of mobile users is becoming a challenging issue in the envisioned sixth-generation (6G) network. In such networks, delay-optimal wireless access requires joint channel- and queue-aware scheduling, whose complexity increases exponentially with the number of users. In this paper, we adopt the mean-field approximation to design a buffer-aware multi-user diversity (opportunistic access) protocol, which serves all backlogged packets of a user if its channel gain exceeds a threshold. Theoretical analysis and numerical results demonstrate that the resulting cross-layer scheduling policy is not only of low complexity but also asymptotically optimal as the number of devices grows large.
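The threshold policy is simple to simulate; in the following minimal sketch (all parameter values and channel/arrival models are assumptions), each user independently compares its current channel gain against a fixed threshold and, if it exceeds it, flushes its entire buffer.

```python
# Simulation of the threshold-based opportunistic access policy:
# serve all backlogged packets of any user whose gain exceeds gamma.
import random

N, T, gamma, arrival_rate = 1000, 500, 1.0, 0.3
queues = [0] * N
served = 0
for t in range(T):
    for i in range(N):
        if random.random() < arrival_rate:   # Bernoulli packet arrivals
            queues[i] += 1
        gain = random.expovariate(1.0)       # Rayleigh-fading channel power
        if gain > gamma and queues[i] > 0:
            served += queues[i]              # flush the whole buffer
            queues[i] = 0

print(f"served {served} packets; mean backlog {sum(queues) / N:.2f}")
# Complexity per slot is O(N): each user only compares its own gain with
# gamma, instead of solving a joint channel-and-queue scheduling problem.
```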
In this paper, we study optimal convergence rates for distributed convex optimization problems over networks. We model the communication restrictions imposed by the network as a set of affine constraints and provide optimal complexity bounds for four different setups, namely when the function $F(x) \triangleq \sum_{i=1}^{m} f_i(x)$ is (i) strongly convex and smooth, (ii) strongly convex, (iii) smooth, or (iv) just convex. Our results show that Nesterov's accelerated gradient descent on the dual problem can be executed in a distributed manner and obtains the same optimal rates as in the centralized version of the problem (up to constant or logarithmic factors), with an additional cost related to the spectral gap of the interaction matrix. Finally, we discuss some extensions of the proposed setup, such as proximal-friendly functions, time-varying graphs, and improvement of the condition number.
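The dual approach can be sketched on a toy consensus instance (the path graph, the quadratic local functions, and the step size below are our own illustrative choices): Nesterov's accelerated gradient is run on the dual of $\min \sum_i \frac{1}{2}(x_i - b_i)^2$ subject to $Lx = 0$, where $L$ is the graph Laplacian; the dual gradient $L(b - L\lambda)$ needs only neighbour-to-neighbour communication, so the method runs distributedly.

```python
# Nesterov's accelerated gradient on the dual of a consensus problem.
import numpy as np

m = 5
L = 2 * np.eye(m) - np.eye(m, k=1) - np.eye(m, k=-1)
L[0, 0] = L[-1, -1] = 1.0                  # Laplacian of the path graph
b = np.array([1.0, 3.0, 2.0, 5.0, 4.0])    # local data b_i

eta = 1.0 / np.linalg.eigvalsh(L @ L)[-1]  # 1 / Lipschitz constant of -grad
lam, lam_prev = np.zeros(m), np.zeros(m)
for k in range(1, 400):
    y = lam + (k - 1) / (k + 2) * (lam - lam_prev)   # Nesterov momentum
    grad = L @ (b - L @ y)                 # ascent direction for the dual
    lam_prev, lam = lam, y + eta * grad

x = b - L @ lam                            # primal recovery from the dual
print(x, "consensus value ~", b.mean())    # all x_i approach mean(b) = 3.0
```

The step size $1/\lambda_{\max}(L)^2$ is where the network enters the rate: a small spectral gap of $L$ forces small dual steps, which is the additional cost relative to the centralized problem mentioned above.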