We investigate the effect of omnipresent cloud storage on distributed computing. We specify a network model with links of prescribed bandwidth that connect standard processing nodes, and, in addition, passive storage nodes. Each passive node represents a cloud storage system, such as Dropbox, Google Drive etc. We study a few tasks in this model, assuming a single cloud node connected to all other nodes, which are connected to each other arbitrarily. We give implementations for basic tasks of collaboratively writing to and reading from the cloud, and for more advanced applications such as matrix multiplication and federated learning. Our results show that utilizing node-cloud links as well as node-node links can considerably speed up computations, compared to the case where processors communicate either only through the cloud or only through the network links. We provide results for general directed graphs, and for graphs with ``fat'' links between processing nodes. For the general case, we provide optimal algorithms for uploading and downloading files using flow techniques. We use these primitives to derive algorithms for \emph{combining}, where every processor node has an input value and the task is to compute a combined value under some given associative operator. In the case of fat links, we assume that links between processors are bidirectional and have high bandwidth, and we give near-optimal algorithms for any commutative combining operator (such as vector addition). For the task of matrix multiplication (or other non-commutative combining operators), where the inputs are ordered, we present sharp results in the simple ``wheel'' network, where procesing nodes are arranged in a ring, and are all connected to a single cloud node.
We consider the problem of secure distributed matrix computation (SDMC), where a \textit{user} queries a function of data matrices generated at distributed \textit{source} nodes. We assume the availability of $N$ honest but curious computation servers, which are connected to the sources, the user, and each other through orthogonal and reliable communication links. Our goal is to minimize the amount of data that must be transmitted from the sources to the servers, called the \textit{upload cost}, while guaranteeing that no $T$ colluding servers can learn any information about the source matrices, and the user cannot learn any information beyond the computation result. We first focus on secure distributed matrix multiplication (SDMM), considering two matrices, and propose a novel polynomial coding scheme using the properties of finite field discrete Fourier transform, which achieves an upload cost significantly lower than the existing results in the literature. We then generalize the proposed scheme to include straggler mitigation, and to the multiplication of multiple matrices while keeping the input matrices, the intermediate computation results, as well as the final result secure against any $T$ colluding servers. We also consider a special case, called computation with own data, where the data matrices used for computation belong to the user. In this case, we drop the security requirement against the user, and show that the proposed scheme achieves the minimal upload cost. We then propose methods for performing other common matrix computations securely on distributed servers, including changing the parameters of secret sharing, matrix transpose, matrix exponentiation, solving a linear system, and matrix inversion, which are then used to show how arbitrary matrix polynomials can be computed securely on distributed servers using the proposed procedure.
Ubiquitous artificial intelligence (AI) is considered one of the key services in 6G systems. AI services typically rely on deep neural network (DNN) requiring heavy computation. Hence, in order to support ubiquitous AI, it is crucial to provide a solution for offloading or distributing computational burden due to DNN, especially at end devices with limited resources. We develop a framework for assigning the computation tasks of DNN inference jobs to the nodes with computing resources in the network, so as to reduce the inference latency in the presence of limited computing power at end devices. To this end, we propose a layered graph model that enables to solve the problem of assigning computation tasks of a single DNN inference job via simple conventional routing. Using this model, we develop algorithms for routing DNN inference jobs over the distributed computing network. We show through numerical evaluations that our algorithms can select nodes and paths adaptively to the computational attributes of given DNN inference jobs in order to reduce the end-to-end latency.
We study sparse linear regression over a network of agents, modeled as an undirected graph (with no centralized node). The estimation problem is formulated as the minimization of the sum of the local LASSO loss functions plus a quadratic penalty of the consensus constraint -- the latter being instrumental to obtain distributed solution methods. While penalty-based consensus methods have been extensively studied in the optimization literature, their statistical and computational guarantees in the high dimensional setting remain unclear. This work provides an answer to this open problem. Our contribution is two-fold. First, we establish statistical consistency of the estimator: under a suitable choice of the penalty parameter, the optimal solution of the penalized problem achieves near optimal minimax rate $\mathcal{O}(s \log d/N)$ in $\ell_2$-loss, where $s$ is the sparsity value, $d$ is the ambient dimension, and $N$ is the total sample size in the network -- this matches centralized sample rates. Second, we show that the proximal-gradient algorithm applied to the penalized problem, which naturally leads to distributed implementations, converges linearly up to a tolerance of the order of the centralized statistical error -- the rate scales as $\mathcal{O}(d)$, revealing an unavoidable speed-accuracy dilemma.Numerical results demonstrate the tightness of the derived sample rate and convergence rate scalings.
We prove several new tight distributed lower bounds for classic symmetry breaking graph problems. As a basic tool, we first provide a new insightful proof that any deterministic distributed algorithm that computes a $\Delta$-coloring on $\Delta$-regular trees requires $\Omega(\log_\Delta n)$ rounds and any randomized algorithm requires $\Omega(\log_\Delta\log n)$ rounds. We prove this result by showing that a natural relaxation of the $\Delta$-coloring problem is a fixed point in the round elimination framework. As a first application, we show that our $\Delta$-coloring lower bound proof directly extends to arbdefective colorings. We exactly characterize which variants of the arbdefective coloring problem are "easy", and which of them instead are "hard". As a second application, which we see as our main contribution, we use the structure of the fixed point as a building block to prove lower bounds as a function of $\Delta$ for a large class of distributed symmetry breaking problems. For example, we obtain a tight lower bound for the fundamental problem of computing a $(2,\beta)$-ruling set. This is an exponential improvement over the best existing lower bound for the problem, which was proven in [FOCS '20]. Our lower bound even applies to a much more general family of problems that allows for almost arbitrary combinations of natural constraints from coloring problems, orientation problems, and independent set problems, and provides a single unified proof for known and new lower bound results for these types of problems. Our lower bounds as a function of $\Delta$ also imply lower bounds as a function of $n$. We obtain, for example, that maximal independent set, on trees, requires $\Omega(\log n / \log \log n)$ rounds for deterministic algorithms, which is tight.
The aim of this work is to develop a fully-distributed algorithmic framework for training graph convolutional networks (GCNs). The proposed method is able to exploit the meaningful relational structure of the input data, which are collected by a set of agents that communicate over a sparse network topology. After formulating the centralized GCN training problem, we first show how to make inference in a distributed scenario where the underlying data graph is split among different agents. Then, we propose a distributed gradient descent procedure to solve the GCN training problem. The resulting model distributes computation along three lines: during inference, during back-propagation, and during optimization. Convergence to stationary solutions of the GCN training problem is also established under mild conditions. Finally, we propose an optimization criterion to design the communication topology between agents in order to match with the graph describing data relationships. A wide set of numerical results validate our proposal. To the best of our knowledge, this is the first work combining graph convolutional neural networks with distributed optimization.
The demand for artificial intelligence has grown significantly over the last decade and this growth has been fueled by advances in machine learning techniques and the ability to leverage hardware acceleration. However, in order to increase the quality of predictions and render machine learning solutions feasible for more complex applications, a substantial amount of training data is required. Although small machine learning models can be trained with modest amounts of data, the input for training larger models such as neural networks grows exponentially with the number of parameters. Since the demand for processing training data has outpaced the increase in computation power of computing machinery, there is a need for distributing the machine learning workload across multiple machines, and turning the centralized into a distributed system. These distributed systems present new challenges, first and foremost the efficient parallelization of the training process and the creation of a coherent model. This article provides an extensive overview of the current state-of-the-art in the field by outlining the challenges and opportunities of distributed machine learning over conventional (centralized) machine learning, discussing the techniques used for distributed machine learning, and providing an overview of the systems that are available.
In recent years, mobile devices have gained increasingly development with stronger computation capability and larger storage. Some of the computation-intensive machine learning and deep learning tasks can now be run on mobile devices. To take advantage of the resources available on mobile devices and preserve users' privacy, the idea of mobile distributed machine learning is proposed. It uses local hardware resources and local data to solve machine learning sub-problems on mobile devices, and only uploads computation results instead of original data to contribute to the optimization of the global model. This architecture can not only relieve computation and storage burden on servers, but also protect the users' sensitive information. Another benefit is the bandwidth reduction, as various kinds of local data can now participate in the training process without being uploaded to the server. In this paper, we provide a comprehensive survey on recent studies of mobile distributed machine learning. We survey a number of widely-used mobile distributed machine learning methods. We also present an in-depth discussion on the challenges and future directions in this area. We believe that this survey can demonstrate a clear overview of mobile distributed machine learning and provide guidelines on applying mobile distributed machine learning to real applications.
We present the problem of selecting relevant premises for a proof of a given statement. When stated as a binary classification task for pairs (conjecture, axiom), it can be efficiently solved using artificial neural networks. The key difference between our advance to solve this problem and previous approaches is the use of just functional signatures of premises. To further improve the performance of the model, we use dimensionality reduction technique, to replace long and sparse signature vectors with their compact and dense embedded versions. These are obtained by firstly defining the concept of a context for each functor symbol, and then training a simple neural network to predict the distribution of other functor symbols in the context of this functor. After training the network, the output of its hidden layer is used to construct a lower dimensional embedding of a functional signature (for each premise) with a distributed representation of features. This allows us to use 512-dimensional embeddings for conjecture-axiom pairs, containing enough information about the original statements to reach the accuracy of 76.45% in premise selection task, only with simple two-layer densely connected neural networks.
The field of Multi-Agent System (MAS) is an active area of research within Artificial Intelligence, with an increasingly important impact in industrial and other real-world applications. Within a MAS, autonomous agents interact to pursue personal interests and/or to achieve common objectives. Distributed Constraint Optimization Problems (DCOPs) have emerged as one of the prominent agent architectures to govern the agents' autonomous behavior, where both algorithms and communication models are driven by the structure of the specific problem. During the last decade, several extensions to the DCOP model have enabled them to support MAS in complex, real-time, and uncertain environments. This survey aims at providing an overview of the DCOP model, giving a classification of its multiple extensions and addressing both resolution methods and applications that find a natural mapping within each class of DCOPs. The proposed classification suggests several future perspectives for DCOP extensions, and identifies challenges in the design of efficient resolution algorithms, possibly through the adaptation of strategies from different areas.
In this paper, we study the optimal convergence rate for distributed convex optimization problems in networks. We model the communication restrictions imposed by the network as a set of affine constraints and provide optimal complexity bounds for four different setups, namely: the function $F(\xb) \triangleq \sum_{i=1}^{m}f_i(\xb)$ is strongly convex and smooth, either strongly convex or smooth or just convex. Our results show that Nesterov's accelerated gradient descent on the dual problem can be executed in a distributed manner and obtains the same optimal rates as in the centralized version of the problem (up to constant or logarithmic factors) with an additional cost related to the spectral gap of the interaction matrix. Finally, we discuss some extensions to the proposed setup such as proximal friendly functions, time-varying graphs, improvement of the condition numbers.