In this work, we study the multi-agent decision problem in which agents coordinate to optimize a given system-level objective. While finding the global optimum is intractable in many cases, the greedy algorithm is a well-studied and efficient way to obtain good approximate solutions, notably for submodular optimization problems. Executing the greedy algorithm requires the agents to be ordered, with each agent performing a local optimization based on the solutions of the agents before it. However, in limited-information settings, passing along the solutions of previous agents may be nontrivial, as some agents may not be able to communicate with each other directly. Thus, the communication time required to execute the greedy algorithm is closely tied to the ordering of the agents. Here, we characterize the interplay between communication complexity and agent ordering by showing that the complexity is $O(n)$ under the best ordering and increases considerably to $O(n^2)$ under the worst ordering. Motivated by this, we also propose an algorithm that finds an ordering and executes the greedy algorithm quickly, in a distributed fashion. We also show that such an execution of the greedy algorithm is advantageous over current methods for distributed submodular maximization.
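To make the sequential structure concrete, here is a minimal Python sketch of the greedy algorithm over a given agent ordering. It assumes a set-coverage objective (a standard example of a monotone submodular function) and that each agent picks one set from its own candidate list; `coverage` and `sequential_greedy` are hypothetical helpers. Communication itself is not modelled, only the information each agent needs, namely the choices of the agents before it.

```python
from typing import Dict, FrozenSet, List


def coverage(chosen: List[FrozenSet[int]]) -> int:
    """Illustrative submodular objective: number of distinct elements covered."""
    covered = set()
    for s in chosen:
        covered |= s
    return len(covered)


def sequential_greedy(action_sets: List[List[FrozenSet[int]]],
                      order: List[int]) -> Dict[int, FrozenSet[int]]:
    """Agents act one at a time in `order`; each maximises its marginal gain
    given the choices of the agents that acted before it."""
    chosen: Dict[int, FrozenSet[int]] = {}
    for agent in order:
        previous = list(chosen.values())  # information that must reach this agent
        base = coverage(previous)
        chosen[agent] = max(action_sets[agent],
                            key=lambda a: coverage(previous + [a]) - base)
    return chosen


# Usage: two agents, each picking one of its candidate sets.
actions = [[frozenset({1, 2}), frozenset({3})],
           [frozenset({1, 2}), frozenset({4, 5})]]
result = sequential_greedy(actions, order=[0, 1])
print(result, coverage(list(result.values())))
```

Each agent only evaluates marginal gains against its predecessors' choices, which is why the cost of getting those choices to it depends on where it sits in the ordering.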
In many machine learning tasks, a common approach for dealing with large-scale data is to build a small summary, {\em e.g.,} a coreset, that can efficiently represent the original input. However, real-world datasets usually contain outliers, and most existing coreset construction methods are not resilient against outliers (in particular, an adversarial attacker can place an outlier arbitrarily in the space). In this paper, we propose a novel robust coreset method for the {\em continuous-and-bounded learning} problems (with outliers), which include a broad range of popular optimization objectives in machine learning, {\em e.g.,} logistic regression and $ k $-means clustering. Moreover, our robust coreset can be efficiently maintained in a fully-dynamic environment. To the best of our knowledge, this is the first robust and fully-dynamic coreset construction method for these optimization problems. Another highlight is that our coreset size can depend on the doubling dimension of the parameter space, rather than on the VC dimension of the objective function, which could be very large or even challenging to compute. Finally, we conduct experiments on real-world datasets to evaluate the effectiveness of our proposed robust coreset method.
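The construction is not described here, but the kind of objective a robust coreset is typically meant to preserve can be illustrated. The sketch below assumes the $k$-means-with-outliers case and a hypothetical helper `kmeans_cost_with_outliers`; it evaluates the trimmed cost that discards the $z$ points farthest from their nearest center, so an arbitrarily placed outlier cannot inflate the cost.

```python
from typing import Sequence

Point = Sequence[float]


def sq_dist(p: Point, q: Point) -> float:
    """Squared Euclidean distance."""
    return sum((a - b) ** 2 for a, b in zip(p, q))


def kmeans_cost_with_outliers(points: Sequence[Point], centers: Sequence[Point],
                              z: int) -> float:
    """k-means cost that ignores the z points farthest from their nearest center."""
    costs = sorted(min(sq_dist(p, c) for c in centers) for p in points)
    keep = max(len(costs) - z, 0)  # the z largest costs are treated as outliers
    return sum(costs[:keep])


# Usage: the far-away point (100, 0) is discarded when z = 1.
pts = [(0.0, 0.0), (1.0, 0.0), (100.0, 0.0)]
print(kmeans_cost_with_outliers(pts, [(0.0, 0.0)], z=1))  # 1.0
```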
For the approximation and simulation of twofold iterated stochastic integrals and the corresponding L\'{e}vy areas w.r.t. a multi-dimensional Wiener process, we review four algorithms based on a Fourier series approach. In particular, the very efficient algorithm due to Wiktorsson and a newly proposed algorithm due to Mrongowius and R\"ossler are considered. To put recent advances into context, we analyse the four Fourier-based algorithms in a unified framework to highlight differences and similarities in their derivation. A comparison of theoretical properties is complemented by a numerical simulation that reveals the order of convergence for each algorithm. Further, concrete instructions are given for choosing the optimal algorithm and its parameters when simulating solutions of stochastic (partial) differential equations. Additionally, we provide advice for an efficient implementation of the considered algorithms and incorporate these insights into an open-source toolbox that is freely available for both the Julia and MATLAB programming languages. The performance of this toolbox is analysed by comparing it to some existing implementations, where we observe a significant speed-up.
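For orientation, the following Python sketch implements a classical truncated Fourier-series approximation of the Lévy areas. It is not Wiktorsson's algorithm or the algorithm of Mrongowius and Rößler. It adds an optional tail correction for the part of the truncation error that is linear in the Wiener increments. With $\xi_j = \Delta W_j/\sqrt{h}$ and i.i.d. standard normals $\zeta_{j,r}, \eta_{j,r}, \mu_j$, it uses
$$A_{ij}^{(p)} = \frac{h}{2\pi}\sum_{r=1}^{p}\frac{1}{r}\Big[\zeta_{i,r}\big(\sqrt{2}\,\xi_j+\eta_{j,r}\big)-\zeta_{j,r}\big(\sqrt{2}\,\xi_i+\eta_{i,r}\big)\Big] + h\sqrt{\rho_p}\,\big(\mu_i\xi_j-\mu_j\xi_i\big), \qquad \rho_p = \frac{1}{12}-\frac{1}{2\pi^2}\sum_{r=1}^{p}\frac{1}{r^2},$$
and recovers the iterated Itô integrals as $I_{(i,j)} = \tfrac12\Delta W_i\Delta W_j + A_{ij}$ for $i\neq j$ and $I_{(i,i)} = \tfrac12(\Delta W_i^2-h)$. The function names and the parameter `p` are illustrative and are not the API of the toolbox mentioned above.

```python
import math
import random
from typing import List

Matrix = List[List[float]]


def fourier_levy_area(dW: List[float], h: float, p: int, rng: random.Random,
                      tail_correction: bool = True) -> Matrix:
    """Truncated Fourier-series approximation of the Levy areas A[i][j] on one step of size h."""
    d = len(dW)
    xi = [w / math.sqrt(h) for w in dW]  # standardised Wiener increments
    A = [[0.0] * d for _ in range(d)]
    for r in range(1, p + 1):
        # fresh independent N(0, 1) Fourier coefficients for harmonic r
        zeta = [rng.gauss(0.0, 1.0) for _ in range(d)]
        eta = [rng.gauss(0.0, 1.0) for _ in range(d)]
        for i in range(d):
            for j in range(i + 1, d):
                term = (zeta[i] * (math.sqrt(2.0) * xi[j] + eta[j])
                        - zeta[j] * (math.sqrt(2.0) * xi[i] + eta[i]))
                A[i][j] += h / (2.0 * math.pi * r) * term
    if tail_correction:
        # rho_p scales the neglected (r > p) part that is linear in xi
        rho = 1.0 / 12.0 - sum(1.0 / r**2 for r in range(1, p + 1)) / (2.0 * math.pi**2)
        mu = [rng.gauss(0.0, 1.0) for _ in range(d)]
        for i in range(d):
            for j in range(i + 1, d):
                A[i][j] += h * math.sqrt(max(rho, 0.0)) * (mu[i] * xi[j] - mu[j] * xi[i])
    for i in range(d):
        for j in range(i + 1, d):
            A[j][i] = -A[i][j]  # Levy areas are antisymmetric
    return A


def ito_iterated_integrals(dW: List[float], A: Matrix, h: float) -> Matrix:
    """Twofold iterated Ito integrals I[i][j] built from the increments and Levy areas."""
    d = len(dW)
    return [[0.5 * (dW[i] ** 2 - h) if i == j else 0.5 * dW[i] * dW[j] + A[i][j]
             for j in range(d)] for i in range(d)]


# Usage: one step of size h for a 3-dimensional Wiener process.
rng = random.Random(42)
h = 0.01
dW = [rng.gauss(0.0, math.sqrt(h)) for _ in range(3)]
I = ito_iterated_integrals(dW, fourier_levy_area(dW, h, p=50, rng=rng), h)
```

A useful sanity check is the second moment: the exact Lévy area satisfies $\mathbb{E}[A_{ij}^2] = h^2/4$, and the truncated scheme with this correction reaches $h^2/4 - h^2\rho_p$.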
Federated learning (FL) is a useful tool in distributed machine learning that utilizes users' local datasets in a privacy-preserving manner. When deploying FL in a constrained wireless environment, however, training models in a time-efficient manner can be challenging due to intermittent connectivity of devices, heterogeneous connection quality, and non-i.i.d. data. In this paper, we provide a novel convergence analysis of non-convex loss functions using FL on both i.i.d. and non-i.i.d. datasets with arbitrary device selection probabilities for each round. Then, using the derived convergence bound, we apply stochastic optimization to develop a new client selection and power allocation algorithm that minimizes a function of the convergence bound and the average communication time under a transmit power constraint. We find an analytical solution to this minimization problem. One key feature of the algorithm is that knowledge of the channel statistics is not required; only the instantaneous channel state information needs to be known. Simulations on the FEMNIST and CIFAR-10 datasets show that our algorithm can significantly decrease the communication time compared to uniformly random participation.
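When device $i$ participates independently with an arbitrary probability $p_i$, one standard way to keep the aggregated update unbiased is inverse-probability weighting. The Python sketch below assumes that aggregation rule, which the abstract does not specify. It also assumes data weights $w_i$ summing to one and independent Bernoulli participation. `ipw_aggregate` and `expected_aggregate` are hypothetical helpers. The latter enumerates all participation patterns to confirm that the expectation equals the full-participation average $\sum_i w_i \Delta_i$.

```python
import itertools
from typing import List, Sequence

Vector = List[float]


def ipw_aggregate(updates: Sequence[Vector], weights: Sequence[float],
                  probs: Sequence[float], selected: Sequence[int]) -> Vector:
    """Server update from the participating clients, each rescaled by w_i / p_i."""
    agg = [0.0] * len(updates[0])
    for i in selected:
        scale = weights[i] / probs[i]  # inverse-probability weight
        for k, u in enumerate(updates[i]):
            agg[k] += scale * u
    return agg


def expected_aggregate(updates: Sequence[Vector], weights: Sequence[float],
                       probs: Sequence[float]) -> Vector:
    """Exact expectation over independent Bernoulli(p_i) participation (enumerates 2^n patterns)."""
    n = len(updates)
    expected = [0.0] * len(updates[0])
    for mask in itertools.product((0, 1), repeat=n):
        pattern_prob = 1.0
        for i, m in enumerate(mask):
            pattern_prob *= probs[i] if m else 1.0 - probs[i]
        selected = [i for i, m in enumerate(mask) if m]
        for k, v in enumerate(ipw_aggregate(updates, weights, probs, selected)):
            expected[k] += pattern_prob * v
    return expected


# Usage: three clients with unequal participation probabilities.
updates = [[1.0, 0.0], [0.0, 2.0], [3.0, 3.0]]
weights = [0.5, 0.3, 0.2]
probs = [0.9, 0.2, 0.5]
print(expected_aggregate(updates, weights, probs))  # matches sum_i w_i * updates[i] up to rounding
```

The price of unbiasedness is variance: clients with small $p_i$ receive large weights when they do participate. This is why the selection probabilities enter a convergence bound.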
We present a framework for a controlled Markov chain whose state is only observed at chosen observation times and at a cost. Optimal strategies therefore involve the choice of observation times as well as the subsequent control values. We show that the corresponding value function satisfies a dynamic programming principle, which leads to a system of quasi-variational inequalities (QVIs). Next, we give an extension in which the model parameters are not known a priori but are inferred from the costly observations by Bayesian updates. We then prove a comparison principle for a larger class of QVIs, which implies uniqueness of solutions to our proposed problem. We utilise penalty methods to obtain arbitrarily accurate solutions. Finally, we perform numerical experiments on three applications that illustrate our framework.
Decentralized stochastic gradient descent (SGD) is a driving engine for decentralized federated learning (DFL). The performance of decentralized SGD is jointly influenced by inter-node communications and local updates. In this paper, we propose a general DFL framework that periodically performs both multiple local updates and multiple inter-node communications, to strike a balance between communication efficiency and model consensus. It also provides a general analytical framework for decentralized SGD. We establish strong convergence guarantees for the proposed DFL algorithm without assuming convex objectives. The convergence rate of DFL can be optimized to balance communication and computing costs under constrained resources. To improve the communication efficiency of DFL, we further introduce compressed communication into the proposed DFL, yielding a new scheme named DFL with compressed communication (C-DFL). The proposed C-DFL exhibits linear convergence for strongly convex objectives. Experimental results on the MNIST and CIFAR-10 datasets illustrate the superiority of DFL over traditional decentralized SGD methods and show that C-DFL further enhances communication efficiency.
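The periodic structure can be sketched as follows: several local updates, then several gossip rounds. This minimal Python version makes three simplifying assumptions:

- scalar models,
- deterministic gradients standing in for stochastic ones,
- a ring topology with a doubly stochastic mixing matrix.

The names `dfl`, `tau` (local steps per period) and `q` (gossip rounds per period) are illustrative and are not taken from the paper. Compression (C-DFL) is not modelled.

```python
from typing import Callable, List


def ring_mixing_matrix(n: int) -> List[List[float]]:
    """Doubly stochastic gossip weights on a ring: each node averages itself and its two neighbours."""
    if n < 3:
        raise ValueError("a ring needs at least 3 nodes")
    W = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in (i - 1, i, i + 1):
            W[i][j % n] = 1.0 / 3.0
    return W


def dfl(x0: List[float], grads: List[Callable[[float], float]], W: List[List[float]],
        lr: float, tau: int, q: int, periods: int) -> List[float]:
    """Each period: tau local gradient steps per node, then q rounds of gossip averaging."""
    x = list(x0)
    n = len(x)
    for _ in range(periods):
        for _ in range(tau):  # local updates, no communication
            x = [x[i] - lr * grads[i](x[i]) for i in range(n)]
        for _ in range(q):  # gossip: W[i][j] is non-zero only for neighbours j of i
            x = [sum(W[i][j] * x[j] for j in range(n)) for i in range(n)]
    return x


# Usage: node i holds the local loss f_i(x) = (x - b_i)^2 / 2, whose gradient is x - b_i.
b = [0.0, 1.0, 2.0, 3.0]
grads = [lambda x, bi=bi: x - bi for bi in b]
x = dfl([0.0] * 4, grads, ring_mixing_matrix(4), lr=0.1, tau=2, q=3, periods=200)
```

With these quadratic losses, gossip preserves the network average, so the average converges to the mean of the $b_i$. The extra gossip rounds per period shrink the disagreement that the local steps reintroduce.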
With advances in scientific computing and mathematical modeling, complex phenomena can now be reliably simulated. However, such simulations can be very time-intensive, requiring millions of CPU hours to perform. One solution is multi-fidelity emulation, which uses data of varying accuracies (or fidelities) to train an efficient predictive model (or emulator) for the expensive simulator. In complex problems, simulation data with different fidelities are often connected scientifically via a directed acyclic graph (DAG), which existing multi-fidelity emulator models cannot incorporate. We thus propose a new Graphical Multi-fidelity Gaussian process (GMGP) model, which embeds this scientific DAG information within a Gaussian process framework. We show that the GMGP has desirable modeling traits via two Markov properties, and admits a scalable formulation for recursively computing the posterior predictive distribution along sub-graphs. We also present an experimental design framework over the DAG given a computational budget, and propose a nonlinear extension of the GMGP model via deep Gaussian processes. The advantages of the GMGP model over existing methods are then demonstrated via a suite of numerical experiments and an application to emulation of heavy-ion collisions, which can be used to study the conditions of matter in the Universe shortly after the Big Bang.
Safe exploration can be regarded as a constrained Markov decision problem in which the expected long-term cost is constrained. Previous off-policy algorithms convert the constrained optimization problem into the corresponding unconstrained dual problem via Lagrangian relaxation. However, the cost functions of these algorithms provide inaccurate estimates, which makes learning the Lagrange multiplier unstable. In this paper, we present a novel off-policy reinforcement learning algorithm called Conservative Distributional Maximum a Posteriori Policy Optimization (CDMPO). First, to accurately judge whether the current situation satisfies the constraints, CDMPO adopts a distributional reinforcement learning method to estimate the Q-function and C-function. Then, CDMPO uses a conservative value function loss to reduce the number of constraint violations during exploration. In addition, we use Weighted Average Proportional Integral Derivative (WAPID) to update the Lagrange multiplier stably. Empirical results show that the proposed method incurs fewer constraint violations in the early exploration process. The final test results also illustrate that our method achieves better risk control.
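The abstract does not spell out the WAPID update. The following is therefore only a hedged sketch of a PID-style Lagrange multiplier update in that spirit. The proportional, integral and derivative terms act on the constraint violation of a cost estimate that is smoothed by an exponentially weighted average. The smoothing rule, the clamping of the integral at zero, and all parameter names (`kp`, `ki`, `kd`, `ema_decay`) are assumptions.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class PIDLagrangeMultiplier:
    """PID-style Lagrange multiplier driven by a smoothed estimate of the constraint cost."""
    kp: float
    ki: float
    kd: float
    cost_limit: float
    ema_decay: float = 0.9  # assumed weighting of past cost estimates
    integral: float = 0.0
    smoothed_cost: Optional[float] = None
    prev_smoothed: Optional[float] = None

    def update(self, episode_cost: float) -> float:
        # exponentially weighted average of the observed cost (assumed smoothing rule)
        if self.smoothed_cost is None:
            self.smoothed_cost = episode_cost
        else:
            self.smoothed_cost = (self.ema_decay * self.smoothed_cost
                                  + (1.0 - self.ema_decay) * episode_cost)
        error = self.smoothed_cost - self.cost_limit  # > 0 means the constraint is violated
        self.integral = max(0.0, self.integral + error)  # integral term kept non-negative
        derivative = 0.0
        if self.prev_smoothed is not None:
            # react only to a rising cost estimate
            derivative = max(0.0, self.smoothed_cost - self.prev_smoothed)
        self.prev_smoothed = self.smoothed_cost
        # the multiplier itself must stay non-negative
        return max(0.0, self.kp * error + self.ki * self.integral + self.kd * derivative)


# Usage: the multiplier grows while the cost stays above the limit of 10.
pid = PIDLagrangeMultiplier(kp=1.0, ki=0.1, kd=0.5, cost_limit=10.0)
lams = [pid.update(20.0) for _ in range(5)]
```

Compared with plain gradient ascent on the multiplier (a pure integral term), the proportional and derivative terms react to a violation immediately, which is the usual motivation for PID-style multiplier updates.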
We study constrained reinforcement learning (CRL) from a novel perspective by setting constraints directly on state density functions, rather than on the value functions considered in previous works. State density has a clear physical and mathematical interpretation and can express a wide variety of constraints, such as resource limits and safety requirements. Density constraints also avoid the time-consuming process of designing and tuning the cost functions that value-function-based constraints require to encode system specifications. We leverage the duality between density functions and Q functions to develop an effective algorithm that solves the density-constrained RL problem optimally while guaranteeing that the constraints are satisfied. We prove that the proposed algorithm converges to a near-optimal solution with a bounded error even when the policy update is imperfect. We use a comprehensive set of experiments to demonstrate the advantages of our approach over state-of-the-art CRL methods, on a wide range of density-constrained tasks as well as standard CRL benchmarks such as Safety-Gym.
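As a concrete reading of a density constraint, the sketch below computes the discounted state density $\rho(s) = \sum_{t\ge 0}\gamma^t \Pr(s_t = s)$ of a fixed policy in a small tabular MDP. It obtains $\rho$ as the fixed point of $\rho = \mu_0 + \gamma P_\pi^\top \rho$ and then checks per-state upper bounds. This definition of the density, the tabular setting, and the helper names are assumptions for illustration. The paper's duality-based optimization algorithm is not shown.

```python
from typing import List

Matrix = List[List[float]]


def policy_transition(P: List[Matrix], pi: Matrix) -> Matrix:
    """State-to-state transitions under policy pi: P_pi[s][t] = sum_a pi[s][a] * P[s][a][t]."""
    n = len(P)
    return [[sum(pi[s][a] * P[s][a][t] for a in range(len(pi[s]))) for t in range(n)]
            for s in range(n)]


def state_density(P_pi: Matrix, mu0: List[float], gamma: float,
                  tol: float = 1e-12, max_iter: int = 100_000) -> List[float]:
    """Discounted state density rho(s) = sum_t gamma^t Pr(s_t = s), as the fixed point of
    rho = mu0 + gamma * P_pi^T rho (iterated until successive changes fall below tol)."""
    n = len(mu0)
    rho = list(mu0)
    for _ in range(max_iter):
        new = [mu0[t] + gamma * sum(rho[s] * P_pi[s][t] for s in range(n)) for t in range(n)]
        if max(abs(u - v) for u, v in zip(new, rho)) < tol:
            return new
        rho = new
    return rho


def density_constraints_hold(rho: List[float], upper: List[float]) -> bool:
    """Check per-state upper bounds rho(s) <= upper[s]."""
    return all(r <= u for r, u in zip(rho, upper))


# Usage: two states, two actions, and a deterministic policy that always takes action 0.
P = [[[0.9, 0.1], [0.0, 1.0]],
     [[0.5, 0.5], [1.0, 0.0]]]
pi = [[1.0, 0.0], [1.0, 0.0]]
rho = state_density(policy_transition(P, pi), mu0=[1.0, 0.0], gamma=0.9)
```

Here the density sums to $1/(1-\gamma) = 10$. A bound such as $\rho(s_1) \le 2$ then expresses, for instance, a limit on the expected discounted time spent in state $s_1$.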
In this work, we consider the distributed optimization of non-smooth convex functions using a network of computing units. We investigate this problem under two regularity assumptions: (1) the Lipschitz continuity of the global objective function, and (2) the Lipschitz continuity of local individual functions. Under the local regularity assumption, we provide the first optimal first-order decentralized algorithm called multi-step primal-dual (MSPD) and its corresponding optimal convergence rate. A notable aspect of this result is that, for non-smooth functions, while the dominant term of the error is in $O(1/\sqrt{t})$, the structure of the communication network only impacts a second-order term in $O(1/t)$, where $t$ is time. In other words, the error due to limits in communication resources decreases at a fast rate even in the case of non-strongly-convex objective functions. Under the global regularity assumption, we provide a simple yet efficient algorithm called distributed randomized smoothing (DRS) based on a local smoothing of the objective function, and show that DRS is within a $d^{1/4}$ multiplicative factor of the optimal convergence rate, where $d$ is the underlying dimension.
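The smoothing idea behind DRS can be illustrated on a single machine. For a Lipschitz function $f$, the Gaussian smoothing $f_\gamma(x) = \mathbb{E}[f(x + \gamma Z)]$ with $Z \sim \mathcal{N}(0, I)$ is differentiable. Moreover, $\nabla f_\gamma(x) = \mathbb{E}[g(x+\gamma Z)]$ for a subgradient $g$ of $f$, so averaging subgradients at perturbed points gives an unbiased gradient estimate. The Python sketch assumes Gaussian perturbations and uses $f(x) = \|x\|_1$ as a test function. The distributed part of DRS and its parameter choices are not shown, and the function names are hypothetical.

```python
import random
from typing import Callable, List

Vector = List[float]


def smoothed_gradient(subgrad: Callable[[Vector], Vector], x: Vector, gamma: float,
                      num_samples: int, rng: random.Random) -> Vector:
    """Monte Carlo estimate of grad f_gamma(x) = E[g(x + gamma * Z)], Z ~ N(0, I)."""
    acc = [0.0] * len(x)
    for _ in range(num_samples):
        perturbed = [xi + gamma * rng.gauss(0.0, 1.0) for xi in x]
        for k, gk in enumerate(subgrad(perturbed)):
            acc[k] += gk
    return [a / num_samples for a in acc]


def l1_subgradient(x: Vector) -> Vector:
    """A subgradient of the non-smooth function ||x||_1 (sign, with sign(0) = 0)."""
    return [1.0 if v > 0 else -1.0 if v < 0 else 0.0 for v in x]


# Usage: smoothed gradient of ||x||_1 near a kink and far from it.
g = smoothed_gradient(l1_subgradient, [0.1, -10.0], gamma=0.1, num_samples=20000,
                      rng=random.Random(0))
```

For $f(x) = |x|$ in one dimension, the exact smoothed gradient is $2\Phi(x/\gamma) - 1$, which gives a quick sanity check. Near the kink at $x = 0.1$ it is about $0.68$, while far from the kink it equals the ordinary gradient.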
In this paper, we study the optimal convergence rate for distributed convex optimization problems in networks. We model the communication restrictions imposed by the network as a set of affine constraints and provide optimal complexity bounds for four different setups, namely, when the function $F(\mathbf{x}) \triangleq \sum_{i=1}^{m}f_i(\mathbf{x})$ is strongly convex and smooth, only strongly convex, only smooth, or just convex. Our results show that Nesterov's accelerated gradient descent on the dual problem can be executed in a distributed manner and attains the same optimal rates as in the centralized version of the problem (up to constant or logarithmic factors), with an additional cost related to the spectral gap of the interaction matrix. Finally, we discuss some extensions of the proposed setup, such as proximal-friendly functions, time-varying graphs, and improvement of the condition numbers.
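Why Nesterov's method on the dual needs only neighbour communication can be sketched as follows. Assume the consensus constraint $\sqrt{W}\mathbf{x} = 0$, with $W$ a graph Laplacian, and the change of variables $y = \sqrt{W}\lambda$. A gradient step on the dual then becomes $y \leftarrow y + \eta\, W \mathbf{x}(y)$ with $\mathbf{x}(y)_i = \arg\min_{x} f_i(x) + y_i x$, and multiplying by $W$ involves only neighbours. The Python sketch uses scalar quadratics $f_i(x) = \tfrac{a_i}{2}(x-b_i)^2$ on a ring, so $\mathbf{x}(y)_i = b_i - y_i/a_i$. It uses the step $\eta = \mu/\lambda_{\max}(W)$ and constant momentum for the strongly convex dual. These choices and all helper names are illustrative assumptions, not the paper's exact algorithm.

```python
import math
from typing import List

Matrix = List[List[float]]


def ring_laplacian(n: int) -> Matrix:
    """Graph Laplacian of a ring with n >= 3 nodes."""
    W = [[0.0] * n for _ in range(n)]
    for i in range(n):
        W[i][i] = 2.0
        W[i][(i + 1) % n] -= 1.0
        W[i][(i - 1) % n] -= 1.0
    return W


def matvec(W: Matrix, v: List[float]) -> List[float]:
    return [sum(wij * vj for wij, vj in zip(row, v)) for row in W]


def dual_accelerated_consensus(a: List[float], b: List[float], W: Matrix,
                               lam_max: float, lam_min_pos: float, iters: int) -> List[float]:
    """Nesterov's method on the dual of min sum_i a_i/2 (x_i - b_i)^2 s.t. sqrt(W) x = 0."""
    n = len(a)
    mu, L = min(a), max(a)
    step = mu / lam_max  # 1 / (upper bound on the dual smoothness constant)
    kappa = (L / mu) * (lam_max / lam_min_pos)  # dual condition number on range(W)
    beta = (math.sqrt(kappa) - 1.0) / (math.sqrt(kappa) + 1.0)

    def primal(y: List[float]) -> List[float]:
        return [b[i] - y[i] / a[i] for i in range(n)]  # argmin_x f_i(x) + y_i x

    y = [0.0] * n
    z = [0.0] * n
    for _ in range(iters):
        Wx = matvec(W, primal(z))  # the only step that needs neighbour communication
        y_next = [z[i] + step * Wx[i] for i in range(n)]
        z = [y_next[i] + beta * (y_next[i] - y[i]) for i in range(n)]
        y = y_next
    return primal(y)


# Usage: ring of 4 nodes; the Laplacian eigenvalues are 0, 2, 2, 4.
a = [1.0, 2.0, 3.0, 4.0]
b = [0.0, 1.0, 2.0, 3.0]
x = dual_accelerated_consensus(a, b, ring_laplacian(4), lam_max=4.0, lam_min_pos=2.0, iters=300)
```

For this problem, the consensus solution is the weighted mean $\sum_i a_i b_i / \sum_i a_i = 2$, and all returned local copies should approach it. The ratio $\lambda_{\max}(W)/\lambda_{\min}^{+}(W)$ in `kappa` is where the spectral gap of the interaction matrix enters the rate.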