This paper considers the problem of Byzantine dispersion and extends previous work along several parameters. The problem of Byzantine dispersion asks: given $n$ robots, up to $f$ of which are Byzantine, initially placed arbitrarily on an $n$-node anonymous graph, design a terminating algorithm to be run by the robots such that they eventually reach a configuration where each node has at most one non-Byzantine robot on it. Previous work solved this problem for rings and tolerated up to $n-1$ Byzantine robots. In this paper, we investigate the problem on more general graphs. We first develop an algorithm that tolerates up to $n-1$ Byzantine robots and works for a more general class of graphs. We then develop an algorithm that works for any graph but tolerates fewer Byzantine robots. We subsequently turn our focus to the strength of the Byzantine robots. Previous work considers only ``weak'' Byzantine robots that cannot fake their IDs. We develop an algorithm that solves the problem when Byzantine robots are not weak and can fake IDs. Finally, we study the situation where the number of robots is some $k$ rather than $n$. We show that in such a scenario, the number of Byzantine robots that can be tolerated is severely restricted. Specifically, we show that it is impossible to deterministically solve Byzantine dispersion when $\lceil k/n \rceil > \lceil (k-f)/n \rceil$.
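The final impossibility condition is a simple counting criterion and can be checked directly; the helper below is a hypothetical illustration, not part of the paper's algorithms.

```python
import math

def dispersion_possible(k: int, n: int, f: int) -> bool:
    """Necessary condition from the impossibility result: deterministic
    Byzantine dispersion of k robots (up to f Byzantine) on an n-node
    graph is impossible when ceil(k/n) > ceil((k-f)/n)."""
    return math.ceil(k / n) <= math.ceil((k - f) / n)

# With k = n robots, the condition holds for any f <= n - 1.
print(dispersion_possible(k=10, n=10, f=9))   # True
# With extra robots, even one Byzantine robot can violate it.
print(dispersion_possible(k=11, n=10, f=1))   # False
```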
We study a fundamental question concerning adversarial noise models in statistical problems where the algorithm receives i.i.d. draws from a distribution $\mathcal{D}$. The definitions of these adversaries specify the type of allowable corruptions (noise model) as well as when these corruptions can be made (adaptivity); the latter differentiates between oblivious adversaries that can only corrupt the distribution $\mathcal{D}$ and adaptive adversaries that can have their corruptions depend on the specific sample $S$ that is drawn from $\mathcal{D}$. In this work, we investigate whether oblivious adversaries are effectively equivalent to adaptive adversaries, across all noise models studied in the literature. Specifically, can the behavior of an algorithm $\mathcal{A}$ in the presence of oblivious adversaries always be well-approximated by that of an algorithm $\mathcal{A}'$ in the presence of adaptive adversaries? Our first result shows that this is indeed the case for the broad class of statistical query algorithms, under all reasonable noise models. We then show that in the specific case of additive noise, this equivalence holds for all algorithms. Finally, we map out an approach towards proving this statement in its fullest generality, for all algorithms and under all reasonable noise models.
Patankar-type schemes are linearly implicit time integration methods designed to be unconditionally positivity-preserving by going outside of the class of general linear methods. Thus, classical stability concepts cannot be applied and there is no satisfactory stability or robustness theory for these schemes. We develop a new approach to study a few related issues that impact some Patankar-type methods. In particular, we demonstrate problematic behaviors of these methods that can lead to undesired oscillations or order reduction on very simple linear problems. Extreme cases of the latter manifest as spurious steady states. We investigate various classes of Patankar-type schemes based on classical Runge-Kutta methods, strong stability preserving Runge-Kutta methods, and deferred correction schemes using our approach. Finally, we strengthen our analysis with challenging applications including stiff nonlinear problems.
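To make the Patankar idea concrete, consider the scalar decay equation $y' = -\lambda y$: weighting the destruction term by $y^{n+1}/y^n$ yields a linearly implicit update that stays positive for any step size. The snippet below is a generic Patankar-Euler sketch, not one of the specific schemes analyzed here.

```python
def patankar_euler_decay(y0: float, lam: float, dt: float, steps: int) -> float:
    """Patankar-Euler sketch for y' = -lam * y.
    Weighting the destruction term by y_{n+1}/y_n turns the explicit
    Euler step into y_{n+1} = y_n - dt*lam*y_{n+1}, i.e.
    y_{n+1} = y_n / (1 + dt*lam), which is positive for any dt > 0."""
    y = y0
    for _ in range(steps):
        y = y / (1.0 + dt * lam)
    return y

# Explicit Euler with dt*lam > 1 would overshoot and go negative;
# the Patankar-weighted step remains positive even for large dt.
print(patankar_euler_decay(1.0, lam=100.0, dt=1.0, steps=3) > 0.0)  # True
```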
Riemannian manifold Hamiltonian Monte Carlo (RMHMC) is a sampling algorithm that seeks to adapt proposals to the local geometry of the posterior distribution. The specific form of the Hamiltonian used in RMHMC necessitates {\it implicitly-defined} numerical integrators in order to sustain reversibility and volume-preservation, two properties that are necessary to establish detailed balance of RMHMC. In practice, these implicit equations are solved to a non-zero convergence tolerance via fixed-point iteration. However, the effect of these convergence thresholds on the ergodicity and computational efficiency properties of RMHMC is not well understood. The purpose of this research is to elucidate these relationships through numerous case studies. Our analysis reveals circumstances wherein the RMHMC algorithm is sensitive, and insensitive, to these convergence tolerances. Our empirical analysis examines several aspects of the computation: (i) we examine the ergodicity of the RMHMC Markov chain by employing statistical methods for comparing probability measures based on collections of samples; (ii) we investigate the degree to which detailed balance is violated by measuring errors in reversibility and volume-preservation; (iii) we assess the efficiency of the RMHMC Markov chain in terms of time-normalized effective sample size (ESS). In each of these cases, we investigate the sensitivity of these metrics to the convergence threshold and further contextualize our results in terms of comparison against Euclidean HMC. We propose a method by which one may select the convergence tolerance within a Bayesian inference application using techniques of stochastic approximation, and we examine Newton's method, an alternative to fixed-point iteration, which can eliminate much of the sensitivity of RMHMC to the convergence threshold.
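The role of the convergence tolerance can be seen in a generic fixed-point solver; the toy problem $x = \cos(x)$ below is a hypothetical stand-in for the implicit integrator equations.

```python
import math

def fixed_point(phi, x0, tol=1e-10, max_iter=100):
    """Solve x = phi(x) by fixed-point iteration, stopping when the
    update falls below `tol` (the convergence threshold)."""
    x = x0
    for _ in range(max_iter):
        x_new = phi(x)
        if abs(x_new - x) < tol:
            return x_new
        x = x_new
    return x

# Looser tolerances give cheaper but less exact solutions, mirroring
# the trade-off between efficiency and reversibility error.
loose = fixed_point(math.cos, 1.0, tol=1e-2)
tight = fixed_point(math.cos, 1.0, tol=1e-12)
print(abs(loose - tight) < 1e-1)  # True
```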
We consider a linear relaxation of a generalized minimum-cost network flow problem with binary input dependencies. In this model the flows through certain arcs are bounded by linear (or more generally, piecewise linear concave) functions of the flows through other arcs. This formulation can be used to model interrelated systems in which the components of one system require the delivery of material from another system in order to function (for example, components of a subway system may require delivery of electrical power from a separate system). We propose and study randomized rounding schemes for how this model can be used to approximate solutions to a related mixed integer linear program for modeling binary input dependencies. The introduction of side constraints prevents this problem from being solved using the well-known network simplex algorithm; however, by characterizing its basis structure, we develop a generalization of the network simplex algorithm that can be used for its computationally efficient solution.
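A minimal sketch of the randomized-rounding idea, rounding each fractional LP value in $[0,1]$ to 1 with probability equal to that value; the paper's schemes for input dependencies are more involved.

```python
import random

def randomized_round(x_frac, seed=0):
    """Generic randomized rounding for an LP relaxation of binary
    variables: round each fractional x in [0, 1] to 1 with probability
    x. This is a textbook sketch, not the paper's specific scheme."""
    rng = random.Random(seed)
    return [1 if rng.random() < x else 0 for x in x_frac]

# Values at 0 or 1 are rounded deterministically; 0.5 is a coin flip.
print(randomized_round([0.0, 1.0, 0.5]))
```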
We consider the consensus interdiction problem (CIP), in which the goal is to maximize the convergence time of consensus averaging dynamics subject to removing a limited number of network edges. We first show that CIP can be cast as an effective resistance interdiction problem (ERIP), in which the goal is to remove a limited number of network edges to maximize the effective resistance between a source node and a sink node. We show that ERIP is strongly NP-hard, even for bipartite graphs of diameter three with fixed source/sink edges, and establish the same hardness result for the CIP. We then show that both ERIP and CIP cannot be approximated up to a (nearly) polynomial factor assuming the exponential time hypothesis. Subsequently, we devise a polynomial-time $mn$-approximation algorithm for the ERIP that depends only on the number of nodes $n$ and the number of edges $m$, but is independent of the size of edge resistances. Finally, using a quadratic program formulation for the CIP, we devise an iterative approximation algorithm to find a locally optimal solution for the CIP.
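Effective resistance itself is easy to compute from the Moore-Penrose pseudoinverse of the graph Laplacian, which also illustrates why removing edges can only increase it (Rayleigh monotonicity); a small numpy sketch:

```python
import numpy as np

def effective_resistance(edges, n, s, t):
    """Effective resistance between nodes s and t in a unit-resistance
    graph, computed from the pseudoinverse of the Laplacian."""
    L = np.zeros((n, n))
    for u, v in edges:
        L[u, u] += 1; L[v, v] += 1
        L[u, v] -= 1; L[v, u] -= 1
    Lp = np.linalg.pinv(L)
    return Lp[s, s] + Lp[t, t] - 2 * Lp[s, t]

# 4-cycle 0-1-2-3-0: between 0 and 2 there are two parallel 2-ohm paths.
square = [(0, 1), (1, 2), (2, 3), (3, 0)]
r_full = effective_resistance(square, 4, 0, 2)       # parallel paths -> 1.0
r_cut = effective_resistance(square[:-1], 4, 0, 2)   # single path 0-1-2 -> 2.0
print(round(r_full, 6), round(r_cut, 6))
```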
Let $G$ be a graph with vertex set $V$. Two disjoint sets $V_1, V_2 \subseteq V$ form a coalition in $G$ if none of them is a dominating set of $G$ but their union $V_1\cup V_2$ is. A vertex partition $\Psi=\{V_1,\ldots, V_k\}$ of $V$ is called a coalition partition of $G$ if every set~$V_i\in \Psi$ is either a dominating set of $G$ with the cardinality $|V_i|=1$, or is not a dominating set but for some $V_j\in \Psi$, $V_i$ and $V_j$ form a coalition. The maximum cardinality of a coalition partition of $G$ is called the coalition number of $G$, denoted by $\mathcal{C}(G)$. A $\mathcal{C}(G)$-partition is a coalition partition of $G$ with cardinality $\mathcal{C}(G)$. Given a coalition partition $\Psi=\{V_1, V_2,\ldots, V_r\}$ of~$G$, a coalition graph $CG(G, \Psi)$ is associated with $\Psi$ such that there is a one-to-one correspondence between its vertices and the members of $\Psi$. Two vertices of $CG(G, \Psi)$ are adjacent if and only if the corresponding sets form a coalition in $G$. In this paper, we first show that for any graph $G$ with $\delta(G)=1$, $\mathcal{C}(G)\leq 2\Delta(G)+2$, where $\delta(G)$ and $\Delta(G)$ are the minimum degree and the maximum degree of $G$, respectively. Moreover, we characterize all graphs~$G$ with $\delta(G)\leq 1$ and $\mathcal{C}(G)=n$, where $n$ is the number of vertices of $G$. Furthermore, we characterize all trees $T$ with $\mathcal{C}(T)=n$ and all trees $T$ with $\mathcal{C}(T)=n-1$. This partially solves one of the open problems posed in~\cite{coal0}. On the other hand, we theoretically and empirically determine the number of coalition graphs that can be defined by all coalition partitions of a given path $P_k$. Furthermore, we show that there is no universal coalition path, a path whose coalition partitions define all possible coalition graphs. These solve two open problems posed by Haynes et al.~\cite{coal1}.
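The defining condition is straightforward to test on small graphs; the sketch below uses a hypothetical adjacency-map representation.

```python
def is_dominating(G, S):
    """S dominates G if every vertex is in S or adjacent to a vertex of S."""
    covered = set(S)
    for v in S:
        covered |= G[v]
    return covered == set(G)

def form_coalition(G, V1, V2):
    """V1 and V2 form a coalition: neither dominates alone, their union does."""
    return (not is_dominating(G, V1) and not is_dominating(G, V2)
            and is_dominating(G, V1 | V2))

# Path P4 with vertices 0-1-2-3 as an adjacency map.
P4 = {0: {1}, 1: {0, 2}, 2: {1, 3}, 3: {2}}
print(form_coalition(P4, {0}, {3}))  # True: neither endpoint dominates,
                                     # but {0, 3} covers every vertex
```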
Drug Discovery is a fundamental and ever-evolving field of research. The design of new candidate molecules requires large amounts of time and money, and computational methods are being increasingly employed to cut these costs. Machine learning methods are ideal for the design of large numbers of potential new candidate molecules, which are naturally represented as graphs. Graph generation is being revolutionized by deep learning methods, and molecular generation is one of its most promising applications. In this paper, we introduce a sequential molecular graph generator based on a set of graph neural network modules, which we call MG^2N^2. At each step, a node or a group of nodes is added to the graph, along with its connections. The modular architecture simplifies the training procedure, also allowing an independent retraining of a single module. Sequentiality and modularity make the generation process interpretable. The use of graph neural networks allows each generative step to exploit all the information in its input, namely the subgraph produced during the previous steps. Experiments of unconditional generation on the QM9 and Zinc datasets show that our model is capable of generalizing molecular patterns seen during the training phase, without overfitting. The results indicate that our method is competitive, and outperforms challenging baselines for unconditional generation.
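A schematic of the sequential generation loop; the `node_module` and `edge_module` callables are hypothetical stand-ins for the trained GNN modules (which in MG^2N^2 predict node types, stop decisions, and bonds from the current subgraph).

```python
import random

def sequential_generate(max_nodes, node_module, edge_module, seed=0):
    """Sketch of sequential graph generation: at each step one module
    proposes a new node (or a stop symbol), and a second module proposes
    its connections to the subgraph produced so far."""
    rng = random.Random(seed)
    nodes, edges = [0], []
    while len(nodes) < max_nodes:
        label = node_module(nodes, edges, rng)
        if label is None:          # stop symbol: generation ends
            break
        new = len(nodes)
        nodes.append(new)
        for u in edge_module(new, nodes, edges, rng):
            edges.append((u, new))
    return nodes, edges

# Toy modules: always add a node, connect it to one random earlier node.
nodes, edges = sequential_generate(
    5,
    node_module=lambda ns, es, rng: len(ns),
    edge_module=lambda new, ns, es, rng: [rng.choice(ns[:-1])],
)
print(len(nodes), len(edges))  # 5 4
```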
Graph Neural Networks (GNN) come in many flavors, but should always be either invariant (permutation of the nodes of the input graph does not affect the output) or equivariant (permutation of the input permutes the output). In this paper, we consider a specific class of invariant and equivariant networks, for which we prove new universality theorems. More precisely, we consider networks with a single hidden layer, obtained by summing channels formed by applying an equivariant linear operator, a pointwise non-linearity and either an invariant or equivariant linear operator. Recently, Maron et al. (2019) showed that by allowing higher-order tensorization inside the network, universal invariant GNNs can be obtained. As a first contribution, we propose an alternative proof of this result, which relies on the Stone-Weierstrass theorem for algebras of real-valued functions. Our main contribution is then an extension of this result to the equivariant case, which appears in many practical applications but has been less studied from a theoretical point of view. The proof relies on a new generalized Stone-Weierstrass theorem for algebras of equivariant functions, which is of independent interest. Finally, unlike many previous settings that consider a fixed number of nodes, our results show that a GNN defined by a single set of parameters can approximate uniformly well a function defined on graphs of varying size.
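The two symmetry notions can be checked numerically on a toy one-hidden-layer network of the kind studied here; the specific linear operator below is an illustrative choice, not one from the paper.

```python
import numpy as np

def equivariant_layer(A):
    """A simple linear equivariant operator on an adjacency matrix:
    add each row's sum (scaled) back to that row. Relabelling nodes
    by P maps the output X to P X P^T, i.e. the layer is equivariant."""
    return A + A.sum(axis=1, keepdims=True) / A.shape[0]

def invariant_readout(A):
    """Pointwise non-linearity followed by a global sum: invariant."""
    return np.tanh(equivariant_layer(A)).sum()

rng = np.random.default_rng(0)
A = rng.random((5, 5))
P = np.eye(5)[rng.permutation(5)]          # permutation matrix
A_perm = P @ A @ P.T                       # relabel the nodes

# Invariance: the readout is unchanged by relabelling.
print(np.isclose(invariant_readout(A), invariant_readout(A_perm)))
# Equivariance: the layer commutes with the permutation.
print(np.allclose(equivariant_layer(A_perm), P @ equivariant_layer(A) @ P.T))
```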
Asynchronous distributed machine learning solutions have proven very effective so far, but always assuming perfectly functioning workers. In practice, some of the workers can however exhibit Byzantine behavior, caused by hardware failures, software bugs, corrupt data, or even malicious attacks. We introduce \emph{Kardam}, the first distributed asynchronous stochastic gradient descent (SGD) algorithm that copes with Byzantine workers. Kardam consists of two complementary components: a filtering and a dampening component. The first is scalar-based and ensures resilience against up to $\frac{1}{3}$ of the workers being Byzantine. Essentially, this filter leverages the Lipschitzness of cost functions and acts as a self-stabilizer against Byzantine workers that would attempt to corrupt the progress of SGD. The dampening component bounds the convergence rate by adjusting to stale information through a generic gradient weighting scheme. We prove that Kardam guarantees almost sure convergence in the presence of asynchrony and Byzantine behavior, and we derive its convergence rate. We evaluate Kardam on the CIFAR-100 and EMNIST datasets and measure its overhead with respect to non-Byzantine-resilient solutions. We empirically show that Kardam does not introduce additional noise to the learning procedure but does induce a slowdown (the cost of Byzantine resilience) that we both theoretically and empirically show to be less than $f/n$, where $f$ is the number of Byzantine failures tolerated and $n$ the total number of workers. Interestingly, we also empirically observe that the dampening component is useful in its own right, as it enables building an SGD algorithm that outperforms alternative staleness-aware asynchronous competitors in environments with honest workers.
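The filtering idea can be sketched in one dimension: compare a worker's gradient against an empirical Lipschitz bound. This is only a schematic; Kardam's actual filter is based on a quantile of previously observed Lipschitz ratios.

```python
def passes_filter(w_prev, g_prev, w, g, K):
    """Accept a worker's gradient if its empirical Lipschitz ratio
    |g - g_prev| / |w - w_prev| is at most K (scalar sketch of a
    Lipschitz-based filter, not Kardam's exact rule)."""
    if w == w_prev:
        return True
    return abs(g - g_prev) / abs(w - w_prev) <= K

# Honest gradients of f(w) = w^2 have Lipschitz constant 2.
print(passes_filter(1.0, 2.0, 1.5, 3.0, K=2.5))   # True: ratio is 2.0
# A gradient far from the observed trend is filtered out.
print(passes_filter(1.0, 2.0, 1.5, 50.0, K=2.5))  # False: ratio is 96.0
```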
In this paper, we study the optimal convergence rate for distributed convex optimization problems in networks. We model the communication restrictions imposed by the network as a set of affine constraints and provide optimal complexity bounds for four different setups, namely when the function $F(\mathbf{x}) \triangleq \sum_{i=1}^{m}f_i(\mathbf{x})$ is: strongly convex and smooth, only strongly convex, only smooth, or just convex. Our results show that Nesterov's accelerated gradient descent on the dual problem can be executed in a distributed manner and obtains the same optimal rates as in the centralized version of the problem (up to constant or logarithmic factors) with an additional cost related to the spectral gap of the interaction matrix. Finally, we discuss some extensions to the proposed setup such as proximal-friendly functions, time-varying graphs, and improved dependence on the condition number.
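The centralized building block, Nesterov's accelerated gradient with constant momentum $\beta = (\sqrt{\kappa}-1)/(\sqrt{\kappa}+1)$ for the strongly convex and smooth case, can be sketched as follows; the distributed dual execution is the paper's contribution and is not reproduced here.

```python
def nesterov(grad, x0, L, mu, steps):
    """Nesterov's accelerated gradient for an L-smooth, mu-strongly
    convex objective (constant-momentum variant): gradient step from
    the extrapolated point y, then momentum update of y."""
    beta = (L**0.5 - mu**0.5) / (L**0.5 + mu**0.5)
    x = list(x0)
    y = list(x0)
    for _ in range(steps):
        g = grad(y)
        x_new = [yi - gi / L for yi, gi in zip(y, g)]
        y = [xn + beta * (xn - xo) for xn, xo in zip(x_new, x)]
        x = x_new
    return x

# Toy quadratic f(x) = 0.5*(x1^2 + 10*x2^2), so L = 10 and mu = 1.
sol = nesterov(lambda v: [v[0], 10.0 * v[1]], [5.0, 5.0],
               L=10.0, mu=1.0, steps=200)
print(max(abs(c) for c in sol) < 1e-6)  # converged to the minimizer 0
```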