This paper considers the problem of distributed multi-agent learning, where the global aim is to minimize a sum of local objective (empirical loss) functions through local optimization and information exchange between neighbouring nodes. We introduce Network-GIANT, a Newton-type fully distributed optimization algorithm based on GIANT, a federated learning algorithm that relies on a centralized parameter server. Network-GIANT combines gradient tracking with a Newton-type iterative update at each node, together with consensus-based averaging of the local gradient and Newton updates. We prove that our algorithm guarantees semi-global and exponential convergence to the exact solution over the network, assuming strongly convex and smooth loss functions. We provide empirical evidence of the superior convergence performance of Network-GIANT over other state-of-the-art distributed learning algorithms such as Network-DANE and Newton-Raphson Consensus.
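As a hedged illustration of the gradient-tracking and consensus-averaged Newton updates just described, the sketch below runs a schematic variant on a toy distributed quadratic problem. The ring mixing matrix, step size, and problem data are illustrative assumptions, not the paper's exact scheme.

```python
import numpy as np

# Toy setup: n nodes, each with a local quadratic loss f_i(x) = 0.5 x'A_i x - b_i'x.
# W is a doubly stochastic mixing matrix for a ring network (an illustrative choice).
rng = np.random.default_rng(0)
n, d = 4, 3
A = [M @ M.T + np.eye(d) for M in rng.standard_normal((n, d, d))]  # local SPD Hessians
b = rng.standard_normal((n, d))
W = 0.5 * np.eye(n) + 0.25 * (np.roll(np.eye(n), 1, 0) + np.roll(np.eye(n), -1, 0))

grad = lambda i, x: A[i] @ x - b[i]
x = np.zeros((n, d))
y = np.array([grad(i, x[i]) for i in range(n)])   # gradient-tracking variables

for _ in range(200):
    # Local Newton-type direction, using the tracked (network-averaged) gradient.
    dirs = np.array([np.linalg.solve(A[i], y[i]) for i in range(n)])
    x_new = W @ x - 0.5 * (W @ dirs)              # consensus averaging of Newton updates
    # Gradient tracking: y_i keeps an estimate of the average gradient across nodes.
    y = W @ y + np.array([grad(i, x_new[i]) - grad(i, x[i]) for i in range(n)])
    x = x_new

x_star = np.linalg.solve(sum(A), sum(b))          # minimizer of the summed losses
print(np.max(np.abs(x - x_star)))                 # all nodes approach the global optimum
```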
A new multivariate integer-valued Generalized AutoRegressive Conditional Heteroscedastic process based on a multivariate Poisson generalized inverse Gaussian distribution is proposed. Estimating the parameters of this multivariate heavy-tailed count time series model via maximum likelihood is challenging, since the likelihood function involves a Bessel function that depends on the multivariate counts and their dimension; as a consequence, numerical instability is often experienced in optimization procedures. To overcome this computational problem, two feasible variants of the Expectation-Maximization (EM) algorithm are proposed for estimating the parameters of our model in low- and high-dimensional settings. These EM variants provide computational benefits and avoid the difficult direct optimization of the likelihood function. Our model and estimation procedures handle multiple features: modeling of multivariate counts, heavy-tailedness, overdispersion, accommodation of outliers, allowance for both positive and negative autocorrelations, estimation of cross/contemporaneous correlation, and efficient estimation of parameters from both statistical and computational points of view. Extensive Monte Carlo simulation studies are presented to assess the performance of the proposed EM algorithms. An application to modeling bivariate count time series data on cannabis possession-related offenses in Australia is discussed.
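The paper's EM variants target the multivariate Poisson generalized inverse Gaussian model; as a hedged illustration of the same E-step/M-step logic on a simpler relative, the sketch below fits a univariate Poisson-gamma (negative binomial) model by EM, where the latent gamma factor plays the role of the GIG mixing variable. The simplified model and all names are assumptions for illustration only.

```python
import numpy as np
from scipy.special import digamma, gammaln
from scipy.optimize import minimize_scalar

# EM for Y_t | Z_t ~ Poisson(mu * Z_t), Z_t ~ Gamma(shape=alpha, rate=alpha)
# (a mixed-Poisson stand-in for the GIG mixing in the paper's model).
rng = np.random.default_rng(1)
alpha_true, mu_true = 2.0, 5.0
z = rng.gamma(alpha_true, 1.0 / alpha_true, size=2000)
y = rng.poisson(mu_true * z)

alpha, mu = 1.0, float(y.mean())
for _ in range(100):
    # E-step: Z_t | y_t ~ Gamma(alpha + y_t, alpha + mu), so the required
    # posterior expectations are closed-form (no Bessel ratios in this stand-in).
    e_z = (alpha + y) / (alpha + mu)
    e_logz = digamma(alpha + y) - np.log(alpha + mu)
    # M-step: mu has a closed form; alpha needs a 1-D optimization.
    mu = y.sum() / e_z.sum()
    def neg_q(a):
        return -(a * np.log(a) - gammaln(a) + (a - 1) * e_logz.mean() - a * e_z.mean())
    alpha = minimize_scalar(neg_q, bounds=(1e-3, 1e3), method="bounded").x

print(alpha, mu)  # should land near the true values (2.0, 5.0)
```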
We address computational efficiency in solving A-optimal Bayesian design-of-experiments problems whose observational model is based on partial differential equations and is, consequently, computationally expensive to evaluate. A-optimality is a widely used and easy-to-interpret criterion for Bayesian experimental design. The criterion seeks the optimal experimental design by minimizing the expected conditional variance, also known as the expected posterior variance. This work presents a novel likelihood-free method for finding the A-optimal design of experiments without sampling or integrating the Bayesian posterior distribution. In our approach, the expected conditional variance is obtained via the variance of the conditional expectation using the law of total variance, while we take advantage of the orthogonal projection property to approximate the conditional expectation. Through an asymptotic error estimate, we show that the intractability of the posterior does not affect the performance of our approach. To implement our method, we use an artificial neural network (ANN) to approximate the nonlinear conditional expectation. To handle continuous experimental design parameters, we integrate the training of the ANN into the minimization of the expected conditional variance. Specifically, we propose a non-local approximation of the conditional expectation and apply transfer learning to reduce the number of evaluations of the observation model. Through numerical experiments, we demonstrate that our method significantly reduces the number of observational model evaluations compared with common importance-sampling-based approaches. This reduction is crucial given the computationally expensive nature of these models.
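A minimal numerical sketch of the law-of-total-variance trick on a toy scalar model follows. The cheap forward model, the scalar design parameter, and the small ANN regressor are assumptions standing in for the PDE-based observation model; the regression step exploits that the L2-optimal (orthogonal) projection of the parameter onto the data is exactly the conditional expectation.

```python
import numpy as np
from sklearn.neural_network import MLPRegressor

# Toy nonlinear observation model y = g(theta; d) + noise; g is a cheap
# stand-in for a PDE solve, and d is a scalar design parameter (assumptions).
rng = np.random.default_rng(2)
def observe(theta, d):
    return np.sin(d * theta) + theta ** 2 + 0.1 * rng.standard_normal(theta.shape)

def expected_posterior_variance(d, n=20000):
    theta = rng.standard_normal(n)                # prior samples of the parameter
    y = observe(theta, d)
    # Orthogonal projection: the L2-optimal regressor of theta on y is E[theta|y],
    # approximated here by a small ANN.
    net = MLPRegressor(hidden_layer_sizes=(64, 64), max_iter=500, random_state=0)
    cond_mean = net.fit(y.reshape(-1, 1), theta).predict(y.reshape(-1, 1))
    # Law of total variance: E[Var(theta|y)] = Var(theta) - Var(E[theta|y]).
    return theta.var() - cond_mean.var()

# A-optimal design then minimizes this criterion over candidate designs d.
for d in (0.5, 1.0, 2.0):
    print(d, expected_posterior_variance(d))
```

No posterior sampling or integration occurs anywhere above, which is the point of the likelihood-free construction.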
Magnitude pruning is one of the mainstream methods in lightweight architecture design, whose goal is to extract subnetworks with the largest-magnitude weight connections. This method is known to be successful, but under very high pruning regimes it suffers from topological inconsistency, which renders the extracted subnetworks disconnected and hinders their generalization ability. In this paper, we devise a novel magnitude pruning method that extracts subnetworks while guaranteeing their topological consistency. The latter ensures that only accessible and co-accessible -- impactful -- connections are kept in the resulting lightweight networks. Our solution is based on a novel reparametrization and two supervisory bi-directional networks which implement accessibility/co-accessibility and guarantee that only connected subnetworks are selected during training. This solution significantly enhances generalization under very high pruning regimes, as corroborated through extensive experiments, involving graph convolutional networks, on the challenging task of skeleton-based action recognition.
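To make the accessibility/co-accessibility notions concrete, the sketch below applies plain magnitude pruning to a toy MLP and then filters the mask by forward and backward reachability. This is a post-hoc reachability filter for intuition, not the paper's reparametrization with supervisory bi-directional networks.

```python
import numpy as np

def consistent_magnitude_prune(weights, keep_ratio):
    """Magnitude-prune a feedforward net, then keep only accessible (reachable
    from the input) and co-accessible (reaching the output) connections."""
    all_w = np.concatenate([np.abs(w).ravel() for w in weights])
    thresh = np.quantile(all_w, 1.0 - keep_ratio)
    masks = [np.abs(w) >= thresh for w in weights]   # plain magnitude pruning

    # Forward pass: a unit is accessible if a kept edge links it to an
    # accessible unit in the previous layer.
    acc = [np.ones(weights[0].shape[0], dtype=bool)]
    for m in masks:
        acc.append((m & acc[-1][:, None]).any(axis=0))
    # Backward pass: a unit is co-accessible if a kept edge links it to a
    # co-accessible unit in the next layer.
    coacc = [np.ones(weights[-1].shape[1], dtype=bool)]
    for m in reversed(masks):
        coacc.insert(0, (m & coacc[0][None, :]).any(axis=1))
    # Keep an edge only if its source is accessible and its target co-accessible.
    return [m & acc[i][:, None] & coacc[i + 1][None, :] for i, m in enumerate(masks)]

# Example: a 3-layer MLP with weight matrices of shape (in, out), pruned to 5%.
rng = np.random.default_rng(3)
weights = [rng.standard_normal(s) for s in [(8, 16), (16, 16), (16, 4)]]
masks = consistent_magnitude_prune(weights, keep_ratio=0.05)
print([m.sum() for m in masks])
```

Because the network is a layered DAG, one forward and one backward sweep suffice: any kept edge whose endpoints are both accessible and co-accessible necessarily lies on an input-to-output path.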
In this work, we study the Uncertainty Quantification (UQ) of an algorithmic estimator of the saddle point of a strongly-convex strongly-concave objective. Specifically, we show that the averaged iterates of a Stochastic Extra-Gradient (SEG) method for a Saddle Point Problem (SPP) converge almost surely to the saddle point and satisfy a Central Limit Theorem (CLT) with optimal covariance under two different noise settings, namely martingale-difference noise and state-dependent Markov noise. To ensure the stability of the algorithm dynamics under state-dependent Markov noise, we propose a variant of SEG with truncated varying sets. Our work opens the door to online inference for SPPs, with numerous potential applications in GANs, robust optimization, and reinforcement learning, to name a few. We illustrate our results through numerical experiments.
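A minimal sketch of averaged SEG under martingale-difference noise follows, on an assumed toy strongly-convex-strongly-concave objective; the averaged pair is the quantity to which the CLT applies.

```python
import numpy as np

# Averaged Stochastic Extra-Gradient on the toy objective
# f(x, y) = 0.5*x^2 + x*y - 0.5*y^2 (saddle point at the origin),
# with additive Gaussian martingale-difference noise on the gradients.
rng = np.random.default_rng(4)
x, y = 2.0, -2.0
x_bar, y_bar = 0.0, 0.0
gamma, T = 0.05, 20000

for t in range(1, T + 1):
    gx = x + y + 0.1 * rng.standard_normal()           # noisy grad_x f
    gy = x - y + 0.1 * rng.standard_normal()           # noisy grad_y f
    xh, yh = x - gamma * gx, y + gamma * gy            # extrapolation step
    gxh = xh + yh + 0.1 * rng.standard_normal()
    gyh = xh - yh + 0.1 * rng.standard_normal()
    x, y = x - gamma * gxh, y + gamma * gyh            # update step
    x_bar += (x - x_bar) / t                           # Polyak-Ruppert averaging,
    y_bar += (y - y_bar) / t                           # the object of the CLT

print(x_bar, y_bar)  # converges toward the saddle point (0, 0)
```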
A novel numerical strategy is introduced for computing approximations of solutions to a Cahn-Hilliard model with degenerate mobilities. This model has recently been introduced as a second-order phase-field approximation of surface diffusion flows. Its numerical discretization is challenging due to the degeneracy of the mobilities, which generally requires an implicit treatment to avoid stability issues, at the price of increased computational cost. To mitigate this drawback, we consider new first- and second-order Scalar Auxiliary Variable (SAV) schemes that, unlike existing approaches, focus on relaxing the mobility rather than the Cahn-Hilliard energy. These schemes are introduced and analysed theoretically in the general context of gradient flows and then specialised to the Cahn-Hilliard equation with degenerate mobilities. Various numerical experiments are conducted to highlight the advantages of these new schemes in terms of accuracy, effectiveness and computational cost.
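To illustrate the auxiliary-variable mechanics, here is the standard energy-based first-order SAV scheme on a scalar toy gradient flow (the paper's schemes instead relax the mobility, and operate on the full Cahn-Hilliard PDE; the energy split and constants below are assumptions).

```python
import numpy as np

# First-order SAV for the toy gradient flow u' = -E'(u),
# E(u) = 0.5*lam*u^2 + F(u), F(u) = 0.25*(u^2 - 1)^2, with r = sqrt(F(u) + C).
lam, C, dt = 0.1, 1.0, 0.1
F = lambda u: 0.25 * (u ** 2 - 1) ** 2
dF = lambda u: u ** 3 - u

u = 2.5
r = np.sqrt(F(u) + C)
for _ in range(100):
    b = dF(u) / np.sqrt(F(u) + C)
    # Discrete system: (u1 - u)/dt = -(lam*u1 + b*r1), r1 = r + 0.5*b*(u1 - u).
    # For this scalar problem it reduces to one linear solve per step.
    u_new = (u * (1 + 0.5 * dt * b ** 2) - dt * b * r) / (1 + dt * lam + 0.5 * dt * b ** 2)
    r = r + 0.5 * b * (u_new - u)
    u = u_new

print(u, 0.5 * lam * u ** 2 + F(u))  # u settles at a minimizer of E
```

The appeal of SAV, preserved in the mobility-relaxed variants, is that each step only requires solving a linear system while a modified energy decays unconditionally.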
Operational risk is challenging to quantify because of the broad range of categories (fraud, technological issues, natural disasters) and the heavy-tailed nature of realized losses. Operational risk modeling requires quantifying how these broad loss categories are related. We focus on the issue of loss frequencies having different time scales (e.g., daily, monthly, or yearly), specifically on estimating the statistics of losses over arbitrary time horizons. We present a frequency model in which the mean, variance, and covariances can be calculated analytically, with accuracy comparable to far more time-consuming Monte Carlo simulations. We show that analytic calculation of cumulative loss statistics over an arbitrary time window, which would otherwise be intractable due to temporal correlations, is feasible here. Our work has potential value because these statistics are crucial for approximating correlations of losses via copulas. We systematically vary all model parameters to demonstrate the accuracy of our methods for calculating all first- and second-order statistics of aggregate loss distributions. Finally, using combined data from a consortium of institutions, we show that different time horizons can lead to a large range of loss statistics, which can significantly affect calculations of capital requirements.
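As a hedged baseline for the analytic-versus-Monte-Carlo comparison, the sketch below uses a plain compound-Poisson model with lognormal severities; the rate, severity parameters, and horizon are assumptions, and the paper's model additionally handles temporal correlations across categories, which this stand-in does not.

```python
import numpy as np

# Compound-Poisson toy: losses arrive at rate lam per day with lognormal
# (heavy-tailed) severities; compare analytic and Monte Carlo statistics of the
# aggregate loss S over a horizon of T days.
rng = np.random.default_rng(5)
lam, mu, sigma, T = 0.3, 1.0, 1.5, 250

# Analytic moments: E[S] = lam*T*E[X], Var[S] = lam*T*E[X^2],
# with lognormal moments available in closed form.
EX = np.exp(mu + sigma ** 2 / 2)
EX2 = np.exp(2 * mu + 2 * sigma ** 2)
print("analytic:   ", lam * T * EX, lam * T * EX2)

# Monte Carlo estimate of the same quantities.
S = np.array([rng.lognormal(mu, sigma, rng.poisson(lam * T)).sum()
              for _ in range(50000)])
print("monte carlo:", S.mean(), S.var())
```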
In recent years, statistical physics has proven to be a valuable tool for probing large-dimensional inference problems such as those occurring in machine learning. Statistical physics provides analytical tools to study fundamental limitations of their solutions and proposes algorithms to solve individual instances. In these notes, based on the lectures given by Marc M\'ezard in 2022 at the summer school in Les Houches, we will present a general framework that can be used in a large variety of problems with weak long-range interactions, including the compressed sensing problem and the problem of learning in a perceptron. We shall see how these problems can be studied at the replica symmetric level, using developments of the cavity method, both as a theoretical tool and as an algorithm.
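The algorithmic side of the cavity method for compressed sensing is commonly instantiated as approximate message passing (AMP); the sketch below uses a soft-threshold denoiser with a residual-based threshold rule, which are illustrative assumptions rather than the specific choices of the lecture notes.

```python
import numpy as np

def soft(x, t):
    return np.sign(x) * np.maximum(np.abs(x) - t, 0.0)

def amp(A, y, iters=30, alpha=2.0):
    """Approximate message passing for sparse recovery from y = A x0 + noise."""
    m, n = A.shape
    x, z = np.zeros(n), y.copy()
    for _ in range(iters):
        tau = alpha * np.sqrt(np.mean(z ** 2))       # threshold set by residual level
        x = soft(x + A.T @ z, tau)                   # cavity/mean-field estimate
        z = y - A @ x + z * np.count_nonzero(x) / m  # residual with Onsager term
    return x

# Demo: recover a 25-sparse vector from 250 random measurements in dimension 500.
rng = np.random.default_rng(6)
n, m, k = 500, 250, 25
A = rng.standard_normal((m, n)) / np.sqrt(m)
x0 = np.zeros(n)
x0[rng.choice(n, k, replace=False)] = rng.standard_normal(k)
y = A @ x0 + 0.01 * rng.standard_normal(m)
print(np.linalg.norm(amp(A, y) - x0))
```

The Onsager correction in the residual update is what distinguishes AMP from plain iterative thresholding and is exactly the cavity ("leave-one-out") term.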
Recurrent neural networks have been extensively developed for effectively solving time-varying problems arising in complex environments. However, limited by centralized processing, model performance is strongly affected in practice by factors such as model and data silos. The emergence of distributed artificial intelligence such as federated learning (FL) therefore makes dynamic aggregation among models possible. However, the aggregation process of FL still depends on a central server, which poses a significant risk to the overall model. Moreover, FL only allows collaboration between homogeneous models and offers no good solution for interaction between heterogeneous models. We therefore propose a Distributed Computation Model (DCM), based on a consortium blockchain network, to improve the credibility of the overall model and enable effective coordination among heterogeneous models. In addition, a Distributed Hierarchical Integration (DHI) algorithm is designed for the global solution process. Within a group, permissioned nodes collect the local models' results from the permissionless nodes and then send the aggregated results back to all the permissionless nodes to regularize the processing of the local models. After the iterations are completed, a secondary integration of the local results is performed among the permissioned nodes to obtain the global result. In our experiments, we verify the efficiency of DCM, and the results show that the proposed model outperforms many state-of-the-art models based on a federated learning framework.
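A schematic of the two-level aggregation described above follows; the function names, the averaging rule, and the regularization-toward-reference step are illustrative assumptions, not the paper's DHI specification, and no blockchain layer is modeled.

```python
import numpy as np

rng = np.random.default_rng(7)

def local_result(seed, reference=None):
    # A permissionless node's (heterogeneous) model output, nudged toward the
    # group reference that regularizes the local models between rounds.
    out = rng.standard_normal(4) + seed
    return out if reference is None else 0.5 * (out + reference)

def group_round(seeds, reference=None):
    # The group's permissioned node collects local results and aggregates them.
    results = [local_result(s, reference) for s in seeds]
    return np.mean(results, axis=0)

groups = [[0.0, 0.2, -0.1], [1.0, 0.9], [0.4, 0.6, 0.5, 0.3]]
refs = [None] * len(groups)
for _ in range(5):                                  # iterative in-group rounds
    refs = [group_round(g, r) for g, r in zip(groups, refs)]

# Secondary integration among permissioned nodes, weighted by group size.
sizes = np.array([len(g) for g in groups])
global_result = np.average(refs, axis=0, weights=sizes)
print(global_result)
```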
We study distributed estimation and learning problems in a networked environment in which agents exchange information to estimate unknown statistical properties of random variables from their privately observed samples. By exchanging information about their private observations, the agents can collectively estimate the unknown quantities, but they also face privacy risks. The goal of our aggregation schemes is to combine the observed data efficiently over time and across the network, while accommodating the privacy needs of the agents and without any coordination beyond their local neighborhoods. Our algorithms enable the participating agents to estimate a complete sufficient statistic from private signals acquired offline or online over time, while preserving the privacy of their signals and network neighborhoods. This is achieved through linear aggregation schemes with adjusted randomization that add noise to the exchanged estimates subject to differential privacy (DP) constraints. In every case, we demonstrate the efficiency of our algorithms by proving convergence to the estimators of a hypothetical, omniscient observer with central access to all of the signals. We also provide convergence rate analysis and finite-time performance guarantees, and show that the noise minimizing the convergence time to the best estimates is Laplace noise, with parameters corresponding to each agent's sensitivity to their signal and network characteristics. Finally, to supplement and validate our theoretical results, we run experiments on real-world data from the US Power Grid Network and electric consumption data from German households to estimate the average power consumption of power stations and households under all privacy regimes.
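A minimal sketch of Laplace-perturbed linear aggregation follows; the complete-graph mixing matrix, the decaying step size, and the fixed noise scale are assumptions standing in for the paper's per-agent DP calibration.

```python
import numpy as np

# Distributed averaging of private signals with Laplace noise added to every
# exchanged estimate, so neighbors never see an agent's raw value.
rng = np.random.default_rng(8)
n = 10
W = np.full((n, n), 1.0 / n)                      # doubly stochastic mixing matrix
signals = rng.standard_normal(n) + 3.0            # private observations
x = signals.copy()                                # local running estimates

for t in range(1, 200):
    noisy = x + rng.laplace(scale=0.5, size=n)    # DP noise injected before exchange
    x = x + (W @ noisy - x) / t                   # decaying step damps injected noise

# Local estimates approximately track the average an omniscient observer
# with access to all raw signals would compute.
print(x.mean(), signals.mean())
```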
An important problem in network science is finding an optimal placement of sensors at nodes so as to uniquely detect failures in the network. This problem can be modelled as an identifying code set (ICS) problem, introduced by Karpovsky et al. in 1998. The ICS problem aims to find a cover of a set $S$ such that the elements of the cover define a unique signature for each element of $S$, while minimising the cover's cardinality. In this work, we study a generalised identifying code set (GICS) problem, where a unique signature must be found for each subset of $S$ of cardinality at most $k$ (instead of just each element of $S$). The concept of an independent support of a Boolean formula was introduced by Chakraborty et al. in 2014 to speed up propositional model counting, by identifying a subset of variables whose truth assignments uniquely determine those of the other variables. In this work, we introduce an extended version of independent support, grouped independent support (GIS), and show how to reduce the GICS problem to the GIS problem. We then propose a new method for finding a GICS based on finding a GIS. We show that the prior state-of-the-art approaches yield integer-linear programming (ILP) models whose sizes grow exponentially with the problem size and $k$, while our GIS encoding grows only polynomially with the problem size and $k$. While the ILP approach can solve the GICS problem on networks of at most 494 nodes, the GIS-based method can handle networks of up to 21363 nodes, a $\sim 40\times$ improvement. The GIS-based method also shows up to a $520\times$ improvement over the ILP-based method in terms of median solving time. For the majority of the instances that can be encoded and solved by both methods, the cardinality of the solution returned by the GIS-based method is less than $10\%$ larger than that of the solution found by the ILP method.
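For intuition about what a GICS certifies, here is a brute-force (exponential, illustration-only) verifier: it checks that a candidate sensor set yields a unique, nonempty signature for every fault set of size at most $k$. The closed-neighborhood signature convention below is one natural reading of the problem and is an assumption.

```python
from itertools import combinations

def is_gics(adj, C, k):
    """Check whether sensor set C identifies every fault set of size <= k,
    where a sensor c detects fault set T iff c lies in the closed
    neighborhood of some faulty node (an assumed detection model)."""
    nodes = sorted(adj)
    closed = {v: adj[v] | {v} for v in nodes}
    seen = {frozenset()}            # empty fault set has the empty signature,
    for size in range(1, k + 1):    # so signatures must also be nonempty
        for T in combinations(nodes, size):
            sig = frozenset(c for c in C if any(c in closed[t] for t in T))
            if sig in seen:
                return False        # two fault sets would be indistinguishable
            seen.add(sig)
    return True

# Tiny example: the path graph 0-1-2-3 with assumed sensors at nodes 0, 1, 2.
adj = {0: {1}, 1: {0, 2}, 2: {1, 3}, 3: {2}}
print(is_gics(adj, C={0, 1, 2}, k=1))   # True: all single failures identified
```

The GIS reduction in the paper replaces exactly this exhaustive signature check with a polynomially-sized symbolic encoding.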