亚洲男人的天堂2018av,欧美草比,久久久久久免费视频精选,国色天香在线看免费,久久久久亚洲av成人片仓井空

In this paper, we consider distributed optimization problems where $n$ agents, each possessing a local cost function, collaboratively minimize the average of the local cost functions over a connected network. To solve the problem, we propose a distributed random reshuffling (D-RR) algorithm that invokes the random reshuffling (RR) update in each agent. We show that D-RR inherits favorable characteristics of RR for both smooth strongly convex and smooth nonconvex objective functions. In particular, for smooth strongly convex objective functions, D-RR achieves $\mathcal{O}(1/T^2)$ rate of convergence (where $T$ counts epoch number) in terms of the squared distance between the iterate and the global minimizer. When the objective function is assumed to be smooth nonconvex, we show that D-RR drives the squared norm of gradient to $0$ at a rate of $\mathcal{O}(1/T^{2/3})$. These convergence results match those of centralized RR (up to constant factors) and outperform the distributed stochastic gradient descent (DSGD) algorithm if we run a relatively large number of epochs. Finally, we conduct a set of numerical experiments to illustrate the efficiency of the proposed D-RR method on both strongly convex and nonconvex distributed optimization problems.

相關內容

Physics-informed Neural Networks (PINNs) have been shown to be effective in solving partial differential equations by capturing the physics induced constraints as a part of the training loss function. This paper shows that a PINN can be sensitive to errors in training data and overfit itself in dynamically propagating these errors over the domain of the solution of the PDE. It also shows how physical regularizations based on continuity criteria and conservation laws fail to address this issue and rather introduce problems of their own causing the deep network to converge to a physics-obeying local minimum instead of the global minimum. We introduce Gaussian Process (GP) based smoothing that recovers the performance of a PINN and promises a robust architecture against noise/errors in measurements. Additionally, we illustrate an inexpensive method of quantifying the evolution of uncertainty based on the variance estimation of GPs on boundary data. Robust PINN performance is also shown to be achievable by choice of sparse sets of inducing points based on sparsely induced GPs. We demonstrate the performance of our proposed methods and compare the results from existing benchmark models in literature for time-dependent Schr\"odinger and Burgers' equations.

Recent works in contexts like the Internet of Things (IoT) and large-scale Cyber-Physical Systems (CPS) propose the idea of programming distributed systems by focussing on their global behaviour across space and time. In this view, a potentially vast and heterogeneous set of devices is considered as an "aggregate" to be programmed as a whole, while abstracting away the details of individual behaviour and exchange of messages, which are expressed declaratively. One such a paradigm, known as aggregate programming, builds on computational models inspired by field-based coordination. Existing models such as the field calculus capture interaction with neighbours by a so-called "neighbouring field" (a map from neighbours to values). This requires ad-hoc mechanisms to smoothly compose with standard values, thus complicating programming and introducing clutter in aggregate programs, libraries and domain-specific languages (DSLs). To address this key issue we introduce the novel notion of "computation against a neighbour", whereby the evaluation of certain subexpressions of the aggregate program are affected by recent corresponding evaluations in neighbours. We capture this notion in the neighbours calculus (NC), a new field calculus variant which is shown to smoothly support declarative specification of interaction with neighbours, and correspondingly facilitate the embedding of field computations as internal DSLs in common general-purpose programming languages -- as exemplified by a Scala implementation, called ScaFi. This paper formalises NC, thoroughly compares it with respect to the classic field calculus, and shows its expressiveness by means of a case study in edge computing, developed in ScaFi.

Nowadays, big datasets are spread over many machines which compute in parallel and communicate with a central machine through short messages. We consider a sparse regression setting in our paper and develop a new procedure for selective inference with distributed data. While there are many distributed procedures for point estimation in the sparse setting, not many options exist for estimating uncertainties or conducting hypothesis tests in models based on the estimated sparsity. We solve a generalized linear regression on each machine which communicates a selected set of predictors to the central machine. The central machine forms a generalized linear model with the selected predictors. How do we conduct selective inference for the selected regression coefficients? Is it possible to reuse distributed data, in an aggregated form, for selective inference? Our proposed procedure bases approximately-valid selective inference on an asymptotic likelihood. The proposal seeks only aggregated information, in relatively few dimensions, from each machine which is merged at the central machine to construct selective inference. Our procedure is also broadly applicable as a solution to the p-value lottery problem that arises with model selection on random splits of data.

In this paper, we consider solving the distributed optimization problem over a multi-agent network under the communication restricted setting. We study a compressed decentralized stochastic gradient method, termed ``compressed exact diffusion with adaptive stepsizes (CEDAS)", and show the method asymptotically achieves comparable convergence rate as centralized SGD for both smooth strongly convex objective functions and smooth nonconvex objective functions under unbiased compression operators. In particular, to our knowledge, CEDAS enjoys so far the shortest transient time (with respect to the graph specifics) for achieving the convergence rate of centralized SGD, which behaves as $\mathcal{O}(nC^3/(1-\lambda_2)^{2})$ under smooth strongly convex objective functions, and $\mathcal{O}(n^3C^6/(1-\lambda_2)^4)$ under smooth nonconvex objective functions, where $(1-\lambda_2)$ denotes the spectral gap of the mixing matrix, and $C>0$ is the compression-related parameter. Numerical experiments further demonstrate the effectiveness of the proposed algorithm.

Concomitant with the tremendous prevalence of online social media platforms, the interactions among individuals are unprecedentedly enhanced. People are free to interact with acquaintances, express and exchange their own opinions through commenting, liking, retweeting on online social media, leading to resistance, controversy and other important phenomena over controversial social issues, which have been the subject of many recent works. In this paper, we study the problem of minimizing risk of conflict in social networks by modifying the initial opinions of a small number of nodes. We show that the objective function of the combinatorial optimization problem is monotone and supermodular. We then propose a na\"{\i}ve greedy algorithm with a $(1-1/e)$ approximation ratio that solves the problem in cubic time. To overcome the computation challenge for large networks, we further integrate several effective approximation strategies to provide a nearly linear time algorithm with a $(1-1/e-\epsilon)$ approximation ratio for any error parameter $\epsilon>0$. Extensive experiments on various real-world datasets demonstrate both the efficiency and effectiveness of our algorithms. In particular, the fast one scales to large networks with more than two million nodes, and achieves up to $20\times$ speed-up over the state-of-the-art algorithm.

Federated Learning (FL) is a distributed machine learning paradigm where clients collaboratively train a model using their local (human-generated) datasets. While existing studies focus on FL algorithm development to tackle data heterogeneity across clients, the important issue of data quality (e.g., label noise) in FL is overlooked. This paper aims to fill this gap by providing a quantitative study on the impact of label noise on FL. We derive an upper bound for the generalization error that is linear in the clients' label noise level. Then we conduct experiments on MNIST and CIFAR-10 datasets using various FL algorithms. Our empirical results show that the global model accuracy linearly decreases as the noise level increases, which is consistent with our theoretical analysis. We further find that label noise slows down the convergence of FL training, and the global model tends to overfit when the noise level is high.

We develop a novel randomised block coordinate descent primal-dual algorithm for a class of non-smooth ill-posed convex programs. Lying in the midway between the celebrated Chambolle-Pock primal-dual algorithm and Tseng's accelerated proximal gradient method, we establish global convergence of the last iterate as well optimal $O(1/k)$ and $O(1/k^{2})$ complexity rates in the convex and strongly convex case, respectively, $k$ being the iteration count. Motivated by distributed and data-driven control of power systems, we test the performance of our method on a second-order cone relaxation of an AC-OPF problem. Distributed control is achieved via the distributed locational marginal prices (DLMPs), which are obtained dual variables in our optimisation framework.

Residual networks (ResNets) have displayed impressive results in pattern recognition and, recently, have garnered considerable theoretical interest due to a perceived link with neural ordinary differential equations (neural ODEs). This link relies on the convergence of network weights to a smooth function as the number of layers increases. We investigate the properties of weights trained by stochastic gradient descent and their scaling with network depth through detailed numerical experiments. We observe the existence of scaling regimes markedly different from those assumed in neural ODE literature. Depending on certain features of the network architecture, such as the smoothness of the activation function, one may obtain an alternative ODE limit, a stochastic differential equation or neither of these. These findings cast doubts on the validity of the neural ODE model as an adequate asymptotic description of deep ResNets and point to an alternative class of differential equations as a better description of the deep network limit.

The aim of this work is to develop a fully-distributed algorithmic framework for training graph convolutional networks (GCNs). The proposed method is able to exploit the meaningful relational structure of the input data, which are collected by a set of agents that communicate over a sparse network topology. After formulating the centralized GCN training problem, we first show how to make inference in a distributed scenario where the underlying data graph is split among different agents. Then, we propose a distributed gradient descent procedure to solve the GCN training problem. The resulting model distributes computation along three lines: during inference, during back-propagation, and during optimization. Convergence to stationary solutions of the GCN training problem is also established under mild conditions. Finally, we propose an optimization criterion to design the communication topology between agents in order to match with the graph describing data relationships. A wide set of numerical results validate our proposal. To the best of our knowledge, this is the first work combining graph convolutional neural networks with distributed optimization.

The demand for artificial intelligence has grown significantly over the last decade and this growth has been fueled by advances in machine learning techniques and the ability to leverage hardware acceleration. However, in order to increase the quality of predictions and render machine learning solutions feasible for more complex applications, a substantial amount of training data is required. Although small machine learning models can be trained with modest amounts of data, the input for training larger models such as neural networks grows exponentially with the number of parameters. Since the demand for processing training data has outpaced the increase in computation power of computing machinery, there is a need for distributing the machine learning workload across multiple machines, and turning the centralized into a distributed system. These distributed systems present new challenges, first and foremost the efficient parallelization of the training process and the creation of a coherent model. This article provides an extensive overview of the current state-of-the-art in the field by outlining the challenges and opportunities of distributed machine learning over conventional (centralized) machine learning, discussing the techniques used for distributed machine learning, and providing an overview of the systems that are available.

北京阿比特科技有限公司