Generalization theory · Separable · Learning · Loss · FAST

December 30, 2022

Decentralized Learning with Separable Data: Generalization and Fast Algorithms

Hossein Taheri, Christos Thrampoulidis

Decentralized learning offers privacy and communication efficiency when data are naturally distributed among agents communicating over an underlying graph. Motivated by overparameterized learning settings, in which models are trained to zero training loss, we study algorithmic and generalization properties of decentralized learning with gradient descent on separable data. Specifically, for decentralized gradient descent (DGD) and a variety of loss functions that asymptote to zero at infinity (including exponential and logistic losses), we derive novel finite-time generalization bounds. This complements a long line of recent work that studies the generalization performance and the implicit bias of gradient descent over separable data, but has thus far been limited to centralized learning scenarios. Notably, our generalization bounds are of approximately the same order as their centralized counterparts. Key to this, and of independent interest, are novel bounds on the training loss and the rate of consensus of DGD for a class of self-bounded losses. Finally, on the algorithmic front, we design improved gradient-based routines for decentralized learning with separable data and empirically demonstrate speed-ups of orders of magnitude in both training and generalization performance.
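
To make the DGD update concrete, here is a minimal sketch under stated assumptions: a ring of four agents, a hypothetical symmetric mixing matrix with weight 1/3 on each agent and its two neighbours, the logistic loss, and toy 2D data made separable by discarding points near the boundary. Each agent averages its neighbours' models and then takes a gradient step on its own data; the graph, step size, and data are illustrative choices, not the paper's setup.

```python
import numpy as np

def logistic_loss_grad(w, X, y):
    """Mean logistic loss log(1 + exp(-y <w, x>)) and its gradient (labels in {-1, +1})."""
    margins = y * (X @ w)
    loss = np.mean(np.logaddexp(0.0, -margins))
    # d/dm log(1 + e^{-m}) = -1 / (1 + e^{m}), computed stably via logaddexp.
    coeff = -np.exp(-np.logaddexp(0.0, margins))
    grad = (X * (coeff * y)[:, None]).mean(axis=0)
    return loss, grad

def ring_mixing_matrix(n):
    """Symmetric, doubly stochastic gossip matrix for a ring (n >= 3): 1/3 on self and both neighbours."""
    W = np.zeros((n, n))
    for i in range(n):
        for j in (i - 1, i, i + 1):
            W[i, j % n] = 1.0 / 3.0
    return W

def dgd(local_data, W, eta=1.0, steps=500):
    """DGD: x_i <- sum_j W_ij x_j - eta * grad f_i(x_i), with gradients at the current iterates."""
    d = local_data[0][0].shape[1]
    models = np.zeros((len(local_data), d))  # row i is agent i's model
    for _ in range(steps):
        grads = np.stack([logistic_loss_grad(models[i], X_i, y_i)[1]
                          for i, (X_i, y_i) in enumerate(local_data)])
        models = W @ models - eta * grads
    return models

# Toy linearly separable data split across 4 agents (hypothetical setup).
rng = np.random.default_rng(0)
w_true = np.array([1.0, -1.0])
X = rng.standard_normal((200, 2))
keep = np.abs(X @ w_true) > 0.5          # drop points near the boundary so a margin exists
X, y = X[keep], np.sign(X[keep] @ w_true)
local_data = [(X[idx], y[idx]) for idx in np.array_split(np.arange(len(y)), 4)]

models = dgd(local_data, ring_mixing_matrix(4))
w_avg = models.mean(axis=0)
print(logistic_loss_grad(w_avg, X, y)[0], np.abs(models - w_avg).max())
```

After training, the averaged model's loss should be small and the agents' models close to one another: these are the two quantities (training loss and consensus) that the bounds above control.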

Related content

Generalization theory

Linear · Model evaluation · Performer · Integration · Discretization

February 28, 2023

Spectrally-tuned compact finite-difference schemes with domain decomposition and applications to numerical relativity

Compact finite-difference (FD) schemes specify derivative approximations implicitly, so achieving parallelism with domain decomposition requires a suitable partitioning of the resulting linear systems. In wave-propagation problems, it is crucial to maintain consistent order of accuracy, dispersion, and dissipation so that the spectra of the discretized problems are not deformed too severely. In this work we numerically tune the spectral error at fixed formal order of accuracy to automatically devise new compact FD schemes. Grid convergence tests indicate an error reduction of at least an order of magnitude over standard FD. A proposed hybrid matching-communication strategy maintains these properties under domain decomposition. When linear wave-propagation problems are evolved with exponential integration or explicit Runge-Kutta methods, the improvement is found to remain robust. We provide a first demonstration that compact FD methods may be applied to the Z4c formulation of numerical relativity, coupling our header-only, templated C++ implementation to the highly performant GR-Athena++ code. Evolving Z4c on test-bed problems shows at least an order-of-magnitude reduction in phase error for propagated metric components compared to standard FD. Stable binary-black-hole evolution using compact FD, with improved convergence, is also demonstrated.
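
Because compact schemes are implicit, every derivative evaluation requires a linear solve. The sketch below shows this with the classical fourth-order Padé scheme on a periodic grid (alpha = 1/4, a = 3/2), not with the spectrally tuned schemes of the paper. The dense solve is a simplification for readability; practical codes would use a cyclic tridiagonal solver.

```python
import numpy as np

def compact_derivative_periodic(f, h, alpha=0.25, a=1.5):
    """Fourth-order Pade compact first derivative on a periodic grid.

    Solves alpha*d[i-1] + d[i] + alpha*d[i+1] = a*(f[i+1] - f[i-1]) / (2h)
    for the derivative values d, with indices taken modulo N.
    """
    n = f.size
    eye = np.eye(n)
    # Cyclic shift matrices: (s_plus @ v)[i] = v[i+1], (s_minus @ v)[i] = v[i-1].
    s_plus = np.roll(eye, 1, axis=1)
    s_minus = np.roll(eye, -1, axis=1)
    lhs = eye + alpha * (s_plus + s_minus)
    # Explicit right-hand side: central difference of f, scaled by a.
    rhs = a * (np.roll(f, -1) - np.roll(f, 1)) / (2.0 * h)
    return np.linalg.solve(lhs, rhs)

n = 64
x = 2 * np.pi * np.arange(n) / n
d = compact_derivative_periodic(np.sin(x), 2 * np.pi / n)
print(np.abs(d - np.cos(x)).max())  # small, since the scheme is fourth-order accurate
```

For a smooth periodic function such as sin(x), halving the grid spacing should reduce the maximum error by roughly a factor of 16, consistent with fourth-order accuracy.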

Learning · Samples · MoDELS · Variance · Server

February 27, 2023

MoDeST: Bridging the Gap between Federated and Decentralized Learning with Decentralized Sampling

Martijn de Vos, Akash Dhasade, Anne-Marie Kermarrec, Erick Lavoie, Johan Pouwelse

Federated and decentralized machine learning leverage end-user devices for privacy-preserving training of models at lower operating costs than within a data center. In a round of Federated Learning (FL), a random sample of participants trains locally, then a central server aggregates the local models to produce a single model for the next round. In a round of Decentralized Learning (DL), all participants train locally and then aggregate with their immediate neighbors, resulting in many local models with residual variance between them. On the one hand, FL's sampling and lower model variance provide lower communication costs and faster convergence. On the other hand, DL removes the need for a central server and distributes the communication costs more evenly amongst nodes, albeit at a larger total communication cost and slower convergence. In this paper, we present MoDeST: Mostly-Consistent Decentralized Sampling Training. MoDeST implements decentralized sampling, in which a random subset of nodes is responsible for training and aggregation in every round: this provides the benefits of both FL and DL without their traditional drawbacks. Our evaluation of MoDeST on four common learning tasks (i) confirms convergence as fast as FL, (ii) shows a 3x-14x reduction in communication costs compared to DL, and (iii) demonstrates that MoDeST quickly adapts to nodes joining, leaving, or failing, even when 80% of all nodes become unresponsive.
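
One way to pick a random subset without a central server is for every node to derive it from information all nodes share. The sketch below is a hypothetical illustration of that idea, not MoDeST's actual protocol: it assumes every node knows the same membership list and seeds a pseudo-random generator with a hash of the round number, so all nodes compute an identical sample. The sampled nodes' models are then averaged as in FL. Churn, failures, and message exchange are not modelled, and the function names are our own.

```python
import hashlib
import random

def sample_for_round(node_ids, round_number, sample_size):
    """Hypothetical decentralized sampling: every node derives the same sample locally.

    The seed depends only on public information (the round number), and the
    membership list is sorted first, so no coordinator is needed to agree on the sample.
    """
    digest = hashlib.sha256(f"round-{round_number}".encode()).digest()
    rng = random.Random(int.from_bytes(digest[:8], "big"))
    return sorted(rng.sample(sorted(node_ids), sample_size))

def aggregate(models):
    """Element-wise average of the sampled nodes' locally trained models (FedAvg-style)."""
    return [sum(values) / len(models) for values in zip(*models)]

# Any node, regardless of the order in which it learned about its peers,
# computes the same 10-node sample for round 5.
sample = sample_for_round(range(100), round_number=5, sample_size=10)
print(sample)
```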

Agent · Greedy · Greedy layer-wise pretraining · Better · Obvious

February 27, 2023

On the Connection between Greedy Algorithms and Imperfect Rationality

Diodato Ferraioli, Carmine Ventre

The design of algorithms or protocols that align the goals of the planner with the selfish interests of the agents involved is of paramount importance in almost every decentralized setting (such as computer networks and markets), as shown by the rich literature in Mechanism Design. Recently, considerable interest has been devoted to the design of mechanisms for imperfectly rational agents, i.e., mechanisms for which agents can easily grasp that no action other than following the protocol would better satisfy their interests. This work has culminated in the definition of Obviously Strategyproof (OSP) Mechanisms, which have been shown to capture the incentives of agents without contingent reasoning skills. Without an understanding of the algorithmic nature of OSP mechanisms, it is hard to assess how well these mechanisms can satisfy the goals of the planner. For binary allocation problems and agents whose private type is a single number, recent work has shown that a generalization of greedy completely characterizes OSP. In this work, we strengthen the connection between greedy and OSP by characterizing OSP mechanisms for all optimization problems involving these single-parameter agents. Specifically, we prove that OSP mechanisms must essentially work as follows: they either greedily look for agents with "better" types and allocate them larger outcomes; or reverse greedily look for agents with "worse" types and allocate them smaller outcomes; or, finally, split the domain of agents into "good" and "bad" types, and subsequently proceed in a reverse greedy fashion for the former and greedily for the latter. We further demonstrate how to use this characterization to bound the approximation guarantee of OSP mechanisms for the well-known problem of scheduling related machines.

Agent · Learning · Datasets · MoDELS · Reducible

February 25, 2023

Neighborhood Gradient Clustering: An Efficient Decentralized Learning Method for Non-IID Data Distributions

Sai Aparna Aketi, Sangamesh Kodge, Kaushik Roy

From arXiv. 29 pages, 5 figures, 16 tables. arXiv admin note: text overlap with arXiv:2103.02051 by other authors

In decentralized learning over distributed datasets, the data distributions can differ significantly across agents. Most current state-of-the-art decentralized algorithms assume that the data are independent and identically distributed (IID). This paper focuses on improving decentralized learning over non-IID data. We propose Neighborhood Gradient Clustering (NGC), a novel decentralized learning algorithm that modifies the local gradients of each agent using self- and cross-gradient information. Cross-gradients for a pair of neighboring agents are the derivatives of the model parameters of one agent with respect to the dataset of the other agent. In particular, the proposed method replaces the local gradients of the model with the weighted mean of the self-gradients, model-variant cross-gradients (derivatives of the neighbors' parameters with respect to the local dataset), and data-variant cross-gradients (derivatives of the local model with respect to its neighbors' datasets). The data-variant cross-gradients are aggregated through an additional communication round without breaking the privacy constraints. Further, we present CompNGC, a compressed version of NGC that reduces the communication overhead by 32×. We theoretically analyze the convergence rate of the proposed algorithm and demonstrate its efficiency on non-IID data sampled from various vision and language datasets. Our experiments demonstrate that NGC and CompNGC outperform the existing state-of-the-art (SoTA) decentralized learning algorithm over non-IID data by 0-6%, with significantly lower compute and memory requirements. Further, our experiments show that the model-variant cross-gradient information available locally at each agent can improve performance over non-IID data by 1-35% without additional communication cost.
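
The three kinds of gradient can be illustrated on a least-squares toy problem. In this minimal sketch, agent i combines its self-gradient with model-variant and data-variant cross-gradients from each neighbour. It assumes a row-stochastic mixing matrix W describing the graph and a hypothetical weighting of our own (W[i, i] on the self-gradient, and W[i, j] split equally between the two cross-gradients of neighbour j), so the weights form a mean; the exact weights NGC uses may differ, and compression (CompNGC) is omitted.

```python
import numpy as np

def lsq_grad(x, A, b):
    """Gradient of the local least-squares loss 0.5 * mean((A x - b)^2)."""
    return A.T @ (A @ x - b) / len(b)

def ngc_style_gradients(models, data, W):
    """Hypothetical NGC-style mixed gradient for every agent.

    models[i]: agent i's parameters; data[i] = (A_i, b_i): its local dataset;
    W: row-stochastic mixing matrix whose nonzeros define the communication graph.
    """
    n = len(models)
    mixed = np.zeros_like(models, dtype=float)
    for i in range(n):
        A_i, b_i = data[i]
        g = W[i, i] * lsq_grad(models[i], A_i, b_i)           # self-gradient
        for j in range(n):
            if j == i or W[i, j] == 0:
                continue
            A_j, b_j = data[j]
            model_variant = lsq_grad(models[j], A_i, b_i)      # neighbour's model, my data (computed at i)
            data_variant = lsq_grad(models[i], A_j, b_j)       # my model, neighbour's data (computed at j, sent back)
            g = g + 0.5 * W[i, j] * (model_variant + data_variant)
        mixed[i] = g
    return mixed

def ngc_style_step(models, data, W, lr):
    """Gossip-average the models, then apply the mixed gradients computed at the current models."""
    return W @ models - lr * ngc_style_gradients(models, data, W)
```

With a diagonal W (no neighbours), the mixed gradient reduces to each agent's plain local gradient, which is a quick sanity check on the weighting.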

PDE · Learning · Data availability · Linear · Machine Learning

February 24, 2023

Elliptic PDE learning is provably data-efficient

Nicolas Boullé, Diana Halikias, Alex Townsend

From arXiv. 25 pages, 2 figures

PDE learning is an emerging field that combines physics and machine learning to recover unknown physical systems from experimental data. While deep learning models traditionally require copious amounts of training data, recent PDE learning techniques achieve spectacular results with limited data availability. Still, these results are empirical. Our work provides theoretical guarantees on the number of input-output training pairs required in PDE learning, explaining why these methods can be data-efficient. Specifically, we exploit randomized numerical linear algebra and PDE theory to derive a provably data-efficient algorithm that recovers solution operators of 3D elliptic PDEs from input-output data and achieves an exponential convergence rate with respect to the size of the training dataset with an exceptionally high probability of success.
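
The randomized linear algebra ingredient can be illustrated with a randomized range finder that learns part of a solution operator purely from input-output pairs. This is a minimal sketch under simplifying assumptions, not the paper's algorithm: a 1D discrete Laplacian for -u'' with zero Dirichlet boundary conditions stands in for a 3D elliptic PDE, a dense solve plays the role of the black-box solver, and only the off-diagonal block of the discrete Green's function (which in 1D has rank one) is recovered. Adjoint queries are available here because the operator is symmetric.

```python
import numpy as np

def laplacian_1d(n):
    """Finite-difference matrix for -u'' on (0, 1), zero Dirichlet BCs, n interior points."""
    h = 1.0 / (n + 1)
    main = 2.0 * np.ones(n)
    off = -np.ones(n - 1)
    return (np.diag(main) + np.diag(off, 1) + np.diag(off, -1)) / h**2

def solve_pde(L, forcings):
    """Stand-in for a black-box solver: returns u with L u = f for each column f."""
    return np.linalg.solve(L, forcings)

def learn_offdiag_block(L, k, p=5, seed=0):
    """Randomized range finder for the block G[:m, m:] of G = L^{-1}, using only solves.

    k is the assumed rank of the block and p the oversampling; 2 * (k + p) solves are used.
    """
    n = L.shape[0]
    m = n // 2
    rng = np.random.default_rng(seed)
    # Random forcings supported on the right half; observe the solutions on the left half.
    forcings = np.zeros((n, k + p))
    forcings[m:, :] = rng.standard_normal((n - m, k + p))
    Y = solve_pde(L, forcings)[:m, :]            # Y = G12 @ Omega
    Q, _ = np.linalg.qr(Y)                       # orthonormal basis for the range of G12
    # Adjoint queries: L is symmetric, so G12^T = G21; force on the left, observe on the right.
    adjoint_forcings = np.zeros((n, Q.shape[1]))
    adjoint_forcings[:m, :] = Q
    B_t = solve_pde(L, adjoint_forcings)[m:, :]  # = G21 @ Q = (Q^T @ G12)^T
    return Q @ B_t.T                             # G12 approximated by Q Q^T G12

L = laplacian_1d(64)
G12_hat = learn_offdiag_block(L, k=1)
```

With k = 1 and a few oversampling columns, the reconstructed block should agree closely with the exact block while using only 2(k + p) solves.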

Learning · Federated learning · Principle · Analysis · Linear

February 24, 2023

From Noisy Fixed-Point Iterations to Private ADMM for Centralized and Federated Learning

Edwige Cyffers, Aurelien Bellet, Debabrota Basu

We study differentially private (DP) machine learning algorithms as instances of noisy fixed-point iterations, in order to derive privacy and utility results from this well-studied framework. We show that this new perspective recovers popular private gradient-based methods like DP-SGD and provides a principled way to design and analyze new private optimization algorithms in a flexible manner. Focusing on the widely used Alternating Direction Method of Multipliers (ADMM), we use our general framework to derive novel private ADMM algorithms for centralized, federated and fully decentralized learning. For these three algorithms, we establish strong privacy guarantees leveraging privacy amplification by iteration and by subsampling. Finally, we provide utility guarantees using a unified analysis that exploits a recent linear convergence result for noisy fixed-point iterations.
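
The fixed-point view can be illustrated on a toy problem. In this minimal sketch, the operator T is a clipped gradient step on the quadratic 0.5 * ||x - x_star||^2, and Gaussian noise is added after every application of T, which gives a DP-SGD-like update. The noise scale sigma is a free parameter here: no privacy accounting is done, per-example clipping and subsampling are omitted, and the private ADMM variants from the paper are not shown.

```python
import numpy as np

def noisy_fixed_point(T, x0, sigma, steps, seed=0):
    """Iterate x_{t+1} = T(x_t) + sigma * xi_t with i.i.d. standard Gaussian xi_t."""
    rng = np.random.default_rng(seed)
    x = np.asarray(x0, dtype=float)
    for _ in range(steps):
        x = T(x) + sigma * rng.standard_normal(x.shape)
    return x

def clip(g, c):
    """Rescale g so that its Euclidean norm is at most c."""
    norm = np.linalg.norm(g)
    return g if norm <= c else g * (c / norm)

def clipped_gradient_step(grad, lr, c):
    """Operator T(x) = x - lr * clip(grad(x), c); iterating it with added noise is DP-SGD-like."""
    return lambda x: x - lr * clip(grad(x), c)

# Toy objective 0.5 * ||x - x_star||^2, whose gradient is x - x_star.
x_star = np.ones(5)
T = clipped_gradient_step(lambda x: x - x_star, lr=0.5, c=1.0)
x_final = noisy_fixed_point(T, np.zeros(5), sigma=0.01, steps=200)
```

Because T pulls every iterate toward x_star, the noisy iterates settle in a neighbourhood of x_star whose size scales with sigma; with sigma = 0 they converge to x_star.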

Learning · Neural Networks · Networking · Networks · Meta-learning

February 24, 2023

Meta Learning in Decentralized Neural Networks: Towards More General AI

From arXiv. Accepted to the AAAI 2023 Doctoral Consortium

Meta-learning usually refers to a learning algorithm that learns from other learning algorithms. The problem of uncertainty in the predictions of neural networks shows that the world is only partially predictable and that a learned neural network cannot generalize to its ever-changing surrounding environment. The question, therefore, is how a predictive model can represent multiple predictions simultaneously. We aim to provide a fundamental understanding of learning to learn in the context of Decentralized Neural Networks (Decentralized NNs), which we believe is one of the most important questions and prerequisites for building an autonomous intelligent machine. To this end, we present several pieces of evidence for tackling the problems above with Meta Learning in Decentralized NNs. In particular, we present three different approaches to building such a decentralized learning system: (1) learning from many replica neural networks, (2) building a hierarchy of neural networks for different functions, and (3) leveraging different modality experts to learn cross-modal representations.

Generalization theory · Analysis · Learning · contrastive · Lipschitz

February 24, 2023

Generalization Analysis for Contrastive Representation Learning

Yunwen Lei, Tianbao Yang, Yiming Ying, Ding-Xuan Zhou

Recently, contrastive learning has achieved impressive success in advancing the state of the art on various machine learning tasks. However, the existing generalization analysis is very limited or even not meaningful. In particular, existing generalization error bounds depend linearly on the number $k$ of negative examples, whereas it has been widely observed in practice that a large $k$ is necessary for contrastive learning to generalize well in downstream tasks. In this paper, we establish novel generalization bounds for contrastive learning that do not depend on $k$, up to logarithmic terms. Our analysis uses structural results on empirical covering numbers and Rademacher complexities to exploit the Lipschitz continuity of loss functions. For self-bounding Lipschitz loss functions, we further improve our results by developing optimistic bounds, which imply fast rates under a low-noise condition. We apply our results to learning with both linear representations and nonlinear representations given by deep neural networks, and for both we derive Rademacher complexity bounds to obtain improved generalization bounds.
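
To make the role of $k$ concrete, here is one common contrastive loss over $k$ negatives, the logistic form log(1 + sum_i exp(-(<a, p> - <a, n_i>))), where a, p, and n_i are the representations of the anchor, the positive, and the i-th negative. It is a sketch of one loss chosen by us for illustration, not necessarily the exact form analysed in the paper.

```python
import numpy as np

def logistic_contrastive_loss(anchor, positive, negatives):
    """log(1 + sum_i exp(-(<a, p> - <a, n_i>))) for one anchor and k negatives (rows of `negatives`)."""
    pos_score = anchor @ positive
    neg_scores = negatives @ anchor              # shape (k,)
    # log(1 + sum exp(v_i)) computed stably as a logsumexp over [0, v_1, ..., v_k].
    return np.logaddexp.reduce(np.concatenate(([0.0], neg_scores - pos_score)))

anchor = np.array([1.0, 0.0])
positive = np.array([1.0, 0.0])
negatives = np.tile([0.0, 1.0], (10, 1))  # k = 10 negatives, all orthogonal to the anchor
loss = logistic_contrastive_loss(anchor, positive, negatives)
```

An anchor aligned with its positive and orthogonal to every negative gives log(1 + k/e), so the loss value itself grows with $k$. The partial derivatives with respect to the $k$ score differences sum to less than one in absolute value, so this loss is 1-Lipschitz in the ℓ∞ norm of those differences.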

Learning · Neural Networks · Networking · Reducible · Networks

September 1, 2022

Learning with Differentiable Algorithms

From arXiv. PhD thesis (summa cum laude), University of Konstanz, 162 pages

Classic algorithms and machine learning systems like neural networks are both abundant in everyday life. While classic computer science algorithms are suitable for precise execution of exactly defined tasks such as finding the shortest path in a large graph, neural networks allow learning from data to predict the most likely answer in more complex tasks such as image classification, which cannot be reduced to an exact algorithm. To get the best of both worlds, this thesis explores combining both concepts leading to more robust, better performing, more interpretable, more computationally efficient, and more data efficient architectures. The thesis formalizes the idea of algorithmic supervision, which allows a neural network to learn from or in conjunction with an algorithm. When integrating an algorithm into a neural architecture, it is important that the algorithm is differentiable such that the architecture can be trained end-to-end and gradients can be propagated back through the algorithm in a meaningful way. To make algorithms differentiable, this thesis proposes a general method for continuously relaxing algorithms by perturbing variables and approximating the expectation value in closed form, i.e., without sampling. In addition, this thesis proposes differentiable algorithms, such as differentiable sorting networks, differentiable renderers, and differentiable logic gate networks. Finally, this thesis presents alternative training strategies for learning with algorithms.
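
The idea of relaxing an algorithm by perturbation, with the expectation taken in closed form, can be illustrated on a single conditional swap. If the comparison a < b is perturbed by logistic noise of scale s, it succeeds with probability sigmoid((b - a) / s), and the expected outputs are convex combinations of a and b. The sketch below, in the spirit of the thesis's differentiable sorting networks but not taken from it, applies this relaxed swap throughout an odd-even transposition sorting network.

```python
import math

def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)

def soft_swap(a, b, scale):
    """Relaxed conditional swap: expected (min, max) outputs when the
    comparison a < b is perturbed by logistic noise of the given scale."""
    p = sigmoid((b - a) / scale)   # probability that the perturbed comparison says a < b
    return p * a + (1 - p) * b, p * b + (1 - p) * a

def soft_odd_even_sort(values, scale):
    """Odd-even transposition sorting network (n layers) with every swap relaxed."""
    x = list(values)
    n = len(x)
    for layer in range(n):
        for i in range(layer % 2, n - 1, 2):
            x[i], x[i + 1] = soft_swap(x[i], x[i + 1], scale)
    return x

print(soft_odd_even_sort([3.0, 1.0, 2.0], scale=1.0))   # smooth, partially sorted output
print(soft_odd_even_sort([3.0, 1.0, 2.0], scale=0.01))  # close to a hard sort
```

As the scale tends to zero the network approaches a hard sort, while larger scales give smoother, differentiable outputs. Every relaxed swap preserves a + b, so the sum of the inputs is unchanged.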

Active learning · Free energy · Extensibility · Learned · TAP

December 2, 2021

Active Learning for Domain Adaptation: An Energy-based Approach

Binhui Xie, Longhui Yuan, Shuang Li, Chi Harold Liu, Xinjing Cheng, Guoren Wang

From arXiv. Accepted by AAAI 2022. Code is available at //github.com/BIT-DA/EADA

Unsupervised domain adaptation has recently emerged as an effective paradigm for generalizing deep neural networks to new target domains. However, there is still enormous untapped potential for reaching fully supervised performance. In this paper, we present a novel active learning strategy, dubbed active domain adaptation, to assist knowledge transfer in the target domain. We start from the observation that energy-based models exhibit free energy biases when training (source) and test (target) data come from different distributions. Inspired by this inherent mechanism, we empirically show that a simple yet efficient energy-based sampling strategy selects more valuable target samples than existing approaches that require particular architectures or the computation of distances. Our algorithm, Energy-based Active Domain Adaptation (EADA), queries groups of target data that incorporate both domain characteristics and instance uncertainty in every selection round. Meanwhile, by using a regularization term to align the free energy of target data compactly around the source domain, the domain gap can be implicitly diminished. Through extensive experiments, we show that EADA surpasses state-of-the-art methods on well-known challenging benchmarks with substantial improvements, making it a useful option in the open world.
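
The free-energy score can be computed directly from a classifier's logits as F(x) = -T log sum_c exp(f_c(x) / T). This minimal sketch ranks target samples by free energy alone and queries the highest-scoring ones, assuming, as in energy-based out-of-distribution scoring, that higher free energy indicates samples less like the source domain. It uses a temperature of 1 and omits EADA's instance-uncertainty criterion and free-energy alignment regularizer, so it covers only the first ingredient of the method; the function names are our own.

```python
import numpy as np

def free_energy(logits, temperature=1.0):
    """Free energy F(x) = -T * log sum_c exp(logit_c / T), one value per row of logits."""
    return -temperature * np.logaddexp.reduce(logits / temperature, axis=1)

def select_by_free_energy(target_logits, budget):
    """Hypothetical query step: indices of the `budget` target samples with the highest free energy."""
    scores = free_energy(np.asarray(target_logits, dtype=float))
    return np.argsort(-scores)[:budget]

# A confident sample (large logit gap) has low free energy; a flat, low-magnitude one has high free energy.
target_logits = [[10.0, 0.0], [0.1, 0.0], [5.0, 5.0]]
print(select_by_free_energy(target_logits, budget=2))
```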
