
Gradient compression is a popular technique for improving the communication complexity of stochastic first-order methods in distributed training of machine learning models. However, existing works consider only with-replacement sampling of stochastic gradients. In contrast, it is well known in practice, and was recently confirmed in theory, that stochastic methods based on without-replacement sampling, e.g., the Random Reshuffling (RR) method, perform better than those that sample gradients with replacement. In this work, we close this gap in the literature and provide the first analysis of methods with gradient compression and without-replacement sampling. We first develop a naïve combination of random reshuffling with gradient compression (Q-RR). Perhaps surprisingly, the theoretical analysis of Q-RR does not show any benefit of using RR; this happens due to the additional compression variance, and our extensive numerical experiments confirm the phenomenon. To reveal the true advantages of RR in distributed learning with compression, we propose a new method called DIANA-RR that reduces the compression variance and has provably better convergence rates than existing counterparts that sample stochastic gradients with replacement. Next, to better fit Federated Learning applications, we incorporate local computation: we propose and analyze variants of Q-RR and DIANA-RR, called Q-NASTYA and DIANA-NASTYA, that use local gradient steps and different local and global stepsizes. Finally, we conduct several numerical experiments to illustrate our theoretical results.
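
To make the naïve combination concrete, below is a minimal sketch of Q-RR under stated assumptions: an unbiased rand-k sparsifier as the compressor and a constant stepsize (neither is fixed by this abstract), applied to a toy least-squares objective.

```python
import numpy as np

def rand_k(v, k, rng):
    # Unbiased rand-k sparsifier: keep k random coordinates, rescale by d/k
    d = v.size
    idx = rng.choice(d, size=k, replace=False)
    out = np.zeros_like(v)
    out[idx] = v[idx] * (d / k)
    return out

def q_rr(grad_i, n, x0, lr, epochs, k, seed=0):
    # Naive Q-RR: one pass per epoch over a fresh permutation,
    # compressing each per-sample gradient before the step.
    rng = np.random.default_rng(seed)
    x = x0.copy()
    for _ in range(epochs):
        for i in rng.permutation(n):        # without-replacement sampling
            x -= lr * rand_k(grad_i(x, i), k, rng)
    return x

# Toy least-squares instance: f_i(x) = 0.5 * (a_i @ x - b_i) ** 2
rng = np.random.default_rng(1)
A, b = rng.normal(size=(100, 10)), rng.normal(size=100)
x_hat = q_rr(lambda x, i: (A[i] @ x - b[i]) * A[i],
             n=100, x0=np.zeros(10), lr=0.01, epochs=50, k=3)
```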

Related Content

We consider the general problem of Bayesian binary regression and introduce a new class of distributions, the Perturbed Unified Skew Normal (pSUN), which generalizes the SUN class. We show that the new class is conjugate to any binary regression model, provided that the link function may be expressed as a scale mixture of Gaussian densities. We discuss the popular logit case in detail and show that, when a logistic regression model is combined with a Gaussian prior, posterior summaries such as the cumulants and the normalizing constant can be easily obtained, opening the way to straightforward variable selection procedures. For more general priors, the proposed methodology is based on a straightforward Gibbs sampler algorithm. In the p > n case, we also claim that it shows better performance, in terms of both mixing and accuracy, than existing methods. We illustrate the performance of the proposal through a simulation study and two real datasets, one covering the standard case with n >> p and the other related to the p >> n situation.
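
The abstract does not spell out the pSUN full conditionals, so as an illustration of the scale-mixture-of-Gaussians idea, here is a standard Pólya-Gamma data-augmentation Gibbs sampler for logistic regression with a Gaussian prior (Polson et al.'s construction, not the paper's pSUN sampler); it assumes the third-party polyagamma package is available.

```python
import numpy as np
from polyagamma import random_polyagamma  # third-party PG sampler (assumed installed)

def gibbs_logit(X, y, n_iter=2000, prior_var=10.0, seed=0):
    # Polya-Gamma data-augmentation Gibbs sampler for Bayesian logistic
    # regression with a N(0, prior_var * I) prior; y takes values in {0, 1}.
    rng = np.random.default_rng(seed)
    n, p = X.shape
    beta = np.zeros(p)
    P0 = np.eye(p) / prior_var            # prior precision
    kappa = y - 0.5
    draws = np.empty((n_iter, p))
    for t in range(n_iter):
        omega = random_polyagamma(1.0, X @ beta, random_state=rng)  # latent scales
        V = np.linalg.inv(X.T @ (omega[:, None] * X) + P0)          # conditional covariance
        m = V @ (X.T @ kappa)                                       # conditional mean
        beta = rng.multivariate_normal(m, V)
        draws[t] = beta
    return draws
```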

Extreme quantile regression provides estimates of conditional quantiles outside the range of the data. Classical quantile regression performs poorly in such cases since data in the tail region are too scarce. Extreme value theory is used for extrapolation beyond the range of observed values and estimation of conditional extreme quantiles. Based on the peaks-over-threshold approach, the conditional distribution above a high threshold is approximated by a generalized Pareto distribution with covariate-dependent parameters. We propose a gradient boosting procedure to estimate a conditional generalized Pareto distribution by minimizing its deviance. Cross-validation is used for the choice of tuning parameters such as the number of trees and the tree depths. We discuss diagnostic plots such as variable importance and partial dependence plots, which help to interpret the fitted models. In simulation studies we show that our gradient boosting procedure outperforms classical methods from quantile regression and extreme value theory, especially for high-dimensional predictor spaces and complex parameter response surfaces. An application to statistical post-processing of weather forecasts with precipitation data in the Netherlands is presented.
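
As a sketch of the boosting recipe, the snippet below fits regression trees to the negative gradient of the GPD deviance with respect to a log-scale parameter, holding the shape xi fixed; the paper boosts covariate-dependent scale and shape jointly, so this is a simplified illustration rather than the authors' procedure.

```python
import numpy as np
from sklearn.tree import DecisionTreeRegressor

def boost_gpd_scale(X, z, xi=0.2, n_trees=100, lr=0.1, max_depth=2):
    # Boost eta = log(sigma) of a GPD with fixed shape xi by fitting trees
    # to the negative gradient of the deviance; z are positive exceedances.
    eta = np.full(len(z), np.log(z.mean()))      # constant initialization
    trees = []
    for _ in range(n_trees):
        sigma = np.exp(eta)
        neg_grad = (1.0 + xi) * z / (sigma + xi * z) - 1.0
        tree = DecisionTreeRegressor(max_depth=max_depth).fit(X, neg_grad)
        eta += lr * tree.predict(X)
        trees.append(tree)
    return trees, eta
```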

Convex function constrained optimization has received growing research interest lately. For a special class of convex problems with strongly convex function constraints, we develop a new accelerated primal-dual first-order method that obtains an $\mathcal{O}(1/\sqrt{\varepsilon})$ complexity bound, improving on the $\mathcal{O}(1/\varepsilon)$ result for the state-of-the-art first-order methods. The key ingredient of our development is a set of novel techniques for progressively estimating the strong convexity of the Lagrangian function, which enables adaptive step-size selection and faster convergence. In addition, we show that the complexity is further improvable in its dependence on certain problem parameters, via a restart scheme that calls the accelerated method repeatedly. As an application, we consider sparsity-inducing constrained optimization, which has a separable convex objective and a strongly convex loss constraint. In addition to achieving fast convergence, we show that the restarted method can effectively identify the sparsity pattern (active set) of the optimal solution in finitely many steps. To the best of our knowledge, this is the first active-set identification result for sparsity-inducing constrained optimization.
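
The restart pattern can be sketched generically: warm-start the accelerated method from its own output with a growing inner budget. The interface accel_method(x_init, n_iters) is a placeholder assumption; the paper's actual restart condition and budgets are not given in this abstract.

```python
def restarted(accel_method, x0, rounds=5, budget0=100):
    # Generic restart wrapper: warm-start the accelerated method from its
    # own output while doubling the inner iteration budget each round.
    # accel_method(x_init, n_iters) -> x_out is an assumed interface.
    x, budget = x0, budget0
    for _ in range(rounds):
        x = accel_method(x, budget)
        budget *= 2
    return x
```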

To address the privacy leakage problem in decentralized composite convex optimization, we propose a novel differentially private decentralized primal-dual algorithm named DP-RECAL, based on an operator splitting method and a relay communication mechanism. We study the relationship between communication and privacy leakage, and define a new measure: local communication involvement (LCI). To the best of our knowledge, compared with existing differentially private algorithms, DP-RECAL is the first to take advantage of the relay communication mechanism to incur lower LCI and thus reduce the overall privacy budget. In addition, we prove that DP-RECAL converges with uncoordinated, network-independent stepsizes and establish its linear convergence rate under metric subregularity. Furthermore, taking the least squares problem as an example, DP-RECAL offers better privacy performance and communication complexity than existing differentially private decentralized algorithms. Numerical experiments on real-world datasets verify our analysis and demonstrate that DP-RECAL can defend against deep leakage from gradients (DLG) attacks.
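
DP-RECAL's full primal-dual recursion is not reproduced in this abstract, but its differential-privacy building block, clipping a local update and adding calibrated Gaussian noise before the message leaves a node, can be sketched as follows (the clipping threshold and noise scale are illustrative assumptions):

```python
import numpy as np

def privatize_update(grad, clip=1.0, sigma=1.0, rng=None):
    # Gaussian mechanism: clip the local update to bound its sensitivity,
    # then add calibrated noise before it is relayed to a neighbor.
    rng = rng or np.random.default_rng()
    norm = np.linalg.norm(grad)
    g = grad * min(1.0, clip / norm) if norm > 0 else grad
    return g + rng.normal(0.0, sigma * clip, size=g.shape)
```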

In this paper, we study the federated bilevel optimization problem, which has widespread applications in machine learning. In particular, we develop two momentum-based algorithms for optimizing this class of problems and establish their convergence rates, providing the sample and communication complexities. Importantly, to the best of our knowledge, our convergence rate is the first to achieve linear speedup with respect to the number of devices for federated bilevel optimization algorithms. Finally, our extensive experimental results confirm the effectiveness of both algorithms.
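
A hedged sketch of one communication round of a generic momentum-based federated step is shown below; the paper's bilevel hypergradient estimator and exact momentum form are omitted, so treat the update as a template rather than the authors' algorithm.

```python
import numpy as np

def fed_momentum_round(x, client_grads, m, beta=0.9, lr=0.1):
    # One communication round: average the client (hyper)gradients, update
    # a server-side momentum buffer, and take a global step.
    g_avg = np.mean(client_grads, axis=0)
    m = beta * m + (1.0 - beta) * g_avg
    return x - lr * m, m
```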

To date, most directed acyclic graph (DAG) structure learning approaches require data to be stored on a central server. However, out of concern for privacy, data owners increasingly refuse to share their raw data to avoid leaking private information, which cuts off the first step of this task and makes it considerably harder. Thus, a puzzle arises: how do we discover the underlying DAG structure from decentralized data? In this paper, focusing on the additive noise models (ANMs) assumption of data generation, we take the first step toward a gradient-based learning framework, named FedDAG, which can learn the DAG structure without directly touching the local data and can naturally handle data heterogeneity. Our method benefits from a two-level structure of each local model: the first level learns the edges and directions of the graph and communicates with the server to obtain model information from other clients during the learning procedure, while the second level approximates the mechanisms among variables and is updated locally on each client's own data to accommodate the data heterogeneity. Moreover, FedDAG formulates the overall learning task as a continuous optimization problem by taking advantage of an equality acyclicity constraint, which can be solved by gradient descent methods to boost search efficiency. Extensive experiments on both synthetic and real-world datasets verify the efficacy of the proposed method.
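
The equality acyclicity constraint is not written out in this abstract; the widely used NOTEARS characterization, h(W) = tr(exp(W ∘ W)) − d = 0, is one standard instance of such a constraint and can be checked in a few lines:

```python
import numpy as np
from scipy.linalg import expm

def acyclicity(W):
    # NOTEARS-style equality constraint: h(W) = tr(exp(W * W)) - d,
    # which equals zero iff the weighted adjacency matrix W is acyclic.
    return np.trace(expm(W * W)) - W.shape[0]

print(acyclicity(np.array([[0.0, 0.8], [0.0, 0.0]])))  # ~0.0: acyclic
print(acyclicity(np.array([[0.0, 0.8], [0.5, 0.0]])))  # > 0: contains a cycle
```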

Estimating software effort has been a largely unsolved problem for decades. One of the main obstacles to building accurate estimation models is the often heterogeneous nature of software data and its complex structure. Typically, effort estimation models built from local data tend to be more accurate than those using the entire dataset. Previous studies have focused on clustering techniques and decision trees to generate local, coherent data that can help in building local prediction models. However, these approaches may fall short in some aspects due to limitations in finding optimal clusters and handling noisy data. In this paper, we use a more sophisticated locality approach that can mitigate these shortcomings: Locally Weighted Regression (LWR). This method provides an efficient way to learn from local data by building an estimation model that combines multiple local regression models in a k-nearest-neighbor-based model. The main factor affecting the accuracy of this method is the choice of the kernel function used to derive the weights for the local regression models. This paper investigates the effects of choosing different kernels on the performance of Locally Weighted Regression for software effort estimation. After comprehensive experiments with 7 datasets, 10 kernels, 3 polynomial degrees, and 4 bandwidth values, for a total of 840 Locally Weighted Regression variants, we found that: 1) uniform kernel functions cannot outperform non-uniform kernel functions, and 2) kernel type, polynomial degree, and bandwidth have no consistent effect on estimation accuracy.
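
For reference, a minimal univariate LWR sketch with a Gaussian kernel is given below; the kernel, bandwidth, and polynomial degree are exactly the knobs the experiments vary (the Gaussian kernel here is just one of the 10 kernels studied).

```python
import numpy as np

def lwr_predict(X, y, x0, bandwidth=1.0, degree=1):
    # Locally Weighted Regression at a query point x0: weight training points
    # with a Gaussian kernel on distance, then solve a weighted polynomial fit.
    d = np.linalg.norm(X - x0, axis=1)
    w = np.exp(-0.5 * (d / bandwidth) ** 2)
    Phi = np.vander(X.ravel(), degree + 1)           # univariate design matrix
    phi0 = np.vander(np.atleast_1d(x0).ravel(), degree + 1)
    WPhi = Phi * w[:, None]
    beta, *_ = np.linalg.lstsq(WPhi.T @ Phi, WPhi.T @ y, rcond=None)
    return (phi0 @ beta).item()

X = np.linspace(0.0, 1.0, 50).reshape(-1, 1)
y = np.sin(2 * np.pi * X.ravel())
print(lwr_predict(X, y, np.array([0.5]), bandwidth=0.1))  # ~sin(pi) = 0
```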

Classic algorithms and machine learning systems like neural networks are both abundant in everyday life. While classic computer science algorithms are suitable for precise execution of exactly defined tasks such as finding the shortest path in a large graph, neural networks allow learning from data to predict the most likely answer in more complex tasks such as image classification, which cannot be reduced to an exact algorithm. To get the best of both worlds, this thesis explores combining both concepts leading to more robust, better performing, more interpretable, more computationally efficient, and more data efficient architectures. The thesis formalizes the idea of algorithmic supervision, which allows a neural network to learn from or in conjunction with an algorithm. When integrating an algorithm into a neural architecture, it is important that the algorithm is differentiable such that the architecture can be trained end-to-end and gradients can be propagated back through the algorithm in a meaningful way. To make algorithms differentiable, this thesis proposes a general method for continuously relaxing algorithms by perturbing variables and approximating the expectation value in closed form, i.e., without sampling. In addition, this thesis proposes differentiable algorithms, such as differentiable sorting networks, differentiable renderers, and differentiable logic gate networks. Finally, this thesis presents alternative training strategies for learning with algorithms.
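
As one concrete instance of such relaxations, here is a hedged sketch of a differentiable sorting network built from logistic soft swaps; the thesis's own relaxation (perturbing variables and taking closed-form expectations) is more general, so this is illustrative only.

```python
import numpy as np

def soft_swap(a, b, tau=1.0):
    # Differentiable conditional swap: a logistic comparison with temperature
    # tau relaxes the hard (min, max) used in a sorting network.
    p = 1.0 / (1.0 + np.exp(-(b - a) / tau))   # ~1 when already ordered
    return p * a + (1 - p) * b, p * b + (1 - p) * a

def soft_sort(x, tau=0.1):
    # Odd-even transposition network of soft swaps; as tau -> 0 this
    # approaches a hard sort while staying differentiable for tau > 0.
    x = list(x)
    n = len(x)
    for step in range(n):
        for i in range(step % 2, n - 1, 2):
            x[i], x[i + 1] = soft_swap(x[i], x[i + 1], tau)
    return x

print(soft_sort([3.0, 1.0, 2.0]))   # approximately [1.0, 2.0, 3.0]
```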

Since deep neural networks were developed, they have made huge contributions to everyday life. Machine learning provides more rational advice than humans are capable of in almost every aspect of daily life. However, despite this achievement, the design and training of neural networks remain challenging and unpredictable procedures. To lower the technical threshold for non-expert users, automated hyper-parameter optimization (HPO) has become a popular topic in both academia and industry. This paper provides a review of the most essential topics in HPO. The first section introduces the key hyper-parameters related to model training and structure, discussing their importance and methods for defining their value ranges. The paper then focuses on major optimization algorithms and their applicability, covering their efficiency and accuracy, especially for deep learning networks. Next, it reviews major services and toolkits for HPO, comparing their support for state-of-the-art search algorithms, compatibility with major deep learning frameworks, and extensibility for user-designed modules. The paper concludes with open problems in applying HPO to deep learning, a comparison of optimization algorithms, and prominent approaches for model evaluation under limited computational resources.
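
As a minimal, hedged illustration of the simplest HPO baseline discussed in such reviews, random search over a box-shaped search space can be written in a few lines (the objective and space interface shown is hypothetical):

```python
import random

def random_search(objective, space, n_trials=50, seed=0):
    # Minimal random-search HPO: sample each hyper-parameter uniformly from
    # its range and keep the best-scoring configuration found.
    rng = random.Random(seed)
    best_cfg, best_score = None, float("-inf")
    for _ in range(n_trials):
        cfg = {name: rng.uniform(lo, hi) for name, (lo, hi) in space.items()}
        score = objective(cfg)
        if score > best_score:
            best_cfg, best_score = cfg, score
    return best_cfg, best_score

# e.g. random_search(lambda c: -(c["lr"] - 0.01) ** 2,
#                    {"lr": (1e-4, 1e-1), "dropout": (0.0, 0.5)})
```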

This paper surveys the machine learning literature and presents machine learning approaches as optimization models. Such models can benefit from advances in numerical optimization techniques, which have already played a distinctive role in several machine learning settings. In particular, mathematical optimization models are presented for commonly used machine learning approaches to regression, classification, clustering, and deep neural networks, as well as for the emerging applications of machine teaching and empirical model learning. The strengths and shortcomings of these models are discussed, and potential research directions are highlighted.
