
We present a new class of Langevin-based algorithms, which overcomes many of the known shortcomings of popular adaptive optimizers that are currently used for the fine-tuning of deep learning models. Its underpinning theory relies on recent advances in Euler's polygonal approximations for stochastic differential equations (SDEs) with monotone coefficients. As a result, it inherits the stability properties of tamed algorithms, while also addressing other known issues, e.g., vanishing gradients in neural networks. In particular, we provide a nonasymptotic analysis and full theoretical guarantees for the convergence properties of an algorithm of this novel class, which we name TH$\varepsilon$O POULA (or, simply, TheoPouLa). Finally, several experiments are presented with different types of deep learning models, which show the superior performance of TheoPouLa over many popular adaptive optimization algorithms.
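To make the flavor of such a scheme concrete, the sketch below shows a single generic tamed-Euler Langevin update in NumPy. It is not the exact TheoPouLa drift; the step size lam, inverse temperature beta, and grad_fn are illustrative placeholders.

```python
import numpy as np

def tamed_langevin_step(theta, grad_fn, lam=1e-2, beta=1e8, rng=None):
    """One generic tamed (Euler) Langevin update.

    A minimal sketch of the idea behind tamed Langevin optimizers: the
    drift is rescaled so that a single step stays bounded even when the
    stochastic gradient blows up. This is NOT the exact TheoPouLa drift;
    lam (step size) and beta (inverse temperature) are illustrative.
    """
    rng = rng or np.random.default_rng()
    g = grad_fn(theta)                       # stochastic gradient estimate
    tamed = g / (1.0 + lam * np.abs(g))      # elementwise taming keeps the drift bounded
    noise = rng.standard_normal(theta.shape)
    return theta - lam * tamed + np.sqrt(2.0 * lam / beta) * noise
```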

Related Content

Neural Networks is the archival journal of the world's three oldest neural modeling societies: the International Neural Network Society (INNS), the European Neural Network Society (ENNS), and the Japanese Neural Network Society (JNNS). Neural Networks provides a forum for developing and nurturing an international community of scholars and practitioners interested in all aspects of neural networks and related approaches to computational intelligence. Neural Networks welcomes submissions of high-quality papers that contribute to the full range of neural network research, from behavioral and brain modeling and learning algorithms, through mathematical and computational analysis, to systems engineering and technological applications that make substantial use of neural network concepts and techniques. This uniquely broad scope promotes the exchange of ideas between biological and technological research and helps foster the development of an interdisciplinary community interested in biologically inspired computational intelligence. Accordingly, the Neural Networks editorial board represents expertise in psychology, neurobiology, computer science, engineering, mathematics, and physics. The journal publishes articles, letters, and reviews, as well as letters to the editor, editorials, current events, software surveys, and patent information. Articles are published in one of five sections: cognitive science, neuroscience, learning systems, mathematics and computational analysis, and engineering and applications. Official website:

As neural networks become more widely deployed, confidence in their predictions has become increasingly important. However, basic neural networks do not deliver certainty estimates, or they suffer from over- or under-confidence. Many researchers have been working on understanding and quantifying uncertainty in a neural network's prediction. As a result, different types and sources of uncertainty have been identified, and a variety of approaches to measure and quantify uncertainty in neural networks have been proposed. This work gives a comprehensive overview of uncertainty estimation in neural networks, reviews recent advances in the field, highlights current challenges, and identifies potential research opportunities. It is intended to give anyone interested in uncertainty estimation in neural networks a broad overview and introduction, without presupposing prior knowledge in this field. A comprehensive introduction to the most crucial sources of uncertainty is given, together with their separation into reducible model uncertainty and irreducible data uncertainty. The modeling of these uncertainties based on deterministic neural networks, Bayesian neural networks, ensembles of neural networks, and test-time data augmentation is introduced, and different branches of these fields as well as the latest developments are discussed. For practical applications, we discuss different measures of uncertainty and approaches for the calibration of neural networks, and give an overview of existing baselines and implementations. Examples from the wide spectrum of challenges in different fields give an idea of the needs and challenges regarding uncertainty in practical applications. Additionally, the practical limitations of current methods for mission- and safety-critical real-world applications are discussed, and an outlook on the next steps towards broader usage of such methods is given.
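As a concrete illustration of the ensemble branch discussed above, the sketch below shows a common decomposition of predictive uncertainty into a data (aleatoric) and a model (epistemic) part from the softmax outputs of hypothetical ensemble members; the function name and array shapes are our own.

```python
import numpy as np

def ensemble_uncertainty(prob_list):
    """Decompose the predictive uncertainty of an ensemble.

    prob_list: list of (n_samples, n_classes) softmax outputs, one per
    ensemble member (hypothetical models trained with different seeds).
    Total uncertainty = entropy of the mean prediction; data (aleatoric)
    uncertainty = mean member entropy; model (epistemic) uncertainty =
    their difference (the mutual information).
    """
    probs = np.stack(prob_list)                             # (K, N, C)
    mean_p = probs.mean(axis=0)                             # ensemble prediction
    eps = 1e-12
    total = -(mean_p * np.log(mean_p + eps)).sum(-1)        # entropy of the mean
    data = -(probs * np.log(probs + eps)).sum(-1).mean(0)   # mean member entropy
    model = total - data                                    # epistemic part
    return total, data, model
```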

Despite their overwhelming capacity to overfit, deep neural networks trained by specific optimization algorithms tend to generalize well to unseen data. Recently, researchers have explained this by investigating the implicit regularization effect of optimization algorithms. A notable advance is the work of Lyu & Li (2019), which proves that gradient descent (GD) maximizes the margin of homogeneous deep neural networks. Beyond GD, adaptive algorithms such as AdaGrad, RMSProp, and Adam are popular owing to their rapid training process. However, theoretical guarantees for the generalization of adaptive optimization algorithms are still lacking. In this paper, we study the implicit regularization of adaptive optimization algorithms when they optimize the logistic loss on homogeneous deep neural networks. We prove that adaptive algorithms that adopt an exponential moving average strategy in the conditioner (such as Adam and RMSProp) can maximize the margin of the neural network, while AdaGrad, which directly sums historical squared gradients in the conditioner, cannot. This indicates an advantage in generalization for the exponential moving average strategy in the design of the conditioner. Technically, we provide a unified framework to analyze the convergent direction of adaptive optimization algorithms by constructing a novel adaptive gradient flow and surrogate margin. Our experiments support the theoretical findings on the convergent direction of adaptive optimization algorithms.
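The sketch below contrasts the two conditioner designs analyzed above: the exponential-moving-average second moment of Adam/RMSProp versus the cumulative sum of AdaGrad. Hyperparameter names and default values are illustrative only.

```python
import numpy as np

def conditioner_update(v, grad, method="ema", beta2=0.999):
    """Second-moment 'conditioner' used to rescale the gradient.

    'ema' (Adam/RMSProp style) keeps an exponential moving average of
    squared gradients, while 'sum' (AdaGrad style) accumulates them, so
    AdaGrad's effective step size shrinks monotonically.
    """
    if method == "ema":
        return beta2 * v + (1.0 - beta2) * grad ** 2
    if method == "sum":
        return v + grad ** 2
    raise ValueError(method)

def preconditioned_step(theta, grad, v, lr=1e-3, eps=1e-8):
    """Apply the conditioned update: theta <- theta - lr * g / sqrt(v)."""
    return theta - lr * grad / (np.sqrt(v) + eps)
```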

While many existing graph neural networks (GNNs) have been proven to perform $\ell_2$-based graph smoothing that enforces smoothness globally, in this work we aim to further enhance the local smoothness adaptivity of GNNs via $\ell_1$-based graph smoothing. As a result, we introduce a family of GNNs (Elastic GNNs) based on $\ell_1$- and $\ell_2$-based graph smoothing. In particular, we propose a novel and general message passing scheme for GNNs. This message passing algorithm is not only friendly to back-propagation training but also achieves the desired smoothing properties with a theoretical convergence guarantee. Experiments on semi-supervised learning tasks demonstrate that the proposed Elastic GNNs obtain better adaptivity on benchmark datasets and are significantly more robust to graph adversarial attacks. The implementation of Elastic GNNs is available at \url{//github.com/lxiaorui/ElasticGNN}.
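To convey the intuition behind combining $\ell_1$- and $\ell_2$-based smoothing, the rough sketch below performs one gradient step on an edge-wise smoothing objective whose $\ell_1$ part is Huber-smoothed; it is not the paper's exact elastic message passing scheme, and all parameter names are ours.

```python
import numpy as np

def soft_threshold(z, lam):
    """Proximal operator of lam * |.|: elementwise soft-thresholding."""
    return np.sign(z) * np.maximum(np.abs(z) - lam, 0.0)

def elastic_smoothing_step(X, edges, lam1=0.1, lam2=0.1, step=0.5):
    """One gradient step of an illustrative l1 + l2 graph-smoothing objective.

    Sketch of the intuition only: the objective sums, over edges (i, j),
    (lam2/2)*||X_i - X_j||^2 + Huber_lam1(X_i - X_j). The Huber term (a
    smoothed l1 penalty) saturates for large differences, so sharp local
    changes are penalized less than under pure l2 smoothing.
    """
    grad = np.zeros_like(X)
    for i, j in edges:
        diff = X[i] - X[j]
        g_l2 = lam2 * diff                          # l2 part: plain averaging pressure
        g_l1 = diff - soft_threshold(diff, lam1)    # Huber/l1 part: clipped at +-lam1
        grad[i] += g_l2 + g_l1
        grad[j] -= g_l2 + g_l1
    return X - step * grad
```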

Despite the considerable success of neural networks in security settings such as malware detection, such models have proved vulnerable to evasion attacks, in which attackers make slight changes to inputs (e.g., malware) to bypass detection. We propose a novel approach, \emph{Fourier stabilization}, for designing evasion-robust neural networks with binary inputs. This approach, which is complementary to other forms of defense, replaces the weights of individual neurons with robust analogs derived using Fourier analytic tools. The choice of which neurons to stabilize in a neural network is then a combinatorial optimization problem, and we propose several methods for approximately solving it. We provide a formal bound on the per-neuron drop in accuracy due to Fourier stabilization, and experimentally demonstrate the effectiveness of the proposed approach in boosting robustness of neural networks in several detection settings. Moreover, we show that our approach effectively composes with adversarial training.
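One way to read the Fourier-analytic idea is sketched below: treat a binary-input neuron as a function on $\{-1,+1\}^n$ and estimate its degree-1 Fourier coefficients by Monte Carlo, which can then serve as robust replacement weights. The paper's exact construction and rescaling may differ, and the usage shown is hypothetical.

```python
import numpy as np

def degree_one_fourier_weights(neuron_fn, n_inputs, n_samples=4096, seed=0):
    """Estimate degree-1 Fourier coefficients of a binary-input neuron.

    View the neuron as f on {-1, +1}^n and estimate hat_f(i) = E[f(x) * x_i]
    by sampling. Using these coefficients (suitably rescaled) in place of
    the original weights is the flavour of 'stabilization' described above.
    """
    rng = np.random.default_rng(seed)
    X = rng.choice([-1.0, 1.0], size=(n_samples, n_inputs))
    f = neuron_fn(X)                      # shape (n_samples,), values in {-1, +1}
    return (X * f[:, None]).mean(axis=0)  # hat_f(i) for each input coordinate

# Hypothetical usage with a sign-activation neuron:
# w, b = np.random.randn(20), 0.1
# neuron = lambda X: np.sign(X @ w + b)
# w_robust = degree_one_fourier_weights(neuron, 20)
```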

Sampling methods (e.g., node-wise, layer-wise, or subgraph sampling) have become an indispensable strategy for speeding up the training of large-scale Graph Neural Networks (GNNs). However, existing sampling methods are mostly based on graph structural information and ignore the dynamics of optimization, which leads to high variance in estimating the stochastic gradients. The high-variance issue can be very pronounced in extremely large graphs, where it results in slow convergence and poor generalization. In this paper, we theoretically analyze the variance of sampling methods and show that, due to the composite structure of the empirical risk, the variance of any sampling method can be decomposed into \textit{embedding approximation variance} in the forward stage and \textit{stochastic gradient variance} in the backward stage, and that both types of variance must be mitigated to obtain a faster convergence rate. We propose a decoupled variance reduction strategy that employs (approximate) gradient information to adaptively sample nodes with minimal variance, and explicitly reduces the variance introduced by embedding approximation. We show theoretically and empirically that the proposed method, even with smaller mini-batch sizes, enjoys a faster convergence rate and achieves better generalization compared to existing methods.
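A minimal sketch of the gradient-informed sampling component is given below: nodes are sampled with probability proportional to approximate per-node gradient norms, and inverse-probability weights keep the sampled gradient sum unbiased. The paper's decoupled scheme additionally controls the embedding-approximation variance of the forward pass, which is not modeled here.

```python
import numpy as np

def adaptive_node_sample(grad_norms, batch_size, rng=None):
    """Sample nodes proportionally to their (approximate) gradient norms.

    Returns sampled indices and importance weights such that the weighted
    sum of the sampled per-node gradients is an unbiased estimate of the
    full-graph gradient sum (sampling with replacement).
    """
    rng = rng or np.random.default_rng()
    p = grad_norms / grad_norms.sum()              # variance-aware proposal
    idx = rng.choice(len(grad_norms), size=batch_size, replace=True, p=p)
    weights = 1.0 / (batch_size * p[idx])          # inverse-probability weights
    return idx, weights
```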

When and why can a neural network be successfully trained? This article provides an overview of optimization algorithms and theory for training neural networks. First, we discuss the issue of gradient explosion/vanishing and the more general issue of undesirable spectrum, and then discuss practical solutions including careful initialization and normalization methods. Second, we review generic optimization methods used in training neural networks, such as SGD, adaptive gradient methods and distributed methods, and theoretical results for these algorithms. Third, we review existing research on the global issues of neural network training, including results on bad local minima, mode connectivity, lottery ticket hypothesis and infinite-width analysis.
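As a small example of the "careful initialization" remedy mentioned above, the sketch below implements He (Kaiming) initialization, which scales the weight variance by 2 / fan_in to keep activation and gradient magnitudes roughly stable across ReLU layers.

```python
import numpy as np

def he_init(fan_in, fan_out, rng=None):
    """He (Kaiming) initialization for a ReLU layer.

    Drawing weights with variance 2 / fan_in keeps the variance of the
    activations (and, roughly, of the backpropagated gradients) stable
    across layers, mitigating gradient explosion/vanishing.
    """
    rng = rng or np.random.default_rng()
    return rng.standard_normal((fan_in, fan_out)) * np.sqrt(2.0 / fan_in)
```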

Many real-world problems can be represented as graph-based learning problems. In this paper, we propose a novel framework for learning spatial and attentional convolutional neural networks on arbitrary graphs. Different from previous convolutional neural networks on graphs, we first design a motif-matching guided subgraph normalization method to capture neighborhood information. We then implement subgraph-level self-attentional layers that learn the different importance of different subgraphs for graph classification problems. Analogous to image-based attentional convolutional networks that operate on locally connected and weighted regions of the input, we also extend graph normalization from a one-dimensional node sequence to a two-dimensional node grid by leveraging motif-matching, and design self-attentional layers without requiring any costly operations that depend on prior knowledge of the graph structure. Our results on both bioinformatics and social network datasets show that we can significantly improve on graph classification benchmarks over traditional graph kernels and existing deep models.
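The sketch below illustrates the subgraph-level attention idea in its simplest form: score each subgraph embedding, softmax the scores, and pool. The attention parameterization is a hypothetical single learned vector, not the paper's exact layer.

```python
import numpy as np

def subgraph_attention_pool(subgraph_embs, w_att):
    """Attention-weighted pooling over subgraph embeddings.

    subgraph_embs: (n_subgraphs, d) array of subgraph representations.
    w_att: hypothetical learned attention vector of shape (d,).
    """
    scores = subgraph_embs @ w_att              # one score per subgraph
    weights = np.exp(scores - scores.max())
    weights /= weights.sum()                    # softmax over subgraphs
    return (weights[:, None] * subgraph_embs).sum(axis=0)  # graph-level embedding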

We propose accelerated randomized coordinate descent algorithms for stochastic optimization and online learning. Our algorithms have significantly lower per-iteration complexity than the known accelerated gradient algorithms. The proposed algorithms for online learning achieve better regret performance than the known randomized online coordinate descent algorithms. Furthermore, the proposed algorithms for stochastic optimization exhibit convergence rates as good as those of the best known randomized coordinate descent algorithms. We also present simulation results to demonstrate the performance of the proposed algorithms.
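For reference, the sketch below is a textbook accelerated randomized coordinate descent scheme with uniform sampling (in the spirit of APPROX-type methods), not the specific algorithms proposed here; grad_i and the coordinate-wise Lipschitz constants L are assumed to be supplied by the user.

```python
import numpy as np

def accelerated_coordinate_descent(grad_i, L, x0, n_iters=1000, rng=None):
    """Accelerated randomized coordinate descent, uniform sampling.

    grad_i(x, i) returns the i-th partial derivative of the objective;
    L[i] is the coordinate-wise Lipschitz constant of that partial.
    """
    rng = rng or np.random.default_rng()
    n = len(x0)
    x, z = x0.copy(), x0.copy()
    theta = 1.0 / n
    for _ in range(n_iters):
        y = (1.0 - theta) * x + theta * z
        i = rng.integers(n)
        g = grad_i(y, i)
        x = y.copy()
        x[i] -= g / L[i]                        # short gradient step
        z[i] -= g / (n * theta * L[i])          # aggressive 'momentum' step
        theta = 0.5 * (np.sqrt(theta**4 + 4 * theta**2) - theta**2)
    return x
```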

In this work, we consider the distributed optimization of non-smooth convex functions using a network of computing units. We investigate this problem under two regularity assumptions: (1) the Lipschitz continuity of the global objective function, and (2) the Lipschitz continuity of local individual functions. Under the local regularity assumption, we provide the first optimal first-order decentralized algorithm called multi-step primal-dual (MSPD) and its corresponding optimal convergence rate. A notable aspect of this result is that, for non-smooth functions, while the dominant term of the error is in $O(1/\sqrt{t})$, the structure of the communication network only impacts a second-order term in $O(1/t)$, where $t$ is time. In other words, the error due to limits in communication resources decreases at a fast rate even in the case of non-strongly-convex objective functions. Under the global regularity assumption, we provide a simple yet efficient algorithm called distributed randomized smoothing (DRS) based on a local smoothing of the objective function, and show that DRS is within a $d^{1/4}$ multiplicative factor of the optimal convergence rate, where $d$ is the underlying dimension.
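To illustrate the local smoothing behind DRS, the sketch below estimates the gradient of the Gaussian-smoothed surrogate $f_\gamma(x) = \mathbb{E}[f(x + \gamma \xi)]$ from function values only; the sample count and smoothing radius are illustrative choices, and the distributed communication step is omitted.

```python
import numpy as np

def smoothed_gradient(f, x, gamma=0.1, n_samples=32, rng=None):
    """Monte Carlo gradient of the Gaussian-smoothed surrogate of f.

    Uses the identity grad f_gamma(x) = E[(f(x + gamma*xi) - f(x)) * xi / gamma]
    with xi ~ N(0, I), so a non-smooth f is replaced by a smooth surrogate
    whose gradient needs only function evaluations.
    """
    rng = rng or np.random.default_rng()
    d = x.shape[0]
    grad = np.zeros(d)
    for _ in range(n_samples):
        xi = rng.standard_normal(d)
        grad += (f(x + gamma * xi) - f(x)) / gamma * xi
    return grad / n_samples
```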

In this paper, we study the optimal convergence rate for distributed convex optimization problems in networks. We model the communication restrictions imposed by the network as a set of affine constraints and provide optimal complexity bounds for four different setups, namely when the function $F(x) \triangleq \sum_{i=1}^{m} f_i(x)$ is: strongly convex and smooth, strongly convex only, smooth only, or just convex. Our results show that Nesterov's accelerated gradient descent on the dual problem can be executed in a distributed manner and obtains the same optimal rates as in the centralized version of the problem (up to constant or logarithmic factors), with an additional cost related to the spectral gap of the interaction matrix. Finally, we discuss some extensions of the proposed setup, such as proximal-friendly functions, time-varying graphs, and improvement of the condition numbers.
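For concreteness, a standard way to write the communication restrictions as affine constraints (our notation, a sketch of the usual setup rather than a claim about the paper's exact formulation) is
$$\min_{x_1,\dots,x_m \in \mathbb{R}^d} \; \sum_{i=1}^{m} f_i(x_i) \quad \text{subject to} \quad (W \otimes I_d)\,\mathbf{x} = 0,$$
where $\mathbf{x} = (x_1^\top,\dots,x_m^\top)^\top$ stacks the local copies and $W$ is a gossip matrix supported on the network whose kernel is the consensus subspace, so the constraint enforces $x_1 = \cdots = x_m$. Applying Nesterov's accelerated gradient to the Lagrangian dual of this problem requires only local computations plus communication along the edges of the graph, which is what makes a distributed execution possible.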
