Artificial neural networks (ANNs) are typically highly nonlinear systems which are finely tuned via the optimization of their associated, non-convex loss functions. In many cases, the gradient of any such loss function has superlinear growth, making the use of the widely-accepted (stochastic) gradient descent methods, which are based on Euler numerical schemes, problematic. We offer a new learning algorithm based on an appropriately constructed variant of the popular stochastic gradient Langevin dynamics (SGLD), which is called tamed unadjusted stochastic Langevin algorithm (TUSLA). We also provide a nonasymptotic analysis of the new algorithm's convergence properties in the context of non-convex learning problems with the use of ANNs. Thus, we provide finite-time guarantees for TUSLA to find approximate minimizers of both empirical and population risks. The roots of the TUSLA algorithm are based on the taming technology for diffusion processes with superlinear coefficients as developed in \citet{tamed-euler, SabanisAoAP} and for MCMC algorithms in \citet{tula}. Numerical experiments are presented which confirm the theoretical findings and illustrate the need for the use of the new algorithm in comparison to vanilla SGLD within the framework of ANNs.
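To make the taming idea concrete, the following minimal sketch contrasts a vanilla SGLD step with a tamed step on a toy loss whose gradient grows superlinearly. The abstract does not state the update rule, so the taming factor `1 + sqrt(lam) * |theta|^(2r)`, the toy loss `|theta|^4 / 4`, and all parameter values (`lam`, `beta`, `r`) are assumptions made for illustration only.

```python
import numpy as np

def grad(theta):
    # Gradient of the toy loss u(theta) = |theta|^4 / 4 (superlinear growth).
    return np.dot(theta, theta) * theta

def sgld_step(theta, lam, beta, rng):
    # Vanilla SGLD: Euler step on the gradient plus Gaussian noise.
    noise = rng.standard_normal(theta.shape)
    return theta - lam * grad(theta) + np.sqrt(2.0 * lam / beta) * noise

def tamed_step(theta, lam, beta, rng, r=1.0):
    # Assumed taming: divide the gradient by 1 + sqrt(lam) * |theta|^(2r),
    # so the drift can never exceed a multiple of |theta| in one step.
    taming = 1.0 + np.sqrt(lam) * np.linalg.norm(theta) ** (2.0 * r)
    noise = rng.standard_normal(theta.shape)
    return theta - lam * grad(theta) / taming + np.sqrt(2.0 * lam / beta) * noise

def run(step, theta0, n_steps=200, lam=0.1, beta=1e8, seed=0):
    rng = np.random.default_rng(seed)
    theta = np.array(theta0, dtype=float)
    with np.errstate(over="ignore", invalid="ignore"):
        for _ in range(n_steps):
            theta = step(theta, lam, beta, rng)
            if not np.all(np.isfinite(theta)):
                break  # the iterates have blown up
    return theta

vanilla = run(sgld_step, [10.0, 10.0])
tamed = run(tamed_step, [10.0, 10.0])
print("vanilla SGLD finite:", bool(np.all(np.isfinite(vanilla))))
print("tamed iterate norm:", float(np.linalg.norm(tamed)))
```

Started far from the minimizer with the same step size, the untamed iterates overshoot and overflow within a few steps, while the tamed iterates contract towards the origin.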

Related content

Neural Networks

Neural Networks is the archival journal of the world's three oldest neural modeling societies: the International Neural Network Society (INNS), the European Neural Network Society (ENNS), and the Japanese Neural Network Society (JNNS). Neural Networks provides a forum for developing and nurturing an international community of scholars and practitioners interested in all aspects of neural networks and related approaches to computational intelligence. The journal welcomes high-quality submissions that contribute to the full range of neural networks research, from behavioral and brain modeling and learning algorithms, through mathematical and computational analyses, to engineering and technological applications of systems that make significant use of neural network concepts and techniques. This uniquely broad scope facilitates the exchange of ideas between biological and technological studies and helps foster the development of an interdisciplinary community interested in biologically inspired computational intelligence. Accordingly, the fields of expertise represented on the Neural Networks editorial board include psychology, neurobiology, computer science, engineering, mathematics, and physics. The journal publishes articles, letters, and reviews, as well as letters to the editor, editorials, current events, software surveys, and patent information. Articles are published in one of five sections: Cognitive Science, Neuroscience, Learning Systems, Mathematical and Computational Analysis, and Engineering and Applications. Official website:

Optimizer · Estimation/Estimator · Controller · Learning · Reinforcement learning ·

April 20, 2022

A Two-Time-Scale Stochastic Optimization Framework with Applications in Control and Reinforcement Learning

Sihan Zeng,Thinh T. Doan,Justin Romberg

We study a new two-time-scale stochastic gradient method for solving optimization problems, where the gradients are computed with the aid of an auxiliary variable under samples generated by time-varying Markov random processes parameterized by the underlying optimization variable. These time-varying samples make gradient directions in our update biased and dependent, which can potentially lead to the divergence of the iterates. In our two-time-scale approach, one scale is to estimate the true gradient from these samples, which is then used to update the estimate of the optimal solution. While these two iterates are implemented simultaneously, the former is updated "faster" (using bigger step sizes) than the latter (using smaller step sizes). Our first contribution is to characterize the finite-time complexity of the proposed two-time-scale stochastic gradient method. In particular, we provide explicit formulas for the convergence rates of this method under different structural assumptions, namely, strong convexity, convexity, the Polyak-Lojasiewicz condition, and general non-convexity. We apply our framework to two problems in control and reinforcement learning. First, we look at the standard online actor-critic algorithm over finite state and action spaces and derive a convergence rate of O(k^(-2/5)), which recovers the best known rate derived specifically for this problem. Second, we study an online actor-critic algorithm for the linear-quadratic regulator and show that a convergence rate of O(k^(-2/3)) is achieved. This is the first time such a result is known in the literature. Finally, we support our theoretical analysis with numerical simulations where the convergence rates are visualized.
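The interplay of the two iterates can be sketched as follows. This is a simplified illustration, not the paper's algorithm: it uses i.i.d. Gaussian gradient noise instead of Markovian samples, a toy strongly convex objective, and assumed step-size schedules `1/(k+1)` (slow) and `1/(k+1)^(2/3)` (fast). The names `TARGET` and `two_time_scale_sgd` are hypothetical.

```python
import numpy as np

TARGET = np.array([1.0, -2.0])  # minimizer of the toy objective 0.5 * |x - TARGET|^2

def two_time_scale_sgd(x0, n_iters=5000, seed=0):
    rng = np.random.default_rng(seed)
    x = np.array(x0, dtype=float)   # slow iterate: estimate of the optimal solution
    y = np.zeros_like(x)            # fast iterate: running estimate of the gradient
    for k in range(n_iters):
        alpha = 1.0 / (k + 1)                # smaller step size (slow scale)
        beta = 1.0 / (k + 1) ** (2.0 / 3.0)  # bigger step size (fast scale)
        noisy_grad = (x - TARGET) + 0.1 * rng.standard_normal(x.shape)
        y = y + beta * (noisy_grad - y)      # track the true gradient from noisy samples
        x = x - alpha * y                    # descend along the tracked gradient
    return x

x_final = two_time_scale_sgd([5.0, 5.0])
print("distance to minimizer:", float(np.linalg.norm(x_final - TARGET)))
```

Because `beta` decays more slowly than `alpha`, the gradient estimate `y` adapts faster than `x` moves, which is the separation of time scales the abstract describes.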

Performer · Learning · Online · Networking · Reducible ·

April 20, 2022

Online Caching with no Regret: Optimistic Learning via Recommendations

Naram Mhaisen,George Iosifidis,Douglas Leith

from arxiv, arXiv admin note: substantial text overlap with arXiv:2202.10590

The design of effective online caching policies is an increasingly important problem for content distribution networks, online social networks and edge computing services, among other areas. This paper proposes a new algorithmic toolbox for tackling this problem through the lens of optimistic online learning. We build upon the Follow-the-Regularized-Leader (FTRL) framework, which is developed further here to include predictions for the file requests, and we design online caching algorithms for bipartite networks with fixed-size caches or elastic leased caches subject to time-average budget constraints. The predictions are provided by a content recommendation system that influences the users' viewing activity and hence can naturally reduce the caching network's uncertainty about future requests. We also extend the framework to learn and utilize the best request predictor in cases where many are available. We prove that the proposed optimistic learning caching policies can achieve sub-zero performance loss (regret) for perfect predictions, and maintain the sub-linear regret bound $O(\sqrt T)$, which is the best achievable bound for policies that do not use predictions, even for arbitrarily bad predictions. The performance of the proposed algorithms is evaluated with detailed trace-driven numerical tests.

Optimizer · Discretization · Networking · Neural Networks · Learning ·

April 18, 2022

An Optimal Time Variable Learning Framework for Deep Neural Networks

Harbir Antil,Hugo Díaz,Evelyn Herberg

Feature propagation in Deep Neural Networks (DNNs) can be associated with nonlinear discrete dynamical systems. The novelty of this paper lies in letting the discretization parameter (time step-size) vary from layer to layer and learning it within an optimization framework. The proposed framework can be applied to any of the existing networks such as ResNet, DenseNet or Fractional-DNN. This framework is shown to help overcome the vanishing and exploding gradient issues. Stability of some of the existing continuous DNNs such as Fractional-DNN is also studied. The proposed approach is applied to an ill-posed 3D-Maxwell's equation.
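The idea of a layer-dependent time step can be illustrated with a residual forward pass in which each layer has its own step size. This is only a sketch under assumptions: the residual form `x + tau * tanh(W x + b)`, the `tanh` activation, and the function name `forward` are illustrative choices, and the step sizes are set by hand here rather than learned.

```python
import numpy as np

def forward(x, weights, biases, taus):
    # Forward-Euler style feature propagation: one time step-size tau per layer.
    for W, b, tau in zip(weights, biases, taus):
        x = x + tau * np.tanh(W @ x + b)
    return x

rng = np.random.default_rng(0)
n_layers, width = 4, 3
weights = [rng.standard_normal((width, width)) for _ in range(n_layers)]
biases = [rng.standard_normal(width) for _ in range(n_layers)]
x0 = rng.standard_normal(width)

# A uniform step recovers a standard ResNet-like discretization;
# a layer-dependent step is the extra degree of freedom to be optimized.
uniform = forward(x0, weights, biases, np.full(n_layers, 0.25))
varying = forward(x0, weights, biases, np.array([0.5, 0.1, 0.05, 0.2]))
print(uniform, varying)
```

In a training setting the entries of `taus` would be treated as additional parameters and updated together with the weights.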

Neural Networks · Learning · Networking · Loss function (machine learning) · Approximation ·

April 18, 2022

A Deep Learning Galerkin Method for the Closed-Loop Geothermal System

Wen Zhang,Jian Li

from arxiv, 29 pages, 7 figures, 1 table

There has been a growing trend of adopting deep learning methods to study partial differential equations (PDEs). This article proposes a Deep Learning Galerkin Method (DGM) for the closed-loop geothermal system, which is a new coupled multi-physics PDE system and mainly consists of a framework of underground heat exchange pipelines to extract the geothermal heat from the geothermal reservoir. This method is a natural combination of the Galerkin Method and machine learning, with the solution approximated by a neural network instead of a linear combination of basis functions. We train the neural network by randomly sampling spatiotemporal points and minimizing a loss function that enforces the differential operators, initial condition, boundary and interface conditions. Moreover, the approximation ability of the neural network is proved by the convergence of the loss function and the convergence of the neural network to the exact solution in the L^2 norm under certain conditions. Finally, some numerical examples are carried out to demonstrate the approximation ability of the neural networks intuitively.

Variance reduction · Average gradient · Estimation/Estimator · Batch Size · contrastive ·

April 17, 2022

Faster One-Sample Stochastic Conditional Gradient Method for Composite Convex Minimization

Gideon Dresdner,Maria-Luiza Vladarean,Gunnar Rätsch,Francesco Locatello,Volkan Cevher,Alp Yurtsever

from arxiv, Artificial Intelligence and Statistics (AISTATS) 2022

We propose a stochastic conditional gradient method (CGM) for minimizing convex finite-sum objectives formed as a sum of smooth and non-smooth terms. Existing CGM variants for this template either suffer from slow convergence rates, or require carefully increasing the batch size over the course of the algorithm's execution, which leads to computing full gradients. In contrast, the proposed method, equipped with a stochastic average gradient (SAG) estimator, requires only one sample per iteration. Nevertheless, it guarantees fast convergence rates on par with more sophisticated variance reduction techniques. In applications we put special emphasis on problems with a large number of separable constraints. Such problems are prevalent among semidefinite programming (SDP) formulations arising in machine learning and theoretical computer science. We provide numerical experiments on matrix completion, unsupervised clustering, and sparsest-cut SDPs.
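A rough sketch of a one-sample conditional gradient loop with a stochastic average gradient (SAG) table is given below. It is a simplification under stated assumptions, not the paper's method: the objective is a smooth least-squares finite sum only (no non-smooth term), the constraint set is the probability simplex, and the step size `2 / (k + 2)` is the textbook Frank-Wolfe choice. The names `sag_frank_wolfe` and `objective` are hypothetical.

```python
import numpy as np

def objective(A, b, x):
    # Finite-sum least squares: (1/n) * sum_i 0.5 * (a_i . x - b_i)^2
    return 0.5 * np.mean((A @ x - b) ** 2)

def sag_frank_wolfe(A, b, n_iters=3000, seed=0):
    rng = np.random.default_rng(seed)
    n, d = A.shape
    x = np.full(d, 1.0 / d)      # start at the centre of the simplex
    table = np.zeros((n, d))     # last gradient seen for each sample
    avg = np.zeros(d)            # running average of the table (SAG estimator)
    for k in range(n_iters):
        i = rng.integers(n)                  # one sample per iteration
        g_i = (A[i] @ x - b[i]) * A[i]       # gradient of the i-th term
        avg += (g_i - table[i]) / n          # refresh the average in O(d)
        table[i] = g_i
        s = np.zeros(d)
        s[np.argmin(avg)] = 1.0              # linear minimization over the simplex
        eta = 2.0 / (k + 2)
        x = (1.0 - eta) * x + eta * s        # convex combination stays feasible
    return x

rng = np.random.default_rng(1)
A = rng.standard_normal((20, 5))
x_true = np.array([0.7, 0.3, 0.0, 0.0, 0.0])
b = A @ x_true
x_hat = sag_frank_wolfe(A, b)
print("objective at centre:", objective(A, b, np.full(5, 0.2)))
print("objective at output:", objective(A, b, x_hat))
```

Each iteration touches a single sample and never requires a projection, only a linear minimization, which on the simplex reduces to picking one coordinate.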

Variance reduction · Reducible · Variance · Optimizer · Batch Size ·

April 16, 2022

Improved Convergence Rate of Stochastic Gradient Langevin Dynamics with Variance Reduction and its Application to Optimization

Yuri Kinoshita,Taiji Suzuki

The stochastic gradient Langevin dynamics is one of the most fundamental algorithms for solving the sampling and non-convex optimization problems that appear in many machine learning applications. In particular, its variance-reduced versions have recently attracted considerable attention. In this paper, we study two variants of this kind, namely, the Stochastic Variance Reduced Gradient Langevin Dynamics and the Stochastic Recursive Gradient Langevin Dynamics. We prove their convergence to the objective distribution in terms of KL-divergence under the sole assumptions of smoothness and a log-Sobolev inequality, which are weaker conditions than those used in prior works for these algorithms. With the batch size and the inner loop length set to $\sqrt{n}$, the gradient complexity to achieve an $\epsilon$-precision is $\tilde{O}((n+dn^{1/2}\epsilon^{-1})\gamma^2 L^2\alpha^{-2})$, which improves on all previous analyses. We also show some essential applications of our result to non-convex optimization.
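The structure of a variance-reduced Langevin sampler can be sketched on a toy Gaussian target. This is a minimal illustration under assumptions, not the paper's exact scheme: the potential is an average of quadratics `0.5 * c_i * |x - a_i|^2`, the step size `eta` is fixed by hand, and the names `svrg_ld` and `per_sample_grads` are hypothetical. Only the choice of batch size and inner loop length equal to `sqrt(n)` is taken from the abstract.

```python
import numpy as np

def per_sample_grads(x, a, c, idx):
    # Gradients of f_i(x) = 0.5 * c_i * |x - a_i|^2 for the samples in idx.
    return c[idx, None] * (x - a[idx])

def svrg_ld(a, c, n_steps=20000, eta=0.05, seed=0):
    rng = np.random.default_rng(seed)
    n, d = a.shape
    m = batch = max(1, int(np.sqrt(n)))   # inner loop length and batch size
    x = np.zeros(d)
    samples = np.empty((n_steps, d))
    for k in range(n_steps):
        if k % m == 0:
            # Outer loop: store a snapshot and its full gradient.
            snapshot = x.copy()
            full_grad = per_sample_grads(snapshot, a, c, np.arange(n)).mean(axis=0)
        idx = rng.integers(n, size=batch)
        # Variance-reduced gradient estimate around the snapshot.
        correction = per_sample_grads(x, a, c, idx) - per_sample_grads(snapshot, a, c, idx)
        v = correction.mean(axis=0) + full_grad
        # Langevin step: gradient drift plus injected Gaussian noise.
        x = x - eta * v + np.sqrt(2.0 * eta) * rng.standard_normal(d)
        samples[k] = x
    return samples

rng = np.random.default_rng(2)
a = rng.standard_normal((16, 2))
c = rng.uniform(0.5, 1.5, size=16)
samples = svrg_ld(a, c)
target_mean = (c[:, None] * a).sum(axis=0) / c.sum()
print("empirical mean:", samples[2000:].mean(axis=0), "target mean:", target_mean)
```

For this target the stationary mean is the curvature-weighted average of the `a_i`, so comparing it with the empirical mean after a burn-in period gives a quick sanity check.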

Residual network · Networking · Regularization term · Functional · Layer ·

April 14, 2022

Convergence and Implicit Regularization Properties of Gradient Descent for Deep Residual Networks

Rama Cont,Alain Rossier,RenYuan Xu

We prove linear convergence of gradient descent to a global minimum for the training of deep residual networks with constant layer width and smooth activation function. We further show that the trained weights, as a function of the layer index, admit a scaling limit which is Hölder continuous as the depth of the network tends to infinity. The proofs are based on non-asymptotic estimates of the loss function and of norms of the network weights along the gradient descent path. We illustrate the relevance of our theoretical results to practical settings using detailed numerical experiments on supervised learning problems.

MoDELS · Learning · Networking · Dynamical systems · Neural Networks ·

February 4, 2022

On Neural Differential Equations

Patrick Kidger

from arxiv, Doctoral thesis, Mathematical Institute, University of Oxford. 231 pages

The conjoining of dynamical systems and deep learning has become a topic of great interest. In particular, neural differential equations (NDEs) demonstrate that neural networks and differential equations are two sides of the same coin. Traditional parameterised differential equations are a special case. Many popular neural network architectures, such as residual networks and recurrent networks, are discretisations. NDEs are suitable for tackling generative problems, dynamical systems, and time series (particularly in physics, finance, ...) and are thus of interest to both modern machine learning and traditional mathematical modelling. NDEs offer high-capacity function approximation, strong priors on model space, the ability to handle irregular data, memory efficiency, and a wealth of available theory on both sides. This doctoral thesis provides an in-depth survey of the field. Topics include: neural ordinary differential equations (e.g. for hybrid neural/mechanistic modelling of physical systems); neural controlled differential equations (e.g. for learning functions of irregular time series); and neural stochastic differential equations (e.g. to produce generative models capable of representing complex stochastic dynamics, or sampling from complex high-dimensional distributions). Further topics include: numerical methods for NDEs (e.g. reversible differential equation solvers, backpropagation through differential equations, Brownian reconstruction); symbolic regression for dynamical systems (e.g. via regularised evolution); and deep implicit models (e.g. deep equilibrium models, differentiable optimisation). We anticipate this thesis will be of interest to anyone interested in the marriage of deep learning with dynamical systems, and hope it will provide a useful reference for the current state of the art.

Neural Networks · Networking · Networks · MoDELS · Notability ·

February 10, 2021

Dynamic Neural Networks: A Survey

Yizeng Han,Gao Huang,Shiji Song,Le Yang,Honghui Wang,Yulin Wang

Dynamic neural networks are an emerging research topic in deep learning. Compared to static models which have fixed computational graphs and parameters at the inference stage, dynamic networks can adapt their structures or parameters to different inputs, leading to notable advantages in terms of accuracy, computational efficiency, adaptiveness, etc. In this survey, we comprehensively review this rapidly developing area by dividing dynamic networks into three main categories: 1) instance-wise dynamic models that process each instance with data-dependent architectures or parameters; 2) spatial-wise dynamic networks that conduct adaptive computation with respect to different spatial locations of image data; and 3) temporal-wise dynamic models that perform adaptive inference along the temporal dimension for sequential data such as videos and texts. The important research problems of dynamic networks, e.g., architecture design, decision making scheme, optimization technique and applications, are reviewed systematically. Finally, we discuss the open problems in this field together with interesting future research directions.

Neural Networks · Optimizer · Networks · Local minimum · Networking ·

December 19, 2019

Optimization for deep learning: theory and algorithms

Ruoyu Sun

from arxiv, 38 pages of main body; 5 pages of appendix; 12 pages of references

When and why can a neural network be successfully trained? This article provides an overview of optimization algorithms and theory for training neural networks. First, we discuss the issue of gradient explosion/vanishing and the more general issue of undesirable spectrum, and then discuss practical solutions including careful initialization and normalization methods. Second, we review generic optimization methods used in training neural networks, such as SGD, adaptive gradient methods and distributed methods, and theoretical results for these algorithms. Third, we review existing research on the global issues of neural network training, including results on bad local minima, mode connectivity, lottery ticket hypothesis and infinite-width analysis.