
We present a new class of Langevin-based algorithms, which overcomes many of the known shortcomings of popular adaptive optimizers that are currently used for the fine-tuning of deep learning models. Its underpinning theory relies on recent advances in Euler's polygonal approximations for stochastic differential equations (SDEs) with monotone coefficients. As a result, it inherits the stability properties of tamed algorithms, while also addressing other known issues, e.g., vanishing gradients in neural networks. In particular, we provide a nonasymptotic analysis and full theoretical guarantees for the convergence properties of an algorithm of this novel class, which we name TH$\varepsilon$O POULA (or, simply, TheoPouLa). Finally, several experiments are presented with different types of deep learning models, which show the superior performance of TheoPouLa over many popular adaptive optimization algorithms.
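To make the flavor of such a scheme concrete, the sketch below shows a single generic tamed-Euler Langevin update in NumPy. It is not the exact TheoPouLa drift; the step size lam, inverse temperature beta, and grad_fn are illustrative placeholders.

```python
import numpy as np

def tamed_langevin_step(theta, grad_fn, lam=1e-2, beta=1e8, rng=None):
    """One generic tamed (Euler) Langevin update.

    A minimal sketch of the idea behind tamed Langevin optimizers: the
    drift is rescaled so that a single step stays bounded even when the
    stochastic gradient blows up. This is NOT the exact TheoPouLa drift;
    lam (step size) and beta (inverse temperature) are illustrative.
    """
    rng = rng or np.random.default_rng()
    g = grad_fn(theta)                       # stochastic gradient estimate
    tamed = g / (1.0 + lam * np.abs(g))      # elementwise taming keeps the drift bounded
    noise = rng.standard_normal(theta.shape)
    return theta - lam * tamed + np.sqrt(2.0 * lam / beta) * noise
```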

Related Content

Neural Networks is the archival journal of the world's three oldest neural modeling societies: the International Neural Network Society (INNS), the European Neural Network Society (ENNS), and the Japanese Neural Network Society (JNNS). Neural Networks provides a forum for developing and nurturing an international community of scholars and practitioners interested in all aspects of neural networks and related approaches to computational intelligence. Neural Networks welcomes submissions of high-quality papers that contribute to the full range of neural network research, from behavioral and brain modeling and learning algorithms, through mathematical and computational analysis, to systems engineering and technological applications that make substantial use of neural network concepts and techniques. This uniquely broad scope promotes the exchange of ideas between biological and technological research and helps foster the development of an interdisciplinary community interested in biologically inspired computational intelligence. Accordingly, the Neural Networks editorial board represents expertise in psychology, neurobiology, computer science, engineering, mathematics, and physics. The journal publishes articles, letters, and reviews, as well as letters to the editor, editorials, current events, software surveys, and patent information. Articles are published in one of five sections: cognitive science, neuroscience, learning systems, mathematics and computational analysis, and engineering and applications. Official website:

As neural networks become more widely deployed, confidence in their predictions has become increasingly important. However, basic neural networks do not deliver certainty estimates, or they suffer from over- or under-confidence. Many researchers have been working on understanding and quantifying uncertainty in a neural network's prediction. As a result, different types and sources of uncertainty have been identified, and a variety of approaches to measure and quantify uncertainty in neural networks have been proposed. This work gives a comprehensive overview of uncertainty estimation in neural networks, reviews recent advances in the field, highlights current challenges, and identifies potential research opportunities. It is intended to give anyone interested in uncertainty estimation in neural networks a broad overview and introduction, without presupposing prior knowledge in this field. A comprehensive introduction to the most crucial sources of uncertainty is given, together with their separation into reducible model uncertainty and irreducible data uncertainty. The modeling of these uncertainties based on deterministic neural networks, Bayesian neural networks, ensembles of neural networks, and test-time data augmentation is introduced, and different branches of these fields as well as the latest developments are discussed. For practical applications, we discuss different measures of uncertainty and approaches for the calibration of neural networks, and give an overview of existing baselines and implementations. Examples from the wide spectrum of challenges in different fields give an idea of the needs and challenges regarding uncertainty in practical applications. Additionally, the practical limitations of current methods for mission- and safety-critical real-world applications are discussed, and an outlook on the next steps towards broader usage of such methods is given.
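As a concrete illustration of the ensemble branch discussed above, the sketch below shows a common decomposition of predictive uncertainty into a data (aleatoric) and a model (epistemic) part from the softmax outputs of hypothetical ensemble members; the function name and array shapes are our own.

```python
import numpy as np

def ensemble_uncertainty(prob_list):
    """Decompose the predictive uncertainty of an ensemble.

    prob_list: list of (n_samples, n_classes) softmax outputs, one per
    ensemble member (hypothetical models trained with different seeds).
    Total uncertainty = entropy of the mean prediction; data (aleatoric)
    uncertainty = mean member entropy; model (epistemic) uncertainty =
    their difference (the mutual information).
    """
    probs = np.stack(prob_list)                             # (K, N, C)
    mean_p = probs.mean(axis=0)                             # ensemble prediction
    eps = 1e-12
    total = -(mean_p * np.log(mean_p + eps)).sum(-1)        # entropy of the mean
    data = -(probs * np.log(probs + eps)).sum(-1).mean(0)   # mean member entropy
    model = total - data                                    # epistemic part
    return total, data, model
```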

Despite their overwhelming capacity to overfit, deep neural networks trained by specific optimization algorithms tend to generalize well to unseen data. Recently, researchers have explained this by investigating the implicit regularization effect of optimization algorithms. A notable advance is the work of Lyu & Li (2019), which proves that gradient descent (GD) maximizes the margin of homogeneous deep neural networks. Beyond GD, adaptive algorithms such as AdaGrad, RMSProp, and Adam are popular owing to their rapid training process. However, theoretical guarantees for the generalization of adaptive optimization algorithms are still lacking. In this paper, we study the implicit regularization of adaptive optimization algorithms when they optimize the logistic loss on homogeneous deep neural networks. We prove that adaptive algorithms that adopt an exponential moving average strategy in the conditioner (such as Adam and RMSProp) can maximize the margin of the neural network, while AdaGrad, which directly sums historical squared gradients in the conditioner, cannot. This indicates an advantage in generalization for the exponential moving average strategy in the design of the conditioner. Technically, we provide a unified framework to analyze the convergent direction of adaptive optimization algorithms by constructing a novel adaptive gradient flow and surrogate margin. Our experiments support the theoretical findings on the convergent direction of adaptive optimization algorithms.
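The sketch below contrasts the two conditioner designs analyzed above: the exponential-moving-average second moment of Adam/RMSProp versus the cumulative sum of AdaGrad. Hyperparameter names and default values are illustrative only.

```python
import numpy as np

def conditioner_update(v, grad, method="ema", beta2=0.999):
    """Second-moment 'conditioner' used to rescale the gradient.

    'ema' (Adam/RMSProp style) keeps an exponential moving average of
    squared gradients, while 'sum' (AdaGrad style) accumulates them, so
    AdaGrad's effective step size shrinks monotonically.
    """
    if method == "ema":
        return beta2 * v + (1.0 - beta2) * grad ** 2
    if method == "sum":
        return v + grad ** 2
    raise ValueError(method)

def preconditioned_step(theta, grad, v, lr=1e-3, eps=1e-8):
    """Apply the conditioned update: theta <- theta - lr * g / sqrt(v)."""
    return theta - lr * grad / (np.sqrt(v) + eps)
```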

While many existing graph neural networks (GNNs) have been proven to perform $\ell_2$-based graph smoothing that enforces smoothness globally, in this work we aim to further enhance the local smoothness adaptivity of GNNs via $\ell_1$-based graph smoothing. As a result, we introduce a family of GNNs (Elastic GNNs) based on $\ell_1$- and $\ell_2$-based graph smoothing. In particular, we propose a novel and general message passing scheme for GNNs. This message passing algorithm is not only friendly to back-propagation training but also achieves the desired smoothing properties with a theoretical convergence guarantee. Experiments on semi-supervised learning tasks demonstrate that the proposed Elastic GNNs obtain better adaptivity on benchmark datasets and are significantly more robust to graph adversarial attacks. The implementation of Elastic GNNs is available at \url{//github.com/lxiaorui/ElasticGNN}.
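To convey the intuition behind combining $\ell_1$- and $\ell_2$-based smoothing, the rough sketch below performs one gradient step on an edge-wise smoothing objective whose $\ell_1$ part is Huber-smoothed; it is not the paper's exact elastic message passing scheme, and all parameter names are ours.

```python
import numpy as np

def soft_threshold(z, lam):
    """Proximal operator of lam * |.|: elementwise soft-thresholding."""
    return np.sign(z) * np.maximum(np.abs(z) - lam, 0.0)

def elastic_smoothing_step(X, edges, lam1=0.1, lam2=0.1, step=0.5):
    """One gradient step of an illustrative l1 + l2 graph-smoothing objective.

    Sketch of the intuition only: the objective sums, over edges (i, j),
    (lam2/2)*||X_i - X_j||^2 + Huber_lam1(X_i - X_j). The Huber term (a
    smoothed l1 penalty) saturates for large differences, so sharp local
    changes are penalized less than under pure l2 smoothing.
    """
    grad = np.zeros_like(X)
    for i, j in edges:
        diff = X[i] - X[j]
        g_l2 = lam2 * diff                          # l2 part: plain averaging pressure
        g_l1 = diff - soft_threshold(diff, lam1)    # Huber/l1 part: clipped at +-lam1
        grad[i] += g_l2 + g_l1
        grad[j] -= g_l2 + g_l1
    return X - step * grad
```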

Despite the considerable success of neural networks in security settings such as malware detection, such models have proved vulnerable to evasion attacks, in which attackers make slight changes to inputs (e.g., malware) to bypass detection. We propose a novel approach, \emph{Fourier stabilization}, for designing evasion-robust neural networks with binary inputs. This approach, which is complementary to other forms of defense, replaces the weights of individual neurons with robust analogs derived using Fourier analytic tools. The choice of which neurons to stabilize in a neural network is then a combinatorial optimization problem, and we propose several methods for approximately solving it. We provide a formal bound on the per-neuron drop in accuracy due to Fourier stabilization, and experimentally demonstrate the effectiveness of the proposed approach in boosting robustness of neural networks in several detection settings. Moreover, we show that our approach effectively composes with adversarial training.
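One way to read the Fourier-analytic idea is sketched below: treat a binary-input neuron as a function on $\{-1,+1\}^n$ and estimate its degree-1 Fourier coefficients by Monte Carlo, which can then serve as robust replacement weights. The paper's exact construction and rescaling may differ, and the usage shown is hypothetical.

```python
import numpy as np

def degree_one_fourier_weights(neuron_fn, n_inputs, n_samples=4096, seed=0):
    """Estimate degree-1 Fourier coefficients of a binary-input neuron.

    View the neuron as f on {-1, +1}^n and estimate hat_f(i) = E[f(x) * x_i]
    by sampling. Using these coefficients (suitably rescaled) in place of
    the original weights is the flavour of 'stabilization' described above.
    """
    rng = np.random.default_rng(seed)
    X = rng.choice([-1.0, 1.0], size=(n_samples, n_inputs))
    f = neuron_fn(X)                      # shape (n_samples,), values in {-1, +1}
    return (X * f[:, None]).mean(axis=0)  # hat_f(i) for each input coordinate

# Hypothetical usage with a sign-activation neuron:
# w, b = np.random.randn(20), 0.1
# neuron = lambda X: np.sign(X @ w + b)
# w_robust = degree_one_fourier_weights(neuron, 20)
```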

Sampling methods (e.g., node-wise, layer-wise, or subgraph sampling) have become an indispensable strategy for speeding up the training of large-scale Graph Neural Networks (GNNs). However, existing sampling methods are mostly based on graph structural information and ignore the dynamics of optimization, which leads to high variance in estimating the stochastic gradients. The high-variance issue can be very pronounced in extremely large graphs, where it results in slow convergence and poor generalization. In this paper, we theoretically analyze the variance of sampling methods and show that, due to the composite structure of the empirical risk, the variance of any sampling method can be decomposed into \textit{embedding approximation variance} in the forward stage and \textit{stochastic gradient variance} in the backward stage, and that both types of variance must be mitigated to obtain a faster convergence rate. We propose a decoupled variance reduction strategy that employs (approximate) gradient information to adaptively sample nodes with minimal variance, and explicitly reduces the variance introduced by embedding approximation. We show theoretically and empirically that the proposed method, even with smaller mini-batch sizes, enjoys a faster convergence rate and achieves better generalization compared to existing methods.
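A minimal sketch of the gradient-informed sampling component is given below: nodes are sampled with probability proportional to approximate per-node gradient norms, and inverse-probability weights keep the sampled gradient sum unbiased. The paper's decoupled scheme additionally controls the embedding-approximation variance of the forward pass, which is not modeled here.

```python
import numpy as np

def adaptive_node_sample(grad_norms, batch_size, rng=None):
    """Sample nodes proportionally to their (approximate) gradient norms.

    Returns sampled indices and importance weights such that the weighted
    sum of the sampled per-node gradients is an unbiased estimate of the
    full-graph gradient sum (sampling with replacement).
    """
    rng = rng or np.random.default_rng()
    p = grad_norms / grad_norms.sum()              # variance-aware proposal
    idx = rng.choice(len(grad_norms), size=batch_size, replace=True, p=p)
    weights = 1.0 / (batch_size * p[idx])          # inverse-probability weights
    return idx, weights
```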

When and why can a neural network be successfully trained? This article provides an overview of optimization algorithms and theory for training neural networks. First, we discuss the issue of gradient explosion/vanishing and the more general issue of undesirable spectrum, and then discuss practical solutions including careful initialization and normalization methods. Second, we review generic optimization methods used in training neural networks, such as SGD, adaptive gradient methods and distributed methods, and theoretical results for these algorithms. Third, we review existing research on the global issues of neural network training, including results on bad local minima, mode connectivity, lottery ticket hypothesis and infinite-width analysis.
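As a small example of the "careful initialization" remedy mentioned above, the sketch below implements He (Kaiming) initialization, which scales the weight variance by 2 / fan_in to keep activation and gradient magnitudes roughly stable across ReLU layers.

```python
import numpy as np

def he_init(fan_in, fan_out, rng=None):
    """He (Kaiming) initialization for a ReLU layer.

    Drawing weights with variance 2 / fan_in keeps the variance of the
    activations (and, roughly, of the backpropagated gradients) stable
    across layers, mitigating gradient explosion/vanishing.
    """
    rng = rng or np.random.default_rng()
    return rng.standard_normal((fan_in, fan_out)) * np.sqrt(2.0 / fan_in)
```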

Many real-world problems can be represented as graph-based learning problems. In this paper, we propose a novel framework for learning spatial and attentional convolutional neural networks on arbitrary graphs. Different from previous convolutional neural networks on graphs, we first design a motif-matching guided subgraph normalization method to capture neighborhood information. We then implement subgraph-level self-attentional layers that learn the different importance of different subgraphs for graph classification problems. Analogous to image-based attentional convolutional networks that operate on locally connected and weighted regions of the input, we also extend graph normalization from a one-dimensional node sequence to a two-dimensional node grid by leveraging motif-matching, and design self-attentional layers without requiring any costly operations that depend on prior knowledge of the graph structure. Our results on both bioinformatics and social network datasets show that we can significantly improve on graph classification benchmarks over traditional graph kernels and existing deep models.
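The sketch below illustrates the subgraph-level attention idea in its simplest form: score each subgraph embedding, softmax the scores, and pool. The attention parameterization is a hypothetical single learned vector, not the paper's exact layer.

```python
import numpy as np

def subgraph_attention_pool(subgraph_embs, w_att):
    """Attention-weighted pooling over subgraph embeddings.

    subgraph_embs: (n_subgraphs, d) array of subgraph representations.
    w_att: hypothetical learned attention vector of shape (d,).
    """
    scores = subgraph_embs @ w_att              # one score per subgraph
    weights = np.exp(scores - scores.max())
    weights /= weights.sum()                    # softmax over subgraphs
    return (weights[:, None] * subgraph_embs).sum(axis=0)  # graph-level embedding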

We propose accelerated randomized coordinate descent algorithms for stochastic optimization and online learning. Our algorithms have significantly lower per-iteration complexity than the known accelerated gradient algorithms. The proposed algorithms for online learning achieve better regret performance than the known randomized online coordinate descent algorithms. Furthermore, the proposed algorithms for stochastic optimization exhibit convergence rates as good as those of the best known randomized coordinate descent algorithms. We also present simulation results to demonstrate the performance of the proposed algorithms.
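For reference, the sketch below is a textbook accelerated randomized coordinate descent scheme with uniform sampling (in the spirit of APPROX-type methods), not the specific algorithms proposed here; grad_i and the coordinate-wise Lipschitz constants L are assumed to be supplied by the user.

```python
import numpy as np

def accelerated_coordinate_descent(grad_i, L, x0, n_iters=1000, rng=None):
    """Accelerated randomized coordinate descent, uniform sampling.

    grad_i(x, i) returns the i-th partial derivative of the objective;
    L[i] is the coordinate-wise Lipschitz constant of that partial.
    """
    rng = rng or np.random.default_rng()
    n = len(x0)
    x, z = x0.copy(), x0.copy()
    theta = 1.0 / n
    for _ in range(n_iters):
        y = (1.0 - theta) * x + theta * z
        i = rng.integers(n)
        g = grad_i(y, i)
        x = y.copy()
        x[i] -= g / L[i]                        # short gradient step
        z[i] -= g / (n * theta * L[i])          # aggressive 'momentum' step
        theta = 0.5 * (np.sqrt(theta**4 + 4 * theta**2) - theta**2)
    return x
```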

In this work, we consider the distributed optimization of non-smooth convex functions using a network of computing units. We investigate this problem under two regularity assumptions: (1) the Lipschitz continuity of the global objective function, and (2) the Lipschitz continuity of local individual functions. Under the local regularity assumption, we provide the first optimal first-order decentralized algorithm called multi-step primal-dual (MSPD) and its corresponding optimal convergence rate. A notable aspect of this result is that, for non-smooth functions, while the dominant term of the error is in $O(1/\sqrt{t})$, the structure of the communication network only impacts a second-order term in $O(1/t)$, where $t$ is time. In other words, the error due to limits in communication resources decreases at a fast rate even in the case of non-strongly-convex objective functions. Under the global regularity assumption, we provide a simple yet efficient algorithm called distributed randomized smoothing (DRS) based on a local smoothing of the objective function, and show that DRS is within a $d^{1/4}$ multiplicative factor of the optimal convergence rate, where $d$ is the underlying dimension.
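To illustrate the local smoothing behind DRS, the sketch below estimates the gradient of the Gaussian-smoothed surrogate $f_\gamma(x) = \mathbb{E}[f(x + \gamma \xi)]$ from function values only; the sample count and smoothing radius are illustrative choices, and the distributed communication step is omitted.

```python
import numpy as np

def smoothed_gradient(f, x, gamma=0.1, n_samples=32, rng=None):
    """Monte Carlo gradient of the Gaussian-smoothed surrogate of f.

    Uses the identity grad f_gamma(x) = E[(f(x + gamma*xi) - f(x)) * xi / gamma]
    with xi ~ N(0, I), so a non-smooth f is replaced by a smooth surrogate
    whose gradient needs only function evaluations.
    """
    rng = rng or np.random.default_rng()
    d = x.shape[0]
    grad = np.zeros(d)
    for _ in range(n_samples):
        xi = rng.standard_normal(d)
        grad += (f(x + gamma * xi) - f(x)) / gamma * xi
    return grad / n_samples
```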

In this paper, we study the optimal convergence rate for distributed convex optimization problems in networks. We model the communication restrictions imposed by the network as a set of affine constraints and provide optimal complexity bounds for four different setups, namely when the function $F(x) \triangleq \sum_{i=1}^{m} f_i(x)$ is: strongly convex and smooth, strongly convex only, smooth only, or just convex. Our results show that Nesterov's accelerated gradient descent on the dual problem can be executed in a distributed manner and obtains the same optimal rates as in the centralized version of the problem (up to constant or logarithmic factors), with an additional cost related to the spectral gap of the interaction matrix. Finally, we discuss some extensions of the proposed setup, such as proximal-friendly functions, time-varying graphs, and improvement of the condition numbers.
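For concreteness, a standard way to write the communication restrictions as affine constraints (our notation, a sketch of the usual setup rather than a claim about the paper's exact formulation) is
$$\min_{x_1,\dots,x_m \in \mathbb{R}^d} \; \sum_{i=1}^{m} f_i(x_i) \quad \text{subject to} \quad (W \otimes I_d)\,\mathbf{x} = 0,$$
where $\mathbf{x} = (x_1^\top,\dots,x_m^\top)^\top$ stacks the local copies and $W$ is a gossip matrix supported on the network whose kernel is the consensus subspace, so the constraint enforces $x_1 = \cdots = x_m$. Applying Nesterov's accelerated gradient to the Lagrangian dual of this problem requires only local computations plus communication along the edges of the graph, which is what makes a distributed execution possible.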
