
This article proposes a sparse-computation method for optimizing neural networks in reinforcement learning (RL) tasks. The method combines two ideas, neural network pruning and the exploitation of correlations in the input data, which together make it possible to update neuron states only when their changes exceed a certain threshold. This significantly reduces the number of multiplications needed to run a network. We tested the method on different RL tasks and achieved a 20-150x reduction in the number of multiplications with no substantial performance losses; in some cases, performance even improved.
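As a rough illustration of the thresholded-update idea, the following NumPy sketch shows a "delta" linear layer that propagates only those input changes whose magnitude exceeds a threshold, through a weight matrix whose pruned entries are zero. The class name, threshold value, and bookkeeping are illustrative assumptions, not the paper's exact algorithm.

```python
import numpy as np

class DeltaLinear:
    """Sketch of a threshold-gated ("delta") linear layer over pruned weights."""

    def __init__(self, weight, threshold=0.05):
        self.W = weight                           # (out, in); zeros where pruned
        self.threshold = threshold                # assumed change threshold
        self.last_x = np.zeros(weight.shape[1])   # last transmitted input
        self.y = np.zeros(weight.shape[0])        # cached pre-activation, y == W @ last_x

    def forward(self, x):
        delta = x - self.last_x
        active = np.abs(delta) > self.threshold   # only large changes fire
        # Multiplications happen only for active inputs (and unpruned weights).
        self.y += self.W[:, active] @ delta[active]
        self.last_x[active] = x[active]
        return self.y
```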

Related content

Neural Networks is the archival journal of the world's three oldest neural modeling societies: the International Neural Network Society (INNS), the European Neural Network Society (ENNS), and the Japanese Neural Network Society (JNNS). Neural Networks provides a forum for developing and nurturing an international community of scholars and practitioners interested in all aspects of neural networks and related approaches to computational intelligence. Neural Networks welcomes high-quality submissions that contribute to the full range of neural network research, from behavioral and brain modeling and learning algorithms, through mathematical and computational analysis, to systems engineering and technological applications that make substantial use of neural network concepts and techniques. This unique and broad scope promotes the exchange of ideas between biological and technological research and helps foster the development of an interdisciplinary community interested in biologically inspired computational intelligence. Accordingly, the Neural Networks editorial board represents expertise in psychology, neurobiology, computer science, engineering, mathematics, and physics. The journal publishes articles, letters, and reviews, as well as letters to the editor, editorials, current events, software surveys, and patent information. Articles appear in one of five sections: Cognitive Science, Neuroscience, Learning Systems, Mathematical and Computational Analysis, and Engineering and Applications. Official website:

We study a new two-time-scale stochastic gradient method for solving optimization problems, where the gradients are computed with the aid of an auxiliary variable under samples generated by time-varying Markov random processes parameterized by the underlying optimization variable. These time-varying samples make gradient directions in our update biased and dependent, which can potentially lead to the divergence of the iterates. In our two-time-scale approach, one scale is to estimate the true gradient from these samples, which is then used to update the estimate of the optimal solution. While these two iterates are implemented simultaneously, the former is updated "faster" (using bigger step sizes) than the latter (using smaller step sizes). Our first contribution is to characterize the finite-time complexity of the proposed two-time-scale stochastic gradient method. In particular, we provide explicit formulas for the convergence rates of this method under different structural assumptions, namely, strong convexity, convexity, the Polyak-Lojasiewicz condition, and general non-convexity. We apply our framework to two problems in control and reinforcement learning. First, we look at the standard online actor-critic algorithm over finite state and action spaces and derive a convergence rate of O(k^(-2/5)), which recovers the best known rate derived specifically for this problem. Second, we study an online actor-critic algorithm for the linear-quadratic regulator and show that a convergence rate of O(k^(-2/3)) is achieved. This is the first time such a result is known in the literature. Finally, we support our theoretical analysis with numerical simulations where the convergence rates are visualized.
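A minimal sketch of the two-time-scale scheme on a toy quadratic objective: the fast iterate tracks the true gradient from noisy samples using larger step sizes, while the slow iterate descends along that estimate using smaller step sizes. The step-size exponents and the noise model are illustrative assumptions; in the paper the samples come from a time-varying Markov process, which the i.i.d. noise here only approximates.

```python
import numpy as np

rng = np.random.default_rng(0)

def noisy_gradient(theta):
    # Stand-in for a gradient sample; here the true objective is
    # f(theta) = 0.5 * ||theta||^2, so the true gradient is theta.
    return theta + rng.normal(scale=0.5, size=theta.shape)

theta = np.ones(4)   # slow iterate: estimate of the optimal solution
g = np.zeros(4)      # fast iterate: running estimate of the true gradient

for k in range(1, 10001):
    beta = 1.0 / k**0.6          # fast (bigger) step size -- assumed schedule
    alpha = 1.0 / k              # slow (smaller) step size -- assumed schedule
    g += beta * (noisy_gradient(theta) - g)   # track the true gradient
    theta -= alpha * g                        # descend along the estimate
```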

In this paper we introduce a new approach to discrete-time semi-Markov decision processes based on the sojourn time process. Different characterizations of discrete-time semi-Markov processes are exploited, and decision processes are constructed by their means. With this new approach, the agent is allowed to take different actions depending also on the sojourn time of the process in the current state. A numerical method based on $Q$-learning algorithms for finite-horizon reinforcement learning and stochastic recursive relations is investigated. Finally, we consider two toy examples: one in which the reward depends on the sojourn time, in line with the gambler's fallacy; the other in which the environment is semi-Markov even though the reward function does not depend on the sojourn time. These are used to carry out numerical evaluations of the previously presented $Q$-learning algorithm and of a different naive method based on deep reinforcement learning.
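In its simplest tabular form, the sojourn-time idea amounts to indexing the $Q$-table by (state, sojourn time, action), incrementing the sojourn counter while the chain stays in the same state and resetting it on a transition. The sketch below shows this with a discounted update loop (the paper's finite-horizon indexing is omitted for brevity); the toy environment and its sojourn-dependent reward, loosely in the spirit of the gambler's-fallacy example, are assumptions for illustration.

```python
import random
import numpy as np

# Q-table indexed by (state, sojourn time, action); all sizes are assumptions.
n_states, max_sojourn, n_actions = 5, 10, 2
Q = np.zeros((n_states, max_sojourn, n_actions))
alpha, gamma, eps = 0.1, 0.95, 0.1

def step(s, tau, a):
    # Hypothetical semi-Markov environment: the reward depends on the sojourn time.
    s2 = random.randrange(n_states)
    r = 1.0 if a == (tau % n_actions) else 0.0
    return s2, r

s, tau = 0, 0
for _ in range(50000):
    # epsilon-greedy action selection over the (state, sojourn) pair
    a = random.randrange(n_actions) if random.random() < eps else int(np.argmax(Q[s, tau]))
    s2, r = step(s, tau, a)
    tau2 = min(tau + 1, max_sojourn - 1) if s2 == s else 0  # reset on transition
    Q[s, tau, a] += alpha * (r + gamma * Q[s2, tau2].max() - Q[s, tau, a])
    s, tau = s2, tau2
```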

We present a data-efficient framework for solving sequential decision-making problems which exploits the combination of reinforcement learning (RL) and latent variable generative models. The framework, called GenRL, trains deep policies by introducing an action latent variable such that the feed-forward policy search can be divided into two parts: (i) training a sub-policy that outputs a distribution over the action latent variable given a state of the system, and (ii) unsupervised training of a generative model that outputs a sequence of motor actions conditioned on the latent action variable. GenRL enables safe exploration and alleviates the data-inefficiency problem as it exploits prior knowledge about valid sequences of motor actions. Moreover, we provide a set of measures for the evaluation of generative models such that we are able to predict the performance of the RL policy training prior to the actual training on a physical robot. We experimentally determine the characteristics of generative models that have the most influence on the performance of the final policy training on two robotics tasks: shooting a hockey puck and throwing a basketball. Furthermore, we empirically demonstrate that, compared to two state-of-the-art RL baselines, GenRL is the only method that can safely and efficiently solve the robotics tasks.
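The two-part structure of the policy might look as follows in a minimal NumPy sketch: a sub-policy maps the state to a Gaussian over the action latent variable z, and a separately (unsupervised) trained generator decodes z into a sequence of motor actions. The linear parameterizations and all dimensions are assumptions for illustration, not GenRL's actual architecture.

```python
import numpy as np

rng = np.random.default_rng(0)
state_dim, latent_dim, horizon, action_dim = 8, 2, 20, 3
W_mu = rng.normal(size=(latent_dim, state_dim)) * 0.1        # sub-policy params
W_dec = rng.normal(size=(horizon * action_dim, latent_dim))  # generator params

def sub_policy(state):
    # (i) distribution over the action latent variable given the state
    mu = W_mu @ state
    return mu + rng.normal(size=latent_dim)   # sample z ~ N(mu, I)

def decode(z):
    # (ii) generative model: z -> a full sequence of motor actions
    return (W_dec @ z).reshape(horizon, action_dim)

actions = decode(sub_policy(rng.normal(size=state_dim)))  # (horizon, action_dim)
```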

Graph mining tasks arise from many different application domains, ranging from social networks and transportation to e-commerce, and have been receiving great attention from the theoretical and algorithm-design communities in recent years; there has also been pioneering work applying the intensively studied reinforcement learning (RL) techniques to graph data mining tasks. However, these graph mining algorithms and RL models are dispersed across different research areas, which makes it hard to compare them with one another. In this survey, we provide a comprehensive overview of RL models and graph mining and generalize these algorithms under the unified formulation of Graph Reinforcement Learning (GRL). We further discuss the applications of GRL methods across various domains and summarize the method descriptions, open-source code, and benchmark datasets of GRL methods. Finally, we propose important directions and challenges to be addressed in the future. This work is the most up-to-date comprehensive survey of the GRL literature; it provides a global view for researchers as well as a learning resource for those outside the domain. In addition, we create an online open-source repository both for interested researchers who want to enter this rapidly developing domain and for experts who would like to compare GRL methods.

The adaptive processing of structured data is a long-standing research topic in machine learning that investigates how to automatically learn a mapping from a structured input to outputs of various nature. Recently, there has been an increasing interest in the adaptive processing of graphs, which led to the development of different neural network-based methodologies. In this thesis, we take a different route and develop a Bayesian Deep Learning framework for graph learning. The dissertation begins with a review of the principles over which most of the methods in the field are built, followed by a study on graph classification reproducibility issues. We then proceed to bridge the basic ideas of deep learning for graphs with the Bayesian world, by building our deep architectures in an incremental fashion. This framework allows us to consider graphs with discrete and continuous edge features, producing unsupervised embeddings rich enough to reach the state of the art on several classification tasks. Our approach is also amenable to a Bayesian nonparametric extension that automates the choice of almost all of the model's hyper-parameters. Two real-world applications demonstrate the efficacy of deep learning for graphs. The first concerns the prediction of information-theoretic quantities for molecular simulations with supervised neural models. After that, we exploit our Bayesian models to solve a malware-classification task while being robust to intra-procedural code obfuscation techniques. We conclude the dissertation with an attempt to blend the best of the neural and Bayesian worlds together. The resulting hybrid model is able to predict multimodal distributions conditioned on input graphs, and can therefore model stochasticity and uncertainty better than most existing works. Overall, we aim to provide a Bayesian perspective on the articulated research field of deep learning for graphs.

This paper surveys the field of transfer learning in the problem setting of Reinforcement Learning (RL). RL has been the key solution to sequential decision-making problems. Along with the fast advance of RL in various domains, including robotics and game-playing, transfer learning has arisen as an important technique to assist RL by leveraging and transferring external expertise to boost the learning process. In this survey, we review the central issues of transfer learning in the RL domain, providing a systematic categorization of its state-of-the-art techniques. We analyze their goals, methodologies, applications, and the RL frameworks under which these transfer learning techniques are applicable. We discuss the relationship between transfer learning and other relevant topics from an RL perspective and also explore the potential challenges as well as future development directions for transfer learning in RL.

When and why can a neural network be successfully trained? This article provides an overview of optimization algorithms and theory for training neural networks. First, we discuss the issue of gradient explosion/vanishing and the more general issue of undesirable spectrum, and then discuss practical solutions including careful initialization and normalization methods. Second, we review generic optimization methods used in training neural networks, such as SGD, adaptive gradient methods and distributed methods, and theoretical results for these algorithms. Third, we review existing research on the global issues of neural network training, including results on bad local minima, mode connectivity, lottery ticket hypothesis and infinite-width analysis.
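As a concrete instance of the "careful initialization" the article discusses, the sketch below uses He (Kaiming) initialization, which draws weights with variance 2/fan_in so that activation (and hence gradient) magnitudes stay roughly constant through deep ReLU stacks, mitigating explosion and vanishing. The depth and widths are arbitrary choices for illustration.

```python
import numpy as np

def he_init(fan_in, fan_out, rng):
    # He initialization for ReLU networks: std = sqrt(2 / fan_in) keeps the
    # scale of activations roughly constant from layer to layer.
    return rng.normal(scale=np.sqrt(2.0 / fan_in), size=(fan_out, fan_in))

rng = np.random.default_rng(0)
layers = [he_init(256, 256, rng) for _ in range(10)]  # 10-layer ReLU stack

x = rng.normal(size=256)
for W in layers:
    x = np.maximum(W @ x, 0.0)   # ReLU; the norm neither explodes nor vanishes
print(np.linalg.norm(x))          # stays on the order of the input norm
```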

Deep convolutional neural networks (CNNs) have recently achieved great success in many visual recognition tasks. However, existing deep neural network models are computationally expensive and memory intensive, hindering their deployment in devices with low memory resources or in applications with strict latency requirements. A natural thought, therefore, is to perform model compression and acceleration in deep networks without significantly decreasing model performance. During the past few years, tremendous progress has been made in this area. In this paper, we survey recently developed techniques for compacting and accelerating CNN models. These techniques are roughly categorized into four schemes: parameter pruning and sharing, low-rank factorization, transferred/compact convolutional filters, and knowledge distillation. Methods of parameter pruning and sharing are described first, and the other techniques are introduced afterwards. For each scheme, we provide insightful analysis of the performance, related applications, advantages, and drawbacks. We then go through a few very recent, additional successful methods, for example, dynamic capacity networks and stochastic depth networks. After that, we survey the evaluation metrics, the main datasets used for evaluating model performance, and recent benchmarking efforts. Finally, we conclude the paper and discuss remaining challenges and possible directions on this topic.
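As a concrete instance of the first scheme, parameter pruning, here is a minimal magnitude-pruning sketch: the fraction of weights with the smallest absolute values is zeroed, and the resulting mask is kept so pruned weights stay at zero during any further fine-tuning. The function name and sparsity level are illustrative assumptions, not a reference implementation from the survey.

```python
import numpy as np

def magnitude_prune(W, sparsity=0.9):
    # Zero out the `sparsity` fraction of weights with smallest |value|.
    k = int(W.size * sparsity)
    thresh = np.partition(np.abs(W).ravel(), k)[k]   # k-th smallest magnitude
    mask = np.abs(W) >= thresh
    return W * mask, mask   # reuse `mask` to keep pruned weights at zero later

W = np.random.default_rng(0).normal(size=(128, 128))
W_pruned, mask = magnitude_prune(W, 0.9)
print(1.0 - mask.mean())    # achieved sparsity, approximately 0.9
```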

Graph neural networks (GNNs) are a popular class of machine learning models whose major advantage is their ability to incorporate a sparse and discrete dependency structure between data points. Unfortunately, GNNs can only be used when such a graph structure is available. In practice, however, real-world graphs are often noisy and incomplete, or might not be available at all. With this work, we propose to jointly learn the graph structure and the parameters of graph convolutional networks (GCNs) by approximately solving a bilevel program that learns a discrete probability distribution on the edges of the graph. This allows one to apply GCNs not only in scenarios where the given graph is incomplete or corrupted, but also in those where a graph is not available at all. We conduct a series of experiments that analyze the behavior of the proposed method and demonstrate that it outperforms related methods by a significant margin.
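A minimal sketch of the structure-learning side of this idea: maintain a Bernoulli probability for every potential edge, sample a discrete graph from that distribution, and run a GCN-style propagation on the sample. The outer (bilevel) optimization of the edge logits is omitted, and the row-normalized propagation rule and all shapes are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)
n, d = 6, 4                                   # nodes, feature dimension
theta = rng.normal(size=(n, n))               # edge logits (the learnable part)
P = 1.0 / (1.0 + np.exp(-theta))              # Bernoulli edge probabilities

A = (rng.random((n, n)) < P).astype(float)    # sample a discrete graph
np.fill_diagonal(A, 0.0)
A = np.maximum(A, A.T)                        # symmetrize: undirected graph

X = rng.normal(size=(n, d))                   # node features
W = rng.normal(size=(d, d))                   # GCN layer weights
deg = A.sum(1) + 1.0                          # degrees incl. the self-loop
H = np.maximum(((A + np.eye(n)) / deg[:, None]) @ X @ W, 0.0)  # one GCN layer
```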

In this paper, we propose a deep reinforcement learning framework called GCOMB to learn algorithms that can solve combinatorial problems over large graphs. GCOMB mimics the greedy algorithm for the original problem and incrementally constructs a solution. The proposed framework utilizes a Graph Convolutional Network (GCN) to generate node embeddings that predict which nodes from the entire node set are likely to belong to the solution set. These embeddings enable an efficient training process that learns the greedy policy via Q-learning. Through extensive evaluation on several real and synthetic datasets containing up to a million nodes, we establish that GCOMB is up to 41% better than the state of the art, up to seven times faster than the greedy algorithm, and robust and scalable to large dynamic networks.
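The greedy construction loop might look as follows: score the remaining nodes with a learned Q-function over node embeddings and repeatedly add the highest-scoring node until the budget is exhausted. The linear Q-function and random embeddings below are placeholders; in GCOMB the embeddings come from a GCN and the Q-function is trained via Q-learning.

```python
import numpy as np

rng = np.random.default_rng(0)
n_nodes, emb_dim, budget = 100, 16, 10
emb = rng.normal(size=(n_nodes, emb_dim))   # stand-in for GCN node embeddings
q = rng.normal(size=emb_dim)                # stand-in for learned Q weights

solution, remaining = [], set(range(n_nodes))
for _ in range(budget):
    # Predicted marginal gain of adding each remaining node to the solution.
    scores = {v: float(emb[v] @ q) for v in remaining}
    best = max(scores, key=scores.get)      # greedy: pick the highest score
    solution.append(best)
    remaining.remove(best)
```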
