
Recent literature has established that neural networks can represent good policies across a range of stochastic dynamic models in supply chain and logistics. We propose a new algorithm that incorporates variance reduction techniques to overcome limitations of the algorithms typically employed in the literature to learn such neural network policies. For the classical lost sales inventory model, the algorithm learns neural network policies that are vastly superior to those learned using model-free algorithms, while outperforming the best heuristic benchmarks by an order of magnitude. The algorithm is an interesting candidate to apply to other stochastic dynamic problems in supply chain and logistics, because the ideas in its development are generic.
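To make the setting concrete, below is a minimal, hypothetical sketch of the kind of simulation on which such neural network policies are evaluated: a lost sales inventory system whose policy maps the inventory state to an order quantity, with the same demand sample path reused across policies as a simple variance reduction device (common random numbers). The cost parameters, lead time, and the stand-in policies are illustrative assumptions, not the paper's algorithm.

```python
import numpy as np

rng = np.random.default_rng(0)

def simulate_lost_sales(policy, demands, lead_time=2, h=1.0, p=9.0):
    """Average per-period cost of a lost sales inventory system under `policy`.

    `policy` maps the state (on-hand inventory plus outstanding orders) to an
    order quantity. Passing the same `demands` path to different policies acts
    as a common-random-numbers variance reduction device when comparing them.
    """
    on_hand, pipeline = 10.0, [0.0] * lead_time
    total_cost = 0.0
    for d in demands:
        on_hand += pipeline.pop(0)                    # receive the oldest outstanding order
        order = policy(np.array([on_hand] + pipeline))
        pipeline.append(order)
        sales = min(on_hand, d)
        lost = d - sales                              # unmet demand is lost, not backordered
        on_hand -= sales
        total_cost += h * on_hand + p * lost          # holding cost + lost-sales penalty
    return total_cost / len(demands)

# Two illustrative policies compared on the *same* demand path.
demands = rng.poisson(5.0, size=10_000)
base_stock = lambda s, S=12.0: max(S - s.sum(), 0.0)    # classic base-stock heuristic
neural_like = lambda s: max(11.0 - 0.9 * s.sum(), 0.0)  # stand-in for a trained network
print(simulate_lost_sales(base_stock, demands), simulate_lost_sales(neural_like, demands))
```

In an actual learning loop, the policy would be a neural network and the simulated cost (or a gradient estimate of it) would drive the parameter updates; the point of the shared sample path is that comparisons between candidate policies are far less noisy than with independent simulations.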

Related content

Neural Networks is the archival journal of the world's three oldest neural modeling societies: the International Neural Network Society (INNS), the European Neural Network Society (ENNS), and the Japanese Neural Network Society (JNNS). Neural Networks provides a forum for developing and nurturing an international community of scholars and practitioners interested in all aspects of neural networks and related approaches to computational intelligence. Neural Networks welcomes submissions of high-quality papers that contribute to the full range of neural network research, from behavioral and brain modeling and learning algorithms, through mathematical and computational analysis, to systems engineering and technological applications that make substantial use of neural network concepts and techniques. This uniquely broad scope promotes the exchange of ideas between biological and technological research and helps foster the development of an interdisciplinary community interested in biologically inspired computational intelligence. Accordingly, the Neural Networks editorial board represents expertise in fields including psychology, neurobiology, computer science, engineering, mathematics, and physics. The journal publishes articles, letters, and reviews, as well as letters to the editor, editorials, current events, software surveys, and patent information. Articles are published in one of five sections: Cognitive Science, Neuroscience, Learning Systems, Mathematics and Computational Analysis, and Engineering and Applications. Official website:

In this paper, we evaluate the use of Reinforcement Learning (RL) to solve a classic combinatorial optimization problem: the Capacitated Vehicle Routing Problem (CVRP). We formalize this problem in the RL framework and compare two of the most promising RL approaches with traditional solving techniques on a set of benchmark instances. We compare the approaches by the quality of the solutions returned and the time required to return them. We find that, despite not returning the best solution, the RL approach has many advantages over traditional solvers. First, the versatility of the framework allows the resolution of more complex combinatorial problems. Moreover, instead of trying to solve a specific instance of the problem, the RL algorithm learns the skills required to solve the problem. The trained policy can then almost instantly provide a solution to an unseen problem without having to solve it from scratch. Finally, the use of trained models makes the RL solver by far the fastest, which makes this approach better suited for commercial use, where the user experience is paramount. Techniques like Knowledge Transfer can also be used to improve the training efficiency of the algorithm and help solve bigger and more complex problems.
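As an illustration of the formalization step, here is a small sketch of a CVRP episode viewed as a Markov decision process: the state holds the vehicle position, its remaining load, and the unserved demands; an action picks the next node to visit (index 0 being the depot); the reward is the negative travel distance. The class and its interface are assumptions made for illustration, not the environments used in the paper.

```python
import numpy as np

class CVRPEnv:
    """Toy CVRP-as-MDP: the agent chooses the next node to visit (0 = depot)
    and receives the negative travel distance as reward."""

    def __init__(self, coords, demands, capacity):
        # coords: array of node coordinates (row 0 is the depot); demands[0] = 0
        self.coords, self.demands, self.capacity = coords, demands, capacity

    def reset(self):
        self.remaining = self.demands.astype(float)
        self.load = float(self.capacity)
        self.pos = 0                                   # start at the depot
        return self._state()

    def _state(self):
        return np.concatenate(([self.pos, self.load], self.remaining))

    def step(self, node):
        dist = np.linalg.norm(self.coords[self.pos] - self.coords[node])
        if node == 0:                                  # returning to the depot refills the vehicle
            self.load = float(self.capacity)
        else:                                          # serve the customer (feasibility is the policy's job)
            self.load -= self.remaining[node]
            self.remaining[node] = 0.0
        self.pos = node
        done = bool(self.remaining.sum() == 0)
        return self._state(), -dist, done
```

A learned policy would typically mask out customers whose demand exceeds the remaining load, so that only feasible actions are ever sampled.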

The goal of this paper is to investigate a control-theoretic analysis of linear stochastic iterative algorithms and temporal difference (TD) learning. TD-learning, one of the most popular and fundamental reinforcement learning algorithms, is a linear stochastic iterative algorithm for estimating the value function of a given policy in a Markov decision process. While there has been a series of successful works on the theoretical analysis of TD-learning, it was not until recently that researchers found guarantees on its statistical efficiency. In this paper, we propose a control-theoretic finite-time analysis of TD-learning, which exploits standard notions from the linear systems and control community. The proposed work therefore provides additional insights on TD-learning and reinforcement learning using simple concepts and analysis tools from control theory.
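For reference, the linear stochastic iteration in question can be written out explicitly for TD(0) with linear function approximation (standard notation, not taken from the paper): with features $\phi$, step sizes $\alpha_t$, and discount factor $\gamma$,

```latex
\theta_{t+1} = \theta_t + \alpha_t\, \delta_t\, \phi(s_t),
\qquad
\delta_t = r_t + \gamma\, \phi(s_{t+1})^{\top}\theta_t - \phi(s_t)^{\top}\theta_t .
```

Taking expectations, the update has the form $\theta_{t+1} = \theta_t + \alpha_t (A\theta_t + b + w_t)$ with $A = \mathbb{E}[\phi(s)(\gamma\phi(s') - \phi(s))^{\top}]$, $b = \mathbb{E}[r\,\phi(s)]$, and zero-mean noise $w_t$, i.e., a noisy linear system iterate, which is exactly the form that standard linear-systems tools (e.g., Lyapunov arguments) are equipped to analyze.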

The rapid changes in the finance industry due to the increasing amount of data have revolutionized the techniques for data processing and data analysis and brought new theoretical and computational challenges. In contrast to classical stochastic control theory and other analytical approaches for solving financial decision-making problems, which rely heavily on model assumptions, new developments in reinforcement learning (RL) are able to make full use of the large amount of financial data with fewer model assumptions and to improve decisions in complex financial environments. This survey paper aims to review the recent developments and use of RL approaches in finance. We give an introduction to Markov decision processes, which provide the setting for many of the commonly used RL approaches. Various algorithms are then introduced, with a focus on value- and policy-based methods that do not require any model assumptions. Connections are made with neural networks to extend the framework to encompass deep RL algorithms. Our survey concludes by discussing the application of these RL algorithms in a variety of decision-making problems in finance, including optimal execution, portfolio optimization, option pricing and hedging, market making, smart order routing, and robo-advising.
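For readers new to the setting, the two objects most of the surveyed value-based methods revolve around are the value function of a policy and the Bellman optimality equation for the action-value function; the definitions below are the standard ones rather than anything specific to this survey.

```latex
V^{\pi}(s) = \mathbb{E}\!\left[\sum_{t=0}^{\infty} \gamma^{t}\, r(s_t, a_t)\ \middle|\ s_0 = s,\ a_t \sim \pi(\cdot \mid s_t)\right],
\qquad
Q^{*}(s,a) = \mathbb{E}\!\left[\, r(s,a) + \gamma \max_{a'} Q^{*}(s', a')\,\right].
```

Value-based methods estimate $Q^{*}$ directly from data (e.g., Q-learning), while policy-based methods parameterize $\pi$ and ascend an estimate of the gradient of $V^{\pi}$; neither requires an explicit model of the transition dynamics.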

In real-world settings, numerous constraints are present which are hard to specify mathematically. However, for the real-world deployment of reinforcement learning (RL), it is critical that RL agents are aware of these constraints so that they can act safely. In this work, we consider the problem of learning constraints from demonstrations of a constraint-abiding agent's behavior. We experimentally validate our approach and show that our framework can successfully learn the most likely constraints that the agent respects. We further show that these learned constraints are \textit{transferable} to new agents that may have different morphologies and/or reward functions. Previous works in this regard have mainly been restricted to tabular (discrete) settings, specific types of constraints, or settings where the environment's transition dynamics are assumed to be known. In contrast, our framework is able to learn arbitrary \textit{Markovian} constraints in high dimensions in a completely model-free setting. The code can be found at: \url{//github.com/shehryar-malik/icrl}.

In this paper, we propose a deep reinforcement learning framework called GCOMB to learn algorithms that can solve combinatorial problems over large graphs. GCOMB mimics the greedy algorithm for the original problem and incrementally constructs a solution. The proposed framework utilizes a Graph Convolutional Network (GCN) to generate node embeddings that predict, from the entire node set, which nodes are likely to belong to the solution set. These embeddings enable an efficient training process to learn the greedy policy via Q-learning. Through extensive evaluation on several real and synthetic datasets containing up to a million nodes, we establish that GCOMB is up to 41% better than the state of the art, up to seven times faster than the greedy algorithm, and robust and scalable to large dynamic networks.
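The construction loop itself is simple to state; below is a minimal sketch of greedy solution building driven by a learned scoring function. The `q_value` callable stands in for the GCN-embedding-based Q-function described above, and the toy graph and scorer are purely illustrative assumptions.

```python
def greedy_construct(q_value, candidates, budget):
    """Incrementally build a solution set by repeatedly adding the node with
    the highest learned Q-value, mimicking the greedy algorithm."""
    solution, remaining = [], set(candidates)
    while remaining and len(solution) < budget:
        best = max(remaining, key=lambda v: q_value(solution, v))
        solution.append(best)
        remaining.remove(best)
    return solution

# Hypothetical scorer: prefer nodes with many neighbours not yet in the solution.
adjacency = {0: {1, 2}, 1: {0, 2, 3}, 2: {0, 1}, 3: {1}}
q = lambda sol, v: len(adjacency[v] - set(sol))
print(greedy_construct(q, adjacency.keys(), budget=2))   # e.g. [1, 0]
```

In GCOMB the scorer is trained with Q-learning on the GCN embeddings, and the embeddings are also used to prune the candidate set to promising nodes, which is what keeps the loop scalable on million-node graphs.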

Although deep reinforcement learning has recently achieved great successes, a number of challenges remain in multiagent environments. Multiagent reinforcement learning (MARL) is commonly considered to suffer from non-stationary environments and an exponentially growing policy space. It is even more challenging to learn effective policies in circumstances where the rewards are sparse and delayed over long trajectories. In this paper, we study Hierarchical Deep Multiagent Reinforcement Learning (hierarchical deep MARL) in cooperative multiagent problems with sparse and delayed rewards, where efficient multiagent learning methods are desperately needed. We decompose the original MARL problem into hierarchies and investigate how effective policies can be learned hierarchically in synchronous/asynchronous hierarchical MARL frameworks. Several hierarchical deep MARL architectures, i.e., Ind-hDQN, hCom, and hQmix, are introduced for different learning paradigms. Moreover, to alleviate the issues of sparse experiences in high-level learning and non-stationarity in multiagent settings, we propose a new experience replay mechanism, named Augmented Concurrent Experience Replay (ACER). We empirically demonstrate the effectiveness and efficiency of our approaches on several classic Multiagent Trash Collection tasks, as well as on an extremely challenging team sports game, i.e., Fever Basketball Defense.

The reinforcement learning community has made great strides in designing algorithms capable of exceeding human performance on specific tasks. These algorithms are mostly trained on one task at a time, and each new task requires training a brand-new agent instance. This means the learning algorithm is general, but each solution is not; each agent can only solve the one task it was trained on. In this work, we study the problem of learning to master not one but multiple sequential-decision tasks at once. A general issue in multi-task learning is that a balance must be found between the needs of multiple tasks competing for the limited resources of a single learning system. Many learning algorithms can get distracted by certain tasks in the set of tasks to solve. Such tasks appear more salient to the learning process, for instance because of the density or magnitude of the in-task rewards. This causes the algorithm to focus on those salient tasks at the expense of generality. We propose to automatically adapt the contribution of each task to the agent's updates, so that all tasks have a similar impact on the learning dynamics. This resulted in state-of-the-art performance on learning to play all games in a set of 57 diverse Atari games. Excitingly, our method learned a single trained policy - with a single set of weights - that exceeds median human performance. To our knowledge, this was the first time a single agent surpassed human-level performance on this multi-task domain. The same approach also demonstrated state-of-the-art performance on a set of 30 tasks in the 3D reinforcement learning platform DeepMind Lab.
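The core idea, adapting each task's contribution so that no single reward scale dominates, can be sketched with a simple per-task running normalization of the value targets. This is a simplified stand-in for the adaptive scaling described above, not the authors' exact method.

```python
import numpy as np

class TaskScaler:
    """Per-task running normalisation of value targets so that tasks with
    dense or large rewards do not dominate the shared agent's updates."""

    def __init__(self, n_tasks, beta=1e-3):
        self.mean = np.zeros(n_tasks)      # running first moment per task
        self.sq = np.ones(n_tasks)         # running second moment per task
        self.beta = beta                   # adaptation rate

    def update(self, task, target):
        self.mean[task] += self.beta * (target - self.mean[task])
        self.sq[task] += self.beta * (target ** 2 - self.sq[task])

    def normalise(self, task, target):
        std = np.sqrt(max(self.sq[task] - self.mean[task] ** 2, 1e-8))
        return (target - self.mean[task]) / std

scaler = TaskScaler(n_tasks=57)
scaler.update(task=3, target=250.0)
print(scaler.normalise(task=3, target=250.0))
```

Only the normalised targets enter the shared value loss, so a game handing out rewards in the thousands contributes a gradient of roughly the same magnitude as one handing out rewards of a few points.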

We present an end-to-end framework for solving the Vehicle Routing Problem (VRP) using reinforcement learning. In this approach, we train a single model that finds near-optimal solutions for problem instances sampled from a given distribution, only by observing the reward signals and following feasibility rules. Our model represents a parameterized stochastic policy, and by applying a policy gradient algorithm to optimize its parameters, the trained model produces the solution as a sequence of consecutive actions in real time, without the need to re-train for every new problem instance. On capacitated VRP, our approach outperforms classical heuristics and Google's OR-Tools on medium-sized instances in solution quality with comparable computation time (after training). We demonstrate how our approach can handle problems with split delivery and explore the effect of such deliveries on the solution quality. Our proposed framework can be applied to other variants of the VRP such as the stochastic VRP, and has the potential to be applied more generally to combinatorial optimization problems.
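Concretely, the policy-gradient step alluded to above takes the familiar REINFORCE-with-baseline form; writing $X$ for a sampled problem instance, $\pi$ for a route sampled from the policy, and $L(\pi \mid X)$ for its total length (the negative reward), the gradient of the expected route length is estimated as below. The choice of baseline $b(\cdot)$ (for example, a learned critic) is left generic here.

```latex
\nabla_{\theta} J(\theta) \;\approx\; \frac{1}{B} \sum_{i=1}^{B} \bigl( L(\pi_i \mid X_i) - b(X_i) \bigr)\, \nabla_{\theta} \log p_{\theta}(\pi_i \mid X_i).
```

Gradient descent on $J$ then shortens the routes the policy tends to produce, and because the expectation is taken over instances drawn from the training distribution, the resulting policy generalizes to unseen instances without re-training.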

Although reinforcement learning methods can achieve impressive results in simulation, the real world presents two major challenges: generating samples is exceedingly expensive, and unexpected perturbations can cause proficient but narrowly-learned policies to fail at test time. In this work, we propose to learn how to quickly and effectively adapt online to new situations as well as to perturbations. To enable sample-efficient meta-learning, we consider learning online adaptation in the context of model-based reinforcement learning. Our approach trains a global model such that, when combined with recent data, the model can be rapidly adapted to the local context. Our experiments demonstrate that our approach can enable simulated agents to adapt their behavior online to novel terrains, to a crippled leg, and in highly dynamic environments.
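A minimal sketch of the online-adaptation step, under the assumption that the global dynamics model is a torch module mapping concatenated state-action pairs to next-state predictions: a handful of gradient steps on the most recent transitions produces a locally adapted copy of the model. The function below is illustrative and is not the paper's meta-training procedure.

```python
import copy

import torch
import torch.nn.functional as F

def adapt_online(model, recent_batch, lr=1e-3, steps=5):
    """Adapt a copy of the global dynamics model to the most recent
    transitions with a few gradient steps, leaving the global model intact."""
    states, actions, next_states = recent_batch
    adapted = copy.deepcopy(model)
    opt = torch.optim.SGD(adapted.parameters(), lr=lr)
    for _ in range(steps):
        pred = adapted(torch.cat([states, actions], dim=-1))
        loss = F.mse_loss(pred, next_states)
        opt.zero_grad()
        loss.backward()
        opt.step()
    return adapted
```

The adapted model is then used for planning or policy evaluation in the current context (e.g., a crippled leg or new terrain); meta-training shapes the global model so that these few steps suffice.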

We propose a new approach to inverse reinforcement learning (IRL) based on the deep Gaussian process (deep GP) model, which is capable of learning complicated reward structures with few demonstrations. Our model stacks multiple latent GP layers to learn abstract representations of the state feature space, which is linked to the demonstrations through the Maximum Entropy learning framework. Incorporating the IRL engine into the nonlinear latent structure renders existing deep GP inference approaches intractable. To tackle this, we develop a non-standard variational approximation framework which extends previous inference schemes. This allows for approximate Bayesian treatment of the feature space and guards against overfitting. Carrying out representation and inverse reinforcement learning simultaneously within our model outperforms state-of-the-art approaches, as we demonstrate with experiments on standard benchmarks ("object world", "highway driving") and a new benchmark ("binary world").
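For context, the Maximum Entropy IRL objective that links the learned reward to the demonstrations has the standard gradient below, where $\mathcal{D}$ is the demonstration set and $r_{\theta}$ the parameterized (here, deep GP based) reward; the paper's variational treatment of the latent GP layers is not captured by this generic form.

```latex
p_{\theta}(\tau) \propto \exp\!\Bigl(\textstyle\sum_{s_t \in \tau} r_{\theta}(s_t)\Bigr),
\qquad
\nabla_{\theta} \mathcal{L}(\theta)
= \sum_{\tau \in \mathcal{D}} \sum_{s_t \in \tau} \nabla_{\theta} r_{\theta}(s_t)
\;-\; |\mathcal{D}|\; \mathbb{E}_{\tau \sim p_{\theta}}\!\Bigl[\textstyle\sum_{s_t \in \tau} \nabla_{\theta} r_{\theta}(s_t)\Bigr].
```

Intuitively, the first term raises the reward along demonstrated states while the second lowers it in proportion to how often the current reward would make the learner visit them, so the optimum matches the demonstrators' feature expectations.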
