In a ride-hailing system, an optimal relocation of vacant vehicles can significantly reduce fleet idling time and balance the supply-demand distribution, enhancing system efficiency and promoting driver satisfaction and retention. Model-free deep reinforcement learning (DRL) has been shown to dynamically learn the relocating policy by actively interacting with the intrinsic dynamics in large-scale ride-hailing systems. However, sparse reward signals and unbalanced demand and supply distributions pose critical barriers to developing effective DRL models. Conventional exploration strategies (e.g., $\epsilon$-greedy) may barely work in such an environment because agents dither in low-demand regions distant from high-revenue regions. This study proposes the deep relocating option policy (DROP), which supervises vehicle agents to escape from oversupplied areas and effectively relocate to potentially underserved areas. We propose to learn the Laplacian embedding of a time-expanded relocation graph as an approximate representation of the system relocation policy. The embedding generates task-agnostic signals, which, in combination with task-dependent signals, constitute the pseudo-reward function for generating DROPs. We present a hierarchical learning framework that trains a high-level relocation policy and a set of low-level DROPs. The effectiveness of our approach is demonstrated using a custom-built high-fidelity simulator with real-world trip record data. We report that DROP significantly improves on baseline models, with 15.7% more hourly revenue, and can effectively resolve the dithering issue in low-demand areas.
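To make the idea of a Laplacian embedding concrete, the following is a minimal sketch on a toy graph, not the method described in the abstract. It assumes a tiny undirected graph of four zones on a line (the abstract's graph is time-expanded and its embedding is learned), and it computes a one-dimensional embedding, the Fiedler vector, by plain power iteration. The function name `fiedler_embedding` and the toy graph are illustrative assumptions.

```python
def fiedler_embedding(adj, iters=500):
    """1-D Laplacian embedding (Fiedler vector) of a connected undirected graph.

    adj is a symmetric 0/1 adjacency matrix given as a list of lists.
    Hypothetical helper for illustration only.
    """
    n = len(adj)
    deg = [sum(row) for row in adj]
    # Graph Laplacian L = D - A.
    lap = [[(deg[i] if i == j else 0) - adj[i][j] for j in range(n)]
           for i in range(n)]
    # Power iteration on M = c*I - L: the smallest eigenvalues of L
    # become the largest eigenvalues of M (c >= largest eigenvalue of L).
    c = 2 * max(deg)
    x = [float(i) for i in range(n)]  # arbitrary non-constant start vector
    for _ in range(iters):
        mean = sum(x) / n
        x = [v - mean for v in x]  # project out the trivial constant eigenvector
        y = [c * x[i] - sum(lap[i][j] * x[j] for j in range(n))
             for i in range(n)]
        norm = sum(v * v for v in y) ** 0.5
        x = [v / norm for v in y]
    return x


# Toy "relocation graph": four zones on a line, 0 - 1 - 2 - 3.
path = [
    [0, 1, 0, 0],
    [1, 0, 1, 0],
    [0, 1, 0, 1],
    [0, 0, 1, 0],
]
emb = fiedler_embedding(path)
```

In this toy case the two end zones land at opposite ends of the embedding, so distance in embedding space tracks how far apart zones are in the graph. That is the kind of task-agnostic signal a pseudo-reward could build on.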

Related content

October 29, 2021

Joint Channel Estimation and Data Detection in Cell-Free Massive MU-MIMO Systems

Haochuan Song, Tom Goldstein, Xiaohu You, Chuan Zhang, Olav Tirkkonen, Christoph Studer

from arxiv, To appear in the IEEE Transactions on Wireless Communications

We propose a joint channel estimation and data detection (JED) algorithm for densely-populated cell-free massive multiuser (MU) multiple-input multiple-output (MIMO) systems, which reduces the channel training overhead caused by the presence of hundreds of simultaneously transmitting user equipments (UEs). Our algorithm iteratively solves a relaxed version of a maximum a-posteriori JED problem and simultaneously exploits the sparsity of cell-free massive MU-MIMO channels as well as the boundedness of QAM constellations. In order to improve the performance and convergence of the algorithm, we propose methods that permute the access point and UE indices to form so-called virtual cells, which leads to better initial solutions. We assess the performance of our algorithm in terms of root-mean-squared-symbol error, bit error rate, and mutual information, and we demonstrate that JED significantly reduces the pilot overhead compared to orthogonal training, which enables reliable communication with short packets to a large number of UEs.

October 28, 2021

IRS-Aided Wireless Relaying: Optimal Deployment and Capacity Scaling

Zhenyu Kang, Changsheng You, Rui Zhang

In this letter, we consider an intelligent reflecting surface (IRS)-aided wireless relaying system, where a decode-and-forward relay (R) is employed to forward data from a source (S) to a destination (D), aided by M passive reflecting elements. We consider two practical IRS deployment strategies, namely, single-IRS deployment, where all reflecting elements are mounted on one single IRS that is deployed near S, R, or D, and multi-IRS deployment, where the reflecting elements are allocated over three separate IRSs that are deployed near S, R, and D, respectively. Under the line-of-sight (LoS) channel model, we characterize the capacity scaling orders with respect to an increasing M for the IRS-aided relay system with different IRS deployment strategies. For single-IRS deployment, we show that deploying the IRS near R achieves the highest capacity compared to deploying it near S or D. For multi-IRS deployment, we propose a practical cooperative IRS passive beamforming design, which is analytically shown to achieve a larger capacity scaling order than the single-IRS deployment (i.e., near R or S/D) when M is sufficiently large. Numerical examples are provided, which validate our theoretical results.

October 27, 2021

Off-policy Reinforcement Learning with Optimistic Exploration and Distribution Correction

Jiachen Li, Shuo Cheng, Zhenyu Liao, Huayan Wang, William Yang Wang, Qinxun Bai

Improving sample efficiency of reinforcement learning algorithms requires effective exploration. Following the principle of $\textit{optimism in the face of uncertainty}$, we train a separate exploration policy to maximize an approximate upper confidence bound of the critics in an off-policy actor-critic framework. However, this introduces extra differences between the replay buffer and the target policy in terms of their stationary state-action distributions. To mitigate the off-policy-ness, we adapt the recently introduced DICE framework to learn a distribution correction ratio for off-policy actor-critic training. In particular, we correct the training distribution for both policies and critics. Empirically, we evaluate our proposed method in several challenging continuous control tasks and show superior performance compared to state-of-the-art methods. We also conduct extensive ablation studies to demonstrate the effectiveness and the rationality of the proposed method.
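As a rough illustration of optimism in the face of uncertainty, the sketch below scores each action by the mean of the critics' estimates plus a bonus proportional to their disagreement. This is a generic upper-confidence-bound heuristic under stated assumptions, not the paper's exact objective: the action set is discrete and tiny (the paper targets continuous control), and the names `ucb_scores`, `optimistic_action`, and the weight `beta` are hypothetical.

```python
from statistics import mean, pstdev


def ucb_scores(critic_values, beta=1.0):
    """Approximate upper confidence bound per action.

    critic_values[a] lists each critic's Q-estimate for action a.
    The bound is mean + beta * (population std across critics).
    """
    return [mean(qs) + beta * pstdev(qs) for qs in critic_values]


def optimistic_action(critic_values, beta=1.0):
    """Index of the action with the highest approximate UCB."""
    scores = ucb_scores(critic_values, beta)
    return max(range(len(scores)), key=scores.__getitem__)


# Two critics, two actions: the critics agree on action 0 and disagree on action 1.
q = [[1.0, 1.0], [0.5, 1.3]]
greedy = optimistic_action(q, beta=0.0)      # ranks actions by mean value only
optimistic = optimistic_action(q, beta=1.0)  # also rewards critic disagreement
```

With `beta=0.0` the rule falls back to the mean estimate and picks action 0. With `beta=1.0` the disagreement bonus makes the less certain action 1 the exploration target.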

October 27, 2021

Failure-averse Active Learning for Physics-constrained Systems

Cheolhei Lee, Xing Wang, Jianguo Wu, Xiaowei Yue

from arxiv, 12 pages

Active learning is a subfield of machine learning devised for the design and modeling of systems with highly expensive sampling costs. Industrial and engineering systems are generally subject to physics constraints that may induce fatal failures when they are violated, yet such constraints are frequently underestimated in active learning. In this paper, we develop a novel active learning method that avoids failures by considering the implicit physics constraints that govern the system. The proposed approach is driven by two tasks: safe variance reduction explores the safe region to reduce the variance of the target model, and safe region expansion aims to extend the explorable region by exploiting the probabilistic model of the constraints. The global acquisition function is devised to judiciously optimize the acquisition functions of the two tasks, and its theoretical properties are provided. The proposed method is applied to the composite fuselage assembly process with consideration of material failure using the Tsai-Wu criterion, and it is able to achieve zero failures without knowledge of explicit failure regions.

October 27, 2021

Provable Lifelong Learning of Representations

Xinyuan Cao, Weiyang Liu, Santosh S. Vempala

from arxiv, Working paper (30 pages, 6 figures)

In lifelong learning, the tasks (or classes) to be learned arrive sequentially over time in arbitrary order. During training, knowledge from previous tasks can be captured and transferred to subsequent ones to improve sample efficiency. We consider the setting where all target tasks can be represented in the span of a small number of unknown linear or nonlinear features of the input data. We propose a provable lifelong learning algorithm that maintains and refines the internal feature representation. We prove that for any desired accuracy on all tasks, the dimension of the representation remains close to that of the underlying representation. The resulting sample complexity improves significantly on existing bounds. In the setting of linear features, our algorithm is provably efficient and the sample complexity for input dimension $d$, $m$ tasks with $k$ features up to error $\epsilon$ is $\tilde{O}(dk^{1.5}/\epsilon+km/\epsilon)$. We also prove a matching lower bound for any lifelong learning algorithm that uses a single task learner as a black box. Finally, we complement our analysis with an empirical study.

May 21, 2021

Inverse Constrained Reinforcement Learning

Usman Anwar, Shehryar Malik, Alireza Aghasi, Ali Ahmed

from arxiv, Camera-ready version for ICML 2021

In real-world settings, numerous constraints are present that are hard to specify mathematically. However, for the real-world deployment of reinforcement learning (RL), it is critical that RL agents are aware of these constraints, so that they can act safely. In this work, we consider the problem of learning constraints from demonstrations of a constraint-abiding agent's behavior. We experimentally validate our approach and show that our framework can successfully learn the most likely constraints that the agent respects. We further show that these learned constraints are \textit{transferable} to new agents that may have different morphologies and/or reward functions. Previous work in this regard has mainly either been restricted to tabular (discrete) settings or specific types of constraints, or has assumed knowledge of the environment's transition dynamics. In contrast, our framework is able to learn arbitrary \textit{Markovian} constraints in high dimensions in a completely model-free setting. The code can be found at: \url{//github.com/shehryar-malik/icrl}.

February 11, 2019

Continual Lifelong Learning with Neural Networks: A Review

German I. Parisi, Ronald Kemker, Jose L. Part, Christopher Kanan, Stefan Wermter

Humans and animals have the ability to continually acquire, fine-tune, and transfer knowledge and skills throughout their lifespan. This ability, referred to as lifelong learning, is mediated by a rich set of neurocognitive mechanisms that together contribute to the development and specialization of our sensorimotor skills as well as to long-term memory consolidation and retrieval. Consequently, lifelong learning capabilities are crucial for autonomous agents interacting in the real world and processing continuous streams of information. However, lifelong learning remains a long-standing challenge for machine learning and neural network models since the continual acquisition of incrementally available information from non-stationary data distributions generally leads to catastrophic forgetting or interference. This limitation represents a major drawback for state-of-the-art deep neural network models that typically learn representations from stationary batches of training data, thus without accounting for situations in which information becomes incrementally available over time. In this review, we critically summarize the main challenges linked to lifelong learning for artificial learning systems and compare existing neural network approaches that alleviate, to different extents, catastrophic forgetting. We discuss well-established and emerging research motivated by lifelong learning factors in biological systems such as structural plasticity, memory replay, curriculum and transfer learning, intrinsic motivation, and multisensory integration.

June 12, 2018

Mean Field Multi-Agent Reinforcement Learning

Yaodong Yang, Rui Luo, Minne Li, Ming Zhou, Weinan Zhang, Jun Wang

from arxiv, ICML 2018 (Full paper + Long talk)

Existing multi-agent reinforcement learning methods are typically limited to a small number of agents. When the number of agents grows large, learning becomes intractable due to the curse of dimensionality and the exponential growth of agent interactions. In this paper, we present Mean Field Reinforcement Learning, where the interactions within the population of agents are approximated by those between a single agent and the average effect of the overall population or neighboring agents. The interplay between the two entities is mutually reinforcing: the learning of the individual agent's optimal policy depends on the dynamics of the population, while the dynamics of the population change according to the collective patterns of the individual policies. We develop practical mean field Q-learning and mean field Actor-Critic algorithms and analyze the convergence of the solution to a Nash equilibrium. Experiments on Gaussian squeeze, the Ising model, and battle games justify the learning effectiveness of our mean field approaches. In addition, we report the first result to solve the Ising model via model-free reinforcement learning methods.
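The mean field approximation can be sketched in a few lines. This is a minimal illustration, assuming discrete actions encoded as integers: the neighbors' joint action is summarized by the average of their one-hot encodings, and a single Q-value is nudged toward a one-step target. The helper names, the default step size `alpha`, and the discount `gamma` are hypothetical choices, and the next-state value `v_next` is simply passed in rather than computed from a policy.

```python
def mean_action(neighbor_actions, n_actions):
    """Average of the neighbors' one-hot actions (an empirical action distribution)."""
    counts = [0] * n_actions
    for a in neighbor_actions:
        counts[a] += 1
    k = len(neighbor_actions)
    return [c / k for c in counts]


def mf_q_update(q_old, reward, v_next, alpha=0.1, gamma=0.95):
    """One temporal-difference step for Q(s, a, mean_action).

    q_old  : current estimate of Q(s, a, mean_action)
    v_next : value of the next state under the current policy (supplied by the caller)
    """
    return (1 - alpha) * q_old + alpha * (reward + gamma * v_next)


# Four neighbors choosing among three actions.
a_bar = mean_action([0, 1, 1, 2], n_actions=3)
q_new = mf_q_update(q_old=0.0, reward=1.0, v_next=2.0, alpha=0.5, gamma=0.9)
```

The point of the summary `a_bar` is that the Q-function's input size no longer grows with the number of agents, only with the number of actions.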

June 5, 2018

Relational Deep Reinforcement Learning

Vinicius Zambaldi, David Raposo, Adam Santoro, Victor Bapst, Yujia Li, Igor Babuschkin, Karl Tuyls, David Reichert, Timothy Lillicrap, Edward Lockhart, Murray Shanahan, Victoria Langston, Razvan Pascanu, Matthew Botvinick, Oriol Vinyals, Peter Battaglia

We introduce an approach for deep reinforcement learning (RL) that improves upon the efficiency, generalization capacity, and interpretability of conventional approaches through structured perception and relational reasoning. It uses self-attention to iteratively reason about the relations between entities in a scene and to guide a model-free policy. Our results show that in a novel navigation and planning task called Box-World, our agent finds interpretable solutions that improve upon baselines in terms of sample complexity, ability to generalize to more complex scenes than experienced during training, and overall performance. In the StarCraft II Learning Environment, our agent achieves state-of-the-art performance on six mini-games -- surpassing human grandmaster performance on four. By considering architectural inductive biases, our work opens new directions for overcoming important, but stubborn, challenges in deep RL.
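For readers unfamiliar with self-attention over entities, here is a bare-bones sketch of one round of scaled dot-product attention. It rests on strong simplifying assumptions that are not taken from the paper: a single head, no learned query/key/value projections (each entity vector plays all three roles), and hand-written toy entity vectors. The function name `self_attention` is illustrative.

```python
import math


def self_attention(entities):
    """One round of scaled dot-product self-attention over entity vectors.

    Returns (updated_entities, attention_weights). Each entity attends to
    every entity, itself included, using softmax(q . k / sqrt(d)).
    """
    d = len(entities[0])
    updated, weights = [], []
    for q in entities:
        scores = [sum(a * b for a, b in zip(q, k)) / math.sqrt(d)
                  for k in entities]
        top = max(scores)  # subtract the max for numerical stability
        exps = [math.exp(s - top) for s in scores]
        total = sum(exps)
        w = [e / total for e in exps]
        weights.append(w)
        updated.append([sum(w[j] * entities[j][i] for j in range(len(entities)))
                        for i in range(d)])
    return updated, weights


# Three toy entities: the first two are identical, the third is orthogonal to them.
scene = [[1.0, 0.0], [1.0, 0.0], [0.0, 1.0]]
new_scene, attn = self_attention(scene)
```

Each row of `attn` is a probability distribution over entities. Here the first entity puts equal weight on itself and its twin and less on the orthogonal third entity. Applying the function repeatedly gives a crude picture of iterative relational reasoning.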

January 5, 2018

Deep Reinforcement Learning for List-wise Recommendations

Xiangyu Zhao, Liang Zhang, Zhuoye Ding, Dawei Yin, Yihong Zhao, Jiliang Tang

Recommender systems play a crucial role in mitigating the problem of information overload by suggesting personalized items or services to users. The vast majority of traditional recommender systems consider the recommendation procedure as a static process and make recommendations following a fixed strategy. In this paper, we propose a novel recommender system with the capability of continuously improving its strategies during its interactions with users. We model the sequential interactions between users and a recommender system as a Markov Decision Process (MDP) and leverage Reinforcement Learning (RL) to automatically learn the optimal strategies by recommending items on a trial-and-error basis and receiving reinforcement from users' feedback on these items. In particular, we introduce an online user-agent interacting environment simulator, which can pre-train and evaluate model parameters offline before applying the model online. Moreover, we validate the importance of list-wise recommendations during the interactions between users and the agent, and develop a novel approach to incorporate them into the proposed framework, LIRD, for list-wise recommendations. The experimental results based on a real-world e-commerce dataset demonstrate the effectiveness of the proposed framework.