
The deployment of Autonomous Vehicles (AVs) poses considerable challenges and unique opportunities for the design and management of future urban road infrastructure. In light of this disruptive transformation, the Right-Of-Way (ROW) composition of road space has the potential to be renewed. Design approaches and intelligent control models have been proposed to address this problem, but an operational framework that can dynamically generate ROW plans for AVs and pedestrians in response to real-time demand is still lacking. Based on microscopic traffic simulation, this study explores Reinforcement Learning (RL) methods for evolving ROW compositions. We implement a centralised paradigm and a distributive learning paradigm to perform dynamic control separately on several road network configurations. Experimental results indicate that the algorithms can improve traffic flow efficiency while allocating more space to pedestrians. Furthermore, the distributive learning algorithm outperforms its centralised counterpart in computational cost (49.55\%), benchmark rewards (25.35\%), best cumulative rewards (24.58\%), optimal actions (13.49\%) and rate of convergence. This novel road management technique could contribute to flow-adaptive and active-mobility-friendly streets in the AV era.
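To make the control setup concrete, the following is a minimal sketch of how dynamic ROW allocation could be framed as an RL environment. The class name RowEnv, the simulated demand signals, and the reward trade-off are illustrative assumptions, not the study's actual implementation, which is driven by microscopic traffic simulation.

```python
# A minimal sketch (not the paper's implementation) of a gym-style
# environment for dynamic Right-Of-Way (ROW) allocation. RowEnv and
# its demand signals are illustrative assumptions.
import numpy as np

class RowEnv:
    def __init__(self, n_units=8):
        self.n_units = n_units          # discretised cross-section units of road width
        self.av_units = n_units - 2     # units currently assigned to AV traffic
        self.rng = np.random.default_rng(0)

    def reset(self):
        self.av_units = self.n_units - 2
        return self._obs()

    def _obs(self):
        # Observation: current split plus simulated real-time demand.
        flow_demand = self.rng.uniform(0, 1)  # stand-in for microsimulation output
        ped_demand = self.rng.uniform(0, 1)
        return np.array([self.av_units / self.n_units, flow_demand, ped_demand])

    def step(self, action):
        # Action: -1 shifts one unit to pedestrians, 0 keeps, +1 shifts to AVs.
        self.av_units = int(np.clip(self.av_units + action, 1, self.n_units - 1))
        obs = self._obs()
        # Reward trades off vehicle flow efficiency against pedestrian space.
        reward = obs[1] * (self.av_units / self.n_units) \
               + obs[2] * (1 - self.av_units / self.n_units)
        return obs, reward, False, {}
```

An agent acting in such an environment would learn to shift road width toward whichever mode currently shows higher demand, which is the intuition behind flow-adaptive ROW plans.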

Related Content

Real-time perception and motion planning are two crucial tasks for autonomous driving. While much research has focused on improving the performance of perception and motion planning individually, it remains unclear how a perception error may adversely impact motion planning results. In this work, we propose a joint simulation framework with LiDAR-based perception and motion planning for real-time automated driving. Taking the sensor input from the CARLA simulator with additive noise, a LiDAR perception system is designed to detect and track all surrounding vehicles and to provide precise orientation and velocity information. Next, we introduce a new collision bound representation that reduces the communication cost between the perception module and the motion planner. A novel collision checking algorithm is implemented using line intersection checking, which is more efficient over long distance ranges than the traditional occupancy grid method. We evaluate the joint simulation framework in CARLA for urban driving scenarios. Experiments show that our proposed automated driving system can execute at 25 Hz, which meets the real-time requirement. The LiDAR perception system has high accuracy within 20 meters when evaluated against the ground truth. The motion planner maintains a consistent safe distance when tested in CARLA urban driving scenarios.
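The line-intersection idea can be illustrated with a standard computational-geometry routine. The sketch below assumes a polygonal collision bound and a piecewise-linear planned path; it shows the general technique, not the paper's exact algorithm.

```python
# A minimal sketch of collision checking by line-segment intersection,
# as an alternative to occupancy grids. The orientation test below is
# a standard geometry helper, not the paper's exact implementation.
def _orient(p, q, r):
    """Signed area: >0 counter-clockwise, <0 clockwise, 0 collinear."""
    return (q[0] - p[0]) * (r[1] - p[1]) - (q[1] - p[1]) * (r[0] - p[0])

def segments_intersect(a, b, c, d):
    """True if segment ab properly intersects segment cd."""
    return (_orient(a, b, c) * _orient(a, b, d) < 0 and
            _orient(c, d, a) * _orient(c, d, b) < 0)

def path_hits_bound(path, bound):
    """Check a planned path (list of points) against a polygonal
    collision bound (list of points, closed implicitly)."""
    edges = list(zip(bound, bound[1:] + bound[:1]))
    for p0, p1 in zip(path, path[1:]):
        if any(segments_intersect(p0, p1, e0, e1) for e0, e1 in edges):
            return True
    return False

# Example: a straight path crossing a square collision bound.
square = [(2, -1), (4, -1), (4, 1), (2, 1)]
print(path_hits_bound([(0, 0), (6, 0)], square))  # True
```

Because the check only touches the polygon edges and path segments, its cost does not grow with the distance covered, unlike rasterising the same region into an occupancy grid.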

Resource scheduling and allocation is a critical component of many high-impact systems ranging from congestion control to cloud computing. Finding better solutions to these problems often yields significant savings in resources and time, reduces device wear-and-tear, and can even lower carbon emissions. In this paper, we focus on a specific instance of a scheduling problem, namely the memory mapping problem that occurs during compilation of machine learning programs: that is, mapping tensors to different memory layers to optimize execution time. We introduce an approach for solving the memory mapping problem using Reinforcement Learning (RL), a solution paradigm well-suited to sequential decision-making problems that are amenable to planning, and to combinatorial search spaces with high-dimensional data inputs. We formulate the problem as a single-player game, which we call the mallocGame, such that high-reward trajectories of the game correspond to efficient memory mappings on the target hardware. We also introduce a Reinforcement Learning agent, mallocMuZero, and show that it is capable of playing this game to discover new and improved memory mapping solutions that lead to faster execution times on real ML workloads on ML accelerators. We compare the performance of mallocMuZero to the default solver used by the Accelerated Linear Algebra (XLA) compiler on a benchmark of realistic ML workloads. In addition, we show that mallocMuZero is capable of improving the execution time of the recently published AlphaTensor matrix multiplication model.
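The single-player-game framing can be sketched as follows. The toy cost model, capacities, and class names below are illustrative assumptions in the spirit of the mallocGame, not the paper's formulation.

```python
# A minimal sketch of memory mapping framed as a single-player game:
# place each tensor in fast or slow memory; the episode return is the
# negated estimated execution cost. The cost model is a toy assumption.
from dataclasses import dataclass

@dataclass
class Tensor:
    size: int      # bytes
    accesses: int  # reads/writes during execution

FAST_LATENCY, SLOW_LATENCY = 1, 10   # toy per-access costs

class MallocGame:
    def __init__(self, tensors, fast_capacity):
        self.tensors = tensors
        self.fast_capacity = fast_capacity
        self.i = 0              # index of the next tensor to place
        self.fast_used = 0
        self.total_cost = 0

    def legal_actions(self):
        acts = ["slow"]
        if self.fast_used + self.tensors[self.i].size <= self.fast_capacity:
            acts.append("fast")
        return acts

    def step(self, action):
        t = self.tensors[self.i]
        if action == "fast":
            self.fast_used += t.size
            self.total_cost += t.accesses * FAST_LATENCY
        else:
            self.total_cost += t.accesses * SLOW_LATENCY
        self.i += 1
        done = self.i == len(self.tensors)
        # Reward only at the end: shorter estimated execution time is better.
        reward = -self.total_cost if done else 0
        return done, reward
```

High-reward trajectories in this toy game correspond to placements that keep frequently accessed tensors in fast memory, which mirrors the correspondence the paper establishes on real hardware.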

Robust reinforcement learning (RL) aims at learning a policy that optimizes the worst-case performance over an uncertainty set. Given a nominal Markov decision process (N-MDP) that generates samples for training, the set contains MDPs obtained by perturbations of the N-MDP. In this paper, we introduce a new uncertainty set containing MDPs that are more realistic in practice than those in existing sets. Using this uncertainty set, we present a robust RL algorithm, named ARQ-Learning, for the tabular case. We characterize its finite-time error bounds and prove that it converges as fast as Q-Learning and robust Q-Learning (the state-of-the-art robust RL method) while providing better robustness for real applications. We then propose a {\em pessimistic agent} technique that efficiently tackles the key bottleneck in extending ARQ-Learning to large or continuous state spaces. Using this technique, we first propose PRQ-Learning; next, combining it with DQN and DDPG, we develop PR-DQN and PR-DDPG, respectively. We emphasize that our technique can easily be combined with other popular model-free methods. Via experiments, we demonstrate the superiority of the proposed methods in various RL applications with model uncertainties.
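As a rough illustration of a pessimistic tabular update, the sketch below mixes the sampled bootstrap target with a worst-case state value. The uncertainty set used here is a simple generic choice for illustration, not the set introduced in the paper.

```python
# A minimal sketch of a pessimistic tabular Q-update in the robust-RL
# spirit of ARQ-Learning. The uncertainty model (adversarial mixing
# toward the worst reachable state with mass rho) is an illustrative
# assumption, not the paper's uncertainty set.
import numpy as np

def pessimistic_q_update(Q, s, a, r, s_next, alpha=0.1, gamma=0.99, rho=0.1):
    """Q: array of shape (n_states, n_actions); rho: perturbation level."""
    nominal = np.max(Q[s_next])          # usual Q-learning bootstrap
    worst = np.min(np.max(Q, axis=1))    # value of the worst reachable state
    # Pessimistic target: with probability mass rho, the transition is
    # assumed adversarially perturbed toward the worst-case state.
    target = r + gamma * ((1 - rho) * nominal + rho * worst)
    Q[s, a] += alpha * (target - Q[s, a])
    return Q
```

Setting rho to zero recovers standard Q-Learning, which is why such updates can retain Q-Learning-like convergence rates while hedging against perturbations.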

The simulation-based testing of Autonomous Driving Systems (ADSs) has gained significant attention. However, current approaches often fall short of accurately assessing ADSs for two reasons: over-reliance on expert knowledge and the use of simplistic evaluation metrics. This leads to discrepancies between simulated scenarios and naturalistic driving environments. To address this, we propose Matrix-Fuzzer, a behavior-tree-based testing framework that automatically generates realistic safety-critical test scenarios. Our approach involves the $log2BT$ method, which abstracts logged road users' trajectories into behavior sequences. Furthermore, we vary the properties of behaviors according to real-world driving distributions and then use an adaptive algorithm to explore the input space. Meanwhile, we design a general evaluation engine that guides the algorithm toward critical areas, thus reducing the generation of invalid scenarios. Our approach is demonstrated in our Matrix Simulator. The experimental results show that: (1) our $log2BT$ achieves satisfactory trajectory reconstructions; (2) our approach finds the most types of safety-critical scenarios while generating only around 30% as many total scenarios as the baseline algorithm. Specifically, it improves the ratio of critical violations to total scenarios and the ratio of scenario types to total scenarios by at least 10x and 5x, respectively, while reducing the ratio of invalid scenarios to total scenarios by at least 58% in two case studies.
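The trajectory-to-behavior abstraction can be illustrated with a toy classifier. The behavior labels and thresholds below are assumptions for illustration; the actual $log2BT$ abstraction is more elaborate.

```python
# A minimal sketch of abstracting a logged trajectory into a behavior
# sequence, in the spirit of log2BT. Behaviors and thresholds are
# illustrative assumptions.
def to_behaviors(trajectory, accel_brake=-2.0, lat_cut=0.5):
    """trajectory: list of (longitudinal_accel, lateral_velocity) samples."""
    seq = []
    for accel, lat_v in trajectory:
        if accel <= accel_brake:
            b = "hard_brake"
        elif abs(lat_v) >= lat_cut:
            b = "lane_change"
        else:
            b = "follow"
        if not seq or seq[-1] != b:   # collapse repeats into one behavior node
            seq.append(b)
    return seq

print(to_behaviors([(0.1, 0.0), (0.0, 0.8), (0.1, 0.9), (-3.0, 0.0)]))
# ['follow', 'lane_change', 'hard_brake']
```

Once trajectories are reduced to such sequences, a fuzzer can mutate behavior properties (timing, intensity, duration) drawn from real-world distributions rather than mutating raw trajectories point by point.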

The Multi-Object Navigation (MultiON) task requires a robot to localize an instance of each of multiple object classes. It is a fundamental task for an assistive robot in a home or a factory. Existing methods for MultiON have viewed it as a direct extension of Object Navigation (ON), the task of localizing an instance of one object class, and are pre-sequenced, i.e., the sequence in which the object classes are to be explored is provided in advance. This is a strong limitation in practical applications characterized by dynamic changes. This paper describes a deep reinforcement learning framework for sequence-agnostic MultiON based on an actor-critic architecture and a suitable reward specification. Our framework leverages past experiences and seeks to reward progress toward individual as well as multiple target object classes. We use photo-realistic scenes from the Gibson benchmark dataset in the AI Habitat 3D simulation environment to experimentally show that our method performs better than a pre-sequenced approach and a state-of-the-art ON method extended to MultiON.
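A sequence-agnostic reward of the kind described could look like the following sketch, which rewards progress toward the nearest remaining target and grants a bonus when any target is found. All constants and the shaping scheme are illustrative assumptions, not the paper's reward specification.

```python
# A minimal sketch of a sequence-agnostic MultiON reward: dense
# progress shaping toward the nearest remaining target plus a bonus on
# finding any one, in any order. Constants are illustrative.
import math

def multion_reward(agent_xy, prev_xy, remaining_targets,
                   found_bonus=2.5, step_penalty=0.01, found_radius=1.0):
    dist = lambda p, q: math.dist(p, q)
    prev_d = min(dist(prev_xy, t) for t in remaining_targets)
    curr_d = min(dist(agent_xy, t) for t in remaining_targets)
    reward = (prev_d - curr_d) - step_penalty      # dense progress shaping
    found = [t for t in remaining_targets if dist(agent_xy, t) <= found_radius]
    for t in found:
        remaining_targets.remove(t)                # any discovery order counts
        reward += found_bonus
    return reward, remaining_targets
```

Because the shaping term always points at the nearest remaining target, the agent is free to discover the objects in whatever order the environment makes convenient, which is precisely what pre-sequenced baselines cannot do.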

Recent accidents involving self-driving cars call for extensive testing efforts to improve the safety and robustness of autonomous driving. However, constructing test scenarios for autonomous driving is tedious and time-consuming. In this work, we develop an end-to-end test generation framework called TARGET, which automatically constructs test scenarios from human-written traffic rules in an autonomous driving simulator. To handle the ambiguity and sophistication of natural language, TARGET uses GPT-3 to extract key information related to the test scenario from a traffic rule and represents the extracted information in a test scenario schema. Then, TARGET synthesizes the corresponding scenario scripts to construct the test scenario based on the scenario representation. We have evaluated TARGET on four autonomous driving systems, 18 traffic rules, and 8 road maps. TARGET successfully generates 75 test scenarios and detects 247 traffic rule violations. Based on the violation logs (e.g., waypoints of ego vehicles), we are able to identify three underlying issues in these autonomous driving systems, which have been confirmed either by the developers or by existing bug reports.
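A test scenario schema of the kind TARGET populates might resemble the sketch below; the field names and example values are hypothetical, not the tool's actual schema.

```python
# A minimal, hypothetical sketch of a test-scenario schema that a
# language model could populate from a traffic rule. Field names and
# values are illustrative assumptions, not TARGET's actual schema.
from dataclasses import dataclass, field

@dataclass
class ScenarioSchema:
    rule_id: str                       # traffic-rule identifier
    road_type: str                     # "intersection", "highway", ...
    ego_behavior: str                  # expected ego action, e.g. "stop"
    actors: list = field(default_factory=list)   # other road users
    oracle: str = ""                   # violation condition to check

schema = ScenarioSchema(
    rule_id="stop_sign_01",
    road_type="intersection",
    ego_behavior="stop_before_line",
    actors=[{"type": "pedestrian", "motion": "crossing"}],
    oracle="ego speed reaches 0 within the stop zone",
)
```

A downstream synthesizer can then translate such a structured record into simulator-specific scenario scripts, with the oracle field defining what counts as a rule violation during execution.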

Intelligent vehicles (IVs) have gained worldwide attention due to their increased convenience, safety advantages, and potential commercial value. Despite predictions of commercial deployment by 2025, implementation remains limited to small-scale validation, with precise tracking controllers and motion planners being essential prerequisites for IVs. This paper reviews state-of-the-art motion planning methods for IVs, covering both pipeline planning and end-to-end planning methods. The study examines the selection, expansion, and optimization operations in pipeline methods, and investigates training approaches and validation scenarios for driving tasks in end-to-end methods. Experimental platforms are reviewed to assist readers in choosing suitable training and validation strategies. A side-by-side comparison of the methods is provided to highlight their strengths and limitations, aiding system-level design choices. Current challenges and future perspectives are also discussed.

The advent of artificial intelligence technology has paved the way for extensive research in the air combat sector, and autonomous maneuver decision-making for UAVs has emerged as a prominent research direction. Many approaches have been explored for driving an agent to its target optimally, including Genetic Algorithms (GA), A*, RRT, and other optimization techniques, but decisions based on Reinforcement Learning (RL) have proven the most effective. In the DARPA AlphaDogfight Trials, a reinforcement learning agent developed by Heron Systems prevailed against a veteran human F-16 pilot trained by Boeing. After this accomplishment, reinforcement learning attracted tremendous attention. In this research, we aim to move a UAV with Dubins vehicle dynamics to a target in two-dimensional space along an optimal path, using Twin Delayed Deep Deterministic Policy Gradient (TD3) combined with Hindsight Experience Replay (HER). We tested the approach in simulation on two different environments.
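For concreteness, the sketch below shows Dubins-vehicle kinematics (constant forward speed, turn-rate control) together with a HER-style goal relabelling step. Function names, constants, and the transition format are illustrative assumptions, not the study's implementation.

```python
# A minimal sketch of Dubins-vehicle kinematics and HER-style goal
# relabelling. Constants, names, and the transition format are
# illustrative assumptions.
import math

def dubins_step(x, y, theta, turn_rate, v=1.0, dt=0.1):
    """Constant forward speed; the action only controls the turn rate."""
    x += v * math.cos(theta) * dt
    y += v * math.sin(theta) * dt
    theta += turn_rate * dt
    return x, y, theta

def her_relabel(episode, goal_radius=0.5):
    """Relabel a failed episode: pretend the final state was the goal,
    so sparse-reward transitions still provide a learning signal."""
    achieved = episode[-1]["next_state"][:2]       # final (x, y)
    relabelled = []
    for tr in episode:
        t = dict(tr, goal=achieved)
        reached = math.dist(t["next_state"][:2], achieved) < goal_radius
        t["reward"] = 0.0 if reached else -1.0
        relabelled.append(t)
    return relabelled
```

HER is a natural fit here: a Dubins vehicle that misses its target still reaches some point, and treating that point as the goal turns every episode into useful training data for TD3.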

The rapid changes in the finance industry due to the increasing amount of data have revolutionized the techniques for data processing and data analysis and brought new theoretical and computational challenges. In contrast to classical stochastic control theory and other analytical approaches for solving financial decision-making problems, which heavily rely on model assumptions, new developments from reinforcement learning (RL) are able to make full use of the large amount of financial data with fewer model assumptions and to improve decisions in complex financial environments. This survey paper aims to review the recent developments and use of RL approaches in finance. We give an introduction to Markov decision processes, which is the setting for many of the commonly used RL approaches. Various algorithms are then introduced with a focus on value- and policy-based methods that do not require any model assumptions. Connections are made with neural networks to extend the framework to encompass deep RL algorithms. Our survey concludes by discussing the application of these RL algorithms in a variety of decision-making problems in finance, including optimal execution, portfolio optimization, option pricing and hedging, market making, smart order routing, and robo-advising.
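As a pointer to the MDP setting the survey introduces, the standard Bellman optimality equation for the action-value function (a textbook formulation, not specific to this survey) is:

```latex
% Bellman optimality equation for the optimal action-value function Q^*:
Q^*(s, a) = \mathbb{E}\!\left[ r(s, a) + \gamma \max_{a'} Q^*(S', a') \,\middle|\, S = s,\ A = a \right]
```

Value-based methods estimate $Q^*$ directly from sampled transitions, while policy-based methods optimize a parameterized policy against this objective; both can be done model-free, which is what makes them attractive for data-rich financial environments.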

Meta reinforcement learning (meta-RL) extracts knowledge from previous tasks and achieves fast adaptation to new tasks. Despite recent progress, efficient exploration in meta-RL remains a key challenge in sparse-reward tasks, as it requires quickly finding informative task-relevant experiences in both meta-training and adaptation. To address this challenge, we explicitly model an exploration policy learning problem for meta-RL, which is separated from exploitation policy learning, and introduce a novel empowerment-driven exploration objective, which aims to maximize information gain for task identification. We derive a corresponding intrinsic reward and develop a new off-policy meta-RL framework, which efficiently learns separate context-aware exploration and exploitation policies by sharing the knowledge of task inference. Experimental evaluation shows that our meta-RL method significantly outperforms state-of-the-art baselines on various sparse-reward MuJoCo locomotion tasks and more complex sparse-reward Meta-World tasks.
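A generic form of an information-gain intrinsic reward for task identification can be written as the reduction in uncertainty about a latent task variable $z$ after observing a new transition; the paper's exact objective may differ.

```latex
% Generic information-gain intrinsic reward over a latent task
% variable z: reward transitions that sharpen the task posterior.
r^{\text{int}}_t = D_{\mathrm{KL}}\!\left( q\big(z \mid \tau_{1:t}\big) \,\big\|\, q\big(z \mid \tau_{1:t-1}\big) \right)
```

Here $\tau_{1:t}$ denotes the transitions collected up to step $t$. Maximizing such a reward drives the exploration policy toward experiences that are informative about which task the agent is facing, even when the extrinsic reward is sparse.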
