
Safe UAV navigation is challenging due to complex environment structures, dynamic obstacles, and uncertainty from measurement noise and unpredictable moving-obstacle behavior. Although many recent works achieve safe navigation in complex static environments with sophisticated mapping algorithms, such as occupancy maps and ESDF maps, these methods cannot reliably handle dynamic environments because moving obstacles corrupt the map. To address this limitation, this paper proposes a trajectory planning framework for safe navigation in complex static environments with dynamic obstacles. To handle dynamic obstacles reliably, we separate the environment representation into a static map and a dynamic object representation, which can be obtained with computer vision methods. Our framework first generates a static trajectory based on the proposed iterative corridor shrinking algorithm. Then, reactive chance-constrained model predictive control with temporal goal tracking is applied to avoid dynamic obstacles under uncertainty. Simulation results in various environments demonstrate the ability of our algorithm to navigate safely in complex static environments with dynamic obstacles.
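A chance constraint of this kind is often handled by tightening each obstacle's avoidance radius according to its position uncertainty. Below is a minimal sketch of that idea, assuming Gaussian position error projected onto the robot-obstacle axis; the function and parameters are illustrative, not the paper's exact formulation.

```python
# Sketch of a linearized chance constraint for uncertain dynamic obstacles.
# Assumption: the obstacle's predicted position error projected onto the
# robot-obstacle axis is Gaussian with standard deviation `sigma`.
from statistics import NormalDist

def inflated_safety_radius(r_safe: float, sigma: float, delta: float) -> float:
    """Smallest distance r to the obstacle mean such that
    P(true distance < r_safe) <= delta when the error is N(0, sigma^2):
    r = r_safe + sigma * Phi^{-1}(1 - delta)."""
    return r_safe + sigma * NormalDist().inv_cdf(1.0 - delta)

# Each MPC step would then enforce, for every obstacle o and horizon step k:
#   || p_k - mean(o, k) || >= inflated_safety_radius(r_safe, sigma(o, k), delta)
```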

Related Content

Solving robotic navigation tasks via reinforcement learning (RL) is challenging due to their sparse rewards and long decision horizons. However, in many navigation tasks, high-level (HL) task representations, such as a rough floor plan, are available. Previous work has demonstrated efficient learning with hierarchical approaches that plan a path in the HL representation and use sub-goals derived from the plan to guide the RL policy in the source task. However, these approaches usually neglect the robot's complex dynamics and sub-optimal sub-goal-reaching capabilities during planning. This work overcomes these limitations by proposing a novel hierarchical framework that utilizes a trainable planning policy for the HL representation, so that robot capabilities and environment conditions can be learned from collected rollout data. We specifically introduce a planning policy based on value iteration with a learned transition model (VI-RL). In simulated robotic navigation tasks, VI-RL yields consistent, strong improvement over vanilla RL; it is on par with vanilla hierarchical RL on single layouts but more broadly applicable to multiple layouts, and it is on par with trainable HL path planning baselines except for a parking task with difficult non-holonomic dynamics, where it shows marked improvements.
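For intuition about planning by value iteration over an HL representation, the sketch below runs tabular value iteration on a coarse grid floor plan. VI-RL's learned transition model is replaced here by deterministic 4-connected moves, so this is a simplified stand-in rather than the paper's algorithm.

```python
# Tabular value iteration over a coarse grid floor plan (illustrative stand-in
# for the HL planner; VI-RL instead learns the transition model from rollouts).
import numpy as np

def value_iteration(free: np.ndarray, goal: tuple, gamma: float = 0.95,
                    iters: int = 200) -> np.ndarray:
    """free: 2-D bool array (True = traversable); goal: (row, col)."""
    V = np.zeros(free.shape)
    V[goal] = 1.0
    for _ in range(iters):
        P = np.pad(V, 1)  # zero-padded copy so border cells have no neighbors
        best = np.maximum.reduce(
            [P[2:, 1:-1], P[:-2, 1:-1], P[1:-1, 2:], P[1:-1, :-2]])
        V = np.where(free, gamma * best, 0.0)  # obstacles keep zero value
        V[goal] = 1.0
    return V

# A sub-goal for the low-level RL policy is the neighbor of the robot's
# current HL cell with the highest value V.
```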

This paper presents an integrated system for performing precision harvesting missions with a legged harvester. Our harvester performs the challenging task of autonomous navigation and tree grabbing in a confined, GPS-denied forest environment. Strategies for mapping, localization, planning, and control are proposed and integrated into a fully autonomous system. The mission starts with a human mapping the area of interest using a custom-made sensor module, after which a human expert selects the trees for harvesting. The sensor module is then mounted on the machine and used for localization within the given map. A planning algorithm searches for both an approach pose and a path in a single path planning problem. We design a path-following controller that leverages the legged harvester's capabilities for negotiating rough terrain. Upon reaching the approach pose, the machine grabs a tree with a general-purpose gripper. This process repeats for all the trees selected by the operator. Our system has been tested on a field with tree trunks and in a natural forest. To the best of our knowledge, this is the first time this level of autonomy has been demonstrated on a full-size hydraulic machine operating in a realistic environment.

In this paper we focus on the problem of learning online an optimal policy for Active Visual Search (AVS) of objects in unknown indoor environments. We propose POMP++, a planning strategy that introduces a novel formulation on top of the classic Partially Observable Monte Carlo Planning (POMCP) framework to allow training-free online policy learning in unknown environments. We present a new belief reinvigoration strategy that allows POMCP to be used with a dynamically growing state space to address the online generation of the floor map. We evaluate our method on two public benchmark datasets, AVD, acquired by real robotic platforms, and Habitat ObjectNav, rendered from real 3D scene scans, achieving the best success rate with an improvement of more than 10% over state-of-the-art methods.
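Belief reinvigoration can be pictured as refreshing the particle set whenever new floor-map area is revealed: retain resampled survivors and inject fresh hypotheses consistent with the newly observed region. The sketch below is a generic illustration of this pattern; `frontier_states` and the mixing ratio are hypothetical, not POMP++'s exact procedure.

```python
# Minimal particle-reinvigoration sketch: keep resampled survivors and
# inject fresh hypotheses drawn from the newly revealed part of the map.
import random

def reinvigorate(particles, frontier_states, n_total, frac_new=0.2):
    """particles: current belief particles; frontier_states: states made
    possible by the newly observed map region (hypothetical input)."""
    n_new = int(frac_new * n_total)
    kept = random.choices(particles, k=n_total - n_new)   # resample survivors
    fresh = random.choices(frontier_states, k=n_new)      # new-map hypotheses
    return kept + fresh
```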

Unmanned aerial vehicles (UAVs), especially fixed-wing ones that withstand strong winds, have great potential for oceanic exploration and research. This paper studies a UAV-aided maritime data collection system in which a fixed-wing UAV is dispatched to collect data from marine buoys. We aim to minimize the UAV's energy consumption in completing the task by jointly optimizing the communication time scheduling among the buoys and the UAV's flight trajectory subject to wind effects, which is a non-convex problem that is difficult to solve optimally. Existing techniques such as the successive convex approximation (SCA) method provide efficient sub-optimal solutions for small to moderate data volumes, but the solution depends heavily on the trajectory initialization and does not explicitly account for the wind, and both the computational complexity and the resulting trajectory complexity become prohibitive for large data volumes. To this end, we propose a new cyclical trajectory design framework that can handle arbitrary data volumes efficiently under wind effects. Specifically, the proposed UAV trajectory comprises multiple cyclical laps, each responsible for collecting only a subset of the data, which significantly reduces the computational and trajectory complexity and allows searching for a better trajectory initialization that fits the buoys' topology and the wind. Numerical results show that the proposed cyclical scheme outperforms the benchmark one-flight-only scheme in general. Moreover, the optimized cyclical 8-shape trajectory can proactively exploit the wind and achieves lower energy consumption than the case without wind.
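For intuition about how wind enters the energy objective, the sketch below computes the energy of one straight segment using the common fixed-wing propulsion power model P(v_air) = c1·v_air³ + c2/v_air, with the airspeed obtained by subtracting the wind vector from the ground-velocity vector. The coefficients are widely used textbook example values, not taken from this paper, and this is background intuition rather than the paper's optimizer.

```python
# Energy of one straight trajectory segment under constant wind, using the
# standard fixed-wing power model P(v) = c1*v^3 + c2/v. The coefficients
# below are common illustrative values, not taken from the paper.
import math

def segment_energy(p0, p1, v_ground, wind, c1=9.26e-4, c2=2250.0):
    dx, dy = p1[0] - p0[0], p1[1] - p0[1]
    dist = math.hypot(dx, dy)
    ux, uy = dx / dist, dy / dist                    # unit heading over ground
    ax, ay = v_ground * ux - wind[0], v_ground * uy - wind[1]
    v_air = math.hypot(ax, ay)                       # airspeed magnitude
    t = dist / v_ground                              # segment flight time
    return (c1 * v_air ** 3 + c2 / v_air) * t        # energy in joules
```

A tailwind lowers v_air for the same ground speed, so flying "with the wind" on part of an 8-shape lap can reduce total energy, matching the intuition in the abstract.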

Owing to its effective and flexible data acquisition, the unmanned aerial vehicle (UAV) has recently become a hotspot across the fields of computer vision (CV) and remote sensing (RS). Inspired by the recent success of deep learning (DL), many advanced object detection and tracking approaches have been widely applied to various UAV-related tasks, such as environmental monitoring, precision agriculture, and traffic management. This paper provides a comprehensive survey of the research progress and prospects of DL-based UAV object detection and tracking methods. More specifically, we first outline the challenges and statistics of existing methods, and provide solutions from the perspective of DL-based models in three research topics: object detection from images, object detection from videos, and object tracking from videos. Open datasets related to UAV-dominated object detection and tracking are listed exhaustively, and four benchmark datasets are employed for performance evaluation using several state-of-the-art methods. Finally, prospects and considerations for future work are discussed and summarized. We expect this survey to give researchers from the remote sensing field an overview of DL-based UAV object detection and tracking methods, along with some thoughts on their further development.

Many important real-world problems have action spaces that are high-dimensional, continuous, or both, making full enumeration of all possible actions infeasible. Instead, only small subsets of actions can be sampled for the purpose of policy evaluation and improvement. In this paper, we propose a general framework to reason in a principled way about policy evaluation and improvement over such sampled action subsets. This sample-based policy iteration framework can in principle be applied to any reinforcement learning algorithm based on policy iteration. Concretely, we propose Sampled MuZero, an extension of the MuZero algorithm that is able to learn in domains with arbitrarily complex action spaces by planning over sampled actions. We demonstrate this approach on the classical board game of Go and on two continuous control benchmark domains: DeepMind Control Suite and Real-World RL Suite.
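A toy version of the sample-based improvement step is shown below: draw K actions from the current policy, evaluate each, and form an improved distribution supported only on the sampled subset. The softmax form and the callbacks are illustrative assumptions; Sampled MuZero instead derives the improved distribution from MCTS visit counts.

```python
# One sample-based policy-improvement step over a sampled action subset.
# `sample_action` and `q_value` are assumed callbacks standing in for the
# policy prior and the learned value estimates.
import numpy as np

def sampled_improvement(sample_action, q_value, k=20, tau=1.0):
    actions = [sample_action() for _ in range(k)]     # K samples from the policy
    q = np.array([q_value(a) for a in actions])       # evaluate each sample
    w = np.exp((q - q.max()) / tau)                   # numerically stable softmax
    return actions, w / w.sum()                       # improved policy on the subset
```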

We present Neural A*, a novel data-driven search method for path planning problems. Despite the recent increasing attention to data-driven path planning, a machine learning approach to search-based planning remains challenging due to the discrete nature of search algorithms. In this work, we reformulate the canonical A* search algorithm to be differentiable and couple it with a convolutional encoder to form an end-to-end trainable neural network planner. Neural A* solves a path planning problem by encoding a problem instance into a guidance map and then performing the differentiable A* search with that guidance map. By learning to match the search results with ground-truth paths provided by experts, Neural A* can produce paths consistent with the ground truth accurately and efficiently. Our extensive experiments confirmed that Neural A* outperformed state-of-the-art data-driven planners in terms of the trade-off between search optimality and efficiency, and furthermore successfully predicted realistic human trajectories by directly performing search-based planning on natural image inputs.
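The discrete search that Neural A* relaxes can be sketched as a standard A* in which each step pays the guidance-map cost of the cell it enters; the differentiable relaxation and encoder training are omitted in this sketch.

```python
# Grid A* where step cost comes from a (predicted) guidance map; this is the
# discrete search underlying Neural A*, without the differentiable relaxation.
import heapq, itertools

def guided_astar(guidance, start, goal):
    h = lambda p: abs(p[0] - goal[0]) + abs(p[1] - goal[1])  # Manhattan heuristic
    tick = itertools.count()                  # unique tie-breaker for the heap
    frontier = [(h(start), next(tick), 0.0, start, None)]
    parent, done = {}, set()
    while frontier:
        _, _, g, node, prev = heapq.heappop(frontier)
        if node in done:
            continue
        done.add(node)
        parent[node] = prev
        if node == goal:                      # reconstruct path back to start
            path = [node]
            while parent[path[-1]] is not None:
                path.append(parent[path[-1]])
            return path[::-1]
        r, c = node
        for nr, nc in ((r + 1, c), (r - 1, c), (r, c + 1), (r, c - 1)):
            if 0 <= nr < len(guidance) and 0 <= nc < len(guidance[0]):
                ng = g + guidance[nr][nc]     # per-cell cost from the guidance map
                heapq.heappush(frontier, (ng + h((nr, nc)), next(tick), ng,
                                          (nr, nc), node))
    return None  # goal unreachable
```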

We present FAST NAVIGATOR, a general framework for action decoding, which yields state-of-the-art results on the recent Room-to-Room (R2R) Vision-and-Language navigation challenge of Anderson et al. (2018). Given a natural language instruction and photo-realistic image views of a previously unseen environment, the agent must navigate from a source to a target location as quickly as possible. While all current approaches make local action decisions or score entire trajectories with beam search, our framework seamlessly balances local and global signals when exploring the environment. Importantly, this allows us to act greedily but use global signals to backtrack when necessary. Our FAST framework, applied to existing models, yielded a 17% relative gain over the previous state of the art: an absolute 6% gain in success rate weighted by path length (SPL).
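This style of decoding can be pictured as a best-first search over partial trajectories: always continue from the globally best-scoring partial trajectory, which implicitly backtracks when an earlier node starts scoring better. The sketch below uses assumed `expand`, `global_score`, and `is_goal` callbacks rather than the paper's models.

```python
# Best-first decoding over partial trajectories (FAST-style backtracking).
# `expand`, `global_score`, and `is_goal` are assumed callbacks standing in
# for the navigation model; this is not the paper's exact scorer.
import heapq, itertools

def fast_decode(expand, global_score, is_goal, start, max_pops=100):
    tick = itertools.count()                   # tie-breaker for equal scores
    frontier = [(-global_score([start]), next(tick), [start])]
    for _ in range(max_pops):
        if not frontier:
            break
        _, _, traj = heapq.heappop(frontier)   # globally best partial trajectory
        if is_goal(traj[-1]):
            return traj
        for nxt in expand(traj[-1]):           # greedy local expansion
            cand = traj + [nxt]
            heapq.heappush(frontier, (-global_score(cand), next(tick), cand))
    return None
```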

Although deep reinforcement learning (deep RL) methods have many strengths favorable for autonomous driving, real deep RL applications in autonomous driving have been slowed by the modeling gap between the source (training) domain and the target (deployment) domain. Unlike current policy transfer approaches, which are generally limited to transferring uninterpretable neural network representations as features, we propose to transfer concrete kinematic quantities in autonomous driving. The proposed robust-control-based (RC) generic transfer architecture, which we call RL-RC, incorporates a transferable hierarchical RL trajectory planner and a robust tracking controller based on a disturbance observer (DOB). The deep RL policies trained with a known nominal dynamics model are transferred directly to the target domain, and DOB-based robust tracking control is applied to tackle the modeling gap, including vehicle dynamics errors and external disturbances such as side forces. We provide simulations validating the capability of the proposed method to achieve zero-shot transfer across multiple driving scenarios such as lane keeping, lane changing, and obstacle avoidance.
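A disturbance observer of this kind can be sketched as a low-pass filter on the gap between the measured state derivative and the nominal model's prediction; the compensated control then subtracts the estimate so the plant behaves like the nominal model. The gains, model, and interface below are illustrative assumptions, not the paper's design.

```python
# Minimal discrete-time disturbance-observer (DOB) sketch. The nominal model
# f(x, u) and the filter gain are illustrative placeholders.
class DisturbanceObserver:
    def __init__(self, nominal_model, alpha=0.1):
        self.f = nominal_model   # predicts x_dot under the nominal dynamics
        self.alpha = alpha       # first-order low-pass filter coefficient
        self.d_hat = 0.0         # current lumped-disturbance estimate

    def update(self, x_dot_measured, x, u):
        residual = x_dot_measured - self.f(x, u)             # unexplained dynamics
        self.d_hat += self.alpha * (residual - self.d_hat)   # low-pass filtering
        return self.d_hat

# Tracking control then applies u = u_nominal - d_hat, so the closed loop
# approximates the nominal model the RL policy was trained on.
```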

To solve complex real-world problems with reinforcement learning, we cannot rely on manually specified reward functions. Instead, we can have humans communicate an objective to the agent directly. In this work, we combine two approaches to learning from human feedback: expert demonstrations and trajectory preferences. We train a deep neural network to model the reward function and use its predicted reward to train a DQN-based deep reinforcement learning agent on 9 Atari games. Our approach beats the imitation learning baseline in 7 games and achieves strictly superhuman performance on 2 games without using game rewards. Additionally, we investigate the goodness of fit of the reward model, present some reward hacking problems, and study the effects of noise in the human labels.
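Reward models in this line of work are typically fit with a Bradley-Terry preference loss over pairs of trajectory segments. The sketch below shows that standard loss in PyTorch, assuming `reward_net` is a module mapping stacked observations to per-step rewards; the architecture and hyperparameters are not from the paper.

```python
# Standard Bradley-Terry preference loss for reward-model training.
# `reward_net` is an assumed torch module mapping a stacked segment of
# observations to per-step rewards.
import torch
import torch.nn.functional as F

def preference_loss(reward_net, seg_a, seg_b, pref_a):
    """pref_a = 1.0 if the human preferred segment A, 0.0 if segment B."""
    r_a = reward_net(seg_a).sum()              # total predicted reward of A
    r_b = reward_net(seg_b).sum()              # total predicted reward of B
    logit = r_a - r_b                          # log-odds that A is preferred
    target = torch.tensor(pref_a)
    return F.binary_cross_entropy_with_logits(logit, target)
```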
