The tasks that an autonomous agent is expected to perform are often optional or are incompatible with each other owing to the agent's limited actuation capabilities, specifically the dynamics and control input bounds. We encode tasks as time-dependent state constraints and leverage the advances in multi-objective optimization to formulate the problem of choosing tasks as selection of a feasible subset of constraints that can be satisfied for all time and maximizes a performance metric. We show that this problem, although amenable to reachability or mixed integer model predictive control-based analysis in the offline phase, is NP-Hard in general and therefore requires heuristics to be solved efficiently. When incompatibility in constraints is observed under a given policy that imposes task constraints at each time step in an optimization problem, we assign a Lagrange score to each of these constraints based on the variation in the corresponding Lagrange multipliers over the compatible time horizon. These scores are then used to decide the order in which constraints are dropped in a greedy strategy. We further employ a genetic algorithm to improve upon the greedy strategy. We evaluate our method on a robot waypoint following task when the low-level controllers that impose state constraints are described by Control Barrier Function-based Quadratic Programs and provide a comparison with waypoint selection based on knowledge of backward reachable sets.
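The greedy step described in this abstract can be sketched as follows. This is a minimal illustration, not the authors' implementation: the score definition (spread of each multiplier over the horizon), the names `lagrange_scores` and `greedy_drop`, the waypoint labels, and the toy feasibility oracle are all assumptions made for the example. In practice the oracle would be the check of whether the constrained optimization problem stays feasible over the horizon.

```python
from typing import Callable, Dict, FrozenSet, List, Set, Tuple


def lagrange_scores(multipliers: Dict[str, List[float]]) -> Dict[str, float]:
    """Score each constraint by the spread (max - min) of its multiplier.

    ASSUMPTION: the abstract only says scores are based on the variation of
    the multipliers; the spread is one simple stand-in for that variation.
    """
    return {name: max(vals) - min(vals) for name, vals in multipliers.items()}


def greedy_drop(
    scores: Dict[str, float],
    is_feasible: Callable[[FrozenSet[str]], bool],
) -> Tuple[Set[str], List[str]]:
    """Drop constraints in descending score order until the rest is feasible."""
    active = set(scores)
    dropped: List[str] = []
    for name in sorted(scores, key=scores.get, reverse=True):
        if is_feasible(frozenset(active)):
            break
        active.remove(name)
        dropped.append(name)
    return active, dropped


# Hypothetical multiplier histories for three waypoint constraints.
multipliers = {
    "wp1": [0.0, 0.1, 0.1],
    "wp2": [0.0, 2.0, 5.0],
    "wp3": [0.2, 0.3, 0.2],
}


def toy_oracle(subset: FrozenSet[str]) -> bool:
    # Toy rule: wp2 and wp3 cannot both be satisfied.
    return not {"wp2", "wp3"} <= subset


kept, dropped = greedy_drop(lagrange_scores(multipliers), toy_oracle)
```

Here the constraint with the largest multiplier spread is dropped first, after which the remaining pair is feasible under the toy rule. A greedy order like this gives no optimality guarantee, which is consistent with the abstract's use of a genetic algorithm to improve on it.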

Related content

Constraints

RGB-D · Robots · Episode · LIDAR · 3D

July 5, 2023

Low computational-cost detection and tracking of dynamic obstacles for mobile robots with RGB-D cameras

Zhefan Xu,Xiaoyang Zhan,Yumeng Xiu,Christopher Suzuki,Kenji Shimada

from arxiv, 8 pages, 12 figures, 2 tables

Deploying autonomous robots in crowded indoor environments usually requires them to have accurate dynamic obstacle perception. Although plenty of previous works in the autonomous driving field have investigated the 3D object detection problem, the usage of dense point clouds from a heavy LiDAR and their high computation cost for learning-based data processing make those methods not applicable to small robots, such as vision-based UAVs with small onboard computers. To address this issue, we propose a lightweight 3D dynamic obstacle detection and tracking (DODT) method based on an RGB-D camera, which is designed for low-power robots with limited computing power. Our method adopts a novel ensemble detection strategy, combining multiple computationally efficient but low-accuracy detectors to achieve real-time high-accuracy obstacle detection. Besides, we introduce a new feature-based data association method to prevent mismatches and use the Kalman filter with the constant acceleration model to track detected obstacles. In addition, our system includes an optional and auxiliary learning-based module to enhance the obstacle detection range and dynamic obstacle identification. The users can determine whether or not to run this module based on the available computation resources. The proposed method is implemented in a small quadcopter, and the experiments prove that the algorithm can make the robot detect dynamic obstacles and navigate dynamic environments safely.
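For readers unfamiliar with the tracking component mentioned above, here is a minimal sketch of a Kalman filter with a constant-acceleration motion model. It is a one-dimensional illustration in plain Python, not the paper's tracker: the function names, the noise values `q` and `r`, the choice to measure position only, and the simplified process noise are all assumptions.

```python
# 1D constant-acceleration Kalman filter (illustrative sketch).
# State x = [position, velocity, acceleration]; only position is measured.

def predict(x, P, dt, q):
    """Propagate state and covariance through the constant-acceleration model."""
    F = [[1.0, dt, 0.5 * dt * dt],
         [0.0, 1.0, dt],
         [0.0, 0.0, 1.0]]
    x_new = [sum(F[i][k] * x[k] for k in range(3)) for i in range(3)]
    FP = [[sum(F[i][k] * P[k][j] for k in range(3)) for j in range(3)]
          for i in range(3)]
    # P_new = F P F^T
    P_new = [[sum(FP[i][k] * F[j][k] for k in range(3)) for j in range(3)]
             for i in range(3)]
    # ASSUMPTION: simplified process noise, added to the acceleration term only.
    P_new[2][2] += q
    return x_new, P_new


def update(x, P, z, r):
    """Correct the prediction with a position measurement z of variance r."""
    s = P[0][0] + r                      # innovation variance (H = [1, 0, 0])
    K = [P[i][0] / s for i in range(3)]  # Kalman gain
    innovation = z - x[0]
    x_new = [x[i] + K[i] * innovation for i in range(3)]
    P_new = [[P[i][j] - K[i] * P[0][j] for j in range(3)] for i in range(3)]
    return x_new, P_new


# Track noise-free positions of an obstacle accelerating at 2 m/s^2 from rest.
dt, true_acc = 0.1, 2.0
x_est = [0.0, 0.0, 0.0]
P_est = [[100.0 if i == j else 0.0 for j in range(3)] for i in range(3)]
for k in range(1, 51):
    t = k * dt
    z = 0.5 * true_acc * t * t
    x_est, P_est = predict(x_est, P_est, dt, q=0.0)
    x_est, P_est = update(x_est, P_est, z, r=0.01)
```

After 50 steps the estimate should be close to the true position, velocity, and acceleration at t = 5 s. A real tracker would run one such filter per axis (or a joint multi-axis state) and feed it the detections matched by the data association step.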

Shaping · Operation · Agent · INTERACT · state-of-the-art

July 4, 2023

Hierarchical Planning and Policy Shaping Shared Autonomy for Articulated Robots

Ehsan Yousefi,Mo Chen,Inna Sharf

In this work, we propose a novel shared autonomy framework to operate articulated robots. We provide strategies to design both the task-oriented hierarchical planning and policy shaping algorithms for efficient human-robot interactions in context-aware operation of articulated robots. Our framework for interplay between the human and the autonomy, as the participating agents in the system, is particularly influenced by the ideas from multi-agent systems, game theory, and theory of mind for a sliding level of autonomy. We formulate the sequential hierarchical human-in-the-loop decision making process by extending MDPs and Options framework to shared autonomy, and make use of deep RL techniques to train an uncertainty-aware shared autonomy policy. To fine-tune the formulation to a human, we use history of the system states, human actions, and their error with respect to a surrogate optimal model to encode human's internal state embeddings, beyond the designed values, by using conditional VAEs. We showcase the effectiveness of our formulation for different human skill levels and degrees of cooperativeness by using a case study of a feller-buncher machine in the challenging tasks of timber harvesting. Our framework is successful in providing a sliding level of autonomy from fully autonomous to fully manual, and is particularly successful in handling a noisy non-cooperative human agent in the loop. The proposed framework advances the state-of-the-art in shared autonomy for operating articulated robots, but can also be applied to other domains where autonomous operation is the ultimate goal.

Learning · Optimizer · Range · Biased · Reinforcement learning

July 2, 2023

Learning to Optimize for Reinforcement Learning

Qingfeng Lan,A. Rupam Mahmood,Shuicheng Yan,Zhongwen Xu

from arxiv, For code release, see //github.com/sail-sg/optim4rl

In recent years, by leveraging more data, computation, and diverse tasks, learned optimizers have achieved remarkable success in supervised learning, outperforming classical hand-designed optimizers. Reinforcement learning (RL) is essentially different from supervised learning, and in practice these learned optimizers do not work well even in simple RL tasks. We investigate this phenomenon and identify three issues. First, the gradients of an RL agent vary across a wide range in logarithms while their absolute values are in a small range, making it hard for neural networks to obtain accurate parameter updates. Second, the agent-gradient distribution is not independent and identically distributed, leading to inefficient meta-training. Finally, due to highly stochastic agent-environment interactions, the agent-gradients have high bias and variance, which increases the difficulty of learning an optimizer for RL. We propose gradient processing, pipeline training, and a novel optimizer structure with good inductive bias to address these issues. By applying these techniques, for the first time, we show that learning an optimizer for RL from scratch is possible. Although only trained in toy tasks, our learned optimizer can generalize to unseen complex tasks in Brax.

Sampling method · Adaptive sampling · Sample · Optimizer · Sample space

July 2, 2023

Adaptive Sampling Control in Motion Planning of Autonomous Vehicle

Yucheng LI

from arxiv, Master's thesis

Autonomous vehicles aim to free the hands of vehicle operators, helping them drive more easily and quickly while improving driving safety on the highway and in complex scenarios. Automated driving systems (ADS) have been developed over the last several decades to realize fully autonomous vehicles (L4 or L5). The scale of the sampling space is the main source of computational complexity. Therefore, by adjusting the sampling method, the difficulty of solving the real-time motion planning problem can be incrementally reduced. Usually, the Average Sampling Method is used in the Lattice Planner, and the Random Sampling Method is chosen for RRT algorithms. However, neither takes prior information into consideration or focuses the sampling space on areas where the optimal trajectory was previously obtained. Therefore, this thesis proposes an adaptive sampling method that reduces the computational complexity and reaches solutions faster while keeping the quality of the optimal solution unchanged. The main contribution of this thesis is a significant decrease in the complexity of the optimization problem for motion planning, without sacrificing the quality of the final trajectory output, through an Adaptive Sampling method based on Artificial Potential Field (ASAPF). In addition, the quality and stability of the trajectory are improved by sampling the appropriate region to be analyzed.

Controller · Optimizer · Coverage · Robustness · Agent

June 30, 2023

Unscented Optimal Control for 3D Coverage Planning with an Autonomous UAV Agent

Savvas Papaioannou,Panayiotis Kolios,Theocharis Theocharides,Christos G. Panayiotou,Marios M. Polycarpou

from arxiv, 2023 International Conference on Unmanned Aircraft Systems (ICUAS)

We propose a novel probabilistically robust controller for the guidance of an unmanned aerial vehicle (UAV) in coverage planning missions, which can simultaneously optimize both the UAV's motion and camera control inputs for the 3D coverage of a given object of interest. Specifically, the coverage planning problem is formulated in this work as an optimal control problem with logical constraints to enable the UAV agent to jointly: a) select a series of discrete camera field-of-view states which satisfy a set of coverage constraints, and b) optimize its motion control inputs according to a specified mission objective. We show how this hybrid optimal control problem can be solved with standard optimization tools by converting the logical expressions in the constraints into equality/inequality constraints involving only continuous variables. Finally, probabilistic robustness is achieved by integrating the unscented transformation into the proposed controller, thus enabling the design of robust open-loop coverage plans which take into account the future posterior distribution of the UAV's state inside the planning horizon.
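As background for the last sentence of this abstract, the unscented transformation propagates a mean and covariance through a function by evaluating it at a few deterministically chosen sigma points. The sketch below covers only the scalar case with the basic weighting scheme; the function name `unscented_transform_1d` and the value `kappa=2.0` are choices made for illustration, and nothing here reproduces the paper's controller.

```python
import math


def unscented_transform_1d(mean, var, f, kappa=2.0):
    """Propagate a scalar mean/variance through f using three sigma points."""
    n = 1  # state dimension
    spread = math.sqrt((n + kappa) * var)
    points = [mean, mean + spread, mean - spread]
    w0 = kappa / (n + kappa)          # weight of the central point
    wi = 1.0 / (2.0 * (n + kappa))    # weight of each outer point
    weights = [w0, wi, wi]
    ys = [f(p) for p in points]
    y_mean = sum(w * y for w, y in zip(weights, ys))
    y_var = sum(w * (y - y_mean) ** 2 for w, y in zip(weights, ys))
    return y_mean, y_var


# For a linear map the transform is exact: y = 2x + 1 with x ~ (1, 4)
# has mean 2*1 + 1 and variance 2^2 * 4.
y_mean, y_var = unscented_transform_1d(1.0, 4.0, lambda x: 2.0 * x + 1.0)
```

For nonlinear dynamics the result is an approximation, and the multivariate version uses 2n + 1 sigma points built from a matrix square root of the covariance.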

Constraints · Identifiable · Tolerance · INFORMS · Recall

June 30, 2023

Locking On: Leveraging Dynamic Vehicle-Imposed Motion Constraints to Improve Visual Localization

Stephen Hausler,Sourav Garg,Punarjay Chakravarty,Shubham Shrivastava,Ankit Vora,Michael Milford

from arxiv, Accepted to IROS 2023

Most 6-DoF localization and SLAM systems use static landmarks but ignore dynamic objects because they cannot be usefully incorporated into a typical pipeline. Where dynamic objects have been incorporated, typical approaches have attempted relatively sophisticated identification and localization of these objects, limiting their robustness or general utility. In this research, we propose a middle ground, demonstrated in the context of autonomous vehicles, using dynamic vehicles to provide limited pose constraint information in a 6-DoF frame-by-frame PnP-RANSAC localization pipeline. We refine initial pose estimates with a motion model and propose a method for calculating the predicted quality of future pose estimates, triggered based on whether or not the autonomous vehicle's motion is constrained by the relative frame-to-frame location of dynamic vehicles in the environment. Our approach detects and identifies suitable dynamic vehicles to define these pose constraints to modify a pose filter, resulting in improved recall across a range of localization tolerances from 0.25 m to 5 m, compared to a state-of-the-art baseline single image PnP method and its vanilla pose filtering. Our constraint detection system is active for approximately 35% of the time on the Ford AV dataset and localization is particularly improved when the constraint detection is active.

Robustness · Use Case · State estimation · Performance · Range

January 5, 2023

Autonomous Drone Racing: A Survey

Drew Hanover,Antonio Loquercio,Leonard Bauersfeld,Angel Romero,Robert Penicka,Yunlong Song,Giovanni Cioffi,Elia Kaufmann,Davide Scaramuzza

from arxiv, 20 pages, submitted to T-RO January 3rd, 2022

Over the last decade, the use of autonomous drone systems for surveying, search and rescue, or last-mile delivery has increased exponentially. With the rise of these applications comes the need for highly robust, safety-critical algorithms which can operate drones in complex and uncertain environments. Additionally, flying fast enables drones to cover more ground, which in turn increases productivity and further strengthens their use case. One proxy for developing algorithms used in high-speed navigation is the task of autonomous drone racing, where researchers program drones to fly through a sequence of gates and avoid obstacles as quickly as possible using onboard sensors and limited computational power. Speeds and accelerations exceed 80 kph and 4 g respectively, raising significant challenges across perception, planning, control, and state estimation. To achieve maximum performance, systems require real-time algorithms that are robust to motion blur, high dynamic range, model uncertainties, aerodynamic disturbances, and often unpredictable opponents. This survey covers the progression of autonomous drone racing across model-based and learning-based approaches. We provide an overview of the field, its evolution over the years, and conclude with the biggest challenges and open questions to be faced in the future.

Episode · Learned · Reinforcement learning · INTERACT · General intelligence

May 13, 2022

Emergent Bartering Behaviour in Multi-Agent Reinforcement Learning

Michael Bradley Johanson,Edward Hughes,Finbarr Timbers,Joel Z. Leibo

Advances in artificial intelligence often stem from the development of new environments that abstract real-world situations into a form where research can be done conveniently. This paper contributes such an environment based on ideas inspired by elementary Microeconomics. Agents learn to produce resources in a spatially complex world, trade them with one another, and consume those that they prefer. We show that the emergent production, consumption, and pricing behaviors respond to environmental conditions in the directions predicted by supply and demand shifts in Microeconomics. We also demonstrate settings where the agents' emergent prices for goods vary over space, reflecting the local abundance of goods. After the price disparities emerge, some agents then discover a niche of transporting goods between regions with different prevailing prices -- a profitable strategy because they can buy goods where they are cheap and sell them where they are expensive. Finally, in a series of ablation experiments, we investigate how choices in the environmental rewards, bartering actions, agent architecture, and ability to consume tradable goods can either aid or inhibit the emergence of this economic behavior. This work is part of the environment development branch of a research program that aims to build human-like artificial general intelligence through multi-agent interactions in simulated societies. By exploring which environment features are needed for the basic phenomena of elementary microeconomics to emerge automatically from learning, we arrive at an environment that differs from those studied in prior multi-agent reinforcement learning work along several dimensions. For example, the model incorporates heterogeneous tastes and physical abilities, and agents negotiate with one another as a grounded form of communication.

Learned · Reinforcement learning · Performer · Better · state-of-the-art

February 10, 2020

Q-value Path Decomposition for Deep Multiagent Reinforcement Learning

Yaodong Yang,Jianye Hao,Guangyong Chen,Hongyao Tang,Yingfeng Chen,Yujing Hu,Changjie Fan,Zhongyu Wei

Recently, deep multiagent reinforcement learning (MARL) has become a highly active research area as many real-world problems can be inherently viewed as multiagent systems. A particularly interesting and widely applicable class of problems is the partially observable cooperative multiagent setting, in which a team of agents learns to coordinate their behaviors conditioning on their private observations and commonly shared global reward signals. One natural solution is to resort to the centralized training and decentralized execution paradigm. During centralized training, one key challenge is multiagent credit assignment: how to allocate the global rewards to individual agent policies for better coordination towards maximizing system-level benefits. In this paper, we propose a new method called Q-value Path Decomposition (QPD) to decompose the system's global Q-values into individual agents' Q-values. Unlike previous works which restrict the representation relation of the individual Q-values and the global one, we bring the integrated gradient attribution technique into deep MARL to directly decompose global Q-values along trajectory paths to assign credits for agents. We evaluate QPD on the challenging StarCraft II micromanagement tasks and show that QPD achieves state-of-the-art performance in both homogeneous and heterogeneous multiagent scenarios compared with existing cooperative MARL algorithms.

SOFT · Continuity · Better · Performer · state-of-the-art

April 25, 2018

Multiagent Soft Q-Learning

Ermo Wei,Drew Wicke,David Freelan,Sean Luke

from arxiv, Accepted in AAAI 18 Spring Symposium

Policy gradient methods are often applied to reinforcement learning in continuous multiagent games. These methods perform local search in the joint-action space, and as we show, they are susceptible to a game-theoretic pathology known as relative overgeneralization. To resolve this issue, we propose Multiagent Soft Q-learning, which can be seen as the analogue of applying Q-learning to continuous controls. We compare our method to MADDPG, a state-of-the-art approach, and show that our method achieves better coordination in multiagent cooperative tasks, converging to better local optima in the joint action space.