久久久久精品电影_女女啪啪激烈高潮喷出网站免费_国产美腿美女被操_亚洲综合国产精品一区二区99_萌白酱国产一区二区在线_久久久久久精品福利中文字幕_欧美视频在线一区二区三区

Reinforcement learning (RL) and trajectory optimization (TO) present strong complementary advantages. On one hand, RL approaches are able to learn global control policies directly from data, but generally require large sample sizes to properly converge towards feasible policies. On the other hand, TO methods are able to exploit gradient-based information extracted from simulators to quickly converge towards a locally optimal control trajectory which is only valid within the vicinity of the solution. Over the past decade, several approaches have aimed to adequately combine the two classes of methods in order to obtain the best of both worlds. Following on from this line of research, we propose several improvements on top of these approaches to learn global control policies quicker, notably by leveraging sensitivity information stemming from TO methods via Sobolev learning, and augmented Lagrangian techniques to enforce the consensus between TO and policy learning. We evaluate the benefits of these improvements on various classical tasks in robotics through comparison with existing approaches in the literature.

相關內容

Learning

關注 12

搜索 · 空間探測 · 訓練數據 · 航拍圖像 · 空間相關性 ·

2023 年 4 月 5 日

A Visual Active Search Framework for Geospatial Exploration

Anindya Sarkar,Michael Lanier,Scott Alfeld,Jiarui Feng,Roman Garnett,Nathan Jacobs,Yevgeniy Vorobeychik

from arxiv, A Pre-print Version, 21 pages, 15 figures, Code is available at: //github.com/anindyasarkarIITH/VAS

Many problems can be viewed as forms of geospatial search aided by aerial imagery, with examples ranging from detecting poaching activity to human trafficking. We model this class of problems in a visual active search (VAS) framework, which takes as input an image of a broad area, and aims to identify as many examples of a target object as possible. It does this through a limited sequence of queries, each of which verifies whether an example is present in a given region. A crucial feature of VAS is that each such query is informative about the spatial distribution of target objects beyond what is captured visually (for example, due to spatial correlation). We propose a reinforcement learning approach for VAS that leverages a collection of fully annotated search tasks as training data to learn a search policy, and combines features of the input image with a natural representation of active search state. Additionally, we propose domain adaptation techniques to improve the policy at decision time when training data is not fully reflective of the test-time distribution of VAS tasks. Through extensive experiments on several satellite imagery datasets, we show that the proposed approach significantly outperforms several strong baselines. Code and data will be made public.

離線強化學習 · 模仿學習 · 強化學習 · 離策略 · 組合性 ·

2023 年 4 月 5 日

A Policy-Guided Imitation Approach for Offline Reinforcement Learning

Haoran Xu,Li Jiang,Jianxiong Li,Xianyuan Zhan

from arxiv, Oral @ NeurIPS 2022; An extended version with more experiments & correct some experiment details in previous version

Offline reinforcement learning (RL) methods can generally be categorized into two types: RL-based and Imitation-based. RL-based methods could in principle enjoy out-of-distribution generalization but suffer from erroneous off-policy evaluation. Imitation-based methods avoid off-policy evaluation but are too conservative to surpass the dataset. In this study, we propose an alternative approach, inheriting the training stability of imitation-style methods while still allowing logical out-of-distribution generalization. We decompose the conventional reward-maximizing policy in offline RL into a guide-policy and an execute-policy. During training, the guide-poicy and execute-policy are learned using only data from the dataset, in a supervised and decoupled manner. During evaluation, the guide-policy guides the execute-policy by telling where it should go so that the reward can be maximized, serving as the \textit{Prophet}. By doing so, our algorithm allows \textit{state-compositionality} from the dataset, rather than \textit{action-compositionality} conducted in prior imitation-style methods. We dumb this new approach Policy-guided Offline RL (\texttt{POR}). \texttt{POR} demonstrates the state-of-the-art performance on D4RL, a standard benchmark for offline RL. We also highlight the benefits of \texttt{POR} in terms of improving with supplementary suboptimal data and easily adapting to new tasks by only changing the guide-poicy.

風險感知 · 值函數 · 價值函數 · 多智能體強化學習 · 智能體 ·

2023 年 4 月 4 日

Risk-Aware Distributed Multi-Agent Reinforcement Learning

Abdullah Al Maruf,Luyao Niu,Bhaskar Ramasubramanian,Andrew Clark,Radha Poovendran

Autonomous cyber and cyber-physical systems need to perform decision-making, learning, and control in unknown environments. Such decision-making can be sensitive to multiple factors, including modeling errors, changes in costs, and impacts of events in the tails of probability distributions. Although multi-agent reinforcement learning (MARL) provides a framework for learning behaviors through repeated interactions with the environment by minimizing an average cost, it will not be adequate to overcome the above challenges. In this paper, we develop a distributed MARL approach to solve decision-making problems in unknown environments by learning risk-aware actions. We use the conditional value-at-risk (CVaR) to characterize the cost function that is being minimized, and define a Bellman operator to characterize the value function associated to a given state-action pair. We prove that this operator satisfies a contraction property, and that it converges to the optimal value function. We then propose a distributed MARL algorithm called the CVaR QD-Learning algorithm, and establish that value functions of individual agents reaches consensus. We identify several challenges that arise in the implementation of the CVaR QD-Learning algorithm, and present solutions to overcome these. We evaluate the CVaR QD-Learning algorithm through simulations, and demonstrate the effect of a risk parameter on value functions at consensus.

切割 · 交互控制 · 機器人 · 交互 · 最優 ·

2023 年 4 月 3 日

Learning robotic milling strategies based on passive variable operational space interaction control

Jamie Hathaway,Alireza Rastegarpanah,Rustam Stolkin

from arxiv, 15 pages, 14 figures, submitted to IEEE Transactions on Automation Science and Engineering (T-ASE)

This paper addresses the problem of robotic cutting during disassembly of products for materials separation and recycling. Waste handling applications differ from milling in manufacturing processes, as they engender considerable variety and uncertainty in the parameters (e.g. hardness) of materials which the robot must cut. To address this challenge, we propose a learning-based approach incorporating elements of interaction control, in which the robot can adapt key parameters, such as feed rate, depth of cut, and mechanical compliance during task execution. We show how a mathematical model of cutting mechanics, embedded in a simulation environment, can be used to rapidly train the system without needing large amounts of data from physical cutting trials. The simulation approach was validated on a real robot setup based on four case study materials with varying structural and mechanical properties. We demonstrate the proposed method minimises process force and path deviations to a level similar to offline optimal planning methods, while the average time to complete a cutting task is within 25% of the optimum, at the expense of reduced volume of material removed per pass. A key advantage of our approach over similar works is that no prior knowledge about the material is required.

學習效率 · 最優策略 · 強化學習 · 算法 · 最優 ·

2023 年 4 月 3 日

Action Pick-up in Dynamic Action Space Reinforcement Learning

Jiaqi Ye,Xiaodong Li,Pangjing Wu,Feng Wang

Most reinforcement learning algorithms are based on a key assumption that Markov decision processes (MDPs) are stationary. However, non-stationary MDPs with dynamic action space are omnipresent in real-world scenarios. Yet problems of dynamic action space reinforcement learning have been studied by many previous works, how to choose valuable actions from new and unseen actions to improve learning efficiency remains unaddressed. To tackle this problem, we propose an intelligent Action Pick-up (AP) algorithm to autonomously choose valuable actions that are most likely to boost performance from a set of new actions. In this paper, we first theoretically analyze and find that a prior optimal policy plays an important role in action pick-up by providing useful knowledge and experience. Then, we design two different AP methods: frequency-based global method and state clustering-based local method, based on the prior optimal policy. Finally, we evaluate the AP on two simulated but challenging environments where action spaces vary over time. Experimental results demonstrate that our proposed AP has advantages over baselines in learning efficiency.

控制器 · 貝葉斯 · 融合 · 機器人 · 深度強化學習 ·

2023 年 4 月 3 日

Bayesian Controller Fusion: Leveraging Control Priors in Deep Reinforcement Learning for Robotics

Krishan Rana,Vibhavari Dasagi,Jesse Haviland,Ben Talbot,Michael Milford,Niko Sünderhauf

from arxiv, The International Journal of Robotics Research (IJRR), 2023. Project page: //krishanrana.github.io/bcf

We present Bayesian Controller Fusion (BCF): a hybrid control strategy that combines the strengths of traditional hand-crafted controllers and model-free deep reinforcement learning (RL). BCF thrives in the robotics domain, where reliable but suboptimal control priors exist for many tasks, but RL from scratch remains unsafe and data-inefficient. By fusing uncertainty-aware distributional outputs from each system, BCF arbitrates control between them, exploiting their respective strengths. We study BCF on two real-world robotics tasks involving navigation in a vast and long-horizon environment, and a complex reaching task that involves manipulability maximisation. For both these domains, simple handcrafted controllers exist that can solve the task at hand in a risk-averse manner but do not necessarily exhibit the optimal solution given limitations in analytical modelling, controller miscalibration and task variation. As exploration is naturally guided by the prior in the early stages of training, BCF accelerates learning, while substantially improving beyond the performance of the control prior, as the policy gains more experience. More importantly, given the risk-aversity of the control prior, BCF ensures safe exploration and deployment, where the control prior naturally dominates the action distribution in states unknown to the policy. We additionally show BCF's applicability to the zero-shot sim-to-real setting and its ability to deal with out-of-distribution states in the real world. BCF is a promising approach towards combining the complementary strengths of deep RL and traditional robotic control, surpassing what either can achieve independently. The code and supplementary video material are made publicly available at //krishanrana.github.io/bcf.

控制策略 · 移動機器人 · 機器人 · 控制器 · 特征線 ·

2023 年 4 月 2 日

User-Conditioned Neural Control Policies for Mobile Robotics

Leonard Bauersfeld,Elia Kaufmann,Davide Scaramuzza

from arxiv, 6 pages + 1 pages references

Recently, learning-based controllers have been shown to push mobile robotic systems to their limits and provide the robustness needed for many real-world applications. However, only classical optimization-based control frameworks offer the inherent flexibility to be dynamically adjusted during execution by, for example, setting target speeds or actuator limits. We present a framework to overcome this shortcoming of neural controllers by conditioning them on an auxiliary input. This advance is enabled by including a feature-wise linear modulation layer (FiLM). We use model-free reinforcement-learning to train quadrotor control policies for the task of navigating through a sequence of waypoints in minimum time. By conditioning the policy on the maximum available thrust or the viewing direction relative to the next waypoint, a user can regulate the aggressiveness of the quadrotor's flight during deployment. We demonstrate in simulation and in real-world experiments that a single control policy can achieve close to time-optimal flight performance across the entire performance envelope of the robot, reaching up to 60 km/h and 4.5g in acceleration. The ability to guide a learned controller during task execution has implications beyond agile quadrotor flight, as conditioning the control policy on human intent helps safely bringing learning based systems out of the well-defined laboratory environment into the wild.

擾動 · 反饋控制 · 軌跡規劃 · 混合 · 控制器 ·

2023 年 4 月 1 日

Convergent iLQR for Safe Trajectory Planning and Control of Legged Robots

James Zhu,J. Joe Payne,Aaron M. Johnson

from arxiv, Submitted to CDC 2023

In order to perform highly dynamic and agile maneuvers, legged robots typically spend time in underactuated domains (e.g. with feet off the ground) where the system has limited command of its acceleration and a constrained amount of time before transitioning to a new domain (e.g. foot touchdown). Meanwhile, these transitions can have instantaneous, unbounded effects on perturbations. These properties make it difficult for local feedback controllers to effectively recover from disturbances as the system evolves through underactuated domains and hybrid impact events. To address this, we utilize the fundamental solution matrix that characterizes the evolution of perturbations through a hybrid trajectory and its 2-norm, which represents the worst-case growth of perturbations. In this paper, the worst-case perturbation analysis is used to explicitly reason about the tracking performance of a hybrid trajectory and is incorporated in an iLQR framework to optimize a trajectory while taking into account the closed-loop convergence of the trajectory under an LQR tracking controller. The generated convergent trajectories are able to recover more effectively from perturbations, are more robust to large disturbances, and use less feedback control effort than trajectories generated with traditional optimization methods.

分解 · 運動規劃 · 搜索空間 · 搜索 · 多智能體 ·

2023 年 4 月 1 日

Factorization of Multi-Agent Sampling-Based Motion Planning

Alessandro Zanardi,Pietro Zullo,Andrea Censi,Emilio Frazzoli

from arxiv, under review

Modern robotics often involves multiple embodied agents operating within a shared environment. Path planning in these cases is considerably more challenging than in single-agent scenarios. Although standard Sampling-based Algorithms (SBAs) can be used to search for solutions in the robots' joint space, this approach quickly becomes computationally intractable as the number of agents increases. To address this issue, we integrate the concept of factorization into sampling-based algorithms, which requires only minimal modifications to existing methods. During the search for a solution we can decouple (i.e., factorize) different subsets of agents into independent lower-dimensional search spaces once we certify that their future solutions will be independent of each other using a factorization heuristic. Consequently, we progressively construct a lean hypergraph where certain (hyper-)edges split the agents to independent subgraphs. In the best case, this approach can reduce the growth in dimensionality of the search space from exponential to linear in the number of agents. On average, fewer samples are needed to find high-quality solutions while preserving the optimality, completeness, and anytime properties of SBAs. We present a general implementation of a factorized SBA, derive an analytical gain in terms of sample complexity for PRM*, and showcase empirical results for RRG.

學成 · INTERACT · Storage · TOOLS · Processing（編程語言） ·

2020 年 3 月 25 日

A Survey on Trajectory Data Management, Analytics, and Learning

Sheng Wang,Zhifeng Bao,J. Shane Culpepper,Gao Cong

from arxiv, 36 page, 12 figures

Recent advances in sensor and mobile devices have enabled an unprecedented increase in the availability and collection of urban trajectory data, thus increasing the demand for more efficient ways to manage and analyze the data being produced. In this survey, we comprehensively review recent research trends in trajectory data management, ranging from trajectory pre-processing, storage, common trajectory analytic tools, such as querying spatial-only and spatial-textual trajectory data, and trajectory clustering. We also explore four closely related analytical tasks commonly used with trajectory data in interactive or real-time processing. Deep trajectory learning is also reviewed for the first time. Finally, we outline the essential qualities that a trajectory management system should possess in order to maximize flexibility.