成人不卡顿免费视频在线,国产亚洲欧美日韩精品色狠二区,中文字幕AV一区二区三区亭亭色,一区二区三区免费在线播放

Real-time planning for a combined problem of target assignment and path planning for multiple agents, also known as the unlabeled version of Multi-Agent Path Finding (MAPF), is crucial for high-level coordination in multi-agent systems, e.g., pattern formation by robot swarms. This paper studies two aspects of unlabeled-MAPF: (1) offline scenario: solving large instances by centralized approaches with small computation time, and (2) online scenario: executing unlabeled-MAPF despite timing uncertainties of real robots. For this purpose, we propose TSWAP, a novel sub-optimal complete algorithm, which takes an arbitrary initial target assignment then repeats one-timestep path planning with target swapping. TSWAP can adapt to both offline and online scenarios. We empirically demonstrate that Offline TSWAP is highly scalable; providing near-optimal solutions while reducing runtime by orders of magnitude compared to existing approaches. In addition, we present the benefits of Online TSWAP, such as delay tolerance, through real-robot demos.

相關內容

可約的

關注 2

易處理的 · 約束 · 離散化 · Processing（編程語言） · 噪聲 ·

2022 年 4 月 20 日

Risk-Averse Receding Horizon Motion Planning

Anushri Dixit,Mohamadreza Ahmadi,Joel W. Burdick

from arxiv, Submitted to Artificial Intelligence Journal, Special Issue on Risk-aware Autonomous Systems: Theory and Practice. arXiv admin note: text overlap with arXiv:2011.11211

This paper studies the problem of risk-averse receding horizon motion planning for agents with uncertain dynamics, in the presence of stochastic, dynamic obstacles. We propose a model predictive control (MPC) scheme that formulates the obstacle avoidance constraint using coherent risk measures. To handle disturbances, or process noise, in the state dynamics, the state constraints are tightened in a risk-aware manner to provide a disturbance feedback policy. We also propose a waypoint following algorithm that uses the proposed MPC scheme for discrete distributions and prove its risk-sensitive recursive feasibility while guaranteeing finite-time task completion. We further investigate some commonly used coherent risk metrics, namely, conditional value-at-risk (CVaR), entropic value-at-risk (EVaR), and g-entropic risk measures, and propose a tractable incorporation within MPC. We illustrate our framework via simulation studies.

Processing（編程語言） · MoDELS · 講稿 · INTERACT · 相互獨立的 ·

2022 年 4 月 20 日

Modeling and Executing Production Processes with Capabilities and Skills using Ontologies and BPMN

Aljosha K?cher,Luis Miguel Vieira da Silva,Alexander Fay

from arxiv, To be submitted to ETFA 2022

Current challenges of the manufacturing industry require modular and changeable manufacturing systems that can be adapted to variable conditions with little effort. At the same time, production recipes typically represent important company know-how that should not be directly tied to changing plant configurations. Thus, there is a need to model general production recipes independent of specific plant layouts. For execution of such a recipe however, a binding to then available production resources needs to be made. In this contribution, select a suitable modeling language to model and execute such recipes. Furthermore, we present an approach to solve the issue of recipe modeling and execution in modular plants using semantically modeled capabilities and skills as well as BPMN. We make use of BPMN to model \emph{capability processes}, i.e. production processes referencing abstract descriptions of resource functions. These capability processes are not bound to a certain plant layout, as there can be multiple resources fulfilling the same capability. For execution, every capability in a capability process is replaced by a skill realizing it, effectively creating a \emph{skill process} consisting of various skill invocations. The presented solution is capable of orchestrating and executing complex processes that integrate production steps with typical IT functionalities such as error handling, user interactions and notifications. Benefits of the approach are demonstrated using a flexible manufacturing system.

蒙特卡洛樹搜索 · 蒙特卡羅 · 回合 · Integration · 部分可觀測馬爾可夫決策過程 ·

2022 年 4 月 20 日

Cooperative Trajectory Planning in Uncertain Environments with Monte Carlo Tree Search and Risk Metrics

Philipp Stegmaier,Karl Kurzer,J. Marius Z?llner

Automated vehicles require the ability to cooperate with humans for smooth integration into today's traffic. While the concept of cooperation is well known, developing a robust and efficient cooperative trajectory planning method is still a challenge. One aspect of this challenge is the uncertainty surrounding the state of the environment due to limited sensor accuracy. This uncertainty can be represented by a Partially Observable Markov Decision Process. Our work addresses this problem by extending an existing cooperative trajectory planning approach based on Monte Carlo Tree Search for continuous action spaces. It does so by explicitly modeling uncertainties in the form of a root belief state, from which start states for trees are sampled. After the trees have been constructed with Monte Carlo Tree Search, their results are aggregated into return distributions using kernel regression. We apply two risk metrics for the final selection, namely a Lower Confidence Bound and a Conditional Value at Risk. It can be demonstrated that the integration of risk metrics in the final selection policy consistently outperforms a baseline in uncertain environments, generating considerably safer trajectories.

回合 · Extensibility · Continuity · 圖 · 講稿 ·

2022 年 4 月 20 日

Dynamic Free-Space Roadmap for Safe Quadrotor Motion Planning

Junlong Guo,Zhiren Xun,Shuang Geng,Yi Lin,Chao Xu,Fei Gao

Free-space-oriented roadmaps typically generate a series of convex geometric primitives, which constitute the safe region for motion planning. However, a static environment is assumed for this kind of roadmap. This assumption makes it unable to deal with dynamic obstacles and limits its applications. In this paper, we present a dynamic free-space roadmap, which provides feasible spaces and a navigation graph for safe quadrotor motion planning. Our roadmap is constructed by continuously seeding and extracting free regions in the environment. In order to adapt our map to environments with dynamic obstacles, we incrementally decompose the polyhedra intersecting with obstacles into obstacle-free regions, while the graph is also updated by our well-designed mechanism. Extensive simulations and real-world experiments demonstrate that our method is practically applicable and efficient.

學成 · INFORMS · MoDELS · 深度強化學習 · 強化學習 ·

2022 年 4 月 19 日

Efficient Bayesian Policy Reuse with a Scalable Observation Model in Deep Reinforcement Learning

Jinmei Liu,Zhi Wang,Chunlin Chen,Daoyi Dong

from arxiv, 16 pages, 6 figures, under review

Bayesian policy reuse (BPR) is a general policy transfer framework for selecting a source policy from an offline library by inferring the task belief based on some observation signals and a trained observation model. In this paper, we propose an improved BPR method to achieve more efficient policy transfer in deep reinforcement learning (DRL). First, most BPR algorithms use the episodic return as the observation signal that contains limited information and cannot be obtained until the end of an episode. Instead, we employ the state transition sample, which is informative and instantaneous, as the observation signal for faster and more accurate task inference. Second, BPR algorithms usually require numerous samples to estimate the probability distribution of the tabular-based observation model, which may be expensive and even infeasible to learn and maintain, especially when using the state transition sample as the signal. Hence, we propose a scalable observation model based on fitting state transition functions of source tasks from only a small number of samples, which can generalize to any signals observed in the target task. Moreover, we extend the offline-mode BPR to the continual learning setting by expanding the scalable observation model in a plug-and-play fashion, which can avoid negative transfer when faced with new unknown tasks. Experimental results show that our method can consistently facilitate faster and more efficient policy transfer.

學成 · 獎勵函數 · 穩健性 · FAST · 逆強化學習 ·

2022 年 4 月 18 日

Imitating, Fast and Slow: Robust learning from demonstrations via decision-time planning

Carl Qi,Pieter Abbeel,Aditya Grover

The goal of imitation learning is to mimic expert behavior from demonstrations, without access to an explicit reward signal. A popular class of approach infers the (unknown) reward function via inverse reinforcement learning (IRL) followed by maximizing this reward function via reinforcement learning (RL). The policies learned via these approaches are however very brittle in practice and deteriorate quickly even with small test-time perturbations due to compounding errors. We propose Imitation with Planning at Test-time (IMPLANT), a new meta-algorithm for imitation learning that utilizes decision-time planning to correct for compounding errors of any base imitation policy. In contrast to existing approaches, we retain both the imitation policy and the rewards model at decision-time, thereby benefiting from the learning signal of the two components. Empirically, we demonstrate that IMPLANT significantly outperforms benchmark imitation learning approaches on standard control environments and excels at zero-shot generalization when subject to challenging perturbations in test-time dynamics.

主動學習 · 學成 · 查準率/準確率 · Processing（編程語言） · 輸入空間 ·

2022 年 4 月 18 日

Active Learning with Weak Labels for Gaussian Processes

Amanda Olmin,Jakob Lindqvist,Lennart Svensson,Fredrik Lindsten

Annotating data for supervised learning can be costly. When the annotation budget is limited, active learning can be used to select and annotate those observations that are likely to give the most gain in model performance. We propose an active learning algorithm that, in addition to selecting which observation to annotate, selects the precision of the annotation that is acquired. Assuming that annotations with low precision are cheaper to obtain, this allows the model to explore a larger part of the input space, with the same annotation costs. We build our acquisition function on the previously proposed BALD objective for Gaussian Processes, and empirically demonstrate the gains of being able to adjust the annotation precision in the active learning loop.

學成 · 強化學習 · 多樣性 · HTTPS · 泛化理論 ·

2022 年 4 月 18 日

MetaDrive: Composing Diverse Driving Scenarios for Generalizable Reinforcement Learning

Quanyi Li,Zhenghao Peng,Lan Feng,Qihang Zhang,Zhenghai Xue,Bolei Zhou

from arxiv, Source code, documentation, and demo video are available at //metadriverse.github.io/metadrive . More research projects based on MetaDrive simulator are listed at //metadriverse.github.io

Driving safely requires multiple capabilities from human and intelligent agents, such as the generalizability to unseen environments, the safety awareness of the surrounding traffic, and the decision-making in complex multi-agent settings. Despite the great success of Reinforcement Learning (RL), most of the RL research works investigate each capability separately due to the lack of integrated environments. In this work, we develop a new driving simulation platform called MetaDrive to support the research of generalizable reinforcement learning algorithms for machine autonomy. MetaDrive is highly compositional, which can generate an infinite number of diverse driving scenarios from both the procedural generation and the real data importing. Based on MetaDrive, we construct a variety of RL tasks and baselines in both single-agent and multi-agent settings, including benchmarking generalizability across unseen scenes, safe exploration, and learning multi-agent traffic. The generalization experiments conducted on both procedurally generated scenarios and real-world scenarios show that increasing the diversity and the size of the training set leads to the improvement of the generalizability of the RL agents. We further evaluate various safe reinforcement learning and multi-agent reinforcement learning algorithms in MetaDrive environments and provide the benchmarks. Source code, documentation, and demo video are available at //metadriverse.github.io/metadrive . More research projects based on MetaDrive simulator are listed at //metadriverse.github.io

線性模型 · MoDELS · 線性的 · 離散化 · 推斷 ·

2022 年 4 月 15 日

Warped Dynamic Linear Models for Time Series of Counts

Brian King,Daniel R. Kowal

Dynamic Linear Models (DLMs) are commonly employed for time series analysis due to their versatile structure, simple recursive updating, ability to handle missing data, and probabilistic forecasting. However, the options for count time series are limited: Gaussian DLMs require continuous data, while Poisson-based alternatives often lack sufficient modeling flexibility. We introduce a novel semiparametric methodology for count time series by warping a Gaussian DLM. The warping function has two components: a (nonparametric) transformation operator that provides distributional flexibility and a rounding operator that ensures the correct support for the discrete data-generating process. We develop conjugate inference for the warped DLM, which enables analytic and recursive updates for the state space filtering and smoothing distributions. We leverage these results to produce customized and efficient algorithms for inference and forecasting, including Monte Carlo simulation for offline analysis and an optimal particle filter for online inference. This framework unifies and extends a variety of discrete time series models and is valid for natural counts, rounded values, and multivariate observations. Simulation studies illustrate the excellent forecasting capabilities of the warped DLM. The proposed approach is applied to a multivariate time series of daily overdose counts and demonstrates both modeling and computational successes.

學成 · Lipschitz常數 · 泛函 · Lipschitz · 損失函數（機器學習） ·

2022 年 4 月 15 日

Learning to Accelerate by the Methods of Step-size Planning

Hengshuai Yao

Gradient descent is slow to converge for ill-conditioned problems and non-convex problems. An important technique for acceleration is step-size adaptation. The first part of this paper contains a detailed review of step-size adaptation methods, including Polyak step-size, L4, LossGrad, Adam, IDBD, and Hypergradient descent, and the relation of step-size adaptation to meta-gradient methods. In the second part of this paper, we propose a new class of methods of accelerating gradient descent that have some distinctiveness from existing techniques. The new methods, which we call {\em step-size planning}, use the {\em update experience} to learn an improved way of updating the parameters. The methods organize the experience into $K$ steps away from each other to facilitate planning. From the past experience, our planning algorithm, Csawg, learns a step-size model which is a form of multi-step machine that predicts future updates. We extends Csawg to applying step-size planning multiple steps, which leads to further speedup. We discuss and highlight the projection power of the diagonal-matrix step-size for future large scale applications. We show for a convex problem, our methods can surpass the convergence rate of Nesterov's accelerated gradient, $1 - \sqrt{\mu/L}$, where $\mu, L$ are the strongly convex factor of the loss function $F$ and the Lipschitz constant of $F'$, which is the theoretical limit for the convergence rate of first-order methods. On the well-known non-convex Rosenbrock function, our planning methods achieve zero error below 500 gradient evaluations, while gradient descent takes about 10000 gradient evaluations to reach a $10^{-3}$ accuracy. We discuss the connection of step-size planing to planning in reinforcement learning, in particular, Dyna architectures.