
In this paper, we propose an inverse-kinematics controller for a class of multi-robot systems in the scenario of sampled communication. The goal is to make a group of robots perform trajectory tracking in a coordinated way when the sampling time of communications is much larger than the sampling time of low-level controllers, disrupting theoretical convergence guarantees of standard control design in continuous time. Given a desired trajectory in configuration space which is precomputed offline, the proposed controller receives configuration measurements, possibly via wireless, to re-compute velocity references for the robots, which are tracked by a low-level controller. We propose joint design of a sampled proportional feedback plus a novel continuous-time feedforward that linearizes the dynamics around the reference trajectory: this method is amenable to distributed communication implementation where only one broadcast transmission is needed per sample. Also, we provide closed-form expressions for instability and stability regions and convergence rate in terms of proportional gain $k$ and sampling period $T$. We test the proposed control strategy via numerical simulations in the scenario of cooperative aerial manipulation of a cable-suspended load using a realistic simulator (Fly-Crane). Finally, we compare our proposed controller with centralized approaches that adapt the feedback gain online through smart heuristics, and show that it achieves comparable performance.
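
The interplay between the proportional gain $k$ and the sampling period $T$ can be illustrated on a toy problem. The sketch below is not the paper's controller or its closed-form expressions: it assumes a scalar single-integrator robot whose feedforward cancels the reference velocity exactly, so the tracking error $e$ is driven only by a feedback term held constant between communication samples. Under that assumption the error obeys $e_{j+1} = (1 - kT)\,e_j$, which contracts exactly when $0 < kT < 2$. The function names and the `inner_steps` parameter are hypothetical.

```python
def simulate_tracking_error(k, T, e0, n_samples, inner_steps=100):
    """Toy model: e_dot(t) = -k * e(t_j) for t in [t_j, t_j + T).

    The feedback uses the error measured at the last communication sample,
    while the low-level loop integrates with the finer step T / inner_steps.
    Returns the error at each communication instant.
    """
    dt = T / inner_steps
    e = e0
    history = [e]
    for _ in range(n_samples):
        e_held = e  # measurement received once per communication period
        for _ in range(inner_steps):
            # Feedforward is assumed to cancel the reference velocity,
            # so only the held proportional term acts on the error.
            e += dt * (-k * e_held)
        history.append(e)
    return history


def is_stable(k, T):
    """Stability test for the toy recursion e_{j+1} = (1 - k*T) * e_j."""
    return abs(1.0 - k * T) < 1.0


if __name__ == "__main__":
    print(simulate_tracking_error(k=1.0, T=0.5, e0=1.0, n_samples=3))
    print(is_stable(1.0, 0.5), is_stable(5.0, 0.5))
```

In this toy setting the per-sample contraction factor is $|1 - kT|$, so a larger sampling period forces a smaller admissible gain. The paper's regions for multi-robot systems may differ from this scalar picture.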

Related content


INTERACT · Controller · INFORMS · MoDELS · Continuity

February 1, 2023

Active Uncertainty Reduction for Safe and Efficient Interaction Planning: A Shielding-Aware Dual Control Approach

Haimin Hu, David Isele, Sangjae Bae, Jaime F. Fisac

from arxiv, arXiv admin note: text overlap with arXiv:2202.07720

The ability to accurately predict the opponent's behavior is central to the safety and efficiency of robotic systems in interactive settings, such as human-robot interaction and multi-robot teaming tasks. Unfortunately, robots often lack access to key information on which these predictions may hinge, such as the opponent's goals, attention, and willingness to cooperate. Dual control theory addresses this challenge by treating unknown parameters of a predictive model as hidden states and inferring their values at runtime using information gathered during system operation. While able to optimally and automatically trade off exploration and exploitation, dual control is computationally intractable for general interactive motion planning. In this paper, we present a novel algorithmic approach to enable active uncertainty reduction for interactive motion planning based on the implicit dual control paradigm. Our approach relies on sampling-based approximation of stochastic dynamic programming, leading to a model predictive control problem. The resulting policy is shown to preserve the dual control effect for a broad class of predictive models with both continuous and categorical uncertainty. To ensure the safe operation of the interacting agents, we leverage a supervisory control scheme, often referred to as "shielding", which overrides the ego agent's dual control policy with a safety fallback strategy when a safety-critical event is imminent. We then augment the dual control framework with an improved variant of the recently proposed shielding-aware robust planning scheme, which proactively balances the nominal planning performance with the risk of high-cost emergency maneuvers triggered by low-probability opponent behaviors. We demonstrate the efficacy of our approach with both simulated driving examples and hardware experiments using 1/10 scale autonomous vehicles.

Episode · Optimizer · Robot · Estimation error · state-of-the-art

January 31, 2023

Chance-Constrained Trajectory Optimization for High-DOF Robots in Uncertain Environments

Charles Dawson, Ashkan Jasour, Andreas Hofmann, Brian Williams

Many practical applications of robotics require systems that can operate safely despite uncertainty. In the context of motion planning, two types of uncertainty are particularly important when planning safe robot trajectories. The first is environmental uncertainty -- uncertainty in the locations of nearby obstacles, stemming from sensor noise or (in the case of obstacles' future locations) prediction error. The second is uncertainty in the robot's own state, typically caused by tracking or estimation error. To achieve high levels of safety, it is necessary for robots to consider both of these sources of uncertainty. In this paper, we propose a risk-bounded trajectory optimization algorithm, known as Sequential Convex Optimization with Risk Optimization (SCORA), to solve chance-constrained motion planning problems despite both environmental uncertainty and tracking error. Through experiments in simulation, we demonstrate that SCORA significantly outperforms state-of-the-art risk-aware motion planners both in planning time and in the safety of the resulting trajectories.

Agent · INTERACT · Performer · MoDELS · Analysis

January 31, 2023

Evaluating Temporal Observation-Based Causal Discovery Techniques Applied to Road Driver Behaviour

Rhys Howard, Lars Kunze

from arxiv, 26 Pages = 12 Pages (Main Content) + 5 Pages (References) + 9 Pages (Appendix), 2 Figures, To be published in the Proceedings of the 2nd Conference on Causal Learning and Reasoning as part of the Journal of Machine Learning Research Workshop and Conference Proceedings series, Initial submission version

Autonomous robots are required to reason about the behaviour of dynamic agents in their environment. To this end, many approaches assume that causal models describing the interactions of agents are given a priori. However, in many application domains such models do not exist or cannot be engineered. Hence, the learning (or discovery) of high-level causal structures from low-level, temporal observations is a key problem in AI and robotics. Nevertheless, the application of causal discovery methods to scenarios involving autonomous agents remains in the early stages of research. While a number of methods exist for performing causal discovery on time series data, these usually rely upon assumptions such as sufficiency and stationarity, which cannot be guaranteed in inter-agent behavioural interactions in the real world. In this paper, we apply contemporary observation-based temporal causal discovery techniques to real-world and synthetic driving scenarios from multiple datasets. Our evaluation demonstrates and highlights the limitations of state-of-the-art approaches by comparing and contrasting the performance between real and synthetically generated data. Finally, based on our analysis, we discuss open issues related to causal discovery on autonomous robotics scenarios and propose future research directions for overcoming current limitations in the field.

Sample space · Statistic · Sample · Manifold · Projection

January 31, 2023

Geometry of Sample Spaces

Philipp Harms, Peter W. Michor, Xavier Pennec, Stefan Sommer

from arxiv, 29 pages, 1 figure

In statistics, independent, identically distributed random samples do not carry a natural ordering, and their statistics are typically invariant with respect to permutations of their order. Thus, an $n$-sample in a space $M$ can be considered as an element of the quotient space of $M^n$ modulo the permutation group. The present paper takes this definition of sample space and the related concept of orbit types as a starting point for developing a geometric perspective on statistics. We aim at deriving a general mathematical setting for studying the behavior of empirical and population means in spaces ranging from smooth Riemannian manifolds to general stratified spaces. We fully describe the orbifold and path-metric structure of the sample space when $M$ is a manifold or path-metric space, respectively. These results are non-trivial even when $M$ is Euclidean. We show that the infinite sample space exists in a Gromov-Hausdorff type sense and coincides with the Wasserstein space of probability distributions on $M$. We exhibit Fréchet means and $k$-means as metric projections onto 1-skeleta or $k$-skeleta in Wasserstein space, and we define a new and more general notion of polymeans. This geometric characterization via metric projections applies equally to sample and population means, and we use it to establish asymptotic properties of polymeans such as consistency and asymptotic normality.
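
The projection viewpoint can be made concrete for the sample Fréchet mean using standard definitions. The notation below ($d$ for the metric on $M$, $\mu_n$ for the empirical measure, $W_2$ for the 2-Wasserstein distance) is an assumption of this sketch, and reading the 1-skeleton as the set of Dirac measures $\delta_y$ is an interpretation that the abstract does not spell out.

```latex
\begin{align}
  \bar{x} &\in \operatorname*{arg\,min}_{y \in M} \; \frac{1}{n} \sum_{i=1}^{n} d(x_i, y)^2,
  \qquad
  \mu_n = \frac{1}{n} \sum_{i=1}^{n} \delta_{x_i}, \\
  W_2(\mu_n, \delta_y)^2 &= \frac{1}{n} \sum_{i=1}^{n} d(x_i, y)^2 .
\end{align}
```

The second identity holds because the only coupling between $\mu_n$ and a Dirac measure $\delta_y$ is the product measure. Minimizing over $y$ therefore selects a Dirac measure closest to $\mu_n$ in $W_2$, which is the sense in which the Fréchet mean acts as a metric projection under the reading assumed here.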

Optimizer · Performer · Episode · Confidence · CASE

January 30, 2023

Safe and Adaptive Decision-Making for Optimization of Safety-Critical Systems: The ARTEO Algorithm

Buse Sibel Korkmaz, Marta Zagórowska, Mehmet Mercangöz

We consider the problem of decision-making under uncertainty in an environment with safety constraints. Many business and industrial applications rely on real-time optimization to improve key performance indicators. In the case of unknown characteristics, real-time optimization becomes challenging, particularly because of the satisfaction of safety constraints. We propose the ARTEO algorithm, where we cast multi-armed bandits as a mathematical programming problem subject to safety constraints and learn the unknown characteristics through exploration while optimizing the targets. We quantify the uncertainty in unknown characteristics by using Gaussian processes and incorporate it into the cost function as a contribution which drives exploration. We adaptively control the size of this contribution in accordance with the requirements of the environment. We guarantee the safety of our algorithm with a high probability through confidence bounds constructed under the regularity assumptions of Gaussian processes. We demonstrate the safety and efficiency of our approach with two case studies: optimization of electric motor current and real-time bidding problems. We further evaluate the performance of ARTEO compared to a safe variant of upper confidence bound based algorithms. ARTEO achieves less cumulative regret with accurate and safe decisions.

Constraint · Optimizer · Controller · INFORMS · Model evaluation

January 30, 2023

Cooperative trajectory planning algorithm of USV-UAV with hull dynamic constraints

Tao Huang, Zhe Chen, Wang Gao, Zhenfeng Xue, Yong Liu

from arxiv, 11 pages, 9 figures

Efficient trajectory generation in complex dynamic environments remains an open problem for unmanned surface vehicles (USVs). The perception of the USV is usually disturbed by the swing of the hull and the ambient weather, making it challenging to plan optimal USV trajectories. In this paper, a cooperative trajectory planning algorithm for the coupled USV-UAV system is proposed to ensure that the USV can execute a safe and smooth path while advancing autonomously through multi-obstacle maps. Specifically, the unmanned aerial vehicle (UAV) plays the role of a flight sensor, providing a real-time global map and obstacle information with a lightweight semantic segmentation network and 3D projection transformation. An initial obstacle avoidance trajectory is then generated by a graph-based search method. To account for the unique under-actuated kinematic characteristics of the USV, a numerical optimization method based on hull dynamic constraints is introduced to make the trajectory easier to track for motion control. Finally, a motion control method based on NMPC with the lowest energy consumption constraint during execution is proposed. Experimental results verify the effectiveness of the whole system, and the generated trajectory is locally optimal for the USV with considerable tracking accuracy.

Learning · Diversity · Reward function · Reinforcement learning · Functional

January 27, 2023

Reinforcement Learning from Diverse Human Preferences

Wanqi Xue, Bo An, Shuicheng Yan, Zhongwen Xu

The complexity of designing reward functions has been a major obstacle to the wide application of deep reinforcement learning (RL) techniques. Describing an agent's desired behaviors and properties can be difficult, even for experts. A new paradigm called reinforcement learning from human preferences (or preference-based RL) has emerged as a promising solution, in which reward functions are learned from human preference labels among behavior trajectories. However, existing methods for preference-based RL are limited by the need for accurate oracle preference labels. This paper addresses this limitation by developing a method for crowd-sourcing preference labels and learning from diverse human preferences. The key idea is to stabilize reward learning through regularization and correction in a latent space. To ensure temporal consistency, a strong constraint is imposed on the reward model that forces its latent space to be close to the prior distribution. Additionally, a confidence-based reward model ensembling method is designed to generate more stable and reliable predictions. The proposed method is tested on a variety of tasks in DMcontrol and Meta-world and has shown consistent and significant improvements over existing preference-based RL algorithms when learning from diverse feedback, paving the way for real-world applications of RL methods.
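
As background on how a reward function can be learned from preference labels, the sketch below shows the Bradley-Terry style loss that is commonly used in preference-based RL. The abstract does not say that this paper uses this exact formulation, and the latent-space regularization and ensembling it describes are not reproduced here. The function names and the convention of passing summed predicted rewards per trajectory segment are assumptions made for illustration.

```python
import math


def preference_probability(return_a, return_b):
    """Bradley-Terry probability that segment A is preferred over segment B.

    return_a and return_b are the summed predicted rewards of each segment.
    """
    return 1.0 / (1.0 + math.exp(return_b - return_a))


def preference_loss(return_a, return_b, label):
    """Cross-entropy between the model's preference and a human label.

    label is 1.0 if the annotator preferred A and 0.0 if they preferred B.
    """
    p = preference_probability(return_a, return_b)
    return -(label * math.log(p) + (1.0 - label) * math.log(1.0 - p))


if __name__ == "__main__":
    # A reward model that already ranks A above B incurs a small loss
    # when the label agrees and a large loss when it disagrees.
    print(preference_loss(2.0, 0.0, 1.0))
    print(preference_loss(2.0, 0.0, 0.0))
```

With labels from many annotators, contradictory labels on similar segment pairs pull this loss in opposite directions, which gives one intuition for why learning from diverse feedback calls for additional stabilization.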

Robustness · Episode · Learning · Reinforcement learning · Empirical distribution

January 27, 2023

Single-Trajectory Distributionally Robust Reinforcement Learning

Zhipeng Liang, Xiaoteng Ma, Jose Blanchet, Jiheng Zhang, Zhengyuan Zhou

from arxiv, First two authors contribute equally

As a framework for sequential decision-making, Reinforcement Learning (RL) has been regarded as an essential component leading to Artificial General Intelligence (AGI). However, RL is often criticized for assuming that the training environment is the same as the test environment, which hinders its application in the real world. To mitigate this problem, Distributionally Robust RL (DRRL) has been proposed to improve the worst-case performance over a set of environments that may contain the unknown test environment. Due to the nonlinearity of the robustness goal, most previous work resorts to a model-based approach, learning with either an empirical distribution learned from the data or a simulator that can be sampled infinitely, which limits its application to environments with simple dynamics. In contrast, we attempt to design a DRRL algorithm that can be trained along a single trajectory, i.e., with no repeated sampling from a state. Based on standard Q-learning, we propose distributionally robust Q-learning with the single trajectory (DRQ) and its average-reward variant, named differential DRQ. We provide asymptotic convergence guarantees and experiments for both settings, demonstrating their superiority over non-robust counterparts in perturbed environments.

Episode · Learned · Reinforcement learning · INTERACT · General intelligence

May 13, 2022

Emergent Bartering Behaviour in Multi-Agent Reinforcement Learning

Michael Bradley Johanson, Edward Hughes, Finbarr Timbers, Joel Z. Leibo

Advances in artificial intelligence often stem from the development of new environments that abstract real-world situations into a form where research can be done conveniently. This paper contributes such an environment based on ideas inspired by elementary Microeconomics. Agents learn to produce resources in a spatially complex world, trade them with one another, and consume those that they prefer. We show that the emergent production, consumption, and pricing behaviors respond to environmental conditions in the directions predicted by supply and demand shifts in Microeconomics. We also demonstrate settings where the agents' emergent prices for goods vary over space, reflecting the local abundance of goods. After the price disparities emerge, some agents then discover a niche of transporting goods between regions with different prevailing prices -- a profitable strategy because they can buy goods where they are cheap and sell them where they are expensive. Finally, in a series of ablation experiments, we investigate how choices in the environmental rewards, bartering actions, agent architecture, and ability to consume tradable goods can either aid or inhibit the emergence of this economic behavior. This work is part of the environment development branch of a research program that aims to build human-like artificial general intelligence through multi-agent interactions in simulated societies. By exploring which environment features are needed for the basic phenomena of elementary microeconomics to emerge automatically from learning, we arrive at an environment that differs from those studied in prior multi-agent reinforcement learning work along several dimensions. For example, the model incorporates heterogeneous tastes and physical abilities, and agents negotiate with one another as a grounded form of communication.

Identifiable · Extensibility · TEAM · Estimation/Estimator · Nash equilibrium

September 15, 2021

Decentralized and Communication-Free Multi-Robot Navigation through Distributed Games

Brian Reily, Terran Mott, Hao Zhang

Effective multi-robot teams require the ability to move to goals in complex environments in order to address real-world applications such as search and rescue. Multi-robot teams should be able to operate in a completely decentralized manner, with individual robot team members being capable of acting without explicit communication between neighbors. In this paper, we propose a novel game theoretic model that enables decentralized and communication-free navigation to a goal position. Robots each play their own distributed game by estimating the behavior of their local teammates in order to identify behaviors that move them in the direction of the goal, while also avoiding obstacles and maintaining team cohesion without collisions. We prove theoretically that generated actions approach a Nash equilibrium, which also corresponds to an optimal strategy identified for each robot. We show through extensive simulations that our approach enables decentralized and communication-free navigation by a multi-robot system to a goal position, and is able to avoid obstacles and collisions, maintain connectivity, and respond robustly to sensor noise.