
We are seeking control design paradigms for legged systems that bypass the costly optimization-based algorithms, and the heavy on-board computers they require, that are widely used in these systems, while still matching their performance through less expensive, optimization-free frameworks. In this work, we present our preliminary results on the modeling and control design of a quadrupedal robot called \textit{Husky Carbon}, which is under development at Northeastern University (NU) in Boston. In our approach, we utilize a supervisory controller and an Explicit Reference Governor (ERG) to enforce ground reaction force constraints. These constraints are usually enforced using costly optimizations. In this work, however, the ERG manipulates the state references applied to the supervisory controller to enforce the ground contact constraints through an update law based on Lyapunov stability arguments. As a result, the approach is much faster to compute than the widely used optimization-based methods.
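The abstract does not spell out the governor equations, so the following is only a minimal sketch of how an explicit reference governor can enforce a constraint without optimization: a hypothetical one-dimensional mass under a PD supervisory controller, with a bound on the commanded force standing in for a ground-reaction-force limit. The gains, the force bound, and the Lyapunov-level threshold are illustrative assumptions, not the controller described in the paper.

\begin{verbatim}
import numpy as np

# Minimal explicit-reference-governor (ERG) sketch: a PD-controlled point mass
# with a bound on the commanded force standing in for a contact-force limit.
# All gains, bounds, and thresholds below are illustrative assumptions.
m, kp, kd = 1.0, 40.0, 12.0    # mass and PD gains of the supervisory controller
f_max = 15.0                   # hypothetical force constraint |f| <= f_max
dt, T, eta = 1e-3, 4.0, 5.0    # time step, horizon, governor gain

def lyapunov(x, xdot, v):
    """Quadratic Lyapunov function of the PD-controlled error dynamics."""
    return 0.5 * kp * (x - v) ** 2 + 0.5 * m * xdot ** 2

def admissible_level():
    """Largest Lyapunov level on which |f| = |kp*(v - x) - kd*xdot| <= f_max
    is guaranteed (a conservative, illustrative bound)."""
    return (f_max / (np.sqrt(2.0 * kp) + kd * np.sqrt(2.0 / m))) ** 2

x, xdot = 0.0, 0.0   # plant state
v, r = 0.0, 1.0      # applied (governed) reference and desired reference

for _ in range(int(T / dt)):
    # ERG update law: move v toward r at a rate scaled by the dynamic safety
    # margin, i.e. the gap between the admissible level set and the current
    # Lyapunov value, so the constraint is never crossed.
    delta = max(admissible_level() - lyapunov(x, xdot, v), 0.0)
    v += dt * eta * delta * (r - v)
    # Plant integration under the supervisory PD controller.
    f = kp * (v - x) - kd * xdot
    xdot += dt * f / m
    x += dt * xdot

print(f"final state x = {x:.3f}, applied reference v = {v:.3f}")
\end{verbatim}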

Related Content

Robotic spacecraft have helped expand our reach for many planetary exploration missions. Most ground mobile planetary exploration robots use wheeled or modified wheeled platforms. Although extraordinarily successful at completing their intended mission goals, because of the limitations of wheeled locomotion they have been largely limited to benign, solid terrain and have avoided extreme terrain with loose soil/sand and large rocks. Unfortunately, such challenging terrain is often scientifically interesting for planetary geology. Although many animals traverse such terrain with ease, robots have not matched their performance and robustness. This is in large part due to a lack of fundamental understanding of how effective locomotion can be generated from controlled interaction with complex terrain, at the level of understanding achieved for flight aerodynamics and underwater vehicle hydrodynamics. Early fundamental understanding of legged and limbless locomotor-ground interaction has already enabled stable and efficient bio-inspired robot locomotion on relatively flat ground with small obstacles. Recent progress in the new field of terradynamics, the study of locomotor-terrain interaction, is beginning to reveal the principles of bio-inspired locomotion on loose soil/sand and over large obstacles. Multi-legged and limbless platforms using terradynamics insights hold promise as robust alternative platforms for traversing extreme extraterrestrial terrain and expanding our reach in planetary exploration.

We present differentiable predictive control (DPC), a method for learning constrained neural control policies for linear systems with probabilistic performance guarantees. We employ automatic differentiation to obtain direct policy gradients by backpropagating the model predictive control (MPC) loss function and constraint penalties through a differentiable closed-loop system dynamics model. We demonstrate that the proposed method can learn parametric constrained control policies to stabilize systems with unstable dynamics, track time-varying references, and satisfy nonlinear state and input constraints. In contrast with imitation-learning-based approaches, our method does not depend on a supervisory controller. Most importantly, we demonstrate that, without losing performance, our method is scalable and computationally more efficient than implicit, explicit, and approximate MPC. Under review at IEEE Transactions on Automatic Control.
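A minimal sketch of the DPC idea, assuming PyTorch for automatic differentiation: a small policy network maps the initial state and a reference to a control sequence, a linear model is rolled out inside the computation graph, and an MPC-style loss (tracking plus penalized state and input constraints) is backpropagated directly into the policy weights. The system matrices, horizon, and penalty weights are placeholders, not those used in the paper.

\begin{verbatim}
import torch
import torch.nn as nn

# Sketch of differentiable predictive control (DPC) for a linear system.
# The model, horizon, bounds, and penalty weights are illustrative placeholders.
A = torch.tensor([[1.0, 0.1], [0.0, 1.0]])   # double-integrator-like dynamics
B = torch.tensor([[0.0], [0.1]])
N, u_max, x_max = 10, 1.0, 5.0               # horizon and box constraints

# Policy maps (initial state, position reference) to an N-step control sequence.
policy = nn.Sequential(nn.Linear(3, 32), nn.ReLU(), nn.Linear(32, N))
opt = torch.optim.Adam(policy.parameters(), lr=1e-3)

for step in range(2000):
    x0 = 4.0 * (torch.rand(256, 2) - 0.5)    # sampled initial states
    ref = 4.0 * (torch.rand(256, 1) - 0.5)   # sampled position references
    u_seq = policy(torch.cat([x0, ref], dim=-1))

    loss, x = 0.0, x0
    for k in range(N):
        u = u_seq[:, k:k + 1]
        x = x @ A.T + u @ B.T                                    # differentiable rollout
        loss = loss + ((x[:, :1] - ref) ** 2).mean()             # reference tracking
        loss = loss + 0.1 * (u ** 2).mean()                      # control effort
        loss = loss + 10.0 * torch.relu(u.abs() - u_max).mean()  # input constraint penalty
        loss = loss + 10.0 * torch.relu(x.abs() - x_max).mean()  # state constraint penalty

    opt.zero_grad()
    loss.backward()   # direct policy gradients through the closed-loop rollout
    opt.step()
\end{verbatim}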

Crystal Structure Prediction (csp) is one of the central and most challenging problems in materials science and computational chemistry. In csp, the goal is to find a configuration of ions in 3D space that yields the lowest potential energy. Finding an efficient procedure to solve this complex optimisation question is a well-known open problem. Due to the exponentially large search space, the problem has been referred to in several materials-science papers as "NP-Hard and very challenging" without a formal proof. This paper fills a gap in the literature by providing the first set of formally proven NP-Hardness results for a variant of csp with various realistic constraints. In particular, we focus on the removal problem: the goal is to find a substructure with minimal potential energy by removing a subset of the ions. Our main contributions are NP-Hardness results for the csp removal problem, new embeddings of combinatorial graph problems into geometrical settings, and a more systematic exploration of the energy function to reveal the complexity of csp. In a wider context, our results contribute to the analysis of computational problems for weighted graphs embedded into three-dimensional Euclidean space.
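For concreteness, the removal variant can be stated as a subset-selection problem over a pairwise interatomic potential; the Coulomb-plus-Buckingham form below is a common choice in the csp literature and is shown only as an illustration of the objective, not necessarily the exact energy function analysed in the paper:
\[
\min_{S \subseteq I,\, |S| = n} \; E(S) = \sum_{\substack{i,j \in S \\ i < j}} \left( \frac{q_i q_j}{r_{ij}} + A_{ij}\, e^{-r_{ij}/\rho_{ij}} - \frac{C_{ij}}{r_{ij}^{6}} \right),
\]
where $I$ is the given set of ions with fixed positions and charges $q_i$, $r_{ij}$ is the Euclidean distance between ions $i$ and $j$, and $A_{ij}$, $\rho_{ij}$, $C_{ij}$ are species-dependent parameters; the associated decision problem asks whether removing $|I| - n$ ions can achieve energy at most a given threshold.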

We present a stepping stabilization control that addresses external push disturbances on bipedal walking robots. The stepping control is synthesized based on the step-to-step (S2S) dynamics of the robot, which is controlled to have an approximately constant center of mass (COM) height. We first learn a linear S2S dynamics model with bounded model discrepancy from the undisturbed walking behaviors of the robot, where the walking step size is taken as the control input to the S2S dynamics. External pushes are then considered as disturbances to the learned S2S (L-S2S) dynamics. We then apply the system-level-synthesis (SLS) approach to the disturbed L-S2S dynamics to robustly stabilize the robot to the desired walking behavior while satisfying the kinematic constraints of the robot. We successfully realize the proposed approach on the walking of the bipedal robots AMBER and Cassie subject to push disturbances, showing that the approach is general, effective, and computationally efficient for robust disturbance rejection.
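As a rough illustration of the pipeline, the sketch below fits a linear S2S model by least squares from (synthetic) step-to-step data and then computes a stabilizing step-size feedback. The data, dimensions, and gains are stand-ins, and a plain discrete LQR gain is used in place of the paper's robust SLS synthesis.

\begin{verbatim}
import numpy as np
from scipy.linalg import solve_discrete_are

# Sketch: fit a linear step-to-step (S2S) model x_{k+1} = A x_k + B u_k from
# walking data, then compute a stabilizing step-size feedback. The synthetic
# data and dimensions are illustrative, and LQR stands in for the paper's SLS.
rng = np.random.default_rng(0)

# Synthetic "logged" walking data: pre-impact COM state x_k (2D) and step size u_k.
K_steps = 200
X = rng.normal(size=(K_steps, 2))
U = rng.normal(size=(K_steps, 1))
A_true = np.array([[1.2, 0.3], [0.0, 0.9]])
B_true = np.array([[-0.5], [0.4]])
X_next = X @ A_true.T + U @ B_true.T + 0.01 * rng.normal(size=(K_steps, 2))

# Least-squares fit of the learned S2S (L-S2S) model.
Z = np.hstack([X, U])                                # regressors [x_k, u_k]
theta, *_ = np.linalg.lstsq(Z, X_next, rcond=None)   # (3, 2) coefficient matrix
A_hat, B_hat = theta[:2].T, theta[2:].T

# Step-size feedback u_k = -K x_k via discrete LQR (stand-in for SLS).
Q, R = np.eye(2), np.eye(1)
P = solve_discrete_are(A_hat, B_hat, Q, R)
K = np.linalg.solve(R + B_hat.T @ P @ B_hat, B_hat.T @ P @ A_hat)

print("closed-loop eigenvalues:", np.linalg.eigvals(A_hat - B_hat @ K))
\end{verbatim}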

We study constrained reinforcement learning (CRL) from a novel perspective by setting constraints directly on state density functions, rather than on the value functions considered by previous works. State density has a clear physical and mathematical interpretation and is able to express a wide variety of constraints, such as resource limits and safety requirements. Density constraints can also avoid the time-consuming process of designing and tuning the cost functions required by value-function-based constraints to encode system specifications. We leverage the duality between density functions and Q functions to develop an effective algorithm that solves the density-constrained RL problem optimally while guaranteeing that the constraints are satisfied. We prove that the proposed algorithm converges to a near-optimal solution with a bounded error even when the policy update is imperfect. We use a set of comprehensive experiments to demonstrate the advantages of our approach over state-of-the-art CRL methods, on a wide range of density-constrained tasks as well as standard CRL benchmarks such as Safety-Gym.
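To make the duality idea concrete, the following tabular sketch caps the visitation density of one "restricted" state and enforces the cap with a Lagrange multiplier folded into the Q-learning reward, updated by dual ascent on the empirical density. The tiny chain MDP, constraint level, and update schedule are illustrative assumptions and not the algorithm of the paper.

\begin{verbatim}
import numpy as np

# Tabular sketch of density-constrained RL via duality: the stationary density
# of a "restricted" state must stay below d_max; a Lagrange multiplier
# penalizes visits to it and is updated by dual ascent (illustrative only).
rng = np.random.default_rng(1)
n_states, n_actions = 5, 2
restricted, d_max = 2, 0.1              # density of state 2 must stay <= 0.1
gamma, alpha, beta = 0.95, 0.1, 0.05    # discount, Q step size, dual step size

Q = np.zeros((n_states, n_actions))
lam = 0.0                               # multiplier for the density constraint

def step(s, a):
    # Chain dynamics: action 0 moves left, action 1 moves right; reward at the end.
    s2 = max(s - 1, 0) if a == 0 else min(s + 1, n_states - 1)
    return s2, float(s2 == n_states - 1)

for episode in range(3000):
    s, visits = 0, np.zeros(n_states)
    for t in range(30):
        a = rng.integers(n_actions) if rng.random() < 0.1 else int(Q[s].argmax())
        s2, r = step(s, a)
        visits[s2] += 1
        # Dual (penalized) reward: pay lam whenever the restricted state is visited.
        r_pen = r - lam * float(s2 == restricted)
        Q[s, a] += alpha * (r_pen + gamma * Q[s2].max() - Q[s, a])
        s = s2
    # Dual ascent on the multiplier using the empirical visitation density.
    density = visits / visits.sum()
    lam = max(lam + beta * (density[restricted] - d_max), 0.0)

print("multiplier:", round(lam, 3),
      "restricted-state density:", round(density[restricted], 3))
\end{verbatim}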

In real-world settings, numerous constraints are present which are hard to specify mathematically. However, for the real-world deployment of reinforcement learning (RL), it is critical that RL agents are aware of these constraints so that they can act safely. In this work, we consider the problem of learning constraints from demonstrations of a constraint-abiding agent's behavior. We experimentally validate our approach and show that our framework can successfully learn the most likely constraints that the agent respects. We further show that these learned constraints are \textit{transferable} to new agents that may have different morphologies and/or reward functions. Previous works in this regard have mainly been restricted to tabular (discrete) settings or specific types of constraints, or have assumed knowledge of the environment's transition dynamics. In contrast, our framework is able to learn arbitrary \textit{Markovian} constraints in high dimensions in a completely model-free setting. The code can be found at: \url{//github.com/shehryar-malik/icrl}.

A core capability of intelligent systems is the ability to quickly learn new tasks by drawing on prior experience. Gradient-based (or optimization-based) meta-learning has recently emerged as an effective approach for few-shot learning. In this formulation, meta-parameters are learned in the outer loop, while task-specific models are learned in the inner loop using only a small amount of data from the current task. A key challenge in scaling these approaches is the need to differentiate through the inner-loop learning process, which can impose considerable computational and memory burdens. By drawing upon implicit differentiation, we develop the implicit MAML algorithm, which depends only on the solution to the inner-level optimization and not on the path taken by the inner-loop optimizer. This effectively decouples the meta-gradient computation from the choice of inner-loop optimizer. As a result, our approach is agnostic to the choice of inner-loop optimizer and can gracefully handle many gradient steps without vanishing gradients or memory constraints. Theoretically, we prove that implicit MAML can compute accurate meta-gradients with a memory footprint that is, up to small constant factors, no more than that required to compute a single inner-loop gradient, and with no overall increase in the total computational cost. Experimentally, we show that these benefits of implicit MAML translate into empirical gains on few-shot image recognition benchmarks.
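The key identity behind implicit MAML can be written compactly. With a proximally regularized inner problem, the implicit function theorem gives a meta-gradient that depends only on the inner solution $\phi^*$ and not on the optimization path (notation follows the usual presentation of implicit MAML):
\[
\phi^*(\theta) = \arg\min_{\phi}\; \hat{\mathcal{L}}_{\mathrm{tr}}(\phi) + \frac{\lambda}{2}\|\phi - \theta\|^2,
\qquad
\frac{d\phi^*}{d\theta} = \Big(I + \tfrac{1}{\lambda}\nabla^2_{\phi}\hat{\mathcal{L}}_{\mathrm{tr}}(\phi^*)\Big)^{-1},
\]
so the outer-loop gradient is
\[
\nabla_{\theta}\,\mathcal{L}_{\mathrm{test}}\big(\phi^*(\theta)\big)
= \Big(I + \tfrac{1}{\lambda}\nabla^2_{\phi}\hat{\mathcal{L}}_{\mathrm{tr}}(\phi^*)\Big)^{-1}\nabla_{\phi}\mathcal{L}_{\mathrm{test}}(\phi^*).
\]
In practice this linear system is solved approximately with a few conjugate-gradient steps using Hessian-vector products, so memory stays at roughly the cost of a single inner-loop gradient.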

Accurate detection and tracking of objects is vital for effective video understanding. In previous work, the two tasks have been combined in a way that tracking relies heavily on detection, while detection benefits only marginally from tracking. To increase the synergy, we propose to integrate the tasks more tightly by conditioning the object detection in the current frame on tracklets computed in prior frames. With this approach, the object detection results not only have high detection responses, but also improved coherence with the existing tracklets. This greater coherence leads to estimated object trajectories that are smoother and more stable than the jittered paths obtained without tracklet-conditioned detection. In extensive experiments, this approach is shown to achieve state-of-the-art performance in terms of both detection and tracking accuracy, as well as noticeable improvements in tracking stability.
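The abstract does not describe the conditioning mechanism itself; the sketch below shows one simple way to bias current-frame detections toward existing tracklets, by propagating each tracklet box with a constant-velocity assumption and boosting detection scores in proportion to their overlap with the propagated boxes. The propagation model and fusion weight are assumptions for illustration, not the method of the paper.

\begin{verbatim}
import numpy as np

def iou(a, b):
    """IoU of two boxes given as (x1, y1, x2, y2)."""
    x1, y1 = max(a[0], b[0]), max(a[1], b[1])
    x2, y2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(x2 - x1, 0.0) * max(y2 - y1, 0.0)
    area = lambda r: (r[2] - r[0]) * (r[3] - r[1])
    return inter / (area(a) + area(b) - inter + 1e-9)

def condition_on_tracklets(detections, scores, tracklets, w=0.3):
    """Boost each detection score by its best overlap with a tracklet prediction.
    `tracklets` holds the last two boxes of each track; a constant-velocity
    extrapolation (2*prev - prev2) predicts the current-frame box."""
    predicted = [prev + (prev - prev2) for prev2, prev in tracklets]
    fused = []
    for det, s in zip(detections, scores):
        best = max((iou(det, p) for p in predicted), default=0.0)
        fused.append((1 - w) * s + w * best)
    return np.array(fused)

# Toy usage: one tracklet moving right, two candidate detections in the new frame.
tracklets = [(np.array([0.0, 0.0, 10.0, 10.0]), np.array([2.0, 0.0, 12.0, 10.0]))]
detections = [np.array([4.0, 0.0, 14.0, 10.0]), np.array([40.0, 40.0, 50.0, 50.0])]
print(condition_on_tracklets(detections, [0.6, 0.6], tracklets))
\end{verbatim}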

To solve complex real-world problems with reinforcement learning, we cannot rely on manually specified reward functions. Instead, we can have humans communicate an objective to the agent directly. In this work, we combine two approaches to learning from human feedback: expert demonstrations and trajectory preferences. We train a deep neural network to model the reward function and use its predicted reward to train a DQN-based deep reinforcement learning agent on 9 Atari games. Our approach beats the imitation learning baseline in 7 games and achieves strictly superhuman performance on 2 games without using game rewards. Additionally, we investigate the goodness of fit of the reward model, present some reward hacking problems, and study the effects of noise in the human labels.
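A common way to fit a reward model from trajectory preferences is the Bradley-Terry formulation: the probability that one segment is preferred over another is a softmax over the summed predicted rewards, trained with cross-entropy against the human labels. The sketch below assumes PyTorch and fixed-length per-frame feature vectors; it illustrates the preference loss only, not the full pipeline of the paper (demonstrations, DQN training, label-noise handling).

\begin{verbatim}
import torch
import torch.nn as nn

# Bradley-Terry preference loss for a learned reward model (illustrative sketch).
# Each segment is a (T, d) tensor of per-frame features; label 1 means the
# first segment of a pair was preferred by the human, label 0 the second.
reward_net = nn.Sequential(nn.Linear(64, 128), nn.ReLU(), nn.Linear(128, 1))
opt = torch.optim.Adam(reward_net.parameters(), lr=3e-4)

def preference_loss(seg_a, seg_b, labels):
    # Sum the predicted per-frame rewards over each segment.
    r_a = reward_net(seg_a).sum(dim=1).squeeze(-1)   # (batch,)
    r_b = reward_net(seg_b).sum(dim=1).squeeze(-1)
    # P(seg_a preferred) = exp(r_a) / (exp(r_a) + exp(r_b)); cross-entropy vs labels.
    logits = r_a - r_b
    return nn.functional.binary_cross_entropy_with_logits(logits, labels)

# Toy training step with random stand-in data (32 pairs, 25 frames, 64-dim features).
seg_a, seg_b = torch.randn(32, 25, 64), torch.randn(32, 25, 64)
labels = torch.randint(0, 2, (32,)).float()
loss = preference_loss(seg_a, seg_b, labels)
opt.zero_grad(); loss.backward(); opt.step()
print(float(loss))
\end{verbatim}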

Machine learning algorithms have found several applications in the field of robotics and control systems. The control systems community has started to show interest in several machine learning algorithms from sub-domains such as supervised learning, imitation learning and reinforcement learning, to achieve autonomous control and intelligent decision making. Amongst many complex control problems, stable bipedal walking has been the most challenging problem. In this paper, we present an architecture to design and simulate a planar bipedal walking robot (BWR) using a realistic robotics simulator, Gazebo. The robot demonstrates successful walking behaviour by learning through several of its trials and errors, without any prior knowledge of itself or of the world dynamics. The autonomous walking of the BWR is achieved using a reinforcement learning algorithm called Deep Deterministic Policy Gradient (DDPG). DDPG is an algorithm for learning control policies in continuous action spaces. After training the model in simulation, it was observed that, with a properly shaped reward function, the robot achieved fast walking and even rendered a running gait with an average speed of 0.83 m/s. The gait pattern of the bipedal walker was compared with an actual human walking pattern. The results show that the bipedal walking pattern had characteristics similar to those of a human walking pattern.
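For reference, the core DDPG update amounts to a critic trained on a TD target from slowly updated target networks and an actor trained to maximize the critic. The sketch below, assuming PyTorch and placeholder state and action dimensions, shows a single update step and omits the replay buffer, exploration noise, and the Gazebo/robot interface used in the paper.

\begin{verbatim}
import copy
import torch
import torch.nn as nn

# One DDPG update step (illustrative; dimensions and hyperparameters are placeholders).
state_dim, action_dim, gamma, tau = 10, 4, 0.99, 0.005

actor = nn.Sequential(nn.Linear(state_dim, 64), nn.ReLU(),
                      nn.Linear(64, action_dim), nn.Tanh())
critic = nn.Sequential(nn.Linear(state_dim + action_dim, 64), nn.ReLU(),
                       nn.Linear(64, 1))
actor_t, critic_t = copy.deepcopy(actor), copy.deepcopy(critic)
actor_opt = torch.optim.Adam(actor.parameters(), lr=1e-4)
critic_opt = torch.optim.Adam(critic.parameters(), lr=1e-3)

def ddpg_update(s, a, r, s2, done):
    # Critic: regress Q(s, a) toward the bootstrapped target from target networks.
    with torch.no_grad():
        q_target = r + gamma * (1 - done) * critic_t(torch.cat([s2, actor_t(s2)], -1))
    q = critic(torch.cat([s, a], -1))
    critic_loss = nn.functional.mse_loss(q, q_target)
    critic_opt.zero_grad(); critic_loss.backward(); critic_opt.step()

    # Actor: deterministic policy gradient, i.e. maximize Q(s, actor(s)).
    actor_loss = -critic(torch.cat([s, actor(s)], -1)).mean()
    actor_opt.zero_grad(); actor_loss.backward(); actor_opt.step()

    # Polyak averaging of the target networks.
    for net, net_t in ((actor, actor_t), (critic, critic_t)):
        for p, p_t in zip(net.parameters(), net_t.parameters()):
            p_t.data.mul_(1 - tau).add_(tau * p.data)

# Toy batch sampled in place of a replay buffer.
batch = 32
ddpg_update(torch.randn(batch, state_dim), torch.rand(batch, action_dim) * 2 - 1,
            torch.randn(batch, 1), torch.randn(batch, state_dim),
            torch.zeros(batch, 1))
\end{verbatim}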
