
In order to perform highly dynamic and agile maneuvers, legged robots typically spend time in underactuated domains (e.g., with feet off the ground) where the system has limited command of its acceleration and a constrained amount of time before transitioning to a new domain (e.g., foot touchdown). Moreover, these domain transitions can amplify perturbations instantaneously and without bound. These properties make it difficult for local feedback controllers to recover effectively from disturbances as the system evolves through underactuated domains and hybrid impact events. To address this, we utilize the fundamental solution matrix, which characterizes the evolution of perturbations along a hybrid trajectory, and its 2-norm, which bounds the worst-case growth of perturbations. In this paper, this worst-case perturbation analysis is used to reason explicitly about the tracking performance of a hybrid trajectory and is incorporated into an iLQR framework to optimize a trajectory while accounting for its closed-loop convergence under an LQR tracking controller. The resulting convergent trajectories recover more effectively from perturbations, are more robust to large disturbances, and require less feedback control effort than trajectories generated with traditional optimization methods.
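As a rough sketch of the central quantity (with toy dynamics and a hand-picked gain of our choosing, not the authors' implementation), the following propagates the fundamental solution matrix of the closed-loop perturbation dynamics along a discretized trajectory and reports its 2-norm:

    import numpy as np

    def fundamental_solution_norm(A_seq, B_seq, K_seq, dt):
        # Propagate the fundamental solution matrix Phi of the closed-loop
        # perturbation dynamics  d(dx)/dt = (A - B K) dx  along a trajectory;
        # its 2-norm (largest singular value) bounds worst-case perturbation growth.
        n = A_seq[0].shape[0]
        Phi = np.eye(n)
        for A, B, K in zip(A_seq, B_seq, K_seq):
            # Euler step of the variational equation. At a hybrid event (e.g.
            # foot touchdown), Phi would additionally be multiplied by the
            # jump (saltation) matrix of that transition.
            Phi = (np.eye(n) + (A - B @ K) * dt) @ Phi
        return np.linalg.norm(Phi, 2)

    # Hypothetical double integrator tracked by a fixed LQR-style gain.
    A = np.array([[0.0, 1.0], [0.0, 0.0]])
    B = np.array([[0.0], [1.0]])
    K = np.array([[1.0, 1.8]])
    N = 200
    print(fundamental_solution_norm([A] * N, [B] * N, [K] * N, dt=0.01))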

Related Content

Tasks for autonomous robotic systems commonly require stabilization to a desired region while maintaining safety specifications. However, solving this multi-objective problem is challenging when the dynamics are nonlinear and high-dimensional, as traditional methods do not scale well and are often limited to specific problem structures. To address this issue, we propose a novel approach that solves the stabilize-avoid problem via the solution of an infinite-horizon constrained optimal control problem (OCP). We transform the constrained OCP into epigraph form, obtaining a two-stage optimization problem that optimizes over the policy in the inner problem and over an auxiliary variable in the outer problem. We then propose a new method for this formulation that combines an on-policy deep reinforcement learning algorithm with neural network regression. Compared with more traditional methods, our method is more stable during training, avoids the instabilities caused by saddle-point finding, and imposes no special requirements on the problem structure. We validate our approach on benchmark tasks ranging from low-dimensional toy examples to an F16 fighter jet with a 17-dimensional state space. Simulation results show that our approach consistently yields controllers that match or exceed the safety of existing methods while providing ten-fold increases in stability performance, stemming from larger regions of attraction.
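In outline (our notation; $J$ denotes the cost and $h$ the avoid-constraint function, with $h(x) \le 0$ meaning safe), the epigraph transformation replaces the constrained OCP with a scalar outer search and an unconstrained inner policy optimization:

$$\min_{\pi} J(\pi)\ \ \text{s.t.}\ \ \max_t h(x_t)\le 0 \quad\Longrightarrow\quad \min_{z\in\mathbb{R}} z\ \ \text{s.t.}\ \ \min_{\pi}\,\max\Bigl(J(\pi)-z,\ \max_t h(x_t)\Bigr)\le 0.$$

The inner problem over $\pi$ is unconstrained and thus amenable to on-policy deep RL, while the outer problem over the single auxiliary variable $z$ can be handled separately; the paper combines these two stages with neural network regression.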

This study presents a new convergence mechanism for learning in games. Learning in games considers how multiple agents maximize their own rewards through repeated play. In two-player zero-sum games in particular, where the agents compete for reward, each agent's reward depends on the opponent's strategy. A critical problem therefore emerges when both agents learn their strategies with standard algorithms such as replicator dynamics or gradient ascent: the learning dynamics often cycle and fail to converge to the optimal strategies, i.e., the Nash equilibrium. We tackle this problem from a novel perspective: asymmetry between the agents' learning algorithms. We consider games with memory, where the agents can store past actions and condition their subsequent actions on them, and focus on asymmetry in the agents' memory capacities. Interestingly, we demonstrate both theoretically and experimentally that the learning dynamics converge to the Nash equilibrium when the agents have different memory capacities. We also give an interpretation of this convergence: the agent with the longer memory can use a more complex strategy, which endows the other agent's utility with strict concavity.
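A minimal sketch of the failure mode being addressed (standard two-population replicator dynamics on matching pennies, with starting points of our choosing and no memory):

    import numpy as np

    # Payoff matrix for player 1 in matching pennies (two-player zero-sum).
    A = np.array([[1.0, -1.0], [-1.0, 1.0]])

    def replicator_step(x, y, dt=0.01):
        # Two-population replicator dynamics: the share of pure strategy i
        # grows when it earns more than the current mixed strategy's average.
        fx = A @ y            # payoffs of player 1's pure strategies
        fy = -A.T @ x         # zero-sum: player 2's payoffs are negated
        x = x + dt * x * (fx - x @ fx)
        y = y + dt * y * (fy - y @ fy)
        return x / x.sum(), y / y.sum()

    x, y = np.array([0.7, 0.3]), np.array([0.4, 0.6])
    for _ in range(3000):
        x, y = replicator_step(x, y)
    print(x, y)  # orbits the Nash equilibrium (1/2, 1/2) instead of converging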

This paper proposes an integrated sensing, navigation, and communication (ISNC) framework for safeguarding unmanned aerial vehicle (UAV)-enabled wireless networks against a mobile eavesdropping UAV (E-UAV). To cope with the mobility of the E-UAV, the proposed framework advocates the dual use of artificial noise transmitted by the information UAV (I-UAV) for simultaneous jamming and sensing to facilitate navigation and secure communication. In particular, the I-UAV communicates with legitimate downlink ground users, while avoiding potential information leakage by emitting jamming signals, and estimates the state of the E-UAV with an extended Kalman filter based on the backscattered jamming signals. Exploiting the estimated state of the E-UAV in the previous time slot, the I-UAV determines its flight planning strategy, predicts the wiretap channel, and designs its communication resource allocation policy for the next time slot. To circumvent the severe coupling between these three tasks, a divide-and-conquer approach is adopted. The online navigation design aims to minimize the distance between the I-UAV and a pre-defined destination point, subject to kinematic and geometric constraints. Subsequently, given the predicted wiretap channel, the robust resource allocation design is formulated as an optimization problem to achieve the optimal trade-off between sensing and communication in the next time slot, while taking into account the wiretap channel prediction error and the quality-of-service (QoS) requirements of secure communication. Simulation results demonstrate the superior performance of the proposed design compared with baseline schemes and validate the benefits of integrating sensing and navigation into secure UAV communication systems.
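As an illustration of the state-estimation step only (a hypothetical constant-velocity model with linear position measurements, so the Jacobians are trivial; the paper's measurement model based on backscattered jamming signals would make them nontrivial):

    import numpy as np

    def ekf_step(x, P, z, f, F_jac, h, H_jac, Q, R):
        # One predict/update cycle of an extended Kalman filter.
        x_pred = f(x)
        F = F_jac(x)
        P_pred = F @ P @ F.T + Q
        H = H_jac(x_pred)
        y = z - h(x_pred)                      # innovation
        S = H @ P_pred @ H.T + R
        K = P_pred @ H.T @ np.linalg.inv(S)    # Kalman gain
        return x_pred + K @ y, (np.eye(len(x)) - K @ H) @ P_pred

    # Hypothetical E-UAV state [px, py, vx, vy] with observed position.
    dt = 0.1
    F = np.block([[np.eye(2), dt * np.eye(2)], [np.zeros((2, 2)), np.eye(2)]])
    H = np.hstack([np.eye(2), np.zeros((2, 2))])
    x, P = np.zeros(4), np.eye(4)
    x, P = ekf_step(x, P, z=np.array([1.0, 2.0]),
                    f=lambda s: F @ s, F_jac=lambda s: F,
                    h=lambda s: H @ s, H_jac=lambda s: H,
                    Q=0.01 * np.eye(4), R=0.1 * np.eye(2))
    print(x)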

Reachable sets of nonlinear control systems can in general only be approximated numerically, and these approximations are typically very expensive to compute. In this paper, we explore strategies for choosing the temporal and spatial discretizations of Euler's method for reachable set computation in a non-uniform way to improve the performance of the method.
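A minimal sketch of the baseline scheme with uniform discretizations (the paper's contribution is precisely to choose the temporal step and spatial grid non-uniformly; the dynamics and parameters below are hypothetical):

    import numpy as np

    def euler_reach(f, X0, U, T, n_steps, h_space):
        # Crude Euler-scheme reachable-set approximation: propagate every
        # stored state one Euler step under every sampled control, then
        # snap the results onto a spatial grid to keep the set finite.
        dt = T / n_steps
        X = {tuple(x) for x in X0}
        for _ in range(n_steps):
            nxt = set()
            for x in X:
                x = np.array(x)
                for u in U:
                    y = x + dt * f(x, u)
                    nxt.add(tuple(np.round(y / h_space) * h_space))
            X = nxt
        return np.array(sorted(X))

    # Hypothetical scalar system xdot = u with |u| <= 1.
    f = lambda x, u: np.array([u])
    X0 = [np.array([0.0])]
    U = np.linspace(-1.0, 1.0, 5)
    print(euler_reach(f, X0, U, T=1.0, n_steps=10, h_space=0.05))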

The electric vehicle sharing problem (EVSP) arises from the planning and operation of one-way electric car-sharing systems. It aims to maximize the total rental time of a fleet of electric vehicles while ensuring that all customer demands are fulfilled. In this paper, we expand the knowledge on the complexity of the EVSP by showing that it is NP-hard to approximate within a factor of $n^{1-\epsilon}$ in polynomial time, for any $\epsilon > 0$, where $n$ denotes the number of customers, unless P = NP. In addition, we show that the problem lacks a monotone structure, which can be detrimental to the development of heuristics employing constructive strategies. Moreover, we propose a novel approach to modeling the EVSP based on energy flows in the network. Based on the new model, we propose a relax-and-fix strategy and an exact algorithm that uses a warm-start solution obtained from our heuristic approach. We report computational results comparing our formulation with the best-performing formulation in the literature. The results show that our formulation outperforms the previous one in the number of optimal solutions obtained, optimality gaps, and computational times. Previously, $32.7\%$ of the instances remained unsolved (within a time limit of one hour) by the best-performing formulation in the literature, while our formulation obtained optimal solutions for all of them. To stress-test our approaches, we generated two new, more challenging sets of instances, for which we solved $49.5\%$ of the instances, with an average optimality gap of $2.91\%$ for those not solved optimally.
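As a small illustration of the energy bookkeeping that any such formulation must encode (not the paper's energy-flow model; names and numbers below are ours):

    def battery_feasible(rentals, capacity, charge_rate):
        # rentals: time-ordered list of (start, end, energy) tuples for one
        # vehicle; the battery starts full, drains by `energy` per rental,
        # and recharges at `charge_rate` while the vehicle sits idle.
        level, t = capacity, 0.0
        for start, end, energy in rentals:
            level = min(capacity, level + charge_rate * (start - t))
            level -= energy
            if level < 0:
                return False      # this rental would over-drain the battery
            t = end
        return True

    # Two hypothetical rental plans for a 40 kWh vehicle charging at 10 kW:
    # an hour of idle charging makes the second plan's back-to-back rentals differ.
    print(battery_feasible([(0, 2, 30), (3, 5, 15)], capacity=40, charge_rate=10))  # True
    print(battery_feasible([(0, 2, 30), (2, 4, 15)], capacity=40, charge_rate=10))  # False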

We consider a multi-agent delegation mechanism without money. In our model, each of a set of agents holds a fixed number of solutions, exogenous to the mechanism, and privately sends a signal, e.g., a subset of its solutions, to the principal. The principal then selects a final solution based on the agents' signals. In stark contrast to the single-agent setting of Kleinberg and Kleinberg (EC'18) with an approximate Bayesian mechanism, we show that there exist efficient approximate prior-independent mechanisms achieving both information and performance gains, thanks to the competitive tension between the agents. Interestingly, however, the strength of this competitive effect varies significantly with the information available to the agents and the degree of correlation between the principal's and the agents' utilities. Technically, we conduct a comprehensive study of the multi-agent delegation problem and derive several results on the approximation factors of Bayesian and prior-independent mechanisms in complete- and incomplete-information settings. As a special case of independent interest, we obtain comparative statics in the number of agents, which imply the dominance of the multi-agent setting ($n \ge 2$) over the single-agent setting ($n=1$) in terms of the principal's utility. We further extend our problem by considering an examination cost for the mechanism and derive analogous results in the complete-information setting.
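A toy Monte Carlo illustration of the comparative statics in the number of agents (not the paper's mechanism: utilities here are i.i.d. uniform and uncorrelated across the two sides, and each agent naively signals its own favorite solution):

    import numpy as np

    rng = np.random.default_rng(0)

    def mean_principal_utility(n_agents, n_solutions=5, trials=20000):
        # Each agent signals the solution it likes best; the principal then
        # accepts the signaled solution with the highest principal utility.
        total = 0.0
        for _ in range(trials):
            best = 0.0
            for _ in range(n_agents):
                agent_u = rng.uniform(size=n_solutions)
                princ_u = rng.uniform(size=n_solutions)
                best = max(best, princ_u[np.argmax(agent_u)])
            total += best
        return total / trials

    # More agents -> higher principal utility, even with uncorrelated tastes.
    print(mean_principal_utility(1), mean_principal_utility(2))  # ~0.50 vs ~0.67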

The paper introduces DiSProD, an online planner developed for environments with probabilistic transitions in continuous state and action spaces. DiSProD builds a symbolic graph that captures the distribution of future trajectories, conditioned on a given policy, using independence assumptions and approximate propagation of distributions. The symbolic graph provides a differentiable representation of the policy's value, enabling efficient gradient-based optimization for long-horizon search. The propagation of approximate distributions can be seen as an aggregation of many trajectories, making it well-suited for dealing with sparse rewards and stochastic environments. An extensive experimental evaluation compares DiSProD to state-of-the-art planners in discrete-time planning and real-time control of robotic systems. The proposed method improves over existing planners in handling stochastic environments, sensitivity to search depth, sparsity of rewards, and large action spaces. Additional real-world experiments demonstrate that DiSProD can control ground vehicles and surface vessels to successfully navigate around obstacles.
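A stripped-down sketch of the idea (a hypothetical one-dimensional stochastic system with first-order moment propagation, and crude coordinate ascent standing in for the paper's gradient-based optimizer):

    import numpy as np

    def propagate(mu, var, a, noise_var=0.05):
        # Push mean and variance through the toy transition
        # x' = x + a * (1 - x) + w via a first-order expansion,
        # assuming independence (moment matching in the DiSProD spirit).
        g = 1.0 - a                       # d x'/d x for this dynamics
        return mu + a * (1.0 - mu), g * g * var + noise_var

    def expected_return(actions, mu0=0.0, var0=0.0):
        # Reward r(x) = -(x - 1)^2 has expectation -((mu - 1)^2 + var)
        # under a Gaussian approximation of the state distribution.
        mu, var, ret = mu0, var0, 0.0
        for a in actions:
            mu, var = propagate(mu, var, a)
            ret += -((mu - 1.0) ** 2 + var)
        return ret

    acts = np.full(10, 0.5)
    grid = np.linspace(0.0, 1.0, 21)
    for i in range(len(acts)):
        cand = [expected_return(np.concatenate([acts[:i], [g], acts[i + 1:]])) for g in grid]
        acts[i] = grid[int(np.argmax(cand))]
    print(expected_return(acts))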

We develop an autonomous navigation algorithm for a robot operating in two-dimensional environments containing obstacles with arbitrary non-convex shapes, which may lie in close proximity to one another, as long as there exists at least one safe path connecting the initial and the target location. The proposed approach relies on a hybrid feedback law guaranteeing asymptotic stability of the target location while ensuring forward invariance of the obstacle-free workspace. The hybrid feedback controller guarantees Zeno-free switching between a move-to-target mode and an obstacle-avoidance mode based on the robot's proximity to the obstacle-occupied workspace. An instrumental transformation that (virtually) reshapes the non-convex obstacles, in a non-conservative manner, is introduced to facilitate the design of the obstacle-avoidance strategy. Finally, we provide an algorithmic procedure for the sensor-based implementation of the proposed hybrid controller and validate its effectiveness via numerical simulations.
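A minimal sketch of the switching logic (a single disk obstacle and hand-tuned thresholds of our choosing; the paper handles arbitrary non-convex obstacles via the reshaping transformation):

    import numpy as np

    def hybrid_control(p, target, obs_c, obs_r, mode, d_in=0.3, d_out=0.5):
        # Hysteresis (d_in < d_out) separates the switching surfaces of the
        # two modes, which is what rules out Zeno (infinitely fast) switching.
        d = np.linalg.norm(p - obs_c) - obs_r
        if mode == "move-to-target" and d < d_in:
            mode = "obstacle-avoidance"
        elif mode == "obstacle-avoidance" and d > d_out:
            mode = "move-to-target"
        if mode == "move-to-target":
            u = target - p                    # steer straight at the target
        else:
            away = (p - obs_c) / np.linalg.norm(p - obs_c)
            u = np.array([-away[1], away[0]]) + 0.5 * away   # slide around it
        return u, mode

    p, mode = np.array([-2.0, 0.05]), "move-to-target"
    target, obs_c, obs_r = np.array([2.0, 0.0]), np.array([0.0, 0.0]), 0.5
    for _ in range(600):
        u, mode = hybrid_control(p, target, obs_c, obs_r, mode)
        p = p + 0.02 * u / max(np.linalg.norm(u), 1.0)
    print(p, mode)  # robot position and active mode after 600 steps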

We present a statistical inference approach to estimate the frequency noise characteristics of ultra-narrow linewidth lasers from delayed self-heterodyne beat note measurements using Bayesian inference. Particular emphasis is placed on estimating the intrinsic (Lorentzian) laser linewidth. The approach is based on a statistical model of the measurement process that accounts for the effects of the interferometer as well as detector noise; our method therefore yields accurate results even when the intrinsic linewidth plateau is obscured by detector noise. The regression is performed on periodogram data in the frequency domain using a Markov chain Monte Carlo method. Because the method exploits explicit knowledge of the statistical distribution of the observed data, it yields good results from a single time series and does not rely on averaging over many realizations. The approach is demonstrated on simulated time series data from a stochastic laser rate equation model with 1/f-type non-Markovian noise.
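A bare-bones sketch of the regression idea (a Whittle likelihood on a simulated periodogram with a hypothetical PSD model of white, 1/f, and detector-noise terms; the paper's full model additionally includes the interferometer transfer function):

    import numpy as np

    rng = np.random.default_rng(0)

    # Hypothetical one-sided frequency-noise PSD: a white level h0, which sets
    # the intrinsic Lorentzian linewidth via delta_nu = pi * h0, plus a 1/f
    # term and a flat detector-noise floor d.
    def psd(f, h0, c, d):
        return h0 + c / f + d

    def log_like(I, f, theta):
        # Whittle likelihood: periodogram ordinates are approximately
        # independent exponential variables with mean equal to the true PSD.
        S = psd(f, *np.exp(theta))       # parameters live in log-space
        return -np.sum(np.log(S) + I / S)

    # Simulate a periodogram from a known "true" PSD.
    f = np.linspace(1e3, 1e7, 2000)
    I = rng.exponential(psd(f, 1e2, 1e7, 5e1))

    # Random-walk Metropolis over log-parameters, from a rough initial guess.
    theta = np.log(np.array([50.0, 1e6, 20.0]))
    ll = log_like(I, f, theta)
    samples = []
    for _ in range(15000):
        prop = theta + 0.1 * rng.standard_normal(3)
        llp = log_like(I, f, prop)
        if np.log(rng.uniform()) < llp - ll:
            theta, ll = prop, llp
        samples.append(theta)
    h0 = np.exp(np.array(samples[5000:])[:, 0])
    print("intrinsic linewidth estimate:", np.pi * h0.mean(), "Hz")  # truth: pi * 1e2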

Advances in artificial intelligence often stem from the development of new environments that abstract real-world situations into a form where research can be done conveniently. This paper contributes such an environment based on ideas inspired by elementary Microeconomics. Agents learn to produce resources in a spatially complex world, trade them with one another, and consume those that they prefer. We show that the emergent production, consumption, and pricing behaviors respond to environmental conditions in the directions predicted by supply and demand shifts in Microeconomics. We also demonstrate settings where the agents' emergent prices for goods vary over space, reflecting the local abundance of goods. After the price disparities emerge, some agents then discover a niche of transporting goods between regions with different prevailing prices -- a profitable strategy because they can buy goods where they are cheap and sell them where they are expensive. Finally, in a series of ablation experiments, we investigate how choices in the environmental rewards, bartering actions, agent architecture, and ability to consume tradable goods can either aid or inhibit the emergence of this economic behavior. This work is part of the environment development branch of a research program that aims to build human-like artificial general intelligence through multi-agent interactions in simulated societies. By exploring which environment features are needed for the basic phenomena of elementary microeconomics to emerge automatically from learning, we arrive at an environment that differs from those studied in prior multi-agent reinforcement learning work along several dimensions. For example, the model incorporates heterogeneous tastes and physical abilities, and agents negotiate with one another as a grounded form of communication.
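A toy illustration of the arbitrage niche described above (two regions, a scarcity-based price as a crude stand-in for the emergent prices, and a single transporter rule; all names and numbers are ours):

    # Toy two-region economy: the local "price" of a good rises with scarcity.
    def price(stock, base=10.0):
        return base / max(stock, 1)

    stock = {"west": 50, "east": 5}
    for _ in range(20):
        if price(stock["east"]) - price(stock["west"]) > 0.5:  # transporter margin
            stock["west"] -= 1   # buy a unit where the good is cheap ...
            stock["east"] += 1   # ... and sell it where it is expensive
    # The price gap shrinks until arbitrage is no longer profitable.
    print(stock, price(stock["west"]), price(stock["east"]))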
