
Neural network controllers have become popular in control tasks thanks to their flexibility and expressivity. Stability is a crucial property for safety-critical dynamical systems, and stabilization of partially observed systems often requires controllers that retain and process long-term memories of the past. We consider the important class of recurrent neural networks (RNNs) as dynamic controllers for nonlinear, uncertain, partially observed systems, and derive convex stability conditions based on integral quadratic constraints, the S-lemma, and sequential convexification. To ensure stability throughout the learning and control process, we propose a projected policy gradient method that iteratively enforces the stability conditions in the reparametrized space, taking advantage of mild additional information on the system dynamics. Numerical experiments show that our method learns stabilizing controllers while using fewer samples and achieving higher final performance compared with policy gradient.
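A minimal sketch of the projected policy-gradient loop this abstract describes: take a gradient step on the controller parameters, then project back onto the convex stability set. All names are hypothetical, and the actual convex conditions (derived via IQCs and the S-lemma) are abstracted behind a user-supplied projection routine.

    import numpy as np

    def projected_policy_gradient(theta0, grad_estimator, project_onto_stability_set,
                                  step_size=1e-3, iters=1000):
        """Hypothetical sketch: ascend the return, then re-enter the stability set."""
        theta = theta0
        for _ in range(iters):
            g = grad_estimator(theta)                    # policy-gradient estimate
            theta = theta + step_size * g                # unconstrained ascent step
            # project_onto_stability_set is assumed to solve a convex program that
            # returns the closest parameter vector satisfying the stability conditions
            theta = project_onto_stability_set(theta)
        return theta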

Related content

Neuro-symbolic approaches to artificial intelligence, which combine neural networks with classical symbolic techniques, are growing in prominence, necessitating formal approaches to reason about their correctness. We propose a novel modelling formalism called neuro-symbolic concurrent stochastic games (NS-CSGs), which comprise a set of probabilistic finite-state agents interacting in a shared continuous-state environment, observed through perception mechanisms implemented as neural networks. Since the environment state space is continuous, we focus on the class of NS-CSGs with Borel state spaces and Borel measurability restrictions on the components of the model. We consider the problem of zero-sum discounted cumulative reward, proving that NS-CSGs are determined and therefore have a value which corresponds to a unique fixed point. From an algorithmic perspective, existing methods to compute values and optimal strategies for CSGs focus on finite state spaces. We present, for the first time, value iteration and policy iteration algorithms to solve a class of uncountable state space CSGs, and prove their convergence. Our approach works by formulating piecewise linear or constant representations of the value functions and strategies of NS-CSGs. We validate the approach with a prototype implementation applied to a dynamic vehicle parking example.
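As a rough illustration of the value iteration the abstract refers to, the sketch below runs Shapley-style iteration on a finite-state zero-sum concurrent stochastic game, solving a matrix game per state via a linear program. The paper's NS-CSGs have Borel (continuous) state spaces and neural perception mechanisms; the finite reward tensor R and transition tensor P here are assumed inputs for illustration only.

    import numpy as np
    from scipy.optimize import linprog

    def matrix_game_value(A):
        """Value of the zero-sum matrix game max_x min_y x^T A y, via an LP."""
        m, n = A.shape
        c = np.concatenate([np.zeros(m), [-1.0]])          # maximize v
        A_ub = np.hstack([-A.T, np.ones((n, 1))])          # v <= (A^T x)_j for all j
        b_ub = np.zeros(n)
        A_eq = np.concatenate([np.ones(m), [0.0]]).reshape(1, -1)
        b_eq = np.array([1.0])                             # x is a distribution
        bounds = [(0, None)] * m + [(None, None)]
        res = linprog(c, A_ub=A_ub, b_ub=b_ub, A_eq=A_eq, b_eq=b_eq, bounds=bounds)
        return -res.fun

    def value_iteration(R, P, gamma=0.95, tol=1e-6):
        """R[s]: m x n stage rewards; P[s, a1, a2]: next-state distribution."""
        S = R.shape[0]
        V = np.zeros(S)
        while True:
            V_new = np.array([matrix_game_value(R[s] + gamma * P[s] @ V)
                              for s in range(S)])
            if np.max(np.abs(V_new - V)) < tol:
                return V_new
            V = V_new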

Collision-free trajectory generation within a shared workspace is fundamental for most multi-robot applications. However, despite their versatility, many widely used methods based on model predictive control (MPC) lack theoretical guarantees on the feasibility of the underlying optimization. Furthermore, when applied in a distributed manner, deadlocks often occur, in which several robots block each other indefinitely. To this end, we propose a systematic method called infinite-horizon model predictive control with deadlock resolution (IMPC-DR). It provably ensures recursive feasibility and effectively resolves deadlocks online, in addition to handling input and model constraints. The method is based on formulating a convex optimization over the proposed modified buffered Voronoi cells in each planning horizon. Moreover, it is fully distributed and requires only local inter-robot communication. Comprehensive simulation and experimental studies are conducted on large-scale multi-robot systems, showing significant improvements in both feasibility and success rate over other state-of-the-art methods, especially in crowded and high-speed scenarios.
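For context, the sketch below builds the standard buffered Voronoi cell (BVC) half-space constraints that methods of this kind (including the modified cells proposed here) start from. Robot positions and the safety radius are hypothetical inputs; the paper's modifications for recursive feasibility and deadlock resolution are not reproduced.

    import numpy as np

    def bvc_halfspaces(p_i, others, r_safe):
        """Return (a, b) pairs so that robot i's BVC is {p : a @ p <= b}."""
        halfspaces = []
        for p_j in others:
            d = p_j - p_i                       # direction towards neighbour j
            mid = 0.5 * (p_i + p_j)             # midpoint of the separating plane
            a = d
            b = a @ mid - r_safe * np.linalg.norm(d)   # retract plane by the buffer
            halfspaces.append((a, b))
        return halfspaces

    # Within each MPC horizon, every planned position p_k of robot i would be
    # constrained by a @ p_k <= b for all of these half-spaces, keeping a
    # buffer of r_safe between robots.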

We are motivated by the problem of performing failure prediction for safety-critical robotic systems with high-dimensional sensor observations (e.g., vision). Given access to a blackbox control policy (e.g., in the form of a neural network) and a dataset of training environments, we present an approach for synthesizing a failure predictor with guaranteed bounds on false-positive and false-negative errors. To achieve this, we utilize techniques from Probably Approximately Correct (PAC)-Bayes generalization theory. In addition, we present novel class-conditional bounds that allow us to trade off the relative rates of false-positive and false-negative errors. We propose algorithms that train failure predictors (which take as input the history of sensor observations) by minimizing our theoretical error bounds. We demonstrate the resulting approach using extensive simulation and hardware experiments for vision-based navigation with a drone and for grasping objects with a robotic manipulator equipped with a wrist-mounted RGB-D camera. These experiments illustrate the ability of our approach to (1) provide strong bounds on failure prediction error rates (that closely match empirical error rates), and (2) improve safety by predicting failures.
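A small sketch of what "class-conditional PAC-Bayes bounds" can look like in practice: the empirical error is split by true class (failure vs. non-failure), and a generic McAllester-style bound is evaluated on each slice. This is an assumed, simplified form; the paper's exact bound and the way KL(Q||P) is obtained are not reproduced here.

    import numpy as np

    def mcallester_bound(emp_risk, kl, n, delta):
        """Generic PAC-Bayes upper bound on true risk, valid with prob. >= 1 - delta."""
        return emp_risk + np.sqrt((kl + np.log(2 * np.sqrt(n) / delta)) / (2 * n))

    def class_conditional_bounds(errors, labels, kl, delta):
        """errors, labels: 0/1 arrays; labels == 1 marks true failures."""
        fn = errors[labels == 1]    # errors on failures      -> false-negative rate
        fp = errors[labels == 0]    # errors on non-failures  -> false-positive rate
        return (mcallester_bound(fn.mean(), kl, len(fn), delta / 2),
                mcallester_bound(fp.mean(), kl, len(fp), delta / 2))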

The problem of finding a nonzero solution of a linear recurrence $Ly = 0$ with polynomial coefficients where $y$ has the form of a definite hypergeometric sum, related to the Inverse Creative Telescoping Problem of [14][Sec. 8], has now been open for three decades. Here we present an algorithm (implemented in a SageMath package) which, given such a recurrence and a quasi-triangular, shift-compatible factorial basis $\mathcal{B} = \langle P_k(n)\rangle_{k=0}^\infty$ of the polynomial space $\mathbb{K}[n]$ over a field $\mathbb{K}$ of characteristic zero, computes a recurrence satisfied by the coefficient sequence $c = \langle c_k\rangle_{k=0}^\infty$ of the solution $y_n = \sum_{k=0}^\infty c_kP_k(n)$ (where, thanks to the quasi-triangularity of $\mathcal{B}$, the sum on the right terminates for each $n \in \mathbb{N}$). More generally, if $\mathcal{B}$ is $m$-sieved for some $m \in \mathbb{N}$, our algorithm computes a system of $m$ recurrences satisfied by the $m$-sections of the coefficient sequence $c$. If an explicit nonzero solution of this system can be found, we obtain an explicit nonzero solution of $Ly = 0$.
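To make the quasi-triangularity point concrete, here is a small illustration using the binomial basis P_k(n) = C(n, k), a classic factorial basis: for each fixed n the expansion y_n = sum_k c_k C(n, k) terminates at k = n, so y can be evaluated from finitely many coefficients. This only illustrates the expansion step, not the algorithm that derives the recurrence system for c.

    from math import comb

    def y_from_coeffs(c, n):
        """Evaluate y_n = sum_{k=0}^{n} c_k * C(n, k) (truncated if c is short)."""
        return sum(c[k] * comb(n, k) for k in range(min(n, len(c) - 1) + 1))

    # Example: c_k = 1 for all k gives y_n = 2^n, which satisfies y_{n+1} - 2 y_n = 0.
    c = [1] * 20
    assert all(y_from_coeffs(c, n + 1) - 2 * y_from_coeffs(c, n) == 0 for n in range(10))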

Safety is a critical component of autonomous systems and remains a challenge for learning-based policies to be utilized in the real world. In particular, policies learned using reinforcement learning often fail to generalize to novel environments due to unsafe behavior. In this paper, we propose Sim-to-Lab-to-Real to safely close the reality gap. To improve safety, we apply a dual policy setup where a performance policy is trained using the cumulative task reward and a backup (safety) policy is trained by solving the reach-avoid Bellman Equation based on Hamilton-Jacobi reachability analysis. In Sim-to-Lab transfer, we apply a supervisory control scheme to shield unsafe actions during exploration; in Lab-to-Real transfer, we leverage the Probably Approximately Correct (PAC)-Bayes framework to provide lower bounds on the expected performance and safety of policies in unseen environments. We empirically study the proposed framework for ego-vision navigation in two types of indoor environments including a photo-realistic one. We also demonstrate strong generalization performance through hardware experiments in real indoor spaces with a quadrupedal robot. See //sites.google.com/princeton.edu/sim-to-lab-to-real for supplementary material.
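A minimal sketch of the supervisory (shielding) scheme used during Sim-to-Lab transfer: the backup safety critic vetoes the performance policy's action when the learned reach-avoid value signals danger. Function names, the sign convention, and the threshold are assumptions for illustration.

    def shielded_action(obs, perf_policy, backup_policy, safety_critic, threshold=0.0):
        a_perf = perf_policy(obs)
        # Convention assumed here: safety_critic(obs, a) > threshold means the
        # reach-avoid value predicts a future safety violation under a_perf.
        if safety_critic(obs, a_perf) > threshold:
            return backup_policy(obs)   # fall back to the safety (backup) policy
        return a_perf                   # otherwise keep the task-oriented action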

Unresolved data association in ambiguous and perceptually aliased environments leads to multi-modal hypotheses on both the robot's state and the environment state. To avoid catastrophic results when operating in such ambiguous environments, it is crucial to reason about data association within Belief Space Planning (BSP). However, when all possible data associations are considered explicitly, the number of hypotheses grows exponentially with the planning horizon, and determining the optimal action sequence quickly becomes intractable. Moreover, under hard budget constraints where some non-negligible hypotheses must be pruned, achieving performance guarantees is crucial. In this work we present a novel, computationally efficient approach that utilizes only a distilled subset of hypotheses to solve BSP problems while reasoning about data association. Furthermore, to provide performance guarantees, we derive error bounds with respect to the optimal solution. We then demonstrate our approach in an extremely aliased environment, where we significantly reduce computation time without compromising the quality of the solution.
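As a rough sketch of "planning with a distilled subset of hypotheses": keep the highest-weight data-association hypotheses, renormalise, and track the pruned probability mass. The paper derives principled error bounds with respect to the optimal solution; the pruned mass below is only the obvious bookkeeping, not those bounds.

    import numpy as np

    def distill_hypotheses(weights, budget):
        """weights: unnormalised hypothesis weights; budget: max hypotheses kept."""
        w = np.asarray(weights, dtype=float)
        w = w / w.sum()
        keep = np.argsort(w)[::-1][:budget]     # indices of the top-weight hypotheses
        pruned_mass = 1.0 - w[keep].sum()       # probability mass that was discarded
        return keep, w[keep] / w[keep].sum(), pruned_mass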

The conjoining of dynamical systems and deep learning has become a topic of great interest. In particular, neural differential equations (NDEs) demonstrate that neural networks and differential equations are two sides of the same coin. Traditional parameterised differential equations are a special case, and many popular neural network architectures, such as residual networks and recurrent networks, are discretisations of differential equations. NDEs are suitable for tackling generative problems, dynamical systems, and time series (particularly in physics, finance, ...) and are thus of interest to both modern machine learning and traditional mathematical modelling. NDEs offer high-capacity function approximation, strong priors on model space, the ability to handle irregular data, memory efficiency, and a wealth of available theory on both sides. This doctoral thesis provides an in-depth survey of the field. Topics include: neural ordinary differential equations (e.g. for hybrid neural/mechanistic modelling of physical systems); neural controlled differential equations (e.g. for learning functions of irregular time series); and neural stochastic differential equations (e.g. to produce generative models capable of representing complex stochastic dynamics, or sampling from complex high-dimensional distributions). Further topics include: numerical methods for NDEs (e.g. reversible differential equation solvers, backpropagation through differential equations, Brownian reconstruction); symbolic regression for dynamical systems (e.g. via regularised evolution); and deep implicit models (e.g. deep equilibrium models, differentiable optimisation). We anticipate this thesis will be of interest to anyone interested in the marriage of deep learning with dynamical systems, and hope it will provide a useful reference for the current state of the art.
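A minimal neural-ODE forward pass, to make the "two sides of the same coin" point concrete: the hidden state evolves under a learned vector field f_theta, integrated here with a fixed-step RK4 solver. This is an illustrative sketch (a two-layer tanh network with random weights), not the adaptive or reversible solvers discussed in the thesis.

    import numpy as np

    rng = np.random.default_rng(0)
    W1, b1 = rng.standard_normal((32, 2)) * 0.1, np.zeros(32)
    W2, b2 = rng.standard_normal((2, 32)) * 0.1, np.zeros(2)

    def f_theta(t, y):
        """Learned vector field dy/dt = f_theta(t, y)."""
        return W2 @ np.tanh(W1 @ y + b1) + b2

    def rk4_integrate(f, y0, t0, t1, steps=100):
        y, t, h = y0, t0, (t1 - t0) / steps
        for _ in range(steps):
            k1 = f(t, y)
            k2 = f(t + h / 2, y + h / 2 * k1)
            k3 = f(t + h / 2, y + h / 2 * k2)
            k4 = f(t + h, y + h * k3)
            y = y + h / 6 * (k1 + 2 * k2 + 2 * k3 + k4)
            t = t + h
        return y

    y1 = rk4_integrate(f_theta, np.array([1.0, 0.0]), 0.0, 1.0)   # the ODE "forward pass"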

How to obtain good value estimates is one of the key problems in Reinforcement Learning (RL). Current value estimation methods, such as DDPG and TD3, suffer from unnecessary overestimation or underestimation bias. In this paper, we explore the potential of double actors, which has long been neglected, for better value function estimation in the continuous control setting. First, we uncover and demonstrate the bias-alleviation property of double actors by building double actors upon a single critic and upon double critics, to handle the overestimation bias in DDPG and the underestimation bias in TD3, respectively. Next, interestingly, we find that double actors help improve the exploration ability of the agent. Finally, to mitigate the uncertainty of value estimates from double critics, we further propose to regularize the critic networks under the double-actors architecture, which gives rise to the Double Actors Regularized Critics (DARC) algorithm. Extensive experimental results on challenging continuous control tasks show that DARC significantly outperforms state-of-the-art methods with higher sample efficiency.
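A sketch of the double-actor / double-critic wiring with a critic-consistency regulariser, as described in the abstract. This is an illustrative PyTorch fragment under assumed names (actor1, actor2, critic1, ...), and the target construction is a hedged stand-in; the exact DARC update is specified in the paper, not here.

    import torch

    def darc_style_critic_loss(batch, actor1, actor2, critic1, critic2,
                               target_critic1, target_critic2, gamma=0.99, nu=0.1):
        obs, act, rew, next_obs, done = batch
        with torch.no_grad():
            # Each actor proposes a next action; take the larger of the two
            # per-critic estimates, then the smaller across critics (an assumed
            # bias-balancing combination standing in for the paper's target).
            a1, a2 = actor1(next_obs), actor2(next_obs)
            q1 = torch.max(target_critic1(next_obs, a1), target_critic1(next_obs, a2))
            q2 = torch.max(target_critic2(next_obs, a1), target_critic2(next_obs, a2))
            target = rew + gamma * (1 - done) * torch.min(q1, q2)
        q1_pred, q2_pred = critic1(obs, act), critic2(obs, act)
        td_loss = ((q1_pred - target) ** 2 + (q2_pred - target) ** 2).mean()
        # Regularise the two critics towards each other to reduce estimate uncertainty.
        consistency = ((q1_pred - q2_pred) ** 2).mean()
        return td_loss + nu * consistency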

Sampling methods (e.g., node-wise, layer-wise, or subgraph sampling) have become an indispensable strategy for speeding up the training of large-scale Graph Neural Networks (GNNs). However, existing sampling methods are mostly based on graph structural information and ignore the dynamics of optimization, which leads to high variance in estimating the stochastic gradients. The high-variance issue can be very pronounced in extremely large graphs, where it results in slow convergence and poor generalization. In this paper, we theoretically analyze the variance of sampling methods and show that, due to the composite structure of the empirical risk, the variance of any sampling method can be decomposed into embedding approximation variance in the forward stage and stochastic gradient variance in the backward stage, and that mitigating both types of variance is necessary to obtain a faster convergence rate. We propose a decoupled variance reduction strategy that employs (approximate) gradient information to adaptively sample nodes with minimal variance, and that explicitly reduces the variance introduced by embedding approximation. We show theoretically and empirically that the proposed method, even with smaller mini-batch sizes, enjoys a faster convergence rate and better generalization compared to existing methods.
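A sketch of the gradient-informed sampling side of such a strategy: nodes are drawn with probability proportional to (approximate) per-node gradient norms and the sampled terms are importance-weighted so the batch average remains an unbiased estimate of the full mean gradient. This illustrates only the adaptive sampling idea; the embedding-approximation variance reduction discussed in the abstract is not shown, and the names are hypothetical.

    import numpy as np

    def sample_nodes(grad_norm_estimates, batch_size, rng=None):
        """Draw nodes ~ p_i proportional to approximate gradient norms."""
        rng = rng or np.random.default_rng()
        p = np.asarray(grad_norm_estimates, dtype=float)
        p = p / p.sum()
        idx = rng.choice(len(p), size=batch_size, replace=True, p=p)
        # Importance weights: averaging w[i] * g[i] over the batch is unbiased
        # for the mean of the per-node gradients g.
        weights = 1.0 / (len(p) * p[idx])
        return idx, weights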

Deep reinforcement learning (RL) methods generally engage in exploratory behavior through noise injection in the action space. An alternative is to add noise directly to the agent's parameters, which can lead to more consistent exploration and a richer set of behaviors. Methods such as evolutionary strategies use parameter perturbations, but discard all temporal structure in the process and require significantly more samples. Combining parameter noise with traditional RL methods allows us to combine the best of both worlds. We demonstrate that both off- and on-policy methods benefit from this approach through experimental comparison of DQN, DDPG, and TRPO on high-dimensional discrete-action environments as well as continuous control tasks. Our results show that RL with parameter noise learns more efficiently than either traditional RL with action-space noise or evolutionary strategies alone.
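A minimal sketch of parameter-space noise for exploration: perturb a copy of the policy weights once per rollout, and adapt the noise scale so that the induced change in actions stays near a target, in the spirit of the adaptive scaling described in the paper. The constants and function names here are placeholders.

    import copy
    import torch

    def perturb_policy(policy, stddev):
        """Return a copy of the policy with Gaussian noise added to every weight."""
        noisy = copy.deepcopy(policy)
        with torch.no_grad():
            for p in noisy.parameters():
                p.add_(torch.randn_like(p) * stddev)
        return noisy

    def adapt_stddev(stddev, action_distance, target=0.2, alpha=1.01):
        # Grow the noise if the perturbed policy acts too similarly to the
        # unperturbed one, shrink it otherwise.
        return stddev * alpha if action_distance < target else stddev / alpha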
