
Neural network controllers have become popular in control tasks thanks to their flexibility and expressivity. Stability is a crucial property for safety-critical dynamical systems, and stabilization of partially observed systems often requires controllers that retain and process long-term memories of the past. We consider the important class of recurrent neural networks (RNNs) as dynamic controllers for nonlinear, uncertain, partially observed systems, and derive convex stability conditions based on integral quadratic constraints, the S-lemma, and sequential convexification. To ensure stability throughout the learning and control process, we propose a projected policy gradient method that iteratively enforces the stability conditions in the reparametrized space, taking advantage of mild additional information on the system dynamics. Numerical experiments show that our method learns stabilizing controllers while using fewer samples and achieving higher final performance compared with policy gradient.
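A minimal sketch of the projected policy-gradient loop this abstract describes: take a gradient step on the controller parameters, then project back onto the convex stability set. All names are hypothetical, and the actual convex conditions (derived via IQCs and the S-lemma) are abstracted behind a user-supplied projection routine.

    import numpy as np

    def projected_policy_gradient(theta0, grad_estimator, project_onto_stability_set,
                                  step_size=1e-3, iters=1000):
        """Hypothetical sketch: ascend the return, then re-enter the stability set."""
        theta = theta0
        for _ in range(iters):
            g = grad_estimator(theta)                    # policy-gradient estimate
            theta = theta + step_size * g                # unconstrained ascent step
            # project_onto_stability_set is assumed to solve a convex program that
            # returns the closest parameter vector satisfying the stability conditions
            theta = project_onto_stability_set(theta)
        return theta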

Related content

Neuro-symbolic approaches to artificial intelligence, which combine neural networks with classical symbolic techniques, are growing in prominence, necessitating formal approaches to reason about their correctness. We propose a novel modelling formalism called neuro-symbolic concurrent stochastic games (NS-CSGs), which comprise a set of probabilistic finite-state agents interacting in a shared continuous-state environment, observed through perception mechanisms implemented as neural networks. Since the environment state space is continuous, we focus on the class of NS-CSGs with Borel state spaces and Borel measurability restrictions on the components of the model. We consider the problem of zero-sum discounted cumulative reward, proving that NS-CSGs are determined and therefore have a value which corresponds to a unique fixed point. From an algorithmic perspective, existing methods to compute values and optimal strategies for CSGs focus on finite state spaces. We present, for the first time, value iteration and policy iteration algorithms to solve a class of uncountable state space CSGs, and prove their convergence. Our approach works by formulating piecewise linear or constant representations of the value functions and strategies of NS-CSGs. We validate the approach with a prototype implementation applied to a dynamic vehicle parking example.
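As a rough illustration of the value iteration the abstract refers to, the sketch below runs Shapley-style iteration on a finite-state zero-sum concurrent stochastic game, solving a matrix game per state via a linear program. The paper's NS-CSGs have Borel (continuous) state spaces and neural perception mechanisms; the finite reward tensor R and transition tensor P here are assumed inputs for illustration only.

    import numpy as np
    from scipy.optimize import linprog

    def matrix_game_value(A):
        """Value of the zero-sum matrix game max_x min_y x^T A y, via an LP."""
        m, n = A.shape
        c = np.concatenate([np.zeros(m), [-1.0]])          # maximize v
        A_ub = np.hstack([-A.T, np.ones((n, 1))])          # v <= (A^T x)_j for all j
        b_ub = np.zeros(n)
        A_eq = np.concatenate([np.ones(m), [0.0]]).reshape(1, -1)
        b_eq = np.array([1.0])                             # x is a distribution
        bounds = [(0, None)] * m + [(None, None)]
        res = linprog(c, A_ub=A_ub, b_ub=b_ub, A_eq=A_eq, b_eq=b_eq, bounds=bounds)
        return -res.fun

    def value_iteration(R, P, gamma=0.95, tol=1e-6):
        """R[s]: m x n stage rewards; P[s, a1, a2]: next-state distribution."""
        S = R.shape[0]
        V = np.zeros(S)
        while True:
            V_new = np.array([matrix_game_value(R[s] + gamma * P[s] @ V)
                              for s in range(S)])
            if np.max(np.abs(V_new - V)) < tol:
                return V_new
            V = V_new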

Collision-free trajectory generation within a shared workspace is fundamental for most multi-robot applications. However, despite their versatility, many widely used methods based on model predictive control (MPC) lack theoretical guarantees on the feasibility of the underlying optimization. Furthermore, when applied in a distributed manner, deadlocks often occur, in which several robots block each other indefinitely. To this end, we propose a systematic method called infinite-horizon model predictive control with deadlock resolution (IMPC-DR). It provably ensures recursive feasibility and effectively resolves deadlocks online, in addition to handling input and model constraints. The method is based on formulating a convex optimization over the proposed modified buffered Voronoi cells in each planning horizon. Moreover, it is fully distributed and requires only local inter-robot communication. Comprehensive simulation and experimental studies are conducted on large-scale multi-robot systems, showing significant improvements in both feasibility and success rate over other state-of-the-art methods, especially in crowded and high-speed scenarios.
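For context, the sketch below builds the standard buffered Voronoi cell (BVC) half-space constraints that methods of this kind (including the modified cells proposed here) start from. Robot positions and the safety radius are hypothetical inputs; the paper's modifications for recursive feasibility and deadlock resolution are not reproduced.

    import numpy as np

    def bvc_halfspaces(p_i, others, r_safe):
        """Return (a, b) pairs so that robot i's BVC is {p : a @ p <= b}."""
        halfspaces = []
        for p_j in others:
            d = p_j - p_i                       # direction towards neighbour j
            mid = 0.5 * (p_i + p_j)             # midpoint of the separating plane
            a = d
            b = a @ mid - r_safe * np.linalg.norm(d)   # retract plane by the buffer
            halfspaces.append((a, b))
        return halfspaces

    # Within each MPC horizon, every planned position p_k of robot i would be
    # constrained by a @ p_k <= b for all of these half-spaces, keeping a
    # buffer of r_safe between robots.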

We are motivated by the problem of performing failure prediction for safety-critical robotic systems with high-dimensional sensor observations (e.g., vision). Given access to a blackbox control policy (e.g., in the form of a neural network) and a dataset of training environments, we present an approach for synthesizing a failure predictor with guaranteed bounds on false-positive and false-negative errors. To achieve this, we utilize techniques from Probably Approximately Correct (PAC)-Bayes generalization theory. In addition, we present novel class-conditional bounds that allow us to trade off the relative rates of false-positive and false-negative errors. We propose algorithms that train failure predictors (which take as input the history of sensor observations) by minimizing our theoretical error bounds. We demonstrate the resulting approach using extensive simulation and hardware experiments for vision-based navigation with a drone and for grasping objects with a robotic manipulator equipped with a wrist-mounted RGB-D camera. These experiments illustrate the ability of our approach to (1) provide strong bounds on failure prediction error rates (that closely match empirical error rates), and (2) improve safety by predicting failures.
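A small sketch of what "class-conditional PAC-Bayes bounds" can look like in practice: the empirical error is split by true class (failure vs. non-failure), and a generic McAllester-style bound is evaluated on each slice. This is an assumed, simplified form; the paper's exact bound and the way KL(Q||P) is obtained are not reproduced here.

    import numpy as np

    def mcallester_bound(emp_risk, kl, n, delta):
        """Generic PAC-Bayes upper bound on true risk, valid with prob. >= 1 - delta."""
        return emp_risk + np.sqrt((kl + np.log(2 * np.sqrt(n) / delta)) / (2 * n))

    def class_conditional_bounds(errors, labels, kl, delta):
        """errors, labels: 0/1 arrays; labels == 1 marks true failures."""
        fn = errors[labels == 1]    # errors on failures      -> false-negative rate
        fp = errors[labels == 0]    # errors on non-failures  -> false-positive rate
        return (mcallester_bound(fn.mean(), kl, len(fn), delta / 2),
                mcallester_bound(fp.mean(), kl, len(fp), delta / 2))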

The problem of finding a nonzero solution of a linear recurrence $Ly = 0$ with polynomial coefficients where $y$ has the form of a definite hypergeometric sum, related to the Inverse Creative Telescoping Problem of [14][Sec. 8], has now been open for three decades. Here we present an algorithm (implemented in a SageMath package) which, given such a recurrence and a quasi-triangular, shift-compatible factorial basis $\mathcal{B} = \langle P_k(n)\rangle_{k=0}^\infty$ of the polynomial space $\mathbb{K}[n]$ over a field $\mathbb{K}$ of characteristic zero, computes a recurrence satisfied by the coefficient sequence $c = \langle c_k\rangle_{k=0}^\infty$ of the solution $y_n = \sum_{k=0}^\infty c_kP_k(n)$ (where, thanks to the quasi-triangularity of $\mathcal{B}$, the sum on the right terminates for each $n \in \mathbb{N}$). More generally, if $\mathcal{B}$ is $m$-sieved for some $m \in \mathbb{N}$, our algorithm computes a system of $m$ recurrences satisfied by the $m$-sections of the coefficient sequence $c$. If an explicit nonzero solution of this system can be found, we obtain an explicit nonzero solution of $Ly = 0$.
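To make the quasi-triangularity point concrete, here is a small illustration using the binomial basis P_k(n) = C(n, k), a classic factorial basis: for each fixed n the expansion y_n = sum_k c_k C(n, k) terminates at k = n, so y can be evaluated from finitely many coefficients. This only illustrates the expansion step, not the algorithm that derives the recurrence system for c.

    from math import comb

    def y_from_coeffs(c, n):
        """Evaluate y_n = sum_{k=0}^{n} c_k * C(n, k) (truncated if c is short)."""
        return sum(c[k] * comb(n, k) for k in range(min(n, len(c) - 1) + 1))

    # Example: c_k = 1 for all k gives y_n = 2^n, which satisfies y_{n+1} - 2 y_n = 0.
    c = [1] * 20
    assert all(y_from_coeffs(c, n + 1) - 2 * y_from_coeffs(c, n) == 0 for n in range(10))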

Safety is a critical component of autonomous systems and remains a challenge for learning-based policies to be utilized in the real world. In particular, policies learned using reinforcement learning often fail to generalize to novel environments due to unsafe behavior. In this paper, we propose Sim-to-Lab-to-Real to safely close the reality gap. To improve safety, we apply a dual policy setup where a performance policy is trained using the cumulative task reward and a backup (safety) policy is trained by solving the reach-avoid Bellman Equation based on Hamilton-Jacobi reachability analysis. In Sim-to-Lab transfer, we apply a supervisory control scheme to shield unsafe actions during exploration; in Lab-to-Real transfer, we leverage the Probably Approximately Correct (PAC)-Bayes framework to provide lower bounds on the expected performance and safety of policies in unseen environments. We empirically study the proposed framework for ego-vision navigation in two types of indoor environments including a photo-realistic one. We also demonstrate strong generalization performance through hardware experiments in real indoor spaces with a quadrupedal robot. See //sites.google.com/princeton.edu/sim-to-lab-to-real for supplementary material.
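A minimal sketch of the supervisory (shielding) scheme used during Sim-to-Lab transfer: the backup safety critic vetoes the performance policy's action when the learned reach-avoid value signals danger. Function names, the sign convention, and the threshold are assumptions for illustration.

    def shielded_action(obs, perf_policy, backup_policy, safety_critic, threshold=0.0):
        a_perf = perf_policy(obs)
        # Convention assumed here: safety_critic(obs, a) > threshold means the
        # reach-avoid value predicts a future safety violation under a_perf.
        if safety_critic(obs, a_perf) > threshold:
            return backup_policy(obs)   # fall back to the safety (backup) policy
        return a_perf                   # otherwise keep the task-oriented action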

Unresolved data association in ambiguous and perceptually aliased environments leads to multi-modal hypotheses on both the robot's state and the environment state. To avoid catastrophic results when operating in such ambiguous environments, it is crucial to reason about data association within Belief Space Planning (BSP). However, when all possible data associations are considered explicitly, the number of hypotheses grows exponentially with the planning horizon, and determining the optimal action sequence quickly becomes intractable. Moreover, under hard budget constraints where some non-negligible hypotheses must be pruned, achieving performance guarantees is crucial. In this work we present a novel, computationally efficient approach that utilizes only a distilled subset of hypotheses to solve BSP problems while reasoning about data association. Furthermore, to provide performance guarantees, we derive error bounds with respect to the optimal solution. We then demonstrate our approach in an extremely aliased environment, where we significantly reduce computation time without compromising the quality of the solution.
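As a rough sketch of "planning with a distilled subset of hypotheses": keep the highest-weight data-association hypotheses, renormalise, and track the pruned probability mass. The paper derives principled error bounds with respect to the optimal solution; the pruned mass below is only the obvious bookkeeping, not those bounds.

    import numpy as np

    def distill_hypotheses(weights, budget):
        """weights: unnormalised hypothesis weights; budget: max hypotheses kept."""
        w = np.asarray(weights, dtype=float)
        w = w / w.sum()
        keep = np.argsort(w)[::-1][:budget]     # indices of the top-weight hypotheses
        pruned_mass = 1.0 - w[keep].sum()       # probability mass that was discarded
        return keep, w[keep] / w[keep].sum(), pruned_mass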

The conjoining of dynamical systems and deep learning has become a topic of great interest. In particular, neural differential equations (NDEs) demonstrate that neural networks and differential equations are two sides of the same coin. Traditional parameterised differential equations are a special case, and many popular neural network architectures, such as residual networks and recurrent networks, are discretisations of differential equations. NDEs are suitable for tackling generative problems, dynamical systems, and time series (particularly in physics, finance, ...) and are thus of interest to both modern machine learning and traditional mathematical modelling. NDEs offer high-capacity function approximation, strong priors on model space, the ability to handle irregular data, memory efficiency, and a wealth of available theory on both sides. This doctoral thesis provides an in-depth survey of the field. Topics include: neural ordinary differential equations (e.g. for hybrid neural/mechanistic modelling of physical systems); neural controlled differential equations (e.g. for learning functions of irregular time series); and neural stochastic differential equations (e.g. to produce generative models capable of representing complex stochastic dynamics, or sampling from complex high-dimensional distributions). Further topics include: numerical methods for NDEs (e.g. reversible differential equation solvers, backpropagation through differential equations, Brownian reconstruction); symbolic regression for dynamical systems (e.g. via regularised evolution); and deep implicit models (e.g. deep equilibrium models, differentiable optimisation). We anticipate this thesis will be of interest to anyone interested in the marriage of deep learning with dynamical systems, and hope it will provide a useful reference for the current state of the art.
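A minimal neural-ODE forward pass, to make the "two sides of the same coin" point concrete: the hidden state evolves under a learned vector field f_theta, integrated here with a fixed-step RK4 solver. This is an illustrative sketch (a two-layer tanh network with random weights), not the adaptive or reversible solvers discussed in the thesis.

    import numpy as np

    rng = np.random.default_rng(0)
    W1, b1 = rng.standard_normal((32, 2)) * 0.1, np.zeros(32)
    W2, b2 = rng.standard_normal((2, 32)) * 0.1, np.zeros(2)

    def f_theta(t, y):
        """Learned vector field dy/dt = f_theta(t, y)."""
        return W2 @ np.tanh(W1 @ y + b1) + b2

    def rk4_integrate(f, y0, t0, t1, steps=100):
        y, t, h = y0, t0, (t1 - t0) / steps
        for _ in range(steps):
            k1 = f(t, y)
            k2 = f(t + h / 2, y + h / 2 * k1)
            k3 = f(t + h / 2, y + h / 2 * k2)
            k4 = f(t + h, y + h * k3)
            y = y + h / 6 * (k1 + 2 * k2 + 2 * k3 + k4)
            t = t + h
        return y

    y1 = rk4_integrate(f_theta, np.array([1.0, 0.0]), 0.0, 1.0)   # the ODE "forward pass"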

How to obtain good value estimates is one of the key problems in Reinforcement Learning (RL). Current value estimation methods, such as DDPG and TD3, suffer from unnecessary overestimation or underestimation bias. In this paper, we explore the potential of double actors, which has long been neglected, for better value function estimation in the continuous control setting. First, we uncover and demonstrate the bias-alleviation property of double actors by building double actors upon a single critic and upon double critics, to handle the overestimation bias in DDPG and the underestimation bias in TD3, respectively. Next, interestingly, we find that double actors help improve the exploration ability of the agent. Finally, to mitigate the uncertainty of value estimates from double critics, we further propose to regularize the critic networks under the double-actors architecture, which gives rise to the Double Actors Regularized Critics (DARC) algorithm. Extensive experimental results on challenging continuous control tasks show that DARC significantly outperforms state-of-the-art methods with higher sample efficiency.
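A sketch of the double-actor / double-critic wiring with a critic-consistency regulariser, as described in the abstract. This is an illustrative PyTorch fragment under assumed names (actor1, actor2, critic1, ...), and the target construction is a hedged stand-in; the exact DARC update is specified in the paper, not here.

    import torch

    def darc_style_critic_loss(batch, actor1, actor2, critic1, critic2,
                               target_critic1, target_critic2, gamma=0.99, nu=0.1):
        obs, act, rew, next_obs, done = batch
        with torch.no_grad():
            # Each actor proposes a next action; take the larger of the two
            # per-critic estimates, then the smaller across critics (an assumed
            # bias-balancing combination standing in for the paper's target).
            a1, a2 = actor1(next_obs), actor2(next_obs)
            q1 = torch.max(target_critic1(next_obs, a1), target_critic1(next_obs, a2))
            q2 = torch.max(target_critic2(next_obs, a1), target_critic2(next_obs, a2))
            target = rew + gamma * (1 - done) * torch.min(q1, q2)
        q1_pred, q2_pred = critic1(obs, act), critic2(obs, act)
        td_loss = ((q1_pred - target) ** 2 + (q2_pred - target) ** 2).mean()
        # Regularise the two critics towards each other to reduce estimate uncertainty.
        consistency = ((q1_pred - q2_pred) ** 2).mean()
        return td_loss + nu * consistency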

Sampling methods (e.g., node-wise, layer-wise, or subgraph sampling) have become an indispensable strategy for speeding up the training of large-scale Graph Neural Networks (GNNs). However, existing sampling methods are mostly based on graph structural information and ignore the dynamics of optimization, which leads to high variance in estimating the stochastic gradients. The high-variance issue can be very pronounced in extremely large graphs, where it results in slow convergence and poor generalization. In this paper, we theoretically analyze the variance of sampling methods and show that, due to the composite structure of the empirical risk, the variance of any sampling method can be decomposed into embedding approximation variance in the forward stage and stochastic gradient variance in the backward stage, and that mitigating both types of variance is necessary to obtain a faster convergence rate. We propose a decoupled variance reduction strategy that employs (approximate) gradient information to adaptively sample nodes with minimal variance, and that explicitly reduces the variance introduced by embedding approximation. We show theoretically and empirically that the proposed method, even with smaller mini-batch sizes, enjoys a faster convergence rate and better generalization compared to existing methods.
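A sketch of the gradient-informed sampling side of such a strategy: nodes are drawn with probability proportional to (approximate) per-node gradient norms and the sampled terms are importance-weighted so the batch average remains an unbiased estimate of the full mean gradient. This illustrates only the adaptive sampling idea; the embedding-approximation variance reduction discussed in the abstract is not shown, and the names are hypothetical.

    import numpy as np

    def sample_nodes(grad_norm_estimates, batch_size, rng=None):
        """Draw nodes ~ p_i proportional to approximate gradient norms."""
        rng = rng or np.random.default_rng()
        p = np.asarray(grad_norm_estimates, dtype=float)
        p = p / p.sum()
        idx = rng.choice(len(p), size=batch_size, replace=True, p=p)
        # Importance weights: averaging w[i] * g[i] over the batch is unbiased
        # for the mean of the per-node gradients g.
        weights = 1.0 / (len(p) * p[idx])
        return idx, weights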

Deep reinforcement learning (RL) methods generally engage in exploratory behavior through noise injection in the action space. An alternative is to add noise directly to the agent's parameters, which can lead to more consistent exploration and a richer set of behaviors. Methods such as evolutionary strategies use parameter perturbations, but discard all temporal structure in the process and require significantly more samples. Combining parameter noise with traditional RL methods allows us to combine the best of both worlds. We demonstrate that both off- and on-policy methods benefit from this approach through experimental comparison of DQN, DDPG, and TRPO on high-dimensional discrete-action environments as well as continuous control tasks. Our results show that RL with parameter noise learns more efficiently than either traditional RL with action-space noise or evolutionary strategies alone.
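A minimal sketch of parameter-space noise for exploration: perturb a copy of the policy weights once per rollout, and adapt the noise scale so that the induced change in actions stays near a target, in the spirit of the adaptive scaling described in the paper. The constants and function names here are placeholders.

    import copy
    import torch

    def perturb_policy(policy, stddev):
        """Return a copy of the policy with Gaussian noise added to every weight."""
        noisy = copy.deepcopy(policy)
        with torch.no_grad():
            for p in noisy.parameters():
                p.add_(torch.randn_like(p) * stddev)
        return noisy

    def adapt_stddev(stddev, action_distance, target=0.2, alpha=1.01):
        # Grow the noise if the perturbed policy acts too similarly to the
        # unperturbed one, shrink it otherwise.
        return stddev * alpha if action_distance < target else stddev / alpha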
