We propose a Safe Pontryagin Differentiable Programming (Safe PDP) methodology, which establishes a theoretical and algorithmic safe differentiable framework to solve a broad class of safety-critical learning and control tasks -- problems that require the guarantee of both immediate and long-term constraint satisfaction at any stage of the learning and control process. In the spirit of interior-point methods, Safe PDP handles different types of state and input constraints by incorporating them into the cost and loss through barrier functions. We prove the following fundamental features of Safe PDP: first, both the constrained solution and its gradient in the backward pass can be approximated by solving a more efficient unconstrained counterpart; second, the approximation of both the solution and its gradient can be controlled to arbitrary accuracy via a barrier parameter; and third, importantly, all intermediate results throughout the approximation and optimization strictly respect the constraints, thus guaranteeing safety throughout the entire learning and control process. We demonstrate the capabilities of Safe PDP in solving various safe learning and control tasks, including safe policy optimization, safe motion planning, and learning MPCs from demonstrations, on challenging control systems such as a 6-DoF maneuvering quadrotor and a 6-DoF rocket performing powered landing.
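To make the interior-point idea concrete, here is a minimal sketch on a hypothetical toy problem (not Safe PDP itself, and all names and step sizes are illustrative): an inequality constraint g(x) <= 0 is folded into the objective as a log-barrier term -mu * log(-g(x)), so every iterate stays strictly feasible, and shrinking mu drives the solution toward the constrained optimum.

```python
import numpy as np

def barrier_minimize(f_grad, g, g_grad, x0, mu, lr=1e-2, iters=5000):
    """Minimize f(x) - mu*log(-g(x)) by gradient descent, keeping g(x) < 0."""
    x = x0
    for _ in range(iters):
        grad = f_grad(x) - mu * g_grad(x) / g(x)
        step = lr
        # backtrack so every iterate stays strictly feasible (g < 0)
        while g(x - step * grad) >= 0:
            step *= 0.5
        x = x - step * grad
    return x

# toy problem: minimize x^2 subject to x >= 1, i.e. g(x) = 1 - x <= 0
f_grad = lambda x: 2 * x
g = lambda x: 1.0 - x
g_grad = lambda x: -1.0

x = 2.0
for mu in [1.0, 0.1, 0.01, 0.001]:        # shrink the barrier parameter
    x = barrier_minimize(f_grad, g, g_grad, x, mu)
    print(f"mu={mu:6.3f}  x={x:.4f}")      # approaches the constrained optimum x* = 1
```

Every printed iterate is strictly feasible, mirroring the "safety at every stage" property the barrier construction provides.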
Annealed importance sampling (AIS) and related algorithms are highly effective tools for marginal likelihood estimation, but are not fully differentiable due to the use of Metropolis-Hastings (MH) correction steps. Differentiability is a desirable property as it would admit the possibility of optimizing marginal likelihood as an objective using gradient-based methods. To this end, we propose a differentiable AIS algorithm by abandoning MH steps, which further unlocks mini-batch computation. We provide a detailed convergence analysis for Bayesian linear regression which goes beyond previous analyses by explicitly accounting for non-perfect transitions. Using this analysis, we prove that our algorithm is consistent in the full-batch setting and provide a sublinear convergence rate. However, we show that the algorithm is inconsistent when mini-batch gradients are used due to a fundamental incompatibility between the goals of last-iterate convergence to the posterior and elimination of the pathwise stochastic error. This result is in stark contrast to our experience with stochastic optimization and stochastic gradient Langevin dynamics, where the effects of gradient noise can be washed out by taking more steps of a smaller size. Our negative result relies crucially on our explicit consideration of convergence to the stationary distribution, and it helps explain the difficulty of developing practically effective AIS-like algorithms that exploit mini-batch gradients.
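A minimal sketch of the MH-free construction on a toy conjugate model (the model, step size, and schedule are illustrative choices, not the paper's algorithm): anneal from prior to posterior along a geometric path, use unadjusted Langevin transitions in place of MH-corrected ones, and accumulate the AIS log-weights; the resulting estimate is biased by the time discretization but improves with more, smaller steps in the full-batch setting.

```python
import numpy as np
rng = np.random.default_rng(0)

y = 2.0                                                # one observation
c = -0.5 * np.log(2 * np.pi)
log_prior = lambda x: c - 0.5 * x**2                   # x ~ N(0, 1)
log_lik   = lambda x: c - 0.5 * (y - x)**2             # y | x ~ N(x, 1)
true_logZ = -0.5 * np.log(2 * np.pi * 2) - y**2 / 4    # log N(y; 0, 2)

def log_f(x, beta):                                    # annealed unnormalized target
    return log_prior(x) + beta * log_lik(x)

def grad_log_f(x, beta):
    return -x + beta * (y - x)

K, n, eps = 200, 2000, 0.05                            # anneal steps, particles, ULA step
betas = np.linspace(0.0, 1.0, K + 1)
x = rng.normal(size=n)                                 # exact samples from the prior
logw = np.zeros(n)
for k in range(1, K + 1):
    logw += log_f(x, betas[k]) - log_f(x, betas[k - 1])
    # unadjusted Langevin transition: no MH correction, hence differentiable
    x = x + eps * grad_log_f(x, betas[k]) + np.sqrt(2 * eps) * rng.normal(size=n)

est_logZ = np.log(np.mean(np.exp(logw - logw.max()))) + logw.max()
print(f"MH-free AIS estimate {est_logZ:.3f} vs exact {true_logZ:.3f}")
```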
This paper considers how to ensure that a system of fixed-wing Unmanned Aerial Vehicles (UAVs) can avoid collisions. To do so, we develop a novel method for creating a barrier function, which is similar to a Lyapunov function and can be used to ensure that a system stays safe for all future times. After introducing the general approach, we show how collision avoidance between two vehicles can be guaranteed for all future times. The construction is then extended to arbitrarily many vehicles by addressing how to satisfy multiple safety objectives simultaneously, while ensuring that output actuator commands remain within specified limits. Because this formulation requires communication of control values and may therefore reduce the throughput of other important messages, we then show how to reformulate the solution without this significant communication overhead while still ensuring that safety is maintained and actuator limits are respected. We validate the theoretical developments of this paper in the SCRIMMAGE simulator with 20 UAVs that maintain safe distances from each other even though their nominal paths would otherwise cause a collision.
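A drastically simplified sketch of the barrier-function idea (single-integrator agents and a halfspace projection instead of the paper's fixed-wing dynamics and quadratic program; all parameters are illustrative): each vehicle filters its nominal command so that the pairwise barrier h = ||Δp||² − d² satisfies dh/dt ≥ −αh, then clips the result to actuator limits.

```python
import numpy as np

def safe_input(p_self, p_other, u_nom, d=1.0, alpha=1.0, u_max=2.0):
    """Filter a nominal velocity command through a pairwise barrier constraint.

    For single integrators, dh/dt >= -alpha*h becomes 2*dp.(u_self - u_other)
    >= -alpha*h; here each agent takes half the responsibility, assuming the
    other does the same: a.u >= b with a = 2*dp and b = -alpha*h/2.
    """
    dp = p_self - p_other
    h = dp @ dp - d**2
    a, b = 2 * dp, -alpha * h / 2
    u = u_nom.copy()
    if a @ u < b:                        # project onto the halfspace a.u >= b
        u = u + a * (b - a @ u) / (a @ a)
    # crude actuator limit; a proper QP enforces both constraints jointly
    return np.clip(u, -u_max, u_max)

# two vehicles on a head-on course
p1, p2 = np.array([-2.0, 0.0]), np.array([2.0, 0.05])
for _ in range(120):
    u1 = safe_input(p1, p2, u_nom=np.array([1.0, 0.0]))
    u2 = safe_input(p2, p1, u_nom=np.array([-1.0, 0.0]))
    p1, p2 = p1 + 0.05 * u1, p2 + 0.05 * u2
    # separation stays near or above d = 1 (small slack for time discretization)
    assert np.linalg.norm(p1 - p2) >= 0.95
```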
Control variates are a well-established tool for reducing the variance of Monte Carlo estimators. However, for large-scale problems, including high-dimensional and large-sample settings, their advantages can be outweighed by a substantial computational cost. This paper considers control variates based on Stein operators, presenting a framework that encompasses and generalizes existing approaches based on polynomials, kernels, and neural networks. A learning strategy based on minimizing a variational objective through stochastic optimization is proposed, leading to scalable and effective control variates. Novel theoretical results are presented to provide insight into the variance reduction that can be achieved, and an empirical assessment, including applications to Bayesian inference, is provided in support.
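A minimal instance for intuition (1-D Gaussian target and polynomial trial functions; the paper's framework is far more general): a Stein operator applied to any trial function has exact mean zero under the target, so a fitted combination of such terms can be subtracted from the integrand without biasing the estimate of the mean.

```python
import numpy as np
rng = np.random.default_rng(0)

# estimate E[f(x)] for x ~ N(0, 1); the score of the standard normal is -x
x = rng.normal(size=500)
f = x**2                                  # true expectation is 1

# Stein features: for g_j(x) = x^j, (A g_j)(x) = g_j'(x) + g_j(x) * score(x)
# has exact mean zero under the target, whatever coefficients we later choose
js = np.arange(1, 4)
A = np.stack([j * x**(j - 1) - x**(j + 1) for j in js], axis=1)

# fit coefficients by least squares (a proxy for minimizing residual variance)
theta, *_ = np.linalg.lstsq(np.column_stack([np.ones_like(x), A]), f, rcond=None)
cv_estimate = np.mean(f - A @ theta[1:])

print(f"plain MC: {f.mean():.4f}   with Stein control variate: {cv_estimate:.4f}")
```

Fitting and evaluating on the same samples introduces a small bias in general; in this toy case the integrand lies in the span of the Stein features, so the control variate is exact and the estimate equals 1 to machine precision.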
Most numerical schemes proposed for solving BGK models for rarefied gas dynamics are based on the discrete velocity approximation. Since such an approach uses fixed velocity grids, one must secure a sufficiently large domain with fine velocity grids to resolve the structure of the distribution functions. For high Mach number problems, the computational cost then becomes prohibitively expensive. In this paper, we propose a velocity adaptation technique in the semi-Lagrangian framework for the BGK model. The velocity grid is set locally in time and space, according to the local mean velocity and temperature, and a weighted minimization approach is applied to impose conservation. We present several numerical tests that illustrate the effectiveness of the proposed scheme.
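A schematic of the two ingredients in isolation (1-D in velocity, illustrative parameters, and uniform rather than weighted minimization; not the full semi-Lagrangian scheme): the velocity grid follows the local mean velocity and scales with the thermal speed, and a small constrained least-squares correction enforces discrete conservation of mass, momentum, and energy.

```python
import numpy as np

def local_grid(u, T, n=32, R=5.0):
    """Velocity nodes centered at the local mean velocity and scaled by the
    thermal speed sqrt(T), so a modest fixed set of reference nodes suffices."""
    xi = np.linspace(-R, R, n)
    return u + np.sqrt(T) * xi, np.sqrt(T) * (xi[1] - xi[0])

def conservative_maxwellian(rho, u, T, v, dv):
    """Apply the minimal L2 correction to the discrete Maxwellian so that its
    discrete moments match (rho, rho*u, energy) exactly."""
    M = rho / np.sqrt(2 * np.pi * T) * np.exp(-(v - u)**2 / (2 * T))
    C = np.stack([np.ones_like(v), v, 0.5 * v**2]) * dv       # moment operator
    m = np.array([rho, rho * u, 0.5 * rho * (u**2 + T)])      # exact 1-D moments
    lam = np.linalg.solve(C @ C.T, m - C @ M)                 # KKT system
    return M + C.T @ lam

rho, u, T = 1.0, 3.0, 0.5          # a high-speed cell: the grid follows u
v, dv = local_grid(u, T)
g = conservative_maxwellian(rho, u, T, v, dv)
disc = np.array([g.sum(), (g * v).sum(), (0.5 * g * v**2).sum()]) * dv
print(disc, "vs exact", [rho, rho * u, 0.5 * rho * (u**2 + T)])
```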
We study distributed planning for multi-robot systems to provide optimal service to cooperative tasks that are distributed over space and time. Each task requires service by sufficiently many robots at the specified location within the specified time window. Tasks arrive over episodes and the robots try to maximize the total value of service in each episode by planning their own trajectories based on the specifications of incoming tasks. Robots are required to start and end each episode at their assigned stations in the environment. We present a game-theoretic solution to this problem by mapping it to a game, where the action of each robot is its trajectory in an episode, and using a suitable learning algorithm to obtain optimal joint plans in a distributed manner. We present a systematic way to design minimal action sets (subsets of feasible trajectories) for robots based on the specifications of incoming tasks to facilitate fast learning. We then provide performance guarantees for the cases where all the robots follow a best response or noisy best response algorithm to iteratively plan their trajectories. While the best response algorithm leads to a Nash equilibrium, the noisy best response algorithm leads to globally optimal joint plans with high probability. We show that the proposed game can in general have arbitrarily poor Nash equilibria, which makes the noisy best response algorithm preferable unless the task specifications are known to have some special structure. We also describe a family of special cases where all the equilibria are guaranteed to have bounded suboptimality. Simulations and experimental results are provided to demonstrate the proposed approach.
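A toy instance of the learning dynamics (a two-robot identical-interest game with hand-made payoffs; illustrative only, not the paper's trajectory game): log-linear learning, one standard form of noisy best response, resamples a random player's action from a Gibbs distribution over its payoffs, and its long-run play concentrates on the globally optimal joint action even though a poor strict Nash equilibrium exists.

```python
import numpy as np
rng = np.random.default_rng(0)

# identical-interest (potential) game: U[a1, a2] = total value of tasks served.
# The big task (actions 2, 2) needs both robots; (0, 0) is a poor strict Nash
# equilibrium that plain best response can get stuck in.
U = np.array([[1, 0, 0],
              [0, 3, 3],
              [0, 3, 10]], dtype=float)

def log_linear(U, tau=0.5, steps=20000):
    a = [0, 0]
    visits = np.zeros_like(U)
    for _ in range(steps):
        i = rng.integers(2)                        # pick a robot at random
        payoffs = U[:, a[1]] if i == 0 else U[a[0], :]
        p = np.exp((payoffs - payoffs.max()) / tau)
        a[i] = rng.choice(len(p), p=p / p.sum())   # Gibbs (noisy best) response
        visits[a[0], a[1]] += 1
    return visits / steps

print(log_linear(U).round(3))   # empirical mass concentrates on the optimum (2, 2)
```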
Classical machine learning (ML) provides a potentially powerful approach to solving challenging quantum many-body problems in physics and chemistry. However, the advantages of ML over more traditional methods have not been firmly established. In this work, we prove that classical ML algorithms can efficiently predict ground state properties of gapped Hamiltonians in finite spatial dimensions, after learning from data obtained by measuring other Hamiltonians in the same quantum phase of matter. In contrast, under widely accepted complexity theory assumptions, classical algorithms that do not learn from data cannot achieve the same guarantee. We also prove that classical ML algorithms can efficiently classify a wide range of quantum phases of matter. Our arguments are based on the concept of a classical shadow, a succinct classical description of a many-body quantum state that can be constructed in feasible quantum experiments and be used to predict many properties of the state. Extensive numerical experiments corroborate our theoretical results in a variety of scenarios, including Rydberg atom systems, 2D random Heisenberg models, symmetry-protected topological phases, and topologically ordered phases.
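A single-qubit numerical sketch of the classical-shadow primitive the arguments build on (simulated measurements; the state and sample count are illustrative): measure in a uniformly random Pauli basis, invert the measurement channel to obtain a snapshot 3U†|b⟩⟨b|U − I, and average snapshot estimates of the observable.

```python
import numpy as np
rng = np.random.default_rng(1)

I2 = np.eye(2, dtype=complex)
X = np.array([[0, 1], [1, 0]], dtype=complex)
H = np.array([[1, 1], [1, -1]], dtype=complex) / np.sqrt(2)
S = np.diag([1, 1j]).astype(complex)
rotations = [H, H @ S.conj().T, I2]       # map X-, Y-, Z-eigenbases to computational

psi = np.array([np.cos(0.3), np.sin(0.3)], dtype=complex)   # a pure test state
rho = np.outer(psi, psi.conj())
O = X                                     # observable whose expectation we predict

snapshots = []
for _ in range(20000):
    U = rotations[rng.integers(3)]        # random Pauli measurement basis
    p0 = np.real((U @ rho @ U.conj().T)[0, 0])
    b = 0 if rng.random() < p0 else 1     # simulated measurement outcome
    e = np.zeros(2, dtype=complex); e[b] = 1
    # invert the measurement channel: single-qubit snapshot 3 U^dag |b><b| U - I
    snapshots.append(3 * U.conj().T @ np.outer(e, e.conj()) @ U - I2)

est = np.real(np.trace(O @ np.mean(snapshots, axis=0)))
print(f"shadow estimate {est:.3f} vs exact {np.real(np.trace(O @ rho)):.3f}")
```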
Interpreting the training of Deep Neural Networks (DNNs) as an optimal control problem over nonlinear dynamical systems has received considerable attention recently, yet algorithmic development remains relatively limited. In this work, we make an attempt along this line by reformulating the training procedure from the trajectory optimization perspective. We first show that most widely used algorithms for training DNNs can be linked to Differential Dynamic Programming (DDP), a celebrated second-order trajectory optimization algorithm rooted in Approximate Dynamic Programming. In this vein, we propose a new variant of DDP that accepts batch optimization for training feedforward networks, while integrating naturally with recent progress in curvature approximation. The resulting algorithm features layer-wise feedback policies that improve the convergence rate and reduce sensitivity to hyper-parameters compared with existing methods. We show that the algorithm is competitive against state-of-the-art first- and second-order methods. Our work opens up new avenues for principled algorithmic design built upon optimal control theory.
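To see where the layer-wise feedback policies come from, here is DDP's backward/forward structure in its simplest exact instance, a scalar linear-quadratic problem (illustrative constants; not the paper's batch algorithm): the backward pass produces one feedback gain per stage, which the forward rollout then applies, in contrast to the purely open-loop updates of plain gradient descent.

```python
import numpy as np

# scalar linear system x' = a*x + b*u with stage cost q*x^2 + r*u^2 over T steps
a, b, q, r, T = 1.2, 0.5, 1.0, 0.1, 20

V = q                           # value-function Hessian at the final stage
K = np.zeros(T)                 # stagewise ("layer-wise") feedback gains
for t in reversed(range(T)):    # DDP/LQR backward pass (Riccati recursion)
    Quu = r + b * V * b
    Qux = b * V * a
    K[t] = -Qux / Quu           # feedback policy u = K[t] * x
    V = q + a * V * a - Qux**2 / Quu

x, cost = 5.0, 0.0
for t in range(T):              # forward rollout with stagewise feedback
    u = K[t] * x
    cost += q * x**2 + r * u**2
    x = a * x + b * u
print(f"closed-loop cost {cost:.2f}, final state {x:.4f}")
```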
We introduce a new family of deep neural network models. Instead of specifying a discrete sequence of hidden layers, we parameterize the derivative of the hidden state using a neural network. The output of the network is computed using a black-box differential equation solver. These continuous-depth models have constant memory cost, adapt their evaluation strategy to each input, and can explicitly trade numerical precision for speed. We demonstrate these properties in continuous-depth residual networks and continuous-time latent variable models. We also construct continuous normalizing flows, a generative model that can train by maximum likelihood, without partitioning or ordering the data dimensions. For training, we show how to scalably backpropagate through any ODE solver, without access to its internal operations. This allows end-to-end training of ODEs within larger models.
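A minimal numerical sketch of the adjoint idea on a scalar linear ODE (forward Euler stands in for the black-box solver; all constants are illustrative): the gradient of the loss with respect to the parameter is obtained by integrating the state and an adjoint variable backward in time, so no solver internals need to be stored.

```python
import numpy as np

# dz/dt = theta * z on [0, T], loss L = z(T); analytic dL/dtheta = z0*T*e^(theta*T)
theta, z0, T, n = 0.7, 1.5, 1.0, 50000
dt = T / n

z = z0
for _ in range(n):                      # forward solve (constant memory)
    z = z + dt * theta * z

a, dLdtheta, zb = 1.0, 0.0, z           # a(T) = dL/dz(T) = 1
for _ in range(n):                      # backward solve of state and adjoint
    dLdtheta += dt * a * zb             # accumulate integral of a(t) * df/dtheta
    a  = a  + dt * theta * a            # da/dt = -theta * a, integrated in reverse
    zb = zb - dt * theta * zb           # dz/dt =  theta * z, integrated in reverse

print(f"adjoint gradient {dLdtheta:.4f} vs analytic {z0 * T * np.exp(theta * T):.4f}")
```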
We present an end-to-end framework for solving the Vehicle Routing Problem (VRP) using reinforcement learning. In this approach, we train a single model that finds near-optimal solutions for problem instances sampled from a given distribution, only by observing the reward signals and following feasibility rules. Our model represents a parameterized stochastic policy, and by applying a policy gradient algorithm to optimize its parameters, the trained model produces the solution as a sequence of consecutive actions in real time, without the need to re-train for every new problem instance. On the capacitated VRP, our approach outperforms classical heuristics and Google's OR-Tools on medium-sized instances in solution quality with comparable computation time (after training). We demonstrate how our approach can handle problems with split delivery and explore the effect of such deliveries on the solution quality. Our proposed framework can be applied to other variants of the VRP such as the stochastic VRP, and has the potential to be applied more generally to combinatorial optimization problems.
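An intentionally tiny illustration of the training signal (a tabular softmax policy on one fixed 6-city instance, unlike the paper's learned model that generalizes across instances; all hyperparameters are illustrative): REINFORCE with a moving-average baseline shortens sampled tours using only the reward.

```python
import numpy as np
rng = np.random.default_rng(0)

pts = rng.random((6, 2))                          # one fixed 6-city instance
D = np.linalg.norm(pts[:, None] - pts[None], axis=-1)
logits = np.zeros((6, 6))                         # tabular policy parameters

def rollout():
    tour, steps = [0], []
    while len(tour) < 6:
        cand = [j for j in range(6) if j not in tour]
        z = logits[tour[-1], cand]
        p = np.exp(z - z.max()); p /= p.sum()     # softmax over unvisited cities
        k = rng.choice(len(cand), p=p)
        steps.append((tour[-1], cand, k, p))
        tour.append(cand[k])
    return tour, steps

def tour_len(t):
    return sum(D[t[i], t[(i + 1) % 6]] for i in range(6))

baseline, lr = 0.0, 0.5
for _ in range(3000):
    tour, steps = rollout()
    R = -tour_len(tour)                           # reward = negative tour length
    baseline += 0.05 * (R - baseline)             # moving-average baseline
    for i, cand, k, p in steps:                   # REINFORCE update
        g = -p; g[k] += 1.0                       # grad of log-softmax wrt logits
        logits[i, cand] += lr * (R - baseline) * g
print(f"tour length after training: {tour_len(rollout()[0]):.3f}")
```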
Dynamic programming (DP) solves a variety of structured combinatorial problems by iteratively breaking them down into smaller subproblems. In spite of their versatility, DP algorithms are usually non-differentiable, which hampers their use as a layer in neural networks trained by backpropagation. To address this issue, we propose to smooth the max operator in the dynamic programming recursion, using a strongly convex regularizer. This allows us to relax both the optimal value and the solution of the original combinatorial problem, and turns a broad class of DP algorithms into differentiable operators. Theoretically, we provide a new probabilistic perspective on backpropagating through these DP operators, and relate them to inference in graphical models. We derive two particular instantiations of our framework, a smoothed Viterbi algorithm for sequence prediction and a smoothed DTW algorithm for time-series alignment. We showcase these instantiations on two structured prediction tasks and on structured and sparse attention for neural machine translation.
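A compact instance of the construction, smoothed DTW (with an illustrative choice of the smoothing parameter): replacing the min in the DTW recursion with a log-sum-exp soft minimum makes the alignment cost a differentiable function of its inputs, and hard DTW is recovered as gamma → 0.

```python
import numpy as np

def softmin(args, gamma):
    """Smoothed min via negative log-sum-exp; recovers the hard min as gamma -> 0."""
    a = np.array(args) / -gamma
    m = a.max()
    return -gamma * (m + np.log(np.exp(a - m).sum()))

def soft_dtw(x, y, gamma=0.1):
    """DTW recursion with min replaced by softmin, giving a differentiable cost."""
    n, m = len(x), len(y)
    R = np.full((n + 1, m + 1), np.inf)
    R[0, 0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d = (x[i - 1] - y[j - 1])**2
            R[i, j] = d + softmin([R[i - 1, j], R[i, j - 1], R[i - 1, j - 1]], gamma)
    return R[n, m]

x = np.sin(np.linspace(0, 2 * np.pi, 30))
y = np.sin(np.linspace(0, 2 * np.pi, 40) + 0.5)   # time-shifted copy
print(f"soft-DTW (gamma=0.1): {soft_dtw(x, y):.4f}")
print(f"near-hard (gamma=1e-3): {soft_dtw(x, y, 1e-3):.4f}")
```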