from arxiv, Accepted for publication in RA-L 2022. 9 pages, 5 figures, 1 table. Note: version submitted to RA-L did not include the Appendix section present in this arXiv version

We present the Koopman State Estimator (KoopSE), a framework for model-free batch state estimation of control-affine systems that makes no linearization assumptions, requires no problem-specific feature selections, and has an inference computational cost that is independent of the number of training points. We lift the original nonlinear system into a higher-dimensional Reproducing Kernel Hilbert Space (RKHS), where the system becomes bilinear. The time-invariant model matrices can be learned by solving a least-squares problem on training trajectories. At test time, the system is algebraically manipulated into a linear time-varying system, where standard batch linear state estimation techniques can be used to efficiently compute state means and covariances. Random Fourier Features (RFF) are used to combine the computational efficiency of Koopman-based methods and the generality of kernel-embedding methods. KoopSE is validated experimentally on a localization task involving a mobile robot equipped with ultra-wideband receivers and wheel odometry. KoopSE estimates are more accurate and consistent than the standard model-based extended Rauch-Tung-Striebel (RTS) smoother, despite KoopSE having no prior knowledge of the system's motion or measurement models.
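As a rough illustration of the Random Fourier Features idea mentioned in the abstract, the sketch below lifts a scalar input into a finite feature vector whose inner products approximate a Gaussian kernel. It is not the KoopSE implementation: the names `make_rff`, `lift`, and `dot`, the one-dimensional input, and the Gaussian kernel choice are all assumptions made for this example.

```python
import math
import random

def make_rff(num_features, lengthscale, seed=0):
    """Build a 1-D feature map z(x) such that z(x) . z(y) approximates
    the Gaussian kernel exp(-(x - y)**2 / (2 * lengthscale**2))."""
    rng = random.Random(seed)
    # Frequencies follow the kernel's spectral density; phases are uniform.
    freqs = [rng.gauss(0.0, 1.0 / lengthscale) for _ in range(num_features)]
    phases = [rng.uniform(0.0, 2.0 * math.pi) for _ in range(num_features)]
    scale = math.sqrt(2.0 / num_features)

    def lift(x):
        return [scale * math.cos(w * x + b) for w, b in zip(freqs, phases)]

    return lift

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

lift = make_rff(num_features=5000, lengthscale=1.0)
x, y = 0.3, 1.1
approx = dot(lift(x), lift(y))          # inner product of lifted points
exact = math.exp(-(x - y) ** 2 / 2.0)   # kernel value it should approximate
print(approx, exact)
```

The feature dimension is fixed in advance, which is why the cost of working in the lifted space does not grow with the number of training points.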

Related content

Estimation/estimator

Weight · reducible · prototype · regularization term · optimizer

February 11, 2022

On the computation of Gröbner bases for pluriweighted-homogeneous systems

Thibaut Verron

from arxiv, 10 pages

In this paper, we examine the structure of systems which are weighted homogeneous for several systems of weights, and how it impacts Gröbner basis computations. We present different ways to compute Gröbner bases for systems with this structure, either directly or by reducing to existing structures. We also present optimization techniques which are suitable for this structure. The most natural orderings to compute a Gröbner basis for systems with this structure are weighted orderings following the systems of weights, and we discuss the possibility to use the algorithms in order to directly compute a basis for such an order, regardless of the structure of the system. We discuss applicable notions of regularity which could be used to evaluate the complexity of the algorithm, and prove that they are generic if non-empty. Finally, we present experimental data from a prototype implementation of the algorithms in SageMath.
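To make the notion of "weighted homogeneous for several systems of weights" concrete, here is a small check written for this listing. It is only a sketch under stated assumptions: polynomials are represented as dictionaries from exponent tuples to coefficients, and the helper names `weighted_degrees` and `is_pluriweighted_homogeneous` are hypothetical, not taken from the paper or from SageMath.

```python
def weighted_degrees(poly, weights):
    """Return the set of weighted degrees of the monomials of `poly`.

    `poly` maps exponent tuples to coefficients; zero coefficients are ignored.
    """
    return {
        sum(w * e for w, e in zip(weights, exps))
        for exps, coeff in poly.items()
        if coeff != 0
    }

def is_pluriweighted_homogeneous(poly, weight_systems):
    """True if all monomials share one weighted degree for every weight system."""
    return all(len(weighted_degrees(poly, w)) <= 1 for w in weight_systems)

# Variables are ordered (x, y, z).
f = {(2, 0, 1): 1, (0, 1, 1): 1}   # x^2*z + y*z
g = {(2, 0, 0): 1, (0, 2, 0): 1}   # x^2 + y^2
systems = [(1, 2, 0), (0, 0, 1)]

print(is_pluriweighted_homogeneous(f, systems))  # True
print(is_pluriweighted_homogeneous(g, systems))  # False
```

Here `f` has weighted degree 2 under the weights (1, 2, 0) and degree 1 under (0, 0, 1), whereas `g` mixes degrees 2 and 4 under the first system.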

Estimation/estimator · state estimation · Networking · robustness · controller

February 11, 2022

Concurrent Training of a Control Policy and a State Estimator for Dynamic and Robust Legged Locomotion

Gwanghyeon Ji,Juhyeok Mun,Hyeongjun Kim,Jemin Hwangbo

from arxiv, Accepted for IEEE Robotics and Automation Letters and ICRA 2022

In this paper, we propose a locomotion training framework where a control policy and a state estimator are trained concurrently. The framework consists of a policy network which outputs the desired joint positions and a state estimation network which outputs estimates of the robot's states such as the base linear velocity, foot height, and contact probability. We exploit a fast simulation environment to train the networks, and the trained networks are transferred to the real robot. The trained policy and state estimator are capable of traversing diverse terrains such as a hill, slippery plate, and bumpy road. We also demonstrate that the learned policy can run at up to 3.75 m/s on normal flat ground and 3.54 m/s on a slippery plate with a coefficient of friction of 0.22.

orthogonal · CASE · directed · MS · contrastive

February 10, 2022

A Stieltjes algorithm for generating multivariate orthogonal polynomials

Zexin Liu,Akil Narayan

from arxiv, 24 pages, 8 figures

Orthogonal polynomials of several variables have a vector-valued three-term recurrence relation, much like the corresponding one-dimensional relation. This relation requires only knowledge of certain recurrence matrices, and allows simple and stable evaluation of multivariate orthogonal polynomials. In the univariate case, various algorithms can evaluate the recurrence coefficients given the ability to compute polynomial moments, but such a procedure is absent in multiple dimensions. We present a new Multivariate Stieltjes (MS) algorithm that fills this gap in the multivariate case, allowing computation of recurrence matrices assuming moments are available. The algorithm is essentially explicit in two and three dimensions, but requires the numerical solution to a non-convex problem in more than three dimensions. Compared to direct Gram-Schmidt-type orthogonalization, we demonstrate on several examples in up to three dimensions that the MS algorithm is far more stable, and allows accurate computation of orthogonal bases in the multivariate setting, in contrast to direct orthogonalization approaches.
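For context, the univariate procedure that the abstract takes as its starting point can be sketched as follows. This is the classical one-dimensional Stieltjes procedure for a discrete measure, not the paper's multivariate MS algorithm; the function name `stieltjes` and the node/weight representation of the measure are assumptions made for the example. It computes the coefficients of the monic three-term recurrence p_{k+1}(x) = (x - a_k) p_k(x) - b_k p_{k-1}(x).

```python
def stieltjes(nodes, weights, n):
    """Recurrence coefficients (a_k, b_k), k < n, of the monic orthogonal
    polynomials for the discrete measure sum_i weights[i] * delta(nodes[i]).

    Requires n <= number of distinct nodes so that all norms are nonzero.
    """
    p_prev = [0.0] * len(nodes)   # p_{-1} evaluated at the nodes
    p_curr = [1.0] * len(nodes)   # p_0 evaluated at the nodes
    norm_prev = 1.0
    a, b = [], []
    for k in range(n):
        norm_curr = sum(w * p * p for w, p in zip(weights, p_curr))
        a_k = sum(w * x * p * p for w, x, p in zip(weights, nodes, p_curr)) / norm_curr
        # By convention b_0 is the total mass of the measure.
        b_k = norm_curr if k == 0 else norm_curr / norm_prev
        a.append(a_k)
        b.append(b_k)
        # Advance the recurrence at every node (p_{-1} = 0 makes b_0 irrelevant here).
        p_next = [(x - a_k) * pc - b_k * pp
                  for x, pc, pp in zip(nodes, p_curr, p_prev)]
        p_prev, p_curr, norm_prev = p_curr, p_next, norm_curr
    return a, b

nodes = [-1.0, 0.0, 1.0]
weights = [1.0 / 3.0] * 3
a, b = stieltjes(nodes, weights, 3)
print(a, b)
```

Because the example measure is symmetric about zero, every a_k vanishes, which gives a quick sanity check on the output.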

optimizer · separable · optimization · decomposable · smoothness

February 9, 2022

Sharper Rates for Separable Minimax and Finite Sum Optimization via Primal-Dual Extragradient Methods

Yujia Jin,Aaron Sidford,Kevin Tian

We design accelerated algorithms with improved rates for several fundamental classes of optimization problems. Our algorithms all build upon techniques related to the analysis of primal-dual extragradient methods via relative Lipschitzness proposed recently by [CST21]. (1) Separable minimax optimization. We study separable minimax optimization problems $\min_x \max_y f(x) - g(y) + h(x, y)$, where $f$ and $g$ have smoothness and strong convexity parameters $(L^x, \mu^x)$, $(L^y, \mu^y)$, and $h$ is convex-concave with a $(\Lambda^{xx}, \Lambda^{xy}, \Lambda^{yy})$-blockwise operator norm bounded Hessian. We provide an algorithm with gradient query complexity $\tilde{O}\left(\sqrt{\frac{L^{x}}{\mu^{x}}} + \sqrt{\frac{L^{y}}{\mu^{y}}} + \frac{\Lambda^{xx}}{\mu^{x}} + \frac{\Lambda^{xy}}{\sqrt{\mu^{x}\mu^{y}}} + \frac{\Lambda^{yy}}{\mu^{y}}\right)$. Notably, for convex-concave minimax problems with bilinear coupling (e.g., quadratics), where $\Lambda^{xx} = \Lambda^{yy} = 0$, our rate matches a lower bound of [ZHZ19]. (2) Finite sum optimization. We study finite sum optimization problems $\min_x \frac{1}{n}\sum_{i\in[n]} f_i(x)$, where each $f_i$ is $L_i$-smooth and the overall problem is $\mu$-strongly convex. We provide an algorithm with gradient query complexity $\tilde{O}\left(n + \sum_{i\in[n]} \sqrt{\frac{L_i}{n\mu}} \right)$. Notably, when the smoothness bounds $\{L_i\}_{i\in[n]}$ are non-uniform, our rate improves upon accelerated SVRG [LMH15, FGKS15] and Katyusha [All17] by up to a $\sqrt{n}$ factor. (3) Minimax finite sums. We generalize our algorithms for minimax and finite sum optimization to solve a natural family of minimax finite sum optimization problems at an accelerated rate, encapsulating both above results up to a logarithmic factor.

Estimation/estimator · state estimation · noise · reproducing kernel Hilbert space · dynamical system

February 9, 2022

Stein Particle Filter for Nonlinear, Non-Gaussian State Estimation

Fahira Afzal Maken,Fabio Ramos,Lionel Ott

from arxiv, 8 pages, 3 figures, Robotics and Automation Letters

Estimation of a dynamical system's latent state subject to sensor noise and model inaccuracies remains a critical yet difficult problem in robotics. While Kalman filters provide the optimal solution in the least-squares sense for linear and Gaussian noise problems, the general nonlinear and non-Gaussian noise case is significantly more complicated, typically relying on sampling strategies that are limited to low-dimensional state spaces. In this paper we devise a general inference procedure for filtering of nonlinear, non-Gaussian dynamical systems that exploits the differentiability of both the update and prediction models to scale to higher dimensional spaces. Our method, Stein particle filter, can be seen as a deterministic flow of particles, embedded in a reproducing kernel Hilbert space, from an initial state to the desirable posterior. The particles evolve jointly to conform to a posterior approximation while interacting with each other through a repulsive force. We evaluate the method in simulation and in complex localization tasks while comparing it to sequential Monte Carlo solutions.
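The "deterministic flow of particles" with a repulsive interaction is reminiscent of Stein variational gradient descent (SVGD). The sketch below shows a generic one-dimensional SVGD update toward a fixed Gaussian target; it is not the authors' filter and includes no prediction or measurement model. The function name `svgd_step`, the RBF kernel with fixed bandwidth, the step size, and the target N(2, 1) are all assumptions made for illustration.

```python
import math

def svgd_step(particles, score, step=0.1, bandwidth=1.0):
    """One SVGD update in 1-D with an RBF kernel.

    `score(x)` must return the derivative of the log target density at x.
    """
    n = len(particles)
    updated = []
    for xi in particles:
        phi = 0.0
        for xj in particles:
            k = math.exp(-(xj - xi) ** 2 / (2.0 * bandwidth ** 2))
            attract = k * score(xj)                  # pulls toward high density
            repel = k * (xi - xj) / bandwidth ** 2   # kernel gradient, pushes particles apart
            phi += attract + repel
        updated.append(xi + step * phi / n)
    return updated

target_mean = 2.0

def score(x):
    # Derivative of log N(x; target_mean, 1).
    return -(x - target_mean)

# Start all particles far from the target, evenly spaced on [-3, -1].
particles = [-3.0 + 0.2 * i for i in range(11)]
for _ in range(500):
    particles = svgd_step(particles, score)

mean = sum(particles) / len(particles)
spread = max(particles) - min(particles)
print(mean, spread)
```

The attraction term moves the particles toward the target, while the repulsion term keeps them from collapsing onto a single point, so the final set retains a nonzero spread.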

kernelization · controller · optimizer · reproducing kernel Hilbert space · origin

February 8, 2022

Data-Driven Chance Constrained Control using Kernel Distribution Embeddings

Adam J. Thorpe,Thomas Lew,Meeko M. K. Oishi,Marco Pavone

from arxiv, Submitted to 4th Annual Learning for Dynamics & Control Conference (L4DC) 2022

We present a data-driven algorithm for efficiently computing stochastic control policies for general joint chance constrained optimal control problems. Our approach leverages the theory of kernel distribution embeddings, which allows representing expectation operators as inner products in a reproducing kernel Hilbert space. This framework enables approximately reformulating the original problem using a dataset of observed trajectories from the system without imposing prior assumptions on the parameterization of the system dynamics or the structure of the uncertainty. By optimizing over a finite subset of stochastic open-loop control trajectories, we relax the original problem to a linear program over the control parameters that can be efficiently solved using standard convex optimization techniques. We demonstrate our proposed approach in simulation on a system with nonlinear non-Markovian dynamics navigating in a cluttered environment.

optimizer · reducible · approximation · controller · Principle

June 29, 2020

Differential Dynamic Programming Neural Optimizer

Guan-Horng Liu,Tianrong Chen,Evangelos A. Theodorou

Interpretation of Deep Neural Networks (DNNs) training as an optimal control problem with nonlinear dynamical systems has received considerable attention recently, yet the algorithmic development remains relatively limited. In this work, we make an attempt along this line by reformulating the training procedure from the trajectory optimization perspective. We first show that most widely-used algorithms for training DNNs can be linked to Differential Dynamic Programming (DDP), a celebrated second-order trajectory optimization algorithm rooted in Approximate Dynamic Programming. In this vein, we propose a new variant of DDP that can accept batch optimization for training feedforward networks, while integrating naturally with the recent progress in curvature approximation. The resulting algorithm features layer-wise feedback policies which improve convergence rate and reduce sensitivity to hyper-parameters over existing methods. We show that the algorithm is competitive against state-of-the-art first and second order methods. Our work opens up new avenues for principled algorithmic design built upon optimal control theory.

optimizer · variance · covariance matrix · separable · Continuity

December 18, 2018

PPO-CMA: Proximal Policy Optimization with Covariance Matrix Adaptation

Perttu Hämäläinen,Amin Babadi,Xiaoxiao Ma,Jaakko Lehtinen

Proximal Policy Optimization (PPO) is a highly popular model-free reinforcement learning (RL) approach. However, with continuous state and action spaces and a Gaussian policy -- common in computer animation and robotics -- PPO is prone to getting stuck in local optima. In this paper, we observe a tendency of PPO to prematurely shrink the exploration variance, which naturally leads to slow progress. Motivated by this, we borrow ideas from CMA-ES, a black-box optimization method designed for intelligent adaptive Gaussian exploration, to derive PPO-CMA, a novel proximal policy optimization approach that can expand the exploration variance on objective function slopes and shrink the variance when close to the optimum. This is implemented by using separate neural networks for policy mean and variance and training the mean and variance in separate passes. Our experiments demonstrate a clear improvement over vanilla PPO in many difficult OpenAI Gym MuJoCo tasks.

policy search · MoDELS · optimizer · robot · controller

July 6, 2018

A survey on policy search algorithms for learning robot controllers in a handful of trials

Konstantinos Chatzilygeroudis,Vassilis Vassiliades,Freek Stulp,Sylvain Calinon,Jean-Baptiste Mouret

Most policy search algorithms require thousands of training episodes to find an effective policy, which is often infeasible with a physical robot. This survey article focuses on the extreme other end of the spectrum: how can a robot adapt with only a handful of trials (a dozen) and a few minutes? By analogy with the word "big-data", we refer to this challenge as "micro-data reinforcement learning". We show that a first strategy is to leverage prior knowledge on the policy structure (e.g., dynamic movement primitives), on the policy parameters (e.g., demonstrations), or on the dynamics (e.g., simulators). A second strategy is to create data-driven surrogate models of the expected reward (e.g., Bayesian optimization) or the dynamical model (e.g., model-based policy search), so that the policy optimizer queries the model instead of the real system. Overall, all successful micro-data algorithms combine these two strategies by varying the kind of model and prior knowledge. The current scientific challenges essentially revolve around scaling up to complex robots (e.g., humanoids), designing generic priors, and optimizing the computing time.

optimizer · Extensibility · dual problem · smoothness · INTERACT

December 1, 2017

Optimal Algorithms for Distributed Optimization

César A. Uribe,Soomin Lee,Alexander Gasnikov,Angelia Nedić

In this paper, we study the optimal convergence rate for distributed convex optimization problems in networks. We model the communication restrictions imposed by the network as a set of affine constraints and provide optimal complexity bounds for four different setups, namely when the function $F(\mathbf{x}) \triangleq \sum_{i=1}^{m} f_i(\mathbf{x})$ is strongly convex and smooth, strongly convex only, smooth only, or just convex. Our results show that Nesterov's accelerated gradient descent on the dual problem can be executed in a distributed manner and obtains the same optimal rates as in the centralized version of the problem (up to constant or logarithmic factors) with an additional cost related to the spectral gap of the interaction matrix. Finally, we discuss some extensions to the proposed setup such as proximal friendly functions, time-varying graphs, and improvement of the condition numbers.
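As background for the strongly convex and smooth setup, the following sketch runs the constant-momentum form of Nesterov's accelerated gradient descent on a small centralized quadratic. It does not implement the distributed dual scheme described in the abstract; the function name `nesterov_agd`, the test problem, and the parameter names `smoothness` and `strong_convexity` are assumptions made for this example.

```python
import math

def nesterov_agd(grad, x0, smoothness, strong_convexity, iters):
    """Nesterov's accelerated gradient descent for a function that is
    `smoothness`-smooth and `strong_convexity`-strongly convex."""
    kappa = smoothness / strong_convexity
    beta = (math.sqrt(kappa) - 1.0) / (math.sqrt(kappa) + 1.0)  # momentum weight
    x_prev = list(x0)
    x = list(x0)
    for _ in range(iters):
        # Extrapolate, then take a gradient step from the extrapolated point.
        y = [xi + beta * (xi - xpi) for xi, xpi in zip(x, x_prev)]
        g = grad(y)
        x_prev, x = x, [yi - gi / smoothness for yi, gi in zip(y, g)]
    return x

# Toy problem: f(v) = 0.5 * (100 * v[0]**2 + 1 * v[1]**2), minimized at the origin.
smoothness, strong_convexity = 100.0, 1.0

def grad(v):
    return [smoothness * v[0], strong_convexity * v[1]]

x_final = nesterov_agd(grad, [1.0, 1.0], smoothness, strong_convexity, iters=200)
print(x_final)
```

With condition number 100, the momentum weight is 9/11 and the error along the poorly conditioned direction contracts by roughly a factor of 0.9 per iteration, consistent with the square-root dependence on the condition number that accelerated methods are known for.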