
Recent developments in Deep Reinforcement Learning have demonstrated the superior performance of neural networks in solving challenging problems with large or even continuous state spaces. One specific approach is to deploy neural networks to approximate value functions by minimising the Mean Squared Bellman Error. Despite the great successes of Deep Reinforcement Learning, the development of reliable and efficient numerical algorithms to minimise the Bellman Error remains of great scientific interest and practical demand. This challenge is partially due to the underlying optimisation problem being highly non-convex, or to the use of incorrect gradient information, as is done in Semi-Gradient algorithms. In this work, we analyse the Mean Squared Bellman Error from a smooth optimisation perspective combined with a Residual Gradient formulation. Our contribution is two-fold. First, we analyse critical points of the error function and provide technical insights on the optimisation procedure and design choices for neural networks. When the existence of global minima is assumed and the objective fulfils certain conditions, we can rule out suboptimal local minima when using over-parametrised neural networks. Based on this analysis, we construct an efficient Approximate Newton's algorithm and numerically confirm its theoretical properties, such as local quadratic convergence to a global minimum. Second, we demonstrate the feasibility and generalisation capabilities of the proposed algorithm empirically on continuous control problems and provide a numerical verification of our critical point analysis. We also outline the shortcomings of Semi-Gradient methods: to benefit from an approximate Newton's algorithm, the complete derivatives of the Mean Squared Bellman Error must be considered during training.
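To make the Semi-Gradient versus Residual-Gradient distinction concrete, here is a minimal sketch (not the paper's implementation; all names such as `V`, `msbe_loss`, and the batch fields are illustrative) of how the two treatments of the Mean Squared Bellman Error differ in PyTorch:

```python
# Minimal sketch: semi-gradient vs residual-gradient treatment of the
# Mean Squared Bellman Error. Names and shapes are illustrative.
import torch

def msbe_loss(V, s, r, s_next, gamma=0.99, semi_gradient=False):
    """Mean Squared Bellman Error for a state-value network V."""
    target = r + gamma * V(s_next).squeeze(-1)
    if semi_gradient:
        # Semi-gradient: the bootstrap target is treated as a constant,
        # so its dependence on the network parameters is ignored.
        target = target.detach()
    # Residual gradient: gradients flow through BOTH V(s) and V(s_next),
    # i.e. the complete derivative of the Bellman error is used.
    return ((target - V(s).squeeze(-1)) ** 2).mean()

# Toy usage with a small network and random transitions.
V = torch.nn.Sequential(torch.nn.Linear(4, 64), torch.nn.Tanh(),
                        torch.nn.Linear(64, 1))
s, s_next = torch.randn(32, 4), torch.randn(32, 4)
r = torch.randn(32)
loss = msbe_loss(V, s, r, s_next, semi_gradient=False)  # residual gradient
loss.backward()  # complete derivatives, as an approximate Newton step requires
```

The single `detach()` call is the entire difference: with it, the loss is no longer the true MSBE gradient, which is exactly the incorrect gradient information the abstract refers to.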

Related Content

Neural Networks is the archival journal of the world's three oldest neural modelling societies: the International Neural Network Society (INNS), the European Neural Network Society (ENNS), and the Japanese Neural Network Society (JNNS). Neural Networks provides a forum to develop and nurture an international community of scholars and practitioners interested in all aspects of neural networks and related approaches to computational intelligence. Neural Networks welcomes submissions of high-quality papers contributing to the full range of neural network research, from behavioural and brain modelling and learning algorithms, through mathematical and computational analysis, to systems engineering and technological applications that make substantial use of neural network concepts and techniques. This uniquely broad scope promotes the exchange of ideas between biological and technological research and helps foster the development of an interdisciplinary community interested in biologically inspired computational intelligence. Accordingly, the Neural Networks editorial board represents expertise in psychology, neurobiology, computer science, engineering, mathematics, and physics. The journal publishes articles, letters, and reviews, as well as letters to the editor, editorials, current events, software surveys, and patent information. Articles are published in one of five sections: cognitive science, neuroscience, learning systems, mathematical and computational analysis, and engineering and applications. Official website:

In recent years, Bi-Level Optimization (BLO) techniques have received extensive attention from both the learning and vision communities. A variety of BLO models arising in complex and practical tasks have a non-convex follower structure in nature (a.k.a. without Lower-Level Convexity, LLC for short). However, this challenging class of BLOs lacks both efficient solution strategies and solid theoretical guarantees. In this work, we propose a new algorithmic framework, named Initialization Auxiliary and Pessimistic Trajectory Truncated Gradient Method (IAPTT-GM), to partially address the above issues. In particular, by introducing an auxiliary variable as initialization to guide the optimization dynamics and designing a pessimistic trajectory truncation operation, we construct a reliable approximate version of the original BLO in the absence of the LLC hypothesis. Our theoretical investigations establish the convergence of solutions returned by IAPTT-GM towards those of the original BLO without LLC. As an additional bonus, we also theoretically justify the quality of our IAPTT-GM embedded with Nesterov's accelerated dynamics under LLC. The experimental results confirm both the convergence of our algorithm in the absence of LLC and the theoretical findings under LLC.
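The two ingredients in the name can be illustrated with a short sketch (hypothetical toy objectives, not the authors' code): an auxiliary initialization for the inner iterates, and a pessimistic truncation that backpropagates through the inner iterate attaining the worst upper-level value.

```python
# Illustrative sketch of the IAPTT-GM ingredients on a toy bilevel problem.
# F, g, step sizes, and iteration counts are all assumptions for exposition.
import torch

def F(x, y):  # upper-level objective (toy)
    return ((y - 1.0) ** 2).sum() + 0.1 * (x ** 2).sum()

def g(x, y):  # lower-level objective, possibly non-convex in y
    return ((y - x) ** 2).sum() + 0.5 * torch.cos(y).sum()

x = torch.zeros(2, requires_grad=True)   # upper-level variable
z = torch.zeros(2, requires_grad=True)   # auxiliary initialization
alpha, K = 0.1, 20

y = z
trajectory = [y]
for _ in range(K):  # unrolled inner gradient descent, started at z
    grad_y, = torch.autograd.grad(g(x, y), y, create_graph=True)
    y = y - alpha * grad_y
    trajectory.append(y)

# Pessimistic truncation: stop at the iterate with the LARGEST upper-level value.
k_star = max(range(len(trajectory)), key=lambda k: F(x, trajectory[k]).item())
loss = F(x, trajectory[k_star])
loss.backward()  # hypergradients w.r.t. x and the auxiliary initialization z
print(k_star, x.grad, z.grad)
```

The truncation index is recomputed at every outer step, which is what makes the resulting approximation reliable without assuming the inner problem is convex.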

In this paper, the problem of state estimation, in the context of both filtering and smoothing, for nonlinear state-space models is considered. Due to the nonlinear nature of the models, the state estimation problem is generally intractable, as it involves integrals of general nonlinear functions and the filtered and smoothed state distributions lack closed-form solutions. As such, it is common to approximate the state estimation problem. In this paper, we develop an assumed Gaussian solution based on variational inference, which offers the key advantage of a flexible, but principled, mechanism for approximating the required distributions. Our main contribution lies in a new formulation of the state estimation problem as an optimisation problem, which can then be solved using standard optimisation routines that employ exact first- and second-order derivatives. The resulting state estimation approach involves a minimal number of assumptions and applies directly to nonlinear systems with both Gaussian and non-Gaussian probabilistic models. The performance of our approach is demonstrated on several examples: a challenging scalar system, a model of a simple robotic system, and a target tracking problem using a von Mises-Fisher distribution; it outperforms alternative assumed Gaussian approaches to state estimation.
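The core idea, fitting Gaussian parameters by numerically optimising a variational objective, can be sketched on a single scalar measurement update. Note the paper uses deterministic optimisation with exact derivatives; the toy below substitutes a Monte Carlo ELBO with a stochastic optimiser, and the model (`h`, noise levels) is an assumption for illustration.

```python
# Toy sketch of assumed-Gaussian variational inference for one measurement
# update: fit q(x) = N(m, e^{2s}) by maximising a reparameterised ELBO.
import torch

torch.manual_seed(0)
mu0, P0 = 0.0, 1.0          # Gaussian prior p(x)
h = torch.tanh              # nonlinear measurement function (illustrative)
R = 0.1                     # measurement noise variance
y = torch.tensor(0.4)       # observed measurement

m = torch.tensor(0.0, requires_grad=True)   # variational mean
s = torch.tensor(0.0, requires_grad=True)   # log std of q(x)
opt = torch.optim.Adam([m, s], lr=0.05)

for _ in range(500):
    eps = torch.randn(256)
    x = m + torch.exp(s) * eps               # reparameterised samples from q
    log_prior = -0.5 * (x - mu0) ** 2 / P0
    log_lik = -0.5 * (y - h(x)) ** 2 / R
    entropy = s                              # Gaussian entropy up to a constant
    elbo = (log_prior + log_lik).mean() + entropy
    opt.zero_grad(); (-elbo).backward(); opt.step()

print(float(m), float(torch.exp(s)) ** 2)    # approximate posterior mean/variance
```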

We address the non-convex optimisation problem of finding a sparse matrix on the Stiefel manifold (matrices with mutually orthogonal columns of unit length) that maximises (or minimises) a quadratic objective function. Optimisation problems on the Stiefel manifold occur for example in spectral relaxations of various combinatorial problems, such as graph matching, clustering, or permutation synchronisation. Although sparsity is a desirable property in such settings, it is mostly neglected in spectral formulations since existing solvers, e.g. based on eigenvalue decomposition, are unable to account for sparsity while at the same time maintaining global optimality guarantees. We fill this gap and propose a simple yet effective sparsity-promoting modification of the Orthogonal Iteration algorithm for finding the dominant eigenspace of a matrix. By doing so, we can guarantee that our method finds a Stiefel matrix that is globally optimal with respect to the quadratic objective function, while in addition being sparse. As a motivating application we consider the task of permutation synchronisation, which can be understood as a constrained clustering problem that has particular relevance for matching multiple images or 3D shapes in computer vision, computer graphics, and beyond. We demonstrate that the proposed approach outperforms previous methods in this domain.
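The baseline being modified is classical Orthogonal Iteration. A hedged sketch follows: the sparsity-promoting step shown here (soft-thresholding followed by re-orthonormalisation) is one plausible instantiation chosen for illustration, not necessarily the modification proposed in the paper.

```python
# Orthogonal Iteration with an illustrative sparsity-promoting step.
import numpy as np

def sparse_orthogonal_iteration(A, k, tau=0.05, iters=200, seed=0):
    """Approximate the dominant k-dimensional eigenspace of symmetric A
    with a Stiefel matrix U (U^T U = I), nudged towards sparsity."""
    rng = np.random.default_rng(seed)
    U, _ = np.linalg.qr(rng.standard_normal((A.shape[0], k)))
    for _ in range(iters):
        Z = A @ U                                          # power step
        Z = np.sign(Z) * np.maximum(np.abs(Z) - tau, 0.0)  # soft-threshold
        U, _ = np.linalg.qr(Z)                             # back to the Stiefel manifold
    return U

A = np.random.default_rng(1).standard_normal((50, 50))
A = A + A.T                                                # symmetric test matrix
U = sparse_orthogonal_iteration(A, k=3)
print(np.allclose(U.T @ U, np.eye(3), atol=1e-8))          # orthonormal columns
```

The QR step after thresholding is what keeps every iterate on the Stiefel manifold, which is the property the paper's global optimality argument relies on.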

Two novel numerical estimators are proposed for solving forward-backward stochastic differential equations (FBSDEs) appearing in the Feynman-Kac representation of the value function in stochastic optimal control problems. In contrast to the current numerical approaches which are based on the discretization of the continuous-time FBSDE, we propose a converse approach, namely, we obtain a discrete-time approximation of the on-policy value function, and then we derive a discrete-time estimator that resembles the continuous-time counterpart. The proposed approach allows for the construction of higher accuracy estimators along with error analysis. The approach is applied to the policy improvement step in reinforcement learning. Numerical results and error analysis are demonstrated using (i) a scalar nonlinear stochastic optimal control problem and (ii) a four-dimensional linear quadratic regulator (LQR) problem. The proposed estimators show significant improvement in terms of accuracy in both cases over Euler-Maruyama-based estimators used in competing approaches. In the case of LQR problems, we demonstrate that our estimators result in near machine-precision level accuracy, in contrast to previously proposed methods that can potentially diverge on the same problems.
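For orientation, the kind of Euler-Maruyama-based baseline the proposed estimators are compared against can be sketched as a plain Monte Carlo value estimator; the dynamics and costs below are illustrative assumptions, not the paper's benchmark problems.

```python
# Euler-Maruyama Monte Carlo estimate of an on-policy value function
# V(x0) = E[ int_0^T c(x_t) dt + Phi(x_T) ] for a scalar SDE.
import numpy as np

def em_value_estimate(x0, f, sigma, c, Phi, T=1.0, dt=0.01, n_paths=10_000, seed=0):
    rng = np.random.default_rng(seed)
    n_steps = int(T / dt)
    x = np.full(n_paths, x0, dtype=float)
    running_cost = np.zeros(n_paths)
    for _ in range(n_steps):
        running_cost += c(x) * dt                         # accumulate running cost
        x += f(x) * dt + sigma * np.sqrt(dt) * rng.standard_normal(n_paths)
    return (running_cost + Phi(x)).mean()

# Scalar example: dx = -x dt + 0.5 dW with quadratic running/terminal costs.
v = em_value_estimate(x0=1.0, f=lambda x: -x, sigma=0.5,
                      c=lambda x: x**2, Phi=lambda x: x**2)
print(v)
```

The abstract's point is that such discretise-then-estimate schemes carry an O(dt) bias; the proposed converse construction is designed to remove most of it.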

This paper is concerned with error estimates of the fully discrete generalized finite element method (GFEM) with optimal local approximation spaces for solving elliptic problems with heterogeneous coefficients. The local approximation spaces are constructed using eigenvectors of local eigenvalue problems solved by the finite element method on some sufficiently fine mesh with mesh size $h$. The error bound of the discrete GFEM approximation is proved to converge as $h\rightarrow 0$ towards that of the continuous GFEM approximation, which was shown to decay nearly exponentially in previous works. Moreover, even for fixed mesh size $h$, a nearly exponential rate of convergence of the local approximation errors with respect to the dimension of the local spaces is established. An efficient and accurate method for solving the discrete eigenvalue problems is proposed by incorporating the discrete $A$-harmonic constraint directly into the eigensolver. Numerical experiments are carried out to confirm the theoretical results and to demonstrate the effectiveness of the method.
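The computational kernel of this construction is a discrete generalised eigenvalue problem on each local subdomain, whose leading eigenvectors span the local approximation space. A hedged sketch (random SPD stand-ins for the local FEM matrices; the paper's discrete $A$-harmonic constraint is omitted here):

```python
# Local generalised eigenproblem A v = lambda B v; leading eigenvectors
# form a local approximation space of dimension n_vec.
import numpy as np
from scipy.linalg import eigh

rng = np.random.default_rng(0)
n, n_vec = 200, 10
M = rng.standard_normal((n, n))
A = M @ M.T + n * np.eye(n)        # SPD stand-in for a local stiffness matrix
N_ = rng.standard_normal((n, n))
B = N_ @ N_.T + n * np.eye(n)      # SPD stand-in for the local Gram matrix

# Largest n_vec eigenpairs of the pencil (A, B).
vals, vecs = eigh(A, B, subset_by_index=[n - n_vec, n - 1])
print(vals.shape, vecs.shape)      # (10,), (200, 10)
```

The nearly exponential decay claimed in the abstract is in `n_vec`: enlarging the local space shrinks the local approximation error at a nearly exponential rate.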

In this paper, we establish the almost sure convergence of two-timescale stochastic gradient descent algorithms in continuous time under general noise and stability conditions, extending well-known results in discrete time. We analyse algorithms with additive noise and those with non-additive noise. In the non-additive case, our analysis is carried out under the assumption that the noise is a continuous-time Markov process, controlled by the algorithm states. The algorithms we consider can be applied to a broad class of bilevel optimisation problems. We study one such problem in detail, namely, the problem of joint online parameter estimation and optimal sensor placement for a partially observed diffusion process. We demonstrate how this can be formulated as a bilevel optimisation problem, and propose a solution in the form of a continuous-time, two-timescale, stochastic gradient descent algorithm. Furthermore, under suitable conditions on the latent signal, the filter, and the filter derivatives, we establish almost sure convergence of the online parameter estimates and optimal sensor placements to the stationary points of the asymptotic log-likelihood and asymptotic filter covariance, respectively. We also provide numerical examples, illustrating the application of the proposed methodology to a partially observed Bene\v{s} equation, and a partially observed stochastic advection-diffusion equation.
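The two-timescale mechanism is easiest to see in a discrete-time Euler sketch of such a scheme on a toy bilevel problem (objectives, step-size exponents, and noise are illustrative assumptions): the fast variable tracks the inner minimiser while the slow variable descends the outer objective evaluated at the fast variable.

```python
# Discrete-time Euler sketch of two-timescale stochastic gradient descent.
# Inner: g(x, y) = (y - x)^2, so y*(x) = x (and dy*/dx = 1, used below).
# Outer: f(x) = (y*(x) - 1)^2 + 0.1 x^2, with y standing in for y*(x).
import numpy as np

rng = np.random.default_rng(0)
x, y = 2.0, 0.0
for k in range(1, 500_000):
    beta = 0.5 / k ** 0.9        # slow step size
    gamma = 0.5 / k ** 0.6       # fast step size: beta/gamma -> 0
    y -= gamma * (2 * (y - x) + rng.standard_normal())           # fast update
    x -= beta * (2 * (y - 1) + 0.2 * x + rng.standard_normal())  # slow update
print(x, y)   # x -> 10/11 ~ 0.909, and y tracks x
```

The step-size condition beta_k/gamma_k -> 0 is the discrete analogue of the timescale separation that the continuous-time analysis formalises.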

This letter is concerned with solving continuous-discrete Gaussian smoothing problems using the Taylor moment expansion (TME) scheme. In the proposed smoothing method, we apply the TME method to approximate the transition density of the stochastic differential equation in the dynamic model. Furthermore, we derive a theoretical error bound (in the mean square sense) for the TME smoothing estimates, showing that the smoother is stable under weak assumptions. Numerical experiments are presented to illustrate the practical use of the method.
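As a sketch of what a TME transition-density approximation looks like, consider a second-order expansion (TME-2) for a scalar SDE dx = f(x) dt + L dW with constant diffusion L. Iterating the generator A phi = f phi' + (L^2/2) phi'' on phi(x) = x and phi(x) = x^2 gives the conditional moments below; the example SDE is an assumption for illustration.

```python
# TME-2 conditional mean and variance for dx = f(x) dt + L dW, constant L.
import numpy as np

def tme2_moments(x, dt, f, fp, fpp, L):
    """Second-order Taylor moment expansion of the transition density.
    fp, fpp are the first and second derivatives of the drift f."""
    fx, fpx, fppx = f(x), fp(x), fpp(x)
    mean = x + fx * dt + 0.5 * dt**2 * (fx * fpx + 0.5 * L**2 * fppx)
    var = L**2 * dt + L**2 * fpx * dt**2
    return mean, var

# Example: dx = -sin(x) dt + 0.5 dW.
m, v = tme2_moments(x=1.0, dt=0.01,
                    f=lambda x: -np.sin(x),
                    fp=lambda x: -np.cos(x),
                    fpp=lambda x: np.sin(x),
                    L=0.5)
print(m, v)
```

These Gaussian moment approximations are what the smoother propagates in place of the intractable exact transition density.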

The difficulty in specifying rewards for many real-world problems has led to an increased focus on learning rewards from human feedback, such as demonstrations. However, there are often many different reward functions that explain the human feedback, leaving agents with uncertainty over what the true reward function is. While most policy optimization approaches handle this uncertainty by optimizing for expected performance, many applications demand risk-averse behavior. We derive a novel policy gradient-style robust optimization approach, PG-BROIL, that optimizes a soft-robust objective that balances expected performance and risk. To the best of our knowledge, PG-BROIL is the first policy optimization algorithm robust to a distribution of reward hypotheses which can scale to continuous MDPs. Results suggest that PG-BROIL can produce a family of behaviors ranging from risk-neutral to risk-averse and outperforms state-of-the-art imitation learning algorithms when learning from ambiguous demonstrations by hedging against uncertainty, rather than seeking to uniquely identify the demonstrator's reward function.
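The soft-robust objective being optimised blends expected performance with Conditional Value at Risk over the reward hypotheses. Here is an illustrative computation of that blend for a finite hypothesis set (not the authors' code; `returns`, `probs`, `lam`, and `alpha` are assumed toy inputs):

```python
# Soft-robust objective: lam * E[V] + (1 - lam) * CVaR_alpha[V] over a
# finite distribution of reward hypotheses.
import numpy as np

def soft_robust_objective(returns, probs, lam=0.5, alpha=0.9):
    expected = np.dot(probs, returns)
    # CVaR_alpha: expected return over the worst (1 - alpha) probability mass.
    order = np.argsort(returns)                  # worst hypotheses first
    tail_mass = 1.0 - alpha
    tail, remaining = 0.0, tail_mass
    for i in order:
        w = min(probs[i], remaining)
        tail += w * returns[i]
        remaining -= w
        if remaining <= 0:
            break
    cvar = tail / tail_mass
    return lam * expected + (1.0 - lam) * cvar

returns = np.array([1.0, 0.2, -0.5, 0.8])        # returns under 4 reward hypotheses
probs = np.array([0.4, 0.3, 0.2, 0.1])
print(soft_robust_objective(returns, probs))
```

Setting lam = 1 recovers risk-neutral expected-performance optimization, while lam = 0 gives fully risk-averse behavior; sweeping lam produces the family of behaviors the abstract describes.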

Interpreting the training of Deep Neural Networks (DNNs) as an optimal control problem with nonlinear dynamical systems has received considerable attention recently, yet algorithmic development remains relatively limited. In this work, we take a step along this line by reformulating the training procedure from the trajectory optimization perspective. We first show that most widely-used algorithms for training DNNs can be linked to Differential Dynamic Programming (DDP), a celebrated second-order trajectory optimization algorithm rooted in Approximate Dynamic Programming. In this vein, we propose a new variant of DDP that can accept batch optimization for training feedforward networks, while integrating naturally with recent progress in curvature approximation. The resulting algorithm features layer-wise feedback policies which improve the convergence rate and reduce sensitivity to hyper-parameters compared with existing methods. We show that the algorithm is competitive against state-of-the-art first- and second-order methods. Our work opens up new avenues for principled algorithmic design built upon optimal control theory.
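For readers unfamiliar with DDP, a schematic backward step in its generic trajectory-optimisation form is sketched below (standard textbook recursion, not the paper's batch variant). In the optimal-control view of training, each layer is a stage, the activations are the state, the weights are the controls, and the gains K yield the layer-wise feedback policies mentioned above.

```python
# One generic DDP backward step: given local quadratic expansions of the
# dynamics (fx, fu) and cost (lx, lu, lxx, luu, lux) plus the value expansion
# (Vx, Vxx) from the next stage, compute gains and the updated value expansion.
import numpy as np

def ddp_backward(fx, fu, lx, lu, lxx, luu, lux, Vx, Vxx, reg=1e-6):
    Qx = lx + fx.T @ Vx
    Qu = lu + fu.T @ Vx
    Qxx = lxx + fx.T @ Vxx @ fx
    Quu = luu + fu.T @ Vxx @ fu + reg * np.eye(fu.shape[1])
    Qux = lux + fu.T @ Vxx @ fx
    k = -np.linalg.solve(Quu, Qu)      # feedforward (open-loop) correction
    K = -np.linalg.solve(Quu, Qux)     # feedback gain: the layer-wise policy
    Vx_new = Qx + K.T @ Quu @ k + K.T @ Qu + Qux.T @ k
    Vxx_new = Qxx + K.T @ Quu @ K + K.T @ Qux + Qux.T @ K
    return k, K, Vx_new, Vxx_new

# Toy usage with random expansions (state dim 4, control dim 2).
n, m_ = 4, 2
rng = np.random.default_rng(0)
k, K, Vx, Vxx = ddp_backward(rng.standard_normal((n, n)), rng.standard_normal((n, m_)),
                             np.zeros(n), np.zeros(m_), np.eye(n), np.eye(m_),
                             np.zeros((m_, n)), np.zeros(n), np.eye(n))
print(K.shape)  # (2, 4)
```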

In order to avoid the curse of dimensionality frequently encountered in Big Data analysis, there has been vast development in the field of linear and nonlinear dimension reduction techniques in recent years. These techniques (sometimes referred to as manifold learning) assume that the scattered input data lies on a lower-dimensional manifold, so the high dimensionality problem can be overcome by learning the lower-dimensional behavior. However, in real-life applications, data is often very noisy. In this work, we propose a method to approximate $\mathcal{M}$, a $d$-dimensional $C^{m+1}$ smooth submanifold of $\mathbb{R}^n$ ($d \ll n$), based upon noisy scattered data points (i.e., a data cloud). We assume that the data points are located "near" the lower-dimensional manifold and suggest a non-linear moving least-squares projection onto an approximating $d$-dimensional manifold. Under some mild assumptions, the resulting approximant is shown to be infinitely smooth and of high approximation order (i.e., $O(h^{m+1})$, where $h$ is the fill distance and $m$ is the degree of the local polynomial approximation). The method presented here assumes no analytic knowledge of the approximated manifold, and the approximation algorithm is linear in the large dimension $n$. Furthermore, the approximating manifold can serve as a framework for performing operations directly on the high-dimensional data in a computationally efficient manner. In this way, the preparatory step of dimension reduction, which induces distortions in the data, can be avoided altogether.
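A moving least-squares projection of this kind can be sketched in its simplest form, degree m = 1, where the local fit reduces to a Gaussian-weighted PCA plane (the method's higher-degree local polynomial stage is omitted here, and the data set and bandwidth are assumptions for illustration):

```python
# MLS-style projection, degree-1 sketch: project a query point q onto the
# local weighted-PCA plane fitted from its neighbourhood in the point cloud X.
import numpy as np

def mls_linear_projection(q, X, d, h):
    dists = np.linalg.norm(X - q, axis=1)
    w = np.exp(-(dists / h) ** 2)                    # local Gaussian weights
    mu = (w[:, None] * X).sum(0) / w.sum()           # weighted centroid
    C = (w[:, None] * (X - mu)).T @ (X - mu)         # weighted covariance
    _, vecs = np.linalg.eigh(C)
    T = vecs[:, -d:]                                 # d leading directions: tangent
    return mu + T @ (T.T @ (q - mu))                 # project q onto the plane

# Noisy samples of a circle (d = 1) embedded in R^3.
rng = np.random.default_rng(0)
t = rng.uniform(0, 2 * np.pi, 2000)
X = np.stack([np.cos(t), np.sin(t), np.zeros_like(t)], 1) \
    + 0.02 * rng.standard_normal((2000, 3))
q = np.array([1.1, 0.1, 0.05])
print(mls_linear_projection(q, X, d=1, h=0.3))       # lands near the circle
```

Every step above costs O(n) in the ambient dimension per data point, which reflects the abstract's claim that the algorithm is linear in the large dimension $n$.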
