
Sample-based trajectory optimisers are a promising tool for the control of robots with non-differentiable dynamics and cost functions. Contemporary approaches derive from a restricted subclass of stochastic optimal control problems in which the optimal policy can be expressed in terms of an expectation over stochastic paths. By estimating this expectation with Monte Carlo sampling and reinterpreting the process noise as exploration noise, one obtains a stochastic search algorithm tailored to (deterministic) trajectory optimisation. For future algorithmic development, it is essential to understand the theoretical foundations that allow such methods to be derived in a principled manner. In this paper we make a connection between entropy regularisation in optimisation and deterministic optimal control. We then show that the optimal policy is given by a belief function rather than a deterministic function. The policy belief is governed by a Bayesian-type update in which the likelihood can be expressed as a conditional expectation over paths induced by a prior policy. Our theoretical investigation firmly roots sample-based trajectory optimisation in the larger family of control-as-inference methods. It allows us to justify a number of heuristics that are common in the literature and to motivate a number of new improvements that benefit convergence.
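
To make the sampling-based update concrete, the following is a minimal MPPI-style sketch in Python with numpy. It is a generic instance of this family of methods, not the algorithm derived in the paper; the dynamics f, stage cost function, temperature lam, and noise scale sigma are illustrative placeholders.

    import numpy as np

    def mppi_step(f, cost, u_nom, x0, n_samples=256, lam=1.0, sigma=0.5):
        """One update of a generic MPPI-style sample-based optimiser.

        f(x, u) -> next state; cost(x, u) -> scalar stage cost.
        u_nom: (T, m) nominal control sequence (the 'prior policy').
        """
        T, m = u_nom.shape
        eps = sigma * np.random.randn(n_samples, T, m)   # exploration noise
        costs = np.zeros(n_samples)
        for k in range(n_samples):
            x = x0
            for t in range(T):
                u = u_nom[t] + eps[k, t]
                costs[k] += cost(x, u)
                x = f(x, u)
        # Exponential weighting is the Monte Carlo estimate of the
        # path-expectation defining the updated policy; subtracting the
        # minimum cost is only for numerical stability.
        w = np.exp(-(costs - costs.min()) / lam)
        w /= w.sum()
        return u_nom + np.einsum('k,ktm->tm', w, eps)    # weighted noise update

Iterating this step shifts the nominal controls toward low-cost paths, which is exactly the "exploration noise" reinterpretation of the expectation described above.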

Related content

The travel time functional measures the time taken for a particle trajectory to travel from a given initial position to the boundary of the domain. Evaluating it is paramount in the post-closure safety assessment of deep geological storage facilities for radioactive waste, where leaked, non-sorbing solutes can be transported to the surface of the site by the surrounding groundwater. The accurate simulation of this transport can be attained using standard dual-weighted-residual techniques to derive goal-oriented \emph{a posteriori} error bounds. This work provides a key ingredient for obtaining a suitable error estimate for the travel time functional: the evaluation of its G\^ateaux derivative. A mixed finite element method is implemented to approximate Darcy's equations, and numerical experiments are presented to test the performance of the proposed error estimator. In particular, we consider a test case inspired by the Sellafield site located in Cumbria, UK.
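
For orientation, the functional can be written as a hitting time of the advective particle trajectory (a schematic form; the paper's precise setting may differ). With Darcy flux $\mathbf{u}$, porosity $\phi$, and domain $\Omega$,
\[
\dot{\mathbf{x}}(t) = \frac{\mathbf{u}(\mathbf{x}(t))}{\phi(\mathbf{x}(t))}, \qquad \mathbf{x}(0) = \mathbf{x}_0, \qquad T(\mathbf{u}) = \inf\{\, t > 0 : \mathbf{x}(t) \in \partial\Omega \,\},
\]
so $T$ depends on the velocity field only implicitly, through the trajectory it induces, which is what makes the evaluation of its G\^ateaux derivative the delicate step in the dual-weighted-residual machinery.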

The growth of deep reinforcement learning (RL) has brought multiple exciting tools and methods to the field. This rapid expansion makes it important to understand the interplay between individual elements of the RL toolbox. We approach this task from an empirical perspective by conducting a study in the continuous control setting. We present multiple insights of a fundamental nature, including: averaging multiple actors trained from the same data boosts performance; existing methods are unstable across training runs, epochs of training, and evaluation runs; commonly used additive action noise is not required for effective training; a strategy based on posterior sampling explores better than approximated UCB combined with the weighted Bellman backup; the weighted Bellman backup alone cannot replace clipped double Q-learning; and the critics' initialization plays a major role in ensemble-based actor-critic exploration. Building on these insights, we show how existing tools can be brought together in a novel way, giving rise to the Ensemble Deep Deterministic Policy Gradients (ED2) method, which yields state-of-the-art results on continuous control tasks from OpenAI Gym MuJoCo. On the practical side, ED2 is conceptually straightforward, easy to code, and does not require knowledge outside of the existing RL toolbox.
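
The two ensemble mechanisms highlighted above can be sketched as follows (a toy illustration with actors as plain callables; this is not the ED2 implementation): evaluation averages the actors' actions, while exploration commits to a single randomly drawn ensemble member per episode, in the spirit of posterior sampling and in place of additive action noise.

    import numpy as np

    class ActorEnsemble:
        """Toy ensemble of deterministic actors (each a callable state -> action)."""

        def __init__(self, actors, rng=None):
            self.actors = actors
            self.rng = np.random.default_rng() if rng is None else rng
            self.current = None

        def begin_episode(self):
            # Posterior-sampling-style exploration: commit to one ensemble
            # member for the whole episode instead of adding action noise.
            self.current = self.rng.integers(len(self.actors))

        def explore(self, state):
            return self.actors[self.current](state)

        def evaluate(self, state):
            # Evaluation uses the average of all actors' actions.
            return np.mean([a(state) for a in self.actors], axis=0)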

We present the Path Integral Sampler~(PIS), a novel algorithm to draw samples from unnormalized probability density functions. PIS is built on the Schr\"odinger bridge problem, which aims to recover the most likely evolution of a diffusion process given its initial and terminal distributions. PIS draws samples from the initial distribution and then propagates them through the Schr\"odinger bridge to reach the terminal distribution. Applying the Girsanov theorem with a simple prior diffusion, we formulate PIS as a stochastic optimal control problem whose running cost is the control energy and whose terminal cost is chosen according to the target distribution. By modeling the control as a neural network, we establish a sampling algorithm that can be trained end-to-end. We provide theoretical justification of the sampling quality of PIS in terms of the Wasserstein distance when sub-optimal control is used. Moreover, path integral theory is used to compute importance weights of the samples to compensate for the bias induced by the sub-optimality of the controller and by time discretization. We experimentally demonstrate the advantages of PIS compared with other state-of-the-art sampling methods on a variety of tasks.
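
Schematically, with a learned drift the sampler and its Girsanov-based importance weight can be simulated as below (a numpy sketch of the general construction; the control u, target log-density, and step count are placeholders, not the paper's trained network).

    import numpy as np

    def pis_sample(u, log_target, dim, T=1.0, n_steps=100, rng=None):
        """Euler-Maruyama simulation of dX = u(X,t) dt + dW from X_0 = 0,
        returning the terminal sample and its log importance weight.

        u(x, t) -> drift (the control); log_target(x) -> unnormalised log density.
        """
        rng = np.random.default_rng() if rng is None else rng
        dt = T / n_steps
        x = np.zeros(dim)
        girsanov = 0.0
        for i in range(n_steps):
            drift = u(x, i * dt)
            noise = np.sqrt(dt) * rng.standard_normal(dim)
            # Girsanov log-density ratio: control energy plus the
            # stochastic integral of the drift against the noise.
            girsanov += 0.5 * np.dot(drift, drift) * dt + np.dot(drift, noise)
            x = x + drift * dt + noise
        # Under zero control the terminal law is N(0, T*I).
        log_prior = -0.5 * np.dot(x, x) / T - 0.5 * dim * np.log(2 * np.pi * T)
        log_w = log_target(x) - log_prior - girsanov
        return x, log_w

Averaging test functions of the samples with (self-normalised) weights exp(log_w) corrects for both the sub-optimality of the control and the discretisation bias, which is the role the path integral weights play above.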

We study a general class of entropy-regularized multi-variate LQG mean field games (MFGs) in continuous time with $K$ distinct sub-populations of agents. We extend the notion of actions to action distributions (exploratory actions) and explicitly derive the optimal action distributions for individual agents in the limiting MFG. We demonstrate that the optimal set of action distributions yields an $\epsilon$-Nash equilibrium for the finite-population entropy-regularized MFG. Furthermore, we compare the resulting solutions with those of classical LQG MFGs and establish the equivalence of their existence.
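
As a one-dimensional point of reference (a sketch of the known exploratory LQ result, not the multi-variate MFG formulas derived in the paper): with control penalty $R>0$ and entropy temperature $\lambda>0$, pointwise maximization of the entropy-regularized objective over action distributions yields a Gibbs form that is Gaussian,
\[
\pi^{*}(a \mid x) \propto \exp\!\Big(\tfrac{1}{\lambda}\, Q(x,a)\Big), \quad Q(x,a) = -R\,a^{2} + (\text{terms linear in } a) \;\Longrightarrow\; \pi^{*}(\cdot \mid x) = \mathcal{N}\!\Big(a^{\mathrm{LQ}}(x),\ \tfrac{\lambda}{2R}\Big),
\]
so the mean coincides with the classical LQ feedback $a^{\mathrm{LQ}}(x)$ while the entropy term injects state-independent Gaussian exploration whose variance scales with the temperature.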

We consider pessimistic bilevel stochastic programs in which the follower maximizes over a fixed compact convex set a strictly convex quadratic function, whose Hessian depends on the leader's decision. The resulting random variable is evaluated by a convex risk measure. Under assumptions including real analyticity of the lower-level goal function, we prove existence of optimal solutions. We discuss an alternate model where the leader hedges against optimal lower-level solutions, and show that in this case solvability can be guaranteed under weaker conditions both in a deterministic and in a stochastic setting. The approach is applied to a mechanical shape optimization problem in which the leader decides on an optimal material distribution to minimize a tracking-type cost functional, whereas the follower chooses forces from an admissible set to maximize a compliance objective. The material distribution is considered to be stochastically perturbed in the actual construction phase. Computational results illustrate the bilevel optimization concept and demonstrate the interplay of follower and leader in shape design and testing.
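
Schematically (notation illustrative, not the paper's), the pessimistic model reads
\[
\min_{x \in X} \ \mathcal{R}\Big[\ \sup_{y \in \Psi(x,\xi)} F(x, y, \xi)\ \Big], \qquad \Psi(x,\xi) = \operatorname*{arg\,max}_{y \in Y} \ \tfrac{1}{2}\, y^{\top} H(x,\xi)\, y + c(\xi)^{\top} y,
\]
where $Y$ is the follower's fixed compact convex set, $H(x,\xi) \succ 0$ makes the lower-level objective strictly convex (so its maximizers lie on the boundary of $Y$ and need not be unique), and $\mathcal{R}$ is a convex risk measure; pessimism enters through the inner supremum over the lower-level solution set.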

This work recasts time-dependent optimal control problems governed by partial differential equations in a Dynamic Mode Decomposition with control framework. Since the numerical solution of such problems requires considerable computational effort, we rely on this data-driven technique, using both solution and desired-state measurements to extract the underlying system dynamics. After constructing the Dynamic Mode Decomposition operators, we reconstruct and predict all the variables of interest at a lower computational cost than the standard space-time discretized models. We test the methodology, in terms of relative reconstruction and prediction errors, on a boundary control for a Graetz flow and on a distributed control with Stokes constraints.
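
The regression at the core of Dynamic Mode Decomposition with control can be sketched in a few lines of numpy (a standard DMDc least-squares fit; as described above, the paper additionally feeds desired-state measurements through the same pipeline).

    import numpy as np

    def dmdc(X, Xp, U):
        """Fit x_{k+1} ~= A x_k + B u_k from snapshot data.

        X  : (n, m) states x_0 .. x_{m-1}
        Xp : (n, m) shifted states x_1 .. x_m
        U  : (q, m) inputs u_0 .. u_{m-1}
        """
        Omega = np.vstack([X, U])                 # stacked data matrix
        G = Xp @ np.linalg.pinv(Omega)            # least-squares operator [A B]
        n = X.shape[0]
        return G[:, :n], G[:, n:]                 # A, B

    def predict(A, B, x0, U):
        """Roll the identified model forward under an input sequence."""
        xs = [x0]
        for u in U.T:
            xs.append(A @ xs[-1] + B @ u)
        return np.column_stack(xs)

Once A and B are identified from data, future predictions reduce to cheap matrix-vector products, which is the source of the computational savings over the full space-time discretization.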

In this work we consider the well-known Secretary Problem: $n$ elements, each with an adversarially chosen value, arrive one by one in some random order, and the goal is to choose the highest-value element. Decisions are made online and are irrevocable: once the algorithm decides, based on the previously observed values, whether or not to choose the currently seen element, it cannot change this decision later. The measure of success is the probability of selecting the highest-value element, minimized over all adversarial assignments of values. We show existential and constructive upper bounds on the achievable success probability in this problem as a function of the entropy of the random arrival order, including the lowest possible entropy, $O(\log\log n)$, for which the success probability can still be constant. We show that below the entropy level $\mathcal{H}=0.5\log\log n$, all algorithms succeed with probability $0$ when the random order is selected uniformly at random from some subset of permutations, while we can construct in polynomial time a non-uniform distribution with entropy $\mathcal{H}$ yielding success probability at least $\Omega\left(\frac{1}{(\log\log n +3\log\log\log n -\mathcal{H})^{2+\epsilon}}\right)$, for any constant $\epsilon>0$. We also prove that no algorithm using entropy $\mathcal{H}=O((\log\log n)^a)$ can improve our result by more than a polynomial factor, for any constant $0<a<1$. For entropy $\log\log n$ and larger, our analysis precisely quantifies both the multiplicative and the additive approximation of the success probability. In particular, we improve on the best previously known additive approximation guarantee for the secretary problem by more than a doubly exponential factor.
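
For context, below is the classical threshold rule, which attains success probability tending to $1/e$ under a uniformly random (maximum-entropy) arrival order; the low-entropy regimes studied above require different, distribution-aware algorithms.

    import math
    import random

    def classical_secretary(values):
        """Observe the first ~n/e elements, then accept the first element
        beating all previously seen ones; succeeds with probability -> 1/e."""
        n = len(values)
        k = max(1, round(n / math.e))             # observation phase length
        best_seen = max(values[:k])
        for i in range(k, n):
            if values[i] > best_seen:
                return i                          # irrevocable acceptance
        return n - 1                              # forced to take the last one

    # Quick empirical check under a uniformly random arrival order.
    n, trials = 100, 10_000
    wins = 0
    for _ in range(trials):
        order = random.sample(range(n), n)        # values 0..n-1, random order
        if order[classical_secretary(order)] == n - 1:
            wins += 1
    print(wins / trials)                          # close to 1/e ~ 0.368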

With the rapid expansion of graphs and networks and the growing magnitude of data from all areas of science, effective treatment and compression schemes for context-dependent data are highly desirable. A particularly interesting direction is to compress the data while keeping only the "structural information" and ignoring the concrete labelings. In this direction, Choi and Szpankowski introduced structures (unlabeled graphs), which allowed them to compute the structural entropy of the Erd\H{o}s--R\'enyi random graph model. They also provided an asymptotically optimal compression algorithm that (asymptotically) achieves this entropy limit and runs in linear time in expectation. In this paper, we consider Stochastic Block Models with an arbitrary number of parts. We define a partitioned structural entropy for Stochastic Block Models, which generalizes the structural entropy for unlabeled graphs and encodes the partition information as well. We then compute the partitioned structural entropy of the Stochastic Block Models and provide a compression scheme that asymptotically achieves this entropy limit.
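
For reference, the Erd\H{o}s--R\'enyi baseline being generalized here (as computed by Choi and Szpankowski, stated without the precise error terms): for $G(n,p)$ with $p$ in the regime where the graph is asymmetric with high probability,
\[
H(S) = \binom{n}{2} h(p) - \log_2 n! + o(1), \qquad h(p) = -p \log_2 p - (1-p)\log_2 (1-p),
\]
since the $n!$ labelings of an asymmetric graph collapse into a single structure. The partitioned structural entropy replaces the single $h(p)$ term by block-wise edge entropies and restricts the collapsed relabelings to permutations preserving the partition.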

This paper studies optimal motion planning subject to motion and environment uncertainties. By modeling the system as a probabilistic labeled Markov decision process (PL-MDP), the control objective is to synthesize a finite-memory policy under which the agent satisfies high-level complex tasks expressed in linear temporal logic (LTL) with a desired satisfaction probability. In particular, we consider the cost optimization of trajectories that satisfy infinite-horizon tasks and analyze the trade-off between reducing the expected mean cost and maximizing the probability of task satisfaction. Instead of traditional Rabin automata, the LTL formulas are converted to limit-deterministic B\"uchi automata (LDBA), which have a more straightforward accepting condition and a more compact graph structure. The novelty of this work lies in considering cases in which LTL specifications may be infeasible, and in developing a relaxed product MDP between the PL-MDP and the LDBA. The relaxed product MDP allows the agent to revise its motion plan whenever the task is not fully feasible and to quantify the degree of violation of the revised plan. A multi-objective optimization problem is then formulated to jointly consider the probability of task satisfaction, the violation of the original task constraints, and the implementation cost of the policy execution; it is solved via coupled linear programs. To the best of our knowledge, this is the first work that bridges the gap between plan revision and optimal control synthesis of both the plan prefix and the plan suffix of the agent trajectory over the infinite horizon. Experimental results are provided to demonstrate the effectiveness of the proposed framework.
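
Schematically (a sketch of the trade-off only, not the paper's exact coupled-LP formulation), the synthesis over policies $\pi$ of the relaxed product MDP balances three terms,
\[
\min_{\pi} \ \mathbb{E}^{\pi}[\,\text{mean cost}\,] + \beta\, \mathbb{E}^{\pi}[\,\text{violation}\,] - \gamma\, \Pr{}^{\pi}[\,\text{LDBA acceptance}\,],
\]
where $\beta$ weights deviation from the original (possibly infeasible) LTL constraints and $\gamma$ rewards the probability of satisfying the accepting condition.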

The estimation of information measures of continuous distributions based on samples is a fundamental problem in statistics and machine learning. In this paper, we analyze estimates of differential entropy in $K$-dimensional Euclidean space, computed from a finite number of samples, when the probability density function belongs to a predetermined convex family $\mathcal{P}$. First, estimating differential entropy to any accuracy is shown to be infeasible if the differential entropy of densities in $\mathcal{P}$ is unbounded, clearly showing the necessity of additional assumptions. Subsequently, we investigate sufficient conditions that enable confidence bounds for the estimation of differential entropy. In particular, we provide confidence bounds for simple histogram-based estimation of differential entropy from a fixed number of samples, assuming that the probability density function is Lipschitz continuous with known Lipschitz constant and known, bounded support. Our focus is on differential entropy, but we provide examples showing that similar results hold for mutual information and relative entropy as well.
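
A minimal sketch of a histogram-based estimator of the kind analyzed above, for dimension $K=1$ (the support and bin count are illustrative; the paper's confidence bounds tie the bin width to the Lipschitz constant and the sample size):

    import numpy as np

    def histogram_entropy(samples, support=(0.0, 1.0), n_bins=32):
        """Histogram-based differential entropy estimate (in nats).

        Assumes the density has known bounded support; h is the bin width.
        Estimate: -sum_i p_i * log(p_i / h), with p_i the empirical bin mass.
        """
        counts, edges = np.histogram(samples, bins=n_bins, range=support)
        h = edges[1] - edges[0]                     # bin width
        p = counts / counts.sum()                   # empirical bin probabilities
        nz = p > 0                                  # empty bins contribute 0
        return -np.sum(p[nz] * np.log(p[nz] / h))

    # Uniform on [0,1] has differential entropy 0; the estimate should be near 0.
    rng = np.random.default_rng(0)
    print(histogram_entropy(rng.uniform(size=10_000)))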
