亚洲男人的天堂2018av,欧美草比,久久久久久免费视频精选,国色天香在线看免费,久久久久亚洲av成人片仓井空

Neural ordinary differential equations (Neural ODEs) model continuous time dynamics as differential equations parametrized with neural networks. Thanks to their modeling flexibility, they have been adopted for multiple tasks where the continuous time nature of the process is specially relevant, as in system identification and time series analysis. When applied in a control setting, it is possible to adapt their use to approximate optimal nonlinear feedback policies. This formulation follows the same approach as policy gradients in reinforcement learning, covering the case where the environment consists of known deterministic dynamics given by a system of differential equations. The white box nature of the model specification allows the direct calculation of policy gradients through sensitivity analysis, avoiding the inexact and inefficient gradient estimation through sampling. In this work we propose the use of a neural control policy posed as a Neural ODE to solve general nonlinear optimal control problems while satisfying both state and control constraints, which are crucial for real world scenarios. Since the state feedback policy partially modifies the model dynamics, the whole space phase of the system is reshaped upon the optimization. This approach is a sensible approximation to the historically intractable closed loop solution of nonlinear control problems that efficiently exploits the availability of a dynamical system model.

相關內容

In modern graph analytics, the shortest path is a fundamental concept. Numerous \rrev{recent works} concentrate mostly on the distance of these shortest paths. Nevertheless, in the era of betweenness analysis, the counting of the shortest path between $s$ and $t$ is equally crucial. \rrev{It} is \rev{also} an important issue in the area of graph databases. In recent years, several studies have been conducted in an effort to tackle such issues. Nonetheless, the present technique faces a considerable barrier to parallel due to the dependencies in the index construction stage, hence limiting its application possibilities and wasting the potential hardware performance. To address this problem, we provide a parallel shortest path counting method that could avoid these dependencies and obtain approximately linear index time speedup as the number of threads increases. Our empirical evaluations verify the efficiency and effectiveness.

Information is often stored in a distributed and proprietary form, and agents who own information are often self-interested and require incentives to reveal their information. Suitable mechanisms are required to elicit and aggregate such distributed information for decision making. In this paper, we use simulations to investigate the use of decision markets as mechanisms in a multi-agent learning system to aggregate distributed information for decision-making in a contextual bandit problem. The system utilises strictly proper decision scoring rules to assess the accuracy of probabilistic reports from agents, which allows agents to learn to solve the contextual bandit problem jointly. Our simulations show that our multi-agent system with distributed information can be trained as efficiently as a centralised counterpart with a single agent that receives all information. Moreover, we use our system to investigate scenarios with deterministic decision scoring rules which are not incentive compatible. We observe the emergence of more complex dynamics with manipulative behaviour, which agrees with existing theoretical analyses.

Efficient motion planning algorithms are of central importance for deploying robots in the real world. Unfortunately, these algorithms often drastically reduce the dimensionality of the problem for the sake of feasibility, thereby foregoing optimal solutions. This limitation is most readily observed in agile robots, where the solution space can have multiple additional dimensions. Optimal control approaches partially solve this problem by finding optimal solutions without sacrificing the complexity of the environment, but do not meet the efficiency demands of real-world applications. This work proposes an approach to resolve these issues simultaneously by training a machine learning model on the outputs of an optimal control approach.

Variational Bayes methods are a scalable estimation approach for many complex state space models. However, existing methods exhibit a trade-off between accurate estimation and computational efficiency. This paper proposes a variational approximation that mitigates this trade-off. This approximation is based on importance densities that have been proposed in the context of efficient importance sampling. By directly conditioning on the observed data, the proposed method produces an accurate approximation to the exact posterior distribution. Because the steps required for its calibration are computationally efficient, the approach is faster than existing variational Bayes methods. The proposed method can be applied to any state space model that has a closed-form measurement density function and a state transition distribution that belongs to the exponential family of distributions. We illustrate the method in numerical experiments with stochastic volatility models and a macroeconomic empirical application using a high-dimensional state space model.

A key barrier to using reinforcement learning (RL) in many real-world applications is the requirement of a large number of system interactions to learn a good control policy. Off-policy and Offline RL methods have been proposed to reduce the number of interactions with the physical environment by learning control policies from historical data. However, their performances suffer from the lack of exploration and the distributional shifts in trajectories once controllers are updated. Moreover, most RL methods require that all states are directly observed, which is difficult to be attained in many settings. To overcome these challenges, we propose a trajectory generation algorithm, which adaptively generates new trajectories as if the system is being operated and explored under the updated control policies. Motivated by the fundamental lemma for linear systems, assuming sufficient excitation, we generate trajectories from linear combinations of historical trajectories. For linear feedback control, we prove that the algorithm generates trajectories with the exact distribution as if they are sampled from the real system using the updated control policy. In particular, the algorithm extends to systems where the states are not directly observed. Experiments show that the proposed method significantly reduces the number of sampled data needed for RL algorithms.

For basic machine learning problems, expected error is used to evaluate model performance. Since the distribution of data is usually unknown, we can make simple hypothesis that the data are sampled independently and identically distributed (i.i.d.) and the mean value of loss function is used as the empirical risk by Law of Large Numbers (LLN). This is known as the Monte Carlo method. However, when LLN is not applicable, such as imbalanced data problems, empirical risk will cause overfitting and might decrease robustness and generalization ability. Inspired by the framework of nonlinear expectation theory, we substitute the mean value of loss function with the maximum value of subgroup mean loss. We call it nonlinear Monte Carlo method. In order to use numerical method of optimization, we linearize and smooth the functional of maximum empirical risk and get the descent direction via quadratic programming. With the proposed method, we achieve better performance than SOTA backbone models with less training steps, and more robustness for basic regression and imbalanced classification tasks.

Adversarial Imitation Learning (AIL) is a class of popular state-of-the-art Imitation Learning algorithms commonly used in robotics. In AIL, an artificial adversary's misclassification is used as a reward signal that is optimized by any standard Reinforcement Learning (RL) algorithm. Unlike most RL settings, the reward in AIL is $differentiable$ but current model-free RL algorithms do not make use of this property to train a policy. The reward is AIL is also shaped since it comes from an adversary. We leverage the differentiability property of the shaped AIL reward function and formulate a class of Actor Residual Critic (ARC) RL algorithms. ARC algorithms draw a parallel to the standard Actor-Critic (AC) algorithms in RL literature and uses a residual critic, $C$ function (instead of the standard $Q$ function) to approximate only the discounted future return (excluding the immediate reward). ARC algorithms have similar convergence properties as the standard AC algorithms with the additional advantage that the gradient through the immediate reward is exact. For the discrete (tabular) case with finite states, actions, and known dynamics, we prove that policy iteration with $C$ function converges to an optimal policy. In the continuous case with function approximation and unknown dynamics, we experimentally show that ARC aided AIL outperforms standard AIL in simulated continuous-control and real robotic manipulation tasks. ARC algorithms are simple to implement and can be incorporated into any existing AIL implementation with an AC algorithm. Video and link to code are available at: //sites.google.com/view/actor-residual-critic.

Reinforcement Learning (RL) generally suffers from poor sample complexity, mostly due to the need to exhaustively explore the state space to find good policies. On the other hand, we postulate that expert knowledge of the system to control often allows us to design simple rules we expect good policies to follow at all times. In this work, we hence propose a simple yet effective modification of continuous actor-critic RL frameworks to incorporate such prior knowledge in the learned policies and constrain them to regions of the state space that are deemed interesting, thereby significantly accelerating their convergence. Concretely, we saturate the actions chosen by the agent if they do not comply with our intuition and, critically, modify the gradient update step of the policy to ensure the learning process does not suffer from the saturation step. On a room temperature control simulation case study, these modifications allow agents to converge to well-performing policies up to one order of magnitude faster than classical RL agents while retaining good final performance.

There are existing standard solvers for tackling discrete optimization problems. However, in practice, it is uncommon to apply them directly to the large input space typical of this class of problems. Rather, the input is preprocessed to look for simplifications and to extract the core subset of the problem space, which is called the Kernel. This pre-processing procedure is known in the context of parameterized complexity theory as Kernelization. In this thesis, I implement parallel versions of some Kernelization algorithms and evaluate their performance. The performance of Kernelization algorithms is measured either by the size of the output Kernel or by the time it takes to compute the kernel. Sometimes the Kernel is the same as the original input, so it is desirable to know this, as soon as possible. The problem scope is limited to a particular type of discrete optimisation problem which is a version of the K-clique problem in which nodes of the given graph are pre-coloured legally using k colours. The final evaluation shows that my parallel implementations achieve over 50% improvement in efficiency for at least one of these algorithms. This is attained not just in terms of speed, but it is also able to produce a smaller kernel.

In large-scale systems there are fundamental challenges when centralised techniques are used for task allocation. The number of interactions is limited by resource constraints such as on computation, storage, and network communication. We can increase scalability by implementing the system as a distributed task-allocation system, sharing tasks across many agents. However, this also increases the resource cost of communications and synchronisation, and is difficult to scale. In this paper we present four algorithms to solve these problems. The combination of these algorithms enable each agent to improve their task allocation strategy through reinforcement learning, while changing how much they explore the system in response to how optimal they believe their current strategy is, given their past experience. We focus on distributed agent systems where the agents' behaviours are constrained by resource usage limits, limiting agents to local rather than system-wide knowledge. We evaluate these algorithms in a simulated environment where agents are given a task composed of multiple subtasks that must be allocated to other agents with differing capabilities, to then carry out those tasks. We also simulate real-life system effects such as networking instability. Our solution is shown to solve the task allocation problem to 6.7% of the theoretical optimal within the system configurations considered. It provides 5x better performance recovery over no-knowledge retention approaches when system connectivity is impacted, and is tested against systems up to 100 agents with less than a 9% impact on the algorithms' performance.

北京阿比特科技有限公司