
Recently there has been a lot of interest in introducing UAVs for a wide range of applications, making the safety of multi-vehicle systems a crucial problem. Hamilton-Jacobi (HJ) reachability is a promising tool for analyzing the safety of vehicles in low-dimensional systems. However, reachability suffers from the curse of dimensionality, making its direct application to more than two vehicles intractable. Recent works have made it tractable to guarantee safety for 3 and 4 vehicles with reachability. However, the number of vehicles for which safety can be guaranteed remains small. In this paper, we propose a novel reachability-based approach that guarantees safety for any number of vehicles while the vehicles efficiently complete their objectives of visiting multiple targets, given any K-vehicle collision avoidance algorithm, where K can in general be a small number. We achieve this by developing an approach to group vehicles into clusters efficiently and a control strategy that guarantees safety for any in-cluster and cross-cluster pair of vehicles for all time. Our proposed method scales to a large number of vehicles with little computational overhead. We demonstrate our proposed approach with a simulation of 15 vehicles. In addition, we contribute a more general solution to the 3-vehicle collision avoidance problem from a recent work, show that the prior work is a special case of our proposed generalization, and prove its validity.

Related content

The curse of dimensionality refers to various phenomena that arise when analyzing and organizing data in high-dimensional spaces and that do not occur in low-dimensional settings (such as the three-dimensional physical space of everyday experience).
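One way to make this concrete is to count the cells of a uniform grid as the number of dimensions grows. The following is a minimal sketch, not taken from any of the papers listed here; the function names, the 100-points-per-axis resolution, and the 4-byte value size are illustrative assumptions.

```python
def grid_points(points_per_dim: int, dims: int) -> int:
    """Number of nodes in a uniform grid with the same resolution on every axis."""
    return points_per_dim ** dims


def grid_memory_gib(points_per_dim: int, dims: int, bytes_per_value: int = 4) -> float:
    """Memory in GiB needed to store one value per grid node.

    bytes_per_value=4 assumes single-precision floats (an illustrative choice).
    """
    return grid_points(points_per_dim, dims) * bytes_per_value / 2**30


if __name__ == "__main__":
    # Each added dimension multiplies the node count by points_per_dim.
    for dims in (2, 3, 4, 6):
        print(dims, grid_points(100, dims), f"{grid_memory_gib(100, dims):.4f} GiB")
```

Under these assumptions, going from 3 to 6 dimensions raises the node count from one million to one trillion, which is why grid-based methods become impractical as the state dimension grows.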

As the demand for autonomous mobile robots has increased in recent years, a path planning/navigation algorithm should not be content with reaching the target without collisions, but should also try to find an optimal or suboptimal path from the initial position to the target subject to the robot's practical constraints. This report investigates path planning and control strategies for mobile robots with machine learning techniques, including ground mobile robots and flying UAVs. In this report, the hybrid reactive collision-free navigation problem in an unknown static environment is investigated first. By combining reactive navigation with Q-learning, we intend to keep the good characteristics of both and overcome the shortcomings of relying on only one of them. The proposed method is then extended to 3D environments. The performance of these strategies is verified by extensive computer simulations, and good results are obtained. Furthermore, the more challenging dynamic environment is taken into consideration. We tackle this problem by developing a new path planning method that utilizes an integrated environment representation and reinforcement learning. Our novel approach efficiently finds the optimal path to the target and avoids collisions in a cluttered environment with stationary and moving obstacles. The performance of these methods is compared across several aspects.

Jointly achieving safety and efficiency in human-robot interaction (HRI) settings is a challenging problem, as the robot's planning objectives may be at odds with the human's own intent and expectations. Recent approaches ensure safe robot operation in uncertain environments through a supervisory control scheme, sometimes called "shielding", which overrides the robot's nominal plan with a safety fallback strategy when a safety-critical event is imminent. These reactive "last-resort" strategies (typically in the form of aggressive emergency maneuvers) focus on preserving safety without efficiency considerations; when the nominal planner is unaware of possible safety overrides, shielding can be activated more frequently than necessary, leading to degraded performance. In this work, we propose a new shielding-based planning approach that allows the robot to plan efficiently by explicitly accounting for possible future shielding events. Leveraging recent work on Bayesian human motion prediction, the resulting robot policy proactively balances nominal performance with the risk of high-cost emergency maneuvers triggered by low-probability human behaviors. We formalize Shielding-Aware Robust Planning (SHARP) as a stochastic optimal control problem and propose a computationally efficient framework for finding tractable approximate solutions at runtime. Our method outperforms the shielding-agnostic motion planning baseline (equipped with the same human intent inference scheme) on simulated driving examples with human trajectories taken from the recently released Waymo Open Motion Dataset.

This paper proposes a deep learning approach to channel sensing and downlink hybrid beamforming for massive multiple-input multiple-output systems operating in the time division duplex mode and employing either single-carrier or multi-carrier transmission. The conventional precoding design involves estimating the high dimensional channel and designing the precoders based on such estimate. This two-step process is, however, not necessarily optimal. This paper shows that by training the analog sensing and designing the hybrid downlink precoders directly from the received pilots without the intermediate high-dimensional channel estimation, the overall system performance can be significantly improved. However, the direct approach that simultaneously designs the hybrid precoders is difficult to train and only works for a fixed number of users. In this paper, we develop a simplified semi-direct approach that enjoys most of the advantages of the direct design while eliminating its drawbacks. Specifically, the proposed approach learns the uplink sensing stage and downlink analog precoder using deep learning and designs the digital precoder based on an estimate of the low-dimensional equivalent channel. Numerical comparisons show that the proposed methodology requires significantly less training overhead than the conventional strategy and further demonstrate its generalizability to various different system settings.

This paper presents Learning-based Autonomous Guidance with RObustness and Stability guarantees (LAG-ROS), which provides machine learning-based nonlinear motion planners with formal robustness and stability guarantees, by designing a differential Lyapunov function using contraction theory. LAG-ROS utilizes a neural network to model a robust tracking controller independently of a target trajectory, for which we show that the Euclidean distance between the target and controlled trajectories is exponentially bounded linearly in the learning error, even under the existence of bounded external disturbances. We also present a convex optimization approach that minimizes the steady-state bound of the tracking error to construct the robust control law for neural network training. In numerical simulations, it is demonstrated that the proposed method indeed possesses superior properties of robustness and nonlinear stability resulting from contraction theory, whilst retaining the computational efficiency of existing learning-based motion planners.

Competent multi-lane cruising requires using lane changes and within-lane maneuvers to achieve good speed and maintain safety. This paper proposes a design for autonomous multi-lane cruising by combining a hierarchical reinforcement learning framework with a novel state-action space abstraction. While the proposed solution follows the classical hierarchy of behavior decision, motion planning and control, it introduces a key intermediate abstraction within the motion planner to discretize the state-action space according to high level behavioral decisions. We argue that this design allows principled modular extension of motion planning, in contrast to using either monolithic behavior cloning or a large set of hand-written rules. Moreover, we demonstrate that our state-action space abstraction allows transferring of the trained models without retraining from a simulated environment with virtually no dynamics to one with significantly more realistic dynamics. Together, these results suggest that our proposed hierarchical architecture is a promising way to allow reinforcement learning to be applied to complex multi-lane cruising in the real world.

Motion planning under uncertainty is one of the main challenges in developing autonomous driving vehicles. In this work, we focus on the uncertainty in sensing and perception resulting from a limited field of view, occlusions, and sensing range. This problem is often tackled by considering hypothetical hidden objects in occluded areas or beyond the sensing range to guarantee passive safety. However, this may result in conservative planning and expensive computation, particularly when numerous hypothetical objects need to be considered. We propose a reinforcement learning (RL) based solution to manage uncertainty by optimizing for the worst-case outcome. This approach is in contrast to traditional RL, where the agents try to maximize the average expected reward. The proposed approach is built on top of distributional RL, with its policy optimization maximizing the lower bound of the stochastic outcomes. This modification can be applied to a range of RL algorithms. As a proof of concept, the approach is applied to two different RL algorithms, Soft Actor-Critic and DQN. The approach is evaluated on two challenging scenarios: pedestrians crossing with occlusion, and curved roads with a limited field of view. The algorithm is trained and evaluated using the SUMO traffic simulator. The proposed approach yields much better motion planning behavior than conventional RL algorithms and behaves comparably to a human driving style.
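The difference between maximizing the mean return and maximizing a lower bound can be illustrated with a toy action-selection rule. This is a minimal sketch, not the paper's algorithm: the empirical quantile used as the "lower bound", the `alpha` parameter, the action names, and the sample returns are all hypothetical.

```python
from statistics import mean


def lower_bound(samples, alpha=0.1):
    """Empirical alpha-quantile of the return samples (assumed stand-in for a lower bound)."""
    ordered = sorted(samples)
    index = int(alpha * (len(ordered) - 1))
    return ordered[index]


def select_action(return_samples, alpha=0.1, risk_averse=True):
    """Pick the action with the best lower bound (risk-averse) or the best mean."""
    if risk_averse:
        score = lambda samples: lower_bound(samples, alpha)
    else:
        score = mean
    return max(return_samples, key=lambda action: score(return_samples[action]))


# Hypothetical return samples for two actions at an occluded crossing.
returns = {
    "proceed": [10, 10, 10, 10, 10, 10, 10, 10, 10, -30],  # usually good, rarely very bad
    "yield": [5, 5, 5, 5, 5, 5, 5, 5, 5, 5],               # consistently moderate
}

if __name__ == "__main__":
    print(select_action(returns, risk_averse=False))  # mean-based choice
    print(select_action(returns, risk_averse=True))   # lower-bound-based choice
```

With these made-up samples, the mean-based rule prefers "proceed" (mean 6 versus 5), while the lower-bound rule prefers "yield" because the worst sampled outcome of "proceed" is -30.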

Quadrotors are among the most agile flying robots. However, planning time-optimal trajectories at the actuation limit through multiple waypoints remains an open problem. This is crucial for applications such as inspection, delivery, search and rescue, and drone racing. Early works used polynomial trajectory formulations, which do not exploit the full actuator potential because of their inherent smoothness. Recent works resorted to numerical optimization but require waypoints to be allocated as costs or constraints at specific discrete times. However, this time allocation is a priori unknown and renders previous works incapable of producing truly time-optimal trajectories. To generate truly time-optimal trajectories, we propose a solution to the time allocation problem while exploiting the quadrotor's full actuator potential. We achieve this by introducing a formulation of progress along the trajectory, which enables the simultaneous optimization of the time allocation and the trajectory itself. We compare our method against related approaches and validate it in real-world flights in one of the world's largest motion-capture systems, where we outperform human expert drone pilots in a drone-racing task.

Retrosynthetic planning is a fundamental problem in chemistry for finding a pathway of reactions to synthesize a target molecule. Recently, search algorithms have shown promising results for solving this problem by using deep neural networks (DNNs) to expand their candidate solutions, i.e., adding new reactions to reaction pathways. However, the existing works on this line are suboptimal; the retrosynthetic planning problem requires the reaction pathways to be (a) represented by real-world reactions and (b) executable using "building block" molecules, yet the DNNs expand reaction pathways without fully incorporating such requirements. Motivated by this, we propose an end-to-end framework for directly training the DNNs towards generating reaction pathways with the desirable properties. Our main idea is based on a self-improving procedure that trains the model to imitate successful trajectories found by itself. We also propose a novel reaction augmentation scheme based on a forward reaction model. Our experiments demonstrate that our scheme significantly improves the success rate of solving the retrosynthetic problem from 86.84% to 96.32% while maintaining the performance of DNN for predicting valid reactions.

Many important real-world problems have action spaces that are high-dimensional, continuous or both, making full enumeration of all possible actions infeasible. Instead, only small subsets of actions can be sampled for the purpose of policy evaluation and improvement. In this paper, we propose a general framework to reason in a principled way about policy evaluation and improvement over such sampled action subsets. This sample-based policy iteration framework can in principle be applied to any reinforcement learning algorithm based upon policy iteration. Concretely, we propose Sampled MuZero, an extension of the MuZero algorithm that is able to learn in domains with arbitrarily complex action spaces by planning over sampled actions. We demonstrate this approach on the classical board game of Go and on two continuous control benchmark domains: DeepMind Control Suite and Real-World RL Suite.

Detection of malicious behavior is a fundamental problem in security. One of the major challenges in using detection systems in practice is in dealing with an overwhelming number of alerts that are triggered by normal behavior (the so-called false positives), obscuring alerts resulting from actual malicious activity. While numerous methods for reducing the scope of this issue have been proposed, ultimately one must still decide how to prioritize which alerts to investigate, and most existing prioritization methods are heuristic, for example, based on suspiciousness or priority scores. We introduce a novel approach for computing a policy for prioritizing alerts using adversarial reinforcement learning. Our approach assumes that the attackers know the full state of the detection system and dynamically choose an optimal attack as a function of this state, as well as of the alert prioritization policy. The first step of our approach is to capture the interaction between the defender and attacker in a game theoretic model. To tackle the computational complexity of solving this game to obtain a dynamic stochastic alert prioritization policy, we propose an adversarial reinforcement learning framework. In this framework, we use neural reinforcement learning to compute best response policies for both the defender and the adversary to an arbitrary stochastic policy of the other. We then use these in a double-oracle framework to obtain an approximate equilibrium of the game, which in turn yields a robust stochastic policy for the defender. Extensive experiments using case studies in fraud and intrusion detection demonstrate that our approach is effective in creating robust alert prioritization policies.
