Time-optimal motion planning of autonomous vehicles in complex environments is a highly researched topic. This paper describes a novel approach to optimize and execute locally feasible trajectories for the maneuvering of a truck-trailer Autonomous Mobile Robot (AMR), by dividing the environment into a sequence, or route, of freely accessible overlapping corridors. Multi-stage optimal control generates local trajectories through advancing subsets of this route. To cope with the advancing subsets and changing environments, the optimal control problem is solved online with a receding horizon in a Model Predictive Control (MPC) fashion with an improved update strategy. This strategy seamlessly integrates the computationally expensive MPC updates with a low-cost feedback controller for trajectory tracking, disturbance rejection, and stabilization of the unstable kinematics of the reversing truck-trailer AMR. The methodology is implemented in a flexible software framework for an effortless transition from offline simulations to experimental deployment. An experimental setup in which the truck-trailer AMR performs two reverse parking maneuvers validates the presented method.
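A minimal, self-contained sketch of the update strategy's two-rate structure follows: an expensive re-plan runs at a low rate, while a cheap feedback law tracks the latest reference in between. This is a hedged illustration only; a double integrator stands in for the truck-trailer kinematics, a straight-line re-plan stands in for the corridor-based multi-stage OCP, and all names (replan, control_loop, the gains) are hypothetical.

```python
import numpy as np

def replan(x, goal, horizon=50):
    """Stand-in for the corridor-based multi-stage OCP: a straight-line reference."""
    return np.linspace(x[:2], goal, horizon)            # (horizon, 2) waypoints

def control_loop(x0, goal, dt=0.01, replan_every=20, kp=4.0, kd=2.0):
    x = np.array(x0, dtype=float)                       # state: [px, py, vx, vy]
    ref = replan(x, goal)
    for step in range(2000):
        if step % replan_every == 0:                    # low-rate, "expensive" MPC update
            ref = replan(x, goal)
        target = ref[min(10, len(ref) - 1)]             # look-ahead point on the reference
        u = kp * (target - x[:2]) - kd * x[2:]          # high-rate PD tracking feedback
        x[:2] += dt * x[2:]                             # integrate double-integrator dynamics
        x[2:] += dt * u
        if np.linalg.norm(x[:2] - goal) < 1e-2:         # goal reached
            break
    return x

print(control_loop([0.0, 0.0, 0.0, 0.0], np.array([1.0, 2.0])))
```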
Autonomous agents that drive on roads shared with human drivers must reason about the nuanced interactions among traffic participants. This poses a highly challenging decision-making problem, since human behavior is influenced by a multitude of factors (e.g., human intentions and emotions) that are hard to model. This paper presents a decision-making approach for autonomous driving, focusing on the complex task of merging into moving traffic, where uncertainty emanates from the behavior of other drivers and imperfect sensor measurements. We frame the problem as a partially observable Markov decision process (POMDP) and solve it online with Monte Carlo tree search. The solution to the POMDP is a policy that performs high-level driving maneuvers, such as giving way to an approaching car, keeping a safe distance from the vehicle in front, or merging into traffic. Our method leverages a model learned from data to predict the future states of traffic while explicitly accounting for interactions among the surrounding agents. From these predictions, the autonomous vehicle can anticipate the future consequences of its actions on the environment and optimize its trajectory accordingly. We thoroughly test our approach in simulation, showing that the autonomous vehicle can adapt its behavior to different situations. We also compare against other methods, demonstrating an improvement with respect to the considered performance metrics.
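To make the online-planning idea concrete, here is a hedged sketch of Monte Carlo tree search over high-level maneuvers. The three actions mirror those named above, but the toy generative model simulate, the (gap, speed) state, rewards, and constants are illustrative assumptions, not the paper's learned interaction model.

```python
import math

ACTIONS = ["give_way", "keep_distance", "merge"]

def simulate(state, action):
    """Toy generative model returning (next_state, reward); a stand-in for the
    learned, interaction-aware traffic model."""
    gap, speed = state
    if action == "merge":
        return (gap, speed), (1.0 if gap > 2.0 else -5.0)   # unsafe merge is punished
    if action == "give_way":
        return (gap + speed * 0.5, speed), -0.1             # gap grows while waiting
    return (gap + 0.1, speed), -0.05                        # keep_distance

def mcts(root_state, n_iter=2000, depth=5, c=1.4, gamma=0.95):
    N, Q = {}, {}                                           # visit counts, value estimates
    def rollout(state, d):
        if d == 0:
            return 0.0
        best, best_ucb = None, -float("inf")
        total = sum(N.get((state, b), 0) for b in ACTIONS)
        for a in ACTIONS:                                   # UCB1 action selection
            n = N.get((state, a), 0)
            ucb = float("inf") if n == 0 else (
                Q[(state, a)] + c * math.sqrt(math.log(total + 1) / n))
            if ucb > best_ucb:
                best, best_ucb = a, ucb
        nxt, r = simulate(state, best)
        ret = r + gamma * rollout(nxt, d - 1)
        key = (state, best)
        N[key] = N.get(key, 0) + 1                          # incremental mean update
        Q[key] = Q.get(key, 0.0) + (ret - Q.get(key, 0.0)) / N[key]
        return ret
    for _ in range(n_iter):
        rollout(root_state, depth)
    return max(ACTIONS, key=lambda a: Q.get((root_state, a), -float("inf")))

print(mcts((1.0, 2.0)))   # small gap: the planner should give way before merging
```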
Thanks to their added convenience, safety advantages, and potential commercial value, intelligent vehicles (IVs) have attracted wide attention throughout the world. Although a few autonomous driving unicorns assert that IVs will be commercially deployable by 2025, their implementation remains restricted to small-scale validation due to various issues, among which the precise computation of control commands or trajectories by planning methods remains a prerequisite for IVs. This paper aims to review state-of-the-art planning methods, including pipeline planning and end-to-end planning methods. For pipeline methods, a survey of algorithm selection is provided, along with a discussion of the expansion and optimization mechanisms, whereas for end-to-end methods, the training approaches and verification scenarios of driving tasks are the points of concern. Experimental platforms are reviewed to help readers select suitable training and validation methods. Finally, the current challenges and future directions are discussed. The side-by-side comparison presented in this survey not only helps to gain insights into the strengths and limitations of the reviewed methods but also assists with system-level design choices.
Under ray-optical light transport, the classical ray serves as a local and linear "point query" of light's behaviour. Such point queries are useful, and sophisticated path tracing and sampling techniques enable efficient computation of solutions to light transport problems in complex, real-world settings and environments. However, such formulations are firmly confined to the realm of ray optics, while many applications of interest, in computer graphics and computational optics, demand a more precise understanding of light. We rigorously formulate the generalized ray, which enables local and linear point queries of the wave-optical phase space. Furthermore, we present sample-solve: a simple method that serves as a novel link between path tracing and computational optics. We show that this link enables the application of modern path tracing techniques for wave-optical rendering, improving upon the state of the art in terms of the generality and accuracy of the formalism, ease of application, and performance. Sampling using generalized rays enables interactive rendering under rigorous wave optics, with orders-of-magnitude faster performance compared to existing techniques.
Learning various motor skills for quadrupedal robots is a challenging problem that requires careful design of task-specific mathematical models or reward descriptions. In this work, we propose to learn a single capable policy using deep reinforcement learning by imitating a large number of reference motions, including walking, turning, pacing, jumping, sitting, and lying. On top of the existing motion imitation framework, we first carefully design the observation space, the action space, and the reward function to improve the scalability of the learning as well as the robustness of the final policy. In addition, we introduce a novel adaptive motion sampling (AMS) method, which maintains a balance between successful and unsuccessful behaviors. This technique allows the learning algorithm to focus on challenging motor skills and avoid catastrophic forgetting. We demonstrate that the learned policy can exhibit diverse behaviors in simulation by successfully tracking both the training dataset and out-of-distribution trajectories. We also validate the importance of the proposed learning formulation and the adaptive motion sampling scheme with dedicated experiments.
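As a hedged illustration of what an adaptive motion sampling scheme can look like, the sketch below samples reference motions in proportion to how poorly the policy currently tracks them, while keeping nonzero mass on mastered motions to avoid forgetting. The EMA update and softmax weighting are assumptions, not the paper's exact AMS rule.

```python
import numpy as np

class AdaptiveMotionSampler:
    def __init__(self, n_motions, alpha=0.05, temperature=2.0):
        self.success = np.full(n_motions, 0.5)   # running success estimate per motion
        self.alpha = alpha                       # EMA step size
        self.temperature = temperature           # preference sharpness for hard motions

    def sample(self, rng=np.random):
        # Probability grows as the success rate shrinks: focus on hard motions,
        # but every motion keeps nonzero mass to avoid catastrophic forgetting.
        logits = -self.temperature * self.success
        p = np.exp(logits - logits.max())
        p /= p.sum()
        return rng.choice(len(p), p=p)

    def update(self, motion_id, succeeded):
        self.success[motion_id] += self.alpha * (float(succeeded) - self.success[motion_id])

sampler = AdaptiveMotionSampler(n_motions=6)     # walk, turn, pace, jump, sit, lie
for _ in range(1000):
    m = sampler.sample()
    succeeded = np.random.rand() < (0.9 if m < 3 else 0.4)   # pretend jumps are harder
    sampler.update(m, succeeded)
print(np.round(sampler.success, 2))              # hard motions end up sampled more often
```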
For a self-driving car to operate reliably, its perceptual system must generalize to the end-user's environment -- ideally without additional annotation effort. One potential solution is to leverage unlabeled data (e.g., unlabeled LiDAR point clouds) collected from the end-users' environments (i.e., the target domain) to adapt the system to the differences between the training and testing environments. While extensive research has been done on this unsupervised domain adaptation problem, one fundamental problem lingers: there is no reliable signal in the target domain to supervise the adaptation process. To overcome this issue, we observe that it is easy to collect unsupervised data from multiple traversals of repeated routes. While this departs from conventional unsupervised domain adaptation, the assumption is extremely realistic, since many drivers share the same roads. We show that this simple additional assumption is sufficient to obtain a potent signal that allows us to perform iterative self-training of 3D object detectors on the target domain. Concretely, we generate pseudo-labels with the out-of-domain detector but reduce false positives by removing detections of supposedly mobile objects that are persistent across traversals. Further, we reduce false negatives by encouraging predictions in regions that are not persistent. We experiment with our approach on two large-scale driving datasets and show remarkable improvement in 3D object detection of cars, pedestrians, and cyclists, bringing us a step closer to generalizable autonomous driving.
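A minimal sketch of the persistence-based filtering idea follows: detections of mobile classes that sit in regions whose geometry persists across traversals are treated as likely false positives. The grid-based persistence map, box format, and threshold below are illustrative assumptions, not the paper's exact pipeline.

```python
import numpy as np

def filter_pseudo_labels(boxes, persistence_map, thresh=0.8):
    """boxes: (N, 4) [x, y, w, l] in grid coordinates; persistence_map: (H, W)
    scores in [0, 1], high where geometry is unchanged across traversals."""
    keep = []
    for i, (x, y, w, l) in enumerate(boxes.astype(int)):
        region = persistence_map[y:y + l, x:x + w]
        if region.size == 0 or region.mean() < thresh:  # non-persistent: plausibly mobile
            keep.append(i)
    return keep

rng = np.random.default_rng(0)
persistence = rng.random((100, 100)) * 0.5              # mostly non-persistent scene
persistence[5:40, 5:40] = 0.95                          # static structure across traversals
boxes = np.array([[10, 10, 4, 8], [60, 60, 4, 8]])
print(filter_pseudo_labels(boxes, persistence))         # -> [1]: first box is removed
```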
Motion planning methods like navigation functions and harmonic potential fields provide (almost) global convergence and are suitable for obstacle avoidance in dynamically changing environments due to their reactive nature. A common assumption in the control design is that the robot operates in a disjoint star world, i.e., all obstacles are strictly starshaped and mutually disjoint. However, in real-life scenarios obstacles may intersect due to expanded obstacle regions corresponding to the robot radius or safety margins. To broaden the applicability of the aforementioned reactive motion planning methods, we propose a method to reshape a workspace of intersecting obstacles into a disjoint star world. The algorithm is based on two novel concepts presented here, namely the admissible kernel and the starshaped hull with specified kernel, which are closely related to the notion of a starshaped hull. The utilization of the proposed method is illustrated with examples of a robot operating in a 2D workspace using a harmonic potential field approach in combination with the developed algorithm.
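For intuition on kernels of starshaped sets, the following standard computational-geometry check tests whether a simple counterclockwise polygon is starshaped with respect to a candidate point: the point must lie on the inner side of every edge. This only illustrates the notion of a kernel; the paper's admissible-kernel and starshaped-hull constructions go further.

```python
import numpy as np

def in_kernel(p, polygon):
    """True iff the simple polygon (vertices in counterclockwise order) is
    starshaped with respect to p, i.e. p lies in the polygon's kernel."""
    p = np.asarray(p, dtype=float)
    poly = np.asarray(polygon, dtype=float)
    for a, b in zip(poly, np.roll(poly, -1, axis=0)):
        edge, to_p = b - a, p - a
        if edge[0] * to_p[1] - edge[1] * to_p[0] < 0:   # p is right of this edge
            return False
    return True

l_shape = [(0, 0), (2, 0), (2, 1), (1, 1), (1, 2), (0, 2)]
print(in_kernel((0.5, 0.5), l_shape))   # True: the whole L-shape is visible from here
print(in_kernel((1.5, 0.5), l_shape))   # False: inside the polygon, outside its kernel
```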
The growth in the number of low-cost narrowband radios such as Bluetooth Low Energy (BLE) has enabled applications such as asset tracking, human behavior monitoring, and keyless entry. Accurate range estimation is a must in such applications. Phase-based ranging has recently gained momentum due to its high accuracy in multipath environments compared to traditional schemes such as ranging based on received signal strength. Phase-based ranging requires tone exchange on multiple frequencies on a uniformly sampled frequency grid. Such tone exchange may not always be possible because some tones are missing, e.g., on reserved advertisement channels. Furthermore, the IQ values at a given tone may be distorted by interference. In this paper, we propose two phase-based ranging schemes that deal with missing or interfered tones. We compare the performance and complexity of the proposed schemes using simulations, complexity analysis, and measurement setups. In particular, we show that for a small number of missing or interfered tones, the proposed scheme based on a trained neural network (NN) performs very close to a reference ranging system with no missing or interfered tones. Interestingly, this high performance comes at the cost of negligible additional computational complexity and up to 60.5 kB of additional memory compared to the reference system, making it an attractive solution for ranging with hardware-limited radios such as BLE.
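For background on the tone-exchange principle, here is a hedged, self-contained simulation of phase-based ranging on a BLE-like frequency grid with gaps. In a two-way tone exchange the accumulated phase is phi(f) = -4*pi*f*d/c, so range follows from the phase slope across frequency; this sketch simply excludes missing or interfered tones from a least-squares slope fit and does not reproduce the paper's NN-based schemes.

```python
import numpy as np

C = 3e8                                        # speed of light [m/s]
freqs = 2.402e9 + 2e6 * np.arange(40)          # BLE-like 2 MHz tone grid
d_true = 7.5                                   # true distance [m]

# Two-way tone exchange accumulates phase phi(f) = -4*pi*f*d/c at each tone.
phase = np.unwrap(np.angle(np.exp(-1j * 4 * np.pi * freqs * d_true / C)))
phase += np.random.default_rng(1).normal(0.0, 0.05, phase.size)   # phase noise

available = np.ones(freqs.size, dtype=bool)
available[[10, 11, 24]] = False                # e.g. advertisement/interfered channels

# Least-squares phase slope over available tones only. (We unwrap the full
# grid above for simplicity; in practice gaps break unwrapping, which is
# exactly the difficulty the proposed schemes address.)
slope = np.polyfit(freqs[available] - freqs[0], phase[available], 1)[0]
print(f"estimated range: {-slope * C / (4 * np.pi):.2f} m")   # close to 7.5 m
```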
Intelligent manufacturing is becoming increasingly important due to the growing demand for maximizing productivity and flexibility while minimizing waste and lead times. This work investigates automated secondary robotic food packaging solutions that transfer food products from a conveyor belt into containers. A major problem in these solutions is varying product supply, which can cause drastic productivity drops. Conventional rule-based approaches used to address this issue are often inadequate, leading to violation of the industry's requirements. Reinforcement learning, on the other hand, has the potential to solve this problem by learning a responsive and predictive policy based on experience. However, it is challenging to utilize it in highly complex control schemes. In this paper, we propose a reinforcement learning framework designed to optimize the conveyor belt speed while minimizing interference with the rest of the control system. When tested on real-world data, the framework exceeds the performance requirements (99.8% packed products) and maintains quality (100% filled boxes). Compared to the existing solution, our proposed framework improves productivity, yields smoother control, and reduces computation time.
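A hedged sketch of the control interface such a framework might expose: the agent observes the belt buffer and current speed, commands a new speed, and is rewarded for throughput while being penalized for abrupt speed changes (the smoothness aspect). The ConveyorEnv dynamics and constants below are toy assumptions, not the real packaging line.

```python
import numpy as np

class ConveyorEnv:
    def __init__(self, seed=0):
        self.rng = np.random.default_rng(seed)
        self.reset()

    def reset(self):
        self.buffer = 5.0                        # products waiting on the belt
        self.speed = 0.5                         # normalized belt speed
        return np.array([self.buffer, self.speed])

    def step(self, speed_cmd):
        supply = max(0.0, self.rng.normal(1.0, 0.5))         # varying product supply
        packed = min(self.buffer + supply, 2.0 * speed_cmd)  # capacity grows with speed
        self.buffer = max(0.0, self.buffer + supply - packed)
        reward = packed - 0.5 * abs(speed_cmd - self.speed)  # throughput minus jerkiness
        self.speed = speed_cmd
        return np.array([self.buffer, self.speed]), reward

env = ConveyorEnv()
obs = env.reset()
for _ in range(5):                               # naive proportional policy for demo
    obs, r = env.step(np.clip(obs[0] / 10.0, 0.1, 1.0))
    print(np.round(obs, 2), round(r, 2))
```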
Learning-based approaches have achieved impressive performance in autonomous driving, and an increasing number of data-driven works address the decision-making and planning module. However, the reliability and stability of neural networks remain challenging. In this paper, we introduce a hierarchical imitation method comprising a high-level grid-based behavior planner and a low-level trajectory planner, which not only works as a standalone data-driven driving policy but can also be easily embedded into a rule-based architecture. We evaluate our method both in closed-loop simulation and in real-world driving, and demonstrate that the neural network planner performs strongly in complex urban autonomous driving scenarios.
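A hedged sketch of the hierarchical interface this describes: a high-level model scores a coarse spatial grid to select a behavior target, and a low-level model regresses a trajectory conditioned on that target. The two random linear maps stand in for the learned planners; all shapes and names are assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)
W_high = rng.normal(size=(64, 8 * 8))            # features -> 8x8 grid-cell logits
W_low = rng.normal(size=(64 + 2, 10 * 2))        # features + target -> 10 waypoints

def plan(features):
    logits = features @ W_high                   # high level: score each grid cell
    cell = np.argmax(logits)
    target = np.array([cell // 8, cell % 8], float)   # coarse behavior target
    traj = np.concatenate([features, target]) @ W_low # low level: conditioned trajectory
    return target, traj.reshape(10, 2)

target, traj = plan(rng.normal(size=64))
print("target cell:", target, "first waypoint:", np.round(traj[0], 2))
```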
The advent of artificial intelligence technology has paved the way for extensive research in the air combat sector. Academics and other researchers have studied a prominent research direction: autonomous maneuver decision-making for UAVs. These studies produced a range of outcomes, but approaches that incorporate Reinforcement Learning (RL) proved more efficient. Many methods have been used to make an agent reach its target along an optimal path, most prominently Genetic Algorithms (GA), A*, RRT, and various other optimization techniques, but Reinforcement Learning is the best known for its success. In the DARPA AlphaDogfight Trials, a reinforcement learning agent prevailed against a veteran, Boeing-trained human F-16 pilot; the winning model was developed by Heron Systems. After this accomplishment, reinforcement learning drew tremendous attention. In this research, we steer a UAV with Dubins-vehicle dynamics to a target in two-dimensional space along an optimal path using Twin Delayed Deep Deterministic Policy Gradient (TD3) with Hindsight Experience Replay (HER) for experience replay. We ran tests in two different simulated environments.
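Below is a hedged sketch of the two distinctive ingredients: Dubins-vehicle kinematics (constant forward speed, bounded turn rate as the action) and hindsight experience replay, which relabels a failed episode's goal with a position the agent actually reached. The TD3 learner itself is omitted, and the thresholds and constants are illustrative assumptions.

```python
import numpy as np

def dubins_step(state, turn_rate, v=1.0, dt=0.1, max_rate=1.0):
    """Dubins vehicle: constant forward speed, bounded turn rate as the action."""
    x, y, theta = state
    w = np.clip(turn_rate, -max_rate, max_rate)
    return np.array([x + v * np.cos(theta) * dt,
                     y + v * np.sin(theta) * dt,
                     theta + w * dt])

def her_relabel(episode):
    """episode: list of (state, action, goal). Replace the desired goal with the
    final achieved position and recompute the sparse reward."""
    achieved = episode[-1][0][:2]
    relabeled = []
    for state, action, _ in episode:
        reward = 0.0 if np.linalg.norm(state[:2] - achieved) < 0.2 else -1.0
        relabeled.append((state, action, achieved, reward))
    return relabeled

state, goal = np.zeros(3), np.array([3.0, 1.0])
episode = []
for _ in range(30):                              # random exploration episode
    action = np.random.uniform(-1, 1)
    episode.append((state, action, goal))
    state = dubins_step(state, action)
print(her_relabel(episode)[-1])                  # last transition now counts as a success
```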