In many scenarios, unmanned aerial vehicles (UAVs), also known as drones, need the capability of autonomous flight to carry out their missions successfully. To enable such autonomous flights, drones need to know their location at all times; based on the current position and the final destination, navigation commands are then generated to guide the drone to its destination. Localization can easily be carried out in outdoor environments using GPS signals and drone inertial measurement units (IMUs). However, such an approach is not feasible in indoor environments or GPS-denied areas. In this paper, we propose a localization scheme for drones called PILOT (High-Precision Indoor Localization for Autonomous Drones) that is specifically designed for indoor environments. PILOT relies on ultrasonic acoustic signals to estimate the target drone's location. To obtain a precise final estimate of the drone's location, PILOT employs a three-stage localization scheme. The first two stages provide robustness against multipath fading in indoor environments and mitigate the ranging error. In the third stage, PILOT applies a simple yet effective technique that reduces the localization error induced by the relative geometry between transmitters and receivers and significantly lowers the height estimation error. The performance of PILOT was assessed under different scenarios, and the results indicate that it achieves centimeter-level accuracy for three-dimensional localization of drones.
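The abstract does not spell out PILOT's estimator, but the core step of acoustic ranging-based localization is multilateration from time-of-arrival (ToA) measurements. The following is a minimal sketch under assumed conditions (known beacon positions, speed of sound at room temperature); the beacon layout and the three-stage refinement are simplified away.

```python
# A minimal multilateration sketch, assuming ultrasonic ToA ranging to
# beacons at known positions; PILOT's actual three-stage pipeline and
# error-mitigation steps are not reproduced here.
import numpy as np
from scipy.optimize import least_squares

SPEED_OF_SOUND = 343.0  # m/s at ~20 degrees Celsius

def localize(beacons, toas, x0=None):
    """Estimate a 3D position from ToA measurements to known beacons."""
    ranges = SPEED_OF_SOUND * np.asarray(toas)

    def residuals(p):
        # Difference between predicted and measured ranges.
        return np.linalg.norm(beacons - p, axis=1) - ranges

    x0 = np.mean(beacons, axis=0) if x0 is None else x0
    return least_squares(residuals, x0).x

# Synthetic check: four ceiling-mounted beacons, one drone position.
beacons = np.array([[0, 0, 3], [5, 0, 3], [0, 5, 3], [5, 5, 3]], float)
true_pos = np.array([2.0, 3.0, 1.2])
toas = np.linalg.norm(beacons - true_pos, axis=1) / SPEED_OF_SOUND
print(localize(beacons, toas))  # ~[2.0, 3.0, 1.2]
```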
Traversability prediction is a fundamental perception capability for autonomous navigation. The diversity of data across domains creates significant gaps in the prediction performance of the perception model. In this work, we reduce these gaps by proposing a novel coarse-to-fine unsupervised domain adaptation (UDA) model, CALI. Our aim is to transfer the perception model with high data efficiency, eliminate prohibitively expensive data labeling, and improve generalization during adaptation from easy-to-obtain source domains to various challenging target domains. We prove that coarse alignment and fine alignment can benefit each other and accordingly design a first-coarse-then-fine alignment process. The proposed work bridges theoretical analysis and algorithm design, leading to an efficient UDA model with easy and stable training. We show the advantages of our model over multiple baselines in several challenging domain adaptation setups. To further validate its effectiveness, we combine our perception model with a visual planner to build a navigation system and show its high reliability in complex natural environments where no labeled data is available.
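CALI's exact losses are not given in the abstract; as a rough illustration of a first-coarse-then-fine schedule, the sketch below pairs a generic DANN-style marginal alignment (coarse) with confidence-filtered self-training on the target domain (fine). All model and parameter names are placeholders, not CALI's design.

```python
# A generic coarse-then-fine UDA training step (NOT CALI's actual losses):
# coarse = fool a domain discriminator on global features,
# fine   = self-train on confident target pseudo-labels.
import torch
import torch.nn.functional as F

def uda_step(encoder, seg_head, disc, opt, src_img, src_lbl, tgt_img,
             coarse_phase=True, conf_thresh=0.9):
    f_src, f_tgt = encoder(src_img), encoder(tgt_img)

    # Supervised segmentation loss on the labeled source domain.
    loss = F.cross_entropy(seg_head(f_src), src_lbl)

    # Coarse alignment: push target features toward the source domain
    # (the discriminator's own training step is omitted for brevity).
    d_tgt = disc(f_tgt.mean(dim=(2, 3)))
    loss = loss + F.binary_cross_entropy_with_logits(
        d_tgt, torch.zeros_like(d_tgt))  # target labeled as "source"

    if not coarse_phase:
        # Fine alignment: class-aware refinement via pseudo-labels.
        with torch.no_grad():
            prob = seg_head(f_tgt).softmax(dim=1)
            conf, pseudo = prob.max(dim=1)
        mask = conf > conf_thresh
        if mask.any():
            loss = loss + F.cross_entropy(
                seg_head(f_tgt), pseudo, reduction="none")[mask].mean()

    opt.zero_grad(); loss.backward(); opt.step()
    return loss.item()
```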
The adoption of Unmanned Aerial Vehicles (UAVs) for public safety applications has skyrocketed in recent years. Leveraging Physical Random Access Channel (PRACH) preambles, in this paper we pioneer a novel localization technique for UAVs equipped with cellular base stations used in emergency scenarios. We exploit the new concept of Orthogonal Time Frequency Space (OTFS) modulation (tolerant to the channel Doppler spread caused by UAV motion) to build a fully standards-compliant OTFS-modulated PRACH transmission and reception scheme able to perform time-of-arrival (ToA) measurements. First, we analyze this novel ToA ranging technique, both analytically and numerically, to accurately and iteratively derive the distance between localized users and the points traversed by the UAV along its trajectory. Then, we determine the optimal UAV speed as a trade-off between the accuracy of the ranging technique and the power needed by the UAV to reach and maintain that speed during emergency operations. Finally, we demonstrate that our solution outperforms standard PRACH-based localization techniques in terms of Root Mean Square Error (RMSE) by about 20% in quasi-static conditions and up to 80% in high-mobility conditions.
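The geometric core of the described approach is a moving-anchor problem: one UAV collects ToA-derived ranges at successive trajectory points and solves for the user's position. A hedged numerical sketch of that step follows (Gauss-Newton iteration over assumed waypoints); the OTFS modem and PRACH processing are entirely omitted.

```python
# Moving-anchor ToA ranging sketch: the UAV measures ToA at several
# trajectory points and iteratively solves for the user position.
# Waypoint geometry and iteration count are illustrative assumptions.
import numpy as np

C = 3e8  # speed of light, m/s

def estimate_user(waypoints, toas, x0, iters=10):
    ranges = C * np.asarray(toas)
    x = np.asarray(x0, float)
    for _ in range(iters):
        diff = x - waypoints                  # (N, 3) offsets to waypoints
        dist = np.linalg.norm(diff, axis=1)   # predicted ranges
        J = diff / dist[:, None]              # Jacobian of ||x - w_i||
        r = dist - ranges                     # range residuals
        x -= np.linalg.lstsq(J, r, rcond=None)[0]  # Gauss-Newton step
    return x
```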
Unmanned aerial vehicles (UAVs), commonly known as drones, are being increasingly deployed throughout the globe as a means to streamline monitoring, inspection, mapping, and logistics routines. When dispatched on autonomous missions, drones require an intelligent decision-making system for trajectory planning and tour optimization. Given the limited capacity of their onboard batteries, a key design challenge is to ensure the underlying algorithms can efficiently optimize the mission objectives along with recharging operations during long-haul flights. With this in view, the present work undertakes a comprehensive study on automated tour management systems for an energy-constrained drone: (1) We construct a machine learning model that estimates the energy expenditure of typical multi-rotor drones while accounting for real-world aspects and extrinsic meteorological factors. (2) Leveraging this model, the joint program of flight mission planning and recharging optimization is formulated as a multi-criteria Asymmetric Traveling Salesman Problem (ATSP), wherein a drone seeks the time-optimal, energy-feasible tour that visits all the target sites and refuels whenever necessary. (3) We devise an efficient approximation algorithm with provable worst-case performance guarantees and implement it in a drone management system, which supports real-time flight path tracking and re-computation in dynamic environments. (4) The effectiveness and practicality of the proposed approach are validated through extensive numerical simulations as well as real-world experiments.
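To make the energy-feasibility constraint concrete, here is a simplified greedy tour sketch, not the paper's approximation algorithm with worst-case guarantees: always keep enough reserve to return to the depot, and detour there to recharge when the next hop would break that invariant. The `energy_cost` callable stands in for the learned energy model.

```python
# Greedy energy-feasible tour sketch (illustrative, no optimality claim).
def greedy_energy_tour(depot, sites, capacity, energy_cost):
    tour, energy, pos, todo = [depot], capacity, depot, set(sites)
    while todo:
        nxt = min(todo, key=lambda s: energy_cost(pos, s))
        if energy_cost(depot, nxt) + energy_cost(nxt, depot) > capacity:
            raise ValueError("site unreachable even on a full charge")
        # Keep enough reserve to reach the depot after visiting `nxt`.
        if energy_cost(pos, nxt) + energy_cost(nxt, depot) > energy:
            tour.append(depot)            # detour: recharge at the depot
            energy, pos = capacity, depot
            continue
        energy -= energy_cost(pos, nxt)
        tour.append(nxt)
        todo.discard(nxt)
        pos = nxt
    tour.append(depot)
    return tour

# Toy usage with a Euclidean stand-in for the learned energy model.
cost = lambda a, b: 0.1 * ((a[0]-b[0])**2 + (a[1]-b[1])**2) ** 0.5
print(greedy_energy_tour((0, 0), [(3, 4), (6, 0), (0, 5)], 2.0, cost))
```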
Ball 3D localization in team sports has various applications, including automatic offside detection in soccer and shot release localization in basketball. Today, this task is either solved with expensive multi-view setups or by restricting the analysis to ballistic trajectories. In this work, we propose to address the task on a single image from a calibrated monocular camera by estimating the ball diameter in pixels and using the knowledge of the real ball diameter in meters. This approach is suitable for any game situation where the ball is (even partly) visible. To achieve this, we use a small neural network trained on image patches around candidates generated by a conventional ball detector. Besides predicting the ball diameter, our network outputs the confidence of having a ball in the image patch. Validation on three basketball datasets reveals that our model gives remarkable predictions of the ball's 3D location. In addition, through its confidence output, our model improves the detection rate by filtering the candidates produced by the detector. The contributions of this work are (i) the first model to address 3D ball localization on a single image, (ii) an effective method for 3D ball annotation from single calibrated images, and (iii) a high-quality 3D ball evaluation dataset annotated from a single viewpoint. In addition, the code to reproduce this research is made freely available at //github.com/gabriel-vanzandycke/deepsport.
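The pinhole geometry the abstract relies on is compact enough to state directly: by similar triangles, depth is z = f * D / d for focal length f, true diameter D, and apparent diameter d in pixels, after which the detection center is back-projected through the intrinsics. A minimal sketch with illustrative values:

```python
# Depth from apparent ball size, then back-projection through K.
# The intrinsics and the detection below are illustrative values only.
import numpy as np

def ball_3d(u, v, diameter_px, diameter_m, K):
    z = K[0, 0] * diameter_m / diameter_px        # depth: z = f * D / d
    ray = np.linalg.inv(K) @ np.array([u, v, 1.0])  # normalized pixel ray
    return z * ray                                 # 3D point, camera frame

K = np.array([[1400.0, 0, 960], [0, 1400.0, 540], [0, 0, 1]])
# A basketball (~0.24 m diameter) seen as a 28 px blob -> ~12 m away.
print(ball_3d(u=1100, v=480, diameter_px=28.0, diameter_m=0.24, K=K))
```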
Most existing deblurring methods focus on removing global blur caused by camera shake, and they cannot handle local blur caused by object movement well. To fill the vacancy of local deblurring in real scenes, we establish the first real local motion blur dataset (ReLoBlur), which is captured by a synchronized beam-splitting photographing system and corrected by a post-processing pipeline. Based on ReLoBlur, we propose a Local Blur-Aware Gated network (LBAG) and several local blur-aware techniques to bridge the gap between global and local deblurring: 1) a blur detection approach based on background subtraction to localize blurred regions; 2) a gate mechanism to guide the network to focus on blurred regions; and 3) a blur-aware patch cropping strategy to address the data imbalance problem. Extensive experiments prove the reliability of the ReLoBlur dataset and demonstrate that LBAG outperforms state-of-the-art global deblurring methods that lack our proposed local blur-aware techniques.
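The background-subtraction idea in technique (1) can be illustrated with a standard off-the-shelf subtractor; ReLoBlur's exact pipeline and thresholds are not specified in the abstract, and `cv2.createBackgroundSubtractorMOG2` is one common substitute for a static-background difference. The resulting binary mask is the kind of signal a gate mechanism (technique 2) could consume.

```python
# Hedged sketch: localize likely locally-blurred (moving) regions by
# background subtraction; kernel size and threshold are illustrative.
import cv2
import numpy as np

subtractor = cv2.createBackgroundSubtractorMOG2(history=50,
                                                detectShadows=False)

def blur_mask(frame, ksize=15, thresh=127):
    fg = subtractor.apply(frame)                      # moving-object mask
    fg = cv2.morphologyEx(fg, cv2.MORPH_CLOSE,        # fill small holes
                          np.ones((ksize, ksize), np.uint8))
    return (fg > thresh).astype(np.uint8)             # 1 = likely blurred
```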
A nanodrone swarm is formed by multiple lightweight, low-cost nanodrones that perform tasks in very challenging environments. It is therefore essential to estimate the relative positions of the nanodrones in the swarm for accurate and safe platooning in inclement indoor environments. However, vision and infrared sensors are constrained to line-of-sight perception, and instrumenting extra motion sensors on a drone's body is constrained by the nanodrone's form factor and energy efficiency. This paper presents the design, implementation, and evaluation of RFDrone, a system that senses the relative positions of nanodrones in a swarm using wireless signals, which naturally identify each individual nanodrone. To do so, each lightweight nanodrone is fitted with an RF sticker (i.e., an RFID tag), which is localized by an external RFID reader in the inclement indoor environment. Instead of accurately localizing each RFID-tagged nanodrone, we propose to estimate the relative positions of all the RFID-tagged nanodrones in the swarm based on spatial-temporal phase profiling. We implement an end-to-end physical prototype of RFDrone. Our experimental results show that RFDrone accurately estimates the relative positions of nanodrones in the swarm, with an average relative localization accuracy of around 0.95 across the x, y, and z axes and an average accuracy of around 0.93 for estimating the swarm's geometry.
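As background for the phase-profiling idea: a UHF RFID reader observes a round-trip phase that wraps as theta = (4*pi*d/lambda) mod 2*pi, so phase differences between tags carry relative range information. The sketch below shows only this single-measurement relationship under assumed parameters; RFDrone's full spatial-temporal profiling across antennas and time is far richer than this.

```python
# Simplified RFID phase-ranging relationship (not RFDrone's algorithm).
import numpy as np

WAVELENGTH = 0.326  # m, ~920 MHz UHF RFID (assumed operating band)

def relative_range(phase_a, phase_b):
    """Relative reader-to-tag path difference of two tags, in meters."""
    dphi = np.unwrap([phase_a, phase_b])   # resolve the 2*pi wrap locally
    # Round-trip model: theta = 4*pi*d/lambda  =>  d = theta*lambda/(4*pi)
    return (dphi[1] - dphi[0]) * WAVELENGTH / (4 * np.pi)
```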
Automotive radar provides reliable environmental perception in all weather conditions at affordable cost, but it supplies little semantic and geometric information due to the sparsity of radar detection points. With the development of automotive radar technologies in recent years, instance segmentation using automotive radar has become possible. Radar data contain contexts such as radar cross section and micro-Doppler effects, and can sometimes provide detections when the field of view is obscured. The outcome of instance segmentation could potentially be used as input to trackers for tracking targets. Existing methods often utilize a clustering-based classification framework, which fits the need for real-time processing but has limited performance due to the minimal information provided by sparse radar detection points. In this paper, we propose an efficient method based on clustering of estimated semantic information to achieve instance segmentation for sparse radar detection points. In addition, we show that the performance of the proposed approach can be further enhanced by incorporating a visual multi-layer perceptron. The effectiveness of the proposed method is verified by experimental results on the popular RadarScenes dataset, achieving 89.53% mean coverage and 86.97% mean average precision at an IoU threshold of 0.5, which is superior to other approaches in the literature. More significantly, the memory consumption is around 1 MB and the inference time is less than 40 ms, indicating that the proposed algorithm is both storage- and time-efficient. These two criteria ensure the practicality of the proposed method in real-world systems.
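"Clustering of estimated semantic information" can be pictured as a two-stage flow: a semantic head first labels each radar point, then points sharing a class are grouped spatially into instances. The sketch below uses DBSCAN for the grouping step; the `eps`/`min_samples` values are illustrative, not the paper's tuned parameters.

```python
# Class-wise spatial clustering of semantically labeled radar points.
import numpy as np
from sklearn.cluster import DBSCAN

def cluster_instances(points_xy, class_ids, eps=1.5, min_samples=2):
    instances = np.full(len(points_xy), -1)   # -1 marks noise points
    next_id = 0
    for c in np.unique(class_ids):
        idx = np.flatnonzero(class_ids == c)  # points of one class
        labels = DBSCAN(eps=eps, min_samples=min_samples).fit_predict(
            points_xy[idx])
        for k in np.unique(labels[labels >= 0]):
            instances[idx[labels == k]] = next_id
            next_id += 1
    return instances
```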
Simultaneous Localization and Mapping (SLAM) estimates an agent's trajectory while constructing a map of the environment; localization is a fundamental kernel in autonomous machines at all computing scales, from drones, AR, and VR devices to self-driving cars. In this work, we present an energy-efficient, runtime-reconfigurable FPGA-based accelerator for robotic localization. We exploit SLAM-specific data locality, sparsity, reuse, and parallelism, and achieve a >5x performance improvement over the state-of-the-art. In particular, our design is reconfigurable at runtime according to the environment, saving power while sustaining accuracy and performance.
Leveraging line features to improve the localization accuracy of point-based visual-inertial SLAM (VINS) is gaining interest, as they provide additional constraints on scene structure. However, real-time performance when incorporating line features in VINS has not been addressed. This paper presents PL-VINS, a real-time optimization-based monocular VINS method with point and line features, developed on top of the state-of-the-art point-based VINS-Mono \cite{vins}. We observe that current works use the LSD \cite{lsd} algorithm to extract line features; however, LSD is designed for scene shape representation rather than for the pose estimation problem, and it becomes the bottleneck for real-time performance due to its high computational cost. In this paper, a modified LSD algorithm is presented by studying hidden parameter tuning and a length rejection strategy. The modified LSD runs at least three times as fast as the original LSD. Further, by representing spatial lines with Pl\"{u}cker coordinates, the residual error in line estimation is modeled as the point-to-line distance, which is then minimized by iteratively updating the minimal four-parameter orthonormal representation of the Pl\"{u}cker coordinates. Experiments on a public benchmark dataset show that the localization error of our method is 12-16\% lower than that of VINS-Mono at the same pose update frequency. The source code of our method is available at: //github.com/cnqiangfu/PL-VINS.
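The point-to-line residual can be made concrete: a space line in Plücker coordinates (moment n, direction d) projects to an image line l through a line projection matrix built from the intrinsics, and the residual is the distance of each detected segment endpoint to l. The sketch below follows the convention common in point-line VINS papers; it is not necessarily PL-VINS's exact implementation.

```python
# Point-to-line residual from Pluecker coordinates (common convention).
import numpy as np

def line_residual(n_cam, fx, fy, cx, cy, endpoints_px):
    """Signed pixel distances of segment endpoints to the projected line."""
    # Line projection uses only the moment vector n in the camera frame:
    # l = K_line @ n, with K_line built from the pinhole intrinsics.
    K_line = np.array([[fy, 0, 0],
                       [0, fx, 0],
                       [-fy * cx, -fx * cy, fx * fy]])
    l = K_line @ n_cam                     # image line (l1, l2, l3)
    norm = np.hypot(l[0], l[1])
    # Distance of homogeneous point (u, v, 1) to line l.
    return [np.dot(l, np.append(p, 1.0)) / norm for p in endpoints_px]
```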
Breakthroughs in machine learning in the last decade have led to `digital intelligence', i.e., machine learning models capable of learning from vast amounts of labeled data to perform digital tasks such as speech recognition, face recognition, and machine translation. The goal of this thesis is to make progress towards designing algorithms capable of `physical intelligence', i.e., building intelligent autonomous navigation agents capable of learning to perform complex navigation tasks in the physical world involving visual perception, natural language understanding, reasoning, planning, and sequential decision making. Despite several advances in classical navigation methods over the last few decades, current navigation agents struggle with long-term semantic navigation tasks. In the first part of the thesis, we discuss our work on short-term navigation using end-to-end reinforcement learning to tackle challenges such as obstacle avoidance, semantic perception, language grounding, and reasoning. In the second part, we present a new class of navigation methods based on modular learning and structured explicit map representations, which leverage the strengths of both classical and end-to-end learning methods to tackle long-term navigation tasks. We show that these methods effectively tackle challenges such as localization, mapping, long-term planning, exploration, and learning semantic priors. These modular learning methods are capable of long-term spatial and semantic understanding and achieve state-of-the-art results on various navigation tasks.