Image-to-point cloud (I2P) registration is a fundamental task in the field of autonomous vehicles and transportation systems for cross-modality data fusion and localization. Existing I2P registration methods estimate correspondences at the point/pixel level, often overlooking global alignment. However, I2P matching can easily converge to a local optimum when performed without high-level guidance from global constraints. To address this issue, this paper introduces CoFiI2P, a novel I2P registration network that extracts correspondences in a coarse-to-fine manner to achieve the globally optimal solution. First, the image and point cloud data are processed through a Siamese encoder-decoder network for hierarchical feature extraction. Second, a coarse-to-fine matching module is designed to leverage these features and establish robust feature correspondences. Specifically, In the coarse matching phase, a novel I2P transformer module is employed to capture both homogeneous and heterogeneous global information from the image and point cloud data. This enables the estimation of coarse super-point/super-pixel matching pairs with discriminative descriptors. In the fine matching module, point/pixel pairs are established with the guidance of super-point/super-pixel correspondences. Finally, based on matching pairs, the transform matrix is estimated with the EPnP-RANSAC algorithm. Extensive experiments conducted on the KITTI dataset demonstrate that CoFiI2P achieves impressive results, with a relative rotation error (RRE) of 1.14 degrees and a relative translation error (RTE) of 0.29 meters. These results represent a significant improvement of 84\% in RRE and 89\% in RTE compared to the current state-of-the-art (SOTA) method. Qualitative results are available at //youtu.be/ovbedasXuZE. The source code will be publicly released at //github.com/kang-1-2-3/CoFiI2P.
Reliable object grasping is a crucial capability for autonomous robots. However, many existing grasping approaches focus on general clutter removal without explicitly modeling objects and thus only relying on the visible local geometry. We introduce CenterGrasp, a novel framework that combines object awareness and holistic grasping. CenterGrasp learns a general object prior by encoding shapes and valid grasps in a continuous latent space. It consists of an RGB-D image encoder that leverages recent advances to detect objects and infer their pose and latent code, and a decoder to predict shape and grasps for each object in the scene. We perform extensive experiments on simulated as well as real-world cluttered scenes and demonstrate strong scene reconstruction and 6-DoF grasp-pose estimation performance. Compared to the state of the art, CenterGrasp achieves an improvement of 38.5 mm in shape reconstruction and 33 percentage points on average in grasp success. We make the code and trained models publicly available at //centergrasp.cs.uni-freiburg.de.
Cooperative perception is crucial for connected automated vehicles in intelligent transportation systems (ITSs); however, ensuring the authenticity of perception data remains a challenge as the vehicles cannot verify events that they do not witness independently. Various studies have been conducted on establishing the authenticity of data, such as trust-based statistical methods and plausibility-based methods. However, these methods are limited as they require prior knowledge such as previous sender behaviors or predefined rules to evaluate the authenticity. To overcome this limitation, this study proposes a novel approach called zero-knowledge Proof of Traffic (zk-PoT), which involves generating cryptographic proofs to the traffic observations. Multiple independent proofs regarding the same vehicle can be deterministically cross-verified by any receivers without relying on ground truth, probabilistic, or plausibility evaluations. Additionally, no private information is compromised during the entire procedure. A full on-board unit software stack that reflects the behavior of zk-PoT is implemented within a specifically designed simulator called Flowsim. A comprehensive experimental analysis is then conducted using synthesized city-scale simulations, which demonstrates that zk-PoT's cross-verification ratio ranges between 80 % to 96 %, and 80 % of the verification is achieved in 2 s, with a protocol overhead of approximately 25 %. Furthermore, the analyses of various attacks indicate that most of the attacks could be prevented, and some, such as collusion attacks, can be mitigated. The proposed approach can be incorporated into existing works, including the European Telecommunications Standards Institute (ETSI) and the International Organization for Standardization (ISO) ITS standards, without disrupting the backward compatibility.
The ability to accurately predict the trajectory of surrounding vehicles is a critical hurdle to overcome on the journey to fully autonomous vehicles. To address this challenge, we pioneer a novel behavior-aware trajectory prediction model (BAT) that incorporates insights and findings from traffic psychology, human behavior, and decision-making. Our model consists of behavior-aware, interaction-aware, priority-aware, and position-aware modules that perceive and understand the underlying interactions and account for uncertainty and variability in prediction, enabling higher-level learning and flexibility without rigid categorization of driving behavior. Importantly, this approach eliminates the need for manual labeling in the training process and addresses the challenges of non-continuous behavior labeling and the selection of appropriate time windows. We evaluate BAT's performance across the Next Generation Simulation (NGSIM), Highway Drone (HighD), Roundabout Drone (RounD), and Macao Connected Autonomous Driving (MoCAD) datasets, showcasing its superiority over prevailing state-of-the-art (SOTA) benchmarks in terms of prediction accuracy and efficiency. Remarkably, even when trained on reduced portions of the training data (25%), our model outperforms most of the baselines, demonstrating its robustness and efficiency in predicting vehicle trajectories, and the potential to reduce the amount of data required to train autonomous vehicles, especially in corner cases. In conclusion, the behavior-aware model represents a significant advancement in the development of autonomous vehicles capable of predicting trajectories with the same level of proficiency as human drivers. The project page is available at //github.com/Petrichor625/BATraj-Behavior-aware-Model.
Controlling marine vehicles in challenging environments is a complex task due to the presence of nonlinear hydrodynamics and uncertain external disturbances. Despite nonlinear model predictive control (MPC) showing potential in addressing these issues, its practical implementation is often constrained by computational limitations. In this paper, we propose an efficient controller for trajectory tracking of marine vehicles by employing a convex error-state MPC on the Lie group. By leveraging the inherent geometric properties of the Lie group, we can construct globally valid error dynamics and formulate a quadratic programming-based optimization problem. Our proposed MPC demonstrates effectiveness in trajectory tracking through extensive-numerical simulations, including scenarios involving ocean currents. Notably, our method substantially reduces computation time compared to nonlinear MPC, making it well-suited for real-time control applications with long prediction horizons or involving small marine vehicles.
Click-through rate (CTR) prediction is a vital task in industry advertising systems. Most existing methods focus on the structure design of neural network for better accuracy and suffer from the data sparsity problem. Especially in industry advertising systems, the widely applied negative sample downsampling technique due to resource limitation worsens the problem, resulting in a decline in performance. In this paper, we propose \textbf{A}uxiliary Match \textbf{T}asks for enhancing \textbf{C}lick-\textbf{T}hrough \textbf{R}ate performance (AT4CTR) to alleviate the data sparsity problem. Specifically, we design two match tasks inspired by collaborative filtering to enhance the relevance between user and item. As the "click" action is a strong signal which indicates user's preference towards item directly, we make the first match task aim at pulling closer the representation between user and item regarding the positive samples. Since the user's past click behaviors can also be treated as the user him/herself, we apply the next item prediction as the second match task. For both the match tasks, we choose the InfoNCE in contrastive learning as their loss function. The two match tasks can provide meaningful training signals to speed up the model's convergence and alleviate the data sparsity. We conduct extensive experiments on a public dataset and a large-scale industry advertising dataset. The results demonstrate the effectiveness of the proposed auxiliary match tasks. AT4CTR has been deployed in the real industry advertising system and gains remarkable revenue.
Reconfigurable intelligent surface (RIS) is a promising technology that can reshape the electromagnetic environment in wireless networks, offering various possibilities for enhancing wireless channels. Motivated by this, we investigate the channel optimization for multiple-input multiple-output (MIMO) systems assisted by RIS. In this paper, an efficient RIS optimization method is proposed to enhance the effective rank of the MIMO channel for achievable rate improvement. Numerical results are presented to verify the effectiveness of RIS in improving MIMO channels. Additionally, we construct a 2$\times$2 RIS-assisted MIMO prototype to perform experimental measurements and validate the performance of our proposed algorithm. The results reveal a significant increase in effective rank and achievable rate for the RIS-assisted MIMO channel compared to the MIMO channel without RIS.
Cooperative perception for connected and automated vehicles is traditionally achieved through the fusion of feature maps from two or more vehicles. However, the absence of feature maps shared from other vehicles can lead to a significant decline in object detection performance for cooperative perception models compared to standalone 3D detection models. This drawback impedes the adoption of cooperative perception as vehicle resources are often insufficient to concurrently employ two perception models. To tackle this issue, we present Simultaneous Individual and Cooperative Perception (SiCP), a generic framework that supports a wide range of the state-of-the-art standalone perception backbones and enhances them with a novel Dual-Perception Network (DP-Net) designed to facilitate both individual and cooperative perception. In addition to its lightweight nature with only 0.13M parameters, DP-Net is robust and retains crucial gradient information during feature map fusion. As demonstrated in a comprehensive evaluation on the OPV2V dataset, thanks to DP-Net, SiCP surpasses state-of-the-art cooperative perception solutions while preserving the performance of standalone perception solutions.
Intelligent reflecting surface (IRS)-assisted unmanned aerial vehicle (UAV) communications are expected to alleviate the load of ground base stations in a cost-effective way. Existing studies mainly focus on the deployment and resource allocation of a single IRS instead of multiple IRSs, whereas it is extremely challenging for joint multi-IRS multi-user association in UAV communications with constrained reflecting resources and dynamic scenarios. To address the aforementioned challenges, we propose a new optimization algorithm for joint IRS-user association, trajectory optimization of UAVs, successive interference cancellation (SIC) decoding order scheduling and power allocation to maximize system energy efficiency. We first propose an inverse soft-Q learning-based algorithm to optimize multi-IRS multi-user association. Then, SCA and Dinkelbach-based algorithm are leveraged to optimize UAV trajectory followed by the optimization of SIC decoding order scheduling and power allocation. Finally, theoretical analysis and performance results show significant advantages of the designed algorithm in convergence rate and energy efficiency.
Multi-modal fusion is a fundamental task for the perception of an autonomous driving system, which has recently intrigued many researchers. However, achieving a rather good performance is not an easy task due to the noisy raw data, underutilized information, and the misalignment of multi-modal sensors. In this paper, we provide a literature review of the existing multi-modal-based methods for perception tasks in autonomous driving. Generally, we make a detailed analysis including over 50 papers leveraging perception sensors including LiDAR and camera trying to solve object detection and semantic segmentation tasks. Different from traditional fusion methodology for categorizing fusion models, we propose an innovative way that divides them into two major classes, four minor classes by a more reasonable taxonomy in the view of the fusion stage. Moreover, we dive deep into the current fusion methods, focusing on the remaining problems and open-up discussions on the potential research opportunities. In conclusion, what we expect to do in this paper is to present a new taxonomy of multi-modal fusion methods for the autonomous driving perception tasks and provoke thoughts of the fusion-based techniques in the future.
Owing to effective and flexible data acquisition, unmanned aerial vehicle (UAV) has recently become a hotspot across the fields of computer vision (CV) and remote sensing (RS). Inspired by recent success of deep learning (DL), many advanced object detection and tracking approaches have been widely applied to various UAV-related tasks, such as environmental monitoring, precision agriculture, traffic management. This paper provides a comprehensive survey on the research progress and prospects of DL-based UAV object detection and tracking methods. More specifically, we first outline the challenges, statistics of existing methods, and provide solutions from the perspectives of DL-based models in three research topics: object detection from the image, object detection from the video, and object tracking from the video. Open datasets related to UAV-dominated object detection and tracking are exhausted, and four benchmark datasets are employed for performance evaluation using some state-of-the-art methods. Finally, prospects and considerations for the future work are discussed and summarized. It is expected that this survey can facilitate those researchers who come from remote sensing field with an overview of DL-based UAV object detection and tracking methods, along with some thoughts on their further developments.