In driving scenes, road agents frequently interact with one another and must understand the intentions of those around them. Each ego-agent (i.e., each road agent itself) continually predicts what behavior other road users will engage in and expects a shared, consistent understanding for safe movement. Behavioral Intention Prediction (BIP) simulates this human reasoning process and enables the early prediction of specific behaviors. As in other prediction tasks, such as trajectory prediction, data-driven deep learning methods have become the primary pipeline in research. The rapid development of BIP inevitably leads to new issues and challenges. To catalyze future research, this work provides a comprehensive review of BIP covering the available datasets, key factors and challenges, pedestrian-centric and vehicle-centric BIP approaches, and BIP-aware applications. Our investigation finds that data-driven deep learning approaches have become the primary pipelines. The behavioral intention types are still monotonous in most current datasets and methods (e.g., Crossing (C) and Not Crossing (NC) for pedestrians and Lane Changing (LC) for vehicles). In addition, research on safety-critical scenarios (e.g., near-crash situations) remains limited. Through this investigation, we identify open issues in behavioral intention prediction and suggest possible insights for future research.
Clinical trials are typically run to understand the effects of a new treatment on a given population of patients. However, patients in large populations rarely respond the same way to the same treatment. This heterogeneity in patient responses necessitates trials that investigate effects on multiple subpopulations, especially when a treatment has marginal or no benefit for the overall population but might have significant benefit for a particular subpopulation. Motivated by this need, we propose Syntax, an exploratory trial design that identifies subpopulations with a positive treatment effect among many subpopulations. Syntax is sample efficient as it (i) recruits and allocates patients adaptively and (ii) estimates treatment effects by forming synthetic controls for each subpopulation that combine control samples from other subpopulations. Through experiments, we validate the performance of Syntax and provide insights into when it might have an advantage over conventional trial designs.
Autonomous overtaking at high speeds is a challenging multi-agent robotics research problem. The high-speed, close-proximity situations that arise in multi-agent autonomous racing require algorithms that balance aggressive overtaking maneuvers against the risk of collision with the opponent. In this paper, we study a special case of multi-agent autonomous racing, called the head-to-head autonomous race, which involves two racecars with similar performance envelopes. We present a mathematical formulation of an overtake and a position defense in this head-to-head autonomous racing scenario, and we introduce the Automaton Referencing Guided Overtake System (ARGOS) framework, which supervises the execution of an overtake or position-defense maneuver depending on the current role of the racecar. The ARGOS framework works by decomposing complex overtake and position-defense maneuvers into sequential and temporal submaneuvers that are individually managed and supervised by a network of automata. We verify the properties of the ARGOS framework using model checking and demonstrate results from multiple simulations, which show that the framework meets the desired specifications. The ARGOS framework performs similarly to what can be observed in real-world human-driven motorsport racing.
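To make the idea of decomposing an overtake into sequential submaneuvers supervised by an automaton more concrete, the following is a minimal sketch of a table-driven finite-state machine. It is not the ARGOS implementation: the phase names (`TRAIL`, `PULL_OUT`, `PASS`, `MERGE`, `DONE`), the event labels, and the abort-to-`TRAIL` rule are all hypothetical choices made only for illustration.

```python
from enum import Enum, auto


class Phase(Enum):
    """Hypothetical submaneuvers of an overtake (illustrative names only)."""
    TRAIL = auto()      # follow the opponent
    PULL_OUT = auto()   # move off the opponent's line
    PASS = auto()       # drive alongside and get ahead
    MERGE = auto()      # return to the preferred line
    DONE = auto()       # overtake completed


# Assumed transition table: (current phase, observed event) -> next phase.
# Event names are placeholders for whatever guards a real supervisor checks.
TRANSITIONS = {
    (Phase.TRAIL, "gap_closed"): Phase.PULL_OUT,
    (Phase.PULL_OUT, "alongside"): Phase.PASS,
    (Phase.PASS, "ahead"): Phase.MERGE,
    (Phase.MERGE, "merged"): Phase.DONE,
    # Assumed abort rule: fall back to trailing when the opponent blocks.
    (Phase.PULL_OUT, "blocked"): Phase.TRAIL,
    (Phase.PASS, "blocked"): Phase.TRAIL,
}


def run(events, start=Phase.TRAIL):
    """Feed a sequence of events to the automaton and return the final phase.

    Events with no matching transition leave the phase unchanged.
    """
    phase = start
    for event in events:
        phase = TRANSITIONS.get((phase, event), phase)
    return phase


if __name__ == "__main__":
    print(run(["gap_closed", "alongside", "ahead", "merged"]))
    print(run(["gap_closed", "blocked"]))
```

Keeping the transitions in an explicit table is one way to make such a supervisor amenable to exhaustive checking, since every reachable phase can be enumerated from the table alone.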
We investigate the fundamental limits of the recently proposed random access coverage depth problem for DNA data storage. Under this paradigm, it is assumed that the user information consists of $k$ information strands, which are encoded into $n$ strands via some generator matrix $G$. In the sequencing process, the strands are read uniformly at random, since each strand is available in a large number of copies. In this context, the random access coverage depth problem refers to the expected number of reads (i.e., sequenced strands) until it is possible to decode a specific information strand, which is requested by the user. The goal is to minimize the maximum expectation over all possible requested information strands, and this value is denoted by $T_{\max}(G)$. This paper introduces new techniques to investigate the random access coverage depth problem, which capture its combinatorial nature. We establish two general formulas to find $T_{\max}(G)$ for arbitrary matrices. We introduce the concept of recovery balanced codes and combine all these results and notions to compute $T_{\max}(G)$ for MDS, simplex, and Hamming codes. We also study the performance of modified systematic MDS matrices and our results show that the best results for $T_{\max}(G)$ are achieved with a specific mix of encoded strands and replication of the information strands.
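As an illustration of the quantity $T_{\max}(G)$ described above, the sketch below computes it by brute force for very small matrices. It is not one of the paper's formulas. It assumes a binary field (columns of $G$ stored as integer bitmasks over GF(2)) and uses a standard coupon-collector argument: with $n$ strands, going from $j$ to $j+1$ distinct strands takes $n/(n-j)$ reads on average, so the expectation is the sum over $j$ of the fraction of $j$-subsets that cannot yet decode the requested strand, times $n/(n-j)$. The function names are hypothetical.

```python
from fractions import Fraction
from itertools import combinations
from math import comb


def in_span(vectors, target):
    """Return True if `target` lies in the GF(2) span of `vectors` (bitmasks)."""
    basis = []
    for v in vectors:
        for b in basis:
            v = min(v, v ^ b)      # clear b's leading bit from v if it is set
        if v:
            basis.append(v)
    for b in basis:
        target = min(target, target ^ b)
    return target == 0


def expected_reads(columns, i):
    """Exact expected number of uniform reads until strand i is decodable."""
    n = len(columns)
    target = 1 << i                # unit vector e_i as a bitmask
    total = Fraction(0)
    for j in range(n):
        # Count j-subsets of strands from which e_i cannot be recovered.
        bad = sum(1 for s in combinations(columns, j) if not in_span(s, target))
        total += Fraction(bad, comb(n, j)) * Fraction(n, n - j)
    return total


def t_max(columns, k):
    """Maximum expected reads over all k information strands."""
    return max(expected_reads(columns, i) for i in range(k))


if __name__ == "__main__":
    identity = [0b001, 0b010, 0b100]   # k = n = 3, no coding
    simplex2 = [0b01, 0b10, 0b11]      # all nonzero vectors for k = 2
    print(t_max(identity, 3), t_max(simplex2, 2))
```

The enumeration is exponential in $n$, so this is only useful for sanity-checking small cases; for the uncoded identity matrix it returns $k$, as expected for a geometric waiting time with success probability $1/k$.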
Air hockey is a highly reactive game which requires the player to quickly reason over stochastic puck and contact dynamics. We implement a hierarchical framework which combines stochastic optimal control for planning shooting angles and sampling-based model-predictive control for continuously generating constrained mallet trajectories. Our agent was deployed and evaluated in simulation and on a physical setup as part of the Robot Air-Hockey challenge competition at NeurIPS 2023.
Lane detection aims to determine the precise location and shape of lanes on the road. Despite the efforts of current methods, it remains a challenging task due to the complexity of real-world scenarios. Existing approaches, whether proposal-based or keypoint-based, struggle to depict lanes both effectively and efficiently. Proposal-based methods detect lanes by distinguishing and regressing a collection of proposals in a streamlined top-down way, yet lack sufficient flexibility in lane representation. Keypoint-based methods, on the other hand, construct lanes flexibly from local descriptors, which typically entails complicated post-processing. In this paper, we present a "Sketch-and-Refine" paradigm that utilizes the merits of both keypoint-based and proposal-based methods. The motivation is that local directions of lanes are semantically simple and clear. At the "Sketch" stage, local directions of keypoints can be easily estimated by fast convolutional layers. We can then build a set of lane proposals accordingly with moderate accuracy. At the "Refine" stage, we further optimize these proposals via a novel Lane Segment Association Module (LSAM), which allows adaptive lane segment adjustment. Last but not least, we propose multi-level feature integration to enrich lane feature representations more efficiently. Based on the proposed "Sketch-and-Refine" paradigm, we propose a fast yet effective lane detector dubbed "SRLane". Experiments show that our SRLane can run at a fast speed (i.e., 278 FPS) while yielding an F1 score of 78.9\%. The source code is available at: //github.com/passerer/SRLane.
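The "Sketch" idea above, turning a keypoint and its estimated local direction into a coarse lane proposal, can be illustrated with a few lines of geometry. The sketch below is not taken from the SRLane code. It assumes, purely for illustration, that a proposal is a straight line through the keypoint sampled at fixed image rows, and that the direction is given as an angle in radians measured from the image x-axis. The function name and parameters are hypothetical.

```python
import math


def sketch_proposal(x0, y0, theta, rows):
    """Extend a keypoint (x0, y0) with local direction `theta` into a
    straight-line lane proposal, sampled at the given image rows.

    Assumption: a proposal is a straight line; a real detector would
    refine these points afterwards.
    """
    dx, dy = math.cos(theta), math.sin(theta)
    if abs(dy) < 1e-6:
        # A horizontal direction never crosses other rows.
        raise ValueError("direction is horizontal; cannot sample by row")
    return [(x0 + (y - y0) * dx / dy, y) for y in rows]


if __name__ == "__main__":
    # A keypoint at (100, 50) whose lane runs diagonally across the image.
    for x, y in sketch_proposal(100.0, 50.0, math.pi / 4, [50, 60, 70]):
        print(round(x, 2), y)
```

Sampling at fixed rows is a common way to represent lanes as an ordered list of x-coordinates, which is why the sketch parameterizes the line by y rather than by arc length.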
As an effective strategy, data augmentation (DA) alleviates data scarcity in scenarios where deep learning techniques may otherwise fail. It was first widely applied in computer vision and then introduced to natural language processing, where it has achieved improvements in many tasks. One of the main focuses of DA methods is to improve the diversity of training data, thereby helping the model generalize better to unseen test data. In this survey, we frame DA methods into three categories based on the diversity of augmented data: paraphrasing, noising, and sampling. We analyze DA methods in detail according to these categories. Further, we introduce their applications in NLP tasks as well as the associated challenges.
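Of the three categories named above, noising is the easiest to illustrate in a few lines. The sketch below shows two commonly used token-level noising operations, random deletion and random swap. It is a minimal example under stated assumptions, not a method prescribed by the survey: whitespace tokenization, the function names, and the rule of keeping at least one token are all illustrative choices.

```python
import random


def random_deletion(tokens, p, rng):
    """Drop each token independently with probability p.

    Assumed safeguard: never return an empty sentence for non-empty input.
    """
    if not tokens:
        return []
    kept = [t for t in tokens if rng.random() >= p]
    return kept if kept else [rng.choice(tokens)]


def random_swap(tokens, n_swaps, rng):
    """Swap two randomly chosen positions, n_swaps times."""
    out = list(tokens)
    for _ in range(n_swaps):
        if len(out) < 2:
            break
        i, j = rng.sample(range(len(out)), 2)
        out[i], out[j] = out[j], out[i]
    return out


if __name__ == "__main__":
    rng = random.Random(0)  # fixed seed so runs are repeatable
    sentence = "data augmentation improves the diversity of training data".split()
    print(" ".join(random_deletion(sentence, 0.2, rng)))
    print(" ".join(random_swap(sentence, 2, rng)))
```

Both operations keep the original vocabulary, so they add relatively little diversity compared with paraphrasing or sampling; that trade-off is the axis along which the three categories are distinguished.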
Images can convey rich semantics and induce various emotions in viewers. Recently, with the rapid advancement of emotional intelligence and the explosive growth of visual data, extensive research efforts have been dedicated to affective image content analysis (AICA). In this survey, we comprehensively review the development of AICA over the past two decades, focusing especially on state-of-the-art methods with respect to three main challenges -- the affective gap, perception subjectivity, and label noise and absence. We begin with an introduction to the key emotion representation models that have been widely employed in AICA and a description of the available datasets for evaluation, with a quantitative comparison of label noise and dataset bias. We then summarize and compare the representative approaches on (1) emotion feature extraction, including both handcrafted and deep features, (2) learning methods on dominant emotion recognition, personalized emotion prediction, emotion distribution learning, and learning from noisy data or few labels, and (3) AICA-based applications. Finally, we discuss some challenges and promising future research directions, such as image content and context understanding, group emotion clustering, and viewer-image interaction.
Autonomous driving is regarded as one of the most promising remedies for shielding human beings from severe crashes. To this end, 3D object detection serves as the core basis of such perception systems, especially for path planning, motion prediction, collision avoidance, etc. Generally, stereo or monocular images with corresponding 3D point clouds are already the standard setup for 3D object detection, among which point clouds are increasingly prevalent because they provide accurate depth information. Despite existing efforts, 3D object detection on point clouds is still in its infancy due to the inherent sparseness and irregularity of point clouds, the misalignment between the camera view and the LiDAR bird's-eye view for modality synergies, occlusions and scale variations at long distances, etc. Recently, profound progress has been made in 3D object detection, with a large body of literature addressing this vision task. As such, we present a comprehensive review of the latest progress in this field, covering all the main topics including sensors, fundamentals, and the recent state-of-the-art detection methods with their pros and cons. Furthermore, we introduce metrics and provide quantitative comparisons on popular public datasets. Avenues for future work are identified after an in-depth analysis of the surveyed works. Finally, we conclude this paper.
Generative adversarial networks (GANs) have been extensively studied in the past few years. Arguably their most significant impact has been in the area of computer vision, where great advances have been made in challenges such as plausible image generation, image-to-image translation, facial attribute manipulation and similar domains. Despite the significant successes achieved to date, applying GANs to real-world problems still poses significant challenges, three of which we focus on here. These are: (1) the generation of high quality images, (2) diversity of image generation, and (3) stable training. Focusing on the degree to which popular GAN technologies have made progress against these challenges, we provide a detailed review of the state of the art in GAN-related research in the published scientific literature. We further structure this review through a convenient taxonomy we have adopted based on variations in GAN architectures and loss functions. While several reviews for GANs have been presented to date, none have considered the status of this field based on their progress towards addressing practical challenges relevant to computer vision. Accordingly, we review and critically discuss the most popular architecture-variant and loss-variant GANs for tackling these challenges. Our objective is to provide an overview as well as a critical analysis of the status of GAN research in terms of relevant progress towards important computer vision application requirements. As we do this, we also discuss the most compelling applications in computer vision in which GANs have demonstrated considerable success, along with some suggestions for future research directions. Code related to GAN-variants studied in this work is summarized on //github.com/sheqi/GAN_Review.
Distant supervision can effectively label data for relation extraction, but suffers from the noisy labeling problem. Recent works mainly apply soft bag-level noise reduction strategies to find the relatively better samples in a sentence bag, which is suboptimal compared with making a hard decision on false positive samples at the sentence level. In this paper, we introduce an adversarial learning framework, which we name DSGAN, to learn a sentence-level true-positive generator. Inspired by Generative Adversarial Networks, we regard the positive samples generated by the generator as the negative samples to train the discriminator. The optimal generator is obtained when the discrimination ability of the discriminator has declined the most. We use the generator to filter the distant supervision training dataset and redistribute the false positive instances into the negative set, thereby providing a cleaned dataset for relation classification. The experimental results show that the proposed strategy significantly improves the performance of distant supervision relation extraction compared to state-of-the-art systems.
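The final filtering step described above, using the generator to move suspected false positives into the negative set, can be sketched independently of how the generator is trained. The example below is an illustration under stated assumptions, not the DSGAN code: `score` is a hypothetical stand-in for the trained generator's estimate that a sentence is a true positive, and the fixed `threshold` of 0.5 is an assumed decision rule.

```python
def redistribute(positive_bag, negative_set, score, threshold=0.5):
    """Split distantly labeled positives into kept and rejected sentences.

    `score(sentence)` is assumed to return a true-positive confidence in
    [0, 1]; sentences scoring below `threshold` are treated as false
    positives and appended to the negative set.
    """
    kept, moved = [], []
    for sentence in positive_bag:
        (kept if score(sentence) >= threshold else moved).append(sentence)
    return kept, list(negative_set) + moved


if __name__ == "__main__":
    # Toy scores standing in for a trained generator (values are made up).
    toy_scores = {"s1": 0.9, "s2": 0.2, "s3": 0.5}
    cleaned_pos, cleaned_neg = redistribute(
        ["s1", "s2", "s3"], ["n1"], toy_scores.get
    )
    print(cleaned_pos, cleaned_neg)
```

This is a hard, sentence-level decision: each sentence ends up in exactly one of the two sets, in contrast to soft bag-level weighting.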