Recently significant progress has been made in vehicle prediction and planning algorithms for autonomous driving. However, it remains quite challenging for an autonomous vehicle to plan its trajectory in complex scenarios when it is difficult to accurately predict its surrounding vehicles' behaviors and trajectories. In this work, to maximize performance while ensuring safety, we propose a novel speculative planning framework based on a prediction-planning interface that quantifies both the behavior-level and trajectory-level uncertainties of surrounding vehicles. Our framework leverages recent prediction algorithms that can provide one or more possible behaviors and trajectories of the surrounding vehicles with probability estimation. It adapts those predictions based on the latest system states and traffic environment, and conducts planning to maximize the expected reward of the ego vehicle by considering the probabilistic predictions of all scenarios and ensure system safety by ruling out actions that may be unsafe in worst case. We demonstrate the effectiveness of our approach in improving system performance and ensuring system safety over other baseline methods, via extensive simulations in SUMO on a challenging multi-lane highway lane-changing case study.
Creating diverse and high-quality 3D assets with an automatic generative model is highly desirable. Despite extensive efforts on 3D generation, most existing works focus on the generation of a single category or a few categories. In this paper, we introduce a diffusion-based feed-forward framework for synthesizing massive categories of real-world 3D objects with a single generative model. Notably, there are three major challenges for this large-vocabulary 3D generation: a) the need for expressive yet efficient 3D representation; b) large diversity in geometry and texture across categories; c) complexity in the appearances of real-world objects. To this end, we propose a novel triplane-based 3D-aware Diffusion model with TransFormer, DiffTF, for handling challenges via three aspects. 1) Considering efficiency and robustness, we adopt a revised triplane representation and improve the fitting speed and accuracy. 2) To handle the drastic variations in geometry and texture, we regard the features of all 3D objects as a combination of generalized 3D knowledge and specialized 3D features. To extract generalized 3D knowledge from diverse categories, we propose a novel 3D-aware transformer with shared cross-plane attention. It learns the cross-plane relations across different planes and aggregates the generalized 3D knowledge with specialized 3D features. 3) In addition, we devise the 3D-aware encoder/decoder to enhance the generalized 3D knowledge in the encoded triplanes for handling categories with complex appearances. Extensive experiments on ShapeNet and OmniObject3D (over 200 diverse real-world categories) convincingly demonstrate that a single DiffTF model achieves state-of-the-art large-vocabulary 3D object generation performance with large diversity, rich semantics, and high quality.
Language models contain ranking-based knowledge and are powerful solvers of in-context ranking tasks. For instance, they may have parametric knowledge about the ordering of countries by size or may be able to rank reviews by sentiment. Recent work focuses on pairwise, pointwise, and listwise prompting techniques to elicit a language model's ranking knowledge. However, we find that even with careful calibration and constrained decoding, prompting-based techniques may not always be self-consistent in the rankings they produce. This motivates us to explore an alternative approach that is inspired by an unsupervised probing method called Contrast-Consistent Search (CCS). The idea is to train a probing model guided by a logical constraint: a model's representation of a statement and its negation must be mapped to contrastive true-false poles consistently across multiple statements. We hypothesize that similar constraints apply to ranking tasks where all items are related via consistent pairwise or listwise comparisons. To this end, we extend the binary CCS method to Contrast-Consistent Ranking (CCR) by adapting existing ranking methods such as the Max-Margin Loss, Triplet Loss, and Ordinal Regression objective. Our results confirm that, for the same language model, CCR probing outperforms prompting and even performs on a par with prompting much larger language models.
With recent advances in Generative AI, it is becoming easier to automatically manipulate 3D models. However, current methods tend to apply edits to models globally, which risks compromising the intended functionality of the 3D model when fabricated in the physical world. For example, modifying functional segments in 3D models, such as the base of a vase, could break the original functionality of the model, thus causing the vase to fall over. We introduce a method for automatically segmenting 3D models into functional and aesthetic elements. This method allows users to selectively modify aesthetic segments of 3D models, without affecting the functional segments. To develop this method we first create a taxonomy of functionality in 3D models by qualitatively analyzing 1000 models sourced from a popular 3D printing repository, Thingiverse. With this taxonomy, we develop a semi-automatic classification method to decompose 3D models into functional and aesthetic elements. We propose a system called Style2Fab that allows users to selectively stylize 3D models without compromising their functionality. We evaluate the effectiveness of our classification method compared to human-annotated data, and demonstrate the utility of Style2Fab with a user study to show that functionality-aware segmentation helps preserve model functionality.
Reasoning presents a significant and challenging issue for Large Language Models (LLMs). The predominant focus of research has revolved around developing diverse prompting strategies to guide and structure the reasoning processes of LLMs. However, these approaches based on decoder-only causal language models often operate the input question in a single forward pass, potentially missing the rich, back-and-forth interactions inherent in human reasoning. Scant attention has been paid to a critical dimension, i.e., the input question itself embedded within the prompts. In response, we introduce a deceptively simple yet highly effective prompting strategy, termed question "re-reading". Drawing inspiration from human learning and problem-solving, re-reading entails revisiting the question information embedded within input prompts. This approach aligns seamlessly with the cognitive principle of reinforcement, enabling LLMs to extract deeper insights, identify intricate patterns, establish more nuanced connections, and ultimately enhance their reasoning capabilities across various tasks. Experiments conducted on a series of reasoning benchmarks serve to underscore the effectiveness and generality of our method. Moreover, our findings demonstrate that our approach seamlessly integrates with various language models, though-eliciting prompting methods, and ensemble techniques, further underscoring its versatility and compatibility in the realm of LLMs.
Magnetic Particle Imaging (MPI) is a promising noninvasive in vivo imaging modality that makes it possible to map the spatial distribution of superparamagnetic nanoparticles by exposing them to dynamic magnetic fields. In the Field-Free Line (FFL) scanner topology, the spatial encoding of the particle distribution is performed by applying magnetic fields vanishing on straight lines. The voltage induced in the receiving coils by the particles when exposed to the magnetic fields constitute the signal from which the particle distribution is to be reconstructed. To avoid lengthy calibration, model-based reconstruction formulae have been developed for the 2D FFL scanning topology. In this work we develop reconstruction formulae for 3D FFL. Moreover, we provide a model-based reconstruction algorithm for 3D FFL and we validate it with a numerical experiment.
One way of ensuring operator's safety during human-robot collaboration is through Speed and Separation Monitoring (SSM), as defined in ISO standard ISO/TS 15066. In general, it is impossible to avoid all human-robot collisions: consider for instance the case when the robot does not move at all, a human operator can still collide with it by hitting it of her own voluntary motion. In the SSM framework, it is possible however to minimize harm by requiring this: \emph{if} a collision ever occurs, then the robot must be in a \emph{stationary state} (all links have zero velocity) at the time instant of the collision. In this paper, we propose a time-optimal control policy based on Time-Optimal Path Parameterization (TOPP) to guarantee such a behavior. Specifically, we show that: for any robot motion that is strictly faster than the motion recommended by our policy, there exists a human motion that results in a collision with the robot in a non-stationary state. Correlatively, we show, in simulation, that our policy is strictly less conservative than state-of-the-art safe robot control methods. Additionally, we propose a parallelization method to reduce the computation time of our pre-computation phase (down to 0.5 sec, practically), which enables the whole pipeline (including the pre-computation) to be executed at runtime, nearly in real-time. Finally, we demonstrate the application of our method in a scenario: time-optimal, safe control of a 6-dof industrial robot.
Signalized intersections in arterial roads result in persistent vehicle idling and excess accelerations, contributing to fuel consumption and CO2 emissions. There has thus been a line of work studying eco-driving control strategies to reduce fuel consumption and emission levels at intersections. However, methods to devise effective control strategies across a variety of traffic settings remain elusive. In this paper, we propose a reinforcement learning (RL) approach to learn effective eco-driving control strategies. We analyze the potential impact of a learned strategy on fuel consumption, CO2 emission, and travel time and compare with naturalistic driving and model-based baselines. We further demonstrate the generalizability of the learned policies under mixed traffic scenarios. Simulation results indicate that scenarios with 100% penetration of connected autonomous vehicles (CAV) may yield as high as 18% reduction in fuel consumption and 25% reduction in CO2 emission levels while even improving travel speed by 20%. Furthermore, results indicate that even 25% CAV penetration can bring at least 50% of the total fuel and emission reduction benefits.
The development of unmanned aerial vehicles (UAVs) has been gaining momentum in recent years owing to technological advances and a significant reduction in their cost. UAV technology can be used in a wide range of domains, including communication, agriculture, security, and transportation. It may be useful to group the UAVs into clusters/flocks in certain domains, and various challenges associated with UAV usage can be alleviated by clustering. Several computational challenges arise in UAV flock management, which can be solved by using machine learning (ML) methods. In this survey, we describe the basic terms relating to UAVS and modern ML methods, and we provide an overview of related tutorials and surveys. We subsequently consider the different challenges that appear in UAV flocks. For each issue, we survey several machine learning-based methods that have been suggested in the literature to handle the associated challenges. Thereafter, we describe various open issues in which ML can be applied to solve the different challenges of flocks, and we suggest means of using ML methods for this purpose. This comprehensive review may be useful for both researchers and developers in providing a wide view of various aspects of state-of-the-art ML technologies that are applicable to flock management.
Graph Convolutional Network (GCN) has been widely applied in transportation demand prediction due to its excellent ability to capture non-Euclidean spatial dependence among station-level or regional transportation demands. However, in most of the existing research, the graph convolution was implemented on a heuristically generated adjacency matrix, which could neither reflect the real spatial relationships of stations accurately, nor capture the multi-level spatial dependence of demands adaptively. To cope with the above problems, this paper provides a novel graph convolutional network for transportation demand prediction. Firstly, a novel graph convolution architecture is proposed, which has different adjacency matrices in different layers and all the adjacency matrices are self-learned during the training process. Secondly, a layer-wise coupling mechanism is provided, which associates the upper-level adjacency matrix with the lower-level one. It also reduces the scale of parameters in our model. Lastly, a unitary network is constructed to give the final prediction result by integrating the hidden spatial states with gated recurrent unit, which could capture the multi-level spatial dependence and temporal dynamics simultaneously. Experiments have been conducted on two real-world datasets, NYC Citi Bike and NYC Taxi, and the results demonstrate the superiority of our model over the state-of-the-art ones.
We propose a novel single shot object detection network named Detection with Enriched Semantics (DES). Our motivation is to enrich the semantics of object detection features within a typical deep detector, by a semantic segmentation branch and a global activation module. The segmentation branch is supervised by weak segmentation ground-truth, i.e., no extra annotation is required. In conjunction with that, we employ a global activation module which learns relationship between channels and object classes in a self-supervised manner. Comprehensive experimental results on both PASCAL VOC and MS COCO detection datasets demonstrate the effectiveness of the proposed method. In particular, with a VGG16 based DES, we achieve an mAP of 81.7 on VOC2007 test and an mAP of 32.8 on COCO test-dev with an inference speed of 31.5 milliseconds per image on a Titan Xp GPU. With a lower resolution version, we achieve an mAP of 79.7 on VOC2007 with an inference speed of 13.0 milliseconds per image.