Roadside perception can greatly increase the safety of autonomous vehicles by extending their perception beyond the visual range and addressing blind spots. However, current state-of-the-art vision-based roadside detection methods achieve high accuracy on labeled scenes but perform poorly on new scenes. This is because roadside cameras remain stationary after installation and can only collect data from a single scene, so the algorithm overfits to these roadside backgrounds and camera poses. To address this issue, we propose an innovative Scenario Generalization Framework for Vision-based Roadside 3D Object Detection, dubbed SGV3D. Specifically, we employ a Background-suppressed Module (BSM) to mitigate background overfitting in vision-centric pipelines by attenuating background features during the 2D to bird's-eye-view projection. Furthermore, by introducing the Semi-supervised Data Generation Pipeline (SSDG) using unlabeled images from new scenes, diverse instance foregrounds with varying camera poses are generated, addressing the risk of overfitting specific camera poses. We evaluate our method on two large-scale roadside benchmarks. Our method surpasses all previous methods by a significant margin in new scenes, including +42.57% for vehicle, +5.87% for pedestrian, and +14.89% for cyclist compared to BEVHeight on the DAIR-V2X-I heterologous benchmark. On the larger-scale Rope3D heterologous benchmark, we achieve notable gains of 14.48% for car and 12.41% for large vehicle. We aspire to contribute insights on the exploration of roadside perception techniques, emphasizing their capability for scenario generalization. The code will be available at https://github.com/yanglei18/SGV3D.

Related content

LIDAR · Estimation/Estimator · MoDELS · Integration · Approximation error

March 12, 2024

LiDAR Point Cloud-based Multiple Vehicle Tracking with Probabilistic Measurement-Region Association

Guanhua Ding,Jianan Liu,Yuxuan Xia,Tao Huang,Bing Zhu,Jinping Sun

Multiple extended target tracking (ETT) has gained increasing attention due to the development of high-precision LiDAR and radar sensors in automotive applications. For LiDAR point cloud-based vehicle tracking, this paper presents a probabilistic measurement-region association (PMRA) ETT model, which can describe the complex measurement distribution by partitioning the target extent into different regions. The PMRA model overcomes the drawbacks of previous data-region association (DRA) models by eliminating the approximation error of constrained estimation and using continuous integrals to more reliably calculate the association probabilities. Furthermore, the PMRA model is integrated with the Poisson multi-Bernoulli mixture (PMBM) filter for tracking multiple vehicles. Simulation results illustrate the superior estimation accuracy of the proposed PMRA-PMBM filter, in terms of both the positions and the extents of the vehicles, compared with PMBM filters using the gamma Gaussian inverse Wishart and DRA implementations.

ASSETS · Large language models · INTERACT · Extensibility · Estimation/Estimator

March 11, 2024

Editable Scene Simulation for Autonomous Driving via Collaborative LLM-Agents

Yuxi Wei,Zi Wang,Yifan Lu,Chenxin Xu,Changxing Liu,Hao Zhao,Siheng Chen,Yanfeng Wang

Scene simulation in autonomous driving has gained significant attention because of its huge potential for generating customized data. However, existing editable scene simulation approaches face limitations in terms of user interaction efficiency, multi-camera photo-realistic rendering and external digital assets integration. To address these challenges, this paper introduces ChatSim, the first system that enables editable photo-realistic 3D driving scene simulations via natural language commands with external digital assets. To enable editing with high command flexibility, ChatSim leverages a large language model (LLM) agent collaboration framework. To generate photo-realistic outcomes, ChatSim employs a novel multi-camera neural radiance field method. Furthermore, to unleash the potential of extensive high-quality digital assets, ChatSim employs a novel multi-camera lighting estimation method to achieve scene-consistent asset rendering. Our experiments on the Waymo Open Dataset demonstrate that ChatSim can handle complex language commands and generate corresponding photo-realistic scene videos.

Design · Identifiable · Lecture notes · AVS · prototype

March 11, 2024

Designing for Projection-based Communication between Autonomous Vehicles and Pedestrians

Trung Thanh Nguyen,Kai Hollander,Marius Hoggenmueller,Callum Parker,Martin Tomitsch

Recent studies have investigated new approaches for communicating an autonomous vehicle's (AV) intent and awareness to pedestrians. This paper adds to this body of work by presenting the design and evaluation of in-situ projections on the road. Our design combines common traffic light patterns with aesthetic visual elements. We describe the iterative design process and the prototyping methods used in each stage. The final design concept was represented as a virtual reality simulation and evaluated with 18 participants in four different street crossing scenarios, which included three scenarios that simulated various degrees of system errors. We found that different design elements were able to support participants' confidence in their decision even when the AV failed to correctly detect their presence. We also identified elements in our design that needed to be more clearly communicated. Based on these findings, the paper presents a series of design recommendations for projection-based communication between AVs and pedestrians.

Image captioning · Object detection · Performer · MoDELS · Learning

March 10, 2024

Transformer based Multitask Learning for Image Captioning and Object Detection

Debolena Basak,P. K. Srijith,Maunendra Sankar Desarkar

from arxiv, Accepted at PAKDD 2024

In several real-world scenarios like autonomous navigation and mobility, image captioning and object detection play a crucial role in obtaining a better visual understanding of the surroundings. This work introduces a novel multitask learning framework that combines image captioning and object detection into a joint model. We propose TICOD, a Transformer-based Image Captioning and Object Detection model that trains both tasks jointly by combining the losses obtained from the image captioning and object detection networks. By leveraging joint training, the model benefits from the complementary information shared between the two tasks, leading to improved performance for image captioning. Our approach utilizes a transformer-based architecture that enables end-to-end network integration for image captioning and object detection and performs both tasks jointly. We evaluate the effectiveness of our approach through comprehensive experiments on the MS-COCO dataset. Our model outperforms the baselines from image captioning literature by achieving a 3.65% improvement in BERTScore.

MoDELS · Dynamical systems · Learning · Transformer models · Transformation

March 10, 2024

Pedestrian Trajectory Prediction Using Dynamics-based Deep Learning

Honghui Wang,Weiming Zhi,Gustavo Batista,Rohitash Chandra

from arxiv, 8 pages (including references), 7 figures, accepted by ICRA2024

Pedestrian trajectory prediction plays an important role in autonomous driving systems and robotics. Recent work utilizing prominent deep learning models for pedestrian motion prediction makes limited a priori assumptions about human movements, resulting in a lack of explainability and of explicit constraints enforced on predicted trajectories. We present a dynamics-based deep learning framework with a novel asymptotically stable dynamical system integrated into a Transformer-based model. We use the asymptotically stable dynamical system to model human goal-targeted motion by enforcing that the human walking trajectory converges to a predicted goal position, and to provide the Transformer model with prior knowledge and explainability. Our framework features a Transformer model that works with a goal estimator and the dynamical system to learn features from pedestrian motion history. The results show that our framework outperforms prominent models on five benchmark human motion datasets.
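
To make the idea of goal-targeted motion under an asymptotically stable dynamical system concrete, here is a minimal sketch. It is not the authors' model: it assumes the simplest such system, the linear attractor dx/dt = -gain * (x - goal), and the names `rollout_to_goal`, `distance_to_goal`, `gain`, and `dt` are hypothetical. In the paper the goal comes from a learned goal estimator; here it is fixed by hand.

```python
import math


def rollout_to_goal(start, goal, gain=1.0, dt=0.1, steps=50):
    """Euler-integrate dx/dt = -gain * (x - goal) from a 2D start point.

    Assumption: a linear attractor stands in for the paper's dynamical
    system. The goal is its only equilibrium, and each discrete step
    scales the error by (1 - gain * dt), so it shrinks when
    0 < gain * dt < 2.
    """
    x, y = start
    gx, gy = goal
    path = [(x, y)]
    for _ in range(steps):
        x += dt * (-gain * (x - gx))
        y += dt * (-gain * (y - gy))
        path.append((x, y))
    return path


def distance_to_goal(point, goal):
    """Euclidean distance between a 2D point and the goal."""
    return math.hypot(point[0] - goal[0], point[1] - goal[1])


goal = (4.0, 3.0)  # hypothetical predicted goal position
path = rollout_to_goal((0.0, 0.0), goal)
errors = [distance_to_goal(p, goal) for p in path]
```

With these settings the distance to the goal shrinks by a factor of 0.9 at every step, which is the convergence property that lets such a system act as an explicit constraint on predicted trajectories.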

Episode · Inference · Scenario · INTERACT · Estimation/Estimator

March 9, 2024

Scene Informer: Anchor-based Occlusion Inference and Trajectory Prediction in Partially Observable Environments

Bernard Lange,Jiachen Li,Mykel J. Kochenderfer

from arxiv, Accepted to 2024 IEEE International Conference on Robotics and Automation (ICRA)

Navigating complex and dynamic environments requires autonomous vehicles (AVs) to reason about both visible and occluded regions. This involves predicting the future motion of observed agents, inferring occluded ones, and modeling their interactions based on vectorized scene representations of the partially observable environment. However, prior work on occlusion inference and trajectory prediction has developed in isolation, with the former based on simplified rasterized methods and the latter assuming full environment observability. We introduce the Scene Informer, a unified approach for predicting observed agent trajectories and inferring occlusions in a partially observable setting. It uses a transformer to aggregate various input modalities and facilitate selective queries on occlusions that might intersect with the AV's planned path. The framework estimates occupancy probabilities and likely trajectories for occlusions, and forecasts motion for observed agents. We explore common observability assumptions in both domains and their performance impact. Our approach outperforms existing methods in both occupancy prediction and trajectory prediction in partially observable settings on the Waymo Open Motion Dataset.

AVS · Sensors · INFORMS · Functional · Vision

March 8, 2024

A Detection and Filtering Framework for Collaborative Localization

Thirumalaesh Ashokkumar,Katherine A Skinner,Siddarth Agarwal,Ankit Vora,Ashutosh Bhown

Autonomous vehicles (AVs) are increasingly becoming a reality, for example through the Advanced Driver Assistance Systems (ADAS) that assist drivers with driving and parking functions in today's vehicles. The localization problem for AVs relies primarily on multiple sensors, including cameras, LiDARs, and radars. Manufacturing, installing, calibrating, and maintaining these sensors can be very expensive, thereby increasing the overall cost of AVs. This research explores the means to improve localization on vehicles belonging to the ADAS category in a platooning context, where an ADAS vehicle follows a lead "Smart" AV equipped with a highly accurate sensor suite. We propose a filtering framework that combines pose information derived from vision and odometry to improve the localization of the ADAS vehicle that follows the smart vehicle, and we report results obtained with it.
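
The abstract does not say which filter is used, so the following is only an illustrative sketch of how odometry and vision can be combined in a filtering framework. It assumes a scalar Kalman filter over one position coordinate, with odometry driving the prediction step and a vision-derived position driving the correction step. The function names and all numeric values are hypothetical.

```python
def predict(mean, var, odom_delta, odom_var):
    """Prediction step: shift the estimate by the odometry increment and
    inflate the variance by the odometry noise (assumed values)."""
    return mean + odom_delta, var + odom_var


def update(mean, var, meas, meas_var):
    """Correction step: blend the prediction with a vision-derived
    position measurement, weighted by their variances."""
    gain = var / (var + meas_var)  # Kalman gain for a scalar state
    return mean + gain * (meas - mean), (1.0 - gain) * var


# One filter cycle with made-up numbers: start at 0 m with variance 1,
# odometry reports +2 m, then vision observes the vehicle at 3 m.
mean, var = predict(0.0, 1.0, odom_delta=2.0, odom_var=0.5)
mean, var = update(mean, var, meas=3.0, meas_var=1.5)
```

Because the predicted variance and the measurement variance are equal here (both 1.5), the fused estimate lands halfway between the odometry prediction and the vision measurement, and its variance is halved.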

INTERACT · Performer · Reducible · Pivotal (company) · Likelihood

March 7, 2024

LitSim: Conflict-aware Policy for Long-term Interactive Traffic Simulation

Haojie Xin,Xiaodong Zhang,Renzhi Tang,Songyang Yan,Qianrui Zhao,Chunze Yang,Zijiang Yang

Simulation is pivotal in evaluating the performance of autonomous driving systems due to its advantages in efficiency and cost compared to on-road testing. Realistic multi-agent behavior (e.g., interactive and long-term) is needed to narrow the gap between simulation and reality. Existing work has the following shortcomings in achieving this goal: (1) log replay offers realistic scenarios but leads to unrealistic collisions due to the lack of dynamic interactions, and (2) model-based and learning-based solutions encourage interactions but often deviate from real-world data over long horizons. In this work, we propose LitSim, a long-term interactive simulation approach that maximizes realism while avoiding unrealistic collisions. Specifically, we replay the log for most scenarios and intervene only when LitSim predicts unrealistic conflicts. We then encourage interactions among the agents and resolve the conflicts, thereby reducing the likelihood of unrealistic collisions. We train and validate our model on the real-world dataset NGSIM, and the experimental results demonstrate that LitSim outperforms the current popular approaches in realism and reactivity.
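
The replay-unless-conflict control flow described above can be sketched in a few lines. This is a toy illustration, not LitSim itself: it assumes a 1D car-following setup, a fixed gap threshold in place of LitSim's learned conflict prediction, and a simple position clamp in place of its interaction policy. The function `simulate` and the parameter `min_gap` are hypothetical.

```python
def simulate(logged_follower, lead_positions, min_gap=5.0):
    """Replay a follower's logged 1D positions, overriding a step only
    when the logged position would come closer than min_gap to the lead.

    Assumption: a gap threshold stands in for conflict prediction, and
    clamping to (lead - min_gap) stands in for the resolving policy.
    """
    positions, intervened = [], []
    for logged, lead in zip(logged_follower, lead_positions):
        conflict = (lead - logged) < min_gap
        positions.append(lead - min_gap if conflict else logged)
        intervened.append(conflict)
    return positions, intervened


# The lead vehicle brakes harder than it did in the recorded log, so pure
# replay of the follower would end up only 2 m behind it at the last step.
logged_follower = [0.0, 10.0, 20.0, 30.0]
lead_positions = [20.0, 28.0, 30.0, 32.0]
positions, intervened = simulate(logged_follower, lead_positions)
```

The first three steps follow the log unchanged, and only the last step is overridden, which mirrors the stated goal of staying close to real data while avoiding unrealistic collisions.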

Controller · Episode · Learning · Constraints · Performer

March 7, 2024

Incremental Bayesian Learning for Fail-Operational Control in Autonomous Driving

Lei Zheng,Rui Yang,Zengqi Peng,Wei Yan,Michael Yu Wang,Jun Ma

from arxiv, 8 pages, 8 figures, accepted for publication in the 22nd European Control Conference (ECC 2024)

Abrupt maneuvers by surrounding vehicles (SVs) can typically lead to safety concerns and affect the task efficiency of the ego vehicle (EV), especially with model uncertainties stemming from environmental disturbances. This paper presents a real-time fail-operational controller that ensures the asymptotic convergence of an uncertain EV to a safe state, while preserving task efficiency in dynamic environments. An incremental Bayesian learning approach is developed to facilitate online learning and inference of changing environmental disturbances. Leveraging disturbance quantification and constraint transformation, we develop a stochastic fail-operational barrier based on the control barrier function (CBF). With this development, the uncertain EV is able to converge asymptotically from an unsafe state to a defined safe state with probabilistic stability. Subsequently, the stochastic fail-operational barrier is integrated into an efficient fail-operational controller based on quadratic programming (QP). This controller is tailored for the EV operating under control constraints in the presence of environmental disturbances, with both safety and efficiency objectives taken into consideration. We validate the proposed framework in connected cruise control (CCC) tasks, where SVs perform aggressive driving maneuvers. The simulation results demonstrate that our method empowers the EV to swiftly return to a safe state while upholding task efficiency in real time, even under time-varying environmental disturbances.

Permutation · Oscillation · Performer · Statistics · Optimizer

March 6, 2024

Forecasting and Mitigating Disruptions in Public Bus Transit Services

Chaeeun Han,Jose Paolo Talusan,Dan Freudberg,Ayan Mukhopadhyay,Abhishek Dubey,Aron Laszka

Public transportation systems often suffer from unexpected fluctuations in demand and disruptions, such as mechanical failures and medical emergencies. These fluctuations and disruptions lead to delays and overcrowding, which are detrimental to the passengers' experience and to the overall performance of the transit service. To proactively mitigate such events, many transit agencies station substitute (reserve) vehicles throughout their service areas, which they can dispatch to augment or replace vehicles on routes that suffer overcrowding or disruption. However, determining the optimal locations where substitute vehicles should be stationed is a challenging problem due to the inherent randomness of disruptions and due to the combinatorial nature of selecting locations across a city. In collaboration with the transit agency of Nashville, TN, we address this problem by introducing data-driven statistical and machine-learning models for forecasting disruptions and an effective randomized local-search algorithm for selecting locations where substitute vehicles are to be stationed. Our research demonstrates promising results in proactive disruption management, offering a practical and easily implementable solution for transit agencies to enhance the reliability of their services. Our results resonate beyond mere operational efficiency: by advancing proactive strategies, our approach fosters more resilient and accessible public transportation, contributing to equitable urban mobility and ultimately benefiting the communities that rely on public transportation the most.
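
As a rough illustration of a randomized local search for stationing substitute vehicles, the sketch below picks `k` stations from a list of candidate sites. It is not the authors' algorithm or objective: it assumes 1D site positions, a hypothetical cost equal to the forecast-weighted distance from each disruption to its nearest station, and a single-swap neighborhood that accepts only improving moves. All names and data are made up.

```python
import random


def coverage_cost(stations, disruptions):
    """Sum of weight * distance to the nearest station over forecast
    disruptions, given as (position, weight) pairs (assumed objective)."""
    return sum(w * min(abs(pos - s) for s in stations) for pos, w in disruptions)


def randomized_local_search(candidates, disruptions, k, iterations=200, seed=0):
    """Start from a random set of k stations, then repeatedly swap one
    station for a random unused candidate, keeping improving swaps only."""
    rng = random.Random(seed)
    current = rng.sample(candidates, k)
    best_cost = coverage_cost(current, disruptions)
    for _ in range(iterations):
        outside = [c for c in candidates if c not in current]
        if not outside:
            break  # every candidate is already a station
        proposal = list(current)
        proposal[rng.randrange(k)] = rng.choice(outside)
        cost = coverage_cost(proposal, disruptions)
        if cost < best_cost:
            current, best_cost = proposal, cost
    return sorted(current), best_cost


candidates = [0, 2, 5, 8, 10]                      # hypothetical depot sites
disruptions = [(1, 3.0), (9, 2.0), (5, 1.0)]       # (position, forecast weight)
stations, cost = randomized_local_search(candidates, disruptions, k=2)
```

Accepting only improving swaps keeps the search simple but can stall in a local optimum on harder instances. Restarting from several seeds is a common mitigation.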