Lane detection is a vital task for vehicles to navigate and localize their position on the road. To ensure reliable driving, lane detection models must have robust generalization performance in various road environments. However, despite the advanced performance in the trained domain, their generalization performance still falls short of expectations due to the domain discrepancy. To bridge this gap, we propose a novel generative framework using HD Maps for Single-Source Domain Generalization (SSDG) in lane detection. We first generate numerous front-view images from lane markings of HD Maps. Next, we strategically select a core subset among the generated images using (i) lane structure and (ii) road surrounding criteria to maximize their diversity. In the end, utilizing this core set, we train lane detection models to boost their generalization performance. We validate that our generative framework from HD Maps outperforms the Domain Adaptation model MLDA with +3.01%p accuracy improvement, even though we do not access the target domain images.

Related content

Generalization theory

MoDELS · Feel · tuning · GPT-4 · identifiable

July 12, 2024

Refuse Whenever You Feel Unsafe: Improving Safety in LLMs via Decoupled Refusal Training

Youliang Yuan,Wenxiang Jiao,Wenxuan Wang,Jen-tse Huang,Jiahao Xu,Tian Liang,Pinjia He,Zhaopeng Tu

This study addresses a critical gap in safety tuning practices for Large Language Models (LLMs) by identifying and tackling a refusal position bias within safety tuning data, which compromises the models' ability to appropriately refuse to generate unsafe content. We introduce a novel approach, Decoupled Refusal Training (DeRTa), designed to empower LLMs to refuse to comply with harmful prompts at any response position, significantly enhancing their safety capabilities. DeRTa incorporates two novel components: (1) Maximum Likelihood Estimation (MLE) with Harmful Response Prefix, which trains models to recognize and avoid unsafe content by prepending a segment of a harmful response to a safe response, and (2) Reinforced Transition Optimization (RTO), which equips models with the ability to transition from potential harm to safety refusal consistently throughout the harmful response sequence. Our empirical evaluation, conducted using LLaMA3 and Mistral model families across six attack scenarios, demonstrates that our method not only improves model safety without compromising performance but also surpasses well-known models such as GPT-4 in defending against attacks. Importantly, our approach successfully defends against recent advanced attack methods (e.g., CodeAttack) that have jailbroken GPT-4 and LLaMA3-70B-Instruct. Our code and data can be found at https://github.com/RobustNLP/DeRTa.

Attention · 3D · transformation · MoDELS · Extensibility

July 12, 2024

ViewFormer: Exploring Spatiotemporal Modeling for Multi-View 3D Occupancy Perception via View-Guided Transformers

Jinke Li,Xiao He,Chonghua Zhou,Xiaoqiang Cheng,Yang Wen,Dan Zhang

3D occupancy, an advanced perception technology for driving scenarios, represents the entire scene without distinguishing between foreground and background by quantifying the physical space into a grid map. The widely adopted projection-first deformable attention, efficient in transforming image features into 3D representations, encounters challenges in aggregating multi-view features due to sensor deployment constraints. To address this issue, we propose our learning-first view attention mechanism for effective multi-view feature aggregation. Moreover, we showcase the scalability of our view attention across diverse multi-view 3D tasks, including map construction and 3D object detection. Leveraging the proposed view attention as well as an additional multi-frame streaming temporal attention, we introduce ViewFormer, a vision-centric transformer-based framework for spatiotemporal feature aggregation. To further explore occupancy-level flow representation, we present FlowOcc3D, a benchmark built on top of existing high-quality datasets. Qualitative and quantitative analyses on this benchmark reveal the potential to represent fine-grained dynamic scenes. Extensive experiments show that our approach significantly outperforms prior state-of-the-art methods. The codes are available at https://github.com/ViewFormerOcc/ViewFormer-Occ.

recall · HTTPS · contrastive · precision · cost

July 12, 2024

FD-SOS: Vision-Language Open-Set Detectors for Bone Fenestration and Dehiscence Detection from Intraoral Images

Marawan Elbatel,Keyuan Liu,Yanqi Yang,Xiaomeng Li

from arxiv, MICCAI 2024

Accurate detection of bone fenestration and dehiscence (FD) is crucial for effective treatment planning in dentistry. While cone-beam computed tomography (CBCT) is the gold standard for evaluating FD, it comes with limitations such as radiation exposure, limited accessibility, and higher cost compared to intraoral images. In intraoral images, dentists face challenges in the differential diagnosis of FD. This paper presents a novel and clinically significant application of FD detection solely from intraoral images. To achieve this, we propose FD-SOS, a novel open-set object detector for FD detection from intraoral images. FD-SOS has two novel components: conditional contrastive denoising (CCDN) and teeth-specific matching assignment (TMA). These modules enable FD-SOS to effectively leverage external dental semantics. Experimental results showed that our method outperformed existing detection methods and surpassed dental professionals by 35% recall under the same level of precision. Code is available at: https://github.com/xmed-lab/FD-SOS.

Performer · contrastive · modality · dataset · contrastive learning

July 12, 2024

Enhancing Emotion Recognition in Incomplete Data: A Novel Cross-Modal Alignment, Reconstruction, and Refinement Framework

Haoqin Sun,Shiwan Zhao,Shaokai Li,Xiangyu Kong,Xuechen Wang,Aobo Kong,Jiaming Zhou,Yong Chen,Wenjia Zeng,Yong Qin

Multimodal emotion recognition systems rely heavily on the full availability of modalities, suffering significant performance declines when modal data is incomplete. To tackle this issue, we present the Cross-Modal Alignment, Reconstruction, and Refinement (CM-ARR) framework, an innovative approach that sequentially engages in cross-modal alignment, reconstruction, and refinement phases to handle missing modalities and enhance emotion recognition. This framework utilizes unsupervised distribution-based contrastive learning to align heterogeneous modal distributions, reducing discrepancies and modeling semantic uncertainty effectively. The reconstruction phase applies normalizing flow models to transform these aligned distributions and recover missing modalities. The refinement phase employs supervised point-based contrastive learning to disrupt semantic correlations and accentuate emotional traits, thereby enriching the affective content of the reconstructed representations. Extensive experiments on the IEMOCAP and MSP-IMPROV datasets confirm the superior performance of CM-ARR under conditions of both missing and complete modalities. Notably, averaged across six scenarios of missing modalities, CM-ARR achieves absolute improvements of 2.11% in WAR and 2.12% in UAR on the IEMOCAP dataset, and 1.71% and 1.96% in WAR and UAR, respectively, on the MSP-IMPROV dataset.
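The abstract reports gains in WAR and UAR without defining them. As a minimal sketch, assuming the definitions conventional in speech emotion recognition (WAR as overall accuracy, UAR as the mean of per-class recalls), the two metrics can be computed as follows. The function name `war_uar` and the toy labels are illustrative and do not come from the paper.

```python
from collections import Counter


def war_uar(y_true, y_pred):
    """Return (WAR, UAR) as fractions in [0, 1].

    Assumed definitions (not stated in the abstract):
    WAR = overall accuracy, UAR = unweighted mean of per-class recall.
    """
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("y_true and y_pred must be non-empty and of equal length")
    total_per_class = Counter(y_true)
    correct_per_class = Counter(t for t, p in zip(y_true, y_pred) if t == p)
    # WAR: every sample counts equally, so frequent classes dominate.
    war = sum(correct_per_class.values()) / len(y_true)
    # UAR: every class counts equally, regardless of its size.
    uar = sum(correct_per_class[c] / n for c, n in total_per_class.items()) / len(total_per_class)
    return war, uar


# Toy, imbalanced example: a classifier that always predicts "happy".
y_true = ["happy", "happy", "happy", "sad"]
y_pred = ["happy", "happy", "happy", "happy"]
war, uar = war_uar(y_true, y_pred)
print(f"WAR={war:.2f} UAR={uar:.2f}")  # WAR=0.75 UAR=0.50
```

The gap between the two numbers in the toy example shows why both are usually reported on imbalanced emotion datasets: WAR rewards getting the majority class right, while UAR penalizes ignoring a minority class.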

Mamba · estimation/estimator · MoDELS · Networking · MINE

July 11, 2024

ST-Mamba: Spatial-Temporal Mamba for Traffic Flow Estimation Recovery using Limited Data

Doncheng Yuan,Jianzhe Xue,Jinshan Su,Wenchao Xu,Haibo Zhou

from arxiv, Accepted by 2024 IEEE/CIC International Conference on Communications in China (ICCC)

Traffic flow estimation (TFE) is crucial for urban intelligent traffic systems. While traditional on-road detectors are hindered by limited coverage and high costs, cloud computing and data mining of vehicular network data, such as driving speeds and GPS coordinates, present a promising and cost-effective alternative. Furthermore, minimizing data collection can significantly reduce overhead. However, limited data can lead to inaccuracies and instability in TFE. To address this, we introduce the spatial-temporal Mamba (ST-Mamba), a deep learning model combining a convolutional neural network (CNN) with a Mamba framework. ST-Mamba is designed to enhance TFE accuracy and stability by effectively capturing the spatial-temporal patterns within traffic flow. Our model aims to achieve results comparable to those from extensive data sets while only utilizing minimal data. Simulations using real-world datasets have validated our model's ability to deliver precise and stable TFE across an urban landscape based on limited data, establishing a cost-efficient solution for TFE.

range · sensor · paper · CASES · SimPLe

July 10, 2024

Viability of Low-Cost Infrared Sensors for Short Range Tracking

Noah Haeske

from arxiv. For the program, see https://github.com/noah-haeske/research/blob/main/experimentProgram.py and for the sensor datasheet, see https://www.st.com/en/imaging-and-photonics-solutions/vl53l7cx.html#overview

A classic task in robotics is tracking a target in the external environment. There are several well-documented approaches to this problem. This paper presents a novel approach using infrared time-of-flight sensors. Such sensors are rarely used for tracking; they are typically used in simple motion detectors. However, with the approach highlighted in this paper, they can accurately track the position of a moving subject. Traditional approaches to the tracking problem often rely on cameras or ultrasonic sensors, which can be expensive and excessive for some use cases. The method presented in this paper can be superior in terms of cost and simplicity.

MoDELS · Performer · SimPLe · lecture notes · Automator

July 10, 2024

The Hybrid Extended Bicycle: A Simple Model for High Dynamic Vehicle Trajectory Planning

Agapius Bou Ghosn,Philip Polack,Arnaud de La Fortelle

While highly automated driving relies most of the time on a smooth driving assumption, it is very likely that a vehicle will have to perform harsh, highly dynamic maneuvers to face unexpected events. Modeling the behavior of the vehicle in these events is crucial to proper planning and control; the model used should be both accurate and computationally efficient, so that it stays consistent with the dynamics of the vehicle and can be employed in real-time systems. In this article, we propose an LSTM-based hybrid extended bicycle model able to give an accurate description of the state of the vehicle in both normal and aggressive situations. The introduced model is used in a Model Predictive Path Integral (MPPI) plan and control framework for performing trajectories in high-dynamic scenarios. The proposed model and framework prove their ability to plan feasible trajectories, ensuring accurate vehicle behavior even at the limits of handling.

MoDELS · language modeling · Prompt · large language model · Performer

July 10, 2024

Review-LLM: Harnessing Large Language Models for Personalized Review Generation

Qiyao Peng,Hongtao Liu,Hongyan Xu,Qing Yang,Minglai Shao,Wenjun Wang

Product review generation is an important task in recommender systems, as it can provide explanation and persuasiveness for a recommendation. Recently, Large Language Models (LLMs, e.g., ChatGPT) have shown superior text modeling and generation ability, which can be applied to review generation. However, directly applying LLMs to generate reviews may be hampered by their "polite" tendency, so they may fail to generate personalized reviews (e.g., negative reviews). In this paper, we propose Review-LLM, which customizes LLMs for personalized review generation. First, we construct the prompt input by aggregating the user's historical behaviors, which include the corresponding item titles and reviews. This enables the LLM to capture user interest features and review writing style. Second, we incorporate ratings into the prompt as indicators of satisfaction, which further improves the model's understanding of user preferences and its control over the sentiment of generated reviews. Finally, we feed the prompt text into the LLM and use Supervised Fine-Tuning (SFT) to make the model generate personalized reviews for the given user and target item. Experimental results on a real-world dataset show that our fine-tuned model achieves better review generation performance than existing closed-source LLMs.
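The prompt construction described in this abstract (aggregating past item titles and reviews, then adding ratings as a satisfaction signal) might look roughly like the sketch below. The function `build_review_prompt`, the tuple schema, the 1-to-5 rating scale, and the prompt wording are all assumptions made for illustration; the abstract does not give the actual template, and the fine-tuning step is not shown.

```python
def build_review_prompt(history, target_title, target_rating):
    """Assemble a prompt from a user's past reviews plus the target item.

    history: list of (item_title, rating, review_text) tuples.
    This schema and the wording below are hypothetical, not the paper's template.
    """
    lines = ["Past reviews written by this user:"]
    for title, rating, review in history:
        lines.append(f"- Item: {title} | Rating: {rating}/5 | Review: {review}")
    # The rating acts as a satisfaction signal that steers the review's sentiment.
    lines.append(f"Target item: {target_title} | Rating: {target_rating}/5")
    lines.append("Write the review this user would give the target item.")
    return "\n".join(lines)


history = [
    ("Wireless mouse", 2, "Stopped working after a week."),
    ("USB-C cable", 5, "Sturdy and charges fast."),
]
prompt = build_review_prompt(history, "Mechanical keyboard", 1)
print(prompt)
```

In a supervised fine-tuning setup, a prompt like this would be paired with the user's real review of the target item as the training target, so a low rating in the prompt is associated with a negative review rather than a default polite one.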

INTERACT · VR · AVS · system design · round

July 9, 2024

Evaluating a VR System for Collecting Safety-Critical Vehicle-Pedestrian Interactions

Erica Weng,Kenta Mukoya,Deva Ramanan,Kris Kitani

from arxiv, Spotlight paper in the Data Generation for Robotics Workshop at RSS 2024

Autonomous vehicles (AVs) require comprehensive and reliable pedestrian trajectory data to ensure safe operation. However, obtaining data of safety-critical scenarios such as jaywalking and near-collisions, or uncommon agents such as children, disabled pedestrians, and vulnerable road users poses logistical and ethical challenges. This paper evaluates a Virtual Reality (VR) system designed to collect pedestrian trajectory and body pose data in a controlled, low-risk environment. We substantiate the usefulness of such a system through semi-structured interviews with professionals in the AV field, and validate the effectiveness of the system through two empirical studies: a first-person user evaluation involving 62 participants, and a third-person evaluative survey involving 290 respondents. Our findings demonstrate that the VR-based data collection system elicits realistic responses for capturing pedestrian data in safety-critical or uncommon vehicle-pedestrian interaction scenarios.

Performer · discriminator · positive example · false positive · supervision

May 24, 2018

DSGAN: Generative Adversarial Training for Distant Supervision Relation Extraction

Pengda Qin,Weiran Xu,William Yang Wang

Distant supervision can effectively label data for relation extraction, but it suffers from noisy labels. Recent works mainly apply soft bag-level noise reduction strategies to find the relatively better samples in a sentence bag, which is suboptimal compared with making hard decisions about false-positive samples at the sentence level. In this paper, we introduce an adversarial learning framework, which we name DSGAN, to learn a sentence-level true-positive generator. Inspired by Generative Adversarial Networks, we regard the positive samples generated by the generator as negative samples to train the discriminator. The optimal generator is obtained when the discrimination ability of the discriminator shows the greatest decline. We adopt the generator to filter the distant supervision training dataset and redistribute the false-positive instances into the negative set, thereby providing a cleaned dataset for relation classification. The experimental results show that the proposed strategy significantly improves the performance of distant supervision relation extraction compared with state-of-the-art systems.
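The final filtering step (keeping sentences the generator judges to be true positives and moving the rest into the negative set) can be sketched as below. This covers only the redistribution, not the adversarial training. `score_fn` is a stand-in for the trained generator, and both its interface and the 0.5 threshold are assumptions rather than details from the abstract.

```python
def redistribute_false_positives(positives, negatives, score_fn, threshold=0.5):
    """Split distantly supervised positives and move suspected false positives.

    score_fn: stand-in for the trained generator; returns the estimated
    probability that a sentence is a true positive (hypothetical interface).
    threshold: illustrative cut-off, not a value from the paper.
    """
    kept, moved = [], []
    for sentence in positives:
        if score_fn(sentence) >= threshold:
            kept.append(sentence)
        else:
            moved.append(sentence)
    # Suspected false positives join the negative set instead of being dropped.
    return kept, list(negatives) + moved


# Toy scores in place of a real generator.
toy_scores = {"s1": 0.9, "s2": 0.2, "s3": 0.7}
clean_pos, clean_neg = redistribute_false_positives(
    ["s1", "s2", "s3"], ["n1"], toy_scores.get
)
print(clean_pos, clean_neg)  # ['s1', 's3'] ['n1', 's2']
```

Moving the rejected sentences into the negative set, rather than discarding them, keeps the dataset size unchanged and gives the relation classifier explicit counterexamples.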