Autonomous driving (AD) perception today relies heavily on deep-learning-based architectures that require large-scale annotated datasets, with the associated costs of curation and annotation. 3D semantic data are useful for core perception tasks such as obstacle detection and ego-vehicle localization. We propose a new dataset, Navya 3D Segmentation (Navya3DSeg), with a diverse label space corresponding to a large-scale, production-grade operational domain, including rural, urban, industrial sites and universities from 13 countries. It contains 23 labeled sequences and 25 supplementary sequences without labels, designed to explore self-supervised and semi-supervised semantic segmentation benchmarks on point clouds. We also propose a novel method for sequential dataset split generation based on iterative multi-label stratification, which is demonstrated to achieve a +1.2% mIoU improvement over the original split proposed by the SemanticKITTI dataset. A complete benchmark of state-of-the-art methods on the semantic segmentation task was performed. Finally, we demonstrate an active learning (AL) based dataset distillation framework and introduce a novel heuristic-free sampling method, called distance sampling, in the context of AL. A detailed presentation of the dataset is available at //www.youtube.com/watch?v=5m6ALIs-s20 .
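As an illustration of the split-generation idea, here is a minimal sketch of greedy iterative multi-label stratification applied at the sequence level. It is not the authors' exact algorithm; the function name `stratified_sequence_split`, the split ratios, and the assumption that per-sequence, per-class point counts are available are all hypothetical.

```python
import numpy as np

def stratified_sequence_split(label_hist, ratios=(0.7, 0.15, 0.15), seed=0):
    """Greedy iterative multi-label stratification (in the spirit of Sechidis et al., 2011),
    adapted to whole sequences: assign each sequence to a split so that per-class
    point counts stay close to the desired ratios.

    label_hist: (num_sequences, num_classes) array of per-class point counts.
    Returns an array with one split index per sequence.
    """
    rng = np.random.default_rng(seed)
    n_seq, _ = label_hist.shape
    # remaining demand of each split for each class
    desired = np.outer(ratios, label_hist.sum(axis=0)).astype(float)
    remaining = np.ones(n_seq, dtype=bool)
    assignment = np.full(n_seq, -1, dtype=int)

    while remaining.any():
        # pick the rarest class among the still-unassigned sequences
        cls_totals = label_hist[remaining].sum(axis=0).astype(float)
        cls_totals[cls_totals == 0] = np.inf
        c = int(np.argmin(cls_totals))
        # among unassigned sequences containing class c, take the largest contributor
        cand = np.where(remaining & (label_hist[:, c] > 0))[0]
        if cand.size == 0:
            cand = np.where(remaining)[0]
        s = cand[np.argmax(label_hist[cand, c])]
        # assign it to the split that still needs class c the most (ties broken randomly)
        need = desired[:, c]
        k = int(rng.choice(np.flatnonzero(need == need.max())))
        assignment[s] = k
        desired[k] -= label_hist[s]
        remaining[s] = False
    return assignment
```

The returned vector maps each sequence to one of the splits indexed by `ratios` (e.g. train/val/test).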
We present a novel framework for finding a set of tight bounding boxes of a 3D shape via neural-network-based over-segmentation and iterative merging and refinement. Obtaining tight bounding boxes of a shape while guaranteeing complete boundedness is essential for efficient geometric operations and unsupervised semantic part detection, but previous methods fail to achieve both full coverage and tightness. Neural-network-based methods are not suitable for these goals due to the non-differentiability of the objective, while classic iterative search methods suffer from sensitivity to the initialization. We demonstrate that a careful integration of learning-based and iterative search methods can achieve bounding boxes with both properties. We employ an existing unsupervised segmentation network to split the shape and obtain an over-segmentation. Then, we apply hierarchical merging with our novel tightness-aware merging and stopping criteria. To overcome the sensitivity to the initialization, we also refine the bounding box parameters in a game setup with a soft reward function that promotes wider exploration. Lastly, we further improve the bounding boxes with an MCTS-based multi-action space exploration. Our experimental results demonstrate that our method achieves full coverage, tightness, and an adequate number of bounding boxes.
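The tightness-aware merging criterion can be pictured with a rough voxel-occupancy proxy for tightness, sketched below. This is only an illustration under assumed definitions (the paper's actual tightness measure, reward, and MCTS stages are not reproduced), and the names `tightness` and `try_merge` are hypothetical.

```python
import numpy as np

def tightness(points, voxel=0.02):
    """Crude tightness proxy: fraction of the axis-aligned bounding box volume
    covered by voxels that contain at least one point of the segment."""
    lo, hi = points.min(axis=0), points.max(axis=0)
    extent = np.maximum(hi - lo, voxel)
    dims = np.ceil(extent / voxel).astype(int)
    idx = np.minimum(((points - lo) / voxel).astype(int), dims - 1)
    occupied = len({tuple(v) for v in idx})          # number of occupied voxels
    return occupied / np.prod(dims)

def try_merge(seg_a, seg_b, rel_drop=0.7):
    """Tightness-aware merge test: accept merging two segments only if the
    merged box keeps at least `rel_drop` of the looser part's tightness."""
    merged = np.vstack([seg_a, seg_b])
    accept = tightness(merged) >= rel_drop * min(tightness(seg_a), tightness(seg_b))
    return accept, (merged if accept else None)
```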
We introduce Multi-Source 3D (MS3D), a new self-training pipeline for unsupervised domain adaptation in 3D object detection. Despite the remarkable accuracy of 3D detectors, they often overfit to specific domain biases, leading to suboptimal performance across different sensor setups and environments. Existing methods typically focus on adapting a single detector to the target domain, overlooking the fact that different detectors possess distinct expertise on different unseen domains. MS3D leverages this by combining pre-trained detectors from multiple source domains and incorporating temporal information to produce high-quality pseudo-labels for fine-tuning. Our proposed Kernel-Density Estimation (KDE) Box Fusion method fuses box proposals from multiple domains to obtain pseudo-labels that surpass the performance of the best source-domain detectors. MS3D exhibits greater robustness to domain shift and produces accurate pseudo-labels over greater distances, making it well suited for both high-to-low and low-to-high beam domain adaptation. Our method achieves state-of-the-art performance on all evaluated datasets, and we demonstrate that the choice of pre-trained source detectors has minimal impact on the self-training result, making MS3D suitable for real-world applications.
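A hedged sketch of how KDE-style weighting can fuse box proposals for a single object follows; it is not the paper's exact KDE Box Fusion, and the `[x, y, z, l, w, h, yaw]` layout and bandwidth are assumptions.

```python
import numpy as np

def kde_fuse_boxes(boxes, bandwidth=0.5):
    """Illustrative KDE-style fusion of box proposals from several detectors.

    boxes: (N, 7) array of [x, y, z, l, w, h, yaw] proposals for one object.
    Each proposal is weighted by the Gaussian kernel density of its centre,
    estimated from all other proposals, so outlying boxes receive low weight.
    """
    centres = boxes[:, :3]
    # pairwise squared distances between proposal centres
    d2 = ((centres[:, None, :] - centres[None, :, :]) ** 2).sum(-1)
    # leave-one-out Gaussian kernel density at each centre
    k = np.exp(-d2 / (2.0 * bandwidth ** 2))
    np.fill_diagonal(k, 0.0)
    weights = k.sum(axis=1)
    if weights.sum() == 0:                 # all proposals far apart: fall back to the mean
        weights = np.ones(len(boxes))
    weights = weights / weights.sum()
    fused = (weights[:, None] * boxes).sum(axis=0)
    # yaw must be averaged on the circle, not linearly
    fused[6] = np.arctan2((weights * np.sin(boxes[:, 6])).sum(),
                          (weights * np.cos(boxes[:, 6])).sum())
    return fused
```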
The successful implementation of vision-based navigation in agricultural fields hinges upon two critical components: 1) accurate identification of the key elements in the scene, and 2) identification of lanes through the detection of the boundary lines that separate the crops from the traversable ground. We propose Agronav, an end-to-end vision-based autonomous navigation framework that outputs the centerline from an input image by sequentially processing it through semantic segmentation and semantic line detection models. We also present Agroscapes, a pixel-level annotated dataset collected across six different crops, captured from varying heights and angles. This ensures that a framework trained on Agroscapes generalizes across both ground and aerial robotic platforms. Code, models, and the dataset will be released at \href{//github.com/shivamkumarpanda/agronav}{github.com/shivamkumarpanda/agronav}.
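To make the centerline step concrete, here is a small sketch that assumes the semantic line detector returns the two boundary lines as endpoint pairs in image coordinates; this representation and the function `centerline_from_boundaries` are hypothetical, not Agronav's actual interface.

```python
import numpy as np

def centerline_from_boundaries(left_line, right_line, height, n_samples=50):
    """Given two crop/ground boundary lines in image coordinates, each as a pair
    of endpoints ((x0, y0), (x1, y1)), sample both at the same image rows and
    average the x-coordinates to obtain centerline points.
    """
    ys = np.linspace(0, height - 1, n_samples)

    def x_at(line, y):
        (x0, y0), (x1, y1) = line
        if y1 == y0:                      # horizontal line: x is undefined, use midpoint
            return np.full_like(y, (x0 + x1) / 2.0, dtype=float)
        t = (y - y0) / (y1 - y0)
        return x0 + t * (x1 - x0)

    xs = 0.5 * (x_at(left_line, ys) + x_at(right_line, ys))
    return np.stack([xs, ys], axis=1)     # (n_samples, 2) centerline in pixels
```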
Contrastive learning has shown remarkable success in the field of multimodal representation learning. In this paper, we propose a contrastive language-audio pretraining pipeline that develops an audio representation by combining audio data with natural language descriptions. To accomplish this, we first release LAION-Audio-630K, a large collection of 633,526 audio-text pairs from different data sources. Second, we construct a contrastive language-audio pretraining model, considering different audio encoders and text encoders. We incorporate a feature fusion mechanism and keyword-to-caption augmentation into the model design to enable the model to process audio inputs of variable lengths and to enhance performance. Third, we perform comprehensive experiments to evaluate our model across three tasks: text-to-audio retrieval, zero-shot audio classification, and supervised audio classification. The results demonstrate that our model achieves superior performance on the text-to-audio retrieval task. On audio classification tasks, the model achieves state-of-the-art performance in the zero-shot setting and obtains performance comparable to that of models trained in the non-zero-shot setting. LAION-Audio-630K and the proposed model are both available to the public.
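The core objective behind such contrastive language-audio pretraining is the symmetric CLIP-style InfoNCE loss sketched below; encoders, feature fusion, and keyword-to-caption augmentation are omitted, and the function name and temperature are illustrative.

```python
import torch
import torch.nn.functional as F

def clap_contrastive_loss(audio_emb, text_emb, temperature=0.07):
    """Symmetric InfoNCE loss over a batch of paired audio/text embeddings.

    audio_emb, text_emb: (B, D) tensors produced by the two encoders; row i of
    each tensor corresponds to the same audio-text pair.
    """
    a = F.normalize(audio_emb, dim=-1)
    t = F.normalize(text_emb, dim=-1)
    logits = a @ t.T / temperature                    # (B, B) similarity matrix
    targets = torch.arange(a.size(0), device=a.device)
    loss_a2t = F.cross_entropy(logits, targets)       # audio -> matching caption
    loss_t2a = F.cross_entropy(logits.T, targets)     # caption -> matching audio
    return 0.5 * (loss_a2t + loss_t2a)
```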
Multi-camera 3D object detection for autonomous driving is a challenging problem that has garnered notable attention from both academia and industry. A key obstacle for vision-based techniques is the precise extraction of geometry-aware features from RGB images. Recent approaches have utilized geometric-aware image backbones pretrained on depth-related tasks to acquire spatial information. However, these approaches overlook the critical aspect of view transformation, resulting in inadequate performance due to the misalignment of spatial knowledge between the image backbone and the view transformation. To address this issue, we propose a novel geometric-aware pretraining framework called GAPretrain. Our approach injects spatial and structural cues into camera networks by employing the geometry-rich LiDAR modality as guidance during the pretraining phase. Transferring modality-specific attributes across modalities is non-trivial, but we bridge this gap by using a unified bird's-eye-view (BEV) representation and structural hints derived from LiDAR point clouds to facilitate the pretraining process. GAPretrain serves as a plug-and-play solution that can be flexibly applied to multiple state-of-the-art detectors. Our experiments demonstrate the effectiveness and generalization ability of the proposed method. We achieve 46.2 mAP and 55.5 NDS on the nuScenes val set with BEVFormer, gains of 2.7 and 2.1 points, respectively. We also conduct experiments on various image backbones and view transformations to validate the efficacy of our approach. Code will be released at //github.com/OpenDriveLab/BEVPerception-Survey-Recipe.
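One way to picture a LiDAR-guided pretraining signal is a masked BEV feature alignment loss, sketched below. This is an assumption-laden illustration, not GAPretrain's actual objective; the tensor shapes and the occupancy mask are hypothetical.

```python
import torch
import torch.nn.functional as F

def bev_distillation_loss(cam_bev, lidar_bev, occupancy_mask):
    """Align the camera branch's BEV features with LiDAR-derived BEV features,
    but only on cells that actually contain LiDAR points (structural hint).

    cam_bev, lidar_bev: (B, C, H, W) BEV feature maps.
    occupancy_mask:     (B, 1, H, W) binary mask of occupied BEV cells.
    """
    mask = occupancy_mask.float()
    target = lidar_bev.detach()                     # stop gradients through the LiDAR teacher
    diff = F.mse_loss(cam_bev, target, reduction="none")
    # average the per-element error over occupied cells only
    return (diff * mask).sum() / mask.sum().clamp(min=1.0)
```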
We propose DeepIPC, an end-to-end autonomous driving model that handles both perception and control for driving a vehicle. The model consists of two main parts: perception and controller modules. The perception module takes an RGBD image to perform semantic segmentation and bird's eye view (BEV) semantic mapping, and provides their encoded features. Meanwhile, the controller module processes these features together with GNSS location and angular speed measurements to estimate waypoints along with latent features. Then, two different agents are used to translate the waypoints and latent features into a set of navigational controls that drive the vehicle. The model is evaluated by predicting driving records and performing automated driving under various conditions in real environments. The experimental results show that DeepIPC achieves the best drivability and multi-task performance, even with fewer parameters than the other models. Codes are available at //github.com/oskarnatan/DeepIPC.
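For intuition on turning predicted waypoints into controls, the following is a hand-crafted pure-pursuit-style stand-in; DeepIPC's actual agents are learned, so this sketch, its parameters, and the ego-frame convention are purely illustrative.

```python
import numpy as np

def waypoints_to_controls(waypoints, wheel_base=2.5, target_speed=5.0):
    """Toy translation of predicted waypoints (ego frame, x forward, y left,
    metres) into a steering angle and a speed target via pure pursuit."""
    # look-ahead point: the second predicted waypoint (or the only one available)
    look = waypoints[min(1, len(waypoints) - 1)]
    ld = np.hypot(look[0], look[1]) + 1e-6            # look-ahead distance
    alpha = np.arctan2(look[1], look[0])              # heading error to that point
    steering = np.arctan2(2.0 * wheel_base * np.sin(alpha), ld)
    # crude speed target: slow down in tight turns
    desired_speed = max(0.0, target_speed * float(np.cos(alpha)))
    return steering, desired_speed
```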
The task of estimating 3D occupancy from surrounding-view images is an exciting development in the field of autonomous driving, following the success of Bird's Eye View (BEV) perception. This task provides crucial 3D attributes of the driving environment, enhancing the overall understanding and perception of the surrounding space. However, there is still a lack of a baseline that defines the task in terms of network design, optimization, and evaluation. In this work, we present a simple attempt at 3D occupancy estimation: a CNN-based framework designed to reveal several key factors for the task. In addition, we explore the relationship between 3D occupancy estimation and related tasks such as monocular depth estimation, stereo matching, and BEV perception (3D object detection and map segmentation), which could advance the study of 3D occupancy estimation. For evaluation, we propose a simple sampling strategy to define the metric for occupancy evaluation, which is flexible for current public datasets. Moreover, we establish a new benchmark in terms of depth estimation metrics, where we compare our proposed method with monocular depth estimation methods on the DDAD and nuScenes datasets. The relevant code will be available at //github.com/GANWANSHUI/SimpleOccupancy
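On the evaluation side, the depth-based comparison relies on standard monocular-depth metrics of the kind sketched below; the paper's occupancy sampling strategy itself is not reproduced here, and the thresholds and function name are assumptions.

```python
import numpy as np

def depth_metrics(pred, gt, min_depth=0.1, max_depth=80.0):
    """Standard monocular-depth evaluation metrics (Abs Rel, RMSE, delta < 1.25).

    pred, gt: depth maps of the same shape; invalid ground-truth pixels are zero.
    """
    mask = (gt > min_depth) & (gt < max_depth)
    p = np.clip(pred[mask], min_depth, max_depth)
    g = gt[mask]
    abs_rel = np.mean(np.abs(p - g) / g)
    rmse = np.sqrt(np.mean((p - g) ** 2))
    thresh = np.maximum(p / g, g / p)
    a1 = np.mean(thresh < 1.25)
    return {"abs_rel": abs_rel, "rmse": rmse, "delta_1.25": a1}
```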
Safe human-robot collaboration (HRC) has recently gained a lot of interest with the emerging Industry 5.0 paradigm. Conventional robots are being replaced with more intelligent and flexible collaborative robots (cobots). Safe and efficient collaboration between cobots and humans largely relies on the cobot's comprehensive semantic understanding of the dynamic surroundings of industrial environments. Despite its importance for such applications, 3D semantic segmentation of collaborative robot workspaces lacks sufficient research and dedicated datasets; the performance limitation caused by insufficient datasets is known as the 'data hunger' problem. To overcome this limitation, this work develops a new dataset specifically designed for this use case, named "COVERED", which includes point-wise annotated point clouds of a robotic cell. We also provide a benchmark of current state-of-the-art (SOTA) algorithms on the dataset and demonstrate real-time semantic segmentation of a collaborative robot workspace using a multi-LiDAR system. The promising results obtained with the trained deep networks in a real-time, dynamically changing setting show that we are on the right track. Our perception pipeline achieves 20 Hz throughput with a prediction point accuracy of $>$96\% and $>$92\% mean intersection over union (mIoU) while maintaining an 8 Hz throughput.
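The reported mIoU can be computed point-wise from a confusion matrix, as in the following sketch; the class count, ignore label, and function name are assumptions and are not tied to the COVERED tooling.

```python
import numpy as np

def mean_iou(pred_labels, gt_labels, num_classes, ignore_index=-1):
    """Point-wise mean intersection-over-union (mIoU) for semantic segmentation.

    pred_labels, gt_labels: 1-D integer arrays with one label per point,
    class indices in [0, num_classes); points labelled `ignore_index` are skipped.
    """
    valid = gt_labels != ignore_index
    pred, gt = pred_labels[valid], gt_labels[valid]
    # confusion matrix accumulated over all valid points
    conf = np.bincount(num_classes * gt + pred,
                       minlength=num_classes ** 2).reshape(num_classes, num_classes)
    inter = np.diag(conf).astype(float)
    union = conf.sum(0) + conf.sum(1) - np.diag(conf)
    present = union > 0                    # average only over classes that appear
    iou = inter[present] / union[present]
    return iou.mean(), iou
```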
Unsupervised domain adaptation (UDA) addresses the problem of distribution shift between an unlabeled target domain and a labelled source domain. While single-target domain adaptation (STDA) is well studied in both the 2D and 3D vision literature, multi-target domain adaptation (MTDA) is barely explored for 3D data despite its wide real-world applications, such as autonomous driving systems operating under various geographical and climatic conditions. We establish an MTDA baseline for 3D point cloud data by proposing to mix the feature representations from all domains together to achieve better domain adaptation performance via an ensemble average, which we call \emph{{\bf M}ixup {\bf Ens}emble {\bf A}verage} or {\bf \emph{MEnsA}}. With the mixed representation, we use a domain classifier to better distinguish the feature representations of the source domain from those of the target domains in a shared latent space. In extensive empirical validations on the challenging PointDA-10 dataset, we showcase a clear benefit of our simple method over previous unsupervised STDA and MTDA methods by large margins (up to $17.10\%$ and $4.76\%$, averaged over all domain shifts). We make the code publicly available \href{//github.com/sinAshish/MEnsA_mtda}{here}\footnote{\url{//github.com/sinAshish/MEnsA_mtda}}.
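A rough sketch of the mixup-then-average idea in a shared latent space is given below; it is not necessarily the exact MEnsA formulation, and the Beta mixing coefficient, pairwise scheme, and function name are assumptions.

```python
import torch

def mixup_ensemble_average(features, alpha=1.0):
    """Mix per-domain feature batches in a shared latent space and average
    the mixed representations.

    features: list of (B, D) tensors with aligned batch sizes, one per domain
    (source first, then targets); assumes at least two domains.
    """
    base = features[0]                                   # source-domain features
    mixed = []
    for f in features[1:]:
        lam = torch.distributions.Beta(alpha, alpha).sample().item()
        mixed.append(lam * base + (1.0 - lam) * f)       # pairwise mixup in feature space
    # ensemble average of all mixed representations
    return torch.stack(mixed, dim=0).mean(dim=0)
```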
Legged robots have the potential to expand the reach of autonomy beyond paved roads. In this work, we consider the difficult problem of locomotion on challenging terrains using a single forward-facing depth camera. Due to the partial observability of the problem, the robot has to rely on past observations to infer the terrain currently beneath it. To solve this problem, we follow the paradigm in computer vision of explicitly modeling the 3D geometry of the scene and propose Neural Volumetric Memory (NVM), a geometric memory architecture that explicitly accounts for the SE(3) equivariance of the 3D world. NVM aggregates feature volumes from multiple camera views by first bringing them back to the ego-centric frame of the robot. We test the learned visual-locomotion policy on a physical robot and show that our approach, which explicitly introduces geometric priors during training, offers superior performance compared to more na\"ive methods. We also include ablation studies and show that the representations stored in the neural volumetric memory capture sufficient geometric information to reconstruct the scene. Our project page with videos is //rchalyang.github.io/NVM .
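The ego-centric aggregation step can be illustrated with a single SE(3) warp of a feature volume using PyTorch's grid sampling; this is a simplified sketch, not NVM's architecture, and the normalized-coordinate pose convention is an assumption.

```python
import torch
import torch.nn.functional as F

def warp_volume_to_ego(feat_vol, rel_pose):
    """Warp a past feature volume into the current ego-centric frame, the kind of
    operation needed to aggregate multi-view volumes in a geometric memory.

    feat_vol: (B, C, D, H, W) feature volume from a previous step.
    rel_pose: (B, 3, 4) relative SE(3) transform (rotation | translation), expressed
              in normalized volume coordinates, mapping current-frame coordinates
              into the past frame.
    """
    # build a sampling grid of current-frame voxel centres mapped into the past frame
    grid = F.affine_grid(rel_pose, feat_vol.shape, align_corners=False)  # (B, D, H, W, 3)
    # trilinearly sample the past volume at those locations
    return F.grid_sample(feat_vol, grid, mode="bilinear",
                         padding_mode="zeros", align_corners=False)
```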