
Semantic Scene Completion · Transformer · Sparse · 3D · Occlusion

March 25, 2023

VoxFormer: Sparse Voxel Transformer for Camera-based 3D Semantic Scene Completion

Yiming Li,Zhiding Yu,Christopher Choy,Chaowei Xiao,Jose M. Alvarez,Sanja Fidler,Chen Feng,Anima Anandkumar

from arxiv, CVPR 2023 Highlight (10% of accepted papers, 2.5% of submissions)

Humans can easily imagine the complete 3D geometry of occluded objects and scenes. This appealing ability is vital for recognition and understanding. To enable such a capability in AI systems, we propose VoxFormer, a Transformer-based semantic scene completion framework that can output complete 3D volumetric semantics from only 2D images. Our framework adopts a two-stage design where we start from a sparse set of visible and occupied voxel queries from depth estimation, followed by a densification stage that generates dense 3D voxels from the sparse ones. A key idea of this design is that the visual features on 2D images correspond only to the visible scene structures rather than the occluded or empty spaces. Therefore, starting with the featurization and prediction of the visible structures is more reliable. Once we obtain the set of sparse queries, we apply a masked autoencoder design to propagate the information to all the voxels by self-attention. Experiments on SemanticKITTI show that VoxFormer outperforms the state of the art with a relative improvement of 20.0% in geometry and 18.1% in semantics, and reduces GPU memory during training to less than 16 GB. Our code is available at //github.com/NVlabs/VoxFormer.
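The query-proposal idea can be sketched in a few lines: voxels whose depth-derived occupancy estimate passes a threshold become sparse queries carrying image features, and every other voxel starts as a shared mask token for the densification stage. This is a minimal NumPy sketch, not the released VoxFormer code; the `occupancy_prob` grid, the `query_features` callable, the 0.5 threshold, and the single mask token are all illustrative assumptions.

```python
import numpy as np

def build_voxel_tokens(occupancy_prob, query_features, mask_token, threshold=0.5):
    """Assemble the token grid for a masked-autoencoder-style densification stage.

    occupancy_prob: (X, Y, Z) hypothetical per-voxel occupancy estimate derived from depth.
    query_features: hypothetical callable mapping (N, 3) voxel indices to (N, C) features,
        standing in for lifting 2D image features onto visible, occupied voxels.
    mask_token: (C,) shared embedding for every non-query voxel.
    """
    # Stage 1: keep only voxels judged visible and occupied; they become the sparse queries.
    query_idx = np.argwhere(occupancy_prob > threshold)  # (N, 3) integer indices
    X, Y, Z = occupancy_prob.shape
    C = mask_token.shape[0]
    # Every voxel starts as the mask token: occluded or empty space has no image evidence.
    tokens = np.broadcast_to(mask_token, (X, Y, Z, C)).copy()
    if len(query_idx) > 0:
        # Write the featurized queries into their voxel positions.
        tokens[query_idx[:, 0], query_idx[:, 1], query_idx[:, 2]] = query_features(query_idx)
    # Stage 2 (not shown) would run self-attention over all X*Y*Z tokens.
    return tokens, query_idx
```

Only the proposed voxels carry image-derived features; the self-attention stage that would propagate them to the masked voxels is intentionally left out.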

Related Content

Semantic Scene Completion

LIDAR · Instance · Networking · Round · Inference

May 16, 2023

InstaLoc: One-shot Global Lidar Localisation in Indoor Environments through Instance Learning

Lintong Zhang,Tejaswi Digumarti,Georgi Tinchev,Maurice Fallon

from arxiv, This paper is presented at the Robotics: Science and Systems (RSS) 2023

Localization for autonomous robots in prior maps is crucial for their functionality. This paper presents InstaLoc, a solution to this problem for indoor environments that localizes an individual lidar scan within a prior map. We draw inspiration from how humans navigate and position themselves by recognizing the layout of distinctive objects and structures. Mimicking this human approach, InstaLoc identifies object instances in the scene and matches them with those from a prior map. To the best of our knowledge, this is the first method to apply panoptic segmentation directly to 3D lidar scans for indoor localization. InstaLoc uses two networks based on spatially sparse tensors to run inference directly on dense 3D lidar point clouds. The first network is a panoptic segmentation network that produces object instances and their semantic classes. The second, smaller network produces a descriptor for each object instance. A consensus-based matching algorithm then matches the instances to the prior map and estimates a six degrees of freedom (DoF) pose for the input cloud in the prior map. A key strength of InstaLoc is that both of its networks are efficient: it requires only one to two hours of training on a mobile GPU and runs in real time at 1 Hz. Compared with baseline methods, our method achieves two to four times more detections when localizing, and achieves higher precision on these detections.
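The final step turns instance matches into a 6-DoF pose. InstaLoc's consensus-based matcher is not reproduced here; the sketch below only shows a standard closed-form alignment (the Kabsch/Umeyama SVD method, swapped in for illustration) that recovers a rotation and translation once matched instance centroids, a hypothetical input, are available.

```python
import numpy as np

def estimate_pose_from_matches(scan_centroids, map_centroids):
    """Least-squares rigid transform (R, t) with map ~= R @ scan + t (Kabsch/Umeyama).

    Both inputs are (N, 3) arrays of matched instance centroids, N >= 3, not collinear.
    """
    mu_s = scan_centroids.mean(axis=0)
    mu_m = map_centroids.mean(axis=0)
    # Cross-covariance of the centered point sets.
    H = (scan_centroids - mu_s).T @ (map_centroids - mu_m)
    U, _, Vt = np.linalg.svd(H)
    # Reflection guard: force det(R) = +1 so R is a proper rotation.
    d = 1.0 if np.linalg.det(Vt.T @ U.T) >= 0 else -1.0
    D = np.diag([1.0, 1.0, d])
    R = Vt.T @ D @ U.T
    t = mu_m - R @ mu_s
    return R, t
```

In a full pipeline, a consensus step (for example, RANSAC over candidate instance matches) would run first so that a few wrong matches do not corrupt this least-squares fit.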

Correlation Coefficient · Pyramid · Networking · Object Tracking · Attention

May 16, 2023

Correlation Pyramid Network for 3D Single Object Tracking

Mengmeng Wang,Teli Ma,Xingxing Zuo,Jiajun Lv,Yong Liu

3D LiDAR-based single object tracking (SOT) has gained increasing attention as it plays a crucial role in 3D applications such as autonomous driving. The central problem is how to learn a target-aware representation from sparse and incomplete point clouds. In this paper, we propose a novel Correlation Pyramid Network (CorpNet) with a unified encoder and a motion-factorized decoder. Specifically, the encoder introduces multi-level self-attention and cross-attention in its main branch to enrich the template and search-region features and to realize their fusion and interaction, respectively. Additionally, considering the sparsity of point clouds, we design a lateral correlation pyramid structure for the encoder that keeps as many points as possible by integrating hierarchical correlated features. The search-region features output by the encoder can be fed directly into the decoder to predict target locations without any extra matcher. Moreover, in the decoder of CorpNet, we design a motion-factorized head to explicitly and jointly learn the different movement patterns along the up axis and in the x-y plane. Extensive experiments on two commonly used datasets show that CorpNet achieves state-of-the-art results while running in real time.
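To make the fusion step concrete: in a cross-attention layer, each search-region point queries the template points, so the output search features carry target-specific information. This is a generic single-head sketch with hypothetical random projections, not CorpNet's multi-level encoder or its lateral correlation pyramid.

```python
import numpy as np

def softmax(x, axis=-1):
    # Subtract the row max for numerical stability before exponentiating.
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def cross_attention(search_feats, template_feats, Wq, Wk, Wv):
    """Single-head scaled dot-product cross-attention.

    search_feats: (Ns, C) search-region point features (queries).
    template_feats: (Nt, C) template point features (keys and values).
    Wq, Wk, Wv: (C, D) projections (random here, learned in practice).
    Returns (Ns, D) template-aware search features and the (Ns, Nt) attention map.
    """
    Q = search_feats @ Wq
    K = template_feats @ Wk
    V = template_feats @ Wv
    scores = Q @ K.T / np.sqrt(K.shape[-1])
    # Each search point distributes a unit of attention over the template points.
    attn = softmax(scores, axis=-1)
    return attn @ V, attn
```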

Point Cloud · Learning · Mask · Normalized · Curvature

May 15, 2023

GeoMAE: Masked Geometric Target Prediction for Self-supervised Point Cloud Pre-Training

Xiaoyu Tian,Haoxi Ran,Yue Wang,Hang Zhao

from arxiv, Accepted to CVPR 2023

This paper addresses a fundamental question in point cloud self-supervised learning: what is a good signal for learning features from point clouds without annotations? To answer it, we introduce a point cloud representation learning framework based on geometric feature reconstruction. In contrast to recent papers that directly adopt the masked autoencoder (MAE) and only predict original coordinates or occupancy from masked point clouds, our method revisits the differences between images and point clouds and identifies three self-supervised learning objectives peculiar to point clouds, namely centroid prediction, normal estimation, and curvature prediction. Combined with occupancy prediction, these four objectives yield a nontrivial self-supervised learning task and mutually help models reason better about the fine-grained geometry of point clouds. Our pipeline is conceptually simple and consists of two major steps: first, it randomly masks out groups of points, followed by a Transformer-based point cloud encoder; second, a lightweight Transformer decoder predicts the centroid, normal, and curvature for points in each voxel. We transfer the pre-trained Transformer encoder to a downstream perception model. On the nuScenes dataset, our model achieves a 3.38 mAP improvement for object detection, a 2.1 mIoU gain for segmentation, and a 1.7 AMOTA gain for multi-object tracking. We also conduct experiments on the Waymo Open Dataset and achieve significant performance improvements over baselines as well.
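The three point-cloud-specific targets can be computed from the raw points of each voxel. A minimal sketch, assuming the targets are the mean position, the PCA normal, and the common surface-variation curvature; the paper's exact target definitions and normalization may differ.

```python
import numpy as np

def geometric_targets(points):
    """Centroid, unit normal, and curvature for one voxel's points, shape (N, 3), N >= 3.

    Curvature uses the surface-variation definition lambda_min / (l0 + l1 + l2),
    an assumption about how the target is defined.
    """
    centroid = points.mean(axis=0)
    centered = points - centroid
    cov = centered.T @ centered / len(points)
    # eigh returns the eigenvalues of a symmetric matrix in ascending order.
    eigvals, eigvecs = np.linalg.eigh(cov)
    normal = eigvecs[:, 0]  # direction of least variance
    total = eigvals.sum()
    curvature = eigvals[0] / total if total > 0 else 0.0
    return centroid, normal, curvature
```

PCA fixes the normal only up to sign, so a loss on it would need to compare normals up to sign or use a consistent orientation.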

TR · Graph · Learning · General Dynamics · MoDELS

May 15, 2023

Cross-Modality Time-Variant Relation Learning for Generating Dynamic Scene Graphs

Jingyi Wang,Jinfa Huang,Can Zhang,Zhidong Deng

from arxiv, Preprint. Accepted by ICRA 2023

Dynamic scene graphs generated from video clips could help enhance semantic visual understanding in a wide range of challenging tasks, such as environmental perception, autonomous navigation, and task planning for self-driving vehicles and mobile robots. In the temporal and spatial modeling involved in dynamic scene graph generation, learning the time-variant relations among frames is particularly difficult. In this paper, we propose a Time-variant Relation-aware TRansformer (TR$^2$), which aims to model the temporal change of relations in dynamic scene graphs. Explicitly, we leverage the difference between text embeddings of prompted sentences about relation labels as the supervision signal for relations. In this way, cross-modality feature guidance is realized for learning time-variant relations. Implicitly, we design a relation feature fusion module with a transformer and an additional message token that describes the difference between adjacent frames. Extensive experiments on the Action Genome dataset show that TR$^2$ can effectively model time-variant relations. TR$^2$ significantly outperforms previous state-of-the-art methods under two different settings, by 2.1% and 2.6%, respectively.
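The explicit supervision idea, using the change in text embeddings between adjacent frames as a target for the change in relation features, can be sketched as a simple loss. The squared-error form, and the assumption that relation features are already projected into the text-embedding space, are illustrative choices rather than the paper's stated loss; the text embeddings would come from some pretrained text encoder, which is hypothetical here.

```python
import numpy as np

def relation_change_loss(rel_prev, rel_curr, text_prev, text_curr):
    """Squared error between the predicted and text-derived relation change.

    rel_prev, rel_curr: (D,) relation features for the same subject-object pair in
        adjacent frames (assumed projected to the text-embedding dimension D).
    text_prev, text_curr: (D,) embeddings of prompted sentences for the relation labels.
    """
    pred_delta = rel_curr - rel_prev
    target_delta = text_curr - text_prev
    # An unchanged relation gives a zero target, so the model is pushed toward no change.
    return float(np.mean((pred_delta - target_delta) ** 2))
```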

Estimation/Estimator · Robotics · Round · FAST · Continuity

May 15, 2023

Fast Traversability Estimation for Wild Visual Navigation

Jonas Frey,Matias Mattamala,Nived Chebrolu,Cesar Cadena,Maurice Fallon,Marco Hutter

from arxiv, Accepted for Robotics: Science and Systems 2023

Natural environments such as forests and grasslands are challenging for robotic navigation because high grass, twigs, or bushes are falsely perceived as rigid obstacles. In this work, we propose Wild Visual Navigation (WVN), an online self-supervised learning system for traversability estimation that uses only vision. The system can continuously adapt from a short human demonstration in the field. It leverages high-dimensional features from self-supervised visual transformer models, with an online supervision-generation scheme that runs in real time on the robot. We demonstrate the advantages of our approach with experiments and ablation studies in challenging environments in forests, parks, and grasslands. Our system can bootstrap traversable-terrain segmentation in less than 5 min of in-field training time, enabling the robot to navigate complex outdoor terrain: negotiating obstacles in high grass as well as following a 1.4 km footpath. While our experiments were executed with a quadruped robot, ANYmal, the approach presented can generalize to any ground robot.
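Online self-supervised adaptation amounts to repeatedly updating a small classifier on frozen features with labels generated while the robot moves. A minimal sketch, assuming a logistic-regression head (a hypothetical stand-in for the small network WVN trains) and binary labels such as 1 for patches the robot actually walked over; how WVN treats patches the robot never traversed is not reproduced.

```python
import numpy as np

class OnlineTraversabilityClassifier:
    """Logistic-regression head updated one mini-batch at a time on frozen features."""

    def __init__(self, dim, lr=0.1):
        self.w = np.zeros(dim)
        self.b = 0.0
        self.lr = lr

    def predict(self, feats):
        # Traversability score in (0, 1) for each row of an (N, dim) feature array.
        return 1.0 / (1.0 + np.exp(-(feats @ self.w + self.b)))

    def update(self, feats, labels):
        # One gradient step on mean binary cross-entropy with self-supervised labels.
        err = self.predict(feats) - labels
        self.w -= self.lr * (feats.T @ err) / len(labels)
        self.b -= self.lr * err.mean()
```

Keeping the feature extractor frozen and training only a light head is what makes per-frame updates cheap enough to run on the robot.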

Estimation/Estimator · Shape · 3D · Optimizer · MoDELS

May 15, 2023

Optimal and Robust Category-level Perception: Object Pose and Shape Estimation from 2D and 3D Semantic Keypoints

Jingnan Shi,Heng Yang,Luca Carlone

from arxiv, arXiv admin note: text overlap with arXiv:2104.08383

We consider a category-level perception problem, where one is given 2D or 3D sensor data picturing an object of a given category (e.g., a car), and has to reconstruct the 3D pose and shape of the object despite intra-class variability (i.e., different car models have different shapes). We consider an active shape model, where, for an object category, we are given a library of potential CAD models describing objects in that category, and we adopt a standard formulation where pose and shape are estimated from 2D or 3D keypoints via non-convex optimization. Our first contribution is to develop PACE3D* and PACE2D*, the first certifiably optimal solvers for pose and shape estimation using 3D and 2D keypoints, respectively. Both solvers rely on the design of tight (i.e., exact) semidefinite relaxations. Our second contribution is to develop outlier-robust versions of both solvers, named PACE3D# and PACE2D#. Towards this goal, we propose ROBIN, a general graph-theoretic framework to prune outliers, which uses compatibility hypergraphs to model measurements' compatibility. We show that in category-level perception problems these hypergraphs can be built from the winding orders of the keypoints (in 2D) or their convex hulls (in 3D), and many outliers can be filtered out via maximum hyperclique computation. The last contribution is an extensive experimental evaluation. Besides providing an ablation study on simulated datasets and on the PASCAL3D+ dataset, we combine our solver with a deep keypoint detector, and show that PACE3D# improves over the state of the art in vehicle pose estimation on the ApolloScape datasets, and its runtime is compatible with practical applications. We release our code at //github.com/MIT-SPARK/PACE.
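In the 3D case, the active-shape-model formulation can be written as follows, assuming $N$ measured keypoints $\mathbf{y}_i$, a library of $K$ CAD models whose $i$-th keypoint is $\mathbf{b}_{k,i}$, and shape coefficients $\mathbf{c}$ constrained to the probability simplex. This is a sketch of the standard form; the regularization and exact constraints used by PACE3D* may differ.

```latex
\min_{\mathbf{R}\in\mathrm{SO}(3),\ \mathbf{t}\in\mathbb{R}^{3},\ \mathbf{c}\in\mathbb{R}^{K}}
\ \sum_{i=1}^{N}\Big\|\mathbf{y}_i-\mathbf{R}\sum_{k=1}^{K}c_k\,\mathbf{b}_{k,i}-\mathbf{t}\Big\|_2^2
\quad\text{s.t.}\quad \sum_{k=1}^{K}c_k=1,\qquad c_k\ge 0\ \ (k=1,\dots,K)
```

The problem is non-convex because $\mathbf{R}$ is restricted to $\mathrm{SO}(3)$ and multiplies the unknown shape $\sum_k c_k\mathbf{b}_{k,i}$; a tight semidefinite relaxation is what allows a solver to certify that its solution is the global optimum.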

Pyramid · Performer · Transformer · Backbone · Inference

May 14, 2023

Pyramid Fusion Transformer for Semantic Segmentation

Zipeng Qin,Jianbo Liu,Xiaolin Zhang,Maoqing Tian,Aojun Zhou,Shuai Yi,Shaoan Qi,Hongsheng Li

The recently proposed MaskFormer gives a fresh perspective on the task of semantic segmentation: it shifts from the popular pixel-level classification paradigm to a mask-level classification method. In essence, it generates paired probabilities and masks corresponding to category segments and combines them during inference to produce the segmentation maps. In our study, we find that a per-mask classification decoder on top of a single-scale feature is not effective enough to extract reliable probabilities or masks. To mine rich semantic information across the feature pyramid, we propose the Pyramid Fusion Transformer (PFT), a transformer-based model for per-mask semantic segmentation with multi-scale features. The proposed transformer decoder performs cross-attention between the learnable queries and each spatial feature from the feature pyramid in parallel, and uses cross-scale inter-query attention to exchange complementary information. We achieve competitive performance on three widely used semantic segmentation datasets. In particular, on the ADE20K validation set, our result with a Swin-B backbone surpasses MaskFormer's result with a much larger Swin-L backbone in both single-scale and multi-scale inference, achieving 54.1 mIoU and 55.7 mIoU, respectively. Using a Swin-L backbone, we achieve 56.1 mIoU (single-scale) and 57.4 mIoU (multi-scale), obtaining state-of-the-art performance on the dataset. Extensive experiments on three widely used semantic segmentation datasets verify the effectiveness of our proposed method.
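The mask-level classification paradigm that PFT builds on combines its two outputs at inference time by marginalizing over queries. A minimal sketch of that combination step, following MaskFormer-style semantic inference; array shapes and names are illustrative assumptions, and PFT's multi-scale decoder that produces these predictions is not shown.

```python
import numpy as np

def semantic_inference(class_probs, mask_probs):
    """Per-pixel semantic labels from mask-level predictions.

    class_probs: (Q, K) per-query class probabilities (no-object column already dropped).
    mask_probs:  (Q, H, W) per-query mask probabilities (sigmoid outputs).
    Returns an (H, W) array of class indices.
    """
    # Score of class k at pixel (h, w): sum over queries q of p_q(k) * m_q(h, w).
    scores = np.einsum("qk,qhw->khw", class_probs, mask_probs)
    return scores.argmax(axis=0)
```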

SLAM · Estimation/Estimator · Performance · Robotics · MoDELS

May 12, 2023

An Object SLAM Framework for Association, Mapping, and High-Level Tasks

Yanmin Wu,Yunzhou Zhang,Delong Zhu,Zhiqiang Deng,Wenkai Sun,Xin Chen,Jian Zhang

from arxiv, Accepted by IEEE Transactions on Robotics (T-RO)

Object SLAM is considered increasingly significant for high-level robot perception and decision-making. Existing studies fall short in data association, object representation, and semantic mapping, and frequently rely on additional assumptions that limit their performance. In this paper, we present a comprehensive object SLAM framework that focuses on object-based perception and object-oriented robot tasks. First, we propose an ensemble data association approach that associates objects in complicated conditions by incorporating parametric and nonparametric statistical testing. In addition, we propose an outlier-robust centroid and scale estimation algorithm for modeling objects based on the iForest and line alignment. A lightweight, object-oriented map is then represented by the estimated general object models. Taking into consideration the semantic invariance of objects, we convert the object map into a topological map that provides semantic descriptors to enable multi-map matching. Finally, we propose an object-driven active exploration strategy to achieve autonomous mapping in the grasping scenario. A range of public datasets and real-world results in mapping, augmented reality, scene matching, relocalization, and robotic manipulation are used to evaluate the proposed object SLAM framework and demonstrate its efficient performance.
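Outlier-robust centroid and scale estimation can be illustrated with a simpler robust filter. In this minimal sketch, a per-axis median/MAD rule is swapped in for the isolation-forest (iForest) step described in the paper, and the line-alignment step that estimates orientation is not reproduced, so the resulting box is axis-aligned; the MAD multiplier `k` is an assumed tuning value.

```python
import numpy as np

def robust_centroid_and_scale(points, k=3.0):
    """Centroid and half-extents of an object's points, shape (N, 3), after outlier removal."""
    med = np.median(points, axis=0)
    mad = np.median(np.abs(points - med), axis=0)
    mad = np.where(mad > 0, mad, 1e-9)  # avoid division by zero on degenerate axes
    # 1.4826 scales the MAD to a standard-deviation estimate under Gaussian noise.
    z = np.abs(points - med) / (1.4826 * mad)
    inliers = points[np.all(z <= k, axis=1)]
    lo, hi = inliers.min(axis=0), inliers.max(axis=0)
    centroid = (lo + hi) / 2.0
    half_extent = (hi - lo) / 2.0
    return centroid, half_extent, inliers
```

A single stray point far from the object would stretch a plain min/max box arbitrarily; the robust filter removes it before the extent is measured.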

Neural Field · Latent · Hierarchical Latent Diffusion Model · Diffusion Model · Model Generation

April 19, 2023

NeuralField-LDM: Scene Generation with Hierarchical Latent Diffusion Models

Seung Wook Kim,Bradley Brown,Kangxue Yin,Karsten Kreis,Katja Schwarz,Daiqing Li,Robin Rombach,Antonio Torralba,Sanja Fidler

from arxiv, CVPR 2023

Automatically generating high-quality real world 3D scenes is of enormous interest for applications such as virtual reality and robotics simulation. Towards this goal, we introduce NeuralField-LDM, a generative model capable of synthesizing complex 3D environments. We leverage Latent Diffusion Models that have been successfully utilized for efficient high-quality 2D content creation. We first train a scene auto-encoder to express a set of image and pose pairs as a neural field, represented as density and feature voxel grids that can be projected to produce novel views of the scene. To further compress this representation, we train a latent-autoencoder that maps the voxel grids to a set of latent representations. A hierarchical diffusion model is then fit to the latents to complete the scene generation pipeline. We achieve a substantial improvement over existing state-of-the-art scene generation models. Additionally, we show how NeuralField-LDM can be used for a variety of 3D content creation applications, including conditional scene generation, scene inpainting and scene style manipulation.
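Each level of a latent diffusion model is trained with a denoising objective on the compressed latents. A minimal sketch of the forward (noising) step, assuming the usual DDPM formulation with a linear noise schedule; the paper's hierarchical latents, schedule, and conditioning are not reproduced, and `z0` is a hypothetical latent tensor.

```python
import numpy as np

def linear_alpha_bars(num_steps=1000, beta_start=1e-4, beta_end=0.02):
    # Cumulative product of (1 - beta_t) for a linear beta schedule (a common DDPM default).
    betas = np.linspace(beta_start, beta_end, num_steps)
    return np.cumprod(1.0 - betas)

def noise_latent(z0, t, alpha_bars, rng):
    """Sample z_t ~ q(z_t | z_0) = N(sqrt(abar_t) * z_0, (1 - abar_t) * I).

    A denoiser would be trained to predict `eps` from (z_t, t); generation then runs
    the learned reverse process from pure noise to a latent that is decoded back
    into voxel grids.
    """
    eps = rng.standard_normal(z0.shape)
    a = alpha_bars[t]
    return np.sqrt(a) * z0 + np.sqrt(1.0 - a) * eps, eps
```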

Branch · Supervision · Extensibility · Performer · Semantic Similarity

December 16, 2021

Activation Modulation and Recalibration Scheme for Weakly Supervised Semantic Segmentation

Jie Qin,Jie Wu,Xuefeng Xiao,Lujun Li,Xingang Wang

from arxiv, Accepted by AAAI 2022

Image-level weakly supervised semantic segmentation (WSSS) is a fundamental yet challenging computer vision task that facilitates scene understanding and autonomous driving. Most existing methods rely on classification-based Class Activation Maps (CAMs) as the initial pseudo labels, which tend to focus on the discriminative image regions and lack characteristics customized for the segmentation task. To alleviate this issue, we propose a novel activation modulation and recalibration (AMR) scheme, which leverages a spotlight branch and a compensation branch to obtain weighted CAMs that can provide recalibration supervision and task-specific concepts. Specifically, an attention modulation module (AMM) is employed to rearrange the distribution of feature importance from the channel-spatial sequential perspective, which helps to explicitly model channel-wise interdependencies and spatial encodings to adaptively modulate segmentation-oriented activation responses. Furthermore, we introduce cross pseudo supervision for the dual branches, which can be regarded as a semantic similarity regularization that mutually refines the two branches. Extensive experiments show that AMR establishes new state-of-the-art performance on the PASCAL VOC 2012 dataset, surpassing not only current methods trained with image-level supervision but also some methods relying on stronger supervision, such as saliency labels. Experiments also reveal that our scheme is plug-and-play and can be incorporated into other approaches to boost their performance.
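As background for the pseudo labels AMR starts from, a CAM weights a classifier's final feature maps by the class weights of its linear layer. A minimal sketch of that standard computation; AMR's spotlight and compensation branches and its attention modulation module are not reproduced, and the array shapes are assumptions.

```python
import numpy as np

def class_activation_maps(features, fc_weights):
    """Class Activation Maps from a classifier's last convolutional features.

    features:   (C, H, W) feature maps before global average pooling.
    fc_weights: (K, C) weights of the classification layer.
    Returns (K, H, W) maps, ReLU-ed and max-normalized to [0, 1] per class.
    """
    cams = np.einsum("kc,chw->khw", fc_weights, features)
    cams = np.maximum(cams, 0.0)  # keep only positive evidence for each class
    peak = cams.reshape(cams.shape[0], -1).max(axis=1)
    return cams / np.maximum(peak, 1e-8)[:, None, None]
```

Thresholding these normalized maps gives the kind of initial pseudo labels that, as the abstract notes, concentrate on the most discriminative regions.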
