亚洲黄色网站不卡免费,国产一区二区日韩欧美在线,97无码久久久久中文字幕精品

This paper introduces an Online Localisation and Colored Mesh Reconstruction (OLCMR) ROS perception architecture for ground exploration robots aiming to perform robust Simultaneous Localisation And Mapping (SLAM) in challenging unknown environments and provide an associated colored 3D mesh representation in real time. It is intended to be used by a remote human operator to easily visualise the mapped environment during or after the mission or as a development base for further researches in the field of exploration robotics. The architecture is mainly composed of carefully-selected open-source ROS implementations of a LiDAR-based SLAM algorithm alongside a colored surface reconstruction procedure using a point cloud and RGB camera images projected into the 3D space. The overall performances are evaluated on the Newer College handheld LiDAR-Vision reference dataset and on two experimental trajectories gathered on board of representative wheeled robots in respectively urban and countryside outdoor environments. Index Terms: Field Robots, Mapping, SLAM, Colored Surface Reconstruction

相關內容

Color

關注 4

3D · 規范化的 · INFORMS · state-of-the-art · ASSETS ·

2022 年 9 月 19 日

BareSkinNet: De-makeup and De-lighting via 3D Face Reconstruction

Xingchao Yang,Takafumi Taketomi

from arxiv, accepted at PG2022

We propose BareSkinNet, a novel method that simultaneously removes makeup and lighting influences from the face image. Our method leverages a 3D morphable model and does not require a reference clean face image or a specified light condition. By combining the process of 3D face reconstruction, we can easily obtain 3D geometry and coarse 3D textures. Using this information, we can infer normalized 3D face texture maps (diffuse, normal, roughness, and specular) by an image-translation network. Consequently, reconstructed 3D face textures without undesirable information will significantly benefit subsequent processes, such as re-lighting or re-makeup. In experiments, we show that BareSkinNet outperforms state-of-the-art makeup removal methods. In addition, our method is remarkably helpful in removing makeup to generate consistent high-fidelity texture maps, which makes it extendable to many realistic face generation applications. It can also automatically build graphic assets of face makeup images before and after with corresponding 3D data. This will assist artists in accelerating their work, such as 3D makeup avatar creation.

估計/估計量 · INFORMS · LIDAR · INTERACT · Performer ·

2022 年 9 月 19 日

SurroundDepth: Entangling Surrounding Views for Self-Supervised Multi-Camera Depth Estimation

Yi Wei,Linqing Zhao,Wenzhao Zheng,Zheng Zhu,Yongming Rao,Guan Huang,Jiwen Lu,Jie Zhou

from arxiv, Accepted to CoRL 2022. Project page: //surrounddepth.ivg-research.xyz Code: //github.com/weiyithu/SurroundDepth

Depth estimation from images serves as the fundamental step of 3D perception for autonomous driving and is an economical alternative to expensive depth sensors like LiDAR. The temporal photometric constraints enables self-supervised depth estimation without labels, further facilitating its application. However, most existing methods predict the depth solely based on each monocular image and ignore the correlations among multiple surrounding cameras, which are typically available for modern self-driving vehicles. In this paper, we propose a SurroundDepth method to incorporate the information from multiple surrounding views to predict depth maps across cameras. Specifically, we employ a joint network to process all the surrounding views and propose a cross-view transformer to effectively fuse the information from multiple views. We apply cross-view self-attention to efficiently enable the global interactions between multi-camera feature maps. Different from self-supervised monocular depth estimation, we are able to predict real-world scales given multi-camera extrinsic matrices. To achieve this goal, we adopt the two-frame structure-from-motion to extract scale-aware pseudo depths to pretrain the models. Further, instead of predicting the ego-motion of each individual camera, we estimate a universal ego-motion of the vehicle and transfer it to each view to achieve multi-view ego-motion consistency. In experiments, our method achieves the state-of-the-art performance on the challenging multi-camera depth estimation datasets DDAD and nuScenes.

標注 · state-of-the-art · 未標記 · 可約的 · 數據集 ·

2022 年 9 月 19 日

SLRNet: Semi-Supervised Semantic Segmentation Via Label Reuse for Human Decomposition Images

Sara Mousavi,Zhenning Yang,Kelley Cross,Dawnie Steadman,Audris Mockus

Semantic segmentation is a challenging computer vision task demanding a significant amount of pixel-level annotated data. Producing such data is a time-consuming and costly process, especially for domains with a scarcity of experts, such as medicine or forensic anthropology. While numerous semi-supervised approaches have been developed to make the most from the limited labeled data and ample amount of unlabeled data, domain-specific real-world datasets often have characteristics that both reduce the effectiveness of off-the-shelf state-of-the-art methods and also provide opportunities to create new methods that exploit these characteristics. We propose and evaluate a semi-supervised method that reuses available labels for unlabeled images of a dataset by exploiting existing similarities, while dynamically weighting the impact of these reused labels in the training process. We evaluate our method on a large dataset of human decomposition images and find that our method, while conceptually simple, outperforms state-of-the-art consistency and pseudo-labeling-based methods for the segmentation of this dataset. This paper includes graphic content of human decomposition.

Neural Networks · Networking · state-of-the-art · Vision · MoDELS ·

2022 年 9 月 18 日

StereoVoxelNet: Real-Time Obstacle Detection Based on Occupancy Voxels from a Stereo Camera Using Deep Neural Networks

Hongyu Li,Zhengang Li,Neset Unver Akmandor,Huaizu Jiang,Yanzhi Wang,Taskin Padir

Obstacle detection is a safety-critical problem in robot navigation, where stereo matching is a popular vision-based approach. While deep neural networks have shown impressive results in computer vision, most of the previous obstacle detection works only leverage traditional stereo matching techniques to meet the computational constraints for real-time feedback. This paper proposes a computationally efficient method that leverages a deep neural network to detect occupancy from stereo images directly. Instead of learning the point cloud correspondence from the stereo data, our approach extracts the compact obstacle distribution based on volumetric representations. In addition, we prune the computation of safety irrelevant spaces in a coarse-to-fine manner based on octrees generated by the decoder. As a result, we achieve real-time performance on the onboard computer (NVIDIA Jetson TX2). Our approach detects obstacles accurately in the range of 32 meters and achieves better IoU (Intersection over Union) and CD (Chamfer Distance) scores with only 2% of the computation cost of the state-of-the-art stereo model. Furthermore, we validate our method's robustness and real-world feasibility through autonomous navigation experiments with a real robot. Hence, our work contributes toward closing the gap between the stereo-based system in robot perception and state-of-the-art stereo models in computer vision. To counter the scarcity of high-quality real-world indoor stereo datasets, we collect a 1.36 hours stereo dataset with a Jackal robot which is used to fine-tune our model. The dataset, the code, and more visualizations are available at //lhy.xyz/stereovoxelnet/

INFORMS · 塑造 · 損失 · 3D · 三維重建 ·

2022 年 9 月 16 日

Capturing Shape Information with Multi-Scale Topological Loss Terms for 3D Reconstruction

Dominik J. E. Waibel,Scott Atwell,Matthias Meier,Carsten Marr,Bastian Rieck

from arxiv, Accepted at the 25th International Conference on Medical Image Computing and Computer Assisted Intervention (MICCAI)

Reconstructing 3D objects from 2D images is both challenging for our brains and machine learning algorithms. To support this spatial reasoning task, contextual information about the overall shape of an object is critical. However, such information is not captured by established loss terms (e.g. Dice loss). We propose to complement geometrical shape information by including multi-scale topological features, such as connected components, cycles, and voids, in the reconstruction loss. Our method uses cubical complexes to calculate topological features of 3D volume data and employs an optimal transport distance to guide the reconstruction process. This topology-aware loss is fully differentiable, computationally efficient, and can be added to any neural network. We demonstrate the utility of our loss by incorporating it into SHAPR, a model for predicting the 3D cell shape of individual cells based on 2D microscopy images. Using a hybrid loss that leverages both geometrical and topological information of single objects to assess their shape, we find that topological information substantially improves the quality of reconstructions, thus highlighting its ability to extract more relevant features from image datasets.

Networking · Neural Networks · Extensibility · 機器人 · 可交換的 ·

2022 年 9 月 16 日

A Robotic Visual Grasping Design: Rethinking Convolution Neural Network with High-Resolutions

Zhangli Zhou,Shaochen Wang,Ziyang Chen,Mingyu Cai,Zhen Kan

High-resolution representations are important for vision-based robotic grasping problems. Existing works generally encode the input images into low-resolution representations via sub-networks and then recover high-resolution representations. This will lose spatial information, and errors introduced by the decoder will be more serious when multiple types of objects are considered or objects are far away from the camera. To address these issues, we revisit the design paradigm of CNN for robotic perception tasks. We demonstrate that using parallel branches as opposed to serial stacked convolutional layers will be a more powerful design for robotic visual grasping tasks. In particular, guidelines of neural network design are provided for robotic perception tasks, e.g., high-resolution representation and lightweight design, which respond to the challenges in different manipulation scenarios. We then develop a novel grasping visual architecture referred to as HRG-Net, a parallel-branch structure that always maintains a high-resolution representation and repeatedly exchanges information across resolutions. Extensive experiments validate that these two designs can effectively enhance the accuracy of visual-based grasping and accelerate network training. We show a series of comparative experiments in real physical environments at Youtube: //youtu.be/Jhlsp-xzHFY.

Learning · 不變 · 無監督 · contrastive · Less ·

2022 年 9 月 15 日

Delving into Inter-Image Invariance for Unsupervised Visual Representations

Jiahao Xie,Xiaohang Zhan,Ziwei Liu,Yew Soon Ong,Chen Change Loy

from arxiv, International Journal of Computer Vision (IJCV), 2022

Contrastive learning has recently shown immense potential in unsupervised visual representation learning. Existing studies in this track mainly focus on intra-image invariance learning. The learning typically uses rich intra-image transformations to construct positive pairs and then maximizes agreement using a contrastive loss. The merits of inter-image invariance, conversely, remain much less explored. One major obstacle to exploit inter-image invariance is that it is unclear how to reliably construct inter-image positive pairs, and further derive effective supervision from them since no pair annotations are available. In this work, we present a comprehensive empirical study to better understand the role of inter-image invariance learning from three main constituting components: pseudo-label maintenance, sampling strategy, and decision boundary design. To facilitate the study, we introduce a unified and generic framework that supports the integration of unsupervised intra- and inter-image invariance learning. Through carefully-designed comparisons and analysis, multiple valuable observations are revealed: 1) online labels converge faster and perform better than offline labels; 2) semi-hard negative samples are more reliable and unbiased than hard negative samples; 3) a less stringent decision boundary is more favorable for inter-image invariance learning. With all the obtained recipes, our final model, namely InterCLR, shows consistent improvements over state-of-the-art intra-image invariance learning methods on multiple standard benchmarks. We hope this work will provide useful experience for devising effective unsupervised inter-image invariance learning. Code: //github.com/open-mmlab/mmselfsup.

LIDAR · Projection · Unstructured · 3D · 目標檢測 ·

2022 年 9 月 15 日

FFPA-Net: Efficient Feature Fusion with Projection Awareness for 3D Object Detection

Chaokang Jiang,Guangming Wang,Jinxing Wu,Yanzi Miao,Hesheng Wang

from arxiv, 7 pages, 4 figures; under review

Promising complementarity exists between the texture features of color images and the geometric information of LiDAR point clouds. However, there still present many challenges for efficient and robust feature fusion in the field of 3D object detection. In this paper, first, unstructured 3D point clouds are filled in the 2D plane and 3D point cloud features are extracted faster using projection-aware convolution layers. Further, the corresponding indexes between different sensor signals are established in advance in the data preprocessing, which enables faster cross-modal feature fusion. To address LiDAR points and image pixels misalignment problems, two new plug-and-play fusion modules, LiCamFuse and BiLiCamFuse, are proposed. In LiCamFuse, soft query weights with perceiving the Euclidean distance of bimodal features are proposed. In BiLiCamFuse, the fusion module with dual attention is proposed to deeply correlate the geometric and textural features of the scene. The quantitative results on the KITTI dataset demonstrate that the proposed method achieves better feature-level fusion. In addition, the proposed network shows a shorter running time compared to existing methods.

Learning · 可約的 · Obvious · 損失函數（機器學習） · 正則化項 ·

2022 年 9 月 15 日

DevNet: Self-supervised Monocular Depth Learning via Density Volume Construction

Kaichen Zhou,Lanqing Hong,Changhao Chen,Hang Xu,Chaoqiang Ye,Qingyong Hu,Zhenguo Li

from arxiv, Accepted by European Conference on Computer Vision 2022 (ECCV2022)

Self-supervised depth learning from monocular images normally relies on the 2D pixel-wise photometric relation between temporally adjacent image frames. However, they neither fully exploit the 3D point-wise geometric correspondences, nor effectively tackle the ambiguities in the photometric warping caused by occlusions or illumination inconsistency. To address these problems, this work proposes Density Volume Construction Network (DevNet), a novel self-supervised monocular depth learning framework, that can consider 3D spatial information, and exploit stronger geometric constraints among adjacent camera frustums. Instead of directly regressing the pixel value from a single image, our DevNet divides the camera frustum into multiple parallel planes and predicts the pointwise occlusion probability density on each plane. The final depth map is generated by integrating the density along corresponding rays. During the training process, novel regularization strategies and loss functions are introduced to mitigate photometric ambiguities and overfitting. Without obviously enlarging model parameters size or running time, DevNet outperforms several representative baselines on both the KITTI-2015 outdoor dataset and NYU-V2 indoor dataset. In particular, the root-mean-square-deviation is reduced by around 4% with DevNet on both KITTI-2015 and NYU-V2 in the task of depth estimation. Code is available at //github.com/gitkaichenzhou/DevNet.

可理解性 · 邊界框 · RGB-D · 相互獨立的 · 3D ·

2020 年 2 月 27 日

Total3DUnderstanding: Joint Layout, Object Pose and Mesh Reconstruction for Indoor Scenes from a Single Image

Yinyu Nie,Xiaoguang Han,Shihui Guo,Yujian Zheng,Jian Chang,Jian Jun Zhang

from arxiv, Accepted by CVPR 2020

Semantic reconstruction of indoor scenes refers to both scene understanding and object reconstruction. Existing works either address one part of this problem or focus on independent objects. In this paper, we bridge the gap between understanding and reconstruction, and propose an end-to-end solution to jointly reconstruct room layout, object bounding boxes and meshes from a single image. Instead of separately resolving scene understanding and object reconstruction, our method builds upon a holistic scene context and proposes a coarse-to-fine hierarchy with three components: 1. room layout with camera pose; 2. 3D object bounding boxes; 3. object meshes. We argue that understanding the context of each component can assist the task of parsing the others, which enables joint understanding and reconstruction. The experiments on the SUN RGB-D and Pix3D datasets demonstrate that our method consistently outperforms existing methods in indoor layout estimation, 3D object detection and mesh reconstruction.