销魂美女一区二区三区AV,无码人妻综合精品一区色欲AV,欧美一道高清一区二区三区,一级A性色生活片久久毛片,欧美你懂的在线观看

In the last decade, numerous supervised deep learning approaches requiring large amounts of labeled data have been proposed for visual-inertial odometry (VIO) and depth map estimation. To overcome the data limitation, self-supervised learning has emerged as a promising alternative, exploiting constraints such as geometric and photometric consistency in the scene. In this study, we introduce a novel self-supervised deep learning-based VIO and depth map recovery approach (SelfVIO) using adversarial training and self-adaptive visual-inertial sensor fusion. SelfVIO learns to jointly estimate 6 degrees-of-freedom (6-DoF) ego-motion and a depth map of the scene from unlabeled monocular RGB image sequences and inertial measurement unit (IMU) readings. The proposed approach is able to perform VIO without the need for IMU intrinsic parameters and/or the extrinsic calibration between the IMU and the camera. estimation and single-view depth recovery network. We provide comprehensive quantitative and qualitative evaluations of the proposed framework comparing its performance with state-of-the-art VIO, VO, and visual simultaneous localization and mapping (VSLAM) approaches on the KITTI, EuRoC and Cityscapes datasets. Detailed comparisons prove that SelfVIO outperforms state-of-the-art VIO approaches in terms of pose estimation and depth recovery, making it a promising approach among existing methods in the literature.

相關內容

估計/估計量

關注 3

Networking · SimPLe · Less · 回合 · 有向 ·

2019 年 5 月 16 日

Monocular Plan View Networks for Autonomous Driving

Dequan Wang,Coline Devin,Qi-Zhi Cai,Philipp Kr?henbühl,Trevor Darrell

from arxiv, 8 pages, 9 figures

Convolutions on monocular dash cam videos capture spatial invariances in the image plane but do not explicitly reason about distances and depth. We propose a simple transformation of observations into a bird's eye view, also known as plan view, for end-to-end control. We detect vehicles and pedestrians in the first person view and project them into an overhead plan view. This representation provides an abstraction of the environment from which a deep network can easily deduce the positions and directions of entities. Additionally, the plan view enables us to leverage advances in 3D object detection in conjunction with deep policy learning. We evaluate our monocular plan view network on the photo-realistic Grand Theft Auto V simulator. A network using both a plan view and front view causes less than half as many collisions as previous detection-based methods and an order of magnitude fewer collisions than pure pixel-based policies.

機器人 · 端到端 · 估計/估計量 · state-of-the-art · 學成 ·

2019 年 5 月 16 日

Vision-based Robotic Grasping from Object Localization, Pose Estimation, Grasp Detection to Motion Planning: A Review

Guoguang Du,Kai Wang,Shiguo Lian

from arxiv, 24 pages, 36 figures

This paper presents a comprehensive survey on vision-based robotic grasping. We concluded four key tasks during robotic grasping, which are object localization, pose estimation, grasp detection and motion planning. In detail, object localization includes object detection and segmentation methods, pose estimation includes RGB-based and RGB-D-based methods, grasp detection includes traditional methods and deep learning-based methods, motion planning includes analytical methods, imitating learning methods, and reinforcement learning methods. Besides, lots of methods accomplish some of the tasks jointly, such as object-detection-combined 6D pose estimation, grasp detection without pose estimation, end-to-end grasp detection, and end-to-end motion planning. These methods are reviewed elaborately in this survey. What's more, related datasets are summarized and comparisons between state-of-the-art methods are given for each task. Challenges about robotic grasping are presented, and future directions in addressing these challenges are also pointed out.

稀疏 · 三維重建 · 學成 · 有向 · SLAM ·

2019 年 3 月 21 日

Sparse2Dense: From direct sparse odometry to dense 3D reconstruction

Jiexiong Tang,John Folkesson,Patric Jensfelt

from arxiv, Accepted to ICRA 2019 (RA-L option), video demo available at //www.youtube.com/watch?v=3pbSHX72JC8&t=22s

In this paper, we proposed a new deep learning based dense monocular SLAM method. Compared to existing methods, the proposed framework constructs a dense 3D model via a sparse to dense mapping using learned surface normals. With single view learned depth estimation as prior for monocular visual odometry, we obtain both accurate positioning and high quality depth reconstruction. The depth and normal are predicted by a single network trained in a tightly coupled manner.Experimental results show that our method significantly improves the performance of visual tracking and depth prediction in comparison to the state-of-the-art in deep monocular dense SLAM.

估計/估計量 · 3D · 全 · 塑造 · 真實值 ·

2019 年 3 月 3 日

3D Hand Shape and Pose Estimation from a Single RGB Image

Liuhao Ge,Zhou Ren,Yuncheng Li,Zehao Xue,Yingying Wang,Jianfei Cai,Junsong Yuan

from arxiv, CVPR 2019 (Oral), //sites.google.com/site/geliuhaontu/home/cvpr2019

This work addresses a novel and challenging problem of estimating the full 3D hand shape and pose from a single RGB image. Most current methods in 3D hand analysis from monocular RGB images only focus on estimating the 3D locations of hand keypoints, which cannot fully express the 3D shape of hand. In contrast, we propose a Graph Convolutional Neural Network (Graph CNN) based method to reconstruct a full 3D mesh of hand surface that contains richer information of both 3D hand shape and pose. To train networks with full supervision, we create a large-scale synthetic dataset containing both ground truth 3D meshes and 3D poses. When fine-tuning the networks on real-world datasets without 3D ground truth, we propose a weakly-supervised approach by leveraging the depth map as a weak supervision in training. Through extensive evaluations on our proposed new datasets and two public datasets, we show that our proposed method can produce accurate and reasonable 3D hand mesh, and can achieve superior 3D hand pose estimation accuracy when compared with state-of-the-art methods.

三維重建 · 示例 · Extensibility · 3D · 估計/估計量 ·

2019 年 2 月 26 日

Single-Image Piece-wise Planar 3D Reconstruction via Associative Embedding

Zehao Yu,Jia Zheng,Dongze Lian,Zihan Zhou,Shenghua Gao

from arxiv, To appear in CVPR 2019

Single-image piece-wise planar 3D reconstruction aims to simultaneously segment plane instances and recover 3D plane parameters from an image. Most recent approaches leverage convolutional neural networks (CNNs) and achieve promising results. However, these methods are limited to detecting a fixed number of planes with certain learned order. To tackle this problem, we propose a novel two-stage method based on associative embedding, inspired by its recent success in instance segmentation. In the first stage, we train a CNN to map each pixel to an embedding space where pixels from the same plane instance have similar embeddings. Then, the plane instances are obtained by grouping the embedding vectors in planar regions via an efficient mean shift clustering algorithm. In the second stage, we estimate the parameter for each plane instance by considering both pixel-level and instance-level consistencies. With the proposed method, we are able to detect an arbitrary number of planes. Extensive experiments on public datasets validate the effectiveness and efficiency of our method. Furthermore, our method runs at 30 fps at the testing time, thus could facilitate many real-time applications such as visual SLAM and human-robot interaction. Code is available at //github.com/svip-lab/PlanarReconstruction.

估計/估計量 · Performer · Extensibility · 可約的 · 層 ·

2018 年 10 月 3 日

SuperDepth: Self-Supervised, Super-Resolved Monocular Depth Estimation

Sudeep Pillai,Rares Ambrus,Adrien Gaidon

from arxiv, 6 pages, 5 figures, 2 tables, ICRA 2019 Submission

Recent techniques in self-supervised monocular depth estimation are approaching the performance of supervised methods, but operate in low resolution only. We show that high resolution is key towards high-fidelity self-supervised monocular depth prediction. Inspired by recent deep learning methods for Single-Image Super-Resolution, we propose a sub-pixel convolutional layer extension for depth super-resolution that accurately synthesizes high-resolution disparities from their corresponding low-resolution convolutional features. In addition, we introduce a differentiable flip-augmentation layer that accurately fuses predictions from the image and its horizontally flipped version, reducing the effect of left and right shadow regions generated in the disparity map due to occlusions. Both contributions provide significant performance gains over the state-of-the-art in self-supervised depth and pose estimation on the public KITTI benchmark. A video of our approach can be found at //youtu.be/jKNgBeBMx0I.

SLAM · 回合 · 即時定位與地圖構建 · Mono · 慕尼黑工業大學 ·

2018 年 9 月 10 日

Monocular Object and Plane SLAM in Structured Environments

Shichao Yang,Sebastian Scherer

We present a monocular Simultaneous Localization and Mapping (SLAM) using high level object and plane landmarks, in addition to points. The resulting map is denser, more compact and meaningful compared to point only SLAM. We first propose a high order graphical model to jointly infer the 3D object and layout planes from single image considering occlusions and semantic constraints. The extracted cuboid object and layout planes are further optimized in a unified SLAM framework. Objects and planes can provide more semantic constraints such as Manhattan and object supporting relationships compared to points. Experiments on various public and collected datasets including ICL NUIM and TUM mono show that our algorithm can improve camera localization accuracy compared to state-of-the-art SLAM and also generate dense maps in many structured environments.

即時定位與地圖構建 · SLAM · 回合 · 環 · 3D ·

2018 年 9 月 9 日

Simultaneous Localization and Mapping (SLAM) using RTAB-MAP

Sagarnil Das

from arxiv, 7 pages, 13 figures

This paper implements Simultaneous Localization and Mapping (SLAM) technique to construct a map of a given environment. A Real Time Appearance Based Mapping (RTAB-Map) approach was taken for accomplishing this task. Initially, a 2d occupancy grid and 3d octomap was created from a provided simulated environment. Next, a personal simulated environment was created for mapping as well. In this appearance based method, a process called Loop Closure is used to determine whether a robot has seen a location before or not. In this paper, it is seen that RTAB-Map is optimized for large scale and long term SLAM by using multiple strategies to allow for loop closure to be done in real time and the results depict that it can be an excellent solution for SLAM to develop robots that can map an environment in both 2d and 3d.

代價函數 · 目標跟蹤 · 優化器 · 泛函 · 代價 ·

2018 年 7 月 5 日

A Gauss-Newton Approach to Real-Time Monocular Multiple Object Tracking

Henning Tjaden,Ulrich Schwanecke,Elmar Sch?mer,Daniel Cremers

We propose an algorithm for real-time 6DOF pose tracking of rigid 3D objects using a monocular RGB camera. The key idea is to derive a region-based cost function using temporally consistent local color histograms. While such region-based cost functions are commonly optimized using first-order gradient descent techniques, we systematically derive a Gauss-Newton optimization scheme which gives rise to drastically faster convergence and highly accurate and robust tracking performance. We furthermore propose a novel complex dataset dedicated for the task of monocular object pose tracking and make it publicly available to the community. To our knowledge, It is the first to address the common and important scenario in which both the camera as well as the objects are moving simultaneously in cluttered scenes. In numerous experiments - including our own proposed data set - we demonstrate that the proposed Gauss-Newton approach outperforms existing approaches, in particular in the presence of cluttered backgrounds, heterogeneous objects and partial occlusions.

估計/估計量 · Performer · Vision · 目標檢測 · state-of-the-art ·

2018 年 1 月 24 日

The challenge of simultaneous object detection and pose estimation: a comparative study

Daniel O?oro-Rubio,Roberto J. López-Sastre,Carolina Redondo-Cabrera,Pedro Gil-Jiménez

Detecting objects and estimating their pose remains as one of the major challenges of the computer vision research community. There exists a compromise between localizing the objects and estimating their viewpoints. The detector ideally needs to be view-invariant, while the pose estimation process should be able to generalize towards the category-level. This work is an exploration of using deep learning models for solving both problems simultaneously. For doing so, we propose three novel deep learning architectures, which are able to perform a joint detection and pose estimation, where we gradually decouple the two tasks. We also investigate whether the pose estimation problem should be solved as a classification or regression problem, being this still an open question in the computer vision community. We detail a comparative analysis of all our solutions and the methods that currently define the state of the art for this problem. We use PASCAL3D+ and ObjectNet3D datasets to present the thorough experimental evaluation and main results. With the proposed models we achieve the state-of-the-art performance in both datasets.