
3D reconstruction aims to reconstruct 3D objects from 2D views. Previous works on 3D reconstruction mainly focus on feature matching between views or on using CNNs as backbones. Recently, Transformers have been shown to be effective in multiple applications of computer vision. However, whether Transformers can be used for 3D reconstruction remains unclear. In this paper, we fill this gap by proposing 3D-RETR, which performs end-to-end 3D REconstruction with TRansformers. 3D-RETR first uses a pretrained Transformer to extract visual features from 2D input images. It then uses a Transformer decoder to obtain voxel features, and a CNN decoder takes the voxel features as input to produce the reconstructed objects. 3D-RETR is capable of 3D reconstruction from a single view or from multiple views. Experimental results on two datasets show that 3D-RETR reaches state-of-the-art performance on 3D reconstruction. An additional ablation study also demonstrates that 3D-RETR benefits from using Transformers.
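
As a rough illustration of the pipeline described above, the sketch below wires together a Transformer image encoder, a Transformer decoder over learned voxel queries, and a 3D CNN decoder. It is not the authors' implementation: the module sizes, the 16x16 patch embedding, the 8^3 query grid, and the 32^3 output resolution are assumptions made for the sketch, and positional embeddings are omitted for brevity.

```python
# Minimal sketch of a 3D-RETR-style encoder-decoder (illustrative, not the authors' code).
import torch
import torch.nn as nn

class Sketch3DRETR(nn.Module):
    def __init__(self, d_model=512, num_queries=8 ** 3, voxel_res=32):
        super().__init__()
        # Image encoder: a small ViT-like stand-in for the pretrained Transformer.
        self.patch_embed = nn.Conv2d(3, d_model, kernel_size=16, stride=16)
        enc_layer = nn.TransformerEncoderLayer(d_model, nhead=8, batch_first=True)
        self.encoder = nn.TransformerEncoder(enc_layer, num_layers=4)
        # Transformer decoder: learned voxel queries attend to the image features.
        self.voxel_queries = nn.Parameter(torch.randn(num_queries, d_model))
        dec_layer = nn.TransformerDecoderLayer(d_model, nhead=8, batch_first=True)
        self.decoder = nn.TransformerDecoder(dec_layer, num_layers=4)
        # CNN decoder: upsample the 8^3 query grid to the final 32^3 occupancy grid.
        self.cnn_decoder = nn.Sequential(
            nn.ConvTranspose3d(d_model, 64, 4, stride=2, padding=1), nn.ReLU(),
            nn.ConvTranspose3d(64, 16, 4, stride=2, padding=1), nn.ReLU(),
            nn.Conv3d(16, 1, 3, padding=1),
        )

    def forward(self, images):                       # images: (B, V, 3, 224, 224)
        b, v = images.shape[:2]
        x = self.patch_embed(images.flatten(0, 1))   # (B*V, d, 14, 14)
        x = x.flatten(2).transpose(1, 2)             # (B*V, 196, d)
        feats = self.encoder(x).reshape(b, -1, x.size(-1))  # concat views: (B, V*196, d)
        q = self.voxel_queries.unsqueeze(0).expand(b, -1, -1)
        vox = self.decoder(q, feats)                 # (B, 512, d)
        vox = vox.transpose(1, 2).reshape(b, -1, 8, 8, 8)
        return torch.sigmoid(self.cnn_decoder(vox))  # (B, 1, 32, 32, 32)
```

In this sketch, tokens from all input views are simply concatenated before the voxel queries attend to them, which is one straightforward way to keep the decoder agnostic to the number of views; the paper's own view-fusion strategy may differ.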

Related content

In computer vision, 3D reconstruction refers to the process of recovering 3D information from single-view or multi-view images. Because a single view provides incomplete information, 3D reconstruction from a single view has to rely on prior knowledge. Multi-view 3D reconstruction (analogous to human binocular localization) is comparatively easier: the cameras are first calibrated, i.e., the relationship between the camera's image coordinate system and the world coordinate system is computed, and 3D information is then reconstructed from the information in multiple 2D images. Object 3D reconstruction is a common scientific problem and core technology in computer-aided geometric design (CAGD), computer graphics (CG), computer animation, computer vision, medical image processing, scientific computing, virtual reality, digital media creation, and related fields. There are two main ways to generate a 3D representation of an object in a computer. One is to use geometric modeling software to build a human-controlled 3D geometric model through human-computer interaction; the other is to acquire the geometric shape of a real object by some measurement means. The former is technologically mature and supported by a number of software packages, such as 3DMAX, Maya, AutoCAD, and UG, which generally represent geometric shapes with curves and surfaces that have mathematical expressions. The latter is usually called the 3D reconstruction process: 3D reconstruction refers to the mathematical process and computer techniques for recovering the 3D information of an object (its shape, etc.) from 2D projections, including steps such as data acquisition, preprocessing, point cloud registration, and feature analysis.
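
The calibration step mentioned above is usually formalized with the pinhole camera model, which relates world coordinates to pixel coordinates; the equation below is the standard textbook form rather than anything specific to a particular reconstruction method.

```latex
% Pinhole camera model: relation between world and pixel coordinates after calibration.
s \begin{bmatrix} u \\ v \\ 1 \end{bmatrix}
  = K \,[\,R \mid t\,]
    \begin{bmatrix} X_w \\ Y_w \\ Z_w \\ 1 \end{bmatrix},
\qquad
K = \begin{bmatrix} f_x & 0 & c_x \\ 0 & f_y & c_y \\ 0 & 0 & 1 \end{bmatrix}
```

Here (u, v) are pixel coordinates, (X_w, Y_w, Z_w) are world coordinates, K holds the intrinsic parameters, [R | t] is the extrinsic rotation and translation estimated by calibration, and s is an arbitrary scale factor; multi-view reconstruction then recovers 3D points that are consistent with several such projections.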

We consider the reconstruction problem of video compressive sensing (VCS) under the deep unfolding/unrolling structure, aiming to build a flexible and concise model with a minimum number of stages. Unlike existing deep unfolding networks for inverse problems, where more stages are used for higher performance but without flexibility to different masks and scales, we show that a 2-stage deep unfolding network can achieve state-of-the-art (SOTA) results in VCS (with a 1.7 dB gain in PSNR over the single-stage model, RevSCI). Thanks to the advantages of deep unfolding, the proposed method adapts to new masks and readily scales to larger data without any additional training. Furthermore, we extend the proposed model to color VCS to perform joint reconstruction and demosaicing. Experimental results demonstrate that our 2-stage model also achieves SOTA on color VCS reconstruction, leading to a >2.3 dB gain in PSNR over the previous SOTA algorithm based on the plug-and-play framework, while speeding up reconstruction by more than 17 times. In addition, we find that our network is also flexible with respect to mask modulation and spatial size for color VCS reconstruction, so that a single trained network can be applied to different hardware systems. The code and models will be released to the public.
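
To make the unfolding structure concrete, here is a minimal sketch of a 2-stage unfolding loop for the standard VCS forward model, where a 2D measurement is a mask-modulated sum of video frames: each stage alternates a data-fidelity gradient step with a learned denoiser. The tiny 3D-CNN denoiser, the learned step sizes, and the back-projection initialization are placeholders, not the network proposed in the paper.

```python
# Minimal sketch of a 2-stage deep unfolding network for video compressive sensing
# (illustrative only; masks are binary modulation patterns, y is the coded snapshot).
import torch
import torch.nn as nn

class TinyDenoiser(nn.Module):
    """Learned prior for one unfolding stage (a stand-in for the paper's denoiser)."""
    def __init__(self, ch=16):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv3d(1, ch, 3, padding=1), nn.ReLU(),
            nn.Conv3d(ch, ch, 3, padding=1), nn.ReLU(),
            nn.Conv3d(ch, 1, 3, padding=1),
        )

    def forward(self, x):          # x: (B, 1, T, H, W)
        return x + self.net(x)     # residual refinement

class TwoStageUnfolding(nn.Module):
    def __init__(self, stages=2):
        super().__init__()
        self.denoisers = nn.ModuleList(TinyDenoiser() for _ in range(stages))
        self.step = nn.Parameter(torch.ones(stages))   # learned step sizes

    def forward(self, y, masks):
        # y: (B, H, W) coded measurement; masks: (B, T, H, W) modulation masks.
        # Initialization by mask-weighted back-projection.
        x = y.unsqueeze(1) * masks / masks.sum(dim=1, keepdim=True).clamp(min=1)
        for k, denoiser in enumerate(self.denoisers):
            residual = y - (masks * x).sum(dim=1)                  # data-fidelity residual
            x = x + self.step[k] * masks * residual.unsqueeze(1)   # gradient step
            x = denoiser(x.unsqueeze(1)).squeeze(1)                # learned prior step
        return x                                                   # (B, T, H, W)
```

Because the masks enter only through the forward model inside the loop, the same trained stages can in principle be applied under different masks and frame sizes, which is the flexibility the abstract refers to.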

Video understanding requires reasoning at multiple spatiotemporal resolutions -- from short fine-grained motions to events taking place over longer durations. Although transformer architectures have recently advanced the state-of-the-art, they have not explicitly modelled different spatiotemporal resolutions. To this end, we present Multiview Transformers for Video Recognition (MTV). Our model consists of separate encoders to represent different views of the input video with lateral connections to fuse information across views. We present thorough ablation studies of our model and show that MTV consistently performs better than single-view counterparts in terms of accuracy and computational cost across a range of model sizes. Furthermore, we achieve state-of-the-art results on five standard datasets, and improve even further with large-scale pretraining. We will release code and pretrained checkpoints.
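
The multiview idea can be illustrated with a toy two-view encoder: two Transformer encoders operate on tubelet tokens of different temporal granularity, and a lateral linear connection passes a summary of the fine view into the coarse view before classification. The sizes, the single-direction fusion, and the mean-pooled lateral summary are simplifying assumptions; MTV itself uses more views and richer fusion, and positional embeddings are omitted here.

```python
# Minimal sketch of a two-view video encoder with a lateral connection (illustrative).
import torch
import torch.nn as nn

class TwoViewVideoEncoder(nn.Module):
    def __init__(self, d_fine=128, d_coarse=256, num_classes=400):
        super().__init__()
        # Tubelet embeddings with different temporal granularity (assumed sizes).
        self.embed_fine = nn.Conv3d(3, d_fine, kernel_size=(2, 16, 16), stride=(2, 16, 16))
        self.embed_coarse = nn.Conv3d(3, d_coarse, kernel_size=(8, 16, 16), stride=(8, 16, 16))
        self.enc_fine = nn.TransformerEncoder(
            nn.TransformerEncoderLayer(d_fine, nhead=4, batch_first=True), num_layers=2)
        self.enc_coarse = nn.TransformerEncoder(
            nn.TransformerEncoderLayer(d_coarse, nhead=4, batch_first=True), num_layers=2)
        self.lateral = nn.Linear(d_fine, d_coarse)   # fuse fine view into coarse view
        self.head = nn.Linear(d_coarse, num_classes)

    def forward(self, video):                        # video: (B, 3, 16, 224, 224)
        tok_f = self.embed_fine(video).flatten(2).transpose(1, 2)    # (B, Nf, d_fine)
        tok_c = self.embed_coarse(video).flatten(2).transpose(1, 2)  # (B, Nc, d_coarse)
        tok_f = self.enc_fine(tok_f)
        # Lateral connection: summarize the fine view and add it to every coarse token.
        tok_c = tok_c + self.lateral(tok_f.mean(dim=1, keepdim=True))
        tok_c = self.enc_coarse(tok_c)
        return self.head(tok_c.mean(dim=1))          # (B, num_classes)
```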

Sketches are the most abstract 2D representations of real-world objects. Although a sketch usually has geometrical distortion and lacks visual cues, humans can effortlessly envision a 3D object from it. This suggests that sketches encode the information necessary for reconstructing 3D shapes. Despite great progress achieved in 3D reconstruction from distortion-free line drawings, such as CAD and edge maps, little effort has been made to reconstruct 3D shapes from free-hand sketches. We study this task and aim to enhance the power of sketches in 3D-related applications such as interactive design and VR/AR games. Unlike previous works, which mostly study distortion-free line drawings, our 3D shape reconstruction is based on free-hand sketches. A major challenge in free-hand sketch 3D reconstruction comes from insufficient training data and the diversity of free-hand sketches, e.g., individualized sketching styles. We thus propose data generation and standardization mechanisms. Instead of distortion-free line drawings, synthesized sketches are adopted as input training data. Additionally, we propose a sketch standardization module to handle different sketch distortions and styles. Extensive experiments demonstrate the effectiveness of our model and its strong generalizability to various free-hand sketches. Our code is publicly available at //github.com/samaonline/3D-Shape-Reconstruction-from-Free-Hand-Sketches.

Recently, self-supervised vision transformers have attracted unprecedented attention for their impressive representation learning ability. However, the dominant method, contrastive learning, mainly relies on an instance discrimination pretext task, which learns a global understanding of the image. This paper incorporates local feature learning into self-supervised vision transformers via Reconstructive Pre-training (RePre). Our RePre extends contrastive frameworks by adding a branch for reconstructing raw image pixels in parallel with the existing contrastive objective. RePre is equipped with a lightweight convolution-based decoder that fuses the multi-hierarchy features from the transformer encoder. These multi-hierarchy features provide rich supervision ranging from low-level to high-level semantic information, which is crucial for our RePre. Our RePre brings decent improvements to various contrastive frameworks with different vision transformer architectures. Transfer performance in downstream tasks outperforms supervised pre-training and state-of-the-art (SOTA) self-supervised counterparts.
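
A minimal sketch of how such a reconstructive branch can be bolted onto a contrastive framework is shown below: a lightweight convolutional decoder fuses multi-level encoder features into an RGB prediction, and its L1 loss is simply added to whatever contrastive loss is already being used. The channel sizes, the 1x1-projection-plus-sum fusion, and the loss weight are illustrative assumptions, not the paper's configuration.

```python
# Minimal sketch of a RePre-style objective: contrastive loss + pixel reconstruction.
import torch
import torch.nn as nn
import torch.nn.functional as F

class PixelDecoder(nn.Module):
    """Fuses multi-level feature maps and predicts an RGB image."""
    def __init__(self, channels=(96, 192, 384), out_size=224):
        super().__init__()
        self.out_size = out_size
        self.proj = nn.ModuleList(nn.Conv2d(c, 64, 1) for c in channels)
        self.to_rgb = nn.Sequential(nn.Conv2d(64, 64, 3, padding=1), nn.ReLU(),
                                    nn.Conv2d(64, 3, 3, padding=1))

    def forward(self, feats):            # feats: list of (B, C_i, H_i, W_i) maps
        fused = 0
        for f, proj in zip(feats, self.proj):
            fused = fused + F.interpolate(proj(f), size=self.out_size,
                                          mode="bilinear", align_corners=False)
        return self.to_rgb(fused)        # (B, 3, out_size, out_size)

def repre_style_loss(contrastive_loss, feats, images, decoder, weight=1.0):
    """Total loss = existing contrastive objective + weighted pixel reconstruction."""
    recon_loss = F.l1_loss(decoder(feats), images)
    return contrastive_loss + weight * recon_loss
```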

In this paper, we present a novel double-diffusion-based neural radiance field, dubbed DD-NeRF, to reconstruct human body geometry and render the human body appearance in novel views from a sparse set of images. We first propose a double diffusion mechanism to achieve expressive representations of input images by fully exploiting human body priors and image appearance details at two levels. At the coarse level, we model the coarse human body poses and shapes via an unclothed 3D deformable vertex model as guidance. At the fine level, we present a multi-view sampling network to capture subtle geometric deformations and detailed image appearance, such as clothing and hair, from multiple input views. Considering the sparsity of the two-level features, we diffuse them into feature volumes in the canonical space to construct neural radiance fields. Then, we present a signed distance function (SDF) regression network to construct body surfaces from the diffused features. Thanks to our double diffused representations, our method can even synthesize novel views of unseen subjects. Experiments on various datasets demonstrate that our approach outperforms the state-of-the-art in both geometric reconstruction and novel view synthesis.

3D Morphable Model (3DMM) based methods have achieved great success in recovering 3D face shapes from single-view images. However, the facial textures recovered by such methods lack the fidelity exhibited in the input images. Recent work demonstrates high-quality facial texture recovery with generative networks trained on a large-scale database of high-resolution UV maps of face textures, which is hard to prepare and not publicly available. In this paper, we introduce a method to reconstruct 3D facial shapes with high-fidelity textures from single-view images in the wild, without the need to capture a large-scale face texture database. The main idea is to refine the initial texture generated by a 3DMM-based method with facial details from the input image. To this end, we propose to use graph convolutional networks to reconstruct the detailed colors of the mesh vertices instead of reconstructing the UV map. Experiments show that our method can generate high-quality results and outperforms state-of-the-art methods in both qualitative and quantitative comparisons.
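
The per-vertex color regression can be sketched with a basic graph convolution over the mesh adjacency, as below. This is only an illustrative stand-in: the actual network, its input features, and its depth differ, and the row-normalized adjacency with self-loops is an assumption of the sketch.

```python
# Minimal sketch of per-vertex color regression with graph convolutions (illustrative).
import torch
import torch.nn as nn

class GraphConv(nn.Module):
    """Simple mesh graph convolution: aggregate neighbor features, then a linear map."""
    def __init__(self, in_dim, out_dim):
        super().__init__()
        self.linear = nn.Linear(in_dim, out_dim)

    def forward(self, x, adj):
        # x: (V, in_dim) per-vertex features; adj: (V, V) row-normalized adjacency
        # (with self-loops) of the face mesh.
        return self.linear(adj @ x)

class VertexColorNet(nn.Module):
    def __init__(self, feat_dim=256, hidden=128):
        super().__init__()
        self.gc1 = GraphConv(feat_dim, hidden)
        self.gc2 = GraphConv(hidden, 3)           # RGB per vertex

    def forward(self, vertex_feats, adj):
        h = torch.relu(self.gc1(vertex_feats, adj))
        return torch.sigmoid(self.gc2(h, adj))    # colors in [0, 1]
```

Regressing colors directly on the mesh vertices keeps the prediction aligned with the geometry and sidesteps the need for a high-resolution UV texture database, which is the design choice the abstract emphasizes.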

In this paper, we propose a new deep-learning-based dense monocular SLAM method. Compared to existing methods, the proposed framework constructs a dense 3D model via sparse-to-dense mapping using learned surface normals. With single-view learned depth estimation as a prior for monocular visual odometry, we obtain both accurate positioning and high-quality depth reconstruction. The depth and normals are predicted by a single network trained in a tightly coupled manner. Experimental results show that our method significantly improves the performance of visual tracking and depth prediction in comparison to the state-of-the-art in deep monocular dense SLAM.

Single-image piece-wise planar 3D reconstruction aims to simultaneously segment plane instances and recover 3D plane parameters from an image. Most recent approaches leverage convolutional neural networks (CNNs) and achieve promising results. However, these methods are limited to detecting a fixed number of planes in a certain learned order. To tackle this problem, we propose a novel two-stage method based on associative embedding, inspired by its recent success in instance segmentation. In the first stage, we train a CNN to map each pixel to an embedding space where pixels from the same plane instance have similar embeddings. The plane instances are then obtained by grouping the embedding vectors in planar regions via an efficient mean shift clustering algorithm. In the second stage, we estimate the parameters of each plane instance by considering both pixel-level and instance-level consistencies. With the proposed method, we are able to detect an arbitrary number of planes. Extensive experiments on public datasets validate the effectiveness and efficiency of our method. Furthermore, our method runs at 30 fps at test time and thus could facilitate many real-time applications such as visual SLAM and human-robot interaction. Code is available at //github.com/svip-lab/PlanarReconstruction.
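
The grouping step of the first stage can be sketched as follows. Note the hedge: this uses scikit-learn's off-the-shelf MeanShift rather than the paper's efficient clustering variant, and the bandwidth value is an arbitrary placeholder.

```python
# Minimal sketch: grouping per-pixel embeddings into plane instances via mean shift.
import numpy as np
from sklearn.cluster import MeanShift

def group_plane_instances(embeddings, planar_mask, bandwidth=0.5):
    """embeddings: (H, W, D) per-pixel embedding map from the CNN.
    planar_mask: (H, W) boolean mask of pixels predicted to lie on some plane.
    Returns an (H, W) integer map of plane-instance labels (-1 for non-planar pixels)."""
    h, w, _ = embeddings.shape
    labels = np.full((h, w), -1, dtype=np.int64)
    planar_embeddings = embeddings[planar_mask]          # (N, D)
    if len(planar_embeddings) == 0:
        return labels
    clustering = MeanShift(bandwidth=bandwidth, bin_seeding=True)
    labels[planar_mask] = clustering.fit_predict(planar_embeddings)
    return labels
```

Because mean shift discovers the number of modes from the data, the number of detected plane instances is not fixed in advance, which is what lets the method handle an arbitrary number of planes.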

With the advent of deep neural networks, learning-based approaches for 3D reconstruction have gained popularity. However, unlike for images, in 3D there is no canonical representation which is both computationally and memory efficient yet allows for representing high-resolution geometry of arbitrary topology. Many of the state-of-the-art learning-based 3D reconstruction approaches can hence only represent very coarse 3D geometry or are limited to a restricted domain. In this paper, we propose occupancy networks, a new representation for learning-based 3D reconstruction methods. Occupancy networks implicitly represent the 3D surface as the continuous decision boundary of a deep neural network classifier. In contrast to existing approaches, our representation encodes a description of the 3D output at infinite resolution without excessive memory footprint. We validate that our representation can efficiently encode 3D structure and can be inferred from various kinds of input. Our experiments demonstrate competitive results, both qualitatively and quantitatively, for the challenging tasks of 3D reconstruction from single images, noisy point clouds and coarse discrete voxel grids. We believe that occupancy networks will become a useful tool in a wide variety of learning-based 3D tasks.
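
The core idea can be sketched as a small conditional MLP that maps a 3D query point plus an encoding of the input observation to an occupancy probability; the implicit surface is the 0.5 decision boundary. Layer sizes and the simple concatenation-based conditioning are assumptions of the sketch, not the paper's architecture.

```python
# Minimal sketch of an occupancy-network-style decoder (illustrative).
import torch
import torch.nn as nn

class OccupancyDecoder(nn.Module):
    def __init__(self, code_dim=256, hidden=256):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(3 + code_dim, hidden), nn.ReLU(),
            nn.Linear(hidden, hidden), nn.ReLU(),
            nn.Linear(hidden, 1),
        )

    def forward(self, points, code):
        # points: (B, N, 3) query coordinates; code: (B, code_dim) encoding of the
        # input observation (image, point cloud, or coarse voxel grid).
        code = code.unsqueeze(1).expand(-1, points.size(1), -1)
        logits = self.net(torch.cat([points, code], dim=-1))
        return torch.sigmoid(logits).squeeze(-1)      # occupancy probability in [0, 1]
```

At inference, the continuous field can be evaluated at any set of points, so a mesh can be extracted at whatever resolution is needed (e.g. via marching cubes), which is where the resolution independence claimed above comes from.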

We present a unified framework tackling two problems: class-specific 3D reconstruction from a single image, and generation of new 3D shape samples. These tasks have received considerable attention recently; however, existing approaches rely on 3D supervision, annotation of 2D images with keypoints or poses, and/or training with multiple views of each object instance. Our framework is very general: it can be trained in similar settings to these existing approaches, while also supporting weaker supervision scenarios. Importantly, it can be trained purely from 2D images, without ground-truth pose annotations, and with a single view per instance. We employ meshes as an output representation, instead of voxels used in most prior work. This allows us to exploit shading information during training, which previous 2D-supervised methods cannot. Thus, our method can learn to generate and reconstruct concave object classes. We evaluate our approach on synthetic data in various settings, showing that (i) it learns to disentangle shape from pose; (ii) using shading in the loss improves performance; (iii) our model is comparable or superior to state-of-the-art voxel-based approaches on quantitative metrics, while producing results that are visually more pleasing; (iv) it still performs well when given supervision weaker than in prior works.
