In this work, we present I$^2$-SDF, a new method for intrinsic indoor scene reconstruction and editing using differentiable Monte Carlo raytracing on neural signed distance fields (SDFs). Our holistic neural SDF-based framework jointly recovers the underlying shapes, incident radiance, and materials from multi-view images. We introduce a novel bubble loss for fine-grained small objects and an error-guided adaptive sampling scheme, which together substantially improve reconstruction quality on large-scale indoor scenes. Further, we propose to decompose the neural radiance field into the spatially-varying material of the scene, represented as a neural field, through surface-based, differentiable Monte Carlo raytracing and emitter semantic segmentation, which enables physically based and photorealistic scene relighting and editing applications. Through a number of qualitative and quantitative experiments, we demonstrate the superior quality of our method on indoor scene reconstruction, novel view synthesis, and scene editing compared to state-of-the-art baselines.
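Surface-based raytracing on an SDF first needs the point where each ray meets the surface. The abstract does not say how this is done; the sketch below assumes sphere tracing, a common choice for SDFs, and uses an analytic sphere as a stand-in for the neural SDF. The names `sphere_sdf` and `sphere_trace` and all parameter values are hypothetical.

```python
import math

def sphere_sdf(p, center=(0.0, 0.0, 3.0), radius=1.0):
    """Signed distance from point p to a sphere (negative inside).
    Stand-in for a learned neural SDF; purely illustrative."""
    return math.dist(p, center) - radius

def sphere_trace(origin, direction, sdf, max_steps=64, eps=1e-4, t_max=20.0):
    """March along a unit-length ray, stepping by the SDF value each time.
    Returns the ray parameter t at the hit, or None if nothing is hit."""
    t = 0.0
    for _ in range(max_steps):
        p = tuple(o + t * d for o, d in zip(origin, direction))
        dist = sdf(p)
        if dist < eps:      # close enough to the zero level set
            return t
        t += dist           # safe step: the surface is at least this far away
        if t > t_max:       # ray left the region of interest
            break
    return None

hit_t = sphere_trace((0.0, 0.0, 0.0), (0.0, 0.0, 1.0), sphere_sdf)
miss_t = sphere_trace((0.0, 0.0, 0.0), (0.0, 1.0, 0.0), sphere_sdf)
print(hit_t, miss_t)
```

In a differentiable pipeline, the hit point would then be used to query material and incident radiance; that part is not sketched here.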

Related content

May 19, 2023

Generating Visual Spatial Description via Holistic 3D Scene Understanding

Yu Zhao, Hao Fei, Wei Ji, Jianguo Wei, Meishan Zhang, Min Zhang, Tat-Seng Chua

Visual spatial description (VSD) aims to generate texts that describe the spatial relations of the given objects within images. Existing VSD work merely models the 2D geometrical vision features, thus inevitably falling prey to the problem of skewed spatial understanding of target objects. In this work, we investigate the incorporation of 3D scene features for VSD. With an external 3D scene extractor, we obtain the 3D objects and scene features for input images, based on which we construct a target object-centered 3D spatial scene graph (Go3D-S2G), such that we model the spatial semantics of target objects within the holistic 3D scenes. Besides, we propose a scene subgraph selecting mechanism, sampling topologically-diverse subgraphs from Go3D-S2G, where the diverse local structure features are navigated to yield spatially-diversified text generation. Experimental results on two VSD datasets demonstrate that our framework outperforms the baselines significantly, especially improving on the cases with complex visual spatial relations. Meanwhile, our method can produce more spatially-diversified generation. Code is available at //github.com/zhaoyucs/VSD.

May 19, 2023

Text2NeRF: Text-Driven 3D Scene Generation with Neural Radiance Fields

Jingbo Zhang, Xiaoyu Li, Ziyu Wan, Can Wang, Jing Liao

from arxiv, Homepage: //eckertzhang.github.io/Text2NeRF.github.io/

Text-driven 3D scene generation is widely applicable to video gaming, film industry, and metaverse applications that have a large demand for 3D scenes. However, existing text-to-3D generation methods are limited to producing 3D objects with simple geometries and dreamlike styles that lack realism. In this work, we present Text2NeRF, which is able to generate a wide range of 3D scenes with complicated geometric structures and high-fidelity textures purely from a text prompt. To this end, we adopt NeRF as the 3D representation and leverage a pre-trained text-to-image diffusion model to constrain the 3D reconstruction of the NeRF to reflect the scene description. Specifically, we employ the diffusion model to infer the text-related image as the content prior and use a monocular depth estimation method to offer the geometric prior. Both content and geometric priors are utilized to update the NeRF model. To guarantee textured and geometric consistency between different views, we introduce a progressive scene inpainting and updating strategy for novel view synthesis of the scene. Our method requires no additional training data but only a natural language description of the scene as the input. Extensive experiments demonstrate that our Text2NeRF outperforms existing methods in producing photo-realistic, multi-view consistent, and diverse 3D scenes from a variety of natural language prompts.

May 19, 2023

PastNet: Introducing Physical Inductive Biases for Spatio-temporal Video Prediction

Hao Wu, Wei Xion, Fan Xu, Xiao Luo, Chong Chen, Xian-Sheng Hua, Haixin Wang

from arxiv, 11

In this paper, we investigate the challenge of spatio-temporal video prediction, which involves generating future videos based on historical data streams. Existing approaches typically utilize external information such as semantic maps to enhance video prediction, which often neglect the inherent physical knowledge embedded within videos. Furthermore, their high computational demands could impede their applications for high-resolution videos. To address these constraints, we introduce a novel approach called Physics-assisted Spatio-temporal Network (PastNet) for generating high-quality video predictions. The core of our PastNet lies in incorporating a spectral convolution operator in the Fourier domain, which efficiently introduces inductive biases from the underlying physical laws. Additionally, we employ a memory bank with the estimated intrinsic dimensionality to discretize local features during the processing of complex spatio-temporal signals, thereby reducing computational costs and facilitating efficient high-resolution video prediction. Extensive experiments on various widely-used datasets demonstrate the effectiveness and efficiency of the proposed PastNet compared with state-of-the-art methods, particularly in high-resolution scenarios.
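To illustrate what a spectral convolution operator in the Fourier domain can look like, here is a minimal 1D sketch: transform the signal, scale a few low-frequency modes by per-mode weights, discard the rest, and transform back. This is a generic illustration and not PastNet's actual layer; the function names, the naive DFT, and the identity weights standing in for learned parameters are all assumptions.

```python
import cmath
import math

def dft(x):
    """Naive O(n^2) discrete Fourier transform (fine for a tiny demo)."""
    n = len(x)
    return [sum(x[m] * cmath.exp(-2j * math.pi * k * m / n) for m in range(n))
            for k in range(n)]

def idft(X):
    """Inverse of dft above."""
    n = len(X)
    return [sum(X[k] * cmath.exp(2j * math.pi * k * m / n) for k in range(n)) / n
            for m in range(n)]

def spectral_conv1d(x, weights):
    """Scale the lowest len(weights) Fourier modes by weights; drop all others.
    Assumes len(weights) <= len(x) // 2."""
    n = len(x)
    X = dft(x)
    Y = [0j] * n
    for k, w in enumerate(weights):
        Y[k] = X[k] * w
        if k > 0:
            # The mirrored mode gets the conjugate weight so the output stays real.
            Y[n - k] = X[n - k] * w.conjugate()
    return [y.real for y in idft(Y)]

n = 32
low = [math.sin(2 * math.pi * 2 * m / n) for m in range(n)]
high = [0.5 * math.sin(2 * math.pi * 10 * m / n) for m in range(n)]
signal = [a + b for a, b in zip(low, high)]

weights = [complex(1.0, 0.0)] * 4  # hypothetical "learned" weights for modes 0..3
filtered = spectral_conv1d(signal, weights)
max_err = max(abs(f - l) for f, l in zip(filtered, low))
print(max_err)
```

With identity weights on modes 0 to 3, the frequency-2 component passes through and the frequency-10 component is removed, so `filtered` matches `low` up to floating-point error. Truncating to a few modes is also what keeps such an operator cheap at high resolution.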

May 18, 2023

Scribble-Supervised Target Extraction Method Based on Inner Structure-Constraint for Remote Sensing Images

Yitong Li, Chang Liu, Jie Ma

from arxiv, 5 pages, 4 figures, 1 table

Weakly supervised learning based on scribble annotations in target extraction of remote sensing images has drawn much interest due to scribbles' flexibility in denoting winding objects and the low cost of manual labeling. However, scribbles are too sparse to identify object structure and detailed information, which makes target localization and boundary description challenging. To alleviate these problems, we construct two inner structure-constraints, a deformation consistency loss and a trainable active contour loss, together with a scribble-constraint to supervise the optimization of the encoder-decoder network without introducing any auxiliary module or extra operation based on prior cues. Comprehensive experiments demonstrate our method's superiority over five state-of-the-art algorithms in this field. Source code is available at //github.com/yitongli123/ISC-TE.

May 17, 2023

Improving Extrinsics between RADAR and LIDAR using Learning

Peng Jiang, Srikanth Saripalli

from arxiv, accepted in IV 2023

LIDAR and RADAR are two commonly used sensors in autonomous driving systems. The extrinsic calibration between the two is crucial for effective sensor fusion. The challenge arises due to the low accuracy and sparse information in RADAR measurements. This paper presents a novel solution for 3D RADAR-LIDAR calibration in autonomous systems. The method employs simple targets to generate data, including correspondence registration and a one-step optimization algorithm. The optimization aims to minimize the reprojection error while utilizing a small multi-layer perceptron (MLP) to perform regression on the return energy of the sensor around the targets. The proposed approach uses a deep learning framework such as PyTorch and can be optimized through gradient descent. The experiment uses a 360-degree Ouster-128 LIDAR and a 360-degree Navtech RADAR, providing raw measurements. The results validate the effectiveness of the proposed method in achieving improved estimates of extrinsic calibration parameters.

May 17, 2023

Towards 3D Face Reconstruction in Perspective Projection: Estimating 6DoF Face Pose from Monocular Image

Yueying Kao, Bowen Pan, Miao Xu, Jiangjing Lyu, Xiangyu Zhu, Yuanzhang Chang, Xiaobo Li, Zhen Lei

from arxiv, Accepted by TIP

In 3D face reconstruction, orthogonal projection has been widely employed as a substitute for perspective projection to simplify the fitting process. This approximation performs well when the camera is far enough from the face. However, in scenarios where the face is very close to the camera or moving along the camera axis, these methods suffer from inaccurate reconstruction and unstable temporal fitting due to the distortion under perspective projection. In this paper, we aim to address the problem of single-image 3D face reconstruction under perspective projection. Specifically, a deep neural network, Perspective Network (PerspNet), is proposed to simultaneously reconstruct the 3D face shape in canonical space and learn the correspondence between 2D pixels and 3D points, by which the 6DoF (6 Degrees of Freedom) face pose can be estimated to represent perspective projection. In addition, we contribute a large dataset, ARKitFace, to enable the training and evaluation of 3D face reconstruction solutions under perspective projection; it contains 902,724 2D facial images with ground-truth 3D face meshes and annotated 6DoF pose parameters. Experimental results show that our approach outperforms current state-of-the-art methods by a significant margin. The code and data are available at //github.com/cbsropenproject/6dof_face.
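The claim that orthogonal projection is a good approximation only at a distance can be checked numerically. The sketch below compares a pinhole projection with a scaled orthographic (weak-perspective) one for a handful of face-like 3D points. The point coordinates, focal length, and camera distances are made-up illustrative values, not taken from the paper or the ARKitFace dataset.

```python
import math

def perspective(points, f, tz):
    """Pinhole projection: each point is divided by its own depth z + tz."""
    return [(f * x / (z + tz), f * y / (z + tz)) for x, y, z in points]

def weak_perspective(points, f, tz):
    """Scaled orthographic projection: one shared depth tz for every point."""
    return [(f * x / tz, f * y / tz) for x, y, _ in points]

def relative_gap(points, f, tz):
    """Largest 2D gap between the two projections, relative to the image extent."""
    persp = perspective(points, f, tz)
    weak = weak_perspective(points, f, tz)
    gap = max(math.dist(p, w) for p, w in zip(persp, weak))
    extent = max(math.hypot(u, v) for u, v in weak)
    return gap / extent

# Hypothetical face-like points in metres: nose tip, two ears, chin.
face = [(0.0, 0.0, -0.05), (0.08, 0.0, 0.05), (-0.08, 0.0, 0.05), (0.0, -0.10, 0.0)]
f = 1000.0  # assumed focal length in pixels

near = relative_gap(face, f, tz=0.3)  # camera 0.3 m from the face
far = relative_gap(face, f, tz=3.0)   # camera 3 m from the face
print(f"near: {near:.3f}, far: {far:.3f}")
```

Under these assumptions the mismatch at 0.3 m is several times larger than at 3 m, which is the distortion that motivates estimating a full 6DoF pose instead of an orthographic fit.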

May 17, 2023

MVP-Human Dataset for 3D Human Avatar Reconstruction from Unconstrained Frames

Xiangyu Zhu, Tingting Liao, Jiangjing Lyu, Xiang Yan, Yunfeng Wang, Kan Guo, Qiong Cao, Stan Z. Li, Zhen Lei

from arxiv, Accepted by IEEE Transactions on Biometrics, Behavior, and Identity Science (TBIOM)

In this paper, we consider a novel problem of reconstructing a 3D human avatar from multiple unconstrained frames, independent of assumptions on camera calibration, capture space, and constrained actions. The problem should be addressed by a framework that takes multiple unconstrained images as inputs and generates a shape-with-skinning avatar in the canonical space in a single feed-forward pass. To this end, we present 3D Avatar Reconstruction in the wild (ARwild), which first reconstructs the implicit skinning fields in a multi-level manner, by which the image features from multiple images are aligned and integrated to estimate a pixel-aligned implicit function that represents the clothed shape. To enable the training and testing of the new framework, we contribute a large-scale dataset, MVP-Human (Multi-View and multi-Pose 3D Human), which contains 400 subjects, each with 15 scans in different poses and 8-view images for each pose, providing 6,000 3D scans and 48,000 images in total. Overall, benefiting from the specific network architecture and the diverse data, the trained model enables 3D avatar reconstruction from unconstrained frames and achieves state-of-the-art performance.
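The dataset totals quoted in the abstract follow directly from the per-subject counts, as this short check shows (variable names are illustrative):

```python
subjects = 400
scans_per_subject = 15   # one scan per pose
views_per_scan = 8       # 8-view images for each pose

total_scans = subjects * scans_per_subject
total_images = total_scans * views_per_scan
print(total_scans, total_images)  # 6000 48000
```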

May 17, 2023

3D reconstruction from spherical images: A review of techniques, applications, and prospects

San Jiang, Yaxin Li, Duojie Weng, Kan You, Wu Chen

3D reconstruction plays an increasingly important role in modern photogrammetric systems. Conventional satellite or aerial-based remote sensing (RS) platforms can provide the necessary data sources for the 3D reconstruction of large-scale landforms and cities. Even with low-altitude UAVs (Unmanned Aerial Vehicles), 3D reconstruction in complicated situations, such as urban canyons and indoor scenes, is challenging due to frequent tracking failures between camera frames and high data collection costs. Recently, spherical images have been extensively used due to the capability of recording surrounding environments from one camera exposure. In contrast to perspective images with limited FOV (Field of View), spherical images can cover the whole scene with full horizontal and vertical FOV and facilitate camera tracking and data acquisition in these complex scenes. With the rapid evolution and extensive use of professional and consumer-grade spherical cameras, spherical images show great potential for the 3D modeling of urban and indoor scenes. Classical 3D reconstruction pipelines, however, cannot be directly used for spherical images. Besides, there exist few software packages that are designed for the 3D reconstruction of spherical images. As a result, this research provides a thorough survey of the state-of-the-art for 3D reconstruction of spherical images in terms of data acquisition, feature detection and matching, image orientation, and dense matching as well as presenting promising applications and discussing potential prospects. We anticipate that this study offers insightful clues to direct future research.

April 19, 2023

NeuralField-LDM: Scene Generation with Hierarchical Latent Diffusion Models

Seung Wook Kim, Bradley Brown, Kangxue Yin, Karsten Kreis, Katja Schwarz, Daiqing Li, Robin Rombach, Antonio Torralba, Sanja Fidler

from arxiv, CVPR 2023

Automatically generating high-quality real world 3D scenes is of enormous interest for applications such as virtual reality and robotics simulation. Towards this goal, we introduce NeuralField-LDM, a generative model capable of synthesizing complex 3D environments. We leverage Latent Diffusion Models that have been successfully utilized for efficient high-quality 2D content creation. We first train a scene auto-encoder to express a set of image and pose pairs as a neural field, represented as density and feature voxel grids that can be projected to produce novel views of the scene. To further compress this representation, we train a latent-autoencoder that maps the voxel grids to a set of latent representations. A hierarchical diffusion model is then fit to the latents to complete the scene generation pipeline. We achieve a substantial improvement over existing state-of-the-art scene generation models. Additionally, we show how NeuralField-LDM can be used for a variety of 3D content creation applications, including conditional scene generation, scene inpainting and scene style manipulation.

January 3, 2022

Scene Graph Generation: A Comprehensive Survey

Guangming Zhu, Liang Zhang, Youliang Jiang, Yixuan Dang, Haoran Hou, Peiyi Shen, Mingtao Feng, Xia Zhao, Qiguang Miao, Syed Afaq Ali Shah, Mohammed Bennamoun

from arxiv, Submitted to TPAMI

Deep learning techniques have led to remarkable breakthroughs in the field of generic object detection and have spawned a lot of scene-understanding tasks in recent years. Scene graph has been the focus of research because of its powerful semantic representation and applications to scene understanding. Scene Graph Generation (SGG) refers to the task of automatically mapping an image into a semantic structural scene graph, which requires the correct labeling of detected objects and their relationships. Although this is a challenging task, the community has proposed a lot of SGG approaches and achieved good results. In this paper, we provide a comprehensive survey of recent achievements in this field brought about by deep learning techniques. We review 138 representative works that cover different input modalities, and systematically summarize existing methods of image-based SGG from the perspective of feature extraction and fusion. We attempt to connect and systematize the existing visual relationship detection methods, to summarize, and interpret the mechanisms and the strategies of SGG in a comprehensive way. Finally, we finish this survey with deep discussions about current existing problems and future research directions. This survey will help readers to develop a better understanding of the current research status and ideas.