
We present PHORHUM, a novel, end-to-end trainable, deep neural network methodology for photorealistic 3D human reconstruction given just a monocular RGB image. Our pixel-aligned method estimates detailed 3D geometry and, for the first time, the unshaded surface color together with the scene illumination. Observing that 3D supervision alone is not sufficient for high fidelity color reconstruction, we introduce patch-based rendering losses that enable reliable color reconstruction on visible parts of the human, and detailed and plausible color estimation for the non-visible parts. Moreover, our method specifically addresses methodological and practical limitations of prior work in terms of representing geometry, albedo, and illumination effects, in an end-to-end model where factors can be effectively disentangled. In extensive experiments, we demonstrate the versatility and robustness of our approach. Our state-of-the-art results validate the method qualitatively and quantitatively across different metrics, for both geometric and color reconstruction.
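
As a rough illustration of the pixel-aligned lookup that such a method builds on, the sketch below bilinearly samples image features at the 2D projections of 3D query points using PyTorch's grid_sample; the function name, tensor shapes, and normalization convention are assumptions made for illustration, not details taken from the paper.

```python
import torch
import torch.nn.functional as F

def sample_pixel_aligned_features(feat_map, points_2d, img_size):
    """Bilinearly sample per-point features from an image feature map.

    feat_map:  (B, C, H, W) features from an image encoder.
    points_2d: (B, N, 2) projected 3D query points in pixel coordinates.
    img_size:  (width, height) of the original image.
    Returns:   (B, N, C) pixel-aligned features, one vector per 3D point.
    """
    w, h = img_size
    # Normalize pixel coordinates to [-1, 1], as required by grid_sample.
    grid = points_2d.clone()
    grid[..., 0] = 2.0 * grid[..., 0] / (w - 1) - 1.0
    grid[..., 1] = 2.0 * grid[..., 1] / (h - 1) - 1.0
    grid = grid.unsqueeze(2)                                       # (B, N, 1, 2)
    sampled = F.grid_sample(feat_map, grid, align_corners=True)    # (B, C, N, 1)
    return sampled.squeeze(-1).permute(0, 2, 1)                    # (B, N, C)

# Toy usage: 64-channel feature map, 1024 query points in a 512x512 image.
feats = torch.randn(1, 64, 128, 128)
pts = torch.rand(1, 1024, 2) * 511
per_point = sample_pixel_aligned_features(feats, pts, (512, 512))
print(per_point.shape)                                             # torch.Size([1, 1024, 64])
```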

Related Content

Wireless sensor networks are among the most promising technologies of the current era because of their small size, low cost, and ease of deployment. With the increasing number of wireless sensors, the probability of missing data also rises. Such incomplete data could lead to disastrous consequences if used for decision-making. There is rich literature dealing with this problem, but most approaches show performance degradation when a sizable amount of data is lost. Inspired by the emerging field of graph signal processing, this paper presents a new study of a Sobolev reconstruction algorithm in wireless sensor networks. Experimental comparisons on several publicly available datasets demonstrate that the algorithm surpasses multiple state-of-the-art techniques by a margin of up to 54%. We further show that this algorithm consistently retrieves the missing data even under massive data loss.
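
The Sobolev reconstruction idea can be made concrete with a small sketch: treat the sensor readings as a graph signal, observe only a subset of nodes, and solve a quadratic problem regularized by a Sobolev-type term $x^\top (L + \epsilon I)^\beta x$. The NumPy function below is a minimal version under those assumptions; the function name, parameters, and default values are illustrative, not the paper's settings.

```python
import numpy as np

def sobolev_reconstruction(L, sample_idx, y, alpha=1e-2, epsilon=0.1, beta=1.0):
    """Reconstruct a full graph signal from samples via Sobolev regularization.

    Solves  min_x ||x[S] - y||^2 + alpha * x^T (L + eps*I)^beta x,
    a quadratic problem with a closed-form linear-system solution.

    L:          (N, N) combinatorial graph Laplacian.
    sample_idx: indices of the observed nodes.
    y:          observed values at those nodes.
    """
    n = L.shape[0]
    M = np.zeros((len(sample_idx), n))
    M[np.arange(len(sample_idx)), sample_idx] = 1.0    # sampling matrix
    # Matrix power via eigendecomposition (L is symmetric positive semi-definite).
    vals, vecs = np.linalg.eigh(L + epsilon * np.eye(n))
    S_beta = vecs @ np.diag(vals ** beta) @ vecs.T
    A = M.T @ M + alpha * S_beta
    return np.linalg.solve(A, M.T @ y)

# Toy example: a path graph with 6 nodes, half of the values missing.
W = np.diag(np.ones(5), 1); W = W + W.T
L = np.diag(W.sum(1)) - W
observed = np.array([0, 2, 5])
x_hat = sobolev_reconstruction(L, observed, np.array([1.0, 2.0, 4.0]))
print(np.round(x_hat, 2))
```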

Deep image prior (DIP) was recently introduced as an effective unsupervised approach for image restoration tasks. DIP represents the image to be recovered as the output of a deep convolutional neural network, and learns the network's parameters such that the output matches the corrupted observation. Despite its impressive reconstructive properties, the approach is slow compared to methods learned with supervision or to traditional reconstruction techniques. To address this computational challenge, we endow DIP with a two-stage learning paradigm: (i) perform a supervised pretraining of the network on a simulated dataset; (ii) fine-tune the network's parameters to adapt to the target reconstruction task. We provide a thorough empirical analysis to shed light on the impact of pretraining in the context of image reconstruction. We show that pretraining considerably speeds up and stabilizes the subsequent reconstruction of real-measured 2D and 3D micro computed tomography data of biological specimens. The code and additional experimental materials are available at //educateddip.github.io/docs.educated_deep_image_prior/.
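
A minimal sketch of the two-stage paradigm is given below, assuming a generic forward operator (e.g. a CT projector) and a coarse initial reconstruction used as the fixed network input; the tiny convolutional network, function names, and hyperparameters are placeholders, not the architecture or settings used in the work.

```python
import torch
import torch.nn as nn

# A small stand-in for the reconstruction network (the actual work uses a U-Net).
net = nn.Sequential(nn.Conv2d(1, 32, 3, padding=1), nn.ReLU(),
                    nn.Conv2d(32, 32, 3, padding=1), nn.ReLU(),
                    nn.Conv2d(32, 1, 3, padding=1))

def pretrain(net, sim_loader, epochs=10, lr=1e-3):
    """Stage (i): supervised pretraining on simulated (coarse input, clean target) pairs."""
    opt = torch.optim.Adam(net.parameters(), lr=lr)
    for _ in range(epochs):
        for coarse, clean in sim_loader:
            opt.zero_grad()
            nn.functional.mse_loss(net(coarse), clean).backward()
            opt.step()

def finetune_dip(net, coarse_recon, observation, forward_op, steps=500, lr=1e-4):
    """Stage (ii): DIP-style adaptation of the pretrained network to one real measurement.

    coarse_recon: an initial reconstruction used as the fixed network input;
    observation:  the measured data;
    forward_op:   an assumed callable mapping image space to measurement space.
    """
    opt = torch.optim.Adam(net.parameters(), lr=lr)
    for _ in range(steps):
        opt.zero_grad()
        loss = nn.functional.mse_loss(forward_op(net(coarse_recon)), observation)
        loss.backward()
        opt.step()
    return net(coarse_recon).detach()
```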

The free-form deformation model can represent a wide range of non-rigid deformations by manipulating a control point lattice over the image. However, because of its large number of parameters and the resulting complexity of the fitness landscape, it is challenging to fit the free-form deformation model directly to the deformed image for deformation estimation. In this paper, we cast the registration task as a multi-objective optimization problem (MOP), exploiting the fact that the regions affected by different control points overlap with each other. Specifically, by partitioning the template image into several regions and measuring the similarity of each region independently, multiple objectives are built, and deformation estimation can thus be realized by solving the MOP with off-the-shelf multi-objective evolutionary algorithms (MOEAs). In addition, a coarse-to-fine strategy is realized by an image pyramid combined with control point mesh subdivision: the optimized candidate solutions of the current image level are inherited by the next level, which increases the ability to deal with large deformations. Finally, a post-processing procedure is proposed to generate a single output from the Pareto optimal solutions. Comparative experiments on both synthetic and real-world images show the effectiveness and usefulness of our deformation estimation method.
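
To make the per-region objective construction concrete, the sketch below evaluates one candidate deformation as a vector of block-wise sum-of-squared-difference values that a MOEA could minimize jointly; the dense displacement field is assumed to come from the candidate's control-point lattice (the FFD interpolation itself is not shown), and the block layout and SSD metric are illustrative choices, not the paper's exact setup.

```python
import numpy as np
from scipy.ndimage import map_coordinates

def region_objectives(template, target, displacement, n_blocks=4):
    """Evaluate one candidate deformation as a vector of per-region similarities.

    template, target: (H, W) grayscale images.
    displacement:     (2, H, W) dense displacement field (dy, dx) produced by the
                      candidate's control-point lattice.
    Returns an (n_blocks * n_blocks,) array of SSD values, one objective per region.
    """
    h, w = template.shape
    yy, xx = np.mgrid[0:h, 0:w].astype(float)
    warped = map_coordinates(template, [yy + displacement[0], xx + displacement[1]],
                             order=1, mode='nearest')
    objs = []
    for by in np.array_split(np.arange(h), n_blocks):
        for bx in np.array_split(np.arange(w), n_blocks):
            block = np.ix_(by, bx)
            objs.append(np.sum((warped[block] - target[block]) ** 2))
    return np.array(objs)

# Toy usage with a zero-deformation candidate.
tmpl = np.random.rand(64, 64); tgt = np.roll(tmpl, 2, axis=1)
print(region_objectives(tmpl, tgt, np.zeros((2, 64, 64))).shape)   # (16,)
```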

A self-driving perception model aims to extract 3D semantic representations from multiple cameras collectively into the bird's-eye-view (BEV) coordinate frame of the ego car in order to ground the downstream planner. Existing perception methods often rely on error-prone depth estimation of the whole scene, or learn sparse virtual 3D representations without the target geometry structure, both of which remain limited in performance and/or capability. In this paper, we present a novel end-to-end architecture for ego 3D representation learning from an arbitrary number of unconstrained camera views. Inspired by the ray tracing principle, we design a polarized grid of "imaginary eyes" as the learnable ego 3D representation and formulate the learning process with an adaptive attention mechanism in conjunction with 3D-to-2D projection. Critically, this formulation allows extracting rich 3D representations from 2D images without any depth supervision, and with a built-in geometry structure consistent with the BEV. Despite its simplicity and versatility, extensive experiments on standard BEV visual tasks (e.g., camera-based 3D object detection and BEV segmentation) show that our model outperforms all state-of-the-art alternatives significantly, with an extra advantage in computational efficiency from multi-task learning.
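
A minimal sketch of the underlying geometry is shown below: build a polar grid of 3D "eye" positions around the ego car, project the grid into one camera, and sample 2D features at the projections. The grid resolution, camera conventions, and bilinear sampling are assumptions for illustration; the actual model uses learned queries and adaptive attention rather than plain sampling.

```python
import torch
import torch.nn.functional as F

def polar_ego_grid(n_rays=64, n_radii=32, max_radius=50.0, height=0.0):
    """Build a polarized grid of 3D 'imaginary eye' positions around the ego car."""
    theta = torch.linspace(0, 2 * torch.pi, n_rays + 1)[:-1]
    radius = torch.linspace(1.0, max_radius, n_radii)
    r, t = torch.meshgrid(radius, theta, indexing='ij')
    return torch.stack([r * torch.cos(t), r * torch.sin(t),
                        torch.full_like(r, height)], dim=-1)    # (n_radii, n_rays, 3)

def project_and_sample(grid_xyz, feat_map, K, T_cam_from_ego, img_hw):
    """Project ego-frame grid points into one camera and sample 2D features.

    K: (3, 3) intrinsics; T_cam_from_ego: (4, 4) extrinsics; feat_map: (1, C, Hf, Wf).
    Points behind the camera mostly land outside the sampling range and receive zeros.
    """
    n_r, n_t, _ = grid_xyz.shape
    pts = grid_xyz.reshape(-1, 3)
    pts_h = torch.cat([pts, torch.ones(pts.shape[0], 1)], dim=1)
    cam = (T_cam_from_ego @ pts_h.T).T[:, :3]
    uv = (K @ cam.T).T
    uv = uv[:, :2] / uv[:, 2:3].clamp(min=1e-3)
    h, w = img_hw
    grid = torch.stack([2 * uv[:, 0] / (w - 1) - 1,
                        2 * uv[:, 1] / (h - 1) - 1], dim=-1)
    grid = grid.view(1, n_r, n_t, 2)
    return F.grid_sample(feat_map, grid, align_corners=True)    # (1, C, n_radii, n_rays)
```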

This paper presents a novel attention-based neural network for structured reconstruction, which takes a 2D raster image as input and reconstructs a planar graph depicting the underlying geometric structure. The approach detects corners and classifies edge candidates between corners in an end-to-end manner. Our contribution is a holistic edge classification architecture, which 1) initializes the feature of an edge candidate with a trigonometric positional encoding of its end-points; 2) fuses image features into each edge candidate via deformable attention; 3) employs two weight-sharing Transformer decoders to learn holistic structural patterns over the graph edge candidates; and 4) is trained with a masked learning strategy. The corner detector is a variant of the edge classification architecture, adapted to operate on pixels as corner candidates. We conduct experiments on two structured reconstruction tasks: outdoor building architecture and indoor floorplan planar graph reconstruction. Extensive qualitative and quantitative evaluations demonstrate the superiority of our approach over the state of the art. Code and pre-trained models are available at //heat-structured-reconstruction.github.io.
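
As a small illustration of step 1), the sketch below builds a sinusoidal (trigonometric) positional encoding of 2D corner coordinates and initializes each edge candidate from the encodings of its two end-points; the number of frequencies, normalization, and concatenation scheme are assumptions, not the exact configuration of the paper.

```python
import torch

def trig_positional_encoding(coords, num_freqs=16):
    """Sinusoidal encoding of 2D corner coordinates (assumed normalized to [0, 1]).

    coords: (N, 2) corner positions. Returns (N, 4 * num_freqs) features.
    """
    freqs = 2.0 ** torch.arange(num_freqs) * torch.pi            # (F,)
    angles = coords.unsqueeze(-1) * freqs                        # (N, 2, F)
    enc = torch.cat([torch.sin(angles), torch.cos(angles)], dim=-1)
    return enc.flatten(1)                                        # (N, 4F)

def edge_candidate_features(corners, edges, num_freqs=16):
    """Initialize each edge candidate from the encodings of its two end-points,
    before image features are fused in (e.g. via deformable attention)."""
    enc = trig_positional_encoding(corners, num_freqs)
    return torch.cat([enc[edges[:, 0]], enc[edges[:, 1]]], dim=-1)

corners = torch.rand(5, 2)                       # 5 detected corners
edges = torch.tensor([[0, 1], [1, 2], [3, 4]])   # 3 edge candidates
print(edge_candidate_features(corners, edges).shape)   # torch.Size([3, 128])
```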

Scene flow represents the motion of points in 3D space, the counterpart of optical flow, which represents the motion of pixels in the 2D image. However, it is difficult to obtain the ground truth of scene flow in real scenes, and recent studies rely on synthetic data for training. Therefore, how to train a scene flow network with unsupervised methods based on real-world data is of crucial significance. A novel unsupervised learning method for scene flow is proposed in this paper, which uses the images of two consecutive frames taken by a monocular camera, without ground-truth scene flow, for training. Our method realizes the goal of training a scene flow network with real-world data, which bridges the gap between training data and test data and broadens the scope of data available for training. Unsupervised learning of scene flow in this paper mainly consists of two parts: (i) depth estimation and camera pose estimation, and (ii) scene flow estimation based on four different loss functions. Depth estimation and camera pose estimation obtain the depth maps and the camera pose between two consecutive frames, which provide further information for the subsequent scene flow estimation. We then use a depth consistency loss, a dynamic-static consistency loss, a Chamfer loss, and a Laplacian regularization loss to carry out unsupervised training of the scene flow network. To our knowledge, this is the first paper to realize unsupervised learning of 3D scene flow from a monocular camera. Experimental results on KITTI show that our method for unsupervised learning of scene flow achieves strong performance compared to the traditional methods Iterative Closest Point (ICP) and Fast Global Registration (FGR). The source code is available at: //github.com/IRMVLab/3DUnMonoFlow.
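
Of the four losses, the Chamfer term is the most self-contained; the sketch below shows a standard symmetric Chamfer distance between two point clouds in PyTorch. The exact pairing of point sets (e.g. flow-warped frame-t points versus frame-t+1 points) and any weighting are assumptions; the paper may implement the loss differently.

```python
import torch

def chamfer_loss(p1, p2):
    """Symmetric Chamfer distance between two point clouds.

    p1: (N, 3) points lifted from frame t with the predicted scene flow applied;
    p2: (M, 3) points lifted from frame t+1.
    """
    d = torch.cdist(p1, p2)                       # (N, M) pairwise distances
    return d.min(dim=1).values.mean() + d.min(dim=0).values.mean()

# Toy usage: a slightly shifted copy should have a small Chamfer distance.
pts = torch.rand(1000, 3)
print(chamfer_loss(pts + torch.tensor([0.01, 0.0, 0.0]), pts))
```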

The labels for monocular 3D object detection (M3OD) are expensive to obtain. Meanwhile, abundant unlabeled data usually exists in practical applications, and pre-training is an efficient way of exploiting the knowledge in unlabeled data. However, the pre-training paradigm for M3OD is hardly studied, and we aim to bridge this gap in this work. To this end, we first draw two observations: (1) the guideline for devising pre-training tasks is to imitate the representation of the target task; (2) combining depth estimation and 2D object detection is a promising M3OD pre-training baseline. Following this guideline, we then propose several strategies to further improve the baseline, mainly target-guided semi-dense depth estimation, keypoint-aware 2D object detection, and class-level loss adjustment. Combining all the developed techniques, the resulting pre-training framework produces pre-trained backbones that improve M3OD performance significantly on both the KITTI-3D and nuScenes benchmarks. For example, by applying a DLA34 backbone to a naive center-based M3OD detector, the moderate ${\rm AP}_{3D}70$ score of Car on the KITTI-3D test set is boosted by 18.71\% and the NDS score on the nuScenes validation set is improved by 40.41\%, in relative terms.
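
One plausible reading of "target guided semi-dense depth estimation" is to supervise predicted depth only at pixels that lie inside 2D object boxes and have valid projected LiDAR depth. The sketch below implements that reading; the masking scheme, loss type, and function name are assumptions, and the paper's actual weighting may differ.

```python
import torch

def semi_dense_depth_loss(pred_depth, lidar_depth, boxes_2d):
    """Depth loss restricted to pixels inside 2D object boxes with valid LiDAR depth.

    pred_depth, lidar_depth: (H, W); lidar_depth is 0 where no LiDAR point projects.
    boxes_2d: (K, 4) boxes as (x1, y1, x2, y2) in pixel coordinates.
    """
    h, w = pred_depth.shape
    mask = torch.zeros(h, w, dtype=torch.bool)
    for x1, y1, x2, y2 in boxes_2d.round().long().tolist():
        mask[y1:y2, x1:x2] = True                # keep only foreground targets
    mask &= lidar_depth > 0                      # keep only pixels with sparse depth
    if not mask.any():
        return pred_depth.sum() * 0.0            # keep the graph, contribute zero loss
    return torch.abs(pred_depth[mask] - lidar_depth[mask]).mean()
```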

Neural implicit functions have recently shown promising results for surface reconstruction from multiple views. However, current methods still suffer from excessive time complexity and poor robustness when reconstructing unbounded or complex scenes. In this paper, we present RegSDF, which shows that proper point cloud supervision and geometry regularization are sufficient to produce high-quality and robust reconstruction results. Specifically, RegSDF takes an additional oriented point cloud as input, and optimizes a signed distance field and a surface light field within a differentiable rendering framework. We also introduce two critical regularizations for this optimization. The first is a Hessian regularization that smoothly diffuses the signed distance values over the entire distance field given noisy and incomplete input. The second is a minimal surface regularization that compactly interpolates and extrapolates the missing geometry. Extensive experiments are conducted on the DTU, BlendedMVS, and Tanks and Temples datasets. Compared with recent neural surface reconstruction approaches, RegSDF is able to reconstruct surfaces with fine details even for open scenes with complex topologies and unstructured camera trajectories.
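
In the spirit of the Hessian regularization, the sketch below penalizes the second derivatives of an SDF network at sampled points using double backpropagation; the tiny MLP, sampling range, and the Frobenius-norm form of the penalty are generic assumptions, not the paper's exact formulation.

```python
import torch
import torch.nn as nn

# A small stand-in SDF network mapping 3D points to a signed distance.
sdf = nn.Sequential(nn.Linear(3, 64), nn.Softplus(),
                    nn.Linear(64, 64), nn.Softplus(),
                    nn.Linear(64, 1))

def second_order_smoothness(sdf, points):
    """Generic second-derivative penalty on a signed distance network.

    Penalizes the squared norm of the Hessian rows of the SDF at the sampled
    points, which encourages smoothly diffusing distance values.
    """
    points = points.requires_grad_(True)
    d = sdf(points)
    grad = torch.autograd.grad(d.sum(), points, create_graph=True)[0]   # (N, 3)
    penalty = 0.0
    for i in range(3):   # second derivatives w.r.t. each input coordinate
        g2 = torch.autograd.grad(grad[:, i].sum(), points, create_graph=True)[0]
        penalty = penalty + (g2 ** 2).sum(dim=1).mean()
    return penalty

pts = torch.rand(256, 3) * 2 - 1
print(second_order_smoothness(sdf, pts))
```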

In this work we present a novel method for reconstructing 3D surfaces using a multi-beam imaging sonar. We integrate the intensities measured by the sonar from different viewpoints for fixed cell positions in a 3D grid. For each cell we integrate a feature vector that holds the mean intensity for a discretized range of viewpoints. Based on the feature vectors and independent sparse range measurements that act as ground truth information, we train convolutional neural networks that allow us to predict the signed distance and direction to the nearest surface for each cell. The predicted signed distances can be projected into a truncated signed distance field (TSDF) along the predicted directions. Utilizing the marching cubes algorithm, a polygon mesh can be rendered from the TSDF. Our method allows a dense 3D reconstruction from a limited set of viewpoints and was evaluated on three real-world datasets.
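
The final projection and meshing steps can be sketched compactly: truncate the predicted signed distances into a TSDF volume and extract the zero level set with marching cubes (here via scikit-image). The truncation value, voxel size, and the fact that the predicted directions are ignored are simplifications for illustration; the full method distributes values along the predicted directions.

```python
import numpy as np
from skimage import measure

def tsdf_from_predictions(signed_dist, directions, truncation=0.3):
    """Project per-cell signed-distance predictions into a truncated SDF volume.

    signed_dist: (X, Y, Z) predicted signed distance to the nearest surface per cell.
    directions:  (X, Y, Z, 3) predicted directions (unused in this minimal sketch).
    """
    return np.clip(signed_dist, -truncation, truncation)

def mesh_from_tsdf(tsdf, voxel_size=0.1):
    """Extract a polygon mesh from the zero level set via marching cubes."""
    verts, faces, normals, _ = measure.marching_cubes(tsdf, level=0.0)
    return verts * voxel_size, faces, normals

# Toy volume: a sphere of radius 1 m in a 4 m cube discretized at 10 cm.
grid = np.mgrid[-20:20, -20:20, -20:20] * 0.1
sdf = np.linalg.norm(grid, axis=0) - 1.0
verts, faces, _ = mesh_from_tsdf(tsdf_from_predictions(sdf, None))
print(verts.shape, faces.shape)
```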

We present a monocular Simultaneous Localization and Mapping (SLAM) system that uses high-level object and plane landmarks in addition to points. The resulting map is denser, more compact, and more meaningful than that of point-only SLAM. We first propose a high-order graphical model to jointly infer 3D objects and layout planes from a single image, considering occlusions and semantic constraints. The extracted cuboid objects and layout planes are further optimized in a unified SLAM framework. Objects and planes provide more semantic constraints, such as Manhattan and object-supporting relationships, than points do. Experiments on various public and self-collected datasets, including ICL NUIM and TUM mono, show that our algorithm can improve camera localization accuracy compared to state-of-the-art SLAM methods and also generate dense maps in many structured environments.
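
To illustrate what an object-supporting constraint can look like in such a joint optimization, the sketch below computes a simple residual, the signed distance from a cuboid's bottom face to a supporting plane, which a least-squares SLAM back-end would drive toward zero. The parameterization (gravity-aligned cuboid, plane as unit normal plus offset) and the function name are illustrative assumptions, not the paper's exact factor.

```python
import numpy as np

def supporting_residual(cuboid_center, cuboid_dims, plane):
    """Residual for an 'object supported by plane' factor.

    cuboid_center: (3,) object center in the world frame (gravity along -z assumed).
    cuboid_dims:   (3,) full extents (dx, dy, dz).
    plane:         (4,) parameters (a, b, c, d) with unit normal, a*x + b*y + c*z + d = 0.
    """
    normal, d = plane[:3], plane[3]
    bottom_center = cuboid_center - np.array([0.0, 0.0, cuboid_dims[2] / 2.0])
    return float(normal @ bottom_center + d)

# A 0.5 m-tall box resting 2 cm above the floor plane z = 0.
print(supporting_residual(np.array([1.0, 2.0, 0.27]), np.array([0.4, 0.4, 0.5]),
                          np.array([0.0, 0.0, 1.0, 0.0])))   # 0.02
```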
