We present NeRF-Det, a novel method for indoor 3D detection with posed RGB images as input. Unlike existing indoor 3D detection methods that struggle to model scene geometry, our method makes novel use of NeRF in an end-to-end manner to explicitly estimate 3D geometry, thereby improving 3D detection performance. Specifically, to avoid the significant extra latency associated with per-scene optimization of NeRF, we introduce sufficient geometry priors to enhance the generalizability of the NeRF-MLP. Furthermore, we subtly connect the detection and NeRF branches through a shared MLP, enabling an efficient adaptation of NeRF to detection and yielding geometry-aware volumetric representations for 3D detection. Our method outperforms state-of-the-art methods by 3.9 mAP and 3.1 mAP on the ScanNet and ARKitScenes benchmarks, respectively. We provide extensive analysis to shed light on how NeRF-Det works. As a result of our joint-training design, NeRF-Det generalizes well to unseen scenes for object detection, view synthesis, and depth estimation tasks without requiring per-scene optimization. Code is available at //github.com/facebookresearch/NeRF-Det.
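To make the shared-MLP idea above concrete, here is a minimal, hypothetical PyTorch sketch (not the released NeRF-Det code): a single MLP predicts per-point opacity from multi-view features, and that opacity is reused to weight the features that form the geometry-aware detection volume. All names, sizes, and the sigmoid opacity head are illustrative assumptions.

```python
import torch
import torch.nn as nn

class SharedGeometryMLP(nn.Module):
    """Hypothetical shared MLP: its opacity output serves both the NeRF branch
    (for volume rendering) and the detection branch (for feature weighting)."""
    def __init__(self, feat_dim=128, hidden=256):
        super().__init__()
        self.mlp = nn.Sequential(
            nn.Linear(feat_dim, hidden), nn.ReLU(inplace=True),
            nn.Linear(hidden, 1),
        )

    def forward(self, point_feats):
        # point_feats: (N_points, feat_dim) multi-view features sampled at 3D grid points
        return torch.sigmoid(self.mlp(point_feats))  # per-point opacity in [0, 1]

# Toy usage: opacity-weighted features form a geometry-aware volume for the detector.
feats = torch.randn(4096, 128)           # features for 4096 grid points (illustrative)
opacity = SharedGeometryMLP()(feats)     # (4096, 1)
geometry_aware_volume = feats * opacity  # down-weights features in empty space
```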
Monocular depth estimation is an ill-posed problem, as the same 2D image can be projected from infinitely many 3D scenes. Although the leading algorithms in this field have reported significant improvement, they are essentially fitted to the particular combination of pictorial observations and camera parameters (i.e., intrinsics and extrinsics), strongly limiting their generalizability in real-world scenarios. To cope with this challenge, this paper proposes a novel ground embedding module that decouples camera parameters from pictorial cues, thus promoting generalization. Given the camera parameters, the proposed module generates the ground depth, which is stacked with the input image and referenced in the final depth prediction. A ground attention mechanism is designed in the module to optimally combine the ground depth with the residual depth. Our ground embedding is highly flexible and lightweight, leading to a plug-in module that is amenable to integration into various depth estimation networks. Experiments reveal that our approach achieves state-of-the-art results on popular benchmarks and, more importantly, yields significant generalization improvements on a wide range of cross-domain tests.
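As a rough illustration of the ground-depth prior, the following sketch assumes a simplified pinhole camera at height h above a flat ground plane, with the y-axis pointing down and no roll (the paper's module handles full extrinsics); each pixel ray is intersected with the ground plane, and a learned attention map then blends this prior with the network's own prediction. Every name and value here is an assumption for illustration.

```python
import torch

def ground_depth(K, h, H, W):
    # Back-project each pixel to a ray and intersect it with the plane y = h.
    v, u = torch.meshgrid(torch.arange(H, dtype=torch.float32),
                          torch.arange(W, dtype=torch.float32), indexing="ij")
    rays = torch.einsum("ij,jhw->ihw", torch.inverse(K),
                        torch.stack([u, v, torch.ones_like(u)]))
    t = h / rays[1].clamp(min=1e-6)   # distance along the ray to the ground plane
    return t * rays[2]                # depth (z) at the intersection; pixels above the
                                      # horizon get arbitrarily large values here

K = torch.tensor([[500., 0., 320.], [0., 500., 240.], [0., 0., 1.]])
g = ground_depth(K, h=1.5, H=480, W=640)   # (480, 640) ground-prior depth map

# Hypothetical "ground attention": alpha decides, per pixel, how much to trust the
# ground prior versus the depth predicted by the network (placeholders below).
alpha = torch.rand(480, 640)               # would come from a learned attention head
residual_pred = torch.rand(480, 640) * 10.0
final_depth = alpha * g + (1.0 - alpha) * residual_pred
```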
This paper presents DFL-TORO, a novel Demonstration Framework for Learning Time-Optimal Robotic tasks via One-shot kinesthetic demonstration. It aims to optimize the process of Learning from Demonstration (LfD) as applied in the manufacturing sector. As the effectiveness of LfD is challenged by the quality and efficiency of human demonstrations, our approach offers a streamlined method to intuitively capture task requirements from human teachers by reducing the need for multiple demonstrations. Furthermore, we propose an optimization-based smoothing algorithm that ensures time-optimal and jerk-regulated demonstration trajectories while adhering to the robot's kinematic constraints. The result is a significant reduction in noise, thereby boosting the robot's operational efficiency. Evaluations using a Franka Emika Research 3 (FR3) robot on a reaching task further substantiate the efficacy of our framework, highlighting its potential to transform kinesthetic demonstrations in contemporary manufacturing environments.
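As a loose illustration of what jerk-regulated smoothing of a noisy demonstration can look like (this is not DFL-TORO's algorithm), the sketch below fits a trajectory to recorded waypoints while penalizing finite-difference jerk and treating a joint-velocity limit as a soft penalty; all constants and shapes are made-up placeholders.

```python
import numpy as np
from scipy.optimize import minimize

dt, T, dof = 0.02, 100, 7                       # 2 s demonstration for a 7-DoF arm (toy)
rng = np.random.default_rng(0)
q_demo = np.sin(np.linspace(0, np.pi, T))[:, None] + 0.01 * rng.standard_normal((T, dof))
v_max = 2.0                                     # placeholder joint-velocity limit (rad/s)

def cost(x):
    q = x.reshape(T, dof)
    vel = np.diff(q, 1, axis=0) / dt
    jerk = np.diff(q, 3, axis=0) / dt**3
    fidelity = np.sum((q - q_demo) ** 2)        # stay close to the demonstration
    smoothness = 1e-9 * np.sum(jerk ** 2)       # jerk regulation
    limit_violation = 1e2 * np.sum(np.maximum(np.abs(vel) - v_max, 0.0) ** 2)
    return fidelity + smoothness + limit_violation

res = minimize(cost, q_demo.ravel(), method="L-BFGS-B", options={"maxiter": 50})
q_smooth = res.x.reshape(T, dof)                # smoothed, limit-respecting trajectory
```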
We present DFormer, a novel RGB-D pretraining framework for learning transferable representations for RGB-D segmentation tasks. DFormer has two key innovations: 1) unlike previous works that aim to encode only RGB features, DFormer comprises a sequence of RGB-D blocks tailored to encode both RGB and depth information through a novel building-block design; 2) we pre-train the backbone on image-depth pairs from ImageNet-1K, so that DFormer is endowed with the capacity to encode RGB-D representations. This avoids the mismatched encoding of 3D geometric relationships in depth maps by RGB-pretrained backbones, an issue that is widespread in existing methods but has remained unresolved. We fine-tune the pre-trained DFormer on two popular RGB-D tasks, i.e., RGB-D semantic segmentation and RGB-D salient object detection, with a lightweight decoder head. Experimental results show that our DFormer achieves new state-of-the-art performance on these two tasks with less than half of the computational cost of the current best methods on two RGB-D segmentation datasets and five RGB-D saliency datasets. Our code is available at: //github.com/VCIP-RGBD/DFormer.
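A minimal, hypothetical sketch of the joint RGB-D encoding idea (not the released DFormer block): each block keeps separate RGB and depth feature streams and lets them exchange information through 1x1 projections, so depth geometry is encoded by pretrained weights rather than by an RGB-only backbone. Channel sizes and layer choices are assumptions.

```python
import torch
import torch.nn as nn

class RGBDBlock(nn.Module):
    def __init__(self, c_rgb=64, c_dep=32):
        super().__init__()
        self.rgb_conv = nn.Conv2d(c_rgb, c_rgb, 3, padding=1)
        self.dep_conv = nn.Conv2d(c_dep, c_dep, 3, padding=1)
        self.dep_to_rgb = nn.Conv2d(c_dep, c_rgb, 1)  # depth cues modulate RGB features
        self.rgb_to_dep = nn.Conv2d(c_rgb, c_dep, 1)  # RGB context refines depth features
        self.act = nn.GELU()

    def forward(self, x_rgb, x_dep):
        r = self.act(self.rgb_conv(x_rgb) + self.dep_to_rgb(x_dep))
        d = self.act(self.dep_conv(x_dep) + self.rgb_to_dep(x_rgb))
        return x_rgb + r, x_dep + d               # per-modality residual connections

rgb, dep = torch.randn(1, 64, 56, 56), torch.randn(1, 32, 56, 56)
rgb, dep = RGBDBlock()(rgb, dep)                  # features for the next block / decoder
```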
This paper introduces the HumTrans dataset, which is publicly available and primarily designed for humming melody transcription. The dataset can also serve as a foundation for downstream tasks such as humming-melody-based music generation. It consists of 500 musical compositions of different genres and languages, with each composition divided into multiple segments; in total, the dataset comprises 1000 music segments. To collect this humming dataset, we employed 10 college students, all of whom are either music majors or proficient in playing at least one musical instrument. Each of them hummed every segment twice using the web recording interface provided by the website we designed. The humming recordings were sampled at a frequency of 44,100 Hz. During the humming session, the main interface provides a musical score for students to reference, with the melody audio playing simultaneously to aid in capturing both melody and rhythm. The dataset encompasses approximately 56.22 hours of audio, making it the largest known humming dataset to date. The dataset will be released on Hugging Face, and we will provide a GitHub repository containing baseline results and evaluation code.
Argument mining analyzes argument structure and extracts important argument information from unstructured text. An argument mining system can help people automatically uncover the causal and logical information behind a text. As argumentative corpora grow, with more and more people arguing and debating on social media, mining arguments from them is becoming increasingly critical. However, argument mining remains a major challenge in natural language processing, and the related techniques are not yet mature. For example, non-tree argument mining is still underexplored: most works focus only on extracting tree-structured argument information. Moreover, current methods cannot accurately describe and capture argument relations, and they do not predict relation types. In this paper, we propose a novel neural model called AutoAM to address these problems. We first introduce an argument component attention mechanism into our model, which captures the relevant information between argument components so that the model can better perform argument mining. Our model is a universal end-to-end framework that can analyze argument structure without constraints such as tree structure and that completes the three subtasks of argument mining in one model. Experimental results show that our model outperforms existing works on several metrics across two public datasets.
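To illustrate the component-attention idea in a non-tree setting, the sketch below lets argument-component embeddings attend to one another and then classifies a relation type for every ordered pair, without imposing any tree constraint; module names, dimensions, and the number of relation types are hypothetical, not the AutoAM implementation.

```python
import torch
import torch.nn as nn

class ComponentRelationHead(nn.Module):
    def __init__(self, dim=256, n_relation_types=3):
        super().__init__()
        self.attn = nn.MultiheadAttention(dim, num_heads=4, batch_first=True)
        self.pair_cls = nn.Sequential(nn.Linear(2 * dim, dim), nn.ReLU(),
                                      nn.Linear(dim, n_relation_types))

    def forward(self, comps):                     # comps: (B, N, dim) component embeddings
        ctx, _ = self.attn(comps, comps, comps)   # components attend to each other
        B, N, D = ctx.shape
        src = ctx.unsqueeze(2).expand(B, N, N, D) # source component of each candidate pair
        dst = ctx.unsqueeze(1).expand(B, N, N, D) # target component of each candidate pair
        return self.pair_cls(torch.cat([src, dst], dim=-1))  # (B, N, N, relation types)

logits = ComponentRelationHead()(torch.randn(2, 5, 256))  # relation-type logits per pair
```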
In this work we present CppFlow, a novel and performant planner for the Cartesian Path Planning problem, which finds valid trajectories up to 129x faster than current methods while also succeeding on more difficult problems where others fail. At the core of the proposed algorithm is a learned, generative Inverse Kinematics solver, which efficiently produces entire, promising candidate solution trajectories on the GPU. Precise, valid solutions are then found through classical approaches such as differentiable programming, global search, and optimization. By combining approaches from these two paradigms we get the best of both worlds: efficient approximate solutions from generative AI that are made exact using the guarantees of traditional planning and optimization. We evaluate our system against other state-of-the-art methods on a set of established baselines, as well as new ones introduced in this work, and find that our method significantly outperforms others in terms of time to find a valid solution and planning success rate, while performing comparably in terms of trajectory length over time. The work is made open source and available for use upon acceptance.
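The sketch below illustrates the two-stage pattern described above with stand-in components (a random "generative" sampler and a toy differentiable forward-kinematics function, neither of which is the CppFlow implementation): many whole candidate trajectories are scored in a batch, and the most promising one is refined by gradient-based optimization.

```python
import torch

def fk(q):
    # toy differentiable stand-in for forward kinematics: joints -> 3D end-effector point
    return torch.stack([q.sum(-1), q.cos().sum(-1), q.sin().sum(-1)], dim=-1)

def sample_candidates(n_cand, n_waypoints, dof):
    # stand-in for a learned generative IK model that proposes batched trajectories
    return torch.randn(n_cand, n_waypoints, dof)

target_path = torch.randn(50, 3)              # desired end-effector path (50 waypoints)
cands = sample_candidates(256, 50, 7)         # 256 candidate joint-space trajectories
err = ((fk(cands) - target_path) ** 2).sum(dim=(1, 2))
best = cands[err.argmin()].clone().requires_grad_(True)

opt = torch.optim.Adam([best], lr=1e-2)       # classical refinement of the best sample
for _ in range(200):
    opt.zero_grad()
    tracking = ((fk(best) - target_path) ** 2).sum()
    smoothness = 1e-3 * (best[1:] - best[:-1]).pow(2).sum()
    (tracking + smoothness).backward()
    opt.step()
```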
Remote sensing image semantic segmentation is an important problem for remote sensing image interpretation. Although remarkable progress has been achieved, existing deep neural network methods suffer from a reliance on massive training data. Few-shot remote sensing semantic segmentation aims to learn to segment target objects in a query image using only a few annotated support images of the target class. The limitations of most existing few-shot learning methods stem primarily from their sole focus on extracting information from the support images, which fails to effectively address the large variance in the appearance and scale of geographic objects. To tackle these challenges, we propose a Self-Correlation and Cross-Correlation Learning Network for few-shot remote sensing image semantic segmentation. Our model enhances generalization by considering both the self-correlation and the cross-correlation between support and query images when making segmentation predictions. To further exploit the self-correlation of the query image, we adopt a classical spectral method to produce a class-agnostic segmentation mask based on the basic visual information of the image. Extensive experiments on two remote sensing image datasets demonstrate the effectiveness and superiority of our model in few-shot remote sensing image semantic segmentation. Code and models will be available at //github.com/linhanwang/SCCNet.
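A small sketch of the two correlation cues under simplified assumptions (cosine similarities on flattened feature maps; this is not the SCCNet architecture): cross-correlation compares query pixels with a masked support prototype, while self-correlation builds a pixel-affinity graph on the query whose second Laplacian eigenvector gives a coarse class-agnostic grouping, in the spirit of classical spectral methods.

```python
import torch
import torch.nn.functional as F

q = F.normalize(torch.randn(256, 32 * 32), dim=0)    # query features (C, HW), unit columns
s = F.normalize(torch.randn(256, 32 * 32), dim=0)    # support features (C, HW)
s_mask = (torch.rand(32 * 32) > 0.7).float()          # flattened support annotation

# Cross-correlation: similarity of every query pixel to the masked support prototype.
proto = F.normalize((s * s_mask).sum(1) / s_mask.sum().clamp(min=1), dim=0)
cross_corr = (proto @ q).reshape(32, 32)

# Self-correlation: spectral grouping of query pixels from their affinity matrix.
affinity = (q.T @ q).clamp(min=0)
laplacian = torch.diag(affinity.sum(1)) - affinity
_, eigvecs = torch.linalg.eigh(laplacian)
fiedler = eigvecs[:, 1].reshape(32, 32)                # 2nd eigenvector ~ 2-way partition
class_agnostic_mask = (fiedler > fiedler.median()).float()
```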
Neural Radiance Fields (NeRF) employ multi-view images for 3D scene representation and have shown remarkable performance. As one of the primary sources of multi-view images, multi-camera systems encounter challenges such as varying intrinsic parameters and frequent pose changes. Most previous NeRF-based methods assume a single global camera and seldom consider scenarios with multiple cameras. Besides, some pose-robust methods still remain susceptible to suboptimal solutions when poses are poorly initialized. In this paper, we propose MC-NeRF, a method that jointly optimizes both intrinsic and extrinsic parameters for bundle-adjusting Neural Radiance Fields. Firstly, we conduct a theoretical analysis to tackle the degenerate case and coupling issue that arise from the joint optimization of intrinsic and extrinsic parameters. Secondly, based on the proposed solutions, we introduce an efficient calibration image acquisition scheme for multi-camera systems, including the design of a calibration object. Lastly, we present a global end-to-end network with a training sequence that enables the regression of intrinsic and extrinsic parameters along with the rendering network. Moreover, since most existing datasets are designed for a single unique camera, we create a new dataset that includes four different styles of multi-camera acquisition systems, allowing readers to generate custom datasets. Experiments confirm the effectiveness of our method when each image corresponds to different camera parameters. Specifically, we use up to 110 images, each with different intrinsic and extrinsic parameters, to achieve 3D scene representation without providing initial poses. The code and supplementary materials are available at //in2-viaun.github.io/MC-NeRF.
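Conceptually, the joint optimization can be pictured as giving every image its own learnable intrinsics and extrinsics and updating them with the same photometric loss that trains the radiance field. The sketch below is a deliberately simplified illustration of that idea (toy rendering, first-order rotation, placeholder shapes), not the MC-NeRF network.

```python
import torch
import torch.nn as nn

n_images = 110
focal = nn.Parameter(torch.full((n_images,), 500.0))   # per-image intrinsics
rot = nn.Parameter(torch.zeros(n_images, 3))           # per-image axis-angle rotations
trans = nn.Parameter(torch.zeros(n_images, 3))         # per-image translations
radiance_field = nn.Sequential(nn.Linear(3, 128), nn.ReLU(), nn.Linear(128, 3))

opt = torch.optim.Adam([
    {"params": [focal, rot, trans], "lr": 1e-3},         # camera parameters
    {"params": radiance_field.parameters(), "lr": 5e-4}, # rendering network
])

def render(img_idx, pts):
    # placeholder: apply a first-order rotation plus translation to sample points;
    # a full pipeline would build rays from focal[img_idx] and volume-render them
    pts = pts + torch.cross(rot[img_idx].expand_as(pts), pts, dim=-1) + trans[img_idx]
    return radiance_field(pts)

target_pixels = torch.rand(1024, 3)                     # sampled ground-truth colors
loss = ((render(0, torch.rand(1024, 3)) - target_pixels) ** 2).mean()
loss.backward()                                         # gradients reach cameras and field
opt.step()
```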
Inverse rendering methods aim to estimate geometry, materials, and illumination from multi-view RGB images. To achieve better decomposition, recent approaches attempt to model the indirect illumination reflected from different materials via Spherical Gaussians (SG), which, however, tends to blur high-frequency reflection details. In this paper, we propose an end-to-end inverse rendering pipeline that decomposes materials and illumination from multi-view images while considering near-field indirect illumination. In a nutshell, we introduce Monte Carlo sampling-based path tracing and cache the indirect illumination as neural radiance, enabling a physics-faithful and easy-to-optimize inverse rendering method. To enhance efficiency and practicality, we leverage SG to represent smooth environment illumination and apply importance sampling techniques. To supervise indirect illumination from unobserved directions, we develop a novel radiance consistency constraint between the implicit neural radiance and the path tracing results of unobserved rays, alongside the joint optimization of materials and illumination, thus significantly improving the decomposition performance. Extensive experiments demonstrate that our method outperforms the state-of-the-art on multiple synthetic and real datasets, especially in terms of inter-reflection decomposition. Our code and data are available at //woolseyyy.github.io/nefii/.
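A hedged sketch of the radiance-consistency idea (illustrative only, not the released code): for secondary rays that no camera observes directly, the cached neural radiance at a surface point and outgoing direction is pushed toward a Monte Carlo path-tracing estimate of the same quantity, with the traced estimate stubbed out below.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

# neural cache mapping (surface point, outgoing direction) -> RGB radiance
radiance_cache = nn.Sequential(nn.Linear(6, 128), nn.ReLU(), nn.Linear(128, 3))

def path_trace_estimate(x, d, n_samples=16):
    # stand-in for a one-bounce Monte Carlo estimate using the current BRDF/illumination
    return torch.rand(x.shape[0], 3)

x = torch.rand(512, 3)                               # surface points hit by secondary rays
d = F.normalize(torch.randn(512, 3), dim=-1)         # outgoing directions
cached = radiance_cache(torch.cat([x, d], dim=-1))
traced = path_trace_estimate(x, d)
consistency_loss = ((cached - traced) ** 2).mean()   # added to material/illumination losses
consistency_loss.backward()
```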
This paper presents a novel hybrid Automatic Speech Recognition (ASR) system designed specifically for resource-constrained robots. The proposed approach combines Hidden Markov Models (HMMs) with deep learning models and leverages socket programming to distribute processing tasks effectively. In this architecture, the HMM-based processing takes place within the robot, while a separate PC handles the deep learning model. This synergy between HMMs and deep learning significantly enhances speech recognition accuracy. We conducted experiments across various robotic platforms, demonstrating real-time and precise speech recognition capabilities. Notably, the system exhibits adaptability to changing acoustic conditions and compatibility with low-power hardware, making it highly effective in environments with limited computational resources. This hybrid ASR paradigm opens up promising possibilities for seamless human-robot interaction. In conclusion, our research introduces a pioneering dimension to ASR techniques tailored for robotics. By employing socket programming to distribute processing tasks across distinct devices and strategically combining HMMs with deep learning models, our hybrid ASR system showcases its potential to enable robots to comprehend and respond to spoken language adeptly, even in environments with restricted computational resources. This paradigm sets an innovative course for enhancing human-robot interaction across a wide range of real-world scenarios.
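A generic sketch of the socket split described above (hypothetical address, port, and message framing; not the paper's protocol): the robot sends length-prefixed feature frames to the PC over TCP and reads back the text produced by the deep learning model.

```python
import socket
import struct
import numpy as np

HOST, PORT = "192.168.1.10", 50007        # placeholder address of the PC running the model

def recognize(frames: np.ndarray) -> str:
    payload = frames.astype(np.float32).tobytes()
    with socket.create_connection((HOST, PORT)) as sock:
        sock.sendall(struct.pack("!I", len(payload)) + payload)  # length-prefixed features
        reply_len = struct.unpack("!I", sock.recv(4))[0]         # length of the reply
        reply = b""
        while len(reply) < reply_len:
            reply += sock.recv(4096)
    return reply.decode("utf-8")           # transcription from the PC-side deep model

# e.g. frames produced by the on-robot front end (placeholder: 100 frames x 13 MFCCs)
# print(recognize(np.zeros((100, 13))))
```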