一级a视频免费一区二区,亚洲国产中文在线有精品,亚洲国产一级毛片首次亮相,日批视频网站免费看

We consider a category-level perception problem, where one is given 2D or 3D sensor data picturing an object of a given category (e.g., a car), and has to reconstruct the 3D pose and shape of the object despite intra-class variability (i.e., different car models have different shapes). We consider an active shape model, where -for an object category- we are given a library of potential CAD models describing objects in that category, and we adopt a standard formulation where pose and shape are estimated from 2D or 3D keypoints via non-convex optimization. Our first contribution is to develop PACE3D* and PACE2D*, the first certifiably optimal solvers for pose and shape estimation using 3D and 2D keypoints, respectively. Both solvers rely on the design of tight (i.e., exact) semidefinite relaxations. Our second contribution is to develop outlier-robust versions of both solvers, named PACE3D# and PACE2D#. Towards this goal, we propose ROBIN, a general graph-theoretic framework to prune outliers, which uses compatibility hypergraphs to model measurements' compatibility. We show that in category-level perception problems these hypergraphs can be built from the winding orders of the keypoints (in 2D) or their convex hulls (in 3D), and many outliers can be filtered out via maximum hyperclique computation. The last contribution is an extensive experimental evaluation. Besides providing an ablation study on simulated datasets and on the PASCAL3D+ dataset, we combine our solver with a deep keypoint detector, and show that PACE3D# improves over the state of the art in vehicle pose estimation in the ApolloScape datasets, and its runtime is compatible with practical applications. We release our code at //github.com/MIT-SPARK/PACE.

相關內容

估計/估計量

關注 3

估計/估計量 · Learning · MoDELS · Extensibility · Performer ·

2023 年 6 月 29 日

Learning Structure-Guided Diffusion Model for 2D Human Pose Estimation

Zhongwei Qiu,Qiansheng Yang,Jian Wang,Xiyu Wang,Chang Xu,Dongmei Fu,Kun Yao,Junyu Han,Errui Ding,Jingdong Wang

One of the mainstream schemes for 2D human pose estimation (HPE) is learning keypoints heatmaps by a neural network. Existing methods typically improve the quality of heatmaps by customized architectures, such as high-resolution representation and vision Transformers. In this paper, we propose \textbf{DiffusionPose}, a new scheme that formulates 2D HPE as a keypoints heatmaps generation problem from noised heatmaps. During training, the keypoints are diffused to random distribution by adding noises and the diffusion model learns to recover ground-truth heatmaps from noised heatmaps with respect to conditions constructed by image feature. During inference, the diffusion model generates heatmaps from initialized heatmaps in a progressive denoising way. Moreover, we further explore improving the performance of DiffusionPose with conditions from human structural information. Extensive experiments show the prowess of our DiffusionPose, with improvements of 1.6, 1.2, and 1.2 mAP on widely-used COCO, CrowdPose, and AI Challenge datasets, respectively.

MoDELS · Vision · 機器人 · Continuity · state-of-the-art ·

2023 年 6 月 29 日

Spatial Reasoning via Deep Vision Models for Robotic Sequential Manipulation

Hongyou Zhou,Ingmar Fabian Schubert,Marc Toussaint,Ozgur S. Oguz

from arxiv, 8 pages, 8 figures, IROS 2023

In this paper, we propose using deep neural architectures (i.e., vision transformers and ResNet) as heuristics for sequential decision-making in robotic manipulation problems. This formulation enables predicting the subset of objects that are relevant for completing a task. Such problems are often addressed by task and motion planning (TAMP) formulations combining symbolic reasoning and continuous motion planning. In essence, the action-object relationships are resolved for discrete, symbolic decisions that are used to solve manipulation motions (e.g., via nonlinear trajectory optimization). However, solving long-horizon tasks requires consideration of all possible action-object combinations which limits the scalability of TAMP approaches. To overcome this combinatorial complexity, we introduce a visual perception module integrated with a TAMP-solver. Given a task and an initial image of the scene, the learned model outputs the relevancy of objects to accomplish the task. By incorporating the predictions of the model into a TAMP formulation as a heuristic, the size of the search space is significantly reduced. Results show that our framework finds feasible solutions more efficiently when compared to a state-of-the-art TAMP solver.

端到端 · 跡 · 基準 · 卡爾曼濾波 · Better ·

2023 年 6 月 29 日

MotionTrack: End-to-End Transformer-based Multi-Object Tracing with LiDAR-Camera Fusion

Ce Zhang,Chengjie Zhang,Yiluan Guo,Lingji Chen,Michael Happold

from arxiv, This paper is accepted by CVPR WAD 2023

Multiple Object Tracking (MOT) is crucial to autonomous vehicle perception. End-to-end transformer-based algorithms, which detect and track objects simultaneously, show great potential for the MOT task. However, most existing methods focus on image-based tracking with a single object category. In this paper, we propose an end-to-end transformer-based MOT algorithm (MotionTrack) with multi-modality sensor inputs to track objects with multiple classes. Our objective is to establish a transformer baseline for the MOT in an autonomous driving environment. The proposed algorithm consists of a transformer-based data association (DA) module and a transformer-based query enhancement module to achieve MOT and Multiple Object Detection (MOD) simultaneously. The MotionTrack and its variations achieve better results (AMOTA score at 0.55) on the nuScenes dataset compared with other classical baseline models, such as the AB3DMOT, the CenterTrack, and the probabilistic 3D Kalman filter. In addition, we prove that a modified attention mechanism can be utilized for DA to accomplish the MOT, and aggregate history features to enhance the MOD performance.

3D · 優化器 · 三維重建 · MoDELS · Extensibility ·

2023 年 6 月 29 日

One-2-3-45: Any Single Image to 3D Mesh in 45 Seconds without Per-Shape Optimization

Minghua Liu,Chao Xu,Haian Jin,Linghao Chen,Mukund Varma T,Zexiang Xu,Hao Su

from arxiv, project website: one-2-3-45.com

Single image 3D reconstruction is an important but challenging task that requires extensive knowledge of our natural world. Many existing methods solve this problem by optimizing a neural radiance field under the guidance of 2D diffusion models but suffer from lengthy optimization time, 3D inconsistency results, and poor geometry. In this work, we propose a novel method that takes a single image of any object as input and generates a full 360-degree 3D textured mesh in a single feed-forward pass. Given a single image, we first use a view-conditioned 2D diffusion model, Zero123, to generate multi-view images for the input view, and then aim to lift them up to 3D space. Since traditional reconstruction methods struggle with inconsistent multi-view predictions, we build our 3D reconstruction module upon an SDF-based generalizable neural surface reconstruction method and propose several critical training strategies to enable the reconstruction of 360-degree meshes. Without costly optimizations, our method reconstructs 3D shapes in significantly less time than existing methods. Moreover, our method favors better geometry, generates more 3D consistent results, and adheres more closely to the input image. We evaluate our approach on both synthetic data and in-the-wild images and demonstrate its superiority in terms of both mesh quality and runtime. In addition, our approach can seamlessly support the text-to-3D task by integrating with off-the-shelf text-to-image diffusion models.

統計量 · 損失 · 估計/估計量 · 相關系數 · 值域 ·

2023 年 6 月 28 日

Estimating the correlation between operational risk loss categories over different time horizons

Maurice L. Brown,Cheng Ly

from arxiv, 27 pages, 11 figures, 6 tables

Operational risk is challenging to quantify because of the broad range of categories (fraud, technological issues, natural disasters) and the heavy-tailed nature of realized losses. Operational risk modeling requires quantifying how these broad loss categories are related. We focus on the issue of loss frequencies having different time scales (e.g., daily, yearly, monthly basis), specifically on estimating the statistics of losses on arbitrary time horizons. We present a frequency model where mathematical techniques can be feasibly applied to analytically calculate the mean, variance, and co-variances that are accurate compared to more time-consuming Monte Carlo simulations. We show that the analytic calculations of cumulative loss statistics in an arbitrary time window are feasible here and would otherwise be intractable due to temporal correlations. Our work has potential value because these statistics are crucial for approximating correlations of losses via copulas. We systematically vary all model parameters to demonstrate the accuracy of our methods for calculating all first and second order statistics of aggregate loss distributions. Finally, using combined data from a consortium of institutions, we show that different time horizons can lead to a large range of loss statistics that can significantly affect calculations of capital requirements.

估計/估計量 · FAST · 示例 · MoDELS · 真正例 ·

2023 年 6 月 28 日

INSTA-BEEER: Explicit Error Estimation and Refinement for Fast and Accurate Unseen Object Instance Segmentation

Seunghyeok Back,Sangbeom Lee,Kangmin Kim,Joosoon Lee,Sungho Shin,Jaemo Maeng,Kyoobin Lee

from arxiv, 8 pages, 5 figures

Efficient and accurate segmentation of unseen objects is crucial for robotic manipulation. However, it remains challenging due to over- or under-segmentation. Although existing refinement methods can enhance the segmentation quality, they fix only minor boundary errors or are not sufficiently fast. In this work, we propose INSTAnce Boundary Explicit Error Estimation and Refinement (INSTA-BEEER), a novel refinement model that allows for adding and deleting instances and sharpening boundaries. Leveraging an error-estimation-then-refinement scheme, the model first estimates the pixel-wise boundary explicit errors: true positive, true negative, false positive, and false negative pixels of the instance boundary in the initial segmentation. It then refines the initial segmentation using these error estimates as guidance. Experiments show that the proposed model significantly enhances segmentation, achieving state-of-the-art performance. Furthermore, with a fast runtime (less than 0.1 s), the model consistently improves performance across various initial segmentation methods, making it highly suitable for practical robotic applications.

估計/估計量 · INFORMS · 圖形處理器 · Networking · state-of-the-art ·

2023 年 6 月 28 日

Hierarchical Graph Neural Networks for Proprioceptive 6D Pose Estimation of In-hand Objects

Alireza Rezazadeh,Snehal Dikhale,Soshi Iba,Nawid Jamali

Robotic manipulation, in particular in-hand object manipulation, often requires an accurate estimate of the object's 6D pose. To improve the accuracy of the estimated pose, state-of-the-art approaches in 6D object pose estimation use observational data from one or more modalities, e.g., RGB images, depth, and tactile readings. However, existing approaches make limited use of the underlying geometric structure of the object captured by these modalities, thereby, increasing their reliance on visual features. This results in poor performance when presented with objects that lack such visual features or when visual features are simply occluded. Furthermore, current approaches do not take advantage of the proprioceptive information embedded in the position of the fingers. To address these limitations, in this paper: (1) we introduce a hierarchical graph neural network architecture for combining multimodal (vision and touch) data that allows for a geometrically informed 6D object pose estimation, (2) we introduce a hierarchical message passing operation that flows the information within and across modalities to learn a graph-based object representation, and (3) we introduce a method that accounts for the proprioceptive information for in-hand object representation. We evaluate our model on a diverse subset of objects from the YCB Object and Model Set, and show that our method substantially outperforms existing state-of-the-art work in accuracy and robustness to occlusion. We also deploy our proposed framework on a real robot and qualitatively demonstrate successful transfer to real settings.

狀態估計 · 估計/估計量 · 規范化的 · Agent · INTERACT ·

2023 年 6 月 27 日

Deep Normalizing Flows for State Estimation

Harrison Delecki,Liam A. Kruse,Marc R. Schlichting,Mykel J. Kochenderfer

from arxiv, Accepted to FUSION 2023

Safe and reliable state estimation techniques are a critical component of next-generation robotic systems. Agents in such systems must be able to reason about the intentions and trajectories of other agents for safe and efficient motion planning. However, classical state estimation techniques such as Gaussian filters often lack the expressive power to represent complex underlying distributions, especially if the system dynamics are highly nonlinear or if the interaction outcomes are multi-modal. In this work, we use normalizing flows to learn an expressive representation of the belief over an agent's true state. Furthermore, we improve upon existing architectures for normalizing flows by using more expressive deep neural network architectures to parameterize the flow. We evaluate our method on two robotic state estimation tasks and show that our approach outperforms both classical and modern deep learning-based state estimation baselines.

3D · Continuity · 估計/估計量 · MoDELS · 正則化項 ·

2022 年 3 月 8 日

Recovering 3D Human Mesh from Monocular Images: A Survey

Yating Tian,Hongwen Zhang,Yebin Liu,Limin Wang

from arxiv, Survey paper on monocular 3D human mesh recovery, Project page: //github.com/tinatiansjz/hmr-survey

Estimating human pose and shape from monocular images is a long-standing problem in computer vision. Since the release of statistical body models, 3D human mesh recovery has been drawing broader attention. With the same goal of obtaining well-aligned and physically plausible mesh results, two paradigms have been developed to overcome challenges in the 2D-to-3D lifting process: i) an optimization-based paradigm, where different data terms and regularization terms are exploited as optimization objectives; and ii) a regression-based paradigm, where deep learning techniques are embraced to solve the problem in an end-to-end fashion. Meanwhile, continuous efforts are devoted to improving the quality of 3D mesh labels for a wide range of datasets. Though remarkable progress has been achieved in the past decade, the task is still challenging due to flexible body motions, diverse appearances, complex environments, and insufficient in-the-wild annotations. To the best of our knowledge, this is the first survey to focus on the task of monocular 3D human mesh recovery. We start with the introduction of body models and then elaborate recovery frameworks and training objectives by providing in-depth analyses of their strengths and weaknesses. We also summarize datasets, evaluation metrics, and benchmark results. Open issues and future directions are discussed in the end, hoping to motivate researchers and facilitate their research in this area. A regularly updated project page can be found at //github.com/tinatiansjz/hmr-survey.

估計/估計量 · 3D · 全 · 塑造 · 真實值 ·

2019 年 3 月 3 日

3D Hand Shape and Pose Estimation from a Single RGB Image

Liuhao Ge,Zhou Ren,Yuncheng Li,Zehao Xue,Yingying Wang,Jianfei Cai,Junsong Yuan

from arxiv, CVPR 2019 (Oral), //sites.google.com/site/geliuhaontu/home/cvpr2019

This work addresses a novel and challenging problem of estimating the full 3D hand shape and pose from a single RGB image. Most current methods in 3D hand analysis from monocular RGB images only focus on estimating the 3D locations of hand keypoints, which cannot fully express the 3D shape of hand. In contrast, we propose a Graph Convolutional Neural Network (Graph CNN) based method to reconstruct a full 3D mesh of hand surface that contains richer information of both 3D hand shape and pose. To train networks with full supervision, we create a large-scale synthetic dataset containing both ground truth 3D meshes and 3D poses. When fine-tuning the networks on real-world datasets without 3D ground truth, we propose a weakly-supervised approach by leveraging the depth map as a weak supervision in training. Through extensive evaluations on our proposed new datasets and two public datasets, we show that our proposed method can produce accurate and reasonable 3D hand mesh, and can achieve superior 3D hand pose estimation accuracy when compared with state-of-the-art methods.