Medical applications have benefited from the rapid advancement of computer vision. For patient monitoring in particular, in-bed human posture estimation provides important health-related metrics with potential value in medical condition assessment. Despite great progress in this domain, it remains a challenging task due to substantial ambiguity during occlusions and the lack of large corpora of manually labeled data for model training, particularly in domains such as thermal infrared imaging, which are privacy-preserving and thus of great interest. Motivated by the effectiveness of self-supervised methods in learning features directly from data, we propose a multi-modal conditional variational autoencoder (MC-VAE) capable of reconstructing features of modalities that are seen during training but missing at inference. This approach is used with HRNet to enable single-modality inference for in-bed pose estimation. Through extensive evaluations, we demonstrate that body positions can be effectively recognized from the available modality, achieving results on par with baseline models that depend heavily on access to multiple modalities at inference time. The proposed framework supports future research towards self-supervised learning that builds a robust model from a single modality and generalizes over the many unknown distributions encountered in clinical environments.
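To make the conditional-VAE idea concrete, the following is a minimal PyTorch sketch of a model that reconstructs the features of a missing modality conditioned on the features of the available one; all layer sizes, module names, and the loss weighting are hypothetical assumptions, not the authors' implementation.

```python
# Minimal sketch of a conditional VAE that reconstructs a missing modality's
# features from an available one (hypothetical sizes/names; not the MC-VAE code).
import torch
import torch.nn as nn

class ConditionalVAE(nn.Module):
    def __init__(self, feat_dim=256, cond_dim=256, latent_dim=64):
        super().__init__()
        self.enc = nn.Sequential(nn.Linear(feat_dim + cond_dim, 512), nn.ReLU())
        self.to_mu = nn.Linear(512, latent_dim)
        self.to_logvar = nn.Linear(512, latent_dim)
        self.dec = nn.Sequential(
            nn.Linear(latent_dim + cond_dim, 512), nn.ReLU(),
            nn.Linear(512, feat_dim),
        )

    def forward(self, missing_feat, cond_feat):
        h = self.enc(torch.cat([missing_feat, cond_feat], dim=-1))
        mu, logvar = self.to_mu(h), self.to_logvar(h)
        z = mu + torch.randn_like(mu) * torch.exp(0.5 * logvar)  # reparameterization
        recon = self.dec(torch.cat([z, cond_feat], dim=-1))
        return recon, mu, logvar

def vae_loss(recon, target, mu, logvar, beta=1.0):
    # reconstruction term plus KL divergence to a standard normal prior
    rec = nn.functional.mse_loss(recon, target)
    kld = -0.5 * torch.mean(1 + logvar - mu.pow(2) - logvar.exp())
    return rec + beta * kld
```

At inference time only the conditioning features of the available modality would be needed, with the latent code sampled from the prior.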
Recent advancements in computer vision have seen a rise in the prominence of applications using neural networks to understand human poses. However, while accuracy has been steadily increasing on state-of-the-art datasets, these datasets often do not address the challenges seen in real-world applications: people distant from the camera, people in crowds, and heavily occluded people. As a result, many real-world applications are trained on data that does not reflect the data encountered in deployment, leading to significant underperformance. This article presents ADG-Pose, a method for automatically generating datasets for real-world human pose estimation whose person-distance, crowdedness, and occlusion distributions can be customized. Models trained with our method are able to perform in the presence of these challenges where those trained on other datasets fail. Using ADG-Pose, end-to-end accuracy for real-world skeleton-based action recognition sees a 20% increase on scenes with moderate distance and occlusion levels, and a 4x increase on distant scenes where other models fail to perform better than random.
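As one illustration of how such distribution-controlled dataset generation could be realized, the sketch below uses rejection sampling to select crops from a full-resolution annotated frame so that each crop contains an allowed number of people; the helper and its parameters are hypothetical and are not taken from ADG-Pose.

```python
# Illustrative sketch of crowdedness-controlled crop sampling from a
# full-resolution annotated frame (hypothetical helper, not the ADG-Pose code).
import random

def sample_crop(frame_w, frame_h, boxes, crop_size, min_people, max_people):
    """boxes: list of (x, y, w, h) person boxes in the full frame.
    Returns a crop origin whose window contains an allowed person count."""
    for _ in range(100):  # rejection sampling with a fixed retry budget
        x0 = random.randint(0, frame_w - crop_size)
        y0 = random.randint(0, frame_h - crop_size)
        inside = [b for b in boxes
                  if x0 <= b[0] and b[0] + b[2] <= x0 + crop_size
                  and y0 <= b[1] and b[1] + b[3] <= y0 + crop_size]
        if min_people <= len(inside) <= max_people:
            return (x0, y0), inside
    return None, []
```

Varying the crop size relative to person box size would similarly control the apparent distance of people in the generated samples.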
In this paper, we propose a novel content-based image-retrieval scheme that allows us to use a mixture of plain images and compressible encrypted ones called "encryption-then-compression (EtC) images." In the proposed scheme, extended SIMPLE descriptors are extracted from EtC images as well as from plain ones, so a mixture of plain and encrypted images can be used for image retrieval. In an experiment, the proposed scheme was demonstrated to achieve almost the same retrieval performance as that for plain images, even with a mixture of plain and encrypted images.
In this work, we present $\texttt{Volley Revolver}$, a novel matrix-encoding method that is particularly convenient for privacy-preserving neural networks to make predictions, and use it to implement a CNN for handwritten image classification. Based on this encoding method, we develop several additional operations for putting secure matrix multiplication over encrypted data matrices into practice. To compute the product $A \times B$ of two matrices $A$ and $B$, the main idea, in its simplest version, is to encrypt matrix $A$ and the transpose of matrix $B$ into two ciphertexts, respectively. Along with the additional operations, the homomorphic matrix multiplication $A \times B$ can then be calculated over encrypted data matrices efficiently. For the convolution operation in the CNN, we develop a feasible and efficient evaluation strategy on the basis of the $\texttt{Volley Revolver}$ encoding method: each convolution kernel is expanded in advance to a matrix of the same size as the input image, generating several ciphertexts, each of which is later used together with the input image to calculate part of the final convolution result. Accumulating all of these partial results yields the final convolution result.
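The row-rotation intuition behind encrypting $A$ and $B^T$ row-major can be checked in plaintext. The NumPy simulation below (square matrices, no encryption) shows that each cyclic row rotation of $B^T$, followed by an element-wise product with $A$ and a per-row summation, produces one wrapped diagonal of $A \times B$; the actual homomorphic implementation would carry out these steps on ciphertext slots, so this is only a sketch of the idea, not the secure computation itself.

```python
# Plaintext NumPy simulation of the row-rotation ("revolver") idea:
# with A and B^T packed row-major, the k-th cyclic row rotation of B^T,
# an element-wise product, and per-row sums fill the k-th wrapped diagonal
# of A @ B.  Simplified square case; not the homomorphic implementation.
import numpy as np

m = 4
A = np.random.randn(m, m)
B = np.random.randn(m, m)
Bt = B.T

C = np.zeros((m, m))
for k in range(m):                      # k-th "revolution" of B^T
    Bt_rot = np.roll(Bt, -k, axis=0)    # cyclically shift rows up by k
    partial = (A * Bt_rot).sum(axis=1)  # row-wise inner products
    for i in range(m):
        C[i, (i + k) % m] = partial[i]  # k-th wrapped diagonal of A @ B

assert np.allclose(C, A @ B)
```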
Purpose: To investigate aspects of the validation of self-supervised algorithms for reconstruction of undersampled MR images: quantitative evaluation of prospective reconstructions, potential differences between prospective and retrospective reconstructions, suitability of commonly used quantitative metrics, and generalizability. Theory and Methods: Two self-supervised algorithms, based on self-supervised denoising and on neural-network image priors, were investigated. These methods were compared to a least-squares fitting and a compressed sensing reconstruction using in-vivo and phantom data. Their generalizability was tested with prospectively undersampled data from experimental conditions different from those used for training. Results: Prospective reconstructions can exhibit significant distortion relative to retrospective reconstructions/ground truth. Pixel-wise quantitative metrics may not capture differences in perceptual quality accurately, in contrast to a perceptual metric. All methods showed potential for generalization; generalizability is more affected by changes in anatomy/contrast than by other changes. No-reference image metrics correspond well with human ratings of image quality for studying generalizability. Compressed sensing and learned denoising perform similarly well on all data. Conclusion: Self-supervised methods show promising results for accelerating image reconstruction in clinical routine. Nonetheless, more work is required to investigate standardized methods for validating reconstruction algorithms for future clinical use.
Real-time estimation of actual object depth is essential to various autonomous system tasks such as 3D reconstruction, scene understanding, and condition assessment of machinery parts. Over the last decade, extensive deployment of deep learning methods to computer vision tasks has yielded approaches that succeed in achieving realistic depth synthesis from a simple RGB modality. While most of these models rely on paired depth data or the availability of video sequences and stereo images, methods for single-view depth synthesis in a fully unsupervised setting have hardly been explored. This study builds on the most recent advances in generative neural networks, leveraging them to perform fully unsupervised single-shot depth synthesis. Two generators for RGB-to-depth and depth-to-RGB transfer are implemented and simultaneously optimized using the Wasserstein-1 distance and a novel perceptual reconstruction term. To demonstrate that the proposed method is plausible, we comprehensively evaluate the models on industrial surface depth data as well as on the Texas 3D Face Recognition Database and the SURREAL dataset, which records body depth. The success observed in this study suggests great potential for unsupervised single-shot depth estimation in real-world applications.
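The combined objective can be sketched as a Wasserstein critic term on the synthesized depth plus a perceptual reconstruction term on the RGB-to-depth-to-RGB cycle. In the PyTorch sketch below, the module names, the L1 feature distance, and the weighting are illustrative assumptions rather than the paper's exact formulation.

```python
# Hedged sketch of the generator objective: a Wasserstein-1 adversarial term
# from a critic on synthesized depth plus a perceptual reconstruction term on
# the cycled RGB.  Names and weights are illustrative, not the paper's code.
import torch
import torch.nn as nn

def generator_loss(rgb, g_rgb2d, g_d2rgb, critic_depth, feat_net, lam=10.0):
    fake_depth = g_rgb2d(rgb)                    # RGB -> depth
    rgb_rec = g_d2rgb(fake_depth)                # depth -> RGB (cycle)
    adv = -critic_depth(fake_depth).mean()       # Wasserstein critic score
    perc = nn.functional.l1_loss(feat_net(rgb_rec), feat_net(rgb))
    return adv + lam * perc                      # adversarial + perceptual term
```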
We introduce HuMoR: a 3D Human Motion Model for Robust Estimation of temporal pose and shape. Though substantial progress has been made in estimating 3D human motion and shape from dynamic observations, recovering plausible pose sequences in the presence of noise and occlusions remains a challenge. For this purpose, we propose an expressive generative model in the form of a conditional variational autoencoder, which learns a distribution of the change in pose at each step of a motion sequence. Furthermore, we introduce a flexible optimization-based approach that leverages HuMoR as a motion prior to robustly estimate plausible pose and shape from ambiguous observations. Through extensive evaluations, we demonstrate that our model generalizes to diverse motions and body shapes after training on a large motion capture dataset, and enables motion reconstruction from multiple input modalities including 3D keypoints and RGB(-D) videos.
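A compact sketch in the spirit of such a learned transition prior is shown below: a conditional prior over a latent code given the previous pose state, and a decoder that predicts the change in state so that motion can be rolled out step by step. Dimensions and architectures are placeholders, not HuMoR's actual networks.

```python
# Compact sketch of a CVAE transition model over pose changes: a learned
# conditional prior p(z | x_prev) and a decoder predicting the state delta.
# Placeholder dimensions/architectures; not the HuMoR implementation.
import torch
import torch.nn as nn

class TransitionCVAE(nn.Module):
    def __init__(self, state_dim=69, latent_dim=48, hidden=256):
        super().__init__()
        self.prior = nn.Sequential(nn.Linear(state_dim, hidden), nn.ReLU(),
                                   nn.Linear(hidden, 2 * latent_dim))
        self.decoder = nn.Sequential(nn.Linear(latent_dim + state_dim, hidden),
                                     nn.ReLU(), nn.Linear(hidden, state_dim))

    def sample_next(self, x_prev):
        mu, logvar = self.prior(x_prev).chunk(2, dim=-1)
        z = mu + torch.randn_like(mu) * torch.exp(0.5 * logvar)
        delta = self.decoder(torch.cat([z, x_prev], dim=-1))
        return x_prev + delta                    # next pose state
```

Used as a prior in an optimization loop, the latent codes of such a model could be treated as variables that keep the estimated motion close to plausible transitions.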
Human pose estimation aims to locate the human body parts and build a human body representation (e.g., body skeleton) from input data such as images and videos. It has drawn increasing attention during the past decade and has been utilized in a wide range of applications including human-computer interaction, motion analysis, augmented reality, and virtual reality. Although the recently developed deep learning-based solutions have achieved high performance in human pose estimation, there remain challenges due to insufficient training data, depth ambiguities, and occlusions. The goal of this survey paper is to provide a comprehensive review of recent deep learning-based solutions for both 2D and 3D pose estimation via a systematic analysis and comparison of these solutions based on their input data and inference procedures. More than 240 research papers since 2014 are covered in this survey. Furthermore, 2D and 3D human pose estimation datasets and evaluation metrics are included. Quantitative performance comparisons of the reviewed methods on popular datasets are summarized and discussed. Finally, we discuss the remaining challenges, applications, and future research directions. We also provide a regularly updated project page at: \url{//github.com/zczcwh/DL-HPE}
Cross-view feature fusion is the key to addressing the occlusion problem in human pose estimation. Current fusion methods need to train a separate model for every pair of cameras, making them difficult to scale. In this work, we introduce MetaFuse, a pre-trained fusion model learned from a large number of cameras in the Panoptic dataset. The model can be efficiently adapted or fine-tuned for a new pair of cameras using a small number of labeled images. The strong adaptation power of MetaFuse is due in large part to the proposed factorization of the original fusion model into two parts: (1) a generic fusion model shared by all cameras, and (2) lightweight camera-dependent transformations. Furthermore, the generic model is learned from many cameras by a meta-learning-style algorithm to maximize its adaptation capability to various camera poses. We observe in experiments that MetaFuse fine-tuned on the public datasets outperforms the state of the art by a large margin, which validates its value in practice.
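The factorization can be sketched as a generic fusion weight shared by all camera pairs plus a small set of per-pair parameters that are the only part adapted to a new pair. The affine form, shapes, and names below are illustrative assumptions, not the released MetaFuse code.

```python
# Sketch of a factorized cross-view fusion layer: a shared (generic) weight
# matrix plus lightweight camera-pair-specific parameters; only the latter
# would be adapted to a new pair.  Shapes and names are illustrative.
import torch
import torch.nn as nn

class FactorizedFusion(nn.Module):
    def __init__(self, h=64, w=64):
        super().__init__()
        self.w_generic = nn.Parameter(torch.randn(h * w, h * w) * 0.01)  # shared
        self.scale = nn.Parameter(torch.ones(1))   # camera-pair specific
        self.shift = nn.Parameter(torch.zeros(1))  # camera-pair specific

    def forward(self, heatmap_other_view):
        # camera-dependent affine transform of the generic fusion weights
        w = self.scale * self.w_generic + self.shift
        b, c, h, wd = heatmap_other_view.shape
        flat = heatmap_other_view.view(b, c, h * wd)
        fused = flat @ w.t()                       # project the other view's heatmap
        return fused.view(b, c, h, wd)
```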
This work addresses the novel and challenging problem of estimating the full 3D hand shape and pose from a single RGB image. Most current methods for 3D hand analysis from monocular RGB images focus only on estimating the 3D locations of hand keypoints, which cannot fully express the 3D shape of the hand. In contrast, we propose a Graph Convolutional Neural Network (Graph CNN) based method to reconstruct a full 3D mesh of the hand surface that contains richer information about both 3D hand shape and pose. To train the networks with full supervision, we create a large-scale synthetic dataset containing both ground-truth 3D meshes and 3D poses. When fine-tuning the networks on real-world datasets without 3D ground truth, we propose a weakly-supervised approach that leverages the depth map as weak supervision during training. Through extensive evaluations on our proposed new datasets and two public datasets, we show that our method produces accurate and reasonable 3D hand meshes and achieves superior 3D hand pose estimation accuracy compared with state-of-the-art methods.
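A hedged sketch of the depth-map weak supervision is given below: a depth image rendered from the predicted hand mesh (via some differentiable renderer, assumed and not shown here) is compared against the captured depth map on hand pixels. The masked L1 form is an illustrative assumption, not necessarily the paper's exact term.

```python
# Sketch of depth-map weak supervision for mesh fine-tuning: compare the depth
# rendered from the predicted mesh with the observed depth map on hand pixels.
# The differentiable rendering step is assumed and omitted.
import torch

def weak_depth_loss(rendered_depth, observed_depth, hand_mask):
    # penalize depth discrepancy only on valid hand pixels
    diff = torch.abs(rendered_depth - observed_depth) * hand_mask
    return diff.sum() / hand_mask.sum().clamp(min=1.0)
```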
Estimating the head pose of a person is a crucial problem with a large number of applications such as aiding gaze estimation, modeling attention, fitting 3D models to video, and performing face alignment. Traditionally, head pose is computed by estimating some keypoints from the target face and solving the 2D-to-3D correspondence problem with a mean human head model. We argue that this is a fragile method because it relies entirely on landmark detection performance, the extraneous head model, and an ad-hoc fitting step. We present an elegant and robust way to determine pose by training a multi-loss convolutional neural network on 300W-LP, a large synthetically expanded dataset, to predict intrinsic Euler angles (yaw, pitch, and roll) directly from image intensities through joint binned pose classification and regression. We present empirical tests on common in-the-wild pose benchmark datasets, which show state-of-the-art results. Additionally, we test our method on a dataset usually used for pose estimation using depth and begin to close the gap with state-of-the-art depth-based pose methods. We open-source our training and testing code as well as release our pre-trained models.
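The joint binned classification and regression idea can be sketched for a single Euler angle as a cross-entropy loss over angle bins plus a regression loss on the expected angle recovered from the bin probabilities. The bin layout and weighting below are illustrative choices made for this sketch, not necessarily the paper's exact hyperparameters.

```python
# Sketch of a joint binned-classification-plus-regression loss for one Euler
# angle: cross-entropy over angle bins plus MSE on the expected angle computed
# from the softmax bin probabilities.  Bin layout/weighting are illustrative.
import torch
import torch.nn as nn

def angle_loss(logits, gt_angle_deg, num_bins=66, bin_width=3.0, alpha=0.5):
    # bins cover [-99, 99) degrees in 3-degree steps
    bin_centers = torch.arange(num_bins) * bin_width - 99.0 + bin_width / 2
    gt_bin = ((gt_angle_deg + 99.0) / bin_width).long().clamp(0, num_bins - 1)
    cls = nn.functional.cross_entropy(logits, gt_bin)        # binned classification
    probs = torch.softmax(logits, dim=-1)
    expected = (probs * bin_centers.to(logits.device)).sum(dim=-1)
    reg = nn.functional.mse_loss(expected, gt_angle_deg)      # fine regression
    return cls + alpha * reg
```

One such head and loss would be applied per angle (yaw, pitch, roll), giving the multi-loss training signal described above.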