The intimate entanglement between object affordances and human poses is of great interest to, among others, the behavioural sciences, cognitive psychology, and computer vision communities. In recent years, the latter has developed several object-centric approaches: starting from objects, learning pipelines synthesize human poses and dynamics in a realistic way, satisfying both geometric and functional expectations. However, the inverse perspective is significantly less explored: Can we infer 3D objects and their poses from human interactions alone? Our investigation follows this direction, showing that a generic 3D human point cloud is enough to pop up an unobserved object, even when the user is just imitating a functionality (e.g., looking through binoculars) without involving a tangible counterpart. We validate our method qualitatively and quantitatively, with synthetic data and sequences acquired for the task, showing applicability for XR/VR. The code is available at //github.com/ptrvilya/object-popup.
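To make the setting concrete, a minimal sketch of the kind of model such an approach implies is given below: a PointNet-style encoder consumes the human point cloud and predicts an object class together with a coarse object center. The architecture, layer sizes, and two-head design are illustrative assumptions, not the authors' network.

```python
# Minimal sketch (not the authors' code): a PointNet-style encoder that, given a
# human point cloud, predicts an object class and a coarse object-center offset.
import torch
import torch.nn as nn

class ObjectPopUpSketch(nn.Module):
    def __init__(self, num_classes=20):
        super().__init__()
        # Shared per-point MLP followed by a global max-pool (PointNet-style).
        self.point_mlp = nn.Sequential(
            nn.Conv1d(3, 64, 1), nn.ReLU(),
            nn.Conv1d(64, 256, 1), nn.ReLU(),
            nn.Conv1d(256, 1024, 1),
        )
        self.cls_head = nn.Linear(1024, num_classes)   # which object is being used
        self.center_head = nn.Linear(1024, 3)          # where to "pop it up"

    def forward(self, human_points):                   # (B, N, 3) human point cloud
        feats = self.point_mlp(human_points.transpose(1, 2))   # (B, 1024, N)
        global_feat = feats.max(dim=2).values                   # (B, 1024)
        return self.cls_head(global_feat), self.center_head(global_feat)

logits, center = ObjectPopUpSketch()(torch.randn(2, 2048, 3))
```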
Most recent approaches for 3D object detection rely predominantly on point-view or bird's-eye-view representations, with limited exploration of range-view-based methods. The range-view representation suffers from scale variation and surface texture deficiency, both of which pose significant limitations for developing corresponding methods. Notably, the surface texture loss problem has been largely ignored by existing methods, despite its significant impact on the accuracy of range-view-based 3D object detection. In this study, we propose Redemption from Range-view R-CNN (R2 R-CNN), a novel and accurate approach that comprehensively explores the range-view representation. Our proposed method addresses scale variation through the HD Meta Kernel, which captures range-view geometry information at multiple scales. Additionally, we introduce Feature Points Redemption (FPR) to recover the lost 3D surface texture information from the range view, and Synchronous-Grid RoI Pooling (S-Grid RoI Pooling), a multi-scale approach with multiple receptive fields for accurate box refinement. Our R2 R-CNN outperforms existing range-view-based methods, achieving state-of-the-art performance on both the KITTI benchmark and the Waymo Open Dataset. Our study highlights the critical importance of addressing the surface texture loss problem for accurate 3D object detection in range-view-based methods. Code will be made publicly available.
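For readers unfamiliar with the representation, the sketch below shows the standard spherical projection that turns a LiDAR point cloud into a range image, which is the input domain range-view detectors such as R2 R-CNN operate on. The resolution and field-of-view bounds are placeholder values, and none of the paper's modules (HD Meta Kernel, FPR, S-Grid RoI Pooling) are reproduced here.

```python
# Illustrative sketch only: spherical (range-view) projection of a LiDAR scan.
import numpy as np

def range_view_projection(points, H=64, W=2048, fov_up=3.0, fov_down=-25.0):
    """Project an (N, 3) LiDAR point cloud to an H x W range image."""
    x, y, z = points[:, 0], points[:, 1], points[:, 2]
    r = np.linalg.norm(points, axis=1) + 1e-8
    yaw = np.arctan2(y, x)                    # azimuth angle
    pitch = np.arcsin(z / r)                  # elevation angle
    fov_up_r, fov_down_r = np.radians(fov_up), np.radians(fov_down)
    u = (0.5 * (1.0 - yaw / np.pi) * W).astype(np.int32) % W
    v = (fov_up_r - pitch) / (fov_up_r - fov_down_r) * H
    v = np.clip(v, 0, H - 1).astype(np.int32)
    range_image = np.zeros((H, W), dtype=np.float32)
    range_image[v, u] = r                     # simple overwrite; real code keeps the closest return
    return range_image

img = range_view_projection(np.random.randn(1000, 3) * 10)
```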
Liquid perception is critical for robotic pouring tasks. It usually requires the robust visual detection of flowing liquid. However, while recent works have shown promising results in liquid perception, they typically require labeled data for model training, a process that is both time-consuming and reliant on human labor. To this end, this paper proposes PourIt!, a simple yet effective framework that serves as a tool for robotic pouring tasks. We design a simple data collection pipeline that only needs image-level labels, reducing the reliance on tedious pixel-wise annotations. A binary classification model is then trained to generate a Class Activation Map (CAM) that focuses on the visual difference between the two kinds of collected data, i.e., the presence or absence of liquid drops. We also devise a feature-contrast strategy to improve the quality of the CAM so that it entirely and tightly covers the actual liquid regions. The container pose is then utilized to facilitate the 3D point cloud recovery of the detected liquid region. Finally, the liquid-to-container distance is calculated for visual closed-loop control of the physical robot. To validate the effectiveness of the proposed method, we also contribute a novel dataset for this task, the PourIt! dataset. Extensive results on this dataset and a physical Franka robot show the utility and effectiveness of our method in robotic pouring tasks. Our dataset, code and pre-trained models will be available on the project page.
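The core weakly supervised step can be sketched as follows: a binary liquid/no-liquid classifier is trained from image-level labels, and a Class Activation Map is obtained by weighting the final convolutional features with the classifier weights of the liquid class. The backbone, layer sizes, and pooling choices below are illustrative assumptions rather than the PourIt! implementation.

```python
# Minimal CAM sketch under an image-level-label setting (liquid vs. no liquid).
import torch
import torch.nn as nn
import torch.nn.functional as F

class LiquidCAMSketch(nn.Module):
    def __init__(self):
        super().__init__()
        self.backbone = nn.Sequential(
            nn.Conv2d(3, 32, 3, stride=2, padding=1), nn.ReLU(),
            nn.Conv2d(32, 64, 3, stride=2, padding=1), nn.ReLU(),
        )
        self.fc = nn.Linear(64, 2)  # class 0: no liquid, class 1: liquid

    def forward(self, x):
        feats = self.backbone(x)                          # (B, 64, h, w)
        logits = self.fc(feats.mean(dim=(2, 3)))          # global average pooling + classifier
        # CAM for the "liquid" class: weight the feature maps by that class's weights.
        cam = torch.einsum("c,bchw->bhw", self.fc.weight[1], feats)
        cam = F.relu(cam)
        cam = cam / (cam.amax(dim=(1, 2), keepdim=True) + 1e-6)
        return logits, cam

logits, cam = LiquidCAMSketch()(torch.randn(2, 3, 128, 128))
```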
With the introduction of collaborative robots, humans and robots can now work together in close proximity and share the same workspace. However, this collaboration presents various challenges that need to be addressed to ensure seamless cooperation between the agents. This paper focuses on task planning for human-robot collaboration, taking into account the human's performance and their preference for following or leading. Unlike conventional task allocation methods, the proposed system allows both the robot and the human to select and assign tasks to each other. Our previous studies evaluated the proposed framework in a computer simulation environment. This paper extends that research by implementing the algorithm in a real scenario where a human collaborates with a Fetch mobile manipulator robot. We briefly describe the experimental setup, procedure, and implementation of the planned user study. As a first step, we report on a system evaluation study in which the experimenter enacted different possible leader/follower behaviours that could occur in a user study. Results show that the robot can adapt and respond appropriately to the different human behaviours enacted by the experimenter. A future user study will evaluate the system with human participants.
VTubers, or Virtual YouTubers, are live streamers who create streaming content using animated 2D or 3D virtual avatars. In recent years, there has been a significant increase in the number of VTuber creators and viewers across the globe. This practice has drawn research attention to topics such as viewers' engagement behaviors and perceptions. However, because animated avatars offer more identity and performance flexibility than traditional live streaming, where streamers use their own bodies, little research has focused on how this flexibility influences how creators present themselves. This research thus seeks to fill this gap by presenting results from a qualitative study of 16 Chinese-speaking VTubers' streaming practices. The data revealed that the virtual avatars used while live streaming afforded creators opportunities to present themselves in inflated ways and resulted in inclusive interactions with viewers. The results also unveiled the inflated, and often sexualized, gender expressions of VTubers situated in misogynistic environments. The socio-technical facets of VTubing were found to potentially reduce sexual harassment and sexism, whilst also raising self-objectification concerns.
Relational information between different types of entities is often modelled by a multilayer network (MLN) -- a network with subnetworks represented by layers. The layers of an MLN can be arranged in different ways in a visual representation; however, the impact of the arrangement on the readability of the network is an open question. Therefore, we studied this impact for several commonly occurring tasks related to MLN analysis. Additionally, layer arrangements with a dimensionality beyond 2D, which are common in this scenario, motivate the use of stereoscopic displays. We ran a human subject study utilising a Virtual Reality headset to evaluate 2D, 2.5D, and 3D layer arrangements. The study employs six analysis tasks that cover the spectrum of an MLN task taxonomy, from path finding and pattern identification to comparisons between and across layers. We found no clear overall winner. However, we explore the task-to-arrangement space and derive empirically grounded recommendations on the effective use of 2D, 2.5D, and 3D layer arrangements for MLNs.
We propose 3Deformer, a general-purpose framework for interactive 3D shape editing. Given a source 3D mesh with semantic materials and a user-specified semantic image, 3Deformer can accurately edit the source mesh following the shape guidance of the semantic image, while keeping the deformation as rigid as possible and preserving the source topology. Recent studies of 3D shape editing mostly focus on learning neural networks to predict 3D shapes, which requires costly 3D training datasets and is limited to handling objects covered by those datasets. Unlike these studies, our 3Deformer is a training-free, general framework that only requires supervision from readily available semantic images and can edit a wide variety of objects without being limited by datasets. In 3Deformer, the source mesh is deformed using a differentiable renderer, according to the correspondences between semantic images and mesh materials. However, guiding complex 3D shapes with a simple 2D image incurs extra challenges: the deformation accuracy, surface smoothness, geometric rigidity, and global synchronization of the edited mesh must all be guaranteed. To address these challenges, we propose a hierarchical optimization architecture to balance global and local shape features, and further propose various strategies and losses to improve accuracy, smoothness, rigidity, and other properties. Extensive experiments show that our 3Deformer produces impressive results and reaches state-of-the-art performance.
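A highly simplified sketch of this optimization style is shown below: vertex offsets are optimized against a target while an edge-length term acts as a rigidity proxy. The real pipeline drives the data term through a differentiable renderer and uses the paper's hierarchical architecture and losses; the stand-in data term and hyperparameters here are assumptions for illustration only.

```python
# Simplified deformation sketch: direct vertex optimization with an edge-length rigidity proxy.
import torch

def deform_sketch(verts, edges, target_verts, steps=100, lam_rigid=1.0):
    """verts: (V, 3), edges: (E, 2) long tensor, target_verts: (V, 3)."""
    offsets = torch.zeros_like(verts, requires_grad=True)
    opt = torch.optim.Adam([offsets], lr=1e-2)
    rest_len = (verts[edges[:, 0]] - verts[edges[:, 1]]).norm(dim=1)
    for _ in range(steps):
        new_verts = verts + offsets
        data_loss = (new_verts - target_verts).pow(2).mean()       # stand-in for the image-guided term
        cur_len = (new_verts[edges[:, 0]] - new_verts[edges[:, 1]]).norm(dim=1)
        rigid_loss = (cur_len - rest_len).pow(2).mean()             # keep edge lengths close to the rest pose
        opt.zero_grad()
        (data_loss + lam_rigid * rigid_loss).backward()
        opt.step()
    return (verts + offsets).detach()

verts = torch.randn(100, 3)
edges = torch.stack([torch.arange(99), torch.arange(1, 100)], dim=1)   # chain of edges for the demo
deformed = deform_sketch(verts, edges, verts + 0.1 * torch.randn(100, 3), steps=10)
```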
Large industrial facilities such as particle accelerators and nuclear power plants are critical infrastructures for scientific research and industrial processes. These facilities are complex systems that not only require regular maintenance and upgrades but are also often inaccessible to humans due to various safety hazards. Therefore, a virtual reality (VR) system that can quickly replicate real-world remote environments to provide users with a high level of spatial and situational awareness is crucial for facility maintenance planning. However, the exact 3D shapes of these facilities are often too complex to be accurately modeled with geometric primitives through the traditional rasterization pipeline. In this work, we develop Magic NeRF Lens, an interactive framework to support facility inspection in immersive VR using neural radiance fields (NeRF) and volumetric rendering. We introduce a novel data fusion approach that combines the complementary strengths of volumetric rendering and geometric rasterization, allowing a NeRF model to be merged with other conventional 3D data, such as a computer-aided design model. We develop two novel 3D magic lens effects that optimize NeRF rendering by exploiting the properties of human vision and context-aware visualization. We demonstrate the high usability of our framework and methods through a technical benchmark, a visual search user study, and expert reviews. In addition, the source code of our VR NeRF framework is made publicly available for future research and development.
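The data fusion idea can be illustrated with a small depth-aware compositing sketch: inside a circular lens region, NeRF output replaces the rasterized CAD rendering wherever the NeRF depth is closer. The lens shape, inputs, and compositing rule are placeholders and not the paper's implementation.

```python
# Illustrative depth-aware compositing of NeRF output and rasterized geometry inside a circular lens.
import numpy as np

def lens_composite(nerf_rgb, nerf_depth, raster_rgb, raster_depth, center, radius):
    H, W, _ = nerf_rgb.shape
    yy, xx = np.mgrid[0:H, 0:W]
    lens = (xx - center[0]) ** 2 + (yy - center[1]) ** 2 <= radius ** 2
    nerf_wins = nerf_depth < raster_depth            # per-pixel depth test
    use_nerf = lens & nerf_wins                      # NeRF shown only inside the lens, and only if closer
    return np.where(use_nerf[..., None], nerf_rgb, raster_rgb)

out = lens_composite(
    np.random.rand(64, 64, 3), np.random.rand(64, 64),
    np.random.rand(64, 64, 3), np.random.rand(64, 64),
    center=(32, 32), radius=16,
)
```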
Recent advances in representation learning have demonstrated an ability to represent information from different modalities such as video, text, and audio in a single high-level embedding vector. In this work, we present a self-supervised learning framework that learns a representation capturing finer levels of granularity across modalities, such as concepts or events represented by visual objects or spoken words. Our framework relies on a discretized embedding space, created via vector quantization, that is shared across the different modalities. Beyond the shared embedding space, we propose a Cross-Modal Code Matching objective that forces the representations from different views (modalities) to have similar distributions over the discrete embedding space, such that cross-modal object/action localization can be performed without direct supervision. In our experiments, we show that the proposed discretized multi-modal fine-grained representation (e.g., pixel/word/frame) can complement high-level summary representations (e.g., video/sentence/waveform) for improved performance on cross-modal retrieval tasks. We also observe that the discretized representation uses individual clusters to represent the same semantic concept across modalities.
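A minimal sketch of the two ingredients, assuming a PyTorch setup, is given below: a codebook shared across modalities, and a loss that encourages the two modalities' distributions over codes to agree. The codebook size, feature dimension, and the symmetric KL divergence used here are illustrative choices rather than the paper's exact objective.

```python
# Sketch of a shared codebook plus a cross-modal code-distribution matching loss.
import torch
import torch.nn.functional as F

codebook = torch.nn.Embedding(512, 256)        # 512 discrete codes of dim 256, shared by all modalities

def quantize(features):                         # features: (B, T, 256)
    logits = features @ codebook.weight.t()     # similarity of each frame/token to each code
    idx = logits.argmax(dim=-1)                 # nearest-code assignment
    return codebook(idx), idx

def code_matching_loss(video_feats, audio_feats, temperature=1.0):
    # Average soft assignment over codes per modality, then a symmetric KL between the two.
    pv = F.softmax(video_feats @ codebook.weight.t() / temperature, dim=-1).mean(dim=1)
    pa = F.softmax(audio_feats @ codebook.weight.t() / temperature, dim=-1).mean(dim=1)
    kl = lambda p, q: (p * (p.clamp_min(1e-8).log() - q.clamp_min(1e-8).log())).sum(-1)
    return (kl(pv, pa) + kl(pa, pv)).mean()

loss = code_matching_loss(torch.randn(2, 10, 256), torch.randn(2, 30, 256))
```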
Most object recognition approaches predominantly focus on learning discriminative visual patterns while overlooking the holistic object structure. Though important, structure modeling usually requires significant manual annotations and is therefore labor-intensive. In this paper, we propose to "look into object" (explicitly yet intrinsically model the object structure) by incorporating self-supervision into the traditional framework. We show that the recognition backbone can be substantially enhanced for more robust representation learning, without any extra annotation cost or inference-time overhead. Specifically, we first propose an object-extent learning module that localizes the object according to the visual patterns shared among instances of the same category. We then design a spatial context learning module that models the internal structure of the object by predicting relative positions within the extent. These two modules can easily be plugged into any backbone network during training and detached at inference time. Extensive experiments show that our look-into-object approach (LIO) achieves large performance gains on a number of benchmarks, including generic object recognition (ImageNet) and fine-grained object recognition tasks (CUB, Cars, Aircraft). We also show that this learning paradigm is highly generalizable to other tasks such as object detection and segmentation (MS COCO). Project page: //github.com/JDAI-CV/LIO.
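The spatial context idea can be sketched as a small auxiliary task: features from two spatial locations are concatenated and a lightweight head regresses their normalized relative offset. The pairing scheme, head design, and sizes below are assumptions made for illustration, not the LIO modules themselves.

```python
# Sketch of a relative-position prediction auxiliary loss over backbone feature maps.
import torch
import torch.nn as nn

class RelativePositionSketch(nn.Module):
    def __init__(self, dim=256):
        super().__init__()
        self.head = nn.Sequential(nn.Linear(2 * dim, dim), nn.ReLU(), nn.Linear(dim, 2))

    def forward(self, feat_map):                      # (B, C, H, W) backbone features
        B, C, H, W = feat_map.shape
        flat = feat_map.flatten(2).transpose(1, 2)    # (B, H*W, C)
        i = torch.randint(0, H * W, (B,))             # one random location pair per sample
        j = torch.randint(0, H * W, (B,))
        pair = torch.cat([flat[torch.arange(B), i], flat[torch.arange(B), j]], dim=-1)
        pred = self.head(pair)                        # predicted normalized (dy, dx)
        target = torch.stack([(i // W - j // W) / H, (i % W - j % W) / W], dim=-1).float()
        return nn.functional.mse_loss(pred, target)   # auxiliary self-supervised loss

loss = RelativePositionSketch()(torch.randn(4, 256, 14, 14))
```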
This work addresses the novel and challenging problem of estimating the full 3D hand shape and pose from a single RGB image. Most current methods for 3D hand analysis from monocular RGB images focus only on estimating the 3D locations of hand keypoints, which cannot fully express the 3D shape of the hand. In contrast, we propose a Graph Convolutional Neural Network (Graph CNN) based method to reconstruct a full 3D mesh of the hand surface, which contains richer information about both 3D hand shape and pose. To train the networks with full supervision, we create a large-scale synthetic dataset containing both ground-truth 3D meshes and 3D poses. When fine-tuning the networks on real-world datasets without 3D ground truth, we propose a weakly supervised approach that leverages the depth map as a weak supervision signal during training. Through extensive evaluations on our proposed new datasets and two public datasets, we show that our method produces accurate and reasonable 3D hand meshes and achieves superior 3D hand pose estimation accuracy compared with state-of-the-art methods.
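The weak depth supervision can be sketched at the vertex level: predicted mesh vertices are projected with camera intrinsics and their depths are compared to the reference depth map at the projected pixels. Real pipelines render the full mesh with a differentiable renderer; the intrinsics and the vertex-level loss below are illustrative assumptions.

```python
# Simplified vertex-level sketch of using a depth map as weak supervision for a predicted hand mesh.
import torch
import torch.nn.functional as F

def depth_weak_loss(verts, depth_map, fx=500.0, fy=500.0, cx=64.0, cy=64.0):
    """verts: (V, 3) predicted vertices in camera space; depth_map: (H, W) reference depth."""
    H, W = depth_map.shape
    u = (fx * verts[:, 0] / verts[:, 2] + cx).clamp(0, W - 1).long()   # pinhole projection to pixels
    v = (fy * verts[:, 1] / verts[:, 2] + cy).clamp(0, H - 1).long()
    observed = depth_map[v, u]                          # depth at the projected pixels
    valid = observed > 0                                # ignore holes in the sensor map
    return F.smooth_l1_loss(verts[valid, 2], observed[valid])

loss = depth_weak_loss(torch.rand(778, 3) + 0.5, torch.rand(128, 128) + 0.5)
```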