This paper proposes a novel concept of a hybrid tactile display with multistimulus feedback, allowing real-time experience of the position, shape, and texture of a virtual object. The key technology of TeslaMirror is that it delivers the sensation of object parameters (pressure, vibration, and electrotactile feedback) without any wearable haptic devices. We developed a full digital twin of the 6-DOF UR robot in a virtual reality (VR) environment, allowing adaptive surface simulation and control of the hybrid display in real time. A preliminary user study was conducted to evaluate the ability of TeslaMirror to reproduce shape sensations with the under-actuated end-effector. The results revealed that this approach can potentially be used in virtual systems for rendering versatile VR shapes with a high-fidelity haptic experience.

Related content

Robots · MoDELS · Multimodal · Round · Generative AI

January 29, 2024

CognitiveOS: Large Multimodal Model based System to Endow Any Type of Robot with Generative AI

Artem Lykov,Mikhail Konenkov,Koffivi Fidèle Gbagbe,Mikhail Litvinov,Robinroy Peter,Denis Davletshin,Aleksey Fedoseev,Oleg Kobzarev,Ali Alabbas,Oussama Alyounes,Miguel Altamirano Cabrera,Dzmitry Tsetserukou

from arxiv, Paper submitted to CHI 2024

This paper introduces CognitiveOS, a disruptive system based on multiple transformer-based models, endowing robots of various types with cognitive abilities not only for communication with humans but also for task resolution through physical interaction with the environment. The system operates smoothly on different robotic platforms without extra tuning. It autonomously makes decisions for task execution by analyzing the environment and using information from its long-term memory. The system underwent testing on various platforms, including quadruped robots and manipulator robots, showcasing its capability to formulate behavioral plans even for robots whose behavioral examples were absent in the training dataset. Experimental results revealed the system's high performance in advanced task comprehension and adaptability, emphasizing its potential for real-world applications. The chapters of this paper describe the key components of the system and the dataset structure. The dataset for fine-tuning step generation model is provided at the following link: link coming soon

Controller · MoDELS · Attention · Sparse · Predictor/decision function

January 29, 2024

Motion-I2V: Consistent and Controllable Image-to-Video Generation with Explicit Motion Modeling

Xiaoyu Shi,Zhaoyang Huang,Fu-Yun Wang,Weikang Bian,Dasong Li,Yi Zhang,Manyuan Zhang,Ka Chun Cheung,Simon See,Hongwei Qin,Jifeng Dai,Hongsheng Li

We introduce Motion-I2V, a novel framework for consistent and controllable image-to-video generation (I2V). In contrast to previous methods that directly learn the complicated image-to-video mapping, Motion-I2V factorizes I2V into two stages with explicit motion modeling. For the first stage, we propose a diffusion-based motion field predictor, which focuses on deducing the trajectories of the reference image's pixels. For the second stage, we propose motion-augmented temporal attention to enhance the limited 1-D temporal attention in video latent diffusion models. This module can effectively propagate the reference image's features to synthesized frames with the guidance of the predicted trajectories from the first stage. Compared with existing methods, Motion-I2V can generate more consistent videos even in the presence of large motion and viewpoint variation. By training a sparse trajectory ControlNet for the first stage, Motion-I2V lets users precisely control motion trajectories and motion regions with sparse trajectory and region annotations. This offers more controllability of the I2V process than relying solely on textual instructions. Additionally, Motion-I2V's second stage naturally supports zero-shot video-to-video translation. Both qualitative and quantitative comparisons demonstrate the advantages of Motion-I2V over prior approaches in consistent and controllable image-to-video generation.

INTERACT · Reducible · MoDELS · Integration · Generative models

January 28, 2024

IntentTuner: An Interactive Framework for Integrating Human Intents in Fine-tuning Text-to-Image Generative Models

Xingchen Zeng,Ziyao Gao,Yilin Ye,Wei Zeng

from arxiv, 26 pages, 10 figures

Fine-tuning facilitates the adaptation of text-to-image generative models to novel concepts (e.g., styles and portraits), empowering users to forge creatively customized content. Recent efforts on fine-tuning focus on reducing training data and lightening computation overload but neglect alignment with user intentions, particularly in manual curation of multi-modal training data and intent-oriented evaluation. Informed by a formative study with fine-tuning practitioners for comprehending user intentions, we propose IntentTuner, an interactive framework that intelligently incorporates human intentions throughout each phase of the fine-tuning workflow. IntentTuner enables users to articulate training intentions with imagery exemplars and textual descriptions, automatically converting them into effective data augmentation strategies. Furthermore, IntentTuner introduces novel metrics to measure user intent alignment, allowing intent-aware monitoring and evaluation of model training. Application exemplars and user studies demonstrate that IntentTuner streamlines fine-tuning, reducing cognitive effort and yielding superior models compared to the common baseline tool.

Multimodal · Integration · Performer · HTTPS · System design

January 26, 2024

RABBIT: A Robot-Assisted Bed Bathing System with Multimodal Perception and Integrated Compliance

Rishabh Madan,Skyler Valdez,David Kim,Sujie Fang,Luoyan Zhong,Diego Virtue,Tapomayukh Bhattacharjee

from arxiv, 10 pages, 8 figures, 19th Annual ACM/IEEE International Conference on Human Robot Interaction (HRI)

This paper introduces RABBIT, a novel robot-assisted bed bathing system designed to address the growing need for assistive technologies in personal hygiene tasks. It combines multimodal perception and dual (software and hardware) compliance to perform safe and comfortable physical human-robot interaction. Using RGB and thermal imaging to segment dry, soapy, and wet skin regions accurately, RABBIT can effectively execute washing, rinsing, and drying tasks in line with expert caregiving practices. Our system includes custom-designed motion primitives inspired by human caregiving techniques, and a novel compliant end-effector called Scrubby, optimized for gentle and effective interactions. We conducted a user study with 12 participants, including one participant with severe mobility limitations, demonstrating the system's effectiveness and perceived comfort. Supplementary material and videos can be found on our website //emprise.cs.cornell.edu/rabbit.

Storage · Code · Loss · binary · Performer

January 26, 2024

Shift-Interleave Coding for DNA-Based Storage: Correction of IDS Errors and Sequence Losses

Ryo Shibata,Haruhiko Kaneko

from arxiv, submitted to IEEE conference

We propose a novel coding scheme for DNA-based storage systems, called the shift-interleave (SI) coding, designed to correct insertion, deletion, and substitution (IDS) errors, as well as sequence losses. The SI coding scheme employs multiple codewords from two binary low-density parity-check codes. These codewords are processed to form DNA base sequences through shifting, bit-to-base mapping, and interleaving. At the receiver side, an efficient non-iterative detection and decoding scheme is employed to sequentially estimate codewords. The numerical results demonstrate the excellent performance of the SI coding scheme in correcting both IDS errors and sequence losses.

HTTPS · cancer · Weight · MoDELS · Processing (programming language)

January 25, 2024

MicroSegNet: A Deep Learning Approach for Prostate Segmentation on Micro-Ultrasound Images

Hongxu Jiang,Muhammad Imran,Preethika Muralidharan,Anjali Patel,Jake Pensa,Muxuan Liang,Tarik Benidir,Joseph R. Grajo,Jason P. Joseph,Russell Terry,John Michael DiBianco,Li-Ming Su,Yuyin Zhou,Wayne G. Brisbane,Wei Shao

Micro-ultrasound (micro-US) is a novel 29-MHz ultrasound technique that provides 3-4 times higher resolution than traditional ultrasound, potentially enabling low-cost, accurate diagnosis of prostate cancer. Accurate prostate segmentation is crucial for prostate volume measurement, cancer diagnosis, prostate biopsy, and treatment planning. However, prostate segmentation on micro-US is challenging due to artifacts and indistinct borders between the prostate, bladder, and urethra in the midline. This paper presents MicroSegNet, a multi-scale annotation-guided transformer UNet model designed specifically to tackle these challenges. During the training process, MicroSegNet focuses more on regions that are hard to segment (hard regions), characterized by discrepancies between expert and non-expert annotations. We achieve this by proposing an annotation-guided binary cross entropy (AG-BCE) loss that assigns a larger weight to prediction errors in hard regions and a lower weight to prediction errors in easy regions. The AG-BCE loss was seamlessly integrated into the training process through the utilization of multi-scale deep supervision, enabling MicroSegNet to capture global contextual dependencies and local information at various scales. We trained our model using micro-US images from 55 patients, followed by evaluation on 20 patients. Our MicroSegNet model achieved a Dice coefficient of 0.939 and a Hausdorff distance of 2.02 mm, outperforming several state-of-the-art segmentation methods, as well as three human annotators with different experience levels. Our code is publicly available at //github.com/mirthAI/MicroSegNet and our dataset is publicly available at //zenodo.org/records/10475293.
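
The idea of weighting errors more heavily in hard regions can be illustrated with a minimal sketch. It is not the authors' AG-BCE implementation (that is in the linked repository); it only assumes, following the abstract, that a pixel is "hard" when the expert and non-expert annotations disagree. The function names, the flat-list pixel layout, the weight values, and the weighted-mean normalization are all hypothetical choices made for illustration. A plain Dice coefficient, the overlap metric reported above, is included for reference.

```python
import math


def ag_bce_sketch(probs, expert, non_expert,
                  hard_weight=4.0, easy_weight=1.0, eps=1e-7):
    """Weighted binary cross entropy over a flat list of pixels.

    A pixel counts as "hard" when the expert and non-expert labels
    disagree. The weights and the weighted-mean normalization are
    assumptions for illustration, not values from the paper.
    """
    total = 0.0
    weight_sum = 0.0
    for p, y, y_ne in zip(probs, expert, non_expert):
        p = min(max(p, eps), 1.0 - eps)  # keep log() finite
        w = hard_weight if y != y_ne else easy_weight
        bce = -(y * math.log(p) + (1 - y) * math.log(1.0 - p))
        total += w * bce
        weight_sum += w
    return total / weight_sum


def dice_coefficient(pred, target):
    """Dice overlap between two binary masks given as flat lists."""
    intersection = sum(1 for a, b in zip(pred, target) if a and b)
    size = sum(1 for a in pred if a) + sum(1 for b in target if b)
    return 1.0 if size == 0 else 2.0 * intersection / size


# Toy example: the second pixel is "hard" (annotators disagree).
probs = [0.9, 0.6, 0.2, 0.1]
expert = [1, 1, 0, 0]
non_expert = [1, 0, 0, 0]

plain = ag_bce_sketch(probs, expert, non_expert, hard_weight=1.0)
weighted = ag_bce_sketch(probs, expert, non_expert, hard_weight=4.0)
```

In this toy example the hard pixel also has the largest per-pixel error, so raising `hard_weight` increases the loss and pushes training toward that region.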

MoDELS · Identifiable · INFORMS · PAR · AIM

January 25, 2024

Genie: Achieving Human Parity in Content-Grounded Datasets Generation

Asaf Yehudai,Boaz Carmeli,Yosi Mass,Ofir Arviv,Nathaniel Mills,Assaf Toledo,Eyal Shnarch,Leshem Choshen

from arxiv, Accepted to ICLR24

The lack of high-quality data for content-grounded generation tasks has been identified as a major obstacle to advancing these tasks. To address this gap, we propose Genie, a novel method for automatically generating high-quality content-grounded data. It consists of three stages: (a) content preparation; (b) generation, which creates task-specific examples from the content (e.g., question-answer pairs or summaries); and (c) a filtering mechanism that aims to ensure the quality and faithfulness of the generated data. We showcase this methodology by generating three large-scale synthetic datasets, making wishes, for Long-Form Question-Answering (LFQA), summarization, and information extraction. In a human evaluation, our generated data was found to be natural and of high quality. Furthermore, we compare models trained on our data with models trained on human-written data -- ELI5 and ASQA for LFQA and CNN-DailyMail for Summarization. We show that our models are on par with or outperform models trained on human-generated data, and consistently outperform them in faithfulness. Finally, we applied our method to create LFQA data within the medical domain and compared a model trained on it with models trained on other domains.

Datasets · LIDAR · Analysis · 3D reconstruction · 3D

January 25, 2024

GauU-Scene: A Scene Reconstruction Benchmark on Large Scale 3D Reconstruction Dataset Using Gaussian Splatting

Butian Xiong,Zhuo Li,Zhen Li

from arxiv, IJCAI2024 submit, 8 pages

We introduce a novel large-scale scene reconstruction benchmark using the newly developed 3D representation approach, Gaussian Splatting, on our expansive U-Scene dataset. U-Scene encompasses over one and a half square kilometres, featuring a comprehensive RGB dataset coupled with LiDAR ground truth. For data acquisition, we employed the Matrix 300 drone equipped with the high-accuracy Zenmuse L1 LiDAR, enabling precise rooftop data collection. This dataset offers a unique blend of urban and academic environments for advanced spatial analysis and covers more than 1.5 km$^2$. Our evaluation of U-Scene with Gaussian Splatting includes a detailed analysis across various novel viewpoints. We also juxtapose these results with those derived from our accurate point cloud dataset, highlighting significant differences that underscore the importance of combining multi-modal information.

Generalization theory · Pyramid · Video classification · domain shift · Networking

September 17, 2021

VideoDG: Generalizing Temporal Relations in Videos to Novel Domains

Zhiyu Yao,Yunbo Wang,Jianmin Wang,Philip S. Yu,Mingsheng Long

from arxiv, Accepted by IEEE TPAMI, 2021. Code: //github.com/thuml/VideoDG

This paper introduces video domain generalization, a setting in which most video classification networks degenerate due to the lack of exposure to target domains with divergent distributions. We observe that global temporal features are less generalizable because of temporal domain shift: videos from unseen domains may have an unexpected absence or misalignment of temporal relations. This finding has motivated us to solve video domain generalization by effectively learning local-relation features of different timescales, which are more generalizable, and exploiting them along with global-relation features to maintain discriminability. This paper presents the VideoDG framework with two technical contributions. The first is a new deep architecture named the Adversarial Pyramid Network, which improves the generalizability of video features by progressively capturing local-relation, global-relation, and cross-relation features. Building on the pyramid features, the second contribution is a new and robust approach to adversarial data augmentation that can bridge different video domains by improving the diversity and quality of augmented data. We construct three video domain generalization benchmarks in which domains are divided according to different datasets, different consequences of actions, or different camera views, respectively. VideoDG consistently outperforms the combinations of previous video classification models and existing domain generalization methods on all benchmarks.

Loss function (machine learning) · Learning to learn · Learned · entity · Functional

September 9, 2019

Learning to Learn and Predict: A Meta-Learning Approach for Multi-Label Classification

Jiawei Wu,Wenhan Xiong,William Yang Wang

from arxiv, 11 pages, 5 figures, accepted to EMNLP 2019

Many tasks in natural language processing can be viewed as multi-label classification problems. However, most of the existing models are trained with the standard cross-entropy loss function and use a fixed prediction policy (e.g., a threshold of 0.5) for all the labels, which completely ignores the complexity and dependencies among different labels. In this paper, we propose a meta-learning method to capture these complex label dependencies. More specifically, our method utilizes a meta-learner to jointly learn the training policies and prediction policies for different labels. The training policies are then used to train the classifier with the cross-entropy loss function, and the prediction policies are further implemented for prediction. Experimental results on fine-grained entity typing and text classification demonstrate that our proposed method can obtain more accurate multi-label classification results.
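
To make the contrast between a fixed prediction policy and label-specific ones concrete, here is a minimal sketch. It does not implement the paper's meta-learner: the per-label thresholds below are hand-picked, hypothetical values standing in for whatever a learned prediction policy would produce, and the label names and scores are invented for illustration.

```python
def predict_labels(scores, thresholds=None, default_threshold=0.5):
    """Return the labels whose score reaches that label's threshold.

    scores: dict mapping label -> predicted probability.
    thresholds: optional dict mapping label -> threshold; labels not
        listed fall back to default_threshold (the fixed policy).
    """
    thresholds = thresholds or {}
    return [
        label
        for label, score in scores.items()
        if score >= thresholds.get(label, default_threshold)
    ]


# Hypothetical classifier outputs for one entity mention.
scores = {"person": 0.93, "artist": 0.46, "politician": 0.52}

# Fixed policy: the same 0.5 threshold for every label.
fixed = predict_labels(scores)

# Label-specific policy: hypothetical thresholds, not learned here.
per_label = predict_labels(scores, {"artist": 0.4, "politician": 0.6})
```

With the fixed threshold the borderline label `politician` is accepted and `artist` is rejected; the label-specific thresholds reverse both decisions, which is the kind of per-label behavior a learned prediction policy can express.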