国产乱伦对白刺激视频,天天射天天色综合,久久久久亚州A无码专区首,国产欧美在线观看亚洲一区

Various heuristic objectives for modeling hand-object interaction have been proposed in past work. However, due to the lack of a cohesive framework, these objectives often possess a narrow scope of applicability and are limited by their efficiency or accuracy. In this paper, we propose HandyPriors, a unified and general pipeline for pose estimation in human-object interaction scenes by leveraging recent advances in differentiable physics and rendering. Our approach employs rendering priors to align with input images and segmentation masks along with physics priors to mitigate penetration and relative-sliding across frames. Furthermore, we present two alternatives for hand and object pose estimation. The optimization-based pose estimation achieves higher accuracy, while the filtering-based tracking, which utilizes the differentiable priors as dynamics and observation models, executes faster. We demonstrate that HandyPriors attains comparable or superior results in the pose estimation task, and that the differentiable physics module can predict contact information for pose refinement. We also show that our approach generalizes to perception tasks, including robotic hand manipulation and human-object pose estimation in the wild.

相關內容

INTERACT

關注 5

IFIP TC13 Conference on Human-Computer Interaction是人機交互領域的研究者和實踐者展示其工作的重要平臺。多年來，這些會議吸引了來自幾個國家和文化的研究人員。官網鏈接： · INFORMS · MoDELS · 結點 · 層 ·

2024 年 2 月 13 日

HyperBERT: Mixing Hypergraph-Aware Layers with Language Models for Node Classification on Text-Attributed Hypergraphs

Adrián Bazaga,Pietro Liò,Gos Micklem

from arxiv, 11 pages, 2 figures

Hypergraphs are marked by complex topology, expressing higher-order interactions among multiple entities with hyperedges. Lately, hypergraph-based deep learning methods to learn informative data representations for the problem of node classification on text-attributed hypergraphs have garnered increasing research attention. However, existing methods struggle to simultaneously capture the full extent of hypergraph structural information and the rich linguistic attributes inherent in the nodes attributes, which largely hampers their effectiveness and generalizability. To overcome these challenges, we explore ways to further augment a pretrained BERT model with specialized hypergraph-aware layers for the task of node classification. Such layers introduce higher-order structural inductive bias into the language model, thus improving the model's capacity to harness both higher-order context information from the hypergraph structure and semantic information present in text. In this paper, we propose a new architecture, HyperBERT, a mixed text-hypergraph model which simultaneously models hypergraph relational structure while maintaining the high-quality text encoding capabilities of a pre-trained BERT. Notably, HyperBERT presents results that achieve a new state-of-the-art on five challenging text-attributed hypergraph node classification benchmarks.

離散化 · MoDELS · UniFormer · Continuity · Markov ·

2024 年 2 月 12 日

Convergence Analysis of Discrete Diffusion Model: Exact Implementation through Uniformization

Hongrui Chen,Lexing Ying

from arxiv, 19 pages

Diffusion models have achieved huge empirical success in data generation tasks. Recently, some efforts have been made to adapt the framework of diffusion models to discrete state space, providing a more natural approach for modeling intrinsically discrete data, such as language and graphs. This is achieved by formulating both the forward noising process and the corresponding reversed process as Continuous Time Markov Chains (CTMCs). In this paper, we investigate the theoretical properties of the discrete diffusion model. Specifically, we introduce an algorithm leveraging the uniformization of continuous Markov chains, implementing transitions on random time points. Under reasonable assumptions on the learning of the discrete score function, we derive Total Variation distance and KL divergence guarantees for sampling from any distribution on a hypercube. Our results align with state-of-the-art achievements for diffusion models in $\mathbb{R}^d$ and further underscore the advantages of discrete diffusion models in comparison to the $\mathbb{R}^d$ setting.

MoDELS · 語言模型化 · 設計 · Performer · 任務對話系統 ·

2024 年 2 月 12 日

Prismatic VLMs: Investigating the Design Space of Visually-Conditioned Language Models

Siddharth Karamcheti,Suraj Nair,Ashwin Balakrishna,Percy Liang,Thomas Kollar,Dorsa Sadigh

from arxiv, 22 pages, 11 figures. Training code and models: //github.com/TRI-ML/prismatic-vlms. Evaluation code: //github.com/TRI-ML/vlm-evaluation

Visually-conditioned language models (VLMs) have seen growing adoption in applications such as visual dialogue, scene understanding, and robotic task planning; adoption that has fueled a wealth of new models such as LLaVa, InstructBLIP, and PaLI-3. Despite the volume of new releases, key design decisions around image preprocessing, architecture, and optimization are under-explored, making it challenging to understand what factors account for model performance $-$ a challenge further complicated by the lack of objective, consistent evaluations. To address these gaps, we first compile a suite of standardized evaluations spanning visual question answering, object localization from language, and targeted challenge sets that probe properties such as hallucination; evaluations that provide calibrated, fine-grained insight into a VLM's capabilities. Second, we rigorously investigate VLMs along key design axes, including pretrained visual representations and quantifying the tradeoffs of using base vs. instruct-tuned language models, amongst others. We couple our analysis with three resource contributions: (1) a unified framework for evaluating VLMs, (2) optimized, flexible code for VLM training, and (3) checkpoints for all models, including a family of VLMs at the 7-13B scale that strictly outperform InstructBLIP and LLaVa v1.5, the state-of-the-art in open-source VLMs.

語言模型化 · 大語言模型 · MoDELS · Extensibility · Performer ·

2024 年 2 月 12 日

PROXYQA: An Alternative Framework for Evaluating Long-Form Text Generation with Large Language Models

Haochen Tan,Zhijiang Guo,Zhan Shi,Lu Xu,Zhili Liu,Yunlong Feng,Xiaoguang Li,Yasheng Wang,Lifeng Shang,Qun Liu,Linqi Song

Large Language Models (LLMs) have exhibited remarkable success in long-form context comprehension tasks. However, their capacity to generate long contents, such as reports and articles, remains insufficiently explored. Current benchmarks do not adequately assess LLMs' ability to produce informative and comprehensive content, necessitating a more rigorous evaluation approach. In this study, we introduce \textsc{ProxyQA}, a framework for evaluating long-form text generation, comprising in-depth human-curated \textit{meta-questions} spanning various domains. Each meta-question contains corresponding \textit{proxy-questions} with annotated answers. LLMs are prompted to generate extensive content in response to these meta-questions. Utilizing an evaluator and incorporating generated content as background context, \textsc{ProxyQA} evaluates the quality of generated content based on the evaluator's performance in answering the \textit{proxy-questions}. We examine multiple LLMs, emphasizing \textsc{ProxyQA}'s demanding nature as a high-quality assessment tool. Human evaluation demonstrates that evaluating through \textit{proxy-questions} is a highly self-consistent and human-criteria-correlated validation method. The dataset and leaderboard will be available at \url{//github.com/Namco0816/ProxyQA}.

置換 · MoDELS · F分數 · Performer · Better ·

2024 年 2 月 12 日

ProLex: A Benchmark for Language Proficiency-oriented Lexical Substitution

Xuanming Zhang,Zixun Chen,Zhou Yu

from arxiv, 19 pages, 4 figures

Lexical Substitution discovers appropriate substitutes for a given target word in a context sentence. However, the task fails to consider substitutes that are of equal or higher proficiency than the target, an aspect that could be beneficial for language learners looking to improve their writing. To bridge this gap, we propose a new task, language proficiency-oriented lexical substitution. We also introduce ProLex, a novel benchmark designed to assess systems' ability to generate not only appropriate substitutes but also substitutes that demonstrate better language proficiency. Besides the benchmark, we propose models that can automatically perform the new task. We show that our best model, a Llama2-13B model fine-tuned with task-specific synthetic data, outperforms ChatGPT by an average of 3.2% in F-score and achieves comparable results with GPT-4 on ProLex.

Learning · 機器人 · 數據集 · Vision · 可約的 ·

2024 年 2 月 11 日

Learning by Watching: A Review of Video-based Learning Approaches for Robot Manipulation

Chrisantus Eze,Christopher Crick

Robot learning of manipulation skills is hindered by the scarcity of diverse, unbiased datasets. While curated datasets can help, challenges remain in generalizability and real-world transfer. Meanwhile, large-scale "in-the-wild" video datasets have driven progress in computer vision through self-supervised techniques. Translating this to robotics, recent works have explored learning manipulation skills by passively watching abundant videos sourced online. Showing promising results, such video-based learning paradigms provide scalable supervision while reducing dataset bias. This survey reviews foundations such as video feature representation learning techniques, object affordance understanding, 3D hand/body modeling, and large-scale robot resources, as well as emerging techniques for acquiring robot manipulation skills from uncontrolled video demonstrations. We discuss how learning only from observing large-scale human videos can enhance generalization and sample efficiency for robotic manipulation. The survey summarizes video-based learning approaches, analyses their benefits over standard datasets, survey metrics, and benchmarks, and discusses open challenges and future directions in this nascent domain at the intersection of computer vision, natural language processing, and robot learning.

泛函 · Performer · 變換 · 縮放 · 編程語言 ·

2024 年 2 月 10 日

Taypsi: Static Enforcement of Privacy Policies for Policy-Agnostic Oblivious Computation

Qianchuan Ye,Benjamin Delaware

Secure multiparty computation (MPC) techniques enable multiple parties to compute joint functions over their private data without sharing that data to other parties, typically by employing powerful cryptographic protocols to protect individual's data. One challenge when writing such functions is that most MPC languages force users to intermix programmatic and privacy concerns in a single application, making it difficult to change or audit a program's underlying privacy policy. Existing policy-agnostic MPC languages rely on run-time / dynamic enforcement to decouple privacy requirements from program logic. Unfortunately, the resulting overhead makes it difficult to scale MPC applications that manipulate structured data. This work proposes to eliminate this overhead by instead transforming programs to semantically equivalent versions that statically enforce user-provided privacy policies. We have implemented this approach in a new MPC language, called Taypsi; our experimental evaluation demonstrates that the resulting system features considerable performance improvements on a variety of MPC applications involving structured data and complex privacy polices.

INTERACT · NCRF · Learning · Extensibility · 3D ·

2024 年 2 月 9 日

NCRF: Neural Contact Radiance Fields for Free-Viewpoint Rendering of Hand-Object Interaction

Zhongqun Zhang,Jifei Song,Eduardo Pérez-Pellitero,Yiren Zhou,Hyung Jin Chang,Ale? Leonardis

from arxiv, Accepted by 3DV 2024

Modeling hand-object interactions is a fundamentally challenging task in 3D computer vision. Despite remarkable progress that has been achieved in this field, existing methods still fail to synthesize the hand-object interaction photo-realistically, suffering from degraded rendering quality caused by the heavy mutual occlusions between the hand and the object, and inaccurate hand-object pose estimation. To tackle these challenges, we present a novel free-viewpoint rendering framework, Neural Contact Radiance Field (NCRF), to reconstruct hand-object interactions from a sparse set of videos. In particular, the proposed NCRF framework consists of two key components: (a) A contact optimization field that predicts an accurate contact field from 3D query points for achieving desirable contact between the hand and the object. (b) A hand-object neural radiance field to learn an implicit hand-object representation in a static canonical space, in concert with the specifically designed hand-object motion field to produce observation-to-canonical correspondences. We jointly learn these key components where they mutually help and regularize each other with visual and geometric constraints, producing a high-quality hand-object reconstruction that achieves photo-realistic novel view synthesis. Extensive experiments on HO3D and DexYCB datasets show that our approach outperforms the current state-of-the-art in terms of both rendering quality and pose estimation accuracy.

INTERACT · 穩健性 · MoDELS · Performer · Extensibility ·

2024 年 2 月 9 日

TETRIS: Towards Exploring the Robustness of Interactive Segmentation

Andrey Moskalenko,Vlad Shakhuro,Anna Vorontsova,Anton Konushin,Anton Antonov,Alexander Krapukhin,Denis Shepelev,Konstantin Soshin

from arxiv, Accepted by AAAI2024

Interactive segmentation methods rely on user inputs to iteratively update the selection mask. A click specifying the object of interest is arguably the most simple and intuitive interaction type, and thereby the most common choice for interactive segmentation. However, user clicking patterns in the interactive segmentation context remain unexplored. Accordingly, interactive segmentation evaluation strategies rely more on intuition and common sense rather than empirical studies (e.g., assuming that users tend to click in the center of the area with the largest error). In this work, we conduct a real user study to investigate real user clicking patterns. This study reveals that the intuitive assumption made in the common evaluation strategy may not hold. As a result, interactive segmentation models may show high scores in the standard benchmarks, but it does not imply that they would perform well in a real world scenario. To assess the applicability of interactive segmentation methods, we propose a novel evaluation strategy providing a more comprehensive analysis of a model's performance. To this end, we propose a methodology for finding extreme user inputs by a direct optimization in a white-box adversarial attack on the interactive segmentation model. Based on the performance with such adversarial user inputs, we assess the robustness of interactive segmentation models w.r.t click positions. Besides, we introduce a novel benchmark for measuring the robustness of interactive segmentation, and report the results of an extensive evaluation of dozens of models.

學成 · Machine Learning · INTERACT · 圖 · INFORMS ·

2021 年 5 月 27 日

Graph-Based Deep Learning for Medical Diagnosis and Analysis: Past, Present and Future

David Ahmedt-Aristizabal,Mohammad Ali Armin,Simon Denman,Clinton Fookes,Lars Petersson

With the advances of data-driven machine learning research, a wide variety of prediction problems have been tackled. It has become critical to explore how machine learning and specifically deep learning methods can be exploited to analyse healthcare data. A major limitation of existing methods has been the focus on grid-like data; however, the structure of physiological recordings are often irregular and unordered which makes it difficult to conceptualise them as a matrix. As such, graph neural networks have attracted significant attention by exploiting implicit information that resides in a biological system, with interactive nodes connected by edges whose weights can be either temporal associations or anatomical junctions. In this survey, we thoroughly review the different types of graph architectures and their applications in healthcare. We provide an overview of these methods in a systematic manner, organized by their domain of application including functional connectivity, anatomical structure and electrical-based analysis. We also outline the limitations of existing techniques and discuss potential directions for future research.