
We present DR-HAI -- a novel argumentation-based framework designed to extend model reconciliation approaches, commonly used in human-aware planning, for enhanced human-AI interaction. By adopting an argumentation-based dialogue paradigm, DR-HAI enables interactive reconciliation to address knowledge discrepancies between an explainer and an explainee. We formally describe the operational semantics of DR-HAI, provide theoretical guarantees, and empirically evaluate its efficacy. Our findings suggest that DR-HAI offers a promising direction for fostering effective human-AI interactions.

Related content

INTERACT


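The IFIP TC13 Conference on Human-Computer Interaction is an important venue where researchers and practitioners in human-computer interaction present their work. Over the years, these conferences have attracted researchers from several countries and cultures.

NeRF · Extensibility · EASE · motivation ·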

October 16, 2023

DynVideo-E: Harnessing Dynamic NeRF for Large-Scale Motion- and View-Change Human-Centric Video Editing

Jia-Wei Liu,Yan-Pei Cao,Jay Zhangjie Wu,Weijia Mao,Yuchao Gu,Rui Zhao,Jussi Keppo,Ying Shan,Mike Zheng Shou

from arxiv, Project Page: //showlab.github.io/DynVideo-E/

Despite remarkable research advances in diffusion-based video editing, existing methods are limited to short videos because long-range consistency conflicts with frame-wise editing. Recent approaches attempt to tackle this challenge by introducing video-2D representations that reduce video editing to image editing. However, they encounter significant difficulties in handling videos with large-scale motion and view changes, especially human-centric videos. This motivates us to introduce dynamic Neural Radiance Fields (NeRF) as the human-centric video representation, recasting the video editing problem as a 3D space editing task. Editing can then be performed in the 3D spaces and propagated to the entire video via the deformation field. To provide finer and directly controllable editing, we propose an image-based 3D space editing pipeline with a set of effective designs. These include multi-view multi-pose Score Distillation Sampling (SDS) from both 2D personalized diffusion priors and 3D diffusion priors, reconstruction losses on the reference image, text-guided local parts super-resolution, and style transfer for the 3D background space. Extensive experiments demonstrate that our method, dubbed DynVideo-E, significantly outperforms SOTA approaches on two challenging datasets by a large margin of 50% to 95% in terms of human preference. Compelling video comparisons are provided on the project page //showlab.github.io/DynVideo-E/. Our code and data will be released to the community.

TOOLS · Language modeling · MoDELS · Agent · Knowledge ·

October 16, 2023

BiLL-VTG: Bridging Large Language Models and Lightweight Visual Tools for Video-based Texts Generation

Ji Qi,Kaixuan Ji,Jifan Yu,Duokang Wang,Bin Xu,Lei Hou,Juanzi Li

Building models that generate textual responses to user instructions for videos is a practical and challenging topic, as it requires both vision understanding and knowledge reasoning. Compared to language and image modalities, training efficiency remains a serious problem, as existing studies train models on massive sparse videos aligned with brief descriptions. In this paper, we introduce BiLL-VTG, a fast adaptive framework that leverages large language models (LLMs) to reason over videos using essential lightweight visual tools. Specifically, we show that the key to responding to specific instructions is concentrating on the relevant video events, and we use two visual tools, structured scene graph generation and descriptive image caption generation, to gather and represent the event information. An LLM equipped with world knowledge is then adopted as the reasoning agent, producing the response by performing multiple reasoning steps on the specified video events. To address the difficulty the agent has in specifying events, we further propose an Instruction-oriented Video Events Recognition (InsOVER) algorithm, based on efficient Hungarian matching, that localizes the corresponding video events using linguistic instructions, enabling LLMs to interact with long videos. Extensive experiments on two typical video-based text generation tasks show that our tuning-free framework outperforms pre-trained models, including Flamingo-80B, and achieves state-of-the-art performance.
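The abstract does not spell out how InsOVER works, so the following is only a minimal sketch of the assignment problem that Hungarian matching solves: pairing each instruction phrase with one video event so that the total cost is minimal. The cost matrix and the function name `best_assignment` are hypothetical, and a brute-force search over permutations stands in for the Hungarian algorithm, which reaches the same optimum in polynomial time.

```python
from itertools import permutations


def best_assignment(cost):
    """Return (total_cost, assignment) minimising sum(cost[i][assignment[i]]).

    Brute force over all permutations, so only suitable for tiny square
    matrices. The Hungarian algorithm finds the same optimum efficiently.
    """
    n = len(cost)
    best_total, best_perm = None, None
    for perm in permutations(range(n)):
        total = sum(cost[i][perm[i]] for i in range(n))
        if best_total is None or total < best_total:
            best_total, best_perm = total, perm
    return best_total, list(best_perm)


# Hypothetical costs: rows are instruction phrases, columns are video events.
# A lower cost means the phrase and the event are a better match.
cost = [
    [0.9, 0.1, 0.5],
    [0.2, 0.8, 0.7],
    [0.6, 0.4, 0.3],
]
total, assignment = best_assignment(cost)
```

Here `assignment[i]` is the index of the event matched to phrase `i`. For realistic sizes, a polynomial-time solver would replace the permutation loop.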

INFORMS · Guidance · Sequence labeling · HTTPS · Task-oriented dialogue systems ·

October 16, 2023

DemoNSF: A Multi-task Demonstration-based Generative Framework for Noisy Slot Filling Task

Guanting Dong,Tingfeng Hui,Zhuoma GongQue,Jinxu Zhao,Daichi Guo,Gang Zhao,Keqing He,Weiran Xu

from arxiv, Findings of EMNLP 2023 (Short Paper)

Recently, prompt-based generative frameworks have shown impressive capabilities in sequence labeling tasks. However, in practical dialogue scenarios, relying solely on simplistic templates and traditional corpora presents a challenge for these methods in generalizing to unknown input perturbations. To address this gap, we propose a multi-task demonstration based generative framework for noisy slot filling, named DemoNSF. Specifically, we introduce three noisy auxiliary tasks, namely noisy recovery (NR), random mask (RM), and hybrid discrimination (HD), to implicitly capture semantic structural information of input perturbations at different granularities. In the downstream main task, we design a noisy demonstration construction strategy for the generative framework, which explicitly incorporates task-specific information and perturbed distribution during training and inference. Experiments on two benchmarks demonstrate that DemoNSF outperforms all baseline methods and achieves strong generalization. Further analysis provides empirical guidance for the practical application of generative frameworks. Our code is released at //github.com/dongguanting/Demo-NSF.

Language modeling · MoDELS · Inference · Diversity · Reducible ·

October 15, 2023

Chameleon: a Heterogeneous and Disaggregated Accelerator System for Retrieval-Augmented Language Models

Wenqi Jiang,Marco Zeller,Roger Waleffe,Torsten Hoefler,Gustavo Alonso

A Retrieval-Augmented Language Model (RALM) augments a generative language model by retrieving context-specific knowledge from an external database. This strategy yields impressive text generation quality even with smaller models, reducing computational demands by orders of magnitude. However, RALMs introduce unique system design challenges due to (a) the diverse workload characteristics of LM inference and retrieval and (b) the varying system requirements and bottlenecks of different RALM configurations, such as model sizes, database sizes, and retrieval frequencies. We propose Chameleon, a heterogeneous accelerator system that integrates both LM and retrieval accelerators in a disaggregated architecture. The heterogeneity ensures efficient acceleration of both LM inference and retrieval, while the accelerator disaggregation enables the system to scale both types of accelerators independently to fulfill diverse RALM requirements. Our Chameleon prototype implements retrieval accelerators on FPGAs and assigns LM inference to GPUs, with a CPU server orchestrating these accelerators over the network. Compared to CPU-based and CPU-GPU vector search systems, Chameleon achieves up to 23.72x speedup and 26.2x energy efficiency. Evaluated on various RALMs, Chameleon exhibits up to 2.16x reduction in latency and 3.18x speedup in throughput compared to the hybrid CPU-GPU architecture. These promising results pave the way for bringing accelerator heterogeneity and disaggregation into future RALM systems.
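To make the two workloads concrete, here is a minimal sketch of the retrieval half of a RALM: an exhaustive inner-product search over a small vector database, followed by prepending the retrieved passages to the prompt. All names (`top_k`, `build_prompt`) and the toy vectors are hypothetical and unrelated to Chameleon's implementation, which runs vector search on FPGAs.

```python
def top_k(query, database, k):
    """Return indices of the k database vectors with the largest inner product."""
    scores = [sum(q * d for q, d in zip(query, vec)) for vec in database]
    order = sorted(range(len(database)), key=lambda i: scores[i], reverse=True)
    return order[:k]


def build_prompt(question, passages):
    """Prepend the retrieved passages to the question, one per line."""
    return "\n".join(passages) + "\n" + question


# Hypothetical two-dimensional embeddings and their source passages.
vectors = [[1.0, 0.0], [0.0, 1.0], [0.7, 0.7]]
texts = ["passage A", "passage B", "passage C"]

hits = top_k([1.0, 0.2], vectors, k=2)
prompt = build_prompt("question?", [texts[i] for i in hits])
```

The search is memory-bound and scales with database size, while the language model step that consumes `prompt` is compute-bound, which is the kind of workload mismatch the abstract describes.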

Agent · Processing (programming language) · Extensibility · INTERACT · MoDELS ·

October 14, 2023

How AI Processing Delays Foster Creativity: Exploring Research Question Co-Creation with an LLM-based Agent

Yiren Liu,Si Chen,Haocong Cheng,Mengxia Yu,Xiao Ran,Andrew Mo,Yiliu Tang,Yun Huang

Developing novel research questions (RQs) often requires extensive literature reviews, especially in interdisciplinary fields. Leveraging Large Language Models (LLMs), we built an LLM-based agent system, called CoQuest, that supports RQ development through human-AI co-creation. We conducted an experiment with 20 participants to examine the effect of two interaction designs: breadth-first and depth-first RQ generation. The results showed that participants found the breadth-first approach more creative and trustworthy upon task completion. However, during the task, they rated the RQs generated through the depth-first approach as more creative. We also discovered that AI processing delays allowed users to contemplate multiple RQs simultaneously, resulting in more generated RQs and an increased sense of perceived control. Our work makes both theoretical and practical contributions by proposing and assessing a mental model for human-AI co-creation of RQs.

Performer · Reducible · Transformation · Inference · Tokenizer ·

October 13, 2023

PIM-GPT: A Hybrid Process-in-Memory Accelerator for Autoregressive Transformers

Yuting Wu,Ziyu Wang,Wei D. Lu

Decoder-only Transformer models such as GPT have demonstrated superior performance in text generation by autoregressively predicting the next token. However, the performance of GPT is bounded by a low compute-to-memory ratio and high memory access. Throughput-oriented architectures such as GPUs target parallel processing rather than sequential token generation, and are not efficient for GPT acceleration, particularly for on-device inference applications. Process-in-memory (PIM) architectures can significantly reduce data movement and provide high computation parallelism, and are promising candidates to accelerate GPT inference. In this work, we propose PIM-GPT, which aims to achieve high throughput, high energy efficiency, and end-to-end acceleration of GPT inference. PIM-GPT leverages DRAM-based PIM solutions to perform multiply-accumulate (MAC) operations on the DRAM chips, greatly reducing data movement. A compact application-specific integrated chip (ASIC) is designed and synthesized to issue instructions to the PIM chips and to support data communication along with the necessary arithmetic computations. At the software level, the mapping scheme is designed to maximize data locality and computation parallelism by partitioning a matrix among DRAM channels and banks so that all in-bank computation resources are used concurrently. We develop an event-driven, clock-cycle-accurate simulator to validate the efficacy of the proposed PIM-GPT architecture. Overall, PIM-GPT achieves 41-137x and 631-1074x speedup, and 339-1085x and 890-1632x energy efficiency, over GPU and CPU baselines, respectively, on 8 GPT models with up to 1.4 billion parameters.
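The idea of partitioning a matrix so that every bank performs its own multiply-accumulate work can be sketched in a few lines. This is an illustrative assumption, not the paper's mapping scheme: rows are split into contiguous chunks, each hypothetical "bank" computes the dot products for its rows, and the partial outputs are concatenated. The names `partition_rows`, `bank_mac`, and `matvec_banked` are invented for the example.

```python
def partition_rows(num_rows, num_banks):
    """Split row indices into num_banks contiguous, near-equal chunks."""
    base, extra = divmod(num_rows, num_banks)
    chunks, start = [], 0
    for bank in range(num_banks):
        size = base + (1 if bank < extra else 0)
        chunks.append(range(start, start + size))
        start += size
    return chunks


def bank_mac(matrix, vector, rows):
    """Multiply-accumulate for the rows held by a single bank."""
    out = []
    for r in rows:
        acc = 0
        for weight, x in zip(matrix[r], vector):
            acc += weight * x
        out.append(acc)
    return out


def matvec_banked(matrix, vector, num_banks):
    """Matrix-vector product assembled from per-bank partial results."""
    result = []
    for rows in partition_rows(len(matrix), num_banks):
        # On real PIM hardware these calls would run concurrently, one per bank.
        result.extend(bank_mac(matrix, vector, rows))
    return result


matrix = [[1, 2], [3, 4], [5, 6], [7, 8], [9, 10]]
vector = [1, -1]
y = matvec_banked(matrix, vector, num_banks=2)
```

Because each bank only reads rows it already stores, the weights never leave the bank; only the input vector and the short output slices move, which is the data-movement saving the abstract refers to.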

Legged Robot · Path · Performer · Robot · FAST ·

October 11, 2023

DiPPeR: Diffusion-based 2D Path Planner applied on Legged Robots

Jianwei Liu,Maria Stamatopoulou,Dimitrios Kanoulas

from arxiv, 7 pages, 9 figures

In this work, we present DiPPeR, a novel and fast 2D path planning framework for quadrupedal locomotion, leveraging diffusion-driven techniques. Our contributions include a scalable dataset of map images and corresponding end-to-end trajectories, an image-conditioned diffusion planner for mobile robots, and a training/inference pipeline employing CNNs. We validate our approach in several mazes, as well as in real-world deployment scenarios on Boston Dynamics' Spot and Unitree's Go1 robots. On average, DiPPeR generates trajectories 70 times faster than both search-based and data-driven path planning algorithms, with an average of 80% consistency in producing feasible paths of various lengths in maps of variable size and obstacle structure.

Text classification · Annotation · Extensibility · state-of-the-art · Regularization term ·

February 15, 2021

MATCH: Metadata-Aware Text Classification in A Large Hierarchy

Yu Zhang,Zhihong Shen,Yuxiao Dong,Kuansan Wang,Jiawei Han

from arxiv, 12 pages; Accepted to WWW 2021

Multi-label text classification refers to the problem of assigning each given document its most relevant labels from the label set. Commonly, the metadata of the given documents and the hierarchy of the labels are available in real-world applications. However, most existing studies focus on only modeling the text information, with a few attempts to utilize either metadata or hierarchy signals, but not both of them. In this paper, we bridge the gap by formalizing the problem of metadata-aware text classification in a large label hierarchy (e.g., with tens of thousands of labels). To address this problem, we present the MATCH solution -- an end-to-end framework that leverages both metadata and hierarchy information. To incorporate metadata, we pre-train the embeddings of text and metadata in the same space and also leverage the fully-connected attentions to capture the interrelations between them. To leverage the label hierarchy, we propose different ways to regularize the parameters and output probability of each child label by its parents. Extensive experiments on two massive text datasets with large-scale label hierarchies demonstrate the effectiveness of MATCH over state-of-the-art deep learning baselines.
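The abstract says only that each child label's parameters and output probability are regularized by its parents. One plausible form, assumed here for illustration and not confirmed by the abstract, penalizes a child whose predicted probability exceeds its parent's and pulls each child's weight vector toward its parent's. The function names, label names, and numbers below are hypothetical.

```python
def output_penalty(probs, parents):
    """Sum of max(0, p_child - p_parent) over all (child, parent) pairs."""
    total = 0.0
    for child, parent_list in parents.items():
        for parent in parent_list:
            total += max(0.0, probs[child] - probs[parent])
    return total


def parameter_penalty(weights, parents):
    """Half the squared L2 distance between each child's and parent's weights."""
    total = 0.0
    for child, parent_list in parents.items():
        for parent in parent_list:
            diff = zip(weights[child], weights[parent])
            total += 0.5 * sum((c - p) ** 2 for c, p in diff)
    return total


# Hypothetical three-level hierarchy: each key maps a label to its parents.
parents = {
    "deep_learning": ["machine_learning"],
    "machine_learning": ["computer_science"],
}
probs = {"computer_science": 0.9, "machine_learning": 0.5, "deep_learning": 0.75}
weights = {
    "computer_science": [1.0, 0.0],
    "machine_learning": [1.0, 1.0],
    "deep_learning": [2.0, 1.0],
}

out_pen = output_penalty(probs, parents)        # only deep_learning violates
par_pen = parameter_penalty(weights, parents)
```

In training, terms like these would be added to the classification loss with tunable coefficients, so that predictions tend to respect the hierarchy without being forced to.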

MoDELS · CLUES · INTERACT · GPU · Neural Networks ·

January 28, 2021

A Graph-based Relevance Matching Model for Ad-hoc Retrieval

Yufeng Zhang,Jinghao Zhang,Zeyu Cui,Shu Wu,Liang Wang

from arxiv, To appear at AAAI 2021

To retrieve more relevant, appropriate and useful documents given a query, finding clues about that query through the text is crucial. Recent deep learning models regard the task as a term-level matching problem, which seeks exact or similar query patterns in the document. However, we argue that they are inherently based on local interactions and do not generalise to ubiquitous, non-consecutive contextual relationships. In this work, we propose a novel relevance matching model based on graph neural networks to leverage the document-level word relationships for ad-hoc retrieval. In addition to the local interactions, we explicitly incorporate all contexts of a term through the graph-of-word text format. Matching patterns can be revealed accordingly to provide a more accurate relevance score. Our approach significantly outperforms strong baselines on two ad-hoc benchmarks. We also experimentally compare our model with BERT and show our advantages on long documents.
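A graph-of-word representation can be sketched as follows, assuming the common construction in which each unique word is a node and two words are linked when they co-occur within a sliding window. The window size, the function name `graph_of_word`, and the sample sentence are assumptions for illustration; the paper's exact construction may differ.

```python
def graph_of_word(tokens, window=3):
    """Build an undirected word graph as an adjacency dict of sets.

    Two distinct words are linked when they appear within `window`
    consecutive tokens of each other. Repeated words share a single node.
    """
    adjacency = {token: set() for token in tokens}
    for i, token in enumerate(tokens):
        for j in range(i + 1, min(i + window, len(tokens))):
            other = tokens[j]
            if other != token:
                adjacency[token].add(other)
                adjacency[other].add(token)
    return adjacency


tokens = "graph neural networks match query terms with graph context".split()
adjacency = graph_of_word(tokens, window=3)
```

Because both occurrences of "graph" map to one node, that node gathers neighbours from two distant positions in the text, which is how the format exposes non-consecutive context that a purely local matching window would miss.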

entity · Link prediction · Extensibility · Graph · Knowledge graph ·

March 13, 2019

MMKG: Multi-Modal Knowledge Graphs

Ye Liu,Hui Li,Alberto Garcia-Duran,Mathias Niepert,Daniel Onoro-Rubio,David S. Rosenblum

from arxiv, ESWC 2019

We present MMKG, a collection of three knowledge graphs that contain both numerical features and (links to) images for all entities as well as entity alignments between pairs of KGs. Therefore, multi-relational link prediction and entity matching communities can benefit from this resource. We believe this data set has the potential to facilitate the development of novel multi-modal learning approaches for knowledge graphs. We validate the utility of MMKG in the sameAs link prediction task with an extensive set of experiments. These experiments show that the task at hand benefits from learning of multiple feature types.