黄片一级在线视频播放_午夜男女爽爽爽免费大片_黄色网站欧美内射_一本久久A久久精品综合夜色_中文字幕亚洲欧美精品无限乱码_日本在线观看免费高清I_人人摸人人添人人干

The key of Human-Object Interaction(HOI) recognition is to infer the relationship between human and objects. Recently, the image's Human-Object Interaction(HOI) detection has made significant progress. However, there is still room for improvement in video HOI detection performance. Existing one-stage methods use well-designed end-to-end networks to detect a video segment and directly predict an interaction. It makes the model learning and further optimization of the network more complex. This paper introduces the Spatial Parsing and Dynamic Temporal Pooling (SPDTP) network, which takes the entire video as a spatio-temporal graph with human and object nodes as input. Unlike existing methods, our proposed network predicts the difference between interactive and non-interactive pairs through explicit spatial parsing, and then performs interaction recognition. Moreover, we propose a learnable and differentiable Dynamic Temporal Module(DTM) to emphasize the keyframes of the video and suppress the redundant frame. Furthermore, the experimental results show that SPDTP can pay more attention to active human-object pairs and valid keyframes. Overall, we achieve state-of-the-art performance on CAD-120 dataset and Something-Else dataset.

相關內容

INTERACT

關注 5

IFIP TC13 Conference on Human-Computer Interaction是人機交互領域的研究者和實踐者展示其工作的重要平臺。多年來，這些會議吸引了來自幾個國家和文化的研究人員。官網鏈接： · INFORMS · 圖 · 變換 · Learning ·

2022 年 7 月 25 日

IGFormer: Interaction Graph Transformer for Skeleton-based Human Interaction Recognition

Yunsheng Pang,Qiuhong Ke,Hossein Rahmani,James Bailey,Jun Liu

from arxiv, Accepted by ECCV 2022

Human interaction recognition is very important in many applications. One crucial cue in recognizing an interaction is the interactive body parts. In this work, we propose a novel Interaction Graph Transformer (IGFormer) network for skeleton-based interaction recognition via modeling the interactive body parts as graphs. More specifically, the proposed IGFormer constructs interaction graphs according to the semantic and distance correlations between the interactive body parts, and enhances the representation of each person by aggregating the information of the interactive body parts based on the learned graphs. Furthermore, we propose a Semantic Partition Module to transform each human skeleton sequence into a Body-Part-Time sequence to better capture the spatial and temporal information of the skeleton sequence for learning the graphs. Extensive experiments on three benchmark datasets demonstrate that our model outperforms the state-of-the-art with a significant margin.

Attention · 相關系數 · Performer · 變換 · 塊 ·

2022 年 7 月 22 日

AiATrack: Attention in Attention for Transformer Visual Tracking

Shenyuan Gao,Chunluan Zhou,Chao Ma,Xinggang Wang,Junsong Yuan

from arxiv, Accepted by ECCV 2022. Code and models are publicly available at //github.com/Little-Podi/AiATrack

Transformer trackers have achieved impressive advancements recently, where the attention mechanism plays an important role. However, the independent correlation computation in the attention mechanism could result in noisy and ambiguous attention weights, which inhibits further performance improvement. To address this issue, we propose an attention in attention (AiA) module, which enhances appropriate correlations and suppresses erroneous ones by seeking consensus among all correlation vectors. Our AiA module can be readily applied to both self-attention blocks and cross-attention blocks to facilitate feature aggregation and information propagation for visual tracking. Moreover, we propose a streamlined Transformer tracking framework, dubbed AiATrack, by introducing efficient feature reuse and target-background embeddings to make full use of temporal references. Experiments show that our tracker achieves state-of-the-art performance on six tracking benchmarks while running at a real-time speed.

簇 · 異常檢測 · prototype · Extensibility · Networking ·

2022 年 7 月 22 日

Dynamic Local Aggregation Network with Adaptive Clusterer for Anomaly Detection

Zhiwei Yang,Peng Wu,Jing Liu,Xiaotao Liu

from arxiv, Accepted by ECCV 2022

Existing methods for anomaly detection based on memory-augmented autoencoder (AE) have the following drawbacks: (1) Establishing a memory bank requires additional memory space. (2) The fixed number of prototypes from subjective assumptions ignores the data feature differences and diversity. To overcome these drawbacks, we introduce DLAN-AC, a Dynamic Local Aggregation Network with Adaptive Clusterer, for anomaly detection. First, The proposed DLAN can automatically learn and aggregate high-level features from the AE to obtain more representative prototypes, while freeing up extra memory space. Second, The proposed AC can adaptively cluster video data to derive initial prototypes with prior information. In addition, we also propose a dynamic redundant clustering strategy (DRCS) to enable DLAN for automatically eliminating feature clusters that do not contribute to the construction of prototypes. Extensive experiments on benchmarks demonstrate that DLAN-AC outperforms most existing methods, validating the effectiveness of our method. Our code is publicly available at //github.com/Beyond-Zw/DLAN-AC.

MoDELS · state-of-the-art · 視頻分類 · Extensibility · Networking ·

2021 年 1 月 5 日

MVFNet: Multi-View Fusion Network for Efficient Video Recognition

Wenhao Wu,Dongliang He,Tianwei Lin,Fu Li,Chuang Gan,Errui Ding

from arxiv, Accepted by AAAI2021

Conventionally, spatiotemporal modeling network and its complexity are the two most concentrated research topics in video action recognition. Existing state-of-the-art methods have achieved excellent accuracy regardless of the complexity meanwhile efficient spatiotemporal modeling solutions are slightly inferior in performance. In this paper, we attempt to acquire both efficiency and effectiveness simultaneously. First of all, besides traditionally treating H x W x T video frames as space-time signal (viewing from the Height-Width spatial plane), we propose to also model video from the other two Height-Time and Width-Time planes, to capture the dynamics of video thoroughly. Secondly, our model is designed based on 2D CNN backbones and model complexity is well kept in mind by design. Specifically, we introduce a novel multi-view fusion (MVF) module to exploit video dynamics using separable convolution for efficiency. It is a plug-and-play module and can be inserted into off-the-shelf 2D CNNs to form a simple yet effective model called MVFNet. Moreover, MVFNet can be thought of as a generalized video modeling framework and it can specialize to be existing methods such as C2D, SlowOnly, and TSM under different settings. Extensive experiments are conducted on popular benchmarks (i.e., Something-Something V1 & V2, Kinetics, UCF-101, and HMDB-51) to show its superiority. The proposed MVFNet can achieve state-of-the-art performance with 2D CNN's complexity.

圖 · Networking · INTERACT · INFORMS · 圖形處理器 ·

2020 年 11 月 25 日

Time-Series Event Prediction with Evolutionary State Graph

Wenjie Hu,Yang Yang,Ziqiang Cheng,Carl Yang,Xiang Ren

from arxiv, A long version of EvoNet (WSDM 2021)

The accurate and interpretable prediction of future events in time-series data often requires the capturing of representative patterns (or referred to as states) underpinning the observed data. To this end, most existing studies focus on the representation and recognition of states, but ignore the changing transitional relations among them. In this paper, we present evolutionary state graph, a dynamic graph structure designed to systematically represent the evolving relations (edges) among states (nodes) along time. We conduct analysis on the dynamic graphs constructed from the time-series data and show that changes on the graph structures (e.g., edges connecting certain state nodes) can inform the occurrences of events (i.e., time-series fluctuation). Inspired by this, we propose a novel graph neural network model, Evolutionary State Graph Network (EvoNet), to encode the evolutionary state graph for accurate and interpretable time-series event prediction. Specifically, Evolutionary State Graph Network models both the node-level (state-to-state) and graph-level (segment-to-segment) propagation, and captures the node-graph (state-to-segment) interactions over time. Experimental results based on five real-world datasets show that our approach not only achieves clear improvements compared with 11 baselines, but also provides more insights towards explaining the results of event predictions.

圖 · Networking · 學成 · Performer · 深度學習 ·

2020 年 10 月 9 日

Temporal Graph Networks for Deep Learning on Dynamic Graphs

Emanuele Rossi,Ben Chamberlain,Fabrizio Frasca,Davide Eynard,Federico Monti,Michael Bronstein

Graph Neural Networks (GNNs) have recently become increasingly popular due to their ability to learn complex systems of relations or interactions arising in a broad spectrum of problems ranging from biology and particle physics to social networks and recommendation systems. Despite the plethora of different models for deep learning on graphs, few approaches have been proposed thus far for dealing with graphs that present some sort of dynamic nature (e.g. evolving features or connectivity over time). In this paper, we present Temporal Graph Networks (TGNs), a generic, efficient framework for deep learning on dynamic graphs represented as sequences of timed events. Thanks to a novel combination of memory modules and graph-based operators, TGNs are able to significantly outperform previous approaches being at the same time more computationally efficient. We furthermore show that several previous models for learning on dynamic graphs can be cast as specific instances of our framework. We perform a detailed ablation study of different components of our framework and devise the best configuration that achieves state-of-the-art performance on several transductive and inductive prediction tasks for dynamic graphs.

小樣本學習 · 目標檢測 · Networking · 數據集 · 情景 ·

2020 年 3 月 31 日

Few-Shot Object Detection with Attention-RPN and Multi-Relation Detector

Qi Fan,Wei Zhuo,Chi-Keung Tang,Yu-Wing Tai

from arxiv, CVPR2020 Camera Ready. (Fix Figure 3 and Table 5. More implementation details in the supplementary material.)

Conventional methods for object detection typically require a substantial amount of training data and preparing such high-quality training data is very labor-intensive. In this paper, we propose a novel few-shot object detection network that aims at detecting objects of unseen categories with only a few annotated examples. Central to our method are our Attention-RPN, Multi-Relation Detector and Contrastive Training strategy, which exploit the similarity between the few shot support set and query set to detect novel objects while suppressing false detection in the background. To train our network, we contribute a new dataset that contains 1000 categories of various objects with high-quality annotations. To the best of our knowledge, this is one of the first datasets specifically designed for few-shot object detection. Once our few-shot network is trained, it can detect objects of unseen categories without further training or fine-tuning. Our method is general and has a wide range of potential applications. We produce a new state-of-the-art performance on different datasets in the few-shot setting. The dataset link is //github.com/fanq15/Few-Shot-Object-Detection-Dataset.

圖卷積神經網絡/圖卷積網絡 · 圖 · 簇 · 圖卷積 · 注意力機制 ·

2020 年 3 月 13 日

Adaptive Graph Convolutional Network with Attention Graph Clustering for Co-saliency Detection

Kaihua Zhang,Tengpeng Li,Shiwen Shen,Bo Liu,Jin Chen,Qingshan Liu

from arxiv, CVPR2020

Co-saliency detection aims to discover the common and salient foregrounds from a group of relevant images. For this task, we present a novel adaptive graph convolutional network with attention graph clustering (GCAGC). Three major contributions have been made, and are experimentally shown to have substantial practical merits. First, we propose a graph convolutional network design to extract information cues to characterize the intra- and interimage correspondence. Second, we develop an attention graph clustering algorithm to discriminate the common objects from all the salient foreground objects in an unsupervised fashion. Third, we present a unified framework with encoder-decoder structure to jointly train and optimize the graph convolutional network, attention graph cluster, and co-saliency detection decoder in an end-to-end manner. We evaluate our proposed GCAGC method on three cosaliency detection benchmark datasets (iCoseg, Cosal2015 and COCO-SEG). Our GCAGC method obtains significant improvements over the state-of-the-arts on most of them.

圖卷積神經網絡/圖卷積網絡 · 簇 · 圖卷積 · 圖 · Networking ·

2019 年 3 月 27 日

Linkage Based Face Clustering via Graph Convolution Network

Zhongdao Wang,Liang Zheng,Yali Li,Shengjin Wang

from arxiv, To appear in CVPR 2019

In this paper, we present an accurate and scalable approach to the face clustering task. We aim at grouping a set of faces by their potential identities. We formulate this task as a link prediction problem: a link exists between two faces if they are of the same identity. The key idea is that we find the local context in the feature space around an instance (face) contains rich information about the linkage relationship between this instance and its neighbors. By constructing sub-graphs around each instance as input data, which depict the local context, we utilize the graph convolution network (GCN) to perform reasoning and infer the likelihood of linkage between pairs in the sub-graphs. Experiments show that our method is more robust to the complex distribution of faces than conventional methods, yielding favorably comparable results to state-of-the-art methods on standard face clustering benchmarks, and is scalable to large datasets. Furthermore, we show that the proposed method does not need the number of clusters as prior, is aware of noises and outliers, and can be extended to a multi-view version for more accurate clustering accuracy.

MoDELS · 注意力機制 · RNN · 標注 · Networking ·

2017 年 12 月 20 日

Order-Free RNN with Visual Attention for Multi-Label Classification

Shang-Fu Chen,Yi-Chen Chen,Chih-Kuan Yeh,Yu-Chiang Frank Wang

from arxiv, Accepted at 32nd AAAI Conference on Artificial Intelligence (AAAI-18)

In this paper, we propose the joint learning attention and recurrent neural network (RNN) models for multi-label classification. While approaches based on the use of either model exist (e.g., for the task of image captioning), training such existing network architectures typically require pre-defined label sequences. For multi-label classification, it would be desirable to have a robust inference process, so that the prediction error would not propagate and thus affect the performance. Our proposed model uniquely integrates attention and Long Short Term Memory (LSTM) models, which not only addresses the above problem but also allows one to identify visual objects of interests with varying sizes without the prior knowledge of particular label ordering. More importantly, label co-occurrence information can be jointly exploited by our LSTM model. Finally, by advancing the technique of beam search, prediction of multiple labels can be efficiently achieved by our proposed network model.