Video instance segmentation aims to detect, segment, and track objects in a video. Current approaches extend image-level segmentation algorithms to the temporal domain. However, this results in temporally inconsistent masks. In this work, we identify temporally unstable mask quality as a performance bottleneck. Motivated by this, we propose a video instance segmentation method that alleviates the problem caused by missing detections. Since this cannot be solved using spatial information alone, we leverage temporal context via inter-frame attention. This allows our network to refocus on missing objects using box predictions from the neighbouring frame, thereby recovering missed detections. Our method significantly outperforms previous state-of-the-art algorithms using the Mask R-CNN backbone, achieving 35.1% mAP on the YouTube-VIS benchmark. Additionally, our method is completely online and requires no future frames. Our code is publicly available at //github.com/anirudh-chakravarthy/ObjProp.
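As a rough illustration of the inter-frame attention described above, the sketch below lets a box feature pooled from a neighbouring frame attend over the current frame's feature map to re-highlight a missed object; the module and variable names (InterFrameAttention, box_feat, cur_feat) are assumptions for illustration, not the authors' implementation.

```python
import torch
import torch.nn as nn

class InterFrameAttention(nn.Module):
    """Attend from a neighbouring-frame box feature to the current-frame features."""
    def __init__(self, dim=256):
        super().__init__()
        self.q = nn.Linear(dim, dim)      # query from the propagated box feature
        self.k = nn.Conv2d(dim, dim, 1)   # keys from the current feature map
        self.v = nn.Conv2d(dim, dim, 1)   # values from the current feature map

    def forward(self, box_feat, cur_feat):
        # box_feat: (B, C) pooled feature of a box detected in the neighbouring frame
        # cur_feat: (B, C, H, W) current-frame feature map
        B, C, H, W = cur_feat.shape
        q = self.q(box_feat).unsqueeze(1)                     # (B, 1, C)
        k = self.k(cur_feat).flatten(2).transpose(1, 2)       # (B, HW, C)
        v = self.v(cur_feat).flatten(2).transpose(1, 2)       # (B, HW, C)
        attn = torch.softmax(q @ k.transpose(1, 2) / C**0.5, dim=-1)   # (B, 1, HW)
        refocused = (attn.transpose(1, 2) * v).transpose(1, 2).reshape(B, C, H, W)
        return cur_feat + refocused       # attention re-emphasises the missing object
```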
Recently, transformer-based image segmentation methods have achieved notable success over previous solutions. For the video domain, however, how to effectively model temporal context by attending to object instances across frames remains an open problem. In this paper, we propose an online video instance segmentation framework with a novel instance-aware temporal fusion method. We first leverage a two-level representation: a latent code in the global context (instance code) for instance-level features and CNN feature maps for pixel-level features. Based on this representation, we introduce a cropping-free temporal fusion approach to model the temporal consistency between video frames. Specifically, we encode global instance-specific information in the instance code and build up inter-frame contextual fusion with hybrid attention between the instance codes and CNN feature maps. Inter-frame consistency between the instance codes is further enforced with order constraints. By leveraging the learned hybrid temporal consistency, we are able to directly retrieve and maintain instance identities across frames, eliminating the complicated frame-wise instance matching of prior methods. Extensive experiments have been conducted on popular VIS datasets, i.e., YouTube-VIS-19/21. Our model achieves the best performance among all online VIS methods. Notably, our model also eclipses all offline methods when using the ResNet-50 backbone.
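The hybrid attention between instance codes and CNN feature maps could look roughly like the following sketch, where the current frame's instance codes attend first to the previous frame's pixel features and then to its instance codes; all names (HybridTemporalFusion, codes_t, feat_prev) are illustrative assumptions rather than the paper's exact API.

```python
import torch
import torch.nn as nn

class HybridTemporalFusion(nn.Module):
    def __init__(self, dim=256, heads=8):
        super().__init__()
        # instance codes of the current frame attend to pixel features of the previous frame
        self.code_to_pixel = nn.MultiheadAttention(dim, heads, batch_first=True)
        # instance codes of the current frame attend to instance codes of the previous frame
        self.code_to_code = nn.MultiheadAttention(dim, heads, batch_first=True)

    def forward(self, codes_t, codes_prev, feat_prev):
        # codes_t, codes_prev: (B, N, C) instance codes; feat_prev: (B, C, H, W)
        pix = feat_prev.flatten(2).transpose(1, 2)                     # (B, HW, C)
        fused, _ = self.code_to_pixel(codes_t, pix, pix)               # pixel-level temporal context
        fused, _ = self.code_to_code(fused, codes_prev, codes_prev)    # instance-level temporal context
        return codes_t + fused    # temporally fused codes keep the same instance order as codes_t
```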
Semi-supervised video object segmentation is the task of segmenting the target object in a video sequence given only a mask annotation in the first frame. The limited information available makes it an extremely challenging task. Most previous best-performing methods adopt matching-based transductive reasoning or online inductive learning. Nevertheless, they are either less discriminative for similar instances or make insufficient use of spatio-temporal information. In this work, we propose to integrate transductive and inductive learning into a unified framework to exploit the complementarity between them for accurate and robust video object segmentation. The proposed approach consists of two functional branches. The transduction branch adopts a lightweight transformer architecture to aggregate rich spatio-temporal cues, while the induction branch performs online inductive learning to obtain discriminative target information. To bridge these two diverse branches, a two-head label encoder is introduced to learn a suitable target prior for each of them. The generated mask encodings are further forced to be disentangled to better retain their complementarity. Extensive experiments on several prevalent benchmarks show that, without the need for synthetic training data, the proposed approach sets a series of new state-of-the-art records. Code is available at //github.com/maoyunyao/JOINT.
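A minimal sketch of the two-head label encoder idea, assuming a shared convolutional stem with one head per branch and a simple decorrelation penalty standing in for the disentanglement constraint; this is not the released JOINT code, and all layer sizes are illustrative.

```python
import torch
import torch.nn as nn

class TwoHeadLabelEncoder(nn.Module):
    def __init__(self, dim=64):
        super().__init__()
        self.shared = nn.Sequential(nn.Conv2d(1, dim, 3, padding=1), nn.ReLU())
        self.transduction_head = nn.Conv2d(dim, dim, 3, padding=1)  # prior for the transformer branch
        self.induction_head = nn.Conv2d(dim, dim, 3, padding=1)     # prior for the online-learning branch

    def forward(self, mask):
        # mask: (B, 1, H, W) first-frame ground-truth mask
        h = self.shared(mask)
        enc_t = self.transduction_head(h)
        enc_i = self.induction_head(h)
        # a simple decorrelation penalty keeps the two mask encodings disentangled
        disentangle_loss = (enc_t * enc_i).mean().abs()
        return enc_t, enc_i, disentangle_loss
```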
Correlation plays a critical role in the tracking field, especially in recent popular Siamese-based trackers. The correlation operation is a simple fusion method for measuring the similarity between the template and the search region. However, correlation itself is a local linear matching process, which tends to lose semantic information and fall into local optima, and may therefore be the bottleneck in designing high-accuracy tracking algorithms. Is there a better feature fusion method than correlation? To address this issue, inspired by the Transformer, this work presents a novel attention-based feature fusion network, which effectively combines the template and search region features using attention alone. Specifically, the proposed method includes an ego-context augment module based on self-attention and a cross-feature augment module based on cross-attention. Finally, we present a Transformer tracking (named TransT) method based on a Siamese-like feature extraction backbone, the designed attention-based fusion mechanism, and classification and regression heads. Experiments show that our TransT achieves very promising results on six challenging datasets, especially on the large-scale LaSOT, TrackingNet, and GOT-10k benchmarks. Our tracker runs at approximately 50 fps on a GPU. Code and models are available at //github.com/chenxin-dlut/TransT.
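The attention-based fusion described here, an ego-context augment (self-attention) followed by a cross-feature augment (cross-attention) in place of correlation, can be sketched as below; positional encodings and the feed-forward layers used in the actual TransT are omitted, and the module names are illustrative.

```python
import torch
import torch.nn as nn

class AttentionFusion(nn.Module):
    def __init__(self, dim=256, heads=8):
        super().__init__()
        self.eca_t = nn.MultiheadAttention(dim, heads, batch_first=True)  # self-attention on template
        self.eca_s = nn.MultiheadAttention(dim, heads, batch_first=True)  # self-attention on search region
        self.cfa = nn.MultiheadAttention(dim, heads, batch_first=True)    # search attends to template

    def forward(self, template, search):
        # template: (B, Nt, C), search: (B, Ns, C) flattened backbone features
        t, _ = self.eca_t(template, template, template)   # ego-context augment of the template
        s, _ = self.eca_s(search, search, search)         # ego-context augment of the search region
        fused, _ = self.cfa(s, t, t)                      # cross-attention replaces correlation
        return search + fused                             # fused features feed the cls/reg heads
```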
Video instance segmentation (VIS) is the task of simultaneously classifying, segmenting, and tracking object instances of interest in video. Recent methods typically develop sophisticated pipelines to tackle this task. Here, we propose a new video instance segmentation framework built upon Transformers, termed VisTR, which views the VIS task as a direct end-to-end parallel sequence decoding/prediction problem. Given a video clip consisting of multiple image frames as input, VisTR directly outputs the sequence of masks for each instance in the video, in order. At the core is a new, effective instance sequence matching and segmentation strategy, which supervises and segments instances at the sequence level as a whole. VisTR frames instance segmentation and tracking under the same perspective of similarity learning, considerably simplifying the overall pipeline and differing significantly from existing approaches. Without bells and whistles, VisTR achieves the highest speed among all existing VIS models and the best result among methods using a single model on the YouTube-VIS dataset. For the first time, we demonstrate a much simpler and faster video instance segmentation framework built upon Transformers, achieving competitive accuracy. We hope that VisTR can motivate future research on more video understanding tasks.
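A hedged sketch of sequence-level instance matching: predicted and ground-truth instance sequences are matched as wholes with the Hungarian algorithm, here using only a sequence-level mask-IoU cost (the full VisTR cost also includes class and box terms); the function names are assumptions.

```python
import torch
from scipy.optimize import linear_sum_assignment

def sequence_iou(pred_masks, gt_masks, eps=1e-6):
    # pred_masks: (N, T, H, W) binary predictions; gt_masks: (M, T, H, W) ground truth
    p = pred_masks.flatten(1).float()            # (N, T*H*W)
    g = gt_masks.flatten(1).float()              # (M, T*H*W)
    inter = p @ g.t()
    union = p.sum(1, keepdim=True) + g.sum(1) - inter
    return inter / (union + eps)                 # (N, M) IoU computed over whole sequences

def match_sequences(pred_masks, gt_masks):
    cost = -sequence_iou(pred_masks, gt_masks)   # maximise sequence-level IoU
    row, col = linear_sum_assignment(cost.cpu().numpy())
    return list(zip(row.tolist(), col.tolist())) # one-to-one assignment of instance sequences
```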
This paper addresses the task of segmenting class-agnostic objects in a semi-supervised setting. Although previous detection-based methods achieve relatively good performance, these approaches select the best proposal with a greedy strategy, which may lose local patch details outside the chosen candidate. In this paper, we propose a novel spatiotemporal graph neural network (STG-Net) to reconstruct more accurate masks for video object segmentation, which captures local context by utilizing all proposals. In the spatial graph, we treat the object proposals of a frame as nodes and represent their correlations with an edge weight strategy for mask context aggregation. To capture temporal information from previous frames, we use a memory network to refine the mask of the current frame by retrieving historic masks in a temporal graph. The joint use of both local patch details and temporal relationships allows us to better address challenges such as object occlusion and missing objects. Without online learning or fine-tuning, our STG-Net achieves state-of-the-art performance on four large benchmarks (DAVIS, YouTube-VOS, SegTrack-v2, and YouTube-Objects), demonstrating the effectiveness of the proposed approach.
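The spatial graph aggregation could be sketched as follows, treating each proposal as a node and using cosine similarity of proposal features as an illustrative stand-in for the paper's edge weight strategy; this is a simplified sketch, not the STG-Net formulation.

```python
import torch
import torch.nn.functional as F

def aggregate_proposal_context(feats):
    # feats: (P, C) pooled features of P object proposals in one frame (graph nodes)
    sim = F.cosine_similarity(feats.unsqueeze(1), feats.unsqueeze(0), dim=-1)  # (P, P) edge scores
    sim.fill_diagonal_(float('-inf'))           # exclude self-edges from the softmax
    weights = torch.softmax(sim, dim=-1)        # normalised edge weights per node
    context = weights @ feats                   # weighted sum of neighbouring proposal features
    return feats + context                      # proposal features enriched with local context
```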
Video instance segmentation is a complex task in which we need to detect, segment, and track each object in a given video. Previous approaches utilize only single-frame features for the detection, segmentation, and tracking of objects, and they suffer in the video setting from distinct challenges such as motion blur and drastic appearance changes. To eliminate the ambiguities introduced by using only single-frame features, we propose a novel comprehensive feature aggregation approach (CompFeat) to refine features at both the frame level and the object level with temporal and spatial context information. The aggregation process is carefully designed with a new attention mechanism which significantly increases the discriminative power of the learned features. We further improve the tracking capability of our model through a Siamese design by incorporating both feature similarities and spatial similarities. Experiments conducted on the YouTube-VIS dataset validate the effectiveness of the proposed CompFeat. Our code will be available at //github.com/SHI-Labs/CompFeat-for-Video-Instance-Segmentation.
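A minimal sketch of the Siamese association step combining feature similarity with spatial (box-IoU) similarity, as the abstract describes; the equal weighting via alpha is an assumption, and the frame-/object-level attention aggregation is not shown.

```python
import torch
import torch.nn.functional as F

def box_iou(a, b):
    # a: (N, 4), b: (M, 4) boxes in (x1, y1, x2, y2) format
    tl = torch.max(a[:, None, :2], b[None, :, :2])
    br = torch.min(a[:, None, 2:], b[None, :, 2:])
    inter = (br - tl).clamp(min=0).prod(-1)
    area_a = (a[:, 2:] - a[:, :2]).prod(-1)
    area_b = (b[:, 2:] - b[:, :2]).prod(-1)
    return inter / (area_a[:, None] + area_b[None, :] - inter + 1e-6)

def association_scores(feat_new, feat_prev, box_new, box_prev, alpha=0.5):
    # feat_new: (N, C), feat_prev: (M, C) instance embeddings; box_new: (N, 4), box_prev: (M, 4)
    feat_sim = F.cosine_similarity(feat_new[:, None], feat_prev[None], dim=-1)  # feature similarity
    return alpha * feat_sim + (1 - alpha) * box_iou(box_new, box_prev)          # combined score (N, M)
```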
Video Object Segmentation (VOS) is typically formulated in a semi-supervised setting. Given the ground-truth segmentation mask of the first frame, the task of VOS is to track and segment the single or multiple objects of interest in the remaining frames of the video at the pixel level. One of the fundamental challenges in VOS is how to make the best use of temporal information to boost performance. We present an end-to-end network which stores short- and long-term video sequence information preceding the current frame as temporal memories to address temporal modeling in VOS. Our network consists of two temporal sub-networks: a short-term memory sub-network and a long-term memory sub-network. The short-term memory sub-network models the fine-grained spatial-temporal interactions between local regions across neighboring frames via a graph-based learning framework, which preserves the visual consistency of local regions over time. The long-term memory sub-network models the long-range evolution of the object via a Simplified-Gated Recurrent Unit (S-GRU), making the segmentation robust against occlusions and drift errors. In our experiments, we show that our proposed method achieves favorable and competitive performance, in terms of both speed and accuracy, on three frequently used VOS datasets: DAVIS 2016, DAVIS 2017, and YouTube-VOS.
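A plausible simplification of the long-term memory cell: a single-gate recurrent update over feature maps, shown below as an S-GRU-style sketch (not necessarily the paper's exact cell; layer shapes are assumptions).

```python
import torch
import torch.nn as nn

class SGRU(nn.Module):
    """Simplified gated recurrent unit over spatial feature maps."""
    def __init__(self, dim=256):
        super().__init__()
        self.gate = nn.Conv2d(2 * dim, dim, 3, padding=1)
        self.cand = nn.Conv2d(2 * dim, dim, 3, padding=1)

    def forward(self, x, h):
        # x: (B, C, H, W) current-frame features; h: (B, C, H, W) long-term memory
        z = torch.sigmoid(self.gate(torch.cat([x, h], dim=1)))      # update gate
        h_tilde = torch.tanh(self.cand(torch.cat([x, h], dim=1)))   # candidate memory
        return (1 - z) * h + z * h_tilde                            # updated long-term memory
```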
Video object segmentation (VOS) aims at pixel-level object tracking given only the annotations in the first frame. Due to the large visual variations of objects in video and the lack of training samples, it remains a difficult task despite the rapid development of deep learning. Toward solving the VOS problem, we bring several new insights through a unified framework consisting of object proposal, tracking, and segmentation components. The object proposal network transfers objectness information as generic knowledge into VOS; the tracking network identifies the target object from the proposals; and the segmentation network operates on the tracking results with a novel dynamic-reference-based model adaptation scheme. Extensive experiments have been conducted on the DAVIS'17 and YouTube-VOS datasets; our method achieves state-of-the-art performance on several video object segmentation benchmarks. We make the code publicly available at //github.com/sydney0zq/PTSNet.
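The proposal-track-segment pipeline with a dynamic reference can be sketched in Python as below; propose, track, and segment are placeholder callables standing in for the three networks, and the reference-update rule is an assumption for illustration.

```python
def segment_video(frames, first_frame_mask, propose, track, segment):
    """Run the proposal, tracking, and segmentation components frame by frame."""
    reference = first_frame_mask                 # dynamic reference used for model adaptation
    masks = [first_frame_mask]
    for frame in frames[1:]:
        proposals = propose(frame)               # class-agnostic object proposals
        target_box = track(frame, proposals, reference)   # pick the proposal matching the target
        mask = segment(frame, target_box, reference)      # segment within the tracked region
        reference = mask                         # update the dynamic reference for the next frame
        masks.append(mask)
    return masks
```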
State-of-the-art deep convolutional networks (DCNs) such as squeeze-and-excitation (SE) residual networks implement a form of attention, also known as contextual guidance, which is derived from global image features. Here, we explore a complementary form of attention, known as visual saliency, which is derived from local image features. We extend the SE module with a novel global-and-local attention (GALA) module which combines both forms of attention -- resulting in state-of-the-art accuracy on ILSVRC. We further describe ClickMe.ai, a large-scale online experiment designed for human participants to identify diagnostic image regions to co-train a GALA network. Adding humans in the loop is shown to significantly improve network accuracy, while also yielding visual features that are more interpretable and more similar to those used by human observers.
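A minimal sketch of a GALA-style block, assuming an SE-style global channel gate and a saliency-style local spatial gate combined additively before a sigmoid; the published module's exact combination may differ, and layer sizes are illustrative.

```python
import torch
import torch.nn as nn

class GALA(nn.Module):
    def __init__(self, channels, reduction=16):
        super().__init__()
        self.global_gate = nn.Sequential(              # SE-style global (channel) attention
            nn.AdaptiveAvgPool2d(1),
            nn.Conv2d(channels, channels // reduction, 1), nn.ReLU(),
            nn.Conv2d(channels // reduction, channels, 1))
        self.local_gate = nn.Sequential(               # saliency-style local (spatial) attention
            nn.Conv2d(channels, channels // reduction, 1), nn.ReLU(),
            nn.Conv2d(channels // reduction, 1, 1))

    def forward(self, x):
        g = self.global_gate(x)                        # (B, C, 1, 1) channel weights
        l = self.local_gate(x)                         # (B, 1, H, W) spatial saliency map
        attn = torch.sigmoid(g + l)                    # broadcast to (B, C, H, W)
        return x * attn
```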
Instance-level video object segmentation is an important technique for video editing and compression. To capture temporal coherence, in this paper we develop MaskRNN, a recurrent neural network approach which, in each frame, fuses the outputs of two deep nets for each object instance -- a binary segmentation net providing a mask and a localization net providing a bounding box. Owing to the recurrent component and the localization component, our method is able to take advantage of the long-term temporal structure of the video data as well as to reject outliers. We validate the proposed algorithm on three challenging benchmark datasets, the DAVIS-2016 dataset, the DAVIS-2017 dataset, and the SegTrack v2 dataset, achieving state-of-the-art performance on all of them.
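As a hedged illustration of the per-frame fusion of the two nets' outputs, the sketch below gates the binary mask prediction with the predicted bounding box to reject outliers; the recurrent component is omitted and all names are illustrative.

```python
import torch

def fuse_mask_and_box(mask_logits, box):
    # mask_logits: (H, W) output of the binary segmentation net for one instance
    # box: (x1, y1, x2, y2) output of the localization net for the same instance
    H, W = mask_logits.shape
    x1, y1, x2, y2 = [int(v) for v in box]
    box_mask = torch.zeros(H, W, dtype=mask_logits.dtype, device=mask_logits.device)
    box_mask[y1:y2, x1:x2] = 1.0                     # 1 inside the predicted box, 0 outside
    return torch.sigmoid(mask_logits) * box_mask     # pixels outside the box are rejected as outliers
```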