A common yet challenging scenario in periocular biometrics is cross-spectral matching, in particular the matching of visible-wavelength against near-infrared (NIR) periocular images. We propose a novel approach to cross-spectral periocular verification that primarily focuses on learning a mapping from visible and NIR periocular images to a shared latent representational subspace, and supports this effort by simultaneously learning intra-spectral image reconstruction. We show that the auxiliary image reconstruction task (in particular, the reconstruction of high-level semantic features) results in a more discriminative, domain-invariant subspace than the baseline, while incurring no additional computational or memory cost at test time. The proposed Coupled Conditional Generative Adversarial Network (CoGAN) architecture uses paired generator networks (one operating on visible images and the other on NIR images) composed of U-Nets with ResNet-18 encoders. These are trained for feature learning via a contrastive loss, and for intra-spectral image reconstruction with adversarial, pixel-based, and perceptual reconstruction losses. The proposed CoGAN model also outperforms the current state of the art (SotA) in cross-spectral periocular recognition. On the Hong Kong PolyU benchmark dataset, we achieve 98.65% AUC and 5.14% EER, compared to the SotA EER of 8.02%. On the Cross-Eyed dataset, we achieve 99.31% AUC and 3.99% EER, versus the SotA EER of 4.39%.
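The abstract trains the coupled encoders with a contrastive loss and reports verification quality as AUC and EER. The following is a minimal sketch of those two ingredients, not the CoGAN authors' code.

It makes several assumptions:

- It uses the classic margin-based pairwise contrastive loss (Hadsell, Chopra and LeCun) on Euclidean distances between a visible and an NIR embedding.
- It computes EER by a simple threshold sweep over similarity scores, where a higher score means "same identity".
- The function names and the default `margin` are hypothetical.

```python
import math

def euclidean(a, b):
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def contrastive_loss(z_vis, z_nir, same_identity, margin=1.0):
    """Margin-based pairwise contrastive loss for one visible/NIR embedding pair.

    Genuine pairs (same identity) are pulled together (0.5 * d^2);
    impostor pairs are pushed apart until their distance exceeds `margin`.
    """
    d = euclidean(z_vis, z_nir)
    if same_identity:
        return 0.5 * d ** 2
    return 0.5 * max(0.0, margin - d) ** 2

def equal_error_rate(genuine, impostor):
    """Approximate EER by sweeping a threshold over every observed score
    (higher score = more likely the same identity) and returning the error
    rate where the false-accept and false-reject rates are closest."""
    best_gap, best_eer = float("inf"), 1.0
    for t in sorted(set(genuine) | set(impostor)):
        far = sum(s >= t for s in impostor) / len(impostor)  # impostors accepted
        frr = sum(s < t for s in genuine) / len(genuine)      # genuines rejected
        if abs(far - frr) < best_gap:
            best_gap, best_eer = abs(far - frr), (far + frr) / 2
    return best_eer
```

With perfectly separated scores, e.g. `equal_error_rate([0.9, 0.8, 0.7], [0.1, 0.2, 0.3])`, the sweep finds a threshold with no errors. Real evaluations usually interpolate the ROC curve rather than using only the observed scores.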

Related content

Representation learning

In machine learning, representation learning (also called feature learning) is a set of techniques that allow a system to automatically discover, from raw data, the representations needed for feature detection or classification. This replaces manual feature engineering and allows a machine both to learn the features and to use them to perform a specific task. In supervised representation learning, features are learned from labeled input data; examples include supervised neural networks, multilayer perceptrons, and (supervised) dictionary learning. In unsupervised representation learning, features are learned from unlabeled input data; examples include dictionary learning, independent component analysis, autoencoders, matrix factorization, and various forms of clustering.
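Clustering is listed above as one form of unsupervised representation learning. The following is a minimal sketch of that idea, not a reference implementation.

It assumes:

- plain k-means (Lloyd's algorithm) learns centroids from unlabeled points;
- each input is then re-represented by its distances to those centroids.

The function names and the deterministic "first k points" initialization are illustrative assumptions.

```python
import math

def kmeans(points, k, iters=20):
    """Lloyd's k-means on a list of equal-length tuples.
    Centroids start at the first k points (deterministic, for illustration only)."""
    centroids = [list(p) for p in points[:k]]
    for _ in range(iters):
        clusters = [[] for _ in range(k)]
        for p in points:
            # Assign each point to its nearest centroid.
            nearest = min(range(k), key=lambda j: math.dist(p, centroids[j]))
            clusters[nearest].append(p)
        for j, members in enumerate(clusters):
            if members:  # keep the old centroid if a cluster becomes empty
                centroids[j] = [sum(c) / len(members) for c in zip(*members)]
    return centroids

def encode(point, centroids):
    """Learned representation: the distance from `point` to each centroid."""
    return [math.dist(point, c) for c in centroids]
```

No labels are used anywhere: the "features" are whatever structure the clustering finds, which is the defining property of the unsupervised case.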

January 20, 2022

DFBVS: Deep Feature-Based Visual Servo

Nicholas Adrian, Do Van Thach, Pham Quang Cuong

Classical Visual Servoing (VS) approaches rely on handcrafted visual features, which limits their generalizability. Recently, a number of approaches, some based on Deep Neural Networks, have been proposed to overcome this limitation by directly comparing the entire target and current camera images. However, by getting rid of visual features altogether, those approaches require the target and current images to be essentially similar, which precludes generalization to unknown, cluttered scenes. Here we propose to perform VS based on visual features, as in classical VS approaches. Contrary to the latter, however, we leverage recent breakthroughs in Deep Learning to automatically extract and match the visual features. By doing so, our approach combines the advantages of both worlds: (i) because it is based on visual features, it can steer the robot towards the object of interest even in the presence of significant distraction in the background; (ii) because the features are automatically extracted and matched, it can easily and automatically generalize to unseen objects and scenes. In addition, we propose to use a render engine to synthesize the target image, which offers a further level of generalization. We demonstrate these advantages in a robotic grasping task. The robot steers, with high accuracy, towards the object to grasp, based simply on an image of the object rendered from the camera view corresponding to the desired grasping pose.
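The abstract combines learned feature extraction and matching with a feature-based servo loop. The following is a heavily simplified sketch of the matching and control steps, not the DFBVS method.

It rests on these assumptions:

- Descriptors are assumed to come from some learned extractor, which is not shown.
- Matches use mutual nearest neighbours, a common rule that the paper may or may not use.
- The command is a proportional law on the mean image-space error of the matched keypoints.

Classical image-based VS computes a camera twist as v = -λ L⁺ e with an interaction matrix L. This sketch drops L and returns only a 2-D image-plane command. All names are hypothetical.

```python
import math

def mutual_nn_matches(desc_cur, desc_tgt):
    """Return index pairs (i, j) where current descriptor i and target
    descriptor j are each other's nearest neighbour (Euclidean distance)."""
    def nn(d, pool):
        return min(range(len(pool)), key=lambda k: math.dist(d, pool[k]))
    fwd = [nn(d, desc_tgt) for d in desc_cur]  # current -> target
    bwd = [nn(d, desc_cur) for d in desc_tgt]  # target -> current
    return [(i, j) for i, j in enumerate(fwd) if bwd[j] == i]

def servo_command(kp_cur, kp_tgt, matches, gain=0.5):
    """Simplified proportional law: -gain * mean(current - target) over the
    matched keypoints (an image-plane velocity, not a full 6-DoF twist)."""
    if not matches:
        return (0.0, 0.0)
    ex = sum(kp_cur[i][0] - kp_tgt[j][0] for i, j in matches) / len(matches)
    ey = sum(kp_cur[i][1] - kp_tgt[j][1] for i, j in matches) / len(matches)
    return (-gain * ex, -gain * ey)
```

Unmatched keypoints, such as background clutter with no counterpart in the target image, drop out of the error. This loosely mirrors the robustness-to-distraction argument in the abstract.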

June 21, 2021

BernNet: Learning Arbitrary Graph Spectral Filters via Bernstein Approximation

Mingguo He, Zhewei Wei, Zengfeng Huang, Hongteng Xu

from arXiv, 14 pages, 31 figures

Many representative graph neural networks, e.g., GPR-GNN and ChebyNet, approximate graph convolutions with graph spectral filters. However, existing work either applies predefined filter weights or learns them without the necessary constraints, which may lead to oversimplified or ill-posed filters. To overcome these issues, we propose BernNet, a novel graph neural network with theoretical support that provides a simple but effective scheme for designing and learning arbitrary graph spectral filters. In particular, for any filter over the normalized Laplacian spectrum of a graph, BernNet estimates it by an order-$K$ Bernstein polynomial approximation and designs its spectral property by setting the coefficients of the Bernstein basis. Moreover, we can learn the coefficients (and the corresponding filter weights) from observed graphs and their associated signals, thus obtaining a BernNet specialized for the data. Our experiments demonstrate that BernNet can learn arbitrary spectral filters, including complicated band-rejection and comb filters, and that it achieves superior performance in real-world graph modeling tasks.
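To make the order-$K$ Bernstein approximation concrete, here is a sketch of the filter on the normalized-Laplacian spectrum $[0, 2]$. It assumes the form $h(\lambda) = \sum_k \theta_k \binom{K}{k} 2^{-K} (2-\lambda)^{K-k} \lambda^k$, the standard Bernstein basis rescaled from $[0, 1]$ to $[0, 2]$. The abstract does not spell this formula out, so treat it as an assumption.

The sketch applies $h(L)$ to a graph signal using only matrix-vector products, with no eigendecomposition. The function names are hypothetical, and learning $\theta$ is not shown.

```python
from math import comb

def bernstein_response(lam, theta):
    """Frequency response on [0, 2]:
    h(lam) = sum_k theta_k * C(K, k) / 2^K * (2 - lam)^(K - k) * lam^k."""
    K = len(theta) - 1
    return sum(t * comb(K, k) / 2 ** K * (2 - lam) ** (K - k) * lam ** k
               for k, t in enumerate(theta))

def matvec(M, x):
    return [sum(m * v for m, v in zip(row, x)) for row in M]

def bernstein_filter(L, x, theta):
    """Compute h(L) x = sum_k theta_k C(K,k)/2^K (2I - L)^(K-k) L^k x
    with repeated matrix-vector products (L and 2I - L commute)."""
    K, n = len(theta) - 1, len(x)
    two_minus_L = [[(2.0 if i == j else 0.0) - L[i][j] for j in range(n)]
                   for i in range(n)]
    out = [0.0] * n
    Lk_x = list(x)  # holds L^k x for the current k
    for k, t in enumerate(theta):
        term = list(Lk_x)
        for _ in range(K - k):
            term = matvec(two_minus_L, term)
        coef = t * comb(K, k) / 2 ** K
        out = [o + coef * v for o, v in zip(out, term)]
        Lk_x = matvec(L, Lk_x)
    return out
```

Two sanity checks follow from Bernstein-basis properties:

- Setting every $\theta_k = 1$ gives $h \equiv 1$ (the identity filter), because the basis sums to one.
- Setting $\theta_k = 2k/K$ reproduces $h(\lambda) = \lambda$ exactly.

More generally, $\theta_k = g(2k/K)$ is the Bernstein approximation of a target response $g$.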

May 5, 2021

RepMLP: Re-parameterizing Convolutions into Fully-connected Layers for Image Recognition

Xiaohan Ding, Xiangyu Zhang, Jungong Han, Guiguang Ding

from arXiv, work in progress

We propose RepMLP, a multi-layer-perceptron-style neural network building block for image recognition, which is composed of a series of fully-connected (FC) layers. Compared to convolutional layers, FC layers are more efficient and better at modeling long-range dependencies and positional patterns. However, they are worse at capturing local structures, and are hence usually less favored for image recognition. We propose a structural re-parameterization technique that adds a local prior into an FC layer to make it powerful for image recognition. Specifically, we construct convolutional layers inside a RepMLP during training and merge them into the FC for inference. On CIFAR, a simple pure-MLP model shows performance very close to that of a CNN. By inserting RepMLP into traditional CNNs, we improve ResNets by 1.8% accuracy on ImageNet, 2.9% on face recognition, and 2.3% mIoU on Cityscapes, with lower FLOPs. Our findings highlight that combining the global representational capacity and positional perception of FC with the local prior of convolution can improve the performance of neural networks while running faster. This holds both on tasks with translation invariance (e.g., semantic segmentation) and on those with aligned images and positional patterns (e.g., face recognition). The code and models are available at //github.com/DingXiaoH/RepMLP.
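The merge step works because a convolution with zero padding is a linear map, so it equals multiplication by a fixed matrix. That matrix can then be added to an FC weight.

The following is a minimal sketch of that idea in 1-D with a single channel, using "same" padding so that input and output lengths match and the FC is square. It builds the equivalent matrix by feeding unit vectors through the convolution. RepMLP itself works on 2-D multi-channel feature maps, and this code is illustrative only; the function names are hypothetical.

```python
def conv1d(x, kernel):
    """'Same' 1-D cross-correlation: zero padding, stride 1, odd kernel size."""
    r = len(kernel) // 2
    n = len(x)
    return [sum(kernel[t] * x[i + t - r] for t in range(len(kernel))
                if 0 <= i + t - r < n)
            for i in range(n)]

def fc(W, x):
    """Fully-connected layer without bias: y = W x."""
    return [sum(w * v for w, v in zip(row, x)) for row in W]

def conv_to_fc(kernel, n):
    """Equivalent n x n FC weight: column j is the conv applied to unit vector e_j."""
    cols = [conv1d([1.0 if i == j else 0.0 for i in range(n)], kernel)
            for j in range(n)]
    return [[cols[j][i] for j in range(n)] for i in range(n)]  # columns -> rows

def merge(W_fc, kernel):
    """Training-time y = FC(x) + conv(x) collapses to inference-time y = W_merged x."""
    W_conv = conv_to_fc(kernel, len(W_fc))
    return [[a + b for a, b in zip(ra, rb)] for ra, rb in zip(W_fc, W_conv)]
```

After merging, inference runs a single matrix product. The convolution's local prior survives only as a banded pattern added into the dense weight.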

April 12, 2021

Feature Decomposition and Reconstruction Learning for Effective Facial Expression Recognition

Delian Ruan, Yan Yan, Shenqi Lai, Zhenhua Chai, Chunhua Shen, Hanzi Wang

from arXiv, IEEE/CVF Conference on Computer Vision and Pattern Recognition 2021 (CVPR 2021)

In this paper, we propose a novel Feature Decomposition and Reconstruction Learning (FDRL) method for effective facial expression recognition. We view expression information as the combination of the information shared across different expressions (expression similarities) and the information unique to each expression (expression-specific variations). More specifically, FDRL mainly consists of two crucial networks: a Feature Decomposition Network (FDN) and a Feature Reconstruction Network (FRN). In particular, FDN first decomposes the basic features extracted from a backbone network into a set of facial action-aware latent features to model expression similarities. Then, FRN captures the intra-feature and inter-feature relationships among the latent features to characterize expression-specific variations, and reconstructs the expression feature. To this end, FRN contains two modules: an intra-feature relation modeling module and an inter-feature relation modeling module. Experimental results on in-the-lab databases (CK+, MMI, and Oulu-CASIA) and in-the-wild databases (RAF-DB and SFEW) show that the proposed FDRL method consistently achieves higher recognition accuracy than several state-of-the-art methods. This clearly highlights the benefit of feature decomposition and reconstruction for classifying expressions.

June 18, 2020

Contrastive learning of global and local features for medical image segmentation with limited annotations

Krishna Chaitanya, Ertunc Erdil, Neerav Karani, Ender Konukoglu

from arXiv, 16 pages, 2 figures, 7 tables. This article is a preprint and is currently under review at a conference.

A key requirement for the success of supervised deep learning is a large labeled dataset, a condition that is difficult to meet in medical image analysis. Self-supervised learning (SSL) can help in this regard by providing a strategy to pre-train a neural network with unlabeled data, followed by fine-tuning for a downstream task with limited annotations. Contrastive learning, a particular variant of SSL, is a powerful technique for learning image-level representations. In this work, we propose strategies for extending the contrastive learning framework to the segmentation of volumetric medical images in the semi-supervised setting with limited annotations, by leveraging domain-specific and problem-specific cues. Specifically, we propose (1) novel contrasting strategies that leverage structural similarity across volumetric medical images (a domain-specific cue) and (2) a local version of the contrastive loss to learn distinctive representations of local regions that are useful for per-pixel segmentation (a problem-specific cue). We carry out an extensive evaluation on three Magnetic Resonance Imaging (MRI) datasets. In the limited-annotation setting, the proposed method yields substantial improvements over other self-supervised and semi-supervised learning techniques. When combined with a simple data augmentation technique, the proposed method reaches within 8% of benchmark performance using only two labeled MRI volumes for training. These two volumes correspond to only 4% (for ACDC) of the training data used to train the benchmark.
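For readers new to contrastive learning, here is a generic InfoNCE-style loss for one anchor. It uses cosine similarity and a temperature `tau`: it is low when the anchor is much more similar to its positive than to its negatives.

This is a standard formulation chosen for illustration, not the paper's exact global or local loss. How positives and negatives are chosen (e.g., corresponding partitions of different volumes, or local regions) is the paper's contribution and is not reproduced here. The function names and the default `tau` are assumptions.

```python
import math

def cosine(u, v):
    dot = sum(a * b for a, b in zip(u, v))
    return dot / (math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v)))

def info_nce(anchor, positive, negatives, tau=0.1):
    """-log( exp(sim(a, p)/tau) / sum_{k in {p} + negatives} exp(sim(a, k)/tau) )."""
    logits = [cosine(anchor, positive) / tau] + [cosine(anchor, n) / tau for n in negatives]
    m = max(logits)  # subtract the max before exponentiating (stable log-sum-exp)
    log_denom = m + math.log(sum(math.exp(l - m) for l in logits))
    return log_denom - logits[0]
```

The same function could, in principle, be applied to embeddings of local feature-map regions instead of whole images. That is the general idea behind a local contrastive loss, though the paper's exact construction may differ.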

April 28, 2018

Label-aware Double Transfer Learning for Cross-Specialty Medical Named Entity Recognition

Zhenghui Wang, Yanru Qu, Liheng Chen, Jian Shen, Weinan Zhang, Shaodian Zhang, Yimei Gao, Gen Gu, Ken Chen, Yong Yu

from arXiv, NAACL HLT 2018

We study the problem of named entity recognition (NER) from electronic medical records, which is one of the most fundamental and critical problems for medical text mining. Medical records written by clinicians from different specialties usually contain quite different terminologies and writing styles. The differences between specialties and the cost of human annotation make it particularly difficult to train a universal medical NER system. In this paper, we propose a label-aware double transfer learning framework (La-DTL) for cross-specialty NER, so that a medical NER system designed for one specialty can be conveniently applied to another with minimal annotation effort. Transferability is guaranteed by two components: (i) we propose label-aware MMD for feature representation transfer, and (ii) we perform parameter transfer with a theoretical upper bound that is also label-aware. We conduct extensive experiments on 12 cross-specialty NER tasks. The experimental results demonstrate that La-DTL provides consistent accuracy improvements over strong baselines. In addition, promising experimental results on non-medical NER scenarios indicate that La-DTL has the potential to be seamlessly adapted to a wide range of NER tasks.
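To illustrate what "label-aware" MMD means, here is a sketch that computes the squared maximum mean discrepancy (biased estimator, Gaussian kernel) separately for each label and sums the results. Source and target features are therefore aligned per label rather than all mixed together.

It assumes some target examples carry labels, and it weights all labels equally. La-DTL's exact weighting and formulation may differ. The label names and function names below are hypothetical.

```python
import math

def gaussian_kernel(a, b, sigma=1.0):
    return math.exp(-math.dist(a, b) ** 2 / (2 * sigma ** 2))

def mmd2(xs, xt, sigma=1.0):
    """Biased estimate of squared MMD between two samples (Gaussian kernel)."""
    def mean_k(A, B):
        return sum(gaussian_kernel(a, b, sigma) for a in A for b in B) / (len(A) * len(B))
    return mean_k(xs, xs) + mean_k(xt, xt) - 2 * mean_k(xs, xt)

def label_aware_mmd2(src, tgt, sigma=1.0):
    """Sum of per-label squared MMD; src/tgt map label -> list of feature vectors.
    Only labels present in both domains contribute."""
    shared = set(src) & set(tgt)
    return sum(mmd2(src[y], tgt[y], sigma) for y in shared)
```

For example, with `src = {"B-Drug": [(0.0,)], "O": [(1.0,)]}` the "B-Drug" features already match a target that has the same "B-Drug" features. Only the mismatched "O" features then contribute to the penalty.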

March 20, 2018

Unsupervised Cross-dataset Person Re-identification by Transfer Learning of Spatial-Temporal Patterns

Jianming Lv, Weihang Chen, Qing Li, Can Yang

from arXiv, accepted by CVPR 2018

Most existing person re-identification algorithms conduct supervised training and testing on single, small labeled datasets, so directly deploying these trained models to a large-scale real-world camera network may lead to poor performance due to underfitting. It is challenging to incrementally optimize the models using the abundant unlabeled data collected from the target domain. To address this challenge, we propose an unsupervised incremental learning algorithm, TFusion, which is aided by transfer learning of the pedestrians' spatio-temporal patterns in the target domain. Specifically, the algorithm first transfers the visual classifier trained on a small labeled source dataset to the unlabeled target dataset in order to learn the pedestrians' spatio-temporal patterns. Second, a Bayesian fusion model is proposed to combine the learned spatio-temporal patterns with visual features to achieve a significantly improved classifier. Finally, we propose a learning-to-rank-based mutual promotion procedure to incrementally optimize the classifiers using the unlabeled data in the target domain. Comprehensive experiments on multiple real surveillance datasets show that our algorithm achieves significant improvements over state-of-the-art cross-dataset unsupervised person re-identification algorithms.

March 14, 2018

Learning Representative Temporal Features for Action Recognition

Ali Javidani, Ahmad Mahmoudi-Aznaveh

from arXiv, 5 pages

In this paper, we present a novel video classification methodology that aims to efficiently recognize different categories of third-person videos. The idea is to keep track of motion in videos by following optical flow elements over time. To classify the resulting motion time series efficiently, the machine learns temporal features along the time dimension. This is done by training a multi-channel one-dimensional Convolutional Neural Network (1D-CNN). Since CNNs represent input data hierarchically, high-level features are obtained by further processing features from lower layers. As a result, in the case of time series, long-term temporal features are extracted from short-term ones. Moreover, the advantage of the proposed method over most deep-learning-based approaches is that it learns representative temporal features only along the time dimension. This significantly reduces the number of learnable parameters, which makes the method trainable even on smaller datasets. We show that the proposed method reaches state-of-the-art results on two public datasets, UCF11 and jHMDB, with the aid of a more efficient feature vector representation.
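The following minimal sketch shows what a multi-channel 1-D convolution "along the time dimension" computes. Each channel is one motion time series, for example a hypothetical per-frame optical-flow statistic. Each filter spans all channels but slides only along time, followed by ReLU and global max pooling to obtain a fixed-length vector.

The channel meaning, layer sizes, and pooling choice are illustrative assumptions, not the paper's architecture.

```python
def conv1d_multichannel(series, filters, biases):
    """'Valid' 1-D convolution along time, followed by ReLU.
    series:  C channels x T time steps
    filters: F filters, each C x W (W = temporal width)
    returns: F feature maps of length T - W + 1."""
    C, T = len(series), len(series[0])
    out = []
    for f, b in zip(filters, biases):
        W = len(f[0])
        row = []
        for t in range(T - W + 1):
            s = b + sum(f[c][w] * series[c][t + w] for c in range(C) for w in range(W))
            row.append(max(0.0, s))  # ReLU
        out.append(row)
    return out

def global_max_pool(maps):
    """Collapse the time axis so each video yields a fixed-length vector."""
    return [max(row) for row in maps]
```

Stacking such layers is how long-term features arise from short-term ones: two stacked width-3 layers have a temporal receptive field of 5 steps. The parameter count grows only with channels and temporal width, not with spatial resolution.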

February 22, 2018

Video Person Re-identification by Temporal Residual Learning

Ju Dai, Pingping Zhang, Huchuan Lu, Hongyu Wang

from arXiv, submitted to IEEE Transactions on Image Processing; includes 5 figures and 4 tables. The first two authors contributed equally to this work.

In this paper, we propose a novel feature learning framework for video person re-identification (re-ID). The proposed framework mainly aims to make full use of the temporal information in video sequences and to tackle the poor spatial alignment of moving pedestrians. More specifically, to exploit the temporal information, we design a temporal residual learning (TRL) module that simultaneously extracts the generic and specific features of consecutive frames. The TRL module is equipped with two bidirectional LSTMs (BiLSTMs), which are respectively responsible for describing a moving person from different aspects, providing complementary information for better feature representations. To deal with the poor spatial alignment in video re-ID datasets, we propose a spatial-temporal transformer network (ST^2N) module. Transformation parameters in the ST^2N module are learned by leveraging the high-level semantic information of the current frame as well as temporal context from other frames. The proposed ST^2N module, which has fewer learnable parameters, allows effective person alignment under significant appearance changes. Extensive experimental results on the large-scale MARS, PRID2011, iLIDS-VID, and SDU-VID datasets demonstrate that the proposed method achieves consistently superior performance and outperforms most of the very recent state-of-the-art methods.

December 27, 2017

Conditional Random Field and Deep Feature Learning for Hyperspectral Image Segmentation

Fahim Irfan Alam, Jun Zhou, Alan Wee-Chung Liew, Xiuping Jia, Jocelyn Chanussot, Yongsheng Gao

from arXiv, submitted to a journal (version 2)

Image segmentation is considered one of the critical tasks in hyperspectral remote sensing image processing. Recently, convolutional neural networks (CNNs) have established themselves as powerful models for segmentation and classification by demonstrating excellent performance. A graphical model such as a conditional random field (CRF) further helps capture contextual information, thus improving segmentation performance. In this paper, we propose a method to segment hyperspectral images by considering both spectral and spatial information via a combined framework consisting of a CNN and a CRF. We use multiple spectral cubes to learn deep features with a CNN. We then formulate a deep CRF with CNN-based unary and pairwise potential functions to effectively extract the semantic correlations between patches consisting of three-dimensional data cubes. Effective piecewise training is applied to avoid computationally expensive iterative CRF inference. Furthermore, we introduce a deep deconvolution network that improves the segmentation masks. We also introduce a new dataset and evaluate our proposed method on it, along with several widely adopted benchmark datasets, to assess its effectiveness. By comparing our results with those of several state-of-the-art models, we show the promising potential of our method.
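The CRF in this abstract scores a labelling of patches with unary and pairwise terms. The sketch below shows that energy on an H x W grid of patches. The unary term is the negative log of a per-patch class probability, such as a CNN softmax output. The pairwise term is a simple Potts penalty `w` for each pair of 4-connected neighbours with different labels.

A Potts pairwise term is a simplifying assumption here: the paper uses CNN-based pairwise potentials and piecewise training. For completeness, the sketch adds iterated conditional modes (ICM), a basic greedy inference method that is not taken from the paper. Function names are hypothetical.

```python
import math

def crf_energy(labels, probs, w=1.0):
    """Energy of a labelling on an H x W patch grid.
    Unary: -log p(label); pairwise: Potts penalty w per differing 4-neighbour pair."""
    H, W = len(labels), len(labels[0])
    e = 0.0
    for i in range(H):
        for j in range(W):
            e -= math.log(probs[i][j][labels[i][j]])
            if i + 1 < H and labels[i][j] != labels[i + 1][j]:
                e += w
            if j + 1 < W and labels[i][j] != labels[i][j + 1]:
                e += w
    return e

def icm(probs, w=1.0, sweeps=5):
    """Iterated conditional modes: start from the per-patch argmax, then greedily
    relabel each patch to minimise its local (unary + Potts) energy."""
    H, W, K = len(probs), len(probs[0]), len(probs[0][0])
    labels = [[max(range(K), key=lambda k: probs[i][j][k]) for j in range(W)]
              for i in range(H)]
    for _ in range(sweeps):
        for i in range(H):
            for j in range(W):
                nbrs = [labels[a][b]
                        for a, b in ((i - 1, j), (i + 1, j), (i, j - 1), (i, j + 1))
                        if 0 <= a < H and 0 <= b < W]
                labels[i][j] = min(
                    range(K),
                    key=lambda k: -math.log(probs[i][j][k]) + w * sum(n != k for n in nbrs))
    return labels
```

ICM only finds a local minimum, and every sweep touches each patch. That per-iteration inference cost is the kind of expense the abstract's piecewise training is meant to avoid during learning.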