Representation learning approaches require a massive amount of discriminative training data, which is unavailable in many scenarios such as healthcare, smart cities, and education. In practice, people resort to crowdsourcing to obtain annotated labels. However, because of data privacy, budget limitations, and a shortage of domain-specific annotators, the number of crowdsourced labels is still very limited. Moreover, because annotators differ in expertise, crowdsourced labels are often inconsistent. Thus, directly applying existing supervised representation learning (SRL) algorithms can easily lead to overfitting and suboptimal solutions. In this paper, we propose NeuCrowd, a unified framework for SRL from crowdsourced labels. The proposed framework (1) creates a sufficient number of high-quality n-tuplet training samples by utilizing safety-aware sampling and robust anchor generation; and (2) automatically learns a neural sampling network that adaptively learns to select effective samples for SRL networks. The proposed framework is evaluated on one synthetic and three real-world data sets. The results show that our approach outperforms a wide range of state-of-the-art baselines in terms of prediction accuracy and AUC. To encourage reproducible results, we make our code publicly available at //github.com/tal-ai/NeuCrowd_KAIS2021.

Related content

Representation learning

Representation learning uses training data to learn vector representations, which overcomes the limitations of hand-crafted approaches. Methods generally fall into two classes: unsupervised and supervised representation learning. Most unsupervised methods use the latent variables of autoencoders (such as denoising autoencoders and sparse autoencoders) as the representation. The more recent variational autoencoders tolerate noise and outliers better. However, exactly inferring the latent structure of given data is almost impossible, so several approximate inference strategies have been developed. In addition, some unsupervised methods aim to approximate a specific similarity measure. One proposed unsupervised similarity-preserving framework uses matrix factorization to preserve pairwise DTW similarities. Another learns DTW-preserving shapelets, so that the Euclidean distance in the transformed space approximates the true DTW distance between the original data. Supervised representation learning methods can exploit label information and therefore capture the semantic structure of the data better. Siamese networks and triplet networks are two currently popular models; their objective is to maximize the distance between classes and minimize the distance within each class.
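The triplet objective mentioned in the description above can be illustrated with a few lines of Python. This is a minimal sketch of the standard hinge-style triplet loss on plain lists of floats, not code from any of the papers listed here; the function names and the default `margin` of 1.0 are assumptions chosen for illustration.

```python
import math


def euclidean(u, v):
    """Euclidean distance between two equal-length vectors."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))


def triplet_loss(anchor, positive, negative, margin=1.0):
    """Hinge-style triplet loss (margin value is an illustrative assumption).

    The loss is zero once the anchor is closer to the same-class sample
    (positive) than to the other-class sample (negative) by at least `margin`.
    """
    d_pos = euclidean(anchor, positive)
    d_neg = euclidean(anchor, negative)
    return max(0.0, d_pos - d_neg + margin)


# Same-class sample is near, other-class sample is far: no penalty.
print(triplet_loss([0.0, 0.0], [0.0, 1.0], [3.0, 4.0]))
# Roles swapped: the embedding violates the margin and is penalized.
print(triplet_loss([0.0, 0.0], [3.0, 4.0], [0.0, 1.0]))
```

Minimizing this quantity over many triplets pulls same-class embeddings together and pushes different-class embeddings apart, which is the behavior the description attributes to triplet networks.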

Noise · Backbone · Annotation · Networking · Robustness

February 17, 2022

PENCIL: Deep Learning with Noisy Labels

Kun Yi,Guo-Hua Wang,Jianxin Wu

from arxiv, arXiv admin note: substantial text overlap with arXiv:1903.07788

Deep learning has achieved excellent performance in various computer vision tasks, but requires a lot of training examples with clean labels. It is easy to collect a dataset with noisy labels, but such noise makes networks overfit seriously and accuracies drop dramatically. To address this problem, we propose an end-to-end framework called PENCIL, which can update both network parameters and label estimations as label distributions. PENCIL is independent of the backbone network structure and does not need an auxiliary clean dataset or prior information about noise; thus, it is more general and robust than existing methods and is easy to apply. PENCIL can even be used repeatedly to obtain better performance. PENCIL outperforms previous state-of-the-art methods by large margins on both synthetic and real-world datasets with different noise types and noise rates. PENCIL is also effective in multi-label classification tasks when a simple attention structure is added to the backbone network. Experiments show that PENCIL is robust on clean datasets, too.

Neural Networks · Learning · Networking · MoDELS · Deep learning

October 5, 2021

Deep Neural Networks and Tabular Data: A Survey

Vadim Borisov,Tobias Leemann,Kathrin Seßler,Johannes Haug,Martin Pawelczyk,Gjergji Kasneci

Heterogeneous tabular data are the most commonly used form of data and are essential for numerous critical and computationally demanding applications. On homogeneous data sets, deep neural networks have repeatedly shown excellent performance and have therefore been widely adopted. However, their application to modeling tabular data (inference or generation) remains highly challenging. This work provides an overview of state-of-the-art deep learning methods for tabular data. We start by categorizing them into three groups: data transformations, specialized architectures, and regularization models. We then provide a comprehensive overview of the main approaches in each group. A discussion of deep learning approaches for generating tabular data is complemented by strategies for explaining deep models on tabular data. Our primary contribution is to address the main research streams and existing methodologies in this area, while highlighting relevant challenges and open research questions. To the best of our knowledge, this is the first in-depth look at deep learning approaches for tabular data. This work can serve as a valuable starting point and guide for researchers and practitioners interested in deep learning with tabular data.

Unsupervised · Representation learning · Learning · CASES · state-of-the-art

April 29, 2021

A Large-Scale Study on Unsupervised Spatiotemporal Representation Learning

Christoph Feichtenhofer,Haoqi Fan,Bo Xiong,Ross Girshick,Kaiming He

from arxiv, CVPR 2021

We present a large-scale study on unsupervised spatiotemporal representation learning from videos. With a unified perspective on four recent image-based frameworks, we study a simple objective that can easily generalize all these methods to space-time. Our objective encourages temporally-persistent features in the same video, and in spite of its simplicity, it works surprisingly well across: (i) different unsupervised frameworks, (ii) pre-training datasets, (iii) downstream datasets, and (iv) backbone architectures. We draw a series of intriguing observations from this study, e.g., we discover that encouraging long-spanned persistency can be effective even if the timespan is 60 seconds. In addition to state-of-the-art results in multiple benchmarks, we report a few promising cases in which unsupervised pre-training can outperform its supervised counterpart. Code is made available at //github.com/facebookresearch/SlowFast

Category · Learning · MoDELS · Performer · Better

February 15, 2021

OntoZSL: Ontology-enhanced Zero-shot Learning

Yuxia Geng,Jiaoyan Chen,Zhuo Chen,Jeff Z. Pan,Zhiquan Ye,Zonggang Yuan,Yantao Jia,Huajun Chen

from arxiv, Accepted to The Web Conference (WWW) 2021

Zero-shot Learning (ZSL), which aims to predict classes that have never appeared in the training data, has attracted considerable research interest. The key to implementing ZSL is to leverage prior knowledge of classes, which builds the semantic relationships between classes and enables the transfer of the learned models (e.g., features) from training classes (i.e., seen classes) to unseen classes. However, the priors adopted by existing methods are relatively limited and have incomplete semantics. In this paper, we explore richer and more competitive prior knowledge to model the inter-class relationship for ZSL via ontology-based knowledge representation and semantic embedding. Meanwhile, to address the data imbalance between seen classes and unseen classes, we developed a generative ZSL framework with Generative Adversarial Networks (GANs). Our main findings include: (i) an ontology-enhanced ZSL framework that can be applied to different domains, such as image classification (IMGC) and knowledge graph completion (KGC); (ii) a comprehensive evaluation with multiple zero-shot datasets from different domains, where our method often achieves better performance than the state-of-the-art models. In particular, on four representative ZSL baselines of IMGC, the ontology-based class semantics outperform the previous priors, e.g., the word embeddings of classes, by an average of 12.4 accuracy points in the standard ZSL across two example datasets (see Figure 4).

XLM-R · Performer · MoDELS · Language modeling · Scaling

November 5, 2019

Unsupervised Cross-lingual Representation Learning at Scale

Alexis Conneau,Kartikay Khandelwal,Naman Goyal,Vishrav Chaudhary,Guillaume Wenzek,Francisco Guzmán,Edouard Grave,Myle Ott,Luke Zettlemoyer,Veselin Stoyanov

from arxiv, 12 pages, 7 figures

This paper shows that pretraining multilingual language models at scale leads to significant performance gains for a wide range of cross-lingual transfer tasks. We train a Transformer-based masked language model on one hundred languages, using more than two terabytes of filtered CommonCrawl data. Our model, dubbed XLM-R, significantly outperforms multilingual BERT (mBERT) on a variety of cross-lingual benchmarks, including +13.8% average accuracy on XNLI, +12.3% average F1 score on MLQA, and +2.1% average F1 score on NER. XLM-R performs particularly well on low-resource languages, improving 11.8% in XNLI accuracy for Swahili and 9.2% for Urdu over the previous XLM model. We also present a detailed empirical evaluation of the key factors that are required to achieve these gains, including the trade-offs between (1) positive transfer and capacity dilution and (2) the performance of high and low resource languages at scale. Finally, we show, for the first time, the possibility of multilingual modeling without sacrificing per-language performance; XLM-R is very competitive with strong monolingual models on the GLUE and XNLI benchmarks. We will make XLM-R code, data, and models publicly available.

entity · Link prediction · Performer · Graph · Knowledge graph

September 26, 2019

Representation Learning with Ordered Relation Paths for Knowledge Graph Completion

Yao Zhu,Hongzhi Liu,Zhonghai Wu,Yang Song,Tao Zhang

Incompleteness is a common problem for existing knowledge graphs (KGs), and KG completion, which aims to predict links between entities, is challenging. Most existing KG completion methods only consider the direct relation between nodes and ignore relation paths, which contain useful information for link prediction. Recently, a few methods have taken relation paths into consideration but pay less attention to the order of relations in paths, which is important for reasoning. In addition, these path-based models always ignore nonlinear contributions of path features for link prediction. To solve these problems, we propose a novel KG completion method named OPTransE. Instead of embedding both entities of a relation into the same latent space as in previous methods, we project the head entity and the tail entity of each relation into different spaces to guarantee the order of relations in the path. Meanwhile, we adopt a pooling strategy to extract nonlinear and complex features of different paths to further improve the performance of link prediction. Experimental results on two benchmark datasets show that the proposed model OPTransE performs better than state-of-the-art methods.

Network embedding · Networking · Networks · Representation learning · Learning

January 7, 2019

Deep Network Embedding for Graph Representation Learning in Signed Networks

Xiao Shen,Fu-Lai Chung

Network embedding has attracted increasing attention over the past few years. As an effective approach to solve graph mining problems, network embedding aims to learn a low-dimensional feature vector representation for each node of a given network. The vast majority of existing network embedding algorithms, however, are only designed for unsigned networks, whereas signed networks, which contain both positive and negative links, have quite distinct properties from their unsigned counterparts. In this paper, we propose a deep network embedding model to learn low-dimensional node vector representations with structural balance preservation for signed networks. The model employs a semi-supervised stacked auto-encoder to reconstruct the adjacency connections of a given signed network. As the adjacency connections are overwhelmingly positive in real-world signed networks, we impose a larger penalty to make the auto-encoder focus more on reconstructing the scarce negative links than the abundant positive links. In addition, to preserve the structural balance property of signed networks, we design pairwise constraints to make positively connected nodes much closer than negatively connected nodes in the embedding space. Based on the network representations learned by the proposed model, we conduct link sign prediction and community detection in signed networks. Extensive experimental results on real-world datasets demonstrate the superiority of the proposed model over state-of-the-art network embedding algorithms for graph representation learning in signed networks.
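The idea of penalizing errors on scarce negative links more heavily can be sketched as a weighted squared reconstruction error over one adjacency row. This is an illustration only and not the loss from the paper: the function name, the encoding of links as +1, -1, and 0, and the weight of 5.0 are all assumptions.

```python
def weighted_reconstruction_loss(row, reconstructed, negative_weight=5.0):
    """Weighted squared error for one row of a signed adjacency matrix.

    Entries are assumed to be +1 (positive link), -1 (negative link),
    or 0 (no link). Errors on negative links are multiplied by
    `negative_weight`, a hypothetical value greater than 1.
    """
    total = 0.0
    for target, output in zip(row, reconstructed):
        weight = negative_weight if target < 0 else 1.0
        total += weight * (target - output) ** 2
    return total


# The same absolute error (0.5) costs five times more on the negative link.
row = [1.0, -1.0, 0.0]
reconstructed = [0.5, -0.5, 0.0]
print(weighted_reconstruction_loss(row, reconstructed))
```

With the weight set to 1.0 the function reduces to an ordinary squared error, which makes the effect of the extra penalty easy to compare.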

Graph · Regularization term · Link prediction · Autoencoder · Vectorization

January 4, 2019

Learning Graph Embedding with Adversarial Training Methods

Shirui Pan,Ruiqi Hu,Sai-fu Fung,Guodong Long,Jing Jiang,Chengqi Zhang

from arxiv, arXiv admin note: substantial text overlap with arXiv:1802.04407

Graph embedding aims to transfer a graph into vectors to facilitate subsequent graph analytics tasks like link prediction and graph clustering. Most approaches to graph embedding focus on preserving the graph structure or minimizing the reconstruction errors for graph data. They have mostly overlooked the embedding distribution of the latent codes, which unfortunately may lead to inferior representation in many cases. In this paper, we present a novel adversarially regularized framework for graph embedding. By employing the graph convolutional network as an encoder, our framework embeds the topological information and node content into a vector representation, from which a graph decoder is further built to reconstruct the input graph. The adversarial training principle is applied to enforce our latent codes to match a prior Gaussian or Uniform distribution. Based on this framework, we derive two variants of adversarial models, the adversarially regularized graph autoencoder (ARGA) and its variational version, adversarially regularized variational graph autoencoder (ARVGA), to learn the graph embedding effectively. We also exploit other potential variations of ARGA and ARVGA to get a deeper understanding of our designs. Experimental results compared among twelve algorithms for link prediction and twenty algorithms for graph clustering validate our solutions.

Graph · Learning · state-of-the-art · GNN · Representation learning

June 26, 2018

Hierarchical Graph Representation Learning with Differentiable Pooling

Rex Ying,Jiaxuan You,Christopher Morris,Xiang Ren,William L. Hamilton,Jure Leskovec

Recently, graph neural networks (GNNs) have revolutionized the field of graph representation learning through effectively learned node embeddings, and achieved state-of-the-art results in tasks such as node classification and link prediction. However, current GNN methods are inherently flat and do not learn hierarchical representations of graphs---a limitation that is especially problematic for the task of graph classification, where the goal is to predict the label associated with an entire graph. Here we propose DiffPool, a differentiable graph pooling module that can generate hierarchical representations of graphs and can be combined with various graph neural network architectures in an end-to-end fashion. DiffPool learns a differentiable soft cluster assignment for nodes at each layer of a deep GNN, mapping nodes to a set of clusters, which then form the coarsened input for the next GNN layer. Our experimental results show that combining existing GNN methods with DiffPool yields an average improvement of 5-10% accuracy on graph classification benchmarks, compared to all existing pooling approaches, achieving a new state-of-the-art on four out of five benchmark data sets.
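The coarsening step described above can be written compactly. The following is a sketch in notation that does not appear in the abstract: $A^{(l)}$ and $X^{(l)}$ denote the adjacency and node-feature matrices at layer $l$, $S^{(l)}$ is the soft cluster assignment matrix, and $\mathrm{GNN}_{\text{embed}}$ and $\mathrm{GNN}_{\text{pool}}$ stand for two GNNs whose architecture is left unspecified here.

```latex
\begin{aligned}
Z^{(l)}   &= \mathrm{GNN}_{\text{embed}}\bigl(A^{(l)}, X^{(l)}\bigr), \\
S^{(l)}   &= \operatorname{softmax}\bigl(\mathrm{GNN}_{\text{pool}}(A^{(l)}, X^{(l)})\bigr), \\
X^{(l+1)} &= {S^{(l)}}^{\top} Z^{(l)}, \\
A^{(l+1)} &= {S^{(l)}}^{\top} A^{(l)} S^{(l)}.
\end{aligned}
```

The softmax is taken row-wise, so each node receives a probability distribution over clusters; the last two lines then aggregate node embeddings and edges into the coarsened graph that the next layer takes as input.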

DeepWalk · Networking · Learning · Unsupervised feature learning · Latent

June 27, 2014

DeepWalk: Online Learning of Social Representations

Bryan Perozzi,Rami Al-Rfou,Steven Skiena

from arxiv, 10 pages, 5 figures, 4 tables

We present DeepWalk, a novel approach for learning latent representations of vertices in a network. These latent representations encode social relations in a continuous vector space, which is easily exploited by statistical models. DeepWalk generalizes recent advancements in language modeling and unsupervised feature learning (or deep learning) from sequences of words to graphs. DeepWalk uses local information obtained from truncated random walks to learn latent representations by treating walks as the equivalent of sentences. We demonstrate DeepWalk's latent representations on several multi-label network classification tasks for social networks such as BlogCatalog, Flickr, and YouTube. Our results show that DeepWalk outperforms challenging baselines which are allowed a global view of the network, especially in the presence of missing information. DeepWalk's representations can provide $F_1$ scores up to 10% higher than competing methods when labeled data is sparse. In some experiments, DeepWalk's representations are able to outperform all baseline methods while using 60% less training data. DeepWalk is also scalable. It is an online learning algorithm which builds useful incremental results, and is trivially parallelizable. These qualities make it suitable for a broad class of real world applications such as network classification, and anomaly detection.
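The step of treating truncated random walks as sentences can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the authors' implementation: the graph is an adjacency dict, and the function names, parameters, and toy graph are hypothetical. Training an embedding model on the resulting corpus is omitted.

```python
import random


def truncated_random_walk(graph, start, walk_length, rng):
    """Walk up to `walk_length` vertices, choosing a uniform random neighbor each step."""
    walk = [start]
    while len(walk) < walk_length:
        neighbors = graph[walk[-1]]
        if not neighbors:  # dead end: stop the walk early
            break
        walk.append(rng.choice(neighbors))
    return walk


def build_corpus(graph, walks_per_node, walk_length, seed=0):
    """Start `walks_per_node` walks from every vertex; each walk plays the role of a sentence."""
    rng = random.Random(seed)
    nodes = list(graph)
    corpus = []
    for _ in range(walks_per_node):
        rng.shuffle(nodes)  # visit start vertices in a fresh random order on each pass
        for node in nodes:
            corpus.append(truncated_random_walk(graph, node, walk_length, rng))
    return corpus


# Toy undirected graph as an adjacency dict (hypothetical data); "d" is isolated.
graph = {"a": ["b"], "b": ["a", "c"], "c": ["b"], "d": []}
corpus = build_corpus(graph, walks_per_node=2, walk_length=4)
print(len(corpus))  # 4 vertices x 2 walks each = 8 walks
```

Because every walk depends only on local neighborhoods, walks from different start vertices can be generated independently, which is consistent with the abstract's remark that the method is trivially parallelizable.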