The masked graph autoencoder (MGAE) has emerged as a promising self-supervised graph pre-training (SGP) paradigm due to its simplicity and effectiveness. However, existing efforts perform the mask-then-reconstruct operation in the raw data space, as is done in computer vision (CV) and natural language processing (NLP), while neglecting the important non-Euclidean property of graph data. As a result, the highly unstable local connection structures greatly increase the uncertainty in inferring masked data and decrease the reliability of the exploited self-supervision signals, leading to inferior representations for downstream evaluations. To address this issue, we propose a novel SGP method termed Robust mAsked gRaph autoEncoder (RARE) to improve the certainty in inferring masked data and the reliability of the self-supervision mechanism by further masking and reconstructing node samples in the high-order latent feature space. Through both theoretical and empirical analyses, we find that performing a joint mask-then-reconstruct strategy in both the latent feature and raw data spaces yields improved stability and performance. To this end, we design a masked latent feature completion scheme, which predicts the latent features of masked nodes under the guidance of high-order sample correlations that are hard to observe from the raw data perspective. Specifically, we first adopt a latent feature predictor to predict the masked latent features from the visible ones. Next, we encode the raw data of masked samples with a momentum graph encoder and subsequently employ the resulting representations to improve the predicted results through latent feature matching. Extensive experiments on seventeen datasets demonstrate the effectiveness and robustness of RARE against state-of-the-art (SOTA) competitors across three downstream tasks.
The generation of collider data using machine learning has emerged as a prominent research topic in particle physics due to the increasing computational challenges associated with traditional Monte Carlo simulation methods, particularly for future colliders with higher luminosity. Although generating particle clouds is analogous to generating point clouds, accurately modelling the complex correlations between the particles presents a considerable challenge. Additionally, variable particle cloud sizes further exacerbate these difficulties, necessitating more sophisticated models. In this work, we propose a novel model that utilizes an attention-based aggregation mechanism to address these challenges. The model is trained in an adversarial training paradigm, ensuring that both the generator and critic exhibit permutation equivariance/invariance with respect to their input. A novel feature matching loss in the critic is introduced to stabilize the training. The proposed model performs competitively with the state of the art whilst having significantly fewer parameters.
Advances in Large Language Models (LLMs) have inspired a surge of research exploring their expansion into the visual domain. While recent models exhibit promise in generating abstract captions for images and conducting natural conversations, their performance on text-rich images leaves room for improvement. In this paper, we propose the Contrastive Reading Model (Cream), a novel neural architecture designed to enhance the language-image understanding capability of LLMs by capturing intricate details typically overlooked by existing methods. Cream integrates vision and auxiliary encoders, complemented by a contrastive feature alignment technique, resulting in a more effective understanding of textual information within document images. Our approach, thus, seeks to bridge the gap between vision and language understanding, paving the way for more sophisticated Document Intelligence Assistants. Rigorous evaluations across diverse tasks, such as visual question answering on document images, demonstrate the efficacy of Cream as a state-of-the-art model in the field of visual document understanding. We provide our codebase and newly-generated datasets at //github.com/naver-ai/cream
Recently, various studies have explored dense passage retrieval techniques that employ pre-trained language models, among which the masked auto-encoder (MAE) pre-training architecture has emerged as the most promising. The conventional MAE framework relies on the decoder's passage reconstruction to bolster the encoder's text representation ability, thereby enhancing the performance of the resulting dense retrieval systems. Given that the encoder's representation ability is built through the decoder's passage reconstruction, it is reasonable to postulate that a "more demanding" decoder will necessitate a corresponding increase in the encoder's ability. To this end, we propose a novel token-importance-aware masking strategy based on pointwise mutual information to intensify the challenge for the decoder. Importantly, our approach can be implemented in an unsupervised manner, without adding expense to the pre-training phase. Our experiments verify that the proposed method is both effective and robust on large-scale supervised passage retrieval datasets and out-of-domain zero-shot retrieval benchmarks.
Probabilistic logical rule learning has shown great strength in logical rule mining and knowledge graph completion. It learns logical rules to predict missing edges by reasoning on existing edges in the knowledge graph. However, previous efforts have largely been limited to only modeling chain-like Horn clauses such as $R_1(x,z)\land R_2(z,y)\Rightarrow H(x,y)$. This formulation overlooks additional contextual information from neighboring sub-graphs of entity variables $x$, $y$ and $z$. Intuitively, there is a large gap here, as local sub-graphs have been found to provide important information for knowledge graph completion. Inspired by these observations, we propose Logical Entity RePresentation (LERP) to encode contextual information of entities in the knowledge graph. A LERP is designed as a vector of probabilistic logical functions on the entity's neighboring sub-graph. It is an interpretable representation while allowing for differentiable optimization. We can then incorporate LERP into probabilistic logical rule learning to learn more expressive rules. Empirical results demonstrate that with LERP, our model outperforms other rule learning methods in knowledge graph completion and is comparable or even superior to state-of-the-art black-box methods. Moreover, we find that our model can discover a more expressive family of logical rules. LERP can also be further combined with embedding learning methods like TransE to make it more interpretable.
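To make the chain-like Horn clause $R_1(x,z)\land R_2(z,y)\Rightarrow H(x,y)$ concrete, the sketch below applies such a rule deterministically to a toy knowledge graph. It is only an illustration: the function name `apply_chain_rule` and the example relations are hypothetical, and probabilistic rule learners replace this hard join with soft, weighted scores.

```python
def apply_chain_rule(r1, r2):
    """Infer H(x, y) for every pair with R1(x, z) and R2(z, y).

    Each relation is a set of (head, tail) pairs. This is a hard,
    deterministic join used purely for illustration.
    """
    # Index R2 by its head entity so the join on z is a dictionary lookup.
    tails_by_head = {}
    for z, y in r2:
        tails_by_head.setdefault(z, set()).add(y)
    return {(x, y) for x, z in r1 for y in tails_by_head.get(z, ())}


# Hypothetical toy relations: born_in(x, z) and city_in(z, y).
born_in = {("marie", "warsaw"), ("alan", "london")}
city_in = {("warsaw", "poland"), ("london", "uk")}

# Rule: born_in(x, z) AND city_in(z, y) => nationality(x, y)
nationality = apply_chain_rule(born_in, city_in)
```

Note that the join only looks at the two edges on the chain; nothing about the neighborhoods of `x`, `y`, or `z` enters the inference, which is the gap the abstract points to.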
Graphs and networks play an important role in modeling and analyzing complex interconnected systems such as transportation networks, integrated circuits, power grids, citation graphs, and biological and artificial neural networks. Graph clustering algorithms can be used to detect groups of strongly connected vertices and to derive coarse-grained models. We define transfer operators such as the Koopman operator and the Perron-Frobenius operator on graphs, study their spectral properties, introduce Galerkin projections of these operators, and illustrate how reduced representations can be estimated from data. In particular, we show that spectral clustering of undirected graphs can be interpreted in terms of eigenfunctions of the Koopman operator and propose novel clustering algorithms for directed graphs based on generalized transfer operators. We demonstrate the efficacy of the resulting algorithms on several benchmark problems and provide different interpretations of clusters.
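The link between spectral clustering and the Koopman operator can be illustrated on a small undirected graph. The sketch below assumes the Koopman operator of the random walk, $(Kf)(i)=\sum_j p_{ij} f(j)$ with $p_{ij}=a_{ij}/\deg(i)$, and estimates its second eigenfunction by power iteration; the sign of that eigenfunction gives a two-way clustering. The function names, the lazy-walk shift, the starting vector, and the toy graph are illustrative assumptions, not the paper's algorithm.

```python
def koopman_apply(adj, f):
    """Apply (K f)(i) = sum_j p_ij * f(j), with p_ij = a_ij / deg(i)."""
    return [sum(a * fj for a, fj in zip(row, f)) / sum(row) for row in adj]


def second_eigenfunction(adj, iters=300):
    """Estimate the second dominant eigenfunction of K by power iteration.

    Assumes a connected undirected graph with no isolated vertices and a
    start vector that is not orthogonal to the target eigenfunction.
    """
    deg = [sum(row) for row in adj]
    total = sum(deg)
    f = [float(i) for i in range(len(adj))]  # arbitrary non-constant start
    for _ in range(iters):
        # The lazy walk (I + K) / 2 shifts the spectrum into [0, 1].
        f = [0.5 * (v + kv) for v, kv in zip(f, koopman_apply(adj, f))]
        # Remove the constant eigenfunction (eigenvalue 1) using the
        # degree-weighted inner product, in which K is self-adjoint.
        mean = sum(d * v for d, v in zip(deg, f)) / total
        f = [v - mean for v in f]
        norm = sum(v * v for v in f) ** 0.5
        f = [v / norm for v in f]
    return f


def two_way_clustering(adj):
    """Split the vertices by the sign of the second eigenfunction."""
    return [int(v > 0) for v in second_eigenfunction(adj)]


# Two triangles {0, 1, 2} and {3, 4, 5} joined by the single edge 2-3.
adj = [
    [0, 1, 1, 0, 0, 0],
    [1, 0, 1, 0, 0, 0],
    [1, 1, 0, 1, 0, 0],
    [0, 0, 1, 0, 1, 1],
    [0, 0, 0, 1, 0, 1],
    [0, 0, 0, 1, 1, 0],
]
labels = two_way_clustering(adj)
```

On this toy graph the two triangles end up with different labels; which triangle receives label 0 depends only on the sign convention of the eigenfunction.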
Author | 黃鋒
Reviewer | 付海濤
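Today we introduce the paper "GraphMAE: Self-Supervised Masked Graph Autoencoders", published at KDD 2022 by the group of Prof. Jie Tang (唐杰) in the Department of Computer Science and Technology at Tsinghua University. The paper brings the masked autoencoder (MAE) to the graph domain and runs extensive experiments on 21 datasets spanning three graph learning tasks. The results show that a few simple improvements to graph autoencoders are enough to outperform recent contrastive and generative self-supervised SOTA methods.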
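Generative self-supervised models are widely used in NLP and CV, whereas contrastive learning dominates the graph domain: on both node classification and graph classification, generative self-supervised methods trail contrastive learning by a wide margin. Even so, contrastive learning has serious weaknesses. It either relies heavily on data augmentation, or needs strategies such as negative sampling, momentum updates, or exponential moving averages to avoid collapsing to trivial solutions during training. Generative self-supervision, and graph autoencoders in particular, usually aims to reconstruct the graph's own node features or structural information, and so sidesteps these limitations entirely. The paper finds that a graph autoencoder with a few modest improvements can achieve superior performance by reconstructing node features alone. The figure below illustrates GraphMAE's improvements.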
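In short, there are four main improvements: (1) masked feature reconstruction, without reconstructing edges; (2) unlike the mean squared error used by most graph autoencoders, GraphMAE uses the scaled cosine error as its loss function; (3) the embeddings output by the encoder are re-masked before being fed to the decoder; (4) whereas the decoder in most graph autoencoders is a multilayer perceptron, GraphMAE's decoder is a graph neural network.
Across 21 datasets covering three tasks (unsupervised node classification, unsupervised graph classification, and transfer learning for molecular property prediction), GraphMAE achieves results comparable to, or even better than, those of contrastive learning.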
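Given an attributed graph, a BERT-style masking operation is applied to its node features before they are fed to the encoder. Specifically, GraphMAE randomly selects a subset of nodes and replaces the features of those nodes with a learnable vector.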
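The masking step can be sketched in a few lines. This is a minimal illustration under stated assumptions: the function name `mask_node_features`, the `mask_ratio` parameter, and the fixed `mask_token` values are hypothetical; in actual training the mask token would be a learnable parameter updated by gradient descent.

```python
import random


def mask_node_features(features, mask_token, mask_ratio, rng):
    """Replace the features of a random node subset with a shared mask token.

    features:   list of per-node feature vectors (lists of floats)
    mask_token: one vector shared by all masked nodes (learnable in practice)
    Returns the masked feature matrix and the sorted masked node indices.
    """
    num_nodes = len(features)
    num_masked = max(1, int(round(mask_ratio * num_nodes)))
    masked = sorted(rng.sample(range(num_nodes), num_masked))
    masked_set = set(masked)
    masked_features = [
        list(mask_token) if i in masked_set else list(x)
        for i, x in enumerate(features)
    ]
    return masked_features, masked


# Toy graph with four nodes and two-dimensional features.
features = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0], [0.5, 0.5]]
mask_token = [0.0, 0.0]  # stand-in for the learnable vector
masked_features, masked = mask_node_features(
    features, mask_token, mask_ratio=0.5, rng=random.Random(0)
)
```

Only the node features are masked here; the graph structure is left untouched, in line with the point that GraphMAE reconstructs features rather than edges.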
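After the encoder produces the node codes, the previously selected nodes are masked again, i.e., their codes are replaced with a mask vector. A graph neural network is used as the decoder, in the hope that it can recover the node features from the codes of the unmasked nodes.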
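The equations for this step did not survive in the text, so the following is a reconstruction in notation assumed from the GraphMAE paper rather than copied from this post: $A$ is the adjacency matrix, $\tilde{X}$ the masked node features, $\tilde{V}$ the set of masked nodes, $f_E$ and $f_D$ the encoder and decoder, and $h_{[DM]}$ a decoder-side mask vector.

```latex
H = f_E(A, \tilde{X}), \qquad
\tilde{h}_i =
\begin{cases}
  h_{[DM]} & \text{if } v_i \in \tilde{V}, \\
  h_i      & \text{if } v_i \notin \tilde{V},
\end{cases}
\qquad
Z = f_D(A, \tilde{H})
```

Because the decoder is a graph neural network, each masked node has to be reconstructed from the codes of its unmasked neighbors rather than from its own code.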
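Unlike the mean squared error used by most graph autoencoder models, GraphMAE uses the scaled cosine error, which is computed between the original node features and the node features reconstructed by the decoder.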
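The loss formula itself is missing from the text. As far as we can tell from the GraphMAE paper, the scaled cosine error over the masked node set $\tilde{V}$ is $\mathcal{L}_{\mathrm{SCE}} = \frac{1}{|\tilde{V}|}\sum_{v_i\in\tilde{V}}\left(1-\frac{x_i^{\top} z_i}{\|x_i\|\,\|z_i\|}\right)^{\gamma}$ with $\gamma \ge 1$, where $x_i$ is the original feature and $z_i$ the reconstruction. The sketch below implements that formula in plain Python; the function name and the default `gamma` are assumptions made for illustration.

```python
import math


def scaled_cosine_error(originals, reconstructions, gamma=2.0):
    """Mean of (1 - cosine_similarity)^gamma over paired vectors.

    Both arguments hold only the masked nodes' vectors, in matching order,
    and every vector is assumed to be non-zero.
    """
    total = 0.0
    for x, z in zip(originals, reconstructions):
        dot = sum(a * b for a, b in zip(x, z))
        norm_x = math.sqrt(sum(a * a for a in x))
        norm_z = math.sqrt(sum(b * b for b in z))
        # Clamp at zero so rounding error cannot give a negative base.
        total += max(0.0, 1.0 - dot / (norm_x * norm_z)) ** gamma
    return total / len(originals)


perfect = scaled_cosine_error([[3.0, 4.0]], [[3.0, 4.0]])
orthogonal = scaled_cosine_error([[1.0, 0.0]], [[0.0, 1.0]])
opposite = scaled_cosine_error([[1.0, 0.0]], [[-1.0, 0.0]], gamma=2.0)
```

With `gamma` above 1, terms with small error (base below 1) are shrunk relative to terms with large error, so well-reconstructed nodes contribute less to the loss; the cosine form also makes the loss insensitive to the norm of the feature vectors.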
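Three tasks are evaluated: (1) unsupervised node classification; (2) unsupervised graph classification; (3) transfer learning for molecular property prediction.
The table below shows the results on node classification. The model is first trained without supervision; the encoder parameters are then frozen to obtain node embeddings, a linear classifier is trained on those embeddings, and the mean over 20 random initializations is reported. Both the encoder and the decoder are standard graph attention networks. See the original paper for more details.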
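The table below shows the results on graph classification. After unsupervised training, the node embeddings are passed through a parameter-free pooling operation to obtain a graph-level representation; a LIBSVM classifier is then trained, and the mean over five runs of 10-fold cross-validation is reported. Both the encoder and the decoder are graph isomorphism networks. See the original paper for more details.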
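A parameter-free pooling operation can be as simple as averaging, summing, or taking the element-wise maximum of the node embeddings. The text does not say which one is used, so the sketch below shows all three as assumptions; the function names are hypothetical.

```python
def mean_readout(node_embeddings):
    """Average the node embeddings dimension by dimension."""
    n = len(node_embeddings)
    return [sum(col) / n for col in zip(*node_embeddings)]


def sum_readout(node_embeddings):
    """Sum the node embeddings dimension by dimension."""
    return [sum(col) for col in zip(*node_embeddings)]


def max_readout(node_embeddings):
    """Take the element-wise maximum over the node embeddings."""
    return [max(col) for col in zip(*node_embeddings)]


# Embeddings of a toy graph with three nodes and two dimensions.
embeddings = [[1.0, 2.0], [3.0, 4.0], [5.0, 0.0]]
graph_mean = mean_readout(embeddings)
graph_sum = sum_readout(embeddings)
graph_max = max_readout(embeddings)
```

Each readout maps a graph with any number of nodes to a fixed-length vector, which is what a downstream classifier such as an SVM requires.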
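The table below shows the results on molecular property prediction. The model is first pre-trained without supervision on a large dataset and then fine-tuned on smaller datasets. See the original paper for more details.
For more experimental results, see the original paper.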
Sequential recommendation, as an emerging topic, has attracted increasing attention due to its practical significance. Models based on deep learning and attention mechanisms have achieved good performance in sequential recommendation. Recently, generative models based on the Variational Autoencoder (VAE) have shown a unique advantage in collaborative filtering. In particular, the sequential VAE, a recurrent version of the VAE, can effectively capture temporal dependencies among items in a user sequence and perform sequential recommendation. However, VAE-based models share a common limitation: the representational ability of the approximate posterior distribution is limited, which lowers the quality of the generated samples. This is especially true for generating sequences. To solve this problem, we propose a novel method called Adversarial and Contrastive Variational Autoencoder (ACVAE) for sequential recommendation. Specifically, we first introduce adversarial training for sequence generation under the Adversarial Variational Bayes (AVB) framework, which enables our model to generate high-quality latent variables. Then, we employ a contrastive loss; by minimizing it, the latent variables learn more personalized and salient characteristics. In addition, when encoding the sequence, we apply a recurrent and convolutional structure to capture global and local relationships in the sequence. Finally, we conduct extensive experiments on four real-world datasets. The experimental results show that our proposed ACVAE model outperforms other state-of-the-art methods.
The Graph Convolutional Network (GCN) has been widely applied in transportation demand prediction due to its excellent ability to capture non-Euclidean spatial dependence among station-level or regional transportation demands. However, in most existing research, the graph convolution is implemented on a heuristically generated adjacency matrix, which can neither reflect the real spatial relationships of stations accurately nor capture the multi-level spatial dependence of demands adaptively. To cope with these problems, this paper proposes a novel graph convolutional network for transportation demand prediction. Firstly, a novel graph convolution architecture is proposed, which has different adjacency matrices in different layers, all of which are self-learned during training. Secondly, a layer-wise coupling mechanism is provided, which associates the upper-level adjacency matrix with the lower-level one and also reduces the number of parameters in our model. Lastly, a unitary network is constructed to give the final prediction by integrating the hidden spatial states with a gated recurrent unit, which captures the multi-level spatial dependence and temporal dynamics simultaneously. Experiments have been conducted on two real-world datasets, NYC Citi Bike and NYC Taxi, and the results demonstrate the superiority of our model over state-of-the-art ones.
Text classification is an important and classical problem in natural language processing. There have been a number of studies that applied convolutional neural networks (convolution on regular grid, e.g., sequence) to classification. However, only a limited number of studies have explored the more flexible graph convolutional neural networks (convolution on non-grid, e.g., arbitrary graph) for the task. In this work, we propose to use graph convolutional networks for text classification. We build a single text graph for a corpus based on word co-occurrence and document-word relations, then learn a Text Graph Convolutional Network (Text GCN) for the corpus. Our Text GCN is initialized with one-hot representations for words and documents; it then jointly learns the embeddings for both words and documents, as supervised by the known class labels for documents. Our experimental results on multiple benchmark datasets demonstrate that a vanilla Text GCN without any external word embeddings or knowledge outperforms state-of-the-art methods for text classification. In addition, Text GCN learns predictive word and document embeddings. Experimental results also show that the improvement of Text GCN over state-of-the-art comparison methods becomes more prominent as we lower the percentage of training data, suggesting the robustness of Text GCN to less training data in text classification.
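As an illustration of how word co-occurrence can be turned into graph edges, the sketch below weights word-word edges by pointwise mutual information (PMI) computed over sliding windows and keeps only positive values. The abstract does not state the weighting scheme, so PMI, the window size, and the function name `pmi_edges` are assumptions made for this example; document-word edges (for instance TF-IDF weighted) are omitted.

```python
import math
from collections import Counter
from itertools import combinations


def pmi_edges(docs, window=3):
    """Return {(word_a, word_b): pmi} for word pairs with positive PMI.

    Co-occurrence is counted over sliding windows of `window` tokens;
    a document shorter than the window forms a single window.
    """
    windows = []
    for doc in docs:
        tokens = doc.split()
        if len(tokens) <= window:
            windows.append(tokens)
        else:
            windows.extend(
                tokens[i:i + window] for i in range(len(tokens) - window + 1)
            )
    num_windows = len(windows)
    word_count = Counter()
    pair_count = Counter()
    for win in windows:
        unique = sorted(set(win))  # count each word once per window
        word_count.update(unique)
        pair_count.update(combinations(unique, 2))
    edges = {}
    for (a, b), count in pair_count.items():
        # PMI = log( p(a, b) / (p(a) * p(b)) ), probabilities over windows.
        pmi = math.log(count * num_windows / (word_count[a] * word_count[b]))
        if pmi > 0:
            edges[(a, b)] = pmi
    return edges


docs = [
    "graph neural network",
    "graph neural model",
    "stock market price",
    "stock market crash",
]
edges = pmi_edges(docs, window=3)
```

Edge keys are alphabetically ordered word pairs, so `("graph", "neural")` is present while pairs that never share a window, such as `("graph", "stock")`, get no edge.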
Recently, graph neural networks (GNNs) have revolutionized the field of graph representation learning through effectively learned node embeddings, and achieved state-of-the-art results in tasks such as node classification and link prediction. However, current GNN methods are inherently flat and do not learn hierarchical representations of graphs---a limitation that is especially problematic for the task of graph classification, where the goal is to predict the label associated with an entire graph. Here we propose DiffPool, a differentiable graph pooling module that can generate hierarchical representations of graphs and can be combined with various graph neural network architectures in an end-to-end fashion. DiffPool learns a differentiable soft cluster assignment for nodes at each layer of a deep GNN, mapping nodes to a set of clusters, which then form the coarsened input for the next GNN layer. Our experimental results show that combining existing GNN methods with DiffPool yields an average improvement of 5-10% accuracy on graph classification benchmarks, compared to all existing pooling approaches, achieving a new state-of-the-art on four out of five benchmark data sets.
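The coarsening step can be sketched as follows. The sketch assumes the formulation usually given for DiffPool: a row-wise softmax turns per-node scores into a soft assignment matrix $S$, and the coarsened features and adjacency are $X' = S^{\top} Z$ and $A' = S^{\top} A S$. In the real model the scores and the embeddings $Z$ come from two GNNs; here they are fixed toy values, and all function names are hypothetical.

```python
import math


def softmax_rows(logits):
    """Row-wise softmax, shifted by the row maximum for numerical stability."""
    out = []
    for row in logits:
        m = max(row)
        exps = [math.exp(v - m) for v in row]
        total = sum(exps)
        out.append([e / total for e in exps])
    return out


def transpose(m):
    return [list(col) for col in zip(*m)]


def matmul(a, b):
    bt = transpose(b)
    return [[sum(x * y for x, y in zip(row, col)) for col in bt] for row in a]


def diffpool_coarsen(adj, z, assign_logits):
    """Return (X', A', S) with X' = S^T Z and A' = S^T A S."""
    s = softmax_rows(assign_logits)
    st = transpose(s)
    x_coarse = matmul(st, z)
    adj_coarse = matmul(matmul(st, adj), s)
    return x_coarse, adj_coarse, s


# Path graph 0-1-2-3; the scores push nodes {0, 1} and {2, 3} into two clusters.
adj = [[0, 1, 0, 0], [1, 0, 1, 0], [0, 1, 0, 1], [0, 0, 1, 0]]
z = [[1.0, 0.0], [1.0, 0.0], [0.0, 1.0], [0.0, 1.0]]
assign_logits = [[20.0, -20.0], [20.0, -20.0], [-20.0, 20.0], [-20.0, 20.0]]
x_coarse, adj_coarse, s = diffpool_coarsen(adj, z, assign_logits)
```

With these near-hard assignments the four-node path collapses to two cluster nodes: the off-diagonal entry of `adj_coarse` is the single edge 1-2 that crosses clusters, and each diagonal entry is the intra-cluster edge counted once in each direction. Because every step is a differentiable matrix operation, the assignment can be trained end to end with the rest of the network.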