A key question in many network studies is whether the observed correlations between units are primarily due to contagion or latent confounding. Here, we study this question using a segregated graph (Shpitser, 2015) representation of these mechanisms, and examine how uncertainty about the true underlying mechanism impacts downstream computation of network causal effects, particularly under full interference -- settings where we only have a single realization of a network and each unit may depend on any other unit in the network. Under certain assumptions about asymptotic growth of the network, we derive likelihood ratio tests that can be used to identify whether different sets of variables -- confounders, treatments, and outcomes -- across units exhibit dependence due to contagion or latent confounding. We then propose network causal effect estimation strategies that provide unbiased and consistent estimates if the dependence mechanisms are either known or correctly inferred using our proposed tests. Together, the proposed methods allow network effect estimation in a wider range of full interference scenarios that have not been considered in prior work. We evaluate the effectiveness of our methods with synthetic data and the validity of our assumptions using real-world networks.
Information Retrieval (IR) methods aim to identify documents relevant to a query, which have been widely applied in various natural language tasks. However, existing approaches typically consider only the textual content within documents, overlooking the fact that documents can contain multiple modalities, including images and tables. Also, they often segment each long document into multiple discrete passages for embedding, which prevents them from capturing the overall document context and interactions between paragraphs. To address these two challenges, we propose a method that holistically embeds documents interleaved with multiple modalities by leveraging the capability of recent vision-language models that enable the processing and integration of text, images, and tables into a unified format and representation. Moreover, to mitigate the information loss from segmenting documents into passages, instead of representing and retrieving passages individually, we further merge the representations of segmented passages into one single document representation, while we additionally introduce a reranking strategy to decouple and identify the relevant passage within the document if necessary. Then, through extensive experiments on diverse IR scenarios considering both the textual and multimodal queries, we show that our approach substantially outperforms relevant baselines, thanks to the consideration of the multimodal information within documents.
Existing knowledge distillation (KD) methods have demonstrated their ability in achieving student network performance on par with their teachers. However, the knowledge gap between the teacher and student remains significant and may hinder the effectiveness of the distillation process. In this work, we introduce the structure of Neural Collapse (NC) into the KD framework. NC typically occurs in the final phase of training, resulting in a graceful geometric structure where the last-layer features form a simplex equiangular tight frame. Such phenomenon has improved the generalization of deep network training. We hypothesize that NC can also alleviate the knowledge gap in distillation, thereby enhancing student performance. This paper begins with an empirical analysis to bridge the connection between knowledge distillation and neural collapse. Through this analysis, we establish that transferring the teacher's NC structure to the student benefits the distillation process. Therefore, instead of merely transferring instance-level logits or features, as done by existing distillation methods, we encourage students to learn the teacher's NC structure. Thereby, we propose a new distillation paradigm termed Neural Collapse-inspired Knowledge Distillation (NCKD). Comprehensive experiments demonstrate that NCKD is simple yet effective, improving the generalization of all distilled student models and achieving state-of-the-art accuracy performance.
Graph Neural Networks (GNNs) are currently one of the most powerful types of neural network architectures. Their advantage lies in the ability to leverage both the graph topology, which represents the relationships between samples, and the features of the samples themselves. However, the given graph topology often contains noisy edges, and GNNs are vulnerable to noise in the graph structure. This issue remains unresolved. In this paper, we propose using adversarial robustness evaluation to select a small subset of robust nodes that are less affected by noise. We then only feed the features of these robust nodes, along with the KNN graph constructed from these nodes, into the GNN for classification. Additionally, we compute the centroids for each class. For the remaining non-robust nodes, we assign them to the class whose centroid is closest to them. Experimental results show that this method significantly improves the accuracy of GNNs.
Despite the crucial need for formal safety and security verification of programs, discovering loop invariants remains a significant challenge. Static analysis is a primary technique for inferring loop invariants but often relies on substantial assumptions about underlying theories. Data-driven methods supported by dynamic analysis and machine learning algorithms have shown impressive performance in inferring loop invariants for some challenging programs. However, state-of-the-art data-driven techniques do not offer theoretical guarantees for finding loop invariants. We present a novel technique that leverages the simulated annealing (SA) search algorithm combined with SMT solvers and computational geometry to provide probabilistic guarantees for inferring loop invariants using data-driven methods. Our approach enhances the SA search with real analysis to define the search space and employs parallelism to increase the probability of success. To ensure the convergence of our algorithm, we adapt e-nets, a key concept from computational geometry. Our tool, DLIA2, implements these algorithms and demonstrates competitive performance against state-of-the-art techniques. We also identify a subclass of programs, on which we outperform the current state-of-the-art tool GSpacer.
Join order selection is a sub-field of query optimization that aims to find the optimal join order for an SQL query with the minimum cost. The challenge lies in the exponentially growing search space as the number of tables increases, making exhaustive enumeration impractical. Traditional optimizers use static heuristics to prune the search space, but they often fail to adapt to changes or improve based on feedback from the DBMS. Recent research addresses these limitations with Deep Reinforcement Learning (DRL), allowing models to use feedback to dynamically search for better join orders and enhance performance over time. Existing research primarily focuses on capturing join order sequences and their representations at various levels, with limited comparative analysis of reinforcement learning methods. In this paper, we propose GTDD, a novel framework that integrates Graph Neural Networks (GNN), Treestructured Long Short-Term Memory (Tree LSTM), and DuelingDQN. We conduct a series of experiments that demonstrate a clear advantage of GTDD over state-of the-art techniques.
Recently, a considerable literature has grown up around the theme of Graph Convolutional Network (GCN). How to effectively leverage the rich structural information in complex graphs, such as knowledge graphs with heterogeneous types of entities and relations, is a primary open challenge in the field. Most GCN methods are either restricted to graphs with a homogeneous type of edges (e.g., citation links only), or focusing on representation learning for nodes only instead of jointly propagating and updating the embeddings of both nodes and edges for target-driven objectives. This paper addresses these limitations by proposing a novel framework, namely the Knowledge Embedding based Graph Convolutional Network (KE-GCN), which combines the power of GCNs in graph-based belief propagation and the strengths of advanced knowledge embedding (a.k.a. knowledge graph embedding) methods, and goes beyond. Our theoretical analysis shows that KE-GCN offers an elegant unification of several well-known GCN methods as specific cases, with a new perspective of graph convolution. Experimental results on benchmark datasets show the advantageous performance of KE-GCN over strong baseline methods in the tasks of knowledge graph alignment and entity classification.
This paper is an attempt to explain all the matrix calculus you need in order to understand the training of deep neural networks. We assume no math knowledge beyond what you learned in calculus 1, and provide links to help you refresh the necessary math where needed. Note that you do not need to understand this material before you start learning to train and use deep learning in practice; rather, this material is for those who are already familiar with the basics of neural networks, and wish to deepen their understanding of the underlying math. Don't worry if you get stuck at some point along the way---just go back and reread the previous section, and try writing down and working through some examples. And if you're still stuck, we're happy to answer your questions in the Theory category at forums.fast.ai. Note: There is a reference section at the end of the paper summarizing all the key matrix calculus rules and terminology discussed here. See related articles at //explained.ai
Detecting carried objects is one of the requirements for developing systems to reason about activities involving people and objects. We present an approach to detect carried objects from a single video frame with a novel method that incorporates features from multiple scales. Initially, a foreground mask in a video frame is segmented into multi-scale superpixels. Then the human-like regions in the segmented area are identified by matching a set of extracted features from superpixels against learned features in a codebook. A carried object probability map is generated using the complement of the matching probabilities of superpixels to human-like regions and background information. A group of superpixels with high carried object probability and strong edge support is then merged to obtain the shape of the carried object. We applied our method to two challenging datasets, and results show that our method is competitive with or better than the state-of-the-art.
The amount of publicly available biomedical literature has been growing rapidly in recent years, yet question answering systems still struggle to exploit the full potential of this source of data. In a preliminary processing step, many question answering systems rely on retrieval models for identifying relevant documents and passages. This paper proposes a weighted cosine distance retrieval scheme based on neural network word embeddings. Our experiments are based on publicly available data and tasks from the BioASQ biomedical question answering challenge and demonstrate significant performance gains over a wide range of state-of-the-art models.
The dominant sequence transduction models are based on complex recurrent or convolutional neural networks in an encoder-decoder configuration. The best performing models also connect the encoder and decoder through an attention mechanism. We propose a new simple network architecture, the Transformer, based solely on attention mechanisms, dispensing with recurrence and convolutions entirely. Experiments on two machine translation tasks show these models to be superior in quality while being more parallelizable and requiring significantly less time to train. Our model achieves 28.4 BLEU on the WMT 2014 English-to-German translation task, improving over the existing best results, including ensembles by over 2 BLEU. On the WMT 2014 English-to-French translation task, our model establishes a new single-model state-of-the-art BLEU score of 41.8 after training for 3.5 days on eight GPUs, a small fraction of the training costs of the best models from the literature. We show that the Transformer generalizes well to other tasks by applying it successfully to English constituency parsing both with large and limited training data.