With the explosion of graph-structured data, link prediction has emerged as an increasingly important task. Embedding methods for link prediction utilize neural networks to generate node embeddings, which are subsequently employed to predict links between nodes. However, the existing embedding methods typically take a holistic strategy to learn node embeddings and ignore the entanglement of latent factors. As a result, entangled embeddings fail to effectively capture the underlying information and are vulnerable to irrelevant information, leading to unconvincing and uninterpretable link prediction results. To address these challenges, this paper proposes a novel framework with two variants, the disentangled graph auto-encoder (DGAE) and the variational disentangled graph auto-encoder (VDGAE). Our work provides a pioneering effort to apply the disentanglement strategy to link prediction. The proposed framework infers the latent factors that cause edges in the graph and disentangles the representation into multiple channels corresponding to unique latent factors, which contributes to improving the performance of link prediction. To further encourage the embeddings to capture mutually exclusive latent factors, we introduce mutual information regularization to enhance the independence among different channels. Extensive experiments on various real-world benchmarks demonstrate that our proposed methods achieve state-of-the-art results compared to a variety of strong baselines on link prediction tasks. Qualitative analysis on the synthetic dataset also illustrates that the proposed methods can capture distinct latent factors that cause links, providing empirical evidence that our models are able to explain the results of link prediction to some extent. All code will be made publicly available upon publication of the paper.
All analog signal processing is fundamentally subject to noise, and this is also the case in modern implementations of Optical Neural Networks (ONNs). Therefore, to mitigate noise in ONNs, we propose two designs that are constructed from a given, possibly trained, Neural Network (NN) that one wishes to implement. Both designs have the capability that the resulting ONNs gives outputs close to the desired NN. To establish the latter, we analyze the designs mathematically. Specifically, we investigate a probabilistic framework for the first design that establishes that the design is correct, i.e., for any feed-forward NN with Lipschitz continuous activation functions, an ONN can be constructed that produces output arbitrarily close to the original. ONNs constructed with the first design thus also inherit the universal approximation property of NNs. For the second design, we restrict the analysis to NNs with linear activation functions and characterize the ONNs' output distribution using exact formulas. Finally, we report on numerical experiments with LeNet ONNs that give insight into the number of components required in these designs for certain accuracy gains. We specifically study the effect of noise as a function of the depth of an ONN. The results indicate that in practice, adding just a few components in the manner of the first or the second design can already be expected to increase the accuracy of ONNs considerably.
The emergence of deep-learning-based methods to solve image-reconstruction problems has enabled a significant increase in reconstruction quality. Unfortunately, these new methods often lack reliability and explainability, and there is a growing interest to address these shortcomings while retaining the boost in performance. In this work, we tackle this issue by revisiting regularizers that are the sum of convex-ridge functions. The gradient of such regularizers is parameterized by a neural network that has a single hidden layer with increasing and learnable activation functions. This neural network is trained within a few minutes as a multistep Gaussian denoiser. The numerical experiments for denoising, CT, and MRI reconstruction show improvements over methods that offer similar reliability guarantees.
Skin lesion analysis models are biased by artifacts placed during image acquisition, which influence model predictions despite carrying no clinical information. Solutions that address this problem by regularizing models to prevent learning those spurious features achieve only partial success, and existing test-time debiasing techniques are inappropriate for skin lesion analysis due to either making unrealistic assumptions on the distribution of test data or requiring laborious annotation from medical practitioners. We propose TTS (Test-Time Selection), a human-in-the-loop method that leverages positive (e.g., lesion area) and negative (e.g., artifacts) keypoints in test samples. TTS effectively steers models away from exploiting spurious artifact-related correlations without retraining, and with less annotation requirements. Our solution is robust to a varying availability of annotations, and different levels of bias. We showcase on the ISIC2019 dataset (for which we release a subset of annotated images) how our model could be deployed in the real-world for mitigating bias.
The analysis of mapping relationships and distortions in multidimensional data poses a significant challenge in contemporary research. While Beltrami coefficients offer a precise description of distortions in two-dimensional mappings, current tools lack this capability in the context of three-dimensional space. This paper presents a novel approach: a 3D quasiconformal representation that captures the local dilation of 3D mappings, along with an algorithm that establishes a connection between this representation and the corresponding mapping. Experimental results showcase the algorithm's effectiveness in eliminating foldings in 3D mappings, as well as in mapping reconstruction and generation. These features bear a resemblance to the 2D Linear Beltrami Solver technique. The work presented in this paper offers a promising solution for the precise analysis and adjustment of distortions in 3D data and mappings.
We address the task of probabilistic anomaly attribution in the black-box regression setting, where the goal is to compute the probability distribution of the attribution score of each input variable, given an observed anomaly. The training dataset is assumed to be unavailable. This task differs from the standard XAI (explainable AI) scenario, since we wish to explain the anomalous deviation from a black-box prediction rather than the black-box model itself. We begin by showing that mainstream model-agnostic explanation methods, such as the Shapley values, are not suitable for this task because of their ``deviation-agnostic property.'' We then propose a novel framework for probabilistic anomaly attribution that allows us to not only compute attribution scores as the predictive mean but also quantify the uncertainty of those scores. This is done by considering a generative process for perturbations that counter-factually bring the observed anomalous observation back to normalcy. We introduce a variational Bayes algorithm for deriving the distributions of per variable attribution scores. To the best of our knowledge, this is the first probabilistic anomaly attribution framework that is free from being deviation-agnostic.
We present a unified and compact scene representation for robotics, where each object in the scene is depicted by a latent code capturing geometry and appearance. This representation can be decoded for various tasks such as novel view rendering, 3D reconstruction (e.g. recovering depth, point clouds, or voxel maps), collision checking, and stable grasp prediction. We build our representation from a single RGB input image at test time by leveraging recent advances in Neural Radiance Fields (NeRF) that learn category-level priors on large multiview datasets, then fine-tune on novel objects from one or few views. We expand the NeRF model for additional grasp outputs and explore ways to leverage this representation for robotics. At test-time, we build the representation from a single RGB input image observing the scene from only one viewpoint. We find that the recovered representation allows rendering from novel views, including of occluded object parts, and also for predicting successful stable grasps. Grasp poses can be directly decoded from our latent representation with an implicit grasp decoder. We experimented in both simulation and real world and demonstrated the capability for robust robotic grasping using such compact representation. Website: //nerfgrasp.github.io
Many scientific problems require to process data in the form of geometric graphs. Unlike generic graph data, geometric graphs exhibit symmetries of translations, rotations, and/or reflections. Researchers have leveraged such inductive bias and developed geometrically equivariant Graph Neural Networks (GNNs) to better characterize the geometry and topology of geometric graphs. Despite fruitful achievements, it still lacks a survey to depict how equivariant GNNs are progressed, which in turn hinders the further development of equivariant GNNs. To this end, based on the necessary but concise mathematical preliminaries, we analyze and classify existing methods into three groups regarding how the message passing and aggregation in GNNs are represented. We also summarize the benchmarks as well as the related datasets to facilitate later researches for methodology development and experimental evaluation. The prospect for future potential directions is also provided.
Knowledge graph embedding, which aims to represent entities and relations as low dimensional vectors (or matrices, tensors, etc.), has been shown to be a powerful technique for predicting missing links in knowledge graphs. Existing knowledge graph embedding models mainly focus on modeling relation patterns such as symmetry/antisymmetry, inversion, and composition. However, many existing approaches fail to model semantic hierarchies, which are common in real-world applications. To address this challenge, we propose a novel knowledge graph embedding model---namely, Hierarchy-Aware Knowledge Graph Embedding (HAKE)---which maps entities into the polar coordinate system. HAKE is inspired by the fact that concentric circles in the polar coordinate system can naturally reflect the hierarchy. Specifically, the radial coordinate aims to model entities at different levels of the hierarchy, and entities with smaller radii are expected to be at higher levels; the angular coordinate aims to distinguish entities at the same level of the hierarchy, and these entities are expected to have roughly the same radii but different angles. Experiments demonstrate that HAKE can effectively model the semantic hierarchies in knowledge graphs, and significantly outperforms existing state-of-the-art methods on benchmark datasets for the link prediction task.
We advocate the use of implicit fields for learning generative models of shapes and introduce an implicit field decoder for shape generation, aimed at improving the visual quality of the generated shapes. An implicit field assigns a value to each point in 3D space, so that a shape can be extracted as an iso-surface. Our implicit field decoder is trained to perform this assignment by means of a binary classifier. Specifically, it takes a point coordinate, along with a feature vector encoding a shape, and outputs a value which indicates whether the point is outside the shape or not. By replacing conventional decoders by our decoder for representation learning and generative modeling of shapes, we demonstrate superior results for tasks such as shape autoencoding, generation, interpolation, and single-view 3D reconstruction, particularly in terms of visual quality.
Traditional methods for link prediction can be categorized into three main types: graph structure feature-based, latent feature-based, and explicit feature-based. Graph structure feature methods leverage some handcrafted node proximity scores, e.g., common neighbors, to estimate the likelihood of links. Latent feature methods rely on factorizing networks' matrix representations to learn an embedding for each node. Explicit feature methods train a machine learning model on two nodes' explicit attributes. Each of the three types of methods has its unique merits. In this paper, we propose SEAL (learning from Subgraphs, Embeddings, and Attributes for Link prediction), a new framework for link prediction which combines the power of all the three types into a single graph neural network (GNN). GNN is a new type of neural network which directly accepts graphs as input and outputs their labels. In SEAL, the input to the GNN is a local subgraph around each target link. We prove theoretically that our local subgraphs also reserve a great deal of high-order graph structure features related to link existence. Another key feature is that our GNN can naturally incorporate latent features and explicit features. It is achieved by concatenating node embeddings (latent features) and node attributes (explicit features) in the node information matrix for each subgraph, thus combining the three types of features to enhance GNN learning. Through extensive experiments, SEAL shows unprecedentedly strong performance against a wide range of baseline methods, including various link prediction heuristics and network embedding methods.