We analyse the geometric instability of embeddings produced by graph neural networks (GNNs). Existing methods are applicable only to small graphs and are not tailored to the graph domain. We propose the Graph Gram Index (GGI), a simple, efficient, and graph-native measure of such instability that is invariant to permutation, orthogonal transformation, translation, and the order of evaluation. This allows us to study the varying instability behaviour of GNN embeddings on large graphs for both node classification and link prediction.
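The abstract does not spell out the definition of GGI. The sketch below is one plausible Gram-matrix-based instability measure with the stated invariances (centering removes translations, Gram matrices absorb orthogonal maps, and the measure is symmetric in its two arguments, hence independent of evaluation order); the function name and normalization are assumptions, not the paper's definition, and the paper's index is presumably computed more scalably than this dense $n \times n$ version.

```python
import numpy as np

def gram_instability(X, Y):
    """Compare two embedding matrices (n_nodes x dim) of the same graph.

    Centering removes translations; the Gram matrix X @ X.T is unchanged
    by orthogonal transformations of the embedding space; applying the
    same node permutation to both inputs permutes both Gram matrices
    identically. NOTE: an illustrative measure, not the paper's GGI.
    """
    X = X - X.mean(axis=0)          # remove translation
    Y = Y - Y.mean(axis=0)
    Gx, Gy = X @ X.T, Y @ Y.T       # orthogonal-invariant Gram matrices
    Gx /= np.linalg.norm(Gx)        # remove dependence on embedding scale
    Gy /= np.linalg.norm(Gy)
    return np.linalg.norm(Gx - Gy)  # 0 = geometrically identical

# Two embeddings of the same 100-node graph, e.g. from two training runs.
rng = np.random.default_rng(0)
X = rng.normal(size=(100, 16))
Q, _ = np.linalg.qr(rng.normal(size=(16, 16)))      # random rotation
print(gram_instability(X, X @ Q + 3.0))  # ~0: rotation + translation
```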
Neural networks with quadratic decision functions have been introduced as alternatives to standard neural networks with affine linear ones. They are advantageous when the objects to be identified have compact basic geometries such as circles or ellipses. In this paper we investigate the use of such ansatz functions for classification. In particular, we test and compare the algorithm on the MNIST dataset for the classification of handwritten digits and for the classification of subspecies. We also show that the implementation can be based on the neural network structures of the software packages TensorFlow and Keras.
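A minimal sketch of how a per-class quadratic decision function $f_k(x) = x^\top A_k x + b_k^\top x + c_k$ can be realised as a custom Keras layer, as the abstract suggests; the layer, its variable names, and the surrounding model are illustrative assumptions, not the paper's implementation.

```python
import tensorflow as tf

class QuadraticDecision(tf.keras.layers.Layer):
    """One quadratic form x^T A_k x + b_k^T x + c_k per output class."""

    def __init__(self, num_classes, **kwargs):
        super().__init__(**kwargs)
        self.num_classes = num_classes

    def build(self, input_shape):
        d = int(input_shape[-1])
        self.A = self.add_weight(name="A", shape=(self.num_classes, d, d),
                                 initializer="glorot_uniform")
        self.b = self.add_weight(name="b", shape=(self.num_classes, d),
                                 initializer="glorot_uniform")
        self.c = self.add_weight(name="c", shape=(self.num_classes,),
                                 initializer="zeros")

    def call(self, x):
        quad = tf.einsum("ni,kij,nj->nk", x, self.A, x)   # x^T A_k x
        lin = tf.einsum("ni,ki->nk", x, self.b)           # b_k^T x
        return quad + lin + self.c                        # class logits

# Hypothetical MNIST-shaped classifier built around the quadratic layer.
model = tf.keras.Sequential([
    tf.keras.layers.Flatten(input_shape=(28, 28)),
    QuadraticDecision(10),
])
model.compile(optimizer="adam",
              loss=tf.keras.losses.SparseCategoricalCrossentropy(
                  from_logits=True))
```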
The method is based on a preliminary transformation of the adjacency matrices or adjacency lists traditionally used in graph theory into refined projections free from redundant information, and on their subsequent use in constructing shortest paths. Unlike adjacency matrices and lists, which enumerate binary adjacency relations, a refined projection enumerates more complex relations: simple paths from a given graph vertex that are shortest. Precomputing such projections reduces the algorithmic complexity of applications that use them, improving their space and running-time characteristics to linear ones for a pair of vertices. The class of graphs considered is extended to mixed graphs.
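The abstract does not describe the projection data structure in detail. A minimal sketch of the general idea for unweighted graphs: precompute a BFS parent tree per source vertex as its "projection" (discarding redundant edge information), after which a path query for a vertex pair costs time linear in the path length. Names and representation are assumptions; in a mixed graph, a directed edge would simply appear in only one adjacency list.

```python
from collections import deque

def build_projection(adj, src):
    """BFS parent tree from src: for every reachable v, parent[v] is the
    predecessor of v on some shortest path from src."""
    parent = {src: None}
    queue = deque([src])
    while queue:
        u = queue.popleft()
        for v in adj[u]:
            if v not in parent:
                parent[v] = u
                queue.append(v)
    return parent

def shortest_path(projection, dst):
    """Read a shortest path out of a precomputed projection in time
    linear in the path length."""
    if dst not in projection:
        return None                  # unreachable
    path = []
    while dst is not None:
        path.append(dst)
        dst = projection[dst]
    return path[::-1]

adj = {0: [1, 2], 1: [0, 3], 2: [0, 3], 3: [1, 2, 4], 4: [3]}
proj = build_projection(adj, 0)      # precompute once per source vertex
print(shortest_path(proj, 4))        # [0, 1, 3, 4]
```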
The notions of bounded-size and quasibounded-size decompositions with bounded treedepth base classes are central to the structural theory of graph sparsity introduced by two of the authors years ago, and provide characterizations of both classes with bounded expansion and nowhere dense classes. In this paper, we first prove that the model-theoretic notions of dependence and stability are, for hereditary classes of graphs, compatible with quasibounded-size decompositions, in the following sense: every hereditary class with quasibounded-size decompositions with dependent (resp.\ stable) base classes is itself dependent (resp.\ stable). This result is obtained in a more general study of ``decomposition horizons'', which are class properties compatible with quasibounded-size decompositions. We deduce that hereditary classes with quasibounded-size decompositions with bounded shrubdepth base classes are stable. In the second part of the paper, we prove the converse. Thus, we characterize stable hereditary classes of graphs as exactly those hereditary classes that admit quasibounded-size decompositions with bounded shrubdepth base classes. This result is obtained by proving that every stable hereditary class of graphs admits almost nowhere dense quasi-bush representations, thus answering positively a conjecture of Dreier et al. These results have several consequences. For example, we show that every graph $G$ in a stable, hereditary class of graphs $\mathscr C$ has a clique or a stable set of size $\Omega_{\mathscr C,\epsilon}(|G|^{1/2-\epsilon})$, for every $\epsilon>0$; this is tight in the sense that the bound cannot be improved to $\Omega_{\mathscr C}(|G|^{1/2})$.
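For reference, the central characterization announced above can be written compactly as follows; this is a paraphrase of the abstract's statement, not the paper's exact wording.

```latex
% Paraphrase of the main characterization from the abstract.
\begin{theorem}
  Let $\mathscr C$ be a hereditary class of graphs. Then $\mathscr C$ is
  stable if and only if $\mathscr C$ admits quasibounded-size
  decompositions whose base classes have bounded shrubdepth.
\end{theorem}
```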
The interpretability of models has become a crucial issue in Machine Learning because of the growing impact of algorithmic decisions on real-world applications. Tree ensemble methods, such as Random Forests or XGBoost, are powerful learning tools for classification tasks. However, while combining multiple trees may provide higher prediction quality than a single one, it sacrifices interpretability, resulting in "black-box" models. In light of this, we aim to develop an interpretable representation of a tree-ensemble model that can provide valuable insights into its behavior. First, given a target tree-ensemble model, we develop a hierarchical visualization tool based on a heatmap representation of the forest's feature use, taking the frequency of a feature and the level at which it is selected as indicators of importance. Next, we propose a mixed-integer linear programming (MILP) formulation for constructing a single optimal multivariate tree that accurately mimics the target model's predictions. The goal is to provide an interpretable surrogate model based on oblique hyperplane splits, which uses only the most relevant features according to the defined forest importance indicators. The MILP model includes a penalty on feature selection based on feature frequency in the forest, to further induce sparsity of the splits. The natural formulation has been strengthened to improve the computational performance of mixed-integer software. Computational experiments are carried out on benchmark datasets from the UCI repository using a state-of-the-art off-the-shelf solver. Results show that the proposed model is effective in yielding a shallow interpretable tree approximating the tree-ensemble decision function.
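A minimal sketch of how the feature-use statistics behind such a heatmap can be collected from a scikit-learn Random Forest, counting how often each feature is split on at each depth; the helper function is illustrative, and the paper's exact importance indicators may differ.

```python
import numpy as np
from sklearn.datasets import load_iris
from sklearn.ensemble import RandomForestClassifier

def feature_use_by_depth(forest, n_features, max_depth):
    """counts[f, d] = number of internal nodes across the forest that
    split on feature f at depth d (depths beyond max_depth are clamped)."""
    counts = np.zeros((n_features, max_depth + 1), dtype=int)
    for est in forest.estimators_:
        tree = est.tree_
        stack = [(0, 0)]                          # (node_id, depth)
        while stack:
            node, depth = stack.pop()
            if tree.children_left[node] != -1:    # internal node
                counts[tree.feature[node], min(depth, max_depth)] += 1
                stack.append((tree.children_left[node], depth + 1))
                stack.append((tree.children_right[node], depth + 1))
    return counts

X, y = load_iris(return_X_y=True)
rf = RandomForestClassifier(n_estimators=50, random_state=0).fit(X, y)
print(feature_use_by_depth(rf, X.shape[1], max_depth=5))
```

The resulting matrix can be fed directly to any heatmap plotting routine; frequent, shallow splits mark the features a surrogate tree should favor.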
Neural network wavefunctions optimized using the variational Monte Carlo method have been shown to produce highly accurate results for the electronic structure of atoms and small molecules, but the high cost of optimizing such wavefunctions prevents their application to larger systems. We propose the Subsampled Projected-Increment Natural Gradient Descent (SPRING) optimizer to reduce this bottleneck. SPRING combines ideas from the recently introduced minimum-step stochastic reconfiguration optimizer (MinSR) and the classical randomized Kaczmarz method for solving linear least-squares problems. We demonstrate that SPRING outperforms both MinSR and the popular Kronecker-Factored Approximate Curvature method (KFAC) on a number of small atoms and molecules, provided that the learning rates of all methods are optimally tuned. For example, on the oxygen atom, SPRING attains chemical accuracy after forty thousand training iterations, whereas both MinSR and KFAC fail to do so even after one hundred thousand iterations.
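SPRING's second ingredient, the classical randomized Kaczmarz method, can be sketched in a few lines for a least-squares system $Ax \approx b$; this illustrates that classical ingredient only, not the SPRING update itself.

```python
import numpy as np

def randomized_kaczmarz(A, b, iters=5000, seed=0):
    """Solve A x ~= b by projecting onto one randomly chosen row's
    hyperplane per step, sampling rows proportionally to ||a_i||^2."""
    rng = np.random.default_rng(seed)
    m, n = A.shape
    row_norms = np.sum(A**2, axis=1)
    probs = row_norms / row_norms.sum()
    x = np.zeros(n)
    for _ in range(iters):
        i = rng.choice(m, p=probs)
        x += (b[i] - A[i] @ x) / row_norms[i] * A[i]   # project onto row i
    return x

rng = np.random.default_rng(1)
A = rng.normal(size=(200, 20))
x_true = rng.normal(size=20)
b = A @ x_true                      # consistent system
print(np.linalg.norm(randomized_kaczmarz(A, b) - x_true))  # small
```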
The most popular method for computing the matrix logarithm combines the inverse scaling and squaring method with a Pad\'e approximation, sometimes accompanied by a Schur decomposition. The main computational effort lies in matrix-matrix multiplications and left matrix division. In this work we show that the number of such operations can be substantially reduced by using a graph-based representation of an efficient polynomial evaluation scheme. A technique to analyze the rounding error is proposed, and backward error analysis is adapted. We provide substantial simulations illustrating competitiveness both in terms of computation time and rounding error.
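For orientation, the classical inverse scaling and squaring idea that the paper starts from can be sketched as follows, with a crude truncated-Taylor stand-in for the Pad\'e approximant and a fixed number of square roots; the paper's contribution is to evaluate the approximant far more cheaply via a graph-based scheme, which this sketch does not attempt.

```python
import numpy as np
import scipy.linalg

def logm_iss(A, num_sqrts=6, degree=8):
    """Inverse scaling and squaring: take repeated square roots until
    A^(1/2^k) is near the identity, approximate the log there, then
    scale back by 2^k. Truncated Taylor stands in for Pade for brevity."""
    B = A.copy()
    for _ in range(num_sqrts):
        B = scipy.linalg.sqrtm(B)          # B = A^(1/2^k), close to I
    X = B - np.eye(B.shape[0])
    # log(I + X) ~= X - X^2/2 + X^3/3 - ...   (||X|| is small here)
    L = np.zeros_like(X)
    P = np.eye(X.shape[0])
    for j in range(1, degree + 1):
        P = P @ X
        L += ((-1) ** (j + 1) / j) * P
    return (2.0 ** num_sqrts) * L          # undo the square roots

A = np.array([[4.0, 1.0], [0.0, 3.0]])
print(np.linalg.norm(logm_iss(A) - scipy.linalg.logm(A)))  # small
```

The dominant cost is visible directly: `degree` matrix products plus the square roots, which is exactly the operation count the graph-based evaluation scheme targets.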
We present a general framework to generate trees in which every vertex has a non-negative weight and a color. The colors are used to impose certain restrictions on the weights and colors of other vertices. We first extend the enumeration algorithms of unweighted trees given in [19, 20] to generate weighted trees that allow zero weight. We avoid isomorphisms by generalizing the concept of centroids to weighted trees and then using the so-called centroid-rooted canonical weighted trees. We provide a time complexity analysis of the unranking algorithms and also show that the output delay complexity of enumeration is linear. The framework can be used to generate graph classes by taking advantage of their tree-based decompositions/representations. We demonstrate our framework by generating weighted block trees, which are in one-to-one correspondence with connected block graphs. All connected block graphs up to 19 vertices are publicly available at [1].
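A minimal sketch of the centroid generalization mentioned above, under the natural assumption that a centroid of a weighted tree is a vertex minimizing the maximum total weight over the components obtained by deleting it; the data layout and tie handling are illustrative (with zero weights allowed, more than two vertices can tie).

```python
def weighted_centroids(adj, weight):
    """Return the vertices v minimizing the maximum total weight among
    the connected components of T - v."""
    best, centroids = float("inf"), []
    for v in adj:
        # Weight of each component of T - v, found by DFS from neighbors.
        worst, seen = 0, {v}
        for u in adj[v]:
            if u in seen:
                continue
            comp, stack = 0, [u]
            seen.add(u)
            while stack:
                w = stack.pop()
                comp += weight[w]
                for x in adj[w]:
                    if x not in seen:
                        seen.add(x)
                        stack.append(x)
            worst = max(worst, comp)
        if worst < best:
            best, centroids = worst, [v]
        elif worst == best:
            centroids.append(v)
    return centroids

# Path 0-1-2-3 with a heavy leaf: the centroid shifts toward the weight.
adj = {0: [1], 1: [0, 2], 2: [1, 3], 3: [2]}
print(weighted_centroids(adj, {0: 5, 1: 1, 2: 1, 3: 1}))   # [0]
```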
We develop a numerical method for computing with orthogonal polynomials that are orthogonal on multiple, disjoint intervals for which analytical formulae are currently unknown. Our approach exploits the Fokas--Its--Kitaev Riemann--Hilbert representation of the orthogonal polynomials to produce an $\mathrm{O}(N)$ method to compute the first $N$ recurrence coefficients. The method can also be used for pointwise evaluation of the polynomials and their Cauchy transforms throughout the complex plane. The method encodes the singularity behavior of weight functions using weighted Cauchy integrals of Chebyshev polynomials. This greatly improves the efficiency of the method, outperforming other available techniques. We demonstrate the fast convergence of our method and present applications to integrable systems and approximation theory.
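Once the recurrence coefficients $(a_n, b_n)$ are in hand, pointwise evaluation is standard via the orthonormal three-term recurrence $b_{n+1}p_{n+1}(x) = (x-a_n)p_n(x) - b_n p_{n-1}(x)$; a minimal sketch follows. The coefficient computation itself is the paper's contribution and is not reproduced here; the test coefficients below are the known orthonormal Chebyshev values, used only to exercise the routine.

```python
import numpy as np

def eval_orthonormal(a, b, x, N):
    """Evaluate p_0(x), ..., p_N(x) from recurrence coefficients via
    b[n+1] p_{n+1} = (x - a[n]) p_n - b[n] p_{n-1}, with p_{-1} = 0.

    a[n], b[n] are the Jacobi-matrix entries; b[0] normalizes p_0.
    """
    x = np.asarray(x, dtype=float)
    p = np.zeros((N + 1,) + x.shape)
    p[0] = 1.0 / b[0]
    if N >= 1:
        p[1] = (x - a[0]) * p[0] / b[1]
    for n in range(1, N):
        p[n + 1] = ((x - a[n]) * p[n] - b[n] * p[n - 1]) / b[n + 1]
    return p

# Orthonormal Chebyshev test: a_n = 0, b_0 = sqrt(pi), b_1 = 1/sqrt(2),
# b_n = 1/2 (n >= 2), for the weight (1 - x^2)^(-1/2) on [-1, 1].
N = 5
a = np.zeros(N + 1)
b = np.concatenate(([np.sqrt(np.pi)], [1 / np.sqrt(2)], 0.5 * np.ones(N)))
print(eval_orthonormal(a, b, np.linspace(-1, 1, 5), N))
```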
We hypothesize that, due to the greedy nature of learning in multi-modal deep neural networks, these models tend to rely on just one modality while under-fitting the others. Such behavior is counter-intuitive and hurts the models' generalization, as we observe empirically. To estimate a model's dependence on each modality, we compute the gain in accuracy when the model has access to that modality in addition to another one. We refer to this gain as the conditional utilization rate. In our experiments, we consistently observe an imbalance in conditional utilization rates between modalities, across multiple tasks and architectures. Since the conditional utilization rate cannot be computed efficiently during training, we introduce a proxy based on the pace at which the model learns from each modality, which we refer to as the conditional learning speed. We propose an algorithm that balances the conditional learning speeds between modalities during training and demonstrate that it indeed addresses the issue of greedy learning. The proposed algorithm improves the model's generalization on three datasets: Colored MNIST, Princeton ModelNet40, and NVIDIA Dynamic Hand Gesture.
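The conditional utilization rate described above reduces to a difference of validation accuracies. A minimal sketch, assuming an `accuracy` table keyed by the set of available modalities (how a modality is "removed" at evaluation time, e.g. zeroed out or resampled, is an assumption; the numbers are hypothetical):

```python
def conditional_utilization_rate(accuracy, m1, m2):
    """u(m1 | m2): accuracy gain from adding modality m1 on top of m2.

    `accuracy` maps a frozenset of modality names to validation accuracy.
    """
    return accuracy[frozenset({m1, m2})] - accuracy[frozenset({m2})]

# Hypothetical validation accuracies for an RGB + depth model:
acc = {
    frozenset({"rgb"}): 0.81,
    frozenset({"depth"}): 0.64,
    frozenset({"rgb", "depth"}): 0.83,
}
# Greedy learning shows up as a large imbalance between the two rates:
print(conditional_utilization_rate(acc, "depth", "rgb"))  # ~0.02
print(conditional_utilization_rate(acc, "rgb", "depth"))  # ~0.19
```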
Graph representation learning for hypergraphs can be used to extract patterns among the higher-order interactions that are critically important in many real-world problems. Current approaches designed for hypergraphs, however, are unable to handle different types of hypergraphs and are typically not generic across learning tasks. Indeed, models that can predict variable-sized heterogeneous hyperedges have not been available. Here we develop a new self-attention based graph neural network, called Hyper-SAGNN, applicable to homogeneous and heterogeneous hypergraphs with variable hyperedge sizes. We perform extensive evaluations on multiple datasets, including four benchmark network datasets and two single-cell Hi-C datasets in genomics. We demonstrate that Hyper-SAGNN significantly outperforms state-of-the-art methods on traditional tasks while also achieving strong performance on a new task called outsider identification. Hyper-SAGNN will be useful for graph representation learning to uncover complex higher-order interactions in different applications.
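A minimal sketch of the self-attention idea for variable-sized hyperedges in the spirit of Hyper-SAGNN: score a candidate hyperedge by contrasting each node's static embedding with a dynamic, attention-derived one. The dimensions, weight shapes, and scoring head below are assumptions for illustration, not the published architecture.

```python
import numpy as np

rng = np.random.default_rng(0)
d = 16
Wq, Wk, Wv = (rng.normal(size=(d, d)) / np.sqrt(d) for _ in range(3))
Ws = rng.normal(size=(d, d)) / np.sqrt(d)      # static-embedding map
w_out = rng.normal(size=d) / np.sqrt(d)        # scoring head

def hyperedge_score(X):
    """X: (k, d) embeddings of the k >= 2 nodes in a candidate hyperedge.
    Dynamic embeddings come from self-attention within the tuple; the
    score aggregates squared static/dynamic differences per node."""
    Q, K, V = X @ Wq, X @ Wk, X @ Wv
    att = Q @ K.T / np.sqrt(d)
    np.fill_diagonal(att, -np.inf)             # attend to the *other* nodes
    att = np.exp(att - att.max(axis=1, keepdims=True))
    att /= att.sum(axis=1, keepdims=True)      # row-wise softmax
    dynamic = np.tanh(att @ V)
    static = np.tanh(X @ Ws)
    diff = (dynamic - static) ** 2             # (k, d): any k works
    logits = diff @ w_out                      # per-node scores
    return 1.0 / (1.0 + np.exp(-logits.mean()))  # hyperedge probability

print(hyperedge_score(rng.normal(size=(3, d))))  # size-3 candidate
print(hyperedge_score(rng.normal(size=(5, d))))  # size-5 also fine
```

Because every operation acts row-wise or through attention over the tuple, the same weights score hyperedges of any size, which is the property the abstract highlights.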