Graph partitioning is a common approach to scaling up graph algorithms, including shortest-path (SP) computation. However, existing solutions typically couple a fixed partition method with a fixed path index and a fixed partition structure, so it is unclear how the partition method and path index influence pathfinding performance. Moreover, few studies have explored index maintenance for partitioned SP (PSP) indexes on dynamic graphs. To provide deeper insight into dynamic PSP indexes, we systematically review existing work and propose a universal scheme for analyzing this problem theoretically. Specifically, we first propose two novel partitioned index strategies and one optimization to improve the index construction, query answering, and index maintenance of PSP indexes. Then we propose a path-oriented classification criterion for graph partitioning methods to ease partition method selection. After that, we re-couple the dimensions of our scheme (partitioned index strategy, path index, and partition structure) to propose five new PSP indexes that are more efficient in either querying or updating on different networks. Finally, we demonstrate the effectiveness of our new indexes by comparing them with state-of-the-art PSP indexes through comprehensive evaluations.
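A minimal sketch of one classic PSP strategy (boundary-vertex overlays), not the new indexes proposed in this paper: precompute shortest distances from each partition's boundary vertices inside that partition, so that a cross-partition query can compose these precomputed pieces over the much smaller overlay graph of boundary vertices. Function and variable names here are illustrative.

import networkx as nx

def build_boundary_index(G, part):
    """G: weighted nx.Graph; part: dict mapping vertex -> partition id."""
    boundary = {}
    for u, v in G.edges():
        if part[u] != part[v]:  # edge crossing a partition border
            boundary.setdefault(part[u], set()).add(u)
            boundary.setdefault(part[v], set()).add(v)
    index = {}
    for p, bnd in boundary.items():
        sub = G.subgraph([v for v in G if part[v] == p])
        # intra-partition distance table from each boundary vertex
        index[p] = {b: nx.single_source_dijkstra_path_length(
                        sub, b, weight="weight") for b in bnd}
    return boundary, index

A query between s and t in different partitions then runs Dijkstra on the overlay graph of boundary vertices (augmented with s and t), using the precomputed tables as edge weights; maintaining these tables under edge-weight updates is exactly the dynamic problem the abstract refers to.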
Despite the success of Siamese encoder models such as sentence transformers (STs), little is known about which aspects of their inputs they pay attention to. A barrier is that their predictions cannot be attributed to individual features, as they compare two inputs rather than processing a single one. This paper derives a local attribution method for Siamese encoders by generalizing the principle of integrated gradients to models with multiple inputs. The solution takes the form of feature-pair attributions and can be reduced to a token-token matrix for STs. Our method involves the introduction of integrated Jacobians and inherits the advantageous formal properties of integrated gradients: it accounts for the model's full computation graph and is guaranteed to converge to the actual prediction. A pilot study shows that in an ST a few token pairs can often explain large fractions of predictions, and that the model focuses on nouns and verbs. For accurate predictions, however, it needs to attend to the majority of tokens and parts of speech.
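A rough, simplified sketch of how such feature-pair attributions can be assembled for a dot-product similarity s = f(x)^T f(y): integrated Jacobians are approximated by averaging Jacobians along a straight path from a baseline, and the quadratic form is distributed over token pairs. This assumes (for brevity) that the baseline maps to a near-zero embedding so residual terms vanish; `encoder` and all shapes are hypothetical.

import torch

def integrated_jacobian(encoder, x, x0, steps=50):
    # J_bar = (1/m) * sum_k J(x0 + k/m * (x - x0))
    acc = None
    for k in range(1, steps + 1):
        xt = x0 + (k / steps) * (x - x0)
        J = torch.autograd.functional.jacobian(encoder, xt)  # (d, n, e)
        acc = J if acc is None else acc + J
    return acc / steps

def pair_attributions(encoder, x, y, x0, y0):
    # x: (n, e) token embeddings, y: (m, e); x0, y0: baselines
    Jx = integrated_jacobian(encoder, x, x0).flatten(1)   # (d, n*e)
    Jy = integrated_jacobian(encoder, y, y0).flatten(1)   # (d, m*e)
    dx, dy = (x - x0).flatten(), (y - y0).flatten()
    # s ~ (Jx dx)^T (Jy dy); distribute the sum over feature pairs
    A = (Jx * dx).T @ (Jy * dy)                           # (n*e, m*e)
    n, e = x.shape
    m = y.shape[0]
    # collapse embedding dimensions into a token-token matrix
    return A.reshape(n, e, m, e).sum(dim=(1, 3))          # (n, m)

By construction, the entries of the returned matrix sum to the (approximate) similarity score, mirroring the convergence guarantee inherited from integrated gradients.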
Constraint programming (CP) is a powerful tool for modeling mathematical concepts and objects and for finding both solutions and counterexamples. One of the major strengths of CP is that problems can easily be combined or expanded. In this paper, we illustrate that this versatility makes CP an ideal tool for exploring problems in permutation patterns. We declaratively define permutation properties and permutation pattern avoidance and containment constraints using CP, and show how this allows us to solve a wide range of problems. We show how this approach enables the arbitrary composition of these conditions and also allows the easy addition of extra conditions. We demonstrate the effectiveness of our techniques by modelling the containment and avoidance of six permutation patterns and eight permutation properties, and by measuring five statistics on the resulting permutations. In addition to calculating properties and statistics for the generated permutations, we show that arbitrary additional constraints can also be easily and efficiently added. This approach enables mathematicians to investigate permutation pattern problems in a quick and efficient manner. We demonstrate the utility of constraint programming for permutation patterns by showing how we can easily and efficiently extend the known permutation counts for a conjecture involving the class of 1324-avoiding permutations. For this problem, we expand the enumeration of 1324-avoiding permutations with a fixed number of inversions to permutations of length 16, and show for the first time that the enumeration contains a pattern that follows a unique sequence in the Online Encyclopedia of Integer Sequences.
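A hedged sketch (not this paper's exact models) of how pattern avoidance can be stated declaratively in a CP solver, here Google OR-Tools CP-SAT: count the permutations of length n that avoid 1324 by forbidding every quadruple of positions that would realize the pattern.

from itertools import combinations
from ortools.sat.python import cp_model

def count_1324_avoiding(n):
    model = cp_model.CpModel()
    p = [model.NewIntVar(1, n, f"p{i}") for i in range(n)]
    model.AddAllDifferent(p)
    # An occurrence of 1324 is i<j<k<l with p[i] < p[k] < p[j] < p[l];
    # forbid all of them by requiring at least one comparison to fail.
    for i, j, k, l in combinations(range(n), 4):
        lits = []
        for lo, hi in ((i, k), (k, j), (j, l)):
            b = model.NewBoolVar("")
            model.Add(p[lo] < p[hi]).OnlyEnforceIf(b)
            model.Add(p[lo] >= p[hi]).OnlyEnforceIf(b.Not())
            lits.append(b.Not())
        model.AddBoolOr(lits)  # not all three comparisons may hold

    class Counter(cp_model.CpSolverSolutionCallback):
        def __init__(self):
            super().__init__()
            self.count = 0
        def on_solution_callback(self):
            self.count += 1

    solver = cp_model.CpSolver()
    solver.parameters.enumerate_all_solutions = True
    counter = Counter()
    solver.Solve(model, counter)
    return counter.count  # e.g. n=5 yields 103

Extra conditions (a fixed number of inversions, other properties or statistics) compose by simply adding further constraints to the same model, which is the versatility the abstract exploits.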
Unobserved confounding is common in many applications, making causal inference from observational data challenging. As a remedy, causal sensitivity analysis is an important tool to draw causal conclusions under unobserved confounding with mathematical guarantees. In this paper, we propose NeuralCSA, a neural framework for generalized causal sensitivity analysis. Unlike previous work, our framework is compatible with (i) a large class of sensitivity models, including the marginal sensitivity model, f-sensitivity models, and Rosenbaum's sensitivity model; (ii) different treatment types (i.e., binary and continuous); and (iii) different causal queries, including (conditional) average treatment effects and simultaneous effects on multiple outcomes. The generality of NeuralCSA is achieved by learning a latent distribution shift that corresponds to a treatment intervention using two conditional normalizing flows. We provide theoretical guarantees that NeuralCSA is able to infer valid bounds on the causal query of interest and also demonstrate this empirically using both simulated and real-world data.
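A minimal sketch of the basic building block, a conditional normalizing flow, here reduced to a single conditional affine transform trained by maximum likelihood. NeuralCSA's actual architecture and sensitivity-constrained objective are more involved; all names and dimensions below are illustrative.

import torch
import torch.nn as nn

class CondAffineFlow(nn.Module):
    """x = mu(c) + exp(log_sigma(c)) * z, with z ~ N(0, 1)."""
    def __init__(self, cond_dim, hidden=64):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(cond_dim, hidden), nn.ReLU(), nn.Linear(hidden, 2))

    def log_prob(self, x, c):
        mu, log_sigma = self.net(c).chunk(2, dim=-1)
        z = (x - mu) * torch.exp(-log_sigma)
        base = torch.distributions.Normal(0.0, 1.0)
        # change of variables: log p(x|c) = log p(z) - log|dx/dz|
        return (base.log_prob(z) - log_sigma).sum(-1)

# toy training loop: maximize the conditional likelihood of outcomes y
# given covariates/treatment c
flow = CondAffineFlow(cond_dim=3)
opt = torch.optim.Adam(flow.parameters(), lr=1e-3)
c, y = torch.randn(256, 3), torch.randn(256, 1)  # synthetic data
for _ in range(100):
    loss = -flow.log_prob(y, c).mean()
    opt.zero_grad(); loss.backward(); opt.step()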
Optimal transportation is a fundamental topic that has attracted a great deal of attention from the machine learning community in the past decades. In this paper, we consider an interesting discrete dynamic optimal transport problem: can we efficiently update the optimal transport plan when the weights or the locations of the data points change? This problem is naturally motivated by several applications in machine learning. For example, we often need to compute the optimal transportation cost between two different data sets; if a few data points change, should we re-compute the high-complexity cost function from scratch, or update the cost with some efficient dynamic data structure? Several dynamic maximum flow algorithms have been proposed before; however, to the best of our knowledge, research on the dynamic minimum cost flow problem is still quite limited. We propose a novel 2D Skip Orthogonal List together with dynamic tree techniques. Although our algorithm is based on the conventional simplex method, it can efficiently complete each pivoting operation within $O(|V|)$ time with high probability, where $V$ is the set of all supply and demand nodes. Since dynamic modifications typically do not introduce significant changes, our algorithm requires only a few simplex iterations in practice. Our algorithm is thus more efficient than re-computing the optimal transportation cost, which in general requires at least one traversal over all $O(|E|) = O(|V|^2)$ variables. Our experiments demonstrate that our algorithm significantly outperforms existing algorithms in dynamic scenarios.
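A small illustration of the problem setting, using the POT library to recompute from scratch; the paper's contribution is precisely a data structure that avoids this recomputation after small changes.

import numpy as np
import ot  # POT: Python Optimal Transport

rng = np.random.default_rng(0)
X, Y = rng.normal(size=(500, 2)), rng.normal(size=(500, 2))
a = np.full(500, 1 / 500); b = np.full(500, 1 / 500)
M = ot.dist(X, Y)            # squared Euclidean cost matrix
plan = ot.emd(a, b, M)       # optimal transport plan (network simplex)

# a few points move: the static approach re-solves the whole problem
X[:5] += 0.1
M = ot.dist(X, Y)
plan_new = ot.emd(a, b, M)   # full recomputation over O(|V|^2) variables

# a dynamic algorithm would instead warm-start from `plan` and perform
# only the few simplex pivots needed to restore optimality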
This work addresses the development of a physics-informed neural network (PINN) with a loss term derived from a discretized, time-dependent reduced-order system. First, the governing equations are discretized using a finite difference scheme (though any other discretization technique can be adopted) and projected onto a reduced (latent) space using the Proper Orthogonal Decomposition (POD)-Galerkin approach. The residual of the discretized reduced-order equation is then used as an additional loss penalty alongside the data-driven loss term, with different deep learning variants such as artificial neural networks (ANNs) and long short-term memory (LSTM) networks. LSTM networks have proven very effective for time-dependent problems in purely data-driven settings; the current work demonstrates the LSTM network's potential over ANNs in physics-informed neural networks as well. The advantage of using discretized governing equations instead of their continuous form lies in the flexibility of the input to the PINN. Datasets of different sizes, ranging from small to large, are used to assess the potential of discretized-physics-informed neural networks when data are very sparse or unavailable. The proposed methods are applied to a pitch-plunge airfoil motion governed by rigid-body dynamics and to the one-dimensional viscous Burgers' equation. The current work also demonstrates the prediction capability of various discretized-physics-informed neural networks outside the domain where data are available or governing-equation-based residuals are minimized.
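A hedged sketch of the core loss construction: a data term plus a penalty on the finite-difference residual of 1D viscous Burgers, u_t + u u_x = nu u_xx, evaluated on the network's predictions at grid points. The POD-Galerkin projection onto a reduced space is omitted here for brevity; `model` is assumed to map a space-time grid to u values.

import torch

def burgers_fd_residual(u, dt, dx, nu):
    # u: (nt, nx) network predictions on a space-time grid
    u_t = (u[1:, 1:-1] - u[:-1, 1:-1]) / dt            # forward in time
    u_x = (u[:-1, 2:] - u[:-1, :-2]) / (2 * dx)        # central in space
    u_xx = (u[:-1, 2:] - 2 * u[:-1, 1:-1] + u[:-1, :-2]) / dx**2
    return u_t + u[:-1, 1:-1] * u_x - nu * u_xx

def pinn_loss(model, grid, u_data, mask, dt, dx, nu, lam=1.0):
    u = model(grid)                                     # (nt, nx)
    data_loss = ((u - u_data)[mask] ** 2).mean()        # sparse data term
    res = burgers_fd_residual(u, dt, dx, nu)
    return data_loss + lam * (res ** 2).mean()          # physics penalty

Because the residual is algebraic (no automatic differentiation through continuous PDE operators is needed), the same penalty applies unchanged whether `model` is an ANN or an LSTM, which is the input flexibility the abstract refers to.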
We introduce a novel dynamic learning-rate scheduling scheme, grounded in theory, with the goal of simplifying the manual and time-consuming tuning of schedules in practice. Our approach is based on estimating the locally optimal step size, which guarantees maximal descent in the direction of the stochastic gradient of the current step. We first establish theoretical convergence bounds for our method in the context of smooth non-convex stochastic optimization, matching state-of-the-art bounds while assuming knowledge of only the smoothness parameter. We then present a practical implementation of our algorithm and conduct systematic experiments across diverse datasets and optimization algorithms, comparing our scheme with existing state-of-the-art learning-rate schedulers. Our findings indicate that our method needs minimal tuning compared to existing approaches, removing the need for auxiliary manual schedules and warm-up phases while achieving comparable performance with drastically reduced parameter tuning.
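One simple way to estimate a locally optimal step size (illustrative only, not this paper's estimator): probe the loss at two trial points along the negative stochastic gradient and take the minimizer of the fitted parabola. `closure()` is assumed to recompute the current mini-batch loss and return it as a Python float.

import torch

@torch.no_grad()
def parabola_stepsize(params, grads, closure, h=1e-2):
    f0 = closure()
    for p, g in zip(params, grads):      # probe at step size h
        p.sub_(h * g)
    f1 = closure()
    for p, g in zip(params, grads):      # probe at step size 2h
        p.sub_(h * g)
    f2 = closure()
    for p, g in zip(params, grads):      # restore the parameters
        p.add_(2 * h * g)
    # fit f(eta) ~ a*eta^2 + b*eta + c through (0,f0), (h,f1), (2h,f2)
    a = (f2 - 2 * f1 + f0) / (2 * h * h)
    b = (4 * f1 - 3 * f0 - f2) / (2 * h)
    if a <= 0:                           # no positive curvature detected
        return 2 * h                     # fall back to the larger probe
    return max(-b / (2 * a), 0.0)        # parabola minimizer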
CP decomposition is a powerful tool for data science, especially gene analysis, deep learning, and quantum computation. However, the application of tensor decomposition is largely hindered by the exponential growth of computational complexity and storage consumption with tensor size. While real-world data are often presented as trillion-scale or even exascale tensors, existing work can only support billion-scale tensors. In this work, we propose Exascale-Tensor to close this significant gap. Specifically, we propose a compression-based tensor decomposition framework, namely Exascale-Tensor, to support exascale tensor decomposition. We then carefully analyze the inherent parallelism and propose a bag of strategies to improve computational efficiency. Finally, we conduct experiments decomposing tensors ranging from million-scale to trillion-scale for evaluation. Compared to the baselines, Exascale-Tensor supports 8,000x larger tensors and achieves a speedup of up to 6.95x. We also apply our method to two real-world applications, gene analysis and tensor-layer neural networks, where the numerical results demonstrate the scalability and effectiveness of our method.
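For reference, a compact NumPy implementation of plain CP-ALS for a 3-way tensor, to show the basic scheme being scaled up; the paper's framework adds compression and parallel strategies on top of this.

import numpy as np

def khatri_rao(A, B):
    # column-wise Kronecker product: (I*J, R) from (I, R) and (J, R)
    return (A[:, None, :] * B[None, :, :]).reshape(-1, A.shape[1])

def cp_als(T, rank, iters=100):
    I, J, K = T.shape
    rng = np.random.default_rng(0)
    A, B, C = (rng.standard_normal((d, rank)) for d in (I, J, K))
    for _ in range(iters):
        # solve a least-squares problem for each factor, others fixed
        A = T.reshape(I, -1) @ np.linalg.pinv(khatri_rao(B, C)).T
        B = np.moveaxis(T, 1, 0).reshape(J, -1) @ np.linalg.pinv(khatri_rao(A, C)).T
        C = np.moveaxis(T, 2, 0).reshape(K, -1) @ np.linalg.pinv(khatri_rao(A, B)).T
    return A, B, C

Each update multiplies a mode-n unfolding of T by the pseudoinverse of a Khatri-Rao product; it is exactly these dense unfoldings whose storage and compute blow up with tensor size, motivating the compression-based framework above.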
How can we estimate the importance of nodes in a knowledge graph (KG)? A KG is a multi-relational graph that has proven valuable for many tasks, including question answering and semantic search. In this paper, we present GENI, a method for estimating node importance in KGs, which enables several downstream applications such as item recommendation and resource allocation. While a number of approaches have been developed to address this problem for general graphs, they do not fully utilize the information available in KGs, or they lack the flexibility needed to model the complex relationship between entities and their importance. To address these limitations, we explore supervised machine learning algorithms. In particular, building upon recent advances in graph neural networks (GNNs), we develop GENI, a GNN-based method designed to deal with the distinctive challenges of predicting node importance in KGs. Instead of aggregating node embeddings, our method aggregates importance scores via a predicate-aware attention mechanism and a flexible centrality adjustment. In our evaluation of GENI and existing methods on predicting node importance in real-world KGs with different characteristics, GENI achieves 5-17% higher NDCG@100 than the state of the art.
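A much-simplified sketch of the central idea: aggregate scalar importance scores (not embeddings) over neighbors, with attention weights that depend on the connecting predicate, then adjust by a centrality prior such as log in-degree. This is a heavily reduced illustration, not GENI's full architecture; dimensions and names are illustrative.

import torch
import torch.nn as nn

class ScoreAggregationLayer(nn.Module):
    def __init__(self, num_predicates, dim=16):
        super().__init__()
        self.pred_emb = nn.Embedding(num_predicates, dim)
        self.att = nn.Linear(dim, 1)

    def forward(self, scores, edges, preds):
        # scores: (N,) node scores; edges: (E, 2) long tensor src->dst;
        # preds: (E,) long tensor of predicate ids
        logits = self.att(self.pred_emb(preds)).squeeze(-1)        # (E,)
        denom = torch.zeros_like(scores).index_add_(
            0, edges[:, 1], logits.exp())                          # per-dst sum
        weight = logits.exp() / denom[edges[:, 1]].clamp_min(1e-9) # softmax per dst
        out = torch.zeros_like(scores)
        out.index_add_(0, edges[:, 1], weight * scores[edges[:, 0]])
        return out

def centrality_adjust(scores, in_degree, gamma=1.0):
    # scale aggregated scores by a degree-based centrality prior
    return scores * (gamma * torch.log1p(in_degree.float()))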
It is important to detect anomalous inputs when deploying machine learning systems. The use of larger and more complex inputs in deep learning magnifies the difficulty of distinguishing between anomalous and in-distribution examples. At the same time, diverse image and text data are available in enormous quantities. We propose leveraging these data to improve deep anomaly detection by training anomaly detectors against an auxiliary dataset of outliers, an approach we call Outlier Exposure (OE). This enables anomaly detectors to generalize and detect unseen anomalies. In extensive experiments on natural language processing and small- and large-scale vision tasks, we find that Outlier Exposure significantly improves detection performance. We also observe that cutting-edge generative models trained on CIFAR-10 may assign higher likelihoods to SVHN images than to CIFAR-10 images; we use OE to mitigate this issue. We also analyze the flexibility and robustness of Outlier Exposure, and identify characteristics of the auxiliary dataset that improve performance.
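The essence of Outlier Exposure for a classifier, as a sketch of the commonly used objective: the usual cross-entropy on in-distribution data plus a term pushing predictions on auxiliary outliers toward the uniform distribution (lambda is a tunable weight).

import torch
import torch.nn.functional as F

def oe_loss(logits_in, targets, logits_out, lam=0.5):
    ce = F.cross_entropy(logits_in, targets)
    # cross-entropy to the uniform distribution over the k classes:
    # mean over classes of -log softmax, averaged over the batch
    uniform_ce = -F.log_softmax(logits_out, dim=1).mean(dim=1).mean()
    return ce + lam * uniform_ce

# at test time, a score such as the maximum softmax probability flags
# inputs the exposed model now treats as anomalous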
We advocate the use of implicit fields for learning generative models of shapes and introduce an implicit field decoder for shape generation, aimed at improving the visual quality of the generated shapes. An implicit field assigns a value to each point in 3D space, so that a shape can be extracted as an iso-surface. Our implicit field decoder is trained to perform this assignment by means of a binary classifier. Specifically, it takes a point coordinate, along with a feature vector encoding a shape, and outputs a value indicating whether the point lies inside or outside the shape. By replacing conventional decoders with our decoder for representation learning and generative modeling of shapes, we demonstrate superior results for tasks such as shape autoencoding, generation, interpolation, and single-view 3D reconstruction, particularly in terms of visual quality.
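A minimal sketch of an implicit-field decoder in the spirit described above: an MLP takes a 3D point plus a shape feature vector and predicts inside/outside, trained as a binary classifier. Layer sizes and names are illustrative.

import torch
import torch.nn as nn

class ImplicitDecoder(nn.Module):
    def __init__(self, feat_dim=128, hidden=256):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(3 + feat_dim, hidden), nn.ReLU(),
            nn.Linear(hidden, hidden), nn.ReLU(),
            nn.Linear(hidden, 1))  # logit: inside (1) vs outside (0)

    def forward(self, points, feat):
        # points: (B, N, 3); feat: (B, feat_dim) broadcast to each point
        f = feat[:, None, :].expand(-1, points.size(1), -1)
        return self.net(torch.cat([points, f], dim=-1)).squeeze(-1)

# training uses binary cross-entropy against sampled occupancy labels;
# at test time, the shape is extracted as an iso-surface of the field
# (e.g., with marching cubes)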