A* is a classic and popular method for graphs search and path finding. It assumes the existence of a heuristic function $h(u,t)$ that estimates the shortest distance from any input node $u$ to the destination $t$. Traditionally, heuristics have been handcrafted by domain experts. However, over the last few years, there has been a growing interest in learning heuristic functions. Such learned heuristics estimate the distance between given nodes based on "features" of those nodes. In this paper we formalize and initiate the study of such feature-based heuristics. In particular, we consider heuristics induced by norm embeddings and distance labeling schemes, and provide lower bounds for the tradeoffs between the number of dimensions or bits used to represent each graph node, and the running time of the A* algorithm. We also show that, under natural assumptions, our lower bounds are almost optimal.
We demonstrate the effectiveness of an adaptive explicit Euler method for the approximate solution of the Cox-Ingersoll-Ross model. This relies on a class of path-bounded timestepping strategies which work by reducing the stepsize as solutions approach a neighbourhood of zero. The method is hybrid in the sense that a convergent backstop method is invoked if the timestep becomes too small, or to prevent solutions from overshooting zero and becoming negative. Under parameter constraints that imply Feller's condition, we prove that such a scheme is strongly convergent, of order at least 1/2. Control of the strong error is important for multi-level Monte Carlo techniques. Under Feller's condition we also prove that the probability of ever needing the backstop method to prevent a negative value can be made arbitrarily small. Numerically, we compare this adaptive method to fixed step implicit and explicit schemes, and a novel semi-implicit adaptive variant. We observe that the adaptive approach leads to methods that are competitive in a domain that extends beyond Feller's condition, indicating suitability for the modelling of stochastic volatility in Heston-type asset models.
In decentralized learning, a network of nodes cooperate to minimize an overall objective function that is usually the finite-sum of their local objectives, and incorporates a non-smooth regularization term for the better generalization ability. Decentralized stochastic proximal gradient (DSPG) method is commonly used to train this type of learning models, while the convergence rate is retarded by the variance of stochastic gradients. In this paper, we propose a novel algorithm, namely DPSVRG, to accelerate the decentralized training by leveraging the variance reduction technique. The basic idea is to introduce an estimator in each node, which tracks the local full gradient periodically, to correct the stochastic gradient at each iteration. By transforming our decentralized algorithm into a centralized inexact proximal gradient algorithm with variance reduction, and controlling the bounds of error sequences, we prove that DPSVRG converges at the rate of $O(1/T)$ for general convex objectives plus a non-smooth term with $T$ as the number of iterations, while DSPG converges at the rate $O(\frac{1}{\sqrt{T}})$. Our experiments on different applications, network topologies and learning models demonstrate that DPSVRG converges much faster than DSPG, and the loss function of DPSVRG decreases smoothly along with the training epochs.
In this paper, a numerical scheme for a nonlinear McKendrick-von Foerster equation with diffusion in age (MV-D) with the Dirichlet boundary condition is proposed. The main idea to derive the scheme is to use the discretization based on the method of characteristics to the convection part, and the finite difference method to the rest of the terms. The nonlocal terms are dealt with the quadrature methods. As a result, an implicit scheme is obtained for the boundary value problem under consideration. The consistency and the convergence of the proposed numerical scheme is established. Moreover, numerical simulations are presented to validate the theoretical results.
Graph representation learning (also called graph embeddings) is a popular technique for incorporating network structure into machine learning models. Unsupervised graph embedding methods aim to capture graph structure by learning a low-dimensional vector representation (the embedding) for each node. Despite the widespread use of these embeddings for a variety of downstream transductive machine learning tasks, there is little principled analysis of the effectiveness of this approach for common tasks. In this work, we provide an empirical and theoretical analysis for the performance of a class of embeddings on the common task of pairwise community labeling. This is a binary variant of the classic community detection problem, which seeks to build a classifier to determine whether a pair of vertices participate in a community. In line with our goal of foundational understanding, we focus on a popular class of unsupervised embedding techniques that learn low rank factorizations of a vertex proximity matrix (this class includes methods like GraRep, DeepWalk, node2vec, NetMF). We perform detailed empirical analysis for community labeling over a variety of real and synthetic graphs with ground truth. In all cases we studied, the models trained from embedding features perform poorly on community labeling. In constrast, a simple logistic model with classic graph structural features handily outperforms the embedding models. For a more principled understanding, we provide a theoretical analysis for the (in)effectiveness of these embeddings in capturing the community structure. We formally prove that popular low-dimensional factorization methods either cannot produce community structure, or can only produce ``unstable" communities. These communities are inherently unstable under small perturbations.
Unpaired image-to-image translation has been applied successfully to natural images but has received very little attention for manifold-valued data such as in diffusion tensor imaging (DTI). The non-Euclidean nature of DTI prevents current generative adversarial networks (GANs) from generating plausible images and has mainly limited their application to diffusion MRI scalar maps, such as fractional anisotropy (FA) or mean diffusivity (MD). Even if these scalar maps are clinically useful, they mostly ignore fiber orientations and therefore have limited applications for analyzing brain fibers. Here, we propose a manifold-aware CycleGAN that learns the generation of high-resolution DTI from unpaired T1w images. We formulate the objective as a Wasserstein distance minimization problem of data distributions on a Riemannian manifold of symmetric positive definite 3x3 matrices SPD(3), using adversarial and cycle-consistency losses. To ensure that the generated diffusion tensors lie on the SPD(3) manifold, we exploit the theoretical properties of the exponential and logarithm maps of the Log-Euclidean metric. We demonstrate that, unlike standard GANs, our method is able to generate realistic high-resolution DTI that can be used to compute diffusion-based metrics and potentially run fiber tractography algorithms. To evaluate our model's performance, we compute the cosine similarity between the generated tensors principal orientation and their ground-truth orientation, the mean squared error (MSE) of their derived FA values and the Log-Euclidean distance between the tensors. We demonstrate that our method produces 2.5 times better FA MSE than a standard CycleGAN and up to 30% better cosine similarity than a manifold-aware Wasserstein GAN while synthesizing sharp high-resolution DTI.
The graph convolution network (GCN) is a widely-used facility to realize graph-based semi-supervised learning, which usually integrates node features and graph topologic information to build learning models. However, as for multi-label learning tasks, the supervision part of GCN simply minimizes the cross-entropy loss between the last layer outputs and the ground-truth label distribution, which tends to lose some useful information such as label correlations, so that prevents from obtaining high performance. In this paper, we pro-pose a novel GCN-based semi-supervised learning approach for multi-label classification, namely ML-GCN. ML-GCN first uses a GCN to embed the node features and graph topologic information. Then, it randomly generates a label matrix, where each row (i.e., label vector) represents a kind of labels. The dimension of the label vector is the same as that of the node vector before the last convolution operation of GCN. That is, all labels and nodes are embedded in a uniform vector space. Finally, during the ML-GCN model training, label vectors and node vectors are concatenated to serve as the inputs of the relaxed skip-gram model to detect the node-label correlation as well as the label-label correlation. Experimental results on several graph classification datasets show that the proposed ML-GCN outperforms four state-of-the-art methods.
In information retrieval (IR) and related tasks, term weighting approaches typically consider the frequency of the term in the document and in the collection in order to compute a score reflecting the importance of the term for the document. In tasks characterized by the presence of training data (such as text classification) it seems logical that the term weighting function should take into account the distribution (as estimated from training data) of the term across the classes of interest. Although `supervised term weighting' approaches that use this intuition have been described before, they have failed to show consistent improvements. In this article we analyse the possible reasons for this failure, and call consolidated assumptions into question. Following this criticism we propose a novel supervised term weighting approach that, instead of relying on any predefined formula, learns a term weighting function optimised on the training set of interest; we dub this approach \emph{Learning to Weight} (LTW). The experiments that we run on several well-known benchmarks, and using different learning methods, show that our method outperforms previous term weighting approaches in text classification.
Complex networks are used as an abstraction for systems modeling in physics, biology, sociology, and other areas. We propose an algorithm based on a fast personalized node ranking and recent advancements in deep learning for learning supervised network embeddings as well as to classify network nodes directly. Learning from homogeneous, as well as heterogeneous networks, our algorithm outperforms strong baselines on nine node-classification benchmarks from the domains of molecular biology, finance, social media and language processing---one of the largest node classification collections to date. The results are comparable or better than current state-of-the-art in terms of speed as well as predictive accuracy. Embeddings, obtained by the proposed algorithm, are also a viable option for network visualization.
Recent successes in word embedding and document embedding have motivated researchers to explore similar representations for networks and to use such representations for tasks such as edge prediction, node label prediction, and community detection. Such network embedding methods are largely focused on finding distributed representations for unsigned networks and are unable to discover embeddings that respect polarities inherent in edges. We propose SIGNet, a fast scalable embedding method suitable for signed networks. Our proposed objective function aims to carefully model the social structure implicit in signed networks by reinforcing the principles of social balance theory. Our method builds upon the traditional word2vec family of embedding approaches and adds a new targeted node sampling strategy to maintain structural balance in higher-order neighborhoods. We demonstrate the superiority of SIGNet over state-of-the-art methods proposed for both signed and unsigned networks on several real world datasets from different domains. In particular, SIGNet offers an approach to generate a richer vocabulary of features of signed networks to support representation and reasoning.
In this paper, we study the optimal convergence rate for distributed convex optimization problems in networks. We model the communication restrictions imposed by the network as a set of affine constraints and provide optimal complexity bounds for four different setups, namely: the function $F(\xb) \triangleq \sum_{i=1}^{m}f_i(\xb)$ is strongly convex and smooth, either strongly convex or smooth or just convex. Our results show that Nesterov's accelerated gradient descent on the dual problem can be executed in a distributed manner and obtains the same optimal rates as in the centralized version of the problem (up to constant or logarithmic factors) with an additional cost related to the spectral gap of the interaction matrix. Finally, we discuss some extensions to the proposed setup such as proximal friendly functions, time-varying graphs, improvement of the condition numbers.