Finite-state transducers (FSTs) are frequently used in speech recognition. Transducer composition is an essential operation for combining different sources of information at different granularities. However, composition is also one of the more computationally expensive operations. Due to the heterogeneous structure of FSTs, parallel algorithms for composition are suboptimal in efficiency, generality, or both. We propose an algorithm for parallel composition and implement it on graphics processing units. We benchmark our parallel algorithm on the composition of random graphs and of graphs commonly used in speech recognition. The parallel composition scales better with the size of the input graphs and, for large graphs, can be 10 to 30 times faster than a sequential CPU algorithm.
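For intuition, composition is a product construction over state pairs: a transition survives when the first transducer's output symbol matches the second's input symbol. Below is a minimal sequential sketch in Python; the tuple-based encoding is our illustration and epsilon transitions are deliberately omitted, so this is not the paper's GPU implementation.

    from collections import deque

    def compose(fst1, fst2):
        """Naive FST composition. Each FST is (start, finals, transitions),
        with transitions as a list of (src, in_sym, out_sym, dst) tuples.
        Pairs of states are explored breadth-first; a transition is kept
        when fst1's output symbol equals fst2's input symbol."""
        start = (fst1[0], fst2[0])
        finals, trans = set(), []
        seen, queue = {start}, deque([start])
        while queue:
            q1, q2 = queue.popleft()
            if q1 in fst1[1] and q2 in fst2[1]:
                finals.add((q1, q2))
            for (s1, i1, o1, d1) in fst1[2]:
                if s1 != q1:
                    continue
                for (s2, i2, o2, d2) in fst2[2]:
                    if s2 == q2 and o1 == i2:
                        trans.append(((q1, q2), i1, o2, (d1, d2)))
                        if (d1, d2) not in seen:
                            seen.add((d1, d2))
                            queue.append((d1, d2))
        return start, finals, trans

    # "a"->"b" composed with "b"->"c" yields a single "a"->"c" transition
    f = (0, {1}, [(0, "a", "b", 1)])
    g = (0, {1}, [(0, "b", "c", 1)])
    print(compose(f, g))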
While operating communication networks adaptively may improve utilization and performance, frequent adjustments also introduce an algorithmic challenge: the re-optimization of traffic engineering solutions is time-consuming and may limit the granularity at which a network can be adjusted. This paper is motivated by the question of whether the reactivity of a network can be improved by re-optimizing solutions dynamically rather than from scratch, especially if inputs such as link weights do not change significantly. We explore to what extent dynamic algorithms can be used to speed up fundamental tasks in network operations. We specifically investigate optimizations related to traffic engineering (namely shortest paths and maximum flow computations), but also consider spanning tree and matching applications. While prior work on dynamic graph algorithms focuses on link insertions and deletions, we are interested in the practical problem of link weight changes. We revisit existing upper bounds in the weight-dynamic model and present several novel lower bounds on the amortized runtime for recomputing solutions. In general, we find that the potential performance gains depend on the application, and there are also strict limitations on what can be achieved, even if link weights change only slightly.
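As a concrete instance of re-optimizing rather than recomputing, consider a single link weight decrease under a fixed source: only vertices whose shortest-path distance actually improves need to be re-relaxed. The sketch below (plain Python with an illustrative adjacency-list encoding; not code from the paper) repairs the distance array locally instead of rerunning Dijkstra from scratch.

    import heapq

    def decrease_edge(graph, dist, u, v, new_w):
        """graph: dict node -> list of (neighbor, weight).
        dist: current shortest-path distances from a fixed source.
        Lower edge (u, v) to new_w and repair dist by relaxing only
        the affected region instead of recomputing globally."""
        graph[u] = [(x, new_w if x == v else w) for (x, w) in graph[u]]
        if dist[u] + new_w >= dist[v]:
            return dist  # nothing improved; done in O(1)
        dist[v] = dist[u] + new_w
        heap = [(dist[v], v)]
        while heap:
            d, x = heapq.heappop(heap)
            if d > dist[x]:
                continue  # stale heap entry
            for (y, w) in graph[x]:
                if d + w < dist[y]:
                    dist[y] = d + w
                    heapq.heappush(heap, (dist[y], y))
        return dist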
Gaussian mixture models are a powerful tool, used chiefly for clustering but, with proper preparation, also for feature extraction, pattern recognition, image segmentation, and machine learning in general. When faced with the problem of schema matching, different mixture models computed on different pieces of data can retain crucial information about the structure of the dataset. To measure or compare results from mixture models, the Wasserstein distance can be very useful; however, it is not easy to compute for mixture distributions. In this paper we derive one possible approximation of the Wasserstein distance between Gaussian mixture models and reduce it to a linear problem. Furthermore, we present application examples concerning real-world data.
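A commonly used approximation of this kind restricts transport plans to the mixture components: pairwise 2-Wasserstein distances between Gaussians serve as the ground cost, and the coupling of mixture weights is found by a linear program. The NumPy/SciPy sketch below illustrates that reduction under our own naming and encoding; it may differ from the paper's exact derivation.

    import numpy as np
    from scipy.linalg import sqrtm
    from scipy.optimize import linprog

    def w2_gaussian(m1, S1, m2, S2):
        """Squared 2-Wasserstein distance between two Gaussians."""
        root = sqrtm(sqrtm(S1) @ S2 @ sqrtm(S1)).real
        return float(np.sum((m1 - m2) ** 2) + np.trace(S1 + S2 - 2 * root))

    def w2_gmm(weights1, means1, covs1, weights2, means2, covs2):
        """Mixture-level approximation: optimal transport between the two
        component lists, with pairwise Gaussian W2 as ground cost. The
        transport plan is found by a linear program over n*m variables."""
        n, m = len(weights1), len(weights2)
        C = np.array([[w2_gaussian(means1[i], covs1[i], means2[j], covs2[j])
                       for j in range(m)] for i in range(n)])
        A_eq = []
        for i in range(n):   # rows of the plan sum to weights1
            row = np.zeros(n * m); row[i * m:(i + 1) * m] = 1; A_eq.append(row)
        for j in range(m):   # columns of the plan sum to weights2
            col = np.zeros(n * m); col[j::m] = 1; A_eq.append(col)
        b_eq = np.concatenate([weights1, weights2])
        res = linprog(C.ravel(), A_eq=np.array(A_eq), b_eq=b_eq,
                      bounds=(0, None), method="highs")
        return res.fun  # squared mixture-level Wasserstein distance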
This paper deals with the $\lambda$-labeling and $L(2,1)$-coloring of simple graphs. A $\lambda$-labeling of a graph $G$ is any labeling of the vertices of $G$ with distinct labels such that any two adjacent vertices receive labels that differ by at least two. An $L(2,1)$-coloring of $G$ is any labeling of the vertices of $G$ such that any two adjacent vertices receive labels that differ by at least two and any two vertices at distance two receive distinct labels. Assume that a partial $\lambda$-labeling $f$ is given for a graph $G$. A general question is whether $f$ can be extended to a $\lambda$-labeling of $G$. We show that the extension is feasible if and only if a Hamiltonian path consistent with some distance constraints exists in the complement of $G$. We then consider line graphs of bipartite multigraphs and determine the minimum number of labels needed for an $L(2,1)$-coloring and a $\lambda$-labeling of these graphs. In fact, we obtain easily computable formulas for the path covering number and the maximum path of the complements of these graphs. We also obtain a polynomial-time algorithm that generates all Hamiltonian paths in the related graphs. A special case is the Cartesian product graph $K_n\Box K_n$ and the generation of $\lambda$-squares.
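The two defining constraints of an $L(2,1)$-coloring are easy to verify directly. A minimal checker, with an adjacency-set encoding of our choosing:

    from itertools import combinations

    def is_L21_coloring(adj, label):
        """adj: dict node -> set of neighbors; label: dict node -> int.
        Checks the two L(2,1) conditions: adjacent vertices get labels
        differing by at least 2; vertices at distance 2 get distinct labels."""
        for u, v in combinations(adj, 2):
            if v in adj[u]:                 # distance 1
                if abs(label[u] - label[v]) < 2:
                    return False
            elif adj[u] & adj[v]:           # distance exactly 2
                if label[u] == label[v]:
                    return False
        return True

    # path a-b-c: labels (0, 2, 4) satisfy both conditions, (0, 2, 0) does not
    adj = {"a": {"b"}, "b": {"a", "c"}, "c": {"b"}}
    print(is_L21_coloring(adj, {"a": 0, "b": 2, "c": 4}))  # True
    print(is_L21_coloring(adj, {"a": 0, "b": 2, "c": 0}))  # False (distance 2)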
The optimal design of federated learning (FL) algorithms for solving general machine learning (ML) problems in practical edge computing systems with quantized message passing remains an open problem. This paper considers an edge computing system where the server and workers have possibly different computing and communication capabilities and employ quantization before transmitting messages. To explore the full potential of FL in such an edge computing system, we first present a general FL algorithm, namely GenQSGD, parameterized by the numbers of global and local iterations, mini-batch size, and step size sequence. Then, we analyze its convergence for an arbitrary step size sequence and specify the convergence results under three commonly adopted step size rules, namely the constant, exponential, and diminishing step size rules. Next, we optimize the algorithm parameters to minimize the energy cost under the time constraint and convergence error constraint, focusing on the overall implementation process of FL. Specifically, for any given step size sequence under each considered step size rule, we optimize the numbers of global and local iterations and mini-batch size to optimally implement FL for applications with preset step size sequences. We also optimize the step size sequence along with these algorithm parameters to explore the full potential of FL. The resulting optimization problems are challenging non-convex problems with non-differentiable constraint functions. We propose iterative algorithms to obtain KKT points using general inner approximation (GIA) and techniques for solving complementary geometric programming (CGP). Finally, we numerically demonstrate the remarkable gains of GenQSGD with optimized algorithm parameters over existing FL algorithms and reveal the significance of optimally designing general FL algorithms.
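The three step size rules can be stated concretely; the exact parameterizations below are assumptions for illustration, not necessarily those of the paper.

    def constant_rule(gamma):
        """Constant rule: gamma_k = gamma for all k."""
        return lambda k: gamma

    def exponential_rule(gamma, rho):
        """Exponential rule: gamma_k = gamma * rho**k with 0 < rho < 1."""
        return lambda k: gamma * rho ** k

    def diminishing_rule(gamma):
        """Diminishing rule: gamma_k = gamma / (k + 1), a sequence that is
        non-summable but square-summable, as convergence analyses require."""
        return lambda k: gamma / (k + 1)

    steps = [diminishing_rule(0.1)(k) for k in range(5)]
    # [0.1, 0.05, 0.0333..., 0.025, 0.02]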
Let $G$ be a strongly connected directed graph and $u,v,w\in V(G)$ be three vertices. Then $w$ strongly resolves $u$ to $v$ if there is a shortest $u$-$w$-path containing $v$ or a shortest $w$-$v$-path containing $u$. A set $R\subseteq V(G)$ of vertices is a strong resolving set for a directed graph $G$ if for every pair of vertices $u,v\in V(G)$ there is at least one vertex in $R$ that strongly resolves $u$ to $v$ and at least one vertex in $R$ that strongly resolves $v$ to $u$. The distances of the vertices of $G$ to and from the vertices of a strong resolving set $R$ uniquely define the connectivity structure of the graph. The Strong Metric Dimension of a directed graph $G$ is the size of a smallest strong resolving set for $G$. The decision problem Strong Metric Dimension asks, for a given directed graph $G$ and a given number $r$, whether $G$ has a strong resolving set of size at most $r$. In this paper we study undirected and directed co-graphs and present linear-time algorithms for Strong Metric Dimension. These algorithms can also compute strong resolving sets for co-graphs in linear time.
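The definition can be checked with plain shortest-path distances: a vertex $x$ lies on some shortest $s$-$t$-path exactly when $d(s,x)+d(x,t)=d(s,t)$. A brute-force sketch for unweighted, strongly connected digraphs, with an adjacency encoding of our choosing:

    from collections import deque

    def bfs_dist(adj, s):
        """Single-source distances in a directed, unweighted graph."""
        dist, queue = {s: 0}, deque([s])
        while queue:
            x = queue.popleft()
            for y in adj[x]:
                if y not in dist:
                    dist[y] = dist[x] + 1
                    queue.append(y)
        return dist

    def strongly_resolves(adj, w, u, v):
        """w strongly resolves u to v iff v lies on some shortest u->w path
        or u lies on some shortest w->v path. Assumes strong connectivity,
        so all needed distances are finite."""
        du, dv, dw = bfs_dist(adj, u), bfs_dist(adj, v), bfs_dist(adj, w)
        on_uw = du[v] + dv[w] == du[w]   # v on a shortest u->w path
        on_wv = dw[u] + du[v] == dw[v]   # u on a shortest w->v path
        return on_uw or on_wv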
The inductive biases of graph representation learning algorithms are often encoded in the background geometry of their embedding space. In this paper, we show that general directed graphs can be effectively represented by an embedding model that combines three components: a pseudo-Riemannian metric structure, a non-trivial global topology, and a unique likelihood function that explicitly incorporates a preferred direction in embedding space. We demonstrate the representational capabilities of this method by applying it to the task of link prediction on a series of synthetic and real directed graphs from natural language applications and biology. In particular, we show that low-dimensional cylindrical Minkowski and anti-de Sitter spacetimes can produce equal or better graph representations than curved Riemannian manifolds of higher dimensions.
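For orientation, the squared interval between two points in flat Minkowski space with time coordinate $x_0$ is $s^2(x,y) = -(y_0-x_0)^2 + \lVert y_{1:}-x_{1:}\rVert^2$; timelike separation ($s^2<0$) combined with an ordering of the time coordinates gives a natural notion of direction. The scoring function below is purely an assumed illustration of how such a preferred direction can enter an edge probability; it is not the paper's likelihood.

    import numpy as np

    def minkowski_interval(x, y):
        """Squared interval between points x and y in flat Minkowski space,
        with coordinate 0 as time: negative for timelike separation."""
        d = y - x
        return -d[0] ** 2 + np.sum(d[1:] ** 2)

    def edge_prob(x, y, tau=1.0):
        """Illustrative directed edge probability (an assumption, not the
        paper's model): high when x and y are timelike separated and y lies
        in x's future, i.e. y's time coordinate is larger."""
        future = y[0] > x[0]
        return float(future) / (1.0 + np.exp(minkowski_interval(x, y) / tau))

    x = np.array([0.0, 0.0, 0.0])
    y = np.array([2.0, 0.5, 0.5])   # mostly "later" than x: high score
    print(edge_prob(x, y), edge_prob(y, x))  # second direction scores 0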
Knowledge graphs (KGs) are of great importance to many real-world applications, but they generally suffer from incomplete information in the form of missing relations between entities. Knowledge graph completion (also known as relation prediction) is the task of inferring missing facts given existing ones. Most existing methods work by maximizing the likelihood of observed instance-level triples. Not much attention, however, is paid to ontological information, such as the type information of entities and relations. In this work, we propose a type-augmented relation prediction (TaRP) method, where we apply both the type information and instance-level information for relation prediction. In particular, type information and instance-level information are encoded as prior probabilities and likelihoods of relations respectively, and are combined following Bayes' rule. Our proposed TaRP method achieves significantly better performance than state-of-the-art methods on three benchmark datasets: FB15K, YAGO26K-906, and DB111K-174. In addition, we show that TaRP achieves significantly improved data efficiency. More importantly, the type information extracted from a specific dataset can generalize well to other datasets through the proposed TaRP model.
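The Bayes' rule combination itself is simple: per-relation log priors from type information add to per-relation log likelihoods from the instance-level model, and normalization yields a posterior over candidate relations. A minimal sketch; the encoders producing the two score vectors are outside its scope.

    import numpy as np

    def combine_relation_scores(log_prior, log_likelihood):
        """Posterior over relations via Bayes' rule:
        p(r | h, t) proportional to prior(r | types) * likelihood(r | h, t).
        Both inputs are arrays of per-relation log scores."""
        log_post = log_prior + log_likelihood
        log_post -= np.max(log_post)      # numerical stability
        post = np.exp(log_post)
        return post / post.sum()

    # toy example with three candidate relations
    prior = np.log([0.7, 0.2, 0.1])       # from type information
    lik = np.log([0.1, 0.6, 0.3])         # from the instance-level model
    print(combine_relation_scores(prior, lik).argmax())  # relation 1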
Attributed graph clustering is challenging as it requires joint modelling of graph structures and node attributes. Recent progress on graph convolutional networks has proved that graph convolution is effective in combining structural and content information, and several recent methods based on it have achieved promising clustering performance on some real attributed networks. However, there is limited understanding of how graph convolution affects clustering performance and how to properly use it to optimize performance for different graphs. Existing methods essentially use graph convolution of a fixed and low order that only takes into account neighbours within a few hops of each node, which underutilizes node relations and ignores the diversity of graphs. In this paper, we propose an adaptive graph convolution method for attributed graph clustering that exploits high-order graph convolution to capture global cluster structure and adaptively selects the appropriate order for different graphs. We establish the validity of our method by theoretical analysis and extensive experiments on benchmark datasets. Empirical results show that our method compares favourably with state-of-the-art methods.
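A $k$-order low-pass graph filter of the form $(I - L_s/2)^k$, with $L_s$ the symmetrically normalized Laplacian, captures the core idea: each extra power mixes feature information from one more hop. The NumPy sketch below shows only this filtering step; the adaptive selection of $k$ is the method's contribution and is not reproduced here.

    import numpy as np

    def k_order_convolution(A, X, k):
        """Apply the k-order filter (I - L/2)^k to node features X, where
        L is the symmetrically normalized Laplacian of adjacency A. Larger
        k smooths features over wider neighbourhoods."""
        n = len(A)
        deg = A.sum(axis=1)
        D_inv_sqrt = np.diag(1.0 / np.sqrt(np.maximum(deg, 1e-12)))
        L = np.eye(n) - D_inv_sqrt @ A @ D_inv_sqrt
        G = np.eye(n) - 0.5 * L
        for _ in range(k):
            X = G @ X
        return X  # smoothed features, ready for e.g. spectral clustering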
An attributed network enriches a pure network by encoding widely accessible node auxiliary information as node attributes. Learning a vector representation of each node, known as network embedding (NE), for such an attributed network by considering both structure and attribute information has recently attracted considerable attention, since each node embedding is a unified low-dimensional vector representation that makes downstream tasks such as link prediction more efficient and easier to realize. Most previous works have not considered the significant case of a network with incomplete structure information, which, however, often appears in real-world scenarios, e.g., abnormal users in a social network who intentionally hide their friendships. Moreover, different networks have different levels of incomplete structure information, which makes it more challenging to balance the two sources of information. To tackle this, we propose a robust NE method called Attributed Biased Random Walks (ABRW), which employs attribute information to compensate for incomplete structure information by using transition matrices. Experiments on link prediction and node classification tasks on real-world datasets confirm the robustness and effectiveness of our method under different levels of incomplete structure information.
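One natural way to build such a transition matrix is to blend a row-normalized adjacency matrix with a row-normalized attribute-similarity matrix, so that nodes with missing structure fall back on attributes. The sketch below is our illustration of that idea, with an assumed balancing parameter alpha, not the paper's exact construction.

    import numpy as np

    def biased_transition_matrix(A, X, alpha=0.8):
        """T = alpha * P_struct + (1 - alpha) * P_attr, where P_struct is
        row-normalized adjacency and P_attr is row-normalized attribute
        similarity (rows of X assumed L2-normalized, so X @ X.T is cosine).
        alpha is an assumed balancing parameter."""
        def row_norm(M):
            s = M.sum(axis=1, keepdims=True)
            return np.divide(M, s, out=np.zeros_like(M), where=s > 0)
        P_struct = row_norm(A.astype(float))
        sim = X @ X.T
        np.fill_diagonal(sim, 0.0)
        P_attr = row_norm(np.maximum(sim, 0.0))
        T = alpha * P_struct + (1 - alpha) * P_attr
        # nodes with no observed edges rely on attributes alone
        isolated = A.sum(axis=1) == 0
        T[isolated] = P_attr[isolated]
        return T  # drives the biased random walks fed to the embedding step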
Dynamic programming (DP) solves a variety of structured combinatorial problems by iteratively breaking them down into smaller subproblems. In spite of their versatility, DP algorithms are usually non-differentiable, which hampers their use as a layer in neural networks trained by backpropagation. To address this issue, we propose to smooth the max operator in the dynamic programming recursion, using a strongly convex regularizer. This allows us to relax both the optimal value and the solution of the original combinatorial problem, and turns a broad class of DP algorithms into differentiable operators. Theoretically, we provide a new probabilistic perspective on backpropagating through these DP operators, and relate them to inference in graphical models. We derive two particular instantiations of our framework, a smoothed Viterbi algorithm for sequence prediction and a smoothed DTW algorithm for time-series alignment. We showcase these instantiations on two structured prediction tasks and on structured and sparse attention for neural machine translation.
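With the negative-entropy regularizer (one common instantiation of such a strongly convex regularizer), the smoothed max is the log-sum-exp and its gradient is the softmax, which is exactly what makes the DP recursion differentiable. A minimal sketch:

    import numpy as np

    def smoothed_max(x, gamma=1.0):
        """Entropy-regularized max: gamma * logsumexp(x / gamma).
        As gamma -> 0 this recovers the hard max; unlike the hard max,
        it is differentiable everywhere."""
        m = np.max(x)
        return m + gamma * np.log(np.sum(np.exp((x - m) / gamma)))

    def smoothed_argmax(x, gamma=1.0):
        """Gradient of smoothed_max: a soft, differentiable argmax."""
        e = np.exp((x - np.max(x)) / gamma)
        return e / e.sum()

    x = np.array([1.0, 2.0, 3.0])
    print(smoothed_max(x, gamma=0.1))     # ~3.0, close to the hard max
    print(smoothed_argmax(x, gamma=0.1))  # nearly all mass on index 2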