The paper contains approximation guarantees for neural networks that are trained with gradient flow, with error measured in the continuous $L_2(\mathbb{S}^{d-1})$-norm on the $d$-dimensional unit sphere and targets that are Sobolev smooth. The networks are fully connected of constant depth and increasing width. Although all layers are trained, the gradient flow convergence is based on a neural tangent kernel (NTK) argument for the non-convex second but last layer. Unlike standard NTK analysis, the continuous error norm implies an under-parametrized regime, possible by the natural smoothness assumption required for approximation. The typical over-parametrization re-enters the results in form of a loss in approximation rate relative to established approximation methods for Sobolev smooth functions.
Robots performing human-scale manipulation tasks require an extensive amount of knowledge about their surroundings in order to perform their actions competently and human-like. In this work, we investigate the use of virtual reality technology as an implementation for robot environment modeling, and present a technique for translating scene graphs into knowledge bases. To this end, we take advantage of the Universal Scene Description (USD) format which is an emerging standard for the authoring, visualization and simulation of complex environments. We investigate the conversion of USD-based environment models into Knowledge Graph (KG) representations that facilitate semantic querying and integration with additional knowledge sources.
Path reasoning methods over knowledge graphs have gained popularity for their potential to improve transparency in recommender systems. However, the resulting models still rely on pre-trained knowledge graph embeddings, fail to fully exploit the interdependence between entities and relations in the KG for recommendation, and may generate inaccurate explanations. In this paper, we introduce PEARLM, a novel approach that efficiently captures user behaviour and product-side knowledge through language modelling. With our approach, knowledge graph embeddings are directly learned from paths over the KG by the language model, which also unifies entities and relations in the same optimisation space. Constraints on the sequence decoding additionally guarantee path faithfulness with respect to the KG. Experiments on two datasets show the effectiveness of our approach compared to state-of-the-art baselines. Source code and datasets: AVAILABLE AFTER GETTING ACCEPTED.
This paper proposes a new method for differentiating through optimal trajectories arising from non-convex, constrained discrete-time optimal control (COC) problems using the implicit function theorem (IFT). Previous works solve a differential Karush-Kuhn-Tucker (KKT) system for the trajectory derivative, and achieve this efficiently by solving an auxiliary Linear Quadratic Regulator (LQR) problem. In contrast, we directly evaluate the matrix equations which arise from applying variable elimination on the Lagrange multiplier terms in the (differential) KKT system. By appropriately accounting for the structure of the terms within the resulting equations, we show that the trajectory derivatives scale linearly with the number of timesteps. Furthermore, our approach allows for easy parallelization, significantly improved scalability with model size, direct computation of vector-Jacobian products and improved numerical stability compared to prior works. As an additional contribution, we unify prior works, addressing claims that computing trajectory derivatives using IFT scales quadratically with the number of timesteps. We evaluate our method on a both synthetic benchmark and four challenging, learning from demonstration benchmarks including a 6-DoF maneuvering quadrotor and 6-DoF rocket powered landing.
Backpropagation within neural networks leverages a fundamental element of automatic differentiation, which is referred to as the reverse mode differentiation, or vector Jacobian Product (VJP) or, in the context of differential geometry, known as the pull-back process. The computation of gradient is important as update of neural network parameters is performed using gradient descent method. In this study, we present a genric randomized method, which updates the parameters of neural networks by using directional derivatives of loss functions computed efficiently by using forward mode AD or Jacobian vector Product (JVP). These JVP are computed along the random directions sampled from different probability distributions e.g., Bernoulli, Normal, Wigner, Laplace and Uniform distributions. The computation of gradient is performed during the forward pass of the neural network. We also present a rigorous analysis of the presented methods providing the rate of convergence along with the computational experiments deployed in scientific Machine learning in particular physics-informed neural networks and Deep Operator Networks.
Deep sparse networks are widely investigated as a neural network architecture for prediction tasks with high-dimensional sparse features, with which feature interaction selection is a critical component. While previous methods primarily focus on how to search feature interaction in a coarse-grained space, less attention has been given to a finer granularity. In this work, we introduce a hybrid-grained feature interaction selection approach that targets both feature field and feature value for deep sparse networks. To explore such expansive space, we propose a decomposed space which is calculated on the fly. We then develop a selection algorithm called OptFeature, which efficiently selects the feature interaction from both the feature field and the feature value simultaneously. Results from experiments on three large real-world benchmark datasets demonstrate that OptFeature performs well in terms of accuracy and efficiency. Additional studies support the feasibility of our method.
This paper examines the problem of information routing in a large-scale communication network, which can be formulated as a constrained statistical learning problem having access to only local information. We delineate a novel State Augmentation (SA) strategy to maximize the aggregate information at source nodes using graph neural network (GNN) architectures, by deploying graph convolutions over the topological links of the communication network. The proposed technique leverages only the local information available at each node and efficiently routes desired information to the destination nodes. We leverage an unsupervised learning procedure to convert the output of the GNN architecture to optimal information routing strategies. In the experiments, we perform the evaluation on real-time network topologies to validate our algorithms. Numerical simulations depict the improved performance of the proposed method in training a GNN parameterization as compared to baseline algorithms.
In this paper, we study two well known methods of Ising structure learning, namely the pseudolikelihood approach and the interaction screening approach, in the context of tensor recovery in $k$-spin Ising models. We show that both these approaches, with proper regularization, retrieve the underlying hypernetwork structure using a sample size logarithmic in the number of network nodes, and exponential in the maximum interaction strength and maximum node-degree. We also track down the exact dependence of the rate of tensor recovery on the interaction order $k$, that is allowed to grow with the number of samples and nodes, for both the approaches. Finally, we provide a comparative discussion of the performance of the two approaches based on simulation studies, which also demonstrate the exponential dependence of the tensor recovery rate on the maximum coupling strength.
Recently, graph neural networks (GNNs) have been widely used for document classification. However, most existing methods are based on static word co-occurrence graphs without sentence-level information, which poses three challenges:(1) word ambiguity, (2) word synonymity, and (3) dynamic contextual dependency. To address these challenges, we propose a novel GNN-based sparse structure learning model for inductive document classification. Specifically, a document-level graph is initially generated by a disjoint union of sentence-level word co-occurrence graphs. Our model collects a set of trainable edges connecting disjoint words between sentences and employs structure learning to sparsely select edges with dynamic contextual dependencies. Graphs with sparse structures can jointly exploit local and global contextual information in documents through GNNs. For inductive learning, the refined document graph is further fed into a general readout function for graph-level classification and optimization in an end-to-end manner. Extensive experiments on several real-world datasets demonstrate that the proposed model outperforms most state-of-the-art results, and reveal the necessity to learn sparse structures for each document.
Ensembles over neural network weights trained from different random initialization, known as deep ensembles, achieve state-of-the-art accuracy and calibration. The recently introduced batch ensembles provide a drop-in replacement that is more parameter efficient. In this paper, we design ensembles not only over weights, but over hyperparameters to improve the state of the art in both settings. For best performance independent of budget, we propose hyper-deep ensembles, a simple procedure that involves a random search over different hyperparameters, themselves stratified across multiple random initializations. Its strong performance highlights the benefit of combining models with both weight and hyperparameter diversity. We further propose a parameter efficient version, hyper-batch ensembles, which builds on the layer structure of batch ensembles and self-tuning networks. The computational and memory costs of our method are notably lower than typical ensembles. On image classification tasks, with MLP, LeNet, and Wide ResNet 28-10 architectures, our methodology improves upon both deep and batch ensembles.
Incompleteness is a common problem for existing knowledge graphs (KGs), and the completion of KG which aims to predict links between entities is challenging. Most existing KG completion methods only consider the direct relation between nodes and ignore the relation paths which contain useful information for link prediction. Recently, a few methods take relation paths into consideration but pay less attention to the order of relations in paths which is important for reasoning. In addition, these path-based models always ignore nonlinear contributions of path features for link prediction. To solve these problems, we propose a novel KG completion method named OPTransE. Instead of embedding both entities of a relation into the same latent space as in previous methods, we project the head entity and the tail entity of each relation into different spaces to guarantee the order of relations in the path. Meanwhile, we adopt a pooling strategy to extract nonlinear and complex features of different paths to further improve the performance of link prediction. Experimental results on two benchmark datasets show that the proposed model OPTransE performs better than state-of-the-art methods.