The Weighted Path Order (WPO) of Yamada is a powerful technique for proving termination. It is also supported by CeTA, a certifier for checking untrusted termination proofs. More precisely, CeTA contains a verified function that decides, for two terms, whether one of them is larger than the other with respect to a given WPO, i.e., a WPO where all parameters have been fixed. The problem with this verified function is its exponential worst-case runtime. Therefore, in this work we develop a polynomial-time implementation of WPO based on memoization. It also improves upon an earlier verified implementation of the Recursive Path Order (RPO): the RPO implementation uses full terms as keys for the memory, a design that simplified the soundness proofs but incurs some runtime overhead. In this work, keys are just numbers, so that lookups in the memory are faster. Although trivial on paper, this change introduces some challenges for the verification task.
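As a rough illustration of the memoization idea (not CeTA's verified code; all names below are ours), subterms can be hash-consed so that each structurally distinct term receives an integer id, and the memo table is then keyed on pairs of ids rather than on pairs of full terms. The comparison itself is a simplified placeholder recursion, not the actual WPO case analysis:

class TermBank:
    """Assigns a unique integer id to each structurally distinct term."""
    def __init__(self):
        self.ids = {}    # (symbol, child ids) -> integer id
        self.terms = []  # integer id -> (symbol, child ids)

    def mk(self, symbol, children=()):
        key = (symbol, tuple(children))
        if key not in self.ids:
            self.ids[key] = len(self.terms)
            self.terms.append(key)
        return self.ids[key]

def greater(bank, memo, s, t):
    # Memo lookup on a pair of integers instead of a pair of full terms.
    if (s, t) in memo:
        return memo[(s, t)]
    sym_s, args_s = bank.terms[s]
    sym_t, args_t = bank.terms[t]
    # Placeholder order, loosely modeled on path orders: s > t if some
    # argument of s equals or dominates t, or if s's root symbol is
    # larger and s dominates every argument of t.
    result = any(si == t or greater(bank, memo, si, t) for si in args_s) \
        or (sym_s > sym_t and all(greater(bank, memo, s, ti) for ti in args_t))
    memo[(s, t)] = result
    return result

Since each pair of subterm ids is computed at most once, the memoized recursion runs in polynomial time, whereas the same order without the memo table can take exponential time on deeply nested terms.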
Nowadays, the versatile capabilities of pre-trained Large Language Models (LLMs) have attracted much attention from industry. However, some vertical domains are more interested in the in-domain capabilities of LLMs. For the networks domain, we present NetEval, an evaluation set for measuring the comprehensive capabilities of LLMs in Network Operations (NetOps). NetEval is designed to evaluate commonsense knowledge and inference ability in NetOps in a multilingual context. NetEval consists of 5,732 questions about NetOps, covering five different sub-domains of NetOps. With NetEval, we systematically evaluate the NetOps capability of 26 publicly available LLMs. The results show that only GPT-4 can achieve performance competitive with humans. However, some open models, such as LLaMA 2, demonstrate significant potential.
We present a simple functional programming language, called Dual PCF, that implements forward-mode automatic differentiation using dual numbers in the framework of exact real number computation. The main new feature of this language is the ability to evaluate correctly, up to the precision specified by the user, in a simple and direct way, the directional derivative of functionals as well as first-order functions. In contrast to other comparable languages, Dual PCF also includes the recursion operator for defining functions and functionals. We provide a wide range of examples of Lipschitz functions and functionals that can be defined in Dual PCF. We use domain theory both to give a denotational semantics to the language and to prove the correctness of the new derivative operator using logical relations. To be able to differentiate functionals, including on function spaces equipped with their compact-open topology that do not admit a norm, we develop a domain-theoretic directional derivative that is Scott continuous and extends Clarke's subgradient of real-valued locally Lipschitz maps on Banach spaces to real-valued continuous maps on Hausdorff topological vector spaces. Finally, we show that we can express arbitrary computable linear functionals in Dual PCF.
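For readers unfamiliar with dual numbers, the following Python sketch shows the core mechanism over ordinary floating point (Dual PCF itself works over exact reals and supports recursion and functionals, none of which is captured here):

class Dual:
    """A dual number: value plus an infinitesimal derivative part."""
    def __init__(self, val, eps=0.0):
        self.val, self.eps = val, eps

    def __add__(self, o):
        o = o if isinstance(o, Dual) else Dual(o)
        return Dual(self.val + o.val, self.eps + o.eps)
    __radd__ = __add__

    def __mul__(self, o):
        o = o if isinstance(o, Dual) else Dual(o)
        # Product rule: (a + a'e)(b + b'e) = ab + (a'b + ab')e
        return Dual(self.val * o.val, self.eps * o.val + self.val * o.eps)
    __rmul__ = __mul__

def deriv(f, x):
    """Derivative of f at x: seed the dual part with direction 1."""
    return f(Dual(x, 1.0)).eps

# d/dx (x*x + 3x) at x = 2 is 2*2 + 3 = 7
print(deriv(lambda x: x * x + 3 * x, 2.0))   # 7.0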
This work addresses block-diagonal semidefinite programming (SDP) relaxations for the clique number of Paley graphs. Computing the size of a largest clique (the clique number) of a graph is a classic NP-hard problem; a Paley graph is a deterministic graph in which two vertices are connected if their difference is a quadratic residue (a square) in a finite field whose number of elements is one of certain primes or prime powers. Improving the upper bound on the Paley graph clique number for prime powers that are non-squares is an open problem in combinatorics. Moreover, since quadratic residues exhibit pseudorandom properties, Paley graphs are related to the construction of deterministic restricted isometries, an open problem in compressed sensing. Recent work provides numerical evidence that the current upper bounds can be improved by sum-of-squares (SOS) relaxations. In particular, the bounds given by SOS relaxations of degree 4 (SOS-4) have been empirically observed to grow at an order smaller than the square root of the prime. However, computing SOS-4 appears to be intractable for large graphs. Gvozdenovic et al. introduced a more computationally efficient block-diagonal hierarchy of SDPs and computed the values of the degree-2 SDPs (L2), which bound the corresponding SOS-4 relaxations from above, for the Paley graph clique numbers associated with primes p up to 809. We compute the values of the L2 relaxations for primes p between 821 and 997. Our results provide some numerical evidence that these relaxations, and therefore also the SOS-4 relaxations, may scale at an order smaller than the square root of p. However, due to the size of the SDPs, we have not been able to compute L2 relaxations for primes greater than 997. Therefore, our scaling estimate is not conclusive and presents an interesting open problem for further study.
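For concreteness, the Paley graph itself is easy to construct; the Python sketch below (our illustration, unrelated to the SDP computations) builds it for a prime p ≡ 1 (mod 4) and greedily grows a clique, which gives only a crude lower bound rather than the upper bounds discussed above:

def paley_graph(p):
    """Adjacency: i ~ j iff (i - j) is a nonzero square mod p."""
    squares = {(x * x) % p for x in range(1, p)}
    return {i: {j for j in range(p) if (i - j) % p in squares}
            for i in range(p)}

def greedy_clique(adj):
    """Grow a clique greedily: keep any vertex adjacent to all kept so far."""
    clique = []
    for v in adj:
        if all(v in adj[u] for u in clique):
            clique.append(v)
    return clique

g = paley_graph(13)   # p = 13 is 1 mod 4, so -1 is a square and g is undirected
print(len(greedy_clique(g)))   # 3, the clique number of the Paley graph on 13 vertices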
The shortest path network interdiction (SPNI) problem poses significant computational challenges due to its NP-hardness. Current solutions, primarily based on integer programming methods, are inefficient for large-scale instances. In this paper, we introduce a novel hybrid algorithm that can utilize Ising Processing Units (IPUs) alongside classical solvers. The approach decomposes the problem into manageable sub-problems, which are then offloaded either to slow but high-quality classical solvers or to the IPU. Results are subsequently recombined to form a global solution. Our method achieves solution quality comparable to existing whole-problem solvers while reducing computational time for large-scale instances. Furthermore, our approach is amenable to parallelization, allowing decomposed sub-problems to be processed simultaneously.
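Schematically, and with all names being our hypothetical placeholders rather than the paper's actual interface, the hybrid decompose-solve-recombine pattern looks as follows:

def solve_hybrid(problem, partition, recombine, ipu_solve, classical_solve):
    subproblems = partition(problem)              # decompose the network
    partials = [
        # Route each sub-problem to whichever backend suits it; the
        # iterations are independent, so they could run in parallel.
        (ipu_solve if sub.fits_on_ipu else classical_solve)(sub)
        for sub in subproblems
    ]
    return recombine(problem, partials)           # merge into a global solution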
The dominant NLP paradigm of training a strong neural predictor to perform one task on a specific dataset has led to state-of-the-art performance in a variety of applications (e.g., sentiment classification, span-prediction-based question answering, or machine translation). However, it builds upon the assumption that the data distribution is stationary, i.e., that the data is sampled from a fixed distribution both at training and test time. This way of training is inconsistent with how we as humans are able to learn from, and operate within, a constantly changing stream of information. Moreover, it is ill-adapted to real-world use cases where the data distribution is expected to shift over the course of a model's lifetime. The first goal of this thesis is to characterize the different forms this shift can take in the context of natural language processing, and to propose benchmarks and evaluation metrics to measure its effect on current deep learning architectures. We then proceed to take steps to mitigate the effect of distributional shift on NLP models. To this end, we develop methods based on parametric reformulations of the distributionally robust optimization framework. Empirically, we show that these approaches yield more robust models, as demonstrated on a selection of realistic problems. In the third and final part of this thesis, we explore ways of efficiently adapting existing models to new domains or tasks. Our contribution to this topic takes inspiration from information geometry to derive a new gradient update rule that alleviates catastrophic forgetting issues during adaptation.
Graph Neural Networks (GNNs) have been studied through the lens of expressive power and generalization. However, their optimization properties are less well understood. We take the first step towards analyzing GNN training by studying the gradient dynamics of GNNs. First, we analyze linearized GNNs and prove that, despite the non-convexity of training, convergence to a global minimum at a linear rate is guaranteed under mild assumptions that we validate on real-world graphs. Second, we study what may affect GNNs' training speed. Our results show that GNN training is implicitly accelerated by skip connections, more depth, and/or a good label distribution. Empirical results confirm that our theoretical findings for linearized GNNs align with the training behavior of nonlinear GNNs. Our results provide the first theoretical support for the success of GNNs with skip connections in terms of optimization, and suggest that deep GNNs with skip connections would be promising in practice.
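To make the architectural setting concrete, the following numpy sketch (our illustration, not the exact model analyzed) contrasts a plain GNN layer with one that adds a skip connection from the input features:

import numpy as np

def gcn_layer(A_hat, H, W):
    """One graph convolution: aggregate neighbor features, then transform."""
    return np.maximum(A_hat @ H @ W, 0.0)        # ReLU nonlinearity

def gcn_layer_skip(A_hat, H, W):
    """Same layer plus a skip connection from the input features."""
    return np.maximum(A_hat @ H @ W, 0.0) + H    # residual term

# Tiny 3-node path graph with self-loops, row-normalized.
A = np.array([[1, 1, 0], [1, 1, 1], [0, 1, 1]], dtype=float)
A_hat = A / A.sum(axis=1, keepdims=True)
H = np.random.randn(3, 4)                        # node features
W = np.random.randn(4, 4)                        # square so shapes match the skip
print(gcn_layer_skip(A_hat, H, W).shape)         # (3, 4)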
Identifying equivalent entities across multi-source Knowledge Graphs (KGs) is the pivotal step in KG integration, also known as entity alignment (EA). However, most existing EA methods are inefficient and scale poorly. A recent summary points out that some of them even require several days to process a dataset containing 200,000 nodes (DWY100K). We believe that over-complex graph encoders and inefficient negative sampling strategies are the two main reasons. In this paper, we propose a novel KG encoder, the Dual Attention Matching Network (Dual-AMN), which not only smartly models both intra-graph and cross-graph information but also greatly reduces computational complexity. Furthermore, we propose the Normalized Hard Sample Mining Loss to smoothly select hard negative samples with reduced loss shift. Experimental results on widely used public datasets indicate that our method achieves both high accuracy and high efficiency. On DWY100K, the whole running process of our method finishes in 1,100 seconds, at least 10x faster than previous work. Our method also outperforms previous works across all datasets, improving Hits@1 and MRR by 6% to 13%.
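The abstract does not spell out the loss, but the general idea of smoothly selecting hard negatives can be illustrated as follows; the formula and names below are our generic sketch, not necessarily the paper's exact Normalized Hard Sample Mining Loss:

import numpy as np

def smooth_hard_negative_loss(pos_score, neg_scores, margin=1.0, tau=10.0):
    """Margin loss where harder negatives (higher score) receive
    exponentially larger weight, instead of a hard top-k cutoff;
    tau controls how sharply mining focuses on the hardest negatives."""
    violations = margin + neg_scores - pos_score   # slack per negative
    # Softplus of a scaled log-sum-exp: the hardest violations dominate
    # the sum, so hard negatives are "selected" smoothly.
    return np.log1p(np.exp(tau * violations).sum()) / tau

print(smooth_hard_negative_loss(0.9, np.array([0.1, 0.2, 0.85])))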
Deep neural networks (DNNs) are successful in many computer vision tasks. However, the most accurate DNNs require millions of parameters and operations, making them energy-, computation-, and memory-intensive. This impedes the deployment of large DNNs in low-power devices with limited compute resources. Recent research improves DNN models by reducing their memory requirements, energy consumption, and number of operations without significantly decreasing accuracy. This paper surveys the progress of low-power deep learning and computer vision, specifically with regard to inference, and discusses methods for compacting and accelerating DNN models. The techniques can be divided into four major categories: (1) parameter quantization and pruning, (2) compressed convolutional filters and matrix factorization, (3) network architecture search, and (4) knowledge distillation. We analyze the accuracy, advantages, disadvantages, and potential solutions to the problems with the techniques in each category. We also discuss new evaluation metrics as a guideline for future research.
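As a minimal example of the first category, the following numpy sketch performs uniform affine quantization of float32 weights to 8-bit integers (our illustration, not any specific surveyed method):

import numpy as np

def quantize_int8(w):
    """Map float weights onto the symmetric int8 range [-127, 127]."""
    scale = np.abs(w).max() / 127.0
    q = np.clip(np.round(w / scale), -127, 127).astype(np.int8)
    return q, scale

def dequantize(q, scale):
    """Recover approximate float weights from int8 codes."""
    return q.astype(np.float32) * scale

w = np.random.randn(256, 256).astype(np.float32)
q, s = quantize_int8(w)                 # 4x smaller storage than float32
err = np.abs(dequantize(q, s) - w).max()
print(q.dtype, f"max reconstruction error: {err:.4f}")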
Deep Convolutional Neural Networks (CNNs) are a special type of neural network that has shown state-of-the-art results on various competitive benchmarks. The powerful learning ability of deep CNNs is largely achieved through the use of multiple non-linear feature extraction stages that can automatically learn hierarchical representations from the data. The availability of large amounts of data and improvements in hardware processing units have accelerated research in CNNs, and recently very interesting deep CNN architectures have been reported. The recent race in deep CNN architectures for achieving high performance on challenging benchmarks has shown that innovative architectural ideas, as well as parameter optimization, can improve CNN performance on various vision-related tasks. In this regard, different ideas in CNN design have been explored, such as the use of different activation and loss functions, parameter optimization, regularization, and the restructuring of processing units. However, the major improvement in representational capacity has been achieved by restructuring the processing units. In particular, the idea of using a block as a structural unit instead of a layer is gaining substantial appreciation. This survey thus focuses on the intrinsic taxonomy present in recently reported CNN architectures and, consequently, classifies the recent innovations in CNN architectures into seven different categories, based on spatial exploitation, depth, multi-path, width, feature-map exploitation, channel boosting, and attention. Additionally, it covers the elementary understanding of CNN components and sheds light on the current challenges and applications of CNNs.
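The block-as-structural-unit idea can be illustrated with a minimal residual block in numpy (a schematic stand-in that uses dense layers in place of convolutions):

import numpy as np

def layer(x, w):
    """Stand-in for a convolutional layer: linear map plus ReLU."""
    return np.maximum(x @ w, 0.0)

def residual_block(x, w1, w2):
    """A block: two transformation stages plus an identity shortcut.
    Architectures are then stacked from such blocks, not single layers."""
    return x + layer(layer(x, w1), w2)

x = np.random.randn(8, 16)              # 8 positions, 16 channels
w1, w2 = np.random.randn(16, 16), np.random.randn(16, 16)
print(residual_block(x, w1, w2).shape)  # (8, 16): the block preserves shape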
Machine learning techniques have become deeply rooted in our everyday lives. However, since pursuing good learning performance is knowledge- and labor-intensive, human experts are heavily involved in every aspect of machine learning. In order to make machine learning techniques easier to apply and to reduce the demand for experienced human experts, automated machine learning (AutoML) has emerged as a hot topic of both industrial and academic interest. In this paper, we provide an up-to-date survey of AutoML. First, we introduce and define the AutoML problem, drawing inspiration from both automation and machine learning. Then, we propose a general AutoML framework that not only covers most existing approaches to date but can also guide the design of new methods. Subsequently, we categorize and review existing works from two aspects, i.e., the problem setup and the employed techniques. Finally, we provide a detailed analysis of AutoML approaches and explain the reasons underlying their successful applications. We hope this survey can serve not only as an insightful guideline for AutoML beginners but also as an inspiration for future research.
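The generic loop that such a framework abstracts over can be sketched in a few lines of Python; here the search strategy is plain random search, and evaluate is a user-supplied train-and-score function (all names are our illustration, not the survey's framework itself):

import random

def automl_random_search(search_space, evaluate, budget=20, seed=0):
    """Propose a configuration, evaluate it, keep the best one seen."""
    rng = random.Random(seed)
    best_cfg, best_score = None, float("-inf")
    for _ in range(budget):
        cfg = {k: rng.choice(v) for k, v in search_space.items()}
        score = evaluate(cfg)            # e.g. validation accuracy
        if score > best_score:
            best_cfg, best_score = cfg, score
    return best_cfg, best_score

space = {"lr": [1e-3, 1e-2, 1e-1], "depth": [2, 4, 8]}
print(automl_random_search(space, lambda c: -c["lr"] * c["depth"]))

Replacing random sampling with Bayesian optimization or evolutionary search changes only the proposal step, which is precisely the kind of modularity a general AutoML framework is meant to expose.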