Number-theoretic codes are a class of codes defined by single or multiple congruences. These codes are mainly used to correct insertion and deletion errors, and to correct asymmetric errors. This paper presents a formula for a generalization of the complete weight enumerator for number-theoretic codes. This formula allows us to derive the weight enumerators and cardinalities of number-theoretic codes. As a special case, this paper provides the Hamming weight enumerators and cardinalities of the non-binary Tenengolts' codes, which correct a single insertion or deletion. Moreover, we show that the formula yields the MacWilliams identity for linear codes over the ring of integers modulo $r$.
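For context, the MacWilliams identity referred to above can be written in its standard Hamming-weight form as follows: for a linear code $C \subseteq \mathbb{Z}_r^n$ with dual code $C^{\perp}$,
\[
W_{C^{\perp}}(x,y) \;=\; \frac{1}{|C|}\, W_C\bigl(x+(r-1)y,\; x-y\bigr), \qquad W_C(x,y) \;=\; \sum_{c \in C} x^{\,n-\mathrm{wt}(c)}\, y^{\,\mathrm{wt}(c)},
\]
where $\mathrm{wt}(c)$ denotes the Hamming weight of $c$; the paper's generalized formula is stated to recover this identity as a special case.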
We consider edge modification problems towards block and strictly chordal graphs, where one is given an undirected graph $G = (V,E)$ and an integer $k \in \mathbb{N}$ and seeks to edit (add or delete) at most $k$ edges of $G$ to obtain a block graph or a strictly chordal graph. The completion and deletion variants of these problems are defined similarly, allowing only edge additions for the former and only edge deletions for the latter. Block graphs are a well-studied class of graphs and admit several characterizations; e.g., they are exactly the diamond-free chordal graphs. Strictly chordal graphs, also referred to as block duplicate graphs, are a natural generalization of block graphs in which one may add true twins of cut-vertices. Strictly chordal graphs are exactly the dart-free and gem-free chordal graphs. We prove NP-completeness for most variants of these problems and provide $O(k^2)$ vertex-kernels for Block Graph Editing and Block Graph Deletion, $O(k^3)$ vertex-kernels for Strictly Chordal Completion and Strictly Chordal Deletion, and an $O(k^4)$ vertex-kernel for Strictly Chordal Editing.
In this work, a sequential decoder for convolutional codes over channels prone to insertion, deletion, and substitution errors is described and analyzed. The decoder expands the code trellis by introducing a new channel state variable, called the drift state, as proposed by Davey and MacKay. A decoding metric on that trellis, suitable for sequential decoding, is derived in a manner that generalizes the original Fano metric. In low-noise regimes, this approach reduces decoding complexity by about two orders of magnitude compared to Viterbi's algorithm, albeit at higher frame error rates. An analytical method to determine the computational cutoff rate is also suggested. This analysis is supported by numerical evaluations of frame error rates and computational complexity, which are compared against optimal Viterbi decoding.
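For reference, the classical Fano metric that the drift-trellis metric generalizes assigns to a branch with transmitted symbols $x_i$, received symbols $y_i$, and code rate $R$ the value
\[
\mu \;=\; \sum_i \Bigl[\, \log_2 \frac{P(y_i \mid x_i)}{P(y_i)} \;-\; R \,\Bigr],
\]
where $P(y_i)$ is the channel output probability averaged over the input ensemble; on insertion/deletion channels the metric must additionally account for the drift state, which is the generalization developed in the paper.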
The boundary operator is a linear operator that acts on a collection of high-dimensional binary points (simplices) and maps them to their boundaries. This boundary map is one of the key components in numerous applications, including differential equations, machine learning, computational geometry, machine vision, and control systems. We consider the problem of representing the full boundary operator on a quantum computer. We first prove that the boundary operator has a special structure in the form of a complete sum of fermionic creation and annihilation operators. We then use the fact that these operators pairwise anticommute to produce an $\mathcal{O}(n)$-depth circuit that exactly implements the boundary operator, without any Trotterization or Taylor series approximation errors. Avoiding these errors reduces the number of shots required to reach a desired accuracy.
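As a sketch of how such a fermionic sum can be made concrete on qubits (under the Jordan-Wigner encoding; this is an illustrative convention, not necessarily the paper's exact construction): writing $\partial = \sum_{i=1}^{n} a_i$ with annihilation operators $a_i$, the Hermitian combination becomes
\[
\partial + \partial^{\dagger} \;=\; \sum_{i=1}^{n} \bigl(a_i + a_i^{\dagger}\bigr), \qquad a_i + a_i^{\dagger} \;\mapsto\; Z^{\otimes (i-1)} \otimes X \otimes I^{\otimes (n-i)},
\]
where qubit $i$ records whether vertex $i$ belongs to the encoded simplex and the $Z$-string reproduces the alternating signs of the boundary map; any two such Pauli strings anticommute, which is the property exploited for an exact, Trotterization-free circuit.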
The concept of bounded expansion provides a robust way to capture sparse graph classes with interesting algorithmic properties. Most notably, every problem definable in first-order logic can be solved in linear time on graph classes of bounded expansion. First-order interpretations and transductions of sparse graph classes lead to more general, dense graph classes that seem to inherit many of the nice algorithmic properties of their sparse counterparts. In this paper, we show that graphs from a class with structurally bounded expansion can be encoded, via lacon-, shrub-, and parity-decompositions, using graphs from a class with bounded expansion. These decompositions are useful for lifting properties from sparse to structurally sparse graph classes.
A generalization of L{\"u}roth's theorem states that every subfield of transcendence degree 1 of the rational function field is a simple extension. In this note we show that a classical proof of L{\"u}roth's theorem also proves this generalization.
Neural networks have seen tremendous growth in recent years and are applied to numerous problems. Various types of neural networks have been introduced to deal with different types of problems. The main goal of any neural network, however, is to transform non-linearly separable input data into more linearly separable abstract features using a hierarchy of layers. These layers are combinations of linear and nonlinear functions. The most popular and common non-linearity layers are activation functions (AFs), such as Logistic Sigmoid, Tanh, ReLU, ELU, Swish, and Mish. In this paper, a comprehensive overview and survey of AFs in neural networks for deep learning is presented. Different classes of AFs, such as Logistic Sigmoid and Tanh based, ReLU based, ELU based, and learning based, are covered. Several characteristics of AFs, such as output range, monotonicity, and smoothness, are also pointed out. A performance comparison is also carried out among 18 state-of-the-art AFs with different networks on different types of data. Insights into AFs are presented to help researchers conduct further research and practitioners to select among different choices. The code used for the experimental comparison is released at: \url{https://github.com/shivram1987/ActivationFunctions}.
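As a quick reference, here is a minimal NumPy sketch of the common textbook definitions of the activations named above (this is illustrative code, not taken from the paper's released repository):

```python
import numpy as np

def sigmoid(x):
    # Logistic Sigmoid: squashes inputs to (0, 1)
    return 1.0 / (1.0 + np.exp(-x))

def tanh(x):
    # Tanh: squashes inputs to (-1, 1)
    return np.tanh(x)

def relu(x):
    # ReLU: identity for positive inputs, zero otherwise
    return np.maximum(0.0, x)

def elu(x, alpha=1.0):
    # ELU: smooth exponential saturation for negative inputs
    return np.where(x > 0, x, alpha * (np.exp(x) - 1.0))

def swish(x, beta=1.0):
    # Swish: x * sigmoid(beta * x); beta = 1 gives SiLU
    return x * sigmoid(beta * x)

def mish(x):
    # Mish: x * tanh(softplus(x))
    return x * np.tanh(np.log1p(np.exp(x)))

x = np.linspace(-3.0, 3.0, 7)
print(relu(x), elu(x), swish(x), mish(x))
```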
One of the fundamental problems in Artificial Intelligence is to perform complex multi-hop logical reasoning over the facts captured by a knowledge graph (KG). This problem is challenging because KGs can be massive and incomplete. Recent approaches embed KG entities in a low-dimensional space and then use these embeddings to find the answer entities. However, it remains an outstanding challenge to handle arbitrary first-order logic (FOL) queries, as present methods are limited to only a subset of FOL operators. In particular, the negation operator is not supported. A further limitation of present methods is that they cannot naturally model uncertainty. Here, we present BetaE, a probabilistic embedding framework for answering arbitrary FOL queries over KGs. BetaE is the first method that can handle a complete set of first-order logical operations: conjunction ($\wedge$), disjunction ($\vee$), and negation ($\neg$). A key insight of BetaE is to use probabilistic distributions with bounded support, specifically the Beta distribution, and to embed queries/entities as distributions, which in turn allows us to faithfully model uncertainty as well. Logical operations are performed in the embedding space by neural operators over the probabilistic embeddings. We demonstrate the performance of BetaE on answering arbitrary FOL queries on three large, incomplete KGs. While being more general, BetaE also improves relative performance by up to 25.4% over the current state-of-the-art KG reasoning methods, which can only handle conjunctive queries without negation.
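A toy sketch of the Beta-distribution embedding idea is given below (this is not the authors' released implementation: the per-dimension Beta parameterization and the reciprocal form of negation follow the paper's description, while the uniform-weight conjunction stands in for BetaE's learned attention, and disjunction would be handled by rewriting queries via De Morgan's laws):

```python
import numpy as np
from scipy.stats import beta as beta_dist

# Each embedding dimension holds one Beta(alpha, beta) distribution,
# stored as two positive parameter vectors.

def negation(alpha, beta):
    # Negation flips the density: Beta(a, b) -> Beta(1/a, 1/b)
    return 1.0 / alpha, 1.0 / beta

def conjunction(params, weights=None):
    # Intersection as a weighted product of Beta densities; with weights
    # summing to 1 this is again a Beta with weighted-average parameters.
    # (BetaE learns these weights with an attention module; uniform here.)
    alphas = np.stack([a for a, _ in params])
    betas = np.stack([b for _, b in params])
    if weights is None:
        weights = np.full(len(params), 1.0 / len(params))
    w = np.asarray(weights)[:, None]
    return (w * alphas).sum(axis=0), (w * betas).sum(axis=0)

def uncertainty(alpha, beta):
    # Differential entropy per coordinate as an uncertainty proxy
    # (higher entropy = less certain answer set).
    return beta_dist(alpha, beta).entropy()

# Two toy query embeddings with 4 Beta dimensions each.
q1 = (np.array([2.0, 0.5, 1.0, 3.0]), np.array([1.0, 2.0, 1.0, 0.5]))
q2 = (np.array([1.5, 1.0, 2.0, 0.8]), np.array([0.7, 1.0, 1.0, 2.0]))

a, b = conjunction([q1, negation(*q2)])   # q1 AND (NOT q2)
print(a, b, uncertainty(a, b))
```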
Interpreting the training of Deep Neural Networks (DNNs) as an optimal control problem with nonlinear dynamical systems has received considerable attention recently, yet the algorithmic development remains relatively limited. In this work, we make an attempt along this line by reformulating the training procedure from the trajectory optimization perspective. We first show that most widely used algorithms for training DNNs can be linked to Differential Dynamic Programming (DDP), a celebrated second-order trajectory optimization algorithm rooted in Approximate Dynamic Programming. In this vein, we propose a new variant of DDP that can accept batch optimization for training feedforward networks, while integrating naturally with recent progress in curvature approximation. The resulting algorithm features layer-wise feedback policies which improve the convergence rate and reduce sensitivity to hyper-parameters over existing methods. We show that the algorithm is competitive against state-of-the-art first- and second-order methods. Our work opens up new avenues for principled algorithmic design built upon optimal control theory.
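For context, standard DDP (the algorithm the proposed variant builds on) performs a backward pass over a quadratic expansion of the stage-wise state-action value $Q_t$ and applies feedback policies of the form
\[
\delta u_t \;=\; k_t + K_t\,\delta x_t, \qquad k_t = -Q_{uu,t}^{-1}\, Q_{u,t}, \qquad K_t = -Q_{uu,t}^{-1}\, Q_{ux,t},
\]
where, in the DNN-training reformulation described above, $x_t$ plays the role of the layer-$t$ activations and $u_t$ the layer-$t$ parameters, so the policies become layer-wise feedback updates.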
Graph Neural Networks (GNNs) come in many flavors, but they should always be either invariant (a permutation of the nodes of the input graph does not affect the output) or equivariant (a permutation of the input permutes the output). In this paper, we consider a specific class of invariant and equivariant networks, for which we prove new universality theorems. More precisely, we consider networks with a single hidden layer, obtained by summing channels formed by applying an equivariant linear operator, a pointwise non-linearity, and either an invariant or an equivariant linear operator. Recently, Maron et al. (2019) showed that by allowing higher-order tensorization inside the network, universal invariant GNNs can be obtained. As a first contribution, we propose an alternative proof of this result, which relies on the Stone-Weierstrass theorem for algebras of real-valued functions. Our main contribution is then an extension of this result to the equivariant case, which appears in many practical applications but has been less studied from a theoretical point of view. The proof relies on a new generalized Stone-Weierstrass theorem for algebras of equivariant functions, which is of independent interest. Finally, unlike many previous settings that consider a fixed number of nodes, our results show that a GNN defined by a single set of parameters can approximate uniformly well a function defined on graphs of varying size.
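For reference, the classical Stone-Weierstrass theorem underlying the invariant case states:
\[
\text{if } K \text{ is compact and } \mathcal{A} \subseteq C(K,\mathbb{R}) \text{ is a subalgebra that contains the constants and separates points, then } \mathcal{A} \text{ is dense in } \bigl(C(K,\mathbb{R}), \|\cdot\|_{\infty}\bigr).
\]
The paper's generalization adapts this density criterion to algebras of equivariant functions.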
For neural networks (NNs) with rectified linear unit (ReLU) or binary activation functions, we show that their training can be accomplished in a reduced parameter space. Specifically, the weights in each neuron can be trained on the unit sphere, as opposed to the entire space, and the threshold can be trained in a bounded interval, as opposed to the real line. We show that the NNs in the reduced parameter space are mathematically equivalent to the standard NNs with parameters in the whole space. The reduced parameter space should facilitate the optimization procedure for network training, as the search space becomes (much) smaller. We demonstrate the improved training performance using numerical examples.
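A minimal sketch of the rescaling identity behind this equivalence in the ReLU case (using ReLU's positive homogeneity; the variable names and setup are illustrative, not the paper's code):

```python
import numpy as np

rng = np.random.default_rng(0)
x = rng.normal(size=5)          # input
w = rng.normal(size=5)          # neuron weights
b = rng.normal()                # threshold / bias
v = rng.normal()                # outgoing (next-layer) weight

relu = lambda z: np.maximum(0.0, z)

# Original neuron followed by one outgoing weight.
original = v * relu(w @ x + b)

# Equivalent neuron: weights moved to the unit sphere, threshold rescaled,
# and the norm absorbed into the outgoing weight (ReLU(c*z) = c*ReLU(z) for c > 0).
s = np.linalg.norm(w)
rescaled = (v * s) * relu((w / s) @ x + b / s)

print(np.isclose(original, rescaled))   # True
```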