Parameter-space and function-space provide two different duality frames in which to study neural networks. We demonstrate that symmetries of network densities may be determined via dual computations of network correlation functions, even when the density is unknown and the network is not equivariant. Symmetry-via-duality relies on invariance properties of the correlation functions, which stem from the choice of network parameter distributions. Input and output symmetries of neural network densities are determined, which recover known Gaussian process results in the infinite width limit. The mechanism may also be utilized to determine symmetries during training, when parameters are correlated, as well as symmetries of the Neural Tangent Kernel. We demonstrate that the amount of symmetry in the initialization density affects the accuracy of networks trained on Fashion-MNIST, and that symmetry breaking helps only when it is in the direction of ground truth.
In the Bayes paradigm and for a given loss function, we propose the construction of a new type of posterior distribution for estimating the law of an $n$-sample. The loss functions we have in mind are based on the total variation distance, the Hellinger distance as well as some $\mathbb{L}_{j}$-distances. We prove that, with a probability close to one, this new posterior distribution concentrates its mass in a neighbourhood of the law of the data, for the chosen loss function, provided that this law belongs to the support of the prior or, at least, lies close enough to it. We therefore establish that the new posterior distribution enjoys some robustness properties with respect to a possible misspecification of the prior, or more precisely, its support. For the total variation and squared Hellinger losses, we also show that the posterior distribution keeps its concentration properties when the data are only independent, hence not necessarily i.i.d., provided that most of their marginals are close enough to some probability distribution around which the prior puts enough mass. The posterior distribution is therefore also stable with respect to the equidistribution assumption. We illustrate these results by several applications. We consider the problems of estimating a location parameter or both the location and the scale of a density in a nonparametric framework. Finally, we also tackle the problem of estimating a density, with the squared Hellinger loss, in a high-dimensional parametric model under some sparsity conditions. The results established in this paper are non-asymptotic and provide, as much as possible, explicit constants.
The correlation measure of order $k$ is an important measure of randomness in binary sequences. This measure looks for dependence between several shifted versions of a sequence. We study the relation between the correlation measure of order $k$ and two other pseudorandom measures: the $N$th linear complexity and the $N$th maximum order complexity. We simplify and improve several state-of-the-art lower bounds for these two measures using the Hamming bound as well as weaker bounds derived from it.
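To make the notion of dependence between shifted versions concrete, the following is a minimal brute-force sketch of the correlation measure of order $k$, assuming the usual Mauduit–Sárközy definition $C_k = \max_{M,D} |\sum_{n=0}^{M-1} (-1)^{s_{n+d_1}+\dots+s_{n+d_k}}|$ with $0 \le d_1 < \dots < d_k$ and $M + d_k \le N$. The function name `correlation_measure` is illustrative only, and the enumeration is exponential in $k$, so it is suitable only for short sequences.

```python
from itertools import combinations


def correlation_measure(bits, k):
    """Brute-force correlation measure of order k of a finite 0/1 sequence."""
    n = len(bits)
    signs = [1 - 2 * b for b in bits]  # map 0 -> +1 and 1 -> -1
    best = 0
    # Enumerate every lag vector 0 <= d_1 < ... < d_k <= n - 1.
    for lags in combinations(range(n), k):
        total = 0
        # After processing index `start`, `total` is the sum for M = start + 1,
        # so every admissible M with M + d_k <= n is visited.
        for start in range(n - lags[-1]):
            term = 1
            for d in lags:
                term *= signs[start + d]
            total += term
            best = max(best, abs(total))
    return best


if __name__ == "__main__":
    # A constant sequence is maximally correlated with its own shifts.
    print(correlation_measure([0, 0, 0, 0], 2))
    print(correlation_measure([0, 1, 0, 1], 2))
```

A sequence that looks random should have a correlation measure that is small relative to its length, whereas the highly structured examples above reach values close to $N$.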
Due to the wide applications of binary sequences with low correlation in communications, various constructions of such sequences have been proposed in the literature. However, most of the known constructions via finite fields make use of the multiplicative cyclic group of $\F_{2^n}$. It is often overlooked in this community that all $2^n+1$ rational places (including the "place at infinity") of the rational function field over $\F_{2^n}$ form a cyclic structure under an automorphism of order $2^n+1$. In this paper, we make use of this cyclic structure to provide an explicit construction of families of binary sequences of length $2^n+1$ via the finite field $\F_{2^n}$. Each family of sequences has size $2^n-1$ and its correlation is upper bounded by $\lfloor 2^{(n+2)/2}\rfloor$. Our sequences can be constructed explicitly and have competitive parameters. In particular, compared with the Gold sequences of length $2^n-1$ for even $n$, we have larger length and smaller correlation, although the family size of our sequences is slightly smaller.
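The comparison with Gold sequences can be tabulated with a few lines of arithmetic. The sketch below uses the parameters stated in the abstract for the proposed families. For Gold sequences with even $n$ it assumes the standard textbook values (family size $2^n+1$, maximum correlation magnitude $2^{(n+2)/2}+1$), which are not taken from the abstract itself. The function names are illustrative.

```python
import math


def proposed_family(n):
    """Parameters of the proposed families as stated in the abstract."""
    return {
        "length": 2**n + 1,
        "size": 2**n - 1,
        # floor(2^((n+2)/2)) computed exactly as an integer square root
        "max_corr": math.isqrt(2 ** (n + 2)),
    }


def gold_family(n):
    """Assumed textbook parameters of Gold sequences for even n."""
    if n % 2 != 0:
        raise ValueError("this comparison is only stated for even n")
    return {
        "length": 2**n - 1,
        "size": 2**n + 1,
        "max_corr": 2 ** ((n + 2) // 2) + 1,
    }


if __name__ == "__main__":
    for n in (4, 6, 8, 10):
        print(n, proposed_family(n), gold_family(n))
```

For even $n$ the length is larger by 2 and the correlation bound smaller by 1, while the family size is smaller by 2, which matches the qualitative statement in the abstract.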
Network alignment is the problem of finding the node mapping between similar networks. It links data from separate sources and is widely studied in bioinformatics and social network analysis. The critical difference between network alignment and exact graph matching is that network alignment considers node mappings between non-isomorphic graphs with error tolerance. Researchers usually use accuracy (AC) to measure the performance of network alignment, which compares each output element with the benchmark directly. However, this metric neglects that some nodes are naturally indistinguishable even within a single graph (e.g., nodes that have the same neighbors) and need not be distinguished across graphs. Such neglect leads to the underestimation of models. To address this problem, we propose an unbiased metric for network alignment that takes indistinguishable nodes into consideration. Our detailed experiments at different scales on both synthetic and real-world datasets demonstrate that the proposed metric correctly reflects the deviation of the result mapping from the benchmark mapping, as the standard metric AC does. Compared with AC, the proposed metric effectively blocks the effect of indistinguishable nodes and remains stable as the number of indistinguishable nodes increases.
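The effect of indistinguishable nodes on AC can be illustrated with a toy example. The sketch below is not the metric proposed in the paper, whose definition is not given in the abstract. It assumes one simple notion of indistinguishability (nodes of the target graph with identical neighbor sets) and counts a prediction as correct when it lands in the same class as the benchmark node. All function names and the example graph are hypothetical.

```python
def plain_accuracy(pred, truth):
    """Standard AC: fraction of nodes mapped exactly as in the benchmark."""
    return sum(pred[v] == truth[v] for v in truth) / len(truth)


def neighbor_signature(adj):
    """Map each node to its neighbor set; equal sets mean indistinguishable."""
    return {v: frozenset(nbrs) for v, nbrs in adj.items()}


def class_aware_accuracy(pred, truth, target_adj):
    """Hypothetical variant: a hit is any node in the benchmark node's class."""
    sig = neighbor_signature(target_adj)
    return sum(sig[pred[v]] == sig[truth[v]] for v in truth) / len(truth)


# Target graph: a star whose three leaves all have the neighbor set {"c"}.
target_adj = {"c": {"x", "y", "z"}, "x": {"c"}, "y": {"c"}, "z": {"c"}}
truth = {"a": "c", "b": "x", "d": "y", "e": "z"}
pred = {"a": "c", "b": "y", "d": "x", "e": "z"}  # two leaves swapped

if __name__ == "__main__":
    print(plain_accuracy(pred, truth))
    print(class_aware_accuracy(pred, truth, target_adj))
```

Swapping two leaves of the star yields a structurally equivalent alignment, yet plain AC penalizes it, which is the kind of underestimation the abstract describes.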
We develop an error estimator for neural network approximations of PDEs. The proposed approach is based on the dual weighted residual (DWR) estimator. It is intended to serve as a stopping criterion that guarantees the accuracy of the solution independently of the design of the neural network training. The result is illustrated with computational examples for Laplace and Stokes problems.
Graph Neural Networks (GNNs) have been studied through the lens of expressive power and generalization. However, their optimization properties are less well understood. We take the first step towards analyzing GNN training by studying the gradient dynamics of GNNs. First, we analyze linearized GNNs and prove that despite the non-convexity of training, convergence to a global minimum at a linear rate is guaranteed under mild assumptions that we validate on real-world graphs. Second, we study what may affect the GNNs' training speed. Our results show that the training of GNNs is implicitly accelerated by skip connections, more depth, and/or a good label distribution. Empirical results confirm that our theoretical results for linearized GNNs align with the training behavior of nonlinear GNNs. Our results provide the first theoretical support for the success of GNNs with skip connections in terms of optimization, and suggest that deep GNNs with skip connections would be promising in practice.
Neural architecture search has attracted wide attention in both academia and industry. To accelerate it, researchers have proposed weight-sharing methods which first train a super-network to reuse computation among different operators, from which exponentially many sub-networks can be sampled and efficiently evaluated. These methods enjoy great advantages in terms of computational costs, but the sampled sub-networks are not guaranteed to be estimated precisely unless an individual training process is taken. This paper attributes such inaccuracy to the inevitable mismatch between assembled network layers, so that there is a random error term added to each estimation. We alleviate this issue by training a graph convolutional network to fit the performance of sampled sub-networks so that the impact of random errors becomes minimal. With this strategy, we achieve a higher rank correlation coefficient in the selected set of candidates, which consequently leads to better performance of the final architecture. In addition, our approach also enjoys the flexibility of being used under different hardware constraints, since the graph convolutional network has provided an efficient lookup table of the performance of architectures in the entire search space.
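The abstract does not say which rank correlation coefficient is used. As an illustration only, the sketch below assumes Kendall's tau (the tau-a variant, without tie correction) between estimated and actual sub-network accuracies. The function name and the numbers are made up for the example.

```python
from itertools import combinations


def kendall_tau(predicted, actual):
    """Kendall's tau-a: (concordant - discordant) / number of pairs."""
    if len(predicted) != len(actual) or len(predicted) < 2:
        raise ValueError("need two equally long lists with at least 2 items")
    concordant = discordant = 0
    for i, j in combinations(range(len(predicted)), 2):
        s = (predicted[i] - predicted[j]) * (actual[i] - actual[j])
        if s > 0:
            concordant += 1
        elif s < 0:
            discordant += 1
    pairs = len(predicted) * (len(predicted) - 1) // 2
    return (concordant - discordant) / pairs


if __name__ == "__main__":
    estimated = [0.71, 0.64, 0.69, 0.73]  # hypothetical super-network estimates
    standalone = [0.93, 0.90, 0.94, 0.95]  # hypothetical stand-alone accuracies
    print(kendall_tau(estimated, standalone))
```

A value near 1 means the estimates order the candidates almost as stand-alone training would, which is what matters when only the top-ranked architectures are retrained.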
The distance-geometric graph representation adopts a unified scheme (distance) for representing the geometry of three-dimensional (3D) graphs. It is invariant to rotation and translation of the graph and it reflects pair-wise node interactions and their generally local nature. To facilitate the incorporation of geometry in deep learning on 3D graphs, we propose a message-passing graph convolutional network based on the distance-geometric graph representation: DG-GCN (distance-geometric graph convolution network). It utilizes continuous-filter convolutional layers, with filter-generating networks, that enable learning of filter weights from distances, thereby incorporating the geometry of 3D graphs in graph convolutions. Our results for the ESOL and FreeSolv datasets show major improvement over those of standard graph convolutions. They also show significant improvement over those of geometric graph convolutions employing edge weight / edge distance power laws. Our work demonstrates the utility and value of DG-GCN for end-to-end deep learning on 3D graphs, particularly molecular graphs.
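The invariance claim can be checked numerically on a toy point set. The following sketch computes the pairwise distance matrix before and after a rigid motion (a rotation about the z-axis followed by a translation) and compares the two. The coordinates, angle, and function names are arbitrary illustrations and are not taken from the paper.

```python
import math


def distance_matrix(points):
    """Pairwise Euclidean distances between 3D points."""
    return [[math.dist(p, q) for q in points] for p in points]


def rotate_z_then_translate(points, angle, shift):
    """Rotate each point about the z-axis by `angle`, then add `shift`."""
    c, s = math.cos(angle), math.sin(angle)
    return [
        (c * x - s * y + shift[0], s * x + c * y + shift[1], z + shift[2])
        for x, y, z in points
    ]


points = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (0.0, 2.0, 0.0), (1.0, 1.0, 3.0)]
moved = rotate_z_then_translate(points, 0.7, (5.0, -2.0, 1.5))

before = distance_matrix(points)
after = distance_matrix(moved)

# Largest entrywise change in the distance matrix caused by the rigid motion.
max_diff = max(
    abs(a - b) for row_a, row_b in zip(before, after) for a, b in zip(row_a, row_b)
)

if __name__ == "__main__":
    print(max_diff < 1e-9)
```

Raw coordinates change under the motion while the distance matrix agrees up to floating-point error, so a network fed only distances does not need to learn this invariance from data.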
Graph Neural Networks (GNNs) come in many flavors, but should always be either invariant (permutation of the nodes of the input graph does not affect the output) or equivariant (permutation of the input permutes the output). In this paper, we consider a specific class of invariant and equivariant networks, for which we prove new universality theorems. More precisely, we consider networks with a single hidden layer, obtained by summing channels formed by applying an equivariant linear operator, a pointwise non-linearity and either an invariant or equivariant linear operator. Recently, Maron et al. (2019) showed that by allowing higher-order tensorization inside the network, universal invariant GNNs can be obtained. As a first contribution, we propose an alternative proof of this result, which relies on the Stone-Weierstrass theorem for algebras of real-valued functions. Our main contribution is then an extension of this result to the equivariant case, which appears in many practical applications but has been less studied from a theoretical point of view. The proof relies on a new generalized Stone-Weierstrass theorem for algebras of equivariant functions, which is of independent interest. Finally, unlike many previous settings that consider a fixed number of nodes, our results show that a GNN defined by a single set of parameters can approximate uniformly well a function defined on graphs of varying size.
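The two properties can be checked on a small example. The sketch below uses a simple sum-aggregation message-passing layer, which is an assumption made for illustration and not the tensorized architecture analyzed in the paper. It verifies that relabeling the nodes permutes the layer output (equivariance) and leaves a sum readout unchanged (invariance). The weights `a` and `b` are arbitrary.

```python
import math


def layer(adj, x, a=0.5, b=0.25):
    """h_i = relu(a * x_i + b * sum over neighbors j of x_j)."""
    n = len(x)
    return [
        max(0.0, a * x[i] + b * sum(adj[i][j] * x[j] for j in range(n)))
        for i in range(n)
    ]


def permute(adj, x, perm):
    """Relabel the graph so that new node i is old node perm[i]."""
    n = len(x)
    new_adj = [[adj[perm[i]][perm[j]] for j in range(n)] for i in range(n)]
    new_x = [x[perm[i]] for i in range(n)]
    return new_adj, new_x


adj = [[0, 1, 1, 0], [1, 0, 0, 1], [1, 0, 0, 0], [0, 1, 0, 0]]
x = [1.0, -2.0, 3.0, 0.5]
perm = [2, 0, 3, 1]

out = layer(adj, x)
p_adj, p_x = permute(adj, x, perm)
p_out = layer(p_adj, p_x)

# Equivariance: the output of the relabeled graph is the relabeled output.
equivariant = all(math.isclose(p_out[i], out[perm[i]]) for i in range(len(x)))
# Invariance: a sum readout does not depend on the node labeling.
invariant = math.isclose(sum(p_out), sum(out))

if __name__ == "__main__":
    print(equivariant, invariant)
```

Both checks hold for any permutation because the aggregation sums over neighbors without reference to their labels.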
For neural networks (NNs) with rectified linear unit (ReLU) or binary activation functions, we show that their training can be accomplished in a reduced parameter space. Specifically, the weights in each neuron can be trained on the unit sphere, as opposed to the entire space, and the threshold can be trained in a bounded interval, as opposed to the real line. We show that the NNs in the reduced parameter space are mathematically equivalent to the standard NNs with parameters in the whole space. The reduced parameter space shall facilitate the optimization procedure for the network training, as the search space becomes (much) smaller. We demonstrate the improved training performance using numerical examples.
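One way to see why unit-sphere weights suffice for ReLU is positive homogeneity: $\mathrm{ReLU}(w \cdot x + b) = \|w\| \, \mathrm{ReLU}\big((w/\|w\|) \cdot x + b/\|w\|\big)$ for $w \neq 0$, so the norm can be absorbed into the weights of the next layer. The sketch below checks this identity numerically. It is an illustration under that assumption, not the paper's construction. It writes the threshold as an additive bias, and it does not cover the bounded-interval argument for the threshold.

```python
import math


def relu(t):
    return max(0.0, t)


def neuron(w, b, x):
    """Single ReLU neuron with weights w and additive bias b."""
    return relu(sum(wi * xi for wi, xi in zip(w, x)) + b)


def normalize(w, b):
    """Rescale (w, b) so that w lies on the unit sphere; assumes w != 0."""
    norm = math.sqrt(sum(wi * wi for wi in w))
    return [wi / norm for wi in w], b / norm, norm


w, b = [3.0, -4.0], 2.5
x = [0.7, -1.2]

u, t, scale = normalize(w, b)
original = neuron(w, b, x)
reduced = scale * neuron(u, t, x)  # scale would be folded into the next layer

if __name__ == "__main__":
    print(math.isclose(original, reduced))
```

The same rescaling argument does not apply to activations that are not positively homogeneous, such as the sigmoid.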