In 1973, Lemmens and Seidel posed the problem of determining $N_\alpha(r)$, the maximum number of equiangular lines in $\mathbb{R}^r$ with common angle $\arccos(\alpha)$. Recently, this question has been almost completely settled in the case where $r$ is large relative to $1/\alpha$, with the approach relying on Ramsey's theorem and hence requiring $r$ to be very large. In this paper, we use orthogonal projections of matrices with respect to the Frobenius inner product to overcome this limitation, thereby obtaining upper bounds on $N_{\alpha}(r)$ which significantly improve on the only previously known universal bound of Glazyrin and Yu, and which take an important step towards determining $N_\alpha(r)$ exactly for all $r, \alpha$. In particular, our results imply that $N_\alpha(r) = \Theta(r)$ whenever $\alpha \geq \Omega(1/r^{1/5})$. Our arguments rely on a new geometric inequality for equiangular lines in $\mathbb{R}^r$ which is tight when the number of lines meets the absolute bound $\binom{r+1}{2}$. Moreover, using the connection to graphs, we obtain lower bounds on the second eigenvalue of the adjacency matrix of a regular graph which are tight for strongly regular graphs corresponding to $\binom{r+1}{2}$ equiangular lines in $\mathbb{R}^r$. Our results require only that the spectral gap be less than half the number of vertices, and can therefore be seen as an extension of the Alon-Boppana theorem to dense graphs. Generalizing to $\mathbb{C}$, we also obtain the first universal bound on the maximum number of complex equiangular lines in $\mathbb{C}^r$ with common Hermitian angle $\arccos(\alpha)$. In particular, we prove an inequality for complex equiangular lines in $\mathbb{C}^r$ which is tight if the number of lines meets the absolute bound $r^2$, and which may be of independent interest in quantum theory. Additionally, we use our projection method to obtain an improvement to Welch's bound.
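To make the central objects concrete, here is a minimal numerical sketch (ours, not from the paper) that checks whether a set of unit vectors spans equiangular lines and recovers the common cosine $\alpha$. The example uses the six diagonals of the icosahedron, which meet the absolute bound $\binom{r+1}{2} = 6$ in $\mathbb{R}^3$ with $\alpha = 1/\sqrt{5}$; all function names are ours.

```python
import numpy as np

def is_equiangular(V, tol=1e-9):
    """Columns of V are unit vectors in R^r; test whether |<v_i, v_j>|
    is the same for every pair i != j, returning (flag, common cosine)."""
    G = np.abs(V.T @ V)                          # absolute Gram matrix
    off = G[~np.eye(G.shape[1], dtype=bool)]     # off-diagonal entries
    return np.allclose(off, off.mean(), atol=tol), off.mean()

# Six diagonals of the icosahedron: N = 6 = binom(3+1, 2) equiangular
# lines in R^3, meeting the absolute bound, with alpha = 1/sqrt(5).
phi = (1 + np.sqrt(5)) / 2
V = np.array([[0, 1, phi], [0, -1, phi], [1, phi, 0],
              [-1, phi, 0], [phi, 0, 1], [phi, 0, -1]], dtype=float).T
V /= np.linalg.norm(V, axis=0)                   # normalize columns

ok, alpha = is_equiangular(V)
print(ok, alpha, 1 / np.sqrt(5))                 # True 0.4472... 0.4472...
```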
A unified understanding of neural networks (NNs) remains elusive, and users are often puzzled about which rules should be followed when optimizing the internal structure of an NN. Motivated by the potential of random graphs to alter how computation is performed, we demonstrate that random graphs can serve as architecture generators for optimizing the internal structure of NNs. To turn random graph theory into an NN model with practical meaning, and after clarifying the input-output relationship of each neuron, we perform data feature mapping by computing Fourier Random Features (FRFs). Using this low-operation-cost approach, neurons are assigned to several groups whose connection relationships can be regarded as uniform representations of the random graphs to which they belong, and a random arrangement fuses these neurons to establish the pattern matrix, markedly reducing manual involvement and computational cost without requiring a fixed, deep architecture. Building on this single neuromorphic learning model, termed the random graph-based neural network (RGNN), we develop a joint classification mechanism involving information interaction between multiple RGNNs, and we achieve significant performance improvements in supervised learning on three benchmark tasks, thereby avoiding the adverse impact that the limited interpretability of NNs has on structure design and engineering practice.
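The abstract does not spell out the FRF construction, so as a hedged stand-in the sketch below uses the classic random Fourier feature map of Rahimi and Recht for the Gaussian kernel, a standard low-cost way of computing Fourier random features; the function name and all parameters are illustrative, not the paper's.

```python
import numpy as np

def fourier_random_features(X, D=256, gamma=1.0, seed=0):
    """Map rows of X (n x d) to D-dimensional random Fourier features
    z(x) = sqrt(2/D) * cos(W x + b), whose inner products approximate
    the Gaussian kernel exp(-gamma * ||x - y||^2).  A stand-in for the
    paper's FRF step, whose exact form the abstract does not give."""
    rng = np.random.default_rng(seed)
    n, d = X.shape
    W = rng.normal(scale=np.sqrt(2 * gamma), size=(d, D))
    b = rng.uniform(0, 2 * np.pi, size=D)
    return np.sqrt(2.0 / D) * np.cos(X @ W + b)

X = np.random.randn(5, 10)
Z = fourier_random_features(X)
# Sanity check: Z @ Z.T approximates exp(-gamma * ||x_i - x_j||^2).
```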
For a graph $G$, let $\chi(G)$ and $\omega(G)$ denote its chromatic number and clique number, respectively. A $P_5$ is the chordless path on five vertices, and a $4$-wheel is the graph consisting of a chordless cycle $C_4$ on four vertices plus an additional vertex adjacent to all the vertices of the $C_4$. In this paper, we show that every ($P_5$, $4$-wheel)-free graph $G$ satisfies $\chi(G)\leq \frac{3}{2}\omega(G)$. Moreover, this bound is almost tight: there is a class $\cal L$ of ($P_5$, $4$-wheel)-free graphs such that every graph $H\in \cal L$ satisfies $\chi(H)\geq\frac{10}{7}\omega(H)$. This generalizes and improves several previously known results in the literature.
We propose a simple and efficient clustering method for high-dimensional data with a large number of clusters. Our algorithm achieves high performance by evaluating the distances of data points to only a subset of the cluster centres. Our contribution is substantially more efficient than k-means, as it does not require an all-to-all comparison of data points and clusters. We show that the optimal solutions of our approximation are the same as those of the exact formulation. However, our approach is considerably more efficient at extracting these clusters than the state-of-the-art. We compare our approximation with exact k-means and alternative approximation approaches on a series of standardised clustering tasks. For the evaluation, we consider the algorithmic complexity, including the number of operations to convergence, and the stability of the results.
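As a hedged illustration of the general idea (comparing each point against only a subset of centres rather than all of them), here is a toy k-means variant; the abstract does not specify how the subset is chosen, so the random candidate set below is a placeholder for the paper's actual selection rule, and all names are ours.

```python
import numpy as np

def subset_kmeans(X, k, k_sub=8, iters=20, seed=0):
    """Toy k-means variant: each point is compared only with a random
    subset of centres plus its current centre, instead of all k centres,
    avoiding all-to-all point/centre comparisons.  The random candidate
    choice is a placeholder, not the paper's selection rule."""
    rng = np.random.default_rng(seed)
    n = X.shape[0]
    centres = X[rng.choice(n, size=k, replace=False)].copy()
    labels = rng.integers(k, size=n)
    for _ in range(iters):
        for i in range(n):
            cand = np.append(rng.choice(k, size=min(k_sub, k),
                                        replace=False), labels[i])
            d = np.linalg.norm(centres[cand] - X[i], axis=1)
            labels[i] = cand[np.argmin(d)]
        for j in range(k):                       # standard centre update
            pts = X[labels == j]
            if len(pts):
                centres[j] = pts.mean(axis=0)
    return labels, centres

X = np.random.randn(500, 32)
labels, centres = subset_kmeans(X, k=50)
```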
Huang proved that every set of more than half the vertices of the $d$-dimensional hypercube $Q_d$ induces a subgraph of maximum degree at least $\sqrt{d}$, which is tight by a result of Chung, F\"uredi, Graham, and Seymour. Huang asked whether similar results can be obtained for other highly symmetric graphs. First, we present three infinite families of Cayley graphs of unbounded degree that contain induced subgraphs of maximum degree $1$ on more than half the vertices. In particular, this refutes a conjecture of Potechin and Tsang, for which first counterexamples were shown recently by Lehner and Verret. The first family consists of dihedrants and contains a sporadic counterexample encountered earlier by Lehner and Verret. The second family are star graphs, these are edge-transitive Cayley graphs of the symmetric group. All members of the third family are $d$-regular containing an induced matching on a $\frac{d}{2d-1}$-fraction of the vertices. This is largest possible and answers a question of Lehner and Verret. Second, we consider Huang's lower bound for graphs with subcubes and show that the corresponding lower bound is tight for products of Coxeter groups of type $\mathbf{A_n}$, $\mathbf{I_2}(2k+1)$, and most exceptional cases. We believe that Coxeter groups are a suitable generalization of the hypercube with respect to Huang's question. Finally, we show that induced subgraphs on more than half the vertices of Levi graphs of projective planes and of the Ramanujan graphs of Lubotzky, Phillips, and Sarnak have unbounded degree. This gives classes of Cayley graphs with properties similar to the ones provided by Huang's results. However, in contrast to Coxeter groups these graphs have no subcubes.
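Huang's theorem is easy to verify exhaustively in tiny dimensions. The sketch below (ours, purely illustrative) brute-forces $d = 3$: every induced subgraph on $5 > 2^{d-1}$ vertices of $Q_3$ has maximum degree at least $\sqrt{3}$, hence at least $2$, and $2$ is attained.

```python
import itertools
import numpy as np

def hypercube_adj(d):
    """Adjacency matrix of Q_d on vertex set {0, ..., 2^d - 1},
    with edges between words at Hamming distance 1."""
    n = 1 << d
    A = np.zeros((n, n), dtype=int)
    for v in range(n):
        for i in range(d):
            A[v, v ^ (1 << i)] = 1
    return A

# Brute-force check of Huang's theorem for d = 3: every induced
# subgraph on 5 of the 8 vertices has maximum degree >= ceil(sqrt(3)).
d, A = 3, hypercube_adj(3)
worst = min(A[np.ix_(S, S)].sum(axis=1).max()
            for S in (list(c) for c in itertools.combinations(range(8), 5)))
print(worst, worst >= np.ceil(np.sqrt(d)))       # 2 True
```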
Let $Q_{n}^{r}$ be the graph with vertex set $\{-1,1\}^{n}$ in which two vertices are joined if their Hamming distance is at most $r$. The edge-isoperimetric problem for $Q_{n}^{r}$ is the following: for every $(n,r,M)$ such that $1\le r\le n$ and $1\le M\le2^{n}$, determine the minimum edge-boundary size of a subset of vertices of $Q_{n}^{r}$ of given size $M$. In this paper, we apply two different approaches to prove bounds for this problem. The first approach is a linear programming approach and the second is a probabilistic approach. Our bound derived by the first approach generalizes the tight bound for $M=2^{n-1}$ derived by Kahn, Kalai, and Linial in 1989. Moreover, our bound is also tight for $M=2^{n-2}$ and $r\le\frac{n}{2}-1$. Our bounds derived by the second approach are expressed in terms of the \emph{noise stability}; they are shown to be asymptotically tight as $n\to\infty$ when $r=2\lfloor\frac{\beta n}{2}\rfloor+1$ and $M=\lfloor\alpha2^{n}\rfloor$ for fixed $\alpha,\beta\in(0,1)$, and tight up to a factor of $2$ when $r=2\lfloor\frac{\beta n}{2}\rfloor$ and $M=\lfloor\alpha2^{n}\rfloor$. In fact, the edge-isoperimetric problem is equivalent to a ball-noise stability problem, which is a variant of the traditional (i.i.d.) noise stability problem. Our results can therefore be interpreted as bounds for the ball-noise stability problem.
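For intuition, the quantity being bounded can be computed exactly by brute force for tiny $n$; the sketch below (ours, not from the paper) enumerates all $M$-subsets of $Q_n^r$ and returns the minimum edge-boundary size. With $n=3$, $r=1$, $M=4$ the minimiser is a subcube face with boundary $4$.

```python
import itertools

def min_edge_boundary(n, r, M):
    """Brute-force the edge-isoperimetric problem for Q_n^r: the minimum
    number of edges leaving a vertex subset of size M, where vertices
    are binary words of length n and edges join pairs at Hamming
    distance <= r.  Only feasible for tiny n."""
    V = range(1 << n)
    adj = {v: [u for u in V if u != v and bin(u ^ v).count('1') <= r]
           for v in V}
    best = None
    for S in itertools.combinations(V, M):
        Sset = set(S)
        cut = sum(1 for v in S for u in adj[v] if u not in Sset)
        best = cut if best is None else min(best, cut)
    return best

# Half of the vertices of Q_3^1 (the ordinary cube): a face is optimal.
print(min_edge_boundary(3, 1, 4))                # 4
```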
K-core decomposition is a commonly used metric for analyzing graph structure and studying the relative importance of nodes in complex graphs. Recent years have seen rapid growth in the scale of graphs, especially in industrial settings. For example, our industrial partner runs popular social applications with billions of users and is able to gather a rich set of user data. As a result, applying K-core decomposition to large graphs has attracted increasing attention from both academia and industry. A simple but effective way to deal with large graphs is to process them in a distributed setting, and several distributed K-core decomposition algorithms have been proposed. Despite their effectiveness, we observe both experimentally and theoretically that these algorithms consume too many resources and become unstable on super-large-scale graphs, especially when the available resources are limited. In this paper, we address such super-large-scale graphs and propose a divide-and-conquer strategy on top of the distributed K-core decomposition algorithm. We evaluate our approach on three large graphs. The experimental results show that resource consumption can be significantly reduced and that computation on large-scale graphs becomes more stable than with existing methods. For example, with our divide-and-conquer technique, the distributed K-core decomposition algorithm can scale to a graph with 136 billion edges without losing correctness.
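For readers unfamiliar with the underlying primitive, here is a minimal single-machine sketch of K-core decomposition by repeated peeling (remove all vertices of degree less than $k$, for increasing $k$); this is the sequential baseline that distributed algorithms parallelise, not the paper's distributed method or its divide-and-conquer layer.

```python
from collections import defaultdict

def core_numbers(edges):
    """K-core decomposition by peeling: the core number of v is the
    largest k such that v survives removal of all vertices of degree
    < k.  Sequential baseline, not the paper's distributed algorithm."""
    adj = defaultdict(set)
    for u, v in edges:
        adj[u].add(v)
        adj[v].add(u)
    deg = {v: len(ns) for v, ns in adj.items()}
    core, remaining, k = {}, set(adj), 0
    while remaining:
        k += 1
        queue = [v for v in remaining if deg[v] < k]
        while queue:
            v = queue.pop()
            if v not in remaining:
                continue
            remaining.discard(v)
            core[v] = k - 1                      # peeled at level k
            for u in adj[v]:
                if u in remaining:
                    deg[u] -= 1
                    if deg[u] < k:
                        queue.append(u)
    return core

# Triangle plus a pendant vertex: the triangle forms the 2-core.
print(core_numbers([(1, 2), (2, 3), (1, 3), (3, 4)]))
# e.g. {4: 1, 3: 2, 1: 2, 2: 2}
```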
The Transformer is widely used in natural language processing tasks. To train a Transformer, however, one usually needs a carefully designed learning rate warm-up stage, which is shown to be crucial to the final performance but slows down optimization and requires more hyper-parameter tuning. In this paper, we first study theoretically why the learning rate warm-up stage is essential and show that the location of layer normalization matters. Specifically, we prove using mean field theory that at initialization, for the originally designed Post-LN Transformer, which places the layer normalization between the residual blocks, the expected gradients of the parameters near the output layer are large. Using a large learning rate on those gradients therefore makes the training unstable, and the warm-up stage is practically helpful for avoiding this problem. On the other hand, our theory also shows that if the layer normalization is put inside the residual blocks (the recently proposed Pre-LN Transformer), the gradients are well-behaved at initialization. This motivates us to remove the warm-up stage for the training of Pre-LN Transformers. We show in our experiments that Pre-LN Transformers without the warm-up stage can reach comparable results with baselines while requiring significantly less training time and hyper-parameter tuning on a wide range of applications.
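The two LayerNorm placements contrasted in the abstract are easy to see side by side. Below is a minimal PyTorch sketch of one Post-LN and one Pre-LN layer; dimensions, head counts, and the feed-forward width are illustrative choices of ours, not the paper's configuration.

```python
import torch.nn as nn

class PostLNBlock(nn.Module):
    """Original Transformer layer: LayerNorm applied after the
    residual addition ('between the residual blocks')."""
    def __init__(self, d, heads=4):
        super().__init__()
        self.attn = nn.MultiheadAttention(d, heads, batch_first=True)
        self.ffn = nn.Sequential(nn.Linear(d, 4 * d), nn.ReLU(),
                                 nn.Linear(4 * d, d))
        self.ln1, self.ln2 = nn.LayerNorm(d), nn.LayerNorm(d)

    def forward(self, x):
        x = self.ln1(x + self.attn(x, x, x)[0])  # residual, then LN
        return self.ln2(x + self.ffn(x))

class PreLNBlock(nn.Module):
    """Pre-LN variant: LayerNorm moved inside the residual branch,
    which the paper argues yields well-behaved gradients at init."""
    def __init__(self, d, heads=4):
        super().__init__()
        self.attn = nn.MultiheadAttention(d, heads, batch_first=True)
        self.ffn = nn.Sequential(nn.Linear(d, 4 * d), nn.ReLU(),
                                 nn.Linear(4 * d, d))
        self.ln1, self.ln2 = nn.LayerNorm(d), nn.LayerNorm(d)

    def forward(self, x):
        h = self.ln1(x)                          # LN, then sublayer
        x = x + self.attn(h, h, h)[0]
        return x + self.ffn(self.ln2(x))
```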
Graph Neural Networks (GNNs) come in many flavors, but should always be either invariant (permutation of the nodes of the input graph does not affect the output) or equivariant (permutation of the input permutes the output). In this paper, we consider a specific class of invariant and equivariant networks, for which we prove new universality theorems. More precisely, we consider networks with a single hidden layer, obtained by summing channels formed by applying an equivariant linear operator, a pointwise non-linearity, and either an invariant or an equivariant linear operator. Recently, Maron et al. (2019) showed that by allowing higher-order tensorization inside the network, universal invariant GNNs can be obtained. As a first contribution, we propose an alternative proof of this result, which relies on the Stone-Weierstrass theorem for algebras of real-valued functions. Our main contribution is then an extension of this result to the equivariant case, which appears in many practical applications but has been less studied from a theoretical point of view. The proof relies on a new generalized Stone-Weierstrass theorem for algebras of equivariant functions, which is of independent interest. Finally, unlike many previous settings that consider a fixed number of nodes, our results show that a GNN defined by a single set of parameters can approximate uniformly well a function defined on graphs of varying size.
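As a hedged sketch of the one-hidden-layer architecture described above (equivariant linear operator, pointwise non-linearity, invariant linear operator, summed over channels), the code below uses only a small illustrative subset of the permutation-equivariant linear maps (Maron et al. characterise the full basis) and verifies invariance numerically; all names are ours.

```python
import numpy as np

def equivariant_ops(A):
    """A few permutation-equivariant linear maps R^{n x n} -> R^n
    applied to an adjacency matrix A (an illustrative subset of the
    full basis of such maps)."""
    n = A.shape[0]
    return np.stack([A.sum(1),                   # row sums (degrees)
                     np.diag(A),                 # diagonal
                     np.full(n, A.sum()) / n])   # broadcast total sum

def invariant_net(A, W1, w2):
    """One hidden layer as in the abstract: equivariant linear operator
    -> pointwise non-linearity -> invariant linear operator (here a sum
    over nodes), summed over channels."""
    H = np.tanh(W1 @ equivariant_ops(A))         # channels x n, equivariant
    return float(w2 @ H.sum(axis=1))             # node sum: invariant

rng = np.random.default_rng(0)
A = rng.integers(0, 2, (5, 5)); A = np.triu(A, 1); A = A + A.T
W1, w2 = rng.normal(size=(8, 3)), rng.normal(size=8)
P = np.eye(5)[rng.permutation(5)]                # random permutation
print(np.isclose(invariant_net(A, W1, w2),
                 invariant_net(P @ A @ P.T, W1, w2)))   # True
```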
Knowledge bases (KBs), both automatically and manually constructed, are often incomplete: many valid facts can be inferred from the KB by synthesizing existing information. A popular approach to KB completion is to infer new relations by reasoning over the information found along other paths connecting a pair of entities. Given the enormous size of KBs and the exponential number of paths, previous path-based models have considered only the problem of predicting a missing relation given two entities, or of evaluating the truth of a proposed triple. Additionally, these methods have traditionally used random paths between fixed entity pairs or, more recently, have learned to pick paths between them. We propose a new algorithm, MINERVA, which addresses the much more difficult and practical task of answering questions where the relation is known but only one entity is given. Since random walks are impractical in a setting with combinatorially many destinations from a start node, we present a neural reinforcement learning approach that learns how to navigate the graph conditioned on the input query to find predictive paths. Empirically, this approach obtains state-of-the-art results on several datasets, significantly outperforming prior methods.
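To make the task setup concrete, here is a toy rendering (ours, not MINERVA's architecture): given a start entity and a query relation, an agent walks KB edges toward the answer entity. MINERVA trains a neural policy with reinforcement learning to choose each edge conditioned on the query; the uniform random policy below is only a placeholder for that learned policy, and the toy KB is invented.

```python
import random

# Toy KB of (head, relation, tail) triples -- invented for illustration.
KB = [("paris", "capital_of", "france"),
      ("france", "in_continent", "europe"),
      ("berlin", "capital_of", "germany"),
      ("germany", "in_continent", "europe")]

out_edges = {}
for h, r, t in KB:
    out_edges.setdefault(h, []).append((r, t))

def walk(start, query_relation, max_steps=3, seed=0):
    """Query-conditioned graph walk skeleton: a learned policy would
    score each outgoing edge given (start, query_relation, history);
    here a random choice stands in for that policy."""
    rng = random.Random(seed)
    node, path = start, []
    for _ in range(max_steps):
        choices = out_edges.get(node, [])
        if not choices:
            break
        rel, node = rng.choice(choices)
        path.append((rel, node))
    return path

# Query ("paris", "in_continent", ?): a good policy walks
# paris -> france -> europe.
print(walk("paris", "in_continent"))
```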
Random walks are at the heart of many existing network embedding methods. However, such algorithms have many limitations that arise from the use of random walks; e.g., the features resulting from these methods are unable to transfer to new nodes and graphs because they are tied to vertex identity. In this work, we introduce the Role2Vec framework, which uses the flexible notion of attributed random walks and serves as a basis for generalizing existing methods such as DeepWalk, node2vec, and many others that leverage random walks. Our proposed framework enables these methods to be more widely applicable, for both transductive and inductive learning, as well as for use on graphs with attributes (if available). This is achieved by learning functions that generalize to new nodes and graphs. We show that our proposed framework is effective, with an average AUC improvement of 16.55% while requiring on average 853x less space than existing methods on a variety of graphs.
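The key idea, walks that emit attribute-derived roles rather than vertex identities, can be sketched in a few lines. In this hedged illustration (ours, not the paper's implementation) the role map is simply the node's degree, a trivial stand-in for the framework's general attribute functions; the resulting role sequences would then be fed to a skip-gram style embedding model.

```python
import random

def attributed_walks(adj, node_to_role, num_walks=2, walk_len=4, seed=0):
    """Attributed random walks in the spirit of Role2Vec: each step
    emits the role of the visited node (a function of its attributes
    or structural features) instead of its identity, so learned
    features can transfer to unseen nodes and graphs."""
    rng = random.Random(seed)
    walks = []
    for v in adj:
        for _ in range(num_walks):
            walk, cur = [node_to_role[v]], v
            for _ in range(walk_len - 1):
                cur = rng.choice(adj[cur])
                walk.append(node_to_role[cur])
            walks.append(walk)
    return walks   # feed these role sequences to e.g. word2vec

adj = {0: [1, 2], 1: [0, 2], 2: [0, 1, 3], 3: [2]}
roles = {v: f"deg{len(ns)}" for v, ns in adj.items()}   # degree as role
print(attributed_walks(adj, roles)[0])
```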