We study the vulnerability of a uniformly distributed random graph to an attack by an adversary who aims for a global change of the distribution while being able to make only a local change in the graph. We call a graph property $A$ anti-stochastic if the probability that a random graph $G$ satisfies $A$ is small but, with high probability, there is a small perturbation transforming $G$ into a graph satisfying $A$. While for labeled graphs such properties are easy to obtain from binary covering codes, the existence of anti-stochastic properties for unlabeled graphs is not so evident. If an admissible perturbation is either the addition or the deletion of one edge, we exhibit an anti-stochastic property that is satisfied by a random unlabeled graph of order $n$ with probability $(2+o(1))/n^2$, which is as small as possible. We also express another anti-stochastic property in terms of the degree sequence of a graph. This property has probability $(2+o(1))/(n\ln n)$, which is optimal up to a factor of $2$.
Echo State Networks (ESNs) are a type of Recurrent Neural Network that yields promising results in representing time series and nonlinear dynamical systems. Although they are equipped with a very efficient training procedure, Reservoir Computing strategies such as the ESN require high-order networks, i.e., many neurons, resulting in a number of states that is orders of magnitude higher than the number of model inputs and outputs. A large number of states not only makes each time-step computation more costly but may also pose robustness issues, especially when applying ESNs to problems such as Model Predictive Control (MPC) and other optimal control problems. One way to circumvent this complexity issue is through Model Order Reduction strategies such as Proper Orthogonal Decomposition (POD) and its variants (e.g., POD-DEIM), whereby we find an equivalent lower-order representation of an already trained high-dimensional ESN. To this end, this work investigates and analyzes the performance of POD methods in Echo State Networks, evaluating their effectiveness through the Memory Capacity (MC) of the POD-reduced network compared to the original (full-order) ESN. We also perform experiments on two numerical case studies: a NARMA10 difference equation and an oil platform containing two wells and one riser. The results show that there is little loss of performance when comparing the original ESN to its POD-reduced counterpart, and that a POD-reduced ESN tends to outperform a normal ESN of the same size. Moreover, the POD-reduced network achieves speedups of around $80\%$ compared to the original ESN.
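The POD step described above can be sketched in a few lines. The following is a minimal illustration, not the paper's setup: the reservoir size, input signal, and the 99.9% energy threshold are arbitrary choices here. It collects reservoir state snapshots from a small ESN, computes a POD basis via SVD, and checks the standard identity that the squared projection error equals the discarded singular-value energy.

```python
import numpy as np

rng = np.random.default_rng(0)

# Small echo state network (hypothetical sizes; real applications use larger reservoirs).
N, T = 100, 500                                 # reservoir size, number of time steps
W = rng.normal(0, 1, (N, N))
W *= 0.9 / max(abs(np.linalg.eigvals(W)))       # scale spectral radius below 1
W_in = rng.normal(0, 1, (N, 1))
u = rng.uniform(-1, 1, T)                       # scalar input sequence

# Collect reservoir state snapshots.
X = np.zeros((N, T))
x = np.zeros(N)
for t in range(T):
    x = np.tanh(W @ x + W_in[:, 0] * u[t])
    X[:, t] = x

# POD: SVD of the snapshot matrix; keep enough modes for 99.9% of the energy.
U, s, _ = np.linalg.svd(X, full_matrices=False)
energy = np.cumsum(s**2) / np.sum(s**2)
r = int(np.searchsorted(energy, 0.999)) + 1
Ur = U[:, :r]                                   # reduced basis, N x r

# Squared projection error onto the POD subspace (equals the discarded energy).
err2 = np.linalg.norm(X - Ur @ (Ur.T @ X))**2
```

With the basis `Ur` in hand, a reduced state $z = U_r^\top x$ evolves as $z(t+1) = U_r^\top \tanh(W U_r z(t) + W_{\mathrm{in}} u(t))$, which is the lower-order representation the abstract refers to.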
Double-descent curves in neural networks describe the phenomenon that the generalisation error initially descends with an increasing number of parameters, then grows after reaching an optimal number of parameters (which is less than the number of data points), but then descends again in the overparameterised regime. In this paper, we use techniques from random matrix theory to characterize the spectral distribution of the empirical feature covariance matrix as a width-dependent perturbation of the spectrum of the neural network Gaussian process (NNGP) kernel, thus establishing a novel connection between the NNGP literature and the random matrix theory literature in the context of neural networks. Our analytical expression allows us to study the generalisation behaviour of the corresponding kernel and GP regression, and provides a new interpretation of the double-descent phenomenon, namely as governed by the discrepancy between the width-dependent empirical kernel and the width-independent NNGP kernel.
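The gap between the empirical kernel and the NNGP kernel can be seen numerically in the simplest case. The sketch below, an assumption-laden illustration rather than the paper's construction, compares one entry of the empirical feature covariance of a random one-hidden-layer ReLU network with its width-independent NNGP limit, the first-order arc-cosine kernel of Cho and Saul.

```python
import numpy as np

rng = np.random.default_rng(0)

def nngp_relu_kernel(x, y):
    """Width-independent NNGP kernel of a one-hidden-layer ReLU network
    (first-order arc-cosine kernel, unit-variance Gaussian weights)."""
    nx, ny = np.linalg.norm(x), np.linalg.norm(y)
    theta = np.arccos(np.clip(x @ y / (nx * ny), -1.0, 1.0))
    return nx * ny * (np.sin(theta) + (np.pi - theta) * np.cos(theta)) / (2 * np.pi)

def empirical_kernel(x, y, width, rng):
    """Entry of the empirical feature covariance for a random ReLU layer of given width."""
    W = rng.normal(0, 1, (width, x.size))
    fx = np.maximum(W @ x, 0.0) / np.sqrt(width)
    fy = np.maximum(W @ y, 0.0) / np.sqrt(width)
    return fx @ fy

x = np.array([1.0, 0.0, 0.0])
y = np.array([0.6, 0.8, 0.0])
k_inf = nngp_relu_kernel(x, y)                       # NNGP limit
k_hat = empirical_kernel(x, y, width=100_000, rng=rng)  # finite-width perturbation of it
```

At finite width, `k_hat` fluctuates around `k_inf` with deviations that shrink as the width grows, which is the width-dependent discrepancy the abstract identifies as driving double descent.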
We study the problem of decentralized power allocation in a multi-access channel (MAC) with non-cooperative users, additive noise of arbitrary distribution and a generalized power constraint, i.e., the transmit power constraint is modeled by an upper bound on $\mathbb{E}[\phi(|S|)]$, where $S$ is the transmit signal and $\phi(.)$ is some non-negative, increasing and bounded function. The generalized power constraint captures the notion of power for different wireless signals such as RF, optical, acoustic, etc. We derive the optimal power allocation policy when there is a large number of non-cooperative users in the MAC. Further, we show that, once the number of users in the MAC crosses a finite threshold, the proposed power allocation policy of all users is optimal and remains invariant irrespective of the actual number of users. We derive the above results under the condition that the entropy power of the MAC, $e^{2h(S)+c}$, is strictly convex, where $h(S)$ is the maximum achievable entropy of the transmit signal and $c$ is a finite constant corresponding to the entropy of the additive noise.
We give the first agnostic, efficient, proper learning algorithm for monotone Boolean functions. Given $2^{\tilde{O}(\sqrt{n}/\varepsilon)}$ uniformly random examples of an unknown function $f:\{\pm 1\}^n \rightarrow \{\pm 1\}$, our algorithm outputs a hypothesis $g:\{\pm 1\}^n \rightarrow \{\pm 1\}$ that is monotone and $(\mathrm{opt} + \varepsilon)$-close to $f$, where $\mathrm{opt}$ is the distance from $f$ to the closest monotone function. The running time of the algorithm (and consequently the size and evaluation time of the hypothesis) is also $2^{\tilde{O}(\sqrt{n}/\varepsilon)}$, nearly matching the lower bound of Blais et al. (RANDOM '15). We also give an algorithm for estimating, up to additive error $\varepsilon$, the distance of an unknown function $f$ to monotone, using a run-time of $2^{\tilde{O}(\sqrt{n}/\varepsilon)}$. Previously, for both of these problems, sample-efficient algorithms were known, but these algorithms were not run-time efficient. Our work thus closes this gap between the run-time and sample complexity of these problems. This work builds upon the improper learning algorithm of Bshouty and Tamon (JACM '96) and the proper semiagnostic learning algorithm of Lange, Rubinfeld, and Vasilyan (FOCS '22), which obtains a non-monotone Boolean-valued hypothesis, then ``corrects'' it to monotone using query-efficient local computation algorithms on graphs. This black-box correction approach can achieve no error better than $2\mathrm{opt} + \varepsilon$ information-theoretically; we bypass this barrier by a) augmenting the improper learner with a convex optimization step, and b) learning and correcting a real-valued function before rounding its values to Boolean. Our real-valued correction algorithm solves the ``poset sorting'' problem of [LRV22] for functions over general posets with non-Boolean labels.
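To make the quantity $\mathrm{opt}$ concrete, one can compute the distance to monotonicity exactly by brute force in tiny dimensions. The sketch below (pure enumeration for $n=3$, nothing like the paper's algorithm) lists all monotone Boolean functions on the hypercube and measures the Hamming distance of a given truth table to the closest one.

```python
from itertools import product

n = 3
points = list(product([0, 1], repeat=n))        # the hypercube {0,1}^3

def leq(x, y):                                  # coordinatewise partial order
    return all(a <= b for a, b in zip(x, y))

def is_monotone(table):
    return all(table[i] <= table[j]
               for i, x in enumerate(points)
               for j, y in enumerate(points) if leq(x, y))

# All 2^8 Boolean functions on 3 bits, represented as truth tables over `points`.
all_fns = list(product([0, 1], repeat=len(points)))
monotone_fns = [t for t in all_fns if is_monotone(t)]

def dist_to_monotone(table):
    """Number of inputs on which `table` must change to become monotone (8 * opt)."""
    return min(sum(a != b for a, b in zip(table, g)) for g in monotone_fns)

# Example: the anti-monotone dictator f(x) = NOT x_1 must change on half the inputs.
f = tuple(1 - x[0] for x in points)
```

The number of monotone functions found is the Dedekind number $M(3)=20$; the exponential growth of this count with $n$ is one way to see why efficient proper monotone learning is nontrivial.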
The Horvitz-Thompson (HT), the Rao-Hartley-Cochran (RHC) and the generalized regression (GREG) estimators of the finite population mean are considered, when the observations are from an infinite dimensional space. We compare these estimators based on their asymptotic distributions under some commonly used sampling designs and some superpopulations satisfying linear regression models. We show that the GREG estimator is asymptotically at least as efficient as either of the other two estimators under the different sampling designs considered in this paper. Further, we show that the use of some well known sampling designs utilizing auxiliary information may have an adverse effect on the performance of the GREG estimator when the degree of heteroscedasticity present in the linear regression model is not very large. On the other hand, the use of those sampling designs improves the performance of this estimator when the degree of heteroscedasticity is large. We develop methods for determining the degree of heteroscedasticity, which in turn determines the choice of the appropriate sampling design to be used with the GREG estimator. We also investigate the consistency of the covariance operators of the above estimators. We carry out numerical studies using real and synthetic data, and the results obtained from them support our theoretical findings.
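As a minimal reminder of the first estimator named above, the sketch below demonstrates the design-unbiasedness of the Horvitz-Thompson mean estimator under simple random sampling without replacement, using a hypothetical scalar population (the paper's observations are infinite-dimensional, but the HT form is identical).

```python
from itertools import combinations

# Hypothetical small population, for illustration only.
y = [3.0, 7.0, 1.0, 9.0, 4.0, 6.0]
N, n = len(y), 3
pi = n / N                       # first-order inclusion probability under SRSWOR

def ht_mean(sample):
    """Horvitz-Thompson estimator of the population mean: (1/N) * sum_{i in s} y_i / pi_i."""
    return sum(y[i] / pi for i in sample) / N

# Average the estimator over every possible SRSWOR sample: design-unbiasedness.
estimates = [ht_mean(s) for s in combinations(range(N), n)]
expected_value = sum(estimates) / len(estimates)
true_mean = sum(y) / N
```

Under unequal inclusion probabilities $\pi_i$ the same formula $(1/N)\sum_{i\in s} y_i/\pi_i$ applies, which is where designs using auxiliary information enter the comparison in the abstract.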
Several well known estimators of the finite population mean and its functions are investigated under some standard sampling designs. Such functions of the mean include the variance, the correlation coefficient and the regression coefficient in the population as special cases. We compare the performance of these estimators under different sampling designs based on their asymptotic distributions. Equivalence classes of estimators under different sampling designs are constructed so that estimators in the same class have equivalent performance in terms of asymptotic mean squared errors (MSEs). Estimators in different equivalence classes are then compared under some superpopulations satisfying linear models. It is shown that the pseudo empirical likelihood (PEML) estimator of the population mean under simple random sampling without replacement (SRSWOR) has the lowest asymptotic MSE among all the estimators under the different sampling designs considered in this paper. It is also shown that for the variance, the correlation coefficient and the regression coefficient of the population, the plug-in estimators based on the PEML estimator have the lowest asymptotic MSEs among all the estimators considered in this paper under SRSWOR. On the other hand, for any high entropy $\pi$PS (HE$\pi$PS) sampling design, which uses the auxiliary information, the plug-in estimators of those parameters based on the H\'ajek estimator have the lowest asymptotic MSEs among all the estimators considered in this paper.
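The plug-in construction mentioned above is mechanical: estimate the mean with a design-weighted estimator, then substitute it into the target function. The following sketch shows the H\'ajek mean estimator and a H\'ajek-based plug-in variance for a hypothetical unequal-probability sample (the numbers and probabilities are invented for illustration).

```python
def hajek_mean(ys, pis):
    """Hajek estimator of the population mean: inverse-probability-weighted mean,
    normalized by the estimated population size sum(1/pi_i)."""
    num = sum(y / p for y, p in zip(ys, pis))
    den = sum(1 / p for p in pis)
    return num / den

def hajek_variance(ys, pis):
    """Plug-in estimator of the population variance built from Hajek-type weighted moments."""
    m = hajek_mean(ys, pis)
    den = sum(1 / p for p in pis)
    return sum((y - m) ** 2 / p for y, p in zip(ys, pis)) / den

ys = [2.0, 4.0, 6.0]
pis = [0.5, 0.25, 0.25]          # hypothetical first-order inclusion probabilities
m = hajek_mean(ys, pis)
```

Plug-in estimators of the correlation or regression coefficient follow the same pattern, replacing the squared deviation with the relevant weighted cross-moments.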
This paper deals with Niho functions, one of the most important classes of functions thanks to their close connections with a wide variety of objects, from mathematics, such as spreads and oval polynomials, to applied areas, such as symmetric cryptography, coding theory and sequences. Specifically, we investigate the $c$-differential uniformity of the power function $F(x)=x^{s(2^m-1)+1}$ over the finite field $\mathbb{F}_{2^n}$, where $n=2m$, $m$ is odd and $s=(2^k+1)^{-1}$ is the multiplicative inverse of $2^k+1$ modulo $2^m+1$. By carrying out some subtle manipulations of certain equations over $\mathbb{F}_{2^n}$, we show that the $c$-differential uniformity of $F(x)$ is $2^{\gcd(k,m)}+1$. Notably, $F(x)$ has a very low $c$-differential uniformity, equal to $3$, when $k$ and $m$ are coprime.
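The quantity being bounded can be computed by brute force in small fields. The sketch below is not the paper's function $F$ or its field sizes; it implements the generic $c$-differential count over the tiny field $\mathbb{F}_{2^3}$ and sanity-checks it on the inverse function, whose classical ($c=1$) differential uniformity over $\mathbb{F}_{2^n}$ with $n$ odd is well known to be $2$.

```python
# Arithmetic in GF(2^3) with the primitive polynomial x^3 + x + 1 (0b1011).
MOD, DEG = 0b1011, 3

def gf_mul(a, b):
    """Carry-less multiplication with polynomial reduction."""
    r = 0
    while b:
        if b & 1:
            r ^= a
        a <<= 1
        if a & (1 << DEG):
            a ^= MOD
        b >>= 1
    return r

def c_differential_uniformity(F, c):
    """max over (a, b) of #{x : F(x + a) - c*F(x) = b}; in characteristic 2,
    addition and subtraction are XOR. For c = 1 (the classical case) a = 0 is excluded."""
    q = 1 << DEG
    best = 0
    for a in range(q):
        if c == 1 and a == 0:
            continue
        counts = [0] * q
        for x in range(q):
            counts[F(x ^ a) ^ gf_mul(c, F(x))] += 1
        best = max(best, max(counts))
    return best

def inv(x):
    """The inverse power function x -> x^(2^3 - 2) = x^6, with 0 -> 0."""
    if x == 0:
        return 0
    r = x
    for _ in range(5):
        r = gf_mul(r, x)
    return r
```

Replacing `inv` with another power map and looping over $c \neq 1$ gives the full $c$-differential spectrum; verifying the paper's $2^{\gcd(k,m)}+1$ claim would require the larger fields $\mathbb{F}_{2^{2m}}$ it studies.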
Traditionally, different types of feature operators (e.g., convolution, self-attention and involution) utilize different approaches to extract and aggregate features, and little resemblance can be discovered in their mathematical formulas. However, these three operators all serve the same paramount purpose and bear no difference in essence. Hence we probe into the essence of various feature operators from a high-level perspective, transform their components equivalently, and explore their mathematical expressions in higher dimensions. We propose a clear and concrete unified formula for these feature operators, termed Evolution. Evolution uses an Evolution Function to generate an Evolution Kernel, which extracts and aggregates the features at certain positions of the input feature map. We mathematically deduce the equivalent transformation from the traditional formulas of these feature operators to Evolution and prove the unification. In addition, we discuss the forms of Evolution Functions and the properties of the generated Evolution Kernels, intending to inspire further research on and innovation of powerful feature operators.
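The unifying idea can be illustrated with a toy 1D version (my own sketch, not the paper's exact formalism): at each output position a kernel-generating function produces the kernel applied to the local window, and different choices of that function recover different operators.

```python
import numpy as np

def evolution_1d(x, kernel_fn, k):
    """Toy unified operator: at each position p, a kernel-generating function
    ('evolution function') produces a length-k kernel applied to the window of x."""
    T = len(x) - k + 1
    out = np.empty(T)
    for p in range(T):
        out[p] = kernel_fn(p) @ x[p:p + k]
    return out

rng = np.random.default_rng(0)
x = rng.normal(size=20)
w = rng.normal(size=3)

# Convolution: the generating function ignores the position (one shared kernel).
conv_out = evolution_1d(x, lambda p: w, 3)

# Involution-style: the kernel is generated from the local input itself.
inv_out = evolution_1d(x, lambda p: np.tanh(x[p:p + 3]), 3)

# Reference: direct cross-correlation with the shared kernel.
ref = np.array([w @ x[p:p + 3] for p in range(len(x) - 2)])
```

Self-attention fits the same template with a kernel generated from query-key similarities over the window, which is the sense in which one formula covers all three operators.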
Graph Convolutional Networks (GCNs) have been widely applied in various fields due to their significant power in processing graph-structured data. Typical GCNs and their variants work under a homophily assumption (i.e., nodes with the same class are prone to connect to each other), while ignoring the heterophily that exists in many real-world networks (i.e., nodes with different classes tend to form edges). Existing methods deal with heterophily mainly by aggregating higher-order neighborhoods or combining intermediate representations, which introduces noise and irrelevant information into the result. Moreover, these methods do not change the propagation mechanism itself, which works under the homophily assumption and is a fundamental part of GCNs. This makes it difficult to distinguish the representations of nodes from different classes. To address this problem, in this paper we design a novel propagation mechanism that can automatically change the propagation and aggregation process according to the homophily or heterophily between node pairs. To adaptively learn the propagation process, we introduce two measurements of the homophily degree between node pairs, learned from topological and attribute information, respectively. We then incorporate the learnable homophily degree into the graph convolution framework, which is trained in an end-to-end scheme, enabling it to go beyond the assumption of homophily. More importantly, we theoretically prove that our model can constrain the similarity of representations between nodes according to their homophily degree. Experiments on seven real-world datasets demonstrate that this new approach outperforms the state-of-the-art methods under heterophily or low homophily, and achieves competitive performance under homophily.
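The idea of weighting propagation by a per-edge homophily degree can be sketched as follows. This is a simplified stand-in, not the paper's model: cosine similarity of raw features replaces the learned attribute-based homophily score, and the topology-based score and end-to-end training are omitted.

```python
import numpy as np

def homophily_propagate(A, H):
    """Aggregate neighbor features with per-edge weights given by an
    attribute-based homophily degree (cosine similarity as a stand-in)."""
    norms = np.linalg.norm(H, axis=1, keepdims=True)
    Hn = H / np.clip(norms, 1e-12, None)
    S = Hn @ Hn.T                    # pairwise homophily degrees in [-1, 1]
    W = A * S                        # keep only existing edges, weighted by homophily
    deg = np.clip(np.abs(W).sum(axis=1, keepdims=True), 1e-12, None)
    return (W / deg) @ H             # normalized, homophily-aware aggregation

A = np.array([[0, 1, 1],
              [1, 0, 0],
              [1, 0, 0]], dtype=float)
H = np.array([[1.0, 0.0],
              [1.0, 0.0],            # same features as node 0: homophilous edge
              [0.0, 1.0]])           # orthogonal features: heterophilous edge
out = homophily_propagate(A, H)
```

Node 0 has one homophilous and one heterophilous neighbor; the weighting suppresses the heterophilous message, which is the mechanism the abstract's propagation scheme makes learnable.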
Residual networks (ResNets) have displayed impressive results in pattern recognition and, recently, have garnered considerable theoretical interest due to a perceived link with neural ordinary differential equations (neural ODEs). This link relies on the convergence of network weights to a smooth function as the number of layers increases. We investigate the properties of weights trained by stochastic gradient descent and their scaling with network depth through detailed numerical experiments. We observe the existence of scaling regimes markedly different from those assumed in the neural ODE literature. Depending on certain features of the network architecture, such as the smoothness of the activation function, one may obtain an alternative ODE limit, a stochastic differential equation, or neither of these. These findings cast doubt on the validity of the neural ODE model as an adequate asymptotic description of deep ResNets and point to an alternative class of differential equations as a better description of the deep network limit.
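The neural ODE picture being questioned can be stated concretely: a ResNet with $1/L$-scaled residual branches is an explicit Euler discretization of an ODE, so its output should stabilize as depth grows. The sketch below (untrained random weights, one shared layer function; a deliberately idealized setting, unlike the trained networks studied in the paper) checks this convergence numerically.

```python
import numpy as np

rng = np.random.default_rng(0)
d = 5
W = rng.normal(0, 1, (d, d)) / np.sqrt(d)
x0 = rng.normal(size=d)

def resnet_forward(L):
    """Depth-L residual network x_{l+1} = x_l + (1/L) * tanh(W x_l),
    i.e., L Euler steps of the ODE dx/dt = tanh(W x) on [0, 1]."""
    x = x0.copy()
    for _ in range(L):
        x = x + np.tanh(W @ x) / L
    return x

ref = resnet_forward(6400)                       # fine discretization as ODE proxy
err = {L: np.linalg.norm(resnet_forward(L) - ref) for L in (100, 200, 400)}
```

Here the output converges at the O(1/L) Euler rate; the paper's point is that trained weights need not follow this idealized scaling, so the continuum limit can differ or fail to exist.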