For $d\in\mathbb{N}$, let $S$ be a set of points in $\mathbb{R}^d$ in general position. A set $I$ of $k$ points from $S$ is a $k$-island in $S$ if the convex hull $\mathrm{conv}(I)$ of $I$ satisfies $\mathrm{conv}(I) \cap S = I$. A $k$-island in $S$ in convex position is a $k$-hole in $S$. For $d,k\in\mathbb{N}$ and a convex body $K\subseteq\mathbb{R}^d$ of volume $1$, let $S$ be a set of $n$ points chosen uniformly and independently at random from $K$. We show that the expected number of $k$-holes in $S$ is in $O(n^d)$. Our estimate improves and generalizes all previous bounds. In particular, we estimate the expected number of empty simplices in $S$ by $2^{d-1}\cdot d!\cdot\binom{n}{d}$. This is tight in the plane up to a lower-order term. Our method gives an asymptotically tight upper bound $O(n^d)$ even in the much more general setting, where we estimate the expected number of $k$-islands in $S$.
By defining two important terms called basic perturbation vectors and obtaining their linear bounds, we obtain the linear componentwise perturbation bounds for unitary factors and upper triangular factors of the generalized Schur decomposition. The perturbation bounds for the diagonal elements of the upper triangular factors and the generalized invariant subspace are also derived. From the former, we present an upper bound and a condition number of the generalized eigenvalue. Furthermore, with numerical iterative method, the nonlinear componentwise perturbation bounds of the generalized Schur decomposition are also provided. Numerical examples are given to test the obtained bounds. Among them, we compare our upper bound and condition number of the generalized eigenvalue with their counterparts given in the literature. Numerical results show that they are very close to each other but our results don't contain the information on the left and right generalized eigenvectors.
Graph Convolutional Networks (GCNs) are one of the most popular architectures that are used to solve classification problems accompanied by graphical information. We present a rigorous theoretical understanding of the effects of graph convolutions in multi-layer networks. We study these effects through the node classification problem of a non-linearly separable Gaussian mixture model coupled with a stochastic block model. First, we show that a single graph convolution expands the regime of the distance between the means where multi-layer networks can classify the data by a factor of at least $1/\sqrt[4]{\mathbb{E}{\rm deg}}$, where $\mathbb{E}{\rm deg}$ denotes the expected degree of a node. Second, we show that with a slightly stronger graph density, two graph convolutions improve this factor to at least $1/\sqrt[4]{n}$, where $n$ is the number of nodes in the graph. Finally, we provide both theoretical and empirical insights into the performance of graph convolutions placed in different combinations among the layers of a network, concluding that the performance is mutually similar for all combinations of the placement. We present extensive experiments on both synthetic and real-world data that illustrate our results.
Given a set $P$ of $n$ points in the plane, we consider the problem of computing the number of points of $P$ in a query unit disk (i.e., all query disks have the same radius). We show that the main techniques for simplex range searching in the plane can be adapted to this problem. For example, by adapting Matou\v{s}ek's results, we can build a data structure of $O(n)$ space so that each query can be answered in $O(\sqrt{n})$ time. Our techniques lead to improvements for several other classical problems, such as batched range searching, counting/reporting intersecting pairs of unit circles, distance selection, discrete 2-center, etc. For example, given a set of $n$ unit disks and a set of $n$ points in the plane, the batched range searching problem is to compute for each disk the number of points in it. Previous work [Katz and Sharir, 1997] solved the problem in $O(n^{4/3}\log n)$ time while our new algorithm runs in $O(n^{4/3})$ time.
Machine learning and computational intelligence technologies gain more and more popularity as possible solution for issues related to the power grid. One of these issues, the power flow calculation, is an iterative method to compute the voltage magnitudes of the power grid's buses from power values. Machine learning and, especially, artificial neural networks were successfully used as surrogates for the power flow calculation. Artificial neural networks highly rely on the quality and size of the training data, but this aspect of the process is apparently often neglected in the works we found. However, since the availability of high quality historical data for power grids is limited, we propose the Correlation Sampling algorithm. We show that this approach is able to cover a larger area of the sampling space compared to different random sampling algorithms from the literature and a copula-based approach, while at the same time inter-dependencies of the inputs are taken into account, which, from the other algorithms, only the copula-based approach does.
Covariance estimation for matrix-valued data has received an increasing interest in applications. Unlike previous works that rely heavily on matrix normal distribution assumption and the requirement of fixed matrix size, we propose a class of distribution-free regularized covariance estimation methods for high-dimensional matrix data under a separability condition and a bandable covariance structure. Under these conditions, the original covariance matrix is decomposed into a Kronecker product of two bandable small covariance matrices representing the variability over row and column directions. We formulate a unified framework for estimating bandable covariance, and introduce an efficient algorithm based on rank one unconstrained Kronecker product approximation. The convergence rates of the proposed estimators are established, and the derived minimax lower bound shows our proposed estimator is rate-optimal under certain divergence regimes of matrix size. We further introduce a class of robust covariance estimators and provide theoretical guarantees to deal with heavy-tailed data. We demonstrate the superior finite-sample performance of our methods using simulations and real applications from a gridded temperature anomalies dataset and a S&P 500 stock data analysis.
We study the distributed minimum spanning tree (MST) problem, a fundamental problem in distributed computing. It is well-known that distributed MST can be solved in $\tilde{O}(D+\sqrt{n})$ rounds in the standard CONGEST model (where $n$ is the network size and $D$ is the network diameter) and this is essentially the best possible round complexity (up to logarithmic factors). However, in resource-constrained networks such as ad hoc wireless and sensor networks, nodes spending so much time can lead to significant spending of resources such as energy. Motivated by the above consideration, we study distributed algorithms for MST under the \emph{sleeping model} [Chatterjee et al., PODC 2020], a model for design and analysis of resource-efficient distributed algorithms. In the sleeping model, a node can be in one of two modes in any round -- \emph{sleeping} or \emph{awake} (unlike the traditional model where nodes are always awake). Only the rounds in which a node is \emph{awake} are counted, while \emph{sleeping} rounds are ignored. A node spends resources only in the awake rounds and hence the main goal is to minimize the \emph{awake complexity} of a distributed algorithm, the worst-case number of rounds any node is awake. We present deterministic and randomized distributed MST algorithms that have an \emph{optimal} awake complexity of $O(\log n)$ time with a matching lower bound. We also show that our randomized awake-optimal algorithm has essentially the best possible round complexity by presenting a lower bound of $\tilde{\Omega}(n)$ on the product of the awake and round complexity of any distributed algorithm (including randomized) that outputs an MST, where $\tilde{\Omega}$ hides a $1/(\text{polylog } n)$ factor.
As the next-generation wireless networks thrive, full-duplex and relaying techniques are combined to improve the network performance. Random linear network coding (RLNC) is another popular technique to enhance the efficiency and reliability in wireless communications. In this paper, in order to explore the potential of RLNC in full-duplex relay networks, we investigate two fundamental perfect RLNC schemes and theoretically analyze their completion delay performance. The first scheme is a straightforward application of conventional perfect RLNC studied in wireless broadcast, so it involves no additional process at the relay. Its performance serves as an upper bound among all perfect RLNC schemes. The other scheme allows sufficiently large buffer and unconstrained linear coding at the relay. It attains the optimal performance and serves as a lower bound among all RLNC schemes. For both schemes, closed-form formulae to characterize the expected completion delay at a single receiver as well as for the whole system are derived. Numerical results are also demonstrated to justify the theoretical characterizations, and compare the two new schemes with the existing one.
Let ${\mathcal M}\subset {\mathbb R}^n$ be a $C^2$-smooth compact submanifold of dimension $d$. Assume that the volume of ${\mathcal M}$ is at most $V$ and the reach (i.e. the normal injectivity radius) of ${\mathcal M}$ is greater than $\tau$. Moreover, let $\mu$ be a probability measure on ${\mathcal M}$ whose density on ${\mathcal M}$ is a strictly positive Lipschitz-smooth function. Let $x_j\in {\mathcal M}$, $j=1,2,\dots,N$ be $N$ independent random samples from distribution $\mu$. Also, let $\xi_j$, $j=1,2,\dots, N$ be independent random samples from a Gaussian random variable in ${\mathbb R}^n$ having covariance $\sigma^2I$, where $\sigma$ is less than a certain specified function of $d, V$ and $\tau$. We assume that we are given the data points $y_j=x_j+\xi_j,$ $j=1,2,\dots,N$, modelling random points of ${\mathcal M}$ with measurement noise. We develop an algorithm which produces from these data, with high probability, a $d$ dimensional submanifold ${\mathcal M}_o\subset {\mathbb R}^n$ whose Hausdorff distance to ${\mathcal M}$ is less than $Cd\sigma^2/\tau$ and whose reach is greater than $c{\tau}/d^6$ with universal constants $C,c > 0$. The number $N$ of random samples required depends almost linearly on $n$, polynomially on $\sigma^{-1}$ and exponentially on $d$.
Binding operation is fundamental to many cognitive processes, such as cognitive map formation, relational reasoning, and language comprehension. In these processes, two different modalities, such as location and objects, events and their contextual cues, and words and their roles, need to be bound together, but little is known about the underlying neural mechanisms. Previous works introduced a binding model based on quadratic functions of bound pairs, followed by vector summation of multiple pairs. Based on this framework, we address following questions: Which classes of quadratic matrices are optimal for decoding relational structures? And what is the resultant accuracy? We introduce a new class of binding matrices based on a matrix representation of octonion algebra, an eight-dimensional extension of complex numbers. We show that these matrices enable a more accurate unbinding than previously known methods when a small number of pairs are present. Moreover, numerical optimization of a binding operator converges to this octonion binding. We also show that when there are a large number of bound pairs, however, a random quadratic binding performs as well as the octonion and previously-proposed binding methods. This study thus provides new insight into potential neural mechanisms of binding operations in the brain.
Neural networks have shown tremendous growth in recent years to solve numerous problems. Various types of neural networks have been introduced to deal with different types of problems. However, the main goal of any neural network is to transform the non-linearly separable input data into more linearly separable abstract features using a hierarchy of layers. These layers are combinations of linear and nonlinear functions. The most popular and common non-linearity layers are activation functions (AFs), such as Logistic Sigmoid, Tanh, ReLU, ELU, Swish and Mish. In this paper, a comprehensive overview and survey is presented for AFs in neural networks for deep learning. Different classes of AFs such as Logistic Sigmoid and Tanh based, ReLU based, ELU based, and Learning based are covered. Several characteristics of AFs such as output range, monotonicity, and smoothness are also pointed out. A performance comparison is also performed among 18 state-of-the-art AFs with different networks on different types of data. The insights of AFs are presented to benefit the researchers for doing further research and practitioners to select among different choices. The code used for experimental comparison is released at: \url{//github.com/shivram1987/ActivationFunctions}.