Community detection and orthogonal group synchronization are both fundamental problems with a variety of important applications in science and engineering. In this work, we consider the joint problem of community detection and orthogonal group synchronization which aims to recover the communities and perform synchronization simultaneously. To this end, we propose a simple algorithm that consists of a spectral decomposition step followed by a blockwise column pivoted QR factorization (CPQR). The proposed algorithm is efficient and scales linearly with the number of edges in the graph. We also leverage the recently developed `leave-one-out' technique to establish a near-optimal guarantee for exact recovery of the cluster memberships and stable recovery of the orthogonal transforms. Numerical experiments demonstrate the efficiency and efficacy of our algorithm and confirm our theoretical characterization of it.
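The two-step pipeline described above can be sketched in numpy on a toy stochastic block model, omitting the synchronization component. All parameter values are chosen here for illustration, and the greedy `cpqr_pivots` helper is a stand-in for a library column-pivoted QR, not the paper's implementation:

```python
import numpy as np

# Toy stochastic block model with k planted communities.
rng = np.random.default_rng(0)
k, n_per = 3, 40
n = k * n_per
labels = np.repeat(np.arange(k), n_per)
P = np.where(labels[:, None] == labels[None, :], 0.7, 0.02)
A = (rng.random((n, n)) < P).astype(float)
A = np.triu(A, 1)
A = A + A.T                              # symmetric adjacency matrix

# Spectral step: top-k eigenvectors of the adjacency matrix.
_, vecs = np.linalg.eigh(A)
V = vecs[:, -k:]                         # n x k basis of the leading subspace

def cpqr_pivots(M, k):
    # Greedy column-pivoted QR (Businger-Golub style): repeatedly select the
    # column with largest residual norm, then deflate its direction.
    M = M.copy()
    piv = []
    for _ in range(k):
        j = int(np.argmax(np.linalg.norm(M, axis=0)))
        piv.append(j)
        q = M[:, j] / np.linalg.norm(M[:, j])
        M = M - np.outer(q, q @ M)
    return np.array(piv)

# CPQR step: pivoting on V^T selects k well-separated "anchor" nodes,
# roughly one per community.
anchors = cpqr_pivots(V.T, k)

# Express every row of V in the anchor basis; assign by largest coefficient.
Z = np.abs(np.linalg.solve(V[anchors].T, V.T))   # k x n
pred = np.argmax(Z, axis=0)
```

With well-separated communities, each anchor row approximates one cluster center, so the argmax assignment recovers the planted partition up to a label permutation.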
Cross-correlation analysis is a powerful tool for understanding the mutual dynamics of time series. This study introduces a new method for predicting the future state of synchronization of the dynamics of two financial time series. To this end, we use cross-recurrence plot analysis as a nonlinear method for quantifying the multidimensional coupling of two time series in the time domain and for determining their state of synchronization. We adopt a deep learning framework to predict the synchronization state from features extracted from dynamically sub-sampled cross-recurrence plots. We provide extensive experiments on several stocks, major constituents of the S\&P100 index, to empirically validate our approach. We find that the task of predicting the state of synchronization of two time series is in general rather difficult, but for certain pairs of stocks it is attainable with very satisfactory performance.
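A cross-recurrence plot of this kind can be computed in a few lines of numpy. This is a generic textbook construction, not the paper's pipeline; the embedding dimension, delay, and threshold `eps` are illustrative defaults:

```python
import numpy as np

def embed(s, dim=3, delay=1):
    # Time-delay embedding of a scalar series into dim-dimensional states.
    n = len(s) - (dim - 1) * delay
    return np.array([s[i:i + (dim - 1) * delay + 1:delay] for i in range(n)])

def cross_recurrence_plot(x, y, dim=3, delay=1, eps=0.5):
    # CRP[i, j] = 1 iff the embedded states x_i and y_j are within eps.
    X, Y = embed(x, dim, delay), embed(y, dim, delay)
    D = np.linalg.norm(X[:, None, :] - Y[None, :, :], axis=2)
    return (D <= eps).astype(int)

t = np.linspace(0, 8 * np.pi, 300)
crp = cross_recurrence_plot(np.sin(t), np.sin(t + 0.1))
```

Diagonal line structures in `crp` indicate epochs where the two trajectories evolve in step, which is the raw material for the synchronization features described above.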
Synchronous local stochastic gradient descent (local SGD) waits for all workers to complete the same number of local updates, so slow and straggling workers introduce random delays and leave faster workers idle. In this paper, to mitigate stragglers and improve communication efficiency, a novel local SGD strategy named STSyn is developed. The key idea is to wait only for the $K$ fastest workers in each synchronization round, while keeping all workers computing continuously and making full use of every effective (completed) local update of each worker, regardless of stragglers. An analysis of the average wall-clock time, average number of local updates, and average number of uploading workers per round is provided to gauge the performance of STSyn. The convergence of STSyn is also rigorously established even when the objective function is nonconvex. Experimental results show the superiority of the proposed STSyn over state-of-the-art schemes, owing to its straggler tolerance and its use of additional effective local updates at each worker; the influence of the system parameters is also studied. By waiting for faster workers and allowing heterogeneous synchronization with different numbers of local updates across workers, STSyn provides substantial improvements in both time and communication efficiency.
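One round of this wait-for-the-$K$-fastest policy can be simulated directly. The sketch below is only a timing model, not the learning algorithm: per-update compute times are drawn as exponentials to mimic stragglers, and the worker count, $K$, and update target $U$ are arbitrary illustrative values:

```python
import numpy as np

rng = np.random.default_rng(1)
M, K, U = 10, 7, 5   # workers, fastest-K threshold, target local updates

def stsyn_round():
    # Cumulative per-update completion times; exponential delays model stragglers.
    t = rng.exponential(1.0, size=(M, 4 * U)).cumsum(axis=1)
    finish_U = t[:, U - 1]            # when each worker completes U updates
    T = np.sort(finish_U)[K - 1]      # round ends: K-th fastest worker hits U updates
    # Every worker keeps computing until T; all completed updates are used,
    # including partial progress by stragglers.
    updates = (t <= T).sum(axis=1)
    return T, updates

T, updates = stsyn_round()
```

By construction at least $K$ workers log $U$ or more updates per round, while stragglers still contribute whatever updates they completed before time `T`.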
Tailor-made for massive connectivity and sporadic access, grant-free random access has become a promising candidate access protocol for massive machine-type communications (mMTC). Compared with conventional grant-based protocols, grant-free random access skips the exchange of scheduling information to reduce the signaling overhead, and facilitates sharing of access resources to enhance access efficiency. However, some challenges remain to be addressed in the receiver design, such as the unknown identities of active users and multi-user interference (MUI) on shared access resources. In this work, we deal with the problem of joint user activity and data detection for grant-free random access. Specifically, the approximate message passing (AMP) algorithm is first employed to mitigate MUI and decouple the signals of different users. Then, we extend the data symbol alphabet to incorporate the null symbols from inactive users. In this way, the joint user activity and data detection problem is formulated as a clustering problem under the Gaussian mixture model. Furthermore, in conjunction with the AMP algorithm, a variational Bayesian inference based clustering (VBIC) algorithm is developed to solve this clustering problem. Simulation results show that, compared with state-of-the-art solutions, the proposed AMP-combined VBIC (AMP-VBIC) algorithm achieves a significant performance gain in detection accuracy.
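The extended-alphabet idea can be illustrated with a much simpler decision rule than the paper's VBIC: assume AMP has already decoupled the received signal into scalar observations, and make a MAP decision over the alphabet extended with a null symbol for inactive users. The BPSK alphabet, activity prior, and noise level below are all assumptions for illustration:

```python
import numpy as np

rng = np.random.default_rng(2)
alphabet = np.array([0.0, -1.0, 1.0])   # null symbol (inactive) + BPSK symbols
prior = np.array([0.6, 0.2, 0.2])       # sparse-activity prior: most users inactive
sigma = 0.3

# Decoupled scalar observations, as an AMP front end would ideally produce:
# true symbol plus i.i.d. Gaussian noise.
true = rng.choice(alphabet, p=prior, size=200)
y = true + rng.normal(0.0, sigma, size=200)

# MAP decision over the extended alphabet: Gaussian likelihood plus log-prior.
logp = -(y[:, None] - alphabet[None, :]) ** 2 / (2 * sigma**2) + np.log(prior)
est = alphabet[np.argmax(logp, axis=1)]
active = est != 0   # joint decision: nonzero symbol implies an active user
```

Declaring a user active exactly when its detected symbol is nonzero is what makes activity detection and data detection a single clustering-style decision.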
We study recovery of amplitudes and nodes of a finite impulse train from a limited number of equispaced noisy frequency samples. This problem is known as super-resolution (SR) under sparsity constraints and has numerous applications, including direction of arrival and finite rate of innovation sampling. Prony's method is an algebraic technique which fully recovers the signal parameters in the absence of measurement noise. In the presence of noise, Prony's method may experience significant loss of accuracy, especially when the separation between Dirac pulses is smaller than the Nyquist-Shannon-Rayleigh (NSR) limit. In this work we combine Prony's method with a recently established decimation technique for analyzing the SR problem in the regime where the distance between two or more pulses is much smaller than the NSR limit. We show that our approach attains optimal asymptotic stability in the presence of noise. Our result challenges the conventional belief that Prony-type methods tend to be highly numerically unstable.
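Prony's noiseless recovery step is short enough to sketch in full: the nodes are the roots of a linear-recurrence (Prony) polynomial fitted from a Hankel system, and the amplitudes follow from a Vandermonde least-squares fit. The two nodes and amplitudes below are arbitrary test values; the decimation analysis of the abstract is not reproduced here:

```python
import numpy as np

# Noiseless samples f[k] = sum_j a_j z_j^k of a 2-term exponential sum.
z_true = np.exp(2j * np.pi * np.array([0.10, 0.33]))
a_true = np.array([1.0, 0.7])
N, n = 8, 2
k = np.arange(N)
f = (a_true[None, :] * z_true[None, :] ** k[:, None]).sum(axis=1)

# Prony polynomial x^n + c[n-1] x^(n-1) + ... + c[0], from the linear
# recurrence f[i+n] + c[n-1] f[i+n-1] + ... + c[0] f[i] = 0.
H = np.array([[f[i + j] for j in range(n)] for i in range(N - n)])
c, *_ = np.linalg.lstsq(H, -f[n:], rcond=None)
z_est = np.roots(np.concatenate(([1.0], c[::-1])))   # roots are the nodes

# Amplitudes from a Vandermonde least-squares fit at the recovered nodes.
V = z_est[None, :] ** k[:, None]
a_est, *_ = np.linalg.lstsq(V, f, rcond=None)
```

In the absence of noise this recovers the parameters exactly, as the abstract states; the interesting regime is how these steps degrade when noise is added and the nodes cluster below the NSR limit.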
The polar orthogonal Grassmann code $C(\mathbb{O}_{3,6})$ is the linear code associated to the Grassmann embedding of the dual polar space of $Q^+(5,q)$. In this manuscript we study the minimum distance of this code. We prove that the minimum distance of the polar orthogonal Grassmann code $C(\mathbb{O}_{3,6})$ is $q^3-q^2$ for $q$ odd and $q^3$ for $q$ even. Our technique is based on partitioning the orthogonal space into different sets such that on each part the code $C(\mathbb{O}_{3,6})$ is identified with evaluations of determinants of skew-symmetric matrices. Our bounds come from elementary algebraic methods counting the zeroes of particular classes of polynomials. We expect our techniques may be applied to other polar Grassmann codes.
We build a general framework which establishes a one-to-one correspondence between the species abundance distribution (SAD) and the species accumulation curve (SAC). The appearance rates of the species and the appearance times of individuals of each species are modeled as Poisson processes. The number of species can be finite or infinite. Hill numbers are extended to the framework. We introduce a linear derivative ratio family of models, $\mathrm{LDR}_1$, for which the ratio of the first and second derivatives of the expected SAC is a linear function. A D1/D2 plot is proposed to detect this linear pattern in the data. By extrapolation of the curve in the D1/D2 plot, a species richness estimator that extends the Chao1 estimator is introduced. The SAD of $\mathrm{LDR}_1$ is Engen's extended negative binomial distribution, and the SAC encompasses several popular parametric forms including the power law. The family $\mathrm{LDR}_1$ is extended in two ways: $\mathrm{LDR}_2$, which allows species with zero detection probability, and $\mathrm{RDR}_1$, where the derivative ratio is a rational function. Real data are analyzed to demonstrate the proposed methods. We also consider the scenario where we record only a few leading appearance times of each species. We show how maximum likelihood inference can be performed when only the empirical SAC is observed, and elucidate its advantages over the traditional curve-fitting method.
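The defining $\mathrm{LDR}_1$ property is easy to verify numerically for one standard member of the family, the Poisson-Gamma (negative binomial SAD) model. The parametric form and the values of `C`, `b`, `r` below are illustrative choices, not taken from the paper's data analysis:

```python
import numpy as np

# Expected SAC of a Poisson-Gamma model: S(t) = C * (1 - (1 + b t)^(-r)).
C, b, r = 100.0, 0.5, 0.7
t = np.linspace(0.1, 10.0, 200)

d1 = C * r * b * (1 + b * t) ** (-r - 1)                 # S'(t)
d2 = -C * r * (r + 1) * b**2 * (1 + b * t) ** (-r - 2)   # S''(t)

ratio = d1 / d2                        # the D1/D2 curve
line = -(1 + b * t) / ((r + 1) * b)    # exactly linear in t
```

Since the derivative ratio collapses to $-(1+bt)/((r+1)b)$, the slope and intercept of a fitted line in the D1/D2 plot identify $b$ and $r$, which is what makes the extrapolation-based richness estimator possible.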
The existing discrete variational derivative method is only second-order accurate and fully implicit. In this paper, we propose a framework to construct an arbitrarily high-order implicit (original) energy stable scheme and a second-order semi-implicit (modified) energy stable scheme. Combined with the Runge--Kutta process, we can build arbitrarily high-order and unconditionally (original) energy stable schemes based on the discrete variational derivative method. The new energy stable scheme is implicit and leads to a large sparse nonlinear algebraic system at each time step, which can be efficiently solved by using an inexact Newton-type algorithm. To avoid solving nonlinear algebraic systems, we then present a relaxed discrete variational derivative method, which can construct second-order, linear, and unconditionally (modified) energy stable schemes. Several numerical simulations are performed to investigate the efficiency, stability, and accuracy of the newly proposed schemes.
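The core discrete variational derivative mechanism can be shown on the simplest possible example, a scalar gradient flow with a double-well energy. This is a generic second-order discrete gradient scheme, not one of the paper's new high-order constructions; replacing $E'(u)$ by the difference quotient $(E(u_{n+1})-E(u_n))/(u_{n+1}-u_n)$ makes the energy decay exact for any step size:

```python
import numpy as np

# Gradient flow u' = -E'(u) with double-well energy E(u) = (u^2 - 1)^2 / 4.
def E(u):
    return (u**2 - 1) ** 2 / 4

def dvd_step(u0, dt):
    # Discrete variational derivative (exact polynomial division):
    #   (E(u1) - E(u0)) / (u1 - u0) = (u1 + u0) * (u1^2 + u0^2 - 2) / 4.
    # Scheme: (u1 - u0)/dt = -(u1 + u0)(u1^2 + u0^2 - 2)/4, a cubic in u1.
    coeffs = [dt / 4,
              dt * u0 / 4,
              dt * (u0**2 - 2) / 4 + 1,
              dt * (u0**3 - 2 * u0) / 4 - u0]
    roots = np.roots(coeffs)
    real = roots[np.abs(roots.imag) < 1e-7].real
    return float(real[np.argmin(np.abs(real - u0))])  # real root nearest u0

u, dt = 2.0, 0.5
energies = [E(u)]
for _ in range(20):
    u = dvd_step(u, dt)
    energies.append(E(u))
```

The identity $E(u_{n+1})-E(u_n) = -\,(u_{n+1}-u_n)^2/\Delta t \le 0$ follows directly from the scheme, so the energies are non-increasing even for this large time step, which is the unconditional energy stability the abstract refers to.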
One of the most important features of financial time series data is volatility. There are often structural changes in volatility over time, and an accurate estimation of the volatility of financial time series requires careful identification of change-points. A common approach to modeling the volatility of time series data is the well-known GARCH model. Although the problem of change-point estimation of volatility dynamics derived from the GARCH model has been considered in the literature, these approaches rely on parametric assumptions on the conditional error distribution, which are often violated in financial time series. This may lead to inaccuracies in change-point detection, resulting in unreliable GARCH volatility estimates. This paper introduces a novel change-point detection algorithm based on a semiparametric GARCH model. The proposed method retains the structural advantages of the GARCH process while incorporating the flexibility of a nonparametric conditional error distribution. The approach utilizes a penalized likelihood derived from a semiparametric GARCH model and an efficient binary segmentation algorithm. The results show that in terms of change-point estimation and detection accuracy, the semiparametric method outperforms the commonly used Quasi-MLE (QMLE) and other variations of GARCH models in wide-ranging scenarios.
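The binary segmentation machinery itself is generic and can be sketched with a simple CUSUM-type statistic on squared returns in place of the paper's penalized semiparametric likelihood. The threshold and minimum segment length are illustrative tuning choices:

```python
import numpy as np

def cusum(x):
    # CUSUM-type statistic on squared returns (a crude variance proxy).
    n = len(x)
    s = np.cumsum(x**2)
    k = np.arange(1, n)
    return np.sqrt(n) * np.abs(s[:-1] / s[-1] - k / n)

def binary_segmentation(x, thresh=5.0, min_len=30, offset=0, out=None):
    # Recursively split at the maximizer of the statistic while it exceeds
    # the threshold; offset tracks positions in the original series.
    if out is None:
        out = []
    if len(x) < 2 * min_len:
        return out
    stat = cusum(x)
    j = int(np.argmax(stat))
    if stat[j] < thresh:
        return out
    cp = j + 1
    out.append(offset + cp)
    binary_segmentation(x[:cp], thresh, min_len, offset, out)
    binary_segmentation(x[cp:], thresh, min_len, offset + cp, out)
    return sorted(out)

# Synthetic returns with a single volatility regime change at t = 200.
rng = np.random.default_rng(3)
x = np.concatenate([rng.normal(0, 1.0, 200), rng.normal(0, 3.0, 200)])
cps = binary_segmentation(x)
```

The semiparametric method in the abstract replaces `cusum` with a penalized-likelihood criterion, but the divide-and-conquer segmentation loop is the same.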
This paper focuses on two fundamental tasks of graph analysis: community detection and node representation learning, which capture the global and local structures of graphs, respectively. In the current literature, these two tasks are usually studied independently, while they are actually highly correlated. We propose a probabilistic generative model called vGraph to learn community membership and node representation collaboratively. Specifically, we assume that each node can be represented as a mixture of communities, and each community is defined as a multinomial distribution over nodes. Both the mixing coefficients and the community distribution are parameterized by the low-dimensional representations of the nodes and communities. We design an effective variational inference algorithm which regularizes the community memberships of neighboring nodes to be similar in the latent space. Experimental results on multiple real-world graphs show that vGraph is very effective in both community detection and node representation learning, outperforming many competitive baselines in both tasks. We show that the framework of vGraph is quite flexible and can be easily extended to detect hierarchical communities.
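The generative story described above (node → community → neighbor, with both distributions parameterized by embeddings) can be sketched as follows. The softmax parameterization matches the description in the abstract, but the embedding dimensions and all parameter values here are hypothetical, and the variational inference step is omitted:

```python
import numpy as np

rng = np.random.default_rng(4)
n, k, d = 8, 3, 5   # nodes, communities, embedding dimension

def softmax(v):
    e = np.exp(v - v.max())
    return e / e.sum()

phi = rng.normal(size=(n, d))   # node embeddings (random stand-ins)
psi = rng.normal(size=(k, d))   # community embeddings (random stand-ins)

# p(z | w): each node is a mixture of communities.
p_z_w = np.array([softmax(phi[w] @ psi.T) for w in range(n)])
# p(c | z): each community is a multinomial distribution over nodes.
p_c_z = np.array([softmax(phi @ psi[z]) for z in range(k)])

# Generate one edge (w, c): pick a community for w, then a neighbor from it.
w = int(rng.integers(n))
z = int(rng.choice(k, p=p_z_w[w]))
c = int(rng.choice(n, p=p_c_z[z]))
```

Inference inverts this process: the variational posterior over `z` for each observed edge yields soft community memberships, while the embeddings `phi` and `psi` double as node representations.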
Spectral clustering is a leading and popular technique in unsupervised data analysis. Two of its major limitations are scalability and generalization of the spectral embedding (i.e., out-of-sample extension). In this paper we introduce a deep learning approach to spectral clustering that overcomes the above shortcomings. Our network, which we call SpectralNet, learns a map that embeds input data points into the eigenspace of their associated graph Laplacian matrix and subsequently clusters them. We train SpectralNet using a procedure that involves constrained stochastic optimization. Stochastic optimization allows it to scale to large datasets, while the constraints, which are implemented using a special-purpose output layer, allow us to keep the network output orthogonal. Moreover, the map learned by SpectralNet naturally generalizes the spectral embedding to unseen data points. To further improve the quality of the clustering, we replace the standard pairwise Gaussian affinities with affinities learned from unlabeled data using a Siamese network. Additional improvement can be achieved by applying the network to code representations produced, e.g., by standard autoencoders. Our end-to-end learning procedure is fully unsupervised. In addition, we apply VC dimension theory to derive a lower bound on the size of SpectralNet. State-of-the-art clustering results are reported on the Reuters dataset. Our implementation is publicly available at https://github.com/kstant0725/SpectralNet .
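The orthogonality constraint mentioned above can be enforced with a Cholesky-based linear output layer: whiten the batch outputs so their empirical Gram matrix is the identity. The numpy sketch below shows only this algebraic step on random stand-in activations, outside any training loop:

```python
import numpy as np

rng = np.random.default_rng(5)
m, k = 128, 4
Y = rng.normal(size=(m, k))      # raw batch outputs of the embedding network

# Cholesky-based orthogonalization layer: with Y^T Y / m = L L^T,
# the transformed outputs Y L^{-T} have identity batch Gram matrix.
L = np.linalg.cholesky(Y.T @ Y / m)
Y_orth = Y @ np.linalg.inv(L).T

G = Y_orth.T @ Y_orth / m        # should equal I_k
```

Because the transform is a differentiable linear map of the batch, it can sit as the last layer of the network during stochastic training, keeping the learned embedding orthonormal in expectation the way eigenvectors of the graph Laplacian are.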