In this paper we study the asymptotic theory for spectral analysis of stationary random fields, including linear and nonlinear fields. Asymptotic properties of Fourier coefficients and periodograms, including limiting distributions of Fourier coefficients, and the uniform consistency of kernel spectral density estimators are obtained under various mild conditions on moments and dependence structures. The validity of the aforementioned asymptotic results for estimated spatial fields is also established.
Time series forecasting is an important problem across many domains, playing a crucial role in multiple real-world applications. In this paper, we propose a forecasting architecture that combines deep autoregressive models with a Spectral Attention (SA) module, which merges global and local frequency domain information in the model's embedded space. By characterizing in the spectral domain the embedding of the time series as occurrences of a random process, our method can identify global trends and seasonality patterns. Two spectral attention models, global and local to the time series, integrate this information within the forecast and perform spectral filtering to remove time series's noise. The proposed architecture has a number of useful properties: it can be effectively incorporated into well-know forecast architectures, requiring a low number of parameters and producing interpretable results that improve forecasting accuracy. We test the Spectral Attention Autoregressive Model (SAAM) on several well-know forecast datasets, consistently demonstrating that our model compares favorably to state-of-the-art approaches.
Inferential models (IMs) are data-dependent, probability-like structures designed to quantify uncertainty about unknowns. As the name suggests, the focus has been on uncertainty quantification for inference, and on establishing a validity property that ensures the IM is reliable in a specific sense. The present paper develops an IM framework for decision problems and, in particular, investigates the decision-theoretic implications of the aforementioned validity property. I show that a valid IM's assessment of an action's quality, defined by a Choquet integral, will not be too optimistic compared to that of an oracle. This ensures that a valid IM tends not to favor actions that the oracle doesn't also favor, hence a valid IM is reliable for decision-making too. In a certain special class of structured statistical models, further connections can be made between the valid IM's favored actions and those favored by other more familiar frameworks, from which certain optimality conclusions can be drawn. An important step in these decision-theoretic developments is a characterization of the valid IM's credal set in terms of confidence distributions, which may be of independent interest.
This paper analyzes how the distortion created by hardware impairments in a multiple-antenna base station affects the uplink spectral efficiency (SE), with focus on Massive MIMO. This distortion is correlated across the antennas, but has been often approximated as uncorrelated to facilitate (tractable) SE analysis. To determine when this approximation is accurate, basic properties of distortion correlation are first uncovered. Then, we separately analyze the distortion correlation caused by third-order non-linearities and by quantization. Finally, we study the SE numerically and show that the distortion correlation can be safely neglected in Massive MIMO when there are sufficiently many users. Under i.i.d. Rayleigh fading and equal signal-to-noise ratios (SNRs), this occurs for more than five transmitting users. Other channel models and SNR variations have only minor impact on the accuracy. We also demonstrate the importance of taking the distortion characteristics into account in the receive combining.
Matrix valued data has become increasingly prevalent in many applications. Most of the existing clustering methods for this type of data are tailored to the mean model and do not account for the dependence structure of the features, which can be very informative, especially in high-dimensional settings. To extract the information from the dependence structure for clustering, we propose a new latent variable model for the features arranged in matrix form, with some unknown membership matrices representing the clusters for the rows and columns. Under this model, we further propose a class of hierarchical clustering algorithms using the difference of a weighted covariance matrix as the dissimilarity measure. Theoretically, we show that under mild conditions, our algorithm attains clustering consistency in the high-dimensional setting. While this consistency result holds for our algorithm with a broad class of weighted covariance matrices, the conditions for this result depend on the choice of the weight. To investigate how the weight affects the theoretical performance of our algorithm, we establish the minimax lower bound for clustering under our latent variable model. Given these results, we identify the optimal weight in the sense that using this weight guarantees our algorithm to be minimax rate-optimal in terms of the magnitude of some cluster separation metric. The practical implementation of our algorithm with the optimal weight is also discussed. Finally, we conduct simulation studies to evaluate the finite sample performance of our algorithm and apply the method to a genomic dataset.
We propose a new wavelet-based method for density estimation when the data are size-biased. More specifically, we consider the power of the density of interest, where this power is some value greater than or equal to half. Warped wavelet bases are employed, where warping is attained by some continuous cumulative distribution function. This can be seen as a general framework for which the conventional orthonormal wavelet estimation is the case with the standard uniform c.d.f. We show that both linear and nonlinear wavelet estimators are consistent, with optimal and/or near-optimal rates. Monte Carlo simulations are performed to compare four special set-ups which are easy to interpret in practice. A real dataset application illustrates the method. We observe that warped bases provide more flexible and better estimates for both simulated and real data. Moreover, we can see that estimating the density power (for instance, its square root) further improves results.
When and why can a neural network be successfully trained? This article provides an overview of optimization algorithms and theory for training neural networks. First, we discuss the issue of gradient explosion/vanishing and the more general issue of undesirable spectrum, and then discuss practical solutions including careful initialization and normalization methods. Second, we review generic optimization methods used in training neural networks, such as SGD, adaptive gradient methods and distributed methods, and theoretical results for these algorithms. Third, we review existing research on the global issues of neural network training, including results on bad local minima, mode connectivity, lottery ticket hypothesis and infinite-width analysis.
We propose a Bayesian convolutional neural network built upon Bayes by Backprop and elaborate how this known method can serve as the fundamental construct of our novel, reliable variational inference method for convolutional neural networks. First, we show how Bayes by Backprop can be applied to convolutional layers where weights in filters have probability distributions instead of point-estimates; and second, how our proposed framework leads with various network architectures to performances comparable to convolutional neural networks with point-estimates weights. In the past, Bayes by Backprop has been successfully utilised in feedforward and recurrent neural networks, but not in convolutional ones. This work symbolises the extension of the group of Bayesian neural networks which encompasses all three aforementioned types of network architectures now.
Many problems on signal processing reduce to nonparametric function estimation. We propose a new methodology, piecewise convex fitting (PCF), and give a two-stage adaptive estimate. In the first stage, the number and location of the change points is estimated using strong smoothing. In the second stage, a constrained smoothing spline fit is performed with the smoothing level chosen to minimize the MSE. The imposed constraint is that a single change point occurs in a region about each empirical change point of the first-stage estimate. This constraint is equivalent to requiring that the third derivative of the second-stage estimate has a single sign in a small neighborhood about each first-stage change point. We sketch how PCF may be applied to signal recovery, instantaneous frequency estimation, surface reconstruction, image segmentation, spectral estimation and multivariate adaptive regression.
High spectral dimensionality and the shortage of annotations make hyperspectral image (HSI) classification a challenging problem. Recent studies suggest that convolutional neural networks can learn discriminative spatial features, which play a paramount role in HSI interpretation. However, most of these methods ignore the distinctive spectral-spatial characteristic of hyperspectral data. In addition, a large amount of unlabeled data remains an unexploited gold mine for efficient data use. Therefore, we proposed an integration of generative adversarial networks (GANs) and probabilistic graphical models for HSI classification. Specifically, we used a spectral-spatial generator and a discriminator to identify land cover categories of hyperspectral cubes. Moreover, to take advantage of a large amount of unlabeled data, we adopted a conditional random field to refine the preliminary classification results generated by GANs. Experimental results obtained using two commonly studied datasets demonstrate that the proposed framework achieved encouraging classification accuracy using a small number of data for training.
Spectral clustering is a leading and popular technique in unsupervised data analysis. Two of its major limitations are scalability and generalization of the spectral embedding (i.e., out-of-sample-extension). In this paper we introduce a deep learning approach to spectral clustering that overcomes the above shortcomings. Our network, which we call SpectralNet, learns a map that embeds input data points into the eigenspace of their associated graph Laplacian matrix and subsequently clusters them. We train SpectralNet using a procedure that involves constrained stochastic optimization. Stochastic optimization allows it to scale to large datasets, while the constraints, which are implemented using a special-purpose output layer, allow us to keep the network output orthogonal. Moreover, the map learned by SpectralNet naturally generalizes the spectral embedding to unseen data points. To further improve the quality of the clustering, we replace the standard pairwise Gaussian affinities with affinities leaned from unlabeled data using a Siamese network. Additional improvement can be achieved by applying the network to code representations produced, e.g., by standard autoencoders. Our end-to-end learning procedure is fully unsupervised. In addition, we apply VC dimension theory to derive a lower bound on the size of SpectralNet. State-of-the-art clustering results are reported on the Reuters dataset. Our implementation is publicly available at //github.com/kstant0725/SpectralNet .