Consider $k$ independent random samples from $p$-dimensional multivariate normal distributions. We are interested in the limiting distribution of the log-likelihood ratio test statistics for testing the equality of the $k$ covariance matrices. It is well known from classical multivariate statistics that the limit is a chi-square distribution when $k$ and $p$ are fixed integers. Jiang and Yang~\cite{JY13} and Jiang and Qi~\cite{JQ15} obtained the central limit theorem for the log-likelihood ratio test statistics when the dimension $p$ goes to infinity with the sample sizes. In this paper, we derive the central limit theorem when either $p$ or $k$ goes to infinity. We also propose adjusted test statistics which can be well approximated by chi-square distributions regardless of the values of $p$ and $k$. Furthermore, we present numerical simulation results to evaluate the performance of our adjusted test statistics and of the log-likelihood ratio statistics based on the classical chi-square approximation and the normal approximation.
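To make the classical baseline concrete, here is a minimal sketch (our illustration, not the paper's adjusted statistics) of the log-likelihood ratio statistic for $H_0:\Sigma_1=\cdots=\Sigma_k$ with Gaussian samples, together with its classical chi-square approximation on $p(p+1)(k-1)/2$ degrees of freedom; the function name `lrt_equal_covariances` is ours.

```python
import numpy as np
from scipy import stats

def lrt_equal_covariances(samples):
    """Classical LRT of H0: Sigma_1 = ... = Sigma_k for Gaussian samples.

    `samples` is a list of (n_i, p) arrays. Returns -2 log Lambda and the
    p-value from the chi-square approximation with p(p+1)(k-1)/2 degrees
    of freedom (valid for fixed p and k as the sample sizes grow).
    """
    k = len(samples)
    p = samples[0].shape[1]
    ns = np.array([x.shape[0] for x in samples])
    n = ns.sum()
    covs = [np.cov(x, rowvar=False, bias=True) for x in samples]  # MLEs
    pooled = sum(ni * S for ni, S in zip(ns, covs)) / n
    stat = n * np.linalg.slogdet(pooled)[1] - sum(
        ni * np.linalg.slogdet(S)[1] for ni, S in zip(ns, covs))
    df = p * (p + 1) * (k - 1) / 2
    return stat, stats.chi2.sf(stat, df)

rng = np.random.default_rng(0)
samples = [rng.standard_normal((200, 5)) for _ in range(3)]  # H0 holds
stat, pval = lrt_equal_covariances(samples)
print(f"-2 log Lambda = {stat:.2f}, p-value = {pval:.3f}")
```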
This paper presents a novel strategy to decentralize the soft detection procedure in an uplink cell-free massive multiple-input multiple-output network. We propose efficient approaches to compute the a posteriori probability per bit, exactly or approximately, when the fronthaul is sequential. More precisely, each access point (AP) in the network computes partial sufficient statistics locally, fuses them with the partial statistics received from the previous AP, and then forwards the result to the next AP. Once the sufficient statistics reach the central processing unit, it performs soft demodulation by computing the log-likelihood ratio (LLR) per bit, and a channel decoding algorithm (e.g., a Turbo decoder) is then used to decode the bits. We derive the distributed computation of the LLRs analytically.
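The following toy sketch (our construction; the single-user QPSK model and the bit mapping are assumptions for illustration) shows why a sequential fronthaul suffices: each AP only adds its local pair $(h_\ell^* y_\ell, |h_\ell|^2)$ to the running sums, and the CPU recovers the exact per-bit LLRs from the fused statistics alone.

```python
import numpy as np
from scipy.special import logsumexp

rng = np.random.default_rng(1)
L, noise_var = 4, 0.5
qpsk = np.array([1+1j, 1-1j, -1+1j, -1-1j]) / np.sqrt(2)
bit_of = lambda s: (int(s.real < 0), int(s.imag < 0))  # illustrative mapping

s_true = qpsk[0]
z, g = 0.0 + 0.0j, 0.0              # fused statistics: sum h^* y, sum |h|^2
for _ in range(L):                  # AP -> AP -> ... -> CPU
    h = (rng.standard_normal() + 1j * rng.standard_normal()) / np.sqrt(2)
    n = np.sqrt(noise_var / 2) * (rng.standard_normal()
                                  + 1j * rng.standard_normal())
    y = h * s_true + n
    z += np.conj(h) * y             # fuse local statistic into running sum
    g += np.abs(h) ** 2

# CPU: log p(y_1..y_L | s) = const + (2 Re(z^* s) - g |s|^2) / noise_var,
# so the fused pair (z, g) is sufficient for every per-bit LLR.
metric = np.array([(2 * np.real(np.conj(z) * s) - g * np.abs(s) ** 2)
                   / noise_var for s in qpsk])
for b in range(2):
    on = np.array([bit_of(s)[b] == 0 for s in qpsk])
    llr = logsumexp(metric[on]) - logsumexp(metric[~on])
    print(f"bit {b}: LLR = {llr:+.2f}")
```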
The asymptotic behaviour of Linear Spectral Statistics (LSS) of the smoothed periodogram estimator of the spectral coherency matrix of a complex Gaussian high-dimensional time series $(\y_n)_{n \in \mathbb{Z}}$ with independent components is studied in the asymptotic regime where the sample size $N$ converges towards $+\infty$ while the dimension $M$ of $\y$ and the smoothing span of the estimator grow to infinity at the same rate, in such a way that $\frac{M}{N} \rightarrow 0$. It is established that, at each frequency, the estimated spectral coherency matrix is close to the sample covariance matrix of an independent identically distributed $\mathcal{N}_{\mathbb{C}}(0,\I_M)$ sequence, and that its empirical eigenvalue distribution converges towards the Marcenko-Pastur distribution. This allows us to conclude that each LSS has a deterministic behaviour that can be evaluated explicitly. Using concentration inequalities, it is shown that the supremum over the frequencies of the deviation of each LSS from its deterministic approximation is of the order of $\frac{1}{M} + \frac{\sqrt{M}}{N}+ \left(\frac{M}{N}\right)^{3}$. Numerical simulations support our results.
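As a numerical illustration of this regime (our sketch, with $f(x)=\log x$ as the test function and parameter values of our choosing), the mean of the log-eigenvalues of the estimated coherency matrix at one frequency can be compared with its closed-form Marcenko-Pastur limit for ratio $c = M/B$, where $B$ is the smoothing span.

```python
import numpy as np

rng = np.random.default_rng(2)
M, N, B = 50, 10_000, 500          # dimension, sample size, smoothing span
y = (rng.standard_normal((M, N))
     + 1j * rng.standard_normal((M, N))) / np.sqrt(2)

Y = np.fft.fft(y, axis=1) / np.sqrt(N)           # normalized DFT along time
idx = (N // 4 + np.arange(-B // 2, B // 2)) % N  # window around one frequency
S_hat = Y[:, idx] @ Y[:, idx].conj().T / B       # smoothed periodogram
d = np.sqrt(np.real(np.diag(S_hat)))
C_hat = S_hat / np.outer(d, d)                   # estimated coherency matrix

eigs = np.linalg.eigvalsh(C_hat)
lss = np.mean(np.log(eigs))                      # LSS with f = log
c = M / B                                        # Marcenko-Pastur ratio
mp_limit = (c - 1) / c * np.log(1 - c) - 1       # integral of log under MP
print(f"empirical LSS = {lss:.4f}, MP prediction = {mp_limit:.4f}")
```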
We show that the nonstandard limiting distribution of HAR test statistics under fixed-b asymptotics is not pivotal (even after studentization) when the data are nonstationary. It takes the form of a complicated function of Gaussian processes and depends on the integrated local long-run variance and on the second moments of the relevant series (e.g., of the regressors and errors in the linear regression model). Hence, existing fixed-b inference methods based on stationarity are not theoretically valid in general. The nuisance parameters entering the fixed-b limiting distribution can be consistently estimated under small-b asymptotics, but only at a nonparametric rate of convergence. Hence, we show that the error in rejection probability (ERP) is an order of magnitude larger than that under stationarity and is also larger than that of HAR tests based on HAC estimators under conventional asymptotics. These theoretical results are consistent with recent finite-sample evidence in Casini (2021) and Casini, Deng and Perron (2021), who show that fixed-b HAR tests can perform poorly when the data are nonstationary: they can be conservative under the null hypothesis and have non-monotonic power under the alternative hypothesis, irrespective of how large the sample size is.
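For readers who want the object itself, the sketch below (our illustration) computes a fixed-b studentized HAR t-statistic for a mean, using a Bartlett-kernel long-run variance estimate whose bandwidth is a fixed fraction $b$ of the sample size, evaluated on a nonstationary null sample of the variance-shifting kind for which the fixed-b limit fails to be pivotal.

```python
import numpy as np

def har_t_stat(x, b=0.1):
    """Fixed-b studentized t-statistic for H0: E[x_t] = 0, with a
    Bartlett-kernel long-run variance estimate and bandwidth b * T."""
    T = len(x)
    u = x - x.mean()
    M = max(1, int(b * T))
    lrv = u @ u / T
    for j in range(1, M):
        lrv += 2 * (1 - j / M) * (u[j:] @ u[:-j]) / T  # Bartlett weights
    return np.sqrt(T) * x.mean() / np.sqrt(lrv)

rng = np.random.default_rng(3)
# Null is true (zero mean) but the data are nonstationary: the variance
# drifts over the sample, the feature that breaks pivotality.
x = rng.standard_normal(500) * np.linspace(0.5, 2.0, 500)
print(f"fixed-b HAR t-statistic: {har_t_stat(x):+.3f}")
```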
Given a target distribution $\mu \propto e^{-\mathcal{H}}$ with Hamiltonian $\mathcal{H}$, in this paper we propose and analyze new Metropolis-Hastings sampling algorithms that target an alternative distribution $\mu^f_{1,\alpha,c} \propto e^{-\mathcal{H}^{f}_{1,\alpha,c}}$, where $\mathcal{H}^{f}_{1,\alpha,c}$ is a landscape-modified Hamiltonian which we introduce explicitly. The advantage of the Metropolis dynamics targeting $\mu^f_{1,\alpha,c}$ is that it enjoys a reduced critical height, described by the threshold parameter $c$, the function $f$, and a penalty parameter $\alpha \geq 0$ that controls the state-dependent effect. First, we investigate the case of fixed $\alpha$: we propose a self-normalized estimator that corrects for the sampling bias, and we prove asymptotic convergence results and a Chernoff-type bound for this estimator. Next, we consider the case of annealing the penalty parameter $\alpha$. We prove strong ergodicity and bounds on the total variation mixing time of the resulting non-homogeneous chain, subject to appropriate assumptions on the decay of $\alpha$. We illustrate the proposed algorithms by comparing their mixing times with those of the original Metropolis dynamics on statistical physics models, including the ferromagnetic Ising model on the hypercube or the complete graph and the $q$-state Potts model on the two-dimensional torus. In these cases, the mixing times of the classical Glauber dynamics are at least exponential in the system size, as the critical height grows at least linearly with the size, while the proposed annealing algorithm, with appropriate choices of $f$, $c$, and the annealing schedule on $\alpha$, mixes rapidly with at most polynomial dependence on the size. The crux of the proof hinges on the observation that the reduced critical height can be bounded independently of the size, which gives rise to rapid mixing.
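A hedged sketch of the fixed-$\alpha$ case follows (our illustration): we assume, purely for demonstration, a clipped landscape modification with $f(u)=u$, namely $\mathcal{H}_{\mathrm{mod}}(x)=\min(\mathcal{H}(x),c)+\log\!\left(1+\alpha\,(\mathcal{H}(x)-c)_+\right)/\alpha$; the paper's $\mathcal{H}^f_{1,\alpha,c}$ is defined there. The self-normalized weights $e^{\mathcal{H}_{\mathrm{mod}}-\mathcal{H}}$ correct the bias from sampling the modified target.

```python
import numpy as np

def H(x):                                # double-well Hamiltonian on R
    return 4.0 * (x**2 - 1.0)**2

def H_mod(x, alpha=2.0, c=0.5):
    # Assumed clipped form with f(u) = u; above the threshold c the
    # landscape grows only logarithmically, flattening the barrier.
    h = H(x)
    return min(h, c) + np.log1p(alpha * max(h - c, 0.0)) / alpha

rng = np.random.default_rng(4)
x, xs, logw = -1.0, [], []
for _ in range(20_000):                  # random-walk Metropolis on H_mod
    y = x + 0.5 * rng.standard_normal()
    if np.log(rng.random()) < H_mod(x) - H_mod(y):
        x = y
    xs.append(x)
    logw.append(H_mod(x) - H(x))         # log of e^{-H} / e^{-H_mod}

xs, w = np.array(xs), np.exp(np.array(logw))
# Self-normalized estimate of E_mu[x^2] under the original target mu.
print(f"E[x^2] estimate: {np.sum(w * xs**2) / np.sum(w):.3f}")
```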
A distributional symmetry is invariance of a distribution under a group of transformations. Exchangeability and stationarity are examples. We explain that a result of ergodic theory provides a law of large numbers: If the group satisfies suitable conditions, expectations can be estimated by averaging over subsets of transformations, and these estimators are strongly consistent. We show that, if a mixing condition holds, the averages also satisfy a central limit theorem, a Berry-Esseen bound, and concentration. These are extended further to apply to triangular arrays, to randomly subsampled averages, and to a generalization of U-statistics. As applications, we obtain new results on exchangeability, random fields, network models, and a class of marked point processes. We also establish asymptotic normality of the empirical entropy for a large class of processes. Some known results are recovered as special cases, and can hence be interpreted as an outcome of symmetry. The proofs adapt Stein's method.
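A minimal example of such a symmetry-based estimator (our illustration): for a stationary sequence, the group of time shifts acts on the path, and averaging a statistic over shifts gives a strongly consistent estimate of its expectation.

```python
import numpy as np

rng = np.random.default_rng(5)
n, phi = 100_000, 0.7
x = np.zeros(n)
for t in range(1, n):                 # stationary AR(1) with N(0,1) noise
    x[t] = phi * x[t - 1] + rng.standard_normal()

# Averaging f over the time shifts estimates E[f(X_0, X_1)]; under
# stationarity this average is strongly consistent.
estimate = np.mean(x[:-1] * x[1:])    # f(a, b) = a * b: lag-1 autocovariance
truth = phi / (1 - phi ** 2)
print(f"shift-average: {estimate:.3f}, truth: {truth:.3f}")
```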
Cell-free massive MIMO is one of the core technologies for future wireless networks. It is expected to bring enormous benefits, including ultra-high reliability, high data throughput, high energy efficiency, and uniform coverage. As a radically distributed system, the performance of cell-free massive MIMO critically relies on efficient distributed processing algorithms. In this paper, we propose a distributed expectation propagation (EP) detector for cell-free massive MIMO, which consists of two modules: a nonlinear module at the central processing unit (CPU) and a linear module at each access point (AP). The turbo principle in iterative channel decoding is utilized to compute and pass the extrinsic information between the two modules. An analytical framework is provided to characterize the asymptotic performance of the proposed EP detector with a large number of antennas. Furthermore, a distributed joint channel estimation and data detection (JCD) algorithm is developed to handle the practical setting with imperfect channel state information (CSI). Simulation results show that the proposed method outperforms existing detectors for cell-free massive MIMO systems in terms of bit-error rate and demonstrate that the developed theoretical analysis accurately predicts system performance. Finally, it is shown that, with imperfect CSI, the proposed JCD algorithm improves the system performance significantly and enables non-orthogonal pilots to reduce the pilot overhead.
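The sketch below (our simplification) conveys the two-module structure on a toy uplink: each AP contributes the linear-module statistics $H_\ell^H H_\ell$ and $H_\ell^H y_\ell$, and the CPU's nonlinear module moment-matches the fused Gaussian posterior against the QPSK prior. Full EP's extrinsic (cavity) computations and the turbo exchange with the decoder are omitted for brevity.

```python
import numpy as np

rng = np.random.default_rng(6)
L_aps, K, n_ap, noise_var = 4, 2, 8, 0.1     # APs, users, antennas per AP
qpsk = np.array([1+1j, 1-1j, -1+1j, -1-1j]) / np.sqrt(2)
x = rng.choice(qpsk, K)                      # transmitted symbols
Hs = [(rng.standard_normal((n_ap, K))
       + 1j * rng.standard_normal((n_ap, K))) / np.sqrt(2)
      for _ in range(L_aps)]
ys = [H @ x + np.sqrt(noise_var / 2) * (rng.standard_normal(n_ap)
      + 1j * rng.standard_normal(n_ap)) for H in Hs]

mean, var = np.zeros(K, complex), np.ones(K)  # Gaussian prior on symbols
for _ in range(5):
    # Linear modules: each AP contributes H^H H and H^H y; the CPU fuses
    # them with the current prior into a joint Gaussian posterior.
    P = np.diag(1.0 / var).astype(complex)
    r = (mean / var).astype(complex)
    for H, y in zip(Hs, ys):
        P += H.conj().T @ H / noise_var
        r += H.conj().T @ y / noise_var
    Sigma = np.linalg.inv(P)
    mu = Sigma @ r
    # Nonlinear module at the CPU: moment matching against the QPSK prior.
    v = np.real(np.diag(Sigma))
    logits = -np.abs(mu[:, None] - qpsk[None, :]) ** 2 / v[:, None]
    w = np.exp(logits - logits.max(axis=1, keepdims=True))
    w /= w.sum(axis=1, keepdims=True)
    mean = (w * qpsk).sum(axis=1)
    var = np.maximum(
        (w * np.abs(qpsk[None, :] - mean[:, None]) ** 2).sum(axis=1), 1e-6)

detected = qpsk[np.argmin(np.abs(mean[:, None] - qpsk[None, :]), axis=1)]
print("true:", x, "\ndetected:", detected)
```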
The estimation of information measures of continuous distributions based on samples is a fundamental problem in statistics and machine learning. In this paper, we analyze estimates of differential entropy in $K$-dimensional Euclidean space, computed from a finite number of samples, when the probability density function belongs to a predetermined convex family $\mathcal{P}$. First, estimating differential entropy to any accuracy is shown to be infeasible if the differential entropy of densities in $\mathcal{P}$ is unbounded, clearly showing the necessity of additional assumptions. Subsequently, we investigate sufficient conditions that enable confidence bounds for the estimation of differential entropy. In particular, we provide confidence bounds for simple histogram-based estimation of differential entropy from a fixed number of samples, assuming that the probability density function is Lipschitz continuous with known Lipschitz constant and known, bounded support. Our focus is on differential entropy, but we provide examples that show that similar results hold for mutual information and relative entropy as well.
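For concreteness, a plug-in histogram estimator of the kind analyzed here can be written in a few lines (our sketch; the Beta(2,5) density on $[0,1]$ is a stand-in for a Lipschitz density with known bounded support, whose differential entropy is about $-0.4845$ nats).

```python
import numpy as np

rng = np.random.default_rng(7)
n, bins = 100_000, 64
x = rng.beta(2.0, 5.0, size=n)                 # support [0, 1], Lipschitz

counts, edges = np.histogram(x, bins=bins, range=(0.0, 1.0))
width = edges[1] - edges[0]
p = counts / n                                  # empirical cell probabilities
nz = p > 0
# Plug-in estimate: -sum_j p_j log(p_j / width), i.e. the discrete entropy
# of the cell probabilities plus the log of the cell width.
h_hat = -np.sum(p[nz] * np.log(p[nz])) + np.log(width)
print(f"histogram entropy estimate: {h_hat:.4f} nats (truth ~ -0.4845)")
```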
This paper studies distributed binary testing of statistical independence under communication (information bits) constraints. While testing independence is relevant in various applications, distributed independence testing is particularly useful for event detection in sensor networks, where data correlation often occurs among the observations of devices in the presence of a signal of interest. Focusing on the case of two devices for tractability, we begin by investigating conditions on the Type I error probability restrictions under which the minimum Type II error probability admits an exponential behavior in the sample size. Then, we study the finite sample-size regime of this problem. We derive new upper and lower bounds on the gap between the minimum Type II error and its exponential approximation under different setups, including restrictions imposed on the vanishing Type I error probability. Our theoretical results shed light on the sample-size regimes at which approximations of the Type II error probability via error exponents become informative enough, in the sense of predicting the actual error probability well. We finally discuss an application of our results in which the gap is evaluated numerically, and we show that exponential approximations are not only tractable but also a valuable proxy for the Type II error probability in the finite-length regime.
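The gap between the exponential approximation and the actual Type II error is easy to see numerically even without communication constraints (our toy sketch, not the paper's setup): for a binary pair with $P(Y \neq X) = 0.4$, the Stein-type exponent for testing against independence is $I(X;Y)$ bits, and a simple agreement-count test at moderate $n$ already deviates visibly from $2^{-nI}$.

```python
import numpy as np

rng = np.random.default_rng(8)
eps, n, trials = 0.4, 200, 20_000       # flip probability under dependence

# I(X;Y) in bits for X ~ Bern(1/2) and Y = X flipped with probability eps.
h2 = lambda p: -p * np.log2(p) - (1 - p) * np.log2(1 - p)
I = 1.0 - h2(eps)

# Simple agreement-count test: reject independence if agreements >= thresh
# (threshold midway between the H0 and H1 means; illustrative only).
thresh = n * (0.5 + (1 - eps)) / 2
type2 = 0
for _ in range(trials):                 # data drawn under H1 (dependence)
    x = rng.integers(0, 2, n)
    y = np.where(rng.random(n) < eps, 1 - x, x)
    type2 += np.sum(x == y) < thresh    # test wrongly accepts independence
print(f"Type II error ~ {type2 / trials:.3f}, "
      f"2^(-nI) = {2.0 ** (-n * I):.3f}")
```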
We propose a general and scalable approximate sampling strategy for probabilistic models with discrete variables. Our approach uses gradients of the likelihood function with respect to its discrete inputs to propose updates in a Metropolis-Hastings sampler. We show empirically that this approach outperforms generic samplers in a number of difficult settings including Ising models, Potts models, restricted Boltzmann machines, and factorial hidden Markov models. We also demonstrate the use of our improved sampler for training deep energy-based models on high-dimensional discrete data. This approach outperforms variational auto-encoders and existing energy-based models. Finally, we give bounds showing that our approach is near-optimal in the class of samplers which propose local updates.
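A minimal sketch of a gradient-informed proposal of this flavor (our illustration on a small Ising-like model; the halving of the scores follows the spirit of locally informed proposals): the gradient of the log-likelihood at the current state scores all single-spin flips at once, a softmax over those scores proposes a flip, and a standard Metropolis-Hastings correction keeps the chain exact.

```python
import numpy as np

rng = np.random.default_rng(9)
D, steps = 20, 5_000
J = 0.1 * rng.standard_normal((D, D))
J = (J + J.T) / 2
np.fill_diagonal(J, 0.0)                 # Ising-like couplings

def logp(x):                             # log-density on {-1,+1}^D, up to Z
    return 0.5 * x @ J @ x

def scores(x):
    grad = J @ x                         # gradient of logp at x
    return -x * grad                     # = (logp change from flipping i) / 2

x = rng.choice([-1.0, 1.0], size=D)
accepts = 0
for _ in range(steps):
    s = scores(x)
    q = np.exp(s - s.max())
    q /= q.sum()                          # informed proposal over flip index
    i = rng.choice(D, p=q)
    y = x.copy()
    y[i] = -y[i]
    s_rev = scores(y)
    q_rev = np.exp(s_rev - s_rev.max())
    q_rev /= q_rev.sum()
    log_acc = logp(y) - logp(x) + np.log(q_rev[i]) - np.log(q[i])
    if np.log(rng.random()) < log_acc:    # MH correction keeps chain exact
        x = y
        accepts += 1
print(f"acceptance rate: {accepts / steps:.2f}, mean spin: {x.mean():+.2f}")
```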
In this paper, we study optimal convergence rates for distributed convex optimization problems over networks. We model the communication restrictions imposed by the network as a set of affine constraints and provide optimal complexity bounds for four different setups, namely when the function $F(\xb) \triangleq \sum_{i=1}^{m}f_i(\xb)$ is (i) strongly convex and smooth, (ii) strongly convex, (iii) smooth, or (iv) just convex. Our results show that Nesterov's accelerated gradient descent on the dual problem can be executed in a distributed manner and attains the same optimal rates as in the centralized version of the problem (up to constant or logarithmic factors), with an additional cost related to the spectral gap of the interaction matrix. Finally, we discuss some extensions of the proposed setup, such as proximal-friendly functions, time-varying graphs, and improvement of the condition numbers.
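To illustrate the dual viewpoint (our sketch, with quadratic local losses and a ring graph chosen for convenience): consensus is encoded as the affine constraint $L\xb = 0$ for the graph Laplacian $L$, each dual gradient evaluation costs one multiplication by $L$ (one round of neighbor communication), and Nesterov's momentum is applied to the dual ascent.

```python
import numpy as np

rng = np.random.default_rng(10)
m = 10
a = rng.uniform(1.0, 4.0, m)                 # local curvatures
b = rng.uniform(-1.0, 1.0, m)                # local minimizers
L = 2 * np.eye(m) - np.eye(m, k=1) - np.eye(m, k=-1)
L[0, -1] = L[-1, 0] = -1                     # Laplacian of the ring graph

# min sum_i a_i (x_i - b_i)^2 / 2  s.t.  L x = 0  (consensus constraint);
# the centralized optimum is the curvature-weighted average of the b_i.
x_star = np.sum(a * b) / np.sum(a)

# The dual function phi(lam) = min_x sum_i f_i(x_i) + lam' L x is concave
# and smooth with constant lam_max(L)^2 / min(a); its gradient is L x(lam).
step = a.min() / np.linalg.eigvalsh(L)[-1] ** 2
lam = lam_prev = np.zeros(m)
for t in range(1, 1001):
    mu = lam + (t - 1) / (t + 2) * (lam - lam_prev)   # Nesterov momentum
    x = b - (L @ mu) / a              # local argmins, computed in parallel
    lam_prev, lam = lam, mu + step * (L @ x)          # dual ascent step
print(f"max |x_i - x*| = {np.abs(x - x_star).max():.2e}")
```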