We investigate the asymptotic distribution of the maximum of a frequency-smoothed estimate of the spectral coherence of an M-variate complex Gaussian time series with mutually independent components when the dimension M and the number of samples N both converge to infinity. If B denotes the smoothing span of the underlying smoothed periodogram estimator, a type I extreme value limiting distribution is obtained under the rate assumptions $M/N \rightarrow 0$ and $M/B \rightarrow c \in (0, +\infty)$. This result is then exploited to build a statistic with controlled asymptotic level for testing independence between the M components of the observed time series. Numerical simulations support our results.
Discretization of the uniform norm of functions from a given finite dimensional subspace of continuous functions is studied. We pay special attention to the case of trigonometric polynomials with frequencies from an arbitrary finite set with fixed cardinality. We give two different proofs of the fact that for any $N$-dimensional subspace of the space of continuous functions it is sufficient to use $e^{CN}$ sample points for an accurate upper bound for the uniform norm. Previously known results show that one cannot improve on the exponential growth of the number of sampling points for a good discretization theorem in the uniform norm. Also, we prove a general result, which connects the upper bound on the number of sampling points in the discretization theorem for the uniform norm with the best $m$-term bilinear approximation of the Dirichlet kernel associated with the given subspace. We illustrate the application of our technique with the example of trigonometric polynomials.
It is well known that the independence of the sample mean and the sample variance characterizes the normal distribution. By using Anosov's theorem, we further investigate the analogous characteristic properties in terms of the sample mean and some feasible definite statistics. The latter statistics, introduced in this paper for the first time, are based on nonnegative, definite and continuous functions of ordered arguments with positive degree of homogeneity. The proposed approach seems to be natural and can be used to easily derive characterization results for many feasible definite statistics, such as the known characterizations involving the sample variance, the sample range, and Gini's mean difference.
We study the eigenvalue distributions for sums of independent rank-one $k$-fold tensor products of large $n$-dimensional vectors. Previous results in the literature assume that $k=o(n)$ and show that the eigenvalue distributions converge to the celebrated Mar\v{c}enko-Pastur law under appropriate moment conditions on the base vectors. In this paper, motivated by quantum information theory, we study the regime where $k$ grows faster, namely $k=O(n)$. We show that the moment sequences of the eigenvalue distributions have a limit, which is different from the Mar\v{c}enko-Pastur law. As a byproduct, we show that the Mar\v{c}enko-Pastur law limit holds if and only if $k=o(n)$ for this tensor model. The approach is based on the method of moments.
Spatially inhomogeneous functions, which may be smooth in some regions and rough in other regions, are modelled naturally in a Bayesian manner using so-called Besov priors which are given by random wavelet expansions with Laplace-distributed coefficients. This paper studies theoretical guarantees for such prior measures - specifically, we examine their frequentist posterior contraction rates in the setting of non-linear inverse problems with Gaussian white noise. Our results are first derived under a general local Lipschitz assumption on the forward map. We then verify the assumption for two non-linear inverse problems arising from elliptic partial differential equations, the Darcy flow model from geophysics as well as a model for the Schr\"odinger equation appearing in tomography. In the course of the proofs, we also obtain novel concentration inequalities for penalized least squares estimators with $\ell^1$ wavelet penalty, which have a natural interpretation as maximum a posteriori (MAP) estimators. The true parameter is assumed to belong to some spatially inhomogeneous Besov class $B^{\alpha}_{11}$, $\alpha>0$. In a setting with direct observations, we complement these upper bounds with a lower bound on the rate of contraction for arbitrary Gaussian priors. An immediate consequence of our results is that while Laplace priors can achieve minimax-optimal rates over $B^{\alpha}_{11}$-classes, Gaussian priors are limited to a (by a polynomial factor) slower contraction rate. This gives information-theoretical justification for the intuition that Laplace priors are more compatible with $\ell^1$ regularity structure in the underlying parameter.
Even the most carefully curated economic data sets have variables that are noisy, missing, discretized, or privatized. The standard workflow for empirical research involves data cleaning followed by data analysis that typically ignores the bias and variance consequences of data cleaning. We formulate a semiparametric model for causal inference with corrupted data to encompass both data cleaning and data analysis. We propose a new end-to-end procedure for data cleaning, estimation, and inference with data cleaning-adjusted confidence intervals. We prove consistency, Gaussian approximation, and semiparametric efficiency for our estimator of the causal parameter by finite sample arguments. The rate of Gaussian approximation is $n^{-1/2}$ for global parameters such as average treatment effect, and it degrades gracefully for local parameters such as heterogeneous treatment effect for a specific demographic. Our key assumption is that the true covariates are approximately low rank. In our analysis, we provide nonasymptotic theoretical contributions to matrix completion, statistical learning, and semiparametric statistics. We verify the coverage of the data cleaning-adjusted confidence intervals in simulations calibrated to resemble differential privacy as implemented in the 2020 US Census.
A solution manifold is the collection of points in a $d$-dimensional space satisfying a system of $s$ equations with $s<d$. Solution manifolds occur in several statistical problems including hypothesis testing, curved-exponential families, constrained mixture models, partial identifications, and nonparametric set estimation. We analyze solution manifolds both theoretically and algorithmically. In terms of theory, we derive five useful results: the smoothness theorem, the stability theorem (which implies the consistency of a plug-in estimator), the convergence of a gradient flow, the local center manifold theorem and the convergence of the gradient descent algorithm. To numerically approximate a solution manifold, we propose a Monte Carlo gradient descent algorithm. In the case of likelihood inference, we design a manifold constraint maximization procedure to find the maximum likelihood estimator on the manifold. We also develop a method to approximate a posterior distribution defined on a solution manifold.
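As a rough illustration of the Monte Carlo gradient descent idea mentioned above, the following sketch draws random starting points and runs plain gradient descent on the squared constraint $\Psi(x)^2$ until each point lands on the solution set. Everything specific here is an assumption made for illustration and is not taken from the paper: the constraint `psi` (the unit circle in the plane, so $d=2$, $s=1$), the step size, the tolerance, the sampling box, and the function names. The paper's actual algorithm may differ in its details.

```python
import math
import random


def psi(x):
    """Hypothetical constraint: the solution manifold is {x : psi(x) = 0}, the unit circle."""
    return x[0] ** 2 + x[1] ** 2 - 1.0


def grad_objective(x):
    """Gradient of the objective f(x) = psi(x)^2, namely 2 * psi(x) * grad psi(x)."""
    p = psi(x)
    return (4.0 * p * x[0], 4.0 * p * x[1])


def descend(x, step=0.05, tol=1e-10, max_iter=10_000):
    """Plain gradient descent on psi^2, stopped once |psi(x)| < tol."""
    for _ in range(max_iter):
        if abs(psi(x)) < tol:
            break
        g = grad_objective(x)
        x = (x[0] - step * g[0], x[1] - step * g[1])
    return x


def monte_carlo_manifold(n_points=200, seed=0):
    """Push random starting points onto the manifold; returns a list of (x1, x2) tuples."""
    rng = random.Random(seed)
    points = []
    while len(points) < n_points:
        start = (rng.uniform(-1.5, 1.5), rng.uniform(-1.5, 1.5))
        # Skip starts near the origin, a critical point of psi^2 that is not on the manifold.
        if math.hypot(*start) < 0.2:
            continue
        points.append(descend(start))
    return points


if __name__ == "__main__":
    cloud = monte_carlo_manifold(n_points=5, seed=42)
    for p in cloud:
        print(p, "residual:", abs(psi(p)))
```

The returned point cloud approximates the manifold. In this toy case the descent moves each point radially, so the angular spread of the starting points is preserved.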
Let $K$ be a $k$-dimensional simplicial complex having $n$ faces of dimension $k$ and $M$ a closed $(k-1)$-connected PL $2k$-dimensional manifold. We prove that for $k\ge3$ odd, $K$ embeds into $M$ if and only if there are $\bullet$ a skew-symmetric $n\times n$-matrix $A$ with $\mathbb Z$-entries whose rank over $\mathbb Q$ does not exceed $rk H_k(M;\mathbb Z)$, $\bullet$ a general position PL map $f:K\to\mathbb R^{2k}$, and $\bullet$ a collection of orientations on $k$-faces of $K$ such that for any nonadjacent $k$-faces $\sigma,\tau$ of $K$ the element $A_{\sigma,\tau}$ equals the algebraic intersection of $f\sigma$ and $f\tau$. We prove some analogues of this result including those for $\mathbb Z_2$- and $\mathbb Z$-embeddability. Our results generalize the Bikeev-Fulek-Kyn\v cl-Schaefer-Stefankovi\v c criteria for the $\mathbb Z_2$- and $\mathbb Z$-embeddability of graphs to surfaces, and are related to the Harris-Krushkal-Johnson-Pat\'ak-Tancer criteria for the embeddability of $k$-complexes into $2k$-manifolds.
This paper considers maximum likelihood (ML) estimation in a large class of models with hidden Markov regimes. We investigate consistency of the ML estimator and local asymptotic normality for the models under general conditions which allow for autoregressive dynamics in the observable process, Markov regime sequences with covariate-dependent transition matrices, and possible model misspecification. A Monte Carlo study examines the finite-sample properties of the ML estimator in correctly specified and misspecified models. An empirical application is also discussed.
One of the most pressing problems in modern analysis is the study of the growth rate of the norms of all possible matrix products $\|A_{i_{n}}\cdots A_{i_{0}}\|$ with factors from a set of matrices $\mathscr{A}$. So far, only for a relatively small number of classes of matrices $\mathscr{A}$ has it been possible to rigorously describe the sequences of matrices $\{A_{i_{n}}\}$ that guarantee the maximal growth rate of the corresponding norms. Moreover, in almost all theoretically studied cases, the index sequences $\{i_{n}\}$ of matrices maximizing the norms of the corresponding matrix products turned out to be periodic or so-called Sturmian sequences, which entails a whole set of "good" properties of the sequences $\{A_{i_{n}}\}$, in particular the existence of a limiting frequency of occurrence of each matrix factor $A_{i}\in\mathscr{A}$ in them. The paper determines a class of $2\times 2$ matrices consisting of two matrices similar to rotations of the plane in which the sequence $\{A_{i_{n}}\}$ maximizing the growth rate of the norms $\|A_{i_{n}}\cdots A_{i_{0}}\|$ is not Sturmian. All considerations are based on numerical modeling and cannot be considered mathematically rigorous in this part. Rather, they should be interpreted as a set of questions for further comprehensive theoretical analysis.
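The kind of numerical modeling referred to above can be sketched with a brute-force search: enumerate every index sequence of a given length, form the corresponding matrix product, and record the sequence with the largest norm. The sketch below is only an illustration under stated assumptions. The matrix pair `EXAMPLE_PAIR`, the choice of the Frobenius norm, and all function names are hypothetical and are not the matrices or the procedure of the paper. The cost grows exponentially with the sequence length, so this is practical only for short products.

```python
import itertools
import math

# Illustrative pair of 2x2 matrices (an assumption; not the matrices studied in the paper).
EXAMPLE_PAIR = (
    ((1.0, 1.0), (0.0, 1.0)),
    ((1.0, 0.0), (1.0, 1.0)),
)


def matmul(a, b):
    """Product a * b of two 2x2 matrices given as nested tuples."""
    return (
        (a[0][0] * b[0][0] + a[0][1] * b[1][0], a[0][0] * b[0][1] + a[0][1] * b[1][1]),
        (a[1][0] * b[0][0] + a[1][1] * b[1][0], a[1][0] * b[0][1] + a[1][1] * b[1][1]),
    )


def frobenius(a):
    """Frobenius norm, a submultiplicative matrix norm."""
    return math.sqrt(sum(v * v for row in a for v in row))


def max_growth(matrices, n):
    """Search all index sequences (i_0, ..., i_{n-1}) of length n.

    Returns (rate, sequence), where rate is the maximum of
    ||A_{i_{n-1}} ... A_{i_0}||^(1/n) and sequence attains it.
    """
    best_norm, best_seq = -1.0, None
    for seq in itertools.product(range(len(matrices)), repeat=n):
        prod = ((1.0, 0.0), (0.0, 1.0))
        for i in seq:
            prod = matmul(matrices[i], prod)  # each new factor multiplies on the left
        norm = frobenius(prod)
        if norm > best_norm:
            best_norm, best_seq = norm, seq
    return best_norm ** (1.0 / n), best_seq


if __name__ == "__main__":
    for n in (2, 4, 8):
        rate, seq = max_growth(EXAMPLE_PAIR, n)
        print(n, rate, seq)
```

Because the norm is submultiplicative, the rate for length $2n$ cannot exceed the rate for length $n$, so doubling the length gives a non-increasing sequence of upper estimates of the maximal growth rate. Inspecting the maximizing index sequences for increasing lengths is one way to look for periodic or non-periodic patterns.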
The spectral features of a Toeplitz matrix-sequence $\left\{T_{n}(f)\right\}_{n\in\mathbb N}$, generated by a symbol $f\in L^1([-\pi,\pi])$, real-valued almost everywhere (a.e.), were analyzed in great detail in the last century, as was the conditioning when $f$ is nonnegative a.e. Here we consider a novel type of problem arising in the numerical approximation of distributed-order fractional differential equations (FDEs), where the matrices under consideration take the form \[ \mathcal{T}_{n}=c_0T_{n}(f_0)+c_{1} h^h T_{n}(f_{1})+c_{2} h^{2h} T_{n}(f_{2})+\cdots+c_{n-1} h^{(n-1)h}T_{n}(f_{n-1}), \] $c_0,c_{1},\ldots, c_{n-1} \in [c_*,c^*]$, $c^*\ge c_*>0$, independent of $n$, $h=\frac{1}{n}$, $f_j\sim g_j$, $g_j=|\theta|^{2-jh}$, $j=0,\ldots,n-1$. Since the resulting generating function depends on $n$, the standard theory cannot be applied and the analysis has to be performed using new ideas. A few selected numerical experiments are presented, also in connection with matrices that come from distributed-order FDE problems, and their agreement with the theoretical analysis is discussed together with open questions and future investigations.