We consider the $\mathcal{H}^2$-formatted compression and computational estimation of covariance functions on a compact set in $\mathbb{R}^d$. The classical sample covariance or Monte Carlo estimator is prohibitively expensive for many practically relevant problems, where often approximation spaces with many degrees of freedom and many samples for the estimator are needed. In this article, we propose and analyze a data sparse multilevel sample covariance estimator, i.e., a multilevel Monte Carlo estimator. For this purpose, we generalize the notion of asymptotically smooth kernel functions to a Gevrey type class of kernels for which we derive new variable-order $\mathcal{H}^2$-approximation rates. These variable-order $\mathcal{H}^2$-approximations can be considered as a variant of $hp$-approximations. Our multilevel sample covariance estimator then uses an approximate multilevel hierarchy of variable-order $\mathcal{H}^2$-approximations to compress the sample covariances on each level. The non-nestedness of the different levels makes the reduction to the final estimator nontrivial and we present a suitable algorithm which can handle this task in linear complexity. This allows for a data sparse multilevel estimator of Gevrey covariance kernel functions in the best possible complexity for Monte Carlo type multilevel estimators, which is quadratic. Numerical examples which estimate covariance matrices with tens of billions of entries are presented.
The concepts of sparsity, and regularised estimation, have proven useful in many high-dimensional statistical applications. Dynamic factor models (DFMs) provide a parsimonious approach to modelling high-dimensional time series, however, it is often hard to interpret the meaning of the latent factors. This paper formally introduces a class of sparse DFMs whereby the loading matrices are constrained to have few non-zero entries, thus increasing interpretability of factors. We present a regularised M-estimator for the model parameters, and construct an efficient expectation maximisation algorithm to enable estimation. Synthetic experiments demonstrate consistency in terms of estimating the loading structure, and superior predictive performance where a low-rank factor structure may be appropriate. The utility of the method is further illustrated in an application forecasting electricity consumption across a large set of smart meters.
To derive the auto-covariance function from a sampled and time-limited signal or the cross-covariance function from two such signals, the mean values must be estimated and removed from the signals. If no a priori information about the correct mean values is available and the mean values must be derived from the time series themselves, the estimates will be biased. For the estimation of the variance from independent data the appropriate correction is widely known as Bessel's correction. Similar corrections for the auto-covariance and for the cross-covariance functions are shown here, including individual weighting of the samples. The corrected estimates then can be used to correct also the variance estimate in the case of correlated data. The programs used here are available online at //sigproc.nambis.de/programs.
Finding meaningful ways to measure the statistical dependency between random variables $\xi$ and $\zeta$ is a timeless statistical endeavor. In recent years, several novel concepts, like the distance covariance, have extended classical notions of dependency to more general settings. In this article, we propose and study an alternative framework that is based on optimal transport. The transport dependency $\tau \ge 0$ applies to general Polish spaces and intrinsically respects metric properties. For suitable ground costs, independence is fully characterized by $\tau = 0$. Via proper normalization of $\tau$, three transport correlations $\rho_\alpha$, $\rho_\infty$, and $\rho_*$ with values in $[0, 1]$ are defined. They attain the value $1$ if and only if $\zeta = \varphi(\xi)$, where $\varphi$ is an $\alpha$-Lipschitz function for $\rho_\alpha$, a measurable function for $\rho_\infty$, or a multiple of an isometry for $\rho_*$. The transport dependency can be estimated consistently by an empirical plug-in approach, but alternative estimators with the same convergence rate but significantly reduced computational costs are also proposed. Numerical results suggest that $\tau$ robustly recovers dependency between data sets with different internal metric structures. The usage for inferential tasks, like transport dependency based independence testing, is illustrated on a data set from a cancer study.
We consider the problem of state estimation from $m$ linear measurements, where the state $u$ to recover is an element of the manifold $\mathcal{M}$ of solutions of a parameter-dependent equation. The state is estimated using a prior knowledge on $\mathcal{M}$ coming from model order reduction. Variational approaches based on linear approximation of $\mathcal{M}$, such as PBDW, yields a recovery error limited by the Kolmogorov $m$-width of $\mathcal{M}$. To overcome this issue, piecewise-affine approximations of $\mathcal{M}$ have also be considered, that consist in using a library of linear spaces among which one is selected by minimizing some distance to $\mathcal{M}$. In this paper, we propose a state estimation method relying on dictionary-based model reduction, where a space is selected from a library generated by a dictionary of snapshots, using a distance to the manifold. The selection is performed among a set of candidate spaces obtained from the path of a $\ell_1$-regularized least-squares problem. Then, in the framework of parameter-dependent operator equations (or PDEs) with affine parameterizations, we provide an efficient offline-online decomposition based on randomized linear algebra, that ensures efficient and stable computations while preserving theoretical guarantees.
Statistical inference for stochastic processes based on high-frequency observations has been an active research area for more than two decades. One of the most well-known and widely studied problems is the estimation of the quadratic variation of the continuous component of an It\^o semimartingale with jumps. Several rate- and variance-efficient estimators have been proposed in the literature when the jump component is of bounded variation. However, to date, very few methods can deal with jumps of unbounded variation. By developing new high-order expansions of the truncated moments of a locally stable L\'evy process, we construct a new rate- and variance-efficient volatility estimator for a class of It\^o semimartingales whose jumps behave locally like those of a stable L\'evy process with Blumenthal-Getoor index $Y\in (1,8/5)$ (hence, of unbounded variation). The proposed method is based on a two-step debiasing procedure for the truncated realized quadratic variation of the process. Our Monte Carlo experiments indicate that the method outperforms other efficient alternatives in the literature in the setting covered by our theoretical framework.
We consider network games where a large number of agents interact according to a network sampled from a random network model, represented by a graphon. By exploiting previous results on convergence of such large network games to graphon games, we examine a procedure for estimating unknown payoff parameters, from observations of equilibrium actions, without the need for exact network information. We prove smoothness and local convexity of the optimization problem involved in computing the proposed estimator. Additionally, under a notion of graphon parameter identifiability, we show that the optimal estimator is globally unique. We present several examples of identifiable homogeneous and heterogeneous parameters in different classes of linear quadratic network games with numerical simulations to validate the proposed estimator.
The Wasserstein distance between mixing measures has come to occupy a central place in the statistical analysis of mixture models. This work proposes a new canonical interpretation of this distance and provides tools to perform inference on the Wasserstein distance between mixing measures in topic models. We consider the general setting of an identifiable mixture model consisting of mixtures of distributions from a set $\mathcal{A}$ equipped with an arbitrary metric $d$, and show that the Wasserstein distance between mixing measures is uniquely characterized as the most discriminative convex extension of the metric $d$ to the set of mixtures of elements of $\mathcal{A}$. The Wasserstein distance between mixing measures has been widely used in the study of such models, but without axiomatic justification. Our results establish this metric to be a canonical choice. Specializing our results to topic models, we consider estimation and inference of this distance. Though upper bounds for its estimation have been recently established elsewhere, we prove the first minimax lower bounds for the estimation of the Wasserstein distance in topic models. We also establish fully data-driven inferential tools for the Wasserstein distance in the topic model context. Our results apply to potentially sparse mixtures of high-dimensional discrete probability distributions. These results allow us to obtain the first asymptotically valid confidence intervals for the Wasserstein distance in topic models.
Physics-based covariance models provide a systematic way to construct covariance models that are consistent with the underlying physical laws in Gaussian process analysis. The unknown parameters in the covariance models can be estimated using maximum likelihood estimation, but direct construction of the covariance matrix and classical strategies of computing with it requires $n$ physical model runs, $n^2$ storage complexity, and $n^3$ computational complexity. To address such challenges, we propose to approximate the discretized covariance function using hierarchical matrices. By utilizing randomized range sketching for individual off-diagonal blocks, the construction process of the hierarchical covariance approximation requires $O(\log{n})$ physical model applications and the maximum likelihood computations require $O(n\log^2{n})$ effort per iteration. We propose a new approach to compute exactly the trace of products of hierarchical matrices which results in the expected Fischer information matrix being computable in $O(n\log^2{n})$ as well. The construction is totally matrix-free and the derivatives of the covariance matrix can then be approximated in the same hierarchical structure by differentiating the whole process. Numerical results are provided to demonstrate the effectiveness, accuracy, and efficiency of the proposed method for parameter estimations and uncertainty quantification.
Flexible Bayesian models are typically constructed using limits of large parametric models with a multitude of parameters that are often uninterpretable. In this article, we offer a novel alternative by constructing an exponentially tilted empirical likelihood carefully designed to concentrate near a parametric family of distributions of choice with respect to a novel variant of the Wasserstein metric, which is then combined with a prior distribution on model parameters to obtain a robustified posterior. The proposed approach finds applications in a wide variety of robust inference problems, where we intend to perform inference on the parameters associated with the centering distribution in presence of outliers. Our proposed transport metric enjoys great computational simplicity, exploiting the Sinkhorn regularization for discrete optimal transport problems, and being inherently parallelizable. We demonstrate superior performance of our methodology when compared against state-of-the-art robust Bayesian inference methods. We also demonstrate equivalence of our approach with a nonparametric Bayesian formulation under a suitable asymptotic framework, testifying to its flexibility. The constrained entropy maximization that sits at the heart of our likelihood formulation finds its utility beyond robust Bayesian inference; an illustration is provided in a trustworthy machine learning application.
A new nonparametric estimator for Toeplitz covariance matrices is proposed. This estimator is based on a data transformation that translates the problem of Toeplitz covariance matrix estimation to the problem of mean estimation in an approximate Gaussian regression. The resulting Toeplitz covariance matrix estimator is positive definite by construction, fully data-driven and computationally very fast. Moreover, this estimator is shown to be minimax optimal under the spectral norm for a large class of Toeplitz matrices. These results are readily extended to estimation of inverses of Toeplitz covariance matrices. Also, an alternative version of the Whittle likelihood for the spectral density based on the Discrete Cosine Transform (DCT) is proposed. The method is implemented in the R package vstdct that accompanies the paper.