To understand the behavior of large dynamical systems like transportation networks, one must often rely on measurements transmitted by a set of sensors, for instance individual vehicles. Such measurements are likely to be incomplete and imprecise, which makes it hard to recover the underlying signal of interest. Hoping to quantify this phenomenon, we study the properties of a partially-observed state-space model. In our setting, the latent state $X$ follows a high-dimensional Vector AutoRegressive process $X_t = \theta X_{t-1} + \varepsilon_t$. Meanwhile, the observations $Y$ are given by a noise-corrupted random sample from the state $Y_t = \Pi_t X_t + \eta_t$. Several random sampling mechanisms are studied, allowing us to investigate the effect of spatial and temporal correlations in the distribution of the sampling matrices $\Pi_t$. We first prove a lower bound on the minimax estimation error for the transition matrix $\theta$. We then describe a sparse estimator based on the Dantzig selector and upper bound its non-asymptotic error, showing that it achieves the optimal convergence rate for most of our sampling mechanisms. Numerical experiments on simulated time series validate our theoretical findings, while an application to open railway data highlights the relevance of this model for public transport traffic analysis.
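As a concrete illustration, the sketch below simulates a toy instance of this state-space model under an i.i.d. Bernoulli coordinate-sampling mechanism; the dimensions, noise levels, and sampling scheme are illustrative assumptions, not the paper's exact setup.

```python
import numpy as np

rng = np.random.default_rng(0)
n, T = 10, 200                           # state dimension, number of time steps
theta = 0.3 * rng.standard_normal((n, n))
theta /= 1.25 * np.linalg.norm(theta, 2)  # scale so the VAR(1) process is stable

X = np.zeros((T, n))                      # latent state X_t = theta X_{t-1} + eps_t
Y = np.full((T, n), np.nan)               # observations; NaN marks unsampled coordinates
p = 0.5                                   # per-coordinate sampling probability
for t in range(1, T):
    X[t] = theta @ X[t - 1] + 0.1 * rng.standard_normal(n)
    mask = rng.random(n) < p              # Pi_t: random coordinate-sampling matrix
    Y[t, mask] = X[t, mask] + 0.05 * rng.standard_normal(mask.sum())
```

From such `(Y, mask)` pairs one would then try to recover $\theta$, e.g. with the sparse Dantzig-selector-based estimator the abstract describes.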
We consider the problem of federated learning in a one-shot setting in which there are $m$ machines, each observing $n$ sample functions from an unknown distribution over non-convex loss functions. Let $F:[-1,1]^d\to\mathbb{R}$ be the expected loss function with respect to this unknown distribution. The goal is to find an estimate of the minimizer of $F$. Based on its observations, each machine generates a signal of bounded length $B$ and sends it to a server. The server collects the signals from all machines and outputs an estimate of the minimizer of $F$. We propose a distributed learning algorithm, called the Multi-Resolution Estimator for Non-Convex loss functions (MRE-NC), whose expected error is bounded by $\max\big(1/\sqrt{n}(mB)^{1/d}, 1/\sqrt{mn}\big)$, up to polylogarithmic factors. We also provide a matching lower bound on the performance of any algorithm, showing that MRE-NC is order-optimal in terms of $n$ and $m$. Experiments on synthetic and real data show the effectiveness of MRE-NC in the distributed learning of model parameters for non-convex loss functions.
Time-to-event endpoints are increasingly popular in phase II cancer trials. The standard statistical tool for such endpoints in one-armed trials is the one-sample log-rank test. It is widely known that the asymptotics underpinning the correctness of this test do not take full effect for small sample sizes. There have already been some attempts to solve this problem. While some do not allow easy power and sample size calculations, others lack a clear theoretical motivation and require further considerations. The problem itself can partly be attributed to the dependence between the compensated counting process and its variance estimator. We provide a framework in which the variance estimator can be flexibly adapted to the situation at hand while maintaining its asymptotic properties. As an example, we suggest a variance estimator that is uncorrelated with the compensated counting process. Furthermore, we provide sample size and power calculations for any approach fitting into our framework. Finally, we compare several methods via simulation studies and the hypothetical setup of a phase II trial based on real-world data.
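For reference, the classical one-sample log-rank statistic compares the observed number of events with the number expected under a reference cumulative hazard, standardized by the usual variance estimate based on the expected count. A minimal sketch of the standard test (the exponential null, censoring distribution, and sample size are illustrative assumptions, not the paper's new variance estimator):

```python
import numpy as np

def one_sample_logrank(times, events, cum_hazard):
    """Classical one-sample log-rank statistic.

    times: follow-up times (event or censoring), events: 1 if event observed,
    cum_hazard: reference cumulative hazard Lambda_0(t) under H0.
    """
    O = events.sum()                 # observed number of events
    E = cum_hazard(times).sum()      # expected number of events under H0
    # E also serves as the usual variance estimate of the
    # compensated counting process O - E
    return (O - E) / np.sqrt(E)

# toy data where H0 holds: exponential event times with rate 0.5,
# independent exponential censoring
rng = np.random.default_rng(1)
t = rng.exponential(2.0, size=50)    # true hazard rate 0.5
c = rng.exponential(4.0, size=50)    # censoring times
times = np.minimum(t, c)
events = (t <= c).astype(float)
z = one_sample_logrank(times, events, lambda s: 0.5 * s)  # Lambda_0(t) = 0.5 t
```

Under the null, `z` is approximately standard normal in large samples; the small-sample distortion of exactly this approximation is what the abstract's framework addresses.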
Gaussian processes that can be decomposed into a smooth mean function and a stationary autocorrelated noise process are considered, and a fully automatic nonparametric method for the simultaneous estimation of the mean and auto-covariance functions of such processes is developed. Our empirical Bayes approach is data-driven, numerically efficient, and allows for the construction of confidence sets for the mean function. Performance is demonstrated in simulations and real data analysis. The method is implemented in the R package eBsc that accompanies the paper.
Solving semiparametric models can be computationally challenging because the dimension of the parameter space may grow large with increasing sample size. The classical Newton's method becomes quite slow and unstable due to the intensive calculation of the large Hessian matrix and its inverse. Iterative methods that separately update the parameters of the finite-dimensional component and the infinite-dimensional component have been developed to speed up single iterations, but they often take more steps until convergence, or sometimes even sacrifice estimation precision due to sub-optimal update directions. We propose a computationally efficient implicit profiling algorithm that simultaneously achieves the fast iteration step of iterative methods and the optimal update direction of Newton's method, by profiling out the infinite-dimensional component as a function of the finite-dimensional component. We devise a first-order approximation for the case when the profiling function has no explicit analytical form. We show that our implicit profiling method always solves any local quadratic programming problem in two steps. In two numerical experiments, under semiparametric transformation models and GARCH-M models, we demonstrate the computational efficiency and statistical precision of our implicit profiling method.
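The two-step behavior on quadratic problems can be checked on a toy block-quadratic objective: profiling out the second block and taking a single Newton step on the profiled objective (whose Hessian is the Schur complement) solves the full system exactly. This is only an illustrative sketch of the profiling idea on a finite-dimensional surrogate, not the paper's algorithm; the block sizes and matrices are arbitrary.

```python
import numpy as np

rng = np.random.default_rng(0)
p, q = 2, 5                              # "finite-dim" and "infinite-dim" block sizes
M = rng.standard_normal((p + q, p + q))
Q = M @ M.T + (p + q) * np.eye(p + q)    # positive definite Hessian of f(a, b)
c = rng.standard_normal(p + q)           # objective: 0.5 z^T Q z - c^T z, z = (a, b)
A, B, C = Q[:p, :p], Q[:p, p:], Q[p:, p:]
u, v = c[:p], c[p:]

def profile_b(a):
    # inner minimizer b(a) = argmin_b f(a, b), i.e. solve C b = v - B^T a
    return np.linalg.solve(C, v - B.T @ a)

# one Newton step on the profiled objective g(a) = f(a, b(a)):
# grad g(a) = A a + B b(a) - u,   Hess g = A - B C^{-1} B^T (Schur complement)
a = np.zeros(p)
g_grad = A @ a + B @ profile_b(a) - u
g_hess = A - B @ np.linalg.solve(C, B.T)
a = a - np.linalg.solve(g_hess, g_grad)  # step 1: update the finite-dim block
b = profile_b(a)                         # step 2: profile out the other block
z = np.concatenate([a, b])               # (a, b) now solves Q z = c exactly
```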
The Chebyshev or $\ell_{\infty}$ estimator is an unconventional alternative to ordinary least squares for solving linear regressions. It is defined as the minimizer of the $\ell_{\infty}$ objective function \begin{align*} \hat{\boldsymbol{\beta}} := \arg\min_{\boldsymbol{\beta}} \|\boldsymbol{Y} - \mathbf{X}\boldsymbol{\beta}\|_{\infty}. \end{align*} The asymptotic distribution of the Chebyshev estimator under a fixed number of covariates was recently studied (Knight, 2020), yet finite-sample guarantees and generalizations to high-dimensional settings remain open. In this paper, we develop non-asymptotic upper bounds on the estimation error $\|\hat{\boldsymbol{\beta}}-\boldsymbol{\beta}^*\|_2$ of the Chebyshev estimator $\hat{\boldsymbol{\beta}}$ in a regression setting with uniformly distributed noise $\varepsilon_i\sim U([-a,a])$, where $a$ is either known or unknown. Under relatively mild assumptions on the (random) design matrix $\mathbf{X}$, we can bound the error rate by $\frac{C_p}{n}$ with high probability, for some constant $C_p$ depending on the dimension $p$ and the law of the design. Furthermore, we illustrate that there exist designs for which the Chebyshev estimator is (nearly) minimax optimal. In addition, we show that "Chebyshev's LASSO" has advantages over the regular LASSO in high-dimensional situations, provided that the noise is uniform. Specifically, we argue that it achieves a much faster rate of estimation under certain assumptions on the growth rate of the sparsity level and the ambient dimension with respect to the sample size.
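The estimator is easy to compute in practice: the $\ell_\infty$ objective admits a standard linear-programming reformulation, sketched below with `scipy.optimize.linprog`. The data-generating numbers are illustrative assumptions; only the LP reformulation itself is standard.

```python
import numpy as np
from scipy.optimize import linprog

def chebyshev_regression(X, y):
    """ell_infty regression via its standard LP reformulation:
    min_{beta, t} t  subject to  -t <= y - X beta <= t."""
    n, p = X.shape
    c = np.zeros(p + 1)
    c[-1] = 1.0                                  # minimize the slack t
    A_ub = np.block([[X, -np.ones((n, 1))],      #  X beta - t <=  y
                     [-X, -np.ones((n, 1))]])    # -X beta - t <= -y
    b_ub = np.concatenate([y, -y])
    bounds = [(None, None)] * p + [(0, None)]    # beta free, t >= 0
    res = linprog(c, A_ub=A_ub, b_ub=b_ub, bounds=bounds)
    return res.x[:p]

# toy regression with uniform noise, the setting where the estimator shines
rng = np.random.default_rng(0)
n, p = 200, 3
X = rng.standard_normal((n, p))
beta_star = np.array([1.0, -2.0, 0.5])
y = X @ beta_star + rng.uniform(-0.1, 0.1, size=n)   # eps_i ~ U([-a, a]), a = 0.1
beta_hat = chebyshev_regression(X, y)
```

By construction, the fitted maximum residual is no larger than that of the true coefficients, so with bounded uniform noise the estimate lands close to $\boldsymbol{\beta}^*$.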
Quantile regression is a statistical method for estimating conditional quantiles of a response variable. Moreover, it is well known that quantile regression is more robust to outliers than $l_2$-based mean estimation methods. Using the fused lasso penalty over a $K$-nearest-neighbors graph, we propose an adaptive quantile estimator in a non-parametric setup. We show that the estimator attains the optimal rate of $n^{-1/d}$, up to a logarithmic factor, under mild assumptions on the data generation mechanism of the $d$-dimensional data. We develop algorithms to compute the estimator and discuss a methodology for model selection. Numerical experiments on simulated and real data demonstrate clear advantages of the proposed estimator over state-of-the-art methods.
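As background, quantile regression rests on the check (pinball) loss: minimizing its average over a constant recovers the empirical $\tau$-quantile. A minimal sketch of this fact (all numbers illustrative; this is not the proposed fused-lasso estimator, which additionally penalizes differences over a $K$-nearest-neighbors graph):

```python
import numpy as np

def pinball_loss(u, tau):
    """Check (pinball) loss: tau * u for u >= 0, (tau - 1) * u otherwise."""
    return np.where(u >= 0, tau * u, (tau - 1) * u)

# minimizing the average pinball loss over a constant fit recovers
# the empirical tau-quantile of the sample
rng = np.random.default_rng(0)
y = rng.standard_normal(1000)
tau = 0.9
grid = np.linspace(y.min(), y.max(), 2001)
losses = [pinball_loss(y - g, tau).mean() for g in grid]
q_hat = grid[int(np.argmin(losses))]
```

Because the loss weights positive and negative residuals asymmetrically ($\tau$ vs. $1-\tau$), large outliers influence the fit only through their sign, which is the source of the robustness mentioned above.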
In this paper, we study the weight spectrum of linear codes with \emph{super-linear} field size and use the probabilistic method to show that for nearly all such codes, the corresponding weight spectrum is very close to that of a maximum distance separable (MDS) code.
In order to avoid the curse of dimensionality frequently encountered in Big Data analysis, there has been vast development in the field of linear and nonlinear dimension reduction techniques in recent years. These techniques (sometimes referred to as manifold learning) assume that the scattered input data lies on a lower-dimensional manifold; thus, the high-dimensionality problem can be overcome by learning the lower-dimensional behavior. However, in real-life applications, data is often very noisy. In this work, we propose a method to approximate $\mathcal{M}$, a $d$-dimensional $C^{m+1}$ smooth submanifold of $\mathbb{R}^n$ ($d \ll n$), based upon noisy scattered data points (i.e., a data cloud). We assume that the data points are located "near" the lower-dimensional manifold and suggest a non-linear moving least-squares projection onto an approximating $d$-dimensional manifold. Under some mild assumptions, the resulting approximant is shown to be infinitely smooth and of high approximation order (namely, $O(h^{m+1})$, where $h$ is the fill distance and $m$ is the degree of the local polynomial approximation). The method presented here assumes no analytic knowledge of the approximated manifold, and the approximation algorithm is linear in the large dimension $n$. Furthermore, the approximating manifold can serve as a framework for performing operations directly on the high-dimensional data in a computationally efficient manner. This way, the preparatory step of dimension reduction, which induces distortions in the data, can be avoided altogether.
Implicit probabilistic models are models defined naturally in terms of a sampling procedure, and they often induce a likelihood function that cannot be expressed explicitly. We develop a simple method for estimating parameters in implicit models that does not require knowledge of the form of the likelihood function or any derived quantities, but can be shown to be equivalent to maximizing the likelihood under some conditions. Our result holds in the non-asymptotic parametric setting, where both the capacity of the model and the number of data examples are finite. We also demonstrate encouraging experimental results.
We consider the task of learning the parameters of a {\em single} component of a mixture model, in the case where we are given {\em side information} about that component; we call this the "search problem" in mixture models. We would like to solve this with computational and sample complexity lower than that of solving the overall original problem, where one learns the parameters of all components. Our main contributions are the development of a simple but general model for the notion of side information, and a corresponding simple matrix-based algorithm for solving the search problem in this general setting. We then specialize this model and algorithm to four common scenarios: Gaussian mixture models, LDA topic models, subspace clustering, and mixed linear regression. For each of these, we show that if (and only if) the side information is informative, we obtain parameter estimates with greater accuracy and improved computational complexity compared with existing moment-based mixture model algorithms (e.g., tensor methods). We also illustrate several natural ways one can obtain such side information for specific problem instances. Our experiments on real data sets (NY Times, Yelp, BSDS500) further demonstrate the practicality of our algorithms, showing significant improvements in runtime and accuracy.