In Stein's method, one characterizes probability distributions through differential operators. We use these characterizations to obtain a new class of point estimators for i.i.d.\ observations. These so-called Stein estimators satisfy desirable classical properties such as consistency and asymptotic normality. As a consequence of the usually simple form of the operator, we obtain explicit estimators in cases where standard methods such as maximum likelihood estimation (MLE) require a numerical procedure to calculate the estimate. In addition, our approach lets one choose from a large class of test functions, which makes it possible to improve significantly on the moment estimator. For several probability laws, we can determine an estimator whose asymptotic behaviour is close to efficient. Moreover, we identify data-dependent test functions that yield asymptotically efficient estimators, and we give a sequence of explicit Stein estimators that converges to the MLE.
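As a minimal sketch of the idea (our notation, not the paper's): for the exponential distribution, the Stein characterization $E[f'(X)] = \lambda E[f(X)]$ for test functions with $f(0)=0$ turns directly into an explicit estimator by equating empirical moments; the choice $f(x)=x$ recovers the MLE $1/\bar{X}$, while other test functions give other consistent estimators.

```python
import numpy as np

rng = np.random.default_rng(0)
lam_true = 2.0
x = rng.exponential(scale=1.0 / lam_true, size=5000)

# Stein characterization of Exp(lambda): E[f'(X)] = lambda * E[f(X)]
# for test functions f with f(0) = 0.  Equating empirical moments gives
# the explicit estimator  lambda_hat = sum f'(X_i) / sum f(X_i).
def stein_estimate(x, f, fprime):
    return fprime(x).sum() / f(x).sum()

# f(x) = x recovers the MLE 1 / mean(X); other f give other consistent estimators.
lam_mle = stein_estimate(x, lambda t: t, lambda t: np.ones_like(t))
lam_alt = stein_estimate(x, lambda t: 1 - np.exp(-t), lambda t: np.exp(-t))
print(lam_mle, lam_alt)
```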
We consider the framework of penalized estimation where the penalty term is given by a real-valued polyhedral gauge, which encompasses methods such as the LASSO (and many variants thereof, such as the generalized LASSO), SLOPE, OSCAR, PACS and others. Each of these estimators can uncover a different structure or ``pattern'' of the unknown parameter vector. We define a general notion of patterns based on subdifferentials and formalize an approach to measure their complexity. For pattern recovery, we provide a minimal condition, the so-called accessibility condition, for a particular pattern to be detected by the procedure with positive probability. Using our approach, we also introduce the stronger noiseless recovery condition. For the LASSO, it is well known that the irrepresentability condition is necessary for pattern recovery with probability larger than $1/2$, and we show that the noiseless recovery condition plays exactly the same role, thereby extending and unifying the irrepresentability condition of the LASSO to a broad class of penalized estimators. We show that the noiseless recovery condition can be relaxed when turning to thresholded penalized estimators, extending the idea of the thresholded LASSO: we prove that the accessibility condition is already sufficient (and necessary) for sure pattern recovery by thresholded penalized estimation, provided that the signal of the pattern is large enough. Throughout the article, we demonstrate how our findings can be interpreted through a geometrical lens.
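For the LASSO, the pattern in question is the sign vector of the estimated coefficients. A minimal simulation sketch (using scikit-learn on arbitrary synthetic data; it checks recovery empirically rather than verifying the accessibility condition itself):

```python
import numpy as np
from sklearn.linear_model import Lasso

rng = np.random.default_rng(1)
n, p = 100, 10
X = rng.standard_normal((n, p))
beta = np.array([3.0, -2.0] + [0.0] * (p - 2))   # true pattern: (+, -, 0, ..., 0)
y = X @ beta + 0.5 * rng.standard_normal(n)

fit = Lasso(alpha=0.1).fit(X, y)
pattern = np.sign(fit.coef_).astype(int)          # recovered sign pattern
print(pattern)
```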
This paper is concerned with goal-oriented a posteriori error estimation for nonlinear functionals in the context of nonlinear variational problems solved with continuous Galerkin finite element discretizations. A two-level, or discrete, adjoint-based approach to error estimation is considered. The traditional method to derive an error estimate in this context requires linearizing both the nonlinear variational form and the nonlinear functional of interest, which introduces linearization errors into the error estimate. In this paper, we investigate these linearization errors. In particular, we develop a novel discrete goal-oriented error estimate that accounts for the traditionally neglected nonlinear terms at the expense of greater computational cost. We demonstrate how this error estimate can be used to drive mesh adaptivity. We show that accounting for linearization errors in the error estimate can improve its effectivity for several nonlinear model problems and quantities of interest. We also demonstrate that an adaptive strategy based on the newly proposed estimate can lead to more accurate approximations of the nonlinear functional with fewer degrees of freedom when compared to uniform refinement and traditional adjoint-based approaches.
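In schematic form (our notation, not the paper's): writing the problem as $N(u; v) = 0$ for all test functions $v$ and the quantity of interest as $J(\cdot)$, the classical adjoint-based estimate reads
\[
J(u) - J(u_h) \approx -\,N(u_h;\, z_h),
\]
where $z_h$ solves the adjoint problem obtained by linearizing $N$ and $J$ about $u_h$. The neglected remainder is higher order in the error, and it is (approximations of) this remainder that the proposed estimate retains.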
This work presents a nonparametric estimator for the cumulative distribution function (CDF) of the jump-size distribution for a storage system with compound Poisson input. The workload process is observed according to an independent Poisson sampling process. The nonparametric estimator is constructed by first estimating the characteristic function (CF) and then applying an inversion formula. The convergence rate of the CF estimator at $s$ is shown to be of the order of $s^2/n$, where $n$ is the sample size. This convergence rate is leveraged to explore the bias-variance tradeoff of the inversion estimator. It is demonstrated that, within a certain class of continuous distributions, the risk, in terms of the mean squared error (MSE), is uniformly bounded by $C n^{-\frac{\eta}{1+\eta}}$, where $C$ is a positive constant and the parameter $\eta>0$ depends on the smoothness of the underlying class of distributions. A heuristic method is further developed to address the case of an unknown rate of the compound Poisson input process.
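As a generic illustration of the inversion step only (not the paper's workload-based CF estimator), the Gil-Pelaez formula turns an estimated CF into a CDF estimate:

```python
import numpy as np

def ecf(t, sample):
    """Empirical characteristic function at frequencies t."""
    return np.exp(1j * np.outer(t, sample)).mean(axis=1)

def cdf_by_inversion(x, sample, t_max=50.0, m=2000):
    """Gil-Pelaez inversion: F(x) = 1/2 - (1/pi) * int_0^inf Im(e^{-itx} phi(t)) / t dt,
    truncated at t_max and computed with the trapezoidal rule."""
    t = np.linspace(1e-4, t_max, m)        # avoid the removable singularity at t = 0
    integrand = np.imag(np.exp(-1j * t * x) * ecf(t, sample)) / t
    dt = t[1] - t[0]
    integral = dt * (integrand.sum() - 0.5 * (integrand[0] + integrand[-1]))
    return 0.5 - integral / np.pi

sample = np.random.default_rng(2).exponential(size=1000)  # stand-in for estimated jumps
print(cdf_by_inversion(1.0, sample))                      # roughly 1 - e^{-1} ~ 0.632
```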
We adopt a maximum-likelihood framework to estimate the parameters of a stochastic susceptible-infected-recovered (SIR) model with contact tracing on a rooted random tree. Given the number of detectees per index case, our estimator allows us to determine the degree distribution of the random tree as well as the tracing probability. Since not all infectees are discovered via contact tracing, this estimation is non-trivial. To keep the estimator simple and stable, we develop an approximation suited for realistic situations (small contact tracing probability, or small probability of detecting index cases). In this approximation, the only epidemiological parameter entering the estimator is $R_0$. The estimator is tested in a simulation study and is furthermore applied to COVID-19 contact tracing data from India. The simulation study underlines the efficiency of the method. For the empirical COVID-19 data, we compare different degree distributions and perform a sensitivity analysis. We find that, in particular, power-law and negative binomial degree distributions fit the data well, and that the tracing probability is rather large. The sensitivity analysis shows no strong dependency of the estimates on the reproduction number. Finally, we discuss the relevance of our findings.
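As a purely hypothetical sketch of such a likelihood (the model below is our simplification, not the paper's): suppose each index case has $K$ secondary cases with $K$ negative binomial with mean $R_0$, and each secondary case is traced independently with probability $p$, so the observed detectee count satisfies $D \mid K \sim \mathrm{Binomial}(K, p)$; the tracing probability can then be estimated by numerically maximizing the marginal likelihood.

```python
import numpy as np
from scipy import stats, optimize

# Hypothetical simplified model (our illustration, not the paper's likelihood):
# K secondary cases per index case, K ~ NegBin(mean R0, dispersion k); each
# secondary case is traced independently with prob. p, so D | K ~ Binomial(K, p).
R0, K_MAX = 3.0, 200

def negloglik(params, d_obs):
    p, k = params
    ks = np.arange(K_MAX + 1)
    pk = stats.nbinom.pmf(ks, k, k / (k + R0))       # offspring pmf with mean R0
    return -sum(np.log((pk * stats.binom.pmf(d, ks, p)).sum() + 1e-300)
                for d in d_obs)

rng = np.random.default_rng(3)
k_latent = rng.negative_binomial(0.5, 0.5 / (0.5 + R0), size=500)
d_obs = rng.binomial(k_latent, 0.4)                  # observed detectee counts

res = optimize.minimize(negloglik, x0=[0.3, 1.0], args=(d_obs,),
                        bounds=[(1e-3, 1 - 1e-3), (1e-2, 50.0)])
print(res.x)                                         # estimated (p, dispersion)
```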
We consider detecting the evolutionary oscillatory pattern of a signal when it is contaminated by non-stationary noise with a complex, time-varying data-generating mechanism. A high-dimensional dense progressive periodogram test is proposed to accurately detect all oscillatory frequencies. A phase-adjusted local change-point detection algorithm is then applied in the frequency domain to detect the locations at which the oscillatory pattern changes. Our method is shown to detect all oscillatory frequencies and the corresponding change points within an accurate range with a prescribed probability, asymptotically. This study is motivated by oscillatory frequency estimation and change-point detection problems encountered in physiological time series analysis. An application to spindle detection and estimation in sleep EEG data illustrates the usefulness of the proposed methodology. A Gaussian approximation scheme and an overlapping-block multiplier bootstrap methodology for sums of complex-valued, high-dimensional, non-stationary time series without variance lower bounds are established, which may be of independent interest.
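As a simplified real-valued, univariate illustration of the bootstrap ingredient (the paper treats complex-valued, high-dimensional, non-stationary series):

```python
import numpy as np

def block_multiplier_bootstrap(x, block_len, n_boot, rng):
    """Overlapping-block multiplier bootstrap for the sum of a (possibly
    non-stationary) series x: each draw perturbs the centered overlapping
    block sums with i.i.d. standard normal multipliers."""
    n = len(x)
    m = n - block_len + 1
    csum = np.concatenate(([0.0], np.cumsum(x)))
    B = csum[block_len:] - csum[:-block_len]      # block sums B_j = x_j + ... + x_{j+L-1}
    e = rng.standard_normal((n_boot, m))
    return (e @ (B - B.mean())) / np.sqrt(m * block_len)

rng = np.random.default_rng(5)
x = rng.standard_normal(1000)                     # placeholder series
draws = block_multiplier_bootstrap(x, 25, 2000, rng)
print(np.quantile(np.abs(draws), 0.95))           # bootstrap critical value
```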
There are multiple cluster randomised trial designs that vary in when the clusters cross between control and intervention states, when observations are made within clusters, and how many observations are made at each time point. Identifying the most efficient study design is complex, however, owing to the correlation between observations within clusters and over time. In this article, we present a review of statistical and computational methods for identifying optimal cluster randomised trial designs. We also adapt methods from the experimental design literature on designs with correlated observations to the cluster trial context. We identify three broad classes of methods: using exact formulae for the variance of the treatment effect estimator under specific models to derive algorithms or weights for cluster sequences; generalised methods for estimating weights for experimental units; and combinatorial optimisation algorithms that select an optimal subset of experimental units. We also discuss methods for rounding weights to whole numbers of clusters, and extensions to non-Gaussian models. We present results from several cluster trial examples comparing the different methods, including determining the optimal allocation of clusters across a set of cluster sequences and selecting the optimal number of single observations to make in each cluster-period, for both Gaussian and non-Gaussian models and for exchangeable and exponential-decay covariance structures.
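A minimal sketch of the common computational core (our illustration, under a Hussey-and-Hughes-type exchangeable model with arbitrary numbers): candidate designs are compared through the variance of the GLS treatment-effect estimator, the relevant diagonal entry of $(\sum_c X_c^\top V_c^{-1} X_c)^{-1}$.

```python
import numpy as np

def treatment_variance(design, m=10, rho=0.05, sigma2=1.0):
    """Variance of the GLS treatment-effect estimator: `design` is a
    (clusters x periods) 0/1 treatment matrix, m observations per
    cluster-period, exchangeable within-cluster correlation rho."""
    C, T = design.shape
    M = np.zeros((T + 1, T + 1))
    for c in range(C):
        nc = T * m
        V = sigma2 * ((1 - rho) * np.eye(nc) + rho * np.ones((nc, nc)))
        X = np.column_stack([
            np.kron(np.eye(T), np.ones((m, 1))),   # period effects
            np.repeat(design[c], m)[:, None],      # treatment indicator
        ])
        M += X.T @ np.linalg.inv(V) @ X            # information matrix
    return np.linalg.inv(M)[-1, -1]                # treatment-effect variance

sw = np.array([[0, 1, 1, 1],
               [0, 0, 1, 1],
               [0, 0, 0, 1]], dtype=float)         # stepped-wedge design
pa = np.array([[1, 1, 1, 1],
               [1, 1, 1, 1],
               [0, 0, 0, 0]], dtype=float)         # parallel-arms design
print(treatment_variance(sw), treatment_variance(pa))
```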
Manifold learning is a central task in modern statistics and data science. Many datasets (cells, documents, images, molecules) can be represented as point clouds embedded in a high-dimensional ambient space; however, the degrees of freedom intrinsic to the data are usually far fewer than the number of ambient dimensions. Detecting a latent manifold along which the data are embedded is a prerequisite for a wide family of downstream analyses. Real-world datasets are subject to noisy observation and sampling, so distilling information about the underlying manifold is a major challenge. We propose a method for manifold learning that utilises a symmetric version of optimal transport with a quadratic regularisation to construct a sparse and adaptive affinity matrix, which can be interpreted as a generalisation of the bistochastic kernel normalisation. We prove that the resulting kernel is consistent with a Laplace-type operator in the continuous limit, establish robustness to heteroskedastic noise, and illustrate these results in simulations. We identify a highly efficient computational scheme for computing this optimal transport for discrete data and demonstrate that it outperforms competing methods in a set of examples.
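For context, the bistochastic baseline that the method generalises can be computed by symmetric Sinkhorn scaling of a Gaussian kernel (a sketch with arbitrary parameters; the paper's quadratically regularised optimal transport additionally yields sparsity):

```python
import numpy as np
from scipy.spatial.distance import cdist

def bistochastic_kernel(X, eps=0.5, n_iter=500, tol=1e-10):
    """Symmetric Sinkhorn scaling of a Gaussian kernel: find d > 0 such that
    W = diag(d) K diag(d) has unit row (hence, by symmetry, column) sums."""
    K = np.exp(-cdist(X, X, "sqeuclidean") / eps)
    d = np.ones(len(X))
    for _ in range(n_iter):
        d_new = np.sqrt(d / (K @ d))    # damped (geometric-mean) update
        if np.max(np.abs(d_new - d)) < tol:
            d = d_new
            break
        d = d_new
    return d[:, None] * K * d[None, :]

X = np.random.default_rng(6).standard_normal((200, 3))  # toy point cloud
W = bistochastic_kernel(X)
print(W.sum(axis=1)[:5])                                # each row sums to ~1
```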
In this paper, we study a priori error estimates for the finite element approximation of the nonlinear Schr\"{o}dinger-Poisson model. The electron density is defined by an infinite series over all eigenvalues of the Hamiltonian operator. To establish the error estimate, we present a unified theory of error estimates for a class of nonlinear problems. The theory is based on three conditions: 1) the original problem has a solution $u$ which is the fixed point of a compact operator $\mathcal{A}$; 2) $\mathcal{A}$ is Fr\'{e}chet-differentiable at $u$ and $\mathcal{I}-\mathcal{A}'[u]$ has a bounded inverse in a neighborhood of $u$; and 3) there exists an operator $\mathcal{A}_h$ which converges to $\mathcal{A}$ in that neighborhood of $u$. The theory states that $\mathcal{A}_h$ has a fixed point $u_h$ which solves the approximate problem, and it gives the error estimate between $u$ and $u_h$ without assumptions on the well-posedness of the approximate problem. We apply the unified theory to the finite element approximation of the Schr\"{o}dinger-Poisson model and obtain an optimal error estimate between the numerical solution and the exact solution. Numerical experiments are presented to verify the convergence rates of the numerical solutions.
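In schematic form, the conclusion of such a fixed-point theory is an estimate of the type
\[
\| u - u_h \| \;\le\; C \, \| (\mathcal{A} - \mathcal{A}_h)\, u \|,
\]
so that the discretization error is controlled by the consistency error of $\mathcal{A}_h$ at the exact fixed point (this is schematic; the precise norms and constants are those established in the paper).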
In this article we perform an asymptotic analysis of parallel Bayesian logspline density estimators. Such estimators are useful for analyzing datasets that are partitioned into subsets and stored in separate databases, without the capability of accessing the full dataset from a single computer. The parallel estimator we introduce is in the spirit of a kernel density estimator introduced in recent studies. We provide a numerical procedure that produces the normalized density estimator itself in place of a sampling algorithm. We then derive an error bound for the mean integrated squared error of the full-dataset posterior estimator. The error bound depends on the parameters that arise in logspline density estimation and in the numerical approximation procedure. In our analysis, we identify the parameter choices for which the error bound scales optimally with the number of samples. This provides our method with increased estimation accuracy while keeping the computational cost low.
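As a hypothetical sketch of the combination step (with a kernel density estimate standing in for each subset's logspline fit): the subset estimates are multiplied pointwise on a grid and renormalized numerically, producing the normalized density itself rather than samples from it.

```python
import numpy as np
from scipy.stats import gaussian_kde

# Combine subset density estimates by pointwise multiplication on a grid,
# then renormalize numerically (KDE is a stand-in for the logspline fit).
rng = np.random.default_rng(7)
subsets = np.array_split(rng.normal(loc=1.0, size=3000), 5)

grid = np.linspace(-3, 5, 1000)
log_prod = np.zeros_like(grid)
for s in subsets:
    log_prod += np.log(gaussian_kde(s)(grid) + 1e-300)   # log of subset estimate

dens = np.exp(log_prod - log_prod.max())                 # stabilize before normalizing
dx = grid[1] - grid[0]
dens /= dens.sum() * dx                                  # normalized density on the grid
```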
Choice modeling is at the core of many problems in economics, operations, and marketing. In this paper, we propose a fundamental characterization of choice functions that encompasses a wide variety of extant choice models. We demonstrate how nonparametric estimators such as neural nets can easily approximate such functionals and overcome the curse of dimensionality that is inherent in the nonparametric estimation of choice functions. We demonstrate through extensive simulations that our proposed functionals can flexibly capture underlying consumer behavior in a completely data-driven fashion and outperform traditional parametric models. As demand settings often exhibit endogenous features, we extend our framework to incorporate estimation under endogenous features. Further, we describe a formal inference procedure for constructing valid confidence intervals on objects of interest such as price elasticity. Finally, to assess the practical applicability of our estimator, we utilize a real-world dataset from S. Berry, Levinsohn, and Pakes (1995). Our empirical analysis confirms that the estimator generates realistic and comparable own- and cross-price elasticities that are consistent with those reported in the existing literature.
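A minimal sketch of the simplest estimator in this spirit (PyTorch, fully synthetic data; the paper's characterization is more general): a small network maps item features to utilities, and a softmax over the offered assortment yields choice probabilities, trained by maximum likelihood.

```python
import torch
import torch.nn as nn

class NeuralChoice(nn.Module):
    """MLP utilities + softmax over the offered assortment."""
    def __init__(self, n_features, hidden=32):
        super().__init__()
        self.util = nn.Sequential(
            nn.Linear(n_features, hidden), nn.ReLU(), nn.Linear(hidden, 1))

    def forward(self, items, offered):
        # items: (batch, n_items, n_features); offered: (batch, n_items) 0/1 mask
        u = self.util(items).squeeze(-1)
        u = u.masked_fill(offered == 0, -1e9)   # unoffered items get ~zero probability
        return torch.log_softmax(u, dim=-1)

gen = torch.Generator().manual_seed(0)
items = torch.randn(256, 10, 5, generator=gen)   # synthetic item features (incl. price)
offered = torch.ones(256, 10)                    # full assortments, for simplicity
choices = torch.randint(0, 10, (256,), generator=gen)

model = NeuralChoice(n_features=5)
opt = torch.optim.Adam(model.parameters(), lr=1e-2)
for _ in range(200):                             # maximum-likelihood training loop
    opt.zero_grad()
    loss = nn.functional.nll_loss(model(items, offered), choices)
    loss.backward()
    opt.step()
```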