Estimating characteristics of domains (referred to as small areas) within a population from sample surveys of the population is an important problem in survey statistics. In this paper, we consider model-based small area estimation under the nested error regression model. We discuss the construction of mixed model estimators (empirical best linear unbiased predictors, EBLUPs) of small area means and the conditional linear predictors of small area means. Under the asymptotic framework of increasing numbers of small areas and increasing numbers of units in each area, we establish asymptotic linearity results and central limit theorems for these estimators which allow us to establish asymptotic equivalences between estimators, approximate their sampling distributions, obtain simple expressions for and construct simple estimators of their asymptotic mean squared errors, and justify asymptotic prediction intervals. We present model-based simulations that show that in quite small, finite samples, our mean squared error estimator performs as well as or better than the widely used estimator of \cite{prasad1990estimation} and is much simpler, so it is easier to interpret. We also carry out a design-based simulation using real data on consumer expenditure on fresh milk products to explore the design-based properties of the mixed model estimators. We explain and interpret some surprising simulation results through analysis of the population and further design-based simulations. The simulations highlight important differences between the model- and design-based properties of mixed model estimators in small area estimation.
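The nested error regression model and the shrinkage form of the EBLUP of a small area mean can be illustrated with a short numerical sketch. The following is a minimal, hypothetical example, not the paper's code: the data are simulated, OLS stands in for GLS, and the variance components are taken as known rather than estimated.

```python
import numpy as np

rng = np.random.default_rng(0)

# Nested error regression model: y_ij = beta0 + beta1 * x_ij + u_i + e_ij,
# with area effects u_i ~ N(0, sigma_u^2) and unit errors e_ij ~ N(0, sigma_e^2).
m, n_i = 40, 10                               # areas, sampled units per area
beta = np.array([1.0, 2.0])
sigma_u, sigma_e = 0.5, 1.0

area = np.repeat(np.arange(m), n_i)
x = rng.normal(size=m * n_i)
u = rng.normal(scale=sigma_u, size=m)
y = beta[0] + beta[1] * x + u[area] + rng.normal(scale=sigma_e, size=m * n_i)

# Regression fit (OLS as a simple stand-in for GLS) and shrinkage factor.
X = np.column_stack([np.ones_like(x), x])
beta_hat = np.linalg.lstsq(X, y, rcond=None)[0]
gamma = sigma_u**2 / (sigma_u**2 + sigma_e**2 / n_i)   # variance components taken as known

# EBLUP of the area means: regression prediction at the area-level covariate mean
# plus a shrunken area-specific adjustment (the sample means stand in for the
# population means X-bar_i in this toy example).
xbar = np.array([x[area == i].mean() for i in range(m)])
ybar = np.array([y[area == i].mean() for i in range(m)])
fitted_area = beta_hat[0] + beta_hat[1] * xbar
mu_hat = fitted_area + gamma * (ybar - fitted_area)

print(np.round(mu_hat[:5], 3))
```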
We consider autocovariance operators of a stationary stochastic process on a Polish space that is embedded into a reproducing kernel Hilbert space. We investigate how empirical estimates of these operators converge along realizations of the process under various conditions. In particular, we examine ergodic and strongly mixing processes and obtain several asymptotic results as well as finite sample error bounds. We provide applications of our theory in terms of consistency results for kernel PCA with dependent data and the conditional mean embedding of transition probabilities. Finally, we use our approach to examine the nonparametric estimation of Markov transition operators and highlight how our theory can give a consistency analysis for a large family of spectral analysis methods including kernel-based dynamic mode decomposition.
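As one concrete instance of the theory, the conditional mean embedding of transition probabilities can be estimated from a single trajectory by kernel ridge regression on (state, next-state) pairs. The sketch below is illustrative only: the AR(1) process, Gaussian kernel, bandwidth, and regularisation constant are hypothetical choices, not the paper's.

```python
import numpy as np

rng = np.random.default_rng(1)

# A simple stationary Markov process (AR(1)) observed along a single trajectory.
n, phi = 500, 0.8
x = np.zeros(n + 1)
for t in range(n):
    x[t + 1] = phi * x[t] + rng.normal(scale=0.5)
X, Y = x[:-1], x[1:]                          # (state, next state) pairs

def gauss_kernel(a, b, bw=0.5):
    return np.exp(-(a[:, None] - b[None, :]) ** 2 / (2 * bw**2))

# Empirical conditional mean embedding of the transition kernel:
# E[f(Y) | X = x] is estimated by f(Y)^T alpha(x) with
# alpha(x) = (K_XX + n * lam * I)^{-1} k_X(x)   (kernel ridge regression form).
lam = 1e-3
K_reg = gauss_kernel(X, X) + n * lam * np.eye(n)

def cond_expectation(f_vals, x_new):
    alpha = np.linalg.solve(K_reg, gauss_kernel(X, np.atleast_1d(x_new)))
    return float(f_vals @ alpha)

# Sanity check: for the AR(1) truth, E[Y | X = 1.0] = phi * 1.0 = 0.8.
print(cond_expectation(Y, 1.0))
```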
We identify the average dose-response function (ADRF) for a continuously valued error-contaminated treatment by a weighted conditional expectation. We then estimate the weights nonparametrically by maximising a local generalised empirical likelihood subject to an expanding set of conditional moment equations incorporated into the deconvolution kernels. Thereafter, we construct a deconvolution kernel estimator of ADRF. We derive the asymptotic bias and variance of our ADRF estimator and provide its asymptotic linear expansion, which helps conduct statistical inference. To select our smoothing parameters, we adopt the simulation-extrapolation method and propose a new extrapolation procedure to stabilise the computation. Monte Carlo simulations and a real data study illustrate our method's practical performance.
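The simulation-extrapolation (SIMEX) idea referred to above can be sketched in its textbook form: re-estimate after adding extra simulated measurement error at several levels $\lambda$ and extrapolate the resulting estimates back to $\lambda=-1$. The example below uses a naive linear slope and a quadratic extrapolant purely for illustration; it is not the new extrapolation procedure proposed in the paper.

```python
import numpy as np

rng = np.random.default_rng(2)

# True model: Y = 2 X + eps, but we only observe W = X + U with known
# measurement-error standard deviation sigma_u.
n, sigma_u = 2000, 0.5
X = rng.normal(size=n)
W = X + rng.normal(scale=sigma_u, size=n)
Y = 2.0 * X + rng.normal(scale=0.3, size=n)

def naive_slope(w, y):
    """Naive least-squares slope of y on the error-prone covariate."""
    w = w - w.mean()
    return np.dot(w, y - y.mean()) / np.dot(w, w)

# SIMEX: add extra noise at levels lambda, average over B replicates,
# then extrapolate the estimates back to lambda = -1 (no measurement error).
lambdas = np.array([0.0, 0.5, 1.0, 1.5, 2.0])
B = 50
est = []
for lam in lambdas:
    reps = [naive_slope(W + np.sqrt(lam) * sigma_u * rng.normal(size=n), Y)
            for _ in range(B)]
    est.append(np.mean(reps))

# Quadratic extrapolation in lambda, evaluated at lambda = -1.
coef = np.polyfit(lambdas, est, deg=2)
print("naive:", est[0], " SIMEX-corrected:", np.polyval(coef, -1.0))
```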
Neural networks with random weights appear in a variety of machine learning applications, most prominently as the initialization of many deep learning algorithms and as a computationally cheap alternative to fully learned neural networks. In the present article, we enhance the theoretical understanding of random neural networks by addressing the following data separation problem: under what conditions can a random neural network make two classes $\mathcal{X}^-, \mathcal{X}^+ \subset \mathbb{R}^d$ (with positive distance) linearly separable? We show that a sufficiently large two-layer ReLU-network with standard Gaussian weights and uniformly distributed biases can solve this problem with high probability. Crucially, the number of required neurons is explicitly linked to geometric properties of the underlying sets $\mathcal{X}^-, \mathcal{X}^+$ and their mutual arrangement. This instance-specific viewpoint allows us to overcome the usual curse of dimensionality (exponential width of the layers) in non-pathological situations where the data carries low-complexity structure. We quantify the relevant structure of the data in terms of a novel notion of mutual complexity (based on a localized version of Gaussian mean width), which leads to sound and informative separation guarantees. We connect our result with related lines of work on approximation, memorization, and generalization.
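A quick numerical sanity check of the separation question is easy to set up: push two classes through an untrained two-layer ReLU network with Gaussian weights and uniform biases, then test whether the resulting features are linearly separable. The data, width, and bias range below are arbitrary illustrative choices, not quantities from the paper.

```python
import numpy as np

rng = np.random.default_rng(3)

# XOR-style data in the plane: not linearly separable in input space.
n_per = 100
centers = np.array([[1, 1], [-1, -1], [1, -1], [-1, 1]], dtype=float)
labels = np.array([1, 1, -1, -1])
X = np.vstack([c + 0.1 * rng.normal(size=(n_per, 2)) for c in centers])
y = np.repeat(labels, n_per).astype(float)

# One hidden layer of random ReLU features: Gaussian weights, uniform biases.
width = 1000
W = rng.normal(size=(2, width))
b = rng.uniform(-2.0, 2.0, size=width)
F = np.column_stack([np.maximum(X @ W + b, 0.0), np.ones(len(y))])  # + bias feature

# Perceptron on the random features; it terminates iff the features are separable.
w = np.zeros(F.shape[1])
for _ in range(500):                          # epochs
    updated = False
    for i in range(len(y)):
        if y[i] * (F[i] @ w) <= 0:
            w += y[i] * F[i]
            updated = True
    if not updated:
        break

print("linearly separable after random ReLU layer:",
      bool(np.all(y * (F @ w) > 0)))
```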
In the present paper we prove a new theorem that yields an exact updating formula for linear regression model residuals, allowing the segmented cross-validation residuals to be calculated for any choice of cross-validation strategy without refitting the model. The required matrix inversions are limited in size by the cross-validation segment sizes and can be executed efficiently in parallel. The well-known formula for leave-one-out cross-validation follows as a special case of our theorem. In situations where the cross-validation segments consist of small groups of repeated measurements, we suggest a heuristic strategy for fast serial approximation of the cross-validated residuals and the associated PRESS statistic. We also suggest strategies for quick estimation of the exact minimum PRESS value and of the full PRESS function over a selected interval of regularisation values. The computational effectiveness of the resulting parameter selection for Ridge/Tikhonov regression modelling is demonstrated in several practical applications.
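For ridge regression with a fixed penalty, a block form of such an updating formula is $e^{\mathrm{CV}}_S = (I - H_{SS})^{-1} e_S$, where $H$ is the ridge hat matrix, $e$ the full-fit residuals, and $S$ the held-out segment; the familiar leave-one-out formula $e_i/(1-h_{ii})$ is the $|S|=1$ case. The sketch below checks this identity against explicit refitting; the penalty value is hypothetical, and this is only an illustration of the kind of update involved, not the paper's general theorem.

```python
import numpy as np

rng = np.random.default_rng(4)

n, p, lam = 60, 8, 1.0                        # lam: hypothetical ridge penalty
X = rng.normal(size=(n, p))
y = X @ rng.normal(size=p) + rng.normal(scale=0.5, size=n)

A_inv = np.linalg.inv(X.T @ X + lam * np.eye(p))
H = X @ A_inv @ X.T                           # ridge "hat" matrix
e = y - H @ y                                 # ordinary (full-fit) residuals

def cv_residuals_update(segment):
    """Segmented CV residuals via the block update (I - H_SS)^{-1} e_S."""
    S = np.asarray(segment)
    return np.linalg.solve(np.eye(len(S)) - H[np.ix_(S, S)], e[S])

def cv_residuals_refit(segment):
    """Reference implementation: refit the ridge model without the segment."""
    S = np.asarray(segment)
    keep = np.setdiff1d(np.arange(n), S)
    beta = np.linalg.solve(X[keep].T @ X[keep] + lam * np.eye(p),
                           X[keep].T @ y[keep])
    return y[S] - X[S] @ beta

seg = [3, 17, 42]                             # one cross-validation segment
print(np.allclose(cv_residuals_update(seg), cv_residuals_refit(seg)))  # True
```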
Mining big data is an important task in data science because it can reveal useful observations and new knowledge hidden in large datasets. Proximity-based analysis in particular is used in many real-life applications. Such analyses usually rely on the distances to the k nearest neighbors, so their main bottleneck is data retrieval. Much effort has been made to improve the efficiency of these analyses, but they still incur large costs because they inherently require many data accesses. To avoid this issue, we propose a machine-learning technique that quickly and accurately estimates the k-NN distances (i.e., distances to the k nearest neighbors) of a given query. We train a fully connected neural network model and utilize pivots to achieve accurate estimation. Our model is designed to have useful advantages: it infers the distances to all k nearest neighbors at once, its inference time is O(1) (it incurs no data accesses), and it maintains high accuracy. Our experimental results and case studies on real datasets demonstrate the efficiency and effectiveness of our solution.
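The basic idea can be sketched as follows: represent a query by its distances to a small set of pivot points and train a small fully connected network to regress its k-NN distances, so that a query only touches the pivots at inference time. The dataset, pivot count, and architecture below are hypothetical; this is a sketch of the idea, not the paper's model.

```python
import numpy as np
from sklearn.neighbors import NearestNeighbors
from sklearn.neural_network import MLPRegressor

rng = np.random.default_rng(5)

# Toy dataset and queries.
d, n, k, n_pivots = 10, 20000, 5, 32
data = rng.normal(size=(n, d))
queries = rng.normal(size=(2000, d))

# Ground-truth k-NN distances (expensive: requires data access at query time).
nn = NearestNeighbors(n_neighbors=k).fit(data)
true_knn_dists, _ = nn.kneighbors(queries)

# Pivot-based features: distances from each query to a fixed set of pivots.
pivots = data[rng.choice(n, size=n_pivots, replace=False)]
def pivot_features(q):
    return np.linalg.norm(q[:, None, :] - pivots[None, :, :], axis=2)

# Train a fully connected network to map pivot distances -> k-NN distances.
model = MLPRegressor(hidden_layer_sizes=(128, 128), max_iter=500, random_state=0)
model.fit(pivot_features(queries[:1500]), true_knn_dists[:1500])

# At inference time only n_pivots distances are computed: cost independent of n.
pred = model.predict(pivot_features(queries[1500:]))
print("mean absolute error:", np.abs(pred - true_knn_dists[1500:]).mean())
```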
We consider the problem of parameter estimation for a stochastic McKean-Vlasov equation, and the associated system of weakly interacting particles. We study two cases: one in which we observe multiple independent trajectories of the McKean-Vlasov SDE, and another in which we observe multiple particles from the interacting particle system. In each case, we begin by establishing consistency and asymptotic normality of the (approximate) offline maximum likelihood estimator, in the limit as the number of observations $N\rightarrow\infty$. We then propose an online maximum likelihood estimator, which is based on a continuous-time stochastic gradient ascent scheme with respect to the asymptotic log-likelihood of the interacting particle system. We characterise the asymptotic behaviour of this estimator in the limit as $t\rightarrow\infty$, and also in the joint limit as $t\rightarrow\infty$ and $N\rightarrow\infty$. In these two cases, we obtain a.s. or $\mathbb{L}_1$ convergence to the stationary points of a limiting contrast function, under suitable conditions which guarantee ergodicity and uniform-in-time propagation of chaos. We also establish, under the additional condition of global strong concavity, $\mathbb{L}_2$ convergence to the unique maximiser of the asymptotic log-likelihood of the McKean-Vlasov SDE, with an asymptotic convergence rate which depends on the learning rate, the number of observations, and the dimension of the non-linear process. Our theoretical results are supported by two numerical examples, a linear mean field model and a stochastic opinion dynamics model.
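For a linear mean-field example, a discretized version of the online estimator can be sketched as follows: simulate the interacting particle system under the true parameter and update the estimate with a stochastic gradient ascent step on the (Girsanov) log-likelihood increment. The model, learning-rate schedule, and Euler-Maruyama discretization below are illustrative choices, not the paper's exact scheme.

```python
import numpy as np

rng = np.random.default_rng(6)

# Linear mean-field model: dX^i_t = theta * (mean(X_t) - X^i_t) dt + sigma dW^i_t.
N, T, dt = 200, 200.0, 0.01
sigma, theta_true = 1.0, 2.0
n_steps = int(T / dt)

X = rng.normal(size=N)
theta_hat = 0.0                               # online estimate
gamma0 = 5.0                                  # initial learning rate (illustrative)

for step in range(1, n_steps + 1):
    mean_X = X.mean()
    dW = rng.normal(scale=np.sqrt(dt), size=N)
    dX = theta_true * (mean_X - X) * dt + sigma * dW   # observed increments

    # Online stochastic gradient ascent on the discretized log-likelihood:
    # d theta = gamma_t * (1/N) sum_i d_theta b(X^i) (dX^i - b_theta(X^i) dt) / sigma^2.
    grad_b = mean_X - X                       # derivative of the drift w.r.t. theta
    resid = dX - theta_hat * (mean_X - X) * dt
    gamma_t = gamma0 / (1.0 + step * dt)      # decreasing step size
    theta_hat += gamma_t * np.mean(grad_b * resid) / sigma**2

    X = X + dX                                # propagate the particle system

print("online estimate:", theta_hat, " true value:", theta_true)
```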
In the famous least sum of trimmed squares (LTS) of residuals estimator (Rousseeuw (1984)), residuals are first squared and then trimmed. In this article, we first trim residuals, using a depth trimming scheme, and then square the remaining residuals. The estimator that minimizes the sum of squares of the trimmed residuals is called the LST estimator. It turns out that LST is a robust alternative to the classic least sum of squares (LS) estimator. Indeed, it has a very high finite sample breakdown point, and can resist, asymptotically, up to $50\%$ contamination without breakdown, in sharp contrast to the $0\%$ of the LS estimator. The population version of LST is Fisher consistent, and the sample version is strongly consistent, root-$n$ consistent, and asymptotically normal. Approximate algorithms for computing LST are proposed and tested on synthetic and real data examples. These experiments indicate that one of the algorithms can compute the LST estimator very fast and with smaller variance than the famous LTS estimator. All the evidence suggests that LST deserves to be a robust alternative to the LS estimator and is feasible in practice for high dimensional data sets (with possible contamination and outliers).
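A generic "trim then refit" iteration conveys the flavor of an approximate LST-type computation. In the sketch below the trimming rule, keeping the residuals closest to the median residual, is only a crude stand-in for the paper's depth-based trimming scheme, and the random restarts are a heuristic; none of this is the authors' algorithm.

```python
import numpy as np

rng = np.random.default_rng(7)

# Regression data with 20% gross outliers in the response.
n, p = 200, 3
X = np.column_stack([np.ones(n), rng.normal(size=(n, p - 1))])
beta_true = np.array([1.0, 2.0, -1.0])
y = X @ beta_true + rng.normal(scale=0.5, size=n)
y[rng.choice(n, size=n // 5, replace=False)] += 20.0

def lst_approx(X, y, keep_frac=0.7, n_iter=30, n_starts=20):
    """Heuristic LST-type fit: trim residuals far from the median residual,
    then least-squares on the kept points; several random restarts."""
    n, h = len(y), int(keep_frac * len(y))
    best_beta, best_obj = None, np.inf
    for _ in range(n_starts):
        idx = rng.choice(n, size=X.shape[1] + 2, replace=False)   # random start
        beta = np.linalg.lstsq(X[idx], y[idx], rcond=None)[0]
        for _ in range(n_iter):
            r = y - X @ beta
            keep = np.argsort(np.abs(r - np.median(r)))[:h]       # crude trimming rule
            beta = np.linalg.lstsq(X[keep], y[keep], rcond=None)[0]
        r = y - X @ beta
        obj = np.sum(r[np.argsort(np.abs(r - np.median(r)))[:h]] ** 2)
        if obj < best_obj:
            best_beta, best_obj = beta, obj
    return best_beta

print("robust fit:", np.round(lst_approx(X, y), 2))
print("plain LS  :", np.round(np.linalg.lstsq(X, y, rcond=None)[0], 2))
```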
The purpose of this article is to develop a general parametric estimation theory that allows the derivation of the limit distribution of estimators in non-regular models where the true parameter value may lie on the boundary of the parameter space or where even identifiability fails. To this end, we propose a local approximation of the parameter space (at the true value) that is more general than those in previous studies. This estimation theory is comprehensive in that it can handle penalized estimation as well as quasi-maximum likelihood estimation under such non-regular models. Moreover, our results apply to so-called non-ergodic statistics, where the Fisher information is random in the limit, including regular experiments that are locally asymptotically mixed normal. In penalized estimation, depending on the boundary constraint, even the Bridge estimator with $q<1$ does not necessarily yield selection consistency. We therefore give a sufficient condition for selection consistency that precisely quantifies the balance between the boundary constraint and the form of the penalty. Examples handled in the paper are: (i) ML estimation of the generalized inverse Gaussian distribution, (ii) quasi-ML estimation of the diffusion parameter in a non-ergodic It\^o process whose parameter space consists of positive semi-definite symmetric matrices, while the drift parameter is treated as nuisance and (iii) penalized ML estimation of variance components of random effects in linear mixed models.
An optimal error estimate whose constant depends only polynomially on $\varepsilon^{-1}$ is established for the temporal semi-discrete scheme of the Cahn-Hilliard equation based on the scalar auxiliary variable (SAV) formulation. The key to our analysis is to convert the structure of the SAV time-stepping scheme back to a form compatible with the original formulation of the Cahn-Hilliard equation, which makes it feasible to use spectral estimates to handle the nonlinear term. Based on this transformation of the SAV numerical scheme, the optimal error estimate for the temporal semi-discrete scheme, which depends only on a low polynomial order of $\varepsilon^{-1}$ rather than an exponential order, is derived by using mathematical induction, spectral arguments, and the superconvergence properties of some nonlinear terms. Numerical examples are provided to illustrate the discrete energy decay property and to validate our theoretical convergence analysis.
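For reference, the SAV reformulation of the Cahn-Hilliard equation underlying such schemes can be written in generic form, with $E_1(\phi)=\int_\Omega F(\phi)\,dx$ the nonlinear part of the energy and $C_0>0$ a constant ensuring positivity (the precise $\varepsilon$-scaling of $F$ follows the paper's setting):
\[
\begin{aligned}
\partial_t \phi &= \Delta \mu, \qquad
\mu = -\varepsilon^2 \Delta \phi + \frac{r(t)}{\sqrt{E_1(\phi)+C_0}}\,F'(\phi),\\
r(t) &= \sqrt{E_1(\phi)+C_0}, \qquad
\frac{dr}{dt} = \frac{1}{2\sqrt{E_1(\phi)+C_0}} \int_\Omega F'(\phi)\,\partial_t\phi \,dx,
\end{aligned}
\]
so that the nonlinear term enters the time-stepping scheme only through the scalar auxiliary variable $r$, yielding linear, energy-stable updates.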
Gradient estimation -- approximating the gradient of an expectation with respect to the parameters of a distribution -- is central to the solution of many machine learning problems. However, when the distribution is discrete, most common gradient estimators suffer from excessive variance. To improve the quality of gradient estimation, we introduce a variance reduction technique based on Stein operators for discrete distributions. We then use this technique to build flexible control variates for the REINFORCE leave-one-out estimator. Our control variates can be adapted online to minimize variance and do not require extra evaluations of the target function. In benchmark generative modeling tasks such as training binary variational autoencoders, our gradient estimator achieves substantially lower variance than state-of-the-art estimators with the same number of function evaluations.
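The REINFORCE leave-one-out (RLOO) baseline on which the proposed control variates are built can be sketched for a factorized Bernoulli distribution as follows; the target function, dimension, and step size are illustrative, and the Stein-operator control variates themselves are not implemented here.

```python
import numpy as np

rng = np.random.default_rng(8)

# Objective: maximize E_{x ~ Bernoulli(sigmoid(logits))}[f(x)] over the logits.
d, K = 10, 4                                  # latent dimension, samples per step
target = (rng.uniform(size=d) > 0.5).astype(float)
f = lambda x: -np.sum((x - target) ** 2, axis=-1)   # illustrative target function

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def rloo_gradient(logits):
    """REINFORCE leave-one-out estimator with K samples:
    g = (1/K) sum_k (f(x_k) - mean_{j != k} f(x_j)) * grad log p(x_k)."""
    p = sigmoid(logits)
    x = (rng.uniform(size=(K, d)) < p).astype(float)
    fx = f(x)                                               # shape (K,)
    baselines = (fx.sum() - fx) / (K - 1)                   # leave-one-out means
    grad_logp = x - p                                       # d/dlogits of log Bernoulli(x; p)
    return ((fx - baselines)[:, None] * grad_logp).mean(axis=0)

logits = np.zeros(d)
for step in range(2000):
    logits += 0.1 * rloo_gradient(logits)                   # gradient ascent
print("learned probabilities:", np.round(sigmoid(logits), 2))
print("target bits          :", target)
```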