In this study, we develop a novel estimation method for quantile treatment effects (QTE) under rank invariance and rank stationarity assumptions. Ishihara (2020) explores identification of the nonseparable panel data model under these assumptions and proposes a parametric estimation based on the minimum distance method. However, when the dimensionality of the covariates is large, the minimum distance estimation using this process is computationally demanding. To overcome this problem, we propose a two-step estimation method based on the quantile regression and minimum distance methods. We then show the uniform asymptotic properties of our estimator and the validity of the nonparametric bootstrap. The Monte Carlo studies indicate that our estimator performs well in finite samples. Finally, we present two empirical illustrations that estimate the distributional effects of insurance provision on household production and of TV watching on child cognitive development.
We consider the problem of finding tuned regularized parameter estimators for linear models. We start by showing that three known optimal linear estimators belong to a wider class of estimators that can be formulated as a solution to a weighted and constrained minimization problem. The optimal weights, however, are typically unknown in many applications. This raises the question: how should we choose the weights using only the data? We propose using the covariance fitting SPICE-methodology to obtain data-adaptive weights and show that the resulting class of estimators yields tuned versions of known regularized estimators, such as ridge regression, LASSO, and regularized least absolute deviation. These theoretical results unify several important estimators under a common umbrella. The resulting tuned estimators are also shown to be practically relevant by means of a number of numerical examples.
Continuous determinantal point processes (DPPs) are a class of repulsive point processes on $\mathbb{R}^d$ with many statistical applications. Although an explicit expression of their density is known, it is too complicated to be used directly for maximum likelihood estimation. In the stationary case, an approximation using Fourier series has been suggested, but it is limited to rectangular observation windows and no theoretical results support it. In this contribution, we investigate a different way to approximate the likelihood by looking at its asymptotic behaviour when the observation window grows towards $\mathbb{R}^d$. This new approximation is not limited to rectangular windows, is faster to compute than the previous one, does not require any tuning parameter, and some theoretical justifications are provided. It moreover provides an explicit formula for estimating the asymptotic variance of the associated estimator. The performances are assessed in a simulation study on standard parametric models on $\mathbb{R}^d$ and compare favourably to common alternative estimation methods for continuous DPPs.
Multilevel regression and poststratification (MRP) is a flexible modeling technique that has been used in a broad range of small-area estimation problems. Traditionally, MRP studies have been focused on non-causal settings, where estimating a single population value using a nonrepresentative sample was of primary interest. In this manuscript, MRP-style estimators are evaluated in an experimental causal inference setting. We simulate a large-scale randomized controlled trial with a stratified cluster sampling design, and compare traditional and nonparametric treatment effect estimation methods with MRP methodology. Using MRP-style estimators, treatment effect estimates for areas as small as 1.3$\%$ of the population have lower bias and variance than standard causal inference methods, even in the presence of treatment effect heterogeneity. The design of our simulation studies also requires us to build upon an MRP variant that allows for non-census covariates to be incorporated into poststratification.
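The poststratification step of MRP can be illustrated with a minimal Python sketch. It assumes that cell-level estimates are already available (in MRP they would come from a multilevel regression, which is not shown here) and that the population count of each cell is known. The function name `poststratify`, the cell labels, and all numbers are hypothetical and chosen only for illustration.

```python
def poststratify(cell_estimates, cell_population_counts):
    """Population-weighted average of cell-level estimates.

    cell_estimates: dict mapping a cell label to its estimate
        (assumed to come from a previously fitted model).
    cell_population_counts: dict mapping the same labels to the
        number of people in that cell in the target population.
    """
    total = sum(cell_population_counts.values())
    weighted_sum = sum(
        cell_estimates[cell] * count
        for cell, count in cell_population_counts.items()
    )
    return weighted_sum / total


# Hypothetical cell-level treatment effect estimates and population counts.
example_estimates = {"young": 0.2, "old": 0.6}
example_counts = {"young": 300, "old": 700}
population_estimate = poststratify(example_estimates, example_counts)
```

The weights depend only on the population composition, not on how many respondents each cell contributed to the sample, which is why the approach can correct for a nonrepresentative sample.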
In this paper, we present numerical procedures to compute solutions of partial differential equations posed on fractals. In particular, we consider the strong form of the equation using standard graph Laplacian matrices and also weak forms of the equation derived using standard length or area measure on a discrete approximation of the fractal set. We then introduce a numerical procedure to normalize the obtained diffusions, that is, a way to compute the renormalization constant needed in the definitions of the actual partial differential equation on the fractal set. A particular case that is studied in detail is the solution of the Dirichlet problem in the Sierpinski triangle. Other examples are also presented including a non-planar Hata tree.
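As a rough illustration of the strong-form approach, the following Python sketch solves the Laplace equation with Dirichlet data on a graph approximation of the Sierpinski triangle. It is a sketch under stated assumptions, not the procedure of the paper: the graph is built by recursive midpoint subdivision on an integer lattice, the unnormalized averaging form of the graph Laplacian is used, the linear system is solved by Gauss-Seidel sweeps, and the renormalization constant is not computed. All function names are hypothetical.

```python
def midpoint(p, q):
    # Coordinates stay integers because corners are multiples of 2**level.
    return ((p[0] + q[0]) // 2, (p[1] + q[1]) // 2)


def sierpinski_graph(level):
    """Return the three corners and the adjacency sets of the level-`level` graph."""
    size = 2 ** level
    corners = ((0, 0), (size, 0), (0, size))
    triangles = [corners]
    for _ in range(level):
        refined = []
        for a, b, c in triangles:
            ab, ac, bc = midpoint(a, b), midpoint(a, c), midpoint(b, c)
            # Keep the three corner sub-triangles, drop the middle one.
            refined.extend([(a, ab, ac), (ab, b, bc), (ac, bc, c)])
        triangles = refined
    neighbors = {}
    for a, b, c in triangles:
        for p, q in ((a, b), (a, c), (b, c)):
            neighbors.setdefault(p, set()).add(q)
            neighbors.setdefault(q, set()).add(p)
    return corners, neighbors


def solve_dirichlet(level, boundary_values, sweeps=2000):
    """Harmonic function on the graph with prescribed values at the corners."""
    corners, neighbors = sierpinski_graph(level)
    u = {p: 0.0 for p in neighbors}
    for corner, value in zip(corners, boundary_values):
        u[corner] = value
    interior = [p for p in neighbors if p not in corners]
    for _ in range(sweeps):
        for p in interior:
            # Graph-harmonic condition: value equals the neighbor average.
            u[p] = sum(u[q] for q in neighbors[p]) / len(neighbors[p])
    return corners, u


corners, u = solve_dirichlet(level=3, boundary_values=(1.0, 0.0, 0.0))
size = corners[1][0]
value_mid_01 = u[(size // 2, 0)]          # midpoint between corners 0 and 1
value_mid_12 = u[(size // 2, size // 2)]  # midpoint between corners 1 and 2
```

With boundary values (1, 0, 0), the computed values at the two edge midpoints should agree with the known harmonic extension rule on the Sierpinski triangle, namely 2/5 next to the corner with value 1 and 1/5 opposite it.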
This paper studies the inference of the regression coefficient matrix under multivariate response linear regressions in the presence of hidden variables. A novel procedure for constructing confidence intervals of entries of the coefficient matrix is proposed. Our method first utilizes the multivariate nature of the responses by estimating and adjusting the hidden effect to construct an initial estimator of the coefficient matrix. By further deploying a low-dimensional projection procedure to reduce the bias introduced by the regularization in the previous step, a refined estimator is proposed and shown to be asymptotically normal. The asymptotic variance of the resulting estimator is derived with closed-form expression and can be consistently estimated. In addition, we propose a testing procedure for the existence of hidden effects and provide its theoretical justification. Both our procedures and their analyses are valid even when the feature dimension and the number of responses exceed the sample size. Our results are further backed up via extensive simulations and a real data analysis.
The Student-$t$ distribution is widely used in statistical modeling of datasets involving outliers since its longer-than-normal tails provide a robust approach to handle such data. Furthermore, data collected over time may contain censored or missing observations, making it impossible to use standard statistical procedures. This paper proposes an algorithm to estimate the parameters of a censored linear regression model when the regression errors are autocorrelated and the innovations follow a Student-$t$ distribution. To fit the proposed model, maximum likelihood estimates are obtained through the SAEM algorithm, which is a stochastic approximation of the EM algorithm useful for models in which the E-step does not have an analytic form. The methods are illustrated by the analysis of a real dataset that has left-censored and missing observations. We also conducted two simulation studies to examine the asymptotic properties of the estimates and the robustness of the model.
We consider parametric estimation and tests for multi-dimensional diffusion processes with a small dispersion parameter $\varepsilon$ from discrete observations. For parametric estimation of diffusion processes, the main target is to estimate the drift parameter and the diffusion parameter. In this paper, we propose two types of adaptive estimators for both parameters and show their asymptotic properties under $\varepsilon\to0$, $n\to\infty$ and the balance condition that $(\varepsilon n^\rho)^{-1} =O(1)$ for some $\rho>0$. Using these adaptive estimators, we also introduce consistent adaptive testing methods and prove that the test statistics for the adaptive tests have asymptotic distributions under the null hypothesis. In simulation studies, we examine and compare the asymptotic behaviors of the two kinds of adaptive estimators and test statistics. Moreover, as a biological application, we treat the SIR model, which describes a simple epidemic spread.
Bayesian approaches are appealing for constrained inference problems by allowing a probabilistic characterization of uncertainty, while providing a computational machinery for incorporating complex constraints in hierarchical models. However, the usual Bayesian strategy of placing a prior on the constrained space and conducting posterior computation with Markov chain Monte Carlo algorithms is often intractable. An alternative is to conduct inference for a less constrained posterior and project samples to the constrained space through a minimal distance mapping. We formalize and provide a unifying framework for such posterior projections. For theoretical tractability, we initially focus on constrained parameter spaces corresponding to closed and convex subsets of the original space. We then consider non-convex Stiefel manifolds. We provide a general formulation of projected posteriors in a Bayesian decision-theoretic framework. We show that asymptotic properties of the unconstrained posterior are transferred to the projected posterior, leading to asymptotically correct credible intervals. We demonstrate numerically that projected posteriors can have better performance than competitor approaches in real data examples.
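The idea of projecting unconstrained posterior draws onto a closed convex set can be sketched in a few lines of Python. The example below is only a toy instance under loudly stated assumptions: a scalar mean parameter, a unit-variance normal likelihood with a flat prior (so the unconstrained posterior is normal with mean equal to the sample mean and variance $1/n$), and an interval constraint, for which the minimal distance mapping is simple clipping. The function names and the data are hypothetical.

```python
import random


def project_to_interval(theta, lower, upper):
    # Euclidean (minimal distance) projection onto the closed interval [lower, upper].
    return min(max(theta, lower), upper)


def projected_posterior_samples(data, lower, upper, n_draws=2000, seed=0):
    """Draw from an unconstrained posterior, then project each draw.

    Assumption: unit-variance normal likelihood with a flat prior on the
    mean, so the unconstrained posterior is N(sample mean, 1 / n).
    """
    rng = random.Random(seed)
    n = len(data)
    posterior_mean = sum(data) / n
    posterior_sd = (1.0 / n) ** 0.5
    draws = [rng.gauss(posterior_mean, posterior_sd) for _ in range(n_draws)]
    return [project_to_interval(d, lower, upper) for d in draws]


# Hypothetical data whose sample mean falls slightly outside the constraint [0, 1].
samples = projected_posterior_samples([-0.3, 0.1, -0.2, 0.05], lower=0.0, upper=1.0)
share_on_boundary = sum(s == 0.0 for s in samples) / len(samples)
```

Draws that already satisfy the constraint are left unchanged, while the others are moved to the nearest feasible point, so the projected posterior can place positive mass on the boundary of the constrained space.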
Heatmap-based methods dominate in the field of human pose estimation by modelling the output distribution through likelihood heatmaps. In contrast, regression-based methods are more efficient but suffer from inferior performance. In this work, we explore maximum likelihood estimation (MLE) to develop an efficient and effective regression-based method. From the perspective of MLE, adopting different regression losses is making different assumptions about the output density function. A density function closer to the true distribution leads to a better regression performance. In light of this, we propose a novel regression paradigm with Residual Log-likelihood Estimation (RLE) to capture the underlying output distribution. Concretely, RLE learns the change of the distribution instead of the unreferenced underlying distribution to facilitate the training process. With the proposed reparameterization design, our method is compatible with off-the-shelf flow models. The proposed method is effective, efficient and flexible. We show its potential in various human pose estimation tasks with comprehensive experiments. Compared to the conventional regression paradigm, regression with RLE brings 12.4 mAP improvement on MSCOCO without any test-time overhead. Moreover, for the first time, especially on multi-person pose estimation, our regression method is superior to the heatmap-based methods. Our code is available at //github.com/Jeff-sjtu/res-loglikelihood-regression
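The statement that a regression loss corresponds to an assumed output density can be checked numerically. The Python sketch below is not RLE itself; it only illustrates the MLE viewpoint under the assumption of fixed, unit scale parameters: the negative log-likelihood of a Gaussian differs from the L2 loss by a constant, and that of a Laplace distribution differs from the L1 loss by a constant. All function names and residual values are hypothetical.

```python
import math


def gaussian_nll(residual, sigma=1.0):
    # Negative log-density of N(0, sigma^2) evaluated at the residual.
    return 0.5 * (residual / sigma) ** 2 + math.log(sigma) + 0.5 * math.log(2 * math.pi)


def laplace_nll(residual, b=1.0):
    # Negative log-density of Laplace(0, b) evaluated at the residual.
    return abs(residual) / b + math.log(2 * b)


def l2_loss(residual):
    return 0.5 * residual ** 2


def l1_loss(residual):
    return abs(residual)


residuals = [-2.0, -0.5, 0.0, 1.5]
# With unit scale, each difference is the same constant for every residual,
# so minimizing the loss and maximizing the likelihood give the same fit.
gaussian_offsets = [gaussian_nll(r) - l2_loss(r) for r in residuals]
laplace_offsets = [laplace_nll(r) - l1_loss(r) for r in residuals]
```

Because the offsets do not depend on the residual, the gradients of the loss and of the negative log-likelihood coincide, which is the sense in which choosing L2 or L1 amounts to assuming a Gaussian or Laplace output density.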
We propose a new method of estimation in topic models, that is not a variation on the existing simplex finding algorithms, and that estimates the number of topics K from the observed data. We derive new finite sample minimax lower bounds for the estimation of A, as well as new upper bounds for our proposed estimator. We describe the scenarios where our estimator is minimax adaptive. Our finite sample analysis is valid for any number of documents (n), individual document length (N_i), dictionary size (p) and number of topics (K), and both p and K are allowed to increase with n, a situation not handled well by previous analyses. We complement our theoretical results with a detailed simulation study. We illustrate that the new algorithm is faster and more accurate than the current ones, although we start out with a computational and theoretical disadvantage of not knowing the correct number of topics K, while we provide the competing methods with the correct value in our simulations.