Truncated conditional expectation functions are objects of interest in a wide range of economic applications, including income inequality measurement, financial risk management, and impact evaluation. They typically involve truncating the outcome variable above or below certain quantiles of its conditional distribution. In this paper, based on local linear methods, a novel, two-stage, nonparametric estimator of such functions is proposed. In this estimation problem, the conditional quantile function is a nuisance parameter that has to be estimated in the first stage. The proposed estimator is insensitive to the first-stage estimation error owing to the use of a Neyman-orthogonal moment in the second stage. This construction ensures that inference methods developed for the standard nonparametric regression can be readily adapted to conduct inference on truncated conditional expectations. As an extension, estimation with an estimated truncation quantile level is considered. The proposed estimator is applied in two empirical settings: sharp regression discontinuity designs with a manipulated running variable and randomized experiments with sample selection.
Modern computational models in supervised machine learning are often highly parameterized universal approximators. As such, the value of the parameters is unimportant, and only the out of sample performance is considered. On the other hand much of the literature on model estimation assumes that the parameters themselves have intrinsic value, and thus is concerned with bias and variance of parameter estimates, which may not have any simple relationship to out of sample model performance. Therefore, within supervised machine learning, heavy use is made of ridge regression (i.e., L2 regularization), which requires the the estimation of hyperparameters and can be rendered ineffective by certain model parameterizations. We introduce an objective function which we refer to as Information-Corrected Estimation (ICE) that reduces KL divergence based generalization error for supervised machine learning. ICE attempts to directly maximize a corrected likelihood function as an estimator of the KL divergence. Such an approach is proven, theoretically, to be effective for a wide class of models, with only mild regularity restrictions. Under finite sample sizes, this corrected estimation procedure is shown experimentally to lead to significant reduction in generalization error compared to maximum likelihood estimation and L2 regularization.
In principal modern detectors, the task of object localization is implemented by the box subnet which concentrates on bounding box regression. The box subnet customarily predicts the position of the object by regressing box center position and scaling factors. Although this approach is frequently adopted, we observe that the result of localization remains defective, which makes the performance of the detector unsatisfactory. In this paper, we prove the flaws in the previous method through theoretical analysis and experimental verification and propose a novel solution to detect objects precisely. Rather than plainly focusing on center and size, our approach refines the edges of the bounding box on previous localization results by estimating the distribution at the boundary of the object. Experimental results have shown the potentiality and generalization of our proposed method.
Simultaneous inference for high-dimensional non-Gaussian time series is always considered to be a challenging problem. Such tasks require not only robust estimation of the coefficients in the random process, but also deriving limiting distribution for a sum of dependent variables. In this paper, we propose a multiplier bootstrap procedure to conduct simultaneous inference for the transition coefficients in high-dimensional non-Gaussian vector autoregressive (VAR) models. This bootstrap-assisted procedure allows the dimension of the time series to grow exponentially fast in the number of observations. As a test statistic, a de-biased estimator is constructed for simultaneous inference. Unlike the traditional de-biased/de-sparsifying Lasso estimator, robust convex loss function and normalizing weight function are exploited to avoid any unfavorable behavior at the tail of the distribution. We develop Gaussian approximation theory for VAR model to derive the asymptotic distribution of the de-biased estimator and propose a multiplier bootstrap-assisted procedure to obtain critical values under very mild moment conditions on the innovations. As an important tool in the convergence analysis of various estimators, we establish a Bernstein-type probabilistic concentration inequality for bounded VAR models. Numerical experiments verify the validity and efficiency of the proposed method.
We demonstrate a method for localizing where two smooths differ using a true discovery proportion (TDP) based interpretation. The procedure yields a statement on the proportion of some region where true differences exist between two smooths, which results from use of hypothesis tests on collections of basis coefficients parametrizing the smooths. The methodology avoids otherwise ad hoc means of doing so such as performing hypothesis tests on entire smooths of subsetted data. TDP estimates are 1-alpha confidence bounded simultaneously, assuring that the estimate for a region is a lower bound on the proportion of actual difference, or true discoveries, in that region with high confidence regardless of the number, location, or size of regions for which TDP is estimated. Our procedure is based on closed-testing using Simes local test. We develop expressions for the covariance of quadratic forms because of the multiple regression framework in which we use closed-testing results, which are shown to be non-negative in many settings. Our procedure is well-powered because of a result on the off-diagonal decay structure of the covariance matrix of penalized B-splines of degree two or less. We demonstrate achievement of estimated TDP in simulation for different specified alpha levels and degree of difference and analyze a data set of walking gait of cerebral palsy patients. Keywords: splines; smoothing; multiple testing; closed-testing; simultaneous confidence
COVID-19 incidence is analyzed at the provinces of some Spanish Communities during the period February-October, 2020. Two infinite-dimensional regression approaches are tested. The first one is implemented in the regression framework introduced in Ruiz-Medina, Miranda and Espejo (2019). Specifically, a bayesian framework is adopted in the estimation of the pure point spectrum of the temporal autocorrelation operator, characterizing the second-order structure of a surface sequence. The second approach is formulated in the context of spatial curve regression. A nonparametric estimator of the spectral density operator, based on the spatial periodogram operator, is computed to approximate the spatial correlation between curves. Dimension reduction is achieved by projection onto the empirical eigenvectors of the long-run spatial covariance operator. Cross-validation procedures are implemented to test the performance of the two functional regression approaches.
We consider the problem of estimating a $d$-dimensional discrete distribution from its samples observed under a $b$-bit communication constraint. In contrast to most previous results that largely focus on the global minimax error, we study the local behavior of the estimation error and provide \emph{pointwise} bounds that depend on the target distribution $p$. In particular, we show that the $\ell_2$ error decays with $O\left(\frac{\lVert p\rVert_{1/2}}{n2^b}\vee \frac{1}{n}\right)$ (In this paper, we use $a\vee b$ and $a \wedge b$ to denote $\max(a, b)$ and $\min(a,b)$ respectively.) when $n$ is sufficiently large, hence it is governed by the \emph{half-norm} of $p$ instead of the ambient dimension $d$. For the achievability result, we propose a two-round sequentially interactive estimation scheme that achieves this error rate uniformly over all $p$. Our scheme is based on a novel local refinement idea, where we first use a standard global minimax scheme to localize $p$ and then use the remaining samples to locally refine our estimate. We also develop a new local minimax lower bound with (almost) matching $\ell_2$ error, showing that any interactive scheme must admit a $\Omega\left( \frac{\lVert p \rVert_{{(1+\delta)}/{2}}}{n2^b}\right)$ $\ell_2$ error for any $\delta > 0$. The lower bound is derived by first finding the best parametric sub-model containing $p$, and then upper bounding the quantized Fisher information under this model. Our upper and lower bounds together indicate that the $\mathcal{H}_{1/2}(p) = \log(\lVert p \rVert_{{1}/{2}})$ bits of communication is both sufficient and necessary to achieve the optimal (centralized) performance, where $\mathcal{H}_{{1}/{2}}(p)$ is the R\'enyi entropy of order $2$. Therefore, under the $\ell_2$ loss, the correct measure of the local communication complexity at $p$ is its R\'enyi entropy.
This paper deals with a projection least square estimator of the function $J_0$ computed from multiple independent observations on $[0,T]$ of the process $Z$ defined by $dZ_t = J_0(t)d\langle M\rangle_t + dM_t$, where $M$ is a centered, continuous and square integrable martingale vanishing at $0$. Risk bounds are established on this estimator and on an associated adaptive estimator. An appropriate transformation allows to rewrite the differential equation $dX_t = V(X_t)(b_0(t)dt +\sigma(t)dB_t)$, where $B$ is a fractional Brownian motion of Hurst parameter $H\in (1/2,1)$, as a model of the previous type. So, the second part of the paper deals with risk bounds on a nonparametric estimator of $b_0$ derived from the results on the projection least square estimator of $J_0$. In particular, our results apply to the estimation of the drift function in a non-autonomous extension of the fractional Black-Scholes model introduced in Hu et al. (2003).
Many problems on signal processing reduce to nonparametric function estimation. We propose a new methodology, piecewise convex fitting (PCF), and give a two-stage adaptive estimate. In the first stage, the number and location of the change points is estimated using strong smoothing. In the second stage, a constrained smoothing spline fit is performed with the smoothing level chosen to minimize the MSE. The imposed constraint is that a single change point occurs in a region about each empirical change point of the first-stage estimate. This constraint is equivalent to requiring that the third derivative of the second-stage estimate has a single sign in a small neighborhood about each first-stage change point. We sketch how PCF may be applied to signal recovery, instantaneous frequency estimation, surface reconstruction, image segmentation, spectral estimation and multivariate adaptive regression.
Discrete random structures are important tools in Bayesian nonparametrics and the resulting models have proven effective in density estimation, clustering, topic modeling and prediction, among others. In this paper, we consider nested processes and study the dependence structures they induce. Dependence ranges between homogeneity, corresponding to full exchangeability, and maximum heterogeneity, corresponding to (unconditional) independence across samples. The popular nested Dirichlet process is shown to degenerate to the fully exchangeable case when there are ties across samples at the observed or latent level. To overcome this drawback, inherent to nesting general discrete random measures, we introduce a novel class of latent nested processes. These are obtained by adding common and group-specific completely random measures and, then, normalising to yield dependent random probability measures. We provide results on the partition distributions induced by latent nested processes, and develop an Markov Chain Monte Carlo sampler for Bayesian inferences. A test for distributional homogeneity across groups is obtained as a by product. The results and their inferential implications are showcased on synthetic and real data.
This study considers the 3D human pose estimation problem in a single RGB image by proposing a conditional random field (CRF) model over 2D poses, in which the 3D pose is obtained as a byproduct of the inference process. The unary term of the proposed CRF model is defined based on a powerful heat-map regression network, which has been proposed for 2D human pose estimation. This study also presents a regression network for lifting the 2D pose to 3D pose and proposes the prior term based on the consistency between the estimated 3D pose and the 2D pose. To obtain the approximate solution of the proposed CRF model, the N-best strategy is adopted. The proposed inference algorithm can be viewed as sequential processes of bottom-up generation of 2D and 3D pose proposals from the input 2D image based on deep networks and top-down verification of such proposals by checking their consistencies. To evaluate the proposed method, we use two large-scale datasets: Human3.6M and HumanEva. Experimental results show that the proposed method achieves the state-of-the-art 3D human pose estimation performance.