Flexible estimation of multiple conditional quantiles is of interest in numerous applications, such as studying the effect of pregnancy-related factors on low and high birth weight. We propose a Bayesian non-parametric method to simultaneously estimate non-crossing, non-linear quantile curves. We expand the conditional distribution function of the response in I-spline basis functions, where the covariate-dependent coefficients are modeled using neural networks. By leveraging the approximation power of splines and neural networks, our model can approximate any continuous quantile function. Compared to existing models, our model estimates all quantiles rather than a finite subset, scales well to high dimensions, and accounts for estimation uncertainty. While the model is arbitrarily flexible, interpretable marginal quantile effects are estimated using accumulated local effects (ALE) plots and variable importance measures. A simulation study shows that our model can better recover quantiles of the response distribution when the data are sparse, and an analysis of birth weight data is presented.
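A minimal sketch of the core construction under stand-in choices: monotone sigmoid ramps replace the I-spline basis, and a tiny random-weight network with softplus outputs supplies nonnegative, normalized covariate-dependent coefficients, so the implied conditional CDF is nondecreasing in y and the extracted quantile curves cannot cross. All names, shapes, and basis choices here are illustrative, not the authors' exact model.

```python
import numpy as np

rng = np.random.default_rng(0)

def ramp_basis(y, knots, width=0.5):
    """Monotone sigmoid ramps as a stand-in for an I-spline basis.
    Each basis function is nondecreasing in y and maps R -> (0, 1)."""
    return 1.0 / (1.0 + np.exp(-(y[:, None] - knots[None, :]) / width))

def softplus(a):
    return np.log1p(np.exp(a))

# Hypothetical one-hidden-layer network: x -> nonnegative basis weights.
W1, b1 = rng.normal(size=(8, 1)), rng.normal(size=8)
W2, b2 = rng.normal(size=(5, 8)), rng.normal(size=5)

def weights(x):
    h = np.tanh(W1 @ x + b1)
    w = softplus(W2 @ h + b2)       # nonnegative coefficients
    return w / w.sum()              # normalized -> proper CDF

knots = np.linspace(-2, 2, 5)

def cond_cdf(y, x):
    """F(y | x): convex combination of monotone bases => monotone in y."""
    return ramp_basis(y, knots) @ weights(x)

def cond_quantile(tau, x, grid=np.linspace(-6, 6, 2001)):
    """Invert F(. | x) on a grid; monotonicity rules out quantile crossing."""
    F = cond_cdf(grid, x)
    return grid[np.searchsorted(F, tau)]

x0 = np.array([0.3])
print([round(cond_quantile(t, x0), 2) for t in (0.1, 0.5, 0.9)])
```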
Compared to the nominal scale, the ordinal scale for a categorical outcome variable makes a monotonicity assumption for the covariate effects meaningful. This assumption is encoded in the commonly used proportional odds model, but there it is combined with other parametric assumptions such as linearity and additivity. Herein, the considered models are non-parametric, and the only condition imposed is that the effects of the covariates on the outcome categories are stochastically monotone according to the ordinal scale. We are not aware of other comparable multivariable models suitable for inference. We generalize our previously proposed Bayesian monotonic multivariable regression model to ordinal outcomes, and propose an estimation procedure based on reversible jump Markov chain Monte Carlo. The model is based on a marked point process construction, which allows it to approximate arbitrary monotonic regression function shapes, and has a built-in covariate selection property. We study the performance of the proposed approach through extensive simulation studies, and demonstrate its practical application in two real data examples.
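To make the marked point process idea concrete, here is a toy, heavily simplified reversible jump sampler for one-dimensional monotone regression, not the authors' multivariable ordinal model: the regression function is a nondecreasing step function whose jump locations and heights form a marked point process, and birth/death proposals drawn from the prior collapse the acceptance ratio to a simple form. Within-model moves (perturbing heights or locations) are omitted for brevity; all prior and tuning choices are illustrative.

```python
import numpy as np

rng = np.random.default_rng(1)

# Toy data: a monotone step function observed with Gaussian noise.
n, sigma, lam = 100, 0.3, 3.0        # lam = Poisson prior mean for #knots
x = np.sort(rng.uniform(0, 1, n))
f_true = (x > 0.3) * 1.0 + (x > 0.7) * 1.0
y = f_true + rng.normal(0, sigma, n)

def loglik(knots, heights):
    # Monotone step function: cumulative sum of nonnegative jumps.
    f = (heights * (x[:, None] >= knots)).sum(axis=1)
    return -0.5 * np.sum((y - f) ** 2) / sigma ** 2

# Prior: #knots ~ Poisson(lam); locations ~ U(0,1); jump heights ~ Exp(1).
knots, heights = np.array([]), np.array([])
ll = loglik(knots, heights)
for it in range(5000):
    k = len(knots)
    if rng.random() < 0.5:                        # birth move
        s, h = rng.uniform(), rng.exponential(1.0)    # proposed from the prior
        kn, hn = np.append(knots, s), np.append(heights, h)
        ll_new = loglik(kn, hn)
        # Proposing from the prior collapses the RJ ratio to lam / (k + 1).
        if np.log(rng.random()) < ll_new - ll + np.log(lam / (k + 1)):
            knots, heights, ll = kn, hn, ll_new
    elif k > 0:                                   # death move
        j = rng.integers(k)
        kn, hn = np.delete(knots, j), np.delete(heights, j)
        ll_new = loglik(kn, hn)
        if np.log(rng.random()) < ll_new - ll + np.log(k / lam):
            knots, heights, ll = kn, hn, ll_new

print("one posterior draw: number of knots =", len(knots))
```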
We study random design linear regression with no assumptions on the distribution of the covariates and with a heavy-tailed response variable. In this distribution-free regression setting, we show that boundedness of the conditional second moment of the response given the covariates is a necessary and sufficient condition for achieving nontrivial guarantees. As a starting point, we prove an optimal version of the classical in-expectation bound for the truncated least squares estimator due to Györfi, Kohler, Krzyżak, and Walk. However, we show that this procedure fails with constant probability for some distributions despite its optimal in-expectation performance. Then, combining the ideas of truncated least squares, median-of-means procedures, and aggregation theory, we construct a non-linear estimator achieving excess risk of order $d/n$ with an optimal sub-exponential tail. While existing approaches to linear regression for heavy-tailed distributions focus on proper estimators that return linear functions, we highlight that the improperness of our procedure is necessary for attaining nontrivial guarantees in the distribution-free setting.
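To fix ideas, here is a minimal numerical sketch of two ingredients named above, under simplifying assumptions: ordinary least squares on responses truncated at a level growing with n (the GKKW-style estimator), and a crude median-of-means selection among block-wise candidates. This illustrates the building blocks only, not the paper's aggregation-based estimator or its guarantees; the truncation level and block count are illustrative.

```python
import numpy as np

rng = np.random.default_rng(2)
n, d = 500, 5
X = rng.normal(size=(n, d))
beta = np.ones(d)
y = X @ beta + rng.standard_t(df=2, size=n)      # heavy-tailed noise

# Truncated least squares: clip responses at a level growing with n,
# then solve ordinary least squares on the truncated data.
t_n = np.sqrt(n)                                  # illustrative truncation level
y_trunc = np.clip(y, -t_n, t_n)
beta_tls = np.linalg.lstsq(X, y_trunc, rcond=None)[0]

# Median-of-means flavour: fit OLS on disjoint blocks, then keep the
# candidate whose block-wise median risk is smallest.
blocks = np.array_split(rng.permutation(n), 10)
cands = [np.linalg.lstsq(X[b], y_trunc[b], rcond=None)[0] for b in blocks]

def mom_risk(b_hat):
    losses = [np.mean((y[c] - X[c] @ b_hat) ** 2) for c in blocks]
    return np.median(losses)

beta_mom = min(cands, key=mom_risk)
print("TLS error:", np.linalg.norm(beta_tls - beta).round(3))
print("MoM error:", np.linalg.norm(beta_mom - beta).round(3))
```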
In this article, a discrete analogue of the continuous Teissier distribution is presented, and several of its important distributional characteristics are derived. The unknown parameter is estimated using the method of maximum likelihood and the method of moments. Two real-data applications are presented to demonstrate the applicability of the proposed model.
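For intuition, the usual way to build such a discrete analogue is survival discretization: P(X = k) = S(k) - S(k + 1), where S is the continuous survival function. The sketch below assumes the one-parameter Teissier survival S(x) = exp(1 + λx - e^{λx}); the paper's exact parameterization and estimators may differ.

```python
import numpy as np
from scipy.optimize import minimize_scalar

def S(x, lam):
    """Assumed continuous Teissier survival function (parameterizations vary)."""
    return np.exp(1.0 + lam * x - np.exp(lam * x))

def pmf(k, lam):
    """Discrete analogue via survival discretization: P(X=k) = S(k) - S(k+1)."""
    return S(k, lam) - S(k + 1, lam)

def neg_loglik(lam, data):
    return -np.sum(np.log(pmf(data, lam) + 1e-300))

# Simulate by inverting the discrete CDF, then fit lambda by MLE.
rng = np.random.default_rng(3)
lam_true = 0.4
u = rng.uniform(size=2000)
ks = np.arange(0, 60)
cdf = 1.0 - S(ks + 1, lam_true)          # P(X <= k)
data = np.searchsorted(cdf, u)

res = minimize_scalar(neg_loglik, bounds=(1e-3, 5.0), args=(data,),
                      method="bounded")
print("MLE of lambda:", round(res.x, 3), "(true:", lam_true, ")")
```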
In observational studies, causal inference relies on several key identifying assumptions. One identifiability condition is the positivity assumption, which requires that the probability of treatment be bounded away from 0 and 1. That is, for every covariate combination, it should be possible to observe both treated and control subjects, i.e., the covariate distributions should overlap between treatment arms. If the positivity assumption is violated, population-level causal inference necessarily involves some extrapolation. Ideally, the causal effect estimate should reflect greater uncertainty in such situations. With that goal in mind, we construct a Gaussian process model for estimating treatment effects in the presence of practical violations of positivity. Advantages of our method include minimal distributional assumptions, a cohesive model for estimating treatment effects, and more uncertainty associated with areas in the covariate space where there is less overlap. We assess the performance of our approach with respect to bias and efficiency using simulation studies. The method is then applied to a study of critically ill female patients to examine the effect of undergoing right heart catheterization.
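A minimal sketch of the general idea, using scikit-learn's GaussianProcessRegressor rather than the authors' exact model: fit one GP per treatment arm, impute each subject's missing potential outcome, and note that the posterior standard deviation of the imputed counterfactual widens in covariate regions where that arm has little support. The data-generating process and kernel are illustrative.

```python
import numpy as np
from sklearn.gaussian_process import GaussianProcessRegressor
from sklearn.gaussian_process.kernels import RBF, WhiteKernel

rng = np.random.default_rng(4)
n = 200
x = rng.uniform(0, 1, n)[:, None]
# Treatment probability rises with x: weak overlap at both ends.
a = (rng.uniform(0, 1, n) < np.clip(x[:, 0], 0.05, 0.95)).astype(int)
y = 1.0 + 2.0 * a + np.sin(3 * x[:, 0]) + rng.normal(0, 0.2, n)

kernel = 1.0 * RBF(0.2) + WhiteKernel(0.05)
gp1 = GaussianProcessRegressor(kernel=kernel).fit(x[a == 1], y[a == 1])
gp0 = GaussianProcessRegressor(kernel=kernel).fit(x[a == 0], y[a == 0])

grid = np.linspace(0, 1, 5)[:, None]
m1, s1 = gp1.predict(grid, return_std=True)
m0, s0 = gp0.predict(grid, return_std=True)
cate, sd = m1 - m0, np.sqrt(s1**2 + s0**2)
for g, c, s in zip(grid[:, 0], cate, sd):
    # Uncertainty is largest where one arm has few observations (weak overlap).
    print(f"x={g:.2f}  CATE~{c:+.2f}  sd~{s:.2f}")
```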
Interpretability of learning algorithms is crucial for applications involving critical decisions, and variable importance is one of the main interpretation tools. Shapley effects are now widely used to interpret both tree ensembles and neural networks, as they can efficiently handle dependence and interactions in the data, as opposed to most other variable importance measures. However, estimating Shapley effects is a challenging task, because of the computational complexity and the required conditional expectation estimates. Accordingly, existing Shapley algorithms have flaws: a costly running time, or a bias when input variables are dependent. Therefore, we introduce SHAFF, SHApley eFfects via random Forests, a fast and accurate Shapley effect estimator, even when input variables are dependent. We demonstrate SHAFF's efficiency through a theoretical analysis of its consistency and through extensive experiments showing practical performance improvements over competitors. An implementation of SHAFF in C++ and R is available online.
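To make the estimation challenge concrete, here is a brute-force Monte Carlo sketch of a related but simpler quantity, Shapley attributions for a single prediction of a fitted random forest (the Štrumbelj-Kononenko permutation scheme), not the SHAFF algorithm: it samples feature orderings and approximates conditional expectations by marginal resampling, which is exactly the shortcut that induces bias under dependent inputs.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor

rng = np.random.default_rng(5)
n, d = 1000, 4
X = rng.normal(size=(n, d))
y = 2 * X[:, 0] + X[:, 1] * X[:, 2] + rng.normal(0, 0.1, n)
forest = RandomForestRegressor(n_estimators=100).fit(X, y)

def shapley_mc(f, x, background, n_perm=200):
    """Monte Carlo Shapley values for one prediction via random orderings.
    Marginal (not conditional) sampling: simple, but biased under dependence."""
    d = len(x)
    phi = np.zeros(d)
    for _ in range(n_perm):
        order = rng.permutation(d)
        z = background[rng.integers(len(background))].copy()
        prev = f(z[None, :])[0]
        for j in order:
            z[j] = x[j]                 # reveal feature j
            cur = f(z[None, :])[0]
            phi[j] += cur - prev
            prev = cur
    return phi / n_perm

phi = shapley_mc(forest.predict, X[0], X)
# The attributions telescope: their sum approximates f(x) - E[f].
print("Shapley attribution:", phi.round(2), " sum:", phi.sum().round(2))
```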
This paper proposes methods for Bayesian inference in time-varying parameter (TVP) quantile regression (QR) models featuring conditional heteroskedasticity. I use data augmentation schemes to render the model conditionally Gaussian and develop an efficient Gibbs sampling algorithm. Regularization of the high-dimensional parameter space is achieved via flexible dynamic shrinkage priors. A simple version of TVP-QR based on an unobserved component model is applied to dynamically trace the quantiles of the distribution of inflation in the United States, the United Kingdom and the euro area. In an out-of-sample forecast exercise, I find the proposed model to be competitive and to perform particularly well for higher-order and tail forecasts. A detailed analysis of the resulting predictive distributions reveals that they are sometimes skewed and occasionally feature heavy tails.
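For context, the standard data-augmentation device for rendering Bayesian quantile regression conditionally Gaussian is the location-scale normal-mixture representation of the asymmetric Laplace distribution (Kozumi and Kobayashi, 2011); whether the paper uses exactly this scheme is an assumption here. The sketch below checks the representation by simulation against the closed-form asymmetric Laplace CDF.

```python
import numpy as np

rng = np.random.default_rng(6)
p = 0.9                                   # quantile level
theta = (1 - 2 * p) / (p * (1 - p))
tau2 = 2 / (p * (1 - p))

# Mixture representation of the standard asymmetric Laplace at level p:
# y = theta * z + sqrt(tau2 * z) * u, with z ~ Exp(1), u ~ N(0, 1).
z = rng.exponential(1.0, 200_000)
u = rng.normal(0, 1, 200_000)
y = theta * z + np.sqrt(tau2 * z) * u

def al_cdf(t):
    """Closed-form CDF of the standard asymmetric Laplace at level p."""
    return p * np.exp((1 - p) * t) if t <= 0 else 1 - (1 - p) * np.exp(-p * t)

for t in (-2.0, 0.0, 2.0):
    print(f"t={t:+.1f}  empirical={np.mean(y <= t):.4f}  exact={al_cdf(t):.4f}")
# Conditional on z, y is Gaussian -- this is what enables Gibbs sampling.
```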
In this paper we propose and study a version of the Dyadic Classification and Regression Trees (DCART) estimator from Donoho (1997) for (fixed design) quantile regression in general dimensions, which we call the QDCART estimator. As in the mean regression setting, we show that a) a fast dynamic programming based algorithm with computational complexity $O(N \log N)$ exists for computing the QDCART estimator, and b) an oracle risk bound (trading off squared error and a complexity parameter of the true signal) holds for the QDCART estimator. This oracle risk bound allows us to demonstrate that the QDCART estimator enjoys adaptively rate optimal estimation guarantees for piecewise constant and bounded variation function classes. In contrast to existing results for the DCART estimator, which require subgaussianity of the error distribution, our estimation guarantees hold without any restrictive tail decay assumptions on the error distribution. For instance, our results hold even when the error distribution has no first moment, such as the Cauchy distribution. Apart from the Dyadic CART method, we also consider other variants such as the Optimal Regression Tree (ORT) estimator introduced in Chatterjee and Goswami (2019). In particular, we extend the ORT estimator to the quantile setting and establish that it enjoys analogous guarantees. Thus, this paper extends the scope of these globally optimal regression tree methodologies to heavy tailed data. We then perform extensive numerical experiments on both simulated and real data, which illustrate the usefulness of the proposed methods.
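To illustrate the flavour of these globally optimal partitioning methods in one dimension (not the dyadic multi-dimensional algorithm), the sketch below computes a penalized optimal piecewise-constant quantile fit by dynamic programming, trading off check loss against the number of segments. The O(n²) implementation and the penalty value are illustrative; the median fit remains stable even under Cauchy noise.

```python
import numpy as np

def check_loss(r, tau):
    """Quantile check loss applied to residuals r."""
    return np.sum(r * (tau - (r < 0)))

def quantile_partition(y, tau=0.5, pen=2.0):
    """Penalized optimal piecewise-constant quantile fit via dynamic programming.
    opt[j] = best cost of fitting y[:j]; each segment is fit by its tau-quantile."""
    n = len(y)
    opt = np.full(n + 1, np.inf)
    opt[0] = 0.0
    argmin = np.zeros(n + 1, dtype=int)
    for j in range(1, n + 1):
        for i in range(j):
            seg = y[i:j]
            c = np.quantile(seg, tau)
            cost = opt[i] + check_loss(seg - c, tau) + pen
            if cost < opt[j]:
                opt[j], argmin[j] = cost, i
    # Backtrack the optimal breakpoints.
    cuts, j = [], n
    while j > 0:
        cuts.append((argmin[j], j))
        j = argmin[j]
    return cuts[::-1]

rng = np.random.default_rng(7)
y = np.concatenate([rng.standard_cauchy(60), 5 + rng.standard_cauchy(60)])
print("recovered segments:", quantile_partition(y, tau=0.5, pen=5.0))
```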
A new statistical method, Independent Approximates (IAs), is defined and proven to enable closed-form estimation of the parameters of heavy-tailed distributions. Given independent, identically distributed samples from a one-dimensional distribution, IAs are formed by partitioning samples into pairs, triplets, or nth-order groupings and retaining the median of those groupings that are approximately equal. The pdf of the IAs is proven to be the normalized nth power of the original density. From this property, heavy-tailed distributions are proven to have well-defined means for their IA pairs, finite second moments for their IA triplets, and a finite, well-defined (n-1)th moment for the nth grouping. Estimation of the location, scale, and shape (inverse of the degree of freedom) of the generalized Pareto and Student's t distributions is possible via a system of three equations. Performance analysis of the IA estimation methodology is conducted for the Student's t distribution using between 1,000 and 100,000 samples. Closed-form estimates of the location and scale are determined from the mean of the IA pairs and the variance of the IA triplets, respectively. For the Student's t distribution, the geometric mean of the original samples provides a third equation to determine the shape, though its nonlinear solution requires an iterative solver. With 10,000 samples the relative bias of the parameter estimates is less than 0.01 and the relative precision is within +/-0.1. The theoretical precision is finite for a limited range of the shape but can be extended by using higher-order groupings for a given moment.
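A sketch of the IA construction as described above, with an illustrative closeness rule: shuffle, partition into consecutive triplets, call a triplet approximately equal when its relative spread is below a tolerance, and retain its median. The paper's precise criterion for "approximately equal" may differ.

```python
import numpy as np

rng = np.random.default_rng(8)
x = rng.standard_t(df=1.5, size=30_000)        # heavy-tailed: infinite variance

def independent_approximates(x, order=3, rel_tol=0.1):
    """Partition into nth-order groupings, keep medians of groupings whose
    members are approximately equal (relative spread below rel_tol)."""
    x = rng.permutation(x)
    x = x[: len(x) // order * order].reshape(-1, order)
    x.sort(axis=1)
    spread = x[:, -1] - x[:, 0]
    scale = np.abs(x[:, order // 2]) + 1e-12
    keep = spread / scale < rel_tol
    return x[keep, order // 2]                  # medians of retained groupings

ia3 = independent_approximates(x, order=3)
# The IA-triplet density is the normalized cube of the original density,
# so second moments that diverge for the raw sample become finite.
print("retained:", len(ia3), " IA-triplet variance:", round(ia3.var(), 3))
```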
In this paper, we are interested in nonparametric kernel estimation of a generalized regression function, including conditional cumulative distribution and conditional quantile functions, based on an incomplete sample $(X_t, Y_t, \zeta_t)_{t\in \mathbb{R}^+}$ of copies of a continuous-time stationary ergodic process $(X, Y, \zeta)$. The predictor $X$ is valued in some infinite-dimensional space, whereas the real-valued process $Y$ is observed when $\zeta = 1$ and missing whenever $\zeta = 0$. Pointwise and uniform consistency (with rates) of these estimators as well as a central limit theorem are established. The conditional bias and asymptotic quadratic error are also provided. Asymptotic and bootstrap-based confidence intervals for the generalized regression function are discussed. A first simulation study is performed to compare the discrete-time and continuous-time estimators. A second simulation discusses the selection of the optimal sampling mesh in the continuous-time case. Finally, it is worth noting that our results are stated under an ergodicity assumption, without assuming any classical mixing conditions.
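A simplified sketch of the estimator's general form, assuming a scalar covariate and discretely sampled data rather than a functional, continuous-time process: a Nadaraya-Watson-type weighted empirical conditional CDF in which only observed responses (ζ = 1) contribute, inverted on a grid for conditional quantiles. Kernel and bandwidth choices are illustrative.

```python
import numpy as np

rng = np.random.default_rng(9)
n = 2000
X = rng.uniform(-1, 1, n)          # scalar stand-in for the functional covariate
Y = np.sin(2 * X) + 0.3 * rng.normal(size=n)
zeta = (rng.uniform(size=n) < 0.7).astype(float)   # missing-at-random indicator

def cond_cdf(y, x, h=0.1):
    """Kernel estimate of F(y|x) using only observed responses (zeta = 1)."""
    w = zeta * np.exp(-0.5 * ((x - X) / h) ** 2)
    return np.sum(w * (Y <= y)) / np.sum(w)

def cond_quantile(tau, x, grid=np.linspace(-3, 3, 601)):
    """Invert the estimated conditional CDF on a grid."""
    F = np.array([cond_cdf(y, x) for y in grid])
    return grid[np.searchsorted(F, tau)]

print("median at x=0.5:", round(cond_quantile(0.5, 0.5), 2),
      "(true:", round(np.sin(1.0), 2), ")")
```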
Heatmap-based methods dominate the field of human pose estimation by modelling the output distribution through likelihood heatmaps. In contrast, regression-based methods are more efficient but suffer from inferior performance. In this work, we explore maximum likelihood estimation (MLE) to develop an efficient and effective regression-based method. From the perspective of MLE, adopting different regression losses amounts to making different assumptions about the output density function. A density function closer to the true distribution leads to better regression performance. In light of this, we propose a novel regression paradigm with Residual Log-likelihood Estimation (RLE) to capture the underlying output distribution. Concretely, RLE learns the change of the distribution, instead of the unreferenced underlying distribution, to facilitate the training process. With the proposed reparameterization design, our method is compatible with off-the-shelf flow models. The proposed method is effective, efficient and flexible. We show its potential in various human pose estimation tasks with comprehensive experiments. Compared to the conventional regression paradigm, regression with RLE brings a 12.4 mAP improvement on MSCOCO without any test-time overhead. Moreover, for the first time, our regression-based method outperforms heatmap-based methods on multi-person pose estimation. Our code is available at //github.com/Jeff-sjtu/res-loglikelihood-regression
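To unpack the MLE perspective: each regression loss is the negative log-likelihood of an assumed output density, so the L2 loss corresponds to a Gaussian assumption and the L1 loss to a Laplace assumption. The sketch below (illustrative only, not the flow-based RLE method) writes both losses with a fitted scale and evaluates them on heavy-ish residuals, where the better-matched density scores lower.

```python
import numpy as np

def gaussian_nll(y, mu, sigma):
    """L2-type loss: -log N(y | mu, sigma^2), up to an additive constant."""
    return np.log(sigma) + 0.5 * ((y - mu) / sigma) ** 2

def laplace_nll(y, mu, b):
    """L1-type loss: -log Laplace(y | mu, b), up to an additive constant."""
    return np.log(b) + np.abs(y - mu) / b

# Minimizing Gaussian NLL over mu recovers the mean-squared-error fit;
# minimizing Laplace NLL recovers the L1 fit. A density closer to the true
# residual distribution yields a lower loss and a better-calibrated model.
rng = np.random.default_rng(10)
resid = rng.laplace(0, 1, 10_000)             # heavy-ish residuals
print("Gaussian NLL:", gaussian_nll(resid, 0.0, resid.std()).mean().round(3))
print("Laplace  NLL:", laplace_nll(resid, 0.0, np.abs(resid).mean()).mean().round(3))
```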