Beta regression model is useful in the analysis of bounded continuous outcomes such as proportions. It is well known that for any regression model, the presence of multicollinearity leads to poor performance of the maximum likelihood estimators. The ridge type estimators have been proposed to alleviate the adverse effects of the multicollinearity. Furthermore, when some of the predictors have insignificant or weak effects on the outcomes, it is desired to recover as much information as possible from these predictors instead of discarding them all together. In this paper we proposed ridge type shrinkage estimators for the low and high dimensional beta regression model, which address the above two issues simultaneously. We compute the biases and variances of the proposed estimators in closed forms and use Monte Carlo simulations to evaluate their performances. The results show that, both in low and high dimensional data, the performance of the proposed estimators are superior to ridge estimators that discard weak or insignificant predictors. We conclude this paper by applying the proposed methods for two real data from econometric and medicine.
We propose a new wavelet-based method for density estimation when the data are size-biased. More specifically, we consider a power of the density of interest, where this power exceeds 1/2. Warped wavelet bases are employed, where warping is attained by some continuous cumulative distribution function. This can be seen as a general framework in which the conventional orthonormal wavelet estimation is the case where warping distribution is the standard uniform c.d.f. We show that both linear and nonlinear wavelet estimators are consistent, with optimal and/or near-optimal rates. Monte Carlo simulations are performed to compare four special settings which are easy to interpret in practice. An application with a real dataset on fatal traffic accidents involving alcohol illustrates the method. We observe that warped bases provide more flexible and superior estimates for both simulated and real data. Moreover, we find that estimating the power of a density (for instance, its square root) further improves the results.
Standard regression methods can lead to inconsistent estimates of causal effects when there are time-varying treatment effects and time-varying covariates. This is due to certain non-confounding latent variables that create colliders in the causal graph. These latent variables, which we call phantoms, do not harm the identifiability of the causal effect, but they render naive regression estimates inconsistent. Motivated by this, we ask: how can we modify regression methods so that they hold up even in the presence of phantoms? We develop an estimator for this setting based on regression modeling (linear, log-linear, probit and Cox regression), proving that it is consistent for the causal effect of interest. In particular, the estimator is a regression model fit with a simple adjustment for collinearity, making it easy to understand and implement with standard regression software. From a causal point of view, the proposed estimator is an instance of the parametric g-formula. Importantly, we show that our estimator is immune to the null paradox that plagues most other parametric g-formula methods.
It is common practice to use Laplace approximations to compute marginal likelihoods in Bayesian versions of generalised linear models (GLM). Marginal likelihoods combined with model priors are then used in different search algorithms to compute the posterior marginal probabilities of models and individual covariates. This allows performing Bayesian model selection and model averaging. For large sample sizes, even the Laplace approximation becomes computationally challenging because the optimisation routine involved needs to evaluate the likelihood on the full set of data in multiple iterations. As a consequence, the algorithm is not scalable for large datasets. To address this problem, we suggest using a version of a popular batch stochastic gradient descent (BSGD) algorithm for estimating the marginal likelihood of a GLM by subsampling from the data. We further combine the algorithm with Markov chain Monte Carlo (MCMC) based methods for Bayesian model selection and provide some theoretical results on the convergence of the estimates. Finally, we report results from experiments illustrating the performance of the proposed algorithm.
We are concerned with the problem of decomposing the parameter space of a parametric system of polynomial equations, and possibly some polynomial inequality constraints, with respect to the number of real solutions that the system attains. Previous studies apply a two step approach to this problem, where first the discriminant variety of the system is computed via a Groebner Basis (GB), and then a Cylindrical Algebraic Decomposition (CAD) of this is produced to give the desired computation. However, even on some reasonably small applied examples this process is too expensive, with computation of the discriminant variety alone infeasible. In this paper we develop new approaches to build the discriminant variety using resultant methods (the Dixon resultant and a new method using iterated univariate resultants). This reduces the complexity compared to GB and allows for a previous infeasible example to be tackled. We demonstrate the benefit by giving a symbolic solution to a problem from population dynamics - the analysis of the steady states of three connected populations which exhibit Allee effects - which previously could only be tackled numerically.
We study regression adjustments with additional covariates in randomized experiments under covariate-adaptive randomizations (CARs) when subject compliance is imperfect. We develop a regression-adjusted local average treatment effect (LATE) estimator that is proven to improve efficiency in the estimation of LATEs under CARs. Our adjustments can be parametric in linear and nonlinear forms, nonparametric, and high-dimensional. Even when the adjustments are misspecified, our proposed estimator is still consistent and asymptotically normal, and their inference method still achieves the exact asymptotic size under the null. When the adjustments are correctly specified, our estimator achieves the minimum asymptotic variance. When the adjustments are parametrically misspecified, we construct a new estimator which is weakly more efficient than linearly and nonlinearly adjusted estimators, as well as the one without any adjustments. Simulation evidence and empirical application confirm efficiency gains achieved by regression adjustments relative to both the estimator without adjustment and the standard two-stage least squares estimator.
In clinical trials, there is potential to improve precision and reduce the required sample size by appropriately adjusting for baseline variables in the statistical analysis. This is called covariate adjustment. Despite recommendations by the U.S. Food and Drug Administration and the European Medicines Agency in favor of covariate adjustment, it remains underutilized leading to inefficient trials. We address two obstacles that make it challenging to use covariate adjustment. A first obstacle is the incompatibility of many covariate adjusted estimators with commonly used stopping boundaries in group sequential designs (GSDs). A second obstacle is the uncertainty at the design stage about how much precision gain will result from covariate adjustment; an incorrect projection of a covariate's prognostic value risks an over- or underpowered trial. To address these obstacles, we extend the theory of information-monitoring in GSDs to handle covariate adjusted estimators. In particular, we propose a new statistical method that modifies the original estimator so that it becomes compatible with GSDs, while increasing or leaving unchanged the estimator's precision. This is needed since many covariate adjusted estimators don't satisfy the key property (i.e., independent information increments) needed to apply commonly used stopping boundaries in GSDs. Our approach allows the use of any asymptotically linear estimator, which covers many estimators used in randomized trials. Building on this, we propose using an information adaptive design, that is, continuing the trial until the required information level is achieved. Such a design adapts to the amount of precision gain due to covariate adjustment, resulting in trials that are correctly powered and that fully leverage prognostic baseline variables; this can lead to faster, more efficient trials, without sacrificing validity or power.
Massive sized survival datasets are becoming increasingly prevalent with the development of the healthcare industry. Such datasets pose computational challenges unprecedented in traditional survival analysis use-cases. A popular way for coping with massive datasets is downsampling them to a more manageable size, such that the computational resources can be afforded by the researcher. Cox proportional hazards regression has remained one of the most popular statistical models for the analysis of survival data to-date. This work addresses the settings of right censored and possibly left truncated data with rare events, such that the observed failure times constitute only a small portion of the overall sample. We propose Cox regression subsampling-based estimators that approximate their full-data partial-likelihood-based counterparts, by assigning optimal sampling probabilities to censored observations, and including all observed failures in the analysis. Asymptotic properties of the proposed estimators are established under suitable regularity conditions, and simulation studies are carried out to evaluate the finite sample performance of the estimators. We further apply our procedure on UK-biobank colorectal cancer genetic and environmental risk factors.
Penalized $M-$estimators for logistic regression models have been previously study for fixed dimension in order to obtain sparse statistical models and automatic variable selection. In this paper, we derive asymptotic results for penalized $M-$estimators when the dimension $p$ grows to infinity with the sample size $n$. Specifically, we obtain consistency and rates of convergence results, for some choices of the penalty function. Moreover, we prove that these estimators consistently select variables with probability tending to 1 and derive their asymptotic distribution.
A kernel method for estimating a probability density function (pdf) from an i.i.d. sample drawn from such density is presented. Our estimator is a linear combination of kernel functions, the coefficients of which are determined by a linear equation. An error analysis for the mean integrated squared error is established in a general reproducing kernel Hilbert space setting. The theory developed is then applied to estimate pdfs belonging to weighted Korobov spaces, for which a dimension independent convergence rate is established. Under a suitable smoothness assumption, our method attains a rate arbitrarily close to the optimal rate. Numerical results support our theory.
This paper considers the problem of measure estimation under the barycentric coding model (BCM), in which an unknown measure is assumed to belong to the set of Wasserstein-2 barycenters of a finite set of known measures. Estimating a measure under this model is equivalent to estimating the unknown barycenteric coordinates. We provide novel geometrical, statistical, and computational insights for measure estimation under the BCM, consisting of three main results. Our first main result leverages the Riemannian geometry of Wasserstein-2 space to provide a procedure for recovering the barycentric coordinates as the solution to a quadratic optimization problem assuming access to the true reference measures. The essential geometric insight is that the parameters of this quadratic problem are determined by inner products between the optimal displacement maps from the given measure to the reference measures defining the BCM. Our second main result then establishes an algorithm for solving for the coordinates in the BCM when all the measures are observed empirically via i.i.d. samples. We prove precise rates of convergence for this algorithm -- determined by the smoothness of the underlying measures and their dimensionality -- thereby guaranteeing its statistical consistency. Finally, we demonstrate the utility of the BCM and associated estimation procedures in three application areas: (i) covariance estimation for Gaussian measures; (ii) image processing; and (iii) natural language processing.