Many datasets are collected automatically and are thus easily contaminated by outliers. To overcome this issue, there has recently been a resurgence of interest in robust estimation. However, most robust estimation methods are designed for specific models. In regression, methods have notably been developed for estimating the regression coefficients in generalized linear models, while other approaches have been proposed e.g.\ for robust inference in beta regression or in sample selection models. In this paper, we propose Maximum Mean Discrepancy optimization as a universal framework for robust regression. We prove non-asymptotic error bounds showing that our estimators are robust to Huber-type contamination. We also provide a (stochastic) gradient algorithm for computing these estimators, whose implementation only requires the ability to sample from the model and to compute the gradient of its log-likelihood function. We finally illustrate the proposed approach through a set of simulations.
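As a minimal sketch of the idea (our construction, not the paper's implementation): by the score-function identity $\nabla_\theta \mathbb{E}_{Y\sim P_\theta}[f(Y)] = \mathbb{E}[f(Y)\nabla_\theta \log p_\theta(Y)]$, an unbiased stochastic gradient of the squared MMD needs exactly the two ingredients the abstract lists, model samples and the log-likelihood gradient. Below, a toy version for Gaussian linear regression with a Gaussian kernel; all names are ours.

```python
import numpy as np

rng = np.random.default_rng(0)

def gauss_kernel(a, b, bw=1.0):
    return np.exp(-(a[:, None] - b[None, :]) ** 2 / (2 * bw ** 2))

def mmd_grad_step(beta, X, y, sigma=1.0, m=20, lr=0.2, bw=1.0):
    """One score-function (REINFORCE) gradient step on the squared MMD."""
    grad = np.zeros_like(beta)
    for xi, yi in zip(X, y):
        ys = xi @ beta + sigma * rng.standard_normal(m)        # model samples
        score = ((ys - xi @ beta) / sigma ** 2)[:, None] * xi  # grad of log-lik
        Kss = gauss_kernel(ys, ys, bw)
        np.fill_diagonal(Kss, 0.0)
        Ksy = gauss_kernel(ys, np.array([yi]), bw)
        grad += 2 * (Kss.sum(1) / (m - 1) - Ksy[:, 0]) @ score / m
    return beta - lr * grad / len(y)

X = rng.standard_normal((200, 2))
beta_true = np.array([1.0, -2.0])
y = X @ beta_true + rng.standard_normal(200)
y[:10] += 15.0                          # Huber-type contamination: 5% outliers
beta = np.zeros(2)
for _ in range(300):
    beta = mmd_grad_step(beta, X, y)
print(beta)                             # should land near beta_true
```

Because the kernel is bounded, each observation's pull on the gradient is bounded, which is the mechanism behind the robustness to contamination.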
In time-to-event settings, g-computation and doubly robust estimators are based on discrete-time data. However, many biological processes evolve continuously over time. In this paper, we extend the g-computation and doubly robust standardisation procedures to a continuous-time context. Using a simulation study, we compare their performance to the well-known inverse-probability-weighting (IPW) estimator for estimating the hazard ratio and the difference in restricted mean survival times. Under correct model specification, all methods are unbiased, but g-computation and doubly robust standardisation are more efficient than inverse probability weighting. We also analyse two real-world datasets to illustrate the practical implementation of these approaches. We have updated the R package RISCA to facilitate the use of these methods and their dissemination.
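As a point of reference, a minimal sketch of the IPW comparator on simulated data (the data-generating process and all names are ours; the paper's g-computation and doubly robust versions are not reproduced here): stabilized weights from a logistic propensity model, followed by a weighted Kaplan-Meier estimate of the restricted mean survival time in each arm.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(1)
n = 2000
X = rng.standard_normal((n, 2))
a = rng.binomial(1, 1 / (1 + np.exp(-X[:, 0])))              # confounded treatment
t = rng.exponential(1 / np.exp(-0.5 * a + 0.3 * X[:, 0]))    # latent event times
c = rng.exponential(2.0, n)                                  # censoring times
time, event = np.minimum(t, c), (t <= c).astype(int)

ps = LogisticRegression().fit(X, a).predict_proba(X)[:, 1]   # propensity scores
sw = np.where(a == 1, a.mean() / ps, (1 - a.mean()) / (1 - ps))  # stabilized IPW

def weighted_rmst(time, event, w, tau):
    """Weighted Kaplan-Meier RMST; ties handled per record for brevity."""
    o = np.argsort(time)
    time, event, w = time[o], event[o], w[o]
    at_risk = np.cumsum(w[::-1])[::-1]          # weighted number still at risk
    S, last, area = 1.0, 0.0, 0.0
    for ti, ei, wi, ri in zip(time, event, w, at_risk):
        if ti > tau:
            break
        area += S * (ti - last)
        last = ti
        if ei:
            S *= 1.0 - wi / ri
    return area + S * (tau - last)

tau = 2.0
d = (weighted_rmst(time[a == 1], event[a == 1], sw[a == 1], tau)
     - weighted_rmst(time[a == 0], event[a == 0], sw[a == 0], tau))
print(d)   # IPW estimate of the RMST difference
```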
In randomized experiments, adjusting for observed features when estimating treatment effects has been proposed as a way to improve asymptotic efficiency. However, only linear regression has been proven to yield an estimate of the average treatment effect that is asymptotically no less efficient than the treated-minus-control difference in means, regardless of the true data generating process. Randomized treatment assignment provides this "do-no-harm" property, with neither truth of a linear model nor a generative model for the outcomes being required. We present a general calibration method which confers the same no-harm property onto estimators leveraging a broad class of nonlinear models. This recovers the usual regression-adjusted estimator when ordinary least squares is used, and further provides non-inferior treatment effect estimators using methods such as logistic and Poisson regression. The resulting estimators are non-inferior both to the difference-in-means estimator and to treatment effect estimators that have not undergone calibration. We show that our estimator is asymptotically equivalent to an inverse probability weighted estimator using a logit link with predicted potential outcomes as covariates. In a simulation study, we demonstrate that common nonlinear estimators without our calibration procedure may perform markedly worse than both the calibrated estimator and the unadjusted difference in means.
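To convey the flavour (our reading, not the paper's exact procedure): fit the nonlinear outcome model separately in each arm, then use the two predicted potential outcomes as covariates in a Lin (2013)-style interacted OLS, whose no-harm guarantee relative to the difference in means then carries over. A hypothetical sketch:

```python
import numpy as np
from sklearn.linear_model import LogisticRegression, LinearRegression

rng = np.random.default_rng(2)
n = 1000
X = rng.standard_normal((n, 3))
A = rng.binomial(1, 0.5, n)                       # randomized treatment
Y = rng.binomial(1, 1 / (1 + np.exp(-(X[:, 0] + A))))

# step 1: nonlinear (here logistic) outcome models fit separately by arm
m1 = LogisticRegression().fit(X[A == 1], Y[A == 1]).predict_proba(X)[:, 1]
m0 = LogisticRegression().fit(X[A == 0], Y[A == 0]).predict_proba(X)[:, 1]

# step 2: calibration -- feed the predictions into an interacted OLS with
# centered covariates, which is never asymptotically worse than the
# difference in means under randomization
Z = np.column_stack([m1, m0])
Zc = Z - Z.mean(0)
design = np.column_stack([A, Zc, A[:, None] * Zc])
tau_hat = LinearRegression().fit(design, Y).coef_[0]
print(tau_hat)    # calibrated treatment effect estimate
```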
Goodness-of-fit (GoF) testing is ubiquitous in statistics, with direct ties to model selection, confidence interval construction, conditional independence testing, and multiple testing, just to name a few applications. While testing the GoF of a simple (point) null hypothesis provides an analyst with great flexibility in the choice of test statistic while still ensuring validity, most GoF tests for composite null hypotheses are far more constrained, as the test statistic must have a tractable distribution over the entire null model space. A notable exception is co-sufficient sampling (CSS): resampling the data conditional on a sufficient statistic for the null model guarantees valid GoF testing using any test statistic the analyst chooses. But CSS testing requires the null model to have a compact (in an information-theoretic sense) sufficient statistic, which only holds for a very limited class of models; even for a null model as simple as logistic regression, CSS testing is powerless. In this paper, we leverage the concept of approximate sufficiency to generalize CSS testing to essentially any parametric model with an asymptotically efficient estimator; we call our extension "approximate CSS" (aCSS) testing. We quantify the finite-sample Type I error inflation of aCSS testing and show that it is vanishing under standard maximum likelihood asymptotics, for any choice of test statistic. We apply our proposed procedure both theoretically and in simulation to a number of models of interest to demonstrate its finite-sample Type I error and power.
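To fix ideas, here is exact CSS in the one case where it is easy, a Gaussian null: conditional on the sample mean and variance, the data are uniform on a sphere, so exchangeable copies are immediate and any test statistic yields a valid p-value. aCSS generalizes this by conditioning on an approximately sufficient statistic instead. A toy sketch (all choices ours):

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(3)

def css_copy(x, rng):
    """One co-sufficient copy under a N(mu, sigma^2) null: uniform on the
    sphere of datasets sharing the observed sample mean and variance."""
    z = rng.standard_normal(x.size)
    z -= z.mean()
    z *= x.std() / z.std()
    return z + x.mean()

x = rng.standard_t(df=3, size=100)            # data that violate normality
T = stats.kurtosis                            # any statistic the analyst likes
copies = [T(css_copy(x, rng)) for _ in range(999)]
pval = (1 + sum(t >= T(x) for t in copies)) / (1 + len(copies))
print(pval)                                   # valid under the Gaussian null
```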
In this paper, a functional partial quantile regression approach, a quantile regression analog of functional partial least squares regression, is proposed to estimate the function-on-function linear quantile regression model. A partial quantile covariance function is first used to extract the functional partial quantile regression basis functions. The extracted basis functions are then used to obtain the functional partial quantile regression components and estimate the final model. In our proposal, the functional forms of the discretely observed random variables are first constructed via a finite-dimensional basis function expansion. The functional partial quantile regression constructed from the functional random variables is then approximated by the partial quantile regression constructed from the basis expansion coefficients. The proposed method uses an iterative procedure to extract the partial quantile regression components, and a Bayesian information criterion to determine the optimum number of retained components. The proposed model allows for more than one functional predictor. In practice, however, the relevant predictors are unknown, so the true form of the model is unspecified; a forward variable selection procedure is therefore used to determine the significant predictors. Moreover, a case-sampling-based bootstrap procedure is used to construct pointwise prediction intervals for the functional response. The predictive performance of the proposed method is evaluated using several Monte Carlo experiments under different data generation processes and error distributions. In an empirical example, air quality data are analyzed to demonstrate the effectiveness of the proposed method.
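A scalar-response sketch of the basic building block, projecting discretely observed curves onto a finite basis and running quantile regression on the basis coefficients; the function-on-function method iterates this via the partial quantile covariance, which we omit here. The Fourier basis and all names are our choices:

```python
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(4)
tgrid = np.linspace(0, 1, 50)
basis = np.column_stack(
    [np.ones_like(tgrid)]
    + [np.sin(2 * np.pi * j * tgrid) for j in (1, 2)]
    + [np.cos(2 * np.pi * j * tgrid) for j in (1, 2)]
)                                                  # 50 x 5 Fourier design
n = 200
scores = rng.standard_normal((n, basis.shape[1]))  # true basis coefficients
Xcurves = scores @ basis.T + 0.1 * rng.standard_normal((n, tgrid.size))
coef = np.linalg.lstsq(basis, Xcurves.T, rcond=None)[0].T  # estimated coefficients
y = coef[:, 1] - 0.5 * coef[:, 2] + rng.standard_normal(n) # scalar response
fit = sm.QuantReg(y, sm.add_constant(coef)).fit(q=0.5)     # median regression
print(fit.params)                                  # weights on basis coefficients
```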
Correlated data are ubiquitous in today's data-driven society. A fundamental task in analyzing these data is to understand, characterize and utilize the correlations in them in order to conduct valid inference. Yet explicit regression analysis of correlations has so far been limited to longitudinal data, a special form of correlated data, while implicit analysis via mixed-effects models lacks generality as a full inferential tool. This paper proposes a novel regression approach for modelling the correlation structure, leveraging a new generalized z-transformation. This transformation maps correlation matrices, constrained to be positive definite, to vectors with unrestricted support, and is order-invariant. Building on these two properties, we develop a regression model to relate the transformed parameters to any covariates. We show that, coupled with a mean and a variance regression model, the use of maximum likelihood leads to asymptotically normal parameter estimates and, crucially, enables statistical inference for all the parameters. The performance of our framework is demonstrated in extensive simulations. More importantly, we illustrate the use of our model with the analysis of the classroom data, a highly unbalanced multilevel clustered dataset with within-class and within-school correlations, and of the malaria immune response data from Benin, a longitudinal dataset with time-dependent covariates in addition to time. Our analyses reveal insights not previously available.
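The abstract does not spell out the transformation, so as a hedged stand-in we use the matrix-log parameterisation of Archakov and Hansen (2021), which has exactly the two properties named (unrestricted support and order-invariance) and may or may not be the authors' choice: the off-diagonal of $\log R$ is an unconstrained vector, and the inverse map is a simple diagonal fixed point.

```python
import numpy as np
from scipy.linalg import logm, expm

R = np.array([[1.0, 0.5, 0.2],
              [0.5, 1.0, -0.3],
              [0.2, -0.3, 1.0]])              # a valid correlation matrix

iu = np.triu_indices_from(R, k=1)
gamma = logm(R)[iu]                           # forward map: unconstrained vector
print(gamma)

# inverse map: choose the diagonal so that expm returns a unit-diagonal
# (i.e. correlation) matrix, via a simple convergent fixed point
A = np.zeros_like(R)
A[iu] = gamma
A += A.T
d = np.zeros(len(R))
for _ in range(50):
    np.fill_diagonal(A, d)
    d = d - np.log(np.diag(expm(A)))
np.fill_diagonal(A, d)
print(np.round(expm(A), 6))                   # recovers R
```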
We study a functional linear regression model that deals with functional responses and allows for both functional covariates and high-dimensional vector covariates. The proposed model is flexible and nests several functional regression models in the literature as special cases. Based on the theory of reproducing kernel Hilbert spaces (RKHS), we propose a penalized least squares estimator that can accommodate functional variables observed on discrete sample points. Besides a conventional smoothness penalty, a group Lasso-type penalty is further imposed to induce sparsity in the high-dimensional vector predictors. We derive finite sample theoretical guarantees and show that the excess prediction risk of our estimator is minimax optimal. Furthermore, our analysis reveals an interesting phase transition phenomenon: the optimal excess risk is determined jointly by the smoothness and the sparsity of the functional regression coefficients. A novel efficient optimization algorithm based on iterative coordinate descent is devised to handle the smoothness and group penalties simultaneously. Simulation studies and real data applications illustrate the promising performance of the proposed approach compared to the state-of-the-art methods in the literature.
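In miniature, the group-penalty ingredient (our sketch; the RKHS smoothness penalty and the functional blocks are omitted): block coordinate descent in which each vector-covariate group is updated by group soft-thresholding, assuming near-orthonormal group designs.

```python
import numpy as np

def group_lasso_cd(X, y, groups, lam, n_iter=100):
    """Block coordinate descent for (1/2n)||y - X beta||^2 + lam * sum_g ||beta_g||."""
    n, p = X.shape
    beta = np.zeros(p)
    r = y - X @ beta
    for _ in range(n_iter):
        for g in groups:                   # g: index array for one group
            Xg = X[:, g]
            r += Xg @ beta[g]              # remove this group's contribution
            z = Xg.T @ r / n
            norm = np.linalg.norm(z)
            # group soft-threshold; exact minimizer when Xg.T @ Xg / n = I
            beta[g] = max(0.0, 1 - lam / norm) * z if norm > 0 else 0.0
            r -= Xg @ beta[g]
    return beta

rng = np.random.default_rng(5)
X = rng.standard_normal((100, 9))
beta_true = np.array([2, -1, 0, 0, 0, 0, 1.5, 0, 0.0])
y = X @ beta_true + 0.1 * rng.standard_normal(100)
groups = [np.arange(0, 3), np.arange(3, 6), np.arange(6, 9)]
print(np.round(group_lasso_cd(X, y, groups, lam=0.1), 2))  # middle group zeroed
```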
Competing risk data appear widely in modern biomedical research, and cause-specific hazard models have often been used to analyze such data over the past two decades. However, there is no existing study of kernel likelihood methods for the cause-specific hazard model with time-varying coefficients. We propose a local partial log-likelihood approach for nonparametric estimation of the time-varying coefficients. Simulation studies demonstrate that the proposed nonparametric kernel estimator performs well in finite-sample settings. Finally, we apply the proposed method to analyze a diabetes dialysis study with competing causes of death.
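A local-constant toy version of such an estimator (the data-generating process, kernel, and bandwidth are our choices): Gaussian kernel weights centered at the target time enter a cause-specific Cox partial likelihood, with competing events treated as censoring.

```python
import numpy as np
from scipy.optimize import minimize_scalar

rng = np.random.default_rng(6)
n = 500
x = rng.standard_normal(n)
t1 = rng.exponential(1 / np.exp(0.8 * x))      # cause of interest, true beta = 0.8
t2 = rng.exponential(1 / np.exp(-0.3 * x))     # competing cause
c = rng.exponential(3.0, n)                    # independent censoring
time = np.minimum.reduce([t1, t2, c])
delta = (time == t1).astype(int)               # 1 = event of interest observed

def neg_local_pl(beta, t0, h=0.4):
    """Kernel-weighted negative log partial likelihood at time t0."""
    w = np.exp(-0.5 * ((time - t0) / h) ** 2)  # Gaussian kernel weights
    ll = 0.0
    for i in np.where(delta == 1)[0]:
        risk = time >= time[i]                 # risk set at the i-th event
        ll += w[i] * (beta * x[i] - np.log(np.exp(beta * x[risk]).sum()))
    return -ll

for t0 in (0.3, 0.6, 1.0):
    b = minimize_scalar(lambda b: neg_local_pl(b, t0),
                        bounds=(-3, 3), method="bounded").x
    print(t0, round(b, 2))                     # should hover around 0.8
```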
Maximum norm error estimates for virtual element methods are studied. To establish these estimates, we prove higher local regularity based on a delicate analysis of Green's functions, together with high-order local error estimates for the projection of the virtual element solutions. The difference between the exact gradient and the gradient of the projection of the virtual element solution is proved to converge at the optimal rate in the maximum norm. For high-order virtual element methods, we establish optimal convergence results in the $L^{\infty}$ norm. Our theoretical findings are validated by a numerical example on general polygonal meshes.
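For orientation, such gradient estimates typically take the following form (our paraphrase of the standard finite/virtual element statement, not the paper's verbatim result), where $\Pi_k$ denotes the computable polynomial projection of the virtual element solution $u_h$:
$$\|\nabla u - \nabla \Pi_k u_h\|_{L^{\infty}(\Omega)} \le C h^{k} |\ln h|^{\bar{k}} \|u\|_{W^{k+1,\infty}(\Omega)}, \qquad \bar{k} = \begin{cases} 1, & k = 1,\\ 0, & k \ge 2, \end{cases}$$
so the logarithmic factor appears only in the lowest-order case.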
Many complicated Bayesian posteriors are difficult to approximate by either sampling or optimisation methods. Therefore we propose a novel approach combining features of both. We use a flexible parameterised family of densities, such as a normalising flow. Given a density from this family approximating the posterior, we use importance sampling to produce a weighted sample from a more accurate posterior approximation. This sample is then used in optimisation to update the parameters of the approximate density, which we view as distilling the importance sampling results. We iterate these steps and gradually improve the quality of the posterior approximation. We illustrate our method in two challenging examples: a queueing model and a stochastic differential equation model.
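In miniature (our sketch: a one-dimensional Gaussian family stands in for the normalising flow, so the distillation step, a weighted maximum-likelihood refit of the approximating density, is available in closed form; the full method's tempering of the target is omitted):

```python
import numpy as np
from scipy.stats import norm

rng = np.random.default_rng(7)
logpost = lambda x: -0.5 * ((x - 3.0) / 0.5) ** 2   # toy unnormalised log-posterior

m, s = 0.0, 2.0                                     # initial approximating density q
for _ in range(30):
    x = m + s * rng.standard_normal(2000)           # sample from current q
    logw = logpost(x) - norm.logpdf(x, m, s)        # importance weights vs q
    w = np.exp(logw - logw.max())
    w /= w.sum()
    m = np.sum(w * x)                               # distillation: weighted MLE
    s = np.sqrt(np.sum(w * (x - m) ** 2))           # update of q's parameters
print(m, s)                                         # approaches (3.0, 0.5)
```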
This paper gives a new approach to maximum likelihood estimation of the location and scale of the Cauchy distribution, treated jointly. We regard the pair as a single complex parameter and derive a new form of the likelihood equation in a complex variable. Based on this equation, we provide a new iterative scheme for approximating the maximum likelihood estimate. We also treat the equation algebraically and derive a polynomial containing the maximum likelihood estimate as a root. This algebraic approach provides another scheme for approximating the maximum likelihood estimate via root-finding algorithms for polynomials and, furthermore, establishes the non-existence of closed-form formulae when the sample size is five. We finally provide numerical examples to show that our method is effective.
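For intuition, writing $\theta = \mu + i\sigma$, the derivatives of the Cauchy log-likelihood combine into the single complex equation $\sum_{i} (x_i - \theta)^{-1} = n/(\bar\theta - \theta)$ (our derivation from the density $f(x) = \sigma/\{\pi((x-\mu)^2 + \sigma^2)\}$; the fixed-point rearrangement below is one natural scheme, not necessarily the paper's):

```python
import numpy as np

rng = np.random.default_rng(8)
x = 1.0 + 2.0 * rng.standard_cauchy(200)        # Cauchy(location 1, scale 2)

# initialise at (median, IQR/2), consistent starting values for Cauchy data
theta = np.median(x) + 0.5j * (np.quantile(x, 0.75) - np.quantile(x, 0.25))
for _ in range(100):
    # rearranged likelihood equation: conj(theta) = theta + n / sum 1/(x - theta)
    theta = np.conj(theta + x.size / np.sum(1.0 / (x - theta)))
print(theta.real, theta.imag)                   # MLE of (location, scale)
```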