Joint modeling of longitudinal and survival data offers many advantages, such as addressing measurement error and missing data in the longitudinal processes, understanding and quantifying the association between the longitudinal markers and the survival events, and predicting the risk of events based on the longitudinal markers. A joint model involves multiple submodels (one for each longitudinal/survival outcome), usually linked through correlated or shared random effects. Estimation is computationally expensive (particularly because the likelihood requires multidimensional integration over the random effects distribution), so inference methods rapidly become intractable, which restricts applications of joint models to a small number of longitudinal markers and/or random effects. We introduce a Bayesian approximation based on the Integrated Nested Laplace Approximation (INLA) algorithm, implemented in the R package R-INLA, to alleviate the computational burden and allow the estimation of multivariate joint models with fewer restrictions. Our simulation studies show that R-INLA substantially reduces the computation time and the variability of the parameter estimates compared with alternative estimation strategies. We further apply the methodology to analyze five longitudinal markers (three continuous, one count, one binary; 16 random effects in total) and competing risks of death and transplantation in a clinical trial on primary biliary cholangitis. R-INLA provides a fast and reliable inference technique for applying joint models to the complex multivariate data encountered in health research.
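To make the shared-random-effects structure concrete, one commonly used specification links a generalized linear mixed submodel for each marker to a proportional hazards submodel through the markers' linear predictors; this is an illustrative formulation (with one cause-specific hazard of this form per competing event), not necessarily the exact parameterization estimated in the paper:
\begin{align*}
g_k\big(\mathbb{E}[y_{ik}(t) \mid b_{ik}]\big) &= \eta_{ik}(t) = X_{ik}(t)^\top \beta_k + Z_{ik}(t)^\top b_{ik}, \qquad k = 1,\dots,K,\\
\lambda_i(t) &= \lambda_0(t)\,\exp\Big\{W_i^\top \gamma + \textstyle\sum_{k=1}^{K} \varphi_k\, \eta_{ik}(t)\Big\},
\qquad b_i = (b_{i1},\dots,b_{iK}) \sim \mathcal{N}(0,\Sigma).
\end{align*}
The likelihood then requires integrating over the joint random-effects vector $b_i$ (16-dimensional in the application above), which is precisely the integral that the Laplace-type approximations in INLA are designed to handle.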
We study numerical integration over bounded regions in $\mathbb{R}^s, s\ge1$ with respect to some probability measure. We replace random sampling with quasi-Monte Carlo methods, where the underlying point set is derived from deterministic constructions that aim to fill the space more evenly than random points. Such quasi-Monte Carlo point sets are ordinarily designed for the uniform measure, and the theory only works for product measures when a coordinate-wise transformation is applied. Going beyond this setting, we first consider the case where the target density is a mixture distribution in which each term in the mixture comes from a product distribution. Next we consider target densities that can be approximated with such mixture distributions. We require the approximation to be a sum of coordinate-wise products and to be positive everywhere (so that it can be re-scaled to a probability density function). We use tensor product hat function approximations for this purpose, since a hat function approximation of a positive function is itself positive. We also study more complex algorithms, where we first approximate the target density with a general Gaussian mixture distribution and then approximate each mixture component with an adaptive hat function approximation on rotated intervals. The Gaussian mixture approximation allows us to locate the essential parts of the target density, whereas the adaptive hat function approximation allows us to approximate its finer structure. We prove convergence rates for each of the integration techniques based on quasi-Monte Carlo sampling for integrands with bounded partial mixed derivatives. The employed algorithms are based on digital $(t,s)$-sequences over the finite field $\mathbb{F}_2$ and an inversion method. Numerical examples illustrate the performance of the algorithms for some target densities and integrands.
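As a minimal illustration of the mixture-of-products idea, the sketch below integrates a test function against a two-component mixture of product normals by pushing one scrambled Sobol' point set through the coordinate-wise inversion method for each component and combining the results with the mixture weights; the integrand, dimension, and mixture parameters are arbitrary choices for illustration, not taken from the paper.

import numpy as np
from scipy.stats import norm, qmc

# Hypothetical integrand with bounded mixed partial derivatives.
def f(x):
    return np.exp(-np.sum(x**2, axis=1) / 4.0)

# Target density: a 2-component mixture, each component a product of
# independent normals (weights, means, and scales chosen arbitrarily).
weights = [0.3, 0.7]
means = [np.array([0.0, 0.0]), np.array([2.0, -1.0])]
scales = [np.array([1.0, 0.5]), np.array([0.7, 1.2])]

# One scrambled Sobol' point set, reused for every mixture component via
# the coordinate-wise inversion method.
sobol = qmc.Sobol(d=2, scramble=True, seed=0)
u = sobol.random(2**12)                      # points in [0, 1)^2

estimate = 0.0
for w, m, s in zip(weights, means, scales):
    x = norm.ppf(u) * s + m                  # map points to the product component
    estimate += w * f(x).mean()              # QMC average under that component

print("QMC estimate of E[f(X)] under the mixture:", estimate)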
The well-known discrete Fourier transform (DFT) can easily be generalized to arbitrary nodes in the spatial domain. The fast procedure for this generalization is referred to as the nonequispaced fast Fourier transform (NFFT). Various applications, such as MRI and the solution of PDEs, require solving the inverse problem, i.e., computing Fourier coefficients from given nonequispaced data. In this paper we survey different kinds of approaches to tackle this problem. In contrast to iterative procedures, where multiple iteration steps are needed for computing a solution, we focus especially on so-called direct inversion methods. We review density compensation techniques and introduce a new scheme that leads to an exact reconstruction for trigonometric polynomials. In addition, we consider a matrix optimization approach using Frobenius norm minimization to obtain an inverse NFFT.
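The sketch below illustrates the setting with a direct (slow) nonequispaced Fourier matrix rather than a fast transform: it compares a least-squares inversion with a density-compensated adjoint using simple Voronoi-type weights; these generic weights are only for illustration, not the exact-reconstruction scheme introduced in the paper.

import numpy as np

# Direct (slow) nonequispaced Fourier matrix on M arbitrary nodes and two
# ways to recover the Fourier coefficients of a trigonometric polynomial.
rng = np.random.default_rng(0)
N, M = 16, 1024                              # polynomial degree, number of nodes
k = np.arange(-N // 2, N // 2)               # frequency indices
x = np.sort(rng.uniform(-0.5, 0.5, M))       # nonequispaced nodes in [-1/2, 1/2)

A = np.exp(2j * np.pi * np.outer(x, k))      # nonequispaced Fourier matrix (M x N)
fhat_true = rng.normal(size=N) + 1j * rng.normal(size=N)
f = A @ fhat_true                            # samples of the trigonometric polynomial

# (a) Least-squares inversion (what iterative inverse NFFT methods approximate);
#     exact up to conditioning since the data here are noiseless.
fhat_ls = np.linalg.lstsq(A, f, rcond=None)[0]

# (b) Density compensation: weighted adjoint A^H diag(w) f with Voronoi-type
#     weights (half the gap to each neighbour on the torus); this is only a
#     quadrature approximation and improves as the nodes get denser.
gaps = np.diff(np.concatenate([x, [x[0] + 1.0]]))
w = 0.5 * (gaps + np.roll(gaps, 1))
fhat_dc = A.conj().T @ (w * f)

print(np.max(np.abs(fhat_ls - fhat_true)), np.max(np.abs(fhat_dc - fhat_true)))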
We provide a novel characterization of augmented balancing weights, also known as Automatic Debiased Machine Learning (AutoDML). These estimators combine outcome modeling with balancing weights, which estimate inverse propensity score weights directly. When the outcome and weighting models are both linear in some (possibly infinite) basis, we show that the augmented estimator is equivalent to a single linear model with coefficients that combine the original outcome model coefficients and OLS; in many settings, the augmented estimator collapses to OLS alone. We then extend these results to specific choices of outcome and weighting models. We first show that the combined estimator that uses (kernel) ridge regression for both outcome and weighting models is equivalent to a single, undersmoothed (kernel) ridge regression; this also holds when considering asymptotic rates. When the weighting model is instead lasso regression, we give closed-form expressions for special cases and demonstrate a ``double selection'' property. Finally, we generalize these results to linear estimands via the Riesz representer. Our framework ``opens the black box'' on these increasingly popular estimators and provides important insights into estimation choices for augmented balancing weights.
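A stripped-down numerical sketch of an augmented balancing-weights estimator, here for the mean of an outcome observed only on a subsample: a ridge outcome model is combined with ridge-regularized linear balancing weights computed in closed form, and the residual correction is added to the plug-in estimate. The simulated data, basis (raw covariates plus intercept), and regularization constants are illustrative assumptions, not choices made in the paper.

import numpy as np
from sklearn.linear_model import Ridge

# Augmented balancing-weights sketch for the mean of Y when Y is observed
# only on a subsample (all data simulated; tuning constants arbitrary).
rng = np.random.default_rng(1)
n, p = 2000, 5
X = rng.normal(size=(n, p))
observed = rng.random(n) < 1.0 / (1.0 + np.exp(-X[:, 0]))   # selection depends on X
Y = X @ np.array([1.0, 0.5, 0.0, -0.5, 0.2]) + rng.normal(size=n)

# Outcome model: ridge regression fit on the observed units.
m = Ridge(alpha=1.0).fit(X[observed], Y[observed])
mhat = m.predict(X)

# Balancing weights: ridge-regularized linear weights chosen so that the
# weighted covariate means of the observed units track the full-sample means.
B = np.column_stack([np.ones(n), X])         # linear basis with an intercept
Bo = B[observed]
lam = 1.0
G = Bo.T @ Bo / Bo.shape[0] + lam * np.eye(p + 1)
w = Bo @ np.linalg.solve(G, B.mean(axis=0))

# Augmented estimator: outcome-model plug-in plus weighted residual correction.
mu_aug = mhat.mean() + np.mean(w * (Y[observed] - mhat[observed]))
print(mu_aug, Y.mean())                      # compare with the full-sample mean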
Longitudinal studies are subject to nonresponse when individuals fail to provide data for entire waves or particular questions of the survey. We compare approaches to nonresponse bias analysis (NRBA) in longitudinal studies and illustrate them on the Early Childhood Longitudinal Study, Kindergarten Class of 2010-11 (ECLS-K:2011). Wave nonresponse with attrition often yields a monotone missingness pattern, and the missingness mechanism can be missing at random (MAR) or missing not at random (MNAR). We discuss weighting, multiple imputation (MI), incomplete data modeling, and Bayesian approaches to NRBA for monotone patterns. Weighting adjustments are effective when the constructed weights are correlated with the survey outcome of interest. MI allows for variables with missing values to be included in the imputation model, yielding potentially less biased and more efficient estimates. Multilevel models with maximum likelihood estimation and marginal models estimated using generalized estimating equations can also handle incomplete longitudinal data. Bayesian methods introduce prior information and can potentially stabilize model estimation. We add offsets to the MAR-based results as sensitivity analyses to assess deviations toward MNAR. We conduct NRBA for descriptive summaries and analytic model estimates and find that, in the ECLS-K:2011 application, NRBA yields minor changes to the substantive conclusions. The strength of evidence about our NRBA depends on the strength of the relationship between the characteristics in the nonresponse adjustment and the key survey outcomes, so the key to a successful NRBA is to include strong predictors.
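The sketch below illustrates only the weighting-adjustment step, on simulated data rather than ECLS-K:2011: a response-propensity model is fit on a fully observed wave-1 covariate and respondents are reweighted by the inverse estimated propensity; the variable names and coefficients are made up for the example.

import numpy as np
from sklearn.linear_model import LogisticRegression

# Simulated wave nonresponse: wave-2 response depends on a wave-1 covariate
# that also predicts the wave-2 outcome, so the respondent mean is biased.
rng = np.random.default_rng(7)
n = 5000
x1 = rng.normal(size=n)                           # wave-1 score (fully observed)
y2 = 1.0 + 0.8 * x1 + rng.normal(size=n)          # wave-2 outcome
respond = rng.random(n) < 1.0 / (1.0 + np.exp(-(0.5 + 1.0 * x1)))

# Response-propensity model using the fully observed wave-1 information,
# then inverse-propensity (nonresponse-adjusted) weights for respondents.
prop_model = LogisticRegression().fit(x1.reshape(-1, 1), respond)
w = 1.0 / prop_model.predict_proba(x1.reshape(-1, 1))[respond, 1]

print("full-sample mean: ", y2.mean())
print("respondent mean:  ", y2[respond].mean())   # biased upward in this setup
print("weighted mean:    ", np.average(y2[respond], weights=w))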
The Naive Bayesian classifier is a popular classification method employing the Bayesian paradigm. Its assumption of conditional independence among input variables is convenient in theory but can lead to majority-vote-style behaviour; moreover, the assumption is rarely satisfied in practice, and violations introduce biases into the estimates. In Naive Bayes, features are treated as independent in the sense that they are assumed to have no conditional correlation or dependency given the class when predicting a classification. In this paper, we focus on the optimal partition of features by proposing a novel technique called the Comonotone-Independence Classifier (CIBer), which is able to overcome the challenges posed by the Naive Bayes method. On different datasets, we clearly demonstrate the efficacy of our technique, achieving lower error rates and higher or equivalent accuracy compared to models such as Random Forests and XGBoost.
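For reference, the contrast can be written schematically as follows (an illustrative reading of the abstract, not the paper's own notation): Naive Bayes factorizes the class-conditional density over every individual feature, whereas a comonotone-independence classifier keeps independence only across feature clusters and models the dependence within each cluster through comonotonicity,
\[
\hat{c}(x) = \arg\max_{c}\; \pi(c)\prod_{j=1}^{p} f_j(x_j \mid c)
\qquad\text{vs.}\qquad
\hat{c}(x) = \arg\max_{c}\; \pi(c)\prod_{g=1}^{G} f_{A_g}\!\big(x_{A_g} \mid c\big),
\]
where the index sets $A_1,\dots,A_G$ partition the features and each within-cluster density $f_{A_g}$ is modeled under comonotonicity (perfect positive dependence) rather than independence.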
Model distillation has been a popular method for producing interpretable machine learning: an interpretable "student" model is trained to mimic the predictions made by a black box "teacher" model. However, when the student model is sensitive to the variability of the data sets used for training, even with the teacher fixed, the corresponding interpretation is not reliable. Existing strategies stabilize model distillation by checking whether a large enough corpus of pseudo-data is generated to reliably reproduce student models, but such methods have so far been developed only for specific student models. In this paper, we develop a generic approach for stable model distillation based on a central limit theorem for the average loss. We start with a collection of candidate student models and search for candidates that reasonably agree with the teacher. We then construct a multiple testing framework to select a corpus size such that a consistent student model would be selected under different pseudo samples. We demonstrate the application of our proposed approach on three commonly used intelligible models: decision trees, falling rule lists, and symbolic regression. Finally, we conduct simulation experiments on the Mammographic Mass and Breast Cancer datasets and support the testing procedure with a theoretical analysis based on a Markov process. The code is publicly available at //github.com/yunzhe-zhou/GenericDistillation.
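The following sketch shows the basic distillation loop that the stability question refers to, using a random-forest teacher and a depth-limited decision-tree student on the scikit-learn copy of the Breast Cancer data; the independent-Gaussian pseudo-data generator and the use of the root split as a stability summary are simplifications for illustration, not the paper's testing procedure.

import numpy as np
from sklearn.datasets import load_breast_cancer
from sklearn.ensemble import RandomForestClassifier
from sklearn.tree import DecisionTreeClassifier

# Black-box teacher and interpretable student; the student is re-fit on
# freshly generated pseudo-data to expose its sensitivity to the corpus.
X, y = load_breast_cancer(return_X_y=True)
teacher = RandomForestClassifier(n_estimators=200, random_state=0).fit(X, y)

rng = np.random.default_rng(0)
mu, sd = X.mean(axis=0), X.std(axis=0)

def distill(corpus_size):
    # Crude pseudo-data generator: independent Gaussians matched to the
    # marginal moments of the covariates (ignores feature correlations).
    Xp = rng.normal(mu, sd, size=(corpus_size, X.shape[1]))
    yp = teacher.predict(Xp)                      # teacher labels the pseudo-data
    student = DecisionTreeClassifier(max_depth=3).fit(Xp, yp)
    return int(student.tree_.feature[0])          # feature used at the root split

for size in (100, 10000):
    roots = {distill(size) for _ in range(5)}
    print(f"corpus size {size}: root-split features across 5 replicates -> {roots}")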
Poverty is a multifaceted phenomenon linked to the lack of capabilities of households to earn a sustainable livelihood, and it is increasingly being assessed using multidimensional indicators. Its spatial pattern depends on social, economic, political, and regional variables. Artificial intelligence has shown immense scope in analyzing the complexities and nuances of poverty. The proposed project aims to examine the poverty situation of rural India for the period 1990-2022 based on quality-of-life and livelihood indicators. The districts will be classified into `advanced', `catching up', `falling behind', and `lagged' regions. The project proposes to integrate multiple data sources, including conventional national-level large-sample household surveys, census surveys, and proxy variables such as daytime and nighttime satellite imagery and communication networks, to provide a comprehensive view of poverty at the district level. The project also intends to conduct causal and longitudinal analyses to examine the reasons for poverty. Poverty and inequality could be widening in developing countries due to demographic and growth-agglomerating policies. Therefore, targeting the lagging regions and the vulnerable population is essential to eradicate poverty and improve the quality of life to achieve the goal of `zero poverty'. Thus, the study also focuses on the districts with a higher share of the marginal section of the population compared to the national average, to trace the performance of development indicators and their association with poverty in these regions.
We propose a constrained maximum partial likelihood estimator for dimension reduction in integrative (e.g., pan-cancer) survival analysis with high-dimensional covariates. We assume that for each population in the study, the hazard function follows a distinct Cox proportional hazards model. To borrow information across populations, we assume that all of the hazard functions depend only on a small number of linear combinations of the predictors. We estimate these linear combinations using an algorithm based on "distance-to-set" penalties. This allows us to impose both low-rankness and sparsity. We derive asymptotic results that reveal that our regression coefficient estimator is more efficient than the estimator obtained by fitting a separate proportional hazards model for each population. Numerical experiments suggest that our method outperforms related competitors under various data generating models. We use our method to perform a pan-cancer survival analysis relating protein expression to survival across 18 distinct cancer types. Our approach identifies six linear combinations, depending on only 20 proteins, which explain survival across the cancer types. Finally, we validate our fitted model on four external datasets and show that our estimated coefficients can lead to better prediction than popular competitors.
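One way to write the shared dimension-reduction structure (illustrative notation; the paper's exact parameterization may differ) is, for population $g = 1,\dots,G$,
\[
\lambda_g(t \mid x) \;=\; \lambda_{0g}(t)\,\exp\!\big(x^\top \beta_g\big),
\qquad \beta_g \;=\; \Theta\,\gamma_g,
\]
where the loading matrix $\Theta \in \mathbb{R}^{p \times r}$ with $r \ll p$ is shared across populations and row-sparse, so every hazard depends only on the same few linear combinations $\Theta^\top x$ built from a small subset of predictors, while $\gamma_g \in \mathbb{R}^{r}$ is population-specific; the distance-to-set penalties are what enforce this low-rank, sparse structure during constrained maximum partial likelihood estimation.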
Multi-task learning (MTL) is a methodology that aims to improve the overall performance of estimation and prediction by sharing common information among related tasks. In MTL, various assumptions can be made about how tasks are related, along with methods to incorporate those relationships. A natural assumption in practice is that tasks fall into clusters according to their characteristics. Under this assumption, the group fused regularization approach clusters tasks by shrinking differences between task parameters, which enables common information to be transferred within the same cluster. However, this approach also transfers information between different clusters, which worsens estimation and prediction. To overcome this problem, we propose an MTL method with a centroid parameter representing the center of each task cluster. Because this model separates the parameters for regression from the parameters for clustering, it improves the accuracy of estimation and prediction for the regression coefficient vectors. We show the effectiveness of the proposed method through Monte Carlo simulations and applications to real data.
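To make the contrast explicit, one plausible formalization consistent with the abstract (the paper's exact objective and penalties may differ) is
\[
\min_{\{w_t\}} \sum_{t=1}^{T} L_t(w_t) + \lambda \sum_{t<t'} \|w_t - w_{t'}\|_2
\qquad\text{vs.}\qquad
\min_{\{w_t\},\{u_t\}} \sum_{t=1}^{T} L_t(w_t) + \lambda_1 \sum_{t=1}^{T} \|w_t - u_t\|_2^2 + \lambda_2 \sum_{t<t'} \|u_t - u_{t'}\|_2,
\]
where the group fused penalty on the left shrinks the task coefficient vectors $w_t$ directly toward one another, while on the right the fused penalty acts only on the centroid parameters $u_t$ that perform the clustering, and each regression vector $w_t$ is shrunk toward its own task's centroid, limiting unwanted transfer across different clusters.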
The Bayesian paradigm has the potential to solve core issues of deep neural networks such as poor calibration and data inefficiency. Alas, scaling Bayesian inference to large weight spaces often requires restrictive approximations. In this work, we show that it suffices to perform inference over a small subset of model weights in order to obtain accurate predictive posteriors. The other weights are kept as point estimates. This subnetwork inference framework enables us to use expressive, otherwise intractable, posterior approximations over such subsets. In particular, we implement subnetwork linearized Laplace: We first obtain a MAP estimate of all weights and then infer a full-covariance Gaussian posterior over a subnetwork. We propose a subnetwork selection strategy that aims to maximally preserve the model's predictive uncertainty. Empirically, our approach compares favorably to ensembles and to less expressive posterior approximations over full networks.
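A toy sketch of the subnetwork Laplace idea on a logistic-regression "network" (far simpler than the deep models in the paper): fit a MAP estimate for all weights, select a small subset, and place a full-covariance Gaussian over that subset only; the dataset, the variance-based selection heuristic, and all hyperparameters are illustrative assumptions.

import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression

# MAP fit of all weights (L2 penalty plays the role of a Gaussian prior).
X, y = make_classification(n_samples=500, n_features=20, random_state=0)
prior_prec = 1.0
map_model = LogisticRegression(C=1.0 / prior_prec, max_iter=1000).fit(X, y)
w_map = map_model.coef_.ravel()

# Hessian of the negative log posterior at the MAP (GGN for logistic loss).
p = 1.0 / (1.0 + np.exp(-(X @ w_map + map_model.intercept_)))
H = (X * (p * (1.0 - p))[:, None]).T @ X + prior_prec * np.eye(X.shape[1])

# Subnetwork selection: the k weights with the largest marginal variance under
# a diagonal approximation (a simple heuristic; the paper's criterion targets
# predictive uncertainty more directly).
k = 5
sub = np.argsort(np.diag(H))[:k]

# Full-covariance Laplace over the subnetwork; remaining weights stay at MAP.
cov_sub = np.linalg.inv(H[np.ix_(sub, sub)])

# Predictive distribution: sample subnetwork weights, keep the rest at MAP.
rng = np.random.default_rng(0)
samples = rng.multivariate_normal(w_map[sub], cov_sub, size=200)
W = np.tile(w_map, (200, 1))
W[:, sub] = samples
probs = 1.0 / (1.0 + np.exp(-(X[:5] @ W.T + map_model.intercept_)))
print(probs.mean(axis=1))                    # posterior-averaged predictions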