Early identification of Mild Cognitive Impairment (MCI) subjects who will eventually progress to Alzheimer Disease (AD) is challenging. Existing deep learning models are mostly single-modality, single-task models that predict the risk of disease progression at a fixed timepoint. We propose a multimodal hierarchical multi-task learning approach that can monitor the risk of disease progression at each timepoint of the visit trajectory. Longitudinal visit data from multiple modalities (MRI, cognition, and clinical data) were collected from MCI individuals in the Alzheimer's Disease Neuroimaging Initiative (ADNI) dataset. Our hierarchical model predicted, at every timepoint, a set of neuropsychological composite cognitive function scores as auxiliary tasks and used the forecast scores at every timepoint to predict the future risk of disease. Relevance weights for each composite function provided explanations about potential factors for disease progression. Our proposed model outperformed state-of-the-art baselines in predicting AD progression risk and the composite scores. An ablation study on the number of modalities demonstrated that imaging and cognition data contributed most to the outcome. Model explanations at each timepoint can inform clinicians, 6 months in advance, of the potential cognitive function decline that can lead to future progression to AD. Our model monitored individuals' risk of AD progression every 6 months throughout their visit trajectory. The hierarchical learning of auxiliary tasks allowed better optimization and provided longitudinal explanations for the outcome. Our framework is flexible in the number of input modalities and the selection of auxiliary tasks, and can therefore be generalized to other clinical problems.
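As a rough illustration of the hierarchy described above, in which auxiliary score forecasts at each visit feed the progression-risk prediction, the following PyTorch-style sketch uses assumed module names and dimensions (HierarchicalMTL, score_head, and risk_head are our placeholders, not the authors' implementation):

    import torch
    import torch.nn as nn

    class HierarchicalMTL(nn.Module):
        # Assumed architecture: a GRU encodes the multimodal visit sequence,
        # an auxiliary head forecasts composite cognitive scores at each visit,
        # and a risk head maps those forecasts to a progression risk.
        def __init__(self, input_dim, hidden_dim, n_scores):
            super().__init__()
            self.encoder = nn.GRU(input_dim, hidden_dim, batch_first=True)
            self.score_head = nn.Linear(hidden_dim, n_scores)  # auxiliary tasks
            self.risk_head = nn.Linear(n_scores, 1)            # main task

        def forward(self, visits):                   # visits: (batch, time, input_dim)
            h, _ = self.encoder(visits)              # per-timepoint hidden states
            scores = self.score_head(h)              # forecast composite scores
            risk = torch.sigmoid(self.risk_head(scores))  # risk at each timepoint
            return scores, risk

In such a sketch, the per-score weights of the risk head could be read as an analogue of the relevance weights mentioned above.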
Recently, a new Continual Learning (CL) paradigm for controlling catastrophic forgetting, called Interval Continual Learning (InterContiNet), was presented; it relies on enforcing interval constraints on the neural network parameter space. Unfortunately, InterContiNet training is challenging due to the high dimensionality of the weight space, making intervals difficult to manage. To address this issue, we introduce HyperInterval, a technique that employs interval arithmetic within the embedding space and utilizes a hypernetwork to map these intervals to the target network parameter space. We train interval embeddings for consecutive tasks and train a hypernetwork to transform these embeddings into the weights of the target network. The embedding for a given task is trained jointly with the hypernetwork, preserving the responses of the target network on the previous task embeddings. Interval arithmetic thus operates in a more manageable, lower-dimensional embedding space rather than directly constructing intervals in a high-dimensional weight space. As a result, our model allows faster and more efficient training. Furthermore, HyperInterval maintains the guarantee of not forgetting. At the end of training, we can choose one universal embedding to produce a single network dedicated to all tasks. In such a framework, the hypernetwork is used only for training and can be seen as a meta-trainer. HyperInterval obtains significantly better results than InterContiNet and achieves SOTA results on several benchmarks.
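A minimal sketch of the core mechanism, propagating a task's interval embedding through a single linear hypernetwork layer via standard interval (center/radius) arithmetic, is given below; the class and variable names are our assumptions, not the paper's code:

    import torch
    import torch.nn as nn

    class IntervalHypernet(nn.Module):
        # Maps an interval [emb_lo, emb_hi] in embedding space to an interval
        # over target-network weights using interval arithmetic.
        def __init__(self, emb_dim, target_weight_dim):
            super().__init__()
            self.fc = nn.Linear(emb_dim, target_weight_dim)

        def forward(self, emb_lo, emb_hi):
            center = (emb_lo + emb_hi) / 2
            radius = (emb_hi - emb_lo) / 2
            w_center = self.fc(center)                    # affine map of the center
            w_radius = radius @ self.fc.weight.abs().T    # |W| propagates the radius
            return w_center - w_radius, w_center + w_radius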
Observational epidemiological studies commonly seek to estimate the causal effect of an exposure on an outcome. Adjustment for potential confounding bias in modern studies is challenging due to the presence of high-dimensional confounding, which arises when there are many confounders relative to the sample size or when continuous confounders have complex relationships with the exposure and outcome. As a promising avenue to overcome this challenge, doubly robust methods (Augmented Inverse Probability Weighting (AIPW) and Targeted Maximum Likelihood Estimation (TMLE)) enable the use of data-adaptive approaches to fit the two models they involve. However, biased standard errors may result when the data-adaptive approaches used are very complex; coupling doubly robust methods with cross-fitting has been proposed to tackle this. Despite these advances, limited evaluation, comparison, and guidance are available on the implementation of AIPW and TMLE with data-adaptive approaches and cross-fitting in realistic settings where high-dimensional confounding is present. We conducted an extensive simulation study to compare the relative performance of AIPW and TMLE using data-adaptive approaches in estimating the average causal effect (ACE), and evaluated the benefits of cross-fitting with a varying number of folds, as well as the impact of using a reduced versus full (larger, more diverse) library in the Super Learner (SL) ensemble learning approach used for the data-adaptive models. A range of scenarios in terms of data generation and sample size was considered. We found that AIPW and TMLE performed similarly in most cases for estimating the ACE, but TMLE was more stable. Cross-fitting improved the performance of both methods, with the number of folds a less important consideration. Using a full SL library was important to reduce bias and variance in the complex scenarios typical of modern health research studies.
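As a hedged sketch of the cross-fitted AIPW estimator discussed here, the snippet below uses generic scikit-learn nuisance learners rather than the Super Learner library evaluated in the study; the function and variable names are ours:

    import numpy as np
    from sklearn.model_selection import KFold
    from sklearn.linear_model import LogisticRegression
    from sklearn.ensemble import GradientBoostingRegressor

    def crossfit_aipw(X, A, Y, n_folds=5):
        # Cross-fitted AIPW: nuisance models are trained on K-1 folds and
        # evaluated on the held-out fold to avoid own-observation overfitting.
        psi = np.zeros(len(Y))
        for train, test in KFold(n_folds, shuffle=True, random_state=0).split(X):
            g = LogisticRegression(max_iter=1000).fit(X[train], A[train])
            m1 = GradientBoostingRegressor().fit(X[train][A[train] == 1], Y[train][A[train] == 1])
            m0 = GradientBoostingRegressor().fit(X[train][A[train] == 0], Y[train][A[train] == 0])
            ps = np.clip(g.predict_proba(X[test])[:, 1], 0.01, 0.99)  # truncated propensity
            mu1, mu0 = m1.predict(X[test]), m0.predict(X[test])
            psi[test] = (mu1 - mu0
                         + A[test] * (Y[test] - mu1) / ps
                         - (1 - A[test]) * (Y[test] - mu0) / (1 - ps))
        return psi.mean()   # estimated ACE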
A high-order, degree-adaptive hybridizable discontinuous Galerkin (HDG) method is presented for two-fluid incompressible Stokes flows, with boundaries and interfaces described using NURBS. The NURBS curves are embedded in a fixed Cartesian grid, yielding an unfitted HDG scheme capable of treating the exact geometry of the boundaries/interfaces, circumventing the need for fitted, high-order, curved meshes. The framework of the NURBS-enhanced finite element method (NEFEM) is employed for accurate quadrature along immersed NURBS curves and in elements cut by them. Nitsche's formulation is used to enforce Dirichlet conditions on embedded surfaces, yielding unknowns only on the mesh skeleton, as in standard HDG, without introducing any additional degrees of freedom on non-matching boundaries/interfaces. The resulting unfitted HDG-NEFEM method combines non-conforming meshes, exact NURBS geometry and high-order approximations to provide high-fidelity results on coarse meshes, independent of the geometric features of the domain. Numerical examples illustrate the optimal accuracy and robustness of the method, even in the presence of badly cut cells or faces, and its suitability to simulate microfluidic systems from CAD geometries.
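For reference, a generic statement of the two-fluid incompressible Stokes interface problem targeted by such a scheme, written in our own notation, is
\[
-\nabla\cdot\bigl(2\mu_i\,\nabla^{\mathrm{s}}\boldsymbol{u} - p\,\mathbf{I}\bigr) = \boldsymbol{f}, \qquad \nabla\cdot\boldsymbol{u} = 0 \qquad \text{in } \Omega_i,\ i=1,2,
\]
with velocity continuity $[\![\boldsymbol{u}]\!] = \boldsymbol{0}$ and traction balance $[\![(2\mu\,\nabla^{\mathrm{s}}\boldsymbol{u} - p\,\mathbf{I})\,\boldsymbol{n}]\!] = \gamma\kappa\boldsymbol{n}$ across the immersed interface, where $\nabla^{\mathrm{s}}$ denotes the symmetric gradient, $\gamma$ the surface tension coefficient and $\kappa$ the interface curvature; Dirichlet data on embedded boundaries are imposed weakly in the Nitsche sense.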
Glucose meal response information collected via Continuous Glucose Monitoring (CGM) is relevant to the assessment of individual metabolic status and the support of personalized diet prescriptions. However, the complexity of the data produced by CGM monitors pushes the limits of existing analytic methods. CGM data often exhibit substantial within-person variability and have a natural multilevel structure. This research is motivated by the analysis of CGM data from individuals without diabetes in the AEGIS study. The dataset includes detailed information on meal timing and nutrition for each individual over different days. The primary focus of this study is to examine CGM glucose responses following patients' meals and to explore the time-dependent associations with dietary and patient characteristics. Motivated by this problem, we propose a new analytical framework based on multilevel functional models, including a new functional mixed R-square coefficient. The use of these models illustrates three key points: (i) the importance of analyzing glucose responses across the entire functional domain when making diet recommendations; (ii) the differential metabolic responses between normoglycemic and prediabetic patients, particularly with regard to lipid intake; (iii) the importance of including random, person-level effects when modelling this scientific problem.
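A generic multilevel functional regression of the kind invoked here, written in our notation (the exact specification used in the study may differ), models the post-meal glucose response $Y_{ij}(t)$ of subject $i$ at meal $j$ as
\[
Y_{ij}(t) = \beta_0(t) + \sum_{k} X_{ijk}\,\beta_k(t) + b_i(t) + \varepsilon_{ij}(t),
\]
where the $\beta_k(t)$ are time-varying effects of dietary and patient covariates, $b_i(t)$ is a random person-level functional effect, and a functional $R^2(t)$ coefficient compares explained to total variability pointwise over the post-meal window.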
The computing resource needs of LHC experiments are expected to continue growing significantly during Run 3 and into the HL-LHC era. The landscape of available resources will also evolve, as High Performance Computing (HPC) and Cloud resources will provide a comparable, or even dominant, fraction of the total compute capacity. The coming years therefore present a challenge for the experiments' resource provisioning models, both in terms of scalability and increasing complexity. The CMS Submission Infrastructure (SI) provisions computing resources for CMS workflows. This infrastructure is built on a set of federated HTCondor pools, currently aggregating 400k CPU cores distributed worldwide and supporting the simultaneous execution of over 200k computing tasks. Incorporating HPC resources into CMS computing represents, firstly, an integration challenge, as HPC centers are much more diverse than Grid sites. Secondly, evolving the present SI, dimensioned to harness the current CMS computing capacity, to reach the resource scales required for the HL-LHC phase, while maintaining global flexibility and efficiency, will represent an additional challenge. To preemptively address potential future scalability limits, the SI team regularly runs tests to explore the maximum reach of our infrastructure. In this note, the integration of HPC resources into CMS offline computing is summarized, the potential concerns for the SI derived from the increased scale of operations are described, and the most recent results of scalability tests on the CMS SI are reported.
In the realm of Federated Learning (FL) applied to remote sensing image classification, this study introduces and assesses several innovative communication strategies. Our exploration includes feature-centric communication, pseudo-weight amalgamation, and a combined method utilizing both weights and features. Experiments conducted on two public scene classification datasets unveil the effectiveness of these strategies, showcasing accelerated convergence, heightened privacy, and reduced network information exchange. This research provides valuable insights into the implications of feature-centric communication in FL, offering potential applications tailored for remote sensing scenarios.
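For context, the standard FedAvg-style weight aggregation against which such communication strategies are typically compared can be sketched as follows; this is a generic baseline, not one of the strategies proposed here:

    import numpy as np

    def fedavg(client_weights, client_sizes):
        # Standard FedAvg aggregation: average flattened client weight vectors,
        # weighted by the number of local training samples at each client.
        sizes = np.asarray(client_sizes, dtype=float)
        stacked = np.stack(client_weights)                 # (n_clients, n_params)
        return (stacked * (sizes / sizes.sum())[:, None]).sum(axis=0)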
We study the asymptotic properties of the multivariate spike-and-slab LASSO (mSSL) procedure of Deshpande et al.\ (2019) for simultaneous variable and covariance selection in the sparse multivariate linear regression problem. In that problem, $q$ correlated responses are regressed onto $p$ covariates, and the mSSL works by placing separate spike-and-slab priors on the entries of the matrix of marginal covariate effects and on the off-diagonal elements in the upper triangle of the residual precision matrix. Under mild assumptions about these matrices, we establish the posterior contraction rate for the mSSL posterior in the asymptotic regime where both $p$ and $q$ diverge with $n$. By ``de-biasing'' the corresponding MAP estimates, we obtain confidence intervals for each covariate effect and residual partial correlation. In extensive simulation studies, these intervals displayed close-to-nominal frequentist coverage in finite-sample settings but tended to be substantially longer than those obtained using a version of the Bayesian bootstrap that randomly re-weights the prior. We further show that the de-biased intervals for individual covariate effects are asymptotically valid.
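For orientation, the spike-and-slab LASSO prior placed on each entry, written generically in our notation, mixes a diffuse Laplace ``slab'' with a sharply peaked Laplace ``spike'':
\[
\pi(\beta_{jk}) = \eta\,\frac{\lambda_1}{2}e^{-\lambda_1|\beta_{jk}|} + (1-\eta)\,\frac{\lambda_0}{2}e^{-\lambda_0|\beta_{jk}|}, \qquad \lambda_0 \gg \lambda_1,
\]
with an analogous prior on the off-diagonal entries of the residual precision matrix.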
We investigate analytically the behaviour of the penalized maximum partial likelihood estimator (PMPLE). Our results are derived for a generic separable regularization, but we focus on the elastic net. This penalization is routinely adopted for survival analysis in the high-dimensional regime, where the maximum partial likelihood estimator (with no regularization) might not even exist. Previous theoretical results require that the number $s$ of non-zero association coefficients is $O(n^{\alpha})$, with $\alpha \in (0,1)$ and $n$ the sample size. Here we accurately characterize the behaviour of the PMPLE when $s$ is proportional to $n$ via the solution of a system of six non-linear equations that can easily be obtained by fixed-point iteration. These equations are derived by means of the replica method under the assumption that the covariates $\mathbf{X}\in \mathbb{R}^p$ follow a multivariate Gaussian law with covariance $\mathbf{I}_p/p$. The solution of these equations allows us to investigate various metrics of interest and their dependence on the ratio $\zeta = p/n$, the fraction of true active components $\nu = s/p$, and the regularization strength. We validate our results by extensive numerical simulations.
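Concretely, for the Cox model the PMPLE with an elastic net penalty maximizes (up to the exact parametrization of the penalty, which may differ from the one analysed here)
\[
\hat{\boldsymbol{\beta}} = \arg\max_{\boldsymbol{\beta}} \;\sum_{i:\,\delta_i=1}\Bigl[\mathbf{x}_i^{\top}\boldsymbol{\beta} - \log\!\sum_{j\in R(t_i)} e^{\mathbf{x}_j^{\top}\boldsymbol{\beta}}\Bigr] - \lambda\Bigl(a\,\|\boldsymbol{\beta}\|_1 + \tfrac{1-a}{2}\,\|\boldsymbol{\beta}\|_2^2\Bigr),
\]
where $\delta_i$ is the event indicator, $R(t_i)$ the risk set at the $i$-th event time, and $a\in[0,1]$ the elastic net mixing parameter.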
The goal of explainable Artificial Intelligence (XAI) is to generate human-interpretable explanations, but there are no computationally precise theories of how humans interpret AI-generated explanations. This lack of theory means that validation of XAI must be done empirically, on a case-by-case basis, which prevents systematic theory-building in XAI. We propose a psychological theory of how humans draw conclusions from saliency maps, the most common form of XAI explanation, which for the first time allows for precise prediction of explainee inference conditioned on the explanation. Our theory posits that, absent an explanation, humans expect the AI to make decisions similar to their own, and that they interpret an explanation by comparing it to the explanations they themselves would give. Comparison is formalized via Shepard's universal law of generalization in a similarity space, a classic theory from cognitive science. A pre-registered user study on AI image classifications with saliency map explanations demonstrates that our theory quantitatively matches participants' predictions of the AI.
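Shepard's universal law of generalization, used here to formalize the comparison step, states that perceived similarity decays exponentially with distance in the similarity (psychological) space,
\[
s(x, y) = e^{-c\,d(x, y)},
\]
where $d(x,y)$ is the distance between stimuli $x$ and $y$ in the similarity space and $c>0$ is a sensitivity parameter.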
The evidential regression network (ENet) estimates a continuous target and its predictive uncertainty without costly Bayesian model averaging. However, the target may be inaccurately predicted due to the gradient shrinkage problem of the ENet's original loss function, the negative log marginal likelihood (NLL) loss. In this paper, the objective is to improve the prediction accuracy of the ENet while maintaining its efficient uncertainty estimation by resolving the gradient shrinkage problem. A multi-task learning (MTL) framework, referred to as MT-ENet, is proposed to accomplish this aim. In the MTL framework, we define a Lipschitz-modified mean squared error (MSE) loss as an additional loss and add it to the existing NLL loss. The Lipschitz-modified MSE loss is designed to mitigate gradient conflict with the NLL loss by dynamically adjusting its Lipschitz constant, so that it does not disturb the uncertainty estimation of the NLL loss. The MT-ENet enhances the predictive accuracy of the ENet without losing uncertainty estimation capability on synthetic datasets and real-world benchmarks, including drug-target affinity (DTA) regression. Furthermore, the MT-ENet shows remarkable calibration and out-of-distribution detection capability on the DTA benchmarks.
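To make the objective concrete, the sketch below combines the standard Normal-Inverse-Gamma NLL of evidential regression with a plain squared-error auxiliary term as a stand-in for the Lipschitz-modified MSE, whose exact form is not reproduced here; the names and weighting are our assumptions:

    import torch

    def nig_nll(y, gamma, nu, alpha, beta):
        # Negative log marginal likelihood of the Normal-Inverse-Gamma evidential
        # output (gamma, nu, alpha, beta), as in deep evidential regression.
        omega = 2.0 * beta * (1.0 + nu)
        return (0.5 * torch.log(torch.pi / nu)
                - alpha * torch.log(omega)
                + (alpha + 0.5) * torch.log((y - gamma) ** 2 * nu + omega)
                + torch.lgamma(alpha) - torch.lgamma(alpha + 0.5))

    def mt_loss(y, gamma, nu, alpha, beta, weight=1.0):
        # Multi-task objective: NLL plus an auxiliary squared error on the predicted
        # mean (a plain MSE placeholder for the paper's Lipschitz-modified MSE).
        return (nig_nll(y, gamma, nu, alpha, beta) + weight * (y - gamma) ** 2).mean()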