Nosocomial infections have serious consequences for patients and hospital staff: they worsen patient outcomes, and their management strains already overburdened health systems. Accurate judgements of whether an infection is nosocomial help staff make appropriate choices to protect other patients within the hospital. Nosocomiality cannot be properly assessed without considering whether the infected patient came into contact with high-risk potential infectors within the hospital. We developed a Bayesian model that integrates epidemiological, contact, and pathogen genetic data to estimate how likely an infection is to be nosocomial and the probability that each candidate infector is the source of the infection.
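A minimal sketch of the kind of Bayesian update such a model performs is given below. The candidate set, likelihood values, and priors are illustrative placeholders, not the authors' model, which is far richer than a single product of likelihood terms.

```python
import numpy as np

def posterior_source_probs(prior, lik_epi, lik_contact, lik_genetic):
    """Combine per-candidate likelihood terms with a prior via Bayes' rule.

    prior   : prior probability that each candidate (or the community) is the source
    lik_*   : likelihood of the epidemiological, contact, and genetic observations
              under each candidate; all arrays share the same length
    Returns the normalized posterior over candidates.
    """
    unnorm = np.asarray(prior) * np.asarray(lik_epi) * np.asarray(lik_contact) * np.asarray(lik_genetic)
    return unnorm / unnorm.sum()

# Candidates: [community acquisition, ward patient A, ward patient B]
post = posterior_source_probs(
    prior=[0.5, 0.25, 0.25],
    lik_epi=[0.2, 0.6, 0.4],       # e.g. symptom onset relative to admission
    lik_contact=[1.0, 0.8, 0.1],   # e.g. shared ward time
    lik_genetic=[0.1, 0.7, 0.3],   # e.g. SNP distance between isolates
)
print(post)         # posterior probability per candidate
print(1 - post[0])  # probability the infection is nosocomial
```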
The exponential growth in scientific publications poses a severe challenge for human researchers. It forces attention to more narrow sub-fields, which makes it challenging to discover new impactful research ideas and collaborations outside one's own field. While there are ways to predict a scientific paper's future citation counts, they need the research to be finished and the paper written, usually assessing impact long after the idea was conceived. Here we show how to predict the impact of onsets of ideas that have never been published by researchers. For that, we developed a large evolving knowledge graph built from more than 21 million scientific papers. It combines a semantic network created from the content of the papers and an impact network created from the historic citations of papers. Using machine learning, we can predict the dynamic of the evolving network into the future with high accuracy, and thereby the impact of new research directions. We envision that the ability to predict the impact of new ideas will be a crucial component of future artificial muses that can inspire new impactful and interesting scientific ideas.
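For intuition, the sketch below shows a generic way to forecast new edges in an evolving network from historical snapshots. The graph, pair features, and classifier are toy placeholders and are not the authors' semantic/impact-network pipeline.

```python
import itertools
import networkx as nx
import numpy as np
from sklearn.linear_model import LogisticRegression

# Two snapshots of an evolving network: learn which non-edges at t0 appear by t1.
rng = np.random.default_rng(1)
G_t0 = nx.erdos_renyi_graph(200, 0.03, seed=0)
G_t1 = G_t0.copy()
G_t1.add_edges_from(rng.integers(0, 200, size=(60, 2)))  # edges that formed later

def pair_features(G, u, v):
    # Simple structural features of a candidate edge (placeholder choice).
    common = len(list(nx.common_neighbors(G, u, v)))
    return [common, G.degree(u) * G.degree(v)]

pairs = [(u, v) for u, v in itertools.combinations(G_t0.nodes, 2) if not G_t0.has_edge(u, v)]
X = np.array([pair_features(G_t0, u, v) for u, v in pairs])
y = np.array([G_t1.has_edge(u, v) for u, v in pairs], dtype=int)

clf = LogisticRegression().fit(X, y)  # predicts which links are likely to form next
```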
With the growing prevalence of diabetes and the associated public health burden, it is crucial to identify modifiable factors that could improve patients' glycemic control. In this work, we examine associations between medication usage, concurrent comorbidities, and glycemic control using data from continuous glucose monitors (CGMs). CGMs provide interstitial glucose measurements, but clinical studies commonly reduce these data to simple statistical summaries, resulting in substantial information loss. Recent advances in the Fréchet regression framework make it possible to utilize more of this information by treating the full distributional representation of CGM data as the response, while sparsity regularization enables variable selection. However, the methodology does not scale to large datasets. Crucially, variable selection inference using subsampling methods is computationally infeasible. We develop a new algorithm for sparse distributional regression by deriving an explicit characterization of the gradient and Hessian of the underlying objective function and by using rotations on the sphere to perform feasible updates. The updated method is up to 10,000-fold faster than the original approach, opening the door to applying sparse distributional regression to large-scale datasets and enabling previously unattainable subsampling-based inference. Applying our method to CGM data from patients with type 2 diabetes and obstructive sleep apnea, we found a significant association between sulfonylurea medication and glucose variability, without evidence of an association with mean glucose. We also found that overnight oxygen desaturation variability showed a stronger association with glucose regulation than overall oxygen desaturation levels.
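A minimal sketch of the distributional representation that underlies such analyses follows: each subject's CGM trace is summarized by its quantile function, for which the 2-Wasserstein geometry is computationally convenient in one dimension. The grid, simulated traces, and metric computations are illustrative only; this is not the authors' algorithm.

```python
import numpy as np

# For distributions on the real line, the 2-Wasserstein distance equals the L2
# distance between quantile functions, which makes distributional (Frechet)
# regression with a Wasserstein metric tractable.
probs = np.linspace(0.01, 0.99, 99)

def quantile_representation(glucose_values, grid=probs):
    return np.quantile(glucose_values, grid)

rng = np.random.default_rng(0)
cgm_traces = [rng.normal(loc=140 + 10 * i, scale=20 + 5 * i, size=288) for i in range(5)]
Q = np.vstack([quantile_representation(g) for g in cgm_traces])  # subjects x grid

# Frechet (Wasserstein) mean in 1D = pointwise average of quantile functions.
frechet_mean = Q.mean(axis=0)

# Approximate squared 2-Wasserstein distance between two subjects (Riemann sum).
w2 = np.sqrt(np.sum((Q[0] - Q[1]) ** 2) * (probs[1] - probs[0]))
```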
Variable selection has played a critical role in modern statistical learning and scientific discovery. Numerous regularization and Bayesian variable selection methods have been developed over the past two decades, but most consider selecting variables for only one response. As more data are collected, it is increasingly common to analyze multiple related responses from the same study. Existing multivariate variable selection methods select variables for all responses without considering possible heterogeneity across responses, i.e., some features may predict only a subset of responses. Motivated by the multi-trait fine-mapping problem in genetics, which aims to identify the causal variants for multiple related traits, we developed a novel multivariate Bayesian variable selection method to select critical predictors from a large number of grouped predictors that target multiple correlated and possibly heterogeneous responses. Our method features selection at multiple levels, incorporates prior biological knowledge to guide selection, and identifies the best subset of responses that each selected predictor targets. We show the advantages of our method via extensive simulations and a real fine-mapping example identifying causal variants associated with different subsets of addictive behaviors.
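To make the "selection at multiple levels" idea concrete, the simulation below generates data with a group-level, predictor-level, and response-level sparsity structure. The hierarchy and the inclusion probabilities are illustrative assumptions, not the paper's prior specification.

```python
import numpy as np

rng = np.random.default_rng(1)
n_groups, preds_per_group, n_responses = 10, 5, 4

# Level 1: is a group of predictors (e.g. variants in a locus) relevant at all?
group_active = rng.binomial(1, 0.2, size=n_groups)

# Level 2: within an active group, which predictor is causal?
pred_active = np.zeros((n_groups, preds_per_group), dtype=int)
for g in np.flatnonzero(group_active):
    pred_active[g, rng.integers(preds_per_group)] = 1

# Level 3: which subset of responses (traits) does each causal predictor affect?
resp_active = pred_active[..., None] * rng.binomial(1, 0.5, size=(n_groups, preds_per_group, n_responses))

# Effect sizes are nonzero only where all three levels select the entry.
effects = resp_active * rng.normal(0, 1, size=resp_active.shape)
```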
Selecting an evaluation metric is fundamental to model development, but uncertainty remains about when certain metrics are preferable and why. This paper introduces the concept of resolving power to describe the ability of an evaluation metric to distinguish between binary classifiers of similar quality. This ability depends on two attributes: (1) the metric's response to improvements in classifier quality (its signal), and (2) the metric's sampling variability (its noise). The paper defines resolving power generically as a metric's sampling uncertainty scaled by its signal. The primary application of resolving power is to assess threshold-free evaluation metrics, such as the area under the receiver operating characteristic curve (AUROC) and the area under the precision-recall curve (AUPRC). A simulation study compares the AUROC and the AUPRC in a variety of contexts. It finds that the AUROC generally has greater resolving power, but that the AUPRC is better when searching among high-quality classifiers applied to low-prevalence outcomes. The paper concludes by proposing an empirical method to estimate resolving power that can be applied to any dataset and any initial classification model.
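The sketch below shows one hedged way to estimate the two ingredients empirically for AUROC versus AUPRC: the signal as the metric's change under a known quality improvement, and the noise as its bootstrap variability. The simulated score model, separation values, and bootstrap settings are illustrative and are not the paper's simulation design.

```python
import numpy as np
from sklearn.metrics import roc_auc_score, average_precision_score

rng = np.random.default_rng(0)
n, prevalence = 2000, 0.1
y = rng.binomial(1, prevalence, size=n)

def scores(labels, separation):
    # Higher `separation` means a better classifier (toy score model).
    return rng.normal(loc=separation * labels, scale=1.0)

s_base, s_better = scores(y, 1.0), scores(y, 1.2)

def signal_and_noise(metric, n_boot=500):
    signal = metric(y, s_better) - metric(y, s_base)   # response to a fixed quality gain
    boots = []
    for _ in range(n_boot):                            # bootstrap sampling variability
        idx = rng.integers(0, n, size=n)
        boots.append(metric(y[idx], s_base[idx]))
    return signal, np.std(boots)

for name, metric in [("AUROC", roc_auc_score), ("AUPRC", average_precision_score)]:
    signal, noise = signal_and_noise(metric)
    print(f"{name}: signal={signal:.4f}, noise={noise:.4f}, noise/signal={noise / signal:.2f}")
```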
Artificial intelligence (AI) shows great promise in revolutionizing medical imaging, improving diagnosis, and refining treatment methods. However, training AI models relies on extensive multi-center datasets, which raises concerns about data privacy protection. Federated learning offers a solution by enabling a collaborative model across multiple centers without sharing raw data. In this study, we present a Federated Attention Contrastive Learning (FACL) framework designed to address challenges associated with large-scale pathological images and data heterogeneity. FACL improves model generalization by maximizing attention consistency between the local client and the server model. To enhance privacy and validate robustness, we incorporate differential privacy by introducing noise during parameter transfer. We assess the effectiveness of FACL in cancer diagnosis and Gleason grading tasks using 19,461 whole slide images of prostate cancer sourced from multiple centers. In the diagnosis task, FACL achieves an area under the curve (AUC) of 0.9718, exceeding the average AUC of 0.9499 across seven centers when categories are relatively balanced. In the Gleason grading task, FACL attains a Kappa score of 0.8463, surpassing the average Kappa score of 0.7379 from six centers. In conclusion, FACL offers a robust, accurate, and cost-effective AI training framework for prostate cancer pathology while maintaining effective data safeguards.
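The snippet below illustrates the standard Gaussian-mechanism recipe for adding noise to a client update before it is transferred: clip the update's norm, then add calibrated Gaussian noise. The clip norm and noise multiplier are illustrative defaults, not the values or exact mechanism used by FACL.

```python
import numpy as np

def privatize_update(delta, clip_norm=1.0, noise_multiplier=1.0, seed=0):
    """Clip a client's parameter update and add Gaussian noise before sending it
    to the server (a generic differential-privacy sketch, not FACL's exact scheme)."""
    rng = np.random.default_rng(seed)
    delta = np.asarray(delta, dtype=float)
    norm = np.linalg.norm(delta)
    delta = delta * min(1.0, clip_norm / (norm + 1e-12))          # bound each client's influence
    return delta + rng.normal(0.0, noise_multiplier * clip_norm,  # noise scaled to the clip norm
                              size=delta.shape)

noisy_update = privatize_update(np.random.randn(1000) * 0.01)
```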
Several subjective proposals have been made for interpreting the strength of evidence in likelihood ratios and Bayes factors. I identify a more objective scaling by modelling the effect of evidence on belief. The resulting scale with base 3.73 aligns with previous proposals and may partly explain intuitions.
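As a small worked illustration of reading evidence on such a scale: with base 3.73, one unit of evidence corresponds to multiplying the prior odds by 3.73. The helper function name below is mine; the paper derives the base, not this code.

```python
import math

def evidence_units(bayes_factor, base=3.73):
    """Express a likelihood ratio / Bayes factor in units of the base-3.73 scale."""
    return math.log(bayes_factor, base)

print(evidence_units(10))    # a Bayes factor of 10  is about 1.75 units of evidence
print(evidence_units(100))   # a Bayes factor of 100 is about 3.5 units
```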
Although metaheuristics have been widely recognized as efficient techniques for solving real-world optimization problems, implementing them from scratch remains difficult for domain experts without programming skills. In this scenario, metaheuristic optimization frameworks are a practical alternative, as they provide a variety of algorithms composed of customizable elements, as well as experimental support. Recently, many engineering problems have required the optimization of multiple or even many objectives, increasing interest in metaheuristic algorithms and frameworks that can integrate new, specific requirements while maintaining the generality and reusability principles they were conceived for. Based on this idea, this paper introduces JCLEC-MO, a Java framework for both multi- and many-objective optimization that enables engineers to apply, or adapt, a large number of multi-objective algorithms with little coding effort. A case study is developed and explained to show how JCLEC-MO can be used to address many-objective engineering problems, which often require the inclusion of domain-specific elements, and to analyze experimental outcomes by means of conveniently connected R utilities.
We hypothesize that due to the greedy nature of learning in multi-modal deep neural networks, these models tend to rely on just one modality while under-fitting the other modalities. Such behavior is counter-intuitive and hurts the models' generalization, as we observe empirically. To estimate the model's dependence on each modality, we compute the gain in accuracy when the model has access to it in addition to another modality. We refer to this gain as the conditional utilization rate. In the experiments, we consistently observe an imbalance in conditional utilization rates between modalities, across multiple tasks and architectures. Since the conditional utilization rate cannot be computed efficiently during training, we introduce a proxy for it based on the pace at which the model learns from each modality, which we refer to as the conditional learning speed. We propose an algorithm to balance the conditional learning speeds between modalities during training and demonstrate that it indeed addresses the issue of greedy learning. The proposed algorithm improves the model's generalization on three datasets: Colored MNIST, Princeton ModelNet40, and NVIDIA Dynamic Hand Gesture.
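A minimal reading of the conditional utilization rate as described (accuracy gain from adding one modality on top of another) is sketched below; the accuracy numbers and modality names are hypothetical.

```python
def conditional_utilization_rate(acc_both, acc_single):
    """Gain in accuracy from adding a modality on top of another one:
    u(m1 | m2) = acc({m1, m2}) - acc({m2})."""
    return acc_both - acc_single

# Hypothetical accuracies for a two-modality (RGB + depth) classifier:
acc = {"rgb": 0.81, "depth": 0.62, "both": 0.83}
u_depth_given_rgb = conditional_utilization_rate(acc["both"], acc["rgb"])    # ~0.02
u_rgb_given_depth = conditional_utilization_rate(acc["both"], acc["depth"])  # ~0.21
# A large imbalance between the two rates signals greedy reliance on one modality.
```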
When and why can a neural network be successfully trained? This article provides an overview of optimization algorithms and theory for training neural networks. First, we discuss the issue of gradient explosion/vanishing and the more general issue of undesirable spectrum, and then discuss practical solutions, including careful initialization and normalization methods. Second, we review generic optimization methods used in training neural networks, such as SGD, adaptive gradient methods, and distributed methods, together with theoretical results for these algorithms. Third, we review existing research on the global issues of neural network training, including results on bad local minima, mode connectivity, the lottery ticket hypothesis, and infinite-width analysis.
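Two of the practical remedies covered in such surveys can be shown in a minimal PyTorch sketch: variance-preserving (He/Kaiming) initialization and gradient-norm clipping against exploding gradients. The architecture and hyperparameters below are arbitrary.

```python
import torch
import torch.nn as nn

model = nn.Sequential(nn.Linear(128, 256), nn.ReLU(), nn.Linear(256, 10))

# Careful initialization: scale weights so activation/gradient variance stays stable.
for m in model.modules():
    if isinstance(m, nn.Linear):
        nn.init.kaiming_normal_(m.weight, nonlinearity="relu")
        nn.init.zeros_(m.bias)

opt = torch.optim.SGD(model.parameters(), lr=0.1, momentum=0.9)
x, y = torch.randn(32, 128), torch.randint(0, 10, (32,))
loss = nn.functional.cross_entropy(model(x), y)
loss.backward()
torch.nn.utils.clip_grad_norm_(model.parameters(), max_norm=1.0)  # guard against exploding gradients
opt.step()
```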
Graph representation learning for hypergraphs can be used to extract patterns among higher-order interactions that are critically important in many real-world problems. Current approaches designed for hypergraphs, however, cannot handle different types of hypergraphs and are typically not generic across learning tasks. Indeed, models that can predict variable-sized heterogeneous hyperedges have not been available. Here we develop a new self-attention based graph neural network called Hyper-SAGNN applicable to homogeneous and heterogeneous hypergraphs with variable hyperedge sizes. We perform extensive evaluations on multiple datasets, including four benchmark network datasets and two single-cell Hi-C datasets in genomics. We demonstrate that Hyper-SAGNN significantly outperforms the state-of-the-art methods on traditional tasks while also achieving strong performance on a new task called outsider identification. Hyper-SAGNN will be useful for graph representation learning to uncover complex higher-order interactions in different applications.
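The sketch below conveys the core idea of scoring a variable-sized hyperedge with self-attention over its member nodes: attention produces "dynamic" node representations that are pooled into a single compatibility score. It is a generic illustration with random projection matrices, not the exact Hyper-SAGNN architecture.

```python
import numpy as np

rng = np.random.default_rng(0)
d = 16
Wq, Wk, Wv = (rng.normal(size=(d, d)) / np.sqrt(d) for _ in range(3))

def hyperedge_score(node_embeddings):
    """Score a candidate hyperedge of any size via self-attention over its nodes
    (generic sketch; Hyper-SAGNN's actual static/dynamic comparison differs)."""
    X = np.asarray(node_embeddings)                  # (k, d), k = hyperedge size
    Q, K, V = X @ Wq, X @ Wk, X @ Wv
    A = Q @ K.T / np.sqrt(d)
    A = np.exp(A - A.max(axis=1, keepdims=True))
    A /= A.sum(axis=1, keepdims=True)                # softmax attention over member nodes
    dynamic = A @ V                                  # attention-derived ("dynamic") embeddings
    return float(np.mean(np.sum(dynamic * X, axis=1)))  # pooled compatibility score

print(hyperedge_score(rng.normal(size=(3, d))))      # triadic hyperedge
print(hyperedge_score(rng.normal(size=(5, d))))      # same model handles any hyperedge size
```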