Objective: Function is increasingly recognized as an important indicator of whole-person health. This study evaluates the ability of publicly available large language models (LLMs) to accurately identify the presence of functioning information from clinical notes. We explore various strategies to improve the performance on this task. Materials and Methods: We collect a balanced binary classification dataset of 1000 sentences from the Mobility NER dataset, which was curated from n2c2 clinical notes. For evaluation, we construct zero-shot and few-shot prompts to query the LLMs whether a given sentence contains mobility functioning information. Two sampling techniques, random sampling and k-nearest neighbor (kNN)-based sampling, are used to select the few-shot examples. Furthermore, we apply a parameter-efficient prompt-based fine-tuning method to the LLMs and evaluate their performance under various training settings. Results: Flan-T5-xxl outperforms all other models in both zero-shot and few-shot settings, achieving a F1 score of 0.865 with a single demonstrative example selected by kNN sampling. In prompt-based fine-tuning experiments, this foundation model also demonstrates superior performance across all low-resource settings, particularly achieving an impressive F1 score of 0.922 using the full training dataset. The smaller model, Flan-T5-xl, requires fine-tuning with only 2.3M additional parameters to achieve comparable performance to the fully fine-tuned Gatortron-base model, both surpassing 0.9 F1 score. Conclusion: Open-source instruction-tuned LLMs demonstrate impressive in-context learning capability in the mobility functioning classification task. The performance of these models can be further improved by continuing fine-tuning on a task-specific dataset.
We propose a diffusion approximation method to the continuous-state Markov Decision Processes (MDPs) that can be utilized to address autonomous navigation and control in unstructured off-road environments. In contrast to most decision-theoretic planning frameworks that assume fully known state transition models, we design a method that eliminates such a strong assumption that is often extremely difficult to engineer in reality. We first take the second-order Taylor expansion of the value function. The Bellman optimality equation is then approximated by a partial differential equation, which only relies on the first and second moments of the transition model. By combining the kernel representation of the value function, we design an efficient policy iteration algorithm whose policy evaluation step can be represented as a linear system of equations characterized by a finite set of supporting states. We first validate the proposed method through extensive simulations in 2D obstacle avoidance and 2.5D terrain navigation problems. The results show that the proposed approach leads to a much superior performance over several baselines. We then develop a system that integrates our decision-making framework with onboard perception and conduct real-world experiments in both cluttered indoor and unstructured outdoor environments. The results from the physical systems further demonstrate the applicability of our method in challenging real-world environments.
Introduction: Even in effectively conducted randomised trials, the probability of a successful study remains relatively low. With recent advances in the next-generation sequencing technologies, there is a rapidly growing number of high-dimensional data, including genetic, molecular and phenotypic information, that have improved our understanding of driver genes, drug targets, and drug mechanisms of action. The leveraging of high-dimensional data holds promise for increased success of clinical trials. Methods: We provide an overview of methods for utilising high-dimensional data in clinical trials. We also investigate the use of these methods in practice through a review of recently published randomised clinical trials that utilise high-dimensional genetic data. The review includes articles that were published between 2019 and 2021, identified through the PubMed database. Results: Out of 174 screened articles, 100 (57.5%) were randomised clinical trials that collected high-dimensional data. The most common clinical area was oncology (30%), followed by chronic diseases (28%), nutrition and ageing (18%) and cardiovascular diseases (7%). The most common types of data analysed were gene expression data (70%), followed by DNA data (21%). The most common method of analysis (36.3%) was univariable analysis. Articles that described multivariable analyses used standard statistical methods. Most of the clinical trials had two arms. Discussion: New methodological approaches are required for more efficient analysis of the increasing amount of high-dimensional data collected in randomised clinical trials. We highlight the limitations and barriers to the current use of high-dimensional data in trials, and suggest potential avenues for improvement and future work.
Accurate diagnostic tests are crucial to ensure effective treatment, screening, and surveillance of diseases. However, the limited accuracy of individual biomarkers often hinders comprehensive screening. The heterogeneity of many diseases, particularly cancer, calls for the use of several biomarkers together into a composite diagnostic test. In this paper, we present a novel multivariate model that optimally combines multiple biomarkers using the likelihood ratio function. The model's parameters directly translate into computationally simple diagnostic accuracy measures. Additionally, our method allows for reliable predictions even in scenarios where specific biomarker measurements are unavailable and can guide the selection of biomarker combinations under resource constraints. We conduct simulation studies to compare the performance to popular classification and discriminant analysis methods. We utilize the approach to construct an optimal diagnostic test for hepatocellular carcinoma, a cancer type known for the absence of a single ideal marker. An accompanying R implementation is made available for reproducing all results.
Medical imaging papers often focus on methodology, but the quality of the algorithms and the validity of the conclusions are highly dependent on the datasets used. As creating datasets requires a lot of effort, researchers often use publicly available datasets, there is however no adopted standard for citing the datasets used in scientific papers, leading to difficulty in tracking dataset usage. In this work, we present two open-source tools we created that could help with the detection of dataset usage, a pipeline \url{//github.com/TheoSourget/Public_Medical_Datasets_References} using OpenAlex and full-text analysis, and a PDF annotation software \url{//github.com/TheoSourget/pdf_annotator} used in our study to manually label the presence of datasets. We applied both tools on a study of the usage of 20 publicly available medical datasets in papers from MICCAI and MIDL. We compute the proportion and the evolution between 2013 and 2023 of 3 types of presence in a paper: cited, mentioned in the full text, cited and mentioned. Our findings demonstrate the concentration of the usage of a limited set of datasets. We also highlight different citing practices, making the automation of tracking difficult.
In order to minimize the generalization error in neural networks, a novel technique to identify overfitting phenomena when training the learner is formally introduced. This enables support of a reliable and trustworthy early stopping condition, thus improving the predictive power of that type of modeling. Our proposal exploits the correlation over time in a collection of online indicators, namely characteristic functions for indicating if a set of hypotheses are met, associated with a range of independent stopping conditions built from a canary judgment to evaluate the presence of overfitting. That way, we provide a formal basis for decision making in terms of interrupting the learning process. As opposed to previous approaches focused on a single criterion, we take advantage of subsidiarities between independent assessments, thus seeking both a wider operating range and greater diagnostic reliability. With a view to illustrating the effectiveness of the halting condition described, we choose to work in the sphere of natural language processing, an operational continuum increasingly based on machine learning. As a case study, we focus on parser generation, one of the most demanding and complex tasks in the domain. The selection of cross-validation as a canary function enables an actual comparison with the most representative early stopping conditions based on overfitting identification, pointing to a promising start toward an optimal bias and variance control.
This study focuses on how different modalities of human communication can be used to distinguish between healthy controls and subjects with schizophrenia who exhibit strong positive symptoms. We developed a multi-modal schizophrenia classification system using audio, video, and text. Facial action units and vocal tract variables were extracted as low-level features from video and audio respectively, which were then used to compute high-level coordination features that served as the inputs to the audio and video modalities. Context-independent text embeddings extracted from transcriptions of speech were used as the input for the text modality. The multi-modal system is developed by fusing a segment-to-session-level classifier for video and audio modalities with a text model based on a Hierarchical Attention Network (HAN) with cross-modal attention. The proposed multi-modal system outperforms the previous state-of-the-art multi-modal system by 8.53% in the weighted average F1 score.
The matched case-control design, up until recently mostly pertinent to epidemiological studies, is becoming customary in biomedical applications as well. For instance, in omics studies, it is quite common to compare cancer and healthy tissue from the same patient. Furthermore, researchers today routinely collect data from various and variable sources that they wish to relate to the case-control status. This highlights the need to develop and implement statistical methods that can take these tendencies into account. We present an R package penalizedclr, that provides an implementation of the penalized conditional logistic regression model for analyzing matched case-control studies. It allows for different penalties for different blocks of covariates, and it is therefore particularly useful in the presence of multi-source omics data. Both L1 and L2 penalties are implemented. Additionally, the package implements stability selection for variable selection in the considered regression model. The proposed method fills a gap in the available software for fitting high-dimensional conditional logistic regression model accounting for the matched design and block structure of predictors/features. The output consists of a set of selected variables that are significantly associated with case-control status. These features can then be investigated in terms of functional interpretation or validation in further, more targeted studies.
Most of the existing Mendelian randomization (MR) methods are limited by the assumption of linear causality between exposure and outcome, and the development of new non-linear MR methods is highly desirable. We introduce two-stage prediction estimation and control function estimation from econometrics to MR and extend them to non-linear causality. We give conditions for parameter identification and theoretically prove the consistency and asymptotic normality of the estimates. We compare the two methods theoretically under both linear and non-linear causality. We also extend the control function estimation to a more flexible semi-parametric framework without detailed parametric specifications of causality. Extensive simulations numerically corroborate our theoretical results. Application to UK Biobank data reveals non-linear causal relationships between sleep duration and systolic/diastolic blood pressure.
Breast cancer remains a global challenge, causing over 1 million deaths globally in 2018. To achieve earlier breast cancer detection, screening x-ray mammography is recommended by health organizations worldwide and has been estimated to decrease breast cancer mortality by 20-40%. Nevertheless, significant false positive and false negative rates, as well as high interpretation costs, leave opportunities for improving quality and access. To address these limitations, there has been much recent interest in applying deep learning to mammography; however, obtaining large amounts of annotated data poses a challenge for training deep learning models for this purpose, as does ensuring generalization beyond the populations represented in the training dataset. Here, we present an annotation-efficient deep learning approach that 1) achieves state-of-the-art performance in mammogram classification, 2) successfully extends to digital breast tomosynthesis (DBT; "3D mammography"), 3) detects cancers in clinically-negative prior mammograms of cancer patients, 4) generalizes well to a population with low screening rates, and 5) outperforms five-out-of-five full-time breast imaging specialists by improving absolute sensitivity by an average of 14%. Our results demonstrate promise towards software that can improve the accuracy of and access to screening mammography worldwide.
Recent advances in 3D fully convolutional networks (FCN) have made it feasible to produce dense voxel-wise predictions of volumetric images. In this work, we show that a multi-class 3D FCN trained on manually labeled CT scans of several anatomical structures (ranging from the large organs to thin vessels) can achieve competitive segmentation results, while avoiding the need for handcrafting features or training class-specific models. To this end, we propose a two-stage, coarse-to-fine approach that will first use a 3D FCN to roughly define a candidate region, which will then be used as input to a second 3D FCN. This reduces the number of voxels the second FCN has to classify to ~10% and allows it to focus on more detailed segmentation of the organs and vessels. We utilize training and validation sets consisting of 331 clinical CT images and test our models on a completely unseen data collection acquired at a different hospital that includes 150 CT scans, targeting three anatomical organs (liver, spleen, and pancreas). In challenging organs such as the pancreas, our cascaded approach improves the mean Dice score from 68.5 to 82.2%, achieving the highest reported average score on this dataset. We compare with a 2D FCN method on a separate dataset of 240 CT scans with 18 classes and achieve a significantly higher performance in small organs and vessels. Furthermore, we explore fine-tuning our models to different datasets. Our experiments illustrate the promise and robustness of current 3D FCN based semantic segmentation of medical images, achieving state-of-the-art results. Our code and trained models are available for download: //github.com/holgerroth/3Dunet_abdomen_cascade.