Intracranial hemorrhage (ICH) is a life-threatening medical emergency that requires timely and accurate diagnosis for effective treatment and improved patient survival rates. While deep learning techniques have emerged as the leading approach for medical image analysis and processing, the most commonly employed supervised learning often requires large, high-quality annotated datasets that can be costly to obtain, particularly for pixel/voxel-wise image segmentation. To address this challenge and facilitate ICH treatment decisions, we introduce a novel weakly supervised method for ICH segmentation, utilizing a Swin transformer trained on an ICH classification task with categorical labels. Our approach leverages a hierarchical combination of head-wise gradient-infused self-attention maps to generate accurate image segmentation. Additionally, we conducted an exploratory study on different learning strategies and showed that binary ICH classification has a more positive impact on self-attention maps than full ICH subtyping. With a mean Dice score of 0.44, our technique achieved ICH segmentation performance similar to that of the popular U-Net and Swin-UNETR models under full supervision and outperformed a similar weakly supervised approach using GradCAM, demonstrating the excellent potential of the proposed framework for challenging medical image segmentation tasks. Our code is available at //github.com/HealthX-Lab/HGI-SAM.
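To make the central idea concrete, the following is a minimal sketch of gradient-infused attention fusion, assuming per-head self-attention maps and their gradients with respect to the classification score have already been extracted; the function names and the simple head-averaging and stage-averaging scheme are illustrative assumptions and do not reproduce the released HGI-SAM code.

```python
import torch
import torch.nn.functional as F

def gradient_infused_attention(attn_maps, attn_grads):
    """Fuse one stage's head-wise self-attention maps, weighting each head by the
    rectified gradient of the classification score w.r.t. its attention map.

    attn_maps, attn_grads: tensors of shape (num_heads, H, W).
    """
    weighted = attn_maps * torch.relu(attn_grads)   # down-weight heads that oppose the class score
    fused = weighted.mean(dim=0)                    # combine heads into a single saliency map
    return (fused - fused.min()) / (fused.max() - fused.min() + 1e-8)

def hierarchical_fusion(stage_maps, out_size):
    """Upsample each transformer stage's fused map to a common resolution and average them."""
    up = [F.interpolate(m[None, None], size=out_size, mode="bilinear", align_corners=False)[0, 0]
          for m in stage_maps]
    return torch.stack(up).mean(dim=0)              # threshold this map to obtain a segmentation mask
```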
Estimating treatment effects over time is relevant in many real-world applications, such as precision medicine, epidemiology, economics, and marketing. Many state-of-the-art methods either assume that all confounders are observed or seek to infer the unobserved ones. We take a different perspective by assuming unobserved risk factors, i.e., adjustment variables that affect only the sequence of outcomes. Under unconfoundedness, we target Individual Treatment Effect (ITE) estimation with unobserved heterogeneity in the treatment response due to missing risk factors. We address the challenges posed by time-varying effects and unobserved adjustment variables. Guided by theoretical results on the validity of the learned adjustment variables and generalization bounds on the treatment effect, we devise the Causal DVAE (CDVAE). This model combines a Dynamic Variational Autoencoder (DVAE) framework with a propensity-score-based weighting strategy to estimate counterfactual responses. The CDVAE model allows for accurate estimation of the ITE and captures the underlying heterogeneity in longitudinal data. Evaluations of our model show superior performance over state-of-the-art models.
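As a concrete illustration of the weighting strategy, here is a minimal sketch of an inverse-propensity-weighted outcome loss for binary treatments; this is a generic counterfactual-regression ingredient written under our own assumptions, not the exact CDVAE objective, and the function and argument names are illustrative.

```python
import torch

def ipw_outcome_loss(y_pred, y_true, treatment, propensity, eps=1e-6):
    """Inverse-propensity-weighted squared outcome loss.

    treatment: float tensor of 0/1 indicators (1 = treated).
    propensity: estimated P(treatment = 1 | history), same shape as treatment.
    """
    w = treatment / propensity.clamp(min=eps) + (1.0 - treatment) / (1.0 - propensity).clamp(min=eps)
    return (w * (y_pred - y_true) ** 2).mean()  # re-weight factual errors toward the counterfactual population
```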
Longitudinal analysis in medical imaging is crucial for investigating progressive changes in anatomical structures or disease progression over time. In recent years, a novel class of algorithms has emerged with the goal of learning disease progression in a self-supervised manner, using either pairs of consecutive images or time series of images. By capturing temporal patterns without external labels or supervision, longitudinal self-supervised learning (LSSL) has become a promising avenue. To better understand this core method, we explore in this paper the LSSL algorithm under different scenarios. The original LSSL is embedded in an auto-encoder (AE) structure, whereas conventional self-supervised strategies are usually implemented in a Siamese-like manner. Therefore, as a first novelty, we explore in this study the use of a Siamese-like LSSL. Another core framework we consider is the neural ordinary differential equation (NODE), a neural network architecture that learns the dynamics of a system governed by ordinary differential equations (ODEs). Many temporal systems, including disease progression, can be described by ODEs, and we believe there is an interesting connection to be made between LSSL and NODE. This paper aims to provide a better understanding of these core algorithms for learning disease progression under the aforementioned variations. In our experiments, we employ a longitudinal dataset, named OPHDIAT, targeting diabetic retinopathy (DR) follow-up. Our results demonstrate that LSSL can be applied without a reconstruction term, as well as the potential of incorporating NODE in conjunction with LSSL.
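For readers unfamiliar with NODE, here is a minimal sketch of learned latent dynamics integrated with a fixed-step Euler solver; the MLP parameterization, dimensions, and solver choice are illustrative assumptions rather than the configuration used in this study (adaptive solvers, e.g. those in torchdiffeq, are typically preferred in practice).

```python
import torch
import torch.nn as nn

class LatentODEFunc(nn.Module):
    """Parameterizes dz/dt = f(z, t) with a small MLP."""
    def __init__(self, dim, hidden=64):
        super().__init__()
        self.net = nn.Sequential(nn.Linear(dim, hidden), nn.Tanh(), nn.Linear(hidden, dim))

    def forward(self, z, t):
        return self.net(z)

def integrate(func, z0, t0, t1, steps=20):
    """Fixed-step Euler integration of the learned dynamics from time t0 to t1."""
    z, dt = z0, (t1 - t0) / steps
    for k in range(steps):
        z = z + dt * func(z, t0 + k * dt)
    return z  # latent state at the follow-up time, e.g. for predicting DR progression
```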
Advancements in deep neural networks have allowed automatic speech recognition (ASR) systems to attain human parity on several publicly available clean speech datasets. However, even state-of-the-art ASR systems experience performance degradation when confronted with adverse conditions, as a well-trained acoustic model is sensitive to variations in the speech domain, e.g., background noise. Intuitively, humans address this issue by relying on their linguistic knowledge: the meaning of ambiguous spoken terms is usually inferred from contextual cues, thereby reducing the dependency on the auditory system. Inspired by this observation, we introduce the first open-source benchmark that utilizes external large language models (LLMs) for ASR error correction, where N-best decoding hypotheses provide informative elements for predicting the true transcription. This approach is a paradigm shift from the traditional language model rescoring strategy, which can only select one candidate hypothesis as the output transcription. The proposed benchmark contains a novel dataset, HyPoradise (HP), encompassing more than 334,000 pairs of N-best hypotheses and corresponding accurate transcriptions across prevalent speech domains. Given this dataset, we examine three types of LLM-based error correction techniques with varying amounts of labeled hypotheses-transcription pairs, which yield significant word error rate (WER) reductions. Experimental evidence demonstrates that the proposed technique achieves a breakthrough by surpassing the upper bound of traditional re-ranking-based methods. More surprisingly, an LLM with a suitable prompt and its generative capability can even correct tokens that are missing from the N-best list. We make our results publicly accessible as reproducible pipelines with released pre-trained models, thus providing a new evaluation paradigm for ASR error correction with LLMs.
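To illustrate the setup, the snippet below sketches how an N-best list might be turned into a generative correction prompt for an instruction-tuned LLM; the prompt wording, example hypotheses, and function name are illustrative assumptions and are not the templates defined by the HyPoradise benchmark.

```python
def build_correction_prompt(nbest):
    """Format an N-best list into a generative error-correction prompt."""
    numbered = "\n".join(f"{i + 1}. {hyp}" for i, hyp in enumerate(nbest))
    return (
        "The following are N-best hypotheses from a speech recognizer for the same utterance.\n"
        f"{numbered}\n"
        "Report the most likely true transcription, correcting any recognition errors:"
    )

# Toy example: the LLM can merge evidence across hypotheses rather than pick only one.
nbest = ["i red the book yesterday", "i read the book yesterday", "i read a book yesterday"]
prompt = build_correction_prompt(nbest)  # pass `prompt` to any instruction-tuned LLM
```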
Recently, various Artificial Intelligence (AI)-based optimization metaheuristics have been proposed and applied to a variety of problems. The Cohort Intelligence (CI) algorithm is a socio-inspired optimization technique that has been successfully applied to several unconstrained and constrained real-world problems from domains such as design, manufacturing, supply chain, and healthcare. Real-world problems are generally constrained in nature. Even though most Evolutionary Algorithms (EAs) can efficiently solve unconstrained problems, their performance degenerates when constraints are involved. In this paper, two novel constraint handling approaches based on modulus and hyperbolic tangent probability distributions are proposed. A constrained CI algorithm with constraint handling approaches based on triangular, modulus, and hyperbolic tangent probability distributions is presented and applied to optimize advanced manufacturing processes such as Water Jet Machining (WJM), Abrasive Jet Machining (AJM), Ultrasonic Machining (USM), and grinding. The solutions obtained using the proposed CI algorithm are compared with those of contemporary algorithms such as the Genetic Algorithm (GA), Simulated Annealing, and Teaching-Learning-Based Optimization. The proposed approaches achieved 2%-127% maximization of the material removal rate (MRR) while satisfying hard constraints. Compared to the GA, CI with the hyperbolic tangent probability distribution achieved 15%, 2%, 2%, 127%, and 4% improvements in MRR for the AJMB, AJMD, WJM, USM, and grinding processes, respectively, contributing to productivity improvement. The contributions in this paper open several avenues for further applicability of the proposed constraint handling approaches to solving complex constrained problems.
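As an illustration only, the snippet below shows one way a hyperbolic-tangent transform of the total constraint violation could be used to penalize candidate solutions when ranking them; this is a generic stand-in written under our own assumptions and does not reproduce the paper's probability-distribution-based constraint handling.

```python
import math

def tanh_penalized_fitness(objective, violations, scale=1.0):
    """Penalize a minimization objective by a tanh transform of the total constraint violation.

    objective: objective value to minimize (e.g., negative material removal rate).
    violations: iterable of constraint violation magnitudes g_i(x), clipped at zero below.
    """
    total = sum(max(0.0, v) for v in violations)
    return objective + scale * math.tanh(total)  # bounded, smooth penalty in [0, scale)
```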
Visual representation learning holds great promise for robotics, but is severely hampered by the scarcity and homogeneity of robotics datasets. Recent works address this problem by pre-training visual representations on large-scale but out-of-domain data (e.g., videos of egocentric interactions) and then transferring them to target robotics tasks. While the field is heavily focused on developing better pre-training algorithms, we find that dataset choice is just as important to this paradigm's success. After all, the representation can only learn the structures or priors present in the pre-training dataset. To this end, we flip the focus away from algorithms and instead conduct a dataset-centric analysis of robotic pre-training. Our findings call into question some common wisdom in the field. We observe that traditional vision datasets (like ImageNet, Kinetics, and 100 Days of Hands) are surprisingly competitive options for visuo-motor representation learning, and that the pre-training dataset's image distribution matters more than its size. Finally, we show that common simulation benchmarks are not a reliable proxy for real-world performance and that simple regularization strategies can dramatically improve real-world policy learning. Project page: //data4robotics.github.io
Given the rising number of people suffering from mental illnesses in today's society, the importance of mental health cannot be overstated. Wearable sensors, which are increasingly widely available, provide a potential way to track and understand mental health issues. These devices not only monitor everyday activities but also continuously record vital signs such as heart rate, potentially providing information about a person's mental state. Recent research has used these sensors in conjunction with machine learning methods to identify patterns associated with different mental health conditions, highlighting the immense potential of this data beyond simple activity monitoring. In this research, we present a novel algorithm, the Hybrid Random Forest-Neural Network, tailored to evaluate sensor data from depressed patients. Our method achieves a noteworthy accuracy of 80% when evaluated on a dedicated dataset that includes both unipolar and bipolar depressive patients as well as healthy controls. The findings highlight the algorithm's potential for reliably determining a person's depression status from sensor data, making a substantial contribution to the field of mental health diagnostics.
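Since the abstract does not detail how the two learners are coupled, here is a minimal sketch of one plausible hybrid, a soft-voting ensemble of a random forest and an MLP over the same sensor-derived features, using scikit-learn; the feature set, hyperparameters, and coupling are assumptions, not the paper's architecture.

```python
from sklearn.ensemble import RandomForestClassifier, VotingClassifier
from sklearn.neural_network import MLPClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Soft voting averages class probabilities from both models; the MLP gets scaled inputs.
hybrid = VotingClassifier(
    estimators=[
        ("rf", RandomForestClassifier(n_estimators=200, random_state=0)),
        ("nn", make_pipeline(StandardScaler(),
                             MLPClassifier(hidden_layer_sizes=(64, 32), max_iter=500, random_state=0))),
    ],
    voting="soft",
)
# Usage: hybrid.fit(X_train, y_train); y_pred = hybrid.predict(X_test)
# where X contains per-subject activity/heart-rate features and y the depression labels.
```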
MET protein overexpression is a targetable event in non-small cell lung cancer (NSCLC) and is the subject of active drug development. Challenges in identifying patients for these therapies include lack of access to validated testing, such as standardized immunohistochemistry (IHC) assessment, and consumption of valuable tissue for a single gene/protein assay. Development of pre-screening algorithms using routinely available digitized hematoxylin and eosin (H&E)-stained slides to predict MET overexpression could promote testing for those who will benefit most. While assessment of MET expression using IHC is currently not routinely performed in NSCLC, next-generation sequencing is common and in some cases includes RNA expression panel testing. In this work, we leveraged a large database of matched H&E slides and RNA expression data to train a weakly supervised model to predict MET RNA overexpression directly from H&E images. This model was evaluated on an independent holdout test set of 300 over-expressed and 289 normal patients, demonstrating an ROC-AUC of 0.70 (95th percentile interval: 0.66 - 0.74), with stable performance characteristics across different patient clinical variables and robustness to synthetic noise on the test set. These results suggest that H&E-based predictive models could be useful for prioritizing patients for confirmatory testing of MET protein or MET gene expression status.
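For context, weakly supervised slide-level prediction is commonly implemented with attention-based multiple-instance learning over tile embeddings; the sketch below shows that generic pattern and is not the specific model, feature extractor, or dimensions used in this work.

```python
import torch
import torch.nn as nn

class AttentionMIL(nn.Module):
    """Attention-based multiple-instance pooling over the tile embeddings of one slide."""
    def __init__(self, feat_dim=1024, hidden=256):
        super().__init__()
        self.attn = nn.Sequential(nn.Linear(feat_dim, hidden), nn.Tanh(), nn.Linear(hidden, 1))
        self.head = nn.Linear(feat_dim, 1)  # slide-level logit: over-expressed vs. normal

    def forward(self, tiles):                         # tiles: (num_tiles, feat_dim)
        a = torch.softmax(self.attn(tiles), dim=0)    # per-tile attention weights
        slide_feat = (a * tiles).sum(dim=0)           # attention-weighted slide embedding
        return self.head(slide_feat), a               # logit plus tile attention for inspection
```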
Zoonotic disease transmission between animals and humans is a growing risk, and the agricultural context acts as a likely point of transition, with individual heterogeneity acting as an important contributor. Thus, understanding the dynamics of disease spread at the wildlife-livestock interface is crucial for mitigating these transmission risks. Specifically, the interactions between pigeons and indoor cows at dairy farms can lead to significant disease transmission and economic losses for farmers, putting livestock, adjacent human populations, and other wildlife species at risk. In this paper, we propose a novel spatio-temporal multi-pathogen model with continuous spatial movement. The model expands on the Susceptible-Exposed-Infected-Recovered-Dead (SEIRD) framework and accounts for both within-species and cross-species transmission of pathogens, as well as the exploration-exploitation movement dynamics of pigeons, which play a critical role in the spread of infectious agents. In addition to the model formulation, we implement it as an agent-based simulation and use empirical field data to investigate different biologically realistic scenarios, evaluating the effect of various parameters on epidemic spread. In agreement with theoretical expectations, the model predicts that the heterogeneity of the pigeons' movement dynamics can drastically affect both the magnitude and stability of outbreaks. In addition, joint infection by multiple pathogens can have an interactive effect unobservable in single-pathogen SIR models, reflecting a non-intuitive inhibition of the outbreak. Our findings highlight the impact of heterogeneity in host behavior on pathogen spread and allow realistic predictions of outbreak dynamics at the multi-pathogen wildlife-livestock interface, with consequences for zoonotic diseases in various systems.
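To ground the SEIRD framework, here is a minimal single-species, well-mixed Euler update of the compartments; it is a simplified illustration under standard SEIRD assumptions, not the paper's spatial, multi-pathogen, agent-based model, and the parameter names are generic.

```python
def seird_step(S, E, I, R, D, beta, sigma, gamma, mu, dt=1.0):
    """One Euler step of a well-mixed SEIRD model.

    beta: transmission rate, sigma: incubation rate (E -> I),
    gamma: recovery rate, mu: disease-induced death rate.
    """
    N = S + E + I + R                      # living population
    new_exposed    = beta * S * I / N
    new_infectious = sigma * E
    new_recovered  = gamma * I
    new_deaths     = mu * I
    S += dt * (-new_exposed)
    E += dt * (new_exposed - new_infectious)
    I += dt * (new_infectious - new_recovered - new_deaths)
    R += dt * new_recovered
    D += dt * new_deaths
    return S, E, I, R, D
```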
Background: Missing data is a common challenge in mass spectrometry-based metabolomics, which can lead to biased and incomplete analyses. The integration of whole-genome sequencing (WGS) data with metabolomics data has emerged as a promising approach to enhance the accuracy of data imputation in metabolomics studies. Method: In this study, we propose a novel method that leverages information from WGS data and reference metabolites to impute unknown metabolites. Our approach utilizes a multi-view variational autoencoder to jointly model the burden score, polygenic risk score (PGS), and linkage disequilibrium (LD)-pruned single nucleotide polymorphisms (SNPs) for feature extraction and missing metabolomics data imputation. By learning latent representations of both omics data types, our method can effectively impute missing metabolomics values based on genomic information. Results: We evaluate the performance of our method on empirical metabolomics datasets with missing values and demonstrate its superiority over conventional imputation techniques. Using burden scores derived from 35 template metabolites, PGS, and LD-pruned SNPs, the proposed method achieved r2-scores > 0.01 for 71.55% of metabolites. Conclusion: The integration of WGS data in metabolomics imputation not only improves data completeness but also enhances downstream analyses, paving the way for more comprehensive and accurate investigations of metabolic pathways and disease associations. Our findings offer valuable insights into the potential benefits of utilizing WGS data for metabolomics data imputation and underscore the importance of leveraging multi-modal data integration in precision medicine research.
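As a rough illustration of the approach, the sketch below encodes concatenated genomic views (burden scores, PGS, LD-pruned SNPs) into a shared latent space and decodes metabolite levels, which can then fill in the missing entries; the dimensions, fusion strategy, and class name are illustrative assumptions rather than the paper's exact architecture.

```python
import torch
import torch.nn as nn

class MultiViewImputerVAE(nn.Module):
    """Encode concatenated genomic views to a shared latent z; decode metabolite levels."""
    def __init__(self, genomic_dim, metab_dim, latent=32):
        super().__init__()
        self.enc = nn.Sequential(nn.Linear(genomic_dim, 128), nn.ReLU())
        self.mu = nn.Linear(128, latent)
        self.logvar = nn.Linear(128, latent)
        self.dec = nn.Sequential(nn.Linear(latent, 128), nn.ReLU(), nn.Linear(128, metab_dim))

    def forward(self, genomic):
        h = self.enc(genomic)
        mu, logvar = self.mu(h), self.logvar(h)
        z = mu + torch.randn_like(mu) * torch.exp(0.5 * logvar)   # reparameterization trick
        return self.dec(z), mu, logvar   # decoded metabolites impute the missing measurements
```

Training would combine a reconstruction loss on the observed (non-missing) metabolite entries with the usual KL term on (mu, logvar).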
We propose a novel attention gate (AG) model for medical imaging that automatically learns to focus on target structures of varying shapes and sizes. Models trained with AGs implicitly learn to suppress irrelevant regions in an input image while highlighting salient features useful for a specific task. This enables us to eliminate the necessity of using explicit external tissue/organ localisation modules of cascaded convolutional neural networks (CNNs). AGs can be easily integrated into standard CNN architectures such as the U-Net model with minimal computational overhead while increasing the model sensitivity and prediction accuracy. The proposed Attention U-Net architecture is evaluated on two large CT abdominal datasets for multi-class image segmentation. Experimental results show that AGs consistently improve the prediction performance of U-Net across different datasets and training sizes while preserving computational efficiency. The code for the proposed architecture is publicly available.
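For reference, a minimal 2D attention gate in the additive formulation might look like the following; the channel sizes are placeholders, and the snippet assumes the skip feature x has twice the spatial resolution of the gating signal g, as in the standard Attention U-Net configuration.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class AttentionGate(nn.Module):
    """Additive attention gate: the gating signal g (coarse decoder feature) modulates
    the skip-connection feature x before concatenation in the decoder."""
    def __init__(self, g_ch, x_ch, inter_ch):
        super().__init__()
        self.w_g = nn.Conv2d(g_ch, inter_ch, kernel_size=1)
        self.w_x = nn.Conv2d(x_ch, inter_ch, kernel_size=1, stride=2)  # bring x to g's spatial size
        self.psi = nn.Conv2d(inter_ch, 1, kernel_size=1)

    def forward(self, g, x):
        a = torch.relu(self.w_g(g) + self.w_x(x))            # additive attention
        alpha = torch.sigmoid(self.psi(a))                    # attention coefficients in [0, 1]
        alpha = F.interpolate(alpha, size=x.shape[2:], mode="bilinear", align_corners=False)
        return x * alpha                                      # gated skip feature
```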