Although cancer patients survive for years after oncologic therapy, they are plagued by long-lasting or permanent residual symptoms whose severity, rate of development, and resolution after treatment vary widely between survivors. The analysis and interpretation of symptoms is complicated by their partial co-occurrence, variability across populations and across time, and, in the case of cancers treated with radiotherapy, by further symptom dependency on the tumor location and prescribed treatment. We describe THALIS, an environment for visual analysis and knowledge discovery from cancer therapy symptom data, developed in close collaboration with oncology experts. Our approach leverages unsupervised machine learning methodology over cohorts of patients and, in conjunction with custom visual encodings and interactions, provides context for new patients based on patients with similar diagnostic features and symptom evolution. We evaluate this approach on data collected from a cohort of head and neck cancer patients. Feedback from our clinician collaborators indicates that THALIS supports knowledge discovery beyond the limits of machines or humans alone, and that it serves as a valuable tool in both the clinic and symptom research.
Over the last year, the severe acute respiratory syndrome coronavirus-2 (SARS-CoV-2) and its variants have highlighted the importance of screening tools with high diagnostic accuracy for new illnesses such as COVID-19. In that regard, deep learning approaches have proven to be effective solutions for pneumonia classification, especially on chest X-ray images. However, this lung infection can also be caused by other viral, bacterial, or fungal pathogens. Consequently, considerable effort is being devoted to distinguishing the infection source to help clinicians diagnose the correct disease origin. Following this trend, this study further explores the effectiveness of established neural network architectures on the pneumonia classification task through the transfer learning paradigm. To present a comprehensive comparison, 12 well-known ImageNet pre-trained models were fine-tuned and used to discriminate among chest X-rays of healthy people and those showing pneumonia symptoms derived from either a viral (i.e., generic or SARS-CoV-2) or bacterial source. Furthermore, since a common public collection distinguishing between such categories is currently not available, two distinct datasets of chest X-ray images, describing the aforementioned sources, were combined and employed to evaluate the various architectures. The experiments were performed using a total of 6330 images split between training, validation, and test sets. For all models, common classification metrics were computed (e.g., precision, F1-score), and most architectures achieved strong performance, reaching up to an 84.46% average F1-score when discriminating among the 4 identified classes. Moreover, confusion matrices and activation maps computed via the Grad-CAM algorithm are also reported to support an informed discussion of the networks' classifications.
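The abstract above reports an average F1-score over four classes without stating the averaging scheme. As a minimal sketch, assuming an unweighted (macro) average, the metric can be derived from a confusion matrix as follows. The class labels and all counts are made-up illustration values, not results from the study.

```python
# Macro-averaged F1 from a confusion matrix (rows = true class, columns = predicted).
# ASSUMPTION: the "average f1-score" is an unweighted mean over classes.
# The counts below are invented for illustration only.

def per_class_f1(cm):
    """Return the F1-score of every class in a square confusion matrix."""
    n = len(cm)
    scores = []
    for k in range(n):
        tp = cm[k][k]
        fp = sum(cm[r][k] for r in range(n)) - tp  # predicted k, true class differs
        fn = sum(cm[k]) - tp                       # true k, predicted otherwise
        denom = 2 * tp + fp + fn
        scores.append(2 * tp / denom if denom else 0.0)
    return scores


def macro_f1(cm):
    """Unweighted mean of the per-class F1-scores."""
    scores = per_class_f1(cm)
    return sum(scores) / len(scores)


CLASSES = ["healthy", "bacterial", "viral (generic)", "viral (SARS-CoV-2)"]
toy_cm = [
    [90, 4, 5, 1],
    [6, 80, 12, 2],
    [7, 15, 75, 3],
    [1, 2, 3, 94],
]

if __name__ == "__main__":
    for name, score in zip(CLASSES, per_class_f1(toy_cm)):
        print(f"{name:20s} F1 = {score:.3f}")
    print(f"macro-average F1 = {macro_f1(toy_cm):.3f}")
```

A support-weighted average would instead weight each class score by its row total; on imbalanced test sets the two can differ noticeably.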
While digital divide studies have primarily focused on access to information and communications technology (ICT) in the past, the divide's influence on other associated dimensions such as privacy is becoming critical, with a far-reaching impact on people and society. For example, the varying levels of government legislation and compliance on information privacy worldwide have created a new era of digital divide in the privacy preservation domain. In this article, the concept "digital privacy divide (DPD)" is introduced to describe the perceived gap in the privacy preservation of individuals based on the geopolitical location of different countries. To better understand the DPD phenomenon, we created an online questionnaire and collected answers from more than 700 respondents from four different countries (the United States, Germany, Bangladesh, and India) representing two distinct cultural orientations under Hofstede's individualist vs. collectivist distinction. Our results revealed some interesting findings. In particular, DPD does not depend on Hofstede's cultural orientation of the countries. For example, individuals residing in Germany and Bangladesh share similar privacy concerns, while there is a significant similarity among individuals residing in the United States and India. Moreover, while most respondents acknowledge the importance of privacy legislation to protect their digital privacy, they do not mind their governments allowing domestic companies and organizations to collect personal data on individuals residing outside their countries if there are economic, employment, and crime prevention benefits. These results suggest a social dilemma in perceived privacy preservation, which could depend on many other contextual factors beyond government legislation and countries' cultural orientation.
In health cohort studies, repeated measures of markers are often used to describe the natural history of a disease. Joint models make it possible to study their evolution while accounting for possible informative dropout, usually due to clinical events. However, joint modeling developments have mostly focused on continuous Gaussian markers while, in an increasing number of studies, the actual marker of interest is not directly measurable; it constitutes a latent quantity evaluated by a set of observed indicators from questionnaires or measurement scales. Classical examples include anxiety, fatigue, and cognition. In this work, we explain how joint models can be extended to the framework of a latent quantity measured over time by markers of different natures (e.g., continuous, binary, ordinal). The longitudinal submodel describes the evolution over time of the quantity of interest, defined as a latent process in a structural mixed model, and links the latent process to each repeated marker observation through appropriate measurement models. Simultaneously, the risk of a multi-cause event is modelled via a proportional cause-specific hazard model that includes a function of the mixed model elements as a linear predictor to take into account the association between the latent process and the risk of event. Estimation, carried out in the maximum likelihood framework and implemented in the R package JLPM, has been validated by simulations. The methodology is illustrated on the French cohort on Multiple-System Atrophy (MSA), a rare and fatal neurodegenerative disease, with the study of dysphagia progression over time truncated by the occurrence of death.
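The model structure described in the abstract above can be written out as a sketch. The notation here is assumed for illustration and is not taken from the paper or the JLPM documentation: subject i, visit j, marker k, and cause p; covariate vectors X_i(t), Z_i(t), and W_i; random effects b_i; a monotone link function H_k for continuous markers; thresholds tau for ordinal markers; and an association function g.

```latex
% Structural mixed model for the latent process of subject i
\Lambda_i(t) = X_i(t)^\top \beta + Z_i(t)^\top b_i,
\qquad b_i \sim \mathcal{N}(0, B)

% Measurement model for a continuous marker k observed at time t_{ijk}
H_k(Y_{ijk}; \eta_k) = \Lambda_i(t_{ijk}) + \varepsilon_{ijk},
\qquad \varepsilon_{ijk} \sim \mathcal{N}(0, \sigma_k^2)

% Threshold model for an ordinal marker k with levels m = 0, \dots, M_k,
% using \tau_{k,0} = -\infty and \tau_{k,M_k+1} = +\infty
% (a binary marker is the special case M_k = 1)
Y_{ijk} = m \iff \tau_{k,m} < \Lambda_i(t_{ijk}) + \varepsilon_{ijk} \le \tau_{k,m+1}

% Cause-specific proportional hazard for cause p
\alpha_{ip}(t) = \alpha_{0p}(t)
\exp\left\{ W_i^\top \gamma_p + g(b_i, t)^\top \delta_p \right\}
```

Under this reading, g(b_i, t) could be the random effects themselves or the current latent level, and delta_p quantifies how strongly the latent process is associated with the risk of cause p.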
Background: Several studies have highlighted the importance of considering sex differences in the diagnosis and treatment of Acute Coronary Syndrome (ACS). However, the identification of sex-specific risk markers in ACS sub-populations has rarely been studied. The goal of this paper is to identify in-hospital mortality markers for women and men in ACS sub-populations from a public database of electronic health records (EHR) using machine learning methods. Methods: From the MIMIC-III database, we extracted 1,299 patients with ST-elevation myocardial infarction (STEMI) and 2,820 patients with Non-ST-elevation myocardial infarction (NSTEMI). We trained and validated mortality prediction models and used an interpretability technique based on Shapley values to identify sex-specific markers for each sub-population. Results: The models based on eXtreme Gradient Boosting achieved the highest performance: AUC=0.94 (95% CI:0.84-0.96) for STEMI and AUC=0.94 (95% CI:0.80-0.90) for NSTEMI. For STEMI, the top markers in women are chronic kidney failure, high heart rate, and age over 70 years, while for men they are acute kidney failure, high troponin T levels, and age over 75 years. In contrast, for NSTEMI, the top markers in women are low troponin levels, high urea level, and age over 80 years, and for men they are high heart rate and creatinine levels, and age over 70 years. Conclusions: Our results show that it is possible to find significant and coherent sex-specific risk markers of different ACS sub-populations by interpreting machine learning mortality models trained on EHRs. Differences are observed in the identified risk markers between women and men, which highlights the importance of considering sex-specific markers to achieve more appropriate treatment strategies and better clinical outcomes.
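The abstract above reports AUC values with 95% confidence intervals but does not say how the intervals were obtained. As a minimal sketch, assuming a percentile bootstrap over patients, an AUC and its interval could be computed as below. The function names, labels, and scores are hypothetical illustration values, not data or code from the study.

```python
import random


def auc(labels, scores):
    """Rank-based AUC: chance that a random positive outscores a random negative.

    Ties between a positive and a negative score count as half a win.
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative")
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def bootstrap_ci(labels, scores, n_boot=2000, alpha=0.05, seed=0):
    """Percentile bootstrap interval for the AUC (ASSUMED method, see lead-in)."""
    rng = random.Random(seed)
    idx = range(len(labels))
    stats = []
    while len(stats) < n_boot:
        sample = [rng.choice(idx) for _ in idx]   # resample patients with replacement
        ys = [labels[i] for i in sample]
        if len(set(ys)) < 2:
            continue                              # resample lacked one class; redraw
        stats.append(auc(ys, [scores[i] for i in sample]))
    stats.sort()
    lower = stats[int((alpha / 2) * n_boot)]
    upper = stats[int((1 - alpha / 2) * n_boot) - 1]
    return lower, upper


if __name__ == "__main__":
    # Invented in-hospital mortality labels (1 = died) and model scores.
    y = [0, 0, 0, 0, 0, 0, 1, 1, 1, 1]
    p = [0.05, 0.10, 0.20, 0.30, 0.45, 0.60, 0.40, 0.70, 0.80, 0.95]
    low, high = bootstrap_ci(y, p)
    print(f"AUC = {auc(y, p):.2f} (95% CI: {low:.2f}-{high:.2f})")
```

The pairwise loop is quadratic in the number of patients, which is fine for a sketch; a rank-sum formulation would be the usual choice for cohorts of several thousand.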
Longitudinal fMRI datasets hold great promise for the study of neurodegenerative diseases, but realizing their potential depends on extracting accurate fMRI-based brain measures in individuals over time. This is especially true for rare, heterogeneous and/or rapidly progressing diseases, which often involve small samples whose functional features may vary dramatically across subjects and over time, making traditional group-difference analyses of limited utility. One such disease is ALS, which results in extreme motor function loss and eventual death. Here, we analyze a rich longitudinal dataset containing 190 motor task fMRI scans from 16 ALS patients and 22 age-matched HCs. We propose a novel longitudinal extension to our cortical surface-based spatial Bayesian GLM, which has high power and precision to detect activations in individuals. Using a series of longitudinal mixed-effects models to subsequently study the relationship between activation and disease progression, we observe an inverted U-shaped trajectory: at relatively mild disability we observe enlarging activations, while at higher disability we observe severely diminished activation, reflecting progression toward complete motor function loss. We observe distinct trajectories depending on clinical progression rate, with faster progressors exhibiting more extreme hyper-activation and subsequent hypo-activation. These differential trajectories suggest that initial hyper-activation is likely attributable to loss of inhibitory neurons. By contrast, earlier studies employing more limited sampling designs and using traditional group-difference analysis approaches were only able to observe the initial hyper-activation, which was assumed to be due to a compensatory process. This study provides a first example of how surface-based spatial Bayesian modeling furthers scientific understanding of neurodegenerative disease.
While the 10-year survival rate for localized prostate cancer patients is very good (>98%), side effects of treatment may limit quality of life significantly. Erectile dysfunction (ED) is a common burden associated with increasing age as well as prostate cancer treatment. Although many studies have investigated the factors affecting ED after prostate cancer treatment, few have investigated whether ED can be predicted before the start of treatment. The advent of machine learning (ML) based prediction tools in oncology offers a promising approach to improve accuracy of prediction and quality of care. Predicting ED may aid shared decision making by making the advantages and disadvantages of certain treatments clear, so that a tailored treatment for an individual patient can be chosen. This study aimed to predict ED at 1 year and 2 years post-diagnosis based on patient demographics, clinical data, and patient-reported outcome measures (PROMs) collected at diagnosis.
The potential diagnostic applications of magnet-actuated capsules have greatly increased in recent years. Most of these applications demand accurate position control of the capsule. However, the friction between the robot and the environment, as well as the drag force from the tether, plays a significant role in the motion control of the capsule. Moreover, these forces, especially the friction force, are typically hard to model beforehand. In this paper, we first design a magnet-actuated tethered capsule robot whose driving magnet is mounted on the end of a robotic arm. Then, we propose a learning-based approach to model the friction force between the capsule and the environment, with the goal of increasing the control accuracy of the whole system. Finally, we present several real-robot experiments that demonstrate the effectiveness of the proposed approach.
Camera trapping is increasingly used to monitor wildlife, but this technology typically requires extensive data annotation. Recently, deep learning has significantly advanced automatic wildlife recognition. However, current methods are hampered by a dependence on large static data sets when wildlife data is intrinsically dynamic and involves long-tailed distributions. These two drawbacks can be overcome through a hybrid combination of machine learning and humans in the loop. Our proposed iterative human and automated identification approach is capable of learning from wildlife imagery data with a long-tailed distribution. Additionally, it includes self-updating learning that facilitates capturing the community dynamics of rapidly changing natural systems. Extensive experiments show that our approach can achieve a ~90% accuracy employing only ~20% of the human annotations of existing approaches. Our synergistic collaboration of humans and machines transforms deep learning from a relatively inefficient post-annotation tool to a collaborative on-going annotation tool that vastly relieves the burden of human annotation and enables efficient and constant model updates.
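The abstract above does not specify how images are divided between the model and human annotators. One simple possibility, shown here purely as an assumed sketch, is confidence thresholding: predictions the model is confident about are accepted automatically, and the rest are queued for human labelling and later retraining. The function name, threshold, image identifiers, and probabilities are all hypothetical.

```python
def route_predictions(predictions, threshold=0.9):
    """Split model outputs into auto-accepted labels and images needing human review.

    predictions: list of (image_id, {class_name: probability}) pairs.
    ASSUMPTION: confidence thresholding is only one possible routing rule.
    """
    accepted, needs_human = [], []
    for image_id, probs in predictions:
        # Most probable class and its probability.
        label, confidence = max(probs.items(), key=lambda kv: kv[1])
        if confidence >= threshold:
            accepted.append((image_id, label))
        else:
            needs_human.append(image_id)
    return accepted, needs_human


if __name__ == "__main__":
    # Invented predictions for three camera-trap images.
    toy_predictions = [
        ("img_001", {"impala": 0.97, "waterbuck": 0.02, "civet": 0.01}),
        ("img_002", {"impala": 0.48, "waterbuck": 0.45, "civet": 0.07}),
        ("img_003", {"impala": 0.05, "waterbuck": 0.03, "civet": 0.92}),
    ]
    auto, queue = route_predictions(toy_predictions)
    print("auto-labelled:", auto)
    print("sent to human annotators:", queue)
```

Rare species in the long tail tend to receive low confidence, so a rule like this naturally concentrates human effort where the model has seen the fewest examples.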
Human-in-the-loop aims to train an accurate prediction model at minimum cost by integrating human knowledge and experience. With the help of machine-based approaches, humans can provide training data for machine learning applications and directly accomplish pipeline tasks that are hard for computers. In this paper, we survey existing work on human-in-the-loop from a data perspective and classify it into three categories with a progressive relationship: (1) work that improves model performance through data processing, (2) work that improves model performance through interventional model training, and (3) the design of system-independent human-in-the-loop. Using this categorization, we summarize major approaches in the field along with their technical strengths and weaknesses, and we provide a simple classification and discussion of work in natural language processing, computer vision, and other areas. We also present some open challenges and opportunities. This survey intends to provide a high-level summary of human-in-the-loop and to motivate interested readers to consider approaches for designing effective human-in-the-loop solutions.
Background: Social media can provide the healthcare industry with valuable feedback from patients who reveal and express their medical decision-making process, as well as self-reported quality-of-life indicators both during and after treatment. In prior work [Crannell et al.], we studied an active cancer patient population on Twitter and compiled a set of tweets describing their experience with this disease. We refer to these online public testimonies as "Invisible Patient Reported Outcomes" (iPROs), because they carry relevant indicators, yet are difficult to capture by conventional means of self-report. Methods: Our present study aims to identify tweets related to the patient experience as an additional informative tool for monitoring public health. Using Twitter's public streaming API, we compiled over 5.3 million "breast cancer" related tweets spanning September 2016 until mid-December 2017. We combined supervised machine learning methods with natural language processing to filter for tweets relevant to breast cancer patient experiences. We analyzed a sample of 845 breast cancer patient and survivor accounts, responsible for over 48,000 posts. We investigated tweet content with a hedonometric sentiment analysis to quantitatively extract emotionally charged topics. Results: We found that positive experiences were shared regarding patient treatment, raising support, and spreading awareness. Further discussions related to healthcare were prevalent and largely negative, focusing on fear of political legislation that could result in loss of coverage. Conclusions: Social media can provide a positive outlet for patients to discuss their needs and concerns regarding their healthcare coverage and treatment. Capturing iPROs from online communication can help inform healthcare professionals and lead to more connected and personalized treatment regimens.
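The abstract above mentions a hedonometric sentiment analysis without giving details. A minimal sketch of the general idea follows, assuming a word list scored on a 1 to 9 happiness scale, a frequency-weighted average, and exclusion of near-neutral words. The scores, the neutral band, and the function name are hypothetical illustration values, not the lexicon or settings used in the study.

```python
from collections import Counter


def average_happiness(text, word_scores, neutral_band=(4.0, 6.0)):
    """Frequency-weighted mean happiness of the scored words in a text.

    Words without a score, or with a score strictly inside the neutral band,
    are ignored. Returns None when no scored word remains.
    """
    counts = Counter(text.lower().split())
    total = 0.0
    weight = 0
    for word, freq in counts.items():
        score = word_scores.get(word)
        if score is None or neutral_band[0] < score < neutral_band[1]:
            continue
        total += score * freq
        weight += freq
    return total / weight if weight else None


if __name__ == "__main__":
    # Invented scores on an assumed 1 (sad) to 9 (happy) scale.
    toy_scores = {"love": 8.4, "support": 7.0, "fear": 2.3, "loss": 2.6, "the": 5.0}
    tweets = [
        "so much love and support from the community",
        "the fear of loss of coverage",
    ]
    for tweet in tweets:
        print(f"{average_happiness(tweet, toy_scores):.2f}  {tweet}")
```

Comparing such averages across subsets of tweets, and inspecting which words drive the difference, is one way emotionally charged topics could be surfaced.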