Paralytic ileus (PI) patients are at high risk of death when admitted to the intensive care unit (ICU), with mortality as high as 40\%. Research on mortality prediction for PI patients is minimal, and more accurate prediction models are needed for ICU patients diagnosed with PI. This paper demonstrates improved performance in predicting the mortality of ICU patients diagnosed with PI 24 hours after admission. The proposed framework, PMPI (Process Mining Model to predict mortality of PI patients), is a modification of a framework previously used to predict in-hospital mortality for ICU patients with diabetes. PMPI achieves performance similar to, if not better than, the best results in the existing literature, with an area under the ROC curve (AUC) of 0.82. PMPI uses the patient's medical history, the timing of clinical events, and demographic information for prediction. The PMPI framework has the potential to help medical teams make better treatment and care decisions for ICU patients with PI and thereby increase their life expectancy.
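To make the setup concrete, the following is a minimal sketch (not the authors' PMPI framework, which is based on process mining): it aggregates hypothetical event data from the first 24 hours after ICU admission into per-patient features, trains a generic classifier, and reports AUC. All file and column names are assumptions.

```python
import pandas as pd
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import train_test_split

# Hypothetical inputs: clinical events from the first 24 h and per-patient outcomes.
events = pd.read_csv("pi_icu_events_24h.csv")    # patient_id, event_type, hours_since_admit
labels = pd.read_csv("pi_icu_labels.csv")        # patient_id, age, sex, died_in_hospital

# Aggregate the event history into counts per event type plus the last event time.
features = (
    events.pivot_table(index="patient_id", columns="event_type",
                       values="hours_since_admit", aggfunc="count", fill_value=0)
    .join(events.groupby("patient_id")["hours_since_admit"].max().rename("last_event_hour"))
)
data = labels.set_index("patient_id").join(features, how="left").fillna(0)
data["sex"] = (data["sex"] == "F").astype(int)

X = data.drop(columns="died_in_hospital")
y = data["died_in_hospital"]
X_tr, X_te, y_tr, y_te = train_test_split(X, y, stratify=y, random_state=0)

clf = GradientBoostingClassifier(random_state=0).fit(X_tr, y_tr)
print("AUC:", roc_auc_score(y_te, clf.predict_proba(X_te)[:, 1]))
```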
Background: Several studies have highlighted the importance of considering sex differences in the diagnosis and treatment of Acute Coronary Syndrome (ACS). However, the identification of sex-specific risk markers in ACS sub-populations has been scarcely studied. The goal of this paper is to identify in-hospital mortality markers for women and men in ACS sub-populations from a public database of electronic health records (EHR) using machine learning methods. Methods: From the MIMIC-III database, we extracted 1,299 patients with ST-elevation myocardial infarction (STEMI) and 2,820 patients with non-ST-elevation myocardial infarction (NSTEMI). We trained and validated mortality prediction models and used an interpretability technique based on Shapley values to identify sex-specific markers for each sub-population. Results: The models based on eXtreme Gradient Boosting achieved the highest performance: AUC=0.94 (95\% CI:0.84-0.96) for STEMI and AUC=0.94 (95\% CI:0.80-0.90) for NSTEMI. For STEMI, the top markers in women are chronic kidney failure, high heart rate, and age over 70 years, while in men they are acute kidney failure, high troponin T levels, and age over 75 years. In contrast, for NSTEMI, the top markers in women are low troponin levels, high urea levels, and age over 80 years, while in men they are high heart rate, high creatinine levels, and age over 70 years. Conclusions: Our results show that it is possible to find significant and coherent sex-specific risk markers of different ACS sub-populations by interpreting machine learning mortality models trained on EHRs. Differences are observed in the risk markers identified for women and men, which highlights the importance of considering sex-specific markers to enable more appropriate treatment strategies and better clinical outcomes.
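The general pipeline described in the Methods (a gradient-boosted mortality model interpreted with Shapley values) can be sketched as follows; the file name, column names, and hyperparameters are illustrative assumptions rather than the authors' actual code.

```python
import pandas as pd
import shap
import xgboost as xgb
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import train_test_split

# Hypothetical sex-specific sub-population with numeric features and a binary outcome.
cohort = pd.read_csv("nstemi_women.csv")
X = cohort.drop(columns="in_hospital_death")
y = cohort["in_hospital_death"]
X_tr, X_te, y_tr, y_te = train_test_split(X, y, stratify=y, random_state=0)

model = xgb.XGBClassifier(n_estimators=300, max_depth=4, random_state=0)
model.fit(X_tr, y_tr)
print("AUC:", roc_auc_score(y_te, model.predict_proba(X_te)[:, 1]))

# Shapley values rank features by their contribution to predicted mortality risk,
# yielding candidate sex-specific markers for this sub-population.
explainer = shap.TreeExplainer(model)
shap_values = explainer.shap_values(X_te)
shap.summary_plot(shap_values, X_te)
```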
Recently, there has been great interest in investigating the application of deep learning models to the prediction of clinical events using electronic health record (EHR) data. In EHR data, a patient's history is often represented as a sequence of visits, and each visit contains multiple events. As a result, deep learning models developed for sequence modeling, such as recurrent neural networks (RNNs), are a common architecture for EHR-based clinical event prediction. While a large variety of RNN models have been proposed in the literature, it is unclear whether complex architectural innovations offer superior predictive performance. To move this field forward, a rigorous evaluation of the various methods is needed. In this study, we conducted a thorough benchmark of RNN architectures for modeling EHR data. We used two prediction tasks: the risk of developing heart failure and the risk of early readmission after inpatient hospitalization. We found that simple gated RNN models, including GRUs and LSTMs, often offer competitive results when properly tuned with Bayesian optimization, which is in line with findings in the natural language processing (NLP) domain. For reproducibility, our codebase is shared at //github.com/ZhiGroup/pytorch_ehr.
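As a rough illustration of the kind of gated RNN benchmarked here (and not the code in the linked repository), the sketch below embeds each visit as a sum of medical-code embeddings, runs a GRU over the visit sequence, and outputs a sigmoid risk score; all dimensions and the dummy input are assumptions.

```python
import torch
import torch.nn as nn

class GRURiskModel(nn.Module):
    """Each visit = summed code embeddings; a GRU summarizes the visit sequence."""
    def __init__(self, n_codes, emb_dim=128, hidden_dim=128):
        super().__init__()
        self.embed = nn.Embedding(n_codes, emb_dim, padding_idx=0)
        self.gru = nn.GRU(emb_dim, hidden_dim, batch_first=True)
        self.head = nn.Linear(hidden_dim, 1)

    def forward(self, visit_codes):
        # visit_codes: (batch, n_visits, max_codes_per_visit), 0 = padding code
        visit_emb = self.embed(visit_codes).sum(dim=2)   # (batch, n_visits, emb_dim)
        _, h_last = self.gru(visit_emb)                  # (1, batch, hidden_dim)
        return torch.sigmoid(self.head(h_last[-1])).squeeze(-1)

model = GRURiskModel(n_codes=5000)
dummy = torch.randint(0, 5000, (8, 10, 20))              # 8 patients, 10 visits, 20 codes each
print(model(dummy).shape)                                # torch.Size([8])
```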
Despite an increasing reliance on fully automated algorithmic decision-making in our day-to-day lives, human beings still make highly consequential decisions. As frequently seen in business, healthcare, and public policy, recommendations produced by algorithms are provided to human decision-makers to guide their decisions. While there exists a fast-growing literature evaluating the bias and fairness of such algorithmic recommendations, an overlooked question is whether they help humans make better decisions. We develop a statistical methodology for experimentally evaluating the causal impacts of algorithmic recommendations on human decisions. We also show how to examine whether algorithmic recommendations improve the fairness of human decisions, and derive the optimal decision rules under various settings. We apply the proposed methodology to preliminary data from the first-ever randomized controlled trial evaluating the pretrial Public Safety Assessment (PSA) in the criminal justice system. A goal of the PSA is to help judges decide which arrested individuals should be released. On the basis of the preliminary data available, we find that providing the PSA to the judge has little overall impact on the judge's decisions and subsequent arrestee behavior. However, our analysis yields suggestive evidence that the PSA may help avoid unnecessarily harsh decisions for female arrestees regardless of their risk levels, while it encourages the judge to make stricter decisions for male arrestees who are deemed to be risky. In terms of fairness, the PSA appears to increase the gender bias against males while having little effect on any existing racial differences in judges' decisions. Finally, we find that the PSA's recommendations might be unnecessarily severe unless the cost of a new crime is sufficiently high.
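As a hedged illustration of the experimental logic only (not the paper's full methodology, which also covers fairness and optimal decision rules), the sketch below computes a simple intention-to-treat difference in means between cases randomly assigned to have the PSA provided and those without it; the file and column names are hypothetical.

```python
import numpy as np
import pandas as pd

# Hypothetical trial data: `provided` = 1 if the judge saw the PSA for the case,
# `signature_bond` = 1 if the judge imposed the most lenient decision.
df = pd.read_csv("psa_trial.csv")
treated = df.loc[df["provided"] == 1, "signature_bond"]
control = df.loc[df["provided"] == 0, "signature_bond"]

# Intention-to-treat estimate: difference in means under random assignment.
itt = treated.mean() - control.mean()
se = np.sqrt(treated.var(ddof=1) / len(treated) + control.var(ddof=1) / len(control))
print(f"ITT estimate: {itt:.3f}  (95% CI: {itt - 1.96 * se:.3f}, {itt + 1.96 * se:.3f})")
```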
While the 10-year survival rate for localized prostate cancer patients is very good (>98%), the side effects of treatment may significantly limit quality of life. Erectile dysfunction (ED) is a common burden associated with increasing age as well as with prostate cancer treatment. Although many studies have investigated the factors affecting ED after prostate cancer treatment, only a limited number have investigated whether ED can be predicted before the start of treatment. The advent of machine learning (ML) based prediction tools in oncology offers a promising approach to improving the accuracy of prediction and the quality of care. Predicting ED may aid shared decision-making by making the advantages and disadvantages of certain treatments clear, so that a treatment tailored to the individual patient can be chosen. This study aimed to predict ED at 1 year and 2 years post-diagnosis based on patient demographics, clinical data, and patient-reported outcome measures (PROMs) measured at diagnosis.
The transformation of food delivery businesses to online platforms has gained considerable attention in recent years, owing to customizable ordering experiences, easy payment methods, fast delivery, and other conveniences. Competition among online food delivery providers to attract a wider range of customers has intensified. Hence, providers need a better understanding of their customers' needs and the ability to predict their purchasing decisions. Machine learning has a significant impact on companies' bottom lines; it is used to construct models and strategies in industries that rely on big data and need systems to evaluate that data quickly and effectively. Predictive modeling is a type of machine learning that uses regression algorithms, analytics, and statistics to estimate the probability of an occurrence, and incorporating predictive models helps online food delivery providers understand their customers. In this study, a dataset collected from 388 consumers in Bangalore, India, is used to predict their purchasing decisions. Four prediction models are considered: CART and C4.5 decision trees, random forest, and rule-based classifiers, and their accuracy in assigning the correct class label is evaluated. The findings show that all models perform similarly, but C4.5 outperforms the rest with an accuracy of 91.67%.
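A simplified version of the comparison setup might look like the sketch below; the survey file and target column are hypothetical, and since C4.5 and rule-based learners such as RIPPER are not available in scikit-learn, only the CART-style tree and the random forest are shown.

```python
import pandas as pd
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import cross_val_score
from sklearn.tree import DecisionTreeClassifier

df = pd.read_csv("online_food_survey.csv")               # hypothetical: 388 responses
X = pd.get_dummies(df.drop(columns="will_order_again"))  # one-hot encode survey answers
y = df["will_order_again"]

models = {
    "CART decision tree": DecisionTreeClassifier(random_state=0),
    "Random forest": RandomForestClassifier(n_estimators=200, random_state=0),
}
for name, model in models.items():
    acc = cross_val_score(model, X, y, cv=10, scoring="accuracy").mean()
    print(f"{name}: mean 10-fold accuracy = {acc:.4f}")
```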
Machine learning and big data analytics are used in many ways in the fight against the COVID-19 pandemic, including prediction, risk management, diagnostics, and prevention. This study focuses on predicting COVID-19 patient shielding -- identifying and protecting patients who are clinically extremely vulnerable from coronavirus -- and on the techniques used for multi-label classification of medical text. Using the information published by the United Kingdom NHS and the World Health Organisation, we present a novel approach that frames predicting COVID-19 patient shielding as a multi-label classification problem. We use publicly available, de-identified ICU medical text data for our experiments, with labels derived from the published COVID-19 patient shielding data. We present an extensive comparison across 12 multi-label classifiers, ranging from simple binary relevance to neural networks and the most recent transformers. To the best of our knowledge, this is the first comprehensive study in which such a range of multi-label classifiers for medical text is considered. We highlight the benefits of the various approaches and argue that, for the task at hand, both predictive accuracy and processing time are essential.
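The simplest baseline in such a comparison, binary relevance, can be sketched as one TF-IDF plus logistic-regression classifier per label; the data file, label format, and evaluation metric below are assumptions rather than the study's exact setup.

```python
import pandas as pd
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import f1_score
from sklearn.model_selection import train_test_split
from sklearn.multiclass import OneVsRestClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import MultiLabelBinarizer

# Hypothetical file: one ICU note per row, semicolon-separated shielding labels.
df = pd.read_csv("icu_notes_shielding.csv")              # columns: text, labels ("a;b;c")
mlb = MultiLabelBinarizer()
Y = mlb.fit_transform(df["labels"].str.split(";"))
X_tr, X_te, Y_tr, Y_te = train_test_split(df["text"], Y, random_state=0)

# Binary relevance: an independent classifier per label over shared TF-IDF features.
clf = make_pipeline(
    TfidfVectorizer(max_features=50000, ngram_range=(1, 2)),
    OneVsRestClassifier(LogisticRegression(max_iter=1000)),
)
clf.fit(X_tr, Y_tr)
print("micro-F1:", f1_score(Y_te, clf.predict(X_te), average="micro"))
```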
While there exist scores of natural languages, each with its unique features and idiosyncrasies, they all share a unifying theme: enabling human communication. We may thus reasonably predict that human cognition shapes how these languages evolve and are used. Assuming that the capacity to process information is roughly constant across human populations, we expect a surprisal--duration trade-off to arise both across and within languages: if the rate of information transfer is roughly constant, more surprising (i.e., more informative) phones should take longer to produce. We analyse this trade-off using a corpus of 600 languages and, after controlling for several potential confounds, we find strong supporting evidence in both settings. Specifically, we find that, on average, phones are produced faster in languages where they are less surprising, and vice versa. Further, we confirm that more surprising phones are longer, on average, in 319 of the 600 languages. We thus conclude that there is strong evidence of a surprisal--duration trade-off in operation, both across and within the world's languages.
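As a toy illustration of how such a trade-off can be measured (the study itself uses in-context surprisal estimates and controls for confounds), the sketch below computes unigram surprisal per phone type from hypothetical token data and correlates it with mean duration.

```python
import numpy as np
import pandas as pd
from scipy.stats import spearmanr

# Hypothetical token-level data: one row per produced phone with its duration.
tokens = pd.read_csv("phone_tokens.csv")                 # columns: phone, duration_ms

counts = tokens["phone"].value_counts()
surprisal = -np.log2(counts / counts.sum())              # unigram surprisal in bits

per_type = tokens.groupby("phone")["duration_ms"].mean().to_frame("mean_duration")
per_type["surprisal_bits"] = surprisal                   # aligns on the phone index

rho, p = spearmanr(per_type["surprisal_bits"], per_type["mean_duration"])
print(f"Spearman rho = {rho:.3f} (p = {p:.3g})")
```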
In clinical trials, response-adaptive randomization (RAR) has the appealing ability to assign more subjects to better-performing treatments based on interim results. The traditional RAR strategy alters the randomization ratio on a patient-by-patient basis; this has been heavily criticized for bias due to time-trends. An alternate approach is blocked RAR, which groups patients together in blocks and recomputes the randomization ratio in a block-wise fashion; the final analysis is then stratified by block. However, the typical blocked RAR design divides patients into equal-sized blocks, which is not generally optimal. This paper presents TrialMDP, an algorithm that designs two-armed blocked RAR clinical trials. Our method differs from past approaches in that it optimizes the size and number of blocks as well as their treatment allocations. That is, the algorithm yields a policy that adaptively chooses the size and composition of the next block, based on results seen up to that point in the trial. TrialMDP is related to past works that compute optimal trial designs via dynamic programming. The algorithm maximizes a utility function balancing (i) statistical power, (ii) patient outcomes, and (iii) the number of blocks. We show that it attains significant improvements in utility over a suite of baseline designs, and gives useful control over the tradeoff between statistical power and patient outcomes. It is well suited for small trials that assign high cost to failures. We provide TrialMDP as an R package on GitHub: //github.com/dpmerrell/TrialMDP
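For intuition about blocked RAR (this is not the TrialMDP algorithm, which additionally optimizes block sizes and allocations via dynamic programming), the sketch below recomputes a two-armed allocation ratio block by block from Beta posteriors on hypothetical response rates.

```python
import numpy as np

rng = np.random.default_rng(0)
p_true = {"A": 0.55, "B": 0.40}                # hypothetical true response rates
succ = {"A": 0, "B": 0}
fail = {"A": 0, "B": 0}
ratio_A = 0.5                                  # first block uses 1:1 allocation

for block, block_size in enumerate([20, 20, 20], start=1):
    n_A = int(round(ratio_A * block_size))
    for arm, n in (("A", n_A), ("B", block_size - n_A)):
        outcomes = rng.random(n) < p_true[arm]
        succ[arm] += int(outcomes.sum())
        fail[arm] += n - int(outcomes.sum())
    # Next block's allocation from P(arm A better), via Beta(1+s, 1+f) posteriors.
    draws_A = rng.beta(1 + succ["A"], 1 + fail["A"], 5000)
    draws_B = rng.beta(1 + succ["B"], 1 + fail["B"], 5000)
    ratio_A = float(np.clip((draws_A > draws_B).mean(), 0.2, 0.8))  # cap extremes
    print(f"block {block}: allocation to arm A in next block = {ratio_A:.2f}")
```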
Prediction over tabular data is an essential task in many data science applications, such as recommender systems, online advertising, and medical treatment. Tabular data is structured into rows and columns, with each row a data sample and each column a feature attribute. Both the columns and the rows of a table carry useful patterns that could improve model prediction performance. However, most existing models focus on cross-column patterns and overlook cross-row patterns, as they treat each sample independently. In this work, we propose a general learning framework named Retrieval & Interaction Machine (RIM) that fully exploits both cross-row and cross-column patterns in tabular data. Specifically, RIM first leverages search engine techniques to efficiently retrieve useful rows of the table to assist the label prediction of the target row, then uses feature interaction networks to capture the cross-column patterns among the target row and the retrieved rows so as to make the final label prediction. We conduct extensive experiments on 11 datasets covering three important tasks, i.e., CTR prediction (classification), top-n recommendation (ranking), and rating prediction (regression). Experimental results show that RIM achieves significant improvements over the state-of-the-art and various baselines, demonstrating its superiority and efficacy.
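A much-simplified version of the retrieve-then-predict idea (not RIM's search-engine retrieval or its feature interaction network) can be sketched as follows: retrieve nearest-neighbor rows from a retrieval pool and append their aggregated features and label rate to each target row before fitting a standard classifier; the synthetic data is purely illustrative.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import train_test_split
from sklearn.neighbors import NearestNeighbors

# Synthetic table: rows are samples, columns are feature attributes.
X, y = make_classification(n_samples=4000, n_features=20, random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)
# Keep a separate retrieval pool so training rows never retrieve themselves.
X_pool, X_fit, y_pool, y_fit = train_test_split(X_tr, y_tr, test_size=0.5, random_state=0)

nn = NearestNeighbors(n_neighbors=10).fit(X_pool)

def with_retrieval_context(X_query):
    """Append aggregated features and label rate of retrieved rows to each row."""
    idx = nn.kneighbors(X_query, return_distance=False)
    ctx_feats = X_pool[idx].mean(axis=1)                  # mean of retrieved rows
    ctx_label = y_pool[idx].mean(axis=1, keepdims=True)   # retrieved label rate
    return np.hstack([X_query, ctx_feats, ctx_label])

clf = GradientBoostingClassifier(random_state=0).fit(with_retrieval_context(X_fit), y_fit)
proba = clf.predict_proba(with_retrieval_context(X_te))[:, 1]
print("AUC with retrieval context:", roc_auc_score(y_te, proba))
```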
Many tasks in natural language processing can be viewed as multi-label classification problems. However, most existing models are trained with the standard cross-entropy loss function and use a fixed prediction policy (e.g., a threshold of 0.5) for all labels, which completely ignores the complexity of, and dependencies among, the different labels. In this paper, we propose a meta-learning method to capture these complex label dependencies. More specifically, our method utilizes a meta-learner to jointly learn training policies and prediction policies for the different labels. The training policies are then used to train the classifier with the cross-entropy loss function, and the prediction policies are applied at inference time. Experimental results on fine-grained entity typing and text classification demonstrate that the proposed method obtains more accurate multi-label classification results.
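To illustrate why a fixed 0.5 threshold can be suboptimal (this is not the proposed meta-learning method), the sketch below tunes a separate threshold per label on a validation split of synthetic predictions and compares micro-F1 against the fixed policy.

```python
import numpy as np
from sklearn.metrics import f1_score

rng = np.random.default_rng(0)
n, L = 2000, 5
probs = rng.random((n, L))                                   # stand-in predicted probabilities
true = (probs + rng.normal(0, 0.3, (n, L))) > rng.random(L)  # synthetic labels, varied base rates

val, test = slice(0, 1000), slice(1000, None)
fixed = (probs[test] > 0.5).astype(int)                      # fixed policy: 0.5 for every label

# Per-label policy: pick each label's threshold on the validation split by F1.
grid = np.linspace(0.05, 0.95, 19)
per_label = np.array([
    grid[np.argmax([f1_score(true[val, j], probs[val, j] > t) for t in grid])]
    for j in range(L)
])
tuned = (probs[test] > per_label).astype(int)

print("micro-F1, fixed 0.5 threshold:", f1_score(true[test], fixed, average="micro"))
print("micro-F1, per-label thresholds:", f1_score(true[test], tuned, average="micro"))
```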