We explore the effect of nursing home status on healthcare outcomes such as hospitalisation, mortality and in-hospital mortality during the COVID-19 pandemic. Some claim that in specific Autonomous Communities (geopolitical divisions) in Spain, elderly people in nursing homes faced restrictions on access to hospitals and treatments, which raised a public outcry about the fairness of such measures. In this work, the case of the Basque Country is studied with a rigorous statistical approach and a physician's perspective. As fairness/unfairness is hard to model mathematically and has strong real-world implications, this work concentrates on the following simplification: establishing whether nursing home status had a direct effect on healthcare outcomes once other meaningful patient information, such as age, health status and period of the pandemic, is accounted for. The methods followed here combine established techniques with new proposals from the fields of causality and fair learning. The current analysis suggests that, as a group, people in nursing homes were significantly less likely to be hospitalised, and considerably more likely to die, even in hospitals, than their non-resident counterparts during most of the pandemic. Further data collection and analysis are needed to establish that this is solely or mainly due to nursing home status.
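The core question above, whether nursing home status retains an effect on an outcome after adjusting for age, health status and pandemic period, can be illustrated with a minimal covariate-adjusted logistic regression baseline. This is a hedged sketch on synthetic data, not the paper's actual causal or fair-learning pipeline; the column names and data-generating process are assumptions.

```python
# Minimal sketch: adjusted logistic regression for the effect of nursing-home
# status on hospitalisation, controlling for age, a health-status proxy and
# pandemic period. Synthetic, illustrative data only.
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

rng = np.random.default_rng(0)
n = 5000
df = pd.DataFrame({
    "nursing_home": rng.integers(0, 2, n),   # 1 = nursing-home resident (hypothetical column)
    "age": rng.normal(80, 8, n),
    "comorbidity": rng.poisson(2, n),        # crude health-status proxy
    "period": rng.integers(0, 3, n),         # pandemic wave indicator
})
lin = -4 + 0.03 * df.age + 0.2 * df.comorbidity - 0.5 * df.nursing_home
df["hospitalised"] = rng.binomial(1, 1 / (1 + np.exp(-lin)))

# The coefficient on nursing_home is the adjusted log-odds association.
model = smf.logit("hospitalised ~ nursing_home + age + comorbidity + C(period)",
                  data=df).fit(disp=0)
print(model.summary().tables[1])
```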
Informal reasoning is the ability to reason based on common sense, experience, and intuition. Humans use informal reasoning every day to extract the most influential elements for their decision-making from a large amount of real-life information. With the rapid development of language models, hope has emerged for the realization of general artificial intelligence. Given the outstanding informal reasoning ability of humans, how much informal reasoning ability language models possess has not been well studied. To explore the gap between humans and language models in informal reasoning, this paper constructs the Detective Reasoning Benchmark, an assembly of 1,200 questions gathered from accessible online resources that aims to evaluate a model's informal reasoning ability in real-life contexts. Since improvements in models' informal reasoning have been restricted by the lack of such a benchmark, we further propose a Self-Question Prompt Framework that mimics human thinking to enhance the model's informal reasoning ability. The goals of self-question are to find the key elements, investigate the connections between these elements, relate each element to the problem, and finally require the model to give a reasoned answer to the problem. The experimental results show that human performance greatly outperforms the SoTA language models on the Detective Reasoning Benchmark. Moreover, Self-Question proves to be the most effective prompt-engineering method for improving GPT-4's informal reasoning ability, yet it still does not surpass the lowest score achieved by the human participants. Upon acceptance of the paper, the source code for the benchmark will be made publicly accessible.
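The four self-question stages named in the abstract (find key elements, connect them, relate them to the question, answer) suggest a staged prompting pipeline. Below is a hedged sketch of such a pipeline; the exact prompt wording and the `ask` callable are assumptions, not the authors' released prompts or code.

```python
# Sketch of a Self-Question-style staged prompting pipeline, assuming any
# text-in/text-out LLM callable `ask`. Prompt wording is illustrative.
from typing import Callable

def self_question(puzzle: str, question: str, ask: Callable[[str], str]) -> str:
    elements = ask(f"Puzzle:\n{puzzle}\n\nList the key elements (people, objects, "
                   f"times, places) that could matter for solving it.")
    links = ask(f"Puzzle:\n{puzzle}\n\nKey elements:\n{elements}\n\n"
                f"Describe the connections between these elements.")
    relevance = ask(f"Puzzle:\n{puzzle}\n\nQuestion: {question}\n\n"
                    f"Connections:\n{links}\n\n"
                    f"Explain how each element relates to the question.")
    return ask(f"Puzzle:\n{puzzle}\n\nQuestion: {question}\n\n"
               f"Analysis:\n{relevance}\n\nGive a reasoned final answer.")

# Usage: answer = self_question(case_text, "Who is lying?", my_llm_call)
```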
Deep learning methods have shown outstanding classification accuracy in medical imaging problems, which is largely attributed to the availability of large-scale datasets manually annotated with clean labels. However, given the high cost of such manual annotation, new medical imaging classification problems may need to rely on machine-generated noisy labels extracted from radiology reports. Indeed, many Chest X-ray (CXR) classifiers have already been modelled from datasets with noisy labels, but their training procedure is in general not robust to noisy-label samples, leading to sub-optimal models. Furthermore, CXR datasets are mostly multi-label, so current noisy-label learning methods designed for multi-class problems cannot be easily adapted. In this paper, we propose a new method designed for noisy multi-label CXR learning, which detects and smoothly re-labels samples in the dataset; the re-labelled dataset is then used to train common multi-label classifiers. The proposed method optimises a bag of multi-label descriptors (BoMD) to promote their similarity with the semantic descriptors produced by BERT models from the multi-label image annotations. Our experiments on diverse noisy multi-label training sets and clean testing sets show that our model has state-of-the-art accuracy and robustness in many CXR multi-label classification benchmarks.
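The central idea, aligning a bag of image descriptors with label embeddings produced by a language model, can be illustrated with a simplified similarity loss. This is a hedged stand-in for the BoMD objective, not the authors' exact formulation; in practice the label embeddings would come from a BERT model rather than random tensors.

```python
# Toy sketch: encourage each positive label to be matched by at least one image
# descriptor in the bag (cosine similarity), and penalise similarity to absent
# labels. Shapes and the loss form are illustrative assumptions.
import torch
import torch.nn.functional as F

def bag_alignment_loss(bag, label_emb, targets):
    """bag: (B, K, D) image descriptors; label_emb: (C, D) label embeddings
    from a text encoder; targets: (B, C) multi-hot noisy labels."""
    bag = F.normalize(bag, dim=-1)
    label_emb = F.normalize(label_emb, dim=-1)
    sim = torch.einsum("bkd,cd->bkc", bag, label_emb)   # (B, K, C)
    best = sim.max(dim=1).values                        # best descriptor per label
    return (targets * (1 - best) + (1 - targets) * best.clamp(min=0)).mean()

B, K, D, C = 4, 5, 128, 14
loss = bag_alignment_loss(torch.randn(B, K, D), torch.randn(C, D),
                          torch.randint(0, 2, (B, C)).float())
```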
Precisely reconstructing and manipulating crumpled cloths is challenging due to the high dimensionality of the cloth model, as well as the limited observation of self-occluded regions. We leverage recent progress in single-view human body reconstruction to perform template-based reconstruction of crumpled cloths from their top-view depth observations only, using our proposed sim-real registration protocols. In contrast to previous implicit cloth representations, our reconstructed mesh explicitly indicates the positions and visibilities of all cloth mesh vertices, enabling more efficient dual-arm and single-arm target-oriented manipulation. Experiments demonstrate that our template-based reconstruction and target-oriented manipulation (TRTM) system can be applied to everyday cloths with topologies similar to our template mesh but with different shapes, sizes, patterns, and physical properties. Videos, datasets, pre-trained models, and code can be downloaded from our project website: //wenbwa.github.io/TRTM/.
The ability to understand the social interaction behaviors between a vehicle and its surroundings while predicting its trajectory in an urban environment is critical for road safety in autonomous driving. Social interactions are hard to explain because of their uncertainty. In recent years, neural network-based methods have been widely used for trajectory prediction and have been shown to outperform hand-crafted methods. However, these methods suffer from a lack of interpretability. To overcome this limitation, we combine the interpretability of a discrete choice model with the high accuracy of a neural network-based model for the task of vehicle trajectory prediction in an interactive environment. We implement and evaluate our model on the INTERACTION dataset and demonstrate the ability of our proposed architecture to explain its predictions without compromising accuracy.
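One generic way to combine a discrete choice model with a neural network is to give each candidate maneuver a utility composed of an interpretable linear term plus a learned correction, and pass the utilities through a softmax. The sketch below illustrates that pattern only; it is an assumption for exposition, not the paper's architecture, and the feature set and maneuver count are hypothetical.

```python
# Hedged sketch: hybrid discrete-choice / neural-network model where utilities
# mix interpretable per-feature weights with an MLP correction.
import torch
import torch.nn as nn

class HybridChoiceModel(nn.Module):
    def __init__(self, n_features, n_alternatives, hidden=64):
        super().__init__()
        # Interpretable part: one weight per hand-crafted feature and alternative.
        self.beta = nn.Parameter(torch.zeros(n_alternatives, n_features))
        # Learned correction from the same interaction features.
        self.mlp = nn.Sequential(nn.Linear(n_features, hidden), nn.ReLU(),
                                 nn.Linear(hidden, n_alternatives))

    def forward(self, x):                      # x: (batch, n_features)
        utility = x @ self.beta.T + self.mlp(x)
        return torch.softmax(utility, dim=-1)  # probability of each maneuver

probs = HybridChoiceModel(n_features=10, n_alternatives=3)(torch.randn(8, 10))
```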
Health outcomes, such as body mass index and cholesterol levels, are known to depend on age and to exhibit varying effects with their associated risk factors. In this paper, we propose a novel framework for dynamic modeling of the associations between health outcomes and risk factors using varying-coefficient (VC) regional quantile regression via K-nearest neighbors (KNN) fused Lasso, which captures the time-varying effects of age. The proposed method has strong theoretical properties, including a tight estimation error bound and the ability to detect exact clustered patterns under certain regularity conditions. To efficiently solve the resulting optimization problem, we develop an alternating direction method of multipliers (ADMM) algorithm. Our empirical results demonstrate the efficacy of the proposed method in capturing the complex age-dependent associations between health outcomes and their risk factors.
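To make the objective concrete, the following toy sketch fits per-subject varying coefficients with a quantile check loss plus an L1 fusion penalty over a KNN graph built on age. It solves the convex problem directly with cvxpy rather than the paper's ADMM algorithm, and the data, graph construction and tuning constants are illustrative assumptions.

```python
# Toy sketch: varying-coefficient quantile regression with a KNN fused-lasso
# penalty, solved generically with cvxpy (not the paper's ADMM solver).
import cvxpy as cp
import numpy as np
from sklearn.neighbors import NearestNeighbors

rng = np.random.default_rng(0)
n, p, tau, lam, k = 80, 2, 0.5, 1.0, 5
age = np.sort(rng.uniform(20, 80, n))
X = rng.normal(size=(n, p))
y = X[:, 0] * np.sin(age / 15) + rng.normal(scale=0.2, size=n)

# KNN graph over age: coefficients of subjects with similar ages are fused.
nbrs = NearestNeighbors(n_neighbors=k + 1).fit(age.reshape(-1, 1))
_, idx = nbrs.kneighbors(age.reshape(-1, 1))
edges = {(min(i, int(j)), max(i, int(j))) for i in range(n) for j in idx[i, 1:]}

B = cp.Variable((n, p))                          # one coefficient vector per subject
r = y - cp.sum(cp.multiply(X, B), axis=1)
check_loss = cp.sum(cp.maximum(tau * r, (tau - 1) * r))   # quantile check loss
fusion = sum(cp.norm1(B[i] - B[j]) for i, j in edges)     # fused-lasso penalty
cp.Problem(cp.Minimize(check_loss + lam * fusion)).solve()
```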
Gender disparities in health outcomes have garnered significant attention, prompting investigations into their underlying causes. Glioblastoma (GBM), a devastating and highly aggressive form of brain tumor, serves as a case in point for such inquiries. Despite the mounting evidence on gender disparities in GBM outcomes, investigations at the molecular level remain scarce and are often limited by confounding biases in observational studies. In this study, I aimed to investigate gender-related differences in GBM outcomes using propensity score matching (PSM) to control for potential confounding variables. The data were obtained from The Cancer Genome Atlas (TCGA), encompassing factors such as gender, age, molecular characteristics and different glioma grades. Propensity scores were calculated for each patient using logistic regression, representing the likelihood of being male based on the baseline characteristics. Subsequently, patients were matched using nearest-neighbor matching (with a restricted caliper) to create balanced male-female groups. After PSM, 303 male-female pairs were identified, with similar baseline characteristics in terms of age and molecular features. The analysis revealed a higher incidence of GBM in males than in females after adjusting for potential confounding factors. This study contributes to the discourse on gender equity in health, paving the way for targeted interventions and improved outcomes, and may guide efforts to improve gender-specific treatment strategies for GBM patients. However, further investigations and prospective studies are warranted to validate these findings and explore additional factors, beyond the molecular characteristics, that might contribute to the observed gender-based differences in GBM outcomes.
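The matching pipeline described here, logistic-regression propensity scores followed by 1:1 nearest-neighbor matching with a caliper, can be sketched in a few lines. The column names and the 0.2-standard-deviation caliper on the logit of the propensity score are common defaults assumed for illustration, not necessarily the exact choices made in the study.

```python
# Hedged sketch of propensity score matching: fit a propensity model, then match
# each treated (here: male) unit to its nearest control within a caliper.
import numpy as np
import pandas as pd
from sklearn.linear_model import LogisticRegression
from sklearn.neighbors import NearestNeighbors

def psm_pairs(df, treat_col, covariates, caliper_sd=0.2):
    X, t = df[covariates].to_numpy(), df[treat_col].to_numpy()
    ps = LogisticRegression(max_iter=1000).fit(X, t).predict_proba(X)[:, 1]
    logit = np.log(ps / (1 - ps))
    caliper = caliper_sd * logit.std()
    treated, control = np.where(t == 1)[0], np.where(t == 0)[0]
    nn = NearestNeighbors(n_neighbors=1).fit(logit[control].reshape(-1, 1))
    dist, idx = nn.kneighbors(logit[treated].reshape(-1, 1))
    keep = dist.ravel() <= caliper
    return list(zip(treated[keep], control[idx.ravel()[keep]]))

# Usage (hypothetical columns): pairs = psm_pairs(tcga, "male", ["age", "idh_mutant"])
```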
Promoting sustainable transportation options is increasingly crucial in the pursuit of environmentally friendly and efficient campus mobility systems. Among these options, bike-sharing programs have garnered substantial attention for their capacity to mitigate traffic congestion, decrease carbon emissions, and enhance overall campus sustainability. However, improper selection of bike-sharing sites has led to growing unsustainable practices on campus, including disorderly parking and indiscriminate placement of shared bikes. To this end, this paper proposes a novel sustainable development-oriented campus bike-sharing site evaluation model integrating the improved Delphi and fuzzy comprehensive evaluation approaches. Fourteen evaluation metrics are first selected from four dimensions: user features, implementation and usage characteristics of parking spots, environmental sustainability, and social sustainability, through the combination of expert experience and the improved Delphi method. Then, the analytic hierarchy process and the entropy weight method are employed to determine the weights of the evaluation indices, ensuring a robust and objective assessment framework. The fuzzy comprehensive evaluation method is finally implemented to evaluate the quality of location selection. The South Campus of Henan Polytechnic University is selected as a case study for the proposed evaluation system. This work contributes to the existing body of knowledge by presenting a comprehensive location selection evaluation system for campus bike-sharing, informed by the principles of sustainable development.
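Of the weighting techniques mentioned, the entropy weight method is straightforward to sketch: normalise each criterion column, compute its information entropy, and assign larger weights to criteria that differentiate the candidate sites more. The decision matrix below is illustrative only; in the paper these weights are further combined with AHP-derived weights before the fuzzy comprehensive evaluation.

```python
# Hedged sketch of the entropy weight method for a sites-by-criteria matrix.
import numpy as np

def entropy_weights(decision_matrix):
    """decision_matrix: (n_sites, n_criteria); larger values assumed better."""
    M = np.asarray(decision_matrix, dtype=float)
    P = M / M.sum(axis=0)                               # column-wise proportions
    k = 1.0 / np.log(M.shape[0])
    entropy = -k * np.sum(np.where(P > 0, P * np.log(P), 0.0), axis=0)
    diversity = 1.0 - entropy                           # degree of differentiation
    return diversity / diversity.sum()                  # normalised criterion weights

sites = np.array([[0.7, 120, 3],      # hypothetical scores for three candidate sites
                  [0.9,  80, 4],
                  [0.5, 150, 2]])
print(entropy_weights(sites))          # one weight per evaluation criterion
```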
In freeze drying, thermal radiation has a significant effect on the drying process of vials located near the corners and edges of the trays, resulting in non-uniformity of the products. Understanding and predicting the impact of thermal radiation are therefore critical to accurately determining the drying process endpoint, given the variation in heat transfer across vials. This article presents a new mechanistic model that describes complex thermal radiation during primary drying in conventional, microwave-assisted, and hybrid freeze drying. The thermal radiation model employs the diffuse gray surface model and the radiation network approach, which systematically and accurately incorporate simultaneous radiation exchange between every surface, including the chamber wall and vials, allowing the framework to be seamlessly applied to various freeze-dryer designs. Model validation with data from the literature shows accurate prediction of the drying times for all vials, including inner, edge, and corner vials. The validated model is demonstrated for thermal radiation analysis and parametric studies to guide the design and optimization of freeze dryers.
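The gray diffuse surface / radiation network idea reduces to a linear radiosity balance over an enclosure: given emissivities, temperatures, areas and view factors, solve for the radiosities and recover each surface's net radiative heat rate. The three-surface enclosure below is a hedged toy example, not a freeze-dryer geometry or the paper's full model.

```python
# Toy sketch of a gray diffuse enclosure: solve J = eps*Eb + (1 - eps) * F @ J,
# then compute the net radiative heat rate leaving each surface.
import numpy as np

SIGMA = 5.670e-8  # Stefan-Boltzmann constant, W m^-2 K^-4

def net_radiation(eps, T, A, F):
    """eps, T, A: per-surface emissivity, temperature (K), area (m^2);
    F: view-factor matrix with rows summing to 1 (closed enclosure)."""
    eps, T, A = map(np.asarray, (eps, T, A))
    Eb = SIGMA * T**4                                    # blackbody emissive power
    J = np.linalg.solve(np.eye(len(T)) - (1 - eps)[:, None] * F, eps * Eb)
    return A * np.sum(F * (J[:, None] - J[None, :]), axis=1)   # W leaving each surface

# Illustrative three-surface enclosure (view factors satisfy reciprocity).
eps = [0.9, 0.8, 0.3]; T = [300.0, 260.0, 230.0]; A = [0.05, 0.05, 0.4]
F = np.array([[0.0, 0.2, 0.8], [0.2, 0.0, 0.8], [0.1, 0.1, 0.8]])
print(net_radiation(eps, T, A, F))
```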
The growing energy and performance costs of deep learning have driven the community to reduce the size of neural networks by selectively pruning components. Similarly to their biological counterparts, sparse networks generalize just as well as, if not better than, the original dense networks. Sparsity can reduce the memory footprint of regular networks to fit mobile devices, as well as shorten training time for ever-growing networks. In this paper, we survey prior work on sparsity in deep learning and provide an extensive tutorial of sparsification for both inference and training. We describe approaches to remove and add elements of neural networks, different training strategies to achieve model sparsity, and mechanisms to exploit sparsity in practice. Our work distills ideas from more than 300 research papers and provides guidance to practitioners who wish to utilize sparsity today, as well as to researchers whose goal is to push the frontier forward. We include the necessary background on mathematical methods in sparsification, describe phenomena such as early structure adaptation and the intricate relations between sparsity and the training process, and show techniques for achieving acceleration on real hardware. We also define a metric of pruned parameter efficiency that could serve as a baseline for comparison of different sparse networks. We close by speculating on how sparsity can improve future workloads and outline major open problems in the field.
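As a concrete entry point to the pruning techniques the survey covers, the sketch below applies global magnitude (L1) pruning to a small MLP with PyTorch's built-in pruning utilities and reports the resulting sparsity. The model and the 90% pruning ratio are arbitrary choices for illustration.

```python
# Hedged sketch: global magnitude pruning of a small MLP, then a sparsity check.
import torch
import torch.nn as nn
import torch.nn.utils.prune as prune

model = nn.Sequential(nn.Linear(784, 300), nn.ReLU(), nn.Linear(300, 10))
to_prune = [(m, "weight") for m in model if isinstance(m, nn.Linear)]

# Remove the 90% of weights with the smallest magnitude, ranked across all layers.
prune.global_unstructured(to_prune, pruning_method=prune.L1Unstructured, amount=0.9)

zeros = sum(int((m.weight == 0).sum()) for m, _ in to_prune)
total = sum(m.weight.numel() for m, _ in to_prune)
print(f"global sparsity: {zeros / total:.1%}")
# In practice pruning is interleaved with (re)training so accuracy can recover.
```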
Although measuring held-out accuracy has been the primary approach to evaluate generalization, it often overestimates the performance of NLP models, while alternative approaches for evaluating models either focus on individual tasks or on specific behaviors. Inspired by principles of behavioral testing in software engineering, we introduce CheckList, a task-agnostic methodology for testing NLP models. CheckList includes a matrix of general linguistic capabilities and test types that facilitate comprehensive test ideation, as well as a software tool to generate a large and diverse number of test cases quickly. We illustrate the utility of CheckList with tests for three tasks, identifying critical failures in both commercial and state-of-the-art models. In a user study, a team responsible for a commercial sentiment analysis model found new and actionable bugs in an extensively tested model. In another user study, NLP practitioners with CheckList created twice as many tests and found almost three times as many bugs as users without it.
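To illustrate the template-driven test generation the methodology relies on, here is a hedged sketch of a CheckList-style Minimum Functionality Test for sentiment analysis: fill a template from small lexicons, run the model under test on every case, and report the failure rate. It mirrors the idea but does not use the released CheckList library's API; `predict_sentiment`, the template, and the lexicons are stand-ins.

```python
# Minimal sketch of a template-based Minimum Functionality Test (MFT).
from itertools import product

TEMPLATE = "The {noun} was {adj}."
NOUNS = ["food", "service", "flight", "staff"]
NEG_ADJS = ["terrible", "awful", "disappointing"]

def mft_negative_sentiment(predict_sentiment):
    """predict_sentiment: callable mapping a sentence to 'pos'/'neg'/'neutral'."""
    cases = [TEMPLATE.format(noun=n, adj=a) for n, a in product(NOUNS, NEG_ADJS)]
    failures = [c for c in cases if predict_sentiment(c) != "neg"]
    return len(failures) / len(cases), failures

# Usage: fail_rate, examples = mft_negative_sentiment(my_model)
```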