Despite advancements in medical care, hip fractures impose a significant burden on individuals and healthcare systems. This paper focuses on the prediction of hip fracture risk in older and middle-aged adults, where falls and compromised bone quality are predominant factors. We propose a novel staged model that combines advanced imaging and clinical data to improve predictive performance. By using CNNs to extract features from hip DXA images, along with clinical variables, shape measurements, and texture features, our method provides a comprehensive framework for assessing fracture risk. A staged machine learning-based model was developed using two ensemble models: Ensemble 1 (clinical variables only) and Ensemble 2 (clinical variables and DXA imaging features). This staged approach used uncertainty quantification from Ensemble 1 to decide if DXA features are necessary for further prediction. Ensemble 2 exhibited the highest performance, achieving an AUC of 0.9541, an accuracy of 0.9195, a sensitivity of 0.8078, and a specificity of 0.9427. The staged model also performed well, with an AUC of 0.8486, an accuracy of 0.8611, a sensitivity of 0.5578, and a specificity of 0.9249, outperforming Ensemble 1, which had an AUC of 0.5549, an accuracy of 0.7239, a sensitivity of 0.1956, and a specificity of 0.8343. Furthermore, the staged model suggested that 54.49% of patients did not require DXA scanning. It effectively balanced accuracy and specificity, offering a robust solution when DXA data acquisition is not always feasible. Statistical tests confirmed significant differences between the models, highlighting the advantages of the advanced modeling strategies. Our staged approach could identify individuals at risk with a high accuracy but reduce the unnecessary DXA scanning. It has great promise to guide interventions to prevent hip fractures with reduced cost and radiation.
Large Language Models (LLMs) show promise for natural language generation in healthcare, but risk hallucinating factually incorrect information. Deploying LLMs for medical question answering necessitates reliable uncertainty estimation (UE) methods to detect hallucinations. In this work, we benchmark popular UE methods with different model sizes on medical question-answering datasets. Our results show that current approaches generally perform poorly in this domain, highlighting the challenge of UE for medical applications. We also observe that larger models tend to yield better results, suggesting a correlation between model size and the reliability of UE. To address these challenges, we propose Two-phase Verification, a probability-free Uncertainty Estimation approach. First, an LLM generates a step-by-step explanation alongside its initial answer, followed by formulating verification questions to check the factual claims in the explanation. The model then answers these questions twice: first independently, and then referencing the explanation. Inconsistencies between the two sets of answers measure the uncertainty in the original response. We evaluate our approach on three biomedical question-answering datasets using Llama 2 Chat models and compare it against the benchmarked baseline methods. The results show that our Two-phase Verification method achieves the best overall accuracy and stability across various datasets and model sizes, and its performance scales as the model size increases.
Deep Vein Thrombosis (DVT) is a common yet potentially fatal condition, often leading to critical complications like pulmonary embolism. DVT is commonly diagnosed using Ultrasound (US) imaging, which can be inconsistent due to its high dependence on the operator's skill. Robotic US Systems (RUSs) aim to improve diagnostic test consistency but face challenges with the complex scanning pattern needed for DVT assessment, where precise control over US probe pressure is crucial for indirectly detecting occlusions. This work introduces an imitation learning method, based on Kernelized Movement Primitives (KMP), to standardize DVT US exams by training an autonomous robotic controller using sonographer demonstrations. A new recording device design enhances demonstration ergonomics, integrating with US probes and enabling seamless force and position data recording. KMPs are used to capture scanning skills, linking scan trajectory and force, enabling generalization beyond the demonstrations. Our approach, evaluated on synthetic models and volunteers, shows that the KMP-based RUS can replicate an expert's force control and image quality in DVT US examination. It outperforms previous methods using manually defined force profiles, improving exam standardization and reducing reliance on specialized sonographers.
Efficient finetuning of vision-language models (VLMs) like CLIP for specific downstream tasks is gaining significant attention. Previous works primarily focus on prompt learning to adapt the CLIP into a variety of downstream tasks, however, suffering from task overfitting when finetuned on a small data set. In this paper, we introduce an orthogonal finetuning method for efficiently updating pretrained weights which enhances robustness and generalization, while a cross-regularization strategy is further exploited to maintain the stability in terms of zero-shot generalization of VLMs, dubbed \textbf{\textit{OrthCR}}. Specifically, trainable orthogonal matrices are injected seamlessly into the transformer architecture and enforced with orthogonality constraint using Cayley parameterization, benefiting from the norm-preserving property and thus leading to stable and faster convergence. To alleviate deviation from orthogonal constraint during training, a cross-regularization strategy is further employed with initial pretrained weights within a bypass manner. In addition, to enrich the sample diversity for downstream tasks, we first explore Cutout data augmentation to boost the efficient finetuning and comprehend how our approach improves the specific downstream performance and maintains the generalizability in the perspective of Orthogonality Learning. Beyond existing prompt learning techniques, we conduct extensive experiments to demonstrate that our method explicitly steers pretrained weight space to represent the task-specific knowledge and presents competitive generalizability under \textit{base-to-base/base-to-new}, \textit{cross-dataset transfer} and \textit{domain generalization} evaluations.
The estimation and management of motor temperature are important for the continuous movements of robots. In this study, we propose an online learning method of thermal model parameters of motors for an accurate estimation of motor core temperature. Also, we propose a management method of motor core temperature using the updated model and anomaly detection method of motors. Finally, we apply this method to the muscles of the musculoskeletal humanoid and verify the ability of continuous movements.
Cybersecurity issues in medical devices threaten patient safety and can cause harm if exploited. Standards and regulations therefore require vendors of such devices to provide an assessment of the cybersecurity risks as well as a description of their mitigation. Security assurance cases (SACs) capture these elements as a structured argument. Compiling an SAC requires taking domain-specific regulations and requirements as well as the way of working into account. In this case study, we evaluate CASCADE, an approach for building SAC in the context of a large medical device manufacturer with an established agile development workflow. We investigate the regulatory context as well as the adaptations needed in the development process. Our results show the suitability of SACs in the medical device industry. We identified 17 use cases in which an SAC supports internal and external needs. The connection to safety assurance can be achieved by incorporating information from the risk assessment matrix into the SAC. Integration into the development process can be achieved by introducing a new role and rules for the design review and the release to production as well as additional criteria for the definition of done. We also show that SACs built with CASCADE fulfill the requirements of relevant standards in the medical domain such as ISO 14971.
Context: To effectively defend against ever-evolving cybersecurity threats, software systems should be made as secure as possible. To achieve this, software developers should understand potential vulnerabilities and apply secure coding practices. To prepare these skilled professionals, it is important that cybersecurity concepts are included in programming courses taught at universities. Objective: To present a comprehensive and unbiased literature review on teaching of cybersecurity concepts in programming courses taught at universities. Method: We perform a Systematic Mapping Study. We present six research questions, define our selection criteria, and develop a classification scheme. Results and Conclusions: We select 24 publications. Our results show a wide range of research contributions. We also outline guidelines and identify opportunities for future studies. The guidelines include coverage of security knowledge categories and evaluation of contributions. We suggest that future studies should cover security issues, negative impacts, and countermeasures, as well as apply evaluation techniques that examine students' knowledge. The opportunities for future studies are related to advanced courses, security knowledge frameworks, and programming environments. Furthermore, there is a need of a holistic security framework that covers the security concepts identified in this study and is suitable for education.
This work identifies 18 foundational challenges in assuring the alignment and safety of large language models (LLMs). These challenges are organized into three different categories: scientific understanding of LLMs, development and deployment methods, and sociotechnical challenges. Based on the identified challenges, we pose $200+$ concrete research questions.
The dominating NLP paradigm of training a strong neural predictor to perform one task on a specific dataset has led to state-of-the-art performance in a variety of applications (eg. sentiment classification, span-prediction based question answering or machine translation). However, it builds upon the assumption that the data distribution is stationary, ie. that the data is sampled from a fixed distribution both at training and test time. This way of training is inconsistent with how we as humans are able to learn from and operate within a constantly changing stream of information. Moreover, it is ill-adapted to real-world use cases where the data distribution is expected to shift over the course of a model's lifetime. The first goal of this thesis is to characterize the different forms this shift can take in the context of natural language processing, and propose benchmarks and evaluation metrics to measure its effect on current deep learning architectures. We then proceed to take steps to mitigate the effect of distributional shift on NLP models. To this end, we develop methods based on parametric reformulations of the distributionally robust optimization framework. Empirically, we demonstrate that these approaches yield more robust models as demonstrated on a selection of realistic problems. In the third and final part of this thesis, we explore ways of efficiently adapting existing models to new domains or tasks. Our contribution to this topic takes inspiration from information geometry to derive a new gradient update rule which alleviate catastrophic forgetting issues during adaptation.
Deep neural networks (DNNs) are successful in many computer vision tasks. However, the most accurate DNNs require millions of parameters and operations, making them energy, computation and memory intensive. This impedes the deployment of large DNNs in low-power devices with limited compute resources. Recent research improves DNN models by reducing the memory requirement, energy consumption, and number of operations without significantly decreasing the accuracy. This paper surveys the progress of low-power deep learning and computer vision, specifically in regards to inference, and discusses the methods for compacting and accelerating DNN models. The techniques can be divided into four major categories: (1) parameter quantization and pruning, (2) compressed convolutional filters and matrix factorization, (3) network architecture search, and (4) knowledge distillation. We analyze the accuracy, advantages, disadvantages, and potential solutions to the problems with the techniques in each category. We also discuss new evaluation metrics as a guideline for future research.
Deep convolutional neural networks (CNNs) have recently achieved great success in many visual recognition tasks. However, existing deep neural network models are computationally expensive and memory intensive, hindering their deployment in devices with low memory resources or in applications with strict latency requirements. Therefore, a natural thought is to perform model compression and acceleration in deep networks without significantly decreasing the model performance. During the past few years, tremendous progress has been made in this area. In this paper, we survey the recent advanced techniques for compacting and accelerating CNNs model developed. These techniques are roughly categorized into four schemes: parameter pruning and sharing, low-rank factorization, transferred/compact convolutional filters, and knowledge distillation. Methods of parameter pruning and sharing will be described at the beginning, after that the other techniques will be introduced. For each scheme, we provide insightful analysis regarding the performance, related applications, advantages, and drawbacks etc. Then we will go through a few very recent additional successful methods, for example, dynamic capacity networks and stochastic depths networks. After that, we survey the evaluation matrix, the main datasets used for evaluating the model performance and recent benchmarking efforts. Finally, we conclude this paper, discuss remaining challenges and possible directions on this topic.