Arthroplasty is commonly performed to treat joint osteoarthritis, reducing pain and improving mobility. While arthroplasty has known several technical improvements, a significant share of patients are still unsatisfied with their surgery. Personalised arthroplasty improves surgical outcomes however current solutions require delays, making it difficult to integrate in clinical routine. We propose a fully automated workflow to design patient-specific implants, presented for total knee arthroplasty, the most widely performed arthroplasty in the world nowadays. The proposed pipeline first uses artificial neural networks to segment the proximal and distal extremities of the femur and tibia. Then the full bones are reconstructed using augmented statistical shape models, combining shape and landmarks information. Finally, 77 morphological parameters are computed to design patient-specific implants. The developed workflow has been trained using 91 CT scans of lower limb and evaluated on 41 CT scans manually segmented, in terms of accuracy and execution time. The workflow accuracy was $0.4\pm0.2mm$ for the segmentation, $1.2\pm0.4mm$ for the full bones reconstruction, and $2.8\pm2.2mm$ for the anatomical landmarks determination. The custom implants fitted the patients' anatomy with $0.6\pm0.2mm$ accuracy. The whole process from segmentation to implants' design lasted about 5 minutes. The proposed workflow allows for a fast and reliable personalisation of knee implants, directly from the patient CT image without requiring any manual intervention. It establishes a patient-specific pre-operative planning for TKA in a very short time making it easily available for all patients. Combined with efficient implant manufacturing techniques, this solution could help answer the growing number of arthroplasties while reducing complications and improving the patients' satisfaction.
Publication bias (PB) is one of the serious issues in meta-analysis. Many existing methods dealing with PB are based on the normal-normal (NN) random-effects model assuming normal models in both the within-study and the between-study levels. For rare-event meta-analysis where the data contain rare occurrences of event, the standard NN random-effects model may perform poorly. Instead, the generalized linear mixed effects model (GLMM) using the exact within-study model is recommended. However, no method has been proposed for dealing with PB in rare-event meta-analysis using the GLMM. In this paper, we propose sensitivity analysis methods for evaluating the impact of PB on the GLMM based on the famous Copas-Heckman-type selection model. The proposed methods can be easily implemented with the standard software coring the nonlinear mixed-effects model. We use a real-world example to show how the usefulness of the proposed methods in evaluating the potential impact of PB in meta-analysis of the log-transformed odds ratio based on the GLMM using the non-central hypergeometric or binomial distribution as the within-study model. An extension of the proposed method is also introduced for evaluating PB in meta-analysis of proportion based on the GLMM with the binomial within-study model.
The rapid advancement in artificial intelligence (AI), particularly through deep neural networks, has catalyzed significant progress in fields such as vision and text processing. Nonetheless, the pursuit of AI systems that exhibit human-like reasoning and interpretability continues to pose a substantial challenge. The Neural-Symbolic paradigm, which integrates the deep learning prowess of neural networks with the reasoning capabilities of symbolic systems, presents a promising pathway toward developing more transparent and comprehensible AI systems. Within this paradigm, the Knowledge Graph (KG) emerges as a crucial element, offering a structured and dynamic method for representing knowledge through interconnected entities and relationships, predominantly utilizing the triple (subject, predicate, object). This paper explores recent advancements in neural-symbolic integration based on KG, elucidating how KG underpins this integration across three key categories: enhancing the reasoning and interpretability of neural networks through the incorporation of symbolic knowledge (Symbol for Neural), refining the completeness and accuracy of symbolic systems via neural network methodologies (Neural for Symbol), and facilitating their combined application in Hybrid Neural-Symbolic Integration. It highlights current trends and proposes directions for future research in the domain of Neural-Symbolic AI.
The unique solvability and error analysis of the original Lagrange multiplier approach proposed in [8] for gradient flows is studied in this paper. We identify a necessary and sufficient condition that must be satisfied for the nonlinear algebraic equation arising from the original Lagrange multiplier approach to admit a unique solution in the neighborhood of its exact solution, and propose a modified Lagrange multiplier approach so that the computation can continue even if the aforementioned condition is not satisfied. Using Cahn-Hilliard equation as an example, we prove rigorously the unique solvability and establish optimal error estimates of a second-order Lagrange multiplier scheme assuming this condition and that the time step is sufficient small. We also present numerical results to demonstrate that the modified Lagrange multiplier approach is much more robust and can use much larger time step than the original Lagrange multiplier approach.
The evaluation of text-generative vision-language models is a challenging yet crucial endeavor. By addressing the limitations of existing Visual Question Answering (VQA) benchmarks and proposing innovative evaluation methodologies, our research seeks to advance our understanding of these models' capabilities. We propose a novel VQA benchmark based on well-known visual classification datasets which allows a granular evaluation of text-generative vision-language models and their comparison with discriminative vision-language models. To improve the assessment of coarse answers on fine-grained classification tasks, we suggest using the semantic hierarchy of the label space to ask automatically generated follow-up questions about the ground-truth category. Finally, we compare traditional NLP and LLM-based metrics for the problem of evaluating model predictions given ground-truth answers. We perform a human evaluation study upon which we base our decision on the final metric. We apply our benchmark to a suite of vision-language models and show a detailed comparison of their abilities on object, action, and attribute classification. Our contributions aim to lay the foundation for more precise and meaningful assessments, facilitating targeted progress in the exciting field of vision-language modeling.
Rectified Linear Units (ReLU) have become the main model for the neural units in current deep learning systems. This choice has been originally suggested as a way to compensate for the so called vanishing gradient problem which can undercut stochastic gradient descent (SGD) learning in networks composed of multiple layers. Here we provide analytical results on the effects of ReLUs on the capacity and on the geometrical landscape of the solution space in two-layer neural networks with either binary or real-valued weights. We study the problem of storing an extensive number of random patterns and find that, quite unexpectedly, the capacity of the network remains finite as the number of neurons in the hidden layer increases, at odds with the case of threshold units in which the capacity diverges. Possibly more important, a large deviation approach allows us to find that the geometrical landscape of the solution space has a peculiar structure: while the majority of solutions are close in distance but still isolated, there exist rare regions of solutions which are much more dense than the similar ones in the case of threshold units. These solutions are robust to perturbations of the weights and can tolerate large perturbations of the inputs. The analytical results are corroborated by numerical findings.
Designing protein nanomaterials of predefined shape and characteristics has the potential to dramatically impact the medical industry. Machine learning (ML) has proven successful in protein design, reducing the need for expensive wet lab experiment rounds. However, challenges persist in efficiently exploring the protein fitness landscapes to identify optimal protein designs. In response, we propose the use of AlphaZero to generate protein backbones, meeting shape and structural scoring requirements. We extend an existing Monte Carlo tree search (MCTS) framework by incorporating a novel threshold-based reward and secondary objectives to improve design precision. This innovation considerably outperforms existing approaches, leading to protein backbones that better respect structural scores. The application of AlphaZero is novel in the context of protein backbone design and demonstrates promising performance. AlphaZero consistently surpasses baseline MCTS by more than 100% in top-down protein design tasks. Additionally, our application of AlphaZero with secondary objectives uncovers further promising outcomes, indicating the potential of model-based reinforcement learning (RL) in navigating the intricate and nuanced aspects of protein design
While analogies are a common way to evaluate word embeddings in NLP, it is also of interest to investigate whether or not analogical reasoning is a task in itself that can be learned. In this paper, we test several ways to learn basic analogical reasoning, specifically focusing on analogies that are more typical of what is used to evaluate analogical reasoning in humans than those in commonly used NLP benchmarks. Our experiments find that models are able to learn analogical reasoning, even with a small amount of data. We additionally compare our models to a dataset with a human baseline, and find that after training, models approach human performance.
Prompt engineering is crucial for harnessing the potential of large language models (LLMs), especially in the medical domain where specialized terminology and phrasing is used. However, the efficacy of prompt engineering in the medical domain remains to be explored. In this work, 114 recent studies (2022-2024) applying prompt engineering in medicine, covering prompt learning (PL), prompt tuning (PT), and prompt design (PD) are reviewed. PD is the most prevalent (78 articles). In 12 papers, PD, PL, and PT terms were used interchangeably. ChatGPT is the most commonly used LLM, with seven papers using it for processing sensitive clinical data. Chain-of-Thought emerges as the most common prompt engineering technique. While PL and PT articles typically provide a baseline for evaluating prompt-based approaches, 64% of PD studies lack non-prompt-related baselines. We provide tables and figures summarizing existing work, and reporting recommendations to guide future research contributions.
Linkage methods are among the most popular algorithms for hierarchical clustering. Despite their relevance the current knowledge regarding the quality of the clustering produced by these methods is limited. Here, we improve the currently available bounds on the maximum diameter of the clustering obtained by complete-link for metric spaces. One of our new bounds, in contrast to the existing ones, allows us to separate complete-link from single-link in terms of approximation for the diameter, which corroborates the common perception that the former is more suitable than the latter when the goal is producing compact clusters. We also show that our techniques can be employed to derive upper bounds on the cohesion of a class of linkage methods that includes the quite popular average-link.
Recent advances in 3D fully convolutional networks (FCN) have made it feasible to produce dense voxel-wise predictions of volumetric images. In this work, we show that a multi-class 3D FCN trained on manually labeled CT scans of several anatomical structures (ranging from the large organs to thin vessels) can achieve competitive segmentation results, while avoiding the need for handcrafting features or training class-specific models. To this end, we propose a two-stage, coarse-to-fine approach that will first use a 3D FCN to roughly define a candidate region, which will then be used as input to a second 3D FCN. This reduces the number of voxels the second FCN has to classify to ~10% and allows it to focus on more detailed segmentation of the organs and vessels. We utilize training and validation sets consisting of 331 clinical CT images and test our models on a completely unseen data collection acquired at a different hospital that includes 150 CT scans, targeting three anatomical organs (liver, spleen, and pancreas). In challenging organs such as the pancreas, our cascaded approach improves the mean Dice score from 68.5 to 82.2%, achieving the highest reported average score on this dataset. We compare with a 2D FCN method on a separate dataset of 240 CT scans with 18 classes and achieve a significantly higher performance in small organs and vessels. Furthermore, we explore fine-tuning our models to different datasets. Our experiments illustrate the promise and robustness of current 3D FCN based semantic segmentation of medical images, achieving state-of-the-art results. Our code and trained models are available for download: //github.com/holgerroth/3Dunet_abdomen_cascade.