Metal artifact correction is a challenging problem in cone beam computed tomography (CBCT) scanning. Metal implants inserted into the anatomy cause severe artifacts in reconstructed images. Widely used inpainting-based metal artifact reduction (MAR) methods require segmentation of metal traces in the projections as a first step, which is a challenging task. One approach is to use a deep learning method to segment metals in the projections. However, the success of deep learning methods is limited by the availability of realistic training data. It is laborious and time consuming to get reliable ground truth annotations due to unclear implant boundaries and large numbers of projections. We propose to use X-ray simulations to generate synthetic metal segmentation training dataset from clinical CBCT scans. We compare the effect of simulations with different numbers of photons and also compare several training strategies to augment the available data. We compare our model's performance on real clinical scans with conventional region growing threshold-based MAR, moving metal artifact reduction method, and a recent deep learning method. We show that simulations with relatively small number of photons are suitable for the metal segmentation task and that training the deep learning model with full size and cropped projections together improves the robustness of the model. We show substantial improvement in the image quality affected by severe motion, voxel size under-sampling, and out-of-FOV metals. Our method can be easily integrated into the existing projection-based MAR pipeline to get improved image quality. This method can provide a novel paradigm to accurately segment metals in CBCT projections.
Sequential neural posterior estimation (SNPE) techniques have been recently proposed for dealing with simulation-based models with intractable likelihoods. Unlike approximate Bayesian computation, SNPE techniques learn the posterior from sequential simulation using neural network-based conditional density estimators by minimizing a specific loss function. The SNPE method proposed by Lueckmann et al. (2017) used a calibration kernel to boost the sample weights around the observed data, resulting in a concentrated loss function. However, the use of calibration kernels may increase the variances of both the empirical loss and its gradient, making the training inefficient. To improve the stability of SNPE, this paper proposes to use an adaptive calibration kernel and several variance reduction techniques. The proposed method greatly speeds up the process of training, and provides a better approximation of the posterior than the original SNPE method and some existing competitors as confirmed by numerical experiments.
Appendicitis is among the most frequent reasons for pediatric abdominal surgeries. Previous decision support systems for appendicitis have focused on clinical, laboratory, scoring, and computed tomography data and have ignored abdominal ultrasound, despite its noninvasive nature and widespread availability. In this work, we present interpretable machine learning models for predicting the diagnosis, management and severity of suspected appendicitis using ultrasound images. Our approach utilizes concept bottleneck models (CBM) that facilitate interpretation and interaction with high-level concepts understandable to clinicians. Furthermore, we extend CBMs to prediction problems with multiple views and incomplete concept sets. Our models were trained on a dataset comprising 579 pediatric patients with 1709 ultrasound images accompanied by clinical and laboratory data. Results show that our proposed method enables clinicians to utilize a human-understandable and intervenable predictive model without compromising performance or requiring time-consuming image annotation when deployed. For predicting the diagnosis, the extended multiview CBM attained an AUROC of 0.80 and an AUPR of 0.92, performing comparably to similar black-box neural networks trained and tested on the same dataset.
We resurrect the infamous harmonic mean estimator for computing the marginal likelihood (Bayesian evidence) and solve its problematic large variance. The marginal likelihood is a key component of Bayesian model selection to evaluate model posterior probabilities; however, its computation is challenging. The original harmonic mean estimator, first proposed by Newton and Raftery in 1994, involves computing the harmonic mean of the likelihood given samples from the posterior. It was immediately realised that the original estimator can fail catastrophically since its variance can become very large (possibly not finite). A number of variants of the harmonic mean estimator have been proposed to address this issue although none have proven fully satisfactory. We present the \emph{learnt harmonic mean estimator}, a variant of the original estimator that solves its large variance problem. This is achieved by interpreting the harmonic mean estimator as importance sampling and introducing a new target distribution. The new target distribution is learned to approximate the optimal but inaccessible target, while minimising the variance of the resulting estimator. Since the estimator requires samples of the posterior only, it is agnostic to the sampling strategy used. We validate the estimator on a variety of numerical experiments, including a number of pathological examples where the original harmonic mean estimator fails catastrophically. We also consider a cosmological application, where our approach leads to $\sim$ 3 to 6 times more samples than current state-of-the-art techniques in 1/3 of the time. In all cases our learnt harmonic mean estimator is shown to be highly accurate. The estimator is computationally scalable and can be applied to problems of dimension $O(10^3)$ and beyond. Code implementing the learnt harmonic mean estimator is made publicly available
Previous researchers conducting Just-In-Time (JIT) defect prediction tasks have primarily focused on the performance of individual pre-trained models, without exploring the relationship between different pre-trained models as backbones. In this study, we build six models: RoBERTaJIT, CodeBERTJIT, BARTJIT, PLBARTJIT, GPT2JIT, and CodeGPTJIT, each with a distinct pre-trained model as its backbone. We systematically explore the differences and connections between these models. Specifically, we investigate the performance of the models when using Commit code and Commit message as inputs, as well as the relationship between training efficiency and model distribution among these six models. Additionally, we conduct an ablation experiment to explore the sensitivity of each model to inputs. Furthermore, we investigate how the models perform in zero-shot and few-shot scenarios. Our findings indicate that each model based on different backbones shows improvements, and when the backbone's pre-training model is similar, the training resources that need to be consumed are much more closer. We also observe that Commit code plays a significant role in defect detection, and different pre-trained models demonstrate better defect detection ability with a balanced dataset under few-shot scenarios. These results provide new insights for optimizing JIT defect prediction tasks using pre-trained models and highlight the factors that require more attention when constructing such models. Additionally, CodeGPTJIT and GPT2JIT achieved better performance than DeepJIT and CC2Vec on the two datasets respectively under 2000 training samples. These findings emphasize the effectiveness of transformer-based pre-trained models in JIT defect prediction tasks, especially in scenarios with limited training data.
We propose and compare methods for the analysis of extreme events in complex systems governed by PDEs that involve random parameters, in situations where we are interested in quantifying the probability that a scalar function of the system's solution is above a threshold. If the threshold is large, this probability is small and its accurate estimation is challenging. To tackle this difficulty, we blend theoretical results from large deviation theory (LDT) with numerical tools from PDE-constrained optimization. Our methods first compute parameters that minimize the LDT-rate function over the set of parameters leading to extreme events, using adjoint methods to compute the gradient of this rate function. The minimizers give information about the mechanism of the extreme events as well as estimates of their probability. We then propose a series of methods to refine these estimates, either via importance sampling or geometric approximation of the extreme event sets. Results are formulated for general parameter distributions and detailed expressions are provided when Gaussian distributions. We give theoretical and numerical arguments showing that the performance of our methods is insensitive to the extremeness of the events we are interested in. We illustrate the application of our approach to quantify the probability of extreme tsunami events on shore. Tsunamis are typically caused by a sudden, unpredictable change of the ocean floor elevation during an earthquake. We model this change as a random process, which takes into account the underlying physics. We use the one-dimensional shallow water equation to model tsunamis numerically. In the context of this example, we present a comparison of our methods for extreme event probability estimation, and find which type of ocean floor elevation change leads to the largest tsunamis on shore.
Mediation analysis assesses the extent to which the exposure affects the outcome indirectly through a mediator and the extent to which it operates directly through other pathways. As the most popular method in empirical mediation analysis, the Baron-Kenny approach estimates the indirect and direct effects of the exposure on the outcome based on linear structural equation models. However, when the exposure and the mediator are not randomized, the estimates may be biased due to unmeasured confounding among the exposure, mediator, and outcome. Building on Cinelli and Hazlett (2020), we derive general omitted-variable bias formulas in linear regressions with vector responses and regressors. We then use the formulas to develop a sensitivity analysis method for the Baron-Kenny approach to mediation in the presence of unmeasured confounding. To ensure interpretability, we express the sensitivity parameters to correspond to the natural factorization of the joint distribution of the direct acyclic graph for mediation analysis. They measure the partial correlation between the unmeasured confounder and the exposure, mediator, outcome, respectively. With the sensitivity parameters, we propose a novel measure called the "robustness value for mediation" or simply the "robustness value", to assess the robustness of results based on the Baron-Kenny approach with respect to unmeasured confounding. Intuitively, the robustness value measures the minimum value of the maximum proportion of variability explained by the unmeasured confounding, for the exposure, mediator and outcome, to overturn the results of the point estimate or confidence interval for the direct and indirect effects. Importantly, we prove that all our sensitivity bounds are attainable and thus sharp.
Magnetic Resonance Imaging (MRI) is a non-invasive diagnostic and radiotherapy (RT) planning tool, offering detailed insights into the anatomy of the human body. The extensive scan time is stressful for patients, who must remain motionless in a prolonged imaging procedure that prioritizes reduction of imaging artifacts. This is challenging for pediatric patients who may require measures for managing voluntary motions such as anesthesia. Several computational approaches reduce scan time (fast MRI), by recording fewer measurements and digitally recovering full information via post-acquisition reconstruction. However, most fast MRI approaches were developed for diagnostic imaging, without addressing reconstruction challenges specific to RT planning. In this work, we developed a deep learning-based method (DeepMRIRec) for MRI reconstruction from undersampled data acquired with RT-specific receiver coil arrangements. We evaluated our method against fully sampled data of T1-weighted MR images acquired from 73 children with brain tumors/surgical beds using loop and posterior coils (12 channels), with and without applying virtual compression of coil elements. DeepMRIRec reduced scanning time by a factor of four producing a structural similarity score surpassing the evaluated state-of-the-art method (0.960 vs 0.896), thereby demonstrating its potential for accelerating MRI scanning for RT planning.
As a surrogate for computationally intensive meso-scale simulation of woven composites, this article presents Recurrent Neural Network (RNN) models. Leveraging the power of transfer learning, the initialization challenges and sparse data issues inherent in cyclic shear strain loads are addressed in the RNN models. A mean-field model generates a comprehensive data set representing elasto-plastic behavior. In simulations, arbitrary six-dimensional strain histories are used to predict stresses under random walking as the source task and cyclic loading conditions as the target task. Incorporating sub-scale properties enhances RNN versatility. In order to achieve accurate predictions, the model uses a grid search method to tune network architecture and hyper-parameter configurations. The results of this study demonstrate that transfer learning can be used to effectively adapt the RNN to varying strain conditions, which establishes its potential as a useful tool for modeling path-dependent responses in woven composites.
Artificial intelligence has made significant progress in the Close World problem, being able to accurately recognize old knowledge through training and classification. However, AI faces significant challenges in the Open World problem, as it involves a new and unknown exploration journey. AI is not inherently proactive in exploration, and its challenge lies in not knowing how to approach and adapt to the unknown world. How do humans acquire knowledge of the unknown world. Humans identify new knowledge through intrinsic cognition. In the process of recognizing new colors, the cognitive cues are different from known color features and involve hue, saturation, brightness, and other characteristics. When AI encounters objects with different features in the new world, it faces another challenge: where are the distinguishing features between influential features of new and old objects? AI often mistakes a new world's brown bear for a known dog because it has not learned the differences in feature distributions between knowledge systems. This is because things in the new and old worlds have different units and dimensions for their features. This paper proposes an open-world model and elemental feature system that focuses on fundamentally recognizing the distribution differences in objective features between the new and old worlds. The quantum tunneling effect of learning ability in the new and old worlds is realized through the tractive force of meta-characteristic. The outstanding performance of the model system in learning new knowledge (using pedestrian re-identification datasets as an example) demonstrates that AI has acquired the ability to recognize the new world with an accuracy of $96.71\%$ at most and has gained the capability to explore new knowledge, similar to humans.
We present ResMLP, an architecture built entirely upon multi-layer perceptrons for image classification. It is a simple residual network that alternates (i) a linear layer in which image patches interact, independently and identically across channels, and (ii) a two-layer feed-forward network in which channels interact independently per patch. When trained with a modern training strategy using heavy data-augmentation and optionally distillation, it attains surprisingly good accuracy/complexity trade-offs on ImageNet. We will share our code based on the Timm library and pre-trained models.