In contrast to 2-D ultrasound (US), which images a single plane, a 3-D US imaging system can visualize a volume along three axial planes. This allows a full view of the anatomy, which is useful for gynecological (GYN) and obstetrical (OB) applications. Unfortunately, 3-D US has an inherent resolution limitation compared to 2-D US. With a mechanical 3-D probe, for example, image quality is comparable to 2-D US along the beam direction, but significant deterioration is often observed in the other two axial planes. To address this, we propose a novel unsupervised deep learning approach to improve 3-D US image quality. In particular, using {\em unmatched} high-quality 2-D US images as a reference, we train a recently proposed switchable CycleGAN architecture so that every mapping plane in 3-D US can learn the image quality of 2-D US images. Thanks to the switchable architecture, our network also provides real-time control of the enhancement level based on user preference, which is ideal for a user-centric scanner setup. Extensive experiments with clinical evaluation confirm that our method offers significantly improved image quality as well as user-friendly flexibility.
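To make the "switchable" idea concrete, the sketch below shows one way a generator layer can expose a continuous enhancement knob through adaptive instance normalization (AdaIN), a common conditioning mechanism for this kind of controllable translation. This is a minimal illustrative sketch, not the authors' implementation; the two-style interpolation and the `alpha` parameterization are assumptions.

```python
import torch
import torch.nn as nn

class SwitchableAdaIN(nn.Module):
    """Instance norm whose affine parameters are blended by a user knob alpha."""
    def __init__(self, channels):
        super().__init__()
        self.norm = nn.InstanceNorm2d(channels, affine=False)
        # Two learned styles: alpha=0 -> weak enhancement, alpha=1 -> full enhancement.
        self.gamma0 = nn.Parameter(torch.ones(1, channels, 1, 1))
        self.beta0 = nn.Parameter(torch.zeros(1, channels, 1, 1))
        self.gamma1 = nn.Parameter(torch.ones(1, channels, 1, 1))
        self.beta1 = nn.Parameter(torch.zeros(1, channels, 1, 1))

    def forward(self, x, alpha):
        # Linearly interpolate the affine statistics by alpha at inference time.
        gamma = (1 - alpha) * self.gamma0 + alpha * self.gamma1
        beta = (1 - alpha) * self.beta0 + alpha * self.beta1
        return gamma * self.norm(x) + beta
```

At inference time the same trained generator is simply re-run with different `alpha` values, which is what allows the enhancement strength to be varied in real time without retraining.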
Biomechanical and clinical gait research observes muscles and tendons in limbs to study their function and behaviour, so the movements of distinct anatomical landmarks, such as muscle-tendon junctions, are frequently measured. We propose a reliable and time-efficient deep-learning approach to track these junctions in ultrasound videos and support clinical biomechanists in gait analysis. For network training we gathered an extensive dataset covering 3 functional movements and 2 muscles, collected from 123 healthy and 38 impaired subjects with 3 different ultrasound systems, providing a total of 66,864 annotated ultrasound images. Furthermore, we used data collected across independent laboratories and curated by researchers with varying levels of experience. To evaluate our method, a diverse test set was selected and independently verified by four specialists. We show that our model achieves performance scores similar to those of the four human specialists in identifying the muscle-tendon junction position. Our method provides time-efficient tracking of muscle-tendon junctions, with prediction times of at most 0.078 seconds per frame (approx. 100 times faster than manual labeling). All our code, trained models, and the test set are publicly available, and our model is provided as a free-to-use online service at //deepmtj.org/.
Semi-supervised learning (SSL), which aims to leverage a few labeled images and a large number of unlabeled images for network training, is beneficial for relieving the burden of data annotation in medical image segmentation. In the experience of medical imaging experts, local attributes such as texture, luster, and smoothness are very important factors for identifying target objects like lesions and polyps in medical images. Motivated by this, we propose a cross-level contrastive learning scheme to enhance the representation capacity for local features in semi-supervised medical image segmentation. Compared to existing image-wise, patch-wise, and point-wise contrastive learning algorithms, our method is capable of exploring more complex similarity cues, namely the relational characteristics between global point-wise and local patch-wise representations. Additionally, to fully exploit cross-level semantic relations, we devise a novel consistency constraint that compares the predictions of patches against those of the full image. With the help of the cross-level contrastive learning and consistency constraint, unlabeled data can be effectively exploited to improve segmentation performance on two medical image datasets for polyp and skin lesion segmentation, respectively. The code of our approach is available.
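A minimal sketch of what a cross-level (point-vs-patch) contrastive term could look like, assuming InfoNCE-style training in PyTorch; the pooling window, anchor sampling, and temperature `tau` are illustrative choices, not the authors' exact formulation.

```python
import torch
import torch.nn.functional as F

def cross_level_info_nce(point_feats, patch_size=4, tau=0.1):
    """Contrast each point feature (anchor) with the pooled feature of its
    enclosing patch (positive); pooled features of other patches act as
    negatives. point_feats: (B, C, H, W) dense encoder features, with H and W
    divisible by patch_size."""
    B, C, H, W = point_feats.shape
    # Local patch-level representation via average pooling.
    patch_feats = F.avg_pool2d(point_feats, patch_size)            # (B, C, h, w)
    # One anchor point per patch: the patch-center pixel.
    centers = point_feats[:, :, patch_size // 2::patch_size,
                                patch_size // 2::patch_size]       # (B, C, h, w)
    q = F.normalize(centers.flatten(2).transpose(1, 2), dim=-1)    # (B, N, C)
    k = F.normalize(patch_feats.flatten(2).transpose(1, 2), dim=-1)
    logits = torch.bmm(q, k.transpose(1, 2)) / tau                 # (B, N, N)
    labels = torch.arange(q.size(1), device=q.device).expand(B, -1)
    return F.cross_entropy(logits.flatten(0, 1), labels.flatten())
```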
Inspired by the success of BERT, several multimodal representation learning approaches have been proposed that jointly represent image and text. These approaches achieve superior performance by capturing high-level semantic information from large-scale multimodal pretraining. In particular, LXMERT and UNITER adopt visual region feature regression and label classification as pretext tasks. However, they tend to suffer from noisy labels and sparse semantic annotations, because their visual features are pretrained on a crowdsourced dataset with limited and inconsistent semantic labeling. To overcome these issues, we propose unbiased Dense Contrastive Visual-Linguistic Pretraining (DCVLP), which replaces region regression and classification with cross-modality region contrastive learning that requires no annotations. Two data augmentation strategies (Mask Perturbation and Intra-/Inter-Adversarial Perturbation) are developed to improve the quality of negative samples used in contrastive learning. Overall, DCVLP enables cross-modality dense region contrastive learning in a self-supervised setting, independent of any object annotations. We compare our method against prior visual-linguistic pretraining frameworks to validate the superiority of dense contrastive learning for multimodal representation learning.
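As a rough sketch of region-level contrastive learning with mask perturbation, the PyTorch snippet below builds an augmented "key" view of region features and applies an InfoNCE loss; the masking ratio and loss layout are assumptions, not the DCVLP implementation.

```python
import torch
import torch.nn.functional as F

def mask_perturbation(region_feats, mask_ratio=0.15):
    """Randomly zero out a fraction of region features to create an augmented
    key view for contrastive learning (hypothetical parameterization).
    region_feats: (B, R, C) region features."""
    B, R, _ = region_feats.shape
    keep = (torch.rand(B, R, 1, device=region_feats.device) > mask_ratio).float()
    return region_feats * keep

def region_contrastive_loss(query, key, tau=0.07):
    """InfoNCE over regions: each region's augmented view is its positive;
    all other regions in the batch serve as negatives."""
    q = F.normalize(query.flatten(0, 1), dim=-1)   # (B*R, C)
    k = F.normalize(key.flatten(0, 1), dim=-1)     # (B*R, C)
    logits = q @ k.t() / tau
    labels = torch.arange(q.size(0), device=q.device)
    return F.cross_entropy(logits, labels)
```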
Unpaired image-to-image translation has been applied successfully to natural images but has received very little attention for manifold-valued data such as diffusion tensor imaging (DTI). The non-Euclidean nature of DTI prevents current generative adversarial networks (GANs) from generating plausible images and has mainly limited their application to diffusion MRI scalar maps, such as fractional anisotropy (FA) or mean diffusivity (MD). Although these scalar maps are clinically useful, they mostly ignore fiber orientations and therefore have limited use for analyzing brain fibers. Here, we propose a manifold-aware CycleGAN that learns to generate high-resolution DTI from unpaired T1w images. We formulate the objective as a Wasserstein distance minimization problem between data distributions on the Riemannian manifold of symmetric positive definite 3x3 matrices, SPD(3), using adversarial and cycle-consistency losses. To ensure that the generated diffusion tensors lie on the SPD(3) manifold, we exploit the theoretical properties of the exponential and logarithm maps of the Log-Euclidean metric. We demonstrate that, unlike standard GANs, our method can generate realistic high-resolution DTI that can be used to compute diffusion-based metrics and potentially to run fiber tractography algorithms. To evaluate our model's performance, we compute the cosine similarity between the generated tensors' principal orientations and their ground-truth orientations, the mean squared error (MSE) of the derived FA values, and the Log-Euclidean distance between the tensors. We show that our method achieves a 2.5 times lower FA MSE than a standard CycleGAN and up to 30% better cosine similarity than a manifold-aware Wasserstein GAN, while synthesizing sharp high-resolution DTI.
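The Log-Euclidean machinery mentioned above reduces to matrix logarithm and exponential computed via eigendecomposition. A minimal PyTorch sketch (the function names are ours):

```python
import torch

def spd_log(S):
    """Matrix logarithm of SPD tensors via eigendecomposition.
    S: (..., 3, 3) symmetric positive definite."""
    w, V = torch.linalg.eigh(S)
    return V @ torch.diag_embed(torch.log(w)) @ V.transpose(-1, -2)

def spd_exp(L):
    """Matrix exponential of symmetric tensors (inverse of spd_log)."""
    w, V = torch.linalg.eigh(L)
    return V @ torch.diag_embed(torch.exp(w)) @ V.transpose(-1, -2)

def log_euclidean_distance(S1, S2):
    """Log-Euclidean distance: Frobenius norm of the difference of matrix logs."""
    return torch.linalg.matrix_norm(spd_log(S1) - spd_log(S2))
```

Working in log-space, where symmetric matrices form an ordinary vector space, and mapping network outputs back with `spd_exp` guarantees that generated tensors remain on SPD(3).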
Deep learning has become the most widely used approach for cardiac image segmentation in recent years. In this paper, we provide a review of over 100 cardiac image segmentation papers using deep learning, covering common imaging modalities including magnetic resonance imaging (MRI), computed tomography (CT), and ultrasound (US), and the major anatomical structures of interest (ventricles, atria, and vessels). In addition, a summary of publicly available cardiac image datasets and code repositories is included to provide a basis for encouraging reproducible research. Finally, we discuss the challenges and limitations of current deep learning-based approaches (scarcity of labels, model generalizability across different domains, interpretability) and suggest potential directions for future research.
In this work, we study the problem of training deep networks for semantic image segmentation using only a fraction of annotated images, which can significantly reduce human annotation effort. In particular, we propose a strategy that exploits the unpaired image style-transfer capabilities of CycleGAN for semi-supervised segmentation. Unlike recent works using adversarial learning for semi-supervised segmentation, we enforce cycle consistency to learn a bidirectional mapping between unpaired images and segmentation masks. This adds an unsupervised regularization effect that boosts segmentation performance when annotated data is limited. Experiments on three public segmentation benchmarks (PASCAL VOC 2012, Cityscapes, and ACDC) demonstrate the effectiveness of the proposed method. The proposed model achieves a 2-4% improvement over the baseline and outperforms recent approaches for this task, particularly in the low-labeled-data regime.
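A minimal sketch of the bidirectional cycle-consistency idea between unpaired images and masks, assuming PyTorch generators `G_img2seg` and `G_seg2img` (hypothetical interfaces); the specific loss choices (cross-entropy for masks, L1 for images) are illustrative:

```python
import torch.nn.functional as F

def cycle_losses(G_img2seg, G_seg2img, image, onehot_mask):
    """Bidirectional cycle consistency between unpaired images and masks.
    G_img2seg outputs per-class logits; G_seg2img outputs an image."""
    # Image -> predicted mask -> reconstructed image.
    fake_mask_logits = G_img2seg(image)
    rec_image = G_seg2img(fake_mask_logits.softmax(dim=1))
    loss_img_cycle = F.l1_loss(rec_image, image)

    # Mask -> generated image -> reconstructed mask.
    fake_image = G_seg2img(onehot_mask)
    rec_mask_logits = G_img2seg(fake_image)
    loss_mask_cycle = F.cross_entropy(rec_mask_logits, onehot_mask.argmax(dim=1))
    return loss_img_cycle + loss_mask_cycle
```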
Existing image inpainting methods typically fill holes by borrowing information from surrounding image regions. They often produce unsatisfactory results when the holes overlap with or touch foreground objects due to lack of information about the actual extent of foreground and background regions within the holes. These scenarios, however, are very important in practice, especially for applications such as distracting object removal. To address the problem, we propose a foreground-aware image inpainting system that explicitly disentangles structure inference and content completion. Specifically, our model learns to predict the foreground contour first, and then inpaints the missing region using the predicted contour as guidance. We show that by this disentanglement, the contour completion model predicts reasonable contours of objects, and further substantially improves the performance of image inpainting. Experiments show that our method significantly outperforms existing methods and achieves superior inpainting results on challenging cases with complex compositions.
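The disentangled two-stage inference could look roughly like the following PyTorch sketch; `contour_net` and `inpaint_net` are hypothetical stand-ins for the paper's two models:

```python
import torch

def foreground_aware_inpaint(contour_net, inpaint_net, image, hole_mask):
    """Two-stage inference: predict foreground contours inside the hole first,
    then complete content conditioned on the predicted contour.
    hole_mask is 1 inside the hole, 0 elsewhere."""
    masked = image * (1 - hole_mask)
    # Stage 1: structure inference -- a soft contour map over the hole.
    contour = contour_net(torch.cat([masked, hole_mask], dim=1))
    # Stage 2: content completion guided by the predicted contour.
    filled = inpaint_net(torch.cat([masked, hole_mask, contour], dim=1))
    # Keep known pixels untouched; only hole pixels come from the generator.
    return image * (1 - hole_mask) + filled * hole_mask
```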
Deep neural network models used for medical image segmentation are large because they are trained on high-resolution three-dimensional (3D) images. Graphics processing units (GPUs) are widely used to accelerate training; however, GPU memory is often not large enough to train such models. A popular way to tackle this problem is the patch-based method, which divides a large image into small patches and trains the model on them. However, this method can degrade segmentation quality if a target object spans multiple patches. In this paper, we propose a novel approach for 3D medical image segmentation that utilizes data swapping, which swaps intermediate data out of GPU memory into CPU memory to enlarge the effective GPU memory size, enabling training on high-resolution 3D medical images without patching. We carefully tuned the parameters of the data-swapping method to obtain the best training performance for 3D U-Net, a widely used deep neural network model for medical image segmentation. We applied our tuning to train 3D U-Net on full-size images of 192 x 192 x 192 voxels from a brain tumor dataset. As a result, communication overhead, the most important issue, was reduced by 17.1%. Compared with the patch-based method using patches of 128 x 128 x 128 voxels, our training on full-size images improved the mean Dice score by 4.48% and 5.32% for detecting the whole-tumor and tumor-core sub-regions, respectively. The total training time was reduced from 164 hours to 47 hours, a 3.53x acceleration.
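The paper tunes a dedicated data-swapping runtime; a conceptually similar effect can be sketched in stock PyTorch with `torch.autograd.graph.save_on_cpu`, which offloads activations saved for backward to (pinned) CPU memory during the forward pass. `UNet3D`, `dice_loss`, and `loader` below are hypothetical placeholders:

```python
import torch

model = UNet3D().cuda()                  # hypothetical 3D U-Net definition
opt = torch.optim.Adam(model.parameters())

for volume, label in loader:             # full 192^3 volumes, no patching
    volume, label = volume.cuda(), label.cuda()
    # Store activations in pinned CPU memory on the forward pass and stream
    # them back during backward, trading PCIe bandwidth for GPU memory.
    with torch.autograd.graph.save_on_cpu(pin_memory=True):
        loss = dice_loss(model(volume), label)   # hypothetical loss helper
    loss.backward()
    opt.step()
    opt.zero_grad()
```

Pinned memory matters here because it enables asynchronous GPU-CPU transfers, which is exactly the communication overhead the paper reports tuning down by 17.1%.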
Tumor detection in biomedical imaging is a time-consuming process for medical professionals and is not without errors. Thus, in recent decades, researchers have developed algorithmic techniques for image processing using a wide variety of mathematical methods, such as statistical modeling, variational techniques, and machine learning. In this paper, we propose a semi-automatic, graph-cut-based method for segmenting 2D CT scans of the liver into three labels denoting healthy, vessel, or tumor tissue. First, we create a novel feature vector for each pixel consisting of the 59 intensity values in the time-series data, and propose a simplified perimeter cost term in the energy functional. We normalize the data and perimeter terms in the functional to expedite the graph cut without having to optimize the scaling parameter $\lambda$. In place of a training process, predetermined tissue means are computed from sample regions identified by expert radiologists. The proposed method also has the advantage of being relatively simple to implement computationally. It was evaluated against the ground truth on a clinical CT dataset of 10 tumors and yielded segmentations with a mean Dice similarity coefficient (DSC) of 0.77 and a mean volume overlap error (VOE) of 36.7%. The average processing time was 1.25 minutes per slice.
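For a two-label version of the energy, a graph-cut sketch with PyMaxflow might look as follows; the exact data term and the alpha-expansion step needed for three labels are omitted, and `mu_obj`/`mu_bkg` stand in for the predetermined tissue means:

```python
import numpy as np
import maxflow  # PyMaxflow

def binary_graph_cut(img, mu_obj, mu_bkg, lam=1.0):
    """Minimal two-label graph cut: normalized data term from the distance to
    predetermined tissue means, plus a 4-connected perimeter (smoothness)
    term weighted by lambda."""
    g = maxflow.Graph[float]()
    nodes = g.add_grid_nodes(img.shape)
    g.add_grid_edges(nodes, weights=lam)              # perimeter/smoothness term
    d_obj = np.abs(img - mu_obj)
    d_bkg = np.abs(img - mu_bkg)
    # Normalize data terms so lambda does not need per-image retuning.
    scale = max(d_obj.max(), d_bkg.max(), 1e-8)
    g.add_grid_tedges(nodes, d_obj / scale, d_bkg / scale)
    g.maxflow()
    return g.get_grid_segments(nodes)                 # boolean label map
```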
We present a new method for synthesizing high-resolution photo-realistic images from semantic label maps using conditional generative adversarial networks (conditional GANs). Conditional GANs have enabled a variety of applications, but their results are often limited to low resolution and still far from realistic. In this work, we generate visually appealing results at 2048x1024 resolution with a novel adversarial loss, as well as new multi-scale generator and discriminator architectures. Furthermore, we extend our framework to interactive visual manipulation with two additional features. First, we incorporate object instance segmentation information, which enables object manipulations such as removing/adding objects and changing the object category. Second, we propose a method to generate diverse results given the same input, allowing users to edit the object appearance interactively. Human opinion studies demonstrate that our method significantly outperforms existing methods, advancing both the quality and the resolution of deep image synthesis and editing.
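A rough sketch of the multi-scale discriminator loss, assuming one PatchGAN-style discriminator per pyramid level; the hinge loss below is illustrative (the loss form is an assumption, not the paper's exact objective):

```python
import torch.nn.functional as F

def multiscale_d_loss(discriminators, real, fake):
    """Sum discriminator losses over an image pyramid: one discriminator per
    scale, each seeing a 2x-downsampled copy of the previous scale."""
    fake = fake.detach()                  # discriminator update only
    loss = 0.0
    for i, D in enumerate(discriminators):
        loss = loss + F.relu(1.0 - D(real)).mean() + F.relu(1.0 + D(fake)).mean()
        if i + 1 < len(discriminators):   # halve resolution for the next scale
            real = F.avg_pool2d(real, kernel_size=3, stride=2, padding=1)
            fake = F.avg_pool2d(fake, kernel_size=3, stride=2, padding=1)
    return loss
```

Coarse scales push the generator toward globally consistent layouts, while the finest scale enforces local detail, which is what makes 2048x1024 synthesis tractable.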