In this paper, we target self-supervised representation learning for zero-shot tumor segmentation. We make the following contributions: First, we advocate a zero-shot setting, where models from pre-training should be directly applicable for the downstream task, without using any manual annotations. Second, we take inspiration from "layer-decomposition", and innovate on the training regime with simulated tumor data. Third, we conduct extensive ablation studies to analyse the critical components in data simulation, and validate the necessity of different proxy tasks. We demonstrate that, with sufficient texture randomization in simulation, model trained on synthetic data can effortlessly generalise to segment real tumor data. Forth, our approach achieves superior results for zero-shot tumor segmentation on different downstream datasets, BraTS2018 for brain tumor segmentation and LiTS2017 for liver tumor segmentation. While evaluating the model transferability for tumor segmentation under a low-annotation regime, the proposed approach also outperforms all existing self-supervised approaches, opening up the usage of self-supervised learning in practical scenarios.
We present a novel approach to unsupervised learning for video object segmentation (VOS). Unlike previous work, our formulation allows to learn dense feature representations directly in a fully convolutional regime. We rely on uniform grid sampling to extract a set of anchors and train our model to disambiguate between them on both inter- and intra-video levels. However, a naive scheme to train such a model results in a degenerate solution. We propose to prevent this with a simple regularisation scheme, accommodating the equivariance property of the segmentation task to similarity transformations. Our training objective admits efficient implementation and exhibits fast training convergence. On established VOS benchmarks, our approach exceeds the segmentation accuracy of previous work despite using significantly less training data and compute power.
While self-training has advanced semi-supervised semantic segmentation, it severely suffers from the long-tailed class distribution on real-world semantic segmentation datasets that make the pseudo-labeled data bias toward majority classes. In this paper, we present a simple and yet effective Distribution Alignment and Random Sampling (DARS) method to produce unbiased pseudo labels that match the true class distribution estimated from the labeled data. Besides, we also contribute a progressive data augmentation and labeling strategy to facilitate model training with pseudo-labeled data. Experiments on both Cityscapes and PASCAL VOC 2012 datasets demonstrate the effectiveness of our approach. Albeit simple, our method performs favorably in comparison with state-of-the-art approaches. Code will be available at //github.com/CVMI-Lab/DARS.
The usage of smartphone-collected respiratory sound, trained with deep learning models, for detecting and classifying COVID-19 becomes popular recently. It removes the need for in-person testing procedures especially for rural regions where related medical supplies, experienced workers, and equipment are limited. However, existing sound-based diagnostic approaches are trained in a fully supervised manner, which requires large scale well-labelled data. It is critical to discover new methods to leverage unlabelled respiratory data, which can be obtained more easily. In this paper, we propose a novel self-supervised learning enabled framework for COVID-19 cough classification. A contrastive pre-training phase is introduced to train a Transformer-based feature encoder with unlabelled data. Specifically, we design a random masking mechanism to learn robust representations of respiratory sounds. The pre-trained feature encoder is then fine-tuned in the downstream phase to perform cough classification. In addition, different ensembles with varied random masking rates are also explored in the downstream phase. Through extensive evaluations, we demonstrate that the proposed contrastive pre-training, the random masking mechanism, and the ensemble architecture contribute to improving cough classification performance.
Recently, contrastive learning (CL) has emerged as a successful method for unsupervised graph representation learning. Most graph CL methods first perform stochastic augmentation on the input graph to obtain two graph views and maximize the agreement of representations in the two views. Despite the prosperous development of graph CL methods, the design of graph augmentation schemes -- a crucial component in CL -- remains rarely explored. We argue that the data augmentation schemes should preserve intrinsic structures and attributes of graphs, which will force the model to learn representations that are insensitive to perturbation on unimportant nodes and edges. However, most existing methods adopt uniform data augmentation schemes, like uniformly dropping edges and uniformly shuffling features, leading to suboptimal performance. In this paper, we propose a novel graph contrastive representation learning method with adaptive augmentation that incorporates various priors for topological and semantic aspects of the graph. Specifically, on the topology level, we design augmentation schemes based on node centrality measures to highlight important connective structures. On the node attribute level, we corrupt node features by adding more noise to unimportant node features, to enforce the model to recognize underlying semantic information. We perform extensive experiments of node classification on a variety of real-world datasets. Experimental results demonstrate that our proposed method consistently outperforms existing state-of-the-art baselines and even surpasses some supervised counterparts, which validates the effectiveness of the proposed contrastive framework with adaptive augmentation.
Deep learning-based semi-supervised learning (SSL) algorithms have led to promising results in medical images segmentation and can alleviate doctors' expensive annotations by leveraging unlabeled data. However, most of the existing SSL algorithms in literature tend to regularize the model training by perturbing networks and/or data. Observing that multi/dual-task learning attends to various levels of information which have inherent prediction perturbation, we ask the question in this work: can we explicitly build task-level regularization rather than implicitly constructing networks- and/or data-level perturbation-and-transformation for SSL? To answer this question, we propose a novel dual-task-consistency semi-supervised framework for the first time. Concretely, we use a dual-task deep network that jointly predicts a pixel-wise segmentation map and a geometry-aware level set representation of the target. The level set representation is converted to an approximated segmentation map through a differentiable task transform layer. Simultaneously, we introduce a dual-task consistency regularization between the level set-derived segmentation maps and directly predicted segmentation maps for both labeled and unlabeled data. Extensive experiments on two public datasets show that our method can largely improve the performance by incorporating the unlabeled data. Meanwhile, our framework outperforms the state-of-the-art semi-supervised medical image segmentation methods. Code is available at: //github.com/Luoxd1996/DTC
Few-shot image classification aims to classify unseen classes with limited labeled samples. Recent works benefit from the meta-learning process with episodic tasks and can fast adapt to class from training to testing. Due to the limited number of samples for each task, the initial embedding network for meta learning becomes an essential component and can largely affects the performance in practice. To this end, many pre-trained methods have been proposed, and most of them are trained in supervised way with limited transfer ability for unseen classes. In this paper, we proposed to train a more generalized embedding network with self-supervised learning (SSL) which can provide slow and robust representation for downstream tasks by learning from the data itself. We evaluate our work by extensive comparisons with previous baseline methods on two few-shot classification datasets ({\em i.e.,} MiniImageNet and CUB). Based on the evaluation results, the proposed method achieves significantly better performance, i.e., improve 1-shot and 5-shot tasks by nearly \textbf{3\%} and \textbf{4\%} on MiniImageNet, by nearly \textbf{9\%} and \textbf{3\%} on CUB. Moreover, the proposed method can gain the improvement of (\textbf{15\%}, \textbf{13\%}) on MiniImageNet and (\textbf{15\%}, \textbf{8\%}) on CUB by pretraining using more unlabeled data. Our code will be available at \hyperref[//github.com/phecy/SSL-FEW-SHOT.]{//github.com/phecy/ssl-few-shot.}
In this paper, we aim to improve the performance of semantic image segmentation in a semi-supervised setting in which training is effectuated with a reduced set of annotated images and additional non-annotated images. We present a method based on an ensemble of deep segmentation models. Each model is trained on a subset of the annotated data, and uses the non-annotated images to exchange information with the other models, similar to co-training. Even if each model learns on the same non-annotated images, diversity is preserved with the use of adversarial samples. Our results show that this ability to simultaneously train models, which exchange knowledge while preserving diversity, leads to state-of-the-art results on two challenging medical image datasets.
In this work, we study the problem of training deep networks for semantic image segmentation using only a fraction of annotated images, which may significantly reduce human annotation efforts. Particularly, we propose a strategy that exploits the unpaired image style transfer capabilities of CycleGAN in semi-supervised segmentation. Unlike recent works using adversarial learning for semi-supervised segmentation, we enforce cycle consistency to learn a bidirectional mapping between unpaired images and segmentation masks. This adds an unsupervised regularization effect that boosts the segmentation performance when annotated data is limited. Experiments on three different public segmentation benchmarks (PASCAL VOC 2012, Cityscapes and ACDC) demonstrate the effectiveness of the proposed method. The proposed model achieves 2-4% of improvement with respect to the baseline and outperforms recent approaches for this task, particularly in low labeled data regime.
Biomedical image segmentation is an important task in many medical applications. Segmentation methods based on convolutional neural networks attain state-of-the-art accuracy; however, they typically rely on supervised training with large labeled datasets. Labeling datasets of medical images requires significant expertise and time, and is infeasible at large scales. To tackle the lack of labeled data, researchers use techniques such as hand-engineered preprocessing steps, hand-tuned architectures, and data augmentation. However, these techniques involve costly engineering efforts, and are typically dataset-specific. We present an automated data augmentation method for medical images. We demonstrate our method on the task of segmenting magnetic resonance imaging (MRI) brain scans, focusing on the one-shot segmentation scenario -- a practical challenge in many medical applications. Our method requires only a single segmented scan, and leverages other unlabeled scans in a semi-supervised approach. We learn a model of transforms from the images, and use the model along with the labeled example to synthesize additional labeled training examples for supervised segmentation. Each transform is comprised of a spatial deformation field and an intensity change, enabling the synthesis of complex effects such as variations in anatomy and image acquisition procedures. Augmenting the training of a supervised segmenter with these new examples provides significant improvements over state-of-the-art methods for one-shot biomedical image segmentation. Our code is available at //github.com/xamyzhao/brainstorm.
We propose an Active Learning approach to image segmentation that exploits geometric priors to streamline the annotation process. We demonstrate this for both background-foreground and multi-class segmentation tasks in 2D images and 3D image volumes. Our approach combines geometric smoothness priors in the image space with more traditional uncertainty measures to estimate which pixels or voxels are most in need of annotation. For multi-class settings, we additionally introduce two novel criteria for uncertainty. In the 3D case, we use the resulting uncertainty measure to show the annotator voxels lying on the same planar patch, which makes batch annotation much easier than if they were randomly distributed in the volume. The planar patch is found using a branch-and-bound algorithm that finds a patch with the most informative instances. We evaluate our approach on Electron Microscopy and Magnetic Resonance image volumes, as well as on regular images of horses and faces. We demonstrate a substantial performance increase over state-of-the-art approaches.