Vision transformer-based methods are advancing the field of medical artificial intelligence and cancer imaging, including lung cancer applications. Recently, many researchers have developed vision transformer-based AI methods for lung cancer diagnosis and prognosis. This scoping review identifies recent developments in vision transformer-based AI methods for lung cancer imaging applications. It provides key insights into how vision transformers have complemented the performance of AI and deep learning methods for lung cancer, and it also identifies the datasets that have contributed to advancing the field. Of the 314 retrieved studies, this review included 34 studies published from 2020 to 2022. The most commonly addressed task in these studies was the classification of lung cancer types, such as lung squamous cell carcinoma versus lung adenocarcinoma, and the identification of benign versus malignant pulmonary nodules. Other applications included survival prediction of lung cancer patients and lung segmentation. The studies lacked clear strategies for clinical translation. The Swin Transformer was a popular choice among researchers; however, many other architectures were also reported in which vision transformers were combined with convolutional neural networks or the UNet model. We conclude that vision transformer-based models are increasing in popularity for developing AI methods for lung cancer applications; however, their computational complexity and clinical relevance are important factors to consider in future research. This review provides valuable insights for researchers in AI and healthcare seeking to advance the state of the art in lung cancer diagnosis and prognosis. We provide an interactive dashboard on lung-cancer.onrender.com/.
Estimating the generalization error (GE) of machine learning models is fundamental, with resampling methods being the most common approach. However, in non-standard settings, particularly those where observations are not independently and identically distributed, resampling using simple random data divisions may lead to biased GE estimates. This paper strives to present well-grounded guidelines for GE estimation in various such non-standard settings: clustered data, spatial data, unequal sampling probabilities, concept drift, and hierarchically structured outcomes. Our overview combines well-established methodologies with other existing methods that, to our knowledge, have not been frequently considered in these particular settings. A unifying principle among these techniques is that the test data used in each iteration of the resampling procedure should reflect the new observations to which the model will be applied, while the training data should be representative of the entire data set used to obtain the final model. Beyond providing an overview, we address literature gaps by conducting simulation studies. These studies assess the necessity of using GE-estimation methods tailored to the respective setting. Our findings corroborate the concern that standard resampling methods often yield biased GE estimates in non-standard settings, underscoring the importance of tailored GE estimation.
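As one concrete instance of such a tailored method, the sketch below uses grouped cross-validation so that entire clusters are held out together; the data, model, and cluster structure are illustrative placeholders, not those of our simulation studies.

```python
# A grouped cross-validation sketch for clustered data: GroupKFold holds
# out whole clusters, so test folds mimic new, unseen clusters.
# X, y, and the cluster labels are synthetic placeholders.
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import GroupKFold, cross_val_score

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 5))
y = rng.integers(0, 2, size=200)
groups = np.repeat(np.arange(20), 10)  # 20 clusters of 10 observations each

scores = cross_val_score(
    RandomForestClassifier(random_state=0),
    X, y,
    groups=groups,
    cv=GroupKFold(n_splits=5),  # no cluster is split across train and test
)
print(f"Estimated GE (1 - accuracy): {1 - scores.mean():.3f}")
```

A simple random split would instead place observations from the same cluster in both training and test folds, typically producing an optimistically biased GE estimate.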
Parametric reduced-order modelling often serves as a surrogate method for hemodynamics simulations to improve computational efficiency in many-query scenarios or to perform real-time simulations. However, the snapshots for the method must be collected from the same discretisation, which is straightforward for physical parameters but becomes challenging for geometrical problems, especially for domains featuring unparameterised and unique shapes, e.g. patient-specific geometries. In this work, a data-driven surrogate model is proposed for the efficient prediction of blood flow simulations on similar but distinct domains. The proposed surrogate model leverages group surface registration to parameterise those shapes and formulates the corresponding hemodynamics information into geometry-informed snapshots via the diffeomorphisms constructed between a reference domain and the original domains. A non-intrusive reduced-order model for geometrical parameters is subsequently constructed using proper orthogonal decomposition, and a radial basis function interpolator is trained to predict the reduced coefficients of the reduced-order model from the compressed geometrical parameters of the shape. Two examples, blood flow through a stenosis and through a bifurcation, are presented and analysed. The proposed surrogate model demonstrates accuracy and efficiency in hemodynamics prediction and shows its potential for application to real-time simulation and uncertainty quantification in complex patient-specific scenarios.
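The sketch below illustrates, under simplifying assumptions, the non-intrusive offline/online pipeline just described: proper orthogonal decomposition via a truncated SVD of a snapshot matrix, followed by a radial basis function interpolator mapping geometrical parameters to reduced coefficients. The synthetic snapshot matrix, parameter values, and truncation rank are placeholders, not outputs of the registration procedure.

```python
# Non-intrusive ROM sketch: POD (truncated SVD) + RBF interpolation.
import numpy as np
from scipy.interpolate import RBFInterpolator

rng = np.random.default_rng(0)
n_dof, n_snap = 1000, 30
params = rng.uniform(size=(n_snap, 3))    # compressed geometrical parameters
S = rng.normal(size=(n_dof, n_snap))      # snapshot matrix (columns = flow fields)

# Offline: POD basis from the SVD, truncated to r modes.
U, s, _ = np.linalg.svd(S, full_matrices=False)
r = 5
basis = U[:, :r]                          # (n_dof, r)
coeffs = basis.T @ S                      # reduced coefficients per snapshot

# Offline: RBF map from parameters to reduced coefficients.
rbf = RBFInterpolator(params, coeffs.T)

# Online: cheap prediction for a new geometry's parameters.
new_param = rng.uniform(size=(1, 3))
field_pred = basis @ rbf(new_param).T     # reconstructed full-order field
```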
This paper proposes the innovative concept of "human factors science" to characterize engineering psychology, human factors engineering, human-computer interaction, and other similar fields. Although the perspectives in these fields differ, they share a common approach: "human-centered design." In the AI era, the human-machine relationship is undergoing a trans-era evolution towards "human-AI teaming." This change raises challenges for human factors science, compelling us to re-examine current research paradigms and agendas. Based on our previous work, this paper proposes three research paradigms: (1) human-AI joint cognitive systems: an intelligent agent is regarded as a cognitive agent with a certain level of cognitive capability, so that a human-AI system can be characterized as a joint cognitive system in which humans and intelligent agents work as teammates; (2) human-AI joint cognitive ecosystems: an intelligent ecosystem comprising multiple human-AI systems can be represented as a human-AI joint cognitive ecosystem, whose overall performance depends on optimal collaboration and design across the multiple human-AI systems; (3) intelligent sociotechnical systems (iSTS): human-AI systems are designed, developed, and deployed in an iSTS environment, and their successful design, development, and deployment depend on the synergistic optimization between the subsystems. The paper then looks forward to the future research agenda of human factors science from three aspects: human-AI interaction, intelligent human-machine interfaces, and human-AI teaming. Our analyses show that the three new research paradigms will benefit future research in human factors science. We believe the proposed research paradigms and the future research agenda will mutually promote each other, further advancing human factors science in the AI era.
Spatiotemporal dynamic medical imaging is critical in clinical applications such as tomographic imaging of the heart or lung. Such spatiotemporal imaging problems are essentially time-dependent dynamic inverse problems. To address them, a variational model with intensity, edge-feature and topology preservation was proposed for joint image reconstruction and motion estimation in the previous paper [C. Chen, B. Gris, and O. \"Oktem, SIAM J. Imaging Sci., 12 (2019), pp. 1686--1719]; the model is suited to inverting time-dependent sparsely sampled data for motion targets undergoing large diffeomorphic deformations. However, the existence of a solution to the model has not yet been established. In order to preserve the topological structure and edge features of the motion target, the unknown velocity field in the model is restricted to an admissible Hilbert space, and the unknown template image is modelled in the space of functions of bounded variation. Under this framework, this paper analyses and proves the existence of a solution to the time-discretised version of the model from the viewpoint of optimal control. Specifically, the equivalent optimal control model contains a transport-equation constraint. We rigorously demonstrate the closure of this equation, including the existence and uniqueness of its solution, the stability of the associated nonlinear solution operator, and convergence. Finally, the existence of a solution to the model follows.
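For concreteness, a generic form of the transport-equation constraint referred to above is sketched below; the notation (template image $I_0$ advected by the velocity field $v$) follows standard conventions in diffeomorphic image registration and is an assumption, not the paper's exact formulation.

```latex
% Generic transport (advection) equation constraining the image
% intensity I along the time-dependent velocity field v:
\begin{equation}
  \partial_t I(t,x) + v(t,x) \cdot \nabla I(t,x) = 0,
  \qquad I(0,\cdot) = I_0 \quad \text{(template image)}.
\end{equation}
```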
The algorithms available for retail forecasting have increased in complexity. Newer methods, such as machine learning, are inherently complex, and the more traditional families of forecasting models, such as exponential smoothing and autoregressive integrated moving average models, have expanded to contain multiple possible forms and forecasting profiles. We question complexity in forecasting and the need to consider such large families of models. Our argument is that parsimoniously identifying suitable subsets of models will neither decrease forecasting accuracy nor reduce the ability to estimate forecast uncertainty. We propose a framework that balances forecasting performance against computational cost, resulting in the consideration of only a reduced set of models. We empirically demonstrate that such a reduced set performs well. Finally, we translate the computational benefits into monetary cost savings and environmental impact, and discuss the implications of our results in the context of large retailers.
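As a minimal illustration of the performance-versus-cost trade-off, the sketch below keeps only candidate models on the accuracy-compute Pareto frontier; the candidate names and their (error, seconds-per-fit) figures are invented placeholders, not measurements from our empirical study.

```python
# Keep only forecasting models on the accuracy-vs-compute Pareto frontier.
# The (error, seconds-per-fit) figures below are invented placeholders.
candidates = {
    "naive":             (0.42, 0.001),
    "ses":               (0.35, 0.010),
    "holt_winters":      (0.30, 0.050),
    "auto_arima":        (0.31, 2.000),   # dominated by holt_winters
    "gradient_boosting": (0.28, 8.000),
}

def pareto_subset(models: dict) -> dict:
    """Drop any model that another model beats (weakly) on both error and cost."""
    kept = {}
    for name, (err, cost) in models.items():
        dominated = any(
            e <= err and c <= cost and (e, c) != (err, cost)
            for e, c in models.values()
        )
        if not dominated:
            kept[name] = (err, cost)
    return kept

print(pareto_subset(candidates))  # auto_arima is excluded from the pool
```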
Recently, efforts have been made by social media platforms as well as researchers to detect hateful or toxic language using large language models. However, none of these works aims to use explanations, additional context, or victim-community information in the detection process. We evaluate large language models in a zero-shot setting (without adding any in-context examples) using different prompt variations and input information. We select three large language models (GPT-3.5, text-davinci and Flan-T5) and three datasets: HateXplain, implicit hate, and ToxicSpans. We find that, on average, including the target information in the pipeline substantially improves model performance (~20-30%) over the baseline across the datasets. Adding the rationales/explanations to the pipeline also has a considerable effect (~10-20% over the baseline across the datasets). In addition, we provide a typology of the error cases in which these large language models fail to (i) classify and (ii) explain the reasons for their decisions. Such vulnerable points automatically constitute 'jailbreak' prompts for these models, and industry-scale safeguard techniques need to be developed to make the models robust against such prompts.
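A minimal sketch of how target-community information and rationales can be injected into a zero-shot prompt is given below; the template wording and the helper function are illustrative assumptions, not our exact pipeline.

```python
# Build a zero-shot prompt that optionally injects target-community
# information and a human-annotated rationale. Template wording is illustrative.
from typing import Optional

def build_prompt(post: str,
                 target: Optional[str] = None,
                 rationale: Optional[str] = None) -> str:
    prompt = (
        "Classify the following post as 'hateful', 'toxic', or 'normal', "
        "and briefly explain your decision.\n\n"
        f"Post: {post}\n"
    )
    if target is not None:
        prompt += f"Targeted community: {target}\n"      # target information
    if rationale is not None:
        prompt += f"Annotated rationale: {rationale}\n"  # explanation cue
    prompt += "Answer:"
    return prompt

print(build_prompt("<post text>", target="<community>", rationale="<span>"))
```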
In order to trust the predictions of a machine learning algorithm, it is necessary to understand the factors that contribute to those predictions. In the case of probabilistic and uncertainty-aware models, it is necessary to understand not only the reasons for the predictions themselves, but also the model's level of confidence in those predictions. In this paper, we show how existing methods in explainability can be extended to uncertainty-aware models and how such extensions can be used to understand the sources of uncertainty in a model's predictive distribution. In particular, by adapting permutation feature importance, partial dependence plots, and individual conditional expectation plots, we demonstrate that novel insights into model behaviour may be obtained and that these methods can be used to measure the impact of features on both the entropy of the predictive distribution and the log-likelihood of the ground truth labels under that distribution. With experiments using both synthetic and real-world data, we demonstrate the utility of these approaches in understanding both the sources of uncertainty and their impact on model performance.
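For illustration, the sketch below adapts permutation feature importance to the entropy of a classifier's predictive distribution; the scikit-learn-style `predict_proba` interface is an assumption, and the exact scoring choices in our experiments may differ.

```python
# Permutation feature importance measured on predictive entropy rather
# than accuracy: how much does shuffling feature j change the model's
# average uncertainty? Assumes a scikit-learn-style `predict_proba`.
import numpy as np

def predictive_entropy(probs: np.ndarray) -> np.ndarray:
    return -np.sum(probs * np.log(probs + 1e-12), axis=1)

def entropy_permutation_importance(model, X: np.ndarray,
                                   n_repeats: int = 10,
                                   seed: int = 0) -> np.ndarray:
    rng = np.random.default_rng(seed)
    base = predictive_entropy(model.predict_proba(X)).mean()
    importances = np.zeros(X.shape[1])
    for j in range(X.shape[1]):
        shuffled = []
        for _ in range(n_repeats):
            Xp = X.copy()
            Xp[:, j] = rng.permutation(Xp[:, j])  # break feature's association
            shuffled.append(predictive_entropy(model.predict_proba(Xp)).mean())
        importances[j] = np.mean(shuffled) - base  # >0: feature reduces uncertainty
    return importances
```

The same scheme applies to the log-likelihood of the ground-truth labels by swapping the scoring function.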
Computer-assisted diagnostic and prognostic systems of the future should be capable of simultaneously processing multimodal data. Multimodal deep learning (MDL), which integrates multiple sources of data, such as images and text, has the potential to revolutionize the analysis and interpretation of biomedical data; however, it has only recently caught researchers' attention. There is therefore a critical need to systematically review this topic, identify the limitations of current work, and explore future directions. In this scoping review, we aim to provide a comprehensive overview of the current state of the field and to identify key concepts, types of studies, and research gaps, with a focus on the joint learning of biomedical images and texts, mainly because these were the two most commonly available data types in MDL research. This study reviewed the current uses of multimodal deep learning on five tasks: (1) report generation, (2) visual question answering, (3) cross-modal retrieval, (4) computer-aided diagnosis, and (5) semantic segmentation. Our results highlight the diverse applications and potential of MDL and suggest directions for future research in the field. We hope our review will facilitate collaboration between the natural language processing (NLP) and medical imaging communities and support the development of the next generation of decision-making and computer-assisted diagnostic systems.
Artificial neural networks thrive at solving classification problems for particular rigid tasks, acquiring knowledge through generalized learning behaviour from a distinct training phase. The resulting network resembles a static entity of knowledge, and attempts to extend this knowledge without targeting the original task result in catastrophic forgetting. Continual learning shifts this paradigm towards networks that can continually accumulate knowledge over different tasks without needing to retrain from scratch. We focus on task-incremental classification, where tasks arrive sequentially and are delineated by clear boundaries. Our main contributions are (1) a taxonomy and extensive overview of the state of the art, (2) a novel framework for continually determining the stability-plasticity trade-off of the continual learner, and (3) a comprehensive experimental comparison of 11 state-of-the-art continual learning methods and 4 baselines. We empirically scrutinize method strengths and weaknesses on three benchmarks: Tiny ImageNet, the large-scale unbalanced iNaturalist dataset, and a sequence of recognition datasets. We study the influence of model capacity, weight decay and dropout regularization, and the order in which tasks are presented, and we qualitatively compare methods in terms of required memory, computation time, and storage.
A key requirement for the success of supervised deep learning is a large labeled dataset - a condition that is difficult to meet in medical image analysis. Self-supervised learning (SSL) can help in this regard by providing a strategy to pre-train a neural network with unlabeled data, followed by fine-tuning for a downstream task with limited annotations. Contrastive learning, a particular variant of SSL, is a powerful technique for learning image-level representations. In this work, we propose strategies for extending the contrastive learning framework for segmentation of volumetric medical images in the semi-supervised setting with limited annotations, by leveraging domain-specific and problem-specific cues. Specifically, we propose (1) novel contrasting strategies that leverage structural similarity across volumetric medical images (domain-specific cue) and (2) a local version of the contrastive loss to learn distinctive representations of local regions that are useful for per-pixel segmentation (problem-specific cue). We carry out an extensive evaluation on three Magnetic Resonance Imaging (MRI) datasets. In the limited annotation setting, the proposed method yields substantial improvements compared to other self-supervision and semi-supervised learning techniques. When combined with a simple data augmentation technique, the proposed method reaches within 8% of benchmark performance using only two labeled MRI volumes for training, corresponding to only 4% (for ACDC) of the training data used to train the benchmark.
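For reference, the sketch below shows the standard NT-Xent contrastive loss on which the proposed global and local strategies build; the structural pairing across volumes and the local per-region variant described above are not reproduced here, and the simple two-view setup is a simplifying assumption.

```python
# Standard NT-Xent contrastive loss: embeddings of two views of the same
# image are pulled together, all other pairs in the batch pushed apart.
import torch
import torch.nn.functional as F

def nt_xent(z1: torch.Tensor, z2: torch.Tensor, tau: float = 0.1) -> torch.Tensor:
    """z1, z2: (N, d) embeddings of two augmented views of the same N images."""
    z = F.normalize(torch.cat([z1, z2], dim=0), dim=1)  # (2N, d), unit norm
    sim = z @ z.T / tau                                 # scaled cosine similarities
    n = z1.shape[0]
    mask = torch.eye(2 * n, dtype=torch.bool)
    sim = sim.masked_fill(mask, float("-inf"))          # exclude self-similarity
    # Positive for sample i is its other view: i <-> i + n.
    targets = torch.cat([torch.arange(n, 2 * n), torch.arange(0, n)])
    return F.cross_entropy(sim, targets)
```

In the proposed strategies, the notion of a "positive pair" is extended to anatomically corresponding regions across volumes (global cue) and to nearby local feature-map entries (local cue).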