Deep generative models have emerged as a promising approach in the medical image domain to address data scarcity. However, their use for sequential data like respiratory sounds is less explored. In this work, we propose a straightforward approach to augment imbalanced respiratory sound data using an audio diffusion model as a conditional neural vocoder. We also demonstrate a simple yet effective adversarial fine-tuning method to align features between the synthetic and real respiratory sound samples to improve respiratory sound classification performance. Our experimental results on the ICBHI dataset demonstrate that the proposed adversarial fine-tuning is effective, while only using the conventional augmentation method shows performance degradation. Moreover, our method outperforms the baseline by 2.24% on the ICBHI Score and improves the accuracy of the minority classes up to 26.58%. For the supplementary material, we provide the code at //github.com/kaen2891/adversarial_fine-tuning_using_generated_respiratory_sound.
The engineering design process often relies on mathematical modeling that can describe the underlying dynamic behavior. In this work, we present a data-driven methodology for modeling the dynamics of nonlinear systems. To simplify this task, we aim to identify a coordinate transformation that allows us to represent the dynamics of nonlinear systems using a common, simple model structure. The advantage of a common simple model is that customized design tools developed for it can be applied to study a large variety of nonlinear systems. The simplest common model -- one can think of -- is linear, but linear systems often fall short in accurately capturing the complex dynamics of nonlinear systems. In this work, we propose using quadratic systems as the common structure, inspired by the lifting principle. According to this principle, smooth nonlinear systems can be expressed as quadratic systems in suitable coordinates without approximation errors. However, finding these coordinates solely from data is challenging. Here, we leverage deep learning to identify such lifted coordinates using only data, enabling a quadratic dynamical system to describe the system's dynamics. Additionally, we discuss the asymptotic stability of these quadratic dynamical systems. We illustrate the approach using data collected from various numerical examples, demonstrating its superior performance with the existing well-known techniques.
Wearable-based Human Activity Recognition (HAR) is a key task in human-centric machine learning due to its fundamental understanding of human behaviours. Due to the dynamic nature of human behaviours, continual learning promises HAR systems that are tailored to users' needs. However, because of the difficulty in collecting labelled data with wearable sensors, existing approaches that focus on supervised continual learning have limited applicability, while unsupervised continual learning methods only handle representation learning while delaying classifier training to a later stage. This work explores the adoption and adaptation of CaSSLe, a continual self-supervised learning model, and Kaizen, a semi-supervised continual learning model that balances representation learning and down-stream classification, for the task of wearable-based HAR. These schemes re-purpose contrastive learning for knowledge retention and, Kaizen combines that with self-training in a unified scheme that can leverage unlabelled and labelled data for continual learning. In addition to comparing state-of-the-art self-supervised continual learning schemes, we further investigated the importance of different loss terms and explored the trade-off between knowledge retention and learning from new tasks. In particular, our extensive evaluation demonstrated that the use of a weighting factor that reflects the ratio between learned and new classes achieves the best overall trade-off in continual learning.
Hybrid model predictive control with both continuous and discrete variables is widely applicable to robotic control tasks, especially those involving contact with the environment. Due to the combinatorial complexity, the solving speed of hybrid MPC can be insufficient for real-time applications. In this paper, we proposed a hybrid MPC solver based on Generalized Benders Decomposition (GBD). The algorithm enumerates and stores cutting planes online inside a finite buffer. After a short cold-start phase, the stored cuts provide warm-starts for the new problem instances to enhance the solving speed. Despite the disturbance and randomly changing environment, the solving speed maintains. Leveraging on the sparsity of feasibility cuts, we also propose a fast algorithm for Benders master problems. Our solver is validated through controlling a cart-pole system with randomly moving soft contact walls, and a free-flying robot navigating around obstacles. The results show that with significantly less data than previous works, the solver reaches competitive speeds to the off-the-shelf solver Gurobi despite the Python overhead.
A critical factor in trustworthy machine learning is to develop robust representations of the training data. Only under this guarantee methods are legitimate to artificially generate data, for example, to counteract imbalanced datasets or provide counterfactual explanations for blackbox decision-making systems. In recent years, Generative Adversarial Networks (GANs) have shown considerable results in forming stable representations and generating realistic data. While many applications focus on generating image data, less effort has been made in generating time series data, especially multivariate signals. In this work, a Transformer-based autoencoder is proposed that is regularized using an adversarial training scheme to generate artificial multivariate time series signals. The representation is evaluated using t-SNE visualizations, Dynamic Time Warping (DTW) and Entropy scores. Our results indicate that the generated signals exhibit higher similarity to an exemplary dataset than using a convolutional network approach.
Background: Identifying and characterising the longitudinal patterns of multimorbidity associated with stroke is needed to better understand patients' needs and inform new models of care. Methods: We used an unsupervised patient-oriented clustering approach to analyse primary care electronic health records (EHR) of 30 common long-term conditions (LTC), in patients with stroke aged over 18, registered in 41 general practices in south London between 2005 and 2021. Results: Of 849,968 registered patients, 9,847 (1.16%) had a record of stroke, 46.5% were female and median age at record was 65.0 year (IQR: 51.5 to 77.0). The median number of LTCs in addition to stroke was 3 (IQR: from 2 to 5). Patients were stratified in eight clusters. These clusters revealed contrasted patterns of multimorbidity, socio-demographic characteristics (age, gender and ethnicity) and risk factors. Beside a core of 3 clusters associated with conventional stroke risk-factors, minor clusters exhibited less common but recurrent combinations of LTCs including mental health conditions, asthma, osteoarthritis and sickle cell anaemia. Importantly, complex profiles combining mental health conditions, infectious diseases and substance dependency emerged. Conclusion: This patient-oriented approach to EHRs uncovers the heterogeneity of profiles of multimorbidity and socio-demographic characteristics associated with stroke. It highlights the importance of conventional stroke risk factors as well as the association of mental health conditions in complex profiles of multimorbidity displayed in a significant proportion of patients. These results address the need for a better understanding of stroke-associated multimorbidity and complexity to inform more efficient and patient-oriented healthcare models.
We challenge the perceived consensus that the application of deep learning to solve the automated driving planning task necessarily requires huge amounts of real-world data or highly realistic simulation. Focusing on a roundabout scenario, we show that this requirement can be relaxed in favour of targeted, simplistic simulated data. A benefit is that such data can be easily generated for critical scenarios that are typically underrepresented in realistic datasets. By applying vanilla behavioural cloning almost exclusively to lightweight simulated data, we achieve reliable and comfortable driving in a real-world test vehicle. We leverage an incremental development approach that includes regular in-vehicle testing to identify sim-to-real gaps, targeted data augmentation, and training scenario variations. In addition to a detailed description of the methodology, we share our lessons learned, touching upon scenario generation, simulation features, and evaluation metrics.
Deep generative models have been demonstrated as problematic in the unsupervised out-of-distribution (OOD) detection task, where they tend to assign higher likelihoods to OOD samples. Previous studies on this issue are usually not applicable to the Variational Autoencoder (VAE). As a popular subclass of generative models, the VAE can be effective with a relatively smaller model size and be more stable and faster in training and inference, which can be more advantageous in real-world applications. In this paper, We propose a novel VAE-based score called Error Reduction (ER) for OOD detection, which is based on a VAE that takes a lossy version of the training set as inputs and the original set as targets. Experiments are carried out on various datasets to show the effectiveness of our method, we also present the effect of design choices with ablation experiments. Our code is available at: //github.com/ZJLAB-AMMI/VAE4OOD.
When deploying machine learning models in high-stakes robotics applications, the ability to detect unsafe situations is crucial. Early warning systems can provide alerts when an unsafe situation is imminent (in the absence of corrective action). To reliably improve safety, these warning systems should have a provable false negative rate; i.e. of the situations that are unsafe, fewer than $\epsilon$ will occur without an alert. In this work, we present a framework that combines a statistical inference technique known as conformal prediction with a simulator of robot/environment dynamics, in order to tune warning systems to provably achieve an $\epsilon$ false negative rate using as few as $1/\epsilon$ data points. We apply our framework to a driver warning system and a robotic grasping application, and empirically demonstrate guaranteed false negative rate while also observing low false detection (positive) rate.
As a scene graph compactly summarizes the high-level content of an image in a structured and symbolic manner, the similarity between scene graphs of two images reflects the relevance of their contents. Based on this idea, we propose a novel approach for image-to-image retrieval using scene graph similarity measured by graph neural networks. In our approach, graph neural networks are trained to predict the proxy image relevance measure, computed from human-annotated captions using a pre-trained sentence similarity model. We collect and publish the dataset for image relevance measured by human annotators to evaluate retrieval algorithms. The collected dataset shows that our method agrees well with the human perception of image similarity than other competitive baselines.
Graph Neural Networks (GNNs), which generalize deep neural networks to graph-structured data, have drawn considerable attention and achieved state-of-the-art performance in numerous graph related tasks. However, existing GNN models mainly focus on designing graph convolution operations. The graph pooling (or downsampling) operations, that play an important role in learning hierarchical representations, are usually overlooked. In this paper, we propose a novel graph pooling operator, called Hierarchical Graph Pooling with Structure Learning (HGP-SL), which can be integrated into various graph neural network architectures. HGP-SL incorporates graph pooling and structure learning into a unified module to generate hierarchical representations of graphs. More specifically, the graph pooling operation adaptively selects a subset of nodes to form an induced subgraph for the subsequent layers. To preserve the integrity of graph's topological information, we further introduce a structure learning mechanism to learn a refined graph structure for the pooled graph at each layer. By combining HGP-SL operator with graph neural networks, we perform graph level representation learning with focus on graph classification task. Experimental results on six widely used benchmarks demonstrate the effectiveness of our proposed model.