Clinical data is often affected by clinically irrelevant factors such as discrepancies between measurement devices or differing processing methods between sites. In the field of machine learning (ML), these factors are known as domains and the distribution differences they cause in the data are known as domain shifts. ML models trained using data from one domain often perform poorly when applied to data from another domain, potentially leading to wrong predictions. As such, developing machine learning models that can generalise well across multiple domains is a challenging yet essential task in the successful application of ML in clinical practice. In this paper, we propose a novel disentangled autoencoder (Dis-AE) neural network architecture that can learn domain-invariant data representations for multi-label classification of medical measurements even when the data is influenced by multiple interacting domain shifts at once. The model utilises adversarial training to produce data representations from which the domain can no longer be determined. We evaluate the model's domain generalisation capabilities on synthetic datasets and full blood count (FBC) data from blood donors as well as primary and secondary care patients, showing that Dis-AE improves model generalisation on multiple domains simultaneously while preserving clinically relevant information.
Generation and exploration of approximate circuits and accelerators has been a prominent research domain for achieving energy-efficiency and/or performance improvements. This research has predominantly focused on ASICs, and it does not achieve similar gains when deployed for FPGA-based accelerator systems, due to the inherent architectural differences between the two. In this work, we propose a novel framework, Xel-FPGAs, which leverages statistical or machine learning models to effectively explore the architecture-space of state-of-the-art ASIC-based approximate circuits and adapt them to FPGA-based systems, given a simple RTL description of the target application. We have also evaluated the scalability of our framework on a multi-stage application using a hierarchical search strategy. The Xel-FPGAs framework is capable of reducing the exploration time by up to 95% compared to the default synthesis, place, and route approaches, while identifying an improved set of Pareto-optimal designs for a given application compared to the state-of-the-art. The complete framework is open-source and available online at //github.com/ehw-fit/xel-fpgas.
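The notion of a Pareto-optimal design set used in the abstract above can be illustrated with a minimal sketch. It assumes each candidate design is summarised by two hypothetical cost metrics that are both minimised (an error measure and a power figure); the names `dominates` and `pareto_front` and the sample values are illustrative only and are not taken from the Xel-FPGAs framework.

```python
from typing import List, Tuple

# (error, power): two hypothetical metrics, both to be minimised.
Design = Tuple[float, float]


def dominates(a: Design, b: Design) -> bool:
    """True if a is no worse than b in every metric and strictly better in one."""
    no_worse = all(x <= y for x, y in zip(a, b))
    strictly_better = any(x < y for x, y in zip(a, b))
    return no_worse and strictly_better


def pareto_front(designs: List[Design]) -> List[Design]:
    """Keep only the designs that no other design dominates."""
    return [d for d in designs if not any(dominates(o, d) for o in designs)]


# Illustrative candidates, not measured data.
candidates = [(0.10, 5.0), (0.05, 7.0), (0.20, 4.0), (0.12, 6.0), (0.05, 8.0)]
front = pareto_front(candidates)
print(front)
```

In this toy set, `(0.12, 6.0)` is dominated by `(0.10, 5.0)` and `(0.05, 8.0)` by `(0.05, 7.0)`, so three designs remain on the front. The quadratic pairwise comparison is adequate for a sketch; a real design-space exploration would likely need something more scalable.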
Multi-modal recommendation systems, which integrate diverse types of information, have gained widespread attention in recent years. However, compared to traditional collaborative filtering-based multi-modal recommendation systems, research on multi-modal sequential recommendation is still in its nascent stages. Unlike traditional sequential recommendation models that solely rely on item identifier (ID) information and focus on network structure design, multi-modal recommendation models need to emphasize item representation learning and the fusion of heterogeneous data sources. This paper investigates the impact of item representation learning on downstream recommendation tasks and examines the disparities in information fusion at different stages. Empirical experiments are conducted to demonstrate the need to design a framework suitable for collaborative learning and fusion of diverse information. Based on this, we propose a new model-agnostic framework for multi-modal sequential recommendation tasks, called Online Distillation-enhanced Multi-modal Transformer (ODMT), to enhance feature interaction and mutual learning among multi-source inputs (ID, text, and image), while avoiding conflicts among different features during training, thereby improving recommendation accuracy. Specifically, we first introduce an ID-aware Multi-modal Transformer module in the item representation learning stage to facilitate information interaction among different features. Second, we employ an online distillation training strategy in the prediction optimization stage to make multi-source data learn from each other and improve prediction robustness. Experimental results on a video content recommendation dataset and three e-commerce recommendation datasets demonstrate the effectiveness of the two proposed modules, which yield an improvement of approximately 10% in performance over baseline models.
Ultrasound (US) is one of the most widely used modalities for clinical intervention and diagnosis because it provides non-invasive, radiation-free, and real-time images. However, free-hand US examinations are highly operator-dependent. Robotic US systems (RUSS) aim to overcome this shortcoming by offering reproducibility, while also aiming to improve dexterity and to enable intelligent anatomy- and disease-aware imaging. In addition to enhancing diagnostic outcomes, RUSS also holds the potential to provide medical interventions for populations suffering from the shortage of experienced sonographers. In this paper, we categorize RUSS as teleoperated or autonomous. For teleoperated RUSS, we summarize the technical developments and the clinical evaluations. This survey then focuses on the review of recent work on autonomous robotic US imaging. We demonstrate that machine learning and artificial intelligence are the key techniques that enable intelligent patient- and process-specific, motion- and deformation-aware robotic image acquisition. We also show that the research on artificial intelligence for autonomous RUSS has directed the research community toward understanding and modeling expert sonographers' semantic reasoning and action. Here, we call this process the recovery of the "language of sonography". This side result of research on autonomous robotic US acquisitions could be considered as valuable and essential as the progress made in the robotic US examination itself. This article will provide both engineers and clinicians with a comprehensive understanding of RUSS by surveying underlying techniques.
Gender disparities in health outcomes have garnered significant attention, prompting investigations into their underlying causes. Glioblastoma (GBM), a devastating and highly aggressive form of brain tumor, serves as a case study for such inquiries. Despite the mounting evidence on gender disparities in GBM outcomes, investigations specifically at the molecular level remain scarce and are often limited by confounding biases in observational studies. In this study, I aimed to investigate the gender-related differences in GBM outcomes using propensity score matching (PSM) to control for potential confounding variables. The data were obtained from The Cancer Genome Atlas (TCGA), encompassing factors such as gender, age, molecular characteristics and different glioma grades. Propensity scores were calculated for each patient using logistic regression, representing the likelihood of being male based on the baseline characteristics. Subsequently, patients were matched using nearest-neighbor matching with a restricted caliper to create balanced male and female groups. After PSM, 303 male-female pairs were identified, with similar baseline characteristics in terms of age and molecular features. The analysis revealed a higher incidence of GBM in males compared to females, after adjusting for potential confounding factors. This study contributes to the discourse on gender equity in health, paving the way for targeted interventions and improved outcomes, and may guide efforts to improve gender-specific treatment strategies for GBM patients. However, further investigations and prospective studies are warranted to validate these findings and explore additional factors, aside from the molecular characteristics, that might contribute to the observed gender-based differences in GBM outcomes.
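The two PSM steps described above (logistic-regression propensity scores, then nearest-neighbor matching within a caliper) can be sketched as follows. This is an illustration under stated assumptions, not the study's analysis code: the covariates (scaled age and a binary marker), the toy cohort, the caliper width, and the greedy one-to-one matching without replacement are all hypothetical choices, since the abstract does not specify them.

```python
import math
from typing import Dict, List, Sequence, Tuple


def propensity(w: Sequence[float], x: Sequence[float]) -> float:
    """Logistic model output: estimated probability that a patient is male."""
    z = w[0] + sum(wj * xj for wj, xj in zip(w[1:], x))
    return 1.0 / (1.0 + math.exp(-z))


def fit_logistic(X: Sequence[Sequence[float]], y: Sequence[int],
                 lr: float = 0.1, epochs: int = 2000) -> List[float]:
    """Fit logistic regression by batch gradient descent; returns [bias, w1, ...]."""
    n, d = len(X), len(X[0])
    w = [0.0] * (d + 1)
    for _ in range(epochs):
        grad = [0.0] * (d + 1)
        for xi, yi in zip(X, y):
            err = propensity(w, xi) - yi
            grad[0] += err
            for j in range(d):
                grad[j + 1] += err * xi[j]
        w = [wj - lr * gj / n for wj, gj in zip(w, grad)]
    return w


def match_with_caliper(treated: Dict[str, float], control: Dict[str, float],
                       caliper: float) -> List[Tuple[str, str]]:
    """Greedy 1:1 nearest-neighbour matching on propensity score, without replacement.

    A treated unit stays unmatched if its nearest control is farther than the caliper.
    """
    available = dict(control)
    pairs: List[Tuple[str, str]] = []
    for t_id, t_score in treated.items():
        if not available:
            break
        c_id = min(available, key=lambda c: abs(available[c] - t_score))
        if abs(available[c_id] - t_score) <= caliper:
            pairs.append((t_id, c_id))
            del available[c_id]
    return pairs


# Toy cohort (hypothetical values): [age / 100, marker status]; label 1 = male.
X = [[0.45, 1.0], [0.60, 0.0], [0.52, 1.0], [0.70, 0.0], [0.48, 1.0], [0.66, 0.0]]
y = [1, 1, 1, 0, 0, 0]
ids = ["p1", "p2", "p3", "p4", "p5", "p6"]

weights = fit_logistic(X, y)
scores = {pid: propensity(weights, x) for pid, x in zip(ids, X)}
males = {pid: scores[pid] for pid, label in zip(ids, y) if label == 1}
females = {pid: scores[pid] for pid, label in zip(ids, y) if label == 0}
pairs = match_with_caliper(males, females, caliper=0.1)
print(pairs)
```

Greedy matching depends on the order in which male patients are visited; optimal matching or a caliper defined in standard deviations of the logit are common alternatives. After matching, covariate balance between the two groups would normally be checked before any outcome comparison.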
The efficacy of availability poisoning, a method of poisoning data by injecting imperceptible perturbations to prevent its use in model training, has been an active subject of investigation. Previous research suggested that it was difficult to effectively counteract such poisoning attacks. However, the introduction of various defense methods has challenged this notion. Given the rapid progress in this field, the performance of different novel methods cannot be accurately validated because of variations in experimental setups. To further evaluate the attack and defense capabilities of these poisoning methods, we have developed a benchmark, APBench, for assessing the efficacy of adversarial poisoning. APBench consists of 9 state-of-the-art availability poisoning attacks, 8 defense algorithms, and 4 conventional data augmentation techniques. We have also set up experiments with varying poisoning ratios, and evaluated the attacks on multiple datasets and their transferability across model architectures. We further conducted a comprehensive evaluation of 2 additional attacks specifically targeting unsupervised models. Our results reveal the glaring inadequacy of existing attacks in safeguarding individual privacy. APBench is open source and available to the deep learning community: //github.com/lafeat/apbench.
Data-driven approaches have been proven effective in solving combinatorial optimization problems over graphs such as the traveling salesman problem and the vehicle routing problem. The rationale behind such methods is that the input instances may follow distributions with salient patterns that can be leveraged to overcome the worst-case computational hardness. For optimization problems over graphs, the common practice is for neural combinatorial solvers to consume the inputs in the form of adjacency matrices. In this paper, we explore a vision-based method that is conceptually novel: can neural models solve graph optimization problems by \textit{taking a look at the graph pattern}? Our results suggest that the performance of such vision-based methods is not only non-trivial but also comparable to the state-of-the-art matrix-based methods, which opens a new avenue for developing data-driven optimization solvers.
The process of matching patients with suitable clinical trials is essential for advancing medical research and providing optimal care. However, current approaches face challenges such as data standardization, ethical considerations, and a lack of interoperability between Electronic Health Records (EHRs) and clinical trial criteria. In this paper, we explore the potential of large language models (LLMs) to address these challenges by leveraging their advanced natural language generation capabilities to improve compatibility between EHRs and clinical trial descriptions. We propose an innovative privacy-aware data augmentation approach for LLM-based patient-trial matching (LLM-PTM), which retains the benefits of LLMs while ensuring the security and confidentiality of sensitive patient data. Our experiments demonstrate a 7.32% average improvement in performance using the proposed LLM-PTM method, and the generalizability to new data is improved by 12.12%. Additionally, we present case studies to further illustrate the effectiveness of our approach and provide a deeper understanding of its underlying principles.
Panoptic segmentation is one of the most challenging scene parsing tasks, combining the tasks of semantic segmentation and instance segmentation. While much progress has been made, few works focus on the real-time application of panoptic segmentation methods. In this paper, we revisit the recently introduced K-Net architecture. We propose vital changes to the architecture, training, and inference procedure, which massively decrease latency and improve performance. Our resulting RT-K-Net sets a new state-of-the-art performance for real-time panoptic segmentation methods on the Cityscapes dataset and shows promising results on the challenging Mapillary Vistas dataset. On Cityscapes, RT-K-Net reaches 60.2 % PQ with an average inference time of 32 ms for full resolution 1024x2048 pixel images on a single Titan RTX GPU. On Mapillary Vistas, RT-K-Net reaches 33.2 % PQ with an average inference time of 69 ms. Source code is available at //github.com/markusschoen/RT-K-Net.
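For readers unfamiliar with the PQ figures quoted above, the following minimal sketch computes Panoptic Quality for a single class using the standard definition: the sum of IoUs over matched (true-positive) segment pairs, divided by |TP| + 0.5 |FP| + 0.5 |FN|, where a match requires IoU above 0.5. The function name `panoptic_quality` and the input values are illustrative assumptions; this is not the RT-K-Net evaluation code.

```python
from typing import Sequence


def panoptic_quality(matched_ious: Sequence[float], num_fp: int, num_fn: int) -> float:
    """PQ for one class.

    matched_ious: IoU of each true-positive (prediction, ground-truth) pair.
    num_fp: predicted segments with no match; num_fn: ground-truth segments with no match.
    """
    if any(iou <= 0.5 or iou > 1.0 for iou in matched_ious):
        raise ValueError("a true-positive match requires 0.5 < IoU <= 1.0")
    denominator = len(matched_ious) + 0.5 * num_fp + 0.5 * num_fn
    if denominator == 0:
        return 0.0  # convention chosen for this sketch: no segments at all
    return sum(matched_ious) / denominator


# Illustrative numbers: three matches, one false positive, one false negative.
pq = panoptic_quality([0.9, 0.8, 0.7], num_fp=1, num_fn=1)
print(f"PQ = {100 * pq:.1f} %")
```

With these inputs the IoUs sum to 2.4 and the denominator is 4, giving a PQ of about 60 %. Benchmark PQ values are typically averaged over all classes, a step this single-class sketch omits.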
Deception detection in conversations is a challenging yet important task, having pivotal applications in many fields such as credibility assessment in business, multimedia anti-fraud, and custom security. Despite this, deception detection research is hindered by the lack of high-quality deception datasets, as well as the difficulties of learning multimodal features effectively. To address these issues, we introduce DOLOS\footnote{The name ``DOLOS'' comes from Greek mythology.}, the largest gameshow deception detection dataset with rich deceptive conversations. DOLOS includes 1,675 video clips featuring 213 subjects, and it has been labeled with audio-visual feature annotations. We provide train-test, duration, and gender protocols to investigate the impact of different factors. We benchmark our dataset on previously proposed deception detection approaches. To further improve the performance by fine-tuning fewer parameters, we propose Parameter-Efficient Crossmodal Learning (PECL), where a Uniform Temporal Adapter (UT-Adapter) explores temporal attention in transformer-based architectures, and a crossmodal fusion module, Plug-in Audio-Visual Fusion (PAVF), combines crossmodal information from audio-visual features. Based on the rich fine-grained audio-visual annotations on DOLOS, we also exploit multi-task learning to enhance performance by concurrently predicting deception and audio-visual features. Experimental results demonstrate the desired quality of the DOLOS dataset and the effectiveness of PECL. The DOLOS dataset and the source code are available at //github.com/NMS05/Audio-Visual-Deception-Detection-DOLOS-Dataset-and-Parameter-Efficient-Crossmodal-Learning/tree/main.
Multi-modal fusion is a fundamental task for the perception of an autonomous driving system, which has recently intrigued many researchers. However, achieving good performance is difficult because of noisy raw data, underutilized information, and the misalignment of multi-modal sensors. In this paper, we provide a literature review of the existing multi-modal-based methods for perception tasks in autonomous driving. We provide a detailed analysis of over 50 papers that leverage perception sensors, including LiDAR and cameras, to solve object detection and semantic segmentation tasks. Departing from the traditional methodology for categorizing fusion models, we propose a taxonomy that divides them into two major classes and four minor classes from the viewpoint of the fusion stage. Moreover, we dive deep into the current fusion methods, focusing on the remaining problems and opening up discussion of potential research opportunities. In summary, this paper aims to present a new taxonomy of multi-modal fusion methods for autonomous driving perception tasks and to provoke thought about future fusion-based techniques.