Background: Federated learning methods offer the possibility of training machine learning models on privacy-sensitive data sets, which cannot be easily shared. Multiple regulations pose strict requirements on the storage and usage of healthcare data, leading to data being in silos (i.e. locked-in at healthcare facilities). The application of federated algorithms on these datasets could accelerate disease diagnostic, drug development, as well as improve patient care. Methods: We present an extensive evaluation of the impact of different federation and differential privacy techniques when training models on the open-source MIMIC-III dataset. We analyze a set of parameters influencing a federated model performance, namely data distribution (homogeneous and heterogeneous), communication strategies (communication rounds vs. local training epochs), federation strategies (FedAvg vs. FedProx). Furthermore, we assess and compare two differential privacy (DP) techniques during model training: a stochastic gradient descent-based differential privacy algorithm (DP-SGD), and a sparse vector differential privacy technique (DP-SVT). Results: Our experiments show that extreme data distributions across sites (imbalance either in the number of patients or the positive label ratios between sites) lead to a deterioration of model performance when trained using the FedAvg strategy. This issue is resolved when using FedProx with the use of appropriate hyperparameter tuning. Furthermore, the results show that both differential privacy techniques can reach model performances similar to those of models trained without DP, however at the expense of a large quantifiable privacy leakage. Conclusions: We evaluate empirically the benefits of two federation strategies and propose optimal strategies for the choice of parameters when using differential privacy techniques.
Crypto-ransomware attacks have been a growing threat over the last few years. The goal of every ransomware strain is encrypting user data, such that attackers can later demand users a ransom for unlocking their data. To maximise their earning chances, attackers equip their ransomware with strong encryption which produce files with high entropy values. Davies et al. proposed Differential Area Analysis (DAA), a technique that analyses files headers to differentiate compressed, regularly encrypted, and ransomware-encrypted files. In this paper, first we propose three different attacks to perform malicious header manipulation and bypass DAA detection. Then, we propose three countermeasures, namely 2-Fragments (2F), 3-Fragments (3F), and 4-Fragments (4F), which can be applied equally against each of the three attacks we propose. We conduct a number of experiments to analyse the ability of our countermeasures to detect ransomware-encrypted files, whether implementing our proposed attacks or not. Last, we test the robustness of our own countermeasures by analysing the performance, in terms of files per second analysed and resilience to extensive injection of low-entropy data. Our results show that our detection countermeasures are viable and deployable alternatives to DAA.
We study the problem of federated stochastic multi-arm contextual bandits with unknown contexts, in which M agents are faced with different bandits and collaborate to learn. The communication model consists of a central server and the agents share their estimates with the central server periodically to learn to choose optimal actions in order to minimize the total regret. We assume that the exact contexts are not observable and the agents observe only a distribution of the contexts. Such a situation arises, for instance, when the context itself is a noisy measurement or based on a prediction mechanism. Our goal is to develop a distributed and federated algorithm that facilitates collaborative learning among the agents to select a sequence of optimal actions so as to maximize the cumulative reward. By performing a feature vector transformation, we propose an elimination-based algorithm and prove the regret bound for linearly parametrized reward functions. Finally, we validated the performance of our algorithm and compared it with another baseline approach using numerical simulations on synthetic data and on the real-world movielens dataset.
In the context of simulation-based methods, multiple challenges arise, two of which are considered in this work. As a first challenge, problems including time-dependent phenomena with complex domain deformations, potentially even with changes in the domain topology, need to be tackled appropriately. The second challenge arises when computational resources and the time for evaluating the model become critical in so-called many query scenarios for parametric problems. For example, these problems occur in optimization, uncertainty quantification (UQ), or automatic control and using highly resolved full-order models (FOMs) may become impractical. To address both types of complexity, we present a novel projection-based model order reduction (MOR) approach for deforming domain problems that takes advantage of the time-continuous space-time formulation. We apply it to two examples that are relevant for engineering or biomedical applications and conduct an error and performance analysis. In both cases, we are able to drastically reduce the computational expense for a model evaluation and, at the same time, to maintain an adequate accuracy level. All in all, this work indicates the effectiveness of the presented MOR approach for deforming domain problems taking advantage of a time-continuous space-time setting.
Wildlife camera trap images are being used extensively to investigate animal abundance, habitat associations, and behavior, which is complicated by the fact that experts must first classify the images manually. Artificial intelligence systems can take over this task but usually need a large number of already-labeled training images to achieve sufficient performance. This requirement necessitates human expert labor and poses a particular challenge for projects with few cameras or short durations. We propose a label-efficient learning strategy that enables researchers with small or medium-sized image databases to leverage the potential of modern machine learning, thus freeing crucial resources for subsequent analyses. Our methodological proposal is two-fold: (1) We improve current strategies of combining object detection and image classification by tuning the hyperparameters of both models. (2) We provide an active learning (AL) system that allows training deep learning models very efficiently in terms of required human-labeled training images. We supply a software package that enables researchers to use these methods directly and thereby ensure the broad applicability of the proposed framework in ecological practice. We show that our tuning strategy improves predictive performance. We demonstrate how the AL pipeline reduces the amount of pre-labeled data needed to achieve a specific predictive performance and that it is especially valuable for improving out-of-sample predictive performance. We conclude that the combination of tuning and AL increases predictive performance substantially. Furthermore, we argue that our work can broadly impact the community through the ready-to-use software package provided. Finally, the publication of our models tailored to European wildlife data enriches existing model bases mostly trained on data from Africa and North America.
Many real-world dynamical systems are associated with first integrals (a.k.a. invariant quantities), which are quantities that remain unchanged over time. The discovery and understanding of first integrals are fundamental and important topics both in the natural sciences and in industrial applications. First integrals arise from the conservation laws of system energy, momentum, and mass, and from constraints on states; these are typically related to specific geometric structures of the governing equations. Existing neural networks designed to ensure such first integrals have shown excellent accuracy in modeling from data. However, these models incorporate the underlying structures, and in most situations where neural networks learn unknown systems, these structures are also unknown. This limitation needs to be overcome for scientific discovery and modeling of unknown systems. To this end, we propose first integral-preserving neural differential equation (FINDE). By leveraging the projection method and the discrete gradient method, FINDE finds and preserves first integrals from data, even in the absence of prior knowledge about underlying structures. Experimental results demonstrate that FINDE can predict future states of target systems much longer and find various quantities consistent with well-known first integrals in a unified manner.
Federated learning (FL) has been proposed to protect data privacy and virtually assemble the isolated data silos by cooperatively training models among organizations without breaching privacy and security. However, FL faces heterogeneity from various aspects, including data space, statistical, and system heterogeneity. For example, collaborative organizations without conflict of interest often come from different areas and have heterogeneous data from different feature spaces. Participants may also want to train heterogeneous personalized local models due to non-IID and imbalanced data distribution and various resource-constrained devices. Therefore, heterogeneous FL is proposed to address the problem of heterogeneity in FL. In this survey, we comprehensively investigate the domain of heterogeneous FL in terms of data space, statistical, system, and model heterogeneity. We first give an overview of FL, including its definition and categorization. Then, We propose a precise taxonomy of heterogeneous FL settings for each type of heterogeneity according to the problem setting and learning objective. We also investigate the transfer learning methodologies to tackle the heterogeneity in FL. We further present the applications of heterogeneous FL. Finally, we highlight the challenges and opportunities and envision promising future research directions toward new framework design and trustworthy approaches.
The conjoining of dynamical systems and deep learning has become a topic of great interest. In particular, neural differential equations (NDEs) demonstrate that neural networks and differential equation are two sides of the same coin. Traditional parameterised differential equations are a special case. Many popular neural network architectures, such as residual networks and recurrent networks, are discretisations. NDEs are suitable for tackling generative problems, dynamical systems, and time series (particularly in physics, finance, ...) and are thus of interest to both modern machine learning and traditional mathematical modelling. NDEs offer high-capacity function approximation, strong priors on model space, the ability to handle irregular data, memory efficiency, and a wealth of available theory on both sides. This doctoral thesis provides an in-depth survey of the field. Topics include: neural ordinary differential equations (e.g. for hybrid neural/mechanistic modelling of physical systems); neural controlled differential equations (e.g. for learning functions of irregular time series); and neural stochastic differential equations (e.g. to produce generative models capable of representing complex stochastic dynamics, or sampling from complex high-dimensional distributions). Further topics include: numerical methods for NDEs (e.g. reversible differential equations solvers, backpropagation through differential equations, Brownian reconstruction); symbolic regression for dynamical systems (e.g. via regularised evolution); and deep implicit models (e.g. deep equilibrium models, differentiable optimisation). We anticipate this thesis will be of interest to anyone interested in the marriage of deep learning with dynamical systems, and hope it will provide a useful reference for the current state of the art.
We consider the problem of discovering $K$ related Gaussian directed acyclic graphs (DAGs), where the involved graph structures share a consistent causal order and sparse unions of supports. Under the multi-task learning setting, we propose a $l_1/l_2$-regularized maximum likelihood estimator (MLE) for learning $K$ linear structural equation models. We theoretically show that the joint estimator, by leveraging data across related tasks, can achieve a better sample complexity for recovering the causal order (or topological order) than separate estimations. Moreover, the joint estimator is able to recover non-identifiable DAGs, by estimating them together with some identifiable DAGs. Lastly, our analysis also shows the consistency of union support recovery of the structures. To allow practical implementation, we design a continuous optimization problem whose optimizer is the same as the joint estimator and can be approximated efficiently by an iterative algorithm. We validate the theoretical analysis and the effectiveness of the joint estimator in experiments.
Federated learning enables multiple parties to collaboratively train a machine learning model without communicating their local data. A key challenge in federated learning is to handle the heterogeneity of local data distribution across parties. Although many studies have been proposed to address this challenge, we find that they fail to achieve high performance in image datasets with deep learning models. In this paper, we propose MOON: model-contrastive federated learning. MOON is a simple and effective federated learning framework. The key idea of MOON is to utilize the similarity between model representations to correct the local training of individual parties, i.e., conducting contrastive learning in model-level. Our extensive experiments show that MOON significantly outperforms the other state-of-the-art federated learning algorithms on various image classification tasks.
With the rapid increase of large-scale, real-world datasets, it becomes critical to address the problem of long-tailed data distribution (i.e., a few classes account for most of the data, while most classes are under-represented). Existing solutions typically adopt class re-balancing strategies such as re-sampling and re-weighting based on the number of observations for each class. In this work, we argue that as the number of samples increases, the additional benefit of a newly added data point will diminish. We introduce a novel theoretical framework to measure data overlap by associating with each sample a small neighboring region rather than a single point. The effective number of samples is defined as the volume of samples and can be calculated by a simple formula $(1-\beta^{n})/(1-\beta)$, where $n$ is the number of samples and $\beta \in [0,1)$ is a hyperparameter. We design a re-weighting scheme that uses the effective number of samples for each class to re-balance the loss, thereby yielding a class-balanced loss. Comprehensive experiments are conducted on artificially induced long-tailed CIFAR datasets and large-scale datasets including ImageNet and iNaturalist. Our results show that when trained with the proposed class-balanced loss, the network is able to achieve significant performance gains on long-tailed datasets.