Federated learning aims to infer a shared model from private and decentralized data stored locally by multiple clients. Personalized federated learning (PFL) goes one step further by adapting the global model to each client, enhancing the model's fit for different clients. A significant level of personalization is required for highly heterogeneous clients, but can be challenging to achieve especially when they have small datasets. To address this problem, we propose a PFL algorithm named PAC-PFL for learning probabilistic models within a PAC-Bayesian framework that utilizes differential privacy to handle data-dependent priors. Our algorithm collaboratively learns a shared hyper-posterior and regards each client's posterior inference as the personalization step. By establishing and minimizing a generalization bound on the average true risk of clients, PAC-PFL effectively combats over-fitting. PACPFL achieves accurate and well-calibrated predictions, supported by experiments on a dataset of photovoltaic panel power generation, FEMNIST dataset (Caldas et al., 2019), and Dirichlet-partitioned EMNIST dataset (Cohen et al., 2017).
Past analyses of reinforcement learning from human feedback (RLHF) assume that the human fully observes the environment. What happens when human feedback is based only on partial observations? We formally define two failure cases: deception and overjustification. Modeling the human as Boltzmann-rational w.r.t. a belief over trajectories, we prove conditions under which RLHF is guaranteed to result in policies that deceptively inflate their performance, overjustify their behavior to make an impression, or both. To help address these issues, we mathematically characterize how partial observability of the environment translates into (lack of) ambiguity in the learned return function. In some cases, accounting for partial observability makes it theoretically possible to recover the return function and thus the optimal policy, while in other cases, there is irreducible ambiguity. We caution against blindly applying RLHF in partially observable settings and propose research directions to help tackle these challenges.
Membership inference attacks (MIA) aim to detect if a particular data point was used in training a machine learning model. Recent strong attacks have high computational costs and inconsistent performance under varying conditions, rendering them unreliable for practical privacy risk assessment. We design a novel, efficient, and robust membership inference attack (RMIA) which accurately differentiates between population data and training data of a model, with minimal computational overhead. We achieve this by a more accurate modeling of the null hypothesis setting in our likelihood ratio tests, and effectively leveraging both reference models and reference data samples from the population. Our algorithm exhibits superior test power (true-positive rate) compared to prior methods, throughout the TPR-FPR curve including at extremely low false-positive rates (as low as 0). Under computation constraints, where only a limited number of pre-trained reference models (as few as 1) are available, and also when we vary other elements of the attack, our method performs exceptionally well, unlike some prior attacks that approach random guessing. RMIA outperforms the prior work in all configurations of the attack setup. RMIA lays the algorithmic groundwork for practical yet accurate and reliable privacy risk analysis in machine learning.
Multimodal learning seeks to utilize data from multiple sources to improve the overall performance of downstream tasks. It is desirable for redundancies in the data to make multimodal systems robust to missing or corrupted observations in some correlated modalities. However, we observe that the performance of several existing multimodal networks significantly deteriorates if one or multiple modalities are absent at test time. To enable robustness to missing modalities, we propose a simple and parameter-efficient adaptation procedure for pretrained multimodal networks. In particular, we exploit modulation of intermediate features to compensate for the missing modalities. We demonstrate that such adaptation can partially bridge performance drop due to missing modalities and outperform independent, dedicated networks trained for the available modality combinations in some cases. The proposed adaptation requires extremely small number of parameters (e.g., fewer than 0.7% of the total parameters) and applicable to a wide range of modality combinations and tasks. We conduct a series of experiments to highlight the missing modality robustness of our proposed method on 5 different datasets for multimodal semantic segmentation, multimodal material segmentation, and multimodal sentiment analysis tasks. Our proposed method demonstrates versatility across various tasks and datasets, and outperforms existing methods for robust multimodal learning with missing modalities.
Federated learning encounters substantial challenges with heterogeneous data, leading to performance degradation and convergence issues. While considerable progress has been achieved in mitigating such an impact, the reliability aspect of federated models has been largely disregarded. In this study, we conduct extensive experiments to investigate the reliability of both generic and personalized federated models. Our exploration uncovers a significant finding: \textbf{federated models exhibit unreliability when faced with heterogeneous data}, demonstrating poor calibration on in-distribution test data and low uncertainty levels on out-of-distribution data. This unreliability is primarily attributed to the presence of biased projection heads, which introduce miscalibration into the federated models. Inspired by this observation, we propose the "Assembled Projection Heads" (APH) method for enhancing the reliability of federated models. By treating the existing projection head parameters as priors, APH randomly samples multiple initialized parameters of projection heads from the prior and further performs targeted fine-tuning on locally available data under varying learning rates. Such a head ensemble introduces parameter diversity into the deterministic model, eliminating the bias and producing reliable predictions via head averaging. We evaluate the effectiveness of the proposed APH method across three prominent federated benchmarks. Experimental results validate the efficacy of APH in model calibration and uncertainty estimation. Notably, APH can be seamlessly integrated into various federated approaches but only requires less than 30\% additional computation cost for 100$\times$ inferences within large models.
Inverse reinforcement learning (IRL) aims to recover the reward function of an expert agent from demonstrations of behavior. It is well known that the IRL problem is fundamentally ill-posed, i.e., many reward functions can explain the demonstrations. For this reason, IRL has been recently reframed in terms of estimating the feasible reward set, thus, postponing the selection of a single reward. However, so far, the available formulations and algorithmic solutions have been proposed and analyzed mainly for the online setting, where the learner can interact with the environment and query the expert at will. This is clearly unrealistic in most practical applications, where the availability of an offline dataset is a much more common scenario. In this paper, we introduce a novel notion of feasible reward set capturing the opportunities and limitations of the offline setting and we analyze the complexity of its estimation. This requires the introduction an original learning framework that copes with the intrinsic difficulty of the setting, for which the data coverage is not under control. Then, we propose two computationally and statistically efficient algorithms, IRLO and PIRLO, for addressing the problem. In particular, the latter adopts a specific form of pessimism to enforce the novel desirable property of inclusion monotonicity of the delivered feasible set. With this work, we aim to provide a panorama of the challenges of the offline IRL problem and how they can be fruitfully addressed.
Neural models that do not rely on pre-training have excelled in the keyphrase generation task with large annotated datasets. Meanwhile, new approaches have incorporated pre-trained language models (PLMs) for their data efficiency. However, there lacks a systematic study of how the two types of approaches compare and how different design choices can affect the performance of PLM-based models. To fill in this knowledge gap and facilitate a more informed use of PLMs for keyphrase extraction and keyphrase generation, we present an in-depth empirical study. Formulating keyphrase extraction as sequence labeling and keyphrase generation as sequence-to-sequence generation, we perform extensive experiments in three domains. After showing that PLMs have competitive high-resource performance and state-of-the-art low-resource performance, we investigate important design choices including in-domain PLMs, PLMs with different pre-training objectives, using PLMs with a parameter budget, and different formulations for present keyphrases. Further results show that (1) in-domain BERT-like PLMs can be used to build strong and data-efficient keyphrase generation models; (2) with a fixed parameter budget, prioritizing model depth over width and allocating more layers in the encoder leads to better encoder-decoder models; and (3) introducing four in-domain PLMs, we achieve a competitive performance in the news domain and the state-of-the-art performance in the scientific domain.
Object detection is a fundamental task in computer vision and image processing. Current deep learning based object detectors have been highly successful with abundant labeled data. But in real life, it is not guaranteed that each object category has enough labeled samples for training. These large object detectors are easy to overfit when the training data is limited. Therefore, it is necessary to introduce few-shot learning and zero-shot learning into object detection, which can be named low-shot object detection together. Low-Shot Object Detection (LSOD) aims to detect objects from a few or even zero labeled data, which can be categorized into few-shot object detection (FSOD) and zero-shot object detection (ZSD), respectively. This paper conducts a comprehensive survey for deep learning based FSOD and ZSD. First, this survey classifies methods for FSOD and ZSD into different categories and discusses the pros and cons of them. Second, this survey reviews dataset settings and evaluation metrics for FSOD and ZSD, then analyzes the performance of different methods on these benchmarks. Finally, this survey discusses future challenges and promising directions for FSOD and ZSD.
As an effective strategy, data augmentation (DA) alleviates data scarcity scenarios where deep learning techniques may fail. It is widely applied in computer vision then introduced to natural language processing and achieves improvements in many tasks. One of the main focuses of the DA methods is to improve the diversity of training data, thereby helping the model to better generalize to unseen testing data. In this survey, we frame DA methods into three categories based on the diversity of augmented data, including paraphrasing, noising, and sampling. Our paper sets out to analyze DA methods in detail according to the above categories. Further, we also introduce their applications in NLP tasks as well as the challenges.
In contrast to batch learning where all training data is available at once, continual learning represents a family of methods that accumulate knowledge and learn continuously with data available in sequential order. Similar to the human learning process with the ability of learning, fusing, and accumulating new knowledge coming at different time steps, continual learning is considered to have high practical significance. Hence, continual learning has been studied in various artificial intelligence tasks. In this paper, we present a comprehensive review of the recent progress of continual learning in computer vision. In particular, the works are grouped by their representative techniques, including regularization, knowledge distillation, memory, generative replay, parameter isolation, and a combination of the above techniques. For each category of these techniques, both its characteristics and applications in computer vision are presented. At the end of this overview, several subareas, where continuous knowledge accumulation is potentially helpful while continual learning has not been well studied, are discussed.
Deep learning applies multiple processing layers to learn representations of data with multiple levels of feature extraction. This emerging technique has reshaped the research landscape of face recognition since 2014, launched by the breakthroughs of Deepface and DeepID methods. Since then, deep face recognition (FR) technique, which leverages the hierarchical architecture to learn discriminative face representation, has dramatically improved the state-of-the-art performance and fostered numerous successful real-world applications. In this paper, we provide a comprehensive survey of the recent developments on deep FR, covering the broad topics on algorithms, data, and scenes. First, we summarize different network architectures and loss functions proposed in the rapid evolution of the deep FR methods. Second, the related face processing methods are categorized into two classes: `one-to-many augmentation' and `many-to-one normalization'. Then, we summarize and compare the commonly used databases for both model training and evaluation. Third, we review miscellaneous scenes in deep FR, such as cross-factor, heterogenous, multiple-media and industry scenes. Finally, potential deficiencies of the current methods and several future directions are highlighted.