In Class-Incremental Learning (CIL) an image classification system is exposed to new classes in each learning session and must be updated incrementally. Methods approaching this problem have updated both the classification head and the feature extractor body at each session of CIL. In this work, we develop a baseline method, First Session Adaptation (FSA), that sheds light on the efficacy of existing CIL approaches and allows us to assess the relative performance contributions from head and body adaption. FSA adapts a pre-trained neural network body only on the first learning session and fixes it thereafter; a head based on linear discriminant analysis (LDA), is then placed on top of the adapted body, allowing exact updates through CIL. FSA is replay-free i.e.~it does not memorize examples from previous sessions of continual learning. To empirically motivate FSA, we first consider a diverse selection of 22 image-classification datasets, evaluating different heads and body adaptation techniques in high/low-shot offline settings. We find that the LDA head performs well and supports CIL out-of-the-box. We also find that Featurewise Layer Modulation (FiLM) adapters are highly effective in the few-shot setting, and full-body adaption in the high-shot setting. Second, we empirically investigate various CIL settings including high-shot CIL and few-shot CIL, including settings that have previously been used in the literature. We show that FSA significantly improves over the state-of-the-art in 15 of the 16 settings considered. FSA with FiLM adapters is especially performant in the few-shot setting. These results indicate that current approaches to continuous body adaptation are not working as expected. Finally, we propose a measure that can be applied to a set of unlabelled inputs which is predictive of the benefits of body adaptation.
The Adaptive Large Neighborhood Search (ALNS) algorithm has shown considerable success in solving complex combinatorial optimization problems (COPs). ALNS selects various heuristics adaptively during the search process, leveraging their strengths to find good solutions for optimization problems. However, the effectiveness of ALNS depends on the proper configuration of its selection and acceptance parameters. To address this limitation, we propose a Deep Reinforcement Learning (DRL) approach that selects heuristics, adjusts parameters, and controls the acceptance criteria during the search process. The proposed method aims to learn, based on the state of the search, how to configure the next iteration of the ALNS to obtain good solutions to the underlying optimization problem. We evaluate the proposed method on a time-dependent orienteering problem with stochastic weights and time windows, used in an IJCAI competition. The results show that our approach outperforms vanilla ALNS and ALNS tuned with Bayesian Optimization. In addition, it obtained better solutions than two state-of-the-art DRL approaches, which are the winning methods of the competition, with much fewer observations required for training. The implementation of our approach will be made publicly available.
Modern biomedical datasets are increasingly high dimensional and exhibit complex correlation structures. Generalized Linear Mixed Models (GLMMs) have long been employed to account for such dependencies. However, proper specification of the fixed and random effects in GLMMs is increasingly difficult in high dimensions, and computational complexity grows with increasing dimension of the random effects. We present a novel reformulation of the GLMM using a factor model decomposition of the random effects, enabling scalable computation of GLMMs in high dimensions by reducing the latent space from a large number of random effects to a smaller set of latent factors. We also extend our prior work to estimate model parameters using a modified Monte Carlo Expectation Conditional Minimization algorithm, allowing us to perform variable selection on both the fixed and random effects simultaneously. We show through simulation that through this factor model decomposition, our method can fit high dimensional penalized GLMMs faster than comparable methods and more easily scale to larger dimensions not previously seen in existing approaches.
Optimal transport (OT) is a powerful geometric tool used to compare and align probability measures following the least effort principle. Among many successful applications of OT in machine learning (ML), domain adaptation (DA) -- a field of study where the goal is to transfer a classifier from one labelled domain to another similar, yet different unlabelled or scarcely labelled domain -- has been historically among the most investigated ones. This success is due to the ability of OT to provide both a meaningful discrepancy measure to assess the similarity of two domains' distributions and a mapping that can project source domain data onto the target one. In this paper, we propose a principally new OT-based approach applied to DA that uses the closed-form solution of the OT problem given by an affine mapping and learns an embedding space for which this solution is optimal and computationally less complex. We show that our approach works in both homogeneous and heterogeneous DA settings and outperforms or is on par with other famous baselines based on both traditional OT and OT in incomparable spaces. Furthermore, we show that our proposed method vastly reduces computational complexity.
Regression methods assume that accurate labels are available for training. However, in certain scenarios, obtaining accurate labels may not be feasible, and relying on multiple specialists with differing opinions becomes necessary. Existing approaches addressing noisy labels often impose restrictive assumptions on the regression function. In contrast, this paper presents a novel, more flexible approach. Our method consists of two steps: estimating each labeler's expertise and combining their opinions using learned weights. We then regress the weighted average against the input features to build the prediction model. The proposed method is formally justified and empirically demonstrated to outperform existing techniques on simulated and real data. Furthermore, its flexibility enables the utilization of any machine learning technique in both steps. In summary, this method offers a simple, fast, and effective solution for training regression models with noisy labels derived from diverse expert opinions.
This paper introduces a novel Bayesian approach to detect changes in the variance of a Gaussian sequence model, focusing on quantifying the uncertainty in the change point locations and providing a scalable algorithm for inference. Such a measure of uncertainty is necessary when change point methods are deployed in sensitive applications, for example, when one is interested in determining whether an organ is viable for transplant. The key of our proposal is framing the problem as a product of multiple single changes in the scale parameter. We fit the model through an iterative procedure similar to what is done for additive models. The novelty is that each iteration returns a probability distribution on time instances, which captures the uncertainty in the change point location. Leveraging a recent result in the literature, we can show that our proposal is a variational approximation of the exact model posterior distribution. We study the algorithm's convergence and the change point localization rate. Extensive experiments in simulation studies illustrate the performance of our method and the possibility of generalizing it to more complex data-generating mechanisms. We apply the new model to an experiment involving a novel technique to assess the viability of a liver and oceanographic data.
Under partial-label learning (PLL) where, for each training instance, only a set of ambiguous candidate labels containing the unknown true label is accessible, contrastive learning has recently boosted the performance of PLL on vision tasks, attributed to representations learned by contrasting the same/different classes of entities. Without access to true labels, positive points are predicted using pseudo-labels that are inherently noisy, and negative points often require large batches or momentum encoders, resulting in unreliable similarity information and a high computational overhead. In this paper, we rethink a state-of-the-art contrastive PLL method PiCO[24], inspiring the design of a simple framework termed PaPi (Partial-label learning with a guided Prototypical classifier), which demonstrates significant scope for improvement in representation learning, thus contributing to label disambiguation. PaPi guides the optimization of a prototypical classifier by a linear classifier with which they share the same feature encoder, thus explicitly encouraging the representation to reflect visual similarity between categories. It is also technically appealing, as PaPi requires only a few components in PiCO with the opposite direction of guidance, and directly eliminates the contrastive learning module that would introduce noise and consume computational resources. We empirically demonstrate that PaPi significantly outperforms other PLL methods on various image classification tasks.
Providing online adaptive lightweight time series anomaly detection without human intervention and domain knowledge is highly valuable. Several such anomaly detection approaches have been introduced in the past years, but all of them were only implemented in one deep learning library. With the development of deep learning libraries, it is unclear how different deep learning libraries impact these anomaly detection approaches since there is no such evaluation available. Randomly choosing a deep learning library to implement an anomaly detection approach might not be able to show the true performance of the approach. It might also mislead users in believing one approach is better than another. Therefore, in this paper, we investigate the impact of deep learning libraries on online adaptive lightweight time series anomaly detection by implementing two state-of-the-art anomaly detection approaches in three well-known deep learning libraries and evaluating how these two approaches are individually affected by the three deep learning libraries. A series of experiments based on four real-world open-source time series datasets were conducted. The results provide a good reference to select an appropriate deep learning library for online adaptive lightweight anomaly detection.
Existing few-shot learning (FSL) methods assume that there exist sufficient training samples from source classes for knowledge transfer to target classes with few training samples. However, this assumption is often invalid, especially when it comes to fine-grained recognition. In this work, we define a new FSL setting termed few-shot fewshot learning (FSFSL), under which both the source and target classes have limited training samples. To overcome the source class data scarcity problem, a natural option is to crawl images from the web with class names as search keywords. However, the crawled images are inevitably corrupted by large amount of noise (irrelevant images) and thus may harm the performance. To address this problem, we propose a graph convolutional network (GCN)-based label denoising (LDN) method to remove the irrelevant images. Further, with the cleaned web images as well as the original clean training images, we propose a GCN-based FSL method. For both the LDN and FSL tasks, a novel adaptive aggregation GCN (AdarGCN) model is proposed, which differs from existing GCN models in that adaptive aggregation is performed based on a multi-head multi-level aggregation module. With AdarGCN, how much and how far information carried by each graph node is propagated in the graph structure can be determined automatically, therefore alleviating the effects of both noisy and outlying training samples. Extensive experiments show the superior performance of our AdarGCN under both the new FSFSL and the conventional FSL settings.
The goal of few-shot learning is to learn a classifier that generalizes well even when trained with a limited number of training instances per class. The recently introduced meta-learning approaches tackle this problem by learning a generic classifier across a large number of multiclass classification tasks and generalizing the model to a new task. Yet, even with such meta-learning, the low-data problem in the novel classification task still remains. In this paper, we propose Transductive Propagation Network (TPN), a novel meta-learning framework for transductive inference that classifies the entire test set at once to alleviate the low-data problem. Specifically, we propose to learn to propagate labels from labeled instances to unlabeled test instances, by learning a graph construction module that exploits the manifold structure in the data. TPN jointly learns both the parameters of feature embedding and the graph construction in an end-to-end manner. We validate TPN on multiple benchmark datasets, on which it largely outperforms existing few-shot learning approaches and achieves the state-of-the-art results.
The key issue of few-shot learning is learning to generalize. In this paper, we propose a large margin principle to improve the generalization capacity of metric based methods for few-shot learning. To realize it, we develop a unified framework to learn a more discriminative metric space by augmenting the softmax classification loss function with a large margin distance loss function for training. Extensive experiments on two state-of-the-art few-shot learning models, graph neural networks and prototypical networks, show that our method can improve the performance of existing models substantially with very little computational overhead, demonstrating the effectiveness of the large margin principle and the potential of our method.