Behavioral models enable the analysis of the functionality of software product lines (SPL), e.g., model checking and model-based testing. Model learning aims at constructing behavioral models for software systems in some form of a finite state machine. Due to the commonalities among the products of an SPL, it is possible to reuse the previously learned models during the model learning process. In this paper, an adaptive approach (the $\text{PL}^*$ method) for learning the product models of an SPL is presented based on the well-known $L^*$ algorithm. In this method, after model learning of each product, the sequences in the final observation table are stored in a repository which will be used to initialize the observation table of the remaining products to be learned. The proposed algorithm is evaluated on two open-source SPLs and the total learning cost is measured in terms of the number of rounds, the total number of resets and input symbols. The results show that for complex SPLs, the total learning cost for the $\text{PL}^*$ method is significantly lower than that of the non-adaptive learning method in terms of all three metrics. Furthermore, it is observed that the order in which the products are learned affects the efficiency of the $\text{PL}^*$ method. Based on this observation, we introduced a heuristic to determine an ordering which reduces the total cost of adaptive learning in both case studies.
The aim of Few-Shot learning methods is to train models which can easily adapt to previously unseen tasks, based on small amounts of data. One of the most popular and elegant Few-Shot learning approaches is Model-Agnostic Meta-Learning (MAML). The main idea behind this method is to learn the general weights of the meta-model, which are further adapted to specific problems in a small number of gradient steps. However, the model's main limitation lies in the fact that the update procedure is realized by gradient-based optimisation. In consequence, MAML cannot always modify weights to the essential level in one or even a few gradient iterations. On the other hand, using many gradient steps results in a complex and time-consuming optimization procedure, which is hard to train in practice, and may lead to overfitting. In this paper, we propose HyperMAML, a novel generalization of MAML, where the training of the update procedure is also part of the model. Namely, in HyperMAML, instead of updating the weights with gradient descent, we use for this purpose a trainable Hypernetwork. Consequently, in this framework, the model can generate significant updates whose range is not limited to a fixed number of gradient steps. Experiments show that HyperMAML consistently outperforms MAML and performs comparably to other state-of-the-art techniques in a number of standard Few-Shot learning benchmarks.
As deep reinforcement learning (RL) showcases its strengths in networking and systems, its pitfalls also come to the public's attention--when trained to handle a wide range of network workloads and previously unseen deployment environments, RL policies often manifest suboptimal performance and poor generalizability. To tackle these problems, we present Genet, a new training framework for learning better RL-based network adaptation algorithms. Genet is built on the concept of curriculum learning, which has proved effective against similar issues in other domains where RL is extensively employed. At a high level, curriculum learning gradually presents more difficult environments to the training, rather than choosing them randomly, so that the current RL model can make meaningful progress in training. However, applying curriculum learning in networking is challenging because it remains unknown how to measure the "difficulty" of a network environment. Instead of relying on handcrafted heuristics to determine the environment's difficulty level, our insight is to utilize traditional rule-based (non-RL) baselines: If the current RL model performs significantly worse in a network environment than the baselines, then the model's potential to improve when further trained in this environment is substantial. Therefore, Genet automatically searches for the environments where the current model falls significantly behind a traditional baseline scheme and iteratively promotes these environments as the training progresses. Through evaluating Genet on three use cases--adaptive video streaming, congestion control, and load balancing, we show that Genet produces RL policies which outperform both regularly trained RL policies and traditional baselines in each context, not only under synthetic workloads but also in real environments.
In this paper, we present a predictor-corrector strategy for constructing rank-adaptive dynamical low-rank approximations (DLRAs) of matrix-valued ODE systems. The strategy is a compromise between (i) low-rank step-truncation approaches that alternately evolve and compress solutions and (ii) strict DLRA approaches that augment the low-rank manifold using subspaces generated locally in time by the DLRA integrator. The strategy is based on an analysis of the error between a forward temporal update into the ambient full-rank space, which is typically computed in a step-truncation approach before re-compressing, and the standard DLRA update, which is forced to live in a low-rank manifold. We use this error, without requiring its full-rank representation, to correct the DLRA solution. A key ingredient for maintaining a low-rank representation of the error is a randomized singular value decomposition (SVD), which introduces some degree of stochastic variability into the implementation. The strategy is formulated and implemented in the context of discontinuous Galerkin spatial discretizations of partial differential equations and applied to several versions of DLRA methods found in the literature, as well as a new variant. Numerical experiments comparing the predictor-corrector strategy to other methods demonstrate robustness to overcome short-comings of step truncation or strict DLRA approaches: the former may require more memory than is strictly needed while the latter may miss transients solution features that cannot be recovered. The effect of randomization, tolerances, and other implementation parameters is also explored.
This work introduces a formulation of model predictive control (MPC) which adaptively reasons about the complexity of the model based on the task while maintaining feasibility and stability guarantees. Existing MPC implementations often handle computational complexity by shortening prediction horizons or simplifying models, both of which can result in instability. Inspired by related approaches in behavioral economics, motion planning, and biomechanics, our method solves MPC problems with a simple model for dynamics and constraints over regions of the horizon where such a model is feasible and a complex model where it is not. The approach leverages an interleaving of planning and execution to iteratively identify these regions, which can be safely simplified if they satisfy an exact template/anchor relationship. We show that this method does not compromise the stability and feasibility properties of the system, and measure performance in simulation experiments on a quadrupedal robot executing agile behaviors over terrains of interest. We find that this adaptive method enables more agile motion and expands the range of executable tasks compared to fixed-complexity implementations.
Recent advances in legged locomotion have enabled quadrupeds to walk on challenging terrains. However, bipedal robots are inherently more unstable and hence it's harder to design walking controllers for them. In this work, we leverage recent advances in rapid adaptation for locomotion control, and extend them to work on bipedal robots. Similar to existing works, we start with a base policy which produces actions while taking as input an estimated extrinsics vector from an adaptation module. This extrinsics vector contains information about the environment and enables the walking controller to rapidly adapt online. However, the extrinsics estimator could be imperfect, which might lead to poor performance of the base policy which expects a perfect estimator. In this paper, we propose A-RMA (Adapting RMA), which additionally adapts the base policy for the imperfect extrinsics estimator by finetuning it using model-free RL. We demonstrate that A-RMA outperforms a number of RL-based baseline controllers and model-based controllers in simulation, and show zero-shot deployment of a single A-RMA policy to enable a bipedal robot, Cassie, to walk in a variety of different scenarios in the real world beyond what it has seen during training. Videos and results at //ashish-kmr.github.io/a-rma/
Class Incremental Learning (CIL) aims at learning a multi-class classifier in a phase-by-phase manner, in which only data of a subset of the classes are provided at each phase. Previous works mainly focus on mitigating forgetting in phases after the initial one. However, we find that improving CIL at its initial phase is also a promising direction. Specifically, we experimentally show that directly encouraging CIL Learner at the initial phase to output similar representations as the model jointly trained on all classes can greatly boost the CIL performance. Motivated by this, we study the difference between a na\"ively-trained initial-phase model and the oracle model. Specifically, since one major difference between these two models is the number of training classes, we investigate how such difference affects the model representations. We find that, with fewer training classes, the data representations of each class lie in a long and narrow region; with more training classes, the representations of each class scatter more uniformly. Inspired by this observation, we propose Class-wise Decorrelation (CwD) that effectively regularizes representations of each class to scatter more uniformly, thus mimicking the model jointly trained with all classes (i.e., the oracle model). Our CwD is simple to implement and easy to plug into existing methods. Extensive experiments on various benchmark datasets show that CwD consistently and significantly improves the performance of existing state-of-the-art methods by around 1\% to 3\%. Code will be released.
While recent studies on semi-supervised learning have shown remarkable progress in leveraging both labeled and unlabeled data, most of them presume a basic setting of the model is randomly initialized. In this work, we consider semi-supervised learning and transfer learning jointly, leading to a more practical and competitive paradigm that can utilize both powerful pre-trained models from source domain as well as labeled/unlabeled data in the target domain. To better exploit the value of both pre-trained weights and unlabeled target examples, we introduce adaptive consistency regularization that consists of two complementary components: Adaptive Knowledge Consistency (AKC) on the examples between the source and target model, and Adaptive Representation Consistency (ARC) on the target model between labeled and unlabeled examples. Examples involved in the consistency regularization are adaptively selected according to their potential contributions to the target task. We conduct extensive experiments on several popular benchmarks including CUB-200-2011, MIT Indoor-67, MURA, by fine-tuning the ImageNet pre-trained ResNet-50 model. Results show that our proposed adaptive consistency regularization outperforms state-of-the-art semi-supervised learning techniques such as Pseudo Label, Mean Teacher, and MixMatch. Moreover, our algorithm is orthogonal to existing methods and thus able to gain additional improvements on top of MixMatch and FixMatch. Our code is available at //github.com/SHI-Labs/Semi-Supervised-Transfer-Learning.
Recent advances in maximizing mutual information (MI) between the source and target have demonstrated its effectiveness in text generation. However, previous works paid little attention to modeling the backward network of MI (i.e., dependency from the target to the source), which is crucial to the tightness of the variational information maximization lower bound. In this paper, we propose Adversarial Mutual Information (AMI): a text generation framework which is formed as a novel saddle point (min-max) optimization aiming to identify joint interactions between the source and target. Within this framework, the forward and backward networks are able to iteratively promote or demote each other's generated instances by comparing the real and synthetic data distributions. We also develop a latent noise sampling strategy that leverages random variations at the high-level semantic space to enhance the long term dependency in the generation process. Extensive experiments based on different text generation tasks demonstrate that the proposed AMI framework can significantly outperform several strong baselines, and we also show that AMI has potential to lead to a tighter lower bound of maximum mutual information for the variational information maximization problem.
We propose a new method for event extraction (EE) task based on an imitation learning framework, specifically, inverse reinforcement learning (IRL) via generative adversarial network (GAN). The GAN estimates proper rewards according to the difference between the actions committed by the expert (or ground truth) and the agent among complicated states in the environment. EE task benefits from these dynamic rewards because instances and labels yield to various extents of difficulty and the gains are expected to be diverse -- e.g., an ambiguous but correctly detected trigger or argument should receive high gains -- while the traditional RL models usually neglect such differences and pay equal attention on all instances. Moreover, our experiments also demonstrate that the proposed framework outperforms state-of-the-art methods, without explicit feature engineering.
Object detection typically assumes that training and test data are drawn from an identical distribution, which, however, does not always hold in practice. Such a distribution mismatch will lead to a significant performance drop. In this work, we aim to improve the cross-domain robustness of object detection. We tackle the domain shift on two levels: 1) the image-level shift, such as image style, illumination, etc, and 2) the instance-level shift, such as object appearance, size, etc. We build our approach based on the recent state-of-the-art Faster R-CNN model, and design two domain adaptation components, on image level and instance level, to reduce the domain discrepancy. The two domain adaptation components are based on H-divergence theory, and are implemented by learning a domain classifier in adversarial training manner. The domain classifiers on different levels are further reinforced with a consistency regularization to learn a domain-invariant region proposal network (RPN) in the Faster R-CNN model. We evaluate our newly proposed approach using multiple datasets including Cityscapes, KITTI, SIM10K, etc. The results demonstrate the effectiveness of our proposed approach for robust object detection in various domain shift scenarios.