We study the problem of efficiently generating high-quality and diverse content in games. Previous work on automated deckbuilding in Hearthstone shows that the quality diversity algorithm MAP-Elites can generate a collection of high-performing decks with diverse strategic gameplay. However, MAP-Elites requires a large number of expensive evaluations to discover a diverse collection of decks. We propose assisting MAP-Elites with a deep surrogate model trained online to predict game outcomes with respect to candidate decks. MAP-Elites discovers a diverse dataset to improve the surrogate model accuracy, while the surrogate model helps guide MAP-Elites towards promising new content. In a Hearthstone deckbuilding case study, we show that our approach improves the sample efficiency of MAP-Elites and outperforms a model trained offline with random decks, as well as a linear surrogate model baseline, setting a new state-of-the-art for quality diversity approaches in automated Hearthstone deckbuilding. We include the source code for all the experiments at: //github.com/icaros-usc/EvoStone2.
Physical motion models offer interpretable predictions for the motion of vehicles. However, some model parameters, such as those related to aero- and hydrodynamics, are expensive to measure and are often only roughly approximated reducing prediction accuracy. Recurrent neural networks achieve high prediction accuracy at low cost, as they can use cheap measurements collected during routine operation of the vehicle, but their results are hard to interpret. To precisely predict vehicle states without expensive measurements of physical parameters, we propose a hybrid approach combining deep learning and physical motion models including a novel two-phase training procedure. We achieve interpretability by restricting the output range of the deep neural network as part of the hybrid model, which limits the uncertainty introduced by the neural network to a known quantity. We have evaluated our approach for the use case of ship and quadcopter motion. The results show that our hybrid model can improve model interpretability with no decrease in accuracy compared to existing deep learning approaches.
Image classifiers often rely overly on peripheral attributes that have a strong correlation with the target class (i.e., dataset bias) when making predictions. Recently, a myriad of studies focus on mitigating such dataset bias, the task of which is referred to as debiasing. However, these debiasing methods often have inconsistent experimental settings (e.g., datasets and neural network architectures). Additionally, most of the previous studies in debiasing do not specify how they select their model parameters which involve early stopping and hyper-parameter tuning. The goal of this paper is to standardize the inconsistent experimental settings and propose a consistent model parameter selection criterion for debiasing. Based on such unified experimental settings and model parameter selection criterion, we build a benchmark named DebiasBench which includes five datasets and seven debiasing methods. We carefully conduct extensive experiments in various aspects and show that different state-of-the-art methods work best in different datasets, respectively. Even, the vanilla method, the method with no debiasing module, also shows competitive results in datasets with low bias severity. We publicly release the implementation of existing debiasing methods in DebiasBench to encourage future researchers in debiasing to conduct fair comparisons and further push the state-of-the-art performances.
Personalised federated learning (FL) aims at collaboratively learning a machine learning model taylored for each client. Albeit promising advances have been made in this direction, most of existing approaches works do not allow for uncertainty quantification which is crucial in many applications. In addition, personalisation in the cross-device setting still involves important issues, especially for new clients or those having small number of observations. This paper aims at filling these gaps. To this end, we propose a novel methodology coined FedPop by recasting personalised FL into the population modeling paradigm where clients' models involve fixed common population parameters and random effects, aiming at explaining data heterogeneity. To derive convergence guarantees for our scheme, we introduce a new class of federated stochastic optimisation algorithms which relies on Markov chain Monte Carlo methods. Compared to existing personalised FL methods, the proposed methodology has important benefits: it is robust to client drift, practical for inference on new clients, and above all, enables uncertainty quantification under mild computational and memory overheads. We provide non-asymptotic convergence guarantees for the proposed algorithms and illustrate their performances on various personalised federated learning tasks.
Two of the most significant challenges in uncertainty quantification pertain to the high computational cost for simulating complex physical models and the high dimension of the random inputs. In applications of practical interest, both of these problems are encountered, and standard methods either fail or are not feasible. To overcome the current limitations, we present a generalized formulation of a Bayesian multi-fidelity Monte-Carlo (BMFMC) framework that can exploit lower-fidelity model versions in a small data regime. The goal of our analysis is an efficient and accurate estimation of the complete probabilistic response for high-fidelity models. BMFMC circumvents the curse of dimensionality by learning the relationship between the outputs of a reference high-fidelity model and potentially several lower-fidelity models. While the continuous formulation is mathematically exact and independent of the low-fidelity model's accuracy, we address challenges associated with the small data regime (i.e., only a small number of 50 to 300 high-fidelity model runs can be performed). Specifically, we complement the formulation with a set of informative input features at no extra cost. Despite the inaccurate and noisy information that some low-fidelity models provide, we demonstrate that accurate and certifiable estimates for the quantities of interest can be obtained for uncertainty quantification problems in high stochastic dimensions, with significantly fewer high-fidelity model runs than state-of-the-art methods for uncertainty quantification. We illustrate our approach by applying it to challenging numerical examples such as Navier-Stokes flow simulations and fluid-structure interaction problems.
To investigate the heterogeneity of federated learning in real-world scenarios, we generalize the classical federated learning to federated hetero-task learning, which emphasizes the inconsistency across the participants in federated learning in terms of both data distribution and learning tasks. We also present B-FHTL, a federated hetero-task learning benchmark consisted of simulation dataset, FL protocols and a unified evaluation mechanism. B-FHTL dataset contains three well-designed federated learning tasks with increasing heterogeneity. Each task simulates the clients with different data distributions and learning tasks. To ensure fair comparison among different FL algorithms, B-FHTL builds in a full suite of FL protocols by providing high-level APIs to avoid privacy leakage, and presets most common evaluation metrics spanning across different learning tasks, such as regression, classification, text generation and etc. Furthermore, we compare the FL algorithms in fields of federated multi-task learning, federated personalization and federated meta learning within B-FHTL, and highlight the influence of heterogeneity and difficulties of federated hetero-task learning. Our benchmark, including the federated dataset, protocols, the evaluation mechanism and the preliminary experiment, is open-sourced at //github.com/alibaba/FederatedScope/tree/contest/v1.0.
In this paper, we study a novel episodic risk-sensitive Reinforcement Learning (RL) problem, named Iterated CVaR RL, where the objective is to maximize the tail of the reward-to-go at each step. Different from existing risk-aware RL formulations, Iterated CVaR RL focuses on safety-at-all-time, by enabling the agent to tightly control the risk of getting into catastrophic situations at each stage, and is applicable to important risk-sensitive tasks that demand strong safety guarantees throughout the decision process, such as autonomous driving, clinical treatment planning and robotics. We investigate Iterated CVaR RL with two performance metrics, i.e., Regret Minimization and Best Policy Identification. For both metrics, we design efficient algorithms ICVaR-RM and ICVaR-BPI, respectively, and provide matching upper and lower bounds with respect to the number of episodes $K$. We also investigate an interesting limiting case of Iterated CVaR RL, called Worst Path RL, where the objective becomes to maximize the minimum possible cumulative reward, and propose an efficient algorithm with constant upper and lower bounds. Finally, the techniques we develop for bounding the change of CVaR due to the value function shift and decomposing the regret via a distorted visitation distribution are novel and can find applications in other risk-sensitive online learning problems.
A growing body of research runs human subject evaluations to study whether providing users with explanations of machine learning models can help them with practical real-world use cases. However, running user studies is challenging and costly, and consequently each study typically only evaluates a limited number of different settings, e.g., studies often only evaluate a few arbitrarily selected explanation methods. To address these challenges and aid user study design, we introduce Use-Case-Grounded Simulated Evaluations (SimEvals). SimEvals involve training algorithmic agents that take as input the information content (such as model explanations) that would be presented to each participant in a human subject study, to predict answers to the use case of interest. The algorithmic agent's test set accuracy provides a measure of the predictiveness of the information content for the downstream use case. We run a comprehensive evaluation on three real-world use cases (forward simulation, model debugging, and counterfactual reasoning) to demonstrate that Simevals can effectively identify which explanation methods will help humans for each use case. These results provide evidence that SimEvals can be used to efficiently screen an important set of user study design decisions, e.g. selecting which explanations should be presented to the user, before running a potentially costly user study.
There are many examples of cases where access to improved models of human behavior and cognition has allowed creation of robots which can better interact with humans, and not least in road vehicle automation this is a rapidly growing area of research. Human-robot interaction (HRI) therefore provides an important applied setting for human behavior modeling - but given the vast complexity of human behavior, how complete and accurate do these models need to be? Here, we outline some possible ways of thinking about this problem, starting from the suggestion that modelers need to keep the right end goal in sight: A successful human-robot interaction, in terms of safety, performance, and human satisfaction. Efforts toward model completeness and accuracy should be focused on those aspects of human behavior to which interaction success is most sensitive. We emphasise that identifying which those aspects are is a difficult scientific objective in its own right, distinct for each given HRI context. We propose and exemplify an approach to formulating a priori hypotheses on this matter, in cases where robots are to be involved in interactions which currently take place between humans, such as in automated driving. Our perspective also highlights some possible risks of overreliance on machine-learned models of human behavior in HRI, and how to mitigate against those risks.
Bionic underwater robots have demonstrated their superiority in many applications. Yet, training their intelligence for a variety of tasks that mimic the behavior of underwater creatures poses a number of challenges in practice, mainly due to lack of a large amount of available training data as well as the high cost in real physical environment. Alternatively, simulation has been considered as a viable and important tool for acquiring datasets in different environments, but it mostly targeted rigid and soft body systems. There is currently dearth of work for more complex fluid systems interacting with immersed solids that can be efficiently and accurately simulated for robot training purposes. In this paper, we propose a new platform called "FishGym", which can be used to train fish-like underwater robots. The framework consists of a robotic fish modeling module using articulated body with skinning, a GPU-based high-performance localized two-way coupled fluid-structure interaction simulation module that handles both finite and infinitely large domains, as well as a reinforcement learning module. We leveraged existing training methods with adaptations to underwater fish-like robots and obtained learned control policies for multiple benchmark tasks. The training results are demonstrated with reasonable motion trajectories, with comparisons and analyses to empirical models as well as known real fish swimming behaviors to highlight the advantages of the proposed platform.
Deep neural networks (DNNs) are successful in many computer vision tasks. However, the most accurate DNNs require millions of parameters and operations, making them energy, computation and memory intensive. This impedes the deployment of large DNNs in low-power devices with limited compute resources. Recent research improves DNN models by reducing the memory requirement, energy consumption, and number of operations without significantly decreasing the accuracy. This paper surveys the progress of low-power deep learning and computer vision, specifically in regards to inference, and discusses the methods for compacting and accelerating DNN models. The techniques can be divided into four major categories: (1) parameter quantization and pruning, (2) compressed convolutional filters and matrix factorization, (3) network architecture search, and (4) knowledge distillation. We analyze the accuracy, advantages, disadvantages, and potential solutions to the problems with the techniques in each category. We also discuss new evaluation metrics as a guideline for future research.