The field of artificial intelligence (AI) is devoted to the creation of artificial decision-makers that can perform (at least) on par with their human counterparts in a domain of interest. Unlike agents in traditional AI, agents in artificial general intelligence (AGI) are required to replicate human intelligence in almost every domain of interest. Moreover, an AGI agent should be able to achieve this without (virtually) any further changes, retraining, or fine-tuning of its parameters. The real world is non-stationary, non-ergodic, and non-Markovian: we humans can neither revisit our past nor are our most recent observations sufficient statistics. Yet we excel at a variety of complex tasks, many of which require long-term planning. We can attribute this success to our natural faculty for abstracting away task-irrelevant information from our overwhelming sensory experience: we build task-specific mental models of the world without much effort. Thanks to this ability to abstract, we can plan over a significantly more compact representation of a task without much loss of performance. Moreover, we also abstract our actions to produce high-level plans; the level of action abstraction can range from small muscle movements to the mental notion of "doing an action". It is natural to assume that any AGI agent competing with humans (in every plausible domain) should also have these abilities to abstract its experiences and actions. This thesis is an inquiry into the existence of such abstractions that aid efficient planning in a wide range of domains and, most importantly, come with optimality guarantees.
Applications of reinforcement learning (RL) in which agents must make a sequence of decisions despite lacking complete information about the latent state of the controlled system (that is, they act under partial observability) are ubiquitous. Partially observable RL can be notoriously difficult: well-known information-theoretic results show that learning partially observable Markov decision processes (POMDPs) requires an exponential number of samples in the worst case. Yet, this does not rule out the existence of large subclasses of POMDPs over which learning is tractable. In this paper, we identify such a subclass, which we call weakly revealing POMDPs. This family rules out the pathological instances of POMDPs where observations are uninformative to a degree that makes learning hard. We prove that for weakly revealing POMDPs, a simple algorithm combining optimism and Maximum Likelihood Estimation (MLE) is sufficient to guarantee polynomial sample complexity. To the best of our knowledge, this is the first provably sample-efficient result for learning from interactions in overcomplete POMDPs, where the number of latent states can be larger than the number of observations.
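To make the optimism-plus-MLE recipe concrete, the following is a minimal sketch, not the paper's algorithm or analysis: it maintains a finite set of toy candidate models, keeps the ones whose log-likelihood is within a confidence width of the maximum, and acts according to the most optimistic surviving model. The names `CandidateModel`, `optimal_value`, and the constant `BETA` are illustrative assumptions.

```python
# Hedged sketch of optimism + MLE over a finite candidate model class.
import math
import random

random.seed(0)

class CandidateModel:
    """Toy candidate with a single parameter: P(observation = 0)."""
    def __init__(self, p_obs):
        self.p_obs = p_obs

    def log_likelihood(self, episodes):
        # Likelihood of observed (action, observation) pairs; simplified to an
        # i.i.d. observation model purely for brevity.
        ll = 0.0
        for episode in episodes:
            for _, obs in episode:
                p = self.p_obs if obs == 0 else 1.0 - self.p_obs
                ll += math.log(max(p, 1e-12))
        return ll

    def optimal_value(self):
        # Stand-in for planning: value of this model's optimal policy.
        return self.p_obs

def simulate_episode(model, horizon=5):
    # Environment with true emission probability 0.7 (unknown to the learner);
    # the chosen model would normally dictate the executed policy.
    return [(0, 0 if random.random() < 0.7 else 1) for _ in range(horizon)]

models = [CandidateModel(p) for p in (0.3, 0.5, 0.7, 0.9)]
episodes, BETA = [], 2.0  # BETA: width of the MLE confidence set (assumption)

for _ in range(50):
    if episodes:
        best_ll = max(m.log_likelihood(episodes) for m in models)
        confidence_set = [m for m in models
                          if m.log_likelihood(episodes) >= best_ll - BETA]
    else:
        confidence_set = models
    # Optimism: act according to the most favorable model still in the set.
    optimistic = max(confidence_set, key=lambda m: m.optimal_value())
    episodes.append(simulate_episode(optimistic))

print("selected emission parameter:", optimistic.p_obs)
```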
Class-imbalanced distributions are widespread in real-world engineering. However, mainstream optimization algorithms that seek to minimize error trap deep learning models in sub-optima when facing extreme class imbalance, which seriously harms classification precision, especially on the minority classes. The essential reason is that the gradients of the classifier weights are imbalanced among the components contributed by different classes. In this paper, we propose the Attraction-Repulsion-Balanced Loss (ARB-Loss) to balance these gradient components. We perform experiments on large-scale classification and segmentation datasets, and ARB-Loss achieves state-of-the-art performance with only one-stage training, instead of the two-stage training used by current state-of-the-art methods.
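As a rough illustration of the general idea of rebalancing per-class gradient contributions, here is a minimal PyTorch sketch; it is not the ARB-Loss formulation itself (the attraction/repulsion decomposition is defined in the paper), only a frequency-based reweighting of the cross-entropy, with the helper name and toy counts as assumptions.

```python
# Hedged sketch: reweight per-class loss so minority classes contribute
# comparable gradients to the classifier weights. NOT the actual ARB-Loss.
import torch
import torch.nn.functional as F

def class_balanced_ce(logits, targets, class_counts):
    """Cross-entropy with per-class weights inversely proportional to class frequency."""
    weights = class_counts.sum() / (len(class_counts) * class_counts.float())
    return F.cross_entropy(logits, targets, weight=weights)

# Toy usage: 3 classes with an extreme 100:10:1 imbalance.
counts = torch.tensor([100, 10, 1])
logits = torch.randn(8, 3, requires_grad=True)
targets = torch.randint(0, 3, (8,))
loss = class_balanced_ce(logits, targets, counts)
loss.backward()
print(float(loss))
```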
We present a data-efficient framework for solving sequential decision-making problems that combines reinforcement learning (RL) with latent variable generative models. The framework, called GenRL, trains deep policies by introducing an action latent variable such that the feed-forward policy search can be divided into two parts: (i) training a sub-policy that outputs a distribution over the action latent variable given a state of the system, and (ii) unsupervised training of a generative model that outputs a sequence of motor actions conditioned on the latent action variable. GenRL enables safe exploration and alleviates the data-inefficiency problem as it exploits prior knowledge about valid sequences of motor actions. Moreover, we provide a set of measures for evaluating generative models such that we can predict the performance of the RL policy training prior to the actual training on a physical robot. We experimentally determine the characteristics of generative models that have the most influence on the performance of the final policy training on two robotics tasks: shooting a hockey puck and throwing a basketball. Furthermore, we empirically demonstrate that, compared with two state-of-the-art RL methods, GenRL is the only method that can safely and efficiently solve these robotics tasks.
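The two-part policy structure described above can be sketched as follows; this is a minimal, hedged illustration under assumed module names and dimensions (`SubPolicy`, `ActionDecoder`, the layer sizes), not the authors' implementation.

```python
# Hedged sketch of GenRL's split: (i) sub-policy over an action latent,
# (ii) generative decoder from latent to a sequence of motor actions.
import torch
import torch.nn as nn

STATE_DIM, LATENT_DIM, ACTION_DIM, SEQ_LEN = 10, 4, 7, 20

class SubPolicy(nn.Module):
    """Outputs mean and log-std of a Gaussian over the action latent."""
    def __init__(self):
        super().__init__()
        self.net = nn.Sequential(nn.Linear(STATE_DIM, 64), nn.ReLU(),
                                 nn.Linear(64, 2 * LATENT_DIM))
    def forward(self, state):
        mu, log_std = self.net(state).chunk(2, dim=-1)
        return torch.distributions.Normal(mu, log_std.exp())

class ActionDecoder(nn.Module):
    """Generative model: decodes a latent into a sequence of motor actions.
    In GenRL this part would be trained unsupervised on valid action sequences."""
    def __init__(self):
        super().__init__()
        self.net = nn.Sequential(nn.Linear(LATENT_DIM, 128), nn.ReLU(),
                                 nn.Linear(128, SEQ_LEN * ACTION_DIM))
    def forward(self, z):
        return self.net(z).view(-1, SEQ_LEN, ACTION_DIM)

state = torch.randn(1, STATE_DIM)
z = SubPolicy()(state).rsample()   # (i) sample the action latent
actions = ActionDecoder()(z)       # (ii) decode it to motor actions
print(actions.shape)               # torch.Size([1, 20, 7])
```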
Building sample-efficient agents that generalize out-of-distribution (OOD) in real-world settings remains a fundamental unsolved problem on the path towards achieving higher-level cognition. One particularly promising approach is to begin with low-dimensional, pretrained representations of our world, which should facilitate efficient downstream learning and generalization. By training 240 representations and over 10,000 reinforcement learning (RL) policies on a simulated robotic setup, we evaluate to what extent different properties of pretrained VAE-based representations affect the OOD generalization of downstream agents. We observe that many agents are surprisingly robust to realistic distribution shifts, including the challenging sim-to-real case. In addition, we find that the generalization performance of a simple downstream proxy task reliably predicts the generalization performance of our RL agents under a wide range of OOD settings. Such proxy tasks can thus be used to select pretrained representations that will lead to agents that generalize.
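The selection recipe suggested by the last sentence can be sketched as below: score each frozen pretrained representation by the out-of-distribution accuracy of a cheap proxy readout, then pick the best-scoring one for downstream RL. The encoders, data, and scoring function here are illustrative assumptions, not the paper's experimental pipeline.

```python
# Hedged sketch: use a simple proxy task's OOD score to rank representations.
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)

def proxy_ood_score(encode, X_train, y_train, X_ood, y_ood):
    """Fit a simple readout on in-distribution data, evaluate it OOD."""
    clf = LogisticRegression(max_iter=1000).fit(encode(X_train), y_train)
    return clf.score(encode(X_ood), y_ood)

# Toy data: a 2-class problem with a shifted OOD split.
X_train, y_train = rng.normal(size=(200, 16)), rng.integers(0, 2, 200)
X_ood, y_ood = rng.normal(loc=0.5, size=(100, 16)), rng.integers(0, 2, 100)

# Stand-ins for frozen pretrained encoders (e.g. VAE posterior means).
encoders = {"repr_a": lambda X: X[:, :8], "repr_b": lambda X: X[:, 8:]}

scores = {name: proxy_ood_score(enc, X_train, y_train, X_ood, y_ood)
          for name, enc in encoders.items()}
best = max(scores, key=scores.get)
print(scores, "-> train RL agents on:", best)
```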
Creating reinforcement learning (RL) agents that are capable of accepting and leveraging task-specific knowledge from humans has long been identified as a possible strategy for developing scalable approaches to long-horizon problems. While previous works have looked at the possibility of using symbolic models along with RL approaches, they tend to assume that the high-level action models are executable at the low level and that the fluents can exclusively characterize all desirable MDP states. This need not be true, and the assumption overlooks one of the central technical challenges of incorporating symbolic task knowledge: such symbolic models are typically an incomplete representation of the underlying task. To this end, we introduce Symbolic-Model Guided Reinforcement Learning, in which we formalize the relationship between the symbolic model and the underlying MDP in a way that captures the incompleteness of the symbolic model. We use these models to extract high-level landmarks that decompose the task and, at the low level, learn a set of diverse policies for each possible sub-goal identified by the landmarks. We evaluate our system on three benchmark domains and show that, even with incomplete symbolic-model information, our approach discovers the task structure and efficiently guides the RL agent towards the goal.
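A minimal sketch of the landmark-based decomposition idea follows; the landmark list, the `train_policy` stub, and the execution loop are all illustrative assumptions rather than the authors' system.

```python
# Hedged sketch: decompose a task by symbolic landmarks and keep a small set
# of diverse low-level policies per landmark-induced sub-goal.
from typing import Callable, Dict, List

# Landmarks extracted from an (incomplete) symbolic model, in causal order.
landmarks: List[str] = ["has_key", "door_open", "at_goal"]

def train_policy(subgoal: str, seed: int) -> Callable[[dict], str]:
    """Stub for low-level RL; a real policy would be rewarded when `subgoal` holds."""
    return lambda state: "noop" if state.get(subgoal) else f"work_on_{subgoal}"

# Learn a few diverse policies for each sub-goal (different seeds as a proxy).
policies: Dict[str, List[Callable]] = {
    lm: [train_policy(lm, seed) for seed in range(3)] for lm in landmarks
}

# Execution: pursue landmarks in order; diverse policies give fallbacks.
state = {"has_key": False, "door_open": False, "at_goal": False}
for lm in landmarks:
    action = policies[lm][0](state)
    print(f"sub-goal {lm}: {action}")
    state[lm] = True  # pretend the sub-goal was achieved
```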
Autonomous driving has achieved significant milestones in research and development over the last decade. There is increasing interest in the field, as the deployment of self-operating vehicles on roads promises safer and more ecologically friendly transportation systems. With the rise of computationally powerful artificial intelligence (AI) techniques, autonomous vehicles can sense their environment with high precision, make safe real-time decisions, and operate more reliably without human intervention. However, intelligent decision-making in autonomous cars is not generally understandable by humans in the current state of the art, and this deficiency hinders the social acceptance of the technology. Hence, aside from making safe real-time decisions, the AI systems of autonomous vehicles also need to explain how these decisions are constructed in order to be regulatory compliant across many jurisdictions. Our study sheds comprehensive light on developing explainable artificial intelligence (XAI) approaches for autonomous vehicles. In particular, we make the following contributions. First, we provide a thorough overview of the present gaps with respect to explanations in the state-of-the-art autonomous vehicle industry. Second, we present a taxonomy of explanations and explanation receivers in this field. Third, we propose a framework for an architecture of end-to-end autonomous driving systems and justify the role of XAI in both debugging and regulating such systems. Finally, as future research directions, we provide a field guide on XAI approaches for autonomous driving that can improve operational safety and transparency towards achieving public approval by regulators, manufacturers, and all engaged stakeholders.
Seamlessly interacting with humans or robots is hard because these agents are non-stationary. They update their policy in response to the ego agent's behavior, and the ego agent must anticipate these changes to co-adapt. Inspired by humans, we recognize that robots do not need to explicitly model every low-level action another agent will make; instead, we can capture the latent strategy of other agents through high-level representations. We propose a reinforcement learning-based framework for learning latent representations of an agent's policy, where the ego agent identifies the relationship between its behavior and the other agent's future strategy. The ego agent then leverages these latent dynamics to influence the other agent, purposely guiding them towards policies suitable for co-adaptation. Across several simulated domains and a real-world air hockey game, our approach outperforms the alternatives and learns to influence the other agent.
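The representation idea described above can be sketched in a few modules: an encoder summarizes the last interaction into a latent strategy, a dynamics head predicts how that strategy shifts in response to the ego agent's behavior, and the ego policy conditions on the predicted latent. All module names and dimensions below are illustrative assumptions, not the authors' architecture.

```python
# Hedged sketch of latent-strategy modeling for influencing another agent.
import torch
import torch.nn as nn

OBS_DIM, ACT_DIM, Z_DIM = 12, 3, 4

class StrategyEncoder(nn.Module):
    """Summarizes the last interaction into a latent strategy z."""
    def __init__(self):
        super().__init__()
        self.net = nn.Linear(OBS_DIM + ACT_DIM, Z_DIM)
    def forward(self, obs, other_action):
        return self.net(torch.cat([obs, other_action], dim=-1))

class LatentDynamics(nn.Module):
    """Predicts the other agent's next strategy given z and the ego action."""
    def __init__(self):
        super().__init__()
        self.net = nn.Linear(Z_DIM + ACT_DIM, Z_DIM)
    def forward(self, z, ego_action):
        return self.net(torch.cat([z, ego_action], dim=-1))

class EgoPolicy(nn.Module):
    """Ego policy conditioned on both the observation and the latent strategy."""
    def __init__(self):
        super().__init__()
        self.net = nn.Linear(OBS_DIM + Z_DIM, ACT_DIM)
    def forward(self, obs, z):
        return torch.tanh(self.net(torch.cat([obs, z], dim=-1)))

obs = torch.randn(1, OBS_DIM)
other_act, ego_act = torch.randn(1, ACT_DIM), torch.randn(1, ACT_DIM)
z = StrategyEncoder()(obs, other_act)
z_next = LatentDynamics()(z, ego_act)   # anticipate the co-adaptation
action = EgoPolicy()(obs, z_next)       # act so as to influence the other agent
print(action.shape)
```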
This paper surveys the field of transfer learning in the problem setting of reinforcement learning (RL). RL has been a key solution to sequential decision-making problems. Along with the fast advance of RL in various domains, including robotics and game playing, transfer learning has arisen as an important technique that assists RL by leveraging and transferring external expertise to boost the learning process. In this survey, we review the central issues of transfer learning in the RL domain, providing a systematic categorization of its state-of-the-art techniques. We analyze their goals, methodologies, applications, and the RL frameworks under which these transfer learning techniques are applicable. We discuss the relationship between transfer learning and other relevant topics from an RL perspective and also explore potential challenges as well as future development directions for transfer learning in RL.
Reinforcement learning is one of the core components in designing artificially intelligent systems that emphasize real-time response. Reinforcement learning guides the system to take actions in an arbitrary environment, whether or not it has prior knowledge of the environment model. In this paper, we present a comprehensive study of reinforcement learning covering several dimensions, including challenges, recent developments in state-of-the-art techniques, and future directions. The fundamental objective of this paper is to present the available reinforcement learning methods in a way that is informative yet simple to follow for new researchers and academics in this domain, with the latest concerns in mind. First, we illustrate the core techniques of reinforcement learning in an easily understandable and comparable way. Finally, we analyze and discuss recent developments in reinforcement learning approaches. Our analysis indicates that most models focus on tuning policy values in a particular state rather than on other aspects of the learning process.
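As a minimal illustration of the "tuning policy values in a particular state" pattern referred to above, here is a hedged tabular Q-learning sketch on a toy chain environment; the environment, hyperparameters, and variable names are assumptions for illustration only.

```python
# Hedged sketch: tabular Q-learning updates the value of one (state, action)
# pair at a time toward a bootstrapped target.
import random

random.seed(0)
N_STATES, ACTIONS = 5, [0, 1]          # 0: move left, 1: move right; goal at the right end
ALPHA, GAMMA, EPS = 0.1, 0.95, 0.1
Q = [[0.0, 0.0] for _ in range(N_STATES)]

def step(s, a):
    s_next = max(0, min(N_STATES - 1, s + (1 if a == 1 else -1)))
    reward = 1.0 if s_next == N_STATES - 1 else 0.0
    return s_next, reward, s_next == N_STATES - 1

for episode in range(200):
    s, done = 0, False
    while not done:
        # Epsilon-greedy action selection.
        a = random.choice(ACTIONS) if random.random() < EPS \
            else max(ACTIONS, key=lambda x: Q[s][x])
        s_next, r, done = step(s, a)
        # The value of action a in state s is "tuned" toward the bootstrapped target.
        Q[s][a] += ALPHA * (r + GAMMA * max(Q[s_next]) - Q[s][a])
        s = s_next

print([round(max(q), 2) for q in Q])
```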
Humans and animals have the ability to continually acquire, fine-tune, and transfer knowledge and skills throughout their lifespan. This ability, referred to as lifelong learning, is mediated by a rich set of neurocognitive mechanisms that together contribute to the development and specialization of our sensorimotor skills as well as to long-term memory consolidation and retrieval. Consequently, lifelong learning capabilities are crucial for autonomous agents interacting in the real world and processing continuous streams of information. However, lifelong learning remains a long-standing challenge for machine learning and neural network models since the continual acquisition of incrementally available information from non-stationary data distributions generally leads to catastrophic forgetting or interference. This limitation represents a major drawback for state-of-the-art deep neural network models that typically learn representations from stationary batches of training data, thus without accounting for situations in which information becomes incrementally available over time. In this review, we critically summarize the main challenges linked to lifelong learning for artificial learning systems and compare existing neural network approaches that alleviate, to different extents, catastrophic forgetting. We discuss well-established and emerging research motivated by lifelong learning factors in biological systems such as structural plasticity, memory replay, curriculum and transfer learning, intrinsic motivation, and multisensory integration.