Active inference, a corollary of the free energy principle, is a formal way of describing the behavior of certain kinds of random dynamical systems that have the appearance of sentience. In this chapter, we describe how active inference combines Bayesian decision theory and optimal Bayesian design principles under a single imperative to minimize expected free energy. It is this aspect of active inference that allows for the natural emergence of information-seeking behavior. When prior preferences over outcomes are removed from expected free energy, active inference reduces to optimal Bayesian design, i.e., information gain maximization. Conversely, active inference reduces to Bayesian decision theory in the absence of ambiguity and relative risk, i.e., expected utility maximization. Using these limiting cases, we illustrate how behaviors differ when agents select actions that optimize expected utility, expected information gain, and expected free energy. Our T-maze simulations show that optimizing expected free energy produces goal-directed information-seeking behavior, whereas optimizing expected utility induces purely exploitative behavior and maximizing information gain engenders intrinsically motivated behavior.
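For reference, one common discrete-state decomposition of expected free energy (the notation here is assumed for illustration and is not spelled out above) is, for a policy $\pi$ and future time $\tau$,
$$ G(\pi,\tau) \;=\; -\,\underbrace{\mathbb{E}_{Q(o_\tau\mid\pi)}\big[D_{\mathrm{KL}}[\,Q(s_\tau\mid o_\tau,\pi)\,\|\,Q(s_\tau\mid\pi)\,]\big]}_{\text{expected information gain}} \;-\; \underbrace{\mathbb{E}_{Q(o_\tau\mid\pi)}\big[\ln P(o_\tau)\big]}_{\text{expected utility}}. $$
Dropping the prior preference term $\ln P(o_\tau)$ leaves pure expected-information-gain maximization (optimal Bayesian design), whereas dropping the epistemic, information-gain term leaves expected-utility maximization (Bayesian decision theory).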
Spatially inhomogeneous functions, which may be smooth in some regions and rough in others, are modelled naturally in a Bayesian manner using so-called Besov priors, which are given by random wavelet expansions with Laplace-distributed coefficients. This paper studies theoretical guarantees for such prior measures; specifically, we examine their frequentist posterior contraction rates in the setting of non-linear inverse problems with Gaussian white noise. Our results are first derived under a general local Lipschitz assumption on the forward map. We then verify this assumption for two non-linear inverse problems arising from elliptic partial differential equations: the Darcy flow model from geophysics and a model for the Schr\"odinger equation appearing in tomography. In the course of the proofs, we also obtain novel concentration inequalities for penalized least squares estimators with $\ell^1$ wavelet penalty, which have a natural interpretation as maximum a posteriori (MAP) estimators. The true parameter is assumed to belong to some spatially inhomogeneous Besov class $B^{\alpha}_{11}$, $\alpha>0$. In a setting with direct observations, we complement these upper bounds with a lower bound on the rate of contraction for arbitrary Gaussian priors. An immediate consequence of our results is that, while Laplace priors can achieve minimax-optimal rates over $B^{\alpha}_{11}$-classes, Gaussian priors are limited to a contraction rate that is slower by a polynomial factor. This gives an information-theoretic justification for the intuition that Laplace priors are more compatible with the $\ell^1$ regularity structure of the underlying parameter.
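Concretely, one standard construction of a Besov $B^{\alpha}_{11}$ prior on a $d$-dimensional domain (the normalization below is a common choice made here for illustration; it is not specified in the abstract) is the random wavelet series
$$ f \;=\; \sum_{\ell \ge 0} \sum_{k} 2^{-\ell\left(\alpha + \frac{d}{2} - d\right)}\, \xi_{\ell k}\, \psi_{\ell k}, \qquad \xi_{\ell k} \overset{\mathrm{i.i.d.}}{\sim} \mathrm{Laplace}(1), $$
where $\{\psi_{\ell k}\}$ is a sufficiently regular wavelet basis.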
An important issue during an engineering design process is to develop an understanding of which design parameters have the most influence on the performance. Especially in the context of optimization approaches, this knowledge is crucial for realizing an efficient design process and achieving high-performing results. Information theory provides powerful tools to investigate these relationships because its measures are model-free and thus also capture non-linear relationships, while requiring only minimal assumptions on the input data. We therefore propose to use recently introduced information-theoretic methods and estimation algorithms to find the most influential input parameters in optimization results. The proposed methods are in particular able to account for interactions between parameters, which are often neglected but may lead to redundant or synergistic contributions of multiple parameters. We demonstrate the application of these methods on optimization data from aerospace engineering, where we first identify the most relevant optimization parameters using a recently introduced information-theoretic feature-selection algorithm that accounts for interactions between parameters. Second, we use the novel partial information decomposition (PID) framework, which allows us to quantify redundant and synergistic contributions of selected parameters with respect to the optimization outcome, to identify parameter interactions. We thus demonstrate the power of novel information-theoretic approaches in identifying relevant parameters in optimization runs and highlight how these methods avoid the selection of redundant parameters, while detecting interactions that result in synergistic contributions of multiple parameters.
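As a minimal illustration of ranking design parameters by information-theoretic relevance (this uses an off-the-shelf mutual-information estimator on synthetic data; it is not the interaction-aware feature-selection algorithm or the PID analysis referenced above, and all names and values are hypothetical):

    # Rank design parameters by estimated mutual information with the objective.
    import numpy as np
    from sklearn.feature_selection import mutual_info_regression

    rng = np.random.default_rng(0)
    X = rng.uniform(size=(500, 4))                        # 4 design parameters
    y = X[:, 0]**2 + 0.5 * X[:, 1] * X[:, 2] + 0.05 * rng.normal(size=500)

    mi = mutual_info_regression(X, y, random_state=0)     # kNN-based estimates (nats)
    for name, score in sorted(zip(["p0", "p1", "p2", "p3"], mi), key=lambda t: -t[1]):
        print(f"{name}: estimated MI with objective = {score:.3f}")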
In statistical decision theory, a model is said to be Pareto optimal (or admissible) if no other model carries less risk for at least one state of nature while presenting no more risk for others. How can you rationally aggregate/combine a finite set of Pareto optimal models while preserving Pareto efficiency? This question is nontrivial because weighted model averaging does not, in general, preserve Pareto efficiency. This paper presents an answer in four logical steps: (1) A rational aggregation rule should preserve Pareto efficiency. (2) Due to the complete class theorem, Pareto optimal models must be Bayesian, i.e., they minimize a risk where the true state of nature is averaged with respect to some prior. Therefore each Pareto optimal model can be associated with a prior, and Pareto efficiency can be maintained by aggregating Pareto optimal models through their priors. (3) A prior can be interpreted as a preference ranking over models: prior $\pi$ prefers model A over model B if the average risk of A is lower than the average risk of B. (4) A rational/consistent aggregation rule should preserve this preference ranking: if both priors $\pi$ and $\pi'$ prefer model A over model B, then the prior obtained by aggregating $\pi$ and $\pi'$ must also prefer A over B. Under these four steps, we show that all rational/consistent aggregation rules are as follows: give each individual Pareto optimal model a weight, introduce a weak order/ranking over the set of Pareto optimal models, and aggregate a finite set of models S as the model associated with the prior obtained as the weighted average of the priors of the highest-ranked models in S. This result shows that all rational/consistent aggregation rules must follow a generalization of hierarchical Bayesian modeling. Following our main result, we present applications to kernel smoothing, time-depreciating models, and voting mechanisms.
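Schematically, writing $\pi_i$ for the prior attached to Pareto optimal model $i$, $w_i > 0$ for its weight, and $\max_{\succeq}(S)$ for the highest-ranked models in $S$ under the weak order $\succeq$ (symbols chosen here for illustration), the aggregation rule described above returns the Bayesian model associated with the prior
$$ \pi_S \;=\; \frac{\sum_{i \in \max_{\succeq}(S)} w_i\, \pi_i}{\sum_{i \in \max_{\succeq}(S)} w_i}. $$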
Platooning has been exploited as a method for vehicles to minimize energy consumption. In this article, we present a constraint-driven optimal control framework that yields emergent platooning behavior for connected and automated vehicles operating in an open transportation system. Our approach combines recent insights in constraint-driven optimal control with the physical aerodynamic interactions between vehicles in a highway setting. The result is a set of equations that describes when platooning is an appropriate strategy, as well as a descriptive optimal control law that yields emergent platooning behavior. Finally, we demonstrate these properties in simulation.
In this paper we study a discrete-time semidiscretization of an infinite time horizon noncooperative $N$-player differential game. We prove that, as the discretization time step approaches zero, the discrete-time value functions converge to the value function of the differential game. Furthermore, the discrete-time Nash equilibrium is an $\epsilon$-Nash equilibrium for the continuous-time differential game.
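For concreteness, with cost functionals $J_i$ (notation assumed here for illustration), a strategy profile $(\alpha_1^*,\dots,\alpha_N^*)$ is an $\epsilon$-Nash equilibrium if, for every player $i$ and every admissible deviation $\alpha_i$,
$$ J_i(\alpha_i^*, \alpha_{-i}^*) \;\le\; J_i(\alpha_i, \alpha_{-i}^*) + \epsilon. $$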
Physically inspired latent force models offer an interpretable alternative to purely data-driven tools for inference in dynamical systems. They carry the structure of differential equations and the flexibility of Gaussian processes, yielding interpretable parameters and dynamics-imposed latent functions. However, the existing inference techniques associated with these models rely on the exact computation of posterior kernel terms, which are seldom available in analytical form. Most applications relevant to practitioners, such as Hill equations or diffusion equations, are hence intractable. In this paper, we overcome these computational problems by proposing a variational solution to a general class of non-linear and parabolic partial differential equation latent force models. Further, we show that a neural operator approach can scale our model to thousands of instances, enabling fast, distributed computation. We demonstrate the efficacy and flexibility of our framework by achieving competitive performance on several tasks where the kernels are of varying degrees of tractability.
Active inference is a unifying theory for perception and action resting upon the idea that the brain maintains an internal model of the world by minimizing free energy. From a behavioral perspective, active inference agents can be seen as self-evidencing beings that act to fulfill their optimistic predictions, namely preferred outcomes or goals. In contrast, reinforcement learning requires human-designed rewards to accomplish any desired outcome. Although active inference could provide a more natural self-supervised objective for control, its applicability has been limited because of shortcomings in scaling the approach to complex environments. In this work, we propose a contrastive objective for active inference that strongly reduces the computational burden in learning the agent's generative model and planning future actions. Our method performs notably better than likelihood-based active inference in image-based tasks, while also being computationally cheaper and easier to train. We compare against reinforcement learning agents that have access to human-designed reward functions, showing that our approach closely matches their performance. Finally, we also show that contrastive methods perform significantly better in the presence of distractors in the environment and that our method is able to generalize goals to variations in the background.
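To indicate what a contrastive objective of this kind can look like, here is a minimal InfoNCE-style score between state and observation embeddings (a generic contrastive loss sketched for illustration; it is not the specific objective proposed in this work, and the embeddings are placeholders):

    # Generic InfoNCE-style contrastive loss; matched rows are positive pairs.
    import numpy as np

    def info_nce(state_emb, obs_emb, temperature=0.1):
        s = state_emb / np.linalg.norm(state_emb, axis=1, keepdims=True)
        o = obs_emb / np.linalg.norm(obs_emb, axis=1, keepdims=True)
        logits = s @ o.T / temperature                   # similarities of all pairs
        logits -= logits.max(axis=1, keepdims=True)      # numerical stability
        log_probs = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))
        return -np.mean(np.diag(log_probs))              # NLL of matched pairs

    rng = np.random.default_rng(0)
    z = rng.normal(size=(8, 16))
    print(info_nce(z + 0.1 * rng.normal(size=z.shape), z))  # aligned pairs -> low loss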
The difficulty in specifying rewards for many real-world problems has led to an increased focus on learning rewards from human feedback, such as demonstrations. However, there are often many different reward functions that explain the human feedback, leaving agents with uncertainty over what the true reward function is. While most policy optimization approaches handle this uncertainty by optimizing for expected performance, many applications demand risk-averse behavior. We derive a novel policy-gradient-style robust optimization approach, PG-BROIL, that optimizes a soft-robust objective balancing expected performance and risk. To the best of our knowledge, PG-BROIL is the first policy optimization algorithm robust to a distribution of reward hypotheses that can scale to continuous MDPs. Results suggest that PG-BROIL can produce a family of behaviors ranging from risk-neutral to risk-averse and outperforms state-of-the-art imitation learning algorithms when learning from ambiguous demonstrations by hedging against uncertainty, rather than seeking to uniquely identify the demonstrator's reward function.
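As a rough sketch of a soft-robust objective over a posterior distribution of reward hypotheses (a convex combination of expected return and a CVaR-style tail term; this only illustrates the balance between expected performance and risk, it is not the PG-BROIL algorithm, and lam and alpha are hypothetical knobs):

    # Soft-robust objective: lam * expected return + (1 - lam) * CVaR-style tail.
    import numpy as np

    def soft_robust_objective(returns_per_hypothesis, posterior_probs, lam=0.5, alpha=0.95):
        r = np.asarray(returns_per_hypothesis, dtype=float)
        p = np.asarray(posterior_probs, dtype=float)
        expected = np.dot(p, r)
        order = np.argsort(r)                              # worst hypotheses first
        tail = np.cumsum(p[order]) <= (1.0 - alpha)        # worst (1 - alpha) mass
        tail[0] = True                                     # keep at least one hypothesis
        cvar = np.dot(p[order][tail], r[order][tail]) / p[order][tail].sum()
        return lam * expected + (1.0 - lam) * cvar

    print(soft_robust_objective([1.0, 0.2, -0.5], [0.5, 0.3, 0.2]))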
Recommender systems exploit interaction history to estimate user preference and have been heavily used in a wide range of industry applications. However, static recommendation models struggle to answer two important questions well due to inherent shortcomings: (a) What exactly does a user like? (b) Why does a user like an item? The shortcomings stem from the way static models learn user preference, i.e., without explicit instructions and active feedback from users. The recent rise of conversational recommender systems (CRSs) changes this situation fundamentally. In a CRS, users and the system can dynamically communicate through natural language interactions, which provides unprecedented opportunities to explicitly obtain the exact preferences of users. Considerable efforts, spread across disparate settings and applications, have been put into developing CRSs, yet existing models, technologies, and evaluation methods for CRSs are far from mature. In this paper, we provide a systematic review of the techniques used in current CRSs. We summarize the key challenges of developing CRSs into five directions: (1) Question-based user preference elicitation. (2) Multi-turn conversational recommendation strategies. (3) Dialogue understanding and generation. (4) Exploitation-exploration trade-offs. (5) Evaluation and user simulation. These research directions involve multiple research fields, such as information retrieval (IR), natural language processing (NLP), and human-computer interaction (HCI). Based on these research directions, we discuss some future challenges and opportunities, and provide a road map for researchers from multiple communities to get started in this area. We hope this survey helps to identify and address challenges in CRSs and inspires future research.
We consider the exploration-exploitation trade-off in reinforcement learning and show that an agent imbued with a risk-seeking utility function is able to explore efficiently, as measured by regret. The parameter that controls how risk-seeking the agent is can be optimized exactly, or annealed according to a schedule. We call the resulting algorithm K-learning and show that the corresponding K-values are optimistic for the expected Q-values at each state-action pair. The K-values induce a natural Boltzmann exploration policy for which the `temperature' parameter is equal to the risk-seeking parameter. This policy achieves an expected regret bound of $\tilde O(L^{3/2} \sqrt{S A T})$, where $L$ is the time horizon, $S$ is the number of states, $A$ is the number of actions, and $T$ is the total number of elapsed time-steps. This bound is only a factor of $L$ larger than the established lower bound. K-learning can be interpreted as mirror descent in the policy space; it is similar to other well-known methods in the literature, including Q-learning, soft Q-learning, and maximum entropy policy gradient, and is closely related to optimism- and count-based exploration methods. K-learning is simple to implement, as it only requires adding a bonus to the reward at each state-action pair and then solving a Bellman equation. We conclude with a numerical example demonstrating that K-learning is competitive with other state-of-the-art algorithms in practice.
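The ingredients mentioned above (a bonus added to the reward at each state-action pair, a Bellman-style backup, and a Boltzmann policy whose temperature plays the role of the risk-seeking parameter) can be sketched in tabular form as follows; this is an illustrative simplification rather than the exact K-learning algorithm, and the bonus and temperature values are hypothetical:

    # Schematic tabular sketch: reward bonus, soft Bellman backup, Boltzmann policy.
    import numpy as np

    S, A, gamma, tau = 5, 2, 0.9, 0.5
    rng = np.random.default_rng(0)
    R = rng.uniform(size=(S, A))                   # reward table
    P = rng.dirichlet(np.ones(S), size=(S, A))     # transition probabilities (S, A, S)
    bonus = 0.1 * np.ones((S, A))                  # exploration bonus per state-action

    K = np.zeros((S, A))
    for _ in range(200):                           # value-iteration-style backup
        V = tau * np.log(np.exp(K / tau).sum(axis=1))   # soft (log-sum-exp) state value
        K = R + bonus + gamma * P @ V

    policy = np.exp(K / tau) / np.exp(K / tau).sum(axis=1, keepdims=True)
    print(policy.round(3))                         # Boltzmann policy with temperature tau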