In this work, we introduce a new and efficient solution approach for the problem of decision making under uncertainty, which can be formulated as decision making in a belief space over a possibly high-dimensional state space. Typically, to solve a decision problem, one must identify the optimal action from a set of candidates according to some objective. We claim that one can often replace the original problem with an analogous yet simplified decision problem that can be solved more efficiently. A well-chosen simplification can lead to selection of the same action, or of one for which the maximal loss in optimality is guaranteed. Furthermore, such simplification is separated from the state inference and does not compromise its accuracy, as the selected action is ultimately applied to the original state. We first present the concept for general decision problems and provide a theoretical framework for a coherent formulation of the approach. We then apply these ideas to decision problems in the belief space, which can be simplified by considering a sparse approximation of their initial belief. The scalable belief sparsification algorithm we provide yields solutions that are guaranteed to be consistent with the original problem. We demonstrate the benefits of the approach on a realistic active-SLAM problem, significantly reducing computation time with no loss in solution quality. This work is both fundamental and practical, and admits numerous possible extensions.
In this paper we study parametrized motion planning algorithms, which provide universal and flexible solutions to diverse motion planning problems. Such algorithms are intended to function under a variety of external conditions, which are viewed as parameters and form part of the algorithm's input. Continuing a recent paper, we further develop the concept of parametrized topological complexity. We analyse in full detail the problem of controlling a swarm of robots in the presence of multiple obstacles in Euclidean space, which served as a natural motivating example for us. We present an explicit parametrized motion planning algorithm solving the motion planning problem for any number of robots and obstacles. This algorithm is optimal: it has the minimal possible topological complexity for any odd d. In addition, we describe a modification of this algorithm which is optimal for even d. We also analyse the parametrized topological complexity of sphere bundles using Stiefel-Whitney characteristic classes.
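As background (our paraphrase of the standard definition of Cohen, Farber and Weinberger, not a statement taken from this paper): given a fibration $p\colon E \to B$, let $E^I_B$ denote the space of paths in $E$ lying in a single fibre of $p$, and let $\Pi\colon E^I_B \to E \times_B E$ send a path to its pair of endpoints. Then

\[ \mathrm{TC}[p\colon E \to B] \;=\; \min\{\, k : E \times_B E \text{ has an open cover } U_0, \dots, U_k \text{ such that each } U_i \text{ admits a continuous section of } \Pi \,\}. \]

A point of $E \times_B E$ is a pair (initial state, desired state) sharing the same external conditions in $B$, and a section over $U_i$ is a motion planning rule valid on $U_i$.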
We consider generic optimal Bayesian inference, namely, models of signal reconstruction where the posterior distribution and all hyperparameters are known. Under a standard assumption on the concentration of the free energy, we show how replica symmetry in the strong sense of concentration of all multioverlaps can be established as a consequence of the Franz-de Sanctis identities; in the current setting, the identities themselves are obtained via a novel perturbation coming from exponentially distributed "side-observations" of the signal. Concentration of multioverlaps means that, asymptotically, the posterior distribution has a particularly simple structure encoded by a random probability measure (or, in the case of a binary signal, a non-random probability measure). We believe that such strong control of the model should be key in the study of inference problems with underlying sparse graphical structure (error-correcting codes, block models, etc.) and, in particular, in the rigorous derivation of replica symmetric formulas for the free energy and mutual information in this context.
Bisimulation metrics define a distance measure between states of a Markov decision process (MDP) based on a comparison of reward sequences. Due to this property, they provide theoretical guarantees for value function approximation. In this work, we first prove that bisimulation metrics can be defined via any $p$-Wasserstein metric for $p\geq 1$. We then describe an approximate policy iteration (API) procedure that uses $\epsilon$-aggregation with $\pi$-bisimulation and prove performance bounds for continuous state spaces. We also bound the difference between $\pi$-bisimulation metrics in terms of the change in the policies themselves. Based on these theoretical results, we design an API($\alpha$) procedure that employs conservative policy updates and enjoys better performance bounds than the naive API approach. In addition, we propose a novel trust region approach which circumvents the requirement to explicitly solve a constrained optimization problem. Finally, we provide experimental evidence of improved stability compared to non-conservative alternatives in simulated continuous control.
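Concretely (our notational sketch; conventions such as the placement of the discount may differ from the paper's), the $\pi$-bisimulation metric built on a $p$-Wasserstein coupling arises as the fixed point of

\[ d^{\pi}(s, s') \;=\; \big| r^{\pi}(s) - r^{\pi}(s') \big| \;+\; \gamma\, W_p\big( P^{\pi}(\cdot \mid s),\, P^{\pi}(\cdot \mid s') \big), \]

where $W_p$ is the $p$-Wasserstein distance computed with respect to $d^{\pi}$ itself: two states are close exactly when they yield similar immediate rewards and transition to distributions over states that are themselves close.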
We introduce the zip tree, a form of randomized binary search tree that integrates previous ideas into one practical, performant, and pleasant-to-implement package. A zip tree is a binary search tree in which each node has a numeric rank and the tree is (max)-heap-ordered with respect to ranks, with rank ties broken in favor of smaller keys. Zip trees are essentially treaps (Seidel and Aragon 1996), except that ranks are drawn from a geometric distribution instead of a uniform distribution, and we allow rank ties. These changes enable us to use fewer random bits per node. We perform insertions and deletions by unmerging and merging paths ("unzipping" and "zipping") rather than by doing rotations, which avoids some pointer changes and improves efficiency. The methods of zipping and unzipping take inspiration from previous top-down approaches to insertion and deletion (Stephenson 1980; Martínez and Roura 1998; Sprugnoli 1980). From a theoretical standpoint, this work provides two main results. First, zip trees require only $O(\log \log n)$ bits (with high probability) to represent the largest rank in an $n$-node binary search tree; previous data structures require $O(\log n)$ bits for the largest rank. Second, zip trees are naturally isomorphic to skip lists (Pugh 1990), and simplify the mapping of Dean and Jones (2007) between skip lists and binary search trees.
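To make insertion by unzipping concrete, here is a minimal Python sketch (our own illustrative code with hypothetical names, assuming distinct keys; not the authors' implementation):

```python
import random

class Node:
    """Zip-tree node: BST-ordered by key, (max-)heap-ordered by rank."""
    __slots__ = ("key", "rank", "left", "right")
    def __init__(self, key, rank):
        self.key, self.rank = key, rank
        self.left = self.right = None

def random_rank(rng=random):
    """Geometric(1/2) rank: count fair-coin heads before the first tail."""
    r = 0
    while rng.random() < 0.5:
        r += 1
    return r

def insert(root, key, rng=random):
    """Insert `key` (assumed not already present); returns the new root."""
    rank = random_rank(rng)
    # Descend while the current node dominates the new one
    # (higher rank, with rank ties broken in favor of smaller keys).
    prev, cur = None, root
    while cur is not None and (cur.rank > rank or
                               (cur.rank == rank and cur.key < key)):
        prev = cur
        cur = cur.left if key < cur.key else cur.right
    x = Node(key, rank)
    if prev is None:
        root = x
    elif key < prev.key:
        prev.left = x
    else:
        prev.right = x
    # "Unzip" the displaced subtree rooted at `cur` into two spines:
    # keys < key hang off x.left, keys > key hang off x.right.
    left_tail = right_tail = x
    while cur is not None:
        if cur.key < key:
            if left_tail is x: x.left = cur
            else: left_tail.right = cur
            left_tail, cur = cur, cur.right
        else:
            if right_tail is x: x.right = cur
            else: right_tail.left = cur
            right_tail, cur = cur, cur.left
    if left_tail is not x: left_tail.right = None
    if right_tail is not x: right_tail.left = None
    return root

root = None
for k in [5, 2, 8, 1, 9, 3]:
    root = insert(root, k)
```

Deletion is symmetric: the two subtrees of the removed node are zipped back together along their spines, again without rotations.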
The subset sum problem is known to be NP-hard, with the fastest known approach having a run-time complexity of $O(2^{0.3113n})$. A modified version of this problem, known as the perfect sum problem, extends the subset sum idea further. This extension results in additional complexity, making it difficult to compute for large inputs. In this paper, we propose a probabilistic approach which approximates the solution to the perfect sum problem by approximating the distribution of potential sums. Since this problem is an extension of the subset sum problem, our approximation also grants some probabilistic insight into the solution of the subset sum problem. We harness distributional approximations to model the number of subsets which sum to a certain value. These distributional approximations are formulated in two ways: using bounds to justify a normal approximation, and approximating the empirical distribution via density estimation. These approximations can be computed in $O(n)$ complexity and increase in accuracy with the size of the input data, making them useful for large-scale combinatorial problems. Code is available at //github.com/KristofPusztai/PerfectSum.
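As an illustration of the normal-approximation idea (a minimal sketch under our own simplifying assumptions; the function name is hypothetical and this is not the repository's code): a uniformly random subset includes each element independently with probability 1/2, so the subset-sum distribution has easily computed moments.

```python
import math

def approx_perfect_sum_count(nums, target):
    """Estimate how many subsets of `nums` sum to `target` using a
    normal (local-CLT style) approximation: a uniformly random subset
    includes each element independently with probability 1/2, so its
    sum has mean sum(nums)/2 and variance sum(a*a for a in nums)/4."""
    n = len(nums)
    mu = sum(nums) / 2.0
    var = sum(a * a for a in nums) / 4.0
    if var == 0.0:  # all elements are zero
        return float(2 ** n) if target == mu else 0.0
    # Gaussian density at the target, scaled by the 2^n equally likely subsets
    density = math.exp(-(target - mu) ** 2 / (2.0 * var)) / math.sqrt(2.0 * math.pi * var)
    return (2.0 ** n) * density

print(approx_perfect_sum_count([1] * 20, 10))  # ~1.87e5 vs. exact C(20,10) = 184756
```

Both moments are single passes over the input, matching the stated $O(n)$ complexity.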
A growing body of research in continual learning focuses on the catastrophic forgetting problem. While many attempts have been made to alleviate this problem, the majority of methods assume a single model in the continual learning setup. In this work, we question this assumption and show that employing ensemble models can be a simple yet effective method to improve continual learning performance. However, the training and inference costs of ensembles increase linearly with the number of models. Motivated by this limitation, we leverage recent advances in the deep learning optimization literature, such as mode connectivity and neural network subspaces, to derive a new method that is both computationally advantageous and able to outperform state-of-the-art continual learning algorithms.
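For intuition only (our hypothetical sketch of the generic weight-subspace idea the method builds on, not the paper's algorithm): two trained endpoints induce a whole line of models, and averaging predictions along it behaves like an ensemble without storing many independent networks.

```python
import torch

def interpolate_state_dicts(sd_a, sd_b, t):
    """Point theta(t) = (1 - t) * theta_a + t * theta_b on the segment
    between two trained endpoints (the simplest 'neural network subspace')."""
    return {k: (1.0 - t) * sd_a[k] + t * sd_b[k] for k in sd_a}

def subspace_ensemble_predict(model, sd_a, sd_b, x, ts=(0.0, 0.5, 1.0)):
    """Average the predictions of a few models sampled from the segment."""
    probs = []
    with torch.no_grad():
        for t in ts:
            model.load_state_dict(interpolate_state_dicts(sd_a, sd_b, t))
            probs.append(model(x).softmax(dim=-1))
    return torch.stack(probs).mean(dim=0)
```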
We consider the reinforcement learning problem for partially observed Markov decision processes (POMDPs) with large or even countably infinite state spaces, where the controller has access to only noisy observations of the underlying controlled Markov chain. We consider a natural actor-critic method that employs a finite internal memory for policy parameterization, and a multi-step temporal difference learning algorithm for policy evaluation. We establish, to the best of our knowledge, the first non-asymptotic global convergence result for actor-critic methods in partially observed systems under function approximation. In particular, in addition to the function approximation and statistical errors that also arise in MDPs, we explicitly characterize the error due to the use of finite-state controllers. This additional error is stated in terms of the total variation distance between the traditional belief state in POMDPs and the posterior distribution of the hidden state when using a finite-state controller. Further, we show that this error can be made small in the case of sliding-block controllers by using larger block sizes.
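To fix ideas, a sliding-block (finite-memory) controller can be as simple as the following sketch (our illustrative code; the class name and interface are hypothetical):

```python
from collections import deque

class SlidingBlockController:
    """Finite-state controller whose internal state is the block of the
    last m observations; the policy is parameterized over this finite
    memory rather than the full belief state."""
    def __init__(self, m, policy):
        self.memory = deque(maxlen=m)  # z_t = (y_{t-m+1}, ..., y_t)
        self.policy = policy           # maps tuple(z_t) -> action

    def act(self, observation):
        self.memory.append(observation)
        return self.policy(tuple(self.memory))
```

Increasing the block size m shrinks the total-variation gap referred to above, at the cost of a larger internal state space.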
Multi-task learning (MTL) aims to improve the generalization of several related tasks by learning them jointly. By comparison, in addition to the joint training scheme, modern meta-learning allows for unseen tasks with limited labels during the test phase, in the hope of fast adaptation to them. Despite the subtle difference between MTL and meta-learning in problem formulation, both learning paradigms share the same insight: the structure shared between existing training tasks can lead to better generalization and adaptation. In this paper, we take an important step further toward understanding the close connection between these two learning paradigms, through both theoretical analysis and empirical investigation. Theoretically, we first demonstrate that MTL shares the same optimization formulation with a class of gradient-based meta-learning (GBML) algorithms. We then prove that for over-parameterized neural networks with sufficient depth, the learned predictive functions of MTL and GBML are close. In particular, this result implies that the predictions given by these two models are similar on the same unseen task. Empirically, we corroborate our theoretical findings by showing that, with proper implementation, MTL is competitive with state-of-the-art GBML algorithms on a set of few-shot image classification benchmarks. Since existing GBML algorithms often involve costly second-order bi-level optimization, our first-order MTL method is an order of magnitude faster on large-scale datasets such as mini-ImageNet. We believe this work can help bridge the gap between the two learning paradigms, and provides a computationally efficient alternative to GBML that also supports fast task adaptation.
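Schematically (in our notation), the two objectives being related are

\[ \underbrace{\min_{\theta}\ \frac{1}{T}\sum_{i=1}^{T} \mathcal{L}_i(\theta)}_{\text{MTL}} \qquad \text{vs.} \qquad \underbrace{\min_{\theta}\ \frac{1}{T}\sum_{i=1}^{T} \mathcal{L}_i\big(\theta - \alpha\, \nabla_{\theta}\mathcal{L}_i(\theta)\big)}_{\text{GBML (MAML-style)}}, \]

where $\mathcal{L}_i$ is the loss of task $i$ and $\alpha$ is the inner-loop step size; as $\alpha \to 0$ the GBML objective reduces to the MTL objective, and differentiating through the inner gradient step is precisely the source of GBML's second-order cost.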
Attention-based neural networks have achieved state-of-the-art results on a wide range of tasks. Most such models use deterministic attention, while stochastic attention is less explored due to optimization difficulties or complicated model designs. This paper introduces Bayesian attention belief networks, which construct a decoder network by modeling unnormalized attention weights with a hierarchy of gamma distributions, and an encoder network by stacking Weibull distributions with a deterministic-upward-stochastic-downward structure to approximate the posterior. The resulting auto-encoding networks can be optimized in a differentiable way with a variational lower bound. It is simple to convert any model with deterministic attention, including pretrained models, into the proposed Bayesian attention belief networks. On a variety of language understanding tasks, we show that our method outperforms deterministic attention and state-of-the-art stochastic attention in accuracy, uncertainty estimation, generalization across domains, and robustness to adversarial attacks. We further demonstrate the general applicability of our method on neural machine translation and visual question answering, showing great potential for incorporating our method into various attention-related tasks.
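As a rough illustration of reparameterized Weibull attention (a hypothetical sketch, not the paper's architecture; the shape `k`, the use of `exp(scores)` as the scale, and the function name are all our assumptions):

```python
import torch

def weibull_attention(scores, k=10.0):
    """Draw unnormalized attention weights from a reparameterized
    Weibull distribution, then normalize across keys; gradients flow
    through the sample via w = lam * (-log(1 - u))**(1/k), u ~ U(0,1)."""
    lam = scores.exp()                          # per-edge Weibull scale
    u = torch.rand_like(lam)
    w = lam * (-torch.log1p(-u)).pow(1.0 / k)   # w ~ Weibull(k, lam)
    return w / w.sum(dim=-1, keepdim=True)      # stochastic attention weights

attn = weibull_attention(torch.randn(2, 4, 4))  # (batch, queries, keys)
```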
Many important real-world problems have action spaces that are high-dimensional, continuous, or both, making full enumeration of all possible actions infeasible. Instead, only small subsets of actions can be sampled for the purpose of policy evaluation and improvement. In this paper, we propose a general framework to reason in a principled way about policy evaluation and improvement over such sampled action subsets. This sample-based policy iteration framework can in principle be applied to any reinforcement learning algorithm based upon policy iteration. Concretely, we propose Sampled MuZero, an extension of the MuZero algorithm that is able to learn in domains with arbitrarily complex action spaces by planning over sampled actions. We demonstrate this approach on the classical board game of Go and on two continuous control benchmark domains: DeepMind Control Suite and Real-World RL Suite.
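In its simplest form, the sample-based improvement step looks like the following sketch (our hypothetical code, not the Sampled MuZero planner, which combines this idea with tree search):

```python
import numpy as np

def sampled_greedy_action(sample_action, q_value, k=20, rng=np.random):
    """Policy improvement over a sampled action subset: draw K actions
    from the current policy and act greedily among just those samples,
    avoiding enumeration of a continuous or combinatorial action space."""
    actions = [sample_action(rng) for _ in range(k)]
    values = [q_value(a) for a in actions]
    return actions[int(np.argmax(values))]

# e.g. a 2-D continuous action space with a Gaussian policy and a toy critic
best = sampled_greedy_action(
    sample_action=lambda rng: rng.normal(size=2),
    q_value=lambda a: -float(np.sum(a ** 2)),
)
```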