Location-aware networks will introduce new services and applications for modern convenience, surveillance, and public safety. In this paper, we consider the problem of cooperative localization in a wireless network where the position of certain anchor nodes can be controlled. We introduce an active planning method that aims at moving the anchors such that the information gain of future measurements is maximized. In the control layer of the proposed method, control inputs are calculated by minimizing the traces of approximate inverse Bayesian Fisher information matrices (FIMs). The estimation layer computes estimates of the agent states and provides Gaussian representations of the marginal posteriors of the agent positions to the control layer for approximate Bayesian FIM computations. Based on a cost function that accumulates Bayesian FIM contributions over a sliding window of discrete future time steps, receding horizon (RH) control is performed. Approximations that make it possible to solve the resulting tree-search problem efficiently are also discussed. A numerical case study demonstrates the intelligent behavior of a single controlled anchor in a 3-D scenario and the significantly improved localization accuracy that results.
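For concreteness, a minimal one-step sketch of the control-layer idea in Python: among candidate anchor moves, pick the one whose range measurement adds the most position information, measured by the trace of the inverse Bayesian FIM. The greedy single-step horizon, the range-only measurement model, and all numerical values are illustrative assumptions; the paper uses a sliding-window receding-horizon tree search.

```python
# Greedy one-step version of the control layer: minimize trace of the
# inverse Bayesian FIM over candidate anchor moves (illustrative sketch).
import numpy as np

def bayesian_fim(agent_mean, prior_cov, anchor_pos, noise_var=0.1):
    """Approximate Bayesian FIM for one range measurement to a Gaussian agent."""
    u = agent_mean - anchor_pos
    u = u / np.linalg.norm(u)                     # unit bearing vector
    prior_info = np.linalg.inv(prior_cov)         # information from the prior
    meas_info = np.outer(u, u) / noise_var        # range-measurement information
    return prior_info + meas_info

def best_anchor_move(anchor_pos, candidate_moves, agent_mean, prior_cov):
    """Pick the move minimizing the trace of the inverse Bayesian FIM."""
    costs = []
    for dv in candidate_moves:
        J = bayesian_fim(agent_mean, prior_cov, anchor_pos + dv)
        costs.append(np.trace(np.linalg.inv(J)))  # proxy for position error
    return candidate_moves[int(np.argmin(costs))]

# Toy 3-D example: agent believed near the origin with isotropic uncertainty.
moves = [np.array(v, float) for v in [(1, 0, 0), (-1, 0, 0), (0, 1, 0), (0, 0, 1)]]
dv = best_anchor_move(np.array([5.0, 0.0, 2.0]), moves,
                      agent_mean=np.zeros(3), prior_cov=np.eye(3))
print("chosen move:", dv)
```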
The empirical success of Reinforcement Learning (RL) in contact-rich manipulation leaves much to be understood from a model-based perspective, where the key difficulties are often attributed to (i) the explosion of contact modes, (ii) stiff, non-smooth contact dynamics and the resulting exploding/discontinuous gradients, and (iii) the non-convexity of the planning problem. The stochastic nature of RL addresses (i) and (ii) by effectively sampling and averaging over the contact modes. Model-based methods, on the other hand, have tackled the same challenges by smoothing contact dynamics analytically. Our first contribution is to establish the theoretical equivalence of the two methods for simple systems, and to demonstrate their qualitative and empirical equivalence on a number of complex examples. To further alleviate (ii), our second contribution is a convex, differentiable, and quasi-dynamic formulation of contact dynamics, which is amenable to both smoothing schemes and proves highly effective for contact-rich planning in our experiments. Our final contribution resolves (iii): we show that classical sampling-based motion planning algorithms can be effective for global planning when contact modes are abstracted via smoothing. Applying our method to a collection of challenging contact-rich manipulation tasks, we demonstrate that efficient model-based motion planning can achieve results comparable to RL with dramatically less computation. Video: https://youtu.be/12Ew4xC-VwA
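A minimal sketch of the smoothing equivalence on a toy system: zeroth-order (RL-style) randomized smoothing of a stiff, non-smooth contact force versus its analytic Gaussian convolution. The 1-D spring-contact model and all constants are illustrative assumptions, not the paper's systems or formulation.

```python
# Two smoothing schemes applied to the same non-smooth contact force:
# Monte Carlo noise-averaging (RL-style) vs. analytic Gaussian convolution.
import numpy as np
from scipy.stats import norm

k, sigma = 100.0, 0.05          # contact stiffness, smoothing noise scale

def contact_force(x):
    """Non-smooth penetration force: active only when x < 0."""
    return np.maximum(0.0, -k * x)

def smoothed_force_mc(x, n=20000, rng=np.random.default_rng(0)):
    """RL-style smoothing: Monte Carlo average over injected noise."""
    return contact_force(x + rng.normal(0.0, sigma, n)).mean()

def smoothed_force_analytic(x):
    """Analytic smoothing: Gaussian convolution of the same force."""
    # E[max(0, -k(x+w))] with w ~ N(0, sigma^2) has this closed form.
    return k * (sigma * norm.pdf(x / sigma) - x * norm.cdf(-x / sigma))

for x in (-0.1, 0.0, 0.1):
    print(x, smoothed_force_mc(x), smoothed_force_analytic(x))
```

Both columns agree up to Monte Carlo error, illustrating how stochastic averaging and analytic smoothing produce the same smooth surrogate for the stiff contact force.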
We propose a new method for optimistic planning in infinite-horizon discounted Markov decision processes based on the idea of adding regularization to the updates of an otherwise standard approximate value iteration procedure. This technique allows us to avoid contraction and monotonicity arguments that are typically required by existing analyses of approximate dynamic programming methods, and in particular to use approximate transition functions estimated via least-squares procedures in MDPs with linear function approximation. We use our method to provide a computationally efficient algorithm for learning near-optimal policies in discounted linear kernel MDPs from a single stream of experience, and show that it achieves near-optimal statistical guarantees.
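A toy tabular sketch of the high-level recipe: approximate value iteration with transitions estimated by least squares from a single stream of experience (here, smoothed empirical counts) and a regularized, optimistic update. The exploration bonus and proximal damping below are illustrative stand-ins for the paper's linear kernel MDP construction.

```python
# Regularized, optimistic approximate value iteration on a toy tabular MDP.
import numpy as np

rng = np.random.default_rng(0)
S, A, gamma = 5, 2, 0.9
P = rng.dirichlet(np.ones(S), size=(S, A))       # true transitions (unknown)
R = rng.uniform(size=(S, A))                     # known rewards

# Estimate transitions from a single stream of experience.
counts = np.zeros((S, A, S))
s = 0
for _ in range(5000):
    a = rng.integers(A)
    s_next = rng.choice(S, p=P[s, a])
    counts[s, a, s_next] += 1
    s = s_next
n = counts.sum(axis=2, keepdims=True)
P_hat = (counts + 1.0) / (n + S)                 # smoothed empirical estimate

bonus = 1.0 / np.sqrt(n.squeeze(-1) + 1.0)       # optimism for rare (s, a)
V = np.zeros(S)
for _ in range(200):
    Q = R + bonus + gamma * P_hat @ V            # optimistic Bellman backup
    V = 0.5 * V + 0.5 * Q.max(axis=1)            # regularized (damped) update
print("optimistic values:", np.round(V, 3))
```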
This article presents a general approximation-theoretic framework to analyze measure-transport algorithms for sampling and characterizing probability measures. Sampling is a task that frequently arises in data science and uncertainty quantification. We provide error estimates in the continuum limit, i.e., when the measures (or their densities) are given, but when the transport map is discretized or approximated using a finite-dimensional function space. Our analysis relies on the regularity theory of transport maps, as well as on classical approximation theory for high-dimensional functions. A third element of our analysis, which is of independent interest, is the development of new stability estimates that relate the normed distance between two maps to the divergence between the pushforward measures they define. We further present a series of applications where quantitative convergence rates are obtained for practical problems using Wasserstein metrics, maximum mean discrepancy, and Kullback-Leibler divergence. Specialized rates for approximations of the popular triangular Knöthe-Rosenblatt maps are obtained, followed by numerical experiments that demonstrate and extend our theory.
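As a concrete instance of such a stability estimate, the Wasserstein case follows directly from the definition via couplings: since $(S,T)_\sharp\rho$ couples the two pushforwards, one obtains the constant-free bound below. The paper's estimates for maximum mean discrepancy and Kullback-Leibler divergence take analogous, though more involved, forms.

```latex
% Stability of pushforwards in the Wasserstein metric: the coupling
% (S, T)_\sharp \rho is admissible between the two pushforwards, hence
\begin{equation*}
  W_p\bigl(S_\sharp \rho,\; T_\sharp \rho\bigr)
  \;\le\;
  \Bigl( \int \lVert S(x) - T(x) \rVert^p \, \mathrm{d}\rho(x) \Bigr)^{1/p}
  \;=\;
  \lVert S - T \rVert_{L^p(\rho)} .
\end{equation*}
```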
In human-robot collaboration, the objectives of the human are often unknown to the robot. Moreover, even for a known objective, the human behavior remains uncertain. In order to plan a robust robot behavior, a key preliminary question is then: how can realistic human behaviors be derived for a known objective? A major issue is that such a human behavior should itself account for the robot's behavior; otherwise, no collaboration can take place. In this paper, we rely on Markov decision models, representing the uncertainty over the human objective as a probability distribution over a finite set of objective functions (which induces a distribution over human behaviors). Based on this, we propose two contributions: 1) an approach that automatically generates an uncertain human behavior (a policy) for each given objective function while accounting for possible robot behaviors; and 2) a robot planning algorithm that is robust to the above-mentioned uncertainties and relies on solving a partially observable Markov decision process (POMDP) obtained by reasoning on a distribution over human behaviors. A co-working scenario serves as a testbed for experiments, with qualitative and quantitative results evaluating our approach.
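A minimal sketch of the uncertainty model in Python: a belief over a finite set of human objectives, updated from observed human actions under a softmax (Boltzmann-rational) behavior model. The Q-tables and temperature are illustrative assumptions; the paper derives the human policies by solving Markov decision models and plans against the induced behavior distribution with a POMDP.

```python
# Bayesian belief over a finite set of human objective functions, updated
# from observed actions under a softmax behavior model (illustrative sketch).
import numpy as np

def softmax_policy(q_values, temp=1.0):
    """Stochastic human policy induced by one objective's Q-values."""
    z = np.exp((q_values - q_values.max()) / temp)
    return z / z.sum()

def update_belief(belief, q_tables, state, action, temp=1.0):
    """Bayes update of the belief over objectives after observing one action."""
    likelihoods = np.array(
        [softmax_policy(q[state], temp)[action] for q in q_tables])
    posterior = belief * likelihoods
    return posterior / posterior.sum()

# Two hypothetical objectives over 3 states x 2 actions.
q_tables = [np.array([[1.0, 0.0], [0.5, 0.5], [0.0, 1.0]]),
            np.array([[0.0, 1.0], [0.5, 0.5], [1.0, 0.0]])]
belief = np.array([0.5, 0.5])
for state, action in [(0, 0), (2, 0), (0, 0)]:   # observed human behavior
    belief = update_belief(belief, q_tables, state, action)
print("posterior over objectives:", np.round(belief, 3))
```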
Collision-free mobile robot navigation is an important problem for many robotics applications, especially in cluttered environments. In such environments, obstacles can be static or dynamic. Dynamic obstacles can additionally be interactive, i.e., they change their behavior according to the behavior of other entities. The perception and prediction modules of robotic systems create probabilistic representations and predictions of such environments. In this paper, we propose a novel prediction representation for the interactive behaviors of dynamic obstacles. We then propose a real-time trajectory planning algorithm that probabilistically avoids collisions with static and interactive dynamic obstacles and produces dynamically feasible trajectories. During decision making, our planner simulates the interactive behavior of dynamic obstacles in response to the actions the planning robot takes. We explicitly minimize collision probabilities against static and dynamic obstacles using a multi-objective search formulation. We then formulate a quadratic program to safely fit a smooth trajectory to the search result while attempting to preserve the collision probabilities computed during the search. We evaluate our algorithm extensively in simulation, demonstrating its performance under different environments and configurations in 78000 randomly generated cases, and compare it to a state-of-the-art trajectory planning algorithm for static and dynamic obstacle avoidance in 4500 randomly generated cases. Our algorithm achieves up to 3.8x the success rate of the baseline while using as little as 0.18x of its computation time. We implement our algorithm on physical quadrotors and show its feasibility in the real world.
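A small sketch of one ingredient: estimating the collision probability of a candidate robot position against a dynamic obstacle with a Gaussian predicted position, via Monte Carlo, and folding it into a search cost. The radii, covariance, and weighting are illustrative; the paper combines such probabilities in a multi-objective search followed by a QP trajectory fit.

```python
# Monte Carlo collision probability against a Gaussian-predicted obstacle,
# combined with goal progress into a simple search cost (illustrative sketch).
import numpy as np

def collision_probability(robot_pos, obs_mean, obs_cov,
                          robot_r=0.3, obs_r=0.3, n=10000,
                          rng=np.random.default_rng(1)):
    """P(distance < sum of radii) under the Gaussian obstacle prediction."""
    samples = rng.multivariate_normal(obs_mean, obs_cov, size=n)
    dists = np.linalg.norm(samples - robot_pos, axis=1)
    return np.mean(dists < robot_r + obs_r)

def search_cost(robot_pos, goal, obs_mean, obs_cov, w_coll=10.0):
    """Multi-objective cost: progress to goal plus weighted collision risk."""
    return (np.linalg.norm(goal - robot_pos)
            + w_coll * collision_probability(robot_pos, obs_mean, obs_cov))

goal = np.array([5.0, 0.0])
obs_mean, obs_cov = np.array([2.0, 0.0]), 0.2 * np.eye(2)
for x in (1.0, 2.0, 3.0):
    p = np.array([x, 0.5])
    print(p, round(search_cost(p, goal, obs_mean, obs_cov), 3))
```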
Discrete data are abundant and often arise as counts or rounded data. These data commonly exhibit complex distributional features such as zero-inflation, over-/under-dispersion, boundedness, and heaping, which render many parametric models inadequate. Yet even for parametric regression models, approximations such as MCMC are typically needed for posterior inference. This paper introduces a Bayesian modeling and algorithmic framework that enables semiparametric regression analysis for discrete data with Monte Carlo (not MCMC) sampling. The proposed approach pairs a nonparametric marginal model with a latent linear regression model to encourage both flexibility and interpretability, and delivers posterior consistency even under model misspecification. For a parametric or large-sample approximation of this model, we identify a class of conjugate priors with (pseudo) closed-form posteriors. All posterior and predictive distributions are available analytically or via direct Monte Carlo sampling. These tools are broadly useful for linear regression, nonlinear models via basis expansions, and variable selection with discrete data. Simulation studies demonstrate significant advantages in computing, prediction, estimation, and selection relative to existing alternatives. The approach is applied successfully to self-reported mental health data that exhibit zero-inflation, overdispersion, boundedness, and heaping.
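A hedged sketch of the direct (non-MCMC) Monte Carlo idea: fit a conjugate Bayesian linear model on a latent continuous scale, then generate discrete predictions by rounding latent predictive draws. The identity-scale latent proxy and simple rounding below are illustrative stand-ins for the paper's nonparametric marginal transformation.

```python
# Direct Monte Carlo posterior and predictive sampling (no Markov chain)
# for a latent linear model with a rounding map to discrete outcomes.
import numpy as np

rng = np.random.default_rng(0)
n, p = 200, 2
X = np.column_stack([np.ones(n), rng.normal(size=n)])
beta_true = np.array([1.0, 0.8])
y = np.maximum(0, np.round(X @ beta_true + rng.normal(size=n))).astype(int)

# Conjugate posterior for beta (known noise variance, N(0, tau^2 I) prior).
tau2, sigma2 = 10.0, 1.0
z = y.astype(float)                              # latent proxy (illustrative)
V = np.linalg.inv(X.T @ X / sigma2 + np.eye(p) / tau2)
m = V @ (X.T @ z / sigma2)

# Posterior predictive at a new point via direct Monte Carlo sampling.
x_new = np.array([1.0, 0.5])
beta_draws = rng.multivariate_normal(m, V, size=5000)
z_draws = beta_draws @ x_new + rng.normal(0, np.sqrt(sigma2), 5000)
y_draws = np.maximum(0, np.round(z_draws)).astype(int)  # round to counts
print("predictive mean:", y_draws.mean())
```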
The surge of research interest in neural networks for modeling nonlinear dynamical systems is largely explained by their capacity to model complex input-output relations directly from data. However, they typically need vast amounts of training data before they can be put to good use, and generating data from dynamical systems can be expensive in both time and resources. Active learning addresses this shortcoming by acquiring the most informative data, thereby reducing the need to collect enormous datasets. What makes the current work unique is the integration of deep active learning into nonlinear system identification. We formulate a general static deep active learning acquisition problem for nonlinear system identification. This is enabled by exploring the system dynamics locally in different regions of the input space to obtain a simulated dataset covering the broader input space. This simulated dataset is then used in a static deep active learning acquisition scheme referred to as global exploration. Global exploration acquires a batch of initial states corresponding to the most informative state-action trajectories according to a batch acquisition function, while local exploration solves an optimal control problem to find the control trajectory that maximizes some measure of information. After a batch of informative initial states is acquired, a new round of local explorations is conducted from the initial states in the batch to obtain the corresponding control trajectories, which are then applied to the system to collect new data. The information measures used in the acquisition scheme are derived from the predictive variance of an ensemble of neural networks. In a case study on simulated data, the novel method outperforms standard data acquisition methods used for system identification of nonlinear dynamical systems.
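A minimal sketch of the information measure: ensemble disagreement, i.e., the predictive variance across ensemble members, used as an acquisition score to rank candidate initial states. The bootstrap-fit polynomial ensemble stands in for the paper's neural network ensemble; the dynamics and candidates are illustrative.

```python
# Ensemble predictive variance as an acquisition score for global exploration.
import numpy as np

rng = np.random.default_rng(0)

def true_dynamics(x):                    # unknown system (illustrative)
    return np.sin(3 * x)

# Bootstrap-train an ensemble on the data collected so far.
x_train = rng.uniform(-1, 0, size=30)
y_train = true_dynamics(x_train) + 0.05 * rng.normal(size=30)

def fit_member(rng):
    idx = rng.integers(0, len(x_train), len(x_train))   # bootstrap resample
    return np.polyfit(x_train[idx], y_train[idx], deg=5)

ensemble = [fit_member(rng) for _ in range(10)]

def acquisition(x):
    """Ensemble disagreement: predictive variance across members."""
    preds = np.array([np.polyval(c, x) for c in ensemble])
    return preds.var(axis=0)

# Global exploration: rank candidate initial states by informativeness.
candidates = np.linspace(-1, 2, 7)
scores = acquisition(candidates)
print("most informative initial state:", candidates[np.argmax(scores)])
```

As expected, the score is highest far from the training region, so the acquisition selects initial states whose trajectories would be most informative.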
The ability to efficiently plan and execute search missions in challenging and complex environments during natural and man-made disasters is imperative. In many emergency situations, precise navigation between obstacles and time-efficient searching around 3D structures are essential for finding survivors. In this work, we propose an integrated assessment and search planning approach that allows an autonomous UAV (unmanned aerial vehicle) agent to plan and execute collision-free search trajectories in 3D environments. More specifically, the proposed search-planning framework aims to integrate and automate the first two phases (i.e., the assessment phase and the search phase) of a traditional search-and-rescue (SAR) mission. In the first stage, termed assessment planning, we aim to find a high-level assessment plan that the UAV agent can execute in order to visit a set of points of interest. The plan generated in this stage guides the UAV to fly over the objects of interest, thus providing a first assessment of the situation at hand. In the second stage, termed search planning, the UAV trajectory is further fine-tuned to allow the UAV to search the objects of interest for survivors in 3D (i.e., across all of their faces). The performance of the proposed approach is demonstrated through extensive simulation analysis.
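The assessment stage must order a set of points of interest into a fly-over plan. As a simple illustration of such an ordering (not the paper's planner), a greedy nearest-neighbor tour over 3-D waypoints:

```python
# Greedy nearest-neighbor ordering of points of interest (illustrative only).
import numpy as np

def nearest_neighbor_tour(start, points):
    """Visit each point of interest, always flying to the closest unvisited one."""
    tour, pos = [], np.asarray(start, float)
    remaining = [np.asarray(p, float) for p in points]
    while remaining:
        dists = [np.linalg.norm(p - pos) for p in remaining]
        pos = remaining.pop(int(np.argmin(dists)))
        tour.append(pos)
    return tour

pois = [(10, 0, 5), (2, 3, 4), (8, 8, 6), (1, 9, 3)]
for wp in nearest_neighbor_tour((0, 0, 2), pois):
    print(wp)
```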
The paper introduces DiSProD, an online planner developed for environments with probabilistic transitions in continuous state and action spaces. DiSProD builds a symbolic graph that captures the distribution of future trajectories, conditioned on a given policy, using independence assumptions and approximate propagation of distributions. The symbolic graph provides a differentiable representation of the policy's value, enabling efficient gradient-based optimization for long-horizon search. The propagation of approximate distributions can be seen as an aggregation of many trajectories, making it well suited to sparse rewards and stochastic environments. An extensive experimental evaluation compares DiSProD to state-of-the-art planners in discrete-time planning and real-time control of robotic systems. The proposed method improves over existing planners in handling stochastic environments and is less sensitive to search depth, reward sparsity, and large action spaces. Additional real-world experiments demonstrate that DiSProD can control ground vehicles and surface vessels to successfully navigate around obstacles.
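A minimal sketch of the distribution-propagation idea: carry the mean and variance of the state forward through a stochastic transition under an independence assumption, and score a candidate action sequence by its expected cumulative reward in closed form. The 1-D dynamics, noise, and quadratic reward are illustrative; DiSProD builds a differentiable symbolic graph of such a computation so that the action sequence can be optimized by gradients.

```python
# Moment propagation through stochastic dynamics to score action sequences.
import numpy as np

def propagate(mu, var, action, noise_var=0.05):
    """Moments of x' = x + 0.1 * action under additive Gaussian noise."""
    return mu + 0.1 * action, var + noise_var   # independent-noise assumption

def expected_reward(mu, var, goal=1.0):
    """E[-(x - goal)^2] in closed form from the first two moments."""
    return -((mu - goal) ** 2 + var)

def plan_value(actions, mu0=0.0, var0=0.0):
    mu, var, total = mu0, var0, 0.0
    for a in actions:
        mu, var = propagate(mu, var, a)
        total += expected_reward(mu, var)
    return total

# Compare two candidate action sequences by propagated expected return.
for seq in (np.ones(10), 0.5 * np.ones(10)):
    print(seq[0], plan_value(seq))
```

Because every step is smooth, the value is differentiable in the actions, which is what enables the gradient-based long-horizon search described above.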
The problem of path planning has been studied for years. Classic planning pipelines, comprising perception, mapping, and path searching, can introduce latency and compounding errors between modules. While recent studies have demonstrated the effectiveness of end-to-end learning methods in achieving high planning efficiency, these methods often struggle to match the generalization abilities of classic approaches across different environments. Moreover, end-to-end training of policies often requires large amounts of labeled data or many training iterations to reach convergence. In this paper, we present a novel Imperative Learning (IL) approach, which leverages a differentiable cost map to provide implicit supervision during policy training, eliminating the need for demonstrations or labeled trajectories. Furthermore, the policy training adopts a Bi-Level Optimization (BLO) process, which combines network updates with metric-based trajectory optimization to generate a smooth and collision-free path toward the goal from a single depth measurement. The proposed method allows the task-level costs of predicted trajectories to be backpropagated through all components, updating the network by direct gradient descent. In our experiments, the method demonstrates around 4x faster planning than the classic approach and robustness against localization noise. Additionally, the IL approach enables the planner to generalize to various unseen environments, resulting in an overall 26-87% improvement in SPL performance compared to baseline learning methods.
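A toy sketch of the implicit supervision: a differentiable cost field lets a task-level loss be reduced by gradient descent, here applied directly to waypoints rather than to network weights as in the paper. The Gaussian obstacle "cost map", the Laplacian smoothness term, and the hand-coded gradients are illustrative assumptions.

```python
# Task-level cost backpropagated to a trajectory via a differentiable cost field.
import numpy as np

obstacle = np.array([0.5, 0.5])

def cost(p):
    """Differentiable cost: high near the obstacle."""
    return np.exp(-np.sum((p - obstacle) ** 2) / 0.02)

def cost_grad(p):
    return cost(p) * (-2.0 / 0.02) * (p - obstacle)

# Straight-line initial path from start to goal, then descend the task loss.
waypoints = np.linspace([0.0, 0.0], [1.0, 1.0], 8)
for _ in range(200):
    grads = np.array([cost_grad(p) for p in waypoints])
    # Smoothness term keeps neighbors close (simple Laplacian penalty).
    lap = np.zeros_like(waypoints)
    lap[1:-1] = 2 * waypoints[1:-1] - waypoints[:-2] - waypoints[2:]
    waypoints[1:-1] -= 0.05 * (grads[1:-1] + 2.0 * lap[1:-1])
print(np.round(waypoints, 2))
```

The interior waypoints bow away from the obstacle while the endpoints stay fixed, mirroring how the paper's BLO lets trajectory-level costs drive the network update by direct gradient descent.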