Robots are still poor at traversing cluttered large obstacles required for important applications like search and rescue. By contrast, animals are excellent at doing so, often using direct physical interaction with obstacles rather than avoiding them. Here, towards understanding the dynamics of cluttered obstacle traversal, we developed a minimalistic stochastic dynamics simulation inspired by our recent study of insects traversing grass-like beams. The 2-D model system consists of a forward self-propelled circular locomotor translating on a frictionless level plane with a lateral random force and interacting with two adjacent horizontal beams that form a gate. We found that traversal probability increases monotonically with propulsive force, but first increases then decreases with random force magnitude. For asymmetric beams with different stiffness, traversal is more likely towards the side of the less stiff beam. These observations are in accord with those expected from a potential energy landscape approach. Furthermore, we extended the single gate in a lattice configuration to form a large cluttered obstacle field. A Markov chain Monte Carlo method was applied to predict traversal in the large field, using the input-output probability map obtained from single gate simulations. This method achieved high accuracy in predicting the statistical distribution of the final location of the body within the obstacle field, while saving computation time by a factor of 10^5.
This paper presents an inverse reinforcement learning~(IRL) framework for Bayesian stopping time problems. By observing the actions of a Bayesian decision maker, we provide a necessary and sufficient condition to identify if these actions are consistent with optimizing a cost function. In a Bayesian (partially observed) setting, the inverse learner can at best identify optimality wrt the observed actions. Our IRL algorithm identifies optimality and then constructs set valued estimates of the cost function. To achieve this IRL objective, we use novel ideas from Bayesian revealed preferences stemming from microeconomics. We illustrate the proposed IRL scheme using two important examples of stopping time problems, namely, sequential hypothesis testing and Bayesian search. Finally, for finite datasets, we propose an IRL detection algorithm and give finite sample bounds on its error probabilities.
In this paper, we present an innovative risk-bounded motion planning methodology for stochastic multi-agent systems. For this methodology, the disturbance, noise, and model uncertainty are considered; and a velocity obstacle method is utilized to formulate the collision-avoidance constraints in the velocity space. With the exploitation of geometric information of static obstacles and velocity obstacles, a distributed optimization problem with probabilistic chance constraints is formulated for the stochastic multi-agent system. Consequently, collision-free trajectories are generated under a prescribed collision risk bound. Due to the existence of probabilistic and disjunctive constraints, the distributed chance-constrained optimization problem is reformulated as a mixed-integer program by introducing the binary variable to improve computational efficiency. This approach thus renders it possible to execute the motion planning task in the velocity space instead of the position space, which leads to smoother collision-free trajectories for multi-agent systems and higher computational efficiency. Moreover, the risk of potential collisions is bounded with this robust motion planning methodology. To validate the effectiveness of the methodology, different scenarios for multiple agents are investigated, and the simulation results clearly show that the proposed approach can generate high-quality trajectories under a predefined collision risk bound and avoid potential collisions effectively in the velocity space.
Offline reinforcement learning enables learning from a fixed dataset, without further interactions with the environment. The lack of environmental interactions makes the policy training vulnerable to state-action pairs far from the training dataset and prone to missing rewarding actions. For training more effective agents, we propose a framework that supports learning a flexible yet well-regularized fully-implicit policy. We further propose a simple modification to the classical policy-matching methods for regularizing with respect to the dual form of the Jensen--Shannon divergence and the integral probability metrics. We theoretically show the correctness of the policy-matching approach, and the correctness and a good finite-sample property of our modification. An effective instantiation of our framework through the GAN structure is provided, together with techniques to explicitly smooth the state-action mapping for robust generalization beyond the static dataset. Extensive experiments and ablation study on the D4RL dataset validate our framework and the effectiveness of our algorithmic designs.
Given a single trajectory of a dynamical system, we analyze the performance of the nonparametric least squares estimator (LSE). More precisely, we give nonasymptotic expected $l^2$-distance bounds between the LSE and the true regression function, where expectation is evaluated on a fresh, counterfactual, trajectory. We leverage recently developed information-theoretic methods to establish the optimality of the LSE for nonparametric hypotheses classes in terms of supremum norm metric entropy and a subgaussian parameter. Next, we relate this subgaussian parameter to the stability of the underlying process using notions from dynamical systems theory. When combined, these developments lead to rate-optimal error bounds that scale as $T^{-1/(2+q)}$ for suitably stable processes and hypothesis classes with metric entropy growth of order $\delta^{-q}$. Here, $T$ is the length of the observed trajectory, $\delta \in \mathbb{R}_+$ is the packing granularity and $q\in (0,2)$ is a complexity term. Finally, we specialize our results to a number of scenarios of practical interest, such as Lipschitz dynamics, generalized linear models, and dynamics described by functions in certain classes of Reproducing Kernel Hilbert Spaces (RKHS).
Panel data involving longitudinal measurements of the same set of participants taken over multiple time points is common in studies to understand childhood development and disease modeling. Deep hybrid models that marry the predictive power of neural networks with physical simulators such as differential equations, are starting to drive advances in such applications. The task of modeling not just the observations but the hidden dynamics that are captured by the measurements poses interesting statistical/computational questions. We propose a probabilistic model called ME-NODE to incorporate (fixed + random) mixed effects for analyzing such panel data. We show that our model can be derived using smooth approximations of SDEs provided by the Wong-Zakai theorem. We then derive Evidence Based Lower Bounds for ME-NODE, and develop (efficient) training algorithms using MC based sampling methods and numerical ODE solvers. We demonstrate ME-NODE's utility on tasks spanning the spectrum from simulations and toy data to real longitudinal 3D imaging data from an Alzheimer's disease (AD) study, and study its performance in terms of accuracy of reconstruction for interpolation, uncertainty estimates and personalized prediction.
We propose a new approach to model the collective dynamics of a population of particles evolving with time. As is often the case in challenging scientific applications, notably single-cell genomics, measuring features for these particles requires destroying them. As a result, the population can only be monitored with periodic snapshots, obtained by sampling a few particles that are sacrificed in exchange for measurements. Given only access to these snapshots, can we reconstruct likely individual trajectories for all other particles? We propose to model these trajectories as collective realizations of a causal Jordan-Kinderlehrer-Otto (JKO) flow of measures: The JKO scheme posits that the new configuration taken by a population at time $t+1$ is one that trades off an improvement, in the sense that it decreases an energy, while remaining close (in Wasserstein distance) to the previous configuration observed at $t$. In order to learn such an energy using only snapshots, we propose JKOnet, a neural architecture that computes (in end-to-end differentiable fashion) the JKO flow given a parametric energy and initial configuration of points. We demonstrate the good performance and robustness of the JKOnet fitting procedure, compared to a more direct forward method.
Diffusion probabilistic models (DPMs) represent a class of powerful generative models. Despite their success, the inference of DPMs is expensive since it generally needs to iterate over thousands of timesteps. A key problem in the inference is to estimate the variance in each timestep of the reverse process. In this work, we present a surprising result that both the optimal reverse variance and the corresponding optimal KL divergence of a DPM have analytic forms w.r.t. its score function. Building upon it, we propose Analytic-DPM, a training-free inference framework that estimates the analytic forms of the variance and KL divergence using the Monte Carlo method and a pretrained score-based model. Further, to correct the potential bias caused by the score-based model, we derive both lower and upper bounds of the optimal variance and clip the estimate for a better result. Empirically, our analytic-DPM improves the log-likelihood of various DPMs, produces high-quality samples, and meanwhile enjoys a 20x to 80x speed up.
We consider the problem of autonomous mobile robot exploration in an unknown environment, taking into account a robot's coverage rate, map uncertainty, and state estimation uncertainty. This paper presents a novel exploration framework for underwater robots operating in cluttered environments, built upon simultaneous localization and mapping (SLAM) with imaging sonar. The proposed system comprises path generation, place recognition forecasting, belief propagation and utility evaluation using a virtual map, which estimates the uncertainty associated with map cells throughout a robot's workspace. We evaluate the performance of this framework in simulated experiments, showing that our algorithm maintains a high coverage rate during exploration while also maintaining low mapping and localization error. The real-world applicability of our framework is also demonstrated on an underwater remotely operated vehicle (ROV) exploring a harbor environment.
Machine learning methods are powerful in distinguishing different phases of matter in an automated way and provide a new perspective on the study of physical phenomena. We train a Restricted Boltzmann Machine (RBM) on data constructed with spin configurations sampled from the Ising Hamiltonian at different values of temperature and external magnetic field using Monte Carlo methods. From the trained machine we obtain the flow of iterative reconstruction of spin state configurations to faithfully reproduce the observables of the physical system. We find that the flow of the trained RBM approaches the spin configurations of the maximal possible specific heat which resemble the near criticality region of the Ising model. In the special case of the vanishing magnetic field the trained RBM converges to the critical point of the Renormalization Group (RG) flow of the lattice model. Our results suggest an alternative explanation of how the machine identifies the physical phase transitions, by recognizing certain properties of the configuration like the maximization of the specific heat, instead of associating directly the recognition procedure with the RG flow and its fixed points. Then from the reconstructed data we deduce the critical exponent associated to the magnetization to find satisfactory agreement with the actual physical value. We assume no prior knowledge about the criticality of the system and its Hamiltonian.
We propose a new method of estimation in topic models, that is not a variation on the existing simplex finding algorithms, and that estimates the number of topics K from the observed data. We derive new finite sample minimax lower bounds for the estimation of A, as well as new upper bounds for our proposed estimator. We describe the scenarios where our estimator is minimax adaptive. Our finite sample analysis is valid for any number of documents (n), individual document length (N_i), dictionary size (p) and number of topics (K), and both p and K are allowed to increase with n, a situation not handled well by previous analyses. We complement our theoretical results with a detailed simulation study. We illustrate that the new algorithm is faster and more accurate than the current ones, although we start out with a computational and theoretical disadvantage of not knowing the correct number of topics K, while we provide the competing methods with the correct value in our simulations.