Minimum-time navigation within constrained and dynamic environments is of special relevance in robotics. Seeking time-optimality, while guaranteeing the integrity of time-varying spatial bounds, is an appealing trade-off for agile vehicles, such as quadrotors. State of the art approaches, either assume bounds to be static and generate time-optimal trajectories offline, or compromise time-optimality for constraint satisfaction. Leveraging nonlinear model predictive control and a path parametric reformulation of the quadrotor model, we present a real-time control that approximates time-optimal behavior and remains within dynamic corridors. The efficacy of the approach is evaluated according to simulated results, showing itself capable of performing extremely aggressive maneuvers as well as stop-and-go and backward motions.
While visual imitation learning offers one of the most effective ways of learning from visual demonstrations, generalizing from them requires either hundreds of diverse demonstrations, task specific priors, or large, hard-to-train parametric models. One reason such complexities arise is because standard visual imitation frameworks try to solve two coupled problems at once: learning a succinct but good representation from the diverse visual data, while simultaneously learning to associate the demonstrated actions with such representations. Such joint learning causes an interdependence between these two problems, which often results in needing large amounts of demonstrations for learning. To address this challenge, we instead propose to decouple representation learning from behavior learning for visual imitation. First, we learn a visual representation encoder from offline data using standard supervised and self-supervised learning methods. Once the representations are trained, we use non-parametric Locally Weighted Regression to predict the actions. We experimentally show that this simple decoupling improves the performance of visual imitation models on both offline demonstration datasets and real-robot door opening compared to prior work in visual imitation. All of our generated data, code, and robot videos are publicly available at //jyopari.github.io/VINN/.
We consider an optimal control problem for the steady-state Kirchhoff equation, a prototype for nonlocal partial differential equations, different from fractional powers of closed operators. Existence and uniqueness of solutions of the state equation, existence of global optimal solutions, differentiability of the control-to-state map and first-order necessary optimality conditions are established. The aforementioned results require the controls to be functions in $H^1$ and subject to pointwise upper and lower bounds. In order to obtain the Newton differentiability of the optimality conditions, we employ a Moreau-Yosida-type penalty approach to treat the control constraints and study its convergence. The first-order optimality conditions of the regularized problems are shown to be Newton diffentiable, and a generalized Newton method is detailed. A discretization of the optimal control problem by piecewise linear finite elements is proposed and numerical results are presented.
We consider the setting of vector valued non-linear dynamical systems $X_{t+1} = \phi(A^* X_t) + \eta_t$, where $\eta_t$ is unbiased noise and $\phi : \mathbb{R} \to \mathbb{R}$ is a known link function that satisfies certain {\em expansivity property}. The goal is to learn $A^*$ from a single trajectory $X_1,\cdots,X_T$ of {\em dependent or correlated} samples. While the problem is well-studied in the linear case, where $\phi$ is identity, with optimal error rates even for non-mixing systems, existing results in the non-linear case hold only for mixing systems. In this work, we improve existing results for learning nonlinear systems in a number of ways: a) we provide the first offline algorithm that can learn non-linear dynamical systems without the mixing assumption, b) we significantly improve upon the sample complexity of existing results for mixing systems, c) in the much harder one-pass, streaming setting we study a SGD with Reverse Experience Replay ($\mathsf{SGD-RER}$) method, and demonstrate that for mixing systems, it achieves the same sample complexity as our offline algorithm, d) we justify the expansivity assumption by showing that for the popular ReLU link function -- a non-expansive but easy to learn link function with i.i.d. samples -- any method would require exponentially many samples (with respect to dimension of $X_t$) from the dynamical system. We validate our results via. simulations and demonstrate that a naive application of SGD can be highly sub-optimal. Indeed, our work demonstrates that for correlated data, specialized methods designed for the dependency structure in data can significantly outperform standard SGD based methods.
An autonomous experimentation platform in manufacturing is supposedly capable of conducting a sequential search for finding suitable manufacturing conditions for advanced materials by itself or even for discovering new materials with minimal human intervention. The core of the intelligent control of such platforms is the policy directing sequential experiments, namely, to decide where to conduct the next experiment based on what has been done thus far. Such policy inevitably trades off exploitation versus exploration and the current practice is under the Bayesian optimization framework using the expected improvement criterion or its variants. We discuss whether it is beneficial to trade off exploitation versus exploration by measuring the element and degree of surprise associated with the immediate past observation. We devise a surprise-reacting policy using two existing surprise metrics, known as the Shannon surprise and Bayesian surprise. Our analysis shows that the surprise-reacting policy appears to be better suited for quickly characterizing the overall landscape of a response surface or a design place under resource constraints. We argue that such capability is much needed for futuristic autonomous experimentation platforms. We do not claim that we have a fully autonomous experimentation platform, but believe that our current effort sheds new lights or provides a different view angle as researchers are racing to elevate the autonomy of various primitive autonomous experimentation systems.
In Chen and Zhou 2021, they consider an inference problem for an Ornstein-Uhlenbeck process driven by a general one-dimensional centered Gaussian process $(G_t)_{t\ge 0}$. The second order mixed partial derivative of the covariance function $ R(t,\, s)=\mathbb{E}[G_t G_s]$ can be decomposed into two parts, one of which coincides with that of fractional Brownian motion and the other is bounded by $(ts)^{H-1}$ with $H\in (\frac12,\,1)$, up to a constant factor. In this paper, we investigate the same problem but with the assumption of $H\in (0,\,\frac12)$. The starting point of this paper is a new relationship between the inner product of $\mathfrak{H}$ and that of the Hilbert space $\mathfrak{H}_1$ associated with the fractional Brownian motion $(B^{H}_t)_{t\ge 0}$. Based on this relationship and some known estimation of the inner product of $\mathfrak{H}_1$, we prove the strong consistency with $H\in (0, \frac12)$, and the asymptotic normality and the Berry-Ess\'{e}en bounds with $H\in (0,\frac38)$ for both the least squares estimator and the moment estimator of the drift parameter constructed from the continuous observations.
We study the non-parametric estimation of the value ${\theta}(f )$ of a linear functional evaluated at an unknown density function f with support on $R_+$ based on an i.i.d. sample with multiplicative measurement errors. The proposed estimation procedure combines the estimation of the Mellin transform of the density $f$ and a regularisation of the inverse of the Mellin transform by a spectral cut-off. In order to bound the mean squared error we distinguish several scenarios characterised through different decays of the upcoming Mellin transforms and the smoothnes of the linear functional. In fact, we identify scenarios, where a non-trivial choice of the upcoming tuning parameter is necessary and propose a data-driven choice based on a Goldenshluger-Lepski method. Additionally, we show minimax-optimality over Mellin-Sobolev spaces of the estimator.
The difficulty in specifying rewards for many real-world problems has led to an increased focus on learning rewards from human feedback, such as demonstrations. However, there are often many different reward functions that explain the human feedback, leaving agents with uncertainty over what the true reward function is. While most policy optimization approaches handle this uncertainty by optimizing for expected performance, many applications demand risk-averse behavior. We derive a novel policy gradient-style robust optimization approach, PG-BROIL, that optimizes a soft-robust objective that balances expected performance and risk. To the best of our knowledge, PG-BROIL is the first policy optimization algorithm robust to a distribution of reward hypotheses which can scale to continuous MDPs. Results suggest that PG-BROIL can produce a family of behaviors ranging from risk-neutral to risk-averse and outperforms state-of-the-art imitation learning algorithms when learning from ambiguous demonstrations by hedging against uncertainty, rather than seeking to uniquely identify the demonstrator's reward function.
We study the problem of learning in the stochastic shortest path (SSP) setting, where an agent seeks to minimize the expected cost accumulated before reaching a goal state. We design a novel model-based algorithm EB-SSP that carefully skews the empirical transitions and perturbs the empirical costs with an exploration bonus to guarantee both optimism and convergence of the associated value iteration scheme. We prove that EB-SSP achieves the minimax regret rate $\widetilde{O}(B_{\star} \sqrt{S A K})$, where $K$ is the number of episodes, $S$ is the number of states, $A$ is the number of actions and $B_{\star}$ bounds the expected cumulative cost of the optimal policy from any state, thus closing the gap with the lower bound. Interestingly, EB-SSP obtains this result while being parameter-free, i.e., it does not require any prior knowledge of $B_{\star}$, nor of $T_{\star}$ which bounds the expected time-to-goal of the optimal policy from any state. Furthermore, we illustrate various cases (e.g., positive costs, or general costs when an order-accurate estimate of $T_{\star}$ is available) where the regret only contains a logarithmic dependence on $T_{\star}$, thus yielding the first horizon-free regret bound beyond the finite-horizon MDP setting.
UMAP (Uniform Manifold Approximation and Projection) is a novel manifold learning technique for dimension reduction. UMAP is constructed from a theoretical framework based in Riemannian geometry and algebraic topology. The result is a practical scalable algorithm that applies to real world data. The UMAP algorithm is competitive with t-SNE for visualization quality, and arguably preserves more of the global structure with superior run time performance. Furthermore, UMAP has no computational restrictions on embedding dimension, making it viable as a general purpose dimension reduction technique for machine learning.
This paper introduces a novel neural network-based reinforcement learning approach for robot gaze control. Our approach enables a robot to learn and to adapt its gaze control strategy for human-robot interaction neither with the use of external sensors nor with human supervision. The robot learns to focus its attention onto groups of people from its own audio-visual experiences, independently of the number of people, of their positions and of their physical appearances. In particular, we use a recurrent neural network architecture in combination with Q-learning to find an optimal action-selection policy; we pre-train the network using a simulated environment that mimics realistic scenarios that involve speaking/silent participants, thus avoiding the need of tedious sessions of a robot interacting with people. Our experimental evaluation suggests that the proposed method is robust against parameter estimation, i.e. the parameter values yielded by the method do not have a decisive impact on the performance. The best results are obtained when both audio and visual information is jointly used. Experiments with the Nao robot indicate that our framework is a step forward towards the autonomous learning of socially acceptable gaze behavior.