For the best human-robot interaction experience, the robot's navigation policy should take into account personal preferences of the user. In this paper, we present a learning framework complemented by a perception pipeline to train a depth vision-based, personalized navigation controller from user demonstrations. Our virtual reality interface enables the demonstration of robot navigation trajectories under motion of the user for dynamic interaction scenarios. The novel perception pipeline enrolls a variational autoencoder in combination with a motion predictor. It compresses the perceived depth images to a latent state representation to enable efficient reasoning of the learning agent about the robot's dynamic environment. In a detailed analysis and ablation study, we evaluate different configurations of the perception pipeline. To further quantify the navigation controller's quality of personalization, we develop and apply a novel metric to measure preference reflection based on the Fr\'echet Distance. We discuss the robot's navigation performance in various virtual scenes and demonstrate the first personalized robot navigation controller that solely relies on depth images. A supplemental video highlighting our approach is available online.
Learning algorithms, like Quality-Diversity (QD), can be used to acquire repertoires of diverse robotics skills. This learning is commonly done via computer simulation due to the large number of evaluations required. However, training in a virtual environment generates a gap between simulation and reality. Here, we build upon the Reset-Free QD (RF-QD) algorithm to learn controllers directly on a physical robot. This method uses a dynamics model, learned from interactions between the robot and the environment, to predict the robot's behaviour and improve sample efficiency. A behaviour selection policy filters out uninteresting or unsafe policies predicted by the model. RF-QD also includes a recovery policy that returns the robot to a safe zone when it has walked outside of it, allowing continuous learning. We demonstrate that our method enables a physical quadruped robot to learn a repertoire of behaviours in two hours without human supervision. We successfully test the solution repertoire using a maze navigation task. Finally, we compare our approach to the MAP-Elites algorithm. We show that dynamics awareness and a recovery policy are required for training on a physical robot for optimal archive generation. Video available at //youtu.be/BgGNvIsRh7Q
Cooperative perception is a promising technique for enhancing the perception capabilities of automated vehicles through vehicle-to-everything (V2X) cooperation, provided that accurate relative pose transforms are available. Nevertheless, obtaining precise positioning information often entails high costs associated with navigation systems. Moreover, signal drift resulting from factors such as occlusion and multipath effects can compromise the stability of the positioning information. Hence, a low-cost and robust method is required to calibrate relative pose information for multi-agent cooperative perception. In this paper, we propose a simple but effective inter-agent object association approach (CBM), which constructs contexts using the detected bounding boxes, followed by local context matching and global consensus maximization. Based on the matched correspondences, optimal relative pose transform is estimated, followed by cooperative perception fusion. Extensive experimental studies are conducted on both the simulated and real-world datasets, high object association precision and decimeter level relative pose calibration accuracy is achieved among the cooperating agents even with larger inter-agent localization errors. Furthermore, the proposed approach outperforms the state-of-the-art methods in terms of object association and relative pose estimation accuracy, as well as the robustness of cooperative perception against the pose errors of the connected agents. The code will be available at //github.com/zhyingS/CBM.
The Teacher-Student Framework (TSF) is a reinforcement learning setting where a teacher agent guards the training of a student agent by intervening and providing online demonstrations. Assuming optimal, the teacher policy has the perfect timing and capability to intervene in the learning process of the student agent, providing safety guarantee and exploration guidance. Nevertheless, in many real-world settings it is expensive or even impossible to obtain a well-performing teacher policy. In this work, we relax the assumption of a well-performing teacher and develop a new method that can incorporate arbitrary teacher policies with modest or inferior performance. We instantiate an Off-Policy Reinforcement Learning algorithm, termed Teacher-Student Shared Control (TS2C), which incorporates teacher intervention based on trajectory-based value estimation. Theoretical analysis validates that the proposed TS2C algorithm attains efficient exploration and substantial safety guarantee without being affected by the teacher's own performance. Experiments on various continuous control tasks show that our method can exploit teacher policies at different performance levels while maintaining a low training cost. Moreover, the student policy surpasses the imperfect teacher policy in terms of higher accumulated reward in held-out testing environments. Code is available at //metadriverse.github.io/TS2C.
Today's robots often assume that their behavior should be transparent. These transparent (e.g., legible, explainable) robots intentionally choose actions that convey their internal state to nearby humans. But while transparent behavior seems beneficial, is it actually optimal? In this paper we consider collaborative settings where the human and robot have the same objective, and the human is uncertain about the robot's type (i.e., the robot's internal state). We extend a recursive combination of Bayesian Nash equilibrium and the Bellman equation to solve for optimal robot policies. Interestingly, we discover that it is not always optimal for collaborative robots to be transparent; instead, human and robot teams can sometimes achieve higher rewards when the robot is opaque. Opaque robots select the same actions regardless of their internal state: because each type of opaque robot behaves in the same way, the human cannot infer the robot's type. Our analysis suggests that opaque behavior becomes optimal when either (a) human-robot interactions have a short time horizon or (b) users are slow to learn from the robot's actions. Across online and in-person user studies with 43 total participants, we find that users reach higher rewards when working with opaque partners, and subjectively rate opaque robots as about equal to transparent robots. See videos of our experiments here: //youtu.be/u8q1Z7WHUuI
With the widespread application of industrial robots, the problem of absolute positioning accuracy becomes increasingly prominent. To ensure the working state of the robots, researchers commonly adopt calibration techniques to improve its accuracy. However, an industrial robot's working space is mostly restricted in real working environments, making the collected samples fail in covering the actual working space to result in the overall migration data. To address this vital issue, this work proposes a novel industrial robot calibrator that integrates a measurement configurations selection (MCS) method and an alternation-direction-method-of-multipliers with multiple planes constraints (AMPC) algorithm into its working process, whose ideas are three-fold: a) selecting a group of optimal measurement configurations based on the observability index to suppress the measurement noises, b) developing an AMPC algorithm that evidently enhances the calibration accuracy and suppresses the long-tail convergence, and c) proposing an industrial robot calibration algorithm that incorporates MCS and AMPC to optimize an industrial robot's kinematic parameters efficiently. For validating its performance, a public-available dataset (HRS-P) is established on an HRS-JR680 industrial robot. Extensive experimental results demonstrate that the proposed calibrator outperforms several state-of-the-art models in calibration accuracy.
This paper attempts to characterize the kinds of physical scenarios in which an online learning-based cognitive radar is expected to reliably outperform a fixed rule-based waveform selection strategy, as well as the converse. We seek general insights through an examination of two decision-making scenarios, namely dynamic spectrum access and multiple-target tracking. The radar scene is characterized by inducing a state-space model and examining the structure of its underlying Markov state transition matrix, in terms of entropy rate and diagonality. It is found that entropy rate is a strong predictor of online learning-based waveform selection, while diagonality is a better predictor of fixed rule-based waveform selection. We show that these measures can be used to predict first and second-order stochastic dominance relationships, which can allow system designers to make use of simple decision rules instead of more cumbersome learning approaches under certain conditions. We validate our findings through numerical results for each application and provide guidelines for future implementations.
Tracking multiple moving objects of interest (OOI) with multiple robot systems (MRS) has been addressed by active sensing that maintains a shared belief of OOIs and plans the motion of robots to maximize the information quality. Mobility of robots enables the behavior of pursuing better visibility, which is constrained by sensor field of view (FoV) and occlusion objects. We first extend prior work to detect, maintain and share occlusion information explicitly, allowing us to generate occlusion-aware planning even if a priori semantic occlusion information is unavailable. The efficacy of active sensing approaches is often evaluated according to estimation error and information gain metrics. However, these metrics do not directly explain the level of cooperative behavior engendered by the active sensing algorithms. Next, we extract different emergent cooperative behaviors that stem from the same underlying algorithms but manifest differently under differing scenarios. In particular, we highlight and demonstrate three emergent behavior patterns in active sensing MRS: (i) Change of tracking responsibility between agents when tracking trajectories with divergent directions or due to a re-allocation of the resource among heterogeneous agents; (ii) Awareness of occlusions to a trajectory and temporal leave-and-return of the sensing agent; (iii) Sharing of local occlusion objects in MRS that subsequently improves the awareness of occlusion.
The emergence of reinforcement learning (RL) methods in traffic signal control tasks has achieved better performance than conventional rule-based approaches. Most RL approaches require the observation of the environment for the agent to decide which action is optimal for a long-term reward. However, in real-world urban scenarios, missing observation of traffic states may frequently occur due to the lack of sensors, which makes existing RL methods inapplicable on road networks with missing observation. In this work, we aim to control the traffic signals in a real-world setting, where some of the intersections in the road network are not installed with sensors and thus with no direct observations around them. To the best of our knowledge, we are the first to use RL methods to tackle the traffic signal control problem in this real-world setting. Specifically, we propose two solutions: the first one imputes the traffic states to enable adaptive control, and the second one imputes both states and rewards to enable adaptive control and the training of RL agents. Through extensive experiments on both synthetic and real-world road network traffic, we reveal that our method outperforms conventional approaches and performs consistently with different missing rates. We also provide further investigations on how missing data influences the performance of our model.
Denoising diffusion models represent a recent emerging topic in computer vision, demonstrating remarkable results in the area of generative modeling. A diffusion model is a deep generative model that is based on two stages, a forward diffusion stage and a reverse diffusion stage. In the forward diffusion stage, the input data is gradually perturbed over several steps by adding Gaussian noise. In the reverse stage, a model is tasked at recovering the original input data by learning to gradually reverse the diffusion process, step by step. Diffusion models are widely appreciated for the quality and diversity of the generated samples, despite their known computational burdens, i.e. low speeds due to the high number of steps involved during sampling. In this survey, we provide a comprehensive review of articles on denoising diffusion models applied in vision, comprising both theoretical and practical contributions in the field. First, we identify and present three generic diffusion modeling frameworks, which are based on denoising diffusion probabilistic models, noise conditioned score networks, and stochastic differential equations. We further discuss the relations between diffusion models and other deep generative models, including variational auto-encoders, generative adversarial networks, energy-based models, autoregressive models and normalizing flows. Then, we introduce a multi-perspective categorization of diffusion models applied in computer vision. Finally, we illustrate the current limitations of diffusion models and envision some interesting directions for future research.
Meta-learning extracts the common knowledge acquired from learning different tasks and uses it for unseen tasks. It demonstrates a clear advantage on tasks that have insufficient training data, e.g., few-shot learning. In most meta-learning methods, tasks are implicitly related via the shared model or optimizer. In this paper, we show that a meta-learner that explicitly relates tasks on a graph describing the relations of their output dimensions (e.g., classes) can significantly improve the performance of few-shot learning. This type of graph is usually free or cheap to obtain but has rarely been explored in previous works. We study the prototype based few-shot classification, in which a prototype is generated for each class, such that the nearest neighbor search between the prototypes produces an accurate classification. We introduce "Gated Propagation Network (GPN)", which learns to propagate messages between prototypes of different classes on the graph, so that learning the prototype of each class benefits from the data of other related classes. In GPN, an attention mechanism is used for the aggregation of messages from neighboring classes, and a gate is deployed to choose between the aggregated messages and the message from the class itself. GPN is trained on a sequence of tasks from many-shot to few-shot generated by subgraph sampling. During training, it is able to reuse and update previously achieved prototypes from the memory in a life-long learning cycle. In experiments, we change the training-test discrepancy and test task generation settings for thorough evaluations. GPN outperforms recent meta-learning methods on two benchmark datasets in all studied cases.