Inertial-aided systems require continuous motion excitation, among other reasons, to characterize the measurement biases that enable the accurate integration required by localization frameworks. This paper proposes the use of informative path planning to find the trajectory that best minimizes the uncertainty of IMU biases, together with an adaptive traces method that guides the planner towards trajectories which aid convergence. The key contribution is a novel regression method based on Gaussian Processes (GPs) that enforces continuity and differentiability between waypoints from a variant of the RRT* planning algorithm. We apply linear operators to the GP kernel function to infer not only continuous position trajectories, but also velocities and accelerations. The use of linear functionals enables velocity and acceleration constraints given by the IMU measurements to be imposed on the position GP model. The results from both simulation and real-world experiments show that planning for IMU bias convergence helps minimize localization errors in state estimation frameworks.
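As a rough illustration of applying a linear operator to a GP kernel, the sketch below differentiates a squared-exponential kernel with respect to its first argument, so that a GP fitted to positions also yields a velocity estimate. The kernel choice, the one-dimensional setting, the length scale `ell`, the `jitter` term, and all function names are assumptions made for this example; the abstract does not specify them, and this is not the authors' implementation.

```python
import math

def rbf(a, b, ell):
    """Squared-exponential kernel k(a, b) (assumed kernel, not from the paper)."""
    return math.exp(-((a - b) ** 2) / (2.0 * ell ** 2))

def d_rbf(a, b, ell):
    """Derivative of k(a, b) with respect to a: the differentiation operator applied to the kernel."""
    return -(a - b) / ell ** 2 * rbf(a, b, ell)

def solve(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    M = [row[:] + [rhs] for row, rhs in zip(A, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            f = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= f * M[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        s = sum(M[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (M[r][n] - s) / M[r][r]
    return x

def fit(times, positions, ell=1.0, jitter=1e-6):
    """Return the weights alpha = (K + jitter * I)^-1 y of the position GP."""
    K = [[rbf(a, b, ell) + (jitter if i == j else 0.0)
          for j, b in enumerate(times)]
         for i, a in enumerate(times)]
    return solve(K, positions)

def predict(t, times, alpha, ell=1.0):
    """Posterior mean of position and velocity at time t."""
    pos = sum(rbf(t, s, ell) * a for s, a in zip(times, alpha))
    vel = sum(d_rbf(t, s, ell) * a for s, a in zip(times, alpha))
    return pos, vel

# Hypothetical waypoints sampled from sin(t); the true velocity is cos(t).
times = [0.5 * i for i in range(13)]
positions = [math.sin(t) for t in times]
alpha = fit(times, positions)
pos, vel = predict(3.2, times, alpha)
```

Differentiating the kernel a second time would give accelerations in the same way. Conditioning on velocity or acceleration observations, as the abstract describes for the IMU constraints, would additionally require the corresponding derivative blocks in the kernel matrix, which this sketch omits.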
Optimal path planning is the problem of finding a valid sequence of states between a start and goal that optimizes an objective. Informed path planning algorithms order their search with problem-specific knowledge expressed as heuristics and can be orders of magnitude more efficient than uninformed algorithms. Heuristics are most effective when they are both accurate and computationally inexpensive to evaluate, but these are often conflicting characteristics. This makes the selection of appropriate heuristics difficult for many problems. This paper presents two almost-surely asymptotically optimal sampling-based path planning algorithms to address this challenge, Adaptively Informed Trees (AIT*) and Effort Informed Trees (EIT*). These algorithms use an asymmetric bidirectional search in which both searches continuously inform each other. This allows AIT* and EIT* to improve planning performance by simultaneously calculating and exploiting increasingly accurate, problem-specific heuristics. The benefits of AIT* and EIT* relative to other sampling-based algorithms are demonstrated on twelve problems in abstract, robotic, and biomedical domains optimizing path length and obstacle clearance. The experiments show that AIT* and EIT* outperform other algorithms on problems optimizing obstacle clearance, where a priori cost heuristics are often ineffective, and still perform well on problems minimizing path length, where such heuristics are often effective.
Successful coordination in Dec-POMDPs requires agents to adopt robust strategies and interpretable styles of play for their partner. A common failure mode is symmetry breaking, when agents arbitrarily converge on one out of many equivalent but mutually incompatible policies. Commonly these examples include partial observability, e.g. waving your right hand vs. left hand to convey a covert message. In this paper, we present a novel equivariant network architecture for use in Dec-POMDPs that prevents the agent from learning policies which break symmetries, doing so more effectively than prior methods. Our method also acts as a "coordination-improvement operator" for generic, pre-trained policies, and thus may be applied at test-time in conjunction with any self-play algorithm. We provide theoretical guarantees for our method and test it on the AI benchmark task of Hanabi, where we demonstrate that it outperforms other symmetry-aware baselines in zero-shot coordination and improves the coordination ability of a variety of pre-trained policies. In particular, we show our method can be used to improve on the state of the art for zero-shot coordination on the Hanabi benchmark.
Motion planning and control are crucial components of robotics applications. Here, spatio-temporal hard constraints like system dynamics and safety boundaries (e.g., obstacles in automated driving) restrict the robot's motions. Direct methods from optimal control solve a constrained optimization problem. However, in many applications finding a proper cost function is inherently difficult because of the weighting of partially conflicting objectives. On the other hand, Imitation Learning (IL) methods such as Behavior Cloning (BC) provide an intuitive framework for learning decision-making from offline demonstrations and constitute a promising avenue for planning and control in complex robot applications. Prior work primarily relied on soft-constraint approaches, which use additional auxiliary loss terms describing the constraints. However, catastrophic safety-critical failures might occur in out-of-distribution (OOD) scenarios. This work integrates the flexibility of IL with hard constraint handling in optimal control. Our approach constitutes a general framework for constrained robotic motion planning and control using offline IL. Hard constraints are integrated into the learning problem in a differentiable manner, via explicit completion and gradient-based correction. Simulated experiments of mobile robot navigation and automated driving provide evidence for the performance of the proposed method.
A good estimation of the actions' cost is key in task planning for human-robot collaboration. The duration of an action depends on agents' capabilities and the correlation between actions performed simultaneously by the human and the robot. This paper proposes an approach to learning actions' costs and coupling between actions executed concurrently by humans and robots. We leverage the information from past executions to learn the average duration of each action and a synergy coefficient representing the effect of an action performed by the human on the duration of the action performed by the robot (and vice versa). We implement the proposed method in a simulated scenario where both agents can access the same area simultaneously. Safety measures require the robot to slow down when the human is close, denoting a bad synergy of tasks operating in the same area. We show that our approach can learn such bad couplings so that a task planner can leverage this information to find better plans.
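One simple way to picture the two learned quantities is sketched below. It assumes a multiplicative model in which the synergy coefficient is the ratio between the mean duration of a robot action under a given concurrent human action and that action's overall mean duration. This model, the log format, and the action names are assumptions for illustration only, not the estimator used in the paper.

```python
from collections import defaultdict
from statistics import mean

def learn_costs(log):
    """Estimate average durations and synergy coefficients from past executions.

    log: iterable of (robot_action, human_action, duration_s) tuples (assumed format).
    Returns (avg, synergy), where synergy[(robot_action, human_action)] > 1 means
    the robot action tends to take longer while the human performs human_action.
    """
    by_action = defaultdict(list)
    by_pair = defaultdict(list)
    for robot_action, human_action, duration in log:
        by_action[robot_action].append(duration)
        by_pair[(robot_action, human_action)].append(duration)
    avg = {action: mean(d) for action, d in by_action.items()}
    synergy = {pair: mean(d) / avg[pair[0]] for pair, d in by_pair.items()}
    return avg, synergy

# Hypothetical log: the robot slows down when the human works in the same area.
log = [
    ("pick", "idle", 10.0),
    ("pick", "idle", 10.0),
    ("pick", "assemble_nearby", 20.0),
    ("pick", "assemble_nearby", 20.0),
]
avg, synergy = learn_costs(log)
```

A task planner could then cost a concurrent pair as `avg[robot_action] * synergy[(robot_action, human_action)]` and prefer schedules that avoid pairs with coefficients above one.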
Human awareness in robot motion planning is crucial for seamless interaction with humans. Many existing techniques slow down, stop, or change the robot's trajectory locally to avoid collisions with humans. Although using the information on the human's state in the path planning phase could reduce future interference with the human's movements and make safety stops less frequent, such an approach is less widespread. This paper proposes a novel approach to embedding a human model in the robot's path planner. The method explicitly addresses the problem of minimizing the path execution time, including slowdowns and stops due to the proximity of humans. For this purpose, it converts safety speed limits into configuration-space cost functions that drive the path's optimization. The costmap can be updated based on the observed or predicted state of the human. The method can handle deterministic and probabilistic representations of the human state and is independent of the prediction algorithm. Numerical and experimental results on an industrial collaborative cell demonstrate that the proposed approach consistently reduces the robot's execution time and avoids unnecessary safety speed reductions.
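The idea of turning a safety speed limit into a path cost can be sketched as follows. The example assumes a point robot in the plane (so configuration and position coincide), a linear speed ramp between two hypothetical distance thresholds, and a speed limit evaluated only at each segment's midpoint. None of these choices come from the paper; they only illustrate how expected execution time can serve as the cost.

```python
import math

def speed_scale(dist, d_stop=0.5, d_full=2.0, min_scale=0.05):
    """Fraction of nominal speed allowed at a given distance from the human.

    Thresholds are hypothetical. min_scale keeps the cost finite near the human.
    """
    if dist >= d_full:
        return 1.0
    if dist <= d_stop:
        return min_scale
    return max(min_scale, (dist - d_stop) / (d_full - d_stop))

def path_time(path, human, v_max=1.0):
    """Estimated execution time of a piecewise-linear path, including slowdowns."""
    total = 0.0
    for p, q in zip(path, path[1:]):
        mid = ((p[0] + q[0]) / 2.0, (p[1] + q[1]) / 2.0)
        length = math.dist(p, q)
        total += length / (v_max * speed_scale(math.dist(mid, human)))
    return total

human = (1.0, 0.0)
direct = [(0.0, 0.0), (2.0, 0.0)]              # shortest path, passes the human
detour = [(0.0, 0.0), (1.0, 3.0), (2.0, 0.0)]  # longer path, keeps its distance
t_direct = path_time(direct, human)
t_detour = path_time(detour, human)
```

With this cost, a planner that minimizes `path_time` prefers the longer detour whenever the slowdown near the human outweighs the extra length. Updating `human` from an observed or predicted state changes the cost without changing the planner.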
When robots enter everyday human environments, they need to understand their tasks and how they should perform those tasks. To encode these, reward functions, which specify the objective of a robot, are employed. However, designing reward functions can be extremely challenging for complex tasks and environments. A promising approach is to learn reward functions from humans. Recently, several robot learning works embrace this approach and leverage human demonstrations to learn the reward functions. Known as inverse reinforcement learning, this approach relies on a fundamental assumption: humans can provide near-optimal demonstrations to the robot. Unfortunately, this is rarely the case: human demonstrations to the robot are often suboptimal due to various reasons, e.g., difficulty of teleoperation, robot having high degrees of freedom, or humans' cognitive limitations. This thesis is an attempt towards learning reward functions from human users by using other, more reliable data modalities. Specifically, we study how reward functions can be learned using comparative feedback, in which the human user compares multiple robot trajectories instead of (or in addition to) providing demonstrations. To this end, we first propose various forms of comparative feedback, e.g., pairwise comparisons, best-of-many choices, rankings, scaled comparisons; and describe how a robot can use these various forms of human feedback to infer a reward function, which may be parametric or non-parametric. Next, we propose active learning techniques to enable the robot to ask for comparison feedback that optimizes for the expected information that will be gained from that user feedback. Finally, we demonstrate the applicability of our methods in a wide variety of domains, ranging from autonomous driving simulations to home robotics, from standard reinforcement learning benchmarks to lower-body exoskeletons.
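To make the pairwise-comparison setting concrete, the sketch below fits a linear reward over trajectory features by maximizing the likelihood of the observed preferences under a logistic (Bradley-Terry style) choice model. The linear reward, the choice model, the feature vectors, and the optimizer settings are assumptions chosen for illustration. The thesis covers several feedback types and both parametric and non-parametric rewards, which this example does not attempt to reproduce.

```python
import math

def fit_reward(comparisons, dim, lr=0.5, steps=500):
    """Fit weights w so that reward(traj) = w . features(traj) explains the preferences.

    comparisons: list of (preferred_features, other_features) pairs.
    Uses plain gradient ascent on the log-likelihood of a logistic choice model.
    """
    w = [0.0] * dim
    for _ in range(steps):
        grad = [0.0] * dim
        for preferred, other in comparisons:
            diff = [a - b for a, b in zip(preferred, other)]
            z = sum(wi * di for wi, di in zip(w, diff))
            p = 1.0 / (1.0 + math.exp(-z))  # modelled P(preferred beats other)
            for i in range(dim):
                grad[i] += (1.0 - p) * diff[i]
        w = [wi + lr * g / len(comparisons) for wi, g in zip(w, grad)]
    return w

def prefers(w, a, b):
    """True if the learned reward ranks trajectory a above trajectory b."""
    return sum(wi * (x - y) for wi, x, y in zip(w, a, b)) > 0.0

# Hypothetical features: (progress towards goal, closeness to obstacles).
comparisons = [
    ((1.0, 0.2), (0.3, 0.9)),
    ((0.8, 0.1), (0.5, 0.6)),
    ((0.9, 0.5), (0.2, 0.4)),
]
w = fit_reward(comparisons, dim=2)
```

In this toy data the user consistently favours more progress and, in most pairs, less closeness to obstacles, so the fitted weights come out positive for the first feature and negative for the second. An active learner would choose the next pair to query based on how much it is expected to reduce uncertainty over `w`.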
Designing autonomous aerial robot team systems remains a grand challenge in robotics. Existing works in this field can be categorized as centralized and decentralized. Centralized methods suffer from scale dilemmas, while decentralized ones often lead to poor planning quality. In this paper, we propose an enhanced decentralized autonomous aerial robot team system with group planning. According to the spatial distribution of agents, the system dynamically divides the team into several groups and isolated agents. For conflicts within each group, we propose a novel coordination mechanism named group planning. The group planning consists of efficient multi-agent pathfinding (MAPF) and trajectory joint optimization, which can significantly improve planning quality and success rate. We demonstrate through simulations and real-world experiments that our method is not only applicable to large-scale teams but also achieves top-level planning quality.
Collaborative robots (cobots) are machines designed to work safely alongside people in human-centric environments. Providing cobots with the ability to quickly infer the inertial parameters of manipulated objects will improve their flexibility and enable greater usage in manufacturing and other areas. To ensure safety, cobots are subject to kinematic limits that result in low signal-to-noise ratios (SNR) for velocity, acceleration, and force-torque data. This renders existing inertial parameter identification algorithms prohibitively slow and inaccurate. Motivated by the desire for faster model acquisition, we investigate the use of an approximation of rigid body dynamics to improve the SNR. Additionally, we introduce a mass discretization method that can make use of shape information to quickly identify plausible inertial parameters for a manipulated object. We present extensive simulation studies and real-world experiments demonstrating that our approach complements existing inertial parameter identification methods by specifically targeting the typical cobot operating regime.
A Command, Control, Communication, and Intelligence (C3I) system is a kind of system-of-systems that integrates computing machines, sensors, and communication networks. C3I systems are increasingly used in critical civil and military operations for achieving information superiority, assurance, and operational efficacy. Like traditional systems, C3I systems face widespread cyber-threats. However, the sensitive nature of the application domain (e.g., military operations) of C3I systems makes their security a critical concern. For instance, a cyber-attack on military installations can have detrimental impacts on national security. Therefore, in this paper, we review the state of the art on the security of C3I systems. In particular, this paper aims to identify the security vulnerabilities, attack vectors, and countermeasures for C3I systems. We used the well-known systematic literature review method to select and review 77 studies on the security of C3I systems. Our review enabled us to identify 27 vulnerabilities, 22 attack vectors, and 62 countermeasures for C3I systems. This review has also revealed several areas for future research and identified key lessons with regard to C3I systems' security.
This paper focuses on the expected difference in a borrower's repayment when there is a change in the lender's credit decisions. Classical estimators overlook the confounding effects and hence the estimation error can be large. As such, we propose another approach to constructing the estimators such that the error can be greatly reduced. The proposed estimators are shown to be unbiased, consistent, and robust through a combination of theoretical analysis and numerical testing. Moreover, we compare the power of estimating the causal quantities between the classical estimators and the proposed estimators. The comparison is tested across a wide range of models, including linear regression models, tree-based models, and neural network-based models, under different simulated datasets that exhibit different levels of causality, different degrees of nonlinearity, and different distributional properties. Most importantly, we apply our approaches to a large observational dataset provided by a global technology firm that operates in both the e-commerce and the lending business. We find that the relative reduction of estimation error is substantial if the causal effects are accounted for correctly.
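The toy example below shows why ignoring a confounder can inflate the estimated effect of a credit decision. It compares a naive difference of means with a standard stratification (backdoor) adjustment over a single binary confounder. The adjustment shown is a textbook technique used here for illustration, not the estimator proposed in the paper, and the records, field layout, and numbers are entirely hypothetical.

```python
from statistics import mean

def naive_effect(records):
    """Difference in mean repayment between treated and untreated borrowers."""
    treated = [y for _, t, y in records if t == 1]
    control = [y for _, t, y in records if t == 0]
    return mean(treated) - mean(control)

def stratified_effect(records):
    """Average the within-stratum effects, weighted by stratum size."""
    total = len(records)
    effect = 0.0
    for z in sorted({rec[0] for rec in records}):
        stratum = [rec for rec in records if rec[0] == z]
        treated = [y for _, t, y in stratum if t == 1]
        control = [y for _, t, y in stratum if t == 0]
        effect += (len(stratum) / total) * (mean(treated) - mean(control))
    return effect

# Each record is (credit_quality, larger_credit_line, repayment), all hypothetical.
# Good-quality borrowers are both more likely to be treated and repay more,
# while the true effect of the treatment is +1 in every stratum.
records = (
    [(0, 0, 10.0)] * 3 + [(0, 1, 11.0)] * 1 +
    [(1, 0, 20.0)] * 1 + [(1, 1, 21.0)] * 3
)
naive = naive_effect(records)
adjusted = stratified_effect(records)
```

Here the naive estimate mixes the effect of the credit decision with the effect of credit quality, whereas the stratified estimate recovers the within-stratum effect built into the data.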