Bio-inspired sensorimotor control systems may appeal to roboticists tackling the problems of multi-DOF humanoids and human-robot interaction. This paper presents a simple posture control concept from neuroscience called disturbance estimation and compensation, the DEC concept [1]. It provides human-like mechanical compliance due to low loop gain, tolerance of time delays, and automatic adjustment to changes in external disturbance scenarios. Its outstanding feature is that it uses feedback of multisensory disturbance estimates rather than 'raw' sensory signals for disturbance compensation. After proof-of-principle tests in 1- and 2-DOF posture control robots, we present here a generalized DEC control module for multi-DOF robots. In the control layout, one DEC module controls one DOF (modular control architecture). Modules of neighboring joints are synergistically interconnected using vestibular information in combination with joint angle and torque signals. These sensory interconnections allow each module to control the kinematics of the more distal links as if they were a single link. This modular design makes the complexity of the robot control scale linearly with the number of DOFs and gives high error robustness compared to monolithic control architectures. The presented concept uses Matlab/Simulink (The MathWorks, Natick, USA) for both model simulation and robot control, and will be made available as an open library.
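The core DEC idea, feeding back a fused disturbance estimate rather than raw sensor signals, can be illustrated on a single-DOF inverted pendulum. The following is a minimal sketch under assumed dynamics and gains (all names and values are illustrative, not the authors' Simulink implementation; the true angle stands in for the fused vestibular/joint estimate):

import numpy as np

# Single-DOF inverted-pendulum sketch of DEC-style control (illustrative only).
# A low-gain PD servo is combined with feedback of an estimated gravity
# disturbance torque, reconstructed from a vestibular-like tilt estimate.
J, m, g, h = 1.0, 1.0, 9.81, 1.0     # assumed inertia, mass, gravity, COM height
Kp, Kd = 2.0, 1.0                    # deliberately low servo gains
dt, a, a_dot = 0.001, 0.1, 0.0       # time step, initial tilt (rad), velocity

for _ in range(10000):               # simulate 10 s
    tilt_est = a                     # stands in for the multisensory estimate
    grav_est = m * g * h * np.sin(tilt_est)    # disturbance estimate
    torque = -Kp * a - Kd * a_dot - grav_est   # low-gain servo + compensation
    a_ddot = (m * g * h * np.sin(a) + torque) / J
    a_dot += a_ddot * dt
    a += a_dot * dt

print(f"final tilt: {a:.4f} rad")    # decays toward upright despite low gains

Because the estimated gravity torque is compensated directly, the servo gains can stay low, which is what yields the human-like compliance the abstract describes.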
The control of pneumatically driven soft robots typically requires electronics: microcontrollers are connected to power electronics that switch valves and pumps on and off. As a recent alternative, fluidic control methods have been introduced, in which soft digital logic gates permit multiple actuation states to be achieved in soft systems. Such systems have demonstrated autonomous behaviors without the use of electronics. However, fluidic controllers have required complex fabrication processes. To democratize the exploration of fluidic controllers, we developed tube-balloon logic circuitry, which consists of logic gates made from straws and balloons. Each tube-balloon logic device takes a novice five minutes to fabricate and costs $0.45. Tube-balloon logic devices can operate at pressures of up to 200 kPa and oscillate at frequencies of up to 15 Hz. We configure the tube-balloon logic device as NOT-, NAND-, and NOR-gates and assemble them into a three-ring oscillator to demonstrate a vibrating sieve that separates sugar from rice. Because tube-balloon logic devices are low-cost, easy to fabricate, and simple in their operating principle, they are well suited for exploring fundamental concepts of fluidic control schemes while encouraging design inquiry for pneumatically driven soft robots.
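Although the devices themselves are pneumatic, their behavior is Boolean, so the gate configurations and the three-ring oscillator can be sanity-checked in a few lines. A sketch of the idealized logic (pressure dynamics and switching delays are abstracted into one synchronous update step):

# Idealized Boolean model of the tube-balloon gates (illustrative only).
NOT  = lambda a: not a
NAND = lambda a, b: not (a and b)
NOR  = lambda a, b: not (a or b)

# A ring of three inverters has no stable state, so it oscillates;
# this is the principle behind the vibrating-sieve demonstration.
state = [False, True, True]
for step in range(9):
    # gate 0 reads gate 2, gate 1 reads gate 0, gate 2 reads gate 1
    state = [NOT(state[2]), NOT(state[0]), NOT(state[1])]
    print(step, [int(s) for s in state])   # cycles with period 6

In the physical circuit, the oscillation frequency is set by how fast the balloons inflate and deflate, which is why the devices top out around 15 Hz.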
Multi-agent path finding (MAPF) has been widely used to solve large-scale real-world problems, e.g., warehouse automation. Learning-based fully decentralized frameworks have been introduced to alleviate real-time constraints while pursuing an optimal planning policy. However, existing methods may generate significantly more vertex conflicts (called collisions), which lead to a low success rate or longer makespan. In this paper, we propose a PrIoritized COmmunication learning method (PICO), which incorporates implicit planning priorities into the communication topology within a decentralized multi-agent reinforcement learning framework. Combined with classic coupled planners, the implicit priority learning module can be used to form a dynamic communication topology, which also builds an effective collision-avoidance mechanism. PICO performs significantly better than state-of-the-art learning-based planners in large-scale multi-agent path finding tasks, in terms of both success rate and collision rate.
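A minimal sketch of the priority-to-topology step (the rule and thresholds are illustrative, not PICO's exact mechanism): given learned priority scores, each agent attends only to nearby agents with higher priority, yielding a dynamic directed communication graph.

import numpy as np

def communication_topology(positions, priorities, comm_range=3.0):
    """Directed adjacency: agent i receives from agent j iff j is within
    range and has a higher learned priority (illustrative rule only)."""
    n = len(priorities)
    adj = np.zeros((n, n), dtype=bool)
    for i in range(n):
        for j in range(n):
            if i == j:
                continue
            close = np.linalg.norm(positions[i] - positions[j]) <= comm_range
            adj[i, j] = close and priorities[j] > priorities[i]
    return adj  # adj[i, j]: i attends to j's message

positions = np.array([[0.0, 0.0], [1.0, 0.0], [5.0, 5.0]])
priorities = np.array([0.2, 0.9, 0.5])
print(communication_topology(positions, priorities))

Since priorities change as the agents move and replan, the topology is rebuilt each step, which is what makes it dynamic.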
Given a Markov decision process (MDP) and a linear-time ($\omega$-regular or LTL) specification, the controller synthesis problem aims to compute the optimal policy that satisfies the specification. More recently, problems that reason over the asymptotic behavior of systems have been proposed through the lens of steady-state planning. This entails finding a control policy for an MDP such that the Markov chain induced by the solution policy satisfies a given set of constraints on its steady-state distribution. This paper studies a generalization of the controller synthesis problem for a linear-time specification under steady-state constraints on the asymptotic behavior. We present an algorithm to find a deterministic policy satisfying $\omega$-regular and steady-state constraints by characterizing the solutions as an integer linear program, and experimentally evaluate our approach.
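For intuition, one standard way to encode such problems (sketched here in a simplified unichain form; the notation is ours, and the paper's full encoding over the product with the specification automaton is more involved) uses long-run state-action frequencies $x(s,a)$ as decision variables:
\begin{align*}
& x(s,a) \ge 0, \qquad \sum_{s,a} x(s,a) = 1, \\
& \sum_{a} x(s,a) = \sum_{s',a'} P(s \mid s',a')\, x(s',a') \quad \text{(stationarity / flow balance)}, \\
& \ell_s \le \sum_{a} x(s,a) \le u_s \quad \text{(steady-state constraints per state)}, \\
& x(s,a) \le \delta(s,a), \qquad \sum_{a} \delta(s,a) = 1, \qquad \delta(s,a) \in \{0,1\},
\end{align*}
where the binary indicators $\delta(s,a)$ force determinism, turning the linear program into an integer linear program; the synthesized policy plays the unique $a$ with $\delta(s,a) = 1$ in each state visited in the long run.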
Consider the problem of covertly controlling a linear system. In this problem, Alice desires to control (stabilize or change the parameters of) a linear system, while keeping an observer, Willie, unable to decide whether the system is being controlled or not. We formally define the problem under two different models: (i) Willie can only observe the system's output; (ii) Willie can directly observe the control signal. Focusing on AR(1) systems, we show that when Willie observes the system's output through a clean channel, an inherently unstable linear system cannot be covertly stabilized. However, an inherently stable linear system can be covertly controlled, in the sense of covertly changing its parameter. Moreover, we give direct and converse results for two important controllers: a minimal-information controller, where Alice is allowed to use only $1$ bit per sample, and a maximal-information controller, where Alice is allowed to view the real-valued output. Unlike covert communication, where the trade-off is between rate and covertness, the results reveal an interesting \emph{three-fold} trade-off in covert control: the amount of information used by the controller, control performance, and covertness. To the best of our knowledge, this is the first study to formally define covert control.
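To fix ideas, a minimal formalization consistent with the abstract (the notation is ours): the plant is the scalar autoregressive process
\[ x_{t+1} = a\,x_t + u_t + w_t, \qquad w_t \sim \mathcal{N}(0, \sigma^2), \]
where Alice picks the control $u_t$ and Willie runs a hypothesis test between $H_0$: $u_t \equiv 0$ and $H_1$: the system is controlled, based on either the output $x_1,\dots,x_n$ (model (i)) or the control $u_1,\dots,u_n$ (model (ii)). Covertness requires Willie's test to remain unreliable, e.g. by keeping the divergence between the observation distributions under $H_0$ and $H_1$ small as $n$ grows. Under model (i) with a clean channel, stabilizing an unstable system ($|a| > 1$) changes the output's growth behavior so markedly that the two hypotheses become distinguishable, which is the intuition behind the impossibility result above.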
While many works exploiting an existing Lie group structure have been proposed for state estimation, in particular the Invariant Extended Kalman Filter (IEKF), few papers address the construction of a group structure that allows casting a given system into the framework of invariant filtering. In this paper we introduce a large class of systems encompassing most problems involving a navigating vehicle encountered in practice. For those systems we introduce a novel methodology that systematically provides a group structure for the state space, including vectors of the body frame such as biases. We use it to derive observers having properties akin to those of linear observers or filters. The proposed unifying and versatile framework encompasses all systems where the IEKF has proved successful, improves upon the state-of-the-art "imperfect" IEKF for inertial navigation with sensor biases, and allows addressing novel examples, such as GNSS antenna lever arm estimation.
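The structural condition that makes this work in the invariant-filtering literature is group affinity of the dynamics: on a matrix Lie group $G$, dynamics $\frac{d}{dt}\chi = f_u(\chi)$ are group-affine if for all $\chi_1, \chi_2 \in G$
\[ f_u(\chi_1\chi_2) = f_u(\chi_1)\,\chi_2 + \chi_1\,f_u(\chi_2) - \chi_1\,f_u(\mathrm{Id})\,\chi_2 . \]
In that case the invariant error $\eta = \chi_1^{-1}\chi_2$ evolves autonomously, independently of the estimated trajectory, which is what gives invariant observers their linear-observer-like convergence properties. The methodology described in the abstract can thus be read as constructing, for each system in the class, a group structure under which the dynamics, now including body-frame vectors such as biases in the state, satisfy this property.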
We show that Gottesman's semantics (GROUP22, 1998) for Clifford circuits based on the Heisenberg representation can be treated as a type system that can efficiently characterize a common subset of quantum programs. Our applications include (i) certifying whether auxiliary qubits can be safely disposed of, (ii) determining if a system is separable across a given bi-partition, (iii) checking the transversality of a gate with respect to a given stabilizer code, and (iv) typing post-measurement states for computational basis measurements. Further, this type system is extended to accommodate universal quantum computing by deriving types for the $T$-gate, multiply-controlled unitaries such as the Toffoli gate, and some gate injection circuits that use associated magic states. These types allow us to prove a lower bound on the number of $T$ gates necessary to perform a multiply-controlled $Z$ gate.
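The Heisenberg-style propagation underlying such a type system can be sketched in a few lines: Clifford gates act on Pauli "types" by conjugation, $P \mapsto U P U^\dagger$. A toy single-qubit/CNOT fragment (illustrative, not the paper's type system; only signs $\pm 1$ arise as phases here):

# Conjugation action of a few Clifford gates on signed Pauli types,
# represented as (sign, pauli) pairs implementing P -> U P U^dagger.
H_RULES = {'X': (1, 'Z'), 'Y': (-1, 'Y'), 'Z': (1, 'X'), 'I': (1, 'I')}
S_RULES = {'X': (1, 'Y'), 'Y': (-1, 'X'), 'Z': (1, 'Z'), 'I': (1, 'I')}

def apply_1q(rules, sign, p):
    s, q = rules[p]
    return sign * s, q

# CNOT acts on two-qubit Pauli types; action on a few generators/products:
CNOT_RULES = {'XI': 'XX', 'IX': 'IX', 'ZI': 'ZI', 'IZ': 'ZZ',
              'II': 'II', 'XX': 'XI', 'ZZ': 'IZ'}

# Example: a qubit of type Z (a |0> stabilizer state) pushed through H
print(apply_1q(H_RULES, +1, 'Z'))   # -> (1, 'X'), the |+> stabilizer type

Because Clifford conjugation maps Paulis to Paulis, such a "type check" runs efficiently; the $T$-gate breaks this closure, which is why the extension to universal circuits requires new types.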
Promoting behavioural diversity is critical for solving games with non-transitive dynamics, where strategic cycles exist and there is no consistent winner (e.g., Rock-Paper-Scissors). Yet, there is a lack of rigorous treatment for defining diversity and constructing diversity-aware learning dynamics. In this work, we offer a geometric interpretation of behavioural diversity in games and introduce a novel diversity metric based on \emph{determinantal point processes} (DPP). By incorporating the diversity metric into best-response dynamics, we develop \emph{diverse fictitious play} and \emph{diverse policy-space response oracle} for solving normal-form games and open-ended games. We prove the uniqueness of the diverse best response and the convergence of our algorithms on two-player games. Importantly, we show that maximising the DPP-based diversity metric is guaranteed to enlarge the \emph{gamescape} -- the convex polytopes spanned by agents' mixtures of strategies. To validate our diversity-aware solvers, we test them on tens of games that show strong non-transitivity. Results suggest that our methods achieve much lower exploitability than state-of-the-art solvers by finding effective and diverse strategies.
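The DPP-based metric can be made concrete: represent each strategy by its payoff vector against a fixed population, form a similarity kernel, and score a strategy set by the determinant of its kernel submatrix, which grows with the volume spanned by the payoff vectors. A sketch with made-up payoffs (illustrative, not the paper's experiments):

import numpy as np

# Rows: each strategy's payoff vector against a fixed set of opponents.
M = np.array([[1.0, 0.0, 0.5],
              [0.9, 0.1, 0.5],    # near-duplicate of strategy 0
              [0.0, 1.0, 0.5]])   # genuinely different strategy

L = M @ M.T  # Gram (similarity) kernel over strategies

def dpp_diversity(idx):
    """det of the kernel submatrix: squared volume spanned by the payoffs."""
    sub = L[np.ix_(idx, idx)]
    return np.linalg.det(sub)

print(dpp_diversity([0, 1]))  # ~0.015: redundant pair scores low
print(dpp_diversity([0, 2]))  # ~1.5: diverse pair scores high

A redundant strategy adds a nearly linearly dependent row, so the determinant barely grows; this is exactly the geometric sense in which maximising the metric enlarges the gamescape.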
Autonomous urban driving navigation with complex multi-agent dynamics is under-explored due to the difficulty of learning an optimal driving policy. Traditional modular pipelines rely heavily on hand-designed rules and a pre-processing perception system, while supervised learning-based models are limited by the accessibility of extensive human experience. We present a general and principled Controllable Imitative Reinforcement Learning (CIRL) approach that enables the driving agent to achieve higher success rates based on only vision inputs in a high-fidelity car simulator. To alleviate the low exploration efficiency of a large continuous action space, which often prohibits the use of classical RL on challenging real tasks, CIRL explores over a reasonably constrained action space guided by encoded experiences that imitate human demonstrations, building upon the Deep Deterministic Policy Gradient (DDPG). Moreover, we propose specialized adaptive policies and steering-angle reward designs for different control signals (i.e. follow, straight, turn right, turn left) based on shared representations, improving the model's capability in tackling diverse cases. Extensive experiments on the CARLA driving benchmark demonstrate that CIRL substantially outperforms all previous methods in terms of the percentage of successfully completed episodes on a variety of goal-directed driving tasks. We also show its superior generalization capability in unseen environments. To our knowledge, this is the first learned driving policy trained via reinforcement learning in a high-fidelity simulator that performs better than supervised imitation learning.
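The command-conditioned specialization can be sketched as a shared encoder with one policy head per high-level command, the command acting as a switch rather than an extra input feature (an illustrative PyTorch sketch; layer sizes and names are assumptions, not the authors' code):

import torch
import torch.nn as nn

class BranchedPolicy(nn.Module):
    """Shared perception trunk + one actor head per command
    (0: follow, 1: straight, 2: turn right, 3: turn left)."""
    def __init__(self, feat_dim=256, act_dim=2, n_commands=4):
        super().__init__()
        self.trunk = nn.Sequential(nn.Linear(512, feat_dim), nn.ReLU())
        self.heads = nn.ModuleList(
            [nn.Linear(feat_dim, act_dim) for _ in range(n_commands)])

    def forward(self, vision_feat, command):
        h = self.trunk(vision_feat)
        # evaluate all heads, then pick each sample's head by its command id
        out = torch.stack([head(h) for head in self.heads], dim=1)
        return out[torch.arange(h.size(0)), command]

policy = BranchedPolicy()
actions = policy(torch.randn(8, 512), torch.randint(0, 4, (8,)))
print(actions.shape)  # torch.Size([8, 2]), e.g. steering and throttle

Branching avoids forcing one head to average over contradictory behaviors (e.g. turning left vs. right at the same intersection), which is the failure mode of feeding the command in as a plain feature.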
Deep hierarchical reinforcement learning has gained a lot of attention in recent years due to its ability to produce state-of-the-art results in challenging environments where non-hierarchical frameworks fail to learn useful policies. However, as problem domains become more complex, deep hierarchical reinforcement learning can become inefficient, leading to longer convergence times and poor performance. We introduce the Deep Nested Agent framework, a variant of deep hierarchical reinforcement learning in which information from the main agent is propagated to the low-level $nested$ agent by incorporating this information into the nested agent's state. We demonstrate the effectiveness and performance of the Deep Nested Agent framework by applying it to three scenarios in Minecraft, comparing against a deep non-hierarchical single-agent framework as well as a deep hierarchical framework.
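The propagation mechanism is simple to state: the nested agent's observation is the environment observation concatenated with a signal from the main agent, such as its chosen sub-task. A minimal sketch with hypothetical names and encoding (the paper's exact encoding may differ):

import numpy as np

def nested_observation(env_obs, main_agent_signal, n_subtasks=4):
    """Augment the nested agent's state with the main agent's choice,
    here one-hot encoded (illustrative, not the paper's exact scheme)."""
    one_hot = np.zeros(n_subtasks)
    one_hot[main_agent_signal] = 1.0
    return np.concatenate([env_obs, one_hot])

print(nested_observation(np.array([0.3, -1.2, 0.7]), main_agent_signal=2))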
This paper introduces a novel neural network-based reinforcement learning approach for robot gaze control. Our approach enables a robot to learn and adapt its gaze control strategy for human-robot interaction without the use of external sensors or human supervision. The robot learns to focus its attention on groups of people from its own audio-visual experiences, independently of the number of people, their positions, and their physical appearances. In particular, we use a recurrent neural network architecture in combination with Q-learning to find an optimal action-selection policy; we pre-train the network using a simulated environment that mimics realistic scenarios involving speaking and silent participants, thus avoiding the need for tedious sessions of a robot interacting with people. Our experimental evaluation suggests that the proposed method is robust to parameter estimation, i.e., the parameter values yielded by the method do not have a decisive impact on performance. The best results are obtained when audio and visual information are used jointly. Experiments with the Nao robot indicate that our framework is a step towards the autonomous learning of socially acceptable gaze behavior.
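The described architecture, a recurrent network over audio-visual features trained with Q-learning on discrete gaze actions, can be sketched as a small deep recurrent Q-network (an illustrative PyTorch sketch; feature sizes and the action set are assumptions, not the authors' configuration):

import torch
import torch.nn as nn

class GazeDRQN(nn.Module):
    """LSTM over fused audio-visual features -> Q-values for discrete
    gaze actions (e.g. pan left/right, tilt up/down, stay)."""
    def __init__(self, av_feat_dim=64, hidden=128, n_actions=5):
        super().__init__()
        self.lstm = nn.LSTM(av_feat_dim, hidden, batch_first=True)
        self.q_head = nn.Linear(hidden, n_actions)

    def forward(self, av_seq, state=None):
        out, state = self.lstm(av_seq, state)
        return self.q_head(out[:, -1]), state  # Q-values at the last step

net = GazeDRQN()
q, _ = net(torch.randn(1, 10, 64))   # one 10-step audio-visual sequence
action = q.argmax(dim=-1)            # greedy gaze action
print(q.shape, action.item())

The recurrence lets the policy integrate evidence over time (who spoke recently, where people were last seen), which a feed-forward Q-network cannot do from a single frame.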