Soft robotics is an emerging and rapidly evolving field. Pneumatic actuators are well suited to driving soft robots owing to their high power-to-weight ratio and inherent compliance. However, they are difficult to control because of their hysteresis. To address this challenge, we propose an adaptive control method that compensates for the hysteresis of a soft actuator. Using a novel dual pneumatic artificial muscle (PAM) bending actuator, the proposed control strategy mitigates hysteresis effects by dynamically modulating the gains of a conventional PID controller according to the predicted motion of the reference trajectory. In a comparative experimental evaluation, the new control method outperforms its conventional counterparts in tracking accuracy and response speed. Our work points to a new direction for advancing control in soft actuators.
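The abstract does not give the gain-scheduling law itself, so the following is a minimal sketch of the general idea under stated assumptions: the PID gains are scaled with the magnitude of the predicted reference velocity, and the plant, gain values, and `schedule_gains` rule are all illustrative rather than taken from the paper.

```python
import math

# Toy illustration: PID gains stiffen with the magnitude of the predicted
# reference velocity. The scaling rule, gains, and plant are made up.

def schedule_gains(ref_velocity, base=(2.0, 0.5, 0.05), alpha=0.8):
    """Scale the base PID gains with the predicted motion (illustrative)."""
    kp, ki, kd = base
    s = 1.0 + alpha * abs(ref_velocity)  # faster reference -> stiffer gains
    return kp * s, ki, kd * s

def pid_step(error, integral, prev_error, gains, dt):
    kp, ki, kd = gains
    integral += error * dt
    derivative = (error - prev_error) / dt
    return kp * error + ki * integral + kd * derivative, integral

# Track a sinusoidal reference on a toy first-order plant.
dt, integral, prev_e, y = 0.01, 0.0, 0.0, 0.0
for k in range(1000):
    t = k * dt
    ref, ref_vel = math.sin(t), math.cos(t)  # reference and its derivative
    e = ref - y
    u, integral = pid_step(e, integral, prev_e, schedule_gains(ref_vel), dt)
    prev_e = e
    y += 0.5 * u * dt  # toy plant: integrator with gain 0.5
```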
Humanoid robots are well suited to human habitats because of their morphological similarity to humans, but developing controllers for them is challenging and involves multiple sub-problems, such as control, planning, and perception. In this paper, we introduce a method that simplifies controller design by enabling users to train and fine-tune robot control policies with natural language commands. We first learn a neural network policy that generates behaviors given a natural language command, such as "walk forward", by combining Large Language Models (LLMs), motion retargeting, and motion imitation. Based on the synthesized motion, we iteratively fine-tune the policy by updating the text prompt and querying the LLM to find the best checkpoint associated with the closest motion in the training history. We validate our approach on a simulated Digit humanoid robot and demonstrate learning of diverse motions, such as walking, hopping, and kicking, without the burden of complex reward engineering. In addition, we show that our iterative refinement enables learning three times faster than a naive formulation that learns from scratch.
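As a rough illustration of the iterative refinement loop, here is a schematic sketch in which every helper (`query_llm`, `retarget_and_imitate`, `motion_score`) is a trivial stand-in for the paper's components (LLM querying, motion retargeting, RL-based motion imitation), not a real API:

```python
import random

# Schematic of the iterative refinement loop; all helpers below are trivial
# stand-ins for the paper's components, not real APIs.

def query_llm(prompt):
    return prompt                         # stand-in: would return motion text

def retarget_and_imitate(spec):
    return {"motion": spec, "score": random.random()}  # stand-in checkpoint

def motion_score(ckpt, command):
    return ckpt["score"]                  # stand-in motion-to-command match

def refine(command, rounds=5):
    prompt, history = command, []
    for _ in range(rounds):
        ckpt = retarget_and_imitate(query_llm(prompt))
        history.append(ckpt)
        # Update the text prompt, then continue from the revised command.
        prompt = f"improve on '{prompt}' toward: {command}"
    # Pick the checkpoint whose motion best matches the command so far.
    return max(history, key=lambda c: motion_score(c, command))

best = refine("walk forward")
```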
Optimal control for legged robots has undergone a paradigm shift from position-based to torque-based control, owing to the latter's compliance and robustness. In parallel with this shift, the community has turned to Deep Reinforcement Learning (DRL) as a promising approach to directly learn locomotion policies for complex real-life tasks. However, most end-to-end DRL approaches still operate in position space, mainly because learning in torque space is often sample-inefficient and does not consistently converge to natural gaits. To address these challenges, we introduce Decaying Action Priors (DecAP), a novel three-stage framework for learning and deploying torque policies for legged locomotion. In the first stage, we generate our own imitation data by training a position policy, eliminating the need for expert knowledge in designing optimal controllers. The second stage incorporates decaying action priors to enhance the exploration of torque-based policies, aided by imitation rewards. We show that our approach consistently outperforms imitation learning alone and remains robust to the scaling of these rewards. Finally, our third stage facilitates safe sim-to-real transfer by directly deploying the learned torques alongside low-gain PID control from the trained position policy. We demonstrate the generality of our approach by training torque-based locomotion policies for a biped, a quadruped, and a hexapod robot in simulation, and we experimentally validate our learned policies on a quadruped (Unitree Go1).
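A minimal, single-joint sketch of the decaying-action-prior idea, assuming the prior is a PD torque that tracks the pretrained position policy's target and that its weight decays linearly over training; the gains, decay schedule, and toy plant below are illustrative, not the paper's values:

```python
import numpy as np

# Toy single-joint illustration: the applied torque blends the torque
# policy's output with a PD prior tracking the position policy's target,
# and the prior's weight decays to zero over training.

def pd_prior(q, qd, q_target, kp=40.0, kd=1.0):
    """PD torque tracking the position policy's joint target."""
    return kp * (q_target - q) - kd * qd

def decayed_torque(policy_torque, prior_torque, step, decay_steps=2000):
    beta = max(0.0, 1.0 - step / decay_steps)  # linear decay of the prior
    return policy_torque + beta * prior_torque

q, qd = 0.0, 0.0
for step in range(3000):
    q_target = 0.5                       # stand-in for the position policy
    tau_pi = np.random.randn() * 0.1     # stand-in for the torque policy
    tau = decayed_torque(tau_pi, pd_prior(q, qd, q_target), step)
    qd += tau * 0.01                     # toy unit-inertia integration
    q += qd * 0.01
```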
Coalition formation is an important means by which multi-robot systems collaborate on common tasks. An adaptive coalition strategy is essential for online performance in dynamic and unknown environments. In this work, we consider the problem of territory defense by large-scale heterogeneous robotic teams. The tasks include exploration, capture of dynamic targets, and perimeter defense of valuable resources. Since each robot can choose among many tasks, jointly coordinating the robots so that the overall utility is maximized remains challenging. This work proposes a generic coalition strategy called the K-serial stable coalition algorithm. Unlike centralized approaches, it is distributed and complete, meaning that only local communication is required and a K-serial stable solution is guaranteed. Furthermore, to accelerate adaptation to dynamic targets and resource distributions that are only perceived online, a heuristic based on a heterogeneous graph attention network is learned to select appropriate parameters and promising initial solutions during local optimization. Compared with manual heuristics or end-to-end predictors, it is shown to improve online adaptability while retaining the quality guarantee. The proposed methods are validated through large-scale simulations with 170 robots and hardware experiments with 13 robots, against several strong baselines such as GreedyNE and FastMaxSum.
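To make the stability notion concrete, here is a toy sketch of the special case K = 1, where robots repeatedly best-respond until no single robot can improve the utility; the task values and diminishing-returns utility are invented for illustration, and the sketch omits the distributed communication and the learned heuristic:

```python
import random

# Toy sketch of K = 1 (Nash-style) stability: robots best-respond until no
# single robot can improve the utility. Tasks and utility are invented.

def utility(assign, tasks):
    # Each task contributes its value with diminishing returns per robot.
    return sum(v * (1 - 0.5 ** assign.count(t)) for t, v in tasks.items())

def best_response(assign, tasks, i):
    best_t, best_u = assign[i], utility(assign, tasks)
    for t in tasks:
        u = utility(assign[:i] + [t] + assign[i + 1:], tasks)
        if u > best_u:                    # switch only on strict improvement
            best_t, best_u = t, u
    return best_t

tasks = {"explore": 3.0, "capture": 5.0, "defend": 4.0}
assign = [random.choice(list(tasks)) for _ in range(6)]
changed = True
while changed:                            # terminates: utility rises strictly
    changed = False
    for i in range(len(assign)):
        t = best_response(assign, tasks, i)
        if t != assign[i]:
            assign[i], changed = t, True
```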
Expressive human speech abounds with rich and flexible prosodic variation. The prosody predictors in existing expressive speech synthesis methods mostly produce deterministic predictions, learned by directly minimizing the norm of the prosody prediction error. The unimodal nature of such predictions leads to a mismatch with the ground-truth distribution and harms the model's ability to make diverse predictions. We therefore propose a novel prosody predictor based on the denoising diffusion probabilistic model (DDPM), taking advantage of its high-quality generative modeling and training stability. Experimental results confirm that the proposed prosody predictor outperforms the deterministic baseline in both the expressiveness and the diversity of its predictions, with even fewer network parameters.
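A minimal sketch of how a DDPM objective can be applied to prosody prediction, assuming the predictor regresses the injected noise on a small prosody vector (e.g., pitch, duration, energy) conditioned on text encoder states; the network, feature sizes, and noise schedule below are placeholders, not the paper's architecture:

```python
import torch
import torch.nn as nn

# Illustrative DDPM training objective for a prosody predictor: the network
# learns to predict the noise added to a prosody target, conditioned on
# text hidden states. All sizes and the schedule are placeholders.

T = 100
betas = torch.linspace(1e-4, 0.02, T)
alphas_cum = torch.cumprod(1.0 - betas, dim=0)

eps_net = nn.Sequential(nn.Linear(3 + 256 + 1, 128), nn.ReLU(), nn.Linear(128, 3))

def diffusion_loss(prosody, text_hidden):
    """prosody: (B, 3) targets; text_hidden: (B, 256) conditioning."""
    t = torch.randint(0, T, (prosody.size(0),))
    a = alphas_cum[t].unsqueeze(-1)
    noise = torch.randn_like(prosody)
    noisy = a.sqrt() * prosody + (1 - a).sqrt() * noise     # forward process
    inp = torch.cat([noisy, text_hidden, t.float().unsqueeze(-1) / T], dim=-1)
    return nn.functional.mse_loss(eps_net(inp), noise)      # predict the noise

loss = diffusion_loss(torch.randn(8, 3), torch.randn(8, 256))
loss.backward()
```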
Robotic grasping refers to making a robotic system pick up an object by applying forces and torques to its surface. Many recent studies address grasping with data-driven approaches, but the sparse-reward nature of the task makes the learning process hard to bootstrap. To avoid constraining the operational space, a growing number of works propose grasping datasets to learn from, but most are limited to simulation. The present paper investigates how automatically generated grasps can be exploited in the real world. More than 7,000 reach-and-grasp trajectories were generated with Quality-Diversity (QD) methods on three different arms and grippers, including parallel fingers and a dexterous hand, and tested in the real world. Analysis of the collected measures shows correlations between several Domain Randomization-based quality criteria and sim-to-real transferability. Key challenges regarding the reality gap for grasping are identified, highlighting the issues on which grasping researchers should focus in the future. Finally, a QD approach is proposed for making grasps more robust to domain randomization, resulting in a transfer ratio of 84% on the Franka Research 3 arm.
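As a sketch of what a Domain Randomization-based quality criterion can look like, the snippet below scores a grasp by its success rate under randomized simulation parameters; `simulate_grasp` and the randomization ranges are placeholders, since the paper's actual criteria are not specified in the abstract:

```python
import random

# Illustrative DR-based quality criterion: score a grasp by its success
# rate under randomized simulation parameters. simulate_grasp stands in
# for a real physics rollout, not an actual API.

def simulate_grasp(grasp, friction, pose_noise):
    # Placeholder "physics": higher friction and lower pose noise help.
    return random.random() < friction - pose_noise

def dr_quality(grasp, n_trials=50):
    successes = 0
    for _ in range(n_trials):
        friction = random.uniform(0.4, 1.0)    # randomized contact dynamics
        pose_noise = random.uniform(0.0, 0.3)  # randomized object pose error
        successes += simulate_grasp(grasp, friction, pose_noise)
    return successes / n_trials                # robustness score in [0, 1]

score = dr_quality(grasp="candidate_grasp_0")  # any identifier works here
```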
Assistive devices such as exoskeletons and prostheses have revolutionized rehabilitation and mobility assistance. Efficiently detecting transitions between activities, such as walking, stair ascent and descent, and sitting, is crucial for adaptive control and a good user experience. We present an approach for real-time transition detection aimed at optimizing processing-time performance. By establishing activity-specific threshold values from trained machine learning models, we distinguish motion patterns and identify transitions between locomotion modes. This threshold-based method improves real-time embedded processing-time performance by up to a factor of 11 compared with machine learning approaches. The efficacy of the resulting finite-state machine is validated on data collected from three different measurement systems. Moreover, experiments with healthy participants were conducted on an active pelvis orthosis to validate the robustness and reliability of our approach. The proposed algorithm detects transitions between activities with high accuracy. These promising results demonstrate the robustness and reliability of the method and reinforce its potential for integration into practical applications.
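A minimal sketch of such a threshold-based finite-state machine, with invented states, features, and threshold values standing in for the activity-specific thresholds learned from the machine learning models:

```python
import operator

# Invented states, features, and thresholds standing in for the
# activity-specific values learned from the machine learning models.
THRESHOLDS = {
    # (current state, next state): (feature, comparison, threshold)
    ("walk", "stair_ascent"): ("thigh_angle", operator.gt, 35.0),
    ("stair_ascent", "walk"): ("thigh_angle", operator.lt, 25.0),
    ("walk", "sit"): ("trunk_pitch", operator.gt, 40.0),
    ("sit", "walk"): ("trunk_pitch", operator.lt, 15.0),
}

def step_fsm(state, features):
    """Advance the finite-state machine given the current sensor features."""
    for (src, dst), (feat, cmp, th) in THRESHOLDS.items():
        if src == state and cmp(features.get(feat, 0.0), th):
            return dst
    return state

state = step_fsm("walk", {"thigh_angle": 38.0})  # -> "stair_ascent"
```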
Foundation models (FMs) are general-purpose artificial intelligence (AI) models that have recently enabled a range of new generative AI applications. The rapid advances in FMs serve as an important contextual backdrop for the vision of next-generation wireless networks, where federated learning (FL) is a key enabler of distributed network intelligence. The exploration of the interplay between FMs and FL is still in its nascent stage. Naturally, FMs can boost the performance of FL, and FL can in turn leverage decentralized data and computing resources to assist in the training of FMs. However, the exceptionally high demands that FMs place on computing resources, storage, and communication pose critical challenges for FL-enabled wireless networks. In this article, we explore the extent to which FMs are suitable for FL over wireless networks, including a broad overview of research challenges and opportunities. In particular, we discuss multiple new paradigms for realizing future intelligent networks that integrate FMs and FL, and we consolidate several broad research directions associated with these paradigms.
As artificial intelligence (AI) models continue to scale up, they are becoming more capable and are being integrated into various forms of decision-making systems. For models involved in moral decision-making, also known as artificial moral agents (AMAs), interpretability provides a way to trust and understand an agent's internal reasoning mechanisms, enabling effective use and error correction. In this paper, we provide an overview of this rapidly evolving sub-field of AI interpretability, introduce the concept of the Minimum Level of Interpretability (MLI), and recommend an MLI for various types of agents to aid their safe deployment in real-world settings.
It is important to detect anomalous inputs when deploying machine learning systems. The use of larger and more complex inputs in deep learning magnifies the difficulty of distinguishing between anomalous and in-distribution examples. At the same time, diverse image and text data are available in enormous quantities. We propose leveraging these data to improve deep anomaly detection by training anomaly detectors against an auxiliary dataset of outliers, an approach we call Outlier Exposure (OE). This enables anomaly detectors to generalize and detect unseen anomalies. In extensive experiments on natural language processing and small- and large-scale vision tasks, we find that Outlier Exposure significantly improves detection performance. We also observe that cutting-edge generative models trained on CIFAR-10 may assign higher likelihoods to SVHN images than to CIFAR-10 images; we use OE to mitigate this issue. We also analyze the flexibility and robustness of Outlier Exposure, and identify characteristics of the auxiliary dataset that improve performance.
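For concreteness, here is a sketch of the Outlier Exposure objective in its common classification form: standard cross-entropy on in-distribution data plus a term that pushes the model's posterior toward uniform on auxiliary outliers (the stand-in linear model and the weight `lam` are illustrative):

```python
import torch
import torch.nn.functional as F

# Outlier Exposure, classification form: cross-entropy on in-distribution
# data plus a term pushing the posterior toward uniform on auxiliary
# outliers. The linear model and lam value are stand-ins.

def oe_loss(model, x_in, y_in, x_out, lam=0.5):
    ce = F.cross_entropy(model(x_in), y_in)
    # Cross-entropy to the uniform distribution over classes.
    uniform_ce = -F.log_softmax(model(x_out), dim=1).mean()
    return ce + lam * uniform_ce

model = torch.nn.Linear(32, 10)  # stand-in classifier over 10 classes
loss = oe_loss(model,
               torch.randn(8, 32), torch.randint(0, 10, (8,)),  # in-dist
               torch.randn(8, 32))                              # outliers
loss.backward()
```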
Object detection typically assumes that training and test data are drawn from an identical distribution, which, however, does not always hold in practice. Such a distribution mismatch leads to a significant performance drop. In this work, we aim to improve the cross-domain robustness of object detection. We tackle the domain shift at two levels: 1) the image-level shift, such as image style and illumination, and 2) the instance-level shift, such as object appearance and size. We build our approach on the recent state-of-the-art Faster R-CNN model and design two domain adaptation components, at the image level and the instance level, to reduce the domain discrepancy. The two components are based on H-divergence theory and are implemented by learning domain classifiers in an adversarial training manner. The domain classifiers at different levels are further reinforced with a consistency regularization to learn a domain-invariant region proposal network (RPN) in the Faster R-CNN model. We evaluate our approach on multiple datasets, including Cityscapes, KITTI, and SIM10K. The results demonstrate its effectiveness for robust object detection in various domain shift scenarios.
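A common way to implement such adversarial domain classifiers is a gradient reversal layer, sketched below: the forward pass is the identity, while the backward pass negates (and scales) the gradient, so the feature extractor is trained to fool the domain classifier (the feature sizes and domain labels are illustrative):

```python
import torch

# Gradient reversal layer (GRL): identity on the forward pass, negated and
# scaled gradient on the backward pass, so features learn to fool the
# domain classifier while the classifier learns to separate domains.

class GradReverse(torch.autograd.Function):
    @staticmethod
    def forward(ctx, x, lam):
        ctx.lam = lam
        return x.view_as(x)

    @staticmethod
    def backward(ctx, grad):
        return -ctx.lam * grad, None  # no gradient for lam

features = torch.randn(4, 256, requires_grad=True)  # image/instance features
domain_clf = torch.nn.Linear(256, 2)                # source vs. target head
logits = domain_clf(GradReverse.apply(features, 1.0))
loss = torch.nn.functional.cross_entropy(logits, torch.tensor([0, 0, 1, 1]))
loss.backward()  # features now receive reversed gradients
```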