This systems paper presents the implementation and design of RB5, a wheeled robot for autonomous long-term exploration with fewer and cheaper sensors. Requiring just an RGB-D camera and low-power computing hardware, the system consists of an experimental platform with rocker-bogie suspension. It operates in unknown and GPS-denied environments and on indoor and outdoor terrains. The exploration consists of a methodology that extends frontier- and sampling-based exploration with a path-following vector field and a state-of-the-art SLAM algorithm. The methodology allows the robot to explore its surroundings at lower update frequencies, enabling the use of lower-performing and lower-cost hardware while still retaining good autonomous performance. The approach further consists of a methodology to interact with a remotely located human operator based on an inexpensive long-range and low-power communication technology from the internet-of-things domain (i.e., LoRa) and a customized communication protocol. The results and the feasibility analysis show the possible applications and limitations of the approach.
In the realm of data-driven AI technology, the application of open-source large language models (LLMs) in robotic task planning represents a significant milestone. Recent robotic task planning methods based on open-source LLMs typically leverage vast task planning datasets to enhance models' planning abilities. While these methods show promise, they struggle with complex long-horizon tasks, which require comprehending more context and generating longer action sequences. This paper addresses this limitation by proposing MLDT, theMulti-Level Decomposition Task planning method. This method innovatively decomposes tasks at the goal-level, task-level, and action-level to mitigate the challenge of complex long-horizon tasks. In order to enhance open-source LLMs' planning abilities, we introduce a goal-sensitive corpus generation method to create high-quality training data and conduct instruction tuning on the generated corpus. Since the complexity of the existing datasets is not high enough, we construct a more challenging dataset, LongTasks, to specifically evaluate planning ability on complex long-horizon tasks. We evaluate our method using various LLMs on four datasets in VirtualHome. Our results demonstrate a significant performance enhancement in robotic task planning, showcasing MLDT's effectiveness in overcoming the limitations of existing methods based on open-source LLMs as well as its practicality in complex, real-world scenarios.
In this work, we present a model-based optimal boundary control design for an aerial robotic system composed of a quadrotor carrying a flexible cable. The whole system is modeled by partial differential equations (PDEs) combined with boundary conditions described by ordinary differential equations (ODEs). The proper orthogonal decomposition (POD) method is adopted to project the original infinite-dimensional system on a subspace spanned by orthogonal basis functions. Based on the reduced order model, nonlinear model predictive control (NMPC) is implemented online to realize shape trajectory tracking of the flexible cable in an optimal predictive fashion. The proposed reduced modeling and optimal control paradigms are numerically verified against an accurate high-dimensional FDM-based model in different scenarios and the controller's superior performance is shown compared to an optimally tuned PID controller.
This paper presents a self-supervised learning method to safely learn a motion planner for ground robots to navigate environments with dense and dynamic obstacles. When facing highly-cluttered, fast-moving, hard-to-predict obstacles, classical motion planners may not be able to keep up with limited onboard computation. For learning-based planners, high-quality demonstrations are difficult to acquire for imitation learning while reinforcement learning becomes inefficient due to the high probability of collision during exploration. To safely and efficiently provide training data, the Learning from Hallucination (LfH) approaches synthesize difficult navigation environments based on past successful navigation experiences in relatively easy or completely open ones, but unfortunately cannot address dynamic obstacles. In our new Dynamic Learning from Learned Hallucination (Dyna-LfLH), we design and learn a novel latent distribution and sample dynamic obstacles from it, so the generated training data can be used to learn a motion planner to navigate in dynamic environments. Dyna-LfLH is evaluated on a ground robot in both simulated and physical environments and achieves up to 25% better success rate compared to baselines.
Terrain-aware perception holds the potential to improve the robustness and accuracy of autonomous robot navigation in the wilds, thereby facilitating effective off-road traversals. However, the lack of multi-modal perception across various motion patterns hinders the solutions of Simultaneous Localization And Mapping (SLAM), especially when confronting non-geometric hazards in demanding landscapes. In this paper, we first propose a Terrain-Aware multI-modaL (TAIL) dataset tailored to deformable and sandy terrains. It incorporates various types of robotic proprioception and distinct ground interactions for the unique challenges and benchmark of multi-sensor fusion SLAM. The versatile sensor suite comprises stereo frame cameras, multiple ground-pointing RGB-D cameras, a rotating 3D LiDAR, an IMU, and an RTK device. This ensemble is hardware-synchronized, well-calibrated, and self-contained. Utilizing both wheeled and quadrupedal locomotion, we efficiently collect comprehensive sequences to capture rich unstructured scenarios. It spans the spectrum of scope, terrain interactions, scene changes, ground-level properties, and dynamic robot characteristics. We benchmark several state-of-the-art SLAM methods against ground truth and provide performance validations. Corresponding challenges and limitations are also reported. All associated resources are accessible upon request at \url{//tailrobot.github.io/}.
This paper presents a comprehensive study on the unified module for accelerating stable-diffusion processes, specifically focusing on the lcm-lora module. Stable-diffusion processes play a crucial role in various scientific and engineering domains, and their acceleration is of paramount importance for efficient computational performance. The standard iterative procedures for solving fixed-source discrete ordinates problems often exhibit slow convergence, particularly in optically thick scenarios. To address this challenge, unconditionally stable diffusion-acceleration methods have been developed, aiming to enhance the computational efficiency of transport equations and discrete ordinates problems. This study delves into the theoretical foundations and numerical results of unconditionally stable diffusion synthetic acceleration methods, providing insights into their stability and performance for model discrete ordinates problems. Furthermore, the paper explores recent advancements in diffusion model acceleration, including on device acceleration of large diffusion models via gpu aware optimizations, highlighting the potential for significantly improved inference latency. The results and analyses in this study provide important insights into stable diffusion processes and have important ramifications for the creation and application of acceleration methods specifically, the lcm-lora module in a variety of computing environments.
This study introduces a robust planning framework that utilizes a model predictive control (MPC) approach, enhanced by incorporating signal temporal logic (STL) specifications. This marks the first-ever study to apply STL-guided trajectory optimization for bipedal locomotion, specifically designed to handle both translational and orientational perturbations. Existing recovery strategies often struggle with reasoning complex task logic and evaluating locomotion robustness systematically, making them susceptible to failures caused by inappropriate recovery strategies or lack of robustness. To address these issues, we design an analytical robustness metric for bipedal locomotion and quantify this metric using STL specifications, which guide the generation of recovery trajectories to achieve maximum locomotion robustness. To enable safe and computational-efficient crossed-leg maneuver, we design data-driven self-leg-collision constraints that are $1000$ times faster than the traditional inverse-kinematics-based approach. Our framework outperforms a state-of-the-art locomotion controller, a standard MPC without STL, and a linear-temporal-logic-based planner in a high-fidelity dynamic simulation, especially in scenarios involving crossed-leg maneuvers. Additionally, the Cassie bipedal robot achieves robust performance under horizontal and orientational perturbations such as those observed in ship motions. These environments are validated in simulations and deployed on hardware. Furthermore, our proposed method demonstrates versatility on stepping stones and terrain-agnostic features on inclined terrains.
This paper presents the design, analysis, and performance evaluation of RicMonk, a novel three-link brachiation robot equipped with passive hook-shaped grippers. Brachiation, an agile and energy-efficient mode of locomotion observed in primates, has inspired the development of RicMonk to explore versatile locomotion and maneuvers on ladder-like structures. The robot's anatomical resemblance to gibbons and the integration of a tail mechanism for energy injection contribute to its unique capabilities. The paper discusses the use of the Direct Collocation methodology for optimizing trajectories for the robot's dynamic behaviors and stabilization of these trajectories using a Time-varying Linear Quadratic Regulator. With RicMonk we demonstrate bidirectional brachiation, and provide comparative analysis with its predecessor, AcroMonk - a two-link brachiation robot, to demonstrate that the presence of a passive tail helps improve energy efficiency. The system design, controllers, and software implementation are publicly available on GitHub and the video demonstration of the experiments can be viewed YouTube.
This paper presents an innovative large language model (LLM)-based robotic system for enhancing multi-modal human-robot interaction (HRI). Traditional HRI systems relied on complex designs for intent estimation, reasoning, and behavior generation, which were resource-intensive. In contrast, our system empowers researchers and practitioners to regulate robot behavior through three key aspects: providing high-level linguistic guidance, creating "atomic actions" and expressions the robot can use, and offering a set of examples. Implemented on a physical robot, it demonstrates proficiency in adapting to multi-modal inputs and determining the appropriate manner of action to assist humans with its arms, following researchers' defined guidelines. Simultaneously, it coordinates the robot's lid, neck, and ear movements with speech output to produce dynamic, multi-modal expressions. This showcases the system's potential to revolutionize HRI by shifting from conventional, manual state-and-flow design methods to an intuitive, guidance-based, and example-driven approach. Supplementary material can be found at //hri-eu.github.io/Lami/
We propose a data-driven control method for systems with aleatoric uncertainty, for example, robot fleets with variations between agents. Our method leverages shared trajectory data to increase the robustness of the designed controller and thus facilitate transfer to new variations without the need for prior parameter and uncertainty estimations. In contrast to existing work on experience transfer for performance, our approach focuses on robustness and uses data collected from multiple realizations to guarantee generalization to unseen ones. Our method is based on scenario optimization combined with recent formulations for direct data-driven control. We derive lower bounds on the amount of data required to achieve quadratic stability for probabilistic systems with aleatoric uncertainty and demonstrate the benefits of our data-driven method through a numerical example. We find that the learned controllers generalize well to high variations in the dynamics even when based on only a few short open-loop trajectories. Robust experience transfer enables the design of safe and robust controllers that work out of the box without any additional learning during deployment.
This paper surveys research works in the quickly advancing field of instruction tuning (IT), a crucial technique to enhance the capabilities and controllability of large language models (LLMs). Instruction tuning refers to the process of further training LLMs on a dataset consisting of \textsc{(instruction, output)} pairs in a supervised fashion, which bridges the gap between the next-word prediction objective of LLMs and the users' objective of having LLMs adhere to human instructions. In this work, we make a systematic review of the literature, including the general methodology of IT, the construction of IT datasets, the training of IT models, and applications to different modalities, domains and applications, along with an analysis on aspects that influence the outcome of IT (e.g., generation of instruction outputs, size of the instruction dataset, etc). We also review the potential pitfalls of IT along with criticism against it, along with efforts pointing out current deficiencies of existing strategies and suggest some avenues for fruitful research.