This paper introduces Borinot, an open-source flying robotic platform designed for hybrid agile locomotion and manipulation. The platform features a compact and powerful hexarotor that can be outfitted with torque-actuated extremities of diverse architectures, enabling whole-body dynamic control. As a result, Borinot can perform agile tasks such as aggressive or acrobatic maneuvers that exploit the whole-body dynamics. The extremities attached to Borinot can be utilized in various ways: during contact, as legs for contact-based locomotion or as arms to manipulate objects; in free flight, as tails that contribute to the dynamics, mimicking the movements of many animals. This allows any hybridization of these dynamic modes, such as the jump-flight of chickens and locusts, making Borinot an ideal open-source platform for research on hybrid aerial-contact agile motion. To demonstrate the key capabilities of Borinot, we fitted a planar 2-DoF arm and implemented whole-body torque-level model predictive control. The result is a capable and adaptable platform that, we believe, opens new avenues of research in the field of agile robotics.
This paper explores the potential of 5G New Radio (NR) Time-of-Arrival (TOA) data for indoor drone localization under different scenarios and conditions when fused with inertial measurement unit (IMU) data. Our approach performs graph-based optimization to estimate the drone's position and orientation from the multiple sensor measurements. Due to the lack of real-world data, we use the MATLAB 5G Toolbox and the QuaDRiGa (quasi-deterministic radio channel generator) channel simulator to generate TOA measurements for the EuRoC MAV indoor dataset, which provides IMU readings and ground-truth 6-DoF poses of a flying drone. We create twelve sequences by combining three predefined indoor scenario setups of QuaDRiGa with 2 to 5 base station antennas. Experimental results demonstrate that, for a sufficient number of base stations and a high-bandwidth 5G configuration, the pose graph optimization approach achieves accurate drone localization, with an average error of less than 15 cm over the whole trajectory. Furthermore, the adopted graph-based optimization algorithm is fast and can be easily implemented for onboard real-time pose tracking on a micro aerial vehicle (MAV).
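The TOA part of such a pipeline can be illustrated with a range-only position fix. The sketch below estimates a 3D position from TOA-derived ranges to known base-station anchors via Gauss-Newton least squares; the anchor layout and the `toa_position_fix` helper are illustrative only, and the full pipeline described above also fuses IMU factors in the pose graph.

```python
import numpy as np

def toa_position_fix(anchors, ranges, x0, iters=20):
    """Gauss-Newton fix of a 3D position from TOA-derived ranges to known anchors."""
    x = np.asarray(x0, dtype=float)
    for _ in range(iters):
        d = np.linalg.norm(anchors - x, axis=1)    # predicted ranges
        r = d - ranges                             # range residuals
        J = (x - anchors) / d[:, None]             # Jacobian of the ranges w.r.t. x
        x = x - np.linalg.solve(J.T @ J, J.T @ r)  # normal-equations step
    return x

# Illustrative geometry: four non-coplanar base-station anchors.
anchors = np.array([[0., 0., 0.], [10., 0., 3.], [0., 10., 3.], [10., 10., 6.]])
truth = np.array([3., 4., 1.5])
ranges = np.linalg.norm(anchors - truth, axis=1)   # noise-free TOA ranges
est = toa_position_fix(anchors, ranges, x0=[4., 5., 1.])
```

With noise-free ranges the iteration recovers the true position; in practice the TOA residuals would enter the pose graph alongside IMU preintegration factors.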
This paper proposes a method for designing human-robot collaboration tasks and generating the corresponding trajectories. The method uses high-level specifications, expressed as a Signal Temporal Logic (STL) formula, to automatically synthesize task assignments and trajectories. To illustrate the approach, we focus on a specific task: a multi-rotor aerial vehicle performing object handovers in a power line setting. The motion planner accounts for platform limitations, such as payload capacity and recharging constraints, while ensuring that the generated trajectories are feasible. Additionally, the method enables users to specify robot behaviors that take into account human comfort (e.g., ergonomics, preferences) alongside high-level goals and constraints. The approach is validated through numerical analyses in MATLAB and realistic Gazebo simulations of a mock-up scenario.
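STL-based planning hinges on a quantitative robustness semantics: a trajectory satisfies a formula exactly when its robustness value is positive, and the planner can maximize that value. A minimal sketch, assuming a hypothetical handover spec ("eventually come within radius `r` of the handover point while always keeping battery above `b_min`"); all names and thresholds here are illustrative, not the paper's formula:

```python
import numpy as np

def rob_eventually(sig):
    """Robustness of F(phi): the best value of phi over the horizon."""
    return np.max(sig)

def rob_always(sig):
    """Robustness of G(phi): the worst value of phi over the horizon."""
    return np.min(sig)

def handover_robustness(pos, battery, goal, r=0.5, b_min=0.2):
    """Robustness of: eventually within r of goal AND always battery > b_min."""
    reach = r - np.linalg.norm(pos - goal, axis=1)  # > 0 when inside the radius
    return min(rob_eventually(reach), rob_always(battery - b_min))

pos = np.array([[0., 0.], [1., 0.], [2., 0.]])      # toy trajectory samples
battery = np.array([1.0, 0.9, 0.8])
rho = handover_robustness(pos, battery, goal=np.array([1., 0.]))
```

A positive `rho` certifies satisfaction with a margin; a battery dip below `b_min` at any sample would drive the conjunction negative.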
In federated learning, each participant trains its local model on its own data, and a global model is formed at a trusted server by aggregating the model updates coming from these participants. Since, to ensure privacy, the server has no control over or visibility into the participants' training procedures, the global model becomes vulnerable to attacks such as data poisoning and model poisoning. Although many defense algorithms have recently been proposed to address these attacks, they often make strong assumptions that do not agree with the nature of federated learning, such as assuming IID datasets. Moreover, they mostly lack comprehensive experimental analyses. In this work, we propose a defense algorithm called ARFED that makes no assumptions about the data distribution, the update similarity of participants, or the ratio of malicious participants. ARFED considers the outlier status of each participant's update for each layer of the model architecture, based on its distance to the global model; only participants with no outlier layers are included in model aggregation. We have performed extensive experiments on diverse scenarios and shown that the proposed approach provides a robust defense against different attacks. To test the defense capability of ARFED under different conditions, we considered label flipping, Byzantine, and partial knowledge attacks in both IID and Non-IID settings. Moreover, we propose a new attack, called the organized partial knowledge attack, in which malicious participants use their training statistics collaboratively to define a common poisoned model, and we show that organized partial knowledge attacks are more effective than independent attacks.
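The layer-wise filtering idea can be sketched as follows. This is a minimal illustration, not ARFED itself: it uses an IQR fence on per-layer distances as an assumed outlier criterion (the paper's exact rule may differ), and `arfed_style_aggregate` is a hypothetical name.

```python
import numpy as np

def arfed_style_aggregate(global_model, updates, k=1.5):
    """Average only participants whose updates have no outlier layer.

    global_model: dict layer-name -> array; updates: list of such dicts.
    Outlier rule (illustrative): per-layer distance to the global model
    outside the interquartile fence [Q1 - k*IQR, Q3 + k*IQR].
    """
    layers = list(global_model)
    dist = {l: np.array([np.linalg.norm(u[l] - global_model[l]) for u in updates])
            for l in layers}
    keep = np.ones(len(updates), dtype=bool)
    for l in layers:
        q1, q3 = np.percentile(dist[l], [25, 75])
        lo, hi = q1 - k * (q3 - q1), q3 + k * (q3 - q1)
        keep &= (dist[l] >= lo) & (dist[l] <= hi)   # flag outlier layers
    kept = [u for u, ok in zip(updates, keep) if ok]
    return {l: np.mean([u[l] for u in kept], axis=0) for l in layers}, keep

g = {"w": np.zeros(4)}
ups = [{"w": np.full(4, 0.10 + 0.01 * i)} for i in range(9)] + [{"w": np.full(4, 100.0)}]
agg, keep = arfed_style_aggregate(g, ups)
```

In this toy round the single far-away update is excluded and the nine honest updates are averaged.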
Humans perform everyday tasks using a combination of locomotion and manipulation skills, and building a system that can handle both is essential to creating virtual humans. We present a physically simulated human capable of solving box rearrangement tasks, which require a combination of both skills. We propose a hierarchical control architecture in which each level solves the task at a different level of abstraction; the result is a physics-based simulated virtual human capable of rearranging boxes in a cluttered environment. The control architecture integrates a planner, diffusion models, and physics-based motion imitation of sparse motion clips using deep reinforcement learning. Boxes can vary in size, weight, shape, and placement height. Code and trained control policies are provided.
Motion forecasting plays a critical role in enabling robots to anticipate the future trajectories of surrounding agents and plan accordingly. However, existing forecasting methods often rely on curated datasets that are not faithful to what real-world perception pipelines can provide. In reality, the upstream modules responsible for detecting and tracking agents, and those that gather road information to build the map, can introduce various errors, including misdetections, tracking errors, and reduced accuracy for distant agents and road elements. This paper aims to uncover the challenges of bringing motion forecasting models to this more realistic setting where inputs are provided by perception modules. In particular, we quantify the impact of the domain gap through extensive evaluation. Furthermore, we design synthetic perturbations to better characterize their consequences, thus providing insights into areas that require improvement in upstream perception modules and guidance toward the development of more robust forecasting methods.
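Such synthetic perturbations can be sketched as simple transforms of ground-truth agent tracks; the drop probability, noise level, and sensing-range cutoff below are illustrative stand-ins for real perception errors, not the paper's exact perturbation design.

```python
import numpy as np

rng = np.random.default_rng(0)

def perturb_tracks(tracks, drop_prob=0.1, noise_std=0.2, fn_dist=50.0):
    """Apply perception-like perturbations to ground-truth agent tracks:
    positional noise, per-frame misdetections, and removal of agents
    beyond a sensing range (all parameters illustrative).

    tracks: list of (T, 2) arrays of xy positions in the ego frame.
    Returns a list of (noisy_track, detection_mask) pairs.
    """
    out = []
    for tr in tracks:
        if np.linalg.norm(tr[0]) > fn_dist:          # distant agent: missed entirely
            continue
        noisy = tr + rng.normal(0.0, noise_std, tr.shape)  # localization noise
        mask = rng.random(len(tr)) >= drop_prob            # per-frame misdetections
        out.append((noisy, mask))
    return out

tracks = [np.tile([1.0, 1.0], (10, 1)), np.tile([100.0, 0.0], (10, 1))]
perturbed = perturb_tracks(tracks)
```

Feeding forecasters such degraded inputs, instead of curated tracks, exposes the robustness gaps the evaluation above quantifies.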
The important phenomenon of "stickiness" of chaotic orbits in low-dimensional dynamical systems has been investigated for several decades, in view of its applications to various areas of physics, such as classical and statistical mechanics, celestial mechanics, and accelerator dynamics. Most of the work to date has focused on two-degree-of-freedom Hamiltonian models, often represented by two-dimensional (2D) area-preserving maps. In this paper, we extend earlier results using a four-dimensional (4D) extension of the 2D McMillan map and show that a symplectic model of two coupled McMillan maps also exhibits stickiness phenomena in limited regions of phase space. To this end, we employ probability distributions in the sense of the Central Limit Theorem to demonstrate that, as in the 2D case, sticky regions near the origin are also characterized by "weak" chaos and Tsallis entropy, in sharp contrast to the "strong" chaos that extends over much wider domains and is described by Boltzmann-Gibbs statistics. Remarkably, similar stickiness phenomena have been observed in higher-dimensional Hamiltonian systems around unstable simple periodic orbits at various values of the total energy of the system.
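For reference, the 2D McMillan map iterates (x, y) -> (y, -x + 2&mu;y/(1+y&sup2;)) and preserves the biquadratic invariant x&sup2;y&sup2; + x&sup2; + y&sup2; - 2&mu;xy. The sketch below iterates this map and a 4D two-map version; the linear coupling term used here is an illustrative choice, not necessarily the coupling studied in the paper.

```python
import numpy as np

def mcmillan_2d(x, y, mu):
    """One step of the 2D area-preserving McMillan map."""
    return y, -x + 2.0 * mu * y / (1.0 + y * y)

def coupled_mcmillan_4d(x1, y1, x2, y2, mu, eps):
    """Two McMillan maps with a weak linear coupling eps (illustrative form)."""
    y1n = -x1 + 2.0 * mu * y1 / (1.0 + y1 * y1) + eps * y2
    y2n = -x2 + 2.0 * mu * y2 / (1.0 + y2 * y2) + eps * y1
    return y1, y1n, y2, y2n

def orbit(step, state, n):
    """Collect n iterates of a map starting from `state`."""
    out = [state]
    for _ in range(n):
        state = step(*state)
        out.append(state)
    return np.array(out)

o = orbit(lambda x, y: mcmillan_2d(x, y, 0.5), (0.1, 0.1), 1000)
```

For |mu| < 1 the origin is elliptic, so small orbits stay bounded, and the invariant is conserved along the iteration up to round-off.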
Marine waves significantly disturb the motion of unmanned surface vehicles (USVs), making it difficult for an unmanned aerial vehicle (UAV) to land on a USV that undergoes irregular motion. An oversized landing platform is usually necessary to guarantee landing safety, which limits the number of UAVs that can be carried. We propose a landing system assisted by a tether and a robot manipulator that can land multiple UAVs without increasing the USV's size. An MPC controller stabilizes the end-effector and tracks the UAVs, while an adaptive estimator addresses the disturbance caused by the base motion. A working strategy is designed to coordinate the motion of each device. We validated the manipulator controller through simulations and well-controlled indoor experiments. During field tests, the proposed system caught and placed UAVs while the disturbed USV's roll range was approximately 12 degrees.
This paper proposes the transition-net, a robust transition strategy that expands the versatility of robot locomotion in real-world settings. To this end, we start by distributing the complexity of different gaits into dedicated locomotion policies applicable to real-world robots. Next, we expand the versatility of the robot by unifying the policies, with robust transitions, into a single coherent meta-controller by examining the latent state representations. Our approach enables the robot to iteratively expand its skill repertoire and robustly transition between any policy pair in a library. In our framework, adding new skills does not alter previously learned skills, and training a locomotion policy takes less than an hour on a single consumer GPU. Our approach is effective in the real world, achieving a 19% higher average success rate on the most challenging transition pairs in our experiments compared to existing approaches.
The key challenge of image manipulation detection is learning generalizable features that are sensitive to manipulations in novel data, yet specific enough to prevent false alarms on authentic images. Current research emphasizes sensitivity, while specificity is overlooked. In this paper, we address both aspects through multi-view feature learning and multi-scale supervision. By exploiting the noise distribution and boundary artifacts surrounding tampered regions, the former aims to learn semantic-agnostic and thus more generalizable features. The latter allows us to learn from authentic images, which are nontrivial to take into account in current methods based on semantic segmentation networks. We realize these ideas in a new network, which we term MVSS-Net. Extensive experiments on five benchmark sets justify the viability of MVSS-Net for both pixel-level and image-level manipulation detection.
Most previous event extraction studies have relied heavily on features derived from annotated event mentions, and thus cannot be applied to new event types without additional annotation effort. In this work, we take a fresh look at event extraction and model it as a grounding problem. We design a transferable neural architecture that maps event mentions and types jointly into a shared semantic space using structural and compositional neural networks, where the type of each event mention is determined by the closest of all candidate types. By leveraging (1)~available manual annotations for a small set of existing event types and (2)~existing event ontologies, our framework applies to new event types without requiring additional annotation. Experiments on both existing event types (e.g., ACE, ERE) and new event types (e.g., FrameNet) demonstrate the effectiveness of our approach. \textit{Without any manual annotations} for 23 new event types, our zero-shot framework achieves performance comparable to a state-of-the-art supervised model trained on the annotations of 500 event mentions.
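The nearest-type decision rule can be sketched as cosine-similarity matching in the shared space; the toy vectors and the `classify_mentions` helper below are illustrative stand-ins for the learned structural and compositional embeddings.

```python
import numpy as np

def classify_mentions(mention_vecs, type_vecs, type_names):
    """Assign each event mention to its nearest type in the shared space
    (cosine similarity over already-embedded vectors)."""
    m = mention_vecs / np.linalg.norm(mention_vecs, axis=1, keepdims=True)
    t = type_vecs / np.linalg.norm(type_vecs, axis=1, keepdims=True)
    sim = m @ t.T                              # cosine-similarity matrix
    return [type_names[i] for i in sim.argmax(axis=1)]

# Toy 2D embeddings: two candidate types, two mentions.
types = np.array([[1.0, 0.0], [0.0, 1.0]])
mentions = np.array([[0.9, 0.1], [0.2, 0.8]])
labels = classify_mentions(mentions, types, ["Attack", "Transport"])
```

Because type embeddings are built from the ontology rather than from labeled mentions, new types can be added to `type_vecs` without any retraining on annotated examples, which is what enables the zero-shot setting.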