The growing demand for electric vehicles requires the development of automated charging methods. At present, charging an electric car is an entirely manual process that requires physical effort, which makes it unsuitable for people with disabilities. Research has typically focused on detecting the position and orientation of the socket, achieving a relatively high accuracy of $\pm 5\,\mathrm{mm}$ and $\pm 10^\circ$. However, this accuracy is not sufficient to complete the charging process. In this work, we design a novel methodology for robust robotic plug-in and plug-out based on human haptics, which compensates for errors in the estimated position and orientation of the socket. Participants were invited to perform the charging task, and their cognitive capabilities were characterized by measuring the applied forces together with the movement of the charger. Three controllers based on impedance control were designed to mimic the human patterns of charging an electric car. The recorded human data were used to calibrate the parameters of the impedance controllers: inertia $M_d$, damping $D_d$, and stiffness $K_d$. A robotic validation was performed in which the designed controllers were applied to a UR10 robot. Using the proposed controllers and the human kinesthetic data, we successfully automated the process of charging an electric car.
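The target behavior of an impedance controller is commonly written as $M_d \ddot{e} + D_d \dot{e} + K_d e = F_{ext}$, where $e = x - x_d$ is the deviation from the reference pose and $F_{ext}$ is the contact force; under a constant force the steady-state deviation is $F_{ext}/K_d$. The sketch below integrates this relation along a single axis. It assumes this standard form, uses hypothetical parameter values rather than the human-calibrated ones, and does not reproduce the three controllers of this work.

```python
from dataclasses import dataclass


@dataclass
class ImpedanceParams:
    M_d: float  # desired inertia [kg]
    D_d: float  # desired damping [N*s/m]
    K_d: float  # desired stiffness [N/m]


def simulate_impedance(params: ImpedanceParams, f_ext: float,
                       x_d: float = 0.0, dt: float = 1e-3,
                       steps: int = 5000) -> float:
    """Integrate M_d*e'' + D_d*e' + K_d*e = f_ext on one axis.

    e = x - x_d is the deviation from the reference position x_d.
    Returns the position x reached after `steps` integration steps.
    """
    e, e_dot = 0.0, 0.0
    for _ in range(steps):
        e_ddot = (f_ext - params.D_d * e_dot - params.K_d * e) / params.M_d
        # Semi-implicit Euler: update the velocity first, then the position.
        e_dot += e_ddot * dt
        e += e_dot * dt
    return x_d + e


# Hypothetical, critically damped parameters (damping ratio = 1).
params = ImpedanceParams(M_d=2.0, D_d=80.0, K_d=800.0)
x_final = simulate_impedance(params, f_ext=-10.0)  # settles near -10/800 m
```

A lower $K_d$ lets the plug yield more to contact forces (useful when the socket pose is uncertain), while $D_d$ shapes how quickly the motion settles.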
The autonomous driving community has shown significant interest in 3D occupancy prediction, driven by its exceptional geometric perception and general object recognition capabilities. To this end, current works construct a Tri-Perspective View (TPV) or Occupancy (OCC) representation by extending Bird's-Eye-View (BEV) perception. However, compressed views such as the TPV representation lose 3D geometric information, while the raw, sparse OCC representation incurs heavy and redundant computational costs. To address these limitations, we propose the Compact Occupancy TRansformer (COTR), which combines a geometry-aware occupancy encoder and a semantic-aware group decoder to reconstruct a compact 3D OCC representation. The occupancy encoder first generates a compact geometric OCC feature through an efficient explicit-implicit view transformation. The occupancy decoder then enhances the semantic discriminability of the compact OCC representation through a coarse-to-fine semantic grouping strategy. Experiments show evident performance gains across multiple baselines; for example, COTR achieves a relative improvement of 8% to 15% over the baselines, demonstrating the superiority of our method.
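The trade-off between the representations can be made concrete by counting feature cells. The sketch below assumes a hypothetical $200 \times 200 \times 16$ voxel grid and a compact grid downsampled by 2 along every axis; these resolutions are illustrative and are not COTR's actual configuration. The feature dimension multiplies every count equally, so it is omitted.

```python
def dense_occ_cells(X: int, Y: int, Z: int) -> int:
    """Full-resolution voxel grid: one feature vector per voxel."""
    return X * Y * Z


def tpv_cells(X: int, Y: int, Z: int) -> int:
    """Tri-Perspective View: three orthogonal planes (X-Y, Y-Z, X-Z)."""
    return X * Y + Y * Z + X * Z


def compact_occ_cells(X: int, Y: int, Z: int, s_xy: int, s_z: int) -> int:
    """Voxel grid downsampled by s_xy horizontally and s_z vertically."""
    return (X // s_xy) * (Y // s_xy) * (Z // s_z)


X, Y, Z = 200, 200, 16  # hypothetical grid resolution
counts = {
    "dense OCC": dense_occ_cells(X, Y, Z),
    "TPV": tpv_cells(X, Y, Z),
    "compact OCC": compact_occ_cells(X, Y, Z, s_xy=2, s_z=2),
}
for name, n in counts.items():
    print(f"{name}: {n} cells")
```

Under these assumptions TPV stores 46,400 cells versus 640,000 for the dense grid, but each plane cell summarizes an entire line of voxels, which is where 3D geometry is lost; the compact grid keeps a true 3D layout at one eighth of the dense cost (80,000 cells).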
Many autonomous systems face safety challenges that require robust closed-loop control to handle physical limitations and safety constraints. Real-world systems, such as autonomous ships, encounter nonlinear dynamics and environmental disturbances. Reinforcement learning (RL) is increasingly used to adapt to complex scenarios, but standard frameworks that ensure safety and stability are lacking. Predictive Safety Filters (PSF) offer a promising solution, ensuring constraint satisfaction in learning-based control without requiring the learning-based controller to handle constraints explicitly. This modular approach allows arbitrary control policies to be used, with the safety filter optimizing the proposed actions to meet physical and safety constraints. We apply this approach to marine navigation, combining RL with a PSF on a simulated Cybership II model. The RL agent is trained on path following and collision avoidance, while the PSF monitors and modifies its control actions for safety. Compared against a standard RL agent without a PSF, the results demonstrate the PSF's effectiveness in maintaining safety without hindering the RL agent's learning rate or performance.
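A full predictive safety filter solves a model predictive control problem that finds the input closest to the RL proposal for which a constraint-satisfying trajectory exists over a horizon. The sketch below is a much simplified variant of that idea under stated assumptions: a 1-D double integrator (for example, along-track motion toward an obstacle at $x_{max}$), an input bound $|u| \le u_{max}$, and a fixed full-braking backup maneuver instead of an optimized trajectory. Because a lower $u$ only moves the vehicle less far, the largest safe input at or below the proposal can be found by bisection. All names and values are hypothetical; this is neither the Cybership II model nor the authors' PSF.

```python
DT = 0.1  # hypothetical time step [s]


def step(x: float, v: float, u: float, dt: float = DT):
    """Discrete double integrator: position x, velocity v, acceleration u."""
    v_next = v + u * dt
    x_next = x + v_next * dt
    return x_next, v_next


def backup_is_safe(x: float, v: float, x_max: float, u_max: float,
                   horizon: int = 200) -> bool:
    """Check that full braking from (x, v) keeps x <= x_max until the vehicle stops."""
    for _ in range(horizon):
        if x > x_max:
            return False
        if v <= 1e-9:  # stopped or moving away from the boundary
            return True
        u = max(-u_max, -v / DT)  # brake hard, but do not reverse
        x, v = step(x, v, u)
    return x <= x_max and v <= 1e-9


def safety_filter(x: float, v: float, u_rl: float, x_max: float,
                  u_max: float, iters: int = 40) -> float:
    """Return the largest input <= the clipped RL proposal that keeps a safe backup."""
    u = min(max(u_rl, -u_max), u_max)  # enforce input bounds
    if backup_is_safe(*step(x, v, u), x_max, u_max):
        return u  # proposal is already safe: pass it through unchanged
    lo, hi = -u_max, u
    if not backup_is_safe(*step(x, v, lo), x_max, u_max):
        return lo  # no safe input exists; brake as hard as possible
    for _ in range(iters):  # invariant: lo is safe, hi is unsafe
        mid = 0.5 * (lo + hi)
        if backup_is_safe(*step(x, v, mid), x_max, u_max):
            lo = mid
        else:
            hi = mid
    return lo
```

Far from the boundary the RL action passes through untouched, which mirrors the claim that the filter need not hinder learning; close to the boundary the action is reduced only as much as the backup check requires.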
We present RobotGPT, an innovative decision framework for robotic manipulation that prioritizes stability and safety. The execution code generated by ChatGPT cannot guarantee the stability and safety of the system. ChatGPT may provide different answers for the same task, leading to unpredictability. This instability prevents the direct integration of ChatGPT into the robot manipulation loop. Although setting the temperature to 0 can generate more consistent outputs, it may cause ChatGPT to lose diversity and creativity. Our objective is to leverage ChatGPT's problem-solving capabilities in robot manipulation and train a reliable agent. The framework includes an effective prompt structure and a robust learning model. Additionally, we introduce a metric for measuring task difficulty to evaluate ChatGPT's performance in robot manipulation. Furthermore, we evaluate RobotGPT in both simulation and real-world environments. Compared to directly using ChatGPT to generate code, our framework significantly improves task success rates, with an average increase from 38.5% to 91.5%. Therefore, training a RobotGPT by utilizing ChatGPT as an expert is a more stable approach compared to directly using ChatGPT as a task planner.
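The temperature trade-off mentioned above follows from how sampling from a language model is usually done: logits are divided by a temperature $T$ before the softmax, so a smaller $T$ sharpens the distribution and $T \to 0$ reduces to greedy argmax decoding, which is deterministic but gives up diversity. The sketch below shows this generic mechanism with hypothetical function names; it is not ChatGPT's internal implementation, which is not described here.

```python
import math
import random


def softmax_with_temperature(logits, temperature):
    """Temperature-scaled softmax; temperature <= 0 is treated as the greedy limit."""
    if temperature <= 0:
        # T -> 0 limit: all probability mass on the argmax (greedy decoding).
        best = max(range(len(logits)), key=lambda i: logits[i])
        return [1.0 if i == best else 0.0 for i in range(len(logits))]
    scaled = [l / temperature for l in logits]
    m = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(s - m) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def sample_token(logits, temperature, rng):
    """Draw one token index from the temperature-scaled distribution."""
    probs = softmax_with_temperature(logits, temperature)
    return rng.choices(range(len(logits)), weights=probs, k=1)[0]
```

Raising $T$ flattens the distribution, so repeated queries for the same task can yield different outputs, which is the instability the framework is designed to avoid.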
Communication delays can be catastrophic for multiagent systems. However, most existing state-of-the-art multiagent trajectory planners assume perfect communication and therefore lack a strategy to rectify this issue in real-world environments. To address this challenge, we propose Robust MADER (RMADER), a decentralized, asynchronous multiagent trajectory planner that is robust to communication delays. RMADER ensures safety by introducing (1) a Delay Check step, (2) a two-step trajectory publication scheme, and (3) a novel trajectory-storing-and-checking approach. Our primary contributions include: proving recursive feasibility for collision-free trajectory generation in asynchronous decentralized trajectory-sharing, simulation benchmark studies, and hardware experiments with different network topologies and dynamic obstacles. We show that RMADER outperforms existing approaches by achieving a 100% success rate of collision-free trajectory generation, whereas the next-best asynchronous decentralized method achieves only 83% success.
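The Delay Check step can be illustrated with a simplified sketch. Assuming trajectories are sampled on a shared time grid and that every message published within the check window has arrived by its end, an agent publishes a candidate trajectory, keeps listening for that period, and commits the candidate only if none of the trajectories received in the meantime conflicts with it; otherwise it keeps its previously committed trajectory. The function names, the sampled `(t, x, y)` representation, and the distance-based conflict test are hypothetical simplifications, not RMADER's implementation.

```python
import math
from typing import List, Tuple

Traj = List[Tuple[float, float, float]]  # samples (t, x, y) on a shared time grid


def conflict(a: Traj, b: Traj, radius: float) -> bool:
    """True if the two trajectories come closer than `radius` at a common time sample."""
    b_at = {round(t, 6): (x, y) for t, x, y in b}
    for t, x, y in a:
        key = round(t, 6)
        if key in b_at:
            bx, by = b_at[key]
            if math.hypot(x - bx, y - by) < radius:
                return True
    return False


def delay_check(candidate: Traj, committed: Traj,
                received_during_check: List[Traj], radius: float) -> Traj:
    """Commit the candidate only if nothing received during the check window conflicts.

    Otherwise fall back to the previously committed trajectory, which other
    agents already know about and which is assumed to be collision-free.
    """
    for other in received_during_check:
        if conflict(candidate, other, radius):
            return committed
    return candidate
```

The fallback is what keeps the scheme safe under delay: an agent never switches to a trajectory that others may not yet have checked against.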
The current fabrication and assembly of fluidic circuits for soft robots rely heavily on manual processes; as the complexity of fluidic circuits increases, manual assembly becomes increasingly arduous, error-prone, and time-consuming. We introduce a software tool that automatically generates printable fluidic networks. We provide a library of fluidic logic elements that are easily 3D printed from thermoplastic polyurethanes using only Fused Deposition Modeling (FDM). Our software tool and component library allow the development of arbitrary soft digital circuits. We demonstrate a variable-frequency ring oscillator and a full adder. Because it relies only on FDM printers, our approach democratizes fluidic circuit implementation beyond specialized laboratories. Our software is available on GitHub (//github.com/roboticmaterialsgroup/FluidLogic).
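The two demonstrated circuits can be described at the logic level. The sketch below models the full adder as a network of nine NAND gates and gives the idealized frequency of a ring oscillator with $N$ inverting stages, $f = 1/(2 N t_d)$ for a per-stage delay $t_d$. Using NAND as the universal element, and the function names, are assumptions for illustration; the gate set and timing of the fluidic library may differ.

```python
def nand(a: int, b: int) -> int:
    """Universal gate: output is 0 only when both inputs are 1."""
    return 0 if (a and b) else 1


def full_adder(a: int, b: int, cin: int):
    """Nine-NAND full adder; returns (sum, carry_out)."""
    n1 = nand(a, b)
    n2 = nand(a, n1)
    n3 = nand(b, n1)
    s1 = nand(n2, n3)     # a XOR b
    n4 = nand(s1, cin)
    n5 = nand(s1, n4)
    n6 = nand(cin, n4)
    s = nand(n5, n6)      # a XOR b XOR cin
    cout = nand(n1, n4)   # (a AND b) OR ((a XOR b) AND cin)
    return s, cout


def ring_oscillator_frequency(n_stages: int, stage_delay_s: float) -> float:
    """Idealized frequency of a ring of n_stages inverters, each with delay stage_delay_s."""
    if n_stages < 3 or n_stages % 2 == 0:
        raise ValueError("a ring oscillator needs an odd number (>= 3) of inverters")
    return 1.0 / (2 * n_stages * stage_delay_s)
```

In this idealized model, the oscillation frequency can be varied by changing the number of stages or the per-stage delay.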
Testing autonomous vehicles (AVs) under environmental scenarios that lead them into unsafe situations is known to be challenging. Given the infinite number of possible environmental scenarios, it is essential to find critical scenarios efficiently. To this end, we propose a novel testing method, named EpiTESTER, inspired by epigenetics, which enables species to adapt to sudden environmental changes. In particular, EpiTESTER adopts gene silencing as its epigenetic mechanism: it regulates gene expression to prevent a certain gene from being expressed, and the probability of gene expression is dynamically computed as the environment changes. Given the different data modalities (e.g., images, lidar point clouds) in the AV context, EpiTESTER uses a multi-modal fusion transformer to extract high-level feature representations from environmental factors and then calculates the probabilities from these features with an attention mechanism. To assess the cost-effectiveness of EpiTESTER, we compare it with a classical genetic algorithm (GA) (i.e., without any epigenetic mechanism) and with a variant of EpiTESTER that assigns equal probability to each gene. We evaluate EpiTESTER with four initial environments from CARLA, an open-source simulator for autonomous driving research, and an end-to-end AV controller, Interfuser. Our results show that EpiTESTER achieved promising performance in identifying critical scenarios compared to the baselines, suggesting that applying epigenetic mechanisms is a good option for solving practical problems.
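Gene silencing can be sketched as a mutation operator in which each gene (an environmental parameter) is expressed, and therefore allowed to change, only with its own probability; silenced genes are inherited unchanged. In the sketch below the per-gene probabilities come from a softmax over hypothetical gene scores, standing in for the attention-based probabilities computed from fused image and lidar features. The function names, the uniform resampling mutation, and the example weather-style genome are assumptions, not EpiTESTER's implementation.

```python
import math
import random


def expression_probabilities(scores):
    """Softmax over per-gene scores (a stand-in for attention weights)."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def mutate_with_silencing(genome, probs, bounds, rng):
    """Resample each expressed gene uniformly in its bounds; silenced genes are copied unchanged."""
    child = list(genome)
    for i, (p, (lo, hi)) in enumerate(zip(probs, bounds)):
        if rng.random() < p:  # gene expressed: eligible for mutation
            child[i] = rng.uniform(lo, hi)
        # otherwise the gene is silenced and keeps its parent value
    return child


# Hypothetical genome: [cloudiness, precipitation, fog density, sun altitude]
genome = [20.0, 0.0, 10.0, 45.0]
bounds = [(0, 100), (0, 100), (0, 100), (-90, 90)]
probs = expression_probabilities([0.2, 1.5, 0.9, -0.3])
child = mutate_with_silencing(genome, probs, bounds, random.Random(42))
```

Because these softmax probabilities sum to one, roughly one gene is expressed per mutation on average; the equal-probability baseline corresponds to giving every gene the same value of `p`.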
With the advent of power meters that allow cyclists to precisely track their power output throughout a race, devising optimal power output strategies has become increasingly important in competitive cycling. To do so, the track, the weather, and the individual cyclist's abilities must all be considered. We propose differential equation models of fatigue and kinematics to simulate the performance of such strategies, and an innovative optimization algorithm to find the optimal strategy. Our fatigue model translates a cyclist's power curve (obtained by fitting the Omni-Power Duration Model to power curve data) into a differential equation that captures which power output strategies are feasible. Our kinematics model calculates the forces on the rider and, given a power output, models the cyclist's velocity and position via a system of differential equations. Using track data, including the slope of the track and the wind velocity, the model accurately computes race times for a given power output strategy on the exact track being raced. To make power strategy optimization computationally tractable, we split the track into segments based on changes in slope and discretize the power output levels. As the space of possible strategies is large, we vectorize the differential equation model for efficient numerical integration of many simulations at once and develop a parallelized Tree Exploration with Monte-Carlo Evaluation algorithm. The algorithm is efficient, running in $O(ab\sqrt{n})$ time and $O(n)$ space, where $n$ is the number of simulations performed for each choice, $a$ is the number of segments, and $b$ is the number of discrete power output levels. We present results of this optimization for several different tracks and athletes. As an example, the model's time for Filippo Ganna in Tokyo 2020 differs from his real time by just 18%, supporting our model's efficacy.
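The kinematics side can be sketched as Newton's second law along the road, $m\,\dot v = P/v - \tfrac12 \rho C_dA\,(v+v_w)\,|v+v_w| - m g \sin\theta - C_{rr} m g \cos\theta$, where $v_w$ is the headwind and $\theta$ the slope. For fatigue, the sketch swaps in the simpler critical-power/$W'$-balance model (depletion above the critical power $CP$, recovery proportional to the remaining deficit below it) in place of the Omni-Power-Duration-based model described above; a strategy is flagged infeasible when the balance drops below zero. All parameter values are hypothetical, and the simple Euler-type integration is chosen for brevity rather than accuracy.

```python
import math

G = 9.81     # gravitational acceleration [m/s^2]
RHO = 1.225  # air density [kg/m^3]


def simulate_segment(power_w, length_m, slope_rad, headwind_ms,
                     mass_kg=80.0, cda_m2=0.2, crr=0.003,
                     cp_w=400.0, w_prime_j=20000.0, v0=1.0, dt=0.01):
    """Ride one segment at constant power.

    Returns (time_s, w_balance_j), or (None, w_balance_j) if the strategy
    is infeasible (W' exhausted) or the rider stalls.
    """
    x, v, t = 0.0, v0, 0.0
    w_bal = w_prime_j
    while x < length_m:
        air_speed = v + headwind_ms
        f_aero = 0.5 * RHO * cda_m2 * air_speed * abs(air_speed)
        f_grav = mass_kg * G * math.sin(slope_rad)
        f_roll = crr * mass_kg * G * math.cos(slope_rad)
        f_prop = power_w / max(v, 0.5)  # avoid dividing by ~0 near standstill
        a = (f_prop - f_aero - f_grav - f_roll) / mass_kg
        # W' balance: deplete above CP, recover toward W' below CP.
        if power_w > cp_w:
            w_bal -= (power_w - cp_w) * dt
        else:
            w_bal += (cp_w - power_w) * (w_prime_j - w_bal) / w_prime_j * dt
        if w_bal < 0.0:
            return None, w_bal  # strategy not sustainable
        v += a * dt
        if v <= 0.0:
            return None, w_bal  # rider stalled on the segment
        x += v * dt
        t += dt
    return t, w_bal
```

Chaining such calls segment by segment, with each segment's power taken from a discrete set of levels, gives the kind of strategy evaluation that the tree search explores.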
Adept traffic models are critical to both planning and closed-loop simulation for autonomous vehicles (AVs), and key design objectives include accuracy, diverse multimodal behaviors, interpretability, and downstream compatibility. Recently, with the advent of large language models (LLMs), LLM compatibility has become an additional desirable feature for traffic models. We present the Categorical Traffic Transformer (CTT), a traffic model that outputs both continuous trajectory predictions and tokenized categorical predictions (lane modes, homotopies, etc.). The key feature of CTT is its fully interpretable latent space, which enables direct supervision of the latent variable from the ground truth during training and completely avoids mode collapse. As a result, CTT can generate diverse behaviors conditioned on different latent modes with semantic meanings while outperforming the state of the art in prediction accuracy. In addition, CTT's ability to input and output tokens enables integration with LLMs for common-sense reasoning and zero-shot generalization.
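Direct supervision of an interpretable latent can be sketched as a two-term loss: a classification loss on the labeled mode (for example, the lane the agent actually took) plus a regression loss applied only to the trajectory decoded from that ground-truth mode. This contrasts with winner-take-all training, where the mode closest to the ground truth is selected after the fact. The function names, the cross-entropy/L2 choice, and the equal weighting are assumptions for illustration, not CTT's exact losses.

```python
import math


def cross_entropy(logits, target):
    """Negative log-softmax probability of the target class."""
    m = max(logits)
    log_z = m + math.log(sum(math.exp(l - m) for l in logits))
    return log_z - logits[target]


def l2(traj_a, traj_b):
    """Mean squared distance between two equal-length 2-D trajectories."""
    return sum((ax - bx) ** 2 + (ay - by) ** 2
               for (ax, ay), (bx, by) in zip(traj_a, traj_b)) / len(traj_a)


def supervised_latent_loss(mode_logits, trajs_per_mode, gt_mode, gt_traj):
    """Loss when the latent mode has a ground-truth label."""
    # The categorical head is trained directly on the labeled mode ...
    cls_loss = cross_entropy(mode_logits, gt_mode)
    # ... and only the trajectory decoded from the ground-truth mode is
    # regressed, so each mode stays tied to its labeled meaning.
    reg_loss = l2(trajs_per_mode[gt_mode], gt_traj)
    return cls_loss + reg_loss
```

Because every mode receives gradient only through its own labeled examples, no single mode can absorb all the training signal.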
As the capabilities of generative large models advance, safety concerns about their outputs become more pronounced. To ensure the sustainable growth of the AI ecosystem, it is imperative to undertake a holistic evaluation and mitigation of the associated safety risks. This survey presents a framework for safety research on large models, delineating the landscape of safety risks as well as safety evaluation and improvement methods. We begin by introducing safety issues of wide concern, then delve into safety evaluation methods for large models, encompassing preference-based testing, adversarial attack approaches, issue detection, and other advanced evaluation methods. Additionally, we explore strategies for enhancing large model safety from training to deployment, highlighting cutting-edge safety approaches for each stage in building large models. Finally, we discuss the core challenges in advancing toward more responsible AI, including the interpretability of safety mechanisms, ongoing safety issues, and robustness against malicious attacks. Through this survey, we aim to provide clear technical guidance for safety researchers and encourage further study on the safety of large models.
We present Emu, a system that semantically enhances multilingual sentence embeddings. Our framework fine-tunes pre-trained multilingual sentence embeddings using two main components: a semantic classifier and a language discriminator. The semantic classifier improves the semantic similarity of related sentences, whereas the language discriminator enhances the multilinguality of the embeddings via multilingual adversarial training. Our experimental results based on several language pairs show that our specialized embeddings outperform the state-of-the-art multilingual sentence embedding model on the task of cross-lingual intent classification using only monolingual labeled data.
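The interplay of the two components can be sketched as a min-max objective: the language discriminator minimizes its language-classification loss, while the encoder minimizes the semantic (intent) classification loss and simultaneously maximizes the discriminator's loss, pushing embeddings of different languages to become indistinguishable. The cross-entropy losses, the weight `lam`, and the function names below are assumptions for illustration; Emu's exact loss functions may differ.

```python
import math


def cross_entropy(logits, target):
    """Negative log-softmax probability of the target class."""
    m = max(logits)
    return m + math.log(sum(math.exp(l - m) for l in logits)) - logits[target]


def emu_style_losses(intent_logits, intent_label, lang_logits, lang_label, lam=0.1):
    """Return (encoder_loss, discriminator_loss) for one sentence embedding."""
    l_sem = cross_entropy(intent_logits, intent_label)  # semantic classifier
    l_disc = cross_entropy(lang_logits, lang_label)     # language discriminator
    encoder_loss = l_sem - lam * l_disc  # keep semantics, hide the language
    discriminator_loss = l_disc          # identify the language
    return encoder_loss, discriminator_loss
```

In training, the two losses would be minimized alternately (or jointly through a gradient-reversal layer), each with respect to its own parameters.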