Although robots are appearing in more areas of our lives, they still make errors. One common cause of failure is the robot's perception module misdetecting objects. Allowing users to correct such errors can improve the interaction and prevent the same errors in the future. We therefore investigate the effectiveness of a virtual reality (VR) framework for correcting perception errors of a Franka Panda robot. We conducted a user study with 56 participants who interacted with the robot using both VR and screen interfaces. Participants learned to collaborate with the robot faster with the VR interface than with the screen interface. They also found the VR interface more immersive and enjoyable, and expressed a preference for using it again. These findings suggest that VR interfaces may offer advantages over screen interfaces for human-robot interaction in error-prone settings.
Parallel robots (PRs) offer the potential for safe human-robot collaboration because of their low moving masses. However, their parallel kinematic chains increase the risk of contact in the form of collisions and clamping at a chain. This work investigates safety through various contact reactions on a real planar PR. External forces are estimated from proprioceptive information and a dynamics model, which allows contact detection. Retraction along the estimated line of action provides an instantaneous response that limits the contact forces in the experiments to 70N at a maximum velocity of 0.4m/s. Reducing the stiffness of a Cartesian impedance controller is investigated as a further strategy. For clamping, a feedforward neural network (FNN) is trained and tested in different joint-angle configurations to classify whether a collision or a clamping contact occurs, with an accuracy of 80%. A second FNN classifies the clamped kinematic chain, enabling a subsequent kinematic projection of the clamping joint angle onto the rotational platform coordinates; in this way, a structure opening is performed in addition to the softer retraction movement. The reaction strategies are compared in real-world experiments at different velocities and controller stiffnesses to demonstrate their effectiveness. In all collision and clamping experiments, the PR terminates the contact in less than 130ms.
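The instantaneous reaction described above, retracting along the estimated line of action once a contact is detected, can be illustrated with a minimal sketch (not the authors' code). The detection threshold, retraction speed, and the convention of moving along the estimated force direction are assumptions for illustration only.

```python
# Minimal sketch of detect-and-retract from an estimated external force.
import numpy as np

F_DETECT = 5.0        # assumed contact-detection threshold in N
RETRACT_SPEED = 0.1   # assumed retraction speed in m/s

def reaction_velocity(f_ext_hat: np.ndarray) -> np.ndarray:
    """Return a Cartesian retraction velocity given an estimated external force.

    f_ext_hat: external force on the platform estimated from proprioceptive
    measurements and a dynamics model (a placeholder for such an observer).
    """
    magnitude = np.linalg.norm(f_ext_hat)
    if magnitude < F_DETECT:                 # no contact detected
        return np.zeros_like(f_ext_hat)
    line_of_action = f_ext_hat / magnitude   # unit vector of the estimated line of action
    # Yield to the contact by moving along the estimated force direction.
    return RETRACT_SPEED * line_of_action
```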
Parallel robots (PRs) allow higher speeds in human-robot collaboration due to their lower moving masses, but they are more prone to unintended contact. For a safe reaction, knowledge of the location and force of a collision is useful. This work presents a novel algorithm for collision isolation and identification on a real PR using proprioceptive information. To classify the collided body, the effects of contact forces at the links and the platform of the PR are analyzed using a kinetostatic projection. This insight enables the derivation of features from the line of action of the estimated external force, whose significance is confirmed in experiments for various load cases. A feedforward neural network (FNN) classifies the collided body based on these physically modeled features. The FNN generalizes to 300k load cases over the whole robot structure in other joint-angle configurations, reaching a collision-body classification accuracy of 84% in the experiments. Platform collisions are isolated and identified with an explicit solution, while a particle filter estimates the location and force of a contact on a kinematic chain. Updating the particle filter with estimated external joint torques leads to an isolation error of less than 3cm and an identification error of 4N in a real-world experiment.
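The particle-filter update mentioned above can be sketched as follows; this illustrates the general idea, not the authors' implementation. The location parameter s along a chain, the force hypotheses, the noise parameters, and the contact_jacobian callback (returning the contact-point Jacobian for a location s) are all assumptions.

```python
# Hedged sketch: particles are candidate contact locations s with force
# hypotheses f; weights measure how well tau = J_c(s)^T f matches the
# estimated external joint torques from the observer.
import numpy as np

def particle_filter_update(locations, forces, tau_ext_hat, contact_jacobian,
                           sigma_tau=0.5, sigma_s=0.01):
    rng = np.random.default_rng()
    # Motion model: diffuse the location hypotheses slightly along the chain.
    locations = locations + rng.normal(0.0, sigma_s, size=locations.shape)
    weights = np.empty(len(locations))
    for i, (s, f) in enumerate(zip(locations, forces)):
        tau_pred = contact_jacobian(s).T @ f      # torques a contact at s with force f would cause
        err = tau_ext_hat - tau_pred
        weights[i] = np.exp(-0.5 * (err @ err) / sigma_tau**2)
    weights = (weights + 1e-300) / (weights + 1e-300).sum()
    # Resample so hypotheses concentrate where the torque residual is best explained.
    idx = rng.choice(len(locations), size=len(locations), p=weights)
    return locations[idx], forces[idx]
```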
Parallel robots can be leveraged for human-robot collaboration (HRC) because their reduced moving masses yield low collision energies even at high speeds. However, the risk of unintended contact with the leg chains is higher than for serial robot structures. As a first step towards HRC, contact cases over the whole parallel robot structure are investigated, and a disturbance observer based on generalized momenta and motor-current measurements is applied. In addition, a Kalman filter and a second-order sliding-mode observer based on generalized momenta are compared in terms of estimation error and detection time. Gearless direct drives with low friction improve external force estimation and enable low impedance. The experimental validation uses two force-torque sensors and a kinetostatic model, which also allows a new method for identifying the motor torque constant of an assembled parallel robot so that external forces can be estimated from the motor current via a dynamics model. A Cartesian impedance control scheme for compliant robot-environment dynamics with stiffnesses of 0.1-2N/mm and the observation of low forces over the entire structure are validated. The observers detect collisions and clamping at velocities of 0.4-0.9m/s within 9-58ms and trigger a reaction in the form of a zero-g mode.
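For reference, the generalized-momentum disturbance observer named above has a standard first-order form, sketched below in its textbook version (not the paper's implementation). The observer gain and the model callbacks M, C, g are assumptions, and the sketch assumes the robot starts at rest so the initial momentum is zero.

```python
# Standard first-order generalized-momentum observer:
# residual r tracks the external joint torque tau_ext.
import numpy as np

class MomentumObserver:
    def __init__(self, n_joints, gain=25.0):
        self.K = gain * np.eye(n_joints)   # observer gain (assumed value)
        self.integral = np.zeros(n_joints) # integral of the known momentum dynamics
        self.r = np.zeros(n_joints)        # estimated external joint torque

    def step(self, q, qd, tau_motor, M, C, g, dt):
        p = M(q) @ qd                      # generalized momentum p = M(q) qd
        # dp/dt = tau_motor + C(q,qd)^T qd - g(q) + tau_ext, so integrate the known part
        self.integral += (tau_motor + C(q, qd).T @ qd - g(q) + self.r) * dt
        self.r = self.K @ (p - self.integral)   # assumes p(0) = 0 (robot starts at rest)
        return self.r
```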
Role-playing chatbots built on large language models have drawn interest, but better techniques are needed for mimicking specific fictional characters. We propose an algorithm that controls language models via an improved prompt and memories of the character extracted from scripts. We construct ChatHaruhi, a dataset covering 32 Chinese/English TV and anime characters with over 54k simulated dialogues. Both automatic and human evaluations show that our approach improves role-playing ability over baselines. Code and data are available at //github.com/LC1332/Chat-Haruhi-Suzumiya .
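The prompt-plus-memories idea above is essentially retrieval-augmented prompting, sketched below under assumptions: the embedding function, similarity measure, and prompt wording are placeholders, not the ChatHaruhi implementation.

```python
# Illustrative sketch: retrieve the script excerpts most similar to the user
# query and place them in the prompt as the character's "memories".
import numpy as np

def build_prompt(system_prompt, query, memory_texts, memory_embs, embed, k=3):
    q = embed(query)                                    # any sentence-embedding model
    sims = memory_embs @ q / (np.linalg.norm(memory_embs, axis=1) * np.linalg.norm(q) + 1e-9)
    top = np.argsort(-sims)[:k]                         # k most similar script excerpts
    memories = "\n".join(memory_texts[i] for i in top)
    return (f"{system_prompt}\n"
            f"Classic dialogue excerpts from the scripts:\n{memories}\n"
            f"User: {query}\nCharacter:")
```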
The reconfigurable intelligent surface (RIS) is a promising technology that enables wireless communication systems to achieve improved performance by intelligently manipulating wireless channels. In this paper, we consider the sum-rate maximization problem in a downlink multi-user multi-input-single-output (MISO) channel via space-division multiple access (SDMA). Two major challenges of this problem are the high dimensionality due to the large number of RIS elements and the difficulty of obtaining full channel state information (CSI), which many algorithms in the literature assume to be known. Instead, we propose a hybrid machine learning approach that uses the weighted minimum mean squared error (WMMSE) precoder at the base station (BS) and a dedicated neural network (NN) architecture, RISnet, for the RIS configuration. RISnet scales well, optimizing 1296 RIS elements while requiring partial CSI of only 16 RIS elements as input. It achieves high performance with low channel-estimation requirements on geometric channel models obtained with ray-tracing simulation. Because training is unsupervised, RISnet finds an optimized RIS configuration by itself. Numerical results show that the trained model configures the RIS with low computational effort, considerably outperforms the baselines, and can work with discrete phase shifts.
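The unsupervised objective behind such training is typically the negative sum rate computed from the effective channel induced by the predicted phase shifts. The sketch below illustrates that idea for a single channel realization under simplifying assumptions (no direct BS-user link, a given precoder such as a WMMSE output, assumed tensor shapes); it is not the RISnet code.

```python
# Conceptual unsupervised loss: negative sum rate of the RIS-assisted channel.
import torch

def sum_rate_loss(phases, H_bs_ris, H_ris_ue, precoder, noise_power=1e-3):
    """phases: (N_ris,) real; H_bs_ris: (N_ris, N_bs); H_ris_ue: (U, N_ris); precoder: (N_bs, U)."""
    theta = torch.diag_embed(torch.polar(torch.ones_like(phases), phases))  # RIS reflection matrix
    H_eff = H_ris_ue @ theta @ H_bs_ris                                     # effective BS-to-user channels
    signal = torch.abs(torch.einsum('uk,ku->u', H_eff, precoder)) ** 2      # intended-stream powers
    total = torch.abs(H_eff @ precoder) ** 2                                # (U, U) received powers
    interference = total.sum(dim=1) - signal
    rates = torch.log2(1 + signal / (interference + noise_power))
    return -rates.sum()                                                     # minimize => maximize sum rate
```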
Safety is the primary priority of autonomous driving. Nevertheless, no published dataset currently supports direct and explainable safety evaluation for autonomous driving. In this work, we propose DeepAccident, a large-scale dataset generated with a realistic simulator that contains diverse accident scenarios which frequently occur in real-world driving. The DeepAccident dataset includes 57K annotated frames and 285K annotated samples, approximately 7 times more than the large-scale nuScenes dataset with 40k annotated samples. In addition, we propose a new task, end-to-end motion and accident prediction, which can be used to directly evaluate the accident-prediction ability of different autonomous driving algorithms. Furthermore, for each scenario we record data from four vehicles and one infrastructure unit, providing diverse viewpoints on accident scenarios and enabling V2X (vehicle-to-everything) research on perception and prediction tasks. Finally, we present a baseline V2X model named V2XFormer that outperforms the single-vehicle model on motion and accident prediction and 3D object detection.
Our goal is for robots to follow natural language instructions like "put the towel next to the microwave." But collecting large amounts of labeled data, i.e., data containing demonstrations of tasks labeled with language instructions, is prohibitive. In contrast, obtaining policies that respond to image goals is much easier, because any autonomous trial or demonstration can be labeled in hindsight with its final state as the goal. In this work, we contribute a method that taps into joint image- and goal-conditioned policies with language, using only a small amount of language data. Prior work has made progress on this using vision-language models or by jointly training language-goal-conditioned policies, but so far neither approach has scaled effectively to real-world robot tasks without significant human annotation. Our method achieves robust performance in the real world by learning an embedding from the labeled data that aligns language not to the goal image, but to the desired change between the start and goal images that the instruction corresponds to. We then train a policy on this embedding: the policy benefits from all the unlabeled data, while the aligned embedding provides an interface for language to steer the policy. We show instruction following across a variety of manipulation tasks in different scenes, with generalization to language instructions outside the labeled data. Videos and code for our approach can be found on our website: //rail-berkeley.github.io/grif/ .
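The alignment idea, matching language to the change from start to goal rather than to the goal image alone, can be sketched with a CLIP-style contrastive loss. This is a simplified illustration, not the released code: representing the transition as a difference of image embeddings and the specific loss form are assumptions.

```python
# Minimal sketch of aligning language to the start->goal change with a
# symmetric contrastive (InfoNCE) objective over a batch.
import torch
import torch.nn.functional as F

def alignment_loss(lang_emb, start_img_emb, goal_img_emb, temperature=0.07):
    # Represent each task as the transition between start and goal observations
    # (here simplified as an embedding difference).
    transition = F.normalize(goal_img_emb - start_img_emb, dim=-1)
    lang = F.normalize(lang_emb, dim=-1)
    logits = lang @ transition.T / temperature        # batch x batch similarities
    labels = torch.arange(len(logits), device=logits.device)
    # Matching pairs lie on the diagonal; contrast in both directions.
    return 0.5 * (F.cross_entropy(logits, labels) + F.cross_entropy(logits.T, labels))
```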
Grammatical error correction aims to correct ungrammatical sentences automatically. Recently, some work has demonstrated the excellent capabilities of closed-source large language models (LLMs, e.g., ChatGPT) in grammatical error correction. However, the potential of open-source LLMs remains unexplored. In this paper, we introduce GrammarGPT, an open-source LLM, to preliminarily explore its potential for native Chinese grammatical error correction (CGEC). The core recipe of GrammarGPT is to leverage a hybrid dataset of ChatGPT-generated and human-annotated data. For grammatical errors with clues, we propose a heuristic method that guides ChatGPT to generate ungrammatical sentences by providing those clues. For grammatical errors without clues, we collect ungrammatical sentences from publicly available websites and manually correct them. In addition, we employ an error-invariant augmentation method to enhance the model's ability to correct native Chinese grammatical errors. We ultimately construct about 1k parallel sentence pairs and use them to fine-tune open-source LLMs (e.g., Phoenix, released by The Chinese University of Hong Kong, Shenzhen) with instruction tuning. The experimental results show that GrammarGPT significantly outperforms the existing SOTA system. Although its parameter count is 20x larger than that of the SOTA baseline, the amount of data required for instruction tuning is 1200x smaller, illustrating the potential of open-source LLMs for native CGEC. Our GrammarGPT ranks $3^{rd}$ in NLPCC2023 SharedTask1, demonstrating our approach's effectiveness. The code and data are available at \url{//github.com/FreedomIntelligence/GrammarGPT}.
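As an illustration of how such parallel pairs could be turned into instruction-tuning data, a simple formatting step is sketched below; the instruction wording and field names are assumptions, not the dataset's actual template.

```python
# Hypothetical formatting of (ungrammatical, corrected) pairs for instruction tuning.
def to_instruction_example(ungrammatical: str, corrected: str) -> dict:
    return {
        "instruction": "Correct the grammatical errors in the following Chinese sentence.",
        "input": ungrammatical,
        "output": corrected,
    }

# Usage with placeholder strings standing in for real parallel data.
pairs = [("<ungrammatical sentence>", "<corrected sentence>")]
dataset = [to_instruction_example(u, c) for u, c in pairs]
```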
Robotic grasping is a fundamental skill required for object manipulation. Multi-fingered robotic hands, which mimic the structure of the human hand, can potentially perform complex object manipulation. Nevertheless, current techniques for multi-fingered robotic grasping frequently predict only a single grasp per inference, i.e. a unimodal grasp distribution, which limits computational efficiency and versatility. This paper proposes a differentiable multi-fingered grasp generation network (DMFC-GraspNet) with three main contributions to address this challenge. First, a novel neural grasp planner is proposed that predicts a new grasp representation to enable versatile and dense grasp predictions. Second, a scene creation and label mapping method is developed for dense labeling of multi-fingered robotic hands, which allows a dense association with ground-truth grasps. Third, we propose to train DMFC-GraspNet end-to-end using a forward-backward automatic differentiation approach with a supervised loss, a differentiable collision loss, and a generalized $Q_1$ grasp metric loss. The proposed approach is evaluated with the Shadow Dexterous Hand in MuJoCo simulation and ablated over different choices of loss functions. The results demonstrate the effectiveness of the proposed approach in predicting versatile and dense grasps and in advancing the field of multi-fingered robotic grasping.
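The three-term objective named above can be sketched as a weighted sum; this is a conceptual illustration only. The collision_penalty and grasp_quality_q1 callables stand in for differentiable implementations, and the supervised term and weights are assumptions, not the paper's exact formulation.

```python
# Conceptual combined loss: supervised grasp regression + differentiable
# collision penalty + grasp-quality (Q1-style) term.
import torch
import torch.nn.functional as F

def grasp_loss(pred_grasp, gt_grasp, scene, collision_penalty, grasp_quality_q1,
               w_sup=1.0, w_col=0.1, w_q1=0.1):
    l_sup = F.mse_loss(pred_grasp, gt_grasp)          # supervised term against dense labels
    l_col = collision_penalty(pred_grasp, scene)      # penalize hand-scene interpenetration
    l_q1 = -grasp_quality_q1(pred_grasp, scene)       # reward high grasp quality
    return w_sup * l_sup + w_col * l_col + w_q1 * l_q1
```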
Hyperspectral sensors have enjoyed widespread use in remote sensing; however, they must be adapted to a format in which they can be operated onboard mobile robots. In this work, we introduce a first-of-its-kind system architecture with snapshot hyperspectral cameras and point spectrometers that efficiently generates composite datacubes from a robotic base. Our system collects and registers datacubes spanning the visible to shortwave infrared (660-1700 nm) spectrum while simultaneously capturing the ambient solar spectrum reflected off a white reference tile. We collect and disseminate a dataset of more than 500 labeled datacubes from on-road and off-road terrain, compliant with the ATLAS ontology, to further the integration of hyperspectral imaging (HSI) and to demonstrate its benefit for terrain class separability. Our analysis of this data shows that HSI offers a significant opportunity to improve understanding of scene composition from a robot-centric perspective. All code and data are open source online: //river-lab.github.io/hyper_drive_data
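Capturing the ambient solar spectrum off a white reference tile implies the standard white-reference correction from radiance to approximate reflectance; a minimal sketch follows. The array shapes are assumptions, and dark-current handling and band registration are omitted; this is not the released pipeline code.

```python
# Small sketch of white-reference reflectance correction for a datacube.
import numpy as np

def to_reflectance(datacube: np.ndarray, white_ref: np.ndarray, dark=None) -> np.ndarray:
    """datacube: (rows, cols, bands) radiance; white_ref: (bands,) reference spectrum;
    dark: optional (rows, cols, bands) dark-current frame."""
    if dark is not None:
        datacube = datacube - dark
        white_ref = white_ref - dark.mean(axis=(0, 1))
    # Divide each pixel spectrum by the reference spectrum, band by band.
    return np.clip(datacube / (white_ref + 1e-9), 0.0, None)
```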