We present the RBO Hand 3, a highly capable and versatile anthropomorphic soft hand based on pneumatic actuation. The RBO Hand 3 is designed to enable dexterous manipulation, to facilitate transfer of insights about human dexterity, and to serve as a robust research platform for extensive real-world experiments. It achieves these design goals by combining many degrees of actuation with intrinsic compliance, replicating relevant functioning of the human hand, and by combining robust components in a modular design. The RBO Hand 3 possesses 16 independent degrees of actuation, implemented in a dexterous opposable thumb, two-chambered fingers, an actuated palm, and the ability to spread the fingers. In this work, we derive the design objectives that are based on experimentation with the hand's predecessors, observations about human grasping, and insights about principles of dexterity. We explain in detail how the design features of the RBO Hand 3 achieve these goals and evaluate the hand by demonstrating its ability to achieve the highest possible score in the Kapandji test for thumb opposition, to realize all 33 grasp types of the comprehensive GRASP taxonomy, to replicate common human grasping strategies, and to perform dexterous in-hand manipulation.
Ground Penetrating Radar (GPR) is a very useful non-destructive evaluation (NDE) device for locating and mapping underground assets prior to digging and trenching efforts in construction. This paper presents a novel robotic system to automate the GPR data collection process, localize the underground utilities, interpret and reconstruct the underground objects for better visualization allowing regular non-professional users to understand the survey results. This system is composed of three modules: 1) an Omni-directional robotic data collection platform, that carries an RGB-D camera with an Inertial Measurement Unit (IMU) and a GPR antenna to perform automatic GPR data collection, and tag each GPR measurement with visual positioning information at every sampling step; 2) a learning-based migration module to interpret the raw GPR B-scan image into a 2D cross-section model of objects; 3) a 3D reconstruction module, i.e., GPRNet, to generate underground utility model represented as fine 3D point cloud. Comparative studies are performed on synthetic data and field GPR raw data with various incompleteness and noise. Experimental results demonstrate that our proposed method achieves a $30.0\%$ higher GPR imaging accuracy in mean Intersection Over Union (IoU) than the conventional back projection (BP) migration approach and $6.9\%$-$7.2\%$ less loss in Chamfer Distance (CD) than baseline methods regarding point cloud model reconstruction. The GPR-based robotic inspection provides an effective tool for civil engineers to detect and survey underground utilities before construction.
We present a framework for gesture customization requiring minimal examples from users, all without degrading the performance of existing gesture sets. To achieve this, we first deployed a large-scale study (N=500+) to collect data and train an accelerometer-gyroscope recognition model with a cross-user accuracy of 95.7% and a false-positive rate of 0.6 per hour when tested on everyday non-gesture data. Next, we design a few-shot learning framework which derives a lightweight model from our pre-trained model, enabling knowledge transfer without performance degradation. We validate our approach through a user study (N=20) examining on-device customization from 12 new gestures, resulting in an average accuracy of 55.3%, 83.1%, and 87.2% on using one, three, or five shots when adding a new gesture, while maintaining the same recognition accuracy and false-positive rate from the pre-existing gesture set. We further evaluate the usability of our real-time implementation with a user experience study (N=20). Our results highlight the effectiveness, learnability, and usability of our customization framework. Our approach paves the way for a future where users are no longer bound to pre-existing gestures, freeing them to creatively introduce new gestures tailored to their preferences and abilities.
We study how visual representations pre-trained on diverse human video data can enable data-efficient learning of downstream robotic manipulation tasks. Concretely, we pre-train a visual representation using the Ego4D human video dataset using a combination of time-contrastive learning, video-language alignment, and an L1 penalty to encourage sparse and compact representations. The resulting representation, R3M, can be used as a frozen perception module for downstream policy learning. Across a suite of 12 simulated robot manipulation tasks, we find that R3M improves task success by over 20% compared to training from scratch and by over 10% compared to state-of-the-art visual representations like CLIP and MoCo. Furthermore, R3M enables a Franka Emika Panda arm to learn a range of manipulation tasks in a real, cluttered apartment given just 20 demonstrations. Code and pre-trained models are available at //tinyurl.com/robotr3m.
We design and implement LEGOStore, an erasure coding (EC) based linearizable data store over geo-distributed public cloud data centers (DCs). For such a data store, the confluence of the following factors opens up opportunities for EC to be latency-competitive with replication: (a) the necessity of communicating with remote DCs to tolerate entire DC failures and implement linearizability; and (b) the emergence of DCs near most large population centers. LEGOStore employs an optimization framework that, for a given object, carefully chooses among replication and EC, as well as among various DC placements to minimize overall costs. To handle workload dynamism, LEGOStore employs a novel agile reconfiguration protocol. Our evaluation using a LEGOStore prototype spanning 9 Google Cloud Platform DCs demonstrates the efficacy of our ideas. We observe cost savings ranging from moderate (5-20\%) to significant (60\%) over baselines representing the state of the art while meeting tail latency SLOs. Our reconfiguration protocol is able to transition key placements in 3 to 4 inter-DC RTTs ($<$ 1s in our experiments), allowing for agile adaptation to dynamic conditions.
Despite the remarkable performance that modern deep neural networks have achieved on independent and identically distributed (I.I.D.) data, they can crash under distribution shifts. Most current evaluation methods for domain generalization (DG) adopt the leave-one-out strategy as a compromise on the limited number of domains. We propose a large-scale benchmark with extensive labeled domains named NICO++{\ddag} along with more rational evaluation methods for comprehensively evaluating DG algorithms. To evaluate DG datasets, we propose two metrics to quantify covariate shift and concept shift, respectively. Two novel generalization bounds from the perspective of data construction are proposed to prove that limited concept shift and significant covariate shift favor the evaluation capability for generalization. Through extensive experiments, NICO++ shows its superior evaluation capability compared with current DG datasets and its contribution in alleviating unfairness caused by the leak of oracle knowledge in model selection.
As technology advances, the need for safe, efficient, and collaborative human-robot-teams has become increasingly important. One of the most fundamental collaborative tasks in any setting is the object handover. Human-to-robot handovers can take either of two approaches: (1) direct hand-to-hand or (2) indirect hand-to-placement-to-pick-up. The latter approach ensures minimal contact between the human and robot but can also result in increased idle time due to having to wait for the object to first be placed down on a surface. To minimize such idle time, the robot must preemptively predict the human intent of where the object will be placed. Furthermore, for the robot to preemptively act in any sort of productive manner, predictions and motion planning must occur in real-time. We introduce a novel prediction-planning pipeline that allows the robot to preemptively move towards the human agent's intended placement location using gaze and gestures as model inputs. In this paper, we investigate the performance and drawbacks of our early intent predictor-planner as well as the practical benefits of using such a pipeline through a human-robot case study.
The design and fabrication of soft robot hands is still a time-consuming and difficult process. Advances in rapid prototyping have accelerated the fabrication process significantly while introducing new complexities into the design process. In this work, we present an approach that utilizes novel low-cost fabrication techniques in conjunction with design tools helping soft hand designers to systematically take advantage of multi-material 3D printing to create dexterous soft robotic hands. While very low cost and lightweight, we show that generated designs are highly durable, surprisingly strong, and capable of dexterous grasping.
Imitation learning is a promising approach to help robots acquire dexterous manipulation capabilities without the need for a carefully-designed reward or a significant computational effort. However, existing imitation learning approaches require sophisticated data collection infrastructure and struggle to generalize beyond the training distribution. One way to address this limitation is to gather additional data that better represents the full operating conditions. In this work, we investigate characteristics of such additional demonstrations and their impact on performance. Specifically, we study the effects of corrective and randomly-sampled additional demonstrations on learning a policy that guides a five-fingered robot hand through a pick-and-place task. Our results suggest that corrective demonstrations considerably outperform randomly-sampled demonstrations, when the proportion of additional demonstrations sampled from the full task distribution is larger than the number of original demonstrations sampled from a restrictive training distribution. Conversely, when the number of original demonstrations are higher than that of additional demonstrations, we find no significant differences between corrective and randomly-sampled additional demonstrations. These results provide insights into the inherent trade-off between the effort required to collect corrective demonstrations and their relative benefits over randomly-sampled demonstrations. Additionally, we show that inexpensive vision-based sensors, such as LeapMotion, can be used to dramatically reduce the cost of providing demonstrations for dexterous manipulation tasks. Our code is available at //github.com/GT-STAR-Lab/corrective-demos-dexterous-manipulation.
Soft robots are promising for manipulation tasks thanks to their compliance, safety, and high degree of freedom. However, the commonly used bidirectional continuum segment design means soft robotic manipulators only function in a limited hemispherical workspace. This work increases a soft robotic arm's workspace by designing, fabricating, and controlling an additional soft prismatic actuator at the base of the soft arm. This actuator consists of pneumatic artificial muscles and a piston, making the actuator back-driveable. We increase the task space volume by 116\%, and we are now able to perform manipulation tasks that were previously impossible for soft robots, such as picking and placing objects at different positions on a surface and grabbing an object out of a container. By combining a soft robotic arm with a prismatic joint, we greatly increase the usability of soft robots for object manipulation. This work promotes the use of integrated and modular soft robotic systems for practical manipulation applications in human-centered environments.
Multi-camera vehicle tracking is one of the most complicated tasks in Computer Vision as it involves distinct tasks including Vehicle Detection, Tracking, and Re-identification. Despite the challenges, multi-camera vehicle tracking has immense potential in transportation applications including speed, volume, origin-destination (O-D), and routing data generation. Several recent works have addressed the multi-camera tracking problem. However, most of the effort has gone towards improving accuracy on high-quality benchmark datasets while disregarding lower camera resolutions, compression artifacts and the overwhelming amount of computational power and time needed to carry out this task on its edge and thus making it prohibitive for large-scale and real-time deployment. Therefore, in this work we shed light on practical issues that should be addressed for the design of a multi-camera tracking system to provide actionable and timely insights. Moreover, we propose a real-time city-scale multi-camera vehicle tracking system that compares favorably to computationally intensive alternatives and handles real-world, low-resolution CCTV instead of idealized and curated video streams. To show its effectiveness, in addition to integration into the Regional Integrated Transportation Information System (RITIS), we participated in the 2021 NVIDIA AI City multi-camera tracking challenge and our method is ranked among the top five performers on the public leaderboard.