Enabling robots to work in close proximity to humans necessitates a control framework that does not only incorporate multi-sensory information for autonomous and coordinated interactions but also has perceptive task planning to ensure an adaptable and flexible collaborative behaviour. In this research, an intuitive stack-of-tasks (iSoT) formulation is proposed, that defines the robot's actions by considering the human-arm postures and the task progression. The framework is augmented with visuo-tactile information to effectively perceive the collaborative environment and intuitively switch between the planned sub-tasks. The visual feedback from depth cameras monitors and estimates the objects' poses and human-arm postures, while the tactile data provides the exploration skills to detect and maintain the desired contacts to avoid object slippage. To evaluate the performance, effectiveness and usability of the proposed framework, assembly and disassembly tasks, performed by the human-human and human-robot partners, are considered and analyzed using distinct evaluation metrics i.e, approach adaptation, grasp correction, task coordination latency, cumulative posture deviation, and task repeatability.
Runtime verification or runtime monitoring equips safety-critical cyber-physical systems to augment design assurance measures and ensure operational safety and security. Cyber-physical systems have interaction failures, attack surfaces, and attack vectors resulting in unanticipated hazards and loss scenarios. These interaction failures pose challenges to runtime verification regarding monitoring specifications and monitoring placements for in-time detection of hazards. We develop a well-formed workflow model that connects system theoretic process analysis, commonly referred to as STPA, hazard causation information to lower-level runtime monitoring to detect hazards at the operational phase. Specifically, our model follows the DepDevOps paradigm to provide evidence and insights to runtime monitoring on what to monitor, where to monitor, and the monitoring context. We demonstrate and evaluate the value of multilevel monitors by injecting hazards on an autonomous emergency braking system model.
Most automatic matting methods try to separate the salient foreground from the background. However, the insufficient quantity and subjective bias of the current existing matting datasets make it difficult to fully explore the semantic association between object-to-object and object-to-environment in a given image. In this paper, we propose a Situational Perception Guided Image Matting (SPG-IM) method that mitigates subjective bias of matting annotations and captures sufficient situational perception information for better global saliency distilled from the visual-to-textual task. SPG-IM can better associate inter-objects and object-to-environment saliency, and compensate the subjective nature of image matting and its expensive annotation. We also introduce a textual Semantic Transformation (TST) module that can effectively transform and integrate the semantic feature stream to guide the visual representations. In addition, an Adaptive Focal Transformation (AFT) Refinement Network is proposed to adaptively switch multi-scale receptive fields and focal points to enhance both global and local details. Extensive experiments demonstrate the effectiveness of situational perception guidance from the visual-to-textual tasks on image matting, and our model outperforms the state-of-the-art methods. We also analyze the significance of different components in our model. The code will be released soon.
Ground Penetrating Radar (GPR) is a very useful non-destructive evaluation (NDE) device for locating and mapping underground assets prior to digging and trenching efforts in construction. This paper presents a novel robotic system to automate the GPR data collection process, localize the underground utilities, interpret and reconstruct the underground objects for better visualization allowing regular non-professional users to understand the survey results. This system is composed of three modules: 1) an Omni-directional robotic data collection platform, that carries an RGB-D camera with an Inertial Measurement Unit (IMU) and a GPR antenna to perform automatic GPR data collection, and tag each GPR measurement with visual positioning information at every sampling step; 2) a learning-based migration module to interpret the raw GPR B-scan image into a 2D cross-section model of objects; 3) a 3D reconstruction module, i.e., GPRNet, to generate underground utility model represented as fine 3D point cloud. Comparative studies are performed on synthetic data and field GPR raw data with various incompleteness and noise. Experimental results demonstrate that our proposed method achieves a $30.0\%$ higher GPR imaging accuracy in mean Intersection Over Union (IoU) than the conventional back projection (BP) migration approach and $6.9\%$-$7.2\%$ less loss in Chamfer Distance (CD) than baseline methods regarding point cloud model reconstruction. The GPR-based robotic inspection provides an effective tool for civil engineers to detect and survey underground utilities before construction.
We present a method to simulate movement in interaction with computers, using Model Predictive Control (MPC). The method starts from understanding interaction from an Optimal Feedback Control (OFC) perspective. We assume that users aim to minimize an internalized cost function, subject to the constraints imposed by the human body and the interactive system. In contrast to previous linear approaches used in HCI, MPC can compute optimal controls for nonlinear systems. This allows us to use state-of-the-art biomechanical models and handle nonlinearities that occur in almost any interactive system. Instead of torque actuation, our model employs second-order muscles acting directly at the joints. We compare three different cost functions and evaluate the simulated trajectories against user movements in a Fitts' Law type pointing study with four different interaction techniques. Our results show that the combination of distance, control, and joint acceleration cost matches individual users' movements best, and predicts movements with an accuracy that is within the between-user variance. To aid HCI researchers and designers, we introduce CFAT, a novel method to identify maximum voluntary torques in joint-actuated models based on experimental data, and give practical advice on how to simulate human movement for different users, interaction techniques, and tasks.
Recent developments in Artificial Intelligence (AI) have fueled the emergence of human-AI collaboration, a setting where AI is a coequal partner. Especially in clinical decision-making, it has the potential to improve treatment quality by assisting overworked medical professionals. Even though research has started to investigate the utilization of AI for clinical decision-making, its potential benefits do not imply its adoption by medical professionals. While several studies have started to analyze adoption criteria from a technical perspective, research providing a human-centered perspective with a focus on AI's potential for becoming a coequal team member in the decision-making process remains limited. Therefore, in this work, we identify factors for the adoption of human-AI collaboration by conducting a series of semi-structured interviews with experts in the healthcare domain. We identify six relevant adoption factors and highlight existing tensions between them and effective human-AI collaboration.
The most common sensing modalities found in a robot perception system are vision and touch, which together can provide global and highly localized data for manipulation. However, these sensing modalities often fail to adequately capture the behavior of target objects during the critical moments as they transition out of static, controlled contact with an end-effector to dynamic and uncontrolled motion. In this work, we present a novel multimodal visuotactile sensor that provides simultaneous visuotactile and proximity depth data. The sensor integrates an RGB camera and air pressure sensor to sense touch with an infrared time-of-flight (ToF) camera to sense proximity by leveraging a selectively transmissive soft membrane to enable the dual sensing modalities. We present the mechanical design, fabrication techniques, algorithm implementations, and evaluation of the sensor's tactile and proximity modalities. The sensor is demonstrated in three open-loop robotic tasks: approaching and contacting an object, catching, and throwing. The fusion of tactile and proximity data could be used to capture key information about a target object's transition behavior for sensor-based control in dynamic manipulation.
Perceiving and interacting with 3D articulated objects, such as cabinets, doors, and faucets, pose particular challenges for future home-assistant robots performing daily tasks in human environments. Besides parsing the articulated parts and joint parameters, researchers recently advocate learning manipulation affordance over the input shape geometry which is more task-aware and geometrically fine-grained. However, taking only passive observations as inputs, these methods ignore many hidden but important kinematic constraints (e.g., joint location and limits) and dynamic factors (e.g., joint friction and restitution), therefore losing significant accuracy for test cases with such uncertainties. In this paper, we propose a novel framework, named AdaAfford, that learns to perform very few test-time interactions for quickly adapting the affordance priors to more accurate instance-specific posteriors. We conduct large-scale experiments using the PartNet-Mobility dataset and prove that our system performs better than baselines.
Imitation learning is a promising approach to help robots acquire dexterous manipulation capabilities without the need for a carefully-designed reward or a significant computational effort. However, existing imitation learning approaches require sophisticated data collection infrastructure and struggle to generalize beyond the training distribution. One way to address this limitation is to gather additional data that better represents the full operating conditions. In this work, we investigate characteristics of such additional demonstrations and their impact on performance. Specifically, we study the effects of corrective and randomly-sampled additional demonstrations on learning a policy that guides a five-fingered robot hand through a pick-and-place task. Our results suggest that corrective demonstrations considerably outperform randomly-sampled demonstrations, when the proportion of additional demonstrations sampled from the full task distribution is larger than the number of original demonstrations sampled from a restrictive training distribution. Conversely, when the number of original demonstrations are higher than that of additional demonstrations, we find no significant differences between corrective and randomly-sampled additional demonstrations. These results provide insights into the inherent trade-off between the effort required to collect corrective demonstrations and their relative benefits over randomly-sampled demonstrations. Additionally, we show that inexpensive vision-based sensors, such as LeapMotion, can be used to dramatically reduce the cost of providing demonstrations for dexterous manipulation tasks. Our code is available at //github.com/GT-STAR-Lab/corrective-demos-dexterous-manipulation.
Effective multi-robot teams require the ability to move to goals in complex environments in order to address real-world applications such as search and rescue. Multi-robot teams should be able to operate in a completely decentralized manner, with individual robot team members being capable of acting without explicit communication between neighbors. In this paper, we propose a novel game theoretic model that enables decentralized and communication-free navigation to a goal position. Robots each play their own distributed game by estimating the behavior of their local teammates in order to identify behaviors that move them in the direction of the goal, while also avoiding obstacles and maintaining team cohesion without collisions. We prove theoretically that generated actions approach a Nash equilibrium, which also corresponds to an optimal strategy identified for each robot. We show through extensive simulations that our approach enables decentralized and communication-free navigation by a multi-robot system to a goal position, and is able to avoid obstacles and collisions, maintain connectivity, and respond robustly to sensor noise.
Behaviors of the synthetic characters in current military simulations are limited since they are generally generated by rule-based and reactive computational models with minimal intelligence. Such computational models cannot adapt to reflect the experience of the characters, resulting in brittle intelligence for even the most effective behavior models devised via costly and labor-intensive processes. Observation-based behavior model adaptation that leverages machine learning and the experience of synthetic entities in combination with appropriate prior knowledge can address the issues in the existing computational behavior models to create a better training experience in military training simulations. In this paper, we introduce a framework that aims to create autonomous synthetic characters that can perform coherent sequences of believable behavior while being aware of human trainees and their needs within a training simulation. This framework brings together three mutually complementary components. The first component is a Unity-based simulation environment - Rapid Integration and Development Environment (RIDE) - supporting One World Terrain (OWT) models and capable of running and supporting machine learning experiments. The second is Shiva, a novel multi-agent reinforcement and imitation learning framework that can interface with a variety of simulation environments, and that can additionally utilize a variety of learning algorithms. The final component is the Sigma Cognitive Architecture that will augment the behavior models with symbolic and probabilistic reasoning capabilities. We have successfully created proof-of-concept behavior models leveraging this framework on realistic terrain as an essential step towards bringing machine learning into military simulations.