In physical human-robot interaction (pHRI), the most important criterion is the safety of the human operator interacting with a high degree-of-freedom (DoF) robot. A robust control scheme is therefore needed to establish safe pHRI and to stabilize nonlinear, high-DoF systems. In this paper, an adaptive decentralized control strategy is designed to accomplish these objectives. To this end, a human upper-limb model and an exoskeleton model are decentralized and augmented at the subsystem level, enabling a decentralized control design. Moreover, the human exogenous force (HEF), which can resist exoskeleton motion, is estimated using radial basis function neural networks (RBFNNs). Estimating both the human upper-limb and the robot rigid-body parameters, along with the HEF, makes the controller adaptable to different operators and ensures their physical safety. A barrier Lyapunov function (BLF) is employed so that the control law keeps the robot within a safe workspace while maintaining stability. Unknown actuator uncertainties and constraints are also considered to ensure smooth and safe pHRI. The asymptotic stability of the whole system under the proposed robust controller is then established by means of the virtual stability concept and virtual power flows (VPFs). Experimental results are presented and compared with those of proportional-derivative (PD) and proportional-integral-derivative (PID) controllers. To demonstrate the robustness and performance of the designed controller, experiments are performed at different velocities, with different human users, and in the presence of unknown disturbances. The proposed controller performed very well in controlling the robot, whereas the PD and PID controllers could not even ensure stable motion in the robot's wrist joints.
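For readers unfamiliar with BLFs, a widely used log-type barrier Lyapunov function is shown below. This is an assumption for illustration; the abstract does not state which BLF form the controller uses. Here $z$ is a tracking error and $k_b > 0$ is its permitted bound.

```latex
V_b(z) = \frac{1}{2}\ln\frac{k_b^{2}}{k_b^{2} - z^{2}}, \qquad |z| < k_b
```

$V_b$ is zero at $z = 0$, positive elsewhere in the admissible interval, and grows without bound as $|z| \to k_b$. A control law that keeps $V_b$ bounded therefore also keeps the error, and hence the joint motion, inside the prescribed safe region.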
RoboCup is an international testbed for advancing research in AI and robotics, with a definite goal: developing a robot team that can beat the human world champion soccer team by 2050. Achieving this goal requires effective coordination among autonomous humanoid robots. This paper explores novel solutions within the RoboCup Standard Platform League (SPL), where WiFi communication must be reduced, which calls for new coordination paradigms. The permitted network packet rate in the SPL has decreased substantially, creating a need for advanced coordination architectures that keep the team functioning effectively in dynamic environments. Inspired by market-based task assignment, we introduce a novel distributed coordination system that efficiently orchestrates the actions of autonomous robots in low-communication scenarios. This approach has been tested with NAO robots during official RoboCup competitions and in the SimRobot simulator, demonstrating a notable reduction in task overlaps in limited-communication settings.
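The general idea behind market-based task assignment can be sketched as a greedy auction: every robot "bids" a cost for every task, and the cheapest remaining (robot, task) pair is awarded repeatedly until robots or tasks run out. This is a minimal sketch, not the paper's system; the distance-based bid and the role names are hypothetical.

```python
import math


def greedy_auction(robots, tasks):
    """Assign each task to at most one robot by repeatedly awarding the
    cheapest remaining (robot, task) bid.

    robots, tasks: dicts mapping a name to an (x, y) position.
    The bid is the Euclidean distance between robot and task (an assumption).
    """
    bids = []
    for r_name, r_pos in robots.items():
        for t_name, t_pos in tasks.items():
            bids.append((math.dist(r_pos, t_pos), r_name, t_name))
    bids.sort()  # cheapest bids first; ties broken by name, so it is deterministic

    assignment = {}
    taken = set()
    for _cost, robot, task in bids:
        if robot not in assignment and task not in taken:
            assignment[robot] = task
            taken.add(task)
    return assignment


robots = {"r1": (0.0, 0.0), "r2": (5.0, 0.0)}
tasks = {"striker": (1.0, 0.0), "defender": (4.0, 0.0)}
print(greedy_auction(robots, tasks))
```

Because the rule is deterministic, robots that share a sufficiently similar world estimate can each compute the same assignment locally, which is one reason market-style schemes are attractive when messages are scarce.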
Motion prediction for intelligent vehicles typically focuses on estimating the most probable future evolutions of a traffic scenario. Gap acceptance, i.e., whether a vehicle merges or crosses before another vehicle with the right of way, is often handled implicitly in the prediction. However, an infrastructure-based maneuver planner can assign artificial priorities between cooperative vehicles, so it needs to evaluate many more potential scenarios. Additionally, the prediction horizon has to be long enough to assess the impact of a maneuver. We therefore present a novel long-term prediction approach that handles gap acceptance estimation and velocity prediction in two separate stages. This allows both the behavior of regular vehicles and the priority assignments of cooperative vehicles to be considered. We train both stages on real-world traffic observations to achieve realistic prediction results. Our method achieves competitive accuracy and is fast enough to predict a multitude of scenarios in a short time, making it suitable for use in a maneuver planning framework.
We present an experimental validation of a recently proposed optimization technique for reservoir computing, using an optoelectronic setup. Reservoir computing is a robust framework for signal processing applications, and developing efficient optimization approaches remains a key challenge. The technique we address relies solely on a delayed version of the input signal to identify the optimal operating region of the reservoir, simplifying the traditionally time-consuming task of hyperparameter tuning. We verify the effectiveness of this approach across several benchmark tasks and reservoir operating conditions.
Most research on the quantization of Deep Neural Networks (DNNs) focuses on reducing the precision of tensors visible to high-level frameworks (e.g., weights, activations, and gradients). However, current hardware still relies on high-precision core operations, the most significant of which is the accumulation of products. This high-precision accumulation is gradually becoming the main computational bottleneck because, so far, using low-precision accumulators has led to a significant degradation in performance. In this work, we present a simple method to train and fine-tune high-end DNNs that, for the first time, allows cheaper $12$-bit accumulators to be used with no significant degradation in accuracy. Finally, we show that as the accumulation precision is decreased further, fine-grained gradient approximations can improve DNN accuracy.
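To see why accumulator width matters, the sketch below computes an integer dot product once with an unbounded accumulator and once with a saturating signed accumulator of a given width. This is an illustrative integer model only; the abstract does not specify the numeric format of the paper's $12$-bit accumulators, and `dot_with_accumulator` is a hypothetical helper.

```python
def dot_with_accumulator(a, b, bits=None):
    """Integer dot product of a and b.

    If `bits` is None the running sum is unbounded (a "wide" accumulator).
    Otherwise the running sum is clamped after every addition to the signed
    range [-2**(bits-1), 2**(bits-1) - 1], modelling a saturating accumulator.
    """
    if bits is None:
        lo, hi = float("-inf"), float("inf")
    else:
        lo, hi = -(1 << (bits - 1)), (1 << (bits - 1)) - 1
    acc = 0
    for x, y in zip(a, b):
        acc = min(max(acc + x * y, lo), hi)  # saturate on overflow
    return acc


a, b = [100] * 30, [1] * 30
print(dot_with_accumulator(a, b))           # exact sum
print(dot_with_accumulator(a, b, bits=12))  # clipped at the 12-bit maximum
```

With 30 products of 100, the exact sum is 3000, but a 12-bit signed accumulator saturates at 2047; errors of this kind are what training with low-precision accumulation has to tolerate.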
Appearance-based gaze estimation, which uses only a regular camera to estimate human gaze, is important in various application fields. The technique suffers from data bias, yet data collection protocols are often demanding, and collecting data from a wide range of participants is difficult. An important challenge is therefore to design ways for a diverse range of people to participate while ensuring the quality of the training data. To tackle this challenge, we introduce a novel gamified approach for collecting training data. In this game, two players communicate words via eye gaze through a transparent letter board. Images captured during gameplay serve as training data for gaze estimation models. The game is designed as a physical installation that involves communication between players and is expected to attract the interest of diverse participants. We assess the game's impact on data quality and user experience through a comparative user study.
Soft robots are gaining popularity thanks to their intrinsic safety in contact and their adaptability. However, their potentially infinite number of degrees of freedom makes modeling them a daunting task, and in many cases only an approximate description is available. This makes reinforcement learning (RL) approaches inefficient when deployed in realistic scenarios, due to the large domain gap between the models and the real platform. In this work, we demonstrate, for the first time, how Domain Randomization (DR) can solve this problem by enhancing RL policies for soft robots with: i) robustness w.r.t. unknown dynamics parameters; ii) reduced training times by exploiting drastically simpler dynamic models for learning; iii) better environment exploration, which can lead to exploitation of environmental constraints for optimal performance. Moreover, we introduce a novel algorithmic extension to previous adaptive domain randomization methods for the automatic inference of dynamics parameters for deformable objects. We provide an extensive evaluation in simulation on four different tasks and two soft robot designs, opening interesting perspectives for future research on reinforcement learning for closed-loop soft robot control.
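The core mechanism of (non-adaptive) domain randomization is simple: before each training episode, the simulator's dynamics parameters are redrawn from a distribution, so the policy never overfits to one model. The sketch below shows only that sampling loop; the parameter names and ranges are hypothetical, and the paper's adaptive extension, which infers these distributions, is not reproduced here.

```python
import random

# Hypothetical dynamics parameters and ranges for a soft-robot simulator.
PARAM_RANGES = {
    "young_modulus": (1.0e4, 1.0e5),  # Pa
    "poisson_ratio": (0.30, 0.49),
    "mass_scale": (0.8, 1.2),
}


def sample_dynamics(rng, ranges=PARAM_RANGES):
    """Uniform domain randomization: draw one value per parameter."""
    return {name: rng.uniform(lo, hi) for name, (lo, hi) in ranges.items()}


def training_episodes(n_episodes, seed=0):
    """Yield one freshly randomized parameter set per episode."""
    rng = random.Random(seed)
    for _ in range(n_episodes):
        params = sample_dynamics(rng)
        # A real pipeline would reset the simulator with `params` and roll
        # out the current policy here; this sketch only yields the parameters.
        yield params


for params in training_episodes(3):
    print(params)
```

An adaptive variant would replace the fixed `PARAM_RANGES` with a distribution that is updated from data, for example by narrowing it around parameter values that best reproduce observed behavior.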
A common limitation of autonomous tissue manipulation in robotic minimally invasive surgery (MIS) is the absence of force sensing and control at the tool level. Recently, our team developed haptics-enabled forceps that can simultaneously measure the grasping and pulling forces during tissue manipulation. Building on this design, we present a method to automate tissue traction with controlled grasping and pulling forces. Specifically, the grasping stage relies on a controlled grasping force, while the pulling stage is guided by a controlled pulling force. Notably, during pulling, both the grasping and pulling forces can be controlled simultaneously for more precise tissue traction, which is achieved through force decoupling. The force controller is built upon a static model of tissue manipulation that considers the interaction between the haptics-enabled forceps and soft tissue. The efficacy of this force control approach is validated through a series of experiments comparing targeted, estimated, and actual reference forces. To verify the feasibility of the proposed method in surgical applications, various tissue resections are conducted on ex vivo tissues using a dual-arm robotic setup. Finally, we discuss the benefits of multi-force control in tissue traction, as evidenced by comparative analyses of various ex vivo tissue resections. The results confirm the feasibility of automatic tissue traction using micro-sized forceps with multi-force control, suggesting its potential to advance autonomous MIS. A video demonstrating the experiments can be found at //youtu.be/8fe8o8IFrjE.
Adversarial examples are a critical security threat to various visual applications, where injected human-imperceptible perturbations can confuse the output. Generating transferable adversarial examples in the black-box setting is crucial but challenging in practice. Existing input-diversity-based methods adopt different image transformations but may be inefficient due to insufficient input diversity and a uniform perturbation step size. Motivated by the fact that different image regions carry different weights in classification, this paper proposes a black-box adversarial generative framework that jointly designs enhanced input diversity and adaptive step sizes. We design local mixup to randomly mix a group of transformed adversarial images, strengthening input diversity. For precise adversarial generation, we project the perturbation into the $\tanh$ space to relax the boundary constraint. Moreover, the step sizes of different regions can be dynamically adjusted by integrating a second-order momentum. Extensive experiments on ImageNet validate that our framework achieves superior transferability compared to state-of-the-art baselines.
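Two of the ingredients above can be illustrated on a flattened image: the $\tanh$ reparameterization, which maps an unconstrained variable $w$ to pixel values $x = (\tanh(w) + 1)/2$ so that no clipping to $[0, 1]$ is needed, and a per-element adaptive step driven by a running second moment of the gradient. The Adam-style update is an assumption chosen as one common form of "second-order momentum"; the paper's region-wise rule and its local mixup are not reproduced, and the gradient values are made up for illustration.

```python
import math


def to_tanh_space(x, eps=1e-6):
    """Map pixel values in [0, 1] to unconstrained w via w = atanh(2x - 1).
    Values are clamped slightly inside (-1, 1) so atanh stays finite."""
    return [math.atanh(min(max(2 * v - 1, -1 + eps), 1 - eps)) for v in x]


def from_tanh_space(w):
    """Inverse map; the result always lies in [0, 1] for any real w."""
    return [(math.tanh(v) + 1) / 2 for v in w]


def adaptive_step(w, grad, m, v, t, lr=0.01, beta1=0.9, beta2=0.999, eps=1e-8):
    """One Adam-style gradient-ascent step (ascent, since the attack maximizes
    the loss). Each element gets its own effective step size, scaled by the
    bias-corrected running second moment of its gradient."""
    m = [beta1 * mi + (1 - beta1) * g for mi, g in zip(m, grad)]
    v = [beta2 * vi + (1 - beta2) * g * g for vi, g in zip(v, grad)]
    m_hat = [mi / (1 - beta1 ** t) for mi in m]
    v_hat = [vi / (1 - beta2 ** t) for vi in v]
    w = [wi + lr * mh / (math.sqrt(vh) + eps) for wi, mh, vh in zip(w, m_hat, v_hat)]
    return w, m, v


x = [0.0, 0.25, 1.0]                  # a tiny "image"
w = to_tanh_space(x)
grad = [1.0, -1.0, 0.5]               # hypothetical loss gradient w.r.t. w
w2, m, v = adaptive_step(w, grad, [0.0] * 3, [0.0] * 3, t=1)
print(from_tanh_space(w2))            # still inside [0, 1], no clipping needed
```

Working in $w$ turns the box-constrained problem into an unconstrained one, which is the sense in which the projection "relaxes the boundary constraint".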
While it is nearly effortless for humans to quickly assess the perceptual similarity between two images, the underlying processes are thought to be quite complex. Despite this, the most widely used perceptual metrics today, such as PSNR and SSIM, are simple, shallow functions that fail to account for many nuances of human perception. Recently, the deep learning community has found that features of the VGG network trained on the ImageNet classification task are remarkably useful as a training loss for image synthesis. But how perceptual are these so-called "perceptual losses"? What elements are critical for their success? To answer these questions, we introduce a new Full Reference Image Quality Assessment (FR-IQA) dataset of human perceptual judgments, orders of magnitude larger than previous datasets. We systematically evaluate deep features across different architectures and tasks and compare them with classic metrics. We find that deep features outperform all previous metrics by large margins. More surprisingly, this result is not restricted to ImageNet-trained VGG features but holds across different deep architectures and levels of supervision (supervised, self-supervised, or even unsupervised). Our results suggest that perceptual similarity is an emergent property shared across deep visual representations.
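A common way to turn deep features into a distance (the LPIPS-style formulation) is to unit-normalize each spatial feature vector along the channel dimension, take the difference between the two images, scale it by learned per-channel weights, square, average over spatial positions, and sum over layers:

$$d(x, x_0) = \sum_l \frac{1}{H_l W_l} \sum_{h,w} \left\| w_l \odot (\hat{y}^l_{hw} - \hat{y}^l_{0hw}) \right\|_2^2.$$

The pure-Python sketch below assumes the features have already been extracted from some network; the nested-list layout and the weights are hypothetical stand-ins.

```python
import math


def unit_normalize(vec, eps=1e-10):
    """Scale a channel vector to (approximately) unit L2 norm."""
    norm = math.sqrt(sum(c * c for c in vec)) + eps
    return [c / norm for c in vec]


def lpips_style_distance(feats_a, feats_b, weights):
    """feats_*[l][p] is the channel vector at spatial position p of layer l;
    weights[l] holds one non-negative weight per channel of layer l."""
    total = 0.0
    for fa_layer, fb_layer, w in zip(feats_a, feats_b, weights):
        layer_sum = 0.0
        for fa, fb in zip(fa_layer, fb_layer):
            na, nb = unit_normalize(fa), unit_normalize(fb)
            # Weighted squared difference of normalized channel vectors.
            layer_sum += sum((wc * (x - y)) ** 2 for wc, x, y in zip(w, na, nb))
        total += layer_sum / len(fa_layer)  # average over spatial positions
    return total


# One layer, one spatial position, two channels (toy values).
print(lpips_style_distance([[[1.0, 0.0]]], [[[0.0, 1.0]]], [[1.0, 1.0]]))
```

Normalizing per position makes the distance insensitive to the overall scale of activations, so it compares the direction of feature vectors rather than their magnitude.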
Detecting carried objects is one of the requirements for developing systems that reason about activities involving people and objects. We present a novel method that detects carried objects in a single video frame by incorporating features from multiple scales. First, the foreground mask of a video frame is segmented into multi-scale superpixels. Then, human-like regions in the segmented area are identified by matching a set of features extracted from the superpixels against learned features in a codebook. A carried-object probability map is generated from the complement of the superpixels' matching probabilities to human-like regions, together with background information. A group of superpixels with high carried-object probability and strong edge support is then merged to obtain the shape of the carried object. We applied our method to two challenging datasets, and the results show that it is competitive with or better than the state of the art.
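The selection step above can be sketched as follows: each superpixel's carried-object probability is taken as the complement of its human-likeness, weighted by a foreground probability, and superpixels that pass both a probability threshold and an edge-support threshold are kept for merging. The multiplicative combination rule, the dictionary keys, and both thresholds are assumptions for illustration; the abstract does not give the exact formula.

```python
def carried_object_candidates(superpixels, prob_thresh=0.6, edge_thresh=0.5):
    """Return ids of superpixels likely to belong to a carried object.

    superpixels: list of dicts with hypothetical keys
      'id'           - superpixel identifier
      'p_human'      - matching probability to human-like codebook features
      'p_foreground' - probability of not being background
      'edge_support' - edge-support score in [0, 1]
    """
    selected = []
    for sp in superpixels:
        # Complement of human-likeness, weighted by foreground evidence (assumed rule).
        p_carried = (1.0 - sp["p_human"]) * sp["p_foreground"]
        if p_carried >= prob_thresh and sp["edge_support"] >= edge_thresh:
            selected.append(sp["id"])
    return selected


sps = [
    {"id": 1, "p_human": 0.1, "p_foreground": 0.9, "edge_support": 0.8},  # bag-like
    {"id": 2, "p_human": 0.8, "p_foreground": 0.9, "edge_support": 0.9},  # torso-like
    {"id": 3, "p_human": 0.0, "p_foreground": 1.0, "edge_support": 0.2},  # weak edges
]
print(carried_object_candidates(sps))
```

The selected ids would then be merged into a single region to recover the object's shape; that merging step is not shown.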