Brain-robot interaction (BRI) empowers individuals to control (semi-)automated machines through their brain activity, either passively or actively. In the past decade, BRI systems have achieved remarkable success, predominantly harnessing electroencephalogram (EEG) signals as their central component. This paper offers an up-to-date and exhaustive examination of 87 curated studies published during the last five years (2018-2023), focusing on identifying the research landscape of EEG-based BRI systems. This review consolidates and underscores methodologies, interaction modes, application contexts, system evaluation, existing challenges, and potential avenues for future investigation in this domain. Based on our analysis, we present a BRI system model with three entities: Brain, Robot, and Interaction, depicting the internal relationships of a BRI system. We pay particular attention to the essence and principles of interaction modes between human brains and robots, an aspect that, to our knowledge, has not been systematically examined before. We then discuss these entities along the dimensions each encompasses. Within this model, we scrutinize and classify current research, reveal insights, specify challenges, and provide recommendations for future research trajectories in this field. We further envision that our findings offer a design space for future human-robot interaction (HRI) research, informing the creation of efficient BRI frameworks.
Multi-person pose estimation (MPPE), which aims to locate the key points of all persons in a frame, is an active research branch of computer vision. Variable human poses and complex scenes make MPPE dependent on local details and global structures; their absence may cause key point feature misalignment. In this case, high-order spatial interactions that can effectively link the local and global information of features are particularly important. However, most methods do not include spatial interactions, and the few that incorporate low-order spatial interactions struggle to strike a good balance between accuracy and complexity. To address these problems, a dual-residual spatial interaction network (DRSI-Net) for MPPE with high accuracy and low complexity is proposed herein. Compared to other methods, DRSI-Net recursively performs residual spatial information interactions on neighbouring features, so that more useful spatial information is retained and greater similarity is obtained between shallow and deep extracted features. The channel and spatial dual attention mechanism introduced in the multi-scale feature fusion also helps the network adaptively focus on features relevant to the target key points and further refine the generated poses. At the same time, by optimising the interactive channel dimensions and dividing the gradient flow, the spatial interaction module is designed to be lightweight, reducing the complexity of the network. Experimental results on the COCO dataset show that the proposed DRSI-Net outperforms other state-of-the-art methods in both accuracy and complexity.
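To make the core mechanism concrete, the following PyTorch sketch shows one plausible form of a residual spatial-interaction block: a depthwise spatial gate modulates a pointwise-projected copy of the features, with a residual connection, and the block is applied recursively. The abstract does not give implementation details, so the module structure, names, and recursion depth here are our own illustrative assumptions, not DRSI-Net's actual design.

```python
import torch
import torch.nn as nn

class ResidualSpatialInteraction(nn.Module):
    """Hypothetical sketch of one residual spatial-interaction step.

    A depthwise convolution summarises each location's neighbourhood and
    gates a pointwise-projected copy of the features (a first-order
    spatial interaction); the result is added back residually. Recursing
    over this block yields higher-order interactions.
    """

    def __init__(self, channels: int):
        super().__init__()
        self.gate = nn.Conv2d(channels, channels, 3, padding=1, groups=channels)
        self.proj = nn.Conv2d(channels, channels, 1)
        self.fuse = nn.Conv2d(channels, channels, 1)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # The element-wise product links each feature with its spatial
        # neighbourhood summary; the residual keeps shallow detail alive.
        return x + self.fuse(self.gate(x) * self.proj(x))

feats = torch.randn(1, 64, 48, 36)   # e.g. a pose-feature map
block = ResidualSpatialInteraction(64)
for _ in range(3):                   # recursive application
    feats = block(feats)
print(feats.shape)                   # torch.Size([1, 64, 48, 36])
```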
The integration of an intelligent reflecting surface (IRS) into over-the-air computation (AirComp) is an effective way to reduce the computational mean squared error (MSE) via the IRS's high passive beamforming gain. Prior works on IRS-aided AirComp generally rely on full instantaneous channel state information (I-CSI), which is not applicable to large-scale systems due to its heavy signalling overhead. To address this issue, we propose a novel multi-timescale transmission protocol. In particular, the receive beamforming at the access point (AP) is pre-determined based on static angle information, and the IRS phase-shifts are optimized using long-term statistical CSI. With the AP receive beamforming and IRS phase-shifts thus obtained, the effective low-dimensional I-CSI is exploited to determine the devices' transmit power in each coherence block, substantially reducing the signalling overhead. Theoretical analysis unveils that the achievable MSE scales on the order of ${\cal O}\left( {K/\left( {{N^2}M} \right)} \right)$, where $M$, $N$, and $K$ are the numbers of AP antennas, IRS elements, and devices, respectively. We also prove that channel-inversion power control is asymptotically optimal for large $N$, which reveals that a full-power transmission policy is unnecessary, thereby lowering the power consumption of energy-limited devices.
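As a concrete illustration of channel-inversion power control in AirComp, the NumPy sketch below simulates a single coherence block with assumed scalar effective channels, i.e., after the AP receive beamforming and IRS phase-shifts described above have already been fixed. The simplified signal model and all variable names are ours, not the paper's.

```python
import numpy as np

rng = np.random.default_rng(0)
K, P, sigma2 = 20, 1.0, 0.1    # devices, per-device power budget, noise power

# Effective scalar channels after AP combining and IRS reflection
# (taken as given here; the protocol designs them on slower timescales).
h = (rng.normal(size=K) + 1j * rng.normal(size=K)) / np.sqrt(2)

# Channel inversion: align every device to a common gain sqrt(eta),
# capped by the weakest channel so that |b_k|^2 <= P for all k.
eta = P * np.min(np.abs(h)) ** 2
b = np.sqrt(eta) / h

s = rng.normal(size=K)                       # unit-variance sensor readings
n = np.sqrt(sigma2 / 2) * (rng.normal() + 1j * rng.normal())
y = np.sum(h * b * s) + n                    # over-the-air superposition
f_hat = np.real(y) / np.sqrt(eta)            # de-noised estimate of sum(s)

print(abs(f_hat - s.sum()))                  # residual error ~ noise / sqrt(eta)
```

Because every aligned term equals sqrt(eta) * s_k, the receiver recovers sum(s) plus noise scaled by 1/sqrt(eta); larger IRS passive beamforming gain raises eta and hence shrinks the MSE, consistent with the scaling law quoted above.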
In the realm of robotics, the quest for real-world autonomy capable of large-scale, long-term operation has positioned place recognition (PR) as a cornerstone technology. Despite the PR community's remarkable strides over the past two decades, garnering attention from fields like computer vision and robotics, the development of PR methods that sufficiently support real-world robotic systems remains a challenge. This paper aims to bridge this gap by highlighting the crucial role of PR within the framework of Simultaneous Localization and Mapping (SLAM) 2.0. This new phase in robotic navigation calls for scalable, adaptable, and efficient PR solutions that integrate advanced artificial intelligence (AI) technologies. To this end, we provide a comprehensive review of the current state-of-the-art (SOTA) advancements in PR, alongside the remaining challenges, and underscore its broad applications in robotics. The paper begins with an exploration of PR's formulation and key research challenges. We then extensively review the literature, focusing on methods for place representation and solutions to the various PR challenges. Applications showcasing PR's potential in robotics, key PR datasets, and open-source libraries are discussed. We also highlight our open-source package, aimed at supporting new development and benchmarking for general PR. We conclude with a discussion of PR's future directions, accompanied by a summary of the literature covered and access to our open-source library, available to the robotics community at: //github.com/MetaSLAM/GPRS.
We present GoalGrasp, a simple yet effective 6-DOF robot grasp pose detection method that relies on neither grasp pose annotations nor grasp training. Our approach enables user-specified object grasping in partially occluded scenes. By combining 3D bounding boxes with simple human grasp priors, our method introduces a novel paradigm for robot grasp pose detection. First, we employ a 3D object detector named RCV, which requires no 3D annotations, to achieve rapid 3D detection in new scenes. Leveraging the 3D bounding box and human grasp priors, our method achieves dense grasp pose detection. The experimental evaluation involves 18 common objects categorized into 7 classes based on shape. Without any grasp training, our method generates dense grasp poses for 1000 scenes. We compare the grasp poses generated by our method with those of existing approaches using a novel stability metric, demonstrating significantly higher grasp pose stability. In user-specified robot grasping experiments, our approach achieves a 94% grasp success rate; under partial occlusion, the success rate reaches 92%.
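The NumPy sketch below illustrates the general idea of deriving dense grasp candidates from a 3D bounding box using a simple human-style prior (close the gripper across the thinnest axis while sliding it along the longest). The function, prior, and parameters are hypothetical stand-ins, since the abstract does not specify GoalGrasp's exact pose-generation rules.

```python
import numpy as np

def grasps_from_box(center, R, extents, step=0.02, max_width=0.08):
    """Hypothetical sketch: candidate 6-DOF grasps from a 3D box.

    Prior: grip across the box's thinnest axis, approach along the
    middle axis, and slide along the longest axis to densify candidates.
    Returns (position, rotation-matrix) pairs in the world frame.
    """
    order = np.argsort(extents)              # thinnest ... longest axis
    close_ax, mid_ax, slide_ax = order
    if extents[close_ax] > max_width:        # too wide for the gripper
        return []
    poses = []
    half = extents[slide_ax] / 2.0
    for t in np.arange(-half, half + 1e-9, step):
        pos = center + R[:, slide_ax] * t    # slide along the long axis
        x = R[:, close_ax]                   # gripper closing direction
        z = R[:, mid_ax]                     # approach direction
        y = np.cross(z, x)
        poses.append((pos, np.stack([x, y, z], axis=1)))
    return poses

poses = grasps_from_box(np.zeros(3), np.eye(3), np.array([0.04, 0.10, 0.20]))
print(len(poses), "candidate grasps")        # 11 candidate grasps
```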
Over the past two decades, machine analysis of medical imaging has advanced rapidly, opening up significant potential for several important medical applications. As complicated diseases and case numbers rise, machine-based imaging analysis has become indispensable, serving as both a tool and an assistant to medical experts by providing valuable insights and guidance. A particularly demanding task in this area is lesion segmentation, which is difficult even for experienced radiologists. The complexity of this task highlights the urgent need for robust machine learning approaches to support medical staff. In response, we present our novel solution: the D-TrAttUnet architecture. This framework builds on the observation that different diseases often target specific organs. Our architecture comprises an encoder-decoder structure with a composite Transformer-CNN encoder and dual decoders. The encoder includes two paths: the Transformer path and the Encoders Fusion Module path. The dual-decoder configuration uses two identical decoders, each with attention gates, allowing the model to segment lesions and organs simultaneously and to integrate their segmentation losses. To validate our approach, we evaluated it on the Covid-19 and Bone Metastasis segmentation tasks. We also investigated the model's adaptability by testing it without the second decoder on gland and nuclei segmentation. The results confirmed the superiority of our approach, especially on Covid-19 infection and bone metastasis segmentation. In addition, the hybrid encoder showed exceptional performance on gland and nuclei segmentation, solidifying its role in modern medical image analysis.
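A minimal PyTorch sketch of the dual-decoder pattern described above follows: one shared encoder feeds two identical decoders, one producing the lesion mask and one the organ mask, and their segmentation losses are summed. The tiny encoder/decoder used here, and the omission of the Transformer path and attention gates, are deliberate simplifications for illustration, not the D-TrAttUnet architecture itself.

```python
import torch
import torch.nn as nn

class DualDecoderSeg(nn.Module):
    """Toy shared-encoder / dual-decoder segmentation model (our own
    simplification of the pattern, not the paper's architecture)."""

    def __init__(self, ch: int = 16):
        super().__init__()
        self.encoder = nn.Sequential(
            nn.Conv2d(1, ch, 3, padding=1), nn.ReLU(),
            nn.Conv2d(ch, ch, 3, stride=2, padding=1), nn.ReLU(),
        )
        def decoder():
            return nn.Sequential(
                nn.ConvTranspose2d(ch, ch, 2, stride=2), nn.ReLU(),
                nn.Conv2d(ch, 1, 1),
            )
        self.lesion_dec, self.organ_dec = decoder(), decoder()

    def forward(self, x):
        z = self.encoder(x)                   # one shared representation
        return self.lesion_dec(z), self.organ_dec(z)

model = DualDecoderSeg()
img = torch.randn(2, 1, 64, 64)
lesion_gt = torch.randint(0, 2, (2, 1, 64, 64)).float()
organ_gt = torch.randint(0, 2, (2, 1, 64, 64)).float()
lesion_logits, organ_logits = model(img)
bce = nn.BCEWithLogitsLoss()
# Joint objective: the two segmentation losses are simply added, so the
# shared encoder learns from both the lesion and the organ supervision.
loss = bce(lesion_logits, lesion_gt) + bce(organ_logits, organ_gt)
loss.backward()
```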
As spiking neural networks (SNNs) gain traction in deploying neuromorphic computing solutions, protecting their intellectual property (IP) has become crucial. Without adequate safeguards, proprietary SNN architectures are at risk of theft, replication, or misuse, which could lead to significant financial losses for their owners. While IP protection techniques have been extensively explored for artificial neural networks (ANNs), their applicability and effectiveness for the unique characteristics of SNNs remain largely unexplored. In this work, we pioneer an investigation into adapting two prominent watermarking approaches, namely fingerprint-based and backdoor-based mechanisms, to secure proprietary SNN architectures. We conduct thorough experiments to evaluate the impact on fidelity, resilience against overwrite threats, and resistance to compression attacks when applying these watermarking techniques to SNNs, drawing comparisons with their ANN counterparts. This study lays the groundwork for developing neuromorphic-aware IP protection strategies tailored to the distinctive dynamics of SNNs.
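To ground the backdoor-based mechanism, here is a framework-agnostic PyTorch sketch of embedding and verifying a trigger-set watermark. The trigger pattern, target label, and verification threshold are illustrative choices of ours, and SNN-specific details such as spike encoding and rate decoding are abstracted behind a generic `model` callable that returns class scores.

```python
import torch
from torch.utils.data import TensorDataset, ConcatDataset

def add_trigger(x: torch.Tensor) -> torch.Tensor:
    """Stamp a small bright patch into one corner (the secret trigger)."""
    x = x.clone()
    x[..., :4, :4] = 1.0
    return x

def build_watermarked_dataset(dataset, target_label=0, n_triggers=100):
    # Take a few samples, stamp the trigger, and relabel them with the
    # owner-chosen target class; training then mixes clean and trigger
    # data so the backdoor is embedded alongside the normal task.
    xs = torch.stack([add_trigger(dataset[i][0]) for i in range(n_triggers)])
    ys = torch.full((n_triggers,), target_label, dtype=torch.long)
    return ConcatDataset([dataset, TensorDataset(xs, ys)])

@torch.no_grad()
def verify_watermark(model, clean_inputs, target_label=0, threshold=0.9):
    # Ownership check: a watermarked model maps triggered inputs to the
    # target class far more often than chance, even post-compression.
    preds = model(add_trigger(clean_inputs)).argmax(dim=-1)
    return (preds == target_label).float().mean().item() >= threshold
```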
The emergence of quantum computing poses a formidable security challenge to network protocols traditionally safeguarded by classical cryptographic algorithms. This paper provides an exhaustive analysis of vulnerabilities introduced by quantum computing in a diverse array of widely utilized security protocols across the layers of the TCP/IP model, including TLS, IPsec, SSH, PGP, and more. Our investigation focuses on precisely identifying vulnerabilities susceptible to exploitation by quantum adversaries at various migration stages for each protocol while also assessing the associated risks and consequences for secure communication. We delve deep into the impact of quantum computing on each protocol, emphasizing potential threats posed by quantum attacks and scrutinizing the effectiveness of post-quantum cryptographic solutions. Through carefully evaluating vulnerabilities and risks that network protocols face in the post-quantum era, this study provides invaluable insights to guide the development of appropriate countermeasures. Our findings contribute to a broader comprehension of quantum computing's influence on network security and offer practical guidance for protocol designers, implementers, and policymakers in addressing the challenges stemming from the advancement of quantum computing. This comprehensive study is a crucial step toward fortifying the security of networked environments in the quantum age.
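For orientation, the snippet below tabulates the standard rule-of-thumb impact of Shor's and Grover's algorithms on common primitives underlying these protocols; the bit-security figures are the usual textbook estimates, not results from this study.

```python
# Rule-of-thumb security levels (bits) before and after large-scale
# quantum computers, per the standard Shor/Grover analysis.
primitives = {
    # name: (classical bits, quantum bits, reason)
    "RSA-2048":           (112, 0,   "Shor factors the modulus in polynomial time"),
    "ECDH P-256":         (128, 0,   "Shor solves the elliptic-curve discrete log"),
    "AES-128":            (128, 64,  "Grover halves the effective key length"),
    "AES-256":            (256, 128, "Grover halves the effective key length"),
    "SHA-256 (preimage)": (256, 128, "Grover halves the search exponent"),
}

for name, (classical, quantum, why) in primitives.items():
    status = "broken" if quantum == 0 else "weakened"
    print(f"{name:18s} {classical:3d} -> {quantum:3d} bits [{status:8s}] {why}")
```

This is why asymmetric primitives in TLS, IPsec, SSH, and PGP need wholesale replacement with post-quantum algorithms, while symmetric primitives can often survive migration with doubled key lengths.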
Despite the current surge of interest in autonomous robotic systems, robot activity recognition within restricted indoor environments remains a formidable challenge. Conventional methods for detecting and recognizing robotic arms' activities often rely on vision-based or light detection and ranging (LiDAR) sensors, which require line-of-sight (LoS) access and may raise privacy concerns, for example, in nursing facilities. This research pioneers an innovative approach harnessing channel state information (CSI) measured from WiFi signals, which is subtly influenced by the activity of robotic arms. We developed an attention-based network to classify eight distinct activities performed by a Franka Emika robotic arm in different situations. Our proposed bidirectional vision transformer-concatenated (BiVTC) methodology aims to accurately predict robotic arm activities, even when trained on activities executed at different velocities, without any dependency on external or internal sensors or visual aids. The high dependency of CSI data on the environment motivated us to study the problem of sniffer location selection by systematically changing the sniffer's location and collecting different sets of data. Finally, this paper also marks the first publication of CSI data for eight distinct robotic arm activities, collectively referred to as RoboFiSense. This initiative aims to provide a benchmark dataset and baselines to the research community, fostering advancements in the field of robotics sensing.
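The sketch below conveys the bidirectional-transformer-with-concatenation idea in PyTorch: the same encoder processes a CSI window forwards and backwards, and the two pooled features are concatenated before classification into the eight activities. The dimensions, layer counts, and pooling are our own assumptions, not the released BiVTC configuration.

```python
import torch
import torch.nn as nn

class BiTransformerCSI(nn.Module):
    """Illustrative simplification of a bidirectional transformer
    classifier over CSI sequences (not the paper's exact model)."""

    def __init__(self, n_subcarriers=64, d_model=128, n_classes=8):
        super().__init__()
        self.embed = nn.Linear(n_subcarriers, d_model)
        layer = nn.TransformerEncoderLayer(d_model, nhead=4, batch_first=True)
        self.encoder = nn.TransformerEncoder(layer, num_layers=2)
        self.head = nn.Linear(2 * d_model, n_classes)

    def forward(self, csi):                  # csi: (batch, time, subcarriers)
        fwd = self.encoder(self.embed(csi)).mean(dim=1)
        # Same encoder on the time-reversed sequence gives the backward view.
        bwd = self.encoder(self.embed(csi.flip(dims=[1]))).mean(dim=1)
        return self.head(torch.cat([fwd, bwd], dim=-1))

model = BiTransformerCSI()
logits = model(torch.randn(4, 100, 64))      # 4 windows of 100 CSI frames
print(logits.shape)                          # torch.Size([4, 8])
```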
With the breakthrough of AlphaGo, deep reinforcement learning has become a recognized technique for solving sequential decision-making problems. Despite this reputation, the data inefficiency caused by its trial-and-error learning mechanism makes deep reinforcement learning difficult to apply in a wide range of areas. Many methods have been developed for sample-efficient deep reinforcement learning, such as environment modeling, experience transfer, and distributed modifications, among which distributed deep reinforcement learning has shown its potential in various applications, such as human-computer gaming and intelligent transportation. In this paper, we review the state of this exciting field by comparing classical distributed deep reinforcement learning methods and studying the components that are important for efficient distributed learning, ranging from single-player, single-agent distributed deep reinforcement learning to the most complex multi-player, multi-agent settings. Furthermore, we review recently released toolboxes that help realize distributed deep reinforcement learning without requiring many modifications to their non-distributed versions. By analyzing their strengths and weaknesses, we develop and release a multi-player, multi-agent distributed deep reinforcement learning toolbox, which we further validate on Wargame, a complex environment, demonstrating its usability for multi-player, multi-agent distributed deep reinforcement learning under complex games. Finally, we point out challenges and future trends, hoping this brief review can provide a guide or a spark for researchers interested in distributed deep reinforcement learning.
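The toy Python sketch below illustrates the actor-learner pattern at the heart of most distributed deep reinforcement learning systems: actor processes collect experience with the freshest parameters while a learner consumes that experience and broadcasts updates. The scalar "environment" and update rule are deliberately trivial placeholders of ours, not any toolbox's API.

```python
import multiprocessing as mp
import random

def actor(actor_id, theta, param_q, experience_q, n_steps=5):
    """Actor: roll out a toy one-parameter 'environment' with the freshest
    parameters and ship (action, reward) experience back to the learner."""
    for _ in range(n_steps):
        action = theta + random.gauss(0.0, 1.0)
        reward = -(action - 3.0) ** 2        # optimum at theta == 3
        experience_q.put((actor_id, action, reward))
        while not param_q.empty():           # non-blocking parameter refresh
            theta = param_q.get()

def learner(n_actors=2, n_updates=10):
    """Learner: aggregate experience from all actors, update the single
    scalar parameter, and broadcast it back to every actor."""
    param_qs = [mp.Queue() for _ in range(n_actors)]
    experience_q = mp.Queue()
    theta = 0.0
    workers = [mp.Process(target=actor,
                          args=(i, theta, param_qs[i], experience_q))
               for i in range(n_actors)]
    for w in workers:
        w.start()
    for _ in range(n_updates):
        _, action, reward = experience_q.get()
        if reward > -1.0:                    # crude 'move toward good actions'
            theta += 0.1 * (action - theta)
        for q in param_qs:                   # broadcast the new parameters
            q.put(theta)
    for w in workers:
        w.join()
    print("learned parameter:", theta)

if __name__ == "__main__":
    learner()
```

Real systems replace the queues with replay buffers or parameter servers and the scalar update with gradient steps, but the decoupling of data collection from learning is the same.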
We study the problem of efficient semantic segmentation for large-scale 3D point clouds. Because they rely on expensive sampling techniques or computationally heavy pre-/post-processing steps, most existing approaches can only be trained on and operate over small-scale point clouds. In this paper, we introduce RandLA-Net, an efficient and lightweight neural architecture that directly infers per-point semantics for large-scale point clouds. The key to our approach is the use of random point sampling instead of more complex point selection approaches. Although remarkably computation- and memory-efficient, random sampling can discard key features by chance. To overcome this, we introduce a novel local feature aggregation module that progressively increases the receptive field for each 3D point, thereby effectively preserving geometric details. Extensive experiments show that our RandLA-Net can process 1 million points in a single pass, up to 200x faster than existing approaches. Moreover, our RandLA-Net clearly surpasses state-of-the-art approaches for semantic segmentation on two large-scale benchmarks, Semantic3D and SemanticKITTI.
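The PyTorch sketch below contrasts the two ingredients named above: cheap random downsampling, followed by a heavily simplified local aggregation step that pools features from each point's neighbourhood to recover the context random sampling may have dropped. The real local feature aggregation module also uses local spatial encoding and attentive pooling, omitted here for brevity.

```python
import torch

def random_downsample(points, feats, ratio=4):
    """RandLA-Net's key idea: plain random sampling is O(1) per point,
    unlike farthest-point sampling, so it scales to huge point clouds."""
    n = points.shape[0]
    idx = torch.randperm(n)[: n // ratio]
    return points[idx], feats[idx]

def local_aggregation(points, feats, k=16):
    """Simplified stand-in for local feature aggregation: gather the k
    nearest neighbours of each point and max-pool their features, which
    restores local context after aggressive downsampling."""
    dists = torch.cdist(points, points)           # (n, n) pairwise distances
    knn = dists.topk(k, largest=False).indices    # (n, k) neighbour indices
    neighbour_feats = feats[knn]                  # (n, k, c)
    return neighbour_feats.max(dim=1).values      # (n, c) pooled per point

pts = torch.randn(8_000, 3)                       # a point-cloud chunk
fts = torch.randn(8_000, 8)
pts, fts = random_downsample(pts, fts)
fts = local_aggregation(pts, fts)
print(pts.shape, fts.shape)                       # 2000 points kept
```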