Autonomous robot navigation in dynamic unknown environments is of crucial significance for mobile robotic applications, including last-mile delivery and robot-enabled automated supply in industrial and hospital settings. Current solutions still suffer from limitations: robots cannot recognize unknown objects in real time and cannot navigate freely in dynamic, narrow, and complex environments. We propose a complete software framework for autonomous robot perception and navigation amid dense obstacles and dense human crowds. First, we propose a framework that accurately detects and segments open-world object categories in a zero-shot manner, overcoming the over-segmentation limitation of the current SAM model. Second, we propose a distillation strategy that transfers the knowledge needed to segment the free space of the walkway for robot navigation without labels. Meanwhile, we design a trimming strategy that works in concert with distillation to enable lightweight inference, so the network can be deployed on edge devices such as the NVIDIA TX2 or Xavier NX during autonomous navigation. Integrated into the robot navigation system, the proposed framework achieves superior accuracy and efficiency in robot scene perception and autonomous navigation, as extensive experiments demonstrate.
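To make the distillation-plus-trimming idea concrete, here is a minimal sketch of a label-free, soft-target distillation loss paired with a magnitude-based channel-importance score for trimming; the function names, temperature, and pruning criterion are illustrative assumptions, not the authors' implementation.

```python
import torch
import torch.nn.functional as F

def distillation_loss(student_logits, teacher_logits, temperature=2.0):
    """Soft-target distillation: the lightweight student mimics the frozen
    teacher's per-pixel free-space probabilities, so no ground-truth labels
    are required (illustrative sketch)."""
    t = temperature
    teacher_prob = F.softmax(teacher_logits / t, dim=1)
    student_logp = F.log_softmax(student_logits / t, dim=1)
    # KL divergence between per-pixel class distributions, scaled by t^2
    return F.kl_div(student_logp, teacher_prob, reduction="batchmean") * (t * t)

def channel_importance(conv_weight: torch.Tensor) -> torch.Tensor:
    """One common trimming criterion (assumed here): rank convolutional
    output channels by L1 norm and keep only the highest-scoring ones
    to obtain a lighter, edge-deployable model."""
    return conv_weight.abs().sum(dim=(1, 2, 3))  # one score per output channel
```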
This work considers an uplink wireless communication system in which multiple users with multiple antennas transmit data frames over dynamic channels. Previous studies have shown that multiple transmit and receive antennas can substantially enhance the sum-capacity of all users when the channel is known at the transmitter and the transmit and receive antennas are uncorrelated. However, spatial correlations stemming from the close proximity of transmit antennas, and channel variation between pilot and data time slots, known as channel aging, can substantially degrade the transmission rate if they are not properly taken into account. In this work, we provide an analytical framework to concurrently exploit both of these features. Specifically, we first propose a beamforming framework to capture spatial correlations. Then, based on random matrix theory tools, we introduce a deterministic expression that approximates the average sum-capacity of all users. Subsequently, we obtain the optimal values of pilot spacing and beamforming vectors by maximizing this expression. Simulation results show the impact of path loss, mobile-user velocity, and the Rician factor on the resulting sum-capacity, and underscore the efficacy of our methodology compared to prior works.
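For orientation, the quantity that such a deterministic equivalent approximates is typically the ergodic sum-capacity in log-det form; the notation below is illustrative rather than the paper's: $\mathbf{H}_k$ is user $k$'s channel, $\mathbf{B}_k$ its beamforming matrix, $\rho$ the transmit SNR, and $N$ the number of receive antennas.

```latex
C_{\mathrm{sum}}
  = \mathbb{E}\!\left[\log_2 \det\!\left(\mathbf{I}_N
    + \rho \sum_{k=1}^{K} \mathbf{H}_k \mathbf{B}_k
      \mathbf{B}_k^{\mathsf{H}} \mathbf{H}_k^{\mathsf{H}}\right)\right]
```

Random matrix theory replaces the expectation with a closed-form deterministic expression that becomes tight as the antenna dimensions grow, which is what makes jointly optimizing pilot spacing and beamforming vectors tractable.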
Brain-computer interfaces (BCIs) harness electroencephalographic (EEG) signals for direct neural control of devices, offering a significant benefit to individuals with motor impairments. Traditional machine learning methods for EEG-based motor imagery (MI) classification encounter challenges such as manual feature extraction and susceptibility to noise. This paper introduces EEGEncoder, a deep learning framework that employs modified transformers and temporal convolutional networks (TCNs) to surmount these limitations. We propose a fusion architecture, the Dual-Stream Temporal-Spatial Block (DSTS), that captures both temporal and spatial features, improving the accuracy of the motor imagery classification task. Additionally, we use multiple parallel structures to enhance model performance. When tested on the BCI Competition IV-2a dataset, our model outperforms current state-of-the-art techniques.
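A minimal sketch of what a dual-stream temporal-spatial block could look like in PyTorch follows; the layer sizes, the convolutional spatial stream, and the additive fusion rule are assumptions for illustration, not the paper's exact architecture.

```python
import torch
import torch.nn as nn

class DualStreamBlock(nn.Module):
    """Illustrative dual-stream block: one stream self-attends over time,
    the other convolves across the feature dimension; outputs are fused
    by simple addition (an assumed fusion rule)."""
    def __init__(self, d_model=64, n_heads=4):
        super().__init__()
        self.temporal = nn.TransformerEncoderLayer(
            d_model=d_model, nhead=n_heads, batch_first=True)
        self.spatial = nn.Sequential(
            nn.Conv1d(d_model, d_model, kernel_size=3, padding=1),
            nn.BatchNorm1d(d_model),
            nn.GELU())

    def forward(self, x):                  # x: (batch, time, d_model)
        t = self.temporal(x)               # temporal stream (attention over time)
        s = self.spatial(x.transpose(1, 2)).transpose(1, 2)  # spatial stream
        return t + s                       # additive fusion

x = torch.randn(8, 250, 64)               # e.g., 8 trials, 250 time steps
print(DualStreamBlock()(x).shape)          # torch.Size([8, 250, 64])
```

Several such blocks could then be stacked in parallel branches, matching the abstract's mention of multiple parallel structures.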
Advancements in 6G wireless technology have elevated the importance of beamforming, especially for attaining ultra-high data rates via millimeter-wave (mmWave) frequency deployment. Although promising, mmWave bands require substantial beam training to achieve precise beamforming. While initial deep learning models that use RGB camera images showed promise in reducing beam training overhead, their performance suffers from sensitivity to lighting and environmental variations. Due to this sensitivity, Quality of Service (QoS) fluctuates, ultimately affecting the stability and dependability of networks in dynamic environments. This underscores a critical need for more robust solutions. This paper proposes a robust beamforming technique to ensure consistent QoS under varying environmental conditions. We formulate an optimization problem to maximize users' data rates and, because it is NP-hard, decompose it into two subproblems: semantic localization and optimal beam selection. To solve the semantic localization problem, we propose a novel method that leverages k-means clustering and the YOLOv8 model. To solve the beam selection problem, we propose a novel lightweight hybrid architecture that fuses various data sources through a weighted entropy-based mechanism to predict the optimal beams. Since rapid and accurate beam predictions are needed to maintain QoS, we propose a novel metric, Accuracy-Complexity Efficiency (ACE), to quantify this trade-off. We develop six testing scenarios to evaluate the robustness of the proposed model. Finally, simulation results demonstrate that the proposed model outperforms several state-of-the-art baselines in beam prediction accuracy, received power, and ACE across the developed test scenarios.
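One plausible reading of a weighted entropy-based fusion mechanism is sketched below: data sources whose beam distributions are more confident (lower entropy) receive larger weights. The weighting rule and the toy inputs are assumptions for illustration, not the paper's exact scheme.

```python
import numpy as np

def entropy(p, eps=1e-12):
    """Shannon entropy of a discrete beam-probability distribution."""
    return -np.sum(p * np.log(p + eps))

def fuse_beam_predictions(source_probs):
    """Weighted entropy-based fusion (illustrative): weight each source
    inversely to its predictive entropy, then pick the argmax beam."""
    weights = np.array([1.0 / (entropy(p) + 1e-6) for p in source_probs])
    weights /= weights.sum()
    fused = sum(w * p for w, p in zip(weights, source_probs))
    return int(np.argmax(fused))           # index of the predicted optimal beam

rgb = np.array([0.70, 0.20, 0.10])         # confident camera-based prediction
pos = np.array([0.40, 0.35, 0.25])         # less certain position-based prediction
print(fuse_beam_predictions([rgb, pos]))   # -> 0
```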
Edge computing processes data near its source, reducing latency and enhancing security compared to traditional cloud computing while retaining many of its benefits. This paper explores edge computing for migrating an existing safety-critical robotics use case from a dedicated onboard hardware solution. We propose an edge robotics architecture based on Linux, Docker containers, Kubernetes, and a local wireless area network based on the TTWiFi protocol. Inspired by previous work on real-time cloud computing, we complement the architecture with a resource management and orchestration layer that helps Linux manage, and Kubernetes orchestrate, system-wide shared resources (e.g., caches, memory bandwidth, and network). Our architecture aims to ensure fault-tolerant and predictable execution of robotic applications (e.g., path planning) on the edge while upper-bounding end-to-end latency and providing the best possible quality of service without jeopardizing safety and security.
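As a sketch of how such resource pinning might be expressed, the snippet below uses the official Kubernetes Python client to declare a pod whose CPU and memory requests equal its limits, which places it in the "Guaranteed" QoS class and helps bound interference; the container name, image, and priority class are hypothetical, and this is one possible configuration rather than the paper's exact setup.

```python
from kubernetes import client

# Equal requests and limits -> Kubernetes "Guaranteed" QoS class,
# reducing eviction risk and resource contention for the critical pod.
container = client.V1Container(
    name="path-planner",                        # hypothetical application
    image="registry.local/path-planner:1.0",    # hypothetical image
    resources=client.V1ResourceRequirements(
        requests={"cpu": "2", "memory": "1Gi"},
        limits={"cpu": "2", "memory": "1Gi"},
    ),
)
pod = client.V1Pod(
    metadata=client.V1ObjectMeta(name="planner", labels={"tier": "safety"}),
    spec=client.V1PodSpec(
        containers=[container],
        priority_class_name="system-node-critical",  # favors the safety workload
    ),
)
```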
Multi-robot target tracking has extensive applications in scenarios such as environmental surveillance and wildfire management, which require robust deployment of multi-robot systems in uncertain and dangerous environments. Traditional approaches often focus on tracking accuracy without modeling or making assumptions about the environment, neglecting potential environmental hazards that can cause system failures in real-world deployments. To address this challenge, we investigate multi-robot target tracking in adversarial environments, considering sensing and communication attacks under uncertainty. We design specific strategies to avoid different danger zones and propose a multi-agent tracking framework for such perilous environments. We approximate the probabilistic constraints and formulate practical optimization strategies to address the computational challenges efficiently. We evaluate the performance of our proposed methods in simulation, demonstrating the ability of robots to adjust their risk-aware behaviors under different levels of environmental uncertainty and risk confidence. The proposed method is further validated through real-world robot experiments in which a team of drones successfully tracks dynamic ground robots while remaining risk-aware of sensing and/or communication danger zones.
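A standard way to approximate such probabilistic constraints, sketched here under Gaussian position noise, is to tighten a danger-zone avoidance constraint into a deterministic one by inflating the zone radius with the noise quantile; the formulation and parameter names are illustrative, not the paper's exact method.

```python
import numpy as np
from scipy.stats import norm

def deterministic_safety_margin(zone_radius, sigma, confidence=0.95):
    """Gaussian chance-constraint tightening (illustrative): enforcing
    P(||x - c|| >= r) >= confidence under position noise N(0, sigma^2 I)
    conservatively, by inflating the zone radius with the noise quantile."""
    return zone_radius + norm.ppf(confidence) * sigma

def is_safe(position, zone_center, zone_radius, sigma, confidence=0.95):
    margin = deterministic_safety_margin(zone_radius, sigma, confidence)
    return np.linalg.norm(np.asarray(position) - np.asarray(zone_center)) >= margin

# A robot at (3, 4) vs. a danger zone of radius 2 at the origin:
print(is_safe([3.0, 4.0], [0.0, 0.0], zone_radius=2.0, sigma=0.5))  # True
```

Raising the confidence level enlarges the margin, which is exactly the risk-aware behavior adjustment the evaluation describes.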
Diffusion models have enabled remarkably high-quality medical image generation, yet it is challenging to enforce anatomical constraints in generated images. To this end, we propose a diffusion model-based method that supports anatomically controllable medical image generation, by following a multi-class anatomical segmentation mask at each sampling step. We additionally introduce a random mask ablation training algorithm to enable conditioning on a selected combination of anatomical constraints while allowing flexibility in other anatomical areas. We compare our method ("SegGuidedDiff") to existing methods on breast MRI and abdominal/neck-to-pelvis CT datasets with a wide range of anatomical objects. Results show that our method reaches a new state-of-the-art in the faithfulness of generated images to input anatomical masks on both datasets, and is on par for general anatomical realism. Finally, our model also enjoys the extra benefit of being able to adjust the anatomical similarity of generated images to real images of choice through interpolation in its latent space. SegGuidedDiff has many applications, including cross-modality translation, and the generation of paired or counterfactual data. Our code is available at https://github.com/mazurowski-lab/segmentation-guided-diffusion.
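The random mask ablation idea can be sketched as a training-time transform on the conditioning mask, as below; this is one plausible reading, with the drop probability and function name assumed for illustration rather than taken from the released code.

```python
import torch

def ablate_mask_classes(mask: torch.Tensor, keep_prob: float = 0.5) -> torch.Tensor:
    """Illustrative random mask ablation: during training, each anatomical
    class in the conditioning segmentation mask is independently dropped
    (set to background, 0) with probability 1 - keep_prob, so the model
    learns to condition on arbitrary subsets of anatomical constraints."""
    out = mask.clone()
    for cls in mask.unique():
        if cls.item() == 0:          # 0 = background, never ablated
            continue
        if torch.rand(()) > keep_prob:
            out[out == cls] = 0
    return out

mask = torch.randint(0, 4, (1, 64, 64))   # toy 3-class anatomical mask
print(ablate_mask_classes(mask).unique()) # a random subset of {0, 1, 2, 3}
```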
With the development of cloud-native technologies, microservice-based software systems face challenges in accurately localizing root causes when failures occur. The cloud-edge collaborative environment introduces additional difficulties, such as unstable networks and high latency across network segments, making accurate root cause identification of microservices in such environments an urgent problem. In this paper, we propose MicroCERCL, a novel approach that pinpoints root causes at both the kernel and application levels in the cloud-edge collaborative environment. Our key insight is that failures propagate through direct invocations and indirect resource-competition dependencies in a cloud-edge collaborative environment characterized by instability and high latency. Propagation becomes even more complex in hybrid deployments that simultaneously involve multiple microservice systems. Leveraging this insight, we extract valid content from kernel-level logs to prioritize localizing the kernel-level root cause. Moreover, we construct a heterogeneous dynamic topology stack and train a graph neural network model to accurately localize the application-level root cause without relying on historical data. Notably, we release the first benchmark hybrid-deployment microservice system in a cloud-edge collaborative environment (to our knowledge, the largest and most complex of its kind). Experiments conducted on the dataset collected from this benchmark show that MicroCERCL can accurately localize the root cause of microservice systems in such environments, significantly outperforming state-of-the-art approaches with an improvement of at least 24.1% in top-1 accuracy.
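To illustrate the propagation insight (not MicroCERCL's GNN itself), the sketch below builds a small dependency graph mixing invocation edges and resource-competition edges, then ranks root-cause candidates with an anomaly-personalized random walk over the reversed graph, a common baseline in root cause analysis; all service names and anomaly scores are hypothetical.

```python
import networkx as nx

# Edges point from caller to callee (direct invocations) and from
# services to shared resources they compete for (indirect dependencies).
G = nx.DiGraph()
G.add_edges_from([
    ("gateway", "orders"), ("orders", "inventory"),
    ("orders", "db-node"), ("inventory", "db-node"),  # resource competition
])

# Hypothetical anomaly scores from monitoring; reversing the graph lets
# "blame" flow backward along propagation paths toward likely causes.
anomaly = {"gateway": 0.9, "orders": 0.8, "inventory": 0.2, "db-node": 0.7}
ranking = nx.pagerank(G.reverse(), personalization=anomaly)
print(max(ranking, key=ranking.get))   # top root-cause candidate
```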
Communication is essential for successful interaction. In human-robot interaction (HRI), implicit communication enhances robots' understanding of human needs, emotions, and intentions. This paper introduces a method to foster implicit communication in HRI without explicitly modeling human intentions or relying on pre-existing knowledge. Leveraging transfer entropy, we modulate influence between agents in social interactions involving either collaboration or competition. By integrating influence into agents' rewards within a partially observable Markov decision process, we demonstrate that boosting influence enhances collaboration or competition performance, while resisting influence diminishes it. Our findings are validated through simulations and real-world experiments with human participants.
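For concreteness, transfer entropy T(X -> Y) measures how much knowing agent X's past reduces uncertainty about agent Y's next state beyond Y's own past. Below is a simple histogram plug-in estimator; it is an illustrative sketch, not the paper's estimator, and the synthetic usage example only demonstrates the expected asymmetry.

```python
import numpy as np
from collections import Counter

def transfer_entropy(x, y, bins=4):
    """Plug-in histogram estimate of T(X -> Y) in bits (illustrative)."""
    xd = np.digitize(x, np.histogram_bin_edges(x, bins)[1:-1])
    yd = np.digitize(y, np.histogram_bin_edges(y, bins)[1:-1])
    triples = list(zip(yd[1:], yd[:-1], xd[:-1]))    # (y_next, y_now, x_now)
    n = len(triples)
    n_abc = Counter(triples)
    n_ab = Counter((a, b) for a, b, _ in triples)    # (y_next, y_now)
    n_bc = Counter((b, c) for _, b, c in triples)    # (y_now, x_now)
    n_b = Counter(b for _, b, _ in triples)          # (y_now,)
    te = 0.0
    for (a, b, c), k in n_abc.items():
        # p(a|b,c) / p(a|b) rewritten with raw counts (sample sizes cancel)
        te += (k / n) * np.log2(k * n_b[b] / (n_bc[(b, c)] * n_ab[(a, b)]))
    return te

rng = np.random.default_rng(0)
x = rng.normal(size=2000)
y = np.roll(x, 1) + 0.5 * rng.normal(size=2000)      # y is driven by past x
print(transfer_entropy(x, y) > transfer_entropy(y, x))  # expected: True
```

An influence term proportional to such an estimate could then be added to (or subtracted from) an agent's reward to boost or resist influence, in the spirit of the reward integration the abstract describes.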
Face recognition technology has advanced significantly in recent years, due largely to the availability of large and increasingly complex training datasets for use in deep learning models. These datasets, however, typically comprise images scraped from news sites or social media platforms and, therefore, have limited utility in more advanced security, forensics, and military applications, which involve lower resolutions, longer ranges, and elevated viewpoints. To meet these critical needs, we collected and curated the first and second subsets of a large multi-modal biometric dataset designed for use in the research and development (R&D) of biometric recognition technologies under extremely challenging conditions. Thus far, the dataset includes more than 350,000 still images and over 1,300 hours of video footage of approximately 1,000 subjects. To collect this data, we used Nikon DSLR cameras, a variety of commercial surveillance cameras, specialized long-range R&D cameras, and Group 1 and Group 2 UAV platforms. The goal is to support the development of algorithms capable of accurately recognizing people at ranges up to 1,000 m and from high angles of elevation. These advances will include improvements to the state of the art in face recognition and will support new research in whole-body recognition using methods based on gait and anthropometry. This paper describes the methods used to collect and curate the dataset, and the dataset's characteristics at its current stage.
Autonomic computing investigates how systems can achieve (user-)specified control outcomes on their own, without the intervention of a human operator. Its fundamentals have been substantially influenced by control theory for closed- and open-loop systems. In practice, complex systems may exhibit a number of concurrent and interdependent control loops. Despite research into autonomic models for managing computer resources, ranging from individual resources (e.g., web servers) to resource ensembles (e.g., multiple resources within a data center), integrating Artificial Intelligence (AI) and Machine Learning (ML) to improve resource autonomy and performance at scale remains a fundamental challenge. Such AI/ML-driven autonomic self-management can be achieved at different levels of granularity, from full automation to human-in-the-loop automation. In this article, leading academics, researchers, practitioners, engineers, and scientists in the fields of cloud computing, AI/ML, and quantum computing discuss current research and potential future directions for these fields. Further, we discuss challenges and opportunities for leveraging AI and ML in next-generation computing for emerging computing paradigms, including cloud, fog, edge, serverless, and quantum computing environments.