Nano-sized unmanned aerial vehicles (UAVs) are well-suited for indoor applications and for operation in close proximity to humans. To enable autonomy, a nano-UAV must be able to self-localize in its operating environment. This is a particularly challenging task due to the limited sensing and compute resources on board. This work presents an online and onboard approach for localization in floor plans annotated with semantic information. Unlike sensor-based maps, floor plans are readily available and do not increase the cost and time of deployment. To overcome the difficulty of localizing in sparse maps, the proposed approach fuses geometric information from miniaturized time-of-flight sensors with semantic cues. The semantic information is extracted from images by deploying a state-of-the-art object detection model on a high-performance multi-core microcontroller onboard the drone, consuming only 2.5 mJ per frame and executing in 38 ms. In our evaluation, we globally localize in a real-world office environment, achieving a 90% success rate. We also release an open-source implementation of our work.
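As a rough illustration of the fusion idea described in this abstract, the sketch below reweights pose hypotheses in a generic particle filter using both time-of-flight ranges and a detected semantic class. The estimator choice and the helpers `expected_ranges` and `semantic_match` are our own assumptions, not the paper's implementation.

```python
import numpy as np

# Hypothetical helpers (assumptions): expected_ranges(pose) ray-casts the
# floor plan from a candidate pose; semantic_match(pose, cls) checks whether
# an object of class cls is annotated near that pose in the plan.
def fuse_update(particles, weights, tof_ranges, detected_class,
                expected_ranges, semantic_match, sigma=0.1):
    """One reweighting step fusing geometric and semantic cues."""
    for i, pose in enumerate(particles):
        # Geometric likelihood: Gaussian error between measured ToF ranges
        # and the ranges predicted by ray-casting the floor plan.
        err = tof_ranges - expected_ranges(pose)
        geo_lik = np.exp(-0.5 * np.sum(err ** 2) / sigma ** 2)
        # Semantic likelihood: favor poses consistent with the detection.
        sem_lik = 1.0 if semantic_match(pose, detected_class) else 0.1
        weights[i] *= geo_lik * sem_lik
    return weights / weights.sum()  # renormalize
```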
A self-contained calibration procedure that can be performed automatically, without additional external sensors or tools, is a significant advantage, especially for complex robotic systems. Here, we show that the kinematics of a multi-fingered robotic hand can be precisely calibrated solely by moving the fingertips pairwise into contact. The only prerequisite is sensitive contact detection, e.g., by torque sensing in the joints (as in our DLR-Hand II) or by tactile skin. The measurement function for a given joint configuration is the distance between the modeled fingertip geometries, while the actual measurement is always zero. In an in-depth analysis, we prove that this contact-based calibration determines all quantities needed for manipulating objects with the hand, i.e., the difference vectors of the fingertips, and that it is as sensitive as a calibration using an external visual tracking system and markers. We describe the complete calibration scheme, including the selection of optimal sample joint configurations and the search motions for establishing the contacts despite the initial kinematic uncertainties. In a real-world calibration experiment with the torque-controlled four-fingered DLR-Hand II, the maximal error can be reduced from 17.7 mm to only 3.7 mm.
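The contact-based measurement model lends itself to a standard nonlinear least-squares formulation: the modeled fingertip distance at each recorded contact configuration is the residual, since the true measurement is always zero. The sketch below assumes a hypothetical `fingertip_distance(params, q)` forward-kinematics routine and is not the authors' code.

```python
import numpy as np
from scipy.optimize import least_squares

def calibrate_hand(params0, contact_configs, fingertip_distance):
    """Estimate kinematic parameters from pairwise fingertip contacts.

    fingertip_distance(params, q) is a hypothetical forward-kinematics
    routine returning the distance between the two contacting fingertip
    surfaces at joint configuration q.
    """
    def residuals(params):
        # The actual measurement is always zero at contact, so the modeled
        # distance itself is the residual to be driven to zero.
        return np.array([fingertip_distance(params, q)
                         for q in contact_configs])
    return least_squares(residuals, params0).x
```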
Small-sized unmanned aerial vehicles (UAVs) have been widely investigated for use in a variety of applications, such as remote sensing and aerial surveying. Direct three-dimensional (3D) mapping using a small-sized UAV equipped with a laser scanner is required for numerous remote sensing applications. In direct 3D mapping, precise information about the position and attitude of the UAV is necessary for constructing 3D maps. In this study, we propose a novel and robust technique for estimating the position and attitude of small-sized UAVs by employing multiple low-cost and lightweight global navigation satellite system (GNSS) antennas/receivers. Using the redundancy of multiple GNSS receivers, we enhance the performance of real-time kinematic (RTK) GNSS with single-frequency receivers. The method consists of two approaches: hybrid GNSS fix solutions and a consistency examination of the GNSS signal strength. The fix rate of single-frequency RTK-GNSS can be greatly enhanced by combining the RTK-GNSS fix solutions obtained from the multiple antennas. In addition, positioning accuracy and fix rate can be further improved by detecting multipath signals with the multiple GNSS antennas. We developed a prototype UAV equipped with six GNSS antennas/receivers. From the static test results, we conclude that the proposed technique can enhance the accuracy of position and attitude estimation in multipath environments. In the flight test, the proposed system generated a 3D map with an accuracy of 5 cm.
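A minimal sketch of the two ideas, under our own assumptions: per-antenna RTK fixes are screened by a carrier-to-noise (C/N0) consistency check to reject likely multipath, and the surviving fixes are combined (here, by simple averaging; the actual hybrid fix logic would also exploit the known antenna geometry).

```python
import numpy as np

def fuse_antenna_fixes(solutions, cn0, cn0_nominal=45.0, cn0_tol=6.0):
    """Combine per-antenna RTK-GNSS solutions (illustrative only).

    solutions: list of (position, is_fixed) tuples, one per antenna
    cn0: measured carrier-to-noise density per antenna, in dB-Hz
    """
    accepted = []
    for (pos, is_fixed), c in zip(solutions, cn0):
        # Consistency check: a large C/N0 deviation from the nominal
        # open-sky value suggests a multipath-contaminated signal.
        if is_fixed and abs(c - cn0_nominal) < cn0_tol:
            accepted.append(pos)
    if not accepted:
        return None  # no trustworthy fix this epoch
    return np.mean(accepted, axis=0)
```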
With the development of the Internet of Things (IoT), certain IoT devices have the capability to not only accomplish their own tasks but also simultaneously assist other resource-constrained devices. Therefore, this paper considers a device-assisted mobile edge computing system that leverages auxiliary IoT devices to alleviate the computational burden on the edge computing server and enhance the overall system performance. In this study, computationally intensive tasks are decomposed into multiple partitions, and each task partition can be processed in parallel on an IoT device or the edge server. The objective of this research is to develop an efficient online algorithm that addresses the joint optimization of task partitioning and parallel scheduling under time-varying system states, a setting that poses challenges to conventional numerical optimization methods. To address these challenges, a framework called online task partitioning action and parallel scheduling policy generation (OTPPS) is proposed, which is based on deep reinforcement learning (DRL). Specifically, the framework leverages a deep neural network (DNN) to learn the optimal partitioning action for each task by mapping input states. Furthermore, it is demonstrated that the remaining parallel scheduling problem is NP-hard once a specific task partitioning action is fixed. To address this subproblem, a fair and delay-minimized task scheduling (FDMTS) algorithm is designed. Extensive evaluation results demonstrate that OTPPS achieves near-optimal average delay performance and consistently high fairness levels in various environmental states compared to other baseline schemes.
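To make the two-level structure concrete, here is a minimal sketch under our own assumptions: a small policy network maps the observed state to a partitioning action, and a greedy list scheduler stands in for the FDMTS algorithm (whose exact rules the abstract does not detail).

```python
import torch
import torch.nn as nn

class PartitionNet(nn.Module):
    """DNN mapping the system state to a task-partitioning action
    (here, a vector of partition-size fractions)."""
    def __init__(self, state_dim, n_partitions):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(state_dim, 64), nn.ReLU(),
            nn.Linear(64, n_partitions),
        )

    def forward(self, state):
        return torch.softmax(self.net(state), dim=-1)

def greedy_schedule(partition_loads, speeds):
    """Greedy stand-in for FDMTS: largest partition first, each assigned
    to the resource (device or edge server) that finishes it earliest."""
    finish = [0.0] * len(speeds)
    assignment = []
    for load in sorted(partition_loads, reverse=True):
        j = min(range(len(speeds)), key=lambda k: finish[k] + load / speeds[k])
        finish[j] += load / speeds[j]
        assignment.append(j)
    return assignment, max(finish)  # makespan approximates task delay
```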
Publish/subscribe systems play a key role in enabling communication between numerous devices in distributed and large-scale architectures. While widely adopted, securing such systems often trades portability for additional integrity and attestation guarantees. Trusted Execution Environments (TEEs) offer a potential solution, using enclaves to enhance security and trust. However, application development for TEEs is complex, and many existing solutions are tied to specific TEE architectures, limiting adaptability. Current communication protocols also inadequately manage attestation proofs or expose essential attestation information. This paper introduces a novel approach to these issues using WebAssembly, a key enabling technology that is currently capturing the attention of academia and industry. We present the design of a portable and fully attested publish/subscribe middleware system as a holistic approach for trustworthy and distributed communication between various systems. Based on this proposal, we have implemented and evaluated in depth a fully-fledged publish/subscribe broker running within Intel SGX, compiled to WebAssembly, and built on top of industry battle-tested frameworks and standards, i.e., the MQTT and TLS protocols. Our extended TLS protocol preserves the privacy of attestation information, among other benefits. Our experimental results show modest overheads, revealing a 1.55x decrease in message throughput when using a trusted broker. We open-source the contributions of this work to the research community to facilitate experimental reproducibility.
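For orientation, the snippet below shows the standard MQTT-over-TLS client stack that such a broker builds on, using the paho-mqtt library; the attestation-aware TLS extension described here is not part of stock tooling, and the endpoint and certificate paths are placeholders.

```python
import paho.mqtt.client as mqtt

def on_message(client, userdata, msg):
    print(f"{msg.topic}: {msg.payload.decode()}")

client = mqtt.Client()
client.tls_set(ca_certs="ca.crt")            # verify the broker certificate
client.on_message = on_message
client.connect("broker.example.org", 8883)   # placeholder endpoint
client.subscribe("sensors/#")
client.publish("sensors/temp", "21.5")
client.loop_forever()
```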
Deep Material Networks (DMNs) have recently emerged as a data-driven surrogate model for heterogeneous materials. Given a particular microstructural morphology, the effective linear and nonlinear behaviors can be successfully approximated by such a physics-based, neural-network-like architecture. In this work, a novel micromechanics-informed parametric DMN (MIpDMN) architecture is proposed for multiscale materials with a varying microstructure characterized by several parameters. A single-layer feedforward neural network is used to account for the dependence of the DMN fitting parameters on the microstructural ones. Micromechanical constraints are prescribed on both the architecture and the outputs of this new neural network. The proposed MIpDMN is also recast in a multiple-physics setting, where physical properties other than mechanical ones can also be predicted. In numerical simulations conducted on three parameterized microstructures, MIpDMN demonstrates satisfactory generalization capabilities when the morphology varies. The effective behaviors of such parametric multiscale materials can thus be predicted and encoded by MIpDMN with high accuracy and efficiency.
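A minimal sketch of the parametric coupling, as we understand it from the abstract: a single-layer feedforward network maps microstructural parameters to the DMN fitting parameters, with a softmax standing in for the micromechanical constraints (the actual constraints are more elaborate).

```python
import torch
import torch.nn as nn

class ParametricCoupling(nn.Module):
    """Single-layer feedforward map from microstructural parameters
    (e.g., fiber volume fraction) to DMN fitting parameters."""
    def __init__(self, n_micro, n_dmn):
        super().__init__()
        self.layer = nn.Linear(n_micro, n_dmn)

    def forward(self, micro_params):
        raw = self.layer(micro_params)
        # Illustrative constraint: weight-like outputs are kept
        # non-negative and normalized to sum to one.
        return torch.softmax(raw, dim=-1)
```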
Human activity recognition (HAR) is a key challenge in pervasive computing, and solutions have been proposed based on various disciplines. Specifically, for HAR in a smart space without privacy and accessibility issues, data streams generated by deployed pervasive sensors are leveraged. In this paper, we focus on group activities, in which a group of users performs a collaborative task without user identification, and propose an efficient group activity recognition scheme that extracts causality patterns from the pervasive sensor event sequences generated by the group, achieving recognition accuracy as good as that of state-of-the-art graphical models. To filter out irrelevant noise events from a given data stream, a set of rules is leveraged to highlight causally related events. Then, a pattern-tree algorithm extracts frequent causal patterns by means of a growing tree structure. Based on the extracted patterns, a weighted sum-based pattern matching algorithm computes the likelihoods of the stored group activities for a given test event sequence from the matched event pattern counts, enabling group activity recognition. We evaluate the proposed scheme using data collected from our testbed and from the CASAS datasets, in which users perform their daily tasks, and validate its effectiveness in a real environment. Experimental results show that the proposed scheme achieves higher recognition accuracy with a smaller runtime overhead than existing schemes.
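The matching step can be illustrated with a short sketch, under our own assumptions about the data structures: each stored activity is scored by the weighted count of its causal patterns that also occur in the test sequence.

```python
from collections import Counter

def recognize(test_patterns, activity_patterns, weights):
    """Score each stored group activity by the weighted count of its
    causal patterns that also appear in the test event sequence."""
    test_counts = Counter(test_patterns)
    scores = {}
    for activity, patterns in activity_patterns.items():
        scores[activity] = sum(
            weights.get(p, 1.0) * min(test_counts[p], c)
            for p, c in Counter(patterns).items()
        )
    return max(scores, key=scores.get)
```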
Data-driven machine learning approaches are being increasingly used to solve partial differential equations (PDEs). They have shown particularly striking successes when training an operator, which takes as input a PDE in some family, and outputs its solution. However, the architectural design space, especially given structural knowledge of the PDE family of interest, is still poorly understood. We seek to remedy this gap by studying the benefits of weight-tied neural network architectures for steady-state PDEs. To achieve this, we first demonstrate that the solution of most steady-state PDEs can be expressed as a fixed point of a non-linear operator. Motivated by this observation, we propose FNO-DEQ, a deep equilibrium variant of the FNO architecture that directly solves for the solution of a steady-state PDE as the infinite-depth fixed point of an implicit operator layer using a black-box root solver and differentiates analytically through this fixed point, resulting in $\mathcal{O}(1)$ training memory. Our experiments indicate that FNO-DEQ-based architectures outperform FNO-based baselines with $4\times$ the number of parameters in predicting the solution to steady-state PDEs such as Darcy Flow and steady-state incompressible Navier-Stokes. Finally, we show that FNO-DEQ is more robust than the FNO-based baselines when trained on datasets with noisier observations, demonstrating the benefits of using appropriate inductive biases in architectural design for different neural network based PDE solvers. Further, we show a universal approximation result demonstrating that FNO-DEQ can approximate the solution to any steady-state PDE that can be written as a fixed point equation.
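The deep-equilibrium mechanism at the core of FNO-DEQ can be sketched generically as follows; this is the standard DEQ recipe, not the authors' exact code, and it uses a plain fixed-point iteration with a one-step backward approximation in place of a full implicit-function-theorem solve.

```python
import torch

def deq_fixed_point(f, x, z0, max_iters=50, tol=1e-4):
    """Solve z* = f(z*, x) and return it attached to the autograd graph."""
    # Forward: black-box fixed-point iteration; no graph is stored, so
    # memory does not grow with the number of solver iterations.
    with torch.no_grad():
        z = z0
        for _ in range(max_iters):
            z_new = f(z, x)
            if (z_new - z).norm() < tol:
                z = z_new
                break
            z = z_new
    # Backward: one differentiable application re-attaches z* to the graph
    # (a one-step approximation; the exact implicit backward additionally
    # solves a linear system in the Jacobian of f at z*).
    return f(z.detach(), x)
```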
Multi-modal 3D scene understanding has gained considerable attention due to its wide applications in many areas, such as autonomous driving and human-computer interaction. Compared to conventional single-modal 3D understanding, introducing an additional modality not only elevates the richness and precision of scene interpretation but also ensures a more robust and resilient understanding. This becomes especially crucial in varied and challenging environments where solely relying on 3D data might be inadequate. While there has been a surge in the development of multi-modal 3D methods over the past three years, especially those integrating multi-camera images (3D+2D) and textual descriptions (3D+language), a comprehensive and in-depth review is notably absent. In this article, we present a systematic survey of recent progress to bridge this gap. We begin with a brief background that formally defines the various 3D multi-modal tasks and summarizes their inherent challenges. After that, we present a novel taxonomy that delivers a thorough categorization of existing methods according to modalities and tasks, exploring their respective strengths and limitations. Furthermore, comparative results of recent approaches on several benchmark datasets, together with insightful analysis, are offered. Finally, we discuss the unresolved issues and provide several potential avenues for future research.
The development of unmanned aerial vehicles (UAVs) has been gaining momentum in recent years owing to technological advances and a significant reduction in their cost. UAV technology can be used in a wide range of domains, including communication, agriculture, security, and transportation. It may be useful to group UAVs into clusters/flocks in certain domains, and various challenges associated with UAV usage can be alleviated by clustering. Several computational challenges arise in UAV flock management, which can be solved by using machine learning (ML) methods. In this survey, we describe the basic terms relating to UAVs and modern ML methods, and we provide an overview of related tutorials and surveys. We subsequently consider the different challenges that appear in UAV flocks. For each issue, we survey several machine learning-based methods that have been suggested in the literature to handle the associated challenges. Thereafter, we describe various open issues in which ML can be applied to solve the different challenges of flocks, and we suggest means of using ML methods for this purpose. This comprehensive review may be useful for both researchers and developers in providing a wide view of various aspects of state-of-the-art ML technologies that are applicable to flock management.
Deployment of Internet of Things (IoT) devices and data fusion techniques have gained popularity in public and government domains. This usually requires capturing and consolidating data from multiple sources. As datasets do not necessarily originate from identical sensors, fusion typically results in a complex data problem. Because the military is investigating how heterogeneous IoT devices can aid its processes and tasks, we investigate a multi-sensor approach. Moreover, we propose a signal-to-image encoding approach that transforms and fuses signal data from IoT wearable devices into an image that is invertible and easier to visualize, supporting decision making. Furthermore, we investigate the challenge of enabling intelligent identification and detection operations and demonstrate the feasibility of the proposed Deep Learning and Anomaly Detection models in supporting future applications that utilize hand-gesture data from wearable devices.
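As an illustration of what an invertible signal-to-image encoding can look like, here is a deliberately simple sketch of our own: each sensor channel is min-max normalized and tiled into an image plane, and the stored ranges and lengths make the mapping exactly invertible. The paper's actual encoding may differ.

```python
import numpy as np

def signals_to_image(signals, height, width):
    """Encode fused sensor channels into an H x W x C image (invertible)."""
    planes, ranges = [], []
    for s in signals:
        lo, hi = float(s.min()), float(s.max())
        norm = (s - lo) / (hi - lo + 1e-12)              # min-max normalize
        planes.append(np.resize(norm, (height, width)))  # tile into a plane
        ranges.append((lo, hi))
    return np.stack(planes, axis=-1), ranges

def image_to_signals(image, ranges, lengths):
    """Invert the encoding back to the original signal values."""
    out = []
    for c, ((lo, hi), n) in enumerate(zip(ranges, lengths)):
        flat = image[..., c].ravel()[:n]
        out.append(flat * (hi - lo + 1e-12) + lo)
    return out
```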