The Unmanned Aerial Vehicle (UAV) swarm networks will play a crucial role in the B5G/6G network thanks to its appealing features, such as wide coverage and on-demand deployment. Emergency communication (EC) is essential to promptly inform UAVs of potential danger to avoid accidents, whereas the conventional communication-only feedback-based methods, which separate the digital and physical identities (DPI), bring intolerable latency and disturb the unintended receivers. In this paper, we present a novel DPI-Mapping solution to match the identities (IDs) of UAVs from dual domains for visual networking, which is the first solution that enables UAVs to communicate promptly with what they see without the tedious exchange of beacons. The IDs are distinguished dynamically by defining feature similarity, and the asymmetric IDs from different domains are matched via the proposed bio-inspired matching algorithm. We also consider Kalman filtering to combine the IDs and predict the states for accurate mapping. Experiment results show that the DPI-Mapping reduces individual inaccuracy of features and significantly outperforms the conventional broadcast-based and feedback-based methods in EC latency. Furthermore, it also reduces the disturbing messages without sacrificing the hit rate.
Progressing towards a new era of Artificial Intelligence (AI) - enabled wireless networks, concerns regarding the environmental impact of AI have been raised both in industry and academia. Federated Learning (FL) has emerged as a key privacy preserving decentralized AI technique. Despite efforts currently being made in FL, its environmental impact is still an open problem. Targeting the minimization of the overall energy consumption of an FL process, we propose the orchestration of computational and communication resources of the involved devices to minimize the total energy required, while guaranteeing a certain performance of the model. To this end, we propose a Soft Actor Critic Deep Reinforcement Learning (DRL) solution, where a penalty function is introduced during training, penalizing the strategies that violate the constraints of the environment, and ensuring a safe RL process. A device level synchronization method, along with a computationally cost effective FL environment are proposed, with the goal of further reducing the energy consumption and communication overhead. Evaluation results show the effectiveness of the proposed scheme compared to four state-of-the-art baseline solutions in both static and dynamic environments, achieving a decrease of up to 94% in the total energy consumption.
Neural ODEs demonstrate strong performance in generative and time-series modelling. However, training them via the adjoint method is slow compared to discrete models due to the requirement of numerically solving ODEs. To speed neural ODEs up, a common approach is to regularise the solutions. However, this approach may affect the expressivity of the model; when the trajectory itself matters, this is particularly important. In this paper, we propose an alternative way to speed up the training of neural ODEs. The key idea is to speed up the adjoint method by using Gau{\ss}-Legendre quadrature to solve integrals faster than ODE-based methods while remaining memory efficient. We also extend the idea to training SDEs using the Wong-Zakai theorem, by training a corresponding ODE and transferring the parameters. Our approach leads to faster training of neural ODEs, especially for large models. It also presents a new way to train SDE-based models.
This paper presents a new Metacognitive Decision Making (MDM) framework inspired by human-like metacognitive principles. The MDM framework is incorporated in unmanned aerial vehicles (UAVs) deployed for decentralized stochastic search without communication for detecting stationary targets (fixed/sudden pop-up) and dynamic targets. The UAVs are equipped with multiple sensors (varying sensing capability) and search for targets in a largely unknown area. The MDM framework consists of a metacognitive component and a self-cognitive component. The metacognitive component helps to self-regulate the search with multiple sensors addressing the issues of "which-sensor-to-use", "when-to-switch-sensor", and "how-to-search". Each sensor possesses inverse characteristics for the sensing attributes like sensing range and accuracy. Based on the information gathered by multiple sensors carried by each UAV, the self-cognitive component regulates different levels of stochastic search and switching levels for effective searching. The lower levels of search aim to localize the search space for the possible presence of a target (detection) with different sensors. The highest level of a search exploits the search space for target confirmation using the sensor with the highest accuracy among all sensors. The performance of the MDM framework with two sensors having low accuracy with wide range sensor for detection and increased accuracy with low range sensor for confirmation is evaluated through Monte-Carlo simulations and compared with six multi-UAV stochastic search algorithms (three self-cognitive searches and three self and social-cognitive based search). The results indicate that the MDM framework is efficient in detecting and confirming targets in an unknown environment.
Machine Reading Comprehension (MRC) models tend to take advantage of spurious correlations (also known as dataset bias or annotation artifacts in the research community). Consequently, these models may perform the MRC task without fully comprehending the given context and question, which is undesirable since it may result in low robustness against distribution shift. This paper delves into the concept of answer-position bias, where a significant percentage of training questions have answers located solely in the first sentence of the context. We propose a Single-Sentence Reader as a new approach for addressing answer position bias in MRC. We implement this approach using six different models and thoroughly analyze their performance. Remarkably, our proposed Single-Sentence Readers achieve results that nearly match those of models trained on conventional training sets, proving their effectiveness. Our study also discusses several challenges our Single-Sentence Readers encounter and proposes a potential solution.
This work addresses 3D human pose reconstruction in single images. We present a method that combines Forward Kinematics (FK) with neural networks to ensure a fast and valid prediction of 3D pose. Pose is represented as a hierarchical tree/graph with nodes corresponding to human joints that model their physical limits. Given a 2D detection of keypoints in the image, we lift the skeleton to 3D using neural networks to predict both the joint rotations and bone lengths. These predictions are then combined with skeletal constraints using an FK layer implemented as a network layer in PyTorch. The result is a fast and accurate approach to the estimation of 3D skeletal pose. Through quantitative and qualitative evaluation, we demonstrate the method is significantly more accurate than MediaPipe in terms of both per joint positional error and visual appearance. Furthermore, we demonstrate generalization over different datasets. The implementation in PyTorch runs at between 100-200 milliseconds per image (including CNN detection) using CPU only.
Graph neural networks (GNNs) is widely used to learn a powerful representation of graph-structured data. Recent work demonstrates that transferring knowledge from self-supervised tasks to downstream tasks could further improve graph representation. However, there is an inherent gap between self-supervised tasks and downstream tasks in terms of optimization objective and training data. Conventional pre-training methods may be not effective enough on knowledge transfer since they do not make any adaptation for downstream tasks. To solve such problems, we propose a new transfer learning paradigm on GNNs which could effectively leverage self-supervised tasks as auxiliary tasks to help the target task. Our methods would adaptively select and combine different auxiliary tasks with the target task in the fine-tuning stage. We design an adaptive auxiliary loss weighting model to learn the weights of auxiliary tasks by quantifying the consistency between auxiliary tasks and the target task. In addition, we learn the weighting model through meta-learning. Our methods can be applied to various transfer learning approaches, it performs well not only in multi-task learning but also in pre-training and fine-tuning. Comprehensive experiments on multiple downstream tasks demonstrate that the proposed methods can effectively combine auxiliary tasks with the target task and significantly improve the performance compared to state-of-the-art methods.
Vast amount of data generated from networks of sensors, wearables, and the Internet of Things (IoT) devices underscores the need for advanced modeling techniques that leverage the spatio-temporal structure of decentralized data due to the need for edge computation and licensing (data access) issues. While federated learning (FL) has emerged as a framework for model training without requiring direct data sharing and exchange, effectively modeling the complex spatio-temporal dependencies to improve forecasting capabilities still remains an open problem. On the other hand, state-of-the-art spatio-temporal forecasting models assume unfettered access to the data, neglecting constraints on data sharing. To bridge this gap, we propose a federated spatio-temporal model -- Cross-Node Federated Graph Neural Network (CNFGNN) -- which explicitly encodes the underlying graph structure using graph neural network (GNN)-based architecture under the constraint of cross-node federated learning, which requires that data in a network of nodes is generated locally on each node and remains decentralized. CNFGNN operates by disentangling the temporal dynamics modeling on devices and spatial dynamics on the server, utilizing alternating optimization to reduce the communication cost, facilitating computations on the edge devices. Experiments on the traffic flow forecasting task show that CNFGNN achieves the best forecasting performance in both transductive and inductive learning settings with no extra computation cost on edge devices, while incurring modest communication cost.
Convolutional neural networks (CNNs) have shown dramatic improvements in single image super-resolution (SISR) by using large-scale external samples. Despite their remarkable performance based on the external dataset, they cannot exploit internal information within a specific image. Another problem is that they are applicable only to the specific condition of data that they are supervised. For instance, the low-resolution (LR) image should be a "bicubic" downsampled noise-free image from a high-resolution (HR) one. To address both issues, zero-shot super-resolution (ZSSR) has been proposed for flexible internal learning. However, they require thousands of gradient updates, i.e., long inference time. In this paper, we present Meta-Transfer Learning for Zero-Shot Super-Resolution (MZSR), which leverages ZSSR. Precisely, it is based on finding a generic initial parameter that is suitable for internal learning. Thus, we can exploit both external and internal information, where one single gradient update can yield quite considerable results. (See Figure 1). With our method, the network can quickly adapt to a given image condition. In this respect, our method can be applied to a large spectrum of image conditions within a fast adaptation process.
Convolutional networks (ConvNets) have achieved great successes in various challenging vision tasks. However, the performance of ConvNets would degrade when encountering the domain shift. The domain adaptation is more significant while challenging in the field of biomedical image analysis, where cross-modality data have largely different distributions. Given that annotating the medical data is especially expensive, the supervised transfer learning approaches are not quite optimal. In this paper, we propose an unsupervised domain adaptation framework with adversarial learning for cross-modality biomedical image segmentations. Specifically, our model is based on a dilated fully convolutional network for pixel-wise prediction. Moreover, we build a plug-and-play domain adaptation module (DAM) to map the target input to features which are aligned with source domain feature space. A domain critic module (DCM) is set up for discriminating the feature space of both domains. We optimize the DAM and DCM via an adversarial loss without using any target domain label. Our proposed method is validated by adapting a ConvNet trained with MRI images to unpaired CT data for cardiac structures segmentations, and achieved very promising results.
We propose a novel single shot object detection network named Detection with Enriched Semantics (DES). Our motivation is to enrich the semantics of object detection features within a typical deep detector, by a semantic segmentation branch and a global activation module. The segmentation branch is supervised by weak segmentation ground-truth, i.e., no extra annotation is required. In conjunction with that, we employ a global activation module which learns relationship between channels and object classes in a self-supervised manner. Comprehensive experimental results on both PASCAL VOC and MS COCO detection datasets demonstrate the effectiveness of the proposed method. In particular, with a VGG16 based DES, we achieve an mAP of 81.7 on VOC2007 test and an mAP of 32.8 on COCO test-dev with an inference speed of 31.5 milliseconds per image on a Titan Xp GPU. With a lower resolution version, we achieve an mAP of 79.7 on VOC2007 with an inference speed of 13.0 milliseconds per image.