Virtual reality (VR) over wireless is expected to be one of the killer applications in next-generation communication networks. Nevertheless, the huge data volume, together with stringent latency and reliability requirements under limited bandwidth resources, makes untethered wireless VR delivery increasingly challenging. These bottlenecks motivate this work to explore the potential of semantic communication, a new paradigm that promises to significantly ease the resource pressure, for efficient VR delivery. To this end, we propose a novel framework, namely WIreless SEmantic deliveRy for VR (WiserVR), for delivering consecutive 360{\deg} video frames to VR users. Specifically, multiple deep learning-based modules are carefully devised for the transceiver in WiserVR to realize high-performance feature extraction and semantic recovery. Among them, we develop the concept of a semantic location graph and leverage joint semantic-channel coding with knowledge sharing to not only substantially reduce communication latency but also guarantee adequate transmission reliability and resilience under various channel states. Moreover, we present the implementation of WiserVR, followed by initial simulations that evaluate its performance against benchmarks. Finally, we discuss several open issues and offer feasible solutions to unlock the full potential of WiserVR.
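As a rough illustration of the joint semantic-channel coding idea behind such a transceiver, the sketch below trains a convolutional encoder/decoder pair end to end through an AWGN channel. It is a minimal PyTorch example under assumed module names, layer sizes, and a 10 dB SNR; it does not reproduce the actual WiserVR modules, semantic location graph, or knowledge-sharing mechanism.

```python
# Minimal sketch of joint semantic-channel coding for one video-frame tile.
# Module names and sizes are illustrative assumptions, not the WiserVR design.
import torch
import torch.nn as nn

class SemanticEncoder(nn.Module):
    """Extracts a compact semantic feature map from a frame tile."""
    def __init__(self, channels=64):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv2d(3, channels, 5, stride=2, padding=2), nn.ReLU(),
            nn.Conv2d(channels, channels, 5, stride=2, padding=2),
        )
    def forward(self, x):
        return self.net(x)

class SemanticDecoder(nn.Module):
    """Recovers the frame tile from the noisy received features."""
    def __init__(self, channels=64):
        super().__init__()
        self.net = nn.Sequential(
            nn.ConvTranspose2d(channels, channels, 4, stride=2, padding=1), nn.ReLU(),
            nn.ConvTranspose2d(channels, 3, 4, stride=2, padding=1), nn.Sigmoid(),
        )
    def forward(self, z):
        return self.net(z)

def awgn(z, snr_db):
    """Pass transmitted features through an AWGN channel at the given SNR."""
    power = z.pow(2).mean()
    noise_var = power / (10 ** (snr_db / 10))
    return z + torch.sqrt(noise_var) * torch.randn_like(z)

encoder, decoder = SemanticEncoder(), SemanticDecoder()
frame = torch.rand(1, 3, 128, 256)           # one equirectangular tile (placeholder)
received = awgn(encoder(frame), snr_db=10)   # channel at 10 dB SNR
recovered = decoder(received)
loss = nn.functional.mse_loss(recovered, frame)  # end-to-end training signal
```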
This work introduces a new perspective on physical-medium sharing in multiuser communication by jointly considering (i) the meaning of the transmitted message and (ii) its function at the end user. Specifically, we define a scenario in which multiple users (sensors) continuously transmit their own states concerning a predetermined event. On the receiver side, an alarm monitoring system decides whether such an event has happened in a certain time period and, if so, at which user. The proposed media access control protocol constitutes an alternative to conventional physical-layer methods: the receiver does not decode the received waveform directly; rather, the relative position of the absence or presence of energy within a multidimensional resource space carries the (semantic) information. The protocol provides high efficiency in multiuser networks that operate with event-triggered sampling by enabling a constructive reconstruction of transmission collisions. We demonstrate that the proposed method achieves better event transmission efficiency than conventional schemes such as TDMA and slotted ALOHA; remarkably, it attains 100\% efficiency and 0\% error probability in almost all the studied cases, consistently outperforming both baselines.
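The toy sketch below conveys only the core idea that the position of energy in a resource grid, rather than a decoded waveform, carries the semantic information. The grid size, detection threshold, and one-slot-per-user mapping are assumptions for illustration; the actual protocol operates over a multidimensional resource space and handles collisions constructively, which is not modeled here.

```python
# Toy model: a monitor infers which users observed an event purely from
# where energy appears in a resource grid (no waveform decoding).
import numpy as np

rng = np.random.default_rng(0)
NUM_USERS = 8   # each user is assigned one slot in the grid (simplification)

def transmit(events):
    """Each user that observed an event places energy in its assigned slot."""
    grid = np.zeros(NUM_USERS)
    for user, has_event in enumerate(events):
        if has_event:
            grid[user] = 1.0
    return grid

def receive(grid, noise_std=0.1, threshold=0.5):
    """The alarm monitor only checks *where* energy is present."""
    observed = grid + rng.normal(0, noise_std, NUM_USERS)
    return np.flatnonzero(observed > threshold)   # slot indices map back to users

events = rng.random(NUM_USERS) < 0.3              # event-triggered sampling
detected = receive(transmit(events))
print("users with events:", np.flatnonzero(events))
print("detected users   :", detected)
```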
Remote monitoring systems analyze environment dynamics in various smart industrial applications, such as occupational health and safety and environmental monitoring. In industrial Internet of Things (IoT) systems in particular, the huge number of devices and the expected performance put pressure on resources such as computation, network capacity, and device energy. Distributed training of Machine and Deep Learning (ML/DL) models for intelligent industrial IoT applications is very challenging for resource-limited devices over heterogeneous wireless networks (HetNets). Hierarchical Federated Learning (HFL) performs training at multiple layers, offloading tasks to nearby Multi-Access Edge Computing (MEC) units. In this paper, we propose a novel energy-efficient HFL framework enabled by Wireless Energy Transfer (WET) and designed for heterogeneous networks with massive Multiple-Input Multiple-Output (MIMO) wireless backhaul. Our energy-efficiency approach is formulated as a Mixed-Integer Non-Linear Programming (MINLP) problem in which we optimize the HFL device association and manage the wirelessly transmitted energy. Due to its high complexity, we design a Heuristic Resource Management Algorithm, namely H2RMA, that respects energy, channel quality, and accuracy constraints while incurring low computational complexity. We also reduce the energy consumption of the network through an efficient device scheduling scheme. Finally, we investigate device mobility and its impact on HFL performance. Our extensive experiments confirm the high performance of the proposed resource management approach for HFL over HetNets in terms of training loss and grid energy costs.
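For readers unfamiliar with the hierarchical aggregation step that HFL adds on top of plain federated averaging, the following NumPy sketch shows two-level averaging (devices to MEC units, MEC units to cloud). The device-to-MEC association and sample-count weighting are placeholders; H2RMA's energy-aware association, WET management, and device scheduling are not reproduced here.

```python
# Minimal sketch of two-level hierarchical federated averaging:
# devices -> MEC edge units -> cloud. Weights and associations are illustrative.
import numpy as np

def fedavg(models, weights):
    """Weighted average of model parameter vectors."""
    weights = np.asarray(weights, dtype=float)
    weights = weights / weights.sum()
    return sum(w * m for w, m in zip(weights, models))

# three MEC units, each aggregating its associated devices (hypothetical split)
device_models = {
    "mec0": [np.random.randn(10) for _ in range(4)],
    "mec1": [np.random.randn(10) for _ in range(3)],
    "mec2": [np.random.randn(10) for _ in range(5)],
}
device_samples = {"mec0": [50, 80, 60, 40], "mec1": [70, 30, 90], "mec2": [20] * 5}

# edge-level aggregation at each MEC unit
edge_models = {m: fedavg(device_models[m], device_samples[m]) for m in device_models}
# cloud-level aggregation, weighted by the total samples under each MEC unit
global_model = fedavg(list(edge_models.values()),
                      [sum(device_samples[m]) for m in edge_models])
```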
Automated detection of contraband items in X-ray images can significantly increase public safety by enhancing the productivity and alleviating the mental load of security officers in airports, subways, customs/post offices, etc. The large volume and high throughput of passengers, mailed parcels, etc., during rush hours make this a Big Data analysis task. Modern computer vision algorithms relying on Deep Neural Networks (DNNs) have proven capable of undertaking this task even under resource-constrained and embedded execution scenarios, e.g., with fast, single-stage, anchor-based object detectors. This paper proposes a two-fold improvement of such algorithms for the X-ray analysis domain, introducing two complementary novelties. Firstly, more efficient anchors are obtained by hierarchically clustering the sizes of the ground-truth training-set bounding boxes; the resulting anchors thus follow a natural hierarchy aligned with the semantic structure of the data. Secondly, the default Non-Maximum Suppression (NMS) algorithm at the end of the object detection pipeline is modified to better handle occluded object detection and to reduce the number of false predictions, by inserting the Efficient Intersection over Union (E-IoU) metric into the Weighted Cluster NMS method. E-IoU provides more discriminative geometrical correlations between the candidate bounding boxes/Regions-of-Interest (RoIs). The proposed method is implemented on a common single-stage object detector (YOLOv5), and its experimental evaluation on a relevant public dataset indicates significant accuracy gains over both the baseline and competing approaches. This highlights the potential of Big Data analysis in enhancing public safety.
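To make the first novelty concrete, the sketch below derives anchor sizes by hierarchically clustering ground-truth box widths/heights with SciPy agglomerative clustering. The Ward linkage, log-scale transform, and nine-anchor count are assumptions for illustration and may differ from the paper's exact clustering setup; the box sizes are random placeholders.

```python
# Sketch: anchors from hierarchical clustering of ground-truth box sizes.
import numpy as np
from scipy.cluster.hierarchy import linkage, fcluster

def anchors_from_boxes(wh, num_anchors=9):
    """wh: (N, 2) array of ground-truth box widths/heights in pixels."""
    tree = linkage(np.log(wh), method="ward")     # log scale handles wide size spread
    labels = fcluster(tree, t=num_anchors, criterion="maxclust")
    # one anchor per cluster: the median width/height of its members
    return np.stack([np.median(wh[labels == k], axis=0) for k in np.unique(labels)])

boxes_wh = np.abs(np.random.randn(500, 2)) * 60 + 20   # placeholder box sizes
print(anchors_from_boxes(boxes_wh))
```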
Scaling neural network models has delivered dramatic quality gains across ML problems, but it has also increased the reliance on efficient distributed training techniques. Accordingly, as with other distributed computing scenarios, it is important to understand how compute and communication will scale relative to one another as models grow and hardware evolves. A careful study that answers this question can better guide the design of future systems for efficiently training future large models. Accordingly, this work provides a comprehensive multi-axial (algorithmic, empirical, hardware evolution) analysis of compute vs. communication (Comp-vs.-Comm) scaling for future Transformer models on future hardware. First, our algorithmic analysis shows that compute generally enjoys an edge over communication as models scale. However, since memory capacity scales more slowly than compute, these trends are being stressed. Next, we quantify this edge by empirically studying how Comp-vs.-Comm scales for future models on future hardware. To avoid profiling numerous Transformer models across many setups, we extract execution regions and project costs using operator models. This allows a spectrum (hundreds) of future model/hardware scenarios to be studied accurately ($<$15% error) and reduces profiling costs by 2100$\times$. Our experiments show that communication will be a significant portion (40-75%) of runtime as models and hardware evolve. Moreover, communication that is hidden by overlapped computation in today's models often cannot be hidden in future, larger models. Overall, this work highlights the increasingly large role communication will play as models scale and discusses techniques and upcoming technologies that can help address it.
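A back-of-the-envelope operator model conveys the flavor of projecting compute vs. communication time instead of profiling. The sketch below uses standard Transformer FLOP counts and a simple tensor-parallel all-reduce volume estimate; the peak-FLOPS and network-bandwidth figures, the fp16 assumption, and the two-all-reduce-per-layer count are illustrative assumptions, not the paper's calibrated operator models.

```python
# Rough compute-vs-communication estimate for one Transformer layer (forward pass).
def layer_comp_vs_comm(d_model, seq_len, batch, tp_degree,
                       peak_flops=300e12, net_bw=100e9):
    # attention (QKV + output projections + score/context matmuls) and 4x MLP FLOPs
    flops = batch * seq_len * (8 * d_model**2 + 4 * d_model * seq_len
                               + 16 * d_model**2)
    comp_time = flops / tp_degree / peak_flops
    # assume two all-reduces of the fp16 activations per layer under tensor parallelism
    comm_bytes = 2 * 2 * batch * seq_len * d_model * 2
    comm_time = comm_bytes / net_bw
    return comp_time, comm_time

comp, comm = layer_comp_vs_comm(d_model=12288, seq_len=2048, batch=8, tp_degree=8)
print(f"compute {comp*1e3:.2f} ms vs. communication {comm*1e3:.2f} ms")
```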
Integrated sensing and communication (ISAC) has recently emerged as a promising technique for providing sensing services in future wireless networks. In the literature, numerous works have adopted a monostatic radar architecture to realize ISAC, i.e., employing the same base station (BS) to transmit the ISAC signal and receive the echo. Yet, the concurrent information transmission causes severe self-interference (SI) to the radar echo at the BS, which cannot be effectively suppressed. To overcome this difficulty, in this paper, we propose a coordinated cellular network-supported multistatic radar architecture to implement ISAC. In particular, among all the coordinated BSs, we select one BS as the multistatic receiver for the sensing echo signal, while the other BSs act as multistatic transmitters that collaborate to facilitate cooperative ISAC. This allows us to spatially separate the ISAC signal transmission and radar echo reception, intrinsically circumventing the SI problem. Building on this architecture, we jointly optimize the transmit and receive beamforming policy to minimize the sensing beampattern mismatch error subject to both communication and sensing quality-of-service requirements. The resulting non-convex optimization problem is tackled by a low-complexity alternating optimization-based suboptimal algorithm. Simulation results show that the proposed scheme outperforms two baseline schemes adopting conventional designs. Moreover, our results confirm that the proposed architecture is promising for achieving high-quality ISAC.
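To give a feel for what "beampattern mismatch" minimization looks like, the sketch below solves only a simplified transmit-side subproblem: fitting a uniform-linear-array beampattern to a desired sensing pattern by least squares under a unit-power normalization. The array model, desired pattern, and the absence of communication QoS constraints and of the alternating receive-beamformer update are all simplifying assumptions.

```python
# Simplified transmit beampattern fit (least squares), not the paper's full
# alternating-optimization algorithm with QoS constraints.
import numpy as np

N_TX, N_ANGLES = 8, 181
angles = np.linspace(-np.pi / 2, np.pi / 2, N_ANGLES)
# ULA steering vectors, half-wavelength spacing: a(theta)_n = exp(j*pi*n*sin(theta))
A = np.exp(1j * np.pi * np.arange(N_TX)[:, None] * np.sin(angles))   # (N_TX, N_ANGLES)

desired = np.exp(-((angles - 0.3) ** 2) / 0.01)       # beam pointed near 17 degrees
w, *_ = np.linalg.lstsq(A.conj().T, desired.astype(complex), rcond=None)
w /= np.linalg.norm(w)                                # normalize transmit power
mismatch = np.linalg.norm(np.abs(A.conj().T @ w) - desired) ** 2
print(f"beampattern mismatch: {mismatch:.3f}")
```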
In Internet of Things (IoT) networks, edge learning for data-driven tasks enables intelligent applications and services. As the network size grows, different users may generate distinct datasets. Thus, to support multiple edge learning tasks in large-scale IoT networks, this paper achieves efficient communication under the task-oriented principle through the collaborative design of wireless resource allocation and edge learning error prediction. In particular, we start with multi-user scheduling to alleviate co-channel interference in dense networks. Then, we perform optimal power allocation in parallel for the different learning tasks. The designed algorithm is highly parallelizable, and extensive experimental results corroborate that the multi-user scheduling and task-oriented power allocation efficiently improve the performance of distinct edge learning tasks compared with state-of-the-art benchmark algorithms.
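As a generic illustration of running per-task power allocation independently (and hence in parallel), the sketch below applies textbook water-filling to each task's sub-channels. The bisection water level, per-task power budgets, and random channel gains are standard ingredients assumed for illustration; the paper's task-oriented objective, which couples power with the predicted learning error, is not modeled.

```python
# Water-filling power allocation, run independently for each learning task.
import numpy as np

def waterfill(gains, p_total, tol=1e-9):
    """Classic water-filling: p_i = max(mu - 1/g_i, 0) with sum p_i = p_total."""
    lo, hi = 0.0, p_total + 1.0 / gains.min()
    while hi - lo > tol:
        mu = (lo + hi) / 2
        p = np.maximum(mu - 1.0 / gains, 0.0)
        lo, hi = (mu, hi) if p.sum() < p_total else (lo, mu)
    return np.maximum((lo + hi) / 2 - 1.0 / gains, 0.0)

rng = np.random.default_rng(2)
tasks = {f"task{i}": rng.exponential(1.0, size=4) for i in range(3)}   # channel gains
powers = {name: waterfill(g, p_total=2.0) for name, g in tasks.items()}
for name, p in powers.items():
    print(name, np.round(p, 3))
```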
It is anticipated that integrated sensing and communications (ISAC) will be one of the key enablers of next-generation wireless networks (such as beyond-5G (B5G) and 6G), supporting a variety of emerging applications. In this paper, we provide a comprehensive review of recent advances in ISAC systems, with a particular focus on their foundations, system design, networking aspects, and applications, and we discuss the open questions that arise in each of these areas. We commence with the information theory of sensing and communications (S$\&$C), followed by the information-theoretic limits of ISAC systems, shedding light on the fundamental performance metrics. Next, we discuss clock synchronization and phase offset problems, the associated Pareto-optimal signaling strategies, and super-resolution ISAC system design. Moreover, we envision that ISAC ushers in a paradigm shift for future cellular networks relying on network sensing, transforming the classic cellular architecture, cross-layer resource management methods, and transmission protocols. Regarding ISAC applications, we further highlight the security and privacy issues of wireless sensing. Finally, we close by studying recent advances in a representative ISAC use case, namely the multi-object multi-task (MOMT) recognition problem using wireless signals.
Semantic communication represents a promising roadmap toward end-to-end communication with reduced communication overhead and an enhanced user experience. The integration of semantic concepts with wireless communications, however, presents novel challenges. This paper proposes flexible simulation software that automatically transmits semantic segmentation map images over a communication channel. An additive white Gaussian noise (AWGN) channel with binary phase-shift keying (BPSK) modulation is considered as the channel setup, and the well-known polar codes are chosen as the channel coding scheme. The popular COCO-Stuff dataset is used as an example to generate semantic map images corresponding to different signal-to-noise ratios (SNRs). To evaluate the proposed software, we generate four small datasets, each containing a thousand semantic map samples, accompanied by comprehensive per-image information, including the polar code specifications, detailed image attributes, bit error rate (BER), and frame error rate (FER). The ability to generate an unlimited number of semantic maps with the desired channel coding parameters and preferred SNR, together with the flexibility of using alternative datasets, makes our simulation software highly adaptable and transferable to a broad range of use cases.
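The sketch below reproduces only the measurement pipeline of this setup: BPSK over AWGN with BER and FER computed against the transmitted bits. The polar encoder/decoder used by the actual software is omitted (uncoded transmission), and the SNR convention and frame size are assumptions, so the printed numbers are illustrative rather than representative of the tool's coded performance.

```python
# BPSK over AWGN with hard decisions; BER/FER measured over many frames.
import numpy as np

rng = np.random.default_rng(0)

def bpsk_awgn_trial(num_bits, snr_db):
    bits = rng.integers(0, 2, num_bits)
    symbols = 1.0 - 2.0 * bits                               # 0 -> +1, 1 -> -1
    noise_std = np.sqrt(1.0 / (2.0 * 10 ** (snr_db / 10)))   # assumed Es/N0 convention
    received = symbols + noise_std * rng.normal(size=num_bits)
    decided = (received < 0).astype(int)                     # hard decision
    return np.count_nonzero(decided != bits)

frames, bits_per_frame = 1000, 1024
errors = [bpsk_awgn_trial(bits_per_frame, snr_db=3.0) for _ in range(frames)]
ber = sum(errors) / (frames * bits_per_frame)
fer = sum(e > 0 for e in errors) / frames
print(f"BER={ber:.4e}  FER={fer:.3f}")
```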
Current 3D object detection models follow a single-dataset training and testing paradigm, and they often suffer a serious detection accuracy drop when directly deployed on another dataset. In this paper, we study the task of training a unified 3D detector from multiple datasets. We observe that this is a challenging task, mainly because these datasets present substantial data-level differences and taxonomy-level variations caused by different LiDAR types and data acquisition standards. Motivated by this observation, we present Uni3D, which leverages a simple data-level correction operation and a designed semantic-level coupling-and-recoupling module to alleviate the unavoidable data-level and taxonomy-level differences, respectively. Our method is simple and easily combined with many 3D object detection baselines such as PV-RCNN and Voxel-RCNN, enabling them to effectively learn from multiple off-the-shelf 3D datasets and obtain more discriminative and generalizable representations. Experiments are conducted on many dataset consolidation settings, including Waymo-nuScenes, nuScenes-KITTI, Waymo-KITTI, and Waymo-nuScenes-KITTI. The results demonstrate that Uni3D exceeds a series of individual detectors trained on a single dataset, with only a 1.04x parameter increase over a selected baseline detector. We expect this work to inspire research on 3D generalization by pushing the limits of perceptual performance.
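To illustrate one flavor of data-level correction when merging LiDAR datasets, the sketch below re-centers each dataset's point clouds to a common sensor height and crops them to a shared range so the detector sees consistent coordinates. The per-dataset offsets and the shared range are placeholder values, and this is only an approximation of the kind of correction Uni3D applies, not its exact definition; the semantic-level coupling-and-recoupling module is not sketched.

```python
# Hedged sketch of data-level alignment across LiDAR datasets (placeholder values).
import numpy as np

DATASET_Z_OFFSET = {"waymo": 0.0, "nuscenes": -1.8, "kitti": -1.6}    # assumed offsets
SHARED_RANGE = np.array([[-75.0, 75.0], [-75.0, 75.0], [-3.0, 3.0]])  # x, y, z (m)

def correct_points(points, dataset):
    """points: (N, 3) xyz coordinates in the dataset's native frame."""
    pts = points.copy()
    pts[:, 2] += DATASET_Z_OFFSET[dataset]              # align sensor/ground height
    inside = np.all((pts >= SHARED_RANGE[:, 0]) & (pts <= SHARED_RANGE[:, 1]), axis=1)
    return pts[inside]

cloud = np.random.uniform(-80, 80, size=(10000, 3))     # placeholder point cloud
print(correct_points(cloud, "nuscenes").shape)
```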
Games and simulators can be a valuable platform for executing complex multi-agent, multiplayer, imperfect-information scenarios with significant parallels to military applications: multiple participants manage resources and make decisions that command assets to secure specific areas of a map or neutralize opposing forces. These characteristics have attracted the artificial intelligence (AI) community by supporting the development of algorithms with complex benchmarks and the capability to rapidly iterate over new ideas. The success of AI algorithms in real-time strategy games such as StarCraft II has also attracted the attention of the military research community, which aims to explore similar techniques in military counterpart scenarios. Aiming to bridge games and military applications, this work discusses past and current efforts on how games and simulators, together with AI algorithms, have been adapted to simulate certain aspects of military missions, and how they might impact the future battlefield. This paper also investigates how advances in virtual reality and visual augmentation systems open new possibilities in human interfaces with gaming platforms and their military parallels.