The evolution of wireless communications has been significantly influenced by remarkable advancements in multiple access (MA) technologies over the past five decades, shaping the landscape of modern connectivity. Within this context, a comprehensive tutorial review is presented, focusing on representative MA techniques developed over the past 50 years. The following areas are explored: i) The foundational principles and information-theoretic capacity limits of power-domain non-orthogonal multiple access (NOMA) are characterized, along with its extension to multiple-input multiple-output (MIMO)-NOMA. ii) Several MA transmission schemes exploiting the spatial domain are investigated, encompassing both conventional space-division multiple access (SDMA)/MIMO-NOMA systems and near-field MA systems utilizing spherical-wave propagation models. iii) The application of NOMA to integrated sensing and communications (ISAC) systems is studied. This includes an introduction to typical NOMA-based downlink/uplink ISAC frameworks, followed by an evaluation of their performance limits using a mutual information (MI)-based analytical framework. iv) Major issues and research opportunities associated with the integration of MA with other emerging technologies are identified to facilitate MA in next-generation networks, i.e., next-generation multiple access (NGMA). Throughout the paper, promising directions are highlighted to inspire future research endeavors in the realm of MA and NGMA.
2D image understanding is a complex problem within computer vision, but it holds the key to providing human-level scene comprehension. It goes beyond identifying the objects in an image and instead attempts to understand the scene. Solutions to this problem form the underpinning of a range of tasks, including image captioning, visual question answering (VQA), and image retrieval. Graphs provide a natural way to represent the relational arrangement between objects in an image, and thus, in recent years, graph neural networks (GNNs) have become a core architectural component of many 2D image understanding pipelines, especially in the VQA group of tasks. In this survey, we review this rapidly evolving field and provide a taxonomy of graph types used in 2D image understanding approaches, a comprehensive list of the GNN models used in this domain, and a roadmap of potential future developments. To the best of our knowledge, this is the first comprehensive survey that covers image captioning, visual question answering, and image retrieval techniques that focus on using GNNs as the main part of their architecture.
The Embodied AI community has made significant strides in visual navigation tasks, exploring targets from 3D coordinates, objects, language descriptions, and images. However, these navigation models often handle only a single input modality as the target. With the progress achieved so far, it is time to move towards universal navigation models capable of handling various goal types, enabling more effective user interaction with robots. To facilitate this goal, we propose GOAT-Bench, a benchmark for the universal navigation task referred to as GO to AnyThing (GOAT). In this task, the agent is directed to navigate to a sequence of targets specified by the category name, language description, or image in an open-vocabulary fashion. We benchmark monolithic RL and modular methods on the GOAT task, analyzing their performance across modalities, the role of explicit and implicit scene memories, their robustness to noise in goal specifications, and the impact of memory in lifelong scenarios.
In-band network telemetry (INT), empowered by programmable dataplanes such as P4, is a viable approach to network monitoring and telemetry analysis. However, P4-INT as well as other existing frameworks for INT yield a substantial transmission overhead, which grows linearly with the number of hops and the number of telemetry values. To address this issue, we present a deterministic and a probabilistic technique for lightweight INT, termed DLINT and PLINT, respectively. In particular, DLINT performs per-flow aggregation by spreading the telemetry values across the packets of a flow. DLINT relies on switch coordination through the use of per-flow telemetry states, maintained within P4 switches. Furthermore, DLINT utilizes Bloom filters (BFs) to compress the state lookup tables within P4 switches. On the other hand, PLINT employs a probabilistic approach based on reservoir sampling. PLINT essentially enables every INT node to insert telemetry values with equal probability within each packet. Our evaluation results corroborate that both proposed techniques alleviate the transmission overhead of P4-INT, while maintaining a high degree of monitoring accuracy. In addition, we perform a comparative evaluation between DLINT and PLINT. DLINT is more effective in conveying path traces to the telemetry server, whereas PLINT detects path updates more promptly by exploiting its more efficient INT header space utilization.
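The equal-probability insertion that PLINT builds on can be illustrated with textbook single-slot reservoir sampling. The sketch below is not the authors' P4 implementation: it assumes one telemetry slot per packet, and the names `sample_hop` and `empirical_frequencies` are hypothetical. Hop i on the path overwrites the slot with probability 1/i, so each of the k hops ends up in the packet with probability 1/k without any hop needing to know k.

```python
import random
from collections import Counter


def sample_hop(path, rng):
    """Single-slot reservoir sampling along a path of hops (illustrative).

    Hop i (1-indexed) overwrites the slot with probability 1/i, so after
    the last hop every hop's value is retained with probability 1/len(path).
    """
    slot = None
    for i, hop_value in enumerate(path, start=1):
        # The first hop always writes, because rng.random() is in [0, 1).
        if rng.random() < 1.0 / i:
            slot = hop_value
    return slot


def empirical_frequencies(path, packets, seed=0):
    """Fraction of packets that carry each hop's value at the sink."""
    rng = random.Random(seed)
    counts = Counter(sample_hop(path, rng) for _ in range(packets))
    return {hop: counts[hop] / packets for hop in path}


if __name__ == "__main__":
    # Hypothetical four-switch path; each frequency should be close to 0.25.
    print(empirical_frequencies(["s1", "s2", "s3", "s4"], packets=20000))
```

Because each packet carries a single sampled value, the per-packet overhead in this sketch stays constant regardless of path length, at the cost of needing several packets before the sink has seen every hop.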
Average consensus is essential for multi-agent systems to achieve specific functions and is widely used in network control, information fusion, etc. In conventional average consensus algorithms, all agents reach an agreement through individual calculations and by sharing information with their respective neighbors. Nevertheless, the information exchanged over the communication network may reveal sensitive information. In this paper, we develop a new privacy-preserving average consensus method for unbalanced digraphs. Specifically, we ensure privacy preservation by carefully embedding randomness in the mixing weights to obfuscate communications and by introducing an extra auxiliary parameter to mask the state-update rule in the first several iterations. In parallel, we exploit the intrinsic robustness of consensus dynamics to guarantee that the average consensus is precisely achieved. Theoretical results demonstrate that the designed algorithms converge linearly to the exact average consensus value and guarantee privacy preservation of agents against both honest-but-curious and eavesdropping attacks. The designed algorithms are fundamentally different from differential-privacy-based algorithms, which achieve privacy preservation by sacrificing consensus performance. Finally, numerical experiments validate the correctness of the theoretical findings.
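For context, one well-known non-private baseline for average consensus on unbalanced digraphs is push-sum (ratio consensus). The sketch below shows only that baseline, not the privacy-preserving method of this paper: it contains no randomized weights and no auxiliary masking parameter. The function name `push_sum` and the example graph are hypothetical, and the graph is assumed to be strongly connected.

```python
def push_sum(out_neighbors, x0, iterations=200):
    """Push-sum (ratio) consensus on a directed graph (illustrative baseline).

    out_neighbors[j] lists the nodes that j sends to (excluding itself).
    Each node splits its mass equally among itself and its out-neighbors,
    which makes the implicit weight matrix column-stochastic.
    """
    n = len(x0)
    x = list(x0)       # running numerators, initialised to the private states
    w = [1.0] * n      # running denominators
    for _ in range(iterations):
        new_x = [0.0] * n
        new_w = [0.0] * n
        for j in range(n):
            targets = [j] + list(out_neighbors[j])
            share = 1.0 / len(targets)
            for i in targets:
                new_x[i] += share * x[j]
                new_w[i] += share * w[j]
        x, w = new_x, new_w
    # The ratio x_i / w_i approaches the average of x0 at every node.
    return [xi / wi for xi, wi in zip(x, w)]


if __name__ == "__main__":
    # Hypothetical unbalanced digraph: node 0 has out-degree 2, in-degree 1.
    graph = {0: [1, 2], 1: [2], 2: [3], 3: [0]}
    print(push_sum(graph, [1.0, 5.0, 9.0, 13.0]))
```

In this baseline every node transmits values derived directly from its initial state in the first round, which is the kind of exposure a privacy-preserving design has to avoid.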
The end-to-end (E2E) learning-based approach has great potential to reshape existing communication systems by replacing the transceivers with deep neural networks. To this end, the E2E learning approach needs to assume the availability of prior channel information to mathematically formulate a differentiable channel layer for the backpropagation (BP) of the error gradients, thereby jointly optimizing the transmitter and the receiver. However, accurate and instantaneous channel state information is hard to obtain in practical wireless communication scenarios. Moreover, the existing E2E learning-based solutions exhibit limited performance in data transmissions with large block lengths. In this article, these practical issues are addressed by our proposed deep deterministic policy gradient-based E2E communication system. In particular, the proposed solution utilizes a reward feedback mechanism to train both the transmitter and the receiver, which alleviates the information loss of error gradients during BP. In addition, a convolutional neural network (CNN)-based architecture is developed to mitigate the curse of dimensionality problem when transmitting messages with large block lengths. Extensive simulations demonstrate that our proposed solution can not only jointly train the transmitter and the receiver without requiring prior channel knowledge but can also obtain significant performance improvement in block error rate compared to state-of-the-art solutions.
Incorporating language comprehension into robotic operations unlocks significant advancements in robotics, but also presents distinct challenges, particularly in executing spatially oriented tasks like pattern formation. This paper introduces ZeroCAP, a novel system that integrates large language models with multi-robot systems for zero-shot context-aware pattern formation. Grounded in the principles of language-conditioned robotics, ZeroCAP leverages the interpretative power of language models to translate natural language instructions into actionable robotic configurations. This approach combines vision-language models, cutting-edge segmentation techniques, and shape descriptors, enabling the realization of complex, context-driven pattern formations in multi-robot coordination. Through extensive experiments, we demonstrate the system's proficiency in executing complex context-aware pattern formations across a spectrum of tasks, from surrounding and caging objects to infilling regions. This not only validates the system's capability to interpret and implement intricate context-driven tasks but also underscores its adaptability and effectiveness across varied environments and scenarios. More details about this work are available at: //sites.google.com/view/zerocap/home
Reconfigurable intelligent surface (RIS) is a promising technique to improve the performance of future wireless communication systems at low energy consumption. To reap the potential benefits of RIS-aided beamforming, it is vital to enhance the accuracy of channel estimation. In this paper, we consider an RIS-aided multiuser system with non-ideal reflecting elements, each of which has a phase-dependent reflecting amplitude, and we aim to minimize the mean-squared error (MSE) of the channel estimation by jointly optimizing the training signals at the user equipments (UEs) and the reflection pattern at the RIS. As examples, the least squares (LS) and linear minimum MSE (LMMSE) estimators are considered. The considered problems do not admit simple solutions, mainly due to the complicated constraints pertaining to the non-ideal RIS reflecting elements. For the LS criterion, we tackle this difficulty by first proving the optimality of orthogonal training symbols and then proposing a majorization-minimization (MM)-based iterative method to design the reflection pattern, where a semi-closed-form solution is obtained in each iteration. For the LMMSE criterion, we address the joint training and reflection pattern optimization problem with an MM-based alternating algorithm, where a closed-form solution to the training symbols and a semi-closed-form solution to the RIS reflecting coefficients are derived, respectively. Furthermore, an acceleration scheme is proposed to improve the convergence rate of the proposed MM algorithms. Finally, simulation results demonstrate the performance advantages of our proposed joint training and reflection pattern designs.
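To see why orthogonal training is a natural candidate under the LS criterion, consider the textbook linear model without any RIS. This is an illustrative simplification, not the paper's cascaded-channel formulation, and all symbols below are assumptions: $\mathbf{H}$ is an $M \times K$ channel, $\mathbf{X}$ is a $K \times T$ training matrix with $T \ge K$, and $\mathbf{N}$ has i.i.d. zero-mean entries of variance $\sigma^2$.

```latex
\begin{align}
\mathbf{Y} &= \mathbf{H}\mathbf{X} + \mathbf{N}, \\
\hat{\mathbf{H}}_{\mathrm{LS}} &= \mathbf{Y}\mathbf{X}^{H}\left(\mathbf{X}\mathbf{X}^{H}\right)^{-1}, \\
\mathrm{MSE}_{\mathrm{LS}} &= \mathbb{E}\left\|\hat{\mathbf{H}}_{\mathrm{LS}} - \mathbf{H}\right\|_F^2
  = \sigma^2 M \,\mathrm{tr}\!\left[\left(\mathbf{X}\mathbf{X}^{H}\right)^{-1}\right]
  \ge \frac{\sigma^2 M K^2}{\mathrm{tr}\!\left(\mathbf{X}\mathbf{X}^{H}\right)} .
\end{align}
```

The bound follows from $\mathrm{tr}(\mathbf{A}^{-1})\,\mathrm{tr}(\mathbf{A}) \ge K^2$ for any $K \times K$ positive definite $\mathbf{A}$, with equality if and only if $\mathbf{A}$ is a scaled identity. For a fixed training energy $\mathrm{tr}(\mathbf{X}\mathbf{X}^{H})$, the MSE in this simplified model is therefore minimized when the rows of $\mathbf{X}$ are orthogonal with equal energy.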
As the evolution of wireless communication progresses towards 6G networks, extreme bandwidth communication (EBC) emerges as a key enabler to meet the ambitious key performance indicators set for this next-generation technology. 6G aims for peak data rates of 1 Tb/s, peak spectral efficiency of 60 b/s/Hz, maximum bandwidth of 100 GHz, and mobility support up to 1000 km/h, while maintaining a high level of security. The capability of 6G to manage enormous data volumes introduces heightened security vulnerabilities, such as jamming attacks, highlighting the critical need for in-depth research into jamming in EBC. Understanding these attacks is vital for developing robust countermeasures, ensuring 6G networks can maintain their integrity and reliability amidst these advanced threats. Recognizing the paramount importance of security in 6G applications, this survey paper explores prevalent jamming attacks and the corresponding countermeasures in EBC technologies such as millimeter wave, terahertz, free-space optical, and visible light communications. By comprehensively reviewing the literature on jamming in EBC, this survey paper aims to provide a valuable resource for researchers, engineers, and policymakers involved in the development and deployment of 6G networks. Understanding the nuances of jamming in different EBC technologies is essential for devising robust security mechanisms and ensuring the success of 6G communication systems in the face of emerging threats.
Face recognition technology has advanced significantly in recent years due largely to the availability of large and increasingly complex training datasets for use in deep learning models. These datasets, however, typically comprise images scraped from news sites or social media platforms and, therefore, have limited utility in more advanced security, forensics, and military applications. These applications require lower resolution, longer ranges, and elevated viewpoints. To meet these critical needs, we collected and curated the first and second subsets of a large multi-modal biometric dataset designed for use in the research and development (R&D) of biometric recognition technologies under extremely challenging conditions. Thus far, the dataset includes more than 350,000 still images and over 1,300 hours of video footage of approximately 1,000 subjects. To collect this data, we used Nikon DSLR cameras, a variety of commercial surveillance cameras, specialized long-range R&D cameras, and Group 1 and Group 2 UAV platforms. The goal is to support the development of algorithms capable of accurately recognizing people at ranges up to 1,000 m and from high angles of elevation. These advances will include improvements to the state of the art in face recognition and will support new research in the area of whole-body recognition using methods based on gait and anthropometry. This paper describes methods used to collect and curate the dataset, and the dataset's characteristics at the current stage.
Approaches based on deep neural networks have achieved striking performance when testing data and training data share a similar distribution, but can fail significantly otherwise. Therefore, eliminating the impact of distribution shifts between training and testing data is crucial for building deep models with reliable performance. Conventional methods assume either the known heterogeneity of training data (e.g., domain labels) or the approximately equal capacities of different domains. In this paper, we consider a more challenging case where neither of the above assumptions holds. We propose to address this problem by removing the dependencies between features via learning weights for training samples, which helps deep models get rid of spurious correlations and, in turn, concentrate more on the true connection between discriminative features and labels. Through extensive experiments on distribution generalization benchmarks including PACS, VLCS, MNIST-M, and NICO, we show the effectiveness of our method compared with state-of-the-art counterparts.