The next sixth generation (6G) networks are envisioned to integrate sensing and communications in a single system, thus greatly improving spectrum utilization and reducing hardware costs. Low earth orbit (LEO) satellite communications combined with massive multiple-input multiple-output (MIMO) technology holds significant promise in offering ubiquitous and seamless connectivity with high data rates. Existing integrated sensing and communications (ISAC) studies mainly focus on terrestrial systems, while operating ISAC in massive MIMO LEO satellite systems is promising to provide high-capacity communication and flexible sensing ubiquitously. In this paper, we first give an overview of LEO satellite systems and ISAC and consider adopting ISAC in the massive MIMO LEO satellite systems. Then, the recent research advances are presented. A discussion on related challenges and key enabling technologies follows. Finally, we point out some open issues and promising research directions.
Deep neural networks (DNNs) were shown to facilitate the operation of uplink multiple-input multiple-output (MIMO) receivers, with emerging architectures augmenting modules of classic receiver processing. Current designs consider static DNNs, whose architecture is fixed and weights are pre-trained. This induces a notable challenge, as the resulting MIMO receiver is suitable for a given configuration, i.e., channel distribution and number of users, while in practice these parameters change frequently with network variations and users leaving and joining the network. In this work, we tackle this core challenge of DNN-aided MIMO receivers. We build upon the concept of hypernetworks, augmenting the receiver with a pre-trained deep model whose purpose is to update the weights of the DNN-aided receiver upon instantaneous channel variations. We design our hypernetwork to augment modular deep receivers, leveraging their modularity to have the hypernetwork adapt not only the weights, but also the architecture. Our modular hypernetwork leads to a DNN-aided receiver whose architecture and resulting complexity adapts to the number of users, in addition to channel variations, without retraining. Our numerical studies demonstrate superior error-rate performance of modular hypernetworks in time-varying channels compared to static pre-trained receivers, while providing rapid adaptivity and scalability to network variations.
Despite significant progress made in the last decade, deep neural network (DNN) based speech enhancement (SE) still faces the challenge of notable degradation in the quality of recovered speech under low signal-to-noise ratio (SNR) conditions. In this letter, we propose an SNR-progressive speech enhancement model with harmonic compensation for low-SNR SE. Reliable pitch estimation is obtained from the intermediate output, which has the benefit of retaining more speech components than the coarse estimate while possessing a significant higher SNR than the input noisy speech. An effective harmonic compensation mechanism is introduced for better harmonic recovery. Extensive ex-periments demonstrate the advantage of our proposed model. A multi-modal speech extraction system based on the proposed backbone model ranks first in the ICASSP 2024 MISP Challenge: //mispchallenge.github.io/mispchallenge2023/index.html.
To address the limitation in multimodal emotion recognition (MER) performance arising from inter-modal information fusion, we propose a novel MER framework based on multitask learning where fusion occurs after alignment, called Foal-Net. The framework is designed to enhance the effectiveness of modality fusion and includes two auxiliary tasks: audio-video emotion alignment (AVEL) and cross-modal emotion label matching (MEM). First, AVEL achieves alignment of emotional information in audio-video representations through contrastive learning. Then, a modal fusion network integrates the aligned features. Meanwhile, MEM assesses whether the emotions of the current sample pair are the same, providing assistance for modal information fusion and guiding the model to focus more on emotional information. The experimental results conducted on IEMOCAP corpus show that Foal-Net outperforms the state-of-the-art methods and emotion alignment is necessary before modal fusion.
The capabilities of transformer networks such as ChatGPT and other Large Language Models (LLMs) have captured the world's attention. The crucial computational mechanism underlying their performance relies on transforming a complete input sequence - for example, all the words in a sentence - into a long "encoding vector" that allows transformers to learn long-range temporal dependencies in naturalistic sequences. Specifically, "self-attention" applied to this encoding vector enhances temporal context in transformers by computing associations between pairs of words in the input sequence. We suggest that waves of neural activity traveling across single cortical areas or multiple regions at the whole-brain scale could implement a similar encoding principle. By encapsulating recent input history into a single spatial pattern at each moment in time, cortical waves may enable temporal context to be extracted from sequences of sensory inputs, the same computational principle used in transformers.
In this article, we present the bivariate and multivariate functional Moran's I statistics and multivariate functional areal spatial principal component analysis (mfasPCA). These methods are the first of their kind in the field of multivariate areal spatial functional data analysis. The multivariate functional Moran's I statistic is employed to assess spatial autocorrelation, while mfasPCA is utilized for dimension reduction in both univariate and multivariate functional areal data. Through simulation studies and real-world examples, we demonstrate that the multivariate functional Moran's I statistic and mfasPCA are powerful tools for evaluating spatial autocorrelation in univariate and multivariate functional areal data analysis.
In recent years, large language models (LLMs) have seen rapid advancements, significantly impacting various fields such as natural language processing, and software engineering. These LLMs, exemplified by OpenAI's ChatGPT, have revolutionized the way we approach language understanding and generation tasks. However, in contrast to traditional software development practices, LLM development introduces new challenges for AI developers in design, implementation, and deployment. These challenges span different areas (such as prompts, APIs, and plugins), requiring developers to navigate unique methodologies and considerations specific to LLM development. Despite the profound influence of LLMs, to the best of our knowledge, these challenges have not been thoroughly investigated in previous empirical studies. To fill this gap, we present the first comprehensive study on understanding the challenges faced by LLM developers. Specifically, we crawl and analyze 29,057 relevant questions from a popular OpenAI developer forum. We first examine their popularity and difficulty. After manually analyzing 2,364 sampled questions, we construct a taxonomy of challenges faced by LLM developers. Based on this taxonomy, we summarize a set of findings and actionable implications for LLM-related stakeholders, including developers and providers (especially the OpenAI organization).
The future sixth-generation (6G) of wireless networks is expected to surpass its predecessors by offering ubiquitous coverage through integrated air-ground facility deployments in both communication and computing domains. In this network, aerial facilities, such as unmanned aerial vehicles (UAVs), conduct artificial intelligence (AI) computations based on multi-modal data to support diverse applications including surveillance and environment construction. However, these multi-domain inference and content generation tasks require large AI models, demanding powerful computing capabilities, thus posing significant challenges for UAVs. To tackle this problem, we propose an integrated edge-cloud model evolution framework, where UAVs serve as edge nodes for data collection and edge model computation. Through wireless channels, UAVs collaborate with ground cloud servers, providing cloud model computation and model updating for edge UAVs. With limited wireless communication bandwidth, the proposed framework faces the challenge of information exchange scheduling between the edge UAVs and the cloud server. To tackle this, we present joint task allocation, transmission resource allocation, transmission data quantization design, and edge model update design to enhance the inference accuracy of the integrated air-ground edge-cloud model evolution framework by mean average precision (mAP) maximization. A closed-form lower bound on the mAP of the proposed framework is derived, and the solution to the mAP maximization problem is optimized accordingly. Simulations, based on results from vision-based classification experiments, consistently demonstrate that the mAP of the proposed framework outperforms both a centralized cloud model framework and a distributed edge model framework across various communication bandwidths and data sizes.
Satellite Communications (SatCom) are a backbone of worldwide development. In contrast with the past, when the GEO satellites were the only means for such connectivity, nowadays the multi-orbital connectivity is emerging, especially with the use of satellite constellations. Simultaneously, SatCom enabled the so-called In-Flight Connectivity, while with the advent of 5G-NTN, the development of this market is being accelerated. However, there are still various missing points before such a technology becomes mainstream, especially in the case of Rotary Wing Aircraft (RWA). Indeed, due to their particular characteristics, such as the low altitude flights and the blade interference, there are still open challenges. In this work, an End-to-End (E2E) analysis for the performance of SatCom under 5G-NTN for manned and unmanned RWA is performed. Various scenarios are examined, and related requirements are shown. The effects of blades and other characteristics of the RWA are established, and simulations for these cases are developed. Results along with related discussion are presented, while future directions for development are suggested. This work is part of the ESA ACROSS-AIR project.
Deep neural networks (DNNs) have significantly boosted the performance of many challenging tasks. Despite the great development, DNNs have also exposed their vulnerability. Recent studies have shown that adversaries can manipulate the predictions of DNNs by adding a universal adversarial perturbation (UAP) to benign samples. On the other hand, increasing efforts have been made to help users understand and explain the inner working of DNNs by highlighting the most informative parts (i.e., attribution maps) of samples with respect to their predictions. Moreover, we first empirically find that such attribution maps between benign and adversarial examples have a significant discrepancy, which has the potential to detect universal adversarial perturbations for defending against adversarial attacks. This finding motivates us to further investigate a new research problem: whether there exist universal adversarial perturbations that are able to jointly attack DNNs classifier and its interpretation with malicious desires. It is challenging to give an explicit answer since these two objectives are seemingly conflicting. In this paper, we propose a novel attacking framework to generate joint universal adversarial perturbations (JUAP), which can fool the DNNs model and misguide the inspection from interpreters simultaneously. Comprehensive experiments on various datasets demonstrate the effectiveness of the proposed method JUAP for joint attacks. To the best of our knowledge, this is the first effort to study UAP for jointly attacking both DNNs and interpretations.
We propose a novel single shot object detection network named Detection with Enriched Semantics (DES). Our motivation is to enrich the semantics of object detection features within a typical deep detector, by a semantic segmentation branch and a global activation module. The segmentation branch is supervised by weak segmentation ground-truth, i.e., no extra annotation is required. In conjunction with that, we employ a global activation module which learns relationship between channels and object classes in a self-supervised manner. Comprehensive experimental results on both PASCAL VOC and MS COCO detection datasets demonstrate the effectiveness of the proposed method. In particular, with a VGG16 based DES, we achieve an mAP of 81.7 on VOC2007 test and an mAP of 32.8 on COCO test-dev with an inference speed of 31.5 milliseconds per image on a Titan Xp GPU. With a lower resolution version, we achieve an mAP of 79.7 on VOC2007 with an inference speed of 13.0 milliseconds per image.