State space models (SSMs) have emerged as a powerful framework for modelling long-range dependencies in sequence data. Unlike traditional recurrent neural networks (RNNs) and convolutional neural networks (CNNs), SSMs offer a structured and stable approach to sequence modelling, leveraging principles from control theory and dynamical systems. However, a key challenge in sequence modelling is compressing long-term dependencies into a compact hidden state representation without losing critical information. In this paper, we develop a rigorous mathematical framework for understanding memory compression in selective state space models. We introduce a selective gating mechanism that dynamically filters and updates the hidden state based on input relevance, allowing for efficient memory compression. We formalize the trade-off between memory efficiency and information retention using information-theoretic tools such as mutual information and rate-distortion theory. Our analysis provides theoretical bounds on the amount of information that can be compressed without sacrificing model performance. We also derive theorems that prove the stability and convergence of the hidden state in selective SSMs, ensuring reliable long-term memory retention. Computational complexity analysis reveals that selective SSMs offer significant improvements in memory efficiency and processing speed compared to traditional RNN-based models. Through empirical validation on sequence modelling tasks such as time-series forecasting and natural language processing, we demonstrate that selective SSMs achieve state-of-the-art performance while using less memory and fewer computational resources.
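To make the selective-gating idea concrete, the following minimal sketch shows an input-dependent gate deciding how much of the previous hidden state is retained versus overwritten at each step. It is an illustrative toy, not the authors' architecture; all dimensions and projection names (`A`, `B_proj`, `C_proj`, `gate_proj`) are assumptions.

```python
import numpy as np

def selective_ssm_step(h, x, A, B_proj, C_proj, gate_proj):
    """One selective state update: a gate computed from the input decides
    how much of the old hidden state survives and how strongly the new
    input is written in (illustrative sketch only)."""
    g = 1.0 / (1.0 + np.exp(-gate_proj @ x))        # input-dependent gate in (0, 1)
    h_new = g * (A @ h) + (1.0 - g) * (B_proj @ x)  # selective retain/update
    y = C_proj @ h_new                              # read-out
    return h_new, y

# Toy run: hidden state of size 8, inputs of size 4.
rng = np.random.default_rng(0)
d_h, d_x = 8, 4
A = 0.9 * np.eye(d_h)                   # spectral radius < 1 keeps the state stable
B_proj = 0.1 * rng.normal(size=(d_h, d_x))
C_proj = 0.1 * rng.normal(size=(2, d_h))
gate_proj = rng.normal(size=(d_h, d_x))

h = np.zeros(d_h)
for t in range(16):
    h, y = selective_ssm_step(h, rng.normal(size=d_x), A, B_proj, C_proj, gate_proj)
```

With the gate bounded in (0, 1) and a contractive transition matrix, the hidden state stays bounded, which is the intuition behind the stability results mentioned above.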
Rate-splitting multiple access (RSMA) is regarded as a crucial and powerful physical-layer (PHY) paradigm for next-generation communication systems. In RSMA, users employ successive interference cancellation (SIC) to decode part of the interference while treating the remainder as noise. However, conventional RSMA systems rely on fixed-position antenna arrays, limiting their ability to fully exploit spatial diversity. This constraint reduces beamforming gain and significantly impairs RSMA performance. To address this problem, we propose a movable antenna (MA)-aided RSMA scheme that allows the antennas at the base station (BS) to dynamically adjust their positions. Our objective is to maximize the system sum rate of common and private messages by jointly optimizing the MA positions, the beamforming matrix, and the common rate allocation. To tackle the formulated non-convex problem, we apply fractional programming (FP) and develop an efficient two-stage, coarse-to-fine-grained searching (CFGS) algorithm to obtain high-quality solutions. Numerical results demonstrate that, with optimized antenna adjustments, the MA-enabled system achieves substantial gains in RSMA performance and reliability over fixed-position antenna setups.
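The abstract does not spell out the CFGS steps, but the generic two-stage coarse-to-fine pattern it names can be sketched as below for a one-dimensional antenna-position search; `sum_rate` is a stand-in for the FP-based sum-rate evaluation, not the paper's objective.

```python
import numpy as np

def sum_rate(p):
    """Placeholder objective standing in for the FP-based sum-rate
    evaluation at a candidate antenna position."""
    return np.sinc(p - 1.3) + 0.3 * np.cos(4.0 * p)

def coarse_to_fine_search(lo, hi, n_coarse=32, n_fine=32, shrink=0.1):
    # Stage 1: coarse grid over the whole feasible position interval.
    coarse = np.linspace(lo, hi, n_coarse)
    best = coarse[np.argmax([sum_rate(p) for p in coarse])]
    # Stage 2: fine grid confined to a small window around the coarse winner.
    half = shrink * (hi - lo) / 2.0
    fine = np.linspace(best - half, best + half, n_fine)
    return fine[np.argmax([sum_rate(p) for p in fine])]

p_star = coarse_to_fine_search(0.0, 4.0)
```

The coarse stage avoids evaluating the expensive objective densely everywhere; the fine stage recovers precision only where it matters.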
Mueller matrix polarimetry captures essential information about polarized light interactions with a sample, presenting unique challenges for data augmentation in deep learning due to its distinct structure. While augmentations are an effective and affordable way to enhance dataset diversity and reduce overfitting, standard transformations like rotations and flips do not preserve the polarization properties in Mueller matrix images. To address this, we introduce a versatile simulation framework that applies physically consistent rotations and flips to Mueller matrices, tailored to maintain polarization fidelity. Our experimental results across multiple datasets reveal that conventional augmentations can lead to misleading results when applied to polarimetric data, underscoring the necessity of our physics-based approach. In our experiments, we first compare our polarization-specific augmentations against real-world captures to validate their physical consistency. We then apply these augmentations in a semantic segmentation task, achieving substantial improvements in model generalization and performance. This study demonstrates the necessity of physics-informed data augmentation for polarimetric imaging in deep learning (DL), paving the way for broader adoption and more robust applications across diverse research in the field. In particular, our framework unlocks the potential of DL models for polarimetric datasets with limited sample sizes. Our code implementation is available at github.com/hahnec/polar_augment.
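For intuition on why naive augmentation fails here, recall that rotating a sample by an angle theta transforms its Mueller matrix by conjugation with the Stokes rotation matrix, so a spatial rotation of the image must be paired with a per-pixel matrix transformation. A minimal sketch of this standard conjugation follows (sign conventions vary; see the repository above for the authors' implementation).

```python
import numpy as np

def stokes_rotation(theta):
    """Rotation matrix acting on Stokes vectors for a frame rotation
    by theta radians (note the characteristic 2*theta dependence)."""
    c, s = np.cos(2 * theta), np.sin(2 * theta)
    return np.array([[1, 0, 0, 0],
                     [0, c, s, 0],
                     [0, -s, c, 0],
                     [0, 0, 0, 1.0]])

def rotate_mueller(M, theta):
    """Physically consistent rotation of a single Mueller matrix:
    M' = R(theta) @ M @ R(-theta). For an image, this conjugation is
    applied per pixel alongside the spatial rotation of coordinates."""
    return stokes_rotation(theta) @ M @ stokes_rotation(-theta)

M_rot = rotate_mueller(np.eye(4), np.deg2rad(30))
```

A plain image rotation applies only the spatial part and silently breaks this identity, which is exactly the failure mode the experiments expose.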
Relation extraction, an important natural language processing (NLP) task, aims to identify relations between named entities in text. Recently, graph convolutional networks (GCNs) over dependency trees have been widely used to capture syntactic features and have achieved attractive performance. However, most existing dependency-based approaches ignore the positive influence of words outside the dependency trees, which sometimes convey rich and useful information for relation extraction. In this paper, we propose a novel model, the Entity-aware Self-attention Contextualized GCN (ESC-GCN), which efficiently incorporates the syntactic structure of input sentences and the semantic context of sequences. Specifically, relative-position self-attention captures the overall semantic pairwise correlations tied to word position, while contextualized graph convolutional networks capture rich intra-sentence dependencies between words through adequate pruning operations. Furthermore, an entity-aware attention layer dynamically selects which tokens are most decisive for the final relation prediction. In this way, our proposed model not only reduces the noise introduced by dependency trees but also retains easily overlooked entity-related semantic representations. Extensive experiments on various tasks demonstrate that our model achieves encouraging performance compared with existing dependency-based and sequence-based models. In particular, our model excels at extracting relations between entities in long sentences.
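As a point of reference for the GCN component named above, a single graph-convolution step over a dependency adjacency matrix looks like the sketch below; this is the standard building block, not the full ESC-GCN with self-attention and entity-aware layers.

```python
import numpy as np

def gcn_layer(A, H, W):
    """One graph-convolution step: each word aggregates the representations
    of its syntactic neighbours (degree-normalized), then projects + ReLU."""
    A_hat = A + np.eye(A.shape[0])            # add self-loops
    H_agg = (A_hat / A_hat.sum(1, keepdims=True)) @ H
    return np.maximum(H_agg @ W, 0.0)

# Toy sentence of 5 tokens with a small dependency tree.
A = np.zeros((5, 5))
for head, dep in [(1, 0), (1, 3), (3, 2), (3, 4)]:
    A[head, dep] = A[dep, head] = 1.0         # undirected dependency edges
rng = np.random.default_rng(0)
H = rng.normal(size=(5, 16))                  # token representations
W = 0.1 * rng.normal(size=(16, 16))
H_out = gcn_layer(A, H, W)
```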
A method is introduced for approximate marginal likelihood inference via adaptive Gaussian quadrature in mixed models with a single grouping factor. The core technical contribution is an algorithm for computing the exact gradient of the approximate log-marginal likelihood. This leads to efficient maximum likelihood via quasi-Newton optimization that is demonstrated to be faster than existing approaches based on finite-differenced gradients or derivative-free optimization. The method is specialized to Bernoulli mixed models with multivariate, correlated Gaussian random effects; here computations are performed using an inverse log-Cholesky parameterization of the Gaussian density that involves no matrix decomposition during model fitting, while Wald confidence intervals are provided for variance parameters on the original scale. Simulations give evidence of these intervals attaining nominal coverage if enough quadrature points are used, for data comprising a large number of very small groups exhibiting large between-group heterogeneity. The Laplace approximation is well known to give especially poor coverage and high bias for such data. Adaptive quadrature mitigates this, and the methods in this paper improve the computational feasibility of this more accurate approach. All results may be reproduced using code available at \url{https://github.com/awstringer1/aghmm-paper-code}.
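To fix ideas, the sketch below shows the adaptive Gauss-Hermite approximation for a single group of a Bernoulli mixed model with a scalar random effect and no covariates: the quadrature nodes are recentred at the mode of the integrand and rescaled by its curvature. The exact-gradient computation that is the paper's core contribution is omitted; all names are illustrative.

```python
import numpy as np
from numpy.polynomial.hermite import hermgauss

def log_joint(u, y, beta, sigma):
    """log of (Bernoulli likelihood x Gaussian random-effect density)
    for one group, with linear predictor eta = beta + u."""
    eta = beta + u
    loglik = np.sum(y * eta - np.log1p(np.exp(eta)))
    return loglik - 0.5 * u**2 / sigma**2 - 0.5 * np.log(2 * np.pi * sigma**2)

def aghq_marginal(y, beta, sigma, k=15):
    """Adaptive Gauss-Hermite approximation of one group's marginal likelihood."""
    u = 0.0
    for _ in range(25):                      # Newton iterations for the mode
        p = 1.0 / (1.0 + np.exp(-(beta + u)))
        grad = np.sum(y - p) - u / sigma**2
        hess = -len(y) * p * (1 - p) - 1.0 / sigma**2
        u -= grad / hess
    scale = 1.0 / np.sqrt(-hess)             # curvature-based rescaling
    x, w = hermgauss(k)                      # nodes/weights for weight exp(-x^2)
    vals = np.array([log_joint(u + np.sqrt(2) * scale * xi, y, beta, sigma)
                     for xi in x])
    return np.sqrt(2) * scale * np.sum(w * np.exp(vals + x**2))

print(aghq_marginal(np.array([1, 0, 1]), beta=0.2, sigma=1.5))
```

With k = 1 this recovers the Laplace approximation, which is why increasing the number of quadrature points repairs the coverage problems noted above.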
Autonomous reconfigurable intelligent surface (RIS) offers the potential to simplify deployment by reducing the need for real-time remote control between a base station (BS) and an RIS. However, we highlight two major challenges posed by autonomy. The first is implementation complexity, as autonomy requires hybrid RISs (HRISs) equipped with additional on-board hardware to monitor the propagation environment and conduct local channel estimation (CHEST), a process known as probing. The second challenge, termed probe distortion, reflects a form of the observer effect: during probing, an HRIS can inadvertently alter the propagation environment, potentially disrupting the operations of other communicating devices. While implementation complexity has been extensively studied, probe distortion remains largely unexplored. To further assess the potential of autonomous RISs, this paper comprehensively and pragmatically studies fundamental trade-offs posed by these challenges. We examine the robustness of an HRIS-assisted massive multiple-input multiple-output (mMIMO) system under minimal design choices that reflect the essential elements and stringent conditions, including (a) two extremes of implementation complexity realized through minimalist operational designs of two HRIS hardware architectures, and (b) an oblivious BS that fully embraces probe distortion. To make our analysis possible, we propose a physical-layer orchestration framework that aligns HRIS and mMIMO operations. We provide empirical evidence showing that autonomous RIS holds promise even under these strict conditions and propose new research directions, particularly for advancing the understanding of probe distortion.
In most multiple-input multiple-output (MIMO) communication systems, antennas are spaced at least half a wavelength apart to reduce mutual coupling. In this configuration, the maximum array gain equals the number of antennas. However, when the antenna spacing is significantly reduced, the array gain of a compact array can become proportional to the square of the number of antennas, greatly exceeding that of traditional MIMO systems. Achieving this "superdirectivity" requires complex calculation of the excitation coefficients (beamforming vector), which is a challenging task. In this paper, we address this problem with a novel double-coupling-based superdirective beamforming method. In particular, we categorize antenna coupling effects into impedance coupling and field coupling. By modelling these two coupling effects, we derive the beamforming vector for superdirective arrays. We prove that the field coupling matrix of an antenna array is unique and by itself fully characterizes the distorted coupling field. Based on this theorem, we propose a method that accurately calculates the coupling matrix using a number of angle sampling points only on the order of the number of antennas. Moreover, we develop a prototype of an independently controlled superdirective antenna array. Full-wave electromagnetic simulations and real-world experiments validate the effectiveness of the proposed approaches, and superdirectivity is achieved in practice by compact arrays of 4 and 8 dipole antennas.
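For context, the textbook form of the superdirectivity computation (without the paper's double-coupling refinement) maximizes the directivity ratio |w^H a|^2 / (w^H B w), whose solution is w = B^{-1} a. The sketch below uses the classical isotropic-element model, where B has sinc entries; it illustrates why directivity approaches N^2 as spacing shrinks, and is not the authors' coupling-aware method.

```python
import numpy as np

def superdirective_weights(n, spacing_wl, theta0=0.0):
    """Directivity-maximizing weights for an isotropic linear array:
    w = B^{-1} a (generalized Rayleigh quotient), D_max = a^H B^{-1} a."""
    k_d = 2 * np.pi * spacing_wl                   # k*d with spacing in wavelengths
    m = np.arange(n)
    # B_mn = sin(k d (m-n)) / (k d (m-n)); np.sinc uses the sin(pi x)/(pi x) convention
    B = np.sinc(k_d * (m[:, None] - m[None, :]) / np.pi)
    a = np.exp(1j * k_d * m * np.cos(theta0))      # endfire steering vector
    w = np.linalg.solve(B, a)
    return w, np.real(np.vdot(a, w))               # weights, directivity

# At 0.05-wavelength spacing, a 4-element array approaches D = 16 rather than 4.
w, D = superdirective_weights(n=4, spacing_wl=0.05)
print(D)
```

The catch, which motivates the paper, is that this idealized B ignores the impedance and field coupling of real closely spaced antennas, so the naive weights do not survive contact with hardware.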
The success of AI models relies on the availability of large, diverse, and high-quality datasets, which can be challenging to obtain due to data scarcity, privacy concerns, and high costs. Synthetic data has emerged as a promising solution by generating artificial data that mimics real-world patterns. This paper provides an overview of synthetic data research, discussing its applications, challenges, and future directions. We present empirical evidence from prior art to demonstrate its effectiveness and highlight the importance of ensuring its factuality, fidelity, and unbiasedness. We emphasize the need for responsible use of synthetic data to build more powerful, inclusive, and trustworthy language models.
Federated Learning (FL) is a decentralized machine-learning paradigm in which a global server iteratively averages the model parameters of local users without accessing their data. User heterogeneity has imposed significant challenges on FL, as it can induce drifted global models that are slow to converge. Knowledge distillation has recently emerged to tackle this issue by refining the server model using aggregated knowledge from heterogeneous users, rather than directly averaging their model parameters. This approach, however, depends on a proxy dataset, making it impractical unless such a prerequisite is satisfied. Moreover, the ensemble knowledge is not fully utilized to guide local model learning, which may in turn affect the quality of the aggregated model. Inspired by prior art, we propose a data-free knowledge distillation approach to address heterogeneous FL, where the server learns a lightweight generator to ensemble user information in a data-free manner; this generator is then broadcast to users, regulating local training by using the learned knowledge as an inductive bias. Empirical studies powered by theoretical implications show that our approach facilitates FL with better generalization performance using fewer communication rounds, compared with the state of the art.
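A compressed sketch of the data-free server-side step is given below, under the assumption (ours, for illustration) that each user contributes a classifier head over a shared feature space: the generator is trained so that the user ensemble classifies its label-conditioned outputs correctly, with no proxy data involved.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

# Stand-ins: each "user head" maps a latent feature to class logits.
feat_dim, n_classes, noise_dim = 32, 10, 16
user_heads = [nn.Linear(feat_dim, n_classes) for _ in range(5)]

generator = nn.Sequential(
    nn.Linear(noise_dim + n_classes, 64), nn.ReLU(),
    nn.Linear(64, feat_dim),
)
opt = torch.optim.Adam(generator.parameters(), lr=1e-3)

for step in range(100):
    y = torch.randint(0, n_classes, (64,))
    z = torch.randn(64, noise_dim)
    # The generator conditions on noise plus a label and emits a pseudo-feature.
    feat = generator(torch.cat([z, F.one_hot(y, n_classes).float()], dim=1))
    # Data-free objective: the ensemble of user models should assign the
    # generated feature to its conditioning label.
    ens_logits = torch.stack([h(feat) for h in user_heads]).mean(dim=0)
    loss = F.cross_entropy(ens_logits, y)
    opt.zero_grad(); loss.backward(); opt.step()
```

The trained generator is what gets broadcast: users sample from it during local training, so the ensemble knowledge acts as the inductive bias described above.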
Graph Neural Networks (GNNs) have recently been used for node and graph classification tasks with great success, but GNNs model dependencies among the attributes of neighboring nodes rather than dependencies among observed node labels. In this work, we consider the task of inductive node classification using GNNs in supervised and semi-supervised settings, with the goal of incorporating label dependencies. Because current GNNs are not universal (i.e., most-expressive) graph representations, we propose a general collective learning approach to increase the representation power of any existing GNN. Our framework combines ideas from collective classification with self-supervised learning, and uses a Monte Carlo approach to sampling embeddings for inductive learning across graphs. We evaluate performance on five real-world network datasets and demonstrate consistent, significant improvements in node classification accuracy for a variety of state-of-the-art GNNs.
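One generic way to realize collective inference of this kind, sketched under our own simplifying assumptions rather than taken from the paper, is to feed Monte Carlo samples of predicted labels back into the node features and re-run the base model until the predictions stabilize:

```python
import numpy as np

rng = np.random.default_rng(0)
n, d, c = 20, 8, 3                                   # nodes, features, classes
A = (rng.random((n, n)) < 0.15).astype(float)
A = np.maximum(A, A.T)                               # symmetric toy graph
X = rng.normal(size=(n, d))
W_out = 0.5 * rng.normal(size=(d + c, c))            # read-out over features + labels

def base_gnn(X_aug):
    """Placeholder for any trained GNN: one normalized propagation step,
    then a linear read-out to class probabilities."""
    A_hat = A + np.eye(n)
    logits = (A_hat / A_hat.sum(1, keepdims=True)) @ X_aug @ W_out
    e = np.exp(logits - logits.max(1, keepdims=True))
    return e / e.sum(1, keepdims=True)

# Collective inference loop: sample labels, append as features, re-predict.
probs = np.full((n, c), 1.0 / c)
for _ in range(10):
    sampled = np.array([rng.multinomial(1, p) for p in probs], dtype=float)
    probs = base_gnn(np.hstack([X, sampled]))
```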
Dynamic programming (DP) solves a variety of structured combinatorial problems by iteratively breaking them down into smaller subproblems. In spite of their versatility, DP algorithms are usually non-differentiable, which hampers their use as a layer in neural networks trained by backpropagation. To address this issue, we propose to smooth the max operator in the dynamic programming recursion using a strongly convex regularizer. This allows us to relax both the optimal value and the solution of the original combinatorial problem, and turns a broad class of DP algorithms into differentiable operators. Theoretically, we provide a new probabilistic perspective on backpropagating through these DP operators and relate them to inference in graphical models. We derive two particular instantiations of our framework: a smoothed Viterbi algorithm for sequence prediction and a smoothed DTW algorithm for time-series alignment. We showcase these instantiations on two structured prediction tasks and on structured and sparse attention for neural machine translation.
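The smoothed operator at the heart of this framework is easy to state: with a negentropy regularizer, the smoothed min becomes -gamma * logsumexp(-x / gamma), and plugging it into the DTW recursion yields a differentiable alignment cost. A minimal sketch (values only; the paper also derives the backward pass):

```python
import numpy as np

def softmin(args, gamma):
    """Smoothed min: -gamma * logsumexp(-x / gamma); gamma -> 0
    recovers the hard minimum, gamma > 0 makes it differentiable."""
    a = np.asarray(args) / -gamma
    m = a.max()
    return -gamma * (m + np.log(np.exp(a - m).sum()))

def soft_dtw(D, gamma=1.0):
    """Smoothed DTW over a pairwise-cost matrix D: the classical DP
    recursion with min replaced by its smoothed counterpart."""
    n, m = D.shape
    R = np.full((n + 1, m + 1), np.inf)
    R[0, 0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            R[i, j] = D[i - 1, j - 1] + softmin(
                [R[i - 1, j], R[i, j - 1], R[i - 1, j - 1]], gamma)
    return R[n, m]

x = np.array([0.0, 1.0, 2.0]); y = np.array([0.0, 2.0])
D = (x[:, None] - y[None, :]) ** 2           # squared-distance costs
print(soft_dtw(D, gamma=0.1))
```

The gradient of this value with respect to D is a soft alignment matrix, which is what enables its use as a differentiable layer.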