Intelligent reflecting surface (IRS) has emerged as a promising technique to extend the wireless signal coverage of access point (AP) and improve the communication performance cost-effectively. In order to reduce the path-loss of the cascaded user-IRS-AP channels, the IRS-integrated AP architecture has been proposed to deploy the IRSs and the antenna array of the AP within the same antenna radome. To reduce the pilot overhead for estimating all IRS-involved channels, in this paper, we propose a novel codebook-based IRS reflection design for the IRS-integrated AP to enhance the coverage performance in a given area. In particular, the codebook consisting of a small number of codewords is designed offline by employing an efficient sector division strategy based on the azimuth angle. To ensure the performance of each sector, we optimize its corresponding codeword for IRS reflection pattern to maximize the sector-min-average-effective-channel-power (SMAECP) by applying the alternating optimization (AO) and semidefinite relaxation (SDR) methods. With the designed codebook, the AP performs the IRS reflection training by sequentially applying all codewords and selects the one achieving the best communication performance for data transmission. Numerical results show that our proposed codebook design can enhance the average channel power of the whole coverage area, as compared to the system without IRS. Moreover, our proposed codebook-based IRS reflection design is shown to achieve significant performance gain over other benchmark schemes in both single-user and multi-user transmissions.
Uncertainty quantification (UQ) to detect samples with large expected errors (outliers) is applied to reactive molecular potential energy surfaces (PESs). Three methods - Ensembles, Deep Evidential Regression (DER), and Gaussian Mixture Models (GMM) - were applied to the H-transfer reaction between ${\it syn-}$Criegee and vinyl hydroxyperoxide. The results indicate that ensemble models provide the best results for detecting outliers, followed by GMM. For example, from a pool of 1000 structures with the largest uncertainty, the detection quality for outliers is $\sim 90$ \% and $\sim 50$ \%, respectively, if 25 or 1000 structures with large errors are sought. On the contrary, the limitations of the statistical assumptions of DER greatly impacted its prediction capabilities. Finally, a structure-based indicator was found to be correlated with large average error, which may help to rapidly classify new structures into those that provide an advantage for refining the neural network.
Point cloud matching, a crucial technique in computer vision, medical and robotics fields, is primarily concerned with finding correspondences between pairs of point clouds or voxels. In some practical scenarios, emphasizing local differences is crucial for accurately identifying a correct match, thereby enhancing the overall robustness and reliability of the matching process. Commonly used shape descriptors have several limitations and often fail to provide meaningful local insights on the paired geometries. In this work, we propose a new technique, based on graph Laplacian eigenmaps, to match point clouds by taking into account fine local structures. To deal with the order and sign ambiguity of Laplacian eigenmaps, we introduce a new operator, called Coupled Laplacian, that allows to easily generate aligned eigenspaces for multiple rigidly-registered geometries. We show that the similarity between those aligned high-dimensional spaces provides a locally meaningful score to match shapes. We initially evaluate the performance of the proposed technique in a point-wise manner, specifically focusing on the task of object anomaly localization using the MVTec 3D-AD dataset. Additionally, we define a new medical task, called automatic Bone Side Estimation (BSE), which we address through a global similarity score derived from coupled eigenspaces. In order to test it, we propose a benchmark collecting bone surface structures from various public datasets. Our matching technique, based on Coupled Laplacian, outperforms other methods by reaching an impressive accuracy on both tasks. The code to reproduce our experiments is publicly available at //github.com/matteo-bastico/CoupledLaplacian and in the Supplementary Code.
Stacked intelligent metasurfaces (SIM) is a revolutionary technology, which can outperform its single-layer counterparts by performing advanced signal processing relying on wave propagation. In this work, we exploit SIM to enable transmit precoding and receiver combining in holographic multiple-input multiple-output (HMIMO) communications, and we study the achievable rate by formulating a joint optimization problem of the SIM phase shifts at both sides of the transceiver and the covariance matrix of the transmitted signal. Notably, we propose its solution by means of an iterative optimization algorithm that relies on the projected gradient method, and accounts for all optimization parameters simultaneously. We also obtain the step size guaranteeing the convergence of the proposed algorithm. Simulation results provide fundamental insights such the performance improvements compared to the single-RIS counterpart and conventional MIMO system. Remarkably, the proposed algorithm results in the same achievable rate as the alternating optimization (AO) benchmark but with a less number of iterations.
Numerical solution of discrete PDEs corresponding to saddle point problems is highly relevant to physical systems such as Stokes flow. However, scaling up numerical solvers for such systems is often met with challenges in efficiency and convergence. Multigrid is an approach with excellent applicability to elliptic problems such as the Stokes equations, and can be a solution to such challenges of scalability and efficiency. The degree of success of such methods, however, is highly contingent on the design of key components of a multigrid scheme, including the hierarchy of discretizations, and the relaxation scheme used. Additionally, in many practical cases, it may be more effective to use a multigrid scheme as a preconditioner to an iterative Krylov subspace solver, as opposed to striving for maximum efficacy of the relaxation scheme in all foreseeable settings. In this paper, we propose an efficient symmetric multigrid preconditioner for the Stokes Equations on a staggered finite-difference discretization. Our contribution is focused on crafting a preconditioner that (a) is symmetric indefinite, matching the property of the Stokes system itself, (b) is appropriate for preconditioning the SQMR iterative scheme, and (c) has the requisite symmetry properties to be used in this context. In addition, our design is efficient in terms of computational cost and facilitates scaling to large domains.
Topological quantum codes, such as toric and surface codes, are excellent candidates for hardware implementation due to their robustness against errors and their local interactions between qubits. However, decoding these codes efficiently remains a challenge: existing decoders often fall short of meeting requirements such as having low computational complexity (ideally linear in the code's blocklength), low decoding latency, and low power consumption. In this paper we propose a novel bit-flipping (BF) decoder tailored for toric and surface codes. We introduce the proximity vector as a heuristic metric for flipping bits, and we develop a new subroutine for correcting a particular class of harmful degenerate errors. Our algorithm achieves linear complexity growth and it can be efficiently implemented as it only involves simple operations such as bit-wise additions, quasi-cyclic permutations and vector-matrix multiplications. The proposed decoder shows a decoding threshold of 7.5% for the 2D toric code and 7% for the rotated planar code over the binary symmetric channel.
Affine frequency division multiplexing (AFDM), tailored as a novel multicarrier technique utilizing chirp signals for high-mobility communications, exhibits marked advantages compared to traditional orthogonal frequency division multiplexing (OFDM). AFDM is based on the discrete affine Fourier transform (DAFT) with two modifiable parameters of the chirp signals, termed as the pre-chirp parameter and post-chirp parameter, respectively. These parameters can be fine-tuned to avoid overlapping channel paths with different delays or Doppler shifts, leading to performance enhancement especially for doubly dispersive channel. In this paper, we propose a novel AFDM structure with the pre-chirp index modulation (PIM) philosophy (AFDM-PIM), which can embed additional information bits into the pre-chirp parameter design for both spectral and energy efficiency enhancement. Specifically, we first demonstrate that the application of distinct pre-chirp parameters to various subcarriers in the AFDM modulation process maintains the orthogonality among these subcarriers. Then, different pre-chirp parameters are flexibly assigned to each AFDM subcarrier according to the incoming bits. By such arrangement, aside from classical phase/amplitude modulation, extra binary bits can be implicitly conveyed by the indices of selected pre-chirping parameters realizations without additional energy consumption. At the receiver, both a maximum likelihood (ML) detector and a reduced-complexity ML-minimum mean square error (ML-MMSE) detector are employed to recover the information bits. It has been shown via simulations that the proposed AFDM-PIM exhibits superior bit error rate (BER) performance compared to classical AFDM, OFDM and IM-aided OFDM algorithms.
Masked autoencoders (MAEs) have established themselves as a powerful method for unsupervised pre-training for computer vision tasks. While vanilla MAEs put equal emphasis on reconstructing the individual parts of the image, we propose to inform the reconstruction process through an attention-guided loss function. By leveraging advances in unsupervised object discovery, we obtain an attention map of the scene which we employ in the loss function to put increased emphasis on reconstructing relevant objects, thus effectively incentivizing the model to learn more object-focused representations without compromising the established masking strategy. Our evaluations show that our pre-trained models learn better latent representations than the vanilla MAE, demonstrated by improved linear probing and k-NN classification results on several benchmarks while at the same time making ViTs more robust against varying backgrounds.
Transformer-based models have significantly improved performance across a range of multimodal understanding tasks, such as visual question answering and action recognition. However, multimodal Transformers significantly suffer from a quadratic complexity of the multi-head attention with the input sequence length, especially as the number of modalities increases. To address this, we introduce Low-Cost Multimodal Transformer (LoCoMT), a novel multimodal attention mechanism that aims to reduce computational cost during training and inference with minimal performance loss. Specifically, by assigning different multimodal attention patterns to each attention head, LoCoMT can flexibly control multimodal signals and theoretically ensures a reduced computational cost compared to existing multimodal Transformer variants. Experimental results on two multimodal datasets, namely Audioset and MedVidCL demonstrate that LoCoMT not only reduces GFLOPs but also matches or even outperforms established models.
Graph Neural Networks (GNN) has demonstrated the superior performance in many challenging applications, including the few-shot learning tasks. Despite its powerful capacity to learn and generalize from few samples, GNN usually suffers from severe over-fitting and over-smoothing as the model becomes deep, which limit the model scalability. In this work, we propose a novel Attentive GNN to tackle these challenges, by incorporating a triple-attention mechanism, \ie node self-attention, neighborhood attention, and layer memory attention. We explain why the proposed attentive modules can improve GNN for few-shot learning with theoretical analysis and illustrations. Extensive experiments show that the proposed Attentive GNN outperforms the state-of-the-art GNN-based methods for few-shot learning over the mini-ImageNet and Tiered-ImageNet datasets, with both inductive and transductive settings.
Learning latent representations of nodes in graphs is an important and ubiquitous task with widespread applications such as link prediction, node classification, and graph visualization. Previous methods on graph representation learning mainly focus on static graphs, however, many real-world graphs are dynamic and evolve over time. In this paper, we present Dynamic Self-Attention Network (DySAT), a novel neural architecture that operates on dynamic graphs and learns node representations that capture both structural properties and temporal evolutionary patterns. Specifically, DySAT computes node representations by jointly employing self-attention layers along two dimensions: structural neighborhood and temporal dynamics. We conduct link prediction experiments on two classes of graphs: communication networks and bipartite rating networks. Our experimental results show that DySAT has a significant performance gain over several different state-of-the-art graph embedding baselines.