Orthogonal time sequency multiplexing (OTSM) has been recently proposed as a single-carrier (SC) waveform offering similar bit error rate (BER) to multi-carrier orthogonal time frequency space (OTFS) modulation in doubly-spread channels under high mobilities; however, with much lower complexity making OTSM a promising candidate for low-power millimeter-wave (mmWave) vehicular communications in 6G wireless networks. In this paper, the performance of OTSM-based homodyne transceiver is explored under hardware impairments (HIs) including in-phase and quadrature imbalance (IQI), direct current offset (DCO), phase noise, power amplifier non-linearity, carrier frequency offset, and synchronization timing offset. First, the discrete-time baseband signal model is obtained in vector form under the mentioned HIs. Then, the system input-output relations are derived in time, delay-time, and delay-sequency (DS) domains in which the parameters of HIs are incorporated. Analytical studies demonstrate that noise stays white Gaussian and effective channel matrix is sparse in the DS domain under HIs. Also, DCO appears as a DC signal at receiver interfering with only the zero sequency over all delay taps in the DS domain; however, IQI redounds to self-conjugated fully-overlapping sequency interference. Simulation results reveal the fact that with no HI compensation (HIC), not only OTSM outperforms plain SC waveform but it performs close to uncompensated OTFS system; however, HIC is essentially needed for OTSM systems operating in mmWave and beyond frequency bands.
Scaling laws have been recently employed to derive compute-optimal model size (number of parameters) for a given compute duration. We advance and refine such methods to infer compute-optimal model shapes, such as width and depth, and successfully implement this in vision transformers. Our shape-optimized vision transformer, SoViT, achieves results competitive with models that exceed twice its size, despite being pre-trained with an equivalent amount of compute. For example, SoViT-400m/14 achieves 90.3% fine-tuning accuracy on ILSRCV2012, surpassing the much larger ViT-g/14 and approaching ViT-G/14 under identical settings, with also less than half the inference cost. We conduct a thorough evaluation across multiple tasks, such as image classification, captioning, VQA and zero-shot transfer, demonstrating the effectiveness of our model across a broad range of domains and identifying limitations. Overall, our findings challenge the prevailing approach of blindly scaling up vision models and pave a path for a more informed scaling.
Device-directed speech detection (DDSD) is the binary classification task of distinguishing between queries directed at a voice assistant versus side conversation or background speech. State-of-the-art DDSD systems use verbal cues, e.g acoustic, text and/or automatic speech recognition system (ASR) features, to classify speech as device-directed or otherwise, and often have to contend with one or more of these modalities being unavailable when deployed in real-world settings. In this paper, we investigate fusion schemes for DDSD systems that can be made more robust to missing modalities. Concurrently, we study the use of non-verbal cues, specifically prosody features, in addition to verbal cues for DDSD. We present different approaches to combine scores and embeddings from prosody with the corresponding verbal cues, finding that prosody improves DDSD performance by upto 8.5% in terms of false acceptance rate (FA) at a given fixed operating point via non-linear intermediate fusion, while our use of modality dropout techniques improves the performance of these models by 7.4% in terms of FA when evaluated with missing modalities during inference time.
We present a fast adaptive method for the evaluation of heat potentials, which plays a key role in the integral equation approach for the solution of the heat equation, especially in a non-stationary domain. The algorithm utilizes a sum-of-exponential based fast Gauss transform that evaluates the convolution of a Gaussian with either discrete or continuous volume distributions. The latest implementation of the algorithm allows for both periodic and free space boundary conditions. The history dependence is overcome by splitting the heat potentials into a smooth history part and a singular local part. We discuss the resolution of the history part on an adaptive volume grid in detail, providing sharp estimates that allow for the construction of an optimal grid, justifying the efficiency of the bootstrapping scheme. While the discussion in this paper is restricted to one spatial dimension, the generalization to two and three dimensions is straightforward. The performance of the algorithm is illustrated via several numerical examples.
Graph neural networks (GNNs) are widely used for modeling complex interactions between entities represented as vertices of a graph. Despite recent efforts to theoretically analyze the expressive power of GNNs, a formal characterization of their ability to model interactions is lacking. The current paper aims to address this gap. Formalizing strength of interactions through an established measure known as separation rank, we quantify the ability of certain GNNs to model interaction between a given subset of vertices and its complement, i.e. between the sides of a given partition of input vertices. Our results reveal that the ability to model interaction is primarily determined by the partition's walk index -- a graph-theoretical characteristic defined by the number of walks originating from the boundary of the partition. Experiments with common GNN architectures corroborate this finding. As a practical application of our theory, we design an edge sparsification algorithm named Walk Index Sparsification (WIS), which preserves the ability of a GNN to model interactions when input edges are removed. WIS is simple, computationally efficient, and in our experiments has markedly outperformed alternative methods in terms of induced prediction accuracy. More broadly, it showcases the potential of improving GNNs by theoretically analyzing the interactions they can model.
We investigate high-order Convolution Quadratures methods for the solution of the wave equation in unbounded domains in two dimensions that rely on Nystr\"om discretizations for the solution of the ensemble of associated Laplace domain modified Helmholtz problems. We consider two classes of CQ discretizations, one based on linear multistep methods and the other based on Runge-Kutta methods, in conjunction with Nystr\"om discretizations based on Alpert and QBX quadratures of Boundary Integral Equation (BIE) formulations of the Laplace domain Helmholtz problems with complex wavenumbers. We present a variety of accuracy tests that showcase the high-order in time convergence (up to and including fifth order) that the Nystr\"om CQ discretizations are capable of delivering for a variety of two dimensional scatterers and types of boundary conditions.
We consider transformer encoders with hard attention (in which all attention is focused on exactly one position) and strict future masking (in which each position only attends to positions strictly to its left), and prove that the class of languages recognized by these networks is exactly the star-free languages. Adding position embeddings increases the class of recognized languages to other well-studied classes. A key technique in these proofs is Boolean RASP, a variant of RASP that is restricted to Boolean values. Via the star-free languages, we relate transformers to first-order logic, temporal logic, and algebraic automata theory.
For multi-transmission rate environments, access point (AP) connection methods have been proposed for maximizing system throughput, which is the throughput of an entire system, on the basis of the cooperative behavior of users. These methods derive optimal positions for the cooperative behavior of users, which means that new users move to improve the system throughput when connecting to an AP. However, the conventional method only considers the transmission rate of new users and does not consider existing users, even though it is necessary to consider the transmission rate of all users to improve system throughput. In addition, these method do not take into account the frequency of interference between users. In this paper, we propose an AP connection method which maximizes system throughput by considering the interference between users and the initial position of all users. In addition, our proposed method can improve system throughput by about 6% at most compared to conventional methods.
Coded caching (CC) can substantially enhance network performance by leveraging memory as an additional communication resource. However, the use of CC is challenging in various practical applications due to dynamic user behavior. The existing solutions, based on shared caching, cannot directly handle all scenarios where users freely enter and depart the network at any time as they are constrained by specific conditions on network parameters. This paper proposes a universally applicable shared-caching scheme for dynamic setups without any restriction on network parameters. The closed-form expressions for the achievable degrees of freedom (DoF) are computed for the resulting generalized scheme, and are shown to achieve the existing optimal bounds of the shared-cache model. Furthermore, a successive-interference-cancellation-free extension based on a fast iterative optimized beamformer design is devised to optimize the use of excess spatial dimensions freed by cache-aided interference cancellation. Extensive numerical experiments are carried out to assess the performance of the proposed scheme. In particular, the results demonstrate that while a dynamic setup may achieve a DoF substantially lower than the optimal DoF of shared caching, our proposed scheme significantly improves the performance at the finite signal-to-noise ratio compared to unicasting, which only benefits from the local caching gain.
We devise a version of Linear Temporal Logic (LTL) on a denotational domain of streams. We investigate this logic in terms of domain theory, (point-free) topology and geometric logic. This yields the first steps toward an extension of the "Domain Theory in Logical Form" paradigm to temporal liveness properties. We show that the negation-free formulae of LTL induce sober subspaces of streams, but that this is in general not the case in presence of negation. We propose a direct, inductive, translation of negation-free LTL to geometric logic. This translation reflects the approximations used to compute the usual fixpoint representations of LTL modalities. As a motivating example, we handle a natural input-output specification for the usual filter function on streams.
Graph Convolutional Networks (GCNs) have been widely applied in various fields due to their significant power on processing graph-structured data. Typical GCN and its variants work under a homophily assumption (i.e., nodes with same class are prone to connect to each other), while ignoring the heterophily which exists in many real-world networks (i.e., nodes with different classes tend to form edges). Existing methods deal with heterophily by mainly aggregating higher-order neighborhoods or combing the immediate representations, which leads to noise and irrelevant information in the result. But these methods did not change the propagation mechanism which works under homophily assumption (that is a fundamental part of GCNs). This makes it difficult to distinguish the representation of nodes from different classes. To address this problem, in this paper we design a novel propagation mechanism, which can automatically change the propagation and aggregation process according to homophily or heterophily between node pairs. To adaptively learn the propagation process, we introduce two measurements of homophily degree between node pairs, which is learned based on topological and attribute information, respectively. Then we incorporate the learnable homophily degree into the graph convolution framework, which is trained in an end-to-end schema, enabling it to go beyond the assumption of homophily. More importantly, we theoretically prove that our model can constrain the similarity of representations between nodes according to their homophily degree. Experiments on seven real-world datasets demonstrate that this new approach outperforms the state-of-the-art methods under heterophily or low homophily, and gains competitive performance under homophily.