In this paper, we investigate the problem of resource allocation for a fluid antenna relay (FAR) system with antenna location optimization. In the considered model, each user transmits information to a base station (BS) with the help of the FAR. The antenna location of the FAR is flexible and can be adapted to the dynamic location distribution of the users. We formulate a sum rate maximization problem that jointly optimizes the antenna location and bandwidth allocation subject to minimum rate requirements, a total bandwidth budget, and a feasible antenna region constraint. To solve this problem, we first derive the optimal bandwidth allocation in closed form. Based on the optimal bandwidth, the original problem is reduced to an antenna location optimization problem, and an alternating algorithm is proposed to solve it. Simulation results verify the effectiveness of the proposed algorithm and show that the sum rate can be increased by up to 125% compared to conventional schemes.
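The abstract only names the alternating structure (a bandwidth step for a fixed antenna position, then a location step for fixed bandwidths). The sketch below illustrates that structure under assumed models: the rate expression, path-loss model, system parameters, and the numerical bandwidth solver are all illustrative placeholders, and the bandwidth step is solved numerically rather than via the paper's closed form.

```python
# Hypothetical sketch of the alternating optimization described in the abstract.
# Rate model, channel model, and all parameters below are assumptions for
# illustration only; the paper's bandwidth step is closed-form, not numerical.
import numpy as np
from scipy.optimize import minimize

rng = np.random.default_rng(0)
K, B_total, P, N0, R_min = 4, 10e6, 0.1, 1e-17, 1e6   # assumed system parameters
users = rng.uniform([0, 5], [50, 30], size=(K, 2))    # user positions (m), assumed
L, h = 50.0, 10.0                                     # feasible antenna segment and height (m)

def gain(a):
    d2 = (users[:, 0] - a) ** 2 + users[:, 1] ** 2 + h ** 2
    return 1e-3 / d2                                  # simple path-loss model (assumption)

def rates(b, a):
    return b * np.log2(1.0 + P * gain(a) / (b * N0))  # per-user rate (bit/s)

def allocate_bandwidth(a):
    """Numerical stand-in for the closed-form bandwidth step."""
    cons = [{"type": "ineq", "fun": lambda b: B_total - b.sum()},
            {"type": "ineq", "fun": lambda b, a=a: rates(b, a) - R_min}]
    res = minimize(lambda b: -rates(b, a).sum(), np.full(K, B_total / K),
                   bounds=[(1e3, B_total)] * K, constraints=cons, method="SLSQP")
    return res.x

a, grid = L / 2, np.linspace(0.0, L, 201)
for _ in range(10):                                    # alternating optimization
    b = allocate_bandwidth(a)                          # bandwidth for fixed antenna location
    a = grid[np.argmax([rates(b, g).sum() for g in grid])]  # location for fixed bandwidths
print(f"antenna position {a:.1f} m, sum rate {rates(b, a).sum() / 1e6:.2f} Mbit/s")
```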
In this paper, we introduce a variation of the group testing problem where each test is specified by an ordered subset of items and returns the first defective item in the specified order. We refer to this as \textit{cascaded group testing}, and the goal is to identify a small set of $K$ defective items amongst a collection of size $N$ using as few tests as possible. For the adaptive testing regime, we show that a simple scheme is able to find all defective items in at most $K$ tests, which is optimal. For the non-adaptive setting, we first derive a necessary and sufficient condition for any collection of tests to be feasible for recovering all the defectives. Using this, we show that any feasible non-adaptive strategy requires $\Omega(K^2)$ tests. In terms of achievability, it is easy to show that a collection of $O(K^2 \log (N/K))$ randomly constructed tests is feasible. We show via carefully constructed explicit designs that one can do significantly better. We provide two simple schemes for $K = 1, 2$ which require only one and two tests, respectively, irrespective of the number of items $N$. Note that this is in contrast to standard binary group testing, where $\Omega(\log N)$ tests are required. The case of $K \ge 3$ is more challenging, and here we devise an iterative design that requires only $\text{poly}(\log \log N)$ tests.
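A minimal simulation of one natural reading of the adaptive scheme: each test takes an ordered list and reports the first defective in that order, so with $K$ known, querying the remaining items and removing each reported defective recovers all defectives in $K$ tests. The non-adaptive designs in the paper are more involved than this sketch.

```python
# Adaptive cascaded group testing: K tests suffice when K is known.
import random

def cascaded_test(order, defectives):
    """Oracle: first defective item in the queried order, or None."""
    return next((item for item in order if item in defectives), None)

def adaptive_recovery(items, defectives, K):
    remaining, found = list(items), set()
    for _ in range(K):                       # exactly K adaptive tests
        hit = cascaded_test(remaining, defectives)
        if hit is None:
            break
        found.add(hit)
        remaining.remove(hit)                # drop the identified defective and repeat
    return found

N, K = 1000, 5
items = list(range(N))
defectives = set(random.sample(items, K))
assert adaptive_recovery(items, defectives, K) == defectives
```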
In anytime-valid sequential inference, it is known that any admissible procedure must be based on e-processes, which are composite generalizations of test martingales that quantify the accumulated evidence against a composite null hypothesis at any arbitrary stopping time. This paper studies methods for combining e-processes constructed using different information sets (filtrations) for the same null. Although e-processes constructed in the same filtration can be combined effortlessly (e.g., by averaging), e-processes constructed in different filtrations cannot, because their validity in a coarser filtration does not translate to validity in a finer filtration. This issue arises in exchangeability tests, independence tests, and tests for comparing forecasts with lags. We first establish that a class of functions called adjusters allows us to lift e-processes from a coarser filtration into any finer filtration. We then introduce a characterization theorem for adjusters, formalizing a sense in which using adjusters is necessary. There are two major implications. First, if we have a powerful e-process in a coarsened filtration, then we readily have a powerful e-process in the original filtration. Second, when we coarsen the filtration to construct an e-process, there is an asymptotically logarithmic cost of recovering anytime-validity in the original filtration.
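As a concrete illustration of the adjuster concept, a standard example is $A(m) = \sqrt{m} - 1$, which satisfies the adjuster condition $\int_1^\infty A(m)\, m^{-2}\, dm \le 1$ and can be applied to the running maximum of an e-process. The Monte Carlo sketch below checks the basic validity property under a simple point null (a fair coin, with the e-process built as a test martingale from repeated bets); the composite-null, filtration-lifting setting of the paper is more general than this toy check.

```python
# Applying the adjuster A(m) = sqrt(m) - 1 to the running maximum of a
# test martingale, and checking E[A(M*_T)] <= 1 by Monte Carlo under the null.
import numpy as np

rng = np.random.default_rng(1)

def adjust(m):
    return np.sqrt(m) - 1.0                  # adjuster: integral_1^inf A(m)/m^2 dm = 1

n_paths, n_steps, lam = 20000, 50, 0.2
x = rng.integers(0, 2, size=(n_paths, n_steps))   # fair-coin data (null holds)
factors = 1.0 + lam * (2 * x - 1)                 # bet a fraction lam on heads each round
martingale = np.cumprod(factors, axis=1)          # test martingale, M_0 = 1
running_max = np.maximum.accumulate(martingale, axis=1)
adjusted = adjust(running_max)

print("E[M_T]     ~", martingale[:, -1].mean())   # close to 1 (martingale property)
print("E[A(M*_T)] ~", adjusted[:, -1].mean())     # at most 1: the adjusted running max is an e-value
```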
In the real world, data is often noisy, affecting not only the quality of features but also the accuracy of labels. Current research on mitigating label errors stems primarily from advances in deep learning, and a gap exists in exploring interpretable models, particularly those rooted in decision trees. In this study, we investigate whether ideas from deep learning loss design can be applied to improve the robustness of decision trees. In particular, we show that loss correction and symmetric losses, both standard approaches, are not effective. We argue that other directions need to be explored to improve the robustness of decision trees to label noise.
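To make the "loss correction" idea concrete in a tree context, the sketch below backward-corrects the noisy class frequencies at a node using a known noise-transition matrix before computing the Gini impurity. This is one plausible instantiation of the idea for illustration, not necessarily the paper's exact construction; the abstract reports that such corrections do not make decision trees robust in practice.

```python
# Backward label correction applied to a node's class distribution before
# computing Gini impurity. T[i, j] = P(noisy label = j | clean label = i).
import numpy as np

def corrected_gini(noisy_labels, T, n_classes):
    counts = np.bincount(noisy_labels, minlength=n_classes).astype(float)
    p_noisy = counts / counts.sum()
    p_clean = np.linalg.solve(T.T, p_noisy)   # backward correction: p_noisy = T^T p_clean
    p_clean = np.clip(p_clean, 0.0, None)
    p_clean /= p_clean.sum()
    return 1.0 - np.sum(p_clean ** 2)

# Symmetric noise with flip probability 0.2 over 3 classes (illustrative).
rho, C = 0.2, 3
T = np.full((C, C), rho / (C - 1)) + np.eye(C) * (1 - rho - rho / (C - 1))
noisy = np.array([0, 0, 0, 1, 2, 0, 0, 1, 0, 2])
print(corrected_gini(noisy, T, C))
```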
Stochastic gradients have been widely integrated into Langevin-based methods to improve their scalability and efficiency in solving large-scale sampling problems. However, the proximal sampler, which exhibits much faster convergence than Langevin-based algorithms in the deterministic setting (Lee et al., 2021), has yet to be explored in its stochastic variants. In this paper, we study Stochastic Proximal Samplers (SPS) for sampling from non-log-concave distributions. We first establish a general framework for implementing stochastic proximal samplers and develop the corresponding convergence theory. We show that convergence to the target distribution can be guaranteed as long as the second moment of the algorithm trajectory is bounded and the restricted Gaussian oracles can be well approximated. We then provide two implementable variants based on stochastic gradient Langevin dynamics (SGLD) and the Metropolis-adjusted Langevin algorithm (MALA), giving rise to SPS-SGLD and SPS-MALA. We further show that SPS-SGLD and SPS-MALA can achieve an $\epsilon$-sampling error in total variation (TV) distance within $\tilde{\mathcal{O}}(d\epsilon^{-2})$ and $\tilde{\mathcal{O}}(d^{1/2}\epsilon^{-2})$ gradient complexities, respectively, which outperform the best-known result by at least an $\tilde{\mathcal{O}}(d^{1/3})$ factor. This improvement is corroborated by our empirical studies on synthetic data of various dimensions, demonstrating the efficiency of the proposed algorithms.
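A sketch of the SPS-SGLD structure suggested by the abstract: an outer Gaussian step $y = x + \sqrt{\eta}\,\xi$, followed by an inner SGLD loop that approximately samples the restricted Gaussian oracle $x \mid y \propto \exp(-f(x) - \|x - y\|^2/(2\eta))$. The double-well target, the noisy-gradient stand-in for minibatch gradients, and all step sizes and iteration counts below are illustrative assumptions, not the paper's tuned choices or guarantees.

```python
# Stochastic proximal sampler with an SGLD-approximated restricted Gaussian oracle.
import numpy as np

rng = np.random.default_rng(0)
d, eta, inner_steps, inner_lr, outer_iters = 2, 0.25, 30, 0.01, 2000

def grad_f(x):                      # non-log-concave double-well potential f(x) = (|x|^2 - 1)^2
    return 4.0 * (x @ x - 1.0) * x

def stoch_grad_f(x):                # stand-in for a minibatch gradient oracle
    return grad_f(x) + rng.normal(scale=0.5, size=d)

def rgo_sgld(y):                    # inner SGLD targeting exp(-f(x) - |x - y|^2 / (2*eta))
    x = y.copy()
    for _ in range(inner_steps):
        g = stoch_grad_f(x) + (x - y) / eta
        x = x - inner_lr * g + np.sqrt(2 * inner_lr) * rng.normal(size=d)
    return x

x, samples = np.zeros(d), []
for _ in range(outer_iters):
    y = x + np.sqrt(eta) * rng.normal(size=d)   # forward (Gaussian) step
    x = rgo_sgld(y)                             # backward step via approximate RGO
    samples.append(x.copy())

samples = np.array(samples)
print("mean radius of samples:", np.linalg.norm(samples, axis=1).mean())  # close to 1 for this target
```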
In this paper, we lay the groundwork for the comparison of phylogenetic networks based on edge contractions and expansions as edit operations, as originally proposed by Robinson and Foulds to compare trees. We prove that these operations connect the space of all phylogenetic networks on the same set of leaves, even if we forbid contractions that create cycles. This allows us to define an operational distance on this space as the minimum number of contractions and expansions required to transform one network into another. We highlight the difference between this distance and the computation of a maximum common contraction between two networks. Given its ability to outline a structure common to both networks, which can provide valuable biological insights, we study the algorithmic aspects of the latter. We first prove that computing a maximum common contraction between two networks is NP-hard, even when the maximum degree, the size of the common contraction, or the number of leaves is bounded. We also provide lower bounds for the problem based on the Exponential Time Hypothesis. Nonetheless, we provide a polynomial-time algorithm for weakly-galled networks, a generalization of galled trees.
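A tiny illustration of the edit operation underlying the proposed distance: contracting an edge merges its two endpoints, and an expansion is the inverse operation, so the distance counts the minimum number of such operations needed to turn one network into the other. Real phylogenetic networks carry leaf labels and edge directions that this toy undirected example ignores.

```python
# Contracting an edge in a small graph with networkx; expansion is the inverse move.
import networkx as nx

G = nx.Graph([("r", "u"), ("r", "v"), ("u", "a"), ("u", "b"), ("v", "b"), ("v", "c")])
H = nx.contracted_edge(G, ("u", "v"), self_loops=False)   # contract the internal edge (u, v)
print(sorted(H.edges()))
```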
In this paper, we investigate the performance of ambient backscatter communication non-orthogonal multiple access (AmBC-NOMA)-assisted short-packet communication for high-mobility vehicle-to-everything transmissions. In the proposed system, a roadside unit (RSU) transmits a superimposed signal to a typical NOMA user pair. Simultaneously, a backscatter device (BD) transmits its own signal towards the user pair by reflecting and modulating the RSU's superimposed signal. Due to the vehicles' mobility, we adopt realistic assumptions of time-selective fading and channel estimation errors. Theoretical expressions for the average block error rates (BLERs) of both users are derived. Furthermore, analysis and insights are provided on the transmit signal-to-noise ratio, the vehicles' mobility, imperfect channel estimation, the reflection efficiency at the BD, and the blocklength. Numerical results validate the theoretical findings and reveal that the AmBC-NOMA system outperforms its orthogonal multiple access counterpart in terms of BLER performance.
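Short-packet BLER analyses of this kind commonly build on the finite-blocklength normal approximation of Polyanskiy, Poor, and Verdú, $\epsilon(\gamma) \approx Q\!\big((n\,C(\gamma) - k)/\sqrt{n\,V(\gamma)}\big)$ with capacity $C = \log_2(1+\gamma)$ and dispersion $V = (1 - (1+\gamma)^{-2})\,(\log_2 e)^2$. The sketch below evaluates only this generic approximation at a fixed SNR; the paper's average-BLER expressions additionally account for NOMA decoding, time-selective fading, channel-estimation errors, and the backscatter link.

```python
# Finite-blocklength (normal approximation) BLER for an n-channel-use, k-bit packet.
import numpy as np
from scipy.stats import norm

def bler(gamma_db, n=128, k=80):
    gamma = 10 ** (gamma_db / 10)
    C = np.log2(1 + gamma)
    V = (1 - (1 + gamma) ** -2) * np.log2(np.e) ** 2
    return norm.sf((n * C - k) / np.sqrt(n * V))   # Q-function

for snr in [-2, 0, 2, 4]:
    print(f"SNR {snr:>2} dB  ->  BLER ~ {bler(snr):.2e}")
```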
We propose a novel approach to the problem of mutual information (MI) estimation by introducing a family of estimators based on normalizing flows. The estimator maps the original data to a target distribution for which MI is easier to estimate. We additionally explore target distributions with known closed-form expressions for MI. Theoretical guarantees are provided to demonstrate that our approach yields MI estimates for the original data. Experiments with high-dimensional data are conducted to highlight the practical advantages of the proposed method.
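The key fact behind such estimators is that MI is invariant under invertible transformations applied separately to $X$ and $Y$. The sketch below uses a per-coordinate quantile ("Gaussianization") transform as a crude stand-in for a learned normalizing flow and then applies the closed-form Gaussian MI, $I = \tfrac{1}{2}\log\big(\det \Sigma_X \det \Sigma_Y / \det \Sigma\big)$, which is exact only if the transformed joint is Gaussian. This illustrates the idea in the abstract, not the paper's actual flow-based estimator.

```python
# Marginal Gaussianization + closed-form Gaussian MI as an illustration of
# flow-based MI estimation (MI is invariant under invertible marginal maps).
import numpy as np
from scipy.stats import norm, rankdata

rng = np.random.default_rng(0)

def gaussianize(Z):                                    # quantile transform per column
    U = np.apply_along_axis(rankdata, 0, Z) / (len(Z) + 1)
    return norm.ppf(U)

def gaussian_mi(X, Y):
    Zx, Zy = gaussianize(X), gaussianize(Y)
    S = np.cov(np.hstack([Zx, Zy]).T)
    dx = Zx.shape[1]
    Sx, Sy = S[:dx, :dx], S[dx:, dx:]
    return 0.5 * np.log(np.linalg.det(Sx) * np.linalg.det(Sy) / np.linalg.det(S))

# Correlated Gaussian pair with known ground truth MI = -0.5 * log(1 - rho^2).
rho, n = 0.8, 20000
X = rng.normal(size=(n, 1))
Y = rho * X + np.sqrt(1 - rho ** 2) * rng.normal(size=(n, 1))
print("estimate:", gaussian_mi(X, Y), " ground truth:", -0.5 * np.log(1 - rho ** 2))
```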
Change point detection (CPD) and anomaly detection (AD) are essential techniques in various fields for identifying abrupt changes or abnormal data instances. However, existing methods are often constrained to univariate data, face scalability challenges with large datasets due to computational demands, and experience reduced performance with high-dimensional or intricate data as well as with hidden anomalies. Furthermore, they often lack interpretability and adaptability to domain-specific knowledge, which limits their versatility across different fields. In this work, we propose a deep learning-based CPD/AD method called Probabilistic Predictive Coding (PPC) that jointly learns to encode sequential data into low-dimensional latent representations and to predict the subsequent data representations as well as the corresponding prediction uncertainties. The model parameters are optimized via maximum likelihood estimation by comparing these predictions with the true encodings. At application time, the true and predicted encodings are used to determine the probability of conformity, an interpretable and meaningful anomaly score. Furthermore, our approach has linear time complexity, which avoids scalability issues, and it can easily be adjusted to a wide range of data types and intricate applications. We demonstrate the effectiveness and adaptability of our proposed method on synthetic time series experiments, image data, and real-world magnetic resonance spectroscopic imaging data.
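A minimal PyTorch sketch of the ingredients named in the abstract: an encoder to a low-dimensional latent space, a predictor outputting the mean and variance of the next latent code, Gaussian negative log-likelihood training (maximum likelihood), and an anomaly score derived from the predictive distribution. The architectures, window handling, and the exact definition of the "probability of conformity" are assumptions for illustration, not the paper's design; the Gaussian tail probability below is one plausible instantiation.

```python
# Encoder + probabilistic next-step predictor trained by Gaussian NLL (MLE),
# with a conformity-style anomaly score at application time.
import torch
import torch.nn as nn

d_in, d_latent = 8, 3
encoder = nn.Sequential(nn.Linear(d_in, 32), nn.ReLU(), nn.Linear(32, d_latent))
predictor = nn.Sequential(nn.Linear(d_latent, 32), nn.ReLU(), nn.Linear(32, 2 * d_latent))
opt = torch.optim.Adam(list(encoder.parameters()) + list(predictor.parameters()), lr=1e-3)

def predictive(z_prev):
    mean, log_var = predictor(z_prev).chunk(2, dim=-1)   # predicted mean and uncertainty
    return mean, log_var

x = torch.randn(500, d_in)                     # toy sequence of feature vectors
for _ in range(200):                           # maximum likelihood training via Gaussian NLL
    z = encoder(x)
    mean, log_var = predictive(z[:-1])
    nll = 0.5 * (log_var + (z[1:] - mean) ** 2 / log_var.exp()).sum(-1).mean()
    opt.zero_grad(); nll.backward(); opt.step()

def conformity(x_prev, x_next):
    """Two-sided Gaussian tail probability of the observed next encoding under the
    predictive distribution (one plausible 'probability of conformity')."""
    with torch.no_grad():
        mean, log_var = predictive(encoder(x_prev))
        zscore = (encoder(x_next) - mean).abs() / log_var.exp().sqrt()
        normal = torch.distributions.Normal(0.0, 1.0)
        return (2 * (1 - normal.cdf(zscore))).prod(-1)   # low value = likely change point / anomaly

print(conformity(x[:-1], x[1:])[:5])
```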
Graph learning architectures based on the k-dimensional Weisfeiler-Leman (k-WL) hierarchy offer a theoretically well-understood expressive power. However, such architectures often fail to deliver solid predictive performance on real-world tasks, limiting their practical impact. In contrast, global attention-based models such as graph transformers demonstrate strong performance in practice, but comparing their expressive power with the k-WL hierarchy remains challenging, particularly since these architectures rely on positional or structural encodings for their expressivity and predictive performance. To address this, we show that the recently proposed Edge Transformer, a global attention model operating on node pairs instead of nodes, has at least 3-WL expressive power. Empirically, we demonstrate that the Edge Transformer surpasses other theoretically aligned architectures in predictive performance while not relying on positional or structural encodings. Our code is available at https://github.com/luis-mueller/towards-principled-gts.
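A simplified single-head sketch of the triangular attention at the core of the Edge Transformer: the state lives on node pairs, $X \in \mathbb{R}^{n \times n \times d}$, and pair $(i, j)$ attends over intermediate nodes $l$, combining $X[i, l]$ and $X[l, j]$. The exact published formulation (multiple heads, output projections, feed-forward blocks, residuals, and normalization) differs in its details; this is only meant to convey the node-pair attention pattern.

```python
# Single-head triangular attention over node-pair states (simplified sketch).
import torch

def triangular_attention(X, Wq, Wk, Wv1, Wv2):
    n, _, d = X.shape
    Q, K = X @ Wq, X @ Wk
    scores = torch.einsum("ild,ljd->ijl", Q, K) / d ** 0.5   # compatibility of (i,l) with (l,j)
    A = scores.softmax(dim=-1)                               # attention over intermediate node l
    V = torch.einsum("ild,ljd->iljd", X @ Wv1, X @ Wv2)      # combine X[i,l] and X[l,j]
    return torch.einsum("ijl,iljd->ijd", A, V)               # new state for pair (i, j)

n, d = 6, 16
X = torch.randn(n, n, d)                                     # initial node-pair states
W = [torch.randn(d, d) / d ** 0.5 for _ in range(4)]
print(triangular_attention(X, *W).shape)                     # torch.Size([6, 6, 16])
```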
The dominant sequence transduction models are based on complex recurrent or convolutional neural networks in an encoder-decoder configuration. The best performing models also connect the encoder and decoder through an attention mechanism. We propose a new simple network architecture, the Transformer, based solely on attention mechanisms, dispensing with recurrence and convolutions entirely. Experiments on two machine translation tasks show these models to be superior in quality while being more parallelizable and requiring significantly less time to train. Our model achieves 28.4 BLEU on the WMT 2014 English-to-German translation task, improving over the existing best results, including ensembles, by over 2 BLEU. On the WMT 2014 English-to-French translation task, our model establishes a new single-model state-of-the-art BLEU score of 41.8 after training for 3.5 days on eight GPUs, a small fraction of the training costs of the best models from the literature. We show that the Transformer generalizes well to other tasks by applying it successfully to English constituency parsing both with large and limited training data.
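The core operation the abstract refers to is scaled dot-product attention, $\mathrm{Attention}(Q, K, V) = \mathrm{softmax}(QK^\top / \sqrt{d_k})\,V$. Below is a compact single-head sketch without masking, learned projections, or multi-head splitting; the full model stacks multi-head attention with feed-forward layers, residual connections, and layer normalization.

```python
# Scaled dot-product attention: softmax(Q K^T / sqrt(d_k)) V.
import numpy as np

def scaled_dot_product_attention(Q, K, V):
    d_k = Q.shape[-1]
    scores = Q @ K.T / np.sqrt(d_k)                        # pairwise query-key similarities
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)         # row-wise softmax
    return weights @ V                                     # attention-weighted sum of values

rng = np.random.default_rng(0)
Q, K, V = (rng.normal(size=(5, 64)) for _ in range(3))
print(scaled_dot_product_attention(Q, K, V).shape)         # (5, 64)
```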