The uplink sum-throughput of distributed massive multiple-input-multiple-output (mMIMO) networks depends majorly on Access point (AP)-User Equipment (UE) association and power control. The AP-UE association and power control both are important problems in their own right in distributed mMIMO networks to improve scalability and reduce front-haul load of the network, and to enhance the system performance by mitigating the interference and boosting the desired signals, respectively. Unlike previous studies, which focused primarily on addressing these two problems separately, this work addresses the uplink sum-throughput maximization problem in distributed mMIMO networks by solving the joint AP-UE association and power control problem, while maintaining Quality-of-Service (QoS) requirements for each UE. To improve scalability, we present an l1-penalty function that delicately balances the trade-off between spectral efficiency (SE) and front-haul signaling load. Our proposed methodology leverages fractional programming, Lagrangian dual formation, and penalty functions to provide an elegant and effective iterative solution with guaranteed convergence. Extensive numerical simulations validate the efficacy of the proposed technique for maximizing sum-throughput while considering the joint AP-UE association and power control problem, demonstrating its superiority over approaches that address these problems individually. Furthermore, the results show that the introduced penalty function can help us effectively control the maximum front-haul load.
Achieving high-quality wireless interactive Extended Reality (XR) will require multi-gigabit throughput at extremely low latency. The Millimeter-Wave (mmWave) frequency bands, between 24 and 300GHz, can achieve such extreme performance. However, maintaining a consistently high Quality of Experience with highly mobile users is challenging, as mmWave communications are inherently directional. In this work, we present and evaluate an end-to-end approach to such a mmWave-based mobile XR system. We perform a highly realistic simulation of the system, incorporating accurate XR data traffic, detailed mmWave propagation models and actual user motion. We evaluate the impact of the beamforming strategy and frequency on the overall performance. In addition, we provide the first system-level evaluation of the CoVRage algorithm, a proactive and spatially aware user-side beamforming approach designed specifically for highly mobile XR environments.
We study channel estimation for a beyond diagonal reconfigurable intelligent surface (BD-RIS) aided multiple input single output system. We first describe the channel estimation strategy based on the least square (LS) method, derive the mean square error (MSE) of the LS estimator, and formulate the BD-RIS design problem that minimizes the estimation MSE with unique constraints induced by group-connected architectures of BD-RIS. Then, we propose an efficient BD-RIS design which theoretically guarantees to achieve the MSE lower bound. Finally, we provide simulation results to verify the effectiveness of the proposed channel estimation scheme.
This work introduces Neural Elevations Models (NEMos), which adapt Neural Radiance Fields to a 2.5D continuous and differentiable terrain model. In contrast to traditional terrain representations such as digital elevation models, NEMos can be readily generated from imagery, a low-cost data source, and provide a lightweight representation of terrain through an implicit continuous and differentiable height field. We propose a novel method for jointly training a height field and radiance field within a NeRF framework, leveraging quantile regression. Additionally, we introduce a path planning algorithm that performs gradient-based optimization of a continuous cost function for minimizing distance, slope changes, and control effort, enabled by differentiability of the height field. We perform experiments on simulated and real-world terrain imagery, demonstrating NEMos ability to generate high-quality reconstructions and produce smoother paths compared to discrete path planning methods. Future work will explore the incorporation of features and semantics into the height field, creating a generalized terrain model.
Traditional reinforcement learning from human feedback (RLHF) approaches relying on parametric models like the Bradley-Terry model fall short in capturing the intransitivity and irrationality in human preferences. Recent advancements suggest that directly working with preference probabilities can yield a more accurate reflection of human preferences, enabling more flexible and accurate language model alignment. In this paper, we propose a self-play-based method for language model alignment, which treats the problem as a constant-sum two-player game aimed at identifying the Nash equilibrium policy. Our approach, dubbed \textit{Self-play Probabilistic Preference Optimization} (SPPO), approximates the Nash equilibrium through iterative policy updates and enjoys a theoretical convergence guarantee. Our method can effectively increase the log-likelihood of the chosen response and decrease that of the rejected response, which cannot be trivially achieved by symmetric pairwise loss such as Direct Preference Optimization (DPO) and Identity Preference Optimization (IPO). In our experiments, using only 60k prompts (without responses) from the UltraFeedback dataset and without any prompt augmentation, by leveraging a pre-trained preference model PairRM with only 0.4B parameters, SPPO can obtain a model from fine-tuning Mistral-7B-Instruct-v0.2 that achieves the state-of-the-art length-controlled win-rate of 28.53\% against GPT-4-Turbo on AlpacaEval 2.0. It also outperforms the (iterative) DPO and IPO on MT-Bench and the Open LLM Leaderboard. Notably, the strong performance of SPPO is achieved without additional external supervision (e.g., responses, preferences, etc.) from GPT-4 or other stronger language models.
We present EGN, a stochastic second-order optimization algorithm that combines the generalized Gauss-Newton (GN) Hessian approximation with low-rank linear algebra to compute the descent direction. Leveraging the Duncan-Guttman matrix identity, the parameter update is obtained by factorizing a matrix which has the size of the mini-batch. This is particularly advantageous for large-scale machine learning problems where the dimension of the neural network parameter vector is several orders of magnitude larger than the batch size. Additionally, we show how improvements such as line search, adaptive regularization, and momentum can be seamlessly added to EGN to further accelerate the algorithm. Moreover, under mild assumptions, we prove that our algorithm converges to an $\epsilon$-stationary point at a linear rate. Finally, our numerical experiments demonstrate that EGN consistently exceeds, or at most matches the generalization performance of well-tuned SGD, Adam, and SGN optimizers across various supervised and reinforcement learning tasks.
A major open problem in proof complexity is to show that random 3-CNFs with linear number of clauses require super-polynomial size refutations in bounded depth Frege. We make a first step towards this question by showing a super-linear lower bound: for every $k$, there exists $\epsilon > 0$ such that any depth-$k$ Frege refutation of a random $n$-variable 3-CNF with $\Theta(n)$ clauses has $\Omega(n^{1 + \epsilon})$ steps w.h.p. Our proof involves a novel adaptation of the deterministic restriction technique of Chaudhuri and Radhakrishnan (STOC'96).
High-level synthesis, source-to-source compilers, and various Design Space Exploration techniques for pragma insertion have significantly improved the Quality of Results of generated designs. These tools offer benefits such as reduced development time and enhanced performance. However, achieving high-quality results often requires additional manual code transformations and tiling selections, which are typically performed separately or as pre-processing steps. Although DSE techniques enable code transformation upfront, the vastness of the search space often limits the exploration of all possible code transformations, making it challenging to determine which transformations are necessary. Additionally, ensuring correctness remains challenging, especially for complex transformations and optimizations. To tackle this obstacle, we first propose a comprehensive framework leveraging HLS compilers. Our system streamlines code transformation, pragma insertion, and tiles size selection for on-chip data caching through a unified optimization problem, aiming to enhance parallelization, particularly beneficial for computation-bound kernels. Them employing a novel Non-Linear Programming (NLP) approach, we simultaneously ascertain transformations, pragmas, and tile sizes, focusing on regular loop-based kernels. Our evaluation demonstrates that our framework adeptly identifies the appropriate transformations, including scenarios where no transformation is necessary, and inserts pragmas to achieve a favorable Quality of Results.
Addressing the statistical challenge of computing the multivariate normal (MVN) probability in high dimensions holds significant potential for enhancing various applications. One common way to compute high-dimensional MVN probabilities is the Separation-of-Variables (SOV) algorithm. This algorithm is known for its high computational complexity of O(n^3) and space complexity of O(n^2), mainly due to a Cholesky factorization operation for an n X n covariance matrix, where $n$ represents the dimensionality of the MVN problem. This work proposes a high-performance computing framework that allows scaling the SOV algorithm and, subsequently, the confidence region detection algorithm. The framework leverages parallel linear algebra algorithms with a task-based programming model to achieve performance scalability in computing process probabilities, especially on large-scale systems. In addition, we enhance our implementation by incorporating Tile Low-Rank (TLR) approximation techniques to reduce algorithmic complexity without compromising the necessary accuracy. To evaluate the performance and accuracy of our framework, we conduct assessments using simulated data and a wind speed dataset. Our proposed implementation effectively handles high-dimensional multivariate normal (MVN) probability computations on shared and distributed-memory systems using finite precision arithmetics and TLR approximation computation. Performance results show a significant speedup of up to 20X in solving the MVN problem using TLR approximation compared to the reference dense solution without sacrificing the application's accuracy. The qualitative results on synthetic and real datasets demonstrate how we maintain high accuracy in detecting confidence regions even when relying on TLR approximation to perform the underlying linear algebra operations.
Retrieval-Augmented Generation (RAG) merges retrieval methods with deep learning advancements to address the static limitations of large language models (LLMs) by enabling the dynamic integration of up-to-date external information. This methodology, focusing primarily on the text domain, provides a cost-effective solution to the generation of plausible but incorrect responses by LLMs, thereby enhancing the accuracy and reliability of their outputs through the use of real-world data. As RAG grows in complexity and incorporates multiple concepts that can influence its performance, this paper organizes the RAG paradigm into four categories: pre-retrieval, retrieval, post-retrieval, and generation, offering a detailed perspective from the retrieval viewpoint. It outlines RAG's evolution and discusses the field's progression through the analysis of significant studies. Additionally, the paper introduces evaluation methods for RAG, addressing the challenges faced and proposing future research directions. By offering an organized framework and categorization, the study aims to consolidate existing research on RAG, clarify its technological underpinnings, and highlight its potential to broaden the adaptability and applications of LLMs.
We propose a novel single shot object detection network named Detection with Enriched Semantics (DES). Our motivation is to enrich the semantics of object detection features within a typical deep detector, by a semantic segmentation branch and a global activation module. The segmentation branch is supervised by weak segmentation ground-truth, i.e., no extra annotation is required. In conjunction with that, we employ a global activation module which learns relationship between channels and object classes in a self-supervised manner. Comprehensive experimental results on both PASCAL VOC and MS COCO detection datasets demonstrate the effectiveness of the proposed method. In particular, with a VGG16 based DES, we achieve an mAP of 81.7 on VOC2007 test and an mAP of 32.8 on COCO test-dev with an inference speed of 31.5 milliseconds per image on a Titan Xp GPU. With a lower resolution version, we achieve an mAP of 79.7 on VOC2007 with an inference speed of 13.0 milliseconds per image.