In this paper, we introduce an improved approach to speculative decoding aimed at enhancing the efficiency of serving large language models. Our method capitalizes on the strengths of two established techniques: the classic two-model speculative decoding approach and the more recent single-model approach, Medusa. Drawing inspiration from Medusa, our approach adopts a single-model strategy for speculative decoding. However, our method distinguishes itself by employing a single, lightweight draft head with a recurrent dependency design, akin in essence to the small draft model used in classic speculative decoding, but without the complexities of the full transformer architecture. Because of the recurrent dependency, we can use beam search to swiftly filter out undesired candidates with the draft head. The outcome is a method that retains the simplicity of a single-model design and avoids the need to create a data-dependent tree attention structure solely for inference, as Medusa does. We empirically demonstrate the effectiveness of the proposed method on several popular open-source language models, along with a comprehensive analysis of the trade-offs involved in adopting this approach.
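The draft-then-verify loop common to speculative decoding can be sketched as follows. This is a minimal, generic greedy illustration and not the method proposed in the abstract: it has no recurrent draft head and no beam search, and the names `speculative_step`, `draft_next`, and `target_next` are hypothetical stand-ins for real models. A real system would typically score all drafted positions in a single forward pass of the target model rather than one call per position.

```python
from typing import Callable, List

# Hypothetical interface: a "model" maps a token context to its greedy next token.
NextToken = Callable[[List[int]], int]


def speculative_step(
    prefix: List[int],
    draft_next: NextToken,
    target_next: NextToken,
    k: int,
) -> List[int]:
    """Return the tokens emitted by one greedy draft-then-verify step."""
    # 1) The cheap draft model proposes k tokens autoregressively.
    proposal: List[int] = []
    ctx = list(prefix)
    for _ in range(k):
        token = draft_next(ctx)
        proposal.append(token)
        ctx.append(token)

    # 2) The target model checks each proposed token in order.
    accepted: List[int] = []
    ctx = list(prefix)
    for token in proposal:
        expected = target_next(ctx)
        if token != expected:
            # First mismatch: keep the target's token and discard the rest.
            accepted.append(expected)
            return accepted
        accepted.append(token)
        ctx.append(token)

    # 3) Every draft token matched, so the target adds one extra token.
    accepted.append(target_next(ctx))
    return accepted
```

Each step emits at least one token chosen by the target model, so the greedy output matches what the target model alone would produce; the draft only affects how many tokens are emitted per step.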
In this paper, we prove that with high probability, random Reed--Solomon codes approach the half-Singleton bound (the optimal rate versus error tradeoff for linear insdel codes) with linear-sized alphabets. More precisely, we prove that, for any $\varepsilon>0$ and positive integers $n$ and $k$, with high probability, random Reed--Solomon codes of length $n$ and dimension $k$ can correct $(1-\varepsilon)n-2k+1$ adversarial insdel errors over alphabets of size $n+2^{\mathsf{poly}(1/\varepsilon)}k$. This significantly improves upon the alphabet size demonstrated in the work of Con, Shpilka, and Tamo (IEEE TIT, 2023), who showed the existence of Reed--Solomon codes with exponential alphabet size $\widetilde O\left(\binom{n}{2k-1}^2\right)$ precisely achieving the half-Singleton bound. Our methods are inspired by recent works on list-decoding Reed--Solomon codes. Brakensiek-Gopi-Makam (STOC 2023) showed that random Reed--Solomon codes are list-decodable up to capacity with exponential-sized alphabets, and Guo-Zhang (FOCS 2023) and Alrabiah-Guruswami-Li (STOC 2024) reduced the alphabet size to linear. We achieve a similar alphabet-size reduction by establishing strong bounds on the probability that certain random rectangular matrices are full rank. To accomplish this in our insdel context, our proof combines the random matrix techniques from list-decoding with structural properties of Longest Common Subsequences.
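As a purely illustrative instance of the error count in the preceding abstract, take $n=100$, $k=10$, and $\varepsilon=0.1$ (values chosen here for arithmetic convenience, not taken from the paper). The second quantity is the same expression with $\varepsilon$ set to $0$:

```latex
\[
(1-\varepsilon)n - 2k + 1 = 0.9 \cdot 100 - 2 \cdot 10 + 1 = 71,
\qquad
n - 2k + 1 = 100 - 2 \cdot 10 + 1 = 81 .
\]
```

The gap between the two counts is $\varepsilon n = 10$ errors. The alphabet size $n+2^{\mathsf{poly}(1/\varepsilon)}k$ cannot be evaluated numerically from the abstract alone, since the polynomial is not specified.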
In this paper, we first introduce the theoretical basis of the privacy-utility equilibrium in federated learning, based on Bayesian privacy definitions and total variation distance privacy definitions. We then present the \textit{Learn-to-Distort-Data} framework, which provides a principled approach to navigating the privacy-utility equilibrium by explicitly modeling the distortion introduced by the privacy-preserving mechanism as a learnable variable and optimizing it jointly with the model parameters. We demonstrate the applicability of our framework to a variety of privacy-preserving mechanisms based on data distortion and highlight its connections to related areas such as adversarial training, input robustness, and unlearnable examples. These connections make it possible to leverage techniques from these areas to design effective algorithms for the privacy-utility equilibrium in federated learning under the \textit{Learn-to-Distort-Data} framework.
In this work, we propose a method to improve the energy efficiency and fairness of simultaneously transmitting and reflecting reconfigurable intelligent surfaces (STAR-RIS) for mobile users, ensuring reduced power consumption while maintaining reliable communication. To achieve this, we introduce a new parameter known as the subsurface assignment variable, which determines the number of STAR-RIS elements allocated to each user. We then formulate a novel optimization problem by concurrently optimizing the phase shifts of the STAR-RIS and subsurface assignment variable. We leverage the deep reinforcement learning (DRL) technique to address this optimization problem. The DRL model predicts the phase shifts of the STAR-RIS and efficiently allocates elements of STAR-RIS to the users. Additionally, we incorporate a penalty term in the DRL model to facilitate intelligent deactivation of STAR-RIS elements when not in use to enhance energy efficiency. Through extensive experiments, we show that the proposed method can achieve fairly high and nearly equal data rates for all users in both the transmission and reflection spaces in an energy-efficient manner.
In this paper, we present a study of a Federated Learning (FL) system based on decentralized architectures to ensure trust and increase reliability. In this system, the FL collaborators upload the (ciphered) model parameters to the Inter-Planetary File System (IPFS) and interact with a dedicated smart contract that tracks their behavior. Thanks to this smart contract, the phases of parameter updates are managed efficiently, thereby strengthening data security. We have carried out an experimental study that exploits two different methods of weight aggregation, i.e., a classic averaging scheme and a federated proximal aggregation. The results confirm the feasibility of the proposal.
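The abstract does not spell out the two aggregation schemes. Assuming they follow the commonly described FedAvg and FedProx formulations, a minimal sketch might look like the following. The function names `fed_avg` and `prox_local_step` and the flat list-of-floats parameter layout are illustrative assumptions; IPFS storage, encryption, and the smart contract are not modeled.

```python
from typing import List, Sequence


def fed_avg(
    client_weights: Sequence[Sequence[float]],
    client_sizes: Sequence[int],
) -> List[float]:
    """Average client parameter vectors, weighted by local dataset size."""
    total = sum(client_sizes)
    dim = len(client_weights[0])
    return [
        sum(w[i] * n for w, n in zip(client_weights, client_sizes)) / total
        for i in range(dim)
    ]


def prox_local_step(
    w: Sequence[float],
    w_global: Sequence[float],
    grad: Sequence[float],
    lr: float,
    mu: float,
) -> List[float]:
    """One local gradient step with a FedProx-style proximal term.

    The local loss is assumed to be augmented with (mu / 2) * ||w - w_global||^2,
    whose gradient mu * (w - w_global) pulls the client toward the global model.
    """
    return [
        wi - lr * (gi + mu * (wi - gi_global))
        for wi, gi_global, gi in zip(w, w_global, grad)
    ]
```

In this formulation the proximal term changes only the local client update; the server-side combination step is still the weighted average computed by `fed_avg`.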
In this paper, we propose a novel transmissive reconfigurable intelligent surface transceiver-enhanced robust and secure integrated sensing and communication network. A time-division sensing communication mechanism is designed for the scenario, which enables communication and sensing to share wireless resources. To address the interference management problem and hinder eavesdropping, we implement rate-splitting multiple access (RSMA), where the common stream is designed to serve as both a useful signal and artificial noise, while taking into account the imperfect channel state information, modeling the channel for the illegal users in a fine-grained manner, and giving an upper bound on the error. We introduce the secrecy outage probability and construct an optimization problem with the secrecy sum-rate as the objective function to optimize the common stream beamforming matrix, the private stream beamforming matrix, and the timeslot duration variable. Due to the coupling of the optimization variables and the infinite size of the error set, the proposed problem is a nonconvex optimization problem that cannot be solved directly. To address these challenges, a block coordinate descent-based second-order cone programming algorithm is used to decouple the optimization variables and solve the problem. Specifically, the problem is decoupled into two subproblems concerning the common stream beamforming matrix, the private stream beamforming matrix, and the timeslot duration variable, which are solved by alternating optimization until convergence is reached. To handle the objective function and the non-convex constraints, the S-procedure, Bernstein's inequality, and successive convex approximation are employed. Numerical simulation results verify the superiority of the proposed scheme in improving the secrecy energy efficiency and the Cram\'{e}r-Rao bound.
In this work, we propose a distributed hierarchical locomotion control strategy for whole-body cooperation and demonstrate the potential for migration into large numbers of agents. Our method utilizes a hierarchical structure to break down complex tasks into smaller, manageable sub-tasks. By incorporating spatiotemporal continuity features, we establish the sequential logic necessary for causal inference and cooperative behaviour in sequential tasks, thereby facilitating efficient and coordinated control strategies. Through training within this framework, we demonstrate enhanced adaptability and cooperation, leading to superior performance in task completion compared to the original methods. Moreover, we construct a set of environments as the benchmark for embodied cooperation.
In this paper, we describe a new hybrid algorithm for computing all singular triplets above a given threshold and provide its implementation in MATLAB/Octave and R. The high performance of our codes and the ease with which they can be used, either independently or within a larger numerical scheme, are illustrated through several numerical examples with applications to matrix completion and image compression. Well-documented MATLAB and R codes are provided for public use.
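To make the task concrete, the sketch below collects every singular triplet $(\sigma, u, v)$ with $\sigma$ above a threshold. It is not the hybrid algorithm of the abstract: it swaps in plain power iteration on $A^{T}A$ with deflation, uses a fixed start vector, and assumes a non-negative threshold and a small dense matrix stored as nested lists. The name `singular_triplets_above` is hypothetical.

```python
import math
from typing import List, Tuple

Matrix = List[List[float]]
Vector = List[float]


def _dot(x: Vector, y: Vector) -> float:
    return sum(a * b for a, b in zip(x, y))


def _matvec(A: Matrix, x: Vector) -> Vector:
    return [_dot(row, x) for row in A]


def _rmatvec(A: Matrix, y: Vector) -> Vector:
    # Computes A^T y without forming the transpose.
    return [sum(A[i][j] * y[i] for i in range(len(A))) for j in range(len(A[0]))]


def singular_triplets_above(
    A: Matrix, threshold: float, iters: int = 500
) -> List[Tuple[float, Vector, Vector]]:
    """Return (sigma, u, v) triplets with sigma > threshold, largest first."""
    n = len(A[0])
    found: List[Tuple[float, Vector, Vector]] = []
    for _ in range(min(len(A), n)):
        v = [1.0 / (j + 1) for j in range(n)]  # fixed, deterministic start
        for _ in range(iters):
            # Deflation: remove components along right vectors already found.
            for _, _, vk in found:
                c = _dot(v, vk)
                v = [vi - c * vki for vi, vki in zip(v, vk)]
            w = _rmatvec(A, _matvec(A, v))  # one power step on A^T A
            nw = math.sqrt(_dot(w, w))
            if nw == 0.0:
                break
            v = [wi / nw for wi in w]
        av = _matvec(A, v)
        sigma = math.sqrt(_dot(av, av))
        if sigma <= threshold:
            break  # remaining singular values are assumed to be no larger
        found.append((sigma, [x / sigma for x in av], v))
    return found
```

Power iteration converges slowly when singular values are close together, which is one reason practical codes rely on more sophisticated methods than this sketch.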
In this paper, we tackle two challenges in multimodal learning for visual recognition: 1) modalities may be missing during training or testing in real-world situations; and 2) computation resources may not be available to fine-tune heavy transformer models. To this end, we propose to utilize prompt learning to mitigate the above two challenges together. Specifically, our modality-missing-aware prompts can be plugged into multimodal transformers to handle general missing-modality cases, while requiring less than 1% of the learnable parameters needed to train the entire model. We further explore the effect of different prompt configurations and analyze the robustness to missing modalities. Extensive experiments show that our prompt learning framework improves performance under various missing-modality cases while alleviating the need for heavy model re-training. Code is available.
In this paper, we propose applying a meta-learning approach to low-resource automatic speech recognition (ASR). We formulate ASR for different languages as different tasks and meta-learn the initialization parameters from many pretraining languages to achieve fast adaptation to an unseen target language, via the recently proposed model-agnostic meta-learning algorithm (MAML). We evaluate the proposed approach using six languages as pretraining tasks and four languages as target tasks. Preliminary results show that the proposed method, MetaASR, significantly outperforms the state-of-the-art multitask pretraining approach on all target languages with different combinations of pretraining languages. In addition, owing to MAML's model-agnostic property, this paper also opens a new research direction of applying meta-learning to more speech-related applications.
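The idea of meta-learning an initialization can be illustrated on a toy problem. The sketch below is not MetaASR: each "task" is an assumed one-dimensional quadratic loss $L_c(\theta)=\tfrac{1}{2}(\theta-c)^2$ standing in for one language, and `maml_quadratic`, `alpha`, and `beta` are hypothetical names for the routine, the inner learning rate, and the meta learning rate.

```python
from typing import Sequence


def maml_quadratic(
    centers: Sequence[float],
    alpha: float = 0.1,
    beta: float = 0.5,
    steps: int = 200,
    theta: float = 0.0,
) -> float:
    """Meta-learn an initialization theta for tasks L_c(t) = 0.5 * (t - c)^2."""
    for _ in range(steps):
        meta_grad = 0.0
        for c in centers:
            # Inner loop: one gradient step of task-specific adaptation.
            adapted = theta - alpha * (theta - c)
            # Outer gradient: differentiate the post-adaptation loss with
            # respect to theta, including d(adapted)/d(theta) = 1 - alpha.
            meta_grad += (adapted - c) * (1.0 - alpha)
        # Meta update: move the shared initialization.
        theta -= beta * meta_grad / len(centers)
    return theta
```

For these symmetric quadratic tasks the learned initialization settles at the mean of the task optima, the point from which a single adaptation step helps all tasks equally.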
In this paper, we introduce the Reinforced Mnemonic Reader for machine reading comprehension tasks, which enhances previous attentive readers in two aspects. First, a reattention mechanism is proposed to refine current attentions by directly accessing past attentions that are temporally memorized in a multi-round alignment architecture, so as to avoid the problems of attention redundancy and attention deficiency. Second, a new optimization approach, called dynamic-critical reinforcement learning, is introduced to extend the standard supervised method. It always encourages the model to predict a more acceptable answer, so as to address the convergence suppression problem that occurs in traditional reinforcement learning algorithms. Extensive experiments on the Stanford Question Answering Dataset (SQuAD) show that our model achieves state-of-the-art results. Meanwhile, our model outperforms previous systems by over 6% in terms of both Exact Match and F1 metrics on two adversarial SQuAD datasets.