In this paper, we propose the use of self-supervised pretraining on a large unlabelled data set to improve the performance of a personalized voice activity detection (VAD) model in adverse conditions. We pretrain a long short-term memory (LSTM) encoder using the autoregressive predictive coding (APC) framework and fine-tune it for personalized VAD. We also propose a denoising variant of APC, with the goal of improving the robustness of personalized VAD. The trained models are systematically evaluated on both clean speech and speech contaminated by various types of noise at different SNR levels, and compared to a purely supervised model. Our experiments show that self-supervised pretraining not only improves performance in clean conditions, but also yields models which are more robust to adverse conditions compared to purely supervised learning.
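For concreteness, here is a minimal PyTorch sketch of the APC-style pretraining objective described above: an LSTM encoder predicts a feature frame a few steps into the future, and the denoising variant is obtained by feeding (possibly) noisy features while regressing onto the clean future frame. The feature dimension, network sizes, and prediction shift are illustrative placeholders, not the paper's exact configuration.

```python
import torch
import torch.nn as nn

class APCEncoder(nn.Module):
    """LSTM encoder with a linear head that predicts a frame `shift` steps ahead."""
    def __init__(self, feat_dim=80, hidden_dim=512, num_layers=3):
        super().__init__()
        self.lstm = nn.LSTM(feat_dim, hidden_dim, num_layers, batch_first=True)
        self.proj = nn.Linear(hidden_dim, feat_dim)

    def forward(self, x):
        h, _ = self.lstm(x)          # (batch, time, hidden_dim)
        return self.proj(h), h       # predicted frames and encoder states

def apc_loss(model, noisy, clean, shift=3):
    # Denoising variant: encode the (possibly noisy) input and predict the
    # *clean* frame `shift` steps into the future with an L1 regression loss.
    pred, _ = model(noisy)
    return nn.functional.l1_loss(pred[:, :-shift], clean[:, shift:])
```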
In this paper we reassess the parameterized complexity and approximability of the well-studied Steiner Forest problem in several graph classes of bounded width. The problem takes an edge-weighted graph and pairs of vertices as input, and the aim is to find a minimum cost subgraph in which each given vertex pair lies in the same connected component. It is known that this problem is APX-hard in general, and NP-hard on graphs of treewidth 3, treedepth 4, and feedback vertex set size 2. However, Bateni, Hajiaghayi and Marx [JACM, 2011] gave an approximation scheme with a runtime of $n^{O(\frac{k^2}{\varepsilon})}$ on graphs of treewidth $k$. Our main result is a much faster efficient parameterized approximation scheme (EPAS) with a runtime of $2^{O(\frac{k^2}{\varepsilon} \log \frac{k^2}{\varepsilon})} \cdot n^{O(1)}$. If $k$ instead is the vertex cover number of the input graph, we show how to compute the optimum solution in $2^{O(k \log k)} \cdot n^{O(1)}$ time, and we also prove that this runtime dependence on $k$ is asymptotically best possible, under ETH. Furthermore, if $k$ is the size of a feedback edge set, then we obtain a faster $2^{O(k)} \cdot n^{O(1)}$ time algorithm, which again cannot be improved under ETH.
In this paper we present a new gap-creating randomized self-reduction for the parameterized Maximum Likelihood Decoding problem over $\mathbb{F}_p$ ($k$-MLD$_p$). The reduction takes a $k$-MLD$_p$ instance with $k\cdot n$ vectors as input, runs in time $f(k)n^{O(1)}$ for some computable function $f$, and outputs a $(3/2-\varepsilon)$-Gap-$k'$-MLD$_p$ instance for any $\varepsilon>0$, where $k'=O(k^2\log k)$. Using this reduction, we show that assuming the randomized Exponential Time Hypothesis (ETH), no algorithm can approximate $k$-MLD$_p$ (and therefore its dual problem $k$-NCP$_p$) within a factor of $(3/2-\varepsilon)$ in $f(k)\cdot n^{o(\sqrt{k/\log k})}$ time for any $\varepsilon>0$. We then use the reduction by Bhattacharyya, Ghoshal, Karthik and Manurangsi (ICALP 2018) to amplify the $(3/2-\varepsilon)$-gap to any constant. As a result, we show that, assuming ETH, no algorithm can approximate $k$-NCP$_p$ and $k$-MDP$_p$ within any constant factor $\gamma$ in $f(k)n^{o(k^{\varepsilon_\gamma})}$ time, for some constant $\varepsilon_\gamma>0$. Combining this with the gap-preserving reduction by Bennett, Cheraghchi, Guruswami and Ribeiro (STOC 2023), we also obtain similar lower bounds for $k$-MDP$_p$, $k$-CVP$_p$ and $k$-SVP$_p$. These results improve upon the previous $f(k)n^{\Omega(\mathsf{poly} \log k)}$ lower bounds for these problems under ETH, obtained via reductions by Bhattacharyya et al. (J. ACM 2021) and Bennett et al. (STOC 2023).
Many computational factors limit broader deployment of large language models. In this paper, we focus on a memory bottleneck imposed by the key-value (KV) cache, a computational shortcut that requires storing previous KV pairs during decoding. While existing KV cache methods approach this problem by pruning or evicting large swaths of relatively less important KV pairs to dramatically reduce the memory footprint of the cache, they can have limited success in tasks that require recollecting a majority of previous tokens. To alleviate this issue, we propose LESS, a simple integration of a (nearly free) constant-sized cache with eviction-based cache methods, such that all tokens can be queried at later decoding steps. Its ability to retain information over time shows merit on a variety of tasks: we demonstrate that LESS can reduce the performance gap relative to caching everything, sometimes even matching it, all while being efficient.
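The abstract does not spell out the construction, so the following is only a conceptual sketch of how a constant-size summary can be paired with an eviction-based cache: evicted KV pairs are folded into fixed-size running sums via an illustrative feature map `phi`, and a query mixes exact attention over the retained cache with a linear-attention-style approximation over everything that was evicted. The class names, feature map, and first-in-first-out eviction policy are assumptions, not the authors' implementation.

```python
import torch

def phi(x):
    # Illustrative non-negative feature map for the low-rank summary (assumption).
    return torch.nn.functional.elu(x) + 1.0

class HybridKVCache:
    """Exact cache for retained tokens plus a constant-size summary of evicted
    KV pairs, so no token becomes completely unqueryable."""
    def __init__(self, dim, budget):
        self.budget = budget
        self.keys, self.values = [], []
        self.S = torch.zeros(dim, dim)   # running sum of phi(k) v^T over evicted pairs
        self.z = torch.zeros(dim)        # running sum of phi(k) over evicted pairs

    def append(self, k, v):
        self.keys.append(k)
        self.values.append(v)
        if len(self.keys) > self.budget:              # evict the oldest entry (placeholder policy)
            k_old, v_old = self.keys.pop(0), self.values.pop(0)
            self.S += torch.outer(phi(k_old), v_old)
            self.z += phi(k_old)

    def attend(self, q):
        K = torch.stack(self.keys)
        V = torch.stack(self.values)
        scores = torch.exp(q @ K.T)                   # unnormalized softmax over retained cache
        exact_num, exact_den = scores @ V, scores.sum()
        approx_num = phi(q) @ self.S                  # approximate contribution of evicted tokens
        approx_den = phi(q) @ self.z
        return (exact_num + approx_num) / (exact_den + approx_den + 1e-6)
```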
In this paper, we propose a new deinterleaving method for mixtures of discrete renewal Markov chains. The method relies on the maximization of a penalized likelihood score and exploits all available information about both the sequence of symbols and their arrival times. A theoretical analysis proves that maximizing this score recovers the true partition of symbols in the large-sample limit, under mild conditions on the component processes. This analysis is then validated by experiments on synthetic data. Finally, the method is applied to deinterleave pulse trains received from different emitters in a RESM (Radar Electronic Support Measurements) context, and we show that it competes favorably with state-of-the-art methods on simulated warfare datasets.
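As a rough illustration of the penalized-likelihood idea (not the paper's estimator), the sketch below scores a candidate partition of the symbol set by summing per-component log-likelihoods under a deliberately simplified renewal model and subtracting a BIC-style penalty on the number of components; the likelihood model, penalty form, and constants are placeholders.

```python
import math

def component_loglik(events):
    """Illustrative stand-in: log-likelihood of one component's arrival times
    under a plain exponential renewal model; the paper fits a Markov renewal
    model per component instead."""
    times = sorted(t for t, _ in events)
    gaps = [b - a for a, b in zip(times, times[1:])]
    if not gaps:
        return 0.0
    rate = len(gaps) / sum(gaps)
    return sum(math.log(rate) - rate * g for g in gaps)

def penalized_score(partition, events_by_symbol, n_obs, lam=1.0):
    """Penalized likelihood of a partition of the symbols: per-component
    log-likelihoods minus a BIC-style penalty that grows with the number of
    components (lam is an illustrative tuning weight)."""
    score = 0.0
    for group in partition:
        events = sorted(e for s in group for e in events_by_symbol[s])
        score += component_loglik(events)
    return score - lam * len(partition) * math.log(n_obs)
```

The partition maximizing this score would then be searched for, e.g. greedily merging or splitting symbol groups.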
In this paper, we propose a class of non-parametric classifiers that learn arbitrary boundaries and generalize well. Our approach is based on a novel way to regularize 1NN classifiers using a greedy approach; we refer to this class of classifiers as Watershed Classifiers. 1NN classifiers are known to trivially over-fit: they have a very large VC dimension and hence do not generalize well. We show that watershed classifiers can find arbitrary boundaries on any dense enough dataset while having a very small VC dimension, and hence lead to good generalization. The traditional approach to regularizing 1NN classifiers is to consider the $K$ nearest neighbours. Neighbourhood component analysis (NCA) proposes a way to learn representations consistent with the ($n-1$)-nearest-neighbour classifier, where $n$ denotes the size of the dataset. In this article, we propose a loss function that learns representations consistent with watershed classifiers, and show that it outperforms the NCA baseline.
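As an illustration of the watershed rule at inference time (the paper's contribution is a loss that makes learned representations consistent with this rule), the sketch below greedily propagates labels: at each step, the unlabeled point closest to the currently labeled set inherits the label of its nearest labeled neighbour. Inputs are assumed to be NumPy arrays; this is not the authors' code.

```python
import numpy as np

def watershed_predict(X_labeled, y_labeled, X_unlabeled):
    """Greedy nearest-neighbour label propagation (a sketch of the watershed rule)."""
    Xl = [np.asarray(x) for x in X_labeled]
    yl = list(y_labeled)
    remaining = list(range(len(X_unlabeled)))
    pred = np.empty(len(X_unlabeled), dtype=np.asarray(y_labeled).dtype)
    while remaining:
        # distances from every remaining unlabeled point to every labeled point
        D = np.linalg.norm(X_unlabeled[remaining][:, None, :] -
                           np.asarray(Xl)[None, :, :], axis=-1)
        i, j = np.unravel_index(np.argmin(D), D.shape)
        idx = remaining.pop(i)
        pred[idx] = yl[j]                     # inherit the nearest labeled point's label
        Xl.append(X_unlabeled[idx])           # the newly labeled point can now seed others
        yl.append(pred[idx])
    return pred
```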
This paper presents the design and implementation of a self-reconfigurable V-shape formation controller for multiple unmanned aerial vehicles (UAVs) navigating through narrow spaces in dense obstacle environments. The V-shape formation is chosen for its maneuverability and visibility advantages. The main objective is to develop an effective formation control strategy that allows UAVs to autonomously adjust their positions to form the desired formation while navigating through obstacles. To achieve this, we propose a distributed behavior-based control algorithm that combines behaviors designed for individual UAVs so that, together, they steer the UAVs to their desired positions. The reconfiguration process is automatic and relies on each UAV's own sensing within the formation, allowing for dynamic adaptations such as opening or closing the wings of the formation or merging into a straight line. Simulation results show that the self-reconfigurable V-shape formation offers adaptability and effectiveness for UAV formations in complex operational scenarios.
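For illustration only, the sketch below shows the kind of weighted behavior combination such distributed controllers typically use: a formation-keeping term, an obstacle-avoidance term, and an inter-UAV separation term summed into one velocity command. The specific behaviors, gains, and thresholds are placeholders and not the controller proposed in the paper.

```python
import numpy as np

def formation_velocity(pos, desired_pos, neighbors, obstacles,
                       w_form=1.0, w_avoid=1.5, w_sep=0.8, max_speed=2.0):
    """Illustrative behavior-based velocity command for one UAV (weights are placeholders)."""
    v = w_form * (desired_pos - pos)                  # move toward the formation slot
    for obs in obstacles:
        d = pos - obs
        dist = np.linalg.norm(d) + 1e-6
        if dist < 3.0:                                # repel from nearby obstacles
            v += w_avoid * d / dist**2
    for nb in neighbors:
        d = pos - nb
        dist = np.linalg.norm(d) + 1e-6
        if dist < 1.0:                                # keep a minimum inter-UAV separation
            v += w_sep * d / dist**2
    speed = np.linalg.norm(v)
    return v if speed <= max_speed else v * max_speed / speed
```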
In this paper, we introduce an authorship attribution method called Authorial Language Models (ALMs) that involves identifying the most likely author of a questioned document based on the perplexity of the questioned document calculated for a set of causal language models fine-tuned on the writings of a set of candidate authors. We benchmarked ALMs against state-of-the-art systems using the CCAT50 and Blogs50 datasets. We find that ALMs achieves a macro-average accuracy of 83.6% on Blogs50, outperforming all other methods, and 74.9% on CCAT50, matching the performance of the best method. To assess the performance of ALMs on shorter texts, we also conducted text ablation testing. We found that to reach a macro-average accuracy of 70%, ALMs needs 40 tokens on Blogs50 and 400 tokens on CCAT50, while to reach 60% it requires 20 tokens on Blogs50 and 70 tokens on CCAT50.
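Since the attribution rule is fully specified by the abstract (pick the candidate whose fine-tuned causal LM assigns the questioned document the lowest perplexity), a minimal sketch using the Hugging Face `transformers` API follows; the per-author model paths and the choice of base model are assumptions.

```python
import math
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

def perplexity(model, tokenizer, text):
    """Perplexity of `text` under a causal LM (exp of the mean token NLL)."""
    enc = tokenizer(text, return_tensors="pt", truncation=True)
    with torch.no_grad():
        out = model(**enc, labels=enc["input_ids"])
    return math.exp(out.loss.item())

def attribute(questioned_doc, author_model_dirs):
    """Return the candidate whose fine-tuned LM gives the questioned document
    the lowest perplexity. `author_model_dirs` maps author name -> path of a
    causal LM fine-tuned on that author's writings (illustrative)."""
    scores = {}
    for author, path in author_model_dirs.items():
        tokenizer = AutoTokenizer.from_pretrained(path)
        model = AutoModelForCausalLM.from_pretrained(path).eval()
        scores[author] = perplexity(model, tokenizer, questioned_doc)
    return min(scores, key=scores.get)
```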
Random binning is a powerful and widely used tool in information theory. In this paper, we examine the output statistics of random binning (OSRB) under Tsallis measures. Using the OSRB framework, we investigate the achievable rate region of the wiretap channel with Tsallis divergence as the security measure.
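For reference, one common parameterization of the Tsallis divergence of order $\alpha$ between distributions $P$ and $Q$ is
$$D_\alpha^{\mathrm{T}}(P\|Q)=\frac{1}{\alpha-1}\left(\sum_{x}P(x)^{\alpha}Q(x)^{1-\alpha}-1\right),\qquad \alpha>0,\ \alpha\neq 1,$$
which recovers the Kullback-Leibler divergence in the limit $\alpha\to 1$; the paper may use a different but equivalent normalization.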
In this paper, we introduce \emph{refined Direct Preference Optimization} (rDPO), a method for improving the behavioral alignment of Large Language Models (LLMs) without the need for human-annotated data. The method creates synthetic data via self-critique prompting by a teacher LLM and then uses a generalized DPO loss function to distill it into a student LLM. The loss function incorporates an additional external reward model to improve the quality of the synthetic data, making rDPO robust to potential noise in the synthetic dataset. rDPO is shown to be effective in a diverse set of behavioral alignment tasks, such as improved safety, robustness against role-playing, and reduced sycophancy. Code to be released at //github.com/vicgalle/refined-dpo.
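The abstract does not give the exact form of the generalized loss, so the following is only a hedged sketch: a standard DPO objective on synthetic preference pairs, with a per-example weight derived from the external reward model's score gap so that likely-noisy pairs are down-weighted. The weighting scheme and hyperparameters are assumptions, not the paper's formulation.

```python
import torch
import torch.nn.functional as F

def rdpo_loss(logp_w, logp_l, ref_logp_w, ref_logp_l,
              reward_margin, beta=0.1, gamma=1.0):
    """Sketch of a DPO-style loss on synthetic preference pairs.

    logp_*        : policy log-prob of the chosen (w) / rejected (l) response
    ref_logp_*    : the same quantities under the frozen reference model
    reward_margin : external reward-model score gap r(x, y_w) - r(x, y_l),
                    used here as a per-example confidence weight (assumption)
    """
    logits = beta * ((logp_w - ref_logp_w) - (logp_l - ref_logp_l))
    weight = torch.sigmoid(gamma * reward_margin)   # down-weight dubious pairs
    return -(weight * F.logsigmoid(logits)).mean()
```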
In this paper, we propose applying a meta-learning approach to low-resource automatic speech recognition (ASR). We formulate ASR for different languages as different tasks and meta-learn the initialization parameters from many pretraining languages to achieve fast adaptation to an unseen target language, via the recently proposed model-agnostic meta-learning (MAML) algorithm. We evaluate the proposed approach using six languages as pretraining tasks and four languages as target tasks. Preliminary results show that the proposed method, MetaASR, significantly outperforms the state-of-the-art multitask pretraining approach on all target languages with different combinations of pretraining languages. In addition, owing to MAML's model-agnostic property, this paper also opens a new research direction of applying meta learning to more speech-related applications.
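A minimal first-order MAML (FOMAML) sketch of such a meta-training loop is given below; the task interface (`support_batch`, `query_batch`, `loss_fn`, e.g. a CTC loss) and the hyperparameters are illustrative, and the paper's MAML variant may differ, for instance in using second-order gradients.

```python
import copy
import torch

def fomaml_step(meta_model, language_tasks, inner_lr=1e-3, meta_lr=1e-4, inner_steps=3):
    """One first-order MAML step over a batch of language-specific ASR tasks."""
    meta_grads = [torch.zeros_like(p) for p in meta_model.parameters()]
    for task in language_tasks:
        learner = copy.deepcopy(meta_model)            # task-specific copy of the initialization
        opt = torch.optim.SGD(learner.parameters(), lr=inner_lr)
        for _ in range(inner_steps):                   # inner loop: adapt to this language
            opt.zero_grad()
            task.loss_fn(learner, task.support_batch()).backward()
            opt.step()
        opt.zero_grad()                                # evaluate the adapted model on query data
        task.loss_fn(learner, task.query_batch()).backward()
        for g, p in zip(meta_grads, learner.parameters()):
            g += p.grad / len(language_tasks)          # first-order meta-gradient
    with torch.no_grad():                              # outer loop: update the initialization
        for p, g in zip(meta_model.parameters(), meta_grads):
            p -= meta_lr * g
    return meta_model
```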