The problem of transmitting a parameter value over an additive white Gaussian noise (AWGN) channel is considered, where, in addition to the transmitter and the receiver, there is a helper that observes the noise non-causally and provides a description of limited rate $R_h$ to the transmitter and/or the receiver. We derive upper and lower bounds on the optimal achievable $\alpha$-th moment of the estimation error and show that they coincide for small values of $\alpha$ and for high values of $R_h$. The upper bound relies on a recently proposed channel-coding scheme that effectively conveys $R_h$ bits essentially error-free and the rest of the rate - over the same AWGN channel without help, with the error-free bits being allocated to the most significant bits of the quantized parameter. We then concentrate on the setting with a total transmit energy constraint, for which we derive achievability results for both channel coding and parameter modulation for several scenarios: when the helper assists only the transmitter or only the receiver and knows the noise, and when the helper assists the transmitter and/or the receiver and knows both the noise and the message. In particular, for the message-informed helper that assists both the receiver and the transmitter, it is shown that the error probability in the channel-coding task decays doubly exponentially. Finally, we translate these results to those for continuous-time power-limited AWGN channels with unconstrained bandwidth. As a byproduct, we show that the capacity with a message-informed helper that is available only at the transmitter can exceed the sum of the capacity without help and the help rate $R_h$, when the helper knows only the noise but not the message.
Self-Supervised Learning (SSL) has proven to be useful in various speech tasks. However, these methods are generally very demanding in terms of data, memory, and computational resources. BERT-based Speech pre-Training with Random-projection Quantizer (BEST-RQ), is an SSL method that has shown great performance on Automatic Speech Recognition (ASR) while being simpler than other SSL methods, such as wav2vec 2.0. Despite BEST-RQ's great performance, details are lacking in the original paper, such as the amount of GPU/TPU hours used in pre-training, and there is no official easy-to-use open-source implementation. Furthermore, BEST-RQ has not been evaluated on other downstream tasks aside from ASR and speech translation. In this work, we describe a re-implementation of a Random-projection quantizer and perform a preliminary study with a comparison to wav2vec 2.0 on four downstream tasks. We discuss the details and differences of our implementation. We show that a random projection quantizer can achieve similar downstream performance as wav2vec 2.0 while decreasing training time by over a factor of two.
The notion of variation is introduced for the Boolean set and based on which Boolean logic backpropagation principle is developed. Using this concept, deep models can be built with weights and activations being Boolean numbers and operated with Boolean logic instead of real arithmetic. In particular, Boolean deep models can be trained directly in the Boolean domain without latent weights. No gradient but logic is synthesized and backpropagated through layers.
Effectively handling instructions with extremely long context remains a challenge for Large Language Models (LLMs), typically necessitating high-quality long data and substantial computational resources. This paper introduces Step-Skipping Alignment (SkipAlign), a new technique designed to enhance the long-context capabilities of LLMs in the phase of alignment without the need for additional efforts beyond training with original data length. SkipAlign is developed on the premise that long-range dependencies are fundamental to enhancing an LLM's capacity of long context. Departing from merely expanding the length of input samples, SkipAlign synthesizes long-range dependencies from the aspect of positions indices. This is achieved by the strategic insertion of skipped positions within instruction-following samples, which utilizes the semantic structure of the data to effectively expand the context. Through extensive experiments on base models with a variety of context window sizes, SkipAlign demonstrates its effectiveness across a spectrum of long-context tasks. Particularly noteworthy is that with a careful selection of the base model and alignment datasets, SkipAlign with only 6B parameters achieves it's best performance and comparable with strong baselines like GPT-3.5-Turbo-16K on LongBench.
We propose the on-the-fly ensembling of a machine translation model with an LLM, prompted on the same task and input. We perform experiments on 4 language pairs (both directions) with varying data amounts. We find that a slightly weaker-at-translation LLM can improve translations of a NMT model, and ensembling with an LLM can produce better translations than ensembling two stronger MT models. We combine our method with various techniques from LLM prompting, such as in context learning and translation context.
Hypervolume improvement (HVI) is commonly employed in multi-objective Bayesian optimization algorithms to define acquisition functions due to its Pareto-compliant property. Rather than focusing on specific statistical moments of HVI, this work aims to provide the exact expression of HVI's probability distribution for bi-objective problems. Considering a bi-variate Gaussian random variable resulting from Gaussian process (GP) modeling, we derive the probability distribution of its hypervolume improvement via a cell partition-based method. Our exact expression is superior in numerical accuracy and computation efficiency compared to the Monte Carlo approximation of HVI's distribution. Utilizing this distribution, we propose a novel acquisition function - $\varepsilon$-probability of hypervolume improvement ($\varepsilon$-PoHVI). Experimentally, we show that on many widely-applied bi-objective test problems, $\varepsilon$-PoHVI significantly outperforms other related acquisition functions, e.g., $\varepsilon$-PoI, and expected hypervolume improvement, when the GP model exhibits a large the prediction uncertainty.
In continuation of an earlier study, we explore a Neymann-Pearson hypothesis testing scenario where, under the null hypothesis ($\cal{H}_0$), the received signal is a white noise process $N_t$, which is not Gaussian in general, and under the alternative hypothesis ($\cal{H}_1$), the received signal comprises a deterministic transmitted signal $s_t$ corrupted by additive white noise, the sum of $N_t$ and another noise process originating from the transmitter, denoted as $Z_t$, which is not necessarily Gaussian either. Our approach focuses on detectors that are based on the correlation and energy of the received signal, which are motivated by implementation simplicity. We optimize the detector parameters to achieve the best trade-off between missed-detection and false-alarm error exponents. First, we optimize the detectors for a given signal, resulting in a non-linear relation between the signal and correlator weights to be optimized. Subsequently, we optimize the transmitted signal and the detector parameters jointly, revealing that the optimal signal is a balanced ternary signal and the correlator has at most three different coefficients, thus facilitating a computationally feasible solution.
We consider channel coding for discrete memoryless channels (DMCs) with a novel cost constraint that constrains both the mean and the variance of the cost of the codewords. We show that the maximum (asymptotically) achievable rate under the new cost formulation is equal to the capacity-cost function; in particular, the strong converse holds. We further characterize the optimal second-order coding rate of these cost-constrained codes; in particular, the optimal second-order coding rate is finite. We then show that the second-order coding performance is strictly improved with feedback using a new variation of timid/bold coding, significantly broadening the applicability of timid/bold coding schemes from unconstrained compound-dispersion channels to all cost-constrained channels. Equivalent results on the minimum average probability of error are also given.
We consider a missing data problem in the context of automatic segmentation methods for Magnetic Resonance Imaging (MRI) brain scans. Usually, automated MRI scan segmentation is based on multiple scans (e.g., T1-weighted, T2-weighted, T1CE, FLAIR). However, quite often a scan is blurry, missing or otherwise unusable. We investigate the question whether a missing scan can be synthesized. We exemplify that this is in principle possible by synthesizing a T2-weighted scan from a given T1-weighted scan. Our first aim is to compute a picture that resembles the missing scan closely, measured by average mean squared error (MSE). We develop/use several methods for this, including a random baseline approach, a clustering-based method and pixel-to-pixel translation method by Isola et al. (Pix2Pix) which is based on conditional GANs. The lowest MSE is achieved by our clustering-based method. Our second aim is to compare the methods with respect to the effect that using the synthesized scan has on the segmentation process. For this, we use a DeepMedic model trained with the four input scan modalities named above. We replace the T2-weighted scan by the synthesized picture and evaluate the segmentations with respect to the tumor identification, using Dice scores as numerical evaluation. The evaluation shows that the segmentation works well with synthesized scans (in particular, with Pix2Pix methods) in many cases.
Detecting location-correlated groups in point sets is an important task in a wide variety of applications areas. In addition to merely detecting such groups, the group's shape carries meaning as well. In this paper, we represent a group's shape using a simple geometric object, a line segment. Specifically, given a radius $r$, we say a line segment is representative of a point set $P$ if it is within distance $r$ of each point $p \in P$. We aim to find the shortest such line segment. This problem is equivalent to stabbing a set of circles of radius $r$ using the shortest line segment. We describe an algorithm to find the shortest representative segment in $O(n \log h + h \log^3 h)$ time. Additionally, we show how to maintain a stable approximation of the shortest representative segment when the points in $P$ move.
Contrastive loss has been increasingly used in learning representations from multiple modalities. In the limit, the nature of the contrastive loss encourages modalities to exactly match each other in the latent space. Yet it remains an open question how the modality alignment affects the downstream task performance. In this paper, based on an information-theoretic argument, we first prove that exact modality alignment is sub-optimal in general for downstream prediction tasks. Hence we advocate that the key of better performance lies in meaningful latent modality structures instead of perfect modality alignment. To this end, we propose three general approaches to construct latent modality structures. Specifically, we design 1) a deep feature separation loss for intra-modality regularization; 2) a Brownian-bridge loss for inter-modality regularization; and 3) a geometric consistency loss for both intra- and inter-modality regularization. Extensive experiments are conducted on two popular multi-modal representation learning frameworks: the CLIP-based two-tower model and the ALBEF-based fusion model. We test our model on a variety of tasks including zero/few-shot image classification, image-text retrieval, visual question answering, visual reasoning, and visual entailment. Our method achieves consistent improvements over existing methods, demonstrating the effectiveness and generalizability of our proposed approach on latent modality structure regularization.