We generalize the leverage score sampling sketch for $\ell_2$-subspace embeddings, to accommodate sampling subsets of the transformed data, so that the sketching approach is appropriate for distributed settings. This is then used to derive an approximate coded computing approach for first-order methods; known as gradient coding, to accelerate linear regression in the presence of failures in distributed computational networks, \textit{i.e.} stragglers. We replicate the data across the distributed network, to attain the approximation guarantees through the induced sampling distribution. The significance and main contribution of this work, is that it unifies randomized numerical linear algebra with approximate coded computing, while attaining an induced $\ell_2$-subspace embedding through uniform sampling. The transition to uniform sampling is done without applying a random projection, as in the case of the subsampled randomized Hadamard transform. Furthermore, by incorporating this technique to coded computing, our scheme is an iterative sketching approach to approximately solving linear regression. We also propose weighting when sketching takes place through sampling with replacement, for further compression.
Relative entropy coding (REC) algorithms encode a sample from a target distribution $Q$ using a proposal distribution $P$ using as few bits as possible. Unlike entropy coding, REC does not assume discrete distributions or require quantisation. As such, it can be naturally integrated into communication pipelines such as learnt compression and differentially private federated learning. Unfortunately, despite their practical benefits, REC algorithms have not seen widespread application, due to their prohibitively slow runtimes or restrictive assumptions. In this paper, we make progress towards addressing these issues. We introduce Greedy Rejection Coding (GRC), which generalises the rejection based-algorithm of Harsha et al. (2007) to arbitrary probability spaces and partitioning schemes. We first show that GRC terminates almost surely and returns unbiased samples from $Q$, after which we focus on two of its variants: GRCS and GRCD. We show that for continuous $Q$ and $P$ over $\mathbb{R}$ with unimodal density ratio $dQ/dP$, the expected runtime of GRCS is upper bounded by $\beta D_{KL}[Q || P] + O(1)$ where $\beta \approx 4.82$, and its expected codelength is optimal. This makes GRCS the first REC algorithm with guaranteed optimal runtime for this class of distributions, up to the multiplicative constant $\beta$. This significantly improves upon the previous state-of-the-art method, A* coding (Flamich et al., 2022). Under the same assumptions, we experimentally observe and conjecture that the expected runtime and codelength of GRCD are upper bounded by $D_{KL}[Q || P] + O(1)$. Finally, we evaluate GRC in a variational autoencoder-based compression pipeline on MNIST, and show that a modified ELBO and an index-compression method can further improve compression efficiency.
We explore the ability of large language models (LLMs) to act as ASR post-processors that perform rescoring and error correction. Our focus is on instruction prompting to let LLMs perform these task without fine-tuning, for which we evaluate different prompting schemes, both zero- and few-shot in-context learning, and a novel task-activating prompting (TAP) method that combines instruction and demonstration. Using a pre-trained first-pass system and rescoring output on two out-of-domain tasks (ATIS and WSJ), we show that rescoring only by in-context learning with frozen LLMs achieves results that are competitive with rescoring by domain-tuned LMs. By combining prompting techniques with fine-tuning we achieve error rates below the N-best oracle level, showcasing the generalization power of the LLMs.
We show how variations of range-restriction and also the Horn property can be passed from inputs to outputs of Craig interpolation in first-order logic. The proof system is clausal tableaux, which stems from first-order ATP. Our results are induced by a restriction of the clausal tableau structure, which can be achieved in general by a proof transformation, also if the source proof is by resolution/paramodulation. Primarily addressed applications are query synthesis and reformulation with interpolation. Our methodical approach combines operations on proof structures with the immediate perspective of feasible implementation through incorporating highly optimized first-order provers.
Urban flood risk emerges from complex and nonlinear interactions among multiple features related to flood hazard, flood exposure, and social and physical vulnerabilities, along with the complex spatial flood dependence relationships. Existing approaches for characterizing urban flood risk, however, are primarily based on flood plain maps, focusing on a limited number of features, primarily hazard and exposure features, without consideration of feature interactions or the dependence relationships among spatial areas. To address this gap, this study presents an integrated urban flood-risk rating model based on a novel unsupervised graph deep learning model (called FloodRisk-Net). FloodRisk-Net is capable of capturing spatial dependence among areas and complex and nonlinear interactions among flood hazards and urban features for specifying emergent flood risk. Using data from multiple metropolitan statistical areas (MSAs) in the United States, the model characterizes their flood risk into six distinct city-specific levels. The model is interpretable and enables feature analysis of areas within each flood-risk level, allowing for the identification of the three archetypes shaping the highest flood risk within each MSA. Flood risk is found to be spatially distributed in a hierarchical structure within each MSA, where the core city disproportionately bears the highest flood risk. Multiple cities are found to have high overall flood-risk levels and low spatial inequality, indicating limited options for balancing urban development and flood-risk reduction. Relevant flood-risk reduction strategies are discussed considering ways that the highest flood risk and uneven spatial distribution of flood risk are formed.
We can define the error distribution as the limiting distribution of the error between the solution $Y$ of a given stochastic differential equation (SDE) and its numerical approximation $\hat{Y}^{(m)}$, weighted by the convergence rate between the two. A goal when studying the error distribution is to provide a way of determination for error distributions for any SDE and numerical scheme that converge to the exact solution. By dividing the error into a main term and a remainder term in a particular way, the author shows that the remainder term can be negligible compared to the main term under certain suitable conditions. Under these conditions, deriving the error distribution reduces to deriving the limiting distribution of the main term. Even if the dimension is one, there are unsolved problems about the asymptotic behavior of the error when the SDE has a drift term and $0<H\leq 1/3$, but our result in the one-dimensional case can be adapted to any Hurst exponent. The main idea of the proof is to define a stochastic process $Y^{m, \rho}$ with the parameter $\rho$ interpolating between $Y$ and $\hat{Y}^{(m)}$ and to estimate the asymptotic expansion for it. Using this estimate, we determine the error distribution of the ($k$)-Milstein scheme and of the Crank-Nicholson scheme in unsolved cases.
We present the first $\varepsilon$-differentially private, computationally efficient algorithm that estimates the means of product distributions over $\{0,1\}^d$ accurately in total-variation distance, whilst attaining the optimal sample complexity to within polylogarithmic factors. The prior work had either solved this problem efficiently and optimally under weaker notions of privacy, or had solved it optimally while having exponential running times.
Holographic multiple-input multiple-output (MIMO) communications are widely recognized as a promising candidate for the next-generation air interface. With holographic MIMO surface, the number of the spatial degrees-of-freedom (DoFs) considerably increases and also significantly varies as the user moves. To fully employ the large and varying number of spatial DoFs, the number of equipped RF chains has to be larger than or equal to the largest number of spatial DoFs. However, this causes much waste as radio frequency (RF) chains (especially the transmit RF chains) are costly and power-hungry. To avoid the heavy burden, this paper investigates green holographic MIMO communications with a few transmit RF chains under an electromagnetic-based communication model. We not only look at the fundamental capacity limits but also propose an effective transmission, namely non-uniform holographic pattern modulation (NUHPM), to achieve the capacity limit in the high signal-to-noise (SNR) regime. The analytical result sheds light on the green evaluation of MIMO communications, which can be realized by increasing the size of the antenna aperture without increasing the number of transmit RF chains. Numerical results are provided to verify our analysis and to show the great performance gain by employing the additional spatial DoFs as modulation resources.
Domain generalization studies the problem of training a model with samples from several domains (or distributions) and then testing the model with samples from a new, unseen domain. In this paper, we propose a novel approach for domain generalization that leverages recent advances in large vision-language models, specifically a CLIP teacher model, to train a smaller model that generalizes to unseen domains. The key technical contribution is a new type of regularization that requires the student's learned image representations to be close to the teacher's learned text representations obtained from encoding the corresponding text descriptions of images. We introduce two designs of the loss function, absolute and relative distance, which provide specific guidance on how the training process of the student model should be regularized. We evaluate our proposed method, dubbed RISE (Regularized Invariance with Semantic Embeddings), on various benchmark datasets and show that it outperforms several state-of-the-art domain generalization methods. To our knowledge, our work is the first to leverage knowledge distillation using a large vision-language model for domain generalization. By incorporating text-based information, RISE improves the generalization capability of machine learning models.
Abstract machines for strong evaluation of the $\lambda$-calculus enter into arguments and have a set of transitions for backtracking out of an evaluated argument. We study a new abstract machine which avoids backtracking by splitting the run of the machine in smaller jobs, one for argument, and that jumps directly to the next job once one is finished. Usually, machines are also deterministic and implement deterministic strategies. Here we weaken this aspect and consider a light form of non-determinism, namely the diamond property, for both the machine and the strategy. For the machine, this introduces a modular management of jobs, parametric in a scheduling policy. We then show how to obtain various strategies, among which leftmost-outermost evaluation.
The dominant sequence transduction models are based on complex recurrent or convolutional neural networks in an encoder-decoder configuration. The best performing models also connect the encoder and decoder through an attention mechanism. We propose a new simple network architecture, the Transformer, based solely on attention mechanisms, dispensing with recurrence and convolutions entirely. Experiments on two machine translation tasks show these models to be superior in quality while being more parallelizable and requiring significantly less time to train. Our model achieves 28.4 BLEU on the WMT 2014 English-to-German translation task, improving over the existing best results, including ensembles by over 2 BLEU. On the WMT 2014 English-to-French translation task, our model establishes a new single-model state-of-the-art BLEU score of 41.8 after training for 3.5 days on eight GPUs, a small fraction of the training costs of the best models from the literature. We show that the Transformer generalizes well to other tasks by applying it successfully to English constituency parsing both with large and limited training data.