Prompts are crucial to large language models, as they provide contextual information such as topic or logical relationships. Inspired by this, we propose PromptASR, a framework that integrates prompts into end-to-end automatic speech recognition (E2E ASR) systems to achieve contextualized ASR with a controllable transcription style. Specifically, a dedicated text encoder encodes the text prompts, and the encodings are injected into the speech encoder by cross-attending the features of the two modalities. When using the ground-truth text from preceding utterances as the content prompt, the proposed system achieves 21.9% and 6.8% relative word error rate reductions on a book-reading dataset and an in-house dataset, respectively, compared to a baseline ASR system. The system can also take word-level biasing lists as prompts to improve recognition accuracy on rare words. An additional style prompt can be given to the text encoder to guide the ASR system to output different styles of transcriptions. The code is available at icefall.
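As an illustration of the cross-attention injection described above, the following is a minimal PyTorch sketch; the module name, dimensions, and residual wiring are assumptions of this sketch rather than the icefall implementation.

```python
# Minimal sketch (not the icefall implementation) of injecting text-prompt
# encodings into a speech-encoder layer via cross-attention.
import torch
import torch.nn as nn

class PromptCrossAttention(nn.Module):
    def __init__(self, speech_dim=512, prompt_dim=768, num_heads=8):
        super().__init__()
        self.cross_attn = nn.MultiheadAttention(
            embed_dim=speech_dim, kdim=prompt_dim, vdim=prompt_dim,
            num_heads=num_heads, batch_first=True)
        self.norm = nn.LayerNorm(speech_dim)

    def forward(self, speech_feats, prompt_encodings, prompt_padding_mask=None):
        # speech_feats: (batch, T_speech, speech_dim) from the speech encoder
        # prompt_encodings: (batch, T_prompt, prompt_dim) from the text encoder
        attended, _ = self.cross_attn(
            query=speech_feats, key=prompt_encodings, value=prompt_encodings,
            key_padding_mask=prompt_padding_mask)
        # residual connection: acoustic features are preserved and prompt
        # information is added on top before normalisation
        return self.norm(speech_feats + attended)
```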
In this article, we introduce a notion of depth functions for data types that are not given in standard statistical data formats. Data depth functions have been studied intensively for normed vector spaces. However, a discussion of depth functions for data where no specific data structure can be presupposed is lacking. We call such data non-standard data. To define depth functions for non-standard data, we represent the data via formal concept analysis, which leads to a unified data representation. Besides introducing these depth functions, we provide a systematic basis for depth functions on non-standard data using formal concept analysis by introducing structural properties. Furthermore, we embed the generalised Tukey depth into our concept of data depth and analyse it using the introduced structural properties. Thus, this article provides a mathematical formalisation of centrality and outlyingness for non-standard data and thereby increases the number of spaces in which centrality can be discussed. In particular, it provides a basis for defining further depth functions and statistical inference methods for non-standard data.
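For orientation, the classical Tukey (halfspace) depth on $\mathbb{R}^d$, which the generalised Tukey depth mentioned above extends beyond normed vector spaces, can be written as
$$ D_T(x; P) \;=\; \inf\{\, P(H) \;:\; H \subseteq \mathbb{R}^d \text{ is a closed halfspace with } x \in H \,\}; $$
the formal-concept-analysis generalisation itself is not reproduced here.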
Robust Markov Decision Processes (RMDPs) are a widely used framework for sequential decision-making under parameter uncertainty. RMDPs have been extensively studied when the objective is to maximize the discounted return, but little is known about average optimality (optimizing the long-run average of the rewards obtained over time) and Blackwell optimality (remaining discount optimal for all discount factors sufficiently close to 1). In this paper, we prove several foundational results for RMDPs beyond the discounted return. We show that average optimal policies can be chosen stationary and deterministic for sa-rectangular RMDPs but, perhaps surprisingly, that history-dependent (Markovian) policies strictly outperform stationary policies for average optimality in s-rectangular RMDPs. We also study Blackwell optimality for sa-rectangular RMDPs, where we show that {\em approximate} Blackwell optimal policies always exist, although Blackwell optimal policies may not exist; we also provide a sufficient condition for their existence, which encompasses virtually all examples from the literature. We then discuss the connection between average and Blackwell optimality, and we describe several algorithms to compute the optimal average return. Interestingly, our approach leverages the connections between RMDPs and stochastic games.
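For reference, under one common convention (the exact ordering of the infimum over models and the limit may differ from the paper's definitions), the robust average return of a policy $\pi$ over an uncertainty set $\mathcal{U}$ is
$$ R_{\mathrm{avg}}(\pi) \;=\; \inf_{P \in \mathcal{U}} \; \liminf_{T \to \infty} \frac{1}{T}\, \mathbb{E}^{\pi,P}\Big[\sum_{t=0}^{T-1} r(s_t,a_t)\Big], $$
and a policy is Blackwell optimal if there exists $\bar\gamma \in (0,1)$ such that it is discount optimal for every discount factor $\gamma \in (\bar\gamma, 1)$.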
We often rely on censuses of triangulations to guide our intuition in $3$-manifold topology. However, this can lead to misplaced faith in conjectures if the smallest counterexamples are too large to appear in our census. Since the number of triangulations increases super-exponentially with size, there is no way to expand a census beyond relatively small triangulations; the current census only goes up to $10$ tetrahedra. Here, we show that it is feasible to search for large and hard-to-find counterexamples by using heuristics to selectively (rather than exhaustively) enumerate triangulations. We use this idea to find counterexamples to three conjectures which ask, for certain $3$-manifolds, whether one-vertex triangulations always have a "distinctive" edge that would allow us to recognise the $3$-manifold.
Implicit solvers for atmospheric models are often accelerated via the solution of a preconditioned system. For block preconditioners, this typically involves factorising the (approximate) Jacobian of the coupled system into a Helmholtz equation for some function of the pressure. Here we present a preconditioner for the compressible Euler equations with a flux-form representation of the potential temperature on the Lorenz grid, using mixed finite elements. This formulation allows for spatial discretisations that conserve both energy and potential temperature variance. By introducing the dry thermodynamic entropy as an auxiliary variable in the solution of the algebraic system, the resulting preconditioner is shown to have a block structure similar to that of an existing preconditioner for the material-form transport of potential temperature on the Charney-Phillips grid. For a one-dimensional thermal bubble configuration, it is also more efficient and stable than both this preconditioner and a previous Helmholtz preconditioner for the flux-form transport of density-weighted potential temperature on the Lorenz grid. The new preconditioner is further verified against standard two-dimensional test cases in a vertical slice geometry.
In this paper, we propose a reduced-order modeling strategy for two-way Dirichlet-Neumann parametric coupled problems solved with domain-decomposition (DD) sub-structuring methods. We split the original coupled differential problem into two sub-problems with Dirichlet and Neumann interface conditions, respectively. After discretization by, e.g., the finite element (FE) method, the full-order model (FOM) is solved by Dirichlet-Neumann iterations between the two sub-problems until interface convergence is reached. We then apply the reduced basis (RB) method to obtain a low-dimensional representation of the solution of each sub-problem. Furthermore, we apply the discrete empirical interpolation method (DEIM) at the interface level to achieve a fully reduced-order representation of the implemented DD techniques. To deal with non-conforming FE interface discretizations, we employ the INTERNODES method combined with the interface DEIM reduction. The reduced-order model (ROM) is then solved by sub-iterating between the two reduced-order sub-problems until the approximated high-fidelity interface solutions converge. The ROM scheme is numerically verified on both steady and unsteady coupled problems in the case of non-conforming FE interfaces.
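At the algorithmic level, the Dirichlet-Neumann sub-iteration (for either the FOM or the ROM) can be sketched as follows; the solver interfaces, relaxation parameter, and stopping criterion are illustrative assumptions, not the implementation used in the paper.

```python
# Schematic sketch of a relaxed Dirichlet-Neumann fixed-point iteration.
# solve_dirichlet / solve_neumann stand in for the (reduced-order) sub-problem
# solvers; `theta` is a standard interface relaxation parameter.
import numpy as np

def dirichlet_neumann(solve_dirichlet, solve_neumann, g0, theta=0.5,
                      tol=1e-8, max_iter=100):
    """Iterate until the interface trace g converges."""
    g = g0.copy()
    for it in range(max_iter):
        u1 = solve_dirichlet(g)          # sub-problem 1 with Dirichlet data g
        flux = u1.interface_flux()       # Neumann data passed to sub-problem 2
        u2 = solve_neumann(flux)
        g_new = (1 - theta) * g + theta * u2.interface_trace()  # relaxation step
        if np.linalg.norm(g_new - g) <= tol * max(np.linalg.norm(g), 1.0):
            return u1, u2, g_new, it + 1
        g = g_new
    raise RuntimeError("Dirichlet-Neumann iteration did not converge")
```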
We introduce a family of discrete context-specific models, which we call decomposable. We construct this family from the subclass of staged tree models known as CStree models. We give an algebraic and combinatorial characterization of all context-specific independence relations that hold in a decomposable context-specific model, which yields a Markov basis. We prove that the moralization operation applied to the graphical representation of a context-specific model does not affect the implied independence relations, thus affirming that these models are algebraically described by a finite collection of decomposable graphical models. More generally, we establish that several algebraic, combinatorial, and geometric properties of decomposable context-specific models generalize those of decomposable graphical models to the context-specific setting.
This study addresses a class of mixed-integer linear programming (MILP) problems that involve uncertainty in the objective function parameters. The parameters are assumed to form a random vector whose probability distribution can only be observed through a finite training data set. Unlike most related studies in the literature, we also consider uncertainty in the underlying data set. The data uncertainty is described by a set of linear constraints for each random sample, and the uncertainty in the distribution (for a fixed realization of the data) is defined using a type-1 Wasserstein ball centered at the empirical distribution of the data. The overall problem is formulated as a three-level distributionally robust optimization (DRO) problem. First, we prove that the three-level problem admits a single-level MILP reformulation if the class of loss functions is restricted to biaffine functions. Second, it turns out that for several particular forms of data uncertainty, the outlined problem can be solved reasonably fast by leveraging the nominal MILP problem. Finally, we conduct a computational study in which the out-of-sample performance of our model and the computational complexity of the proposed MILP reformulation are explored numerically for several application domains.
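For context, a generic single-stage type-1 Wasserstein DRO problem (without the additional data-uncertainty level described above) takes the form
$$ \min_{x \in \mathcal{X}} \; \sup_{Q \in \mathcal{B}_\varepsilon(\widehat{P}_N)} \mathbb{E}_{\xi \sim Q}\big[\ell(x,\xi)\big], \qquad \mathcal{B}_\varepsilon(\widehat{P}_N) = \big\{\, Q : W_1(Q, \widehat{P}_N) \le \varepsilon \,\big\}, $$
where $\widehat{P}_N$ is the empirical distribution of the $N$ training samples and $W_1$ is the type-1 Wasserstein distance; this notation is ours, and the three-level model additionally imposes per-sample linear constraints on the data.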
We study the multivariate integration problem for periodic functions from the weighted Korobov space in the randomized setting. We introduce a new randomized rank-1 lattice rule with a randomly chosen number of points, which avoids the need for component-by-component construction in the search for good generating vectors while still achieving nearly the optimal rate of the randomized error. Our idea is to exploit the fact that at least half of the possible generating vectors yield nearly the optimal rate of the worst-case error in the deterministic setting. By randomly choosing generating vectors $r$ times and comparing their corresponding worst-case errors, one can find a generating vector with the desired worst-case error bound with very high probability, and the (small) failure probability can be controlled by increasing $r$ logarithmically as a function of the number of points. Numerical experiments are conducted to support our theoretical findings.
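The selection idea can be sketched as follows in Python; `worst_case_error` stands in for a computable Korobov-space worst-case error bound and, like the other names, is an assumption of this sketch rather than the construction analysed in the paper.

```python
# Illustrative sketch: draw r random generating vectors and keep the one with
# the smallest worst-case-error criterion, then integrate with the resulting
# rank-1 lattice rule.
import numpy as np

rng = np.random.default_rng(0)

def rank1_lattice_points(z, n):
    """Rank-1 lattice point set: x_k = frac(k * z / n), k = 0..n-1."""
    k = np.arange(n)[:, None]
    return np.mod(k * z[None, :] / n, 1.0)

def pick_generating_vector(n, d, r, worst_case_error):
    """Randomly draw r candidate generating vectors and return the best one."""
    best_z, best_err = None, np.inf
    for _ in range(r):
        z = rng.integers(1, n, size=d)    # random candidate generating vector
        err = worst_case_error(z, n)      # criterion evaluable in the Korobov space
        if err < best_err:
            best_z, best_err = z, err
    return best_z, best_err

def lattice_rule(f, z, n):
    """Equal-weight cubature over the rank-1 lattice point set."""
    return np.mean(f(rank1_lattice_points(z, n)))
```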
We consider the problem of zero-error function computation with side information. Alice has a source $X$ and Bob has a correlated source $Y$, and they can communicate via either a classical or a quantum channel. Bob wants to compute $f(X,Y)$ with zero error. We aim to characterize the minimum amount of information that Alice needs to send to Bob for this to be possible. In the classical setting, this quantity depends on the asymptotic growth of $\chi(G^{(m)})$, the chromatic number of an appropriately defined $m$-instance "confusion graph". In this work, we present structural characterizations of $G^{(m)}$ and demonstrate two function computation scenarios that have the same single-instance confusion graph. However, in one case there is a strict advantage in using quantum transmission over classical transmission, whereas there is no such advantage in the other case.
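For orientation, one standard single-instance confusion graph in this setting (conventions may differ from the paper's) has the support of $X$ as its vertex set and edges
$$ x \sim x' \iff \exists\, y :\; p(x,y) > 0,\; p(x',y) > 0,\; f(x,y) \ne f(x',y), $$
so that adjacent source symbols must be distinguishable by Bob for zero-error computation; $G^{(m)}$ is the analogous graph on $m$-tuples of source symbols.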
In recent years, object detection has made impressive progress. Despite these improvements, there is still a significant performance gap between the detection of small and large objects. We analyze the current state-of-the-art model, Mask R-CNN, on a challenging dataset, MS COCO. We show that the overlap between small ground-truth objects and the predicted anchors is much lower than the expected IoU threshold. We conjecture that this is due to two factors: (1) only a few images contain small objects, and (2) even within the images that do contain them, small objects do not appear often enough. We therefore propose to oversample images with small objects and to augment each of these images by copy-pasting small objects many times. This allows us to trade off the quality of the detector on large objects against that on small objects. We evaluate different pasting augmentation strategies and ultimately achieve a 9.7\% relative improvement on instance segmentation and 7.1\% on object detection of small objects, compared to the current state-of-the-art method on MS COCO.
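The copy-paste augmentation can be sketched as follows; the area threshold, number of copies, and placement logic are illustrative assumptions rather than the paper's exact settings.

```python
# Hedged sketch of the copy-paste idea: small object instances are copied and
# pasted at random non-overlapping locations in the same image.
import numpy as np

def paste_small_objects(image, masks, boxes, area_thresh=32 * 32, copies=3,
                        max_tries=20, rng=None):
    """image: (H, W, 3) uint8; masks: list of (H, W) bool arrays; boxes: list of (x1, y1, x2, y2) ints."""
    rng = rng or np.random.default_rng()
    H, W = image.shape[:2]
    out = image.copy()
    new_masks, new_boxes = list(masks), list(boxes)
    for mask, (x1, y1, x2, y2) in zip(masks, boxes):
        w, h = x2 - x1, y2 - y1
        if w * h > area_thresh or w >= W or h >= H:
            continue                                  # only small instances are duplicated
        patch, m = image[y1:y2, x1:x2], mask[y1:y2, x1:x2]
        for _ in range(copies):
            for _ in range(max_tries):
                nx, ny = rng.integers(0, W - w), rng.integers(0, H - h)
                cand = (nx, ny, nx + w, ny + h)
                if all(_iou(cand, b) == 0 for b in new_boxes):   # avoid overlapping existing objects
                    out[ny:ny + h, nx:nx + w][m] = patch[m]      # paste the masked pixels
                    shifted = np.zeros((H, W), dtype=bool)
                    shifted[ny:ny + h, nx:nx + w] = m
                    new_masks.append(shifted)
                    new_boxes.append(cand)
                    break
    return out, new_masks, new_boxes

def _iou(a, b):
    ax1, ay1, ax2, ay2 = a
    bx1, by1, bx2, by2 = b
    iw = max(0, min(ax2, bx2) - max(ax1, bx1))
    ih = max(0, min(ay2, by2) - max(ay1, by1))
    inter = iw * ih
    union = (ax2 - ax1) * (ay2 - ay1) + (bx2 - bx1) * (by2 - by1) - inter
    return inter / union if union else 0.0
```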