The emergence of industrial-scale speech recognition (ASR) models such as Whisper and USM, trained on 1M hours of weakly labelled and 12M hours of audio only proprietary data respectively, has led to a stronger need for large scale public ASR corpora and competitive open source pipelines. Unlike the said models, large language models are typically based on Transformer decoders, and it remains unclear if decoder-only models trained on public data alone can deliver competitive performance. In this work, we investigate factors such as choice of training datasets and modeling components necessary for obtaining the best performance using public English ASR corpora alone. Our Decoder-Only Transformer for ASR (DOTA) model comprehensively outperforms the encoder-decoder open source replication of Whisper (OWSM) on nearly all English ASR benchmarks and outperforms Whisper large-v3 on 7 out of 15 test sets. We release our codebase and model checkpoints under permissive license.
Regression models that incorporate smooth functions of predictor variables to explain the relationships with a response variable have gained widespread usage and proved successful in various applications. By incorporating smooth functions of predictor variables, these models can capture complex relationships between the response and predictors while still allowing for interpretation of the results. In situations where the relationships between a response variable and predictors are explored, it is not uncommon to assume that these relationships adhere to certain shape constraints. Examples of such constraints include monotonicity and convexity. The scam package for R has become a popular package to carry out the full fitting of exponential family generalized additive modelling with shape restrictions on smooths. The paper aims to extend the existing framework of shape-constrained generalized additive models (SCAM) to accommodate smooth interactions of covariates, linear functionals of shape-constrained smooths and incorporation of residual autocorrelation. The methods described in this paper are implemented in the recent version of the package scam, available on the Comprehensive R Archive Network (CRAN).
For regression model selection under the maximum likelihood framework, we study the likelihood ratio confidence region for the regression parameter vector of a full regression model. We show that, when the confidence level increases with the sample size at a certain speed, with probability tending to one, the confidence region contains only vectors representing models having all active variables, including the parameter vector of the true model. This result leads to a consistent model selection criterion with a sparse maximum likelihood interpretation and certain advantages over popular information criteria. It also provides a large-sample characterization of models of maximum likelihood at different model sizes which shows that, for selection consistency, it suffices to consider only this small set of models.
The well-conditioned multi-product formula (MPF), proposed by [Low, Kliuchnikov, and Wiebe, 2019], is a simple high-order time-independent Hamiltonian simulation algorithm that implements a linear combination of standard product formulas of low order. While the MPF aims to simultaneously exploit commutator scaling among Hamiltonians and achieve near-optimal time and precision dependence, its lack of a rigorous error bound on the nested commutators renders its practical advantage ambiguous. In this work, we conduct a rigorous complexity analysis of the well-conditioned MPF, demonstrating explicit commutator scaling and near-optimal time and precision dependence at the same time. Using our improved complexity analysis, we present several applications of practical interest where the MPF based on a second-order product formula can achieve a polynomial speedup in both system size and evolution time, as well as an exponential speedup in precision, compared to second-order and even higher-order product formulas. Compared to post-Trotter methods, the MPF based on a second-order product formula can achieve polynomially better scaling in system size, with only poly-logarithmic overhead in evolution time and precision.
We propose an implicit Discontinuous Galerkin (DG) discretization for incompressible two-phase flows using an artificial compressibility formulation. The conservative level set (CLS) method is employed in combination with a reinitialization procedure to capture the moving interface. A projection method based on the L-stable TR-BDF2 method is adopted for the time discretization of the Navier-Stokes equations and of the level set method. Adaptive Mesh Refinement (AMR) is employed to enhance the resolution in correspondence of the interface between the two fluids. The effectiveness of the proposed approach is shown in a number of classical benchmarks. A specific analysis on the influence of different choices of the mixture viscosity is also carried out.
Testing the equality of mean vectors across $g$ different groups plays an important role in many scientific fields. In regular frameworks, likelihood-based statistics under the normality assumption offer a general solution to this task. However, the accuracy of standard asymptotic results is not reliable when the dimension $p$ of the data is large relative to the sample size $n_i$ of each group. We propose here an exact directional test for the equality of $g$ normal mean vectors with identical unknown covariance matrix, provided that $\sum_{i=1}^g n_i \ge p+g+1$. In the case of two groups ($g=2$), the directional test is equivalent to the Hotelling's $T^2$ test. In the more general situation where the $g$ independent groups may have different unknown covariance matrices, although exactness does not hold, simulation studies show that the directional test is more accurate than most commonly used likelihood based solutions. Robustness of the directional approach and its competitors under deviation from multivariate normality is also numerically investigated.
The Shallow Ice Approximation (SIA) model on strong form is commonly used for inferring the flow dynamics of grounded ice sheets. The solution to the SIA model is a closed-form expression for the velocity field. When that velocity field is used to advance the ice surface in time, the time steps have to take small values due to quadratic scaling in terms of the horizontal mesh size. In this paper we write the SIA model on weak form, and add in the Free Surface Stabilization Algorithm (FSSA) terms. We find numerically that the time step restriction scaling is improved from quadratic to linear, but only for large horizontal mesh sizes. We then extend the weak form by adding the initially neglected normal stress terms. This allows for a linear time step restriction across the whole range of the horizontal mesh sizes, leading to an improved efficiency. Theoretical analysis demonstrates that the inclusion of FSSA stabilization terms transitions the explicit time stepping treatment of second derivative surface terms to an implicit approach. Moreover, a computational cost analysis, combined with numerical results on stability and accuracy, advocates for preferring the SIA models written on weak form over the standard SIA model.
Periodic autoregressive (PAR) time series with finite variance is considered as one of the most common models of second-order cyclostationary processes. However, in the real applications, the signals with periodic characteristics may be disturbed by additional noise related to measurement device disturbances or to other external sources. Thus, the known estimation techniques dedicated for PAR models may be inefficient for such cases. When the variance of the additive noise is relatively small, it can be ignored and the classical estimation techniques can be applied. However, for extreme cases, the additive noise can have a significant influence on the estimation results. In this paper, we propose four estimation techniques for the noise-corrupted PAR models with finite variance distributions. The methodology is based on Yule-Walker equations utilizing the autocovariance function. It can be used for any type of the finite variance additive noise. The presented simulation study clearly indicates the efficiency of the proposed techniques, also for extreme case, when the additive noise is a sum of the Gaussian additive noise and additive outliers. The proposed estimation techniques are also applied for testing if the data corresponds to noise-corrupted PAR model. This issue is strongly related to the identification of informative component in the data in case when the model is disturbed by additive non-informative noise. The power of the test is studied for simulated data. Finally, the testing procedure is applied for two real time series describing particulate matter concentration in the air.
The Bell regression model (BRM) is a statistical model that is often used in the analysis of count data that exhibits overdispersion. In this study, we propose a Bayesian analysis of the BRM and offer a new perspective on its application. Specifically, we introduce a G-prior distribution for Bayesian inference in BRM, in addition to a flat-normal prior distribution. To compare the performance of the proposed prior distributions, we conduct a simulation study and demonstrate that the G-prior distribution provides superior estimation results for the BRM. Furthermore, we apply the methodology to real data and compare the BRM to the Poisson regression model using various model selection criteria. Our results provide valuable insights into the use of Bayesian methods for estimation and inference of the BRM and highlight the importance of considering the choice of prior distribution in the analysis of count data.
In large-scale systems there are fundamental challenges when centralised techniques are used for task allocation. The number of interactions is limited by resource constraints such as on computation, storage, and network communication. We can increase scalability by implementing the system as a distributed task-allocation system, sharing tasks across many agents. However, this also increases the resource cost of communications and synchronisation, and is difficult to scale. In this paper we present four algorithms to solve these problems. The combination of these algorithms enable each agent to improve their task allocation strategy through reinforcement learning, while changing how much they explore the system in response to how optimal they believe their current strategy is, given their past experience. We focus on distributed agent systems where the agents' behaviours are constrained by resource usage limits, limiting agents to local rather than system-wide knowledge. We evaluate these algorithms in a simulated environment where agents are given a task composed of multiple subtasks that must be allocated to other agents with differing capabilities, to then carry out those tasks. We also simulate real-life system effects such as networking instability. Our solution is shown to solve the task allocation problem to 6.7% of the theoretical optimal within the system configurations considered. It provides 5x better performance recovery over no-knowledge retention approaches when system connectivity is impacted, and is tested against systems up to 100 agents with less than a 9% impact on the algorithms' performance.
In the past few years, the emergence of pre-training models has brought uni-modal fields such as computer vision (CV) and natural language processing (NLP) to a new era. Substantial works have shown they are beneficial for downstream uni-modal tasks and avoid training a new model from scratch. So can such pre-trained models be applied to multi-modal tasks? Researchers have explored this problem and made significant progress. This paper surveys recent advances and new frontiers in vision-language pre-training (VLP), including image-text and video-text pre-training. To give readers a better overall grasp of VLP, we first review its recent advances from five aspects: feature extraction, model architecture, pre-training objectives, pre-training datasets, and downstream tasks. Then, we summarize the specific VLP models in detail. Finally, we discuss the new frontiers in VLP. To the best of our knowledge, this is the first survey on VLP. We hope that this survey can shed light on future research in the VLP field.