
Cascade · Speech recognition · MoDELS · Knowledge · Decoding

December 15, 2023

On the compression of shallow non-causal ASR models using knowledge distillation and tied-and-reduced decoder for low-latency on-device speech recognition

Nagaraj Adiga,Jinhwan Park,Chintigari Shiva Kumar,Shatrughan Singh,Kyungmin Lee,Chanwoo Kim,Dhananjaya Gowda

Recently, the cascaded two-pass architecture has emerged as a strong contender for on-device automatic speech recognition (ASR). A cascade of causal and shallow non-causal encoders coupled with a shared decoder enables operation in both streaming and look-ahead modes. In this paper, we propose a shallow cascaded model that combines several model compression techniques, namely knowledge distillation, a shared decoder, and a tied-and-reduced transducer network, to reduce the model footprint. The shared decoder is changed into a tied-and-reduced network. The cascaded two-pass model is further compressed through knowledge distillation with a Kullback-Leibler divergence loss on the model posteriors. We demonstrate a 50% reduction in the size of a 41 M parameter cascaded teacher model with no noticeable degradation in ASR accuracy and a 30% reduction in latency.
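As a rough illustration of the distillation objective mentioned in the abstract, the sketch below computes a Kullback-Leibler divergence between teacher and student posteriors, averaged over frames. The function names, the `temperature` parameter, and the choice of KL(teacher || student) as the direction are assumptions made for this example; the abstract does not specify them, and this is not the authors' implementation.

```python
import math


def softmax(logits, temperature=1.0):
    """Convert a list of logits into a probability distribution."""
    scaled = [x / temperature for x in logits]
    peak = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(x - peak) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def kl_divergence(p, q, eps=1e-12):
    """KL(p || q) for two discrete distributions of equal length."""
    return sum(
        pi * math.log((pi + eps) / (qi + eps))
        for pi, qi in zip(p, q)
        if pi > 0.0
    )


def distillation_loss(teacher_logits, student_logits, temperature=1.0):
    """Average KL(teacher || student) over frames (assumed direction).

    Both arguments are lists of per-frame logit vectors.
    """
    total = 0.0
    for t_frame, s_frame in zip(teacher_logits, student_logits):
        p_teacher = softmax(t_frame, temperature)
        p_student = softmax(s_frame, temperature)
        total += kl_divergence(p_teacher, p_student)
    return total / len(teacher_logits)


# Hypothetical logits for two frames over a three-token vocabulary.
teacher = [[2.0, 0.5, -1.0], [0.1, 0.2, 3.0]]
student = [[1.5, 0.7, -0.5], [0.0, 0.5, 2.0]]
loss = distillation_loss(teacher, student)
```

The loss is zero when the student reproduces the teacher posteriors exactly and grows as the two distributions diverge.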

Related content

Cascade


SimPLe · Exchangeable · Sample · Statistic · Excel

February 5, 2024

Sequential permutation testing by betting

Lasse Fischer,Aaditya Ramdas

from arxiv, 29 pages, 8 figures

We develop an anytime-valid permutation test, where the dataset is fixed and the permutations are sampled sequentially one by one, with the objective of saving computational resources by sampling fewer permutations and stopping early. The core technical advance is the development of new test martingales (nonnegative martingales with initial value one) for testing exchangeability against a very particular alternative. These test martingales are constructed using new and simple betting strategies that smartly bet on the relative ranks of permuted test statistics. The betting strategies are guided by the derivation of a simple log-optimal betting strategy, and display excellent power in practice. In contrast to a well-known method by Besag and Clifford, our method yields a valid e-value or a p-value at any stopping time, and with particular stopping rules, it yields computational gains under both the null and the alternative without compromising power.

Anomaly detection · Identifiable · Automator · Performer · Principle

February 5, 2024

One-class anomaly detection through color-to-thermal AI for building envelope inspection

Polina Kurtser,Kailun Feng,Thomas Olofsson,Aitor De Andres

We present a label-free method for detecting anomalies during thermographic inspection of building envelopes. It is based on the AI-driven prediction of thermal distributions from color images. Effectively the method performs as a one-class classifier of the thermal image regions with high mismatch between the predicted and actual thermal distributions. The algorithm can learn to identify certain features as normal or anomalous by selecting the target sample used for training. We demonstrated this principle by training the algorithm with data collected at different outdoor temperatures, which led to the detection of thermal bridges. The method can be implemented to assist human professionals during routine building inspections or combined with mobile platforms for automating examination of large areas.

Processing (programming language) · Markov · Loss · MoDELS · Estimation/Estimator

February 4, 2024

Analysis of an aggregate loss model in a Markov renewal regime

Pepa Ramírez-Cobo,Emilio Carrizosa,Rosa Elvira Lillo

In this article we consider an aggregate loss model with dependent losses. The loss occurrence process is governed by a two-state Markovian arrival process (MAP2), a Markov renewal process that allows for (1) correlated inter-loss times, (2) non-exponentially distributed inter-loss times, and (3) overdispersed loss counts. Some quantities of interest to measure persistence in the loss occurrence process are obtained. Given a real operational risk database, the aggregate loss model is estimated by fitting the inter-loss times and the severities separately. The MAP2 is estimated via direct maximization of the likelihood function, and severities are modeled by the heavy-tailed double-Pareto lognormal distribution. In comparison with the fit provided by the Poisson process, the results show that taking into account the dependence and overdispersion in the inter-loss times distribution leads to higher capital charges.

INFORMS · MoDELS · Model averaging · Criterion · Estimation/Estimator

February 2, 2024

Improved information criteria for Bayesian model averaging in lattice field theory

Ethan T. Neil,Jacob W. Sitison

from arxiv, 70 pages, 13 figures. v2: corrections to data subset formulas for BPIC and PPIC; edits for clarity. v3: updated to journal version

Bayesian model averaging is a practical method for dealing with uncertainty due to model specification. Use of this technique requires the estimation of model probability weights. In this work, we revisit the derivation of estimators for these model weights. Use of the Kullback-Leibler divergence as a starting point leads naturally to a number of alternative information criteria suitable for Bayesian model weight estimation. We explore three such criteria, known to the statistics literature before, in detail: a Bayesian analogue of the Akaike information criterion which we call the BAIC, the Bayesian predictive information criterion (BPIC), and the posterior predictive information criterion (PPIC). We compare the use of these information criteria in numerical analysis problems common in lattice field theory calculations. We find that the PPIC has the most appealing theoretical properties and can give the best performance in terms of model-averaging uncertainty, particularly in the presence of noisy data, while the BAIC is a simple and reliable alternative.
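To make the role of the model probability weights concrete, here is a minimal sketch of model averaging driven by an information criterion. It assumes the common convention that a model's weight is proportional to exp(-IC/2), and it combines per-model estimates with the law of total variance. The function names and the numbers are hypothetical, and the specific BAIC, BPIC, and PPIC formulas from the paper are not reproduced here.

```python
import math


def model_weights(ic_values):
    """Normalized weights, assuming weight_i is proportional to exp(-IC_i / 2)."""
    best = min(ic_values)  # shift by the minimum to avoid underflow
    raw = [math.exp(-0.5 * (ic - best)) for ic in ic_values]
    total = sum(raw)
    return [r / total for r in raw]


def model_average(estimates, variances, weights):
    """Weighted mean and total variance across candidate models.

    The total variance is the weighted per-model variance plus the
    spread of the per-model estimates around the averaged value.
    """
    mean = sum(w * a for w, a in zip(weights, estimates))
    within = sum(w * v for w, v in zip(weights, variances))
    between = sum(w * a * a for w, a in zip(weights, estimates)) - mean ** 2
    return mean, within + between


# Hypothetical fits: three models with their IC values, estimates, variances.
ics = [10.2, 11.0, 15.7]
weights = model_weights(ics)
mean, variance = model_average([0.98, 1.03, 1.20], [0.01, 0.02, 0.05], weights)
```

The second variance term is what lets disagreement between candidate models show up as additional uncertainty in the averaged result.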

MoDELS · Operation · CASES · Statistic · Performer

February 2, 2024

A test for counting sequences of integer-valued autoregressive models

Yuichi Goto,Kou Fujimori

from arxiv, 24 pages, 2 tables

The integer autoregressive (INAR) model is one of the most commonly used models in nonnegative integer-valued time series analysis and is a counterpart to the traditional autoregressive model for continuous-valued time series. To guarantee the integer-valued nature, the binomial thinning operator or, more generally, the generalized Steutel and van Harn operator is used to define the INAR model. However, the distributions of the counting sequences used in the operators have so far been determined by the analyst's preference without statistical verification. In this paper, we propose a test based on the mean and variance relationships for distributions of counting sequences and a disturbance process to check if the operator is reasonable. We show that our proposed test has asymptotically correct size and is consistent. Numerical simulation is carried out to evaluate the finite sample performance of our test. As a real data application, we apply our test to the monthly number of anorexia cases in animals submitted to animal health laboratories in New Zealand and we conclude that the binomial thinning operator is not appropriate.
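For readers unfamiliar with the binomial thinning operator, the following sketch simulates an INAR(1) series X_t = alpha ∘ X_{t-1} + e_t, where alpha ∘ X is a sum of X independent Bernoulli(alpha) counting variables. The Poisson distribution for the disturbance e_t, the zero starting value, and all function names are assumptions for illustration; the proposed test itself is not implemented here.

```python
import math
import random


def binomial_thinning(alpha, x, rng):
    """alpha ∘ x: each of the x units survives independently with prob. alpha."""
    return sum(1 for _ in range(x) if rng.random() < alpha)


def poisson(lam, rng):
    """Draw a Poisson(lam) variate with Knuth's multiplication method."""
    limit = math.exp(-lam)
    count = 0
    product = rng.random()
    while product > limit:
        count += 1
        product *= rng.random()
    return count


def simulate_inar1(alpha, lam, n, seed=0):
    """Simulate n steps of an INAR(1) process with Poisson(lam) disturbances."""
    rng = random.Random(seed)
    x = 0  # assumed starting value
    series = []
    for _ in range(n):
        x = binomial_thinning(alpha, x, rng) + poisson(lam, rng)
        series.append(x)
    return series


series = simulate_inar1(alpha=0.5, lam=2.0, n=200, seed=42)
```

With Bernoulli counting variables the mean is alpha and the variance is alpha(1 - alpha); a mean-variance relationship of this kind is what the abstract says the test is based on.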

Directed · Cascade · MoDELS · CASE · CASES

February 1, 2024

Prosody in Cascade and Direct Speech-to-Text Translation: a case study on Korean Wh-Phrases

Giulio Zhou,Tsz Kin Lam,Alexandra Birch,Barry Haddow

from arxiv, Accepted at Findings of EACL 2024

Speech-to-Text Translation (S2TT) has typically been addressed with cascade systems, where speech recognition systems generate a transcription that is subsequently passed to a translation model. While there has been a growing interest in developing direct speech translation systems to avoid propagating errors and losing non-verbal content, prior work in direct S2TT has struggled to conclusively establish the advantages of integrating the acoustic signal directly into the translation process. This work proposes using contrastive evaluation to quantitatively measure the ability of direct S2TT systems to disambiguate utterances where prosody plays a crucial role. Specifically, we evaluated Korean-English translation systems on a test set containing wh-phrases, for which prosodic features are necessary to produce translations with the correct intent, whether it's a statement, a yes/no question, a wh-question, and more. Our results clearly demonstrate the value of direct translation systems over cascade translation models, with a notable 12.9% improvement in overall accuracy in ambiguous cases, along with up to a 15.6% increase in F1 scores for one of the major intent categories. To the best of our knowledge, this work stands as the first to provide quantitative evidence that direct S2TT models can effectively leverage prosody. The code for our evaluation is openly accessible and freely available for review and utilisation.

Sample · Performer · INFORMS · Estimation/Estimator · Full

February 1, 2024

Bell sampling from quantum circuits

Dominik Hangleiter,Michael J. Gullans

from arxiv, 7+15 pages, 2 figures. Comments welcome. v2: corrected typos, added references v3: added results, improved proofs v4: extended noise analysis

A central challenge in the verification of quantum computers is benchmarking their performance as a whole and demonstrating their computational capabilities. In this work, we find a universal model of quantum computation, Bell sampling, that can be used for both of those tasks and thus provides an ideal stepping stone towards fault-tolerance. In Bell sampling, we measure two copies of a state prepared by a quantum circuit in the transversal Bell basis. We show that the Bell samples are classically intractable to produce and at the same time constitute what we call a circuit shadow: from the Bell samples we can efficiently extract information about the quantum circuit preparing the state, as well as diagnose circuit errors. In addition to known properties that can be efficiently extracted from Bell samples, we give two new and efficient protocols, a test for the depth of the circuit and an algorithm to estimate a lower bound to the number of T gates in the circuit. With some additional measurements, our algorithm learns a full description of states prepared by circuits with low T-count.

Speech recognition · MoDELS · Transform · Performer · Language modeling

January 31, 2024

Exploring the limits of decoder-only models trained on public speech recognition corpora

Ankit Gupta,George Saon,Brian Kingsbury

The emergence of industrial-scale speech recognition (ASR) models such as Whisper and USM, trained on 1M hours of weakly labelled and 12M hours of audio-only proprietary data respectively, has led to a stronger need for large-scale public ASR corpora and competitive open source pipelines. Unlike these models, large language models are typically based on Transformer decoders, and it remains unclear if decoder-only models trained on public data alone can deliver competitive performance. In this work, we investigate factors such as the choice of training datasets and modeling components necessary for obtaining the best performance using public English ASR corpora alone. Our Decoder-Only Transformer for ASR (DOTA) model comprehensively outperforms the encoder-decoder open source replication of Whisper (OWSM) on nearly all English ASR benchmarks and outperforms Whisper large-v3 on 7 out of 15 test sets. We release our codebase and model checkpoints under a permissive license.

Estimation/Estimator · Mean · MoDELS · Optimizer · INFORMS

January 31, 2024

Penalized G-estimation for effect modifier selection in the structural nested mean models for repeated outcomes

Ajmery Jaman,Guanbo Wang,Ashkan Ertefaie,Michèle Bally,Renée Lévesque,Robert Platt,Mireille Schnitzer

Effect modification occurs when the impact of the treatment on an outcome varies based on the levels of other covariates known as effect modifiers. Modeling of these effect differences is important for etiological goals and for purposes of optimizing treatment. Structural nested mean models (SNMMs) are useful causal models for estimating the potentially heterogeneous effect of a time-varying exposure on the mean of an outcome in the presence of time-varying confounding. In longitudinal health studies, information on many demographic, behavioural, biological, and clinical covariates may be available, among which some might cause heterogeneous treatment effects. A data-driven approach for selecting the effect modifiers of an exposure may be necessary if these effect modifiers are a priori unknown and need to be identified. Although variable selection techniques are available in the context of estimating conditional average treatment effects using marginal structural models, or in the context of estimating optimal dynamic treatment regimens, all of these methods consider an outcome measured at a single point in time. In the context of an SNMM for repeated outcomes, we propose a doubly robust penalized G-estimator for the causal effect of a time-varying exposure with a simultaneous selection of effect modifiers and prove the oracle property of our estimator. We conduct a simulation study to evaluate the performance of the proposed estimator in finite samples and for verification of its double-robustness property. Our work is motivated by a study of hemodiafiltration for treating patients with end-stage renal disease at the Centre Hospitalier de l'Université de Montréal.

Image classification · Feedforward network · INTERACT · Networking · Feedforward

May 7, 2021

ResMLP: Feedforward networks for image classification with data-efficient training

Hugo Touvron,Piotr Bojanowski,Mathilde Caron,Matthieu Cord,Alaaeldin El-Nouby,Edouard Grave,Armand Joulin,Gabriel Synnaeve,Jakob Verbeek,Hervé Jégou

We present ResMLP, an architecture built entirely upon multi-layer perceptrons for image classification. It is a simple residual network that alternates (i) a linear layer in which image patches interact, independently and identically across channels, and (ii) a two-layer feed-forward network in which channels interact independently per patch. When trained with a modern training strategy using heavy data-augmentation and optionally distillation, it attains surprisingly good accuracy/complexity trade-offs on ImageNet. We will share our code based on the Timm library and pre-trained models.