Disordered speech recognition is a highly challenging task. The underlying neuro-motor conditions of people with speech disorders, often compounded by co-occurring physical disabilities, make it difficult to collect the large quantities of speech required for system development. This paper investigates a set of data augmentation techniques for disordered speech recognition, including vocal tract length perturbation (VTLP), tempo perturbation and speed perturbation. Both normal and disordered speech were exploited in the augmentation process. Variability among impaired speakers in both the original and augmented data was modeled using learning hidden unit contributions (LHUC) based speaker adaptive training. The final speaker adapted system, constructed using the UASpeech corpus and the best augmentation approach based on speed perturbation, produced up to 2.92% absolute (9.3% relative) word error rate (WER) reduction over the baseline system without data augmentation, and gave an overall WER of 26.37% on the test set containing 16 dysarthric speakers.
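As a rough illustration of speed perturbation, the sketch below resamples a waveform by linear interpolation so that it plays back faster or slower, changing duration and pitch together. The function name `speed_perturb`, the plain-list waveform representation and the factors 0.9 and 1.1 are assumptions made for illustration; the abstract does not describe the paper's actual augmentation pipeline.

```python
def speed_perturb(samples, factor):
    """Resample `samples` so playback is `factor` times faster.

    Hypothetical helper: linear interpolation stands in for the
    higher-quality resampler a real toolkit would use.
    """
    if factor <= 0:
        raise ValueError("factor must be positive")
    if not samples:
        return []
    # A faster signal (factor > 1) has fewer output samples.
    n_out = max(1, int(round(len(samples) / factor)))
    last = len(samples) - 1
    out = []
    for i in range(n_out):
        pos = i * factor              # fractional read position in the input
        lo = min(int(pos), last)      # sample at or before the position
        hi = min(lo + 1, last)        # next sample, clamped at the end
        frac = pos - lo
        out.append((1.0 - frac) * samples[lo] + frac * samples[hi])
    return out


if __name__ == "__main__":
    waveform = [float(i) for i in range(10)]
    # Assumed factors; each call yields one extra copy of the utterance.
    for factor in (0.9, 1.1):
        print(factor, len(speed_perturb(waveform, factor)))
```

Tempo perturbation differs in that it changes duration while leaving pitch unchanged, so it cannot be reproduced by plain resampling as above.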

Related content

Data augmentation

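In machine learning, data augmentation usually refers to methods (such as data distillation or balancing positive and negative samples) that improve the quality of a model's dataset by enriching the data.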

Variance · Machine Translation · Identically distributed · Dataset · Performer

April 19, 2022

Investigating Data Variance in Evaluations of Automatic Machine Translation Metrics

Jiannan Xiang,Huayang Li,Yahui Liu,Lemao Liu,Guoping Huang,Defu Lian,Shuming Shi

from arxiv, Findings of ACL 2022

Current practices in metric evaluation focus on a single dataset, e.g., the Newstest dataset in each year's WMT Metrics Shared Task. However, in this paper, we qualitatively and quantitatively show that the performance of metrics is sensitive to data. The ranking of metrics varies when the evaluation is conducted on different datasets. This paper then investigates two potential hypotheses that may be responsible for the issue of data variance: insignificant data points and deviation from the independent and identically distributed (i.i.d.) assumption. In conclusion, our findings suggest that when evaluating automatic translation metrics, researchers should take data variance into account and be cautious about claiming results on a single dataset, because they may be inconsistent with results on most other datasets.

Automatic speech recognition · Speech recognition · Loss function (machine learning) · Performer · Functional

April 19, 2022

An Investigation of Monotonic Transducers for Large-Scale Automatic Speech Recognition

Niko Moritz,Frank Seide,Duc Le,Jay Mahadeokar,Christian Fuegen

from arxiv, Submitted to Interspeech 2022

The two most popular loss functions for streaming end-to-end automatic speech recognition (ASR) are the RNN-Transducer (RNN-T) and the connectionist temporal classification (CTC) objectives. Both perform alignment-free training by marginalizing over all possible alignments, but they use different transition rules. Between these two loss types we can place the monotonic RNN-T (MonoRNN-T) and the recently proposed CTC-like Transducer (CTC-T), both of which can be realized using the graph temporal classification-transducer (GTC-T) loss function. Monotonic transducers have a few advantages. First, RNN-T can suffer from runaway hallucination, where a model keeps emitting non-blank symbols without advancing in time, often in an infinite loop. Second, monotonic transducers consume exactly one model score per time step and are therefore more compatible and unifiable with traditional FST-based hybrid ASR decoders. However, the MonoRNN-T has so far been found to have worse accuracy than RNN-T. It does not have to be that way, though: by regularizing the training (via joint LAS training or parameter initialization from RNN-T), both MonoRNN-T and CTC-T perform as well as, or better than, RNN-T. This is demonstrated for LibriSpeech and for a large-scale in-house data set.
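To make "marginalizing over all possible alignments" concrete, the following sketch computes the CTC likelihood of a short label sequence by brute-force enumeration of every frame-level path. It covers only the standard CTC transition rule (merge repeats, then drop blanks), not RNN-T, MonoRNN-T, CTC-T or the GTC-T loss. The function names and the toy probabilities are assumptions for illustration, and the enumeration is exponential in the number of frames; practical implementations use a forward-backward recursion instead.

```python
from itertools import product

BLANK = 0  # assumed index of the blank symbol


def ctc_collapse(path):
    """Apply the CTC rule: merge repeated symbols, then remove blanks."""
    out = []
    prev = None
    for symbol in path:
        if symbol != prev and symbol != BLANK:
            out.append(symbol)
        prev = symbol
    return out


def ctc_likelihood(frame_probs, target):
    """Sum the probability of every path that collapses to `target`.

    frame_probs[t][s] is the model's probability of symbol s at frame t.
    """
    n_symbols = len(frame_probs[0])
    target = list(target)
    total = 0.0
    for path in product(range(n_symbols), repeat=len(frame_probs)):
        if ctc_collapse(path) == target:
            prob = 1.0
            for t, symbol in enumerate(path):
                prob *= frame_probs[t][symbol]
            total += prob
    return total


if __name__ == "__main__":
    # Two frames, two symbols (blank and label 1), uniform scores.
    probs = [[0.5, 0.5], [0.5, 0.5]]
    print(ctc_likelihood(probs, [1]))
```

The training loss is the negative logarithm of this sum. Each path consumes exactly one model score per frame, which is the property the abstract attributes to monotonic transducers.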

Column · Automator · Lecture script · Decomposed · Separated

April 19, 2022

Core Box Image Recognition and its Improvement with a New Augmentation Technique

E. E. Baraboshkin,A. E. Demidov,D. M. Orlov,D. A. Koroteev

from arxiv, 20 pages, 16 figures, 1 table; the augmentation pipeline code samples are published as open-source code for TLA at //github.com/BEEugene/TemplateArtification/; continuation of the research from arXiv:1909.10227

Most methods for automated full-bore rock core image analysis (description, colour, properties distribution, etc.) are based on separate core column analyses. The core is usually imaged in a box because of the significant amount of time taken to get an image for each core column. The work presents an innovative method and algorithm for extracting core columns from core boxes. The conditions for core box imaging may differ tremendously. Such differences are disastrous for machine learning algorithms, which need a large dataset describing all possible data variations. Still, such images have some standard features: a box and core. Thus, we can emulate different environments with a unique augmentation described in this work, called template-like augmentation (TLA). The method is described and tested on various environments, and results are compared between an algorithm trained on 'traditional' data and one trained on a mix of traditional and TLA data. The algorithm trained with TLA data provides better metrics and can detect core on most new images, unlike the algorithm trained on data without TLA. The algorithm for core column extraction, implemented in an automated core description system, speeds up core box processing by a factor of 20.

contrastive · Contrastive learning · Learned · Node · Similarity

April 19, 2022

Supervised Contrastive Learning for Recommendation

Chun Yang

In this work, we aim to adapt contrastive learning to the recommendation setting so that it is better suited to the recommendation task. We propose a learning paradigm called supervised contrastive learning (SCL) to support the graph convolutional neural network. Specifically, during data preprocessing we calculate the similarity between different nodes on the user side and on the item side separately. Then, when applying contrastive learning, not only the augmented views but also a certain number of similar samples are regarded as positive samples, which differs from SimCLR, which treats all other samples in a batch as negative samples. We apply SCL to the most advanced LightGCN. In addition, to account for the uncertainty of node interaction, we also propose a new data augmentation method called node replication. Empirical research and an ablation study on the Gowalla, Yelp2018 and Amazon-Book datasets demonstrate the effectiveness of SCL and node replication, which improve the accuracy of recommendations and robustness to interaction noise.
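The idea of treating several samples as positives for one anchor can be sketched with a contrastive loss that averages the log-probability over all positives. The abstract does not give the paper's loss, so the form below, which follows the commonly used multi-positive (SupCon-style) formulation with cosine similarity, is an assumption, as are the function names, the temperature value and the toy embeddings.

```python
import math


def cosine(u, v):
    """Cosine similarity of two non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def multi_positive_contrastive_loss(anchor, candidates, positive_idx,
                                    temperature=0.2):
    """Average negative log-probability of the positives for one anchor.

    `candidates` holds every other embedding in the batch; the entries
    listed in `positive_idx` (augmented views or similar nodes) count as
    positives, and all remaining candidates act as negatives.
    """
    logits = [cosine(anchor, c) / temperature for c in candidates]
    # Numerically stable log-sum-exp over all candidates.
    peak = max(logits)
    log_denominator = peak + math.log(sum(math.exp(x - peak) for x in logits))
    log_probs = [logits[i] - log_denominator for i in positive_idx]
    return -sum(log_probs) / len(positive_idx)


if __name__ == "__main__":
    anchor = [1.0, 0.0]
    batch = [[1.0, 0.0], [0.9, 0.1], [0.0, 1.0], [-1.0, 0.0]]
    # Candidates 0 and 1 play the role of an augmented view and a similar node.
    print(multi_positive_contrastive_loss(anchor, batch, [0, 1]))
```

With a single positive per anchor this reduces to the usual SimCLR-style objective, which is the contrast the abstract draws.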

Speech recognition · Training data · MoDELS · Scenario · Mask

April 18, 2022

Extracting Targeted Training Data from ASR Models, and How to Mitigate It

Ehsan Amid,Om Thakkar,Arun Narayanan,Rajiv Mathews,Françoise Beaufays

Recent work has designed methods to demonstrate that model updates in ASR training can leak potentially sensitive attributes of the utterances used in computing the updates. In this work, we design the first method to demonstrate information leakage about training data from trained ASR models. We design Noise Masking, a fill-in-the-blank style method for extracting targeted parts of training data from trained ASR models. We demonstrate the success of Noise Masking by using it in four settings for extracting names from the LibriSpeech dataset used for training a SOTA Conformer model. In particular, we show that we are able to extract the correct names from masked training utterances with 11.8% accuracy, while the model outputs some name from the train set 55.2% of the time. Further, we show that even in a setting that uses synthetic audio and partial transcripts from the test set, our method achieves 2.5% correct name accuracy (47.7% any name success rate). Lastly, we design Word Dropout, a data augmentation method that, when used in training along with MTR, provides utility comparable to the baseline while significantly mitigating extraction via Noise Masking across the four evaluated settings.

Performer · INFORMS · Learned · Marginalization · Trial

April 17, 2022

An Adaptive Task-Related Component Analysis Method for SSVEP recognition

Vangelis P. Oikonomou

from arxiv, 23 pages, 3 Figures, 6 Tables

Steady-state visual evoked potential (SSVEP) recognition methods that learn from the subject's calibration data can achieve very high performance in SSVEP-based brain-computer interfaces (BCIs); however, their performance deteriorates drastically if the calibration trials are insufficient. This study develops a new method to learn from limited calibration data, and it proposes and evaluates a novel adaptive data-driven spatial filtering approach for enhancing SSVEP detection. The spatial filter learned for each stimulus utilizes temporal information from the corresponding EEG trials. To introduce the temporal information into the overall procedure, a multitask learning approach based on the Bayesian framework is adopted. The performance of the proposed method was evaluated on two publicly available benchmark datasets, and the results demonstrated that our method outperforms competing methods by a significant margin.

Phoneme · Reducible · Data augmentation · Transfer learning · Continuity

April 16, 2022

STRATA: Word Boundaries & Phoneme Recognition From Continuous Urdu Speech using Transfer Learning, Attention, & Data Augmentation

Saad Naeem,Omer Beg

Phoneme recognition is a largely unsolved problem in NLP, especially for low-resource languages like Urdu. The systems that try to extract the phonemes from audio speech require hand-labeled phonetic transcriptions. This requires expert linguists to annotate speech data with its relevant phonetic representation which is both an expensive and a tedious task. In this paper, we propose STRATA, a framework for supervised phoneme recognition that overcomes the data scarcity issue for low resource languages using a seq2seq neural architecture integrated with transfer learning, attention mechanism, and data augmentation. STRATA employs transfer learning to reduce the network loss in half. It uses attention mechanism for word boundaries and frame alignment detection which further reduces the network loss by 4% and is able to identify the word boundaries with 92.2% accuracy. STRATA uses various data augmentation techniques to further reduce the loss by 1.5% and is more robust towards new signals both in terms of generalization and accuracy. STRATA is able to achieve a Phoneme Error Rate of 16.5% and improves upon the state of the art by 1.1% for TIMIT dataset (English) and 11.5% for CSaLT dataset (Urdu).
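For readers unfamiliar with the metric, phoneme error rate is conventionally computed as the edit distance between the recognized and reference phoneme sequences divided by the reference length. The sketch below follows that standard definition; the abstract does not state how STRATA scores its output, so the function names and the example phoneme labels are assumptions for illustration.

```python
def edit_distance(ref, hyp):
    """Levenshtein distance between two sequences (unit costs)."""
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        cur = [i]
        for j, h in enumerate(hyp, start=1):
            cur.append(min(
                prev[j] + 1,             # deletion of a reference phoneme
                cur[j - 1] + 1,          # insertion of a spurious phoneme
                prev[j - 1] + (r != h),  # substitution, or free match
            ))
        prev = cur
    return prev[-1]


def phoneme_error_rate(ref, hyp):
    """Errors per reference phoneme; can exceed 1.0 with many insertions."""
    if not ref:
        raise ValueError("reference must not be empty")
    return edit_distance(ref, hyp) / len(ref)


if __name__ == "__main__":
    reference = ["k", "ae", "t"]
    hypothesis = ["k", "ah", "t"]
    print(phoneme_error_rate(reference, hypothesis))
```

The same computation over word tokens gives the word error rate used in speech recognition evaluations.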

April 15, 2022

Decoding Neural Correlation of Language-Specific Imagined Speech using EEG Signals

Keon-Woo Lee,Dae-Hyeok Lee,Sung-Jin Kim,Seong-Whan Lee

from arxiv, Accepted in EMBC 2022

Speech impairments due to cerebral lesions and degenerative disorders can be devastating. For people with severe speech deficits, imagined speech in the brain-computer interface has been a promising hope for reconstructing the neural signals of speech production. However, studies in the EEG-based imagined speech domain still have some limitations due to high variability in spatial and temporal information and a low signal-to-noise ratio. In this paper, we investigated the neural signals of two groups of native speakers performing two tasks in different languages, English and Chinese. Our assumption was that English, a non-tonal and phonogram-based language, would show spectral differences in neural computation compared to Chinese, a tonal and ideogram-based language. The results showed a significant difference in the relative power spectral density between English and Chinese in specific frequency band groups. Also, the spatial evaluation of Chinese native speakers in the theta band was distinctive during the imagination task. Hence, this paper suggests the key spectral and spatial information of language-specific word imagination for decoding the neural signals of speech.

Data augmentation · Generalization theory · Moment · Normalized · surge

February 25, 2020

On Feature Normalization and Data Augmentation

Boyi Li,Felix Wu,Ser-Nam Lim,Serge Belongie,Kilian Q. Weinberger

Modern neural network training relies heavily on data augmentation for improved generalization. After the initial success of label-preserving augmentations, there has been a recent surge of interest in label-perturbing approaches, which combine features and labels across training samples to smooth the learned decision surface. In this paper, we propose a new augmentation method that leverages the first and second moments extracted and re-injected by feature normalization. We replace the moments of the learned features of one training image by those of another, and also interpolate the target labels. As our approach is fast, operates entirely in feature space, and mixes different signals than prior methods, one can effectively combine it with existing augmentation methods. We demonstrate its efficacy across benchmark data sets in computer vision, speech, and natural language processing, where it consistently improves the generalization performance of highly competitive baseline networks.
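The moment swap described above can be sketched on a single feature vector: normalize one sample's features with its own mean and standard deviation, re-inject the mean and standard deviation of another sample, and interpolate the two labels. This is a simplified illustration only. The flat-list features, the function names, the `eps` constant and the mixing weight `lam` are assumptions; the abstract does not say over which dimensions the moments are computed in the paper.

```python
import math


def moments(x, eps=1e-5):
    """Return the mean and (eps-stabilized) standard deviation of x."""
    mean = sum(x) / len(x)
    var = sum((v - mean) ** 2 for v in x) / len(x)
    return mean, math.sqrt(var + eps)


def moment_exchange(feat_a, feat_b, label_a, label_b, lam=0.9):
    """Give feat_a the first and second moments of feat_b.

    Labels are one-hot (or soft) vectors; `lam` weights label_a.
    """
    mean_a, std_a = moments(feat_a)
    mean_b, std_b = moments(feat_b)
    # Normalize with A's moments, then de-normalize with B's moments.
    mixed = [std_b * (v - mean_a) / std_a + mean_b for v in feat_a]
    label = [lam * la + (1.0 - lam) * lb for la, lb in zip(label_a, label_b)]
    return mixed, label


if __name__ == "__main__":
    features, target = moment_exchange(
        [1.0, 2.0, 3.0], [10.0, 20.0, 30.0], [1.0, 0.0], [0.0, 1.0])
    print(features, target)
```

Because the operation touches only normalization statistics, it can in principle be applied on top of input-space augmentations, which matches the abstract's point about combining it with existing methods.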

entity · Performer · Named entity recognition · state-of-the-art · Active learning

February 4, 2018

Deep Active Learning for Named Entity Recognition

Yanyao Shen,Hyokun Yun,Zachary C. Lipton,Yakov Kronrod,Animashree Anandkumar

Deep learning has yielded state-of-the-art performance on many natural language processing tasks including named entity recognition (NER). However, this typically requires large amounts of labeled data. In this work, we demonstrate that the amount of labeled training data can be drastically reduced when deep learning is combined with active learning. While active learning is sample-efficient, it can be computationally expensive since it requires iterative retraining. To speed this up, we introduce a lightweight architecture for NER, viz., the CNN-CNN-LSTM model consisting of convolutional character and word encoders and a long short term memory (LSTM) tag decoder. The model achieves nearly state-of-the-art performance on standard datasets for the task while being computationally much more efficient than best performing models. We carry out incremental active learning, during the training process, and are able to nearly match state-of-the-art performance with just 25% of the original training data.