
Performer · Dataset augmentation · Reducible · Speech recognition · Automatic speech recognition

November 12, 2021

TS-RIR: Translated synthetic room impulse responses for speech augmentation

Anton Ratnarajah,Zhenyu Tang,Dinesh Manocha

from arxiv, Accepted to IEEE ASRU 2021. Source code is available at //github.com/GAMMA-UMD/TS-RIR

We present a method for improving the quality of synthetic room impulse responses for far-field speech recognition. We bridge the gap between the fidelity of synthetic room impulse responses (RIRs) and the real room impulse responses using our novel TS-RIRGAN architecture. Given a synthetic RIR in the form of raw audio, we use TS-RIRGAN to translate it into a real RIR. We also perform real-world sub-band room equalization on the translated synthetic RIR. Our overall approach improves the quality of synthetic RIRs by compensating low-frequency wave effects, similar to those in real RIRs. We evaluate the performance of improved synthetic RIRs on a far-field speech dataset augmented by convolving the LibriSpeech clean speech dataset [1] with RIRs and adding background noise. We show that far-field speech augmented using our improved synthetic RIRs reduces the word error rate by up to 19.9% in the Kaldi far-field automatic speech recognition benchmark [2].
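The augmentation step described in the abstract (convolve clean speech with an RIR, then add background noise) can be sketched as follows. This is a minimal illustration, not the authors' Kaldi-based pipeline: the function name `augment_far_field`, the synthetic signals, and the 10 dB signal-to-noise ratio are all assumptions made for the example.

```python
import numpy as np

def augment_far_field(clean, rir, noise, snr_db):
    """Convolve clean speech with an RIR and add noise at a target SNR in dB.

    Hypothetical helper for illustration; assumes non-silent speech and noise.
    """
    # Reverberate the speech, then trim back to the original length.
    reverberant = np.convolve(clean, rir)[: len(clean)]
    # Repeat or trim the noise so it covers the whole utterance.
    reps = int(np.ceil(len(reverberant) / len(noise)))
    noise = np.tile(noise, reps)[: len(reverberant)]
    speech_power = np.mean(reverberant ** 2)
    noise_power = np.mean(noise ** 2)
    # Scale the noise so that speech_power / scaled_noise_power matches snr_db.
    scale = np.sqrt(speech_power / (noise_power * 10 ** (snr_db / 10)))
    return reverberant + scale * noise

# Stand-in signals: white noise as "speech", a decaying random tail as the RIR.
rng = np.random.default_rng(0)
clean = rng.standard_normal(16000)
rir = np.exp(-np.arange(800) / 100.0) * rng.standard_normal(800)
noise = rng.standard_normal(4000)
far_field = augment_far_field(clean, rir, noise, snr_db=10.0)
```

In practice the RIR would be a measured or (translated) synthetic response and the noise a recorded background clip; only the convolve-then-mix structure is taken from the abstract.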

Related content

Performer

Data augmentation · Speech recognition · Hidden units · Scenario · Error rate

January 14, 2022

Investigation of Data Augmentation Techniques for Disordered Speech Recognition

Mengzhe Geng,Xurong Xie,Shansong Liu,Jianwei Yu,Shoukang Hu,Xunying Liu,Helen Meng

from arxiv, Proceedings of INTERSPEECH 2020

Disordered speech recognition is a highly challenging task. The underlying neuro-motor conditions of people with speech disorders, often compounded with co-occurring physical disabilities, lead to the difficulty in collecting large quantities of speech required for system development. This paper investigates a set of data augmentation techniques for disordered speech recognition, including vocal tract length perturbation (VTLP), tempo perturbation and speed perturbation. Both normal and disordered speech were exploited in the augmentation process. Variability among impaired speakers in both the original and augmented data was modeled using learning hidden unit contributions (LHUC) based speaker adaptive training. The final speaker adapted system constructed using the UASpeech corpus and the best augmentation approach based on speed perturbation produced up to 2.92% absolute (9.3% relative) word error rate (WER) reduction over the baseline system without data augmentation, and gave an overall WER of 26.37% on the test set containing 16 dysarthric speakers.
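Of the techniques listed, speed perturbation is the simplest to illustrate: the waveform is resampled so that it plays faster or slower, which changes both duration and pitch (tempo perturbation, by contrast, changes duration while preserving pitch). The sketch below uses plain linear interpolation as a crude stand-in for proper band-limited resampling; the function name `speed_perturb` and the factors 0.9, 1.0 and 1.1 are assumptions for the example, not values taken from the paper.

```python
import numpy as np

def speed_perturb(signal, factor):
    """Resample `signal` so it plays `factor` times faster at the same sample rate.

    Hypothetical helper; linear interpolation is used only for brevity.
    """
    n_out = int(round(len(signal) / factor))
    # Fractional positions in the original signal to read from.
    src_positions = np.arange(n_out) * factor
    # np.interp clamps positions past the last sample to the final value.
    return np.interp(src_positions, np.arange(len(signal)), signal)

sr = 16000
t = np.arange(sr) / sr
tone = np.sin(2 * np.pi * 220.0 * t)  # one second of a 220 Hz tone
augmented = [speed_perturb(tone, f) for f in (0.9, 1.0, 1.1)]
lengths = [len(a) for a in augmented]
```

A factor below 1 lengthens the utterance and lowers its pitch; a factor above 1 shortens it and raises the pitch.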

Speech Com · Speech translation · Creative Commons · Unlabeled · Dataset

January 13, 2022

Speech Resources in the Tamasheq Language

Marcely Zanon Boito,Fethi Bougares,Florentin Barbier,Souhir Gahbiche,Loïc Barrault,Mickael Rouvier,Yannick Estève

from arxiv, Submitted to LREC 2022

In this paper we present two datasets for Tamasheq, a developing language mainly spoken in Mali and Niger. These two datasets were made available for the IWSLT 2022 low-resource speech translation track, and they consist of collections of radio recordings from the Studio Kalangou (Niger) and Studio Tamani (Mali) daily broadcast news. We share (i) a massive amount of unlabeled audio data (671 hours) in five languages: French from Niger, Fulfulde, Hausa, Tamasheq and Zarma, and (ii) a smaller parallel corpus of audio recordings (17 hours) in Tamasheq, with utterance-level translations in the French language. All this data is shared under the Creative Commons BY-NC-ND 3.0 license. We hope these resources will inspire the speech community to develop and benchmark models using the Tamasheq language.

entity · Named entity recognition · MoDELS · SOTA · Unsupervised

November 22, 2019

Zero-Resource Cross-Lingual Named Entity Recognition

M Saiful Bari,Shafiq Joty,Prathyusha Jwalapuram

Recently, neural methods have achieved state-of-the-art (SOTA) results in Named Entity Recognition (NER) tasks for many languages without the need for manually crafted features. However, these models still require manually annotated training data, which is not available for many languages. In this paper, we propose an unsupervised cross-lingual NER model that can transfer NER knowledge from one language to another in a completely unsupervised way without relying on any bilingual dictionary or parallel data. Our model achieves this through word-level adversarial learning and augmented fine-tuning with parameter sharing and feature augmentation. Experiments on five different languages demonstrate the effectiveness of our approach, outperforming existing models by a good margin and setting a new SOTA for each language pair.

Language modeling · state-of-the-art · SimPLe · Data augmentation · Speech recognition

April 18, 2019

SpecAugment: A Simple Data Augmentation Method for Automatic Speech Recognition

Daniel S. Park,William Chan,Yu Zhang,Chung-Cheng Chiu,Barret Zoph,Ekin D. Cubuk,Quoc V. Le

from arxiv, 6 pages, 3 figures, 6 tables; submitted to Interspeech 2019

We present SpecAugment, a simple data augmentation method for speech recognition. SpecAugment is applied directly to the feature inputs of a neural network (i.e., filter bank coefficients). The augmentation policy consists of warping the features, masking blocks of frequency channels, and masking blocks of time steps. We apply SpecAugment on Listen, Attend and Spell networks for end-to-end speech recognition tasks. We achieve state-of-the-art performance on the LibriSpeech 960h and Switchboard 300h tasks, outperforming all prior work. On LibriSpeech, we achieve 6.8% WER on test-other without the use of a language model, and 5.8% WER with shallow fusion with a language model. This compares to the previous state-of-the-art hybrid system of 7.5% WER. For Switchboard, we achieve 7.2%/14.6% on the Switchboard/CallHome portion of the Hub5'00 test set without the use of a language model, and 6.8%/14.1% with shallow fusion, which compares to the previous state-of-the-art hybrid system at 8.3%/17.3% WER.
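The two masking operations in the policy can be sketched in a few lines. This is an illustrative simplification rather than the paper's implementation: it applies a single frequency mask and a single time mask, omits the warping step, fills masked regions with zeros, and uses made-up maximum widths. The function name `mask_spectrogram` and all parameter values are assumptions.

```python
import numpy as np

def mask_spectrogram(features, max_freq_width, max_time_width, rng):
    """Zero out one random block of frequency channels and one of time steps.

    `features` has shape (time_steps, freq_channels). Hypothetical helper;
    the maximum widths must not exceed the corresponding dimensions.
    """
    out = features.copy()
    n_time, n_freq = out.shape
    # Frequency mask: width f in [0, max_freq_width], start chosen to fit.
    f = int(rng.integers(0, max_freq_width + 1))
    f0 = int(rng.integers(0, n_freq - f + 1))
    out[:, f0:f0 + f] = 0.0
    # Time mask: width t in [0, max_time_width], start chosen to fit.
    t = int(rng.integers(0, max_time_width + 1))
    t0 = int(rng.integers(0, n_time - t + 1))
    out[t0:t0 + t, :] = 0.0
    return out

rng = np.random.default_rng(0)
log_mel = rng.standard_normal((100, 80))  # stand-in for filter bank features
masked = mask_spectrogram(log_mel, max_freq_width=27, max_time_width=40, rng=rng)
```

Because the masks are drawn anew for every call, each training pass sees a differently corrupted copy of the same utterance.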

Data augmentation · Image segmentation · state-of-the-art · Examples · Annotation

February 25, 2019

Data augmentation using learned transforms for one-shot medical image segmentation

Amy Zhao,Guha Balakrishnan,Frédo Durand,John V. Guttag,Adrian V. Dalca

from arxiv, 9 pages, CVPR 2019

Biomedical image segmentation is an important task in many medical applications. Segmentation methods based on convolutional neural networks attain state-of-the-art accuracy; however, they typically rely on supervised training with large labeled datasets. Labeling datasets of medical images requires significant expertise and time, and is infeasible at large scales. To tackle the lack of labeled data, researchers use techniques such as hand-engineered preprocessing steps, hand-tuned architectures, and data augmentation. However, these techniques involve costly engineering efforts, and are typically dataset-specific. We present an automated data augmentation method for medical images. We demonstrate our method on the task of segmenting magnetic resonance imaging (MRI) brain scans, focusing on the one-shot segmentation scenario -- a practical challenge in many medical applications. Our method requires only a single segmented scan, and leverages other unlabeled scans in a semi-supervised approach. We learn a model of transforms from the images, and use the model along with the labeled example to synthesize additional labeled training examples for supervised segmentation. Each transform is comprised of a spatial deformation field and an intensity change, enabling the synthesis of complex effects such as variations in anatomy and image acquisition procedures. Augmenting the training of a supervised segmenter with these new examples provides significant improvements over state-of-the-art methods for one-shot biomedical image segmentation. Our code is available at //github.com/xamyzhao/brainstorm.

Generative adversarial networks · Networking · Data augmentation · Mask · Generative methods

January 18, 2019

Red blood cell image generation for data augmentation using Conditional Generative Adversarial Networks

Oleksandr Bailo,DongShik Ham,Young Min Shin

In this paper, we describe how to apply image-to-image translation techniques to medical blood smear data to generate new data samples and meaningfully increase small datasets. Specifically, given the segmentation mask of the microscopy image, we are able to generate photorealistic images of blood cells which are further used alongside real data during the network training for segmentation and object detection tasks. This image data generation approach is based on conditional generative adversarial networks, which have proven capable of high-quality image synthesis. In addition to synthesizing blood images, we synthesize segmentation masks as well, which leads to a diverse variety of generated samples. The effectiveness of the technique is thoroughly analyzed and quantified through a number of experiments on a manually collected and annotated dataset of blood smears taken under a microscope.

Speech enhancement · Separable · Reducible · INFORMS · state-of-the-art

November 27, 2018

Improved Speech Enhancement with the Wave-U-Net

Craig Macartney,Tillman Weyde

from arxiv, 5 pages (including 1 for References), 1 figure, 2 tables

We study the use of the Wave-U-Net architecture for speech enhancement, a model introduced by Stoller et al. for the separation of music vocals and accompaniment. This end-to-end learning method for audio source separation operates directly in the time domain, permitting the integrated modelling of phase information and being able to take large temporal contexts into account. Our experiments show that the proposed method improves several metrics, namely PESQ, CSIG, CBAK, COVL and SSNR, over the state-of-the-art with respect to the speech enhancement task on the Voice Bank corpus (VCTK) dataset. We find that a reduced number of hidden layers is sufficient for speech enhancement in comparison to the original system designed for singing voice separation in music. We see this initial result as an encouraging signal to further explore speech enhancement in the time-domain, both as an end in itself and as a pre-processing step to speech recognition systems.

Source domain · Target domain · Cycle-GAN · Image segmentation · Unimodal

July 12, 2018

Sem-GAN: Semantically-Consistent Image-to-Image Translation

Anoop Cherian,Alan Sullivan

Unpaired image-to-image translation is the problem of mapping an image in the source domain to one in the target domain, without requiring corresponding image pairs. To ensure the translated images are realistically plausible, recent works, such as Cycle-GAN, demand this mapping to be invertible. While this requirement demonstrates promising results when the domains are unimodal, its performance is unpredictable in a multi-modal scenario such as in an image segmentation task. This is because invertibility does not necessarily enforce semantic correctness. To this end, we present a semantically-consistent GAN framework, dubbed Sem-GAN, in which the semantics are defined by the class identities of image segments in the source domain as produced by a semantic segmentation algorithm. Our proposed framework includes consistency constraints on the translation task that, together with the GAN loss and the cycle-constraints, enforce that the images when translated will inherit the appearances of the target domain, while (approximately) maintaining their identities from the source domain. We present experiments on several image-to-image translation tasks and demonstrate that Sem-GAN improves the quality of the translated images significantly, sometimes by more than 20% on the FCN score. Further, we show that semantic segmentation models, trained with synthetic images translated via Sem-GAN, lead to significantly better segmentation results than other variants.

Image segmentation · Cluster · Biased · Partition · Graph

March 27, 2018

Compassionately Conservative Balanced Cuts for Image Segmentation

Nathan D. Cahill,Tyler L. Hayes,Renee T. Meinhold,John F. Hamilton

from arxiv, Long version of paper accepted at CVPR 2018

The Normalized Cut (NCut) objective function, widely used in data clustering and image segmentation, quantifies the cost of graph partitioning in a way that biases clusters or segments that are balanced towards having lower values than unbalanced partitionings. However, this bias is so strong that it avoids any singleton partitions, even when vertices are very weakly connected to the rest of the graph. Motivated by the Bühler-Hein family of balanced cut costs, we propose the family of Compassionately Conservative Balanced (CCB) Cut costs, which are indexed by a parameter that can be used to strike a compromise between the desire to avoid too many singleton partitions and the notion that all partitions should be balanced. We show that CCB-Cut minimization can be relaxed into an orthogonally constrained $\ell_{\tau}$-minimization problem that coincides with the problem of computing Piecewise Flat Embeddings (PFE) for one particular index value, and we present an algorithm for solving the relaxed problem by iteratively minimizing a sequence of reweighted Rayleigh quotients (IRRQ). Using images from the BSDS500 database, we show that image segmentation based on CCB-Cut minimization provides better accuracy with respect to ground truth and greater variability in region size than NCut-based image segmentation.
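The bias against singleton partitions can be seen on a small graph. The sketch below uses the standard two-way NCut definition, cut(A, B)/vol(A) + cut(A, B)/vol(B), where vol is the sum of vertex degrees; it does not implement the CCB-Cut costs themselves. The toy graph (two tight triangles joined by a bridge of weight 0.1, plus an outlier attached with weight 0.01) and the helper names are assumptions made for the example.

```python
import numpy as np

def ncut(weights, in_a):
    """Normalized Cut cost of the two-way partition (A, complement of A)."""
    in_a = np.asarray(in_a, dtype=bool)
    degrees = weights.sum(axis=1)
    # Total weight of edges crossing the partition.
    cut = weights[in_a][:, ~in_a].sum()
    return cut / degrees[in_a].sum() + cut / degrees[~in_a].sum()

def add_edge(w, i, j, value):
    """Set a symmetric edge weight in the adjacency matrix."""
    w[i, j] = w[j, i] = value

# Vertices 0-2 and 3-5 form two triangles; vertex 6 is a weakly attached outlier.
w = np.zeros((7, 7))
for group in ((0, 1, 2), (3, 4, 5)):
    for i in group:
        for j in group:
            if i < j:
                add_edge(w, i, j, 1.0)
add_edge(w, 2, 3, 0.1)   # bridge between the two triangles
add_edge(w, 0, 6, 0.01)  # outlier barely connected to the graph

balanced = ncut(w, [True, True, True, False, False, False, True])
singleton = ncut(w, [False] * 6 + [True])
```

Splitting off the outlier severs far less edge weight (0.01 versus 0.1), yet its NCut cost exceeds 1 because the cut equals the singleton's entire volume, so NCut still prefers the balanced split.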

state-of-the-art · Speech recognition · MoDELS · Optimizer · Attention mechanism

January 18, 2018

State-of-the-art Speech Recognition With Sequence-to-Sequence Models

Chung-Cheng Chiu,Tara N. Sainath,Yonghui Wu,Rohit Prabhavalkar,Patrick Nguyen,Zhifeng Chen,Anjuli Kannan,Ron J. Weiss,Kanishka Rao,Ekaterina Gonina,Navdeep Jaitly,Bo Li,Jan Chorowski,Michiel Bacchiani

Attention-based encoder-decoder architectures such as Listen, Attend, and Spell (LAS), subsume the acoustic, pronunciation and language model components of a traditional automatic speech recognition (ASR) system into a single neural network. In our previous work, we have shown that such architectures are comparable to state-of-the-art ASR systems on dictation tasks, but it was not clear if such architectures would be practical for more challenging tasks such as voice search. In this work, we explore a variety of structural and optimization improvements to our LAS model which significantly improve performance. On the structural side, we show that word piece models can be used instead of graphemes. We introduce a multi-head attention architecture, which offers improvements over the commonly-used single-head attention. On the optimization side, we explore techniques such as synchronous training, scheduled sampling, label smoothing, and minimum word error rate optimization, which are all shown to improve accuracy. We present results with a unidirectional LSTM encoder for streaming recognition. On a 12,500 hour voice search task, we find that the proposed changes improve the WER of the LAS system from 9.2% to 5.6%, while the best conventional system achieves 6.7% WER. We also test both models on a dictation dataset, and our model provides 4.1% WER while the conventional system provides 5% WER.
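Word error rate, the metric quoted throughout these abstracts, is the word-level edit distance between reference and hypothesis divided by the number of reference words. The sketch below is a plain dynamic-programming version for illustration, not the scoring tool used in any of the papers; the function names and the example sentences are assumptions, and a non-empty reference is assumed. The second helper shows how an absolute improvement such as 9.2% to 5.6% translates into a relative reduction of roughly 39%.

```python
def word_error_rate(reference, hypothesis):
    """Word-level Levenshtein distance divided by the reference length."""
    ref, hyp = reference.split(), hypothesis.split()
    # prev[j] holds the edit distance between the processed reference
    # prefix and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]
        for j, h in enumerate(hyp, start=1):
            curr.append(min(
                prev[j] + 1,             # deletion of a reference word
                curr[j - 1] + 1,         # insertion of a hypothesis word
                prev[j - 1] + (r != h),  # substitution, or match at no cost
            ))
        prev = curr
    return prev[-1] / len(ref)

def relative_reduction(baseline, improved):
    """Fraction by which `improved` lowers `baseline`."""
    return (baseline - improved) / baseline

# One deletion ("on") and one substitution ("lights" -> "light") in 5 words.
wer = word_error_rate("turn on the kitchen lights", "turn the kitchen light")
rel = relative_reduction(9.2, 5.6)
```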
