End-to-end text-to-speech (TTS) systems for European languages such as English and Spanish achieve state-of-the-art speech quality, prosody, and naturalness. End-to-end TTS for Indian languages, however, lags behind in quality. The main challenges are: 1) scarcity of quality training data; 2) low efficiency during training and inference; 3) slow convergence when the vocabulary is large. In this paper, we investigate fine-tuning an English-pretrained Tacotron2 model on limited Sanskrit data to synthesize natural-sounding Sanskrit speech in low-resource settings. Our experiments show encouraging results, achieving an overall MOS of 3.38 from 37 evaluators with good spoken knowledge of Sanskrit. This is a strong result given that only 2.5 hours of speech data were used.

Related content

Speech synthesis

Speech synthesis, also known as text-to-speech (TTS), converts arbitrary input text into natural, fluent speech output. It draws on artificial intelligence, psychology, acoustics, linguistics, digital signal processing, computer science, and other disciplines, and is a frontier technology in the field of information processing. As computing technology has advanced, speech synthesis has progressed from early formant synthesis to waveform concatenation synthesis and statistical parametric speech synthesis, and then to hybrid speech synthesis. The quality and naturalness of synthesized speech have improved markedly and can largely meet the needs of certain specific applications. Speech synthesis is now widely used in information announcement systems in banks and hospitals, car navigation systems, and automated call centers, where it has yielded substantial economic benefits. In addition, with the proliferation of smartphones, MP3 players, PDAs, and other media closely tied to daily life, applications of speech synthesis are gradually extending into entertainment, language teaching, rehabilitation therapy, and other fields. Speech synthesis can be said to be influencing every aspect of people's lives.

Decomposition · RNN · Networking · Neural Networks · Recurrent Neural Networks

February 9, 2023

Decomposing a Recurrent Neural Network into Modules for Enabling Reusability and Replacement

Sayem Mohammad Imtiaz,Fraol Batole,Astha Singh,Rangeet Pan,Breno Dantas Cruz,Hridesh Rajan

from arxiv, Accepted at the 45th International Conference on Software Engineering (ICSE 2023)

Can we take a recurrent neural network (RNN) trained to translate between languages and augment it to support a new natural language without retraining the model from scratch? Can we fix the faulty behavior of the RNN by replacing portions associated with the faulty behavior? Recent works on decomposing a fully connected neural network (FCNN) and convolutional neural network (CNN) into modules have shown the value of engineering deep models in this manner, which is standard in traditional SE but foreign for deep learning models. However, prior works focus on the image-based multiclass classification problems and cannot be applied to RNN due to (a) different layer structures, (b) loop structures, (c) different types of input-output architectures, and (d) usage of both nonlinear and logistic activation functions. In this work, we propose the first approach to decompose an RNN into modules. We study different types of RNNs, i.e., Vanilla, LSTM, and GRU. Further, we show how such RNN modules can be reused and replaced in various scenarios. We evaluate our approach against 5 canonical datasets (i.e., Math QA, Brown Corpus, Wiki-toxicity, Clinc OOS, and Tatoeba) and 4 model variants for each dataset. We found that decomposing a trained model has a small cost (Accuracy: -0.6%, BLEU score: +0.10%). Also, the decomposed modules can be reused and replaced without needing to retrain.

Learning · Active learning · Loop · Model evaluation · Unlabeled

February 9, 2023

Iterative Loop Learning Combining Self-Training and Active Learning for Domain Adaptive Semantic Segmentation

Licong Guan,Xue Yuan

from arxiv, 16 pages, 8 figures

Recently, self-training and active learning have been proposed to alleviate this problem. Self-training can improve model accuracy using massive unlabeled data, but with limited or imbalanced training data it generates noisy pseudo labels, and without human guidance the resulting models can be suboptimal. Active learning can select more effective data for human intervention, but it cannot improve model accuracy because the massive unlabeled data go unused. Moreover, when the domain difference is too large, the probability of querying suboptimal samples increases, raising annotation cost. This paper proposes an iterative loop learning method combining Self-Training and Active Learning (STAL) for domain adaptive semantic segmentation. The method first uses self-training on massive unlabeled data to improve model accuracy and provide a more accurate selection model for active learning. Then, using the sample selection strategy of active learning, manual intervention corrects the self-training. The loop is iterated to achieve the best performance with minimal labeling cost. Extensive experiments show that our method establishes state-of-the-art performance on the GTAV to Cityscapes and SYNTHIA to Cityscapes tasks, improving on the previous best method by 4.9% mIoU and 5.2% mIoU, respectively. The code is available at //github.com/licongguan/STAL.

Similarity · Semantic similarity · Normalized · Operation · Reducible

February 7, 2023

Impact of Combining Syntactic and Semantic Similarities on Patch Prioritization while using the Insertion Mutation Operators

Mohammed Raihan Ullah,Nazia Sultana Chowdhury,Fazle Mohammed Tawsif

Patch prioritization ranks candidate patches by their likelihood of being correct. Fixing ingredients that are more likely to be the fix for a bug share a high contextual similarity. A recent study shows that combining syntactic and semantic similarity to capture contextual similarity improves patch prioritization. In this study, we evaluate the impact of combining syntactic and semantic features on patch prioritization using the Insertion mutation operators, inspecting the results of different combinations of these features. As a pilot study, the approach uses genealogical similarity to measure semantic similarity, and normalized longest common subsequence, normalized edit distance, cosine similarity, and the Jaccard similarity index to capture syntactic similarity. It also considers Anti-Patterns to filter out incorrect plausible patches. Combining syntactic and semantic similarity can reduce the search space to a great extent, and the approach generates the fix for a bug before the incorrect plausible one. We evaluate the techniques on the IntroClassJava benchmark using Insertion mutation operators and successfully generate fixes for 6 bugs before the incorrect plausible one. Taken together with the previous study, the approach of combining syntactic and semantic similarity can solve a total of 25 bugs from the benchmark, which, to the best of our knowledge, is more than any other approach. The correctness of the generated fixes is further checked against the publicly available results of CapGen; for the generated fixes, the approach achieves a precision of 100%.
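As a rough illustration of the syntactic similarity measures named in this abstract, the sketch below computes a normalized longest common subsequence, a Jaccard index, and a cosine similarity over token lists. It is not the authors' implementation: the function names, the whitespace-style token lists, and the choice to normalize the LCS by the longer sequence length are all assumptions made for the example.

```python
import math
from collections import Counter


def lcs_length(a, b):
    """Length of the longest common subsequence, via row-by-row dynamic programming."""
    prev = [0] * (len(b) + 1)
    for item_a in a:
        curr = [0]
        for j, item_b in enumerate(b, start=1):
            if item_a == item_b:
                curr.append(prev[j - 1] + 1)
            else:
                curr.append(max(prev[j], curr[j - 1]))
        prev = curr
    return prev[-1]


def normalized_lcs(a, b):
    """LCS length divided by the longer sequence length (assumed normalization)."""
    longest = max(len(a), len(b))
    return lcs_length(a, b) / longest if longest else 1.0


def jaccard(tokens_a, tokens_b):
    """Size of the token-set intersection over the size of the union."""
    set_a, set_b = set(tokens_a), set(tokens_b)
    union = set_a | set_b
    return len(set_a & set_b) / len(union) if union else 1.0


def cosine(tokens_a, tokens_b):
    """Cosine similarity between token-count vectors."""
    count_a, count_b = Counter(tokens_a), Counter(tokens_b)
    dot = sum(count_a[t] * count_b[t] for t in count_a)
    norm_a = math.sqrt(sum(v * v for v in count_a.values()))
    norm_b = math.sqrt(sum(v * v for v in count_b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0
```

A prioritizer in this spirit could score each candidate ingredient against the code around the buggy location with these functions and sort candidates by a weighted sum; how the scores are weighted and combined with the semantic measure is not specified in the abstract.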

Language modeling · Learning · MoDELS · BLEURT · Performer

February 7, 2023

Learning Translation Quality Evaluation on Low Resource Languages from Large Language Models

Amirkeivan Mohtashami,Mauro Verzetti,Paul K. Rubenstein

Learned metrics such as BLEURT have in recent years become widely employed to evaluate the quality of machine translation systems. Training such metrics requires data which can be expensive and difficult to acquire, particularly for lower-resource languages. We show how knowledge can be distilled from Large Language Models (LLMs) to improve upon such learned metrics without requiring human annotators, by creating synthetic datasets which can be mixed into existing datasets, requiring only a corpus of text in the target language. We show that the performance of a BLEURT-like model on lower resource languages can be improved in this way.

Learning · MoDELS · Robustness · Few-shot learning · Associative memory

February 7, 2023

Hebbian and Gradient-based Plasticity Enables Robust Memory and Rapid Learning in RNNs

Yu Duan,Zhongfan Jia,Qian Li,Yi Zhong,Kaisheng Ma

from arxiv, Published as a conference paper at ICLR 2023

Rapidly learning from ongoing experiences and remembering past events with a flexible memory system are two core capacities of biological intelligence. While the underlying neural mechanisms are not fully understood, various lines of evidence indicate that synaptic plasticity plays a critical role in memory formation and fast learning. Inspired by these results, we equip Recurrent Neural Networks (RNNs) with plasticity rules to enable them to adapt their parameters according to ongoing experiences. In addition to the traditional local Hebbian plasticity, we propose a global, gradient-based plasticity rule, which allows the model to evolve towards its self-determined target. Our models show promising results on sequential and associative memory tasks, illustrating their ability to robustly form and retain memories. At the same time, these models can cope with many challenging few-shot learning problems. Comparing different plasticity rules under the same framework shows that Hebbian plasticity is well-suited for several memory and associative learning tasks; however, it is outperformed by gradient-based plasticity on few-shot regression tasks, which require the model to infer the underlying mapping. Code is available at //github.com/yuvenduan/PlasticRNNs.
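To make the idea of local Hebbian plasticity concrete, here is a minimal sketch of a generic outer-product Hebbian update used as a one-shot associative memory. It is a textbook-style rule, not the plasticity rule from the paper: the function names, the `lr` and `decay` parameters, and the plain list-of-lists weight matrix are assumptions for illustration.

```python
def hebbian_update(weights, pre, post, lr=1.0, decay=0.0):
    """Return new weights (1 - decay) * W + lr * outer(post, pre).

    Each weight changes using only the activity of the two units it connects,
    which is what makes the rule local.
    """
    return [
        [(1.0 - decay) * w + lr * post_i * pre_j for w, pre_j in zip(row, pre)]
        for row, post_i in zip(weights, post)
    ]


def recall(weights, pre):
    """Read out the stored association: a matrix-vector product W @ pre."""
    return [sum(w * x for w, x in zip(row, pre)) for row in weights]


# Store one key -> value association in a 2x3 weight matrix.
key = [1.0, 0.0, 0.0]
value = [0.0, 1.0]
memory = [[0.0, 0.0, 0.0], [0.0, 0.0, 0.0]]
memory = hebbian_update(memory, pre=key, post=value)

retrieved = recall(memory, key)                # recovers the stored value
unrelated = recall(memory, [0.0, 1.0, 0.0])    # orthogonal key retrieves nothing
```

A gradient-based rule of the kind the abstract contrasts with this would instead derive the weight change from the gradient of some internally computed objective; that variant is not sketched here because the abstract does not give its form.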

Speech recognition · Automatic speech recognition · Boosting (a way of accelerating model training) · Speech synthesis · Extensibility

February 5, 2023

PAMP: A unified framework boosting low resource automatic speech recognition

Zeping Min,Qian Ge,Zhong Li,Weinan E

We propose a novel text-to-speech (TTS) data augmentation framework for low-resource automatic speech recognition (ASR) tasks, named phoneme audio mix up (PAMP). The PAMP method is highly interpretable and can incorporate prior knowledge of pronunciation rules. Furthermore, PAMP can be easily deployed in almost any language, especially for low-resource ASR tasks. Extensive experiments demonstrate the effectiveness of PAMP on low-resource ASR tasks: we achieve a 10.84% character error rate (CER) on the Common Voice Cantonese ASR task, a relative improvement of about 30% over the previous state of the art, which was achieved by fine-tuning the wav2vec2 pretrained model.
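For readers unfamiliar with the metric quoted above, the sketch below shows how a character error rate and a relative improvement are conventionally computed. It assumes the standard definition of CER as character-level Levenshtein distance divided by the reference length; the function names are illustrative and the example strings are not taken from the paper.

```python
def edit_distance(ref, hyp):
    """Levenshtein distance: minimum insertions, deletions, and substitutions."""
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1]


def cer(reference, hypothesis):
    """Character error rate: edit distance over the number of reference characters."""
    if not reference:
        raise ValueError("reference must be non-empty")
    return edit_distance(reference, hypothesis) / len(reference)


def relative_improvement(baseline, new):
    """Fraction by which an error rate dropped relative to the baseline."""
    return (baseline - new) / baseline
```

For example, `cer("abcd", "abxd")` is 0.25 (one substitution over four reference characters). The relative improvement is measured against the baseline error rate, so it is a different quantity from the absolute difference in CER.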

Learning · Continuity · surge · Taxonomy · Knowledge

November 8, 2022

Pretraining in Deep Reinforcement Learning: A Survey

Zhihui Xie,Zichuan Lin,Junyou Li,Shuai Li,Deheng Ye

The past few years have seen rapid progress in combining reinforcement learning (RL) with deep learning. Various breakthroughs ranging from games to robotics have spurred interest in designing sophisticated RL algorithms and systems. However, the prevailing workflow in RL is to learn tabula rasa, which may incur computational inefficiency. This precludes continuous deployment of RL algorithms and potentially excludes researchers without large-scale computing resources. In many other areas of machine learning, the pretraining paradigm has been shown to be effective in acquiring transferable knowledge, which can be utilized for a variety of downstream tasks. Recently, there has been a surge of interest in pretraining for deep RL, with promising results. However, much of the research has been based on different experimental settings. Due to the nature of RL, pretraining in this field faces unique challenges and hence requires new design principles. In this survey, we seek to systematically review existing works in pretraining for deep reinforcement learning, provide a taxonomy of these methods, discuss each sub-field, and bring attention to open problems and future directions.

Learning · Neural Networks · Networking · Reducible · Networks

September 1, 2022

Learning with Differentiable Algorithms

Felix Petersen

from arxiv, PhD thesis (summa cum laude), University of Konstanz, 162 pages

Classic algorithms and machine learning systems like neural networks are both abundant in everyday life. While classic computer science algorithms are suitable for precise execution of exactly defined tasks such as finding the shortest path in a large graph, neural networks allow learning from data to predict the most likely answer in more complex tasks such as image classification, which cannot be reduced to an exact algorithm. To get the best of both worlds, this thesis explores combining both concepts leading to more robust, better performing, more interpretable, more computationally efficient, and more data efficient architectures. The thesis formalizes the idea of algorithmic supervision, which allows a neural network to learn from or in conjunction with an algorithm. When integrating an algorithm into a neural architecture, it is important that the algorithm is differentiable such that the architecture can be trained end-to-end and gradients can be propagated back through the algorithm in a meaningful way. To make algorithms differentiable, this thesis proposes a general method for continuously relaxing algorithms by perturbing variables and approximating the expectation value in closed form, i.e., without sampling. In addition, this thesis proposes differentiable algorithms, such as differentiable sorting networks, differentiable renderers, and differentiable logic gate networks. Finally, this thesis presents alternative training strategies for learning with algorithms.
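One simple instance of relaxing an algorithm by perturbing a variable and taking the expectation in closed form is the hard comparison `a > b`. If logistic noise with scale `beta` is added to `a`, the expected value of the indicator is exactly a sigmoid of `(a - b) / beta`, so no sampling is needed. The sketch below uses that identity to build a differentiable `min`; the function names and the choice of logistic noise are assumptions for illustration, not the thesis's general method.

```python
import math


def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def relaxed_greater(a, b, beta=1.0):
    """Closed-form E[1{a + eps > b}] for eps ~ Logistic(0, beta)."""
    return sigmoid((a - b) / beta)


def soft_min(a, b, beta=1.0):
    """Smooth stand-in for min(a, b): a convex combination of a and b.

    The weight on `a` is the relaxed probability that `a` is the smaller value.
    As beta approaches 0 this approaches the hard minimum.
    """
    p_a_smaller = relaxed_greater(b, a, beta)
    return p_a_smaller * a + (1.0 - p_a_smaller) * b
```

Because `soft_min` is a smooth function of both inputs, gradients flow to `a` and `b` even when one is clearly larger, which a hard `min` does not allow. Compare-and-swap steps built from a relaxed `min` and `max` are one way a sorting network can be made differentiable.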

MoDELS · Learned · SOTA · Continuity · Deep learning

November 10, 2021

A Survey on Green Deep Learning

Jingjing Xu,Wangchunshu Zhou,Zhiyi Fu,Hao Zhou,Lei Li

In recent years, larger and deeper models have been springing up and continuously pushing state-of-the-art (SOTA) results across fields such as natural language processing (NLP) and computer vision (CV). Despite these promising results, the computation required by SOTA models has increased at an exponential rate. Massive computation not only has a surprisingly large carbon footprint but also harms research inclusiveness and deployment in real-world applications. Green deep learning is an increasingly active research field that calls on researchers to pay attention to energy usage and carbon emissions during model training and inference. The goal is to yield novel results with lightweight and efficient technologies. Many technologies can be used to achieve this goal, such as model compression and knowledge distillation. This paper presents a systematic review of the development of Green deep learning technologies. We classify these approaches into four categories: (1) compact networks, (2) energy-efficient training strategies, (3) energy-efficient inference approaches, and (4) efficient data usage. For each category, we discuss the progress that has been achieved and the unresolved challenges.

Speech synthesis · AIM · Learned · Intelligibility · Robustness

June 30, 2021

A Survey on Neural Speech Synthesis

Xu Tan,Tao Qin,Frank Soong,Tie-Yan Liu

from arxiv, A comprehensive survey on TTS, 63 pages, 18 tables, 7 figures, 450 references

Text to speech (TTS), or speech synthesis, which aims to synthesize intelligible and natural speech given text, is a hot research topic in the speech, language, and machine learning communities and has broad applications in industry. With the development of deep learning and artificial intelligence, neural network-based TTS has significantly improved the quality of synthesized speech in recent years. In this paper, we conduct a comprehensive survey on neural TTS, aiming to provide a good understanding of current research and future trends. We focus on the key components in neural TTS, including text analysis, acoustic models, and vocoders, and on several advanced topics, including fast TTS, low-resource TTS, robust TTS, expressive TTS, and adaptive TTS. We further summarize resources related to TTS (e.g., datasets, open-source implementations) and discuss future research directions. This survey can serve both academic researchers and industry practitioners working on TTS.