亚洲男人的天堂2018av,欧美草比,久久久久久免费视频精选,国色天香在线看免费,久久久久亚洲av成人片仓井空

We present an expanded version of our previously released Kazakh text-to-speech (KazakhTTS) synthesis corpus. In the new KazakhTTS2 corpus, the overall size is increased from 93 hours to 271 hours, the number of speakers has risen from two to five (three females and two males), and the topic coverage is diversified with the help of new sources, including a book and Wikipedia articles. This corpus is necessary for building high-quality TTS systems for Kazakh, a Central Asian agglutinative language from the Turkic family, which presents several linguistic challenges. We describe the corpus construction process and provide the details of the training and evaluation procedures for the TTS system. Our experimental results indicate that the constructed corpus is sufficient to build robust TTS models for real-world applications, with a subjective mean opinion score of above 4.0 for all the five speakers. We believe that our corpus will facilitate speech and language research for Kazakh and other Turkic languages, which are widely considered to be low-resource due to the limited availability of free linguistic data. The constructed corpus, code, and pretrained models are publicly available in our GitHub repository.

相關內容

語(yu)(yu)(yu)(yu)(yu)(yu)音(yin)合(he)(he)(he)成(cheng)(cheng)(cheng)(cheng)(Speech Synthesis),也稱為(wei)文語(yu)(yu)(yu)(yu)(yu)(yu)轉換(Text-to-Speech, TTS,它是將任意的(de)(de)輸(shu)入文本轉換成(cheng)(cheng)(cheng)(cheng)自然流暢的(de)(de)語(yu)(yu)(yu)(yu)(yu)(yu)音(yin)輸(shu)出。語(yu)(yu)(yu)(yu)(yu)(yu)音(yin)合(he)(he)(he)成(cheng)(cheng)(cheng)(cheng)涉及到(dao)人工智(zhi)能、心(xin)(xin)理(li)學(xue)、聲學(xue)、語(yu)(yu)(yu)(yu)(yu)(yu)言學(xue)、數字信(xin)(xin)號處(chu)理(li)、計算機科學(xue)等(deng)多個學(xue)科技(ji)術,是信(xin)(xin)息(xi)處(chu)理(li)領域中(zhong)的(de)(de)一(yi)項(xiang)前沿技(ji)術。 隨著(zhu)計算機技(ji)術的(de)(de)不斷(duan)提(ti)高,語(yu)(yu)(yu)(yu)(yu)(yu)音(yin)合(he)(he)(he)成(cheng)(cheng)(cheng)(cheng)技(ji)術從早期(qi)的(de)(de)共振峰合(he)(he)(he)成(cheng)(cheng)(cheng)(cheng),逐(zhu)步發展(zhan)為(wei)波形拼接(jie)合(he)(he)(he)成(cheng)(cheng)(cheng)(cheng)和統(tong)計參數語(yu)(yu)(yu)(yu)(yu)(yu)音(yin)合(he)(he)(he)成(cheng)(cheng)(cheng)(cheng),再發展(zhan)到(dao)混合(he)(he)(he)語(yu)(yu)(yu)(yu)(yu)(yu)音(yin)合(he)(he)(he)成(cheng)(cheng)(cheng)(cheng);合(he)(he)(he)成(cheng)(cheng)(cheng)(cheng)語(yu)(yu)(yu)(yu)(yu)(yu)音(yin)的(de)(de)質量、自然度已經(jing)(jing)得(de)到(dao)明顯提(ti)高,基本能滿足一(yi)些特定場(chang)合(he)(he)(he)的(de)(de)應(ying)用需求。目(mu)前,語(yu)(yu)(yu)(yu)(yu)(yu)音(yin)合(he)(he)(he)成(cheng)(cheng)(cheng)(cheng)技(ji)術在(zai)銀行、醫(yi)院等(deng)的(de)(de)信(xin)(xin)息(xi)播報(bao)系(xi)統(tong)、汽車導航系(xi)統(tong)、自動應(ying)答呼叫(jiao)中(zhong)心(xin)(xin)等(deng)都(dou)有廣泛應(ying)用,取得(de)了巨大(da)的(de)(de)經(jing)(jing)濟效益(yi)。 另(ling)外,隨著(zhu)智(zhi)能手機、MP3、PDA 等(deng)與我們(men)生活(huo)密(mi)切(qie)相關的(de)(de)媒(mei)介的(de)(de)大(da)量涌現,語(yu)(yu)(yu)(yu)(yu)(yu)音(yin)合(he)(he)(he)成(cheng)(cheng)(cheng)(cheng)的(de)(de)應(ying)用也在(zai)逐(zhu)漸向(xiang)娛(yu)樂、語(yu)(yu)(yu)(yu)(yu)(yu)音(yin)教學(xue)、康復治(zhi)療等(deng)領域深(shen)入。可以說語(yu)(yu)(yu)(yu)(yu)(yu)音(yin)合(he)(he)(he)成(cheng)(cheng)(cheng)(cheng)正在(zai)影響著(zhu)人們(men)生活(huo)的(de)(de)方方面面。

Teaching morals is one of the most important purposes of storytelling. An essential ability for understanding and writing moral stories is bridging story plots and implied morals. Its challenges mainly lie in: (1) grasping knowledge about abstract concepts in morals, (2) capturing inter-event discourse relations in stories, and (3) aligning value preferences of stories and morals concerning good or bad behavior. In this paper, we propose two understanding tasks and two generation tasks to assess these abilities of machines. We present STORAL, a new dataset of Chinese and English human-written moral stories. We show the difficulty of the proposed tasks by testing various models with automatic and manual evaluation on STORAL. Furthermore, we present a retrieval-augmented algorithm that effectively exploits related concepts or events in training sets as additional guidance to improve performance on these tasks.

HeidelTime is one of the most widespread and successful tools for detecting temporal expressions in texts. Since HeidelTime's pattern matching system is based on regular expression, it can be extended in a convenient way. We present such an extension for the German resources of HeidelTime: HeidelTime-EXT . The extension has been brought about by means of observing false negatives within real world texts and various time banks. The gain in coverage is 2.7% or 8.5%, depending on the admitted degree of potential overgeneralization. We describe the development of HeidelTime-EXT, its evaluation on text samples from various genres, and share some linguistic observations. HeidelTime ext can be obtained from //github.com/texttechnologylab/heideltime.

When IP-packet processing is unconditionally carried out on behalf of an operating system kernel thread, processing systems can experience overload in high incoming traffic scenarios. This is especially worrying for embedded real-time devices controlling their physical environment in industrial IoT scenarios and automotive systems. We propose an embedded real-time aware IP stack adaption with an early demultiplexing scheme for incoming packets and subsequent per-flow aperiodic scheduling. By instrumenting existing embedded IP stacks, rigid prioritization with minimal latency is deployed without the need of further task resources. Simple mitigation techniques can be applied to individual flows, causing hardly measurable overhead while at the same time protecting the system from overload conditions. Our IP stack adaption is able to reduce the low-priority packet processing time by over 86% compared to an unmodified stack. The network subsystem can thereby remain active at a 7x higher general traffic load before disabling the receive IRQ as a last resort to assure deadlines.

Self-Supervised Learning (SSL) models have been successfully applied in various deep learning-based speech tasks, particularly those with a limited amount of data. However, the quality of SSL representations depends highly on the relatedness between the SSL training domain(s) and the target data domain. On the contrary, spectral feature (SF) extractors such as log Mel-filterbanks are hand-crafted non-learnable components, and could be more robust to domain shifts. The present work examines the assumption that combining non-learnable SF extractors to SSL models is an effective approach to low resource speech tasks. We propose a learnable and interpretable framework to combine SF and SSL representations. The proposed framework outperforms significantly both baseline and SSL models on Automatic Speech Recognition (ASR) and Speech Translation (ST) tasks on three low resource datasets. We additionally design a mixture of experts based combination model. This last model reveals that the relative contribution of SSL models over conventional SF extractors is very small in case of domain mismatch between SSL training set and the target language data.

Personalized dialogue systems explore the problem of generating responses that are consistent with the user's personality, which has raised much attention in recent years. Existing personalized dialogue systems have tried to extract user profiles from dialogue history to guide personalized response generation. Since the dialogue history is usually long and noisy, most existing methods truncate the dialogue history to model the user's personality. Such methods can generate some personalized responses, but a large part of dialogue history is wasted, leading to sub-optimal performance of personalized response generation. In this work, we propose to refine the user dialogue history on a large scale, based on which we can handle more dialogue history and obtain more abundant and accurate persona information. Specifically, we design an MSP model which consists of three personal information refiners and a personalized response generator. With these multi-level refiners, we can sparsely extract the most valuable information (tokens) from the dialogue history and leverage other similar users' data to enhance personalization. Experimental results on two real-world datasets demonstrate the superiority of our model in generating more informative and personalized responses.

A High-dimensional and sparse (HiDS) matrix is frequently encountered in a big data-related application like an e-commerce system or a social network services system. To perform highly accurate representation learning on it is of great significance owing to the great desire of extracting latent knowledge and patterns from it. Latent factor analysis (LFA), which represents an HiDS matrix by learning the low-rank embeddings based on its observed entries only, is one of the most effective and efficient approaches to this issue. However, most existing LFA-based models perform such embeddings on a HiDS matrix directly without exploiting its hidden graph structures, thereby resulting in accuracy loss. To address this issue, this paper proposes a graph-incorporated latent factor analysis (GLFA) model. It adopts two-fold ideas: 1) a graph is constructed for identifying the hidden high-order interaction (HOI) among nodes described by an HiDS matrix, and 2) a recurrent LFA structure is carefully designed with the incorporation of HOI, thereby improving the representa-tion learning ability of a resultant model. Experimental results on three real-world datasets demonstrate that GLFA outperforms six state-of-the-art models in predicting the missing data of an HiDS matrix, which evidently supports its strong representation learning ability to HiDS data.

Requirements engineering (RE) activities for Machine Learning (ML) are not well-established and researched in the literature. Many issues and challenges exist when specifying, designing, and developing ML-enabled systems. Adding more focus on RE for ML can help to develop more reliable ML-enabled systems. Based on insights collected from previous work and industrial experiences, we propose a catalogue of 45 concerns to be considered when specifying ML-enabled systems, covering five different perspectives we identified as relevant for such systems: objectives, user experience, infrastructure, model, and data. Examples of such concerns include the execution engine and telemetry for the infrastructure perspective, and explainability and reproducibility for the model perspective. We conducted a focus group session with eight software professionals with experience developing ML-enabled systems to validate the importance, quality and feasibility of using our catalogue. The feedback allowed us to improve the catalogue and confirmed its practical relevance. The main research contribution of this work consists in providing a validated set of concerns grouped into perspectives that can be used by requirements engineers to support the specification of ML-enabled systems.

In recent years, large pre-trained transformers have led to substantial gains in performance over traditional retrieval models and feedback approaches. However, these results are primarily based on the MS Marco/TREC Deep Learning Track setup, with its very particular setup, and our understanding of why and how these models work better is fragmented at best. We analyze effective BERT-based cross-encoders versus traditional BM25 ranking for the passage retrieval task where the largest gains have been observed, and investigate two main questions. On the one hand, what is similar? To what extent does the neural ranker already encompass the capacity of traditional rankers? Is the gain in performance due to a better ranking of the same documents (prioritizing precision)? On the other hand, what is different? Can it retrieve effectively documents missed by traditional systems (prioritizing recall)? We discover substantial differences in the notion of relevance identifying strengths and weaknesses of BERT that may inspire research for future improvement. Our results contribute to our understanding of (black-box) neural rankers relative to (well-understood) traditional rankers, help understand the particular experimental setting of MS-Marco-based test collections.

Images can convey rich semantics and induce various emotions in viewers. Recently, with the rapid advancement of emotional intelligence and the explosive growth of visual data, extensive research efforts have been dedicated to affective image content analysis (AICA). In this survey, we will comprehensively review the development of AICA in the recent two decades, especially focusing on the state-of-the-art methods with respect to three main challenges -- the affective gap, perception subjectivity, and label noise and absence. We begin with an introduction to the key emotion representation models that have been widely employed in AICA and description of available datasets for performing evaluation with quantitative comparison of label noise and dataset bias. We then summarize and compare the representative approaches on (1) emotion feature extraction, including both handcrafted and deep features, (2) learning methods on dominant emotion recognition, personalized emotion prediction, emotion distribution learning, and learning from noisy data or few labels, and (3) AICA based applications. Finally, we discuss some challenges and promising research directions in the future, such as image content and context understanding, group emotion clustering, and viewer-image interaction.

Sentiment analysis is a widely studied NLP task where the goal is to determine opinions, emotions, and evaluations of users towards a product, an entity or a service that they are reviewing. One of the biggest challenges for sentiment analysis is that it is highly language dependent. Word embeddings, sentiment lexicons, and even annotated data are language specific. Further, optimizing models for each language is very time consuming and labor intensive especially for recurrent neural network models. From a resource perspective, it is very challenging to collect data for different languages. In this paper, we look for an answer to the following research question: can a sentiment analysis model trained on a language be reused for sentiment analysis in other languages, Russian, Spanish, Turkish, and Dutch, where the data is more limited? Our goal is to build a single model in the language with the largest dataset available for the task, and reuse it for languages that have limited resources. For this purpose, we train a sentiment analysis model using recurrent neural networks with reviews in English. We then translate reviews in other languages and reuse this model to evaluate the sentiments. Experimental results show that our robust approach of single model trained on English reviews statistically significantly outperforms the baselines in several different languages.

北京阿比特科技有限公司