
In this work, we take on the challenging task of building a single text-to-speech synthesis system that is capable of generating speech in over 7000 languages, many of which lack sufficient data for traditional TTS development. By leveraging a novel integration of massively multilingual pretraining and meta learning to approximate language representations, our approach enables zero-shot speech synthesis in languages without any available data. We validate our system's performance through objective measures and human evaluation across a diverse linguistic landscape. By releasing our code and models publicly, we aim to empower communities with limited linguistic resources and foster further innovation in the field of speech technology.

Related Content

Speech synthesis, also known as text-to-speech (TTS), converts arbitrary input text into natural, fluent speech. It draws on artificial intelligence, psychology, acoustics, linguistics, digital signal processing, computer science, and other disciplines, and is a frontier technology in information processing. As computing technology has advanced, speech synthesis has evolved from early formant synthesis, through waveform concatenation and statistical parametric synthesis, to hybrid approaches; the quality and naturalness of synthesized speech have improved markedly and can now meet the requirements of many specific applications. At present, speech synthesis is widely used in announcement systems in banks and hospitals, car navigation systems, automated call centers, and similar settings, yielding substantial economic benefits. In addition, with the proliferation of smartphones, MP3 players, PDAs, and other devices closely tied to daily life, its applications are extending into entertainment, language teaching, and rehabilitation therapy. Speech synthesis is influencing many aspects of everyday life.

In this paper, we tackle two challenges in multimodal learning for visual recognition: 1) missing modalities, which can occur during either training or testing in real-world situations; and 2) limited computation resources, which make it infeasible to fine-tune heavy transformer models. To this end, we propose to utilize prompt learning to mitigate these two challenges together. Specifically, our modality-missing-aware prompts can be plugged into multimodal transformers to handle general missing-modality cases, while requiring less than 1% of the learnable parameters needed to train the entire model. We further explore the effect of different prompt configurations and analyze the robustness to missing modality. Extensive experiments show the effectiveness of our prompt-learning framework, which improves performance under various missing-modality cases while alleviating the requirement of heavy model re-training. Code is available.

Pre-trained Language Models (PLMs), which are trained on large text corpora via self-supervised learning, have yielded promising performance on various tasks in Natural Language Processing (NLP). However, although PLMs with huge numbers of parameters can effectively capture rich knowledge from massive training text and benefit downstream tasks at the fine-tuning stage, they still have limitations such as poor reasoning ability due to the lack of external knowledge. Research has been dedicated to incorporating knowledge into PLMs to tackle these issues. In this paper, we present a comprehensive review of Knowledge-Enhanced Pre-trained Language Models (KE-PLMs) to provide a clear insight into this thriving field. We introduce appropriate taxonomies respectively for Natural Language Understanding (NLU) and Natural Language Generation (NLG) to highlight these two main tasks of NLP. For NLU, we divide the types of knowledge into four categories: linguistic knowledge, text knowledge, knowledge graph (KG), and rule knowledge. The KE-PLMs for NLG are categorized into KG-based and retrieval-based methods. Finally, we point out some promising future directions of KE-PLMs.

Molecular design and synthesis planning are two critical steps in the process of molecular discovery that we propose to formulate as a single shared task of conditional synthetic pathway generation. We report an amortized approach to generate synthetic pathways as a Markov decision process conditioned on a target molecular embedding. This approach allows us to conduct synthesis planning in a bottom-up manner and design synthesizable molecules by decoding from optimized conditional codes, demonstrating the potential to solve both problems of design and synthesis simultaneously. The approach leverages neural networks to probabilistically model the synthetic trees, one reaction step at a time, according to reactivity rules encoded in a discrete action space of reaction templates. We train these networks on hundreds of thousands of artificial pathways generated from a pool of purchasable compounds and a list of expert-curated templates. We validate our method with (a) the recovery of molecules using conditional generation, (b) the identification of synthesizable structural analogs, and (c) the optimization of molecular structures given oracle functions relevant to drug discovery.
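The pathway-generation idea above can be caricatured in a few lines: treat pathway construction as a Markov decision process whose discrete actions are reaction templates. The molecules-as-strings, the two toy templates, and the uniform-random "policy" below are all illustrative stand-ins for the paper's molecular embeddings, learned networks, and expert-curated template library.

```python
import random

# Schematic sketch: generate a synthetic pathway as an MDP where each
# step applies a reaction template (a discrete action) to the current
# molecule. Everything here is a toy stand-in, not the paper's method.
TEMPLATES = {
    "add_methyl": lambda m: m + "-CH3",
    "add_hydroxyl": lambda m: m + "-OH",
}

def rollout(start, steps, seed=0):
    rng = random.Random(seed)
    pathway, mol = [], start
    for _ in range(steps):
        action = rng.choice(sorted(TEMPLATES))  # a policy network's choice in the real system
        mol = TEMPLATES[action](mol)
        pathway.append((action, mol))
    return pathway

for action, mol in rollout("benzene", 2):
    print(action, "->", mol)
```

In the paper the action distribution is produced by neural networks conditioned on a target molecular embedding, which is what turns this random walk into goal-directed conditional generation.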

Federated learning is a new distributed machine learning framework, where a set of heterogeneous clients collaboratively train a model without sharing training data. In this work, we consider a practical and ubiquitous issue in federated learning: intermittent client availability, where the set of eligible clients may change during the training process. Such intermittent client availability significantly deteriorates the performance of the classical Federated Averaging algorithm (FedAvg for short). We propose a simple distributed non-convex optimization algorithm, called Federated Latest Averaging (FedLaAvg for short), which leverages the latest gradients of all clients, even when the clients are not available, to jointly update the global model in each iteration. Our theoretical analysis shows that FedLaAvg attains the convergence rate of $O(1/(N^{1/4} T^{1/2}))$, achieving a sublinear speedup with respect to the total number of clients. We implement and evaluate FedLaAvg with the CIFAR-10 dataset. The evaluation results demonstrate that FedLaAvg indeed reaches a sublinear speedup and achieves 4.23% higher test accuracy than FedAvg.
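The core trick of FedLaAvg, keeping a cached "latest gradient" per client and averaging over all clients even when some are unavailable, can be sketched on a toy problem. The 1-D quadratic objectives, availability probability, and hyperparameters below are illustrative assumptions, not the paper's setup.

```python
import random

# Minimal sketch of Federated Latest Averaging (FedLaAvg) on a toy
# problem: client i minimizes (w - c_i)^2, so its gradient is 2(w - c_i).
def fedlaavg(centers, rounds=200, lr=0.1, avail_prob=0.5, seed=0):
    rng = random.Random(seed)
    n = len(centers)
    w = 0.0
    latest = [0.0] * n  # cached latest gradient from each client
    for _ in range(rounds):
        for i in range(n):
            if rng.random() < avail_prob:           # client i is available this round
                latest[i] = 2.0 * (w - centers[i])  # refresh its gradient
        # update with the average of the *latest* gradients of ALL clients,
        # including stale ones from currently unavailable clients
        w -= lr * sum(latest) / n
    return w

w = fedlaavg([1.0, 2.0, 3.0])
print(w)  # converges near the mean of the client optima, 2.0
```

The contrast with FedAvg is that an unavailable client still contributes its most recent (stale) gradient instead of being dropped from the round, which is what restores convergence under intermittent availability.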

Incompleteness is a common problem for existing knowledge graphs (KGs), and KG completion, which aims to predict missing links between entities, is challenging. Most existing KG completion methods only consider the direct relation between nodes and ignore relation paths, which contain useful information for link prediction. Recently, a few methods have taken relation paths into consideration but pay less attention to the order of relations in paths, which is important for reasoning. In addition, these path-based models often ignore nonlinear contributions of path features to link prediction. To solve these problems, we propose a novel KG completion method named OPTransE. Instead of embedding both entities of a relation into the same latent space as in previous methods, we project the head entity and the tail entity of each relation into different spaces to guarantee the order of relations in the path. Meanwhile, we adopt a pooling strategy to extract nonlinear and complex features of different paths to further improve the performance of link prediction. Experimental results on two benchmark datasets show that the proposed model OPTransE performs better than state-of-the-art methods.
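The separate-spaces idea can be made concrete with a translation-style score in which the head and tail are run through different projection matrices before the usual residual is computed. The matrices, dimensions, and L1 distance below are simplifications for illustration, not the paper's exact formulation or training objective.

```python
# Illustrative sketch of triple scoring with separate projection spaces
# for head and tail entities, in the spirit of OPTransE.
def matvec(M, v):
    return [sum(m * x for m, x in zip(row, v)) for row in M]

def score(h, r, t, M_head, M_tail):
    # project head and tail into *different* spaces, then score the
    # triple by the translation residual |M_head @ h + r - M_tail @ t|_1
    ph, pt = matvec(M_head, h), matvec(M_tail, t)
    return sum(abs(a + b - c) for a, b, c in zip(ph, r, pt))

M_head = [[1, 0], [0, 1]]  # toy projections; learned in practice
M_tail = [[1, 0], [0, 1]]
h, r, t = [1, 2], [1, 1], [2, 3]
print(score(h, r, t, M_head, M_tail))  # 0: h + r lands exactly on t
```

Lower scores indicate more plausible triples; because `M_head` and `M_tail` differ in general, composing relations along a path becomes order-sensitive, which is the property the paper exploits.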

In this paper, we propose a deep reinforcement learning framework called GCOMB to learn algorithms that can solve combinatorial problems over large graphs. GCOMB mimics the greedy algorithm for the original problem and incrementally constructs a solution. The proposed framework utilizes a Graph Convolutional Network (GCN) to generate node embeddings that predict the potential solution nodes from the entire node set. These embeddings enable an efficient training process to learn the greedy policy via Q-learning. Through extensive evaluation on several real and synthetic datasets containing up to a million nodes, we establish that GCOMB is up to 41% better than the state of the art, up to seven times faster than the greedy algorithm, and robust and scalable to large dynamic networks.
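The greedy loop that GCOMB learns to imitate is simple to write down: repeatedly add the node with the highest predicted marginal value. In the sketch below, a hand-written coverage gain stands in for the learned GCN + Q-network score, shown on a toy max-coverage instance; the graph and budget are illustrative.

```python
# Greedy incremental construction over a graph, the loop GCOMB imitates.
# In GCOMB the marginal gain is predicted by a Q-network over GCN
# embeddings; here it is computed exactly for max-coverage.
def greedy_cover(adj, budget):
    covered, solution = set(), []
    for _ in range(budget):
        # pick the node whose closed neighborhood adds the most new coverage
        best = max(adj, key=lambda v: len((adj[v] | {v}) - covered))
        solution.append(best)
        covered |= adj[best] | {best}
    return solution, covered

adj = {0: {1, 2, 3}, 1: {0}, 2: {0}, 3: {0, 4}, 4: {3, 5}, 5: {4}}
sol, cov = greedy_cover(adj, budget=2)
print(sol, sorted(cov))  # picks the hub 0, then 4 to cover the rest
```

Replacing the exact gain with a learned predictor is what lets the method skip recomputing expensive marginal values on million-node graphs.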

We examine the problem of question answering over knowledge graphs, focusing on simple questions that can be answered by the lookup of a single fact. Adopting a straightforward decomposition of the problem into entity detection, entity linking, relation prediction, and evidence combination, we explore simple yet strong baselines. On the popular SimpleQuestions dataset, we find that basic LSTMs and GRUs plus a few heuristics yield accuracies that approach the state of the art, and techniques that do not use neural networks also perform reasonably well. These results show that gains from sophisticated deep learning techniques proposed in the literature are quite modest and that some previous models exhibit unnecessary complexity.
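The four-stage decomposition reads naturally as a pipeline. The toy sketch below wires the stages together over a two-fact KG; the dictionaries and keyword matching are hypothetical stand-ins for the paper's LSTM/GRU models and heuristics.

```python
# Toy sketch of the pipeline: entity detection, entity linking,
# relation prediction, and evidence combination over a single-fact KG.
KG = {("barack_obama", "born_in"): "honolulu",
      ("barack_obama", "spouse"): "michelle_obama"}
ALIASES = {"obama": "barack_obama", "barack obama": "barack_obama"}
RELATION_CUES = {"born": "born_in", "married": "spouse"}

def answer(question):
    q = question.lower().rstrip("?")
    # 1) entity detection + 2) linking: longest alias found in the question
    mention = max((a for a in ALIASES if a in q), key=len, default=None)
    entity = ALIASES.get(mention)
    # 3) relation prediction from lexical cues
    relation = next((r for cue, r in RELATION_CUES.items() if cue in q), None)
    # 4) evidence combination: look up the single supporting fact
    return KG.get((entity, relation))

print(answer("Where was Barack Obama born?"))
```

Each stage is independently replaceable, which is what makes it easy to swap a neural component for a heuristic and measure the marginal gain, the paper's central experiment.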

In this paper, we propose a novel multi-task learning architecture, which incorporates recent advances in attention mechanisms. Our approach, the Multi-Task Attention Network (MTAN), consists of a single shared network containing a global feature pool, together with task-specific soft-attention modules, which are trainable in an end-to-end manner. These attention modules allow for learning of task-specific features from the global pool, whilst simultaneously allowing for features to be shared across different tasks. The architecture can be built upon any feed-forward neural network, is simple to implement, and is parameter efficient. Experiments on the CityScapes dataset show that our method outperforms several baselines in both single-task and multi-task learning, and is also more robust to the various weighting schemes in the multi-task loss function. We further explore the effectiveness of our method through experiments over a range of task complexities, and show how our method scales well with task complexity compared to baselines.
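The mechanism of a task-specific soft-attention module over a shared pool can be shown in miniature: each task gates the shared features elementwise with a mask in (0, 1). The fixed weights below are illustrative numbers, not trained parameters, and the real MTAN applies such gates to convolutional feature maps rather than flat vectors.

```python
import math

# Toy sketch of task-specific soft attention over a shared feature
# pool, in the spirit of MTAN.
def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def task_features(shared, mask_logits):
    # soft attention: an elementwise gate in (0, 1) applied to the pool
    return [f * sigmoid(m) for f, m in zip(shared, mask_logits)]

shared = [0.5, -1.0, 2.0]     # global feature pool (shared network)
seg_mask = [4.0, -4.0, 0.0]   # one task mostly attends to feature 0
depth_mask = [-4.0, 4.0, 0.0] # another mostly attends to feature 1

print([round(v, 2) for v in task_features(shared, seg_mask)])  # → [0.49, -0.02, 1.0]
```

Because the gates are soft rather than hard selections, features can still be shared across tasks while each task emphasizes its own subset, which is the balance the architecture is designed to learn end-to-end.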

Object detection typically assumes that training and test data are drawn from an identical distribution, which, however, does not always hold in practice. Such a distribution mismatch leads to a significant performance drop. In this work, we aim to improve the cross-domain robustness of object detection. We tackle the domain shift on two levels: 1) the image-level shift, such as image style and illumination, and 2) the instance-level shift, such as object appearance and size. We build our approach on the recent state-of-the-art Faster R-CNN model, and design two domain adaptation components, on the image level and the instance level, to reduce the domain discrepancy. The two domain adaptation components are based on H-divergence theory, and are implemented by learning a domain classifier in an adversarial training manner. The domain classifiers on different levels are further reinforced with a consistency regularization to learn a domain-invariant region proposal network (RPN) in the Faster R-CNN model. We evaluate our newly proposed approach on multiple datasets, including Cityscapes, KITTI, and SIM10K. The results demonstrate the effectiveness of our proposed approach for robust object detection in various domain shift scenarios.

In this paper, we propose a jointly learned attention and recurrent neural network (RNN) model for multi-label classification. While approaches based on either component exist (e.g., for the task of image captioning), training such existing network architectures typically requires pre-defined label sequences. For multi-label classification, it would be desirable to have a robust inference process, so that prediction errors do not propagate and degrade performance. Our proposed model uniquely integrates attention and Long Short-Term Memory (LSTM) models, which not only addresses the above problem but also allows one to identify visual objects of interest with varying sizes without prior knowledge of a particular label ordering. More importantly, label co-occurrence information can be jointly exploited by our LSTM model. Finally, by advancing the technique of beam search, prediction of multiple labels can be efficiently achieved by our proposed network model.
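Beam search over label sequences can be sketched independently of the network: keep the top-k partial label sequences by log-probability and extend each with every candidate label until an end token wins. The `step_probs` function below is a hand-written stand-in for the attention-LSTM's per-step label distribution, and the labels are illustrative.

```python
import math

# Minimal beam-search sketch for sequential multi-label prediction.
LABELS = ["person", "dog", "car", "<end>"]

def step_probs(prefix):
    # toy stand-in for the LSTM: prefer labels not yet emitted, then stop
    remaining = [l for l in LABELS[:-1] if l not in prefix]
    probs = {l: 0.9 / max(len(remaining), 1) for l in remaining}
    probs["<end>"] = 1.0 - sum(probs.values())
    return probs

def beam_search(width=2, max_len=3):
    beams = [((), 0.0)]  # (label prefix, accumulated log-prob)
    for _ in range(max_len):
        candidates = []
        for prefix, lp in beams:
            if prefix and prefix[-1] == "<end>":
                candidates.append((prefix, lp))  # finished beam carries over
                continue
            for label, p in step_probs(prefix).items():
                candidates.append((prefix + (label,), lp + math.log(p)))
        beams = sorted(candidates, key=lambda b: b[1], reverse=True)[:width]
    return [b for b in beams[0][0] if b != "<end>"]

print(beam_search())
```

Because the search scores whole sequences rather than committing to one label at a time, an early low-confidence prediction can be recovered from, which is the robustness-to-error-propagation property the abstract emphasizes.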
