
Speech Synthesis · MoDELS · Reducible · Performer · Neural Networks

March 13, 2024

EM-TTS: Efficiently Trained Low-Resource Mongolian Lightweight Text-to-Speech

Ziqi Liang,Haoxiang Shi,Jiawei Wang,Keda Lu

From arXiv. Accepted by the 27th IEEE International Conference on Computer Supported Cooperative Work in Design (IEEE CSCWD 2024). arXiv admin note: substantial text overlap with arXiv:2211.01948

Recently, deep learning-based Text-to-Speech (TTS) systems have achieved high-quality speech synthesis results. Recurrent neural networks (RNNs) have become a standard modeling technique for sequential data in TTS systems and are widely used. However, training a TTS model that includes RNN components requires powerful GPUs and takes a long time. In contrast, CNN-based sequence synthesis techniques can, thanks to their high parallelism, significantly reduce the parameter count and training time of a TTS model while maintaining acceptable performance, which alleviates these training costs. In this paper, we propose a lightweight TTS system based on deep convolutional neural networks: an end-to-end TTS model trained in two stages that does not employ any recurrent units. Our model consists of two stages: Text2Spectrum and SSRN. The former encodes phonemes into a coarse mel spectrogram, and the latter synthesizes the complete spectrum from the coarse mel spectrogram. To address the low-resource Mongolian setting, we also improve the robustness of our model through a series of data augmentations, such as noise suppression, time warping, frequency masking and time masking. Experiments show that, compared with mainstream TTS models, our model reduces training time and parameter count while preserving the quality and naturalness of the synthesized speech. We validate our method on the NCMMSC2022-MTTSC Challenge dataset, where it significantly reduces training time while maintaining a certain accuracy.
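The frequency-masking and time-masking augmentations named in the abstract can be illustrated with a small sketch. The abstract does not give the mask widths or how masks are sampled, so the function name `mask_spectrogram`, its parameters, and the toy spectrogram below are assumptions for illustration only, not the authors' implementation.

```python
import random

def mask_spectrogram(spec, max_freq_width=2, max_time_width=3, seed=None):
    """Return a copy of `spec` (a list of mel-bin rows, each a list of frames)
    with one frequency band and one time span set to zero.
    Widths and sampling scheme are illustrative assumptions."""
    rng = random.Random(seed)
    n_mels, n_frames = len(spec), len(spec[0])
    out = [row[:] for row in spec]  # copy so the input stays untouched

    # Frequency masking: zero a band of consecutive mel bins.
    f_width = rng.randint(0, min(max_freq_width, n_mels))
    f_start = rng.randint(0, n_mels - f_width)
    for m in range(f_start, f_start + f_width):
        out[m] = [0.0] * n_frames

    # Time masking: zero a span of consecutive frames in every bin.
    t_width = rng.randint(0, min(max_time_width, n_frames))
    t_start = rng.randint(0, n_frames - t_width)
    for row in out:
        for t in range(t_start, t_start + t_width):
            row[t] = 0.0
    return out

spec = [[1.0] * 10 for _ in range(8)]  # toy 8-bin x 10-frame "spectrogram"
augmented = mask_spectrogram(spec, seed=0)
```

Each call hides a different band and span, so the model sees many partially occluded variants of the same utterance, which is the usual motivation for this kind of augmentation when data is scarce.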

Related content

Speech Synthesis

Speech synthesis, also known as text-to-speech (TTS), converts arbitrary input text into natural, fluent speech output. It draws on artificial intelligence, psychology, acoustics, linguistics, digital signal processing, computer science and other disciplines, and is a frontier technology in the field of information processing. As computing technology has advanced, speech synthesis has evolved from early formant synthesis to waveform concatenation synthesis and statistical parametric speech synthesis, and then to hybrid speech synthesis. The quality and naturalness of synthesized speech have improved markedly and can largely meet the needs of certain applications. Today, speech synthesis is widely used in information announcement systems at banks and hospitals, car navigation systems, and automated call centers, and has yielded substantial economic benefits. In addition, with the proliferation of smartphones, MP3 players, PDAs and other devices closely tied to daily life, its applications are gradually extending into entertainment, language teaching, rehabilitation therapy and other fields. Speech synthesis can be said to be influencing every aspect of people's lives.

LIDAR · Robustness · Round · Point Cloud · INFORMS

April 25, 2024

COIN-LIO: Complementary Intensity-Augmented LiDAR Inertial Odometry

Patrick Pfreundschuh,Helen Oleynikova,Cesar Cadena,Roland Siegwart,Olov Andersson

We present COIN-LIO, a LiDAR Inertial Odometry pipeline that tightly couples information from LiDAR intensity with geometry-based point cloud registration. The focus of our work is to improve the robustness of LiDAR-inertial odometry in geometrically degenerate scenarios, like tunnels or flat fields. We project LiDAR intensity returns into an intensity image, and propose an image processing pipeline that produces filtered images with improved brightness consistency within the image as well as across different scenes. To effectively leverage intensity as an additional modality, we present a novel feature selection scheme that detects uninformative directions in the point cloud registration and explicitly selects patches with complementary image information. Photometric error minimization in the image patches is then fused with inertial measurements and point-to-plane registration in an iterated Extended Kalman Filter. The proposed approach improves accuracy and robustness on a public dataset. We additionally publish a new dataset, that captures five real-world environments in challenging, geometrically degenerate scenes. By using the additional photometric information, our approach shows drastically improved robustness against geometric degeneracy in environments where all compared baseline approaches fail.
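As a rough illustration of projecting LiDAR intensity returns into an intensity image, the sketch below uses a generic spherical projection (azimuth to column, elevation to row). The abstract does not specify the projection model, image size, field of view, or the brightness filtering, so `intensity_image` and all of its parameters are assumptions rather than the COIN-LIO pipeline.

```python
import math

def intensity_image(points, width=16, height=4,
                    fov_up=math.radians(15), fov_down=math.radians(-15)):
    """Project (x, y, z, intensity) returns into a height x width image.
    Generic spherical projection; sizes and field of view are assumed."""
    img = [[0.0] * width for _ in range(height)]
    for x, y, z, intensity in points:
        r = math.sqrt(x * x + y * y + z * z)
        if r == 0.0:
            continue  # skip degenerate returns at the sensor origin
        azimuth = math.atan2(y, x)      # horizontal angle in [-pi, pi]
        elevation = math.asin(z / r)    # vertical angle
        if not (fov_down <= elevation <= fov_up):
            continue  # outside the assumed vertical field of view
        col = int((azimuth + math.pi) / (2 * math.pi) * width) % width
        row = int((fov_up - elevation) / (fov_up - fov_down) * height)
        row = min(row, height - 1)
        img[row][col] = intensity       # last return wins in this sketch
    return img

# One return straight ahead and one straight up (outside the field of view).
img = intensity_image([(1.0, 0.0, 0.0, 0.7), (0.0, 0.0, 1.0, 0.9)])
```

Once intensity lives on a regular pixel grid, image-style operations such as filtering and patch selection become applicable, which is what the abstract builds on.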

Neural Networks · Automator · CASES · TOOLS · Networking

April 23, 2024

ITER: Iterative Neural Repair for Multi-Location Patches

He Ye,Martin Monperrus

Automated program repair (APR) has achieved promising results, especially using neural networks. Yet, the overwhelming majority of patches produced by APR tools are confined to one single location. When looking at the patches produced with neural repair, most of them fail to compile, while a few uncompilable ones go in the right direction. In both cases, the fundamental problem is to ignore the potential of partial patches. In this paper, we propose an iterative program repair paradigm called ITER founded on the concept of improving partial patches until they become plausible and correct. First, ITER iteratively improves partial single-location patches by fixing compilation errors and further refining the previously generated code. Second, ITER iteratively improves partial patches to construct multi-location patches, with fault localization re-execution. ITER is implemented for Java based on battle-proven deep neural networks and code representation. ITER is evaluated on 476 bugs from 10 open-source projects in Defects4J 2.0. ITER succeeds in repairing 15.5% of them, including 9 uniquely repaired multi-location bugs.
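The idea of building a multi-location patch by repeatedly improving a partial patch and re-running fault localization can be sketched as a simple loop. Everything here is a toy stand-in: `iterative_repair`, `localize`, and `propose_patch` are hypothetical names, and a "program" is just a list of integers in which negative entries play the role of bugs. ITER itself uses neural patch generation, compilation, and test execution, none of which is modeled.

```python
def iterative_repair(program, localize, propose_patch, max_iters=10):
    """Apply single-location patches one at a time, re-running fault
    localization after each, until no fault remains or the budget runs out.
    Hypothetical sketch; not the ITER implementation."""
    patched = list(program)
    history = []
    for _ in range(max_iters):
        faults = localize(patched)       # re-executed on every iteration
        if not faults:
            return patched, history      # nothing left failing
        loc = faults[0]
        patched[loc] = propose_patch(patched, loc)  # refine the partial patch
        history.append(loc)
    return None, history                 # budget exhausted without a fix

# Toy stand-ins (assumptions): a negative entry is a "bug".
def localize(prog):
    return [i for i, v in enumerate(prog) if v < 0]

def propose_patch(prog, loc):
    return abs(prog[loc])

fixed, steps = iterative_repair([1, -2, 3, -4], localize, propose_patch)
```

The point of the loop is that the patch after the first iteration is only partial, yet it is kept and extended rather than discarded.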

Performer · Extensibility · Diversity · MoDELS · INFORMS

April 22, 2024

UrbanCross: Enhancing Satellite Image-Text Retrieval with Cross-Domain Adaptation

Siru Zhong,Xixuan Hao,Yibo Yan,Ying Zhang,Yangqiu Song,Yuxuan Liang

Urbanization challenges underscore the necessity for effective satellite image-text retrieval methods to swiftly access specific information enriched with geographic semantics for urban applications. However, existing methods often overlook significant domain gaps across diverse urban landscapes, primarily focusing on enhancing retrieval performance within single domains. To tackle this issue, we present UrbanCross, a new framework for cross-domain satellite image-text retrieval. UrbanCross leverages a high-quality, cross-domain dataset enriched with extensive geo-tags from three countries to highlight domain diversity. It employs the Large Multimodal Model (LMM) for textual refinement and the Segment Anything Model (SAM) for visual augmentation, achieving a fine-grained alignment of images, segments and texts, yielding a 10% improvement in retrieval performance. Additionally, UrbanCross incorporates an adaptive curriculum-based source sampler and a weighted adversarial cross-domain fine-tuning module, progressively enhancing adaptability across various domains. Extensive experiments confirm UrbanCross's superior efficiency in retrieval and adaptation to new urban environments, demonstrating an average performance increase of 15% over its version without domain adaptation mechanisms, effectively bridging the domain gap.

Functional · Directed · Directed Acyclic Graph · Graph · DAG

April 22, 2024

MultiFun-DAG: Multivariate Functional Directed Acyclic Graph

Tian Lan,Ziyue Li,Junpeng Lin,Zhishuai Li,Lei Bai,Man Li,Fugee Tsung,Rui Zhao,Chen Zhang

Directed Acyclic Graphical (DAG) models efficiently formulate causal relationships in complex systems. Traditional DAGs assume nodes to be scalar variables, which characterizes complex systems in an oversimplified form. This paper considers nodes that can be multivariate functional data and thus proposes a multivariate functional DAG (MultiFun-DAG). It constructs a hidden bilinear multivariate function-to-function regression to describe the causal relationships between different nodes. An Expectation-Maximization (EM) algorithm is then used to learn the graph structure as a score-based algorithm with acyclic constraints. Theoretical properties are derived. Numerical studies and a case study from urban traffic congestion analysis are conducted to show MultiFun-DAG's effectiveness.
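To make the phrase "acyclic constraints" concrete, the sketch below evaluates a polynomial acyclicity function of the kind used in continuous-optimization structure learning: h(W) = tr((I + (W∘W)/d)^d) − d, which is zero exactly when the weighted adjacency matrix W describes a DAG. The abstract does not say which constraint MultiFun-DAG uses, so this particular form and the function name `acyclicity` are assumptions.

```python
def acyclicity(W):
    """h(W) = tr((I + (W∘W)/d)^d) - d for a d x d weighted adjacency matrix.
    Zero for a DAG, positive if any directed cycle exists.
    Assumed form; not necessarily the constraint used in the paper."""
    d = len(W)
    # M = I + elementwise square of W, scaled by 1/d
    M = [[(1.0 if i == j else 0.0) + W[i][j] ** 2 / d for j in range(d)]
         for i in range(d)]
    P = [[1.0 if i == j else 0.0 for j in range(d)] for i in range(d)]
    for _ in range(d):  # P = M^d by repeated multiplication
        P = [[sum(P[i][k] * M[k][j] for k in range(d)) for j in range(d)]
             for i in range(d)]
    return sum(P[i][i] for i in range(d)) - d

chain = [[0, 1, 0], [0, 0, 2], [0, 0, 0]]   # 0 -> 1 -> 2, acyclic
loop = [[0, 1], [1, 0]]                     # 0 <-> 1, a 2-cycle
h_chain, h_loop = acyclicity(chain), acyclicity(loop)
```

Because h is smooth in W, it can be added to a score function as a penalty or equality constraint, which is how score-based learners typically keep the estimated graph acyclic.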

Attention · Image Segmentation · INFORMS · Decoding · Mask

April 19, 2024

MARIS: Referring Image Segmentation via Mutual-Aware Attention Features

Mengxi Zhang,Yiming Liu,Xiangjun Yin,Huanjing Yue,Jingyu Yang

Referring image segmentation (RIS) aims to segment a particular region based on a language expression prompt. Existing methods incorporate linguistic features into visual features and obtain multi-modal features for mask decoding. However, these methods may segment the visually salient entity instead of the correct referring region, as the multi-modal features are dominated by the abundant visual context. In this paper, we propose MARIS, a referring image segmentation method that leverages the Segment Anything Model (SAM) and introduces a mutual-aware attention mechanism to enhance the cross-modal fusion via two parallel branches. Specifically, our mutual-aware attention mechanism consists of Vision-Guided Attention and Language-Guided Attention, which bidirectionally model the relationship between visual and linguistic features. Correspondingly, we design a Mask Decoder to enable explicit linguistic guidance for more consistent segmentation with the language expression. To this end, a multi-modal query token is proposed to integrate linguistic information and interact with visual information simultaneously. Extensive experiments on three benchmark datasets show that our method outperforms the state-of-the-art RIS methods. Our code will be publicly available.

Graph Transformer · Networking · Transformation · Graph · Identifiable

April 18, 2024

DST-GTN: Dynamic Spatio-Temporal Graph Transformer Network for Traffic Forecasting

Songtao Huang,Hongjin Song,Tianqi Jiang,Akbar Telikani,Jun Shen,Qingguo Zhou,Binbin Yong,Qiang Wu

Accurate traffic forecasting is essential for effective urban planning and congestion management. Deep learning (DL) approaches have achieved great success in traffic forecasting but still face challenges in capturing the intricacies of traffic dynamics. In this paper, we identify and address these challenges by emphasizing that spatial features are inherently dynamic and change over time. A novel in-depth feature representation, called Dynamic Spatio-Temporal (Dyn-ST) features, is introduced, which encapsulates spatial characteristics across varying times. Moreover, a Dynamic Spatio-Temporal Graph Transformer Network (DST-GTN) is proposed that captures Dyn-ST features and other dynamic adjacency relations between intersections. The DST-GTN can accurately model dynamic ST relationships between nodes and refine the representation of global and local ST characteristics by adopting adaptive weights in low-pass and all-pass filters, enabling the extraction of Dyn-ST features from traffic time-series data. In numerical experiments on public datasets, the DST-GTN achieves state-of-the-art performance on a range of traffic forecasting tasks and demonstrates enhanced stability.
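One simple reading of "adaptive weights in low-pass and all-pass filters" is a weighted mix of a smoothed graph signal and the unchanged signal. In the sketch below the low-pass filter is a neighbourhood mean with self-loops and the all-pass filter is the identity. The function `mix_filters`, the fixed weights, and the toy graph are assumptions, and in the paper the weights would be learned rather than set by hand.

```python
def mix_filters(adj, x, w_all=0.5, w_low=0.5):
    """Combine an all-pass filter (identity) with a low-pass filter
    (mean over each node's neighbourhood, self included).
    Illustrative assumption; not the DST-GTN layer."""
    n = len(adj)
    out = []
    for i in range(n):
        nbrs = [j for j in range(n) if adj[i][j] or j == i]
        low = sum(x[j] for j in nbrs) / len(nbrs)  # smoothed value
        out.append(w_all * x[i] + w_low * low)     # keep part of the raw value
    return out

# Path graph 0 - 1 - 2 with a spike of traffic at the middle node.
adj = [[0, 1, 0],
       [1, 0, 1],
       [0, 1, 0]]
filtered = mix_filters(adj, [0.0, 3.0, 0.0])
```

Raising `w_low` spreads the spike to its neighbours (a more global view), while raising `w_all` preserves the node's own reading (a more local view).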

Learning · Reducible · MoDELS · Deep Learning · Better

November 2, 2021

Large-Scale Deep Learning Optimizations: A Comprehensive Survey

Xiaoxin He,Fuzhao Xue,Xiaozhe Ren,Yang You

Deep learning has achieved promising results on a wide spectrum of AI applications. Larger datasets and models consistently yield better performance. However, they also require longer training time because of the additional computation and communication. In this survey, we aim to provide a clear sketch of the optimizations for large-scale deep learning with regard to model accuracy and model efficiency. We investigate the algorithms most commonly used for optimization, elaborate on the debated topic of the generalization gap that arises in large-batch training, and review the SOTA strategies for addressing communication overhead and reducing the memory footprint.

Category · Learning · MoDELS · Identifiable · Taxonomy

October 9, 2021

Deep Long-Tailed Learning: A Survey

Yifan Zhang,Bingyi Kang,Bryan Hooi,Shuicheng Yan,Jiashi Feng

Deep long-tailed learning, one of the most challenging problems in visual recognition, aims to train well-performing deep models from a large number of images that follow a long-tailed class distribution. In the last decade, deep learning has emerged as a powerful recognition model for learning high-quality image representations and has led to remarkable breakthroughs in generic visual recognition. However, long-tailed class imbalance, a common problem in practical visual recognition tasks, often limits the practicality of deep network based recognition models in real-world applications, since they can be easily biased towards dominant classes and perform poorly on tail classes. To address this problem, a large number of studies have been conducted in recent years, making promising progress in the field of deep long-tailed learning. Considering the rapid evolution of this field, this paper aims to provide a comprehensive survey on recent advances in deep long-tailed learning. To be specific, we group existing deep long-tailed learning studies into three main categories (i.e., class re-balancing, information augmentation and module improvement), and review these methods following this taxonomy in detail. Afterward, we empirically analyze several state-of-the-art methods by evaluating to what extent they address the issue of class imbalance via a newly proposed evaluation metric, i.e., relative accuracy. We conclude the survey by highlighting important applications of deep long-tailed learning and identifying several promising directions for future research.
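As a small illustration of the class re-balancing category, the sketch below computes inverse-frequency class weights, one common way to stop dominant classes from overwhelming a loss. The survey covers many re-balancing methods and the abstract names none in particular, so the choice of this scheme and the name `inverse_frequency_weights` are assumptions.

```python
from collections import Counter

def inverse_frequency_weights(labels):
    """Weight each class by N / (C * n_c), where N is the number of samples,
    C the number of classes, and n_c the count of class c.
    One illustrative re-balancing scheme among many."""
    counts = Counter(labels)
    n, c = len(labels), len(counts)
    return {cls: n / (c * k) for cls, k in counts.items()}

# Toy long-tailed label set: 90 head-class samples, 10 tail-class samples.
labels = ["head"] * 90 + ["tail"] * 10
weights = inverse_frequency_weights(labels)
```

With these weights every class contributes the same total weight to a weighted loss, so errors on the tail class are no longer drowned out by the head class.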

SSL · Graph · Learning · INFORMS · Performer

August 5, 2021

Graph Self-Supervised Learning: A Survey

Yixin Liu,Shirui Pan,Ming Jin,Chuan Zhou,Feng Xia,Philip S. Yu

From arXiv; 25 pages, 8 figures, 9 tables

Deep learning on graphs has attracted significant interest recently. However, most work has focused on (semi-)supervised learning, resulting in shortcomings including heavy label reliance, poor generalization, and weak robustness. To address these issues, self-supervised learning (SSL), which extracts informative knowledge through well-designed pretext tasks without relying on manual labels, has become a promising and trending learning paradigm for graph data. Unlike SSL in other domains such as computer vision and natural language processing, SSL on graphs has its own background, design ideas, and taxonomies. Under the umbrella of graph self-supervised learning, we present a timely and comprehensive review of the existing approaches that employ SSL techniques for graph data. We construct a unified framework that mathematically formalizes the paradigm of graph SSL. According to the objectives of the pretext tasks, we divide these approaches into four categories: generation-based, auxiliary property-based, contrast-based, and hybrid approaches. We further summarize the applications of graph SSL across various research fields, along with the commonly used datasets, evaluation benchmarks, performance comparisons and open-source code for graph SSL. Finally, we discuss the remaining challenges and potential future directions in this research field.

Directed · Attention Mechanism · Comprehensibility · Model Evaluation · Networking

November 20, 2017

DiSAN: Directional Self-Attention Network for RNN/CNN-Free Language Understanding

Tao Shen,Tianyi Zhou,Guodong Long,Jing Jiang,Shirui Pan,Chengqi Zhang

From arXiv; 10 pages, 8 figures; accepted at AAAI-18

Recurrent neural nets (RNN) and convolutional neural nets (CNN) are widely used on NLP tasks to capture the long-term and local dependencies, respectively. Attention mechanisms have recently attracted enormous interest due to their highly parallelizable computation, significantly less training time, and flexibility in modeling dependencies. We propose a novel attention mechanism in which the attention between elements from input sequence(s) is directional and multi-dimensional (i.e., feature-wise). A light-weight neural net, "Directional Self-Attention Network (DiSAN)", is then proposed to learn sentence embedding, based solely on the proposed attention without any RNN/CNN structure. DiSAN is only composed of a directional self-attention with temporal order encoded, followed by a multi-dimensional attention that compresses the sequence into a vector representation. Despite its simple form, DiSAN outperforms complicated RNN models on both prediction quality and time efficiency. It achieves the best test accuracy among all sentence encoding methods and improves the most recent best result by 1.02% on the Stanford Natural Language Inference (SNLI) dataset, and shows state-of-the-art test accuracy on the Stanford Sentiment Treebank (SST), Multi-Genre natural language inference (MultiNLI), Sentences Involving Compositional Knowledge (SICK), Customer Review, MPQA, TREC question-type classification and Subjectivity (SUBJ) datasets.
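The notion of directional attention, where temporal order is encoded by restricting which positions each token may attend to, can be sketched with a masked softmax. This simplified version uses scalar scores and lets each token attend to itself. The multi-dimensional (feature-wise) attention from the abstract is not modeled, and the function name and mask convention are assumptions rather than the DiSAN definition.

```python
import math

def directional_attention(scores, direction="forward"):
    """Row-wise softmax over `scores`, restricted by a directional mask.
    "forward": token i attends to positions j <= i;
    "backward": token i attends to positions j >= i.
    Simplified scalar sketch; mask convention is assumed."""
    n = len(scores)
    weights = []
    for i in range(n):
        if direction == "forward":
            allowed = [j for j in range(n) if j <= i]
        else:
            allowed = [j for j in range(n) if j >= i]
        m = max(scores[i][j] for j in allowed)  # subtract max for stability
        exps = {j: math.exp(scores[i][j] - m) for j in allowed}
        z = sum(exps.values())
        # Masked positions receive exactly zero attention weight.
        weights.append([exps.get(j, 0.0) / z for j in range(n)])
    return weights

uniform = [[0.0] * 3 for _ in range(3)]
fw = directional_attention(uniform, "forward")
bw = directional_attention(uniform, "backward")
```

Running the same scores through both masks yields two different weightings, which is how a purely attention-based encoder can still distinguish "what came before" from "what comes after" without recurrence.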
