
This paper integrates graph-to-sequence modelling into an end-to-end text-to-speech framework to exploit the syntactic structure of the input text. Specifically, the input text is parsed by a dependency parsing module to form a syntactic graph. The syntactic graph is then encoded by a graph encoder to extract hidden syntactic information, which is concatenated with the phoneme embedding and fed to the alignment and flow-based decoding modules to generate the raw audio waveform. We evaluate the model on two languages, English and Mandarin, using single-speaker, few-shot target-speaker, and multi-speaker datasets, respectively. Experimental results show better prosodic consistency between the input text and the generated audio, higher scores in subjective prosodic evaluation, and the ability to perform voice conversion. In addition, the efficiency of the model is greatly improved through a custom AI-chip operator design, achieving a 5x speedup.
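
As a minimal sketch of the architecture the abstract describes, the snippet below encodes a dependency graph with one round of message passing and concatenates the resulting syntactic features with phoneme embeddings. The module, the placeholder adjacency, and the word-to-phoneme alignment are our own illustrative assumptions, not the paper's implementation.

```python
import torch
import torch.nn as nn

class SyntaxGraphEncoder(nn.Module):
    """One round of message passing over the dependency graph
    (hypothetical stand-in for the paper's graph encoder)."""
    def __init__(self, dim):
        super().__init__()
        self.lin = nn.Linear(dim, dim)

    def forward(self, x, adj):
        # x: (num_words, dim) word features; adj: normalized adjacency
        # of the dependency parse, shape (num_words, num_words).
        return torch.relu(self.lin(adj @ x))

dim, words, phonemes = 16, 5, 12
adj = torch.eye(words)                          # placeholder parse adjacency
word_feats = torch.randn(words, dim)
phoneme_emb = torch.randn(phonemes, dim)
word_of_phoneme = torch.randint(0, words, (phonemes,))  # assumed alignment

syntax = SyntaxGraphEncoder(dim)(word_feats, adj)   # (words, dim)
decoder_input = torch.cat([phoneme_emb, syntax[word_of_phoneme]], dim=-1)
print(decoder_input.shape)  # torch.Size([12, 32]) -> alignment/decoder input
```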

Related Content

Speech synthesis, also known as text-to-speech (TTS), converts arbitrary input text into natural, fluent speech. It draws on artificial intelligence, psychology, acoustics, linguistics, digital signal processing, computer science, and other disciplines, and is a frontier technology in the field of information processing. With the continuous advance of computing, speech synthesis has progressed from early formant synthesis to waveform-concatenation and statistical parametric synthesis, and on to hybrid approaches; the quality and naturalness of synthesized speech have improved markedly and can largely meet the needs of specific application scenarios. Today, speech synthesis is widely used in information announcement systems for banks and hospitals, in-car navigation systems, and automated call centers, yielding substantial economic benefits. Moreover, with the proliferation of smartphones, MP3 players, PDAs, and other media closely tied to everyday life, its applications are gradually extending into entertainment, language teaching, rehabilitation therapy, and other fields. It is fair to say that speech synthesis is touching every aspect of people's lives.

This paper introduces the Gaussian multi-Graphical Model, a model to construct sparse graph representations of matrix- and tensor-variate data. We generalize prior work in this area by simultaneously learning this representation across several tensors that share axes, which is necessary to allow the analysis of multimodal datasets such as those encountered in multi-omics. Our algorithm uses only a single eigendecomposition per axis, achieving an order of magnitude speedup over prior work in the ungeneralized case. This allows the use of our methodology on large multi-modal datasets such as single-cell multi-omics data, which was challenging with previous approaches. We validate our model on synthetic data and five real-world datasets.
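
To make the one-eigendecomposition-per-axis idea concrete, here is a toy sketch for matrix-variate data; the per-axis Gram matrices and all names are our simplification, not the paper's exact estimator.

```python
import numpy as np

# Toy matrix-variate data: n samples of a (p x q) matrix whose rows and
# columns each carry their own dependency structure.
rng = np.random.default_rng(0)
n, p, q = 100, 8, 6
X = rng.standard_normal((n, p, q))

# One Gram matrix per axis, averaging over the other axes -- a common
# first step in Kronecker-structured graphical models.
S_row = np.einsum('nij,nkj->ik', X, X) / (n * q)   # (p, p) row-axis Gram
S_col = np.einsum('nij,nik->jk', X, X) / (n * p)   # (q, q) column-axis Gram

# The abstract's key computational point: one eigendecomposition per axis.
row_evals, row_evecs = np.linalg.eigh(S_row)
col_evals, col_evecs = np.linalg.eigh(S_col)
# A sparse conditional-dependency graph per axis would then be read off a
# regularized inverse assembled from these eigenpairs.
```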

Simulating fluid dynamics is crucial to the design and development process, from simple valves to complex turbomachinery. Accurately solving the underlying physical equations is computationally expensive. Learning-based solvers that model interactions on meshes have therefore gained interest due to their promising speed-ups. However, it is unknown to what extent these models truly understand the underlying physical principles and can generalize rather than interpolate. Generalization is a key requirement for a general-purpose fluid simulator, which should adapt to different topologies, resolutions, or thermodynamic ranges. We propose SURF, a benchmark designed to test the generalization of learned graph-based fluid simulators. SURF comprises individual datasets and provides specific performance and generalization metrics for evaluating and comparing different models. We empirically demonstrate the applicability of SURF by thoroughly investigating two state-of-the-art graph-based models, yielding new insights into their generalization.
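
The abstract does not spell out SURF's metrics, so the sketch below only illustrates the general shape of a generalization measurement for a learned simulator: rollout error in the training regime versus an unseen regime. Both function names and the metric definition are hypothetical.

```python
import numpy as np

def rollout_rmse(pred, target):
    """RMSE over a simulated rollout (frames x nodes x channels)."""
    return float(np.sqrt(np.mean((pred - target) ** 2)))

def generalization_gap(err_in, err_out):
    """Relative error increase when moving from the training regime to an
    unseen topology/resolution/range (illustrative, not SURF's definition)."""
    return err_out / err_in - 1.0

rng = np.random.default_rng(0)
target = rng.standard_normal((50, 200, 3))        # fake ground-truth rollout
pred_in = target + 0.10 * rng.standard_normal(target.shape)
pred_out = target + 0.25 * rng.standard_normal(target.shape)
gap = generalization_gap(rollout_rmse(pred_in, target),
                         rollout_rmse(pred_out, target))
print(f"generalization gap: {gap:.2f}")           # ~1.5 here
```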

Diffusion models have recently dominated image synthesis tasks. However, the iterative denoising process is computationally expensive at inference time, making diffusion models less practical for low-latency and scalable real-world applications. Post-training quantization (PTQ) of diffusion models can significantly reduce the model size and accelerate the sampling process without re-training. Nonetheless, applying existing PTQ methods directly to low-bit diffusion models can significantly impair the quality of generated samples. Specifically, for each denoising step, quantization noise leads to deviations in the estimated mean and mismatches with the predetermined variance schedule. As the sampling process proceeds, the quantization noise may accumulate, resulting in a low signal-to-noise ratio (SNR) during the later denoising steps. To address these challenges, we propose a unified formulation for the quantization noise and the diffusion perturbed noise in the quantized denoising process. Specifically, we first disentangle the quantization noise into parts that are correlated and uncorrelated with respect to its full-precision counterpart. The correlated part can be easily corrected by estimating the correlation coefficient. For the uncorrelated part, we subtract the bias from the quantized results to correct the mean deviation and calibrate the denoising variance schedule to absorb the excess variance resulting from quantization. Moreover, we introduce a mixed-precision scheme that selects the optimal bitwidth for each denoising step. Extensive experiments demonstrate that our method outperforms previous post-training quantized diffusion models, with only a 0.06 increase in FID score compared to full-precision LDM-4 on ImageNet 256x256, while saving 19.9x bit operations. Code is available at //github.com/ziplab/PTQD.
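
A minimal sketch of the disentanglement step, assuming access to paired full-precision and quantized noise predictions on calibration data; PTQD calibrates these statistics per denoising step, which we omit here.

```python
import torch

def correct_quantization_noise(eps_fp, eps_q):
    """Split the quantization noise into a part correlated with the
    full-precision prediction and an uncorrelated residual, then correct
    the mean and report the excess variance (minimal sketch)."""
    x, y = eps_fp.flatten(), eps_q.flatten()
    k = (x * y).sum() / (x * x).sum()   # correlation coefficient (least squares)
    residual = y - k * x                # uncorrelated part
    bias = residual.mean()              # mean deviation to subtract
    extra_var = residual.var()          # to be absorbed by the variance schedule
    corrected = (eps_q - bias) / k      # undo correlated scaling and bias
    return corrected, float(extra_var)

eps = torch.randn(10_000)
eps_q = 0.9 * eps + 0.05 * torch.randn(10_000) + 0.02   # simulated quantization
fixed, extra = correct_quantization_noise(eps, eps_q)
print(fixed.mean().item(), extra)   # mean near 0, small excess variance
```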

ChipNeMo aims to explore the applications of large language models (LLMs) for industrial chip design. Rather than directly deploying off-the-shelf commercial or open-source LLMs, we adopt the following domain adaptation techniques: custom tokenizers, domain-adaptive continued pretraining, supervised fine-tuning (SFT) with domain-specific instructions, and domain-adapted retrieval models. We evaluate these methods on three selected LLM applications for chip design: an engineering assistant chatbot, EDA script generation, and bug summarization and analysis. Our results show that these domain adaptation techniques enable significant LLM performance improvements over general-purpose base models across the three evaluated applications, enabling up to a 5x model size reduction with similar or better performance on a range of design tasks. Our findings also indicate that a gap remains between our current results and ideal outcomes. We believe further investigation of domain-adapted LLM approaches will help close this gap in the future.
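
Purely as an illustration of the "custom tokenizer" ingredient: extending a generic tokenizer's vocabulary with domain terms and resizing the embedding table. ChipNeMo adapts its tokenizer differently; the gpt2 checkpoint and token list here are placeholders.

```python
from transformers import AutoTokenizer, AutoModelForCausalLM

# Placeholder checkpoint and token list; ChipNeMo trains an adapted
# tokenizer rather than merely appending tokens.
tokenizer = AutoTokenizer.from_pretrained("gpt2")
model = AutoModelForCausalLM.from_pretrained("gpt2")

domain_tokens = ["setup_slack", "hold_slack", "netlist", "DRC", "LVS"]
num_added = tokenizer.add_tokens(domain_tokens)
model.resize_token_embeddings(len(tokenizer))   # new embedding rows
print(f"added {num_added} domain tokens; vocab = {len(tokenizer)}")
# Domain-adaptive continued pretraining and SFT would then run on top of
# this extended vocabulary.
```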

Recent advances have expanded the capabilities of large language models (LLMs) in zero-shot image-to-text generation and understanding by integrating multi-modal inputs. However, such success is typically limited to English scenarios due to the lack of large-scale, high-quality non-English multi-modal resources, making it extremely difficult to establish competitive counterparts in other languages. In this paper, we introduce the Ziya-Visual series, a set of bilingual large-scale vision-language models (LVLMs) designed to incorporate visual semantics into an LLM for multi-modal dialogue. Composed of Ziya-Visual-Base and Ziya-Visual-Chat, our models adopt the Querying Transformer from BLIP-2 and further explore optimization schemes such as instruction tuning, multi-stage training, and a low-rank adaptation module for visual-language alignment. In addition, we leverage the multi-modal understanding ability of GPT-4 to translate our gathered English image-text datasets into Chinese and to generate instruction-response pairs via in-context learning. Experimental results demonstrate that, compared to existing LVLMs, Ziya-Visual achieves competitive performance across a wide range of English-only tasks, including zero-shot image-text retrieval, image captioning, and visual question answering. A GPT-4-based evaluation leaderboard also indicates that our models possess satisfactory image-text understanding and generation capabilities in Chinese multi-modal dialogue scenarios. Code, demo and models are available at //huggingface.co/IDEA-CCNL/Ziya-BLIP2-14B-Visual-v1.
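
Of the optimization schemes listed, low-rank adaptation is the easiest to show in isolation. Below is a generic LoRA wrapper around a frozen linear layer, a standard formulation rather than Ziya-Visual's exact module; the rank and scaling values are arbitrary.

```python
import torch
import torch.nn as nn

class LoRALinear(nn.Module):
    """A frozen linear layer plus a trainable low-rank update
    (generic LoRA; not Ziya-Visual's exact module)."""
    def __init__(self, base: nn.Linear, r: int = 8, alpha: float = 16.0):
        super().__init__()
        self.base = base
        for p in self.base.parameters():
            p.requires_grad = False                 # freeze pretrained weights
        self.A = nn.Parameter(torch.randn(r, base.in_features) * 0.01)
        self.B = nn.Parameter(torch.zeros(base.out_features, r))
        self.scale = alpha / r

    def forward(self, x):
        # B starts at zero, so training begins exactly at the base model.
        return self.base(x) + self.scale * (x @ self.A.t() @ self.B.t())

layer = LoRALinear(nn.Linear(64, 64))
print(layer(torch.randn(2, 64)).shape)  # torch.Size([2, 64])
```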

Static and dynamic computational graphs represent two distinct approaches to constructing deep learning frameworks. The former prioritizes compiler-based optimizations, while the latter focuses on programmability and user-friendliness. The recent release of PyTorch 2.0, which supports compiling arbitrary deep learning programs in Python, signifies a new direction in the evolution of deep learning infrastructure: incorporating compiler techniques in a more dynamic manner and supporting dynamic language features such as control flow and closures. Given PyTorch's seamless integration with Python, its compiler aims to support arbitrary deep learning code written in Python. However, the inherent dynamism of Python poses challenges to the completeness and robustness of the compiler. While recent research has introduced fuzzing to test deep learning compilers, a comprehensive analysis of how to test dynamic features is still lacking. To address this issue, we propose several code transformations that generate test cases involving dynamic features. These transformations preserve the program's semantics, ensuring that any discrepancy between the transformed and original programs indicates the presence of a bug. Using this approach, we identified twenty previously unknown bugs in the PyTorch compiler and its underlying tensor compiler, Triton.
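
A minimal sketch of the differential-testing idea: rewrite a program with a semantics-preserving transformation that introduces dynamic features (here a closure and a trivially-true branch, our own examples), compile the rewritten version, and compare against eager execution.

```python
import torch

def original(x):
    return torch.sin(x) + x * 2

def transformed(x):
    # Semantics-preserving rewrite: the same computation routed through a
    # closure and a trivially-true branch -- dynamic features the compiler
    # must handle correctly.
    def inner(y):
        if y.ndim >= 0:                 # always true
            return torch.sin(y) + y * 2
    return inner(x)

x = torch.randn(8)
eager = original(x)
compiled = torch.compile(transformed)(x)
# Any mismatch would signal a compiler bug, since the rewrite is equivalent.
assert torch.allclose(eager, compiled, atol=1e-6)
```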

Sparse linear models are one of several core tools for interpretable machine learning, a field of emerging importance as predictive models permeate decision-making in many domains. Unfortunately, sparse linear models are far less flexible as functions of their input features than black-box models like deep neural networks. With this capability gap in mind, we study a not-uncommon situation where the input features dichotomize into two groups: explanatory features, which are candidates for inclusion as variables in an interpretable model, and contextual features, which select from the candidate variables and determine their effects. This dichotomy leads us to the contextual lasso, a new statistical estimator that fits a sparse linear model to the explanatory features such that the sparsity pattern and coefficients vary as a function of the contextual features. The fitting process learns this function nonparametrically via a deep neural network. To attain sparse coefficients, we train the network with a novel lasso regularizer in the form of a projection layer that maps the network's output onto the space of $\ell_1$-constrained linear models. An extensive suite of experiments on real and synthetic data suggests that the learned models, which remain highly transparent, can be sparser than the regular lasso without sacrificing the predictive power of a standard deep neural network.
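
The projection layer maps an arbitrary coefficient vector onto the $\ell_1$ ball, for which the standard algorithm of Duchi et al. (2008) serves as a sketch; the paper wraps such a projection as a differentiable network layer, which we omit here.

```python
import torch

def project_l1_ball(v, radius=1.0):
    """Euclidean projection of a 1-D tensor onto the l1 ball (Duchi et
    al., 2008), the kind of map the abstract's projection layer computes."""
    if v.abs().sum() <= radius:
        return v
    u, _ = torch.sort(v.abs(), descending=True)
    css = torch.cumsum(u, dim=0)
    k = torch.arange(1, len(u) + 1, dtype=v.dtype)
    rho = torch.nonzero(u * k > css - radius).max()   # last index kept
    theta = (css[rho] - radius) / (rho + 1)           # soft threshold
    return torch.sign(v) * torch.clamp(v.abs() - theta, min=0.0)

w = project_l1_ball(torch.tensor([0.8, -0.5, 0.1, 0.05]))
print(w)  # tensor([0.6500, -0.3500, 0.0000, 0.0000]): small entries hit zero
```

Note how the projection sends small coefficients to exactly zero, which is what gives the learned linear models their sparsity.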

This work presents a graph neural network (GNN) framework for solving the maximum independent set (MIS) problem, inspired by dynamic programming (DP). Specifically, given a graph, we propose a DP-like recursive algorithm based on GNNs that first constructs two smaller sub-graphs, predicts the one with the larger MIS, and then uses it in the next recursive call. To train our algorithm, we require annotated comparisons of different graphs with respect to their MIS size. Annotating the comparisons with the output of our algorithm leads to a self-training process in which a better algorithm yields more accurate self-annotations of the comparisons, and vice versa. We provide numerical evidence showing that our method outperforms prior methods on multiple synthetic and real-world datasets.
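
A sketch of the recursion under stated assumptions: the pivot rule and the comparator below are stand-ins (the paper learns the comparator with a GNN), but the branching structure follows the abstract's description.

```python
import networkx as nx

def mis_recursive(G, compare):
    """DP-like recursion from the abstract: branch on a vertex v, build the
    subproblems G - v and G - N[v], and let a comparator pick the branch
    with the larger predicted MIS (`compare` stands in for the GNN)."""
    if G.number_of_nodes() == 0:
        return set()
    v = max(G.nodes, key=G.degree)                 # heuristic pivot choice
    G_excl = G.subgraph(set(G) - {v}).copy()       # branch: v excluded
    G_incl = G.subgraph(set(G) - set(G.neighbors(v)) - {v}).copy()  # v included
    if compare(G_incl, G_excl):                    # True: include v
        return {v} | mis_recursive(G_incl, compare)
    return mis_recursive(G_excl, compare)

# Stub comparator: include v when doing so loses few vertices.
better = lambda g_in, g_out: g_in.number_of_nodes() + 1 >= g_out.number_of_nodes()
print(mis_recursive(nx.cycle_graph(5), better))    # an independent set, e.g. {1, 3}
```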

The design of deep graph models remains largely open, and a crucial part is how to efficiently explore and exploit knowledge from different hops of neighbors. In this paper, we propose a novel RNN-like deep graph neural network architecture that incorporates AdaBoost into the computation of the network. The proposed graph convolutional network, AdaGCN (AdaBoosting Graph Convolutional Network), efficiently extracts knowledge from high-order neighbors and integrates knowledge from different hops of neighbors into the network in an AdaBoost fashion. We also describe the architectural differences between AdaGCN and existing graph convolutional methods to show the benefits of our proposal. Finally, extensive experiments demonstrate the state-of-the-art prediction performance and computational advantage of AdaGCN.
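
A simplified sketch of the AdaBoost-over-hops idea: the l-th weak learner is trained on features propagated l hops, combined with classic AdaBoost sample reweighting. We use decision trees as base learners for brevity; AdaGCN itself uses neural base classifiers.

```python
import numpy as np
from sklearn.tree import DecisionTreeClassifier

def adaboost_over_hops(A_hat, X, y, hops=3):
    """The l-th weak learner sees features propagated l hops (A_hat^l X),
    combined with classic AdaBoost reweighting (simplified sketch)."""
    n = len(y)
    w = np.full(n, 1.0 / n)                        # sample weights
    ensemble, feats = [], X.copy()
    for _ in range(hops):
        clf = DecisionTreeClassifier(max_depth=2).fit(feats, y, sample_weight=w)
        pred = clf.predict(feats)
        err = np.clip(w[pred != y].sum(), 1e-10, 1 - 1e-10)
        alpha = 0.5 * np.log((1 - err) / err)      # weight of this learner
        w *= np.exp(-alpha * np.where(pred == y, 1.0, -1.0))
        w /= w.sum()
        ensemble.append((alpha, clf))
        feats = A_hat @ feats                      # move one hop deeper
    return ensemble  # predict by the alpha-weighted vote of the learners
```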

This paper describes a general framework for learning Higher-Order Network Embeddings (HONE) from graph data based on network motifs. The HONE framework is highly expressive and flexible, with many interchangeable components. The experimental results demonstrate the effectiveness of learning higher-order network representations: in all cases, HONE outperforms recent embedding methods that are unable to capture higher-order structures, with a mean relative gain in AUC of $19\%$ (and up to a $75\%$ gain) across a wide variety of networks and embedding methods.
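
As a minimal instance of motif-based higher-order embedding, the sketch below weights each edge by the number of triangles it participates in and embeds the motif-weighted graph spectrally; HONE generalizes this to many motifs and interchangeable components.

```python
import numpy as np

def triangle_motif_embedding(A, dim=4):
    """Weight each edge by the number of triangles it lies on, then embed
    the motif-weighted graph spectrally (a single-motif toy version)."""
    W = (A @ A) * A          # (A^2)_ij = common neighbors; masked to edges
    vals, vecs = np.linalg.eigh(W)
    top = np.argsort(-np.abs(vals))[:dim]
    return vecs[:, top] * np.sqrt(np.abs(vals[top]))

A = np.array([[0, 1, 1, 0],
              [1, 0, 1, 1],
              [1, 1, 0, 1],
              [0, 1, 1, 0]], dtype=float)
print(triangle_motif_embedding(A, dim=2).shape)   # (4, 2) node embeddings
```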
