
This paper integrates graph-to-sequence modelling into an end-to-end text-to-speech framework to exploit the syntactic structure of the input text. Specifically, the input text is parsed by a dependency parsing module to form a syntactic graph. The syntactic graph is then encoded by a graph encoder to extract hidden syntactic information, which is concatenated with the phoneme embedding and fed to the alignment and flow-based decoding modules to generate the raw audio waveform. We evaluate the model on two languages, English and Mandarin, using single-speaker, few-shot target-speaker, and multi-speaker datasets, respectively. Experimental results show better prosodic consistency between the input text and the generated audio, higher scores in subjective prosodic evaluation, and the ability to perform voice conversion. In addition, the efficiency of the model is greatly improved through a custom AI-chip operator design, achieving a 5x speedup.
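
As a minimal sketch of the architecture the abstract describes, the snippet below encodes a dependency graph with one round of message passing and concatenates the resulting syntactic features with phoneme embeddings. The module, the placeholder adjacency, and the word-to-phoneme alignment are our own illustrative assumptions, not the paper's implementation.

```python
import torch
import torch.nn as nn

class SyntaxGraphEncoder(nn.Module):
    """One round of message passing over the dependency graph
    (hypothetical stand-in for the paper's graph encoder)."""
    def __init__(self, dim):
        super().__init__()
        self.lin = nn.Linear(dim, dim)

    def forward(self, x, adj):
        # x: (num_words, dim) word features; adj: normalized adjacency
        # of the dependency parse, shape (num_words, num_words).
        return torch.relu(self.lin(adj @ x))

dim, words, phonemes = 16, 5, 12
adj = torch.eye(words)                          # placeholder parse adjacency
word_feats = torch.randn(words, dim)
phoneme_emb = torch.randn(phonemes, dim)
word_of_phoneme = torch.randint(0, words, (phonemes,))  # assumed alignment

syntax = SyntaxGraphEncoder(dim)(word_feats, adj)   # (words, dim)
decoder_input = torch.cat([phoneme_emb, syntax[word_of_phoneme]], dim=-1)
print(decoder_input.shape)  # torch.Size([12, 32]) -> alignment/decoder input
```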

Related Content

Speech synthesis, also known as text-to-speech (TTS), converts arbitrary input text into natural, fluent speech. It draws on artificial intelligence, psychology, acoustics, linguistics, digital signal processing, computer science, and other disciplines, and is a frontier technology in the field of information processing. With the continuous advance of computing, speech synthesis has progressed from early formant synthesis to waveform-concatenation and statistical parametric synthesis, and on to hybrid approaches; the quality and naturalness of synthesized speech have improved markedly and can largely meet the needs of specific application scenarios. Today, speech synthesis is widely used in information announcement systems for banks and hospitals, in-car navigation systems, and automated call centers, yielding substantial economic benefits. Moreover, with the proliferation of smartphones, MP3 players, PDAs, and other media closely tied to everyday life, its applications are gradually extending into entertainment, language teaching, rehabilitation therapy, and other fields. It is fair to say that speech synthesis is touching every aspect of people's lives.

This paper introduces the Gaussian multi-Graphical Model, a model to construct sparse graph representations of matrix- and tensor-variate data. We generalize prior work in this area by simultaneously learning this representation across several tensors that share axes, which is necessary to allow the analysis of multimodal datasets such as those encountered in multi-omics. Our algorithm uses only a single eigendecomposition per axis, achieving an order of magnitude speedup over prior work in the ungeneralized case. This allows the use of our methodology on large multi-modal datasets such as single-cell multi-omics data, which was challenging with previous approaches. We validate our model on synthetic data and five real-world datasets.
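
To make the one-eigendecomposition-per-axis idea concrete, here is a toy sketch for matrix-variate data; the per-axis Gram matrices and all names are our simplification, not the paper's exact estimator.

```python
import numpy as np

# Toy matrix-variate data: n samples of a (p x q) matrix whose rows and
# columns each carry their own dependency structure.
rng = np.random.default_rng(0)
n, p, q = 100, 8, 6
X = rng.standard_normal((n, p, q))

# One Gram matrix per axis, averaging over the other axes -- a common
# first step in Kronecker-structured graphical models.
S_row = np.einsum('nij,nkj->ik', X, X) / (n * q)   # (p, p) row-axis Gram
S_col = np.einsum('nij,nik->jk', X, X) / (n * p)   # (q, q) column-axis Gram

# The abstract's key computational point: one eigendecomposition per axis.
row_evals, row_evecs = np.linalg.eigh(S_row)
col_evals, col_evecs = np.linalg.eigh(S_col)
# A sparse conditional-dependency graph per axis would then be read off a
# regularized inverse assembled from these eigenpairs.
```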

Simulating fluid dynamics is crucial to the design and development process, from simple valves to complex turbomachinery. Accurately solving the underlying physical equations is computationally expensive. Learning-based solvers that model interactions on meshes have therefore gained interest due to their promising speed-ups. However, it is unknown to what extent these models truly understand the underlying physical principles and can generalize rather than interpolate. Generalization is a key requirement for a general-purpose fluid simulator, which should adapt to different topologies, resolutions, or thermodynamic ranges. We propose SURF, a benchmark designed to test the generalization of learned graph-based fluid simulators. SURF comprises individual datasets and provides specific performance and generalization metrics for evaluating and comparing different models. We empirically demonstrate the applicability of SURF by thoroughly investigating two state-of-the-art graph-based models, yielding new insights into their generalization.
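
The abstract does not spell out SURF's metrics, so the sketch below only illustrates the general shape of a generalization measurement for a learned simulator: rollout error in the training regime versus an unseen regime. Both function names and the metric definition are hypothetical.

```python
import numpy as np

def rollout_rmse(pred, target):
    """RMSE over a simulated rollout (frames x nodes x channels)."""
    return float(np.sqrt(np.mean((pred - target) ** 2)))

def generalization_gap(err_in, err_out):
    """Relative error increase when moving from the training regime to an
    unseen topology/resolution/range (illustrative, not SURF's definition)."""
    return err_out / err_in - 1.0

rng = np.random.default_rng(0)
target = rng.standard_normal((50, 200, 3))        # fake ground-truth rollout
pred_in = target + 0.10 * rng.standard_normal(target.shape)
pred_out = target + 0.25 * rng.standard_normal(target.shape)
gap = generalization_gap(rollout_rmse(pred_in, target),
                         rollout_rmse(pred_out, target))
print(f"generalization gap: {gap:.2f}")           # ~1.5 here
```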

Diffusion models have recently dominated image synthesis tasks. However, the iterative denoising process is computationally expensive at inference time, making diffusion models less practical for low-latency and scalable real-world applications. Post-training quantization (PTQ) of diffusion models can significantly reduce the model size and accelerate the sampling process without re-training. Nonetheless, applying existing PTQ methods directly to low-bit diffusion models can significantly impair the quality of generated samples. Specifically, for each denoising step, quantization noise leads to deviations in the estimated mean and mismatches with the predetermined variance schedule. As the sampling process proceeds, the quantization noise may accumulate, resulting in a low signal-to-noise ratio (SNR) during the later denoising steps. To address these challenges, we propose a unified formulation for the quantization noise and the diffusion perturbed noise in the quantized denoising process. Specifically, we first disentangle the quantization noise into parts that are correlated and uncorrelated with respect to its full-precision counterpart. The correlated part can be easily corrected by estimating the correlation coefficient. For the uncorrelated part, we subtract the bias from the quantized results to correct the mean deviation and calibrate the denoising variance schedule to absorb the excess variance resulting from quantization. Moreover, we introduce a mixed-precision scheme that selects the optimal bitwidth for each denoising step. Extensive experiments demonstrate that our method outperforms previous post-training quantized diffusion models, with only a 0.06 increase in FID score compared to full-precision LDM-4 on ImageNet 256x256, while saving 19.9x bit operations. Code is available at //github.com/ziplab/PTQD.
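
A minimal sketch of the disentanglement step, assuming access to paired full-precision and quantized noise predictions on calibration data; PTQD calibrates these statistics per denoising step, which we omit here.

```python
import torch

def correct_quantization_noise(eps_fp, eps_q):
    """Split the quantization noise into a part correlated with the
    full-precision prediction and an uncorrelated residual, then correct
    the mean and report the excess variance (minimal sketch)."""
    x, y = eps_fp.flatten(), eps_q.flatten()
    k = (x * y).sum() / (x * x).sum()   # correlation coefficient (least squares)
    residual = y - k * x                # uncorrelated part
    bias = residual.mean()              # mean deviation to subtract
    extra_var = residual.var()          # to be absorbed by the variance schedule
    corrected = (eps_q - bias) / k      # undo correlated scaling and bias
    return corrected, float(extra_var)

eps = torch.randn(10_000)
eps_q = 0.9 * eps + 0.05 * torch.randn(10_000) + 0.02   # simulated quantization
fixed, extra = correct_quantization_noise(eps, eps_q)
print(fixed.mean().item(), extra)   # mean near 0, small excess variance
```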

ChipNeMo aims to explore the applications of large language models (LLMs) for industrial chip design. Rather than directly deploying off-the-shelf commercial or open-source LLMs, we adopt the following domain adaptation techniques: custom tokenizers, domain-adaptive continued pretraining, supervised fine-tuning (SFT) with domain-specific instructions, and domain-adapted retrieval models. We evaluate these methods on three selected LLM applications for chip design: an engineering assistant chatbot, EDA script generation, and bug summarization and analysis. Our results show that these domain adaptation techniques enable significant LLM performance improvements over general-purpose base models across the three evaluated applications, enabling up to a 5x model size reduction with similar or better performance on a range of design tasks. Our findings also indicate that a gap remains between our current results and ideal outcomes. We believe further investigation of domain-adapted LLM approaches will help close this gap in the future.
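
Purely as an illustration of the "custom tokenizer" ingredient: extending a generic tokenizer's vocabulary with domain terms and resizing the embedding table. ChipNeMo adapts its tokenizer differently; the gpt2 checkpoint and token list here are placeholders.

```python
from transformers import AutoTokenizer, AutoModelForCausalLM

# Placeholder checkpoint and token list; ChipNeMo trains an adapted
# tokenizer rather than merely appending tokens.
tokenizer = AutoTokenizer.from_pretrained("gpt2")
model = AutoModelForCausalLM.from_pretrained("gpt2")

domain_tokens = ["setup_slack", "hold_slack", "netlist", "DRC", "LVS"]
num_added = tokenizer.add_tokens(domain_tokens)
model.resize_token_embeddings(len(tokenizer))   # new embedding rows
print(f"added {num_added} domain tokens; vocab = {len(tokenizer)}")
# Domain-adaptive continued pretraining and SFT would then run on top of
# this extended vocabulary.
```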

Recent advances have expanded the capabilities of large language models (LLMs) in zero-shot image-to-text generation and understanding by integrating multi-modal inputs. However, such success is typically limited to English scenarios due to the lack of large-scale, high-quality non-English multi-modal resources, making it extremely difficult to establish competitive counterparts in other languages. In this paper, we introduce the Ziya-Visual series, a set of bilingual large-scale vision-language models (LVLMs) designed to incorporate visual semantics into an LLM for multi-modal dialogue. Composed of Ziya-Visual-Base and Ziya-Visual-Chat, our models adopt the Querying Transformer from BLIP-2 and further explore optimization schemes such as instruction tuning, multi-stage training, and a low-rank adaptation module for visual-language alignment. In addition, we leverage the multi-modal understanding ability of GPT-4 to translate our gathered English image-text datasets into Chinese and to generate instruction-response pairs via in-context learning. Experimental results demonstrate that, compared to existing LVLMs, Ziya-Visual achieves competitive performance across a wide range of English-only tasks, including zero-shot image-text retrieval, image captioning, and visual question answering. A GPT-4-based evaluation leaderboard also indicates that our models possess satisfactory image-text understanding and generation capabilities in Chinese multi-modal dialogue scenarios. Code, demo and models are available at //huggingface.co/IDEA-CCNL/Ziya-BLIP2-14B-Visual-v1.
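
Of the optimization schemes listed, low-rank adaptation is the easiest to show in isolation. Below is a generic LoRA wrapper around a frozen linear layer, a standard formulation rather than Ziya-Visual's exact module; the rank and scaling values are arbitrary.

```python
import torch
import torch.nn as nn

class LoRALinear(nn.Module):
    """A frozen linear layer plus a trainable low-rank update
    (generic LoRA; not Ziya-Visual's exact module)."""
    def __init__(self, base: nn.Linear, r: int = 8, alpha: float = 16.0):
        super().__init__()
        self.base = base
        for p in self.base.parameters():
            p.requires_grad = False                 # freeze pretrained weights
        self.A = nn.Parameter(torch.randn(r, base.in_features) * 0.01)
        self.B = nn.Parameter(torch.zeros(base.out_features, r))
        self.scale = alpha / r

    def forward(self, x):
        # B starts at zero, so training begins exactly at the base model.
        return self.base(x) + self.scale * (x @ self.A.t() @ self.B.t())

layer = LoRALinear(nn.Linear(64, 64))
print(layer(torch.randn(2, 64)).shape)  # torch.Size([2, 64])
```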

Static and dynamic computational graphs represent two distinct approaches to constructing deep learning frameworks. The former prioritizes compiler-based optimizations, while the latter focuses on programmability and user-friendliness. The recent release of PyTorch 2.0, which supports compiling arbitrary deep learning programs in Python, signifies a new direction in the evolution of deep learning infrastructure: incorporating compiler techniques in a more dynamic manner and supporting dynamic language features such as control flow and closures. Given PyTorch's seamless integration with Python, its compiler aims to support arbitrary deep learning code written in Python. However, the inherent dynamism of Python poses challenges to the completeness and robustness of the compiler. While recent research has introduced fuzzing to test deep learning compilers, a comprehensive analysis of how to test dynamic features is still lacking. To address this issue, we propose several code transformations that generate test cases involving dynamic features. These transformations preserve the program's semantics, ensuring that any discrepancy between the transformed and original programs indicates the presence of a bug. Using this approach, we identified twenty previously unknown bugs in the PyTorch compiler and its underlying tensor compiler, Triton.
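
A minimal sketch of the differential-testing idea: rewrite a program with a semantics-preserving transformation that introduces dynamic features (here a closure and a trivially-true branch, our own examples), compile the rewritten version, and compare against eager execution.

```python
import torch

def original(x):
    return torch.sin(x) + x * 2

def transformed(x):
    # Semantics-preserving rewrite: the same computation routed through a
    # closure and a trivially-true branch -- dynamic features the compiler
    # must handle correctly.
    def inner(y):
        if y.ndim >= 0:                 # always true
            return torch.sin(y) + y * 2
    return inner(x)

x = torch.randn(8)
eager = original(x)
compiled = torch.compile(transformed)(x)
# Any mismatch would signal a compiler bug, since the rewrite is equivalent.
assert torch.allclose(eager, compiled, atol=1e-6)
```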

Sparse linear models are one of several core tools for interpretable machine learning, a field of emerging importance as predictive models permeate decision-making in many domains. Unfortunately, sparse linear models are far less flexible as functions of their input features than black-box models like deep neural networks. With this capability gap in mind, we study a not-uncommon situation where the input features dichotomize into two groups: explanatory features, which are candidates for inclusion as variables in an interpretable model, and contextual features, which select from the candidate variables and determine their effects. This dichotomy leads us to the contextual lasso, a new statistical estimator that fits a sparse linear model to the explanatory features such that the sparsity pattern and coefficients vary as a function of the contextual features. The fitting process learns this function nonparametrically via a deep neural network. To attain sparse coefficients, we train the network with a novel lasso regularizer in the form of a projection layer that maps the network's output onto the space of $\ell_1$-constrained linear models. An extensive suite of experiments on real and synthetic data suggests that the learned models, which remain highly transparent, can be sparser than the regular lasso without sacrificing the predictive power of a standard deep neural network.
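
The projection layer maps an arbitrary coefficient vector onto the $\ell_1$ ball, for which the standard algorithm of Duchi et al. (2008) serves as a sketch; the paper wraps such a projection as a differentiable network layer, which we omit here.

```python
import torch

def project_l1_ball(v, radius=1.0):
    """Euclidean projection of a 1-D tensor onto the l1 ball (Duchi et
    al., 2008), the kind of map the abstract's projection layer computes."""
    if v.abs().sum() <= radius:
        return v
    u, _ = torch.sort(v.abs(), descending=True)
    css = torch.cumsum(u, dim=0)
    k = torch.arange(1, len(u) + 1, dtype=v.dtype)
    rho = torch.nonzero(u * k > css - radius).max()   # last index kept
    theta = (css[rho] - radius) / (rho + 1)           # soft threshold
    return torch.sign(v) * torch.clamp(v.abs() - theta, min=0.0)

w = project_l1_ball(torch.tensor([0.8, -0.5, 0.1, 0.05]))
print(w)  # tensor([0.6500, -0.3500, 0.0000, 0.0000]): small entries hit zero
```

Note how the projection sends small coefficients to exactly zero, which is what gives the learned linear models their sparsity.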

This work presents a graph neural network (GNN) framework for solving the maximum independent set (MIS) problem, inspired by dynamic programming (DP). Specifically, given a graph, we propose a DP-like recursive algorithm based on GNNs that first constructs two smaller sub-graphs, predicts the one with the larger MIS, and then uses it in the next recursive call. To train our algorithm, we require annotated comparisons of different graphs with respect to their MIS size. Annotating the comparisons with the output of our algorithm leads to a self-training process in which a better algorithm yields more accurate self-annotations of the comparisons, and vice versa. We provide numerical evidence showing that our method outperforms prior methods on multiple synthetic and real-world datasets.
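
A sketch of the recursion under stated assumptions: the pivot rule and the comparator below are stand-ins (the paper learns the comparator with a GNN), but the branching structure follows the abstract's description.

```python
import networkx as nx

def mis_recursive(G, compare):
    """DP-like recursion from the abstract: branch on a vertex v, build the
    subproblems G - v and G - N[v], and let a comparator pick the branch
    with the larger predicted MIS (`compare` stands in for the GNN)."""
    if G.number_of_nodes() == 0:
        return set()
    v = max(G.nodes, key=G.degree)                 # heuristic pivot choice
    G_excl = G.subgraph(set(G) - {v}).copy()       # branch: v excluded
    G_incl = G.subgraph(set(G) - set(G.neighbors(v)) - {v}).copy()  # v included
    if compare(G_incl, G_excl):                    # True: include v
        return {v} | mis_recursive(G_incl, compare)
    return mis_recursive(G_excl, compare)

# Stub comparator: include v when doing so loses few vertices.
better = lambda g_in, g_out: g_in.number_of_nodes() + 1 >= g_out.number_of_nodes()
print(mis_recursive(nx.cycle_graph(5), better))    # an independent set, e.g. {1, 3}
```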

The design of deep graph models remains largely open, and a crucial part is how to efficiently explore and exploit knowledge from different hops of neighbors. In this paper, we propose a novel RNN-like deep graph neural network architecture that incorporates AdaBoost into the computation of the network. The proposed graph convolutional network, AdaGCN (AdaBoosting Graph Convolutional Network), efficiently extracts knowledge from high-order neighbors and integrates knowledge from different hops of neighbors into the network in an AdaBoost fashion. We also describe the architectural differences between AdaGCN and existing graph convolutional methods to show the benefits of our proposal. Finally, extensive experiments demonstrate the state-of-the-art prediction performance and computational advantage of AdaGCN.
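
A simplified sketch of the AdaBoost-over-hops idea: the l-th weak learner is trained on features propagated l hops, combined with classic AdaBoost sample reweighting. We use decision trees as base learners for brevity; AdaGCN itself uses neural base classifiers.

```python
import numpy as np
from sklearn.tree import DecisionTreeClassifier

def adaboost_over_hops(A_hat, X, y, hops=3):
    """The l-th weak learner sees features propagated l hops (A_hat^l X),
    combined with classic AdaBoost reweighting (simplified sketch)."""
    n = len(y)
    w = np.full(n, 1.0 / n)                        # sample weights
    ensemble, feats = [], X.copy()
    for _ in range(hops):
        clf = DecisionTreeClassifier(max_depth=2).fit(feats, y, sample_weight=w)
        pred = clf.predict(feats)
        err = np.clip(w[pred != y].sum(), 1e-10, 1 - 1e-10)
        alpha = 0.5 * np.log((1 - err) / err)      # weight of this learner
        w *= np.exp(-alpha * np.where(pred == y, 1.0, -1.0))
        w /= w.sum()
        ensemble.append((alpha, clf))
        feats = A_hat @ feats                      # move one hop deeper
    return ensemble  # predict by the alpha-weighted vote of the learners
```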

This paper describes a general framework for learning Higher-Order Network Embeddings (HONE) from graph data based on network motifs. The HONE framework is highly expressive and flexible, with many interchangeable components. The experimental results demonstrate the effectiveness of learning higher-order network representations: in all cases, HONE outperforms recent embedding methods that are unable to capture higher-order structures, with a mean relative gain in AUC of $19\%$ (and up to a $75\%$ gain) across a wide variety of networks and embedding methods.
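
As a minimal instance of motif-based higher-order embedding, the sketch below weights each edge by the number of triangles it participates in and embeds the motif-weighted graph spectrally; HONE generalizes this to many motifs and interchangeable components.

```python
import numpy as np

def triangle_motif_embedding(A, dim=4):
    """Weight each edge by the number of triangles it lies on, then embed
    the motif-weighted graph spectrally (a single-motif toy version)."""
    W = (A @ A) * A          # (A^2)_ij = common neighbors; masked to edges
    vals, vecs = np.linalg.eigh(W)
    top = np.argsort(-np.abs(vals))[:dim]
    return vecs[:, top] * np.sqrt(np.abs(vals[top]))

A = np.array([[0, 1, 1, 0],
              [1, 0, 1, 1],
              [1, 1, 0, 1],
              [0, 1, 1, 0]], dtype=float)
print(triangle_motif_embedding(A, dim=2).shape)   # (4, 2) node embeddings
```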
