
Zero-shot text-to-speech (TTS) based on speaker embeddings extracted from reference speech with self-supervised learning (SSL) speech representations can reproduce speaker characteristics very accurately. However, this approach suffers from degraded synthesis quality when the reference speech contains noise. In this paper, we propose a noise-robust zero-shot TTS method. We incorporated adapters into the SSL model and fine-tuned them together with the TTS model using noisy reference speech. In addition, to further improve performance, we adopted a speech enhancement (SE) front-end. With these improvements, the proposed SSL-based zero-shot TTS achieves high-quality speech synthesis from noisy reference speech. Objective and subjective evaluations confirm that the proposed method is highly robust to noise in the reference speech and works effectively in combination with SE.
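The abstract does not spell out the adapter architecture; below is a minimal sketch of a bottleneck adapter of the kind typically inserted into SSL transformer layers, assuming a hypothetical feature dimension of 768. It illustrates the mechanism only, not the authors' exact design.

```python
import torch
import torch.nn as nn

class Adapter(nn.Module):
    """Bottleneck adapter: down-project, non-linearity, up-project, residual add."""
    def __init__(self, dim: int, bottleneck: int = 64):
        super().__init__()
        self.down = nn.Linear(dim, bottleneck)
        self.up = nn.Linear(bottleneck, dim)
        self.act = nn.GELU()

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # The residual connection leaves the frozen SSL features unchanged at initialization.
        return x + self.up(self.act(self.down(x)))

# Hypothetical usage: wrap the output of a frozen SSL encoder layer.
hidden = torch.randn(2, 100, 768)        # (batch, frames, feature dim)
adapted = Adapter(dim=768)(hidden)
print(adapted.shape)                     # torch.Size([2, 100, 768])
```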

Related content

Speech synthesis, also known as text-to-speech (TTS), converts arbitrary input text into natural, fluent speech output. It draws on artificial intelligence, psychology, acoustics, linguistics, digital signal processing, computer science, and other disciplines, and is a frontier technology in the field of information processing. As computer technology has advanced, speech synthesis has evolved from early formant synthesis to waveform-concatenation and statistical parametric synthesis, and further to hybrid approaches; the quality and naturalness of synthesized speech have improved markedly and can largely meet the requirements of certain application scenarios. Speech synthesis is now widely used in information announcement systems for banks and hospitals, in car navigation systems, and in automated call centers, yielding substantial economic benefits. Moreover, with the proliferation of smartphones, MP3 players, PDAs, and other media closely tied to everyday life, applications of speech synthesis are gradually expanding into entertainment, language teaching, rehabilitation therapy, and other fields. It is fair to say that speech synthesis is influencing every aspect of people's lives.

We present Surjective Sequential Neural Likelihood (SSNL) estimation, a novel method for simulation-based inference in models where the evaluation of the likelihood function is not tractable and only a simulator that can generate synthetic data is available. SSNL fits a dimensionality-reducing surjective normalizing flow model and uses it as a surrogate likelihood function which allows for conventional Bayesian inference using either Markov chain Monte Carlo methods or variational inference. By embedding the data in a low-dimensional space, SSNL solves several issues previous likelihood-based methods had when applied to high-dimensional data sets that, for instance, contain non-informative data dimensions or lie along a lower-dimensional manifold. We evaluate SSNL on a wide variety of experiments and show that it generally outperforms contemporary methods used in simulation-based inference, for instance, on a challenging real-world example from astrophysics which models the magnetic field strength of the sun using a solar dynamo model.
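SSNL's surjective normalizing flow itself is not reproduced here; the sketch below only illustrates how a learned surrogate log-likelihood can be plugged into a standard random-walk Metropolis sampler, with `log_lik` standing in for the trained flow's conditional log-density (a hypothetical interface).

```python
import numpy as np

def metropolis_with_surrogate(log_lik, log_prior, theta0, n_steps=5000, step=0.1, rng=None):
    """Random-walk Metropolis using a learned surrogate log-likelihood.

    log_lik(theta)   -> surrogate log p(x_obs | theta), e.g. from a normalizing flow
    log_prior(theta) -> log p(theta)
    """
    rng = rng or np.random.default_rng(0)
    theta = np.asarray(theta0, dtype=float)
    log_post = log_lik(theta) + log_prior(theta)
    samples = []
    for _ in range(n_steps):
        prop = theta + step * rng.standard_normal(theta.shape)
        log_post_prop = log_lik(prop) + log_prior(prop)
        if np.log(rng.uniform()) < log_post_prop - log_post:
            theta, log_post = prop, log_post_prop
        samples.append(theta.copy())
    return np.array(samples)

# Toy check with an analytic Gaussian "surrogate": the chain should settle between
# the prior mean (0) and the likelihood mean (1).
draws = metropolis_with_surrogate(
    log_lik=lambda t: -0.5 * np.sum((t - 1.0) ** 2),
    log_prior=lambda t: -0.5 * np.sum(t ** 2 / 10.0),
    theta0=np.zeros(2),
)
print(draws[1000:].mean(axis=0))
```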

We propose a novel algorithm for the support estimation of partially known Gaussian graphical models that incorporates prior information about the underlying graph. In contrast to classical approaches that provide a point estimate based on a maximum likelihood or a maximum a posteriori criterion using (simple) priors on the precision matrix, we consider a prior on the graph and rely on annealed Langevin diffusion to generate samples from the posterior distribution. Since the Langevin sampler requires access to the score function of the underlying graph prior, we use graph neural networks to effectively estimate the score from a graph dataset (either available beforehand or generated from a known distribution). Numerical experiments demonstrate the benefits of our approach.
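As a rough illustration of the sampling step, the sketch below runs annealed Langevin dynamics with a user-supplied score function; in the paper's setting the score would come from a graph neural network trained on a graph dataset, whereas here a toy analytic score is used and the graph structure is ignored.

```python
import numpy as np

def annealed_langevin(score, x0, sigmas, steps_per_level=50, eps=1e-4, rng=None):
    """Annealed Langevin dynamics: Langevin updates at decreasing noise levels.

    score(x, sigma) approximates grad_x log p_sigma(x); in the paper this role is
    played by a graph neural network estimating the score of the graph prior.
    """
    rng = rng or np.random.default_rng(0)
    x = np.asarray(x0, dtype=float)
    for sigma in sigmas:                        # sigmas ordered from large to small
        alpha = eps * (sigma / sigmas[-1]) ** 2  # step size shrinks with the noise level
        for _ in range(steps_per_level):
            noise = rng.standard_normal(x.shape)
            x = x + 0.5 * alpha * score(x, sigma) + np.sqrt(alpha) * noise
    return x

# Toy run with the analytic score of a standard normal, grad log N(0, I) = -x.
sample = annealed_langevin(
    score=lambda x, s: -x,
    x0=np.random.default_rng(1).standard_normal(5),
    sigmas=np.geomspace(1.0, 0.01, 10),
)
print(np.round(sample, 2))
```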

Instruction tuning (IT) is widely used to teach pretrained large language models (LLMs) to follow arbitrary instructions, but is under-studied in multilingual settings. In this work, we conduct a systematic study of zero-shot cross-lingual transfer in IT, where an LLM is instruction-tuned on English-only data and then tested on user prompts in other languages. We investigate the influence of model configuration choices and devise a multi-facet evaluation strategy for multilingual instruction following. We find that cross-lingual transfer does happen successfully in IT even if all stages of model training are English-centric, but only if multilinguality is taken into account during hyperparameter tuning and with sufficiently large IT data. English-trained LLMs are capable of generating correct-language, comprehensive, and helpful responses in the other languages, but suffer from low factuality and may occasionally have fluency errors.

The ability to dynamically adjust the computational load of neural models during inference is crucial for on-device processing scenarios characterised by limited and time-varying computational resources. A promising solution is presented by early-exit architectures, in which additional exit branches are appended to intermediate layers of the encoder. In self-attention models for automatic speech recognition (ASR), early-exit architectures enable the development of dynamic models capable of adapting their size and architecture to varying levels of computational resources and ASR performance demands. Previous research on early-exiting ASR models has relied on pre-trained self-supervised models, fine-tuned with an early-exit loss. In this paper, we undertake an experimental comparison between fine-tuning pre-trained backbones and training models from scratch with the early-exiting objective. Experiments conducted on public datasets reveal that early-exit models trained from scratch not only preserve performance when using fewer encoder layers but also exhibit enhanced task accuracy compared to single-exit or pre-trained models. Furthermore, we explore an exit selection strategy grounded in posterior probabilities as an alternative to the conventional frame-based entropy approach. Results provide insights into the training dynamics of early-exit architectures for ASR models, particularly the efficacy of training strategies and exit selection methods.
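The exact confidence measures are not given in the abstract; the sketch below contrasts the two families of exit-selection criteria it mentions, a frame-averaged entropy rule and a posterior-probability rule, using hypothetical per-frame posteriors and thresholds.

```python
import torch

def entropy_confidence(log_probs: torch.Tensor) -> float:
    """Average per-frame entropy of the exit's posteriors (lower = more confident)."""
    probs = log_probs.exp()
    entropy = -(probs * log_probs).sum(dim=-1)   # (frames,)
    return entropy.mean().item()

def posterior_confidence(log_probs: torch.Tensor) -> float:
    """Average per-frame maximum posterior (higher = more confident)."""
    return log_probs.exp().max(dim=-1).values.mean().item()

def select_exit(exit_log_probs, ent_threshold=1.0, post_threshold=0.9, use_posterior=True):
    """Take the earliest exit whose confidence clears the threshold; else the last one."""
    for i, lp in enumerate(exit_log_probs):      # ordered from earliest to deepest exit
        if use_posterior and posterior_confidence(lp) >= post_threshold:
            return i
        if not use_posterior and entropy_confidence(lp) <= ent_threshold:
            return i
    return len(exit_log_probs) - 1

# Hypothetical posteriors for 3 exits over 50 frames and a 32-symbol vocabulary.
torch.manual_seed(0)
exits = [torch.log_softmax(torch.randn(50, 32) * s, dim=-1) for s in (1.0, 3.0, 6.0)]
print(select_exit(exits))
```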

Vessel segmentation and centerline extraction are two crucial preliminary tasks for many computer-aided diagnosis tools dealing with vascular diseases. Recently, deep-learning based methods have been widely applied to these tasks. However, classic deep-learning approaches struggle to capture the complex geometry and specific topology of vascular networks, which is of the utmost importance in most applications. To overcome these limitations, the clDice loss, a topological loss that focuses on the vessel centerlines, has recently been proposed. This loss requires computing, with a proposed soft-skeleton algorithm, the skeletons of both the ground truth and the predicted segmentation. However, the soft-skeleton algorithm provides suboptimal results on 3D images, which makes the clDice loss hardly suitable for 3D images. In this paper, we propose to replace the soft-skeleton algorithm with a U-Net which computes the vascular skeleton directly from the segmentation. We show that our method provides more accurate skeletons than the soft-skeleton algorithm. We then build upon this network a cascaded U-Net trained with the clDice loss to embed topological constraints during the segmentation. The resulting model is able to predict both the vessel segmentation and centerlines with a more accurate topology.
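For reference, a minimal sketch of the soft clDice loss as originally formulated; in the proposed method the skeleton inputs would come from the skeletonization U-Net rather than the soft-skeleton algorithm. Tensor shapes and the toy inputs are illustrative only.

```python
import torch

def soft_cldice_loss(pred, target, skel_pred, skel_target, eps=1e-6):
    """Soft clDice loss: 1 - harmonic mean of topology precision and sensitivity.

    pred, target:           soft segmentation masks in [0, 1]
    skel_pred, skel_target: corresponding (soft) centerline masks in [0, 1];
                            these may come from the soft-skeleton algorithm or,
                            as in the paper, from a dedicated skeletonization U-Net.
    """
    tprec = (skel_pred * target).sum() / (skel_pred.sum() + eps)    # topology precision
    tsens = (skel_target * pred).sum() / (skel_target.sum() + eps)  # topology sensitivity
    cldice = 2.0 * tprec * tsens / (tprec + tsens + eps)
    return 1.0 - cldice

# Hypothetical 3D volumes: batch of one, 32^3 voxels.
torch.manual_seed(0)
p = torch.rand(1, 1, 32, 32, 32)
t = (torch.rand(1, 1, 32, 32, 32) > 0.9).float()
print(soft_cldice_loss(p, t, skel_pred=p * 0.5, skel_target=t * 0.5))
```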

Full waveform inversion (FWI) is a large-scale nonlinear ill-posed problem for which computationally expensive Newton-type methods can become trapped in undesirable local minima, particularly when the initial model lacks a low-wavenumber component and the recorded data lacks low-frequency content. A modification to the Gauss-Newton (GN) method is proposed to address these issues. The standard GN system for multisource multireceiver FWI is reformulated into an equivalent matrix equation form, in which the solution is a diagonal matrix rather than a vector as in the standard system. The search direction is transformed from a vector to a matrix by relaxing the diagonality constraint, effectively adding a degree of freedom along the subsurface-offset axis. The relaxed system can be solved explicitly with only the inversion of two small matrices that deblur the data residual matrix along the source and receiver dimensions, which simplifies the inversion of the Hessian matrix. When used to solve the extended-source FWI objective function, the extended GN (EGN) method integrates the benefits of both model and source extension. EGN thus combines the computational efficiency of the reduced FWI method with the robustness of extended formulations, bridging the gap between the two and offering a promising solution for addressing the challenges of FWI. The robustness and stability of the EGN algorithm for waveform inversion are demonstrated numerically.
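For orientation, the conventional reduced-space GN system that the abstract's matrix reformulation starts from can be written as below; the notation here is assumed rather than taken from the paper, and the EGN relaxation itself is not reproduced.

```latex
% Conventional reduced-space Gauss-Newton update for FWI (assumed notation):
% J_k is the Jacobian of the forward-modelling operator at the current model m_k,
% \delta d_k = d_{\mathrm{pred}}(m_k) - d_{\mathrm{obs}} the data residual, \alpha_k a step length.
\delta m_k = -\left(J_k^{\ast} J_k\right)^{-1} J_k^{\ast}\, \delta d_k,
\qquad
m_{k+1} = m_k + \alpha_k\, \delta m_k .
```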

This paper focuses on coordinating a robot swarm orbiting a convex path without collisions among the individuals. The individual robots lack braking capabilities and can only adjust their courses while maintaining their constant but different speeds. Instead of controlling the spatial relations between the robots, our formation control algorithm aims to deploy a dense robot swarm that mimics the behavior of tornado schooling fish. To achieve this objective safely, we employ a combination of a scalable overtaking rule, a guiding vector field, and a control barrier function with an adaptive radius to facilitate smooth overtakes. The decision-making process of the robots is distributed, relying only on local information. Practical applications include defensive structures or escorting missions with the added resiliency of a swarm without a centralized command. We provide a rigorous analysis of the proposed strategy and validate its effectiveness through numerical simulations involving a high density of unicycles.
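The abstract gives no closed-form expressions; the sketch below shows a generic guiding vector field for orbiting a circular path and a heading-rate controller for a constant-speed unicycle, with the radius and gains as hypothetical parameters. The scalable overtaking rule and the control barrier function are not included.

```python
import numpy as np

def circular_gvf(pos, radius=5.0, k_e=1.0):
    """Guiding vector field for the circular path phi(x, y) = x^2 + y^2 - R^2 = 0.

    The field is a tangential (orbiting) component minus a correction proportional
    to the level-set error, so trajectories converge to and circulate the path.
    """
    x, y = pos
    phi = x**2 + y**2 - radius**2
    grad = np.array([2.0 * x, 2.0 * y])
    tangent = np.array([grad[1], -grad[0]])      # 90-degree rotation of the gradient
    v = tangent - k_e * phi * grad
    return v / (np.linalg.norm(v) + 1e-9)

def unicycle_heading_rate(theta, v_des, k_theta=2.0):
    """Steer the constant-speed unicycle's heading toward the field direction."""
    theta_des = np.arctan2(v_des[1], v_des[0])
    err = np.arctan2(np.sin(theta_des - theta), np.cos(theta_des - theta))
    return k_theta * err

# One integration step for a unicycle at (6, 0) heading "north" at 1 m/s.
pos, theta, speed, dt = np.array([6.0, 0.0]), np.pi / 2, 1.0, 0.05
omega = unicycle_heading_rate(theta, circular_gvf(pos))
pos += speed * dt * np.array([np.cos(theta), np.sin(theta)])
theta += dt * omega
print(np.round(pos, 3), round(theta, 3))
```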

By computing a feedback control via the linear quadratic regulator (LQR) approach and simulating a non-linear, non-autonomous closed-loop system using this feedback, we combine two numerically challenging tasks. For the first task, the computation of the feedback control, we use the non-autonomous generalized differential Riccati equation (DRE), whose solution determines the time-varying feedback gain matrix. Regarding the second task, we want to be able to simulate non-linear closed-loop systems for which it is known that the regulator is only valid for sufficiently small perturbations; one therefore easily runs into numerical issues in the integrators when the closed-loop control varies greatly, and for such systems even the A-stable implicit Euler method fails. On the one hand, we implement non-autonomous versions of splitting schemes and BDF methods for the solution of our non-autonomous DREs; these are well-established DRE solvers in the autonomous case. On the other hand, to tackle the numerical issues in the simulation of the non-linear closed-loop system, we apply a fractional-step-theta scheme with time-adaptivity tuned specifically to this kind of challenge, that is, we additionally base the time-adaptivity on the activity of the control. We compare this approach to the more classical error-based time-adaptivity. We describe techniques to make these two tasks computable in a reasonable amount of time and are able to simulate closed-loop systems with strongly varying controls while avoiding numerical issues. Our time-adaptivity approach requires fewer time steps than the error-based alternative and is more reliable.
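The abstract does not state the control-activity criterion explicitly; the sketch below is a heuristic stand-in that shrinks the step size when the feedback control changes rapidly and grows it when the control is quiet, with all thresholds and bounds hypothetical.

```python
import numpy as np

def adapt_step(dt, u_prev, u_curr, dt_min=1e-4, dt_max=1e-1, tol=0.5):
    """Heuristic control-activity-based step-size rule: refine when the feedback
    control varies strongly over the step, coarsen when it is quiet."""
    activity = np.linalg.norm(u_curr - u_prev) / max(dt, dt_min)
    if activity > tol:
        dt = max(dt / 2.0, dt_min)        # strong control variation: refine
    elif activity < 0.1 * tol:
        dt = min(1.5 * dt, dt_max)        # quiet control: coarsen
    return dt

# Hypothetical control values at two consecutive time points.
print(adapt_step(0.01, u_prev=np.array([0.0, 0.1]), u_curr=np.array([0.4, -0.2])))
```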

This manuscript develops edge-averaged virtual element (EAVE) methodologies to address convection-diffusion problems effectively in the convection-dominated regime. It introduces a variant of EAVE that ensures monotonicity (producing an $M$-matrix) on Voronoi polygonal meshes, provided their duals are Delaunay triangulations with acute angles. Furthermore, the study outlines a comprehensive framework for EAVE methodologies, introducing another variant that integrates with the stiffness matrix derived from the lowest-order virtual element method for the Poisson equation. Numerical experiments confirm the theoretical advantages of the monotonicity property and demonstrate an optimal convergence rate across various mesh configurations.

Time-series models typically assume untainted and legitimate streams of data. However, a self-interested adversary may have incentive to corrupt this data, thereby altering a decision maker's inference. Within the broader field of adversarial machine learning, this research provides a novel, probabilistic perspective toward the manipulation of hidden Markov model inferences via corrupted data. In particular, we provision a suite of corruption problems for filtering, smoothing, and decoding inferences leveraging an adversarial risk analysis approach. Multiple stochastic programming models are set forth that incorporate realistic uncertainties and varied attacker objectives. Three general solution methods are developed by alternatively viewing the problem from frequentist and Bayesian perspectives. The efficacy of each method is illustrated via extensive, empirical testing. The developed methods are characterized by their solution quality and computational effort, resulting in a stratification of techniques across varying problem-instance architectures. This research highlights the weaknesses of hidden Markov models under adversarial activity, thereby motivating the need for robustification techniques to ensure their security.
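As background for the filtering inference targeted by the corruption problems, a minimal forward (filtering) recursion for a discrete HMM is sketched below; the transition, emission, and observation values are toy numbers, and the adversarial optimization itself is not shown.

```python
import numpy as np

def hmm_filter(pi, A, B, obs):
    """Forward (filtering) recursion: returns p(state_t | obs_{1:t}) for each t.

    pi:  (S,)   initial state distribution
    A:   (S, S) transition matrix, A[i, j] = p(next state j | state i)
    B:   (S, O) emission matrix,   B[i, o] = p(observation o | state i)
    obs: sequence of observed symbols; corrupting these observations is the
         attack surface studied in the abstract.
    """
    alpha = pi * B[:, obs[0]]
    alpha /= alpha.sum()
    filtered = [alpha]
    for o in obs[1:]:
        alpha = (alpha @ A) * B[:, o]
        alpha /= alpha.sum()
        filtered.append(alpha)
    return np.array(filtered)

# Toy two-state HMM; flipping the last observation visibly shifts the filtered belief.
pi = np.array([0.6, 0.4])
A = np.array([[0.9, 0.1], [0.2, 0.8]])
B = np.array([[0.8, 0.2], [0.3, 0.7]])
print(hmm_filter(pi, A, B, obs=[0, 0, 1])[-1])
print(hmm_filter(pi, A, B, obs=[0, 0, 0])[-1])
```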
