Diffusion model-based speech enhancement has received increased attention because it can generate very natural enhanced signals and generalizes well to unseen conditions. Diffusion models have been explored for several sub-tasks of speech enhancement, such as speech denoising, dereverberation, and source separation. In this paper, we investigate their use for target speech extraction (TSE), which consists of estimating the clean speech signal of a target speaker in a multi-talker mixture. TSE is realized by conditioning the extraction process on a clue identifying the target speaker. We show that TSE can be realized with a diffusion model conditioned on the clue. In addition, we introduce ensemble inference to reduce potential extraction errors caused by the diffusion process. In experiments on the Libri2mix corpus, we show that the proposed diffusion model-based TSE combined with ensemble inference outperforms a comparable TSE system trained discriminatively.
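The ensemble inference mentioned above can be illustrated with a toy sketch. It assumes, which the abstract does not state, that the ensemble is formed by running a stochastic sampler several times and averaging the resulting waveforms. The names `ensemble_inference` and `toy_sampler` are hypothetical; `toy_sampler` only stands in for a real clue-conditioned diffusion sampler.

```python
import random
from typing import Callable, List, Sequence

Sampler = Callable[[Sequence[float], object, random.Random], List[float]]


def ensemble_inference(
    sampler: Sampler,
    mixture: Sequence[float],
    clue: object,
    num_runs: int = 8,
    seed: int = 0,
) -> List[float]:
    """Average several stochastic extraction runs (assumed ensembling rule)."""
    rng = random.Random(seed)
    total = [0.0] * len(mixture)
    for _ in range(num_runs):
        estimate = sampler(mixture, clue, rng)
        for i, value in enumerate(estimate):
            total[i] += value
    return [t / num_runs for t in total]


def toy_sampler(mixture: Sequence[float], clue: object, rng: random.Random) -> List[float]:
    """Hypothetical stand-in: scales the mixture and adds sampling noise."""
    return [0.5 * x + rng.gauss(0.0, 0.1) for x in mixture]


if __name__ == "__main__":
    mixture = [0.2, -0.4, 0.6, 0.0]
    print(ensemble_inference(toy_sampler, mixture, clue=None, num_runs=16))
```

Averaging independent runs reduces the variance of the sampling noise, which is one plausible reason an ensemble would suppress occasional extraction errors.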

Related content

TSE

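IEEE Transactions on Software Engineering is interested in well-defined theoretical results and empirical studies that have a potential impact on the construction, analysis, or management of software. Its scope ranges from the mechanisms through the development of principles to the application of those principles to specific environments. Specific topic areas include: a) development and maintenance methods and models, e.g., techniques and principles for the specification, design, and implementation of software systems, including notations and process models; b) assessment methods, e.g., software testing and validation, reliability models, test and diagnosis procedures, software redundancy and design for error control, and the measurement and evaluation of various aspects of the process and product; c) software project management, e.g., productivity factors, cost models, schedule and organizational issues, standards; d) tools and environments, e.g., specific tools and integrated tool environments, including the associated architectures, databases, and parallel and distributed processing issues; e) system issues, e.g., hardware-software trade-offs; f) state-of-the-art surveys that provide a synthesis and comprehensive review of the historical development of one particular area of interest. Official website: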

Invariance · Linear classification · Linear · MoDELS · Less

September 28, 2023

Probabilistic Invariant Learning with Randomized Linear Classifiers

Leonardo Cotta, Gal Yehuda, Assaf Schuster, Chris J. Maddison

Designing models that are both expressive and preserve known invariances of tasks is an increasingly hard problem. Existing solutions trade off invariance for computational or memory resources. In this work, we show how to leverage randomness and design models that are both expressive and invariant but use fewer resources. Inspired by randomized algorithms, our key insight is that accepting probabilistic notions of universal approximation and invariance can reduce our resource requirements. More specifically, we propose a class of binary classification models called Randomized Linear Classifiers (RLCs). We give parameter and sample size conditions under which RLCs can, with high probability, approximate any (smooth) function while preserving invariance to compact group transformations. Leveraging this result, we design three RLCs that are provably probabilistic invariant for classification tasks over sets, graphs, and spherical data. We show how these models can achieve probabilistic invariance and universality using fewer resources than (deterministic) neural networks and their invariant counterparts. Finally, we empirically demonstrate the benefits of this new class of models on invariant tasks where deterministic invariant neural networks are known to struggle.

Language modeling · Prompt · Performer · Speech recognition · MoDELS

September 27, 2023

Generative Speech Recognition Error Correction with Large Language Models

Chao-Han Huck Yang, Yile Gu, Yi-Chieh Liu, Shalini Ghosh, Ivan Bulyko, Andreas Stolcke

from arxiv, Accepted to IEEE Automatic Speech Recognition and Understanding (ASRU) 2023

We explore the ability of large language models (LLMs) to act as ASR post-processors that perform rescoring and error correction. Our focus is on instruction prompting to let LLMs perform these tasks without fine-tuning, for which we evaluate different prompting schemes, both zero- and few-shot in-context learning, and a novel task-activating prompting (TAP) method that combines instruction and demonstration. Using a pre-trained first-pass system and rescoring output on two out-of-domain tasks (ATIS and WSJ), we show that rescoring only by in-context learning with frozen LLMs achieves results that are competitive with rescoring by domain-tuned LMs. By combining prompting techniques with fine-tuning, we achieve error rates below the N-best oracle level, showcasing the generalization power of the LLMs.
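To make the rescoring step concrete, here is a minimal sketch of N-best rescoring. It assumes each hypothesis carries a first-pass log score and that a language model score is added with a linear weight; the interpolation rule, the `lm_weight` value, and the toy score table are illustrative assumptions, not details taken from the abstract.

```python
from typing import Callable, List, Optional, Tuple


def rescore_nbest(
    nbest: List[Tuple[str, float]],
    lm_score: Callable[[str], float],
    lm_weight: float = 0.5,
) -> Optional[str]:
    """Return the hypothesis with the highest combined score.

    nbest holds (text, first_pass_log_score) pairs; lm_score returns a
    log score for a text. The linear combination is an assumed rule.
    """
    best_text, best_score = None, float("-inf")
    for text, first_pass in nbest:
        combined = first_pass + lm_weight * lm_score(text)
        if combined > best_score:
            best_text, best_score = text, combined
    return best_text


if __name__ == "__main__":
    # Hypothetical scores standing in for a real language model.
    toy_scores = {"flights to boston": -1.0, "flights two boston": -5.0}
    nbest = [("flights two boston", -3.8), ("flights to boston", -4.0)]
    print(rescore_nbest(nbest, toy_scores.__getitem__))
```

In this toy case the first-pass system slightly prefers the wrong hypothesis, and the language model score flips the ranking.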

Controller · Dynamical systems · INTERACT · Robots · Loop

September 27, 2023

Orientation Control with Variable Stiffness Dynamical Systems

Youssef Michel, Matteo Saveriano, Fares J. Abu-Dakka, Dongheui Lee

from arxiv, Accepted at the IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS), 2023

Recently, several approaches have attempted to combine motion generation and control in one loop to equip robots with reactive behaviors that cannot be achieved with traditional time-indexed tracking controllers. These approaches, however, have mainly focused on positions, neglecting the orientation part, which can be crucial to many tasks, e.g., screwing. In this work, we propose a control algorithm that adapts the robot's rotational motion and impedance in a closed-loop manner. Given a first-order Dynamical System representing an orientation motion plan and a desired rotational stiffness profile, our approach enables the robot to follow the reference motion with an interactive behavior specified by the desired stiffness, while always being aware of the current orientation, represented as a Unit Quaternion (UQ). We rely on the Lie algebra to formulate our algorithm, since, unlike positions, UQs feature constraints that should be respected in the devised controller. We validate our proposed approach in multiple robot experiments, showcasing the ability of our controller to follow complex orientation profiles, react safely to perturbations, and fulfill physical interaction tasks.
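The role of the Lie algebra can be illustrated with the standard logarithmic and exponential maps between unit quaternions and rotation vectors. This is a generic sketch of those textbook maps, not the authors' controller; the function names and the (w, x, y, z) component order are assumptions made for the example.

```python
import math
from typing import Tuple

Quaternion = Tuple[float, float, float, float]  # (w, x, y, z), unit norm
Vector3 = Tuple[float, float, float]


def quat_log(q: Quaternion) -> Vector3:
    """Map a unit quaternion to its rotation vector (axis times angle)."""
    w, x, y, z = q
    norm_v = math.sqrt(x * x + y * y + z * z)
    if norm_v < 1e-12:
        return (0.0, 0.0, 0.0)  # identity rotation
    angle = 2.0 * math.atan2(norm_v, w)
    scale = angle / norm_v
    return (scale * x, scale * y, scale * z)


def quat_exp(r: Vector3) -> Quaternion:
    """Map a rotation vector back to a unit quaternion."""
    angle = math.sqrt(r[0] ** 2 + r[1] ** 2 + r[2] ** 2)
    if angle < 1e-12:
        return (1.0, 0.0, 0.0, 0.0)
    scale = math.sin(angle / 2.0) / angle
    return (math.cos(angle / 2.0), scale * r[0], scale * r[1], scale * r[2])


if __name__ == "__main__":
    rotation_vector = (0.3, -0.2, 0.5)
    q = quat_exp(rotation_vector)
    print(q, quat_log(q))
```

Working with the rotation vector lets an update be computed in an unconstrained three-dimensional space, while `quat_exp` always returns a quaternion that satisfies the unit-norm constraint.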

Language modeling · MoDELS · Graph · Knowledge · Prompt

September 27, 2023

Graph Neural Prompting with Large Language Models

Yijun Tian, Huan Song, Zichen Wang, Haozhu Wang, Ziqing Hu, Fang Wang, Nitesh V. Chawla, Panpan Xu

Large Language Models (LLMs) have shown remarkable generalization capability with exceptional performance in various language modeling tasks. However, they still exhibit inherent limitations in precisely capturing and returning grounded knowledge. While existing work has explored utilizing knowledge graphs to enhance language modeling via joint training and customized model architectures, applying this to LLMs is problematic owing to their large number of parameters and high computational cost. In addition, how to leverage the pre-trained LLMs and avoid training a customized model from scratch remains an open question. In this work, we propose Graph Neural Prompting (GNP), a novel plug-and-play method to assist pre-trained LLMs in learning beneficial knowledge from KGs. GNP encompasses various designs, including a standard graph neural network encoder, a cross-modality pooling module, a domain projector, and a self-supervised link prediction objective. Extensive experiments on multiple datasets demonstrate the superiority of GNP on both commonsense and biomedical reasoning tasks across different LLM sizes and settings.

MoDELS · Prophet · Loss · Example · Optimizer

September 27, 2023

Optimal Stopping with Multi-Dimensional Comparative Loss Aversion

Linda Cai, Joshua Gardner, S. Matthew Weinberg

from arxiv, Accepted to WINE 2023

Despite having the same basic prophet inequality setup and model of loss aversion, conclusions in our multi-dimensional model differ considerably from those in the one-dimensional model of Kleinberg et al. For example, Kleinberg et al. give a tight closed-form expression for the competitive ratio that an online decision-maker can achieve as a function of $\lambda$, for any $\lambda \geq 0$. In our multi-dimensional model, there is a sharp phase transition: if $k$ denotes the number of dimensions, then when $\lambda \cdot (k-1) \geq 1$, no non-trivial competitive ratio is possible. On the other hand, when $\lambda \cdot (k-1) < 1$, we give a tight bound on the achievable competitive ratio (similar to Kleinberg et al.). As another example, Kleinberg et al. uncover an exponential improvement in their competitive ratio for the random-order vs. worst-case prophet inequality problem. In our model with $k \geq 2$ dimensions, the gap is at most a constant factor. We uncover several additional key differences between the multi- and single-dimensional models.

Performer · Unsupervised · Speech recognition · MoDELS · Representation

September 26, 2023

Disentangling Prosody Representations with Unsupervised Speech Reconstruction

Leyuan Qu, Taihao Li, Cornelius Weber, Theresa Pekarek-Rosin, Fuji Ren, Stefan Wermter

from arxiv, Accepted by IEEE/ACM Transactions on Audio, Speech, and Language Processing

Human speech can be characterized by different components, including semantic content, speaker identity and prosodic information. Significant progress has been made in disentangling representations for semantic content and speaker identity in Automatic Speech Recognition (ASR) and speaker verification tasks respectively. However, extracting prosodic information remains an open and challenging research question because of the intrinsic association of different attributes, such as timbre and rhythm, and because of the need for supervised training schemes to achieve robust large-scale and speaker-independent ASR. The aim of this paper is to address the disentanglement of emotional prosody from speech based on unsupervised reconstruction. Specifically, we identify, design, implement and integrate three crucial components in our proposed speech reconstruction model Prosody2Vec: (1) a unit encoder that transforms speech signals into discrete units for semantic content, (2) a pretrained speaker verification model to generate speaker identity embeddings, and (3) a trainable prosody encoder to learn prosody representations. We first pretrain the Prosody2Vec representations on unlabelled emotional speech corpora, then fine-tune the model on specific datasets to perform Speech Emotion Recognition (SER) and Emotional Voice Conversion (EVC) tasks. Both objective (weighted and unweighted accuracies) and subjective (mean opinion score) evaluations on the EVC task suggest that Prosody2Vec effectively captures general prosodic features that can be smoothly transferred to other emotional speech. In addition, our SER experiments on the IEMOCAP dataset reveal that the prosody features learned by Prosody2Vec are complementary and beneficial for the performance of widely used speech pretraining models and surpass the state-of-the-art methods when combining Prosody2Vec with HuBERT representations.

Optimizer · Processing (programming language) · MoDELS · Learning · Optimization

December 19, 2021

Introduction to Online Convex Optimization

Elad Hazan

from arxiv, arXiv admin note: text overlap with arXiv:1909.03550

This manuscript portrays optimization as a process. In many practical applications the environment is so complex that it is infeasible to lay out a comprehensive theoretical model and use classical algorithmic theory and mathematical optimization. It is necessary as well as beneficial to take a robust approach, by applying an optimization method that learns as one goes along, learning from experience as more aspects of the problem are observed. This view of optimization as a process has become prominent in varied fields and has led to some spectacular success in modeling and systems that are now part of our daily lives.
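One standard example of an optimization method that "learns as one goes along" is online gradient descent. The sketch below is a one-dimensional illustration under stated assumptions: the feasible set is the interval [-radius, radius], the step size decays as 1/sqrt(t), and the quadratic losses in the demo are made up for the example.

```python
import math
from typing import Callable, List, Sequence


def online_gradient_descent(
    gradients: Sequence[Callable[[float], float]],
    x0: float,
    radius: float,
    step_scale: float = 1.0,
) -> List[float]:
    """Play one point per round, then update using that round's gradient.

    gradients[t] returns the gradient of the round-t loss at a given point.
    Returns the list of points played, one per round.
    """
    x = x0
    played = []
    for t, grad in enumerate(gradients, start=1):
        played.append(x)                  # commit before seeing the loss
        eta = step_scale / math.sqrt(t)   # decaying step size
        x = x - eta * grad(x)             # gradient step on the revealed loss
        x = max(-radius, min(radius, x))  # project back onto the interval
    return played


if __name__ == "__main__":
    # Hypothetical losses f_t(x) = (x - 1)^2, so the gradient is 2 * (x - 1).
    rounds = [lambda x: 2.0 * (x - 1.0)] * 50
    points = online_gradient_descent(rounds, x0=0.0, radius=2.0)
    print(points[:5], points[-1])
```

The learner never sees a loss before committing to a point, which is the sense in which optimization is treated as a process rather than a one-shot computation.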

Outliers · Anomaly detection · CIFAR-10 · Extensibility · Performance

December 21, 2018

Deep Anomaly Detection with Outlier Exposure

Dan Hendrycks, Mantas Mazeika, Thomas G. Dietterich

from arxiv, ICLR 2019; PyTorch code available at //github.com/hendrycks/outlier-exposure

It is important to detect anomalous inputs when deploying machine learning systems. The use of larger and more complex inputs in deep learning magnifies the difficulty of distinguishing between anomalous and in-distribution examples. At the same time, diverse image and text data are available in enormous quantities. We propose leveraging these data to improve deep anomaly detection by training anomaly detectors against an auxiliary dataset of outliers, an approach we call Outlier Exposure (OE). This enables anomaly detectors to generalize and detect unseen anomalies. In extensive experiments on natural language processing and small- and large-scale vision tasks, we find that Outlier Exposure significantly improves detection performance. We also observe that cutting-edge generative models trained on CIFAR-10 may assign higher likelihoods to SVHN images than to CIFAR-10 images; we use OE to mitigate this issue. We also analyze the flexibility and robustness of Outlier Exposure, and identify characteristics of the auxiliary dataset that improve performance.
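Training against auxiliary outliers can be sketched as a two-term loss. The version below is one common instantiation for classifiers and is an assumption here, since the abstract does not spell out the loss: ordinary cross-entropy on in-distribution examples plus a weighted term that pushes predictions on outliers toward the uniform distribution. The function names and the default weight of 0.5 are illustrative.

```python
import math
from typing import List, Sequence, Tuple


def log_softmax(logits: Sequence[float]) -> List[float]:
    """Numerically stable log-softmax."""
    m = max(logits)
    lse = m + math.log(sum(math.exp(z - m) for z in logits))
    return [z - lse for z in logits]


def cross_entropy(logits: Sequence[float], label: int) -> float:
    """Standard cross-entropy for one labeled in-distribution example."""
    return -log_softmax(logits)[label]


def uniform_cross_entropy(logits: Sequence[float]) -> float:
    """Cross-entropy from the uniform distribution to the softmax output."""
    log_probs = log_softmax(logits)
    return -sum(log_probs) / len(log_probs)


def outlier_exposure_loss(
    in_batch: Sequence[Tuple[Sequence[float], int]],
    outlier_batch: Sequence[Sequence[float]],
    weight: float = 0.5,
) -> float:
    """Assumed objective: in-distribution loss plus weighted outlier term."""
    in_term = sum(cross_entropy(z, y) for z, y in in_batch) / len(in_batch)
    out_term = sum(uniform_cross_entropy(z) for z in outlier_batch) / len(outlier_batch)
    return in_term + weight * out_term


if __name__ == "__main__":
    in_batch = [([2.0, 0.1, -1.0], 0), ([0.0, 1.5, 0.2], 1)]
    outlier_batch = [[3.0, 0.0, 0.0], [0.1, 0.0, -0.1]]
    print(outlier_exposure_loss(in_batch, outlier_batch))
```

The outlier term is smallest when the logits are flat, so a confident prediction on an outlier is penalized more than an uncertain one.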

Sentiment analysis · entity · Gating · Convolution · MoDELS

May 18, 2018

Aspect Based Sentiment Analysis with Gated Convolutional Networks

Wei Xue, Tao Li

from arxiv, Accepted in ACL 2018

Aspect based sentiment analysis (ABSA) can provide more detailed information than general sentiment analysis, because it aims to predict the sentiment polarities of the given aspects or entities in text. We summarize previous approaches into two subtasks: aspect-category sentiment analysis (ACSA) and aspect-term sentiment analysis (ATSA). Most previous approaches employ long short-term memory and attention mechanisms to predict the sentiment polarity of the concerned targets, which are often complicated and need more training time. We propose a model based on convolutional neural networks and gating mechanisms, which is more accurate and efficient. First, the novel Gated Tanh-ReLU Units can selectively output the sentiment features according to the given aspect or entity. The architecture is much simpler than the attention layers used in existing models. Second, the computations of our model can be easily parallelized during training, because convolutional layers do not have time dependency as in LSTM layers, and gating units also work independently. The experiments on SemEval datasets demonstrate the efficiency and effectiveness of our models.
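The gating idea can be sketched in a few lines. The example assumes the two convolution outputs are already computed and passed in as plain lists; it multiplies a tanh "sentiment" path by a ReLU "aspect" path that is shifted by an aspect-dependent vector, then applies max-over-time pooling. The function names, argument layout, and pooling step are assumptions for illustration rather than the paper's exact architecture.

```python
import math
from typing import List, Sequence


def gated_tanh_relu(
    sentiment_conv: Sequence[Sequence[float]],
    aspect_conv: Sequence[Sequence[float]],
    aspect_projection: Sequence[float],
) -> List[List[float]]:
    """Gate each position's features: tanh(s) * relu(a + aspect term).

    sentiment_conv and aspect_conv have shape (positions, channels);
    aspect_projection has one value per channel.
    """
    gated = []
    for s_row, a_row in zip(sentiment_conv, aspect_conv):
        gated.append([
            math.tanh(s) * max(0.0, a + v)
            for s, a, v in zip(s_row, a_row, aspect_projection)
        ])
    return gated


def max_over_time(features: Sequence[Sequence[float]]) -> List[float]:
    """Keep the strongest response of each channel across positions."""
    return [max(column) for column in zip(*features)]


if __name__ == "__main__":
    sentiment_conv = [[1.0, 2.0], [0.5, -1.0]]
    aspect_conv = [[1.0, -3.0], [0.0, 0.0]]
    aspect_projection = [0.5, 0.5]
    gated = gated_tanh_relu(sentiment_conv, aspect_conv, aspect_projection)
    print(gated, max_over_time(gated))
```

Because the ReLU gate is zero wherever the aspect path is negative, features unrelated to the given aspect are blocked, and each position is processed independently, which is what allows parallel computation.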

Example · Black box · Networking · MoDELS · Origin

January 15, 2018

Generating Adversarial Examples with Adversarial Networks

Chaowei Xiao, Bo Li, Jun-Yan Zhu, Warren He, Mingyan Liu, Dawn Song

Deep neural networks (DNNs) have been found to be vulnerable to adversarial examples resulting from adding small-magnitude perturbations to inputs. Such adversarial examples can mislead DNNs to produce adversary-selected results. Different attack strategies have been proposed to generate adversarial examples, but how to produce them with high perceptual quality and more efficiently requires more research effort. In this paper, we propose AdvGAN to generate adversarial examples with generative adversarial networks (GANs), which can learn and approximate the distribution of original instances. For AdvGAN, once the generator is trained, it can generate adversarial perturbations efficiently for any instance, so as to potentially accelerate adversarial training as defenses. We apply AdvGAN in both semi-whitebox and black-box attack settings. In semi-whitebox attacks, there is no need to access the original target model after the generator is trained, in contrast to traditional white-box attacks. In black-box attacks, we dynamically train a distilled model for the black-box model and optimize the generator accordingly. Adversarial examples generated by AdvGAN on different target models have a high attack success rate under state-of-the-art defenses compared to other attacks. Our attack has placed first with 92.76% accuracy on a public MNIST black-box attack challenge.