This paper introduces FLEURS-R, a version of the Few-shot Learning Evaluation of Universal Representations of Speech (FLEURS) corpus to which speech restoration has been applied. Like FLEURS, FLEURS-R is an N-way parallel speech corpus in 102 languages, but with improved audio quality and fidelity obtained by applying the speech restoration model Miipher. The aim of FLEURS-R is to advance speech technology in more languages and catalyze research, including text-to-speech (TTS) and other speech generation tasks, in low-resource languages. Comprehensive evaluations of the restored speech and of TTS baseline models trained on the new corpus show that the new corpus has significantly improved speech quality while preserving the semantic content of the speech. The corpus is publicly released via Hugging Face.

Related content

Speech synthesis

Speech synthesis, also called text-to-speech (TTS), converts arbitrary input text into natural, fluent speech output. It draws on artificial intelligence, psychology, acoustics, linguistics, digital signal processing, computer science, and other disciplines, and is a frontier technology in the field of information processing. As computer technology has advanced, speech synthesis has evolved from early formant synthesis to waveform concatenation synthesis and statistical parametric speech synthesis, and then to hybrid speech synthesis; the quality and naturalness of synthesized speech have improved markedly and can largely meet the needs of certain specific applications. Today, speech synthesis is widely used in information announcement systems in banks and hospitals, car navigation systems, and automated call-center response systems, and has yielded substantial economic benefits. In addition, with the proliferation of smartphones, MP3 players, PDAs, and other media closely tied to daily life, applications of speech synthesis are gradually extending into entertainment, language teaching, rehabilitation therapy, and other fields. Speech synthesis can fairly be said to be affecting every aspect of people's lives.

INTERACT · Performer · Language modeling · Taxonomy · ROUGE

September 30, 2024

AmbigNLG: Addressing Task Ambiguity in Instruction for NLG

Ayana Niwa,Hayate Iso

from arxiv, EMNLP 2024

We introduce AmbigNLG, a novel task designed to tackle the challenge of task ambiguity in instructions for Natural Language Generation (NLG). Ambiguous instructions often impede the performance of Large Language Models (LLMs), especially in complex NLG tasks. To tackle this issue, we propose an ambiguity taxonomy that categorizes different types of instruction ambiguities and refines initial instructions with clearer specifications. Accompanying this task, we present AmbigSNI-NLG, a dataset comprising 2,500 instances annotated to facilitate research in AmbigNLG. Through comprehensive experiments with state-of-the-art LLMs, we demonstrate that our method significantly enhances the alignment of generated text with user expectations, achieving up to a 15.02-point increase in ROUGE scores. Our findings highlight the critical importance of addressing task ambiguity to fully harness the capabilities of LLMs in NLG tasks. Furthermore, we confirm the effectiveness of our method in practical settings involving interactive ambiguity mitigation with users, underscoring the benefits of leveraging LLMs for interactive clarification.
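Since the headline result above is reported in ROUGE points, a minimal sketch of one ROUGE variant may help make the metric concrete. The abstract does not say which variant was used, so the choice of ROUGE-L (longest common subsequence) is an assumption, as are the whitespace tokenization and the function names `lcs_length` and `rouge_l_f1`. This is not the paper's evaluation code.

```python
from typing import List


def lcs_length(a: List[str], b: List[str]) -> int:
    """Length of the longest common subsequence of two token lists."""
    prev = [0] * (len(b) + 1)
    for tok_a in a:
        curr = [0]
        for j, tok_b in enumerate(b, start=1):
            if tok_a == tok_b:
                curr.append(prev[j - 1] + 1)
            else:
                curr.append(max(prev[j], curr[j - 1]))
        prev = curr
    return prev[-1]


def rouge_l_f1(reference: str, candidate: str) -> float:
    """ROUGE-L F1 on lowercased, whitespace-split tokens (a simplification)."""
    ref, cand = reference.lower().split(), candidate.lower().split()
    if not ref or not cand:
        return 0.0
    lcs = lcs_length(ref, cand)
    if lcs == 0:
        return 0.0
    precision, recall = lcs / len(cand), lcs / len(ref)
    return 2 * precision * recall / (precision + recall)


if __name__ == "__main__":
    score = rouge_l_f1("the cat sat on the mat", "the cat is on the mat")
    print(round(score, 4))
```

Scores from this sketch lie in [0, 1]; papers that quote "points" typically multiply by 100.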

MoDELS · Language modeling · Decoding · Large language models · Inference

September 30, 2024

Reference Trustable Decoding: A Training-Free Augmentation Paradigm for Large Language Models

Luohe Shi,Yao Yao,Zuchao Li,Lefei Zhang,Hai Zhao

Large language models (LLMs) have rapidly advanced and demonstrated impressive capabilities. In-Context Learning (ICL) and Parameter-Efficient Fine-Tuning (PEFT) are currently the two mainstream methods for adapting LLMs to downstream tasks. ICL typically constructs a few-shot learning scenario, either manually or by setting up a Retrieval-Augmented Generation (RAG) system, helping models quickly grasp domain knowledge or question-answering patterns without changing model parameters. However, this approach involves trade-offs, such as slower inference and increased memory usage. PEFT helps the model adapt to tasks through minimal parameter modifications, but the training process still has high hardware requirements, even when only a small number of parameters is involved. To address these challenges, we propose Reference Trustable Decoding (RTD), a paradigm that allows models to quickly adapt to new tasks without fine-tuning while maintaining low inference costs. RTD constructs a reference datastore from the provided training examples and optimizes the LLM's final vocabulary distribution by flexibly selecting suitable references based on the input, resulting in more trustable responses and enabling the model to adapt to downstream tasks at a low cost. Experimental evaluations on various LLMs using different benchmarks demonstrate that RTD establishes a new paradigm for adapting models to downstream tasks. Furthermore, our method is strongly orthogonal to traditional methods, so the two can be used concurrently.
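The abstract describes a datastore of references whose nearest entries reshape the model's output distribution. The following is a rough sketch of that general idea in the style of nearest-neighbour interpolation. It is not the RTD algorithm itself: the distance function, the softmax-style weighting, the mixing weight `lam`, and all function names are illustrative assumptions.

```python
import math
from typing import Dict, List, Tuple

Vector = List[float]


def sq_dist(a: Vector, b: Vector) -> float:
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def reference_distribution(
    query: Vector,
    datastore: List[Tuple[Vector, str]],
    k: int,
    temperature: float = 1.0,
) -> Dict[str, float]:
    """Distribution over tokens built from the k references nearest to the query.

    Each datastore entry pairs a (hypothetical) hidden-state key with the token
    that followed it in a training example.
    """
    nearest = sorted(datastore, key=lambda kv: sq_dist(query, kv[0]))[:k]
    weights = [math.exp(-sq_dist(query, key) / temperature) for key, _ in nearest]
    total = sum(weights)
    dist: Dict[str, float] = {}
    for (_, token), w in zip(nearest, weights):
        dist[token] = dist.get(token, 0.0) + w / total
    return dist


def blend(lm: Dict[str, float], ref: Dict[str, float], lam: float = 0.5) -> Dict[str, float]:
    """Convex combination of the model's distribution and the reference one."""
    vocab = set(lm) | set(ref)
    return {t: (1 - lam) * lm.get(t, 0.0) + lam * ref.get(t, 0.0) for t in vocab}


if __name__ == "__main__":
    store = [([0.0, 0.0], "yes"), ([0.0, 1.0], "yes"), ([5.0, 5.0], "no")]
    ref = reference_distribution([0.0, 0.5], store, k=2)
    print(blend({"yes": 0.4, "no": 0.6}, ref, lam=0.5))
```

No model parameters change in this scheme; only the final distribution is adjusted, which is the property the abstract emphasizes.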

SimPLe · Performer · Origin · PAR · Input distribution

September 29, 2024

Making Quickhull More Like Quicksort: A Simple Randomized Output-Sensitive Convex Hull Algorithm

Michael T. Goodrich,Ryuto Kitagawa

In this paper, we present Ray-shooting Quickhull, which is a simple, randomized, output-sensitive version of the Quickhull algorithm for constructing the convex hull of a set of $n$ points in the plane. We show that the randomized Ray-shooting Quickhull algorithm runs in $O(n \log h)$ expected time, where $h$ is the number of points on the boundary of the convex hull. In keeping with the spirit of the original Quickhull algorithm, our algorithm is quite simple and is, in fact, closer in spirit to the well-known randomized Quicksort algorithm. Unlike the original Quickhull algorithm, however, which can run in $\Theta(n^2)$ time for some input distributions, the expected performance bounds for the randomized Ray-shooting Quickhull algorithm match or improve the performance bounds of more complicated algorithms. Importantly, the expectation in our output-sensitive performance bound does not depend on assumptions about the distribution of input points. Still, we show that, like the deterministic Quickhull algorithm, our randomized Ray-shooting Quickhull algorithm runs in $O(n)$ expected time for $n$ points chosen uniformly at random from a bounded convex region. We also provide experimental evidence that the randomized Ray-shooting Quickhull algorithm is on par with or faster than deterministic Quickhull in practice, depending on the input distribution.
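For readers who want the baseline in front of them, here is a minimal sketch of the classic deterministic Quickhull that the abstract compares against. It is not the Ray-shooting variant, whose pivot rule the abstract does not spell out. The function names and the choice to drop collinear boundary points are assumptions made for illustration.

```python
from typing import List, Tuple

Point = Tuple[float, float]


def cross(o: Point, a: Point, b: Point) -> float:
    """Twice the signed area of triangle o-a-b; positive if b lies left of o->a."""
    return (a[0] - o[0]) * (b[1] - o[1]) - (a[1] - o[1]) * (b[0] - o[0])


def _hull_side(pts: List[Point], a: Point, b: Point) -> List[Point]:
    """Hull vertices strictly left of a->b, ordered from a to b (both excluded)."""
    left = [p for p in pts if cross(a, b, p) > 0]
    if not left:
        return []
    # The point farthest from line a-b is guaranteed to be a hull vertex.
    far = max(left, key=lambda p: cross(a, b, p))
    return _hull_side(left, a, far) + [far] + _hull_side(left, far, b)


def quickhull(points: List[Point]) -> List[Point]:
    """Convex hull vertices in clockwise order, starting at the leftmost point."""
    pts = sorted(set(points))
    if len(pts) < 3:
        return pts
    lo, hi = pts[0], pts[-1]
    return [lo] + _hull_side(pts, lo, hi) + [hi] + _hull_side(pts, hi, lo)


if __name__ == "__main__":
    print(quickhull([(0, 0), (2, 0), (2, 2), (0, 2), (1, 1)]))
```

The deterministic farthest-point pivot is what allows the $\Theta(n^2)$ worst case: an adversarial input can make every split discard only a constant number of points, much like a bad pivot in Quicksort.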

MoDELS · Language modeling · Learning · ML · Machine Learning

September 27, 2024

LML: Language Model Learning a Dataset for Data-Augmented Prediction

Praneeth Vadlapati

from arxiv, First version

This paper introduces a new approach to using Large Language Models (LLMs) for classification tasks, which are typically handled using Machine Learning (ML) models. Unlike ML models that rely heavily on data cleaning and feature engineering, this method streamlines the process using LLMs. This paper proposes a new concept called "Language Model Learning (LML)" powered by a new method called "Data-Augmented Prediction (DAP)". The classification is performed by LLMs in a way similar to how humans manually explore and understand the data and then decide on classifications using the data as a reference. Training data is summarized and evaluated to determine which features contribute most to the classification of each label. In the DAP process, the system uses the data summary to automatically create a query, which is used to retrieve relevant rows from the dataset. The LLM then generates a classification using the data summary and the relevant rows, ensuring satisfactory accuracy even with complex data. The use of the data summary and similar data in DAP ensures context-aware decision-making. The proposed method uses the words "Act as an Explainable Machine Learning Model" in the prompt to enhance the interpretability of the predictions by allowing users to review the logic behind each prediction. In some test cases, the system scored an accuracy above 90%, indicating the effectiveness of the system and its potential to outperform conventional ML models in various scenarios. The code is available at //github.com/Pro-GenAI/LML-DAP

SOFT · Curvature · Channel · Sensor · SAC

September 27, 2024

Soft Acoustic Curvature Sensor: Design and Development

Mohammad Sheikh Sofla,Hanita Golshanian,Vishnu Rajendran S,Amir Ghalamzan E

from arxiv, To appear in Robotics and Automation Letter

This paper introduces a novel Soft Acoustic Curvature (SAC) sensor. SAC incorporates integrated audio components and features an acoustic channel within a flexible structure. A reference acoustic wave, generated by a speaker at one end of the channel, propagates and is received by a microphone at the other end of the channel. Our previous study revealed that acoustic wave energy dissipation varies with acoustic channel deformation, leading us to design a novel channel capable of large deformation due to bending. We then use Machine Learning (ML) models to establish a complex mapping between channel deformations and sound modulation. Various sound frequencies and ML models were evaluated to enhance curvature detection accuracy. The sensor, constructed using soft material and 3D printing, was validated experimentally, with curvature measurement errors remaining within 3.5 m$^{-1}$ for curvatures ranging from 0 to 60 m$^{-1}$. These results demonstrate the effectiveness of the proposed method for estimating curvatures. With its flexible structure, the SAC sensor holds potential for applications in soft robotics, including shape measurement for continuum manipulators, soft grippers, and wearable devices.

Optimizer · Estimation/Estimator · CF · Performer · Design

September 27, 2024

Optical ISAC: Fundamental Performance Limits and Transceiver Design

Alireza Ghazavi Khorasgani,Mahtab Mirmohseni,Ahmed Elzanaty

from arxiv, This paper is 8 pages long and includes 1 algorithm, 3 figures, and 3 tables. It has been accepted for presentation at the 2024 Global Communications Conference. For further discussion, please visit AlphaXiv or email the authors

This paper characterizes the optimal Capacity-Distortion (C-D) tradeoff in an optical point-to-point system with Single-Input Single-Output (SISO) for communication and Single-Input Multiple-Output (SIMO) for sensing within an Integrated Sensing and Communication (ISAC) framework. We consider the optimal Rate-Distortion (R-D) region and explore several Inner (IB) and Outer Bounds (OB). We introduce practical, asymptotically optimal Maximum A Posteriori (MAP) and Maximum Likelihood Estimators (MLE) for target distance, addressing nonlinear measurement-to-state relationships and non-conjugate priors. As the number of sensing antennas increases, these estimators converge to the Bayesian Cramér-Rao Bound (BCRB). We also establish that the achievable Rate-Cramér-Rao Bound (R-CRB) serves as an OB for the optimal C-D region, valid for both unbiased estimators and asymptotically large numbers of receive antennas. To clarify that the input distribution determines the tradeoff across the Pareto boundary of the C-D region, we propose two algorithms: i) an iterative Blahut-Arimoto Algorithm (BAA)-type method, and ii) a memory-efficient Closed-Form (CF) approach. The CF approach includes a CF optimal distribution for high Optical Signal-to-Noise Ratio (O-SNR) conditions. Additionally, we adapt and refine the Deterministic-Random Tradeoff (DRT) to this optical ISAC context.

Speech enhancement · CC · Boosting (a model training acceleration method) · Design · MoDELS

September 27, 2024

Speech Boosting: Low-Latency Live Speech Enhancement for TWS Earbuds

Hanbin Bae,Pavel Andreev,Azat Saginbaev,Nicholas Babaev,Won-Jun Lee,Hosang Sung,Hoon-Young Cho

from arxiv, Accepted by Interspeech 2024

This paper introduces a speech enhancement solution tailored for true wireless stereo (TWS) earbuds on-device usage. The solution was specifically designed to support conversations in noisy environments, with active noise cancellation (ANC) activated. The primary challenges for speech enhancement models in this context arise from computational complexity that limits on-device usage and latency that must be less than 3 ms to preserve a live conversation. To address these issues, we evaluated several crucial design elements, including the network architecture and domain, design of loss functions, pruning method, and hardware-specific optimization. Consequently, we demonstrated substantial improvements in speech enhancement quality compared with that in baseline models, while simultaneously reducing the computational complexity and algorithmic latency.

Sparsification · Biased · Learning · Analysis · MoDELS

September 27, 2024

Mask-Encoded Sparsification: Mitigating Biased Gradients in Communication-Efficient Split Learning

Wenxuan Zhou,Zhihao Qu,Shen-Huan Lyu,Miao Cai,Baoliu Ye

This paper introduces a novel framework designed to achieve a high compression ratio in Split Learning (SL) scenarios where resource-constrained devices are involved in large-scale model training. Our investigations demonstrate that compressing feature maps within SL leads to biased gradients that can negatively impact the convergence rates and diminish the generalization capabilities of the resulting models. Our theoretical analysis provides insights into how compression errors critically hinder SL performance, which previous methodologies underestimate. To address these challenges, we employ a narrow bit-width encoded mask to compensate for the sparsification error without increasing the order of time complexity. Supported by rigorous theoretical analysis, our framework significantly reduces compression errors and accelerates the convergence. Extensive experiments also verify that our method outperforms existing solutions regarding training efficiency and communication complexity.
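To illustrate why a small per-element mask can offset sparsification error, the sketch below compares plain top-k sparsification with a variant in which each dropped entry is replaced by the nearest of a few coarse levels, identified by a narrow mask code. The encoding shown (code 0 reserved for top-k entries, the remaining codes indexing evenly spaced levels) is an assumption for illustration and is not taken from the paper. The function names are hypothetical as well.

```python
from typing import List


def topk_sparsify(x: List[float], k: int) -> List[float]:
    """Keep the k largest-magnitude entries of x and zero out the rest."""
    order = sorted(range(len(x)), key=lambda i: abs(x[i]), reverse=True)
    keep = set(order[:k])
    return [v if i in keep else 0.0 for i, v in enumerate(x)]


def masked_sparsify(x: List[float], k: int, bits: int = 2) -> List[float]:
    """Top-k entries at full precision; others snapped to 2**bits - 1 coarse levels."""
    order = sorted(range(len(x)), key=lambda i: abs(x[i]), reverse=True)
    keep = set(order[:k])
    dropped = [x[i] for i in range(len(x)) if i not in keep]
    if not dropped:
        return list(x)
    lo, hi = min(dropped), max(dropped)
    n_levels = (1 << bits) - 1  # one mask code is reserved for "top-k entry"
    step = (hi - lo) / (n_levels - 1) if n_levels > 1 and hi > lo else 0.0
    out = []
    for i, v in enumerate(x):
        if i in keep:
            out.append(v)
        else:
            code = round((v - lo) / step) if step else 0
            out.append(lo + code * step)
    return out


def sq_error(a: List[float], b: List[float]) -> float:
    """Sum of squared differences between two equal-length lists."""
    return sum((p - q) ** 2 for p, q in zip(a, b))


if __name__ == "__main__":
    x = [4.0, -3.0, 0.5, -0.5, 0.2, 0.0]
    print(sq_error(x, topk_sparsify(x, 2)), sq_error(x, masked_sparsify(x, 2)))
```

On this toy vector the coarse levels recover most of the dropped mass at a cost of two bits per element. The improvement is not guaranteed for every input, since the levels here are not forced to include zero.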

INFORMS · Less · Optimizer · Model evaluation · Loss

September 26, 2024

SOAR: Self-supervision Optimized UAV Action Recognition with Efficient Object-Aware Pretraining

Ruiqi Xian,Xiyang Wu,Tianrui Guan,Xijun Wang,Boqing Gong,Dinesh Manocha

We introduce SOAR, a novel self-supervised pretraining algorithm for aerial footage captured by Unmanned Aerial Vehicles (UAVs). We incorporate human object knowledge throughout the pretraining process to enhance UAV video pretraining efficiency and downstream action recognition performance. This is in contrast to prior works that primarily incorporate object information during the fine-tuning stage. Specifically, we first propose a novel object-aware masking strategy designed to retain the visibility of certain patches related to objects throughout the pretraining phase. Second, we introduce an object-aware loss function that utilizes object information to adjust the reconstruction loss, preventing bias towards less informative background patches. In practice, SOAR with a vanilla ViT backbone outperforms the best UAV action recognition models, recording 9.7% and 21.4% boosts in top-1 accuracy on the NEC-Drone and UAV-Human datasets, respectively, while delivering an inference speed of 18.7 ms per video, making it 2x to 5x faster. Additionally, SOAR obtains comparable accuracy to prior self-supervised learning (SSL) methods while requiring 87.5% less pretraining time and 25% less memory usage.

MoDELS · Language modeling · Performer · Dataset · Large language models

September 26, 2024

Atlas-Chat: Adapting Large Language Models for Low-Resource Moroccan Arabic Dialect

Guokan Shang,Hadi Abdine,Yousef Khoubrane,Amr Mohamed,Yassine Abbahaddou,Sofiane Ennadir,Imane Momayiz,Xuguang Ren,Eric Moulines,Preslav Nakov,Michalis Vazirgiannis,Eric Xing

We introduce Atlas-Chat, the first-ever collection of large language models specifically developed for dialectal Arabic. Focusing on Moroccan Arabic, also known as Darija, we construct our instruction dataset by consolidating existing Darija language resources, creating novel datasets both manually and synthetically, and translating English instructions with stringent quality control. Atlas-Chat-9B and 2B models, fine-tuned on the dataset, exhibit superior ability in following Darija instructions and performing standard NLP tasks. Notably, our models outperform both state-of-the-art and Arabic-specialized LLMs like LLaMa, Jais, and AceGPT, e.g., achieving a 13% performance boost over a larger 13B model on DarijaMMLU, in our newly introduced evaluation suite for Darija covering both discriminative and generative tasks. Furthermore, we perform an experimental analysis of various fine-tuning strategies and base model choices to determine optimal configurations. All our resources are publicly accessible, and we believe our work offers comprehensive design methodologies of instruction-tuning for low-resource language variants, which are often neglected in favor of data-rich languages by contemporary LLMs.