亚洲男人的天堂2018av,欧美草比,久久久久久免费视频精选,国色天香在线看免费,久久久久亚洲av成人片仓井空

This paper introduces FLEURS-R, a speech restoration applied version of the Few-shot Learning Evaluation of Universal Representations of Speech (FLEURS) corpus. FLEURS-R maintains an N-way parallel speech corpus in 102 languages as FLEURS, with improved audio quality and fidelity by applying the speech restoration model Miipher. The aim of FLEURS-R is to advance speech technology in more languages and catalyze research including text-to-speech (TTS) and other speech generation tasks in low-resource languages. Comprehensive evaluations with the restored speech and TTS baseline models trained from the new corpus show that the new corpus obtained significantly improved speech quality while maintaining the semantic contents of the speech. The corpus is publicly released via Hugging Face.

相關內容

語(yu)(yu)(yu)(yu)(yu)音(yin)合(he)(he)成(cheng)(cheng)(Speech Synthesis),也(ye)稱為文(wen)語(yu)(yu)(yu)(yu)(yu)轉(zhuan)換(huan)(Text-to-Speech, TTS,它是將任意的(de)(de)(de)輸(shu)入(ru)文(wen)本(ben)(ben)轉(zhuan)換(huan)成(cheng)(cheng)自然流暢的(de)(de)(de)語(yu)(yu)(yu)(yu)(yu)音(yin)輸(shu)出。語(yu)(yu)(yu)(yu)(yu)音(yin)合(he)(he)成(cheng)(cheng)涉及到(dao)(dao)人(ren)工智(zhi)能、心(xin)理學(xue)、聲學(xue)、語(yu)(yu)(yu)(yu)(yu)言(yan)學(xue)、數字信號處理、計算(suan)機(ji)科學(xue)等(deng)(deng)多個學(xue)科技術,是信息(xi)處理領(ling)域中(zhong)的(de)(de)(de)一項前沿技術。 隨著計算(suan)機(ji)技術的(de)(de)(de)不斷提(ti)高,語(yu)(yu)(yu)(yu)(yu)音(yin)合(he)(he)成(cheng)(cheng)技術從早期的(de)(de)(de)共振峰合(he)(he)成(cheng)(cheng),逐步發展為波形拼接合(he)(he)成(cheng)(cheng)和統(tong)計參數語(yu)(yu)(yu)(yu)(yu)音(yin)合(he)(he)成(cheng)(cheng),再發展到(dao)(dao)混合(he)(he)語(yu)(yu)(yu)(yu)(yu)音(yin)合(he)(he)成(cheng)(cheng);合(he)(he)成(cheng)(cheng)語(yu)(yu)(yu)(yu)(yu)音(yin)的(de)(de)(de)質量(liang)、自然度(du)已經(jing)得(de)到(dao)(dao)明顯提(ti)高,基本(ben)(ben)能滿足一些特(te)定場合(he)(he)的(de)(de)(de)應(ying)用需求。目(mu)前,語(yu)(yu)(yu)(yu)(yu)音(yin)合(he)(he)成(cheng)(cheng)技術在(zai)銀行、醫(yi)院等(deng)(deng)的(de)(de)(de)信息(xi)播報系(xi)統(tong)、汽車導(dao)航系(xi)統(tong)、自動應(ying)答呼叫中(zhong)心(xin)等(deng)(deng)都有廣泛應(ying)用,取得(de)了(le)巨(ju)大的(de)(de)(de)經(jing)濟效益。 另外,隨著智(zhi)能手機(ji)、MP3、PDA 等(deng)(deng)與我們(men)生(sheng)活密(mi)切(qie)相(xiang)關(guan)的(de)(de)(de)媒介(jie)的(de)(de)(de)大量(liang)涌現,語(yu)(yu)(yu)(yu)(yu)音(yin)合(he)(he)成(cheng)(cheng)的(de)(de)(de)應(ying)用也(ye)在(zai)逐漸(jian)向娛樂、語(yu)(yu)(yu)(yu)(yu)音(yin)教學(xue)、康復治療等(deng)(deng)領(ling)域深入(ru)。可以(yi)說語(yu)(yu)(yu)(yu)(yu)音(yin)合(he)(he)成(cheng)(cheng)正在(zai)影(ying)響著人(ren)們(men)生(sheng)活的(de)(de)(de)方(fang)方(fang)面(mian)面(mian)。

We introduce AmbigNLG, a novel task designed to tackle the challenge of task ambiguity in instructions for Natural Language Generation (NLG). Ambiguous instructions often impede the performance of Large Language Models (LLMs), especially in complex NLG tasks. To tackle this issue, we propose an ambiguity taxonomy that categorizes different types of instruction ambiguities and refines initial instructions with clearer specifications. Accompanying this task, we present AmbigSNI-NLG, a dataset comprising 2,500 instances annotated to facilitate research in AmbigNLG. Through comprehensive experiments with state-of-the-art LLMs, we demonstrate that our method significantly enhances the alignment of generated text with user expectations, achieving up to a 15.02-point increase in ROUGE scores. Our findings highlight the critical importance of addressing task ambiguity to fully harness the capabilities of LLMs in NLG tasks. Furthermore, we confirm the effectiveness of our method in practical settings involving interactive ambiguity mitigation with users, underscoring the benefits of leveraging LLMs for interactive clarification.

Large language models (LLMs) have rapidly advanced and demonstrated impressive capabilities. In-Context Learning (ICL) and Parameter-Efficient Fine-Tuning (PEFT) are currently two mainstream methods for augmenting LLMs to downstream tasks. ICL typically constructs a few-shot learning scenario, either manually or by setting up a Retrieval-Augmented Generation (RAG) system, helping models quickly grasp domain knowledge or question-answering patterns without changing model parameters. However, this approach involves trade-offs, such as slower inference speed and increased space occupancy. PEFT assists the model in adapting to tasks through minimal parameter modifications, but the training process still demands high hardware requirements, even with a small number of parameters involved. To address these challenges, we propose Reference Trustable Decoding (RTD), a paradigm that allows models to quickly adapt to new tasks without fine-tuning, maintaining low inference costs. RTD constructs a reference datastore from the provided training examples and optimizes the LLM's final vocabulary distribution by flexibly selecting suitable references based on the input, resulting in more trustable responses and enabling the model to adapt to downstream tasks at a low cost. Experimental evaluations on various LLMs using different benchmarks demonstrate that RTD establishes a new paradigm for augmenting models to downstream tasks. Furthermore, our method exhibits strong orthogonality with traditional methods, allowing for concurrent usage.

In this paper, we present Ray-shooting Quickhull, which is a simple, randomized, outputsensitive version of the Quickhull algorithm for constructing the convex hull of a set of n points in the plane. We show that the randomized Ray-shooting Quickhull algorithm runs in O(n log h) expected time, where h is the number of points on the boundary of the convex hull. Keeping with the spirit of the original Quickhull algorithm, our algorithm is quite simple and is, in fact, closer in spirit to the well-known randomized Quicksort algorithm. Unlike the original Quickhull algorithm, however, which can run in ${\Theta}(n^2) time$ for some input distributions, the expected performance bounds for the randomized Ray-shooting Quickhull algorithm match or improve the performance bounds of more complicated algorithms. Importantly, the expectation in our output-sensitive performance bound does not depend on assumptions about the distribution of input points. Still, we show that, like the deterministic Quickhull algorithm, our randomized Ray-shooting Quickhull algorithm runs in O(n) expected time for n points chosen uniformly at random from a bounded convex region. We also provide experimental evidence that the randomized Ray-shooting Quickhull algorithm is on par or faster than deterministic Quickhull in practice, depending on the input distribution.

This paper introduces a new approach to using Large Language Models (LLMs) for classification tasks, which are typically handled using Machine Learning (ML) models. Unlike ML models that rely heavily on data cleaning and feature engineering, this method streamlines the process using LLMs. This paper proposes a new concept called "Language Model Learning (LML)" powered by a new method called "Data-Augmented Prediction (DAP)". The classification is performed by LLMs using a method similar to humans manually exploring and understanding the data and deciding classifications using data as a reference. Training data is summarized and evaluated to determine the features that lead to the classification of each label the most. In the process of DAP, the system uses the data summary to automatically create a query, which is used to retrieve relevant rows from the dataset. A classification is generated by the LLM using data summary and relevant rows, ensuring satisfactory accuracy even with complex data. Usage of data summary and similar data in DAP ensures context-aware decision-making. The proposed method uses the words "Act as an Explainable Machine Learning Model" in the prompt to enhance the interpretability of the predictions by allowing users to review the logic behind each prediction. In some test cases, the system scored an accuracy above 90%, proving the effectiveness of the system and its potential to outperform conventional ML models in various scenarios. The code is available at //github.com/Pro-GenAI/LML-DAP

This paper introduces a novel Soft Acoustic Curvature (SAC) sensor. SAC incorporates integrated audio components and features an acoustic channel within a flexible structure. A reference acoustic wave, generated by a speaker at one end of the channel, propagates and is received by a microphone at the other channel's end. Our previous study revealed that acoustic wave energy dissipation varies with acoustic channel deformation, leading us to design a novel channel capable of large deformation due to bending. We then use Machine Learning (ML) models to establish a complex mapping between channel deformations and sound modulation. Various sound frequencies and ML models were evaluated to enhance curvature detection accuracy. The sensor, constructed using soft material and 3D printing, was validated experimentally, with curvature measurement errors remaining within 3.5 m-1 for a range of 0 to 60 m-1 curvatures. These results demonstrate the effectiveness of the proposed method for estimating curvatures. With its flexible structure, the SAC sensor holds potential for applications in soft robotics, including shape measurement for continuum manipulators, soft grippers, and wearable devices.

This paper characterizes the optimal Capacity-Distortion (C-D) tradeoff in an optical point-to-point system with Single-Input Single-Output (SISO) for communication and Single-Input Multiple-Output (SIMO) for sensing within an Integrated Sensing and Communication (ISAC) framework. We consider the optimal Rate-Distortion (R-D) region and explore several Inner (IB) and Outer Bounds (OB). We introduce practical, asymptotically optimal Maximum A Posteriori (MAP) and Maximum Likelihood Estimators (MLE) for target distance, addressing nonlinear measurement-to-state relationships and non-conjugate priors. As the number of sensing antennas increases, these estimators converge to the Bayesian Cram\'er-Rao Bound (BCRB). We also establish that the achievable Rate-Cram\'er-Rao Bound (R-CRB) serves as an OB for the optimal C-D region, valid for both unbiased estimators and asymptotically large numbers of receive antennas. To clarify that the input distribution determines the tradeoff across the Pareto boundary of the C-D region, we propose two algorithms: i) an iterative Blahut-Arimoto Algorithm (BAA)-type method, and ii) a memory-efficient Closed-Form (CF) approach. The CF approach includes a CF optimal distribution for high Optical Signal-to-Noise Ratio (O-SNR) conditions. Additionally, we adapt and refine the Deterministic-Random Tradeoff (DRT) to this optical ISAC context.

This paper introduces a speech enhancement solution tailored for true wireless stereo (TWS) earbuds on-device usage. The solution was specifically designed to support conversations in noisy environments, with active noise cancellation (ANC) activated. The primary challenges for speech enhancement models in this context arise from computational complexity that limits on-device usage and latency that must be less than 3 ms to preserve a live conversation. To address these issues, we evaluated several crucial design elements, including the network architecture and domain, design of loss functions, pruning method, and hardware-specific optimization. Consequently, we demonstrated substantial improvements in speech enhancement quality compared with that in baseline models, while simultaneously reducing the computational complexity and algorithmic latency.

This paper introduces a novel framework designed to achieve a high compression ratio in Split Learning (SL) scenarios where resource-constrained devices are involved in large-scale model training. Our investigations demonstrate that compressing feature maps within SL leads to biased gradients that can negatively impact the convergence rates and diminish the generalization capabilities of the resulting models. Our theoretical analysis provides insights into how compression errors critically hinder SL performance, which previous methodologies underestimate. To address these challenges, we employ a narrow bit-width encoded mask to compensate for the sparsification error without increasing the order of time complexity. Supported by rigorous theoretical analysis, our framework significantly reduces compression errors and accelerates the convergence. Extensive experiments also verify that our method outperforms existing solutions regarding training efficiency and communication complexity.

We introduce SOAR, a novel Self-supervised pretraining algorithm for aerial footage captured by Unmanned Aerial Vehicles (UAVs). We incorporate human object knowledge throughout the pretraining process to enhance UAV video pretraining efficiency and downstream action recognition performance. This is in contrast to prior works that primarily incorporate object information during the fine-tuning stage. Specifically, we first propose a novel object-aware masking strategy designed to retain the visibility of certain patches related to objects throughout the pretraining phase. Second, we introduce an object-aware loss function that utilizes object information to adjust the reconstruction loss, preventing bias towards less informative background patches. In practice, SOAR with a vanilla ViT backbone, outperforms best UAV action recognition models, recording a 9.7% and 21.4% boost in top-1 accuracy on the NEC-Drone and UAV-Human datasets, while delivering an inference speed of 18.7ms per video, making it 2x to 5x faster. Additionally, SOAR obtains comparable accuracy to prior self-supervised learning (SSL) methods while requiring 87.5% less pretraining time and 25% less memory usage

We introduce Atlas-Chat, the first-ever collection of large language models specifically developed for dialectal Arabic. Focusing on Moroccan Arabic, also known as Darija, we construct our instruction dataset by consolidating existing Darija language resources, creating novel datasets both manually and synthetically, and translating English instructions with stringent quality control. Atlas-Chat-9B and 2B models, fine-tuned on the dataset, exhibit superior ability in following Darija instructions and performing standard NLP tasks. Notably, our models outperform both state-of-the-art and Arabic-specialized LLMs like LLaMa, Jais, and AceGPT, e.g., achieving a 13% performance boost over a larger 13B model on DarijaMMLU, in our newly introduced evaluation suite for Darija covering both discriminative and generative tasks. Furthermore, we perform an experimental analysis of various fine-tuning strategies and base model choices to determine optimal configurations. All our resources are publicly accessible, and we believe our work offers comprehensive design methodologies of instruction-tuning for low-resource language variants, which are often neglected in favor of data-rich languages by contemporary LLMs.

北京阿比特科技有限公司