We present \textbf{H}ybrid-\textbf{A}utoregressive \textbf{IN}ference Tr\textbf{AN}sducers (HAINAN), a novel architecture for speech recognition that extends the Token-and-Duration Transducer (TDT) model. Trained with randomly masked predictor network outputs, HAINAN supports both autoregressive inference with all network components and non-autoregressive inference without the predictor. Additionally, we propose a novel semi-autoregressive inference paradigm that first generates an initial hypothesis using non-autoregressive inference, followed by refinement steps where each token prediction is regenerated using parallelized autoregression on the initial hypothesis. Experiments on multiple datasets across different languages demonstrate that HAINAN achieves efficiency parity with CTC in non-autoregressive mode and with TDT in autoregressive mode. In terms of accuracy, autoregressive HAINAN outperforms TDT and RNN-T, while non-autoregressive HAINAN significantly outperforms CTC. Semi-autoregressive inference further enhances the model's accuracy with minimal computational overhead, even surpassing TDT in some cases. These results highlight HAINAN's flexibility in balancing accuracy and speed, positioning it as a strong candidate for real-world speech recognition applications.
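To make the semi-autoregressive paradigm concrete, here is a minimal sketch. The callables `nar_decode`, `predictor`, and `joint` are hypothetical stand-ins for the model components, and a fixed one-token-per-frame alignment is assumed for simplicity (the actual transducer alignment is more involved):

```python
import numpy as np

def semi_autoregressive_decode(enc_out, nar_decode, predictor, joint, n_refine=1):
    """Sketch of semi-autoregressive inference under simplifying assumptions."""
    # Step 1: initial hypothesis from non-autoregressive inference
    # (encoder + joint network only, no predictor).
    hyp = list(nar_decode(enc_out))
    for _ in range(n_refine):
        # Step 2: rebuild every prediction with the predictor, teacher-forcing
        # on the *previous* hypothesis; each position is independent of the
        # others, so this loop parallelizes across positions.
        states = [predictor(hyp[:i]) for i in range(len(hyp))]
        hyp = [int(np.argmax(joint(enc_out[i], s))) for i, s in enumerate(states)]
    return hyp
```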
We present Basis-to-Basis (B2B) operator learning, a novel approach for learning operators on Hilbert spaces of functions based on the foundational ideas of function encoders. We decompose the task of learning operators into two parts: learning sets of basis functions for both the input and output spaces, and learning a potentially nonlinear mapping between the coefficients of the basis functions. B2B operator learning circumvents many challenges of prior works, such as requiring data to be at fixed locations, by leveraging classic techniques such as least squares to compute the coefficients. It is especially potent for linear operators, where we compute a mapping between bases as a single matrix transformation with a closed-form solution. Furthermore, with minimal modifications and using the deep theoretical connections between function encoders and functional analysis, we derive operator learning algorithms that are directly analogous to eigen-decomposition and singular value decomposition. We empirically validate B2B operator learning on seven benchmark operator learning tasks and show that it achieves a two-orders-of-magnitude improvement in accuracy over existing approaches on several of them.
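For the linear-operator case, a minimal numpy sketch of the two steps, assuming basis functions for both spaces have already been learned; all names here are illustrative, not the authors' API:

```python
import numpy as np

def fit_coefficients(Phi, f_vals):
    """Least-squares coefficients of sampled functions in a learned basis.
    Phi: (n_points, n_basis) basis evaluated at the sample locations.
    f_vals: (n_points,) or (n_points, n_funcs) function samples."""
    return np.linalg.lstsq(Phi, f_vals, rcond=None)[0]

def fit_linear_operator(C_in, C_out):
    """Closed-form least-squares matrix M with C_out ~= M @ C_in.
    C_in: (k_in, n) input coefficients, C_out: (k_out, n) output coefficients
    for n training pairs (u_i, v_i) with v_i = A u_i."""
    return C_out @ np.linalg.pinv(C_in)

# Applying the learned operator to a new input u reduces to:
# v_hat(x) = Phi_out(x) @ (M @ fit_coefficients(Phi_in, u_vals))
```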
We introduce the \textit{Extract-Refine-Retrieve-Read} (ERRR) framework, a novel approach designed to bridge the pre-retrieval information gap in Retrieval-Augmented Generation (RAG) systems through query optimization tailored to meet the specific knowledge requirements of Large Language Models (LLMs). Unlike conventional query optimization techniques used in RAG, the ERRR framework begins by extracting parametric knowledge from LLMs, then uses a specialized query optimizer to refine retrieval queries in light of this knowledge. This process ensures the retrieval of only the most pertinent information essential for generating accurate responses. Moreover, to enhance flexibility and reduce computational costs, we propose a trainable scheme for our pipeline that utilizes a smaller, tunable model as the query optimizer, which is refined through knowledge distillation from a larger teacher model. Our evaluations on various question-answering (QA) datasets and with different retrieval systems show that ERRR consistently outperforms existing baselines, proving to be a versatile and cost-effective module for improving the utility and accuracy of RAG systems.
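A schematic of the four stages, with `llm`, `optimizer`, and `retriever` as hypothetical placeholders rather than the paper's actual interfaces:

```python
def errr(question, llm, optimizer, retriever):
    """Sketch of the Extract-Refine-Retrieve-Read loop; all callables and
    prompt strings are placeholders for illustration."""
    # 1. Extract: draft an answer from the LLM's parametric knowledge alone.
    draft = llm(f"Answer from memory: {question}")
    # 2. Refine: the query optimizer turns question + draft into retrieval queries.
    queries = optimizer(question, draft)
    # 3. Retrieve: gather evidence for each refined query.
    docs = [d for q in queries for d in retriever(q)]
    # 4. Read: generate the final, evidence-grounded answer.
    return llm(f"Question: {question}\nEvidence: {docs}\nAnswer:")
```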
Although text-to-image (T2I) models exhibit remarkable generation capabilities, they frequently fail to accurately bind semantically related objects or attributes in the input prompts, a challenge termed semantic binding. Previous approaches either involve intensive fine-tuning of the entire T2I model or require users or large language models to specify generation layouts, adding complexity. In this paper, we define semantic binding as the task of associating a given object with its attribute, termed attribute binding, or linking it to other related sub-objects, referred to as object binding. We introduce a novel method called Token Merging (ToMe), which enhances semantic binding by aggregating relevant tokens into a single composite token. This ensures that the object, its attributes, and its sub-objects all share the same cross-attention map. Additionally, to address potential confusion among main objects with complex textual prompts, we propose end token substitution as a complementary strategy. To further refine our approach in the initial stages of T2I generation, where layouts are determined, we incorporate two auxiliary losses, an entropy loss and a semantic binding loss, to iteratively update the composite token and improve generation integrity. We conducted extensive experiments to validate the effectiveness of ToMe, comparing it against various existing methods on T2I-CompBench and our proposed GPT-4o object binding benchmark. Our method is particularly effective in complex scenarios that involve multiple objects and attributes, which previous methods often fail to address. The code will be publicly available at \url{//github.com/hutaihang/ToMe}.
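A toy sketch of the core merging step, assuming mean pooling over a contiguous token group; the paper's actual aggregation scheme may differ:

```python
import torch

def merge_tokens(tok_emb, start, end):
    """Merge contiguous prompt-token embeddings tok_emb[start:end]
    (e.g., the tokens of "a red hat") into one composite token by mean
    pooling, so the object and its attribute tokens share a single
    cross-attention map. tok_emb: (L, d) text-encoder embeddings."""
    composite = tok_emb[start:end].mean(dim=0, keepdim=True)            # (1, d)
    return torch.cat([tok_emb[:start], composite, tok_emb[end:]], dim=0)
```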
This paper explores the application of Positive-Unlabeled (PU) learning for enhanced Distributed Denial-of-Service (DDoS) detection in cloud environments. Utilizing the $\texttt{BCCC-cPacket-Cloud-DDoS-2024}$ dataset, we implement PU learning with four machine learning algorithms: XGBoost, Random Forest, Support Vector Machine, and Na\"{i}ve Bayes. Our results demonstrate the superior performance of ensemble methods, with XGBoost and Random Forest achieving $F_{1}$ scores exceeding 98\%. We quantify the efficacy of each approach using metrics including $F_{1}$ score, ROC AUC, Recall, and Precision. This study bridges the gap between PU learning and cloud-based anomaly detection, providing a foundation for addressing Context-Aware DDoS Detection in multi-cloud environments. Our findings highlight the potential of PU learning in scenarios with limited labeled data, offering valuable insights for developing more robust and adaptive cloud security mechanisms.
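As one concrete instantiation, a minimal Elkan-Noto-style PU learning sketch with a Random Forest (one of the four classifiers above); the paper's exact PU scheme may differ:

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split

def pu_fit(X, s):
    """X: flow features; s: 1 for labeled-positive (known DDoS), 0 for unlabeled."""
    X_tr, X_hold, s_tr, s_hold = train_test_split(X, s, test_size=0.2, stratify=s)
    # Non-traditional classifier: labeled vs. unlabeled instead of pos vs. neg.
    clf = RandomForestClassifier(n_estimators=200).fit(X_tr, s_tr)
    # c = P(labeled | positive), estimated on held-out labeled positives.
    c = clf.predict_proba(X_hold[s_hold == 1])[:, 1].mean()
    return clf, c

def pu_predict_proba(clf, c, X):
    # Correct the labeled-vs-unlabeled score into P(positive | x).
    return np.clip(clf.predict_proba(X)[:, 1] / c, 0.0, 1.0)
```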
Text-to-Speech (TTS) systems face ongoing challenges in processing complex linguistic features, handling polyphonic expressions, and producing natural-sounding multilingual speech, capabilities that are crucial for future AI applications. In this paper, we present Fish-Speech, a novel framework that implements a serial fast-slow Dual Autoregressive (Dual-AR) architecture to enhance the stability of Grouped Finite Scalar Vector Quantization (GFSQ) in sequence generation tasks. This architecture improves codebook processing efficiency while maintaining high-fidelity outputs, making it particularly effective for AI interactions and voice cloning. Fish-Speech leverages Large Language Models (LLMs) for linguistic feature extraction, eliminating the need for traditional grapheme-to-phoneme (G2P) conversion and thereby streamlining the synthesis pipeline and enhancing multilingual support. Additionally, we develop FF-GAN through GFSQ to achieve superior compression ratios and near 100\% codebook utilization. Our approach addresses key limitations of current TTS systems while providing a foundation for more sophisticated, context-aware speech synthesis. Experimental results show that Fish-Speech significantly outperforms baseline models in handling complex linguistic scenarios and voice cloning tasks, demonstrating its potential to advance TTS technology in AI applications. The implementation is open source at \href{//github.com/fishaudio/fish-speech}{//github.com/fishaudio/fish-speech}.
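A toy finite-scalar-quantization step with grouped channels, purely to illustrate the FSQ family of quantizers; Fish-Speech's actual GFSQ details may differ:

```python
import torch

def gfsq_step(z, levels=8, groups=4):
    """Illustrative grouped finite-scalar quantization: bound each channel,
    round it to one of `levels` grid values, and view the channel dimension
    as `groups` implicit codebooks. The last dimension of z must be
    divisible by `groups`."""
    zb = torch.tanh(z)                                 # bound each channel to (-1, 1)
    codes = torch.round((zb + 1) / 2 * (levels - 1))   # integer level per channel
    z_q = codes / (levels - 1) * 2 - 1                 # dequantize back to the grid
    z_q = zb + (z_q - zb).detach()                     # straight-through gradients
    return z_q, codes.reshape(*z.shape[:-1], groups, -1).long()
```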
We propose a novel efficient and robust Wavelet-based Edge Multiscale Finite Element Method (WEMsFEM) motivated by \cite{MR3980476,GL18} to solve the singularly perturbed convection-diffusion equations. The main idea is to first establish a local splitting of the solution over a local region into a local bubble part and a local harmonic extension part, and then derive a global splitting by means of Partition of Unity. This facilitates a representation of the solution as a summation of a global bubble part and a global harmonic extension part, where the first part can be computed locally in parallel. To approximate the second part, we construct an edge multiscale ansatz space locally with hierarchical bases as the local boundary data that has a guaranteed approximation rate both inside and outside of the layers. The key innovation of this proposed WEMsFEM lies in a provable convergence rate with little restriction on the mesh size. Its convergence rate with respect to the computational degree of freedom is rigorously analyzed, which is verified by extensive 2-d and 3-d numerical tests.
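For concreteness, a standard singularly perturbed model problem of the kind referenced above (the paper's precise setting may differ) is
\begin{equation*}
-\varepsilon \Delta u + \boldsymbol{\beta}\cdot\nabla u + c\,u = f \ \text{ in } \Omega, \qquad u = 0 \ \text{ on } \partial\Omega, \qquad 0 < \varepsilon \ll 1,
\end{equation*}
whose solution is split as $u = u_{\mathrm{bub}} + u_{\mathrm{har}}$: a bubble part computed locally in parallel, plus a harmonic extension part approximated in the edge multiscale ansatz space.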
We introduce FinDVer, a comprehensive benchmark specifically designed to evaluate the explainable claim verification capabilities of LLMs in the context of understanding and analyzing long, hybrid-content financial documents. FinDVer contains 2,400 expert-annotated examples, divided into three subsets: information extraction, numerical reasoning, and knowledge-intensive reasoning, each addressing common scenarios encountered in real-world financial contexts. We assess a broad spectrum of LLMs under long-context and RAG settings. Our results show that even the current best-performing system, GPT-4o, still lags behind human experts. We further provide in-depth analysis on long-context and RAG settings, Chain-of-Thought reasoning, and model reasoning errors, offering insights to drive future advancements. We believe that FinDVer can serve as a valuable benchmark for evaluating LLMs in claim verification over complex, expert-domain documents.
This research presents FDASynthesis, a novel algorithm designed to generate synthetic GPS trajectory data while preserving privacy. After pre-processing the input GPS data, human mobility traces are modeled as multidimensional curves using Functional Data Analysis (FDA). Then, the synthesis process identifies the $K$ nearest trajectories and averages their Square-Root Velocity Functions (SRVFs) to generate synthetic data. This results in synthetic trajectories that maintain the utility of the original data while ensuring privacy. Although applied to human mobility research, FDASynthesis is highly adaptable to different types of functional data, offering a scalable solution in various application domains.
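A simplified numpy sketch of the SRVF transform and the neighbor-averaging synthesis step; pre-processing and the nearest-neighbor search are omitted:

```python
import numpy as np

def srvf(traj, dt=1.0):
    """Square-Root Velocity Function of a curve sampled as (T, d) points:
    q(t) = f'(t) / sqrt(||f'(t)||)."""
    v = np.gradient(traj, dt, axis=0)                      # velocity (T, d)
    speed = np.linalg.norm(v, axis=1, keepdims=True)
    return v / np.sqrt(np.maximum(speed, 1e-12))

def synthesize(neighbor_srvfs, start, dt=1.0):
    """Average the SRVFs of the K nearest trajectories, then invert the
    transform (v = q * ||q||) and integrate to recover a synthetic curve."""
    q = np.mean(neighbor_srvfs, axis=0)                    # (T, d)
    v = q * np.linalg.norm(q, axis=1, keepdims=True)
    return start + np.cumsum(v, axis=0) * dt
```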
We introduce a multi-task setup of identifying and classifying entities, relations, and coreference clusters in scientific articles. We create SciERC, a dataset that includes annotations for all three tasks, and develop a unified framework called Scientific Information Extractor (SciIE) with shared span representations. The multi-task setup reduces cascading errors between tasks and leverages cross-sentence relations through coreference links. Experiments show that our multi-task model outperforms previous models in scientific information extraction without using any domain-specific features. We further show that the framework supports construction of a scientific knowledge graph, which we use to analyze information in scientific literature.
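A toy illustration of the shared-representation idea: one span representation feeding three task-specific heads. Dimensions and scoring functions are illustrative, not the paper's actual architecture:

```python
import torch
import torch.nn as nn

class SharedSpanHeads(nn.Module):
    """One shared span representation, three task heads."""
    def __init__(self, d, n_entity_types, n_relation_types=8):
        super().__init__()
        self.ent = nn.Linear(2 * d, n_entity_types)            # entity type per span
        self.rel = nn.Bilinear(2 * d, 2 * d, n_relation_types)  # relation per span pair
        self.coref = nn.Bilinear(2 * d, 2 * d, 1)                # coreference score per pair

    def span_repr(self, h, i, j):
        # Shared representation: concatenated encoder states at the span endpoints.
        return torch.cat([h[i], h[j]], dim=-1)

# Usage: s1, s2 = model.span_repr(h, 3, 5), model.span_repr(h, 8, 9)
# entity_logits = model.ent(s1); relation_logits = model.rel(s1, s2)
```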
State-of-the-art Convolutional Neural Networks (CNNs) benefit greatly from multi-task learning (MTL), which learns multiple related tasks simultaneously to obtain shared or mutually related representations for different tasks. The most widely-used MTL CNN structure is based on an empirical or heuristic split at a specific layer (e.g., the last convolutional layer) to minimize different task-specific losses. However, this heuristic sharing/splitting strategy may be harmful to the final performance of one or multiple tasks. In this paper, we propose a novel CNN structure for MTL, which enables automatic feature fusing at every layer. Specifically, we first concatenate features from different tasks along their channel dimension, and then formulate the feature fusing problem as discriminative dimensionality reduction. We show that this discriminative dimensionality reduction can be done by $1\times 1$ convolution, Batch Normalization, and Weight Decay in one CNN, which we refer to as Neural Discriminative Dimensionality Reduction (NDDR). We perform detailed ablation analysis of different configurations in training the network. The experiments carried out on different network structures and different task sets demonstrate the promising performance and desirable generalizability of our proposed method.
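A minimal PyTorch sketch of an NDDR-style fusion layer for two tasks; the paper's initialization details are omitted, and weight decay is supplied by the optimizer:

```python
import torch
import torch.nn as nn

class NDDRLayer(nn.Module):
    """Concatenate two task-specific feature maps along channels, then
    reduce back to each task's width with a 1x1 convolution + BatchNorm."""
    def __init__(self, c):
        super().__init__()
        self.fuse1 = nn.Sequential(nn.Conv2d(2 * c, c, kernel_size=1), nn.BatchNorm2d(c))
        self.fuse2 = nn.Sequential(nn.Conv2d(2 * c, c, kernel_size=1), nn.BatchNorm2d(c))

    def forward(self, x1, x2):
        x = torch.cat([x1, x2], dim=1)      # (N, 2C, H, W)
        return self.fuse1(x), self.fuse2(x)  # one fused map per task

# Weight decay enters through the optimizer, e.g.:
# optimizer = torch.optim.SGD(model.parameters(), lr=1e-3, weight_decay=5e-4)
```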