91婷婷国产精选国产色_国产无遮挡又黄又爽不要VIP软_国产精品亚洲A天堂_国产人成综合色国产成人综合频道_国产一级理论片在线观看_AVTT天堂东京热一道本_三级片视频网站国产

This paper explores the pitfalls in evaluating multilingual automatic speech recognition (ASR) models, with a particular focus on Indic language scripts. We investigate the text normalization routine employed by leading ASR models, including OpenAI Whisper, Meta's MMS, Seamless, and Assembly AI's Conformer, and their unintended consequences on performance metrics. Our research reveals that current text normalization practices, while aiming to standardize ASR outputs for fair comparison, by removing inconsistencies such as variations in spelling, punctuation, and special characters, are fundamentally flawed when applied to Indic scripts. Through empirical analysis using text similarity scores and in-depth linguistic examination, we demonstrate that these flaws lead to artificially improved performance metrics for Indic languages. We conclude by proposing a shift towards developing text normalization routines that leverage native linguistic expertise, ensuring more robust and accurate evaluations of multilingual ASR models.

相關內容

語音識別

關注 753

語音識別是計算機科學和計算語言學的一個跨學科子領域，它發展了一些方法和技術，使計算機可以將口語識別和翻譯成文本。它也被稱為自動語音識別（ASR），計算機語音識別或語音轉文本（STT）。它整合了計算機科學，語言學和計算機工程領域的知識和研究。

MoDELS · 設計 · 生成模型 · 離散化 · 余弦 ·

2024 年 12 月 19 日

DCTdiff: Intriguing Properties of Image Generative Modeling in the DCT Space

Mang Ning,Mingxiao Li,Jianlin Su,Haozhe Jia,Lanmiao Liu,Martin Bene?,Albert Ali Salah,Itir Onal Ertugrul

from arxiv, 23 pages

This paper explores image modeling from the frequency space and introduces DCTdiff, an end-to-end diffusion generative paradigm that efficiently models images in the discrete cosine transform (DCT) space. We investigate the design space of DCTdiff and reveal the key design factors. Experiments on different frameworks (UViT, DiT), generation tasks, and various diffusion samplers demonstrate that DCTdiff outperforms pixel-based diffusion models regarding generative quality and training efficiency. Remarkably, DCTdiff can seamlessly scale up to high-resolution generation without using the latent diffusion paradigm. Finally, we illustrate several intriguing properties of DCT image modeling. For example, we provide a theoretical proof of why `image diffusion can be seen as spectral autoregression', bridging the gap between diffusion and autoregressive models. The effectiveness of DCTdiff and the introduced properties suggest a promising direction for image modeling in the frequency space. The code is at \url{//github.com/forever208/DCTdiff}.

MoDELS · 潛在 · state-of-the-art · 情景 · 設計 ·

2024 年 12 月 19 日

EnergyMoGen: Compositional Human Motion Generation with Energy-Based Diffusion Model in Latent Space

Jianrong Zhang,Hehe Fan,Yi Yang

from arxiv, Project page: //jiro-zhang.github.io/EnergyMoGen/

Diffusion models, particularly latent diffusion models, have demonstrated remarkable success in text-driven human motion generation. However, it remains challenging for latent diffusion models to effectively compose multiple semantic concepts into a single, coherent motion sequence. To address this issue, we propose EnergyMoGen, which includes two spectrums of Energy-Based Models: (1) We interpret the diffusion model as a latent-aware energy-based model that generates motions by composing a set of diffusion models in latent space; (2) We introduce a semantic-aware energy model based on cross-attention, which enables semantic composition and adaptive gradient descent for text embeddings. To overcome the challenges of semantic inconsistency and motion distortion across these two spectrums, we introduce Synergistic Energy Fusion. This design allows the motion latent diffusion model to synthesize high-quality, complex motions by combining multiple energy terms corresponding to textual descriptions. Experiments show that our approach outperforms existing state-of-the-art models on various motion generation tasks, including text-to-motion generation, compositional motion generation, and multi-concept motion generation. Additionally, we demonstrate that our method can be used to extend motion datasets and improve the text-to-motion task.

圖 · CASES · MoDELS · Engineering · CASE ·

2024 年 12 月 18 日

Localized RETE for Incremental Graph Queries with Nested Graph Conditions

Matthias Barkowsky,Holger Giese

from arxiv, arXiv admin note: substantial text overlap with arXiv:2405.01145

The growing size of graph-based modeling artifacts in model-driven engineering calls for techniques that enable efficient execution of graph queries. Incremental approaches based on the RETE algorithm provide an adequate solution in many scenarios, but are generally designed to search for query results over the entire graph. However, in certain situations, a user may only be interested in query results for a subgraph, for instance when a developer is working on a large model of which only a part is loaded into their workspace. In this case, the global execution semantics can result in significant computational overhead. To mitigate the outlined shortcoming, in this article we propose an extension of the RETE approach that enables local, yet fully incremental execution of graph queries, while still guaranteeing completeness of results with respect to the relevant subgraph. We empirically evaluate the presented approach via experiments inspired by a scenario from software development and with queries and data from an independent social network benchmark. The experimental results indicate that the proposed technique can significantly improve performance regarding memory consumption and execution time in favorable cases, but may incur a noticeable overhead in unfavorable cases.

MoDELS · 優化器 · 推斷 · 混合專家模型 · 知識 (knowledge) ·

2024 年 12 月 18 日

A Survey on Inference Optimization Techniques for Mixture of Experts Models

Jiacheng Liu,Peng Tang,Wenfeng Wang,Yuhang Ren,Xiaofeng Hou,Pheng-Ann Heng,Minyi Guo,Chao Li

from arxiv, Work in Progress

The emergence of large-scale Mixture of Experts (MoE) models has marked a significant advancement in artificial intelligence, offering enhanced model capacity and computational efficiency through conditional computation. However, the deployment and inference of these models present substantial challenges in terms of computational resources, latency, and energy efficiency. This comprehensive survey systematically analyzes the current landscape of inference optimization techniques for MoE models across the entire system stack. We first establish a taxonomical framework that categorizes optimization approaches into model-level, system-level, and hardware-level optimizations. At the model level, we examine architectural innovations including efficient expert design, attention mechanisms, various compression techniques such as pruning, quantization, and knowledge distillation, as well as algorithm improvement including dynamic routing strategies and expert merging methods. At the system level, we investigate distributed computing approaches, load balancing mechanisms, and efficient scheduling algorithms that enable scalable deployment. Furthermore, we delve into hardware-specific optimizations and co-design strategies that maximize throughput and energy efficiency. This survey not only provides a structured overview of existing solutions but also identifies key challenges and promising research directions in MoE inference optimization. Our comprehensive analysis serves as a valuable resource for researchers and practitioners working on large-scale deployment of MoE models in resource-constrained environments. To facilitate ongoing updates and the sharing of cutting-edge advances in MoE inference optimization research, we have established a repository accessible at \url{//github.com/MoE-Inf/awesome-moe-inference/}.

Learning · GROUP · AI · INFORMS · ChatGPT ·

2024 年 12 月 18 日

Evaluating the Effect of Pretesting with Conversational AI on Retention of Needed Information

Mahir Akgun,Sacip Toker

This study explores the role of pretesting when integrated with conversational AI tools, specifically ChatGPT, in enhancing learning outcomes. Drawing on existing research, which demonstrates the benefits of pretesting in memory activation and retention, this experiment extends these insights into the context of digital learning environments. A randomized true experimental study was utilized. Participants were divided into two groups: one engaged in pretesting before using ChatGPT for a problem-solving task involving chi-square analysis, while the control group accessed ChatGPT immediately. The results indicate that the pretest group significantly outperformed the no-pretest group in a subsequent test, which suggests that pretesting enhances the retention of complex material. This study contributes to the field by demonstrating that pretesting can augment the learning process in technology-assisted environments by preparing the memory and promoting active engagement with the material. The findings also suggest that learning strategies like pretesting retain their relevance in the context of rapidly evolving AI technologies. Further research and practical implications are presented.

置信度 · Processing（編程語言） · 估計/估計量 · MoDELS · 泛函 ·

2024 年 12 月 18 日

Combining Priors with Experience: Confidence Calibration Based on Binomial Process Modeling

Jinzong Dong,Zhaohui Jiang,Dong Pan,Haoyang Yu

from arxiv, Accepted by AAAI-25

Confidence calibration of classification models is a technique to estimate the true posterior probability of the predicted class, which is critical for ensuring reliable decision-making in practical applications. Existing confidence calibration methods mostly use statistical techniques to estimate the calibration curve from data or fit a user-defined calibration function, but often overlook fully mining and utilizing the prior distribution behind the calibration curve. However, a well-informed prior distribution can provide valuable insights beyond the empirical data under the limited data or low-density regions of confidence scores. To fill this gap, this paper proposes a new method that integrates the prior distribution behind the calibration curve with empirical data to estimate a continuous calibration curve, which is realized by modeling the sampling process of calibration data as a binomial process and maximizing the likelihood function of the binomial process. We prove that the calibration curve estimating method is Lipschitz continuous with respect to data distribution and requires a sample size of $3/B$ of that required for histogram binning, where $B$ represents the number of bins. Also, a new calibration metric ($TCE_{bpm}$), which leverages the estimated calibration curve to estimate the true calibration error (TCE), is designed. $TCE_{bpm}$ is proven to be a consistent calibration measure. Furthermore, realistic calibration datasets can be generated by the binomial process modeling from a preset true calibration curve and confidence score distribution, which can serve as a benchmark to measure and compare the discrepancy between existing calibration metrics and the true calibration error. The effectiveness of our calibration method and metric are verified in real-world and simulated data.

任務對話系統 · motivation · 控制器 · 語言模型化 · 有向 ·

2024 年 12 月 17 日

Rethinking the Alignment of Psychotherapy Dialogue Generation with Motivational Interviewing Strategies

Xin Sun,Xiao Tang,Abdallah El Ali,Zhuying Li,Pengjie Ren,Jan de Wit,Jiahuan Pei,Jos A. Bosch

Recent advancements in large language models (LLMs) have shown promise in generating psychotherapeutic dialogues, particularly in the context of motivational interviewing (MI). However, the inherent lack of transparency in LLM outputs presents significant challenges given the sensitive nature of psychotherapy. Applying MI strategies, a set of MI skills, to generate more controllable therapeutic-adherent conversations with explainability provides a possible solution. In this work, we explore the alignment of LLMs with MI strategies by first prompting the LLMs to predict the appropriate strategies as reasoning and then utilizing these strategies to guide the subsequent dialogue generation. We seek to investigate whether such alignment leads to more controllable and explainable generations. Multiple experiments including automatic and human evaluations are conducted to validate the effectiveness of MI strategies in aligning psychotherapy dialogue generation. Our findings demonstrate the potential of LLMs in producing strategically aligned dialogues and suggest directions for practical applications in psychotherapeutic settings.

MoDELS · Performer · Continuity · state-of-the-art · 代價 ·

2024 年 12 月 17 日

Can GPT-O1 Kill All Bugs? An Evaluation of GPT-Family LLMs on QuixBugs

Haichuan Hu,Ye Shang,Guolin Xu,Congqing He,Quanjun Zhang

from arxiv, Accepted to the 6th International Workshop on Automated Program Repair (APR 2025)

LLMs have long demonstrated remarkable effectiveness in automatic program repair (APR), with OpenAI's ChatGPT being one of the most widely used models in this domain. Through continuous iterations and upgrades of GPT-family models, their performance in fixing bugs has already reached state-of-the-art levels. However, there are few works comparing the effectiveness and variations of different versions of GPT-family models on APR. In this work, inspired by the recent public release of the GPT-o1 models, we conduct the first study to compare the effectiveness of different versions of the GPT-family models in APR. We evaluate the performance of the latest version of the GPT-family models (i.e., O1-preview and O1-mini), GPT-4o, and the historical version of ChatGPT on APR. We conduct an empirical study of the four GPT-family models against other LLMs and APR techniques on the QuixBugs benchmark from multiple evaluation perspectives, including repair success rate, repair cost, response length, and behavior patterns. The results demonstrate that O1's repair capability exceeds that of prior GPT-family models, successfully fixing all 40 bugs in the benchmark. Our work can serve as a foundation for further in-depth exploration of the applications of GPT-family models in APR.

MoDELS · 語言模型化 · Weight · 大語言模型 · Performer ·

2024 年 12 月 17 日

DSGram: Dynamic Weighting Sub-Metrics for Grammatical Error Correction in the Era of Large Language Models

Jinxiang Xie,Yilin Li,Xunjian Yin,Xiaojun Wan

from arxiv, Extended version of a paper to appear in AAAI-25

Evaluating the performance of Grammatical Error Correction (GEC) models has become increasingly challenging, as large language model (LLM)-based GEC systems often produce corrections that diverge from provided gold references. This discrepancy undermines the reliability of traditional reference-based evaluation metrics. In this study, we propose a novel evaluation framework for GEC models, DSGram, integrating Semantic Coherence, Edit Level, and Fluency, and utilizing a dynamic weighting mechanism. Our framework employs the Analytic Hierarchy Process (AHP) in conjunction with large language models to ascertain the relative importance of various evaluation criteria. Additionally, we develop a dataset incorporating human annotations and LLM-simulated sentences to validate our algorithms and fine-tune more cost-effective models. Experimental results indicate that our proposed approach enhances the effectiveness of GEC model evaluations.

語言模型化 · MoDELS · Taxonomy · AIM · 散度 ·

2023 年 9 月 3 日

Siren's Song in the AI Ocean: A Survey on Hallucination in Large Language Models

Yue Zhang,Yafu Li,Leyang Cui,Deng Cai,Lemao Liu,Tingchen Fu,Xinting Huang,Enbo Zhao,Yu Zhang,Yulong Chen,Longyue Wang,Anh Tuan Luu,Wei Bi,Freda Shi,Shuming Shi

from arxiv, work in progress; 32 pages

While large language models (LLMs) have demonstrated remarkable capabilities across a range of downstream tasks, a significant concern revolves around their propensity to exhibit hallucinations: LLMs occasionally generate content that diverges from the user input, contradicts previously generated context, or misaligns with established world knowledge. This phenomenon poses a substantial challenge to the reliability of LLMs in real-world scenarios. In this paper, we survey recent efforts on the detection, explanation, and mitigation of hallucination, with an emphasis on the unique challenges posed by LLMs. We present taxonomies of the LLM hallucination phenomena and evaluation benchmarks, analyze existing approaches aiming at mitigating LLM hallucination, and discuss potential directions for future research.