This paper explores the integration of Large Language Models (LLMs) into Automatic Speech Recognition (ASR) systems to improve transcription accuracy. The increasing sophistication of LLMs, with their in-context learning capabilities and instruction-following behavior, has drawn significant attention in the field of Natural Language Processing (NLP). Our primary focus is to investigate the potential of using an LLM's in-context learning capabilities to enhance the performance of ASR systems, which currently face challenges such as ambient noise, speaker accents, and complex linguistic contexts. We designed a study using the Aishell-1 and LibriSpeech datasets, with ChatGPT and GPT-4 serving as benchmarks for LLM capabilities. Unfortunately, our initial experiments did not yield promising results, indicating the complexity of leveraging LLMs' in-context learning for ASR applications. Despite further exploration with varied settings and models, the corrected sentences from the LLMs frequently resulted in higher Word Error Rates (WER), demonstrating the limitations of LLMs in speech applications. This paper provides a detailed overview of these experiments, their results, and implications, establishing that using LLMs' in-context learning capabilities to correct potential errors in speech recognition transcriptions is still a challenging task at the current stage.
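The abstract reports its results as Word Error Rate (WER) without defining the metric. As a minimal sketch, assuming the standard definition (substitutions, deletions, and insertions divided by the number of reference words) and simple whitespace tokenization, WER can be computed with a word-level edit distance. The function name `word_error_rate` is hypothetical and not taken from the paper; Mandarin data such as Aishell-1 is commonly scored per character instead, which this sketch does not handle.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level edit distance divided by the reference length.

    Assumes whitespace tokenization; this is an illustrative sketch,
    not the evaluation code used in the paper.
    """
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] holds the edit distance between the reference words
    # processed so far and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]  # deleting all i reference words
        for j, hyp_word in enumerate(hyp, start=1):
            substitution_cost = 0 if ref_word == hyp_word else 1
            curr.append(
                min(
                    prev[j] + 1,                      # deletion
                    curr[j - 1] + 1,                  # insertion
                    prev[j - 1] + substitution_cost,  # match or substitution
                )
            )
        prev = curr
    return prev[-1] / len(ref)


if __name__ == "__main__":
    original = word_error_rate("the cat sat on the mat", "the cat sat on mat")
    corrected = word_error_rate("the cat sat on the mat", "a cat sits on mat")
    # A "correction" that drifts from the reference scores worse.
    print(original, corrected)
```

Comparing the WER of the raw ASR hypothesis with the WER of the LLM-corrected hypothesis against the same reference is the kind of check the abstract describes; because insertions count as errors, WER can exceed 1.0.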

Related content

Speech recognition

Speech recognition is an interdisciplinary subfield of computer science and computational linguistics that develops methods and technologies enabling computers to recognize spoken language and translate it into text. It is also known as automatic speech recognition (ASR), computer speech recognition, or speech-to-text (STT). It integrates knowledge and research from computer science, linguistics, and computer engineering.

Cognition · Integration · MoDELS · Language modeling · Robustness

September 5, 2023

Synergistic Integration of Large Language Models and Cognitive Architectures for Robust AI: An Exploratory Analysis

Oscar J. Romero,John Zimmerman,Aaron Steinfeld,Anthony Tomasic

from arxiv, AAAI 2023 Fall Symposium

This paper explores the integration of two AI subdisciplines employed in the development of artificial agents that exhibit intelligent behavior: Large Language Models (LLMs) and Cognitive Architectures (CAs). We present three integration approaches, each grounded in theoretical models and supported by preliminary empirical evidence. The modular approach, which introduces four models with varying degrees of integration, makes use of chain-of-thought prompting, and draws inspiration from augmented LLMs, the Common Model of Cognition, and the simulation theory of cognition. The agency approach, motivated by the Society of Mind theory and the LIDA cognitive architecture, proposes the formation of agent collections that interact at micro and macro cognitive levels, driven by either LLMs or symbolic components. The neuro-symbolic approach, which takes inspiration from the CLARION cognitive architecture, proposes a model where bottom-up learning extracts symbolic representations from an LLM layer and top-down guidance utilizes symbolic representations to direct prompt engineering in the LLM layer. These approaches aim to harness the strengths of both LLMs and CAs, while mitigating their weaknesses, thereby advancing the development of more robust AI systems. We discuss the tradeoffs and challenges associated with each approach.

AI · Identifiable · INTERACT · Reviewer · Integration

September 5, 2023

Exploring the Intersection of Complex Aesthetics and Generative AI for Promoting Cultural Creativity in Rural China after the Post-Pandemic Era

Mengyao Guo,Xiaolin Zhang,Yuan Zhuang,Jing Chen,Pengfei Wang,Ze Gao

from arxiv, Accepted by 2023 the 1st International Conference on AI-generated Content (AIGC2023)

This paper explores using generative AI and aesthetics to promote cultural creativity in rural China amidst COVID-19's impact. Through literature reviews, case studies, surveys, and text analysis, it examines art and technology applications in rural contexts and identifies key challenges. The study finds artworks often fail to resonate locally, while reliance on external artists limits sustainability. Hence, nurturing grassroots "artist villagers" through AI is proposed. Our approach involves training machine learning on subjective aesthetics to generate culturally relevant content. Interactive AI media can also boost tourism while preserving heritage. This pioneering research puts forth original perspectives on the intersection of AI and aesthetics to invigorate rural culture. It advocates holistic integration of technology and emphasizes AI's potential as a creative enabler versus replacement. Ultimately, it lays the groundwork for further exploration of leveraging AI innovations to empower rural communities. This timely study contributes to growing interest in emerging technologies to address critical issues facing rural China.

Boosting (a method for accelerating model training) · Sample · Human-computer interaction

September 5, 2023

Investigating the Impact of a Dual Musical Brain-Computer Interface on Interpersonal Synchrony: A Pilot Study

Anita Vrins,Ethel Pruss,Caterina Ceccato,Jos Prinsen,Maryam Alimardani

from arxiv, 6 pages, 4 figures

This study looked into how effective a Musical Brain-Computer Interface (MBCI) can be in providing feedback about synchrony between two people. Using a double EEG setup, we compared two types of musical feedback: one that adapted in real time based on the inter-brain synchrony between participants (Neuroadaptive condition), and another that was randomly generated (Random condition). We evaluated how these two conditions were perceived by 8 dyads (n = 16) and whether the generated music could influence the perceived connection and EEG synchrony between them. The findings indicated that Neuroadaptive musical feedback could potentially boost synchrony levels between people compared to Random feedback, as seen by a significant increase in EEG phase-locking values. Additionally, the real-time measurement of synchrony was successfully validated and musical neurofeedback was generally well-received by the participants. However, more research is needed for conclusive results due to the small sample size. This study is a stepping stone towards creating music that can audibly reflect the level of synchrony between individuals.
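The abstract does not say how its EEG phase-locking values were computed. As a minimal sketch, assuming the common definition PLV = |(1/N) Σ exp(i(φ_a − φ_b))| over N samples, and assuming the instantaneous phases of the two signals have already been extracted (for example with a Hilbert transform, which is not shown here), the value can be computed as follows. The function name `phase_locking_value` is hypothetical and not taken from the paper.

```python
import cmath
import math


def phase_locking_value(phases_a, phases_b):
    """Phase-locking value between two equally long phase series (radians).

    Returns a number in [0, 1]: 1 means a constant phase difference,
    values near 0 mean the phase differences are spread around the circle.
    Illustrative sketch only; not the pipeline used in the study.
    """
    if len(phases_a) != len(phases_b) or len(phases_a) == 0:
        raise ValueError("phase series must be non-empty and equally long")

    # Average the unit vectors that point along each phase difference.
    total = sum(cmath.exp(1j * (a - b)) for a, b in zip(phases_a, phases_b))
    return abs(total) / len(phases_a)


if __name__ == "__main__":
    # Two series with a fixed lag of 0.5 rad are perfectly phase-locked.
    a = [0.1 * k for k in range(100)]
    b = [phase - 0.5 for phase in a]
    print(phase_locking_value(a, b))

    # Phase differences alternating between 0 and pi cancel out.
    print(phase_locking_value([0.0, math.pi], [0.0, 0.0]))
```

Under this definition, a higher value in one condition than in another indicates that the phase difference between the two recordings stayed more stable over the analysis window.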

AI · Model evaluation · Identifiable · INTERACT · CASES

September 5, 2023

Bridging the Global Divide in AI Regulation: A Proposal for a Contextual, Coherent, and Commensurable Framework

Sangchul Park

This paper examines the current landscape of AI regulations, highlighting the divergent approaches being taken, and proposes an alternative contextual, coherent, and commensurable (3C) framework. The EU, Canada, South Korea, and Brazil follow a horizontal or lateral approach that postulates the homogeneity of AI systems, seeks to identify common causes of harm, and demands uniform human interventions. In contrast, the U.K., Israel, Switzerland, Japan, and China have pursued a context-specific or modular approach, tailoring regulations to the specific use cases of AI systems. The U.S. is reevaluating its strategy, with growing support for controlling existential risks associated with AI. Addressing such fragmentation of AI regulations is crucial to ensure the interoperability of AI. The present degree of proportionality, granularity, and foreseeability of the EU AI Act is not sufficient to garner consensus. The context-specific approach holds greater promise but requires further development in terms of details, coherency, and commensurability. To strike a balance, this paper proposes a hybrid 3C framework. To ensure contextuality, the framework categorizes AI into distinct types based on their usage and interaction with humans: autonomous, allocative, punitive, cognitive, and generative AI. To ensure coherency, each category is assigned specific regulatory objectives: safety for autonomous AI; fairness and explainability for allocative AI; accuracy and explainability for punitive AI; accuracy, robustness, and privacy for cognitive AI; and the mitigation of infringement and misuse for generative AI. To ensure commensurability, the framework promotes the adoption of international industry standards that convert principles into quantifiable metrics. In doing so, the framework is expected to foster international collaboration and standardization without imposing excessive compliance costs.

Knowledge · Knowledge extraction · Processing (programming language) · Dataset · INFORMS

September 4, 2023

Into the Single Cell Multiverse: an End-to-End Dataset for Procedural Knowledge Extraction in Biomedical Texts

Ruth Dannenfelser,Jeffrey Zhong,Ran Zhang,Vicky Yao

from arxiv, Submitted to NeurIPS 2023 Datasets and Benchmarks Track

Many of the most commonly explored natural language processing (NLP) information extraction tasks can be thought of as evaluations of declarative knowledge, or fact-based information extraction. Procedural knowledge extraction, i.e., breaking down a described process into a series of steps, has received much less attention, perhaps in part due to the lack of structured datasets that capture the knowledge extraction process from end-to-end. To address this unmet need, we present FlaMBé (Flow annotations for Multiverse Biological entities), a collection of expert-curated datasets across a series of complementary tasks that capture procedural knowledge in biomedical texts. This dataset is inspired by the observation that one ubiquitous source of procedural knowledge that is described as unstructured text is within academic papers describing their methodology. The workflows annotated in FlaMBé are from texts in the burgeoning field of single cell research, a research area that has become notorious for the number of software tools and complexity of workflows used. Additionally, FlaMBé provides, to our knowledge, the largest manually curated named entity recognition (NER) and disambiguation (NED) datasets for tissue/cell type, a fundamental biological entity that is critical for knowledge extraction in the biomedical research domain. Beyond providing a valuable dataset to enable further development of NLP models for procedural knowledge extraction, automating the process of workflow mining also has important implications for advancing reproducibility in biomedical research.

Comprehensibility · MoDELS · state-of-the-art · Attention · End-to-end

September 3, 2023

Attention Where It Matters: Rethinking Visual Document Understanding with Selective Region Concentration

Haoyu Cao,Changcun Bao,Chaohu Liu,Huang Chen,Kun Yin,Hao Liu,Yinsong Liu,Deqiang Jiang,Xing Sun

from arxiv, Accepted to ICCV 2023 main conference

We propose a novel end-to-end document understanding model called SeRum (SElective Region Understanding Model) for extracting meaningful information from document images, including document analysis, retrieval, and office automation. Unlike state-of-the-art approaches that rely on multi-stage technical schemes and are computationally expensive, SeRum converts document image understanding and recognition tasks into a local decoding process of the visual tokens of interest, using a content-aware token merge module. This mechanism enables the model to pay more attention to regions of interest generated by the query decoder, improving the model's effectiveness and speeding up the decoding speed of the generative scheme. We also designed several pre-training tasks to enhance the understanding and local awareness of the model. Experimental results demonstrate that SeRum achieves state-of-the-art performance on document understanding tasks and competitive results on text spotting tasks. SeRum represents a substantial advancement towards enabling efficient and effective end-to-end document understanding.

MoDELS · Dataset · Identifiable · False positive · Mutually independent

September 2, 2023

The FormAI Dataset: Generative AI in Software Security Through the Lens of Formal Verification

Norbert Tihanyi,Tamas Bisztray,Ridhi Jain,Mohamed Amine Ferrag,Lucas C. Cordeiro,Vasileios Mavroeidis

from arxiv, //github.com/FormAI-Dataset

This paper presents the FormAI dataset, a large collection of 112,000 AI-generated compilable and independent C programs with vulnerability classification. We introduce a dynamic zero-shot prompting technique constructed to spawn diverse programs utilizing Large Language Models (LLMs). The dataset is generated by GPT-3.5-turbo and comprises programs with varying levels of complexity. Some programs handle complicated tasks like network management, table games, or encryption, while others deal with simpler tasks like string manipulation. Every program is labeled with the vulnerabilities found within the source code, indicating the type, line number, and vulnerable function name. This is accomplished by employing a formal verification method using the Efficient SMT-based Bounded Model Checker (ESBMC), which uses model checking, abstract interpretation, constraint programming, and satisfiability modulo theories to reason over safety/security properties in programs. This approach definitively detects vulnerabilities and offers a formal model known as a counterexample, thus eliminating the possibility of generating false positive reports. We have associated the identified vulnerabilities with Common Weakness Enumeration (CWE) numbers. We make the source code available for the 112,000 programs, accompanied by a separate file containing the vulnerabilities detected in each program, making the dataset ideal for training LLMs and machine learning algorithms. Our study revealed that, according to ESBMC, 51.24% of the programs generated by GPT-3.5 contained vulnerabilities, thereby presenting considerable risks to software safety and security.

MoDELS · Numerical analysis

September 1, 2023

Instabilities of Super-Time-Stepping Methods on the Heston Stochastic Volatility Model

Fabien Le Floc'h

This note explores in more detail the instabilities of explicit super-time-stepping schemes, such as the Runge-Kutta-Chebyshev or Runge-Kutta-Legendre schemes, that have been noted in the literature when these schemes are applied to the Heston stochastic volatility model. The stability remarks are relevant beyond the scope of super-time-stepping schemes.

Markov · Hidden Markov model · MoDELS · Estimation/estimator · Linear

September 1, 2023

Nonparametric Identification and Estimation of Earnings Dynamics using a Hidden Markov Model: Evidence from the PSID

Tong Zhou

This paper presents a hidden Markov model designed to investigate the complex nature of earnings persistence. The proposed model assumes that the residuals of log-earnings consist of a persistent component and a transitory component, both following general Markov processes. Nonparametric identification is achieved through spectral decomposition of linear operators, and a modified stochastic EM algorithm is introduced for model estimation. Applying the framework to the Panel Study of Income Dynamics (PSID) dataset, we find that the earnings process displays nonlinear persistence, conditional skewness, and conditional kurtosis. Additionally, the transitory component is found to possess non-Gaussian properties, resulting in a significantly asymmetric distributional impact when high-earning households face negative shocks or low-earning households encounter positive shocks. Our empirical findings also reveal the presence of ARCH effects in earnings at horizons ranging from 2 to 8 years, further highlighting the complex dynamics of earnings persistence.

Vision · Graph · Transformation · Networking · Graphics processing unit (GPU)

September 27, 2022

A Survey on Graph Neural Networks and Graph Transformers in Computer Vision: A Task-Oriented Perspective

Chaoqi Chen,Yushuang Wu,Qiyuan Dai,Hong-Yu Zhou,Mutian Xu,Sibei Yang,Xiaoguang Han,Yizhou Yu

from arxiv, Preprint

Graph Neural Networks (GNNs) have gained momentum in graph representation learning and boosted the state of the art in a variety of areas, such as data mining (e.g., social network analysis and recommender systems), computer vision (e.g., object detection and point cloud learning), and natural language processing (e.g., relation extraction and sequence learning), to name a few. With the emergence of Transformers in natural language processing and computer vision, graph Transformers embed a graph structure into the Transformer architecture to overcome the limitations of local neighborhood aggregation while avoiding strict structural inductive biases. In this paper, we present a comprehensive review of GNNs and graph Transformers in computer vision from a task-oriented perspective. Specifically, we divide their applications in computer vision into five categories according to the modality of input data, i.e., 2D natural images, videos, 3D data, vision + language, and medical images. In each category, we further divide the applications according to a set of vision tasks. Such a task-oriented taxonomy allows us to examine how each task is tackled by different GNN-based approaches and how well these approaches perform. Based on the necessary preliminaries, we provide the definitions and challenges of the tasks, in-depth coverage of the representative approaches, as well as discussions regarding insights, limitations, and future directions.