
Large language models · INFORMS · MoDELS · Speech recognition · Inference ·

December 15, 2023

Generative Context-aware Fine-tuning of Self-supervised Speech Models

Suwon Shon, Kwangyoun Kim, Prashant Sridhar, Yi-Te Hsu, Shinji Watanabe, Karen Livescu

When performing tasks like automatic speech recognition or spoken language understanding for a given utterance, access to preceding text or audio that provides contextual information can improve performance. Considering the recent advances in generative large language models (LLMs), we hypothesize that an LLM could generate useful context information using the preceding text. With appropriate prompts, an LLM could generate a prediction of the next sentence or abstractive text such as titles or topics. In this paper, we study the use of LLM-generated context information and propose an approach to distill the generated information during fine-tuning of self-supervised speech models, which we refer to as generative context-aware fine-tuning. This approach allows the fine-tuned model to make improved predictions without access to the true surrounding segments or to the LLM at inference time, while requiring only a very small additional context module. We evaluate the proposed approach using the SLUE and Libri-light benchmarks for several downstream tasks: automatic speech recognition, named entity recognition, and sentiment analysis. The results show that generative context-aware fine-tuning outperforms a context injection fine-tuning approach that accesses the ground-truth previous text, and is competitive with a generative context injection fine-tuning approach that requires the LLM at inference time.
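The training objective described in this abstract can be pictured as a task loss plus an auxiliary distillation term. The sketch below is a toy illustration built on loud assumptions. The hypothetical `context_module` is a single linear layer with tanh. The LLM-generated context is assumed to be already embedded as a fixed vector. Mean squared error, weighted by the hypothetical `distill_weight`, stands in for whatever distillation loss the paper actually uses. At inference only `context_module` is called, so neither the LLM nor the true surrounding text is needed.

```python
import math
from typing import List, Sequence

Vector = List[float]


def context_module(speech_features: Sequence[float], weights: Sequence[Sequence[float]]) -> Vector:
    """Hypothetical small context module: one linear layer followed by tanh."""
    return [math.tanh(sum(w * x for w, x in zip(row, speech_features))) for row in weights]


def mse(a: Sequence[float], b: Sequence[float]) -> float:
    """Mean squared error between two equal-length vectors."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    return sum((x - y) ** 2 for x, y in zip(a, b)) / len(a)


def fine_tuning_loss(
    task_loss: float,
    predicted_context: Sequence[float],
    llm_context_embedding: Sequence[float],
    distill_weight: float = 0.5,
) -> float:
    """Task loss plus a distillation term pulling the module's output toward the
    embedding of LLM-generated context (assumed precomputed, training only)."""
    return task_loss + distill_weight * mse(predicted_context, llm_context_embedding)


# Training step (toy numbers): the LLM-generated context embedding is only needed here.
features = [0.2, -0.1, 0.4]
weights = [[0.5, 0.1, -0.3], [0.2, 0.0, 0.7]]
predicted = context_module(features, weights)
loss = fine_tuning_loss(task_loss=1.2, predicted_context=predicted, llm_context_embedding=[0.1, 0.3])

# Inference: only the speech features and the small module are used.
context_at_inference = context_module(features, weights)
```

The design point is that the LLM appears only inside the loss. After training, the downstream model consumes `context_at_inference`, which depends on the speech input alone.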

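Related Content

Large Language Models

A large language model is a deep learning model trained on massive amounts of text data. It can not only generate natural-language text but also understand the meaning of text in depth and handle a wide range of natural language tasks, such as text summarization, question answering, and translation. In 2023, large language models and their applications in artificial intelligence became a focal point of technology research worldwide. Their growth in scale has been especially striking: parameter counts have jumped from just over a billion in early models to as many as one trillion today. Larger parameter counts let models capture the subtleties of human language more finely and understand its complexity more deeply. Over the past year, large language models have improved markedly in absorbing new knowledge, decomposing complex tasks, and aligning images with text. As the technology matures, their range of applications will continue to expand, providing more intelligent and personalized services and further improving how people live and work.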

Learning · Performer · INFORMS · Separate · Better ·

February 5, 2024

Learning to Abstract Visuomotor Mappings using Meta-Reinforcement Learning

Carlos A. Velazquez-Vargas, Isaac Ray Christian, Jordan A. Taylor, Sreejan Kumar

We investigated the human capacity to acquire multiple visuomotor mappings for de novo skills. Using a grid navigation paradigm, we tested whether contextual cues, implemented as different "grid worlds," allow participants to learn two distinct key-mappings more efficiently. Our results indicate that when contextual information is provided, task performance is significantly better. The same held true for meta-reinforcement learning agents that differed in whether or not they received contextual information when performing the task. We evaluated their accuracy in predicting human performance in the task and analyzed their internal representations. The results indicate that contextual cues allow the formation of separate representations in space and time when using different visuomotor mappings, whereas their absence favors sharing one representation. While both strategies can allow learning of multiple visuomotor mappings, we showed that contextual cues provide a computational advantage in terms of how many mappings can be learned.
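To make the paradigm concrete, the toy sketch below shows why a contextual cue matters when the same keys must drive two different mappings. Everything in it is hypothetical and not taken from the study: the key names, the moves assigned to them, the 5 x 5 grid, and the world labels. With the cue, `step` can look up the correct mapping. A context-free learner would have to share one mapping across worlds, and `conflicting_keys` lists the keys on which the two worlds disagree.

```python
from typing import Dict, List, Tuple

Position = Tuple[int, int]
Move = Tuple[int, int]

# Hypothetical key mappings: the same keys move the agent differently in each world.
KEY_MAPPINGS: Dict[str, Dict[str, Move]] = {
    "world_A": {"j": (0, -1), "k": (0, 1), "l": (-1, 0), ";": (1, 0)},
    "world_B": {"j": (1, 0), "k": (-1, 0), "l": (0, 1), ";": (0, -1)},
}


def step(position: Position, key: str, context: str, size: int = 5) -> Position:
    """Apply the move bound to `key` in the world named by `context`, clamped to the grid."""
    dx, dy = KEY_MAPPINGS[context][key]
    x = min(max(position[0] + dx, 0), size - 1)
    y = min(max(position[1] + dy, 0), size - 1)
    return (x, y)


def conflicting_keys(world_a: str, world_b: str) -> List[str]:
    """Keys whose effect differs between two worlds; one shared mapping must get these wrong somewhere."""
    a, b = KEY_MAPPINGS[world_a], KEY_MAPPINGS[world_b]
    return sorted(k for k in a if a[k] != b.get(k))
```

In this toy setup every key conflicts. Without a cue to select the mapping, no single policy can be correct in both worlds.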

Loss · Speech recognition · Discretization · Masking · Transformation ·

February 5, 2024

Loss Masking Is Not Needed in Decoder-only Transformer for Discrete-token-based ASR

Qian Chen, Wen Wang, Qinglin Zhang, Siqi Zheng, Shiliang Zhang, Chong Deng, Yukun Ma, Hai Yu, Jiaqing Liu, Chong Zhang

From arXiv. 5 pages, accepted by ICASSP 2024.

Recently, unified speech-text models, such as SpeechGPT, VioLA, and AudioPaLM, have achieved remarkable performance on various speech tasks. These models discretize speech signals into tokens (speech discretization) and use a shared vocabulary for both text and speech tokens. Then they train a single decoder-only Transformer on a mixture of speech tasks. However, these models rely on the Loss Masking strategy for the ASR task, which ignores the dependency among speech tokens. In this paper, we propose to model speech tokens in an autoregressive way, similar to text. We find that applying the conventional cross-entropy loss on input speech tokens does not consistently improve the ASR performance over the Loss Masking approach. To address this issue, we propose a novel approach denoted Smoothed Label Distillation (SLD), which applies a KL divergence loss with smoothed labels on speech tokens. Our experiments show that SLD effectively models speech tokens and outperforms Loss Masking for decoder-only Transformers in ASR tasks with different speech discretization methods. The source code can be found here: //github.com/alibaba-damo-academy/SpokenNLP/tree/main/sld
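The contrast between Loss Masking and a smoothed-label KL loss on speech tokens can be sketched in a few lines. This is a minimal illustration, not the authors' implementation, which is in the repository linked in the abstract. The following are assumptions:

* The smoothed target puts 1 - epsilon on the observed token and spreads epsilon uniformly over the rest of the vocabulary.
* Text tokens keep plain cross-entropy.
* `epsilon` and `speech_weight` are hypothetical hyperparameters.

With `mask_speech=True`, speech positions contribute nothing, which mimics Loss Masking.

```python
import math
from typing import List, Sequence


def log_softmax(logits: Sequence[float]) -> List[float]:
    """Numerically stable log-softmax."""
    m = max(logits)
    log_z = m + math.log(sum(math.exp(v - m) for v in logits))
    return [v - log_z for v in logits]


def cross_entropy(logits: Sequence[float], target: int) -> float:
    return -log_softmax(logits)[target]


def smoothed_target(target: int, vocab_size: int, epsilon: float) -> List[float]:
    """1 - epsilon on the observed token, epsilon spread uniformly over the others (assumed scheme)."""
    off = epsilon / (vocab_size - 1)
    return [1.0 - epsilon if i == target else off for i in range(vocab_size)]


def smoothed_kl(logits: Sequence[float], target: int, epsilon: float = 0.1) -> float:
    """KL(q || p) with q the smoothed target and p = softmax(logits)."""
    q = smoothed_target(target, len(logits), epsilon)
    log_p = log_softmax(logits)
    return sum(qi * (math.log(qi) - lpi) for qi, lpi in zip(q, log_p) if qi > 0.0)


def sequence_loss(
    step_logits: Sequence[Sequence[float]],
    targets: Sequence[int],
    is_speech: Sequence[bool],
    mask_speech: bool = False,
    epsilon: float = 0.1,
    speech_weight: float = 1.0,
) -> float:
    """Average per-token loss for a mixed speech/text sequence in a decoder-only model."""
    total, count = 0.0, 0
    for logits, target, speech in zip(step_logits, targets, is_speech):
        if speech:
            if mask_speech:
                continue  # Loss Masking: speech tokens are not predicted at all
            total += speech_weight * smoothed_kl(logits, target, epsilon)
        else:
            total += cross_entropy(logits, target)
        count += 1
    return total / max(count, 1)
```

With `epsilon=0.0` the smoothed KL reduces to ordinary cross-entropy on the speech token. This is the "conventional cross-entropy" variant that the abstract reports does not consistently help.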

Speech recognition · Estimation/Estimator · MoDELS · Processing (programming language) · Attention ·

February 2, 2024

Word-Level ASR Quality Estimation for Efficient Corpus Sampling and Post-Editing through Analyzing Attentions of a Reference-Free Metric

Golara Javadi, Kamer Ali Yuksel, Yunsu Kim, Thiago Castro Ferreira, Mohamed Al-Badrashiny

In the realm of automatic speech recognition (ASR), the quest for models that not only perform with high accuracy but also offer transparency in their decision-making processes is crucial. This work introduces and evaluates quality estimation (QE) metrics as a novel tool for enhancing explainable artificial intelligence (XAI) in ASR systems. Through experiments and analyses, it explores the ability of the NoRefER (No Reference Error Rate) metric to identify word-level errors that help post-editors refine ASR hypotheses. The investigation also extends to the utility of NoRefER in the corpus-building process, demonstrating its effectiveness in augmenting datasets with insightful annotations. The diagnostic aspects of NoRefER are examined, revealing its ability to provide valuable insights into model behaviors and decision patterns. This has proven beneficial for prioritizing hypotheses in post-editing workflows and fine-tuning ASR models. The findings suggest that NoRefER is not merely a tool for error detection but also a comprehensive framework for enhancing ASR systems' transparency, efficiency, and effectiveness. To ensure the reproducibility of the results, all source code for this study is publicly available.
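The idea of reading word-level error cues out of a reference-free metric's attention can be sketched as follows. This is a hypothetical illustration. The per-subword attention values are assumed to have already been extracted from a QE model such as NoRefER, and that extraction is not shown. Summing subword scores into word scores is an assumed aggregation rule. `prioritize` orders hypotheses for post-editing by a predicted quality score, which is assumed to be higher-is-better.

```python
from typing import Dict, List, Sequence, Tuple


def word_scores(word_ids: Sequence[int], attention: Sequence[float]) -> Dict[int, float]:
    """Sum subword-level attention into one score per word (assumed aggregation)."""
    scores: Dict[int, float] = {}
    for wid, att in zip(word_ids, attention):
        scores[wid] = scores.get(wid, 0.0) + att
    return scores


def flag_words(
    words: Sequence[str], word_ids: Sequence[int], attention: Sequence[float], top_k: int = 1
) -> List[Tuple[str, float]]:
    """Return the top_k words with the highest aggregated attention as likely errors."""
    scores = word_scores(word_ids, attention)
    ranked = sorted(scores.items(), key=lambda item: item[1], reverse=True)
    return [(words[wid], score) for wid, score in ranked[:top_k]]


def prioritize(hypotheses: Sequence[Tuple[str, float]]) -> List[str]:
    """Order ASR hypotheses for post-editing, lowest predicted quality first.
    Each entry is (hypothesis, quality_score); higher means better (assumption)."""
    return [text for text, _ in sorted(hypotheses, key=lambda item: item[1])]


# Toy example: "cat" was split into two subwords that together receive the most attention.
words = ["the", "cat", "sat"]
word_ids = [0, 1, 1, 2]
attention = [0.10, 0.30, 0.35, 0.25]
flagged = flag_words(words, word_ids, attention)
```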

Weight · Performer · Estimation/Estimator · Knowledge · Biased ·

February 2, 2024

The Connection Between R-Learning and Inverse-Variance Weighting for Estimation of Heterogeneous Treatment Effects

From arXiv. The main change in this version is the addition of simulations.

Many methods for estimating conditional average treatment effects (CATEs) can be expressed as weighted pseudo-outcome regressions (PORs). Previous comparisons of POR techniques have paid careful attention to the choice of pseudo-outcome transformation. However, we argue that the dominant driver of performance is actually the choice of weights. For example, we point out that R-Learning implicitly performs a POR with inverse-variance weights (IVWs). In the CATE setting, IVWs mitigate the instability associated with inverse-propensity weights, and lead to convenient simplifications of bias terms. We demonstrate the superior performance of IVWs in simulations, and derive convergence rates for IVWs that are, to our knowledge, the fastest yet shown without assuming knowledge of the covariate distribution.
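The R-Learning observation can be checked directly. Write m(X) for an estimate of E[Y | X] and e(X) for the propensity P(A = 1 | X). Each term of the R-loss then factors as (Y - m(X) - tau(X)(A - e(X)))^2 = (A - e(X))^2 ((Y - m(X)) / (A - e(X)) - tau(X))^2. This is a pseudo-outcome regression with pseudo-outcome (Y - m(X)) / (A - e(X)) and weight (A - e(X))^2, which the abstract identifies as inverse-variance weights. The sketch below restricts tau to a constant so that both estimators have closed forms. The data and nuisance estimates are made-up toy numbers, not results from the paper.

```python
from typing import Sequence


def r_learner_constant(
    y: Sequence[float], a: Sequence[int], m_hat: Sequence[float], e_hat: Sequence[float]
) -> float:
    """Constant tau minimizing sum(((y - m) - tau * (a - e)) ** 2)."""
    num = sum((ai - ei) * (yi - mi) for yi, ai, mi, ei in zip(y, a, m_hat, e_hat))
    den = sum((ai - ei) ** 2 for ai, ei in zip(a, e_hat))
    return num / den


def ivw_pseudo_outcome_constant(
    y: Sequence[float], a: Sequence[int], m_hat: Sequence[float], e_hat: Sequence[float]
) -> float:
    """Weighted mean of pseudo-outcomes (y - m) / (a - e) with weights (a - e) ** 2."""
    pseudo = [(yi - mi) / (ai - ei) for yi, ai, mi, ei in zip(y, a, m_hat, e_hat)]
    weights = [(ai - ei) ** 2 for ai, ei in zip(a, e_hat)]
    return sum(w * p for w, p in zip(weights, pseudo)) / sum(weights)


# Toy data: treatment indicators, outcomes, and fitted nuisance estimates (0 < e < 1).
y = [3.0, 1.0, 2.5, 0.5]
a = [1, 0, 1, 0]
m_hat = [2.0, 1.5, 2.0, 1.0]
e_hat = [0.6, 0.4, 0.5, 0.3]
tau_r = r_learner_constant(y, a, m_hat, e_hat)
tau_ivw = ivw_pseudo_outcome_constant(y, a, m_hat, e_hat)
```

Because every e_hat lies strictly between 0 and 1, A - e(X) is never zero and each pseudo-outcome is defined. The two estimates agree up to floating-point rounding.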

Pair · Speech synthesis · Representation · Learning · Extensibility ·

February 2, 2024

DQR-TTS: Semi-supervised Text-to-speech Synthesis with Dynamic Quantized Representation

Jianzong Wang, Pengcheng Li, Xulong Zhang, Ning Cheng, Jing Xiao

From arXiv. Accepted by the 13th IEEE International Conference on Big Data and Cloud Computing (IEEE BDCloud 2023).

Most existing neural text-to-speech methods rely on extensive datasets and face challenges under low-resource conditions. In this paper, we introduce a novel semi-supervised text-to-speech synthesis model that learns from both paired and unpaired data to address this challenge. The key component of the proposed model is a dynamic quantized representation module, which is integrated into a sequential autoencoder. When given paired data, the module incorporates a trainable codebook that learns quantized representations under the supervision of the paired data. However, because paired data are limited in low-resource scenarios, they rarely cover all phonemes. Unpaired data are therefore fed in during training to expand the dynamic codebook by adding quantized representation vectors that are sufficiently distant from the existing ones. Experiments show that with less than 120 minutes of paired data, the proposed method outperforms existing methods in both subjective and objective metrics.
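The expansion step for unpaired data can be sketched as a nearest-code lookup with a distance threshold. This is a minimal, hypothetical version. Euclidean distance, the `threshold` value, and appending the raw encoder output as the new code vector are all assumptions. The supervised codebook training on paired data is not shown.

```python
import math
from typing import List, Sequence, Tuple

Vector = List[float]


def euclidean(a: Sequence[float], b: Sequence[float]) -> float:
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def nearest_code(codebook: Sequence[Sequence[float]], z: Sequence[float]) -> Tuple[int, float]:
    """Index of, and distance to, the closest code vector."""
    distances = [euclidean(code, z) for code in codebook]
    best = min(range(len(distances)), key=distances.__getitem__)
    return best, distances[best]


def quantize_or_expand(codebook: List[Vector], z: Sequence[float], threshold: float) -> int:
    """Quantize z to its nearest code, or append z as a new code if every
    existing code is farther than `threshold` (the dynamic expansion step)."""
    if codebook:
        index, distance = nearest_code(codebook, z)
        if distance <= threshold:
            return index
    codebook.append(list(z))
    return len(codebook) - 1


codebook: List[Vector] = [[0.0, 0.0], [1.0, 0.0]]  # codes learned from paired data
close_index = quantize_or_expand(codebook, [0.1, 0.0], threshold=0.5)  # reuses code 0
far_index = quantize_or_expand(codebook, [5.0, 5.0], threshold=0.5)  # adds code 2
```

The threshold controls the trade-off. If it is too small, the codebook grows with noise. If it is too large, unseen phonemes get forced onto existing codes.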

Language modeling · MoDELS · Dataset · state-of-the-art · Performance ·

February 2, 2024

A Chain-of-Thought Is as Strong as Its Weakest Link: A Benchmark for Verifiers of Reasoning Chains

Alon Jacovi, Yonatan Bitton, Bernd Bohnet, Jonathan Herzig, Or Honovich, Michael Tseng, Michael Collins, Roee Aharoni, Mor Geva

From arXiv. Dataset at //huggingface.co/datasets/google/reveal

Prompting language models to provide step-by-step answers (e.g., "Chain-of-Thought") is a prominent approach for complex reasoning tasks, where more accurate reasoning chains typically improve downstream task performance. Recent literature discusses automatic methods to verify reasoning steps to evaluate and improve their correctness. However, no fine-grained step-level datasets are available to enable thorough evaluation of such verification methods, hindering progress in this direction. We introduce Reveal: Reasoning Verification Evaluation, a new dataset to benchmark automatic verifiers of complex Chain-of-Thought reasoning in open-domain question answering settings. Reveal includes comprehensive labels for the relevance, attribution to evidence passages, and logical correctness of each reasoning step in a language model's answer, across a wide variety of datasets and state-of-the-art language models.
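The title's principle, that a chain is only as reliable as its weakest step, translates into a simple aggregation rule over step-level labels. The sketch below is hypothetical. The `StepLabel` fields mirror the three label types the abstract mentions (relevance, attribution, logical correctness), but they are not the Reveal dataset's actual schema. Treating `None` as "this check does not apply to this step" is an assumption.

```python
from dataclasses import dataclass
from typing import Optional, Sequence


@dataclass
class StepLabel:
    relevant: bool
    attributable: Optional[bool] = None  # None: no evidence-attribution check for this step
    logically_correct: Optional[bool] = None  # None: no logic check for this step


def step_is_correct(step: StepLabel) -> bool:
    """A step passes only if every applicable check passes."""
    checks = (step.relevant, step.attributable, step.logically_correct)
    return all(check for check in checks if check is not None)


def first_error(steps: Sequence[StepLabel]) -> Optional[int]:
    """Index of the first failing step, or None if the whole chain passes."""
    for i, step in enumerate(steps):
        if not step_is_correct(step):
            return i
    return None


def chain_is_correct(steps: Sequence[StepLabel]) -> bool:
    """Weakest-link rule: one bad step invalidates the chain."""
    return first_error(steps) is None


chain = [
    StepLabel(relevant=True, attributable=True),  # attribution step backed by evidence
    StepLabel(relevant=True, logically_correct=False),  # faulty logical step
    StepLabel(relevant=True, logically_correct=True),
]
```

A step-level verifier can be scored against such labels at each step and at the first error. A chain-level check only sees the final verdict.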

Knowledge · Language modeling · MoDELS · NLU · Learning ·

November 17, 2022

A Survey of Knowledge-Enhanced Pre-trained Language Models

Linmei Hu, Zeyi Liu, Ziwang Zhao, Lei Hou, Liqiang Nie, Juanzi Li

Pre-trained Language Models (PLMs), which are trained on large text corpora via self-supervised learning, have yielded promising performance on various tasks in Natural Language Processing (NLP). However, although PLMs with huge numbers of parameters can capture rich knowledge from massive training text and benefit downstream tasks at the fine-tuning stage, they still have limitations, such as poor reasoning ability, due to the lack of external knowledge. Research has been dedicated to incorporating knowledge into PLMs to tackle these issues. In this paper, we present a comprehensive review of Knowledge-Enhanced Pre-trained Language Models (KE-PLMs) to provide clear insight into this thriving field. We introduce separate taxonomies for Natural Language Understanding (NLU) and Natural Language Generation (NLG) to highlight these two main tasks of NLP. For NLU, we divide the types of knowledge into four categories: linguistic knowledge, text knowledge, knowledge graphs (KG), and rule knowledge. The KE-PLMs for NLG are categorized into KG-based and retrieval-based methods. Finally, we point out some promising future directions for KE-PLMs.

contrastive · Learned · Contrastive learning · Discriminator · Extensibility ·

December 16, 2021

Contrastive Spatio-Temporal Pretext Learning for Self-supervised Video Representation

Yujia Zhang, Lai-Man Po, Xuyuan Xu, Mengyang Liu, Yexin Wang, Weifeng Ou, Yuzhi Zhao, Wing-Yin Yu

From arXiv. Accepted by AAAI 2022; preprint version with appendix.

Spatio-temporal representation learning is critical to self-supervised video representation. Recent approaches mainly use contrastive learning and pretext tasks. However, these approaches learn representations by discriminating sampled instances via feature similarity in the latent space while ignoring the intermediate state of the learned representations, which limits overall performance. In this work, taking into account the degree of similarity of sampled instances as the intermediate state, we propose a novel pretext task: spatio-temporal overlap rate (STOR) prediction. It stems from the observation that humans are capable of discriminating the overlap rates of videos in space and time. This task encourages the model to discriminate the STOR of two generated samples to learn the representations. Moreover, we employ a joint optimization combining pretext tasks with contrastive learning to further enhance spatio-temporal representation learning. We also study the mutual influence of each component in the proposed scheme. Extensive experiments demonstrate that our proposed STOR task benefits both contrastive learning and pretext tasks. The joint optimization scheme can significantly improve the spatio-temporal representation in video understanding. The code is available at //github.com/Katou2/CSTP.
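A spatio-temporal overlap rate between two sampled clips can be computed as the intersection-over-union of the two space-time boxes they occupy. This is an assumed definition for illustration, because the abstract does not spell out how STOR is computed or discretized. `tube_iou` (axis-aligned cuboid IoU) and the equal-width binning in `stor_class` are hypothetical choices for turning the rate into a prediction target.

```python
from typing import Tuple

Interval = Tuple[float, float]  # (start, end) along time
Box = Tuple[float, float, float, float]  # (x1, y1, x2, y2) in the frame
Clip = Tuple[Interval, Box]


def overlap_1d(a: Tuple[float, float], b: Tuple[float, float]) -> float:
    """Length of the intersection of two intervals (0 if disjoint)."""
    return max(0.0, min(a[1], b[1]) - max(a[0], b[0]))


def volume(clip: Clip) -> float:
    (t0, t1), (x1, y1, x2, y2) = clip
    return (t1 - t0) * (x2 - x1) * (y2 - y1)


def tube_iou(a: Clip, b: Clip) -> float:
    """Intersection-over-union of two axis-aligned space-time cuboids."""
    (ta, ba), (tb, bb) = a, b
    inter = (
        overlap_1d(ta, tb)
        * overlap_1d((ba[0], ba[2]), (bb[0], bb[2]))
        * overlap_1d((ba[1], ba[3]), (bb[1], bb[3]))
    )
    union = volume(a) + volume(b) - inter
    return inter / union if union > 0 else 0.0


def stor_class(rate: float, num_bins: int = 4) -> int:
    """Discretize an overlap rate in [0, 1] into one of num_bins classes."""
    return min(int(rate * num_bins), num_bins - 1)


clip_a: Clip = ((0.0, 10.0), (0.0, 0.0, 2.0, 2.0))
clip_b: Clip = ((5.0, 15.0), (1.0, 1.0, 3.0, 3.0))
rate = tube_iou(clip_a, clip_b)  # 5 / (40 + 40 - 5) = 1/15
```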

MoDELS · CLUES · INTERACT · Graphics processor · Neural Networks ·

January 28, 2021

A Graph-based Relevance Matching Model for Ad-hoc Retrieval

Yufeng Zhang, Jinghao Zhang, Zeyu Cui, Shu Wu, Liang Wang

From arXiv. To appear at AAAI 2021.

To retrieve more relevant, appropriate, and useful documents given a query, finding clues about that query in the text is crucial. Recent deep learning models regard the task as a term-level matching problem, which seeks exact or similar query patterns in the document. However, we argue that they are inherently based on local interactions and do not generalise to ubiquitous, non-consecutive contextual relationships. In this work, we propose a novel relevance matching model based on graph neural networks to leverage document-level word relationships for ad-hoc retrieval. In addition to the local interactions, we explicitly incorporate all contexts of a term through the graph-of-word text format. Matching patterns can then be revealed to provide a more accurate relevance score. Our approach significantly outperforms strong baselines on two ad-hoc benchmarks. We also experimentally compare our model with BERT and show its advantages on long documents.
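A graph-of-word representation is commonly built by making each unique term a node and linking terms that co-occur within a sliding window. The sketch below shows only that construction. The window size is an assumption, and the paper's exact graph construction may differ. The graph neural network that scores query-document matches on top of it is not shown. Because each unique term is a single node, a term's neighborhood aggregates all of its contexts in the document, not just the context around one position.

```python
from typing import Dict, List, Set


def graph_of_word(tokens: List[str], window: int = 3) -> Dict[str, Set[str]]:
    """Undirected graph with one node per unique term; an edge links two terms
    that appear within `window` consecutive tokens of each other."""
    graph: Dict[str, Set[str]] = {}
    for i, word in enumerate(tokens):
        graph.setdefault(word, set())
        for j in range(i + 1, min(i + window, len(tokens))):
            other = tokens[j]
            if other != word:  # no self-loops
                graph[word].add(other)
                graph.setdefault(other, set()).add(word)
    return graph


def query_neighbors(graph: Dict[str, Set[str]], query: List[str]) -> Dict[str, Set[str]]:
    """Document-level context of each query term that occurs in the graph."""
    return {term: graph[term] for term in query if term in graph}


doc = "neural ranking models rank documents with neural networks".split()
g = graph_of_word(doc)
```

Here "neural" occurs at two positions far apart, yet its single node connects to the neighbors of both occurrences.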

Object recognition · MoDELS · Backbone · Extensibility · Learned ·

March 31, 2020

Look-into-Object: Self-supervised Structure Modeling for Object Recognition

Mohan Zhou, Yalong Bai, Wei Zhang, Tiejun Zhao, Tao Mei

From arXiv. 10 pages, 7 figures, accepted by CVPR 2020.

Most object recognition approaches predominantly focus on learning discriminative visual patterns while overlooking the holistic object structure. Though important, structure modeling usually requires significant manual annotation and is therefore labor-intensive. In this paper, we propose to "look into object" (explicitly yet intrinsically model the object structure) by incorporating self-supervision into the traditional framework. We show that the recognition backbone can be substantially enhanced for more robust representation learning, without any extra annotation cost or loss of inference speed. Specifically, we first propose an object-extent learning module for localizing the object according to the visual patterns shared among instances of the same category. We then design a spatial context learning module for modeling the internal structure of the object by predicting relative positions within the extent. These two modules can be easily plugged into any backbone network during training and detached at inference time. Extensive experiments show that our look-into-object approach (LIO) achieves large performance gains on a number of benchmarks, including generic object recognition (ImageNet) and fine-grained object recognition tasks (CUB, Cars, Aircraft). We also show that this learning paradigm is highly generalizable to other tasks such as object detection and segmentation (MS COCO). Project page: //github.com/JDAI-CV/LIO.
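The spatial context module's targets can be illustrated by converting grid positions inside the object extent into coordinates relative to a reference cell. This is an assumed parameterization. The abstract only says the module predicts "relative positions within the extent." The choice of polar coordinates (normalized distance and angle), the reference cell, and the normalization by the largest distance are illustrative and may not match LIO's implementation.

```python
import math
from typing import Dict, List, Tuple

Cell = Tuple[int, int]  # (row, col) on the backbone feature map


def relative_polar_targets(extent: List[Cell], reference: Cell) -> Dict[Cell, Tuple[float, float]]:
    """Map each cell in the object extent to (normalized distance, angle) from `reference`."""
    ref_r, ref_c = reference
    # Largest distance in the extent; fall back to 1.0 so a single-cell extent does not divide by zero.
    max_dist = max((math.hypot(r - ref_r, c - ref_c) for r, c in extent), default=0.0) or 1.0
    targets: Dict[Cell, Tuple[float, float]] = {}
    for r, c in extent:
        dy, dx = r - ref_r, c - ref_c
        targets[(r, c)] = (math.hypot(dy, dx) / max_dist, math.atan2(dy, dx))  # angle in [-pi, pi]
    return targets


# Toy extent: three cells of a small object, with the reference at (1, 1).
targets = relative_polar_targets([(1, 1), (1, 2), (0, 1)], reference=(1, 1))
```

Normalizing distances by the extent's own size makes the targets comparable across objects of different scales. This is one plausible reason to predict relative rather than absolute positions.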


