
Fine-tuning · Regularization term · Language modeling · Pre-training · Downstream tasks

March 31, 2023

Debiased Fine-Tuning for Vision-language Models by Prompt Regularization

Beier Zhu, Yulei Niu, Saeil Lee, Minhoe Hur, Hanwang Zhang

from arxiv, AAAI2023 accepted

We present a new paradigm for fine-tuning large-scale vision-language pre-trained models on downstream tasks, dubbed Prompt Regularization (ProReg). Unlike traditional fine-tuning, which easily overfits to the downstream task data, ProReg uses the prediction obtained by prompting the pretrained model to regularize the fine-tuning. The motivation is that, when the large model is prompted with "a photo of a [CLASS]", the fill-in answer depends only on the pretraining encyclopedic knowledge and is independent of the task data distribution, which is usually biased. Specifically, given a training sample prediction during fine-tuning, we first calculate its Kullback-Leibler loss with respect to the prompt prediction and its Cross-Entropy loss with respect to the ground-truth label, and then combine them with a proposed sample-wise adaptive trade-off weight, which automatically adjusts the transfer between the pretrained and downstream domains. On various out-of-distribution benchmarks, we show the consistently strong performance of ProReg compared with conventional fine-tuning, zero-shot prompt, prompt tuning, and other state-of-the-art methods.
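The combination of the two losses described in the abstract can be sketched as follows. This is a minimal illustration, not the authors' implementation: the abstract does not give the formula for the sample-wise weight, so the weight `w` below (the probability the prompt prediction assigns to the ground-truth label) is a hypothetical placeholder, and the direction of the KL term is likewise an assumption. All function names are invented for this example.

```python
import math

def softmax(logits):
    # Numerically stable softmax over a list of floats.
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]

def cross_entropy(probs, label):
    # Cross-entropy of a predicted distribution against a hard label.
    return -math.log(probs[label])

def kl_divergence(p, q):
    # KL(p || q) for two discrete distributions with q > 0 everywhere.
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)

def proreg_style_loss(finetuned_logits, prompt_logits, label):
    p_ft = softmax(finetuned_logits)    # prediction of the model being fine-tuned
    p_prompt = softmax(prompt_logits)   # zero-shot prompt prediction of the pretrained model
    ce = cross_entropy(p_ft, label)     # fit the downstream label
    kl = kl_divergence(p_prompt, p_ft)  # stay close to the prompt prediction
    # ASSUMPTION: placeholder sample-wise weight, not the paper's formula.
    # It trusts the prompt more when the prompt already favors the true label.
    w = p_prompt[label]
    return (1.0 - w) * ce + w * kl
```

With this placeholder weight, a sample on which the prompt prediction is confidently wrong is driven mostly by the cross-entropy term, while a sample on which the prompt is right is pulled toward the pretrained prediction.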

Related content

Performer · Training data · Coverage · MoDELS · Data governance

May 22, 2023

A Pretrainer's Guide to Training Data: Measuring the Effects of Data Age, Domain Coverage, Quality, & Toxicity

Shayne Longpre, Gregory Yauney, Emily Reif, Katherine Lee, Adam Roberts, Barret Zoph, Denny Zhou, Jason Wei, Kevin Robinson, David Mimno, Daphne Ippolito

Pretraining is the preliminary and fundamental step in developing capable language models (LM). Despite this, pretraining data design is critically under-documented and often guided by empirically unsupported intuitions. To address this, we pretrain 28 1.5B parameter decoder-only models, training on data curated (1) at different times, (2) with varying toxicity and quality filters, and (3) with different domain compositions. First, we quantify the effect of pretraining data age. A temporal shift between evaluation data and pretraining data leads to performance degradation, which is not overcome by finetuning. Second, we explore the effect of quality and toxicity filters, showing a trade-off between performance on standard benchmarks and risk of toxic generations. Our findings indicate there does not exist a one-size-fits-all solution to filtering training data. We also find that the effects of different types of filtering are not predictable from text domain characteristics. Lastly, we empirically validate that the inclusion of heterogeneous data sources, like books and web, is broadly beneficial and warrants greater prioritization. These findings constitute the largest set of experiments to validate, quantify, and expose many undocumented intuitions about text pretraining, which we hope will help support more informed data-centric decisions in LM development.

Named entity recognition · Few-shot learning · entity · Data augmentation · Prompt

May 19, 2023

Enhancing Few-shot NER with Prompt Ordering based Data Augmentation

Huiming Wang, Liying Cheng, Wenxuan Zhang, De Wen Soh, Lidong Bing

from arxiv, 7 pages, 2 figures

Recently, data augmentation (DA) methods have been proven to be effective for pre-trained language models (PLMs) in low-resource settings, including few-shot named entity recognition (NER). However, conventional NER DA methods are mostly aimed at sequence labeling models, i.e., token-level classification, and few are compatible with unified autoregressive generation frameworks, which can handle a wider range of NER tasks, such as nested NER. Furthermore, these generation frameworks have a strong assumption that the entities will appear in the target sequence with the same left-to-right order as the source sequence. In this paper, we claim that there is no need to keep this strict order, and more diversified but reasonable target entity sequences can be provided during the training stage as a novel DA method. Nevertheless, a naive mixture of augmented data can confuse the model since one source sequence will then be paired with different target sequences. Therefore, we propose a simple but effective Prompt Ordering based Data Augmentation (PODA) method to improve the training of unified autoregressive generation frameworks under few-shot NER scenarios. Experimental results on three public NER datasets and further analyses demonstrate the effectiveness of our approach.
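The idea of pairing one source sentence with several reordered target sequences, without confusing the model, can be sketched as follows. The abstract does not spell out the mechanism, so this example assumes (hypothetically) that targets are reordered by entity type and that the chosen order is prepended to the source as a prompt, which keeps every input string distinct. The function name, the `" : "` separator, and the type labels are invented for illustration.

```python
from itertools import permutations

def build_ordered_pairs(sentence, entities, entity_types):
    """Return (input, target) pairs, one per ordering of the entity types.

    entities: list of (text, type) tuples in left-to-right order.
    ASSUMPTION: the ordering prompt is simply the type names joined by spaces.
    """
    pairs = []
    for order in permutations(entity_types):
        prompt = " ".join(order)
        # Emit entities grouped by type, following the order named in the prompt.
        target = [(text, etype) for etype in order
                  for text, t in entities if t == etype]
        pairs.append((f"{prompt} : {sentence}", target))
    return pairs

pairs = build_ordered_pairs(
    "Alice joined Acme in Paris",
    [("Alice", "PER"), ("Acme", "ORG"), ("Paris", "LOC")],
    ["PER", "ORG", "LOC"],
)
```

Because the order is part of the input, each augmented input maps to exactly one target sequence, which avoids the ambiguity the abstract attributes to a naive mixture of augmented data.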

Text classification · tuning · MoDELS · Analysis · Learning

May 19, 2023

Zero-Shot Text Classification via Self-Supervised Tuning

Chaoqun Liu, Wenxuan Zhang, Guizhen Chen, Xiaobao Wu, Anh Tuan Luu, Chip Hong Chang, Lidong Bing

from arxiv, Accepted to the Findings of ACL 2023

Existing solutions to zero-shot text classification either conduct prompting with pre-trained language models, which is sensitive to the choices of templates, or rely on large-scale annotated data of relevant tasks for meta-tuning. In this work, we propose a new paradigm based on self-supervised learning to solve zero-shot text classification tasks by tuning the language models with unlabeled data, called self-supervised tuning. By exploring the inherent structure of free texts, we propose a new learning objective called first sentence prediction to bridge the gap between unlabeled data and text classification tasks. After tuning the model to learn to predict the first sentence in a paragraph based on the rest, the model is able to conduct zero-shot inference on unseen tasks such as topic classification and sentiment analysis. Experimental results show that our model outperforms the state-of-the-art baselines on 7 out of 10 tasks. Moreover, the analysis reveals that our model is less sensitive to the prompt design. Our code and pre-trained models are publicly available at //github.com/DAMO-NLP-SG/SSTuning .
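A first sentence prediction example can be built from unlabeled paragraphs roughly as follows. This is a sketch under stated assumptions, not the released SSTuning code: the sentence splitter is deliberately naive, and drawing the negative options from the first sentences of other paragraphs is a hypothetical choice that the abstract does not specify.

```python
import random

def split_sentences(paragraph):
    # Naive splitter for illustration: breaks on periods only.
    parts = [s.strip() for s in paragraph.split(".") if s.strip()]
    return [s + "." for s in parts]

def build_fsp_example(paragraphs, index, num_options, rng):
    """Turn paragraph `index` into a multiple-choice classification example."""
    sentences = split_sentences(paragraphs[index])
    first, rest = sentences[0], " ".join(sentences[1:])
    # ASSUMPTION: negatives are first sentences of other paragraphs.
    others = [split_sentences(p)[0]
              for i, p in enumerate(paragraphs) if i != index]
    options = rng.sample(others, num_options - 1) + [first]
    rng.shuffle(options)
    return {"text": rest, "options": options, "label": options.index(first)}

paragraphs = [
    "Cats sleep a lot. They nap during the day. They hunt at night.",
    "Stocks fell today. Investors were nervous.",
    "The team won the final. Fans celebrated downtown.",
]
example = build_fsp_example(paragraphs, 0, 3, random.Random(0))
```

The resulting example has the same shape as a zero-shot classification query: a text plus a list of candidate options, one of which is correct.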

Language modeling · Performer · MoDELS · Processing (programming language) · Extensibility

May 19, 2023

Explicit Planning Helps Language Models in Logical Reasoning

Hongyu Zhao, Kangrui Wang, Mo Yu, Hongyuan Mei

from arxiv, error correction (e.g., model sizes) and new experiments (new data, GPT-3.5-based system, new results and analysis)

Language models have been shown to perform remarkably well on a wide range of natural language processing tasks. In this paper, we propose a novel system that uses language models to perform multi-step logical reasoning. Our system incorporates explicit planning into its inference procedure and is thus able to make more informed reasoning decisions at each step by looking ahead into their future effects. Moreover, we propose a training strategy that safeguards the planning process from being led astray by spurious features. Our full system significantly outperforms other competing methods on multiple standard datasets. When using a T5 model as its core component, our system performs competitively compared to GPT-3 despite having only about 1B parameters (i.e., 175 times smaller than GPT-3). When using GPT-3.5, it significantly outperforms chain-of-thought prompting on the challenging PrOntoQA dataset. We have conducted extensive empirical studies to demonstrate that explicit planning plays a crucial role in the system's performance.

Inference · Benchmark · MoDELS · Language modeling · GLUE

May 18, 2023

Ahead-of-Time P-Tuning

Daniil Gavrilov, Nikita Balagansky

In this paper, we propose Ahead-of-Time (AoT) P-Tuning, a novel parameter-efficient fine-tuning method for pre-trained Language Models (LMs) that adds input-dependent bias before each Transformer layer. We evaluate AoT P-Tuning on GLUE and SuperGLUE benchmarking datasets using RoBERTa and DeBERTa models, showing that it outperforms BitFit and is comparable to or better than other baseline methods for efficient fine-tuning. Additionally, we assess the inference overhead of AoT P-Tuning and demonstrate that it introduces negligible overhead compared to established baseline methods. Our method enables multi-task inference with a single backbone LM, making it a practical solution for real-world applications.
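Adding an input-dependent bias before each layer can be sketched as below. The abstract does not say how the bias is computed; this example assumes a per-layer lookup table indexed by token id, with the layers themselves passed in as plain functions. All names are hypothetical, and vectors are plain Python lists to keep the sketch dependency-free.

```python
def add_input_dependent_bias(hidden_states, token_ids, bias_table):
    # hidden_states: one vector per position; bias_table[token_id] is a vector.
    # ASSUMPTION: the bias for a position depends only on its input token id.
    return [[h + b for h, b in zip(vec, bias_table[tok])]
            for vec, tok in zip(hidden_states, token_ids)]

def run_layers(hidden_states, token_ids, layers, bias_tables):
    # Apply the bias before each (frozen) layer, then the layer itself.
    for layer_fn, table in zip(layers, bias_tables):
        hidden_states = add_input_dependent_bias(hidden_states, token_ids, table)
        hidden_states = layer_fn(hidden_states)
    return hidden_states

identity = lambda hs: hs  # stand-in for a frozen Transformer layer
table = {0: [1.0, 0.0], 1: [0.0, 2.0]}
output = run_layers([[0.0, 0.0], [0.0, 0.0]], [0, 1],
                    [identity, identity], [table, table])
```

Under this assumption, only the tables would be trained per task, so several tasks could share one backbone by swapping tables, which is consistent with the multi-task inference claim in the abstract.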

MoDELS · Language modeling · Continuity · Categorical data · Performer

May 18, 2023

Democratized Diffusion Language Model

Nikita Balagansky, Daniil Gavrilov

Despite the potential benefits of Diffusion Models for NLP applications, publicly available implementations, trained models, and reproducible training procedures are currently lacking. We present the Democratized Diffusion Language Model (DDLM), based on the Continuous Diffusion for Categorical Data (CDCD) framework, to address these challenges. We propose a simplified training procedure for DDLM using the C4 dataset and perform an in-depth analysis of the trained model's behavior. Furthermore, we introduce a novel early-exiting strategy for faster sampling with models trained with score interpolation. Since no previous work has aimed at solving downstream tasks (e.g., classification tasks) with a pre-trained Diffusion LM, we experiment with the GLUE Benchmark to study the ability of DDLM to transfer knowledge. With this paper, we make training and evaluation pipelines, as well as pre-trained DDLM models, available to other researchers for use in future research on Diffusion LMs.

Knowledge · MoDELS · Reviewer · Language modeling · Extensibility

March 14, 2023

The Life Cycle of Knowledge in Big Language Models: A Survey

Boxi Cao, Hongyu Lin, Xianpei Han, Le Sun

from arxiv, paperlist: //github.com/c-box/KnowledgeLifecycle

Knowledge plays a critical role in artificial intelligence. Recently, the extensive success of pre-trained language models (PLMs) has drawn significant attention to how knowledge can be acquired, maintained, updated and used by language models. Despite the enormous number of related studies, a unified view of how knowledge circulates within language models throughout the learning, tuning, and application processes is still lacking, which may prevent us from further understanding the connections between current progress or realizing existing limitations. In this survey, we revisit PLMs as knowledge-based systems by dividing the life cycle of knowledge in PLMs into five critical periods, and investigating how knowledge circulates when it is built, maintained and used. To this end, we systematically review existing studies of each period of the knowledge life cycle, summarize the main challenges and current limitations, and discuss future directions.

Prompt · MoDELS · Learned · Extensibility · Vectorization

March 10, 2022

Conditional Prompt Learning for Vision-Language Models

Kaiyang Zhou, Jingkang Yang, Chen Change Loy, Ziwei Liu

from arxiv, CVPR 2022. TL;DR: We propose a conditional prompt learning approach to solve the generalizability issue of static prompts

With the rise of powerful pre-trained vision-language models like CLIP, it becomes essential to investigate ways to adapt these models to downstream datasets. A recently proposed method named Context Optimization (CoOp) introduces the concept of prompt learning -- a recent trend in NLP -- to the vision domain for adapting pre-trained vision-language models. Specifically, CoOp turns context words in a prompt into a set of learnable vectors and, with only a few labeled images for learning, can achieve huge improvements over intensively-tuned manual prompts. In our study we identify a critical problem of CoOp: the learned context is not generalizable to wider unseen classes within the same dataset, suggesting that CoOp overfits base classes observed during training. To address the problem, we propose Conditional Context Optimization (CoCoOp), which extends CoOp by further learning a lightweight neural network to generate for each image an input-conditional token (vector). Compared to CoOp's static prompts, our dynamic prompts adapt to each instance and are thus less sensitive to class shift. Extensive experiments show that CoCoOp generalizes much better than CoOp to unseen classes, even showing promising transferability beyond a single dataset; and yields stronger domain generalization performance as well. Code is available at //github.com/KaiyangZhou/CoOp.
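The difference between static and input-conditional prompts can be sketched as follows. This is an illustration only, not the code released at the URL above: the lightweight network is reduced to a single linear layer, and adding its output token to every learnable context vector is an assumption about how the two are combined. All names are hypothetical.

```python
def meta_net(image_features, weights, bias):
    # ASSUMPTION: a single linear layer stands in for the lightweight network.
    return [sum(w * f for w, f in zip(row, image_features)) + b
            for row, b in zip(weights, bias)]

def conditional_context(context_vectors, image_features, weights, bias):
    # Static prompts would use context_vectors unchanged for every image.
    # Here each context vector is shifted by an image-specific token.
    token = meta_net(image_features, weights, bias)
    return [[c + t for c, t in zip(ctx, token)] for ctx in context_vectors]

context = [[1.0, 1.0], [2.0, 2.0]]   # learnable context vectors
weights = [[1.0, 0.0], [0.0, 1.0]]   # toy meta-net parameters
bias = [0.5, 0.5]
prompt_a = conditional_context(context, [1.0, 0.0], weights, bias)
prompt_b = conditional_context(context, [0.0, 1.0], weights, bias)
```

Two different images yield two different prompts from the same learned context, which is the per-instance adaptation the abstract contrasts with static prompts.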

Prompt · Language modeling · MoDELS · Performer · Processing (programming language)

July 28, 2021

Pre-train, Prompt, and Predict: A Systematic Survey of Prompting Methods in Natural Language Processing

Pengfei Liu, Weizhe Yuan, Jinlan Fu, Zhengbao Jiang, Hiroaki Hayashi, Graham Neubig

from arxiv, Website: //pretrain.nlpedia.ai/

This paper surveys and organizes research works in a new paradigm in natural language processing, which we dub "prompt-based learning". Unlike traditional supervised learning, which trains a model to take in an input x and predict an output y as P(y|x), prompt-based learning is based on language models that model the probability of text directly. To use these models to perform prediction tasks, the original input x is modified using a template into a textual string prompt x' that has some unfilled slots, and then the language model is used to probabilistically fill the unfilled information to obtain a final string x̂, from which the final output y can be derived. This framework is powerful and attractive for a number of reasons: it allows the language model to be pre-trained on massive amounts of raw text, and by defining a new prompting function the model is able to perform few-shot or even zero-shot learning, adapting to new scenarios with few or no labeled data. In this paper we introduce the basics of this promising paradigm, describe a unified set of mathematical notations that can cover a wide variety of existing work, and organize existing work along several dimensions, e.g., the choice of pre-trained models, prompts, and tuning strategies. To make the field more accessible to interested beginners, we not only make a systematic review of existing works and a highly structured typology of prompt-based concepts, but also release other resources, e.g., a website //pretrain.nlpedia.ai/ including constantly-updated survey, and paperlist.
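The three steps in the abstract (apply a template to x to get x', fill the slot, derive y from the filled string) can be sketched as follows. The template, the `[Z]` slot marker, the answer words, and especially `toy_lm_score` are all hypothetical: the scorer is a cue-word counter standing in for a real language model so that the example runs without any dependencies.

```python
POSITIVE_CUES = {"loved", "wonderful", "enjoyed"}
NEGATIVE_CUES = {"boring", "dull", "awful"}
ANSWER_TO_LABEL = {"great": "positive", "terrible": "negative"}

def apply_template(x):
    # Prompting function: input x -> prompt x' with one unfilled slot.
    return f"{x} Overall, it was a [Z] movie."

def toy_lm_score(x_prime, answer):
    # ASSUMPTION: stand-in for a language model's score of filling [Z] with `answer`.
    words = {w.strip(".,!").lower() for w in x_prime.split()}
    pos, neg = len(words & POSITIVE_CUES), len(words & NEGATIVE_CUES)
    return pos - neg if answer == "great" else neg - pos

def predict(x):
    x_prime = apply_template(x)
    # Fill the slot with the highest-scoring answer, then map it to a label y.
    best = max(ANSWER_TO_LABEL, key=lambda z: toy_lm_score(x_prime, z))
    filled = x_prime.replace("[Z]", best)
    return filled, ANSWER_TO_LABEL[best]

filled, label = predict("I loved this film.")
```

No task-specific training happens here: changing the template and the answer-to-label mapping is enough to point the same scorer at a different task, which is the zero-shot property the abstract highlights.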

Few-shot learning · Language modeling · Better · MoDELS · Performer

December 31, 2020

Making Pre-trained Language Models Better Few-shot Learners

Tianyu Gao, Adam Fisch, Danqi Chen

The recent GPT-3 model (Brown et al., 2020) achieves remarkable few-shot performance solely by leveraging a natural-language prompt and a few task demonstrations as input context. Inspired by their findings, we study few-shot learning in a more practical scenario, where we use smaller language models for which fine-tuning is computationally efficient. We present LM-BFF--better few-shot fine-tuning of language models--a suite of simple and complementary techniques for fine-tuning language models on a small number of annotated examples. Our approach includes (1) prompt-based fine-tuning together with a novel pipeline for automating prompt generation; and (2) a refined strategy for dynamically and selectively incorporating demonstrations into each context. Finally, we present a systematic evaluation for analyzing few-shot performance on a range of NLP tasks, including classification and regression. Our experiments demonstrate that our methods combine to dramatically outperform standard fine-tuning procedures in this low resource setting, achieving up to 30% absolute improvement, and 11% on average across all tasks. Our approach makes minimal assumptions on task resources and domain expertise, and hence constitutes a strong task-agnostic method for few-shot learning.
