丰满人妻被公侵犯高清版,色香欲天天影视综合网八区,国产免费AV片在线观看免费播放

Shaden Smith,Mostofa Patwary,Brandon Norick,Patrick LeGresley,Samyam Rajbhandari,Jared Casper,Zhun Liu,Shrimai Prabhumoye,George Zerveas,Vijay Korthikanti,Elton Zhang,Rewon Child,Reza Yazdani Aminabadi,Julie Bernauer,Xia Song,Mohammad Shoeybi,Yuxiong He,Michael Houston,Saurabh Tiwary,Bryan Catanzaro

from arxiv, *Equal contribution

Pretrained general-purpose language models can achieve state-of-the-art accuracies in various natural language processing domains by adapting to downstream tasks via zero-shot, few-shot and fine-tuning techniques. Because of their success, the size of these models has increased rapidly, requiring high-performance hardware, software, and algorithmic techniques to enable training such large models. As the result of a joint effort between Microsoft and NVIDIA, we present details on the training of the largest monolithic transformer based language model, Megatron-Turing NLG 530B (MT-NLG), with 530 billion parameters. In this paper, we first focus on the infrastructure as well as the 3D parallelism methodology used to train this model using DeepSpeed and Megatron. Next, we detail the training process, the design of our training corpus, and our data curation techniques, which we believe is a key ingredient to the success of the model. Finally, we discuss various evaluation results, as well as other interesting observations and new properties exhibited by MT-NLG. We demonstrate that MT-NLG achieves superior zero-, one-, and few-shot learning accuracies on several NLP benchmarks and establishes new state-of-the-art results. We believe that our contributions will help further the development of large-scale training infrastructures, large-scale language models, and natural language generations.

相關內容

語言模型化

關注 9

道德化 · 可理解性 · 講稿 · 知識 (knowledge) · Guidance ·

2022 年 4 月 20 日

A Corpus for Understanding and Generating Moral Stories

Jian Guan,Ziqi Liu,Minlie Huang

from arxiv, Accepted by NAACL 2022 main conference (Long paper)

Teaching morals is one of the most important purposes of storytelling. An essential ability for understanding and writing moral stories is bridging story plots and implied morals. Its challenges mainly lie in: (1) grasping knowledge about abstract concepts in morals, (2) capturing inter-event discourse relations in stories, and (3) aligning value preferences of stories and morals concerning good or bad behavior. In this paper, we propose two understanding tasks and two generation tasks to assess these abilities of machines. We present STORAL, a new dataset of Chinese and English human-written moral stories. We show the difficulty of the proposed tasks by testing various models with automatic and manual evaluation on STORAL. Furthermore, we present a retrieval-augmented algorithm that effectively exploits related concepts or events in training sets as additional guidance to improve performance on these tasks.

孿生網絡 · Siamese · 小樣本學習 · 標注 · tuning ·

2022 年 4 月 20 日

Few-Shot Learning with Siamese Networks and Label Tuning

Thomas Müller,Guillermo Pérez-Torró,Marc Franco-Salvador

from arxiv, ACL 2022

We study the problem of building text classifiers with little or no training data, commonly known as zero and few-shot text classification. In recent years, an approach based on neural textual entailment models has been found to give strong results on a diverse range of tasks. In this work, we show that with proper pre-training, Siamese Networks that embed texts and labels offer a competitive alternative. These models allow for a large reduction in inference cost: constant in the number of labels rather than linear. Furthermore, we introduce label tuning, a simple and computationally efficient approach that allows to adapt the models in a few-shot setup by only changing the label embeddings. While giving lower performance than model fine-tuning, this approach has the architectural advantage that a single encoder can be shared by many different tasks.

語言模型化 · contrastive · tuning · Prompt · MoDELS ·

2022 年 4 月 18 日

Contrastive Demonstration Tuning for Pre-trained Language Models

Xiaozhuan Liang,Ningyu Zhang,Siyuan Cheng,Zhen Bi,Zhenru Zhang,Chuanqi Tan,Songfang Huang,Fei Huang,Huajun Chen

from arxiv, Work in progress

Pretrained language models can be effectively stimulated by textual prompts or demonstrations, especially in low-data scenarios. Recent works have focused on automatically searching discrete or continuous prompts or optimized verbalizers, yet studies for the demonstration are still limited. Concretely, the demonstration examples are crucial for an excellent final performance of prompt-tuning. In this paper, we propose a novel pluggable, extensible, and efficient approach named contrastive demonstration tuning, which is free of demonstration sampling. Furthermore, the proposed approach can be: (i) Plugged to any previous prompt-tuning approaches; (ii) Extended to widespread classification tasks with a large number of categories. Experimental results on 16 datasets illustrate that our method integrated with previous approaches LM-BFF and P-tuning can yield better performance. Code is available in //github.com/zjunlp/PromptKG/tree/main/research/Demo-Tuning.

語言模型化 · MoDELS · 去噪 · 自編碼器 · 縮放 ·

2022 年 4 月 16 日

METRO: Efficient Denoising Pretraining of Large Scale Autoencoding Language Models with Model Generated Signals

Payal Bajaj,Chenyan Xiong,Guolin Ke,Xiaodong Liu,Di He,Saurabh Tiwary,Tie-Yan Liu,Paul Bennett,Xia Song,Jianfeng Gao

from arxiv, Update details in scaled initialization and add acknowledgement

We present an efficient method of pretraining large-scale autoencoding language models using training signals generated by an auxiliary model. Originated in ELECTRA, this training strategy has demonstrated sample-efficiency to pretrain models at the scale of hundreds of millions of parameters. In this work, we conduct a comprehensive empirical study, and propose a recipe, namely "Model generated dEnoising TRaining Objective" (METRO), which incorporates some of the best modeling techniques developed recently to speed up, stabilize, and enhance pretrained language models without compromising model effectiveness. The resultant models, METRO-LM, consisting of up to 5.4 billion parameters, achieve new state-of-the-art on the GLUE, SuperGLUE, and SQuAD benchmarks. More importantly, METRO-LM are efficient in that they often outperform previous large models with significantly smaller model sizes and lower pretraining cost.

SimPLe · Continuity · MoDELS · 掩碼語言模型化 · 學成 ·

2022 年 4 月 16 日

SimpleBERT: A Pre-trained Model That Learns to Generate Simple Words

Renliang Sun,Xiaojun Wan

Pre-trained models are widely used in the tasks of natural language processing nowadays. However, in the specific field of text simplification, the research on improving pre-trained models is still blank. In this work, we propose a continued pre-training method for text simplification. Specifically, we propose a new masked language modeling (MLM) mechanism, which does not randomly mask words but only masks simple words. The new mechanism can make the model learn to generate simple words. We use a small-scale simple text dataset for continued pre-training and employ two methods to identify simple words from the texts. We choose BERT, a representative pre-trained model, and continue pre-training it using our proposed method. Finally, we obtain SimpleBERT, which surpasses BERT in both lexical simplification and sentence simplification tasks and has achieved state-of-the-art results on multiple datasets. What's more, SimpleBERT can replace BERT in existing simplification models without modification.

泛化理論 · MoDELS · 可理解性 · Processing（編程語言） · 縮放 ·

2022 年 4 月 16 日

Benchmarking Generalization via In-Context Instructions on 1,600+ Language Tasks

Yizhong Wang,Swaroop Mishra,Pegah Alipoormolabashi,Yeganeh Kordi,Amirreza Mirzaei,Anjana Arunkumar,Arjun Ashok,Arut Selvan Dhanasekaran,Atharva Naik,David Stap,Eshaan Pathak,Giannis Karamanolakis,Haizhi Gary Lai,Ishan Purohit,Ishani Mondal,Jacob Anderson,Kirby Kuznia,Krima Doshi,Maitreya Patel,Kuntal Kumar Pal,Mehrad Moradshahi,Mihir Parmar,Mirali Purohit,Neeraj Varshney,Phani Rohitha Kaza,Pulkit Verma,Ravsehaj Singh Puri,Rushang Karia,Shailaja Keyur Sampat,Savan Doshi,Siddhartha Mishra,Sujan Reddy,Sumanta Patro,Tanay Dixit,Xudong Shen,Chitta Baral,Yejin Choi,Hannaneh Hajishirzi,Noah A. Smith,Daniel Khashabi

from arxiv, 16 pages, 9 figures

How can we measure the generalization of models to a variety of unseen tasks when provided with their language instructions? To facilitate progress in this goal, we introduce Natural-Instructions v2, a collection of 1,600+ diverse language tasks and their expert written instructions. More importantly, the benchmark covers 70+ distinct task types, such as tagging, in-filling, and rewriting. This benchmark is collected with contributions of NLP practitioners in the community and through an iterative peer review process to ensure their quality. This benchmark enables large-scale evaluation of cross-task generalization of the models -- training on a subset of tasks and evaluating on the remaining unseen ones. For instance, we are able to rigorously quantify generalization as a function of various scaling parameters, such as the number of observed tasks, the number of instances, and model sizes. As a by-product of these experiments. we introduce Tk-Instruct, an encoder-decoder Transformer that is trained to follow a variety of in-context instructions (plain language task definitions or k-shot examples) which outperforms existing larger models on our benchmark. We hope this benchmark facilitates future progress toward more general-purpose language understanding models.

MoDELS · 語言模型化 · CASE · AIM · 表示 ·

2022 年 4 月 15 日

Evaluation Benchmarks for Spanish Sentence Representations

Vladimir Araujo,Andrés Carvallo,Souvik Kundu,José Ca?ete,Marcelo Mendoza,Robert E. Mercer,Felipe Bravo-Marquez,Marie-Francine Moens,Alvaro Soto

from arxiv, Accepted paper at LREC2022

Due to the success of pre-trained language models, versions of languages other than English have been released in recent years. This fact implies the need for resources to evaluate these models. In the case of Spanish, there are few ways to systematically assess the models' quality. In this paper, we narrow the gap by building two evaluation benchmarks. Inspired by previous work (Conneau and Kiela, 2018; Chen et al., 2019), we introduce Spanish SentEval and Spanish DiscoEval, aiming to assess the capabilities of stand-alone and discourse-aware sentence representations, respectively. Our benchmarks include considerable pre-existing and newly constructed datasets that address different tasks from various domains. In addition, we evaluate and analyze the most recent pre-trained Spanish language models to exhibit their capabilities and limitations. As an example, we discover that for the case of discourse evaluation tasks, mBERT, a language model trained on multiple languages, usually provides a richer latent representation than models trained only with documents in Spanish. We hope our contribution will motivate a fairer, more comparable, and less cumbersome way to evaluate future Spanish language models.

INFORMS · IR · 信息檢索 · 秩 · 學成 ·

2022 年 4 月 15 日

Pre-training Methods in Information Retrieval

Yixing Fan,Xiaohui Xie,Yinqiong Cai,Jia Chen,Xinyu Ma,Xiangsheng Li,Ruqing Zhang,Jiafeng Guo,Yiqun Liu

The core of information retrieval (IR) is to identify relevant information from large-scale resources and return it as a ranked list to respond to the user's information need. Recently, the resurgence of deep learning has greatly advanced this field and leads to a hot topic named NeuIR (i.e., neural information retrieval), especially the paradigm of pre-training methods (PTMs). Owing to sophisticated pre-training objectives and huge model size, pre-trained models can learn universal language representations from massive textual data, which are beneficial to the ranking task of IR. Since there have been a large number of works dedicating to the application of PTMs in IR, we believe it is the right time to summarize the current status, learn from existing researches, and gain some insights for future development. In this survey, we present an overview of PTMs applied in different components of an IR system, including the retrieval component, the re-ranking component, and other components. In addition, we also introduce PTMs specifically designed for IR, and summarize available datasets as well as benchmark leaderboards. Moreover, we discuss some open challenges and envision some promising directions, with the hope of inspiring more works on these topics for future research.

MoDELS · 學成 · Next · Processing（編程語言） · Taxonomy ·

2021 年 8 月 12 日

AMMUS : A Survey of Transformer-based Pretrained Models in Natural Language Processing

Katikapalli Subramanyam Kalyan,Ajit Rajasekharan,Sivanesan Sangeetha

from arxiv, Preprint under review

Transformer-based pretrained language models (T-PTLMs) have achieved great success in almost every NLP task. The evolution of these models started with GPT and BERT. These models are built on the top of transformers, self-supervised learning and transfer learning. Transformed-based PTLMs learn universal language representations from large volumes of text data using self-supervised learning and transfer this knowledge to downstream tasks. These models provide good background knowledge to downstream tasks which avoids training of downstream models from scratch. In this comprehensive survey paper, we initially give a brief overview of self-supervised learning. Next, we explain various core concepts like pretraining, pretraining methods, pretraining tasks, embeddings and downstream adaptation methods. Next, we present a new taxonomy of T-PTLMs and then give brief overview of various benchmarks including both intrinsic and extrinsic. We present a summary of various useful libraries to work with T-PTLMs. Finally, we highlight some of the future research directions which will further improve these models. We strongly believe that this comprehensive survey paper will serve as a good reference to learn the core concepts as well as to stay updated with the recent happenings in T-PTLMs.

任務對話系統 · state-of-the-art · Extensibility · MoDELS · 可辨認的 ·

2021 年 5 月 10 日

Recent Advances in Deep Learning-based Dialogue Systems

Jinjie Ni,Tom Young,Vlad Pandelea,Fuzhao Xue,Vinay Adiga,Erik Cambria

from arxiv, 75 pages, 19 figures

Dialogue systems are a popular Natural Language Processing (NLP) task as it is promising in real-life applications. It is also a complicated task since many NLP tasks deserving study are involved. As a result, a multitude of novel works on this task are carried out, and most of them are deep learning-based due to the outstanding performance. In this survey, we mainly focus on the deep learning-based dialogue systems. We comprehensively review state-of-the-art research outcomes in dialogue systems and analyze them from two angles: model type and system type. Specifically, from the angle of model type, we discuss the principles, characteristics, and applications of different models that are widely used in dialogue systems. This will help researchers acquaint these models and see how they are applied in state-of-the-art frameworks, which is rather helpful when designing a new dialogue system. From the angle of system type, we discuss task-oriented and open-domain dialogue systems as two streams of research, providing insight into the hot topics related. Furthermore, we comprehensively review the evaluation methods and datasets for dialogue systems to pave the way for future research. Finally, some possible research trends are identified based on the recent research outcomes. To the best of our knowledge, this survey is the most comprehensive and up-to-date one at present in the area of dialogue systems and dialogue-related tasks, extensively covering the popular frameworks, topics, and datasets.