
Learning algorithms are becoming more powerful, often at the cost of increased complexity. In response, the demand for algorithms to be transparent is growing. In NLP tasks, attention distributions learned by attention-based deep learning models are used to gain insights into the models' behavior. To what extent is this perspective valid for all NLP tasks? We investigate whether distributions calculated by different attention heads in a Transformer architecture can be used to improve transparency in the task of abstractive summarization. To this end, we present both a qualitative and a quantitative analysis of the behavior of the attention heads. We show that some attention heads indeed specialize towards syntactically and semantically distinct input. We propose an approach to evaluate to what extent the Transformer model relies on specifically learned attention distributions. We also discuss what this implies for using attention distributions as a means of transparency.
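As a rough illustration of what an "attention distribution" of a single head is, the following minimal sketch computes scaled dot-product attention weights for one query over a few keys, and uses Shannon entropy as one possible indicator of how focused the head is. The entropy measure, the toy vectors, and all function names are assumptions made for this example; they are not taken from the paper's analysis.

```python
import math

def softmax(scores):
    """Turn raw scores into a probability distribution (numerically stable)."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def attention_distribution(query, keys):
    """Scaled dot-product attention weights of one query over all keys."""
    d = len(query)
    scores = [sum(q * k for q, k in zip(query, key)) / math.sqrt(d) for key in keys]
    return softmax(scores)

def entropy(dist):
    """Shannon entropy in nats; low values mean the head attends to few positions."""
    return -sum(p * math.log(p) for p in dist if p > 0)

# Toy keys for three input positions (hypothetical values).
keys = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]

# A query aligned with the third key gives a peaked distribution ...
focused = attention_distribution([8.0, 8.0], keys)
# ... while a zero query gives a uniform one.
diffuse = attention_distribution([0.0, 0.0], keys)

print(focused, entropy(focused))
print(diffuse, entropy(diffuse))
```

Comparing such per-head statistics across layers is one simple way to look for heads that behave differently from one another, although a peaked distribution alone does not show that the model relies on it.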

Related content

Attention mechanism


The attention mechanism was first proposed in the field of computer vision, but it really took off with the Google DeepMind paper "Recurrent Models of Visual Attention" [14], which applied an attention mechanism to an RNN model for image classification. Subsequently, Bahdanau et al., in "Neural Machine Translation by Jointly Learning to Align and Translate" [1], used an attention-like mechanism to perform translation and alignment simultaneously in a machine translation task; their work is regarded as the first to apply attention to NLP. Similar attention-based extensions of RNN models were then applied to a wide range of NLP tasks. More recently, how to use attention in CNNs has also become an active research topic. The figure below shows the general trend of attention research.

Pegasus · Performer · state-of-the-art · MoDELS · ROUGE ·

June 2, 2020

PEGASUS: Pre-training with Extracted Gap-sentences for Abstractive Summarization

Jingqing Zhang,Yao Zhao,Mohammad Saleh,Peter J. Liu

from arxiv, Added Human Evaluation results; Code link added; Accepted for ICML 2020

Recent work pre-training Transformers with self-supervised objectives on large text corpora has shown great success when fine-tuned on downstream NLP tasks including text summarization. However, pre-training objectives tailored for abstractive text summarization have not been explored. Furthermore, there is a lack of systematic evaluation across diverse domains. In this work, we propose pre-training large Transformer-based encoder-decoder models on massive text corpora with a new self-supervised objective. In PEGASUS, important sentences are removed/masked from an input document and are generated together as one output sequence from the remaining sentences, similar to an extractive summary. We evaluated our best PEGASUS model on 12 downstream summarization tasks spanning news, science, stories, instructions, emails, patents, and legislative bills. Experiments demonstrate it achieves state-of-the-art performance on all 12 downstream datasets measured by ROUGE scores. Our model also shows surprising performance on low-resource summarization, surpassing previous state-of-the-art results on 6 datasets with only 1000 examples. Finally, we validated our results using human evaluation and show that our model summaries achieve human performance on multiple datasets.
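The gap-sentence objective described above (remove important sentences, then generate them from the rest) can be sketched in a few lines. This is only an illustration under stated assumptions: importance is approximated here by plain unigram overlap with the remaining sentences, the mask string `<mask_sent>` is a hypothetical placeholder, and the function names are invented for the example. None of this should be read as the PEGASUS implementation.

```python
def tokenize(sentence):
    """Very rough whitespace tokenizer (assumption for this sketch)."""
    return sentence.lower().split()

def overlap_score(sentence, others):
    """Fraction of the sentence's distinct words that also occur elsewhere."""
    sent = set(tokenize(sentence))
    rest = set()
    for other in others:
        rest.update(tokenize(other))
    if not sent:
        return 0.0
    return len(sent & rest) / len(sent)

def make_gap_example(sentences, num_gaps=1, mask_token="<mask_sent>"):
    """Mask the highest-scoring sentences; return (source text, target text)."""
    scored = []
    for i, sentence in enumerate(sentences):
        others = sentences[:i] + sentences[i + 1:]
        scored.append((overlap_score(sentence, others), i))
    # Highest score first; ties are broken by earlier position.
    ranked = sorted(scored, key=lambda pair: (-pair[0], pair[1]))
    chosen = sorted(i for _, i in ranked[:num_gaps])
    chosen_set = set(chosen)
    source = [mask_token if i in chosen_set else s for i, s in enumerate(sentences)]
    target = [sentences[i] for i in chosen]
    return " ".join(source), " ".join(target)

doc = [
    "the model summarizes news articles",
    "it was trained for a week",
    "news articles are summarized by the model",
]
source, target = make_gap_example(doc, num_gaps=1)
print(source)
print(target)
```

The pair `(source, target)` then plays the role of an (input document, pseudo-summary) training example for an encoder-decoder model.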

Identifiable · Graph · Normalized · INFORMS · Knowledge graph ·

March 23, 2020

What is Normal, What is Strange, and What is Missing in a Knowledge Graph: Unified Characterization via Inductive Summarization

Caleb Belth,Xinyi Zheng,Jilles Vreeken,Danai Koutra

from arxiv, 10 pages, plus 2 pages of references. 5 figures. Accepted at The Web Conference 2020

Knowledge graphs (KGs) store highly heterogeneous information about the world in the structure of a graph, and are useful for tasks such as question answering and reasoning. However, they often contain errors and are missing information. Vibrant research in KG refinement has worked to resolve these issues, tailoring techniques to either detect specific types of errors or complete a KG. In this work, we introduce a unified solution to KG characterization by formulating the problem as unsupervised KG summarization with a set of inductive, soft rules, which describe what is normal in a KG, and thus can be used to identify what is abnormal, whether it be strange or missing. Unlike first-order logic rules, our rules are labeled, rooted graphs, i.e., patterns that describe the expected neighborhood around a (seen or unseen) node, based on its type, and information in the KG. Stepping away from the traditional support/confidence-based rule mining techniques, we propose KGist, Knowledge Graph Inductive SummarizaTion, which learns a summary of inductive rules that best compress the KG according to the Minimum Description Length principle, a formulation that we are the first to use in the context of KG rule mining. We apply our rules to three large KGs (NELL, DBpedia, and Yago), and tasks such as compression, various types of error detection, and identification of incomplete information. We show that KGist outperforms task-specific, supervised and unsupervised baselines in error detection and incompleteness identification (identifying the location of up to 93% of missing entities, over 10% more than baselines), while also being efficient for large knowledge graphs.

BERT · MoDELS · Language modeling · Transformation · state-of-the-art ·

August 22, 2019

Text Summarization with Pretrained Encoders

Yang Liu,Mirella Lapata

from arxiv, To appear in EMNLP 2019

Bidirectional Encoder Representations from Transformers (BERT) represents the latest incarnation of pretrained language models which have recently advanced a wide range of natural language processing tasks. In this paper, we showcase how BERT can be usefully applied in text summarization and propose a general framework for both extractive and abstractive models. We introduce a novel document-level encoder based on BERT which is able to express the semantics of a document and obtain representations for its sentences. Our extractive model is built on top of this encoder by stacking several inter-sentence Transformer layers. For abstractive summarization, we propose a new fine-tuning schedule which adopts different optimizers for the encoder and the decoder as a means of alleviating the mismatch between the two (the former is pretrained while the latter is not). We also demonstrate that a two-staged fine-tuning approach can further boost the quality of the generated summaries. Experiments on three datasets show that our model achieves state-of-the-art results across the board in both extractive and abstractive settings. Our code is available at //github.com/nlpyang/PreSumm

Automatic summarization · contrastive · ROUGE · BLEU · Comprehensibility ·

December 18, 2018

Automatic Summarization of Natural Language

Marc Everett Johnson

from arxiv, 6 pages, 1 literature synthesis matrix

Automatic summarization of natural language is a current topic in computer science research and industry, studied for decades because of its usefulness across multiple domains. For example, summarization is necessary to create reviews such as this one. Research and applications have achieved some success in extractive summarization (where key sentences are curated); however, abstractive summarization (synthesis and re-stating) is a hard problem and generally unsolved in computer science. This literature review contrasts historical progress up through the current state of the art, comparing dimensions such as: extractive vs. abstractive, supervised vs. unsupervised, NLP (Natural Language Processing) vs. knowledge-based, deep learning vs. algorithms, structured vs. unstructured sources, and measurement metrics such as ROUGE and BLEU. Multiple dimensions are contrasted since current research uses combinations of approaches, as seen in the review matrix. Throughout this summary, synthesis and critique are provided. This review concludes with insights for improved abstractive summarization measurement, with surprising implications for detecting understanding and comprehension in general.

Variational autoencoder · MoDELS · Comprehensibility · Exact inference · Mode collapse ·

December 13, 2018

A Probe into Understanding GAN and VAE models

Jingzhao Zhang,Lu Mi,Macheng Shen

from arxiv, 9 pages, 8 figures

Both generative adversarial network models and variational autoencoders have been widely used to approximate probability distributions of datasets. Although they both use parametrized distributions to approximate the underlying data distribution, whose exact inference is intractable, their behaviors are very different. In this report, we summarize our experimental results that compare these two categories of models in terms of fidelity and mode collapse. We provide a hypothesis to explain their different behaviors and propose a new model based on this hypothesis. We further tested our proposed model on the MNIST and CelebA datasets.

contrastive · Learned · MoDELS · Comprehensibility · Machine Learning ·

July 23, 2018

Contrastive Explanations for Reinforcement Learning in terms of Expected Consequences

Jasper van der Waa,Jurriaan van Diggelen,Karel van den Bosch,Mark Neerincx

from arxiv, XAI workshop on the IJCAI conference 2018, Stockholm, Sweden

Machine Learning models become increasingly proficient in complex tasks. However, even for experts in the field, it can be difficult to understand what the model learned. This hampers trust and acceptance, and it obstructs the possibility to correct the model. There is therefore a need for transparency of machine learning models. The development of transparent classification models has received much attention, but there are few developments for achieving transparent Reinforcement Learning (RL) models. In this study we propose a method that enables an RL agent to explain its behavior in terms of the expected consequences of state transitions and outcomes. First, we define a translation of states and actions to a description that is easier to understand for human users. Second, we developed a procedure that enables the agent to obtain the consequences of a single action, as well as its entire policy. The method calculates contrasts between the consequences of a policy derived from a user query, and of the learned policy of the agent. Third, a format for generating explanations was constructed. A pilot survey study was conducted to explore preferences of users for different explanation properties. Results indicate that human users tend to favor explanations about policy rather than about single actions.

ROUGE · Performer · state-of-the-art · Mail · Optimizer ·

April 17, 2018

Multi-Reward Reinforced Summarization with Saliency and Entailment

Ramakanth Pasunuru,Mohit Bansal

from arxiv, NAACL 2018 (8 pages)

Abstractive text summarization is the task of compressing and rewriting a long document into a short summary while maintaining saliency, directed logical entailment, and non-redundancy. In this work, we address these three important aspects of a good summary via a reinforcement learning approach with two novel reward functions: ROUGESal and Entail, on top of a coverage-based baseline. The ROUGESal reward modifies the ROUGE metric by up-weighting the salient phrases/words detected via a keyphrase classifier. The Entail reward gives high (length-normalized) scores to logically-entailed summaries using an entailment classifier. Further, we show superior performance improvement when these rewards are combined with traditional metric (ROUGE) based rewards, via our novel and effective multi-reward approach of optimizing multiple rewards simultaneously in alternate mini-batches. Our method achieves new state-of-the-art results on the CNN/Daily Mail dataset as well as strong improvements in a test-only transfer setup on DUC-2002.
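To make the idea of up-weighting salient words in an overlap metric concrete, here is a minimal sketch of a salience-weighted unigram recall. It is an assumption-laden toy: the salient word set is hard-coded rather than predicted by a keyphrase classifier, the `boost` factor is hypothetical, and the function is not the ROUGESal reward from the paper.

```python
from collections import Counter

def weighted_unigram_recall(candidate, reference, salient, boost=2.0):
    """Unigram recall where reference words in `salient` count `boost` times."""
    cand = Counter(candidate.lower().split())
    ref = Counter(reference.lower().split())
    matched = 0.0
    total = 0.0
    for word, count in ref.items():
        weight = boost if word in salient else 1.0
        total += weight * count
        # Clip matches by how often the word appears in the candidate.
        matched += weight * min(count, cand[word])
    return matched / total if total else 0.0

reference = "storm closes the airport"
salient = {"storm", "airport"}  # hypothetical output of a salience detector

# Both candidates recover two of the four reference words,
# but only the first one recovers the salient ones.
hits_salient = weighted_unigram_recall("storm at airport", reference, salient)
hits_filler = weighted_unigram_recall("closes the road", reference, salient)
print(hits_salient, hits_filler)
```

With `boost=1.0` the two candidates score the same, so the difference between them comes entirely from the salience weighting.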

Rank · MoDELS · Learned · INFORMS · Neural Networks ·

April 16, 2018

Learning a Deep Listwise Context Model for Ranking Refinement

Qingyao Ai,Keping Bi,Jiafeng Guo,W. Bruce Croft

Learning to rank has been intensively studied and widely applied in information retrieval. Typically, a global ranking function is learned from a set of labeled data, which can achieve good performance on average but may be suboptimal for individual queries by ignoring the fact that relevant documents for different queries may have different distributions in the feature space. Inspired by the idea of pseudo relevance feedback, where top-ranked documents, which we refer to as the local ranking context, can provide important information about the query's characteristics, we propose to use the inherent feature distributions of the top results to learn a Deep Listwise Context Model that helps us fine-tune the initial ranked list. Specifically, we employ a recurrent neural network to sequentially encode the top results using their feature vectors, learn a local context model and use it to re-rank the top results. There are three merits with our model: (1) Our model can capture the local ranking context based on the complex interactions between top results using a deep neural network; (2) Our model can be built upon existing learning-to-rank methods by directly using their extracted feature vectors; (3) Our model is trained with an attention-based loss function, which is more effective and efficient than many existing listwise methods. Experimental results show that the proposed model can significantly improve the state-of-the-art learning to rank methods on benchmark retrieval corpora.
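One common way to build a listwise loss from attention-like distributions is to compare a softmax over relevance labels with a softmax over predicted scores using cross entropy. The sketch below follows that general pattern as an assumption; the exact loss used for the Deep Listwise Context Model may differ, and the function names and toy values are invented for illustration.

```python
import math

def softmax(values):
    """Normalize values into a probability distribution (numerically stable)."""
    m = max(values)
    exps = [math.exp(v - m) for v in values]
    total = sum(exps)
    return [e / total for e in exps]

def listwise_attention_loss(scores, labels):
    """Cross entropy between label-derived and score-derived distributions."""
    target = softmax(labels)    # how much "attention" each document deserves
    predicted = softmax(scores)  # how much the model currently gives it
    return -sum(t * math.log(p) for t, p in zip(target, predicted))

labels = [2.0, 1.0, 0.0]  # graded relevance of three top results (toy values)

loss_good = listwise_attention_loss([2.0, 1.0, 0.0], labels)  # correct order
loss_flat = listwise_attention_loss([0.0, 0.0, 0.0], labels)  # no preference
loss_bad = listwise_attention_loss([0.0, 1.0, 2.0], labels)   # reversed order
print(loss_good, loss_flat, loss_bad)
```

Because the loss is computed over the whole list at once, a single forward pass over the top results yields one gradient signal for the entire ranking, in contrast to pairwise losses that enumerate document pairs.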

End-to-end · Reinforcement learning · Learned · Natural language processing ·

March 27, 2018

Deep Communicating Agents for Abstractive Summarization

Asli Celikyilmaz,Antoine Bosselut,Xiaodong He,Yejin Choi

from arxiv, Accepted for publication at NAACL 2018

We present deep communicating agents in an encoder-decoder architecture to address the challenges of representing a long document for abstractive summarization. With deep communicating agents, the task of encoding a long text is divided across multiple collaborating agents, each in charge of a subsection of the input text. These encoders are connected to a single decoder, trained end-to-end using reinforcement learning to generate a focused and coherent summary. Empirical results demonstrate that multiple communicating encoders lead to a higher quality summary compared to several strong baselines, including those based on a single encoder or multiple non-communicating encoders.

Extensibility · Optimizer · Constraint · MoDELS · Identifiable ·

January 11, 2018

Distributed Constraint Optimization Problems and Applications: A Survey

Ferdinando Fioretto,Enrico Pontelli,William Yeoh

The field of Multi-Agent Systems (MAS) is an active area of research within Artificial Intelligence, with an increasingly important impact in industrial and other real-world applications. Within a MAS, autonomous agents interact to pursue personal interests and/or to achieve common objectives. Distributed Constraint Optimization Problems (DCOPs) have emerged as one of the prominent agent architectures to govern the agents' autonomous behavior, where both algorithms and communication models are driven by the structure of the specific problem. During the last decade, several extensions to the DCOP model have enabled them to support MAS in complex, real-time, and uncertain environments. This survey aims at providing an overview of the DCOP model, giving a classification of its multiple extensions and addressing both resolution methods and applications that find a natural mapping within each class of DCOPs. The proposed classification suggests several future perspectives for DCOP extensions, and identifies challenges in the design of efficient resolution algorithms, possibly through the adaptation of strategies from different areas.