
This review paper discusses how context has been used in neural machine translation (NMT) in the past two years (2017-2018). Starting with a brief retrospective on the rapid evolution of NMT models, the paper reviews studies that evaluate NMT output from various perspectives, with an emphasis on those that analyze limitations in the translation of contextual phenomena. A subsequent version of the paper will then present the main methods proposed to leverage context for improving translation quality, distinguishing methods that aim to improve the translation of specific phenomena from those that consider a wider, unstructured context.

Related content

Although the Transformer translation model (Vaswani et al., 2017) has achieved state-of-the-art performance in a variety of translation tasks, it remains a challenge to use document-level context to address discourse phenomena that are problematic for the Transformer. In this work, we extend the Transformer model with a new context encoder to represent document-level context, which is then incorporated into the original encoder and decoder. As large-scale document-level parallel corpora are usually not available, we introduce a two-step training method to take full advantage of abundant sentence-level parallel corpora and limited document-level parallel corpora. Experiments on the NIST Chinese-English datasets and the IWSLT French-English datasets show that our approach improves significantly over the Transformer.
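
To make the fusion step concrete, below is a minimal PyTorch sketch of how a document-level context representation could be attended to and gated into the sentence encoder's states. The `ContextFusion` module, its gating, and the dimensions are illustrative assumptions for this sketch, not the paper's exact architecture.

```python
# Hypothetical sketch: fusing document-level context into encoder states
# via multi-head attention and a gate. Names, gating, and dimensions are
# assumptions, not the exact design described in the abstract above.
import torch
import torch.nn as nn

class ContextFusion(nn.Module):
    def __init__(self, d_model=512, n_heads=8):
        super().__init__()
        self.ctx_attn = nn.MultiheadAttention(d_model, n_heads, batch_first=True)
        self.gate = nn.Linear(2 * d_model, d_model)

    def forward(self, enc_states, ctx_states):
        # enc_states: [batch, src_len, d_model] current-sentence encoder output
        # ctx_states: [batch, ctx_len, d_model] output of the context encoder
        ctx_summary, _ = self.ctx_attn(enc_states, ctx_states, ctx_states)
        g = torch.sigmoid(self.gate(torch.cat([enc_states, ctx_summary], dim=-1)))
        return enc_states + g * ctx_summary  # gated residual fusion

fused = ContextFusion()(torch.randn(2, 7, 512), torch.randn(2, 30, 512))
print(fused.shape)  # torch.Size([2, 7, 512])
```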

This work investigates the alignment problem in state-of-the-art multi-head attention models based on the transformer architecture. We demonstrate that alignment extraction in transformer models can be improved by augmenting an additional alignment head to the multi-head source-to-target attention component. This is used to compute sharper attention weights. We describe how to use the alignment head to achieve competitive performance. To study the effect of adding the alignment head, we simulate a dictionary-guided translation task, where the user wants to guide translation using pre-defined dictionary entries. Using the proposed approach, we achieve up to 3.8% BLEU improvement when using the dictionary, in comparison to 2.4% BLEU in the baseline case. We also propose alignment pruning to speed up decoding in alignment-based neural machine translation (ANMT), which speeds up translation by a factor of 1.8 without loss in translation performance. We carry out experiments on the shared WMT 2016 English→Romanian news task and the BOLT Chinese→English discussion forum task.
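
As an illustration of the idea, the following sketch adds a single dedicated alignment head on top of decoder and encoder states and reads off hard alignments from its sharpened weights. The temperature-based sharpening and the module layout are assumptions made for this sketch, not the authors' exact design.

```python
# Hypothetical sketch: a dedicated alignment head next to the regular
# source-to-target attention. Alignments are extracted as the argmax of
# this head's (sharper) weights; details here are illustrative assumptions.
import torch
import torch.nn as nn
import torch.nn.functional as F

class AlignmentHead(nn.Module):
    def __init__(self, d_model=512, temperature=0.5):
        super().__init__()
        self.q = nn.Linear(d_model, d_model)
        self.k = nn.Linear(d_model, d_model)
        self.temperature = temperature  # values < 1 sharpen the distribution

    def forward(self, dec_states, enc_states):
        # dec_states: [batch, tgt_len, d_model], enc_states: [batch, src_len, d_model]
        scores = self.q(dec_states) @ self.k(enc_states).transpose(1, 2)
        scores = scores / (dec_states.size(-1) ** 0.5 * self.temperature)
        weights = F.softmax(scores, dim=-1)   # [batch, tgt_len, src_len]
        alignment = weights.argmax(dim=-1)    # hard source position per target word
        return weights, alignment
```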

Recently, non-recurrent architectures (convolutional, self-attentional) have outperformed RNNs in neural machine translation. CNNs and self-attentional networks can connect distant words via shorter network paths than RNNs, and it has been speculated that this improves their ability to model long-range dependencies. However, this theoretical argument has not been tested empirically, nor have alternative explanations for their strong performance been explored in-depth. We hypothesize that the strong performance of CNNs and self-attentional networks could also be due to their ability to extract semantic features from the source text, and we evaluate RNNs, CNNs and self-attention networks on two tasks: subject-verb agreement (where capturing long-range dependencies is required) and word sense disambiguation (where semantic feature extraction is required). Our experimental results show that: 1) self-attentional networks and CNNs do not outperform RNNs in modeling subject-verb agreement over long distances; 2) self-attentional networks perform distinctly better than RNNs and CNNs on word sense disambiguation.
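
Targeted evaluations of this kind are commonly run with contrastive test sets, where the model should score the reference higher than a minimally altered incorrect variant (e.g. a verb with the wrong number). The sketch below shows that scoring loop; the `score` function standing in for the model's log-probability is a hypothetical placeholder, not a real API.

```python
# Sketch of contrastive evaluation in the spirit of such targeted tests.
# `score` is a hypothetical stand-in for log P(target | source) under an NMT model.
def prefers_correct(score, source, reference, contrastive):
    """True if the model assigns higher probability to the reference."""
    return score(source, reference) > score(source, contrastive)

def contrastive_accuracy(score, test_set):
    # test_set: list of (source, reference, contrastive) triples
    hits = sum(prefers_correct(score, s, r, c) for s, r, c in test_set)
    return hits / len(test_set)
```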

Neural machine translation (NMT) has a drawback in that it can generate only high-frequency words, owing to the computational cost of the softmax function in the output layer. In Japanese-English NMT, Japanese predicate conjugation causes an increase in vocabulary size; for example, one verb can have as many as 19 surface forms, so the vocabulary list fills up with the various forms of verbs. In this research, we focus on predicate conjugation to compress the Japanese vocabulary. We propose methods that use predicate conjugation information without discarding linguistic information; the proposed methods can generate low-frequency words and deal with unknown words. Two methods were considered to introduce conjugation information: the first treats it as a token (conjugation token) and the second as an embedded vector (conjugation feature). The results demonstrate that the vocabulary size can be compressed by approximately 86.1% (Tanaka corpus) and that the NMT models can output words not present in the training data. Furthermore, BLEU scores improved by 0.91 points in Japanese-to-English translation and 0.32 points in English-to-Japanese translation on ASPEC.
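
The conjugation-token variant can be pictured as a preprocessing step that splits each inflected predicate into its base form plus a separate tag, as in the toy sketch below. The lexicon and tag names are invented for illustration and do not reflect the paper's actual inventory.

```python
# Illustrative sketch of the "conjugation token" idea: an inflected predicate
# is replaced by its base form plus a separate tag naming the conjugation.
# The lexicon and tag set here are toy assumptions, not the paper's.
TOY_LEXICON = {
    "書いた": ("書く", "<past>"),      # "wrote"           -> write + past tag
    "書きます": ("書く", "<polite>"),  # "write (polite)"  -> write + polite tag
}

def split_conjugation(tokens):
    out = []
    for tok in tokens:
        if tok in TOY_LEXICON:
            base, tag = TOY_LEXICON[tok]
            out.extend([base, tag])  # the vocabulary now only needs the base form
        else:
            out.append(tok)
    return out

print(split_conjugation(["彼", "は", "手紙", "を", "書いた"]))
# ['彼', 'は', '手紙', 'を', '書く', '<past>']
```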

In NMT, words are sometimes dropped from the source or generated repeatedly in the translation. We explore novel strategies to address this coverage problem that change only the attention transformation. Our approach allocates fertilities to source words, which are used to bound the attention each word can receive. We experiment with various sparse and constrained attention transformations and propose a new one, constrained sparsemax, which is shown to be differentiable and sparse. Empirical evaluation is provided for three language pairs.
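
Constrained sparsemax amounts to a Euclidean projection of the attention scores onto the probability simplex with per-word upper bounds given by the fertilities. The snippet below solves that projection numerically by bisection as a reference; it is a sketch of the objective, not the paper's differentiable algorithm.

```python
# Numerical sketch of constrained sparsemax: project scores z onto
# {p : sum(p) = 1, 0 <= p <= u}, where u are per-word fertility upper bounds.
# Solved by bisection on the threshold tau; a reference implementation only.
import numpy as np

def constrained_sparsemax(z, u, iters=60):
    z, u = np.asarray(z, float), np.asarray(u, float)
    assert u.sum() >= 1.0, "fertility budget must cover one unit of attention"
    lo, hi = z.min() - 1.0, z.max()           # bracket for the threshold tau
    for _ in range(iters):
        tau = 0.5 * (lo + hi)
        total = np.clip(z - tau, 0.0, u).sum()
        lo, hi = (lo, tau) if total < 1.0 else (tau, hi)
    return np.clip(z - 0.5 * (lo + hi), 0.0, u)

p = constrained_sparsemax([1.2, 0.3, 0.9, -0.5], u=[0.6, 0.6, 0.6, 0.6])
print(p, p.sum())  # sparse weights, each capped at its fertility, summing to ~1
```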

Neural machine translation (NMT) has emerged as a new paradigm in machine translation, and the attention mechanism has become the dominant approach, with state-of-the-art results in many language pairs. While there are variants of the attention mechanism, all of them use only temporal attention, where one scalar value is assigned to the context vector corresponding to a source word. In this paper, we propose a fine-grained (or 2D) attention mechanism in which each dimension of a context vector receives a separate attention score. In experiments on En-De and En-Fi translation, the fine-grained attention method improves translation quality in terms of BLEU score. In addition, our alignment analysis reveals how the fine-grained attention mechanism exploits the internal structure of context vectors.
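
A minimal sketch of the 2D idea: attention scores are produced per dimension rather than per source word, so the normalization runs over source positions independently for every dimension of the context vector. The concatenation-based scorer below is an assumption made for brevity, not the paper's exact scoring function.

```python
# Sketch of fine-grained (2D) attention: every dimension of the context
# vector gets its own attention weight over source positions.
import torch
import torch.nn as nn
import torch.nn.functional as F

class FineGrainedAttention(nn.Module):
    def __init__(self, d_model=256):
        super().__init__()
        self.score = nn.Linear(2 * d_model, d_model)  # one score per dimension

    def forward(self, dec_state, enc_states):
        # dec_state: [batch, d_model], enc_states: [batch, src_len, d_model]
        tgt = dec_state.unsqueeze(1).expand(-1, enc_states.size(1), -1)
        scores = self.score(torch.cat([tgt, enc_states], dim=-1))  # [b, src, d]
        weights = F.softmax(scores, dim=1)        # normalize over source positions
        return (weights * enc_states).sum(dim=1)  # [batch, d_model] context vector
```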

Homographs, words with different meanings but the same surface form, have long caused difficulty for machine translation systems, as it is difficult to select the correct translation based on the context. However, with the advent of neural machine translation (NMT) systems, which can theoretically take into account global sentential context, one may hypothesize that this problem has been alleviated. In this paper, we first provide empirical evidence that existing NMT systems in fact still have significant problems in properly translating ambiguous words. We then proceed to describe methods, inspired by the word sense disambiguation literature, that model the context of the input word with context-aware word embeddings that help to differentiate the word sense before feeding it into the encoder. Experiments on three language pairs demonstrate that such models improve the performance of NMT systems both in terms of BLEU score and in the accuracy of translating homographs.
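
One simple way to realize context-aware input embeddings is to gate each word embedding with a sentence-level summary of the embeddings before the encoder sees it, as in the sketch below. This is one possible instantiation for illustration, not necessarily the paper's exact model.

```python
# Sketch of context-aware input embeddings: each word embedding is combined,
# through a learned gate, with a sentence-level average of the embeddings
# before being fed to the encoder. An illustrative instantiation only.
import torch
import torch.nn as nn

class ContextAwareEmbedding(nn.Module):
    def __init__(self, vocab_size, d_emb=512):
        super().__init__()
        self.emb = nn.Embedding(vocab_size, d_emb)
        self.gate = nn.Linear(2 * d_emb, d_emb)

    def forward(self, token_ids):
        # token_ids: [batch, src_len]
        e = self.emb(token_ids)                          # [batch, src_len, d_emb]
        ctx = e.mean(dim=1, keepdim=True).expand_as(e)   # sentence-level context
        g = torch.sigmoid(self.gate(torch.cat([e, ctx], dim=-1)))
        return g * e + (1 - g) * ctx                     # context-adjusted embedding
```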

In spite of the recent success of neural machine translation (NMT) in standard benchmarks, the lack of large parallel corpora poses a major practical problem for many language pairs. There have been several proposals to alleviate this issue with, for instance, triangulation and semi-supervised learning techniques, but they still require a strong cross-lingual signal. In this work, we completely remove the need for parallel data and propose a novel method to train an NMT system in a completely unsupervised manner, relying on nothing but monolingual corpora. Our model builds upon recent work on unsupervised embedding mappings, and consists of a slightly modified attentional encoder-decoder model that can be trained on monolingual corpora alone using a combination of denoising and backtranslation. Despite the simplicity of the approach, our system obtains 15.56 and 10.21 BLEU points in WMT 2014 French-to-English and German-to-English translation, respectively. The model can also profit from small parallel corpora, attaining 21.81 and 15.24 points when combined with 100,000 parallel sentences. Our implementation is released as an open-source project.
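
The training procedure can be summarized as alternating denoising autoencoding on each monolingual corpus with on-the-fly back-translation, roughly as in the schematic below. The `model` object with its `translate` and `step` methods is a hypothetical stand-in, not a real toolkit API.

```python
# Schematic training loop for unsupervised NMT combining denoising
# autoencoding with on-the-fly back-translation. The `model` interface is a
# hypothetical stand-in for a shared encoder-decoder with two languages.
import random

def add_noise(tokens, drop_p=0.1, k=3):
    """Word dropout plus a local shuffle, a common choice for the denoising objective."""
    kept = [t for t in tokens if random.random() > drop_p] or tokens[:1]
    order = sorted(range(len(kept)), key=lambda i: i + random.uniform(0, k))
    return [kept[i] for i in order]

def train_step(model, mono_l1, mono_l2):
    for lang, batch in (("l1", mono_l1), ("l2", mono_l2)):
        other = "l2" if lang == "l1" else "l1"
        # 1) denoising: reconstruct each sentence from a corrupted copy of itself
        for sent in batch:
            model.step(src=add_noise(sent), tgt=sent, src_lang=lang, tgt_lang=lang)
        # 2) back-translation: translate with the current model, then learn to
        #    map the synthetic translation back to the original sentence
        for sent in batch:
            synthetic = model.translate(sent, src_lang=lang, tgt_lang=other)
            model.step(src=synthetic, tgt=sent, src_lang=other, tgt_lang=lang)
```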

While neural machine translation (NMT) models provide improved translation quality in an elegant, end-to-end framework, it is less clear what they learn about language. Recent work has started evaluating the quality of vector representations learned by NMT models on morphological and syntactic tasks. In this paper, we investigate the representations learned at different layers of NMT encoders. We train NMT systems on parallel data and use the trained models to extract features for training a classifier on two tasks: part-of-speech and semantic tagging. We then measure the performance of the classifier as a proxy to the quality of the original NMT model for the given task. Our quantitative analysis yields interesting insights regarding representation learning in NMT models. For instance, we find that higher layers are better at learning semantics while lower layers tend to be better for part-of-speech tagging. We also observe little effect of the target language on source-side representations, especially with higher quality NMT models.
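
The probing recipe is straightforward: freeze the trained NMT encoder, take the hidden states of one layer as features, and fit a small classifier on the tagging task, using its accuracy as a proxy for what that layer encodes. The sketch below assumes an `encode` function returning per-layer states; that interface is an assumption, not a specific toolkit's API.

```python
# Sketch of a probing classifier over frozen NMT encoder representations.
# `encode(tokens)` is assumed to return a list of per-layer hidden states.
import torch
import torch.nn as nn

def probe_layer(encode, tagged_corpus, layer, n_tags, d_model=512, epochs=5):
    clf = nn.Linear(d_model, n_tags)           # simple linear probe
    opt = torch.optim.Adam(clf.parameters(), lr=1e-3)
    loss_fn = nn.CrossEntropyLoss()
    for _ in range(epochs):
        for tokens, tags in tagged_corpus:     # tags: LongTensor of shape [len]
            with torch.no_grad():              # the NMT encoder stays frozen
                states = encode(tokens)[layer] # [len, d_model] at this layer
            loss = loss_fn(clf(states), tags)
            opt.zero_grad()
            loss.backward()
            opt.step()
    return clf  # its held-out accuracy serves as a proxy for the layer's quality
```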

Partially inspired by successful applications of variational recurrent neural networks, we propose a novel variational recurrent neural machine translation (VRNMT) model in this paper. Unlike variational NMT, which uses a single latent variable, VRNMT introduces a series of latent random variables to model the translation procedure of a sentence in a generative way. Specifically, the latent random variables are incorporated into the hidden states of the NMT decoder using elements of the variational autoencoder. In this way, the variables are generated recurrently, which enables them to capture strong and complex dependencies among the output translations at different timesteps. To address the challenges of efficient posterior inference and large-scale training when incorporating latent variables, we build a neural posterior approximator and equip it with a reparameterization technique to estimate the variational lower bound. Experiments on Chinese-English and English-German translation tasks demonstrate that the proposed model achieves significant improvements over both conventional and variational NMT models.
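
The core mechanism is the reparameterized latent variable injected into each decoder step, which keeps the model trainable by backpropagation while contributing a KL term to the variational lower bound. The sketch below shows that step in isolation; how the posterior is conditioned and how z enters the recurrence are simplifying assumptions, not the exact VRNMT design.

```python
# Minimal sketch of a reparameterized latent variable inside a recurrent
# decoder step. Conditioning of mu/log-variance and the GRU input layout
# are illustrative assumptions.
import torch
import torch.nn as nn

class LatentDecoderStep(nn.Module):
    def __init__(self, d_hidden=512, d_latent=64):
        super().__init__()
        self.posterior = nn.Linear(d_hidden, 2 * d_latent)  # -> [mu, log sigma^2]
        self.rnn = nn.GRUCell(d_hidden + d_latent, d_hidden)

    def forward(self, x_t, h_prev):
        # x_t: [batch, d_hidden] current input, h_prev: [batch, d_hidden]
        mu, logvar = self.posterior(h_prev).chunk(2, dim=-1)
        z = mu + torch.randn_like(mu) * torch.exp(0.5 * logvar)  # reparameterized sample
        h_t = self.rnn(torch.cat([x_t, z], dim=-1), h_prev)
        kl = 0.5 * (torch.exp(logvar) + mu**2 - 1.0 - logvar).sum(-1)  # KL to N(0, I)
        return h_t, kl  # the KL term enters the variational lower bound
```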
