Summary: If only a small amount of training data is available, it is not enough to train a complex network from scratch, and starting from a pre-trained model such as BERT also speeds up training considerably. With pre-training, the parameters are initialized from a good starting point, and that initialization has a large impact on how well the subsequent optimization goes.
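As a minimal sketch of this idea (assuming the Hugging Face `transformers` library and PyTorch; `bert-base-uncased` is just one commonly released checkpoint), the snippet below contrasts a randomly initialized BERT with one whose parameters come from the pre-trained weights:

```python
from transformers import BertConfig, BertModel

# Training from scratch: every weight is random, so a small downstream
# dataset is usually not enough to fit a network of this size.
scratch_model = BertModel(BertConfig())

# Pre-trained initialization: the weights already encode general language
# knowledge, giving later fine-tuning a good starting point and letting it
# converge much faster on limited data.
pretrained_model = BertModel.from_pretrained("bert-base-uncased")

# Roughly 110M parameters either way; only the starting point differs.
print(sum(p.numel() for p in pretrained_model.parameters()))
```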
When applying deep learning to natural language processing, one unavoidable topic is "word embedding", that is, converting words into vectors a computer can process. This immediately runs into a fundamental problem: the same word can mean different things in different contexts (polysemy). Researchers addressed this by training contextual representations on large corpora, so that a word receives a different vector depending on the context it appears in.
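The short sketch below makes this concrete (again assuming `transformers` and PyTorch; the sentences and the word "bank" are chosen only for illustration). A static embedding would assign "bank" a single vector, while a contextual model produces two noticeably different vectors for its two senses:

```python
import torch
from transformers import BertTokenizer, BertModel

tokenizer = BertTokenizer.from_pretrained("bert-base-uncased")
model = BertModel.from_pretrained("bert-base-uncased")
model.eval()

sentences = [
    "He sat on the bank of the river.",   # "bank" = riverside
    "She deposited cash at the bank.",    # "bank" = financial institution
]

vectors = []
for text in sentences:
    inputs = tokenizer(text, return_tensors="pt")
    with torch.no_grad():
        hidden = model(**inputs).last_hidden_state[0]   # (seq_len, 768)
    # Locate the token "bank" and keep its contextual vector.
    tokens = tokenizer.convert_ids_to_tokens(inputs["input_ids"][0])
    vectors.append(hidden[tokens.index("bank")])

# A static embedding would give exactly 1.0 here; a contextual model does not.
similarity = torch.cosine_similarity(vectors[0], vectors[1], dim=0)
print(f"cosine similarity between the two 'bank' vectors: {similarity:.3f}")
```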
At the model level BERT is not a new breakthrough; it is more accurately described as a synthesis of the best recent models in NLP. Compared with other neural network models, BERT combines strong feature extraction (the Transformer encoder) with deep contextual representation, something that neither OpenAI GPT nor ELMo achieves on its own. To avoid the problem that a naive bidirectional encoder would let each word indirectly "see itself" during encoding, BERT adopts a masked language model, blending the ideas of these earlier models in just the right way.
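A minimal illustration of the masked-LM objective, assuming the `transformers` `fill-mask` pipeline and the `bert-base-uncased` checkpoint (the example sentence is hypothetical): because the target token is replaced by `[MASK]`, the model must predict it from the surrounding context on both sides without ever seeing the token itself.

```python
from transformers import pipeline

fill_mask = pipeline("fill-mask", model="bert-base-uncased")

# BERT uses both the left context ("The capital of France is") and the
# right context (".") when scoring candidates for the masked position.
for candidate in fill_mask("The capital of France is [MASK]."):
    print(f"{candidate['token_str']:>10}  score={candidate['score']:.3f}")
```

For reference, the abstract of the original BERT paper follows.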
We introduce a new language representation model called BERT, which stands for Bidirectional Encoder Representations from Transformers. Unlike recent language representation models, BERT is designed to pre-train deep bidirectional representations by jointly conditioning on both left and right context in all layers. As a result, the pre-trained BERT representations can be fine-tuned with just one additional output layer to create state-of-the-art models for a wide range of tasks, such as question answering and language inference, without substantial task-specific architecture modifications. BERT is conceptually simple and empirically powerful. It obtains new state-of-the-art results on eleven natural language processing tasks, including pushing the GLUE benchmark to 80.4% (7.6% absolute improvement), MultiNLI accuracy to 86.7 (5.6% absolute improvement) and the SQuAD v1.1 question answering Test F1 to 93.2 (1.5% absolute improvement), outperforming human performance by 2.0%.
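As a rough sketch of the "one additional output layer" recipe described in the abstract (assuming `transformers` and PyTorch; the sentences, labels, and hyperparameters below are illustrative only):

```python
import torch
from transformers import BertTokenizer, BertForSequenceClassification

tokenizer = BertTokenizer.from_pretrained("bert-base-uncased")
# Pre-trained BERT encoder plus a single randomly initialized
# classification layer on top of the [CLS] representation.
model = BertForSequenceClassification.from_pretrained(
    "bert-base-uncased", num_labels=2
)
optimizer = torch.optim.AdamW(model.parameters(), lr=2e-5)

# Toy labelled batch (hypothetical sentiment labels: 1 = positive, 0 = negative).
batch = tokenizer(
    ["a genuinely moving film", "a tedious, overlong mess"],
    padding=True, return_tensors="pt",
)
labels = torch.tensor([1, 0])

# One fine-tuning step: all parameters, pre-trained and new, are updated.
outputs = model(**batch, labels=labels)
outputs.loss.backward()
optimizer.step()
```

Because the encoder starts from a good initialization, fine-tuning typically needs only a few epochs at a small learning rate, which is exactly the point made in the summary at the top of this post.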