
Explainer | From seq2seq to attention, covering every aspect

雪花臺灣 2019-03-18 16:40

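This article is a close reading of the Transformer, the model Google proposed in 2017 in "Attention Is All You Need". I also read "Neural Machine Translation by Jointly Learning to Align and Translate", which describes the seq2seq model and the use of the attention mechanism within it in detail. I have collected the material and my notes here; readers are welcome to discuss.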

"Attention Is All You Need" paper: https://arxiv.org/pdf/1706.03762.pdf

"Neural Machine Translation by Jointly Learning to Align and Translate" paper: https://arxiv.org/pdf/1409.0473.pdf

Contents

1. Motivating example

2. Encoder-Decoder framework

2.1 Internal structure of the seq2seq framework

2.2 Internal connections of the encoder

2.3 Internal connections of the decoder

2.4 Limitations

2.5 Solution

3. "Neural Machine Translation by Jointly Learning to Align and Translate"

3.1 Encoder

3.2 Decoder

4. "Attention Is All You Need"

4.1 Transformer

4.1.1 Encoder

4.1.2 Decoder

4.1.3 Residual networks

4.2.1 Self-attention

4.2.2 Multi-head attention

4.3 Position-wise feed-forward network

4.4 Position embeddings

4.5 How attention is computed

4.6 How the degree of match is compared

5. Summary

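1. Motivating example

As the name suggests, attention models clearly borrow from the human attention mechanism, so we begin with a brief introduction to selective attention in human vision.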
