Road 2 NLP: Word Embedding (FastText)

1. References

Paper 1: FastText word vectors, "Enriching Word Vectors with Subword Information", by Bojanowski et al. (including Mikolov), FAIR (Facebook AI Research)
Paper 2: FastText text classification model, "Bag of Tricks for Efficient Text Classification", by Joulin et al. (including Mikolov), FAIR (Facebook AI Research)
Blog post: 《fastText 源碼分析》 ("fastText source code analysis")
FastText source code (C++)

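These are my main references. There are other blog posts out there, but many of them simply copy one another, and most explain the FastText text classification model rather than FastText word vectors (the former is built on the latter). The principle behind the text classification model is simple and easy to follow, but the explanations of how the word vectors are trained are vague in places. So what follows is only my personal understanding based on the references above. (I spent a long time reading the source code and still feel my core question has not been resolved.)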

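FastText actually refers to two things:

FastText word vectors (PS: like Word2vec and GloVe, FastText word vectors are static word vectors), corresponding to paper 1
FastText text classification model, corresponding to paper 2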
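
To make the term "static word vector" concrete, here is a minimal sketch in Python. The table `EMBEDDINGS`, its values, and the helper `embed` are all made up for illustration; they are not trained FastText vectors and not part of the fastText API. The only point is that a static embedding is a fixed lookup: a word gets the same vector no matter which sentence it appears in.

```python
# Toy embedding table with made-up values (hypothetical, not trained vectors).
EMBEDDINGS = {
    "river": [0.1, 0.9],
    "bank": [0.5, 0.5],
    "money": [0.9, 0.1],
}


def embed(sentence):
    """Return (word, vector) pairs for the known words in a sentence.

    The lookup depends only on the word itself; the surrounding
    words are never consulted, which is what "static" means here.
    """
    return [(w, EMBEDDINGS[w]) for w in sentence.split() if w in EMBEDDINGS]


in_river_context = dict(embed("river bank"))
in_money_context = dict(embed("money bank"))

# "bank" maps to the same vector in both sentences.
print(in_river_context["bank"] == in_money_context["bank"])
```

A contextual model would instead produce different vectors for "bank" in these two sentences.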

Although the topic of this series is word embeddings, FastText is a special case, so I cover the FastText text classification model here as well.

FastText's biggest advantage: speed.

From the abstract of "Bag of Tricks for Efficient Text Classification" (paper 2):

"We can train fastText on more than one billion words in less than ten minutes using a standard multicore CPU, and classify half a million sentences among 312K classes in less than a minute."