A Survey of Papers on Chinese Word Embeddings

Introduction

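I read a number of papers on Chinese word embeddings a while ago, but for lack of time I only finished organizing my notes in the past few days. This roundup covers only a handful of the main papers published from 2015 onward and is not exhaustive; there are also good papers from before 2015 that are worth reading. I wrote four survey posts in total, each covering two or three papers, and the links contain the details.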

Update: added the most recent papers on Chinese word embeddings.

Papers

Component-Enhanced Chinese Character Embeddings

Published in 2015 at EMNLP (Empirical Methods in Natural Language Processing). Author: Yanran Li (李嫣然), The Hong Kong Polytechnic University.

Joint Learning of Character and Word Embeddings

Published in 2015 at IJCAI (International Joint Conference on Artificial Intelligence). Authors: Xinxiong Chen (陳新雄) and Lei Xu (徐磊), Tsinghua University.

Improve Chinese Word Embeddings by Exploiting Internal Structure

Published in 2016 at NAACL-HLT (Annual Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies). Author: Jian Xu, University of Science and Technology of China.

Multi-Granularity Chinese Word Embedding

Published in 2016 at EMNLP (Empirical Methods in Natural Language Processing). Author: Rongchao Yin (殷榮超), National Engineering Laboratory for Information Content Security Technology.

Learning Chinese Word Representations From Glyphs Of Characters

Published in 2017 at EMNLP (Empirical Methods in Natural Language Processing). Authors: Tzu-Ray Su and Hung-Yi Lee, National Taiwan University.

Joint Embeddings of Chinese Words, Characters, and Fine-grained Subcharacter Components

Published in 2017 at EMNLP (Empirical Methods in Natural Language Processing). Author: Jinxing Yu, The Hong Kong University of Science and Technology.

Enriching Word Vectors with Subword Information

Published in 2017 in TACL (Transactions of the Association for Computational Linguistics). Authors: Piotr Bojanowski and Edouard Grave, Facebook AI Research.

cw2vec: Learning Chinese Word Embeddings with Stroke n-gram Information

Published in 2018 at AAAI 2018 (Association for the Advancement of Artificial Intelligence). Author: Shaosheng Cao (曹紹升), Artificial Intelligence Department, Ant Financial.

Radical Enhanced Chinese Word Embedding

Published in 2018 at CCL 2018 (The Seventeenth China National Conference on Computational Linguistics). Authors: Zheng Chen and Keqi Hu, University of Electronic Science and Technology of China.

Glyce: Glyph-vectors for Chinese Character Representations

In 2019, Shannon.AI (香儂科技) proposed Glyce, a glyph-vector representation for Chinese characters. Drawing on the historical evolution of Chinese characters, it uses several historical and contemporary scripts along with multiple writing styles, and it introduces a Tianzige-CNN (田字格 CNN) architecture designed specifically for modeling Chinese logographic characters. Glyce reports strong performance on 13 tasks.
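"Tianzige" (田字格) is the 2x2 practice grid used when learning to write Chinese characters. As a rough illustration of that idea only, and not the Glyce architecture itself, the sketch below max-pools a small glyph image into a 2x2 grid, once per script or writing style. The function names `tianzige_pool` and `pool_scripts`, the 4x4 image size, and the pixel values are all made up for this example.

```python
from typing import List

Image = List[List[float]]  # one grayscale glyph image as rows of pixel values


def tianzige_pool(image: Image) -> List[List[float]]:
    """Max-pool a square glyph image into a 2x2 grid (one value per quadrant)."""
    size = len(image)
    if size == 0 or size % 2 != 0 or any(len(row) != size for row in image):
        raise ValueError("expected a non-empty square image with even side length")
    half = size // 2
    grid = []
    for top in (0, half):
        grid_row = []
        for left in (0, half):
            # Collect every pixel in this quadrant and keep the strongest one.
            quadrant = [image[r][c]
                        for r in range(top, top + half)
                        for c in range(left, left + half)]
            grid_row.append(max(quadrant))
        grid.append(grid_row)
    return grid


def pool_scripts(images: List[Image]) -> List[List[List[float]]]:
    """Pool each script/style rendering of the same character separately."""
    return [tianzige_pool(img) for img in images]


# Two hypothetical 4x4 renderings of one character (e.g. two script styles).
style_a = [
    [0.0, 0.1, 0.9, 0.0],
    [0.2, 0.0, 0.0, 0.3],
    [0.0, 0.5, 0.0, 0.0],
    [0.0, 0.0, 0.0, 0.7],
]
style_b = [
    [1.0, 0.0, 0.0, 0.0],
    [0.0, 0.0, 0.0, 0.4],
    [0.0, 0.0, 0.6, 0.0],
    [0.8, 0.0, 0.0, 0.0],
]
pooled = pool_scripts([style_a, style_b])
print(pooled)
```

A real model would learn convolutional filters before and after the pooling step; here the pooling alone shows how each rendering is reduced to four quadrant features that can then be combined across scripts.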
