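Basic NLP Technology Tools: Stanfordcorenlp
Introduction to Stanfordcorenlp
- Stanford CoreNLP provides a suite of human language technology tools and supports many basic natural language processing tasks. Stanfordcorenlp is a Python interface to it.
- Official site: Stanford CoreNLP - Natural language software
- GitHub: stanfordnlp/CoreNLP
- Stanfordcorenlp's main functions include tokenization, part-of-speech tagging, named entity recognition, constituency parsing, and dependency parsing.
Stanfordcorenlp Demo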
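Installation: pip install stanfordcorenlp
Next, download the CoreNLP package and models from https://nlp.stanford.edu/software/corenlp-backup-download.html. The wrapper starts a local CoreNLP server, so Java (1.8 or later) must be installed. For Chinese, also place the Chinese models jar in the unzipped CoreNLP folder.
Multiple languages are supported. The examples below cover Chinese and English.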
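from stanfordcorenlp import StanfordCoreNLP
# Path to the unzipped CoreNLP package; adjust it to your download location
zh_model = StanfordCoreNLP(r'stanford-corenlp-full-2018-02-27', lang='zh')
en_model = StanfordCoreNLP(r'stanford-corenlp-full-2018-02-27', lang='en')
zh_sentence = '我愛自然語言處理技術!'  # "I love natural language processing technology!"
en_sentence = 'I love natural language processing technology!'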
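1. Tokenization
print('Tokenize:', zh_model.word_tokenize(zh_sentence))
print('Tokenize:', en_model.word_tokenize(en_sentence))
Tokenize: ['我愛', '自然', '語言', '處理', '技術', '!']
Tokenize: ['I', 'love', 'natural', 'language', 'processing', 'technology', '!']
Note that the Chinese model merges 我愛 ("I love") into a single token; this segmentation error carries through to the later Chinese results.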
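As called above, word_tokenize returns only the token strings, not their positions. When character offsets into the original sentence are needed (for example, to highlight tokens in the source text), one option is to align the tokens back by scanning left to right. The helper below is a hypothetical sketch, not part of the stanfordcorenlp API. It assumes every token appears verbatim in the input text, which holds for the two outputs above; if a token was altered, it raises ValueError.

```python
def align_tokens(sentence, tokens):
    """Return (start, end) character offsets of each token, scanning left to right.

    Hypothetical helper: assumes every token occurs verbatim in `sentence`
    and raises ValueError otherwise.
    """
    spans = []
    pos = 0
    for tok in tokens:
        start = sentence.find(tok, pos)
        if start == -1:
            raise ValueError(f'token {tok!r} not found after offset {pos}')
        spans.append((start, start + len(tok)))
        pos = start + len(tok)  # continue searching after this token
    return spans


# Inputs copied from the tokenization example above
en_sentence = 'I love natural language processing technology!'
en_tokens = ['I', 'love', 'natural', 'language', 'processing', 'technology', '!']
print(align_tokens(en_sentence, en_tokens))
# [(0, 1), (2, 6), (7, 14), (15, 23), (24, 34), (35, 45), (45, 46)]
```

Searching from the end of the previous match keeps repeated words (such as two occurrences of "the") aligned to the correct positions.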
2. Part-of-Speech Tagging
print('Part of Speech:', zh_model.pos_tag(zh_sentence))
print('Part of Speech:', en_model.pos_tag(en_sentence))
Part of Speech: [('我愛', 'NN'), ('自然', 'AD'), ('語言', 'NN'), ('處理', 'VV'), ('技術', 'NN'), ('!', 'PU')]
Part of Speech: [('I', 'PRP'), ('love', 'VBP'), ('natural', 'JJ'), ('language', 'NN'), ('processing', 'NN'), ('technology', 'NN'), ('!', '.')]
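The result of pos_tag is a list of (word, tag) pairs, so filtering by tag takes only a list comprehension. The sketch below (the helper names are hypothetical) groups words by tag and keeps the nouns. It relies on noun tags beginning with N in both tagsets seen above: NN, NNS, NNP and NNPS in the English Penn Treebank set, and NN, NR and NT in the Chinese Treebank set.

```python
from collections import defaultdict


def group_by_tag(tagged):
    """Group words by POS tag, preserving their order of appearance."""
    groups = defaultdict(list)
    for word, tag in tagged:
        groups[tag].append(word)
    return dict(groups)


def nouns(tagged):
    """Keep words whose tag starts with 'N' (PTB: NN/NNS/NNP/NNPS; CTB: NN/NR/NT)."""
    return [word for word, tag in tagged if tag.startswith('N')]


# Outputs copied from the pos_tag examples above
en_tagged = [('I', 'PRP'), ('love', 'VBP'), ('natural', 'JJ'), ('language', 'NN'),
             ('processing', 'NN'), ('technology', 'NN'), ('!', '.')]
zh_tagged = [('我愛', 'NN'), ('自然', 'AD'), ('語言', 'NN'), ('處理', 'VV'),
             ('技術', 'NN'), ('!', 'PU')]

print(nouns(en_tagged))  # ['language', 'processing', 'technology']
print(nouns(zh_tagged))  # ['我愛', '語言', '技術']
print(group_by_tag(zh_tagged))
```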
3. Named Entity Recognition
print('Named Entities:', zh_model.ner(zh_sentence))
print('Named Entities:', en_model.ner(en_sentence))
Named Entities: [('我愛', 'O'), ('自然', 'O'), ('語言', 'O'), ('處理', 'O'), ('技術', 'O'), ('!', 'O')]
Named Entities: [('I', 'O'), ('love', 'O'), ('natural', 'O'), ('language', 'O'), ('processing', 'O'), ('technology', 'O'), ('!', 'O')]
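Like pos_tag, ner returns one (word, tag) pair per token, with O marking tokens outside any entity; neither example sentence contains an entity, so every tag is O. To turn token-level tags into entity mentions, a common approach is to merge runs of consecutive tokens that share the same non-O tag. The sketch below assumes that convention, and the second input in the usage lines is hypothetical, not real CoreNLP output. Because these tags carry no B-/I- prefixes, two adjacent entities of the same type will be merged into one mention.

```python
def merge_entities(tagged, sep=' '):
    """Merge runs of consecutive tokens with the same non-'O' tag into (text, tag) mentions.

    Use sep='' for Chinese, where tokens are not separated by spaces.
    """
    entities = []
    words, current = [], None
    for word, tag in tagged:
        if tag != 'O' and tag == current:
            words.append(word)  # continue the current mention
            continue
        if current is not None:
            entities.append((sep.join(words), current))  # close the previous mention
        words, current = ([], None) if tag == 'O' else ([word], tag)
    if current is not None:
        entities.append((sep.join(words), current))  # close a mention at the end
    return entities


# Real output from above: every tag is 'O', so no mentions are found
en_ner = [('I', 'O'), ('love', 'O'), ('natural', 'O'), ('language', 'O'),
          ('processing', 'O'), ('technology', 'O'), ('!', 'O')]
print(merge_entities(en_ner))  # []

# Hypothetical tagged input, for illustration only
sample = [('Barack', 'PERSON'), ('Obama', 'PERSON'), ('visited', 'O'), ('Paris', 'LOCATION')]
print(merge_entities(sample))  # [('Barack Obama', 'PERSON'), ('Paris', 'LOCATION')]
```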
4. Constituency Parsing
print('Constituency Parsing:', zh_model.parse(zh_sentence) + '\n')
print('Constituency Parsing:', en_model.parse(en_sentence))
Constituency Parsing: (ROOT
  (IP
    (IP
      (NP (NN 我愛))
      (ADVP (AD 自然))
      (NP (NN 語言))
      (VP (VV 處理)
        (NP (NN 技術))))
    (PU !)))
Constituency Parsing: (ROOT
  (S
    (NP (PRP I))
    (VP (VBP love)
      (NP (JJ natural) (NN language) (NN processing) (NN technology)))
    (. !)))
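parse returns the tree as a bracketed string. If NLTK is installed, nltk.Tree.fromstring can read it directly; otherwise a small recursive-descent reader is enough. The following dependency-free sketch (all function names are hypothetical) turns the string into nested (label, children) tuples, then collects the leaf words and the text of every phrase with a given label, such as NP.

```python
import re


def parse_tree(text):
    """Parse a bracketed tree string into nested (label, children) tuples.

    Leaves (words) are plain strings inside a children list.
    """
    tokens = re.findall(r'\(|\)|[^\s()]+', text)
    pos = 0

    def node():
        nonlocal pos
        if tokens[pos] != '(':
            raise ValueError(f"expected '(' at token {pos}, got {tokens[pos]!r}")
        label = tokens[pos + 1]
        pos += 2
        children = []
        while tokens[pos] != ')':
            if tokens[pos] == '(':
                children.append(node())  # nested constituent
            else:
                children.append(tokens[pos])  # leaf word
                pos += 1
        pos += 1  # consume the closing ')'
        return (label, children)

    tree = node()
    if pos != len(tokens):
        raise ValueError('unexpected tokens after the tree')
    return tree


def leaves(tree):
    """Return the words of a (sub)tree from left to right."""
    _, children = tree
    words = []
    for child in children:
        words.extend([child] if isinstance(child, str) else leaves(child))
    return words


def phrases(tree, wanted, sep=' '):
    """Return the text of every constituent labelled `wanted`, in pre-order."""
    label, children = tree
    found = [sep.join(leaves(tree))] if label == wanted else []
    for child in children:
        if not isinstance(child, str):
            found.extend(phrases(child, wanted, sep))
    return found


# English tree copied from the parse output above
en_tree = parse_tree("""(ROOT
  (S
    (NP (PRP I))
    (VP (VBP love)
      (NP (JJ natural) (NN language) (NN processing) (NN technology)))
    (. !)))""")

print(leaves(en_tree))         # ['I', 'love', 'natural', 'language', 'processing', 'technology', '!']
print(phrases(en_tree, 'NP'))  # ['I', 'natural language processing technology']
```

For the Chinese tree, pass sep='' to phrases so the joined words are not separated by spaces.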
5. Dependency Parsing
print('Dependency:', zh_model.dependency_parse(zh_sentence))
print('Dependency:', en_model.dependency_parse(en_sentence))
Dependency: [('ROOT', 0, 4), ('nsubj', 4, 1), ('advmod', 4, 2), ('nsubj', 4, 3), ('dobj', 4, 5), ('punct', 4, 6)]
Dependency: [('ROOT', 0, 2), ('nsubj', 2, 1), ('amod', 6, 3), ('compound', 6, 4), ('compound', 6, 5), ('dobj', 2, 6), ('punct', 2, 7)]
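Each tuple from dependency_parse has the form (relation, head index, dependent index). Indices are 1-based positions in the token list, and head 0 stands for the artificial root; for example, ('nsubj', 2, 1) says token 1 (I) is the nominal subject of token 2 (love). The sketch below, a hypothetical helper, maps the indices back to words using the tokens from word_tokenize. It assumes single-sentence input, since the indices restart for each sentence.

```python
def readable_dependencies(triples, tokens):
    """Map (relation, head, dependent) index triples to (head_word, relation, dependent_word).

    Indices are 1-based token positions; head 0 is the artificial root, shown here as 'ROOT'.
    Assumes a single sentence, because the indices restart for each sentence.
    """
    words = ['ROOT'] + list(tokens)  # words[i] is token i; words[0] is the root placeholder
    return [(words[head], rel, words[dep]) for rel, head, dep in triples]


# Outputs copied from the English word_tokenize and dependency_parse examples above
en_tokens = ['I', 'love', 'natural', 'language', 'processing', 'technology', '!']
en_deps = [('ROOT', 0, 2), ('nsubj', 2, 1), ('amod', 6, 3), ('compound', 6, 4),
           ('compound', 6, 5), ('dobj', 2, 6), ('punct', 2, 7)]

for head, rel, dep in readable_dependencies(en_deps, en_tokens):
    print(f'{rel}({head}, {dep})')
# ROOT(ROOT, love)
# nsubj(love, I)
# amod(technology, natural)
# compound(technology, language)
# compound(technology, processing)
# dobj(love, technology)
# punct(love, !)
```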
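When finished, call zh_model.close() and en_model.close() to shut down the background CoreNLP servers, which use a lot of memory.
The code has also been uploaded to GitHub: https://github.com/yuquanle/StudyForNLP/blob/master/NLPtools/StanfordcorenlpDemo.ipynb
WeChat Official Account: StudyForAI (introductory AI learning for beginners)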