Text Sentiment Analysis

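Anyone who reads an article from start to finish can tell which way the author's sentiment leans. But what if I have not read the article and still want to find out quickly where its sentiment lies?

This post shows how to use R to analyze the sentiment of a text when you do not already know it.

Approach: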

[Figure: flowchart of the analysis steps]

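That is the complete workflow: segment the text into words, match the words against sentiment dictionaries, and judge the sentiment of the article from the matches.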
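The dictionary-matching idea can be sketched in a few lines of base R. The word lists and the sentences here are made up for illustration (and kept in English for readability); they are not the dictionaries used in this post, and the sentences are assumed to be split into words already.

```r
# Hypothetical mini-dictionaries (illustrative only).
pos_words <- c("love", "beautiful", "happy")
neg_words <- c("hate", "crowded", "disappointing")

# Toy sentences, assumed to be segmented into words already.
sentences <- list(
  c("i", "love", "this", "beautiful", "city"),
  c("the", "subway", "is", "crowded", "and", "disappointing"),
  c("we", "visited", "the", "museum")
)

# Count dictionary hits in one sentence: positives minus negatives.
score_sentence <- function(words, pos, neg) {
  sum(words %in% pos) - sum(words %in% neg)
}

scores <- vapply(sentences, score_sentence, integer(1),
                 pos = pos_words, neg = neg_words)
print(scores)  # the three scores are 2, -2 and 0
```

A sentence with no dictionary hits scores zero, so how ties are labelled is a design choice that has to be made explicitly.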

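Note: the word segmentation step can run in one of three modes.

1. Precise mode: tries to split the sentence as accurately as possible; suited to text analysis.

2. Full mode: scans out every string in the sentence that can form a word; very fast, but it cannot resolve ambiguity.

3. Search-engine mode: builds on precise mode and splits long words again to improve recall; suited to search engines.

This post uses the first mode.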
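The three modes can be compared side by side with jiebaR. This is only a sketch, and the mapping is an assumption the post does not state: precise mode is taken to be the default `mix` worker, full mode the `full` worker, and search-engine mode the `query` worker. The `full` type may not be available in older jiebaR releases. The sample sentence is arbitrary.

```r
# Assumed mapping from the mode names in the text to jiebaR worker types.
mode_types <- c(precise = "mix", full = "full", search = "query")

# Run only when jiebaR is installed, so the snippet is safe to source anywhere.
if (requireNamespace("jiebaR", quietly = TRUE)) {
  sample_sentence <- "我来到北京清华大学"
  for (m in names(mode_types)) {
    engine <- jiebaR::worker(type = mode_types[[m]])
    words <- jiebaR::segment(sample_sentence, engine)
    cat(m, ":", paste(words, collapse = " / "), "\n")
  }
}
```

Comparing the three outputs on a sentence of your own is a quick way to decide which mode fits the text you want to score.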

Without further ado, here is the code:

# Packages for segmentation, word clouds and plotting
install.packages("jiebaR")
install.packages("wordcloud2")
install.packages("dplyr")
install.packages("plyr")
install.packages("ggplot2")
library(jiebaR)
library(wordcloud2)
library("plyr")   # load plyr before dplyr so that it does not mask dplyr functions
library("dplyr")
library(ggplot2)
setwd("C:/Users/Administrator/Desktop/word")
text1 <- readLines("./data/南京.txt", encoding = "UTF-8")  # import the raw text

[Figure: excerpt of the source text]

# Load the positive and negative sentiment dictionaries and the stop words
pos <- readLines("C:/Users/Administrator/Desktop/word/data/p.txt")
neg <- readLines("C:/Users/Administrator/Desktop/word/data/n.txt")
stopwords <- readLines("C:/Users/Administrator/Desktop/word/data/stop_words_zh.txt", encoding = "UTF-8")  # the encoding name must be a quoted string

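[Figure: excerpt of the positive sentiment words]

[Figure: excerpt of the negative sentiment words]

[Figure: excerpt of the stop words]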

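# Clean the text
text2 <- gsub(pattern = " ", replacement = "", text1)  # gsub() substitutes characters; here it removes spaces
# Each later step works on text2, so the earlier substitutions are not thrown away
text2 <- gsub(",", "，", text2)  # English commas caused errors, so replace them with the full-width "，"
text2 <- gsub("~|'", "", text2)  # remove tildes (~) and English single quotes ('); "|" means "or"
text2 <- gsub("\"", "", text2)  # remove English double quotes; inside an R string the quote is escaped with a backslash
# Remove unwanted content
sentence <- as.vector(text2)  # convert the text to the vector sentence
sentence <- gsub("[[:digit:]]*", "", sentence)  # remove digits
sentence <- gsub("[a-zA-Z]", "", sentence)  # remove English letters
sentence <- gsub("\\.", "", sentence)  # remove English periods; the dot must be escaped as "\\."
sentence <- sentence[!is.na(sentence)]  # drop NA entries (sentence must already exist)
sentence <- sentence[!(nchar(sentence) < 2)]  # nchar() counts characters; "!" is R's logical NOT, so this keeps entries of 2+ characters
text <- sentence
# Merge the sentiment dictionaries
mydict <- c(pos, neg)
# Segment with jiebaR
engine <- worker()  # create the segmentation engine
# Add the sentiment words as user-defined words
new_user_word(engine, mydict)
# Segment each entry; lapply() always returns a list, one word vector per entry
segwords <- lapply(text, segment, engine)
head(segwords)
# Custom function that removes stop words
removewords <- function(target_words, stop_words) {
  target_words <- target_words[!target_words %in% stop_words]
  return(target_words)
}
segwords2 <- lapply(segwords, removewords, stopwords)
head(segwords2)
# Custom function that scores the sentiment of each entry
fun <- function(x, y) x %in% y
getEmotionalType <- function(x, pwords, nwords) {
  pos.weight <- sapply(llply(x, fun, pwords), sum)  # number of positive words per entry
  neg.weight <- sapply(llply(x, fun, nwords), sum)  # number of negative words per entry
  total <- pos.weight - neg.weight
  return(data.frame(pos.weight, neg.weight, total))
}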
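# Compute the positive and negative score of each entry
score <- getEmotionalType(segwords2, pos, neg)
head(score)
nrow(score)
length(text)  # should equal nrow(score); text is a character vector, so use length() rather than nrow()
# Compute the total: a total of zero or more is labelled positive, a negative total is labelled negative
TEXT.score <- cbind(text, score)  # use the cleaned text, which has one element per scored entry
TEXT.score <- transform(TEXT.score,
  emotion = ifelse(total >= 0, "Pos", "Neg"))  # transform() adds a column to the data frame; the labels must be quoted
class(TEXT.score)
# Visualize the sentiment balance of the text
ggplot(group_by(TEXT.score, emotion), aes(x = emotion, fill = emotion)) + geom_bar(width = 0.5) + ggtitle("Sentiment comparison")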

The resulting chart:

[Figure: bar chart comparing the number of positive and negative entries]
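The cleaning and stop-word steps can also be tried without the original files. The sketch below uses invented input lines and a two-word stop list, and the helper names `clean_text` and `remove_stopwords` are hypothetical. It chains the substitutions so that each one builds on the previous result.

```r
# Invented input: stray spaces, a tilde, digits, quotes, Latin letters,
# one too-short line and one missing value.
raw <- c("南京 很美~ 2023", "好", NA, "城牆\"很\"壯觀,abc.")

# Hypothetical helper: apply the cleaning substitutions one after another.
clean_text <- function(x) {
  x <- gsub(" ", "", x, fixed = TRUE)    # remove spaces
  x <- gsub(",", "，", x, fixed = TRUE)  # English comma -> full-width comma
  x <- gsub("~|'", "", x)                # remove tildes and single quotes
  x <- gsub("\"", "", x, fixed = TRUE)   # remove double quotes
  x <- gsub("[[:digit:]]", "", x)        # remove digits
  x <- gsub("[a-zA-Z]", "", x)           # remove Latin letters
  x <- gsub(".", "", x, fixed = TRUE)    # remove literal periods
  x <- x[!is.na(x)]                      # drop missing values
  x[nchar(x) >= 2]                       # keep entries with at least 2 characters
}

cleaned <- clean_text(raw)
print(cleaned)

# Hypothetical helper: drop stop words from one segmented sentence.
remove_stopwords <- function(words, stop_words) {
  words[!words %in% stop_words]
}

tokens <- c("南京", "的", "城牆", "很", "壯觀")  # assumed segmentation
kept <- remove_stopwords(tokens, c("的", "很"))
print(kept)
```

Because the cleaning step drops lines, the cleaned vector can be shorter than the input. Any later step that pairs text with scores should therefore use the cleaned vector rather than the raw one.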