Sentiment analysis in Python with sentiment dictionaries

Published: 2023/12/9 | python | 豆豆

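Our teacher recently assigned a course project: run sentiment analysis on a Weibo corpus using sentiment dictionaries. After a lot of searching online and reading the relevant books, I finally got it working. These notes summarise the project and may help anyone with the same need.

1. What sentiment analysis means

Sentiment analysis means extracting opinions, analysing topics, and mining sentiment from text such as news reports, product reviews, and film reviews. Typical uses are judging whether a news report is positive or negative, scoring the sentiment of Taobao product reviews, analysing stock commentary, and mining sentiment from film reviews. It covers: who holds the sentiment or attitude; the type of attitude (either one of a set of types such as like, hate, value, and desire, or a simple weighted polarity such as positive, negative, and neutral, optionally qualified by a numeric weight); and the scope of the attitude (a single sentence, a paragraph, or the whole text). The goals of sentiment analysis can therefore be graded as follows. Basic: decide whether the overall sentiment of a text is positive or negative. Intermediate: rate the attitude of a text on a scale of 1 to 5. Advanced: detect the target, holder, and type of the attitude.

In short, sentiment analysis mines the sentiment orientation of text.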

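2. Sentiment mining methods

A widely used approach to sentiment mining is dictionary-based: match the words of a text against a sentiment dictionary, aggregate the matched sentiment words into a score, and read the sentiment orientation of the text off that score. I used two methods in this project. The first is based on the BosonNLP sentiment dictionary, a pre-annotated sentiment dictionary released by the natural language processing company BosonNLP. Every sentiment word in the dictionary carries a sentiment score.

[Figure: excerpt from the BosonNLP sentiment dictionary]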
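To make the dictionary format concrete, here is a minimal sketch of loading such a file into a Python dict. It assumes one `word score` pair per line separated by a single space, which is what the `sep=" "` argument in the script in section 3.2 implies. The function name `parse_lexicon` and the sample entries with their scores are invented for illustration and are not taken from the real BosonNLP file.

```python
from typing import Dict, Iterable


def parse_lexicon(lines: Iterable[str]) -> Dict[str, float]:
    """Parse 'word score' lines into a {word: score} mapping."""
    lexicon: Dict[str, float] = {}
    for line in lines:
        line = line.strip()
        if not line:
            continue  # skip blank lines
        # Split on the last space so the score is always the final field.
        word, _, score = line.rpartition(" ")
        if not word:
            continue  # no separator found: malformed line
        try:
            lexicon[word] = float(score)
        except ValueError:
            continue  # the score field is not a number
    return lexicon


# Invented sample entries, only to show the assumed format.
sample = ["开心 2.5", "难过 -1.8", "", "bad-line"]
lexicon = parse_lexicon(sample)
print(lexicon)
```

For a real file, the same function can be fed an open file handle, for example `parse_lexicon(open(path, encoding="utf-8"))`, since it only iterates over lines.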

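The second method uses the sentiment dictionary released by HowNet (知网) together with polarity tables. The HowNet release contains 12 files in total, split between English and Chinese. The Chinese files cover evaluation words, sentiment words, opinion words, and degree words, with positive and negative lists. In this article the evaluation words and sentiment words are merged into a single sentiment dictionary. The degree words are divided by level into six degree dictionaries: most (highest), very (很, 非常), more (更多, 更), ish (稍, 一点点), insufficiently (欠, 不), and over (过多, 多分, 多). Note that the script in section 3.4 loads a negation list (inverse.txt) as its sixth list instead of over.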
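One way to hold the degree levels in code is a flat mapping from each degree word to a multiplier. The sketch below is only an illustration: the multipliers are an assumption borrowed from the weight notes in the comments of the script in section 3.4 (2, 1.75, 1.5, 1.25, 0.25), the "over" level is left out because no weight is given for it, and the word lists are short placeholders rather than the contents of the HowNet files. The names `LEVEL_WEIGHTS` and `build_degree_index` are hypothetical.

```python
from typing import Dict, List

# Assumed multipliers per level (taken from the weight notes in section 3.4).
LEVEL_WEIGHTS: Dict[str, float] = {
    "most": 2.0,
    "very": 1.75,
    "more": 1.5,
    "ish": 1.25,
    "insufficiently": 0.25,
}


def build_degree_index(level_words: Dict[str, List[str]]) -> Dict[str, float]:
    """Flatten {level: [words]} into {word: multiplier} for fast lookup."""
    index: Dict[str, float] = {}
    for level, words in level_words.items():
        weight = LEVEL_WEIGHTS[level]
        for word in words:
            # If a word appears in several lists, keep the first level seen.
            index.setdefault(word, weight)
    return index


# Placeholder word lists (invented, not the HowNet file contents).
level_words = {
    "most": ["最", "极其"],
    "very": ["很", "非常"],
    "more": ["更", "较"],
    "ish": ["稍", "有点"],
    "insufficiently": ["欠", "不大"],
}
degree_index = build_degree_index(level_words)

# Words that are not degree adverbs fall back to a neutral multiplier of 1.0.
print(degree_index.get("非常", 1.0))
print(degree_index.get("桌子", 1.0))
```

A flat index like this replaces six separate membership tests with a single dictionary lookup.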

HowNet sentiment dictionary download: http://www.keenage.com/html/c_bulletin_2007.htm

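3. How it works

3.1 How the BosonNLP-based analysis works

Sentiment analysis with the BosonNLP dictionary is fairly simple. First, split the text into sentences and segment it into words; this article uses HIT's pyltp as its segmentation tool (the script in section 3.2 segments with jieba instead). Next, match the segmented tokens one by one against the BosonNLP dictionary and record the score of each sentiment word found. Finally, sum the scores: a total greater than 0 indicates a positive orientation, and a total less than 0 a negative one.

[Figure: block diagram of the BosonNLP-based pipeline]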
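The match-and-sum step described above can be sketched without any segmentation library by starting from tokens that are already segmented. This is a minimal illustration under stated assumptions: the toy lexicon and its scores are invented, the helper names `lexicon_score` and `polarity` are hypothetical, and treating a total of exactly 0 as neutral is my own choice, since the description only covers totals above and below 0.

```python
from typing import Dict, List


def lexicon_score(tokens: List[str], lexicon: Dict[str, float]) -> float:
    """Sum the scores of all tokens that appear in the lexicon."""
    return sum(lexicon[t] for t in tokens if t in lexicon)


def polarity(score: float) -> str:
    """Map a summed score to a label."""
    if score > 0:
        return "positive"
    if score < 0:
        return "negative"
    return "neutral"  # assumption: the text does not say how to label 0


# Invented toy lexicon and a pre-segmented sentence.
toy_lexicon = {"喜欢": 1.5, "讨厌": -2.0, "不错": 1.0}
tokens = ["我", "喜欢", "这", "部", "电影", "，", "不错"]

score = lexicon_score(tokens, toy_lexicon)
print(score, polarity(score))
```

Holding the lexicon in a dict keeps each lookup constant-time, which matters when the dictionary has many thousands of entries.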

3.2 Code for the BosonNLP-based analysis:

# -*- coding: utf-8 -*-
import jieba
import pandas as pd

# Load the BosonNLP dictionary once as a {word: score} mapping
df = pd.read_table(r"BosonNLP_dict\BosonNLP_sentiment_score.txt",
                   sep=" ", names=['key', 'score'])
boson_dict = dict(zip(df['key'], df['score']))


# Compute a sentiment score from the BosonNLP sentiment dictionary
def getscore(text):
    # Segment with jieba; lcut returns a list
    segs = jieba.lcut(text, cut_all=False)
    # Sum the scores of the tokens found in the dictionary
    return sum(boson_dict[x] for x in segs if x in boson_dict)


# Read a file
def read_txt(filename):
    with open(filename, 'r', encoding='utf-8') as f:
        txt = f.read()
    return txt


# Append to a file
def write_data(filename, data):
    with open(filename, 'a', encoding='utf-8') as f:
        f.write(data)


if __name__ == '__main__':
    text = read_txt(r'test_data\微博.txt')
    lists = text.split('\n')

    # al_senti = ['无','积极','消极','消极','中性','消极','积极','消极','积极','积极','积极',
    #             '无','积极','积极','中性','积极','消极','积极','消极','积极','消极','积极',
    #             '无','中性','消极','中性','消极','积极','消极','消极','消极','消极','积极']
    # Human-annotated labels, one per line
    al_senti = read_txt(r'test_data\人工情感标注.txt').split('\n')
    i = 0
    for line in lists:
        if line != '':
            # A score above 0 is labelled positive; anything else is labelled negative
            sentiments = round(getscore(line), 2)
            print(line)
            print("Sentiment score:", sentiments)
            print('Human label: ' + al_senti[i])
            if sentiments > 0:
                print("Machine label: positive\n")
                s = "Machine label: positive\n"
            else:
                print('Machine label: negative\n')
                s = "Machine label: negative\n"
            sentiment = 'Sentiment score: ' + str(sentiments) + '\n'
            al_sentiment = 'Human label: ' + al_senti[i] + '\n'
            # Write the results to a file
            filename = r'result_data\BosonNLP情感分析结果.txt'
            write_data(filename, 'Text analysed: ')
            write_data(filename, line + '\n')   # the input text
            write_data(filename, sentiment)     # the sentiment score
            write_data(filename, al_sentiment)  # the human label
            write_data(filename, s + '\n')      # the machine label
            # Assumed to advance once per non-empty line; the original layout was lost
            i = i + 1

Related files:

BosonNLP sentiment dictionary download: https://bosonnlp.com/dev/resource

Weibo corpus: https://pan.baidu.com/s/1Pskzw7bg9qTnXD_QKF-4sg (extraction code: 15bu)

Output:

[Figure: screenshot of the BosonNLP script output]

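3.3 How the HowNet-based sentiment mining works

The HowNet-based analysis proceeds in the following steps:

1. First, split the text into sentences and segment the sentences into words, then compare the result against HIT's stop-word list and remove the stop words.

2. Next, analyse each sentence. Count the sentiment words in it: each positive word adds 1 to the positive count, and each negative word adds 1 to the negative count. While counting, check whether a degree adverb precedes the sentiment word; if one does, apply a weight that depends on the adverb's level, multiplying the sentiment word count by it. If the sentence ends with ? or !, increase the count by a fixed amount, because punctuation such as ! and ? usually signals stronger emotion.

3. Then compute the sentiment value of the whole passage (positive value minus negative value) to get its sentiment orientation.

4. Finally, sum the sentiment values of all passages to get the sentiment value of the whole text.

[Figure: overall flowchart of the HowNet-based pipeline]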
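The four steps above can be sketched on toy data as follows. This is a simplified illustration, not the script from section 3.4: it assumes the tokens are already segmented and stop-word filtered, it multiplies degree weights as the description says, and it borrows the bonus of 4 for a trailing `!` or `?` from that script. The word sets, the degree multipliers, and the name `score_sentence` are invented for the example.

```python
from typing import Dict, List, Set

EMPHASIS = {"!", "！", "?", "？"}


def score_sentence(
    tokens: List[str],
    pos_words: Set[str],
    neg_words: Set[str],
    degree: Dict[str, float],
    punct_bonus: float = 4.0,
) -> float:
    """Return positive weight minus negative weight for one tokenized sentence."""
    pos_total = 0.0
    neg_total = 0.0
    last_polarity = 0  # +1 or -1 for the most recent sentiment word, 0 if none
    window_start = 0   # index just after the previous sentiment word

    for i, tok in enumerate(tokens):
        if tok in pos_words or tok in neg_words:
            # Multiply the degree adverbs seen since the previous sentiment word.
            weight = 1.0
            for prev in tokens[window_start:i]:
                weight *= degree.get(prev, 1.0)
            if tok in pos_words:
                pos_total += weight
                last_polarity = 1
            else:
                neg_total += weight
                last_polarity = -1
            window_start = i + 1
        elif tok in EMPHASIS:
            # Emphatic punctuation strengthens the nearest preceding sentiment word.
            if last_polarity > 0:
                pos_total += punct_bonus
            elif last_polarity < 0:
                neg_total += punct_bonus
    return pos_total - neg_total


# Invented toy resources.
pos_words = {"喜欢", "开心"}
neg_words = {"讨厌", "失望"}
degree = {"非常": 1.75, "不": -1.0}

sentences = [
    ["我", "非常", "喜欢"],
    ["太", "失望", "了", "！"],
    ["不", "喜欢"],
]
scores = [score_sentence(s, pos_words, neg_words, degree) for s in sentences]
total = sum(scores)  # step 4: add up the per-sentence values
print(scores, total)
```

Giving negation words a multiplier of -1 lets the same multiplication step flip the polarity of the sentiment word that follows.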

3.4 Code for the HowNet-based analysis:

import pyltp
from pyltp import Segmentor
from pyltp import SentenceSplitter
from pyltp import Postagger
import numpy as np

# Read a file and return its lines as a list
def read_file(filename):
    with open(filename, 'r', encoding='utf-8') as f:
        text = f.read()
    # Split into a list of lines
    text = text.split('\n')
    return text

# Append data to a file
def write_data(filename, data):
    with open(filename, 'a', encoding='utf-8') as f:
        f.write(str(data))

# Split text into sentences
def cut_sentence(text):
    sentences = SentenceSplitter.split(text)
    sentence_list = [w for w in sentences]
    return sentence_list

# Segment a sentence into words
def tokenize(sentence):
    segmentor = Segmentor()  # create an instance
    # Load the segmentation model (the path is specific to the author's machine)
    segmentor.load(r'E:\tool\python\Lib\site-packages\pyltp-0.2.1.dist-info\ltp_data\cws.model')
    # Run segmentation
    words = segmentor.segment(sentence)
    words = list(words)
    # Release the model
    segmentor.release()
    return words

# Part-of-speech tagging
def postagger(words):
    tagger = Postagger()  # create an instance
    # Load the tagging model (the path is specific to the author's machine)
    tagger.load(r'E:\tool\python\Lib\site-packages\pyltp-0.2.1.dist-info\ltp_data\pos.model')
    # Tag the words
    postags = tagger.postag(words)
    # Release the model
    tagger.release()
    # Return a list
    postags = [i for i in postags]
    return postags

# Pair each word with its part-of-speech tag as a tuple
def intergrad_word(words, postags):
    # zip pairs the two lists element by element
    pos_list = zip(words, postags)
    pos_list = [w for w in pos_list]
    return pos_list

# Remove stop words
def del_stopwords(words):
    # Read the stop-word list
    stopwords = set(read_file(r"test_data\stopwords.txt"))
    # Keep only the words that are not stop words
    new_words = []
    for word in words:
        if word not in stopwords:
            new_words.append(word)
    return new_words

# Return one of the six degree-word lists or a sentiment dictionary as a list,
# selected by name; written to work with a function in Django's views
DICT_PATHS = {
    'one': r"E:\学习笔记\NLP学习\NLP code\情感分析3\degree_dict\most.txt",
    'two': r"E:\学习笔记\NLP学习\NLP code\情感分析3\degree_dict\very.txt",
    'three': r"E:\学习笔记\NLP学习\NLP code\情感分析3\degree_dict\more.txt",
    'four': r"E:\学习笔记\NLP学习\NLP code\情感分析3\degree_dict\ish.txt",
    'five': r"E:\学习笔记\NLP学习\NLP code\情感分析3\degree_dict\insufficiently.txt",
    'six': r"E:\学习笔记\NLP学习\NLP code\情感分析3\degree_dict\inverse.txt",
    'posdict': r"E:\学习笔记\NLP学习\NLP code\情感分析3\emotion_dict\pos_all_dict.txt",
    'negdict': r"E:\学习笔记\NLP学习\NLP code\情感分析3\emotion_dict\neg_all_dict.txt",
}

def weighted_value(request):
    path = DICT_PATHS.get(request)
    # Unknown names yield an empty list
    return read_file(path) if path else []

print("reading sentiment dict ...")
# Load the sentiment dictionaries
posdict = weighted_value('posdict')
negdict = weighted_value('negdict')
# Load the degree-adverb lists
# Note: match_adverb applies 8, 6, 4, 2, 0.5 and -1 rather than the weights noted here
# Noted weight: 2
mostdict = weighted_value('one')
# Noted weight: 1.75
verydict = weighted_value('two')
# Noted weight: 1.50
moredict = weighted_value('three')
# Noted weight: 1.25
ishdict = weighted_value('four')
# Noted weight: 0.25
insufficientdict = weighted_value('five')
# Noted weight: -1
inversedict = weighted_value('six')

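# Handle degree adverbs: each level gets a different weight
# Note: every branch except the last assigns a fixed value, overwriting the running
# count instead of multiplying it as section 3.3 describes; use *= if multiplication
# is intended
def match_adverb(word, sentiment_value):
    # Superlative level
    if word in mostdict:
        sentiment_value = 8
    # Comparative level (very)
    elif word in verydict:
        sentiment_value = 6
    # Comparative level (more)
    elif word in moredict:
        sentiment_value = 4
    # Slight degree
    elif word in ishdict:
        sentiment_value = 2
    # Relative degree
    elif word in insufficientdict:
        sentiment_value = 0.5
    # Negation
    elif word in inversedict:
        sentiment_value = -1
    else:
        sentiment_value *= 1
    return sentiment_value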

# Score a single Weibo post
def single_sentiment_score(text_sent):
    sentiment_scores = []
    # Split the post into sentences
    sentences = cut_sentence(text_sent)
    for sent in sentences:
        # Uncomment to inspect the sentence split
        # print('sentence:', sent)
        # Segment, then remove stop words
        words = tokenize(sent)
        seg_words = del_stopwords(words)
        i = 0  # index of the word currently being scanned
        s = 0  # index just after the previous sentiment word
        poscount = 0  # positive count
        negcount = 0  # negative count
        # Scan for sentiment words one by one
        for word in seg_words:
            # Positive word
            if word in posdict:
                poscount += 1
                # Look for degree adverbs between the previous sentiment word and this one
                for w in seg_words[s:i]:
                    poscount = match_adverb(w, poscount)
                s = i + 1
            # Negative word
            elif word in negdict:
                negcount += 1
                for w in seg_words[s:i]:
                    negcount = match_adverb(w, negcount)
                s = i + 1
            # An exclamation or question mark ends the sentence: search backwards from
            # the end for the nearest sentiment word and add 4 to its count
            elif word in ('!', '！', '?', '？'):
                for w2 in seg_words[::-1]:
                    # Positive word: poscount + 4
                    if w2 in posdict:
                        poscount += 4
                        break
                    # Negative word: negcount + 4
                    elif w2 in negdict:
                        negcount += 4
                        break
            i += 1  # advance the scan position
        # Sentence score
        sentiment_score = poscount - negcount
        sentiment_scores.append(sentiment_score)
        # Uncomment to inspect the score of each sentence
        # print('sentence score:', sentiment_score)
    # Total score of the post
    return sum(sentiment_scores)

# Score every post in the input list; returns a list of (score, post) tuples
def run_score(contents):
    scores_list = []
    for content in contents:
        if content != '':
            score = single_sentiment_score(content)  # score one post
            scores_list.append((score, content))     # (score, post) tuple
    return scores_list

# Main program
if __name__ == '__main__':
    print('Processing...')
    # Test sentences
    # sentence = '要怎么说呢! 我需要的恋爱不是现在的样子, 希望是能互相鼓励的勉励, 你现在的样子让我觉得很困惑。你到底能不能陪我一直走下去, 你是否有决心?是否你看不惯我?你是可以随意的生活,但是我的未来我耽误不起！'
    # sentence = '转有用吗？这个事本来就是要全社会共同努力的，公交公司有没有培训到位？公交车上地铁站内有没有放足够的宣传标语？我现在转一下微博，没有多大的意义。'
    sentences = read_file(r'test_data\微博.txt')
    scores = run_score(sentences)
    # Human-annotated sentiment labels
    man_sentiment = read_file(r'test_data\人工情感标注.txt')
    al_sentiment = []
    for score in scores:
        print('Sentiment score:', score[0])
        if score[0] < 0:
            print('Sentiment: negative')
            s = 'negative'
        elif score[0] == 0:
            print('Sentiment: neutral')
            s = 'neutral'
        else:
            print('Sentiment: positive')
            s = 'positive'
        al_sentiment.append(s)
        print('Text analysed:', score[1])
    # Write the results to a file
    filename = r'result_data\result_data.txt'
    for i, score in enumerate(scores):
        write_data(filename, 'Text analysed: {}\n'.format(score[1]))       # the input text
        write_data(filename, 'Sentiment score: {}\n'.format(score[0]))     # the score
        write_data(filename, 'Human label: {}\n'.format(man_sentiment[i]))   # the human label
        write_data(filename, 'Machine label: {}\n'.format(al_sentiment[i]))  # the machine label
        write_data(filename, '\n')
    print('succeed...')

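Output:

[Figure: screenshot of the HowNet script output]

4. Summary

The programs in this article make simple sentiment-orientation judgements. The BosonNLP-based analysis was the less accurate of the two, at 56.67%, while the HowNet-based analysis reached 90%, which is a decent result. Both programs, however, are only simple applications of sentiment analysis and do not involve more sophisticated algorithms. For higher precision, or to keep accuracy high on larger samples, these two models are not enough. They are still a good choice for practice and learning. Understanding sentiment analysis in depth will take further study.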
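An accuracy figure like the ones quoted above is the share of posts whose machine label matches the human label. Here is a minimal sketch of that calculation; the helper name `accuracy` and the label lists are invented for illustration and are not the article's data.

```python
from typing import Sequence


def accuracy(predicted: Sequence[str], gold: Sequence[str]) -> float:
    """Fraction of positions where the machine label equals the human label."""
    if len(predicted) != len(gold):
        raise ValueError("label lists must have the same length")
    if not gold:
        raise ValueError("cannot compute accuracy on an empty list")
    correct = sum(p == g for p, g in zip(predicted, gold))
    return correct / len(gold)


# Invented labels, only to show the call.
machine = ["positive", "negative", "negative", "positive"]
human = ["positive", "negative", "positive", "positive"]
print(f"{accuracy(machine, human):.2%}")
```

Note that the BosonNLP script only emits positive or negative, so any post annotated as neutral by hand counts as a miss under this measure.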
