
Python: counting Chinese word frequencies in a txt file + a Chinese word cloud

Published: 2025/3/15 · Category: python · Author: 豆豆

This article, collected by 生活随笔, covers counting Chinese word frequencies in a txt file with Python and drawing a Chinese word cloud.

1. Word frequency counting:

import jieba

# Read the whole novel and segment it into a list of words
with open("threekingdoms3.txt", "r", encoding='utf-8') as f:
    txt = f.read()
words = jieba.lcut(txt)

counts = {}
for word in words:
    if len(word) == 1:  # skip single characters (particles, punctuation)
        continue
    counts[word] = counts.get(word, 0) + 1

# Sort by count, highest first, and print the top 15
items = list(counts.items())
items.sort(key=lambda x: x[1], reverse=True)
for word, count in items[:15]:
    print("{0:<10}{1:>5}".format(word, count))
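The same count can be written more compactly with `collections.Counter`. Its `most_common(n)` method returns the n most frequent items in descending order of count. The sketch below is minimal and assumes the tokens have already been produced by `jieba.lcut`. The helper name `top_words` and the sample token list are made up for illustration, so the snippet runs without jieba or the novel's text:

```python
from collections import Counter

def top_words(tokens, n=15):
    """Count tokens longer than one character and return the n most
    common as (word, count) pairs, highest count first."""
    counts = Counter(word for word in tokens if len(word) > 1)
    return counts.most_common(n)

# Hypothetical tokens, standing in for the output of jieba.lcut(txt)
sample = ["曹操", "引兵", "追", "之", "曹操", "孔明", "曰", "孔明", "曹操"]
for word, count in top_words(sample, 3):
    print("{0:<10}{1:>5}".format(word, count))
```

`Counter` replaces both the `counts.get(word, 0) + 1` bookkeeping and the manual sort. When there are fewer than n distinct words, `most_common(n)` simply returns fewer pairs.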

Result:

曹操 946
孔明 737
將軍 622
玄德 585
卻說 534
關公 509
荊州 413
二人 410
丞相 405
玄德曰 390
不可 387
孔明曰 374
張飛 358
如此 320
不能 318
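Several of the top words above (卻說, 二人, 不可, 如此, 不能) are common expressions rather than names. The commented-out `excludes` loop in the next listing points at one fix: removing such words after counting. The sketch below is a minimal version of that step. `EXCLUDES` is a hypothetical stop list picked from this output, `drop_excluded` is an illustrative helper name, and the counts are copied from the result above:

```python
def drop_excluded(counts, excludes):
    """Return a copy of counts without the words in excludes.
    dict.pop(word, None) avoids a KeyError when an excluded word never occurred."""
    kept = dict(counts)
    for word in excludes:
        kept.pop(word, None)
    return kept

# Hypothetical stop list built from non-name words in the output above
EXCLUDES = {"卻說", "二人", "不可", "如此", "不能"}

counts = {"曹操": 946, "孔明": 737, "卻說": 534, "二人": 410, "不可": 387}
print(drop_excluded(counts, EXCLUDES))
```

Using `pop` with a default instead of `del counts[word]` means a stop word that never appears in the text does not raise an error.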

To go a step further, I only want to count how often the characters appear. The code:

import jieba

with open("threekingdoms3.txt", "r", encoding='utf-8') as f:
    txt = f.read()
# The character names to report
names = {'曹操', '孔明', '劉備', '關羽', '張飛', '呂布', '趙云', '孫權', '周瑜', '袁紹', '黃忠', '魏延'}
words = jieba.lcut(txt)

# Count words, merging aliases and titles into one canonical name per character
counts = {}
for word in words:
    if len(word) == 1:
        continue
    elif word == "諸葛亮" or word == "孔明曰":
        rword = "孔明"
    elif word == "關公" or word == "云長":
        rword = "關羽"
    elif word == "玄德" or word == "玄德曰":
        rword = "劉備"
    elif word == "孟德" or word == "丞相":
        rword = "曹操"
    else:
        rword = word
    counts[rword] = counts.get(rword, 0) + 1

# Optional: drop an `excludes` set of non-name words (not defined here)
#for word in excludes:
#    del counts[word]

items = list(counts.items())
items.sort(key=lambda x: x[1], reverse=True)
# Print only the listed names that appear among the 40 most frequent words
for word, count in items[:40]:
    if word in names:
        print("{0:<10}{1:>5}".format(word, count))
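The elif chain can also be written as a lookup table, which is easier to extend with more aliases. This is a sketch that assumes a pre-segmented token list. `ALIASES` copies the mappings from the code above. `count_characters` and the sample tokens are illustrative only:

```python
from collections import Counter

# Alias table copied from the elif chain above: each variant maps to one canonical name
ALIASES = {
    "諸葛亮": "孔明", "孔明曰": "孔明",
    "關公": "關羽", "云長": "關羽",
    "玄德": "劉備", "玄德曰": "劉備",
    "孟德": "曹操", "丞相": "曹操",
}

def count_characters(tokens, names):
    """Skip single characters, merge aliases, then keep only names in `names`.
    Returns (name, count) pairs sorted by count, highest first."""
    counts = Counter(ALIASES.get(word, word) for word in tokens if len(word) > 1)
    return [(name, c) for name, c in counts.most_common() if name in names]

names = {"曹操", "孔明", "劉備", "關羽", "張飛"}
# Hypothetical tokens, standing in for the output of jieba.lcut(txt)
sample = ["玄德", "曰", "孔明曰", "諸葛亮", "丞相", "曹操", "云長", "玄德曰"]
print(count_characters(sample, names))
```

The listing above prints a name only if it reaches the 40 most frequent words overall. This version, by contrast, reports every listed name that occurs at all.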

Output:

曹操 1358
孔明 1265
劉備 1251
關羽 783
張飛 358
呂布 300
趙云 278
孫權 257
周瑜 217
袁紹 191

Going further, draw a word cloud:

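import os
import jieba
import wordcloud

# Assumption: path to a font that contains CJK glyphs (adjust for your system);
# WordCloud's bundled default font cannot draw Chinese characters.
FONT_PATH = "simhei.ttf"

def getText(file):
    # Read the file and segment it with jieba. The result must be used:
    # WordCloud finds words with a regex, so the words are joined with spaces.
    with open(file, 'r', encoding='UTF-8') as f:
        txt = f.read()
    return " ".join(jieba.lcut(txt))

filename = input()  # base name of the text file, without ".txt"
txt = getText(filename + '.txt')
wordclouds = wordcloud.WordCloud(width=1000, height=800, margin=2, font_path=FONT_PATH).generate(txt)
wordclouds.to_file('{}.png'.format(filename))

# On Windows this opens the PNG with the default image viewer
os.system('{}.png'.format(filename))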
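The segmentation step matters because WordCloud's `generate()` picks words out of the text with a regular expression, and written Chinese has no spaces between words. The sketch below assumes a pattern of the form `\w[\w']+`, which is close to WordCloud's default (check the default in your installed version). It also uses a hand-written segmentation in place of real `jieba.lcut` output:

```python
import re

# Assumption: a tokenizer regex requiring two or more word characters,
# similar to the one WordCloud applies to text passed to generate()
TOKEN_RE = r"\w[\w']+"

raw = "卻說曹操引兵追之。"
# Hand-written segmentation standing in for jieba.lcut(raw)
segmented = " ".join(["卻說", "曹操", "引兵", "追", "之", "。"])

print(re.findall(TOKEN_RE, raw))        # the whole clause comes out as one "word"
print(re.findall(TOKEN_RE, segmented))  # separate words; single characters dropped
```

Without segmentation, the whole clause counts as a single word. Once the words are joined with spaces, each one is counted separately. Single characters drop out because the pattern needs at least two word characters.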

The names in the word cloud can be refined further, for example by merging aliases as in the code in part two.
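One way to do this is to skip WordCloud's tokenizer altogether and pass the alias-merged counts from part two to `WordCloud.generate_from_frequencies`. This sketch rests on several assumptions:

- `name_frequencies` and `render_cloud` are hypothetical helper names.
- The sample counts are taken from the outputs above.
- `font_path` must point to a font with CJK glyphs on your machine.

```python
def name_frequencies(counts, names):
    """Keep only the character names from a word -> count dict
    (for example the alias-merged counts built in part two)."""
    return {word: c for word, c in counts.items() if word in names}

def render_cloud(freqs, out_path, font_path):
    """Draw a word cloud directly from frequencies, bypassing WordCloud's tokenizer.
    font_path must point to a font with CJK glyphs (supplied by you)."""
    from wordcloud import WordCloud  # imported here so name_frequencies works without it
    wc = WordCloud(width=1000, height=800, margin=2, font_path=font_path)
    wc.generate_from_frequencies(freqs)
    wc.to_file(out_path)

# Counts copied from the outputs above
counts = {"曹操": 1358, "孔明": 1265, "劉備": 1251, "關羽": 783, "卻說": 534}
names = {"曹操", "孔明", "劉備", "關羽", "張飛"}
freqs = name_frequencies(counts, names)
print(freqs)
# render_cloud(freqs, "names.png", "path/to/a-CJK-font.ttf")  # hypothetical font path
```

Passing frequencies directly means aliases such as 玄德 and 劉備 are drawn as one word sized by their combined count.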

Original article: https://www.cnblogs.com/116970u/p/11611821.html
