

Worked example: counting character appearances in Romance of the Three Kingdoms (《三国演义》)

Published: 2023/12/18 · Programming Q&A · 豆豆

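Having just covered word-frequency counting for English text, let's now count how often characters appear in a Chinese text.

We use Romance of the Three Kingdoms (《三国演义》) as the example.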

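I. Approach

1. Using the jieba library

jieba is a widely used third-party Python library for Chinese word segmentation. Written Chinese does not separate words with spaces, so unlike English text it cannot simply be split on whitespace. jieba splits the text into individual words for us.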
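jieba's own approach builds a graph of all dictionary words that could start at each character and chooses the most probable path through it. For words missing from its dictionary, it falls back on an HMM. The sketch below uses a much simpler stand-in, forward maximum matching, with a tiny hypothetical dictionary (`VOCAB`) and a hypothetical function name (`fmm_segment`). It is not how jieba works. It only shows what dictionary-based segmentation produces.

```python
# Toy forward maximum matching segmenter (NOT jieba's algorithm).
# VOCAB is a tiny hypothetical dictionary used only for illustration.
VOCAB = {"曹操", "孔明", "丞相", "引兵", "荆州"}
MAX_LEN = max(len(w) for w in VOCAB)


def fmm_segment(text, vocab=VOCAB, max_len=MAX_LEN):
    """Greedily take the longest dictionary word starting at each position,
    falling back to a single character when nothing in the dictionary matches."""
    words = []
    i = 0
    while i < len(text):
        # Try the longest possible piece first, then shorter ones
        for size in range(min(max_len, len(text) - i), 0, -1):
            piece = text[i:i + size]
            if size == 1 or piece in vocab:
                words.append(piece)
                i += size
                break
    return words


print(fmm_segment("曹操引兵取荆州"))  # ['曹操', '引兵', '取', '荆州']
```

Forward maximum matching is greedy. With a real vocabulary it can pick a long word that steals characters from the next word. Resolving such cases is what jieba's probability-based path search is for.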

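2. Filtering the words

Because the goal is to count how often each character appears in the novel, the words have to be filtered:

  • Discard one-character words (they are assumed not to be names).
  • Inspect the output, remove the words that are not names, and repeat until the result matches what we expect.
  • Some characters go by several names (for example, 孔明 and 诸葛亮 are the same person), so those names must be merged.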
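The three rules above can be sketched as one pass over an already-segmented word list. `ALIASES`, `EXCLUDES` and `count_names` are hypothetical names, and both tables are heavily trimmed for illustration. The author's code merges aliases inside the loop and deletes excluded words from the dictionary afterwards. Checking exclusions inside the loop, as done here, gives the same counts.

```python
# Hypothetical, heavily trimmed alias table and exclusion set
ALIASES = {"诸葛亮": "孔明", "孔明曰": "孔明", "丞相": "曹操", "孟德": "曹操"}
EXCLUDES = {"将军", "却说"}


def count_names(words, aliases=ALIASES, excludes=EXCLUDES):
    """Apply the three filtering rules to an already-segmented word list."""
    counts = {}
    for word in words:
        if len(word) == 1:                 # rule 1: drop one-character words
            continue
        name = aliases.get(word, word)     # rule 3: map aliases to one name
        if name in excludes:               # rule 2: drop known non-names
            continue
        counts[name] = counts.get(name, 0) + 1
    return counts


# A made-up word list standing in for jieba.lcut() output
words = ["却说", "曹", "丞相", "引兵", "曹操", "孔明曰", "将军", "诸葛亮"]
print(count_names(words))  # {'曹操': 2, '引兵': 1, '孔明': 2}
```

Note that 引兵 ("lead troops") survives because it is not yet in the exclusion set. This is why the second rule has to be applied repeatedly.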

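3. Sorting by number of appearances

Sort the entries by their dictionary value (the count) and print the 20 characters who appear most often.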
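Sorting a dictionary by value amounts to sorting its `(word, count)` pairs with a key function and keeping the first n. The sketch below uses a hypothetical helper `top_n` on made-up counts. For comparison, it also shows `collections.Counter.most_common`, which produces the same ranking here.

```python
from collections import Counter


def top_n(counts, n):
    """Return the n (word, count) pairs with the largest counts, largest first."""
    return sorted(counts.items(), key=lambda kv: kv[1], reverse=True)[:n]


# Made-up counts, for illustration only
counts = {"曹操": 5, "孔明": 7, "刘备": 6, "关羽": 3}

for word, count in top_n(counts, 3):
    # {0:<10} left-aligns the word in 10 characters, {1:>5} right-aligns the count
    print("{0:<10}{1:>5}".format(word, count))

# collections.Counter ranks the same way with most_common()
print(Counter(counts).most_common(3))
```

Slicing with `[:n]` also avoids the `IndexError` that indexing with `range(n)` would raise if there were fewer than n distinct words. `str.format` pads by character count, not display width. Full-width Chinese characters can therefore make the columns look uneven in a terminal.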


II. Implementation

1. CalThreeKingdomsV1

Code:

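```python
# CalThreeKingdomsV1.py
import jieba

# Read the whole novel; the file is assumed to be saved as UTF-8
with open("threekingdoms.txt", "r", encoding="utf-8") as f:
    txt = f.read()
words = jieba.lcut(txt)          # precise-mode segmentation, returns a list
counts = {}
for word in words:
    if len(word) == 1:           # skip one-character words
        continue
    counts[word] = counts.get(word, 0) + 1
items = list(counts.items())
items.sort(key=lambda x: x[1], reverse=True)   # most frequent first
for word, count in items[:15]:                 # slicing avoids IndexError
    print("{0:<10}{1:>5}".format(word, count))
```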

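Notes:

  • Open the Chinese text with encoding="utf-8" (assuming the file is saved as UTF-8). Otherwise Python falls back to the platform's default encoding, which on some systems (for example, GBK on Chinese-language Windows) cannot read the file correctly.
  • jieba.lcut() segments the text in precise mode, cutting it into non-overlapping words. Full mode (cut_all=True) would also return overlapping, redundant candidates.
  • A dictionary stores the count for each word, and list.sort() with a key function orders the (word, count) pairs by count.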
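The sketch below shows why the encoding argument matters. It writes a short made-up sample to a temporary file as UTF-8 and reads it back with the same encoding. It then tries ASCII, which is guaranteed to reject these bytes. A mismatched codec such as GBK may either raise an error or silently produce garbled text.

```python
import os
import tempfile

sample = "却说曹操引兵"        # a short made-up sample in the novel's style
decode_failed = False

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "sample.txt")
    # Save the text as UTF-8, then read it back with the same encoding
    with open(path, "w", encoding="utf-8") as f:
        f.write(sample)
    with open(path, "r", encoding="utf-8") as f:
        text = f.read()
    # Reading the same bytes with a codec that cannot represent them fails
    try:
        with open(path, "r", encoding="ascii") as f:
            f.read()
    except UnicodeDecodeError as exc:
        decode_failed = True
        print("wrong encoding:", exc.reason)

print(text == sample)
```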

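Output

As the output shows, the result is not yet what we want:

  • 将军 (general), 却说 ("now it is said", a storyteller's transition), 二人 (the two of them), 不可 (must not), 不能 (cannot), 如此 (thus) and 荆州 (Jingzhou, a place name) are not personal names.
  • 曹操 (Cao Cao) and 丞相 (the Chancellor), as well as 孔明 (Kongming) and 孔明曰 ("Kongming said"), each refer to the same person.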
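Finding words like these is a manual inspect-and-exclude cycle. A small hypothetical helper, `review_candidates`, can at least hide the words already excluded, so that each pass lists only new suspects. The counts below are invented to resemble the kind of ranking V1 prints.

```python
def review_candidates(counts, excludes, n=20):
    """Return the n most frequent words not yet in `excludes`, so a person
    can decide which non-names to add to the exclusion set next."""
    ranked = sorted(counts.items(), key=lambda kv: kv[1], reverse=True)
    return [(w, c) for w, c in ranked if w not in excludes][:n]


# Made-up counts, for illustration only
counts = {"曹操": 9, "将军": 8, "孔明": 7, "却说": 6, "丞相": 5, "引兵": 4}
excludes = {"将军", "却说"}

for word, count in review_candidates(counts, excludes, n=3):
    print(word, count)
```

Here 丞相 still shows up on its own, which signals that it needs merging into 曹操 rather than excluding.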

2. CalThreeKingdomsV2

Remove the non-name words from the dictionary, and merge the different names used for the same character into one.

Code:

```python
# CalThreeKingdomsV2.py
import jieba

# Non-name words seen in V1's ranking
# (the strings assume threekingdoms.txt is in simplified Chinese)
excludes = {"将军", "却说", "荆州", "二人", "不可", "不能", "如此"}
with open("threekingdoms.txt", "r", encoding="utf-8") as f:
    txt = f.read()
words = jieba.lcut(txt)
counts = {}
for word in words:
    # Skip one-character words; map known aliases to one canonical name
    if len(word) == 1:
        continue
    elif word == "诸葛亮" or word == "孔明曰":
        rword = "孔明"
    elif word == "关公" or word == "云长":
        rword = "关羽"
    elif word == "玄德" or word == "玄德曰":
        rword = "刘备"
    elif word == "孟德" or word == "丞相":
        rword = "曹操"
    else:
        rword = word
    counts[rword] = counts.get(rword, 0) + 1
for word in excludes:
    counts.pop(word, None)   # unlike del, no KeyError if a word never occurs
items = list(counts.items())
items.sort(key=lambda x: x[1], reverse=True)
for word, count in items[:10]:
    print("{0:<10}{1:>5}".format(word, count))
```

Output


3. CalThreeKingdomsV3

After repeatedly refining the exclusion list against the output, we finally obtain the 20 most frequent character names.

Code:

```python
# CalThreeKingdomsV3.py
import jieba

# Non-name words found by repeatedly inspecting the output
# (the strings assume threekingdoms.txt is in simplified Chinese)
excludes = {"将军", "却说", "荆州", "二人", "不可", "不能", "如此", "商议", "如何",
            "主公", "军士", "左右", "军马", "引兵", "次日", "大喜", "天下", "东吴",
            "于是", "今日", "不敢", "魏兵", "陛下", "一人", "都督", "人马", "不知",
            "汉中", "只见", "众将", "蜀兵", "上马", "大叫", "太守", "此人", "夫人",
            "后人", "背后", "城中", "一面", "何不", "大军", "忽报", "先生", "百姓",
            "何故", "然后", "先锋", "不如", "赶来", "原来", "令人", "江东", "下马",
            "喊声", "正是", "徐州", "忽然", "因此", "成都", "不见", "未知", "大败",
            "大事", "之后", "一军", "引军", "起兵", "军中", "接应", "进兵", "大惊", "可以"}
with open("threekingdoms.txt", "r", encoding="utf-8") as f:
    txt = f.read()
words = jieba.lcut(txt)
counts = {}
for word in words:
    # Skip one-character words; map known aliases to one canonical name
    if len(word) == 1:
        continue
    elif word == "诸葛亮" or word == "孔明曰":
        rword = "孔明"
    elif word == "关公" or word == "云长":
        rword = "关羽"
    elif word == "玄德" or word == "玄德曰" or word == "先主":
        rword = "刘备"
    elif word == "孟德" or word == "丞相":
        rword = "曹操"
    elif word == "后主":
        rword = "刘禅"
    elif word == "天子":
        rword = "刘协"
    else:
        rword = word
    counts[rword] = counts.get(rword, 0) + 1
for word in excludes:
    counts.pop(word, None)   # unlike del, no KeyError if a word never occurs
items = list(counts.items())
items.sort(key=lambda x: x[1], reverse=True)
for word, count in items[:20]:
    print("{0:<10}{1:>5}".format(word, count))
```

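Output:

Note: some of the excluded words are ambiguous, such as 先生 (sir, a polite form of address) and 夫人 (madam, lady). Depending on context they can refer to a particular character, so dropping them may undercount that person.

In the final result, the character who appears most often is Cao Cao (曹操). Does that surprise you?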

