
Dream of the Red Chamber (红楼梦): A Worked Example

Published: 2024/3/7 | Programming Q&A | 56 | 豆豆


The text file "紅樓夢.txt" contains the first 20 chapters of the novel Dream of the Red Chamber (《紅樓夢》), and "停用詞.txt" lists the words to exclude. Implement the following.
1. Segment the text of "紅樓夢.txt" into words and normalize the character names. Normalize only the following:
鳳姐, 鳳姐兒, 鳳丫頭 are normalized to 鳳姐
寶玉, 二爺, 寶二爺 are normalized to 寶玉
黛玉, 顰兒, 林妹妹, 黛玉道 are normalized to 黛玉
寶釵, 寶丫頭 are normalized to 寶釵
賈母, 老祖宗 are normalized to 賈母
襲人, 襲人道 are normalized to 襲人
賈政, 賈政道 are normalized to 賈政
賈璉, 璉二爺 are normalized to 賈璉
2. Do not count the frequency of any word listed in "停用詞.txt" (a name must be longer than one character).
3. Extract the names that appear at least 40 times, sort the names and their counts in descending order of count, and save them to result.csv. Names with the same count are sorted by the character order of the name.
Sample output
寶玉,597
鳳姐,296
一個,179
如今,132
黛玉,113
一面,112
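The normalization, filtering, and ordering rules above can be sketched on an already segmented word list, without jieba. This is a minimal illustration rather than the reference solution: the function name count_names, the ALIASES table layout, and the toy token list are assumptions made for the example. Only the alias pairs themselves come from the task statement.

```python
from collections import Counter

# Alias table built from the task statement: variant -> canonical name.
ALIASES = {
    "鳳姐兒": "鳳姐", "鳳丫頭": "鳳姐",
    "二爺": "寶玉", "寶二爺": "寶玉",
    "顰兒": "黛玉", "林妹妹": "黛玉", "黛玉道": "黛玉",
    "寶丫頭": "寶釵",
    "老祖宗": "賈母",
    "襲人道": "襲人",
    "賈政道": "賈政",
    "璉二爺": "賈璉",
}


def count_names(tokens, stop_words, min_count=40):
    """Return (name, count) pairs with count >= min_count,
    sorted by count descending, then by name ascending."""
    counts = Counter(
        ALIASES.get(tok, tok)              # map variants to the canonical name
        for tok in tokens
        if len(tok) > 1 and tok not in stop_words
    )
    kept = [(name, n) for name, n in counts.items() if n >= min_count]
    return sorted(kept, key=lambda item: (-item[1], item[0]))


# Hypothetical toy input; real tokens would come from jieba.lcut().
tokens = ["寶玉", "二爺", "的", "鳳丫頭", "鳳姐", "說道", "黛玉道", "林妹妹", "，"]
result = count_names(tokens, stop_words={"說道"}, min_count=2)
print(result)
```

Sorting on the tuple (-count, name) handles the tie-break in a single pass: equal counts fall back to comparing the names by code point.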

Method 1:

    import jieba

    f = "紅樓夢.txt"
    sf = "停用詞.txt"

    # Read the novel and segment it into a list of words
    f1 = open(f, 'r', encoding='utf-8')
    txt = jieba.lcut(f1.read())

    # Read the stop words, one per line
    f2 = open(sf, 'r', encoding='utf-8')
    lines = f2.readlines()
    ty = []  # stop words
    for line in lines:
        ty.append(line.rstrip('\n'))  # strip the trailing newline, if any

    txt0 = []  # text of the novel with stop words removed
    for x in txt:
        if x not in ty:
            txt0.append(x)

    # Count word frequencies, normalizing the name variants
    d = {}
    for word in txt0:
        if len(word) <= 1:
            continue
        elif word == '鳳姐兒' or word == '鳳丫頭':
            rword = '鳳姐'
        elif word == '二爺' or word == '寶二爺':
            rword = '寶玉'
        elif word == '顰兒' or word == '林妹妹' or word == '黛玉道':
            rword = '黛玉'
        elif word == '寶丫頭':
            rword = '寶釵'
        elif word == '老祖宗':
            rword = '賈母'
        elif word == '襲人道':
            rword = '襲人'
        elif word == '賈政道':
            rword = '賈政'
        elif word == '璉二爺':
            rword = '賈璉'
        else:
            rword = word
        d[rword] = d.get(rword, 0) + 1

    # Sort by count descending; equal counts are ordered by name
    ls = list(d.items())
    ls.sort(key=lambda x: (-x[1], x[0]))

    # Open with 'w' so that re-running does not append duplicate rows
    fo = open('result.csv', 'w', encoding='utf-8')
    for i in ls:
        if i[1] >= 40:
            print("{},{}".format(i[0], i[1]))
            fo.write("{},{}\n".format(i[0], i[1]))

    f1.close()
    f2.close()
    fo.close()

Method 2:

    import jieba

    f = "紅樓夢.txt"
    sf = "停用詞.txt"

    # Read the novel and segment it. In the actual exam the encoding may be
    # omitted to use the system default. read() returns the whole file as one
    # string; jieba.lcut() returns the segmentation result as a list.
    with open(f, 'r', encoding='utf-8') as novel:
        txt = jieba.lcut(novel.read())

    # Read the stop words. The with statement closes the file automatically,
    # so no explicit close() is needed.
    stop_words = []
    with open(sf, 'r', encoding='utf-8') as stop_file:
        # str.splitlines([keepends]) splits at line boundaries such as
        # '\r', '\r\n' and '\n'. keepends defaults to False, so the line
        # endings are not kept.
        for i in stop_file.read().splitlines():
            stop_words.append(i)

    # Remove the stop words
    txt0 = [x for x in txt if x not in stop_words]

    # Count word frequencies
    counts = {}
    for word in txt0:
        if len(word) == 1:  # skip punctuation and single characters
            continue
        elif word == '鳳姐兒' or word == '鳳丫頭':
            rword = '鳳姐'
        elif word == '二爺' or word == '寶二爺':
            rword = '寶玉'
        elif word == '顰兒' or word == '林妹妹' or word == '黛玉道':
            rword = '黛玉'
        elif word == '寶丫頭':
            rword = '寶釵'
        elif word == '老祖宗':
            rword = '賈母'
        elif word == '襲人道':
            rword = '襲人'
        elif word == '賈政道':
            rword = '賈政'
        elif word == '璉二爺':
            rword = '賈璉'
        else:
            rword = word
        counts[rword] = counts.get(rword, 0) + 1

    # Standard idiom: sort the dictionary items by value
    # (count descending; equal counts are ordered by name)
    li = list(counts.items())
    li.sort(key=lambda x: (-x[1], x[0]))

    # Output the entries that appear at least 40 times.
    # 'w' overwrites any earlier result instead of appending to it.
    with open('result.csv', 'w', encoding='gbk') as out:
        for i in li:
            key, value = i
            if value < 40:
                break  # the list is sorted, so every later count is smaller
            out.write(key + ',' + str(value) + '\n')  # value is an int
            print(key + ',' + str(value))
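The comment on splitlines in method 2 is easier to follow with a small experiment. The sketch below compares three ways of stripping line endings from a stop-word list. The sample string is a made-up stand-in for the contents of the stop-word file, chosen so that the last line has no trailing newline.

```python
# Hypothetical stop-word file contents; the last line has no trailing newline.
text = "的\n了\n說道"

# splitlines() drops the line endings and leaves the last line intact.
by_splitlines = text.splitlines()

# Slicing with [:-1] assumes every line ends with "\n", so it cuts a real
# character off the final entry here.
by_slicing = [line[:-1] for line in text.splitlines(keepends=True)]

# rstrip("\n") removes a newline only when one is present.
by_rstrip = [line.rstrip("\n") for line in text.splitlines(keepends=True)]

print(by_splitlines)
print(by_slicing)
print(by_rstrip)
```

With slicing, the last stop word would be stored in truncated form and would never match a segmented word, which is why splitlines() or rstrip("\n") is the safer choice for this step.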
