词云_jieba分词 (Word Cloud with jieba Segmentation)

This post presents the code for generating a word cloud from Chinese text; the full script is shown below:

# -*- coding: utf-8 -*-
from wordcloud import WordCloud
import matplotlib.pyplot as plt
import jieba
import re

combine_dict = {}
stopwords = []  # stopword list used for filtering

def stopwordslist(stopWord):
    stopwords = [line.strip() for line in open(stopWord, encoding='UTF-8').readlines()]
    return stopwords

# Synonym dictionary, tab-separated: the first word on each line is the
# canonical form; every word after it is mapped onto that form.
def synonymwordslist(synonymWord):
    for line in open(synonymWord, "r", encoding='UTF-8'):
        seperate_word = line.strip().split("\t")
        num = len(seperate_word)
        for i in range(1, num):
            combine_dict[seperate_word[i]] = seperate_word[0]

# refer https://blog.csdn.net/jlulxg/article/details/84650683
# https://www.cnblogs.com/crawer-1/p/8341762.html
# http://lzw.me/pages/unicode/
def cleanChinese():
    s = r"\n\r\t@#$%^&*這樣一本書大賣,hello,,12。!《。有點意外,據說已經印了四五十萬,排行榜僅次于《希拉里自傳》。大概是大眾拋棄了一位表演過火的“文化大師”后,。\n\s\r\t"
    # t = re.findall('[\u3002\uff1b\uff0c\uff1a\u201c\u201d\uff08\uff09\u3001\uff1f\u300a\u300b\u4e00-\u9fa5]', s)
    t = re.findall('[\u4e00-\u9fa5]', s)  # keep only the Chinese characters
    print(''.join(t))

# Read the text file, segment it with stopword filtering, and draw the word cloud.
def wordClould(inputText, splitText, outPic):
    fRead = open(inputText, 'r', encoding='UTF-8')
    fWrite = open(splitText, 'w', encoding='UTF-8')

    def replace_all_blank(value):
        """Strip ASCII letters, digits, punctuation, whitespace, and control
        characters from value, leaving essentially only CJK text."""
        result = re.sub('[a-zA-Z0-9’!"#$%&\'()()。;,:“”()、?《》*+,-./:;<=>?@,。?★、…【】《》?“”‘’![\\]^_`{|}~\s]+', "", value)
        result = re.sub('[\001\002\003\004\005\006\007\x08\x09\x0a\x0b\x0c\x0d\x0e\x0f\x10\x11\x12\x13\x14\x15\x16\x17\x18\x19\x1a]+', '', result)
        return result

    def seg_depart(sentence):
        sentence_depart = jieba.cut(sentence)
        # stopwords = stopwordslist('../input/stopWords.txt')
        outstr = ''
        for word in sentence_depart:
            if word not in stopwords:
                if word in combine_dict:  # synonym replacement
                    word = combine_dict[word]
                outstr += replace_all_blank(word)
                outstr += " "
        return outstr

    # Concatenate the segmented lines into one complete text.
    cut_text = ''
    for line in fRead:
        cut_text = cut_text + seg_depart(line)
    fWrite.write(cut_text)
    fRead.close()
    fWrite.close()

    wordcloud = WordCloud(
        # Set a CJK font, otherwise Chinese characters render as boxes.
        # This is an ordinary Windows font path and can be replaced with another font.
        font_path="C:/Windows/Fonts/彩虹粗仿宋.TTF",
        background_color="white",
        width=2000,
        height=1760,
        max_words=2000).generate(cut_text)
    plt.imshow(wordcloud, interpolation="bilinear")
    plt.axis("off")
    ## plt.show()
    wordcloud.to_file(outPic)

if __name__ == '__main__':
    ### cleanChinese()
    jieba.load_userdict('../input/nlp/userDic.txt')
    synonymwordslist(r'..\input\nlp\synonymWord.txt')
    stopwords = stopwordslist(r'../input/nlp/stopWords.txt')
    wordClould(r'D:\bidingDemo.txt', r'D:\splitSingle.txt', r'D:\bidingDemo.png')
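
To make the segmentation step easier to follow, here is a minimal standalone sketch of the seg_depart logic with an inline stopword list and synonym map. The sample sentence, stopwords, and synonym pairs below are illustrative assumptions, not taken from the original input files:

# -*- coding: utf-8 -*-
import jieba

# Assumed sample data: the full script loads these from
# stopWords.txt and synonymWord.txt instead.
stopwords = ['的', '了', '是']
combine_dict = {'电脑': '计算机'}  # variant -> canonical form

def seg_depart(sentence):
    outstr = ''
    for word in jieba.cut(sentence):             # jieba splits the sentence into words
        if word not in stopwords:                # drop stopwords
            word = combine_dict.get(word, word)  # replace synonyms with the canonical term
            outstr += word + ' '
    return outstr

# Prints roughly: 我 计算机 很快
# (the exact segmentation depends on jieba's dictionary)
print(seg_depart('我的电脑是很快的'))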

The required input files and a screenshot of the result accompanied the original post (not reproduced here).
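
Based on how the script reads them, the three input files can be produced with a small helper like the one below. Every entry here is an assumption for illustration; replace them with real data before running the full script:

# -*- coding: utf-8 -*-
# Write minimal sample versions of the three input files.
with open('stopWords.txt', 'w', encoding='UTF-8') as f:
    f.write('的\n了\n是\n')            # one stopword per line

with open('synonymWord.txt', 'w', encoding='UTF-8') as f:
    # Tab-separated: canonical word first, variants after it.
    f.write('计算机\t电脑\t電腦\n')

with open('userDic.txt', 'w', encoding='UTF-8') as f:
    # jieba user-dictionary format: word, optional frequency,
    # optional part-of-speech tag, separated by spaces.
    f.write('云计算 5 n\n')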

Summary

The above is the complete content of 词云_jieba分词 (Word Cloud with jieba Segmentation); hopefully it helps you solve the problem you ran into.
