

Python high-frequency words: analyzing tens of thousands of Weibo posts

Published: 2025/3/12, python, 豆豆


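I saw other people running statistics on popular film and TV comments and thought it looked fun, so I decided to try the same thing with Weibo posts.

Here is the result:

[Image: Screenshot_2018-05-21-11-00-42-879_com.master.wei.png]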

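Approach

Scrape the Weibo data of interest and write it to the database

Segment the text into words and count how often each word occurs

Filter out meaningless noise words

Store the results in the database

Write an API, then display the results on Android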
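
The counting and filtering steps above can be tried out without a database or a tokenizer. The following is a minimal sketch, assuming the text has already been split into tokens (the article uses jieba for that). The function name `top_words` and the sample data are made up for illustration.

```python
from collections import Counter


def top_words(tokens, stopwords, limit=10):
    """Return the `limit` most frequent tokens as (word, count) pairs.

    Tokens that are stop words or shorter than two characters are skipped,
    mirroring the filter used in the article's script.
    """
    counts = Counter(
        token for token in tokens
        if len(token) >= 2 and token not in stopwords
    )
    return counts.most_common(limit)


# Hypothetical pre-segmented input; real input would come from jieba.cut().
sample_tokens = ["python", "的", "微博", "python", "分析", "微博", "python", "我们"]
sample_stopwords = {"我们"}

print(top_words(sample_tokens, sample_stopwords, limit=2))
```

Filtering before counting gives the same ranking as counting everything and skipping unwanted words afterwards, which is what the script in this article does.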

Code

Database connection: masterWeiBo.Utils.Sql

import pymysql
import pymysql.cursors
import threading


class Mydb(object):
    tableName = 'master'

    def __init__(self):
        self.lock = threading.Lock()
        # DictCursor returns each row as a dict keyed by column name
        self.client = pymysql.connect(host='localhost', charset='utf8', port=3306, user='root', passwd='ck123', db='weibo', cursorclass=pymysql.cursors.DictCursor)
        self.client.autocommit(True)
        self.cursor = self.client.cursor()
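
The class creates a `threading.Lock`, but the article never shows it being used. One way such a lock might serialize access to a shared cursor is sketched below. This is an assumption about the intent, not the author's code, and it uses the standard-library `sqlite3` module in place of pymysql so that it runs without a MySQL server. `LockedDb` and its `execute` method are hypothetical names.

```python
import sqlite3
import threading


class LockedDb:
    """Hypothetical wrapper: one shared connection guarded by a lock."""

    def __init__(self):
        self.lock = threading.Lock()
        # check_same_thread=False lets several threads share the connection;
        # the lock below is what keeps their statements from interleaving.
        self.client = sqlite3.connect(":memory:", check_same_thread=False)
        self.cursor = self.client.cursor()

    def execute(self, sql, params=()):
        # Hold the lock for the whole execute + fetch round trip.
        with self.lock:
            self.cursor.execute(sql, params)
            return self.cursor.fetchall()


db = LockedDb()
db.execute("CREATE TABLE master (content TEXT)")


def worker(n):
    db.execute("INSERT INTO master (content) VALUES (?)", (f"post {n}",))


threads = [threading.Thread(target=worker, args=(n,)) for n in range(8)]
for t in threads:
    t.start()
for t in threads:
    t.join()

row_count = db.execute("SELECT COUNT(*) FROM master")[0][0]
print(row_count)
```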

Main script

import jieba

from masterWeiBo.Utils.Sql import Mydb as db


# Load the stop words, one word per line in the file
def stopwordslist(filepath):
    with open(filepath, 'r', encoding='utf-8') as f:
        # A set keeps the later "key in stopwords" checks fast
        return {line.strip() for line in f}

cursor = db().cursor

# Create the word table if it does not exist yet
cursor.execute("""CREATE TABLE IF NOT EXISTS `weibo`.`masterWeiBo_category` (
    `id` INT NOT NULL AUTO_INCREMENT,
    `count` INT NOT NULL DEFAULT 0,
    `category` VARCHAR(100) NOT NULL,
    `wordsTop10` VARCHAR(1000) NULL,
    PRIMARY KEY (`id`));""")

# Empty the word table
cursor.execute("DELETE FROM weibo.masterWeiBo_category")

# Get each category (the `come` column) and its post count
cursor.execute("SELECT count(id) as countd, come FROM weibo.masterWeiBo_master GROUP BY come")
results = cursor.fetchall()
print(results)

dicts = []

# Load the stop words
stopwords = stopwordslist("/root/PYServer/myFirstPYServer/words.txt")

for result in results:
    each = {}
    each['count'] = result['countd']
    each['come'] = result['come']
    print(result['countd'])
    print(result['come'])
    # Pass the category as a query parameter instead of concatenating it into the SQL
    cursor.execute("SELECT content from weibo.masterWeiBo_master where come = %s", (result['come'],))
    contents = cursor.fetchall()
    # Join all posts in this category into one string
    articles = ",".join(row['content'] for row in contents)
    # Segment with jieba
    cuts = jieba.cut(articles)
    words = {}
    # Count word frequencies
    for cut in cuts:
        words[cut] = words.get(cut, 0) + 1
    # Sort by frequency, descending
    sortedWords = sorted(words.items(), key=lambda d: d[1], reverse=True)
    top = []
    # Take the top 10 words
    for key, value in sortedWords:
        # Skip stop words and single-character tokens
        if key in stopwords or len(key) < 2:
            continue
        top.append(key + "," + str(value))
        if len(top) == 10:
            break
    # Stored format: word,count;word,count;...
    each['wordsTop10'] = ";".join(top)
    dicts.append(each)

# Write the results to the database
for value in dicts:
    sql = "INSERT INTO weibo.masterWeiBo_category (count,category,wordsTop10) values (%s, %s, %s)"
    params = (value['count'], value['come'], value['wordsTop10'])
    print(sql, params)
    cursor.execute(sql, params)

cursor.close()
print(dicts)
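
Passing values as query parameters, rather than concatenating them into the SQL string, keeps a quote character in a category name or a word from breaking the statement. Below is a self-contained sketch of the idea. It uses the standard-library `sqlite3` module as a stand-in for pymysql so that it runs without a server, and the table and sample row are invented for illustration. The main practical difference is the placeholder: `?` in sqlite3, `%s` in pymysql.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
cursor = conn.cursor()
cursor.execute(
    "CREATE TABLE category (total INTEGER, category TEXT, wordsTop10 TEXT)"
)

# A value containing a single quote would break a concatenated SQL string.
row = {"count": 3, "come": "it's a test", "wordsTop10": "python,3;微博,2"}

# The driver handles quoting; the SQL text itself never contains the values.
cursor.execute(
    "INSERT INTO category (total, category, wordsTop10) VALUES (?, ?, ?)",
    (row["count"], row["come"], row["wordsTop10"]),
)

cursor.execute(
    "SELECT total, category FROM category WHERE category = ?",
    (row["come"],),
)
stored = cursor.fetchone()
print(stored)
conn.close()
```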

And that's it.
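
The `wordsTop10` column stores the result as `word,count` pairs separated by semicolons, so whatever reads it (the API or the Android client) has to split it back apart. The article does not show that side, so the following is only a sketch of how the string could be decoded. `parse_top_words` is a hypothetical helper name.

```python
def parse_top_words(packed):
    """Split 'word,count;word,count' into a list of (word, count) pairs."""
    pairs = []
    for item in packed.split(";"):
        if not item:
            continue  # tolerate an empty string or a trailing ';'
        # Split on the last comma so a word containing ',' stays intact.
        word, _, count = item.rpartition(",")
        pairs.append((word, int(count)))
    return pairs


print(parse_top_words("python,3;微博,2"))
```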
