Text Mining in Python (Condensed Version)

Published: 2023/12/1 · Programming Q&A · 豆豆
生活随笔 has collected and organized this article, which mainly introduces "Text Mining in Python (Condensed Version)". The editor thought it was quite good and is sharing it here as a reference.
import xlrd
import jieba
import sys
import importlib
import os          # built-in module for file/directory operations (e.g. os.listdir)
import pickle      # used for persisting Python objects to disk
import random
import numpy as np
import matplotlib.pyplot as plt
from mpl_toolkits.mplot3d import Axes3D
from pylab import mpl
from sklearn.naive_bayes import MultinomialNB   # multinomial Naive Bayes classifier
from sklearn import svm
from sklearn import metrics
from sklearn.utils import Bunch                 # not used in this condensed version
from sklearn.feature_extraction.text import TfidfVectorizer

importlib.reload(sys)   # leftover from the Python 2 default-encoding workaround

# Convert the texts and their class labels into vector form
trainContentdatasave = []   # would hold the segmented words of the training data
testContentdatasave = []    # would hold the segmented words of the test data

trainContentdata = []
testContentdata = []
trainlabeldata = []
testlabeldata = []

# Load the training and test text descriptions
def importTrainContentdata():
    file = '20180716_train.xls'
    wb = xlrd.open_workbook(file)
    ws = wb.sheet_by_name("Sheet1")
    for r in range(ws.nrows):
        trainContentdata.append(ws.cell(r, 0).value)

def importTestContentdata():
    file = '20180716_test.xls'
    wb = xlrd.open_workbook(file)
    ws = wb.sheet_by_name("Sheet1")
    for r in range(ws.nrows):
        testContentdata.append(ws.cell(r, 0).value)

# Load the training and test class labels
def importTrainlabeldata():
    file = '20180716_train_label.xls'
    wb = xlrd.open_workbook(file)
    ws = wb.sheet_by_name("Sheet1")
    for r in range(ws.nrows):
        trainlabeldata.append(ws.cell(r, 0).value)

def importTestlabeldata():
    file = '20180716_test_label.xls'
    wb = xlrd.open_workbook(file)
    ws = wb.sheet_by_name("Sheet1")
    for r in range(ws.nrows):
        testlabeldata.append(ws.cell(r, 0).value)

if __name__ == "__main__":
    importTrainContentdata()
    importTestContentdata()
    importTrainlabeldata()
    importTestlabeldata()

    '''Naive Bayes
    clf = MultinomialNB(alpha=0.052).fit(train_set.tdm, train_set.label)
    #clf = svm.SVC(C=0.7, kernel='poly', gamma=10, decision_function_shape='ovr')
    clf.fit(train_set.tdm, train_set.label)
    predicted = clf.predict(test_set.tdm)

    Logistic regression
    tv = TfidfVectorizer()
    train_data = tv.fit_transform(X_train)
    test_data = tv.transform(X_test)
    lr = LogisticRegression(C=3)
    lr.fit(train_set.tdm, train_set.label)
    predicted = lr.predict(test_set.tdm)
    print(lr.score(test_set.tdm, test_set.label))
    #print(test_set.tdm)

    #SVM
    clf = SVC(C=1500)
    clf.fit(train_set.tdm, train_set.label)
    predicted = clf.predict(test_set.tdm)
    print(clf.score(test_set.tdm, test_set.label))
    '''

    # TF-IDF features from the raw texts
    tv = TfidfVectorizer()
    train_data = tv.fit_transform(trainContentdata)
    test_data = tv.transform(testContentdata)

    # SVM classifier with a large penalty parameter C
    clf = svm.SVC(C=1500)
    clf.fit(train_data, trainlabeldata)
    print(clf.score(test_data, testlabeldata))

    # Collect the predicted and true labels as integers
    predicted = clf.predict(test_data)
    a = []
    b = []
    for i in range(len(predicted)):
        b.append(int(float(predicted[i])))
        a.append(int(float(testlabeldata[i])))
    '''
    f = open('F:/goverment/ArticleMining/predict.txt', 'w')
    for i in range(len(predicted)):
        f.write(str(b[i]))
        f.write('\n')
    f.write("finished writing")
    f.close()

    #for i in range(len(predicted)):
        #print(b[i])
    '''
    #metrics_result(a, b)
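The condensed script above feeds the raw Chinese texts straight into TfidfVectorizer, even though jieba is imported and the *Contentdatasave lists hint that word segmentation was part of the full version. TfidfVectorizer's default tokenizer is built for space-separated languages, so unsegmented Chinese tends to collapse into a few very long tokens. Below is a minimal sketch of how the texts could be segmented with jieba before vectorization; the helper name segment_texts and the space-joined output format are assumptions, not part of the original article.

# Hypothetical segmentation step, assuming jieba word segmentation before TF-IDF
import jieba
from sklearn.feature_extraction.text import TfidfVectorizer

def segment_texts(texts):
    # join jieba's tokens with spaces so TfidfVectorizer's default
    # word-boundary tokenization can see the individual words
    return [" ".join(jieba.lcut(str(t))) for t in texts]

# Example usage with the lists defined above (names from the original script):
# train_texts = segment_texts(trainContentdata)
# test_texts = segment_texts(testContentdata)
# tv = TfidfVectorizer(token_pattern=r"(?u)\b\w+\b")  # keep single-character words too
# train_data = tv.fit_transform(train_texts)
# test_data = tv.transform(test_texts)

Similarly, sklearn.metrics is imported but the final metrics_result(a, b) call is commented out and its body is not shown. Here is a hedged sketch of how the true labels in a and the predictions in b could be scored with sklearn.metrics; the exact metrics the original helper reported are unknown.

from sklearn import metrics

def metrics_report(y_true, y_pred):
    # overall accuracy plus per-class precision, recall and F1
    print("accuracy:", metrics.accuracy_score(y_true, y_pred))
    print(metrics.classification_report(y_true, y_pred))

# Hypothetical call, mirroring the commented-out metrics_result(a, b):
# metrics_report(a, b)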


Reposted from: https://www.cnblogs.com/caiyishuai/p/9354035.html

Summary

The above is the full content of "Text Mining in Python (Condensed Version)" as collected and organized by 生活随笔; we hope the article helps you solve the problems you ran into.
