

libsvm data preprocessing: a modular program

Published: 2025/3/20
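For the experiment framework diagram, see "libsvm text classification: binary classification (2): experiment framework diagram". The main module code is below; the full code is not being published for now.

Code:

# -*- coding: cp936 -*-
# coding gb2312
from SVM import FoldersCreation
import os
##############################################################################################
# Parameter settings
N = 100  # N: half of total corpus size
vfold = 5  # vfold: number of cross-validation rounds
featureDimension = 2000  # featureDimension: feature dimension of the VSM (vector space model)
toCalInfoGain = 0  # whether to compute the information gain of the words in the bag-of-words model; 1 means do not compute
count_done_research_times = 0  # number of experiments already run
# N and count_done_research_times are the arguments of CorpusPartition.moveAccordingPartition
# featureDimension, toCalInfoGain and 2*N/vfold are the arguments of FeatureSelectionModel.featureSelectionIG
#~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~#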

############## Create folders ########################################################################
os.mkdir(r'D:\TextCategorization')  # raises OSError if the directory already exists
FoldersCreation.CreateAssist()
print 'Folder creation module finished'
print '***************************************************************************'
#~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~#

################ Process the corpus: partition it into a test set and a training set ###############################
from SVM import CorpusPartition
CorpusPartition.MoveCorpus(N)

CorpusPartition.moveAccordingPartition(N, count_done_research_times)
print 'Corpus partition module finished'
print '*******************************************************************'
#~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~#


######################### Word segmentation of the corpus ##########################################################
from SVM import DataManager
from ctypes import *
import os
import re
roots = [r'D:\TextCategorization\training', r'D:\TextCategorization\testing']
rootfinals = [r'D:\TextCategorization\segmented', r'D:\TextCategorization\tsegmented']
#root=r'D:\TextCategorization\training'
#rootfinal=r'D:\TextCategorization\segmented'

# Load and initialise ICTCLAS once, rather than once per file
dll = cdll.LoadLibrary("ICTCLAS30.dll")
dll.ICTCLAS_Init(c_char_p("."))

for i in range(0, 2):
    dm = DataManager.DataManager(roots[i])
    subdir = dm.GetSubDir()
    filepathstotalsrc = []
    # Collect the relative path of every file under each subdirectory
    for sub in subdir:
        dm.SetFilePathsFromsubDir(roots[i] + os.sep + sub)
        filepaths = dm.GetFilePaths()
        filepathsassist = [sub + os.sep + path for path in filepaths]
        filepathstotalsrc = filepathstotalsrc + filepathsassist
    for path in filepathstotalsrc:
        myfile = open(roots[i] + os.sep + path)
        s = myfile.read()
        myfile.close()
        # The return value is treated as a pointer to the segmented text
        bSuccess = dll.ICTCLAS_ParagraphProcess(c_char_p(s), 0)
        segmented = c_char_p(bSuccess).value
        # Replace each run of whitespace with '|' as the token delimiter
        segmentedtmp = re.sub(r"\s+", '|', segmented, 0)
        # Remove GB2312 full-width spaces (bytes A1 A1)
        segmentedfinal = re.sub('\xa1\xa1', '', segmentedtmp)
        fid = open(rootfinals[i] + os.sep + path, 'w')
        fid.write(segmentedfinal)
        fid.close()

dll.ICTCLAS_Exit()

#print 'finalfinish congratulations!'
print 'Corpus word segmentation module finished'
print '**********************************************************************'
#~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~#

################## Build the bag-of-words model ######################################################################
from SVM import BagOfWordsConstruction
BagOfWordsConstruction.BagOfWordsConstruction(r'D:\TextCategorization\segmented')
print 'Bag-of-words model construction module finished'
print '***********************************************************************************'
#~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~#

####################### Feature word selection ##################################################################
from SVM import FeatureSelectionModel
featurewords = FeatureSelectionModel.featureSelectionIG(featureDimension, toCalInfoGain, 2*N/vfold)  #feature
import cPickle as mypickle
# Save the selected feature words for the later modules
fid = open(r'D:\TextCategorization\VITData\keywords.dat', 'w')
mypickle.dump(featurewords, fid)
fid.close()
print 'Feature word selection module finished'
print '*******************************************************************************************'
#~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~#

####################### Document vector model construction ##############################################################
from SVM import VSMformation
root1 = r'D:\TextCategorization\segmented'
root2 = r'D:\TextCategorization\tsegmented'
print 'begin.....'
VSMformation.LibSVMFormat(r'D:\TextCategorization\data\train.libsvm', root1)
print 'Training corpus converted'
VSMformation.LibSVMFormat(r'D:\TextCategorization\data\test.libsvm', root2)
print 'Test corpus converted'
print 'Document vector model construction module finished'
print 'Batch processing finished, congratulations!'
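
The parameter comments in the listing pass 2*N/vfold to the feature selection step, which with N=100 and vfold=5 is 40 documents. CorpusPartition is not published, so the sketch below is only a guess at the arithmetic behind it. It assumes the corpus holds N documents per class and that count_done_research_times picks which fold becomes the test set. The helper fold_indices is hypothetical, and the sketch is written for Python 3, unlike the Python 2 listing.

```python
def fold_indices(n_per_class, vfold, fold_id):
    """Split indices 0..n_per_class-1 of one class into (train, test).

    Hypothetical helper: assumes equal-sized, contiguous folds.
    """
    if n_per_class % vfold != 0:
        raise ValueError("n_per_class must be divisible by vfold")
    if not 0 <= fold_id < vfold:
        raise ValueError("fold_id must be in range(vfold)")
    fold_size = n_per_class // vfold
    start = fold_id * fold_size
    stop = start + fold_size
    test = list(range(start, stop))
    train = [i for i in range(n_per_class) if i < start or i >= stop]
    return train, test


N = 100      # documents per class (assumed meaning of "half of total corpus size")
vfold = 5    # number of folds
train, test = fold_indices(N, vfold, 0)

# 20 test documents per class, so 2 * 20 = 40 = 2*N/vfold over both classes
print(len(train), len(test))
```

Under these assumptions each round trains on 160 documents and tests on 40, and running fold_id from 0 to vfold-1 uses every document as test data exactly once.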

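
The name featureSelectionIG suggests that feature words are ranked by information gain, but FeatureSelectionModel is not published. The following is a minimal sketch of textbook information gain for document-level term presence, not the author's implementation. The functions entropy, information_gain and select_features, and the toy data, are all made up for illustration. It is written for Python 3.

```python
import math


def entropy(counts):
    """Shannon entropy (base 2) of a list of class counts."""
    total = sum(counts)
    if total == 0:
        return 0.0
    h = 0.0
    for c in counts:
        if c > 0:
            p = c / total
            h -= p * math.log2(p)
    return h


def information_gain(docs, labels, term):
    """IG(t) = H(C) - P(t) H(C|t) - P(not t) H(C|not t).

    docs is a list of token sets, labels a parallel list of class ids.
    """
    classes = sorted(set(labels))
    all_counts = [labels.count(c) for c in classes]
    with_term = [
        sum(1 for d, y in zip(docs, labels) if y == c and term in d)
        for c in classes
    ]
    without_term = [a - w for a, w in zip(all_counts, with_term)]
    n = len(docs)
    n_with = sum(with_term)
    n_without = n - n_with
    return (entropy(all_counts)
            - (n_with / n) * entropy(with_term)
            - (n_without / n) * entropy(without_term))


def select_features(docs, labels, k):
    """Return the k terms with the highest information gain (ties broken by name)."""
    vocab = set().union(*docs)
    ranked = sorted(vocab, key=lambda t: (-information_gain(docs, labels, t), t))
    return ranked[:k]


# Toy corpus: two documents per class
docs = [{"goal", "match"}, {"goal", "team"}, {"stock", "market"}, {"stock", "team"}]
labels = [1, 1, 0, 0]
print(select_features(docs, labels, 2))
```

In the toy corpus, "goal" and "stock" each separate the two classes perfectly, while "team" appears once in each class and carries no information, so it ranks last.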
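
The last module writes train.libsvm and test.libsvm. VSMformation.LibSVMFormat is not published, so the sketch below only illustrates the LIBSVM sparse text format itself: one line per document, a label followed by index:value pairs with ascending, 1-based indices. Using raw term counts as values is an assumption made here, the original may weight terms differently. The helper to_libsvm_line is hypothetical, and the input mimics the '|'-delimited output of the segmentation step. It is written for Python 3.

```python
def to_libsvm_line(label, tokens, feature_words):
    """Format one document as a LIBSVM line, using term counts as values.

    Tokens that are not in feature_words are ignored.
    """
    # Feature indices are 1-based and follow the order of feature_words
    index_of = {w: i + 1 for i, w in enumerate(feature_words)}
    counts = {}
    for tok in tokens:
        idx = index_of.get(tok)
        if idx is not None:
            counts[idx] = counts.get(idx, 0) + 1
    # LIBSVM expects indices in ascending order; zero entries are omitted
    pairs = " ".join("%d:%d" % (idx, counts[idx]) for idx in sorted(counts))
    return ("%d %s" % (label, pairs)).rstrip()


segmented = "goal|team|goal|weather"          # '|' is the delimiter written by the segmentation step
tokens = [t for t in segmented.split("|") if t]
line = to_libsvm_line(1, tokens, ["goal", "stock", "team"])
print(line)  # 1 1:2 3:1
```

Omitting zero-valued features keeps the files small, which matters when the feature dimension is 2000 and most documents contain only a fraction of the feature words.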
