libsvm data preprocessing: a modular program

Published: 2025/3/20

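For the experiment framework diagram, see the earlier post "libsvm文本分類:二分類（二）實驗框架圖" (libsvm text classification: binary classification, part 2, experiment framework diagram). The main module code is shown below; the full code is not being published for now.

# -*- coding: cp936 -*-
#coding gb2312
from SVM import FoldersCreation
import os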
##############################################################################################
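# Parameter settings
N=100 # N: half of total corpus size
vfold=5 # vfold: number of cross-validation rounds
featureDimension=2000 # featureDimension: feature dimension of the VSM (vector space model)
toCalInfoGain=0 # whether to compute the information gain of the words in the bag-of-words model; 1 means do not compute
count_done_research_times=0 # number of experiments already run
# N and count_done_research_times are the arguments of CorpusPartition.moveAccordingPartition
# featureDimension, toCalInfoGain and 2*N/vfold are the arguments of FeatureSelectionModel.featureSelectionIG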
#~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~#

############## Create folders ########################################################################
os.mkdir(r'D:\TextCategorization')
FoldersCreation.CreateAssist()
print 'Folder creation module finished'
print '***************************************************************************'
#~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~#

################ Process the corpus: split it into a training set and a test set ###############################
from SVM import CorpusPartition
CorpusPartition.MoveCorpus(N)

CorpusPartition.moveAccordingPartition(N,count_done_research_times)
print 'Corpus partition module finished'
print '*******************************************************************'
#~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~#

######################### Word segmentation of the corpus ##########################################################
from SVM import DataManager
from ctypes import *
import os
import cPickle as p
import re
roots=[r'D:\TextCategorization\training',r'D:\TextCategorization\testing']
rootfinals=[r'D:\TextCategorization\segmented',r'D:\TextCategorization\tsegmented']
#root=r'D:\TextCategorization\training'
#rootfinal=r'D:\TextCategorization\segmented'

for i in range(0,2):
    dm=DataManager.DataManager(roots[i])
    subdir=dm.GetSubDir()
    filepathstotalsrc=[]
    # collect the relative path (subdir + file name) of every document under roots[i]
    for sub in subdir:
        dm.SetFilePathsFromsubDir(roots[i]+os.sep+sub)
        filepaths=dm.GetFilePaths()
        filepathsassist=[sub+os.sep+path for path in filepaths]
        filepathstotalsrc=filepathstotalsrc+filepathsassist
    for path in filepathstotalsrc:
        myfile=file(roots[i]+os.sep+path)
        s=myfile.read()
        myfile.close()
        # segment the document with ICTCLAS through ctypes
        dll=cdll.LoadLibrary("ICTCLAS30.dll")
        dll.ICTCLAS_Init(c_char_p("."))
        bSuccess = dll.ICTCLAS_ParagraphProcess(c_char_p(s),0)
        segmented=c_char_p(bSuccess).value
        # replace each run of whitespace with '|' so tokens are '|'-delimited
        segmentedtmp=re.sub(r"\s+",'|',segmented,0)
        # drop full-width spaces (bytes A1 A1 in GB2312/cp936)
        segmentedfinal=re.sub('\xa1\xa1','',segmentedtmp)
        fid=file(rootfinals[i]+os.sep+path,'w')
        fid.write(segmentedfinal)
        fid.close()
        dll.ICTCLAS_Exit()
        #print 'finalfinish congratulations!'
print 'Corpus word segmentation module finished'
print '**********************************************************************'
#~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~#

################## Build the bag-of-words model ######################################################################
from SVM import BagOfWordsConstruction
BagOfWordsConstruction.BagOfWordsConstruction(r'D:\TextCategorization\segmented')
print 'Bag-of-words construction module finished'
print '***********************************************************************************'
#~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~#

####################### Feature word selection ##################################################################
from SVM import FeatureSelectionModel
featurewords=FeatureSelectionModel.featureSelectionIG(featureDimension,toCalInfoGain,2*N/vfold)#feature
import cPickle as mypickle
# save the selected feature words for the later steps
fid=file(r'D:\TextCategorization\VITData\keywords.dat','w')
mypickle.dump(featurewords,fid)
fid.close()
print 'Feature word selection module finished'
print '*******************************************************************************************'
#~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~#

####################### Document vector model construction ##############################################################
from SVM import VSMformation
root1=r'D:\TextCategorization\segmented'
root2=r'D:\TextCategorization\tsegmented'
print 'begin.....'
VSMformation.LibSVMFormat(r'D:\TextCategorization\data\train.libsvm',root1)
print 'Training corpus converted'
VSMformation.LibSVMFormat(r'D:\TextCategorization\data\test.libsvm',root2)
print 'Test corpus converted'
print 'Document vector model construction module finished'
print 'Batch processing finished, congratulations!'
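
Since the VSMformation module itself is not published, the following is only a rough sketch of what the last step might do for a single document: take the '|'-delimited segmenter output, count how often each selected feature word occurs, and emit one line in the LIBSVM sparse format (`label index:value ...`, with 1-based indices in ascending order). The function names `normalize_segmented`, `split_segmented` and `to_libsvm_line` are hypothetical, and raw term frequency is an assumption; the original may use a different weighting. The sketch avoids print statements so it runs on both Python 2.7 and Python 3.

```python
# Hypothetical sketch; NOT the code of the unpublished VSMformation module.
import re
from collections import Counter


def normalize_segmented(raw):
    """Replace each run of whitespace with '|', as the main script does."""
    return re.sub(r'\s+', '|', raw)


def split_segmented(text):
    """Split '|'-delimited segmenter output into non-empty tokens."""
    return [tok for tok in text.split('|') if tok]


def to_libsvm_line(label, tokens, feature_words):
    """Build one 'label index:value ...' line from raw term frequencies.

    feature_words is an ordered list; a word's position + 1 is its feature
    index, because LIBSVM indices start at 1 and must be ascending.
    Weighting by raw term frequency is an assumption of this sketch.
    """
    index_of = dict((w, i + 1) for i, w in enumerate(feature_words))
    counts = Counter(index_of[t] for t in tokens if t in index_of)
    pairs = ['%d:%d' % (idx, counts[idx]) for idx in sorted(counts)]
    return ' '.join([str(label)] + pairs)
```

For example, with the feature words `['kernel', 'svm', 'margin']`, the text `'svm  text\nsvm kernel '` is normalized to `'svm|text|svm|kernel|'`, and `to_libsvm_line(1, tokens, feature_words)` gives `'1 1:1 2:2'`. Features that do not occur are simply left out of the line, which is what keeps the format sparse.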
