Python maximum matching word segmentation: forward and backward maximum matching examples

Published: 2025/3/20 python 18 豆豆
This article gives Python examples of word segmentation by forward maximum matching and by backward (reverse) maximum matching, shared here for reference.

Forward maximum matching

# -*- coding:utf-8 -*-
# Python 2 code: relies on print statements and the unicode type.

CODEC = 'utf-8'

def u(s, encoding):
    'convert a byte string in the given encoding to unicode'
    if isinstance(s, unicode):
        return s
    else:
        return unicode(s, encoding)

def fwd_mm_seg(wordDict, maxLen, text):
    'forward maximum matching segmentation'
    wordList = []
    segStr = text
    segStrLen = len(segStr)
    # Debug output: dump the dictionary entries.
    for word in wordDict:
        print 'word: ', word
    print "\n"
    while segStrLen > 0:
        # Take the longest candidate (at most maxLen characters) from the front.
        if segStrLen > maxLen:
            wordLen = maxLen
        else:
            wordLen = segStrLen
        subStr = segStr[0:wordLen]
        print "subStr: ", subStr
        # Drop the last character until the candidate is in the dictionary;
        # a single character is accepted as-is.
        while wordLen > 1:
            if subStr in wordDict:
                print "subStr1: %r" % subStr
                break
            else:
                print "subStr2: %r" % subStr
                wordLen = wordLen - 1
                subStr = subStr[0:wordLen]
        # print "subStr3: ", subStr
        wordList.append(subStr)
        # Continue with the part of the string that is still unsegmented.
        segStr = segStr[wordLen:]
        segStrLen = segStrLen - wordLen
    for wordstr in wordList:
        print "wordstr: ", wordstr
    return wordList

def main():
    # words.dic is read line by line, one UTF-8 encoded word per line.
    wordDict = {}
    with open('words.dic') as fp_dict:
        for eachWord in fp_dict:
            wordDict[u(eachWord.strip(), CODEC)] = 1
    segStr = u'你好世界hello world'
    print segStr
    wordList = fwd_mm_seg(wordDict, 10, segStr)
    print "==".join(wordList)

if __name__ == '__main__':
    main()
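
For readers on Python 3, the forward maximum matching idea can be sketched more compactly. This is a minimal illustration rather than a line-by-line port of the listing: the function name forward_max_match and the in-memory vocab set (standing in for words.dic) are assumptions made for the example. At each position it takes the longest candidate of up to max_len characters, shrinks it from the right until it is a dictionary word, and accepts a single character when nothing matches.

```python
def forward_max_match(vocab, max_len, text):
    """Segment text left to right, always taking the longest dictionary word."""
    words = []
    pos = 0
    while pos < len(text):
        # Start with the longest candidate allowed at this position.
        size = min(max_len, len(text) - pos)
        # Shrink from the right until the candidate is a known word;
        # a single character is always accepted as a fallback.
        while size > 1 and text[pos:pos + size] not in vocab:
            size -= 1
        words.append(text[pos:pos + size])
        pos += size
    return words


# Hypothetical dictionary standing in for words.dic.
vocab = {"你好", "世界", "hello", "world"}
print("==".join(forward_max_match(vocab, 10, "你好世界hello world")))
# prints: 你好==世界==hello== ==world
```

The space between "hello" and "world" is not in the dictionary, so it comes out as a one-character token of its own.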

Backward maximum matching

# -*- coding:utf-8 -*-
# Python 2 code: relies on print statements and the unicode type.

CODEC = 'utf-8'

def u(s, encoding):
    'convert a byte string in the given encoding to unicode'
    if isinstance(s, unicode):
        return s
    else:
        return unicode(s, encoding)

def bwd_mm_seg(wordDict, maxLen, text):
    'backward maximum matching segmentation'
    wordList = []
    segStr = text
    segStrLen = len(segStr)
    # Debug output: dump the dictionary entries.
    for word in wordDict:
        print 'word: ', word
    print "\n"
    while segStrLen > 0:
        # Take the longest candidate (at most maxLen characters) from the end.
        if segStrLen > maxLen:
            wordLen = maxLen
        else:
            wordLen = segStrLen
        subStr = segStr[-wordLen:]
        print "subStr: ", subStr
        # Drop the first character until the candidate is in the dictionary;
        # a single character is accepted as-is.
        while wordLen > 1:
            if subStr in wordDict:
                print "subStr1: %r" % subStr
                break
            else:
                print "subStr2: %r" % subStr
                wordLen = wordLen - 1
                subStr = subStr[-wordLen:]
        # print "subStr3: ", subStr
        wordList.append(subStr)
        # Continue with the part of the string that is still unsegmented.
        segStr = segStr[0:-wordLen]
        segStrLen = segStrLen - wordLen
    # Words were collected from right to left, so restore reading order.
    wordList.reverse()
    for wordstr in wordList:
        print "wordstr: ", wordstr
    return wordList

def main():
    # words.dic is read line by line, one UTF-8 encoded word per line.
    wordDict = {}
    with open('words.dic') as fp_dict:
        for eachWord in fp_dict:
            wordDict[u(eachWord.strip(), CODEC)] = 1
    segStr = u'你好世界hello world'
    print segStr
    wordList = bwd_mm_seg(wordDict, 10, segStr)
    print "==".join(wordList)

if __name__ == '__main__':
    main()
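
The backward variant can be sketched in Python 3 in the same spirit. Again this is only an illustration under stated assumptions: the function name backward_max_match, the small vocab set, and the sample sentence are hypothetical and do not come from words.dic. The scan starts at the end of the string, shrinks each candidate from the left, and reverses the collected words at the end.

```python
def backward_max_match(vocab, max_len, text):
    """Segment text right to left, always taking the longest dictionary word."""
    words = []
    end = len(text)
    while end > 0:
        # Start with the longest candidate that ends at the current position.
        size = min(max_len, end)
        # Shrink from the left until the candidate is a known word;
        # a single character is always accepted as a fallback.
        while size > 1 and text[end - size:end] not in vocab:
            size -= 1
        words.append(text[end - size:end])
        end -= size
    # Words were collected from right to left, so restore reading order.
    words.reverse()
    return words


# Hypothetical dictionary containing two overlapping words, 研究生 and 生命.
vocab = {"研究", "研究生", "生命", "起源"}
print("==".join(backward_max_match(vocab, 3, "研究生命的起源")))
# prints: 研究==生命==的==起源
```

With this dictionary, a left-to-right scan of the same sentence would match 研究生 first and leave 命 as a stray single character, which is why the two directions can produce different segmentations.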
