
NLP: BoW & NLTK, Two Common Natural Language Processing Techniques (Bag-of-Words and the NLTK Library)

Published: 2025/3/21 44 豆豆


Output

[[0 1 1 0 1 0 0 0 1 1 1 1 1 1 1 1 1 0 0 1 0 0 1 0 0]
 [1 0 0 1 1 1 1 1 0 0 0 0 0 0 0 0 0 1 1 0 1 1 1 1 1]]
BoW: vocabulary of both sentences, in sorted order: ['by', 'career', 'combined', 'congress', 'for', 'government', 'huawei', 'imposed', 'in', 'james', 'jordan', 'lebron', 'michael', 'passed', 'playoffs', 'points', 'regular', 'restrictions', 'sales', 'season', 'sues', 'the', 'today', 'unconstitutional', 'us']
NLTK: tokens in the sentence (including punctuation): ['Today', ',', 'LeBron', 'James', 'passed', 'Michael', 'Jordan', 'in', 'career', 'points', 'for', 'regular', 'season', ',', 'playoffs', 'combined', '.']
NLTK: tokens in the sentence (including punctuation): ['Today', ',', 'Huawei', 'Sues', 'the', 'US', 'Government', 'for', 'Unconstitutional', 'Sales', 'Restrictions', 'Imposed', 'by', 'Congress', '.']
NLTK: unique tokens (including punctuation), in sorted order: [',', '.', 'James', 'Jordan', 'LeBron', 'Michael', 'Today', 'career', 'combined', 'for', 'in', 'passed', 'playoffs', 'points', 'regular', 'season']
NLTK: unique tokens (including punctuation), in sorted order: [',', '.', 'Congress', 'Government', 'Huawei', 'Imposed', 'Restrictions', 'Sales', 'Sues', 'Today', 'US', 'Unconstitutional', 'by', 'for', 'the']
['today', ',', 'lebron', 'jame', 'pass', 'michael', 'jordan', 'in', 'career', 'point', 'for', 'regular', 'season', ',', 'playoff', 'combin', '.']
['today', ',', 'huawei', 'sue', 'the', 'US', 'govern', 'for', 'unconstitut', 'sale', 'restrict', 'impos', 'by', 'congress', '.']
NLTK: tokens (including punctuation) with their part-of-speech tags: [('Today', 'NN'), (',', ','), ('LeBron', 'NNP'), ('James', 'NNP'), ('passed', 'VBD'), ('Michael', 'NNP'), ('Jordan', 'NNP'), ('in', 'IN'), ('career', 'NN'), ('points', 'NNS'), ('for', 'IN'), ('regular', 'JJ'), ('season', 'NN'), (',', ','), ('playoffs', 'NNS'), ('combined', 'VBD'), ('.', '.')]
NLTK: tokens (including punctuation) with their part-of-speech tags: [('Today', 'NN'), (',', ','), ('Huawei', 'NNP'), ('Sues', 'NNP'), ('the', 'DT'), ('US', 'NNP'), ('Government', 'NNP'), ('for', 'IN'), ('Unconstitutional', 'NNP'), ('Sales', 'NNS'), ('Restrictions', 'NNS'), ('Imposed', 'VBN'), ('by', 'IN'), ('Congress', 'NNP'), ('.', '.')]
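To see where the count matrix at the top of this output comes from, here is a minimal pure-Python sketch of the same bag-of-words idea. It assumes the tokenization rules of CountVectorizer's defaults (lowercase everything, keep only runs of two or more word characters); the helper names `tokenize` and `bag_of_words` are made up for this illustration and are not part of scikit-learn.

```python
import re
from collections import Counter

sent1 = 'Today, LeBron James passed Michael Jordan in career points for regular season, playoffs combined.'
sent2 = 'Today, Huawei Sues the US Government for Unconstitutional Sales Restrictions Imposed by Congress.'

# Same regular expression as CountVectorizer's default token_pattern:
# runs of two or more word characters, so punctuation is dropped.
TOKEN_PATTERN = re.compile(r"(?u)\b\w\w+\b")


def tokenize(text):
    """Lowercase the text and split it into word tokens (hypothetical helper)."""
    return TOKEN_PATTERN.findall(text.lower())


def bag_of_words(sentences):
    """Return the sorted vocabulary and one count vector per sentence."""
    tokenized = [tokenize(s) for s in sentences]
    vocab = sorted({tok for toks in tokenized for tok in toks})
    rows = []
    for toks in tokenized:
        counts = Counter(toks)
        # One column per vocabulary word; a missing word counts as 0.
        rows.append([counts[word] for word in vocab])
    return vocab, rows


vocab, matrix = bag_of_words([sent1, sent2])
print(vocab)
for row in matrix:
    print(row)
```

Each row has one column per vocabulary word, which is why both rows have 25 entries and why the only columns set to 1 in both rows are the shared words 'for' and 'today'.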

Implementation code

Test sentences, taken from the day's news:
sent1 = 'Today, LeBron James passed Michael Jordan in career points for regular season, playoffs combined.'
sent2 = 'Today, Huawei Sues the US Government for Unconstitutional Sales Restrictions Imposed by Congress.'

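# 1. Use Bag-of-Words (BoW) to turn the example text into feature vectors
from sklearn.feature_extraction.text import CountVectorizer

sent1 = 'Today, LeBron James passed Michael Jordan in career points for regular season, playoffs combined.'
sent2 = 'Today, Huawei Sues the US Government for Unconstitutional Sales Restrictions Imposed by Congress.'

count_vec = CountVectorizer()
sentences = [sent1, sent2]
print(count_vec.fit_transform(sentences).toarray())
# get_feature_names() was removed in scikit-learn 1.2; get_feature_names_out()
# replaces it, and .tolist() turns the returned array into a plain list of strings.
print('BoW: vocabulary of both sentences, in sorted order:', count_vec.get_feature_names_out().tolist())

# 2. Use NLTK to analyze the two sentences in more detail: how each word is formed,
#    which grammatical category it belongs to, and how words combine into phrases and sentences.
import nltk
# word_tokenize and pos_tag need NLTK data packages. If they are missing, fetch them once,
# e.g. nltk.download('punkt') and nltk.download('averaged_perceptron_tagger');
# newer NLTK releases name them 'punkt_tab' and 'averaged_perceptron_tagger_eng'.

# Tokenize each sentence (punctuation is kept as separate tokens)
tokens_1 = nltk.word_tokenize(sent1)
tokens_2 = nltk.word_tokenize(sent2)
print('NLTK: tokens in the sentence (including punctuation):', tokens_1)
print('NLTK: tokens in the sentence (including punctuation):', tokens_2)

# Per-sentence vocabulary: unique tokens, sorted
vocab_1 = sorted(set(tokens_1))
vocab_2 = sorted(set(tokens_2))
print('NLTK: unique tokens (including punctuation), in sorted order:', vocab_1)
print('NLTK: unique tokens (including punctuation), in sorted order:', vocab_2)

# Reduce each token to its stem with the Porter stemmer
stemmer = nltk.stem.PorterStemmer()
stem_1 = [stemmer.stem(t) for t in tokens_1]
stem_2 = [stemmer.stem(t) for t in tokens_2]
print(stem_1)
print(stem_2)

# Tag each token with its part of speech
pos_tag_1 = nltk.tag.pos_tag(tokens_1)
pos_tag_2 = nltk.tag.pos_tag(tokens_2)
print('NLTK: tokens (including punctuation) with their part-of-speech tags:', pos_tag_1)
print('NLTK: tokens (including punctuation) with their part-of-speech tags:', pos_tag_2)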
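The tags in the part-of-speech output follow the Penn Treebank tag set (for example NNP for a singular proper noun, VBD for a past-tense verb, IN for a preposition). As a small illustration of how the tagged pairs might be used downstream, the sketch below groups tokens by tag and pulls out the proper nouns. The list `pos_tag_1` is copied by hand from the output for sent1 so that the snippet runs without NLTK installed, and `group_by_tag` is a hypothetical helper, not an NLTK function.

```python
from collections import defaultdict

# Tagged tokens for sent1, copied from the output section of this article.
pos_tag_1 = [('Today', 'NN'), (',', ','), ('LeBron', 'NNP'), ('James', 'NNP'),
             ('passed', 'VBD'), ('Michael', 'NNP'), ('Jordan', 'NNP'), ('in', 'IN'),
             ('career', 'NN'), ('points', 'NNS'), ('for', 'IN'), ('regular', 'JJ'),
             ('season', 'NN'), (',', ','), ('playoffs', 'NNS'), ('combined', 'VBD'),
             ('.', '.')]


def group_by_tag(tagged_tokens):
    """Map each POS tag to the tokens carrying it, in sentence order (hypothetical helper)."""
    groups = defaultdict(list)
    for word, tag in tagged_tokens:
        groups[tag].append(word)
    return dict(groups)


groups = group_by_tag(pos_tag_1)
proper_nouns = groups.get('NNP', [])
print(proper_nouns)  # ['LeBron', 'James', 'Michael', 'Jordan']
```

Note that the tagger output for sent2 labels several headline-capitalized words (such as 'Sues' and 'Unconstitutional') as NNP, so a filter like this would likely over-collect on title-cased text.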
