Python LDA Topic Modeling in Practice

Published: 2025/4/16 · Category: python
  • Import the required packages
    https://github.com/lda-project/lda : documentation for the lda package.

    This walkthrough uses the lda library; install it with pip install lda.

import numpy as np
import lda

X = lda.datasets.load_reuters()
X.shape

Output:

(395, 4258)
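  • X has 395 rows and 4258 columns: each row is one training document, so there are 395 training documents, and each column corresponds to one vocabulary word.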
vocab = lda.datasets.load_reuters_vocab()
len(vocab)  # total number of words in the vocabulary

Output:

4258
  • So the vocabulary contains 4258 distinct words in total.
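
To make the shape of X concrete, here is a minimal sketch of how a document-term count matrix relates to a vocabulary. The three toy documents and the helper name build_doc_term_matrix are made up for illustration and are not part of the lda package; load_reuters() returns a matrix that has already been built.

```python
from collections import Counter

# Hypothetical toy corpus; the real Reuters data set has 395 documents.
docs = [
    "church bells ring in the city",
    "the pope visits the city",
    "church and pope",
]

def build_doc_term_matrix(documents):
    """Return (vocab, matrix) where matrix[i][j] counts word vocab[j] in document i."""
    tokenized = [doc.split() for doc in documents]
    # The vocabulary is the sorted set of distinct words across all documents.
    vocab = tuple(sorted({word for tokens in tokenized for word in tokens}))
    matrix = []
    for tokens in tokenized:
        counts = Counter(tokens)
        matrix.append([counts[word] for word in vocab])
    return vocab, matrix

vocab_toy, X_toy = build_doc_term_matrix(docs)
print(len(X_toy), len(X_toy[0]))  # number of documents, number of distinct words
print(vocab_toy)
```

Each row is one document and each column is one distinct word, which is the same layout as the (395, 4258) matrix above.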
  • Look at the titles of the first ten training documents
title = lda.datasets.load_reuters_titles()
title[:10]

Output:

('0 UK: Prince Charles spearheads British royal revolution. LONDON 1996-08-20',
 '1 GERMANY: Historic Dresden church rising from WW2 ashes. DRESDEN, Germany 1996-08-21',
 "2 INDIA: Mother Teresa's condition said still unstable. CALCUTTA 1996-08-23",
 '3 UK: Palace warns British weekly over Charles pictures. LONDON 1996-08-25',
 '4 INDIA: Mother Teresa, slightly stronger, blesses nuns. CALCUTTA 1996-08-25',
 "5 INDIA: Mother Teresa's condition unchanged, thousands pray. CALCUTTA 1996-08-25",
 '6 INDIA: Mother Teresa shows signs of strength, blesses nuns. CALCUTTA 1996-08-26',
 "7 INDIA: Mother Teresa's condition improves, many pray. CALCUTTA, India 1996-08-25",
 '8 INDIA: Mother Teresa improves, nuns pray for "miracle". CALCUTTA 1996-08-26',
 '9 UK: Charles under fire over prospect of Queen Camilla. LONDON 1996-08-26')
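  • Start training, with the number of topics set to 20 and the number of iterations set to 1500
model = lda.LDA(n_topics=20, n_iter=1500, random_state=1)  # initialize the model; n_iter is the number of iterations
model.fit(X)

Console output: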

INFO:lda:n_documents: 395
INFO:lda:vocab_size: 4258
INFO:lda:n_words: 84010
INFO:lda:n_topics: 20
INFO:lda:n_iter: 1500
INFO:lda:<0> log likelihood: -1051748
INFO:lda:<10> log likelihood: -719800
INFO:lda:<20> log likelihood: -699115
INFO:lda:<30> log likelihood: -689370
INFO:lda:<40> log likelihood: -684918
...
INFO:lda:<1450> log likelihood: -654884
INFO:lda:<1460> log likelihood: -655493
INFO:lda:<1470> log likelihood: -655415
INFO:lda:<1480> log likelihood: -655192
INFO:lda:<1490> log likelihood: -655728
INFO:lda:<1499> log likelihood: -655858
<lda.lda.LDA at 0x7effa0508550>
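
The log likelihood rises quickly during the first iterations and then hovers around -655000. As a rough sketch of how to quantify that, the snippet below computes the relative change between consecutive logged values. The numbers are copied from the console output above; the helper name relative_changes is hypothetical and not part of lda.

```python
# Log-likelihood values copied from the console output above as (iteration, value).
early = [(0, -1051748), (10, -719800), (20, -699115), (30, -689370), (40, -684918)]
late = [(1450, -654884), (1460, -655493), (1470, -655415),
        (1480, -655192), (1490, -655728), (1499, -655858)]

def relative_changes(log):
    """Relative change between consecutive logged values (positive = improvement)."""
    changes = []
    for (_, prev), (it, cur) in zip(log, log[1:]):
        changes.append((it, (cur - prev) / abs(prev)))
    return changes

for it, change in relative_changes(early) + relative_changes(late):
    print(f"iteration {it:>4}: {change:+.4%}")
```

The late changes all stay below 0.1% and switch sign, which suggests the sampler had plateaued well before iteration 1500.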
  • Inspect the word distribution of each of the 20 topics
topic_word = model.topic_word_
print(topic_word.shape)
topic_word

Output:

(20, 4258)
array([[3.62505347e-06, 3.62505347e-06, 3.62505347e-06, ...,
        3.62505347e-06, 3.62505347e-06, 3.62505347e-06],
       [1.87498968e-02, 1.17916463e-06, 1.17916463e-06, ...,
        1.17916463e-06, 1.17916463e-06, 1.17916463e-06],
       [1.52206232e-03, 5.05668544e-06, 4.05040504e-03, ...,
        5.05668544e-06, 5.05668544e-06, 5.05668544e-06],
       ...,
       [4.17266923e-02, 3.93610908e-06, 9.05698699e-03, ...,
        3.93610908e-06, 3.93610908e-06, 3.93610908e-06],
       [2.37609835e-06, 2.37609835e-06, 2.37609835e-06, ...,
        2.37609835e-06, 2.37609835e-06, 2.37609835e-06],
       [3.46310752e-06, 3.46310752e-06, 3.46310752e-06, ...,
        3.46310752e-06, 3.46310752e-06, 3.46310752e-06]])
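
The many identical tiny values in each row are what you would expect from smoothed, normalized counts: a word that is never assigned to a topic still receives a small, equal probability. Below is a minimal sketch of that computation. The toy counts and the value eta = 0.01 are assumptions chosen for illustration, not values read from the fitted model.

```python
import numpy as np

# Hypothetical topic-word assignment counts: 2 topics x 5 vocabulary words.
counts = np.array([
    [12, 0, 3, 0, 0],
    [0, 7, 0, 0, 1],
])
eta = 0.01  # assumed symmetric smoothing parameter

# Add eta to every count, then normalize each row so that it sums to 1.
smoothed = counts + eta
topic_word_toy = smoothed / smoothed.sum(axis=1, keepdims=True)

print(topic_word_toy.round(6))
print(topic_word_toy.sum(axis=1))  # each row is a probability distribution
```

Words with a zero count in a row all share the same small value, which is the pattern seen in the repeated entries such as 3.62505347e-06 above.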
  • Get the top 8 words of each topic
for i, topic_dist in enumerate(topic_word):
    print(np.array(vocab)[np.argsort(topic_dist)][:-9:-1])

Output:

['british' 'churchill' 'sale' 'million' 'major' 'letters' 'west' 'britain']
['church' 'government' 'political' 'country' 'state' 'people' 'party'
 'against']
['elvis' 'king' 'fans' 'presley' 'life' 'concert' 'young' 'death']
['yeltsin' 'russian' 'russia' 'president' 'kremlin' 'moscow' 'michael'
 'operation']
['pope' 'vatican' 'paul' 'john' 'surgery' 'hospital' 'pontiff' 'rome']
['family' 'funeral' 'police' 'miami' 'versace' 'cunanan' 'city' 'service']
['simpson' 'former' 'years' 'court' 'president' 'wife' 'south' 'church']
['order' 'mother' 'successor' 'election' 'nuns' 'church' 'nirmala' 'head']
['charles' 'prince' 'diana' 'royal' 'king' 'queen' 'parker' 'bowles']
['film' 'french' 'france' 'against' 'bardot' 'paris' 'poster' 'animal']
['germany' 'german' 'war' 'nazi' 'letter' 'christian' 'book' 'jews']
['east' 'peace' 'prize' 'award' 'timor' 'quebec' 'belo' 'leader']
["n't" 'life' 'show' 'told' 'very' 'love' 'television' 'father']
['years' 'year' 'time' 'last' 'church' 'world' 'people' 'say']
['mother' 'teresa' 'heart' 'calcutta' 'charity' 'nun' 'hospital'
 'missionaries']
['city' 'salonika' 'capital' 'buddhist' 'cultural' 'vietnam' 'byzantine'
 'show']
['music' 'tour' 'opera' 'singer' 'israel' 'people' 'film' 'israeli']
['church' 'catholic' 'bernardin' 'cardinal' 'bishop' 'wright' 'death'
 'cancer']
['harriman' 'clinton' 'u.s' 'ambassador' 'paris' 'president' 'churchill'
 'france']
['city' 'museum' 'art' 'exhibition' 'century' 'million' 'churches' 'set']
  • Get each document's distribution over the topics, and each document's most probable topic
doc_topic = model.doc_topic_
print(doc_topic.shape)  # doc_topic is a 395 x 20 matrix; each row is one training document's distribution over the 20 topics
print("Topic distribution of the first document:", doc_topic[0])  # print the topic distribution of the first document
print("Most probable topic of the first document:", doc_topic[0].argmax())

Output:

(395, 20)
Topic distribution of the first document: [4.34782609e-04 3.52173913e-02 4.34782609e-04 9.13043478e-03
 4.78260870e-03 4.34782609e-04 9.13043478e-03 3.08695652e-02
 5.04782609e-01 4.78260870e-03 4.34782609e-04 4.34782609e-04
 3.08695652e-02 2.17826087e-01 4.34782609e-04 4.34782609e-04
 4.34782609e-04 3.95652174e-02 4.34782609e-04 1.09130435e-01]
Most probable topic of the first document: 8
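
The slice [:-9:-1] and the argmax call are easy to misread, so here is a small sketch of both steps on toy data. The arrays and the helper names top_words and dominant_topics are hypothetical; with the fitted model you would pass model.topic_word_, vocab and model.doc_topic_ instead.

```python
import numpy as np

def top_words(topic_word, vocab, n=8):
    """Return the n most probable words per topic, most probable first."""
    vocab = np.array(vocab)
    # argsort is ascending, so take the last n indices of each row and reverse them.
    order = np.argsort(topic_word, axis=1)[:, :-(n + 1):-1]
    return vocab[order]

def dominant_topics(doc_topic):
    """Return the index of the most probable topic for every document."""
    return doc_topic.argmax(axis=1)

# Toy data: 2 topics over 4 words, and 3 documents over 2 topics.
toy_vocab = ("church", "city", "music", "pope")
toy_topic_word = np.array([
    [0.50, 0.10, 0.05, 0.35],
    [0.05, 0.30, 0.60, 0.05],
])
toy_doc_topic = np.array([
    [0.9, 0.1],
    [0.2, 0.8],
    [0.6, 0.4],
])

print(top_words(toy_topic_word, toy_vocab, n=2))
print(dominant_topics(toy_doc_topic))
```

Using axis=1 handles all topics or all documents in one call, rather than looping row by row or looking only at the first document.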

Reposted from: https://blog.csdn.net/jiangzhenkang/article/details/84335646
