Structural Topic Models (Part 1): The stm Package Workflow
Preface

This post walks through the workflow of the stm code from the paper (stm: An R Package for Structural Topic Models). The overall structure follows the original paper, but I offer my own thoughts on the order in which some of the code should be run. Due to limited time, some problems remain unsolved (such as choosing an appropriate number of topics), and some later parts of the paper are not yet covered in detail; I will add them when time permits. If anyone can suggest useful fixes or solutions, I will respond as soon as possible. I hope this helps those working with structural topic models 😁

Issues encountered while reproducing the paper are collected in:

Structural Topic Models (Part 2): Reproduction

Original paper, data, and code:
stm: An R Package for Structural Topic Models
stm庫(kù)官方文檔
3.0 讀取數(shù)據(jù)
樣例數(shù)據(jù)poliblogs2008.csv為一個(gè)關(guān)于美國(guó)政治的博文集,來(lái)自CMU2008年政治博客語(yǔ)料庫(kù):American Thinker, Digby, Hot Air, Michelle Malkin, Think Progress, and Talking Points Memo。每個(gè)博客論壇都有自己的政治傾向,所以每篇博客都有寫(xiě)作日期和政治意識(shí)形態(tài)的元數(shù)據(jù)。
建議讀取xlsx,因?yàn)閏sv文件以逗號(hào)作為分隔符,有時(shí)會(huì)出現(xiàn)問(wèn)題。pandas:csv-excel文件相互轉(zhuǎn)換
```r
library(readxl)  # read_excel() comes from the readxl package

# data <- read.csv("./poliblogs2008.csv", sep = ",", quote = "", header = TRUE, fileEncoding = "UTF-8")
data <- read_excel(path = "./poliblogs2008.xlsx", sheet = "Sheet1", col_names = TRUE)
```

If your data are in Chinese, first apply word segmentation and similar preprocessing to the text before proceeding with the later steps.

Numbering starts at 3.0 to stay consistent with the original paper.
3.1 Ingest: Reading and processing text data
提取數(shù)據(jù):將原始數(shù)據(jù)處理成STM可以分析的三塊內(nèi)容(分別是documents,vocab ,meta),用到的是textProcessor或readCorpus這兩個(gè)函數(shù)。
textProcessor()函數(shù)旨在提供一種方便快捷的方式來(lái)處理相對(duì)較小的文本,以便使用軟件包進(jìn)行分析。它旨在以簡(jiǎn)單的形式快速攝取數(shù)據(jù),例如電子表格,其中每個(gè)文檔都位于單個(gè)單元格中。
# 調(diào)用textProcessor算法,將 data$document、data 作為參數(shù) processed <- textProcessor(documents = data$documents, metadata = data, wordLengths = c(1, Inf))textProcessor()函數(shù)中的參數(shù)wordLengths = c(3, Inf)表示:短于最小字長(zhǎng)(默認(rèn)為3字符)或長(zhǎng)于最大字長(zhǎng)(默認(rèn)為inf)的字?jǐn)?shù)將被丟棄,[用戶(hù)@qq_39172034]建議設(shè)置該參數(shù)為wordLengths = c(1, Inf),以避免避免單個(gè)漢字被刪除
論文中提到,textProcessor()可以處理多種語(yǔ)言,需設(shè)置變量language = "en", customstopwords = NULL,。截至0.5支持的版本“丹麥語(yǔ)、荷蘭語(yǔ)、英語(yǔ)、芬蘭語(yǔ)、法語(yǔ)、德語(yǔ)、匈牙利語(yǔ)、意大利語(yǔ)、挪威語(yǔ)、葡萄牙語(yǔ)、羅馬尼亞語(yǔ)、俄語(yǔ)、瑞典語(yǔ)、土耳其語(yǔ)”,不支持中文
詳見(jiàn):textProcessor function - RDocumentation
3.2 Prepare: Associating text with metadata
數(shù)據(jù)預(yù)處理:轉(zhuǎn)換數(shù)據(jù)格式,根據(jù)閾值刪除低頻單詞等,用到的是prepDocuments()和plotRemoved()兩個(gè)函數(shù)
plotRemoved()函數(shù)可繪制不同閾值下刪除的document、words、token數(shù)量
pdf("output/stm-plot-removed.pdf") plotRemoved(processed$documents, lower.thresh = seq(1, 200, by = 100)) dev.off()根據(jù)此pdf文件的結(jié)果(output/stm-plot-removed.pdf),確定prepDocuments()中的參數(shù)lower.thresh的取值,以此確定變量docs、vocab、meta
論文中提到如果在處理過(guò)程中發(fā)生任何更改,PrepDocuments還將重新索引所有元數(shù)據(jù)/文檔關(guān)系。例如,當(dāng)文檔因?yàn)楹械皖l單詞而在預(yù)處理階段被完全刪除,那么PrepDocuments()也將刪除元數(shù)據(jù)中的相應(yīng)行。因此在讀入和處理文本數(shù)據(jù)后,檢查文檔的特征和相關(guān)詞匯表以確保它們已被正確預(yù)處理是很重要的。
```r
# Remove words with frequency below 15
out <- prepDocuments(documents = processed$documents, vocab = processed$vocab,
                     meta = processed$meta, lower.thresh = 15)
docs <- out$documents
vocab <- out$vocab
meta <- out$meta
```

- docs: the documents, a list in which each document holds word indices and their associated counts
- vocab: a character vector containing the words associated with the word indices
- meta: a metadata matrix containing the document covariates
The following shows the documents object for two short documents. The first contains five words, located at positions 21, 23, 87, 98, and 112 of the vocab vector; the first word occurs twice and the others once. The second document contains three words, read the same way.
```
[[1]]
     [,1] [,2] [,3] [,4] [,5]
[1,]   21   23   87   98  112
[2,]    2    1    1    1    1

[[2]]
     [,1] [,2] [,3]
[1,]   16   61   90
[2,]    1    1    1
```
3.3 Estimate: Estimating the structural topic model
STM的關(guān)鍵創(chuàng)新是它將元數(shù)據(jù)合并到主題建??蚣苤?/strong>。在STM中,元數(shù)據(jù)可以通過(guò)兩種方式輸入到主題模型中:**主題流行度(topical prevalence)**和主題內(nèi)容(topical content)。主題流行度中的元數(shù)據(jù)協(xié)變量允許觀察到的元數(shù)據(jù)影響被討論主題的頻率。主題內(nèi)容中的協(xié)變量允許觀察到的元數(shù)據(jù)影響給定主題內(nèi)的詞率使用——即如何討論特定主題。對(duì)主題流行率和主題內(nèi)容的估計(jì)是通過(guò)stm()函數(shù)進(jìn)行的。
主題流行度(topical prevalence)表示每個(gè)主題對(duì)某篇文檔的貢獻(xiàn)程度,因?yàn)椴煌奈臋n來(lái)自不同的地方,所以自然地希望主題流行度能隨著元數(shù)據(jù)的變化而變化。
具體而言,論文將變量rating(意識(shí)形態(tài),Liberal,Conservative)作為主題流行度的協(xié)變量,除了意識(shí)形態(tài),還可以通過(guò)+號(hào)增加其他協(xié)變量,如增加原始數(shù)據(jù)中的day”變量(表示發(fā)帖日期)
s(day)中的s()為spline function,a fairly flexible b-spline basis
day這個(gè)變量是從2008年的第一天到最后一天,就像panel data一樣,如果帶入時(shí)序設(shè)置為天(365個(gè)penal),則會(huì)損失300多個(gè)自由度,所以引入spline function解決自由度損失的問(wèn)題。
The stm package also includes a convenience functions(), which selects a fairly flexible b-spline basis. In the current example we allow for the variabledayto be estimated with a spline.
```r
poliblogPrevFit <- stm(documents = out$documents, vocab = out$vocab, K = 20,
                       prevalence = ~rating + s(day), max.em.its = 75,
                       data = out$meta, init.type = "Spectral")
```

In R, the prevalence argument accepts a formula containing multiple covariates, factorial or continuous. Other standard transformation functions from the splines package can also be used: log(), ns(), bs().

As the iterations proceed, the model is considered to have converged once the change in the bound becomes small enough.
3.4 Evaluate: Model selection and search
Because the posterior of mixed-membership topic models is often non-convex and intractable, the model that is reached depends on the starting values of its parameters (for example, the word distribution of each topic). There are two ways to initialize the model:

- spectral initialization: init.type = "Spectral". Prefer this option.
- a collapsed Gibbs sampler for LDA
selectModel() first builds a net of candidate models and runs each of them briefly (fewer than 10 iterations) through the E and M steps, discarding the low-likelihood models; it then runs only the top 20% of models by likelihood until convergence or the maximum number of EM iterations (max.em.its) is reached.
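The poliblogSelect object used below is never created in this post. Following the paper's example, the call would look roughly like this (the runs and seed values are illustrative, not prescribed):

```r
# Run 20 short candidate models; selectModel() keeps the high-likelihood
# ones and runs them to convergence or max.em.its
poliblogSelect <- selectModel(out$documents, out$vocab, K = 20,
                              prevalence = ~rating + s(day), max.em.its = 75,
                              data = out$meta, runs = 20, seed = 8458159)
```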
通過(guò)plotModels()函數(shù)顯示的語(yǔ)義一致性(semantic coherence)和排他性(exclusivity)選擇合適的模型,semcoh和exclu越大則模型越好
# 繪制圖形平均得分每種模型采用不同的圖例 plotModels(poliblogSelect, pch=c(1,2,3,4), legend.position="bottomright") # 選擇模型3 selectedmodel <- poliblogSelect$runout[[3]]對(duì)比兩種或多個(gè)主題數(shù),通過(guò)對(duì)比語(yǔ)義連貫性SemCoh和排他性Exl確定合適的主題數(shù)
3.5 Understand: Interpreting the STM by plotting and inspecting results
Once a model is selected, the functions in the stm package present its results. To stay consistent with the original paper, the sections below use the initial model poliblogPrevFit as the argument rather than the model chosen via selectModel().

Ranking the high-frequency words of each topic: labelTopics(), sageLabels()

Both functions output the words associated with each topic. sageLabels() applies only to models with a content covariate; it also produces more detailed output than labelTopics() and by default prints the high-frequency words of every topic.

```r
# labelTopics(): label topics by listing top words, here for topics 1 to 5
labelTopicsSel <- labelTopics(poliblogPrevFit, c(1:5))
sink("output/labelTopics-selected.txt", append = FALSE, split = TRUE)
print(labelTopicsSel)
sink()

# sageLabels() gives more detailed output than labelTopics()
sink("stm-list-sagelabel.txt", append = FALSE, split = TRUE)
print(sageLabels(poliblogPrevFit))
sink()
```

TODO: the outputs of the two functions differ.
列出與某個(gè)主題高度相關(guān)的文檔:findthoughts()
shortdoc <- substr(out$meta$documents, 1, 200) # 參數(shù) 'texts=shortdoc' 表示輸出每篇文檔前200個(gè)字符,n表示輸出相關(guān)文檔的篇數(shù) thoughts1 <- findThoughts(poliblogPrevFit, texts=shortdoc, n=2, topics=1)$docs[[1]] pdf("findThoughts-T1.pdf") plotQuote(thoughts1, width=40, main="Topic 1") dev.off()# how about more documents for more of these topics? thoughts6 <- findThoughts(poliblogPrevFit, texts=shortdoc, n=2, topics=6)$docs[[1]] thoughts18 <- findThoughts(poliblogPrevFit, texts=shortdoc, n=2, topics=18)$docs[[1]] pdf("stm-plot-find-thoughts.pdf") # mfrow=c(2, 1)將會(huì)把圖輸出到2行1列的表格中 par(mfrow = c(2, 1), mar = c(.5, .5, 1, .5)) plotQuote(thoughts6, width=40, main="Topic 6") plotQuote(thoughts18, width=40, main="Topic 18") dev.off()估算元數(shù)據(jù)和主題/主題內(nèi)容之間的關(guān)系:estimateEffect
out$meta$rating<-as.factor(out$meta$rating) # since we're preparing these coVariates by estimating their effects we call these estimated effects 'prep' # we're estimating Effects across all 20 topics, 1:20. We're using 'rating' and normalized 'day,' using the topic model poliblogPrevFit. # The meta data file we call meta. We are telling it to generate the model while accounting for all possible uncertainty. Note: when estimating effects of one covariate, others are held at their mean prep <- estimateEffect(1:20 ~ rating+s(day), poliblogPrevFit, meta=out$meta, uncertainty = "Global") summary(prep, topics=1) summary(prep, topics=2) summary(prep, topics=3) summary(prep, topics=4)uncertainty有"Global", “Local”, "None"三個(gè)選擇,The default is “Global”, which will incorporate estimation uncertainty of the topic proportions into the uncertainty estimates using the method of composition. If users do not propagate the full amount of uncertainty, e.g., in order to speed up computational time, they can choose uncertainty = “None”, which will generally result in narrower confidence intervals because it will not include the additional estimation uncertainty.
Output of summary(prep, topics = 1):
```
Call:
estimateEffect(formula = 1:20 ~ rating + s(day), stmobj = poliblogPrevFit,
    metadata = meta, uncertainty = "Global")

Topic 1:

Coefficients:
               Estimate Std. Error t value Pr(>|t|)    
(Intercept)    0.068408   0.011233   6.090 1.16e-09 ***
ratingLiberal -0.002513   0.002588  -0.971  0.33170    
s(day)1       -0.008596   0.021754  -0.395  0.69276    
s(day)2       -0.035476   0.012314  -2.881  0.00397 ** 
s(day)3       -0.002806   0.015696  -0.179  0.85813    
s(day)4       -0.030237   0.013056  -2.316  0.02058 *  
s(day)5       -0.026256   0.013791  -1.904  0.05695 .  
s(day)6       -0.010658   0.013584  -0.785  0.43269    
s(day)7       -0.005835   0.014381  -0.406  0.68494    
s(day)8        0.041965   0.016056   2.614  0.00897 ** 
s(day)9       -0.101217   0.016977  -5.962 2.56e-09 ***
s(day)10      -0.024237   0.015679  -1.546  0.12216    
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
```

3.6 Visualize: Presenting STM results
Summary visualization
Bar chart of topic proportions

```r
# See the proportion of each topic in the entire corpus; just insert your STM output
pdf("top-topic.pdf")
plot(poliblogPrevFit, type = "summary", xlim = c(0, .3))
dev.off()
```

Metadata/topic relationship visualization
主題關(guān)系對(duì)比圖
pdf("stm-plot-topical-prevalence-contrast.pdf") plot(prep, covariate = "rating", topics = c(6, 13, 18),model = poliblogPrevFit, method = "difference",cov.value1 = "Liberal", cov.value2 = "Conservative",xlab = "More Conservative ... More Liberal",main = "Effect of Liberal vs. Conservative",xlim = c(-.1, .1), labeltype = "custom",custom.labels = c("Obama/McCain", "Sarah Palin", "Bush Presidency")) dev.off()主題6、13、18自定義標(biāo)簽為"Obama/McCain"、“Sarah Palin”、“Bush Presidency”,主題6、主題13的意識(shí)形態(tài)偏中立,既不是保守,也不是自由,主題18的意識(shí)形態(tài)偏向于保守。
主題隨著時(shí)間變化的趨勢(shì)圖
pdf("stm-plot-topic-prevalence-with-time.pdf") plot(prep, "day", method = "continuous", topics = 13, model = z, printlegend = FALSE, xaxt = "n", xlab = "Time (2008)") monthseq <- seq(from = as.Date("2008-01-01"), to = as.Date("2008-12-01"), by = "month") monthnames <- months(monthseq) # There were 50 or more warnings (use warnings() to see the first 50) axis(1, at = as.numeric(monthseq) - min(as.numeric(monthseq)), labels = monthnames) dev.off()運(yùn)行報(bào)錯(cuò),但可以輸出以下圖片,原因不明
Topical content

Shows which words within a topic are more strongly associated with one covariate value than with another.

```r
# TOPICAL CONTENT.
# STM can plot the influence of a covariate included as a topical content covariate.
# A topical content variable allows the vocabulary used to talk about a particular topic to vary.
# First, the STM must be fit with a variable specified in the content option.
# Instead of looking at how prevalent a topic is in a class of documents categorized
# by a metadata covariate, we look at how the words of the topic are emphasized
# differently in documents of each category of the covariate.
# We estimate a new STM. It is the same as before, including the prevalence option,
# but adds a content option.
poliblogContent <- stm(out$documents, out$vocab, K = 20,
                       prevalence = ~rating + s(day), content = ~rating,
                       max.em.its = 75, data = out$meta, init.type = "Spectral")
pdf("stm-plot-content-perspectives.pdf")
plot(poliblogContent, type = "perspectives", topics = 10)
dev.off()
```

Topic 10 concerns Cuba. Its most common words are "detainee, prison, court, illegal, torture, enforce, Cuba". The plot shows how liberals and conservatives discuss the topic differently: liberals emphasize "torture", while conservatives emphasize typical courtroom words such as "illegal" and "legal".

On the original paper's wording, "Its top FREX words were 'detaine, prison, court, illeg, tortur, enforc, guantanamo'": forms such as "tortur" are not typos but word stems produced by the stemming step of preprocessing (e.g., "torture" is stemmed to "tortur").
Plotting vocabulary differences between two topics

```r
pdf("stm-plot-content-perspectives-16-18.pdf")
plot(poliblogPrevFit, type = "perspectives", topics = c(16, 18))
dev.off()
```

Plotting covariate interactions

```r
# Interactions between covariates can be examined, such that one variable may "moderate"
# the effect of another variable.
# First, we estimate an STM with the interaction.
poliblogInteraction <- stm(out$documents, out$vocab, K = 20,
                           prevalence = ~rating * day, max.em.its = 75,
                           data = out$meta, init.type = "Spectral")
# Prep the covariates using estimateEffect(), this time including the interaction
# variable, then plot the results and save them as pdf files.
prep <- estimateEffect(c(16) ~ rating * day, poliblogInteraction,
                       metadata = out$meta, uncertainty = "None")
pdf("stm-plot-two-topic-contrast.pdf")
plot(prep, covariate = "day", model = poliblogInteraction,
     method = "continuous", xlab = "Days", moderator = "rating",
     moderator.value = "Liberal", linecol = "blue", ylim = c(0, 0.12),
     printlegend = FALSE)
plot(prep, covariate = "day", model = poliblogInteraction,
     method = "continuous", xlab = "Days", moderator = "rating",
     moderator.value = "Conservative", linecol = "red", add = TRUE,
     printlegend = FALSE)
legend(0, 0.06, c("Liberal", "Conservative"), lwd = 2, col = c("blue", "red"))
dev.off()
```

The plot shows the relationship between time (the day a blog post was written) and rating (Liberal vs. Conservative). Topic 16 prevalence is plotted as a linear function of time, with rating held at Liberal or Conservative.
3.7 Extend: Additional tools for interpretation and visualization
Word cloud

```r
pdf("stm-plot-wordcloud.pdf")
cloud(poliblogPrevFit, topic = 13, scale = c(2, 0.25))
dev.off()
```

Topic correlations

```r
# topicCorr().
# STM permits correlations between topics. A positive correlation between two topics
# indicates that both are likely to be discussed within the same document. A graphical
# network display shows how closely related the topics are to one another (i.e., how
# likely they are to appear in the same document). This function requires the 'igraph' package.
mod.out.corr <- topicCorr(poliblogPrevFit)
pdf("stm-plot-topic-correlations.pdf")
plot(mod.out.corr)
dev.off()
```

stmCorrViz

The stmCorrViz package provides a different d3 visualization environment, one that focuses on visualizing topic correlations by grouping topics with a hierarchical clustering method.

There is a character-encoding (mojibake) problem in its output.

```r
# stmCorrViz() generates an interactive visualization of topic hierarchy/correlations
# in a structural topic model. The package performs a hierarchical clustering of topics,
# which is exported to a JSON object and visualized using D3.
stmCorrViz(poliblogPrevFit, "stm-interactive-correlation.html",
           documents_raw = data$documents, documents_matrix = out$documents)
```

4 Changing basic estimation defaults

This section explains how to change the default settings of the stm package's estimation commands.

It first discusses how to choose among the different methods for initializing model parameters, then how to set and evaluate convergence criteria, then describes a way to speed up convergence when analyzing tens of thousands of documents or more, and finally covers some variations of the content covariate model that let users control model complexity.
問(wèn)題
ems.its和run的區(qū)別是什么?ems.its表示的組大迭代數(shù),每次迭代run=20?
3.4-3中如何根據(jù)四個(gè)圖確定合適的主題數(shù)?
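On the topic-number question, the stm package itself offers searchK(), which fits the model for several candidate values of K and reports diagnostics (held-out likelihood, residuals, semantic coherence, bound). A sketch with illustrative K values:

```r
# Fit a model for each candidate K and compute diagnostics;
# plot() on the result draws one panel per diagnostic across K
kResult <- searchK(out$documents, out$vocab, K = c(7, 10, 15, 20),
                   prevalence = ~rating + s(day), data = out$meta)
plot(kResult)
```

A reasonable choice of K is one with high held-out likelihood and semantic coherence and low residuals, though the trade-off is ultimately a judgment call.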
補(bǔ)充
在Ingest部分,作者提到其他用于文本處理的quanteda包,該包可以方便地導(dǎo)入文本和相關(guān)元數(shù)據(jù),準(zhǔn)備要處理的文本,并將文檔轉(zhuǎn)換為文檔術(shù)語(yǔ)矩陣(document-term matrix)。另一個(gè)包,readtext包含非常靈活的工具,用于讀取多種文本格式,如純文本、XML和JSON格式,可以輕松地從中創(chuàng)建語(yǔ)料庫(kù)。
為從其他文本處理程序中讀取數(shù)據(jù),可使用txtorg,此程序可以創(chuàng)建三種獨(dú)立的文件:a metadata file, a vocabulary file, and a file with the original documents。默認(rèn)導(dǎo)出格式為L(zhǎng)DA-C sparse matrix format,可以用readCorpus()設(shè)置"ldac"option以讀取
Paper: stm: An R Package for Structural Topic Models (harvard.edu)

Reference article: R软件 STM package实操 - 哔哩哔哩 (bilibili.com)

Related GitHub repositories:
JvH13/FF-STM: Web Appendix - Methodology for Structural Topic Modeling (github.com)
dondealban/learning-stm: Learning structural topic modeling using the stm R package. (github.com)
bstewart/stm: An R Package for the Structural Topic Model (github.com)