[Repost] Roundup: LDA Theory, Variants, Optimization, Applications, and Tool Libraries

Published: 2024/4/14

Reposted from: http://site.douban.com/204776/widget/notes/12599608/note/287085506/



#LDA Theory
Roundup of papers on topic models
http://site.douban.com/204776/widget/notes/12599608/note/286839088/
##Survey:
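1. Research on Keyword Extraction Methods Based on Document Topic Structure (基于文档主题结构的关键词抽取方法研究)
Liu Zhiyuan's PhD thesis. As I recall, he was the author of the Weibo keywords app at the time.
It also proposes some improvements for short text.
2. Parameter estimation for text analysis
This one is definitely a heavyweight.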


##Short-Text:
1. Automatic Keyphrase Extraction by Bridging Vocabulary Gap

##Practice / In Action (especially in Chinese)
1. A new method of N-gram statistics for large number of n and automatic extraction of words and phrases from large text data of Japanese
2. A Statistical Approach to Extract Chinese Chunk Candidates from Large Corpora
Statistical Substring Reduction in Linear Time
3. The Mathematics of Statistical Machine Translation: Parameter Estimation

##Anecdote:
LDA数学八卦 (LDA Math Gossip)
Written by rickjin and serialized on 统计之都 (Capital of Statistics).
http://vdisk.weibo.com/s/qghK5

##LDA variation:
One researcher has recently done extremely impressive work summarizing all kinds of LDA variants.
See her two recent papers:
1. On the design of LDA models for aspect-based opinion mining
2. The FLDA model for aspect-based opinion mining: addressing the cold start problem (WWW'13)



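##An archive of nearly all the LDA papers I have read
Some of them have highlights added (marked -noted):
It includes some of the papers mentioned above, but far more besides.
You can go straight to the noted folder inside, since I did not find the papers I left unannotated useful.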
http://vdisk.weibo.com/s/BA3xC














#LDA Optimization
Roundup of papers on optimized LDA implementations
http://site.douban.com/204776/widget/notes/12599608/note/286923972/
I find these valuable in practice: corpora are sometimes very large, so optimizing the implementation becomes necessary.

Fast inference algorithms:
Fast Collapsed Gibbs Sampling For Latent Dirichlet Allocation

Online learning:
Online Learning for Latent Dirichlet Allocation
http://www.cs.princeton.edu/~blei/papers/HoffmanBleiBach2010b.pdf
http://videolectures.net/nips2010_hoffman_oll/
www.ece.duke.edu/~lcarin/Lingbo4.15.2011.pptx

Inference algorithms for text streams:
Topic models over text streams: a study of batch and online unsupervised learning
Efficient Methods for Topic Model Inference on Streaming Document Collections

Distributed learning:
Distributed Inference for Latent Dirichlet Allocation
PLDA+: Parallel Latent Dirichlet Allocation with Data Placement and Pipeline Processing











#LDA Applications
LDA variants for applications
http://site.douban.com/204776/widget/notes/12599608/note/286930572/
Here are a few LDA variants for different applications. Each makes small adjustments, and each brings new problems.

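##Sentiment Analysis
Opinion Integration Through Semi-supervised Topic Modeling
Takes the traditional topic model, a typical unsupervised method, and extends it to a semi-supervised one by adding prior information to the model. For some automobile products, descriptions of each feature are extracted from Wikipedia and then trained into priors.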

Jointly Modeling Aspects and Opinions with a MaxEnt-LDA Hybrid
Jointly extracts topics (aspects) and opinions. It introduces a supervised learning method to separate topic words from sentiment words, then uses LDA for clustering.



##Academic Mining
For example author modeling, which also appeared at KDD 2013 this year, or detecting hot research topics...
The author-topic model for authors and documents
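Models authors and topics at the same time. An author is no longer restricted to a single topic: each author is a distribution over topics, and the author-topic distribution replaces the document-topic distribution.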
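The generative story behind the author-topic description above can be sketched in a few lines of Python. This is only a toy illustration as I understand the model: the function name `generate_document`, the authors "alice" and "bob", and all the probability values are made up for the example and do not come from the paper's code.

```python
import random

def generate_document(authors, author_topic, topic_word, length, rng):
    """Sample (author, topic, word) triples for one document.

    authors      : list of author names on the document
    author_topic : dict mapping author -> list of topic probabilities
    topic_word   : list (one entry per topic) of word probabilities
    """
    tokens = []
    for _ in range(length):
        # Each word is attributed to one of the document's authors, picked uniformly.
        a = rng.choice(authors)
        # The topic comes from that author's topic distribution,
        # which takes the place of LDA's per-document topic distribution.
        z = rng.choices(range(len(topic_word)), weights=author_topic[a])[0]
        # The word comes from the chosen topic's word distribution, as in plain LDA.
        w = rng.choices(range(len(topic_word[z])), weights=topic_word[z])[0]
        tokens.append((a, z, w))
    return tokens

# Hypothetical toy parameters: 2 authors, 2 topics, 4 word ids.
author_topic = {"alice": [0.9, 0.1], "bob": [0.2, 0.8]}
topic_word = [[0.5, 0.5, 0.0, 0.0],   # topic 0 only emits words 0 and 1
              [0.0, 0.0, 0.5, 0.5]]   # topic 1 only emits words 2 and 3

tokens = generate_document(["alice", "bob"], author_topic, topic_word,
                           length=10, rng=random.Random(0))
```

Because the topic distribution belongs to the author and not to the document, two papers sharing an author share that author's topic preferences.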

Joint latent topic models for text and citations
Models topics and citations jointly and establishes citation links.

Detecting Topic Evolution in Scientific Literature: How Can Citations Help?
Builds a topic evolution model from citation information.



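##Social Media Topics

There is far too much research on Twitter, and the SNA section of this site has already summarized a lot of it, so I will not write more here.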





















#LDA Tool Libraries
Roundup of LDA tool libraries
http://site.douban.com/204776/widget/notes/12599608/note/287084873/
(This section still lacks R; I will comment on it once I have used it myself.)



First, a nicely formatted link (though it is not complete):
http://mengjunxie.github.io/ae-lda/topic-modeling.html





####
Latent Dirichlet allocation
http://www.cs.princeton.edu/~blei/lda-c/
This is a C implementation of variational EM for latent Dirichlet allocation (LDA), a topic model for text or other discrete data. LDA allows you to analyze a corpus and extract the topics that combined to form its documents. For example, the project page shows the topics estimated from a small corpus of Associated Press documents. LDA is fully described in Blei et al. (2003).



####
Discrete Component Analysis
http://www.nicta.com.au/people/buntinew/discrete_component_analysis
The Discrete Component Analysis (DCA) software is being developed as a stand-alone package, and as a plug-in to the Elefant system, a machine learning toolbox from NICTA. Currently the software is being run in stand-alone mode using the data streaming libraries from the older and now unsupported MPCA system, developed at Helsinki Institute for IT. The software itself is written in the C language and compiles on a Linux and a Mac OS X environment.

The models presented here are known under many names, such as latent Dirichlet allocation, multi-aspect models, multinomial PCA, and non-negative matrix factorisation.



####
Infinite LDA
http://www.arbylon.net/projects/knowceans-ilda/readme.txt
https://bitbucket.org/gchrupala/colada/wiki/Resources
Implementations of Latent Dirichlet Allocation (LDA) and
Hierarchical Dirichlet Processes (HDP)

@author Gregor Heinrich, gregor :: arbylon : net
@version 0.96
@date 1 Mar 2011

- History: ILDA version 0.1: May 2008, LDA version 0.1: Feb. 2005, based
on http://arbylon.net/projects/LdaGibbsSampler.java

- Simple implementations of Gibbs sampling for LDA and HDP

- Scientific documentation: see texts lda.pdf and ilda.pdf

- Technical documentation: see Javadoc and source (packages *.corpus and
*.utils are from knowceans-tools on SourceForge)

- Data documentation: see nips/readme.txt including source references

- License: All code is licensed under GPL v3.0.

- If the code is used in scientific work, please refer to its source
via the URL:

http://arbylon.net/projects/knowceans-ilda.zip

or the documentation of the ILDA or LDA implementations:

G. Heinrich. "Infinite LDA" -- implementing the HDP with minimum code
complexity. TN2011/1, http://arbylon.net/publications/ilda.pdf, 2011

G. Heinrich. Parameter estimation for text analysis. Technical report,
No. 09RP008-FIGD, Fraunhofer IGD, 2009

TODO:

- Diverse checks, e.g., Antoniak distribution sampling, hyperparameter
estimators, general quantitative validation of HDP model

- Output formatting

- Visual matrix implementation for HDP / IldaGibbs






####
MAchine Learning for LanguagE Toolkit
http://mallet.cs.umass.edu/
MALLET is open source software. For research use, please remember to cite MALLET.
MALLET is a Java-based package for statistical natural language processing, document classification, clustering, topic modeling, information extraction, and other machine learning applications to text.
MALLET includes sophisticated tools for document classification: efficient routines for converting text to "features", a wide variety of algorithms (including Naïve Bayes, Maximum Entropy, and Decision Trees), and code for evaluating classifier performance using several commonly used metrics.





####
Multithreaded LDA
https://sites.google.com/site/rameshnallapati/software
Multithreaded extension of Blei's LDA implementation, written in C by Ramesh Nallapati. Speeds up the computation by orders of magnitude depending on the number of processors.





####
GibbsLDA++: A C/C++ Implementation of Latent Dirichlet Allocation
http://gibbslda.sourceforge.net/
GibbsLDA++ is a C/C++ implementation of Latent Dirichlet Allocation (LDA) using Gibbs Sampling technique for parameter estimation and inference. It is very fast and is designed to analyze hidden/latent topic structures of large-scale datasets including large collections of text/Web documents. LDA was first introduced by David Blei et al [Blei03]. There have been several implementations of this model in C (using Variational Methods), Java, and Matlab. We decided to release this implementation of LDA in C/C++ using Gibbs Sampling to provide an alternative to the topic-model community.

GibbsLDA++ is useful for the following potential application areas:

Information retrieval and search (analyzing semantic/latent topic/concept structures of large text collection for a more intelligent information search).
Document classification/clustering, document summarization, and text/web mining community in general.
Content-based image clustering, object recognition, and other applications of computer vision in general.
Other potential applications in biological data.
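To make "LDA using Gibbs Sampling" a bit more concrete, here is a minimal collapsed Gibbs sampler in pure Python. It is a teaching sketch under my own assumptions, not GibbsLDA++'s code: the function name `lda_gibbs`, the hyperparameter defaults, and the toy corpus are all made up for illustration, and documents are assumed to be lists of integer word ids.

```python
import random

def lda_gibbs(docs, num_topics, vocab_size, alpha=0.1, beta=0.01, iters=200, seed=0):
    """Collapsed Gibbs sampling for LDA; returns (theta, phi) estimates."""
    rng = random.Random(seed)
    doc_topic = [[0] * num_topics for _ in docs]                 # words in doc d assigned to topic k
    topic_word = [[0] * vocab_size for _ in range(num_topics)]   # times word w is assigned to topic k
    topic_total = [0] * num_topics                               # total words assigned to topic k
    assign = []                                                  # topic assignment of every token

    # Start from a random topic assignment for every token.
    for d, doc in enumerate(docs):
        z_d = []
        for w in doc:
            k = rng.randrange(num_topics)
            z_d.append(k)
            doc_topic[d][k] += 1
            topic_word[k][w] += 1
            topic_total[k] += 1
        assign.append(z_d)

    for _ in range(iters):
        for d, doc in enumerate(docs):
            for i, w in enumerate(doc):
                # Remove the current token from the counts.
                k = assign[d][i]
                doc_topic[d][k] -= 1
                topic_word[k][w] -= 1
                topic_total[k] -= 1
                # Full conditional p(z = t | everything else), up to a constant.
                weights = [
                    (doc_topic[d][t] + alpha)
                    * (topic_word[t][w] + beta)
                    / (topic_total[t] + vocab_size * beta)
                    for t in range(num_topics)
                ]
                # Resample the topic and put the token back.
                k = rng.choices(range(num_topics), weights=weights)[0]
                assign[d][i] = k
                doc_topic[d][k] += 1
                topic_word[k][w] += 1
                topic_total[k] += 1

    # Smoothed point estimates from the final counts.
    theta = [[(c + alpha) / (len(doc) + num_topics * alpha) for c in row]
             for row, doc in zip(doc_topic, docs)]
    phi = [[(c + beta) / (topic_total[k] + vocab_size * beta) for c in topic_word[k]]
           for k in range(num_topics)]
    return theta, phi

# Toy corpus: word ids 0-2 and 3-5 tend to appear in different documents.
docs = [[0, 1, 0, 1, 2], [3, 4, 3, 4, 5], [0, 2, 1, 0], [4, 5, 3, 3]]
theta, phi = lda_gibbs(docs, num_topics=2, vocab_size=6)
```

Here `theta[d]` is the estimated topic mixture of document `d` and `phi[k]` is the estimated word distribution of topic `k`. A real tool adds convergence checks, sparse data structures, and inference on unseen documents, which is where optimized implementations pay off.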







####
Gensim
http://radimrehurek.com/gensim/
Gensim is a FREE Python library
Scalable statistical semantics
Analyze plain-text documents for semantic structure
Retrieve semantically similar documents






####
Stanford Topic Modeling Toolbox
http://nlp.stanford.edu/software/tmt/tmt-0.4/
The Stanford Topic Modeling Toolbox (TMT) brings topic modeling tools to social scientists and others who wish to perform analysis on datasets that have a substantial textual component. The toolbox features the ability to:


Import and manipulate text from cells in Excel and other spreadsheets.
Train topic models (LDA, Labeled LDA, and PLDA) to create summaries of the text.
Select parameters (such as the number of topics) via a data-driven process.
Generate rich Excel-compatible outputs for tracking word usage across topics, time, and other groupings of data.
The Stanford Topic Modeling Toolbox was written at the Stanford NLP group by
Daniel Ramage and Evan Rosen, first released in September 2009.




####
Matlab Topic Modeling Toolbox 1.4
http://psiexp.ss.uci.edu/research/programs_data/toolbox.htm
Installation & Licensing

Download the zipped toolbox (18Mb).
NOTE: this toolbox now works with 64 bit compilers. The old version of this toolbox, which has the code for 32 bit compilers, is linked from the page above.

The program is free for scientific use. Please contact the authors, if you are planning to use the software for commercial purposes. The software must not be further distributed without prior permission of the author. By using this software, you are agreeing to this license statement.

Type 'help function' at command prompt for more information on each function

See the notes on data format on the toolbox page for a description of the input and output format for the different topic models

Note for MAC and Linux users: some of the Matlab functions are implemented with mex code (C code linked to Matlab). For windows based platforms, the dll's are already provided in the distribution package. For other platforms, please compile the mex functions by executing "compilescripts" at the Matlab prompt



















#
Last of all,
here is a Topic Modeling Bibliography:
http://www.cs.princeton.edu/~mimno/topics.html


Reposted from: https://www.cnblogs.com/parapax/p/3714239.html
