Topics Covered:
1. What is NLP?
- A changing field
- Resources
- Tools
- Python libraries
- Example applications
- Ethics issues
2. Topic Modeling with NMF and SVD
Part 1, Part 2 (click to follow the links to the articles)
- Stop words, stemming, & lemmatization
- Term-document matrix
- Term Frequency-Inverse Document Frequency (TF-IDF)
- Singular Value Decomposition (SVD)
- Non-negative Matrix Factorization (NMF)
- Truncated SVD, Randomized SVD
3. Sentiment classification with Naive Bayes, logistic regression, and ngrams
Part 1 (click to follow the link to the article)
- Sparse matrix storage
- Counters
- The fastai library
- Naive Bayes
- Logistic regression
- Ngrams
- Logistic regression with Naive Bayes features, with trigrams
4. Regex (and re-visiting tokenization)
5. Language modeling & sentiment classification with deep learning
- Language model
- Transfer learning
- Sentiment classification
6. Translation with RNNs
- Review of embeddings
- BLEU metric
- Teacher forcing
- Bidirectional models
- Attention
7. Translation with the Transformer architecture
- Transformer model
- Multi-head attention
- Masking
- Label smoothing
8. Bias & ethics in NLP
- Bias in word embeddings
- Types of bias
- The attention economy
- Drowning in fraudulent/fake info
Topic Modeling with NMF and SVD: Part 2
Please find Part 1 here: Topic Modeling with NMF and SVD
Let’s wrap up some loose ends from last time.
The Two Cultures
This “debate” captures the tension between two approaches:
- modeling the underlying mechanism of a phenomenon
- using machine learning to predict outputs (without necessarily understanding the mechanisms that create them)
There was a research project (in 2007) that involved manually coding each of the above reactions. The scientists were determining whether the final system could generate the same outputs (in this case, the levels in the blood of various substrates) as were observed in clinical studies.
The equation for each reaction could be quite complex:
This is an example of modeling the underlying mechanism, and is very different from a machine learning approach. Source: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC2391141/
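The original figure is not reproduced here. As a purely hypothetical illustration of the kind of rate law such mechanistic models use (not the specific equation from the paper), an enzyme-catalyzed reaction is often modeled with a Michaelis-Menten term:

$$v = \frac{V_{\max}\,[S]}{K_m + [S]}$$

where v is the reaction rate, [S] is the substrate concentration, and V_max and K_m are constants fitted to experiment.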
The most popular word in each state
A time to remove stop words
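As a minimal sketch of why this matters (using scikit-learn's built-in English stop word list; the toy word list is invented for illustration):

# Hypothetical illustration: without filtering, function words like "the"
# dominate any frequency count; removing stop words leaves the content words.
from sklearn.feature_extraction.text import ENGLISH_STOP_WORDS

words = ["the", "of", "the", "a", "heathcliff", "marianne", "the"]
content_words = [w for w in words if w not in ENGLISH_STOP_WORDS]
print(content_words)  # ['heathcliff', 'marianne']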
Factorization is analogous to matrix decomposition
With Integers
Multiplication:
Factorization is the “opposite” of multiplication. Here, the factors have the nice property of being prime. Prime factorization is much harder than multiplication (which is good, because it's at the heart of encryption).
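A tiny sketch of this asymmetry (this uses sympy, which is an assumption on my part and not part of the notebook below):

# Multiplying primes together is trivial; recovering them is the hard direction.
from sympy import factorint

n = 2 * 3 * 7 * 11
print(n)              # 462: multiplication is easy
print(factorint(n))   # {2: 1, 3: 1, 7: 1, 11: 1}: factorization is the hard part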
With Matrices
Matrix decompositions are a way of taking matrices apart (the “opposite” of matrix multiplication). Similarly, we use matrix decompositions to come up with matrices that have nice properties. Taking matrices apart is harder than putting them together.
One application:
What are the nice properties that the matrices in an SVD decomposition have?
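One way to see those properties concretely is a quick numpy check (an illustrative sketch, not from the original post): the singular vectors are orthonormal, and the singular values are non-negative and sorted in decreasing order.

import numpy as np

A = np.random.normal(size=(5, 3))
U, s, Vt = np.linalg.svd(A, full_matrices=False)

print(np.allclose(U.T @ U, np.eye(3)))          # columns of U are orthonormal
print(np.allclose(Vt @ Vt.T, np.eye(3)))        # rows of Vt are orthonormal
print(np.all(s >= 0), np.all(np.diff(s) <= 0))  # s: non-negative, descending
print(np.allclose(A, U @ np.diag(s) @ Vt))      # and they reconstruct A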
Some Linear Algebra Review
Matrix-vector multiplication
Ax takes a linear combination of the columns of A, using the entries of x as coefficients: http://matrixmultiplication.xyz/
Matrix-matrix multiplication
In C = AB, each column of C is a linear combination of the columns of A, where the coefficients come from the corresponding column of B.
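Both facts are easy to verify numerically; here is a small numpy sketch with made-up matrices (illustrative only):

import numpy as np

A = np.arange(6).reshape(2, 3)
x = np.array([1.0, 2.0, 3.0])
# A @ x is a linear combination of A's columns, with coefficients from x
print(np.allclose(A @ x, 1*A[:, 0] + 2*A[:, 1] + 3*A[:, 2]))  # True

B = np.arange(12).reshape(3, 4)
C = A @ B
# column j of C combines A's columns using the coefficients in column j of B
j = 2
print(np.allclose(C[:, j], A @ B[:, j]))  # True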
Matrices as Transformations (source: NMF Tutorial)
The 3Blue1Brown Essence of Linear Algebra videos are fantastic. They give a much more visual & geometric perspective on linear algebra than how it is typically taught. These videos are a great resource if you are a linear algebra beginner, or feel uncomfortable or rusty with the material.
Even if you are a linear algebra pro, I still recommend these videos for a new perspective, and they are very well made.
In [2]:
from IPython.display import YouTubeVideo
YouTubeVideo("kYB8IZa5AuE")

British Literature SVD & NMF in Excel
Data was downloaded from here.
The code below was used to create the matrices displayed in the SVD and NMF of British Literature Excel workbook. The data is intended to be viewed in Excel; I've just included the code here for thoroughness.
Initializing, creating the document-term matrix
In [2]:
import numpy as np
from sklearn.feature_extraction.text import CountVectorizer, TfidfVectorizer
from sklearn import decomposition
from glob import glob
import os

In [3]:
np.set_printoptions(suppress=True)

In [46]:
filenames = []
for folder in ["british-fiction-corpus"]:  # , "french-plays", "hugo-les-misérables"
    filenames.extend(glob("data/literature/" + folder + "/*.txt"))
In [47]:
len(filenames)

Out[47]:
27

In [134]:
vectorizer = TfidfVectorizer(input='filename', stop_words='english')
dtm = vectorizer.fit_transform(filenames).toarray()
vocab = np.array(vectorizer.get_feature_names())
dtm.shape, len(vocab)

Out[134]:
((27, 55035), 55035)

In [135]:
[f.split("/")[3] for f in filenames]

Out[135]:
['Sterne_Tristram.txt',
'Austen_Pride.txt',
'Thackeray_Pendennis.txt',
'ABronte_Agnes.txt',
'Austen_Sense.txt',
'Thackeray_Vanity.txt',
'Trollope_Barchester.txt',
'Fielding_Tom.txt',
'Dickens_Bleak.txt',
'Eliot_Mill.txt',
'EBronte_Wuthering.txt',
'Eliot_Middlemarch.txt',
'Fielding_Joseph.txt',
'ABronte_Tenant.txt',
'Austen_Emma.txt',
'Trollope_Prime.txt',
'CBronte_Villette.txt',
'CBronte_Jane.txt',
'Richardson_Clarissa.txt',
'CBronte_Professor.txt',
'Dickens_Hard.txt',
'Eliot_Adam.txt',
'Dickens_David.txt',
'Trollope_Phineas.txt',
'Richardson_Pamela.txt',
'Sterne_Sentimental.txt',
'Thackeray_Barry.txt']
NMF
In [136]:
clf = decomposition.NMF(n_components=10, random_state=1)
W1 = clf.fit_transform(dtm)
H1 = clf.components_

In [137]:
num_top_words = 8

def show_topics(a):
    top_words = lambda t: [vocab[i] for i in np.argsort(t)[:-num_top_words-1:-1]]
    topic_words = [top_words(t) for t in a]
    return [' '.join(t) for t in topic_words]
In [138]:
def get_all_topic_words(H):
    top_indices = lambda t: {i for i in np.argsort(t)[:-num_top_words-1:-1]}
    topic_indices = [top_indices(t) for t in H]
    return sorted(set.union(*topic_indices))

In [139]:
ind = get_all_topic_words(H1)

In [140]:
vocab[ind]

Out[140]:
array(['adams', 'allworthy', 'bounderby', 'brandon', 'catherine', 'cathy',
'corporal', 'crawley', 'darcy', 'dashwood', 'did', 'earnshaw',
'edgar', 'elinor', 'emma', 'father', 'ferrars', 'finn', 'glegg',
'good', 'gradgrind', 'hareton', 'heathcliff', 'jennings', 'jones',
'joseph', 'know', 'lady', 'laura', 'like', 'linton', 'little', 'll',
'lopez', 'louisa', 'lyndon', 'maggie', 'man', 'marianne', 'miss',
'mr', 'mrs', 'old', 'osborne', 'pendennis', 'philip', 'phineas',
'quoth', 'said', 'sissy', 'sophia', 'sparsit', 'stephen', 'thought',
'time', 'tis', 'toby', 'tom', 'trim', 'tulliver', 'uncle', 'wakem',
'wharton', 'willoughby'],
dtype='<U31')
In [141]:
show_topics(H1)

Out[141]:
['mr said mrs miss emma darcy little know',
'said little like did time know thought good',
'adams jones said lady allworthy sophia joseph mr',
'elinor marianne dashwood jennings willoughby mrs brandon ferrars',
'maggie tulliver said tom glegg philip mr wakem',
'heathcliff linton hareton catherine earnshaw cathy edgar ll',
'toby said uncle father corporal quoth tis trim',
'phineas said mr lopez finn man wharton laura',
'said crawley lyndon pendennis old little osborne lady',
'bounderby gradgrind sparsit said mr sissy louisa stephen']
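As a possible follow-up (not in the original notebook), each row of W1 holds one novel's weights over the 10 topics, so taking the argmax of each row pairs every book with its dominant NMF topic:

# Illustrative: match each file to the topic with the largest weight in W1
doc_names = [f.split("/")[3] for f in filenames]
for name, weights in zip(doc_names, W1):
    print(name, "-> topic", weights.argmax())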
In [142]:
W1.shape, H1[:, ind].shape

Out[142]:
((27, 10), (10, 64))

Export to CSVs
In [72]:
from IPython.display import FileLink, FileLinks

In [119]:
np.savetxt("britlit_W.csv", W1, delimiter=",", fmt='%.14f')
FileLink('britlit_W.csv')

Out[119]:
britlit_W.csv
In [120]:
np.savetxt("britlit_H.csv", H1[:,ind], delimiter=",", fmt='%.14f')
FileLink('britlit_H.csv')

Out[120]:
britlit_H.csv
In [131]:
np.savetxt("britlit_raw.csv", dtm[:,ind], delimiter=",", fmt='%.14f')
FileLink('britlit_raw.csv')

Out[131]:
britlit_raw.csv
In [121]:
[str(word) for word in vocab[ind]]

Out[121]:
['adams',
'allworthy',
'bounderby',
'brandon',
'catherine',
'cathy',
'corporal',
'crawley',
'darcy',
'dashwood',
'did',
'earnshaw',
'edgar',
'elinor',
'emma',
'father',
'ferrars',
'finn',
'glegg',
'good',
'gradgrind',
'hareton',
'heathcliff',
'jennings',
'jones',
'joseph',
'know',
'lady',
'laura',
'like',
'linton',
'little',
'll',
'lopez',
'louisa',
'lyndon',
'maggie',
'man',
'marianne',
'miss',
'mr',
'mrs',
'old',
'osborne',
'pendennis',
'philip',
'phineas',
'quoth',
'said',
'sissy',
'sophia',
'sparsit',
'stephen',
'thought',
'time',
'tis',
'toby',
'tom',
'trim',
'tulliver',
'uncle',
'wakem',
'wharton',
'willoughby']
SVD
In [143]:
U, s, V = decomposition.randomized_svd(dtm, 10)

In [144]:
ind = get_all_topic_words(V)

In [145]:
len(ind)

Out[145]:
52

In [146]:
vocab[ind]

Out[146]:
array(['adams', 'allworthy', 'bounderby', 'bretton', 'catherine',
'crimsworth', 'darcy', 'dashwood', 'did', 'elinor', 'elton', 'emma',
'finn', 'fleur', 'glegg', 'good', 'gradgrind', 'hareton', 'hath',
'heathcliff', 'hunsden', 'jennings', 'jones', 'joseph', 'knightley',
'know', 'lady', 'linton', 'little', 'lopez', 'louisa', 'lydgate',
'madame', 'maggie', 'man', 'marianne', 'miss', 'monsieur', 'mr',
'mrs', 'pelet', 'philip', 'phineas', 'said', 'sissy', 'sophia',
'sparsit', 'toby', 'tom', 'tulliver', 'uncle', 'weston'],
dtype='<U31')
In [147]:
show_topics(H1)

Out[147]:
['mr said mrs miss emma darcy little know',
'said little like did time know thought good',
'adams jones said lady allworthy sophia joseph mr',
'elinor marianne dashwood jennings willoughby mrs brandon ferrars',
'maggie tulliver said tom glegg philip mr wakem',
'heathcliff linton hareton catherine earnshaw cathy edgar ll',
'toby said uncle father corporal quoth tis trim',
'phineas said mr lopez finn man wharton laura',
'said crawley lyndon pendennis old little osborne lady',
'bounderby gradgrind sparsit said mr sissy louisa stephen']
In [148]:
np.savetxt("britlit_U.csv", U, delimiter=",", fmt='%.14f')
FileLink('britlit_U.csv')

Out[148]:
britlit_U.csv
In [149]:
np.savetxt("britlit_V.csv", V[:,ind], delimiter=",", fmt='%.14f')
FileLink('britlit_V.csv')

Out[149]:
britlit_V.csv
In [150]:
np.savetxt("britlit_raw_svd.csv", dtm[:,ind], delimiter=",", fmt='%.14f')
FileLink('britlit_raw_svd.csv')

Out[150]:
britlit_raw_svd.csv
In [151]:
np.savetxt("britlit_S.csv", np.diag(s), delimiter=",", fmt='%.14f')
FileLink('britlit_S.csv')

Out[151]:
britlit_S.csv
In [152]:
[str(word) for word in vocab[ind]]

Out[152]:
['adams',
'allworthy',
'bounderby',
'bretton',
'catherine',
'crimsworth',
'darcy',
'dashwood',
'did',
'elinor',
'elton',
'emma',
'finn',
'fleur',
'glegg',
'good',
'gradgrind',
'hareton',
'hath',
'heathcliff',
'hunsden',
'jennings',
'jones',
'joseph',
'knightley',
'know',
'lady',
'linton',
'little',
'lopez',
'louisa',
'lydgate',
'madame',
'maggie',
'man',
'marianne',
'miss',
'monsieur',
'mr',
'mrs',
'pelet',
'philip',
'phineas',
'said',
'sissy',
'sophia',
'sparsit',
'toby',
'tom',
'tulliver',
'uncle',
'weston']
Randomized SVD offers a speed-up
One way to address the cost of computing a full SVD is to use randomized SVD. In the chart below, the error is the difference between A and USV, that is, what you've failed to capture in your decomposition:
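The chart itself is not reproduced here, but a rough sketch of the comparison on this corpus could look like the following (timings are machine-dependent, and the code reuses dtm and the decomposition import from the cells above):

import time

t0 = time.time()
np.linalg.svd(dtm, full_matrices=False)  # exact reduced SVD
print("exact SVD: %.2fs" % (time.time() - t0))

t0 = time.time()
U, s, V = decomposition.randomized_svd(dtm, 10)  # same call as the SVD section
print("randomized SVD: %.2fs" % (time.time() - t0))

# the "error" above: how much of dtm the rank-10 factors fail to capture
print(np.linalg.norm(dtm - U @ np.diag(s) @ V))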
For more on randomized SVD, check out my PyBay 2017 talk. For significantly more on randomized SVD, check out the Computational Linear Algebra course.
Full vs Reduced SVD
Remember how we were calling np.linalg.svd(vectors, full_matrices=False)? We set full_matrices=False to calculate the reduced SVD. For the full SVD, both U and V are square matrices, where the extra columns in U form an orthonormal basis (but zero out when multiplied by the extra rows of zeros in S).
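A quick shape check makes the distinction concrete (a sketch on a random tall matrix, not part of the original notebook):

import numpy as np

A = np.random.normal(size=(10, 4))

U, s, Vt = np.linalg.svd(A)  # full SVD
print(U.shape, s.shape, Vt.shape)  # (10, 10) (4,) (4, 4)

U_r, s_r, Vt_r = np.linalg.svd(A, full_matrices=False)  # reduced SVD
print(U_r.shape, s_r.shape, Vt_r.shape)  # (10, 4) (4,) (4, 4)

# the 6 extra columns of the full U contribute nothing to the reconstruction
print(np.allclose(A, U_r @ np.diag(s_r) @ Vt_r))  # True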
Diagrams from Trefethen:
End
Credits:
https://www.fast.ai/
Translated from: https://medium.com/ai-in-plain-english/topics-covered-7feba459180f