MIT Natural Language Processing, Lecture 3: Probabilistic Language Modeling (Parts 1, 2, and 3)
MIT Natural Language Processing, Lecture 3: Probabilistic Language Modeling (Part 1)
Natural Language Processing: Probabilistic Language Modeling
Author: Regina Barzilay (MIT, EECS Department, November 15, 2004)
Translator: 我愛自然語言處理 (www.52nlp.cn, January 16, 2009)
Last time:
Corpora processing
Zipf's law
Data sparseness
Today:
Probabilistic language modeling
I. Introduction
a) Predicting String Probabilities
i. Which string is more likely? (Which string is more grammatical?)
1. Grill doctoral candidates.
2. Grill doctoral updates.
(example from Lee 1997)
ii. Methods for assigning probabilities to strings are called language models
b) 動(dòng)機(jī)(Motivation)
i. 語音識(shí)別,拼寫檢查,光學(xué)字符識(shí)別和其他應(yīng)用領(lǐng)域(Speech recognition, spelling correction, optical character recognition and other applications)
ii. 讓E作為物證(?不確定翻譯),我們需要決定字符串W是否是有E編碼而得到的消息(Let E be physical evidence, and we need to determine whether the string W is the message encoded by E)
iii. 使用貝葉斯規(guī)則(Use Bayes Rule):
P(W/E)={P_{LM}(W)P(E/W)}/{P(E)}
其中P_{LM}(W)是語言模型概率(where?P_{LM}(W)is language model probability)
iv. P_{LM}(W)提供了必要的消歧信息(P_{LM}(W)provides the information necessary for isambiguation (esp. when the physical evidence is not sufficient for disambiguation))
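A minimal sketch of this noisy-channel decision rule in Python. Every string and probability below is invented for illustration (none come from the lecture); since P(E) is the same for all candidate strings W, it can be dropped when comparing them.

```python
# Noisy-channel decoding sketch: pick the string W that maximizes
# P_LM(W) * P(E|W).  P(E) is constant across candidates, so it is ignored.

def decode(candidates, lm_prob, channel_prob):
    """Return the candidate W with the highest P_LM(W) * P(E|W)."""
    return max(candidates, key=lambda w: lm_prob[w] * channel_prob[w])

# Made-up illustrative numbers:
lm_prob = {                       # P_LM(W): language model prior
    "Grill doctoral candidates.": 1e-7,
    "Grill doctoral updates.": 1e-9,
}
channel_prob = {                  # P(E|W): evidence given the string
    "Grill doctoral candidates.": 0.3,
    "Grill doctoral updates.": 0.4,
}
print(decode(list(lm_prob), lm_prob, channel_prob))
# Grill doctoral candidates. -- the LM prior outweighs the channel term
```

Even though the channel slightly favors the other candidate, the language model prior dominates, which is exactly the disambiguation role described above.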
c) 如何計(jì)算(How to Compute it)?
i. 樸素方法(Naive approach):
1. 使用最大似然估計(jì)(Use the maximum likelihood estimates (MLE))——字符串在語料庫S中存在次數(shù)的值由語料庫規(guī)模歸一化(the number of times that the string occurs in the corpus S, normalized by the corpus size):
P_{MLE}(Grill~doctorate~candidates)={count(Grill~doctorate~candidates)}/delim{|}{S}{|}
2. 對(duì)于未知事件,最大似然估計(jì)P_{MLE}=0(For unseen events,?P_{MLE}=0)
——數(shù)據(jù)稀疏問題比較“可怕”(Dreadful behavior in the presence of Data Sparseness)
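A minimal sketch of this MLE estimator, using a toy corpus invented for illustration. It also shows the dreadful behavior on unseen events: any phrase not in the corpus gets probability exactly zero.

```python
def p_mle(phrase, corpus):
    """MLE probability of `phrase`: the number of times it occurs in the
    corpus as a contiguous token sequence, divided by the corpus size."""
    tokens = corpus.split()
    words = phrase.split()
    n = len(words)
    count = sum(1 for i in range(len(tokens) - n + 1)
                if tokens[i:i + n] == words)
    return count / len(tokens)

# Toy 7-token corpus (invented):
corpus = "grill doctorate candidates ask professors cook professors"
print(p_mle("ask professors", corpus))    # 1/7, approx. 0.143
print(p_mle("grill professors", corpus))  # 0.0 -- unseen event
```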
d) 兩個(gè)著名的句子(Two Famous Sentences)
i. “It is fair to assume that neither sentence
“Colorless green ideas sleep furiously”
nor
“Furiously sleep ideas green colorless”
… has ever occurred … Hence, in any statistical model … these sentences will be ruled out on identical grounds as equally “remote” from English. Yet (1), though nonsensical, is grammatical, while (2) is not.” [Chomsky 1957]
ii. Translator's note: this quote is from page 9 of Chomsky's Syntactic Structures. Neither of the two sentences below has ever occurred in an English discourse, and from a statistical standpoint both are equally "remote" from English, yet only sentence 1 is grammatical:
1) Colorless green ideas sleep furiously.
2) Furiously sleep ideas green colorless.
Whether they have "never occurred in English discourse" and are "equally remote from English statistically" depends on the level of analysis: if we set the specific words aside and look at word classes instead, the pattern of sentence 1 is presumably more frequent than that of sentence 2 and has in fact occurred in English.
未完待續(xù):第二部分
Appendix: MIT course page (slides in PDF):
http://people.csail.mit.edu/regina/6881/
Note: this translation is published under the MIT OpenCourseWare Creative Commons license; please credit 我愛自然語言處理 (www.52nlp.cn) when reposting.
Source: http://www.52nlp.cn/mit-nlp-third-lesson-probabilistic-language-modeling-first-part/
MIT Natural Language Processing, Lecture 3: Probabilistic Language Modeling (Part 2)
II. Building a Language Model
a) The Language Modeling Problem
i. Start with some vocabulary:
V = {the, a, doctorate, candidate, Professors, grill, cook, ask, …}
ii. Get a training sample over V:
Grill doctorate candidate.
Cook Professors.
Ask Professors.
……
iii. Assumption: the training sample is drawn from some underlying distribution P
iv. Goal: learn a probability distribution P' "as close" to P as possible
\sum_{x \in V} P'(x) = 1, \quad P'(x) \ge 0
e.g. P'(\text{candidates}) = 10^{-5}
P'(\text{ask candidates}) = 10^{-8}
b) Deriving the Language Model
i. Assign a probability to a word sequence w_1 w_2 \ldots w_n
ii. Apply the chain rule (S and E mark the start and end of the sentence):
1. P(w_1 w_2 \ldots w_n) = P(w_1 \mid S) \cdot P(w_2 \mid S, w_1) \cdot P(w_3 \mid S, w_1, w_2) \cdots P(E \mid S, w_1, w_2, \ldots, w_n)
2. History-based model: we predict future events from past events
3. How much context do we need to take into account?
c) Markov Assumption
i. For arbitrarily long contexts, P(w_i \mid w_1 \ldots w_{i-1}) is difficult to estimate
ii. Markov assumption: w_i depends only on the n preceding words
iii. Trigrams (second-order Markov model):
1. P(w_i \mid S, w_1, w_2, \ldots, w_{i-1}) = P(w_i \mid w_{i-2}, w_{i-1})
2. P(w_1 w_2 \ldots w_n) = P(w_1 \mid S) \cdot P(w_2 \mid S, w_1) \cdot P(w_3 \mid w_1, w_2) \cdots P(E \mid w_{n-1}, w_n)
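The trigram factorization above can be sketched as follows. For simplicity the sequence is padded with two start markers so every word has a full trigram context (a slight variant of the slide's P(w_1 | S) and P(w_2 | S, w_1) terms), and the probability table is invented for illustration.

```python
import math

def trigram_logprob(sentence, p):
    """Log probability of a sentence under a trigram model: the product
    of conditional trigram probabilities, computed in log space.
    `p` maps (w_{i-2}, w_{i-1}, w_i) to P(w_i | w_{i-2}, w_{i-1});
    "S"/"E" are start/end markers."""
    words = ["S", "S"] + sentence.split() + ["E"]
    logp = 0.0
    for i in range(2, len(words)):
        logp += math.log(p[(words[i - 2], words[i - 1], words[i])])
    return logp

# Hand-made toy probability table (invented numbers):
p = {
    ("S", "S", "cook"): 0.4,
    ("S", "cook", "professors"): 0.5,
    ("cook", "professors", "E"): 0.9,
}
lp = trigram_logprob("cook professors", p)
print(math.exp(lp))  # 0.4 * 0.5 * 0.9, approx. 0.18
```

Summing log probabilities instead of multiplying raw probabilities avoids numeric underflow on long sentences.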
d) A Computational Model of Language
i. A useful conceptual and practical device: coin-flipping models
1. A sentence is generated by a randomized algorithm
- The generator can be in one of several "states"
- Flip coins to choose the next state
- Flip other coins to decide which letter or word to output
ii. Shannon: "The states will correspond to the 'residue of influence' from preceding letters"
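A minimal sketch of such a coin-flipping generator, here with word states and a hand-made bigram table (all words and probabilities are invented). The current state is the previous word, and each step "flips coins" by sampling the next word from that state's conditional distribution.

```python
import random

def generate(bigram, max_len=10, seed=None):
    """Sample a sentence from a bigram model by repeated weighted
    coin flips.  "S"/"E" are start/end markers."""
    rng = random.Random(seed)
    state, out = "S", []
    while len(out) < max_len:
        words, probs = zip(*bigram[state].items())
        state = rng.choices(words, weights=probs)[0]
        if state == "E":          # end marker: stop generating
            break
        out.append(state)
    return " ".join(out)

# Toy bigram table (invented numbers):
bigram = {
    "S": {"grill": 0.5, "cook": 0.5},
    "grill": {"doctorate": 1.0},
    "cook": {"professors": 1.0},
    "doctorate": {"candidates": 1.0},
    "candidates": {"E": 1.0},
    "professors": {"E": 1.0},
}
print(generate(bigram, seed=0))
```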
e) Word-Based Approximations
Translator's note: the sentences below were generated at random from models trained on Shakespeare; see Jurafsky & Martin, Speech and Language Processing
i. Unigram approximation (the MIT slides mislabel this as "first-order approximation"):
1. To him swallowed confess hear both. which. OF save
2. on trail for are ay device and rote life have
3. Every enter now severally so, let
4. Hill he late speaks; or! a more to leg less first you
5. enter
ii. Trigram approximation (the slides mislabel this as "third-order approximation"):
1. King Henry. What! I will go seek the traitor Gloucester.
2. Exeunt some of the watch. A great banquet serv'd in;
3. Will you tell me how I am?
4. It cannot be but so.
未完待續(xù):第三部分
附:課程及課件pdf下載MIT英文網(wǎng)頁地址:
http://people.csail.mit.edu/regina/6881/
注:本文遵照麻省理工學(xué)院開放式課程創(chuàng)作共享規(guī)范翻譯發(fā)布,轉(zhuǎn)載請(qǐng)注明出處“我愛自然語言處理”:www.52nlp.cn
Source: http://www.52nlp.cn/mit-nlp-third-lesson-probabilistic-language-modeling-second-part/
MIT Natural Language Processing, Lecture 3: Probabilistic Language Modeling (Part 3)
III. Evaluating a Language Model
a) Evaluating a Language Model
i. We have n test strings:
S_1, S_2, \ldots, S_n
ii. Consider their probability under our model:
\prod_{i=1}^{n} P(S_i)
or the log probability:
\log_2 \prod_{i=1}^{n} P(S_i) = \sum_{i=1}^{n} \log_2 P(S_i)
iii. Perplexity:
\text{Perplexity} = 2^{-x}
where x = \frac{1}{W} \sum_{i=1}^{n} \log_2 P(S_i)
and W is the total number of words in the test data
iv. Perplexity is a measure of the effective "branching factor"
1. Suppose we have a vocabulary V of size N, and the model predicts:
P(w) = \frac{1}{N} for all words in V
v. What is the perplexity then?
\text{Perplexity} = 2^{-x} where x = \log_2 \frac{1}{N}
so \text{Perplexity} = N
vi. Estimate of human performance (Shannon, 1951):
1. Shannon game - humans guess the next letter in a text
2. PP = 142 (1.3 bits/letter), uncased, open vocabulary
vii. Estimate of a trigram language model (Brown et al., 1992):
PP = 790 (1.75 bits/letter), cased, open vocabulary
未完待續(xù):第四部分
Source: http://www.52nlp.cn/mit-nlp-third-lesson-probabilistic-language-modeling-third-part/