
PyTorch BiLSTM+CRF code explained: key points

Published: 2023/11/28, posted by 豆豆


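I. Introduction to BiLSTM + CRF

https://www.jianshu.com/p/97cb3b6db573

1. Introduction

Neural-network-based methods are very popular for named entity recognition (NER). If you do not know what Bi-LSTM and CRF are, just remember that they are two layers of an NER model.

1.1 Before we start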

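Assume our dataset has two entity types, Person and Organization, so the training data uses five labels:

B-Person, I-Person, B-Organization, I-Organization, O

Suppose sentence x consists of five words w0, w1, w2, w3, w4, where [w0, w1] is a Person entity, [w3] is an Organization entity, and all other words are labeled "O".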

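1.2 The BiLSTM-CRF model

The model works as follows:
First, each unit of sentence x is represented by a vector, either a character embedding or a word embedding. Character embeddings are typically randomly initialized, while word embeddings are typically pre-trained on data. All embeddings are fine-tuned during training.
Second, these embeddings are the input to the BiLSTM-CRF model, and the output is a label for each unit of sentence x.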

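Figure 1. Bi-LSTM structure

You generally do not need the details of the BiLSTM layer. Still, knowing what its output layer produces makes the CRF layer much easier to understand.

Figure 2. How the Bi-LSTM predicts label scores

As the figure shows, the BiLSTM layer outputs a score for every label. For example, for unit w0 the BiLSTM outputs

1.5 (B-Person), 0.9 (I-Person), 0.1 (B-Organization), 0.08 (I-Organization), 0.05 (O).

These scores are the input to the CRF layer.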

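1.3 What happens without a CRF layer?

As you may have noticed, we could train a BiLSTM NER model even without a CRF layer, as shown below:

Figure 3. BiLSTM NER model without the CRF layer

The BiLSTM outputs a score for every label of each unit, so we can simply pick the highest-scoring label for each unit. For w0, "B-Person" has the highest score (1.5), so we predict "B-Person" for w0. Likewise we get "I-Person" for w1, "O" for w2, "B-Organization" for w3 and "O" for w4.
In this case we get the correct label for every unit of x, but the approach gives no such guarantee. In the example of Figure 4, the predicted label sequences are "I-Organization I-Person" and "B-Organization I-Person". Both are clearly invalid.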
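Here is a minimal sketch of this greedy, per-unit decoding in plain Python. Only the w0 scores come from the example above. The `LABELS` order and the `greedy_label` helper are illustrative assumptions.

```python
# Label order assumed to match the score list for w0 above
LABELS = ["B-Person", "I-Person", "B-Organization", "I-Organization", "O"]

# BiLSTM scores for unit w0, taken from the example above
w0_scores = [1.5, 0.9, 0.1, 0.08, 0.05]


def greedy_label(scores):
    """Pick the highest-scoring label for one unit, ignoring its neighbors (no CRF)."""
    best = max(range(len(scores)), key=lambda j: scores[j])
    return LABELS[best]


print(greedy_label(w0_scores))  # B-Person
```

Each unit is decoded on its own. Nothing in this procedure stops the output from forming an invalid sequence such as "I-Organization I-Person".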


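1.4 The CRF layer learns constraints from the training data

The CRF layer can add constraints to the final predicted labels so that they are valid. It learns these constraints automatically from the training data during training.
The constraints might be:
I: The label of the first word in a sentence should start with "B-" or be "O". It should not start with "I-".
II: In the pattern "B-label1 I-label2 I-label3 I-…", label1, label2 and label3 should be the same entity type. For example, "B-Person I-Person" is a valid sequence, but "B-Person I-Organization" is invalid.
III: "O I-label" is invalid. The first label of an entity should start with "B-", not "I-". In other words, the valid pattern is "O B-label".
With these constraints, invalid predicted label sequences become much less likely.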
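The CRF learns these constraints as transition scores rather than as hard rules. A hand-written checker still makes rules I-III concrete. Below is a minimal sketch. `is_valid_bio` is a hypothetical helper and is not part of any library.

```python
def is_valid_bio(tags):
    """Check rules I-III above for a BIO tag sequence such as ["B-Person", "I-Person"]."""
    prev = "O"  # treat the position before the first word as "O", which covers rule I
    for tag in tags:
        if tag.startswith("I-"):
            entity = tag[2:]
            # Rules I and III: an I- tag cannot follow "O" or start the sentence.
            # Rule II: it must continue an entity of the same type.
            if prev == "O" or prev[2:] != entity:
                return False
        prev = tag
    return True


print(is_valid_bio(["B-Person", "I-Person"]))        # True
print(is_valid_bio(["B-Person", "I-Organization"]))  # False
```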

II. Label scores and the loss function

https://zhuanlan.zhihu.com/p/27338210

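The output dimension of the Bi-LSTM layer is the tag-set size, so for each word w_i it gives an emission score for every tag. Let P be the Bi-LSTM output matrix, where P_{i,j} is the unnormalized score of word w_i taking tag_j. For the CRF we assume a transition matrix A, where A_{i,j} is the score of transitioning from tag_i to tag_j.
For an input sequence X and an output tag sequence y, the score is defined as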

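$$S(X,y)=\sum_{i=0}^{n} A_{y_i,y_{i+1}}+\sum_{i=1}^{n} P_{i,y_i}$$

Here n is the sentence length, and y_0 and y_{n+1} are the added start and end tags, so A has two extra rows and columns for them.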
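Below is a small pure-Python sketch of this score. It assumes, as the formula implies, that the start and end tags are stored as the last two rows and columns of A, indexed A[from][to]. The function name and the toy numbers are made up for illustration.

```python
def path_score(P, A, y, start, end):
    """S(X, y) = sum_{i=0}^{n} A[y_i][y_{i+1}] + sum_{i=1}^{n} P[i][y_i].

    P: n x k emission scores (row i is word i, 0-based here).
    A: (k+2) x (k+2) transition scores indexed A[from][to].
    start, end: indices of the extra tags y_0 and y_{n+1}.
    """
    full = [start] + list(y) + [end]
    transitions = sum(A[full[i]][full[i + 1]] for i in range(len(full) - 1))
    emissions = sum(P[i][tag] for i, tag in enumerate(y))
    return transitions + emissions


# Toy example (hypothetical numbers): 2 words, real tags 0 and 1, START = 2, END = 3
P = [[1.0, 0.5],
     [0.2, 2.0]]
A = [[0.1, 0.4, 0.0, 0.3],   # from tag 0
     [0.2, 0.5, 0.0, 0.6],   # from tag 1
     [0.7, 0.1, 0.0, 0.0],   # from START
     [0.0, 0.0, 0.0, 0.0]]   # from END (never used)
# 0.7 + 0.4 + 0.6 (transitions) + 1.0 + 2.0 (emissions) = 4.7, up to float rounding
print(path_score(P, A, [0, 1], start=2, end=3))
```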

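Applying a softmax, we define a probability for each correct tag sequence y. Here Y_X denotes all tag sequences, including impossible ones:

$$p(y\mid X)=\frac{e^{S(X,y)}}{\sum_{\tilde{y}\in Y_X} e^{S(X,\tilde{y})}}$$

During training we therefore only need to maximize the likelihood p(y|X). Here we use the log-likelihood:

$$\log p(y\mid X)=S(X,y)-\log\Big(\sum_{\tilde{y}\in Y_X} e^{S(X,\tilde{y})}\Big)$$

So we define the loss as -log p(y|X) and train the network with gradient descent.
Loss function:

$$\mathrm{Loss}=-\log p(y\mid X)=\log\Big(\sum_{\tilde{y}\in Y_X} e^{S(X,\tilde{y})}\Big)-S(X,y)$$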
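In a toy example Y_X is small enough to enumerate, so the probability and the loss can be evaluated literally. The sketch below is brute force and exponential in sentence length, so it is for illustration only. `brute_force_loss` and the numbers are hypothetical.

```python
import itertools
import math


def path_score(P, A, y, start, end):
    # S(X, y): transitions A[from][to] along start -> y -> end, plus emissions P[i][y_i]
    full = [start, *y, end]
    return (sum(A[a][b] for a, b in zip(full, full[1:]))
            + sum(P[i][t] for i, t in enumerate(y)))


def brute_force_loss(P, A, y, start, end):
    """Return (p(y|X), -log p(y|X)) by enumerating every tag sequence in Y_X."""
    n, k = len(P), len(P[0])
    log_z = math.log(sum(math.exp(path_score(P, A, cand, start, end))
                         for cand in itertools.product(range(k), repeat=n)))
    loss = log_z - path_score(P, A, y, start, end)  # logsumexp term minus S(X, y)
    return math.exp(-loss), loss


# Toy example (hypothetical numbers): 2 words, real tags 0 and 1, START = 2, END = 3
P = [[1.0, 0.5], [0.2, 2.0]]
A = [[0.1, 0.4, 0.0, 0.3],
     [0.2, 0.5, 0.0, 0.6],
     [0.7, 0.1, 0.0, 0.0],
     [0.0, 0.0, 0.0, 0.0]]
prob, loss = brute_force_loss(P, A, [0, 1], start=2, end=3)
print(prob, loss)
```

Enumerating Y_X costs k^n path scores. That cost is why the next paragraph replaces the enumeration with a step-by-step computation.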

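When computing the loss, S(X,y) is easy to compute, but the term

$$\log\Big(\sum_{\tilde{y}\in Y_X} e^{S(X,\tilde{y})}\Big)$$

(called logsumexp below) is harder, because it needs the score of every possible path. There is a shortcut. For the paths ending at word w_{i+1}, first compute the logsumexp of the paths ending at word w_i, because

$$\log\Big(\sum e^{\log(\sum e^{x})+y}\Big)=\log\Big(\sum\sum e^{x+y}\Big)$$

So computing the path scores step by step gives the same result as computing the global score directly. It also greatly reduces computation time: with k tags and n words the work is $O(nk^2)$, instead of enumerating all $k^n$ paths.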
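The step-by-step computation can be checked against brute-force enumeration. The pure-Python sketch below assumes, as in the score formula, that the start and end tags are extra rows and columns of A, indexed A[from][to]. The function names and the random toy data are hypothetical.

```python
import itertools
import math
import random


def log_sum_exp(xs):
    # Numerically stable log(sum(exp(x) for x in xs))
    m = max(xs)
    return m + math.log(sum(math.exp(x - m) for x in xs))


def forward_log_z(P, A, start, end):
    """log sum_{y~} e^{S(X, y~)} computed word by word in O(n k^2)."""
    k = len(P[0])
    # alpha[j]: logsumexp of the scores of all partial paths ending in tag j at word 0
    alpha = [A[start][j] + P[0][j] for j in range(k)]
    for emit in P[1:]:
        # Extend every partial path by one word: add the transition into tag j,
        # collapse over the previous tag i, then add the emission of tag j
        alpha = [log_sum_exp([alpha[i] + A[i][j] for i in range(k)]) + emit[j]
                 for j in range(k)]
    return log_sum_exp([alpha[j] + A[j][end] for j in range(k)])


def brute_force_log_z(P, A, start, end):
    """The same quantity by enumerating all k**n paths, for checking."""
    n, k = len(P), len(P[0])
    scores = []
    for y in itertools.product(range(k), repeat=n):
        full = [start, *y, end]
        scores.append(sum(A[a][b] for a, b in zip(full, full[1:]))
                      + sum(P[i][t] for i, t in enumerate(y)))
    return log_sum_exp(scores)


# Random toy problem (hypothetical): 6 words, 3 real tags, START = 3, END = 4
rng = random.Random(0)
P = [[rng.uniform(-2, 2) for _ in range(3)] for _ in range(6)]
A = [[rng.uniform(-1, 1) for _ in range(5)] for _ in range(5)]
print(forward_log_z(P, A, 3, 4), brute_force_log_z(P, A, 3, 4))
```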

III. A closer look at the loss function

This article is very helpful for understanding it:

https://blog.csdn.net/cuihuijun1hao/article/details/79405740

For example, if the sentence 【我愛中國人民】 ("I love the Chinese people") is tagged 【N V N】, that tagging is one complete path and has one score.
Next, I want to explain this formula:

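$$\log\Big(\sum e^{\log(\sum e^{x})+y}\Big)=\log\Big(\sum\sum e^{x+y}\Big)$$

The formula clearly holds. Work it out on paper: $e^{\log(\sum e^{x})+y}=\sum e^{x}\cdot e^{y}=\sum e^{x+y}$. The code relies on exactly this principle.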
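Here is a quick numeric check of the identity with made-up numbers. On the left, each group of partial-path scores is first collapsed into one log-sum-exp, which is what the forward variable stores. On the right, every full path is summed directly.

```python
import math

# x[i]: scores of the partial paths that end in tag i (made-up numbers)
# y[i]: the score added when extending from tag i (transition + emission)
x = [[0.3, -1.2, 0.8], [1.5, 0.1]]
y = [0.4, -0.7]

lhs = math.log(sum(math.exp(math.log(sum(math.exp(v) for v in x[i])) + y[i])
                   for i in range(len(y))))
rhs = math.log(sum(math.exp(v + y[i]) for i in range(len(y)) for v in x[i]))
print(lhs, rhs)  # the two values agree up to floating-point rounding
```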

```python
def _forward_alg(self, feats):
    # Do the forward algorithm to compute the partition function
    init_alphas = torch.full((1, self.tagset_size), -10000.)
    # START_TAG has all of the score.
    init_alphas[0][self.tag_to_ix[START_TAG]] = 0.

    # Wrap in a variable so that we will get automatic backprop
    forward_var = init_alphas

    # Iterate through the sentence
    for feat in feats:
        alphas_t = []  # The forward tensors at this timestep
        for next_tag in range(self.tagset_size):
            # broadcast the emission score: it is the same regardless of
            # the previous tag
            emit_score = feat[next_tag].view(
                1, -1).expand(1, self.tagset_size)
            # the ith entry of trans_score is the score of transitioning to
            # next_tag from i
            trans_score = self.transitions[next_tag].view(1, -1)
            # The ith entry of next_tag_var is the value for the
            # edge (i -> next_tag) before we do log-sum-exp
            next_tag_var = forward_var + trans_score + emit_score
            # The forward variable for this tag is log-sum-exp of all the
            # scores.
            alphas_t.append(log_sum_exp(next_tag_var).view(1))
        forward_var = torch.cat(alphas_t).view(1, -1)
    terminal_var = forward_var + self.transitions[self.tag_to_ix[STOP_TAG]]
    alpha = log_sum_exp(terminal_var)
    return alpha
```

Note this line in the code:

next_tag_var = forward_var + trans_score + emit_score

This is the line we will focus on.
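The idea of the algorithm is as follows. Suppose we are doing part-of-speech tagging on the sentence 【我愛中國人民】. For this sentence we need to compute

$$\log\big(e^{S_1}+e^{S_2}+\cdots+e^{S_N}\big)$$

where $S_1,\ldots,S_N$ are the scores of all N possible taggings.
In words: compute the score of every possible tagging of the sentence, exponentiate the scores, add them up, and take the log. Enumerating every tagging quickly becomes expensive. Our example has only three words, but real sentences are much longer, so we need a simple, efficient algorithm. The code computes it like this. For each possible tag of 【愛】, first compute the log_sum_exp over all taggings of 【我, 愛】 that end in that tag. Then add the transition score into 【中國人民】 and the emission score of 【中國人民】 for a given tag, and take log_sum_exp again. The result equals exponentiating the scores of all taggings of 【我, 愛, 中國人民】 that end in that tag, summing them, and taking the log.
Let us verify that this is the case.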

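Assume there are only two POS tags, noun N and verb V. Then 【我, 愛】 has four tag combinations: N+N, N+V, V+N, V+V. Write $x_{ab}$ for the score of tagging 我 as a and 愛 as b, which is the emission scores plus the transition a→b.
When 【愛】 is tagged N, the log_sum_exp is

$$\alpha_N=\log\big(e^{x_{NN}}+e^{x_{VN}}\big)$$

When 【愛】 is tagged V, the log_sum_exp is

$$\alpha_V=\log\big(e^{x_{NV}}+e^{x_{VV}}\big)$$

These two values are exactly what the forward list holds:

$$\mathrm{forward}=[\alpha_N,\ \alpha_V]$$

Now assume 【中國人民】 is tagged N. Following the code, we add the transition score $t_{\cdot\to N}$ and the emission score $e_N$ of 【中國人民】 to the matching entries of the forward list, then take log_sum_exp:

$$\log\big(e^{\alpha_N+t_{N\to N}+e_N}+e^{\alpha_V+t_{V\to N}+e_N}\big)$$
$$=\log\big(e^{x_{NN}+t_{N\to N}+e_N}+e^{x_{VN}+t_{N\to N}+e_N}+e^{x_{NV}+t_{V\to N}+e_N}+e^{x_{VV}+t_{V\to N}+e_N}\big)$$

The result is the log_sum_exp over all four taggings of 【我, 愛, 中國人民】 that end in N, which is exactly what we wanted.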

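IV. Code walkthrough

https://blog.csdn.net/Jason__Liang/article/details/81772632

First, two important matrices:

feats: the emission score matrix, obtained by embedding the sentence and passing it through the LSTM and its linear output layer. Its shape is 11 × 5, where 11 is the sentence length and 5 is the number of tags. Entry feats[i][j] is the score of word i for tag j. These are the emission scores.

self.transitions: the transition matrix, of shape 5 × 5. transitions[i][j] is the score of transitioning from tag j to tag i, so transitions[i] (shape 1 × 5) holds the scores of transitioning from every tag into tag i. These are unnormalized scores rather than probabilities. The layout is the transpose of A_{i,j} in Section II.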

1. def log_sum_exp(vec)

```python
import torch


def argmax(vec):
    # Helper used below: index of the largest entry of a 1 x N tensor, as a Python int
    _, idx = torch.max(vec, 1)
    return idx.item()


# Compute log sum exp in a numerically stable way for the forward algorithm
def log_sum_exp(vec):  # vec is a 1 x 5 tensor
    max_score = vec[0, argmax(vec)]
    # max_score is a scalar; max_score.view(1, -1) is 1 x 1, and
    # max_score.view(1, -1).expand(1, vec.size()[1]) is 1 x 5
    max_score_broadcast = max_score.view(1, -1).expand(1, vec.size()[1])
    # Subtract the max inside exp() first to keep it from overflowing
    return max_score + \
        torch.log(torch.sum(torch.exp(vec - max_score_broadcast)))
```

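You may wonder why the returned expression subtracts max_score first. This is a small trick. Exponentiating right away can overflow, so we subtract max_score, apply log-sum-exp, and then add max_score back.
Mathematically it is equivalent to the following, which is numerically less stable: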

return torch.log(torch.sum(torch.exp(vec)))
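The difference shows up with large scores. The dependency-free sketch below compares the direct formula with the max-subtraction trick used in log_sum_exp above. Both function names are hypothetical. Python's `math.exp` raises `OverflowError` on overflow, whereas `torch.exp` returns `inf` instead.

```python
import math


def log_sum_exp_naive(xs):
    # Direct formula: math.exp overflows for large inputs
    return math.log(sum(math.exp(x) for x in xs))


def log_sum_exp_stable(xs):
    # Subtract the max first, then add it back
    m = max(xs)
    return m + math.log(sum(math.exp(x - m) for x in xs))


scores = [1000.0, 1000.0]
print(log_sum_exp_stable(scores))  # 1000 + log(2), about 1000.693
try:
    log_sum_exp_naive(scores)
except OverflowError:
    print("naive version overflowed")
```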

2. def neg_log_likelihood(self, sentence, tags)

If you read the whole code, you will find that neg_log_likelihood() is the loss function:

loss = model.neg_log_likelihood(sentence_in, targets)

Let us analyze neg_log_likelihood():

```python
def neg_log_likelihood(self, sentence, tags):
    # feats: 11 x 5, the output of the LSTM + Linear layers, used as the CRF input
    feats = self._get_lstm_features(sentence)
    forward_score = self._forward_alg(feats)
    gold_score = self._score_sentence(feats, tags)
    return forward_score - gold_score
```

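You may wonder why forward_score - gold_score works as the loss. Recall the loss function defined above:

$$\mathrm{Loss}=\log\Big(\sum_{\tilde{y}\in Y_X} e^{S(X,\tilde{y})}\Big)-S(X,y)$$

forward_score and gold_score are exactly the two terms on the right-hand side.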

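3. def _forward_alg(self, feats):

From the previous function we know that this one computes forward_score, the first term on the right-hand side of the loss:

$$\log\Big(\sum_{\tilde{y}\in Y_X} e^{S(X,\tilde{y})}\Big)$$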

```python
# Log-sum-exp score over ALL possible tag sequences (the log partition function),
# computed with dynamic programming. Early in training the transitions are random;
# in the author's run the value came out large (score > 20).
def _forward_alg(self, feats):
    # Do the forward algorithm to compute the partition function
    init_alphas = torch.full((1, self.tagset_size), -10000.)  # 1 x 5, all -10000
    # START_TAG has all of the score.
    # START_TAG has index 3, giving tensor([[-10000., -10000., -10000., 0., -10000.]]);
    # setting START to 0 means every path starts from START_TAG
    init_alphas[0][self.tag_to_ix[START_TAG]] = 0

    # Wrap in a variable so that we will get automatic backprop
    forward_var = init_alphas  # initial forward_var, updated at every step t

    # Iterate through the sentence: one iteration per row of feats
    for feat in feats:  # feat has 5 entries (one row of feats)
        alphas_t = []  # the forward tensors at this timestep
        for next_tag in range(self.tagset_size):  # next_tag = 0, 1, ..., tagset_size - 1
            # Broadcast the emission score: it is the same regardless of
            # the previous tag. Shape 1 x 5, all 5 values identical
            # (the LSTM output score for next_tag)
            emit_score = feat[next_tag].view(1, -1).expand(1, self.tagset_size)
            # The i-th entry of trans_score is the score of transitioning
            # to next_tag from i. Shape 1 x 5
            # (on the first pass, next_tag = B: scores from every tag to B)
            trans_score = self.transitions[next_tag].view(1, -1)
            # The i-th entry of next_tag_var is the value for the
            # edge (i -> next_tag) before we do log-sum-exp
            next_tag_var = forward_var + trans_score + emit_score
            # The forward variable for this tag is log-sum-exp of all the scores
            alphas_t.append(log_sum_exp(next_tag_var).view(1))
        # alphas_t is now a list of 5 tensors, e.g.
        # [tensor(0.8259), tensor(2.1739), tensor(1.3526), tensor(-9999.7168), tensor(-0.7102)]
        forward_var = torch.cat(alphas_t).view(1, -1)  # scores of the 5 tags up to step t
    # Finally, add the transition score from each tag of the last word to STOP_TAG, e.g.
    # tensor([[ 21.1036, 18.8673, 20.7906, -9982.2734, -9980.3135]])
    terminal_var = forward_var + self.transitions[self.tag_to_ix[STOP_TAG]]
    alpha = log_sum_exp(terminal_var)  # alpha is a 0-dim tensor
    return alpha
```

4. def _score_sentence(self, feats, tags)

From the analysis of function 2, this function computes gold_score, the second term of the loss, S(X, y).

```python
# Score of the gold (true) tag sequence.
# Like _forward_alg above, it uses the current (initially random) transition matrix.
# The difference: _forward_alg sums over ALL possible paths (log-sum-exp), while this
# function scores only the true path. For example, if the true tags are N V V but the
# random transitions favor N N N, the two scores differ; backpropagation then updates
# transitions, pushing them toward the "true" transition scores.
def _score_sentence(self, feats, tags):  # feats: 11 x 5, tags: 11
    # Gives the score of a provided tag sequence
    score = torch.zeros(1)
    # Prepend START_TAG (index 3) to the tag sequence, so tags now has 12 entries
    tags = torch.cat([torch.tensor([self.tag_to_ix[START_TAG]], dtype=torch.long), tags])
    for i, feat in enumerate(feats):
        # self.transitions[tags[i + 1], tags[i]] is the score of moving from tag tags[i]
        # to tag tags[i + 1] (transitions[j, i] is the score of i -> j).
        # feat is the output at step i with 5 entries (B, I, O, START_TAG, STOP_TAG);
        # feat[tags[i + 1]] picks the emission score of the true tag.
        score = score + \
            self.transitions[tags[i + 1], tags[i]] + feat[tags[i + 1]]
    score = score + self.transitions[self.tag_to_ix[STOP_TAG], tags[-1]]
    return score
```
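To see that forward_score - gold_score behaves as a proper loss, here is a dependency-free sketch. It mirrors _forward_alg and _score_sentence on plain Python lists, with transitions[to][from], START_TAG at index 3 and STOP_TAG at index 4, as in the comments above. It also applies the usual constraint that nothing transitions into START_TAG or out of STOP_TAG. The model's constructor is not shown in this article, so treat that constraint as an assumption. The function names and random data are illustrative.

```python
import itertools
import math
import random

# Hypothetical tag layout mirroring the code above: B, I, O, START_TAG, STOP_TAG
B, I, O, START, STOP = range(5)
REAL_TAGS = (B, I, O)


def log_sum_exp(xs):
    m = max(xs)
    return m + math.log(sum(math.exp(x - m) for x in xs))


def score_sentence(feats, transitions, tags):
    # Mirror of _score_sentence: gold_score = S(X, y), with transitions[to][from]
    tags = [START] + list(tags)
    score = 0.0
    for i, feat in enumerate(feats):
        score += transitions[tags[i + 1]][tags[i]] + feat[tags[i + 1]]
    return score + transitions[STOP][tags[-1]]


def forward_alg(feats, transitions):
    # Mirror of _forward_alg: forward_score = log of the sum of e^score over all paths
    size = len(transitions)
    forward = [-10000.0] * size
    forward[START] = 0.0
    for feat in feats:
        forward = [log_sum_exp([forward[prev] + transitions[nxt][prev] + feat[nxt]
                                for prev in range(size)])
                   for nxt in range(size)]
    return log_sum_exp([forward[prev] + transitions[STOP][prev] for prev in range(size)])


def neg_log_likelihood(feats, transitions, tags):
    return forward_alg(feats, transitions) - score_sentence(feats, transitions, tags)


def brute_force_log_z(feats, transitions):
    # Enumerate every sequence of real tags (exponential; for checking only)
    return log_sum_exp([score_sentence(feats, transitions, y)
                        for y in itertools.product(REAL_TAGS, repeat=len(feats))])


rng = random.Random(1)
feats = [[rng.uniform(-2, 2) for _ in range(5)] for _ in range(4)]
transitions = [[rng.uniform(-1, 1) for _ in range(5)] for _ in range(5)]
for j in range(5):
    transitions[START][j] = -10000.0  # never transition into START_TAG
    transitions[j][STOP] = -10000.0   # never transition out of STOP_TAG
gold = [B, I, O, B]
print(neg_log_likelihood(feats, transitions, gold))
```

The forward score always exceeds the gold score, because the gold path is one of the terms inside the log-sum-exp. The loss is therefore positive, and it shrinks as training raises the gold path's share of the total.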

5. def _viterbi_decode(self, feats):

```python
# Viterbi decoding, used at prediction time: returns the best path score and the path
def _viterbi_decode(self, feats):
    backpointers = []

    # Initialize the viterbi variables in log space
    init_vvars = torch.full((1, self.tagset_size), -10000.)
    init_vvars[0][self.tag_to_ix[START_TAG]] = 0

    # forward_var at step i holds the viterbi variables for step i-1
    forward_var = init_vvars
    for feat in feats:
        bptrs_t = []  # holds the backpointers for this step
        viterbivars_t = []  # holds the viterbi variables for this step

        for next_tag in range(self.tagset_size):
            # next_tag_var[i] holds the viterbi variable for tag i at the
            # previous step, plus the score of transitioning
            # from tag i to next_tag.
            # We don't include the emission scores here because the max
            # does not depend on them (we add them in below).
            # self.transitions[next_tag]: scores from each tag (B, I, O, START, STOP) to next_tag
            next_tag_var = forward_var + self.transitions[next_tag]
            best_tag_id = argmax(next_tag_var)
            bptrs_t.append(best_tag_id)
            viterbivars_t.append(next_tag_var[0][best_tag_id].view(1))
        # Now add in the emission scores, and assign forward_var to the set
        # of viterbi variables we just computed: for each of the 5 tags, the best
        # score of any path from step 0 to step i that ends in that tag
        forward_var = (torch.cat(viterbivars_t) + feat).view(1, -1)
        backpointers.append(bptrs_t)  # bptrs_t has 5 entries

    # Transition to STOP_TAG: add the score from each tag to STOP_TAG
    terminal_var = forward_var + self.transitions[self.tag_to_ix[STOP_TAG]]
    best_tag_id = argmax(terminal_var)
    path_score = terminal_var[0][best_tag_id]

    # Follow the back pointers to decode the best path.
    best_path = [best_tag_id]
    for bptrs_t in reversed(backpointers):
        best_tag_id = bptrs_t[best_tag_id]
        best_path.append(best_tag_id)
    # Pop off the start tag (we don't want to return that to the caller)
    start = best_path.pop()
    assert start == self.tag_to_ix[START_TAG]  # Sanity check
    best_path.reverse()  # reverse the back-to-front path
    return path_score, best_path
```
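Below is a pure-Python sketch of the same decoding. It replaces log-sum-exp with max, keeps backpointers, and adds a brute-force check. Tag indices and the START/STOP constraints follow the same assumptions as the code above (START_TAG = 3, STOP_TAG = 4). The names and data are hypothetical.

```python
import itertools
import random

# Hypothetical tag layout mirroring the code above: B, I, O, START_TAG, STOP_TAG
B, I, O, START, STOP = range(5)
REAL_TAGS = (B, I, O)


def score_sentence(feats, transitions, tags):
    # Score of one tag sequence, with transitions[to][from] as in the code
    tags = [START] + list(tags)
    score = sum(transitions[tags[i + 1]][tags[i]] + feat[tags[i + 1]]
                for i, feat in enumerate(feats))
    return score + transitions[STOP][tags[-1]]


def viterbi_decode(feats, transitions):
    # Mirror of _viterbi_decode: max instead of log-sum-exp, plus backpointers
    size = len(transitions)
    forward = [-10000.0] * size
    forward[START] = 0.0
    backpointers = []
    for feat in feats:
        bptrs, viterbivars = [], []
        for nxt in range(size):
            cand = [forward[prev] + transitions[nxt][prev] for prev in range(size)]
            best_prev = max(range(size), key=lambda p: cand[p])
            bptrs.append(best_prev)
            viterbivars.append(cand[best_prev] + feat[nxt])  # emission added after the max
        forward = viterbivars
        backpointers.append(bptrs)
    terminal = [forward[prev] + transitions[STOP][prev] for prev in range(size)]
    best = max(range(size), key=lambda p: terminal[p])
    path_score, path = terminal[best], [best]
    for bptrs in reversed(backpointers):
        best = bptrs[best]
        path.append(best)
    assert path.pop() == START  # sanity check, as in the original
    path.reverse()
    return path_score, path


def brute_force_best(feats, transitions):
    # Exhaustive search over real tags only (exponential; for checking)
    return max(itertools.product(REAL_TAGS, repeat=len(feats)),
               key=lambda y: score_sentence(feats, transitions, y))


rng = random.Random(2)
feats = [[rng.uniform(-2, 2) for _ in range(5)] for _ in range(5)]
transitions = [[rng.uniform(-1, 1) for _ in range(5)] for _ in range(5)]
for j in range(5):
    transitions[START][j] = -10000.0  # never transition into START_TAG
    transitions[j][STOP] = -10000.0   # never transition out of STOP_TAG
print(viterbi_decode(feats, transitions), brute_force_best(feats, transitions))
```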

If this function is still unclear, see this blog post:

https://blog.csdn.net/appleml/article/details/78579675

Summary

This concludes my summary, which draws on several good blog posts. Questions are welcome.

Author: 17shouuu
Link: https://www.jianshu.com/p/566c6faace64
Source: Jianshu (簡書)
Copyright belongs to the author. Any form of reproduction requires the author's permission and must cite the source.
