[Python人工智能] 三十四.Bert模型 (3)keras-bert库构建Bert模型实现微博情感分析
Starting with this column, the author formally studies Python deep learning, neural networks, and artificial intelligence. The previous article opened a new topic, BERT, covering installation and basic usage of the keras-bert library along with a text classification task. This article builds a BERT model with the keras-bert library and applies it to Weibo sentiment analysis. It is an introductory piece; I hope it helps you!
The code in this article draws on the blog of "山陰少年"; combining it with my own experience, I reproduce and explain the code in detail. I hope it helps you, especially beginners, and I strongly recommend following this author's articles:
- NLP(三十五)使用keras-bert實現文本多分類任務
- https://github.com/percent4/keras_bert_text_classification
The Weibo sentiment prediction results look like this:
```
原文: 《長津湖》這部電影真的非常好看,今天看完好開心,愛了愛了。強烈推薦大家,哈哈!!!
預測標籤: 喜悅

原文: 聽到這個消息真心難受,很傷心,怎么這么悲劇。保佑保佑,哭
預測標籤: 哀傷

原文: 憤怒,我真的挺生氣的,怒其不爭,哀其不幸啊!
預測標籤: 憤怒
```

Table of Contents
- I. Introducing the BERT Model
- II. Dataset Introduction
- III. Weibo Sentiment Analysis with Machine Learning
- IV. Weibo Sentiment Analysis with BERT
- 1. Model Training
- 2. Model Evaluation
- 3. Model Prediction
- V. Summary
This column builds on the author's earlier blogs, AI experience, and related videos and papers; as it deepens, it will cover more Python AI cases and applications. This is an introductory article, and I hope it helps you; please bear with any errors or shortcomings. As a newcomer to AI, I hope we can grow together through these posts. After many years of blogging, this is my first paid column (to earn some milk-powder money for my little one), but most posts, especially the introductory ones, will remain free. I will write this column with care; message me anytime with questions. I only hope you learn something from this series. Let's keep going!
- Keras code download: https://github.com/eastmountyxz/AI-for-Keras
- TensorFlow code download: https://github.com/eastmountyxz/AI-for-TensorFlow
Earlier articles in this series:
- [Python人工智能] 一.TensorFlow2.0環境搭建及神經網絡入門
- [Python人工智能] 二.TensorFlow基礎及一元直線預測案例
- [Python人工智能] 三.TensorFlow基礎之Session、變量、傳入值和激勵函數
- [Python人工智能] 四.TensorFlow創建回歸神經網絡及Optimizer優化器
- [Python人工智能] 五.Tensorboard可視化基本用法及繪制整個神經網絡
- [Python人工智能] 六.TensorFlow實現分類學習及MNIST手寫體識別案例
- [Python人工智能] 七.什么是過擬合及dropout解決神經網絡中的過擬合問題
- [Python人工智能] 八.卷積神經網絡CNN原理詳解及TensorFlow編寫CNN
- [Python人工智能] 九.gensim詞向量Word2Vec安裝及《慶余年》中文短文本相似度計算
- [Python人工智能] 十.Tensorflow+Opencv實現CNN自定義圖像分類案例及與機器學習KNN圖像分類算法對比
- [Python人工智能] 十一.Tensorflow如何保存神經網絡參數
- [Python人工智能] 十二.循環神經網絡RNN和LSTM原理詳解及TensorFlow編寫RNN分類案例
- [Python人工智能] 十三.如何評價神經網絡、loss曲線圖繪制、圖像分類案例的F值計算
- [Python人工智能] 十四.循環神經網絡LSTM RNN回歸案例之sin曲線預測
- [Python人工智能] 十五.無監督學習Autoencoder原理及聚類可視化案例詳解
- [Python人工智能] 十六.Keras環境搭建、入門基礎及回歸神經網絡案例
- [Python人工智能] 十七.Keras搭建分類神經網絡及MNIST數字圖像案例分析
- [Python人工智能] 十八.Keras搭建卷積神經網絡及CNN原理詳解
- [Python人工智能] 十九.Keras搭建循環神經網絡分類案例及RNN原理詳解
- [Python人工智能] 二十.基于Keras+RNN的文本分類vs基于傳統機器學習的文本分類
- [Python人工智能] 二十一.Word2Vec+CNN中文文本分類詳解及與機器學習(RF\DTC\SVM\KNN\NB\LR)分類對比
- [Python人工智能] 二十二.基于大連理工情感詞典的情感分析和情緒計算
- [Python人工智能] 二十三.基于機器學習和TFIDF的情感分類(含詳細的NLP數據清洗)
- [Python人工智能] 二十四.易學智能GPU搭建Keras環境實現LSTM惡意URL請求分類
- [Python人工智能] 二十六.基于BiLSTM-CRF的醫學命名實體識別研究(上)數據預處理
- [Python人工智能] 二十七.基于BiLSTM-CRF的醫學命名實體識別研究(下)模型構建
- [Python人工智能] 二十八.Keras深度學習中文文本分類萬字總結(CNN、TextCNN、LSTM、BiLSTM、BiLSTM+Attention)
- [Python人工智能] 二十九.什么是生成對抗網絡GAN?基礎原理和代碼普及(1)
- [Python人工智能] 三十.Keras深度學習構建CNN識別阿拉伯手寫文字圖像
- [Python人工智能] 三十一.Keras實現BiLSTM微博情感分類和LDA主題挖掘分析
- [Python人工智能] 三十二.Bert模型 (1)Keras-bert基本用法及預訓練模型
- [Python人工智能] 三十三.Bert模型 (2)keras-bert庫構建Bert模型實現文本分類
- [Python人工智能] 三十四.Bert模型 (3)keras-bert庫構建Bert模型實現微博情感分析
I. Introducing the BERT Model
The theory behind BERT will be covered in a later article, mainly drawing on the Google paper and the model's strengths.
- BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding
- https://arxiv.org/pdf/1810.04805.pdf
- https://github.com/google-research/bert
BERT (Bidirectional Encoder Representation from Transformers) is a pre-trained language representation model proposed by the Google AI team in 2018. It achieved astonishing results on SQuAD 1.1, the top machine reading comprehension benchmark, and set new state-of-the-art results on 11 different NLP tasks, including pushing the GLUE benchmark to 80.4% (a 7.6% absolute improvement) and MultiNLI accuracy to 86.7% (a 5.6% absolute improvement). BERT was widely seen as a milestone for NLP and one of the field's most important recent advances.
BERT abandons the traditional approach of pre-training with a unidirectional language model, or a shallow concatenation of two unidirectional models; instead it uses a masked language model (MLM), which allows it to learn deep bidirectional language representations. Its framework is shown below; later articles will cover the details, so this is only an introduction, and readers are encouraged to read the original paper.
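To make the MLM idea concrete, here is a minimal toy sketch of the masking rule described in the BERT paper: 15% of input tokens are selected as prediction targets, and of those, 80% are replaced with [MASK], 10% with a random token, and 10% are left unchanged. This is an illustration only, not the keras-bert implementation:

```python
import random

def mlm_mask(tokens, vocab, mask_prob=0.15):
    """Toy illustration of BERT's masking rule (15% selected; 80/10/10 split)."""
    inputs, labels = [], []
    for tok in tokens:
        if random.random() < mask_prob:
            labels.append(tok)                       # the model must recover this token
            r = random.random()
            if r < 0.8:
                inputs.append('[MASK]')              # 80%: replace with [MASK]
            elif r < 0.9:
                inputs.append(random.choice(vocab))  # 10%: replace with a random token
            else:
                inputs.append(tok)                   # 10%: keep the original token
        else:
            inputs.append(tok)
            labels.append(None)                      # not a prediction target
    return inputs, labels

print(mlm_mask(list("今天看完好開心"), vocab=list("我你他好天心")))
```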
II. Dataset Introduction
Data description:
| Item | Description |
| --- | --- |
| Overview | 360,000+ sentiment-labeled Sina Weibo posts in 4 classes: about 200,000 joy (喜悅), and about 50,000 each for anger (憤怒), disgust (厭惡), and depressed (低落) |
| Suggested experiments | Sentiment / opinion / review polarity analysis |
| Source | Sina Weibo |
| Original dataset | A Weibo sentiment dataset collected from the web; the exact author and provenance are unknown |
| Description | 361,744 posts in total: 喜悅 199,496, 憤怒 51,714, 厭惡 55,267, 低落 55,267 |
| Label mapping | 0: 喜悅 (joy), 1: 憤怒 (anger), 2: 厭惡 (disgust), 3: 低落 (depressed) |
Data sample:
Note: only while running the experiments did the author discover that the disgust (厭惡, 55,267) and depressed (低落, 55,267) subsets are completely identical, so we treat this as a three-class problem; the methodology matters more than the exact setup. A hedged sketch for rebuilding the train/test CSVs follows.
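For readers who want to reproduce the `weibo_3_moods_train.csv` / `weibo_3_moods_test.csv` files used below, here is a sketch under stated assumptions: the source filename `simplifyweibo_4_moods.csv`, its column names (`label`, `review`), and the exact class-merging choice are assumptions (the article does not spell out how the 哀傷 class was derived from the original four), so adjust to your own copy of the data:

```python
import pandas as pd
from sklearn.model_selection import train_test_split

# Assumption: the SophonPlus CSV has columns 'label' (0-3) and 'review'.
df = pd.read_csv('simplifyweibo_4_moods.csv')

# Classes 2 (厭惡) and 3 (低落) are identical, so keep only one of them and
# relabel it 哀傷 -- a hypothetical reconstruction of the 3-class setup.
mapping = {0: '喜悅', 1: '憤怒', 3: '哀傷'}
df = df[df['label'].isin(mapping)].copy()
df['label'] = df['label'].map(mapping)
df = df.rename(columns={'review': 'content'})[['label', 'content']]

# Roughly a 2:1 split; the article's actual counts are 209043 / 97366.
train_df, test_df = train_test_split(df, test_size=0.32, random_state=42)
train_df.to_csv('data/weibo_3_moods_train.csv', index=False)
test_df.to_csv('data/weibo_3_moods_test.csv', index=False)
```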
Download:
- https://github.com/eastmountyxz/Datasets-Text-Mining
Reference link:
- https://github.com/SophonPlus/ChineseNlpCorpus/blob/master/datasets/simplifyweibo_4_moods/intro.ipynb
III. Weibo Sentiment Analysis with Machine Learning
First, we walk through the machine-learning approach to Weibo sentiment analysis. The pipeline is as follows (a minimal TF-IDF illustration comes right after this list):
- Read the data
- Preprocess (Chinese word segmentation)
- Compute TF-IDF features
- Build the classification model
- Predict and evaluate
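As promised, a minimal self-contained illustration of the TF-IDF step. The sentences and tokens here are made up for illustration; the real script below runs on the jieba-segmented Weibo corpus:

```python
from sklearn.feature_extraction.text import CountVectorizer, TfidfTransformer

# Toy corpus of pre-segmented text, tokens separated by spaces
# (the jieba output in the full script is joined the same way).
corpus = ["今天 好 開心 哈哈", "真的 很 傷心 哭", "好 生氣 憤怒 真的"]
counts = CountVectorizer().fit_transform(corpus)    # document-term count matrix
tfidf = TfidfTransformer().fit_transform(counts)    # reweight counts into TF-IDF
print(tfidf.shape)       # (3, vocabulary_size)
print(tfidf.toarray())   # each row is one document's TF-IDF vector
```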
The complete code is as follows:
```python
# -*- coding: utf-8 -*-
"""
Created on Mon Sep 27 22:21:53 2021
@author: xiuzhang
"""
import jieba
import pandas as pd
import numpy as np
from scipy.sparse import coo_matrix
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.feature_extraction.text import TfidfTransformer
from sklearn.metrics import classification_report
from sklearn.linear_model import LogisticRegression
from sklearn.ensemble import RandomForestClassifier
from sklearn.ensemble import AdaBoostClassifier
from sklearn import svm
from sklearn import neighbors
from sklearn.naive_bayes import MultinomialNB

#-----------------------------------------------------------------------------
# Read data
train_path = 'data/weibo_3_moods_train.csv'
test_path = 'data/weibo_3_moods_test.csv'
types = {0: '喜悅', 1: '憤怒', 2: '哀傷'}
pd_train = pd.read_csv(train_path)
pd_test = pd.read_csv(test_path)
print('訓練集數目(總體):%d' % pd_train.shape[0])
print('測試集數目(總體):%d' % pd_test.shape[0])

# Chinese word segmentation
train_words = []
test_words = []
train_labels = []
test_labels = []
stopwords = ["[", "]", ")", "(", ")", "(", "【", "】", "!", ",", "$",
             "·", "?", ".", "、", "-", "—", ":", ":", "《", "》", "=",
             "。", "…", "“", "?", "”", "~", " ", "-", "+", "\\", "‘",
             "~", ";", "’", "...", "..", "&", "#", "....", ",",
             "0", "1", "2", "3", "4", "5", "6", "7", "8", "9", "10",
             "的", "和", "之", "了", "哦", "那", "一個", ]

for line in range(len(pd_train)):
    dict_label = pd_train['label'][line]
    dict_content = str(pd_train['content'][line])   # float => str
    cut_words = ""
    data = dict_content.strip("\n")
    data = data.replace(",", "")    # must strip "," or the CSV gains extra columns
    seg_list = jieba.cut(data, cut_all=False)
    for seg in seg_list:
        if seg not in stopwords:
            cut_words += seg + " "
    # Map the text label to an integer class id
    label = -1
    if dict_label == "喜悅":
        label = 0
    elif dict_label == "憤怒":
        label = 1
    elif dict_label == "哀傷":
        label = 2
    train_labels.append(label)
    train_words.append(cut_words)
print(len(train_labels), len(train_words))   # 209043 209043

for line in range(len(pd_test)):
    dict_label = pd_test['label'][line]
    dict_content = str(pd_test['content'][line])
    cut_words = ""
    data = dict_content.strip("\n")
    data = data.replace(",", "")
    seg_list = jieba.cut(data, cut_all=False)
    for seg in seg_list:
        if seg not in stopwords:
            cut_words += seg + " "
    label = -1
    if dict_label == "喜悅":
        label = 0
    elif dict_label == "憤怒":
        label = 1
    elif dict_label == "哀傷":
        label = 2
    test_labels.append(label)
    test_words.append(cut_words)
print(len(test_labels), len(test_words))   # 97366 97366
print(test_labels[:5])   # integer class ids, e.g. [0, 0, 1, 2, 0]

#-----------------------------------------------------------------------------
# TF-IDF computation
# Build a term-frequency matrix: element a[i][j] is the frequency of term j in document i
vectorizer = CountVectorizer(min_df=100)   # min_df limits the vocabulary (avoids MemoryError)

# TfidfTransformer reweights raw counts into TF-IDF values
transformer = TfidfTransformer()

# The inner fit_transform builds the count matrix; the outer one converts it to TF-IDF
tfidf = transformer.fit_transform(vectorizer.fit_transform(train_words + test_words))
for n in tfidf[:5]:
    print(n)
print(type(tfidf))

# All terms in the bag-of-words vocabulary
word = vectorizer.get_feature_names()
for n in word[:10]:
    print(n)
print("單詞數量:", len(word))

# Densify: w[i][j] is the TF-IDF weight of term j in document i
X = coo_matrix(tfidf, dtype=np.float32).toarray()
print(X.shape)
print(X[:10])

X_train = X[:len(train_labels)]
X_test = X[len(train_labels):]
y_train = train_labels
y_test = test_labels
print(len(X_train), len(X_test), len(y_train), len(y_test))

#-----------------------------------------------------------------------------
# Classification model
clf = MultinomialNB()
#clf = svm.LinearSVC()
#clf = LogisticRegression(solver='liblinear')
#clf = RandomForestClassifier(n_estimators=10)
#clf = neighbors.KNeighborsClassifier(n_neighbors=7)
#clf = AdaBoostClassifier()
clf.fit(X_train, y_train)
print('模型的準確度:{}'.format(clf.score(X_test, y_test)))
pre = clf.predict(X_test)
print("分類")
print(len(pre), len(y_test))
print(classification_report(y_test, pre, digits=4))
```

The output is as follows:
```
訓練集數目(總體):209043
測試集數目(總體):97366
Building prefix dict from the default dictionary ...
Dumping model to file cache C:\Users\xdtech\AppData\Local\Temp\jieba.cache
Loading model cost 0.885 seconds.
Prefix dict has been built succesfully.
<class 'scipy.sparse.csr.csr_matrix'>
單詞數量: 6997
(306409, 6997)
[[0. 0. 0. ... 0. 0. 0.]
 [0. 0. 0. ... 0. 0. 0.]
 [0. 0. 0. ... 0. 0. 0.]
 ...
 [0. 0. 0. ... 0. 0. 0.]
 [0. 0. 0. ... 0. 0. 0.]
 [0. 0. 0. ... 0. 0. 0.]]
209043 97366 209043 97366
模型的準確度:0.6670398290984533
分類
97366 97366
             precision    recall  f1-score   support

          0     0.6666    0.9833    0.7945     61453
          1     0.6365    0.1184    0.1997     17461
          2     0.7071    0.1330    0.2240     18452

avg / total     0.6689    0.6670    0.5797     97366
```
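One observation on the report above: recall for classes 1 (憤怒) and 2 (哀傷) is very low, meaning the Naive Bayes model mostly predicts the majority class. A tweak worth trying (not something this article tests) is a classifier with inverse-frequency class weights, for example:

```python
from sklearn.svm import LinearSVC

# class_weight='balanced' reweights each class by its inverse frequency,
# which often raises recall on the minority classes of a skewed corpus.
clf = LinearSVC(class_weight='balanced')
clf.fit(X_train, y_train)
print(clf.score(X_test, y_test))
```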
IV. Weibo Sentiment Analysis with BERT
The model framework is shown in the figure below:
1. Model Training
blog34_kerasbert_train.py
The code is as follows:
```python
# -*- coding: utf-8 -*-
"""
Created on Wed Nov 24 00:09:48 2021
@author: xiuzhang
"""
import os
import json
import codecs
import numpy as np   # needed by seq_padding and DataGenerator
import pandas as pd
import tensorflow as tf
from keras_bert import load_trained_model_from_checkpoint, Tokenizer
from keras.layers import *
from keras.models import Model
from keras.optimizers import Adam

os.environ["CUDA_DEVICE_ORDER"] = "PCI_BUS_ID"
os.environ["CUDA_VISIBLE_DEVICES"] = "0"

# Cap per-process GPU memory: 0.9 means training may use up to 90% of GPU memory
gpu_options = tf.GPUOptions(per_process_gpu_memory_fraction=0.9)
sess = tf.Session(config=tf.ConfigProto(gpu_options=gpu_options))

maxlen = 300
BATCH_SIZE = 8
config_path = 'chinese_L-12_H-768_A-12/bert_config.json'
checkpoint_path = 'chinese_L-12_H-768_A-12/bert_model.ckpt'
dict_path = 'chinese_L-12_H-768_A-12/vocab.txt'

# Read the BERT vocabulary
token_dict = {}
with codecs.open(dict_path, 'r', 'utf-8') as reader:
    for line in reader:
        token = line.strip()
        token_dict[token] = len(token_dict)

#------------------------------------------ Class and function definitions --------------------------------------
# Map characters outside the vocabulary to [UNK]
class OurTokenizer(Tokenizer):
    def _tokenize(self, text):
        R = []
        for c in text:
            if c in self._token_dict:
                R.append(c)
            else:
                R.append('[UNK]')   # out-of-vocabulary characters become [UNK]
        return R

tokenizer = OurTokenizer(token_dict)

# Pad every sequence in a batch to the batch maximum length
def seq_padding(X, padding=0):
    L = [len(x) for x in X]
    ML = max(L)
    return np.array([
        np.concatenate([x, [padding] * (ML - len(x))]) if len(x) < ML else x
        for x in X
    ])

class DataGenerator:
    def __init__(self, data, batch_size=BATCH_SIZE):
        self.data = data
        self.batch_size = batch_size
        self.steps = len(self.data) // self.batch_size
        if len(self.data) % self.batch_size != 0:
            self.steps += 1

    def __len__(self):
        return self.steps

    def __iter__(self):
        while True:
            idxs = list(range(len(self.data)))
            np.random.shuffle(idxs)
            X1, X2, Y = [], [], []
            for i in idxs:
                d = self.data[i]
                text = d[0][:maxlen]
                x1, x2 = tokenizer.encode(first=text)
                y = d[1]
                X1.append(x1)
                X2.append(x2)
                Y.append(y)
                if len(X1) == self.batch_size or i == idxs[-1]:
                    X1 = seq_padding(X1)
                    X2 = seq_padding(X2)
                    Y = seq_padding(Y)
                    yield [X1, X2], Y
                    X1, X2, Y = [], [], []

# Build the classification model
def create_cls_model(num_labels):
    bert_model = load_trained_model_from_checkpoint(config_path, checkpoint_path, seq_len=None)
    for layer in bert_model.layers:
        layer.trainable = True

    x1_in = Input(shape=(None,))
    x2_in = Input(shape=(None,))

    x = bert_model([x1_in, x2_in])
    cls_layer = Lambda(lambda x: x[:, 0])(x)                  # take the [CLS] vector for classification
    p = Dense(num_labels, activation='softmax')(cls_layer)   # multi-class output

    model = Model([x1_in, x2_in], p)
    model.compile(
        loss='categorical_crossentropy',
        optimizer=Adam(1e-5),
        metrics=['accuracy']
    )
    model.summary()
    return model

#------------------------------------------ Main -----------------------------------------
if __name__ == '__main__':
    # Data preprocessing
    train_df = pd.read_csv("data/weibo_3_moods_train.csv").fillna(value="")
    test_df = pd.read_csv("data/weibo_3_moods_test.csv").fillna(value="")
    print("begin data processing...")

    labels = train_df["label"].unique()
    print(labels)
    with open("label.json", "w", encoding="utf-8") as f:
        f.write(json.dumps(dict(zip(range(len(labels)), labels)), ensure_ascii=False, indent=2))

    train_data = []
    test_data = []
    for i in range(train_df.shape[0]):
        label, content = train_df.iloc[i, :]
        label_id = [0] * len(labels)          # one-hot encode the label
        for j, _ in enumerate(labels):
            if _ == label:
                label_id[j] = 1
        train_data.append((content, label_id))
    print(train_data[0])

    for i in range(test_df.shape[0]):
        label, content = test_df.iloc[i, :]
        label_id = [0] * len(labels)
        for j, _ in enumerate(labels):
            if _ == label:
                label_id[j] = 1
        test_data.append((content, label_id))
    print(len(train_data), len(test_data))
    print("finish data processing!\n")

    # Model training
    model = create_cls_model(len(labels))
    train_D = DataGenerator(train_data)
    test_D = DataGenerator(test_data)
    print("begin model training...")
    print(len(train_D), len(test_D))   # 26131 12171

    model.fit_generator(
        train_D.__iter__(),
        steps_per_epoch=len(train_D),
        epochs=10,
        validation_data=test_D.__iter__(),
        validation_steps=len(test_D)
    )
    print("finish model training!")

    # Save the model
    model.save('cls_mood.h5')
    print("Model saved!")

    result = model.evaluate_generator(test_D.__iter__(), steps=len(test_D))
    print("模型評估結果:", result)
```

The model architecture is shown in the figure below; this article runs training on the GPU.
The preprocessing step prints the label set and the train/test sizes:

```
['哀傷' '喜悅' '憤怒']
209043 97366
```
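Before training, a quick sanity check on the tokenization step may help. A sketch of what `OurTokenizer.encode` returns, assuming the training script above has run far enough to define `token_dict` and `OurTokenizer` (the actual ids depend on vocab.txt):

```python
tokenizer = OurTokenizer(token_dict)
token_ids, segment_ids = tokenizer.encode(first="今天好開心")
# encode() wraps the text as [CLS] ... [SEP]; for a single sentence the
# segment ids are all 0, and out-of-vocabulary characters become [UNK].
print(len(token_ids), segment_ids)   # 7 [0, 0, 0, 0, 0, 0, 0]
```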
The training results are as follows (note that this particular run used 15,000 steps per epoch for 3 epochs rather than the epochs=10 setting in the listing above):
```
Epoch 1/3
15000/15000 [==============================] - 3561s 237ms/step - loss: 0.6973 - acc: 0.6974 - val_loss: 1.2818 - val_acc: 0.6068
Epoch 2/3
15000/15000 [==============================] - 3544s 236ms/step - loss: 0.5900 - acc: 0.7523 - val_loss: 1.5190 - val_acc: 0.6007
Epoch 3/3
15000/15000 [==============================] - 3545s 236ms/step - loss: 0.4615 - acc: 0.8137 - val_loss: 1.6390 - val_acc: 0.5981
finish model training!
Model saved!
```

As shown below, the trained model is saved as an h5 file of roughly 2 GB.
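Note that val_loss rises from 1.28 to 1.64 while the training loss keeps falling, a classic sign of overfitting. A sketch of adding standard Keras callbacks to stop early and keep only the best checkpoint (this reuses the `model`, `train_D`, and `test_D` objects from the training script above):

```python
from keras.callbacks import EarlyStopping, ModelCheckpoint

callbacks = [
    # stop once val_loss stops improving, instead of a fixed number of epochs
    EarlyStopping(monitor='val_loss', patience=1),
    # keep only the checkpoint with the best validation loss on disk
    ModelCheckpoint('cls_mood_best.h5', monitor='val_loss', save_best_only=True),
]
model.fit_generator(train_D.__iter__(), steps_per_epoch=len(train_D),
                    epochs=10, validation_data=test_D.__iter__(),
                    validation_steps=len(test_D), callbacks=callbacks)
```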
The final evaluation output is:
- 模型評估結果: [1.6390499637700617, 0.5981]
Problem: each training step was slow, and the whole run took about 3 hours. Out-of-memory errors can also appear:

```
If you want to see a list of allocated tensors when OOM happens, add report_tensor_allocations_upon_oom to RunOptions for current allocation info.
```

The root cause is GPU memory pressure: even with the memory fraction capped at 90%, the model can exhaust GPU memory. Reducing batch_size helps; I have not found a better solution so far.
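One commonly used mitigation (a sketch assuming the same TensorFlow 1.x session setup as the training script) is to let TensorFlow grow its GPU allocation on demand instead of reserving a fixed fraction up front:

```python
import tensorflow as tf

# allow_growth=True allocates GPU memory incrementally as needed,
# instead of grabbing per_process_gpu_memory_fraction up front.
config = tf.ConfigProto()
config.gpu_options.allow_growth = True
sess = tf.Session(config=config)
```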
With the default `BATCH_SIZE = 8`, the two generators report these batch counts per epoch:

```
train_D = DataGenerator(train_data)   # len(train_D) = 26131
test_D = DataGenerator(test_data)     # len(test_D) = 12171
```

Note that batch_size controls these counts; changing it to 32 reduces them to 6533 and 3043, as the quick check below shows.
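The counts are simply ceil(n_samples / batch_size):

```python
import math

n_train, n_test = 209043, 97366
for bs in (8, 32):
    print(bs, math.ceil(n_train / bs), math.ceil(n_test / bs))
# 8 26131 12171
# 32 6533 3043
```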
2. Model Evaluation
blog34_kerasbert_evaluate.py
```python
# -*- coding: utf-8 -*-
"""
Created on Thu Nov 25 00:09:02 2021
@author: xiuzhang
Reference: https://github.com/percent4/keras_bert_text_classification
"""
import json
import numpy as np
import pandas as pd
from keras.models import load_model
from keras_bert import get_custom_objects
from sklearn.metrics import classification_report
from blog34_kerasbert_train import token_dict, OurTokenizer

maxlen = 300

# Load the trained model
model = load_model("cls_mood.h5", custom_objects=get_custom_objects())
tokenizer = OurTokenizer(token_dict)
with open("label.json", "r", encoding="utf-8") as f:
    label_dict = json.loads(f.read())

# Predict a single sentence
def predict_single_text(text):
    text = text[:maxlen]
    x1, x2 = tokenizer.encode(first=text)   # BERT tokenization
    X1 = x1 + [0] * (maxlen - len(x1)) if len(x1) < maxlen else x1
    X2 = x2 + [0] * (maxlen - len(x2)) if len(x2) < maxlen else x2

    # Model prediction
    predicted = model.predict([[X1], [X2]])
    y = np.argmax(predicted[0])
    return label_dict[str(y)]

# Evaluate the model on the test set
def evaluate():
    test_df = pd.read_csv("data/weibo_3_moods_test.csv").fillna(value="")
    true_y_list, pred_y_list = [], []
    for i in range(test_df.shape[0]):
        true_y, content = test_df.iloc[i, :]
        pred_y = predict_single_text(content)
        print("predict %d samples" % (i + 1))
        print(true_y, pred_y)
        true_y_list.append(true_y)
        pred_y_list.append(pred_y)
    return classification_report(true_y_list, pred_y_list, digits=4)

#------------------------------------ Model evaluation ---------------------------------
output_data = evaluate()
print("model evaluate result:\n")
print(output_data)
```

The output is as follows:
These scores are scarily low, haha! That is probably related to how my dataset was labeled. On the bright side, the predictions are spread across classes rather than piling onto a single one. How well would the model transfer? Readers are encouraged to experiment, especially on higher-quality datasets.
The evaluation report is as follows:
```
model evaluate result:

             precision    recall  f1-score   support

         哀傷     0.4162    0.4301    0.4230     18452
         喜悅     0.7957    0.4244    0.5535     61453
         憤怒     0.2629    0.6854    0.3800     17461

avg / total     0.6282    0.4723    0.4977     97366
```
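To see exactly which classes get confused, one could also build a confusion matrix from the same true/predicted lists that `evaluate()` produces. A sketch, assuming `predict_single_text` from the evaluation script above is in scope:

```python
import pandas as pd
from sklearn.metrics import confusion_matrix

def evaluate_confusion(test_csv="data/weibo_3_moods_test.csv"):
    test_df = pd.read_csv(test_csv).fillna(value="")
    true_y_list, pred_y_list = [], []
    for i in range(test_df.shape[0]):
        true_y, content = test_df.iloc[i, :]
        true_y_list.append(true_y)
        pred_y_list.append(predict_single_text(content))
    # rows are true labels, columns are predicted labels, in the order given
    return confusion_matrix(true_y_list, pred_y_list, labels=["喜悅", "憤怒", "哀傷"])

print(evaluate_confusion())
```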
3. Model Prediction
blog34_kerasbert_predict.py
```python
# -*- coding: utf-8 -*-
"""
Created on Thu Nov 25 00:10:06 2021
@author: xiuzhang
Reference: https://github.com/percent4/keras_bert_text_classification
"""
import time
import json
import numpy as np
from blog34_kerasbert_train import token_dict, OurTokenizer
from keras.models import load_model
from keras_bert import get_custom_objects

maxlen = 256
s_time = time.time()

# Load the trained model
model = load_model("cls_mood.h5", custom_objects=get_custom_objects())
tokenizer = OurTokenizer(token_dict)
with open("label.json", "r", encoding="utf-8") as f:
    label_dict = json.loads(f.read())

# Example sentences to predict
text = "《長津湖》這部電影真的非常好看,今天看完好開心,愛了愛了。強烈推薦大家,哈哈!!!"
#text = "聽到這個消息真心難受,很傷心,怎么這么悲劇。保佑保佑,哭"
#text = "憤怒,我真的挺生氣的,怒其不爭,哀其不幸啊!"

# Tokenize and pad to maxlen
text = text[:maxlen]
x1, x2 = tokenizer.encode(first=text)
X1 = x1 + [0] * (maxlen - len(x1)) if len(x1) < maxlen else x1
X2 = x2 + [0] * (maxlen - len(x2)) if len(x2) < maxlen else x2

# Model prediction
predicted = model.predict([[X1], [X2]])
y = np.argmax(predicted[0])
e_time = time.time()
print("原文: %s" % text)
print("預測標籤: %s" % label_dict[str(y)])
print("Cost time:", e_time - s_time)
```

The output is as follows; you can see all three types of comments are predicted correctly.
```
原文: 《長津湖》這部電影真的非常好看,今天看完好開心,愛了愛了。強烈推薦大家,哈哈!!!
預測標籤: 喜悅

原文: 聽到這個消息真心難受,很傷心,怎么這么悲劇。保佑保佑,哭
預測標籤: 哀傷

原文: 憤怒,我真的挺生氣的,怒其不爭,哀其不幸啊!
預測標籤: 憤怒
```
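The script above scores one sentence at a time; when predicting many texts, batching them into a single `model.predict` call is much faster. A sketch reusing the same `model`, `tokenizer`, and `label_dict` loaded above:

```python
import numpy as np

def predict_batch(texts, maxlen=256):
    """Pad every text to the same length and score them in one predict call."""
    X1, X2 = [], []
    for text in texts:
        x1, x2 = tokenizer.encode(first=text[:maxlen - 2])  # leave room for [CLS]/[SEP]
        X1.append(x1 + [0] * (maxlen - len(x1)))
        X2.append(x2 + [0] * (maxlen - len(x2)))
    preds = model.predict([np.array(X1), np.array(X2)])
    return [label_dict[str(np.argmax(p))] for p in preds]

print(predict_batch(["今天看完好開心", "聽到這個消息真心難受"]))
```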
寫到這里,這篇文章就介紹結束了,后面還會持續分享,包括Bert實現命名實體識別及原理知識。真心希望這篇文章對您有所幫助,加油~
Code download:
- https://github.com/eastmountyxz/AI-for-Keras
- https://github.com/eastmountyxz/AI-for-TensorFlow
(By: Eastmount, 2021-12-06, written at night in Wuhan. http://blog.csdn.net/eastmount/ )
References:
- [1] https://github.com/google-research/bert
- [2] https://storage.googleapis.com/bert_models/2018_11_03/chinese_L-12_H-768_A-12.zip
- [3] https://github.com/percent4/keras_bert_text_classification
- [4] https://github.com/huanghao128/bert_example
- [5] 如何評價 BERT 模型? - 知乎
- [6] 【NLP】Google BERT模型原理詳解 - 李rumor
- [7] NLP(三十五)使用keras-bert實現文本多分類任務 - 山陰少年
- [8] tensorflow2+keras簡單實現BERT模型 - 小黃
- [9] NLP必讀 | 十分鐘讀懂谷歌BERT模型 - 奇點機智
- [10] BERT模型的詳細介紹 - IT小佬
- [11] [深度學習] 自然語言處理— 基于Keras Bert使用(上)- 天空是很藍
- [12] https://github.com/CyberZHG/keras-bert
- [13] https://github.com/bojone/bert4keras
- [14] https://github.com/ymcui/Chinese-BERT-wwm
- [15] 用深度學習做命名實體識別(六)-BERT介紹 - 滌生
- [16] https://blog.csdn.net/qq_36949278/article/details/117637187