Sentiment Classification with a CNN
Contents
- 1. Reading the Data
- 2. Splitting the Dataset
- 3. Text Vectorization
- 4. Building the CNN Model
- 5. Training and Testing
Reference: 基于深度学习的自然语言处理 (Natural Language Processing Based on Deep Learning)
1. Reading the Data
Data file:
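The original data file is not included in the post. As a stand-in, a minimal sketch of the loading step, using a tiny inline sample; the column names `sentence` and `label` are assumptions:

```python
import io
import pandas as pd

# Tiny inline sample standing in for the real data file (not provided in the post).
# Column names "sentence" and "label" are assumptions.
csv_text = """sentence,label
i am very good.,1
i am not very good.,0
"""
df = pd.read_csv(io.StringIO(csv_text))
sentence = df["sentence"].values  # raw text
label = df["label"].values        # binary sentiment labels (1 = positive)
print(len(sentence), label.tolist())
```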
2. Splitting the Dataset

```python
# Split into training and test sets
from sklearn.model_selection import train_test_split

X_train, X_test, y_train, y_test = train_test_split(
    sentence, label, test_size=0.3, random_state=1)
```

3. Text Vectorization
- Train a tokenizer and convert the text into sequences of ids
- Pad the id sequences so they all have the same length
4. Building the CNN Model

```python
import keras
from keras import layers

embeddings_dim = 150
filters = 64
kernel_size = 5
batch_size = 64

nn_model = keras.Sequential()
nn_model.add(layers.Embedding(input_dim=vocab_size, output_dim=embeddings_dim,
                              input_length=maxlen))
nn_model.add(layers.Conv1D(filters=filters, kernel_size=kernel_size, activation='relu'))
nn_model.add(layers.GlobalMaxPool1D())
nn_model.add(layers.Dropout(0.3))
# GlobalMaxPool1D drops a dimension; the Lambda layer below expands it back
nn_model.add(layers.Lambda(lambda x: keras.backend.expand_dims(x, axis=-1)))
nn_model.add(layers.Conv1D(filters=filters, kernel_size=kernel_size, activation='relu'))
nn_model.add(layers.GlobalMaxPool1D())
nn_model.add(layers.Dropout(0.3))
nn_model.add(layers.Dense(10, activation='relu'))
nn_model.add(layers.Dense(1, activation='sigmoid'))  # sigmoid for binary classification; softmax for multi-class
```

Reference articles:
A detailed explanation of the Embedding layer
Keras: GlobalMaxPooling vs. MaxPooling
- Configure the model
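The compile step is elided in the post. A minimal sketch of how it is typically done for this kind of binary classifier, built on a reduced version of the model above (the placeholder sizes and the `adam` optimizer are assumptions):

```python
import keras
from keras import layers

vocab_size, embeddings_dim = 1000, 150  # placeholder values

# Reduced version of the post's model, just to demonstrate compile()
nn_model = keras.Sequential([
    layers.Embedding(input_dim=vocab_size, output_dim=embeddings_dim),
    layers.Conv1D(filters=64, kernel_size=5, activation="relu"),
    layers.GlobalMaxPool1D(),
    layers.Dense(1, activation="sigmoid"),
])

# Configure the model: binary_crossentropy matches the sigmoid output
nn_model.compile(optimizer="adam",
                 loss="binary_crossentropy",
                 metrics=["accuracy"])
```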
5. Training and Testing

```python
history = nn_model.fit(X_train, y_train, batch_size=batch_size, epochs=50,
                       verbose=2, validation_data=(X_test, y_test))
# verbose controls logging: 0 = silent, 1 = progress bar, 2 = one line per epoch

loss, accuracy = nn_model.evaluate(X_train, y_train, verbose=1)
print("Training set: loss {0:.3f}, accuracy: {1:.3f}".format(loss, accuracy))
loss, accuracy = nn_model.evaluate(X_test, y_test, verbose=1)
print("Test set: loss {0:.3f}, accuracy: {1:.3f}".format(loss, accuracy))

# Plot the training curves
import pandas as pd
from matplotlib import pyplot as plt

pd.DataFrame(history.history).plot(figsize=(8, 5))
plt.grid(True)
plt.gca().set_ylim(0, 1)  # set the vertical range to [0, 1]
plt.show()
```

Output:
```
Epoch 1/50
11/11 - 1s - loss: 0.6933 - accuracy: 0.5014 - val_loss: 0.6933 - val_accuracy: 0.4633
Epoch 2/50
11/11 - 0s - loss: 0.6931 - accuracy: 0.5214 - val_loss: 0.6935 - val_accuracy: 0.4633
Epoch 3/50
11/11 - 1s - loss: 0.6930 - accuracy: 0.5257 - val_loss: 0.6936 - val_accuracy: 0.4633
... (omitted)
11/11 - 0s - loss: 0.0024 - accuracy: 1.0000 - val_loss: 0.7943 - val_accuracy: 0.7600
Epoch 49/50
11/11 - 1s - loss: 0.0016 - accuracy: 1.0000 - val_loss: 0.7970 - val_accuracy: 0.7600
Epoch 50/50
11/11 - 0s - loss: 0.0027 - accuracy: 1.0000 - val_loss: 0.7994 - val_accuracy: 0.7600
22/22 [==============================] - 0s 4ms/step - loss: 9.0586e-04 - accuracy: 1.0000
Training set: loss 0.001, accuracy: 1.000
10/10 [==============================] - 0s 5ms/step - loss: 0.7994 - accuracy: 0.7600
Test set: loss 0.799, accuracy: 0.760
```
The model overfits: training accuracy is near perfect while test accuracy is much lower.
- Ad-hoc testing
Output:
```
Predicted class for "i am not very good.": 0
Predicted class for "i am very good.": 1
```