Sentiment Classification with CNN


Contents

    • 1. Loading the Data
    • 2. Train/Test Split
    • 3. Text Vectorization
    • 4. Building the CNN Model
    • 5. Training and Testing

Reference: Natural Language Processing Based on Deep Learning (基于深度学习的自然语言处理)

1. Loading the Data

數(shù)據(jù)文件:

import numpy as np
import pandas as pd

data = pd.read_csv("yelp_labelled.txt", sep='\t', names=['sentence', 'label'])
data.head()  # 1000 rows

# Features X and labels y
sentence = data['sentence'].values
label = data['label'].values
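For reference, each line of yelp_labelled.txt holds one review followed by a tab-separated 0/1 sentiment label (1 = positive), which is why read_csv is called with sep='\t'. The two rows below only illustrate that format and are not quoted from the file:

Great service, would definitely come back.	1
The food was cold and the staff were rude.	0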

2. Train/Test Split

# Split into training and test sets
from sklearn.model_selection import train_test_split

X_train, X_test, y_train, y_test = train_test_split(sentence, label, test_size=0.3, random_state=1)
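A quick shape check of the split (a minimal sketch; with test_size=0.3 on the 1000 reviews this should yield 700 training and 300 test samples):

print(X_train.shape, X_test.shape)  # expected: (700,) (300,)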

3. Text Vectorization

  • Train the tokenizer and convert the text into sequences of word ids
# Text vectorization
import keras
from keras.preprocessing.text import Tokenizer

tokenizer = Tokenizer(num_words=6000)
tokenizer.fit_on_texts(X_train)                  # fit the tokenizer on the training text
X_train = tokenizer.texts_to_sequences(X_train)  # convert to [[ids...], [ids...], ...]
X_test = tokenizer.texts_to_sequences(X_test)
vocab_size = len(tokenizer.word_index) + 1       # +1 because index 0 maps to no word and is reserved for padding
  • Pad the id sequences so they all have the same length (a short sanity-check sketch follows the code below)
maxlen = 100  # pad so that every sentence has the same length

from keras.preprocessing.sequence import pad_sequences

X_train = pad_sequences(X_train, maxlen=maxlen, padding='post')  # 'post' pads with zeros at the end, 'pre' at the front
X_test = pad_sequences(X_test, maxlen=maxlen, padding='post')
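A short sanity check of the vectorization and padding steps above (a sketch reusing the variables just defined):

print("vocab_size:", vocab_size)        # number of distinct words seen in training, plus 1 for the padding index
print("X_train shape:", X_train.shape)  # (700, 100) after padding every sequence to maxlen
print("first sequence:", X_train[0])    # word ids, zero-padded at the end because padding='post'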

4. Building the CNN Model

from keras import layers

embeddings_dim = 150
filters = 64
kernel_size = 5
batch_size = 64

nn_model = keras.Sequential()
nn_model.add(layers.Embedding(input_dim=vocab_size, output_dim=embeddings_dim, input_length=maxlen))
nn_model.add(layers.Conv1D(filters=filters, kernel_size=kernel_size, activation='relu'))
nn_model.add(layers.GlobalMaxPool1D())
nn_model.add(layers.Dropout(0.3))
# GlobalMaxPool1D above removes the time dimension; the custom Lambda layer adds a dimension back so another Conv1D can follow
nn_model.add(layers.Lambda(lambda x: keras.backend.expand_dims(x, axis=-1)))
nn_model.add(layers.Conv1D(filters=filters, kernel_size=kernel_size, activation='relu'))
nn_model.add(layers.GlobalMaxPool1D())
nn_model.add(layers.Dropout(0.3))
nn_model.add(layers.Dense(10, activation='relu'))
nn_model.add(layers.Dense(1, activation='sigmoid'))  # sigmoid for binary classification, softmax for multi-class
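As a side note, the Lambda/expand_dims step is only needed because GlobalMaxPool1D collapses the time axis. A common alternative layout (a minimal sketch, not from the original walkthrough; pool_size=2 is an illustrative assumption) keeps the sequence alive with a local MaxPooling1D between the two convolutions and pools globally only at the end:

# Alternative sketch: local pooling between the Conv1D layers avoids the expand_dims Lambda
alt_model = keras.Sequential()
alt_model.add(layers.Embedding(input_dim=vocab_size, output_dim=embeddings_dim, input_length=maxlen))
alt_model.add(layers.Conv1D(filters=filters, kernel_size=kernel_size, activation='relu'))
alt_model.add(layers.MaxPooling1D(pool_size=2))  # halves the sequence length but keeps the time axis
alt_model.add(layers.Conv1D(filters=filters, kernel_size=kernel_size, activation='relu'))
alt_model.add(layers.GlobalMaxPool1D())          # collapses the time axis to one value per filter
alt_model.add(layers.Dropout(0.3))
alt_model.add(layers.Dense(10, activation='relu'))
alt_model.add(layers.Dense(1, activation='sigmoid'))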

Reference articles:
Embedding layer explained (Embedding層詳解)
Keras: GlobalMaxPooling vs. MaxPooling

  • Compile the model
nn_model.compile(optimizer='adam', loss='binary_crossentropy', metrics=['accuracy'])
nn_model.summary()

from keras.utils import plot_model
plot_model(nn_model, to_file='model.jpg')  # draw the model architecture to an image file (requires pydot and graphviz)

Model: "sequential_4"
_________________________________________________________________
Layer (type)                 Output Shape              Param #
=================================================================
embedding_4 (Embedding)      (None, 100, 150)          251400
_________________________________________________________________
conv1d_8 (Conv1D)            (None, 96, 64)            48064
_________________________________________________________________
global_max_pooling1d_7 (Glob (None, 64)                0
_________________________________________________________________
dropout_7 (Dropout)          (None, 64)                0
_________________________________________________________________
lambda_4 (Lambda)            (None, 64, 1)             0
_________________________________________________________________
conv1d_9 (Conv1D)            (None, 60, 64)            384
_________________________________________________________________
global_max_pooling1d_8 (Glob (None, 64)                0
_________________________________________________________________
dropout_8 (Dropout)          (None, 64)                0
_________________________________________________________________
dense_6 (Dense)              (None, 10)                650
_________________________________________________________________
dense_7 (Dense)              (None, 1)                 11
=================================================================
Total params: 300,509
Trainable params: 300,509
Non-trainable params: 0
_________________________________________________________________

5. Training and Testing

history = nn_model.fit(X_train, y_train, batch_size=batch_size, epochs=50, verbose=2,
                       validation_data=(X_test, y_test))
# verbose controls logging: 0 = silent, 1 = progress bar, 2 = one line per epoch without a progress bar

loss, accuracy = nn_model.evaluate(X_train, y_train, verbose=1)
print("Training set: loss {0:.3f}, accuracy: {1:.3f}".format(loss, accuracy))
loss, accuracy = nn_model.evaluate(X_test, y_test, verbose=1)
print("Test set: loss {0:.3f}, accuracy: {1:.3f}".format(loss, accuracy))

# Plot the training curves
from matplotlib import pyplot as plt
pd.DataFrame(history.history).plot(figsize=(8, 5))
plt.grid(True)
plt.gca().set_ylim(0, 1)  # set the vertical range to [0, 1]
plt.show()

Output:

Epoch 1/50
11/11 - 1s - loss: 0.6933 - accuracy: 0.5014 - val_loss: 0.6933 - val_accuracy: 0.4633
Epoch 2/50
11/11 - 0s - loss: 0.6931 - accuracy: 0.5214 - val_loss: 0.6935 - val_accuracy: 0.4633
Epoch 3/50
11/11 - 1s - loss: 0.6930 - accuracy: 0.5257 - val_loss: 0.6936 - val_accuracy: 0.4633
... (omitted)
11/11 - 0s - loss: 0.0024 - accuracy: 1.0000 - val_loss: 0.7943 - val_accuracy: 0.7600
Epoch 49/50
11/11 - 1s - loss: 0.0016 - accuracy: 1.0000 - val_loss: 0.7970 - val_accuracy: 0.7600
Epoch 50/50
11/11 - 0s - loss: 0.0027 - accuracy: 1.0000 - val_loss: 0.7994 - val_accuracy: 0.7600
22/22 [==============================] - 0s 4ms/step - loss: 9.0586e-04 - accuracy: 1.0000
Training set: loss 0.001, accuracy: 1.000
10/10 [==============================] - 0s 5ms/step - loss: 0.7994 - accuracy: 0.7600
Test set: loss 0.799, accuracy: 0.760

Training set: loss 0.001, accuracy: 1.000
Test set: loss 0.799, accuracy: 0.760
The model clearly overfits: training accuracy is near perfect while test accuracy is much lower.
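One common way to curb this overfitting (not part of the original walkthrough) is to stop training when the validation loss stops improving. A minimal sketch using keras.callbacks.EarlyStopping, where patience=5 is an illustrative choice rather than a tuned value:

from keras.callbacks import EarlyStopping

# Stop once val_loss has not improved for 5 epochs and restore the best weights seen so far
early_stop = EarlyStopping(monitor='val_loss', patience=5, restore_best_weights=True)
history = nn_model.fit(X_train, y_train, batch_size=batch_size, epochs=50, verbose=2,
                       validation_data=(X_test, y_test), callbacks=[early_stop])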

  • Quick test on new sentences
text = ["i am not very good.", "i am very good."]
x = tokenizer.texts_to_sequences(text)
x = pad_sequences(x, maxlen=maxlen, padding='post')
pred = nn_model.predict(x)
print("Predicted class for '{}':".format(text[0]), 1 if pred[0][0] >= 0.5 else 0)
print("Predicted class for '{}':".format(text[1]), 1 if pred[1][0] >= 0.5 else 0)

Output:

Predicted class for 'i am not very good.': 0
Predicted class for 'i am very good.': 1
