An analysis of the Keras example file pretrained_word_embeddings.py
This code demonstrates text classification using pretrained word embeddings.
The dataset is 20_newsgroup: roughly 18,000 news articles covering 20 topics. A neural network is trained so that, given a news article, it can identify which topic the article belongs to.
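As a rough sketch of how such a corpus can be read (assuming the standard 20_newsgroup layout of one subdirectory per topic; TEXT_DATA_DIR is a placeholder path, not a name from the article):

import os

TEXT_DATA_DIR = '20_newsgroup'  # assumption: unpacked dataset, one folder per topic

texts = []          # raw article bodies
labels = []         # integer label of each article
labels_index = {}   # topic name -> label id

for name in sorted(os.listdir(TEXT_DATA_DIR)):
    path = os.path.join(TEXT_DATA_DIR, name)
    if os.path.isdir(path):
        label_id = len(labels_index)
        labels_index[name] = label_id
        for fname in sorted(os.listdir(path)):
            with open(os.path.join(path, fname), encoding='latin-1') as f:
                texts.append(f.read())
            labels.append(label_id)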
This code ran fine for me on Linux, but on Windows it raises:

Traceback (most recent call last):
  File "pretrained_word_embeddings.py", line 43, in <module>
    for line in f:
UnicodeDecodeError: 'gbk' codec can't decode byte 0x93 in position 5456: illegal multibyte sequence

The fix is to change

with open(os.path.join(GLOVE_DIR, 'glove.6B.100d.txt')) as f:

to

with open(os.path.join(GLOVE_DIR, 'glove.6B.100d.txt'), encoding='utf-8') as f:

(Without an explicit encoding, Python falls back to the Windows locale encoding, here gbk, which cannot decode the UTF-8 GloVe file.)
The script first builds embeddings_index, a dict mapping each word to its pretrained vector.
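For reference, a minimal sketch of how that dict can be built (essentially the loop the traceback above points at; GLOVE_DIR is assumed to point at the unzipped glove.6B files):

import os
import numpy as np

GLOVE_DIR = '.'  # assumption: directory containing the unzipped glove.6B files

# embeddings_index maps each word to its 100-dimensional GloVe vector.
embeddings_index = {}
with open(os.path.join(GLOVE_DIR, 'glove.6B.100d.txt'), encoding='utf-8') as f:
    for line in f:
        values = line.split()
        word = values[0]
        coefs = np.asarray(values[1:], dtype='float32')
        embeddings_index[word] = coefs

print('Found %s word vectors.' % len(embeddings_index))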
Then the line

sequences = tokenizer.texts_to_sequences(texts)

converts raw article text such as:
Archive-name: atheism/resources
Alt-atheism-archive-name: resources
Last-modified: 11 December 1992
Version: 1.0

Atheist Resources

Addresses of Atheist Organizations

USA

FREEDOM FROM RELIGION FOUNDATION

Darwin fish bumper stickers and assorted other atheist paraphernalia are
available from the Freedom From Religion Foundation in the US.
into a sequence of word indices like this:
[1237, 273, 1213, 1439, 1071, 1213, 1237, 273, 1439, 192, 2515, 348, 2964, 779, 332, 28, 45, 1628, 1439, 2516, 3, 1628, 2144, 780, 937, 29, 441, 2770, 8854, 4601, 7969, 11979, 5, 12806, 75, 1628, 19, 229, 29, 1, 937, 29, 441, 2770, 6, 1, 118, 558, 2, 90, 106, 482, 3979, 6602, 5375, 1871, 12260, 1632, 17687, 1828, 5101, 1828, 5101, 788, 1, 8854, 4601, 96, 4, 4601, 5455, 64, 1, 751, 563, 1716, 15, 71, 844, 24, 20, 1971, 5, 1, 389, 8854, 744, 1023, 1, 7762, 1300, 2912, 4601, 8, 73, 1698, 6, 1, 118, 558, 2, 1828, 5101, 16500, 13447, 73, 1261, 10982, 170, 66, 6, 1, 869, 2235, 2544, 534, 34, 79, 8854, 4601, 29, 6603, 3388, 264, 1505, 535, 49, 12, 343, 66, 60, 155, 2, 6603, 1043, 1, 427, 8, 73, 1698, 618, 4601, 417, 1628, 632, 11716, 4602, 814, 1628, 691, 3, 1, 467, 2163, 3, 2266, 7491, 5, 48, 15, 40, 135, 378, 8, 1, 467, 6359, 30, 101, 90, 1781, 5, 115, 101, 417, 1628, 632, 17061, 1448, 4317, 45, 860, 73, 1611, 2455, 3343, 467, 7491, 13132, 5814, 1301, 1781, 1, 467, 9477, 667, 11716, 323, 15, 1, 1074, 802, 332, 3, 1, 467, 558, 2, 417, 1628, 632, 90, 106, 482, 2030, 2408, 22, 13799, 853, 2030, 2408, 1871, 3793, 12524, 439, 3793, 13448, 691, 788, 691, 502, 1552, 11221, 116, 993, 558, 2, 2974, 996, 7674, 1184, 1346, 108, 828, 1871, 9478, 12807, 32, 7675, 460, 61, 110, 16, 3362, 22, 1950, 8, 691, 1711, 5622, 233, 1346, 1428, 4623, 1260, 12, 16501, 32, 1044, 7854, 564, 3955, 16501, 5, 1, 500, 3, 564, 27, 4602, 4, 9648, 2913, 10746, 558, 2, 7128, 97, 2456, 2420, 4623, 1260, 12, 16501, 90, 106, 482, 13133, 1346, 1428, 797, 2652, 632, 2366, 445, 3955, 681, 2477, 288, 1184,
To see why, check the mapping:
print("tokenizer.index_word[1237]:", tokenizer.index_word[1237])
print("tokenizer.index_word[273]:", tokenizer.index_word[273])
which prints:
tokenizer.index_word[1237]: archive
tokenizer.index_word[273]: name
That is, each word has simply been replaced by its index in the tokenizer's vocabulary.
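A sketch of this tokenization step, with the constants the summary below implies (a 20,000-word vocabulary cap and sequences padded to length 1000; the names are the usual ones from the Keras example):

from keras.preprocessing.text import Tokenizer
from keras.preprocessing.sequence import pad_sequences

MAX_NUM_WORDS = 20000        # vocabulary cap; matches the 20000 x 100 embedding
MAX_SEQUENCE_LENGTH = 1000   # matches the (None, 1000) input shape below

tokenizer = Tokenizer(num_words=MAX_NUM_WORDS)
tokenizer.fit_on_texts(texts)                    # build the word -> index vocabulary
sequences = tokenizer.texts_to_sequences(texts)  # replace each word with its index
word_index = tokenizer.word_index

# Pad or truncate every sequence to the same fixed length.
data = pad_sequences(sequences, maxlen=MAX_SEQUENCE_LENGTH)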
Next, the pretrained vectors are assembled into a weight matrix embedding_matrix, which is used to build the Embedding layer.
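A sketch of that step: row i of embedding_matrix holds the GloVe vector of the word with index i, and trainable=False freezes the layer, which is why the summary below shows 2,000,000 non-trainable parameters:

import numpy as np
from keras.layers import Embedding

EMBEDDING_DIM = 100  # glove.6B.100d vectors

num_words = min(MAX_NUM_WORDS, len(word_index) + 1)
embedding_matrix = np.zeros((num_words, EMBEDDING_DIM))
for word, i in word_index.items():
    if i >= MAX_NUM_WORDS:
        continue
    vector = embeddings_index.get(word)
    if vector is not None:        # words without a pretrained vector stay all-zero
        embedding_matrix[i] = vector

embedding_layer = Embedding(num_words,
                            EMBEDDING_DIM,
                            weights=[embedding_matrix],
                            input_length=MAX_SEQUENCE_LENGTH,
                            trainable=False)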
The network architecture, as reported by model.summary(), is:
____________________________________________________________________________________________________
Layer (type)                                  Output Shape              Param #
====================================================================================================
input_1 (InputLayer)                          (None, 1000)              0
____________________________________________________________________________________________________
embedding_1 (Embedding)                       (None, 1000, 100)         2000000
____________________________________________________________________________________________________
conv1d_1 (Conv1D)                             (None, 996, 128)          64128
____________________________________________________________________________________________________
max_pooling1d_1 (MaxPooling1D)                (None, 199, 128)          0
____________________________________________________________________________________________________
conv1d_2 (Conv1D)                             (None, 195, 128)          82048
____________________________________________________________________________________________________
max_pooling1d_2 (MaxPooling1D)                (None, 39, 128)           0
____________________________________________________________________________________________________
conv1d_3 (Conv1D)                             (None, 35, 128)           82048
____________________________________________________________________________________________________
global_max_pooling1d_1 (GlobalMaxPooling1D)   (None, 128)               0
____________________________________________________________________________________________________
dense_1 (Dense)                               (None, 128)               16512
____________________________________________________________________________________________________
dense_2 (Dense)                               (None, 20)                2580
====================================================================================================
Total params: 2,247,316
Trainable params: 247,316
Non-trainable params: 2,000,000
____________________________________________________________________________________________________
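For reference, a sketch of a model definition consistent with this summary (kernel size 5 and pool size 5 are inferred from the shape transitions above, not quoted from the article):

from keras.layers import Input, Conv1D, MaxPooling1D, GlobalMaxPooling1D, Dense
from keras.models import Model

sequence_input = Input(shape=(MAX_SEQUENCE_LENGTH,), dtype='int32')
embedded_sequences = embedding_layer(sequence_input)        # (None, 1000, 100)
x = Conv1D(128, 5, activation='relu')(embedded_sequences)   # (None, 996, 128)
x = MaxPooling1D(5)(x)                                      # (None, 199, 128)
x = Conv1D(128, 5, activation='relu')(x)                    # (None, 195, 128)
x = MaxPooling1D(5)(x)                                      # (None, 39, 128)
x = Conv1D(128, 5, activation='relu')(x)                    # (None, 35, 128)
x = GlobalMaxPooling1D()(x)                                 # (None, 128)
x = Dense(128, activation='relu')(x)
preds = Dense(len(labels_index), activation='softmax')(x)   # 20 topics

model = Model(sequence_input, preds)
model.compile(loss='categorical_crossentropy',
              optimizer='rmsprop',
              metrics=['acc'])
model.summary()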