
An analysis of the Keras example file pretrained_word_embeddings.py

This example demonstrates text classification using pretrained word embeddings.

The dataset is 20_newsgroup: about 18,000 newsgroup posts covering 20 topics. The network is trained so that, given a post, it predicts which of the 20 topics the post belongs to.
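The example collects the texts and labels by walking one sub-directory per topic. A minimal sketch, assuming the dataset is unpacked under a TEXT_DATA_DIR directory laid out that way (the variable names follow the example, but treat the details here as assumptions):

import os

TEXT_DATA_DIR = '20_newsgroup'  # assumed path to the unpacked dataset

texts = []         # raw text of each post
labels = []        # integer topic id of each post
labels_index = {}  # topic name -> integer id

for name in sorted(os.listdir(TEXT_DATA_DIR)):
    path = os.path.join(TEXT_DATA_DIR, name)
    if os.path.isdir(path):  # each sub-directory is one topic
        label_id = len(labels_index)
        labels_index[name] = label_id
        for fname in sorted(os.listdir(path)):
            # latin-1 tolerates the stray high bytes in some posts
            with open(os.path.join(path, fname), encoding='latin-1') as f:
                texts.append(f.read())
            labels.append(label_id)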

The code runs fine for me on Linux, but on Windows it raises an error:

Traceback (most recent call last):
  File "pretrained_word_embeddings.py", line 43, in <module>
    for line in f:
UnicodeDecodeError: 'gbk' codec can't decode byte 0x93 in position 5456: illegal multibyte sequence

This happens because open() defaults to the system codec on Windows (gbk on a Chinese-locale system), so the GloVe file must be opened as UTF-8 explicitly. Change

with open(os.path.join(GLOVE_DIR, 'glove.6B.100d.txt')) as f:

to:

with open(os.path.join(GLOVE_DIR, 'glove.6B.100d.txt'), encoding='utf-8') as f:

The script first builds embeddings_index, a dict that maps each word to its pretrained GloVe vector.
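A sketch of that loading loop, following the example (glove.6B.100d.txt holds one word followed by its 100 float components per line; GLOVE_DIR is assumed to point at the unzipped GloVe download):

import os
import numpy as np

GLOVE_DIR = 'glove.6B'  # assumed location of the GloVe files

embeddings_index = {}
with open(os.path.join(GLOVE_DIR, 'glove.6B.100d.txt'), encoding='utf-8') as f:
    for line in f:
        values = line.split()
        word = values[0]
        # the remaining 100 tokens are the embedding components
        embeddings_index[word] = np.asarray(values[1:], dtype='float32')

print('Found %s word vectors.' % len(embeddings_index))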

然后代碼

sequences = tokenizer.texts_to_sequences(texts)

converts raw text like

Archive-name: atheism/resources
Alt-atheism-archive-name: resources
Last-modified: 11 December 1992
Version: 1.0

Atheist Resources

Addresses of Atheist Organizations

USA

FREEDOM FROM RELIGION FOUNDATION

Darwin fish bumper stickers and assorted other atheist paraphernalia are
available from the Freedom From Religion Foundation in the US.

into a sequence of integer indices like the following:

[1237, 273, 1213, 1439, 1071, 1213, 1237, 273, 1439, 192, 2515, 348, 2964, 779, 332, 28, 45, 1628, 1439, 2516, 3, 1628, 2144, 780, 937, 29, 441, 2770, 8854, 4601, 7969, 11979, 5, 12806, 75, 1628, 19, 229, 29, 1, 937, 29, 441, 2770, 6, 1, 118, 558, 2, 90, 106, 482, 3979, 6602, 5375, 1871, 12260, 1632, 17687, 1828, 5101, 1828, 5101, 788, 1, 8854, 4601, 96, 4, 4601, 5455, 64, 1, 751, 563, 1716, 15, 71, 844, 24, 20, 1971, 5, 1, 389, 8854, 744, 1023, 1, 7762, 1300, 2912, 4601, 8, 73, 1698, 6, 1, 118, 558, 2, 1828, 5101, 16500, 13447, 73, 1261, 10982, 170, 66, 6, 1, 869, 2235, 2544, 534, 34, 79, 8854, 4601, 29, 6603, 3388, 264, 1505, 535, 49, 12, 343, 66, 60, 155, 2, 6603, 1043, 1, 427, 8, 73, 1698, 618, 4601, 417, 1628, 632, 11716, 4602, 814, 1628, 691, 3, 1, 467, 2163, 3, 2266, 7491, 5, 48, 15, 40, 135, 378, 8, 1, 467, 6359, 30, 101, 90, 1781, 5, 115, 101, 417, 1628, 632, 17061, 1448, 4317, 45, 860, 73, 1611, 2455, 3343, 467, 7491, 13132, 5814, 1301, 1781, 1, 467, 9477, 667, 11716, 323, 15, 1, 1074, 802, 332, 3, 1, 467, 558, 2, 417, 1628, 632, 90, 106, 482, 2030, 2408, 22, 13799, 853, 2030, 2408, 1871, 3793, 12524, 439, 3793, 13448, 691, 788, 691, 502, 1552, 11221, 116, 993, 558, 2, 2974, 996, 7674, 1184, 1346, 108, 828, 1871, 9478, 12807, 32, 7675, 460, 61, 110, 16, 3362, 22, 1950, 8, 691, 1711, 5622, 233, 1346, 1428, 4623, 1260, 12, 16501, 32, 1044, 7854, 564, 3955, 16501, 5, 1, 500, 3, 564, 27, 4602, 4, 9648, 2913, 10746, 558, 2, 7128, 97, 2456, 2420, 4623, 1260, 12, 16501, 90, 106, 482, 13133, 1346, 1428, 797, 2652, 632, 2366, 445, 3955, 681, 2477, 288, 1184, 

To confirm what these numbers mean, run:

print("tokenizer.index_word[1237]:", tokenizer.index_word[1237])
print("tokenizer.index_word[273]:", tokenizer.index_word[273])

which prints:

tokenizer.index_word[1237]: archive
tokenizer.index_word[273]: name

That is, texts_to_sequences simply replaces each word with its index in the tokenizer's vocabulary.
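The whole tokenization step, as a minimal sketch (MAX_NUM_WORDS and MAX_SEQUENCE_LENGTH are the example's constants, 20000 and 1000):

from keras.preprocessing.text import Tokenizer
from keras.preprocessing.sequence import pad_sequences

MAX_NUM_WORDS = 20000       # keep only the 20,000 most frequent words
MAX_SEQUENCE_LENGTH = 1000  # pad/truncate every post to 1,000 indices

tokenizer = Tokenizer(num_words=MAX_NUM_WORDS)
tokenizer.fit_on_texts(texts)                    # build word -> index vocabulary
sequences = tokenizer.texts_to_sequences(texts)  # posts -> lists of indices
word_index = tokenizer.word_index

# 2D array of shape (num_posts, 1000), zero-padded at the front
data = pad_sequences(sequences, maxlen=MAX_SEQUENCE_LENGTH)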

Next, the pretrained vectors are packed into an embedding_matrix, which is used to initialize a frozen Embedding layer.
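A sketch of that step (this is the Constant-initializer form used by newer versions of the example; older versions pass weights=[embedding_matrix] instead):

import numpy as np
from keras.layers import Embedding
from keras.initializers import Constant

EMBEDDING_DIM = 100
num_words = MAX_NUM_WORDS  # 20000, matching the 2,000,000 frozen params below

# row i holds the GloVe vector of the word with index i;
# words missing from GloVe (and row 0, the padding index) stay all-zeros
embedding_matrix = np.zeros((num_words, EMBEDDING_DIM))
for word, i in word_index.items():
    if i >= num_words:
        continue
    embedding_vector = embeddings_index.get(word)
    if embedding_vector is not None:
        embedding_matrix[i] = embedding_vector

embedding_layer = Embedding(num_words,
                            EMBEDDING_DIM,
                            embeddings_initializer=Constant(embedding_matrix),
                            input_length=MAX_SEQUENCE_LENGTH,
                            trainable=False)  # keep the pretrained vectors frozen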

The network architecture is:

____________________________________________________________________________________________________
Layer (type)                                 Output Shape                            Param #
====================================================================================================
input_1 (InputLayer)                         (None, 1000)                            0
____________________________________________________________________________________________________
embedding_1 (Embedding)                      (None, 1000, 100)                       2000000
____________________________________________________________________________________________________
conv1d_1 (Conv1D)                            (None, 996, 128)                        64128
____________________________________________________________________________________________________
max_pooling1d_1 (MaxPooling1D)               (None, 199, 128)                        0
____________________________________________________________________________________________________
conv1d_2 (Conv1D)                            (None, 195, 128)                        82048
____________________________________________________________________________________________________
max_pooling1d_2 (MaxPooling1D)               (None, 39, 128)                         0
____________________________________________________________________________________________________
conv1d_3 (Conv1D)                            (None, 35, 128)                         82048
____________________________________________________________________________________________________
global_max_pooling1d_1 (GlobalMaxPooling1D)  (None, 128)                             0
____________________________________________________________________________________________________
dense_1 (Dense)                              (None, 128)                             16512
____________________________________________________________________________________________________
dense_2 (Dense)                              (None, 20)                              2580
====================================================================================================
Total params: 2,247,316
Trainable params: 247,316
Non-trainable params: 2,000,000
____________________________________________________________________________________________________
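The model producing this summary can be sketched with the functional API, as in the example (labels_index holds the 20 topics; the shape comments trace the summary above):

from keras.layers import Input, Conv1D, MaxPooling1D, GlobalMaxPooling1D, Dense
from keras.models import Model

sequence_input = Input(shape=(MAX_SEQUENCE_LENGTH,), dtype='int32')
embedded_sequences = embedding_layer(sequence_input)        # (None, 1000, 100)
x = Conv1D(128, 5, activation='relu')(embedded_sequences)   # (None, 996, 128)
x = MaxPooling1D(5)(x)                                      # (None, 199, 128)
x = Conv1D(128, 5, activation='relu')(x)                    # (None, 195, 128)
x = MaxPooling1D(5)(x)                                      # (None, 39, 128)
x = Conv1D(128, 5, activation='relu')(x)                    # (None, 35, 128)
x = GlobalMaxPooling1D()(x)                                 # (None, 128)
x = Dense(128, activation='relu')(x)
preds = Dense(len(labels_index), activation='softmax')(x)   # 20 classes

model = Model(sequence_input, preds)
model.compile(loss='categorical_crossentropy',
              optimizer='rmsprop',
              metrics=['acc'])

Note the parameter split in the summary: the 2,000,000 embedding weights are frozen (trainable=False), so training only updates the 247,316 convolution and dense parameters.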
