
Keras is currently one of the most popular deep learning libraries. It is fairly simple to use and lets you build a neural network in just a few lines of code. In this post you will discover how to build a neural network with Keras that predicts the sentiment of movie reviews as one of two classes: positive or negative. We will use the well-known imdb review dataset for sentiment analysis to do this. The model we build can also be applied to other machine learning problems with only a few changes.

What is Keras?

Keras is an open-source Python library that enables you to build neural networks easily with a few lines of code. The library is capable of running on top of TensorFlow, Microsoft Cognitive Toolkit, Theano and MXNet. TensorFlow and Theano are the most commonly used numerical platforms for building deep learning algorithms in Python, but they can be quite complex and difficult to use. Keras, by contrast, provides an easy way to create deep learning models. It was created to make building neural networks as fast and simple as possible. Its creator, François Chollet, focused on minimalism, modularity and Python support. Keras can run on both GPU and CPU, and it supports both Python 2 and 3.
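To give a feel for how little code that means, here is a minimal sketch of a Keras model (the layer sizes here are arbitrary placeholders for illustration, not the model we build below):

from keras import models, layers

net = models.Sequential()
net.add(layers.Dense(32, activation="relu", input_shape=(100,)))  # one hidden layer
net.add(layers.Dense(1, activation="sigmoid"))                    # binary output
net.compile(optimizer="adam", loss="binary_crossentropy", metrics=["accuracy"])

Three calls define the architecture and one configures training; that brevity is the main selling point over writing the same network directly in TensorFlow or Theano.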

What is Sentiment Analysis?

With sentiment analysis we want to determine the attitude (e.g. the emotion) of, say, a speaker or writer with respect to a document, interaction or event. It is therefore a natural language processing problem where the text needs to be understood in order to predict the underlying intent. The sentiment is mostly categorized into positive, negative and neutral classes. Using sentiment analysis, we might for example want to predict a customer's opinion of and attitude towards a product based on a review they wrote. Because of that, sentiment analysis is widely applied to reviews, surveys, documents and much more.

imdb數(shù)據(jù)集

The imdb sentiment classification dataset consists of 50,000 movie reviews from imdb users, labeled as positive (1) or negative (0). The reviews are preprocessed, and each one is encoded as a sequence of word indices in the form of integers. The words within the reviews are indexed by their overall frequency within the dataset. For example, the integer "2" encodes the second most frequent word in the data. The 50,000 reviews are split into 25,000 for training and 25,000 for testing. The dataset was created by researchers at Stanford University and published in a 2011 paper, in which they achieved 88.89% accuracy. It was also used in the "Bag of Words Meets Bags of Popcorn" Kaggle competition.
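You can inspect this frequency-based indexing directly through Keras (a small illustration; note that load_data() shifts all ranks by 3 to reserve slots for the padding, sequence-start and unknown-word markers, which is why the decoding step later in this post uses i - 3):

from keras.datasets import imdb

word_index = imdb.get_word_index()  # maps each word to its frequency rank
print(word_index["the"])            # 1 -- "the" is the most frequent word in the corpus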

建立神經(jīng)網(wǎng)絡(luò)

我們從導(dǎo)入所需的依賴關(guān)系開始:

import matplotlib
import matplotlib.pyplot as plt
import numpy as np
from keras.utils import to_categorical
from keras import models
from keras import layers

我們繼續(xù)下載imdb數(shù)據(jù)集,這幸好已經(jīng)內(nèi)置到Keras中。由于我們不希望將50/50列車測試拆分,因此我們會在下載后立即將數(shù)據(jù)合并到數(shù)據(jù)和目標(biāo)中,以便稍后進(jìn)行80/20的拆分。

from keras.datasets import imdb

(training_data, training_targets), (testing_data, testing_targets) = imdb.load_data(num_words=10000)
data = np.concatenate((training_data, testing_data), axis=0)
targets = np.concatenate((training_targets, testing_targets), axis=0)

Now we can take a look at the dataset:

print("Categories:", np.unique(targets))
print("Number of unique words:", len(np.unique(np.hstack(data))))

Categories: [0 1]
Number of unique words: 9998

length = [len(i) for i in data]
print("Average Review length:", np.mean(length))
print("Standard Deviation:", round(np.std(length)))

Average Review length: 234.75892
Standard Deviation: 173.0

You can see that the dataset is labeled with two categories, 0 and 1, which represent the sentiment of the review. The whole dataset contains 9,998 unique words, the average review length is 234 words, and the standard deviation is 173 words.

現(xiàn)在我們來看一個(gè)訓(xùn)練樣例:

print("Label:", targets[0])

Label: 1

print(data[0])

[1, 14, 22, 16, 43, 530, 973, 1622, 1385, 65, 458, 4468, 66, 3941, 4, 173, 36, 256, 5, 25, 100, 43, 838, 112, 50, 670, 2, 9, 35, 480, 284, 5, 150, 4, 172, 112, 167, 2, 336, 385, 39, 4, 172, 4536, 1111, 17, 546, 38, 13, 447, 4, 192, 50, 16, 6, 147, 2025, 19, 14, 22, 4, 1920, 4613, 469, 4, 22, 71, 87, 12, 16, 43, 530, 38, 76, 15, 13, 1247, 4, 22, 17, 515, 17, 12, 16, 626, 18, 2, 5, 62, 386, 12, 8, 316, 8, 106, 5, 4, 2223, 5244, 16, 480, 66, 3785, 33, 4, 130, 12, 16, 38, 619, 5, 25, 124, 51, 36, 135, 48, 25, 1415, 33, 6, 22, 12, 215, 28, 77, 52, 5, 14, 407, 16, 82, 2, 8, 4, 107, 117, 5952, 15, 256, 4, 2, 7, 3766, 5, 723, 36, 71, 43, 530, 476, 26, 400, 317, 46, 7, 4, 2, 1029, 13, 104, 88, 4, 381, 15, 297, 98, 32, 2071, 56, 26, 141, 6, 194, 7486, 18, 4, 226, 22, 21, 134, 476, 26, 480, 5, 144, 30, 5535, 18, 51, 36, 28, 224, 92, 25, 104, 4, 226, 65, 16, 38, 1334, 88, 12, 16, 283, 5, 16, 4472, 113, 103, 32, 15, 16, 5345, 19, 178, 32]

Above you see the first review of the dataset, which is labeled as positive (1). The code below retrieves the dictionary that maps word indices back to the original words, so that we can read the review. It replaces every unknown word with a "#". It does this by using the get_word_index() function.

index = imdb.get_word_index()
reverse_index = dict([(value, key) for (key, value) in index.items()])
decoded = " ".join([reverse_index.get(i - 3, "#") for i in data[0]])
print(decoded)

# this film was just brilliant casting location scenery story direction everyone's really suited the part they played and you could just imagine being there robert # is an amazing actor and now the same being director # father came from the same scottish island as myself so i loved the fact there was a real connection with this film the witty remarks throughout the film were great it was just brilliant so much that i bought the film as soon as it was released for # and would recommend it to everyone to watch and the fly fishing was amazing really cried at the end it was so sad and you know what they say if you cry at a film it must have been good and this definitely was also # to the two little boy's that played the # of norman and paul they were just brilliant children are often left out of the # list i think because the stars that play them all grown up are such a big profile for the whole film but these children are amazing and should be praised for what they have done don't you think the whole story was so lovely because it was true and was someone's life after all that was shared with us all

We will vectorize every review so that it contains exactly 10,000 numbers, filling everything else with zeros. We do this because every input to our neural network needs to have the same size.

def vectorize(sequences, dimension = 10000):
    # create an all-zero matrix of shape (number of reviews, 10000)
    results = np.zeros((len(sequences), dimension))
    for i, sequence in enumerate(sequences):
        # set the positions of the word indices that occur in this review to 1
        results[i, sequence] = 1
    return results

data = vectorize(data)
targets = np.array(targets).astype("float32")
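To make concrete what vectorize does, here is a toy call (illustrative only, not part of the pipeline): each index sequence becomes a fixed-length multi-hot vector with a 1 at every word index that occurs in the review.

toy = vectorize([[1, 3], [2]], dimension=5)
print(toy)
# [[0. 1. 0. 1. 0.]
#  [0. 0. 1. 0. 0.]]

Note that this is multi-hot encoding rather than literal zero-padding of the sequence: word order and word counts are discarded, and only the presence or absence of each of the 10,000 words is kept.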

現(xiàn)在我們將數(shù)據(jù)分成訓(xùn)練和測試集。訓(xùn)練集將包含40,000條評論,并且測試設(shè)置為10,000條。

test_x = data[:10000]
test_y = targets[:10000]
train_x = data[10000:]
train_y = targets[10000:]

我們現(xiàn)在可以建立我們簡單的神經(jīng)網(wǎng)絡(luò)。我們首先定義我們要構(gòu)建的模型的類型。Keras中有兩種類型的模型可供使用: 功能性API使用的Sequential模型 和 Model類。然后我們只需添加輸入層,隱藏層和輸出層。在他們之間,我們使用退出來防止過度配合。在每一層,我們使用“密集”,這意味著它們完全連接。在隱藏層中,我們使用relu函數(shù),在輸出層使用sigmoid函數(shù)。最后,我們讓Keras打印我們剛剛構(gòu)建的模型的摘要。

model = models.Sequential()

# Input - Layer
model.add(layers.Dense(50, activation = "relu", input_shape=(10000, )))

# Hidden - Layers
model.add(layers.Dropout(0.3, noise_shape=None, seed=None))
model.add(layers.Dense(50, activation = "relu"))
model.add(layers.Dropout(0.2, noise_shape=None, seed=None))
model.add(layers.Dense(50, activation = "relu"))

# Output - Layer
model.add(layers.Dense(1, activation = "sigmoid"))
model.summary()

_________________________________________________________________
Layer (type)                 Output Shape              Param #
=================================================================
dense_1 (Dense)              (None, 50)                500050
_________________________________________________________________
dropout_1 (Dropout)          (None, 50)                0
_________________________________________________________________
dense_2 (Dense)              (None, 50)                2550
_________________________________________________________________
dropout_2 (Dropout)          (None, 50)                0
_________________________________________________________________
dense_3 (Dense)              (None, 50)                2550
_________________________________________________________________
dense_4 (Dense)              (None, 1)                 51
=================================================================
Total params: 505,201
Trainable params: 505,201
Non-trainable params: 0
_________________________________________________________________

現(xiàn)在我們需要編譯我們的模型,這只不過是配置訓(xùn)練模型。我們使用“adam”優(yōu)化器,二進(jìn)制 - 交叉熵作為損失和準(zhǔn)確性作為我們的評估指標(biāo)。

model.compile(
    optimizer = "adam",
    loss = "binary_crossentropy",
    metrics = ["accuracy"]
)

我們現(xiàn)在可以訓(xùn)練我們的模型。我們用batch_size為500來做這件事,并且只對兩個(gè)時(shí)代做這件事,因?yàn)槲艺J(rèn)識到如果我們訓(xùn)練它的時(shí)間越長,模型就會過度。我們將結(jié)果保存在“結(jié)果”變量中:

results = model.fit(
    train_x, train_y,
    epochs = 2,
    batch_size = 500,
    validation_data = (test_x, test_y)
)

Train on 40000 samples, validate on 10000 samples
Epoch 1/2
40000/40000 [==============================] - 5s 129us/step - loss: 0.4051 - acc: 0.8212 - val_loss: 0.2635 - val_acc: 0.8945
Epoch 2/2
40000/40000 [==============================] - 4s 90us/step - loss: 0.2122 - acc: 0.9190 - val_loss: 0.2598 - val_acc: 0.8950

現(xiàn)在是評估我們的模型的時(shí)候了:

print(np.mean(results.history["val_acc"]))

0.894750000536
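Since longer training overfits here, one alternative to hand-picking two epochs is Keras' EarlyStopping callback, which halts training once the validation loss stops improving. A sketch, not part of the original walkthrough (the patience value is an arbitrary choice):

from keras.callbacks import EarlyStopping

# stop as soon as val_loss fails to improve for one epoch
early_stop = EarlyStopping(monitor = "val_loss", patience = 1)
results = model.fit(
    train_x, train_y,
    epochs = 20,
    batch_size = 500,
    validation_data = (test_x, test_y),
    callbacks = [early_stop]
)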

整個(gè)模型的代碼:

import numpy as np
from keras.utils import to_categorical
from keras import models
from keras import layers
from keras.datasets import imdb

(training_data, training_targets), (testing_data, testing_targets) = imdb.load_data(num_words=10000)
data = np.concatenate((training_data, testing_data), axis=0)
targets = np.concatenate((training_targets, testing_targets), axis=0)

def vectorize(sequences, dimension = 10000):
    results = np.zeros((len(sequences), dimension))
    for i, sequence in enumerate(sequences):
        results[i, sequence] = 1
    return results

data = vectorize(data)
targets = np.array(targets).astype("float32")

test_x = data[:10000]
test_y = targets[:10000]
train_x = data[10000:]
train_y = targets[10000:]

model = models.Sequential()

# Input - Layer
model.add(layers.Dense(50, activation = "relu", input_shape=(10000, )))

# Hidden - Layers
model.add(layers.Dropout(0.3, noise_shape=None, seed=None))
model.add(layers.Dense(50, activation = "relu"))
model.add(layers.Dropout(0.2, noise_shape=None, seed=None))
model.add(layers.Dense(50, activation = "relu"))

# Output - Layer
model.add(layers.Dense(1, activation = "sigmoid"))
model.summary()

# compiling the model
model.compile(
    optimizer = "adam",
    loss = "binary_crossentropy",
    metrics = ["accuracy"]
)

results = model.fit(
    train_x, train_y,
    epochs = 2,
    batch_size = 500,
    validation_data = (test_x, test_y)
)

print("Test-Accuracy:", np.mean(results.history["val_acc"]))

Finally

If you liked this article, please follow and share it. Thank you!
