
TensorFlow 2.0 - tf.data.Dataset Data Preprocessing: Cat vs. Dog Classification

Published: 2024/7/5

Table of Contents

    • 1. Creating a Dataset with tf.data.Dataset.from_tensor_slices()
    • 2. Preprocessing with Dataset.map(f)
    • 3. Parallelism with Dataset.prefetch()
    • 4. Iterating over Data with a for Loop
    • 5. Example: Cat vs. Dog Classification

Based on: A Concise Handbook of TensorFlow 2 (简单粗暴 TensorFlow 2)

1. Creating a Dataset with tf.data.Dataset.from_tensor_slices()

• tf.data.Dataset.from_tensor_slices() builds a dataset by slicing the input tensors along their first dimension

import tensorflow as tf
import numpy as np
import matplotlib.pyplot as plt

(train_data, train_label), (_, _) = tf.keras.datasets.mnist.load_data()
train_data = np.expand_dims(train_data.astype(np.float32) / 255., axis=-1)

mnistdata = tf.data.Dataset.from_tensor_slices((train_data, train_label))

for img, label in mnistdata:
    plt.title(label.numpy())
    plt.imshow(img.numpy())
    plt.show()
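To see exactly what "slicing along the first dimension" means without loading MNIST, here is a tiny self-contained sketch: a (3, 2) feature array and a length-3 label array become a dataset of three (feature, label) pairs.

```python
import tensorflow as tf
import numpy as np

# from_tensor_slices splits its inputs along axis 0:
# a (3, 2) array yields three (2,)-shaped elements, paired with labels.
features = np.array([[1., 2.], [3., 4.], [5., 6.]], dtype=np.float32)
labels = np.array([0, 1, 0])

dataset = tf.data.Dataset.from_tensor_slices((features, labels))

elements = [(f.numpy().tolist(), int(l)) for f, l in dataset]
print(elements)  # [([1.0, 2.0], 0), ([3.0, 4.0], 1), ([5.0, 6.0], 0)]
```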

2. Preprocessing with Dataset.map(f)

  • Dataset.map(f) applies the transformation f to every element

def rotate90(img, label):
    img = tf.image.rot90(img)
    return img, label

mnistdata = mnistdata.map(rotate90)
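A quick way to convince yourself the map actually ran is to check element shapes: tf.image.rot90 swaps height and width, so mapping it over a dataset of non-square images changes every element's shape. A minimal sketch:

```python
import tensorflow as tf

# rot90 swaps height and width, so a dataset-level map changes element shapes.
ds = tf.data.Dataset.from_tensor_slices(tf.zeros([2, 28, 14, 1]))
ds = ds.map(lambda img: tf.image.rot90(img))
shapes = [tuple(img.shape) for img in ds]
print(shapes)  # [(14, 28, 1), (14, 28, 1)]
```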

  • Dataset.batch(batch_size) groups elements into batches

mnistdata = mnistdata.batch(5)

for img, label in mnistdata:  # img [5, 28, 28, 1], label [5]: a batch of 5 samples
    fig, axs = plt.subplots(1, 5)  # 1 row, 5 columns
    for i in range(5):
        axs[i].set_title(label.numpy()[i])
        axs[i].imshow(img.numpy()[i, :, :, :])
    plt.show()
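One detail worth knowing about batch(): when the dataset size is not a multiple of the batch size, the final batch is simply smaller. A small sketch with a 12-element dataset:

```python
import tensorflow as tf

# batch(5) groups consecutive elements; the final batch is smaller
# when the dataset size is not a multiple of 5 (use drop_remainder=True to drop it).
ds = tf.data.Dataset.range(12).batch(5)
batch_sizes = [int(b.shape[0]) for b in ds]
print(batch_sizes)  # [5, 5, 2]
```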

  • Dataset.shuffle(buffer_size) shuffles the data randomly
    With buffer_size = 1, no shuffling actually happens
    If the data is already fairly random, buffer_size can be small; otherwise set it larger
    In the cat vs. dog example below I hit out-of-memory errors with a large buffer, so I recommend shuffling the data up front instead

# the digits come out in a different order each run
mnistdata = mnistdata.shuffle(buffer_size=100).batch(5)
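The buffer_size = 1 caveat above is easy to verify: the shuffle buffer then holds a single element at a time, so elements can only come out in their original order.

```python
import tensorflow as tf

# With buffer_size=1 the shuffle buffer holds one element at a time,
# so the output order cannot change: no shuffling at all.
ds = tf.data.Dataset.range(10)
order = [int(x) for x in ds.shuffle(buffer_size=1)]
print(order)  # [0, 1, 2, 3, 4, 5, 6, 7, 8, 9]

# A buffer as large as the dataset gives a full uniform shuffle
# (order varies per run, but the elements are the same).
shuffled = [int(x) for x in ds.shuffle(buffer_size=10)]
```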

3. Parallelism with Dataset.prefetch()

  • Dataset.prefetch() preloads upcoming elements so the CPU can prepare data while the GPU trains

mnistdata = mnistdata.prefetch(buffer_size=tf.data.experimental.AUTOTUNE)  # AUTOTUNE picks a suitable buffer_size automatically

  • num_parallel_calls runs the map function on multiple cores

mnistdata = mnistdata.map(map_func=rotate90, num_parallel_calls=2)  # tf.data.experimental.AUTOTUNE also works here
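A useful property of parallel map is that it is deterministic by default: even with several worker threads, output order matches input order. A small sketch combining the two calls above:

```python
import tensorflow as tf

# map with num_parallel_calls uses several threads, but output order
# still matches input order (deterministic=True is the default).
ds = tf.data.Dataset.range(8)
ds = ds.map(lambda x: x * 2, num_parallel_calls=tf.data.experimental.AUTOTUNE)
ds = ds.prefetch(buffer_size=tf.data.experimental.AUTOTUNE)
result = [int(x) for x in ds]
print(result)  # [0, 2, 4, 6, 8, 10, 12, 14]
```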

4. Iterating over Data with a for Loop

# for loop
dataset = tf.data.Dataset.from_tensor_slices((A, B, C, ...))
for a, b, c, ... in dataset:
    # operate on the tensors a, b, c, e.g. feed them into a model for training

# or create an iterator and fetch elements with next()
dataset = tf.data.Dataset.from_tensor_slices((A, B, C, ...))
it = iter(dataset)
a_0, b_0, c_0, ... = next(it)
a_1, b_1, c_1, ... = next(it)
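The pattern above uses placeholder tensors A, B, C; here is a concrete, runnable version with two components, showing both the for loop and the iterator form:

```python
import tensorflow as tf

features = tf.constant([[1., 2.], [3., 4.], [5., 6.]])
labels = tf.constant([0, 1, 0])
dataset = tf.data.Dataset.from_tensor_slices((features, labels))

# for loop: each iteration yields one (feature, label) pair
pairs = [(f.numpy().tolist(), int(l)) for f, l in dataset]

# iterator + next(): fetch elements one at a time
it = iter(dataset)
f_0, l_0 = next(it)
f_1, l_1 = next(it)
print(int(l_0), int(l_1))  # 0 1
```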

5. Example: Cat vs. Dog Classification

Project and data: https://www.kaggle.com/c/dogs-vs-cats-redux-kernels-edition/overview

The train folder contains 25,000 images of dogs and cats. Each image in this folder has the label as part of the filename. The test folder contains 12,500 images, named according to a numeric id.

For each image in the test set, you should predict a probability that the image is a dog (1 = dog, 0 = cat).

# ---------cat vs dog-------------
# https://michael.blog.csdn.net/
import tensorflow as tf
import pandas as pd
import numpy as np
import random
import os

num_epochs = 10
batch_size = 32
learning_rate = 1e-4
train_data_dir = "./dogs-vs-cats/train/"
test_data_dir = "./dogs-vs-cats/test/"

# data preprocessing
def _decode_and_resize(filename, label=None):
    img_string = tf.io.read_file(filename)
    img_decoded = tf.image.decode_jpeg(img_string)
    img_resized = tf.image.resize(img_decoded, [256, 256]) / 255.
    if label is None:
        return img_resized
    return img_resized, label

# build a dataset with tf.data.Dataset
def processData(train_filenames, train_labels):
    train_dataset = tf.data.Dataset.from_tensor_slices((train_filenames, train_labels))
    train_dataset = train_dataset.map(map_func=_decode_and_resize)
    # train_dataset = train_dataset.shuffle(buffer_size=25000)  # very memory-hungry, not used
    train_dataset = train_dataset.batch(batch_size)
    train_dataset = train_dataset.prefetch(buffer_size=tf.data.experimental.AUTOTUNE)
    return train_dataset

if __name__ == "__main__":
    # training file paths
    file_dir = [train_data_dir + filename for filename in os.listdir(train_data_dir)]
    labels = [0 if filename[0] == 'c' else 1
              for filename in os.listdir(train_data_dir)]
    # zip the pairs and shuffle
    f_l = list(zip(file_dir, labels))
    random.shuffle(f_l)
    file_dir, labels = zip(*f_l)
    # split into training and validation sets
    valid_ratio = 0.1
    idx = int((1 - valid_ratio) * len(file_dir))
    train_files, valid_files = file_dir[:idx], file_dir[idx:]
    train_labels, valid_labels = labels[:idx], labels[idx:]
    # build the datasets with tf.data.Dataset
    train_filenames, valid_filenames = tf.constant(train_files), tf.constant(valid_files)
    train_labels, valid_labels = tf.constant(train_labels), tf.constant(valid_labels)
    train_dataset = processData(train_filenames, train_labels)
    valid_dataset = processData(valid_filenames, valid_labels)
    # build the model
    model = tf.keras.Sequential([
        tf.keras.layers.Conv2D(32, 3, activation='relu', input_shape=(256, 256, 3)),
        tf.keras.layers.MaxPooling2D(),
        tf.keras.layers.Dropout(0.2),
        tf.keras.layers.Conv2D(64, 5, activation='relu'),
        tf.keras.layers.MaxPooling2D(),
        tf.keras.layers.Dropout(0.2),
        tf.keras.layers.Conv2D(128, 5, activation='relu'),
        tf.keras.layers.MaxPooling2D(),
        tf.keras.layers.Dropout(0.2),
        tf.keras.layers.Flatten(),
        tf.keras.layers.Dense(64, activation='relu'),
        tf.keras.layers.Dense(2, activation='softmax')
    ])
    # configure the model
    model.compile(optimizer=tf.keras.optimizers.Adam(learning_rate=learning_rate),
                  loss=tf.keras.losses.sparse_categorical_crossentropy,
                  metrics=[tf.keras.metrics.sparse_categorical_accuracy])
    # train
    model.fit(train_dataset, epochs=num_epochs, validation_data=valid_dataset)
    # test
    test_filenames = tf.constant([test_data_dir + filename for filename in os.listdir(test_data_dir)])
    test_data = tf.data.Dataset.from_tensor_slices(test_filenames)
    test_data = test_data.map(map_func=_decode_and_resize)
    test_data = test_data.batch(batch_size)
    ans = model.predict(test_data)  # ans [12500, 2]
    prob = ans[:, 1]  # probability of "dog"
    # write the submission file
    id = list(range(1, 12501))
    output = pd.DataFrame({'id': id, 'label': prob})
    output.to_csv("submission.csv", index=False)
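The script avoids the memory-hungry Dataset.shuffle by shuffling the file paths in plain Python before building the dataset: shuffling a list of strings is nearly free, whereas shuffling decoded 256x256 images keeps the whole buffer in RAM. A minimal sketch of that idea (the filenames are illustrative, following the Kaggle "cat.N.jpg" / "dog.N.jpg" naming):

```python
import random

# Shuffle file paths up front (cheap: only strings move), then derive
# labels from the first character of the filename, as the script does.
filenames = ["cat.0.jpg", "dog.0.jpg", "cat.1.jpg", "dog.1.jpg"]
labels = [0 if f[0] == 'c' else 1 for f in filenames]

pairs = list(zip(filenames, labels))
random.seed(0)  # seed only for reproducibility of this sketch
random.shuffle(pairs)
filenames, labels = zip(*pairs)

# every filename still carries its correct label after shuffling
assert all((f[0] == 'c') == (l == 0) for f, l in zip(filenames, labels))
```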

My submission score:

Top leaderboard scores (other competitors):

  • Replacing the model with MobileNetV2 + a fully connected head, trained for 2 epochs

basemodel = tf.keras.applications.MobileNetV2(input_shape=(256, 256, 3), include_top=False, classes=2)
model = tf.keras.Sequential([
    basemodel,
    tf.keras.layers.Flatten(),
    tf.keras.layers.Dense(64, activation='relu'),
    tf.keras.layers.Dense(2, activation='softmax')
])
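A common transfer-learning refinement, not used in the snippet above, is to freeze the pretrained backbone so only the new head is trained. A minimal sketch (weights=None keeps the sketch offline; in practice you would pass weights='imagenet', and GlobalAveragePooling2D is swapped in for Flatten as a lighter pooling choice):

```python
import tensorflow as tf

basemodel = tf.keras.applications.MobileNetV2(
    input_shape=(256, 256, 3), include_top=False, weights=None)
basemodel.trainable = False  # freeze the backbone; only the head below trains

model = tf.keras.Sequential([
    basemodel,
    tf.keras.layers.GlobalAveragePooling2D(),  # lighter than Flatten
    tf.keras.layers.Dense(64, activation='relu'),
    tf.keras.layers.Dense(2, activation='softmax'),
])

# only the two Dense layers contribute trainable weights (kernel + bias each)
print(len(model.trainable_weights))  # 4
```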

Result:

704/704 [==============================] - 179s 254ms/step - loss: 0.0741 - sparse_categorical_accuracy: 0.9737 - val_loss: 0.1609 - val_sparse_categorical_accuracy: 0.9744
704/704 [==============================] - 167s 237ms/step - loss: 0.0128 - sparse_categorical_accuracy: 0.9955 - val_loss: 0.0724 - val_sparse_categorical_accuracy: 0.9848

The accuracy (99% training, 98% validation) is higher than the first model's (roughly 92% on the training set, 80% on the validation set).

Oddly, the validation loss is larger than the training loss above. How should that be explained? The second model does not seem to be overfitting either, since training and validation accuracy are close.

