TensorFlow 2.0: Creating and Using a Dataset

Published: 2024/7/5 | Programming Q&A | 豆豆


I. Creating a Dataset

import numpy as np
import tensorflow as tf

# from_tensor_slices accepts a numpy.ndarray, a tuple, or a dict.
# The three lines below are alternatives: keep one of them.
dataset = tf.data.Dataset.from_tensor_slices(np.arange(10).reshape((5, 2)))
dataset = tf.data.Dataset.from_tensor_slices(([1, 2, 3, 4, 5, 6], [10, 20, 30, 40, 50, 60]))
dataset = tf.data.Dataset.from_tensor_slices({"x": [1, 2, 3, 4, 5, 6], "y": [10, 20, 30, 40, 50, 60]})

dataset = dataset.batch(3)
for batch in dataset:
    print(batch)

The three sources produce the following output, respectively (the dtype of the first one follows NumPy's platform default integer and may read int64 instead of int32):

tf.Tensor(
[[0 1]
 [2 3]
 [4 5]], shape=(3, 2), dtype=int32)
tf.Tensor(
[[6 7]
 [8 9]], shape=(2, 2), dtype=int32)
#------------------------------------------------------------------------
(<tf.Tensor: shape=(3,), dtype=int32, numpy=array([1, 2, 3])>, <tf.Tensor: shape=(3,), dtype=int32, numpy=array([10, 20, 30])>)
(<tf.Tensor: shape=(3,), dtype=int32, numpy=array([4, 5, 6])>, <tf.Tensor: shape=(3,), dtype=int32, numpy=array([40, 50, 60])>)
#------------------------------------------------------------------------
{'x': <tf.Tensor: shape=(3,), dtype=int32, numpy=array([1, 2, 3])>, 'y': <tf.Tensor: shape=(3,), dtype=int32, numpy=array([10, 20, 30])>}
{'x': <tf.Tensor: shape=(3,), dtype=int32, numpy=array([4, 5, 6])>, 'y': <tf.Tensor: shape=(3,), dtype=int32, numpy=array([40, 50, 60])>}
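To see how the structure of the source carries over to each element, the following minimal sketch inspects element_spec and pulls out the first element. The variable names (tuple_ds, dict_ds, first_pair, first_dict) are illustrative and do not come from the original article.

```python
import tensorflow as tf

# A tuple source yields tuple elements; a dict source yields dict elements.
tuple_ds = tf.data.Dataset.from_tensor_slices(([1, 2, 3], [10, 20, 30]))
dict_ds = tf.data.Dataset.from_tensor_slices({"x": [1, 2, 3], "y": [10, 20, 30]})

# element_spec describes a single element: its nesting, shape, and dtype.
print(tuple_ds.element_spec)
print(dict_ds.element_spec)

# as_numpy_iterator() yields NumPy values, which are convenient to inspect.
first_pair = next(tuple_ds.as_numpy_iterator())
first_dict = next(dict_ds.as_numpy_iterator())
print(first_pair, first_dict)
```

Slicing always happens along the first axis, so every component of a tuple or dict source must have the same length along that axis.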

II. Data Preprocessing

1. map

def func(x, y):
    # Applied to every (x, y) element of the dataset
    x = x / 1
    y = y / 10
    return x, y

train_data = [1, 2, 3, 4, 5, 6]
train_label = [10, 20, 30, 40, 50, 60]
dataset = tf.data.Dataset.from_tensor_slices((train_data, train_label))
dataset = dataset.map(func)
for x, y in dataset:
    print(x, y)

Output:

tf.Tensor(1.0, shape=(), dtype=float64) tf.Tensor(1.0, shape=(), dtype=float64)
tf.Tensor(2.0, shape=(), dtype=float64) tf.Tensor(2.0, shape=(), dtype=float64)
tf.Tensor(3.0, shape=(), dtype=float64) tf.Tensor(3.0, shape=(), dtype=float64)
tf.Tensor(4.0, shape=(), dtype=float64) tf.Tensor(4.0, shape=(), dtype=float64)
tf.Tensor(5.0, shape=(), dtype=float64) tf.Tensor(5.0, shape=(), dtype=float64)
tf.Tensor(6.0, shape=(), dtype=float64) tf.Tensor(6.0, shape=(), dtype=float64)

map can also apply the function in parallel:

dataset = dataset.map(map_func=func, num_parallel_calls=tf.data.experimental.AUTOTUNE)

num_parallel_calls: runs the transformation on several elements at once, spreading the work across multiple CPU threads.
tf.data.experimental.AUTOTUNE: lets the tf.data runtime choose the degree of parallelism dynamically from the available CPU resources, instead of using a fixed number. From TensorFlow 2.4 onward the same constant is also exposed as tf.data.AUTOTUNE.
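The parallel map described above can be sketched end to end as follows. The function name scale and the explicit cast to float32 are assumptions made for this illustration; the original func relies on int / int division, which produces float64.

```python
import tensorflow as tf

def scale(x, y):
    # Explicit casts give float32 outputs, the usual dtype for model inputs.
    x = tf.cast(x, tf.float32)
    y = tf.cast(y, tf.float32) / 10.0
    return x, y

train_data = [1, 2, 3, 4, 5, 6]
train_label = [10, 20, 30, 40, 50, 60]

dataset = tf.data.Dataset.from_tensor_slices((train_data, train_label))
# Let tf.data decide how many elements to process concurrently.
dataset = dataset.map(scale, num_parallel_calls=tf.data.experimental.AUTOTUNE)

# With default options the element order is preserved even under parallel map.
pairs = [(float(x), float(y)) for x, y in dataset.as_numpy_iterator()]
print(pairs)
```

Parallel map tends to pay off when the mapped function is expensive (decoding, augmentation); for a trivial function like this one the gain is likely negligible.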

2. shuffle and batch

dataset = tf.data.Dataset.from_tensor_slices([1, 2, 3, 4, 5])

# The two pipelines below are alternatives: run only one of them.
# Option 1: shuffle whole batches; the order inside each batch is unchanged.
dataset = dataset.batch(2).shuffle(2)
# Option 2: shuffle the elements first, then split them into batches.
# dataset = dataset.shuffle(2).batch(2)

for d in dataset:
    print(d)
    print("----------------------------")

Result 1 (one possible ordering; the shuffle is random):

tf.Tensor([1 2], shape=(2,), dtype=int32)
----------------------------
tf.Tensor([5], shape=(1,), dtype=int32)
----------------------------
tf.Tensor([3 4], shape=(2,), dtype=int32)
----------------------------

Result 2 (one possible ordering; the shuffle is random):

tf.Tensor([1 3], shape=(2,), dtype=int32)
----------------------------
tf.Tensor([2 4], shape=(2,), dtype=int32)
----------------------------
tf.Tensor([5], shape=(1,), dtype=int32)
----------------------------
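The difference between the two orderings can be checked programmatically. This is a minimal sketch; the variable names are illustrative, and the buffer sizes differ from the article's shuffle(2) so that the buffer covers the whole dataset.

```python
import tensorflow as tf

values = [1, 2, 3, 4, 5]

# Batch first, then shuffle: the batches [1 2], [3 4], [5] move as whole units.
batch_then_shuffle = tf.data.Dataset.from_tensor_slices(values).batch(2).shuffle(3)
batches_a = [b.tolist() for b in batch_then_shuffle.as_numpy_iterator()]

# Shuffle first, then batch: a buffer as large as the dataset mixes all elements,
# so each batch can contain any of the values.
shuffle_then_batch = (
    tf.data.Dataset.from_tensor_slices(values).shuffle(len(values)).batch(2)
)
batches_b = [b.tolist() for b in shuffle_then_batch.as_numpy_iterator()]

print(batches_a)
print(batches_b)
```

The shuffle buffer size controls how far an element can move: a buffer of 2, as in the article, only mixes neighbouring elements, while a buffer at least as large as the dataset can produce any permutation.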

3. repeat

dataset = tf.data.Dataset.from_tensor_slices([1, 2])
dataset = dataset.repeat(2)
for d in dataset:
    print(d)

Result:

tf.Tensor(1, shape=(), dtype=int32)
tf.Tensor(2, shape=(), dtype=int32)
tf.Tensor(1, shape=(), dtype=int32)
tf.Tensor(2, shape=(), dtype=int32)
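Two details of repeat are worth a short sketch: calling it without a count repeats indefinitely, and batching after repeat lets a batch span the boundary between repetitions. The variable names and the use of take are choices made for this illustration.

```python
import tensorflow as tf

dataset = tf.data.Dataset.from_tensor_slices([1, 2])

# repeat() with no count never ends, so bound it with take() before iterating.
endless = dataset.repeat()
first_five = [int(v) for v in endless.take(5).as_numpy_iterator()]
print(first_five)

# Batching after repeat can mix elements from consecutive repetitions.
mixed = [b.tolist() for b in dataset.repeat(3).batch(4).as_numpy_iterator()]
print(mixed)
```

If batches should never cross a repetition boundary, calling batch before repeat is the usual alternative.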

III. Parallelization Strategy

dataset = dataset.prefetch(buffer_size=tf.data.experimental.AUTOTUNE)

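Dataset.prefetch(): makes the Dataset prepare a number of elements ahead of time during training, so that the CPU can get the next batches ready while the GPU is still training on the current one, which improves the throughput of the training loop.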
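Putting the pieces together, a complete input pipeline might look like the sketch below. The function name preprocess, the sample data, and the chosen order of steps are assumptions for illustration rather than the article's own code.

```python
import tensorflow as tf

AUTOTUNE = tf.data.experimental.AUTOTUNE

def preprocess(x, y):
    # Cast to float32 and rescale the labels.
    return tf.cast(x, tf.float32), tf.cast(y, tf.float32) / 10.0

features = [1, 2, 3, 4, 5, 6]
labels = [10, 20, 30, 40, 50, 60]

dataset = (
    tf.data.Dataset.from_tensor_slices((features, labels))
    .map(preprocess, num_parallel_calls=AUTOTUNE)  # transform elements in parallel
    .shuffle(buffer_size=len(features))            # mix individual elements
    .batch(2)                                      # group into batches of 2
    .prefetch(buffer_size=AUTOTUNE)                # overlap preparation with consumption
)

batch_shapes = [x.shape.as_list() for x, _ in dataset]
print(batch_shapes)
```

Placing prefetch last means whole, ready-to-use batches are buffered, which is the usual arrangement when the consumer is a training step.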

IV. Using a Dataset with a Model

Keras accepts a tf.data.Dataset directly as input. When calling the fit() and evaluate() methods of tf.keras.Model, pass as the input argument x a Dataset whose elements have the form (input data, label data), and leave out the label argument y.

The conventional Keras training call:

model.fit(x=train_data, y=train_label, epochs=num_epochs, batch_size=batch_size)

Training with a tf.data.Dataset:

model.fit(dataset, epochs=num_epochs)
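As a self-contained illustration of the Dataset-based call, the sketch below trains a one-layer model on synthetic data. The data (y = 2x), the model architecture, the optimizer, and the epoch count are all assumptions chosen to keep the example small.

```python
import numpy as np
import tensorflow as tf

# Synthetic regression data: label = 2 * input (illustrative only).
train_data = np.arange(1, 7, dtype=np.float32).reshape(-1, 1)
train_label = 2.0 * train_data

# Elements are (input, label) pairs; batching is done by the Dataset itself.
dataset = (
    tf.data.Dataset.from_tensor_slices((train_data, train_label))
    .shuffle(len(train_data))
    .batch(2)
    .prefetch(tf.data.experimental.AUTOTUNE)
)

model = tf.keras.Sequential([tf.keras.layers.Dense(1)])
model.compile(optimizer="sgd", loss="mse")

# No y and no batch_size: both come from the Dataset.
history = model.fit(dataset, epochs=5, verbose=0)
loss = model.evaluate(dataset, verbose=0)
print(history.history["loss"], loss)
```

Because the Dataset is already batched, batch_size should not be passed to fit() or evaluate() in this form.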
