TFRecord tf.train.Feature

Published: 2023/11/28


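1. Definition

Encoding data into binary TFRecord files ahead of time and reading them with TensorFlow's built-in multithreaded input API gives efficient, cross-platform reads, and the format is well suited to storing complex data in a standardized way. According to the protobuf definition of TFRecord (shown as a figure in the original post), each TFRecord file is made up of many Examples.

Official definition of Example: An Example is a mostly-normalized data format for storing data for training and inference.
An Example represents one packaged data input, for instance an image together with its width, height, and label. Each piece of information is stored as a key-value pair, so an Example contains one Features message, and that Features message contains multiple Feature entries.

This agreed-upon TFRecord layout can be used to build any dataset.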
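The nesting described above (Example, then Features, then named Feature entries) can be sketched as follows. This is a minimal illustration rather than code from the original post: the keys "width", "height", and "label" and their values are made up for the example.

```python
import tensorflow as tf

# One Example holds one Features message, which maps names to Feature values.
# The keys and values below are illustrative assumptions.
example = tf.train.Example(
    features=tf.train.Features(
        feature={
            "width": tf.train.Feature(int64_list=tf.train.Int64List(value=[28])),
            "height": tf.train.Feature(int64_list=tf.train.Int64List(value=[28])),
            "label": tf.train.Feature(int64_list=tf.train.Int64List(value=[7])),
        }
    )
)

# Serialize to the bytes that would be written as one record in a TFRecord file.
serialized = example.SerializeToString()

# Parse the bytes back into an Example to check the round trip.
restored = tf.train.Example.FromString(serialized)
keys = sorted(restored.features.feature.keys())
label = restored.features.feature["label"].int64_list.value[0]
print(keys, label)
```

Because an Example is an ordinary protobuf message, it can be built and inspected without running a TensorFlow graph or session.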

2. Feature
Official definition:

// A Feature contains Lists which may hold zero or more values. These
// lists are the base values BytesList, FloatList, Int64List.
//
// Features are organized into categories by name. The Features message
// contains the mapping from name to Feature.

Features is a dictionary of Feature entries: each key is a string and each value is a tf.train.Feature(). The value must be in one of three specific formats: a list of byte strings (BytesList), a list of floats (FloatList), or a list of integers (Int64List).

tf.train.Feature(**options)
options selects one of the following three data formats:
bytes_list = tf.train.BytesList(value=input)  # elements are byte strings (bytes)
int64_list = tf.train.Int64List(value=input)  # elements are ints (int32, int64)
float_list = tf.train.FloatList(value=input)  # elements are floats (float32, float64; stored as 32-bit floats)
Note: value must be a list (a vector).
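A short sketch of the three list types, with made-up sample values. It also shows that a single scalar still has to be wrapped in a list, and that the field actually set on a Feature can be checked through the proto's oneof group, which is named kind.

```python
import tensorflow as tf

# One Feature per list type; the sample values are illustrative assumptions.
# In Python 3 the BytesList elements are bytes objects, so a str is encoded first.
bytes_feature = tf.train.Feature(
    bytes_list=tf.train.BytesList(value=["cat".encode("utf-8"), b"dog"]))
int_feature = tf.train.Feature(int64_list=tf.train.Int64List(value=[1, 2, 3]))
float_feature = tf.train.Feature(float_list=tf.train.FloatList(value=[0.5, 1.5]))

# A Feature holds exactly one of the three lists; WhichOneof reports which one.
kinds = [f.WhichOneof("kind") for f in (bytes_feature, int_feature, float_feature)]
print(kinds)

# A single scalar is still passed as a one-element list.
scalar_feature = tf.train.Feature(int64_list=tf.train.Int64List(value=[42]))
scalar_values = list(scalar_feature.int64_list.value)
float_values = list(float_feature.float_list.value)
```

Since FloatList stores 32-bit floats, float64 inputs are narrowed when they are written; data that needs full double precision is probably better stored as raw bytes.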

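When the original data is a matrix or tensor (an image, for example), either storage method discards the shape, so the shape should be written into the same sample as an additional feature. The shape is a list of ints; a suggested naming convention is the original feature name plus '_shape'. The reader can then fetch the shape and restore the data.

The two ways to store a matrix are listed below. Both need the shape stored separately so that it can be restored (the second is more convenient):

(1) Flatten the matrix or tensor into a list (a vector), then choose the data format that matches the element dtype.
(2) Convert the matrix or tensor to a byte string with .tobytes() (.tostring() in older NumPy code), then store it with tf.train.Feature(bytes_list=tf.train.BytesList(value=[input.tobytes()])).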

# Helper functions that wrap a single value in the matching Feature type.
def _int64_feature(value):
    return tf.train.Feature(int64_list=tf.train.Int64List(value=[value]))


def _bytes_feature(value):
    return tf.train.Feature(bytes_list=tf.train.BytesList(value=[value]))


# Convert one sample into a tf.train.Example.
def _make_example(pixels, label, image):
    image_raw = image.tobytes()  # np.ndarray -> raw bytes (.tostring() in older NumPy)
    example = tf.train.Example(features=tf.train.Features(feature={
        'pixels': _int64_feature(pixels),
        'label': _int64_feature(int(np.argmax(label))),
        'image_raw': _bytes_feature(image_raw)}))
    return example
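The two storage methods and the suggested '_shape' convention can be compared side by side. The sketch below is illustrative only: the feature name "matrix", the 2x3 sample array, and the helpers restore_flat and restore_raw are assumptions, not part of the original post.

```python
import numpy as np
import tensorflow as tf

matrix = np.arange(6, dtype=np.float32).reshape(2, 3)  # sample data (assumption)


def shape_feature(array):
    # The shape is stored as an extra int64 feature under '<name>_shape'.
    return tf.train.Feature(int64_list=tf.train.Int64List(value=list(array.shape)))


# Method 1: flatten into a list and pick the list type that matches the dtype.
flat_example = tf.train.Example(features=tf.train.Features(feature={
    "matrix": tf.train.Feature(
        float_list=tf.train.FloatList(value=matrix.flatten().tolist())),
    "matrix_shape": shape_feature(matrix),
}))

# Method 2: store the raw bytes as a single BytesList element.
raw_example = tf.train.Example(features=tf.train.Features(feature={
    "matrix": tf.train.Feature(
        bytes_list=tf.train.BytesList(value=[matrix.tobytes()])),
    "matrix_shape": shape_feature(matrix),
}))


def restore_flat(example):
    feats = example.features.feature
    shape = list(feats["matrix_shape"].int64_list.value)
    values = list(feats["matrix"].float_list.value)
    return np.array(values, dtype=np.float32).reshape(shape)


def restore_raw(example, dtype=np.float32):
    # The bytes carry no dtype, so the reader has to know it in advance.
    feats = example.features.feature
    shape = list(feats["matrix_shape"].int64_list.value)
    return np.frombuffer(feats["matrix"].bytes_list.value[0], dtype=dtype).reshape(shape)


restored_flat = restore_flat(flat_example)
restored_raw = restore_raw(raw_example)
print(restored_flat.shape, restored_raw.shape)
```

With the raw-bytes method the dtype is also lost, so it either has to be fixed by convention (as with uint8 images) or stored as one more feature.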

3. Complete example: persisting MNIST data as a TFRecord file

# Written against the TensorFlow 1.x API (tensorflow.examples.tutorials, tf.python_io).
import numpy as np
import tensorflow as tf
from tensorflow.examples.tutorials.mnist import input_data


# Helper functions that wrap a single value in the matching Feature type.
def _int64_feature(value):
    return tf.train.Feature(int64_list=tf.train.Int64List(value=[value]))


def _bytes_feature(value):
    return tf.train.Feature(bytes_list=tf.train.BytesList(value=[value]))


# Convert one sample into a tf.train.Example.
def _make_example(pixels, label, image):
    image_raw = image.tobytes()  # .tostring() in older NumPy
    example = tf.train.Example(features=tf.train.Features(feature={
        'pixels': _int64_feature(pixels),
        'label': _int64_feature(int(np.argmax(label))),
        'image_raw': _bytes_feature(image_raw)}))
    return example


def save_tfrecords():
    # Read the MNIST training data.
    mnist = input_data.read_data_sets("../../datasets/MNIST_data", dtype=tf.uint8, one_hot=True)
    images = mnist.train.images  # (55000, 784) <class 'numpy.ndarray'>
    labels = mnist.train.labels  # (55000, 10) <class 'numpy.ndarray'>
    pixels = images.shape[1]  # 784 = 28 * 28
    num_examples = mnist.train.num_examples

    # Write the training data to a TFRecord file.
    with tf.python_io.TFRecordWriter("output.tfrecords") as writer:
        for index in range(num_examples):
            # Build one Example, serialize it, and write it to the file.
            example = _make_example(pixels, labels[index], images[index])
            writer.write(example.SerializeToString())
    print("TFRecord training file saved.")

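4. Reading and parsing a TFRecord
When reading, the decoding rules and target dtypes must match what was defined at encoding time; for example, the image is declared as tf.string and then decoded to uint8. Note that parse and the other ops here only define operations in the graph and do not run anything. Nothing is actually read or parsed until the queue-runner threads are started and sess.run() is called. With multithreading, reading this binary stream is typically much faster than reading the original image files from disk.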

def test_tfrecords():
    # Read the file.
    print(len(tf.get_collection(tf.GraphKeys.QUEUE_RUNNERS)))  # 0
    reader = tf.TFRecordReader()
    # The queue runner is added to the QUEUE_RUNNERS collection automatically.
    filename_queue = tf.train.string_input_producer(["output.tfrecords"])
    print(len(tf.get_collection(tf.GraphKeys.QUEUE_RUNNERS)))  # 1
    _, serialized_example = reader.read(filename_queue)

    # Parse the serialized example using the same keys and dtypes as at write time.
    features = tf.parse_single_example(
        serialized_example,
        features={
            'image_raw': tf.FixedLenFeature([], tf.string),
            'pixels': tf.FixedLenFeature([], tf.int64),
            'label': tf.FixedLenFeature([], tf.int64)})
    images = tf.decode_raw(features['image_raw'], tf.uint8)
    labels = tf.cast(features['label'], tf.int32)
    pixels = tf.cast(features['pixels'], tf.int32)

    sess = tf.Session()
    # Start the threads that feed the input queue.
    coord = tf.train.Coordinator()
    threads = tf.train.start_queue_runners(sess=sess, coord=coord)
    for i in range(5):
        image, label, pixel = sess.run([images, labels, pixels])
        print(label)

    # Stop the input threads and release the session.
    coord.request_stop()
    coord.join(threads)
    sess.close()
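The listing above relies on the TensorFlow 1.x queue-runner API. For readers on TensorFlow 2.x, the same parsing rules can be expressed with tf.data instead. The sketch below swaps in that different technique and assumes TF 2.x with eager execution; to stay self-contained it writes three tiny stand-in records to a temporary file (demo.tfrecords, a made-up name) rather than reading the MNIST output.

```python
import os
import tempfile

import numpy as np
import tensorflow as tf

# Temporary stand-in file; the name and contents are assumptions for this demo.
path = os.path.join(tempfile.mkdtemp(), "demo.tfrecords")

# Write three small records with the same keys as the MNIST example.
with tf.io.TFRecordWriter(path) as writer:
    for label in range(3):
        image = np.full(4, label, dtype=np.uint8)  # fake 4-pixel "image"
        example = tf.train.Example(features=tf.train.Features(feature={
            "image_raw": tf.train.Feature(
                bytes_list=tf.train.BytesList(value=[image.tobytes()])),
            "pixels": tf.train.Feature(
                int64_list=tf.train.Int64List(value=[image.size])),
            "label": tf.train.Feature(
                int64_list=tf.train.Int64List(value=[label])),
        }))
        writer.write(example.SerializeToString())

# The parsing spec mirrors the keys and dtypes used at write time.
feature_spec = {
    "image_raw": tf.io.FixedLenFeature([], tf.string),
    "pixels": tf.io.FixedLenFeature([], tf.int64),
    "label": tf.io.FixedLenFeature([], tf.int64),
}


def parse(serialized):
    features = tf.io.parse_single_example(serialized, feature_spec)
    image = tf.io.decode_raw(features["image_raw"], tf.uint8)  # bytes -> uint8
    label = tf.cast(features["label"], tf.int32)
    return image, label


dataset = tf.data.TFRecordDataset([path]).map(parse)
labels = [int(label.numpy()) for _, label in dataset]
images = [image.numpy().tolist() for image, _ in dataset]
print(labels)
```

In this style there is no session or coordinator to manage; records are parsed as the dataset is iterated.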
