
TensorFlow Thread Queues and I/O Operations

Published: 2024/7/5 | Programming Q&A | 36 | 豆豆

This article, collected and compiled by 生活随笔, covers TensorFlow thread queues and I/O operations and is shared here for reference.


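1 Threads and Queues

1.1 Introduction

1.2 Queues

1.3 Queue Runner

1.4 Thread Coordinator

2 File Reading

2.1 Workflow

2.2 File Reading APIs

3 Image Reading

3.1 Image Basics

3.2 Basic Image Operations

3.3 Image Reading APIs

3.4 Image Batching Workflow

3.5 Example: Reading Images

4 Binary File Reading

4.1 Reading CIFAR-10 Binary Data

5 TFRecords: Analysis and Storage

5.1 Overview

5.2 Writing TFRecords

5.3 Reading TFRecords

5.4 Example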


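1 Threads and Queues

1.1 Introduction

Reading a large file in one go consumes a lot of memory, and it also means training cannot start until the whole file has been read, which is slow. TensorFlow is built for computation, so it should not spend much time waiting on reads and writes; for this it provides mechanisms such as multithreading and queues. TensorFlow's threads are real threads that can execute tasks in parallel.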

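1.2 Queues

(1) First-in, first-out queue, elements leave in the order they arrived: tf.FIFOQueue

FIFOQueue(capacity, dtypes, name='fifo_queue') creates a queue that dequeues elements in first-in, first-out order.

  capacity: integer. Upper bound on the number of elements that can be stored in the queue.

  dtypes: list of DType objects. The length of dtypes must equal the number of tensors in each queue element; these types fix the type of every element enqueued later.

Common methods:

  dequeue(name=None): take one element out of the queue

  enqueue(vals, name=None): put one element into the queue

  enqueue_many(vals, name=None): put several elements into the queue at once; vals is a list or tuple

  enqueue and enqueue_many each return an enqueue operation.

  size(name=None): returns the number of elements currently in the queue

(2) Random-order queue, elements leave in random order: tf.RandomShuffleQueue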

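```python
import tensorflow as tf  # TensorFlow 1.x graph/session API

# Simulate the synchronous case: process the data first, then take it out for training.
# 1. Define the queue
Q = tf.FIFOQueue(10, tf.float32)
# Enqueue the data. If the argument were [0.1, 0.2, 0.3], it would be treated as
# a single tensor, so wrap it in an outer list: [[0.1, 0.2, 0.3], ]
enq_many = Q.enqueue_many([[0.1, 0.2, 0.3], ])

# 2. Processing logic: dequeue a value, multiply it by 2, enqueue the result
out = Q.dequeue()
data = out * 2
enter = Q.enqueue(data)

# Build the dequeue op used for "training" once, outside the loop
deq = Q.dequeue()

with tf.Session() as sess:
    # Initialize the queue
    sess.run(enq_many)

    # Process the data
    for i in range(10):
        sess.run(enter)

    # "Train" on the data: drain the queue
    for i in range(sess.run(Q.size())):
        print(sess.run(deq))
```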

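Note: in TensorFlow, ops have dependencies: running an op also runs every op it is computed from, and only ops linked by such a computational relationship depend on each other. For example, sess.run(enter) also runs out = Q.dequeue() and data = out * 2, because enter is computed from them.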
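The synchronous FIFOQueue example can be replayed without TensorFlow to make the order of operations visible. The following is a pure-Python sketch, assuming a collections.deque is an acceptable stand-in for tf.FIFOQueue (Python floats instead of tf.float32, no graph or session); replay_sync_example is a hypothetical name. Each loop iteration corresponds to one sess.run(enter): one dequeue, one multiplication, one enqueue.

```python
from collections import deque


def replay_sync_example(initial=(0.1, 0.2, 0.3), steps=10, capacity=10):
    """Replay the FIFOQueue example: enqueue_many, then `steps` rounds of
    dequeue -> multiply by 2 -> enqueue, then drain the queue in FIFO order."""
    q = deque()

    def enqueue(value):
        # A full tf.FIFOQueue would block here; this sketch raises instead.
        if len(q) >= capacity:
            raise RuntimeError("queue is full")
        q.append(value)

    # enqueue_many: each element of `initial` becomes one queue element
    for value in initial:
        enqueue(value)

    # One round = the dependent chain dequeue -> *2 -> enqueue
    for _ in range(steps):
        out = q.popleft()  # dequeue: the oldest element comes out first
        enqueue(out * 2)

    # Drain the queue, as the "training" loop does
    return [q.popleft() for _ in range(len(q))]


print(replay_sync_example())  # [1.6, 2.4, 1.6]
```

Because elements leave in arrival order, the three starting values are doubled in rotation, so after ten rounds 0.1 and 0.2 have each grown to 1.6 and 0.3 to 2.4.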

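1.3 Queue Runner

When the data set is large, the enqueue operation reads data from disk into memory, and the main thread has to wait for it to finish before it can train. A session can run several threads, so reading can happen asynchronously while the main thread trains.

tf.train.QueueRunner(queue, enqueue_ops=None) creates a QueueRunner

  queue: a queue

  enqueue_ops: list of enqueue ops to run in threads, one thread per op. [op] * 2 starts two threads, each running op; the ops in the list specify what the threads do.

Method: create_threads(sess, coord=None, start=False) creates threads that run the enqueue ops in the given session and returns the thread instances

  coord: thread coordinator, used to manage the threads (see the next section)

  start: boolean. If True, the threads are started; if False, the caller must call start() on them

1.4 Thread Coordinator

tf.train.Coordinator() is a thread coordinator that implements a simple mechanism for coordinating the termination of a set of threads. Returns a coordinator instance.

  request_stop(): request that the threads stop

  should_stop(): check whether a stop has been requested

  join(threads=None, stop_grace_period_secs=120): wait for the threads to terminate and reclaim them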

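```python
import tensorflow as tf  # TensorFlow 1.x graph/session API

# Simulate the asynchronous case: child threads store samples, the main thread reads them.
# 1. Define the queue
Q = tf.FIFOQueue(10, tf.float32)

# 2. Processing logic: increment a variable by 1 and enqueue it
var = tf.Variable(0.0)
# Increment in place with tf.assign_add
data = tf.assign_add(var, tf.constant(1.0))
# Enqueue the new value
enter = Q.enqueue(data)

# 3. Queue runner op: how many child threads to start and what each one does
qr = tf.train.QueueRunner(Q, enqueue_ops=[enter] * 2)

# Op that initializes the variables
init_op = tf.global_variables_initializer()

# Build the dequeue op once, outside the training loop
deq = Q.dequeue()

with tf.Session() as sess:
    # Initialize the variables
    sess.run(init_op)
    # Create a thread coordinator
    coord = tf.train.Coordinator()
    # Actually start the child threads
    threads = qr.create_threads(sess, coord=coord, start=True)
    # Main thread: keep reading data for training
    for i in range(100):
        print(sess.run(deq))
    # Stop and reclaim the threads
    coord.request_stop()
    coord.join(threads)
```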
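The QueueRunner plus Coordinator pattern is an ordinary producer/consumer setup. As a rough pure-Python analogy (not TensorFlow's implementation), the sketch below uses threading and queue.Queue. Coordinator here is a hypothetical minimal class that mirrors the request_stop, should_stop and join methods listed in 1.4, and run_producers is also hypothetical. One difference: a real QueueRunner unblocks its threads by closing the queue when a stop is requested, while this sketch polls with a short timeout.

```python
import queue
import threading


class Coordinator:
    """Minimal stand-in for tf.train.Coordinator (illustration only)."""

    def __init__(self):
        self._stop_event = threading.Event()

    def request_stop(self):
        self._stop_event.set()

    def should_stop(self):
        return self._stop_event.is_set()

    def join(self, threads, stop_grace_period_secs=120):
        for t in threads:
            t.join(timeout=stop_grace_period_secs)


def run_producers(num_threads=2, num_items=100, capacity=10):
    """Child threads repeat var += 1, enqueue(var); the main thread dequeues num_items values."""
    q = queue.Queue(maxsize=capacity)  # plays the role of tf.FIFOQueue(10, tf.float32)
    counter = [0.0]                    # plays the role of tf.Variable(0.0)
    lock = threading.Lock()            # makes "increment and read" atomic, like assign_add
    coord = Coordinator()

    def enqueue_op():
        # Each thread repeats the enqueue op until a stop is requested
        while not coord.should_stop():
            with lock:
                counter[0] += 1.0
                value = counter[0]
            # Wait while the queue is full, waking up regularly to check for a stop
            while not coord.should_stop():
                try:
                    q.put(value, timeout=0.05)
                    break
                except queue.Full:
                    pass

    # Like QueueRunner(Q, enqueue_ops=[enter] * 2): one thread per op in the list
    threads = [threading.Thread(target=enqueue_op, daemon=True) for _ in range(num_threads)]
    for t in threads:
        t.start()

    # Main thread: read samples, as a training loop would
    results = [q.get() for _ in range(num_items)]

    # Ask the producers to stop, then wait for them to exit
    coord.request_stop()
    coord.join(threads)
    return results


values = run_producers()
print(len(values))  # 100
```

Because both threads share one counter, every value is distinct, but the order in which the main thread receives them may vary between runs.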

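2 File Reading

2.1 Workflow

(1) Build a file queue

(2) Build a file reader; read() reads from the queue, one sample per call by default:

① CSV files: one line per read; ② binary files: a specified number of bytes per sample; ③ image files: one image per read; ④ TFRecords files: one record (a serialized Example) per read

(3) Decode

(4) Batch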

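2.2 File Reading APIs

(1) Building the file queue

tf.train.string_input_producer(string_tensor, num_epochs=None, shuffle=True) outputs strings (for example file names) to a queue for an input pipeline

  string_tensor: 1-D tensor containing the file names

  num_epochs: how many passes to make over the data; by default (None) it cycles through the names indefinitely

  shuffle: if True, the names are shuffled within each epoch

  return: a queue holding the output strings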
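How num_epochs and shuffle interact can be sketched with a plain Python generator. This is an analogy under stated assumptions: filename_producer is a hypothetical name, and TensorFlow fills a queue from a background thread instead of yielding lazily; when a finite num_epochs runs out, TensorFlow signals it with an OutOfRange error rather than simply stopping.

```python
import itertools
import random


def filename_producer(filenames, num_epochs=None, shuffle=True, seed=None):
    """Yield file names epoch by epoch, like a pure-Python string_input_producer.

    num_epochs=None cycles forever; otherwise stop after num_epochs passes.
    """
    rng = random.Random(seed)
    epochs = itertools.count() if num_epochs is None else range(num_epochs)
    for _ in epochs:
        order = list(filenames)
        if shuffle:
            rng.shuffle(order)  # a fresh order within each epoch
        yield from order


print(list(filename_producer(["a.csv", "b.csv"], num_epochs=2, shuffle=False)))
# ['a.csv', 'b.csv', 'a.csv', 'b.csv']
```

Every epoch still contains each file exactly once; shuffling only changes the order within an epoch.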

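(2) File readers

Choose the reader that matches the file format:

tf.TextLineReader

  Reads text files such as comma-separated values (CSV), one line per read by default

  return: a reader instance

tf.FixedLengthRecordReader(record_bytes)

  Reads binary files in which every record is a fixed number of bytes

  record_bytes: integer, the number of bytes read per record

  return: a reader instance

tf.TFRecordReader reads TFRecords files

Note: all readers share the same read method: read(file_queue) reads one record from the queue and returns a tuple of tensors (key, value), where key identifies the file the record came from and value is the content (a line, or the record's bytes).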

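(3) File content decoders

What a reader returns is a string, so decoding functions are needed to parse those strings into tensors.

tf.decode_csv(records, record_defaults, field_delim=",", name=None) converts CSV records to tensors; used together with tf.TextLineReader

  records: string tensor; each string is one record (line) of the CSV file

  field_delim: field delimiter, "," by default

  record_defaults: determines the type of each resulting tensor and the default value used when a field is missing from the input string, for example [["None"], [4.0]]

tf.decode_raw(bytes, out_type, little_endian=True, name=None) reinterprets the bytes of a string tensor as a vector of numbers; used together with tf.FixedLengthRecordReader. Binary data is usually read as uint8.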
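What the two decoders do to a single record can be sketched in plain Python. The names decode_csv_line and decode_raw_uint8 are hypothetical; the sketch ignores CSV quoting and only covers the uint8 case of decode_raw. It shows the key idea of record_defaults: each single-element list gives a column's type (through the Python type of its element) and the value used when that field is empty.

```python
def decode_csv_line(record, record_defaults, field_delim=","):
    """Pure-Python stand-in for tf.decode_csv on a single record.

    record_defaults has one single-element list per column, e.g. [["None"], [4.0]]:
    the element's Python type is the column type, and its value fills empty fields.
    """
    fields = record.split(field_delim)
    if len(fields) != len(record_defaults):
        raise ValueError("expected %d fields, got %d" % (len(record_defaults), len(fields)))
    columns = []
    for raw, (default,) in zip(fields, record_defaults):
        columns.append(default if raw == "" else type(default)(raw))
    return columns


def decode_raw_uint8(value):
    """Pure-Python stand-in for tf.decode_raw(value, tf.uint8): one number per byte."""
    return list(value)


print(decode_csv_line("cat,", [["None"], [4.0]]))  # ['cat', 4.0]
print(decode_raw_uint8(b"\x07\xff"))               # [7, 255]
```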

(4) Starting the threads

tf.train.start_queue_runners(sess=None, coord=None) collects all queue runners in the graph and starts their threads

  sess: the session to run them in

  coord: thread coordinator

  return: a list of all started threads

(5) Batching at the read end of the pipeline

tf.train.batch(tensors, batch_size, num_threads=1, capacity=32, name=None) reads tensors in batches of the given size

  tensors: a list of tensors

  batch_size: batch size read from the queue

  num_threads: number of threads that enqueue

  capacity: integer, maximum number of elements in the queue

  return: the batched tensors

tf.train.shuffle_batch(tensors, batch_size, capacity, min_after_dequeue, num_threads=1) reads tensors in batches of the given size, in random order

  min_after_dequeue: minimum number of elements left in the queue after a dequeue, which keeps the output well shuffled
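The role of capacity and min_after_dequeue in shuffle_batch can be illustrated with a single-threaded buffer. This is a simplified sketch, not TensorFlow's implementation (which fills the queue from enqueue threads concurrently); shuffle_batches is a hypothetical name, and the requirement capacity >= min_after_dequeue + batch_size is this sketch's own rule. While input remains, the buffer is refilled to capacity and batch_size random elements are removed, so at least min_after_dequeue elements stay behind to mix with the next arrivals.

```python
import random


def shuffle_batches(samples, batch_size, capacity, min_after_dequeue, seed=None):
    """Single-threaded sketch of the idea behind tf.train.shuffle_batch."""
    if capacity < min_after_dequeue + batch_size:
        raise ValueError("capacity must be at least min_after_dequeue + batch_size")
    rng = random.Random(seed)
    source = iter(samples)
    buffer = []
    exhausted = False
    while True:
        # Refill the buffer up to capacity (the enqueue side of the queue)
        while not exhausted and len(buffer) < capacity:
            try:
                buffer.append(next(source))
            except StopIteration:
                exhausted = True
        if len(buffer) < batch_size:
            return  # not enough left for a full batch; the remainder is dropped
        batch = []
        for _ in range(batch_size):
            # Remove a random element: swap it to the end, then pop it
            i = rng.randrange(len(buffer))
            buffer[i], buffer[-1] = buffer[-1], buffer[i]
            batch.append(buffer.pop())
        yield batch


for batch in shuffle_batches(range(20), batch_size=5, capacity=15, min_after_dequeue=10, seed=0):
    print(batch)
```

A larger min_after_dequeue gives better mixing at the cost of more memory and a slower start, since that many elements must be buffered before the first batch.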

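```python
import os

import tensorflow as tf  # TensorFlow 1.x graph/session API


def csvread(filelist):
    """Read CSV files.
    :param filelist: list of file paths (directory + file name)
    :return: the batched contents
    """
    # 1. Build the file queue
    file_queue = tf.train.string_input_producer(filelist)

    # 2. Build a CSV reader that reads the queue one line at a time
    reader = tf.TextLineReader()
    key, value = reader.read(file_queue)

    # 3. Decode each line
    # record_defaults gives the type (and default value) of each column of a sample,
    # e.g. [["None"], [4.0]] for a string column followed by a float column
    records = [["None"], ["None"]]
    # The columns in these files are separated by spaces
    example, label = tf.decode_csv(value, record_defaults=records, field_delim=" ")

    # 4. To read several samples at once, batch them
    example_batch, label_batch = tf.train.batch(
        [example, label], batch_size=4, num_threads=1, capacity=4)
    print(example_batch, label_batch)
    return example_batch, label_batch


if __name__ == "__main__":
    # 1. Find the files and put their paths (directory + name) in a list
    file_name = os.listdir("./floder")
    filelist = [os.path.join("./floder", file) for file in file_name]

    # Build the batched tensors
    example_batch, label_batch = csvread(filelist)

    # Run the graph in a session
    with tf.Session() as sess:
        # Create a thread coordinator
        coord = tf.train.Coordinator()
        # Start the file-reading threads
        threads = tf.train.start_queue_runners(sess, coord=coord)
        # Print what was read
        print(sess.run([example_batch, label_batch]))
        # Stop and reclaim the child threads
        coord.request_stop()
        coord.join(threads)
```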

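3 Image Reading

3.1 Image Basics

The input to a machine learning algorithm is features plus targets. Every image is made of pixels, and reading an image means reading its pixel values for recognition.

Digitized images are either black-and-white or color, and are described by three quantities: the image height, the image width, and the number of color channels.

① Black-and-white images: one color channel; each pixel has a single value, the grayscale value, in [0, 255];

② Color images: three color channels, R, G and B, so each pixel is represented by three numbers. TensorFlow supports the JPG and PNG image formats and the RGB and RGBA color spaces. An image is represented as a tensor of shape [height, width, channels] matching its size. All pixels of an image are stored in a file on disk and must be loaded into memory.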

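3.2 Basic Image Operations

Operation: resize (shrink) the images so that all of them have the same number of features (pixel values).

Purpose: ① make the image data uniform; ② convert every image to a specified size; ③ reduce the amount of image data to avoid extra overhead

Data types: store as uint8 (saves space); compute (matrix operations) in float32 (better precision)

API:

tf.image.resize_images(images, size) resizes images

  images: 4-D tensor of shape [batch, height, width, channels] or 3-D tensor of shape [height, width, channels] holding the image data

  size: 1-D int32 tensor of two elements, new_height and new_width, the new size of the images

  return: images in 4-D or 3-D format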
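The storage-versus-computation trade-off is easy to quantify. A minimal sketch with Python's array module, assuming its "B" (unsigned char) and "f" (C float) type codes as stand-ins for uint8 and float32; image_nbytes and to_float32 are hypothetical helpers. For a 300 × 300 × 3 image (the size used in the later image example), float32 needs four times the memory of uint8.

```python
from array import array


def image_nbytes(height, width, channels, typecode):
    """Bytes needed for one height x width x channels image, one array item per value."""
    return height * width * channels * array(typecode).itemsize


def to_float32(pixels):
    """Cast uint8 pixel values to float32 for computation (values are kept, not rescaled)."""
    # Pass an iterator, not a bytes object: array() would reinterpret raw bytes as floats
    return array("f", (float(p) for p in pixels))


print(image_nbytes(300, 300, 3, "B"))  # uint8:   270000
print(image_nbytes(300, 300, 3, "f"))  # float32: 1080000 on platforms with 4-byte C floats
```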
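To show what resizing does to the pixel grid, here is a nearest-neighbour resize on a nested-list image. Note the swap: tf.image.resize_images uses bilinear interpolation by default, and nearest neighbour (tf.image.ResizeMethod.NEAREST_NEIGHBOR) is only one of its options; this sketch uses the simpler method, and its pixel-alignment convention may differ from TensorFlow's. resize_nearest is a hypothetical name.

```python
def resize_nearest(image, new_height, new_width):
    """Nearest-neighbour resize of an image stored as [height][width][channels] nested lists."""
    height, width = len(image), len(image[0])
    return [
        # Each output pixel copies the input pixel at the proportionally scaled position
        [list(image[y * height // new_height][x * width // new_width]) for x in range(new_width)]
        for y in range(new_height)
    ]


tiny = [[[1], [2]],
        [[3], [4]]]  # 2 x 2 image, 1 channel
for row in resize_nearest(tiny, 4, 4):
    print(row)
```

Whatever the method, the output always has exactly new_height × new_width × channels values, which is what gives every image the same number of features.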

3.3 Image Reading APIs

Image reader:

① tf.WholeFileReader: a reader that outputs the entire contents of a file as the value

  return: a reader instance

  read(file_queue): outputs a file name (key) and the contents of that file (value)

Image decoders:

① tf.image.decode_jpeg(contents) decodes a JPEG-encoded image into a uint8 tensor

  return: uint8 tensor of 3-D shape [height, width, channels]

② tf.image.decode_png(contents) decodes a PNG-encoded image into a uint8 or uint16 tensor

  return: tensor of 3-D shape [height, width, channels]
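Since each decoder expects one encoding, a folder that mixes formats needs the right decoder per file. A small sketch of one way to choose, assuming the decision is made from the file's leading signature bytes (JPEG files start with FF D8 FF, PNG files with the 8-byte PNG signature) rather than from its extension; pick_decoder is a hypothetical helper that only returns the decoder's name.

```python
JPEG_MAGIC = b"\xff\xd8\xff"       # JPEG start-of-image marker followed by another marker
PNG_MAGIC = b"\x89PNG\r\n\x1a\n"   # fixed 8-byte PNG signature


def pick_decoder(contents):
    """Return the name of the TensorFlow decoder that matches the file's signature bytes."""
    if contents.startswith(JPEG_MAGIC):
        return "tf.image.decode_jpeg"
    if contents.startswith(PNG_MAGIC):
        return "tf.image.decode_png"
    raise ValueError("not a JPEG or PNG file")


print(pick_decoder(b"\x89PNG\r\n\x1a\n" + b"\x00" * 8))  # tf.image.decode_png
```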

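3.4 Image Batching Workflow

(1) Build the image file queue

(2) Build the image reader

(3) Read the image data

(4) Process the image data (decode, resize, batch)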

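3.5 Example: Reading Images

```python
import os

import tensorflow as tf  # TensorFlow 1.x graph/session API


def pictureRead(filelist):
    """Read images and convert them to tensors.
    :param filelist: list of file paths (directory + file name)
    :return: a batch of image tensors
    """
    # 1. Build the file queue
    file_queue = tf.train.string_input_producer(filelist)

    # 2. Build a reader for the image contents (reads one whole image per call by default)
    reader = tf.WholeFileReader()
    key, value = reader.read(file_queue)
    print(value)

    # 3. Decode the image data; channels=3 forces 3-channel RGB output
    image = tf.image.decode_jpeg(value, channels=3)
    print(image)

    # 4. Resize every image to the same size
    image_resize = tf.image.resize_images(image, [300, 300])
    print(image_resize)

    # Note: the sample shape must be fixed to [300, 300, 3];
    # batching requires the shape of every tensor to be fully defined
    image_resize.set_shape([300, 300, 3])
    print(image_resize)

    # 5. Batch
    image_batch = tf.train.batch([image_resize], batch_size=50, num_threads=2, capacity=50)
    print(image_batch)
    return image_batch


if __name__ == "__main__":
    # 1. Find the files and put their paths (directory + name) in a list
    file_name = os.listdir("./cat")
    filelist = [os.path.join("./cat", file) for file in file_name]

    # Image tensors
    image_batch = pictureRead(filelist)

    # Run the graph in a session
    with tf.Session() as sess:
        # Create a thread coordinator
        coord = tf.train.Coordinator()
        # Start the file-reading threads
        threads = tf.train.start_queue_runners(sess, coord=coord)
        # Print what was read
        print(sess.run([image_batch]))
        # Stop and reclaim the child threads
        coord.request_stop()
        coord.join(threads)
```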

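4 Binary File Reading

4.1 Reading CIFAR-10 Binary Data

URL: http://www.cs.toronto.edu/~kriz/cifar.html

According to the dataset description, each sample is 1 byte (target value) + 3072 bytes (features, 32 × 32 × 3) = 3073 bytes.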
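Because every record has the same size, a CIFAR-10 .bin file is just 3073-byte records back to back. The pure-Python sketch below parses them, assuming the layout given on the CIFAR-10 page: one label byte, then 3072 image bytes stored channel by channel (1024 red, then 1024 green, then 1024 blue values, each plane in row-major order). That layout is also why a direct reshape of the 3072 bytes to [32, 32, 3] would mix up the channels. split_records and parse_record are hypothetical names.

```python
LABEL_BYTES = 1
HEIGHT, WIDTH, CHANNELS = 32, 32, 3
IMAGE_BYTES = HEIGHT * WIDTH * CHANNELS   # 3072
RECORD_BYTES = LABEL_BYTES + IMAGE_BYTES  # 3073


def split_records(data):
    """Cut the contents of a CIFAR-10 .bin file into fixed-length records."""
    if len(data) % RECORD_BYTES:
        raise ValueError("file size is not a multiple of %d bytes" % RECORD_BYTES)
    return [data[i:i + RECORD_BYTES] for i in range(0, len(data), RECORD_BYTES)]


def parse_record(record):
    """Return (label, image) with image as nested lists of shape [32][32][3]."""
    label = record[0]
    pixels = record[LABEL_BYTES:]
    plane = HEIGHT * WIDTH  # 1024 values per color channel
    image = [[[pixels[c * plane + y * WIDTH + x] for c in range(CHANNELS)]
              for x in range(WIDTH)]
             for y in range(HEIGHT)]
    return label, image


# A synthetic record: label 7, red = 10, green = 20, blue = 30 everywhere
record = bytes([7]) + bytes([10]) * 1024 + bytes([20]) * 1024 + bytes([30]) * 1024
label, image = parse_record(record)
print(label, image[0][0])  # 7 [10, 20, 30]
```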

```python
import os

import tensorflow as tf  # TensorFlow 1.x graph/session API


def binaryRead(filelist):
    # Properties of the images to read
    height, width, channel = 32, 32, 3
    # Bytes per image in the binary files
    label_bytes = 1
    image_bytes = height * width * channel
    record_bytes = label_bytes + image_bytes  # 3073

    # 1. Build the file queue
    file_queue = tf.train.string_input_producer(filelist)

    # 2. Build a fixed-length reader; each record is one sample of record_bytes bytes
    reader = tf.FixedLengthRecordReader(record_bytes)
    key, value = reader.read(file_queue)

    # 3. Decode the binary contents
    label_image = tf.decode_raw(value, tf.uint8)
    print(label_image)

    # 4. Split off the label (target) and the image (features)
    label = tf.cast(tf.slice(label_image, [0], [label_bytes]), tf.int32)
    image = tf.slice(label_image, [label_bytes], [image_bytes])

    # 5. Reshape the image features. CIFAR-10 stores each image channel by channel
    #    (1024 red, 1024 green, 1024 blue bytes): [3072] -> [3, 32, 32] -> [32, 32, 3]
    image_reshape = tf.transpose(tf.reshape(image, [channel, height, width]), [1, 2, 0])
    print(label, image_reshape)

    # 6. Batch
    image_batch, label_batch = tf.train.batch(
        [image_reshape, label], batch_size=20, num_threads=2, capacity=20)
    print(image_batch, label_batch)
    return image_batch, label_batch


if __name__ == "__main__":
    # 1. Find the .bin files and put their paths (directory + name) in a list
    file_name = os.listdir("./data/cifar-10-batches-bin")
    filelist = [os.path.join("./data/cifar-10-batches-bin", file)
                for file in file_name if file[-3:] == "bin"]

    # Tensors read from the binary files
    image_batch, label_batch = binaryRead(filelist)

    # Run the graph in a session
    with tf.Session() as sess:
        # Create a thread coordinator
        coord = tf.train.Coordinator()
        # Start the file-reading threads
        threads = tf.train.start_queue_runners(sess, coord=coord)
        # Print what was read
        print(sess.run([image_batch, label_batch]))
        # Stop and reclaim the child threads
        coord.request_stop()
        coord.join(threads)
```

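5 TFRecords: Analysis and Storage

5.1 Overview

TFRecords is a file format built into TensorFlow. It is a binary format that makes better use of memory and is easier to copy and move. From a machine learning point of view, a sample consists of features and a target; TFRecords stores the binary data and the labels (training class labels) in the same file.

File extension: *.tfrecords. File contents: Example protocol buffers (a dictionary-like format), one per sample.

Advantage: features and targets are stored together; to read them back you only specify each key and the type of its value.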

5.2 Writing TFRecords

(1) Create a TFRecord writer

tf.python_io.TFRecordWriter(path) writes a tfrecords file

  path: path of the TFRecords file

  return: a file writer

Methods:

  write(record): writes one string record (that is, one Example) to the file

  close(): closes the writer

Note: the string must be a serialized Example, obtained with Example.SerializeToString()
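Conceptually, a TFRecords file is a sequence of byte-string records written one after another, each one a serialized Example. The sketch below shows that idea with length-prefixed records in plain Python. It is a simplification: the real TFRecord layout also stores a masked CRC-32C checksum after the length and after the data, which this sketch omits, so its files cannot be read by tf.TFRecordReader. write_records and read_records are hypothetical names.

```python
import struct


def write_records(path, records):
    """Write each byte string as one length-prefixed record."""
    with open(path, "wb") as f:
        for data in records:
            f.write(struct.pack("<Q", len(data)))  # 8-byte little-endian length
            f.write(data)


def read_records(path):
    """Yield the byte strings written by write_records, in order."""
    with open(path, "rb") as f:
        while True:
            header = f.read(8)
            if not header:
                return  # clean end of file
            if len(header) < 8:
                raise ValueError("truncated record header")
            (length,) = struct.unpack("<Q", header)
            data = f.read(length)
            if len(data) < length:
                raise ValueError("truncated record")
            yield data
```

For example, write_records("demo.records", [b"first", b"second"]) followed by list(read_records("demo.records")) gives back [b"first", b"second"]. The length prefix is what lets a reader find record boundaries in arbitrary binary data.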

(2) Build an Example protocol buffer for each sample

tf.train.Example(features=None)

  the protocol buffer that is written to the tfrecords file

  features: an instance of tf.train.Features

  return: an Example protocol buffer

tf.train.Features(feature=None) builds the key-value pairs of one sample

  feature: dict whose keys are the names to store and whose values are tf.train.Feature instances

  return: a Features instance

tf.train.Feature(**options)

  **options: for example bytes_list=tf.train.BytesList(value=[Bytes]) or int64_list=tf.train.Int64List(value=[Value])

The three value-list types:

tf.train.Int64List(value=[Value])

tf.train.BytesList(value=[Bytes])

tf.train.FloatList(value=[value])
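The three nesting levels (Example, Features, Feature) are easy to mix up. This sketch builds plain Python dicts that mirror that nesting and the field names bytes_list, int64_list and float_list; they are not real protocol buffer objects, and make_feature and make_example are hypothetical helpers. The point it shows is that the Python type of each value decides which list type wraps it.

```python
def make_feature(value):
    """Wrap one value the way tf.train.Feature does: choose the list type from the Python type."""
    if isinstance(value, bytes):
        return {"bytes_list": {"value": [value]}}
    if isinstance(value, int):
        return {"int64_list": {"value": [value]}}
    if isinstance(value, float):
        return {"float_list": {"value": [value]}}
    raise TypeError("unsupported feature value: %r" % (value,))


def make_example(**features):
    """Nested dicts mirroring Example(features=Features(feature={name: Feature(...)}))."""
    return {"features": {"feature": {name: make_feature(v) for name, v in features.items()}}}


print(make_example(image=b"\x01\x02", label=3))
# {'features': {'feature': {'image': {'bytes_list': {'value': [b'\x01\x02']}},
#                           'label': {'int64_list': {'value': [3]}}}}}
```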

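5.3 Reading TFRecords

The workflow is the same as for the other file readers, with an extra parsing step in the middle.

Parsing the Example protocol buffers stored in TFRecords:

tf.parse_single_example(serialized, features=None, name=None)

  parses a single Example proto

  serialized: scalar string Tensor, one serialized Example

  features: dict whose keys are the names to read and whose values are FixedLenFeature instances

  return: a dict of key-value pairs, keyed by the names read

tf.FixedLenFeature(shape, dtype)

  shape: shape of the input data; usually left unspecified as an empty list [], meaning one scalar value

  dtype: type of the input data; it must match the type stored in the file and can only be float32, int64 or string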
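The two parsing rules (the requested dtype must match the stored list type, and shape [] means exactly one value per key) can be illustrated on Example-shaped dicts. This is not how TensorFlow parses protocol buffers; KIND_FOR_DTYPE and parse_example are hypothetical, and the dtype strings stand in for tf.string, tf.int64 and tf.float32.

```python
KIND_FOR_DTYPE = {"string": "bytes_list", "int64": "int64_list", "float32": "float_list"}


def parse_example(example, features):
    """Look up each requested key in an Example-shaped dict and check its type.

    `features` maps a name to "string", "int64" or "float32", standing in for
    {name: tf.FixedLenFeature([], dtype)} with a scalar shape.
    """
    stored = example["features"]["feature"]
    parsed = {}
    for name, dtype in features.items():
        kind = KIND_FOR_DTYPE[dtype]
        if kind not in stored[name]:
            raise TypeError("feature %r was not stored as %s" % (name, dtype))
        values = stored[name][kind]["value"]
        if len(values) != 1:
            raise ValueError("shape [] expects exactly one value for %r" % name)
        parsed[name] = values[0]
    return parsed


example = {"features": {"feature": {
    "image": {"bytes_list": {"value": [b"\x01\x02"]}},
    "label": {"int64_list": {"value": [3]}},
}}}
print(parse_example(example, {"image": "string", "label": "int64"}))
# {'image': b'\x01\x02', 'label': 3}
```

As in the real reader, the "string" result is still raw bytes and needs decoding (tf.decode_raw in TensorFlow), while int64 and float32 values come back ready to use.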

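5.4 Example

Workflow for storing CIFAR-10 batches in tfrecords:

(1) Build the writer

(2) Build an Example for each sample

(3) Write the serialized Example

Workflow for reading tfrecords:

(1) Build the file queue

(2) Build the TFRecords reader

(3) Parse the Example

(4) Convert the format: decode the bytes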

```python
import os

import tensorflow as tf  # TensorFlow 1.x graph/session API

# Command-line flags for the data paths
FLAGS = tf.app.flags.FLAGS
tf.app.flags.DEFINE_string("data_dir", "./data/cifar-10-batches-bin", "directory of the CIFAR-10 binary files")
tf.app.flags.DEFINE_string("data_tfrecords", "./tmp/dataTFR.tfrecords", "tfrecords file to write to")


class TFRRead(object):
    """Read the binary files, write them to tfrecords, and read the tfrecords back."""

    def __init__(self, filelist):
        # List of files
        self.file_list = filelist
        # Properties of the images
        self.height = 32
        self.width = 32
        self.channel = 3
        # Bytes per image in the binary files
        self.label_bytes = 1
        self.image_bytes = self.height * self.width * self.channel
        self.bytes = self.label_bytes + self.image_bytes

    def read_and_decode(self):
        # 1. Build the file queue
        file_queue = tf.train.string_input_producer(self.file_list)
        # 2. Fixed-length reader: each record is one sample of self.bytes bytes
        reader = tf.FixedLengthRecordReader(self.bytes)
        key, value = reader.read(file_queue)
        # 3. Decode the binary contents
        label_image = tf.decode_raw(value, tf.uint8)
        # 4. Split off the label (target) and the image (features)
        label = tf.cast(tf.slice(label_image, [0], [self.label_bytes]), tf.int32)
        image = tf.slice(label_image, [self.label_bytes], [self.image_bytes])
        # 5. [3072] -> [3, 32, 32] (CIFAR-10 stores channel by channel) -> [32, 32, 3]
        image_reshape = tf.transpose(
            tf.reshape(image, [self.channel, self.height, self.width]), [1, 2, 0])
        # 6. Batch
        image_batch, label_batch = tf.train.batch(
            [image_reshape, label], batch_size=20, num_threads=1, capacity=20)
        return image_batch, label_batch

    def write_to_tfrecords(self, image_batch, label_batch):
        """Store the features and targets of one batch in tfrecords.
        :param image_batch: features of 20 images
        :param label_batch: targets of 20 images
        :return: None
        """
        # Evaluate both tensors in ONE run so images and labels come from the same batch
        # (calling .eval() on each element separately would dequeue a new batch each time)
        images, labels = tf.get_default_session().run([image_batch, label_batch])
        # 1. Create the TFRecord writer
        writer = tf.python_io.TFRecordWriter(FLAGS.data_tfrecords)
        # 2. Write every sample; each image needs its own Example
        for i in range(len(images)):
            # uint8 array -> byte string; labels have shape [20, 1]
            image = images[i].tobytes()
            label = int(labels[i][0])
            # Build the Example for one sample
            example = tf.train.Example(features=tf.train.Features(feature={
                "image": tf.train.Feature(bytes_list=tf.train.BytesList(value=[image])),
                "label": tf.train.Feature(int64_list=tf.train.Int64List(value=[label])),
            }))
            # Write one sample; the record must be a serialized Example
            writer.write(example.SerializeToString())
        # Close the writer
        writer.close()
        return None

    def read_from_tfrecords(self):
        # 1. Build the file queue
        file_queue = tf.train.string_input_producer([FLAGS.data_tfrecords])
        # 2. TFRecord reader; value is one serialized Example
        reader = tf.TFRecordReader()
        key, value = reader.read(file_queue)
        # 3. Parse the Example
        features = tf.parse_single_example(value, features={
            "image": tf.FixedLenFeature([], tf.string),
            "label": tf.FixedLenFeature([], tf.int64),
        })
        # 4. Decode: string features need decoding, int64 and float32 do not
        image = tf.decode_raw(features["image"], tf.uint8)
        # Fix the image shape so it can be batched
        image_reshape = tf.reshape(image, [self.height, self.width, self.channel])
        label = tf.cast(features["label"], tf.int32)
        print(image_reshape, label)
        # Batch
        image_batch, label_batch = tf.train.batch(
            [image_reshape, label], batch_size=20, num_threads=1, capacity=20)
        return image_batch, label_batch


if __name__ == "__main__":
    # 1. Find the .bin files and put their paths (directory + name) in a list
    file_name = os.listdir(FLAGS.data_dir)
    filelist = [os.path.join(FLAGS.data_dir, file) for file in file_name if file[-3:] == "bin"]
    cf = TFRRead(filelist)
    # image_batch, label_batch = cf.read_and_decode()
    image_batch, label_batch = cf.read_from_tfrecords()
    # Run the graph in a session
    with tf.Session() as sess:
        # Create a thread coordinator
        coord = tf.train.Coordinator()
        # Start the file-reading threads
        threads = tf.train.start_queue_runners(sess, coord=coord)
        # Store to tfrecords (use together with read_and_decode above)
        # print("start writing")
        # cf.write_to_tfrecords(image_batch, label_batch)
        # print("done writing")
        # Print what was read
        print(sess.run([image_batch, label_batch]))
        # Stop and reclaim the child threads
        coord.request_stop()
        coord.join(threads)
```
