
Making an OFRecord Dataset from Image Files

Published: 2023/11/28 · Life Experience · 18 · 豆豆

In "The OFRecord Data Format" and "Loading and Preparing OFRecord Datasets", we covered the OFRecord data format and how to convert other datasets to OFRecord and use them.
This article shows how to turn image files into an OFRecord dataset and provides the scripts involved, which you can use as-is or adapt. It covers:
• Building an OFRecord dataset from the MNIST handwritten-digit images
• How OFRecord data is encoded and decoded
• Training on the self-made OFRecord dataset
Making OFRecord Files from Image Files
We will use image files from the MNIST dataset to build an OFRecord-format file.
As an example, only 50 images are used. The scripts and the dataset can be downloaded from img2ofrecord.
• Download the archive and unzip it
$ wget https://oneflow-static.oss-cn-beijing.aliyuncs.com/oneflow-tutorial-attachments/img2ofrecord.zip
$ unzip img2ofrecord.zip
? 進入到對應(yīng)目錄，并運行 OFRecord 制作腳本 img2ofrecord.py
$ cd ./img_to_ofrecord
$ python img2ofrecord.py --part_num=5 --save_dir=./dataset/ --img_format=.png --image_root=./images/train_set/
• While it runs, the script prints the following
The image root is: ./images/train_set/
The amount of OFRecord data part is: 5
The directory of Labels is: ./images/train_label/label.txt
The image format is: .png
The OFRecord save directory is: ./dataset/
Start Processing…
./images/train_set/00000030_3.png feature saved
./images/train_set/00000034_0.png feature saved
./images/train_set/00000026_4.png feature saved
./images/train_set/00000043_9.png feature saved
…
Process image successfully !!!
The OFRecord files are now complete and saved in the ./dataset directory.
Code walkthrough
The code directory is laid out as follows:
img_to_ofrecord
├── images
│   ├── train_set
│   │   ├── 00000000_5.png
│   │   ├── 00000001_0.png
│   │   ├── 00000002_4.png
│   │   …
│   └── train_label
│       └── label.txt
├── img2ofrecord.py
└── lenet_train.py
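• The images directory holds the raw sample training set and its label file
The label file (train_label/label.txt) is stored as JSON, one object per line, in the following format: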
{"00000030_3.png": 3}
{"00000034_0.png": 0}
{"00000026_4.png": 4}
{"00000043_9.png": 9}
{"00000047_5.png": 5}
{"00000003_1.png": 1}
…
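• The img2ofrecord.py script converts the MNIST images into an OFRecord dataset
• The lenet_train.py script reads the generated OFRecord dataset and trains a LeNet model on it.
The command-line options of img2ofrecord.py are:
• image_root: the root directory of the images
• part_num: the number of OFRecord files to generate; an error is raised if this exceeds the total number of images
• label_dir: the path to the labels (the run above, which did not pass this option, used ./images/train_label/label.txt)
• img_format: the image file format
• save_dir: the directory where the OFRecord files are saved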
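Because label.txt holds one JSON object per line rather than a single JSON document, it has to be parsed line by line. The sketch below shows one way to load it into a filename-to-label mapping; `load_labels` is a hypothetical helper, not code taken from img2ofrecord.py, and it assumes every non-blank line is a valid JSON object such as {"00000030_3.png": 3}.

```python
import json
import os
import tempfile
from typing import Dict


def load_labels(label_path: str) -> Dict[str, int]:
    """Read a label file holding one JSON object per line (hypothetical helper)."""
    labels: Dict[str, int] = {}
    with open(label_path, "r", encoding="utf-8") as f:
        for line in f:
            line = line.strip()
            if not line:
                continue  # tolerate blank lines
            labels.update(json.loads(line))  # merge {filename: label} into the mapping
    return labels


if __name__ == "__main__":
    # Write a tiny file in the same layout as label.txt, then read it back.
    sample = '{"00000030_3.png": 3}\n{"00000034_0.png": 0}\n'
    with tempfile.NamedTemporaryFile("w", suffix=".txt", delete=False) as tmp:
        tmp.write(sample)
    print(load_labels(tmp.name))  # {'00000030_3.png': 3, '00000034_0.png': 0}
    os.remove(tmp.name)
```

A plain dict keyed by filename makes it cheap to look up the label for each image while walking image_root.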
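To make the role of part_num concrete, here is a sketch of how these options could be parsed and how the image list could be spread over part_num output files. The option names come from the list above and the defaults are copied from the example command and log, so the real script's defaults may differ; `split_into_parts` and its even-split strategy are hypothetical, not the script's actual logic.

```python
import argparse
from typing import List


def parse_args(argv=None) -> argparse.Namespace:
    # Option names mirror img2ofrecord.py; defaults are assumptions for illustration.
    parser = argparse.ArgumentParser(description="Convert images to OFRecord files")
    parser.add_argument("--image_root", default="./images/train_set/")
    parser.add_argument("--part_num", type=int, default=5)
    parser.add_argument("--label_dir", default="./images/train_label/label.txt")
    parser.add_argument("--img_format", default=".png")
    parser.add_argument("--save_dir", default="./dataset/")
    return parser.parse_args(argv)


def split_into_parts(files: List[str], part_num: int) -> List[List[str]]:
    """Distribute files over part_num chunks as evenly as possible (hypothetical helper)."""
    if part_num < 1 or part_num > len(files):
        # mirrors the documented behaviour: more parts than images is an error
        raise ValueError(f"part_num={part_num} must be between 1 and {len(files)}")
    base, extra = divmod(len(files), part_num)
    parts, start = [], 0
    for i in range(part_num):
        size = base + (1 if i < extra else 0)  # the first `extra` parts get one more file
        parts.append(files[start:start + size])
        start += size
    return parts


if __name__ == "__main__":
    args = parse_args(["--part_num=5"])
    images = [f"{i:08d}.png" for i in range(50)]
    print([len(p) for p in split_into_parts(images, args.part_num)])  # [10, 10, 10, 10, 10]
```

With the 50 sample images and part_num=5, each output file would receive 10 images under this scheme.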
How the script encodes records
The OFRecord encoding logic also lives in img2ofrecord.py. It works as follows.
First, the image that was read in is encoded:
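import cv2

def encode_img_file(filename, ext=".jpg"):
    img = cv2.imread(filename)  # decode the image file into a BGR ndarray
    # re-encode in the target format; imencode returns (success, buffer)
    encoded_data = cv2.imencode(ext, img)[1]
    return encoded_data.tobytes()  # raw bytes (tostring() is deprecated in NumPy)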
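Here ext is the target encoding format. OneFlow currently supports the same image encoding and decoding formats as OpenCV (see cv::ImwriteFlags), including:
• JPEG, the most common lossy format; see JPEG
• PNG, a common lossless bitmap format; see Portable Network Graphics
• TIFF, an extensible format that supports compression; see Tagged Image File Format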
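One quick way to confirm that the bytes produced for a given ext really are in that format is to inspect their leading magic bytes. The sketch below uses the well-known file signatures of the three formats listed; `sniff_image_format` is a hypothetical helper, not part of OneFlow or img2ofrecord.py.

```python
def sniff_image_format(data: bytes) -> str:
    """Guess the format of an encoded image from its leading magic bytes."""
    if data.startswith(b"\xff\xd8\xff"):
        return "JPEG"
    if data.startswith(b"\x89PNG\r\n\x1a\n"):
        return "PNG"
    if data.startswith(b"II*\x00") or data.startswith(b"MM\x00*"):
        return "TIFF"  # little-endian ("II") or big-endian ("MM") byte order
    return "unknown"
```

For example, passing the output of encode_img_file(path, ext=".png") should yield "PNG" whenever OpenCV encoded the image successfully.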
然后，轉(zhuǎn)化成 Feature 的形式，進行序列化，并將數(shù)據(jù)長度寫入到文件中
def ndarray2ofrecords(dsfile, dataname, encoded_data, labelname, encoded_label):
topack = {dataname: bytes_feature(encoded_data),
labelname: int32_feature(encoded_label)}
ofrecord_features = ofrecord.OFRecord(feature=topack)
serilizedBytes = ofrecord_features.SerializeToString()
length = ofrecord_features.ByteSize()
dsfile.write(struct.pack(“q”, length))
dsfile.write(serilizedBytes)
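The file layout this produces is a simple stream of length-prefixed records: an 8-byte "q" header followed by that many payload bytes, repeated. The sketch below writes and reads that framing with the standard library only, using plain byte strings as stand-in payloads; `write_record` and `read_records` are hypothetical helpers, not OneFlow APIs. In a real file each payload would be the output of SerializeToString() and, assuming the same record_pb2 module is importable, could be parsed back with the protobuf ParseFromString method.

```python
import io
import struct
from typing import Iterator

HEADER = struct.Struct("q")  # same "q" (int64) length header as ndarray2ofrecords


def write_record(f, payload: bytes) -> None:
    f.write(HEADER.pack(len(payload)))  # length header first
    f.write(payload)                    # then the serialized record itself


def read_records(f) -> Iterator[bytes]:
    """Yield each payload from a stream of length-prefixed records."""
    while True:
        header = f.read(HEADER.size)
        if not header:
            return  # clean end of stream
        if len(header) < HEADER.size:
            raise EOFError("truncated length header")
        (length,) = HEADER.unpack(header)
        payload = f.read(length)
        if len(payload) < length:
            raise EOFError("truncated record payload")
        yield payload


if __name__ == "__main__":
    buf = io.BytesIO()
    for p in (b"first", b"second record"):
        write_record(buf, p)
    buf.seek(0)
    print(list(read_records(buf)))  # [b'first', b'second record']
```

Reading the header before each payload is what lets a reader walk the file without any separator bytes between records.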
Training on the self-made OFRecord dataset
Run lenet_train.py in the same directory. It reads the OFRecord dataset you just created and trains a LeNet model on it.
The training script prints the following:
[6.778578]
[2.0212684]
[1.3814741]
[0.47514156]
[0.13277876]
[0.16388433]
[0.03788032]
[0.01225162]
…
This completes the full pipeline: building the dataset, loading it, and training on it.