
OpenVINO Series 15: OpenVINO OCR

This article explains how to use the OpenVINO OCR models for text detection and text recognition. Overall, after trying it out, the OCR module that OpenVINO provides is only average: it can only recognize digits and Latin letters, special characters hurt recognition accuracy, and it is also sensitive to the angle and resolution of the text.

  • Text detection model: horizontal-text-detection-0001.
  • Text recognition model: text-recognition-0014.

Environment:

  • Runtime environment for this example: Windows 10, 10th-gen Intel i5 laptop
  • IDE: VSCode
  • OpenVINO version: 2022.1
  • Code link: 11-OCR

Contents

  • OpenVINO Series 15: OpenVINO OCR
    • 1. About the models
      • 1.1 Pre-trained text detection models
      • 1.2 FCOS recap
      • 1.3 PixelLink recap
      • 1.4 Pre-trained text recognition models
      • 1.5 Final choice
    • 2. Code
      • 2.1 Downloading the models
      • 2.2 Text detection model
      • 2.3 Text recognition model
    • 3. Results


1. About the models

The OpenVINO Model Zoo provides many pre-trained models.

1.1 Pre-trained text detection models

For text detection, the Model Zoo provides the following models:

  • horizontal-text-detection-0001
  • text-detection-0003
  • text-detection-0004

horizontal-text-detection-0001
  • Description: based on FCOS architecture with a MobileNetV2-like backbone
  • Input: [1,3,704,704], i.e. [B,C,H,W]
  • Output "boxes": [N,5], where N is the number of detected bounding boxes; each box has the format [x_min, y_min, x_max, y_max, conf]
  • Output "labels": [N], where N is the number of detected bounding boxes; for text detection, every value equals 0

text-detection-0003
  • Description: based on PixelLink architecture with a MobileNetV2-like backbone
  • Input: [1,768,1280,3], i.e. [B,H,W,C]
  • Output "model/link_logits_/add": [1,192,320,16], logits related to linkage between pixels and their neighbors
  • Output "model/segm_logits/add": [1,192,320,2], logits related to text/no-text classification for each pixel

text-detection-0004
  • Description: based on PixelLink architecture with MobileNetV2 (depth_multiplier=1.4) as a backbone
  • Input: [1,768,1280,3], i.e. [B,H,W,C]
  • Output "model/link_logits_/add": [1,192,320,16], logits related to linkage between pixels and their neighbors
  • Output "model/segm_logits/add": [1,192,320,2], logits related to text/no-text classification for each pixel

B - batch size; H - image height; W - image width; C - number of channels.

1.2 FCOS recap

The horizontal-text-detection-0001 model is trained with FCOS, so here is a brief recap of FCOS (Fully Convolutional One-Stage Object Detection).

FCOS is an end-to-end, anchor-free, one-stage object detection algorithm. Its network structure (shown in the figure below) consists of three parts:

  • a backbone network;
  • a feature pyramid (FPN) structure;
  • the output heads (classification / regression / center-ness).

Following FPN, objects of different sizes are detected on feature maps from different levels. Specifically, five feature maps are extracted, denoted $\{P_3, P_4, P_5, P_6, P_7\}$. $P_3$, $P_4$ and $P_5$ are produced from the backbone feature maps $C_3$, $C_4$, $C_5$ through 1x1 lateral-connection convolutions, while $P_6$ and $P_7$ are obtained by applying a stride-2 convolution to $P_5$ and $P_6$, respectively. The resulting $P_3, P_4, P_5, P_6, P_7$ therefore correspond to strides 8, 16, 32, 64 and 128.

The head on the right is the key part of FCOS. Each feature level is split into two branches: the upper branch performs classification and the lower branch regresses the bounding-box location. The classification branch also carries a center-ness branch that predicts how close a location is to the object center. Unlike the traditional center-plus-width/height or corner-coordinate formats, FCOS predicts the box location from the center location and a 4D vector (l, t, r, b), the distances from that location to the left, top, right and bottom sides of the box.
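
For reference, the center-ness target that FCOS attaches to a location with regression targets (l, t, r, b) is

$$\text{centerness} = \sqrt{\frac{\min(l, r)}{\max(l, r)} \cdot \frac{\min(t, b)}{\max(t, b)}},$$

so locations far from an object's center produce values close to 0, which down-weights their low-quality boxes during non-maximum suppression.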

Finally, note that in FCOS any feature-map location that falls inside a ground-truth bounding box is treated as a positive sample, so the number of positive samples available for training is very large.

We will not go into the cost function here; the goal was only to recap the overall logic of the FCOS algorithm.

1.3 PixelLink recap

The algorithm behind text-detection-0003 and text-detection-0004 is based on PixelLink: Detecting Scene Text via Instance Segmentation. Here is a brief recap of PixelLink.

A typical deep-learning text detection model decides whether a region is text and then outputs the position and angle of the text box, as shown in the figure below:

Although the FCOS model from the previous section is not a dedicated text detector, the overall logic is similar: the network ends with one regression output and one classification output.

PixelLink has two main components: pixels and links. It is a CNN that, for every pixel, predicts a text/non-text classification and predicts whether a link exists towards each of the pixel's eight neighbouring directions (the eight heat maps inside the dashed box in the figure above, one per direction).

The backbone of PixelLink is VGG16 used as a feature extractor, with the final fully connected layers fc6 and fc7 replaced by convolutions. Feature fusion and pixel prediction follow the FPN (feature pyramid network) idea: the feature-map size halves from stage to stage while the number of kernels doubles. The model has two separate heads, one for text/non-text prediction and one for link prediction; both use softmax and output 1x2=2 channels (text/non-text classification) and 8x2=16 channels (link/no-link classification for the eight neighbouring directions), respectively.
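
The Model Zoo demos for these two models turn the raw logits into masks and then group linked text pixels into instances. A minimal NumPy sketch of that idea (the thresholds and the grouping step are illustrative assumptions, not the official post-processing code):

import numpy as np

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def pixellink_masks(segm_logits, link_logits, pixel_thr=0.8, link_thr=0.8):
    """segm_logits: [1, H, W, 2]; link_logits: [1, H, W, 16] (8 directions x 2 classes)."""
    # Per-pixel text probability: softmax over the last axis, keep the "text" channel.
    text_prob = softmax(segm_logits[0], axis=-1)[..., 1]                      # [H, W]
    text_mask = text_prob > pixel_thr

    # Per-pixel link probability for each of the 8 neighbouring directions.
    h, w, _ = link_logits[0].shape
    link_prob = softmax(link_logits[0].reshape(h, w, 8, 2), axis=-1)[..., 1]  # [H, W, 8]
    link_mask = (link_prob > link_thr) & text_mask[..., None]

    # Text pixels connected through positive links would then be grouped into
    # instances (e.g. connected components / union-find) and each group turned
    # into a rotated box with cv2.minAreaRect.
    return text_mask, link_mask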

1.4 Pre-trained text recognition models

For text recognition, the Model Zoo provides the following models:

  • text-recognition-0012
  • text-recognition-0014
  • text-recognition-resnet-fc

text-recognition-0012
  • Description: VGG16-like backbone and bidirectional LSTM encoder-decoder
  • Accuracy on the ICDAR13 dataset: 88.18%
  • Input: [1,32,120,1], i.e. [B,H,W,C]
  • Note: the source image should be a tight, aligned crop of the detected text, converted to grayscale
  • Output: [30,1,37], i.e. [W,B,L]; symbol order: 0123456789abcdefghijklmnopqrstuvwxyz#

text-recognition-0014
  • Description: ResNext101-like backbone (stage-1-2) and bidirectional LSTM encoder-decoder
  • Accuracy on the ICDAR13 dataset: 88.87%
  • Input: [1,1,32,128], i.e. [B,C,H,W]
  • Note: the source image should be a tight, aligned crop of the detected text, converted to grayscale
  • Output: [16,1,37], i.e. [W,B,L]; symbol order: #0123456789abcdefghijklmnopqrstuvwxyz

text-recognition-resnet-fc
  • Description: model based on ResNet with a fully connected text recognition head
  • Accuracy on the ICDAR13 dataset: 92.96%
  • Input: [1,1,32,100], i.e. [B,C,H,W]
  • Note: the source image should be a tight, aligned crop of the detected text, converted to grayscale; mean values: [127.5, 127.5, 127.5], scale factor for each channel: 127.5
  • Output: [1,26,37], i.e. [B,W,L]; symbol order: [s]0123456789abcdefghijklmnopqrstuvwxyz

B - batch size; H - image height; W - image width; C - number of channels. In the output shapes, W is the output sequence length and L is the confidence distribution across alphanumeric symbols.
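
To illustrate how the [16,1,37] output of text-recognition-0014 can be decoded, here is a minimal greedy-decoding sketch. It assumes the first symbol ('#') acts as the blank/padding symbol; note that the full example in section 2.3 uses a slightly different lookup string plus an extra digit correction:

import numpy as np

# Symbol set documented for text-recognition-0014 (index 0 is the blank symbol).
SYMBOLS = "#0123456789abcdefghijklmnopqrstuvwxyz"

def greedy_decode(output, symbols=SYMBOLS):
    """output: model result of shape [W, B, L] = [16, 1, 37]; returns the decoded string."""
    probs = np.squeeze(output)                             # -> [16, 37]
    indices = probs.argmax(axis=-1)                        # best symbol index per time step
    return "".join(symbols[i] for i in indices if i != 0)  # drop blanks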

1.5 Final choice

In the end we choose:

  • Text detection model: horizontal-text-detection-0001.
  • Text recognition model: text-recognition-0014.

2. Code

2.1 Downloading the models

First, as with the other models, we download the models.

import shutil
import sys
from pathlib import Path

import cv2
import matplotlib.pyplot as plt
import numpy as np
from IPython.display import Markdown, display
from openvino.runtime import Core
from PIL import Image, ImageOps
from yaspin import yaspin

ie = Core()
model_dir = Path("model")
precision = "FP16"
detection_model = "horizontal-text-detection-0001"
recognition_model = "text-recognition-0014"
# base_model_dir = Path("~/open_model_zoo_models").expanduser()
base_model_dir = Path("./model/open_model_zoo_models").expanduser()
# omz_cache_dir = Path("~/open_model_zoo_cache").expanduser()
omz_cache_dir = Path("./model/open_model_zoo_cache").expanduser()
model_dir.mkdir(exist_ok=True)

'''
Download the models
'''
print("1 - Download text detection model: horizontal-text-detection-0001, and text recognition model: text-recognition-0014 from Open Model Zoo. Both models are already in IR format.")
ir_path_detection_model = Path(f"{base_model_dir}/intel/{detection_model}/{precision}/{detection_model}.xml")
ir_path_recognition_model = Path(f"{base_model_dir}/intel/{recognition_model}/{precision}/{recognition_model}.xml")

# Download only if the two IR files are not both already present.
if not (ir_path_detection_model.exists() and ir_path_recognition_model.exists()):
    download_command = f"omz_downloader " \
                       f"--name {detection_model},{recognition_model} " \
                       f"--output_dir {base_model_dir} " \
                       f"--cache_dir {omz_cache_dir} " \
                       f"--precision {precision}"
    display(Markdown(f"Download command: `{download_command}`"))
    with yaspin(text=f"Downloading {detection_model}, {recognition_model}") as sp:
        download_result = !$download_command
        print(download_result)
        sp.text = f"Finished downloading {detection_model}, {recognition_model}"
        sp.ok("✔")
else:
    print("IR model already exists.")
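
The `!$download_command` line relies on IPython/Jupyter shell magic, so it only works inside a notebook. If you run this as a plain Python script, a hypothetical equivalent using subprocess would be:

import subprocess

# Run omz_downloader as an external process instead of IPython's "!" magic.
download_result = subprocess.run(
    download_command, shell=True, capture_output=True, text=True
)
print(download_result.stdout)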

2.2 Text detection model

  • Load the detection model horizontal-text-detection-0001;
  • load an image and resize it to the model's input size;
  • run inference and return the detection results.

First, we load the detection model and look at its inputs and outputs:

    print("2 - Load detection Model: horizontal-text-detection-0001")detection_model = ie.read_model(model=ir_path_detection_model, weights=ir_path_detection_model.with_suffix(".bin") ) detection_compiled_model = ie.compile_model(model=detection_model, device_name="CPU")detection_input_layer = detection_compiled_model.input(0) detection_output_layer_box = detection_compiled_model.output('boxes') detection_output_layer_label = detection_compiled_model.output('labels')print("- Input of detection model shape: {}".format(detection_input_layer)) print("- Output `box` of detection model shape: {}".format(detection_output_layer_box)) print("- Output `label` of detection model shape: {}".format(detection_output_layer_label))

Terminal output:

2 - Load detection Model.
- Input of detection model shape: <ConstOutput: names[image] shape{1,3,704,704} type: f32>
- Output `box` of detection model shape: <ConstOutput: names[boxes] shape{..100,5} type: f32>
- Output `label` of detection model shape: <ConstOutput: names[labels] shape{..100} type: i64>

Next, we load an image and resize it to the model's input size.

    print("3 - Load Image and resize into model input shape.")# Read the image image = cv2.imread("data/label4.png") print("- Input image size: {}".format(image.shape)) # N,C,H,W = batch size, number of channels, height, width N, C, H, W = detection_input_layer.shape# Resize image to meet network expected input sizes resized_image = cv2.resize(image, (W, H))# Reshape to network input shape input_image = np.expand_dims(resized_image.transpose(2, 0, 1), 0) print("- Input image is resized (with padding) into: {}".format(input_image.shape))plt.imshow(cv2.cvtColor(resized_image, cv2.COLOR_BGR2RGB));

Terminal output:

3 - Load Image and resize into model input shape.
- Input image size: (256, 644, 3)
- Input image is resized (with padding) into: (1, 3, 704, 704)
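
Note that cv2.resize above simply stretches the image to 704x704; despite the "(with padding)" wording in the log message, no padding is applied. If you want to preserve the aspect ratio, a hypothetical letterbox-style resize could look like this:

def letterbox_resize(image, target_w, target_h):
    """Resize while keeping the aspect ratio, padding the remainder with black pixels."""
    h, w = image.shape[:2]
    scale = min(target_w / w, target_h / h)
    new_w, new_h = int(w * scale), int(h * scale)
    resized = cv2.resize(image, (new_w, new_h))
    canvas = np.zeros((target_h, target_w, 3), dtype=image.dtype)
    canvas[:new_h, :new_w] = resized  # place the resized image in the top-left corner
    return canvas, scale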

The inference code is as follows:

'''
### Model inference
Text boxes detected in the image are returned as a data blob of shape `[100, 5]`.
Each detection is described as `[x_min, y_min, x_max, y_max, conf]`.
'''
print("4 - Detection model inference.")
output_key = detection_compiled_model.output("boxes")
boxes = detection_compiled_model([input_image])[output_key]

# Remove zero only boxes
boxes = boxes[~np.all(boxes == 0, axis=1)]
print("- Detect {} boxes.".format(boxes.shape[0]))

Terminal output:

4 - Detection model inference.
- Detect 4 boxes.

2.3 Text recognition model

Loading and running the text recognition model follows the same steps as the text detection model, so here is the code directly:

def multiply_by_ratio(ratio_x, ratio_y, box):
    return [
        max(shape * ratio_y, 10) if idx % 2 else shape * ratio_x
        for idx, shape in enumerate(box[:-1])
    ]


def run_preprocesing_on_crop(crop, net_shape):
    temp_img = cv2.resize(crop, net_shape)
    temp_img = temp_img.reshape((1,) * 2 + temp_img.shape)
    return temp_img


def convert_result_to_image(bgr_image, resized_image, boxes, threshold=0.3, conf_labels=True):
    # Define colors for boxes and descriptions
    colors = {"red": (255, 0, 0), "green": (0, 255, 0), "white": (255, 255, 255)}
    # Fetch image shapes to calculate ratio
    (real_y, real_x), (resized_y, resized_x) = image.shape[:2], resized_image.shape[:2]
    ratio_x, ratio_y = real_x / resized_x, real_y / resized_y

    # Convert base image from bgr to rgb format
    rgb_image = cv2.cvtColor(bgr_image, cv2.COLOR_BGR2RGB)

    # Iterate through non-zero boxes
    for box, annotation in boxes:
        # Pick confidence factor from last place in array
        conf = box[-1]
        if conf > threshold:
            # Convert float to int and multiply position of each box by x and y ratio
            (x_min, y_min, x_max, y_max) = map(int, multiply_by_ratio(ratio_x, ratio_y, box))

            # Draw box based on position, parameters in rectangle function are: image, start_point, end_point, color, thickness
            cv2.rectangle(rgb_image, (x_min, y_min), (x_max, y_max), colors["green"], 3)

            # Add text to image based on position and confidence, parameters in putText function are: image, text, bottomleft_corner_textfield, font, font_scale, color, thickness, line_type
            if conf_labels:
                # Create background box based on annotation length
                (text_w, text_h), _ = cv2.getTextSize(f"{annotation}", cv2.FONT_HERSHEY_TRIPLEX, 0.8, 1)
                image_copy = rgb_image.copy()
                cv2.rectangle(
                    image_copy,
                    (x_min, y_min - text_h - 10),
                    (x_min + text_w, y_min - 10),
                    colors["white"],
                    -1,
                )
                # Add weighted image copy with white boxes under text
                cv2.addWeighted(image_copy, 0.4, rgb_image, 0.6, 0, rgb_image)
                cv2.putText(
                    rgb_image,
                    f"{annotation}",
                    (x_min, y_min - 10),
                    cv2.FONT_HERSHEY_SIMPLEX,
                    0.8,
                    colors["red"],
                    1,
                    cv2.LINE_AA,
                )

    return rgb_image


print("5 - Load Recognition Model: text-recognition-0014")

recognition_model = ie.read_model(
    model=ir_path_recognition_model, weights=ir_path_recognition_model.with_suffix(".bin")
)
recognition_compiled_model = ie.compile_model(model=recognition_model, device_name="CPU")

recognition_output_layer = recognition_compiled_model.output(0)
recognition_input_layer = recognition_compiled_model.input(0)

# Get height and width of input layer
_, _, Hrecog, Wrecog = recognition_input_layer.shape

print("- Input of recognition model shape: {}".format(recognition_input_layer))
print("- Output of recognition model shape: {}".format(recognition_output_layer))

'''
Model inference
'''
# Calculate scale for image resizing
(real_y, real_x), (resized_y, resized_x) = image.shape[:2], resized_image.shape[:2]
ratio_x, ratio_y = real_x / resized_x, real_y / resized_y

# Convert image to grayscale for text recognition model
grayscale_image = cv2.cvtColor(image, cv2.COLOR_BGR2GRAY)

# Get dictionary to encode output, based on model documentation
letters = "~0123456789abcdefghijklmnopqrstuvwxyz"

# Prepare empty list for annotations
annotations = list()
cropped_images = list()
# fig, ax = plt.subplots(len(boxes), 1, figsize=(5,15), sharex=True, sharey=True)
# For each crop, based on boxes given by detection model we want to get annotations
for i, crop in enumerate(boxes):
    # Get coordinates on corners of crop
    (x_min, y_min, x_max, y_max) = map(int, multiply_by_ratio(ratio_x, ratio_y, crop))
    image_crop = run_preprocesing_on_crop(grayscale_image[y_min:y_max, x_min:x_max], (Wrecog, Hrecog))

    # Run inference with recognition model
    result = recognition_compiled_model([image_crop])[recognition_output_layer]

    # Squeeze output to remove unnecessary dimension
    recognition_results_test = np.squeeze(result)

    # Read annotation based on probabilities from output layer
    annotation = list()
    for letter in recognition_results_test:
        parsed_letter = letters[letter.argmax()]
        # Shift detected digits up by one (10 wraps back to 0) to correct an off-by-one in the digit indices
        if parsed_letter.isnumeric():
            parsed_letter = int(parsed_letter)
            parsed_letter = parsed_letter + 1
            if parsed_letter == 10:
                parsed_letter = 0
            parsed_letter = str(parsed_letter)
        # Returning 0 index from argmax signalises end of string
        if parsed_letter == letters[0]:
            continue
        annotation.append(parsed_letter)
    annotations.append("".join(annotation))
    cropped_image = Image.fromarray(image[y_min:y_max, x_min:x_max])
    cropped_images.append(cropped_image)

boxes_with_annotations = list(zip(boxes, annotations))
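
Finally, the convert_result_to_image helper defined above can be used to draw the recognized strings onto the original image, for example:

# Draw the detected boxes and recognized strings on the original image and display it.
annotated_image = convert_result_to_image(image, resized_image, boxes_with_annotations, conf_labels=True)
plt.figure(figsize=(12, 8))
plt.axis("off")
plt.imshow(annotated_image)
plt.show()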

3. Results

I tried a few images and, honestly, the results are mediocre, not even as good as Tesseract. An example is shown in the figure below:
