PaddleOCR: removing irrelevant printed output

Published: 2024/1/1

When the paddleocr library is used to recognize text in an image, it always prints a few lines of irrelevant information, for example:

[2022/09/13 12:01:22] ppocr DEBUG: Namespace(help='==SUPPRESS==', use_gpu=False, use_xpu=False, ir_optim=True, use_tensorrt=False, min_subgraph_size=15, shape_info_filename=None, precision='fp32', gpu_mem=500, image_dir=None, det_algorithm='DB', det_model_dir='C:\\Users\\bdy/.paddleocr/whl\\det\\ch\\ch_PP-OCRv3_det_infer', det_limit_side_len=960, det_limit_type='max', det_db_thresh=0.3, det_db_box_thresh=0.6, det_db_unclip_ratio=1.5, max_batch_size=10, use_dilation=False, det_db_score_mode='fast', det_east_score_thresh=0.8, det_east_cover_thresh=0.1, det_east_nms_thresh=0.2, det_sast_score_thresh=0.5, det_sast_nms_thresh=0.2, det_sast_polygon=False, det_pse_thresh=0, det_pse_box_thresh=0.85, det_pse_min_area=16, det_pse_box_type='quad', det_pse_scale=1, scales=[8, 16, 32], alpha=1.0, beta=1.0, fourier_degree=5, det_fce_box_type='poly', rec_algorithm='SVTR_LCNet', rec_model_dir='C:\\Users\\bdy/.paddleocr/whl\\rec\\ch\\ch_PP-OCRv3_rec_infer', rec_image_shape='3, 48, 320', rec_batch_num=6, max_text_length=25, rec_char_dict_path='D:\\python3.10\\Lib\\site-packages\\paddleocr\\ppocr\\utils\\ppocr_keys_v1.txt', use_space_char=True, vis_font_path='./doc/fonts/simfang.ttf', drop_score=0.5, e2e_algorithm='PGNet', e2e_model_dir=None, e2e_limit_side_len=768, e2e_limit_type='max', e2e_pgnet_score_thresh=0.5, e2e_char_dict_path='./ppocr/utils/ic15_dict.txt', e2e_pgnet_valid_set='totaltext', e2e_pgnet_mode='fast', use_angle_cls=True, cls_model_dir='C:\\Users\\bdy/.paddleocr/whl\\cls\\ch_ppocr_mobile_v2.0_cls_infer', cls_image_shape='3, 48, 192', label_list=['0', '180'], cls_batch_num=6, cls_thresh=0.9, enable_mkldnn=False, cpu_threads=10, use_pdserving=False, warmup=False, sr_model_dir=None, sr_image_shape='3, 32, 128', sr_batch_num=1, draw_img_save_dir='./inference_results', save_crop_res=False, crop_res_save_dir='./output', use_mp=False, total_process_num=1, process_id=0, benchmark=False, save_log_path='./log_output/', show_log=True, use_onnx=False, output='./output', 
table_max_len=488, table_algorithm='TableAttn', table_model_dir=None, merge_no_span_structure=True, table_char_dict_path=None, layout_model_dir=None, layout_dict_path=None, layout_score_threshold=0.5, layout_nms_threshold=0.5, kie_algorithm='LayoutXLM', ser_model_dir=None, ser_dict_path='../train_data/XFUND/class_list_xfun.txt', ocr_order_method=None, mode='structure', image_orientation=False, layout=True, table=True, ocr=True, recovery=False, save_pdf=False, lang='ch', det=True, rec=True, type='ocr', ocr_version='PP-OCRv3', structure_version='PP-Structurev2')
[2022/09/13 12:01:22] ppocr DEBUG: dt_boxes num : 1, elapse : 0.03490591049194336
[2022/09/13 12:01:22] ppocr DEBUG: cls num : 1, elapse : 0.010002851486206055
[2022/09/13 12:01:23] ppocr DEBUG: rec_res num : 1, elapse : 0.08177757263183594

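This output is distracting, so I traced the statement that prints these lines and commented it out.

The short answer

Go to your Python installation folder (mine is 3.10), open D:\python3.10\Lib\logging, and edit __init__.py.

The logging folder may not be under Lib; in some installations it sits directly in the Python folder.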

    def emit(self, record):
        """
        Emit a record.

        If a formatter is specified, it is used to format the record.
        The record is then written to the stream with a trailing newline. If
        exception information is present, it is formatted using
        traceback.print_exception and appended to the stream. If the stream
        has an 'encoding' attribute, it is used to determine how to do the
        output to the stream.
        """
        try:
            msg = self.format(record)
            stream = self.stream
            # issue 35046: merged two stream.writes into one.
            stream.write(msg + self.terminator)
            self.flush()
        except RecursionError:  # See issue 36272
            raise
        except Exception:
            self.handleError(record)

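In this method, comment out stream.write(msg + self.terminator).

This version has the two writes merged into one, but some Python versions still have two stream.write lines. When commenting, make sure every line containing stream.write is commented out.

Note that this emit belongs to the standard library's StreamHandler, so the change silences stream logging for every program run with this interpreter, not only for paddleocr.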
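The same effect can be tried at runtime without editing the standard library file. The sketch below temporarily replaces logging.StreamHandler.emit with a no-op; the helper name silenced_stream_handlers and the logger name ppocr_demo_patch are hypothetical, and the in-memory stream only stands in for the console.

```python
import contextlib
import io
import logging


@contextlib.contextmanager
def silenced_stream_handlers():
    """Temporarily turn StreamHandler.emit into a no-op (hypothetical helper)."""
    original_emit = logging.StreamHandler.emit
    logging.StreamHandler.emit = lambda self, record: None
    try:
        yield
    finally:
        # Always restore the real emit, even if the body raised.
        logging.StreamHandler.emit = original_emit


# Demonstration logger writing into memory instead of the console.
buffer = io.StringIO()
demo = logging.getLogger("ppocr_demo_patch")
demo.propagate = False
demo.setLevel(logging.DEBUG)
demo.addHandler(logging.StreamHandler(buffer))

with silenced_stream_handlers():
    demo.debug("hidden")   # emit is a no-op here, nothing is written

demo.debug("shown")        # the original emit is back

lines = buffer.getvalue().splitlines()
print(lines)
```

Because the patch is undone when the with block exits, other code in the same process keeps its normal logging.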

How I traced it

In PyCharm, hold Ctrl and click PaddleOCR in the import below. Every click from here on is a Ctrl+click.

from paddleocr import PaddleOCR

This opens the PaddleOCR class. Scroll to the end of __init__ and click debug in logger.debug(params).

    class PaddleOCR(predict_system.TextSystem):
        def __init__(self, **kwargs):
            """
            paddleocr package
            args:
                **kwargs: other params show in paddleocr --help
            """
            params = parse_args(mMain=False)
            params.__dict__.update(**kwargs)
            assert params.ocr_version in SUPPORT_OCR_MODEL_VERSION, "ocr_version must in {}, but get {}".format(
                SUPPORT_OCR_MODEL_VERSION, params.ocr_version)
            params.use_gpu = check_gpu(params.use_gpu)

            if not params.show_log:
                logger.setLevel(logging.INFO)
            self.use_angle_cls = params.use_angle_cls
            lang, det_lang = parse_lang(params.lang)

            # init model dir
            det_model_config = get_model_config('OCR', params.ocr_version, 'det',
                                                det_lang)
            params.det_model_dir, det_url = confirm_model_dir_url(
                params.det_model_dir,
                os.path.join(BASE_DIR, 'whl', 'det', det_lang),
                det_model_config['url'])
            rec_model_config = get_model_config('OCR', params.ocr_version, 'rec',
                                                lang)
            params.rec_model_dir, rec_url = confirm_model_dir_url(
                params.rec_model_dir,
                os.path.join(BASE_DIR, 'whl', 'rec', lang), rec_model_config['url'])
            cls_model_config = get_model_config('OCR', params.ocr_version, 'cls',
                                                'ch')
            params.cls_model_dir, cls_url = confirm_model_dir_url(
                params.cls_model_dir,
                os.path.join(BASE_DIR, 'whl', 'cls'), cls_model_config['url'])
            if params.ocr_version == 'PP-OCRv3':
                params.rec_image_shape = "3, 48, 320"
            else:
                params.rec_image_shape = "3, 32, 320"
            # download model
            maybe_download(params.det_model_dir, det_url)
            maybe_download(params.rec_model_dir, rec_url)
            maybe_download(params.cls_model_dir, cls_url)

            if params.det_algorithm not in SUPPORT_DET_MODEL:
                logger.error('det_algorithm must in {}'.format(SUPPORT_DET_MODEL))
                sys.exit(0)
            if params.rec_algorithm not in SUPPORT_REC_MODEL:
                logger.error('rec_algorithm must in {}'.format(SUPPORT_REC_MODEL))
                sys.exit(0)

            if params.rec_char_dict_path is None:
                params.rec_char_dict_path = str(
                    Path(__file__).parent / rec_model_config['dict_path'])

            logger.debug(params)
            # init det_model and rec_model
            super().__init__(params)
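The constructor above contains the branch "if not params.show_log: logger.setLevel(logging.INFO)", and keyword arguments are copied into params. In versions matching this listing, that suggests passing show_log=False to PaddleOCR should raise the logger level and hide the DEBUG lines. The following is a minimal sketch of what that branch does, using only the standard logging module. The logger name "ppocr" is taken from the log prefix shown earlier, and the in-memory stream is an assumption made for demonstration.

```python
import io
import logging

# Stand-in for the library's logger; "ppocr" is the name seen in the log
# prefix, and the StringIO stream replaces the console for this demo.
buffer = io.StringIO()
logger = logging.getLogger("ppocr")
logger.propagate = False
logger.addHandler(logging.StreamHandler(buffer))

logger.setLevel(logging.DEBUG)
logger.debug("dt_boxes num : 1")   # written: DEBUG is enabled

logger.setLevel(logging.INFO)      # what the show_log=False branch does
logger.debug("cls num : 1")        # dropped: below the logger's level
logger.info("still visible")       # written: INFO passes

captured = buffer.getvalue().splitlines()
print(captured)
```

Raising the level on this one logger leaves all other loggers in the process untouched.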

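PyCharm now lists several candidate debug functions. Pick the third one, which is the Logger.debug method in logging/__init__.py shown below (the position in the list may differ in your environment).

In that method, click _log in self._log(DEBUG, msg, args, **kwargs).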

    def debug(self, msg, *args, **kwargs):
        """
        Log 'msg % args' with severity 'DEBUG'.

        To pass exception information, use the keyword argument exc_info with
        a true value, e.g.

        logger.debug("Houston, we have a %s", "thorny problem", exc_info=1)
        """
        if self.isEnabledFor(DEBUG):
            self._log(DEBUG, msg, args, **kwargs)
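The isEnabledFor check above means a disabled DEBUG call returns before any record is built or formatted. A small sketch of that behaviour follows; the Counted class and the logger name ppocr_demo_enabled are hypothetical and exist only to make the effect observable.

```python
import logging


class Counted:
    """Hypothetical helper that counts how often it is turned into text."""

    def __init__(self):
        self.calls = 0

    def __str__(self):
        self.calls += 1
        return "params"


demo = logging.getLogger("ppocr_demo_enabled")
demo.propagate = False
demo.addHandler(logging.NullHandler())
demo.setLevel(logging.INFO)

params = Counted()
debug_enabled = demo.isEnabledFor(logging.DEBUG)  # the gate used by debug()
demo.debug("%s", params)                          # returns before _log runs

print(debug_enabled, params.calls)
```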

Scroll to the end of _log and click handle in self.handle(record). The handle method is defined directly below _log.

    def _log(self, level, msg, args, exc_info=None, extra=None, stack_info=False,
             stacklevel=1):
        """
        Low-level logging routine which creates a LogRecord and then calls
        all the handlers of this logger to handle the record.
        """
        sinfo = None
        if _srcfile:
            #IronPython doesn't track Python frames, so findCaller raises an
            #exception on some versions of IronPython. We trap it here so that
            #IronPython can use logging.
            try:
                fn, lno, func, sinfo = self.findCaller(stack_info, stacklevel)
            except ValueError: # pragma: no cover
                fn, lno, func = "(unknown file)", 0, "(unknown function)"
        else: # pragma: no cover
            fn, lno, func = "(unknown file)", 0, "(unknown function)"
        if exc_info:
            if isinstance(exc_info, BaseException):
                exc_info = (type(exc_info), exc_info, exc_info.__traceback__)
            elif not isinstance(exc_info, tuple):
                exc_info = sys.exc_info()
        record = self.makeRecord(self.name, level, fn, lno, msg, args,
                                 exc_info, func, extra, sinfo)
        self.handle(record)

Click callHandlers.

    def handle(self, record):
        """
        Call the handlers for the specified record.

        This method is used for unpickled records received from a socket, as
        well as those created locally. Logger-level filtering is applied.
        """
        if (not self.disabled) and self.filter(record):
            self.callHandlers(record)
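Since Logger.handle only calls the handlers when self.filter(record) passes, a filter attached to the logger is another place where DEBUG records can be dropped. A minimal sketch, assuming a hypothetical DropDebug filter and a demonstration logger named ppocr_demo_filter:

```python
import io
import logging


class DropDebug(logging.Filter):
    """Hypothetical filter: let through only records above DEBUG."""

    def filter(self, record):
        return record.levelno > logging.DEBUG


buffer = io.StringIO()
demo = logging.getLogger("ppocr_demo_filter")
demo.propagate = False
demo.setLevel(logging.DEBUG)
demo.addHandler(logging.StreamHandler(buffer))
demo.addFilter(DropDebug())

demo.debug("rec_res num : 1")   # rejected by the logger-level filter
demo.warning("kept")            # passes the filter and reaches the handler

lines = buffer.getvalue().splitlines()
print(lines)
```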

In callHandlers, find the following snippet and click the handle in hdlr.handle(record).

    while c:
        for hdlr in c.handlers:
            found = found + 1
            if record.levelno >= hdlr.level:
                hdlr.handle(record)
        if not c.propagate:
            c = None    #break out
        else:
            c = c.parent
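The loop above walks from the logger up through its parents and skips any handler whose level is higher than the record's. The sketch below shows both effects with a parent and child logger; the names ppocr_demo_parent and ppocr_demo_parent.child are hypothetical.

```python
import io
import logging

buffer = io.StringIO()

# Parent logger owns the only handler, which accepts WARNING and above.
parent = logging.getLogger("ppocr_demo_parent")
parent.propagate = False
parent.setLevel(logging.DEBUG)
parent_handler = logging.StreamHandler(buffer)
parent_handler.setLevel(logging.WARNING)
parent.addHandler(parent_handler)

# Child has no handlers of its own, so records travel up via c = c.parent.
child = logging.getLogger("ppocr_demo_parent.child")

child.debug("dt_boxes num : 1")   # reaches parent_handler, fails the level check
child.warning("model missing")    # reaches parent_handler and is written

lines = buffer.getvalue().splitlines()
print(lines)
```

Setting a level on the handler is therefore a way to keep DEBUG records enabled on the logger while hiding them from one particular output.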

Click emit.

    def handle(self, record):
        """
        Conditionally emit the specified logging record.

        Emission depends on filters which may have been added to the handler.
        Wrap the actual emission of the record with acquisition/release of
        the I/O thread lock. Returns whether the filter passed the record for
        emission.
        """
        rv = self.filter(record)
        if rv:
            self.acquire()
            try:
                self.emit(record)
            finally:
                self.release()
        return rv

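This emit only raises an exception, so it is clearly not the emit we are looking for. It is the base Handler.emit, which subclasses are expected to override.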

    def emit(self, record):
        """
        Do whatever it takes to actually log the specified logging record.

        This version is intended to be implemented by subclasses and so
        raises a NotImplementedError.
        """
        raise NotImplementedError('emit must be implemented '
                                  'by Handler subclasses')
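To illustrate why the base method is not the one that prints: Handler.emit raises, and each concrete handler supplies its own emit. A minimal sketch with a hypothetical ListHandler subclass that stores formatted messages in a list:

```python
import logging


class ListHandler(logging.Handler):
    """Hypothetical handler that keeps formatted messages in memory."""

    def __init__(self):
        super().__init__()
        self.messages = []

    def emit(self, record):
        self.messages.append(self.format(record))


# Build a record by hand; the path "demo.py" and line 0 are placeholders.
record = logging.LogRecord(
    "ppocr_demo", logging.DEBUG, "demo.py", 0, "cls num : %d", (1,), None
)

# The base class refuses to emit.
try:
    logging.Handler().emit(record)
    base_raises = False
except NotImplementedError:
    base_raises = True

# The subclass's emit is what handle() ends up calling.
handler = ListHandler()
handler.handle(record)

print(base_raises, handler.messages)
```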

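Instead, click the blue dot with a down arrow in the left gutter, which lists the overriding methods, and choose the first emit. That is the StreamHandler.emit containing stream.write.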

Finally, make the change described at the top of this article.
