Machine Learning Case Study: CAPTCHA Recognition (Captcha)

Published: 2024/3/13 · 豆豆

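A CAPTCHA (Completely Automated Public Turing test to tell Computers and Humans Apart) is a fully automated program that determines whether a user is a computer or a human.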
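Experimental steps:

  • 1. Generate the captchas
  • 2. Binarize the captchas into 0/1 values
  • 3. Remove noise
  • 4. Split each captcha into single digits
  • 5. Convert the split images into numeric feature vectors
  • 6. Build a logistic regression model
  • 7. Predict newly input images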
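The scripts in the following steps expect several folders to exist relative to the working directory: captcha_images, clean_captcha_img, cut_number/0 to cut_number/9, captcha_model and captcha_predict. A minimal setup sketch, assuming everything runs from one project root; the helper name `prepare_dirs` is hypothetical and not part of the original code:

```python
import os
import tempfile

# Folder names used by the scripts in this article
DIRS = ["captcha_images", "clean_captcha_img", "captcha_model", "captcha_predict"]


def prepare_dirs(root="."):
    """Create every folder the pipeline reads from or writes to (hypothetical helper)."""
    for name in DIRS:
        os.makedirs(os.path.join(root, name), exist_ok=True)
    # One sub-folder per digit for the segmented training images
    for digit in range(10):
        os.makedirs(os.path.join(root, "cut_number", str(digit)), exist_ok=True)
    return sorted(os.listdir(root))


if __name__ == "__main__":
    # Demo in a throwaway directory; in practice call prepare_dirs() in the project root
    with tempfile.TemporaryDirectory() as tmp:
        print(prepare_dirs(tmp))
```

Because `exist_ok=True` is passed, calling it again is harmless, so it can sit at the top of each script.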

Creating the captchas
1. Randomly generate the color of each digit
2. Randomly generate the digits
3. Draw the image with PIL

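```python
import os
import random

from PIL import Image
from PIL import ImageDraw
from PIL import ImageFont

WIDTH, HEIGHT = 150, 50  # image size in pixels


def getRandomColor():
    """Return a random (r, g, b) color for a digit.

    Each channel is capped at 200, so the digit's gray value stays at or
    below 200, under the 220 binarization threshold used later; lighter
    digits would be erased during binarization.
    """
    return (random.randint(0, 200), random.randint(0, 200), random.randint(0, 200))


def getRandomStr():
    """Return one random digit as a string (its color is chosen separately)."""
    return str(random.randint(0, 9))


def generate_captcha(out_dir="captcha_images"):
    """Draw a 5-digit captcha with PIL and save it as <label>.png."""
    # New RGB image, 150 wide by 50 high, white background
    image = Image.new('RGB', (WIDTH, HEIGHT), (255, 255, 255))
    # Drawing object bound to the image
    draw = ImageDraw.Draw(image)
    # Font object: path to a .ttf font file and the font size.
    # If the font cannot be found, download it locally or point to another .ttf file
    font = ImageFont.truetype("arlrdbd.ttf", size=32)
    label = ""
    # Generate a string of 5 random digits
    for i in range(5):
        random_char = getRandomStr()
        label += random_char
        # Arguments: position, text, color, font
        draw.text((10 + i * 30, 0), random_char, getRandomColor(), font=font)
    # Noise is drawn only in the top 30 rows, where the digits are
    noise_height = 30
    # Random noise lines
    for _ in range(3):
        x1, x2 = random.randint(0, WIDTH), random.randint(0, WIDTH)
        y1, y2 = random.randint(0, noise_height), random.randint(0, noise_height)
        draw.line((x1, y1, x2, y2), fill=(0, 0, 0))
    # Random noise points and small arcs
    for _ in range(5):
        draw.point([random.randint(0, WIDTH), random.randint(0, noise_height)], fill=getRandomColor())
        x, y = random.randint(0, WIDTH), random.randint(0, noise_height)
        draw.arc((x, y, x + 4, y + 4), 0, 90, fill=(0, 0, 0))
    # Save to disk as <label>.png; the file name doubles as the ground-truth label
    os.makedirs(out_dir, exist_ok=True)
    image.save(os.path.join(out_dir, label + '.png'), 'png')
    # To create images for prediction instead: generate_captcha('captcha_predict')


if __name__ == '__main__':
    # The number of images is not given in the article; 1000 is an assumption.
    # Identical labels overwrite each other, so slightly fewer files may result.
    for _ in range(1000):
        generate_captcha()
```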

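After running the code, the images needed for the experiment are generated under 'captcha_images', each named after its 5-digit label, as in the figure:
[Figure: sample captcha images generated in captcha_images]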
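The binarization step turns every pixel whose gray value is above 220 into background, so a digit drawn in a light random color can disappear before the model ever sees it. Pillow's convert("L") uses the ITU-R 601-2 luma transform, L = R * 299/1000 + G * 587/1000 + B * 114/1000. A stdlib-only sketch for checking which colors survive; `gray_value` and `survives_threshold` are hypothetical helpers, and Pillow's internal integer arithmetic may differ from this float version by 1 in edge cases:

```python
def gray_value(color):
    """Approximate Pillow's RGB -> "L" conversion (ITU-R 601-2 luma)."""
    r, g, b = color
    return round(r * 299 / 1000 + g * 587 / 1000 + b * 114 / 1000)


def survives_threshold(color, threshold=220):
    """True if a pixel of this color stays black (0) after binarization."""
    return gray_value(color) <= threshold


for color in [(0, 0, 0), (200, 200, 200), (254, 254, 0), (254, 254, 254)]:
    print(color, gray_value(color), survives_threshold(color))
```

Since the three weights sum to 1, capping every channel at 200 keeps any digit color at a gray value of 200 or less, safely below the threshold.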

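Image processing: clean up the generated images
(1) Binarize each captcha: first convert the image from three-channel RGB to single-channel grayscale, then convert the grayscale image (values 0 to 255) into a binary image (values 0 and 1). Pixels with a gray value of 220 or less become 0 (black), the rest become 1 (white).
(2) Denoise the binary image to remove the noise dots and lines: a black pixel with fewer than 4 black pixels among its 8 neighbours is treated as noise and set to white.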

```python
import os

import numpy as np
import matplotlib.pyplot as plt
from PIL import Image


def binarization(path, show=False):
    """Convert an RGB image into a 0/1 binary array (0 = black, 1 = white).

    :param path: path to the captcha image
    :param show: display the binary image (off by default so batch runs don't block)
    :return: 2-D uint8 array of 0s and 1s
    """
    # Load the image from path
    img = Image.open(path)
    # Convert to a single-channel grayscale image (values 0-255)
    img_gray = np.array(img.convert("L"))
    # Gray value <= 220 becomes black (0); anything lighter becomes white (1)
    img_gray = np.where(img_gray <= 220, 0, 1).astype(np.uint8)
    if show:
        plt.figure()
        plt.imshow(img_gray, cmap="gray", vmin=0, vmax=1)
        plt.axis("off")
        plt.show()
    return img_gray


def noiseReduction(img_gray, label):
    """Denoise by removing outliers: a black pixel with fewer than 4 black
    neighbours (out of 8) is treated as noise and turned white.

    The result is saved as clean_captcha_img/<label>.png.
    :param img_gray: binary array returned by binarization()
    :param label: file name (without extension) for the cleaned image
    """
    height, width = img_gray.shape
    for x in range(height):
        for y in range(width):
            # White pixels need no processing
            if img_gray[x, y] == 1:
                continue
            cnt = 0
            for dx in (-1, 0, 1):
                for dy in (-1, 0, 1):
                    if dx == 0 and dy == 0:
                        continue
                    nx, ny = x + dx, y + dy
                    # Explicit bounds check: index -1 does not raise in NumPy but
                    # wraps to the opposite edge, so try/except cannot catch it
                    if 0 <= nx < height and 0 <= ny < width and img_gray[nx, ny] == 0:
                        cnt += 1
            if cnt < 4:  # fewer than 4 black neighbours: noise
                img_gray[x, y] = 1
    # Saved through matplotlib: the default 6.4x4.8 inch figure at 100 dpi gives a
    # 640x480 PNG, and the fixed crop boxes of the segmentation step depend on that size
    os.makedirs("clean_captcha_img", exist_ok=True)
    plt.figure()
    plt.imshow(img_gray, cmap="gray", vmin=0, vmax=1)
    plt.axis("off")
    plt.savefig(os.path.join("clean_captcha_img", label + ".png"))
    plt.close()  # free the figure so batch runs don't accumulate open figures


def image_2_clean():
    """Binarize and denoise every image in captcha_images/."""
    for captcha in os.listdir("captcha_images"):
        label = captcha.split(".")[0]
        image_path = os.path.join("captcha_images", captcha)
        im = binarization(image_path)  # binarize
        noiseReduction(im, label)      # denoise


if __name__ == '__main__':
    image_2_clean()
    # Check a single image:
    # img_gray = binarization("captcha_images/00006.png", show=True)
    # noiseReduction(img_gray, label='00006')
```
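The nested loops above visit every pixel in Python, which gets slow for a large batch. A NumPy sketch of the same two steps, with one stated difference: here every pixel's neighbours are counted on the input image, whereas the loop version updates the image in place, so pixels examined later see earlier removals and results can differ slightly. Pixels outside the image count as white. The names `binarize` and `denoise` are hypothetical:

```python
import numpy as np


def binarize(gray, threshold=220):
    """Gray array (0-255) -> 0/1 array: <= threshold becomes 0 (black), else 1 (white)."""
    return np.where(gray <= threshold, 0, 1).astype(np.uint8)


def denoise(binary, min_neighbors=4):
    """Turn black pixels with fewer than min_neighbors black 8-neighbours white.

    Neighbours are counted on the input image (no in-place updates), and
    pixels outside the image count as white.
    """
    h, w = binary.shape
    # Pad with a 1-pixel white border so the shifted views below stay in bounds
    padded = np.pad(binary, 1, mode="constant", constant_values=1)
    black = (padded == 0).astype(np.int32)
    counts = np.zeros((h, w), dtype=np.int32)
    # Sum the 8 shifted copies of the black mask: counts[x, y] = black neighbours of (x, y)
    for dx in (-1, 0, 1):
        for dy in (-1, 0, 1):
            if dx == 0 and dy == 0:
                continue
            counts += black[1 + dx:1 + dx + h, 1 + dy:1 + dy + w]
    out = binary.copy()
    out[(binary == 0) & (counts < min_neighbors)] = 1
    return out


if __name__ == "__main__":
    gray = np.full((5, 5), 255, dtype=np.uint8)
    gray[2, 2] = 0  # a single isolated dark pixel
    print(denoise(binarize(gray)).min())  # the isolated pixel has been turned white
```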


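Image segmentation: split each denoised image into its five digits and save every piece into the folder for its digit (cut_number/0 to cut_number/9). The digits are cut out with fixed crop boxes rather than being detected individually.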
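Pillow's Image.crop takes a box (left, upper, right, lower) in pixels and returns a region right - left pixels wide and lower - upper pixels tall. The segmentation code uses one fixed box per digit position on the 640x480 PNG written by the denoising step (matplotlib's default figure size at its default 100 dpi). A sketch that generates those five boxes from a few parameters, so they can be adjusted in one place if the figure size ever changes; `digit_boxes` and its parameter names are hypothetical:

```python
def digit_boxes(n_digits=5, left=100, step=100, width=100, upper=170, lower=280):
    """Crop boxes (left, upper, right, lower) for each digit position."""
    return [(left + i * step, upper, left + i * step + width, lower) for i in range(n_digits)]


for box in digit_boxes():
    print(box)
```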

```python
import os

from PIL import Image


def cutImg(label):
    """Cut each digit out of a cleaned captcha and store it under cut_number/<digit>/.

    :param label: the captcha's 5-digit label (its file name without .png)
    """
    img = Image.open(os.path.join('clean_captcha_img', label + '.png'))
    for i in range(5):
        # Fixed box (left, upper, right, lower) on the 640x480 cleaned PNG
        pic = img.crop((100 * (1 + i), 170, 100 * (1 + i) + 100, 280))
        # seq is the file name to use inside this digit's folder
        seq = get_save_seq(label[i])
        pic.save(os.path.join("cut_number", label[i], str(seq) + '.png'))


def get_save_seq(num):
    """Return the next free file number in cut_number/<num>/.

    Files in each digit folder are numbered from 0: 0.png, 1.png, ...
    Names that are not numbers (e.g. .DS_Store) are ignored.
    """
    seqs = [int(f.split(".")[0]) for f in os.listdir(os.path.join("cut_number", num))
            if f.split(".")[0].isdigit()]
    return max(seqs) + 1 if seqs else 0


def clean_to_cut():
    """Cut every cleaned captcha."""
    for captcha in os.listdir("clean_captcha_img"):
        label = captcha.split(".")[0]
        # Skip files that are not 5-digit labels (e.g. unknown.png written by the prediction script)
        if not (label.isdigit() and len(label) == 5):
            continue
        cutImg(label)


def create_dir():
    """Create cut_number/0 ... cut_number/9 (safe to call repeatedly)."""
    for i in range(10):
        os.makedirs(os.path.join("cut_number", str(i)), exist_ok=True)


if __name__ == '__main__':
    create_dir()
    clean_to_cut()
```
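The file-numbering rule in get_save_seq (next number = largest existing number + 1, starting from 0) can be written as a pure function over a list of file names, which makes it testable without touching the disk. A sketch; `next_seq` is a hypothetical name, and it skips names that are not plain numbers, such as the .DS_Store files some systems create:

```python
import os


def next_seq(filenames):
    """Next free number for files named 0.png, 1.png, ... (0 if none exist)."""
    stems = (os.path.splitext(f)[0] for f in filenames)
    numbers = [int(stem) for stem in stems if stem.isdigit()]
    return max(numbers) + 1 if numbers else 0


print(next_seq([]))
print(next_seq(["0.png", "1.png", "7.png", ".DS_Store"]))
```

In the pipeline it would be called as next_seq(os.listdir(folder)). Gaps in the numbering are not refilled, which is harmless because only uniqueness of the names matters.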

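Image-to-number conversion: each cut image is converted to grayscale and binarized. Image.open() opens the image file and returns a PIL Image object, which is converted to an ndarray; the binary image is reshaped into 1 row and n columns and appended to the X list, and the digit it shows is appended to the Y list.
Model building: X and Y are fed into a logistic regression model, and cross-validation with grid search is used to find the best parameters.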
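The flattening step can be seen on a toy array: reshape(1, -1) produces a 1 x n row, and [0] takes that row as a 1-D vector of length n = height x width, the same result as ravel() here. Stacking one such vector per image gives the feature matrix X with one row per sample. A minimal sketch with made-up 2x3 "images" (not real captcha data):

```python
import numpy as np

# Two tiny binary "images" standing in for cut digit images (made-up data)
img_a = np.array([[0, 1, 1],
                  [1, 0, 1]], dtype=np.uint8)
img_b = np.array([[1, 1, 0],
                  [0, 1, 1]], dtype=np.uint8)

X, Y = [], []
for img, digit in [(img_a, 3), (img_b, 7)]:
    row = img.reshape(1, -1)[0]  # 1 x n row -> 1-D vector of n = 6 values
    X.append(row)
    Y.append(digit)

X, Y = np.array(X), np.array(Y)
print(X.shape, Y.shape)
print(X)
```

For the real crops, which are 100 pixels wide and 110 tall, each row of X has 11,000 columns.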

```python
import os

import numpy as np
import joblib  # sklearn.externals.joblib no longer exists in scikit-learn >= 0.23
from PIL import Image
from sklearn.linear_model import LogisticRegression
from sklearn.multiclass import OneVsRestClassifier
from sklearn.model_selection import train_test_split
from sklearn.model_selection import GridSearchCV
from sklearn.metrics import confusion_matrix
from sklearn.metrics import precision_score
from sklearn.metrics import recall_score


def load_data():
    """Load the digit images from cut_number/.

    X holds each image's 0/1 pixel values flattened into one row;
    Y holds the digit the image shows.
    """
    X, Y = [], []
    # Loop over the sub-folders of cut_number (0, 1, 2, ... 9)
    for numC in os.listdir("cut_number"):
        num_list_dir = os.path.join("cut_number", numC)
        # Loop over every image in the sub-folder
        for num_file in os.listdir(num_list_dir):
            img = Image.open(os.path.join(num_list_dir, num_file))
            # Grayscale, then binarize with the same threshold as before
            img_array = np.array(img.convert("L"))
            img_array = np.where(img_array <= 220, 0, 1).astype(np.uint8)
            # Flatten the binary image into 1 row, n columns
            X.append(img_array.reshape(1, -1)[0])
            Y.append(int(numC))
    return np.array(X), np.array(Y)


def generate_model(X, Y):
    """Tune, train, evaluate and save the model."""
    # Split into training and test sets, 70/30
    X_train, X_test, Y_train, Y_test = train_test_split(X, Y, test_size=0.3)
    # One-vs-rest logistic regression via OneVsRestClassifier;
    # LogisticRegression's multi_class="ovr" option is deprecated in recent scikit-learn
    log_clf = OneVsRestClassifier(LogisticRegression(solver="sag", max_iter=10000))
    # Choose parameters by cross-validation: 3 x 3 = 9 candidates, each fitted
    # 3 times (cv=3); the best one is then refitted on the whole training set
    param_grid = {"estimator__tol": [1e-4, 1e-5, 1e-2], "estimator__C": [0.4, 0.6, 0.8]}
    grid_search = GridSearchCV(log_clf, param_grid=param_grid, cv=3)
    grid_search.fit(X_train, Y_train)
    print(grid_search.best_params_)
    print("Model trained")
    # Evaluate on the held-out test set
    best = grid_search.best_estimator_
    Y_pred = best.predict(X_test)
    print("precision:", precision_score(Y_test, Y_pred, average="macro"))
    print("recall:", recall_score(Y_test, Y_pred, average="macro"))
    print(confusion_matrix(Y_test, Y_pred))
    # Save the fitted best model; GridSearchCV fits clones, so log_clf itself stays unfitted
    os.makedirs("captcha_model", exist_ok=True)
    joblib.dump(best, "captcha_model/captcha_model.model")
    print("Model saved")


if __name__ == '__main__':
    X, Y = load_data()
    generate_model(X, Y)
```
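With one-vs-rest, one binary logistic regression is trained per digit. At prediction time classifier k scores the input as sigma(w_k . x + b_k), where sigma(z) = 1 / (1 + e^(-z)), and the class with the highest score wins. A stdlib sketch of that decision rule, using made-up weights for 3 classes and a 2-feature input (the real model has 10 classes and one weight per pixel); `ovr_predict` is a hypothetical name:

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def ovr_predict(x, weights, biases):
    """Return (predicted class, per-class scores) for one sample."""
    scores = [sigmoid(sum(w_i * x_i for w_i, x_i in zip(w, x)) + b)
              for w, b in zip(weights, biases)]
    best = max(range(len(scores)), key=scores.__getitem__)
    return best, scores


# Made-up parameters for 3 classes
weights = [[2.0, -1.0], [-1.0, 2.0], [0.5, 0.5]]
biases = [0.0, 0.0, -1.0]
label, probs = ovr_predict([1.0, 0.0], weights, biases)
print(label, [round(p, 3) for p in probs])
```

Because sigma is monotonic, picking the largest sigmoid score is the same as picking the largest raw score w_k . x + b_k, which is what LogisticRegression.decision_function returns.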

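Prediction:
The image to be recognized is converted to grayscale, binarized, denoised and split into five digit images, which are fed into the model one at a time; the five predicted digits are concatenated into the result.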

```python
import os

import numpy as np
import joblib  # sklearn.externals.joblib no longer exists in scikit-learn >= 0.23
from PIL import Image

# Assumes binarization() and noiseReduction() from the image-processing script are
# importable from captcha_logistic.py in the same folder; a relative import
# (from .captcha_logistic import *) fails when this file is run directly.
from captcha_logistic import binarization, noiseReduction


def get_model():
    return joblib.load('captcha_model/captcha_model.model')


def model_predict(path='captcha_predict/unknown.png'):
    # Binarize and denoise; the cleaned image is written to clean_captcha_img/unknown.png
    pre_img_gray = binarization(path)
    noiseReduction(pre_img_gray, 'unknown')
    # Cut the cleaned image into 5 digits with the same boxes as in training
    img = Image.open('clean_captcha_img/unknown.png')
    os.makedirs('captcha_predict', exist_ok=True)
    model = get_model()
    result = ''
    for i in range(5):
        pic = img.crop((100 * (1 + i), 170, 100 * (1 + i) + 100, 280))
        pic.save(os.path.join('captcha_predict', str(i) + '.png'))
        # Same grayscale + binarization as load_data(), flattened to one row
        img_array = np.array(pic.convert('L'))
        img_array = np.where(img_array <= 220, 0, 1).astype(np.uint8)
        X = img_array.reshape(1, -1)[0]
        y_pre = model.predict([X])
        result += str(y_pre[0])
    return result


if __name__ == '__main__':
    result = model_predict()
    print(result)
```
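The model is trained and scored per digit, but a captcha is solved only if all five digits are right. If per-digit errors were independent, a per-digit accuracy p would give a whole-captcha accuracy of about p^5; for example 0.95^5 is about 0.774. A sketch that measures both on predicted strings versus true labels; the function names and example strings are made up for illustration:

```python
def digit_accuracy(preds, labels):
    """Fraction of individual digits predicted correctly."""
    pairs = [(p, t) for pred, label in zip(preds, labels) for p, t in zip(pred, label)]
    return sum(p == t for p, t in pairs) / len(pairs)


def captcha_accuracy(preds, labels):
    """Fraction of captchas whose whole string is correct."""
    return sum(pred == label for pred, label in zip(preds, labels)) / len(labels)


preds = ["12345", "12340", "98765", "55555"]
labels = ["12345", "12345", "98765", "55550"]
print(digit_accuracy(preds, labels))    # per digit
print(captcha_accuracy(preds, labels))  # whole captcha
print(0.95 ** 5)                        # independence estimate for p = 0.95
```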
