python dlib Learning (11): Blink Detection

Published: 2025/3/21

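Preface

We will use OpenCV and dlib to detect and count blinks in a video stream in real time.
Reference paper: Real-Time Eye Blink Detection using Facial Landmarks
In that paper the authors introduce the eye aspect ratio (EAR). By computing the EAR we can tell whether an eye is open or closed, and from that detect blinks.
I started from someone else's translation of the article "OpenCV/Python/dlib blink detection", made small changes to the code, and got a basic detector working (Program 1). After that, I take the EAR values from several consecutive frames, assemble them into a feature vector, and feed it to an SVM for classification (Program 2).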

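Eye aspect ratio (EAR)

Before discussing the EAR, let's look at the 68 facial landmarks:

Facial landmark detection is a complicated algorithm in its own right; dlib provides an implementation. For the code, see my earlier post: python dlib Learning (2): Facial Landmark Detection. If you want to dig deeper, see this paper.

From the figure we can see that the left eye and the right eye each correspond to 6 landmarks. Everything below is based on these 6 points.

The paper defines the EAR with the following figure:

The six points p1, p2, p3, p4, p5, p6 in the figure above are the six facial landmarks that belong to one eye.
What we care about is how the coordinates of these points relate to each other when the eye is open versus closed.
As the lines in the figure show, the eye's aspect ratio differs between the open and closed states.
This leads directly to the equation for the EAR: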

$$\mathrm{EAR} = \frac{\lVert p_2 - p_6 \rVert + \lVert p_3 - p_5 \rVert}{2\,\lVert p_1 - p_4 \rVert}$$

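The numerator measures the distances between the eye landmarks in the vertical direction, and the denominator measures the distance in the horizontal direction. There is only one horizontal pair of points but two vertical pairs, so the denominator is multiplied by 2 to give the two directions equal weight.
Now look at the plot in the figure above. The EAR stays roughly constant while the eye is open, fluctuating only within a small range, but it drops sharply when the eye closes. That is the whole principle behind blink detection, and it is very simple. For more detail, please consult the paper.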
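To make the formula concrete, here is a minimal sketch that evaluates the EAR for two sets of landmarks. The coordinates are made up for illustration and do not come from a real detector; the function uses only the standard library (`math.dist`, which assumes Python 3.8 or later), whereas the programs in this post use `scipy.spatial.distance`.

```python
import math


def eye_aspect_ratio(eye):
    """Compute the EAR from six (x, y) eye landmarks ordered p1..p6."""
    # Vertical distances: ||p2 - p6|| and ||p3 - p5||
    a = math.dist(eye[1], eye[5])
    b = math.dist(eye[2], eye[4])
    # Horizontal distance: ||p1 - p4||
    c = math.dist(eye[0], eye[3])
    return (a + b) / (2.0 * c)


# Hypothetical landmark coordinates, invented for this example only.
open_eye = [(0, 0), (2, 2), (4, 2), (6, 0), (4, -2), (2, -2)]
closed_eye = [(0, 0), (2, 0.5), (4, 0.5), (6, 0), (4, -0.5), (2, -0.5)]

ear_open = eye_aspect_ratio(open_eye)
ear_closed = eye_aspect_ratio(closed_eye)
print("open: {:.3f}, closed: {:.3f}".format(ear_open, ear_closed))
```

The horizontal distance is the same in both cases, so the drop in the EAR comes entirely from the two vertical distances shrinking as the eyelid closes.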

Implementation

Program 1: direct threshold test

Import modules

# coding=utf-8
import numpy as np
import cv2
import dlib
from scipy.spatial import distance
import os
from imutils import face_utils

Load the detectors

I keep the model file shape_predictor_68_face_landmarks.dat in a folder named model under the current directory.

pwd = os.getcwd()  # current working directory
model_path = os.path.join(pwd, 'model')  # model folder
shape_detector_path = os.path.join(model_path, 'shape_predictor_68_face_landmarks.dat')  # path of the facial landmark model

detector = dlib.get_frontal_face_detector()  # face detector
predictor = dlib.shape_predictor(shape_detector_path)  # facial landmark detector

Define some parameters

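EYE_AR_THRESH = 0.3  # EAR threshold
EYE_AR_CONSEC_FRAMES = 3  # consecutive below-threshold frames needed to count a blink

# Landmark indices of the eyes (1-based numbering in the figure, converted to 0-based)
RIGHT_EYE_START = 37 - 1
RIGHT_EYE_END = 42 - 1
LEFT_EYE_START = 43 - 1
LEFT_EYE_END = 48 - 1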

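EYE_AR_THRESH is the decision threshold, 0.3 by default. If the EAR is above it, the eye is considered open; if the EAR is below it, the eye is considered closed.
EYE_AR_CONSEC_FRAMES is the number of consecutive frames the EAR must stay below the threshold before a blink is registered. Only when the count of below-threshold frames reaches this value do we treat the eye as having closed, that is, a blink; shorter runs are treated as false detections.
RIGHT_EYE_START, RIGHT_EYE_END, LEFT_EYE_START, LEFT_EYE_END: these are the indices of the eye landmarks among the facial landmarks. Python sequences are indexed from 0 while the landmarks in the figure are numbered from 1, so we subtract 1 to stay consistent.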
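The counting rule described above can be isolated from the camera code and tried on a plain list of numbers. The sketch below is only an illustration: the function name `count_blinks` and the sample EAR sequence are made up, while the threshold logic mirrors what the video loop in this post does.

```python
def count_blinks(ear_values, thresh=0.3, consec_frames=3):
    """Count blinks in a sequence of per-frame EAR values."""
    frame_counter = 0  # length of the current run of below-threshold frames
    blink_counter = 0
    for ear in ear_values:
        if ear < thresh:
            frame_counter += 1
        else:
            # The eye has reopened: count a blink only if it stayed
            # closed for enough consecutive frames.
            if frame_counter >= consec_frames:
                blink_counter += 1
            frame_counter = 0
    return blink_counter


# Hypothetical EAR trace: one 3-frame closure, then a 1-frame dip.
ears = [0.35, 0.34, 0.20, 0.18, 0.19, 0.33, 0.21, 0.34, 0.35]
blinks = count_blinks(ears)
print(blinks)
```

Note that with this rule a blink is counted only when the eye reopens, so a trace that ends while the eye is still closed contributes nothing.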

Process the video stream

def eye_aspect_ratio(eye):
    # Compute the EAR from the six landmarks of one eye (same function as in Program 2)
    A = distance.euclidean(eye[1], eye[5])
    B = distance.euclidean(eye[2], eye[4])
    C = distance.euclidean(eye[0], eye[3])
    return (A + B) / (2.0 * C)

frame_counter = 0  # consecutive-frame counter
blink_counter = 0  # blink counter
cap = cv2.VideoCapture(1)  # camera index; use 0 for the default camera
while True:
    ret, img = cap.read()  # read one frame from the video stream
    if not ret:  # stop if no frame could be read
        break
    gray = cv2.cvtColor(img, cv2.COLOR_BGR2GRAY)  # convert to grayscale
    rects = detector(gray, 0)  # face detection
    for rect in rects:  # iterate over every detected face
        print('-' * 20)
        shape = predictor(gray, rect)  # detect landmarks
        points = face_utils.shape_to_np(shape)  # convert the facial landmark (x, y)-coordinates to a NumPy array
        leftEye = points[LEFT_EYE_START:LEFT_EYE_END + 1]  # landmarks of the left eye
        rightEye = points[RIGHT_EYE_START:RIGHT_EYE_END + 1]  # landmarks of the right eye
        leftEAR = eye_aspect_ratio(leftEye)  # EAR of the left eye
        rightEAR = eye_aspect_ratio(rightEye)  # EAR of the right eye
        print('leftEAR = {0}'.format(leftEAR))
        print('rightEAR = {0}'.format(rightEAR))

        ear = (leftEAR + rightEAR) / 2.0  # average EAR of both eyes

        leftEyeHull = cv2.convexHull(leftEye)  # contour of the left eye
        rightEyeHull = cv2.convexHull(rightEye)  # contour of the right eye
        cv2.drawContours(img, [leftEyeHull], -1, (0, 255, 0), 1)  # draw the left eye contour
        cv2.drawContours(img, [rightEyeHull], -1, (0, 255, 0), 1)  # draw the right eye contour

        # While the EAR is below the threshold, count consecutive frames; a blink
        # is counted only if the run reached EYE_AR_CONSEC_FRAMES when the eye reopens
        if ear < EYE_AR_THRESH:
            frame_counter += 1
        else:
            if frame_counter >= EYE_AR_CONSEC_FRAMES:
                blink_counter += 1
            frame_counter = 0

        # Show the blink count and the EAR on the image
        cv2.putText(img, "Blinks:{0}".format(blink_counter), (10, 30), cv2.FONT_HERSHEY_SIMPLEX, 0.7, (0, 0, 255), 2)
        cv2.putText(img, "EAR:{:.2f}".format(ear), (300, 30), cv2.FONT_HERSHEY_SIMPLEX, 0.7, (0, 0, 255), 2)

    cv2.imshow("Frame", img)
    if cv2.waitKey(1) & 0xFF == ord("q"):
        break

cap.release()
cv2.destroyAllWindows()

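Results

Note: the camera I tested with only runs at 30 fps. At a normal blinking speed the blink is too fast for the camera to capture reliably, let alone detect. So if your camera's frame rate is low, blink more slowly while testing and the results will be better. Also, try not to wear glasses: reflections from the lenses can make the eye contour detection go wrong, and the computed EAR will then be wrong as well.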

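Program 2: classifying feature vectors with an SVM

This part takes a bit more work: we have to collect our own data and train an SVM model before the program can load the model and use it.

Collecting data

Import packages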

# coding=utf-8
import numpy as np
import os
import dlib
import cv2
from scipy.spatial import distance
from imutils import face_utils
import pickle

Queue (feature vector)

VECTOR_SIZE = 3

def queue_in(queue, data):
    # Append data to the queue; if the queue is already full, drop the oldest item first
    ret = None
    if len(queue) >= VECTOR_SIZE:
        ret = queue.pop(0)
    queue.append(data)
    return ret, queue

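VECTOR_SIZE is the dimension of the feature vector; I use 3 by default. Make sure VECTOR_SIZE in the data-collection program matches the value used in the other programs. The feature vector is kept in a simple queue: once it already holds VECTOR_SIZE values, the oldest one is dropped before the newest is appended.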
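As an aside, the same sliding window can be had from the standard library. This is an alternative sketch, not what the programs in this post use: `collections.deque` with `maxlen` discards the oldest element automatically when a new one is appended. The EAR values below are invented for illustration.

```python
from collections import deque

VECTOR_SIZE = 3

# A deque with maxlen behaves like the queue_in() helper:
# appending to a full window silently drops the oldest value.
window = deque(maxlen=VECTOR_SIZE)
vectors = []
for ear in [0.31, 0.30, 0.22, 0.18, 0.29]:  # hypothetical per-frame EAR values
    window.append(ear)
    if len(window) == VECTOR_SIZE:
        vectors.append(list(window))  # snapshot of the current feature vector

print(vectors)
```

Five frames with a window of three yield three feature vectors, each shifted by one frame from the previous one.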

Preparation before collecting data

All of this is the same as in Program 1, so I won't go over it again.

def eye_aspect_ratio(eye):
    # eye holds the six landmarks p1..p6 of one eye
    A = distance.euclidean(eye[1], eye[5])  # ||p2 - p6||
    B = distance.euclidean(eye[2], eye[4])  # ||p3 - p5||
    C = distance.euclidean(eye[0], eye[3])  # ||p1 - p4||
    ear = (A + B) / (2.0 * C)
    return ear

pwd = os.getcwd()
model_path = os.path.join(pwd, 'model')
shape_detector_path = os.path.join(model_path, 'shape_predictor_68_face_landmarks.dat')

detector = dlib.get_frontal_face_detector()
predictor = dlib.shape_predictor(shape_detector_path)

cv2.namedWindow("frame", cv2.WINDOW_AUTOSIZE)

cap = cv2.VideoCapture(0)

# Landmark indices of the eyes
RIGHT_EYE_START = 37 - 1
RIGHT_EYE_END = 42 - 1
LEFT_EYE_START = 43 - 1
LEFT_EYE_END = 48 - 1

Collecting samples with eyes open

print('Prepare to collect images with your eyes open')
print('Press s to start collecting images.')
print('Press e to end collecting images.')
print('Press q to quit')
flag = 0
txt = open('train_open.txt', 'w')  # text mode: str cannot be written to a file opened with 'wb' in Python 3
data_counter = 0
ear_vector = []
while True:
    ret, frame = cap.read()
    if not ret:  # stop if no frame could be read
        break
    key = cv2.waitKey(1)
    if key & 0xFF == ord("s"):
        print('Start collecting images.')
        flag = 1
    elif key & 0xFF == ord("e"):
        print('Stop collecting images.')
        flag = 0
    elif key & 0xFF == ord("q"):
        print('quit')
        break

    if flag == 1:
        gray = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)
        rects = detector(gray, 0)
        for rect in rects:
            shape = predictor(gray, rect)
            points = face_utils.shape_to_np(shape)  # convert the facial landmark (x, y)-coordinates to a NumPy array
            leftEye = points[LEFT_EYE_START:LEFT_EYE_END + 1]
            rightEye = points[RIGHT_EYE_START:RIGHT_EYE_END + 1]
            leftEAR = eye_aspect_ratio(leftEye)
            rightEAR = eye_aspect_ratio(rightEye)

            ear = (leftEAR + rightEAR) / 2.0

            leftEyeHull = cv2.convexHull(leftEye)
            rightEyeHull = cv2.convexHull(rightEye)
            cv2.drawContours(frame, [leftEyeHull], -1, (0, 255, 0), 1)
            cv2.drawContours(frame, [rightEyeHull], -1, (0, 255, 0), 1)

            _, ear_vector = queue_in(ear_vector, ear)
            if len(ear_vector) == VECTOR_SIZE:
                # Write plain floats so each line stays a parseable list of numbers
                txt.write(str([float(v) for v in ear_vector]))
                txt.write('\n')
                data_counter += 1
                print(data_counter)

            cv2.putText(frame, "EAR:{:.2f}".format(ear), (10, 30), cv2.FONT_HERSHEY_SIMPLEX, 0.7, (0, 0, 255), 2)

    cv2.imshow("frame", frame)
txt.close()

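This part is also very similar to Program 1: read the video stream, detect the facial landmarks, and compute the EAR. It is controlled from the keyboard: press q to quit; press s to start collecting, keeping your eyes open while it runs, and the feature vectors built from the computed EAR values are written to train_open.txt; press e to stop collecting, which is useful when you need to close your eyes and rest.
Collecting the data is honestly a bit awkward, because keeping your eyes open in front of the camera for long is tiring. I ended up alternating between s and e to collect the samples in short bursts.

Collecting samples with eyes closed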

print('-' * 40)
print('Prepare to collect images with your eyes closed')
print('Press s to start collecting images.')
print('Press e to end collecting images.')
print('Press q to quit')
flag = 0
txt = open('train_close.txt', 'w')  # text mode: str cannot be written to a file opened with 'wb' in Python 3
data_counter = 0
ear_vector = []
while True:
    ret, frame = cap.read()
    if not ret:  # stop if no frame could be read
        break
    key = cv2.waitKey(1)
    if key & 0xFF == ord("s"):
        print('Start collecting images.')
        flag = 1
    elif key & 0xFF == ord("e"):
        print('Stop collecting images.')
        flag = 0
    elif key & 0xFF == ord("q"):
        print('quit')
        break

    if flag == 1:
        gray = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)
        rects = detector(gray, 0)
        for rect in rects:
            shape = predictor(gray, rect)
            points = face_utils.shape_to_np(shape)  # convert the facial landmark (x, y)-coordinates to a NumPy array
            leftEye = points[LEFT_EYE_START:LEFT_EYE_END + 1]
            rightEye = points[RIGHT_EYE_START:RIGHT_EYE_END + 1]
            leftEAR = eye_aspect_ratio(leftEye)
            rightEAR = eye_aspect_ratio(rightEye)

            ear = (leftEAR + rightEAR) / 2.0

            leftEyeHull = cv2.convexHull(leftEye)
            rightEyeHull = cv2.convexHull(rightEye)
            cv2.drawContours(frame, [leftEyeHull], -1, (0, 255, 0), 1)
            cv2.drawContours(frame, [rightEyeHull], -1, (0, 255, 0), 1)

            _, ear_vector = queue_in(ear_vector, ear)
            if len(ear_vector) == VECTOR_SIZE:
                # Write plain floats so each line stays a parseable list of numbers
                txt.write(str([float(v) for v in ear_vector]))
                txt.write('\n')
                data_counter += 1
                print(data_counter)

            cv2.putText(frame, "EAR:{:.2f}".format(ear), (10, 30), cv2.FONT_HERSHEY_SIMPLEX, 0.7, (0, 0, 255), 2)

    cv2.imshow("frame", frame)
txt.close()

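The structure is identical to the program that collects the eyes-open samples; the only difference is the data being collected. The keyboard controls are the same: press q to quit; press s to start collecting, which computes the EAR and saves the feature vectors to train_close.txt, and make sure your eyes are already closed before you start; press e to stop. Alternate between s and e to control the collection.

Release the camera and close the windows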

cap.release()
cv2.destroyAllWindows()

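Results

When collection finishes there are two text files: train_close.txt and train_open.txt. They hold the EAR feature vectors recorded with the eyes closed and with the eyes open respectively, that is, the two classes of samples for the SVM. In the training code below, eyes-open vectors get label 0 and eyes-closed vectors get label 1.

The dimension of the data is the VECTOR_SIZE chosen earlier and can be changed.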

Training the SVM

Import packages

import numpy as np
from sklearn import svm
import joblib  # sklearn.externals.joblib was removed in scikit-learn 0.23; use the standalone joblib package

Parsing the data

train = []
labels = []

# Open the two files written by the data-collection program
train_open_txt = open('train_open.txt', 'r')
train_close_txt = open('train_close.txt', 'r')

print('Reading train_open.txt...')
for txt_str in train_open_txt.readlines():
    temp = []
    # Strip the brackets of the list literal, then split on commas
    datas = txt_str.strip()
    datas = datas.replace('[', '')
    datas = datas.replace(']', '')
    datas = datas.split(',')
    print(datas)
    for data in datas:
        temp.append(float(data))
    train.append(temp)
    labels.append(0)  # label 0: eyes open

print('Reading train_close.txt...')
for txt_str in train_close_txt.readlines():
    temp = []
    datas = txt_str.strip()
    datas = datas.replace('[', '')
    datas = datas.replace(']', '')
    datas = datas.split(',')
    print(datas)
    for data in datas:
        temp.append(float(data))
    train.append(temp)
    labels.append(1)  # label 1: eyes closed

for i in range(len(labels)):
    print("{0} --> {1}".format(train[i], labels[i]))

train_close_txt.close()
train_open_txt.close()

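We parse the data from the two txt files, extract the feature vectors into the list train, and put the matching labels into the list labels. The program is simple, so I did not comment it in detail.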
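Since each line of the txt files is the text of a Python list, the bracket stripping can also be replaced by `ast.literal_eval` from the standard library. The following is a hedged alternative sketch rather than the code used in this post; `parse_vectors` is a made-up helper name, and the sample lines stand in for the contents of train_open.txt and train_close.txt.

```python
import ast


def parse_vectors(lines, label):
    """Turn lines such as '[0.34, 0.33, 0.31]' into feature vectors plus labels."""
    features, labels = [], []
    for line in lines:
        line = line.strip()
        if not line:  # skip blank lines
            continue
        # literal_eval safely evaluates the list literal without running code
        vector = [float(v) for v in ast.literal_eval(line)]
        features.append(vector)
        labels.append(label)
    return features, labels


# Hypothetical file contents, made up for illustration.
open_lines = ["[0.34, 0.33, 0.31]\n", "[0.33, 0.31, 0.32]\n"]
close_lines = ["[0.19, 0.18, 0.18]\n", "\n"]

train_open, labels_open = parse_vectors(open_lines, 0)     # 0: eyes open
train_close, labels_close = parse_vectors(close_lines, 1)  # 1: eyes closed
train = train_open + train_close
labels = labels_open + labels_close
print(train, labels)
```

With real files, the line lists would come from something like `open('train_open.txt').readlines()`.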

Train and save the model

# gamma is ignored by the linear kernel; it only matters for kernels such as 'rbf'
clf = svm.SVC(C=0.8, kernel='linear', gamma=20, decision_function_shape='ovo')
clf.fit(train, labels)
joblib.dump(clf, "ear_svm.m")

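Here we use the svm module from scikit-learn. Implementing an SVM yourself is fairly involved, so for simplicity we just call the ready-made API.
A brief note on the parameters:
C=0.8 is the soft-margin penalty: a smaller C tolerates more misclassified training samples in exchange for a wider margin;
kernel='linear' selects a linear kernel; with kernel='rbf' (the default) a Gaussian kernel is used, where a smaller gamma gives a smoother decision boundary and a larger gamma gives a more "scattered" one. gamma has no effect with the linear kernel used here;
decision_function_shape='ovr' means one-vs-rest, separating each class from all the others; decision_function_shape='ovo' means one-vs-one, separating each pair of classes. Both are multi-class strategies, and the setting is ignored for a two-class problem like ours;

Using the joblib module, we save the model file to the current folder.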

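Testing accuracy

Normally we would prepare a separate test set to evaluate the model. To keep the post from getting too long I left that part out, and instead just feed in a couple of samples and look at the output.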

print('predicting [[0.34, 0.34, 0.31]]')
res = clf.predict([[0.34, 0.34, 0.31]])
print(res)

print('predicting [[0.19, 0.18, 0.18]]')
res = clf.predict([[0.19, 0.18, 0.18]])
print(res)

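This is not really rigorous. The proper approach is to collect another set of data as a test set, the same way as before, or to hold out part of the data already collected, and then compute the accuracy on it. Because of space I only give the idea here; implementing it is just a matter of adapting the code above.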
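The hold-out idea can be sketched in a few lines. Everything here is an assumption made for illustration: the helper names, the sample data, and above all `mean_threshold_classifier`, which is a stand-in so that the example runs with the standard library alone. It is not the SVM trained in this post.

```python
import random


def train_test_split(features, labels, test_ratio=0.2, seed=0):
    """Shuffle the sample indices and hold out a fraction as the test set."""
    indices = list(range(len(features)))
    random.Random(seed).shuffle(indices)
    n_test = max(1, int(len(indices) * test_ratio))
    test_idx, train_idx = indices[:n_test], indices[n_test:]

    def pick(idx):
        return [features[i] for i in idx], [labels[i] for i in idx]

    return pick(train_idx), pick(test_idx)


def accuracy(predict, features, labels):
    """Fraction of samples for which predict(x) equals the true label."""
    correct = sum(1 for x, y in zip(features, labels) if predict(x) == y)
    return correct / len(labels)


def mean_threshold_classifier(vector, thresh=0.25):
    # Stand-in classifier: 1 (closed) if the mean EAR is below thresh, else 0 (open)
    return 1 if sum(vector) / len(vector) < thresh else 0


# Hypothetical samples: label 0 = eyes open, label 1 = eyes closed.
features = [[0.34, 0.34, 0.31], [0.33, 0.32, 0.30], [0.31, 0.33, 0.32],
            [0.19, 0.18, 0.18], [0.20, 0.17, 0.19], [0.16, 0.18, 0.21]]
labels = [0, 0, 0, 1, 1, 1]

(train_x, train_y), (test_x, test_y) = train_test_split(features, labels, test_ratio=0.5)
acc = accuracy(mean_threshold_classifier, test_x, test_y)
print("test accuracy: {:.2f}".format(acc))
```

With a trained scikit-learn model, one would fit on `train_x` and `train_y` and pass something like `lambda x: clf.predict([x])[0]` as `predict`.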

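Real-time detection

This code is very similar to Program 1; the only difference is that the detection step uses the SVM. So I'll just paste the code below.
One thing to watch: VECTOR_SIZE must be the same in all of the programs, because it is the dimension of your feature vector.

Implementation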

# coding=utf-8
import numpy as np
import cv2
import dlib
from scipy.spatial import distance
import os
from imutils import face_utils
from sklearn import svm
import joblib  # sklearn.externals.joblib was removed in scikit-learn 0.23

VECTOR_SIZE = 3

def queue_in(queue, data):
    # Append data to the queue; if the queue is already full, drop the oldest item first
    ret = None
    if len(queue) >= VECTOR_SIZE:
        ret = queue.pop(0)
    queue.append(data)
    return ret, queue

def eye_aspect_ratio(eye):
    A = distance.euclidean(eye[1], eye[5])
    B = distance.euclidean(eye[2], eye[4])
    C = distance.euclidean(eye[0], eye[3])
    ear = (A + B) / (2.0 * C)
    return ear

pwd = os.getcwd()
model_path = os.path.join(pwd, 'model')
shape_detector_path = os.path.join(model_path, 'shape_predictor_68_face_landmarks.dat')

detector = dlib.get_frontal_face_detector()
predictor = dlib.shape_predictor(shape_detector_path)

# Load the trained model
clf = joblib.load("ear_svm.m")

EYE_AR_THRESH = 0.3  # EAR threshold (not used by the SVM path)
EYE_AR_CONSEC_FRAMES = 3  # consecutive "closed" predictions needed to count a blink

# Landmark indices of the eyes
RIGHT_EYE_START = 37 - 1
RIGHT_EYE_END = 42 - 1
LEFT_EYE_START = 43 - 1
LEFT_EYE_END = 48 - 1

frame_counter = 0
blink_counter = 0
ear_vector = []
cap = cv2.VideoCapture(1)  # camera index; use 0 for the default camera
while True:
    ret, img = cap.read()
    if not ret:  # stop if no frame could be read
        break
    gray = cv2.cvtColor(img, cv2.COLOR_BGR2GRAY)
    rects = detector(gray, 0)
    for rect in rects:
        print('-' * 20)
        shape = predictor(gray, rect)
        points = face_utils.shape_to_np(shape)  # convert the facial landmark (x, y)-coordinates to a NumPy array
        leftEye = points[LEFT_EYE_START:LEFT_EYE_END + 1]
        rightEye = points[RIGHT_EYE_START:RIGHT_EYE_END + 1]
        leftEAR = eye_aspect_ratio(leftEye)
        rightEAR = eye_aspect_ratio(rightEye)
        print('leftEAR = {0}'.format(leftEAR))
        print('rightEAR = {0}'.format(rightEAR))

        ear = (leftEAR + rightEAR) / 2.0

        leftEyeHull = cv2.convexHull(leftEye)
        rightEyeHull = cv2.convexHull(rightEye)
        cv2.drawContours(img, [leftEyeHull], -1, (0, 255, 0), 1)
        cv2.drawContours(img, [rightEyeHull], -1, (0, 255, 0), 1)

        _, ear_vector = queue_in(ear_vector, ear)
        if len(ear_vector) == VECTOR_SIZE:
            print(ear_vector)
            input_vector = [ear_vector]  # predict() expects a 2-D array of samples
            res = clf.predict(input_vector)
            print(res)

            if res[0] == 1:  # label 1: eyes closed
                frame_counter += 1
            else:
                if frame_counter >= EYE_AR_CONSEC_FRAMES:
                    blink_counter += 1
                frame_counter = 0

        cv2.putText(img, "Blinks:{0}".format(blink_counter), (10, 30), cv2.FONT_HERSHEY_SIMPLEX, 0.7, (0, 0, 255), 2)
        cv2.putText(img, "EAR:{:.2f}".format(ear), (300, 30), cv2.FONT_HERSHEY_SIMPLEX, 0.7, (0, 0, 255), 2)

    cv2.imshow("Frame", img)
    if cv2.waitKey(1) & 0xFF == ord("q"):
        break

cap.release()
cv2.destroyAllWindows()

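Results

Note: the authors of the paper say that using a 13-dimensional feature vector of EAR values (from frame N-6 to frame N+6) and feeding it to a linear SVM gives better results. I tried the 13-dimensional version and found the delay too large: after you blink, it takes a moment before the blink is detected. The EAR is computed frame by frame and only then pushed into the feature vector, so with a low camera frame rate a lower-dimensional feature vector is probably still the better choice. That is why the program uses a 3-dimensional feature vector.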
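A rough back-of-the-envelope calculation illustrates the delay. It assumes a window centred on frame N, as in the paper's N-6 to N+6 scheme, so the classifier must wait for the frames after N before it can decide about frame N. The function name is made up, and the programs in this post fill their window differently, so treat the numbers as an estimate rather than a measurement.

```python
def lookahead_delay_seconds(vector_size, fps):
    """Delay caused by waiting for the frames after the centre of the window."""
    half_window = (vector_size - 1) // 2  # frames needed after frame N
    return half_window / fps


for size in (3, 13):
    delay = lookahead_delay_seconds(size, 30)
    print("VECTOR_SIZE={:2d} at 30 fps: about {:.0f} ms".format(size, delay * 1000))
```

At 30 fps a 13-dimensional centred window waits for 6 future frames, roughly 200 ms, while a 3-dimensional one waits for a single frame, roughly 33 ms. The gap widens further on a slower camera.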

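Afterword

I originally just read the paper and meant to play around with blink detection. While debugging I ran into quite a few pitfalls and ended up spending a lot of time on it. Choosing the feature vector dimension took the longest: I tried 3, 6, 13 and other dimensions, each time collecting more than 1000 positive and more than 1000 negative samples, and the 3-dimensional version gave the best results. The main reason is probably my camera's low frame rate. In theory a higher dimension should give better accuracy, but at a low frame rate the feature vector updates too slowly and the delay becomes too long.

Full project download: http://download.csdn.net/download/hongbin_xu/10200655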
