Human Emotion and Gesture Detector Using Deep Learning: Part 1
Emotion Gesture Detection
Have you ever looked at someone and tried to work out what emotion they were feeling or what gesture they were making, only to end up confused? Maybe you once approached a baby that looked like this:
You thought it liked you and just wanted a cuddle, so you picked it up, and then this happened!
Source: Brytny.com on Unsplash
Oops! That did not work out as planned. But real-life use cases may not be as simple as the situation above and may require more precise human emotion analysis as well as gesture analysis. This field of application is especially useful in any department where customer satisfaction, or simply knowing what the customer wants, is extremely important.
Today we will be building a couple of deep learning models that do exactly that. The models we develop today can identify some human emotions as well as a few gestures. We will try to identify 6 emotions, namely angry, happy, neutral, fear, sad, and surprise. We will also identify 4 types of gestures: loser, victory, super, and punch. Everything will run in real time, and we will get a real-time vocal response from the model.
The emotions model will be built from scratch using convolutional neural networks, and for finger gestures I will be using transfer learning with the VGG-16 architecture, adding custom layers to improve the model's performance and achieve higher accuracy. The emotion analysis and finger gesture models will provide an appropriate vocal as well as text response for each of the actions. The metric we will use is accuracy, and we will try to achieve a validation accuracy of at least 50% for emotions model-1, over 65% for emotions model-2, and over 90% for the gestures model.
Datasets:
Let us now look at the dataset choices available to us.
1. Kaggle’s fer2013 dataset — This is an open-source dataset containing 35,887 grayscale images of various emotions, all labeled and of size 48x48. The Facial Expression Recognition dataset was published during the International Conference on Machine Learning (ICML). This Kaggle dataset will be the primary dataset used for emotion analysis in this case study.
The dataset is given as a spreadsheet in .csv format from which the pixels have to be extracted. After extracting the pixels and pre-processing the data, the dataset looks like the image posted below:
Source: Image by author. (Refer to this link in case the first link is not working.)
2. The first Affect-in-the-wild challenge — This can be a secondary dataset for this case study. The First Affect-in-the-wild Challenge is built on state-of-the-art deep neural architectures, including AffWildNet, which allow us to exploit the Aff-Wild database for learning features that can be used as priors for achieving the best performance in dimensional and categorical emotion recognition. In the download link we will find a tar.gz file containing 4 folders named videos, annotations, boxes, and landmarks. However, for our emotion recognition model we will strictly consider only the fer2013 dataset.
3. ASL Alphabet dataset — This will be the primary dataset for finger gesture detection. The "American Sign Language" Alphabet dataset consists of images of the letters of the American Sign Language alphabet, separated into 29 folders which represent the various classes. The training set contains 87,000 images of 200x200 pixels.
There are 29 classes: 26 for the letters A-Z and 3 for SPACE, DELETE, and NOTHING. These 3 extra classes are very helpful in real-time applications and classification. However, for our gesture recognition we will use only 4 of the letter classes from this data, each mapped to one of the required finger actions. The model will be trained to recognize 4 specific hand gestures: A (Punch), F (Super), L (Loser), and V (Victory). We will then train our model to recognize these gestures and give an appropriate vocal response for each of them.
4. Custom Datasets — For both tasks, i.e. emotion analysis and finger gesture detection, we can also use custom datasets of ourselves, friends, or even family for recognizing various sentiments and hand gestures. The images taken will be grayscaled and then resized according to our requirements.
Pre-processing:
For our emotions model we will be using Kaggle's fer2013 dataset, and for gesture identification we will be using the ASL dataset. We can now begin the pre-processing required for the models. For the emotions dataset, let us first look at the libraries required for pre-processing.
Pandas is a fast, flexible open-source data analysis library that we will be using for accessing the .csv files.
Numpy is used for processing multi-dimensional arrays. For our data pre-processing, we will use numpy to build an array of the pixel features.
The OS module provides us a way to interact with the operating system.
The cv2 module is the computer vision/OpenCV module we will be using to convert the numpy arrays of pixels into visual images.
tqdm is an optional library that we can use for visualizing the processing speed, i.e. the number of iterations per second.
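Putting these together, a minimal import block for the pre-processing step might look like this (a sketch; tqdm is optional):

```python
# Libraries used for pre-processing the fer2013 data.
import os            # interact with the file system (create folders, build paths)
import numpy as np   # build arrays from the pixel strings
import pandas as pd  # read the fer2013.csv file
import cv2           # write the pixel arrays back out as image files
from tqdm import tqdm  # optional progress bar while looping over rows
```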
Now let us read the fer2013.csv file using pandas.
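A short sketch of this step, assuming the file is named fer2013.csv and sits in the working directory:

```python
df = pd.read_csv("fer2013.csv")

print(df.columns.tolist())        # ['emotion', 'pixels', 'Usage']
print(df.shape)                   # (35887, 3)
print(df["Usage"].value_counts()) # Training / PublicTest / PrivateTest counts
```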
We read the fer2013.csv file using pandas. fer2013 is the facial expression recognition .csv file from Kaggle. In the .csv file we have 3 main columns: emotion, pixels, and Usage. The emotion column consists of labels 0–6. The pixels column contains the pixel values of each image in an array-like format. The Usage column contains Training, PublicTest, and PrivateTest. Let us have a closer look at this.
The labels are in the range 0–6, where:
0 = Angry, 1 = Disgust, 2 = Fear, 3 = Happy,
4 = Sad, 5 = Surprise, 6 = Neutral.
The pixels column consists of the pixel values, which we can convert into an array and then, using the OpenCV module cv2, turn into an actual image we can visualize. The Usage column consists of Training, PublicTest, and PrivateTest. We will use the Training rows to build the training dataset, and the PublicTest and PrivateTest rows to store images in a validation folder.
Now let us extract these images accordingly. In the code below I show the process for one class, for both train and validation. We extract the images from the pixels column and create a train and a validation folder, with each image routed according to the Usage column. Inside each of the train and validation directories we create all 7 folders: Angry, Disgust, Fear, Happy, Sad, Surprise, and Neutral.
We loop through the dataset, convert the pixels from strings to floats, and store all the float values in a numpy array. We then reshape the array to 48x48, which is our desired image size. (This step is optional because the given pixels already correspond to 48x48 images.)
If the Usage is given as Training, we create a train directory with separate sub-directories for each of the emotions. We store each image in the correct emotion directory, which is determined by the label in the emotion column.
These steps are repeated similarly for the validation directory, for which we treat the Usage values PublicTest and PrivateTest as validation data. The emotions are categorized by the labels from the emotion column, just as for the train directory.
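A rough sketch of the extraction loop described above; the output folder names ("train", "validation") and the per-image file naming are illustrative assumptions:

```python
label_names = {0: "Angry", 1: "Disgust", 2: "Fear", 3: "Happy",
               4: "Sad", 5: "Surprise", 6: "Neutral"}

for index, row in tqdm(df.iterrows(), total=len(df)):
    # Convert the space-separated pixel string into a 48x48 float array.
    pixels = np.array(row["pixels"].split(), dtype="float32").reshape(48, 48)

    # Training rows go to train/, PublicTest and PrivateTest rows to validation/.
    subset = "train" if row["Usage"] == "Training" else "validation"
    folder = os.path.join(subset, label_names[row["emotion"]])
    os.makedirs(folder, exist_ok=True)

    # Cast back to 8-bit before writing the image to disk.
    cv2.imwrite(os.path.join(folder, f"{index}.jpg"), pixels.astype("uint8"))
```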
After this step, all the data pre-processing for training the emotions model is complete, and we have successfully extracted all the images required for the emotion recognition model, so we can proceed with the further steps. Luckily we don't have to do a lot of pre-processing for the gestures data. Download the ASL dataset and then create the train1 and validation1 folders as below:
The train1 and validation1 directories have 4 sub-directories labeled as shown. We will use the letter 'L' for Loser, 'A' for Punch, 'F' for Super, and 'V' for Victory. Summarizing the letters and gestures below:
L = Loser | A = Punch | F = Super | V = Victory
The ASL dataset contains 3,000 images for each letter, so we will use the first 2,400 images for training and the remaining 600 images for validation. This gives an 80:20 train:validation split. Paste the first 2,400 images of each of the letters 'L', 'A', 'F', and 'V' into their respective sub-directories in the train1 folder, and paste the remaining 600 images into their respective sub-directories in the validation1 folder.
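A small script along these lines can do the split instead of copying by hand; the source folder name asl_alphabet_train is an assumption based on the standard Kaggle download layout:

```python
import os
import shutil

source_root = "asl_alphabet_train"   # extracted Kaggle ASL training folder (assumed name)
letters = ["L", "A", "F", "V"]       # loser, punch, super, victory

for letter in letters:
    files = sorted(os.listdir(os.path.join(source_root, letter)))
    # First 2400 images go to train1, the next 600 to validation1.
    for subset, selection in (("train1", files[:2400]), ("validation1", files[2400:3000])):
        dest = os.path.join(subset, letter)
        os.makedirs(dest, exist_ok=True)
        for name in selection:
            shutil.copy(os.path.join(source_root, letter, name), dest)
```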
EXPLORATORY DATA ANALYSIS (EDA):
Before starting to train our emotion and gesture models, let us look at the images and the overall data we have after the pre-processing step. First we will do EDA on the emotions data, and then on the gestures data. Starting with the emotions data, we will plot a bar graph and a scatter plot to see whether the dataset is balanced, fairly balanced, or totally unbalanced. We will be referring to the train directory.
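The plots below can be produced with a short snippet like the following (a sketch assuming the train directory created during pre-processing and matplotlib for plotting):

```python
import os
import matplotlib.pyplot as plt

train_dir = "train"
classes = sorted(os.listdir(train_dir))
counts = [len(os.listdir(os.path.join(train_dir, c))) for c in classes]

plt.figure(figsize=(10, 4))
plt.subplot(1, 2, 1)
plt.bar(classes, counts)        # bar graph of images per emotion
plt.subplot(1, 2, 2)
plt.scatter(classes, counts)    # scatter plot of the same counts
plt.tight_layout()
plt.show()
```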
Bar Graph:
Scatter Plot:
We can notice that this is a fairly balanced dataset, except that the images for "disgust" are comparatively few. For our first emotions model we will drop this emotion completely and consider only the remaining 6 emotions. Now let us look at how the train and validation directories for our emotions dataset look.
Train:
The Bar Graph and the scatter plot of the train data are as shown below:
Sample training images from each of the classes are shown below:
Validation:
The Bar Graph and the scatter plot of the validation data are as shown below:
Sample validation images from each of the classes are shown below:
With our emotions dataset analyzed, we can move on to the gestures dataset, perform a similar analysis, and understand that dataset as well. Since the gestures data for both train and validation is completely balanced, it is easier to analyze. Both the train and validation data for the gestures dataset will be analyzed in the next part, where similar images will be displayed as well.
This completes our exploratory data analysis for the emotions model. We can now start building our models for emotion recognition. First we will build an emotions model using image data augmentation, and then we will build the gestures model. Later, we will build a second emotions model directly from the .csv file and try to obtain a higher accuracy. In the end, we will create a final model to run the entire script.
Emotion Model-1:
In this model-1 we will be using data augmentation techniques. The formal definition of data augmentation is as follows:
Data augmentation is a strategy that enables practitioners to significantly increase the diversity of data available for training models, without actually collecting new data. Data augmentation techniques such as cropping, padding, and horizontal flipping are commonly used to train large neural networks.
Reference: bair.berkeley.edu
We will now proceed to import the required libraries and specify some parameters which will be needed for training the model.
Import all the important deep learning libraries required to train the emotions model. Keras is an Application Programming Interface (API) that can run on top of TensorFlow, and TensorFlow will be the main deep learning module we use to build our model. The ImageDataGenerator is used for data augmentation, creating transformed copies of the original images so the model sees more variations in each epoch. The layers used for training are as follows:
1. Input = the input layer in which we pass the input shape.
2. Conv2D = the convolutional layer which, combined with the input, produces an output of tensors.
3. MaxPool2D = downsamples the data coming from the convolutional layer.
4. Batch Normalization = a technique for training very deep neural networks that standardizes the inputs to a layer for each mini-batch. This has the effect of stabilizing the learning process and dramatically reducing the number of training epochs required to train deep networks.
5. Dropout = a technique where randomly selected neurons are ignored during training. They are "dropped out" at random, which helps prevent over-fitting.
6. Dense = fully connected layers.
7. Flatten = flattens the entire structure into a 1-D array.
The models can be built in a functional (Model-like) structure or in a sequential manner. We use l2 regularization for fine-tuning. The optimizer used will be Adam, as it performs better than the other optimizers on this model. Numpy is used for numerical array operations, pydot_ng and Graphviz are used for plotting the model, and we also import the os module to make the code compatible with the Windows environment.
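A sketch of the corresponding imports, assuming a TensorFlow 2.x / tf.keras setup (pydot_ng and Graphviz are only needed if you want to plot the model):

```python
import os
import numpy as np
import tensorflow as tf
from tensorflow.keras.preprocessing.image import ImageDataGenerator
from tensorflow.keras.layers import (Input, Conv2D, MaxPooling2D, BatchNormalization,
                                     Dropout, Dense, Flatten)
from tensorflow.keras.models import Sequential
from tensorflow.keras.regularizers import l2
from tensorflow.keras.optimizers import Adam
```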
num_classes defines the number of classes we have to predict, namely Angry, Fear, Happy, Neutral, Sad, and Surprise. From the exploratory data analysis we know the dimensions of the images: image height = 48 pixels, image width = 48 pixels, and number of channels = 1, because the images are grayscale. We will use a batch size of 32 for training with image augmentation.
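In code, these parameters might look like the following sketch (the variable names are illustrative):

```python
num_classes = 6               # Angry, Fear, Happy, Neutral, Sad, Surprise (Disgust dropped)
img_rows, img_cols = 48, 48   # height and width of the grayscale images
num_channels = 1              # a single channel, since the images are grayscale
batch_size = 32
```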
Specify the train and the validation directory for the stored images. train_dir is the directory that contains the set of training images, and validation_dir is the directory that contains the set of validation images.
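For example, assuming the directory names created during pre-processing:

```python
train_dir = "train"            # folders with the extracted training images
validation_dir = "validation"  # folders with the PublicTest/PrivateTest images
```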
DATA AUGMENTATION:
We will look at the data augmentation code now:
The ImageDataGenerator is used for data augmentation of images. We will be generating transformed copies of the original images; the Keras data generator feeds the model these copies rather than the originals, which is useful for training at each epoch. We rescale the images and set the parameters to suit our model. The parameters are as follows:
1. rescale = rescaling by 1./255 to normalize each of the pixel values.
2. rotation_range = specifies the random range of rotation.
3. shear_range = specifies the shear intensity (counter-clockwise direction).
4. zoom_range = specifies the zoom range.
5. width_shift_range = specifies the range of horizontal shifts.
6. height_shift_range = specifies the range of vertical shifts.
7. horizontal_flip = flips the images horizontally.
8. fill_mode = fills new pixels according to the closest boundaries.
train_datagen.flow_from_directory takes the path to a directory and generates batches of augmented data. Its arguments are as follows:
1. train_dir = specifies the directory where we have stored the image data.
2. color_mode = specifies how our images are read, i.e. grayscale or RGB format; the default is RGB.
3. target_size = the dimensions of the image.
4. batch_size = the number of images per batch for the flow operation.
5. class_mode = determines the type of label arrays that are returned; "categorical" gives 2D one-hot encoded labels.
6. shuffle = whether to shuffle the data (default: True); if set to False, the data is sorted in alphanumeric order.
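A sketch of this augmentation setup; the specific rotation, shear, zoom, and shift values are illustrative choices rather than fixed requirements:

```python
train_datagen = ImageDataGenerator(
    rescale=1. / 255,        # normalize pixel values to [0, 1]
    rotation_range=30,       # random rotations
    shear_range=0.3,         # shear intensity
    zoom_range=0.3,          # random zoom
    width_shift_range=0.4,   # horizontal shifts
    height_shift_range=0.4,  # vertical shifts
    horizontal_flip=True,    # random horizontal flips
    fill_mode="nearest")     # fill new pixels from the nearest boundary

validation_datagen = ImageDataGenerator(rescale=1. / 255)

train_generator = train_datagen.flow_from_directory(
    train_dir,
    color_mode="grayscale",
    target_size=(img_rows, img_cols),
    batch_size=batch_size,
    class_mode="categorical",
    shuffle=True)

validation_generator = validation_datagen.flow_from_directory(
    validation_dir,
    color_mode="grayscale",
    target_size=(img_rows, img_cols),
    batch_size=batch_size,
    class_mode="categorical",
    shuffle=True)
```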
EMOTIONS MODEL-1:
Now we will proceed towards building the model.
We will be using a sequential architecture for our model. Our Sequential model will have a total of 5 blocks: three convolutional blocks, one fully connected block, and one output layer. The 3 convolutional blocks use filters of increasing size, namely 32, 64, and 128. The kernel_size is (3,3) and the kernel_initializer is he_normal. We can also use a kernel_regularizer with l2 normalization. Our preferred activation is elu because it usually performs better on images. The input shape is the same as the size of each of our train and validation images. The Batch Normalization layer improves the speed, performance, and stability of the network; max pooling is used to downsample the data; and the Dropout layer is used to prevent over-fitting. The fully connected block consists of a Dense layer of 64 units and a batch normalization followed by a dropout layer; before passing through the Dense layer, the data is flattened to match the dimensions. Finally, the output layer consists of a Dense layer with a softmax activation that gives a probability for each of the num_classes predictions.
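A condensed sketch of this architecture; the dropout rates and the l2 factor are illustrative assumptions, while the filter counts, kernel size, initializer, and activation follow the description above:

```python
model = Sequential()

# Three convolutional blocks with 32, 64 and 128 filters respectively.
for block, filters in enumerate([32, 64, 128]):
    conv_kwargs = dict(padding="same", activation="elu",
                       kernel_initializer="he_normal", kernel_regularizer=l2(0.01))
    if block == 0:
        # The first layer needs the input shape: 48x48 grayscale images.
        model.add(Conv2D(filters, (3, 3), input_shape=(img_rows, img_cols, 1), **conv_kwargs))
    else:
        model.add(Conv2D(filters, (3, 3), **conv_kwargs))
    model.add(BatchNormalization())
    model.add(MaxPooling2D(pool_size=(2, 2)))   # downsample
    model.add(Dropout(0.25))                    # reduce over-fitting

# Fully connected block.
model.add(Flatten())
model.add(Dense(64, activation="elu", kernel_initializer="he_normal"))
model.add(BatchNormalization())
model.add(Dropout(0.5))

# Output layer: one probability per emotion class.
model.add(Dense(num_classes, activation="softmax"))

model.summary()
```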
Model Plot:
This is what the overall model we built looks like:
Callbacks:
We will be importing the 3 required callbacks for training our model. The 3 important callbacks are ModelCheckpoint, ReduceLROnPlateau, and Tensorboard. Let us look at what task each of these individual callbacks performs.
ModelCheckpoint — This callback is used for saving the weights of our model during training. We save only the best weights of our model by specifying save_best_only=True, and we monitor training using the accuracy metric.
ReduceLROnPlateau — This callback is used for reducing the learning rate of the optimizer when the monitored metric stops improving for a specified number of epochs. Here we have specified the patience as 10: if the accuracy does not improve after 10 epochs, the learning rate is reduced by a factor of 0.2. The metric used for monitoring here is accuracy as well.
Tensorboard — The TensorBoard callback is used for visualizing the training graphs, namely the plots of accuracy and loss.
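A sketch of the three callbacks as described; the log directory is an illustrative choice:

```python
from tensorflow.keras.callbacks import ModelCheckpoint, ReduceLROnPlateau, TensorBoard

# Save only the best weights, monitoring accuracy as described above.
checkpoint = ModelCheckpoint("emotions.h5", monitor="accuracy",
                             save_best_only=True, verbose=1)

# If accuracy does not improve for 10 epochs, reduce the learning rate by a factor of 0.2.
reduce_lr = ReduceLROnPlateau(monitor="accuracy", factor=0.2,
                              patience=10, verbose=1)

# Log accuracy and loss curves for visualization in TensorBoard.
tensorboard = TensorBoard(log_dir="./logs")

callbacks = [checkpoint, reduce_lr, tensorboard]
```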
Compile and fit the model:
We compile and fit our model in the final step. Here we train the model and save the best weights to emotions.h5, so that we don't have to re-train the model repeatedly and can load the saved model when required. We train on the training data and validate on the validation data. The loss we use is categorical_crossentropy, which computes the cross-entropy loss between the labels and the predictions. The optimizer is Adam with a learning rate of 0.001, and we compile the model with accuracy as the metric. We fit the model on the augmented training and validation images. After the fitting step, these are the results we were able to achieve for train and validation loss and accuracy.
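A sketch of the compile-and-fit step; the epoch count is an illustrative choice:

```python
model.compile(loss="categorical_crossentropy",
              optimizer=Adam(learning_rate=0.001),
              metrics=["accuracy"])

history = model.fit(train_generator,
                    epochs=50,                              # illustrative; adjust to your budget
                    validation_data=validation_generator,
                    callbacks=callbacks)
```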
Graph:
Observation:
The model performs quite well. We can notice that the train and validation losses decrease steadily while the train and validation accuracies increase steadily. There is no over-fitting in the deep learning model, and we are able to achieve an accuracy of about 51% and a validation accuracy of about 53%.
That is it for the first part, guys! I hope all of you enjoyed reading this as much as I enjoyed writing it. In the next part, we will cover the gestures training model and then look into a second emotions training model which we can use to achieve a higher accuracy. In the end, we will create a final pipeline to access the models in real time and get a vocal response from the model about the particular emotion or gesture. I will also be posting the GitHub repository with the entire code, scripts, and building blocks. Stay tuned for the next part and have a wonderful day!
Translated from: https://towardsdatascience.com/human-emotion-and-gesture-detector-using-deep-learning-part-1-d0023008d0eb