Learn AI Today 03: Potato Classification Using Convolutional Neural Networks
LEARN AI TODAY
This is the 3rd story in the Learn AI Today series! These stories, or at least the first few, are based on a series of Jupyter notebooks I’ve created while studying/learning PyTorch and Deep Learning. I hope you find them as useful as I did!
If you have not already, make sure to check the previous story!
What you will learn in this story:
- Potatoes Are Not All the Same
- Using Kaggle Datasets
- How Convolutional Neural Networks Work
- Using fastai2 to Make Your Life Easier
1. Kaggle Datasets
The Kaggle Datasets page is a good place to start if you want to find a public dataset. There are almost 50 thousand datasets on Kaggle, a number that grows every day as users create and upload new datasets to share with the world.
After having the idea of creating a potato classifier for this lesson, I quickly found this dataset, which contains 4 classes of potatoes as well as a lot of other fruits and vegetables.
Image samples from the fruits 360 dataset.

2. Convolutional Neural Networks (CNNs)
Convolutional Neural Networks are the building blocks of computer vision. These networks usually combine several layers of kernel convolution operations and downscaling.
The animation below is a great visualization of the kernel convolution operations. The kernel, which is a small matrix, usually 3x3, moves over the entire image. Instead of calling it an image let’s refer to it as the input feature map to be more general.
Convolution example from the Theano documentation.

At each step, the values of the 3x3 kernel matrix are multiplied elementwise by the corresponding values of the input feature map (the blue matrix in the animation above), and the sum of those 9 products is the value of the output, resulting in the green matrix in the animation. The numbers in the kernel are parameters of the model to be learned. That way the model can learn to identify the spatial patterns that are the basis of computer vision. By having multiple layers and gradually downscaling the images, the patterns learned by each convolutional layer become more and more complex. To get a deeper intuition of CNNs I recommend this story by Irhum Shafkat.
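To make the multiply-and-sum operation concrete, here is a minimal pure-Python sketch of a "valid" convolution (stride 1, no padding); the 5x5 input feature map and 3x3 kernel values below are made up for illustration:

```python
def conv2d_valid(image, kernel):
    """Slide a k x k kernel over the input feature map (stride 1, no padding):
    at each position, multiply elementwise and sum the k*k products."""
    k = len(kernel)
    out_h = len(image) - k + 1
    out_w = len(image[0]) - k + 1
    return [[sum(image[i + di][j + dj] * kernel[di][dj]
                 for di in range(k) for dj in range(k))
             for j in range(out_w)]
            for i in range(out_h)]

# Toy input feature map (the "blue matrix") and toy kernel values
image = [[3, 3, 2, 1, 0],
         [0, 0, 1, 3, 1],
         [3, 1, 2, 2, 3],
         [2, 0, 0, 2, 2],
         [2, 0, 0, 0, 1]]
kernel = [[0, 1, 2],
          [2, 2, 0],
          [0, 1, 2]]

print(conv2d_valid(image, kernel))  # → [[12, 12, 17], [10, 17, 19], [9, 6, 14]]
```

In a real network the kernel values are the learnable parameters, and libraries like PyTorch implement the same operation across many channels with optimized code.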
The idea of CNNs has been around since the 80s, but it started to gain momentum in 2012 when the winners of the ImageNet competition used such an approach and 'crushed' the competition. Their paper describing the solution has the following abstract:
“We trained a large, deep convolutional neural network to classify the 1.2 million high-resolution images in the ImageNet LSVRC-2010 contest into the 1000 different classes. On the test data, we achieved top-1 and top-5 error rates of 37.5% and 17.0% which is considerably better than the previous state-of-the-art. The neural network, which has 60 million parameters and 650,000 neurons, consists of five convolutional layers, some of which are followed by max-pooling layers, and three fully-connected layers with a final 1000-way softmax. To make training faster, we used non-saturating neurons and a very efficient GPU implementation of the convolution operation. To reduce overfitting in the fully-connected layers we employed a recently-developed regularization method called “dropout” that proved to be very effective. We also entered a variant of this model in the ILSVRC-2012 competition and achieved a winning top-5 test error rate of 15.3%, compared to 26.2% achieved by the second-best entry.”
A top-5 error rate of 15.3%, compared to 26.2% for the second-best entry, was a huge breakthrough. Fast forward to today, and the current top result for top-5 accuracy is 98.7% (an error rate of 1.3%).
Let’s now code a very simple CNN with just two convolutional layers and use it to create a potato classifier!
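The embedded code gist doesn't render here, so below is a sketch of such a BasicCNN consistent with the description that follows; padding=1 is an assumption on my part, so that each convolution preserves the spatial size (making a 64x64 input give the 16x16 feature map discussed later):

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class BasicCNN(nn.Module):
    def __init__(self, n_classes):
        super().__init__()
        # 3 RGB input channels -> 32 output channels, 3x3 kernel
        self.conv1 = nn.Conv2d(3, 32, kernel_size=3, padding=1)
        # 32 input channels (matching conv1's outputs) -> 64 output channels
        self.conv2 = nn.Conv2d(32, 64, kernel_size=3, padding=1)
        # 64 features after average pooling -> one score per class
        self.linear = nn.Linear(64, n_classes)

    def forward(self, x):
        x = F.relu(F.max_pool2d(self.conv1(x), 2))  # 64x64 -> 32x32
        x = F.relu(F.max_pool2d(self.conv2(x), 2))  # 32x32 -> 16x16
        x = F.adaptive_avg_pool2d(x, 1)             # (batch-size, 64, 1, 1)
        x = x.view(x.size(0), -1)                   # (batch-size, 64)
        return self.linear(x)
```

As a quick shape check, `BasicCNN(4)(torch.randn(2, 3, 64, 64))` returns a tensor of shape (2, 4): one score per potato class for each image in the batch.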
The first convolutional layer, nn.Conv2d, has 3 input channels and 32 output channels with a kernel size of 3x3. The 3 input channels correspond to the RGB image channels. The number of output channels is simply a design choice.
The second convolutional layer has 32 input channels, matching the number of output channels of the previous layer, and 64 output channels.
Notice in lines 9 and 10 that after each convolutional layer I apply F.max_pool2d and F.relu. The max-pooling operation downscales the image by selecting the maximum value of each 2x2 block of pixels, so the resulting image has half the size. The ReLU is a non-linear activation function, as I mentioned in lesson 1 of this series.
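A minimal pure-Python sketch of what 2x2 max-pooling does to a single-channel feature map (toy values, no framework):

```python
def max_pool2d(fmap, size=2):
    """Downscale by keeping the maximum of each size x size block."""
    return [[max(fmap[i + di][j + dj]
                 for di in range(size) for dj in range(size))
             for j in range(0, len(fmap[0]), size)]
            for i in range(0, len(fmap), size)]

fmap = [[1, 3, 2, 4],
        [5, 6, 1, 0],
        [1, 2, 9, 8],
        [0, 3, 7, 6]]

print(max_pool2d(fmap))  # → [[6, 4], [3, 9]] : a 4x4 map becomes 2x2
```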
After two convolutions and max-poolings of size 2, the resulting feature map has 1/4 the size of the original image. I will be working with 64x64 images, so this results in a 16x16 feature map. I could add more of these convolutional layers, but at some point, when the feature map is already quite small, the usual next step is to use an average pooling to reduce the feature map to 1x1 simply by computing the average. Notice that, as we have 64 channels, the resulting tensor will have a shape of (batch-size, 64, 1, 1), which is then reshaped to (batch-size, 64) before applying the final linear layer.
The final linear layer has an input size of 64 and an output size equal to the number of classes to predict. In this case, that is the 4 types of potatoes.
Note: A good way to understand how everything works is to use the Python debugger. You can import pdb and include pdb.set_trace() right in the forward method. Then you can step through and check the shape after each layer, to build intuition or to help debug problems.
3. Using fastai2 to Make Your Life Easier
It isn't worth wasting your time coding every step of the deep learning pipeline when there are tools that can make your life easier. That's why in this story I'll use the fastai2 library to do most of the work. Nevertheless, I will use the basic CNN model defined in the previous section. Note that fastai2 uses PyTorch and makes customization of every step easy, making it useful for both beginner and advanced deep learning practitioners and researchers.
The following 12 lines of code are the entire Deep Learning pipeline in fastai2, using the BasicCNN defined in the previous section! You can find the notebook with all the code for this lesson here.
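The original gist isn't reproduced here; a sketch consistent with the line-by-line walkthrough below might look like this (the column names 'file' and 'id', the learning rate, and the exact transforms are assumptions; train_df and BasicCNN are defined elsewhere in the notebook):

```python
from fastai.vision.all import *                      # fastai2-style API

dblock = DataBlock(
    blocks=(ImageBlock, CategoryBlock),              # image input, categorical target
    get_x=ColReader('file'),                         # path to each image file
    get_y=ColReader('id'),                           # type of potato
    splitter=RandomSplitter(valid_pct=0.2),          # random 20% validation split
    item_tfms=Resize(64),                            # resize images to 64x64
    batch_tfms=aug_transforms() + [Normalize.from_stats(*imagenet_stats)])

dls = dblock.dataloaders(train_df)
model = BasicCNN(dls.c)                              # dls.c = number of classes
learn = Learner(dls, model, loss_func=nn.CrossEntropyLoss(), metrics=accuracy)

learn.fit_one_cycle(30, lr_max=1e-3, wd=1e-2)        # one-cycle schedule, weight decay
```

This is a pipeline configuration sketch rather than a runnable snippet, since it needs the image files and dataframe from the Kaggle dataset on disk.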
Lines 1 — 6: The fastai DataBlock is defined. I covered the topic of the fastai DataBlock in this story and this one. The ImageBlock and CategoryBlock indicate that the dataloaders will have an input of image type and a target of categorical type.
Lines 2 and 3: The get_x and get_y are the arguments that take the functions used to process the inputs and targets. In this case, I will be reading, from a pandas dataframe, the column 'file' (with the path to each image file) and the column 'id' (with the type of potato).
Line 4: The splitter is the argument where you specify how to split the data into training and validation sets. Here I used RandomSplitter, which by default randomly selects 20% of the data to create the validation set.
Line 5: A transformation is added to resize the images to 64x64.
Line 6: Normalization and image augmentations are included. Notice that I'm using the default augmentations. One nice thing about fastai is that most of the time you can use the defaults and they just work. This is very good for learning because you don't need to understand all the details before you start doing interesting work.
Line 8: The dataloaders object is created. (train_df is the dataframe with the file and id columns; check the full code here.)
Line 9: An instance of the BasicCNN model is created with 4 classes (notice that dls.c gives the number of classes automatically).
Line 10: The fastai Learner object is defined. This is where you indicate the model, loss function, optimizer and validation metrics. The loss function I will use is nn.CrossEntropyLoss, which, as covered in the previous lesson, is the first choice for classification problems with more than 2 categories.
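As a refresher of what nn.CrossEntropyLoss computes per sample (a softmax over the raw scores, then the negative log-probability of the true class), here is a tiny pure-Python sketch with made-up logits for the 4 potato classes:

```python
import math

def cross_entropy(logits, target):
    """Softmax the logits, then return -log(probability of the true class)."""
    exps = [math.exp(z) for z in logits]
    prob_target = exps[target] / sum(exps)
    return -math.log(prob_target)

# The model is fairly confident in class 2 (the true class): small loss
print(cross_entropy([0.5, 0.1, 2.0, -1.0], 2))   # ≈ 0.3524
```

The lower the probability the model assigns to the true class, the larger the loss, which is what makes it a useful training signal for multi-class problems.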
Line 12: The model is trained for 30 epochs using a one-cycle learning rate schedule (the learning rate increases quickly up to lr_max and then gradually decreases) and a weight decay of 0.01.
After training for 30 epochs I got a validation accuracy of 100% with this simple CNN model! This is what the training and validation loss look like as training progresses:
Train and validation loss evolution over the training. Image by the author.

And that's it! If you followed along with the code you can now identify among 4 types of potatoes very accurately. And most importantly, nothing in this example is specific to potatoes! You can apply a similar approach to virtually anything you want to classify!
Homework
I can show you a thousand examples but you will learn the most if you make one or two experiments by yourself! The complete code for this story is available in this notebook.
- As in the previous lesson, try to play with the learning rate, number of epochs, weight decay and the size of the model.
- Instead of the BasicCNN model, try using a Resnet34 pretrained on ImageNet (take a look at fastai cnn_learner). How do the results compare? You can try larger image sizes and activate the GPU on the Kaggle kernel to make training faster! (Kaggle provides you with 30h/week of GPU usage for free.)
- Now train the model using all the fruits and vegetables in the dataset and take a look at the results. The dataset also includes a test set that you can use to further test the trained model!
And as always, if you create interesting notebooks with nice animations as a result of your experiments, go ahead and share them on GitHub or Kaggle, or write a Medium story!
Final remarks
This ends the third story in the Learn AI Today series!
Please consider joining my mailing list in this link so that you won’t miss any of my upcoming stories!
I will also be listing the new stories at learn-ai-today.com, the page I created for this learning journey, and at this GitHub repository!
And in case you missed it before, this is the link for the Kaggle notebook with the code for this story!
Feel free to give me some feedback in the comments. What did you find most useful or what could be explained better? Let me know!
You can read more about my Deep Learning journey on the following stories!
Thanks for reading! Have a great day!
Translated from: https://towardsdatascience.com/learn-ai-today-03-potato-classification-using-convolutional-neural-networks-4481222f2806