UFLDL Tutorial: Exercise: Implement deep networks for digit classification
Deep networks
Deep Learning and Unsupervised Feature Learning Tutorial Solutions
Advantages of deep networks
Difficulties in training deep networks
1. Availability of data
Supervised training depends on labeled data. Labeled data is usually scarce, so for many problems it is hard to collect enough examples to fit the parameters of a complex model. Because deep networks have great representational power, training them on too little data leads to overfitting.
2. Local optima
Training a shallow network (one hidden layer) with supervised learning usually lets the parameters converge to reasonable values. Training a neural network with supervised learning generally means solving a highly non-convex optimization problem (for example, minimizing the training error as a function of the network parameters). For a deep network, the search space of this non-convex problem is full of "bad" local optima, so gradient descent (and methods such as conjugate gradient or L-BFGS) does not work well.
3. Diffusion of gradients
The technical reason gradient descent (and related algorithms such as L-BFGS) works poorly on deep networks with randomly initialized weights is that the gradients become very small. Specifically, when backpropagation computes the derivatives, the magnitude of the backpropagated gradient (from the output layer back to the first layers) shrinks rapidly as the network gets deeper. As a result, the derivative of the overall loss with respect to the weights of the first few layers is very small. Under gradient descent those weights change so slowly that the early layers cannot learn effectively from the examples. This is usually called "diffusion of gradients".
A closely related problem: when the last few layers of the network contain enough neurons, those layers alone may be able to model the labeled data without help from the earlier layers, so training does little to shape the earlier layers.
Therefore, a network trained with every layer randomly initialized performs about as well as a shallow network made up of only the last few layers of the deep network.
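The shrinking of the backpropagated error described above can be observed on a toy network. The following is a minimal MATLAB/Octave sketch, not part of the exercise code: the layer width, depth, weight scale, the deterministic "pseudo-random" weights and the squared-error target of zeros are all assumptions chosen only to make the run reproducible.

```matlab
% Sketch: norm of the backpropagated error term in a deep sigmoid network.
% Sizes, weights and the loss are illustrative assumptions.
sigmoid = @(z) 1 ./ (1 + exp(-z));

n      = 20;   % units per layer (assumed)
nLayer = 6;    % number of weight layers (assumed)

% Small deterministic weights, so no random seed is needed.
W = cell(nLayer, 1);
for l = 1:nLayer
    W{l} = 0.1 * sin(reshape(1:n*n, n, n) + l);
end

% Forward pass for a single input column.
a = cell(nLayer + 1, 1);
a{1} = linspace(0, 1, n)';
for l = 1:nLayer
    a{l+1} = sigmoid(W{l} * a{l});
end

% Backward pass for a squared-error loss against a target of zeros:
% the output error is (a - 0) .* f'(z), with f'(z) = a .* (1 - a).
delta = cell(nLayer + 1, 1);
delta{nLayer+1} = a{nLayer+1} .* a{nLayer+1} .* (1 - a{nLayer+1});
for l = nLayer:-1:2
    delta{l} = (W{l}' * delta{l+1}) .* a{l} .* (1 - a{l});
end

% deltaNorm(k) is the error norm at layer k+1, earliest layer first.
deltaNorm = cellfun(@norm, delta(2:end));
disp(deltaNorm');
```

Because the sigmoid derivative is at most 0.25 and the weights here are small, each step back through the network multiplies the error norm by a factor well below 1, so the first entries of `deltaNorm` are much smaller than the last ones.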
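Greedy layer-wise training
Main idea of the greedy layer-wise algorithm
Advantages of greedy layer-wise training
1. Availability of data
Labeled data is expensive to obtain, but large amounts of unlabeled data are easy to obtain.
The promise of self-taught learning is that it can learn a better model by using large amounts of unlabeled data.
Specifically, the method uses unlabeled data to learn good initial weights for all layers (except the final classification layer that predicts the labels). Compared with purely supervised learning, it can exploit far more data, and it can learn and discover patterns present in the data.
2. Better local optima
After the network has been trained on unlabeled data, the initial weights of each layer sit in a better region of parameter space than random initialization would give. The weights can then be fine-tuned starting from there.
Empirically, gradient descent started from these positions is more likely to converge to a good local optimum, because the unlabeled data has already supplied a large amount of prior information about the patterns in the input data.
Notes
When training a deep network, every hidden layer should use a nonlinear activation function f(x). A composition of several linear functions has only the representational power of a single linear function (for example, composing several linear equations just yields another linear equation). So with linear activations, a deep network with several hidden layers is no more expressive than a network with a single hidden layer.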
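The claim about linear activations can be checked by hand on a small case. The sketch below (MATLAB/Octave, with made-up weights and input) stacks two linear layers and compares them with the single linear layer obtained by multiplying the weight matrices together.

```matlab
% Sketch: two stacked linear layers collapse into one linear layer.
% All numeric values are illustrative.
W1 = [1 2; 3 4; 5 6];     % 3x2 weights of the first layer
b1 = [1; 0; -1];
W2 = [1 0 -1; 2 1 0];     % 2x3 weights of the second layer
b2 = [0.5; -0.5];
x  = [2; -1];

deep = W2 * (W1 * x + b1) + b2;   % output of the two-layer linear network

Wc = W2 * W1;                     % collapsed weights
bc = W2 * b1 + b2;                % collapsed bias
shallow = Wc * x + bc;            % output of the equivalent one-layer network

disp([deep shallow]);             % both columns are [-1.5; 3.5]
```

The same algebra holds for any number of linear layers, which is why the hidden layers in this exercise use the sigmoid.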
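From self-taught learning to deep networks
Pre-training and fine-tuning
Pre-training: obtain the initial parameters of the model (train the first layer with an autoencoder and the second layer with logistic/softmax regression);
Fine-tuning: adjust the model parameters further to reduce the training error.
When should fine-tuning be applied?
Usually only when a large amount of labeled training data is available. In that case fine-tuning can significantly improve classifier performance.
However, if there is a large unlabeled dataset (for unsupervised feature learning / pre-training) but only a relatively small labeled training set, fine-tuning helps very little. In that case use the method described in Self-Taught Learning_Exercise (Stanford UFLDL deep learning tutorial).
The exercise
Exercise: Implement deep networks for digit classification. A deep network is used to recognize the handwritten digits in the MNIST database.
The 60,000 labeled examples (60,000 images of 28*28 pixels) form the training set and are fed into a stacked autoencoder. The first autoencoder extracts first-order features from the training set. These are fed into the second autoencoder, which extracts second-order features. The second-order features are fed into a softmax classifier, which is trained on them together with the labels of the original data. Finally, backpropagation (BP) fine-tunes the weights of the whole network so that it fits the data better.
The 10,000 labeled examples (10,000 images of 28*28 pixels) form the test set: the trained network classifies them and the classification accuracy is computed. The network in this section has a 784-unit input layer (28*28 pixels), two hidden layers of 200 units each, and a 10-class softmax output layer.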
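As a quick sanity check on the size of this network, the number of trainable parameters can be counted from the layer sizes used in the exercise script (inputSize = 28*28, hiddenSizeL1 = hiddenSizeL2 = 200, numClasses = 10). The sketch below is plain arithmetic in MATLAB/Octave; the variable names `nSoftmax`, `nLayer1`, `nLayer2` and `nTotal` are introduced here for illustration. It assumes, as in the cost function of this exercise, that the softmax layer has weights but no bias.

```matlab
% Sketch: parameter count of the 784-200-200-10 network.
inputSize    = 28 * 28;
hiddenSizeL1 = 200;
hiddenSizeL2 = 200;
numClasses   = 10;

nSoftmax = numClasses * hiddenSizeL2;                    % softmax weights
nLayer1  = hiddenSizeL1 * inputSize + hiddenSizeL1;      % W and b of hidden layer 1
nLayer2  = hiddenSizeL2 * hiddenSizeL1 + hiddenSizeL2;   % W and b of hidden layer 2
nTotal   = nSoftmax + nLayer1 + nLayer2;

fprintf('softmax: %d, layer 1: %d, layer 2: %d, total: %d\n', ...
        nSoftmax, nLayer1, nLayer2, nTotal);
```

The total, 199200, should equal the length of the unrolled parameter vector that is passed to the optimizer during fine-tuning.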
Steps
1. Initialize the parameters and load the MNIST handwritten digit database.
2. Train the first sparse autoencoder on the training set to obtain its weights sae1OptTheta; with sae1OptTheta, compute the first-order features sae1Features of the raw data.
3. Train the second autoencoder on the first-order features sae1Features to obtain its weights sae2OptTheta; with sae2OptTheta, compute the second-order features sae2Features.
4. Train the softmax classifier on the second-order features sae2Features and the labels of the raw data to obtain its weights saeSoftmaxOptTheta.
5. Fine-tune with backpropagation: combine sae1OptTheta, sae2OptTheta and saeSoftmaxOptTheta into stackedAETheta, the weights of the whole network before fine-tuning; then fine-tune stackedAETheta with BP on the raw data and its labels to obtain stackedAEOptTheta, the weights of the whole network after fine-tuning.
6. Measure accuracy on the test set: classify the test data once with stackedAETheta (before fine-tuning) and once with stackedAEOptTheta (after fine-tuning), and report both accuracies.
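Before running the fine-tuning in step 5, it is worth verifying the analytic gradient against finite differences. The sketch below shows the idea on a small softmax cost of the same form as the one used for the whole network (cross-entropy plus weight decay on the softmax weights). It is a self-contained MATLAB/Octave illustration: the toy sizes, the deterministic data and the helper names `colSoftmax`, `costFn` and `gradFn` are assumptions, not functions from the exercise.

```matlab
% Sketch: central-difference gradient check on a small softmax cost.
numClasses = 3; inputDim = 4; m = 5; lambda = 1e-3;   % toy sizes (assumed)

% Deterministic toy data so the check is reproducible.
X      = sin(reshape(1:inputDim*m, inputDim, m));     % one example per column
labels = [1 2 3 1 2];
theta  = 0.1 * cos((1:numClasses*inputDim)');

groundTruth = full(sparse(labels, 1:m, 1));           % numClasses x m one-hot

% Column-wise softmax, shifted by the column maximum to avoid overflow.
shift      = @(Z) bsxfun(@minus, Z, max(Z, [], 1));
colSoftmax = @(Z) bsxfun(@rdivide, exp(shift(Z)), sum(exp(shift(Z)), 1));
P          = @(t) colSoftmax(reshape(t, numClasses, inputDim) * X);

% Cost and analytic gradient as functions of the unrolled parameters.
costFn = @(t) -sum(sum(groundTruth .* log(P(t)))) / m + lambda/2 * sum(t.^2);
gradFn = @(t) reshape(-(groundTruth - P(t)) * X' / m, [], 1) + lambda * t;

% Numerical gradient by central differences.
epsilon = 1e-4;
numGrad = zeros(size(theta));
for i = 1:numel(theta)
    e = zeros(size(theta));
    e(i) = epsilon;
    numGrad(i) = (costFn(theta + e) - costFn(theta - e)) / (2 * epsilon);
end

anaGrad = gradFn(theta);
relErr  = norm(numGrad - anaGrad) / norm(numGrad + anaGrad);
disp(relErr);   % should be very small if the analytic gradient is right
```

The same comparison applies to stackedAECost: evaluate the cost at theta plus and minus a small step in each coordinate, on a deliberately tiny network, and compare with the returned gradient.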
stackedAEExercise.m
%% CS294A/CS294W Stacked Autoencoder Exercise
%  Instructions
%  ------------
%
%  This file contains code that helps you get started on the
%  stacked autoencoder exercise. You will need to complete code in
%  stackedAECost.m
%  You will also need to have implemented sparseAutoencoderCost.m and
%  softmaxCost.m from previous exercises. You will need the initializeParameters.m
%  loadMNISTImages.m, and loadMNISTLabels.m files from previous exercises.
%
%  For the purpose of completing the assignment, you do not need to
%  change the code in this file.
%
%%======================================================================
%% STEP 0: Here we provide the relevant parameters values that will
%  allow your sparse autoencoder to get good filters; you do not need to
%  change the parameters below.

% Input/output structure of the whole network
inputSize = 28 * 28;
numClasses = 10;

% Structure of the sparse autoencoders
hiddenSizeL1 = 200;    % Layer 1 Hidden Size
hiddenSizeL2 = 200;    % Layer 2 Hidden Size
sparsityParam = 0.1;   % desired average activation of the hidden units
                       % (denoted by the Greek letter rho, which looks like
                       % a lower-case "p", in the lecture notes)
lambda = 3e-3;         % weight decay parameter
beta = 3;              % weight of sparsity penalty term

%%======================================================================
%% STEP 1: Load data from the MNIST database
%
%  This loads our training data and labels from the MNIST database files.

DISPLAY = true;
addpath mnist/
trainData = loadMNISTImages('mnist/train-images.idx3-ubyte');
trainLabels = loadMNISTLabels('mnist/train-labels.idx1-ubyte');

trainLabels(trainLabels == 0) = 10; % Remap 0 to 10 since our labels need to start from 1

%%======================================================================
%% STEP 2: Train the first sparse autoencoder
%  This trains the first sparse autoencoder on the MNIST training images,
%  treated as unlabelled data.
%  If you've correctly implemented sparseAutoencoderCost.m, you don't need
%  to change anything here.

%  Randomly initialize the parameters
sae1Theta = initializeParameters(hiddenSizeL1, inputSize);

%% ---------------------- YOUR CODE HERE ---------------------------------
%  Instructions: Train the first layer sparse autoencoder, this layer has
%                a hidden size of "hiddenSizeL1"
%                You should store the optimal parameters in sae1OptTheta

% Optimizer settings
addpath minFunc/;
options = struct;
options.Method = 'lbfgs';
options.maxIter = 400;
options.display = 'on';

% Run the optimizer; the learned parameters go into sae1OptTheta
[sae1OptTheta, cost] = minFunc(@(p) sparseAutoencoderCost(p, ...
    inputSize, hiddenSizeL1, lambda, sparsityParam, beta, trainData), ...
    sae1Theta, options);

save('saves/step2.mat', 'sae1OptTheta');

if DISPLAY
    W1 = reshape(sae1OptTheta(1:hiddenSizeL1 * inputSize), hiddenSizeL1, inputSize);
    display_network(W1');
end

% -------------------------------------------------------------------------

%%======================================================================
%% STEP 3: Train the second sparse autoencoder
%  This trains the second sparse autoencoder on the first autoencoder
%  features.
%  If you've correctly implemented sparseAutoencoderCost.m, you don't need
%  to change anything here.

% First-order features: output of the first autoencoder's hidden layer
% (dimension hiddenSizeL1)
[sae1Features] = feedForwardAutoencoder(sae1OptTheta, hiddenSizeL1, ...
                                        inputSize, trainData);

%  Randomly initialize the parameters
sae2Theta = initializeParameters(hiddenSizeL2, hiddenSizeL1);

%% ---------------------- YOUR CODE HERE ---------------------------------
%  Instructions: Train the second layer sparse autoencoder, this layer has
%                a hidden size of "hiddenSizeL2" and an input size of
%                "hiddenSizeL1"
%
%                You should store the optimal parameters in sae2OptTheta

% Input dimension is hiddenSizeL1, hidden dimension is hiddenSizeL2
[sae2OptTheta, cost] = minFunc(@(p) sparseAutoencoderCost(p, ...
    hiddenSizeL1, hiddenSizeL2, lambda, sparsityParam, beta, sae1Features), ...
    sae2Theta, options);

save('saves/step3.mat', 'sae2OptTheta');

figure;
if DISPLAY
    W11 = reshape(sae1OptTheta(1:hiddenSizeL1 * inputSize), hiddenSizeL1, inputSize);
    W12 = reshape(sae2OptTheta(1:hiddenSizeL2 * hiddenSizeL1), hiddenSizeL2, hiddenSizeL1);
    % TODO(zellyn): figure out how to display a 2-level network
    % display_network(log(W11' ./ (1-W11')) * W12');
    % W12_temp = W12(1:196,1:196);
    % display_network(W12_temp');
end

% -------------------------------------------------------------------------

%%======================================================================
%% STEP 4: Train the softmax classifier
%  This trains the softmax classifier on the second autoencoder features.
%  If you've correctly implemented softmaxCost.m, you don't need
%  to change anything here.

% Second-order features: output of the second autoencoder's hidden layer
% (dimension hiddenSizeL2)
[sae2Features] = feedForwardAutoencoder(sae2OptTheta, hiddenSizeL2, ...
                                        hiddenSizeL1, sae1Features);

%  Randomly initialize the parameters
saeSoftmaxTheta = 0.005 * randn(hiddenSizeL2 * numClasses, 1);

%% ---------------------- YOUR CODE HERE ---------------------------------
%  Instructions: Train the softmax classifier, the classifier takes in
%                input of dimension "hiddenSizeL2" corresponding to the
%                hidden layer size of the 2nd layer.
%
%                You should store the optimal parameters in saeSoftmaxOptTheta
%
%  NOTE: If you used softmaxTrain to complete this part of the exercise,
%        set saeSoftmaxOptTheta = softmaxModel.optTheta(:);

softmaxLambda = 1e-4;
numClasses = 10;
softoptions = struct;
softoptions.maxIter = 400;
softmaxModel = softmaxTrain(hiddenSizeL2, numClasses, softmaxLambda, ...
                            sae2Features, trainLabels, softoptions);
saeSoftmaxOptTheta = softmaxModel.optTheta(:);   % weights of the softmax classifier

save('saves/step4.mat', 'saeSoftmaxOptTheta');

% -------------------------------------------------------------------------

%%======================================================================
%% STEP 5: Finetune softmax model

% Implement the stackedAECost to give the combined cost of the whole model
% then run this cell.

% The pre-trained autoencoder weights (stack) and softmax weights
% (saeSoftmaxOptTheta) are the starting point for fine-tuning.

% Initialize the stack using the parameters learned
stack = cell(2,1);
stack{1}.w = reshape(sae1OptTheta(1:hiddenSizeL1*inputSize), ...
                     hiddenSizeL1, inputSize);
stack{1}.b = sae1OptTheta(2*hiddenSizeL1*inputSize+1:2*hiddenSizeL1*inputSize+hiddenSizeL1);
stack{2}.w = reshape(sae2OptTheta(1:hiddenSizeL2*hiddenSizeL1), ...
                     hiddenSizeL2, hiddenSizeL1);
stack{2}.b = sae2OptTheta(2*hiddenSizeL2*hiddenSizeL1+1:2*hiddenSizeL2*hiddenSizeL1+hiddenSizeL2);

% Initialize the parameters for the deep model:
% unroll the stack into a vector and record the network structure
[stackparams, netconfig] = stack2params(stack);

% Parameters of the whole model: softmax weights followed by the stack
stackedAETheta = [ saeSoftmaxOptTheta ; stackparams ];

%% ---------------------- YOUR CODE HERE ---------------------------------
%  Instructions: Train the deep network, hidden size here refers to the
%                dimension of the input to the classifier, which corresponds
%                to "hiddenSizeL2".

% Fine-tune the whole network with backpropagation
[stackedAEOptTheta, cost] = minFunc(@(p) stackedAECost(p, inputSize, hiddenSizeL2, ...
    numClasses, netconfig, lambda, trainData, trainLabels), ...
    stackedAETheta, options);

save('saves/step5.mat', 'stackedAEOptTheta');

figure;
if DISPLAY
    optStack = params2stack(stackedAEOptTheta(hiddenSizeL2*numClasses+1:end), netconfig);
    W11 = optStack{1}.w;
    W12 = optStack{2}.w;
    % TODO(zellyn): figure out how to display a 2-level network
    % display_network(log(1 ./ (1-W11')) * W12');
end

% -------------------------------------------------------------------------

%%======================================================================
%% STEP 6: Test
%  Instructions: You will need to complete the code in stackedAEPredict.m
%                before running this part of the code
%

% Get labelled test images
% Note that we apply the same kind of preprocessing as the training set
testData = loadMNISTImages('mnist/t10k-images-idx3-ubyte');
testLabels = loadMNISTLabels('mnist/t10k-labels-idx1-ubyte');

testLabels(testLabels == 0) = 10; % Remap 0 to 10

% Predict with the weights before fine-tuning
[pred] = stackedAEPredict(stackedAETheta, inputSize, hiddenSizeL2, ...
                          numClasses, netconfig, testData);

acc = mean(testLabels(:) == pred(:));
fprintf('Before Finetuning Test Accuracy: %0.3f%%\n', acc * 100);

% Predict with the weights after fine-tuning
[pred] = stackedAEPredict(stackedAEOptTheta, inputSize, hiddenSizeL2, ...
                          numClasses, netconfig, testData);

acc = mean(testLabels(:) == pred(:));
fprintf('After Finetuning Test Accuracy: %0.3f%%\n', acc * 100);

% Accuracy is the proportion of correctly classified images
% The results for our implementation were:
%
% Before Finetuning Test Accuracy: 87.7%
% After Finetuning Test Accuracy:  97.6%
%
% If your values are too low (accuracy less than 95%), you should check
% your code for errors, and make sure you are training on the
% entire data set of 60000 28x28 training images
% (unless you modified the loading code, this should be the case)
stackedAECost.m
function [ cost, grad ] = stackedAECost(theta, inputSize, hiddenSize, ...
                                        numClasses, netconfig, ...
                                        lambda, data, labels)

% stackedAECost: Takes a trained softmaxTheta and a training data set with labels,
% and returns cost and gradient using a stacked autoencoder model. Used for
% finetuning.
% After completing this function, check the gradient with checkStackedAECost.

% theta: trained weights of the whole network (softmax weights, then the stack)
% inputSize: the number of input units
% hiddenSize: the number of hidden units *at the 2nd layer*
% numClasses: the number of categories
% netconfig: the network configuration of the stack
% lambda: the weight regularization penalty
% data: Our matrix containing the training data as columns. So, data(:,i) is the i-th training example.
% labels: A vector containing labels, where labels(i) is the label for the
% i-th training example

%% Unroll softmaxTheta parameter

% We first extract the part which compute the softmax gradient
softmaxTheta = reshape(theta(1:hiddenSize*numClasses), numClasses, hiddenSize);

% Extract out the "stack"
stack = params2stack(theta(hiddenSize*numClasses+1:end), netconfig);

% You will need to compute the following gradients
softmaxThetaGrad = zeros(size(softmaxTheta));
stackgrad = cell(size(stack));
for d = 1:numel(stack)
    stackgrad{d}.w = zeros(size(stack{d}.w));
    stackgrad{d}.b = zeros(size(stack{d}.b));
end

cost = 0; % You need to compute this

% You might find these variables useful
M = size(data, 2);
groundTruth = full(sparse(labels, 1:M, 1));

%% --------------------------- YOUR CODE HERE -----------------------------
%  Instructions: Compute the cost function and gradient vector for
%                the stacked autoencoder.
%
%                You are given a stack variable which is a cell-array of
%                the weights and biases for every layer. In particular, you
%                can refer to the weights of Layer d, using stack{d}.w and
%                the biases using stack{d}.b . To get the total number of
%                layers, you can use numel(stack).
%
%                The last layer of the network is connected to the softmax
%                classification layer, softmaxTheta.
%
%                You should compute the gradients for the softmaxTheta,
%                storing that in softmaxThetaGrad. Similarly, you should
%                compute the gradients for each layer in the stack, storing
%                the gradients in stackgrad{d}.w and stackgrad{d}.b
%                Note that the size of the matrices in stackgrad should
%                match exactly that of the size of the matrices in stack.
%

depth = numel(stack);      % number of hidden layers
a = cell(depth+1, 1);      % a{1} is the input, a{i} the activation of layer i
a{1} = data;

% Forward pass through the hidden layers
for i = 2:numel(a)
    a{i} = sigmoid(stack{i-1}.w * a{i-1} + repmat(stack{i-1}.b, [1 M]));
end

% Softmax probabilities; subtracting the column maximum avoids overflow in exp
scores = softmaxTheta * a{depth+1};
scores = bsxfun(@minus, scores, max(scores, [], 1));
scores = exp(scores);
p = bsxfun(@rdivide, scores, sum(scores, 1));

% Only the softmax weights are regularized; the hidden-layer weights are not
Jweight = sum(softmaxTheta(:).^2);

% Cost = cross-entropy term + weight decay term.
% The softmax cost is the cost of the whole model because its input
% a{depth+1} depends on every parameter in the stack.
cost = -1/M .* groundTruth(:)' * log(p(:)) + lambda/2 * Jweight;

% Gradient with respect to the softmax weights
softmaxThetaGrad = -1/M .* (groundTruth - p) * a{depth+1}' + lambda * softmaxTheta;

% Error terms: delta{i} belongs to layer i (delta{1} is unused)
delta = cell(depth+1, 1);

% Error term of the last hidden layer
delta{depth+1} = -softmaxTheta' * (groundTruth - p) .* a{depth+1} .* (1 - a{depth+1});

% Error terms of the earlier hidden layers
for i = depth:-1:2
    delta{i} = stack{i}.w' * delta{i+1} .* a{i} .* (1 - a{i});
end

% Gradients of the hidden-layer parameters
for i = depth:-1:1
    stackgrad{i}.w = 1/M .* delta{i+1} * a{i}';
    stackgrad{i}.b = 1/M .* sum(delta{i+1}, 2);
end

% -------------------------------------------------------------------------

%% Roll gradient vector
grad = [softmaxThetaGrad(:) ; stack2params(stackgrad)];

end

% You might find this useful
function sigm = sigmoid(x)
    sigm = 1 ./ (1 + exp(-x));
end
stackedAEPredict.m
function [pred] = stackedAEPredict(theta, inputSize, hiddenSize, numClasses, netconfig, data)

% stackedAEPredict: Takes a trained theta and a test data set,
% and returns the predicted labels for each example.

% theta: trained weights of the whole network
% inputSize: the number of input units
% hiddenSize: the number of hidden units *at the 2nd layer*
% numClasses: the number of categories
% data: Our matrix containing the data as columns. So, data(:,i) is the i-th example.

% Your code should produce the prediction matrix
% pred, where pred(i) is argmax_c P(y(c) | x(i)).

%% Unroll theta parameter

% We first extract the softmax weights
softmaxTheta = reshape(theta(1:hiddenSize*numClasses), numClasses, hiddenSize);

% Extract out the "stack"
stack = params2stack(theta(hiddenSize*numClasses+1:end), netconfig);

%% ---------- YOUR CODE HERE --------------------------------------
%  Instructions: Compute pred using theta assuming that the labels start
%                from 1.

% Forward pass through the hidden layers
depth = numel(stack);
a = cell(depth+1, 1);
a{1} = data;
m = size(data, 2);
for i = 2:depth+1
    a{i} = sigmoid(stack{i-1}.w * a{i-1} + repmat(stack{i-1}.b, [1 m]));
end

% The softmax is monotonic within each column, so the row index of the
% largest score in a column is also the most probable class; the
% probabilities themselves do not need to be computed.
[~, pred] = max(softmaxTheta * a{depth+1}, [], 1);

% -----------------------------------------------------------

end

% You might find this useful
function sigm = sigmoid(x)
    sigm = 1 ./ (1 + exp(-x));
end
display_network.m
function [h, array] = display_network(A, opt_normalize, opt_graycolor, cols, opt_colmajor)
% This function visualizes filters in matrix A. Each column of A is a
% filter. We will reshape each column into a square image and visualize
% it in one cell of the visualization panel.
% All other parameters are optional; usually you do not need to worry
% about them.
% opt_normalize: whether we need to normalize the filter so that all of
%   them can have similar contrast. Default value is true.
%   true:  each patch is divided by the largest absolute value in that patch;
%   false: patches are scaled relative to the whole image.
% opt_graycolor: whether we use gray as the heat map. Default is true.
% cols: how many columns are there in the display. Default value is the
%   square root of the number of columns in A.
% opt_colmajor: controls the order in which patches fill the grid.
%   false: row by row, left to right; true: column by column, top to bottom.
%   Default value is false.

warning off all

% Default values of the optional parameters
if ~exist('opt_normalize', 'var') || isempty(opt_normalize)
    opt_normalize = true;
end

if ~exist('opt_graycolor', 'var') || isempty(opt_graycolor)
    opt_graycolor = true;
end

if ~exist('opt_colmajor', 'var') || isempty(opt_colmajor)
    opt_colmajor = false;
end

% rescale: subtract the mean of the whole matrix
A = A - mean(A(:));

if opt_graycolor, colormap(gray); end

% compute rows, cols
[L M] = size(A);   % M is the number of patches
sz = sqrt(L);      % each patch is sz x sz pixels
buf = 1;           % width of the border that separates neighbouring patches
if ~exist('cols', 'var')
    if floor(sqrt(M))^2 ~= M
        % M is not a perfect square: start from ceil(sqrt(M)) columns and
        % increase n until it divides M or reaches 1.2*sqrt(M)
        n = ceil(sqrt(M));
        while mod(M, n) ~= 0 && n < 1.2*sqrt(M), n = n+1; end
        m = ceil(M/n);   % number of rows of patches
    else
        % M is a perfect square: use a square grid
        n = sqrt(M);
        m = n;
    end
else
    n = cols;
    m = ceil(M/n);
end

% Target image, initialized to -1 so that every patch is surrounded by a
% one-pixel border of value -1
array = -ones(buf+m*(sz+buf), buf+n*(sz+buf));

if ~opt_graycolor
    % In color mode the border value is -0.1 instead of -1
    array = 0.1 .* array;
end

if ~opt_colmajor
    % Fill the grid row by row, left to right
    k = 1;   % index of the current patch
    for i = 1:m       % row of the grid
        for j = 1:n   % column of the grid
            if k > M, continue; end
            clim = max(abs(A(:,k)));
            if opt_normalize
                array(buf+(i-1)*(sz+buf)+(1:sz), buf+(j-1)*(sz+buf)+(1:sz)) = reshape(A(:,k),sz,sz)/clim;
            else
                array(buf+(i-1)*(sz+buf)+(1:sz), buf+(j-1)*(sz+buf)+(1:sz)) = reshape(A(:,k),sz,sz)/max(abs(A(:)));
            end
            k = k+1;
        end
    end
else
    % Fill the grid column by column, top to bottom
    k = 1;
    for j = 1:n       % column of the grid
        for i = 1:m   % row of the grid
            if k > M, continue; end
            clim = max(abs(A(:,k)));
            if opt_normalize
                array(buf+(i-1)*(sz+buf)+(1:sz), buf+(j-1)*(sz+buf)+(1:sz)) = reshape(A(:,k),sz,sz)/clim;
            else
                % no rescaling is applied in this branch
                array(buf+(i-1)*(sz+buf)+(1:sz), buf+(j-1)*(sz+buf)+(1:sz)) = reshape(A(:,k),sz,sz);
            end
            k = k+1;
        end
    end
end

% 'EraseMode','none' draws without erasing what is already on screen.
% Recent MATLAB releases may reject the 'EraseMode' property; if imagesc
% raises an error here, remove that name/value pair.
if opt_graycolor
    h = imagesc(array, 'EraseMode', 'none', [-1 1]);
else
    h = imagesc(array, 'EraseMode', 'none', [-1 1]);
end
axis image off   % hide the axes

drawnow;         % flush the figure so the image appears immediately

warning on all
References
Deep Learning 8_UFLDL Deep Learning Tutorial: Stacked Autocoders and Implement deep networks for digit classification_Exercise (Stanford deep learning tutorial)
Exercise: Implement deep networks for digit classification
Deep learning: 16 (deep networks)
UFLDL Tutorial (6): Stacked autoencoders
UFLDL Tutorial Solutions (6): Exercise: Implement deep networks for digit classification
Andrew Ng's open course