
UFLDL Tutorial: Exercise: Softmax Regression

Published: 2023/12/13

A Python implementation of the softmax classification function

Deep Learning and Unsupervised Feature Learning Tutorial Solutions

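The logistic regression hypothesis function

In linear regression, the hypothesis function has the form:

$$h_\theta(x) = \theta^T x$$

In logistic regression, the training set consists of m labeled examples $\{(x^{(1)}, y^{(1)}), \ldots, (x^{(m)}, y^{(m)})\}$, with input features $x^{(i)} \in \mathbb{R}^{n+1}$ (the component $x_0 = 1$ is the intercept term). Because logistic regression addresses binary classification, the labels are $y^{(i)} \in \{0, 1\}$. The output of the hypothesis must lie in [0, 1], so we need a hypothesis function with that property.

In logistic regression, the hypothesis is therefore changed to:

$$h_\theta(x) = g(\theta^T x)$$

where g is called the sigmoid function or the logistic function (the two terms are interchangeable). It has the form:

$$g(z) = \frac{1}{1 + e^{-z}}$$

[Figure: graph of the sigmoid function g(z)]

The sigmoid takes values in (0, 1), that is, $0 < g(z) < 1$. The hypothesis function of logistic regression can therefore be written as:

$$h_\theta(x) = \frac{1}{1 + e^{-\theta^T x}}$$

Although the algorithm has "regression" in its name, it is a classification algorithm, not a regression algorithm.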
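As a small illustration of the hypothesis function, here is a minimal Python sketch using only the standard library. The parameter values and the input vector are made up for the example, and the two-branch form of the sigmoid is an implementation choice that avoids overflow for large negative z; it is not taken from the original article.

```python
import math

def sigmoid(z):
    """Logistic function g(z) = 1 / (1 + exp(-z)), written to avoid overflow."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    ez = math.exp(z)          # safe: z < 0, so exp(z) < 1
    return ez / (1.0 + ez)

def hypothesis(theta, x):
    """h_theta(x) = g(theta^T x); x is expected to include the intercept x_0 = 1."""
    return sigmoid(sum(t * v for t, v in zip(theta, x)))

theta = [-1.0, 0.5, 0.25]     # hypothetical parameters
x = [1.0, 2.0, 4.0]           # x_0 = 1 is the intercept term
h = hypothesis(theta, x)      # theta^T x = 1.0, so this is g(1.0)
print(h)
```

The output is read as the estimated probability that the label is 1; thresholding it at 0.5 gives the predicted class.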

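The per-example cost Cost and the cost function J

This section constructs the per-example cost Cost and the overall cost function J for logistic regression. We are given a training set of m examples, each with n features. $x^{(i)}$ is the feature vector of the i-th example, $y^{(i)}$ is its class label, and the hypothesis function is $h_\theta(x)$. How do we obtain the parameter vector θ of the hypothesis (the model parameters) from the training set?

Recall the cost function J(θ) from linear regression:

$$J(\theta) = \frac{1}{m} \sum_{i=1}^{m} \frac{1}{2} \left( h_\theta(x^{(i)}) - y^{(i)} \right)^2$$

By analogy we can write a cost function of the form

$$J(\theta) = \frac{1}{m} \sum_{i=1}^{m} \mathrm{Cost}\left( h_\theta(x^{(i)}), y^{(i)} \right)$$

where the per-example cost, built by analogy with linear regression, is:

$$\mathrm{Cost}(h_\theta(x), y) = \frac{1}{2} \left( h_\theta(x) - y \right)^2$$

With the sigmoid hypothesis, however, this Cost is a non-convex function of θ. Gradient descent is not guaranteed to reach the global minimum of a non-convex function, whereas for a convex function it is. We therefore need a different cost function, one that is convex.

[Figure: a non-convex cost (left) and a convex cost (right)]

For logistic regression we use the following per-example cost:

$$\mathrm{Cost}(h_\theta(x), y) = \begin{cases} -\log(h_\theta(x)) & \text{if } y = 1 \\ -\log(1 - h_\theta(x)) & \text{if } y = 0 \end{cases}$$

That is, when the label is 1 the cost is -log(hθ(x)), and when the label is 0 the cost is -log(1-hθ(x)).

[Figure: -log(hθ(x)) for y = 1 (left) and -log(1-hθ(x)) for y = 0 (right)]

The curves show what the cost means:
When y = 1 (the label is 1):
If the hypothesis outputs 1, the cost is 0 (the classification is correct, so no price is paid).
As the hypothesis output approaches 0, the cost grows to infinity (the classification is wrong, and the price is unbounded).
The case y = 0 is symmetric.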
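The behaviour described above can be checked numerically. The following is a minimal sketch; the function name `cost` and the sample hypothesis outputs are chosen here for illustration only.

```python
import math

def cost(h, y):
    """Per-example cost: -log(h) if y == 1, -log(1 - h) if y == 0."""
    return -math.log(h) if y == 1 else -math.log(1.0 - h)

# A confident correct prediction is cheap, a confident wrong one is expensive.
for h in (0.99, 0.5, 0.01):
    print(h, cost(h, 1), cost(h, 0))
```

The function is only defined for 0 < h < 1, which the sigmoid guarantees in exact arithmetic; a practical implementation may need to clip h away from 0 and 1.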

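Maximum-likelihood derivation of logistic regression

The activation function of logistic regression is the sigmoid, so the model can be read as a linear regression whose output is squashed by the sigmoid, which maps the real numbers into the interval (0, 1). Interpreting $h_\theta(x)$ as $p(y = 1 \mid x; \theta)$, the likelihood function of a training set of m examples is

$$L(\theta) = \prod_{i=1}^{m} h_\theta(x^{(i)})^{y^{(i)}} \left( 1 - h_\theta(x^{(i)}) \right)^{1 - y^{(i)}}$$

The parameters are usually found by maximum likelihood. Taking the negative logarithm gives:

$$-\log L(\theta) = -\sum_{i=1}^{m} \left[ y^{(i)} \log h_\theta(x^{(i)}) + (1 - y^{(i)}) \log\left( 1 - h_\theta(x^{(i)}) \right) \right]$$

This expression is called the cross-entropy loss. Because of the negative logarithm, the earlier maximisation becomes a minimisation, so we only need to find the parameters that minimise the cross-entropy loss.

Differentiating the loss gives

$$\frac{\partial \left( -\log L(\theta) \right)}{\partial \theta_j} = \sum_{i=1}^{m} \left( h_\theta(x^{(i)}) - y^{(i)} \right) x_j^{(i)}$$

We now have the loss function and its partial derivatives with respect to the parameters, so the parameters can be found by gradient descent.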

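Simplifying Cost and J

The cost function J(θ) of logistic regression over all m training examples is

$$J(\theta) = \frac{1}{m} \sum_{i=1}^{m} \mathrm{Cost}\left( h_\theta(x^{(i)}), y^{(i)} \right), \qquad \mathrm{Cost}(h_\theta(x), y) = \begin{cases} -\log(h_\theta(x)) & \text{if } y = 1 \\ -\log(1 - h_\theta(x)) & \text{if } y = 0 \end{cases}$$

In a classification problem the label y of a training example is always 1 or 0, so the two cases can be merged into a single expression:

$$\mathrm{Cost}(h_\theta(x), y) = -y \log(h_\theta(x)) - (1 - y) \log(1 - h_\theta(x))$$

The cost function J(θ) of logistic regression can therefore be written as:

$$J(\theta) = -\frac{1}{m} \sum_{i=1}^{m} \left[ y^{(i)} \log h_\theta(x^{(i)}) + (1 - y^{(i)}) \log\left( 1 - h_\theta(x^{(i)}) \right) \right]$$

Minimising J(θ) yields the optimal parameter vector.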

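Gradient descent for logistic regression

The corresponding loss function is

$$J(\theta) = -\frac{1}{m} \sum_{i=1}^{m} \left[ y^{(i)} \log h_\theta(x^{(i)}) + (1 - y^{(i)}) \log\left( 1 - h_\theta(x^{(i)}) \right) \right]$$

The key quantities in the optimisation are the value of the cost function and its gradient with respect to the parameters being optimised.

The value of the cost function is computed from the expression for J(θ) above.

Its partial derivatives are computed with

$$\frac{\partial J(\theta)}{\partial \theta_j} = \frac{1}{m} \sum_{i=1}^{m} \left( h_\theta(x^{(i)}) - y^{(i)} \right) x_j^{(i)}$$

Partial derivatives and their vectorized form

Derivation of the partial derivatives

The sigmoid satisfies $g'(z) = g(z)(1 - g(z))$, so $\partial h_\theta(x) / \partial \theta_j = h_\theta(x) (1 - h_\theta(x)) x_j$. Writing $h^{(i)} = h_\theta(x^{(i)})$ and substituting into J(θ):

$$\frac{\partial J(\theta)}{\partial \theta_j} = -\frac{1}{m} \sum_{i=1}^{m} \left[ \frac{y^{(i)}}{h^{(i)}} - \frac{1 - y^{(i)}}{1 - h^{(i)}} \right] h^{(i)} (1 - h^{(i)}) x_j^{(i)} = -\frac{1}{m} \sum_{i=1}^{m} \left( y^{(i)} - h^{(i)} \right) x_j^{(i)} = \frac{1}{m} \sum_{i=1}^{m} \left( h^{(i)} - y^{(i)} \right) x_j^{(i)}$$

Vectorized form

Stack the examples as the rows of $X \in \mathbb{R}^{m \times (n+1)}$ and the labels into $y \in \mathbb{R}^{m}$, and apply g elementwise so that $h = g(X\theta)$. Then

$$J(\theta) = \frac{1}{m} \left[ -y^T \log h - (1 - y)^T \log(1 - h) \right], \qquad \nabla_\theta J(\theta) = \frac{1}{m} X^T (h - y)$$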
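The cost value and the partial derivatives are all that batch gradient descent needs. The sketch below is a plain-Python illustration under stated assumptions: the toy data set, the learning rate `alpha`, and the iteration count are invented for the example, and the loops are written out instead of vectorized so that each term matches the summation formulas.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def cost_and_grad(theta, X, y):
    """Return J(theta) and its gradient for logistic regression."""
    m, n = len(X), len(theta)
    J, grad = 0.0, [0.0] * n
    for xi, yi in zip(X, y):
        h = sigmoid(sum(t * v for t, v in zip(theta, xi)))
        J += -(yi * math.log(h) + (1 - yi) * math.log(1.0 - h)) / m
        for j in range(n):
            grad[j] += (h - yi) * xi[j] / m
    return J, grad

# Toy data (hypothetical): the first column is the intercept x_0 = 1.
# The classes overlap, so the optimal theta is finite.
X = [[1.0, 0.5], [1.0, 1.5], [1.0, 2.0], [1.0, 3.0], [1.0, 3.5], [1.0, 4.5]]
y = [0, 0, 1, 0, 1, 1]

theta = [0.0, 0.0]
alpha = 0.1                       # learning rate, chosen by hand
J_start, _ = cost_and_grad(theta, X, y)
for _ in range(2000):
    _, grad = cost_and_grad(theta, X, y)
    theta = [t - alpha * g for t, g in zip(theta, grad)]
J_end, _ = cost_and_grad(theta, X, y)
print(J_start, J_end, theta)
```

At theta = 0 every prediction is 0.5, so the starting cost equals log 2; the cost after training should be lower.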

% Exercise 4 -- Logistic Regression

clear all; close all; clc

x = load('ex4x.dat');
y = load('ex4y.dat');
[m, n] = size(x);

% Add intercept term to x
x = [ones(m, 1), x];

% Plot the training data
% Use different markers for positives and negatives
figure
pos = find(y);        % find returns the indices of the elements where the condition is true
neg = find(y == 0);
plot(x(pos, 2), x(pos, 3), '+')
hold on
plot(x(neg, 2), x(neg, 3), 'o')
hold on
xlabel('Exam 1 score')
ylabel('Exam 2 score')

% Initialize fitting parameters
theta = zeros(n+1, 1);

% Define the sigmoid function
g = @(z) 1.0 ./ (1.0 + exp(-z));   % anonymous function; replaces the older inline() form

% Newton's method
MAX_ITR = 7;
J = zeros(MAX_ITR, 1);

for i = 1:MAX_ITR
    % Calculate the hypothesis function
    z = x * theta;
    h = g(z);   % apply the logistic function

    % Calculate gradient and hessian.
    % The formulas below are equivalent to the summation formulas
    % given in the lecture videos.
    grad = (1/m) .* x' * (h - y);                   % gradient, vectorized
    H = (1/m) .* x' * diag(h) * diag(1 - h) * x;    % Hessian, vectorized

    % Calculate J (for testing convergence)
    J(i) = (1/m) * sum(-y .* log(h) - (1 - y) .* log(1 - h));   % cost, vectorized

    theta = theta - H \ grad;   % Newton update: theta := theta - inv(H) * grad
end

% Display theta
theta

% Calculate the probability that a student with
% Score 20 on exam 1 and score 80 on exam 2
% will not be admitted
prob = 1 - g([1, 20, 80] * theta)

% Draw the decision boundary
% Plot Newton's method result
% Only need 2 points to define a line, so choose two endpoints
plot_x = [min(x(:,2)) - 2, max(x(:,2)) + 2];
% Calculate the decision boundary line (the line on which theta' * x = 0)
plot_y = (-1 ./ theta(3)) .* (theta(2) .* plot_x + theta(1));
plot(plot_x, plot_y)
legend('Admitted', 'Not admitted', 'Decision Boundary')
hold off

% Plot J
figure
plot(0:MAX_ITR-1, J, 'o--', 'MarkerFaceColor', 'r', 'MarkerSize', 8)
xlabel('Iteration'); ylabel('J')

% Display J
J
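The expression for plot_y in the script above can be derived in two lines. The derivation uses the script's 1-based indexing, with $\theta_1$ multiplying the intercept, and writes $x_1$, $x_2$ for the two exam scores (this notation is assumed here, not taken from the article). The boundary is where the hypothesis equals 0.5:

$$h_\theta(x) = g(\theta^T x) = 0.5 \iff \theta^T x = 0$$

$$\theta_1 + \theta_2 x_1 + \theta_3 x_2 = 0 \implies x_2 = -\frac{1}{\theta_3} \left( \theta_2 x_1 + \theta_1 \right)$$

The second line is the formula evaluated at the two endpoints in plot_x. It requires $\theta_3 \neq 0$.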

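Maximum-likelihood derivation of softmax

In logistic regression the labels take values $y \in \{0, 1\}$, whereas in softmax regression $y \in \{1, 2, \ldots, k\}$, where k is the number of classes.

In handwritten digit recognition, for example, k = 10 for the ten digits to be recognised. Let

$$p(y = i; \phi) = \phi_i, \qquad i = 1, \ldots, k$$

Then

$$\sum_{i=1}^{k} \phi_i = 1$$

and therefore

$$p(y = k; \phi) = \phi_k = 1 - \sum_{i=1}^{k-1} \phi_i$$

To express the multinomial model as an exponential-family distribution, first introduce T(y), a (k-1)-dimensional vector:

$$T(1) = \begin{bmatrix} 1 \\ 0 \\ \vdots \\ 0 \end{bmatrix}, \quad T(2) = \begin{bmatrix} 0 \\ 1 \\ \vdots \\ 0 \end{bmatrix}, \quad \ldots, \quad T(k-1) = \begin{bmatrix} 0 \\ 0 \\ \vdots \\ 1 \end{bmatrix}, \quad T(k) = \begin{bmatrix} 0 \\ 0 \\ \vdots \\ 0 \end{bmatrix}$$

In the generalized linear model, y must belong to exactly one of the k classes. Write $1\{y = i\} = 1$ when $y = i$ is true and $1\{y = i\} = 0$ when it is false, so that

$$(T(y))_i = 1\{y = i\}$$

The probability mass function of y is then

$$p(y; \phi) = \phi_1^{1\{y=1\}} \phi_2^{1\{y=2\}} \cdots \phi_k^{1\{y=k\}} = \exp\left( \sum_{i=1}^{k-1} (T(y))_i \log\frac{\phi_i}{\phi_k} + \log \phi_k \right) = b(y) \exp\left( \eta^T T(y) - a(\eta) \right)$$

Comparing terms gives

$$\eta = \begin{bmatrix} \log(\phi_1 / \phi_k) \\ \vdots \\ \log(\phi_{k-1} / \phi_k) \end{bmatrix}, \qquad a(\eta) = -\log \phi_k, \qquad b(y) = 1$$

Since (with the convention $\eta_k = 0$)

$$\eta_i = \log\frac{\phi_i}{\phi_k} \implies \phi_i = \phi_k e^{\eta_i}, \qquad \phi_k \sum_{i=1}^{k} e^{\eta_i} = \sum_{i=1}^{k} \phi_i = 1$$

we finally obtain

$$\phi_i = \frac{e^{\eta_i}}{\sum_{j=1}^{k} e^{\eta_j}}$$

With $\eta_i = \theta_i^T x$, the expected value is

$$h_\theta(x) = E[T(y) \mid x; \theta] = \begin{bmatrix} \phi_1 \\ \vdots \\ \phi_{k-1} \end{bmatrix} = \begin{bmatrix} \dfrac{e^{\theta_1^T x}}{\sum_{j=1}^{k} e^{\theta_j^T x}} \\ \vdots \\ \dfrac{e^{\theta_{k-1}^T x}}{\sum_{j=1}^{k} e^{\theta_j^T x}} \end{bmatrix}$$

The log-likelihood function is then

$$\ell(\theta) = \sum_{i=1}^{m} \log p(y^{(i)} \mid x^{(i)}; \theta) = \sum_{i=1}^{m} \log \prod_{l=1}^{k} \left( \frac{e^{\theta_l^T x^{(i)}}}{\sum_{j=1}^{k} e^{\theta_j^T x^{(i)}}} \right)^{1\{y^{(i)} = l\}}$$

where θ is a k × (n+1) matrix holding the parameters of all k classes; each class has an (n+1)-dimensional parameter vector. In softmax regression the probability of assigning x to class l is therefore

$$p(y^{(i)} = l \mid x^{(i)}; \theta) = \frac{e^{\theta_l^T x^{(i)}}}{\sum_{j=1}^{k} e^{\theta_j^T x^{(i)}}}$$

As with logistic regression, softmax can be solved with gradient methods or with Newton's method. Differentiating the log-likelihood gives

$$\frac{\partial \ell(\theta)}{\partial \theta_l} = \sum_{i=1}^{m} \left( 1\{y^{(i)} = l\} - p(y^{(i)} = l \mid x^{(i)}; \theta) \right) x^{(i)}$$

The parameters can then be updated by gradient ascent:

$$\theta_l := \theta_l + \alpha \frac{\partial \ell(\theta)}{\partial \theta_l}$$

Note that $\theta_l$ here denotes all the parameters of class l; it is a vector.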

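Using this log-likelihood directly in softmax regression is problematic because the parameters are redundant: the negative log-likelihood is convex but not strictly convex, so its minimiser is not unique and the Hessian used by Newton's method is singular. Adding a weight decay term to the cost function makes it strictly convex and makes the Hessian invertible. See the link below for details.
Link: http://deeplearning.stanford.edu/wiki/index.php/Softmax%E5%9B%9E%E5%BD%92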

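Softmax regression

Softmax regression handles multi-class classification (as opposed to the binary classification handled by logistic regression): the label y can take k different values rather than 2. For the training set $\{(x^{(1)}, y^{(1)}), \ldots, (x^{(m)}, y^{(m)})\}$ we therefore have $y^{(i)} \in \{1, 2, \ldots, k\}$. (Note that class indices start at 1, not 0.)

Given a test input x, we want the hypothesis to estimate the probability p(y=j | x) for every class j, that is, the probability of each possible classification of x. The hypothesis therefore outputs a k-dimensional vector (whose elements sum to 1) holding these k estimated probabilities. Specifically, the hypothesis has the form:

$$h_\theta(x^{(i)}) = \begin{bmatrix} p(y^{(i)} = 1 \mid x^{(i)}; \theta) \\ p(y^{(i)} = 2 \mid x^{(i)}; \theta) \\ \vdots \\ p(y^{(i)} = k \mid x^{(i)}; \theta) \end{bmatrix} = \frac{1}{\sum_{j=1}^{k} e^{\theta_j^T x^{(i)}}} \begin{bmatrix} e^{\theta_1^T x^{(i)}} \\ e^{\theta_2^T x^{(i)}} \\ \vdots \\ e^{\theta_k^T x^{(i)}} \end{bmatrix}$$

where $\theta_1, \theta_2, \ldots, \theta_k \in \mathbb{R}^{n+1}$ are the model parameters. The term $\frac{1}{\sum_{j=1}^{k} e^{\theta_j^T x^{(i)}}}$ normalises the distribution so that the probabilities sum to 1.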
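The hypothesis can be sketched directly from its definition. In the minimal Python example below, the function name, the parameter values, and the input are all hypothetical; the parameters are stored as a list of k rows, one per class.

```python
import math

def softmax_hypothesis(theta, x):
    """Return [p(y=1|x), ..., p(y=k|x)] for a k x (n+1) parameter list theta."""
    scores = [sum(t * v for t, v in zip(theta_j, x)) for theta_j in theta]
    exps = [math.exp(s) for s in scores]
    total = sum(exps)            # normalising term: sum_j exp(theta_j^T x)
    return [e / total for e in exps]

# Hypothetical parameters for k = 3 classes, n = 2 features plus intercept.
theta = [[0.1, 0.2, -0.3],
         [0.0, -0.1, 0.4],
         [-0.2, 0.3, 0.1]]
x = [1.0, 2.0, 1.0]              # x_0 = 1 is the intercept term
probs = softmax_hypothesis(theta, x)
print(probs, sum(probs))
```

The three entries are positive and sum to 1 up to rounding. This direct form can overflow for large scores; a practical version subtracts the largest score before exponentiating.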

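Cost function

With $1\{\cdot\}$ denoting the indicator function, the cost function is

$$J(\theta) = -\frac{1}{m} \left[ \sum_{i=1}^{m} \sum_{j=1}^{k} 1\{y^{(i)} = j\} \log \frac{e^{\theta_j^T x^{(i)}}}{\sum_{l=1}^{k} e^{\theta_l^T x^{(i)}}} \right]$$

This formula generalises the cost function of logistic regression.

Note that in softmax regression the probability of assigning x to class j is:

$$p(y^{(i)} = j \mid x^{(i)}; \theta) = \frac{e^{\theta_j^T x^{(i)}}}{\sum_{l=1}^{k} e^{\theta_l^T x^{(i)}}}$$

Remark

The cost function of logistic regression is derived from maximum-likelihood estimation, and the softmax cost function is derived in the same way.

Logistic regression is optimised with gradient descent, and so is softmax regression. The gradient is:

$$\nabla_{\theta_j} J(\theta) = -\frac{1}{m} \sum_{i=1}^{m} \left[ x^{(i)} \left( 1\{y^{(i)} = j\} - p(y^{(i)} = j \mid x^{(i)}; \theta) \right) \right]$$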

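The parameters of the softmax regression model are "redundant"

Redundancy means that the optimal solution is not unique. Suppose we subtract a vector $\psi$ from every parameter vector $\theta_j$, so that each $\theta_j$ becomes $\theta_j - \psi$ (j = 1, ..., k). The hypothesis then becomes:

$$p(y^{(i)} = j \mid x^{(i)}; \theta) = \frac{e^{(\theta_j - \psi)^T x^{(i)}}}{\sum_{l=1}^{k} e^{(\theta_l - \psi)^T x^{(i)}}} = \frac{e^{\theta_j^T x^{(i)}} e^{-\psi^T x^{(i)}}}{\sum_{l=1}^{k} e^{\theta_l^T x^{(i)}} e^{-\psi^T x^{(i)}}} = \frac{e^{\theta_j^T x^{(i)}}}{\sum_{l=1}^{k} e^{\theta_l^T x^{(i)}}}$$

Subtracting $\psi$ from every $\theta_j$ does not change the predictions of the hypothesis at all. This is the redundancy of softmax regression.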
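The redundancy is easy to confirm numerically. The sketch below shifts every class's parameter vector by the same vector and compares the predicted probabilities; the parameter values, the shift `psi`, and the input are arbitrary choices for illustration.

```python
import math

def softmax_probs(theta, x):
    scores = [sum(t * v for t, v in zip(theta_j, x)) for theta_j in theta]
    exps = [math.exp(s) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

theta = [[0.5, -1.0], [1.5, 0.3], [-0.7, 2.0]]   # hypothetical, k = 3
psi = [10.0, -4.0]                                # arbitrary shift vector
shifted = [[t - p for t, p in zip(theta_j, psi)] for theta_j in theta]

x = [1.0, 0.8]
p_original = softmax_probs(theta, x)
p_shifted = softmax_probs(shifted, x)
max_diff = max(abs(a - b) for a, b in zip(p_original, p_shifted))
print(max_diff)   # differs from 0 only by floating-point rounding
```

Any choice of psi gives the same probabilities, which is why the unregularised cost has a whole family of minimisers.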

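Weight decay

How should we deal with this redundancy? Weight decay solves the problem.
We modify the cost function by adding a weight decay term that penalises large parameter values. The cost function becomes:

$$J(\theta) = -\frac{1}{m} \left[ \sum_{i=1}^{m} \sum_{j=1}^{k} 1\{y^{(i)} = j\} \log \frac{e^{\theta_j^T x^{(i)}}}{\sum_{l=1}^{k} e^{\theta_l^T x^{(i)}}} \right] + \frac{\lambda}{2} \sum_{i=1}^{k} \sum_{j=0}^{n} \theta_{ij}^2$$

With the weight decay term (for any λ > 0), the cost function is strictly convex, which guarantees a unique solution.

To apply an optimisation algorithm we need the derivative of this new J(θ):

$$\nabla_{\theta_j} J(\theta) = -\frac{1}{m} \sum_{i=1}^{m} \left[ x^{(i)} \left( 1\{y^{(i)} = j\} - p(y^{(i)} = j \mid x^{(i)}; \theta) \right) \right] + \lambda \theta_j$$

By minimising J(θ) we obtain a working softmax regression model.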
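The weight-decayed cost and its gradient can be written out term by term. The following plain-Python sketch is one possible loop-based implementation, not the vectorized MATLAB solution of the exercise; it stores one example per row (the exercise stores one per column), and the function name and toy data are assumptions made for the example.

```python
import math

def softmax_cost(theta, X, labels, lam):
    """Weight-decayed softmax cost J(theta) and its gradient.

    theta: k lists of length n+1; X: m examples of length n+1;
    labels: class indices in 1..k (1-based, as in the text).
    """
    k, m = len(theta), len(X)
    cost = 0.0
    grad = [[lam * t for t in theta_j] for theta_j in theta]   # lambda * theta_j
    for x, y in zip(X, labels):
        scores = [sum(t * v for t, v in zip(theta_j, x)) for theta_j in theta]
        top = max(scores)                        # subtract the max for stability
        exps = [math.exp(s - top) for s in scores]
        total = sum(exps)
        p = [e / total for e in exps]
        cost -= math.log(p[y - 1]) / m
        for j in range(k):
            indicator = 1.0 if y == j + 1 else 0.0
            for d, v in enumerate(x):
                grad[j][d] -= (indicator - p[j]) * v / m
    cost += 0.5 * lam * sum(t * t for theta_j in theta for t in theta_j)
    return cost, grad

# Toy problem (hypothetical): k = 3 classes, intercept plus one feature.
X = [[1.0, 0.5], [1.0, -1.2], [1.0, 2.0]]
labels = [1, 3, 2]
theta0 = [[0.0, 0.0], [0.0, 0.0], [0.0, 0.0]]
cost0, grad0 = softmax_cost(theta0, X, labels, 1e-4)
print(cost0)   # with theta = 0 every class has probability 1/3
```

At theta = 0 the decay term vanishes and the cost equals log 3, which makes a convenient sanity check before training.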

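The relationship between softmax regression and logistic regression

When the number of classes is k = 2, softmax regression reduces to logistic regression, which shows that softmax regression is a generalisation of logistic regression. Specifically, when k = 2 the softmax hypothesis is

$$h_\theta(x) = \frac{1}{e^{\theta_1^T x} + e^{\theta_2^T x}} \begin{bmatrix} e^{\theta_1^T x} \\ e^{\theta_2^T x} \end{bmatrix}$$

Using the redundancy of the softmax parameters, we set $\psi = \theta_1$ and subtract $\theta_1$ from both parameter vectors, which gives:

$$h_\theta(x) = \frac{1}{e^{\vec{0}^T x} + e^{(\theta_2 - \theta_1)^T x}} \begin{bmatrix} e^{\vec{0}^T x} \\ e^{(\theta_2 - \theta_1)^T x} \end{bmatrix} = \begin{bmatrix} \dfrac{1}{1 + e^{(\theta_2 - \theta_1)^T x}} \\ \dfrac{e^{(\theta_2 - \theta_1)^T x}}{1 + e^{(\theta_2 - \theta_1)^T x}} \end{bmatrix} = \begin{bmatrix} \dfrac{1}{1 + e^{(\theta_2 - \theta_1)^T x}} \\ 1 - \dfrac{1}{1 + e^{(\theta_2 - \theta_1)^T x}} \end{bmatrix}$$

Writing $\theta'$ for $\theta_2 - \theta_1$, we see that the softmax regressor predicts one class with probability $\frac{1}{1 + e^{(\theta')^T x}}$ and the other with probability $1 - \frac{1}{1 + e^{(\theta')^T x}}$, which agrees with logistic regression.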
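The k = 2 reduction can be verified with a few lines of Python. The parameter vectors and input below are hypothetical; the check compares the two-class softmax output with the logistic expressions in terms of theta' = theta_2 - theta_1.

```python
import math

def softmax2(theta1, theta2, x):
    """Two-class softmax probabilities (p(y=1|x), p(y=2|x))."""
    s1 = sum(t * v for t, v in zip(theta1, x))
    s2 = sum(t * v for t, v in zip(theta2, x))
    e1, e2 = math.exp(s1), math.exp(s2)
    return e1 / (e1 + e2), e2 / (e1 + e2)

theta1 = [0.3, -1.2, 0.5]         # hypothetical parameters
theta2 = [-0.4, 0.9, 0.1]
x = [1.0, 0.7, -2.0]

theta_prime = [b - a for a, b in zip(theta1, theta2)]   # theta' = theta_2 - theta_1
z = sum(t * v for t, v in zip(theta_prime, x))

p1, p2 = softmax2(theta1, theta2, x)
logistic_p1 = 1.0 / (1.0 + math.exp(z))    # 1 / (1 + exp(theta'^T x))
logistic_p2 = 1.0 - logistic_p1
print(p1, logistic_p1)
print(p2, logistic_p2)
```

Both pairs agree up to rounding, so a two-class softmax model carries no more information than a single logistic regression with parameter theta'.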

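Softmax regression vs. k binary classifiers

Suppose you are building a music classification application that must recognise k types of music. Should you use a softmax classifier, or build k separate binary classifiers with logistic regression?

The choice depends on whether your classes are mutually exclusive. For example, if you have four classes (classical, country, rock, and jazz), you can assume that each training example carries exactly one label, that is, a song belongs to only one of the four genres. In that case you should use softmax regression with k = 4. (If some songs in your data set belong to none of the four classes, you can add an "other" class and set k to 5.)

If your four classes are instead vocal music, dance music, film soundtrack, and pop song, they are not mutually exclusive. For example, a song can come from a film soundtrack and also contain vocals. In this case four binary logistic regression classifiers are more appropriate: for each new piece of music, the algorithm decides separately whether it belongs to each class.

Now consider an example from computer vision, where your task is to sort images into three classes.
(i) Suppose the three classes are indoor scene, outdoor urban scene, and outdoor wilderness scene. Would you use softmax regression or three logistic regression classifiers?
(ii) Now suppose the three classes are indoor scene, black-and-white image, and image containing people. Would you choose softmax regression or several logistic regression classifiers?

In the first case the three classes are mutually exclusive, so softmax regression is the better choice.
In the second case, three separate logistic regression classifiers are more appropriate.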
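The difference between the two designs shows up in the outputs. In the sketch below the four scores are made-up values of theta_j^T x for one song; the same scores are passed through a softmax and through four independent sigmoids.

```python
import math

scores = [2.0, 1.0, -1.0, 0.5]   # hypothetical scores for 4 music classes

# Softmax: the classes compete, so the probabilities sum to 1.
exps = [math.exp(s) for s in scores]
total = sum(exps)
softmax_probs = [e / total for e in exps]

# k independent logistic classifiers: each class is judged on its own,
# so several classes can exceed 0.5 and the outputs need not sum to 1.
sigmoid_probs = [1.0 / (1.0 + math.exp(-s)) for s in scores]
accepted = [p > 0.5 for p in sigmoid_probs]

print(sum(softmax_probs), sum(sigmoid_probs))
print(accepted)
```

With mutually exclusive classes the softmax output is the natural one; with overlapping tags the independent classifiers can accept several classes for the same song.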

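Experiment steps

1. Initialise the parameters and load the training set. Note: the grey level of every pixel in the MNIST handwritten digit images has already been normalised to [0, 1], so if you later use your own training samples, remember to normalise the pixel values.
2. Implement the softmax regression cost function and its gradient with vectorized code, in the file softmaxCost.m.
3. Use the computeNumericalGradient function to check that the gradient from the previous step is correct. For that function, see "Deep Learning 1: Sparse Autoencoder exercise (Stanford UFLDL deep learning tutorial)".
4. Train the softmax regression model with the L-BFGS algorithm to obtain the model softmaxModel; see the softmaxTrain function in softmaxTrain.m.
5. Load the test set, classify it with the softmaxModel trained in the previous step (see softmaxPredict.m), and compute the accuracy.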
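Step 3 relies on a numerical gradient check. The sketch below is a Python illustration of the idea, not the exercise's computeNumericalGradient: it assumes a central-difference estimate with a step `eps` chosen for the example, and it compares against a simple function whose gradient is known in closed form.

```python
def numerical_gradient(f, theta, eps=1e-4):
    """Central-difference estimate of the gradient of f at theta (a flat list)."""
    grad = []
    for i in range(len(theta)):
        plus = list(theta)
        minus = list(theta)
        plus[i] += eps
        minus[i] -= eps
        grad.append((f(plus) - f(minus)) / (2.0 * eps))
    return grad

def f(t):
    """Test function with a known gradient: f(t) = t0^2 + 3*t0*t1."""
    return t[0] ** 2 + 3.0 * t[0] * t[1]

def analytic_gradient(t):
    return [2.0 * t[0] + 3.0 * t[1], 3.0 * t[0]]

def norm(v):
    return sum(a * a for a in v) ** 0.5

theta = [4.0, 10.0]
num = numerical_gradient(f, theta)
ana = analytic_gradient(theta)
# Same relative-difference measure as in the exercise script.
diff = norm([a - b for a, b in zip(num, ana)]) / norm([a + b for a, b in zip(num, ana)])
print(num, ana, diff)
```

To check a softmax implementation, f would be replaced by a function that returns only the cost for a flattened parameter vector, and the analytic gradient by the gradient that the cost function returns.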

softmaxExercise.m

%% CS294A/CS294W Softmax Exercise

%  Instructions
%  ------------
%
%  This file contains code that helps you get started on the
%  softmax exercise. You will need to write the softmax cost function
%  in softmaxCost.m and the softmax prediction function in softmaxPredict.m.
%  For this exercise, you will not need to change any code in this file,
%  or any other files other than those mentioned above.
%  (However, you may be required to do so in later exercises)

%%======================================================================
%% STEP 0: Initialise constants and parameters
%
%  Here we define and initialise some constants which allow your code
%  to be used more generally on any arbitrary input.
%  We also initialise some parameters used for tuning the model.

inputSize = 28 * 28; % Size of input vector (MNIST images are 28x28), i.e. the feature dimension
numClasses = 10;     % Number of classes (MNIST images fall into 10 classes)

lambda = 1e-4; % Weight decay parameter

%%======================================================================
%% STEP 1: Load data
%
%  In this section, we load the input and output data.
%  For softmax regression on MNIST pixels,
%  the input data is the images, and
%  the output data is the labels.
%

% Change the filenames if you've saved the files under different names
% On some platforms, the files might be saved as
% train-images.idx3-ubyte / train-labels.idx1-ubyte

addpath mnist/
images = loadMNISTImages('mnist/train-images.idx3-ubyte'); % each column is the feature vector of one MNIST example
labels = loadMNISTLabels('mnist/train-labels.idx1-ubyte'); % class label of each example
labels(labels==0) = 10; % Remap 0 to 10

inputData = images;
clear images

% For debugging purposes, you may wish to reduce the size of the input data
% in order to speed up gradient checking.
% Here, we create synthetic dataset using random data for testing

% Set DEBUG to true when debugging. Leave it false for the full run:
% a model trained on the 8-dimensional synthetic data cannot be applied
% to the 784-dimensional test images in STEP 5.
DEBUG = false;
if DEBUG
    inputSize = 8;
    inputData = randn(8, 100);
    labels = randi(10, 100, 1);
end

% theta has k*n entries, where k = numClasses and n = inputSize
% Randomly initialise theta
theta = 0.005 * randn(numClasses * inputSize, 1); % passed around as a column vector

%%======================================================================
%% STEP 2: Implement softmaxCost
%
%  Implement softmaxCost in softmaxCost.m.

[cost, grad] = softmaxCost(theta, numClasses, inputSize, lambda, inputData, labels);

%%======================================================================
%% STEP 3: Gradient checking
%
%  As with any learning algorithm, you should always check that your
%  gradients are correct before learning the parameters.
%

if DEBUG
    numGrad = computeNumericalGradient( @(x) softmaxCost(x, numClasses, ...
                                    inputSize, lambda, inputData, labels), theta);

    % Use this to visually compare the gradients side by side
    disp([numGrad grad]);

    % Compare numerically computed gradients with those computed analytically
    diff = norm(numGrad-grad)/norm(numGrad+grad);
    disp(diff);
    % The difference should be small.
    % In our implementation, these values are usually less than 1e-7.

    % When your gradients are correct, congratulations!
end

%%======================================================================
%% STEP 4: Learning parameters
%
%  Once you have verified that your gradients are correct,
%  you can start training your softmax regression code using softmaxTrain
%  (which uses minFunc).

options.maxIter = 100;
% softmaxModel is just a struct that holds the learned optimal parameters
% together with the input size and the number of classes
softmaxModel = softmaxTrain(inputSize, numClasses, lambda, ...
                            inputData, labels, options);

% Although we only use 100 iterations here to train a classifier for the
% MNIST data set, in practice, training for more iterations is usually
% beneficial.

%%======================================================================
%% STEP 5: Testing
%
%  You should now test your model against the test images.
%  To do this, you will first need to write softmaxPredict
%  (in softmaxPredict.m), which should return predictions
%  given a softmax model and the input data.

images = loadMNISTImages('mnist/t10k-images.idx3-ubyte');
labels = loadMNISTLabels('mnist/t10k-labels.idx1-ubyte');
labels(labels==0) = 10; % Remap 0 to 10

inputData = images;
clear images
size(softmaxModel.optTheta)
size(inputData)

% You will have to implement softmaxPredict in softmaxPredict.m
[pred] = softmaxPredict(softmaxModel, inputData);

acc = mean(labels(:) == pred(:));
fprintf('Accuracy: %0.3f%%\n', acc * 100);

% Accuracy is the proportion of correctly classified images
% After 100 iterations, the results for our implementation were:
%
% Accuracy: 92.200%
%
% If your values are too low (accuracy less than 0.91), you should check
% your code for errors, and make sure you are training on the
% entire data set of 60000 28x28 training images
% (unless you modified the loading code, this should be the case)

softmaxCost.m

function [cost, grad] = softmaxCost(theta, numClasses, inputSize, lambda, data, labels)

% numClasses - the number of classes
% inputSize - the size N of the input vector
% lambda - weight decay parameter
% data - the N x M input matrix, where each column data(:, i) corresponds to
%        a single example
% labels - an M x 1 matrix containing the labels corresponding for the input data
%

% Unroll the parameters from theta
theta = reshape(theta, numClasses, inputSize); % turn the parameter column vector into a matrix

numCases = size(data, 2); % number of input examples

% numClasses x numCases indicator matrix: entry (labels(i), i) is 1 and all
% other entries are 0, for i = 1..numCases. Passing the size explicitly keeps
% all numClasses rows even if the largest label does not occur in the data.
groundTruth = full(sparse(labels, 1:numCases, 1, numClasses, numCases));
cost = 0;

thetagrad = zeros(numClasses, inputSize);

%% ---------- YOUR CODE HERE --------------------------------------
%  Instructions: Compute the cost and gradient for softmax regression.
%                You need to compute thetagrad and cost.
%                The groundTruth matrix might come in handy.

% max(theta*data, [], 1) returns the maximum of each column as a row vector.
% Subtracting it from every element of the column sets each column's maximum
% to 0, which prevents overflow in the exponential below.
M = bsxfun(@minus, theta*data, max(theta*data, [], 1));
M = exp(M);
p = bsxfun(@rdivide, M, sum(M));
cost = -1/numCases * groundTruth(:)' * log(p(:)) + lambda/2 * sum(theta(:) .^ 2);
thetagrad = -1/numCases * (groundTruth - p) * data' + lambda * theta;

% ------------------------------------------------------------------
% Unroll the gradient matrices into a vector for minFunc
grad = [thetagrad(:)];
end
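The max-subtraction step in softmaxCost.m matters in practice. The Python sketch below contrasts a naive softmax with the shifted version on deliberately large scores; the function names and the scores are chosen for illustration. In Python, math.exp raises OverflowError for such inputs, whereas MATLAB's exp would return Inf and the division would then produce NaN.

```python
import math

def softmax_naive(scores):
    exps = [math.exp(s) for s in scores]          # overflows for large scores
    total = sum(exps)
    return [e / total for e in exps]

def softmax_stable(scores):
    top = max(scores)
    exps = [math.exp(s - top) for s in scores]    # largest term is exp(0) = 1
    total = sum(exps)
    return [e / total for e in exps]

scores = [1000.0, 1001.0, 1002.0]
try:
    softmax_naive(scores)
    naive_ok = True
except OverflowError:
    naive_ok = False

stable = softmax_stable(scores)
print(naive_ok, stable)
```

Subtracting the same constant from every score leaves the probabilities unchanged, for the same reason that subtracting psi from every parameter vector does.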

softmaxPredict.m

function [pred] = softmaxPredict(softmaxModel, data)

% softmaxModel - model trained using softmaxTrain
% data - the N x M input matrix, where each column data(:, i) corresponds to
%        a single example
%
% Your code should produce the prediction matrix
% pred, where pred(i) is argmax_c P(y(c) | x(i)).

% Unroll the parameters from theta
theta = softmaxModel.optTheta;  % this provides a numClasses x inputSize matrix
pred = zeros(1, size(data, 2));

%% ---------- YOUR CODE HERE --------------------------------------
%  Instructions: Compute pred using theta assuming that the labels start
%                from 1.

% exp and the normalisation are monotone, so the class with the largest
% score theta(c, :) * x also has the largest probability.
[~, pred] = max(theta * data, [], 1);

% ---------------------------------------------------------------------

end
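The prediction and accuracy computation can be illustrated on a tiny example. In this Python sketch the three-class model, the inputs, and the labels are all hypothetical, and examples are stored one per row; as in the MATLAB function, the class is picked from the raw scores because exponentiation and normalisation do not change which score is largest.

```python
def softmax_predict(theta, X):
    """Return the 1-based class with the largest score theta_j^T x per example."""
    preds = []
    for x in X:
        scores = [sum(t * v for t, v in zip(theta_j, x)) for theta_j in theta]
        preds.append(scores.index(max(scores)) + 1)
    return preds

theta = [[1.0, 0.0], [0.0, 1.0], [-1.0, -1.0]]   # hypothetical 3-class model
X = [[2.0, 0.5], [0.1, 3.0], [-2.0, -1.0], [1.0, 2.0]]
labels = [1, 2, 3, 1]

pred = softmax_predict(theta, X)
accuracy = sum(p == y for p, y in zip(pred, labels)) / len(labels)
print(pred, accuracy)
```

The last example is scored highest for class 2 although its label is 1, so three of the four predictions match.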

