
Machine Learning week9 ex8 review

Published: 2023/12/15 · 41 · 豆豆



This week covers anomaly detection. Part one detects faulty servers in a network; part two uses collaborative filtering to build a movie recommender system.

1 Anomaly Detection

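Two metrics describe how a server is working: throughput and latency.
We have an unlabeled dataset and assume that the vast majority of its examples are servers working normally, with only a few in an anomalous state.
A scatter plot gives a first, visual impression.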

1.1 Gaussian distribution

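Choose a model for how the data are distributed.
The Gaussian distribution is:

$$p(x;\mu,\sigma^2) = \frac{1}{\sqrt{2\pi\sigma^2}}\exp\left(-\frac{(x-\mu)^2}{2\sigma^2}\right)$$

where μ is the mean and σ is the standard deviation (σ² is the variance).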
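With several features, each feature i gets its own Gaussian and p(x) is the product of the per-feature densities, p(x) = ∏ᵢ p(xᵢ; μᵢ, σᵢ²) (equivalently, a multivariate Gaussian with diagonal covariance). A minimal sketch of that product, assuming one example per row of X and per-feature vectors mu and sigma2; `gaussProduct` is a hypothetical name, not a file from the exercise:

```octave
% Hypothetical helper: density of each row of X under independent
% per-feature Gaussians with means mu and variances sigma2.
% Relies on implicit broadcasting (Octave, or MATLAB R2016b+).
gaussProduct = @(X, mu, sigma2) prod( ...
  exp(-(X - mu(:)').^2 ./ (2 * sigma2(:)')) ./ sqrt(2 * pi * sigma2(:)'), 2);

mu = [0; 0];             % toy means
sigma2 = [1; 1];         % toy variances
X = [0 0; 1 0; 3 3];     % three toy examples
p = gaussProduct(X, mu, sigma2)   % one density value per row
```

The row at the mean gets the largest value, 1/(2π) for two unit-variance features, and the far-away row [3 3] gets a tiny one, which is exactly what the threshold in 1.3 exploits.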

1.2 Estimating parameters for Gaussian distribution

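For m examples, the parameters of each feature's Gaussian are estimated with:

$$\mu_i = \frac{1}{m}\sum_{j=1}^{m} x_i^{(j)},\qquad \sigma_i^2 = \frac{1}{m}\sum_{j=1}^{m}\left(x_i^{(j)}-\mu_i\right)^2$$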

My completed estimateGaussian.m:

function [mu sigma2] = estimateGaussian(X)
%ESTIMATEGAUSSIAN This function estimates the parameters of a
%Gaussian distribution using the data in X
%   [mu sigma2] = estimateGaussian(X),
%   The input X is the dataset with each n-dimensional data point in one row
%   The output is an n-dimensional vector mu, the mean of the data set
%   and the variances sigma^2, an n x 1 vector
%

% Useful variables
[m, n] = size(X);
% You should return these values correctly
mu = zeros(n, 1);
sigma2 = zeros(n, 1);
% ====================== YOUR CODE HERE ======================
% Instructions: Compute the mean of the data and the variances
%               In particular, mu(i) should contain the mean of
%               the data for the i-th feature and sigma2(i)
%               should contain variance of the i-th feature.
%
mu = mean(X)';        % column means, transposed to n x 1 as the header specifies
sigma2 = var(X, 1)';  % weight 1 divides by m rather than m - 1; also n x 1
% =============================================================
end
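The `var(X, 1)` choice matters: plain `var(X)` divides by m − 1, while the formula for σᵢ² divides by m. A quick check on made-up data with two features:

```octave
% Toy data (made up): m = 4 examples, n = 2 features
X = [1 10; 2 20; 3 30; 4 40];
m = size(X, 1);
mu        = mean(X)';                     % [2.5; 25]
sigma2_m  = var(X, 1)';                   % divides by m:     [1.25; 125]
sigma2_m1 = var(X)';                      % divides by m - 1: [5/3; 500/3]
manual    = (sum((X - mu').^2, 1) / m)';  % the 1/m formula written out
disp([sigma2_m sigma2_m1 manual])
```

The first and third columns agree, confirming that `var(X, 1)` implements the 1/m estimate.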

Once this is done, the script plots contour lines of the fitted distribution, producing the figure below:

1.3 Selecting the threshold

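With ε as the threshold, any example with p(x) < ε is treated as an anomaly.
ε is chosen with the cross-validation set, whose examples are labeled.
Each candidate ε is judged by the F1 score learned earlier in the course:

$$prec = \frac{tp}{tp+fp},\qquad rec = \frac{tp}{tp+fn},\qquad F_1 = \frac{2\cdot prec\cdot rec}{prec+rec}$$

where tp, fp and fn are the numbers of true positives, false positives and false negatives.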
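A worked example of the formulas, using hypothetical counts for one candidate ε:

```octave
% Hypothetical counts from one candidate epsilon on a labeled CV set
tp = 3; fp = 1; fn = 2;
prec = tp / (tp + fp);                % 3/4 = 0.75
rec  = tp / (tp + fn);                % 3/5 = 0.60
F1   = 2 * prec * rec / (prec + rec); % 0.9 / 1.35 = 2/3
fprintf('prec = %.2f, rec = %.2f, F1 = %.4f\n', prec, rec, F1);
```

The same value follows from the equivalent form F1 = 2tp / (2tp + fp + fn) = 6/9.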

function [bestEpsilon bestF1] = selectThreshold(yval, pval)
%SELECTTHRESHOLD Find the best threshold (epsilon) to use for selecting
%outliers
%   [bestEpsilon bestF1] = SELECTTHRESHOLD(yval, pval) finds the best
%   threshold to use for selecting outliers based on the results from a
%   validation set (pval) and the ground truth (yval).
%
bestEpsilon = 0;
bestF1 = 0;
F1 = 0;
stepsize = (max(pval) - min(pval)) / 1000;
for epsilon = min(pval):stepsize:max(pval)
    % ====================== YOUR CODE HERE ======================
    % Instructions: Compute the F1 score of choosing epsilon as the
    %               threshold and place the value in F1. The code at the
    %               end of the loop will compare the F1 score for this
    %               choice of epsilon and set it to be the best epsilon if
    %               it is better than the current choice of epsilon.
    %
    % Note: You can use predictions = (pval < epsilon) to get a binary vector
    %       of 0's and 1's of the outlier predictions
    prediction = (pval < epsilon);
    tp = sum((prediction == 1) & (yval == 1));  % true positives
    fp = sum((prediction == 1) & (yval == 0));  % false positives
    fn = sum((prediction == 0) & (yval == 1));  % false negatives
    prec = tp / (tp + fp);                      % precision
    rec = tp / (tp + fn);                       % recall
    % If tp == 0, F1 becomes NaN; NaN > bestF1 is false, so that epsilon is skipped
    F1 = 2 * prec * rec / (prec + rec);         % F1 score
    % =============================================================
    if F1 > bestF1
        bestF1 = F1;
        bestEpsilon = epsilon;
    end
end
end

With the selected ε, the anomalies detected are shown in the figure below:
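Flagging anomalies with the chosen threshold is a single comparison. A minimal sketch with made-up density values p and a made-up epsilon:

```octave
% Toy densities and threshold; values are illustrative only
p = [0.12; 0.003; 0.09; 0.0004; 0.2];
epsilon = 0.01;
outliers = find(p < epsilon);   % row indices flagged as anomalies (2 and 4 here)
disp(outliers')
```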

1.4 High dimensional Dataset

Run the same functions on a higher-dimensional dataset (11 features).
Nothing changes compared with the earlier 2-D case.

2 Recommender system

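Apply collaborative filtering to a dataset of movie ratings to build a recommender system.
Dataset source: MovieLens 100k Dataset.
Visualization of the Y matrix:

For comparison, a 4×4 identity matrix is visualized as follows: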

2.1 Movie rating dataset

Matrix Y (size num_movies × num_users) holds the ratings y^(i,j);
matrix R has R(i, j) = 1 if movie i was rated by user j.
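Unrated entries of Y are stored as 0, so any statistic over Y has to be masked with R. A small sketch with made-up ratings; building R as `Y ~= 0` is only a shortcut for this toy, since the exercise provides R alongside Y:

```octave
% Toy ratings: 3 movies x 4 users; 0 means "not rated"
Y = [5 4 0 0;
     0 0 3 1;
     2 0 0 5];
R = double(Y ~= 0);                 % R(i,j) = 1 if movie i was rated by user j
avg1   = mean(Y(1, R(1, :) == 1));  % average of movie 1 over users who rated it: 4.5
naive1 = mean(Y(1, :));             % wrongly counts the zeros: 2.25
```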

2.2 Collaborative filtering learning algorithm

All of section 2.2 is about filling in cofiCostFunc.m.
The starter code provided in the original file is:

function [J, grad] = cofiCostFunc(params, Y, R, num_users, num_movies, ...
                                  num_features, lambda)
%COFICOSTFUNC Collaborative filtering cost function
%   [J, grad] = COFICOSTFUNC(params, Y, R, num_users, num_movies, ...
%   num_features, lambda) returns the cost and gradient for the
%   collaborative filtering problem.
%

% Unfold the U and W matrices from params
X = reshape(params(1:num_movies*num_features), num_movies, num_features);
Theta = reshape(params(num_movies*num_features+1:end), ...
                num_users, num_features);
% You need to return the following values correctly
J = 0;
X_grad = zeros(size(X));
Theta_grad = zeros(size(Theta));
% ====================== YOUR CODE HERE ======================
% Instructions: Compute the cost function and gradient for collaborative
%               filtering. Concretely, you should first implement the cost
%               function (without regularization) and make sure it is
%               matches our costs. After that, you should implement the
%               gradient and use the checkCostFunction routine to check
%               that the gradient is correct. Finally, you should implement
%               regularization.
%
% Notes: X - num_movies x num_features matrix of movie features
%        Theta - num_users x num_features matrix of user features
%        Y - num_movies x num_users matrix of user ratings of movies
%        R - num_movies x num_users matrix, where R(i, j) = 1 if the
%            i-th movie was rated by the j-th user
%
% You should set the following variables correctly:
%
%        X_grad - num_movies x num_features matrix, containing the
%                 partial derivatives w.r.t. to each element of X
%        Theta_grad - num_users x num_features matrix, containing the
%                     partial derivatives w.r.t. to each element of Theta
%
% =============================================================
grad = [X_grad(:); Theta_grad(:)];
end

2.2.1 Collaborative filtering cost function

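With n_m movies and n_u users, the cost function without regularization is:

$$J\left(x^{(1)},\dots,x^{(n_m)},\theta^{(1)},\dots,\theta^{(n_u)}\right) = \frac{1}{2}\sum_{(i,j):r(i,j)=1}\left((\theta^{(j)})^T x^{(i)} - y^{(i,j)}\right)^2$$

So I added the following code: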

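diff = X * Theta' - Y;         % prediction error for every (movie, user) pair; shadows the built-in diff()
vari = diff.^2;                % squared errors
J = 1/2 * sum(vari(R == 1));   % sum only over entries that were actually rated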
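A hand-checkable example of the masked cost on a made-up problem with 2 movies, 2 users and 1 feature; the unrated entry Y(1,2) must contribute nothing:

```octave
% Toy problem (numbers made up)
X     = [1; 2];          % num_movies x num_features
Theta = [1; 3];          % num_users  x num_features
Y     = [2 0; 1 5];      % ratings; Y(1,2) is a placeholder 0
R     = [1 0; 1 1];      % Y(1,2) is unrated
diff  = X * Theta' - Y;  % [1 3; 2 6] - Y = [-1 3; 1 1]
vari  = diff.^2;         % [1 9; 1 1]
J     = 1/2 * sum(vari(R == 1));  % (1 + 1 + 1) / 2 = 1.5
```

Without the mask the unrated entry would add 9 and give 6 instead of 1.5.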

2.2.2 Collaborative filtering gradient

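The formulas are:

$$\frac{\partial J}{\partial x_k^{(i)}} = \sum_{j:r(i,j)=1}\left((\theta^{(j)})^T x^{(i)} - y^{(i,j)}\right)\theta_k^{(j)}$$

$$\frac{\partial J}{\partial \theta_k^{(j)}} = \sum_{i:r(i,j)=1}\left((\theta^{(j)})^T x^{(i)} - y^{(i,j)}\right)x_k^{(i)}$$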

Following the tips in the handout, I vectorized over the features and added:

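for i = 1:num_movies
    % row i: sum over users j who rated movie i of diff(i,j) * Theta(j,:)
    X_grad(i, :) = sum((diff(i, :) .* R(i, :))' .* Theta, 1);
end
for j = 1:num_users
    % row j: sum over movies i rated by user j of diff(i,j) * X(i,:)
    Theta_grad(j, :) = sum((diff(:, j) .* R(:, j)) .* X, 1);
end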

After thinking it over, I realized it can be vectorized completely, with no loops:

X_grad = (diff .* R) * Theta;    % num_movies x num_features
Theta_grad = (diff .* R)' * X;   % num_users x num_features
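To confirm that the loop version and the fully vectorized version agree, run both on the same small matrices. The sizes and values below are arbitrary toy data, not taken from the exercise:

```octave
num_movies = 5; num_users = 4; num_features = 3;
X     = reshape(1:15, num_movies, num_features) / 10;
Theta = reshape(1:12, num_users, num_features) / 7;
Y     = mod(reshape(1:20, num_movies, num_users), 6);  % toy ratings 0..5
R     = double(Y > 2);                                 % arbitrary "rated" mask
diff  = X * Theta' - Y;

% Loop version, one row of each gradient at a time
X_loop = zeros(size(X));
for i = 1:num_movies
  X_loop(i, :) = sum((diff(i, :) .* R(i, :))' .* Theta, 1);
end
Theta_loop = zeros(size(Theta));
for j = 1:num_users
  Theta_loop(j, :) = sum((diff(:, j) .* R(:, j)) .* X, 1);
end

% Fully vectorized version
X_vec     = (diff .* R) * Theta;
Theta_vec = (diff .* R)' * X;

fprintf('max differences: %g %g\n', ...
        max(abs(X_loop(:) - X_vec(:))), max(abs(Theta_loop(:) - Theta_vec(:))));
```

The matrix product sums over users (or movies) in one step, which is why the loops disappear.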

2.2.3 Regularized cost function

2.2.4 Regularized gradient
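For reference, the regularized cost and gradients, with n_m movies, n_u users and n features, add an L2 penalty on every entry of X and Theta:

$$J = \frac{1}{2}\sum_{(i,j):r(i,j)=1}\left((\theta^{(j)})^T x^{(i)} - y^{(i,j)}\right)^2 + \frac{\lambda}{2}\sum_{j=1}^{n_u}\sum_{k=1}^{n}\left(\theta_k^{(j)}\right)^2 + \frac{\lambda}{2}\sum_{i=1}^{n_m}\sum_{k=1}^{n}\left(x_k^{(i)}\right)^2$$

$$\frac{\partial J}{\partial x_k^{(i)}} = \sum_{j:r(i,j)=1}\left((\theta^{(j)})^T x^{(i)} - y^{(i,j)}\right)\theta_k^{(j)} + \lambda x_k^{(i)}$$

$$\frac{\partial J}{\partial \theta_k^{(j)}} = \sum_{i:r(i,j)=1}\left((\theta^{(j)})^T x^{(i)} - y^{(i,j)}\right)x_k^{(i)} + \lambda \theta_k^{(j)}$$

Each penalty term maps to a `lambda/2 * sum(...)` term in the cost and to `lambda * X` or `lambda * Theta` in the gradients.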

Only the regularization terms need to be added to the code above:

% Theta(:).^2 instead of (Theta.^2)(:): chained indexing only works in Octave
J = 1/2 * sum(vari(R == 1)) ...
    + lambda/2 * (sum(Theta(:).^2) + sum(X(:).^2));
X_grad = (diff .* R) * Theta + lambda * X;
Theta_grad = (diff .* R)' * X + lambda * Theta;
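The starter comments mention checking the gradient numerically with checkCostFunction. Rather than reproduce that routine, here is an independent minimal sketch of the same idea: central finite differences on a toy problem with made-up Y, R and parameters. Since params holds every entry of X and Theta, the regularization term is simply `lambda/2 * sum(p.^2)`:

```octave
% Hypothetical toy problem, used only to sanity-check the gradient formulas
nm = 4; nu = 3; nf = 2; lambda = 1.5;
Y = mod(reshape(1:12, nm, nu), 5);    % toy ratings 0..4
R = double(Y > 1);                    % toy "rated" mask
params = (1:(nm + nu) * nf)' / 10;    % unrolled [X(:); Theta(:)]

% Regularized cost as a function of the unrolled parameter vector
cost = @(p) 1/2 * sum(sum(((reshape(p(1:nm*nf), nm, nf) ...
         * reshape(p(nm*nf+1:end), nu, nf)' - Y).^2) .* R)) ...
       + lambda/2 * sum(p.^2);

% Analytic gradient, using the regularized formulas
X = reshape(params(1:nm*nf), nm, nf);
Theta = reshape(params(nm*nf+1:end), nu, nf);
diff = X * Theta' - Y;
X_grad = (diff .* R) * Theta + lambda * X;
Theta_grad = (diff .* R)' * X + lambda * Theta;
grad = [X_grad(:); Theta_grad(:)];

% Central finite differences, one parameter at a time
h = 1e-4;
numgrad = zeros(size(params));
for k = 1:numel(params)
  e = zeros(size(params));
  e(k) = h;
  numgrad(k) = (cost(params + e) - cost(params - e)) / (2 * h);
end
fprintf('max |grad - numgrad| = %g\n', max(abs(grad - numgrad)));
```

The cost is quadratic in any single parameter, so here the central difference matches the analytic gradient up to rounding error.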

2.3 Learning movie recommendations

2.3.1 Recommendations

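In the script, enter your own ratings for some of the movies in movie_list.txt.
The movies all seem to have been released before 2000, so I hadn't seen many of them. I rated the following ones: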

The recommender system suggested these movies for me:

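I can't tell whether the recommendations are accurate, because I haven't seen any of them. But after looking up a few, I suspect I wouldn't like them.
Perhaps my sample of ratings was too small, or perhaps this recommender system is just too simple.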
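A recommendation list like this comes from ranking the predicted ratings X * Theta' for the new user. A minimal sketch of the ranking step on made-up learned parameters; ex8_cofi.m also adds each movie's mean rating back before sorting, which this toy skips, and excluding movies the user already rated is an assumption added here:

```octave
% Made-up learned parameters: 5 movies, 2 users, 2 features
X     = [1 0; 0 1; 1 1; 2 0; 0 2];
Theta = [2 1; 1 3];                  % row 1 = the new user
R     = [1 0; 0 1; 0 0; 0 1; 0 0];   % movies already rated, per user
pred  = X * Theta';                  % predicted rating of every movie by every user
mine  = pred(:, 1);                  % predictions for user 1: [2; 1; 3; 4; 2]
mine(R(:, 1) == 1) = -Inf;           % skip movies user 1 already rated
[~, order] = sort(mine, 'descend');
top2 = order(1:2)                    % movies 4 and 3
```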

Reposted from: https://www.cnblogs.com/EtoDemerzel/p/7919953.html
