
Hung-yi Lee Deep Learning: Homework 2


Task Description

Binary classification is one of the most fundamental problems in machine learning. In this tutorial, you are going to build linear binary classifiers to predict whether the income of an individual exceeds 50,000 or not. We present a discriminative and a generative approach: logistic regression (LR) and linear discriminant analysis (LDA). You are encouraged to compare the differences between the two, or explore more methodologies.

Summary: in this assignment we write a linear binary classifier that judges, from a person's attributes, whether their annual income exceeds US$50,000. The assignment implements this goal with two models, logistic regression and a generative model, and encourages comparing their similarities and differences.

Data Description

This dataset is derived, after some preprocessing, from the Census-Income (KDD) Data Set in the UCI Machine Learning Repository.

In fact, only the three processed files X_train, Y_train, and X_test are used during training; the two raw data files, train.csv and test.csv, can give you some extra information.

The raw data was processed as follows:

  • Some unnecessary attributes were removed
  • Discrete features were one-hot encoded
  • The ratio of positive to negative labels was slightly rebalanced

X_train and X_test share the same format. Open X_train in a Jupyter notebook:

import numpy as np
import pandas as pd

np.random.seed(0)
X_train_fpath = '/Users/zhucan/Desktop/李宏毅深度學習作業/第二次作業/X_train'
Y_train_fpath = '/Users/zhucan/Desktop/李宏毅深度學習作業/第二次作業/Y_train'
X_test_fpath = '/Users/zhucan/Desktop/李宏毅深度學習作業/第二次作業/X_test'
output_fpath = './output_{}.csv'

data = pd.read_csv(X_train_fpath, index_col=0)
data

Result:

The first row is the header; the actual data starts from the second row. The header fields are personal attributes such as age, sex, education, marital status, number of children, and so on.

Open the Y_train file:

target = pd.read_csv(Y_train_fpath, index_col=0)
target

Result:

It has only two columns: the first is the person's ID, and the second is a label: 1 if annual income > 50K USD, 0 if annual income ≤ 50K USD.

Task Objective

Input: a person's attributes

Output: 0 (annual income ≤ 50K) or 1 (annual income > 50K)

Model: logistic regression or a generative model

Solution

Preprocessing

Convert the data into numpy arrays:

import numpy as np

np.random.seed(0)

X_train_fpath = '/Users/zhucan/Desktop/李宏毅深度學習作業/第二次作業/X_train'
Y_train_fpath = '/Users/zhucan/Desktop/李宏毅深度學習作業/第二次作業/Y_train'
X_test_fpath = '/Users/zhucan/Desktop/李宏毅深度學習作業/第二次作業/X_test'
output_fpath = './output_{}.csv'

# Parse csv files to numpy arrays
with open(X_train_fpath) as f:
    next(f)  # next() skips the header line
    X_train = np.array([line.strip('\n').split(',')[1:] for line in f], dtype=float)
with open(Y_train_fpath) as f:
    next(f)
    Y_train = np.array([line.strip('\n').split(',')[1] for line in f], dtype=float)
with open(X_test_fpath) as f:
    next(f)
    X_test = np.array([line.strip('\n').split(',')[1:] for line in f], dtype=float)

print(X_train)
print(Y_train)
print(X_test)

out:
[[33.  1.  0. ... 52.  0.  1.]
 [63.  1.  0. ... 52.  0.  1.]
 [71.  0.  0. ...  0.  0.  1.]
 ...
 [16.  0.  0. ...  8.  1.  0.]
 [48.  1.  0. ... 52.  0.  1.]
 [48.  0.  0. ...  0.  0.  1.]]
[1. 0. 0. ... 0. 0. 0.]
[[37.  1.  0. ... 52.  0.  1.]
 [48.  1.  0. ... 52.  0.  1.]
 [68.  0.  0. ...  0.  1.  0.]
 ...
 [38.  1.  0. ... 52.  0.  1.]
 [17.  0.  0. ... 40.  1.  0.]
 [22.  0.  0. ... 25.  1.  0.]]

Normalization

Define a normalization function _normalize() with the following arguments:

  • X: the data to process
  • train: a boolean; True for the training set, False for the test set
  • specified_column: the columns to be normalized; None means every column is normalized
  • X_mean: the per-column mean of the training set
  • X_std: the per-column standard deviation of the training set

然后,對X_train和X_test分別調用該函數,完成標準化。

def _normalize(X, train=True, specified_column=None, X_mean=None, X_std=None):
    if specified_column is None:
        specified_column = np.arange(X.shape[1])
    if train:
        X_mean = np.mean(X[:, specified_column], 0).reshape(1, -1)
        X_std = np.std(X[:, specified_column], 0).reshape(1, -1)
    X[:, specified_column] = (X[:, specified_column] - X_mean) / (X_std + 1e-8)  # 1e-8 guards against division by zero
    return X, X_mean, X_std

# Normalize training and testing data
X_train, X_mean, X_std = _normalize(X_train, train=True)
X_test, _, _ = _normalize(X_test, train=False, specified_column=None, X_mean=X_mean, X_std=X_std)
# _ stores return values we do not need

out:
[[-0.42755297  0.99959459 -0.1822401  ...  0.80645986 -1.01485523  1.01485523]
 [ 1.19978055  0.99959459 -0.1822401  ...  0.80645986 -1.01485523  1.01485523]
 [ 1.63373616 -1.00040556 -0.1822401  ... -1.4553617  -1.01485523  1.01485523]
 ...
 [-1.34970863 -1.00040556 -0.1822401  ... -1.10738915  0.9853622  -0.9853622 ]
 [ 0.38611379  0.99959459 -0.1822401  ...  0.80645986 -1.01485523  1.01485523]
 [ 0.38611379 -1.00040556 -0.1822401  ... -1.4553617  -1.01485523  1.01485523]]
[[-0.21057517  0.99959459 -0.1822401  ...  0.80645986 -1.01485523  1.01485523]
 [ 0.38611379  0.99959459 -0.1822401  ...  0.80645986 -1.01485523  1.01485523]
 [ 1.47100281 -1.00040556 -0.1822401  ... -1.4553617   0.9853622  -0.9853622 ]
 ...
 [-0.15633072  0.99959459 -0.1822401  ...  0.80645986 -1.01485523  1.01485523]
 [-1.29546418 -1.00040556 -0.1822401  ...  0.28450104  0.9853622  -0.9853622 ]
 [-1.02424193 -1.00040556 -0.1822401  ... -0.36794749  0.9853622  -0.9853622 ]]

Splitting the training set and validation set

Split the original X_train in the ratio train:dev = 9:1. There is no shuffling here; the split is fixed.

def _train_dev_split(X, Y, dev_ratio=0.25):
    # This function splits data into training set and development set.
    train_size = int(len(X) * (1 - dev_ratio))
    return X[:train_size], Y[:train_size], X[train_size:], Y[train_size:]

# Split the data into a training set and a development (validation) set
dev_ratio = 0.1
X_train, Y_train, X_dev, Y_dev = _train_dev_split(X_train, Y_train, dev_ratio=dev_ratio)

train_size = X_train.shape[0]  # training set
dev_size = X_dev.shape[0]      # development set
test_size = X_test.shape[0]    # test set
data_dim = X_train.shape[1]
print('Size of training set: {}'.format(train_size))
print('Size of development set: {}'.format(dev_size))
print('Size of testing set: {}'.format(test_size))
print('Dimension of data: {}'.format(data_dim))

out:
Size of training set: 48830
Size of development set: 5426
Size of testing set: 27622
Dimension of data: 510

The following helper functions will be reused throughout training:

def _shuffle(X, Y):
    # This function shuffles two equal-length lists/arrays, X and Y, together.
    randomize = np.arange(len(X))
    np.random.shuffle(randomize)
    return (X[randomize], Y[randomize])

def _sigmoid(z):
    # Sigmoid function can be used to calculate probability.
    # To avoid overflow, minimum/maximum output value is set.
    return np.clip(1 / (1.0 + np.exp(-z)), 1e-8, 1 - (1e-8))
    # np.clip limits every value to the range [1e-8, 1 - 1e-8]:
    # values below 1e-8 are forced to 1e-8,
    # values above 1 - 1e-8 are forced to 1 - 1e-8.

def _f(X, w, b):
    # This is the logistic regression function, parameterized by w and b
    # Arguments:
    #     X: input data, shape = [batch_size, data_dimension]
    #     w: weight vector, shape = [data_dimension, ]
    #     b: bias, scalar
    # Output:
    #     predicted probability of each row of X being positively labeled, shape = [batch_size, ]
    return _sigmoid(np.matmul(X, w) + b)

def _predict(X, w, b):
    # This function returns a truth value prediction for each row of X
    # by rounding the result of the logistic regression function.
    return np.round(_f(X, w, b)).astype(int)  # np.int is deprecated; use the builtin int

def _accuracy(Y_pred, Y_label):
    # This function calculates prediction accuracy
    acc = 1 - np.mean(np.abs(Y_pred - Y_label))
    return acc

Logistic Regression

Loss function (the sum of cross-entropy over samples) and its gradient:
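For reference, with $f_{w,b}(x) = \sigma(w^\top x + b)$ and labels $y^n \in \{0, 1\}$, the quantities computed by the code below are

$$L(w, b) = -\sum_{n}\left[y^n \ln f_{w,b}(x^n) + (1 - y^n)\ln\bigl(1 - f_{w,b}(x^n)\bigr)\right]$$

$$\frac{\partial L}{\partial w} = -\sum_{n}\bigl(y^n - f_{w,b}(x^n)\bigr)\,x^n, \qquad \frac{\partial L}{\partial b} = -\sum_{n}\bigl(y^n - f_{w,b}(x^n)\bigr)$$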

def _cross_entropy_loss(y_pred, Y_label):
    # This function computes the cross entropy.
    # Arguments:
    #     y_pred: probabilistic predictions, float vector
    #     Y_label: ground truth labels, bool vector
    # Output:
    #     cross entropy, scalar
    cross_entropy = -np.dot(Y_label, np.log(y_pred)) - np.dot((1 - Y_label), np.log(1 - y_pred))
    return cross_entropy

def _gradient(X, Y_label, w, b):
    # This function computes the gradient of cross entropy loss with respect to weight w and bias b.
    y_pred = _f(X, w, b)
    pred_error = Y_label - y_pred
    w_grad = -np.sum(pred_error * X.T, 1)
    b_grad = -np.sum(pred_error)
    return w_grad, b_grad

Training

Training uses mini-batch gradient descent: the training data is divided into many small batches; for each batch we compute the gradient and the loss, and update the model parameters accordingly. When one pass (an epoch) is finished, that is, when every mini-batch of the training set has been used once, we shuffle all the training data and re-divide it into new mini-batches for the next pass, repeating until the preset number of iterations is reached. The learning rate also decays over the course of training; the update rule is shown after the list below.

  • GD (Gradient Descent): no batching at all; the gradient is computed from the entire dataset, so it is accurate, but when the dataset is large the computation is very expensive, and since neural networks are usually non-convex, the network may converge to a local optimum near the initial point.
  • SGD (Stochastic Gradient Descent): batch size = 1; each update uses a single sample, so the gradient is noisy and the learning rate has to be lowered.
  • mini-batch SGD: SGD with a suitably chosen batch size; the noisy mini-batch gradients help, to some extent, to avoid falling straight into a local optimum near the initial point the way plain GD can, while the gradient is more accurate than SGD's, so the learning rate can be increased.
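The code below uses a simple time-decaying learning rate: at the $t$-th parameter update,

$$\eta_t = \frac{\eta_0}{\sqrt{t}}$$

with initial learning rate $\eta_0 = 0.2$ here.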
# Initialize the weights w and bias b to zero
w = np.zeros((data_dim,))  # [0, 0, 0, ..., 0]
b = np.zeros((1,))         # [0]

# Training hyperparameters
max_iter = 10
batch_size = 8
learning_rate = 0.2

# Keep the loss and accuracy of every iteration for plotting later
train_loss = []
dev_loss = []
train_acc = []
dev_acc = []

# Cumulative number of parameter updates
step = 1

# Iterative training
for epoch in range(max_iter):
    # Shuffle the training data at the start of each epoch
    X_train, Y_train = _shuffle(X_train, Y_train)

    # Mini-batch training
    for idx in range(int(np.floor(train_size / batch_size))):
        X = X_train[idx*batch_size:(idx+1)*batch_size]
        Y = Y_train[idx*batch_size:(idx+1)*batch_size]

        # Compute the gradient
        w_grad, b_grad = _gradient(X, Y, w, b)

        # Gradient descent update
        # The learning rate decays with time
        w = w - learning_rate/np.sqrt(step) * w_grad
        b = b - learning_rate/np.sqrt(step) * b_grad

        step = step + 1

    # Compute the loss and accuracy on the training set and the development set
    y_train_pred = _f(X_train, w, b)
    Y_train_pred = np.round(y_train_pred)
    train_acc.append(_accuracy(Y_train_pred, Y_train))
    train_loss.append(_cross_entropy_loss(y_train_pred, Y_train) / train_size)

    y_dev_pred = _f(X_dev, w, b)
    Y_dev_pred = np.round(y_dev_pred)
    dev_acc.append(_accuracy(Y_dev_pred, Y_dev))
    dev_loss.append(_cross_entropy_loss(y_dev_pred, Y_dev) / dev_size)

# Print the final values
print('Training loss: {}'.format(train_loss[-1]))
print('Development loss: {}'.format(dev_loss[-1]))
print('Training accuracy: {}'.format(train_acc[-1]))
print('Development accuracy: {}'.format(dev_acc[-1]))

Result:

Training loss: 0.27375098820698607
Development loss: 0.29846019916163835
Training accuracy: 0.8825107515871391
Development accuracy: 0.877441946185035

Plot the loss and accuracy curves:

import matplotlib.pyplot as plt

# Loss curve
plt.plot(train_loss)
plt.plot(dev_loss)
plt.title('Loss')
plt.legend(['train', 'dev'])
plt.savefig('loss.png')
plt.show()

# Accuracy curve
plt.plot(train_acc)
plt.plot(dev_acc)
plt.title('Accuracy')
plt.legend(['train', 'dev'])
plt.savefig('acc.png')
plt.show()

Result: the loss and accuracy curves (saved as loss.png and acc.png).

Testing

output_fpath = './output_{}.csv'

# Predict testing labels
predictions = _predict(X_test, w, b)
with open(output_fpath.format('logistic'), 'w') as f:
    f.write('id,label\n')
    for i, label in enumerate(predictions):
        f.write('{},{}\n'.format(i, label))

# Print out the most significant weights
ind = np.argsort(np.abs(w))[::-1]
with open(X_test_fpath) as f:
    content = f.readline().strip('\n').split(',')
    features = np.array(content)
for i in ind[0:10]:
    print(features[i], w[i])

Result:

Other Rel <18 never married RP of subfamily -1.5156535032617535
Other Rel <18 ever marr RP of subfamily -1.2493025752946474
Unemployed full-time 1.1489343960724647
1 0.8323252735693378
Italy -0.7951922604515268
Neither parent present -0.7749673709650178
Kentucky -0.7717486769177805
num persons worked for employer 0.7617890642364086
Householder -0.753455652297259
dividends from stocks -0.6728525747897033

Probabilistic generative model

The training and test sets are processed exactly as for logistic regression. However, since the generative model has a closed-form optimal solution, no validation (development) set is needed.

Data preprocessing

# Parse csv files to numpy arrays
with open(X_train_fpath) as f:
    next(f)
    X_train = np.array([line.strip('\n').split(',')[1:] for line in f], dtype=float)
with open(Y_train_fpath) as f:
    next(f)
    Y_train = np.array([line.strip('\n').split(',')[1] for line in f], dtype=float)
with open(X_test_fpath) as f:
    next(f)
    X_test = np.array([line.strip('\n').split(',')[1:] for line in f], dtype=float)

# Normalize training and testing data
X_train, X_mean, X_std = _normalize(X_train, train=True)
X_test, _, _ = _normalize(X_test, train=False, specified_column=None, X_mean=X_mean, X_std=X_std)

Means and covariance matrix

# Compute the means of class 0 and class 1 separately
X_train_0 = np.array([x for x, y in zip(X_train, Y_train) if y == 0])
X_train_1 = np.array([x for x, y in zip(X_train, Y_train) if y == 1])

mean_0 = np.mean(X_train_0, axis=0)
mean_1 = np.mean(X_train_1, axis=0)

# Compute the covariances of class 0 and class 1 separately
cov_0 = np.zeros((data_dim, data_dim))
cov_1 = np.zeros((data_dim, data_dim))

for x in X_train_0:
    cov_0 += np.dot(np.transpose([x - mean_0]), [x - mean_0]) / X_train_0.shape[0]
for x in X_train_1:
    cov_1 += np.dot(np.transpose([x - mean_1]), [x - mean_1]) / X_train_1.shape[0]

# Shared covariance = weighted average of the per-class covariances
cov = (cov_0 * X_train_0.shape[0] + cov_1 * X_train_1.shape[0]) / (X_train_0.shape[0] + X_train_1.shape[0])
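In symbols, with $N_0$ and $N_1$ samples in class 0 and class 1, the code above computes

$$\mu_k = \frac{1}{N_k}\sum_{x \in C_k} x, \qquad \Sigma_k = \frac{1}{N_k}\sum_{x \in C_k}(x - \mu_k)(x - \mu_k)^\top, \qquad \Sigma = \frac{N_0\,\Sigma_0 + N_1\,\Sigma_1}{N_0 + N_1}$$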

Computing the weights and bias

The weight vector and the bias can be computed directly in closed form; for the derivation see lecture video P10 Classification:
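Under the shared-covariance Gaussian model, the posterior probability of class 0 is a sigmoid of a linear function of $x$:

$$P(C_0 \mid x) = \sigma(w^\top x + b), \qquad w = \Sigma^{-1}(\mu_0 - \mu_1), \qquad b = -\tfrac{1}{2}\mu_0^\top\Sigma^{-1}\mu_0 + \tfrac{1}{2}\mu_1^\top\Sigma^{-1}\mu_1 + \ln\frac{N_0}{N_1}$$

Because $w$ here points from $\mu_1$ toward $\mu_0$, _f(X, w, b) estimates the probability of class 0, which is why the prediction below is 1 - _predict(...).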

# Compute the inverse of the covariance matrix.
# The covariance matrix may be singular, so calling np.linalg.inv() directly may fail.
# Via SVD, the inverse can be obtained quickly and accurately.
u, s, v = np.linalg.svd(cov, full_matrices=False)
inv = np.matmul(v.T * 1 / s, u.T)

# Compute w and b
w = np.dot(inv, mean_0 - mean_1)
b = (-0.5) * np.dot(mean_0, np.dot(inv, mean_0)) + 0.5 * np.dot(mean_1, np.dot(inv, mean_1)) \
    + np.log(float(X_train_0.shape[0]) / X_train_1.shape[0])

# Compute the accuracy on the training set
Y_train_pred = 1 - _predict(X_train, w, b)
# Do not confuse this with logistic regression:
# here _predict(X_train, w, b) gives the probability of belonging to class 0
print('Training accuracy: {}'.format(_accuracy(Y_train_pred, Y_train)))

Result:

Training accuracy: 0.8719404305514598
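A note on the SVD step above: np.linalg.svd returns $U$, $s$, and $V^\top$ (the variable v in the code actually holds $V^\top$), so v.T * 1 / s forms $V\,\mathrm{diag}(1/s)$ by scaling the columns of $V$, and np.matmul(v.T * 1 / s, u.T) is exactly

$$\Sigma^{-1} = V\,\mathrm{diag}(1/s)\,U^\top \quad\text{for}\quad \Sigma = U\,\mathrm{diag}(s)\,V^\top$$

For a near-singular $\Sigma$, tiny singular values make this numerically delicate; zeroing out $1/s$ for tiny $s$ would give the pseudo-inverse instead.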

Prediction:

# Predict testing labels
predictions = 1 - _predict(X_test, w, b)
with open(output_fpath.format('generative'), 'w') as f:
    f.write('id,label\n')
    for i, label in enumerate(predictions):
        f.write('{},{}\n'.format(i, label))

# Print out the most significant weights
ind = np.argsort(np.abs(w))[::-1]
with open(X_test_fpath) as f:
    content = f.readline().strip('\n').split(',')
    features = np.array(content)
for i in ind[0:10]:
    print(features[i], w[i])

Result:

Professional specialty -7.375
7 6.8125
Retail trade 6.76953125
29 6.7109375
MSA to nonMSA -6.5
Finance insurance and real estate -6.3125
Different state same division 6.078125
Abroad -6.0
Sales -5.15625
34 -5.041015625

Model modifications

Adding quadratic features

def _add_feature(X):
    X_2 = np.power(X, 2)
    X = np.concatenate([X, X_2], axis=1)
    return X

# Add quadratic features
X_train = _add_feature(X_train)
X_test = _add_feature(X_test)

Adagrad

# Accumulators needed by adagrad
adagrad_w = 0
adagrad_b = 0
# Guard against division by zero in adagrad
eps = 1e-8

# Iterative training
for epoch in range(max_iter):
    # Shuffle the training data at the start of each epoch
    X_train, Y_train = _shuffle(X_train, Y_train)

    # Mini-batch training
    for idx in range(int(np.floor(train_size / batch_size))):
        X = X_train[idx * batch_size:(idx + 1) * batch_size]
        Y = Y_train[idx * batch_size:(idx + 1) * batch_size]

        # Compute the gradient
        w_grad, b_grad = _gradient(X, Y, w, b)

        adagrad_w += w_grad**2
        adagrad_b += b_grad**2

        # Adagrad update of w and b
        w = w - learning_rate / (np.sqrt(adagrad_w + eps)) * w_grad
        b = b - learning_rate / (np.sqrt(adagrad_b + eps)) * b_grad
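One caveat, not shown in the snippet above: _add_feature doubles the feature dimension, and the earlier training loop already consumed the old w, so the parameters (and data_dim) have to be re-initialized before running the Adagrad loop. A minimal sketch of that setup, assuming the train/dev split and normalization have been redone as before and reusing the hyperparameter names from the logistic regression section:

# Re-initialization sketch (assumption: X_train/X_dev already rebuilt with quadratic features)
data_dim = X_train.shape[1]   # now twice the original dimension
w = np.zeros((data_dim,))
b = np.zeros((1,))
max_iter = 10
batch_size = 8
learning_rate = 0.2

For reference, per step $t$ Adagrad divides the learning rate by the root of the accumulated squared gradients, elementwise:

$$w_{t+1} = w_t - \frac{\eta}{\sqrt{\sum_{\tau \le t} g_\tau^2 + \epsilon}}\, g_t$$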
