3. Predictive Modeling Notes

Published: 2023/12/1

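Predictive Models

1. Introduction

Predictive modeling is an analytical technique for predicting the future behavior of a system. It consists of a family of algorithms that identify the relationship between independent input variables and a target response. A mathematical model is built from observed values and is then used to predict future outcomes.
A predictive model is built from a number of features that may affect the system's behavior. When working on a problem, first decide which factors are likely to influence the system's behavior, then add those factors to the feature set before training the model.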

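2. Building a Linear Classifier with SVM

An SVM (support vector machine) is a supervised learning model used to build classifiers and regressors. By solving a system of mathematical equations, an SVM finds the best separating boundary between two sets of data. We first reuse the simple-classifier code from Chapter 2 to load the data, split it by class, and plot it.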

    # 1. Load the data
    import numpy as np
    import matplotlib.pyplot as plt

    input_file = 'data_multivar.txt'

    def load_data(input_file):
        X = []
        y = []
        with open(input_file, 'r') as f:
            for line in f:
                data = [float(x) for x in line.split(',')]
                X.append(data[:-1])
                y.append(data[-1])
        return np.array(X), np.array(y)

    X, y = load_data(input_file)

    # 2. Split the points by class label
    class_0 = X[y == 0]
    class_1 = X[y == 1]
    print(class_0)

    # 3. Plot: class 0 as solid squares, class 1 as hollow squares
    plt.figure()
    plt.scatter(class_0[:, 0], class_0[:, 1], facecolors='black', edgecolors='black', marker='s')
    plt.scatter(class_1[:, 0], class_1[:, 1], facecolors='none', edgecolors='black', marker='s')
    plt.show()

    # Helper that draws a classifier's decision regions together with the data
    def plot_classifier(classifier, X, y):
        # Plot range: min and max of each axis plus a margin of 1.0
        x_min, x_max = X[:, 0].min() - 1.0, X[:, 0].max() + 1.0
        y_min, y_max = X[:, 1].min() - 1.0, X[:, 1].max() + 1.0

        # Grid step size and the grid itself
        step_size = 0.01
        x_values, y_values = np.meshgrid(np.arange(x_min, x_max, step_size),
                                         np.arange(y_min, y_max, step_size))

        # Classify every grid point
        mesh_output = classifier.predict(np.c_[x_values.ravel(), y_values.ravel()])
        mesh_output = mesh_output.reshape(x_values.shape)

        # Draw the regions in a gray color scheme, then the data points on top
        plt.figure()
        plt.pcolormesh(x_values, y_values, mesh_output, cmap=plt.cm.gray)
        plt.scatter(X[:, 0], X[:, 1], c=y, s=80, edgecolors='black', linewidths=1, cmap=plt.cm.Paired)

        # Axis limits and ticks
        plt.xlim(x_values.min(), x_values.max())
        plt.ylim(y_values.min(), y_values.max())
        plt.xticks(np.arange(int(X[:, 0].min() - 1), int(X[:, 0].max() + 1), 1.0))
        plt.yticks(np.arange(int(X[:, 1].min() - 1), int(X[:, 1].max() + 1), 1.0))
        plt.show()

In the resulting plot, the hollow and solid squares mark the two different classes. Next, an SVM is used to separate the classes. The code is as follows:

    # Train a linear SVM
    from sklearn.svm import SVC
    from sklearn.model_selection import train_test_split
    from sklearn.metrics import classification_report

    X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.25, random_state=5)
    params = {'kernel': 'linear'}
    classifier = SVC(**params)

    # Fit the linear SVM and plot its boundary on the training data
    classifier.fit(X_train, y_train)
    plot_classifier(classifier, X_train, y_train)

    # Predict on the test set and plot it
    y_test_pred = classifier.predict(X_test)
    plot_classifier(classifier, X_test, y_test)

    # Classification report for the training set
    target_names = ['Class-' + str(int(i)) for i in sorted(set(y))]
    print(classification_report(y_train, classifier.predict(X_train), target_names=target_names))

    # Classification report for the test set
    print(classification_report(y_test, y_test_pred, target_names=target_names))
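
For a linear kernel, the fitted boundary can be read directly off the model. The sketch below is not part of the original notes: it uses a small synthetic two-class dataset (an assumption, since data_multivar.txt is not reproduced here) and checks that scikit-learn's decision_function equals w . x + b built from the coef_ and intercept_ attributes.

```python
import numpy as np
from sklearn.svm import SVC

# Synthetic stand-in data (assumption): two well-separated 2-D clusters
rng = np.random.RandomState(0)
X_demo = np.vstack([rng.randn(20, 2) - 3.0, rng.randn(20, 2) + 3.0])
y_demo = np.array([0] * 20 + [1] * 20)

linear_svm = SVC(kernel='linear')
linear_svm.fit(X_demo, y_demo)

# For a linear kernel the boundary is the line w . x + b = 0
w = linear_svm.coef_[0]
b = linear_svm.intercept_[0]
manual_scores = X_demo @ w + b

print('w =', w, 'b =', b)
print('matches decision_function:',
      np.allclose(manual_scores, linear_svm.decision_function(X_demo)))
```

Points with a positive score fall on one side of the line and points with a negative score on the other, which is why a single straight boundary cannot handle data where one class surrounds the other.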

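3. Building a Nonlinear Classifier with SVM

The classification reports above show that the linear classifier does not perform well. The initial plot also shows the solid points completely surrounded by the hollow ones, so a nonlinear classifier is worth trying.
SVMs offer many options for building nonlinear classifiers; the choice is made through the kernel function. For simplicity, consider the following two cases.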

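3.1. Polynomial Kernel

Replace params = {'kernel': 'linear'} from the linear example with params = {'kernel': 'poly', 'degree': 3}, where degree=3 selects a cubic polynomial. Increasing the degree lets the boundary curve more, but training takes longer and the computation becomes more expensive.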
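
To make the role of degree concrete, here is a minimal NumPy sketch of the polynomial kernel in the form scikit-learn documents, K(x, x') = (gamma * <x, x'> + coef0) ** degree. The helper name poly_kernel, its default arguments, and the sample vectors are illustrative assumptions, not from the original notes.

```python
import numpy as np

def poly_kernel(x, x_prime, degree=3, gamma=1.0, coef0=0.0):
    """Polynomial kernel value for two vectors (hypothetical helper)."""
    return (gamma * np.dot(x, x_prime) + coef0) ** degree

a = np.array([1.0, 2.0])
b = np.array([3.0, 4.0])

# The dot product is 1*3 + 2*4 = 11, so a higher degree grows the value quickly
for d in (1, 2, 3):
    print('degree', d, '-->', poly_kernel(a, b, degree=d))
```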

3.2. Radial Basis Function (RBF)

Replace params = {'kernel': 'linear'} from the linear example with params = {'kernel': 'rbf'}.
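
The effect of switching kernels can be reproduced without the original data file. The sketch below assumes a synthetic dataset from sklearn.datasets.make_circles (one class enclosed by the other, similar in spirit to the data described earlier) and compares test accuracy for the linear and RBF kernels. The variable names are illustrative.

```python
from sklearn.datasets import make_circles
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC

# Synthetic stand-in data (assumption): an inner circle surrounded by an outer ring
X_ring, y_ring = make_circles(n_samples=300, factor=0.3, noise=0.05, random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X_ring, y_ring, test_size=0.25, random_state=5)

scores = {}
for kernel in ('linear', 'rbf'):
    model = SVC(kernel=kernel)
    model.fit(X_tr, y_tr)
    scores[kernel] = model.score(X_te, y_te)
    print(kernel, 'test accuracy:', round(scores[kernel], 3))
```

On data of this shape a straight line can do little better than guessing, while the RBF kernel can draw a closed boundary around the inner class.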

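4. Handling Class Imbalance

Real-world data often contains far more samples of one class than of the others. In that case the classifier becomes biased and the boundary no longer reflects the true structure of the data, so the imbalance has to be compensated for.
To see this, take the linear SVM example above and switch the data file to data_multivar_imbalance.txt.

The resulting plot shows no boundary at all, because the classifier fails to distinguish the two classes. Changing the parameters to params = {'kernel': 'linear', 'class_weight': 'balanced'} fixes this.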
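
As documented for scikit-learn, class_weight='balanced' sets each class weight to n_samples / (n_classes * number of samples in that class), so mistakes on the rare class cost more. Below is a small NumPy sketch of that formula, using made-up label counts (90 vs. 10) as an assumption.

```python
import numpy as np

# Made-up imbalanced labels (assumption): 90 samples of class 0, 10 of class 1
y_imbalanced = np.array([0] * 90 + [1] * 10)

counts = np.bincount(y_imbalanced)
n_samples = len(y_imbalanced)
n_classes = len(counts)

# 'balanced' heuristic: n_samples / (n_classes * count_per_class)
balanced_weights = n_samples / (n_classes * counts)
for label, weight in enumerate(balanced_weights):
    print('class', label, 'weight', round(weight, 3))
```

With these counts the minority class is weighted nine times more heavily than the majority class, which pushes the boundary back toward the region where the majority class dominates.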

5. Extracting Confidence Measures

When a new data point is assigned to one of the known classes, the SVM can also report how confident it is in that output.

    input_datapoints = np.array([[2, 1.5], [8, 9], [4.8, 5.2], [4, 4], [2.5, 7], [7.6, 2], [5.4, 5.9]])

    # Signed score of each point relative to the boundary
    # (larger magnitude means farther from the boundary)
    for point in input_datapoints:
        print(point, '-->', classifier.decision_function([point])[0])

    # predict_proba requires probability=True when the SVC is created
    params = {'kernel': 'rbf', 'probability': True}
    classifier = SVC(**params)
    classifier.fit(X_train, y_train)

    # Class probabilities (confidence) for each input point
    for point in input_datapoints:
        print(point, '-->', classifier.predict_proba([point])[0])

    plot_classifier(classifier, input_datapoints, [0] * len(input_datapoints))
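
Current scikit-learn versions expect a 2-D array (one row per sample) for decision_function and predict_proba, so a whole batch of points can be scored in a single call. Here is a minimal sketch on synthetic blobs; the data and variable names are assumptions.

```python
import numpy as np
from sklearn.datasets import make_blobs
from sklearn.svm import SVC

# Synthetic stand-in data (assumption): two Gaussian blobs
X_blob, y_blob = make_blobs(n_samples=100, centers=2, random_state=0)

confidence_svm = SVC(kernel='rbf', probability=True, random_state=0)
confidence_svm.fit(X_blob, y_blob)

new_points = np.array([[0.0, 4.0], [2.0, 1.0], [1.0, 2.5]])

# One call scores every row: signed boundary scores and class probabilities
boundary_scores = confidence_svm.decision_function(new_points)
probabilities = confidence_svm.predict_proba(new_points)

for point, score, proba in zip(new_points, boundary_scores, probabilities):
    print(point, '--> score', round(score, 3), 'probabilities', proba.round(3))
```

The probabilities come from a separate calibration step fitted with internal cross-validation, so on borderline points they may not agree exactly with the sign of the decision score.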

6. Finding Optimal Hyperparameters

Hyperparameters are critical to a classifier's performance. The search proceeds as follows:

    # 1. Define the parameter grid; candidates are compared by cross-validation
    from sklearn import svm
    from sklearn.model_selection import GridSearchCV
    from sklearn.metrics import classification_report

    parameter_grid = [
        {'kernel': ['linear'], 'C': [1, 10, 50, 600]},
        {'kernel': ['poly'], 'degree': [2, 3]},
        {'kernel': ['rbf'], 'gamma': [0.01, 0.001], 'C': [1, 10, 50, 600]},
    ]
    metrics = ['precision', 'recall_weighted']

    # 2. Search for the best hyperparameters under each metric
    for metric in metrics:
        # Create the search object and train it
        classifier = GridSearchCV(svm.SVC(C=1), parameter_grid, cv=5, scoring=metric)
        classifier.fit(X_train, y_train)

        # Mean cross-validated score of every parameter combination
        results = classifier.cv_results_
        for params, avg_score in zip(results['params'], results['mean_test_score']):
            print(params, '-->', round(avg_score, 3))

        # Best parameter set
        print('Best parameter set:', classifier.best_params_)

        # Performance report on the test set
        y_true, y_pred = y_test, classifier.predict(X_test)
        print(classification_report(y_true, y_pred))
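
Before launching a search it helps to know how much work the grid implies. This sketch uses sklearn.model_selection.ParameterGrid to enumerate the same grid as above; the variable names candidates and n_folds are illustrative.

```python
from sklearn.model_selection import ParameterGrid

parameter_grid = [
    {'kernel': ['linear'], 'C': [1, 10, 50, 600]},
    {'kernel': ['poly'], 'degree': [2, 3]},
    {'kernel': ['rbf'], 'gamma': [0.01, 0.001], 'C': [1, 10, 50, 600]},
]

candidates = list(ParameterGrid(parameter_grid))
n_folds = 5  # matches cv=5 in the search

# 4 linear + 2 poly + 8 rbf combinations, each fitted once per fold
print('candidates:', len(candidates))
print('cross-validation fits per metric:', len(candidates) * n_folds)
```

Each sub-dictionary is expanded independently, so the grid size is the sum of the per-kernel products rather than one large product. With the default refit=True, the search also trains the best candidate once more on the full training set.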

7. Building an Event Predictor

This example is much like the previous ones; the main point is to remember the steps.

    # 1. Read the data
    input_file = 'building_event_multiclass.txt'
    # input_file = 'building_event_binary.txt'

    X = []
    with open(input_file, 'r') as f:
        for line in f:
            data = line.strip().split(',')
            # Keep the first field and everything from the third field onward
            X.append([data[0]] + data[2:])
    X = np.array(X)

    # 2. Encode the non-numeric columns as integers
    from sklearn import preprocessing

    label_encoder = []
    X_encoder = np.empty(X.shape)
    for i, item in enumerate(X[0]):
        if item.isdigit():
            X_encoder[:, i] = X[:, i].astype(float)
        else:
            label_encoder.append(preprocessing.LabelEncoder())
            X_encoder[:, i] = label_encoder[-1].fit_transform(X[:, i])

    # The last column is the event label; the remaining columns are features
    X = X_encoder[:, :-1].astype(int)
    y = X_encoder[:, -1].astype(int)

    # 3. Train the classifier
    from sklearn.svm import SVC

    params = {'kernel': 'rbf', 'probability': True, 'class_weight': 'balanced'}
    classifier = SVC(**params)
    classifier.fit(X, y)

    # 4. Cross-validation
    from sklearn.model_selection import cross_val_score

    accuracy = cross_val_score(classifier, X, y, scoring='accuracy', cv=3)
    print('accuracy:', accuracy.mean())

    # 5. Predict the event for a new data point
    input_data = ['Tuesday', '12:30:00', '21', '23']
    input_data_encoder = [-1] * len(input_data)
    count = 0
    for i, item in enumerate(input_data):
        if item.isdigit():
            input_data_encoder[i] = int(item)
        else:
            # transform expects a 1-D sequence and returns an array
            input_data_encoder[i] = int(label_encoder[count].transform([item])[0])
            count = count + 1

    # predict expects a 2-D array with one row per sample
    result = classifier.predict(np.array([input_data_encoder]))
    print('result:', label_encoder[-1].inverse_transform(result)[0])
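
The encoding step relies on how LabelEncoder maps text to integers and back. Here is a minimal, self-contained sketch on a made-up day-of-week column (the sample values are an assumption, not taken from the data file):

```python
import numpy as np
from sklearn.preprocessing import LabelEncoder

# Made-up day-of-week column (assumption), standing in for one text column
days = np.array(['Tuesday', 'Monday', 'Tuesday', 'Friday'])

encoder = LabelEncoder()
codes = encoder.fit_transform(days)       # learn the classes and encode
print('classes:', encoder.classes_)       # sorted unique labels
print('codes:', codes)

# transform and inverse_transform both take 1-D sequences, even for one value
tuesday_code = encoder.transform(['Tuesday'])[0]
print('Tuesday -->', tuesday_code)
print(tuesday_code, '-->', encoder.inverse_transform([tuesday_code])[0])
```

Codes are assigned in sorted label order, and transform raises a ValueError for a label that was not seen during fitting, so a new data point can only use values that already occur in the training column.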

8. Estimating Traffic

So far the SVM has been used as a classifier. Here is an example of using it as a regressor:

    # 1. Read the data
    X = []
    input_file = 'traffic_data.txt'
    with open(input_file, 'r') as f:
        for line in f:
            data = line.strip().split(',')
            X.append(data)
    X = np.array(X)

    # 2. Encode the non-numeric columns as integers
    from sklearn import preprocessing

    label_encoder = []
    X_encoder = np.empty(X.shape)
    for i, item in enumerate(X[0]):
        if item.isdigit():
            X_encoder[:, i] = X[:, i].astype(float)
        else:
            label_encoder.append(preprocessing.LabelEncoder())
            X_encoder[:, i] = label_encoder[-1].fit_transform(X[:, i])

    # The last column is the regression target; the remaining columns are features
    X = X_encoder[:, :-1].astype(int)
    y = X_encoder[:, -1].astype(int)

    # 3. Support vector regression
    from sklearn.svm import SVR

    # C penalizes training errors; errors smaller than epsilon are not penalized
    params = {'kernel': 'rbf', 'C': 10.0, 'epsilon': 0.2}
    regressor = SVR(**params)
    regressor.fit(X, y)

    # 4. Evaluate (here on the training data itself)
    from sklearn.metrics import mean_absolute_error

    y_pred = regressor.predict(X)
    print('mean_absolute_error:', mean_absolute_error(y, y_pred))

    # 5. Predict a new value
    input_data = ['Tuesday', '13:35', 'San Francisco', 'yes']
    input_data_encoder = [-1] * len(input_data)
    count = 0
    for i, item in enumerate(input_data):
        if item.isdigit():
            input_data_encoder[i] = int(item)
        else:
            # transform expects a 1-D sequence and returns an array
            input_data_encoder[i] = int(label_encoder[count].transform([item])[0])
            count = count + 1

    # predict expects a 2-D array with one row per sample
    result = regressor.predict(np.array([input_data_encoder]))
    print(result)
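
The epsilon parameter is easiest to understand through the epsilon-insensitive loss that support vector regression is built on: a prediction within epsilon of the target costs nothing, and beyond that the cost grows linearly. Below is a minimal NumPy sketch; the helper name and the sample numbers are assumptions for illustration.

```python
import numpy as np

def epsilon_insensitive_loss(y_true, y_pred, epsilon=0.2):
    """Per-sample loss that ignores errors inside the epsilon band (illustrative helper)."""
    residual = np.abs(np.asarray(y_true) - np.asarray(y_pred))
    return np.maximum(0.0, residual - epsilon)

# Made-up targets and predictions (assumption)
y_true = np.array([10.0, 10.0, 10.0, 10.0])
y_pred = np.array([10.1, 9.8, 10.5, 9.0])

losses = epsilon_insensitive_loss(y_true, y_pred, epsilon=0.2)
print(losses)
```

The first two predictions fall inside the band and contribute no loss; only the last two are penalized. A larger epsilon therefore tolerates more error without penalty, while C controls how strongly the remaining errors are punished.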

Reposted from: https://www.cnblogs.com/NSGUF/p/8278119.html
