
Grid Search

Published: 2023/12/14


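Grid search is an exhaustive search over specified parameter values: the parameters of an estimator are optimized through cross-validation to obtain the best-performing learning algorithm.

In other words, the candidate values of each parameter are combined in every possible way, and the full list of combinations forms the "grid". Each combination is then used to train an SVM, and its performance is evaluated with cross-validation. After the fit function has tried every parameter combination, it returns a suitable classifier that is automatically set to the best combination. The chosen parameter values are available through `clf.best_params_`.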
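The grid construction and exhaustive search described above can be sketched in plain Python. This is only an illustration, not how any particular library implements it: the helper names `build_grid` and `exhaustive_search`, the candidate values for `C` and `gamma`, and the `toy_score` function (a stand-in for "train the model and cross-validate it") are all assumptions made for the example.

```python
from itertools import product


def build_grid(param_values):
    """Return every combination of the candidate values as a list of dicts."""
    names = list(param_values)
    value_lists = [param_values[name] for name in names]
    return [dict(zip(names, combo)) for combo in product(*value_lists)]


def exhaustive_search(param_values, evaluate):
    """Score every grid point with `evaluate`; return (best_params, best_score)."""
    best_params, best_score = None, float("-inf")
    for params in build_grid(param_values):
        score = evaluate(params)
        if score > best_score:
            best_params, best_score = params, score
    return best_params, best_score


# Hypothetical candidate values for an SVM's C and gamma parameters.
candidates = {"C": [0.1, 1, 10], "gamma": [0.01, 0.1]}
grid = build_grid(candidates)  # 3 * 2 = 6 combinations


def toy_score(params):
    """Toy stand-in for a cross-validated score; it peaks at C=1, gamma=0.1."""
    return -abs(params["C"] - 1) - abs(params["gamma"] - 0.1)


best_params, best_score = exhaustive_search(candidates, toy_score)
print(len(grid), best_params)
```

The number of grid points is the product of the number of candidates per parameter, so the cost of the search grows quickly as parameters are added.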

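Cross-validation and grid search

Cross-validation and grid search are two important, fundamental concepts in machine learning, but they are not easy to grasp when you are just starting out. When I began learning I did not understand them well, so this article sorts out the two concepts.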

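Grid search

Grid search has a grand-sounding name, but put simply: you manually list every value you want to try for the parameters of a model, and the program exhaustively runs all of them for you. For a decision tree, the maximum tree depth is a commonly tuned parameter; for AdaBoost, it is the number of weak classifiers.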

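Scoring method

To decide which of the manually specified candidate values is best, we need a suitable scoring method. The choice depends on the problem at hand: it may be accuracy, F1-score, F-beta, precision, recall, and so on.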
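To make the scoring options concrete, here is a minimal pure-Python sketch of accuracy, precision, recall and F-beta for binary labels. The function names and the sample labels are assumptions made for illustration; in practice scikit-learn provides equivalents such as `accuracy_score` and `fbeta_score`.

```python
def confusion_counts(y_true, y_pred):
    """Count true/false positives and negatives for binary labels (1 = positive)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    return tp, fp, fn, tn


def accuracy(y_true, y_pred):
    tp, fp, fn, tn = confusion_counts(y_true, y_pred)
    return (tp + tn) / len(y_true)


def precision(y_true, y_pred):
    tp, fp, _, _ = confusion_counts(y_true, y_pred)
    return tp / (tp + fp) if tp + fp else 0.0


def recall(y_true, y_pred):
    tp, _, fn, _ = confusion_counts(y_true, y_pred)
    return tp / (tp + fn) if tp + fn else 0.0


def fbeta(y_true, y_pred, beta=1.0):
    """Weighted harmonic mean of precision and recall; beta=1 gives the F1-score."""
    p, r = precision(y_true, y_pred), recall(y_true, y_pred)
    denom = beta ** 2 * p + r
    return (1 + beta ** 2) * p * r / denom if denom else 0.0


# Made-up labels: 2 true positives, 2 false positives, 1 false negative, 3 true negatives.
y_true = [1, 1, 1, 0, 0, 0, 0, 0]
y_pred = [1, 1, 0, 1, 1, 0, 0, 0]

print(accuracy(y_true, y_pred))         # 5 of 8 predictions are correct
print(precision(y_true, y_pred))        # 2 of 4 predicted positives are real
print(recall(y_true, y_pred))           # 2 of 3 real positives are found
print(fbeta(y_true, y_pred, beta=0.5))  # beta < 1 weights precision more heavily
```

The same predictions rank differently under different metrics, which is why the scoring method has to be fixed before candidate parameters are compared.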

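Cross-validation

With a good scoring method in hand, can a single result show that one parameter combination is better than another? Clearly that is not rigorous. As our primary school teachers told us, take the average. This is where cross-validation comes in. K-fold cross-validation is used below to introduce the idea.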

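First, split the data.
Split the original dataset into a training set and a test set, for example in an 8:2 ratio.
The training set is used to train the model, and the test set is used to measure the model's accuracy.
Note: never use the test set for training. That would be like memorizing the answers before sitting the exam.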
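As a rough sketch of the 8:2 split, the following standard-library snippet shuffles the sample indices and holds out 20% of them. The helper name `split_train_test`, its defaults and the toy data are assumptions for illustration; the article's own example further down uses scikit-learn's `train_test_split` for this step.

```python
import random


def split_train_test(samples, test_ratio=0.2, seed=0):
    """Shuffle indices with a fixed seed and hold out `test_ratio` of the samples."""
    indices = list(range(len(samples)))
    random.Random(seed).shuffle(indices)
    n_test = int(round(len(samples) * test_ratio))
    test_idx = set(indices[:n_test])
    train = [s for i, s in enumerate(samples) if i not in test_idx]
    test = [s for i, s in enumerate(samples) if i in test_idx]
    return train, test


data = list(range(10))  # ten toy samples
train, test = split_train_test(data)
print(len(train), len(test))  # 8 training samples and 2 test samples
```

Because every sample lands in exactly one of the two sets, nothing from the test set can leak into training.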

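Validation
In K-fold cross-validation the training data is divided into K parts: K-1 parts are used for training and the remaining part is used for validation.

This process is repeated K times, so that each part serves as the validation data once. The K scores produced by the scoring method chosen in advance are averaged and returned, and the parameter combination with the highest average score is then selected. That completes the cross-validation process.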
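The K-fold procedure can be sketched as follows. Everything here is an illustrative assumption rather than the article's code: the helper names, the tiny one-dimensional dataset (which contains one deliberately noisy label at x = 2.4), and the use of a simple nearest-neighbour classifier whose tunable parameter is the number of neighbours.

```python
def k_fold_indices(n_samples, k):
    """Yield (train_idx, val_idx) pairs; each sample is validated exactly once."""
    fold_sizes = [n_samples // k + (1 if i < n_samples % k else 0) for i in range(k)]
    start = 0
    for size in fold_sizes:
        val_idx = list(range(start, start + size))
        train_idx = list(range(start)) + list(range(start + size, n_samples))
        yield train_idx, val_idx
        start += size


def knn_predict(train, x, n_neighbors):
    """Majority vote among the n_neighbors training points closest to x."""
    nearest = sorted(train, key=lambda pair: abs(pair[0] - x))[:n_neighbors]
    votes = sum(label for _, label in nearest)
    return 1 if 2 * votes > len(nearest) else 0


def cross_val_mean(xs, ys, k, n_neighbors):
    """Average validation accuracy over k folds for one parameter value."""
    scores = []
    for train_idx, val_idx in k_fold_indices(len(xs), k):
        train = [(xs[i], ys[i]) for i in train_idx]
        correct = 0
        for i in val_idx:
            if knn_predict(train, xs[i], n_neighbors) == ys[i]:
                correct += 1
        scores.append(correct / len(val_idx))
    return sum(scores) / len(scores)


# Toy data: low values are class 0, high values are class 1, except x = 2.4.
xs = [1.0, 9.0, 2.0, 8.0, 3.0, 7.0, 2.4, 8.5, 1.5, 7.5]
ys = [0, 1, 0, 1, 0, 1, 1, 1, 0, 1]

# Mean 5-fold score for each candidate, then keep the candidate with the highest mean.
mean_scores = {n: cross_val_mean(xs, ys, k=5, n_neighbors=n) for n in (1, 3)}
best_n = max(mean_scores, key=mean_scores.get)
print(mean_scores, best_n)
```

On this toy data a single neighbour is misled by the noisy point in several folds, so the averaged score favours three neighbours, which is the kind of comparison a single train/validation run could easily get wrong.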

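### Example

The following simple example (predicting whether annual income exceeds $50,000) illustrates how grid search and cross-validation are used. The dataset comes from the [UCI Machine Learning Repository](https://archive.ics.uci.edu/ml/datasets/Census+Income).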

```python
import numpy as np
import pandas as pd
from IPython.display import display
from sklearn.preprocessing import MinMaxScaler
from sklearn.model_selection import train_test_split
from sklearn.metrics import make_scorer, fbeta_score, accuracy_score
from sklearn.model_selection import GridSearchCV, KFold

data = pd.read_csv("census.csv")

# Split the data into features and labels
income_raw = data['income']
features_raw = data.drop('income', axis=1)

# Show part of the data
# display(features_raw.head(n=1))

# capital-gain and capital-loss are highly skewed in the raw data,
# so apply a log transform to them.
skewed = ['capital-gain', 'capital-loss']
features_raw[skewed] = data[skewed].apply(lambda x: np.log(x + 1))

# Normalize the numerical features so that all features are treated equally.
# Scale features_raw (not data) so the log transform above is preserved.
scaler = MinMaxScaler()
numerical = ['age', 'education-num', 'capital-gain', 'capital-loss', 'hours-per-week']
features_raw[numerical] = scaler.fit_transform(features_raw[numerical])
# display(features_raw.head(n=1))

# One-hot encoding: convert non-numeric columns to numeric ones
features = pd.get_dummies(features_raw)
income = (income_raw == '>50K').astype(int)  # '>50K' -> 1, '<=50K' -> 0

# Split into training and test sets
X_train, X_test, y_train, y_test = train_test_split(
    features, income, test_size=0.2, random_state=0)

# AdaBoost
from sklearn.ensemble import AdaBoostClassifier
clf_Ada = AdaBoostClassifier(random_state=0)

# Decision tree
from sklearn.tree import DecisionTreeClassifier
clf_Tree = DecisionTreeClassifier(random_state=0)

# KNN
from sklearn.neighbors import KNeighborsClassifier
clf_KNN = KNeighborsClassifier()

# SVM
from sklearn.svm import SVC
clf_svm = SVC(random_state=0)

# Logistic regression
from sklearn.linear_model import LogisticRegression
clf_log = LogisticRegression(random_state=0)

# Random forest
from sklearn.ensemble import RandomForestClassifier
clf_forest = RandomForestClassifier(random_state=0)

# GBDT
from sklearn.ensemble import GradientBoostingClassifier
clf_gbdt = GradientBoostingClassifier(random_state=0)

# GaussianNB
from sklearn.naive_bayes import GaussianNB
clf_NB = GaussianNB()

scorer = make_scorer(accuracy_score)

# Parameter tuning (only the decision tree is tuned here)
kfold = KFold(n_splits=10)

# Decision tree: try max_depth from 1 to 9
parameter_tree = {'max_depth': list(range(1, 10))}
# return_train_score=True keeps the *_train_score columns in cv_results_
grid = GridSearchCV(clf_Tree, parameter_tree, scoring=scorer, cv=kfold,
                    return_train_score=True)
grid = grid.fit(X_train, y_train)

print("best score: {}".format(grid.best_score_))
display(pd.DataFrame(grid.cv_results_).T)
```

best score: 0.855737070514

| | 0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 |
|---|---|---|---|---|---|---|---|---|---|
| mean_fit_time | 0.0562535 | 0.0692133 | 0.0885126 | 0.110233 | 0.128337 | 0.158719 | 0.17124 | 0.193637 | 0.223979 |
| mean_score_time | 0.00240474 | 0.00228212 | 0.00221529 | 0.0026047 | 0.00226772 | 0.00254297 | 0.00231481 | 0.00246696 | 0.00256622 |
| mean_test_score | 0.75114 | 0.823811 | 0.839345 | 0.839926 | 0.846671 | 0.852392 | 0.851508 | 0.853139 | 0.855737 |
| mean_train_score | 0.75114 | 0.82421 | 0.839628 | 0.840503 | 0.847878 | 0.853329 | 0.855264 | 0.859202 | 0.863667 |
| param_max_depth | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 |
| params | {u'max_depth': 1} | {u'max_depth': 2} | {u'max_depth': 3} | {u'max_depth': 4} | {u'max_depth': 5} | {u'max_depth': 6} | {u'max_depth': 7} | {u'max_depth': 8} | {u'max_depth': 9} |
| rank_test_score | 9 | 8 | 7 | 6 | 5 | 3 | 4 | 2 | 1 |
| split0_test_score | 0.760641 | 0.8267 | 0.843836 | 0.844666 | 0.851575 | 0.855721 | 0.855445 | 0.86042 | 0.859038 |
| split0_train_score | 0.750084 | 0.823828 | 0.839184 | 0.83943 | 0.847538 | 0.852913 | 0.854295 | 0.859947 | 0.863233 |
| split1_test_score | 0.758154 | 0.821172 | 0.839138 | 0.842454 | 0.845218 | 0.849641 | 0.847706 | 0.850746 | 0.852957 |
| split1_train_score | 0.750361 | 0.824442 | 0.839706 | 0.845911 | 0.850088 | 0.854203 | 0.855831 | 0.861482 | 0.864984 |
| split2_test_score | 0.754837 | 0.824212 | 0.840243 | 0.84052 | 0.8466 | 0.854616 | 0.854339 | 0.854063 | 0.856551 |
| split2_train_score | 0.750729 | 0.824718 | 0.839031 | 0.839307 | 0.847323 | 0.852237 | 0.854203 | 0.859578 | 0.86397 |
| split3_test_score | 0.73162 | 0.820619 | 0.838032 | 0.838308 | 0.8466 | 0.850746 | 0.848535 | 0.846877 | 0.852957 |
| split3_train_score | 0.753309 | 0.824503 | 0.839829 | 0.840106 | 0.848337 | 0.853742 | 0.85537 | 0.858104 | 0.863171 |
| split4_test_score | 0.746545 | 0.818684 | 0.83361 | 0.833886 | 0.83969 | 0.847982 | 0.845495 | 0.85047 | 0.848811 |
| split4_train_score | 0.751651 | 0.824718 | 0.840321 | 0.840597 | 0.844897 | 0.853558 | 0.858319 | 0.861912 | 0.864922 |
| split5_test_score | 0.754284 | 0.826147 | 0.844942 | 0.845218 | 0.854063 | 0.859038 | 0.85738 | 0.858209 | 0.861802 |
| split5_train_score | 0.750791 | 0.823889 | 0.839061 | 0.839338 | 0.847323 | 0.852729 | 0.854111 | 0.856967 | 0.862741 |
| split6_test_score | 0.754284 | 0.825318 | 0.838032 | 0.837756 | 0.845495 | 0.848535 | 0.848535 | 0.852128 | 0.857103 |
| split6_train_score | 0.750791 | 0.823981 | 0.839829 | 0.840167 | 0.848429 | 0.853773 | 0.855647 | 0.857766 | 0.863141 |
| split7_test_score | 0.749793 | 0.821399 | 0.835499 | 0.835499 | 0.844623 | 0.85264 | 0.852087 | 0.853746 | 0.85264 |
| split7_train_score | 0.75129 | 0.824416 | 0.840111 | 0.840418 | 0.848372 | 0.853501 | 0.854945 | 0.860811 | 0.863882 |
| split8_test_score | 0.753387 | 0.826375 | 0.838264 | 0.83854 | 0.84407 | 0.852087 | 0.852917 | 0.852364 | 0.858446 |
| split8_train_score | 0.750891 | 0.823864 | 0.839803 | 0.84008 | 0.848372 | 0.853071 | 0.854945 | 0.857801 | 0.863391 |
| split9_test_score | 0.747857 | 0.827481 | 0.841858 | 0.842411 | 0.84877 | 0.852917 | 0.85264 | 0.852364 | 0.857064 |
| split9_train_score | 0.751505 | 0.823741 | 0.839404 | 0.839681 | 0.848096 | 0.853563 | 0.854975 | 0.857647 | 0.863237 |
| std_fit_time | 0.0123583 | 0.00442788 | 0.00552026 | 0.00631691 | 0.0053195 | 0.0157011 | 0.00476991 | 0.00622854 | 0.0147429 |
| std_score_time | 0.000529214 | 0.000467091 | 0.000355028 | 0.000760624 | 0.000460829 | 0.000504627 | 0.000446289 | 0.000445256 | 0.000449312 |
| std_test_score | 0.00769898 | 0.00292464 | 0.00333118 | 0.00358776 | 0.00382496 | 0.00324406 | 0.00360414 | 0.00366389 | 0.00363761 |
| std_train_score | 0.000855482 | 0.000366166 | 0.000418973 | 0.00185264 | 0.00124698 | 0.000553171 | 0.00116151 | 0.00168732 | 0.000726325 |
