
[Python-ML] Feature Selection with the SBS Algorithm Using scikit-learn

Published: 2025/4/16 · python · 豆豆
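This post implements Sequential Backward Selection (SBS), a feature selection algorithm, on top of scikit-learn and applies it to the UCI Wine dataset with a k-nearest-neighbors classifier.
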
# -*- coding: utf-8 -*-
'''
Created on 2018-01-17
@author: Jason.F
@summary: Feature selection with Sequential Backward Selection (SBS)
'''
import time
from itertools import combinations

import matplotlib.pyplot as plt
import numpy as np
import pandas as pd
from sklearn.base import clone
from sklearn.metrics import accuracy_score
# sklearn.cross_validation was removed in scikit-learn 0.20; use model_selection
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier
from sklearn.preprocessing import StandardScaler


# SBS class
class SBS(object):
    def __init__(self, estimator, k_features, scoring=accuracy_score,
                 test_size=0.2, random_state=1):
        self.scoring = scoring
        self.estimator = clone(estimator)
        self.k_features = k_features
        self.test_size = test_size
        self.random_state = random_state

    def fit(self, X, y):
        # Internal hold-out split, used only to score candidate subsets
        X_train, X_test, y_train, y_test = train_test_split(
            X, y, test_size=self.test_size, random_state=self.random_state)
        dim = X_train.shape[1]
        self.indices_ = tuple(range(dim))
        self.subsets_ = [self.indices_]
        score = self._calc_score(X_train, X_test, y_train, y_test, self.indices_)
        self.scores_ = [score]
        # Drop one feature per iteration until k_features remain
        while dim > self.k_features:
            scores = []
            subsets = []
            for p in combinations(self.indices_, r=dim - 1):
                score = self._calc_score(X_train, X_test, y_train, y_test, p)
                scores.append(score)
                subsets.append(p)
            best = np.argmax(scores)
            self.indices_ = subsets[best]
            self.subsets_.append(self.indices_)
            dim -= 1
            self.scores_.append(scores[best])
        self.k_score_ = self.scores_[-1]
        return self

    def transform(self, X):
        return X[:, self.indices_]

    def _calc_score(self, X_train, X_test, y_train, y_test, indices):
        self.estimator.fit(X_train[:, indices], y_train)
        y_pred = self.estimator.predict(X_test[:, indices])
        score = self.scoring(y_test, y_pred)
        return score


if __name__ == "__main__":
    # time.clock() was removed in Python 3.8; perf_counter() replaces it
    start = time.perf_counter()

    # Load the data
    df_wine = pd.read_csv('https://archive.ics.uci.edu/ml/machine-learning-databases/wine/wine.data', header=None)
    df_wine.columns = ['Class label', 'Alcohol', 'Malic acid', 'Ash',
                       'Alcalinity of ash', 'Magnesium', 'Total phenols',
                       'Flavanoids', 'Nonflavanoid phenols', 'Proanthocyanins',
                       'Color intensity', 'Hue',
                       'OD280/OD315 of diluted wines', 'Proline']
    X, y = df_wine.iloc[:, 1:].values, df_wine.iloc[:, 0].values
    X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=0)

    # Standardize: fit the scaler on the training set only, then reuse
    # its statistics for the test set (refitting on the test set leaks information)
    stdsc = StandardScaler()
    X_train_std = stdsc.fit_transform(X_train)
    X_test_std = stdsc.transform(X_test)

    # Run SBS
    knn = KNeighborsClassifier(n_neighbors=2)
    sbs = SBS(knn, k_features=1)
    sbs.fit(X_train_std, y_train)
    k_feat = [len(k) for k in sbs.subsets_]
    plt.plot(k_feat, sbs.scores_, marker='o')
    plt.ylim([0.7, 1.1])
    plt.ylabel('Accuracy')
    plt.xlabel('Number of features')
    plt.grid()
    plt.show()

    # Train on the full original feature set
    knn.fit(X_train_std, y_train)
    print('Training accuracy:', knn.score(X_train_std, y_train))
    print('Test accuracy:', knn.score(X_test_std, y_test))  # overfitting is present

    # Compare against the 5-feature subset found by SBS
    # (subsets_[0] holds all 13 features, so subsets_[8] holds 5)
    k5 = list(sbs.subsets_[8])
    print(df_wine.columns[1:][k5])
    knn.fit(X_train_std[:, k5], y_train)
    print('Training accuracy:', knn.score(X_train_std[:, k5], y_train))
    print('Test accuracy:', knn.score(X_test_std[:, k5], y_test))  # overfitting is reduced

    end = time.perf_counter()
    print('finish all in %s' % str(end - start))
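
The greedy loop inside `fit` can be looked at in isolation to see how much work it does. The sketch below is a minimal illustration, not the author's code: it swaps the estimator for a made-up scoring rule, and `greedy_backward`, `toy_score`, and `USEFUL` are hypothetical names introduced only for this example. It counts how many candidate subsets are scored when going from 13 features down to 1.

```python
from itertools import combinations


def greedy_backward(n_features, k_features, score_fn):
    """Greedy backward loop: drop one feature per step, keep the best subset."""
    indices = tuple(range(n_features))
    subsets = [indices]
    scores = [score_fn(indices)]
    evaluations = 1  # the full feature set is scored once up front
    while len(indices) > k_features:
        # Every subset obtained by removing exactly one feature
        candidates = list(combinations(indices, len(indices) - 1))
        cand_scores = [score_fn(c) for c in candidates]
        evaluations += len(candidates)
        best = max(range(len(candidates)), key=cand_scores.__getitem__)
        indices = candidates[best]
        subsets.append(indices)
        scores.append(cand_scores[best])
    return subsets, scores, evaluations


# Hypothetical scoring rule (an assumption for illustration only):
# features 0, 3 and 7 are "useful"; every other feature costs a small penalty.
USEFUL = {0, 3, 7}


def toy_score(subset):
    hits = len(USEFUL.intersection(subset))
    noise = len(subset) - hits
    return hits - 0.1 * noise


subsets, scores, evaluations = greedy_backward(13, 1, toy_score)
print(evaluations)                # 91 = 1 + (13 + 12 + ... + 2)
print([len(s) for s in subsets])  # sizes 13, 12, ..., 1
print(subsets[10])                # the subset with three features left
```

With d features and a target of k, the loop scores 1 + (d + (d-1) + ... + (k+1)) subsets, which is far fewer than the 2^d subsets an exhaustive search would need. In the real class each evaluation is a model fit, which is where the running time goes.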

Results (output of the author's original run, under Python 2):

('Training accuracy:', 0.9859154929577465)
('Test accuracy:', 0.91666666666666663)
Index([u'Alcohol', u'Malic acid', u'Ash', u'Color intensity', u'Proline'], dtype='object')
('Training accuracy:', 0.95070422535211263)
('Test accuracy:', 0.97222222222222221)
finish all in 21.7107086315
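
For comparison, recent scikit-learn releases (0.24 and later) include a built-in `SequentialFeatureSelector` that supports backward selection. The sketch below is an alternative to the hand-written `SBS` class, not a reproduction of it. It assumes `load_wine` (the copy of the UCI Wine data bundled with scikit-learn) in place of the CSV download, and it scores subsets by 5-fold cross-validation rather than a single hold-out split, so the selected features and accuracies may differ from the output above.

```python
from sklearn.datasets import load_wine
from sklearn.feature_selection import SequentialFeatureSelector
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier
from sklearn.preprocessing import StandardScaler

# Assumption: use the bundled copy of the Wine data instead of the remote CSV
wine = load_wine()
X_train, X_test, y_train, y_test = train_test_split(
    wine.data, wine.target, test_size=0.2, random_state=0)

# Fit the scaler on the training set only, then apply it to the test set
stdsc = StandardScaler()
X_train_std = stdsc.fit_transform(X_train)
X_test_std = stdsc.transform(X_test)

# Backward selection down to 5 features, scored by cross-validated accuracy
knn = KNeighborsClassifier(n_neighbors=2)
sfs = SequentialFeatureSelector(
    knn, n_features_to_select=5, direction='backward',
    scoring='accuracy', cv=5)
sfs.fit(X_train_std, y_train)

selected = [name for name, keep in zip(wine.feature_names, sfs.get_support()) if keep]
print(selected)

# Retrain on the reduced feature set and evaluate on the held-out test set
knn.fit(sfs.transform(X_train_std), y_train)
test_acc = knn.score(sfs.transform(X_test_std), y_test)
print('Test accuracy:', test_acc)
```

Unlike the `SBS` class, the built-in selector stops at `n_features_to_select` and does not keep the score of every intermediate subset, so the accuracy-versus-feature-count plot still needs the hand-written version.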
