[Kaggle] Digit Recognizer: Handwritten Digit Recognition

Published: 2024/7/5

Contents

    • 1. Baseline KNN
    • 2. Try SVC

Practice link: Digit Recognizer (Kaggle)

Related post: [Hands On ML] 3. Classification (MNIST handwritten digit prediction)

1. Baseline KNN

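  • Load the data
import pandas as pd
train = pd.read_csv('train.csv')    # labeled training set
X_test = pd.read_csv('test.csv')    # unlabeled test set
  • Separate features and labels
train.head()
y_train = train['label']                    # target: the digit shown in each image
X_train = train.drop(['label'], axis=1)     # features: the pixel columns
X_train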
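The split above can be sanity-checked before any model is trained. The following is a minimal sketch that assumes the Kaggle layout of one `label` column plus 784 pixel columns (28x28 images); the small random frame `demo` is a hypothetical stand-in for `train.csv`, not the real data.

```python
import numpy as np
import pandas as pd

# Synthetic stand-in for train.csv (assumption): 'label' plus 784 pixel columns
rng = np.random.default_rng(0)
n_rows = 5
pixel_cols = [f"pixel{i}" for i in range(784)]
demo = pd.DataFrame(rng.integers(0, 256, size=(n_rows, 784)), columns=pixel_cols)
demo.insert(0, "label", rng.integers(0, 10, size=n_rows))

# Same split as in the article: labels in one Series, pixels in one DataFrame
y_demo = demo["label"]
X_demo = demo.drop(columns=["label"])

# Each row is a flattened image; reshape to 28x28 to view it as a picture
first_image = X_demo.iloc[0].to_numpy().reshape(28, 28)
print(X_demo.shape, y_demo.shape, first_image.shape)
```

If the shapes do not come out as (n, 784) and (n,), the label column was probably not dropped or the CSV has a different layout.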

  • Grid search for the best KNN parameters
from sklearn.neighbors import KNeighborsClassifier
from sklearn.model_selection import GridSearchCV
from sklearn.metrics import accuracy_score
# help(KNeighborsClassifier)
para_dict = [{'weights': ["uniform", "distance"],
              'n_neighbors': [3, 4, 5],
              'leaf_size': [10, 20]}]
knn_clf = KNeighborsClassifier()
# 3-fold cross-validation over all 12 combinations, using every CPU core
grid_search = GridSearchCV(knn_clf, para_dict, cv=3, scoring='accuracy', n_jobs=-1)
grid_search.fit(X_train, y_train)
Output:
GridSearchCV(cv=3, estimator=KNeighborsClassifier(), n_jobs=-1,
             param_grid=[{'leaf_size': [10, 20], 'n_neighbors': [3, 4, 5],
                          'weights': ['uniform', 'distance']}],
             scoring='accuracy')
  • Best parameters
grid_search.best_params_ # {'leaf_size': 10, 'n_neighbors': 4, 'weights': 'distance'}
  • Best cross-validation score
grid_search.best_score_ # 0.9677619047619048
  • Predict on the test set
y_pred = grid_search.predict(X_test)
  • Write the submission file
image_id = pd.Series(range(1, len(y_pred) + 1))    # ImageId starts at 1
output = pd.DataFrame({'ImageId': image_id, 'Label': y_pred})
output.to_csv("submission.csv", index=False)    # omit the index column
  • Leaderboard result

The KNN model above scored 0.97067, currently ranked 2467 on the leaderboard.
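The cross-validation score (0.9678) and the leaderboard score (0.97067) are close, which suggests the search did not overfit. A similar check can be run locally before submitting by holding out part of the training data. The sketch below is an illustration only: it assumes scikit-learn's bundled 8x8 `load_digits` dataset as a small stand-in for the Kaggle data, and it leaves `leaf_size` out of the grid to keep the run short.

```python
from sklearn.datasets import load_digits
from sklearn.model_selection import GridSearchCV, train_test_split
from sklearn.neighbors import KNeighborsClassifier

# Small bundled digits dataset (8x8 images), used here instead of train.csv
X, y = load_digits(return_X_y=True)
X_tr, X_val, y_tr, y_val = train_test_split(
    X, y, test_size=0.2, random_state=42, stratify=y
)

# Same style of grid as in the article, minus leaf_size
param_grid = {"weights": ["uniform", "distance"], "n_neighbors": [3, 4, 5]}
search = GridSearchCV(KNeighborsClassifier(), param_grid, cv=3, scoring="accuracy")
search.fit(X_tr, y_tr)

# Accuracy of the refitted best model on data the search never saw
val_accuracy = search.score(X_val, y_val)
print(search.best_params_, round(search.best_score_, 4), round(val_accuracy, 4))
```

A large gap between `best_score_` and the held-out accuracy would be a reason to revisit the grid before spending a submission.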

2. Try SVC

  • Load the data
import pandas as pd
train = pd.read_csv('train.csv')
X_test = pd.read_csv('test.csv')
y_train = train['label']
X_train = train.drop(['label'], axis=1)
  • Import packages
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC, LinearSVC
from sklearn.model_selection import GridSearchCV, RandomizedSearchCV
from sklearn.metrics import classification_report
from sklearn.metrics import accuracy_score
from scipy.stats import reciprocal, uniform
  • Search for the best parameters
pipeline = Pipeline([("scaler", StandardScaler()),
                     ("clf", SVC(decision_function_shape="ovr", gamma="auto"))])
# gamma="auto" is only a starting value; the search below overrides clf__gamma
param_distributions = {"clf__gamma": reciprocal(0.001, 0.1),    # log-uniform over [0.001, 0.1]
                       "clf__C": uniform(1, 10)}                 # uniform(loc, scale): samples from [1, 11]
rnd_search_cv = RandomizedSearchCV(pipeline, param_distributions, n_iter=10, verbose=2, cv=3)
rnd_search_cv.fit(X_train, y_train)
  • Training took about 12 hours (744.1 min for 10 candidates x 3 folds = 30 fits)
[Parallel(n_jobs=1)]: Done 30 out of 30 | elapsed: 744.1min finished
  • Best estimator
rnd_search_cv.best_estimator_
Output:
Pipeline(steps=[('scaler', StandardScaler()),
                ('clf', SVC(C=10.729327185542381, gamma=0.0022750096640207287))])
  • Best cross-validation score
rnd_search_cv.best_score_ # 0.9584285714285713
  • Predict and write the submission file
y_pred = rnd_search_cv.best_estimator_.predict(X_test)
image_id = pd.Series(range(1, len(y_pred) + 1))
output = pd.DataFrame({'ImageId': image_id, 'Label': y_pred})
output.to_csv("submission_svc.csv", index=False)
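Both sections build the submission file the same way, so the step could be factored into a small helper. This is a sketch under the assumption that the submission needs exactly two columns, `ImageId` (starting at 1) and `Label`, as in the code above; `make_submission` is a hypothetical name, not part of pandas or Kaggle tooling.

```python
import numpy as np
import pandas as pd


def make_submission(y_pred):
    """Build a submission frame with 1-based ImageId and the predicted Label."""
    y_pred = np.asarray(y_pred)
    return pd.DataFrame({
        "ImageId": np.arange(1, len(y_pred) + 1),  # ids start at 1, not 0
        "Label": y_pred,
    })


# Tiny illustrative prediction vector, not real model output
demo = make_submission([2, 0, 9])
print(demo.to_csv(index=False))
```

With a real model, the call might look like `make_submission(y_pred).to_csv("submission_svc.csv", index=False)`.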


The SVC (support vector classifier) model scored 0.96464, lower than the KNN model above (0.97067).
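The SVC search ran single-threaded (`n_jobs=1` in the log) on the full training set, which accounts for much of the 12-hour run. One common way to cut that time is to tune on a stratified subset with `n_jobs=-1` and then refit the best pipeline on all the data. The sketch below illustrates the idea on scikit-learn's bundled `load_digits` data; the subset size of 500 and `n_iter=5` are arbitrary assumptions, and `loguniform` is used as the same distribution the article calls `reciprocal`.

```python
from scipy.stats import loguniform, uniform
from sklearn.base import clone
from sklearn.datasets import load_digits
from sklearn.model_selection import RandomizedSearchCV, train_test_split
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

X, y = load_digits(return_X_y=True)

# Tune on a stratified subset only (500 rows is an arbitrary choice)
X_small, _, y_small, _ = train_test_split(
    X, y, train_size=500, random_state=42, stratify=y
)

pipeline = Pipeline([("scaler", StandardScaler()), ("clf", SVC())])
param_distributions = {
    "clf__gamma": loguniform(0.001, 0.1),  # log-uniform over [0.001, 0.1]
    "clf__C": uniform(1, 10),              # uniform(loc, scale): samples from [1, 11]
}
search = RandomizedSearchCV(
    pipeline, param_distributions, n_iter=5, cv=3, n_jobs=-1, random_state=42
)
search.fit(X_small, y_small)

# Refit an unfitted copy of the best pipeline on the full data
final_model = clone(search.best_estimator_)
final_model.fit(X, y)
print(search.best_params_, round(search.best_score_, 4))
```

Parameters found on a subset are not guaranteed to be the best for the full set, so the held-out or leaderboard score is still the final check.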
