
Exploring sklearn's Datasets: The Wine Dataset as an Example

Published: 2024/3/13


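I have just started using scikit-learn (sklearn) to learn machine learning and do data analysis. Here I share some concepts and ideas, and I hope we can discuss them together. If anything I have understood or written is inaccurate, please point it out; corrections are very welcome. Thank you!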

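The sklearn.datasets module contains many well-known datasets. Before using one, I never have an intuitive feel for the data, so below I have collected some attributes and methods of the datasets in this module, using the wine dataset as an example. I will keep updating this post as I learn more!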

Import the dataset module and load a dataset

from sklearn.datasets import load_wine
wine = load_wine()

Exploring the dataset

Dataset type

# Check the dataset type
type(wine)
# Result (recent scikit-learn versions may display sklearn.utils._bunch.Bunch instead)
sklearn.utils.Bunch

Printing the dataset

# load_wine() returns a Bunch: a dict-like object whose keys can also be read as attributes
wine
---------------------------
# The result is a "dictionary" of the form {key1: value1, key2: value2}
{'data': array([[1.423e+01, 1.710e+00, 2.430e+00, ..., 1.040e+00, 3.920e+00, 1.065e+03],
       [1.320e+01, 1.780e+00, 2.140e+00, ..., 1.050e+00, 3.400e+00, 1.050e+03],
       [1.316e+01, 2.360e+00, 2.670e+00, ..., 1.030e+00, 3.170e+00, 1.185e+03],
       ...,
       [1.327e+01, 4.280e+00, 2.260e+00, ..., 5.900e-01, 1.560e+00, 8.350e+02],
       [1.317e+01, 2.590e+00, 2.370e+00, ..., 6.000e-01, 1.620e+00, 8.400e+02],
       [1.413e+01, 4.100e+00, 2.740e+00, ..., 6.100e-01, 1.600e+00, 5.600e+02]]),
 'target': array([0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
       0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
       0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 1, 1, 1, 1, 1, 1,
       1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1,
       1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1,
       1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 2, 2,
       2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2,
       2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2,
       2, 2]),
 'target_names': array(['class_0', 'class_1', 'class_2'], dtype='<U7'),
 'DESCR': '.. _wine_dataset:\n\nWine recognition dataset\n------------------------\n\n**Data Set Characteristics:**\n\n :Number of Instances: 178 (50 in each of three classes)\n :Number of Attributes: 13 numeric, predictive attributes and the class\n :Attribute Information:\n \t\t- Alcohol\n \t\t- Malic acid\n \t\t- Ash\n\t\t- Alcalinity of ash \n \t\t- Magnesium\n\t\t- Total phenols\n \t\t- Flavanoids\n \t\t- Nonflavanoid phenols\n \t\t- Proanthocyanins\n\t\t- Color intensity\n \t\t- Hue\n \t\t- OD280/OD315 of diluted wines\n \t\t- Proline\n\n - class:\n - class_0\n - class_1\n - class_2\n\t\t\n :Summary Statistics:\n \n ============================= ==== ===== ======= =====\n Min Max Mean SD\n ============================= ==== ===== ======= =====\n Alcohol: 11.0 14.8 13.0 0.8\n Malic Acid: 0.74 5.80 2.34 1.12\n Ash: 1.36 3.23 2.36 0.27\n Alcalinity of Ash: 10.6 30.0 19.5 3.3\n Magnesium: 70.0 162.0 99.7 14.3\n Total Phenols: 0.98 3.88 2.29 0.63\n Flavanoids: 0.34 5.08 2.03 1.00\n Nonflavanoid Phenols: 0.13 0.66 0.36 0.12\n Proanthocyanins: 0.41 3.58 1.59 0.57\n Colour Intensity: 1.3 13.0 5.1 2.3\n Hue: 0.48 1.71 0.96 0.23\n OD280/OD315 of diluted wines: 1.27 4.00 2.61 0.71\n Proline: 278 1680 746 315\n ============================= ==== ===== ======= =====\n\n :Missing Attribute Values: None\n :Class Distribution: class_0 (59), class_1 (71), class_2 (48)\n :Creator: R.A. Fisher\n :Donor: Michael Marshall (MARSHALL%PLU@io.arc.nasa.gov)\n :Date: July, 1988\n\nThis is a copy of UCI ML Wine recognition datasets.\nhttps://archive.ics.uci.edu/ml/machine-learning-databases/wine/wine.data\n\nThe data is the results of a chemical analysis of wines grown in the same\nregion in Italy by three different cultivators. There are thirteen different\nmeasurements taken for different constituents found in the three types of\nwine.\n\nOriginal Owners: \n\nForina, M. et al, PARVUS - \nAn Extendible Package for Data Exploration, Classification and Correlation. \nInstitute of Pharmaceutical and Food Analysis and Technologies,\nVia Brigata Salerno, 16147 Genoa, Italy.\n\nCitation:\n\nLichman, M. (2013). UCI Machine Learning Repository\n[https://archive.ics.uci.edu/ml]. Irvine, CA: University of California,\nSchool of Information and Computer Science. \n\n.. topic:: References\n\n (1) S. Aeberhard, D. Coomans and O. de Vel, \n Comparison of Classifiers in High Dimensional Settings, \n Tech. Rep. no. 92-02, (1992), Dept. of Computer Science and Dept. of \n Mathematics and Statistics, James Cook University of North Queensland. \n (Also submitted to Technometrics). \n\n The data was used with many others for comparing various \n classifiers. The classes are separable, though only RDA \n has achieved 100% correct classification. \n (RDA : 100%, QDA 99.4%, LDA 98.9%, 1NN 96.1% (z-transformed data)) \n (All results using the leave-one-out technique) \n\n (2) S. Aeberhard, D. Coomans and O. de Vel, \n "THE CLASSIFICATION PERFORMANCE OF RDA" \n Tech. Rep. no. 92-01, (1992), Dept. of Computer Science and Dept. of \n Mathematics and Statistics, James Cook University of North Queensland. \n (Also submitted to Journal of Chemometrics).\n',
 'feature_names': ['alcohol', 'malic_acid', 'ash', 'alcalinity_of_ash', 'magnesium', 'total_phenols', 'flavanoids', 'nonflavanoid_phenols', 'proanthocyanins', 'color_intensity', 'hue', 'od280/od315_of_diluted_wines', 'proline']}
# Note: typing the bare name displays the object only in an interactive session (REPL or notebook); in a script, call print(wine)
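The DESCR entry is hard to read in the raw display because its newlines show up as \n escapes. A minimal sketch, assuming scikit-learn is installed: DESCR is an ordinary string, so printing it renders the description properly. The name descr_lines is only illustrative.

```python
from sklearn.datasets import load_wine

wine = load_wine()

# DESCR is a plain Python string; splitting it gives one entry per text line
descr_lines = wine.DESCR.splitlines()

# Print only the first few lines to keep the output short
for line in descr_lines[:10]:
    print(line)
```

Calling print(wine.DESCR) shows the whole description in one go.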

Printing the dataset's keys and values separately

wine.keys()
wine.values()
# Result
dict_keys(['data', 'target', 'target_names', 'DESCR', 'feature_names'])
dict_values(...omitted...)
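Since the full values are long, it can help to look at just the type and shape of each entry. The loop below is a small sketch, assuming scikit-learn is installed; the summary dictionary is an illustrative helper, and newer scikit-learn versions may list extra keys (such as frame) beyond those shown above.

```python
from sklearn.datasets import load_wine

wine = load_wine()

summary = {}
for key, value in wine.items():
    # NumPy arrays have a .shape attribute; lists and strings do not
    shape = getattr(value, "shape", None)
    summary[key] = (type(value).__name__, shape)
    print(f"{key:15s} {type(value).__name__:10s} {shape}")
```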

Exploring each key-value pair of the dataset
(1) data, of type array: the data in the dataset

# View the value stored under "data"
wine.data
# Returns the corresponding value, whose type is "array" (numpy.ndarray)
# Check the shape of "data"
wine.data.shape
# Result
(178, 13)
# That is, 178 rows and 13 columns (the dataset has 13 feature variables)
# By contrast, a plain dict does not support dict.key access; use dict["key"] to get the value
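To illustrate why wine.data works while the same syntax fails on an ordinary dictionary, here is a rough sketch of the idea. MiniBunch is a hypothetical, simplified stand-in written for this post, not sklearn's actual Bunch implementation.

```python
class MiniBunch(dict):
    """Hypothetical, simplified stand-in for sklearn.utils.Bunch."""

    def __getattr__(self, key):
        # Called only when normal attribute lookup fails,
        # so fall back to looking the name up as a dict key
        try:
            return self[key]
        except KeyError:
            raise AttributeError(key)


plain = {"data": [[14.23, 1.71]], "target": [0]}
bunch = MiniBunch(plain)

# Both access styles return the same object on the Bunch-like class
print(bunch["data"] == bunch.data)

# A plain dict only supports the bracket form
try:
    plain.data
    plain_has_attr = True
except AttributeError:
    plain_has_attr = False
print(plain_has_attr)
```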

(2) target, of type array: the label of each sample in the dataset
(3) feature_names, of type list: the names of the dataset's feature variables
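A quick way to see how target and target_names relate is to count the samples per class. This is a minimal sketch, assuming scikit-learn and NumPy are installed; class_counts is an illustrative name.

```python
import numpy as np
from sklearn.datasets import load_wine

wine = load_wine()

# Unique label values in target and how often each occurs
labels, counts = np.unique(wine.target, return_counts=True)

# Map each numeric label to its name in target_names
class_counts = {
    str(wine.target_names[label]): int(count)
    for label, count in zip(labels, counts)
}
print(class_counts)

# One feature name per column of data
print(len(wine.feature_names), wine.data.shape[1])
```

The counts should match the "Class Distribution" line in DESCR.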

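Viewing the sample features and labels at a glance

# Use pandas to view the data as a table
import pandas as pd
# Join the sample data and the labels side by side (axis=1 adds the labels as an extra column)
# Naming the columns avoids two columns both being called 0
sample = pd.concat(
    [pd.DataFrame(wine.data, columns=wine.feature_names),
     pd.DataFrame(wine.target, columns=["target"])],
    axis=1,
)
# Show the first 5 rows of the table
sample.head()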
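As an alternative to concatenating by hand, load_wine can build the table itself. This sketch assumes pandas is installed and a scikit-learn version that supports the as_frame parameter (0.23 or later); wine_df is an illustrative name.

```python
from sklearn.datasets import load_wine

# as_frame=True makes the Bunch carry a pandas DataFrame in .frame,
# with the 13 feature columns plus a "target" column
wine_df = load_wine(as_frame=True).frame

print(wine_df.shape)
print(wine_df.head())
```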

To be continued

To see how to build models on sklearn datasets, see my other articles, for example "sklearn's DecisionTreeClassifier and the wine dataset (criterion and building a tree)": https://blog.csdn.net/weixin_42969619/article/details/98884082
To learn how pandas operates on data, see "python_pandas (creating / loading / selecting data)": https://blog.csdn.net/weixin_42969619/article/details/96863875
