當(dāng)前位置：首頁 > 编程语言 > python >内容正文

python

python中的auto_ml自动机器学习框架学习实践

發(fā)布時(shí)間：2024/8/1 python 29 豆豆

生活随笔收集整理的這篇文章主要介紹了 python中的auto_ml自动机器学习框架学习实践小編覺得挺不錯(cuò)的,現(xiàn)在分享給大家,幫大家做個(gè)參考.

? ? ?之前就有接觸過auto_ml這個(gè)自動(dòng)機(jī)器學(xué)習(xí)框架，但是一直沒有時(shí)間做一個(gè)簡單的記錄總結(jié)，以便于后續(xù)有時(shí)間繼續(xù)學(xué)習(xí)，我相信隨著機(jī)器學(xué)習(xí)的普及推廣和發(fā)展，自動(dòng)機(jī)器學(xué)習(xí)一定會(huì)占據(jù)越來越大的作用，因?yàn)闄C(jī)器學(xué)習(xí)、深度學(xué)習(xí)里面很大的一部分時(shí)間都要花在特征工程、模型選擇、組合和參數(shù)調(diào)優(yōu)上面，auto_ml框架提供了一種很好的解決思路，當(dāng)前的自動(dòng)學(xué)習(xí)框架也有很多，想要完整地進(jìn)行學(xué)習(xí)還是需要花費(fèi)一定的時(shí)間的，這里就簡單對(duì)之前使用的auto_ml做個(gè)簡單的記錄。

? ? ?由于數(shù)據(jù)集的緣故我不能隨意公開使用，這里索性直接使用官方提供的Demo來簡單學(xué)習(xí)實(shí)踐一下，之后使用自己的數(shù)據(jù)集的時(shí)候只需要做一點(diǎn)數(shù)據(jù)集規(guī)范格式的統(tǒng)一處理就好了。

? ? ? 以波士頓房價(jià)數(shù)據(jù)為例，簡單的一個(gè)小例子如下：

def bostonSimpleFunc():'''波士頓房價(jià)數(shù)據(jù)的簡單應(yīng)用實(shí)例'''train_data,test_data = get_boston_dataset()column_descriptions = {'MEDV': 'output','CHAS': 'categorical'}ml_predictor = Predictor(type_of_estimator='regressor', column_descriptions=column_descriptions)ml_predictor.train(train_data)ml_predictor.score(test_data, test_data.MEDV)

? ? ? ?運(yùn)行結(jié)果如下：

? ? ? ?作者說了，這個(gè)auto_ml是為了產(chǎn)品研發(fā)的，提供了很完整的應(yīng)用，這里從訓(xùn)練測試數(shù)據(jù)集劃分、模型訓(xùn)練、模型持久化、模型加載、模型預(yù)測幾個(gè)部分來拿波士頓房價(jià)數(shù)據(jù)做一個(gè)完成的實(shí)踐，具體如下：

def bostonWholeFunc():'''波士頓房價(jià)數(shù)據(jù)的一個(gè)比較完整的實(shí)例包括：訓(xùn)練測試數(shù)據(jù)集劃分、模型訓(xùn)練、模型持久化、模型加載、模型預(yù)測'''train_data,test_data = get_boston_dataset()column_descriptions = {'MEDV': 'output', 'CHAS': 'categorical'}ml_predictor = Predictor(type_of_estimator='regressor', column_descriptions=column_descriptions)ml_predictor.train(train_data)test_score = ml_predictor.score(test_data, test_data.MEDV)file_name = ml_predictor.save()trained_model = load_ml_model(file_name)predictions = trained_model.predict(test_data)print('=====================predictions===========================')print(predictions)predictions = trained_model.predict_proba(test_data)print('=====================predictions===========================')print(predictions)

? ? ? ?結(jié)果如下：

Welcome to auto_ml! We're about to go through and make sense of your data using machine learning, and give you a production-ready pipeline to get predictions with.If you have any issues, or new feature ideas, let us know at http://auto.ml You are running on version 2.9.10 Now using the model training_params that you passed in: {} After overwriting our defaults with your values, here are the final params that will be used to initialize the model: {'presort': False, 'warm_start': True, 'learning_rate': 0.1} Running basic data cleaning Fitting DataFrameVectorizer Now using the model training_params that you passed in: {} After overwriting our defaults with your values, here are the final params that will be used to initialize the model: {'presort': False, 'warm_start': True, 'learning_rate': 0.1}******************************************************************************************** About to fit the pipeline for the model GradientBoostingRegressor to predict MEDV Started at: 2019-06-12 09:21:21 [1] random_holdout_set_from_training_data's score is: -9.93 [2] random_holdout_set_from_training_data's score is: -9.281 [3] random_holdout_set_from_training_data's score is: -8.683 [4] random_holdout_set_from_training_data's score is: -8.03 [5] random_holdout_set_from_training_data's score is: -7.494 [6] random_holdout_set_from_training_data's score is: -7.074 [7] random_holdout_set_from_training_data's score is: -6.649 [8] random_holdout_set_from_training_data's score is: -6.374 [9] random_holdout_set_from_training_data's score is: -6.115 [10] random_holdout_set_from_training_data's score is: -5.877 [11] random_holdout_set_from_training_data's score is: -5.566 [12] random_holdout_set_from_training_data's score is: -5.391 [13] random_holdout_set_from_training_data's score is: -5.088 [14] random_holdout_set_from_training_data's score is: -4.911 [15] random_holdout_set_from_training_data's score is: -4.692 [16] random_holdout_set_from_training_data's score is: -4.566 [17] random_holdout_set_from_training_data's score is: -4.379 [18] random_holdout_set_from_training_data's score is: -4.296 [19] random_holdout_set_from_training_data's score is: -4.14 [20] random_holdout_set_from_training_data's score is: -4.009 [21] random_holdout_set_from_training_data's score is: -3.92 [22] random_holdout_set_from_training_data's score is: -3.856 [23] random_holdout_set_from_training_data's score is: -3.81 [24] random_holdout_set_from_training_data's score is: -3.72 [25] random_holdout_set_from_training_data's score is: -3.632 [26] random_holdout_set_from_training_data's score is: -3.601 [27] random_holdout_set_from_training_data's score is: -3.538 [28] random_holdout_set_from_training_data's score is: -3.487 [29] random_holdout_set_from_training_data's score is: -3.459 [30] random_holdout_set_from_training_data's score is: -3.458 [31] random_holdout_set_from_training_data's score is: -3.422 [32] random_holdout_set_from_training_data's score is: -3.408 [33] random_holdout_set_from_training_data's score is: -3.356 [34] random_holdout_set_from_training_data's score is: -3.335 [35] random_holdout_set_from_training_data's score is: -3.323 [36] random_holdout_set_from_training_data's score is: -3.313 [37] random_holdout_set_from_training_data's score is: -3.262 [38] random_holdout_set_from_training_data's score is: -3.236 [39] random_holdout_set_from_training_data's score is: -3.207 [40] random_holdout_set_from_training_data's score is: -3.214 [41] random_holdout_set_from_training_data's score is: -3.198 [42] random_holdout_set_from_training_data's score is: -3.188 [43] random_holdout_set_from_training_data's score is: -3.174 [44] random_holdout_set_from_training_data's score is: -3.164 [45] random_holdout_set_from_training_data's score is: -3.122 [46] random_holdout_set_from_training_data's score is: -3.122 [47] random_holdout_set_from_training_data's score is: -3.109 [48] random_holdout_set_from_training_data's score is: -3.11 [49] random_holdout_set_from_training_data's score is: -3.119 [50] random_holdout_set_from_training_data's score is: -3.113 [52] random_holdout_set_from_training_data's score is: -3.113 [54] random_holdout_set_from_training_data's score is: -3.099 [56] random_holdout_set_from_training_data's score is: -3.102 [58] random_holdout_set_from_training_data's score is: -3.097 [60] random_holdout_set_from_training_data's score is: -3.069 [62] random_holdout_set_from_training_data's score is: -3.061 [64] random_holdout_set_from_training_data's score is: -3.024 [66] random_holdout_set_from_training_data's score is: -2.999 [68] random_holdout_set_from_training_data's score is: -2.999 [70] random_holdout_set_from_training_data's score is: -2.984 [72] random_holdout_set_from_training_data's score is: -2.978 [74] random_holdout_set_from_training_data's score is: -2.96 [76] random_holdout_set_from_training_data's score is: -2.943 [78] random_holdout_set_from_training_data's score is: -2.947 [80] random_holdout_set_from_training_data's score is: -2.938 [82] random_holdout_set_from_training_data's score is: -2.921 [84] random_holdout_set_from_training_data's score is: -2.914 [86] random_holdout_set_from_training_data's score is: -2.91 [88] random_holdout_set_from_training_data's score is: -2.901 [90] random_holdout_set_from_training_data's score is: -2.906 [92] random_holdout_set_from_training_data's score is: -2.892 [94] random_holdout_set_from_training_data's score is: -2.885 [96] random_holdout_set_from_training_data's score is: -2.884 [98] random_holdout_set_from_training_data's score is: -2.894 [100] random_holdout_set_from_training_data's score is: -2.88 [103] random_holdout_set_from_training_data's score is: -2.893 [106] random_holdout_set_from_training_data's score is: -2.889 [109] random_holdout_set_from_training_data's score is: -2.886 [112] random_holdout_set_from_training_data's score is: -2.869 [115] random_holdout_set_from_training_data's score is: -2.875 [118] random_holdout_set_from_training_data's score is: -2.852 [121] random_holdout_set_from_training_data's score is: -2.855 [124] random_holdout_set_from_training_data's score is: -2.848 [127] random_holdout_set_from_training_data's score is: -2.854 [130] random_holdout_set_from_training_data's score is: -2.86 [133] random_holdout_set_from_training_data's score is: -2.857 [136] random_holdout_set_from_training_data's score is: -2.854 [139] random_holdout_set_from_training_data's score is: -2.856 [142] random_holdout_set_from_training_data's score is: -2.854 [145] random_holdout_set_from_training_data's score is: -2.845 [148] random_holdout_set_from_training_data's score is: -2.84 [151] random_holdout_set_from_training_data's score is: -2.838 [154] random_holdout_set_from_training_data's score is: -2.838 [157] random_holdout_set_from_training_data's score is: -2.839 [160] random_holdout_set_from_training_data's score is: -2.837 [163] random_holdout_set_from_training_data's score is: -2.838 [166] random_holdout_set_from_training_data's score is: -2.838 [169] random_holdout_set_from_training_data's score is: -2.84 [172] random_holdout_set_from_training_data's score is: -2.828 [175] random_holdout_set_from_training_data's score is: -2.836 [178] random_holdout_set_from_training_data's score is: -2.834 [181] random_holdout_set_from_training_data's score is: -2.836 [184] random_holdout_set_from_training_data's score is: -2.837 [187] random_holdout_set_from_training_data's score is: -2.86 [190] random_holdout_set_from_training_data's score is: -2.862 [193] random_holdout_set_from_training_data's score is: -2.856 [196] random_holdout_set_from_training_data's score is: -2.855 [199] random_holdout_set_from_training_data's score is: -2.857 [202] random_holdout_set_from_training_data's score is: -2.856 [205] random_holdout_set_from_training_data's score is: -2.86 [208] random_holdout_set_from_training_data's score is: -2.859 [211] random_holdout_set_from_training_data's score is: -2.857 [214] random_holdout_set_from_training_data's score is: -2.855 [217] random_holdout_set_from_training_data's score is: -2.852 [220] random_holdout_set_from_training_data's score is: -2.849 [223] random_holdout_set_from_training_data's score is: -2.853 [226] random_holdout_set_from_training_data's score is: -2.845 [229] random_holdout_set_from_training_data's score is: -2.846 [232] random_holdout_set_from_training_data's score is: -2.849 The number of estimators that were the best for this training dataset: 172 The best score on the holdout set: -2.827876248876794 Finished training the pipeline! Total training time: 0:00:01Here are the results from our GradientBoostingRegressor predicting MEDV Calculating feature responses, for advanced analytics. The printed list will only contain at most the top 100 features. +----+----------------+--------------+----------+-------------------+-------------------+-----------+-----------+-----------+-----------+ | | Feature Name | Importance | Delta | FR_Decrementing | FR_Incrementing | FRD_abs | FRI_abs | FRD_MAD | FRI_MAD | |----+----------------+--------------+----------+-------------------+-------------------+-----------+-----------+-----------+-----------| | 12 | CHAS=0.0 | 0.0000 | nan | nan | nan | nan | nan | nan | nan | | 1 | ZN | 0.0004 | 11.5619 | -0.0194 | 0.0204 | 0.0205 | 0.0230 | 0.0000 | 0.0000 | | 13 | CHAS=1.0 | 0.0005 | nan | nan | nan | nan | nan | nan | nan | | 2 | INDUS | 0.0031 | 3.4430 | 0.1103 | 0.0494 | 0.1565 | 0.1543 | 0.0597 | 0.0000 | | 7 | RAD | 0.0059 | 4.2895 | -0.3558 | 0.0537 | 0.3620 | 0.1431 | 0.3727 | 0.0000 | | 5 | AGE | 0.0105 | 13.9801 | 0.2805 | -0.3050 | 0.5735 | 0.4734 | 0.3615 | 0.2435 | | 10 | B | 0.0118 | 45.7266 | -0.1885 | 0.1507 | 0.3139 | 0.2903 | 0.1688 | 0.0582 | | 8 | TAX | 0.0167 | 82.9834 | 1.1477 | -0.4399 | 1.2920 | 0.4563 | 0.2671 | 0.2617 | | 9 | PTRATIO | 0.0247 | 1.1130 | 0.5095 | -0.2323 | 0.5599 | 0.4590 | 0.2984 | 0.3357 | | 0 | CRIM | 0.0284 | 4.4320 | -0.4701 | -0.2061 | 0.7788 | 0.4938 | 0.5027 | 0.2806 | | 3 | NOX | 0.0298 | 0.0588 | 0.3083 | -0.1691 | 0.4285 | 0.3968 | 0.0745 | 0.0745 | | 6 | DIS | 0.0608 | 1.0643 | 3.4966 | -0.3628 | 3.5823 | 0.8045 | 0.9935 | 0.3655 | | 4 | RM | 0.3571 | 0.3543 | -1.2174 | 1.4995 | 1.3628 | 1.7090 | 0.7740 | 1.0375 | | 11 | LSTAT | 0.4504 | 3.5508 | 1.9849 | -1.8635 | 2.0343 | 1.9289 | 1.8354 | 1.5375 | +----+----------------+--------------+----------+-------------------+-------------------+-----------+-----------+-----------+-----------+******* Legend: Importance = Feature ImportanceExplanation: A weighted measure of how much of the variance the model is able to explain is due to this column FR_delta = Feature Response Delta AmountExplanation: Amount this column was incremented or decremented by to calculate the feature reponses FR_Decrementing = Feature Response From Decrementing Values In This Column By One FR_deltaExplanation: Represents how much the predicted output values respond to subtracting one FR_delta amount from every value in this column FR_Incrementing = Feature Response From Incrementing Values In This Column By One FR_deltaExplanation: Represents how much the predicted output values respond to adding one FR_delta amount to every value in this column FRD_MAD = Feature Response From Decrementing- Median Absolute DeltaExplanation: Takes the absolute value of all changes in predictions, then takes the median of those. Useful for seeing if decrementing this feature provokes strong changes that are both positive and negative FRI_MAD = Feature Response From Incrementing- Median Absolute DeltaExplanation: Takes the absolute value of all changes in predictions, then takes the median of those. Useful for seeing if incrementing this feature provokes strong changes that are both positive and negative FRD_abs = Feature Response From Decrementing Avg Absolute ChangeExplanation: What is the average absolute change in predicted output values to subtracting one FR_delta amount to every value in this column. Useful for seeing if output is sensitive to a feature, but not in a uniformly positive or negative way FRI_abs = Feature Response From Incrementing Avg Absolute ChangeExplanation: What is the average absolute change in predicted output values to adding one FR_delta amount to every value in this column. Useful for seeing if output is sensitive to a feature, but not in a uniformly positive or negative way *******None*********************************************** Advanced scoring metrics for the trained regression model on this particular dataset:Here is the overall RMSE for these predictions: 2.4474947386663786Here is the average of the predictions: 21.2925792927Here is the average actual value on this validation set: 21.4882352941Here is the median prediction: 20.457423442279662Here is the median actual value: 20.15Here is the mean absolute error: 1.844793596155306Here is the median absolute error (robust to outliers): 1.3340192567295777Here is the explained variance: 0.9188375538746201Here is the R-squared value: 0.9183155397464807 Count of positive differences (prediction > actual): 51 Count of negative differences: 51 Average positive difference: 1.64913759477 Average negative difference: -2.04044959754***********************************************We have saved the trained pipeline to a filed called "auto_ml_saved_pipeline.dill" It is saved in the directory: C:\Users\18706\Desktop\myBlogs\auto_ml_use To use it to get predictions, please follow the following flow (adjusting for your own uses as necessary:`from auto_ml.utils_models import load_ml_model `trained_ml_pipeline = load_ml_model("auto_ml_saved_pipeline.dill") `trained_ml_pipeline.predict(data)`Note that this pickle/dill file can only be loaded in an environment with the same modules installed, and running the same Python version. This version of Python is: sys.version_info(major=2, minor=7, micro=13, releaselevel='final', serial=0)When passing in new data to get predictions on, columns that were not present (or were not found to be useful) in the training data will be silently ignored. It is worthwhile to make sure that you feed in all the most useful data points though, to make sure you can get the highest quality predictions. =====================predictions=========================== [23.503099796820333, 32.63486484873551, 17.607843570794248, 22.96364141712182, 18.037259790025, 22.154154350077157, 18.157171399351753, 14.490724400217747, 20.91569106207268, 21.371745165599958, 19.978460029298827, 17.617959317911595, 6.657480263073871, 21.259425283809687, 19.30470390603625, 23.54754498054679, 20.616057833202493, 8.569816325663448, 45.01902942229479, 15.319975928505148, 23.84765254861352, 24.49050663723932, 12.344561585629016, 23.24874551694055, 15.137348894013865, 15.067038653704085, 21.674735923166942, 12.88017013620315, 19.43339890697579, 20.933210490656045, 20.235546222120107, 22.99264652948031, 20.45638944287541, 20.50831821637611, 14.026411558432988, 17.14000803427353, 34.322736768893236, 19.82116882409099, 20.757084718131125, 23.523990773770624, 17.92101235838185, 30.745980540024213, 45.09505946725109, 18.76719301853909, 23.69250732281568, 14.627546717865679, 15.404318347865019, 23.856332667077602, 18.597560915078148, 28.295069087679007, 20.335783749261154, 35.49551328178157, 17.049478769941757, 27.36240739278428, 49.168123673644864, 21.919364008618228, 16.431621230418827, 32.50614954154076, 22.60486571683311, 17.190717714534216, 24.86659240393153, 34.726632201151446, 32.56154963374883, 17.991423510542266, 23.19139847589728, 16.3827778391806, 13.763406903575234, 23.041746542718485, 28.897952087920405, 15.16115409656009, 20.54704218671605, 27.630784534960636, 9.265217126500687, 20.218468086624206, 22.678130640115423, 3.978712919679104, 20.458457441683915, 44.47945990229906, 12.603336785642627, 11.482102006681343, 21.066151218556975, 13.559181962607349, 21.19973222974325, 10.447704116792627, 20.110776756244167, 28.928923567731772, 15.527462244687818, 23.24725371877329, 25.743821297087276, 18.04671684265537, 22.950747524482065, 9.088864852661203, 19.075035374223955, 18.42257896844079, 23.564483816162195, 19.647455910849818, 44.12778583727594, 11.427374611849514, 12.040264853009598, 16.998049081305517, 20.25692214075818, 22.80453061159547] =====================predictions=========================== [[1, 0], [1, 0], [1, 0], [1, 0], [1, 0], [1, 0], [1, 0], [1, 0], [1, 0], [1, 0], [1, 0], [1, 0], [1, 0], [1, 0], [1, 0], [1, 0], [1, 0], [1, 0], [1, 0], [1, 0], [1, 0], [1, 0], [1, 0], [1, 0], [1, 0], [1, 0], [1, 0], [1, 0], [1, 0], [1, 0], [1, 0], [1, 0], [1, 0], [1, 0], [1, 0], [1, 0], [1, 0], [1, 0], [1, 0], [1, 0], [1, 0], [1, 0], [1, 0], [1, 0], [1, 0], [1, 0], [1, 0], [1, 0], [1, 0], [1, 0], [1, 0], [1, 0], [1, 0], [1, 0], [1, 0], [1, 0], [1, 0], [1, 0], [1, 0], [1, 0], [1, 0], [1, 0], [1, 0], [1, 0], [1, 0], [1, 0], [1, 0], [1, 0], [1, 0], [1, 0], [1, 0], [1, 0], [1, 0], [1, 0], [1, 0], [1, 0], [1, 0], [1, 0], [1, 0], [1, 0], [1, 0], [1, 0], [1, 0], [1, 0], [1, 0], [1, 0], [1, 0], [1, 0], [1, 0], [1, 0], [1, 0], [1, 0], [1, 0], [1, 0], [1, 0], [1, 0], [1, 0], [1, 0], [1, 0], [1, 0], [1, 0], [1, 0]] [Finished in 3.3s]

? ? ? ?在官方提供的原始實(shí)例上，我對(duì)輸出多加了一層類別的輸出。

? ? ? ?完整的程序如下：

#!usr/bin/env python #encoding:utf-8 from __future__ import division''' __Author__:沂水寒城功能： auto_ml 學(xué)習(xí)實(shí)踐使用GitHub地址：https://github.com/yishuihanhan/auto_ml 官方文檔：https://auto-ml.readthedocs.io/en/latest/formatting_data.html '''from auto_ml import Predictor from auto_ml.utils import get_boston_dataset from auto_ml.utils_models import load_ml_modeldef bostonSimpleFunc():'''波士頓房價(jià)數(shù)據(jù)的簡單應(yīng)用實(shí)例'''train_data,test_data = get_boston_dataset()column_descriptions = {'MEDV': 'output','CHAS': 'categorical'}ml_predictor = Predictor(type_of_estimator='regressor', column_descriptions=column_descriptions)ml_predictor.train(train_data)ml_predictor.score(test_data, test_data.MEDV)def bostonWholeFunc():'''波士頓房價(jià)數(shù)據(jù)的一個(gè)比較完整的實(shí)例包括：訓(xùn)練測試數(shù)據(jù)集劃分、模型訓(xùn)練、模型持久化、模型加載、模型預(yù)測'''train_data,test_data = get_boston_dataset()column_descriptions = {'MEDV': 'output', 'CHAS': 'categorical'}ml_predictor = Predictor(type_of_estimator='regressor', column_descriptions=column_descriptions)ml_predictor.train(train_data)test_score = ml_predictor.score(test_data, test_data.MEDV)file_name = ml_predictor.save()trained_model = load_ml_model(file_name)predictions = trained_model.predict(test_data)print('=====================predictions===========================')print(predictions)predictions = trained_model.predict_proba(test_data)print('=====================predictions===========================')print(predictions)if __name__=='__main__':bostonSimpleFunc()bostonWholeFunc()

? ? ? ? 相應(yīng)地GitHub地址和官方文檔地址在代碼的開頭部分我都給出來了，感興趣的話可以去看看，記錄學(xué)習(xí)了！

總結(jié)

以上是生活随笔為你收集整理的python中的auto_ml自动机器学习框架学习实践的全部內(nèi)容，希望文章能夠幫你解決所遇到的問題。

如果覺得生活随笔網(wǎng)站內(nèi)容還不錯(cuò)，歡迎將生活随笔推薦給好友。

上一篇：实时音频编解码之二编码学数学知识
下一篇：声卡驱动正常但就是没有声音,驱动人生解决