
The Randomness of the Random Forest Algorithm: A Pictorial Guide to Understanding Random Forest

Published: 2023/12/15


本文是關(guān)于什么的 (What this article is about)

In this article, we will see how the Random Forest algorithm works internally. To truly appreciate it, it might be helpful to understand a bit about Decision-Tree Classifiers, but that is not strictly required.


👉 Note: We are not covering the pre-processing or feature creation steps involved in modelling; we only look at what happens within the algorithm when we call the .fit() and .predict() methods of sklearn's RandomForestClassifier.


Random Forest in one paragraph

Random Forest (RF) is a tree-based algorithm. It is an ensemble of multiple decision trees, each built from a different random sample of rows and features. The final value of the model is the aggregate of the predictions/estimates created by each individual tree (a majority vote for classification, or the average for regression).


The Package

We will be basing our article on sklearn's RandomForestClassifier module:


sklearn.ensemble.RandomForestClassifier

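As a quick orientation before the walkthrough, here is a minimal sketch of fitting this classifier end to end. The data set and parameter values are made up for illustration:

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier

# synthetic stand-in for a table like the one described below
X, y = make_classification(n_samples=500, n_features=10, random_state=42)

my_rf = RandomForestClassifier(n_estimators=100, random_state=42)
my_rf.fit(X, y)               # builds the forest (steps 1-6 of this article)
preds = my_rf.predict(X[:5])  # inference on 5 records (the final step)
```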

The Data

For illustration, we will be using training data similar to the one below.

為了便于說明,我們將使用與以下數(shù)據(jù)類似的訓(xùn)練數(shù)據(jù)。

(Image by Author)

👉 Note: age, glucose_level, weight, gender, smoking, … f98, f99 are all the independent variables, or features.


Diabetic is the y-variable / dependent variable that we have to predict.


內(nèi)部真正發(fā)生了什么 (What really happens internally)

With this basic information, let's get started and understand what happens when we pass this training set to the algorithm…


Step 1 — Bootstrapping

(Image by Author)

Once we provide the training data to the RandomForestClassifier model, the algorithm selects a bunch of rows randomly. This process is called bootstrapping (random sampling with replacement). For our example, let's assume that it selects m records.


Note 👉 The number of rows to be selected can be provided by the user in the hyper-parameter max_samples.


Note 👉 One row might get selected more than once.

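The bootstrap step can be sketched with NumPy (the sizes are hypothetical; sklearn performs this sampling internally):

```python
import numpy as np

rng = np.random.default_rng(0)
n_rows = 1000   # rows in the training set (a made-up number)
m = 1000        # sample size; sklearn's max_samples defaults to the full row count

# sampling WITH replacement: the same row index can appear more than once
boot_idx = rng.choice(n_rows, size=m, replace=True)
print(len(np.unique(boot_idx)))  # < 1000: some rows repeat, others are left out
```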

Step 2 — Selecting features for sub-trees

Choose the features for the mini decision tree

Now, RF randomly selects a subset of features/columns. Here, for the sake of simplicity and for the example, we are choosing 3 random features.


Note 👉 You can control this number with the hyper-parameter max_features, similar to the code below.


from sklearn.ensemble import RandomForestClassifier

my_rf = RandomForestClassifier(max_features=8)

Step 3 — Selecting the root node

Once the 3 random features are selected, the algorithm tries splitting the m records (from step 1) and does a quick calculation of the before and after values of a metric.


This metric could be either the gini impurity or the entropy. It is based on the criterion — gini or entropy — that you have provided in your hyper-parameter.


criterion='gini' (or 'entropy'; default='gini')


Whichever of the random features gives the minimum gini impurity / entropy value is selected as the root node.

選擇哪個隨機特征給出最大的基尼雜質(zhì)/熵值最小的根節(jié)點。

The records are split at this node based on the best splitting point.

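To make the split metric concrete, here is a small hand-rolled gini-impurity calculation (a sketch of the formula, not sklearn's internal code):

```python
import numpy as np

def gini(labels):
    """Gini impurity: 1 - sum(p_k^2) over the class proportions p_k."""
    _, counts = np.unique(labels, return_counts=True)
    p = counts / counts.sum()
    return 1.0 - np.sum(p ** 2)

parent = np.array([0, 0, 0, 1, 1, 1])                   # node before the split
left, right = np.array([0, 0, 0]), np.array([1, 1, 1])  # a perfect split

# "after" value: impurity of the children, weighted by their sizes
weighted_after = (len(left) * gini(left) + len(right) * gini(right)) / len(parent)
print(gini(parent), weighted_after)  # 0.5 -> 0.0: this split would be chosen
```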

Step 4 — Selecting the child nodes

Select the features randomly

The algorithm performs the same process as in Steps 2 and 3 and selects another set of 3 random features. (3 is the number we have specified; you can choose what you like, or leave it to the algorithm to choose the best number.)


Based on the criterion (gini/entropy), it selects which feature will go into the next node / child node, and further splitting of the records happens here.


Step 5 — Further splits and creating child nodes

Continue selecting the features (columns) to select the further child nodes

This process of selecting the random features and splitting the nodes (Steps 2 and 3) continues until either of the following conditions happens:

繼續(xù)選擇隨機特征并分裂節(jié)點的過程(步驟2、4),直到發(fā)生以下任一情況

  • a) you have run out of rows to split (or hit the threshold: the minimum number of rows to be present in each child node)
  • b) the gini impurity / entropy does not decrease after splitting
Now we have the first level of child nodes

You now have your first “mini-decision tree ”.


The first mini-decision tree created using the randomly selected rows (records) & columns (features) (Image by Author)

Step 6 — Create more mini-decision trees

The algorithm goes back to your data and repeats steps 1–5 to create the 2nd "mini-tree".


The second mini-tree that we created using another set of randomly chosen rows & columns

Step 7 — Build the forest of trees

Once the default value of 100 trees is reached (you now have 100 mini decision trees), the model is said to have completed its fit() process.


2 trees from the list of 100 trees

Note 👉 You can specify the number of trees you want to generate with the hyper-parameter n_estimators.


from sklearn.ensemble import RandomForestClassifier

my_rf = RandomForestClassifier(n_estimators=300)

(The number of trees is set by the n_estimators variable, or takes a default value of 100 if not specified.) (Image by Author)

Now you have a forest of randomly created mini-trees (hence the name Random Forest).


Step 8 — Inferencing

Now let's predict the values in an unseen data set (the test data set).


For inferencing (more commonly referred to as predicting/scoring) the test data, the algorithm passes each record through each mini-tree.


(Image by Author)

The values from the record traverse the mini-tree based on the variables that each node represents, and ultimately reach a leaf node. Based on the predetermined value of the leaf node (set during training) where this record ends up, that mini-tree is assigned one prediction output.

記錄中的值基于每個節(jié)點表示的變量遍歷迷你樹,并最終到達(dá)葉節(jié)點。 根據(jù)該記錄最終到達(dá)的葉節(jié)點的預(yù)定值(在訓(xùn)練過程中),為該小樹分配一個預(yù)測輸出。

(Image by Author)

Similarly, the same record goes through all the 100 mini-decision trees, and each of the 100 trees has a prediction output. The final prediction value for this record is calculated by taking a simple vote across these 100 mini-trees.


Now we have the prediction for a single record.

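The per-tree aggregation can be checked against sklearn directly. One caveat worth hedging: sklearn's RandomForestClassifier actually averages the per-tree class probabilities ("soft" voting) rather than counting hard votes, though for fully grown trees the two usually agree. A sketch on synthetic data:

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier

X, y = make_classification(n_samples=200, n_features=10, random_state=0)
rf = RandomForestClassifier(n_estimators=100, random_state=0).fit(X, y)

record = X[:1]
# ask every mini-tree for its class-probability estimate
per_tree = np.stack([t.predict_proba(record) for t in rf.estimators_])
avg = per_tree.mean(axis=0)               # average across the 100 trees
manual = rf.classes_[avg.argmax(axis=1)]  # class with the highest averaged probability
print(manual[0] == rf.predict(record)[0])
```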

The algorithm iterates through all the records of the test set following the same process and calculates the overall accuracy!


Iterate the process of obtaining the prediction for each row of the test set to arrive at the final accuracy.
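This per-record loop and accuracy calculation map onto sklearn's score method. A minimal sketch with a synthetic train/test split (data and sizes are made up):

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split

X, y = make_classification(n_samples=500, n_features=10, random_state=1)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=1)

rf = RandomForestClassifier(random_state=1).fit(X_train, y_train)
acc = rf.score(X_test, y_test)  # fraction of test rows predicted correctly
print(acc)
```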

Translated from: https://towardsdatascience.com/a-pictorial-guide-to-understanding-random-forest-algorithm-fbf570a0ae0d
