

Machine Learning 101 — The Bias-Variance Conundrum

Published: 2023/12/15


Determining the performance of our model is one of the most crucial steps in the machine learning process. Understanding the bias-variance trade-off is a significant step towards interpreting the results of our model. Although the trade-off may sound daunting, the concepts behind it are simple to grasp and will allow us to create better and more useful models.

The generalization error of any machine learning model can be defined as the sum of three different errors:

  • Irreducible Error: As the name suggests, it can’t be reduced regardless of the algorithm we choose. It is introduced into our model because of the way we frame our problem and may be caused by unknown variables that affect the prediction of our target variable.


  • Bias Error: It occurs when our model makes the wrong assumptions


  • Variance Error: It is caused by sensitivity to small variations in the training set


• When we discuss prediction models, prediction errors can be decomposed into two main subcomponents we care about: error due to “bias” and error due to “variance”. There is a tradeoff between a model’s ability to minimize bias and variance. Understanding these two types of error can help us diagnose model results and avoid the mistake of over- or under-fitting. ~ Scott Fortmann-Roe

    In this blog post, we’re going to focus on the bias error, variance error and the bias-variance trade-off.


Bias Error

    Bias is the amount by which the expected prediction of our model differs from the actual target value, i.e. how far our predictions are from the real values. Essentially, the bias of our model is determined by the assumptions it makes to predict our target value. Simply stated, a high bias means that the underlying patterns are not captured by our learning algorithm. Such models subsequently produce a large error on both the training and test sets.


    • Decision Trees, k-Nearest Neighbors and Support Vector Machines are low bias machine learning algorithms


    • Linear Regression and Logistic Regression are high bias machine learning algorithms

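To make this concrete, here is a minimal sketch (an illustration added here, not code from the original post) using NumPy: a degree-1 polynomial, which assumes a linear relationship, is fit to quadratic data. Because the assumption is wrong, the error stays large on both the training and the test set — the hallmark of high bias. The data and degrees are arbitrary illustrative choices.

```python
import numpy as np

rng = np.random.default_rng(0)

# Quadratic ground truth with a little noise.
x_train = np.linspace(-3, 3, 50)
y_train = x_train**2 + rng.normal(0, 0.5, size=x_train.shape)
x_test = np.linspace(-3, 3, 200)
y_test = x_test**2

# A straight line cannot capture the curvature, no matter how much data
# we feed it: the assumption itself is the problem.
coeffs = np.polyfit(x_train, y_train, deg=1)
train_err = np.mean((np.polyval(coeffs, x_train) - y_train) ** 2)
test_err = np.mean((np.polyval(coeffs, x_test) - y_test) ** 2)

# Both errors are large -> the model underfits.
print(f"train MSE: {train_err:.2f}")
print(f"test MSE:  {test_err:.2f}")
```

Swapping `deg=1` for `deg=2` would drive both errors down to roughly the noise level, since the model's assumptions would then match the data.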

Variance Error

It is defined as the amount by which the prediction of our model would change if we used a different training set. Models with high variance tend to pay too much attention to the data present in the training set and don’t generalize well, i.e. they don’t perform well on the test set. In other words, such machine learning algorithms try to fit themselves to the training data as much as possible. In doing so they make complex assumptions which may only hold for the training data, and hence they perform much worse on the test set.

    • Linear Regression and Logistic Regression are low variance machine learning algorithms


    • Decision Trees, k-Nearest Neighbors and Support Vector Machines are high variance machine learning algorithms

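The sensitivity described above can be observed directly: fit the same flexible model to many freshly drawn training sets and watch its prediction at a single fixed point jump around. The sketch below is a hedged illustration added here (the settings — degree 9, 20 points, a sine ground truth — are arbitrary, not from the original article).

```python
import numpy as np

rng = np.random.default_rng(42)

def sample_training_set(n=20):
    # Same underlying function, a fresh noisy sample each time.
    x = rng.uniform(-3, 3, n)
    y = np.sin(x) + rng.normal(0, 0.3, n)
    return x, y

x_query = 1.5  # a fixed point at which we compare predictions

# Fit a flexible model (degree 9) and a rigid one (degree 1) to many
# different training sets, recording each prediction at x_query.
flexible, rigid = [], []
for _ in range(200):
    x, y = sample_training_set()
    flexible.append(np.polyval(np.polyfit(x, y, deg=9), x_query))
    rigid.append(np.polyval(np.polyfit(x, y, deg=1), x_query))

# The flexible model's predictions scatter far more across training
# sets: that scatter is exactly the variance error.
print(f"degree-9 prediction std: {np.std(flexible):.3f}")
print(f"degree-1 prediction std: {np.std(rigid):.3f}")
```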

Bias-Variance Trade-off

    Now, let’s try and understand the trade-off between bias and variance with the help of a bullseye diagram. One thing we already know is that bias and variance are inversely proportional to one another, i.e. if bias increases then variance decreases and vice versa.


We assume that the center of the diagram is a model that perfectly predicts the target values, and the further we move from the center, the worse our predictions get. If we repeat our model-building process with a few changes here and there, each repetition produces a new hit on the target, and each hit represents the performance of an individual model.

Bulls-eye diagram depicting the Bias-Variance Tradeoff

    To learn how to interpret our results, let’s go through the different cases we may observe:


1. Low Bias & Low Variance

• Ideal situation for our machine learning model

• The error of prediction is as low as possible

• The predictions don’t change much when we choose a different training set

2. High Bias & High Variance

• Worst possible situation for our machine learning model

• The error of prediction is extremely high

• The predictions fluctuate massively when we use a different training set

3. High Bias & Low Variance

    • Often referred to as underfitting, which means that our model is unable to capture the underlying patterns present in our data


    • Usually occurs due to the presence of a small amount of data

Underfitting vs Overfitting

4. Low Bias & High Variance

    • Also known as overfitting, which means that our model finds underlying patterns present in our data but also interprets the noise as useful information


    • It occurs when we train our model over data which hasn’t been cleaned properly


Summary

At its heart, the bias-variance trade-off aims to avoid both underfitting and overfitting. As the complexity of our model increases, the bias decreases while the variance increases. In other words, if we keep adding more features to our model, our primary concern shifts from reducing the bias to reducing the variance of our model.
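This claim — bias falls and variance rises with complexity — can be checked empirically. The sketch below (added here as an illustration, not part of the original post) estimates squared bias and variance of polynomial fits by Monte-Carlo simulation over many training sets; the sine ground truth and the degrees tried are arbitrary choices.

```python
import numpy as np

rng = np.random.default_rng(7)
x_grid = np.linspace(-3, 3, 40)   # fixed evaluation points
true_y = np.sin(x_grid)

def fit_predict(degree):
    # Train on a fresh noisy sample, predict on the fixed grid.
    x = rng.uniform(-3, 3, 30)
    y = np.sin(x) + rng.normal(0, 0.3, 30)
    return np.polyval(np.polyfit(x, y, degree), x_grid)

results = {}
for degree in (1, 3, 9):
    preds = np.array([fit_predict(degree) for _ in range(300)])
    mean_pred = preds.mean(axis=0)
    bias_sq = np.mean((mean_pred - true_y) ** 2)   # avg squared bias
    variance = np.mean(preds.var(axis=0))          # avg variance over grid
    results[degree] = (bias_sq, variance)
    print(f"degree {degree}: bias^2={bias_sq:.3f}  variance={variance:.3f}")
```

As the degree grows, the squared-bias column shrinks while the variance column grows — the same shape as the error-complexity curve below.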

Error Complexity Curve

As mentioned earlier, the generalization error of our model comprises three different errors and can be depicted mathematically as follows:
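The formula itself appeared as an image in the original post and was lost in this copy; the decomposition the surrounding text describes is the standard one:

```latex
\text{Generalization Error} = \text{Bias}^2 + \text{Variance} + \text{Irreducible Error}
```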

    The dotted line in the error complexity curve displayed above denotes the optimum model complexity and is considered the sweet spot for our machine learning model. We can say that the sweet spot has been found when the increase in bias is equal to the reduction in variance of our model. Mathematically we get:

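The accompanying equation was likewise an image in the original; the usual way to state the sweet-spot condition just described is that the marginal increase in squared bias exactly offsets the marginal decrease in variance:

```latex
\frac{\mathrm{d}\,\text{Bias}^2}{\mathrm{d}\,\text{Complexity}} = -\,\frac{\mathrm{d}\,\text{Variance}}{\mathrm{d}\,\text{Complexity}}
```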

    If the complexity of our model goes past the sweet spot then we are overfitting our model, and if we do not reach the sweet spot then we are underfitting our model.


Wrapping Up…

    In essence, we can define the relationship between bias and variance as follows:


• Increasing the bias will decrease the variance; and

• Increasing the variance will decrease the bias

    Although there is no definitive method to obtain the so called sweet spot, we can do our best to find it by either using appropriate metrics to analyse the performance of our model or by choosing the correct algorithms (and their proper configuration) for our purposes. Thus, we can conclude that the bias-variance trade-off is an important consideration that we can use as a starting point to determine the predictive performance of our machine learning models.


• Gentle Introduction to the Bias-Variance Trade-off in Machine Learning

• Understanding the Bias-Variance Tradeoff

• Bias-Variance Tradeoff — Bhavesh Bhatt

Translated from: https://medium.com/datadriveninvestor/machine-learning-101-the-bias-variance-conundrum-f4143ba9f179
