Correlation vs Causation in Data Science
Let’s jump into it right away.
Correlation
Correlation is a relationship or association between two variables: a movement in one variable is associated with a movement in the other. For example, ice-cream sales go up as the weather turns hot.
A positive correlation means the variables move in the same direction (left plot); a negative correlation means they move in opposite directions (middle plot). The rightmost plot shows no correlation between the variables.
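The three cases can be reproduced with a small simulation. This is only a sketch on made-up data: the variable names, slopes, and noise levels are assumptions chosen for illustration and do not come from the plots in the article.

```r
# Simulated (hypothetical) data illustrating the three cases
set.seed(42)
n <- 200
x <- rnorm(n)
noise <- rnorm(n, sd = 0.5)

y_pos  <- 2 * x + noise    # moves in the same direction as x
y_neg  <- -2 * x + noise   # moves in the opposite direction to x
y_none <- rnorm(n)         # generated independently of x

# Pearson correlation coefficient for each pair
r_pos  <- cor(x, y_pos)
r_neg  <- cor(x, y_neg)
r_none <- cor(x, y_none)

print(round(c(positive = r_pos, negative = r_neg, none = r_none), 2))
```

The first coefficient should come out close to +1, the second close to -1, and the third near 0.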
Causation
Causation means that a change in one variable causes a change in another, so one variable depends on the other. It is also called cause and effect. For example, as the weather gets hot, people experience more sunburns. In this case, the hot weather is the cause and sunburn is the effect.
Image by Anthony Figueroa: correlation is not causation
Correlation vs Causation Difference
Let’s try another example with this visualization. Your computer running out of battery causes it to shut down. It also causes the video player to shut down. The computer and video player shutdown events are correlated, but neither causes the other; the actual cause of both is the battery running out.
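A rough simulation of the battery example shows how a common cause produces correlation. The probabilities below (30% dead battery, 5% unrelated shutdowns) are assumptions made up for this sketch, and the model deliberately treats the two shutdowns as independent apart from the battery.

```r
# Hypothetical simulation: a dead battery is the common cause of both shutdowns
set.seed(1)
n <- 5000
battery_dead <- rbinom(n, 1, 0.3)

# Each can also shut down for unrelated reasons (assumed 5% chance each)
computer_off <- as.integer(battery_dead == 1 | rbinom(n, 1, 0.05) == 1)
player_off   <- as.integer(battery_dead == 1 | rbinom(n, 1, 0.05) == 1)

# Across all observations the two shutdown events are strongly correlated
r_all <- cor(computer_off, player_off)

# Among observations where the battery is fine, the correlation mostly vanishes
charged <- battery_dead == 0
r_charged <- cor(computer_off[charged], player_off[charged])

print(round(c(all = r_all, battery_ok = r_charged), 2))
```

Holding the common cause fixed (here, looking only at cases with a working battery) is one simple way to check whether a correlation is explained by a third variable.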
Figure: correlation vs causation
Why is this important in data science?
How many times have you seen studies that imply A causes B? For example, going to the gym results in higher productivity and focus. Is this really causation?
As a data scientist, you should not let correlation bias you, because it can lead to faulty feature engineering and incorrect conclusions.
Correlation does not imply causation.
If you were to build a machine learning model of the relationship between gym attendance and productivity, then instead of focusing only on correlated features (going to the gym), you should focus on the actual causes of high performance (hard work, perseverance, routine, etc.) and validate cause and effect.
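To make the gym example concrete, here is a minimal sketch on simulated data. Everything in it is an assumption for illustration: a hypothetical "discipline" variable drives both gym visits and productivity, and gym visits are given no direct effect by construction. It says nothing about how gyms affect productivity in reality.

```r
# Hypothetical data: discipline drives both gym visits and productivity
set.seed(7)
n <- 1000
discipline   <- rnorm(n)
gym_visits   <- 2 + discipline + rnorm(n)
productivity <- 5 + 1.5 * discipline + rnorm(n)  # no direct gym effect (by assumption)

# Naive model: gym visits look like they raise productivity
naive <- lm(productivity ~ gym_visits)

# Adjusted model: once discipline is included, the gym coefficient shrinks toward 0
adjusted <- lm(productivity ~ gym_visits + discipline)

naive_coef    <- coef(naive)["gym_visits"]
adjusted_coef <- coef(adjusted)["gym_visits"]

print(round(c(naive = unname(naive_coef), adjusted = unname(adjusted_coef)), 2))
```

In this setup a model trained only on the correlated feature would learn a relationship that disappears once the underlying driver is included.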
Correlation in R
Let’s say you have a dataset and you want to evaluate whether certain features in it are correlated. I am using the mtcars dataset, one of the built-in datasets in R.
library(ggcorrplot)
# load mtcars, one of the built-in datasets in R
data(mtcars)
# use the cor function to get the correlation matrix
corr <- cor(mtcars)
# build the correlation plot
ggcorrplot(corr, hc.order = TRUE, type = "lower", lab = TRUE)
Try it yourself. Copy and paste the above code into R. If the ggcorrplot package is not installed, run install.packages("ggcorrplot") first.
Figure: output from the above code snippet
When you run the code, you should get a correlation plot with values. A value closer to +1 indicates a positive correlation, and a value closer to -1 indicates a negative correlation. In the above example, you can observe that disp and wt have a positive correlation of +0.89, whereas mpg and cyl have a negative correlation of -0.85.
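If you only care about one or two pairs, you can check them directly without the plot. The sketch below uses only base R; the cor.test call is an addition not used in the article, included to show a confidence interval alongside the point estimate.

```r
# Check individual pairs from mtcars using base R only
data(mtcars)

r_disp_wt <- cor(mtcars$disp, mtcars$wt)   # displacement vs weight
r_mpg_cyl <- cor(mtcars$mpg, mtcars$cyl)   # miles per gallon vs cylinders

print(round(c(disp_wt = r_disp_wt, mpg_cyl = r_mpg_cyl), 2))

# cor.test adds a significance test and a 95% confidence interval
test_disp_wt <- cor.test(mtcars$disp, mtcars$wt)
print(test_disp_wt$conf.int)
```

The rounded values should match the labels shown for those two cells in the correlation plot.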
Causal Impact Methods
Causation is harder to establish than correlation, but it is possible. One of the most common ways to determine causal impact is through experimentation and incremental studies.
Photo by Analytics Vidya: What’s the difference between Causality and Correlation?
Continue learning causal impact methods with this video. It covers causal impact methodologies, specifically digital experimentation (A/B testing) and randomization techniques, with real-world examples.
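As a rough illustration of why randomization helps, here is a minimal simulated A/B test. The sample size, the baseline distribution, and the true lift of 0.5 are all assumptions invented for this sketch; it is not taken from the video.

```r
# Hypothetical A/B test: random assignment balances user differences across groups
set.seed(123)
n <- 2000
baseline <- rnorm(n, mean = 10, sd = 2)          # users differ before the experiment

# Randomly assign exactly half the users to each group
group <- sample(rep(c("A", "B"), each = n / 2))

true_lift <- 0.5                                  # assumed effect of variant B
outcome <- baseline + ifelse(group == "B", true_lift, 0) + rnorm(n)

# Welch two-sample t-test comparing B against A
result <- t.test(outcome[group == "B"], outcome[group == "A"])
estimated_lift <- unname(result$estimate[1] - result$estimate[2])

print(round(estimated_lift, 2))
print(result$conf.int)
```

Because assignment is random, the baseline differences between users are spread evenly across the two groups on average, so the difference in group means can be read as an estimate of the causal effect of the variant.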
Learn more about me at sundaskhalid.com. Connect with me on LinkedIn, Twitter, Instagram, and YouTube.
Translated from: https://medium.com/@sundaskhalid/correlation-vs-causation-in-data-science-66b6cfa702f0