Data Visualization and Its Importance: Python
Data visualization is an important skill to possess for anyone trying to extract and communicate insights from data. In the field of machine learning, visualization plays a key role throughout the entire process of analysis.
Why do we need to visualize the data?
Let's say we have a dataset of car sales across four continents for the first 11 months of the year.
[Figure: Car Sales from Jan to Nov]
Analyzing each column separately and drawing conclusions from the data above is cumbersome. What we generally do instead is summarize the data and derive insights from the summary. To see how sales in each continent compare with the others, we'll calculate the average Discount and Sales for each continent.
[Figure: Average of Discount and Sales]
It looks like Sales have been nearly equal across the continents for the first 11 months. Let's also inspect the standard deviation of each column.
[Figure: Standard Deviation across the continents]
From the data above, we might infer that sales performance has been the same across the continents. This is exactly where summary statistics tend to mislead.
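The effect is easy to reproduce. The article's own car-sales table is not available here, so the sketch below substitutes the original Anscombe's quartet values as a stand-in. The column names `Discount` and `Sales` and the group labels `I` to `IV` (standing in for the four continents) are assumptions made only for illustration.

```python
import pandas as pd

# Original Anscombe's quartet values, used as a stand-in for the article's
# modified car-sales data (assumption: x plays the role of Discount,
# y the role of Sales, and each set the role of one continent).
X_SHARED = [10, 8, 13, 9, 11, 14, 6, 4, 12, 7, 5]
QUARTET = {
    "I": (X_SHARED,
          [8.04, 6.95, 7.58, 8.81, 8.33, 9.96, 7.24, 4.26, 10.84, 4.82, 5.68]),
    "II": (X_SHARED,
           [9.14, 8.14, 8.74, 8.77, 9.26, 8.10, 6.13, 3.10, 9.13, 7.26, 4.74]),
    "III": (X_SHARED,
            [7.46, 6.77, 12.74, 7.11, 7.81, 8.84, 6.08, 5.39, 8.15, 6.42, 5.73]),
    "IV": ([8, 8, 8, 8, 8, 8, 8, 19, 8, 8, 8],
           [6.58, 5.76, 7.71, 8.84, 8.47, 7.04, 5.25, 12.50, 5.56, 7.91, 6.89]),
}

# Build one long table with a label column identifying each group.
frames = [
    pd.DataFrame({"Continent": name, "Discount": x, "Sales": y})
    for name, (x, y) in QUARTET.items()
]
sales = pd.concat(frames, ignore_index=True)

# Mean and sample standard deviation of both columns, per group.
summary = sales.groupby("Continent")[["Discount", "Sales"]].agg(["mean", "std"])
print(summary.round(2))
```

All four groups come out with practically the same mean and standard deviation in both columns, which is why the summary table alone suggests they behave identically.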
If we plot Sales against the Discount rate from the data above on a scatter plot in Python, we get the following graphs.
[Figure: Scatter Plot]
Each continent employed a different strategy to boost its sales, and both the discount rates and the sales numbers differed considerably among them. It is difficult to understand the pattern or strategy of each continent from the numbers alone. That is why it is important to visualize the data instead of drawing conclusions from numbers alone.
The above dataset is a modified version of Anscombe's quartet, which the statistician Francis Anscombe constructed in 1973 to counter the impression among statisticians that “numerical calculations are exact, but graphs are rough.”
You can find more about Anscombe’s quartet here.
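As a rough sketch of the scatter plots described above, the snippet below draws the four sets of the original Anscombe's quartet side by side with Matplotlib. It uses the original quartet values rather than the article's modified car-sales numbers, and the axis labels `Discount` and `Sales` as well as the helper name `plot_quartet` are assumptions for illustration.

```python
import matplotlib.pyplot as plt

# Original Anscombe's quartet values (stand-in for the article's data).
X_SHARED = [10, 8, 13, 9, 11, 14, 6, 4, 12, 7, 5]
QUARTET = {
    "I": (X_SHARED,
          [8.04, 6.95, 7.58, 8.81, 8.33, 9.96, 7.24, 4.26, 10.84, 4.82, 5.68]),
    "II": (X_SHARED,
           [9.14, 8.14, 8.74, 8.77, 9.26, 8.10, 6.13, 3.10, 9.13, 7.26, 4.74]),
    "III": (X_SHARED,
            [7.46, 6.77, 12.74, 7.11, 7.81, 8.84, 6.08, 5.39, 8.15, 6.42, 5.73]),
    "IV": ([8, 8, 8, 8, 8, 8, 8, 19, 8, 8, 8],
           [6.58, 5.76, 7.71, 8.84, 8.47, 7.04, 5.25, 12.50, 5.56, 7.91, 6.89]),
}


def plot_quartet(quartet):
    """Draw one scatter plot per set on a shared 2x2 grid (hypothetical helper)."""
    fig, axes = plt.subplots(2, 2, figsize=(8, 6), sharex=True, sharey=True)
    for ax, (name, (x, y)) in zip(axes.flat, quartet.items()):
        ax.scatter(x, y)
        ax.set_title(f"Set {name}")
        ax.set_xlabel("Discount")  # assumed label: x stands in for Discount
        ax.set_ylabel("Sales")     # assumed label: y stands in for Sales
    fig.tight_layout()
    return fig


fig = plot_quartet(QUARTET)
```

Calling `plt.show()` afterwards displays the figure in an interactive session. Sharing both axes across the panels makes the different shapes directly comparable, even though the summary statistics of the four sets nearly coincide.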
So, now comes the million-dollar question,
Which Python Library should we use for Data Visualization?
Python offers a rich set of data visualization tools, several of them interactive. The most basic plot types are shared across multiple libraries, while others are available only in certain libraries.
Three of the main data visualization libraries used by data scientists are:
1. Matplotlib
Matplotlib is the most popular data visualization library for Python. It is used to generate simple yet powerful visualizations. For everyone from beginners to seasoned data science professionals, Matplotlib is the most widely used plotting library.
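A minimal sketch of what a simple Matplotlib plot looks like in practice is shown below. The curves (sine and cosine) are chosen only because they need no external data; they are not taken from the article.

```python
import numpy as np
import matplotlib.pyplot as plt

# 200 evenly spaced points between 0 and 2*pi.
x = np.linspace(0, 2 * np.pi, 200)

# Create a figure with a single set of axes, then draw two labelled lines.
fig, ax = plt.subplots(figsize=(6, 4))
ax.plot(x, np.sin(x), label="sin(x)")
ax.plot(x, np.cos(x), linestyle="--", label="cos(x)")

# Axis labels, title, and legend are all set on the axes object.
ax.set_xlabel("x")
ax.set_ylabel("value")
ax.set_title("A basic Matplotlib line plot")
ax.legend()
```

In an interactive session, `plt.show()` displays the result; `fig.savefig(...)` writes it to a file instead.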
Advantages:
2. Seaborn
Seaborn is a Python data visualization library built on Matplotlib. It provides a variety of visualization patterns and integrates more closely with Pandas DataFrames than Matplotlib does. Seaborn is widely used for statistical visualization because many common statistical plotting tasks are built in.
Advantages:
3. Seaborn works with the dataset as a whole, whereas Matplotlib deals with individual dataframes and arrays.
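To illustrate the DataFrame integration and built-in statistics mentioned above, here is a small sketch using `seaborn.lmplot`, which takes a whole DataFrame plus column names and overlays a fitted regression line on each scatter plot. The data is synthetic (seeded random numbers), and the column names `discount`, `sales`, and `segment` are hypothetical.

```python
import numpy as np
import pandas as pd
import seaborn as sns

# Synthetic, reproducible data: sales grow roughly linearly with discount.
rng = np.random.default_rng(0)
discount = rng.uniform(0, 20, size=60)
df = pd.DataFrame({
    "discount": discount,
    "sales": 50 + 2.5 * discount + rng.normal(0, 5, size=60),
    "segment": np.repeat(["A", "B"], 30),  # two hypothetical groups
})

# One call: pass the DataFrame and refer to columns by name.
# `col` splits the data into one panel per segment, and lmplot adds a
# linear regression fit with a confidence band to each panel.
grid = sns.lmplot(data=df, x="discount", y="sales", col="segment")
```

The same plot in plain Matplotlib would need manual subsetting of the data, a separate regression fit, and explicit subplot handling.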
3. Plotly
Plotly provides interactive plots that are easy to read even for an audience without much experience reading plots. Plotly is mostly used for handling geographical, scientific, statistical, and financial data.
Advantages:
3. With Plotly, hovering the mouse over the graph shows the axis values at that particular point.
More data visualization libraries are available in Python, such as Bokeh, Altair, and ggplot, but the ones mentioned above are among the most common and widely used.
Conclusion
In this article, we first learned why it is important to visualize data instead of drawing inferences solely from data tables. We then looked at several data visualization libraries in Python. Python offers a wide variety of data visualization tools beyond the ones discussed above, so it is worth familiarizing yourself with the libraries before committing to a particular approach.
Thank you for reading and Happy Coding!!!
Check out my previous articles about Python here
Pandas: Python
Matplotlib: Python
NumPy: Python
Time Complexity and Its Importance in Python
Python Recursion or Recursive Function in Python
Python Programs to check for Armstrong Number (n digit) and Fenced Matrix
Python: Problems for Basics Reference — Swapping, Factorial, Reverse Digits, Pattern Print
Translated from: https://levelup.gitconnected.com/data-visualization-and-its-importance-python-7599c1092a09