数据可视化工具_数据可视化
數(shù)據(jù)可視化工具
Visualizations are a great way to show the story that data wants to tell. However, not all visualizations are built the same. My rule of thumb is stick to simple, easy to understand, and well labeled graphs. Line graphs, bar charts, and histograms always work best. The most recognized libraries for visualizations are matplotlib and seaborn. Seaborn is built on top of matplotlib, so it is worth looking at matplotlib first, but in this article we’ll look at matplotlib only. Let’s get started. First, we will import all the libraries we will be working with.
可視化是顯示數(shù)據(jù)要講述的故事的好方法。 但是,并非所有可視化文件的構(gòu)建都是相同的。 我的經(jīng)驗(yàn)法則是堅(jiān)持簡(jiǎn)單,易于理解且標(biāo)簽清晰的圖形。 折線圖,條形圖和直方圖總是最有效。 最受認(rèn)可的可視化庫是matplotlib和seaborn。 Seaborn是建立在matplotlib之上的,因此值得首先看一下matplotlib,但是在本文中,我們將只看一下matplotlib。 讓我們開始吧。 首先,我們將導(dǎo)入將要使用的所有庫。
import numpy as npimport matplotlib.pyplot as plt%matplotlib inlineWe imported numpy, so we can generate random data. From matplotlib, we imported pyplot. If you are working on visualizations in jupyter notebook, you can call the %matplotlib inline command. This will allow jupyter notebook to display your visualizations directly under the code that was ran. If you’d like an interactive chart, you can call the %matplotlib command. This will allow you to manipulate your visualizations such as zoom in, zoom out, and move them around their axis.
我們導(dǎo)入了numpy,因此我們可以生成隨機(jī)數(shù)據(jù)。 從matplotlib中,我們導(dǎo)入了pyplot。 如果要在jupyter Notebook中進(jìn)行可視化,則可以調(diào)用%matplotlib內(nèi)聯(lián)命令。 這將使jupyter Notebook在運(yùn)行的代碼下直接顯示可視化效果。 如果需要交互式圖表,可以調(diào)用%matplotlib命令。 這將允許您操縱可視化效果,例如放大,縮小并圍繞其軸移動(dòng)它們。
直方圖 (Histograms)
Fist, lets take a look at histograms on matplotlib. We will look at a normal distribution. Let’s build it with numpy and visualize it with matplotlib.
拳頭,讓我們看看matplotlib上的直方圖。 我們將看一個(gè)正態(tài)分布。 讓我們使用numpy構(gòu)建它,并使用matplotlib對(duì)其進(jìn)行可視化。
normal_distribution = np.random.normal(0,1,10000)plt.hist(normal_distribution)plt.show()Great! We had a histogram. But, what did we just do? First, we took 10,000 random sample from a distribution of mean 0 and standard deviation of 1. Then, we called the method hist() from matplotlib. Last, we called the show() method to display our figure. However, our histogram looks kind of…squared. Fear not! You can modify the width of each bin with the bins argument. Matplotlib defaults to 10 if an argument isn’t given. There are multiple ways to calculate bins, but I prefer to set it to ‘a(chǎn)uto’.
大! 我們有一個(gè)直方圖。 但是,我們只是做什么? 首先,我們從均值為0且標(biāo)準(zhǔn)差為1的分布中抽取了10,000個(gè)隨機(jī)樣本。然后,從matplotlib中調(diào)用了hist()方法。 最后,我們調(diào)用了show()方法來顯示我們的圖形。 但是,我們的直方圖看起來有點(diǎn)...平方。 不要怕! 您可以使用bins參數(shù)修改每個(gè)垃圾箱的寬度。 如果未提供參數(shù),則Matplotlib默認(rèn)為10。 有多種計(jì)算垃圾箱的方法,但我更喜歡將其設(shè)置為“自動(dòng)”。
plt.hist(normal_distribution,bins='auto')plt.show()!
!
Much better. You know what would be twice as fun? If we visualize another distribution, but if we visualize it on the same histogram, then that would be three times as fun. That’s exactly what we are going to do.
好多了。 您知道會(huì)帶來兩倍的樂趣嗎? 如果我們可視化另一個(gè)分布,但是如果我們?cè)谙嗤闹狈綀D中可視化它,那將是三倍的樂趣。 這正是我們要做的。
modified_distribution = np.random.normal(3,1.5,10000)plt.hist(normal_distribution,bins='auto',color='purple',label='Purple')plt.hist(modified_distribution,bins='auto',color='green',alpha=.75,label='Green')plt.legend(['Purple','Green'])plt.show()Wow! Now that’s a fun histogram. What just happen? We went from one blue histogram to one purple and one green. Let’s go over what we added. First, we created another distribution named modified_distribution. Then, we changed the color of each distribution with the color argument, passed the labelargument to name each distribution, and we passed the alpha argument to make the green distribution see through. Last, we passed the name of each distribution to the legend() method. When you have more than one set of data on a single chart, it is required to label the data to be able to tell the data apart. In this example, the data can be told apart easy, but in the real world each data can represent things that cannot be identified by color. For example, green can represent height of male college students, and purple the height of female college students. Of course, if that was the case the X axis would be on a different scale.
哇! 現(xiàn)在,這是一個(gè)有趣的直方圖。 剛剛發(fā)生什么事 我們從一個(gè)藍(lán)色直方圖變?yōu)橐粋€(gè)紫色和一個(gè)綠色。 讓我們來看看添加的內(nèi)容。 首先,我們創(chuàng)建了另一個(gè)名為modified_distribution的發(fā)行版。 然后,我們使用color參數(shù)更改每個(gè)分布的顏色 ,傳遞label參數(shù)以命名每個(gè)分布,然后傳遞alpha參數(shù)使綠色分布透明。 最后,我們將每個(gè)發(fā)行版的名稱傳遞給legend()方法。 如果單個(gè)圖表上有多個(gè)數(shù)據(jù)集,則需要標(biāo)記數(shù)據(jù)以便能夠區(qū)分?jǐn)?shù)據(jù)。 在此示例中,可以輕松區(qū)分?jǐn)?shù)據(jù),但在現(xiàn)實(shí)世界中,每個(gè)數(shù)據(jù)都可以代表無法用顏色標(biāo)識(shí)的事物。 例如,綠色可以代表男性大學(xué)生的身高,而紫色可以代表女性大學(xué)生的身高。 當(dāng)然,如果是這種情況,則X軸將處于不同的比例。
條形圖 (Bar Charts)
Let’s continue the fun. Now we are going to look at bar charts. This kind of charts are really useful when trying to visualize quantities, so let’s look at an example. In this case we will visualize at how people voted when asked about what type of pet they have or would like to have. Let’s randomly generate data.
讓我們繼續(xù)樂趣。 現(xiàn)在我們來看看條形圖。 當(dāng)試圖可視化數(shù)量時(shí),這種圖表非常有用,因此讓我們看一個(gè)示例。 在這種情況下,我們將可視化當(dāng)人們問起他們擁有或想要擁有哪種類型的寵物時(shí)人們?nèi)绾瓮镀薄?讓我們隨機(jī)生成數(shù)據(jù)。
options = ['Cats','Dogs','Parrots','Hamsters']votes = [np.random.randint(10,100) for i in range(len(options)]
votes.sort(reverse=True)
Perfect, we have a list of pets and a list of randomly generated numbers. Notice, we sorted the list in descending order. I like to order list this way because it is easier to see which category is the largest and smallest. Of course, in this example we just ordered the votes without ordering the options that match up to it. In reality, we would have to order both. I found that the easiest way to go about this is to make a dictionary and order the dictionary by values. Click herefor a helpful guide on stackoverflow on how to order dictionaries by values. Now, let’s visualize our data.
完美,我們有一個(gè)寵物清單和一個(gè)隨機(jī)生成的數(shù)字清單。 注意,我們以降序?qū)α斜磉M(jìn)行排序。 我喜歡以此方式訂購商品,因?yàn)檫@樣可以更輕松地查看最大和最小的類別。 當(dāng)然,在此示例中,我們只是對(duì)投票進(jìn)行了排序,而沒有對(duì)與之匹配的選項(xiàng)進(jìn)行排序。 實(shí)際上,我們將必須同時(shí)訂購兩者。 我發(fā)現(xiàn)最簡(jiǎn)單的方法是制作字典并按值對(duì)字典進(jìn)行排序。 單擊此處以獲取有關(guān)如何按值對(duì)字典進(jìn)行排序的stackoverflow的有用指南。 現(xiàn)在,讓我們可視化我們的數(shù)據(jù)。
plt.bar(options,votes)plt.title('Favorite Pet Survey')plt.xlabel('Options')plt.ylabel('Votes')plt.show()Great! We have an amazing looking graph. Notice, we can easily tell cats got the most votes, and hamsters got the least votes. Let’s look at the code. After we defined our X and height, we called the bar() method to build a bar chart. We passed options as X and votes as height. Then, we labeled the title, X axis, and y axis with methods title(), xlabel(), and ylabel() respectively. Easy enough! However, this bar chart looks a bit boring. Let’s make it look fun.
大! 我們有一個(gè)驚人的外觀圖。 注意,我們可以很容易地看出貓的得票最多,而倉鼠的得票最少。 讓我們看一下代碼。 定義X和高度后,我們調(diào)用bar()方法來構(gòu)建條形圖。 我們將選項(xiàng)作為X傳遞,將投票作為高度。 然后,我們分別使用方法title(),xlabel()和ylabel()標(biāo)記標(biāo)題,X軸和y軸。 很簡(jiǎn)單! 但是,此條形圖看起來有些無聊。 讓它看起來有趣。
with plt.style.context('ggplot'):plt.bar(options,votes)
plt.title('Favorite Pet Survey')
plt.xlabel('Options')
plt.ylabel('Votes')
plt.show()
This graph is so much fun. How did we do this? Notice, all our code looks mostly the same, but there is important code we added, and we changed the format. We added the with keyword and the context() method from plt.style to change our chart style. Really cool thing is that it only changes it for everything that’s directly under it and indented. It is important to indent the code after the first line. We used the ggplot style to make our graph more fun. Click here to view all the styles available in matplotlib. If we want to compare two datasets with the same options, it is a little harder than in histograms, but it is equally as fun. Let’s say we want to visualize male vs female vote on each category.
該圖非常有趣。 我們是如何做到的? 注意,我們所有的代碼看起來幾乎相同,但是添加了重要的代碼,并且更改了格式。 我們從plt.style添加了with關(guān)鍵字和context()方法來更改圖表樣式。 真正很酷的事情是,它僅針對(duì)直接在其下方并縮進(jìn)的所有內(nèi)容進(jìn)行更改。 在第一行之后縮進(jìn)代碼很重要。 我們使用了ggplot樣式使我們的圖表更加有趣。 單擊此處查看matplotlib中可用的所有樣式。 如果我們想比較兩個(gè)具有相同選項(xiàng)的數(shù)據(jù)集,這比直方圖要難一些,但同樣有趣。 假設(shè)我們要形象化每個(gè)類別的男性和女性投票。
votes_male = votesvotes_female = [np.random.randint(10,100) for i in range(len(options))]import pandas as pdwith plt.style.context('ggplot'):
pd.DataFrame({'male':votes_male,'female':votes_female,index=options).plot(kind='bar')
plt.title('Favorite Pet Survey (Male vs Female)')
plt.xticks(rotation=0)
plt.ylabel('votes')
plt.show()
Lots going on here, but you have seen most of it already. Let’s start from the top. First, we renamed the votes data to votes_male, and we generated new data for votes_female. Then, we imported pandas which is a library to work with data frames. We created a data frame for our data with male and female as our columns and pet options as our index. After, we called the plot() method from the data frame and passed bar for the kind arguement, so we can plot a bar chart. With the data frame plot method, it adds X labels for you, but they are at a 90-degree angle. To fix this, you can call the xticks() method from pyplot and pass the argument rotation 0. This will make the text like the graph above.
這里有很多事情,但是您已經(jīng)看到了大部分。 讓我們從頭開始。 首先,我們將選票數(shù)據(jù)重命名為voices_male,并為votes_female生成了新數(shù)據(jù)。 然后,我們導(dǎo)入了pandas,這是一個(gè)用于處理數(shù)據(jù)框的庫。 我們?yōu)閿?shù)據(jù)創(chuàng)建了一個(gè)數(shù)據(jù)框,其中男性和女性為列,寵物選項(xiàng)為索引。 之后,我們從數(shù)據(jù)框中調(diào)用plot()方法,并通過柱形圖進(jìn)行類型爭(zhēng)論,因此可以繪制條形圖。 使用數(shù)據(jù)框繪圖方法時(shí),它會(huì)為您添加X標(biāo)簽,但它們成90度角。 要解決此問題,您可以從pyplot調(diào)用xticks()方法并傳遞參數(shù)rotation0。這將使文本像上面的圖形一樣。
線形圖 (Line Graph)
Now, let’s look at line graphs. These graphs are great to visualize how Y changes as X changes. Most commonly, they are used to visualize time series data. In this example, we will visualize how much water a new town uses as their population grows.
現(xiàn)在,讓我們看一下折線圖。 這些圖表非常適合可視化Y隨著X的變化。 最常見的是,它們用于可視化時(shí)間序列數(shù)據(jù)。 在此示例中,我們將可視化一個(gè)新城鎮(zhèn)隨著人口增長(zhǎng)而消耗的水量。
town_population = np.linspace(0,10,10)town_water_usage = [i*5 for i in town_population]with plt.style.context('seaborn'):
plt.plot(town_population,town_water_usage)
plt.title('Water Usage of Cool Town by Population')
plt.xlabel('Population (in thousands)')
plt.ylabel('Water usage (in thousand gallons)')
plt.show()
What a nice graph! As you can see, we used everything we learned so far to create this graph. The only difference is the method we called is not as intuitive as the other ones. In this case we called the plot() method. We passed our X and Y, labeled our chart, and we visualized it with our show() method. Let’s add more data. This time, we are going to add the water usage of a nearby town.
多么漂亮的圖! 如您所見,我們使用到目前為止所學(xué)的所有知識(shí)來創(chuàng)建該圖。 唯一的區(qū)別是我們調(diào)用的方法不像其他方法那樣直觀。 在這種情況下,我們稱為plot()方法。 我們傳遞了X和Y,標(biāo)記了圖表,然后使用show()方法將其可視化。 讓我們添加更多數(shù)據(jù)。 這次,我們將增加附近城鎮(zhèn)的用水量。
nearby_town_water_usage = [i*.85 for i in town_water_usage]with plt.style.context('seaborn'):plt.plot(town_population,town_water_usage,label='Cool Town')
plt.plot(town_population,nearby_town_water_usage,label='Lame Town')
plt.title('Water Usage of Cool Town and Lame Town')
plt.xlabel('Population (in thousands)')
plt.ylabel('Water usage (in thousand gallons)')
plt.legend(['Cool Town','Lame Town'])
plt.show()
As you can see we just added another plot(), labeled, each line, updated the title, and we showed a legend of the graph. For the most part is the same process as other graphs. From the graph we can see that Lame Town is actually using less water than Cool town. I guess Lame Town isn’t so lame after all.
如您所見,我們只是添加了另一個(gè)plot(),標(biāo)記為每行,更新了標(biāo)題,并顯示了圖例。 在大多數(shù)情況下,該過程與其他圖形相同。 從圖中可以看出,me腳鎮(zhèn)實(shí)際上比涼爽鎮(zhèn)使用的水少。 我猜La子鎮(zhèn)畢竟不是那么la子。
結(jié)論 (Conclusion)
We covered some of the basics of visualizing data. We even went into how to generate random data! As you can see these are very versatile and efficient ways of showing data. Nothing too crazy, just old school ways of showing the story that the data tells.
我們介紹了可視化數(shù)據(jù)的一些基礎(chǔ)知識(shí)。 我們甚至研究了如何生成隨機(jī)數(shù)據(jù)! 如您所見,這些是顯示數(shù)據(jù)的非常通用和有效的方法。 沒什么太瘋狂的了,只是老式的方式來顯示數(shù)據(jù)所講述的故事。
翻譯自: https://medium.com/@a.colocho/data-visualization-9e151698a921
數(shù)據(jù)可視化工具
總結(jié)
以上是生活随笔為你收集整理的数据可视化工具_数据可视化的全部?jī)?nèi)容,希望文章能夠幫你解決所遇到的問題。
- 上一篇: missforest_missfores
- 下一篇: 敏捷数据科学pdf_敏捷数据科学数据科学