Seaborn Tutorial: Data Visualization with Seaborn
“Seaborn makes the exploratory data analysis phase of your data science project beautiful and painless”
Introduction
This tutorial is aimed at readers who have worked with Seaborn before but have lost touch with it. I hope that reading it helps you recall Seaborn's visualization style and commands so that you can get started with your data exploration. The tutorial is organized by the number of numerical and categorical features you have: for each combination, it shows which visualizations you can make with Seaborn and how.
Let's import Seaborn:
import seaborn as sns
The Dataset:
We will be using the tips dataset available within the seaborn library.
Load the dataset using:
tips = sns.load_dataset('tips')
total_bill (Numerical): Total bill for the table
tip (Numerical): Tip for the waiter serving the table
sex (Categorical): Gender of the bill payer (Male/Female)
smoker (Categorical): Whether the bill payer was a smoker (Yes/No)
day (Categorical): Day of the week (Thur, Fri, Sat, Sun)
table_size (Numerical): Capacity of the table
date: Date and time of the bill payment
Note: the copy of tips bundled with seaborn names the table-size column size and has a time column (Lunch/Dinner) rather than date. The examples below that use table_size or date assume a copy of the data with those columns added.
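Since the rest of the tutorial is organized by how many numerical and categorical columns you have, it can help to list them first. The following is a minimal sketch using pandas only; the small frame `df` is a hypothetical stand-in with the same column types as the tips data, so it runs without downloading anything.

```python
import pandas as pd

# Hypothetical stand-in rows with the same column types as the tips data.
df = pd.DataFrame({
    "total_bill": [16.99, 10.34, 21.01, 23.68],
    "tip": [1.01, 1.66, 3.50, 3.31],
    "sex": pd.Categorical(["Female", "Male", "Male", "Male"]),
    "smoker": pd.Categorical(["No", "No", "Yes", "No"]),
    "day": pd.Categorical(["Sun", "Sun", "Sat", "Thur"]),
})

# Split the column names by dtype.
numerical = df.select_dtypes(include="number").columns.tolist()
categorical = df.select_dtypes(include="category").columns.tolist()

print("numerical:", numerical)
print("categorical:", categorical)
```

The same two `select_dtypes` calls should work on the frame returned by `sns.load_dataset('tips')`, whose text columns are stored as pandas categoricals.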
Seaborn Styles:
Let's start with the different styles available in Seaborn. The styles differ in background colour, grid layout and axis ticks. There are five built-in styles: dark, darkgrid, white, whitegrid and ticks.
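To compare the styles without changing the global setting each time, one option is `sns.axes_style`, which can be used as a context manager. This is a sketch under the assumption that you only want a style applied temporarily; the toy line data and the `facecolors` dictionary are illustrative only.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the sketch runs anywhere
import matplotlib.pyplot as plt
import seaborn as sns

styles = ["dark", "darkgrid", "white", "whitegrid", "ticks"]
facecolors = {}

for name in styles:
    # The style applies only to axes created inside the with-block.
    with sns.axes_style(name):
        fig, ax = plt.subplots(figsize=(3, 2))
        ax.plot([0, 1, 2], [0, 1, 4])
        ax.set_title(name)
        facecolors[name] = ax.get_facecolor()
    plt.close(fig)

# axes_style also returns the underlying rc parameters as a dict.
print(sns.axes_style("darkgrid")["axes.grid"])
print(sns.axes_style("white")["axes.grid"])
```

`sns.set_style` changes the style for every later plot, while the context manager leaves the global style untouched.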
sns.set_style('dark')
sns.set_style('darkgrid')
sns.set_style('ticks')
sns.set_style('white')
sns.set_style('whitegrid')
Visualizations
Let’s look at various visualizations we can do using Seaborn. Each segment below shows how to perform visualizations given the number of categorical and numerical variables that are available to you.
One Numerical Variable:
If we have one numerical variable, we can analyse the distribution of that variable.
# distplot is deprecated since seaborn 0.11; histplot and displot are its replacements
g = sns.distplot(tips.tip)
g.set_title('Tip Amount Distribution');
g = sns.distplot(tips.tip,kde=False)
g.set_title('Tip Amount Histogram');
g = sns.distplot(tips.tip,rug=True)
g.set_title('Tip Amount Distribution with rug');
We can observe that the tip amounts form a single-peaked, roughly bell-shaped distribution with a longer tail toward larger tips.
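A quick numerical check of how close a distribution is to normal is to compare the mean with the median and look at the sample skewness. The sketch below uses a small hypothetical series `tip` rather than the real data; for a symmetric distribution the skewness is near zero and the mean and median nearly coincide.

```python
import pandas as pd

# Hypothetical tip amounts with one large value in the right tail.
tip = pd.Series([1.0, 1.5, 2.0, 2.0, 2.5, 3.0, 3.0, 3.5, 5.0, 10.0])

# Positive skewness and mean > median both point to a right-skewed distribution.
print("skewness:", tip.skew())
print("mean:", tip.mean(), "median:", tip.median())
```

Running the same three calls on `tips.tip` gives a number to set beside the visual impression from the histogram.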
One Categorical Variable
If we have one categorical variable, we can do a count plot which shows frequency of occurrence of each value of the categorical variable.
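A count plot is a bar chart of how often each category occurs, so the heights can be cross-checked with pandas. This is a small sketch on a hypothetical `days` series; `reindex` is used here only to put the counts in a chosen display order.

```python
import pandas as pd

# Hypothetical day labels, one per bill.
days = pd.Series(["Sun", "Sat", "Sat", "Thur", "Fri", "Sat", "Sun"], name="day")

# Frequency of each category, arranged in the order we want to display.
counts = days.value_counts().reindex(["Thur", "Fri", "Sat", "Sun"])
print(counts)
```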
g = sns.catplot(x="day",kind='count',order=['Thur','Fri','Sun','Sat'],data=tips);
g.fig.suptitle("Frequency of days in the tips dataset [Count Plot]",y=1.05);
Two Numerical Variables
To analyse the relationship between two numerical variables, we can draw scatter plots in seaborn.
g = sns.relplot(x="total_bill",y="tip",data=tips,kind='scatter');
g.fig.suptitle('Relationship between continuous variables [Scatter Plot]',y=1.05);
Seaborn also makes it easy to visualize the joint density of two numerical variables.
g = sns.jointplot(x="total_bill",y='tip',data=tips,kind='kde');
g.fig.suptitle('Density distribution among tips and total_bill [Joint Plot]',y=1.05);
A KDE plot is another way to visualize the joint distribution of two continuous variables.
# bivariate KDE; the x=/y= keywords need seaborn 0.11 or later
g = sns.kdeplot(x="total_bill",y='tip',data=tips);
g.set_title('Density distribution among tips and total_bill [KDE Plot]');
We can also plot a regression line with a confidence interval, with one numerical variable as the dependent variable and the other as the independent variable.
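By default the regression line drawn by seaborn is an ordinary least-squares straight-line fit, so its slope and intercept can be reproduced with NumPy. The sketch below uses hypothetical `total_bill` and `tip` arrays; it shows the fit itself, not seaborn's bootstrapped confidence band.

```python
import numpy as np

# Hypothetical paired observations.
total_bill = np.array([10.0, 20.0, 30.0, 40.0, 50.0])
tip = np.array([1.5, 3.1, 4.4, 6.1, 7.4])

# Degree-1 polynomial fit: tip is approximately slope * total_bill + intercept.
slope, intercept = np.polyfit(total_bill, tip, deg=1)
print("slope:", slope, "intercept:", intercept)
```

Here the slope is the estimated increase in tip per unit increase in total bill.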
g = sns.lmplot(x="total_bill",y="tip",data=tips);
g.fig.suptitle('Relationship b/w tip and total_bill [Scatter Plot + Regression Line]',y=1.05);
Scatter Plot with Regression line
If the independent variable is a datetime, we can draw a line plot, which serves as a time series plot.
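A time series plot needs the x column to hold real datetime values rather than text. The following sketch assumes a hypothetical `bills` frame whose `date` column arrives as strings, and converts and sorts it with pandas before any plotting.

```python
import pandas as pd

# Hypothetical bills with the `date` column stored as text.
bills = pd.DataFrame({
    "date": ["2020-01-03 19:30", "2020-01-01 12:15", "2020-01-02 20:05"],
    "total_bill": [24.5, 12.0, 31.2],
})

# Parse the strings into datetimes and put the rows in time order.
bills["date"] = pd.to_datetime(bills["date"])
bills = bills.sort_values("date").reset_index(drop=True)

print(bills.dtypes)
print(bills)
```

With a datetime column, matplotlib formats the x axis as dates instead of treating each label as a separate category.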
g = sns.lineplot(x="date",y="total_bill",data=tips);
g.set_title('Total bill amount over time [Line plot]');
Two Numerical and One Categorical Variable
With two numerical variables and one categorical variable, we can draw all the plots from the two numerical variables section. The categorical variable adds a dimension that can be encoded as colour or marker style to distinguish its values in the plot.
g = sns.relplot(x="total_bill",y="tip",hue='sex',kind='scatter',data=tips);
g.fig.suptitle('Relationship b/w totalbill and tip distinguished by gender [Scatter Plot]',y=1.05);
g = sns.relplot(x="total_bill",y="tip",style='sex',kind='scatter',data=tips)
g.fig.suptitle('Relationship b/w totalbill distinguished by gender as marker [Scatter Plot]',y=1.05);
Alternatively, we can split the data by the categorical variable and plot the relationship between the two numerical variables in a separate panel for each of its values.
g = sns.relplot(x="total_bill",y="tip",col='sex',kind='scatter',data=tips);
g.fig.suptitle('Relationship between totalbill and tip by gender [Scatter Plot]',y=1.05);
Three Numerical Variables
If we have three numerical variables, we can draw a scatter plot of two of them and use the third as the size of the points.
g = sns.relplot(x="total_bill",y="tip",size='table_size',kind='scatter',data=tips);
g.fig.suptitle('total bill vs tip distinguished by table size [Scatter Plot]',y=1.05);
Three Numerical Variables and One Categorical Variable:
If we have three numerical variables and one categorical variable, the plot from the previous section can be drawn once for each value of the categorical variable.
g = sns.relplot(x="total_bill",y="tip",col='sex',size='table_size',kind='scatter',data=tips);
g.fig.suptitle('Total bill vs tip by gender distinguished by table size [Scatter Plot]',y=1.03);
One Numerical and One Categorical Variable:
This is probably the most basic, common and useful combination in data visualization. If we have one numerical variable and one categorical variable, we can draw various plots such as a bar plot and a strip plot.
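With seaborn's default estimator, the height of each bar in a bar plot is the mean of the numerical variable within that category. The sketch below reproduces those heights with a pandas group-by on a hypothetical frame `df`; the values are made up for illustration.

```python
import pandas as pd

# Hypothetical tips with the day each bill was paid.
df = pd.DataFrame({
    "day": ["Thur", "Thur", "Fri", "Sat", "Sat", "Sat"],
    "tip": [2.0, 4.0, 2.5, 1.0, 2.0, 6.0],
})

# Mean tip per day: the quantity a bar plot draws by default.
mean_tip = df.groupby("day")["tip"].mean()
print(mean_tip)
```

A strip plot, by contrast, draws every individual tip, which is why the two are often shown together.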
# ci=None hides the error bars (seaborn 0.12 and later use errorbar=None instead)
g = sns.catplot(x="day",y="tip",kind='bar',order=['Thur','Fri','Sun','Sat'],ci=None,data=tips);
g.fig.suptitle('Tip amount by day of week [Bar Plot]',y=1.05);
g = sns.catplot(x="day",y="tip",kind='strip',order=['Thur','Fri','Sun','Sat'],data=tips);
g.fig.suptitle('Tip amount by day along with tips as scatter [Strip Plot]',y=1.03);
The swarm plot and violin plot shown below let us visualize the distribution of the numerical variable within each value of the categorical variable.
g = sns.catplot(x="day",y="tip",kind='swarm',order=['Thur','Fri','Sun','Sat'],data=tips);
g.fig.suptitle('Tip amount by day along with tip distribution [Swarm Plot]',y=1.05);
g = sns.catplot(x="day",y="tip",kind='violin',order=['Thur','Fri','Sun','Sat'],data=tips);
g.fig.suptitle('Tips distributions by day [Violin Plot]');
A point plot shows the mean of the continuous variable within each value of the categorical variable, with error bars for the confidence interval around that mean (95% by default). It does not show the interquartile range (25th to 75th percentile); a box plot is the usual choice for that.
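The error bars on a point plot come from a bootstrap confidence interval for the mean, which is a different quantity from the interquartile range of the data. The sketch below computes both with NumPy on a hypothetical array `tips_for_day`; the seed and the 1000 resamples are arbitrary choices for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical tips for a single day.
tips_for_day = np.array([1.0, 1.5, 2.0, 2.0, 2.5, 3.0, 3.0, 3.5, 4.0, 5.0, 5.5, 6.0])

# Bootstrap: resample with replacement and record the mean of each resample.
boot_means = np.array([
    rng.choice(tips_for_day, size=tips_for_day.size, replace=True).mean()
    for _ in range(1000)
])

# 95% confidence interval for the mean (what the error bars represent).
low, high = np.percentile(boot_means, [2.5, 97.5])

# Interquartile range of the raw data (what a box plot represents).
q1, q3 = np.percentile(tips_for_day, [25, 75])

print("mean:", tips_for_day.mean(), "95% CI:", (low, high))
print("IQR:", (q1, q3))
```

The confidence interval narrows as more data is collected, whereas the interquartile range describes the spread of the tips themselves.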
g = sns.catplot(x="day",y="tip",kind='point',order=['Thur','Fri','Sun','Sat'],data=tips,capsize=0.5);
g.fig.suptitle('Mean tip by day with confidence interval [Point Plot]',y=1.05);
One Numerical and Two Categorical Variables:
With one numerical and two categorical variables, we can use all the plots from the previous section and accommodate the second categorical variable either as a column variable or as a subgroup within each subplot, as shown below.
g = sns.catplot(x="day",y="tip",kind='bar',col='smoker',order=['Thur','Fri','Sun','Sat'],ci=None,data=tips);
g.fig.suptitle('Tip amount by day of week by smoker/non-smoker [Bar Plot]',y=1.05);
g = sns.catplot(x="day",y="tip",kind='bar',hue='smoker',order=['Thur','Fri','Sun','Sat'],ci=None,data=tips);
g.fig.suptitle('Tips by day with smoker/non-smoker subgroup [Grouped Bar Plot]',y=1.05);
One Numerical and Three Categorical Variables:
With one numerical and three categorical variables, we can draw all the visualizations from the one numerical and one categorical variable section and accommodate the two additional categorical variables, one as the column or row variable of the figure and the other as a subgroup within each subplot.
g = sns.catplot(x="day",y="tip",kind='bar',hue='smoker',col='sex',order=['Thur','Fri','Sun','Sat'],ci=None,data=tips);
g.fig.suptitle('Tips by day with smoker/non-smoker subgroup by gender [Grouped Bar Plot]',y=1.05);
More than three continuous variables:
Finally, if we have more than three numerical variables, we can use a heat map or a pair plot. With these plots, we can visualize the relationship between every pair of numerical variables in a single figure.
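A correlation heat map is drawn from a correlation matrix, and that matrix is only defined for the numerical columns. The sketch below builds it with pandas on a hypothetical frame `df` that also contains a text column; selecting the numeric columns first is one way to keep the call working across pandas versions.

```python
import pandas as pd

# Hypothetical data: three numerical columns and one text column.
df = pd.DataFrame({
    "total_bill": [10.0, 20.0, 30.0, 40.0],
    "tip": [1.0, 2.0, 3.0, 4.0],
    "table_size": [2, 2, 4, 3],
    "day": ["Sun", "Sat", "Sat", "Thur"],
})

# Keep only the numerical columns, then compute pairwise Pearson correlations.
corr = df.select_dtypes(include="number").corr()
print(corr)
```

The resulting square frame is what gets passed to `sns.heatmap`; each cell is the correlation between one pair of columns.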
# correlate only the numerical columns; pandas 2.0 and later raise an error on the others
g = sns.heatmap(tips.select_dtypes('number').corr());
g.set_title('Correlation between continuous variables [Heat Map]');
g = sns.pairplot(tips);
g.fig.suptitle('Relationship between continuous variables [Pair Plot]',y=1.03);
Setting Titles, Labels and Legends
Some Seaborn plots return a matplotlib AxesSubplot while others return a FacetGrid (if you have forgotten what matplotlib AxesSubplots are, check my notes on matplotlib for reference).
A FacetGrid is a grid (2D array) of matplotlib AxesSubplots. You can access each subplot using array indices and set labels and titles for each plot.
g = sns.relplot(x="total_bill",y="tip",data=tips,kind='scatter');
g.axes[0,0].set_title('Relationship between continuous variables [Scatter Plot]');
g.axes[0,0].set_xlabel('Total Bill Amount');
g.axes[0,0].set_ylabel('Tip Amount');
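When the grid has more than one panel, indexing each subplot by hand gets tedious, and looping over `g.axes.flat` is one alternative. This sketch uses a hypothetical four-row frame `df` so that it runs without the real dataset; the label text is illustrative.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the sketch runs anywhere
import pandas as pd
import seaborn as sns

# Hypothetical stand-in for the tips data.
df = pd.DataFrame({
    "total_bill": [16.99, 10.34, 21.01, 23.68],
    "tip": [1.01, 1.66, 3.50, 3.31],
    "sex": ["Female", "Male", "Male", "Female"],
})

# One column of the grid per value of `sex`.
g = sns.relplot(x="total_bill", y="tip", col="sex", kind="scatter", data=df)

# g.axes is a 2D array of matplotlib axes; .flat visits every panel.
for ax in g.axes.flat:
    ax.set_xlabel("Total Bill Amount")
    ax.set_ylabel("Tip Amount")

print(g.axes.shape)
```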
If the plot returns AxesSubplot, you can use AxesSubplot methods to set titles and legends.
g = sns.distplot(tips.tip)
g.set_title('Tip Amount Probability Distribution');
g.set_xlabel('Tip Amount')
g.set_ylabel('probability')
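Functions that return a matplotlib axes also accept an existing one through the `ax` argument, which lets you create the figure yourself and place the plot where you want. The sketch below assumes a hypothetical frame `df` and uses `sns.scatterplot` as the axes-level function.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the sketch runs anywhere
import matplotlib.pyplot as plt
import pandas as pd
import seaborn as sns

# Hypothetical stand-in for the tips data.
df = pd.DataFrame({
    "total_bill": [16.99, 10.34, 21.01, 23.68],
    "tip": [1.01, 1.66, 3.50, 3.31],
})

# Create the figure and axes first, then let seaborn draw into them.
fig, ax = plt.subplots(figsize=(4, 3))
returned = sns.scatterplot(x="total_bill", y="tip", data=df, ax=ax)

# The returned object is the same axes, so the usual matplotlib setters apply.
ax.set_title("Tip vs total bill")
ax.set_xlabel("Total Bill Amount")
ax.set_ylabel("Tip Amount")
```

This is also the way to put several axes-level seaborn plots side by side in one matplotlib figure.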
For a FacetGrid, you can get the figure object from the FacetGrid object and set a title on it.
g = sns.relplot(x="total_bill",y="tip",col='sex',kind='scatter',data=tips);
g.fig.suptitle('Relationship between totalbill and tip by gender [Scatter Plot]',y=1.05);
Conclusion
Hopefully, you find this tutorial helpful in getting started with making beautiful visualizations easily with Seaborn. Although Seaborn is easy to use, it also offers a lot of customisation, which is an advanced topic. Once you are comfortable with the basic plots, you can explore Seaborn further as you use it for your visualizations.
Original article: https://towardsdatascience.com/data-visualisation-tutorial-using-seaborn-26e1ef9043db