Team Study: Hands-On Data Analysis, Chapter 2, Sections 2 and 3
Review: Earlier we covered the basics of Pandas. Chapter 2 moves into the practical side of data analysis. In Section 1 of Chapter 2 we learned data cleaning, which matters a great deal: only when the data is reasonably clean can the later analysis carry weight. In this section we turn to data restructuring, which still belongs to the data understanding (preparation) stage.
Before starting, import numpy and pandas and load the data.
    # Import the basic libraries
    import numpy as np
    import pandas as pd

    df = pd.DataFrame([[1.4, np.nan], [np.nan, 2]], index=['a', 'b'], columns=['one', 'two'])
    df

|   | one | two |
| --- | --- | --- |
| a | 1.4 | NaN |
| b | NaN | 2.0 |

| PassengerId | Survived | Pclass | Name |
| --- | --- | --- | --- |
| 1 | 0 | 3 | Braund, Mr. Owen Harris |
| 2 | 1 | 1 | Cumings, Mrs. John Bradley (Florence Briggs Th... |
| 3 | 1 | 3 | Heikkinen, Miss. Laina |
| 4 | 1 | 1 | Futrelle, Mrs. Jacques Heath (Lily May Peel) |
| 5 | 0 | 3 | Allen, Mr. William Henry |
| ... | ... | ... | ... |
| 435 | 0 | 1 | Silvey, Mr. William Baird |
| 436 | 1 | 1 | Carter, Miss. Lucile Polk |
| 437 | 0 | 3 | Ford, Miss. Doolina Margaret "Daisy" |
| 438 | 1 | 2 | Richards, Mrs. Sidney (Emily Hocking) |
| 439 | 0 | 1 | Fortune, Mr. Mark |
439 rows × 4 columns
2 Chapter 2: Data Restructuring
2.4 Merging Data
2.4.1 Task 1: Load all the files in the data folder and observe how the datasets relate to one another
    # Write your code here
    df_left_up = pd.read_csv("data/train-left-up.csv")
    df_left_down = pd.read_csv("data/train-left-down.csv")
    df_right_up = pd.read_csv("data/train-right-up.csv")
    df_right_down = pd.read_csv("data/train-right-down.csv")

    # Write your code here
    df_left_up.head()

| PassengerId | Survived | Pclass | Name |
| --- | --- | --- | --- |
| 1 | 0 | 3 | Braund, Mr. Owen Harris |
| 2 | 1 | 1 | Cumings, Mrs. John Bradley (Florence Briggs Th... |
| 3 | 1 | 3 | Heikkinen, Miss. Laina |
| 4 | 1 | 1 | Futrelle, Mrs. Jacques Heath (Lily May Peel) |
| 5 | 0 | 3 | Allen, Mr. William Henry |

train-left-down.csv:

| PassengerId | Survived | Pclass | Name |
| --- | --- | --- | --- |
| 440 | 0 | 2 | Kvillner, Mr. Johan Henrik Johannesson |
| 441 | 1 | 2 | Hart, Mrs. Benjamin (Esther Ada Bloomfield) |
| 442 | 0 | 3 | Hampe, Mr. Leon |
| 443 | 0 | 3 | Petterson, Mr. Johan Emil |
| 444 | 1 | 2 | Reynaldo, Ms. Encarnacion |
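
train-right-up.csv:

| Sex | Age | SibSp | Parch | Ticket | Fare | Cabin | Embarked |
| --- | --- | --- | --- | --- | --- | --- | --- |
| male | 22.0 | 1 | 0 | A/5 21171 | 7.2500 | NaN | S |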
| female | 38.0 | 1 | 0 | PC 17599 | 71.2833 | C85 | C |
| female | 26.0 | 0 | 0 | STON/O2. 3101282 | 7.9250 | NaN | S |
| female | 35.0 | 1 | 0 | 113803 | 53.1000 | C123 | S |
| male | 35.0 | 0 | 0 | 373450 | 8.0500 | NaN | S |
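
train-right-down.csv:

| Sex | Age | SibSp | Parch | Ticket | Fare | Cabin | Embarked |
| --- | --- | --- | --- | --- | --- | --- | --- |
| male | 31.0 | 0 | 0 | C.A. 18723 | 10.500 | NaN | S |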
| female | 45.0 | 1 | 1 | F.C.C. 13529 | 26.250 | NaN | S |
| male | 20.0 | 0 | 0 | 345769 | 9.500 | NaN | S |
| male | 25.0 | 1 | 0 | 347076 | 7.775 | NaN | S |
| female | 28.0 | 0 | 0 | 230434 | 13.000 | NaN | S |
[Hint] Using the train.csv data we loaded earlier, make a rough guess at what each of the files above contains.
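One way to test a guess about how the four files relate is to compare shapes and column names. The sketch below does not read the real files; it cuts a small made-up table (hypothetical values standing in for train.csv) into four quadrants, which is the layout the previews suggest, and then checks the two properties that make the pieces fit together again.

```python
import pandas as pd

# Toy stand-in for train.csv (hypothetical values, not the real data).
full = pd.DataFrame({
    "PassengerId": [1, 2, 3, 4],
    "Survived": [0, 1, 1, 0],
    "Sex": ["male", "female", "female", "male"],
    "Fare": [7.25, 71.28, 7.92, 8.05],
})

# Cut the table into four quadrants: rows split at 2, columns split at 2.
left_up = full.iloc[:2, :2]
right_up = full.iloc[:2, 2:]
# Reset the index to mimic what read_csv does when each file is loaded alone.
left_down = full.iloc[2:, :2].reset_index(drop=True)
right_down = full.iloc[2:, 2:].reset_index(drop=True)

# Left and right pieces share row counts; upper and lower pieces share columns.
same_rows_up = len(left_up) == len(right_up)
same_cols_left = list(left_up.columns) == list(left_down.columns)
print(same_rows_up, same_cols_left)
```

If both checks hold for the real files, the left and right pieces can be joined side by side and the upper and lower pieces can be stacked.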
2.4.2: Task 2: Use the concat method to merge train-left-up.csv and train-right-up.csv horizontally into one table, and save that table as result_up
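    # Write your code here
    result_up = pd.concat([df_left_up, df_right_up], axis=1)
    result_up

| PassengerId | Survived | Pclass | Name | Sex | Age | SibSp | Parch | Ticket | Fare | Cabin | Embarked |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| 1 | 0 | 3 | Braund, Mr. Owen Harris | male | 22.0 | 1 | 0 | A/5 21171 | 7.2500 | NaN | S |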
| 2 | 1 | 1 | Cumings, Mrs. John Bradley (Florence Briggs Th... | female | 38.0 | 1 | 0 | PC 17599 | 71.2833 | C85 | C |
| 3 | 1 | 3 | Heikkinen, Miss. Laina | female | 26.0 | 0 | 0 | STON/O2. 3101282 | 7.9250 | NaN | S |
| 4 | 1 | 1 | Futrelle, Mrs. Jacques Heath (Lily May Peel) | female | 35.0 | 1 | 0 | 113803 | 53.1000 | C123 | S |
| 5 | 0 | 3 | Allen, Mr. William Henry | male | 35.0 | 0 | 0 | 373450 | 8.0500 | NaN | S |
| ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
| 435 | 0 | 1 | Silvey, Mr. William Baird | male | 50.0 | 1 | 0 | 13507 | 55.9000 | E44 | S |
| 436 | 1 | 1 | Carter, Miss. Lucile Polk | female | 14.0 | 1 | 2 | 113760 | 120.0000 | B96 B98 | S |
| 437 | 0 | 3 | Ford, Miss. Doolina Margaret "Daisy" | female | 21.0 | 2 | 2 | W./C. 6608 | 34.3750 | NaN | S |
| 438 | 1 | 2 | Richards, Mrs. Sidney (Emily Hocking) | female | 24.0 | 2 | 3 | 29106 | 18.7500 | NaN | S |
| 439 | 0 | 1 | Fortune, Mr. Mark | male | 64.0 | 1 | 4 | 19950 | 263.0000 | C23 C25 C27 | S |
439 rows × 12 columns
2.4.3 Task 3: Use the concat method to merge train-left-down and train-right-down horizontally into one table and save it as result_down. Then merge result_up and result_down vertically into result.
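    # Write your code here
    result_down = pd.concat([df_left_down, df_right_down], axis=1)
    result_down

| PassengerId | Survived | Pclass | Name | Sex | Age | SibSp | Parch | Ticket | Fare | Cabin | Embarked |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| 440 | 0 | 2 | Kvillner, Mr. Johan Henrik Johannesson | male | 31.0 | 0 | 0 | C.A. 18723 | 10.500 | NaN | S |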
| 441 | 1 | 2 | Hart, Mrs. Benjamin (Esther Ada Bloomfield) | female | 45.0 | 1 | 1 | F.C.C. 13529 | 26.250 | NaN | S |
| 442 | 0 | 3 | Hampe, Mr. Leon | male | 20.0 | 0 | 0 | 345769 | 9.500 | NaN | S |
| 443 | 0 | 3 | Petterson, Mr. Johan Emil | male | 25.0 | 1 | 0 | 347076 | 7.775 | NaN | S |
| 444 | 1 | 2 | Reynaldo, Ms. Encarnacion | female | 28.0 | 0 | 0 | 230434 | 13.000 | NaN | S |
| ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
| 887 | 0 | 2 | Montvila, Rev. Juozas | male | 27.0 | 0 | 0 | 211536 | 13.000 | NaN | S |
| 888 | 1 | 1 | Graham, Miss. Margaret Edith | female | 19.0 | 0 | 0 | 112053 | 30.000 | B42 | S |
| 889 | 0 | 3 | Johnston, Miss. Catherine Helen "Carrie" | female | NaN | 1 | 2 | W./C. 6607 | 23.450 | NaN | S |
| 890 | 1 | 1 | Behr, Mr. Karl Howell | male | 26.0 | 0 | 0 | 111369 | 30.000 | C148 | C |
| 891 | 0 | 3 | Dooley, Mr. Patrick | male | 32.0 | 0 | 0 | 370376 | 7.750 | NaN | Q |
452 rows × 12 columns
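
    result = pd.concat([result_up, result_down])
    result

| PassengerId | Survived | Pclass | Name | Sex | Age | SibSp | Parch | Ticket | Fare | Cabin | Embarked |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| 1 | 0 | 3 | Braund, Mr. Owen Harris | male | 22.0 | 1 | 0 | A/5 21171 | 7.2500 | NaN | S |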
| 2 | 1 | 1 | Cumings, Mrs. John Bradley (Florence Briggs Th... | female | 38.0 | 1 | 0 | PC 17599 | 71.2833 | C85 | C |
| 3 | 1 | 3 | Heikkinen, Miss. Laina | female | 26.0 | 0 | 0 | STON/O2. 3101282 | 7.9250 | NaN | S |
| 4 | 1 | 1 | Futrelle, Mrs. Jacques Heath (Lily May Peel) | female | 35.0 | 1 | 0 | 113803 | 53.1000 | C123 | S |
| 5 | 0 | 3 | Allen, Mr. William Henry | male | 35.0 | 0 | 0 | 373450 | 8.0500 | NaN | S |
| ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
| 887 | 0 | 2 | Montvila, Rev. Juozas | male | 27.0 | 0 | 0 | 211536 | 13.0000 | NaN | S |
| 888 | 1 | 1 | Graham, Miss. Margaret Edith | female | 19.0 | 0 | 0 | 112053 | 30.0000 | B42 | S |
| 889 | 0 | 3 | Johnston, Miss. Catherine Helen "Carrie" | female | NaN | 1 | 2 | W./C. 6607 | 23.4500 | NaN | S |
| 890 | 1 | 1 | Behr, Mr. Karl Howell | male | 26.0 | 0 | 0 | 111369 | 30.0000 | C148 | C |
| 891 | 0 | 3 | Dooley, Mr. Patrick | male | 32.0 | 0 | 0 | 370376 | 7.7500 | NaN | Q |
891 rows × 12 columns
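A detail worth checking after a vertical concat is the row index. Each file loaded with read_csv gets its own labels starting at 0, so stacking two frames keeps both sets of labels unless told otherwise. The sketch below uses two tiny made-up frames (the names `up` and `down` and their values are illustrative only) to show the default behaviour next to `ignore_index=True`.

```python
import pandas as pd

# Two small frames, each with its own default index 0, 1.
up = pd.DataFrame({"id": [1, 2], "fare": [7.25, 71.28]})
down = pd.DataFrame({"id": [3, 4], "fare": [7.92, 8.05]})

# Default vertical concat keeps the original labels, so they repeat.
stacked = pd.concat([up, down])
# ignore_index=True discards the old labels and renumbers from 0.
renumbered = pd.concat([up, down], ignore_index=True)

print(stacked.index.tolist())
print(renumbered.index.tolist())
# With repeated labels, a single label selects more than one row.
print(stacked.loc[0])
```

Repeated labels are harmless for the grouping tasks later on, but they can surprise label-based lookups such as `result.loc[0]`.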
2.4.4 Task 4: Use DataFrame's own join and append methods to complete Tasks 2 and 3
    # Write your code here
    result_up = df_left_up.join(df_right_up)
    result_down = df_left_down.join(df_right_down)
    # DataFrame.append is deprecated and was removed in pandas 2.0;
    # on newer versions use pd.concat([result_up, result_down]) instead
    result = result_up.append(result_down)
    result

    FutureWarning: The frame.append method is deprecated and will be removed from pandas in a future version. Use pandas.concat instead.

| PassengerId | Survived | Pclass | Name | Sex | Age | SibSp | Parch | Ticket | Fare | Cabin | Embarked |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| 1 | 0 | 3 | Braund, Mr. Owen Harris | male | 22.0 | 1 | 0 | A/5 21171 | 7.2500 | NaN | S |
| 2 | 1 | 1 | Cumings, Mrs. John Bradley (Florence Briggs Th... | female | 38.0 | 1 | 0 | PC 17599 | 71.2833 | C85 | C |
| 3 | 1 | 3 | Heikkinen, Miss. Laina | female | 26.0 | 0 | 0 | STON/O2. 3101282 | 7.9250 | NaN | S |
| 4 | 1 | 1 | Futrelle, Mrs. Jacques Heath (Lily May Peel) | female | 35.0 | 1 | 0 | 113803 | 53.1000 | C123 | S |
| 5 | 0 | 3 | Allen, Mr. William Henry | male | 35.0 | 0 | 0 | 373450 | 8.0500 | NaN | S |
| ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
| 887 | 0 | 2 | Montvila, Rev. Juozas | male | 27.0 | 0 | 0 | 211536 | 13.0000 | NaN | S |
| 888 | 1 | 1 | Graham, Miss. Margaret Edith | female | 19.0 | 0 | 0 | 112053 | 30.0000 | B42 | S |
| 889 | 0 | 3 | Johnston, Miss. Catherine Helen "Carrie" | female | NaN | 1 | 2 | W./C. 6607 | 23.4500 | NaN | S |
| 890 | 1 | 1 | Behr, Mr. Karl Howell | male | 26.0 | 0 | 0 | 111369 | 30.0000 | C148 | C |
| 891 | 0 | 3 | Dooley, Mr. Patrick | male | 32.0 | 0 | 0 | 370376 | 7.7500 | NaN | Q |
891 rows × 12 columns
2.4.5 Task 5: Use the pandas merge method and DataFrame's append method to complete Tasks 2 and 3
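    # Write your code here
    result_up = pd.merge(df_left_up, df_right_up, left_index=True, right_index=True)
    result_up

| PassengerId | Survived | Pclass | Name | Sex | Age | SibSp | Parch | Ticket | Fare | Cabin | Embarked |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| 1 | 0 | 3 | Braund, Mr. Owen Harris | male | 22.0 | 1 | 0 | A/5 21171 | 7.2500 | NaN | S |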
| 2 | 1 | 1 | Cumings, Mrs. John Bradley (Florence Briggs Th... | female | 38.0 | 1 | 0 | PC 17599 | 71.2833 | C85 | C |
| 3 | 1 | 3 | Heikkinen, Miss. Laina | female | 26.0 | 0 | 0 | STON/O2. 3101282 | 7.9250 | NaN | S |
| 4 | 1 | 1 | Futrelle, Mrs. Jacques Heath (Lily May Peel) | female | 35.0 | 1 | 0 | 113803 | 53.1000 | C123 | S |
| 5 | 0 | 3 | Allen, Mr. William Henry | male | 35.0 | 0 | 0 | 373450 | 8.0500 | NaN | S |
| ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
| 435 | 0 | 1 | Silvey, Mr. William Baird | male | 50.0 | 1 | 0 | 13507 | 55.9000 | E44 | S |
| 436 | 1 | 1 | Carter, Miss. Lucile Polk | female | 14.0 | 1 | 2 | 113760 | 120.0000 | B96 B98 | S |
| 437 | 0 | 3 | Ford, Miss. Doolina Margaret "Daisy" | female | 21.0 | 2 | 2 | W./C. 6608 | 34.3750 | NaN | S |
| 438 | 1 | 2 | Richards, Mrs. Sidney (Emily Hocking) | female | 24.0 | 2 | 3 | 29106 | 18.7500 | NaN | S |
| 439 | 0 | 1 | Fortune, Mr. Mark | male | 64.0 | 1 | 4 | 19950 | 263.0000 | C23 C25 C27 | S |
439 rows × 12 columns
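
    result_down = pd.merge(df_left_down, df_right_down, left_index=True, right_index=True)
    result_down

| PassengerId | Survived | Pclass | Name | Sex | Age | SibSp | Parch | Ticket | Fare | Cabin | Embarked |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| 440 | 0 | 2 | Kvillner, Mr. Johan Henrik Johannesson | male | 31.0 | 0 | 0 | C.A. 18723 | 10.500 | NaN | S |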
| 441 | 1 | 2 | Hart, Mrs. Benjamin (Esther Ada Bloomfield) | female | 45.0 | 1 | 1 | F.C.C. 13529 | 26.250 | NaN | S |
| 442 | 0 | 3 | Hampe, Mr. Leon | male | 20.0 | 0 | 0 | 345769 | 9.500 | NaN | S |
| 443 | 0 | 3 | Petterson, Mr. Johan Emil | male | 25.0 | 1 | 0 | 347076 | 7.775 | NaN | S |
| 444 | 1 | 2 | Reynaldo, Ms. Encarnacion | female | 28.0 | 0 | 0 | 230434 | 13.000 | NaN | S |
| ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
| 887 | 0 | 2 | Montvila, Rev. Juozas | male | 27.0 | 0 | 0 | 211536 | 13.000 | NaN | S |
| 888 | 1 | 1 | Graham, Miss. Margaret Edith | female | 19.0 | 0 | 0 | 112053 | 30.000 | B42 | S |
| 889 | 0 | 3 | Johnston, Miss. Catherine Helen "Carrie" | female | NaN | 1 | 2 | W./C. 6607 | 23.450 | NaN | S |
| 890 | 1 | 1 | Behr, Mr. Karl Howell | male | 26.0 | 0 | 0 | 111369 | 30.000 | C148 | C |
| 891 | 0 | 3 | Dooley, Mr. Patrick | male | 32.0 | 0 | 0 | 370376 | 7.750 | NaN | Q |
452 rows × 12 columns
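
    # DataFrame.append is deprecated and was removed in pandas 2.0;
    # on newer versions use pd.concat([result_up, result_down]) instead
    result = result_up.append(result_down)
    result

    FutureWarning: The frame.append method is deprecated and will be removed from pandas in a future version. Use pandas.concat instead.

| PassengerId | Survived | Pclass | Name | Sex | Age | SibSp | Parch | Ticket | Fare | Cabin | Embarked |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| 1 | 0 | 3 | Braund, Mr. Owen Harris | male | 22.0 | 1 | 0 | A/5 21171 | 7.2500 | NaN | S |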
| 2 | 1 | 1 | Cumings, Mrs. John Bradley (Florence Briggs Th... | female | 38.0 | 1 | 0 | PC 17599 | 71.2833 | C85 | C |
| 3 | 1 | 3 | Heikkinen, Miss. Laina | female | 26.0 | 0 | 0 | STON/O2. 3101282 | 7.9250 | NaN | S |
| 4 | 1 | 1 | Futrelle, Mrs. Jacques Heath (Lily May Peel) | female | 35.0 | 1 | 0 | 113803 | 53.1000 | C123 | S |
| 5 | 0 | 3 | Allen, Mr. William Henry | male | 35.0 | 0 | 0 | 373450 | 8.0500 | NaN | S |
| ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
| 887 | 0 | 2 | Montvila, Rev. Juozas | male | 27.0 | 0 | 0 | 211536 | 13.0000 | NaN | S |
| 888 | 1 | 1 | Graham, Miss. Margaret Edith | female | 19.0 | 0 | 0 | 112053 | 30.0000 | B42 | S |
| 889 | 0 | 3 | Johnston, Miss. Catherine Helen "Carrie" | female | NaN | 1 | 2 | W./C. 6607 | 23.4500 | NaN | S |
| 890 | 1 | 1 | Behr, Mr. Karl Howell | male | 26.0 | 0 | 0 | 111369 | 30.0000 | C148 | C |
| 891 | 0 | 3 | Dooley, Mr. Patrick | male | 32.0 | 0 | 0 | 370376 | 7.7500 | NaN | Q |
891 rows × 12 columns
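[Think] Compare the merge, join, and concat methods: what do they have in common and where do they differ? Consider why Tasks 4 and 5 both require DataFrame's append method. Could Tasks 4 and 5 be completed using only merge or only join?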
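As one possible starting point for this comparison, the sketch below runs all three methods on tiny made-up frames (every name and value here is illustrative). It first combines columns side by side in three ways, then stacks rows with `pd.concat` and, for contrast, with an outer `pd.merge` on all shared columns. The outer merge is shown only to probe the question; it is not presented as a recommended way to stack rows.

```python
import pandas as pd

# Side by side: the two frames share an index but have different columns.
left = pd.DataFrame({"id": [1, 2]}, index=[0, 1])
right = pd.DataFrame({"fare": [7.25, 71.28]}, index=[0, 1])

by_concat = pd.concat([left, right], axis=1)   # aligns on the index
by_join = left.join(right)                     # left join on the index
by_merge = pd.merge(left, right, left_index=True, right_index=True)

# Top to bottom: the two frames share columns but hold different rows.
top = pd.DataFrame({"id": [1, 2], "fare": [7.25, 71.28]})
bottom = pd.DataFrame({"id": [3, 4], "fare": [7.92, 8.05]})

by_concat_rows = pd.concat([top, bottom], ignore_index=True)
# With no "on" argument, merge uses every column the frames have in common.
by_outer_merge = pd.merge(top, bottom, how="outer")

print(by_concat_rows)
print(by_outer_merge)
```

join and merge are built to match rows by key, so they widen a table naturally. An outer merge can imitate stacking here only because no row appears in both frames; rows that match on every column would be combined rather than kept as separate copies, and matching on float columns is fragile. That difference is a reasonable explanation for why the tasks pair join or merge with a dedicated row-stacking method.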
2.4.6 Task 6: Save the completed data as result.csv
    # Write your code here
    result.to_csv('result.csv')

2.5 Looking at the Data from Another Angle
2.5.1 Task 1: Convert our data into Series-type data
    # Write your code here
    unit_result = result.stack().head(20)
    unit_result

    0  PassengerId                                                    1
       Survived                                                       0
       Pclass                                                         3
       Name                                     Braund, Mr. Owen Harris
       Sex                                                         male
       Age                                                         22.0
       SibSp                                                          1
       Parch                                                          0
       Ticket                                                 A/5 21171
       Fare                                                        7.25
       Embarked                                                       S
    1  PassengerId                                                    2
       Survived                                                       1
       Pclass                                                         1
       Name           Cumings, Mrs. John Bradley (Florence Briggs Th...
       Sex                                                       female
       Age                                                         38.0
       SibSp                                                          1
       Parch                                                          0
       Ticket                                                  PC 17599
    dtype: object
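To make the reshaping step more concrete, here is a minimal sketch of `stack` and its inverse `unstack` on a two-row, two-column frame. The frame `wide` and its values are made up for illustration, and only numeric columns are used so the round trip keeps its dtypes.

```python
import pandas as pd

# A small wide table: one row per passenger, one column per attribute.
wide = pd.DataFrame(
    {"Age": [22.0, 38.0], "Fare": [7.25, 71.2833]},
    index=[0, 1],
)

# stack() moves the column labels into an inner index level,
# giving a Series keyed by (row label, column name).
long = wide.stack()
print(long)

# A single cell is addressed by its (row, column) pair.
print(long.loc[(1, "Fare")])

# unstack() pivots the inner level back into columns.
back = long.unstack()
print(back)
```

In the notebook output above, row 0 has no Cabin entry; that is consistent with `stack` leaving out missing values in the pandas version used for that run.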
Before starting, import numpy and pandas and load the data.
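    # Import the basic libraries
    import numpy as np
    import pandas as pd

    # Load result.csv, the file saved in the previous task, and inspect it
    df = pd.read_csv('result.csv')
    df.head()

| Unnamed: 0 | PassengerId | Survived | Pclass | Name | Sex | Age | SibSp | Parch | Ticket | Fare | Cabin | Embarked |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| 0 | 1 | 0 | 3 | Braund, Mr. Owen Harris | male | 22.0 | 1.0 | 0.0 | A/5 21171 | 7.2500 | NaN | S |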
| 1 | 2 | 1 | 1 | Cumings, Mrs. John Bradley (Florence Briggs Th... | female | 38.0 | 1.0 | 0.0 | PC 17599 | 71.2833 | C85 | C |
| 2 | 3 | 1 | 3 | Heikkinen, Miss. Laina | female | 26.0 | 0.0 | 0.0 | STON/O2. 3101282 | 7.9250 | NaN | S |
| 3 | 4 | 1 | 1 | Futrelle, Mrs. Jacques Heath (Lily May Peel) | female | 35.0 | 1.0 | 0.0 | 113803 | 53.1000 | C123 | S |
| 4 | 5 | 0 | 3 | Allen, Mr. William Henry | male | 35.0 | 0.0 | 0.0 | 373450 | 8.0500 | NaN | S |
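The extra leading column in this preview comes from the saved row index: `to_csv` writes the index by default, and `read_csv` then loads it as an ordinary column. The sketch below reproduces this with a tiny made-up frame and an in-memory buffer instead of a file (the frame and variable names are illustrative), and shows two common ways to avoid the extra column.

```python
import io
import pandas as pd

result = pd.DataFrame({"PassengerId": [1, 2], "Survived": [0, 1]})

# to_csv() with no path returns the CSV text instead of writing a file.
with_index = result.to_csv()                # index saved as a nameless first column
without_index = result.to_csv(index=False)  # index left out

reloaded_a = pd.read_csv(io.StringIO(with_index))
reloaded_b = pd.read_csv(io.StringIO(without_index))
# Alternatively, tell read_csv that the first column is the index.
reloaded_c = pd.read_csv(io.StringIO(with_index), index_col=0)

print(reloaded_a.columns.tolist())
print(reloaded_b.columns.tolist())
print(reloaded_c.columns.tolist())
```

Either `index=False` when saving or `index_col=0` when loading keeps the index from showing up as data in later groupby results.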
2 Chapter 2: Data Restructuring
Part 1: Data Aggregation and Computation
2.6 Applying the Data
2.6.1 Task 1: Learn how the GroupBy mechanism works from the textbook Python for Data Analysis (p. 303), Google, or any other source
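GroupBy is often described as split, apply, combine. As a rough illustration, the sketch below computes a per-group mean twice on a made-up four-row frame: once with an explicit loop that performs the three steps by hand, and once with `groupby`. The data and the name `manual` are illustrative only.

```python
import pandas as pd

df = pd.DataFrame({
    "Sex": ["male", "female", "female", "male"],
    "Fare": [7.25, 71.25, 8.75, 8.75],
})

# By hand: split the rows by key, apply a function, combine the results.
manual = {}
for sex in df["Sex"].unique():
    group = df[df["Sex"] == sex]          # split
    manual[sex] = group["Fare"].mean()    # apply, then combine into a dict

# The same computation expressed with groupby.
grouped = df.groupby("Sex")["Fare"].mean()

print(manual)
print(grouped)
```

The `groupby` version returns a Series indexed by the group key, which is the shape the tasks in this section work with.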
    # Write your notes here

2.6.2: Task 2: Compute the average fare paid by men and by women on the Titanic

    # Write your code here
    df.groupby('Sex')['Fare'].mean()

    Sex
    female    44.479818
    male      25.523893
    Name: Fare, dtype: float64

With the GroupBy mechanism understood, we apply it in a series of operations to reach our goal.
The tasks below build familiarity with GroupBy.
2.6.3: Task 3: Count the surviving men and women on the Titanic

    # Write your code here
    df.groupby('Sex')['Survived'].sum()

    Sex
    female    233
    male      109
    Name: Survived, dtype: int64

2.6.4: Task 4: Count the survivors in each cabin class

    # Write your code here
    df.groupby('Pclass')['Survived'].sum()

    Pclass
    1    136
    2     87
    3    119
    Name: Survived, dtype: int64

[Hint] In the Survived column, 1 means the passenger survived and 0 means the passenger died.
[Think] From a data analysis perspective, what conclusions can be drawn from the statistics above?
    # Notes: survival rate per cabin class (survivors / passengers in that class)
    df.groupby('Pclass')['Survived'].apply(lambda x: x.sum() / x.count())

    Pclass
    1    0.629630
    2    0.472826
    3    0.242363
    Name: Survived, dtype: float64

[Think] The computations in Tasks 2 and 3 can be carried out in a single call with agg(), and the resulting columns can be renamed with rename. Can you write this out following the hint?

    # Notes
    df.groupby('Sex').agg({'Fare': 'mean', 'Pclass': 'count'}).rename(columns={'Fare': 'mean_fare', 'Pclass': 'count_pclass'})

| Sex | mean_fare | count_pclass |
| --- | --- | --- |
| female | 44.479818 | 314 |
| male | 25.523893 | 577 |
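The prompt asks for the Task 2 and Task 3 quantities together, that is, the mean fare and the number of survivors per sex. One way to get both with readable column names in a single step is pandas named aggregation, sketched below on a made-up four-row frame. The output names `mean_fare` and `survived_count` are chosen here for illustration.

```python
import pandas as pd

df = pd.DataFrame({
    "Sex": ["male", "female", "female", "male"],
    "Fare": [7.25, 71.25, 8.75, 8.75],
    "Survived": [0, 1, 1, 0],
})

# Named aggregation: each keyword is an output column,
# each value is a (source column, aggregation function) pair.
summary = df.groupby("Sex").agg(
    mean_fare=("Fare", "mean"),
    survived_count=("Survived", "sum"),
)
print(summary)
```

Compared with a dict passed to `agg` followed by `rename`, this form names the outputs where they are computed, and it allows two different aggregations of the same source column.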
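2.6.5: Task 5: Compute the average fare for each age within each ticket class

    # Write your code here
    df.groupby(['Pclass', 'Age'])['Fare'].mean()

    Pclass  Age
    1       0.92     151.5500
            2.00     151.5500
            4.00      81.8583
            11.00    120.0000
            14.00    120.0000
                       ...
    3       61.00      6.2375
            63.00      9.5875
            65.00      7.7500
            70.50      7.7500
            74.00      7.7750
    Name: Fare, Length: 182, dtype: float64

2.6.6: Task 6: Merge the results of Tasks 2 and 3 and save them to sex_fare_survived.csv

    # Write your code here
    df1 = df.groupby('Sex')['Fare'].mean()
    df2 = df.groupby('Sex')['Survived'].sum()
    sex_fare_survived = pd.merge(df1, df2, on='Sex')
    # The task also asks for the merged table to be saved
    sex_fare_survived.to_csv('sex_fare_survived.csv')
    sex_fare_survived

| Sex | Fare | Survived |
| --- | --- | --- |
| female | 44.479818 | 233 |
| male | 25.523893 | 109 |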
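2.6.7: Task 7: Find the total number of survivors at each age, identify the age band with the most survivors, and compute the survival rate for that band (survivors / total passengers)

    # Write your code here: bin the ages, then sum the survivors in each bin
    df['Age2'] = pd.cut(df['Age'], [0, 5, 15, 30, 50, 80])
    chrs = df.groupby('Age2')['Survived'].sum()
    chrs

    Age2
    (0, 5]       44
    (5, 15]      39
    (15, 30]    326
    (30, 50]    241
    (50, 80]     64
    Name: Survived, dtype: int64

Note that these values total 714 and match the number of passengers in each band rather than the number of survivors; the rates printed further down imply 31, 18, 117, 102, and 22 survivors respectively. Either way, the (15, 30] band comes out on top.

    # Write your code here
    chrs.idxmax()

    Interval(15, 30, closed='right')

    # Survival rate within each age band (survivors / passengers in that band)
    df.groupby('Age2')['Survived'].apply(lambda x: x.sum() / x.count())

    Age2
    (0, 5]      0.704545
    (5, 15]     0.461538
    (15, 30]    0.358896
    (30, 50]    0.423237
    (50, 80]    0.343750
    Name: Survived, dtype: float64

    # Total number of passengers
    df.shape[0]

    # Survivors in each age band as a share of all passengers
    df.groupby('Age2')['Survived'].apply(lambda x: x.sum() / df.shape[0])

    Age2
    (0, 5]      0.034792
    (5, 15]     0.020202
    (15, 30]    0.131313
    (30, 50]    0.114478
    (50, 80]    0.024691
    Name: Survived, dtype: float64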
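The binning approach used for this task can be checked on a frame small enough to verify by hand. The sketch below uses eight made-up passengers and the same bin edges, and collects the survivor count, the band size, and the within-band rate in one table so the three are not confused. The column name `AgeBand` and all values are illustrative; `observed=False` is passed explicitly so empty bands would still be listed.

```python
import pandas as pd

df = pd.DataFrame({
    "Age": [2, 4, 10, 20, 25, 28, 40, 60],
    "Survived": [1, 1, 0, 1, 0, 0, 1, 0],
})

# pd.cut assigns each age to a right-closed interval such as (0, 5].
df["AgeBand"] = pd.cut(df["Age"], [0, 5, 15, 30, 50, 80])

# sum = survivors, count = passengers in the band, mean = rate within the band.
by_band = df.groupby("AgeBand", observed=False)["Survived"].agg(["sum", "count", "mean"])
print(by_band)

# Band with the most survivors, and those survivors as a share of everyone.
top_band = by_band["sum"].idxmax()
share_of_all = by_band["sum"].max() / len(df)
print(top_band, share_of_all)
```

Passengers with a missing age fall outside every interval, so they are absent from the per-band counts but still included in `len(df)`; this is why the within-band rate and the share of all passengers answer different questions.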