June: Hands-on Data Analysis (task02)

Published: 2023/12/8 · 豆豆

Review: So far we have covered the pandas basics. Chapter 2 moves into the applied side of data analysis. In section 2.1 we learned data cleaning, which matters a great deal: only once the data is reasonably clean can the analysis that follows be convincing. In this section we turn to data restructuring, which still falls under data understanding (preparation).

# Time: 2021-06-16 # This post adds a few notes of my own and polishes the original material # Goal: become an 【outstanding learner】 # Summaries take effort; a like would be encouraging

Table of Contents

  • 【task 02】Data Cleaning and Feature Processing
  • Chapter 2: Data Cleaning and Feature Processing
      • 2.1 Merging Data
        • 2.1.1 Task 1: Load the four split data files
        • 2.1.2 Task 2: Use concat to merge two CSV files
        • 2.1.3 Task 3: Use concat to stack two tables vertically
        • 2.1.4 Task 4: Use join and append to complete Tasks 2 and 3
        • 2.1.5 Task 5: Use pd.merge and append to complete Tasks 2 and 3
        • 2.1.6 Task 6: Save the finished data as result.csv
      • 2.2 Looking at the Data from Another Angle
        • 2.2.1 Task 1: Convert our data into a Series

【task 02】Data Cleaning and Feature Processing

<-------- Thanks to the comment section for the corrections; the content has been updated! --------->

Chapter 2: Data Cleaning and Feature Processing

```python
import numpy as np
import pandas as pd
```

2.1 Merging Data

2.1.1 Task 1: Load the four split data files

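Load all the data in the data folder. There are four files, made by cutting the full dataset from the previous lesson along both rows and columns:

  • left_up: upper-left part
  • left_down: lower-left part
  • right_up: upper-right part
  • right_down: lower-right part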
```python
text_left_up = pd.read_csv("data/train-left-up.csv")
text_left_down = pd.read_csv("data/train-left-down.csv")
text_right_up = pd.read_csv("data/train-right-up.csv")
text_right_down = pd.read_csv("data/train-right-down.csv")
text_left_up.head()
```

|  | PassengerId | Survived | Pclass | Name |
|---|---|---|---|---|
| 0 | 1 | 0 | 3 | Braund, Mr. Owen Harris |
| 1 | 2 | 1 | 1 | Cumings, Mrs. John Bradley (Florence Briggs Th... |
| 2 | 3 | 1 | 3 | Heikkinen, Miss. Laina |
| 3 | 4 | 1 | 1 | Futrelle, Mrs. Jacques Heath (Lily May Peel) |
| 4 | 5 | 0 | 3 | Allen, Mr. William Henry |
  • 【Passenger ID】【Survived】【Passenger class】【Name】
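```python
text_left_down.head()
```

|  | PassengerId | Survived | Pclass | Name |
|---|---|---|---|---|
| 0 | 440 | 0 | 2 | Kvillner, Mr. Johan Henrik Johannesson |
| 1 | 441 | 1 | 2 | Hart, Mrs. Benjamin (Esther Ada Bloomfield) |
| 2 | 442 | 0 | 3 | Hampe, Mr. Leon |
| 3 | 443 | 0 | 3 | Petterson, Mr. Johan Emil |
| 4 | 444 | 1 | 2 | Reynaldo, Ms. Encarnacion |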
  • 【Passenger ID】【Survived】【Passenger class】【Name】
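```python
text_right_down.head()
```

|  | Sex | Age | SibSp | Parch | Ticket | Fare | Cabin | Embarked |
|---|---|---|---|---|---|---|---|---|
| 0 | male | 31.0 | 0 | 0 | C.A. 18723 | 10.500 | NaN | S |
| 1 | female | 45.0 | 1 | 1 | F.C.C. 13529 | 26.250 | NaN | S |
| 2 | male | 20.0 | 0 | 0 | 345769 | 9.500 | NaN | S |
| 3 | male | 25.0 | 1 | 0 | 347076 | 7.775 | NaN | S |
| 4 | female | 28.0 | 0 | 0 | 230434 | 13.000 | NaN | S |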
  • 【Sex】【Age】【Siblings/spouses aboard】【Parents/children aboard】【Ticket】【Fare】【Cabin】【Port of embarkation】
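```python
text_right_up.head()
```

|  | Sex | Age | SibSp | Parch | Ticket | Fare | Cabin | Embarked |
|---|---|---|---|---|---|---|---|---|
| 0 | male | 22.0 | 1 | 0 | A/5 21171 | 7.2500 | NaN | S |
| 1 | female | 38.0 | 1 | 0 | PC 17599 | 71.2833 | C85 | C |
| 2 | female | 26.0 | 0 | 0 | STON/O2. 3101282 | 7.9250 | NaN | S |
| 3 | female | 35.0 | 1 | 0 | 113803 | 53.1000 | C123 | S |
| 4 | male | 35.0 | 0 | 0 | 373450 | 8.0500 | NaN | S |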
  • 【Sex】【Age】【Siblings/spouses aboard】【Parents/children aboard】【Ticket】【Fare】【Cabin】【Port of embarkation】
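
To picture how the four files relate, here is a minimal sketch on a small hypothetical frame (full), cut after row 2 and after column 2 instead of the real split points. It slices the four quadrants with iloc and checks that gluing them back together restores the original.

```python
import pandas as pd

# Hypothetical toy stand-in for the full Titanic table (not the real data)
full = pd.DataFrame({
    "PassengerId": [1, 2, 3, 4],
    "Survived": [0, 1, 1, 1],
    "Sex": ["male", "female", "female", "female"],
    "Age": [22.0, 38.0, 26.0, 35.0],
})

row_cut, col_cut = 2, 2  # assumed split points for this toy example only
left_up = full.iloc[:row_cut, :col_cut]
right_up = full.iloc[:row_cut, col_cut:]
left_down = full.iloc[row_cut:, :col_cut]
right_down = full.iloc[row_cut:, col_cut:]

# Glue each half side by side, then stack the two halves vertically
rebuilt = pd.concat([
    pd.concat([left_up, right_up], axis=1),
    pd.concat([left_down, right_down], axis=1),
])
print(rebuilt.equals(full))  # True
```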

2.1.2 Task 2: Use concat to merge two CSV files

  • pd.concat()
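```python
import pandas as pd
# Call form with the parameters discussed below (join_axes only exists before pandas 1.0)
pd.concat(objs, axis=0, join='outer', join_axes=None, ignore_index=False,
          keys=None, levels=None, names=None, verify_integrity=False, copy=True)
```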

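Parameters

  • objs: a sequence or mapping of Series or DataFrame objects (older versions also accepted Panel). If a mapping is passed, its keys are used as the keys argument, unless keys is passed explicitly, in which case only the values for those keys are selected. Any None objects are dropped silently, unless they are all None, in which case a ValueError is raised.
  • axis: {0, 1, …}, default 0. The axis to concatenate along: 0 stacks rows, 1 places the objects side by side as columns.
  • join: {'inner', 'outer'}, default 'outer'. How to handle the indexes on the other axis: 'outer' takes the union, 'inner' the intersection.
  • ignore_index: bool, default False. If True, the index values along the concatenation axis are not used; the resulting axis is labeled 0, …, n-1. This is useful when the concatenation axis carries no meaningful index information. Index values on the other axes are still respected in the join.
  • join_axes: list of Index objects. Specific indexes to use for the other n-1 axes instead of performing inner/outer set logic. (Deprecated in pandas 0.25 and removed in 1.0.)
  • keys: sequence, default None. Builds a hierarchical index using the passed keys as the outermost level. If multiple levels are passed, it should contain tuples.
  • levels: list of sequences, default None. Specific levels (unique values) to use for constructing a MultiIndex; otherwise they are inferred from the keys.
  • names: list, default None. Names for the levels in the resulting hierarchical index.
  • verify_integrity: bool, default False. Checks whether the new concatenated axis contains duplicates. This can be very expensive relative to the actual concatenation.
  • copy: bool, default True. If False, data is not copied unnecessarily.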
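
A minimal sketch of two of these parameters on hypothetical frames that both start at index 0: ignore_index renumbers the result, while verify_integrity turns the duplicate labels that would otherwise appear into an error.

```python
import pandas as pd

# Hypothetical frames that both carry the default index 0, 1
a = pd.DataFrame({"x": [1, 2]})
b = pd.DataFrame({"x": [3, 4]})

stacked = pd.concat([a, b])                        # index 0, 1, 0, 1 (duplicates kept)
renumbered = pd.concat([a, b], ignore_index=True)  # index 0, 1, 2, 3

duplicate_detected = False
try:
    pd.concat([a, b], verify_integrity=True)       # duplicate labels on axis 0
except ValueError as err:
    duplicate_detected = True
    print("verify_integrity raised:", err)

print(list(stacked.index))     # [0, 1, 0, 1]
print(list(renumbered.index))  # [0, 1, 2, 3]
```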

【Default form】
By default, concat stacks the frames row-wise (axis=0) and aligns their columns by name.

```python
frames = [df1, df2, df3]     # a list of DataFrames
result = pd.concat(frames)   # axis defaults to 0, i.e. rows are stacked
```
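
df1, df2 and df3 are not defined in this post (they come from figures in the pandas documentation), so this sketch uses small hypothetical frames. Note that df3 has a column C instead of B: concat matches columns by name and fills the gaps with NaN.

```python
import pandas as pd

# Hypothetical small frames standing in for df1/df2/df3
df1 = pd.DataFrame({"A": ["A0", "A1"], "B": ["B0", "B1"]}, index=[0, 1])
df2 = pd.DataFrame({"A": ["A2", "A3"], "B": ["B2", "B3"]}, index=[2, 3])
df3 = pd.DataFrame({"A": ["A4", "A5"], "C": ["C4", "C5"]}, index=[4, 5])

frames = [df1, df2, df3]
result = pd.concat(frames)  # axis=0: rows stacked, columns matched by name
print(result)               # columns A, B, C; B is NaN for rows 4-5, C for rows 0-3
```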

【Use keys to record which table each row came from】

```python
result = pd.concat(frames, keys=['x', 'y', 'z'])
```
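
A minimal sketch of the keys option on three hypothetical frames: the keys become the outer level of a MultiIndex, so each source table can be pulled back out with .loc.

```python
import pandas as pd

# Hypothetical frames, each with index 0, 1
df1 = pd.DataFrame({"A": ["A0", "A1"]})
df2 = pd.DataFrame({"A": ["A2", "A3"]})
df3 = pd.DataFrame({"A": ["A4", "A5"]})

result = pd.concat([df1, df2, df3], keys=["x", "y", "z"])
# Row labels are now pairs: ('x', 0), ('x', 1), ('y', 0), ...
print(result.index.nlevels)  # 2
print(result.loc["y"])       # only the rows that came from df2
```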


【Concatenating along columns, axis=1】

```python
result = pd.concat([df1, df4], axis=1)
```

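  • By default join='outer', which takes the union of the row labels: rows that share an index label (row labels 2 and 3 in the figure) are placed side by side, and missing values are filled with NaN.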

【Column-wise concatenation with join='inner' (intersection)】

```python
result = pd.concat([df1, df4], axis=1, join='inner')
```
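
Since df1 and df4 come from figures not reproduced here, this sketch uses hypothetical stand-ins (df1 with row labels 0 to 3, df4 with 2, 3, 6 and 7) to compare the two join modes along axis=1.

```python
import pandas as pd

# Hypothetical frames: df1 has rows 0-3, df4 has rows 2, 3, 6, 7
df1 = pd.DataFrame({"A": ["A0", "A1", "A2", "A3"]}, index=[0, 1, 2, 3])
df4 = pd.DataFrame({"F": ["F2", "F3", "F6", "F7"]}, index=[2, 3, 6, 7])

outer = pd.concat([df1, df4], axis=1)                # union of row labels
inner = pd.concat([df1, df4], axis=1, join="inner")  # intersection of row labels

print(outer.index.tolist())  # [0, 1, 2, 3, 6, 7]
print(inner.index.tolist())  # [2, 3]
```

With outer, F is NaN for rows 0 and 1 and A is NaN for rows 6 and 7; inner keeps only the rows where both frames have data.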

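【join_axes】

Passing join_axes lets you specify which index the data is aligned to.

```python
# result = pd.concat([df1, df4], axis=1, join_axes=[df1.index])  # pandas < 1.0 only
result = pd.concat([df1, df4], axis=1).reindex(df1.index)  # pandas >= 1.0 equivalent
```

  • This concatenates column-wise, using df1's index as the axis and attaching df4 to it; missing values become NaN.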
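
join_axes no longer exists in pandas 1.0 and later; its deprecation notice recommends calling reindex on the result instead. A minimal sketch with hypothetical df1/df4 frames (the originals come from figures not reproduced here):

```python
import pandas as pd

# Hypothetical frames: df1 has rows 0-3, df4 has rows 2, 3, 6, 7
df1 = pd.DataFrame({"A": ["A0", "A1", "A2", "A3"]}, index=[0, 1, 2, 3])
df4 = pd.DataFrame({"F": ["F2", "F3", "F6", "F7"]}, index=[2, 3, 6, 7])

# Outer concat first, then keep exactly df1's row labels
result = pd.concat([df1, df4], axis=1).reindex(df1.index)
print(result)  # rows 0-3; F is NaN for rows 0 and 1, rows 6 and 7 are gone
```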

【Task】Merge train-left-up.csv and train-right-up.csv horizontally into one table and save it as result_up.

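```python
list_up = [text_left_up, text_right_up]
result_up = pd.concat(list_up, axis=1)
result_up
```

|  | PassengerId | Survived | Pclass | Name | Sex | Age | SibSp | Parch | Ticket | Fare | Cabin | Embarked |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 1 | 0 | 3 | Braund, Mr. Owen Harris | male | 22.0 | 1 | 0 | A/5 21171 | 7.2500 | NaN | S |
| 1 | 2 | 1 | 1 | Cumings, Mrs. John Bradley (Florence Briggs Th... | female | 38.0 | 1 | 0 | PC 17599 | 71.2833 | C85 | C |
| 2 | 3 | 1 | 3 | Heikkinen, Miss. Laina | female | 26.0 | 0 | 0 | STON/O2. 3101282 | 7.9250 | NaN | S |
| 3 | 4 | 1 | 1 | Futrelle, Mrs. Jacques Heath (Lily May Peel) | female | 35.0 | 1 | 0 | 113803 | 53.1000 | C123 | S |
| 4 | 5 | 0 | 3 | Allen, Mr. William Henry | male | 35.0 | 0 | 0 | 373450 | 8.0500 | NaN | S |
| ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
| 434 | 435 | 0 | 1 | Silvey, Mr. William Baird | male | 50.0 | 1 | 0 | 13507 | 55.9000 | E44 | S |
| 435 | 436 | 1 | 1 | Carter, Miss. Lucile Polk | female | 14.0 | 1 | 2 | 113760 | 120.0000 | B96 B98 | S |
| 436 | 437 | 0 | 3 | Ford, Miss. Doolina Margaret "Daisy" | female | 21.0 | 2 | 2 | W./C. 6608 | 34.3750 | NaN | S |
| 437 | 438 | 1 | 2 | Richards, Mrs. Sidney (Emily Hocking) | female | 24.0 | 2 | 3 | 29106 | 18.7500 | NaN | S |
| 438 | 439 | 0 | 1 | Fortune, Mr. Mark | male | 64.0 | 1 | 4 | 19950 | 263.0000 | C23 C25 C27 | S |

439 rows × 12 columns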

  • First put the tables in a list, then pass that list to concat.
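
concat accepts any sequence of frames, so the list is simply a container. As the objs description notes, a mapping also works, and its keys then act like the keys argument. A sketch with hypothetical two-row halves:

```python
import pandas as pd

# Hypothetical halves sharing the same row index 0, 1
left = pd.DataFrame({"PassengerId": [1, 2], "Name": ["Braund", "Cumings"]})
right = pd.DataFrame({"Sex": ["male", "female"], "Age": [22.0, 38.0]})

from_list = pd.concat([left, right], axis=1)
# With a dict, the dict keys become the outer column level (like keys=[...])
from_dict = pd.concat({"left": left, "right": right}, axis=1)

print(from_list.columns.tolist())  # ['PassengerId', 'Name', 'Sex', 'Age']
print(from_dict.columns.tolist())
# [('left', 'PassengerId'), ('left', 'Name'), ('right', 'Sex'), ('right', 'Age')]
```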

2.1.3 Task 3: Use concat to stack two tables vertically

Using concat, merge train-left-down and train-right-down horizontally into one table and save it as result_down. Then stack result_up (from above) and result_down vertically into result.

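```python
list_down = [text_left_down, text_right_down]
result_down = pd.concat(list_down, axis=1)
result_down
```

|  | PassengerId | Survived | Pclass | Name | Sex | Age | SibSp | Parch | Ticket | Fare | Cabin | Embarked |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 440 | 0 | 2 | Kvillner, Mr. Johan Henrik Johannesson | male | 31.0 | 0 | 0 | C.A. 18723 | 10.500 | NaN | S |
| 1 | 441 | 1 | 2 | Hart, Mrs. Benjamin (Esther Ada Bloomfield) | female | 45.0 | 1 | 1 | F.C.C. 13529 | 26.250 | NaN | S |
| 2 | 442 | 0 | 3 | Hampe, Mr. Leon | male | 20.0 | 0 | 0 | 345769 | 9.500 | NaN | S |
| 3 | 443 | 0 | 3 | Petterson, Mr. Johan Emil | male | 25.0 | 1 | 0 | 347076 | 7.775 | NaN | S |
| 4 | 444 | 1 | 2 | Reynaldo, Ms. Encarnacion | female | 28.0 | 0 | 0 | 230434 | 13.000 | NaN | S |
| ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
| 447 | 887 | 0 | 2 | Montvila, Rev. Juozas | male | 27.0 | 0 | 0 | 211536 | 13.000 | NaN | S |
| 448 | 888 | 1 | 1 | Graham, Miss. Margaret Edith | female | 19.0 | 0 | 0 | 112053 | 30.000 | B42 | S |
| 449 | 889 | 0 | 3 | Johnston, Miss. Catherine Helen "Carrie" | female | NaN | 1 | 2 | W./C. 6607 | 23.450 | NaN | S |
| 450 | 890 | 1 | 1 | Behr, Mr. Karl Howell | male | 26.0 | 0 | 0 | 111369 | 30.000 | C148 | C |
| 451 | 891 | 0 | 3 | Dooley, Mr. Patrick | male | 32.0 | 0 | 0 | 370376 | 7.750 | NaN | Q |

452 rows × 12 columns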

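```python
result = pd.concat([result_up, result_down])
result
```

|  | PassengerId | Survived | Pclass | Name | Sex | Age | SibSp | Parch | Ticket | Fare | Cabin | Embarked |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 1 | 0 | 3 | Braund, Mr. Owen Harris | male | 22.0 | 1 | 0 | A/5 21171 | 7.2500 | NaN | S |
| 1 | 2 | 1 | 1 | Cumings, Mrs. John Bradley (Florence Briggs Th... | female | 38.0 | 1 | 0 | PC 17599 | 71.2833 | C85 | C |
| 2 | 3 | 1 | 3 | Heikkinen, Miss. Laina | female | 26.0 | 0 | 0 | STON/O2. 3101282 | 7.9250 | NaN | S |
| 3 | 4 | 1 | 1 | Futrelle, Mrs. Jacques Heath (Lily May Peel) | female | 35.0 | 1 | 0 | 113803 | 53.1000 | C123 | S |
| 4 | 5 | 0 | 3 | Allen, Mr. William Henry | male | 35.0 | 0 | 0 | 373450 | 8.0500 | NaN | S |
| ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
| 447 | 887 | 0 | 2 | Montvila, Rev. Juozas | male | 27.0 | 0 | 0 | 211536 | 13.0000 | NaN | S |
| 448 | 888 | 1 | 1 | Graham, Miss. Margaret Edith | female | 19.0 | 0 | 0 | 112053 | 30.0000 | B42 | S |
| 449 | 889 | 0 | 3 | Johnston, Miss. Catherine Helen "Carrie" | female | NaN | 1 | 2 | W./C. 6607 | 23.4500 | NaN | S |
| 450 | 890 | 1 | 1 | Behr, Mr. Karl Howell | male | 26.0 | 0 | 0 | 111369 | 30.0000 | C148 | C |
| 451 | 891 | 0 | 3 | Dooley, Mr. Patrick | male | 32.0 | 0 | 0 | 370376 | 7.7500 | NaN | Q |

891 rows × 12 columns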

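```python
result.loc[1].head()
```

|  | PassengerId | Survived | Pclass | Name | Sex | Age | SibSp | Parch | Ticket | Fare | Cabin | Embarked |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 1 | 2 | 1 | 1 | Cumings, Mrs. John Bradley (Florence Briggs Th... | female | 38.0 | 1 | 0 | PC 17599 | 71.2833 | C85 | C |
| 1 | 441 | 1 | 2 | Hart, Mrs. Benjamin (Esther Ada Bloomfield) | female | 45.0 | 1 | 1 | F.C.C. 13529 | 26.2500 | NaN | S |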
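  • The tables are now stitched together, but the index (the first column) is wrong: each half keeps its own 0-based index, so labels repeat and result.loc[1] returns two rows.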

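【Fix】Use reset_index with drop

  • drop=True discards the original index and replaces it with a fresh 0, …, n-1 index.

  • drop=False also resets the index, but keeps the original index as a new column.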
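
A minimal sketch reproducing the duplicate-label problem on two hypothetical halves and comparing drop=True, drop=False and concat's ignore_index=True:

```python
import pandas as pd

# Hypothetical halves; each starts at index 0, like a freshly read CSV
up = pd.DataFrame({"PassengerId": [1, 2]})
down = pd.DataFrame({"PassengerId": [441, 442]})

stacked = pd.concat([up, down])            # index 0, 1, 0, 1
print(len(stacked.loc[1]))                 # 2: the label 1 now matches two rows

dropped = stacked.reset_index(drop=True)   # old index discarded, new index 0..3
kept = stacked.reset_index(drop=False)     # old index kept as a column named "index"
print(kept.columns.tolist())               # ['index', 'PassengerId']

# ignore_index=True renumbers during the concat itself
same = pd.concat([up, down], ignore_index=True)
print(same.equals(dropped))                # True
```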

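```python
result_1 = pd.concat([result_up, result_down]).reset_index(drop=True)
result_1
```

|  | PassengerId | Survived | Pclass | Name | Sex | Age | SibSp | Parch | Ticket | Fare | Cabin | Embarked |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 1 | 0 | 3 | Braund, Mr. Owen Harris | male | 22.0 | 1 | 0 | A/5 21171 | 7.2500 | NaN | S |
| 1 | 2 | 1 | 1 | Cumings, Mrs. John Bradley (Florence Briggs Th... | female | 38.0 | 1 | 0 | PC 17599 | 71.2833 | C85 | C |
| 2 | 3 | 1 | 3 | Heikkinen, Miss. Laina | female | 26.0 | 0 | 0 | STON/O2. 3101282 | 7.9250 | NaN | S |
| 3 | 4 | 1 | 1 | Futrelle, Mrs. Jacques Heath (Lily May Peel) | female | 35.0 | 1 | 0 | 113803 | 53.1000 | C123 | S |
| 4 | 5 | 0 | 3 | Allen, Mr. William Henry | male | 35.0 | 0 | 0 | 373450 | 8.0500 | NaN | S |
| ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
| 886 | 887 | 0 | 2 | Montvila, Rev. Juozas | male | 27.0 | 0 | 0 | 211536 | 13.0000 | NaN | S |
| 887 | 888 | 1 | 1 | Graham, Miss. Margaret Edith | female | 19.0 | 0 | 0 | 112053 | 30.0000 | B42 | S |
| 888 | 889 | 0 | 3 | Johnston, Miss. Catherine Helen "Carrie" | female | NaN | 1 | 2 | W./C. 6607 | 23.4500 | NaN | S |
| 889 | 890 | 1 | 1 | Behr, Mr. Karl Howell | male | 26.0 | 0 | 0 | 111369 | 30.0000 | C148 | C |
| 890 | 891 | 0 | 3 | Dooley, Mr. Patrick | male | 32.0 | 0 | 0 | 370376 | 7.7500 | NaN | Q |

891 rows × 12 columns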

2.1.4 Task 4: Use join and append to complete Tasks 2 and 3

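```python
result_up = text_left_up.join(text_right_up)
result_up
```

|  | PassengerId | Survived | Pclass | Name | Sex | Age | SibSp | Parch | Ticket | Fare | Cabin | Embarked |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 1 | 0 | 3 | Braund, Mr. Owen Harris | male | 22.0 | 1 | 0 | A/5 21171 | 7.2500 | NaN | S |
| 1 | 2 | 1 | 1 | Cumings, Mrs. John Bradley (Florence Briggs Th... | female | 38.0 | 1 | 0 | PC 17599 | 71.2833 | C85 | C |
| 2 | 3 | 1 | 3 | Heikkinen, Miss. Laina | female | 26.0 | 0 | 0 | STON/O2. 3101282 | 7.9250 | NaN | S |
| 3 | 4 | 1 | 1 | Futrelle, Mrs. Jacques Heath (Lily May Peel) | female | 35.0 | 1 | 0 | 113803 | 53.1000 | C123 | S |
| 4 | 5 | 0 | 3 | Allen, Mr. William Henry | male | 35.0 | 0 | 0 | 373450 | 8.0500 | NaN | S |
| ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
| 434 | 435 | 0 | 1 | Silvey, Mr. William Baird | male | 50.0 | 1 | 0 | 13507 | 55.9000 | E44 | S |
| 435 | 436 | 1 | 1 | Carter, Miss. Lucile Polk | female | 14.0 | 1 | 2 | 113760 | 120.0000 | B96 B98 | S |
| 436 | 437 | 0 | 3 | Ford, Miss. Doolina Margaret "Daisy" | female | 21.0 | 2 | 2 | W./C. 6608 | 34.3750 | NaN | S |
| 437 | 438 | 1 | 2 | Richards, Mrs. Sidney (Emily Hocking) | female | 24.0 | 2 | 3 | 29106 | 18.7500 | NaN | S |
| 438 | 439 | 0 | 1 | Fortune, Mr. Mark | male | 64.0 | 1 | 4 | 19950 | 263.0000 | C23 C25 C27 | S |

439 rows × 12 columns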
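
DataFrame.join aligns on the index and, by default, performs a left join (how='left'), keeping every row of the calling frame. In this task both halves share the index 0 to 438, so the result matches concat(axis=1). A minimal sketch with hypothetical frames whose indexes only partly overlap shows where the how options differ:

```python
import pandas as pd

# Hypothetical frames: right lacks label 2 and has an extra label 5
left = pd.DataFrame({"Name": ["a", "b", "c"]}, index=[0, 1, 2])
right = pd.DataFrame({"Age": [22.0, 38.0, 40.0]}, index=[0, 1, 5])

left_result = left.join(right)                # how='left': labels 0, 1, 2
inner_result = left.join(right, how="inner")  # labels present in both: 0, 1
outer_result = left.join(right, how="outer")  # union: 0, 1, 2, 5

print(left_result)  # Age is NaN for label 2
```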

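```python
result_down = text_left_down.join(text_right_down)
# DataFrame.append was removed in pandas 2.0; there, use pd.concat([result_up, result_down])
result = result_up.append(result_down)
result
```

|  | PassengerId | Survived | Pclass | Name | Sex | Age | SibSp | Parch | Ticket | Fare | Cabin | Embarked |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 1 | 0 | 3 | Braund, Mr. Owen Harris | male | 22.0 | 1 | 0 | A/5 21171 | 7.2500 | NaN | S |
| 1 | 2 | 1 | 1 | Cumings, Mrs. John Bradley (Florence Briggs Th... | female | 38.0 | 1 | 0 | PC 17599 | 71.2833 | C85 | C |
| 2 | 3 | 1 | 3 | Heikkinen, Miss. Laina | female | 26.0 | 0 | 0 | STON/O2. 3101282 | 7.9250 | NaN | S |
| 3 | 4 | 1 | 1 | Futrelle, Mrs. Jacques Heath (Lily May Peel) | female | 35.0 | 1 | 0 | 113803 | 53.1000 | C123 | S |
| 4 | 5 | 0 | 3 | Allen, Mr. William Henry | male | 35.0 | 0 | 0 | 373450 | 8.0500 | NaN | S |
| ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
| 447 | 887 | 0 | 2 | Montvila, Rev. Juozas | male | 27.0 | 0 | 0 | 211536 | 13.0000 | NaN | S |
| 448 | 888 | 1 | 1 | Graham, Miss. Margaret Edith | female | 19.0 | 0 | 0 | 112053 | 30.0000 | B42 | S |
| 449 | 889 | 0 | 3 | Johnston, Miss. Catherine Helen "Carrie" | female | NaN | 1 | 2 | W./C. 6607 | 23.4500 | NaN | S |
| 450 | 890 | 1 | 1 | Behr, Mr. Karl Howell | male | 26.0 | 0 | 0 | 111369 | 30.0000 | C148 | C |
| 451 | 891 | 0 | 3 | Dooley, Mr. Patrick | male | 32.0 | 0 | 0 | 370376 | 7.7500 | NaN | Q |

891 rows × 12 columns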

2.1.5 Task 5: Use pd.merge and append to complete Tasks 2 and 3

  • pd.merge()

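To join on the index, set both left_index=True and right_index=True, or set left_index=True while right_on names a key column. In short, you must specify the join key for both the left and the right table; both can be columns, both can be indexes, or you can mix the two.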
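
A minimal sketch of both key combinations described above, using two hypothetical tables: left is indexed by passenger ID, while right keeps PassengerId as an ordinary column.

```python
import pandas as pd

# Hypothetical tables: `left` is indexed by passenger ID, `right` has it as a column
left = pd.DataFrame({"Name": ["Braund", "Cumings"]}, index=[1, 2])
right = pd.DataFrame({"PassengerId": [1, 2], "Age": [22.0, 38.0]})

# 1) index on both sides: move PassengerId into right's index first
by_index = pd.merge(left, right.set_index("PassengerId"),
                    left_index=True, right_index=True)

# 2) index on the left, ordinary column on the right
mixed = pd.merge(left, right, left_index=True, right_on="PassengerId")

print(by_index)
print(mixed[["PassengerId", "Name", "Age"]])
```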

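```python
result_up = pd.merge(text_left_up, text_right_up, left_index=True, right_index=True)
result_down = pd.merge(text_left_down, text_right_down, left_index=True, right_index=True)
# DataFrame.append was removed in pandas 2.0; there, use pd.concat([result_up, result_down])
result = result_up.append(result_down)
result
```

|  | PassengerId | Survived | Pclass | Name | Sex | Age | SibSp | Parch | Ticket | Fare | Cabin | Embarked |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 1 | 0 | 3 | Braund, Mr. Owen Harris | male | 22.0 | 1 | 0 | A/5 21171 | 7.2500 | NaN | S |
| 1 | 2 | 1 | 1 | Cumings, Mrs. John Bradley (Florence Briggs Th... | female | 38.0 | 1 | 0 | PC 17599 | 71.2833 | C85 | C |
| 2 | 3 | 1 | 3 | Heikkinen, Miss. Laina | female | 26.0 | 0 | 0 | STON/O2. 3101282 | 7.9250 | NaN | S |
| 3 | 4 | 1 | 1 | Futrelle, Mrs. Jacques Heath (Lily May Peel) | female | 35.0 | 1 | 0 | 113803 | 53.1000 | C123 | S |
| 4 | 5 | 0 | 3 | Allen, Mr. William Henry | male | 35.0 | 0 | 0 | 373450 | 8.0500 | NaN | S |
| ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
| 447 | 887 | 0 | 2 | Montvila, Rev. Juozas | male | 27.0 | 0 | 0 | 211536 | 13.0000 | NaN | S |
| 448 | 888 | 1 | 1 | Graham, Miss. Margaret Edith | female | 19.0 | 0 | 0 | 112053 | 30.0000 | B42 | S |
| 449 | 889 | 0 | 3 | Johnston, Miss. Catherine Helen "Carrie" | female | NaN | 1 | 2 | W./C. 6607 | 23.4500 | NaN | S |
| 450 | 890 | 1 | 1 | Behr, Mr. Karl Howell | male | 26.0 | 0 | 0 | 111369 | 30.0000 | C148 | C |
| 451 | 891 | 0 | 3 | Dooley, Mr. Patrick | male | 32.0 | 0 | 0 | 370376 | 7.7500 | NaN | Q |

891 rows × 12 columns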

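【Thinking】Compare merge, join and concat: how do they differ, and what do they have in common? Also, why do Tasks 4 and 5 both require DataFrame's append method? If only merge or join were allowed, could Tasks 4 and 5 still be completed?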
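
One way to start on this question is to run the approaches side by side. The sketch below uses hypothetical quarter frames, each starting at index 0 as a freshly read CSV would. It checks that concat(axis=1), join and merge on both indexes give the same frame for the horizontal step, and uses pd.concat for the vertical step, which is also the usual replacement for DataFrame.append in pandas 2.0 and later. It is a starting point, not a definitive answer.

```python
import pandas as pd

# Hypothetical quarters; each starts at index 0, like a freshly read CSV
left_up = pd.DataFrame({"PassengerId": [1, 2], "Survived": [0, 1]})
right_up = pd.DataFrame({"Sex": ["male", "female"], "Age": [22.0, 38.0]})
left_down = pd.DataFrame({"PassengerId": [3, 4], "Survived": [1, 1]})
right_down = pd.DataFrame({"Sex": ["female", "female"], "Age": [26.0, 35.0]})

# Horizontal step: three interchangeable ways when both sides share an index
up_concat = pd.concat([left_up, right_up], axis=1)
up_join = left_up.join(right_up)
up_merge = pd.merge(left_up, right_up, left_index=True, right_index=True)
print(up_concat.equals(up_join) and up_concat.equals(up_merge))  # True

# Vertical step: stack the rows with concat and renumber the index
down = left_down.join(right_down)
result = pd.concat([up_join, down], ignore_index=True)
print(result.shape)  # (4, 4)
```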

2.1.6 Task 6: Save the finished data as result.csv

```python
result.to_csv('result_task02.csv')
```
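
By default to_csv also writes the index, as an unnamed first column; reading such a file back produces the Unnamed: 0 column seen in the next section. A minimal sketch of the round trip with an in-memory buffer (io.StringIO stands in for the real file) and two common ways to avoid the extra column:

```python
import io
import pandas as pd

df = pd.DataFrame({"PassengerId": [1, 2], "Sex": ["male", "female"]})

# Default: the index is written as an unnamed first column
buf = io.StringIO()
df.to_csv(buf)
buf.seek(0)
cols_default = pd.read_csv(buf).columns.tolist()

# Option 1: tell read_csv that the first column is the index
buf.seek(0)
restored = pd.read_csv(buf, index_col=0)

# Option 2: do not write the index at all
buf_no_index = io.StringIO()
df.to_csv(buf_no_index, index=False)
buf_no_index.seek(0)
cols_no_index = pd.read_csv(buf_no_index).columns.tolist()

print(cols_default)               # ['Unnamed: 0', 'PassengerId', 'Sex']
print(restored.columns.tolist())  # ['PassengerId', 'Sex']
print(cols_no_index)              # ['PassengerId', 'Sex']
```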

2.2 Looking at the Data from Another Angle

2.2.1 Task 1: Convert our data into a Series

```python
# Load the complete data
text = pd.read_csv('result_task02.csv')
text.head()
# Write your code here
unit_result = text.stack().head(30)  # keep only the first 30 entries
unit_result
```

```
0  Unnamed: 0     0
   PassengerId    1
   Survived       0
   Pclass         3
   Name           Braund, Mr. Owen Harris
   Sex            male
   Age            22.0
   SibSp          1
   Parch          0
   Ticket         A/5 21171
   Fare           7.25
   Embarked       S
1  Unnamed: 0     1
   PassengerId    2
   Survived       1
   Pclass         1
   Name           Cumings, Mrs. John Bradley (Florence Briggs Th...
   Sex            female
   Age            38.0
   SibSp          1
   Parch          0
   Ticket         PC 17599
   Fare           71.2833
   Cabin          C85
   Embarked       C
2  Unnamed: 0     2
   PassengerId    3
   Survived       1
   Pclass         3
   Name           Heikkinen, Miss. Laina
dtype: object
```

```python
# Save unit_result as unit_result.csv
unit_result.to_csv('unit_result.csv')
test = pd.read_csv('unit_result.csv')
test
```

|  | Unnamed: 0 | Unnamed: 1 | 0 |
|---|---|---|---|
| 0 | 0 | Unnamed: 0 | 0 |
| 1 | 0 | PassengerId | 1 |
| 2 | 0 | Survived | 0 |
| 3 | 0 | Pclass | 3 |
| 4 | 0 | Name | Braund, Mr. Owen Harris |
| 5 | 0 | Sex | male |
| 6 | 0 | Age | 22.0 |
| 7 | 0 | SibSp | 1 |
| 8 | 0 | Parch | 0 |
| 9 | 0 | Ticket | A/5 21171 |
| 10 | 0 | Fare | 7.25 |
| 11 | 0 | Embarked | S |
| 12 | 1 | Unnamed: 0 | 1 |
| 13 | 1 | PassengerId | 2 |
| 14 | 1 | Survived | 1 |
| 15 | 1 | Pclass | 1 |
| 16 | 1 | Name | Cumings, Mrs. John Bradley (Florence Briggs Th... |
| 17 | 1 | Sex | female |
| 18 | 1 | Age | 38.0 |
| 19 | 1 | SibSp | 1 |
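  • What does this stack function do?

As the name suggests, stack "stacks" the columns: DataFrame.stack pivots the column labels into the innermost level of the row index. The result is a Series with a two-level index (row label, column name), so each record is laid out as consecutive (column, value) pairs, one record after another. In the pandas versions used here, NaN cells are dropped by default, which is why passenger 0 has no Cabin entry in the output above.

Its signature is DataFrame.stack(level=-1, dropna=True). The form stack(arrays, axis=0), which accepts arrays and lists, belongs to numpy.stack, a different function that joins arrays along a new axis.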
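
A minimal sketch of what stack does, on a hypothetical two-row frame. It contains no NaN, so the output does not depend on the dropna default (pandas 2.1 added a future_stack option that changes how NaN is handled). Each row becomes consecutive (column, value) entries, and unstack reverses the reshape.

```python
import pandas as pd

# Hypothetical two-passenger frame
df = pd.DataFrame(
    {"Name": ["Braund", "Cumings"], "Sex": ["male", "female"]},
    index=[0, 1],
)

s = df.stack()  # Series indexed by (row label, column name)
print(s.index.tolist())
# [(0, 'Name'), (0, 'Sex'), (1, 'Name'), (1, 'Sex')]
print(s.loc[(1, "Name")])  # Cumings

restored = s.unstack()  # column names move back out of the index
print(restored)
```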
