
[Python] 20 Extremely Handy Pandas Functions and Methods

Published: 2025/3/12 python 35 豆豆


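Today I would like to share several lesser-known pandas functions. You may not come across them often, but they are very convenient to use and can noticeably improve a data analyst's productivity. I hope you find something useful here.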

items() method
iterrows() method
insert() method
assign() method
eval() method
pop() method
truncate() method
count() method
add_prefix() method / add_suffix() method
clip() method
filter() method
first() method
isin() method
df.plot.area() method
df.plot.bar() method
df.plot.box() method
df.plot.pie() method

items() method

The items() method in pandas iterates over the columns of a DataFrame and yields each one as a (column name, column contents) tuple. For example:

    import pandas as pd

    df = pd.DataFrame(
        {'species': ['bear', 'bear', 'marsupial'],
         'population': [1864, 22000, 80000]},
        index=['panda', 'polar', 'koala'])
    df

output

             species  population
    panda       bear        1864
    polar       bear       22000
    koala  marsupial       80000

Now we use the items() method:

    for label, content in df.items():
        print(f'label: {label}')
        print(f'content: {content}', sep='\n')
        print("=" * 50)

output

    label: species
    content: panda         bear
    polar         bear
    koala    marsupial
    Name: species, dtype: object
    ==================================================
    label: population
    content: panda     1864
    polar    22000
    koala    80000
    Name: population, dtype: int64
    ==================================================

The names of the two columns, 'species' and 'population', are printed in turn, each followed by the contents of that column.
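As a small illustration of where column-wise iteration is useful, the sketch below uses items() to collect a short summary of every column. The `summary` dictionary and its keys are illustrative names chosen for this example, not part of pandas.

```python
import pandas as pd

df = pd.DataFrame(
    {'species': ['bear', 'bear', 'marsupial'],
     'population': [1864, 22000, 80000]},
    index=['panda', 'polar', 'koala'])

# items() yields (column name, Series) pairs, so each column can be
# summarised with ordinary Series methods.
summary = {}
for label, content in df.items():
    summary[label] = {
        'dtype': str(content.dtype),
        'n_unique': content.nunique(),
    }

print(summary)
```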

iterrows() method

The iterrows() method, by contrast, iterates over the rows of a DataFrame. For each row it returns the row's index label together with the row's contents as a Series labelled by column name. For example:

    for label, content in df.iterrows():
        print(f'label: {label}')
        print(f'content: {content}', sep='\n')
        print("=" * 50)

output

    label: panda
    content: species       bear
    population    1864
    Name: panda, dtype: object
    ==================================================
    label: polar
    content: species        bear
    population    22000
    Name: polar, dtype: object
    ==================================================
    label: koala
    content: species       marsupial
    population        80000
    Name: koala, dtype: object
    ==================================================
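One detail worth knowing, visible in the `dtype: object` lines above, is that iterrows() packs each row into a single Series, so the values in a row are converted to a common dtype. The sketch below shows this on a small all-numeric frame (the column names `n` and `ratio` are made up for the example) and contrasts it with itertuples(), which keeps each value's own type.

```python
import pandas as pd

df = pd.DataFrame({'n': [1, 2], 'ratio': [1.5, 2.5]})

# iterrows() builds one Series per row, so the integer column 'n'
# is upcast to float to share a dtype with 'ratio'.
_, first_row = next(df.iterrows())
print(first_row.dtype)

# itertuples() yields namedtuples and keeps the integer as an integer.
first_tuple = next(df.itertuples())
print(first_tuple.n, first_tuple.ratio)
```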

insert() method

The insert() method inserts a new column at a specific position in a DataFrame. It modifies the DataFrame in place. For example:

    df.insert(1, "size", [2000, 3000, 4000])
    df

output

             species  size  population
    panda       bear  2000        1864
    polar       bear  3000       22000
    koala  marsupial  4000       80000

As the result shows, column positions in a DataFrame are also counted from 0.
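Two behaviours of insert() are easy to trip over: it returns None because it changes the DataFrame in place, and it refuses to insert a column name that already exists unless allow_duplicates=True is passed. A minimal sketch (the variable names `result` and `duplicate_rejected` are only for illustration):

```python
import pandas as pd

df = pd.DataFrame({'species': ['bear', 'bear', 'marsupial'],
                   'population': [1864, 22000, 80000]},
                  index=['panda', 'polar', 'koala'])

# insert() works in place and returns None.
result = df.insert(1, 'size', [2000, 3000, 4000])
print(result)
print(list(df.columns))

# Inserting a column name that already exists raises ValueError
# unless allow_duplicates=True is passed.
try:
    df.insert(0, 'size', [1, 2, 3])
    duplicate_rejected = False
except ValueError:
    duplicate_rejected = True
print(duplicate_rejected)
```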

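assign() method

The assign() method adds new columns to a DataFrame. It returns a new DataFrame and leaves the original untouched. The outputs in this section are shown for the original two-column df, that is, without the 'size' column inserted in the previous section. For example:

    df.assign(size_1=lambda x: x.population * 9 / 5 + 32)

output

             species  population    size_1
    panda       bear        1864    3387.2
    polar       bear       22000   39632.0
    koala  marsupial       80000  144032.0

In this example a lambda function computes a new column named 'size_1'. assign() can also create more than one column at a time. The result is assigned back to df here, because the eval() examples that follow rely on the size_1 and size_2 columns:

    df = df.assign(
        size_1=lambda x: x.population * 9 / 5 + 32,
        size_2=lambda x: x.population * 8 / 5 + 10)
    df

output

             species  population    size_1    size_2
    panda       bear        1864    3387.2    2992.4
    polar       bear       22000   39632.0   35210.0
    koala  marsupial       80000  144032.0  128010.0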
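The sketch below illustrates two properties of assign() on a reduced version of the data: the original DataFrame is not modified, and keyword arguments are evaluated in order, so a later column can be derived from one created earlier in the same call. The doubling formula for size_2 is invented for this example.

```python
import pandas as pd

df = pd.DataFrame({'population': [1864, 22000, 80000]},
                  index=['panda', 'polar', 'koala'])

# Keyword arguments are evaluated in order, so 'size_2' can be
# computed from the 'size_1' column created just before it.
result = df.assign(
    size_1=lambda x: x['population'] * 9 / 5 + 32,
    size_2=lambda x: x['size_1'] * 2,
)

print(list(df.columns))      # the original frame is unchanged
print(list(result.columns))
```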

eval() method

The eval() method evaluates an operation that is written as a string. The examples assume that df already holds the size_1 and size_2 columns, i.e. that the result of the assign() call above was assigned back to df. For example:

    df.eval("size_3 = size_1 + size_2")

output

             species  population    size_1    size_2    size_3
    panda       bear        1864    3387.2    2992.4    6379.6
    polar       bear       22000   39632.0   35210.0   74842.0
    koala  marsupial       80000  144032.0  128010.0  272042.0

Several operations can also be evaluated in one call by passing a multi-line string:

    df = df.eval('''
    size_3 = size_1 + size_2
    size_4 = size_1 - size_2
    ''')
    df

output

             species  population    size_1    size_2    size_3   size_4
    panda       bear        1864    3387.2    2992.4    6379.6    394.8
    polar       bear       22000   39632.0   35210.0   74842.0   4422.0
    koala  marsupial       80000  144032.0  128010.0  272042.0  16022.0
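As a further sketch of how eval() strings are interpreted: an ordinary Python variable can be referenced with an `@` prefix, and an expression without an assignment simply returns the computed Series. The small frame and the `factor` variable are made up for the example.

```python
import pandas as pd

df = pd.DataFrame({'size_1': [1.0, 2.0, 3.0],
                   'size_2': [10.0, 20.0, 30.0]})

factor = 2  # an ordinary Python variable, referenced as @factor below

# Without inplace=True, eval() returns a new DataFrame and leaves df as is.
scaled = df.eval("size_3 = (size_1 + size_2) * @factor")

# An expression without an assignment returns the computed Series.
total = df.eval("size_1 + size_2")

print(scaled)
print(total)
```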

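pop() method

The pop() method removes a given column from a DataFrame and returns it as a Series:

    df.pop("size_3")

output

    panda      6379.6
    polar     74842.0
    koala    272042.0
    Name: size_3, dtype: float64

After this call, the original DataFrame no longer contains the 'size_3' column.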
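Because pop() both removes the column and hands it back, it is a compact way to move a column, for instance to the end of the frame. A minimal sketch with made-up column names:

```python
import pandas as pd

df = pd.DataFrame({'target': [0, 1, 1], 'a': [1, 2, 3], 'b': [4, 5, 6]})

# pop() removes 'target' and returns it; assigning it back appends it
# as the last column.
df['target'] = df.pop('target')
print(list(df.columns))

# Popping a column that does not exist raises KeyError.
try:
    df.pop('missing')
    missing_raises = False
except KeyError:
    missing_raises = True
print(missing_raises)
```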

truncate() method

The truncate() method selects rows based on their index labels. For example:

    df = pd.DataFrame(
        {'A': ['a', 'b', 'c', 'd', 'e'],
         'B': ['f', 'g', 'h', 'i', 'j'],
         'C': ['k', 'l', 'm', 'n', 'o']},
        index=[1, 2, 3, 4, 5])
    df

output

       A  B  C
    1  a  f  k
    2  b  g  l
    3  c  h  m
    4  d  i  n
    5  e  j  o

Let's try the truncate() method:

    df.truncate(before=2, after=4)

output

       A  B  C
    2  b  g  l
    3  c  h  m
    4  d  i  n

The before and after parameters drop every row that comes before index label 2 and every row that comes after index label 4. Both bounds are inclusive, so rows 2 and 4 themselves are kept.
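The same method can also cut by column labels, and it expects the index to be sorted. The following sketch shows both points on the data from above; `by_column`, `shuffled` and `unsorted_raises` are names chosen for the example.

```python
import pandas as pd

df = pd.DataFrame({'A': ['a', 'b', 'c', 'd', 'e'],
                   'B': ['f', 'g', 'h', 'i', 'j'],
                   'C': ['k', 'l', 'm', 'n', 'o']},
                  index=[1, 2, 3, 4, 5])

# With axis='columns', before/after refer to column labels instead.
by_column = df.truncate(before='A', after='B', axis='columns')
print(by_column)

# truncate() needs a sorted index; an unsorted one raises ValueError.
shuffled = df.loc[[3, 1, 5, 2, 4]]
try:
    shuffled.truncate(before=2, after=4)
    unsorted_raises = False
except ValueError:
    unsorted_raises = True
print(unsorted_raises)
```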

count() method

The count() method counts the non-null values in each column. For example:

    import numpy as np

    df = pd.DataFrame(
        {"Name": ["John", "Myla", "Lewis", "John", "John"],
         "Age": [24., np.nan, 25, 33, 26],
         "Single": [True, True, np.nan, True, False]})
    df

output

        Name   Age Single
    0   John  24.0   True
    1   Myla   NaN   True
    2  Lewis  25.0    NaN
    3   John  33.0   True
    4   John  26.0  False

We use count() to get the number of non-null values per column:

    df.count()

output

    Name      5
    Age       4
    Single    4
    dtype: int64
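As a short sketch of two related uses, count() can also work row by row with axis=1, and subtracting it from the number of rows gives the number of missing values per column (the same result as isna().sum()). The names `per_row` and `missing` are illustrative.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({"Name": ["John", "Myla", "Lewis", "John", "John"],
                   "Age": [24., np.nan, 25, 33, 26],
                   "Single": [True, True, np.nan, True, False]})

per_row = df.count(axis=1)       # non-null values in each row
missing = len(df) - df.count()   # missing values in each column

print(per_row)
print(missing)
```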

add_prefix() method / add_suffix() method

The add_prefix() and add_suffix() methods add a prefix and a suffix, respectively, to labels. For a Series the prefix or suffix is added to the row index labels, while for a DataFrame it is added to the column labels. For example:

    s = pd.Series([1, 2, 3, 4])
    s

output

    0    1
    1    2
    2    3
    3    4
    dtype: int64

We apply add_prefix() and add_suffix() to the Series:

    s.add_prefix('row_')

output

    row_0    1
    row_1    2
    row_2    3
    row_3    4
    dtype: int64

And likewise:

    s.add_suffix('_row')

output

    0_row    1
    1_row    2
    2_row    3
    3_row    4
    dtype: int64

For a DataFrame, add_prefix() and add_suffix() add the prefix or suffix to the column labels:

    df = pd.DataFrame({'A': [1, 2, 3, 4], 'B': [3, 4, 5, 6]})
    df

output

       A  B
    0  1  3
    1  2  4
    2  3  5
    3  4  6

For example:

    df.add_prefix("column_")

output

       column_A  column_B
    0         1         3
    1         2         4
    2         3         5
    3         4         6

And likewise:

    df.add_suffix("_column")

output

       A_column  B_column
    0         1         3
    1         2         4
    2         3         5
    3         4         6
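One place where these methods come in handy is keeping columns apart when two frames with identical column names are placed side by side. The sketch below uses made-up monthly figures; `jan`, `feb` and `combined` are illustrative names.

```python
import pandas as pd

jan = pd.DataFrame({'sales': [30, 20], 'visits': [20, 42]},
                   index=['north', 'south'])
feb = pd.DataFrame({'sales': [38, 95], 'visits': [28, 62]},
                   index=['north', 'south'])

# add_suffix() returns a renamed copy, so jan and feb keep their
# original column names.
combined = pd.concat([jan.add_suffix('_jan'), feb.add_suffix('_feb')], axis=1)
print(combined)
```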

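clip() method

The clip() method limits the values in a DataFrame to a given range: any value that falls outside a threshold is replaced by that threshold.

    data = {'col_0': [9, -3, 0, -1, 5], 'col_1': [-2, -7, 6, 8, -5]}
    df = pd.DataFrame(data)
    df

output

       col_0  col_1
    0      9     -2
    1     -3     -7
    2      0      6
    3     -1      8
    4      5     -5

Now we clip the values to the range from -4 to 4:

    df.clip(lower=-4, upper=4)

output

       col_0  col_1
    0      4     -2
    1     -3     -4
    2      0      4
    3     -1      4
    4      4     -4

The lower and upper parameters are the lower and upper thresholds respectively. Values below lower are replaced by lower, and values above upper are replaced by upper.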
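The thresholds do not have to be given as a pair of scalars. As a sketch, the example below clips on one side only and then uses a Series to give every row its own range; the Series `t` is an arbitrary choice for illustration.

```python
import pandas as pd

df = pd.DataFrame({'col_0': [9, -3, 0, -1, 5],
                   'col_1': [-2, -7, 6, 8, -5]})

# Clip on one side only: negative values become 0, the rest is unchanged.
floor = df.clip(lower=0)

# Per-row thresholds: row i is clipped to the range [t[i], t[i] + 4].
t = pd.Series([2, -4, -1, 6, 3])
per_row = df.clip(lower=t, upper=t + 4, axis=0)

print(floor)
print(per_row)
```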

filter() method

The filter() method in pandas selects rows or columns by their labels (it does not filter on the values in the data). For example:

    df = pd.DataFrame(
        np.array(([1, 2, 3], [4, 5, 6], [7, 8, 9], [10, 11, 12])),
        index=['A', 'B', 'C', 'D'],
        columns=['one', 'two', 'three'])
    df

output

       one  two  three
    A    1    2      3
    B    4    5      6
    C    7    8      9
    D   10   11     12

We use filter() to select columns by name:

    df.filter(items=['one', 'three'])

output

       one  three
    A    1      3
    B    4      6
    C    7      9
    D   10     12

We can also select with a regular expression, here the columns whose names end in 'e':

    df.filter(regex='e$', axis=1)

output

       one  three
    A    1      3
    B    4      6
    C    7      9
    D   10     12

The axis parameter controls whether the selection is applied to the row labels or to the column labels:

    df.filter(like='B', axis=0)

output

       one  two  three
    B    4    5      6
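To make the label-versus-value distinction concrete, the sketch below puts filter() next to ordinary boolean indexing, and also shows that labels listed in items but absent from the frame are silently skipped. The label 'zzz' is a deliberately non-existent column used for illustration.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame(np.array([[1, 2, 3], [4, 5, 6], [7, 8, 9], [10, 11, 12]]),
                  index=['A', 'B', 'C', 'D'],
                  columns=['one', 'two', 'three'])

# filter() looks only at labels: keep columns whose name contains 't'.
by_label = df.filter(like='t')

# Selecting by the values in the data is done with boolean indexing.
by_value = df[df['one'] > 4]

# Labels in items that do not exist are ignored rather than raising.
only_existing = df.filter(items=['one', 'zzz'])

print(by_label)
print(by_value)
print(only_existing)
```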

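first() method

When the row index of a DataFrame consists of dates, first() selects the initial rows that fall within a given time offset from the first index value. Note that first() has been deprecated since pandas 2.1, so recent versions emit a warning for this call.

    index_1 = pd.date_range('2021-11-11', periods=5, freq='2D')
    ts = pd.DataFrame({'A': [1, 2, 3, 4, 5]}, index=index_1)
    ts

output

                A
    2021-11-11  1
    2021-11-13  2
    2021-11-15  3
    2021-11-17  4
    2021-11-19  5

We use first() to select the data from the first 3 days:

    ts.first('3D')

output

                A
    2021-11-11  1
    2021-11-13  2

Only two rows are returned because the offset is measured in calendar days starting from 2021-11-11, not in rows: 2021-11-15 already lies outside the 3-day window.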
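The same selection can be written without first(), which may be useful on pandas versions where that method is deprecated or unavailable. This is a sketch of one possible equivalent, not the only one; `cutoff` and `first_three_days` are names chosen for the example.

```python
import pandas as pd

index_1 = pd.date_range('2021-11-11', periods=5, freq='2D')
ts = pd.DataFrame({'A': [1, 2, 3, 4, 5]}, index=index_1)

# Keep the rows whose timestamp lies less than 3 days after the first one.
cutoff = ts.index[0] + pd.Timedelta(days=3)
first_three_days = ts[ts.index < cutoff]

print(first_three_days)
```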

isin() method

The isin() method checks whether each value in a DataFrame is contained in a given list:

    df = pd.DataFrame(
        np.array(([1, 2, 3], [4, 5, 6], [7, 8, 9], [10, 11, 12])),
        index=['A', 'B', 'C', 'D'],
        columns=['one', 'two', 'three'])
    df.isin([3, 5, 12])

output

         one    two  three
    A  False  False   True
    B  False   True  False
    C  False  False  False
    D  False  False   True

Where a value is contained in the list, that is, where it is one of 3, 5 or 12, the result is True; otherwise it is False.
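In practice the boolean result is mostly used as a mask. The sketch below filters rows with isin() on a single column, and also passes a dictionary so that each column is checked against its own list of values; the chosen values are arbitrary.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame(np.array([[1, 2, 3], [4, 5, 6], [7, 8, 9], [10, 11, 12]]),
                  index=['A', 'B', 'C', 'D'],
                  columns=['one', 'two', 'three'])

# Use the boolean Series as a row mask: keep rows whose 'one' is 1 or 7.
rows = df[df['one'].isin([1, 7])]

# A dict checks each column against its own values; columns that are
# not mentioned ('three' here) come out as all False.
per_column = df.isin({'one': [1], 'two': [5]})

print(rows)
print(per_column)
```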

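df.plot.area() method

Next, let's look at how to draw charts in pandas with a single line of code (plotting relies on matplotlib being installed). Here all columns are drawn together as an area chart:

    df = pd.DataFrame({
        'sales': [30, 20, 38, 95, 106, 65],
        'signups': [7, 9, 6, 12, 18, 13],
        'visits': [20, 42, 28, 62, 81, 50],
    }, index=pd.date_range(start='2021/01/01', end='2021/07/01', freq='M'))  # pandas 2.2+ prefers freq='ME'
    ax = df.plot.area(figsize=(10, 5))

output: an area chart with the three columns stacked (figure omitted)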

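df.plot.bar() method

Now let's see how to draw a bar chart with a single line of code:

    df = pd.DataFrame({'label': ['A', 'B', 'C', 'D'], 'values': [10, 30, 50, 70]})
    ax = df.plot.bar(x='label', y='values', rot=20)

output: a bar chart with one bar per label (figure omitted)

We can also draw grouped bars, one group per index label and one bar per column:

    age = [0.1, 17.5, 40, 48, 52, 69, 88]
    weight = [2, 8, 70, 1.5, 25, 12, 28]
    index = ['A', 'B', 'C', 'D', 'E', 'F', 'G']
    df = pd.DataFrame({'age': age, 'weight': weight}, index=index)
    ax = df.plot.bar(rot=0)

output: a grouped bar chart of age and weight (figure omitted)

The same chart can be drawn horizontally:

    ax = df.plot.barh(rot=0)

output: a horizontal grouped bar chart of age and weight (figure omitted)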

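df.plot.box() method

A box plot can likewise be drawn with a single line of pandas code:

    data = np.random.randn(25, 3)
    df = pd.DataFrame(data, columns=list('ABC'))
    ax = df.plot.box()

output: a box plot with one box each for columns A, B and C (figure omitted)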

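df.plot.pie() method

Next comes the pie chart:

    df = pd.DataFrame(
        {'mass': [1.33, 4.87, 5.97],
         'radius': [2439.7, 6051.8, 6378.1]},
        index=['Mercury', 'Venus', 'Earth'])
    plot = df.plot.pie(y='mass', figsize=(8, 8))

output: a pie chart of the mass column with one slice per planet (figure omitted)

Besides these, there are also line charts, histograms, scatter plots and more. They work in much the same way as the examples above, so feel free to try them out yourself.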
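For readers who want a starting point, here is a minimal sketch of three of those other chart types, reusing the age and weight figures from the bar chart example. It assumes matplotlib is installed and selects the non-interactive "Agg" backend so that it also runs without a display; the bins and alpha settings are arbitrary choices.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend, so no window is needed

import pandas as pd

df = pd.DataFrame({'age': [0.1, 17.5, 40, 48, 52, 69, 88],
                   'weight': [2, 8, 70, 1.5, 25, 12, 28]})

ax_line = df.plot.line()                            # one line per column
ax_hist = df.plot.hist(bins=5, alpha=0.5)           # overlaid histograms
ax_scatter = df.plot.scatter(x='age', y='weight')   # one point per row
```

Each call returns a matplotlib Axes object, which can be customised further or saved with `ax_line.figure.savefig(...)`.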
