日韩性视频-久久久蜜桃-www中文字幕-在线中文字幕av-亚洲欧美一区二区三区四区-撸久久-香蕉视频一区-久久无码精品丰满人妻-国产高潮av-激情福利社-日韩av网址大全-国产精品久久999-日本五十路在线-性欧美在线-久久99精品波多结衣一区-男女午夜免费视频-黑人极品ⅴideos精品欧美棵-人人妻人人澡人人爽精品欧美一区-日韩一区在线看-欧美a级在线免费观看

歡迎訪問 生活随笔!

生活随笔

當前位置: 首頁 > 编程资源 > 编程问答 >内容正文

编程问答

整理了90个Pandas案例,强烈建议收藏!

發布時間:2023/12/19 编程问答 36 豆豆
生活随笔 收集整理的這篇文章主要介紹了 整理了90个Pandas案例,强烈建议收藏! 小編覺得挺不錯的,現在分享給大家,幫大家做個參考.

給大家分享90個Pandas案例。

文章很長,高低要忍一下,如果忍不了,那就收藏吧,總會用到的

  • 如何使用列表和字典創建 Series

    • 使用列表創建 Series

    • 使用 name 參數創建 Series

    • 使用簡寫的列表創建 Series

    • 使用字典創建 Series

  • 如何使用 Numpy 函數創建 Series

  • 如何獲取 Series 的索引和值

  • 如何在創建 Series 時指定索引

  • 如何獲取 Series 的大小和形狀

  • 如何獲取 Series 開始或末尾幾行數據

    • Head()

    • Tail()

    • Take()

  • 使用切片獲取 Series 子集

  • 如何創建 DataFrame

  • 如何設置 DataFrame 的索引和列信息

  • 如何重命名 DataFrame 的列名稱

  • 如何根據 Pandas 列中的值從 DataFrame 中選擇或過濾行

  • 在 DataFrame 中使用“isin”過濾多行

  • 迭代 DataFrame 的行和列

  • 如何通過名稱或索引刪除 DataFrame 的列

  • 向 DataFrame 中新增列

  • 如何從 DataFrame 中獲取列標題列表

  • 如何隨機生成 DataFrame

  • 如何選擇 DataFrame 的多個列

  • 如何將字典轉換為 DataFrame

  • 使用 ioc 進行切片

  • 檢查 DataFrame 中是否是空的

  • 在創建 DataFrame 時指定索引和列名稱

  • 使用 iloc 進行切片

  • iloc 和 loc 的區別

  • 使用時間索引創建空 DataFrame

  • 如何改變 DataFrame 列的排序

  • 檢查 DataFrame 列的數據類型

  • 更改 DataFrame 指定列的數據類型

  • 如何將列的數據類型轉換為 DateTime 類型

  • 將 DataFrame 列從 floats 轉為 ints

  • 如何把 dates 列轉換為 DateTime 類型

  • 兩個 DataFrame 相加

  • 在 DataFrame 末尾添加額外的行

  • 為指定索引添加新行

  • 如何使用 for 循環添加行

  • 在 DataFrame 頂部添加一行

  • 如何向 DataFrame 中動態添加行

  • 在任意位置插入行

  • 使用時間戳索引向 DataFrame 中添加行

  • 為不同的行填充缺失值

  • append, concat 和 combine_first 示例

  • 獲取行和列的平均值

  • 計算行和列的總和

  • 連接兩列

  • 過濾包含某字符串的行

  • 過濾索引中包含某字符串的行

  • 使用 AND 運算符過濾包含特定字符串值的行

  • 查找包含某字符串的所有行

  • 如果行中的值包含字符串,則創建與字符串相等的另一列

  • 計算 pandas group 中每組的行數

  • 檢查字符串是否在 DataFrme 中

  • 從 DataFrame 列中獲取唯一行值

  • 計算 DataFrame 列的不同值

  • 刪除具有重復索引的行

  • 刪除某些列具有重復值的行

  • 從 DataFrame 單元格中獲取值

  • 使用 DataFrame 中的條件索引獲取單元格上的標量值

  • 設置 DataFrame 的特定單元格值

  • 從 DataFrame 行獲取單元格值

  • 用字典替換 DataFrame 列中的值

  • 統計基于某一列的一列的數值

  • 處理 DataFrame 中的缺失值

  • 刪除包含任何缺失數據的行

  • 刪除 DataFrame 中缺失數據的列

  • 按降序對索引值進行排序

  • 按降序對列進行排序

  • 使用 rank 方法查找 DataFrame 中元素的排名

  • 在多列上設置索引

  • 確定 DataFrame 的周期索引和列

  • 導入 CSV 指定特定索引

  • 將 DataFrame 寫入 csv

  • 使用 Pandas 讀取 csv 文件的特定列

  • Pandas 獲取 CSV 列的列表

  • 找到列值最大的行

  • 使用查詢方法進行復雜條件選擇

  • 檢查 Pandas 中是否存在列

  • 為特定列從 DataFrame 中查找 n-smallest 和 n-largest 值

  • 從 DataFrame 中查找所有列的最小值和最大值

  • 在 DataFrame 中找到最小值和最大值所在的索引位置

  • 計算 DataFrame Columns 的累積乘積和累積總和

  • 匯總統計

  • 查找 DataFrame 的均值、中值和眾數

  • 測量 DataFrame 列的方差和標準偏差

  • 計算 DataFrame 列之間的協方差

  • 計算 Pandas 中兩個 DataFrame 對象之間的相關性

  • 計算 DataFrame 列的每個單元格的百分比變化

  • 在 Pandas 中向前和向后填充 DataFrame 列的缺失值

  • 在 Pandas 中使用非分層索引使用 Stacking

  • 使用分層索引對 Pandas 進行拆分

  • Pandas 獲取 HTML 頁面上 table 數據

1如何使用列表和字典創建 Series

使用列表創建 Series

import?pandas?as?pdser1?=?pd.Series([1.5,?2.5,?3,?4.5,?5.0,?6]) print(ser1)

Output:

0 1.5 1 2.5 2 3.0 3 4.5 4 5.0 5 6.0 dtype: float64

使用 name 參數創建 Series

import?pandas?as?pdser2?=?pd.Series(["India",?"Canada",?"Germany"],?name="Countries") print(ser2)

Output:

0 India 1 Canada 2 Germany Name: Countries, dtype: object

使用簡寫的列表創建 Series

import?pandas?as?pdser3?=?pd.Series(["A"]*4) print(ser3)

Output:

0 A 1 A 2 A 3 A dtype: object

使用字典創建 Series

import?pandas?as?pdser4?=?pd.Series({"India":?"New?Delhi","Japan":?"Tokyo","UK":?"London"}) print(ser4)

Output:

India New Delhi Japan Tokyo UK London dtype: object

2如何使用 Numpy 函數創建 Series

import?pandas?as?pd import?numpy?as?npser1?=?pd.Series(np.linspace(1,?10,?5)) print(ser1)ser2?=?pd.Series(np.random.normal(size=5)) print(ser2)

Output:

0 1.00 1 3.25 2 5.50 3 7.75 4 10.00 dtype: float64 0 -1.694452 1 -1.570006 2 1.713794 3 0.338292 4 0.803511 dtype: float64

3如何獲取 Series 的索引和值

import?pandas?as?pd import?numpy?as?npser1?=?pd.Series({"India":?"New?Delhi","Japan":?"Tokyo","UK":?"London"})print(ser1.values) print(ser1.index)print("\n")ser2?=?pd.Series(np.random.normal(size=5)) print(ser2.index) print(ser2.values)

Output:

['New Delhi' 'Tokyo' 'London'] Index(['India', 'Japan', 'UK'], dtype='object')RangeIndex(start=0, stop=5, step=1) [ 0.66265478 -0.72222211 0.3608642 1.40955436 1.3096732 ]

4如何在創建 Series 時指定索引

import?pandas?as?pdvalues?=?["India",?"Canada",?"Australia","Japan",?"Germany",?"France"]code?=?["IND",?"CAN",?"AUS",?"JAP",?"GER",?"FRA"]ser1?=?pd.Series(values,?index=code)print(ser1)

Output:

IND India CAN Canada AUS Australia JAP Japan GER Germany FRA France dtype: object

5如何獲取 Series 的大小和形狀

import?pandas?as?pdvalues?=?["India",?"Canada",?"Australia","Japan",?"Germany",?"France"]code?=?["IND",?"CAN",?"AUS",?"JAP",?"GER",?"FRA"]ser1?=?pd.Series(values,?index=code)print(len(ser1))print(ser1.shape)print(ser1.size)

Output:

6 (6,) 6

6如何獲取 Series 開始或末尾幾行數據

Head()

import?pandas?as?pdvalues?=?["India",?"Canada",?"Australia","Japan",?"Germany",?"France"]code?=?["IND",?"CAN",?"AUS",?"JAP",?"GER",?"FRA"]ser1?=?pd.Series(values,?index=code)print("-----Head()-----") print(ser1.head())print("\n\n-----Head(2)-----") print(ser1.head(2))

Output:

-----Head()----- IND India CAN Canada AUS Australia JAP Japan GER Germany dtype: object-----Head(2)----- IND India CAN Canada dtype: object

Tail()

import?pandas?as?pdvalues?=?["India",?"Canada",?"Australia","Japan",?"Germany",?"France"]code?=?["IND",?"CAN",?"AUS",?"JAP",?"GER",?"FRA"]ser1?=?pd.Series(values,?index=code)print("-----Tail()-----") print(ser1.tail())print("\n\n-----Tail(2)-----") print(ser1.tail(2))

Output:

-----Tail()----- CAN Canada AUS Australia JAP Japan GER Germany FRA France dtype: object-----Tail(2)----- GER Germany FRA France dtype: object

Take()

import?pandas?as?pdvalues?=?["India",?"Canada",?"Australia","Japan",?"Germany",?"France"]code?=?["IND",?"CAN",?"AUS",?"JAP",?"GER",?"FRA"]ser1?=?pd.Series(values,?index=code)print("-----Take()-----") print(ser1.take([2,?4,?5]))

Output:

-----Take()----- AUS Australia GER Germany FRA France dtype: object

7使用切片獲取 Series 子集

import pandas as pdnum = [000, 100, 200, 300, 400, 500, 600, 700, 800, 900]idx = ['A', 'B', 'C', 'D', 'E', 'F', 'G', 'H', 'I', 'J']series = pd.Series(num, index=idx)print("\n [2:2] \n") print(series[2:4])print("\n [1:6:2] \n") print(series[1:6:2])print("\n [:6] \n") print(series[:6])print("\n [4:] \n") print(series[4:])print("\n [:4:2] \n") print(series[:4:2])print("\n [4::2] \n") print(series[4::2])print("\n [::-1] \n") print(series[::-1])

Output

[2:2]C 200 D 300 dtype: int64[1:6:2]B 100 D 300 F 500 dtype: int64[:6]A 0 B 100 C 200 D 300 E 400 F 500 dtype: int64[4:]E 400 F 500 G 600 H 700 I 800 J 900 dtype: int64[:4:2]A 0 C 200 dtype: int64[4::2]E 400 G 600 I 800 dtype: int64[::-1]J 900 I 800 H 700 G 600 F 500 E 400 D 300 C 200 B 100 A 0 dtype: int64

8如何創建 DataFrame

import pandas as pdemployees = pd.DataFrame({'EmpCode': ['Emp001', 'Emp00'],'Name': ['John Doe', 'William Spark'],'Occupation': ['Chemist', 'Statistician'],'Date Of Join': ['2018-01-25', '2018-01-26'],'Age': [23, 24]})print(employees)

Output:

Age Date Of Join EmpCode Name Occupation 0 23 2018-01-25 Emp001 John Doe Chemist 1 24 2018-01-26 Emp00 William Spark Statistician

9如何設置 DataFrame 的索引和列信息

import?pandas?as?pdemployees?=?pd.DataFrame(data={'Name':?['John?Doe',?'William?Spark'],'Occupation':?['Chemist',?'Statistician'],'Date?Of?Join':?['2018-01-25',?'2018-01-26'],'Age':?[23,?24]},index=['Emp001',?'Emp002'],columns=['Name',?'Occupation',?'Date?Of?Join',?'Age'])print(employees)

Output

Name Occupation Date Of Join Age Emp001 John Doe Chemist 2018-01-25 23 Emp002 William Spark Statistician 2018-01-26 24

10如何重命名 DataFrame 的列名稱

import?pandas?as?pdemployees?=?pd.DataFrame({'EmpCode':?['Emp001',?'Emp00'],'Name':?['John?Doe',?'William?Spark'],'Occupation':?['Chemist',?'Statistician'],'Date?Of?Join':?['2018-01-25',?'2018-01-26'],'Age':?[23,?24]})employees.columns?=?['EmpCode',?'EmpName',?'EmpOccupation',?'EmpDOJ',?'EmpAge']print(employees)

Output:

EmpCode EmpName EmpOccupation EmpDOJ EmpAge 0 23 2018-01-25 Emp001 John Doe Chemist 1 24 2018-01-26 Emp00 William Spark Statistician

11如何根據 Pandas 列中的值從 DataFrame 中選擇或過濾行

import?pandas?as?pdemployees?=?pd.DataFrame({'EmpCode':?['Emp001',?'Emp002',?'Emp003',?'Emp004',?'Emp005'],'Name':?['John',?'Doe',?'William',?'Spark',?'Mark'],'Occupation':?['Chemist',?'Statistician',?'Statistician','Statistician',?'Programmer'],'Date?Of?Join':?['2018-01-25',?'2018-01-26',?'2018-01-26',?'2018-02-26','2018-03-16'],'Age':?[23,?24,?34,?29,?40]})print("\nUse?==?operator\n") print(employees.loc[employees['Age']?==?23])print("\nUse?<?operator\n") print(employees.loc[employees['Age']?<?30])print("\nUse?!=?operator\n") print(employees.loc[employees['Occupation']?!=?'Statistician'])print("\nMultiple?Conditions\n") print(employees.loc[(employees['Occupation']?!=?'Statistician')?&(employees['Name']?==?'John')])

Output:

Use == operatorAge Date Of Join EmpCode Name Occupation 0 23 2018-01-25 Emp001 John ChemistUse < operatorAge Date Of Join EmpCode Name Occupation 0 23 2018-01-25 Emp001 John Chemist 1 24 2018-01-26 Emp002 Doe Statistician 3 29 2018-02-26 Emp004 Spark StatisticianUse != operatorAge Date Of Join EmpCode Name Occupation 0 23 2018-01-25 Emp001 John Chemist 4 40 2018-03-16 Emp005 Mark ProgrammerMultiple ConditionsAge Date Of Join EmpCode Name Occupation 0 23 2018-01-25 Emp001 John Chemist

12在 DataFrame 中使用“isin”過濾多行

import?pandas?as?pdemployees?=?pd.DataFrame({'EmpCode':?['Emp001',?'Emp002',?'Emp003',?'Emp004',?'Emp005'],'Name':?['John',?'Doe',?'William',?'Spark',?'Mark'],'Occupation':?['Chemist',?'Statistician',?'Statistician','Statistician',?'Programmer'],'Date?Of?Join':?['2018-01-25',?'2018-01-26',?'2018-01-26',?'2018-02-26','2018-03-16'],'Age':?[23,?24,?34,?29,?40]})print("\nUse?isin?operator\n") print(employees.loc[employees['Occupation'].isin(['Chemist','Programmer'])])print("\nMultiple?Conditions\n") print(employees.loc[(employees['Occupation']?==?'Chemist')?|(employees['Name']?==?'John')?&(employees['Age']?<?30)])

Output:

Use isin operatorAge Date Of Join EmpCode Name Occupation 0 23 2018-01-25 Emp001 John Chemist 4 40 2018-03-16 Emp005 Mark ProgrammerMultiple ConditionsAge Date Of Join EmpCode Name Occupation 0 23 2018-01-25 Emp001 John Chemist

13迭代 DataFrame 的行和列

import?pandas?as?pdemployees?=?pd.DataFrame({'EmpCode':?['Emp001',?'Emp002',?'Emp003',?'Emp004',?'Emp005'],'Name':?['John',?'Doe',?'William',?'Spark',?'Mark'],'Occupation':?['Chemist',?'Statistician',?'Statistician','Statistician',?'Programmer'],'Date?Of?Join':?['2018-01-25',?'2018-01-26',?'2018-01-26',?'2018-02-26','2018-03-16'],'Age':?[23,?24,?34,?29,?40]})print("\n?Example?iterrows?\n") for?index,?col?in?employees.iterrows():print(col['Name'],?"--",?col['Age'])print("\n?Example?itertuples?\n") for?row?in?employees.itertuples(index=True,?name='Pandas'):print(getattr(row,?"Name"),?"--",?getattr(row,?"Age"))

Output:

Example iterrowsJohn -- 23 Doe -- 24 William -- 34 Spark -- 29 Mark -- 40Example itertuplesJohn -- 23 Doe -- 24 William -- 34 Spark -- 29 Mark -- 40

14如何通過名稱或索引刪除 DataFrame 的列

import?pandas?as?pdemployees?=?pd.DataFrame({'EmpCode':?['Emp001',?'Emp002',?'Emp003',?'Emp004',?'Emp005'],'Name':?['John',?'Doe',?'William',?'Spark',?'Mark'],'Occupation':?['Chemist',?'Statistician',?'Statistician','Statistician',?'Programmer'],'Date?Of?Join':?['2018-01-25',?'2018-01-26',?'2018-01-26',?'2018-02-26','2018-03-16'],'Age':?[23,?24,?34,?29,?40]})print(employees)print("\n?Drop?Column?by?Name?\n") employees.drop('Age',?axis=1,?inplace=True) print(employees)print("\n?Drop?Column?by?Index?\n") employees.drop(employees.columns[[0,1]],?axis=1,?inplace=True) print(employees)

Output:

Age Date Of Join EmpCode Name Occupation 0 23 2018-01-25 Emp001 John Chemist 1 24 2018-01-26 Emp002 Doe Statistician 2 34 2018-01-26 Emp003 William Statistician 3 29 2018-02-26 Emp004 Spark Statistician 4 40 2018-03-16 Emp005 Mark ProgrammerDrop Column by NameDate Of Join EmpCode Name Occupation 0 2018-01-25 Emp001 John Chemist 1 2018-01-26 Emp002 Doe Statistician 2 2018-01-26 Emp003 William Statistician 3 2018-02-26 Emp004 Spark Statistician 4 2018-03-16 Emp005 Mark ProgrammerDrop Column by IndexName Occupation 0 John Chemist 1 Doe Statistician 2 William Statistician 3 Spark Statistician 4 Mark Programmer

15向 DataFrame 中新增列

import?pandas?as?pdemployees?=?pd.DataFrame({'EmpCode':?['Emp001',?'Emp002',?'Emp003',?'Emp004',?'Emp005'],'Name':?['John',?'Doe',?'William',?'Spark',?'Mark'],'Occupation':?['Chemist',?'Statistician',?'Statistician','Statistician',?'Programmer'],'Date?Of?Join':?['2018-01-25',?'2018-01-26',?'2018-01-26',?'2018-02-26','2018-03-16'],'Age':?[23,?24,?34,?29,?40]})employees['City']?=?['London',?'Tokyo',?'Sydney',?'London',?'Toronto']print(employees)

Output:

Age Date Of Join EmpCode Name Occupation City 0 23 2018-01-25 Emp001 John Chemist London 1 24 2018-01-26 Emp002 Doe Statistician Tokyo 2 34 2018-01-26 Emp003 William Statistician Sydney 3 29 2018-02-26 Emp004 Spark Statistician London 4 40 2018-03-16 Emp005 Mark Programmer Toronto

16如何從 DataFrame 中獲取列標題列表

import?pandas?as?pdemployees?=?pd.DataFrame({'EmpCode':?['Emp001',?'Emp002',?'Emp003',?'Emp004',?'Emp005'],'Name':?['John',?'Doe',?'William',?'Spark',?'Mark'],'Occupation':?['Chemist',?'Statistician',?'Statistician','Statistician',?'Programmer'],'Date?Of?Join':?['2018-01-25',?'2018-01-26',?'2018-01-26',?'2018-02-26','2018-03-16'],'Age':?[23,?24,?34,?29,?40]})print(list(employees))print(list(employees.columns.values))print(employees.columns.tolist())

Output:

['Age', 'Date Of Join', 'EmpCode', 'Name', 'Occupation'] ['Age', 'Date Of Join', 'EmpCode', 'Name', 'Occupation'] ['Age', 'Date Of Join', 'EmpCode', 'Name', 'Occupation']

17如何隨機生成 DataFrame

import?pandas?as?pd import?numpy?as?npnp.random.seed(5)df_random?=?pd.DataFrame(np.random.randint(100,?size=(10,?6)),columns=list('ABCDEF'),index=['Row-{}'.format(i)?for?i?in?range(10)])print(df_random)

Output:

A B C D E F Row-0 99 78 61 16 73 8 Row-1 62 27 30 80 7 76 Row-2 15 53 80 27 44 77 Row-3 75 65 47 30 84 86 Row-4 18 9 41 62 1 82 Row-5 16 78 5 58 0 80 Row-6 4 36 51 27 31 2 Row-7 68 38 83 19 18 7 Row-8 30 62 11 67 65 55 Row-9 3 91 78 27 29 33

18如何選擇 DataFrame 的多個列

import?pandas?as?pdemployees?=?pd.DataFrame({'EmpCode':?['Emp001',?'Emp002',?'Emp003',?'Emp004',?'Emp005'],'Name':?['John',?'Doe',?'William',?'Spark',?'Mark'],'Occupation':?['Chemist',?'Statistician',?'Statistician','Statistician',?'Programmer'],'Date?Of?Join':?['2018-01-25',?'2018-01-26',?'2018-01-26',?'2018-02-26','2018-03-16'],'Age':?[23,?24,?34,?29,?40]})df?=?employees[['EmpCode',?'Age',?'Name']] print(df)

Output:

EmpCode Age Name 0 Emp001 23 John 1 Emp002 24 Doe 2 Emp003 34 William 3 Emp004 29 Spark 4 Emp005 40 Mark

19如何將字典轉換為 DataFrame

import?pandas?as?pddata?=?({'Age':?[30,?20,?22,?40,?32,?28,?39],'Color':?['Blue',?'Green',?'Red',?'White',?'Gray',?'Black','Red'],'Food':?['Steak',?'Lamb',?'Mango',?'Apple',?'Cheese','Melon',?'Beans'],'Height':?[165,?70,?120,?80,?180,?172,?150],'Score':?[4.6,?8.3,?9.0,?3.3,?1.8,?9.5,?2.2],'State':?['NY',?'TX',?'FL',?'AL',?'AK',?'TX',?'TX']}) print(data)df?=?pd.DataFrame(data)print(df)

Output:

{'Height': [165, 70, 120, 80, 180, 172, 150], 'Food': ['Steak', 'Lamb', 'Mango','Apple', 'Cheese', 'Melon', 'Beans'], 'Age': [30, 20, 22, 40, 32, 28, 39], 'Sco re': [4.6, 8.3, 9.0, 3.3, 1.8, 9.5, 2.2], 'Color': ['Blue', 'Green', 'Red', 'Whi te', 'Gray', 'Black', 'Red'], 'State': ['NY', 'TX', 'FL', 'AL', 'AK', 'TX', 'TX' ]}Age Color Food Height Score State 0 30 Blue Steak 165 4.6 NY 1 20 Green Lamb 70 8.3 TX 2 22 Red Mango 120 9.0 FL 3 40 White Apple 80 3.3 AL 4 32 Gray Cheese 180 1.8 AK 5 28 Black Melon 172 9.5 TX 6 39 Red Beans 150 2.2 TX

20使用 ioc 進行切片

import?pandas?as?pddf?=?pd.DataFrame({'Age':?[30,?20,?22,?40,?32,?28,?39],'Color':?['Blue',?'Green',?'Red',?'White',?'Gray',?'Black','Red'],'Food':?['Steak',?'Lamb',?'Mango',?'Apple',?'Cheese','Melon',?'Beans'],'Height':?[165,?70,?120,?80,?180,?172,?150],'Score':?[4.6,?8.3,?9.0,?3.3,?1.8,?9.5,?2.2],'State':?['NY',?'TX',?'FL',?'AL',?'AK',?'TX',?'TX']},index=['Jane',?'Nick',?'Aaron',?'Penelope',?'Dean','Christina',?'Cornelia'])print("\n?--?Selecting?a?single?row?with?.loc?with?a?string?--?\n") print(df.loc['Penelope'])print("\n?--?Selecting?multiple?rows?with?.loc?with?a?list?of?strings?--?\n") print(df.loc[['Cornelia',?'Jane',?'Dean']])print("\n?--?Selecting?multiple?rows?with?.loc?with?slice?notation?--?\n") print(df.loc['Aaron':'Dean'])

Output:

-- Selecting a single row with .loc with a string --Age 40 Color White Food Apple Height 80 Score 3.3 State AL Name: Penelope, dtype: object-- Selecting multiple rows with .loc with a list of strings --Age Color Food Height Score State Cornelia 39 Red Beans 150 2.2 TX Jane 30 Blue Steak 165 4.6 NY Dean 32 Gray Cheese 180 1.8 AK-- Selecting multiple rows with .loc with slice notation --Age Color Food Height Score State Aaron 22 Red Mango 120 9.0 FL Penelope 40 White Apple 80 3.3 AL Dean 32 Gray Cheese 180 1.8 AK

21檢查 DataFrame 中是否是空的

import?pandas?as?pddf?=?pd.DataFrame()if?df.empty:print('DataFrame?is?empty!')

Output:

DataFrame is empty!

22在創建 DataFrame 時指定索引和列名稱

import?pandas?as?pdvalues?=?["India",?"Canada",?"Australia","Japan",?"Germany",?"France"]code?=?["IND",?"CAN",?"AUS",?"JAP",?"GER",?"FRA"]df?=?pd.DataFrame(values,?index=code,?columns=['Country'])print(df)

Output:

Country IND India CAN Canada AUS Australia JAP Japan GER Germany FRA France

23使用 iloc 進行切片

import?pandas?as?pddf?=?pd.DataFrame({'Age':?[30,?20,?22,?40,?32,?28,?39],'Color':?['Blue',?'Green',?'Red',?'White',?'Gray',?'Black','Red'],'Food':?['Steak',?'Lamb',?'Mango',?'Apple',?'Cheese','Melon',?'Beans'],'Height':?[165,?70,?120,?80,?180,?172,?150],'Score':?[4.6,?8.3,?9.0,?3.3,?1.8,?9.5,?2.2],'State':?['NY',?'TX',?'FL',?'AL',?'AK',?'TX',?'TX']},index=['Jane',?'Nick',?'Aaron',?'Penelope',?'Dean','Christina',?'Cornelia'])print("\n?--?Selecting?a?single?row?with?.iloc?with?an?integer?--?\n") print(df.iloc[4])print("\n?--?Selecting?multiple?rows?with?.iloc?with?a?list?of?integers?--?\n") print(df.iloc[[2,?-2]])print("\n?--?Selecting?multiple?rows?with?.iloc?with?slice?notation?--?\n") print(df.iloc[:5:3])

Output:

-- Selecting a single row with .iloc with an integer --Age 32 Color Gray Food Cheese Height 180 Score 1.8 State AK Name: Dean, dtype: object-- Selecting multiple rows with .iloc with a list of integers --Age Color Food Height Score State Aaron 22 Red Mango 120 9.0 FL Christina 28 Black Melon 172 9.5 TX-- Selecting multiple rows with .iloc with slice notation --Age Color Food Height Score State Jane 30 Blue Steak 165 4.6 NY Penelope 40 White Apple 80 3.3 AL

24iloc 和 loc 的區別

  • loc 索引器還可以進行布爾選擇,例如,如果我們想查找 Age 小于 30 的所有行并僅返回 Color 和 Height 列,我們可以執行以下操作。我們可以用 iloc 復制它,但我們不能將它傳遞給一個布爾系列,必須將布爾系列轉換為 numpy 數組

  • loc 從索引中獲取具有特定標簽的行(或列)

  • iloc 在索引中的特定位置獲取行(或列)(因此它只需要整數)

import?pandas?as?pddf?=?pd.DataFrame({'Age':?[30,?20,?22,?40,?32,?28,?39],'Color':?['Blue',?'Green',?'Red',?'White',?'Gray',?'Black','Red'],'Food':?['Steak',?'Lamb',?'Mango',?'Apple',?'Cheese','Melon',?'Beans'],'Height':?[165,?70,?120,?80,?180,?172,?150],'Score':?[4.6,?8.3,?9.0,?3.3,?1.8,?9.5,?2.2],'State':?['NY',?'TX',?'FL',?'AL',?'AK',?'TX',?'TX']},index=['Jane',?'Nick',?'Aaron',?'Penelope',?'Dean','Christina',?'Cornelia'])print("\n?--?loc?--?\n") print(df.loc[df['Age']?<?30,?['Color',?'Height']])print("\n?--?iloc?--?\n") print(df.iloc[(df['Age']?<?30).values,?[1,?3]])

Output:

-- loc --Color Height Nick Green 70 Aaron Red 120 Christina Black 172-- iloc --Color Height Nick Green 70 Aaron Red 120 Christina Black 172

25使用時間索引創建空 DataFrame

import?datetime import?pandas?as?pdtodays_date?=?datetime.datetime.now().date() index?=?pd.date_range(todays_date,?periods=10,?freq='D')columns?=?['A',?'B',?'C']df?=?pd.DataFrame(index=index,?columns=columns) df?=?df.fillna(0)print(df)

Output:

A B C 2018-09-30 0 0 0 2018-10-01 0 0 0 2018-10-02 0 0 0 2018-10-03 0 0 0 2018-10-04 0 0 0 2018-10-05 0 0 0 2018-10-06 0 0 0 2018-10-07 0 0 0 2018-10-08 0 0 0 2018-10-09 0 0 0

26如何改變 DataFrame 列的排序

import?pandas?as?pddf?=?pd.DataFrame({'Age':?[30,?20,?22,?40,?32,?28,?39],'Color':?['Blue',?'Green',?'Red',?'White',?'Gray',?'Black','Red'],'Food':?['Steak',?'Lamb',?'Mango',?'Apple',?'Cheese','Melon',?'Beans'],'Height':?[165,?70,?120,?80,?180,?172,?150],'Score':?[4.6,?8.3,?9.0,?3.3,?1.8,?9.5,?2.2],'State':?['NY',?'TX',?'FL',?'AL',?'AK',?'TX',?'TX']},index=['Jane',?'Nick',?'Aaron',?'Penelope',?'Dean','Christina',?'Cornelia'])print("\n?--?Change?order?using?columns?--?\n") new_order?=?[3,?2,?1,?4,?5,?0] df?=?df[df.columns[new_order]] print(df)print("\n?--?Change?order?using?reindex?--?\n") df?=?df.reindex(['State',?'Color',?'Age',?'Food',?'Score',?'Height'],?axis=1) print(df)

Output:

-- Change order using columns --Height Food Color Score State Age Jane 165 Steak Blue 4.6 NY 30 Nick 70 Lamb Green 8.3 TX 20 Aaron 120 Mango Red 9.0 FL 22 Penelope 80 Apple White 3.3 AL 40 Dean 180 Cheese Gray 1.8 AK 32 Christina 172 Melon Black 9.5 TX 28 Cornelia 150 Beans Red 2.2 TX 39-- Change order using reindex --State Color Age Food Score Height Jane NY Blue 30 Steak 4.6 165 Nick TX Green 20 Lamb 8.3 70 Aaron FL Red 22 Mango 9.0 120 Penelope AL White 40 Apple 3.3 80 Dean AK Gray 32 Cheese 1.8 180 Christina TX Black 28 Melon 9.5 172 Cornelia TX Red 39 Beans 2.2 150

27檢查 DataFrame 列的數據類型

import?pandas?as?pddf?=?pd.DataFrame({'Age':?[30,?20,?22,?40,?32,?28,?39],'Color':?['Blue',?'Green',?'Red',?'White',?'Gray',?'Black','Red'],'Food':?['Steak',?'Lamb',?'Mango',?'Apple',?'Cheese','Melon',?'Beans'],'Height':?[165,?70,?120,?80,?180,?172,?150],'Score':?[4.6,?8.3,?9.0,?3.3,?1.8,?9.5,?2.2],'State':?['NY',?'TX',?'FL',?'AL',?'AK',?'TX',?'TX']},index=['Jane',?'Nick',?'Aaron',?'Penelope',?'Dean','Christina',?'Cornelia'])print(df.dtypes)

Output:

Age int64 Color object Food object Height int64 Score float64 State object dtype: object

28更改 DataFrame 指定列的數據類型

import?pandas?as?pddf?=?pd.DataFrame({'Age':?[30,?20,?22,?40,?32,?28,?39],'Color':?['Blue',?'Green',?'Red',?'White',?'Gray',?'Black','Red'],'Food':?['Steak',?'Lamb',?'Mango',?'Apple',?'Cheese','Melon',?'Beans'],'Height':?[165,?70,?120,?80,?180,?172,?150],'Score':?[4.6,?8.3,?9.0,?3.3,?1.8,?9.5,?2.2],'State':?['NY',?'TX',?'FL',?'AL',?'AK',?'TX',?'TX']},index=['Jane',?'Nick',?'Aaron',?'Penelope',?'Dean','Christina',?'Cornelia'])print(df.dtypes)df['Age']?=?df['Age'].astype(str)print(df.dtypes)

Output:

Age int64 Color object Food object Height int64 Score float64 State object dtype: object Age object Color object Food object Height int64 Score float64 State object dtype: object

29如何將列的數據類型轉換為 DateTime 類型

import?pandas?as?pddf?=?pd.DataFrame({'DateOFBirth':?[1349720105,?1349806505,?1349892905,1349979305,?1350065705,?1349792905,1349730105],'State':?['NY',?'TX',?'FL',?'AL',?'AK',?'TX',?'TX']},index=['Jane',?'Nick',?'Aaron',?'Penelope',?'Dean','Christina',?'Cornelia'])print("\n----------------Before---------------\n") print(df.dtypes) print(df)df['DateOFBirth']?=?pd.to_datetime(df['DateOFBirth'],?unit='s')print("\n----------------After----------------\n") print(df.dtypes) print(df)

Output:

----------------Before---------------DateOFBirth int64 State object dtype: objectDateOFBirth State Jane 1349720105 NY Nick 1349806505 TX Aaron 1349892905 FL Penelope 1349979305 AL Dean 1350065705 AK Christina 1349792905 TX Cornelia 1349730105 TX----------------After----------------DateOFBirth datetime64[ns] State object dtype: objectDateOFBirth State Jane 2012-10-08 18:15:05 NY Nick 2012-10-09 18:15:05 TX Aaron 2012-10-10 18:15:05 FL Penelope 2012-10-11 18:15:05 AL Dean 2012-10-12 18:15:05 AK Christina 2012-10-09 14:28:25 TX Cornelia 2012-10-08 21:01:45 TX

30將 DataFrame 列從 floats 轉為 ints

import pandas as pddf = pd.DataFrame({'DailyExp': [75.7, 56.69, 55.69, 96.5, 84.9, 110.5,58.9],'State': ['NY', 'TX', 'FL', 'AL', 'AK', 'TX', 'TX']},index=['Jane', 'Nick', 'Aaron', 'Penelope', 'Dean','Christina', 'Cornelia'])print("\n----------------Before---------------\n") print(df.dtypes) print(df)df['DailyExp'] = df['DailyExp'].astype(int)print("\n----------------After----------------\n") print(df.dtypes) print(df)

Output:

----------------Before---------------DailyExp float64 State object dtype: objectDailyExp State Jane 75.70 NY Nick 56.69 TX Aaron 55.69 FL Penelope 96.50 AL Dean 84.90 AK Christina 110.50 TX Cornelia 58.90 TX----------------After----------------DailyExp int32 State object dtype: objectDailyExp State Jane 75 NY Nick 56 TX Aaron 55 FL Penelope 96 AL Dean 84 AK Christina 110 TX Cornelia 58 TX

31如何把 dates 列轉換為 DateTime 類型

import?pandas?as?pddf?=?pd.DataFrame({'DateOfBirth':?['1986-11-11',?'1999-05-12',?'1976-01-01','1986-06-01',?'1983-06-04',?'1990-03-07','1999-07-09'],???????????????????'State':?['NY',?'TX',?'FL',?'AL',?'AK',?'TX',?'TX']},index=['Jane',?'Nick',?'Aaron',?'Penelope',?'Dean','Christina',?'Cornelia'])print("\n----------------Before---------------\n") print(df.dtypes)df['DateOfBirth']?=?df['DateOfBirth'].astype('datetime64')print("\n----------------After----------------\n") print(df.dtypes)

Output:

----------------Before---------------DateOfBirth object State object dtype: object----------------After----------------DateOfBirth datetime64[ns] State object dtype: object

32兩個 DataFrame 相加

import?pandas?as?pddf1?=?pd.DataFrame({'Age':?[30,?20,?22,?40],?'Height':?[165,?70,?120,?80],'Score':?[4.6,?8.3,?9.0,?3.3],?'State':?['NY',?'TX','FL',?'AL']},index=['Jane',?'Nick',?'Aaron',?'Penelope'])df2?=?pd.DataFrame({'Age':?[32,?28,?39],?'Color':?['Gray',?'Black',?'Red'],'Food':?['Cheese',?'Melon',?'Beans'],'Score':?[1.8,?9.5,?2.2],?'State':?['AK',?'TX',?'TX']},index=['Dean',?'Christina',?'Cornelia'])df3?=?df1.append(df2,?sort=True)print(df3)

Output:

Age Color Food Height Score State Jane 30 NaN NaN 165.0 4.6 NY Nick 20 NaN NaN 70.0 8.3 TX Aaron 22 NaN NaN 120.0 9.0 FL Penelope 40 NaN NaN 80.0 3.3 AL Dean 32 Gray Cheese NaN 1.8 AK Christina 28 Black Melon NaN 9.5 TX Cornelia 39 Red Beans NaN 2.2 TX

33在 DataFrame 末尾添加額外的行

import?pandas?as?pdemployees?=?pd.DataFrame({'EmpCode':?['Emp001',?'Emp002',?'Emp003',?'Emp004',?'Emp005'],'Name':?['John',?'Doe',?'William',?'Spark',?'Mark'],'Occupation':?['Chemist',?'Statistician',?'Statistician','Statistician',?'Programmer'],'Date?Of?Join':?['2018-01-25',?'2018-01-26',?'2018-01-26',?'2018-02-26','2018-03-16'],'Age':?[23,?24,?34,?29,?40]})print("\n------------?BEFORE?----------------\n") print(employees)employees.loc[len(employees)]?=?[45,?'2018-01-25',?'Emp006',?'Sunny','Programmer']print("\n------------?AFTER?----------------\n") print(employees)

Output:

------------ BEFORE ----------------Age Date Of Join EmpCode Name Occupation 0 23 2018-01-25 Emp001 John Chemist 1 24 2018-01-26 Emp002 Doe Statistician 2 34 2018-01-26 Emp003 William Statistician 3 29 2018-02-26 Emp004 Spark Statistician 4 40 2018-03-16 Emp005 Mark Programmer------------ AFTER ----------------Age Date Of Join EmpCode Name Occupation 0 23 2018-01-25 Emp001 John Chemist 1 24 2018-01-26 Emp002 Doe Statistician 2 34 2018-01-26 Emp003 William Statistician 3 29 2018-02-26 Emp004 Spark Statistician 4 40 2018-03-16 Emp005 Mark Programmer 5 45 2018-01-25 Emp006 Sunny Programmer

34為指定索引添加新行

import?pandas?as?pdemployees?=?pd.DataFrame(data={'Name':?['John?Doe',?'William?Spark'],'Occupation':?['Chemist',?'Statistician'],'Date?Of?Join':?['2018-01-25',?'2018-01-26'],'Age':?[23,?24]},index=['Emp001',?'Emp002'],columns=['Name',?'Occupation',?'Date?Of?Join',?'Age'])print("\n------------?BEFORE?----------------\n") print(employees)employees.loc['Emp003']?=?['Sunny',?'Programmer',?'2018-01-25',?45]print("\n------------?AFTER?----------------\n") print(employees)

Output:

------------ BEFORE ----------------Name Occupation Date Of Join Age Emp001 John Doe Chemist 2018-01-25 23 Emp002 William Spark Statistician 2018-01-26 24------------ AFTER ----------------Name Occupation Date Of Join Age Emp001 John Doe Chemist 2018-01-25 23 Emp002 William Spark Statistician 2018-01-26 24 Emp003 Sunny Programmer 2018-01-25 45

35如何使用 for 循環添加行

import?pandas?as?pdcols?=?['Zip'] lst?=?[] zip?=?32100for?a?in?range(10):lst.append([zip])zip?=?zip?+?1df?=?pd.DataFrame(lst,?columns=cols)print(df)

Output:

Zip 0 32100 1 32101 2 32102 3 32103 4 32104 5 32105 6 32106 7 32107 8 32108 9 32109

36在 DataFrame 頂部添加一行

import?pandas?as?pdemployees?=?pd.DataFrame({'EmpCode':?['Emp002',?'Emp003',?'Emp004'],'Name':?['John',?'Doe',?'William'],'Occupation':?['Chemist',?'Statistician',?'Statistician'],'Date?Of?Join':?['2018-01-25',?'2018-01-26',?'2018-01-26'],'Age':?[23,?24,?34]})print("\n------------?BEFORE?----------------\n") print(employees)#?New?line line?=?pd.DataFrame({'Name':?'Dean',?'Age':?45,?'EmpCode':?'Emp001','Date?Of?Join':?'2018-02-26',?'Occupation':?'Chemist'},?index=[0])#?Concatenate?two?dataframe employees?=?pd.concat([line,employees.ix[:]]).reset_index(drop=True)print("\n------------?AFTER?----------------\n") print(employees)

Output:

------------ BEFORE ----------------Age Date Of Join EmpCode Name Occupation 0 23 2018-01-25 Emp002 John Chemist 1 24 2018-01-26 Emp003 Doe Statistician 2 34 2018-01-26 Emp004 William Statistician------------ AFTER ----------------Age Date Of Join EmpCode Name Occupation 0 45 2018-02-26 Emp001 Dean Chemist 1 23 2018-01-25 Emp002 John Chemist 2 24 2018-01-26 Emp003 Doe Statistician 3 34 2018-01-26 Emp004 William Statistician

37如何向 DataFrame 中動態添加行

import?pandas?as?pddf?=?pd.DataFrame(columns=['Name',?'Age'])df.loc[1,?'Name']?=?'Rocky' df.loc[1,?'Age']?=?23df.loc[2,?'Name']?=?'Sunny'print(df)

Output:

Name Age 1 Rocky 23 2 Sunny NaN

38在任意位置插入行

import?pandas?as?pddf?=?pd.DataFrame(columns=['Name',?'Age'])df.loc[1,?'Name']?=?'Rocky' df.loc[1,?'Age']?=?21df.loc[2,?'Name']?=?'Sunny' df.loc[2,?'Age']?=?22df.loc[3,?'Name']?=?'Mark' df.loc[3,?'Age']?=?25df.loc[4,?'Name']?=?'Taylor' df.loc[4,?'Age']?=?28print("\n------------?BEFORE?----------------\n") print(df)line?=?pd.DataFrame({"Name":?"Jack",?"Age":?24},?index=[2.5]) df?=?df.append(line,?ignore_index=False) df?=?df.sort_index().reset_index(drop=True)df?=?df.reindex(['Name',?'Age'],?axis=1) print("\n------------?AFTER?----------------\n") print(df)

Output:

------------ BEFORE ----------------Name Age 1 Rocky 21 2 Sunny 22 3 Mark 25 4 Taylor 28------------ AFTER ----------------Name Age 0 Rocky 21 1 Sunny 22 2 Jack 24 3 Mark 25 4 Taylor 28

39使用時間戳索引向 DataFrame 中添加行

import?pandas?as?pddf?=?pd.DataFrame(columns=['Name',?'Age'])df.loc['2014-05-01?18:47:05',?'Name']?=?'Rocky' df.loc['2014-05-01?18:47:05',?'Age']?=?21df.loc['2014-05-02?18:47:05',?'Name']?=?'Sunny' df.loc['2014-05-02?18:47:05',?'Age']?=?22df.loc['2014-05-03?18:47:05',?'Name']?=?'Mark' df.loc['2014-05-03?18:47:05',?'Age']?=?25print("\n------------?BEFORE?----------------\n") print(df)line?=?pd.to_datetime("2014-05-01?18:50:05",?format="%Y-%m-%d?%H:%M:%S") new_row?=?pd.DataFrame([['Bunny',?26]],?columns=['Name',?'Age'],?index=[line]) df?=?pd.concat([df,?pd.DataFrame(new_row)],?ignore_index=False)print("\n------------?AFTER?----------------\n") print(df)

Output:

------------ BEFORE ----------------Name Age 2014-05-01 18:47:05 Rocky 21 2014-05-02 18:47:05 Sunny 22 2014-05-03 18:47:05 Mark 25------------ AFTER ----------------Name Age 2014-05-01 18:47:05 Rocky 21 2014-05-02 18:47:05 Sunny 22 2014-05-03 18:47:05 Mark 25 2014-05-01 18:50:05 Bunny 26

40為不同的行填充缺失值

import?pandas?as?pda?=?{'A':?10,?'B':?20} b?=?{'B':?30,?'C':?40,?'D':?50}df1?=?pd.DataFrame(a,?index=[0]) df2?=?pd.DataFrame(b,?index=[1])df?=?pd.DataFrame() df?=?df.append(df1) df?=?df.append(df2).fillna(0)print(df)

Output:

A B C D 0 10.0 20 0.0 0.0 1 0.0 30 40.0 50.0

41append, concat 和 combine_first 示例

import?pandas?as?pda?=?{'A':?10,?'B':?20} b?=?{'B':?30,?'C':?40,?'D':?50}df1?=?pd.DataFrame(a,?index=[0]) df2?=?pd.DataFrame(b,?index=[1])d1?=?pd.DataFrame() d1?=?d1.append(df1) d1?=?d1.append(df2).fillna(0) print("\n------------?append?----------------\n") print(d1)d2?=?pd.concat([df1,?df2]).fillna(0) print("\n------------?concat?----------------\n") print(d2)d3?=?pd.DataFrame() d3?=?d3.combine_first(df1).combine_first(df2).fillna(0) print("\n------------?combine_first?----------------\n") print(d3)

Output:

------------ append ----------------A B C D 0 10.0 20 0.0 0.0 1 0.0 30 40.0 50.0------------ concat ----------------A B C D 0 10.0 20 0.0 0.0 1 0.0 30 40.0 50.0------------ combine_first ----------------A B C D 0 10.0 20.0 0.0 0.0 1 0.0 30.0 40.0 50.0

42獲取行和列的平均值

import?pandas?as?pddf?=?pd.DataFrame([[10,?20,?30,?40],?[7,?14,?21,?28],?[5,?5,?0,?0]],columns=['Apple',?'Orange',?'Banana',?'Pear'],index=['Basket1',?'Basket2',?'Basket3'])df['Mean?Basket']?=?df.mean(axis=1) df.loc['Mean?Fruit']?=?df.mean()print(df)

Output:

Apple Orange Banana Pear Mean Basket Basket1 10.000000 20.0 30.0 40.000000 25.0 Basket2 7.000000 14.0 21.0 28.000000 17.5 Basket3 5.000000 5.0 0.0 0.000000 2.5 Mean Fruit 7.333333 13.0 17.0 22.666667 15.0

43計算行和列的總和

import?pandas?as?pddf?=?pd.DataFrame([[10,?20,?30,?40],?[7,?14,?21,?28],?[5,?5,?0,?0]],columns=['Apple',?'Orange',?'Banana',?'Pear'],index=['Basket1',?'Basket2',?'Basket3'])df['Sum?Basket']?=?df.sum(axis=1) df.loc['Sum?Fruit']?=?df.sum()print(df)

Output:

Apple Orange Banana Pear Sum Basket Basket1 10 20 30 40 100 Basket2 7 14 21 28 70 Basket3 5 5 0 0 10 Sum Fruit 22 39 51 68 180

44連接兩列

import?pandas?as?pddf?=?pd.DataFrame(columns=['Name',?'Age'])df.loc[1,?'Name']?=?'Rocky' df.loc[1,?'Age']?=?21df.loc[2,?'Name']?=?'Sunny' df.loc[2,?'Age']?=?22df.loc[3,?'Name']?=?'Mark' df.loc[3,?'Age']?=?25df.loc[4,?'Name']?=?'Taylor' df.loc[4,?'Age']?=?28print('\n------------?BEFORE?----------------\n') print(df)df['Employee']?=?df['Name'].map(str)?+?'?-?'?+?df['Age'].map(str) df?=?df.reindex(['Employee'],?axis=1)print('\n------------?AFTER?----------------\n') print(df)

Output:

------------ BEFORE ----------------Name Age 1 Rocky 21 2 Sunny 22 3 Mark 25 4 Taylor 28------------ AFTER ----------------Employee 1 Rocky - 21 2 Sunny - 22 3 Mark - 25 4 Taylor - 28

45過濾包含某字符串的行

import?pandas?as?pddf?=?pd.DataFrame({'DateOfBirth':?['1986-11-11',?'1999-05-12',?'1976-01-01','1986-06-01',?'1983-06-04',?'1990-03-07','1999-07-09'],'State':?['NY',?'TX',?'FL',?'AL',?'AK',?'TX',?'TX']},index=['Jane',?'Nick',?'Aaron',?'Penelope',?'Dean','Christina',?'Cornelia']) print(df)print("\n----?Filter?with?State?contains?TX?----\n") df1?=?df[df['State'].str.contains("TX")]print(df1)

Output:

DateOfBirth State Jane 1986-11-11 NY Nick 1999-05-12 TX Aaron 1976-01-01 FL Penelope 1986-06-01 AL Dean 1983-06-04 AK Christina 1990-03-07 TX Cornelia 1999-07-09 TX---- Filter with State contains TX ----DateOfBirth State Nick 1999-05-12 TX Christina 1990-03-07 TX Cornelia 1999-07-09 TX

46過濾索引中包含某字符串的行

import?pandas?as?pddf?=?pd.DataFrame({'DateOfBirth':?['1986-11-11',?'1999-05-12',?'1976-01-01','1986-06-01',?'1983-06-04',?'1990-03-07','1999-07-09'],'State':?['NY',?'TX',?'FL',?'AL',?'AK',?'TX',?'TX']},index=['Jane',?'Pane',?'Aaron',?'Penelope',?'Frane','Christina',?'Cornelia']) print(df) print("\n----?Filter?Index?contains?ane?----\n") df.index?=?df.index.astype('str') df1?=?df[df.index.str.contains('ane')]print(df1)

Output:

DateOfBirth State Jane 1986-11-11 NY Pane 1999-05-12 TX Aaron 1976-01-01 FL Penelope 1986-06-01 AL Frane 1983-06-04 AK Christina 1990-03-07 TX Cornelia 1999-07-09 TX---- Filter Index contains ane ----DateOfBirth State Jane 1986-11-11 NY Pane 1999-05-12 TX Frane 1983-06-04 AK

47使用 AND 運算符過濾包含特定字符串值的行

import?pandas?as?pddf?=?pd.DataFrame({'DateOfBirth':?['1986-11-11',?'1999-05-12',?'1976-01-01','1986-06-01',?'1983-06-04',?'1990-03-07','1999-07-09'],'State':?['NY',?'TX',?'FL',?'AL',?'AK',?'TX',?'TX']},index=['Jane',?'Pane',?'Aaron',?'Penelope',?'Frane','Christina',?'Cornelia']) print(df)print("\n----?Filter?DataFrame?using?&?----\n")df.index?=?df.index.astype('str') df1?=?df[df.index.str.contains('ane')?&?df['State'].str.contains("TX")]print(df1)

Output:

DateOfBirth State Jane 1986-11-11 NY Pane 1999-05-12 TX Aaron 1976-01-01 FL Penelope 1986-06-01 AL Frane 1983-06-04 AK Christina 1990-03-07 TX Cornelia 1999-07-09 TX---- Filter DataFrame using & ----DateOfBirth State Pane 1999-05-12 TX

48查找包含某字符串的所有行

import?pandas?as?pddf?=?pd.DataFrame({'DateOfBirth':?['1986-11-11',?'1999-05-12',?'1976-01-01','1986-06-01',?'1983-06-04',?'1990-03-07','1999-07-09'],'State':?['NY',?'TX',?'FL',?'AL',?'AK',?'TX',?'TX']},index=['Jane',?'Pane',?'Aaron',?'Penelope',?'Frane','Christina',?'Cornelia']) print(df)print("\n----?Filter?DataFrame?using?&?----\n")df.index?=?df.index.astype('str') df1?=?df[df.index.str.contains('ane')?|?df['State'].str.contains("TX")]print(df1)

Output:

DateOfBirth State Jane 1986-11-11 NY Pane 1999-05-12 TX Aaron 1976-01-01 FL Penelope 1986-06-01 AL Frane 1983-06-04 AK Christina 1990-03-07 TX Cornelia 1999-07-09 TX---- Filter DataFrame using & ----DateOfBirth State Jane 1986-11-11 NY Pane 1999-05-12 TX Frane 1983-06-04 AK Christina 1990-03-07 TX Cornelia 1999-07-09 TX

49如果行中的值包含字符串,則創建與字符串相等的另一列

import?pandas?as?pd import?numpy?as?npdf?=?pd.DataFrame({'EmpCode':?['Emp001',?'Emp002',?'Emp003',?'Emp004',?'Emp005'],'Name':?['John',?'Doe',?'William',?'Spark',?'Mark'],'Occupation':?['Chemist',?'Accountant',?'Statistician','Statistician',?'Programmer'],'Date?Of?Join':?['2018-01-25',?'2018-01-26',?'2018-01-26',?'2018-02-26','2018-03-16'],'Age':?[23,?24,?34,?29,?40]})df['Department']?=?pd.np.where(df.Occupation.str.contains("Chemist"),?"Science",pd.np.where(df.Occupation.str.contains("Statistician"),?"Economics",pd.np.where(df.Occupation.str.contains("Programmer"),?"Computer",?"General")))print(df)

Output:

Age Date Of Join EmpCode Name Occupation Department 0 23 2018-01-25 Emp001 John Chemist Science 1 24 2018-01-26 Emp002 Doe Accountant General 2 34 2018-01-26 Emp003 William Statistician Economics 3 29 2018-02-26 Emp004 Spark Statistician Economics 4 40 2018-03-16 Emp005 Mark Programmer Computer

50計算 pandas group 中每組的行數

import?pandas?as?pddf?=?pd.DataFrame([[10,?20,?30,?40],?[7,?14,?21,?28],?[5,?5,?0,?0],[6,?6,?6,?6],?[8,?8,?8,?8],?[5,?5,?0,?0]],columns=['Apple',?'Orange',?'Rice',?'Oil'],index=['Basket1',?'Basket2',?'Basket3','Basket4',?'Basket5',?'Basket6'])print(df) print("\n?-----------------------------?\n") print(df[['Apple',?'Orange',?'Rice',?'Oil']].groupby(['Apple']).agg(['mean',?'count']))

Output:

Apple Orange Rice Oil Basket1 10 20 30 40 Basket2 7 14 21 28 Basket3 5 5 0 0 Basket4 6 6 6 6 Basket5 8 8 8 8 Basket6 5 5 0 0-----------------------------Orange Rice Oilmean count mean count mean count Apple 5 5 2 0 2 0 2 6 6 1 6 1 6 1 7 14 1 21 1 28 1 8 8 1 8 1 8 1 10 20 1 30 1 40 1

51檢查字符串是否在 DataFrme 中

import?pandas?as?pddf?=?pd.DataFrame({'DateOfBirth':?['1986-11-11',?'1999-05-12',?'1976-01-01','1986-06-01',?'1983-06-04',?'1990-03-07','1999-07-09'],'State':?['NY',?'TX',?'FL',?'AL',?'AK',?'TX',?'TX']},index=['Jane',?'Pane',?'Aaron',?'Penelope',?'Frane','Christina',?'Cornelia'])if?df['State'].str.contains('TX').any():print("TX?is?there")

Output:

TX is there

52從 DataFrame 列中獲取唯一行值

import?pandas?as?pddf?=?pd.DataFrame({'State':?['NY',?'TX',?'FL',?'AL',?'AK',?'TX',?'TX']},index=['Jane',?'Nick',?'Aaron',?'Penelope',?'Dean','Christina',?'Cornelia'])print(df) print("\n----------------\n")print(df["State"].unique())

Output:

State Jane NY Nick TX Aaron FL Penelope AL Dean AK Christina TX Cornelia TX----------------['NY' 'TX' 'FL' 'AL' 'AK']

53計算 DataFrame 列的不同值

import?pandas?as?pddf?=?pd.DataFrame({'Age':?[30,?20,?22,?40,?20,?30,?20,?25],'Height':?[165,?70,?120,?80,?162,?72,?124,?81],'Score':?[4.6,?8.3,?9.0,?3.3,?4,?8,?9,?3],'State':?['NY',?'TX',?'FL',?'AL',?'NY',?'TX',?'FL',?'AL']},index=['Jane',?'Nick',?'Aaron',?'Penelope',?'Jaane',?'Nicky',?'Armour',?'Ponting'])print(df.Age.value_counts())

Output:

20 3 30 2 25 1 22 1 40 1 Name: Age, dtype: int64

54刪除具有重復索引的行

import?pandas?as?pddf?=?pd.DataFrame({'Age':?[30,?30,?22,?40,?20,?30,?20,?25],'Height':?[165,?165,?120,?80,?162,?72,?124,?81],'Score':?[4.6,?4.6,?9.0,?3.3,?4,?8,?9,?3],'State':?['NY',?'NY',?'FL',?'AL',?'NY',?'TX',?'FL',?'AL']},index=['Jane',?'Jane',?'Aaron',?'Penelope',?'Jaane',?'Nicky','Armour',?'Ponting'])print("\n?--------?Duplicate?Rows?-----------?\n") print(df)df1?=?df.reset_index().drop_duplicates(subset='index',keep='first').set_index('index')print("\n?-------?Unique?Rows?------------?\n") print(df1)

Output:

-------- Duplicate Rows -----------Age Height Score State Jane 30 165 4.6 NY Jane 30 165 4.6 NY Aaron 22 120 9.0 FL Penelope 40 80 3.3 AL Jaane 20 162 4.0 NY Nicky 30 72 8.0 TX Armour 20 124 9.0 FL Ponting 25 81 3.0 AL------- Unique Rows ------------Age Height Score State index Jane 30 165 4.6 NY Aaron 22 120 9.0 FL Penelope 40 80 3.3 AL Jaane 20 162 4.0 NY Nicky 30 72 8.0 TX Armour 20 124 9.0 FL Ponting 25 81 3.0 AL

55刪除某些列具有重復值的行

import?pandas?as?pddf?=?pd.DataFrame({'Age':?[30,?40,?30,?40,?30,?30,?20,?25],'Height':?[120,?162,?120,?120,?120,?72,?120,?81],'Score':?[4.6,?4.6,?9.0,?3.3,?4,?8,?9,?3],'State':?['NY',?'NY',?'FL',?'AL',?'NY',?'TX',?'FL',?'AL']},index=['Jane',?'Jane',?'Aaron',?'Penelope',?'Jaane',?'Nicky','Armour',?'Ponting'])print("\n?--------?Duplicate?Rows?-----------?\n") print(df)df1?=?df.reset_index().drop_duplicates(subset=['Age','Height'],keep='first').set_index('index')print("\n?-------?Unique?Rows?------------?\n") print(df1)

Output:

-------- Duplicate Rows -----------Age Height Score State Jane 30 120 4.6 NY Jane 40 162 4.6 NY Aaron 30 120 9.0 FL Penelope 40 120 3.3 AL Jaane 30 120 4.0 NY Nicky 30 72 8.0 TX Armour 20 120 9.0 FL Ponting 25 81 3.0 AL------- Unique Rows ------------Age Height Score State index Jane 30 120 4.6 NY Jane 40 162 4.6 NY Penelope 40 120 3.3 AL Nicky 30 72 8.0 TX Armour 20 120 9.0 FL Ponting 25 81 3.0 AL

56從 DataFrame 單元格中獲取值

import?pandas?as?pddf?=?pd.DataFrame({'Age':?[30,?40,?30,?40,?30,?30,?20,?25],'Height':?[120,?162,?120,?120,?120,?72,?120,?81],'Score':?[4.6,?4.6,?9.0,?3.3,?4,?8,?9,?3],'State':?['NY',?'NY',?'FL',?'AL',?'NY',?'TX',?'FL',?'AL']},index=['Jane',?'Jane',?'Aaron',?'Penelope',?'Jaane',?'Nicky','Armour',?'Ponting'])print(df.loc['Nicky',?'Age'])

Output:

30

57使用 DataFrame 中的條件索引獲取單元格上的標量值

import?pandas?as?pddf?=?pd.DataFrame({'Age':?[30,?40,?30,?40,?30,?30,?20,?25],'Height':?[120,?162,?120,?120,?120,?72,?120,?81],'Score':?[4.6,?4.6,?9.0,?3.3,?4,?8,?9,?3],'State':?['NY',?'NY',?'FL',?'AL',?'NY',?'TX',?'FL',?'AL']},index=['Jane',?'Jane',?'Aaron',?'Penelope',?'Jaane',?'Nicky','Armour',?'Ponting'])print("\nGet?Height?where?Age?is?20") print(df.loc[df['Age']?==?20,?'Height'].values[0])print("\nGet?State?where?Age?is?30") print(df.loc[df['Age']?==?30,?'State'].values[0])

Output:

Get Height where Age is 20 120Get State where Age is 30 NY

58設置 DataFrame 的特定單元格值

import?pandas?as?pddf?=?pd.DataFrame({'Age':?[30,?40,?30,?40,?30,?30,?20,?25],'Height':?[120,?162,?120,?120,?120,?72,?120,?81]},index=['Jane',?'Jane',?'Aaron',?'Penelope',?'Jaane',?'Nicky','Armour',?'Ponting']) print("\n--------------Before------------\n") print(df)df.iat[0,?0]?=?90 df.iat[0,?1]?=?91 df.iat[1,?1]?=?92 df.iat[2,?1]?=?93 df.iat[7,?1]?=?99print("\n--------------After------------\n") print(df)

Output:

--------------Before------------Age Height Jane 30 120 Jane 40 162 Aaron 30 120 Penelope 40 120 Jaane 30 120 Nicky 30 72 Armour 20 120 Ponting 25 81--------------After------------Age Height Jane 90 91 Jane 40 92 Aaron 30 93 Penelope 40 120 Jaane 30 120 Nicky 30 72 Armour 20 120 Ponting 25 99

59從 DataFrame 行獲取單元格值

import?pandas?as?pddf?=?pd.DataFrame({'Age':?[30,?40,?30,?40,?30,?30,?20,?25],'Height':?[120,?162,?120,?120,?120,?72,?120,?81]},index=['Jane',?'Jane',?'Aaron',?'Penelope',?'Jaane',?'Nicky','Armour',?'Ponting'])print(df.loc[df.Age?==?30,'Height'].tolist())

Output:

[120, 120, 120, 72]

60用字典替換 DataFrame 列中的值

import?pandas?as?pddf?=?pd.DataFrame({'State':?['NY',?'TX',?'FL',?'AL',?'AK',?'TX',?'TX']},index=['Jane',?'Nick',?'Aaron',?'Penelope',?'Dean','Christina',?'Cornelia'])print(df)dict?=?{"NY":?1,?"TX":?2,?"FL":?3,?"AL":?4,?"AK":?5} df1?=?df.replace({"State":?dict})print("\n\n") print(df1)

Output:

State Jane NY Nick TX Aaron FL Penelope AL Dean AK Christina TX Cornelia TXState Jane 1 Nick 2 Aaron 3 Penelope 4 Dean 5 Christina 2 Cornelia 2

61統計基于某一列的一列的數值

import?pandas?as?pddf?=?pd.DataFrame({'DateOfBirth':?['1986-11-11',?'1999-05-12',?'1976-01-01','1986-06-01',?'1983-06-04',?'1990-03-07','1999-07-09'],???????????????????'State':?['NY',?'TX',?'FL',?'AL',?'AK',?'TX',?'TX']},index=['Jane',?'Nick',?'Aaron',?'Penelope',?'Dean','Christina',?'Cornelia'])print(df.groupby('State').DateOfBirth.nunique())

Output:

State AK 1 AL 1 FL 1 NY 1 TX 3 Name: DateOfBirth, dtype: int64

62處理 DataFrame 中的缺失值

import?pandas?as?pddf?=?pd.DataFrame([[10,?20,?30,?40],?[7,?14,?21,?28],?[5,]],columns=['Apple',?'Orange',?'Banana',?'Pear'],index=['Basket1',?'Basket2',?'Basket3'])print("\n---------?DataFrame?---------\n") print(df)print("\n---------?Use?of?isnull()?---------\n") print(df.isnull())print("\n---------?Use?of?notnull()?---------\n") print(df.notnull())

Output:

--------- DataFrame ---------Apple Orange Banana Pear Basket1 10 20.0 30.0 40.0 Basket2 7 14.0 21.0 28.0 Basket3 5 NaN NaN NaN--------- Use of isnull() ---------Apple Orange Banana Pear Basket1 False False False False Basket2 False False False False Basket3 False True True True--------- Use of notnull() ---------Apple Orange Banana Pear Basket1 True True True True Basket2 True True True True Basket3 True False False False

63刪除包含任何缺失數據的行

import?pandas?as?pddf?=?pd.DataFrame([[10,?20,?30,?40],?[7,?14,?21,?28],?[5,]],columns=['Apple',?'Orange',?'Banana',?'Pear'],index=['Basket1',?'Basket2',?'Basket3'])print("\n---------?DataFrame?---------\n") print(df)print("\n---------?Use?of?dropna()?---------\n") print(df.dropna())

Output:

--------- DataFrame ---------Apple Orange Banana Pear Basket1 10 20.0 30.0 40.0 Basket2 7 14.0 21.0 28.0 Basket3 5 NaN NaN NaN--------- Use of dropna() ---------Apple Orange Banana Pear Basket1 10 20.0 30.0 40.0 Basket2 7 14.0 21.0 28.0

64刪除 DataFrame 中缺失數據的列

import?pandas?as?pddf?=?pd.DataFrame([[10,?20,?30,?40],?[7,?14,?21,?28],?[5,]],columns=['Apple',?'Orange',?'Banana',?'Pear'],index=['Basket1',?'Basket2',?'Basket3'])print("\n---------?DataFrame?---------\n") print(df)print("\n---------?Drop?Columns)?---------\n") print(df.dropna(1))

Output:

--------- DataFrame ---------Apple Orange Banana Pear Basket1 10 20.0 30.0 40.0 Basket2 7 14.0 21.0 28.0 Basket3 5 NaN NaN NaN--------- Drop Columns) ---------Apple Basket1 10 Basket2 7 Basket3 5

65按降序對索引值進行排序

import?pandas?as?pddf?=?pd.DataFrame({'DateOfBirth':?['1986-11-11',?'1999-05-12',?'1976-01-01','1986-06-01',?'1983-06-04',?'1990-03-07','1999-07-09'],'State':?['NY',?'TX',?'FL',?'AL',?'AK',?'TX',?'TX']},index=['Jane',?'Pane',?'Aaron',?'Penelope',?'Frane','Christina',?'Cornelia'])print(df.sort_index(ascending=False))

Output:

DateOfBirth State Penelope 1986-06-01 AL Pane 1999-05-12 TX Jane 1986-11-11 NY Frane 1983-06-04 AK Cornelia 1999-07-09 TX Christina 1990-03-07 TX Aaron 1976-01-01 FL

66按降序對列進行排序

import?pandas?as?pdemployees?=?pd.DataFrame({'EmpCode':?['Emp001',?'Emp002',?'Emp003',?'Emp004',?'Emp005'],'Name':?['John',?'Doe',?'William',?'Spark',?'Mark'],'Occupation':?['Chemist',?'Statistician',?'Statistician','Statistician',?'Programmer'],'Date?Of?Join':?['2018-01-25',?'2018-01-26',?'2018-01-26',?'2018-02-26','2018-03-16'],'Age':?[23,?24,?34,?29,?40]})print(employees.sort_index(axis=1,?ascending=False))

Output:

Occupation Name EmpCode Date Of Join Age 0 Chemist John Emp001 2018-01-25 23 1 Statistician Doe Emp002 2018-01-26 24 2 Statistician William Emp003 2018-01-26 34 3 Statistician Spark Emp004 2018-02-26 29 4 Programmer Mark Emp005 2018-03-16 40

67使用 rank 方法查找 DataFrame 中元素的排名

import?pandas?as?pddf?=?pd.DataFrame([[10,?20,?30,?40],?[7,?14,?21,?28],?[5,?5,?0,?0]],columns=['Apple',?'Orange',?'Banana',?'Pear'],index=['Basket1',?'Basket2',?'Basket3'])print("\n---------?DataFrame?Values--------\n") print(df)print("\n---------?DataFrame?Values?by?Rank--------\n") print(df.rank())

Output:

--------- DataFrame Values--------Apple Orange Banana Pear Basket1 10 20 30 40 Basket2 7 14 21 28 Basket3 5 5 0 0--------- DataFrame Values by Rank--------Apple Orange Banana Pear Basket1 3.0 3.0 3.0 3.0 Basket2 2.0 2.0 2.0 2.0 Basket3 1.0 1.0 1.0 1.0

68在多列上設置索引

import?pandas?as?pdemployees?=?pd.DataFrame({'EmpCode':?['Emp001',?'Emp002',?'Emp003',?'Emp004',?'Emp005'],'Name':?['John',?'Doe',?'William',?'Spark',?'Mark'],'Occupation':?['Chemist',?'Statistician',?'Statistician','Statistician',?'Programmer'],'Date?Of?Join':?['2018-01-25',?'2018-01-26',?'2018-01-26',?'2018-02-26','2018-03-16'],'Age':?[23,?24,?34,?29,?40]})print("\n?---------?Before?Index?-----------?\n") print(employees)print("\n?---------?Multiple?Indexing?-----------?\n") print(employees.set_index(['Occupation',?'Age']))

Output:

Date Of Join EmpCode Name Occupation Age Chemist 23 2018-01-25 Emp001 John Statistician 24 2018-01-26 Emp002 Doe34 2018-01-26 Emp003 William29 2018-02-26 Emp004 Spark Programmer 40 2018-03-16 Emp005 Mark

69確定 DataFrame 的周期索引和列

import?pandas?as?pdvalues?=?["India",?"Canada",?"Australia","Japan",?"Germany",?"France"]pidx?=?pd.period_range('2015-01-01',?periods=6)df?=?pd.DataFrame(values,?index=pidx,?columns=['Country'])print(df)

Output:

Country 2015-01-01 India 2015-01-02 Canada 2015-01-03 Australia 2015-01-04 Japan 2015-01-05 Germany 2015-01-06 France

70導入 CSV 指定特定索引

import?pandas?as?pddf?=?pd.read_csv('test.csv',?index_col="DateTime") print(df)

Output:

Wheat Rice Oil DateTime 10/10/2016 10.500 12.500 16.500 10/11/2016 11.250 12.750 17.150 10/12/2016 10.000 13.150 15.500 10/13/2016 12.000 14.500 16.100 10/14/2016 13.000 14.825 15.600 10/15/2016 13.075 15.465 15.315 10/16/2016 13.650 16.105 15.030 10/17/2016 14.225 16.745 14.745 10/18/2016 14.800 17.385 14.460 10/19/2016 15.375 18.025 14.175

71將 DataFrame 寫入 csv

import?pandas?as?pddf?=?pd.DataFrame({'DateOfBirth':?['1986-11-11',?'1999-05-12',?'1976-01-01','1986-06-01',?'1983-06-04',?'1990-03-07','1999-07-09'],'State':?['NY',?'TX',?'FL',?'AL',?'AK',?'TX',?'TX']},index=['Jane',?'Pane',?'Aaron',?'Penelope',?'Frane','Christina',?'Cornelia'])df.to_csv('test.csv',?encoding='utf-8',?index=True)

Output:

檢查本地文件

72使用 Pandas 讀取 csv 文件的特定列

import?pandas?as?pddf?=?pd.read_csv("test.csv",?usecols?=?['Wheat','Oil']) print(df)

73Pandas 獲取 CSV 列的列表

import?pandas?as?pdcols?=?list(pd.read_csv("test.csv",?nrows?=1)) print(cols)

Output:

['DateTime', 'Wheat', 'Rice', 'Oil']

74找到列值最大的行

import?pandas?as?pddf?=?pd.DataFrame([[10,?20,?30,?40],?[7,?14,?21,?28],?[55,?15,?8,?12]],columns=['Apple',?'Orange',?'Banana',?'Pear'],index=['Basket1',?'Basket2',?'Basket3'])print(df.ix[df['Apple'].idxmax()])

Output:

Apple 55 Orange 15 Banana 8 Pear 12 Name: Basket3, dtype: int64

75使用查詢方法進行復雜條件選擇

import?pandas?as?pddf?=?pd.DataFrame([[10,?20,?30,?40],?[7,?14,?21,?28],?[55,?15,?8,?12]],columns=['Apple',?'Orange',?'Banana',?'Pear'],index=['Basket1',?'Basket2',?'Basket3'])print(df)print("\n?-----------?Filter?data?using?query?method?-------------?\n") df1?=?df.ix[df.query('Apple?>?50?&?Orange?<=?15?&?Banana?<?15?&?Pear?==?12').index] print(df1)

Output:

Apple Orange Banana Pear Basket1 10 20 30 40 Basket2 7 14 21 28 Basket3 55 15 8 12----------- Filter data using query method -------------Apple Orange Banana Pear Basket3 55 15 8 12

76檢查 Pandas 中是否存在列

import?pandas?as?pddf?=?pd.DataFrame([[10,?20,?30,?40],?[7,?14,?21,?28],?[55,?15,?8,?12]],columns=['Apple',?'Orange',?'Banana',?'Pear'],index=['Basket1',?'Basket2',?'Basket3'])if?'Apple'?in?df.columns:print("Yes") else:print("No")if?set(['Apple','Orange']).issubset(df.columns):print("Yes") else:print("No")

77為特定列從 DataFrame 中查找 n-smallest 和 n-largest 值

import?pandas?as?pddf?=?pd.DataFrame([[10,?20,?30,?40],?[7,?14,?21,?28],?[55,?15,?8,?12],[15,?14,?1,?8],?[7,?1,?1,?8],?[5,?4,?9,?2]],columns=['Apple',?'Orange',?'Banana',?'Pear'],index=['Basket1',?'Basket2',?'Basket3',?'Basket4','Basket5',?'Basket6'])print("\n-----------?nsmallest?-----------\n") print(df.nsmallest(2,?['Apple']))print("\n-----------?nlargest?-----------\n") print(df.nlargest(2,?['Apple']))

Output:

----------- nsmallest -----------Apple Orange Banana Pear Basket6 5 4 9 2 Basket2 7 14 21 28----------- nlargest -----------Apple Orange Banana Pear Basket3 55 15 8 12 Basket4 15 14 1 8

78從 DataFrame 中查找所有列的最小值和最大值

import?pandas?as?pddf?=?pd.DataFrame([[10,?20,?30,?40],?[7,?14,?21,?28],?[55,?15,?8,?12],[15,?14,?1,?8],?[7,?1,?1,?8],?[5,?4,?9,?2]],columns=['Apple',?'Orange',?'Banana',?'Pear'],index=['Basket1',?'Basket2',?'Basket3',?'Basket4','Basket5',?'Basket6'])print("\n-----------?Minimum?-----------\n") print(df[['Apple',?'Orange',?'Banana',?'Pear']].min())print("\n-----------?Maximum?-----------\n") print(df[['Apple',?'Orange',?'Banana',?'Pear']].max())

Output:

----------- Minimum -----------Apple 5 Orange 1 Banana 1 Pear 2 dtype: int64----------- Maximum -----------Apple 55 Orange 20 Banana 30 Pear 40 dtype: int64

79在 DataFrame 中找到最小值和最大值所在的索引位置

import?pandas?as?pddf?=?pd.DataFrame([[10,?20,?30,?40],?[7,?14,?21,?28],?[55,?15,?8,?12],[15,?14,?1,?8],?[7,?1,?1,?8],?[5,?4,?9,?2]],columns=['Apple',?'Orange',?'Banana',?'Pear'],index=['Basket1',?'Basket2',?'Basket3',?'Basket4','Basket5',?'Basket6'])print("\n-----------?Minimum?-----------\n") print(df[['Apple',?'Orange',?'Banana',?'Pear']].idxmin())print("\n-----------?Maximum?-----------\n") print(df[['Apple',?'Orange',?'Banana',?'Pear']].idxmax())

Output:

----------- Minimum -----------Apple Basket6 Orange Basket5 Banana Basket4 Pear Basket6 dtype: object----------- Maximum -----------Apple Basket3 Orange Basket1 Banana Basket1 Pear Basket1 dtype: object

80計算 DataFrame Columns 的累積乘積和累積總和

import?pandas?as?pddf?=?pd.DataFrame([[10,?20,?30,?40],?[7,?14,?21,?28],?[55,?15,?8,?12],[15,?14,?1,?8],?[7,?1,?1,?8],?[5,?4,?9,?2]],columns=['Apple',?'Orange',?'Banana',?'Pear'],index=['Basket1',?'Basket2',?'Basket3',?'Basket4','Basket5',?'Basket6'])print("\n-----------?Cumulative?Product?-----------\n") print(df[['Apple',?'Orange',?'Banana',?'Pear']].cumprod())print("\n-----------?Cumulative?Sum?-----------\n") print(df[['Apple',?'Orange',?'Banana',?'Pear']].cumsum())

Output:

----------- Cumulative Product -----------Apple Orange Banana Pear Basket1 10 20 30 40 Basket2 70 280 630 1120 Basket3 3850 4200 5040 13440 Basket4 57750 58800 5040 107520 Basket5 404250 58800 5040 860160 Basket6 2021250 235200 45360 1720320----------- Cumulative Sum -----------Apple Orange Banana Pear Basket1 10 20 30 40 Basket2 17 34 51 68 Basket3 72 49 59 80 Basket4 87 63 60 88 Basket5 94 64 61 96 Basket6 99 68 70 98

81匯總統計

import?pandas?as?pddf?=?pd.DataFrame([[10,?20,?30,?40],?[7,?14,?21,?28],?[55,?15,?8,?12],[15,?14,?1,?8],?[7,?1,?1,?8],?[5,?4,?9,?2]],columns=['Apple',?'Orange',?'Banana',?'Pear'],index=['Basket1',?'Basket2',?'Basket3',?'Basket4','Basket5',?'Basket6'])print("\n-----------?Describe?DataFrame?-----------\n") print(df.describe())print("\n-----------?Describe?Column?-----------\n") print(df[['Apple']].describe())

Output:

----------- Describe DataFrame -----------Apple Orange Banana Pear count 6.000000 6.000000 6.000000 6.000000 mean 16.500000 11.333333 11.666667 16.333333 std 19.180719 7.257180 11.587349 14.555640 min 5.000000 1.000000 1.000000 2.000000 25% 7.000000 6.500000 2.750000 8.000000 50% 8.500000 14.000000 8.500000 10.000000 75% 13.750000 14.750000 18.000000 24.000000 max 55.000000 20.000000 30.000000 40.000000----------- Describe Column -----------Apple count 6.000000 mean 16.500000 std 19.180719 min 5.000000 25% 7.000000 50% 8.500000 75% 13.750000 max 55.000000

82查找 DataFrame 的均值、中值和眾數

import?pandas?as?pddf?=?pd.DataFrame([[10,?20,?30,?40],?[7,?14,?21,?28],?[55,?15,?8,?12],[15,?14,?1,?8],?[7,?1,?1,?8],?[5,?4,?9,?2]],columns=['Apple',?'Orange',?'Banana',?'Pear'],index=['Basket1',?'Basket2',?'Basket3',?'Basket4','Basket5',?'Basket6'])print("\n-----------?Calculate?Mean?-----------\n") print(df.mean())print("\n-----------?Calculate?Median?-----------\n") print(df.median())print("\n-----------?Calculate?Mode?-----------\n") print(df.mode())

Output:

----------- Calculate Mean -----------Apple 16.500000 Orange 11.333333 Banana 11.666667 Pear 16.333333 dtype: float64----------- Calculate Median -----------Apple 8.5 Orange 14.0 Banana 8.5 Pear 10.0 dtype: float64----------- Calculate Mode -----------Apple Orange Banana Pear 0 7 14 1 8

83測量 DataFrame 列的方差和標準偏差

import?pandas?as?pddf?=?pd.DataFrame([[10,?20,?30,?40],?[7,?14,?21,?28],?[55,?15,?8,?12],[15,?14,?1,?8],?[7,?1,?1,?8],?[5,?4,?9,?2]],columns=['Apple',?'Orange',?'Banana',?'Pear'],index=['Basket1',?'Basket2',?'Basket3',?'Basket4','Basket5',?'Basket6'])print("\n-----------?Calculate?Mean?-----------\n") print(df.mean())print("\n-----------?Calculate?Median?-----------\n") print(df.median())print("\n-----------?Calculate?Mode?-----------\n") print(df.mode())

Output:

----------- Measure Variance -----------Apple 367.900000 Orange 52.666667 Banana 134.266667 Pear 211.866667 dtype: float64----------- Standard Deviation -----------Apple 19.180719 Orange 7.257180 Banana 11.587349 Pear 14.555640 dtype: float64

84計算 DataFrame 列之間的協方差

import?pandas?as?pddf?=?pd.DataFrame([[10,?20,?30,?40],?[7,?14,?21,?28],?[55,?15,?8,?12],[15,?14,?1,?8],?[7,?1,?1,?8],?[5,?4,?9,?2]],columns=['Apple',?'Orange',?'Banana',?'Pear'],index=['Basket1',?'Basket2',?'Basket3',?'Basket4','Basket5',?'Basket6'])print("\n-----------?Calculating?Covariance?-----------\n") print(df.cov())print("\n-----------?Between?2?columns?-----------\n") #?Covariance?of?Apple?vs?Orange print(df.Apple.cov(df.Orange))

Output:

----------- Calculating Covariance -----------Apple Orange Banana Pear Apple 367.9 47.600000 -40.200000 -35.000000 Orange 47.6 52.666667 54.333333 77.866667 Banana -40.2 54.333333 134.266667 154.933333 Pear -35.0 77.866667 154.933333 211.866667----------- Between 2 columns -----------47.60000000000001

85計算 Pandas 中兩個 DataFrame 對象之間的相關性

import?pandas?as?pddf1?=?pd.DataFrame([[10,?20,?30,?40],?[7,?14,?21,?28],?[55,?15,?8,?12],[15,?14,?1,?8],?[7,?1,?1,?8],?[5,?4,?9,?2]],columns=['Apple',?'Orange',?'Banana',?'Pear'],index=['Basket1',?'Basket2',?'Basket3',?'Basket4','Basket5',?'Basket6'])print("\n------?Calculating?Correlation?of?one?DataFrame?Columns?-----\n") print(df1.corr())df2?=?pd.DataFrame([[52,?54,?58,?41],?[14,?24,?51,?78],?[55,?15,?8,?12],[15,?14,?1,?8],?[7,?17,?18,?98],?[15,?34,?29,?52]],columns=['Apple',?'Orange',?'Banana',?'Pear'],index=['Basket1',?'Basket2',?'Basket3',?'Basket4','Basket5',?'Basket6'])print("\n-----?Calculating?correlation?between?two?DataFrame?-------\n") print(df2.corrwith(other=df1))

Output:

------ Calculating Correlation of one DataFrame Columns -----Apple Orange Banana Pear Apple 1.000000 0.341959 -0.180874 -0.125364 Orange 0.341959 1.000000 0.646122 0.737144 Banana -0.180874 0.646122 1.000000 0.918606 Pear -0.125364 0.737144 0.918606 1.000000----- Calculating correlation between two DataFrame -------Apple 0.678775 Orange 0.354993 Banana 0.920872 Pear 0.076919 dtype: float64

86計算 DataFrame 列的每個單元格的百分比變化

import?pandas?as?pddf?=?pd.DataFrame([[10,?20,?30,?40],?[7,?14,?21,?28],?[55,?15,?8,?12],[15,?14,?1,?8],?[7,?1,?1,?8],?[5,?4,?9,?2]],columns=['Apple',?'Orange',?'Banana',?'Pear'],index=['Basket1',?'Basket2',?'Basket3',?'Basket4','Basket5',?'Basket6'])print("\n------?Percent?change?at?each?cell?of?a?Column?-----\n") print(df[['Apple']].pct_change()[:3])print("\n------?Percent?change?at?each?cell?of?a?DataFrame?-----\n") print(df.pct_change()[:5])

Output:

------ Percent change at each cell of a Column -----Apple Basket1 NaN Basket2 -0.300000 Basket3 6.857143------ Percent change at each cell of a DataFrame -----Apple Orange Banana Pear Basket1 NaN NaN NaN NaN Basket2 -0.300000 -0.300000 -0.300000 -0.300000 Basket3 6.857143 0.071429 -0.619048 -0.571429 Basket4 -0.727273 -0.066667 -0.875000 -0.333333 Basket5 -0.533333 -0.928571 0.000000 0.000000

87在 Pandas 中向前和向后填充 DataFrame 列的缺失值

import?pandas?as?pddf?=?pd.DataFrame([[10,?30,?40],?[],?[15,?8,?12],[15,?14,?1,?8],?[7,?8],?[5,?4,?1]],columns=['Apple',?'Orange',?'Banana',?'Pear'],index=['Basket1',?'Basket2',?'Basket3',?'Basket4','Basket5',?'Basket6'])print("\n------?DataFrame?with?NaN?-----\n") print(df)print("\n------?DataFrame?with?Forward?Filling?-----\n") print(df.ffill())print("\n------?DataFrame?with?Forward?Filling?-----\n") print(df.bfill())

Output:

------ DataFrame with NaN -----Apple Orange Banana Pear Basket1 10.0 30.0 40.0 NaN Basket2 NaN NaN NaN NaN Basket3 15.0 8.0 12.0 NaN Basket4 15.0 14.0 1.0 8.0 Basket5 7.0 8.0 NaN NaN Basket6 5.0 4.0 1.0 NaN------ DataFrame with Forward Filling -----Apple Orange Banana Pear Basket1 10.0 30.0 40.0 NaN Basket2 10.0 30.0 40.0 NaN Basket3 15.0 8.0 12.0 NaN Basket4 15.0 14.0 1.0 8.0 Basket5 7.0 8.0 1.0 8.0 Basket6 5.0 4.0 1.0 8.0------ DataFrame with Forward Filling -----Apple Orange Banana Pear Basket1 10.0 30.0 40.0 8.0 Basket2 15.0 8.0 12.0 8.0 Basket3 15.0 8.0 12.0 8.0 Basket4 15.0 14.0 1.0 8.0 Basket5 7.0 8.0 1.0 NaN Basket6 5.0 4.0 1.0 NaN

88在 Pandas 中使用非分層索引使用 Stacking

import?pandas?as?pddf?=?pd.DataFrame([[10,?30,?40],?[],?[15,?8,?12],[15,?14,?1,?8],?[7,?8],?[5,?4,?1]],columns=['Apple',?'Orange',?'Banana',?'Pear'],index=['Basket1',?'Basket2',?'Basket3',?'Basket4','Basket5',?'Basket6'])print("\n------?DataFrame-----\n") print(df)print("\n------?Stacking?DataFrame?-----\n") print(df.stack(level=-1))

Output:

------ DataFrame-----Apple Orange Banana Pear Basket1 10.0 30.0 40.0 NaN Basket2 NaN NaN NaN NaN Basket3 15.0 8.0 12.0 NaN Basket4 15.0 14.0 1.0 8.0 Basket5 7.0 8.0 NaN NaN Basket6 5.0 4.0 1.0 NaN------ Stacking DataFrame -----Basket1 Apple 10.0Orange 30.0Banana 40.0 Basket3 Apple 15.0Orange 8.0Banana 12.0 Basket4 Apple 15.0Orange 14.0Banana 1.0Pear 8.0 Basket5 Apple 7.0Orange 8.0 Basket6 Apple 5.0Orange 4.0Banana 1.0 dtype: float64

89使用分層索引對 Pandas 進行拆分

import?pandas?as?pddf?=?pd.DataFrame([[10,?30,?40],?[],?[15,?8,?12],[15,?14,?1,?8],?[7,?8],?[5,?4,?1]],columns=['Apple',?'Orange',?'Banana',?'Pear'],index=['Basket1',?'Basket2',?'Basket3',?'Basket4','Basket5',?'Basket6'])print("\n------?DataFrame-----\n") print(df)print("\n------?Unstacking?DataFrame?-----\n") print(df.unstack(level=-1))

Output:

------ DataFrame-----Apple Orange Banana Pear Basket1 10.0 30.0 40.0 NaN Basket2 NaN NaN NaN NaN Basket3 15.0 8.0 12.0 NaN Basket4 15.0 14.0 1.0 8.0 Basket5 7.0 8.0 NaN NaN Basket6 5.0 4.0 1.0 NaN------ Unstacking DataFrame -----Apple Basket1 10.0Basket2 NaNBasket3 15.0Basket4 15.0Basket5 7.0Basket6 5.0 Orange Basket1 30.0Basket2 NaNBasket3 8.0Basket4 14.0Basket5 8.0Basket6 4.0 Banana Basket1 40.0Basket2 NaNBasket3 12.0Basket4 1.0Basket5 NaNBasket6 1.0 Pear Basket1 NaNBasket2 NaNBasket3 NaNBasket4 8.0Basket5 NaNBasket6 NaN dtype: float64

90Pandas 獲取 HTML 頁面上 table 數據

import?pandas?as?pd df?pd.read_html("url")

-?END -

對比Excel系列圖書累積銷量達15w冊,讓你輕松掌握數據分析技能,可以在全網搜索書名進行了解:

總結

以上是生活随笔為你收集整理的整理了90个Pandas案例,强烈建议收藏!的全部內容,希望文章能夠幫你解決所遇到的問題。

如果覺得生活随笔網站內容還不錯,歡迎將生活随笔推薦給好友。