
Pandas Basics 1 (DataFrame + Series Reading/Writing, Index + Select + Assign)

Published: 2024/7/5


Contents

- 1. Creating, Reading and Writing
  - 1.1 DataFrame
  - 1.2 Series
  - 1.3 Reading data
- 2. Indexing, Selecting, Assigning
  - 2.1 Python-style access
  - 2.2 Pandas-specific accessors
    - 2.2.1 iloc: position-based access
    - 2.2.2 loc: label-based access
  - 2.3 set_index(): setting the index column
  - 2.4 Conditional selection
    - 2.4.1 Boolean operators `&`, `|`, `==`
    - 2.4.2 Pandas built-ins `isin`, `isnull`, `notnull`
  - 2.5 Assigning data
    - 2.5.1 Assigning a constant
    - 2.5.2 Assigning an iterable of values

learn from https://www.kaggle.com/learn/pandas

Next: Pandas Basics 2 (DataFunctions + Maps + groupby + sort_values)

1. Creating, Reading and Writing

1.1 DataFrame

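Create a DataFrame. It is a table, and a common way to build one is from a dictionary in which each key is a column name and each value is the list of entries for that column, key: [value_1, ..., value_n]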

    #%%
    # -*- coding:utf-8 -*-
    # @Python Version: 3.7
    # @Time: 2020/5/16 21:10
    # @Author: Michael Ming
    # @Website: https://michael.blog.csdn.net/
    # @File: pandasExercise.ipynb
    # @Reference: https://www.kaggle.com/learn/pandas
    import pandas as pd

    #%%
    pd.DataFrame({'Yes':[50,22],"No":[131,2]})

fruits = pd.DataFrame([[30, 21],[40, 22]], columns=['Apples', 'Bananas'])

The dictionary values can also be strings:

pd.DataFrame({"Michael":['handsome','good'],"Ming":['love basketball','coding']})

Add row labels with the index parameter, index=['index1','index2',...]

pd.DataFrame({"Michael":['handsome','good'],"Ming":['love basketball','coding']},index=['people1 say','people2 say'])
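As a minimal sketch of the constructor calls above: the row labels "Product A" and "Product B" are hypothetical names chosen for illustration, while the column data reuses the Yes/No example.

```python
import pandas as pd

# Each dict key becomes a column; each list holds that column's values.
# The row labels below are hypothetical, chosen only for illustration.
reviews = pd.DataFrame(
    {"Yes": [50, 22], "No": [131, 2]},
    index=["Product A", "Product B"],
)

print(reviews.shape)          # (rows, columns)
print(list(reviews.columns))  # column names come from the dict keys
print(list(reviews.index))    # row labels come from index=
```

Without index=, pandas falls back to the default integer labels 0, 1, 2, and so on.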

1.2 Series

A Series is a sequence of data values; you can think of it as a list.

    pd.Series([5,2,0,1,3,1,4])

    0    5
    1    2
    2    0
    3    1
    4    3
    5    1
    6    4
    dtype: int64

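You can also give a Series row labels through the index parameter. A Series has no column names, only one overall name.
A DataFrame is essentially several Series glued together.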

    pd.Series([30,40,50],index=['2018 Sales','2019 Sales','2020 Sales'],name='Blog Visits')

    2018 Sales    30
    2019 Sales    40
    2020 Sales    50
    Name: Blog Visits, dtype: int64
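The "several Series glued together" idea can be sketched as follows. The use of pd.concat here is one possible way to show it, not something the original tutorial prescribes, and the data values are illustrative.

```python
import pandas as pd

apples = pd.Series([30, 40], name="Apples")
bananas = pd.Series([21, 22], name="Bananas")

# Concatenating along axis=1 places the Series side by side as columns;
# each Series name becomes a column name.
fruits = pd.concat([apples, bananas], axis=1)

# Selecting one column gives back a Series that carries the column name.
col = fruits["Apples"]
print(type(col).__name__, col.name)
```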

1.3 Reading data

Read a CSV ("Comma-Separated Values") file with pd.read_csv('file'), which loads it into a DataFrame:

    wine_rev = pd.read_csv("winemag-data-130k-v2.csv")
    wine_rev.shape   # size: (129971, 14)
    wine_rev.head()  # show the first 5 rows

You can choose which column becomes the index with index_col=, given either as the column's position or as its name:

    wine_rev = pd.read_csv("winemag-data-130k-v2.csv", index_col=0)
    wine_rev.head()

(The result has one fewer column than before, because column 0 is now used as the index.)

Save with to_csv('xxx.csv'):

wine_rev.to_csv('XXX.csv')
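The wine review file is not bundled with this article, so the sketch below uses a tiny hypothetical CSV held in memory (the column names id, country, points are assumptions) to show the effect of index_col and a to_csv round trip.

```python
import io
import pandas as pd

# Hypothetical stand-in for a CSV file on disk.
csv_text = "id,country,points\n0,Italy,87\n1,Portugal,87\n2,US,87\n"

plain = pd.read_csv(io.StringIO(csv_text))
indexed = pd.read_csv(io.StringIO(csv_text), index_col=0)  # or index_col="id"

print(plain.shape)    # "id" is an ordinary column here
print(indexed.shape)  # "id" is now the index, so one fewer column

# to_csv writes the index as the first column by default,
# so reading back with index_col=0 restores the same table.
buffer = io.StringIO()
indexed.to_csv(buffer)
restored = pd.read_csv(io.StringIO(buffer.getvalue()), index_col=0)
```

With a real file, the io.StringIO objects would simply be replaced by file paths.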

2. Indexing, Selecting, Assigning

2.1 Python-style access

    item.col_name     # drawback: cannot reach columns whose names contain spaces; [] can
    item['col_name']
    wine_rev.country
    wine_rev['country']

    0         Italy
    1         Portugal
    2         US
    3         US
    4         US
              ...
    129966    Germany
    129967    US
    129968    France
    129969    France
    129970    France
    Name: country, Length: 129971, dtype: object

    wine_rev['country'][0]  # 'Italy': select the column first, then the row
    wine_rev.country[1]     # 'Portugal'
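A small sketch of the difference between the two access styles, using a hypothetical two-row table whose second column name contains a space:

```python
import pandas as pd

# Hypothetical data; "taster name" deliberately contains a space.
df = pd.DataFrame({"country": ["Italy", "US"], "taster name": ["Kerin", "Paul"]})

same = df.country.equals(df["country"])  # both forms return the same Series
first = df["country"][0]                 # column first, then row label 0
spaced = df["taster name"]               # only [] can reach this column
```

Attribute access also fails for column names that collide with DataFrame methods, which is another reason to prefer [] in scripts.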

2.2 Pandas-specific accessors

2.2.1 iloc: position-based access

To select the first row of the DataFrame, we can use:
wine_rev.iloc[0]

    country                  Italy
    description              Aromas include tropical fruit, broom, brimston...
    designation              Vulkà Bianco
    points                   87
    price                    NaN
    province                 Sicily & Sardinia
    region_1                 Etna
    region_2                 NaN
    taster_name              Kerin O’Keefe
    taster_twitter_handle    @kerinokeefe
    title                    Nicosia 2013 Vulkà Bianco (Etna)
    variety                  White Blend
    winery                   Nicosia
    Name: 0, dtype: object

Both loc and iloc are row-first, column-second, which is the opposite of the Python-style access above.

wine_rev.iloc[:,0] selects the first column; : means all rows.

    0         Italy
    1         Portugal
    2         US
    3         US
    4         US
              ...
    129966    Germany
    129967    US
    129968    France
    129969    France
    129970    France
    Name: country, Length: 129971, dtype: object

wine_rev.iloc[:3,0]: here :3 means the half-open range [0, 3), i.e. rows 0, 1, 2.

    0    Italy
    1    Portugal
    2    US
    Name: country, dtype: object

You can also pass a list of positions to pick specific rows: wine_rev.iloc[[1,2],0]

    1    Portugal
    2    US
    Name: country, dtype: object

To take the last few rows, use wine_rev.iloc[-5:], which runs from the fifth-from-last row to the end.
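The iloc patterns in this section can be tried on a small table. The five rows below are a hypothetical stand-in for wine_rev, so the row counts differ from the outputs shown above.

```python
import pandas as pd

# Hypothetical miniature of the wine review table.
df = pd.DataFrame({
    "country": ["Italy", "Portugal", "US", "US", "France"],
    "points": [87, 87, 87, 87, 90],
})

first_row = df.iloc[0]       # one row, returned as a Series
first_col = df.iloc[:, 0]    # all rows of the first column
top3 = df.iloc[:3, 0]        # positions 0, 1, 2 (end is excluded)
picked = df.iloc[[1, 2], 0]  # an explicit list of positions
last2 = df.iloc[-2:]         # negative positions count from the end
```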

2.2.2 loc: label-based access

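wine_rev.loc[0, 'country']: the row selector can also be a list such as [0,1] to pick discrete rows; columns must be given by label, not by integer position.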

'Italy'

wine_rev.loc[ : 3, 'country']: unlike iloc, this includes row 3, because loc slices include the end label.

    0    Italy
    1    Portugal
    2    US
    3    US
    Name: country, dtype: object

wine_rev.loc[ 1 : 3, ['country','points']]: to select several columns, wrap their names in a list.

An advantage of loc: when the rows have a string index, a label slice such as df.loc['Apples':'Potatoes'] selects the whole range.
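To make the inclusive versus exclusive slicing concrete, here is a minimal sketch with a hypothetical string-indexed table (the labels and counts are made up for illustration):

```python
import pandas as pd

# Hypothetical stock table with string row labels.
stock = pd.DataFrame(
    {"count": [30, 21, 12, 8]},
    index=["Apples", "Bananas", "Cherries", "Potatoes"],
)

by_label = stock.loc["Apples":"Cherries"]  # end label included: 3 rows
by_position = stock.iloc[0:2]              # end position excluded: 2 rows
cell = stock.loc["Bananas", "count"]       # row label, then column label
```

The inclusive end is convenient for labels, since there is no natural "one past Cherries" value to write.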

2.3 set_index(): setting the index column

set_index() replaces the index with an existing column, for example wine_rev.set_index("title")
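One detail worth seeing in code is that set_index() returns a new DataFrame by default and leaves the original untouched. The sketch below uses hypothetical titles "Wine A" and "Wine B".

```python
import pandas as pd

# Hypothetical two-row table.
df = pd.DataFrame({"title": ["Wine A", "Wine B"], "points": [87, 90]})

by_title = df.set_index("title")           # new DataFrame; df is unchanged
points_b = by_title.loc["Wine B", "points"]  # rows are now addressed by title
```

To keep the result, assign it back to a name, as done with by_title here.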

2.4 Conditional selection

2.4.1 Boolean operators &, |, ==

wine_rev.country == 'US' compares every row's country with 'US' and produces a Series of True/False values, which can be passed to loc.

    0         False
    1         False
    2         True
    3         True
    4         True
              ...
    129966    False
    129967    True
    129968    False
    129969    False
    129970    False
    Name: country, Length: 129971, dtype: bool

wine_rev.loc[wine_rev.country == 'US'] selects every row whose country is US.

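wine_rev.loc[(wine_rev.country == 'US') & (wine_rev.points >= 90)] selects rows that are from the US and (&) scored 90 or above.
You can also use | for "or" (the same symbols as the bitwise operators in C++).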
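A short sketch of combining masks, on a hypothetical four-row table. Note the parentheses around each comparison: & and | bind more tightly than == and >= in Python, so leaving them out changes the meaning.

```python
import pandas as pd

# Hypothetical data for illustration.
df = pd.DataFrame({
    "country": ["Italy", "US", "US", "France"],
    "points": [87, 92, 85, 95],
})

is_us = df.country == "US"                      # boolean Series, one value per row
us_and_good = df.loc[is_us & (df.points >= 90)]  # both conditions must hold
us_or_good = df.loc[is_us | (df.points >= 90)]   # either condition may hold
```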

2.4.2 Pandas built-ins isin, isnull, notnull

wine_rev.loc[wine_rev.country.isin(['US','Italy'])] selects only the rows from the US or Italy.

wine_rev.loc[wine_rev.price.notnull()] selects rows whose price is present.
wine_rev.loc[wine_rev.price.isnull()] selects rows whose price is NaN.
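The three built-in selectors can be checked on a small hypothetical table in which one price is missing:

```python
import pandas as pd

# Hypothetical data; None becomes NaN in the numeric price column.
df = pd.DataFrame({
    "country": ["Italy", "US", "France"],
    "price": [None, 14.0, 30.0],
})

us_or_italy = df.loc[df.country.isin(["US", "Italy"])]  # membership test
priced = df.loc[df.price.notnull()]                     # price present
unpriced = df.loc[df.price.isnull()]                    # price missing (NaN)
```

isnull() and notnull() are exact complements, so every row lands in exactly one of the last two selections.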

2.5 Assigning data

2.5.1 Assigning a constant

wine_rev['critic'] = 'Michael' adds a new column in which every row holds the same value.
wine_rev.country = 'Ming' overwrites every value in the existing column.

2.5.2 Assigning an iterable of values

wine_rev['test_id'] = range(len(wine_rev),0,-1)
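Both kinds of assignment can be sketched on a hypothetical three-row table. The iterable must have the same length as the DataFrame.

```python
import pandas as pd

# Hypothetical data for illustration.
df = pd.DataFrame({"country": ["Italy", "US", "France"]})

df["critic"] = "Michael"               # constant is broadcast to every row
df["test_id"] = range(len(df), 0, -1)  # counts down: 3, 2, 1
df.country = "Ming"                    # overwrites the existing column
```

Attribute-style assignment only works for columns that already exist; creating a new column requires the [] form.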