
Python Data Analysis Trio: Pandas (2): The Index Object and Indexing Operations

Published: 2023/12/10  python  33  豆豆


Pandas series:

Python Data Analysis Trio: Pandas (1): Getting to Know Pandas and Its Series and DataFrame Objects
Python Data Analysis Trio: Pandas (2): The Index Object and Indexing Operations
Python Data Analysis Trio: Pandas (3): Arithmetic and Handling Missing Values
Python Data Analysis Trio: Pandas (4): Function Application, Mapping, Sorting, and Hierarchical Indexing
Python Data Analysis Trio: Pandas (5): Statistical Computation and Descriptive Statistics
Python Data Analysis Trio: Pandas (6): GroupBy: Splitting, Applying, and Combining Data
Python Data Analysis Trio: Pandas (7): Combining Datasets
Python Data Analysis Trio: Pandas (8): Reshaping, Handling Duplicates, and Replacing Data
Python Data Analysis Trio: Pandas (9): Time Series
Python Data Analysis Trio: Pandas (10): Reading and Writing Data

The NumPy and Matplotlib series are also complete. You are welcome to follow them:

NumPy series: https://itrhx.blog.csdn.net/category_9780393.html
Matplotlib series: https://itrhx.blog.csdn.net/category_9780418.html

Recommended learning materials and websites (the author helped translate parts of these documents):

NumPy official Chinese site: https://www.numpy.org.cn/
Pandas official Chinese site: https://www.pypandas.cn/
Matplotlib official Chinese site: https://www.matplotlib.org.cn/
NumPy, Matplotlib, and Pandas quick-reference tables: https://github.com/TRHX/Python-quick-reference-table

Table of Contents

【1】The Index Object
【2】General Indexing in Pandas
- 【2.1】Series Indexing
  - 【2.1.1】head() / tail()
  - 【2.1.2】Row Indexing
  - 【2.1.3】Slice Indexing
  - 【2.1.4】Fancy Indexing
  - 【2.1.5】Boolean Indexing
- 【2.2】DataFrame Indexing
  - 【2.2.1】head() / tail()
  - 【2.2.2】Column Indexing
  - 【2.2.3】Slice Indexing
  - 【2.2.4】Fancy Indexing
  - 【2.2.5】Boolean Indexing
【3】Indexers: loc and iloc
- 【3.1】loc: Label-Based Indexing
  - 【3.1.1】Series.loc
  - 【3.1.2】DataFrame.loc
- 【3.2】iloc: Position-Based Indexing
  - 【3.2.1】Series.iloc
  - 【3.2.2】DataFrame.iloc
【4】Reindexing in Pandas

This article was originally published on CSDN by TRHX. Blog home page: https://itrhx.blog.csdn.net/ Article link: https://itrhx.blog.csdn.net/article/details/106698307 Reproduction without permission is prohibited.

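【1】The Index Object

The indexes of Series and DataFrame objects are Index objects. To keep data safe, Index objects are immutable: trying to modify an element of an index raises an error. Common kinds of Index are the generic Index, the integer index (Int64Index), the hierarchical index (MultiIndex), and the timestamp index (DatetimeIndex). Int64Index was deprecated in pandas 1.4 and removed in pandas 2.0, where integer labels are held in a plain Index with dtype int64; the Int64Index and Float64Index reprs in the transcripts below come from pandas 1.x.

The following code demonstrates an Index object and its immutability: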

>>> import pandas as pd
>>> obj = pd.Series([1, 5, -8, 2], index=['a', 'b', 'c', 'd'])
>>> obj.index
Index(['a', 'b', 'c', 'd'], dtype='object')
>>> type(obj.index)
<class 'pandas.core.indexes.base.Index'>
>>> obj.index[0] = 'e'
Traceback (most recent call last):
  File "<pyshell#28>", line 1, in <module>
    obj.index[0] = 'e'
  File "C:\Users\...\base.py", line 3909, in __setitem__
    raise TypeError("Index does not support mutable operations")
TypeError: Index does not support mutable operations
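To see what immutability does and does not forbid, here is a minimal sketch, assuming pandas 1.x or 2.x (the variable names are illustrative). Assigning to one element of an Index raises TypeError. rename() builds a new object with a changed label instead, and the whole index can still be replaced by assigning a new sequence of the same length to obj.index.

```python
import pandas as pd

obj = pd.Series([1, 5, -8, 2], index=['a', 'b', 'c', 'd'])

# Assigning to a single element of an Index is rejected
try:
    obj.index[0] = 'e'
    mutation_failed = False
except TypeError:
    mutation_failed = True

# To "change" one label, build a new object with rename() instead
renamed = obj.rename(index={'a': 'e'})

# Replacing the whole index with a same-length sequence is allowed
obj.index = ['w', 'x', 'y', 'z']

print(mutation_failed)        # True
print(list(renamed.index))    # ['e', 'b', 'c', 'd']
print(list(obj.index))        # ['w', 'x', 'y', 'z']
```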

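Common attributes of Index objects

Official documentation: https://pandas.pydata.org/docs/reference/api/pandas.Index.html

Attribute	Description

T	Transpose (for a one-dimensional Index this is the Index itself)
array	The index data as an array (see the official documentation for details)
dtype	The dtype object of the underlying data
hasnans	Whether the index contains NaN (missing values)
inferred_type	A string describing the inferred type of the index values
is_monotonic	Whether the index is monotonically increasing (an alias of is_monotonic_increasing; deprecated in pandas 1.5 and removed in 2.0)
is_monotonic_decreasing	Whether the index is monotonically decreasing
is_monotonic_increasing	Whether the index is monotonically increasing
is_unique	Whether the index has no duplicate values
nbytes	The number of bytes in the underlying data
ndim	The number of dimensions of the index
nlevels	The number of levels
shape	A tuple giving the shape of the index
size	The number of elements in the index
values	The index values as an array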

>>> import pandas as pd
>>> obj = pd.Series([1, 5, -8, 2], index=['a', 'b', 'c', 'd'])
>>> obj.index
Index(['a', 'b', 'c', 'd'], dtype='object')
>>>
>>> obj.index.array
<PandasArray>
['a', 'b', 'c', 'd']
Length: 4, dtype: object
>>>
>>> obj.index.dtype
dtype('O')
>>>
>>> obj.index.hasnans
False
>>>
>>> obj.index.inferred_type
'string'
>>>
>>> obj.index.is_monotonic
True
>>>
>>> obj.index.is_monotonic_decreasing
False
>>>
>>> obj.index.is_monotonic_increasing
True
>>>
>>> obj.index.is_unique
True
>>>
>>> obj.index.nbytes
16
>>>
>>> obj.index.ndim
1
>>>
>>> obj.index.nlevels
1
>>>
>>> obj.index.shape
(4,)
>>>
>>> obj.index.size
4
>>>
>>> obj.index.values
array(['a', 'b', 'c', 'd'], dtype=object)
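The transcript above only shows a sorted, duplicate-free string index. The following sketch uses numeric indexes so that the monotonicity, uniqueness, and missing-value attributes also take their other values. It assumes pandas 1.x or 2.x and uses is_monotonic_increasing rather than is_monotonic, so it also runs on pandas 2.x.

```python
import numpy as np
import pandas as pd

# An unsorted integer index with a duplicate value
idx = pd.Index([3, 1, 2, 3])
print(idx.is_monotonic_increasing)  # False
print(idx.is_unique)                # False
print(idx.inferred_type)            # integer

# A float index containing a missing value
fidx = pd.Index([1.0, np.nan, 2.5])
print(fidx.hasnans)                 # True
print(fidx.dtype)                   # float64

# A sorted index with no duplicates
sidx = pd.Index([1, 2, 5])
print(sidx.is_monotonic_increasing, sidx.is_unique)  # True True
print(sidx.shape, sidx.size, sidx.ndim)              # (3,) 3 1
```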

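Common methods of Index objects

Official documentation: https://pandas.pydata.org/docs/reference/api/pandas.Index.html

Method	Description

all(self, *args, **kwargs)	Whether all elements are truthy; 0 is treated as False
any(self, *args, **kwargs)	Whether at least one element is truthy; an index of all 0s gives False
append(self, other)	Append another index, producing a new index
argmax(self[, axis, skipna])	Return the position of the maximum value
argmin(self[, axis, skipna])	Return the position of the minimum value
argsort(self, *args, **kwargs)	Return the positions (in the original index) that would sort the index in ascending order
delete(self, loc)	Delete the element at the given position and return the new index
difference(self, other[, sort])	Set difference: the elements of the first index that are not in the second
drop(self, labels[, errors])	Remove the given labels from the index
drop_duplicates(self[, keep])	Remove duplicate values; keep can be: 'first': keep the first occurrence; 'last': keep the last occurrence; False: drop every value that has duplicates
duplicated(self[, keep])	Mark duplicate values; keep can be: 'first': the first occurrence is False and the others are True; 'last': the last occurrence is False and the others are True; False: all duplicates are True
dropna(self[, how])	Remove missing values (NaN)
fillna(self[, value, downcast])	Fill missing values (NaN) with the given value
equals(self, other)	Whether two indexes are equal
insert(self, loc, item)	Insert an element at the given position and return a new index
intersection(self, other[, sort])	Return the intersection of two indexes
isna(self)	Detect missing values (NaN) element by element
isnull(self)	Alias of isna: detect missing values (NaN) element by element
max(self[, axis, skipna])	Return the maximum value
min(self[, axis, skipna])	Return the minimum value
union(self, other[, sort])	Return the union of two indexes
unique(self[, level])	Return the unique values, i.e. the index with duplicates removed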

all(self, *args, **kwargs)

>>> import pandas as pd
>>> pd.Index([1, 2, 3]).all()
True
>>>
>>> pd.Index([0, 1, 2]).all()
False

any(self, *args, **kwargs)

>>> import pandas as pd
>>> pd.Index([0, 0, 1]).any()
True
>>>
>>> pd.Index([0, 0, 0]).any()
False

append(self, other)

>>> import pandas as pd
>>> pd.Index(['a', 'b', 'c']).append(pd.Index([1, 2, 3]))
Index(['a', 'b', 'c', 1, 2, 3], dtype='object')

argmax(self[, axis, skipna])

>>> import pandas as pd
>>> pd.Index([5, 2, 3, 9, 1]).argmax()
3

argmin(self[, axis, skipna])

>>> import pandas as pd
>>> pd.Index([5, 2, 3, 9, 1]).argmin()
4

argsort(self, *args, **kwargs)

>>> import pandas as pd
>>> pd.Index([5, 2, 3, 9, 1]).argsort()
array([4, 1, 2, 0, 3], dtype=int32)

delete(self, loc)

>>> import pandas as pd
>>> pd.Index([5, 2, 3, 9, 1]).delete(0)
Int64Index([2, 3, 9, 1], dtype='int64')

difference(self, other[, sort])

>>> import pandas as pd
>>> idx1 = pd.Index([2, 1, 3, 4])
>>> idx2 = pd.Index([3, 4, 5, 6])
>>> idx1.difference(idx2)
Int64Index([1, 2], dtype='int64')
>>> idx1.difference(idx2, sort=False)
Int64Index([2, 1], dtype='int64')

drop(self, labels[, errors])

>>> import pandas as pd
>>> pd.Index([5, 2, 3, 9, 1]).drop([2, 1])
Int64Index([5, 3, 9], dtype='int64')

drop_duplicates(self[, keep])

>>> import pandas as pd
>>> idx = pd.Index(['lama', 'cow', 'lama', 'beetle', 'lama', 'hippo'])
>>> idx.drop_duplicates(keep='first')
Index(['lama', 'cow', 'beetle', 'hippo'], dtype='object')
>>> idx.drop_duplicates(keep='last')
Index(['cow', 'beetle', 'lama', 'hippo'], dtype='object')
>>> idx.drop_duplicates(keep=False)
Index(['cow', 'beetle', 'hippo'], dtype='object')

duplicated(self[, keep])

>>> import pandas as pd
>>> idx = pd.Index(['lama', 'cow', 'lama', 'beetle', 'lama'])
>>> idx.duplicated()
array([False, False,  True, False,  True])
>>> idx.duplicated(keep='first')
array([False, False,  True, False,  True])
>>> idx.duplicated(keep='last')
array([ True, False,  True, False, False])
>>> idx.duplicated(keep=False)
array([ True, False,  True, False,  True])
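drop_duplicates() and duplicated() apply the same rule in two different ways. As a sketch, assuming pandas 1.x or 2.x, keeping only the positions that duplicated() does not flag gives the same result as drop_duplicates() for every value of keep:

```python
import pandas as pd

idx = pd.Index(['lama', 'cow', 'lama', 'beetle', 'lama', 'hippo'])

for keep in ['first', 'last', False]:
    # Select the positions that duplicated() marks as False
    manual = idx[~idx.duplicated(keep=keep)]
    assert manual.equals(idx.drop_duplicates(keep=keep))
    print(keep, list(manual))
```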

dropna(self[, how])

>>> import numpy as np
>>> import pandas as pd
>>> pd.Index([2, 5, np.nan, 6, np.nan, np.nan]).dropna()
Float64Index([2.0, 5.0, 6.0], dtype='float64')

fillna(self[, value, downcast])

>>> import numpy as np
>>> import pandas as pd
>>> pd.Index([2, 5, np.nan, 6, np.nan, np.nan]).fillna(5)
Float64Index([2.0, 5.0, 5.0, 6.0, 5.0, 5.0], dtype='float64')

equals(self, other)

>>> import pandas as pd
>>> idx1 = pd.Index([5, 2, 3, 9, 1])
>>> idx2 = pd.Index([5, 2, 3, 9, 1])
>>> idx1.equals(idx2)
True
>>>
>>> idx1 = pd.Index([5, 2, 3, 9, 1])
>>> idx2 = pd.Index([5, 2, 4, 9, 1])
>>> idx1.equals(idx2)
False

intersection(self, other[, sort])

>>> import pandas as pd
>>> idx1 = pd.Index([1, 2, 3, 4])
>>> idx2 = pd.Index([3, 4, 5, 6])
>>> idx1.intersection(idx2)
Int64Index([3, 4], dtype='int64')

insert(self, loc, item)

>>> import pandas as pd
>>> pd.Index([5, 2, 3, 9, 1]).insert(2, 'A')
Index([5, 2, 'A', 3, 9, 1], dtype='object')

isna(self), isnull(self)

>>> import numpy as np
>>> import pandas as pd
>>> pd.Index([2, 5, np.nan, 6, np.nan, np.nan]).isna()
array([False, False,  True, False,  True,  True])
>>> pd.Index([2, 5, np.nan, 6, np.nan, np.nan]).isnull()
array([False, False,  True, False,  True,  True])

max(self[, axis, skipna]), min(self[, axis, skipna])

>>> import pandas as pd
>>> pd.Index([5, 2, 3, 9, 1]).max()
9
>>> pd.Index([5, 2, 3, 9, 1]).min()
1

union(self, other[, sort])

>>> import pandas as pd
>>> idx1 = pd.Index([1, 2, 3, 4])
>>> idx2 = pd.Index([3, 4, 5, 6])
>>> idx1.union(idx2)
Int64Index([1, 2, 3, 4, 5, 6], dtype='int64')
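The set methods fit together: the union of two indexes is their intersection plus the elements unique to each side. A small sketch, assuming pandas 1.x or 2.x and the same idx1 and idx2 as above, checks this:

```python
import pandas as pd

idx1 = pd.Index([1, 2, 3, 4])
idx2 = pd.Index([3, 4, 5, 6])

union = idx1.union(idx2)           # [1, 2, 3, 4, 5, 6]
common = idx1.intersection(idx2)   # [3, 4]
only1 = idx1.difference(idx2)      # [1, 2]
only2 = idx2.difference(idx1)      # [5, 6]

# Reassemble the union from its three disjoint pieces
pieces = common.append(only1).append(only2).sort_values()
print(pieces.equals(union))        # True
```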

unique(self[, level])

>>> import pandas as pd
>>> pd.Index([5, 1, 3, 5, 1]).unique()
Int64Index([5, 1, 3], dtype='int64')

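【2】General Indexing in Pandas

Because Pandas also provides more advanced indexing operations, such as reindexing and hierarchical indexing, the ordinary slice indexing, fancy indexing, and Boolean indexing are grouped together here as general indexing.

【2.1】Series Indexing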

【2.1.1】head() / tail()

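By default, Series.head() and Series.tail() return the first five and the last five rows. Pass an integer n to head() / tail() to get that many rows instead (the data below is random, so your values will differ):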

>>> import pandas as pd
>>> import numpy as np
>>> obj = pd.Series(np.random.randn(8))
>>> obj
0   -0.643437
1   -0.365652
2   -0.966554
3   -0.036127
4    1.046095
5   -2.048362
6   -1.865551
7    1.344728
dtype: float64
>>>
>>> obj.head()
0   -0.643437
1   -0.365652
2   -0.966554
3   -0.036127
4    1.046095
dtype: float64
>>>
>>> obj.head(3)
0   -0.643437
1   -0.365652
2   -0.966554
dtype: float64
>>>
>>> obj.tail()
3   -0.036127
4    1.046095
5   -2.048362
6   -1.865551
7    1.344728
dtype: float64
>>>
>>> obj.tail(3)
5   -2.048362
6   -1.865551
7    1.344728
dtype: float64
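Because the data above is random, the printed values change on every run. The following sketch assumes NumPy's Generator API (np.random.default_rng, available since NumPy 1.17) to make the data reproducible. It then checks that head(n) and tail(n) behave like positional slices, including for a negative n:

```python
import numpy as np
import pandas as pd

# A seeded generator produces the same numbers on every run
rng = np.random.default_rng(0)
obj = pd.Series(rng.standard_normal(8))

# head(n) / tail(n) are equivalent to positional slices
pd.testing.assert_series_equal(obj.head(3), obj.iloc[:3])
pd.testing.assert_series_equal(obj.tail(3), obj.iloc[-3:])

# A negative n drops rows from the other end: all but the last / first 2
print(len(obj.head(-2)), len(obj.tail(-2)))  # 6 6
```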

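【2.1.2】Row Indexing

You can select elements of a Series by position or by index label. A Series also supports Python dictionary-style expressions and methods, where the index labels play the role of dictionary keys:

>>> import pandas as pd
>>> obj = pd.Series([1, 5, -8, 2], index=['a', 'b', 'c', 'd'])
>>> obj
a    1
b    5
c   -8
d    2
dtype: int64
>>> obj['c']
-8
>>> obj[2]
-8
>>> 'b' in obj
True
>>> obj.keys()
Index(['a', 'b', 'c', 'd'], dtype='object')
>>> list(obj.items())
[('a', 1), ('b', 5), ('c', -8), ('d', 2)]

Note that 'b' in obj tests the index labels, not the values. In pandas 2.1 and later, using an integer such as obj[2] as a position on a Series whose labels are not integers emits a FutureWarning; obj.iloc[2] is the explicit positional form.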
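Here is a minimal sketch of the explicit lookups that avoid the label/position ambiguity of obj[2], together with the dictionary-style get(), which returns a default instead of raising for a missing label. It assumes pandas 1.x or 2.x; the default value 0 is arbitrary.

```python
import pandas as pd

obj = pd.Series([1, 5, -8, 2], index=['a', 'b', 'c', 'd'])

by_label = obj.loc['c']      # -8, looked up by label
by_position = obj.iloc[2]    # -8, looked up by position

# get() behaves like dict.get(): a default is returned for a missing label
missing = obj.get('z', 0)    # 0
present = obj.get('b', 0)    # 5

print(by_label, by_position, missing, present)
```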

【2.1.3】Slice Indexing

There are two ways to slice: by position and by index label. Note that a positional slice excludes the end position, whereas a label slice includes the end label.

>>> import pandas as pd
>>> obj = pd.Series([1, 5, -8, 2], index=['a', 'b', 'c', 'd'])
>>> obj
a    1
b    5
c   -8
d    2
dtype: int64
>>>
>>> obj[1:3]
b    5
c   -8
dtype: int64
>>>
>>> obj[0:3:2]
a    1
c   -8
dtype: int64
>>>
>>> obj['b':'d']
b    5
c   -8
d    2
dtype: int64
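The two end-point rules are easiest to see when the labels are integers that differ from the positions. This sketch uses the explicit loc and iloc indexers (covered in section 【3】) so that it is unambiguous which rule applies. It assumes pandas 1.x or 2.x, and the data is made up for illustration.

```python
import pandas as pd

# Labels 1..4 sit at positions 0..3
obj = pd.Series([10, 20, 30, 40], index=[1, 2, 3, 4])

by_label = obj.loc[1:3]      # labels 1, 2, 3: end label included
by_position = obj.iloc[1:3]  # positions 1, 2: end position excluded

print(list(by_label))        # [10, 20, 30]
print(list(by_position))     # [20, 30]
```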

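【2.1.4】Fancy Indexing

Fancy indexing selects non-contiguous elements: pass a list of index labels or positions to retrieve several elements at once.

>>> import pandas as pd
>>> obj = pd.Series([1, 5, -8, 2], index=['a', 'b', 'c', 'd'])
>>> obj
a    1
b    5
c   -8
d    2
dtype: int64
>>>
>>> obj[[0, 2]]
a    1
c   -8
dtype: int64
>>>
>>> obj[['a', 'c', 'd']]
a    1
c   -8
d    2
dtype: int64

In pandas 2.1 and later, passing a list of integer positions such as [0, 2] to [] on a Series with non-integer labels emits a FutureWarning; obj.iloc[[0, 2]] is the explicit form.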
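A sketch of the same fancy selections written with the explicit indexers, assuming pandas 1.0 or later. The result follows the order of the list and may repeat labels. Since pandas 1.0, a list containing a label that does not exist raises KeyError.

```python
import pandas as pd

obj = pd.Series([1, 5, -8, 2], index=['a', 'b', 'c', 'd'])

by_position = obj.iloc[[0, 2]]
by_label = obj.loc[['a', 'c']]
print(by_position.equals(by_label))     # True

# The result follows the order of the list, and labels may repeat
print(list(obj.loc[['d', 'a', 'a']]))   # [2, 1, 1]

# A missing label in the list raises KeyError
try:
    obj.loc[['a', 'z']]
    raised = False
except KeyError:
    raised = True
print(raised)                           # True
```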

【2.1.5】Boolean Indexing

You can also index with a Boolean array. A Boolean operation (such as a comparison) produces a Boolean Series, and indexing with it returns only the elements that satisfy the condition.

>>> import pandas as pd
>>> obj = pd.Series([1, 5, -8, 2, -3], index=['a', 'b', 'c', 'd', 'e'])
>>> obj
a    1
b    5
c   -8
d    2
e   -3
dtype: int64
>>>
>>> obj[obj > 0]
a    1
b    5
d    2
dtype: int64
>>>
>>> obj > 0
a     True
b     True
c    False
d     True
e    False
dtype: bool
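To combine several conditions, pandas uses the element-wise operators & (and), | (or), and ~ (not), and each comparison needs its own parentheses. Python's `and` / `or` raise a ValueError on a Series because its truth value is ambiguous. A minimal sketch, assuming pandas 1.x or 2.x:

```python
import pandas as pd

obj = pd.Series([1, 5, -8, 2, -3], index=['a', 'b', 'c', 'd', 'e'])

between = obj[(obj > 0) & (obj < 5)]    # a 1, d 2
extreme = obj[(obj > 4) | (obj < -5)]   # b 5, c -8
non_positive = obj[~(obj > 0)]          # c -8, e -3

print(list(between.index), list(extreme.index), list(non_positive.index))
```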

【2.2】DataFrame Indexing

【2.2.1】head() / tail()

As with Series, DataFrame.head() and DataFrame.tail() return the first five and the last five rows of a DataFrame. Pass an integer n to head() / tail() to get that many rows instead:

>>> import pandas as pd
>>> import numpy as np
>>> obj = pd.DataFrame(np.random.randn(8,4), columns = ['a', 'b', 'c', 'd'])
>>> obj
          a         b         c         d
0 -1.399390  0.521596 -0.869613  0.506621
1 -0.748562 -0.364952  0.188399 -1.402566
2  1.378776 -1.476480  0.361635  0.451134
3 -0.206405 -1.188609  3.002599  0.563650
4  0.993289  1.133748  1.177549 -2.562286
5 -0.482157  1.069293  1.143983 -1.303079
6 -1.199154  0.220360  0.801838 -0.104533
7 -1.359816 -2.092035  2.003530 -0.151812
>>>
>>> obj.head()
          a         b         c         d
0 -1.399390  0.521596 -0.869613  0.506621
1 -0.748562 -0.364952  0.188399 -1.402566
2  1.378776 -1.476480  0.361635  0.451134
3 -0.206405 -1.188609  3.002599  0.563650
4  0.993289  1.133748  1.177549 -2.562286
>>>
>>> obj.head(3)
          a         b         c         d
0 -1.399390  0.521596 -0.869613  0.506621
1 -0.748562 -0.364952  0.188399 -1.402566
2  1.378776 -1.476480  0.361635  0.451134
>>>
>>> obj.tail()
          a         b         c         d
3 -0.206405 -1.188609  3.002599  0.563650
4  0.993289  1.133748  1.177549 -2.562286
5 -0.482157  1.069293  1.143983 -1.303079
6 -1.199154  0.220360  0.801838 -0.104533
7 -1.359816 -2.092035  2.003530 -0.151812
>>>
>>> obj.tail(3)
          a         b         c         d
5 -0.482157  1.069293  1.143983 -1.303079
6 -1.199154  0.220360  0.801838 -0.104533
7 -1.359816 -2.092035  2.003530 -0.151812
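Some edge cases of DataFrame.head() / tail(), sketched with seeded random data (assuming pandas 1.x or 2.x and NumPy 1.17 or later): the result is always a DataFrame, even for a single row; n=0 gives an empty frame that keeps the columns; and a negative n drops rows from the other end.

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(0)
obj = pd.DataFrame(rng.standard_normal((8, 4)), columns=['a', 'b', 'c', 'd'])

first = obj.head(1)   # still a DataFrame, not a Series
empty = obj.tail(0)   # no rows, but the columns are kept

print(type(first).__name__, first.shape)   # DataFrame (1, 4)
print(empty.shape, list(empty.columns))    # (0, 4) ['a', 'b', 'c', 'd']
print(obj.head(-5).shape)                  # (3, 4)
```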

【2.2.2】Column Indexing

You can select DataFrame columns by their column labels (columns). A single label returns a Series, while a list of labels, even a one-element list, returns a DataFrame:

>>> import pandas as pd
>>> import numpy as np
>>> obj = pd.DataFrame(np.random.randn(7,2), columns = ['a', 'b'])
>>> obj
          a         b
0 -1.198795  0.928378
1 -2.878230  0.014650
2  2.267475  0.370952
3  0.639340 -1.301041
4 -1.953444  0.148934
5 -0.445225  0.459632
6  0.097109 -2.592833
>>>
>>> obj['a']
0   -1.198795
1   -2.878230
2    2.267475
3    0.639340
4   -1.953444
5   -0.445225
6    0.097109
Name: a, dtype: float64
>>>
>>> obj[['a']]
          a
0 -1.198795
1 -2.878230
2  2.267475
3  0.639340
4 -1.953444
5 -0.445225
6  0.097109
>>>
>>> type(obj['a'])
<class 'pandas.core.series.Series'>
>>> type(obj[['a']])
<class 'pandas.core.frame.DataFrame'>
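Here are two more column-selection details, sketched with seeded random data (assuming pandas 1.x or 2.x). Attribute access such as obj.a returns the same Series as obj['a'], but only when the column name is a valid Python identifier that does not clash with a DataFrame method or attribute. A list of labels returns the columns in the order given.

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(0)
obj = pd.DataFrame(rng.standard_normal((7, 2)), columns=['a', 'b'])

# Attribute access; would fail for names like 'count' or 'my col'
print(obj.a.equals(obj['a']))   # True

# A list of labels returns a DataFrame with columns in the listed order
sub = obj[['b', 'a']]
print(list(sub.columns))        # ['b', 'a']
```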

【2.2.3】Slice Indexing

In a DataFrame, a slice inside [] operates on rows. There are two ways to slice: by position and by index label. A positional slice excludes the end position, whereas a label slice includes the end label.

>>> import pandas as pd
>>> import numpy as np
>>> data = np.random.randn(5,4)
>>> index = ['I1', 'I2', 'I3', 'I4', 'I5']
>>> columns = ['a', 'b', 'c', 'd']
>>> obj = pd.DataFrame(data, index, columns)
>>> obj
           a         b         c         d
I1  0.828676 -1.663337  1.753632  1.432487
I2  0.368138  0.222166  0.902764 -1.436186
I3  2.285615 -2.415175 -1.344456 -0.502214
I4  3.224288 -0.500268  1.293596 -1.235549
I5 -0.938833 -0.804433 -0.170047 -0.566766
>>>
>>> obj[0:3]
           a         b         c         d
I1  0.828676 -1.663337  1.753632  1.432487
I2  0.368138  0.222166  0.902764 -1.436186
I3  2.285615 -2.415175 -1.344456 -0.502214
>>>
>>> obj[0:4:2]
           a         b         c         d
I1  0.828676 -1.663337  1.753632  1.432487
I3  2.285615 -2.415175 -1.344456 -0.502214
>>>
>>> obj['I2':'I4']
           a         b         c         d
I2  0.368138  0.222166  0.902764 -1.436186
I3  2.285615 -2.415175 -1.344456 -0.502214
I4  3.224288 -0.500268  1.293596 -1.235549
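The two kinds of row slices inside [] correspond to the explicit indexers of section 【3】: a positional slice matches iloc and a label slice matches loc. A sketch with seeded data (assuming pandas 1.x or 2.x and NumPy 1.17 or later) checks this:

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(0)
obj = pd.DataFrame(rng.standard_normal((5, 4)),
                   index=['I1', 'I2', 'I3', 'I4', 'I5'],
                   columns=['a', 'b', 'c', 'd'])

# Positional row slice: same as iloc, end excluded
pd.testing.assert_frame_equal(obj[0:3], obj.iloc[0:3])
# Label row slice: same as loc, end included
pd.testing.assert_frame_equal(obj['I2':'I4'], obj.loc['I2':'I4'])

print(list(obj[0:3].index))        # ['I1', 'I2', 'I3']
print(list(obj['I2':'I4'].index))  # ['I2', 'I3', 'I4']
```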

【2.2.4】Fancy Indexing

As with Series, fancy indexing selects non-contiguous items. For a DataFrame, passing a list of column labels (columns) retrieves several columns at once:

>>> import pandas as pd
>>> import numpy as np
>>> data = np.random.randn(5,4)
>>> index = ['I1', 'I2', 'I3', 'I4', 'I5']
>>> columns = ['a', 'b', 'c', 'd']
>>> obj = pd.DataFrame(data, index, columns)
>>> obj
           a         b         c         d
I1 -1.083223 -0.182874 -0.348460 -1.572120
I2 -0.205206 -0.251931  1.180131  0.847720
I3 -0.980379  0.325553 -0.847566 -0.882343
I4 -0.638228 -0.282882 -0.624997 -0.245980
I5 -0.229769  1.002930 -0.226715 -0.916591
>>>
>>> obj[['a', 'd']]
           a         d
I1 -1.083223 -1.572120
I2 -0.205206  0.847720
I3 -0.980379 -0.882343
I4 -0.638228 -0.245980
I5 -0.229769 -0.916591
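A list inside [] always selects columns. The same selection can be written with the explicit indexers, which can also pick rows and columns together. A sketch assuming pandas 1.x or 2.x, with seeded illustrative data:

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(0)
obj = pd.DataFrame(rng.standard_normal((5, 4)),
                   index=['I1', 'I2', 'I3', 'I4', 'I5'],
                   columns=['a', 'b', 'c', 'd'])

cols = obj[['a', 'd']]
pd.testing.assert_frame_equal(cols, obj.loc[:, ['a', 'd']])  # by label
pd.testing.assert_frame_equal(cols, obj.iloc[:, [0, 3]])     # by position

# Rows and columns together need an indexer such as loc
corner = obj.loc[['I1', 'I5'], ['a', 'd']]
print(corner.shape)  # (2, 2)
```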

【2.2.5】Boolean Indexing

A DataFrame can also be indexed with a Boolean DataFrame produced by a Boolean operation (such as a comparison). Unlike the Series case, this does not drop anything. The result keeps the original shape, and every position where the condition is False becomes NaN.

>>> import pandas as pd
>>> import numpy as np
>>> data = np.random.randn(5,4)
>>> index = ['I1', 'I2', 'I3', 'I4', 'I5']
>>> columns = ['a', 'b', 'c', 'd']
>>> obj = pd.DataFrame(data, index, columns)
>>> obj
           a         b         c         d
I1 -0.602984 -0.135716  0.999689 -0.339786
I2  0.911130 -0.092485 -0.914074 -0.279588
I3  0.849606 -0.420055 -1.240389 -0.179297
I4  0.249986 -1.250668  0.329416 -1.105774
I5 -0.743816  0.430647 -0.058126 -0.337319
>>>
>>> obj[obj > 0]
           a         b         c    d
I1       NaN       NaN  0.999689  NaN
I2  0.911130       NaN       NaN  NaN
I3  0.849606       NaN       NaN  NaN
I4  0.249986       NaN  0.329416  NaN
I5       NaN  0.430647       NaN  NaN
>>>
>>> obj > 0
        a      b      c      d
I1  False  False   True  False
I2   True  False  False  False
I3   True  False  False  False
I4   True  False   True  False
I5  False   True  False  False
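To filter whole rows rather than mask cells, index with a Boolean Series computed from one column. The sketch below (assuming pandas 1.x or 2.x, with seeded illustrative data) contrasts the two forms. It also shows DataFrame.where(), the method form of masking, whose second argument replaces the masked cells instead of NaN.

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(0)
obj = pd.DataFrame(rng.standard_normal((5, 4)),
                   index=['I1', 'I2', 'I3', 'I4', 'I5'],
                   columns=['a', 'b', 'c', 'd'])

# Boolean DataFrame: same shape, False cells become NaN
masked = obj[obj > 0]
print(masked.shape == obj.shape)    # True
print(masked.equals(obj.where(obj > 0)))  # True

# Boolean Series aligned to the rows: keeps only rows where column a > 0
rows = obj[obj['a'] > 0]
print((rows['a'] > 0).all())        # True

# where() with a replacement value instead of NaN
clipped = obj.where(obj > 0, 0.0)
print((clipped >= 0).all().all())   # True
```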

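【3】Indexers: loc and iloc

loc indexes by label and iloc indexes by position. Note: before pandas 1.0.0 there was also an ix indexer, which accepted either labels or positions. It was deprecated in pandas 0.20.0 and removed in pandas 1.0.0.

【3.1】loc: Label-Based Indexing

loc selects data by label, that is, by the values of the index and the columns.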

【3.1.1】Series.loc

A Series accepts:

A single label, for example 5 or 'a' (note that 5 is interpreted as an index label, not as a position);
A list or array of labels, for example ['a', 'b', 'c'];
A slice object with labels, for example 'a':'f'.

Official documentation: https://pandas.pydata.org/docs/reference/api/pandas.Series.loc.html

>>> import pandas as pd
>>> obj = pd.Series([1, 5, -8, 2], index=['a', 'b', 'c', 'd'])
>>> obj
a    1
b    5
c   -8
d    2
dtype: int64
>>>
>>> obj.loc['a']
1
>>>
>>> obj.loc['a':'c']
a    1
b    5
c   -8
dtype: int64
>>>
>>> obj.loc[['a', 'd']]
a    1
d    2
dtype: int64
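Besides labels, lists, and slices, Series.loc also accepts a Boolean Series aligned on the index and a callable that receives the Series. loc can also assign by label. A minimal sketch, assuming pandas 1.x or 2.x:

```python
import pandas as pd

obj = pd.Series([1, 5, -8, 2], index=['a', 'b', 'c', 'd'])

positive = obj.loc[obj > 0]                # Boolean Series
also_positive = obj.loc[lambda s: s > 0]   # callable: s is obj itself
print(list(positive.index))                # ['a', 'b', 'd']
print(positive.equals(also_positive))      # True

# Assignment by label
obj.loc['c'] = 0
print(obj['c'])                            # 0
```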

【3.1.2】DataFrame.loc

In a DataFrame, the first argument selects rows and the second selects columns. The accepted input formats are much the same as for Series.

Official documentation: https://pandas.pydata.org/docs/reference/api/pandas.DataFrame.loc.html

>>> import pandas as pd
>>> obj = pd.DataFrame([[1, 2, 3], [4, 5, 6], [7, 8, 9]], index=['a', 'b', 'c'], columns=['A', 'B', 'C'])
>>> obj
   A  B  C
a  1  2  3
b  4  5  6
c  7  8  9
>>>
>>> obj.loc['a']
A    1
B    2
C    3
Name: a, dtype: int64
>>>
>>> obj.loc['a':'c']
   A  B  C
a  1  2  3
b  4  5  6
c  7  8  9
>>>
>>> obj.loc[['a', 'c']]
   A  B  C
a  1  2  3
c  7  8  9
>>>
>>> obj.loc['b', 'B']
5
>>> obj.loc['b', 'A':'C']
A    4
B    5
C    6
Name: b, dtype: int64
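A few more DataFrame.loc combinations, sketched under the assumption of pandas 1.x or 2.x. A bare ':' keeps every row, a label slice and a label list can be mixed, a Boolean row filter can be combined with a column label, and loc can assign a cell by its labels. The assignment is done on a copy so that the other results stay unchanged.

```python
import pandas as pd

obj = pd.DataFrame([[1, 2, 3], [4, 5, 6], [7, 8, 9]],
                   index=['a', 'b', 'c'], columns=['A', 'B', 'C'])

col_b = obj.loc[:, 'B']                 # every row of column B -> Series
block = obj.loc['a':'b', ['A', 'C']]    # row slice plus column list
big_c = obj.loc[obj['A'] > 3, 'C']      # Boolean row filter plus one column

print(list(col_b))                      # [2, 5, 8]
print(block.values.tolist())            # [[1, 3], [4, 6]]
print(list(big_c))                      # [6, 9]

# Assign a single cell by labels on a copy; obj itself is untouched
edited = obj.copy()
edited.loc['b', 'B'] = 50
print(edited.loc['b', 'B'], obj.loc['b', 'B'])  # 50 5
```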

【3.2】iloc: Position-Based Indexing

iloc works like loc but selects by integer position, that is, by the positions of the rows (index) and the columns (columns).

【3.2.1】Series.iloc

Official documentation: https://pandas.pydata.org/docs/reference/api/pandas.Series.iloc.html

A Series accepts:

An integer, for example 5;
A list or array of integers, for example [4, 3, 0];
A slice object with integers, for example 1:7.

>>> import pandas as pd
>>> obj = pd.Series([1, 5, -8, 2], index=['a', 'b', 'c', 'd'])
>>> obj
a    1
b    5
c   -8
d    2
dtype: int64
>>>
>>> obj.iloc[1]
5
>>>
>>> obj.iloc[0:2]
a    1
b    5
dtype: int64
>>>
>>> obj.iloc[[0, 1, 3]]
a    1
b    5
d    2
dtype: int64
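As with Python lists, iloc accepts negative positions and slice steps. Out-of-range slice bounds are clipped silently, while an out-of-range scalar position raises IndexError. A minimal sketch, assuming pandas 1.x or 2.x:

```python
import pandas as pd

obj = pd.Series([1, 5, -8, 2], index=['a', 'b', 'c', 'd'])

last = obj.iloc[-1]             # 2: negative positions count from the end
reversed_obj = obj.iloc[::-1]   # a step of -1 reverses the order
print(last, list(reversed_obj.index))  # 2 ['d', 'c', 'b', 'a']

print(len(obj.iloc[2:100]))     # 2: the slice end is clipped

try:
    obj.iloc[10]
    raised = False
except IndexError:
    raised = True
print(raised)                   # True
```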

【3.2.2】DataFrame.iloc

Official documentation: https://pandas.pydata.org/docs/reference/api/pandas.DataFrame.iloc.html

In a DataFrame, the first argument selects rows and the second selects columns. The accepted input formats are much the same as for Series:

>>> import pandas as pd
>>> obj = pd.DataFrame([[1, 2, 3], [4, 5, 6], [7, 8, 9]], index=['a', 'b', 'c'], columns=['A', 'B', 'C'])
>>> obj
   A  B  C
a  1  2  3
b  4  5  6
c  7  8  9
>>>
>>> obj.iloc[1]
A    4
B    5
C    6
Name: b, dtype: int64
>>>
>>> obj.iloc[0:2]
   A  B  C
a  1  2  3
b  4  5  6
>>>
>>> obj.iloc[[0, 2]]
   A  B  C
a  1  2  3
c  7  8  9
>>>
>>> obj.iloc[1, 2]
6
>>>
>>> obj.iloc[1, 0:2]
A    4
B    5
Name: b, dtype: int64
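A sketch of further DataFrame.iloc combinations, assuming pandas 1.x or 2.x: lists of positions on both axes, a bare ':' for every row, and a negative row position.

```python
import pandas as pd

obj = pd.DataFrame([[1, 2, 3], [4, 5, 6], [7, 8, 9]],
                   index=['a', 'b', 'c'], columns=['A', 'B', 'C'])

corners = obj.iloc[[0, 2], [0, 2]]   # rows 0 and 2, columns 0 and 2
second_col = obj.iloc[:, 1]          # every row of the second column
last_row = obj.iloc[-1]              # the last row as a Series

print(corners.values.tolist())  # [[1, 3], [7, 9]]
print(list(second_col))         # [2, 5, 8]
print(list(last_row))           # [7, 8, 9]
```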

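【4】Reindexing in Pandas

An important method of Pandas objects is reindex. It creates a new object whose data conforms to a new index. Taking DataFrame.reindex as an example (Series.reindex is similar), the basic syntax is: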

DataFrame.reindex(self, labels=None, index=None, columns=None, axis=None, method=None, copy=True, level=None, fill_value=nan, limit=None, tolerance=None)

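Some of the parameters are described below (see the official documentation for the full list):

Parameter	Description

index	The new sequence to use as the index; either an Index instance or any other sequence-like Python data structure
method	How to fill labels that are missing from the original index (only applicable when the index is monotonically increasing or decreasing): None: do not fill gaps; pad / ffill: propagate the last valid observation forward to the next valid one; backfill / bfill: use the next valid observation to fill the gap; nearest: use the nearest valid observation to fill the gap
fill_value	The value used for missing values introduced by reindexing
limit	The maximum number of consecutive elements to fill when forward- or backward-filling
tolerance	The maximum distance (absolute difference) between original and new labels for inexact matches when filling
level	Broadcast across a level, matching Index values on the given level of a MultiIndex
copy	Defaults to True, which always copies; if False, no copy is made when the new index is equivalent to the old one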

reindex rearranges the data according to the new index. If an index value does not exist yet, a missing value is introduced:

>>> import pandas as pd
>>> obj = pd.Series([4.5, 7.2, -5.3, 3.6], index=['d', 'b', 'a', 'c'])
>>> obj
d    4.5
b    7.2
a   -5.3
c    3.6
dtype: float64
>>>
>>> obj2 = obj.reindex(['a', 'b', 'c', 'd', 'e'])
>>> obj2
a   -5.3
b    7.2
c    3.6
d    4.5
e    NaN
dtype: float64
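The fill_value parameter from the table above replaces the NaN introduced for new labels. reindex returns a new object and leaves the original unchanged. A minimal sketch, assuming pandas 1.x or 2.x; the fill value 0 is arbitrary.

```python
import pandas as pd

obj = pd.Series([4.5, 7.2, -5.3, 3.6], index=['d', 'b', 'a', 'c'])

# The new label 'e' gets 0 instead of NaN
obj2 = obj.reindex(['a', 'b', 'c', 'd', 'e'], fill_value=0)
print(obj2['e'])          # 0.0

# The original keeps its own order
print(list(obj.index))    # ['d', 'b', 'a', 'c']
```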

For ordered data such as time series, some interpolation may be needed when reindexing. The method option handles this. For example, ffill fills values forward:

>>> import pandas as pd
>>> obj = pd.Series(['blue', 'purple', 'yellow'], index=[0, 2, 4])
>>> obj
0      blue
2    purple
4    yellow
dtype: object
>>>
>>> obj2 = obj.reindex(range(6), method='ffill')
>>> obj2
0      blue
1      blue
2    purple
3    purple
4    yellow
5    yellow
dtype: object
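The backward fill and the limit parameter from the table can be sketched the same way, assuming pandas 1.x or 2.x; the second Series is made up for illustration. With bfill, label 5 has no later observation, so it stays NaN. With limit=1, at most one consecutive new label is filled from each old label.

```python
import pandas as pd

obj = pd.Series(['blue', 'purple', 'yellow'], index=[0, 2, 4])

# Take the next valid value; nothing follows label 4, so label 5 is NaN
back = obj.reindex(range(6), method='bfill')
print(back.tolist())     # ['blue', 'purple', 'purple', 'yellow', 'yellow', nan]

# Forward-fill at most one consecutive new label
sparse = pd.Series(['blue', 'purple'], index=[0, 3])
limited = sparse.reindex(range(5), method='ffill', limit=1)
print(limited.tolist())  # ['blue', 'blue', nan, 'purple', 'purple']
```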

With a DataFrame, reindex can change the (row) index, the columns, or both. When only a sequence is passed, the rows of the result are reindexed:

>>> import pandas as pd
>>> import numpy as np
>>> obj = pd.DataFrame(np.arange(9).reshape((3, 3)), index=['a', 'c', 'd'], columns=['Ohio', 'Texas', 'California'])
>>> obj
   Ohio  Texas  California
a     0      1           2
c     3      4           5
d     6      7           8
>>>
>>> obj2 = obj.reindex(['a', 'b', 'c', 'd'])
>>> obj2
   Ohio  Texas  California
a   0.0    1.0         2.0
b   NaN    NaN         NaN
c   3.0    4.0         5.0
d   6.0    7.0         8.0

The columns can be reindexed with the columns keyword:

>>> import pandas as pd
>>> import numpy as np
>>> obj = pd.DataFrame(np.arange(9).reshape((3, 3)), index=['a', 'c', 'd'], columns=['Ohio', 'Texas', 'California'])
>>> obj
   Ohio  Texas  California
a     0      1           2
c     3      4           5
d     6      7           8
>>>
>>> states = ['Texas', 'Utah', 'California']
>>> obj.reindex(columns=states)
   Texas  Utah  California
a      1   NaN           2
c      4   NaN           5
d      7   NaN           8
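Rows and columns can also be reindexed in a single call, and axis='columns' is an equivalent way to target the columns. When every label already exists, loc gives the same label-based selection. A sketch, assuming pandas 1.x or 2.x:

```python
import numpy as np
import pandas as pd

obj = pd.DataFrame(np.arange(9).reshape((3, 3)),
                   index=['a', 'c', 'd'],
                   columns=['Ohio', 'Texas', 'California'])
states = ['Texas', 'Utah', 'California']

# Both axes at once: new row 'b' and new column 'Utah' are all NaN
both = obj.reindex(index=['a', 'b', 'c', 'd'], columns=states)
print(both.shape)  # (4, 3)

# axis='columns' targets the columns, like the columns= keyword
pd.testing.assert_frame_equal(obj.reindex(states, axis='columns'),
                              obj.reindex(columns=states))

# Existing labels only: loc selects them (a missing label would raise KeyError)
subset = obj.loc[['a', 'd'], ['Texas', 'California']]
print(subset.values.tolist())  # [[1, 2], [7, 8]]
```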
