
The Three Musketeers of Python Data Analysis, Pandas (Part 1): Getting to Know Pandas and Its Series and DataFrame Objects

Published: 2023/12/10 · python · 44 · 豆豆

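This article, collected and curated by 生活随笔, introduces The Three Musketeers of Python Data Analysis, Pandas (Part 1): Getting to Know Pandas and Its Series and DataFrame Objects. The editor found it quite good and is sharing it here for your reference.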


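Pandas series:

The Three Musketeers of Python Data Analysis, Pandas (Part 1): Getting to Know Pandas and Its Series and DataFrame Objects
The Three Musketeers of Python Data Analysis, Pandas (Part 2): Index Objects and Indexing Operations
The Three Musketeers of Python Data Analysis, Pandas (Part 3): Arithmetic Operations and Handling Missing Values
The Three Musketeers of Python Data Analysis, Pandas (Part 4): Function Application, Mapping, Sorting, and Hierarchical Indexing
The Three Musketeers of Python Data Analysis, Pandas (Part 5): Statistical Computation and Descriptive Statistics
The Three Musketeers of Python Data Analysis, Pandas (Part 6): GroupBy Splitting, Applying, and Combining
The Three Musketeers of Python Data Analysis, Pandas (Part 7): Merging Datasets
The Three Musketeers of Python Data Analysis, Pandas (Part 8): Reshaping, Handling Duplicates, and Replacing Data
The Three Musketeers of Python Data Analysis, Pandas (Part 9): Time Series
The Three Musketeers of Python Data Analysis, Pandas (Part 10): Reading and Writing Data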

The NumPy and Matplotlib series are also complete; feel free to follow them:

NumPy series: https://itrhx.blog.csdn.net/category_9780393.html
Matplotlib series: https://itrhx.blog.csdn.net/category_9780418.html

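Recommended learning materials and websites (the author helped translate parts of the documentation):

NumPy Chinese-language site: https://www.numpy.org.cn/
Pandas Chinese-language site: https://www.pypandas.cn/
Matplotlib Chinese-language site: https://www.matplotlib.org.cn/
NumPy, Matplotlib, and Pandas cheat sheets: https://github.com/TRHX/Python-quick-reference-table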

Table of Contents

- 【01x00】Understanding Pandas
- 【02x00】Pandas data structures
- 【03x00】The Series object
  - 【03x01】Building a Series from a list
  - 【03x02】Building a Series from a dict
  - 【03x03】Getting its data and index
  - 【03x04】Accessing data by index
  - 【03x05】Operations with functions
  - 【03x06】The name attribute
- 【04x00】The DataFrame object
  - 【04x01】Building a DataFrame from an ndarray
  - 【04x02】Building a DataFrame from a dict
  - 【04x03】Getting its data and index
  - 【04x04】Accessing data by index
  - 【04x05】Modifying column values
  - 【04x06】Adding / deleting columns
  - 【04x07】The name attribute

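This is an anti-scraping notice; readers may ignore it. This article was first published on CSDN; author: TRHX. Blog homepage: https://itrhx.blog.csdn.net/ Article link: https://itrhx.blog.csdn.net/article/details/106676693 Reproduction without authorization is prohibited; unauthorized reposting will be held accountable. Respect original work and stay away from plagiarism!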

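【01x00】Understanding Pandas

Pandas is a Python data analysis package built on top of NumPy. It was originally developed at AQR Capital Management starting in April 2008 and was open-sourced at the end of 2009. It is now developed and maintained by an open-source community as part of the PyData ecosystem.

Pandas was first developed as a tool for financial data analysis, which is why it offers strong support for time series analysis. Its name is derived from "panel data" and "Python data analysis". Panel data is an econometrics term for multidimensional datasets. Early versions of Pandas also provided a Panel data type, but it was deprecated in 0.20.0 and removed in 0.25.0.

Pandas is often used together with other tools, such as the numerical computing libraries NumPy and SciPy, the analysis libraries statsmodels and scikit-learn, and the visualization library Matplotlib. Although Pandas adopts much of NumPy's coding style, the biggest difference between the two is that Pandas is designed for working with tabular or heterogeneous data, whereas NumPy is best suited to homogeneous numerical array data.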
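To make the homogeneous versus heterogeneous contrast concrete, here is a small sketch (the column names n, s and f are made up for illustration): a NumPy array coerces mixed values to one common dtype, while a DataFrame keeps a separate dtype per column.

```python
import numpy as np
import pandas as pd

# A NumPy array holds a single dtype: mixing ints, strings and floats
# forces every element to a common Unicode string type.
arr = np.array([1, "a", 2.5])
print(arr.dtype.kind)  # 'U' means Unicode string

# A DataFrame stores each column with its own dtype.
df = pd.DataFrame({"n": [1, 2], "s": ["a", "b"], "f": [1.5, 2.5]})
print(df.dtypes)
```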

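【The following explanation of Pandas is translated from the official documentation: https://pandas.pydata.org/docs/getting_started/overview.html#package-overview】

Pandas is Python's core data analysis library. It provides fast, flexible, and expressive data structures designed to make working with relational or labeled data both easy and intuitive. It aims to be the fundamental high-level building block for practical, real-world data analysis in Python. Beyond that, its broader goal is to become the most powerful and flexible open-source data analysis and manipulation tool available in any language. After years of steady work, it is already well on its way toward this goal.

Pandas is well suited to the following kinds of data:

Tabular data with heterogeneously typed columns, as in an SQL table or an Excel spreadsheet;
Ordered and unordered (not necessarily fixed-frequency) time series data;
Arbitrary matrix data with row and column labels, whether homogeneously typed or heterogeneous;
Any other form of observational or statistical dataset; the data need not be labeled at all before being placed into a Pandas data structure.

The two primary data structures in Pandas, Series (one-dimensional) and DataFrame (two-dimensional), handle the vast majority of typical use cases in finance, statistics, social science, and many areas of engineering. For R users, DataFrame provides everything R's data.frame provides and much more. Pandas is built on top of NumPy and integrates well with other third-party scientific computing libraries.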

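Pandas is like a Swiss Army knife; here are just a few of the things it does well:

Easy handling of missing data, represented as NaN, in floating-point as well as non-floating-point data;
Size mutability: columns can be inserted into and deleted from DataFrame and higher-dimensional objects;
Automatic and explicit data alignment: objects can be explicitly aligned to a set of labels, or the labels can be ignored and Series, DataFrame, etc. will automatically align the data in computations;
Powerful, flexible group by functionality to perform split-apply-combine operations on datasets, for both aggregating and transforming data;
Easy conversion of ragged, differently indexed data in other Python and NumPy data structures into DataFrame objects;
Intelligent label-based slicing, fancy indexing, and subsetting of large datasets;
Intuitive merging and joining of datasets;
Flexible reshaping and pivoting of datasets;
Hierarchical labeling of axes (possibly multiple labels per tick);
Robust IO tools for loading data from flat files (CSV and other delimited files), Excel files, and databases, and for saving and loading data in the ultrafast HDF5 format;
Time series functionality: date range generation, frequency conversion, moving window statistics, date shifting, and lagging.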
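As a small illustration of the automatic data alignment mentioned in the list above (the labels and values here are made up), arithmetic between two Series matches values by label rather than by position, and a label present in only one operand yields NaN:

```python
import pandas as pd

s1 = pd.Series([1, 2, 3], index=["a", "b", "c"])
s2 = pd.Series([10, 20, 30], index=["b", "c", "d"])

# Values are added label by label. 'a' and 'd' exist in only one Series,
# so their results are NaN, which also makes the result dtype float64.
total = s1 + s2
print(total)
```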

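These features were designed to address the pain points of other programming languages and research environments. Working with data typically involves several stages: munging and cleaning the data, analyzing and modeling it, and then presenting the results as visualizations or tables. Pandas is an ideal tool for all of these tasks.

Other notes:

Pandas is fast. Many of its low-level algorithms have been optimized in Cython. However, generality inevitably costs some performance, so a tool built for one specific task can be faster than Pandas.
Pandas is a dependency of statsmodels, which makes it an important part of Python's statistical computing ecosystem.
Pandas has been used extensively in the financial industry.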

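【02x00】Pandas data structures

The primary data structures in Pandas are Series (a labeled, homogeneously typed one-dimensional array) and DataFrame (a labeled, size-mutable, two-dimensional table whose columns may have different types).

Pandas data structures are best thought of as containers for lower-dimensional data: a DataFrame is a container for Series, and a Series is a container for scalars. This lets you insert and remove objects from these containers in a dictionary-like fashion.

In addition, the default behavior of common API functions takes into account the typical orientation of time series and cross-sectional datasets. When two- or three-dimensional data is stored in an ndarray, the user has to keep track of the orientation of the dataset when writing functions, which is a burden; apart from the performance effects of C or Fortran contiguity, the axes are essentially interchangeable in a program. In Pandas, axes exist mainly to give the data more intuitive meaning, that is, to express the orientation of a dataset in a more natural way. This reduces the mental effort needed to write data transformation functions.

For tabular data such as a DataFrame, thinking in terms of index (rows) and columns is more intuitive than NumPy's axis 0 and axis 1. Iterating over the columns of a DataFrame this way makes the code easier to read: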

for col in df.columns:
    series = df[col]
    # do something with series
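The same idea carries over to reductions: many methods accept the axis name instead of its number. A small sketch with a made-up two-column DataFrame shows that axis="index" behaves like axis=0 (aggregate down the rows, one result per column) and axis="columns" like axis=1:

```python
import pandas as pd

df = pd.DataFrame({"x": [1, 2, 3], "y": [10, 20, 30]})

# axis="index" (same as axis=0): aggregate down the rows, one value per column
col_sums = df.sum(axis="index")
# axis="columns" (same as axis=1): aggregate across the columns, one value per row
row_sums = df.sum(axis="columns")

# Column-wise iteration by label, in the same style as the loop above
for col in df.columns:
    series = df[col]
    print(col, series.max())
```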

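【03x00】The Series object

A Series is a labeled one-dimensional array that can hold integers, floats, strings, Python objects, and other types of data. The axis labels are collectively called the index. Call pandas.Series to create a Series; the basic syntax is: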

pandas.Series(data=None, index=None, dtype=None, name=None, copy=False, fastpath=False)

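Parameter	Description

data	Array-like, iterable, dict, or scalar value; the data to store in the Series
index	Array-like or Index (the data labels). Values must be hashable and have the same length as data; non-unique index values are allowed. If not provided, defaults to RangeIndex (0, 1, 2, …, n-1)
dtype	Data type of the output Series. Optional; if not specified, it is inferred from data. See the dtypes section of the official documentation for details
name	str, optional; a name for the Series
copy	bool, optional, default False; whether to copy the input data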
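The table notes that data may also be a scalar. A minimal sketch (the value 5, the labels and the name "fives" are arbitrary): a scalar is repeated for every label in index, dtype overrides the inferred type, and name labels the Series.

```python
import pandas as pd

# Scalar data is broadcast to every label given in index;
# dtype forces float64 instead of the inferred int64.
s = pd.Series(5, index=["a", "b", "c"], dtype="float64", name="fives")
print(s)
```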

【03x01】Building a Series from a list

In most cases only the data and index parameters are needed. A Series can be built from a list, for example:

>>> import pandas as pd
>>> obj = pd.Series([1, 5, -8, 2])
>>> obj
0    1
1    5
2   -8
3    2
dtype: int64

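Since we did not specify an index, an integer index from 0 to N-1 (where N is the length of the data) is created automatically. The left column is the automatically created index, and the right column is the data.

You can also specify a custom index: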

>>> import pandas as pd
>>> obj = pd.Series([1, 5, -8, 2], index=['a', 'b', 'c', 'd'])
>>> obj
a    1
b    5
c   -8
d    2
dtype: int64

The index can also be modified in place by assignment:

>>> import pandas as pd
>>> obj = pd.Series([1, 5, -8, 2], index=['a', 'b', 'c', 'd'])
>>> obj
a    1
b    5
c   -8
d    2
dtype: int64
>>> obj.index = ['Bob', 'Steve', 'Jeff', 'Ryan']
>>> obj
Bob      1
Steve    5
Jeff    -8
Ryan     2
dtype: int64
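One caveat when reassigning index: the new labels must have exactly as many entries as the Series has values. A quick sketch (the labels are arbitrary) showing that a shorter list is rejected with a ValueError and the original index is kept:

```python
import pandas as pd

obj = pd.Series([1, 5, -8, 2], index=["a", "b", "c", "d"])

error_raised = False
try:
    obj.index = ["Bob", "Steve"]  # only 2 labels for 4 values
except ValueError as exc:
    error_raised = True
    print("rejected:", exc)
```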

【03x02】Building a Series from a dict

When a Series is built from a dict, the dict keys become the index and the dict values become the data, for example:

>>> import pandas as pd
>>> data = {'Beijing': 21530000, 'Shanghai': 24280000, 'Wuhan': 11210000, 'Zhejiang': 58500000}
>>> obj = pd.Series(data)
>>> obj
Beijing     21530000
Shanghai    24280000
Wuhan       11210000
Zhejiang    58500000
dtype: int64

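To output the result in a particular order, pass the keys in that order as index. Labels in index that are not in the dict (here Guangzhou) get NaN, which also turns the dtype into float64, and dict keys missing from index (here Beijing) are dropped: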

>>> import pandas as pd
>>> data = {'Beijing': 21530000, 'Shanghai': 24280000, 'Wuhan': 11210000, 'Zhejiang': 58500000}
>>> cities = ['Guangzhou', 'Wuhan', 'Zhejiang', 'Shanghai']
>>> obj = pd.Series(data, index=cities)
>>> obj
Guangzhou           NaN
Wuhan        11210000.0
Zhejiang     58500000.0
Shanghai     24280000.0
dtype: float64

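Note: when data is a dict and the index parameter is not set:

If Python >= 3.6 and Pandas >= 0.23, the Series index follows the insertion order of the dict.
If Python < 3.6 or Pandas < 0.23, the Series index is sorted alphabetically by key.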

【03x03】Getting its data and index

The values and index attributes of a Series return its data and its index object:

>>> import pandas as pd
>>> obj = pd.Series([1, 5, -8, 2], index=['a', 'b', 'c', 'd'])
>>> obj.values
array([ 1,  5, -8,  2], dtype=int64)
>>> obj.index
Index(['a', 'b', 'c', 'd'], dtype='object')
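Since Pandas 0.24 the documentation recommends Series.to_numpy() (or Series.array) over .values for getting the underlying data. A short sketch on the same Series:

```python
import pandas as pd

obj = pd.Series([1, 5, -8, 2], index=["a", "b", "c", "d"])

arr = obj.to_numpy()         # the data as a NumPy ndarray
labels = obj.index.tolist()  # the index labels as a plain Python list
print(arr, labels)
```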

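【03x04】Accessing data by index

Unlike a plain NumPy array, a Series lets you select a single value or a group of values by index label. To select a group of values, pass a list of index labels. You can also modify the value associated with a label by assignment: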

>>> import pandas as pd
>>> obj = pd.Series([1, 5, -8, 2], index=['a', 'b', 'c', 'd'])
>>> obj
a    1
b    5
c   -8
d    2
dtype: int64
>>> obj['a']
1
>>> obj['a'] = 3
>>> obj[['a', 'b', 'c']]
a    3
b    5
c   -8
dtype: int64
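obj['a'] works here because the labels are strings. To make the intent explicit, Pandas also provides the .loc (label-based) and .iloc (integer position-based) indexers; a brief sketch on the same data:

```python
import pandas as pd

obj = pd.Series([1, 5, -8, 2], index=["a", "b", "c", "d"])

by_label = obj.loc["c"]       # select by index label
by_position = obj.iloc[2]     # select by integer position (third element)
subset = obj.loc[["a", "d"]]  # a list of labels returns a Series
obj.loc["b"] = 50             # label-based assignment
```

Recent Pandas releases emit a FutureWarning when an integer key such as obj[0] is treated as a position on a non-integer index, so .iloc is the safer choice for positional access.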

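【03x05】Operations with functions

NumPy functions and NumPy-like operations (filtering with a boolean array, scalar multiplication, applying math functions, and so on) work on a Series, and the index labels are preserved in the result: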

>>> import pandas as pd
>>> import numpy as np
>>> obj = pd.Series([1, 5, -8, 2], index=['a', 'b', 'c', 'd'])
>>> obj[obj > 0]
a    1
b    5
d    2
dtype: int64
>>> obj * 2
a     2
b    10
c   -16
d     4
dtype: int64
>>> np.exp(obj)
a      2.718282
b    148.413159
c      0.000335
d      7.389056
dtype: float64

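Besides these operations, a Series can be thought of as a fixed-length, ordered dict, because it maps index values to data values. It can be used in many places where a dict would otherwise be expected. Note that the in operator checks the index labels, not the values: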

>>> import pandas as pd
>>> obj = pd.Series([1, 5, -8, 2], index=['a', 'b', 'c', 'd'])
>>> 'a' in obj
True
>>> 'e' in obj
False
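Because in looks at the labels, testing for a value requires checking the values themselves. Series.get also behaves like dict.get and returns a default for a missing label. A short sketch:

```python
import pandas as pd

obj = pd.Series([1, 5, -8, 2], index=["a", "b", "c", "d"])

print("a" in obj)           # True: 'a' is an index label
print(5 in obj)             # False: 5 is a value, not a label
print(5 in obj.to_numpy())  # True: membership test among the values
print(obj.get("e", 0))      # 0: a missing label falls back to the default
```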

As in NumPy, Pandas uses NaN (not a number); in Pandas it marks missing values. The Pandas functions isnull and notnull detect missing data:

>>> import pandas as pd
>>> import numpy as np
>>> obj = pd.Series([np.nan, 5, -8, 2], index=['a', 'b', 'c', 'd'])
>>> obj
a    NaN
b    5.0
c   -8.0
d    2.0
dtype: float64
>>> pd.isnull(obj)
a     True
b    False
c    False
d    False
dtype: bool
>>> pd.notnull(obj)
a    False
b     True
c     True
d     True
dtype: bool
>>> obj.isnull()
a     True
b    False
c    False
d    False
dtype: bool
>>> obj.notnull()
a    False
b     True
c     True
d     True
dtype: bool
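Since Pandas 0.21, isna and notna are available as aliases of isnull and notnull. Because True counts as 1, summing the boolean mask is a common way to count missing values; a small sketch on the same data:

```python
import numpy as np
import pandas as pd

obj = pd.Series([np.nan, 5, -8, 2], index=["a", "b", "c", "d"])

mask = obj.isna()            # same result as obj.isnull()
n_missing = int(mask.sum())  # True counts as 1, so this is the number of NaN values
present = obj[obj.notna()]   # keep only the non-missing entries
```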

【03x06】The name attribute

You can give a Series object a name when calling pandas.Series:

>>> import pandas as pd
>>> data = {'Beijing': 21530000, 'Shanghai': 24280000, 'Wuhan': 11210000, 'Zhejiang': 58500000}
>>> obj = pd.Series(data, name='population')
>>> obj
Beijing     21530000
Shanghai    24280000
Wuhan       11210000
Zhejiang    58500000
Name: population, dtype: int64

You can also set the names of a Series and of its index through the name and index.name attributes:

>>> import pandas as pd
>>> data = {'Beijing': 21530000, 'Shanghai': 24280000, 'Wuhan': 11210000, 'Zhejiang': 58500000}
>>> obj = pd.Series(data)
>>> obj.name = 'population'
>>> obj.index.name = 'cities'
>>> obj
cities
Beijing     21530000
Shanghai    24280000
Wuhan       11210000
Zhejiang    58500000
Name: population, dtype: int64
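One practical effect of naming: when a Series is converted to a DataFrame with to_frame(), its name becomes the column label and the index name is kept. A sketch reusing the data above:

```python
import pandas as pd

data = {"Beijing": 21530000, "Shanghai": 24280000, "Wuhan": 11210000, "Zhejiang": 58500000}
obj = pd.Series(data, name="population")
obj.index.name = "cities"

frame = obj.to_frame()  # one-column DataFrame; the column label comes from obj.name
print(frame)
```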

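【04x00】The DataFrame object

A DataFrame is a tabular data structure containing an ordered collection of columns, each of which can hold a different value type (numeric, string, boolean, and so on). A DataFrame has both a row index and a column index; it can be thought of as a dict of Series that all share the same index. Internally, the data is stored as one or more two-dimensional blocks rather than as a list, dict, or some other collection of one-dimensional arrays.

Similar to a multidimensional array or tabular data (such as an Excel sheet or R's data.frame);
Each column can hold a different data type;
Has both a column index and a row index.

The basic syntax is: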

pandas.DataFrame(data=None, index: Optional[Collection] = None, columns: Optional[Collection] = None, dtype: Union[str, numpy.dtype, ExtensionDtype, None] = None, copy: bool = False)

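Parameter	Description

data	ndarray (structured or homogeneous), iterable, dict, or DataFrame; the data to store
index	Index or array-like; the row labels. If not provided, defaults to RangeIndex (0, 1, 2, …, n-1)
columns	Index or array-like; the column labels. If not provided, defaults to RangeIndex (0, 1, 2, …, n-1)
dtype	Data type to force on the whole DataFrame; only a single dtype is allowed. Optional; if not specified, it is inferred from data. See the dtypes section of the official documentation for details
copy	bool, optional, default False; whether to copy the input data. Only affects DataFrame or 2D ndarray input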
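A minimal sketch of two rows of this table (the data is arbitrary): dtype applies to every column at once, and when index and columns are omitted both default to RangeIndex.

```python
import pandas as pd

# dtype is forced on the whole DataFrame: both integer columns become float64.
df = pd.DataFrame({"a": [1, 2], "b": [3, 4]}, dtype="float64")

# No index or columns given: both default to RangeIndex labels 0, 1, ...
df2 = pd.DataFrame([[1, 2], [3, 4]])
print(df.dtypes)
print(df2)
```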

【04x01】Building a DataFrame from an ndarray

>>> import numpy as np
>>> import pandas as pd
>>> data = np.random.randn(5, 3)
>>> data
array([[-2.16231157,  0.44967198, -0.73131523],
       [ 1.18982913,  0.94670798,  0.82973421],
       [-1.57680831, -0.99732066,  0.96432   ],
       [-0.77483149, -1.23802881,  0.44061227],
       [ 1.77666419,  0.24931983, -1.12960153]])
>>> obj = pd.DataFrame(data)
>>> obj
          0         1         2
0 -2.162312  0.449672 -0.731315
1  1.189829  0.946708  0.829734
2 -1.576808 -0.997321  0.964320
3 -0.774831 -1.238029  0.440612
4  1.776664  0.249320 -1.129602

As with a Series, the index and the column labels (columns) can be specified when the DataFrame is built, or modified in place later by assignment:

>>> import numpy as np
>>> import pandas as pd
>>> data = np.random.randn(5, 3)
>>> index = ['a', 'b', 'c', 'd', 'e']
>>> columns = ['A', 'B', 'C']
>>> obj = pd.DataFrame(data, index=index, columns=columns)
>>> obj
          A         B         C
a -1.042909 -0.238236 -1.050308
b  0.587079  0.739683 -0.233624
c -0.451254 -0.638496  1.708807
d -0.620158 -1.875929 -0.432382
e -1.093815  0.396965 -0.759479
>>> obj.index = ['A1', 'A2', 'A3', 'A4', 'A5']
>>> obj.columns = ['B1', 'B2', 'B3']
>>> obj
          B1        B2        B3
A1 -1.042909 -0.238236 -1.050308
A2  0.587079  0.739683 -0.233624
A3 -0.451254 -0.638496  1.708807
A4 -0.620158 -1.875929 -0.432382
A5 -1.093815  0.396965 -0.759479
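np.random.randn returns different numbers on every run, so your output will not match the listing above. For reproducible examples, one option is NumPy's seeded generator np.random.default_rng; a sketch (the seed 0 is arbitrary):

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(0)      # seeded generator: same numbers on every run
data = rng.standard_normal((5, 3))  # 5 rows x 3 columns
obj = pd.DataFrame(data, index=list("abcde"), columns=["A", "B", "C"])

obj.index = ["A1", "A2", "A3", "A4", "A5"]  # relabel the rows in place
obj.columns = ["B1", "B2", "B3"]            # relabel the columns in place
print(obj.shape)  # (5, 3)
```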

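【04x02】Building a DataFrame from a dict

When a DataFrame is built from a dict, the dict keys become the column labels (columns) and the dict values become the data.

If a sequence of columns is specified, the DataFrame's columns are arranged in that order; a column that cannot be found in the data is filled with missing values (NaN) in the result: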

>>> import pandas as pd
>>> data = {'city': ['Wuhan', 'Wuhan', 'Wuhan', 'Beijing', 'Beijing', 'Beijing'],
...         'year': [2017, 2018, 2019, 2017, 2018, 2019],
...         'people': [10892900, 11081000, 11212000, 21707000, 21542000, 21536000]}
>>> pd.DataFrame(data)
      city  year    people
0    Wuhan  2017  10892900
1    Wuhan  2018  11081000
2    Wuhan  2019  11212000
3  Beijing  2017  21707000
4  Beijing  2018  21542000
5  Beijing  2019  21536000
>>> pd.DataFrame(data, columns=['year', 'city', 'people'])
   year     city    people
0  2017    Wuhan  10892900
1  2018    Wuhan  11081000
2  2019    Wuhan  11212000
3  2017  Beijing  21707000
4  2018  Beijing  21542000
5  2019  Beijing  21536000
>>> pd.DataFrame(data, columns=['year', 'city', 'people', 'money'])
   year     city    people money
0  2017    Wuhan  10892900   NaN
1  2018    Wuhan  11081000   NaN
2  2019    Wuhan  11212000   NaN
3  2017  Beijing  21707000   NaN
4  2018  Beijing  21542000   NaN
5  2019  Beijing  21536000   NaN

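Note: when data is a dict and the columns parameter is not set:

If Python >= 3.6 and Pandas >= 0.23, the DataFrame's columns follow the insertion order of the dict.
If Python < 3.6 or Pandas < 0.23, the DataFrame's columns are sorted alphabetically by key.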

【04x03】Getting its data and index

Like a Series, a DataFrame exposes its data and its index object through the values and index attributes.
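A short sketch reusing the city data from the previous example; besides values and index, the column labels are available through columns. Because the columns have different types, values returns a two-dimensional array of dtype object:

```python
import pandas as pd

data = {"city": ["Wuhan", "Wuhan", "Wuhan", "Beijing", "Beijing", "Beijing"],
        "year": [2017, 2018, 2019, 2017, 2018, 2019],
        "people": [10892900, 11081000, 11212000, 21707000, 21542000, 21536000]}
obj = pd.DataFrame(data)

print(obj.values)   # 2-D NumPy array; mixed column types give dtype object
print(obj.index)    # the default RangeIndex over 6 rows
print(obj.columns)  # the column labels
```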

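【04x04】Accessing data by index

A column of a DataFrame can be retrieved as a Series either with dict-like notation (obj['city']) or as an attribute (obj.year). Attribute access only works when the column name is a valid Python identifier that does not clash with an existing DataFrame attribute or method.

Rows can be retrieved by label with the loc attribute (with the default integer index, labels and positions coincide) or by integer position with iloc.

For a particularly large DataFrame, the head method returns just the first five rows by default.

Usage example: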

>>> import pandas as pd
>>> data = {'city': ['Wuhan', 'Wuhan', 'Wuhan', 'Beijing', 'Beijing', 'Beijing'],
...         'year': [2017, 2018, 2019, 2017, 2018, 2019],
...         'people': [10892900, 11081000, 11212000, 21707000, 21542000, 21536000]}
>>> obj = pd.DataFrame(data)
>>> obj
      city  year    people
0    Wuhan  2017  10892900
1    Wuhan  2018  11081000
2    Wuhan  2019  11212000
3  Beijing  2017  21707000
4  Beijing  2018  21542000
5  Beijing  2019  21536000
>>> obj['city']
0      Wuhan
1      Wuhan
2      Wuhan
3    Beijing
4    Beijing
5    Beijing
Name: city, dtype: object
>>> obj.year
0    2017
1    2018
2    2019
3    2017
4    2018
5    2019
Name: year, dtype: int64
>>> type(obj.year)
<class 'pandas.core.series.Series'>
>>> obj.loc[2]
city         Wuhan
year          2019
people    11212000
Name: 2, dtype: object
>>> obj.head()
      city  year    people
0    Wuhan  2017  10892900
1    Wuhan  2018  11081000
2    Wuhan  2019  11212000
3  Beijing  2017  21707000
4  Beijing  2018  21542000
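As a complement, here is a sketch on the same data with a hypothetical letter index A to F, where labels and positions no longer coincide: .loc takes the row label, .iloc the integer position, and head/tail accept the number of rows to return.

```python
import pandas as pd

data = {"city": ["Wuhan", "Wuhan", "Wuhan", "Beijing", "Beijing", "Beijing"],
        "year": [2017, 2018, 2019, 2017, 2018, 2019],
        "people": [10892900, 11081000, 11212000, 21707000, 21542000, 21536000]}
obj = pd.DataFrame(data, index=list("ABCDEF"))

row_by_label = obj.loc["C"]  # label-based row selection
row_by_pos = obj.iloc[2]     # position-based: the third row, i.e. "C" here
first_two = obj.head(2)      # first 2 rows instead of the default 5
last_one = obj.tail(1)       # last row
```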

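【04x05】Modifying column values

Columns can be modified by assignment. In the example below, the "money" column is assigned first a scalar and then an array of values: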

>>> import pandas as pd
>>> import numpy as np
>>> data = {'city': ['Wuhan', 'Wuhan', 'Wuhan', 'Beijing', 'Beijing', 'Beijing'],
...         'year': [2017, 2018, 2019, 2017, 2018, 2019],
...         'people': [10892900, 11081000, 11212000, 21707000, 21542000, 21536000],
...         'money': [np.nan, np.nan, np.nan, np.nan, np.nan, np.nan]}
>>> obj = pd.DataFrame(data, index=['A', 'B', 'C', 'D', 'E', 'F'])
>>> obj
      city  year    people  money
A    Wuhan  2017  10892900    NaN
B    Wuhan  2018  11081000    NaN
C    Wuhan  2019  11212000    NaN
D  Beijing  2017  21707000    NaN
E  Beijing  2018  21542000    NaN
F  Beijing  2019  21536000    NaN
>>> obj['money'] = 6666666666
>>> obj
      city  year    people       money
A    Wuhan  2017  10892900  6666666666
B    Wuhan  2018  11081000  6666666666
C    Wuhan  2019  11212000  6666666666
D  Beijing  2017  21707000  6666666666
E  Beijing  2018  21542000  6666666666
F  Beijing  2019  21536000  6666666666
>>> obj['money'] = np.arange(100000000, 700000000, 100000000)
>>> obj
      city  year    people      money
A    Wuhan  2017  10892900  100000000
B    Wuhan  2018  11081000  200000000
C    Wuhan  2019  11212000  300000000
D  Beijing  2017  21707000  400000000
E  Beijing  2018  21542000  500000000
F  Beijing  2019  21536000  600000000

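When a list or array is assigned to a column, its length must match the number of rows in the DataFrame. When a Series is assigned, its labels are matched exactly against the DataFrame's index, and rows with no matching label are filled with NaN: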

>>> import pandas as pd
>>> import numpy as np
>>> data = {'city': ['Wuhan', 'Wuhan', 'Wuhan', 'Beijing', 'Beijing', 'Beijing'],
...         'year': [2017, 2018, 2019, 2017, 2018, 2019],
...         'people': [10892900, 11081000, 11212000, 21707000, 21542000, 21536000],
...         'money': [np.nan, np.nan, np.nan, np.nan, np.nan, np.nan]}
>>> obj = pd.DataFrame(data, index=['A', 'B', 'C', 'D', 'E', 'F'])
>>> obj
      city  year    people  money
A    Wuhan  2017  10892900    NaN
B    Wuhan  2018  11081000    NaN
C    Wuhan  2019  11212000    NaN
D  Beijing  2017  21707000    NaN
E  Beijing  2018  21542000    NaN
F  Beijing  2019  21536000    NaN
>>> new_data = pd.Series([5670000000, 6890000000, 7890000000], index=['A', 'C', 'E'])
>>> obj['money'] = new_data
>>> obj
      city  year    people         money
A    Wuhan  2017  10892900  5.670000e+09
B    Wuhan  2018  11081000           NaN
C    Wuhan  2019  11212000  6.890000e+09
D  Beijing  2017  21707000           NaN
E  Beijing  2018  21542000  7.890000e+09
F  Beijing  2019  21536000           NaN
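A plain list, by contrast, is not aligned by label, so its length must equal the number of rows. A short sketch on a made-up three-row DataFrame showing that a matching list is accepted and a shorter one raises ValueError:

```python
import pandas as pd

obj = pd.DataFrame({"year": [2017, 2018, 2019]}, index=["A", "B", "C"])

obj["money"] = [1, 2, 3]  # length 3 matches the 3 rows: accepted

try:
    obj["money"] = [1, 2]  # length 2 does not match 3 rows
    mismatch_rejected = False
except ValueError:
    mismatch_rejected = True
```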

【04x06】Adding / deleting columns

Assigning to a column that does not exist creates a new column; the del keyword deletes a column:

>>> import pandas as pd
>>> data = {'city': ['Wuhan', 'Wuhan', 'Wuhan', 'Beijing', 'Beijing', 'Beijing'],
...         'year': [2017, 2018, 2019, 2017, 2018, 2019],
...         'people': [10892900, 11081000, 11212000, 21707000, 21542000, 21536000]}
>>> obj = pd.DataFrame(data)
>>> obj
      city  year    people
0    Wuhan  2017  10892900
1    Wuhan  2018  11081000
2    Wuhan  2019  11212000
3  Beijing  2017  21707000
4  Beijing  2018  21542000
5  Beijing  2019  21536000
>>> obj['northern'] = obj['city'] == 'Beijing'
>>> obj
      city  year    people  northern
0    Wuhan  2017  10892900     False
1    Wuhan  2018  11081000     False
2    Wuhan  2019  11212000     False
3  Beijing  2017  21707000      True
4  Beijing  2018  21542000      True
5  Beijing  2019  21536000      True
>>> del obj['northern']
>>> obj
      city  year    people
0    Wuhan  2017  10892900
1    Wuhan  2018  11081000
2    Wuhan  2019  11212000
3  Beijing  2017  21707000
4  Beijing  2018  21542000
5  Beijing  2019  21536000
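Besides del, DataFrame.drop(columns=...) removes columns and returns a new DataFrame, leaving the original untouched. Also note that new columns must be created with bracket syntax: assigning to a new attribute such as obj.southern (a hypothetical name) only sets a plain Python attribute, and Pandas emits a UserWarning instead of creating a column. A sketch on a made-up two-row DataFrame:

```python
import warnings
import pandas as pd

obj = pd.DataFrame({"city": ["Wuhan", "Beijing"], "year": [2019, 2019]})

obj["northern"] = obj["city"] == "Beijing"  # bracket syntax creates a column
trimmed = obj.drop(columns=["northern"])    # new DataFrame without that column

with warnings.catch_warnings():
    warnings.simplefilter("ignore")         # silence the UserWarning for this demo
    obj.southern = obj["city"] == "Wuhan"   # plain attribute, NOT a new column
```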

【04x07】The name attribute

The index.name and columns.name attributes set the names of the index and of the column labels (columns). Note that, unlike a Series, a DataFrame has no built-in name attribute:

>>> import pandas as pd
>>> data = {'city': ['Wuhan', 'Wuhan', 'Wuhan', 'Beijing', 'Beijing', 'Beijing'],
...         'year': [2017, 2018, 2019, 2017, 2018, 2019],
...         'people': [10892900, 11081000, 11212000, 21707000, 21542000, 21536000]}
>>> obj = pd.DataFrame(data)
>>> obj.index.name = 'index'
>>> obj.columns.name = 'columns'
>>> obj
columns     city  year    people
index
0          Wuhan  2017  10892900
1          Wuhan  2018  11081000
2          Wuhan  2019  11212000
3        Beijing  2017  21707000
4        Beijing  2018  21542000
5        Beijing  2019  21536000
