
Python Data Analysis and Visualization Notes 4 -- Pandas Basics

Published: 2025/3/12 26 豆豆



This series of notes records what I learned and practiced while studying Python data analysis and visualization.

Course link: Python Data Analysis and Visualization

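Reference documentation:
NumPy official documentation (English)
NumPy official documentation (Chinese)
PIL official documentation
Matplotlib official documentation
Pandas official documentation (English)
Pandas official documentation (Chinese)
Pandas official documentation, PDF download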

I. Pandas

1. Installing and importing Pandas

    # Install (run in a shell)
    pip3 install pandas

    # Import (in Python)
    import pandas as pd

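2. Introduction to Pandas

Pandas is a third-party Python library that provides high-performance, easy-to-use data types and analysis tools.
Pandas is built on NumPy and offers richer indexing: in addition to the automatic (positional) index, you can attach a custom index.
When indexing, use either the automatic index or the custom index; the two cannot be mixed in the same expression.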

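II. Pandas data types

1. The Series type

A Series is a one-dimensional array with index labels.

Creating a Series: a Series can be created from any of the following

A scalar value
A Python list
A Python dict
A one-dimensional ndarray
Other functions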

    import pandas as pd
    import numpy as np

    # Create from a scalar value
    s = pd.Series(10, index=['a', 'b'])
    print(s)
    >>>
    a    10
    b    10
    dtype: int64

    # Create from a list
    s = pd.Series([1, 2], index=['a', 'b'])
    print(s)
    >>>
    a    1
    b    2
    dtype: int64

    # Create from a dict; index selects the keys you want, and labels
    # that are not keys of the dict are filled with NaN
    s = pd.Series({'b': 1, 'a': 2, 'c': 3}, index=['a', 'b', 'd'])
    print(s)
    >>>
    a    2.0
    b    1.0
    d    NaN
    dtype: float64

    # Create from an ndarray; index can also be an ndarray
    s = pd.Series(np.arange(3), index=np.arange(5, 0, -2))
    print(s)
    >>>
    5    0
    3    1
    1    2
    dtype: int32
    # (the integer dtype depends on the platform and NumPy version; it may be int64)
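The list above also mentions "other functions", which the examples do not cover. Here is a minimal sketch, assuming that entry refers to iterable-producing functions such as Python's built-in range(); the labels 'x', 'y', 'z' are made up for illustration.

```python
import pandas as pd

# Assumption: "other functions" means iterables such as range().
# The index labels below are arbitrary, chosen only for this example.
s = pd.Series(range(3), index=['x', 'y', 'z'])
print(s)
```

The values come from range(3) and are paired with the labels in order, just as with a list.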

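Basic operations on a Series:

A Series has two parts, index and values: .index returns the index and .values returns the data
A Series behaves like an ndarray: indexing, slicing, and NumPy functions all work
A Series also behaves like a Python dict: in and .get() are supported
Both the Series object and its index can have a name, stored in the .name attribute, which can be changed
In arithmetic, Series are aligned automatically on their index labels; a label that exists in only one operand is treated as NaN before the operation, so the result for that label is NaN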

    import pandas as pd
    import numpy as np

    # Create a Series
    s = pd.Series([1, 2, 3], index=['a', 'b', 'c'])

    # Get the index
    s.index
    >>> Index(['a', 'b', 'c'], dtype='object')

    # Get the data
    s.values
    >>> [1 2 3]

    # Access data by index; the automatic index and the custom index
    # coexist, but they cannot be mixed
    s[['a', 'b']]
    >>>
    a    1
    b    2
    # Positional access; plain s[0] on a labeled Series is deprecated
    # in recent pandas versions, so .iloc is used here
    s.iloc[0]
    >>> 1

    # Slicing
    s[:2]
    >>>
    a    1
    b    2

    # NumPy functions
    np.exp(s)
    >>>
    a     2.718282
    b     7.389056
    c    20.085537
    dtype: float64

    # in
    'b' in s
    >>> True

    # .get()
    s.get('d', 4)
    >>> 4

    # Rename the Series object and its index
    print(s.name)
    print(s.index.name)
    s.name = 'series name'
    s.index.name = 'index name'
    print(s.name)
    print(s.index.name)
    >>>
    None
    None
    series name
    index name

    # Arithmetic aligns on index labels
    a = pd.Series([1, 2], ['a', 'b'])
    b = pd.Series([2, 3], ['b', 'd'])
    a + b
    >>>
    a    NaN
    b    4.0
    d    NaN
    dtype: float64
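One way to keep the two indexing schemes apart is to name the scheme explicitly with .loc (labels) and .iloc (positions). This is a small sketch rather than part of the original notes; the variable names are illustrative.

```python
import pandas as pd

s = pd.Series([1, 2, 3], index=['a', 'b', 'c'])

by_label = s.loc['b']          # custom (label) index
by_position = s.iloc[1]        # automatic (positional) index

label_slice = s.loc['a':'b']   # label slices include both endpoints
pos_slice = s.iloc[0:2]        # positional slices exclude the end position

print(by_label, by_position)
print(label_slice.equals(pos_slice))
```

Both lookups return the same element, and both slices select the rows labeled 'a' and 'b'. The only difference is which index each expression refers to.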

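2. The DataFrame type

A DataFrame is a tabular data type in which each column can hold a different value type. It has both a row index and a column index. It is most often used for two-dimensional data, although it can also represent higher-dimensional data.

Creating a DataFrame: a DataFrame can be created from any of the following

A two-dimensional ndarray
A dict of one-dimensional ndarrays, lists, dicts, tuples, or Series
A Series
Another DataFrame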

    import pandas as pd
    import numpy as np

    # Create from a two-dimensional ndarray
    d = pd.DataFrame(np.arange(4).reshape(2, 2))
    print(d)
    >>>
       0  1
    0  0  1
    1  2  3

    # Create from a dict of Series
    d = pd.DataFrame({'one': pd.Series([1, 2], ['a', 'b']),
                      'two': pd.Series([3, 4], ['a', 'b'])})
    print(d)
    >>>
       one  two
    a    1    3
    b    2    4

    # Create from a dict of lists
    d = pd.DataFrame({'one': [1, 2], 'two': [3, 4]})
    print(d)
    >>>
       one  two
    0    1    3
    1    2    4
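The description above says that a DataFrame has both a row index and a column index, and that each column may hold a different type. The following sketch illustrates both points; the column names 'name', 'score', and 'count' are hypothetical.

```python
import pandas as pd

# Hypothetical columns, chosen so that each one has a different value type
d = pd.DataFrame(
    {'name': ['x', 'y'], 'score': [1.5, 2.5], 'count': [3, 4]},
    index=['a', 'b'],
)

print(d.index)     # row index
print(d.columns)   # column index
print(d.dtypes)    # one dtype per column

col = d['score']             # a single column is a Series
row = d.loc['a']             # a single row, selected by its label
cell = d.loc['b', 'count']   # one value, selected by row and column label
```

A column is selected with square brackets, while a row is selected by label through .loc.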

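Basic operations on the index of a Series or DataFrame (.reindex() and .drop() are called on the Series or DataFrame itself; the remaining methods are called on its Index object):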

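Method	Description

.reindex()	Change or reorder the index of a Series or DataFrame
.drop()	Remove the specified row or column labels from a Series or DataFrame
.delete(loc)	Remove the element at position loc
.insert(loc, e)	Insert element e at position loc
.append(idx)	Concatenate another Index object, producing a new Index object
.difference(idx)	Compute the set difference, producing a new Index object
.intersection(idx)	Compute the set intersection
.union(idx)	Compute the set union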
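The methods in the table can be tried out as follows. This is a sketch with made-up labels. It uses Index.difference for the set difference, which is the method name in current pandas.

```python
import pandas as pd

d = pd.DataFrame({'one': [1, 2, 3], 'two': [4, 5, 6]}, index=['a', 'b', 'c'])

# reindex reorders rows and columns; labels that do not exist are filled with NaN
r = d.reindex(index=['c', 'a', 'd'], columns=['two', 'one'])

# drop removes rows by default; pass axis=1 to remove a column instead
dropped_row = d.drop('b')
dropped_col = d.drop('two', axis=1)

# Index objects are immutable, so each method below returns a new Index
idx = d.index
deleted = idx.delete(1)                  # removes the element at position 1
inserted = idx.insert(1, 'x')            # inserts 'x' at position 1
appended = idx.append(pd.Index(['d']))   # concatenates another Index

other = pd.Index(['b', 'c', 'd'])
common = idx.intersection(other)         # labels present in both
merged = idx.union(other)                # labels present in either
only_here = idx.difference(other)        # labels in idx but not in other

print(r)
print(list(deleted), list(inserted), list(appended))
print(list(common), list(merged), list(only_here))
```

None of these calls modify d or idx in place; each returns a new object, so the result has to be assigned to a name to be kept.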

Basic arithmetic operations on a DataFrame: