pandas Tutorial: Series and DataFrame

Published: 2025/3/20

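Getting started

pandas is a powerful open-source data analysis package for Python. It was originally developed as a tool for financial data analysis, which is why it has strong support for time series analysis. pandas is part of the PyData project.

Official site: http://pandas.pydata.org/
Official documentation: http://pandas.pydata.org/pandas-docs/stable/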

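Installation and import

Installation options
The Anaconda distribution of Python ships with pandas preinstalled, so no separate installation is needed
Install through the Anaconda graphical interface by ticking the pandas package
Install with the conda command: conda install pandas
Install from PyPI: pip install pandas

Import:

from pandas import Series, DataFrame
import pandas as pd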

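pandas data types

pandas is built around two data types: Series and DataFrame.

**Series:** a one-dimensional, array-like object made up of a set of values (of any NumPy data type) and an associated set of data labels (the index). A simple Series can also be created from the values alone. Note: index values in a Series may be repeated.
**DataFrame:** a tabular data structure containing an ordered set of columns, where each column can hold a different value type (numeric, string, boolean, and so on). A DataFrame has both a row index and a column index, and can be thought of as a dict of Series.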
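
As a rough illustration of the two descriptions above, the sketch below builds a Series whose index contains a repeated label, and a DataFrame from a dict of Series. The variable names and sample values are made up for this example.

```python
import pandas as pd

# A Series may carry repeated index labels.
prices = pd.Series([10, 20, 30], index=["x", "x", "y"])
repeated = prices["x"]          # selecting a repeated label returns a Series

# A DataFrame behaves much like a dict of Series that share one row index.
frame = pd.DataFrame({
    "count": pd.Series([1, 2], index=["a", "b"]),
    "label": pd.Series(["low", "high"], index=["a", "b"]),
})
count_column = frame["count"]   # each column comes back as a Series
```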

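Series

A Series is a one-dimensional data type in which every element has a label. It resembles a NumPy array whose elements carry labels. Labels can be numbers or strings.

Series attributes

No.	Attribute or method	Description
1	axes	Returns a list of the row axis labels.
2	dtype	Returns the data type (dtype) of the object.
3	empty	Returns True if the Series is empty.
4	ndim	Returns the number of dimensions of the underlying data, which is 1 by definition.
5	size	Returns the number of elements in the underlying data.
6	values	Returns the Series as an ndarray.
7	head()	Returns the first n rows.
8	tail()	Returns the last n rows.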
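
The attributes in the table can be tried on a small Series. This is only a sketch; the Series `s` and its values are invented for the example.

```python
import pandas as pd

s = pd.Series([1.5, 2.5, 3.5, 4.5], index=["a", "b", "c", "d"])

print(s.axes)      # a list holding the single row index
print(s.dtype)     # float64
print(s.empty)     # False
print(s.ndim)      # 1
print(s.size)      # 4
print(s.values)    # the data as a NumPy array
print(s.head(2))   # first two rows
print(s.tail(2))   # last two rows
```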

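A Series is created with the following constructor:

pandas.Series(data, index, dtype, copy)

No.	Parameter	Description
1	data	The data, in one of several forms such as ndarray, list, or constant.
2	index	Index values must be hashable and the index must have the same length as the data; the values do not have to be unique. Defaults to np.arange(n) if no index is passed.
3	dtype	The data type. If omitted, the data type is inferred.
4	copy	Whether to copy the data. Defaults to False.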
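
A minimal sketch of how the `index` and `dtype` parameters behave, using invented sample data. The last part shows what happens when the index length does not match the data, which is an assumption worth checking against your own pandas version.

```python
import pandas as pd

# Without an index, positions 0..n-1 are used as labels.
default_index = pd.Series([10, 20, 30])

# An explicit index must have the same length as the data.
labelled = pd.Series([10, 20, 30], index=["a", "b", "c"])

# dtype overrides the inferred type (an integer dtype would be inferred here).
as_float = pd.Series([10, 20, 30], dtype="float64")

# A mismatched index length raises ValueError.
length_error = None
try:
    pd.Series([10, 20, 30], index=["a", "b"])
except ValueError as exc:
    length_error = str(exc)
```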

Ways to create a Series

Creating from a one-dimensional array

import numpy as np
import pandas as pd

s = pd.Series([1, 2, 5, np.nan, 6, 8])
print(s)

Output:

0    1.0
1    2.0
2    5.0
3    NaN
4    6.0
5    8.0
dtype: float64

Creating a Series from an ndarray

data = np.array(['a', 'b', 'c', 'd'])
ser02 = pd.Series(data)
ser02

# Specify the index
data = np.array(['a', 'b', 'c', 'd'])
# ser02 = pd.Series(data, index=[100, 101, 102, 103])
ser02 = pd.Series(data, index=['name', 'age', 'sex', 'address'])
ser02

Output:

0    a
1    b
2    c
3    d
dtype: object

name       a
age        b
sex        c
address    d
dtype: object

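Creating a Series from a dict

A dict can be passed as input. If no index is specified, the dict keys are used to build the index; recent pandas versions keep the keys in insertion order, while older versions sorted them. If an index is passed, the values whose keys match the index labels are pulled out, and labels with no matching key get NaN.

data = {'a': 1, 'b': 2, 'c': 3}
ser03 = pd.Series(data)
ser03

# Specify the index
data = {'a': 1, 'b': 2, 'c': 3}
ser03 = pd.Series(data, index=['a', 'b', 'c', 'd'])
ser03

# Create from a scalar
ser04 = pd.Series(5, index=[0, 1, 2, 3])
ser04

Output:

a    1
b    2
c    3
dtype: int64

a    1.0
b    2.0
c    3.0
d    NaN
dtype: float64

0    5
1    5
2    5
3    5
dtype: int64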
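
To make the key-ordering and index-matching behaviour concrete, here is a small sketch with an unsorted dict. It assumes a reasonably recent pandas (0.23 or later on Python 3.6+), where dict keys keep their insertion order; the data is invented.

```python
import pandas as pd

data = {"c": 3, "a": 1, "b": 2}

# Keys are taken in insertion order, not sorted.
from_dict = pd.Series(data)

# Only labels listed in the index are kept; "z" has no matching key, so it is NaN.
selected = pd.Series(data, index=["b", "z"])
```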

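Getting values from a Series

There are two main ways to get values from a Series:

Use square brackets with an index label to read the data for that label; this may return more than one value
Use square brackets with a position to read the data at that position. Valid positions are in the range [0, len(Series.values)); a position can also be negative, which counts from right to left

Getting several values works much like slicing a NumPy ndarray: use square brackets with positions or index labels and a colon (:) to take part of the Series. In recent pandas versions, a single integer position inside plain square brackets is deprecated when the index holds non-integer labels, so the example below uses .iloc for that case.

# Import modules
import pandas as pd
import numpy as np

# Retrieve single elements
ser05 = pd.Series([1, 2, 3, 4, 5], index=['a', 'b', 'c', 'd', 'e'])
print(ser05.iloc[1])   # by position; plain ser05[1] is deprecated for a non-integer index
print(ser05['a'])      # by label
print(ser05['d'])

Output:

2
1
4

# Retrieve the first three elements of the Series
ser05 = pd.Series([1, 2, 3, 4, 5], index=['a', 'b', 'c', 'd', 'e'])
# Slice by position (the end position is excluded)
print(ser05[:3])
print(ser05[::2])
print(ser05[4:2:-1])
# Slice by label (the end label is included)
print(ser05['b':'d'])
ser05['a':'d':2]
ser05['e':'c':-1]
ser05[['a', 'b']]

Output:

a    1
b    2
c    3
dtype: int64
a    1
c    3
e    5
dtype: int64
e    5
d    4
dtype: int64
b    2
c    3
d    4
dtype: int64

a    1
b    2
dtype: int64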
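
The two access styles described above can also be written with the explicit `.loc` (label-based) and `.iloc` (position-based) indexers, which avoids any ambiguity between labels and positions. A minimal sketch, reusing the same sample data under a new variable name:

```python
import pandas as pd

s = pd.Series([1, 2, 3, 4, 5], index=["a", "b", "c", "d", "e"])

by_label = s.loc["d"]             # label-based lookup
by_position = s.iloc[1]           # position-based lookup
last = s.iloc[-1]                 # negative positions count from the end

label_slice = s.loc["b":"d"]      # label slices include the end label
position_slice = s.iloc[1:3]      # position slices exclude the end position
```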

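Series operations

# Import modules
import pandas as pd
import numpy as np

series = pd.Series({'a': 941, 'b': 431, 'c': 9327})
series

# Select the values greater than 500
series[series > 500]

# Addition
series + 10

# Subtraction
series - 100

# Multiplication
series * 10

# Add two Series
ser01 = pd.Series([1, 2, 3])
ser02 = pd.Series([4, 5, 6])
ser01 + ser02

# exp() computes e to the power of each element (e is approximately 2.71828);
# elements above roughly 709 overflow float64 and come back as inf
np.exp(series)

# abs() computes the absolute value of each element
np.abs(series)

# sign() gives the sign of each element: 1 for positive, 0 for zero, -1 for negative
np.sign(series)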
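
The source does not show the results of these operations, so the sketch below checks a few of them. It also illustrates that each operation returns a new Series and leaves the original unchanged. The variable names other than `series` are invented.

```python
import pandas as pd

series = pd.Series({"a": 941, "b": 431, "c": 9327})

large = series[series > 500]      # boolean mask keeps the entries for a and c
shifted = series + 10             # the scalar is applied to every element
total = pd.Series([1, 2, 3]) + pd.Series([4, 5, 6])   # element-wise sum

unchanged = list(series)          # the original values are not modified
```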

Automatic alignment of Series

When an operation involves several Series objects with different index values, pandas automatically aligns the data by index label. If a label is missing from one of the Series, the result for that label is NaN.

# Import modules
import pandas as pd
import numpy as np

serA = pd.Series([1, 2, 3], index=['a', 'b', 'c'])
serB = pd.Series([4, 5, 6], index=['b', 'c', 'd'])
print('---------serA+serB---------')
print(serA)
serA + serB

Output:

---------serA+serB---------
a    1
b    2
c    3
dtype: int64

a    NaN
b    6.0
c    8.0
d    NaN
dtype: float64
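
If NaN is not the desired result for labels that appear in only one Series, one option is `Series.add` with `fill_value`, which treats the missing side as the given value. This is a sketch of that alternative, not something the original article covers; the snake_case names are invented.

```python
import pandas as pd

ser_a = pd.Series([1, 2, 3], index=["a", "b", "c"])
ser_b = pd.Series([4, 5, 6], index=["b", "c", "d"])

aligned = ser_a + ser_b                   # a and d become NaN
filled = ser_a.add(ser_b, fill_value=0)   # the missing side is treated as 0
```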

The name attribute of a Series and its index

Both a Series object and its index have a name attribute. It is empty (None) by default and can be assigned as needed.
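
A short sketch of assigning both names. The names "score" and "student" are arbitrary, and the `to_frame()` call is added only to show one place where the Series name is used.

```python
import pandas as pd

s = pd.Series([1, 2, 3], index=["a", "b", "c"])
name_before = s.name              # None until a name is assigned

s.name = "score"                  # name of the Series itself
s.index.name = "student"          # name of its index

frame = s.to_frame()              # the Series name becomes the column name
```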

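DataFrame

A DataFrame is a two-dimensional table structure. A pandas DataFrame can store many different data types, and each axis has its own labels. You can think of it as a dict of Series.

DataFrame attributes

No.	Attribute or method	Description
1	T	Transposes rows and columns.
2	axes	Returns a list whose only members are the row axis labels and the column axis labels.
3	dtypes	Returns the data types (dtypes) in this object.
4	empty	True if the NDFrame is entirely empty (no items), that is, if any axis has length 0.
5	ndim	Number of axes (array dimensions).
6	shape	Returns a tuple representing the dimensions of the DataFrame.
7	size	Number of elements in the NDFrame.
8	values	NumPy representation of the NDFrame.
9	head()	Returns the first n rows.
10	tail()	Returns the last n rows.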
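
The attributes in the table can be inspected on a small frame. This is only a sketch; the frame `df` and its contents are invented for the example.

```python
import pandas as pd

df = pd.DataFrame(
    {"Name": ["Tom", "Jack", "Steve"], "Age": [19, 18, 20]},
    index=["rank1", "rank2", "rank3"],
)

print(df.T)        # rows and columns swapped
print(df.axes)     # [row index, column index]
print(df.dtypes)   # one dtype per column
print(df.empty)    # False
print(df.ndim)     # 2
print(df.shape)    # (3, 2)
print(df.size)     # 6
print(df.values)   # the data as a NumPy array
```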

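Ways to create a DataFrame

A pandas DataFrame can be created with the following constructor:

pandas.DataFrame(data, index, columns, dtype, copy)

No.	Parameter	Description
1	data	The data, in one of several forms: ndarray, Series, map, list, dict, constant, or another DataFrame.
2	index	Row labels for the resulting frame. Optional; defaults to np.arange(n) if no index is passed.
3	columns	Column labels. Optional; defaults to np.arange(n) if no column labels are passed.
4	dtype	Data type to force on the columns; only a single dtype is allowed. If omitted, it is inferred.
5	copy	Whether to copy the input data. The source gives the default as False; in recent pandas versions the default depends on the type of the input.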

Creating a DataFrame:

# Create a date index
dates = pd.date_range('20130101', periods=6)
print(type(dates))
# Create the DataFrame; index sets the row labels and columns sets the column names
df = pd.DataFrame(np.random.randn(6, 4), index=dates, columns=list('ABCD'))
print(df)

Output (the values are random, so they differ on every run):

<class 'pandas.core.indexes.datetimes.DatetimeIndex'>
                   A         B         C         D
2013-01-01  0.406575 -1.356139  0.188997 -1.308049
2013-01-02 -0.412154  0.123879  0.907458  0.201024
2013-01-03  0.576566 -1.875753  1.967512 -1.044405
2013-01-04  1.116106 -0.796381  0.432589  0.764339
2013-01-05 -1.851676  0.378964 -0.282481  0.296629
2013-01-06 -1.051984  0.960433 -1.313190 -0.093666

Creating a DataFrame from a dict

df2 = pd.DataFrame({
    'A': 1.,
    'B': pd.Timestamp('20130102'),
    'C': pd.Series(1, index=list(range(4)), dtype='float32'),
    'D': np.array([3] * 4, dtype='int32'),
    'E': pd.Categorical(["test", "train", "test", "train"]),
    'F': 'foo'
})
print(df2)

Output:

     A          B    C  D      E    F
0  1.0 2013-01-02  1.0  3   test  foo
1  1.0 2013-01-02  1.0  3  train  foo
2  1.0 2013-01-02  1.0  3   test  foo
3  1.0 2013-01-02  1.0  3  train  foo

Creating a DataFrame from a list

data = [1, 2, 3, 4]
df02 = pd.DataFrame(data)
df02

Output:

   0
0  1
1  2
2  3
3  4

Creating a DataFrame from a dict of lists

data = {'Name': ['Tom', 'Jack', 'Steve'], 'Age': [19, 18, 20]}
# df04 = pd.DataFrame(data)
# Specify the row index and the column index
df04 = pd.DataFrame(data, index=['rank1', 'rank2', 'rank3'], columns=['Name', 'Age', 'Sex'])
df04

Output:

        Name  Age  Sex
rank1    Tom   19  NaN
rank2   Jack   18  NaN
rank3  Steve   20  NaN

Creating a DataFrame from a list of dicts

data = [{'a': 1, 'b': 2}, {'a': 1, 'b': 2, 'c': 3}]
# df05 = pd.DataFrame(data)
# Pass a list of dicts and specify the row index
# df05 = pd.DataFrame(data, index=['first', 'second'])
# Pass a list of dicts and specify both the row index and the column index
df05 = pd.DataFrame(data, index=['first', 'second'], columns=['a', 'b', 'c', 'd'])
df05

Output:

        a  b    c   d
first   1  2  NaN NaN
second  1  2  3.0 NaN

Creating a DataFrame from a dict of Series

data = {
    'one': pd.Series([1, 2, 3], index=['a', 'b', 'c']),
    'two': pd.Series([1, 2, 3, 4], index=['a', 'b', 'c', 'd'])
}
df06 = pd.DataFrame(data)
df06

Output:

   one  two
a  1.0    1
b  2.0    2
c  3.0    3
d  NaN    4

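DataFrame data operations

Column selection

# Get the values of a column directly by its column label
data = {
    'one': pd.Series([1, 2, 3], index=['a', 'b', 'c']),
    'two': pd.Series([1, 2, 3, 4], index=['a', 'b', 'c', 'd'])
}
df06 = pd.DataFrame(data)
df06

df06['one']
# df06.one
# df06.loc[:, 'one']
# df06.iloc[:, 0]
# df06.ix[:, 'one']   # .ix was removed in pandas 1.0; use .loc or .iloc instead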

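Column addition

data = {
    'one': pd.Series([1, 2, 3], index=['a', 'b', 'c']),
    'two': pd.Series([1, 2, 3, 4], index=['a', 'b', 'c', 'd'])
}
df06 = pd.DataFrame(data)
# Row 'd' has no matching label in the new Series, so it gets NaN
df06['three'] = pd.Series([10, 20, 30], index=['a', 'b', 'c'])
df06

Column modification

# Modify a column directly by its name; the list must have one value per row
df06['three'] = [7, 8, 9, 10]
df06

Column deletion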
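
The source gives no example for this step. A minimal sketch of three common ways to remove a column, using an invented frame: `del` and `pop` change the frame in place, while `drop` returns a new frame.

```python
import pandas as pd

df = pd.DataFrame({"one": [1, 2, 3], "two": [4, 5, 6], "three": [7, 8, 9]})

del df["one"]                            # removes the column in place
popped = df.pop("two")                   # removes the column and returns it as a Series
remaining = df.drop(columns=["three"])   # returns a new DataFrame; df keeps "three"
```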

Row selection

data = {
    'one': pd.Series([1, 2, 3], index=['a', 'b', 'c']),
    'two': pd.Series([1, 2, 3, 4], index=['a', 'b', 'c', 'd']),
    'three': pd.Series([10, 20, 30], index=['a', 'b', 'c'])
}
df06 = pd.DataFrame(data)
df06

# Select a row by passing its label to .loc
df06.loc['a']
# df06.ix['a']   # .ix was removed in pandas 1.0

# Select by integer position
# A row can be selected by passing its integer position to .iloc
df06.iloc[2]

# Row slicing
# Several rows can be selected with the : operator
df06[2:4]

Row addition

# df06.ix['e'] = [22, 33, 444]   # .ix was removed in pandas 1.0
df06.loc['e'] = [22, 33, 444]
df06

# Appending rows
# DataFrame.append() was removed in pandas 2.0; pd.concat() adds the new rows at the end instead
# Create one row of data
# data2 = pd.DataFrame([{'one': 22, 'two': 33, 'three': 44}], index=['e'])
data2 = pd.DataFrame([[22, 33, 44]], columns=['one', 'two', 'three'], index=['f'])
# data2
df06 = pd.concat([df06, data2])
df06

Row deletion

# drop() returns a new DataFrame without the row labelled 'e'
df06 = df06.drop('e')
df06
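
Two details of `drop` are easy to miss: it removes every row that carries the given label, and it accepts a list of labels. A small sketch with an invented frame that has a repeated row label:

```python
import pandas as pd

df = pd.DataFrame({"one": [1, 2, 3, 4]}, index=["a", "b", "b", "c"])

without_b = df.drop("b")            # removes every row labelled "b"
without_ac = df.drop(["a", "c"])    # a list drops several labels at once
```

Because `drop` returns a new object, `df` itself still has all four rows afterwards.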
