
Pandas Basics Review: DataFrame

Published 2025/6/15 by 豆豆


Data type: DataFrame

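A DataFrame is a tabular data type made up of several Series columns that all share one common row index.
It has both a row index and a column index:
- Row index: labels the rows; it is called index and corresponds to axis 0 (axis=0).
- Column index: labels the columns; it is called columns and corresponds to axis 1 (axis=1).
A DataFrame can be thought of as a two-dimensional labeled array.
Each column can hold a different data type.
Basic operations are similar to those on a Series and are driven by the row and column labels.
It is normally used for two-dimensional data. Higher-dimensional data can be stored by nesting, for example by putting lists or DataFrames inside cells, but this is rarely done.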
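The sketch below is a minimal illustration that is not part of the original article. It shows how index, columns and the axis numbers behave in practice. The labels r1, r2, x and y are arbitrary.

```python
import pandas as pd

# A tiny frame with explicit row labels (index) and column labels (columns)
df = pd.DataFrame({"x": [1, 2], "y": [10, 20]}, index=["r1", "r2"])

print(list(df.index))    # ['r1', 'r2']  row labels, axis 0
print(list(df.columns))  # ['x', 'y']    column labels, axis 1

# axis=0 aggregates down the rows, giving one result per column
col_sums = df.sum(axis=0)
# axis=1 aggregates across the columns, giving one result per row
row_sums = df.sum(axis=1)

print(col_sums.tolist())  # [3, 30]
print(row_sums.tolist())  # [11, 22]

# Each column keeps its own dtype
mixed = pd.DataFrame({"name": ["a", "b"], "score": [1.5, 2.0]})
print(mixed.dtypes)
```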

Creating a DataFrame

From a Python list

import pandas as pd

df = pd.DataFrame([True, 1, 2.3, 'a', '你好'])  # 1-D list: becomes a single column labeled 0
df

	0
0	True
1	1
2	2.3
3	a
4	你好

df = pd.DataFrame([[True, 1, 2.3, 'a', '你好'], [1, 2, 3, 4, 5]])  # 2-D: each inner list becomes one row
df

	0	1	2	3	4
0	True	1	2.3	a	你好
1	1	2	3.0	4	5

# 3-D (not recommended): the innermost lists are stored as cell values
df = pd.DataFrame([[[True, 1, 2.3, 'a', '你好'], [1, 2, 3, 4, 5]],
                   [[True, 1, 2.3, 'a', '你好'], [1, 2, 3, 4, 5]]])
df

	0	1
0	[True, 1, 2.3, a, 你好]	[1, 2, 3, 4, 5]
1	[True, 1, 2.3, a, 你好]	[1, 2, 3, 4, 5]
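In the 2-D table above, column 2 displays 3.0 instead of 3. The reason is that pandas infers a dtype for each column separately. The sketch below makes this visible by passing explicit labels. The names c0 to c4 and r0, r1 are made up for illustration.

```python
import pandas as pd

# Same rows as the 2-D list example; column/row labels are hypothetical
rows = [[True, 1, 2.3, 'a', '你好'], [1, 2, 3, 4, 5]]
df = pd.DataFrame(rows, columns=['c0', 'c1', 'c2', 'c3', 'c4'], index=['r0', 'r1'])

print(df.shape)        # (2, 5)
# dtypes are inferred per column, not for the whole table
print(df['c1'].dtype)  # integer column: both values are ints
print(df['c2'].dtype)  # float64: 2.3 and 3 are unified as floats
print(df['c0'].dtype)  # object: a bool and an int do not share a numeric dtype
```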

From a Python dict

df = pd.DataFrame({'one': [1, 2, 3, 4], 'two': [9, 8, 7, 6]})  # dict keys become column labels
df

	one	two
0	1	9
1	2	8
2	3	7
3	4	6

# Custom row index
df = pd.DataFrame({'one': [1, 2, 3, 4], 'two': [9, 8, 7, 6]}, index=['a', 'b', 'c', 'd'])
df

	one	two
a	1	9
b	2	8
c	3	7
d	4	6

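# Scalars A and B are broadcast to the length of C;
# at least one list-like value is needed to fix the number of rows
df = pd.DataFrame({'A': 1,
                   'B': 2.3,
                   'C': ['x', 'y', 5]})
df

	A	B	C
0	1	2.3	x
1	1	2.3	y
2	1	2.3	5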
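The remark that more than one row is needed has a concrete consequence. If every value in the dict is a scalar, pandas cannot tell how many rows to create. In that case it raises ValueError unless you pass an index. A minimal sketch follows; the labels r0 and r1 are arbitrary.

```python
import pandas as pd

# All-scalar dict without an index: pandas cannot infer the row count
try:
    pd.DataFrame({'A': 1, 'B': 2.3})
    raised = False
except ValueError:
    raised = True
print(raised)  # True

# An explicit index tells pandas how many rows to broadcast the scalars to
df = pd.DataFrame({'A': 1, 'B': 2.3}, index=['r0', 'r1'])
print(df.shape)  # (2, 2)
```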

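dt = {'one': pd.Series([1, 2, 3], index=['a', 'b', 'c']),
      'two': pd.Series([9, 8, 7, 6], index=['a', 'b', 'c', 'd'])}
dt
{'one': a    1
b    2
c    3
dtype: int64, 'two': a    9
b    8
c    7
d    6
dtype: int64}

# 'one' and 'two' automatically become the column labels; a, b, c, d become the row labels.
# Each dict entry becomes one DataFrame column, and each index label inside a Series becomes one row.
d = pd.DataFrame(dt)
d

	one	two
a	1.0	9
b	2.0	8
c	3.0	7
d	NaN	6

Column one is shown as float because the missing value at d is filled with NaN.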

# Data is aligned on the requested row and column labels; labels not present in dt are filled with NaN
d_2 = pd.DataFrame(dt, index=['b', 'c', 'd'], columns=['two', 'three'])
d_2

	two	three
b	8	NaN
c	7	NaN
d	6	NaN
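The alignment rules above can be checked directly. The following is a small sketch that reuses the article's dt dict. The row index is the union of the Series indexes, and any missing cell becomes NaN.

```python
import pandas as pd

dt = {'one': pd.Series([1, 2, 3], index=['a', 'b', 'c']),
      'two': pd.Series([9, 8, 7, 6], index=['a', 'b', 'c', 'd'])}

d = pd.DataFrame(dt)
print(list(d.index))      # ['a', 'b', 'c', 'd']: union of both Series indexes
print(d['one'].dtype)     # float64: the NaN at 'd' forces a float column
print(d.loc['d', 'one'])  # nan

d_2 = pd.DataFrame(dt, index=['b', 'c', 'd'], columns=['two', 'three'])
# 'three' is not a key of dt, so the entire column is missing
print(d_2['three'].isna().all())  # True
```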

From a NumPy ndarray

import numpy as np

df = pd.DataFrame(np.arange(10).reshape(2, 5))  # row and column labels are generated automatically
df

	0	1	2	3	4
0	0	1	2	3	4
1	5	6	7	8	9

# Custom row and column labels
df = pd.DataFrame(np.random.randn(6, 4), index=[1, 2, 3, 4, 5, 6], columns=['a', 'b', 'c', 'd'])
df

	a	b	c	d
1	0.274340	0.296507	0.751198	0.763512
2	0.181134	0.675380	0.553695	0.632163
3	-0.059765	0.347702	1.138297	-0.143998
4	-1.370677	-0.951640	0.135964	-0.665875
5	1.490610	0.420539	0.628784	2.119896
6	-1.669737	1.167765	1.254722	-0.948624
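The values above come from np.random.randn, so they change on every run. The sketch below uses a seeded NumPy Generator instead, which makes the numbers reproducible; this generator is a substitute and is not what the article used. It also checks two ndarray-specific points. First, the whole frame inherits the array's single dtype. Second, the labels must match the array's shape.

```python
import numpy as np
import pandas as pd

# Seeded generator (swapped in for np.random.randn) so values are reproducible
rng = np.random.default_rng(seed=0)
arr = rng.standard_normal((6, 4))

df = pd.DataFrame(arr, index=range(1, 7), columns=['a', 'b', 'c', 'd'])
print(df.shape)  # (6, 4)

# A 2-D ndarray has one dtype, so every column gets it
print((df.dtypes == 'float64').all())  # True

# 3 column labels for a 4-column array do not fit the shape
try:
    pd.DataFrame(arr, columns=['a', 'b', 'c'])
    mismatch_raised = False
except ValueError:
    mismatch_raised = True
print(mismatch_raised)  # True
```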

From Series

# Each Series becomes one row; the columns are the union of the Series indexes
e = pd.DataFrame([pd.Series([1, 2, 3]), pd.Series([9, 8, 7, 6])], index=['a', 'b'])
e

	0	1	2	3
a	1.0	2.0	3.0	NaN
b	9.0	8.0	7.0	6.0
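When each Series carries its own labels, the labels decide which column a value lands in, not the value's position. This is a minimal sketch with hypothetical labels x, y and z.

```python
import pandas as pd

# Each Series becomes a row; values are placed by their index label
s1 = pd.Series({'x': 1, 'y': 2})
s2 = pd.Series({'y': 20, 'z': 30})
e = pd.DataFrame([s1, s2], index=['a', 'b'])

print(list(e.columns))  # ['x', 'y', 'z']
print(e)                # a has no 'z' and b has no 'x', so those cells are NaN
```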

DataFrame attributes

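# Columns: 姓名 = name, 性別 = sex, 年齡 = age, 地址 = address
di = {'姓名': ['張三', '李四', '王五', '趙六'],
      '性別': ['男', '女', '女', '男'],
      '年齡': [12, 22, 32, 42],
      '地址': ['北京', '上海', '廣州', '深圳']}
di
{'地址': ['北京', '上海', '廣州', '深圳'],
 '姓名': ['張三', '李四', '王五', '趙六'],
 '年齡': [12, 22, 32, 42],
 '性別': ['男', '女', '女', '男']}

d = pd.DataFrame(di, index=['d1', 'd2', 'd3', 'd4'])
d

	地址	姓名	年齡	性別
d1	北京	張三	12	男
d2	上海	李四	22	女
d3	廣州	王五	32	女
d4	深圳	趙六	42	男

The column order shown in this section (地址, 姓名, 年齡, 性別) comes from an older Python/pandas that sorted dict keys. On Python 3.6+ with pandas 0.23 or later, both the dict and the DataFrame keep insertion order: 姓名, 性別, 年齡, 地址.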

d.head()  # show the first rows (5 by default)

	地址	姓名	年齡	性別
d1	北京	張三	12	男
d2	上海	李四	22	女
d3	廣州	王五	32	女
d4	深圳	趙六	42	男

d.tail(3)  # show the last 3 rows

	地址	姓名	年齡	性別
d2	上海	李四	22	女
d3	廣州	王五	32	女
d4	深圳	趙六	42	男

d.info()  # overview: index range, columns, non-null counts, dtypes, memory usage
<class 'pandas.core.frame.DataFrame'>
Index: 4 entries, d1 to d4
Data columns (total 4 columns):
地址    4 non-null object
姓名    4 non-null object
年齡    4 non-null int64
性別    4 non-null object
dtypes: int64(1), object(3)
memory usage: 160.0+ bytes

d.shape  # (number of rows, number of columns)
(4, 4)

d.dtypes  # data type of each column
地址    object
姓名    object
年齡     int64
性別    object
dtype: object

d.index  # row labels
Index(['d1', 'd2', 'd3', 'd4'], dtype='object')

d.columns  # column labels
Index(['地址', '姓名', '年齡', '性別'], dtype='object')

d.values  # the underlying values as a NumPy array
array([['北京', '張三', 12, '男'],
       ['上海', '李四', 22, '女'],
       ['廣州', '王五', 32, '女'],
       ['深圳', '趙六', 42, '男']], dtype=object)
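A few related attributes and methods are often used together with the ones above. The following is a short sketch on the same data. Current pandas documentation recommends to_numpy() over .values. select_dtypes is used here only to show that the age column is the single numeric one.

```python
import pandas as pd

di = {'姓名': ['張三', '李四', '王五', '趙六'],
      '性別': ['男', '女', '女', '男'],
      '年齡': [12, 22, 32, 42],
      '地址': ['北京', '上海', '廣州', '深圳']}
d = pd.DataFrame(di, index=['d1', 'd2', 'd3', 'd4'])

rows, cols = d.shape      # shape is a (rows, columns) tuple
print(len(d) == rows)     # True: len() counts rows
print(d.size)             # 16: rows * cols cells
print(list(d.columns))    # insertion order on current pandas

arr = d.to_numpy()        # preferred over d.values in current pandas
print(arr.shape)          # (4, 4)

print(d.select_dtypes('number').columns.tolist())  # ['年齡']
```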

DataFrame: read, create, update, delete

Read

List/ndarray-style access

dates = pd.date_range('20130101', periods=10)
dates
DatetimeIndex(['2013-01-01', '2013-01-02', '2013-01-03', '2013-01-04',
               '2013-01-05', '2013-01-06', '2013-01-07', '2013-01-08',
               '2013-01-09', '2013-01-10'],
              dtype='datetime64[ns]', freq='D')

df = pd.DataFrame(np.random.randn(10, 4), index=dates, columns=['A', 'B', 'C', 'D'])
df

	A	B	C	D
2013-01-01	0.754077	-0.346202	-0.557050	0.778106
2013-01-02	0.103394	-1.051044	-0.413054	0.268955
2013-01-03	0.174730	2.056007	1.781379	1.643397
2013-01-04	-0.950517	-0.226887	-0.097138	-0.442010
2013-01-05	0.076178	-0.518970	1.142290	-0.952401
2013-01-06	1.371702	-1.028873	-1.470106	-0.113098
2013-01-07	0.126720	-0.251519	-2.212507	1.050036
2013-01-08	-1.246918	1.530266	1.761499	0.940741
2013-01-09	0.941099	-2.420932	1.927863	-0.549143
2013-01-10	1.951555	-0.264012	-0.171690	0.869293

# Indexing with [] and a single label selects a column (returned as a Series)
df['A']
2013-01-01    0.754077
2013-01-02    0.103394
2013-01-03    0.174730
2013-01-04   -0.950517
2013-01-05    0.076178
2013-01-06    1.371702
2013-01-07    0.126720
2013-01-08   -1.246918
2013-01-09    0.941099
2013-01-10    1.951555
Freq: D, Name: A, dtype: float64

df.A  # attribute access returns the same Series as df['A']

df['A']['2013-01-01']  # column first, then row
0.75407705661157032

df.A['2013-01-01']
0.75407705661157032

df[['A', 'C']]  # a list of labels selects several columns (returned as a DataFrame)

	A	C
2013-01-01	0.754077	-0.557050
2013-01-02	0.103394	-0.413054
2013-01-03	0.174730	1.781379
2013-01-04	-0.950517	-0.097138
2013-01-05	0.076178	1.142290
2013-01-06	1.371702	-1.470106
2013-01-07	0.126720	-2.212507
2013-01-08	-1.246918	1.761499
2013-01-09	0.941099	1.927863
2013-01-10	1.951555	-0.171690
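A single label and a list of labels return different types. Attribute access (df.A) also only works for some column names. The sketch below shows both points. It replaces the random data with deterministic values, and the column named max is hypothetical, added only to show a name clash with a DataFrame method.

```python
import numpy as np
import pandas as pd

dates = pd.date_range('20130101', periods=10)
# Deterministic values instead of random ones: row i holds 4*i .. 4*i+3
df = pd.DataFrame(np.arange(40.0).reshape(10, 4), index=dates, columns=['A', 'B', 'C', 'D'])

col = df['A']          # single label -> Series
sub = df[['A', 'C']]   # list of labels -> DataFrame
print(type(col).__name__, type(sub).__name__)  # Series DataFrame

# Attribute access only works for labels that are valid identifiers
# and do not clash with existing DataFrame attributes or methods
df['max'] = 0
print(callable(df.max))   # True: df.max is still the method
print(df['max'].iloc[0])  # 0: use [] to reach the column
```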

Pandas-specific access: .loc selects data by label (the custom index)

# Select one row
df.loc['2013-01-01']
A    0.754077
B   -0.346202
C   -0.557050
D    0.778106
Name: 2013-01-01 00:00:00, dtype: float64

# Select one column
df.loc[:, 'A']
2013-01-01    0.754077
2013-01-02    0.103394
2013-01-03    0.174730
2013-01-04   -0.950517
2013-01-05    0.076178
2013-01-06    1.371702
2013-01-07    0.126720
2013-01-08   -1.246918
2013-01-09    0.941099
2013-01-10    1.951555
Freq: D, Name: A, dtype: float64

# Select a single value
df.loc['2013-01-01', 'A']  # row first, then column
0.75407705661157032

# Select specific rows / columns
df.loc[[dates[0], dates[2]], :]  # specific rows

	A	B	C	D
2013-01-01	0.754077	-0.346202	-0.557050	0.778106
2013-01-03	0.174730	2.056007	1.781379	1.643397

df.loc[:, ['A', 'B']]  # specific columns

	A	B
2013-01-01	0.754077	-0.346202
2013-01-02	0.103394	-1.051044
2013-01-03	0.174730	2.056007
2013-01-04	-0.950517	-0.226887
2013-01-05	0.076178	-0.518970
2013-01-06	1.371702	-1.028873
2013-01-07	0.126720	-0.251519
2013-01-08	-1.246918	1.530266
2013-01-09	0.941099	-2.420932
2013-01-10	1.951555	-0.264012

df.loc[[dates[0], dates[2]], ['A', 'B']]  # specific rows and columns

	A	B
2013-01-01	0.754077	-0.346202
2013-01-03	0.174730	2.056007

# Slicing
df.loc['2013-01-01':'2013-01-04', :]  # slice rows

	A	B	C	D
2013-01-01	0.754077	-0.346202	-0.557050	0.778106
2013-01-02	0.103394	-1.051044	-0.413054	0.268955
2013-01-03	0.174730	2.056007	1.781379	1.643397
2013-01-04	-0.950517	-0.226887	-0.097138	-0.442010

df.loc[:, 'A':'C']  # slice columns

	A	B	C
2013-01-01	0.754077	-0.346202	-0.557050
2013-01-02	0.103394	-1.051044	-0.413054
2013-01-03	0.174730	2.056007	1.781379
2013-01-04	-0.950517	-0.226887	-0.097138
2013-01-05	0.076178	-0.518970	1.142290
2013-01-06	1.371702	-1.028873	-1.470106
2013-01-07	0.126720	-0.251519	-2.212507
2013-01-08	-1.246918	1.530266	1.761499
2013-01-09	0.941099	-2.420932	1.927863
2013-01-10	1.951555	-0.264012	-0.171690

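# Slicing selects a contiguous block (rows, then columns).
# Label slices with .loc include BOTH endpoints: '2013-01-04' and 'C' are part of the result.
df.loc['2013-01-01':'2013-01-04', 'A':'C']

	A	B	C
2013-01-01	0.754077	-0.346202	-0.557050
2013-01-02	0.103394	-1.051044	-0.413054
2013-01-03	0.174730	2.056007	1.781379
2013-01-04	-0.950517	-0.226887	-0.097138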
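The endpoint rule for label slices can be confirmed with deterministic data. The sketch below takes the same block with .loc and with .iloc. Because .loc includes the stop label, the equivalent .iloc slice needs stop positions one past the last wanted row and column.

```python
import numpy as np
import pandas as pd

dates = pd.date_range('20130101', periods=10)
df = pd.DataFrame(np.arange(40.0).reshape(10, 4), index=dates, columns=['A', 'B', 'C', 'D'])

block = df.loc['2013-01-01':'2013-01-04', 'A':'C']
print(block.shape)        # (4, 3): stop label '2013-01-04' and 'C' are included

# The same block by position needs stop = last position + 1
same = df.iloc[0:4, 0:3]
print(block.equals(same))  # True
```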

.iloc selects data by integer position (the default 0-based index)

# Select one row
df.iloc[3]
A   -0.950517
B   -0.226887
C   -0.097138
D   -0.442010
Name: 2013-01-04 00:00:00, dtype: float64

# Select one column
df.iloc[:, 2]
2013-01-01   -0.557050
2013-01-02   -0.413054
2013-01-03    1.781379
2013-01-04   -0.097138
2013-01-05    1.142290
2013-01-06   -1.470106
2013-01-07   -2.212507
2013-01-08    1.761499
2013-01-09    1.927863
2013-01-10   -0.171690
Freq: D, Name: C, dtype: float64

# Select a single value
df.iloc[1, 2]
-0.41305425875508139

# Select specific rows / columns
df.iloc[[1, 2, 4], :]  # specific rows

	A	B	C	D
2013-01-02	0.103394	-1.051044	-0.413054	0.268955
2013-01-03	0.174730	2.056007	1.781379	1.643397
2013-01-05	0.076178	-0.518970	1.142290	-0.952401

df.iloc[:, [0, 2]]  # specific columns

	A	C
2013-01-01	0.754077	-0.557050
2013-01-02	0.103394	-0.413054
2013-01-03	0.174730	1.781379
2013-01-04	-0.950517	-0.097138
2013-01-05	0.076178	1.142290
2013-01-06	1.371702	-1.470106
2013-01-07	0.126720	-2.212507
2013-01-08	-1.246918	1.761499
2013-01-09	0.941099	1.927863
2013-01-10	1.951555	-0.171690

df.iloc[[1, 2, 4], [0, 2]]  # specific rows and columns: rows first, then columns

	A	C
2013-01-02	0.103394	-0.413054
2013-01-03	0.174730	1.781379
2013-01-05	0.076178	1.142290

# Slicing
df.iloc[1:3, :]  # slice rows

	A	B	C	D
2013-01-02	0.103394	-1.051044	-0.413054	0.268955
2013-01-03	0.174730	2.056007	1.781379	1.643397

df.iloc[:, 1:3]  # slice columns

	B	C
2013-01-01	-0.346202	-0.557050
2013-01-02	-1.051044	-0.413054
2013-01-03	2.056007	1.781379
2013-01-04	-0.226887	-0.097138
2013-01-05	-0.518970	1.142290
2013-01-06	-1.028873	-1.470106
2013-01-07	-0.251519	-2.212507
2013-01-08	1.530266	1.761499
2013-01-09	-2.420932	1.927863
2013-01-10	-0.264012	-0.171690

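# Slicing selects a contiguous block (rows, then columns).
# Position slices with .iloc include the start and exclude the stop, as in Python: 3:5 means positions 3 and 4.
df.iloc[3:5, 0:2]

	A	B
2013-01-04	-0.950517	-0.226887
2013-01-05	0.076178	-0.518970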
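.iloc follows Python list conventions in other respects too. The sketch below uses deterministic data to show the half-open slice, negative positions and clipping. A slice that runs past the end is clipped. A single out-of-range position raises IndexError instead.

```python
import numpy as np
import pandas as pd

dates = pd.date_range('20130101', periods=10)
df = pd.DataFrame(np.arange(40.0).reshape(10, 4), index=dates, columns=['A', 'B', 'C', 'D'])

block = df.iloc[3:5, 0:2]    # rows 3, 4 and columns 0, 1; stops are excluded
print(block.shape)           # (2, 2)

last_row = df.iloc[-1]       # negative positions count from the end
print(last_row.name.date())  # 2013-01-10

print(len(df.iloc[8:100]))   # 2: slices past the end are clipped
try:
    df.iloc[100]
    out_of_bounds = False
except IndexError:
    out_of_bounds = True
print(out_of_bounds)         # True
```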

Boolean indexing

# Select rows based on the values of one column
df[df.A > 0]

	A	B	C	D
2013-01-01	0.754077	-0.346202	-0.557050	0.778106
2013-01-02	0.103394	-1.051044	-0.413054	0.268955
2013-01-03	0.174730	2.056007	1.781379	1.643397
2013-01-05	0.076178	-0.518970	1.142290	-0.952401
2013-01-06	1.371702	-1.028873	-1.470106	-0.113098
2013-01-07	0.126720	-0.251519	-2.212507	1.050036
2013-01-09	0.941099	-2.420932	1.927863	-0.549143
2013-01-10	1.951555	-0.264012	-0.171690	0.869293

# Index with a boolean DataFrame (same result as df.where(df > 0)):
# entries that fail the condition become NaN, and the shape is unchanged
b = df[df > 0]
b

	A	B	C	D
2013-01-01	0.754077	NaN	NaN	0.778106
2013-01-02	0.103394	NaN	NaN	0.268955
2013-01-03	0.174730	2.056007	1.781379	1.643397
2013-01-04	NaN	NaN	NaN	NaN
2013-01-05	0.076178	NaN	1.142290	NaN
2013-01-06	1.371702	NaN	NaN	NaN
2013-01-07	0.126720	NaN	NaN	1.050036
2013-01-08	NaN	1.530266	1.761499	0.940741
2013-01-09	0.941099	NaN	1.927863	NaN
2013-01-10	1.951555	NaN	NaN	0.869293

type(b['A']['2013-01-01'])
numpy.float64

# Filter with isin()
df2 = df.copy()
df2['E'] = ['one', 'one', 'two', 'three', 'four', 'three', 'five', 'four', 'three', 'five']
df2

	A	B	C	D	E
2013-01-01	0.754077	-0.346202	-0.557050	0.778106	one
2013-01-02	0.103394	-1.051044	-0.413054	0.268955	one
2013-01-03	0.174730	2.056007	1.781379	1.643397	two
2013-01-04	-0.950517	-0.226887	-0.097138	-0.442010	three
2013-01-05	0.076178	-0.518970	1.142290	-0.952401	four
2013-01-06	1.371702	-1.028873	-1.470106	-0.113098	three
2013-01-07	0.126720	-0.251519	-2.212507	1.050036	five
2013-01-08	-1.246918	1.530266	1.761499	0.940741	four
2013-01-09	0.941099	-2.420932	1.927863	-0.549143	three
2013-01-10	1.951555	-0.264012	-0.171690	0.869293	five

df2['E'].isin(['one', 'four'])
2013-01-01     True
2013-01-02     True
2013-01-03    False
2013-01-04    False
2013-01-05     True
2013-01-06    False
2013-01-07    False
2013-01-08     True
2013-01-09    False
2013-01-10    False
Freq: D, Name: E, dtype: bool

df2[df2['E'].isin(['one', 'four'])]

	A	B	C	D	E
2013-01-01	0.754077	-0.346202	-0.557050	0.778106	one
2013-01-02	0.103394	-1.051044	-0.413054	0.268955	one
2013-01-05	0.076178	-0.518970	1.142290	-0.952401	four
2013-01-08	-1.246918	1.530266	1.761499	0.940741	four
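Boolean masks such as df.A > 0 and isin() can be combined. Use & (and), | (or) and ~ (not), and put each comparison in its own parentheses. The Python keywords and/or do not work element-wise. The following is a small sketch with made-up values.

```python
import pandas as pd

dates = pd.date_range('20130101', periods=6)
df = pd.DataFrame({'A': [1.0, -2.0, 3.0, -4.0, 5.0, -6.0],
                   'E': ['one', 'one', 'two', 'three', 'four', 'three']},
                  index=dates)

# Both conditions must hold: A positive AND E in the list
mask = (df['A'] > 0) & df['E'].isin(['one', 'four'])
picked = df[mask]
print(picked.index.strftime('%Y-%m-%d').tolist())  # ['2013-01-01', '2013-01-05']

# ~ negates a mask: rows whose E is NOT 'one' or 'four'
rest = df[~df['E'].isin(['one', 'four'])]
print(len(rest))  # 3
```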

Create (add data)

s1 = pd.Series([1, 2, 3, 4, 5, 6], index=pd.date_range('20130102', periods=6))
s1
2013-01-02    1
2013-01-03    2
2013-01-04    3
2013-01-05    4
2013-01-06    5
2013-01-07    6
Freq: D, dtype: int64

# Add a new column. Values are aligned on the row index, so dates missing from s1 become NaN
df2['F'] = s1
df2

	A	B	C	D	E	F
2013-01-01	0.754077	-0.346202	-0.557050	0.778106	one	NaN
2013-01-02	0.103394	-1.051044	-0.413054	0.268955	one	1.0
2013-01-03	0.174730	2.056007	1.781379	1.643397	two	2.0
2013-01-04	-0.950517	-0.226887	-0.097138	-0.442010	three	3.0
2013-01-05	0.076178	-0.518970	1.142290	-0.952401	four	4.0
2013-01-06	1.371702	-1.028873	-1.470106	-0.113098	three	5.0
2013-01-07	0.126720	-0.251519	-2.212507	1.050036	five	6.0
2013-01-08	-1.246918	1.530266	1.761499	0.940741	four	NaN
2013-01-09	0.941099	-2.420932	1.927863	-0.549143	three	NaN
2013-01-10	1.951555	-0.264012	-0.171690	0.869293	five	NaN
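The right-hand side of a new-column assignment can take several forms, and each one behaves differently. The sketch below uses made-up data and hypothetical column names F to I. A Series is aligned by label. A scalar is broadcast to every row. A plain list is assigned by position and must match the number of rows. assign() returns a new DataFrame and leaves the original unchanged.

```python
import pandas as pd

idx = pd.date_range('20130101', periods=4)
df = pd.DataFrame({'A': [1.0, 2.0, 3.0, 4.0]}, index=idx)

# Series: aligned on the index, only 2013-01-02 and 2013-01-03 match
df['F'] = pd.Series([10, 20], index=pd.date_range('20130102', periods=2))
# Scalar: broadcast to every row
df['G'] = 0
# List: assigned by position, length must equal len(df)
df['H'] = ['w', 'x', 'y', 'z']
# assign(): returns a new DataFrame; df itself gets no column I
df_new = df.assign(I=df['A'] * 2)

print(df['F'].tolist())   # [nan, 10.0, 20.0, nan]
print('I' in df.columns)  # False
```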

Update

# Update a whole column
df2.loc[:, 'D']
2013-01-01    0.778106
2013-01-02    0.268955
2013-01-03    1.643397
2013-01-04   -0.442010
2013-01-05   -0.952401
2013-01-06   -0.113098
2013-01-07    1.050036
2013-01-08    0.940741
2013-01-09   -0.549143
2013-01-10    0.869293
Freq: D, Name: D, dtype: float64

df2.loc[:, 'D'] = 5  # broadcast the scalar 5 to every row of column D
df2

	A	B	C	D	E	F
2013-01-01	0.754077	-0.346202	-0.557050	5	one	NaN
2013-01-02	0.103394	-1.051044	-0.413054	5	one	1.0
2013-01-03	0.174730	2.056007	1.781379	5	two	2.0
2013-01-04	-0.950517	-0.226887	-0.097138	5	three	3.0
2013-01-05	0.076178	-0.518970	1.142290	5	four	4.0
2013-01-06	1.371702	-1.028873	-1.470106	5	three	5.0
2013-01-07	0.126720	-0.251519	-2.212507	5	five	6.0
2013-01-08	-1.246918	1.530266	1.761499	5	four	NaN
2013-01-09	0.941099	-2.420932	1.927863	5	three	NaN
2013-01-10	1.951555	-0.264012	-0.171690	5	five	NaN

This output comes from an older pandas that replaced column D with integers. pandas 2.x writes the value into the existing float64 column instead, so D is displayed as 5.0.

df2.iloc[1, 3]
5

df2.iloc[1, 3] = 10.1  # update a single cell by position (row 1, column D)
df2

	A	B	C	D	E	F
2013-01-01	0.754077	-0.346202	-0.557050	5.0	one	NaN
2013-01-02	0.103394	-1.051044	-0.413054	10.1	one	1.0
2013-01-03	0.174730	2.056007	1.781379	5.0	two	2.0
2013-01-04	-0.950517	-0.226887	-0.097138	5.0	three	3.0
2013-01-05	0.076178	-0.518970	1.142290	5.0	four	4.0
2013-01-06	1.371702	-1.028873	-1.470106	5.0	three	5.0
2013-01-07	0.126720	-0.251519	-2.212507	5.0	five	6.0
2013-01-08	-1.246918	1.530266	1.761499	5.0	four	NaN
2013-01-09	0.941099	-2.420932	1.927863	5.0	three	NaN
2013-01-10	1.951555	-0.264012	-0.171690	5.0	five	NaN

# Update through a boolean condition (where-style)
df3 = df.copy()
df3[df3 > 0] = -df3  # negate every positive value, so all entries end up negative
df3

	A	B	C	D
2013-01-01	-0.754077	-0.346202	-0.557050	-0.778106
2013-01-02	-0.103394	-1.051044	-0.413054	-0.268955
2013-01-03	-0.174730	-2.056007	-1.781379	-1.643397
2013-01-04	-0.950517	-0.226887	-0.097138	-0.442010
2013-01-05	-0.076178	-0.518970	-1.142290	-0.952401
2013-01-06	-1.371702	-1.028873	-1.470106	-0.113098
2013-01-07	-0.126720	-0.251519	-2.212507	-1.050036
2013-01-08	-1.246918	-1.530266	-1.761499	-0.940741
2013-01-09	-0.941099	-2.420932	-1.927863	-0.549143
2013-01-10	-1.951555	-0.264012	-0.171690	-0.869293
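The in-place boolean assignment above has non-mutating counterparts in pandas. mask() replaces values where the condition is True. where() keeps values where the condition is True and replaces the rest. The sketch below checks with made-up data that all three approaches agree. For nonzero data they also match -df.abs().

```python
import pandas as pd

df = pd.DataFrame({'A': [1.5, -0.5, 2.0], 'B': [-1.0, 3.0, -2.0]})

# In-place style, as in the example above
df3 = df.copy()
df3[df3 > 0] = -df3

# Non-mutating equivalents that return new DataFrames
via_mask = df.mask(df > 0, -df)    # replace where the condition holds
via_where = df.where(df <= 0, -df) # keep where the condition holds

print(df3.equals(via_mask), df3.equals(via_where))  # True True
print(df3.equals(-df.abs()))                        # True
```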
