Pandas Basics Review: DataFrame

Published: 2025/6/15, author: 豆豆
Collected and compiled by 生活随笔 and shared here for reference.

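Data type: DataFrame

  • A DataFrame is a tabular data type made of several Series columns, and all of those columns share one common row index.
  • It has both a row index and a column index:
    • The row index (index) labels the rows. It is axis 0 (axis=0).
    • The column index (columns) labels the columns. It is axis 1 (axis=1).
  • A DataFrame can be thought of as a two-dimensional, labeled array.
  • Each column can have a different data type.
  • Basic operations work much like those on a Series and are driven by the row and column labels.
  • It is mostly used for two-dimensional data, but it can also represent higher-dimensional data (nesting DataFrames is possible but very rarely done; a hierarchical MultiIndex is the usual approach).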
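The axis numbering is easiest to see with an aggregation. The sketch below uses a small made-up frame: `axis=0` collapses the rows and returns one value per column, while `axis=1` collapses the columns and returns one value per row.

```python
import pandas as pd

# A small illustrative frame: two columns sharing the row index ['a', 'b', 'c']
df = pd.DataFrame({'one': [1, 2, 3], 'two': [10, 20, 30]}, index=['a', 'b', 'c'])

print(df.index.tolist())    # ['a', 'b', 'c']  row labels, axis 0
print(df.columns.tolist())  # ['one', 'two']   column labels, axis 1

col_totals = df.sum(axis=0)  # collapse the rows: one total per column
row_totals = df.sum(axis=1)  # collapse the columns: one total per row
print(col_totals.tolist())   # [6, 60]
print(row_totals.tolist())   # [11, 22, 33]
```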

Creating a DataFrame

From a Python list

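    import pandas as pd

    df = pd.DataFrame([True, 1, 2.3, 'a', '你好'])  # 1-D list: one column, one row per element
    df
          0
    0  True
    1     1
    2   2.3
    3     a
    4    你好

    df = pd.DataFrame([[True, 1, 2.3, 'a', '你好'], [1, 2, 3, 4, 5]])  # 2-D list: each inner list is a row
    df
          0  1    2  3   4
    0  True  1  2.3  a  你好
    1     1  2  3.0  4   5

    # 3-D list: not recommended, because the innermost lists simply become cell values
    df = pd.DataFrame([[[True, 1, 2.3, 'a', '你好'], [1, 2, 3, 4, 5]],
                       [[True, 1, 2.3, 'a', '你好'], [1, 2, 3, 4, 5]]])
    df
                             0                1
    0  [True, 1, 2.3, a, 你好]  [1, 2, 3, 4, 5]
    1  [True, 1, 2.3, a, 你好]  [1, 2, 3, 4, 5]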
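The 2-D output above shows `3.0` in column 2 because pandas infers a dtype separately for each column. The sketch below reuses the same lists to check the shapes and a few of the inferred dtypes. It assumes a recent pandas, and it only asserts the integer, float and object cases.

```python
import pandas as pd

# 1-D list: a single column labelled 0, one row per element
one_col = pd.DataFrame([True, 1, 2.3, 'a', '你好'])

# 2-D list: each inner list becomes one row
two_d = pd.DataFrame([[True, 1, 2.3, 'a', '你好'], [1, 2, 3, 4, 5]])

print(one_col.shape)  # (5, 1)
print(two_d.shape)    # (2, 5)
# dtypes are inferred column by column: column 1 holds only ints, column 2 mixes 2.3 and 3
print(two_d.dtypes)
```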

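From a Python dict

    df = pd.DataFrame({'one': [1, 2, 3, 4], 'two': [9, 8, 7, 6]})
    df
       one  two
    0    1    9
    1    2    8
    2    3    7
    3    4    6

    # Custom row index
    df = pd.DataFrame({'one': [1, 2, 3, 4], 'two': [9, 8, 7, 6]}, index=['a', 'b', 'c', 'd'])
    df
       one  two
    a    1    9
    b    2    8
    c    3    7
    d    4    6

    # Scalars are broadcast to every row. At least one list-like value
    # (or an explicit index) is needed to fix the number of rows.
    df = pd.DataFrame({'A': 1, 'B': 2.3, 'C': ['x', 'y', 5]})
    df
       A    B  C
    0  1  2.3  x
    1  1  2.3  y
    2  1  2.3  5

    dt = {'one': pd.Series([1, 2, 3], index=['a', 'b', 'c']),
          'two': pd.Series([9, 8, 7, 6], index=['a', 'b', 'c', 'd'])}
    dt
    {'one': a    1
    b    2
    c    3
    dtype: int64, 'two': a    9
    b    8
    c    7
    d    6
    dtype: int64}

    # The dict keys 'one' and 'two' become the column labels, and the Series labels a, b, c, d
    # become the row labels: each dict entry is one column, and each label inside a Series is one row.
    d = pd.DataFrame(dt)
    d
       one  two
    a  1.0    9
    b  2.0    8
    c  3.0    7
    d  NaN    6

    # Data is aligned on the row and column labels; missing positions are filled with NaN
    d_2 = pd.DataFrame(dt, index=['b', 'c', 'd'], columns=['two', 'three'])
    d_2
       two three
    b    8   NaN
    c    7   NaN
    d    6   NaN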
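The alignment rule can be checked directly. This sketch rebuilds the same `dt` dict and asserts where the NaN values appear. It also shows why column `one` prints as `1.0`, `2.0`, `3.0`: holding a NaN upcasts an integer column to float64.

```python
import pandas as pd

dt = {
    'one': pd.Series([1, 2, 3], index=['a', 'b', 'c']),
    'two': pd.Series([9, 8, 7, 6], index=['a', 'b', 'c', 'd']),
}
d = pd.DataFrame(dt)

# The row index is the union of the two Series indexes
print(d.index.tolist())          # ['a', 'b', 'c', 'd']
# 'one' has no value for 'd', so that cell is NaN and the column becomes float64
print(d['one'].isna().tolist())  # [False, False, False, True]
print(d['one'].dtype)            # float64

# index= picks rows, columns= picks columns; a label that does not exist ('three') is all NaN
d_2 = pd.DataFrame(dt, index=['b', 'c', 'd'], columns=['two', 'three'])
print(d_2['three'].isna().all())  # True
```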

From a NumPy ndarray

    import numpy as np

    df = pd.DataFrame(np.arange(10).reshape(2, 5))  # row and column labels are generated automatically
    df
       0  1  2  3  4
    0  0  1  2  3  4
    1  5  6  7  8  9

    # Custom row and column labels (random values, so your output will differ)
    df = pd.DataFrame(np.random.randn(6, 4), index=[1, 2, 3, 4, 5, 6], columns=['a', 'b', 'c', 'd'])
    df
              a         b         c         d
    1  0.274340  0.296507  0.751198  0.763512
    2  0.181134  0.675380  0.553695  0.632163
    3 -0.059765  0.347702  1.138297 -0.143998
    4 -1.370677 -0.951640  0.135964 -0.665875
    5  1.490610  0.420539  0.628784  2.119896
    6 -1.669737  1.167765  1.254722 -0.948624
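Because `np.random.randn` draws fresh numbers on every run, the table above cannot be reproduced exactly. A seeded `Generator` makes the numbers repeatable (assumption: NumPy 1.17 or newer, which provides `np.random.default_rng`). The checks below confirm that the frame keeps the array's shape and values, and that every column shares the array's dtype.

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(0)      # seeded, so every run produces the same numbers
arr = rng.standard_normal((6, 4))   # 6 rows x 4 columns of float64

df = pd.DataFrame(arr, index=[1, 2, 3, 4, 5, 6], columns=['a', 'b', 'c', 'd'])

print(df.shape)                              # (6, 4)
print(np.array_equal(df.to_numpy(), arr))    # True: same values as the source array
print(df.dtypes.unique())                    # a single dtype, float64, shared by all columns
```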

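From Series

    # In a list of Series, each Series becomes one row, and the Series labels (0 to 3) become the columns
    e = pd.DataFrame([pd.Series([1, 2, 3]), pd.Series([9, 8, 7, 6])], index=['a', 'b'])
    e
         0    1    2    3
    a  1.0  2.0  3.0  NaN
    b  9.0  8.0  7.0  6.0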
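The orientation depends on the container. In a list, each Series becomes a row. In a dict, each Series becomes a column. The sketch below builds both from the same two Series so the shapes can be compared.

```python
import pandas as pd

s1 = pd.Series([1, 2, 3])
s2 = pd.Series([9, 8, 7, 6])

as_rows = pd.DataFrame([s1, s2], index=['a', 'b'])  # list of Series: one row per Series
as_cols = pd.DataFrame({'a': s1, 'b': s2})          # dict of Series: one column per Series

print(as_rows.shape)  # (2, 4)
print(as_cols.shape)  # (4, 2)
# s1 has no label 3, so column 'a' gets NaN in that row
print(as_cols['a'].isna().tolist())  # [False, False, False, True]
```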

DataFrame attributes

    # Columns: 姓名 = name, 性別 = gender, 年齡 = age, 地址 = address
    di = {'姓名': ['張三', '李四', '王五', '趙六'],
          '性別': ['男', '女', '女', '男'],
          '年齡': [12, 22, 32, 42],
          '地址': ['北京', '上海', '廣州', '深圳']}
    di
    {'地址': ['北京', '上海', '廣州', '深圳'], '姓名': ['張三', '李四', '王五', '趙六'], '年齡': [12, 22, 32, 42], '性別': ['男', '女', '女', '男']}
    # Note: the sorted key order shown here and in the outputs below comes from an older Python/pandas.
    # With Python 3.7+ and pandas 0.23+, insertion order is preserved, so the columns appear as
    # 姓名, 性別, 年齡, 地址.

    d = pd.DataFrame(di, index=['d1', 'd2', 'd3', 'd4'])
    d
        地址  姓名  年齡 性別
    d1  北京  張三  12  男
    d2  上海  李四  22  女
    d3  廣州  王五  32  女
    d4  深圳  趙六  42  男

    d.head()   # first rows (5 by default, so all four rows here, same table as above)

    d.tail(3)  # last 3 rows
        地址  姓名  年齡 性別
    d2  上海  李四  22  女
    d3  廣州  王五  32  女
    d4  深圳  趙六  42  男

    d.info()   # overview of the index, columns, dtypes and memory use
    <class 'pandas.core.frame.DataFrame'>
    Index: 4 entries, d1 to d4
    Data columns (total 4 columns):
    地址    4 non-null object
    姓名    4 non-null object
    年齡    4 non-null int64
    性別    4 non-null object
    dtypes: int64(1), object(3)
    memory usage: 160.0+ bytes

    d.shape    # (number of rows, number of columns)
    (4, 4)

    d.dtypes   # data type of each column
    地址    object
    姓名    object
    年齡     int64
    性別    object
    dtype: object

    d.index    # row labels
    Index(['d1', 'd2', 'd3', 'd4'], dtype='object')

    d.columns  # column labels
    Index(['地址', '姓名', '年齡', '性別'], dtype='object')

    d.values   # the underlying values as a NumPy array
    array([['北京', '張三', 12, '男'],
           ['上海', '李四', 22, '女'],
           ['廣州', '王五', 32, '女'],
           ['深圳', '趙六', 42, '男']], dtype=object)
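Here is a compact, checkable version of the attribute tour. The English column names are hypothetical stand-ins for the 姓名 and 年齡 columns. `to_numpy()` is used because the pandas documentation recommends it over `.values` from version 0.24 onward.

```python
import pandas as pd

# Hypothetical English stand-ins for the 姓名 (name) and 年齡 (age) columns above
d = pd.DataFrame({'name': ['Zhang San', 'Li Si', 'Wang Wu', 'Zhao Liu'],
                  'age': [12, 22, 32, 42]},
                 index=['d1', 'd2', 'd3', 'd4'])

print(d.shape)                   # (4, 2)
print(d.columns.tolist())        # ['name', 'age'], the dict's insertion order
print(d.head(2).index.tolist())  # ['d1', 'd2']
print(d.tail(1).index.tolist())  # ['d4']

arr = d.to_numpy()               # preferred over d.values in current pandas
print(arr.dtype)                 # object, because the columns mix strings and integers
```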

DataFrame: read, create, update, delete

Read

List/ndarray-style access

    dates = pd.date_range('20130101', periods=10)
    dates
    DatetimeIndex(['2013-01-01', '2013-01-02', '2013-01-03', '2013-01-04',
                   '2013-01-05', '2013-01-06', '2013-01-07', '2013-01-08',
                   '2013-01-09', '2013-01-10'],
                  dtype='datetime64[ns]', freq='D')

    # Random values; every table in the rest of this section uses this particular draw
    df = pd.DataFrame(np.random.randn(10, 4), index=dates, columns=['A', 'B', 'C', 'D'])
    df
                       A         B         C         D
    2013-01-01  0.754077 -0.346202 -0.557050  0.778106
    2013-01-02  0.103394 -1.051044 -0.413054  0.268955
    2013-01-03  0.174730  2.056007  1.781379  1.643397
    2013-01-04 -0.950517 -0.226887 -0.097138 -0.442010
    2013-01-05  0.076178 -0.518970  1.142290 -0.952401
    2013-01-06  1.371702 -1.028873 -1.470106 -0.113098
    2013-01-07  0.126720 -0.251519 -2.212507  1.050036
    2013-01-08 -1.246918  1.530266  1.761499  0.940741
    2013-01-09  0.941099 -2.420932  1.927863 -0.549143
    2013-01-10  1.951555 -0.264012 -0.171690  0.869293
    # Indexing: select one column
    df['A']
    2013-01-01    0.754077
    2013-01-02    0.103394
    2013-01-03    0.174730
    2013-01-04   -0.950517
    2013-01-05    0.076178
    2013-01-06    1.371702
    2013-01-07    0.126720
    2013-01-08   -1.246918
    2013-01-09    0.941099
    2013-01-10    1.951555
    Freq: D, Name: A, dtype: float64

    df.A  # attribute access returns the same Series as df['A']

    df['A']['2013-01-01']  # column first, then row
    0.75407705661157032
    df.A['2013-01-01']
    0.75407705661157032

    df[['A', 'C']]  # a list of column labels returns a DataFrame
                       A         C
    2013-01-01  0.754077 -0.557050
    2013-01-02  0.103394 -0.413054
    2013-01-03  0.174730  1.781379
    2013-01-04 -0.950517 -0.097138
    2013-01-05  0.076178  1.142290
    2013-01-06  1.371702 -1.470106
    2013-01-07  0.126720 -2.212507
    2013-01-08 -1.246918  1.761499
    2013-01-09  0.941099  1.927863
    2013-01-10  1.951555 -0.171690
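What plain `[]` returns depends on what goes inside the brackets. A single label gives a Series, a list of labels gives a DataFrame (even for one label), and a slice selects rows rather than columns. The small frame below is illustrative only.

```python
import pandas as pd

df = pd.DataFrame({'A': [1, 2, 3], 'B': [4, 5, 6]}, index=['x', 'y', 'z'])

col = df['A']    # one label -> Series
sub = df[['A']]  # list of labels -> DataFrame, even with a single label
rows = df[0:2]   # a slice inside [] selects rows by position, not columns

print(type(col).__name__, type(sub).__name__)  # Series DataFrame
print(rows.index.tolist())                     # ['x', 'y']
```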
Pandas-specific access: .loc selects by label (the custom index)

    # Select one row
    df.loc['2013-01-01']
    A    0.754077
    B   -0.346202
    C   -0.557050
    D    0.778106
    Name: 2013-01-01 00:00:00, dtype: float64

    # Select one column (returns the same Series as df['A'])
    df.loc[:, 'A']

    # Select a single value: row first, then column
    df.loc['2013-01-01', 'A']
    0.75407705661157032

    # Select specific rows and/or columns
    df.loc[[dates[0], dates[2]], :]  # specific rows
                       A         B         C         D
    2013-01-01  0.754077 -0.346202 -0.557050  0.778106
    2013-01-03  0.174730  2.056007  1.781379  1.643397

    df.loc[:, ['A', 'B']]  # specific columns
                       A         B
    2013-01-01  0.754077 -0.346202
    2013-01-02  0.103394 -1.051044
    2013-01-03  0.174730  2.056007
    2013-01-04 -0.950517 -0.226887
    2013-01-05  0.076178 -0.518970
    2013-01-06  1.371702 -1.028873
    2013-01-07  0.126720 -0.251519
    2013-01-08 -1.246918  1.530266
    2013-01-09  0.941099 -2.420932
    2013-01-10  1.951555 -0.264012

    df.loc[[dates[0], dates[2]], ['A', 'B']]  # specific rows and columns
                       A         B
    2013-01-01  0.754077 -0.346202
    2013-01-03  0.174730  2.056007
    # Slicing
    df.loc['2013-01-01':'2013-01-04', :]  # slice the rows
                       A         B         C         D
    2013-01-01  0.754077 -0.346202 -0.557050  0.778106
    2013-01-02  0.103394 -1.051044 -0.413054  0.268955
    2013-01-03  0.174730  2.056007  1.781379  1.643397
    2013-01-04 -0.950517 -0.226887 -0.097138 -0.442010

    df.loc[:, 'A':'C']  # slice the columns
                       A         B         C
    2013-01-01  0.754077 -0.346202 -0.557050
    2013-01-02  0.103394 -1.051044 -0.413054
    2013-01-03  0.174730  2.056007  1.781379
    2013-01-04 -0.950517 -0.226887 -0.097138
    2013-01-05  0.076178 -0.518970  1.142290
    2013-01-06  1.371702 -1.028873 -1.470106
    2013-01-07  0.126720 -0.251519 -2.212507
    2013-01-08 -1.246918  1.530266  1.761499
    2013-01-09  0.941099 -2.420932  1.927863
    2013-01-10  1.951555 -0.264012 -0.171690

    # Slices select a contiguous block (rows, then columns).
    # With .loc, BOTH endpoints are included: '2013-01-04' and 'C' are part of the result.
    df.loc['2013-01-01':'2013-01-04', 'A':'C']
                       A         B         C
    2013-01-01  0.754077 -0.346202 -0.557050
    2013-01-02  0.103394 -1.051044 -0.413054
    2013-01-03  0.174730  2.056007  1.781379
    2013-01-04 -0.950517 -0.226887 -0.097138
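Because label slicing with `.loc` includes the stop label, a slice like `'a':'c'` returns three rows, not two. This small sketch with made-up labels makes the inclusive behaviour explicit.

```python
import pandas as pd

df = pd.DataFrame({'A': [10, 20, 30, 40], 'B': [1, 2, 3, 4], 'C': [5, 6, 7, 8]},
                  index=['a', 'b', 'c', 'd'])

block = df.loc['a':'c', 'A':'B']  # label slices include BOTH endpoints
print(block.index.tolist())       # ['a', 'b', 'c']
print(block.columns.tolist())     # ['A', 'B']
```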

.iloc selects by integer position (the default 0-based index)

    # Select one row
    df.iloc[3]
    A   -0.950517
    B   -0.226887
    C   -0.097138
    D   -0.442010
    Name: 2013-01-04 00:00:00, dtype: float64

    # Select one column
    df.iloc[:, 2]
    2013-01-01   -0.557050
    2013-01-02   -0.413054
    2013-01-03    1.781379
    2013-01-04   -0.097138
    2013-01-05    1.142290
    2013-01-06   -1.470106
    2013-01-07   -2.212507
    2013-01-08    1.761499
    2013-01-09    1.927863
    2013-01-10   -0.171690
    Freq: D, Name: C, dtype: float64

    # Select a single value
    df.iloc[1, 2]
    -0.41305425875508139

    # Select specific rows and/or columns
    df.iloc[[1, 2, 4], :]  # specific rows
                       A         B         C         D
    2013-01-02  0.103394 -1.051044 -0.413054  0.268955
    2013-01-03  0.174730  2.056007  1.781379  1.643397
    2013-01-05  0.076178 -0.518970  1.142290 -0.952401

    df.iloc[:, [0, 2]]  # specific columns (same result as df[['A', 'C']])

    df.iloc[[1, 2, 4], [0, 2]]  # specific rows and columns, rows first
                       A         C
    2013-01-02  0.103394 -0.413054
    2013-01-03  0.174730  1.781379
    2013-01-05  0.076178  1.142290

    # Slicing
    df.iloc[1:3, :]  # slice the rows
                       A         B         C         D
    2013-01-02  0.103394 -1.051044 -0.413054  0.268955
    2013-01-03  0.174730  2.056007  1.781379  1.643397

    df.iloc[:, 1:3]  # slice the columns
                       B         C
    2013-01-01 -0.346202 -0.557050
    2013-01-02 -1.051044 -0.413054
    2013-01-03  2.056007  1.781379
    2013-01-04 -0.226887 -0.097138
    2013-01-05 -0.518970  1.142290
    2013-01-06 -1.028873 -1.470106
    2013-01-07 -0.251519 -2.212507
    2013-01-08  1.530266  1.761499
    2013-01-09 -2.420932  1.927863
    2013-01-10 -0.264012 -0.171690

    # Slices select a contiguous block (rows, then columns).
    # With .iloc the start is included and the stop is excluded, as in ordinary Python slicing.
    df.iloc[3:5, 0:2]
                       A         B
    2013-01-04 -0.950517 -0.226887
    2013-01-05  0.076178 -0.518970
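The two slicing rules are easiest to confuse when the row labels are themselves integers, as in the ndarray example with index `[1, 2, 3, 4, 5, 6]`. The illustrative frame below shows that `1:3` means different rows under `.loc` and `.iloc`.

```python
import pandas as pd

# Integer labels 1..6 make the label/position difference visible
df = pd.DataFrame({'a': range(10, 70, 10)}, index=[1, 2, 3, 4, 5, 6])

by_label = df.loc[1:3]      # labels 1, 2, 3 (stop label included)
by_position = df.iloc[1:3]  # positions 1, 2, i.e. labels 2 and 3 (stop position excluded)

print(by_label.index.tolist())     # [1, 2, 3]
print(by_position.index.tolist())  # [2, 3]
```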

Boolean indexing

    # Select rows by a condition on one column
    df[df.A > 0]
                       A         B         C         D
    2013-01-01  0.754077 -0.346202 -0.557050  0.778106
    2013-01-02  0.103394 -1.051044 -0.413054  0.268955
    2013-01-03  0.174730  2.056007  1.781379  1.643397
    2013-01-05  0.076178 -0.518970  1.142290 -0.952401
    2013-01-06  1.371702 -1.028873 -1.470106 -0.113098
    2013-01-07  0.126720 -0.251519 -2.212507  1.050036
    2013-01-09  0.941099 -2.420932  1.927863 -0.549143
    2013-01-10  1.951555 -0.264012 -0.171690  0.869293

    # Mask the whole DataFrame with a Boolean DataFrame (same result as df.where(df > 0)):
    # values that fail the condition become NaN
    b = df[df > 0]
    b
                       A         B         C         D
    2013-01-01  0.754077       NaN       NaN  0.778106
    2013-01-02  0.103394       NaN       NaN  0.268955
    2013-01-03  0.174730  2.056007  1.781379  1.643397
    2013-01-04       NaN       NaN       NaN       NaN
    2013-01-05  0.076178       NaN  1.142290       NaN
    2013-01-06  1.371702       NaN       NaN       NaN
    2013-01-07  0.126720       NaN       NaN  1.050036
    2013-01-08       NaN  1.530266  1.761499  0.940741
    2013-01-09  0.941099       NaN  1.927863       NaN
    2013-01-10  1.951555       NaN       NaN  0.869293

    type(b['A']['2013-01-01'])
    numpy.float64

    # Filter with isin()
    df2 = df.copy()
    df2['E'] = ['one', 'one', 'two', 'three', 'four', 'three', 'five', 'four', 'three', 'five']
    df2
                       A         B         C         D      E
    2013-01-01  0.754077 -0.346202 -0.557050  0.778106    one
    2013-01-02  0.103394 -1.051044 -0.413054  0.268955    one
    2013-01-03  0.174730  2.056007  1.781379  1.643397    two
    2013-01-04 -0.950517 -0.226887 -0.097138 -0.442010  three
    2013-01-05  0.076178 -0.518970  1.142290 -0.952401   four
    2013-01-06  1.371702 -1.028873 -1.470106 -0.113098  three
    2013-01-07  0.126720 -0.251519 -2.212507  1.050036   five
    2013-01-08 -1.246918  1.530266  1.761499  0.940741   four
    2013-01-09  0.941099 -2.420932  1.927863 -0.549143  three
    2013-01-10  1.951555 -0.264012 -0.171690  0.869293   five

    df2['E'].isin(['one', 'four'])
    2013-01-01     True
    2013-01-02     True
    2013-01-03    False
    2013-01-04    False
    2013-01-05     True
    2013-01-06    False
    2013-01-07    False
    2013-01-08     True
    2013-01-09    False
    2013-01-10    False
    Freq: D, Name: E, dtype: bool

    df2[df2['E'].isin(['one', 'four'])]
                       A         B         C         D      E
    2013-01-01  0.754077 -0.346202 -0.557050  0.778106    one
    2013-01-02  0.103394 -1.051044 -0.413054  0.268955    one
    2013-01-05  0.076178 -0.518970  1.142290 -0.952401   four
    2013-01-08 -1.246918  1.530266  1.761499  0.940741   four
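Conditions can be combined with `&` (and), `|` (or) and `~` (not). Each comparison needs its own parentheses, because these operators bind more tightly than `>` and `==`. The data below is a small made-up sample.

```python
import pandas as pd

df = pd.DataFrame({'A': [0.5, -1.0, 2.0, 0.1],
                   'E': ['one', 'one', 'two', 'four']})

both = df[(df['A'] > 0) & df['E'].isin(['one', 'four'])]  # A positive AND E in the list
either = df[(df['A'] > 1) | (df['E'] == 'one')]           # A above 1 OR E equals 'one'
neither = df[~df['E'].isin(['one', 'four'])]              # ~ negates a Boolean Series

print(both.index.tolist())     # [0, 3]
print(either.index.tolist())   # [0, 1, 2]
print(neither.index.tolist())  # [2]
```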

Create

    s1 = pd.Series([1, 2, 3, 4, 5, 6], index=pd.date_range('20130102', periods=6))
    s1
    2013-01-02    1
    2013-01-03    2
    2013-01-04    3
    2013-01-05    4
    2013-01-06    5
    2013-01-07    6
    Freq: D, dtype: int64

    # Add a new column. Values are aligned on the index, so dates missing from s1 get NaN.
    df2['F'] = s1
    df2
                       A         B         C         D      E    F
    2013-01-01  0.754077 -0.346202 -0.557050  0.778106    one  NaN
    2013-01-02  0.103394 -1.051044 -0.413054  0.268955    one  1.0
    2013-01-03  0.174730  2.056007  1.781379  1.643397    two  2.0
    2013-01-04 -0.950517 -0.226887 -0.097138 -0.442010  three  3.0
    2013-01-05  0.076178 -0.518970  1.142290 -0.952401   four  4.0
    2013-01-06  1.371702 -1.028873 -1.470106 -0.113098  three  5.0
    2013-01-07  0.126720 -0.251519 -2.212507  1.050036   five  6.0
    2013-01-08 -1.246918  1.530266  1.761499  0.940741   four  NaN
    2013-01-09  0.941099 -2.420932  1.927863 -0.549143  three  NaN
    2013-01-10  1.951555 -0.264012 -0.171690  0.869293   five  NaN
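Assigning a new column behaves differently depending on the right-hand side. A Series is aligned by label, as with `F` above. A plain list has no labels, so its length must equal the number of rows, or pandas raises `ValueError`. The frame and column names below are illustrative only.

```python
import pandas as pd

df = pd.DataFrame({'A': [1.0, 2.0, 3.0]}, index=['x', 'y', 'z'])

# A Series is aligned on the index: 'x' has no match, so it gets NaN
df['F'] = pd.Series([10, 20], index=['y', 'z'])

# A plain list is matched by position and must have len(df) elements
df['G'] = [7, 8, 9]

try:
    df['bad'] = [1, 2]  # wrong length
except ValueError as exc:
    print('ValueError:', exc)

print(df)
```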

Update

    # Update an entire column
    df2.loc[:, 'D']
    2013-01-01    0.778106
    2013-01-02    0.268955
    2013-01-03    1.643397
    2013-01-04   -0.442010
    2013-01-05   -0.952401
    2013-01-06   -0.113098
    2013-01-07    1.050036
    2013-01-08    0.940741
    2013-01-09   -0.549143
    2013-01-10    0.869293
    Freq: D, Name: D, dtype: float64

    # Note: in pandas 2.x, df2.loc[:, 'D'] = 5 writes into the existing float64 column in place,
    # so D would display as 5.0. The integer output below comes from an older pandas;
    # df2['D'] = 5 replaces the column outright.
    df2.loc[:, 'D'] = 5
    df2
                       A         B         C  D      E    F
    2013-01-01  0.754077 -0.346202 -0.557050  5    one  NaN
    2013-01-02  0.103394 -1.051044 -0.413054  5    one  1.0
    2013-01-03  0.174730  2.056007  1.781379  5    two  2.0
    2013-01-04 -0.950517 -0.226887 -0.097138  5  three  3.0
    2013-01-05  0.076178 -0.518970  1.142290  5   four  4.0
    2013-01-06  1.371702 -1.028873 -1.470106  5  three  5.0
    2013-01-07  0.126720 -0.251519 -2.212507  5   five  6.0
    2013-01-08 -1.246918  1.530266  1.761499  5   four  NaN
    2013-01-09  0.941099 -2.420932  1.927863  5  three  NaN
    2013-01-10  1.951555 -0.264012 -0.171690  5   five  NaN

    # Update a single value by position; assigning a float turns column D into float
    df2.iloc[1, 3]
    5
    df2.iloc[1, 3] = 10.1
    df2
                       A         B         C     D      E    F
    2013-01-01  0.754077 -0.346202 -0.557050   5.0    one  NaN
    2013-01-02  0.103394 -1.051044 -0.413054  10.1    one  1.0
    2013-01-03  0.174730  2.056007  1.781379   5.0    two  2.0
    2013-01-04 -0.950517 -0.226887 -0.097138   5.0  three  3.0
    2013-01-05  0.076178 -0.518970  1.142290   5.0   four  4.0
    2013-01-06  1.371702 -1.028873 -1.470106   5.0  three  5.0
    2013-01-07  0.126720 -0.251519 -2.212507   5.0   five  6.0
    2013-01-08 -1.246918  1.530266  1.761499   5.0   four  NaN
    2013-01-09  0.941099 -2.420932  1.927863   5.0  three  NaN
    2013-01-10  1.951555 -0.264012 -0.171690   5.0   five  NaN
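The sketch below gathers the common update patterns on a small made-up frame: replacing a whole column, changing one cell by label or by position, and updating only the rows that match a condition. To avoid the version-dependent dtype behaviour noted above, each value written has a dtype that already fits its column.

```python
import pandas as pd

df = pd.DataFrame({'A': [1.0, 2.0, 3.0], 'D': [0.5, 0.6, 0.7]}, index=['x', 'y', 'z'])

df['D'] = 5                    # replace the whole column; the scalar is broadcast
df.loc['y', 'A'] = 10.1        # one cell, by row and column label
df.iloc[2, 0] = -1.0           # one cell, by row and column position
df.loc[df['A'] > 2, 'D'] = 99  # only rows where A > 2, only column D

print(df)
```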
    # Update through a Boolean mask: negate every positive value, so all values end up <= 0
    df3 = df.copy()
    df3[df3 > 0] = -df3
    df3
                       A         B         C         D
    2013-01-01 -0.754077 -0.346202 -0.557050 -0.778106
    2013-01-02 -0.103394 -1.051044 -0.413054 -0.268955
    2013-01-03 -0.174730 -2.056007 -1.781379 -1.643397
    2013-01-04 -0.950517 -0.226887 -0.097138 -0.442010
    2013-01-05 -0.076178 -0.518970 -1.142290 -0.952401
    2013-01-06 -1.371702 -1.028873 -1.470106 -0.113098
    2013-01-07 -0.126720 -0.251519 -2.212507 -1.050036
    2013-01-08 -1.246918 -1.530266 -1.761499 -0.940741
    2013-01-09 -0.941099 -2.420932 -1.927863 -0.549143
    2013-01-10 -1.951555 -0.264012 -0.171690 -0.869293
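The masked assignment above modifies `df3` in place. The same result can be produced without mutation by `DataFrame.mask`, which replaces values where the condition is True, or by its complement `DataFrame.where`, which keeps values where the condition is True. This sketch uses seeded random data (assumption: NumPy 1.17+) and checks that all three forms agree.

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(1)
df = pd.DataFrame(rng.standard_normal((5, 3)), columns=['A', 'B', 'C'])

# 1) In-place Boolean-mask assignment, as in the example above
df3 = df.copy()
df3[df3 > 0] = -df3

# 2) mask(): replace values where the condition is True, returning a new frame
df4 = df.mask(df > 0, -df)

# 3) where(): keep values where the condition is True, replace the rest
df5 = df.where(df <= 0, -df)

print(df3.equals(df4), df4.equals(df5))  # True True
```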
