
Python Pandas: Balancing an Unbalanced Dataset (for Panel Analysis)

Published: 2023/12/10 | python | 豆豆

I know this might be easy to do. I can do it in Stata but I'm trying to move to Python.

I have a large dataset that is unbalanced. It looks like this:

And I need to get a dataset as follows:

Any guidance is welcome. Thanks a lot!

Solution

One way is to make 'year' a second index level with set_index, reindex against the full set of (id_inf, year) pairs built with pd.MultiIndex.from_product, and then use reset_index to turn 'year' back into a column.
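To make the reindex target concrete, here is a minimal sketch of what pd.MultiIndex.from_product builds on its own. The ids and the year range are made-up values chosen only for illustration.

```python
import pandas as pd

# Hypothetical ids and years, used only to illustrate the idea.
ids = [9, 54]
years = range(2002, 2005)  # 2002, 2003, 2004

# Cartesian product: every id is paired with every year.
full_index = pd.MultiIndex.from_product([ids, years], names=['id_inf', 'year'])

print(len(full_index))  # number of (id_inf, year) pairs
print(full_index)
```

Reindexing a frame against an index like this keeps the rows that already exist and adds a row of NaN for every pair that is missing.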

Example dataframe with the same structure:

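import numpy as np
import pandas as pd

# pd.np was removed from recent pandas releases, so NumPy is imported directly
df = pd.DataFrame({'year': [2003, 2004, 2002, 2004, 2005, 2006],
                   'city_code': ['a']*2 + ['b']*4,
                   'total_tax': np.random.randint(100, 1000, 6)},
                  index=pd.Index(data=[9]*2 + [54]*4, name='id_inf'))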

print(df)

       city_code  total_tax  year
id_inf
9              a        417  2003
9              a        950  2004
54             b        801  2002
54             b        218  2004
54             b        886  2005
54             b        855  2006

Now you can create df_balanced as follows:

df_balanced = (df.set_index('year', append=True)
                 .reindex(pd.MultiIndex.from_product(
                     [df.index.unique(),
                      range(df.year.min(), df.year.max() + 1)],
                     names=['id_inf', 'year']))
                 .reset_index(level=1))

And you get:

print(df_balanced)

        year city_code  total_tax
id_inf
9       2002       NaN        NaN
9       2003         a      417.0
9       2004         a      950.0
9       2005       NaN        NaN
9       2006       NaN        NaN
54      2002         b      801.0
54      2003       NaN        NaN
54      2004         b      218.0
54      2005         b      886.0
54      2006         b      855.0
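A quick sanity check is not part of the original answer, but it can confirm the result. The sketch below builds a small stand-in frame with the same shape as df_balanced (the values are made up) and checks that every id_inf has the same number of distinct years. The names panel, years_per_id and is_balanced are hypothetical.

```python
import pandas as pd

# Stand-in for df_balanced: same index and 'year' column, made-up values.
idx = pd.MultiIndex.from_product([[9, 54], range(2002, 2007)],
                                 names=['id_inf', 'year'])
panel = pd.DataFrame({'total_tax': 0.0}, index=idx).reset_index(level='year')

# Count the distinct years observed for each id.
years_per_id = panel.groupby(level='id_inf')['year'].nunique()

# The panel is balanced in this sense when all ids share the same count.
is_balanced = years_per_id.nunique() == 1

print(years_per_id)
print(is_balanced)
```

This only compares counts. It is enough here because every id was reindexed against the same year range.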

There are several ways to fill the NaN values; here are two. For 'city_code', which is constant within each id_inf, use groupby and transform with 'max' to broadcast the known value to every row of the group. For 'total_tax', simply use fillna with 0:

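# 'max' skips NaN, so each group receives its one known city_code;
# the string 'max' is used because recent pandas warns about the builtin max
df_balanced['city_code'] = df_balanced.groupby(level=0)['city_code'].transform('max')
df_balanced['total_tax'] = df_balanced['total_tax'].fillna(0)
print(df_balanced)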

        year city_code  total_tax
id_inf
9       2002         a        0.0
9       2003         a      417.0
9       2004         a      950.0
9       2005         a        0.0
9       2006         a        0.0
54      2002         b      801.0
54      2003         b        0.0
54      2004         b      218.0
54      2005         b      886.0
54      2006         b      855.0
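The steps above can be wrapped in a small reusable function. This is a sketch, not part of the original answer: balance_panel is a hypothetical helper, and it assumes the id lives in a named index and the period is an integer column, as in the example. It also shows one alternative fill, transform('first'), which takes the first non-null value in each group, and it leaves 'total_tax' as NaN rather than 0.

```python
import pandas as pd

def balance_panel(df, id_name, time_col):
    """Return df with one row per (id, period) over the full period range.

    Hypothetical helper: assumes the id is the (named) index of df and
    time_col is an integer column.
    """
    periods = range(df[time_col].min(), df[time_col].max() + 1)
    full_index = pd.MultiIndex.from_product(
        [df.index.unique(), periods], names=[id_name, time_col])
    return (df.set_index(time_col, append=True)
              .reindex(full_index)
              .reset_index(level=time_col))

# Same structure as the example, with fixed values instead of random ones.
df = pd.DataFrame(
    {'year': [2003, 2004, 2002, 2004, 2005, 2006],
     'city_code': ['a'] * 2 + ['b'] * 4,
     'total_tax': [417, 950, 801, 218, 886, 855]},
    index=pd.Index([9] * 2 + [54] * 4, name='id_inf'))

balanced = balance_panel(df, 'id_inf', 'year')

# Time-invariant column: copy each id's first non-null value to all its rows.
balanced['city_code'] = balanced.groupby(level=0)['city_code'].transform('first')

# total_tax is left as NaN here, marking years that were not observed.
print(balanced)
```

Whether to fill 'total_tax' with 0 or keep NaN depends on what a missing year means in your data: 0 states that no tax was collected, while NaN states that the value is unknown.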
