
04_pandas string functions; data merging with concat and merge; grouping with groupby; Reshaping; Pivot tables; time handling (date_range, tz_localize, etc.)

Published: 2024/9/27

String functions: Series.str.lower()

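A Series provides a set of string-processing methods through its str attribute, which make it easy to operate on every element of the array, as the snippet below shows. Note that many pattern-matching methods in str, such as str.contains, interpret the pattern as a regular expression by default, and some, such as str.extract, always do.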

```python
import numpy as np
import pandas as pd

s = pd.Series(['A', 'B', 'C', 'Aaba', 'Baca', np.nan, 'CABA', 'dog', 'cat'])
print(s.str.lower())
```

Output:

```
0       a
1       b
2       c
3    aaba
4    baca
5     NaN
6    caba
7     dog
8     cat
dtype: object
```
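To make the note about regular expressions concrete, here is a small sketch on the same Series. The variable names are illustrative; it only uses the standard str.contains and str.len methods, and shows how the same pattern '.' behaves as a regex versus as a literal string.

```python
import numpy as np
import pandas as pd

s = pd.Series(['A', 'B', 'C', 'Aaba', 'Baca', np.nan, 'CABA', 'dog', 'cat'])

# str.contains treats the pattern as a regular expression by default;
# na=False maps the missing value to False instead of NaN
starts_with_a_or_b = s.str.contains(r'^[AB]', na=False)
# As a regex, '.' matches any character, so every non-missing string matches
any_char = s.str.contains('.', na=False)
# regex=False switches to a literal substring search for a dot
literal_dot = s.str.contains('.', regex=False, na=False)
# Other vectorised helpers propagate NaN for the missing element
lengths = s.str.len()
```

None of the strings contains an actual dot, so literal_dot is all False while any_char is True for every non-missing entry.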

Data merging

concat
```python
import numpy as np
import pandas as pd

df = pd.DataFrame(np.random.randn(10, 4))  # 10 rows x 4 columns of random numbers
print(df)
print("------------------------------------")
print(df[:3])   # rows 0-2
print("------------------------------------")
print(df[3:7])  # rows 3-6
print("------------------------------------")
print(df[7:])   # rows 7-9
pieces = [df[:3], df[3:7], df[7:]]
print("------------------------------------")
print(pieces)
print(pd.concat(pieces))  # stitch the pieces back together
```

Output:

```
          0         1         2         3
0  1.190317  0.751029  1.628000 -0.923804
1  0.926196  1.644827 -1.005915 -0.153604
2 -1.082964 -0.684693 -0.087294  0.707919
3 -0.418695  2.392404  1.020161  0.928821
4  0.798035 -0.458987 -0.612861  0.589815
5  0.749647 -0.939293 -1.883342 -1.408095
6  0.045482  2.362426 -0.792240 -0.127324
7  0.881938 -1.667338 -0.147447  0.529441
8 -1.768780 -1.513335 -0.014616  0.373453
9 -0.553334 -0.066471 -0.367330 -0.815094
------------------------------------
          0         1         2         3
0  1.190317  0.751029  1.628000 -0.923804
1  0.926196  1.644827 -1.005915 -0.153604
2 -1.082964 -0.684693 -0.087294  0.707919
------------------------------------
          0         1         2         3
3 -0.418695  2.392404  1.020161  0.928821
4  0.798035 -0.458987 -0.612861  0.589815
5  0.749647 -0.939293 -1.883342 -1.408095
6  0.045482  2.362426 -0.792240 -0.127324
------------------------------------
          0         1         2         3
7  0.881938 -1.667338 -0.147447  0.529441
8 -1.768780 -1.513335 -0.014616  0.373453
9 -0.553334 -0.066471 -0.367330 -0.815094
------------------------------------
[          0         1         2         3
0  1.190317  0.751029  1.628000 -0.923804
1  0.926196  1.644827 -1.005915 -0.153604
2 -1.082964 -0.684693 -0.087294  0.707919,           0         1         2         3
3 -0.418695  2.392404  1.020161  0.928821
4  0.798035 -0.458987 -0.612861  0.589815
5  0.749647 -0.939293 -1.883342 -1.408095
6  0.045482  2.362426 -0.792240 -0.127324,           0         1         2         3
7  0.881938 -1.667338 -0.147447  0.529441
8 -1.768780 -1.513335 -0.014616  0.373453
9 -0.553334 -0.066471 -0.367330 -0.815094]
          0         1         2         3
0  1.190317  0.751029  1.628000 -0.923804
1  0.926196  1.644827 -1.005915 -0.153604
2 -1.082964 -0.684693 -0.087294  0.707919
3 -0.418695  2.392404  1.020161  0.928821
4  0.798035 -0.458987 -0.612861  0.589815
5  0.749647 -0.939293 -1.883342 -1.408095
6  0.045482  2.362426 -0.792240 -0.127324
7  0.881938 -1.667338 -0.147447  0.529441
8 -1.768780 -1.513335 -0.014616  0.373453
9 -0.553334 -0.066471 -0.367330 -0.815094
```
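Because the random data above makes results hard to check, here is a sketch on a small deterministic frame (the column names x and y are made up for illustration). It shows that concatenating the pieces in order restores the original, and tries three common concat options: ignore_index, keys and axis=1.

```python
import numpy as np
import pandas as pd

# Deterministic frame: rows [0, 1], [2, 3], ..., [10, 11]
df = pd.DataFrame(np.arange(12).reshape(6, 2), columns=['x', 'y'])
pieces = [df[:2], df[2:4], df[4:]]

# Concatenating the pieces in their original order restores the frame
restored = pd.concat(pieces)
# ignore_index=True discards the old row labels and renumbers from 0
renumbered = pd.concat([pieces[2], pieces[0]], ignore_index=True)
# keys= adds an outer index level recording which piece each row came from
labelled = pd.concat(pieces, keys=['first', 'second', 'third'])
# axis=1 places frames side by side, aligning them on the row index
side_by_side = pd.concat([df[['x']], df[['y']]], axis=1)
```

The keys option is handy when the pieces come from different sources and you later need to tell them apart.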

merge

```python
import numpy as np
import pandas as pd

left = pd.DataFrame({'key': ['foo', 'foo'], 'lval': [1, 2]})
print(left)
right = pd.DataFrame({'key': ['foo', 'foo'], 'rval': [4, 5]})
print(right)
print(pd.merge(left, right, on='key'))
```

Output:

```
   key  lval
0  foo     1
1  foo     2
   key  rval
0  foo     4
1  foo     5
   key  lval  rval
0  foo     1     4
1  foo     1     5
2  foo     2     4
3  foo     2     5
```
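In the example above every key is 'foo', so each left row matches each right row and the two-row inputs produce four rows. A sketch with partly overlapping keys (illustrative data) makes the how= parameter easier to see; indicator=True is a standard merge option that records where each row came from.

```python
import pandas as pd

left = pd.DataFrame({'key': ['foo', 'bar'], 'lval': [1, 2]})
right = pd.DataFrame({'key': ['foo', 'baz'], 'rval': [4, 5]})

# how='inner' (the default) keeps only keys present on both sides
inner = pd.merge(left, right, on='key')
# how='left' keeps every left row; unmatched right values become NaN
left_join = pd.merge(left, right, on='key', how='left')
# how='outer' keeps all keys; indicator=True adds a '_merge' column
outer = pd.merge(left, right, on='key', how='outer', indicator=True)

# Duplicate keys on both sides multiply: 2 x 2 matches -> 4 rows
dup_left = pd.DataFrame({'key': ['foo', 'foo'], 'lval': [1, 2]})
dup_right = pd.DataFrame({'key': ['foo', 'foo'], 'rval': [4, 5]})
many = pd.merge(dup_left, dup_right, on='key')
```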

Grouping

groupby

Groups rows by the specified column(s), much like GROUP BY in SQL.

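```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    'A': ['foo', 'bar', 'foo', 'bar', 'foo', 'bar', 'foo', 'foo'],
    'B': ['one', 'one', 'two', 'three', 'two', 'two', 'one', 'three'],
    'C': np.random.randn(8),
    'D': np.random.randn(8),
})
print(df)
print("-----------------------------------")
# Select the numeric columns explicitly: since pandas 2.0, sum() no longer
# drops non-numeric columns, so 'B' would otherwise be summed (concatenated)
print(df.groupby('A')[['C', 'D']].sum())
print("-----------------------------------")
# Group by two columns; the result has a two-level (A, B) index
print(df.groupby(['A', 'B']).sum())
```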

Output:

```
     A      B         C         D
0  foo    one -0.411487 -0.908131
1  bar    one  0.803172 -0.093416
2  foo    two  0.079114 -0.594352
3  bar  three -1.423867 -0.025747
4  foo    two  0.832108  0.818305
5  bar    two  0.551068 -0.859953
6  foo    one -1.052481 -0.220297
7  foo  three -2.639817  0.402972
-----------------------------------
            C         D
A
bar -0.069626 -0.979116
foo -3.192563 -0.501502
-----------------------------------
                  C         D
A   B
bar one    0.803172 -0.093416
    three -1.423867 -0.025747
    two    0.551068 -0.859953
foo one   -1.463968 -1.128427
    three -2.639817  0.402972
    two    0.911223  0.223953
```
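Beyond a single sum, a grouped object can compute several aggregations at once or broadcast a group result back to the original rows. The sketch below uses made-up, deterministic values and the output names c_sum, c_mean, d_max and C_share, which are illustrative only; named aggregation and transform are standard groupby features.

```python
import pandas as pd

df = pd.DataFrame({
    'A': ['foo', 'bar', 'foo', 'bar', 'foo'],
    'C': [1.0, 2.0, 3.0, 4.0, 5.0],
    'D': [10, 20, 30, 40, 50],
})

# Named aggregation: output column = (input column, aggregation function)
summary = df.groupby('A').agg(
    c_sum=('C', 'sum'),
    c_mean=('C', 'mean'),
    d_max=('D', 'max'),
)

# transform() returns a result aligned with the original rows,
# here each row's share of its group's total C
df['C_share'] = df['C'] / df.groupby('A')['C'].transform('sum')
```

Group keys come out sorted by default, so 'bar' precedes 'foo' in summary, just as in the output above.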

Reshaping

```python
import numpy as np
import pandas as pd

tuples = list(zip(*[
    ['bar', 'bar', 'baz', 'baz', 'foo', 'foo', 'qux', 'qux'],
    ['one', 'two', 'one', 'two', 'one', 'two', 'one', 'two'],
]))
index = pd.MultiIndex.from_tuples(tuples, names=['first', 'second'])
print(index)
df = pd.DataFrame(np.random.randn(8, 2), index=index, columns=['A', 'B'])
df2 = df[:4]
print(df2)
print("---------df2.stack()------------------")
stacked = df2.stack()  # move the column labels into a new innermost index level
print(stacked)
print("---------stacked.unstack()------------")
print(stacked.unstack())   # by default, unstack the last level
print("---------stacked.unstack(1)-----------")
print(stacked.unstack(1))  # unstack the 'second' level
print("---------stacked.unstack(0)-----------")
print(stacked.unstack(0))  # unstack the 'first' level
```

Output:

```
MultiIndex([('bar', 'one'),
            ('bar', 'two'),
            ('baz', 'one'),
            ('baz', 'two'),
            ('foo', 'one'),
            ('foo', 'two'),
            ('qux', 'one'),
            ('qux', 'two')],
           names=['first', 'second'])
                     A         B
first second
bar   one     0.189887  0.637367
      two    -0.341858 -0.895612
baz   one     0.517839 -0.798281
      two    -0.712129 -1.355618
---------df2.stack()------------------
first  second
bar    one     A    0.189887
               B    0.637367
       two     A   -0.341858
               B   -0.895612
baz    one     A    0.517839
               B   -0.798281
       two     A   -0.712129
               B   -1.355618
dtype: float64
---------stacked.unstack()------------
                     A         B
first second
bar   one     0.189887  0.637367
      two    -0.341858 -0.895612
baz   one     0.517839 -0.798281
      two    -0.712129 -1.355618
---------stacked.unstack(1)-----------
second        one       two
first
bar   A  0.189887 -0.341858
      B  0.637367 -0.895612
baz   A  0.517839 -0.712129
      B -0.798281 -1.355618
---------stacked.unstack(0)-----------
first          bar       baz
second
one    A  0.189887  0.517839
       B  0.637367 -0.798281
two    A -0.341858 -0.712129
       B -0.895612 -1.355618
```
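The same stack/unstack moves can be checked on fixed integers. This sketch builds the index with MultiIndex.from_product (an equivalent shortcut to the zip of tuples above) and uses made-up values 1 to 8. It confirms that unstack() undoes stack() and that a level can be passed by name instead of position. Recent pandas versions may emit a FutureWarning about the stack implementation, which does not affect this data.

```python
import pandas as pd

index = pd.MultiIndex.from_product([['bar', 'baz'], ['one', 'two']],
                                   names=['first', 'second'])
df2 = pd.DataFrame({'A': [1, 2, 3, 4], 'B': [5, 6, 7, 8]}, index=index)

stacked = df2.stack()               # Series with a 3-level index
roundtrip = stacked.unstack()       # innermost level goes back to columns
by_name = stacked.unstack('first')  # same as unstack(0), addressed by name
```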

Pivot tables

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    'A': ['one', 'one', 'two', 'three'] * 3,
    'B': ['A', 'B', 'C'] * 4,
    'C': ['foo', 'foo', 'foo', 'bar', 'bar', 'bar'] * 2,
    'D': np.random.randn(12),
    'E': np.random.randn(12),
})
print(df)
print("-------------------------------")
# Rows keyed by (A, B), one column per value of C; each cell is the mean of D
print(pd.pivot_table(df, values='D', index=['A', 'B'], columns=['C']))
```

Output:

```
        A  B    C         D         E
0     one  A  foo -0.282109  1.696844
1     one  B  foo  0.715732  0.283795
2     two  C  foo  0.889333  0.621878
3   three  A  bar -1.065137  1.184847
4     one  B  bar  0.420288 -0.299934
5     one  C  bar -1.269725 -1.261542
6     two  A  foo  1.142230  1.887502
7   three  B  foo -0.456574  0.650669
8     one  C  foo -0.146470 -0.307011
9     one  A  bar  0.944573  0.967164
10    two  B  bar  0.432492 -0.554618
11  three  C  bar -1.928619 -1.158268
-------------------------------
C             bar       foo
A     B
one   A  0.944573 -0.282109
      B  0.420288  0.715732
      C -1.269725 -0.146470
three A -1.065137       NaN
      B       NaN -0.456574
      C -1.928619       NaN
two   A       NaN  1.142230
      B  0.432492       NaN
      C       NaN  0.889333
```
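The NaN cells above are (A, B, C) combinations with no rows. A sketch on a tiny, made-up frame shows that pivot_table averages by default, that aggfunc and fill_value change the aggregation and the empty cells, and that the mean table matches a groupby followed by unstack.

```python
import pandas as pd

df = pd.DataFrame({
    'A': ['one', 'one', 'two', 'two'],
    'C': ['foo', 'bar', 'foo', 'foo'],
    'D': [1.0, 2.0, 3.0, 5.0],
})

# Default aggregation is the mean; (two, bar) has no rows, so it is NaN
mean_table = pd.pivot_table(df, values='D', index='A', columns='C')
# A different aggregation, with empty cells filled by 0
sum_table = pd.pivot_table(df, values='D', index='A', columns='C',
                           aggfunc='sum', fill_value=0)
# The same mean table built from groupby + unstack
via_groupby = df.groupby(['A', 'C'])['D'].mean().unstack()
```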

Time handling

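pandas has simple, powerful and efficient functionality for performing resampling operations during frequency conversion (for example, converting secondly data into 5-minutely data). This is very common in, but not limited to, financial applications.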
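A minimal deterministic sketch of downsampling and upsampling, assuming one reading per second whose value is simply its position (0 to 119), so the bin totals can be checked by hand. The lowercase aliases 's', 'min' and 'ms' are accepted by both older and current pandas.

```python
import pandas as pd

# One sample per second for two minutes, starting at midnight
rng = pd.date_range('2012-01-01', periods=120, freq='s')
ts = pd.Series(range(120), index=rng)

# Downsample: each 1-minute bin aggregates the 60 seconds it contains
per_minute_sum = ts.resample('min').sum()
per_minute_last = ts.resample('min').last()

# Upsample the first two seconds to 500 ms steps; the new slot is forward-filled
per_half_second = ts[:2].resample('500ms').ffill()
```

The first bin sums 0 through 59 (1770) and the second sums 60 through 119 (5370).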

```python
import numpy as np
import pandas as pd

# pandas 2.2 deprecates the aliases 'S', 'M' and 'H' used below in favour of
# 's', 'ME' and 'h'; the originals still run there, with a FutureWarning.
# 100 one-second samples resampled into 5-minute bins (all land in one bin)
rng = pd.date_range('1/1/2012', periods=100, freq='S')
ts = pd.Series(np.random.randint(0, 500, len(rng)), index=rng)
print(ts.resample('5Min').sum())
print("---------------------date_range-------------------------")
rng = pd.date_range('3/6/2012 00:00', periods=5, freq='D')
ts = pd.Series(np.random.randn(len(rng)), rng)
print(ts)
print("----------------------tz_localize-----------------------")
ts_utc = ts.tz_localize('UTC')  # attach a time zone to naive timestamps
print(ts_utc)
print("------------convert to another time zone----------------")
print(ts_utc.tz_convert('US/Eastern'))  # same instants, Eastern wall-clock times
print("------------convert between time-span representations---")
rng = pd.date_range('1/1/2012', periods=5, freq='M')  # month-end dates
ts = pd.Series(np.random.randn(len(rng)), index=rng)
print(ts)
print("----------------to_period------------------------------")
ps = ts.to_period()  # timestamps -> monthly periods
print(ps)
print("----------------to_timestamp---------------------------")
print(ps.to_timestamp())  # periods -> timestamps at the start of each period
print("------------------------------------------------------")
# Quarters of a fiscal year ending in November, re-indexed to 9 a.m. on the
# first day of the month following each quarter end
prng = pd.period_range('1990Q1', '2000Q4', freq='Q-NOV')
ts = pd.Series(np.random.randn(len(prng)), prng)
ts.index = (prng.asfreq('M', 'e') + 1).asfreq('H', 's') + 9
print(ts.head())
```

Output:

```
2012-01-01    24102
Freq: 5T, dtype: int32
---------------------date_range-------------------------
2012-03-06    0.059085
2012-03-07    0.216838
2012-03-08   -1.465363
2012-03-09   -0.349098
2012-03-10   -0.818129
Freq: D, dtype: float64
----------------------tz_localize-----------------------
2012-03-06 00:00:00+00:00    0.059085
2012-03-07 00:00:00+00:00    0.216838
2012-03-08 00:00:00+00:00   -1.465363
2012-03-09 00:00:00+00:00   -0.349098
2012-03-10 00:00:00+00:00   -0.818129
Freq: D, dtype: float64
------------convert to another time zone----------------
2012-03-05 19:00:00-05:00    0.059085
2012-03-06 19:00:00-05:00    0.216838
2012-03-07 19:00:00-05:00   -1.465363
2012-03-08 19:00:00-05:00   -0.349098
2012-03-09 19:00:00-05:00   -0.818129
Freq: D, dtype: float64
------------convert between time-span representations---
2012-01-31   -0.682776
2012-02-29    0.895222
2012-03-31   -0.162116
2012-04-30   -1.175630
2012-05-31   -0.936218
Freq: M, dtype: float64
----------------to_period------------------------------
2012-01   -0.682776
2012-02    0.895222
2012-03   -0.162116
2012-04   -1.175630
2012-05   -0.936218
Freq: M, dtype: float64
----------------to_timestamp---------------------------
2012-01-01   -0.682776
2012-02-01    0.895222
2012-03-01   -0.162116
2012-04-01   -1.175630
2012-05-01   -0.936218
Freq: MS, dtype: float64
------------------------------------------------------
1990-03-01 09:00    1.847485
1990-06-01 09:00   -0.909369
1990-09-01 09:00    1.381791
1990-12-01 09:00    0.997901
1991-03-01 09:00    1.470387
Freq: H, dtype: float64
```
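The time-zone and period conversions above can be verified with fixed values. This sketch (data made up for illustration) checks that tz_localize keeps the wall-clock time while tz_convert keeps the instant, and that to_period/to_timestamp map month-end timestamps to months and back to month starts.

```python
import pandas as pd

rng = pd.date_range('2012-03-06', periods=3, freq='D')
ts = pd.Series([1.0, 2.0, 3.0], index=rng)

# tz_localize attaches a zone to naive timestamps without moving them
ts_utc = ts.tz_localize('UTC')
# tz_convert keeps the same instants but shows them as Eastern wall-clock time
ts_eastern = ts_utc.tz_convert('US/Eastern')

# to_period / to_timestamp switch between points in time and time spans
monthly = pd.Series([10, 20], index=pd.to_datetime(['2012-01-31', '2012-02-29']))
periods = monthly.to_period('M')  # each timestamp -> the month containing it
starts = periods.to_timestamp()   # each month -> its first instant
```

Early March 2012 is before the US daylight-saving switch, so Eastern time is UTC-5 and midnight UTC becomes 19:00 on the previous day.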

Categoricals

A pandas DataFrame can contain categorical data.

```python
import pandas as pd

df = pd.DataFrame({"id": [1, 2, 3, 4, 5, 6],
                   "raw_grade": ['a', 'b', 'b', 'a', 'a', 'e']})
print("-----------convert the raw grades to the categorical dtype-------------")
df["grade"] = df["raw_grade"].astype("category")
print(df["grade"])
print("-------rename the categories to more meaningful names----------")
# Assigning to .cat.categories directly is not allowed in pandas 2.0+;
# rename_categories maps a -> very good, b -> good, e -> very bad
df["grade"] = df["grade"].cat.rename_categories(["very good", "good", "very bad"])
# Reorder the categories and add the ones that have no rows yet
df["grade"] = df["grade"].cat.set_categories(["very bad", "bad", "medium", "good", "very good"])
print(df["grade"])
print("-----sort by category order [very bad, bad, medium, good, very good]----")
print(df.sort_values(by="grade"))
print("-----count per category, including empty categories----")
# observed=False keeps categories with no rows (bad, medium) in the result
print(df.groupby("grade", observed=False).size())
```

Output:

```
-----------convert the raw grades to the categorical dtype-------------
0    a
1    b
2    b
3    a
4    a
5    e
Name: grade, dtype: category
Categories (3, object): [a, b, e]
-------rename the categories to more meaningful names----------
0    very good
1         good
2         good
3    very good
4    very good
5     very bad
Name: grade, dtype: category
Categories (5, object): [very bad, bad, medium, good, very good]
-----sort by category order [very bad, bad, medium, good, very good]----
   id raw_grade      grade
5   6         e   very bad
1   2         b       good
2   3         b       good
0   1         a  very good
3   4         a  very good
4   5         a  very good
-----count per category, including empty categories----
grade
very bad     1
bad          0
medium       0
good         2
very good    3
dtype: int64
```
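Sorting above follows the category order even though the categorical is unordered. Declaring the order explicitly with pd.CategoricalDtype(..., ordered=True) also enables comparisons and min/max. A sketch with made-up grades:

```python
import pandas as pd

grades = pd.Series(['good', 'bad', 'good', 'medium'])
# An ordered dtype makes the category order meaningful for comparisons
grade_type = pd.CategoricalDtype(categories=['very bad', 'bad', 'medium', 'good'],
                                 ordered=True)
cat = grades.astype(grade_type)

codes = cat.cat.codes          # integer position of each value's category
better_than_bad = cat > 'bad'  # comparisons follow the category order
counts = cat.value_counts()    # unused categories appear with a count of 0
```

Storing small integer codes instead of repeated strings is also why categoricals can save memory on columns with few distinct values.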
