pandas: Numerical Computation and Statistics

Published: 2023/11/29

Numerical computation and statistics

For a DataFrame, statistical methods such as sum, maximum, minimum, and mean are computed per column by default (axis=0). Passing axis=1 computes them per row instead.

Missing values are skipped by default. Passing skipna=False stops them from being skipped, so any row or column that contains NaN yields NaN.

count(): number of non-NaN elements
sum(): sum
mean(): mean
median(): median
max(): maximum
min(): minimum
mode(): mode (most frequent value)
std(): standard deviation
var(): variance
describe(): summary that includes count, mean, std, min, 25%, 50%, 75%, and max
skew(): sample skewness
kurt(): sample kurtosis
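
As a minimal sketch of the axis and skipna behaviour described above, the following uses a small made-up DataFrame (the data and variable names are illustrative, not from the original post) with one missing value:

```python
import numpy as np
import pandas as pd

# Hypothetical sample data with one missing value in column 'b'
df = pd.DataFrame({'a': [1.0, 2.0, 3.0], 'b': [4.0, np.nan, 6.0]})

col_sum = df.sum()                 # axis=0 (default): one result per column
row_sum = df.sum(axis=1)           # axis=1: one result per row
strict_sum = df.sum(skipna=False)  # NaN propagates instead of being skipped
non_null = df.count()              # number of non-NaN values per column

print(col_sum)
print(row_sum)
print(strict_sum)
print(non_null)
```

With skipna=False, the sum of column 'b' becomes NaN, while count() still reports how many values are present.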

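Quantiles

quantile(q=0.5, axis=0) computes quantiles. q sets the position and defaults to 0.5. axis=0 (the default) computes the quantile of each column, and axis=1 computes it for each row. [Applies to Series and DataFrame]

Calculation logic (with the default linear interpolation): r = i + (j - i) * f

(1) Sort the row or column in ascending order and compute the position pos = 1 + (n - 1) * q, where n is the length of the row or column and q is the parameter passed in.

(2) Use pos to determine i and j. For example, if pos = 3.2, then i is the 3rd number, j is the 4th number, and f is the fractional part of pos (0.2).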

import pandas as pd

dic = {'one': [1, 3, 2, 5, 4], 'two': [2, 4, 3, 6, 5], 'three': [3, 7, 5, 6, 4]}
df = pd.DataFrame(dic, index=list('abcde'))

print(df)
#    one  two  three
# a    1    2      3
# b    3    4      7
# c    2    3      5
# d    5    6      6
# e    4    5      4

# A single q returns a Series with one value per column
print(df.quantile(0.1))
# one      1.4
# two      2.4
# three    3.4
# Name: 0.1, dtype: float64

# A list of q values returns a DataFrame indexed by q
print(df.quantile([0.5, 0.7]))
#      one  two  three
# 0.5  3.0  4.0    5.0
# 0.7  3.8  4.8    5.8

Taking quantile(q=0.7) from the example above: the statistic is computed per column, and each column has length 5.

pos = 1 + (5 - 1) * 0.7 = 3.8, so i is the 3rd number of each sorted column, j is the 4th number, and f is the fractional part of 3.8, that is 0.8.

result_one = 3 + (4 - 3) * 0.8 = 3.8

result_two = 4 + (5 - 4) * 0.8 = 4.8

result_three = 5 + (6 - 5) * 0.8 = 5.8
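
The hand calculation above can be sketched as a small helper. The function name manual_quantile is hypothetical and only illustrates the rule r = i + (j - i) * f; it is not part of pandas. It is compared with df.quantile, whose default interpolation is linear:

```python
import math

import pandas as pd


def manual_quantile(values, q):
    """Quantile by linear interpolation at the 1-based position pos = 1 + (n - 1) * q."""
    ordered = sorted(values)
    n = len(ordered)
    pos = 1 + (n - 1) * q
    lower = math.floor(pos)          # 1-based index of i
    f = pos - lower                  # fractional part of pos
    i = ordered[lower - 1]
    j = ordered[min(lower, n - 1)]   # stay in range when pos is the last element
    return i + (j - i) * f


df = pd.DataFrame({'one': [1, 3, 2, 5, 4],
                   'two': [2, 4, 3, 6, 5],
                   'three': [3, 7, 5, 6, 4]})

manual = {col: manual_quantile(df[col], 0.7) for col in df.columns}
from_pandas = df.quantile(0.7)

print(manual)
print(from_pandas)
```

Both results should agree up to floating-point rounding, which is why a tolerance is preferable to an exact equality check when comparing them.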

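For the lower and upper quartiles (q = 0.25 and q = 0.75), the same position formula gives pos = 1 + (n - 1) / 4 and pos = 1 + 3 * (n - 1) / 4. The result is again computed as r = i + (j - i) * f.

For more on quartiles, see https://blog.csdn.net/kevinelstri/article/details/52937236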

Cumulative values

cumsum() cumulative sum, cumprod() cumulative product, cummax() cumulative maximum, cummin() cumulative minimum [Applies to Series and DataFrame]

import pandas as pd

dic = {'one': [1, 3, 2, 5, 4], 'two': [2, 4, 3, 6, 5]}
df = pd.DataFrame(dic, index=list('abcde'))

# Each assignment adds a new column to df
df['one_cumsum'] = df['one'].cumsum()
df['one_cumprod'] = df['one'].cumprod()
df['two_cummax'] = df['two'].cummax()
df['two_cummin'] = df['two'].cummin()

print(df)
#    one  two  one_cumsum  one_cumprod  two_cummax  two_cummin
# a    1    2           1            1           2           2
# b    3    4           4            3           4           2
# c    2    3           6            6           4           2
# d    5    6          11           30           6           2
# e    4    5          15          120           6           2
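
The example above applies the cumulative methods to single columns. Since they also apply to a whole DataFrame, here is a short sketch, with made-up data, of calling cumsum() on the frame itself along either axis:

```python
import pandas as pd

# Hypothetical data for illustration
df = pd.DataFrame({'one': [1, 3, 2], 'two': [2, 4, 3]}, index=list('abc'))

down = df.cumsum()          # running total down each column (axis=0)
across = df.cumsum(axis=1)  # running total across each row

print(down)
print(across)
```

The result has the same shape and labels as the input, so it can be compared with the original frame cell by cell.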

Unique values

Calling unique() on a Series returns a one-dimensional NumPy array of its distinct values. [Applies to Series]

import pandas as pd

s = pd.Series(list('abaefb'))
print(s)
# 0    a
# 1    b
# 2    a
# 3    e
# 4    f
# 5    b
# dtype: object

sq = s.unique()
print(sq, type(sq))
# ['a' 'b' 'e' 'f'] <class 'numpy.ndarray'>

sq_s = pd.Series(sq)  # wrap the array in a Series again if needed

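Value counts

value_counts() counts how many times each value occurs in a Series and returns a new Series whose index holds the original values and whose values are the counts. [Applies to Series]

Parameters: normalize=False, sort=True, ascending=False, bins=None, dropna=True. By default the result is therefore sorted in descending order of count.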

import pandas as pd

s = pd.Series(list('abaefb'))
s_count = s.value_counts()

print(s)
# 0    a
# 1    b
# 2    a
# 3    e
# 4    f
# 5    b
# dtype: object

# The order of tied values (here f and e) may differ between pandas versions
print(s_count, type(s_count))
# a    2
# b    2
# f    1
# e    1
# dtype: int64 <class 'pandas.core.series.Series'>
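
To illustrate some of the parameters listed above, here is a hedged sketch using a made-up Series that contains one missing value. The variable names are illustrative only:

```python
import numpy as np
import pandas as pd

# Hypothetical data: three 'a', two 'b', one missing value
s = pd.Series(['a', 'b', 'a', np.nan, 'a', 'b'])

counts = s.value_counts()                      # NaN excluded, most frequent first
shares = s.value_counts(normalize=True)        # relative frequencies instead of counts
with_nan = s.value_counts(dropna=False)        # NaN gets its own entry
rarest_first = s.value_counts(ascending=True)  # least frequent first

print(counts)
print(shares)
print(with_nan)
print(rarest_first)
```

Note that with normalize=True and the default dropna=True, the shares are computed over the non-missing values only.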

Membership test

isin([...]): the candidate values must be passed as a list. Each element is checked against that list, and the result is a boolean Series or DataFrame. [Applies to Series and DataFrame]

import numpy as np
import pandas as pd

s = pd.Series(list('abced'))
df = pd.DataFrame(np.arange(6).reshape(2, 3), columns=['a', 'b', 'c'])

print(s.isin(['a', 'b']))
# 0     True
# 1     True
# 2    False
# 3    False
# 4    False
# dtype: bool

print(df.isin([1, 2]))
#        a      b      c
# 0  False   True   True
# 1  False  False  False
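
A common use of the boolean result is to filter rows. The following is a minimal sketch with made-up data and column names, not something taken from the original post:

```python
import pandas as pd

# Hypothetical data for illustration
df = pd.DataFrame({'name': ['a', 'b', 'c', 'd'], 'score': [10, 20, 30, 40]})

mask = df['name'].isin(['a', 'c'])  # boolean Series, one entry per row
selected = df[mask]                 # rows whose name is in the list
excluded = df[~mask]                # ~ inverts the mask

print(selected)
print(excluded)
```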

Reposted from: https://www.cnblogs.com/Forever77/p/11259146.html
