

Reading Notes 6: Basic pandas Usage

Published: 2025/4/16 · Programming Q&A · 豆豆

This article, collected and edited by 生活随笔, covers basic pandas usage (reading notes no. 6) and is shared here for reference.

I. Series

1. A Series is much like a NumPy array. It can be initialized from a list, a tuple, a dict, or a NumPy array.

>>> from pandas import Series
>>> s = Series([0.1, 1.2, 2.3, 3.4, 4.5])
>>> s
0    0.1
1    1.2
2    2.3
3    3.4
4    4.5
dtype: float64
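The list constructor is shown above; the other three sources mentioned (tuple, dict, NumPy array) work the same way. A minimal sketch, where the variable names are illustrative only. Note that a dict's keys become the index labels.

```python
import numpy as np
from pandas import Series

# From a tuple: same result as the list version, default 0..n-1 index
s_tuple = Series((0.1, 1.2, 2.3))

# From a dict: keys become the index labels, values become the data
s_dict = Series({'a': 0.1, 'b': 1.2, 'c': 2.3})

# From a NumPy array: the array's dtype is carried over
s_array = Series(np.array([1, 2, 3]))

print(s_tuple.tolist())     # [0.1, 1.2, 2.3]
print(list(s_dict.index))   # ['a', 'b', 'c']
print(s_array.dtype)
```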

2. A Series can also carry labels as its index; by default the index is the integer positions 0, 1, 2, ...

>>> s = Series([0.1, 1.2, 2.3, 3.4, 4.5], index=['a', 'b', 'c', 'd', 'e'])
>>> s
a    0.1
b    1.2
c    2.3
d    3.4
e    4.5
dtype: float64

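Elements can be selected by position, by label, with a boolean mask, or with a slice. In current pandas, use .iloc for positions: relying on s[1] to mean "position 1" on a labeled Series is deprecated.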

>>> from pandas import Series
>>> s = Series([0.1, 1.2, 2.3, 3.4, 4.5], index=['a', 'b', 'c', 'd', 'e'])
>>> s.iloc[1]          # by position (older pandas also accepted s[1])
1.2
>>> s.iloc[1:4]        # positional slice
b    1.2
c    2.3
d    3.4
dtype: float64
>>> s[s > 3]           # boolean mask
d    3.4
e    4.5
dtype: float64
>>> s.iloc[[1, 2, 3]]  # list of positions
b    1.2
c    2.3
d    3.4
dtype: float64
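The example above selects by position and by mask. Selecting by label goes through the index labels; one sketch uses .loc to make the intent explicit. Unlike positional slices, label slices include the end label.

```python
from pandas import Series

s = Series([0.1, 1.2, 2.3, 3.4, 4.5], index=['a', 'b', 'c', 'd', 'e'])

print(s['b'])             # single label -> 1.2
print(s.loc['b':'d'])     # label slice: 'd' is included (b, c, d)
print(s.iloc[1:3])        # positional slice: position 3 is excluded (b, c)
print(s.loc[['a', 'e']])  # list of labels
```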

II. Common Series methods

1. head and tail show the first or last 5 rows by default; pass a number to change how many rows are shown.

>>> from pandas import Series
>>> s = Series([0.1, 1.2, 2.3, 3.4, 4.5], index=['a', 'b', 'c', 'd', 'e'])
>>> s.head()
a    0.1
b    1.2
c    2.3
d    3.4
e    4.5
dtype: float64
>>> s.head(2)
a    0.1
b    1.2
dtype: float64
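tail works the same way from the other end. A minimal sketch on the same Series (the result names are illustrative):

```python
from pandas import Series

s = Series([0.1, 1.2, 2.3, 3.4, 4.5], index=['a', 'b', 'c', 'd', 'e'])

last_default = s.tail()   # up to the last 5 rows (here: all 5)
last_two = s.tail(2)      # the last 2 rows: d and e
print(last_two)
```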

2. isnull and notnull return a boolean Series of the same length, marking which entries are missing (NaN) or present.
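A short sketch of both methods on a Series with two missing values; the mask from notnull can be used directly to keep only the present values.

```python
import numpy as np
from pandas import Series

s = Series([1.0, np.nan, 3.0, np.nan])

missing = s.isnull()    # True where the value is NaN
present = s.notnull()   # element-wise opposite of isnull()

print(missing.tolist())     # [False, True, False, True]
print(s[present].tolist())  # [1.0, 3.0]
```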

3. describe returns summary statistics of the Series (count, mean, standard deviation, min, quartiles, max).

>>> from pandas import Series
>>> import numpy as np
>>> s = Series(np.arange(1.0, 10))
>>> s.describe()
count    9.000000
mean     5.000000
std      2.738613
min      1.000000
25%      3.000000
50%      5.000000
75%      7.000000
max      9.000000
dtype: float64

4. unique returns the distinct values (as an array, in order of first appearance); nunique returns how many distinct values there are.
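A minimal sketch contrasting the two: unique gives the values themselves, nunique gives their count (NaN is excluded from the count by default).

```python
from pandas import Series

s = Series([3, 1, 3, 2, 1, 3])

values = s.unique()   # NumPy array of distinct values, first-appearance order
count = s.nunique()   # number of distinct values

print(values)  # [3 1 2]
print(count)   # 3
```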

5. drop(labels) removes the entries with the given labels, and dropna() removes NaN entries. Both return a new Series and leave the original unchanged.
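A short sketch of both methods (the result names are illustrative). Because neither works in place by default, the original Series still has all four entries afterwards.

```python
import numpy as np
from pandas import Series

s = Series([0.1, np.nan, 2.3, 3.4], index=['a', 'b', 'c', 'd'])

without_a_c = s.drop(['a', 'c'])   # new Series without labels 'a' and 'c'
without_nan = s.dropna()           # new Series without the NaN at 'b'

print(without_a_c)
print(without_nan)
print(len(s))  # 4: the original is unchanged
```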

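6. append(series) adds another Series to the end. Series.append was deprecated in pandas 1.4 and removed in 2.0; pandas.concat is the replacement.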

>>> import numpy as np
>>> from pandas import Series, concat
>>> s = Series(np.arange(1.0, 10))
>>> s2 = Series([22, 33, 44, 55])
>>> concat([s, s2])   # older pandas: s.append(s2)
0     1.0
1     2.0
2     3.0
3     4.0
4     5.0
5     6.0
6     7.0
7     8.0
8     9.0
0    22.0
1    33.0
2    44.0
3    55.0
dtype: float64
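Concatenating keeps both original indexes, which is why labels 0 to 3 appear twice in the result. If a fresh 0..n-1 index is wanted instead, concat accepts ignore_index=True; a minimal sketch:

```python
import numpy as np
from pandas import Series, concat

s = Series(np.arange(1.0, 10))
s2 = Series([22, 33, 44, 55])

# ignore_index=True discards both old indexes and renumbers 0..12
combined = concat([s, s2], ignore_index=True)
print(combined.index.tolist())
print(combined.iloc[-1])  # 55.0
```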

7. replace(to_replace, value) replaces each value in to_replace with the corresponding entry in value.

Note: replace returns a new Series with the replacements applied; it does not modify the original.

>>> import numpy as np
>>> from pandas import Series, concat
>>> s = Series(np.arange(1.0, 10))
>>> s2 = Series([22, 33, 44, 55])
>>> s3 = concat([s, s2])
>>> s3.replace([2, 5, 8], [22, 55, 99])   # returns a new Series
0     1.0
1    22.0
2     3.0
3     4.0
4    55.0
5     6.0
6     7.0
7    99.0
8     9.0
0    22.0
1    33.0
2    44.0
3    55.0
dtype: float64
>>> s3                                     # the original is unchanged
0     1.0
1     2.0
2     3.0
3     4.0
4     5.0
5     6.0
6     7.0
7     8.0
8     9.0
0    22.0
1    33.0
2    44.0
3    55.0
dtype: float64

8. update(other) overwrites values with those from another Series, but only where the labels match.

Note: update modifies the Series in place.

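>>> import numpy as np
>>> from pandas import Series
>>> s1 = Series(np.arange(1.0, 4.0), index=['a', 'b', 'c'])
>>> s1
a    1.0
b    2.0
c    3.0
dtype: float64
>>> s2 = Series(-1.0 * np.arange(1.0, 4.0), index=['c', 'd', 'e'])
>>> s1.update(s2)   # only label 'c' is shared, so only s1['c'] changes
>>> s1
a    1.0
b    2.0
c   -1.0
dtype: float64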

9. DataFrame: a DataFrame is a two-dimensional table, comparable to a 2-D NumPy array. Unlike an array, different columns can hold different data types.

>>> import numpy as np
>>> from pandas import DataFrame
>>> a = np.array([[1, 2], [3, 4]])
>>> df = DataFrame(a)
>>> df
   0  1
0  1  2
1  3  4

>>> df = DataFrame(np.array([[1, 2], [3, 4]]), columns=['a', 'b'])
>>> df
   a  b
0  1  2
1  3  4
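The examples above hold a single dtype. To illustrate the point that columns can differ in type, a minimal sketch built from a list of rows; the names and values are made up for illustration. pandas infers one dtype per column.

```python
from pandas import DataFrame

# Hypothetical rows mixing text, integers and floats
df = DataFrame([['Alice', 30, 88.5],
                ['Bob', 25, 92.0]],
               columns=['name', 'age', 'score'])

# Each column keeps its own dtype (text, integer, float)
print(df.dtypes)
```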

Row labels (index) and column labels (columns) can both be specified:

>>> df = DataFrame(np.array([[1, 2], [3, 4]]), columns=['dogs', 'cats'], index=['Alice', 'Bob'])
>>> df
       dogs  cats
Alice     1     2
Bob       3     4

10. A DataFrame can also be initialized from a dict.
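One way to write the dict form, reusing the dogs/cats data from the previous example: each key becomes a column label and each value becomes that column. Passing columns= picks which keys to use and in what order. The variable names are illustrative.

```python
from pandas import DataFrame

# Keys become column labels; each list becomes one column
data = {'dogs': [1, 3], 'cats': [2, 4]}
df = DataFrame(data, index=['Alice', 'Bob'])
print(df)

# columns= selects (and orders) the keys to use
df2 = DataFrame(data, columns=['cats', 'dogs'])
print(df2)
```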

11. Column labels can also be specified explicitly.


III. Working with DataFrames. The examples below use an Excel file in the working directory; mine is score.xlsx.

1. Read the data with read_excel (reading .xlsx files requires an Excel engine such as openpyxl to be installed).

2. Select columns with a column name or a list of column names.

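3. Select rows with a slice (score[:1]) or a boolean mask, as below. To select rows by index label, or by a list of labels, use score.loc[...]; plain score[...] with a label or a list of labels selects columns, not rows.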

>>> from pandas import read_excel
>>> score = read_excel('score.xlsx', sheet_name='Sheet1')
>>> score[:1]
     序號  english  math  chinese  physics  chemistry  biology
0  1501       56    65       89       45         87       98

>>> from pandas import read_excel
>>> score = read_excel('score.xlsx', sheet_name='Sheet1')
>>> t = score[(score.english > 60) & (score.english < 70)]
>>> t
     序號  english  math  chinese  physics  chemistry  biology
2  1503       65    78       68       86         78       87
5  1506       64    67       82       76         78       73
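Since score.xlsx is not included here, the sketch below builds a small stand-in DataFrame from the first three rows shown in these notes, then exercises the column and row selection rules from items 2 and 3. The variable names are illustrative.

```python
from pandas import DataFrame

# Stand-in for score.xlsx: the first three rows shown in these notes
score = DataFrame(
    [[1501, 56, 65, 89, 45, 87, 98],
     [1502, 45, 65, 89, 78, 98, 89],
     [1503, 65, 78, 68, 86, 78, 87]],
    columns=['序號', 'english', 'math', 'chinese',
             'physics', 'chemistry', 'biology'])

english = score['english']               # one column name -> Series
two_cols = score[['english', 'math']]    # list of names -> DataFrame
first_two_rows = score[:2]               # slice -> rows 0 and 1
passed = score[score.english >= 60]      # boolean mask -> matching rows
row_by_label = score.loc[[0, 2]]         # rows by index label need .loc
```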

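4. To select rows and columns at the same time, use loc[row_selector, col_selector] with labels, or iloc[row_selector, col_selector] with integer positions. The ix[row_selector, col_selector] indexer used in older pandas was removed in pandas 1.0.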
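A minimal sketch of both indexers on a small stand-in frame (three columns from the score data shown in these notes). loc takes labels, including boolean masks, and its slices include the end label; iloc takes positions, and its slices exclude the end.

```python
from pandas import DataFrame

score = DataFrame(
    [[1501, 56, 65], [1502, 45, 65], [1503, 65, 78]],
    columns=['序號', 'english', 'math'])

# loc: labels on both axes; label slice 0:1 includes row 1
a = score.loc[0:1, ['english', 'math']]

# loc with a boolean row mask and a single column label
b = score.loc[score.english > 50, 'math']

# iloc: positions on both axes; end positions are excluded
c = score.iloc[0:2, 1:3]
```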

5. Adding a column works much like adding a key to a dict: assign a Series to a new column name.

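>>> # state_gdp: a DataFrame of US state GDP data loaded earlier (not shown in these notes)
>>> # .copy() avoids a SettingWithCopyWarning when a column is added to the selection
>>> state_gdp_2012 = state_gdp[['state', 'gdp_2012']].copy()
>>> state_gdp_2012.head()
        state  gdp_2012
0     Alabama    157272
1      Alaska     44732
2     Arizona    230641
3    Arkansas     93892
4  California   1751002
>>> state_gdp_2012['gdp_growth_2012'] = state_gdp['gdp_growth_2012']
>>> state_gdp_2012.head()
        state  gdp_2012  gdp_growth_2012
0     Alabama    157272              1.2
1      Alaska     44732              1.1
2     Arizona    230641              2.6
3    Arkansas     93892              1.3
4  California   1751002              3.5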

Or use insert(loc, column, value) to place the new column at a given position:

>>> state_gdp_2012 = state_gdp[['state', 'gdp_2012']].copy()
>>> state_gdp_2012.insert(1, 'gdp_growth_2012', state_gdp['gdp_growth_2012'])
>>> state_gdp_2012.head()
        state  gdp_growth_2012  gdp_2012
0     Alabama              1.2    157272
1      Alaska              1.1     44732
2     Arizona              2.6    230641
3    Arkansas              1.3     93892
4  California              3.5   1751002

6. Modifying data: assign to a single cell with loc[row_label, column].

>>> from pandas import read_excel
>>> score = read_excel('score.xlsx', sheet_name='Sheet1')
>>> score[:3]
     序號  english  math  chinese  physics  chemistry  biology
0  1501       56    65       89       45         87       98
1  1502       45    65       89       78         98       89
2  1503       65    78       68       86         78       87
>>> score.loc[0, 'english'] = 90   # older pandas: score.ix[0, 'english'] = 90
>>> score[:3]
     序號  english  math  chinese  physics  chemistry  biology
0  1501       90    65       89       45         87       98
1  1502       45    65       89       78         98       89
2  1503       65    78       68       86         78       87

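7. Deleting columns: use the del statement, the pop(column) method (removes the column in place and returns it), or drop(list_of_columns, axis=1) (returns a new DataFrame).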

>>> from pandas import read_excel
>>> score = read_excel('score.xlsx', sheet_name='Sheet1')
>>> scorecopy = score.copy()         # untouched copy, kept for comparison
>>> score[:2]
     序號  english  math  chinese  physics  chemistry  biology
0  1501       56    65       89       45         87       98
1  1502       45    65       89       78         98       89
>>> biology = score.pop('biology')   # removes the column in place and returns it
>>> score[:2]
     序號  english  math  chinese  physics  chemistry
0  1501       56    65       89       45         87
1  1502       45    65       89       78         98
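The example above only uses pop. A minimal sketch of the other two options on a small stand-in frame (values from the first two score rows in these notes): drop leaves the original intact, while del changes it.

```python
from pandas import DataFrame

score = DataFrame(
    [[1501, 56, 65, 98], [1502, 45, 65, 89]],
    columns=['序號', 'english', 'math', 'biology'])

# drop returns a NEW DataFrame; score still has all four columns
slim = score.drop(['math', 'biology'], axis=1)

# del removes the column from score itself (in place)
del score['biology']

print(list(slim.columns))   # ['序號', 'english']
print(list(score.columns))  # ['序號', 'english', 'math']
```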

8. dropna removes rows that contain NaN (or columns, with axis=1), and drop_duplicates removes duplicate rows.
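A minimal sketch on a hypothetical four-row frame: rows 1 and 3 each contain a NaN, and row 2 repeats row 0 exactly. All three calls return new DataFrames.

```python
import numpy as np
from pandas import DataFrame

df = DataFrame({'one':   [1.0, np.nan, 1.0, 4.0],
                'two':   [2.0, 5.0, 2.0, np.nan],
                'three': [3.0, 6.0, 3.0, 9.0]})

rows_ok = df.dropna()               # drop rows containing a NaN -> rows 0 and 2
cols_ok = df.dropna(axis=1)         # drop columns containing a NaN -> only 'three'
unique_rows = df.drop_duplicates()  # row 2 repeats row 0, so it is removed
```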

9. fillna(value=...) replaces every NaN with the given value; passing a dict fills each column with its own value.

>>> import numpy as np
>>> from pandas import DataFrame
>>> df = DataFrame(np.array([[1, np.nan], [np.nan, 2]]))
>>> df.columns = ['one', 'two']
>>> replacements = {'one': -1, 'two': -2}
>>> df.fillna(value=replacements)
   one  two
0  1.0 -2.0
1 -1.0  2.0

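10. Sorting. Older pandas sorted with df.sort(columns=...); that method has since been removed, and sort_values(by=...) is the replacement. ascending takes one flag per sort column.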

>>> df = DataFrame(np.array([[1, 3], [1, 2], [3, 2], [2, 1]]), columns=['one', 'two'])
>>> df.sort_values(by='one', kind='stable')   # stable: rows 0 and 1 (tied) keep their order
   one  two
0    1    3
1    1    2
3    2    1
2    3    2

>>> df.sort_values(by=['one', 'two'], ascending=[False, True])
   one  two
2    3    2
3    2    1
1    1    2
0    1    3
