Reading Notes 6: Basic pandas Usage
I. Series
1. A Series is much like a NumPy array and can be initialized from a list, a tuple, a dict, or a NumPy array.
>>> from pandas import Series
>>> s = Series([0.1, 1.2, 2.3, 3.4, 4.5])
>>> s
0    0.1
1    1.2
2    2.3
3    3.4
4    4.5
dtype: float64
2. The index of a Series can also be made of labels; by default it is the integers 0, 1, 2, and so on.
>>> s = Series([0.1, 1.2, 2.3, 3.4, 4.5], index=['a', 'b', 'c', 'd', 'e'])
>>> s
a    0.1
b    1.2
c    2.3
d    3.4
e    4.5
dtype: float64
Elements can be selected by number (position), by label, by boolean mask, or by slice.
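from pandas import Series
s = Series([0.1, 1.2, 2.3, 3.4, 4.5], index=['a', 'b', 'c', 'd', 'e'])
s.iloc[1]   # by position; plain s[1] on a label index is deprecated in recent pandas
Out[36]: 1.2

from pandas import Series
s = Series([0.1, 1.2, 2.3, 3.4, 4.5], index=['a', 'b', 'c', 'd', 'e'])
print(s.iloc[1], '\n')       # a single position
print(s[1:4], '\n')          # an integer slice
print(s[s > 3], '\n')        # a boolean mask
print(s.iloc[[1, 2, 3]])     # a list of positions

1.2

b    1.2
c    2.3
d    3.4
dtype: float64

d    3.4
e    4.5
dtype: float64

b    1.2
c    2.3
d    3.4
dtype: float64

II. Common Series functions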
1. head and tail show the first 5 or last 5 rows; pass an argument to change the number of rows shown.
from pandas import Series
s = Series([0.1, 1.2, 2.3, 3.4, 4.5], index=['a', 'b', 'c', 'd', 'e'])
print(s.head(), '\n')
print(s.head(2))

a    0.1
b    1.2
c    2.3
d    3.4
e    4.5
dtype: float64

a    0.1
b    1.2
dtype: float64
2. isnull and notnull return a boolean Series of the same length, marking which values are (or are not) missing.
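A minimal sketch of isnull and notnull, assuming a small Series with one missing value (the data are made up for illustration):

```python
import numpy as np
from pandas import Series

# Hypothetical data: the value at label 'b' is missing.
s = Series([0.1, np.nan, 2.3], index=['a', 'b', 'c'])

missing = s.isnull()    # True where the value is NaN
present = s.notnull()   # the element-wise opposite of isnull()

print(missing)
print(present)

# A boolean Series can be used directly as a mask.
print(s[present])
```

Both results have the same index as s, which is what makes them usable as masks.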
3. describe returns summary statistics for the Series.
from pandas import Series
import numpy as np
s = Series(np.arange(1.0, 10))
s.describe()
Out[43]:
count    9.000000
mean     5.000000
std      2.738613
min      1.000000
25%      3.000000
50%      5.000000
75%      7.000000
max      9.000000
dtype: float64
4. unique returns the distinct values; nunique returns the number of distinct values.
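For unique and nunique, a short sketch on made-up data with repeated values:

```python
from pandas import Series

s = Series([1, 2, 2, 3, 3, 3])   # illustrative data

distinct = s.unique()    # NumPy array of distinct values, in order of first appearance
count = s.nunique()      # how many distinct values there are

print(distinct)
print(count)
```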
5. drop(labels) removes the entries with the given labels; dropna() removes the NaN entries.
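A small sketch of drop and dropna on a made-up Series. Both return a new Series and leave the original untouched:

```python
import numpy as np
from pandas import Series

# Hypothetical data with one NaN at label 'b'.
s = Series([1.0, np.nan, 3.0, 4.0], index=['a', 'b', 'c', 'd'])

without_a = s.drop(['a'])   # remove entries by label
without_nan = s.dropna()    # remove missing values

print(without_a)
print(without_nan)
print(s)                    # the original still has all four entries
```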
6. append(series) appends data. Series.append was removed in pandas 2.0; pandas.concat is its replacement.
import numpy as np
import pandas as pd
from pandas import Series
s = Series(np.arange(1.0, 10))
s2 = Series([22, 33, 44, 55])
# Older pandas: s.append(s2); it was removed in pandas 2.0 in favor of concat.
print(pd.concat([s, s2]))

0     1.0
1     2.0
2     3.0
3     4.0
4     5.0
5     6.0
6     7.0
7     8.0
8     9.0
0    22.0
1    33.0
2    44.0
3    55.0
dtype: float64
7. replace(series, values) replaces each value listed in series with the corresponding entry of values.
Note: replace returns the modified data; it does not change the original Series in place.
import numpy as np
import pandas as pd
from pandas import Series
s = Series(np.arange(1.0, 10))
s2 = Series([22, 33, 44, 55])
s3 = pd.concat([s, s2])   # older pandas: s.append(s2)
print(s3.replace([2, 5, 8], [22, 55, 99]))
s3

0     1.0
1    22.0
2     3.0
3     4.0
4    55.0
5     6.0
6     7.0
7    99.0
8     9.0
0    22.0
1    33.0
2    44.0
3    55.0
dtype: float64
Out[51]:
0     1.0
1     2.0
2     3.0
3     4.0
4     5.0
5     6.0
6     7.0
7     8.0
8     9.0
0    22.0
1    33.0
2    44.0
3    55.0
dtype: float64
8. update(series) updates the Series using series; only entries whose labels match are updated.
Note: update modifies the original Series in place.
>>> from numpy import arange
>>> from pandas import Series
>>> s1 = Series(arange(1.0, 4.0), index=['a', 'b', 'c'])
>>> s1
a    1
b    2
c    3
dtype: float64
>>> s2 = Series(-1.0 * arange(1.0, 4.0), index=['c', 'd', 'e'])
>>> s1.update(s2)
>>> s1
a    1
b    2
c   -1
dtype: float64
9. DataFrame is the counterpart of a two-dimensional array. Unlike a NumPy array, it can combine data of different types in one table.
import numpy as np
from pandas import DataFrame
a = np.array([[1, 2], [3, 4]])
df = DataFrame(a)
df
Out[52]:
   0  1
0  1  2
1  3  4

>>> from numpy import array
>>> df = DataFrame(array([[1, 2], [3, 4]]), columns=['a', 'b'])
>>> df
   a  b
0  1  2
1  3  4
Row labels and column labels can also be specified:
>>> df = DataFrame(array([[1, 2], [3, 4]]), columns=['dogs', 'cats'], index=['Alice', 'Bob'])
>>> df
       dogs  cats
Alice     1     2
Bob       3     4
10. A DataFrame can also be initialized from a dict.
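The notes give no example of dict initialization, so here is a minimal sketch with made-up data. Each key becomes a column name. When the values are Series they are aligned on their index, and positions missing from a shorter Series are filled with NaN:

```python
import numpy as np
from pandas import DataFrame, Series

s1 = Series(np.arange(0.0, 5))   # index 0..4
s2 = Series(np.arange(1.0, 3))   # index 0..1 only

# Dict of Series: columns 'one' and 'two', rows aligned on the index.
df = DataFrame({'one': s1, 'two': s2})
print(df)

# Dict of plain lists, with explicit row labels.
pets = DataFrame({'dogs': [1, 3], 'cats': [2, 4]}, index=['Alice', 'Bob'])
print(pets)
```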
11. Column labels can also be specified:
>>> df = DataFrame(array([[1, 2], [3, 4]]), columns=['dogs', 'cats'], index=['Alice', 'Bob'])
>>> df
       dogs  cats
Alice     1     2
Bob       3     4
III. Working with DataFrames. The examples use an Excel file in the working directory; mine is score.xlsx.
1. Reading data
2. Columns can be selected with a single column name or a list of column names.
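A sketch of column selection. Since score.xlsx is not available here, a small in-memory DataFrame stands in for it, built from three rows and three columns of the score data shown later in these notes:

```python
from pandas import DataFrame

# Stand-in for score.xlsx (an assumption; only a subset of its columns).
score = DataFrame({
    'english': [56, 45, 65],
    'math': [65, 65, 78],
    'chinese': [89, 89, 68],
})

english = score['english']            # a single name gives a Series
subset = score[['english', 'math']]   # a list of names gives a DataFrame

print(english)
print(subset)
```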
3. Rows can be selected with a row label or a list of row labels, or with an integer slice or a boolean mask.
from pandas import read_excel
score = read_excel('score.xlsx', 'Sheet1')
score[:1]
Out[20]:
| 序號 | english | math | chinese | physics | chemistry | biology |
| 1501 | 56 | 65 | 89 | 45 | 87 | 98 |
| 1503 | 65 | 78 | 68 | 86 | 78 | 87 |
| 1506 | 64 | 67 | 82 | 76 | 78 | 73 |
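4. To select rows and columns together, use ix[rowselector, colselector]. ix was deprecated in pandas 0.20 and removed in 1.0; in current pandas use loc[rowselector, colselector] for labels and iloc[rowselector, colselector] for integer positions.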
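A sketch of combined row and column selection with loc and iloc. The DataFrame is a stand-in for the score data, and using the student numbers as the row index is an assumption made for illustration:

```python
from pandas import DataFrame

# Stand-in for the score data, indexed by student number (assumed).
score = DataFrame(
    {'english': [56, 45, 65], 'math': [65, 65, 78], 'chinese': [89, 89, 68]},
    index=[1501, 1502, 1503],
)

by_label = score.loc[[1501, 1503], ['english', 'chinese']]   # labels
by_position = score.iloc[0:2, 0:2]                           # integer positions
by_mask = score.loc[score['math'] > 70, 'english']           # boolean mask for rows

print(by_label)
print(by_position)
print(by_mask)
```

loc slices include both endpoints, while iloc slices follow the usual Python convention and exclude the end position.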
5. Adding a column works much like adding a key to a dict:
>>> state_gdp_2012 = state_gdp[['state', 'gdp_2012']]
>>> state_gdp_2012.head()
        state  gdp_2012
0     Alabama    157272
1      Alaska     44732
2     Arizona    230641
3    Arkansas     93892
4  California   1751002
>>> state_gdp_2012['gdp_growth_2012'] = state_gdp['gdp_growth_2012']
>>> state_gdp_2012.head()
        state  gdp_2012  gdp_growth_2012
0     Alabama    157272              1.2
1      Alaska     44732              1.1
2     Arizona    230641              2.6
3    Arkansas     93892              1.3
4  California   1751002              3.5
Alternatively, use insert(location, column_name, series):
>>> state_gdp_2012 = state_gdp[['state', 'gdp_2012']]
>>> state_gdp_2012.insert(1, 'gdp_growth_2012', state_gdp['gdp_growth_2012'])
>>> state_gdp_2012.head()
        state  gdp_growth_2012  gdp_2012
0     Alabama              1.2    157272
1      Alaska              1.1     44732
2     Arizona              2.6    230641
3    Arkansas              1.3     93892
4  California              3.5   1751002
6. Modifying data
from pandas import read_excel
score = read_excel('score.xlsx', 'Sheet1')
print(score[:3])
score.loc[0, 'english'] = 90   # these notes originally used score.ix, which was removed in pandas 1.0
print(score[:3])

   序號  english  math  chinese  physics  chemistry  biology
0  1501       56    65       89       45         87       98
1  1502       45    65       89       78         98       89
2  1503       65    78       68       86         78       87
   序號  english  math  chinese  physics  chemistry  biology
0  1501       90    65       89       45         87       98
1  1502       45    65       89       78         98       89
2  1503       65    78       68       86         78       87
7. To delete columns, use the del keyword, the pop(column) method, or drop(list of columns, axis=1).
from pandas import read_excel
score = read_excel('score.xlsx', 'Sheet1')
scorecopy = score.copy()   # keep a copy, because pop modifies score in place
print(score[:2])
score.pop('biology')       # removes the column and returns it as a Series
print(score[:2])

   序號  english  math  chinese  physics  chemistry  biology
0  1501       56    65       89       45         87       98
1  1502       45    65       89       78         98       89
   序號  english  math  chinese  physics  chemistry
0  1501       56    65       89       45         87
1  1502       45    65       89       78         98
8. dropna removes the rows or columns that contain NaN; drop_duplicates removes duplicate rows.
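A short sketch of dropna and drop_duplicates on a made-up DataFrame that has one NaN and one repeated row. Both methods return a new DataFrame:

```python
import numpy as np
from pandas import DataFrame

# Hypothetical data: row 1 repeats row 0, and row 2 has a NaN in column 'one'.
df = DataFrame({
    'one': [1.0, 1.0, np.nan, 4.0],
    'two': [2.0, 2.0, 3.0, 5.0],
})

rows_kept = df.dropna()          # drop rows that contain any NaN
cols_kept = df.dropna(axis=1)    # drop columns that contain any NaN
deduped = df.drop_duplicates()   # keep the first of each set of identical rows

print(rows_kept)
print(cols_kept)
print(deduped)
```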
9. fillna(value=value) replaces every NaN with the supplied value.
>>> from numpy import array, nan
>>> from pandas import DataFrame
>>> df = DataFrame(array([[1, nan], [nan, 2]]))
>>> df.columns = ['one', 'two']
>>> replacements = {'one': -1, 'two': -2}
>>> df.fillna(value=replacements)
   one  two
0    1   -2
1   -1    2
10. sort. In current pandas, DataFrame.sort has been replaced by sort_values.
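>>> from numpy import array
>>> from pandas import DataFrame
>>> df = DataFrame(array([[1, 3], [1, 2], [3, 2], [2, 1]]), columns=['one', 'two'])
>>> df.sort_values(by='one')   # formerly df.sort(columns='one')
   one  two
0    1    3
1    1    2
3    2    1
2    3    2
>>> df.sort_values(by=['one', 'two'], ascending=[False, True])
   one  two
2    3    2
3    2    1
1    1    2
0    1    3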