Pandas CookBook -- 04 Selecting Subsets of Data
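This is a translation by SeanCheney, a well-known author on Jianshu. I adjusted some of the formatting and restructured the table of contents to suit my own reading and to make things easier to find when I look back at it later.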
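import pandas as pd
import numpy as np

Set the maximum number of columns and rows to display. Recent pandas versions need the full option names here: the short names 'max_columns' and 'max_rows' also match Styler options, so pandas rejects them as ambiguous.

pd.set_option('display.max_columns', 5, 'display.max_rows', 5)

1 Selecting Series Data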
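Read the dataset, using the institution name column INSTNM as the row index, and take the CITY column as a Series:

college = pd.read_csv('data/college.csv', index_col='INSTNM')
city = college['CITY']

1.1 Selecting by integer position with iloc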
Scalar

city.iloc[3]
'Huntsville'

Multiple rows - list of integers

city.iloc[[10, 20, 30]]
INSTNM
Birmingham Southern College                            Birmingham
George C Wallace State Community College-Hanceville    Hanceville
Judson College                                         Marion
Name: CITY, dtype: object

Multiple rows - integer slice

The slice starts at position 4 and takes every 10th position up to, but not including, position 50:

city.iloc[4:50:10]
INSTNM
Alabama State University              Montgomery
Enterprise State Community College    Enterprise
Heritage Christian University         Florence
Marion Military Institute             Marion
Reid State Technical College          Evergreen
Name: CITY, dtype: object

1.2 Selecting by label with loc
Scalar

city.loc['Heritage Christian University']
'Florence'

Multiple rows - list of labels

Pick 4 labels at random (the result differs from run to run):

labels = list(np.random.choice(city.index, 4))
labels
['University of St Thomas',
 'Paul Mitchell the School-Woodbridge',
 'San Francisco Conservatory of Music',
 'Trinity Bible College']

city.loc[labels]
INSTNM
University of St Thomas                Saint Paul
Paul Mitchell the School-Woodbridge    Woodbridge
San Francisco Conservatory of Music    San Francisco
Trinity Bible College                  Ellendale
Name: CITY, dtype: object

Multiple rows - label slice

Unlike an integer slice, a label slice includes its end label, so this returns the same five rows as city.iloc[4:50:10] above:

city.loc['Alabama State University':'Reid State Technical College':10]
INSTNM
Alabama State University              Montgomery
Enterprise State Community College    Enterprise
Heritage Christian University         Florence
Marion Military Institute             Marion
Reid State Technical College          Evergreen
Name: CITY, dtype: object

A slice can also run in reverse with a negative step; the start label then has to come after the stop label:

city.loc['Reid State Technical College':'Alabama State University':-10]
INSTNM
Reid State Technical College          Evergreen
Marion Military Institute             Marion
Heritage Christian University         Florence
Enterprise State Community College    Enterprise
Alabama State University              Montgomery
Name: CITY, dtype: object

2 Selecting DataFrame Data
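2.1 Selecting DataFrame rows

Rows are selected with the same patterns as a Series; just keep in mind that iloc selects by integer position and loc selects by label.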
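A minimal sketch of that difference on a small, made-up DataFrame (the school names, towns and states below are hypothetical stand-ins for college.csv): one position returns a single row as a Series, an iloc slice excludes its end position, and a loc slice includes its end label.

```python
import pandas as pd

# Hypothetical stand-in for college.csv, indexed by institution name
df = pd.DataFrame(
    {"CITY": ["Town A", "Town B", "Town C", "Town D"],
     "STABBR": ["AL", "AL", "AK", "AZ"]},
    index=pd.Index(["School A", "School B", "School C", "School D"], name="INSTNM"),
)

one_row = df.iloc[1]                        # one position -> that row as a Series
by_position = df.iloc[1:3]                  # positions 1 and 2 (end position 3 excluded)
by_label = df.loc["School B":"School C"]    # labels B through C (end label included)
picked = df.loc[["School D", "School A"]]   # list of labels -> rows in the order given
```

Both slices pick out the same two rows here, which is a quick way to check that the end-point rules are understood.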
2.2 Selecting rows and columns at the same time

2.2.1 Selecting the first 3 rows and the first 4 columns
college.iloc[:3, :4]

| CITY | STABBR | HBCU | MENONLY |
|---|---|---|---|
| Normal | AL | 1.0 | 0.0 |
| Birmingham | AL | 0.0 | 0.0 |
| Montgomery | AL | 0.0 | 0.0 |
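In a single iloc call the row selector comes first and the column selector second, separated by a comma. A sketch on a hypothetical numeric frame (the labels r0..r4 and c0..c5 are made up), compared with the equivalent two-step chain:

```python
import numpy as np
import pandas as pd

# Hypothetical 5x6 frame holding the numbers 0..29
df = pd.DataFrame(np.arange(30).reshape(5, 6),
                  index=[f"r{i}" for i in range(5)],
                  columns=[f"c{j}" for j in range(6)])

sub = df.iloc[:3, :4]               # rows 0-2 and columns 0-3 in one call
chained = df.iloc[:3].iloc[:, :4]   # same result, but builds an intermediate DataFrame
```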
2.2.2 Selecting all rows of two columns
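college.iloc[:, [4, 6]].head()

| WOMENONLY | SATVRMID |
|---|---|
| 0.0 | 424.0 |
| 0.0 | 570.0 |
| 0.0 | NaN |
| 0.0 | 595.0 |
| 0.0 | 425.0 |

Without .head(), the same two columns contain all 7535 rows (the display is truncated):

| WOMENONLY | SATVRMID |
|---|---|
| 0.0 | 424.0 |
| 0.0 | 570.0 |
| ... | ... |
| NaN | NaN |
| NaN | NaN |

7535 rows × 2 columns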
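Here the bare colon keeps every row, and the columns can be given either as integer positions with iloc or as names with loc. A small sketch with hypothetical column names a to d:

```python
import pandas as pd

# Hypothetical frame; the column names a-d are made up
df = pd.DataFrame({"a": [1, 2, 3], "b": [4, 5, 6], "c": [7, 8, 9], "d": [10, 11, 12]})

by_position = df.iloc[:, [1, 3]]   # ':' keeps every row; [1, 3] are column positions
by_name = df.loc[:, ["b", "d"]]    # the same two columns, selected by name
preview = by_position.head(2)      # head(n) returns only the first n rows
```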
2.2.3 Selecting non-contiguous rows and columns
college.iloc[[100, 200], [7, 15]]

| SATMTMID | UGDS_NHPI |
|---|---|
| NaN | 0.0029 |
| NaN | NaN |
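What comes back depends on the selectors: lists on both axes give a DataFrame, a scalar on one axis gives a Series, and scalars on both axes give one value (the case covered in 2.2.4). A sketch on a hypothetical 4x4 frame:

```python
import numpy as np
import pandas as pd

# Hypothetical 4x4 frame holding 0..15, row labels w-z, column labels p-s
df = pd.DataFrame(np.arange(16).reshape(4, 4),
                  index=["w", "x", "y", "z"],
                  columns=["p", "q", "r", "s"])

block = df.iloc[[0, 2], [1, 3]]               # lists on both axes -> DataFrame
same_block = df.loc[["w", "y"], ["q", "s"]]   # the same cells by label
one_column = df.iloc[[0, 2], 1]               # a scalar column selector -> Series
value = df.iloc[2, 3]                         # scalars on both axes -> single value
```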
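2.2.4 Selecting a single scalar value

Both calls return the same value: row position 5 is 'The University of Alabama', and column position -4 (the fourth column from the end) is 'PCTFLOAN'.

college.iloc[5, -4]
0.401

college.loc['The University of Alabama', 'PCTFLOAN']
0.401

2.2.5 Selecting one column over a slice of rows

college.iloc[90:80:-2, 5]
INSTNM
Empire Beauty School-Flagstaff     0
Charles of Italy Beauty College    0
Central Arizona College            0
University of Arizona              0
Arizona State University-Tempe     0
Name: RELAFFIL, dtype: int64

The label version selects the same rows (positions 90, 88, 86, 84 and 82): iloc stops before position 80, while loc stops at, and includes, the stop label.

start = 'Empire Beauty School-Flagstaff'
stop = 'Arizona State University-Tempe'
college.loc[start:stop:-2, 'RELAFFIL']
INSTNM
Empire Beauty School-Flagstaff     0
Charles of Italy Beauty College    0
Central Arizona College            0
University of Arizona              0
Arizona State University-Tempe     0
Name: RELAFFIL, dtype: int64

3 Converting between integer positions and labels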
Use the Index method get_loc to find the integer position of a given column. The + 1 is needed because an iloc slice excludes its end position.

col_start = college.columns.get_loc('UGDS_WHITE')
col_end = college.columns.get_loc('UGDS_UNKN') + 1
col_start, col_end
(10, 19)

college.iloc[:5, col_start:col_end]

| UGDS_WHITE | UGDS_BLACK | ... | UGDS_NRA | UGDS_UNKN |
|---|---|---|---|---|
| 0.0333 | 0.9353 | ... | 0.0059 | 0.0138 |
| 0.5922 | 0.2600 | ... | 0.0179 | 0.0100 |
| 0.2990 | 0.4192 | ... | 0.0000 | 0.2715 |
| 0.6988 | 0.1255 | ... | 0.0332 | 0.0350 |
| 0.0158 | 0.9208 | ... | 0.0243 | 0.0137 |

5 rows × 9 columns
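The same label-to-position conversion on a tiny hypothetical frame (only UGDS_WHITE, UGDS_UNKN and PPTUG_EF appear in the article; the other column and all values are made up). This assumes the column labels are unique: with duplicate labels, get_loc can return a slice or a boolean mask instead of an integer.

```python
import pandas as pd

# Hypothetical one-row frame with made-up values
df = pd.DataFrame([[100, 0.5, 0.3, 0.2, 0.1]],
                  columns=["UGDS", "UGDS_WHITE", "UGDS_BLACK", "UGDS_UNKN", "PPTUG_EF"])

col_start = df.columns.get_loc("UGDS_WHITE")     # label -> position (1)
col_end = df.columns.get_loc("UGDS_UNKN") + 1    # +1 because iloc excludes the slice end
subset = df.iloc[:, col_start:col_end]
```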
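Conversely, look up the labels that correspond to integer positions:

row_end = college.index[4]
col_start, col_end = college.columns[10], college.columns[19]
col_start, col_end
('UGDS_WHITE', 'PPTUG_EF')

college.loc[:row_end, col_start:col_end]

| UGDS_WHITE | UGDS_BLACK | ... | UGDS_UNKN | PPTUG_EF |
|---|---|---|---|---|
| 0.0333 | 0.9353 | ... | 0.0138 | 0.0656 |
| 0.5922 | 0.2600 | ... | 0.0100 | 0.2607 |
| 0.2990 | 0.4192 | ... | 0.2715 | 0.4536 |
| 0.6988 | 0.1255 | ... | 0.0350 | 0.2146 |
| 0.0158 | 0.9208 | ... | 0.0137 | 0.0892 |

5 rows × 10 columns

Because loc includes the end label, this returns 10 columns (positions 10 through 19), one more than the iloc call above; ending at college.columns[18] instead would give the same 9 columns.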
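A sketch of the off-by-one difference on a hypothetical 6x6 frame (labels row0..row5 and col0..col5 are made up): converting positions to labels and slicing with loc yields one more column than the iloc slice with the same numbers.

```python
import numpy as np
import pandas as pd

# Hypothetical 6x6 frame with labels row0..row5 and col0..col5
df = pd.DataFrame(np.arange(36).reshape(6, 6),
                  index=[f"row{i}" for i in range(6)],
                  columns=[f"col{j}" for j in range(6)])

row_end = df.index[4]                               # position 4 -> "row4"
col_start, col_end = df.columns[1], df.columns[4]   # positions 1, 4 -> "col1", "col4"

by_label = df.loc[:row_end, col_start:col_end]   # both end labels included
by_position = df.iloc[:5, 1:4]                   # end positions excluded
```

This mirrors the 9 versus 10 columns seen with college.csv.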
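4 Slicing

4.1 Slicing rows lazily

"Lazy" slicing, as I understand it, simply means leaving out loc and iloc and slicing with [] directly. It does not work for columns, though: it only slices the rows of a DataFrame or the elements of a Series, and it cannot select rows and columns at the same time.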
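A minimal sketch of how [] behaves on a hypothetical DataFrame with a string index: a single label selects a column, a slice selects rows, and a combined row-and-column selection has to go through iloc or loc.

```python
import pandas as pd

# Hypothetical frame with a string index
df = pd.DataFrame({"CITY": ["Town A", "Town B", "Town C", "Town D"],
                   "STABBR": ["AL", "AK", "AZ", "AR"]},
                  index=["s1", "s2", "s3", "s4"])

city = df["CITY"]    # a single label inside [] selects a column
rows = df[1:3]       # a slice inside [] selects rows (positions 1 and 2)
# [] has no row-and-column form, so a combined selection needs iloc/loc:
cities_1_2 = df.iloc[1:3, 0]
```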
Selecting from a Series

city = college['CITY']
city[10:20:2]
INSTNM
Birmingham Southern College             Birmingham
Concordia College Alabama               Selma
Enterprise State Community College      Enterprise
Faulkner University                     Montgomery
New Beginning College of Cosmetology    Albertville
Name: CITY, dtype: object

Selecting DataFrame rows
college[10:20:2]

| CITY | STABBR | ... | MD_EARN_WNE_P10 | GRAD_DEBT_MDN_SUPP |
|---|---|---|---|---|
| Birmingham | AL | ... | 44200 | 27000 |
| Selma | AL | ... | 19900 | PrivacySuppressed |
| Enterprise | AL | ... | 24600 | 8273 |
| Montgomery | AL | ... | 37200 | 22000 |
| Albertville | AL | ... | NaN | 5500 |

5 rows × 26 columns
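When the index holds names rather than integers, as here, an integer slice inside [] is positional, so it picks the same rows as iloc with the same slice. A sketch with hypothetical labels; writing iloc explicitly makes the intent clearer to readers.

```python
import pandas as pd

# Hypothetical Series of 20 values with string labels
s = pd.Series(range(20), index=[f"label{i}" for i in range(20)])
df = s.to_frame("value")

lazy_s, explicit_s = s[10:20:2], s.iloc[10:20:2]        # integer slice on a string index
lazy_df, explicit_df = df[10:20:2], df.iloc[10:20:2]    # same rule for DataFrame rows
```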
Both Series and DataFrames can also be sliced lazily by label:

start = 'Mesa Community College'
stop = 'Spokane Community College'
college[start:stop:1500]

| CITY | STABBR | ... | MD_EARN_WNE_P10 | GRAD_DEBT_MDN_SUPP |
|---|---|---|---|---|
| Mesa | AZ | ... | 35200 | 8000 |
| New Carrollton | MD | ... | 15200 | 9666 |
| Portland | OR | ... | NaN | PrivacySuppressed |

3 rows × 26 columns
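A label slice inside [] behaves like the same slice passed to loc: the end label is included, and a step applies as usual. A sketch with hypothetical single-letter labels:

```python
import pandas as pd

# Hypothetical frame with sorted single-letter labels
df = pd.DataFrame({"n": range(6)}, index=["a", "b", "c", "d", "e", "f"])

lazy = df["b":"e":2]            # label slice in []: "b" through "e", every 2nd row
explicit = df.loc["b":"e":2]    # the explicit equivalent
series_part = df["n"]["b":"e"]  # label slicing works on a Series too; "e" is included
```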
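4.2 Slicing lexicographically

The index must be sorted first. The bounds used below ('Sp' and 'Su') are not labels in the index, and pandas can only work out where they would fall if the index is sorted; on an unsorted index such a slice raises a KeyError.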
college = college.sort_index()
college.head()

| CITY | STABBR | ... | MD_EARN_WNE_P10 | GRAD_DEBT_MDN_SUPP |
|---|---|---|---|---|
| New Orleans | LA | ... | NaN | 19022.5 |
| Kirksville | MO | ... | 219800 | PrivacySuppressed |
| Garland | TX | ... | NaN | PrivacySuppressed |
| Arkadelphia | AR | ... | PrivacySuppressed | 16500 |
| Miami | FL | ... | 29900 | 31000 |

5 rows × 26 columns
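A sketch of why the sort matters, using a hypothetical, deliberately unsorted index: slicing between bounds that are not themselves labels fails until sort_index() makes the index monotonic.

```python
import pandas as pd

# Hypothetical, deliberately unsorted index
df = pd.DataFrame({"n": [1, 2, 3, 4]}, index=["Delta", "Alpha", "Charlie", "Bravo"])

was_sorted = df.index.is_monotonic_increasing   # False for this order
try:
    df.loc["B":"D"]            # "B" and "D" are not labels in the index
    unsorted_slice_ok = True
except KeyError:
    unsorted_slice_ok = False  # pandas cannot tell where the missing bounds fall

sorted_df = df.sort_index()
between = sorted_df.loc["B":"D"]   # works once the index is sorted
```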
Select the schools whose names fall alphabetically between 'Sp' and 'Su':

college.loc['Sp':'Su']

| CITY | STABBR | ... | MD_EARN_WNE_P10 | GRAD_DEBT_MDN_SUPP |
|---|---|---|---|---|
| Ipswich | MA | ... | 21500 | 6333 |
| Plymouth | MA | ... | 21500 | 6333 |
| ... | ... | ... | ... | ... |
| Selmer | TN | ... | PrivacySuppressed | PrivacySuppressed |
| Rock Hill | SC | ... | PrivacySuppressed | 9495.5 |

201 rows × 26 columns
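The bounds are compared as whole strings, so the end bound 'Su' stops before any longer name that starts with "Su" (for example, "Sullivan..." sorts after "Su"). A sketch with hypothetical school names; the later bound "Sv" is an illustrative choice for also including the "Su" names, not something from the original recipe.

```python
import pandas as pd

# Hypothetical school names, sorted so that label slicing is allowed
names = ["Sparta School", "Spokane College", "Stanford", "Sullivan University", "Tulane"]
s = pd.Series(range(len(names)), index=names).sort_index()

sp_to_su = s.loc["Sp":"Su"]   # "Sullivan University" sorts after "Su", so it is excluded
sp_to_sv = s.loc["Sp":"Sv"]   # a later bound picks up names that start with "Su"
```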
Reposted from: https://www.cnblogs.com/shiyushiyu/p/9742232.html