Pandas CookBook -- 04 Selecting Subsets of Data

Published: 2023/12/10
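Selecting Subsets of Data

This is based on the translation by SeanCheney on Jianshu. I adjusted the formatting and reorganized the sections to suit my own reading and to make things easier to look up later.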

import pandas as pd
import numpy as np

Set the maximum number of columns and rows to display

pd.set_option('display.max_columns', 5)
pd.set_option('display.max_rows', 5)
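If the display limits are only needed for part of a session, a context manager can restore the previous values automatically. The following is a minimal sketch using pd.option_context; the variable names before, inside, and after are only for illustration.

```python
import pandas as pd

# Remember the current setting so the restore can be checked afterwards.
before = pd.get_option("display.max_rows")

# Inside the block both display limits are 5; they are restored on exit.
with pd.option_context("display.max_rows", 5, "display.max_columns", 5):
    inside = pd.get_option("display.max_rows")

after = pd.get_option("display.max_rows")
print(before, inside, after)
```

Using the full option names (with the display. prefix) avoids ambiguity when several options share the same suffix.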

1 Selecting Series data

college = pd.read_csv('data/college.csv', index_col='INSTNM')
city = college['CITY']

1.1 Integer selection with iloc

Scalar

city.iloc[3]
'Huntsville'

Multiple rows - list of integers

city.iloc[[10,20,30]]
INSTNM
Birmingham Southern College                            Birmingham
George C Wallace State Community College-Hanceville    Hanceville
Judson College                                         Marion
Name: CITY, dtype: object

Multiple rows - integer slice

city.iloc[4:50:10]
INSTNM
Alabama State University              Montgomery
Enterprise State Community College    Enterprise
Heritage Christian University         Florence
Marion Military Institute             Marion
Reid State Technical College          Evergreen
Name: CITY, dtype: object
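Since data/college.csv may not be at hand, the three iloc forms above can be tried on a small stand-in Series. This is only a sketch: the school names and the six-row Series are made up for illustration and are not taken from the college dataset.

```python
import pandas as pd

# Hypothetical miniature stand-in for the CITY Series (labels are invented).
city = pd.Series(
    ["Normal", "Birmingham", "Montgomery", "Huntsville", "Selma", "Florence"],
    index=["School A", "School B", "School C", "School D", "School E", "School F"],
    name="CITY",
)

scalar = city.iloc[3]            # a single integer returns a scalar value
by_list = city.iloc[[0, 2, 4]]   # a list of integers returns a Series
by_slice = city.iloc[1:6:2]      # start:stop:step, the stop position is excluded

print(scalar)
print(by_list)
print(by_slice)
```

As with ordinary Python slicing, the stop position of an iloc slice is not included.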

1.2 Label selection with loc

Scalar

city.loc['Heritage Christian University']
'Florence'

Multiple rows - list of labels

Randomly choose 4 labels (np.random.choice samples with replacement by default, so duplicates are possible)

labels = list(np.random.choice(city.index, 4))
labels
['University of St Thomas',
 'Paul Mitchell the School-Woodbridge',
 'San Francisco Conservatory of Music',
 'Trinity Bible College']
city.loc[labels]
INSTNM
University of St Thomas                Saint Paul
Paul Mitchell the School-Woodbridge    Woodbridge
San Francisco Conservatory of Music    San Francisco
Trinity Bible College                  Ellendale
Name: CITY, dtype: object

Multiple rows - label slice

city.loc['Alabama State University':'Reid State Technical College':10]
INSTNM
Alabama State University              Montgomery
Enterprise State Community College    Enterprise
Heritage Christian University         Florence
Marion Military Institute             Marion
Reid State Technical College          Evergreen
Name: CITY, dtype: object

A slice can also select in reverse order

city.loc['Reid State Technical College':'Alabama State University':-10]
INSTNM
Reid State Technical College          Evergreen
Marion Military Institute             Marion
Heritage Christian University         Florence
Enterprise State Community College    Enterprise
Alabama State University              Montgomery
Name: CITY, dtype: object
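One detail worth checking by hand is that a loc label slice includes both endpoints, unlike an iloc slice. The sketch below shows this on a made-up six-row Series; the labels and values are hypothetical.

```python
import pandas as pd

# Hypothetical miniature stand-in for the CITY Series (labels are invented).
city = pd.Series(
    ["Normal", "Birmingham", "Montgomery", "Huntsville", "Selma", "Florence"],
    index=["School A", "School B", "School C", "School D", "School E", "School F"],
    name="CITY",
)

by_labels = city.loc[["School F", "School A"]]   # list of labels, order is preserved
forward = city.loc["School B":"School E"]        # both endpoints are included
backward = city.loc["School E":"School B":-1]    # reverse order, still inclusive

print(by_labels)
print(forward)
print(backward)
```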

2 Selecting DataFrame data

2.1 Selecting DataFrame rows

Just keep in mind that iloc indexes by integer position and loc indexes by label.
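As an illustration of row selection, here is a minimal sketch on a made-up four-row DataFrame (the school labels are hypothetical; the column names merely mirror the college dataset). Each iloc call is paired with the loc call that returns the same rows.

```python
import pandas as pd

# Hypothetical miniature DataFrame; index labels are invented for illustration.
df = pd.DataFrame(
    {
        "CITY": ["Normal", "Birmingham", "Montgomery", "Huntsville"],
        "STABBR": ["AL", "AL", "AL", "AL"],
        "HBCU": [1.0, 0.0, 0.0, 0.0],
    },
    index=pd.Index(["School A", "School B", "School C", "School D"], name="INSTNM"),
)

one_row_pos = df.iloc[2]                     # one row by position, returned as a Series
one_row_lab = df.loc["School C"]             # the same row by label

rows_pos = df.iloc[[0, 3]]                   # several rows by position
rows_lab = df.loc[["School A", "School D"]]  # the same rows by label

slice_pos = df.iloc[1:3]                     # positions 1 and 2 (stop excluded)
slice_lab = df.loc["School B":"School C"]    # labels B through C (stop included)

print(one_row_pos)
print(rows_lab)
print(slice_lab)
```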

2.2 Selecting DataFrame rows and columns at the same time

2.2.1 Selecting the first 3 rows and the first 4 columns

college.iloc[:3, :4]
INSTNM                                 CITY        STABBR  HBCU  MENONLY
Alabama A & M University               Normal      AL      1.0   0.0
University of Alabama at Birmingham    Birmingham  AL      0.0   0.0
Amridge University                     Montgomery  AL      0.0   0.0

college.loc[:'Amridge University', :'MENONLY']
INSTNM                                 CITY        STABBR  HBCU  MENONLY
Alabama A & M University               Normal      AL      1.0   0.0
University of Alabama at Birmingham    Birmingham  AL      0.0   0.0
Amridge University                     Montgomery  AL      0.0   0.0

2.2.2 Selecting all rows of two columns

college.iloc[:, [4,6]].head()
INSTNM                                 WOMENONLY  SATVRMID
Alabama A & M University               0.0        424.0
University of Alabama at Birmingham    0.0        570.0
Amridge University                     0.0        NaN
University of Alabama in Huntsville    0.0        595.0
Alabama State University               0.0        425.0

college.loc[:, ['WOMENONLY', 'SATVRMID']]
INSTNM                                                    WOMENONLY  SATVRMID
Alabama A & M University                                  0.0        424.0
University of Alabama at Birmingham                       0.0        570.0
...                                                       ...        ...
Bay Area Medical Academy - San Jose Satellite Location    NaN        NaN
Excel Learning Center-San Antonio South                   NaN        NaN

7535 rows × 2 columns

2.2.3 Selecting non-contiguous rows and columns

college.iloc[[100, 200], [7, 15]]
INSTNM                                   SATMTMID  UGDS_NHPI
GateWay Community College                NaN       0.0029
American Baptist Seminary of the West    NaN       NaN

rows = ['GateWay Community College', 'American Baptist Seminary of the West']
columns = ['SATMTMID', 'UGDS_NHPI']
college.loc[rows, columns]
INSTNM                                   SATMTMID  UGDS_NHPI
GateWay Community College                NaN       0.0029
American Baptist Seminary of the West    NaN       NaN

2.2.4 Selecting a scalar value

college.iloc[5, -4]
0.401
college.loc['The University of Alabama', 'PCTFLOAN']
0.401

2.2.5 Slicing rows and selecting one column

college.iloc[90:80:-2, 5]
INSTNM
Empire Beauty School-Flagstaff     0
Charles of Italy Beauty College    0
Central Arizona College            0
University of Arizona              0
Arizona State University-Tempe     0
Name: RELAFFIL, dtype: int64

start = 'Empire Beauty School-Flagstaff'
stop = 'Arizona State University-Tempe'
college.loc[start:stop:-2, 'RELAFFIL']
INSTNM
Empire Beauty School-Flagstaff     0
Charles of Italy Beauty College    0
Central Arizona College            0
University of Arizona              0
Arizona State University-Tempe     0
Name: RELAFFIL, dtype: int64
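The row-and-column patterns in section 2.2 can be reproduced on a small made-up DataFrame. This is a sketch only: the school labels and values are hypothetical, and each positional selection is paired with its label-based equivalent.

```python
import pandas as pd

# Hypothetical miniature DataFrame; index labels and values are invented.
df = pd.DataFrame(
    {
        "CITY": ["Normal", "Birmingham", "Montgomery"],
        "STABBR": ["AL", "AL", "AL"],
        "HBCU": [1.0, 0.0, 0.0],
        "MENONLY": [0.0, 1.0, 0.0],
    },
    index=pd.Index(["School A", "School B", "School C"], name="INSTNM"),
)

# First 2 rows and first 3 columns.
block_pos = df.iloc[:2, :3]
block_lab = df.loc[:"School B", :"HBCU"]

# All rows of two non-adjacent columns.
cols_pos = df.iloc[:, [0, 3]]
cols_lab = df.loc[:, ["CITY", "MENONLY"]]

# A single scalar value; negative positions count from the end.
scalar_pos = df.iloc[1, -1]
scalar_lab = df.loc["School B", "MENONLY"]

# A reversed row slice of one column, returned as a Series.
rev_pos = df.iloc[2:0:-1, 2]
rev_lab = df.loc["School C":"School B":-1, "HBCU"]

print(block_lab)
print(scalar_lab)
print(rev_lab)
```

Note that the positional slice 2:0:-1 stops before position 0, while the label slice stops at, and includes, 'School B'; both select the same two rows.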

3 Converting between integers and labels

Use the index method get_loc to find the integer position of a given column

col_start = college.columns.get_loc('UGDS_WHITE')
col_end = college.columns.get_loc('UGDS_UNKN') + 1
col_start, col_end
(10, 19)
college.iloc[:5, col_start:col_end]
INSTNM                                 UGDS_WHITE  UGDS_BLACK  ...  UGDS_NRA  UGDS_UNKN
Alabama A & M University               0.0333      0.9353      ...  0.0059    0.0138
University of Alabama at Birmingham    0.5922      0.2600      ...  0.0179    0.0100
Amridge University                     0.2990      0.4192      ...  0.0000    0.2715
University of Alabama in Huntsville    0.6988      0.1255      ...  0.0332    0.0350
Alabama State University               0.0158      0.9208      ...  0.0243    0.0137

5 rows × 9 columns

Get the label names that correspond to integer positions. Because loc includes both endpoints, slicing through college.columns[19] ('PPTUG_EF') returns 10 columns here, one more than the iloc selection above.

row_end = college.index[4]
col_start, col_end = college.columns[10], college.columns[19]
col_start, col_end
('UGDS_WHITE', 'PPTUG_EF')
college.loc[:row_end, col_start:col_end]
INSTNM                                 UGDS_WHITE  UGDS_BLACK  ...  UGDS_UNKN  PPTUG_EF
Alabama A & M University               0.0333      0.9353      ...  0.0138     0.0656
University of Alabama at Birmingham    0.5922      0.2600      ...  0.0100     0.2607
Amridge University                     0.2990      0.4192      ...  0.2715     0.4536
University of Alabama in Huntsville    0.6988      0.1255      ...  0.0350     0.2146
Alabama State University               0.0158      0.9208      ...  0.0137     0.0892

5 rows × 10 columns
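The off-by-one difference between the two approaches is easy to see on a tiny example. The sketch below uses a made-up DataFrame with columns a to e and rows r0 to r2 (all hypothetical) and selects the same block once by position and once by label.

```python
import pandas as pd

# Hypothetical DataFrame with five columns and three rows.
df = pd.DataFrame(
    [[1, 2, 3, 4, 5], [6, 7, 8, 9, 10], [11, 12, 13, 14, 15]],
    columns=list("abcde"),
    index=["r0", "r1", "r2"],
)

# Labels -> integer positions. iloc excludes the stop, so add 1 to keep 'd'.
start = df.columns.get_loc("b")
stop = df.columns.get_loc("d") + 1
by_pos = df.iloc[:2, start:stop]

# Integer positions -> labels. loc includes the stop, so no adjustment is needed.
row_end = df.index[1]
col_first, col_last = df.columns[1], df.columns[3]
by_label = df.loc[:row_end, col_first:col_last]

print(by_pos)
print(by_label)
```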

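4 Slicing

4.1 Lazy row slicing

"Lazy" here, as I understand it, means leaving out loc and iloc and slicing with the indexing operator directly. Lazy slicing cannot be used on columns, only on DataFrame rows and on Series, and it cannot select rows and columns at the same time.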

Selecting from a Series

city = college['CITY']
city[10:20:2]
INSTNM
Birmingham Southern College             Birmingham
Concordia College Alabama               Selma
Enterprise State Community College      Enterprise
Faulkner University                     Montgomery
New Beginning College of Cosmetology    Albertville
Name: CITY, dtype: object

Selecting DataFrame rows

college[10:20:2]
INSTNM                                  CITY         STABBR  ...  MD_EARN_WNE_P10  GRAD_DEBT_MDN_SUPP
Birmingham Southern College             Birmingham   AL      ...  44200            27000
Concordia College Alabama               Selma        AL      ...  19900            PrivacySuppressed
Enterprise State Community College      Enterprise   AL      ...  24600            8273
Faulkner University                     Montgomery   AL      ...  37200            22000
New Beginning College of Cosmetology    Albertville  AL      ...  NaN              5500

5 rows × 26 columns

Both Series and DataFrames can also be sliced by label.

start = 'Mesa Community College'
stop = 'Spokane Community College'
college[start:stop:1500]
INSTNM                                  CITY            STABBR  ...  MD_EARN_WNE_P10  GRAD_DEBT_MDN_SUPP
Mesa Community College                  Mesa            AZ      ...  35200            8000
Hair Academy Inc-New Carrollton         New Carrollton  MD      ...  15200            9666
National College of Natural Medicine    Portland        OR      ...  NaN              PrivacySuppressed

3 rows × 26 columns
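The limits of lazy slicing described in this section can be checked with a small sketch. The DataFrame below is made up for illustration, and because the exact exception raised for the invalid case differs between pandas versions, the code catches a generic Exception.

```python
import pandas as pd

# Hypothetical miniature DataFrame; index labels are invented.
df = pd.DataFrame(
    {"CITY": ["Normal", "Birmingham", "Montgomery", "Huntsville"],
     "STABBR": ["AL", "AL", "AL", "AL"]},
    index=["School A", "School B", "School C", "School D"],
)

rows_by_position = df[1:3]                  # a slice selects rows, not columns
rows_by_label = df["School B":"School C"]   # label slices also select rows

# Rows and columns cannot be selected together without loc or iloc.
try:
    df[:2, "CITY"]
    lazy_rows_and_columns_failed = False
except Exception:  # the exception type varies across pandas versions
    lazy_rows_and_columns_failed = True

print(rows_by_position)
print(rows_by_label)
print(lazy_rows_and_columns_failed)
```

Writing df.iloc[1:3] or df.loc['School B':'School C'] instead makes the intent explicit and leaves room for a column selector.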

4.2 Slicing alphabetically

The index must be sorted first

college = college.sort_index()
college.head()
INSTNM                                                CITY         STABBR  ...  MD_EARN_WNE_P10    GRAD_DEBT_MDN_SUPP
A & W Healthcare Educators                            New Orleans  LA      ...  NaN                19022.5
A T Still University of Health Sciences               Kirksville   MO      ...  219800             PrivacySuppressed
ABC Beauty Academy                                    Garland      TX      ...  NaN                PrivacySuppressed
ABC Beauty College Inc                                Arkadelphia  AR      ...  PrivacySuppressed  16500
AI Miami International University of Art and Design   Miami        FL      ...  29900              31000

5 rows × 26 columns

Select the schools whose names fall alphabetically between 'Sp' and 'Su'

college.loc['Sp':'Su']
INSTNM                                        CITY       STABBR  ...  MD_EARN_WNE_P10    GRAD_DEBT_MDN_SUPP
Spa Tech Institute-Ipswich                    Ipswich    MA      ...  21500              6333
Spa Tech Institute-Plymouth                   Plymouth   MA      ...  21500              6333
...                                           ...        ...     ...  ...                ...
Styles and Profiles Beauty College            Selmer     TN      ...  PrivacySuppressed  PrivacySuppressed
Styletrends Barber and Hairstyling Academy    Rock Hill  SC      ...  PrivacySuppressed  9495.5

201 rows × 26 columns
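To see why the sort is needed, the following sketch slices a small made-up Series with bounds that are not actual labels. The school names are hypothetical. On the unsorted index the slice raises a KeyError; after sort_index it works.

```python
import pandas as pd

# Hypothetical Series with an unsorted index; neither 'Sp' nor 'Su' is a label.
s = pd.Series(
    [1, 2, 3, 4, 5],
    index=["Stanford", "Amherst", "Spelman", "Tufts", "Suffolk"],
)

# Slicing an unsorted index with bounds that are not labels fails.
try:
    s.loc["Sp":"Su"]
    unsorted_ok = True
except KeyError:
    unsorted_ok = False

# After sorting, the bounds are located by their position in sort order.
sorted_s = s.sort_index()
between = sorted_s.loc["Sp":"Su"]

print(unsorted_ok)
print(between)
```

In this sketch 'Suffolk' is left out because it sorts after 'Su', which matches the output above, where the last school returned begins with 'Sty'.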

Reposted from: https://www.cnblogs.com/shiyushiyu/p/9742232.html
