Machine Learning: Data Science Libraries (Day 4)

Published: 2025/3/21


23. Getting to know the pandas Series

Why learn pandas

numpy helps us work with numeric data. pandas is built on numpy, so it handles numeric data too, and it can also handle other types of data.

Common pandas data types

Series: a one-dimensional labeled array

DataFrame: a two-dimensional container of Series
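To make "a container of Series" concrete, the sketch below selects one column from a small DataFrame and checks its type. The sample values are assumptions chosen for illustration.

```python
import pandas as pd

# Illustrative sample data (assumed values)
df = pd.DataFrame({"name": ["xiaoming", "xiaogang"], "age": [20, 32]})

# Selecting a single column returns a Series
age = df["age"]
print(type(df).__name__)
print(type(age).__name__)

# The column carries the same row labels as the DataFrame
print(age.index.tolist())
```

Each column is a Series that shares the DataFrame's row index, which is why operations on rows stay aligned across columns.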

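Creating a Series

    import pandas as pd

    # From a list; the index defaults to 0..n-1
    t1 = pd.Series([1, 2, 15, 3, 4])
    print(t1)

    # From a list with an explicit index
    t2 = pd.Series([1, 2, 15, 3, 4], index=list("abcde"))
    print(t2)

    # From a dict; the keys become the index
    # (named temp_dict to avoid shadowing the built-in dict)
    temp_dict = {"name": "xiaohong", "age": 30, "tel": 10086}
    t3 = pd.Series(temp_dict)
    print(t3)

Output:

    0     1
    1     2
    2    15
    3     3
    4     4
    dtype: int64
    a     1
    b     2
    c    15
    d     3
    e     4
    dtype: int64
    name    xiaohong
    age           30
    tel        10086
    dtype: object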

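Slicing and indexing a Series

    print(t3["age"])         # select by label
    # Select by position with iloc; plain t3[0] on a string-labeled
    # Series is deprecated in recent pandas versions
    print(t3.iloc[0])
    print(t3.iloc[[0, 2]])   # several positions at once

Output:

    30
    xiaohong
    name    xiaohong
    tel        10086
    dtype: object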

The index and values of a Series

    print(t3.index)
    print(t3.values)
    print(type(t3.values))

Output:

    Index(['name', 'age', 'tel'], dtype='object')
    ['xiaohong' 30 10086]
    <class 'numpy.ndarray'>

24. Reading external data with pandas

    import pandas as pd

    # Read a CSV file with pandas
    df = pd.read_csv("")
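As a runnable sketch of `pd.read_csv`, the example below reads from an in-memory string instead of a file on disk. The CSV text is an assumption standing in for a real file; with a real file you would pass its path instead of the `StringIO` object.

```python
import io

import pandas as pd

# In-memory CSV text standing in for a file (assumed sample content)
csv_text = "Row_Labels,Count_AnimalName\nBELLA,1195\nMAX,1153\n"

# read_csv accepts a path or any file-like object
df = pd.read_csv(io.StringIO(csv_text))
print(df)
print(df.shape)
```

The first line of the text is used as the column names, and each later line becomes one row.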

25. Creating a pandas DataFrame

    import pandas as pd
    import numpy as np

    # Build a DataFrame from a 3x4 numpy array
    t = pd.DataFrame(np.arange(12).reshape(3, 4))
    print(t)

Output:

       0  1   2   3
    0  0  1   2   3
    1  4  5   6   7
    2  8  9  10  11

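A DataFrame has both a row index and a column index.

Row index: labels the different rows. It is called index and corresponds to axis 0 (axis=0), the direction that runs down the table.

Column index: labels the different columns. It is called columns and corresponds to axis 1 (axis=1), the direction that runs across the table.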
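The axis convention is easiest to see with an aggregation. The sketch below sums the same 3x4 frame along each axis; the variable names `col_sums` and `row_sums` are chosen here for illustration.

```python
import numpy as np
import pandas as pd

t = pd.DataFrame(np.arange(12).reshape(3, 4),
                 index=list("abc"), columns=list("wxyz"))

# axis=0 moves down the rows, giving one result per column
col_sums = t.sum(axis=0)
print(col_sums)

# axis=1 moves across the columns, giving one result per row
row_sums = t.sum(axis=1)
print(row_sums)
```

A useful way to remember it: the axis you pass is the one that gets collapsed, so the result is labeled by the other one.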

    # Set the row index and column index explicitly
    t2 = pd.DataFrame(np.arange(12).reshape(3, 4),
                      index=list("abc"), columns=list("wxyz"))
    print(t2)

Output:

       w  x   y   z
    a  0  1   2   3
    b  4  5   6   7
    c  8  9  10  11

A DataFrame can also be created from a dict of lists or from a list of dicts:

    import pandas as pd
    import numpy as np

    # Dict of lists: each key becomes a column
    d1 = {"name": ["xiaoming", "xiaogang"], "age": [20, 32], "tel": [10086, 10010]}
    t = pd.DataFrame(d1)
    print(t)
    print(type(t))

    # List of dicts: each dict becomes a row
    d2 = [{"name": "xiaogong", "age": 32, "tel": 10010},
          {"name": "xiaoping", "age": 18, "tel": 10880},
          {"name": "fjj", "age": 13, "tel": 10210}]
    t2 = pd.DataFrame(d2)
    print(t2)
    print(type(t2))

Output:

           name  age    tel
    0  xiaoming   20  10086
    1  xiaogang   32  10010
    <class 'pandas.core.frame.DataFrame'>
           name  age    tel
    0  xiaogong   32  10010
    1  xiaoping   18  10880
    2       fjj   13  10210
    <class 'pandas.core.frame.DataFrame'>

26. DataFrame descriptive information

Building a DataFrame from MongoDB documents:

    import pandas as pd
    import numpy as np
    from pymongo import MongoClient

    client = MongoClient()
    collection = client["douban"]["tv1"]
    data = collection.find()

    # Keep only the fields we need from each document
    data_list = []
    for i in data:
        temp = {}
        temp["info"] = i["info"]
        temp["rating_count"] = i["rating"]["count"]
        temp["rating_value"] = i["rating"]["value"]
        temp["title"] = i["title"]
        temp["country"] = i["country"]
        temp["directors"] = i["directors"]
        temp["actors"] = i["actors"]
        data_list.append(temp)

    df = pd.DataFrame(data_list)
    print(df)

Inspecting and sorting a DataFrame read from a CSV file:

    import pandas as pd

    df = pd.read_csv("/Users/zhucan/Desktop/dogNames2.csv")
    print(df.head())
    print(df.info())

    # Sort the DataFrame by a column
    df = df.sort_values(by="Count_AnimalName", ascending=True)
    print(df)

Output:

      Row_Labels  Count_AnimalName
    0          1                 1
    1          2                 2
    2      40804                 1
    3      90201                 1
    4      90203                 1
    <class 'pandas.core.frame.DataFrame'>
    RangeIndex: 16220 entries, 0 to 16219
    Data columns (total 2 columns):
     #   Column            Non-Null Count  Dtype
    ---  ------            --------------  -----
     0   Row_Labels        16217 non-null  object
     1   Count_AnimalName  16220 non-null  int64
    dtypes: int64(1), object(1)
    memory usage: 253.6+ KB
    None
          Row_Labels  Count_AnimalName
    0              1                 1
    9383      MERINO                 1
    9384      MERISE                 1
    9386    MERLEDEZ                 1
    9389      MERLYN                 1
    ...          ...               ...
    12368      ROCKY               823
    3251        COCO               852
    2660     CHARLIE               856
    9140         MAX              1153
    1156       BELLA              1195

    [16220 rows x 2 columns]
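Besides `head()` and `info()`, a few other attributes and methods are commonly used to describe a DataFrame. The sketch below runs them on a small in-memory frame; the four rows are assumed sample data, not the full CSV.

```python
import pandas as pd

# Small assumed sample standing in for the dog-names file
df = pd.DataFrame({
    "Row_Labels": ["BELLA", "MAX", "CHARLIE", "COCO"],
    "Count_AnimalName": [1195, 1153, 856, 852],
})

print(df.shape)       # (rows, columns)
print(df.dtypes)      # data type of each column
print(df.describe())  # count, mean, std, min, quartiles, max of numeric columns

# Sort descending and keep the two most common names
top = df.sort_values(by="Count_AnimalName", ascending=False).head(2)
print(top)
```

`sort_values` returns a new DataFrame, so the result has to be assigned (as above) or the call has to use `inplace=True`.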

27. DataFrame indexing

Selecting rows or columns

    import pandas as pd

    df = pd.read_csv("/Users/zhucan/Desktop/dogNames2.csv")
    # Sort the DataFrame
    df = df.sort_values(by="Count_AnimalName", ascending=False)

    # Things to note when selecting rows or columns:
    # - a slice inside square brackets selects rows
    # - a string inside square brackets selects a column by its label
    print(df[:20])
    print(df[:20]["Row_Labels"])

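loc and iloc

    import pandas as pd
    import numpy as np

    t3 = pd.DataFrame(np.arange(12).reshape(3, 4),
                      index=list("abc"), columns=list("WXYZ"))
    print(t3)
    print(t3.loc["a", "Z"])      # one value, by row label and column label
    print(t3.loc["a"])           # one row, by label
    print(t3.loc[["a", "b"]])    # several rows, by label
    print(t3.loc[:, "W"])        # one column, by label
    print(t3.iloc[1])            # one row, by position

Output of print(t3):

       W  X   Y   Z
    a  0  1   2   3
    b  4  5   6   7
    c  8  9  10  11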
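One difference between the two accessors is worth a short sketch: label slices with `loc` include the end label, while position slices with `iloc` exclude the end position. The names `by_label` and `by_pos` are chosen here for illustration.

```python
import numpy as np
import pandas as pd

t3 = pd.DataFrame(np.arange(12).reshape(3, 4),
                  index=list("abc"), columns=list("WXYZ"))

# loc slices by label and includes both endpoints
by_label = t3.loc["a":"b", "W":"Y"]

# iloc slices by position and excludes the end, like Python lists
by_pos = t3.iloc[0:2, 0:3]

print(by_label)
print(by_pos)

# loc can also be used to assign a value
t3.loc["c", "Z"] = 100
print(t3)
```

Both selections return the same 2x3 block here, even though the slice bounds are written differently.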

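28. Boolean indexing and handling missing data

Boolean indexing

    import pandas as pd

    df = pd.read_csv("/Users/zhucan/Desktop/dogNames2.csv")
    # Combine conditions with & (and) or | (or); wrap each condition in parentheses
    print(df[(800 < df["Count_AnimalName"]) & (df["Count_AnimalName"] < 1000)])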

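String methods

String methods are reached through the .str accessor of a Series. For example, this builds a boolean Series that is True where the name has more than 4 characters:

    df["Row_Labels"].str.len() > 4

And this splits each string on "/" and returns the result as a list of lists:

    df["info"].str.split("/").tolist()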
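The two expressions above can be tried on small in-memory Series. In this sketch the names and the `info` strings are assumed sample values, since the real columns come from external data.

```python
import pandas as pd

# Assumed sample values
names = pd.Series(["BELLA", "MAX", "CHARLIE"])
info = pd.Series(["USA/Drama/45min", "UK/Comedy/30min"])

# A boolean Series from .str.len() can be used directly as a filter
long_names = names[names.str.len() > 4]
print(long_names)

# .str.split returns a Series of lists; tolist() turns it into nested lists
parts = info.str.split("/").tolist()
print(parts)
```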

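Handling missing data

Missing data usually comes in two forms:

One is an empty value, None and so on, which pandas represents as NaN (the same as np.nan).

The other is a value that was recorded as 0 to stand in for missing data.

Checking for NaN: pd.isnull(df) returns True where the value is NaN and False where it is not.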
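A minimal sketch of the NaN check, using an assumed two-column frame with one missing value per column:

```python
import numpy as np
import pandas as pd

# Assumed sample data with missing values
t = pd.DataFrame({"age": [20, np.nan, 13],
                  "tel": [10086, 10010, np.nan]})

# Element-wise mask: True where the value is NaN
mask = pd.isnull(t)
print(mask)

# Summing the mask counts the NaN values in each column
print(mask.sum())

# Keep only the rows that contain at least one NaN
rows_with_nan = t[mask.any(axis=1)]
print(rows_with_nan)
```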

Approach 1: drop the rows or columns that contain NaN

    t3.dropna(axis=0, how='any')                 drops rows that contain any NaN
    t3.dropna(axis=0, how='all')                 drops rows that are entirely NaN
    t3.dropna(axis=0, how='any', inplace=True)   modifies t3 in place

Approach 2: fill in the missing values (t.mean() and t.median() assume numeric columns)

    t.fillna(t.mean()), t.fillna(t.median()), t.fillna(0)
    t2["age"] = t2["age"].fillna(t2["age"].mean())
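The two approaches can be compared side by side on a small frame. This is a sketch on assumed sample data; all columns are numeric so that `t.mean()` is well defined.

```python
import numpy as np
import pandas as pd

# Assumed sample data: rows 1 and 2 each have one missing value
t = pd.DataFrame({"age": [20, np.nan, 14],
                  "tel": [10086, 10010, np.nan]})

# Approach 1: drop rows
dropped_any = t.dropna(axis=0, how="any")  # keeps only complete rows
dropped_all = t.dropna(axis=0, how="all")  # drops only rows that are all NaN

# Approach 2: fill each column's NaN with that column's mean
filled = t.fillna(t.mean())

print(dropped_any)
print(dropped_all)
print(filled)
```

Without `inplace=True`, both `dropna` and `fillna` return a new object and leave `t` unchanged.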

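Notes:

To treat zeros as missing data: t[t==0]=np.nan

Of course, not every 0 needs to be handled this way.

When computing the mean and similar statistics, NaN values are left out of the calculation, but zeros are included.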
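The effect on the mean can be checked directly. The sketch below uses three assumed scores, one of which is a 0 that really means "missing".

```python
import numpy as np
import pandas as pd

# Assumed sample data: the 0.0 stands for a missing score
t = pd.DataFrame({"score": [0.0, 8.0, 10.0]})

# The zero is counted: (0 + 8 + 10) / 3
mean_with_zero = t["score"].mean()
print(mean_with_zero)

# Replace zeros with NaN, then NaN is skipped: (8 + 10) / 2
t[t == 0] = np.nan
mean_without_zero = t["score"].mean()
print(mean_without_zero)
```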

29. Common pandas statistical methods

Suppose we have data on the 1000 most popular movies from 2006 to 2016. How do we get information such as the average rating and the number of directors?

    import numpy as np
    import pandas as pd

    file_path = "/Users/zhucan/Desktop/datasets_IMDB-Movie-Data.csv"
    df = pd.read_csv(file_path)
    print(df.info())
    print(df.head(1))

    # Average rating
    print(df["Rating"].mean())

    # Number of distinct directors
    print(len(set(df["Director"].tolist())))

    # Number of distinct actors: split each cell into names, flatten, de-duplicate
    temp_actors_list = df["Actors"].str.split(", ").tolist()
    actor_list = [i for j in temp_actors_list for i in j]
    print(len(set(actor_list)))
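The same counts can also be obtained with built-in pandas methods instead of Python sets. The sketch below uses `nunique()` and `explode()` on a tiny assumed frame that borrows the column names from the movie file; the values are placeholders.

```python
import pandas as pd

# Assumed sample rows; column names follow the movie dataset
df = pd.DataFrame({
    "Rating": [8.0, 7.0, 6.0],
    "Director": ["A", "B", "A"],
    "Actors": ["X, Y", "Y, Z", "X"],
})

mean_rating = df["Rating"].mean()

# nunique() counts distinct values in a column
n_directors = df["Director"].nunique()

# split gives a list per row; explode puts each list item on its own row
actors = df["Actors"].str.split(", ").explode()
n_actors = actors.nunique()

print(mean_rating, n_directors, n_actors)
```

This is an alternative to the `set`-based approach, not a replacement for it; both give the same counts.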

30. Histogram of the movie data

For this movie dataset, how should we present the data if we want to see how rating and runtime are distributed?

    import numpy as np
    import pandas as pd
    from matplotlib import pyplot as plt

    file_path = "/Users/zhucan/Desktop/datasets_IMDB-Movie-Data.csv"
    df = pd.read_csv(file_path)
    print(df.info())
    print(df.head(1))

    # Distribution of runtime
    runtime_data = df["Runtime (Minutes)"].values
    max_runtime = runtime_data.max()
    min_runtime = runtime_data.min()

    # Number of bins, using a bin width of 5 minutes
    num_bin = (max_runtime - min_runtime) // 5

    plt.figure(figsize=(20, 8), dpi=80)
    plt.hist(runtime_data, num_bin)
    plt.xticks(range(min_runtime, max_runtime + 5, 5))
    plt.show()
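The bin calculation can be checked without plotting. This sketch uses `np.histogram` on a few assumed runtimes to show which bin edges the formula produces.

```python
import numpy as np

# Assumed sample runtimes in minutes
runtime = np.array([66, 90, 101, 118, 120, 191])

bin_width = 5
# Same formula as in the plotting code: range divided by bin width
num_bin = (runtime.max() - runtime.min()) // bin_width

# np.histogram returns the count in each bin and the bin edges
counts, edges = np.histogram(runtime, bins=num_bin)
print(num_bin)
print(edges)
print(counts)
```

Here the range (125) is an exact multiple of the bin width, so every bin is exactly 5 minutes wide and the edges line up with ticks placed every 5 minutes. When the range is not a multiple of the width, the integer division rounds the bin count down and the bins come out slightly wider than 5, so the bars no longer align with the ticks.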
