Titanic (Titanic Survival Prediction) --- (1)
I'm a beginner, so if you spot a problem, please point it out. Let's keep at it and improve together!
About the data and code:
Reading the data
    import pandas as pd

    train_df = pd.read_csv('data/泰坦尼克号生存率/train.csv')
    test_df = pd.read_csv('data/泰坦尼克号生存率/test.csv')
    combine = [train_df, test_df]

    # Feature names and the first five samples
    print(train_df.columns.values)
    train_df.head()

    # Check both datasets for missing values
    train_df.info()
    print('_'*50)
    test_df.info()

out:

    <class 'pandas.core.frame.DataFrame'>
    RangeIndex: 891 entries, 0 to 890
    Data columns (total 12 columns):
    PassengerId    891 non-null int64
    Survived       891 non-null int64
    Pclass         891 non-null int64
    Name           891 non-null object
    Sex            891 non-null object
    Age            714 non-null float64
    SibSp          891 non-null int64
    Parch          891 non-null int64
    Ticket         891 non-null object
    Fare           891 non-null float64
    Cabin          204 non-null object
    Embarked       889 non-null object
    dtypes: float64(2), int64(5), object(5)
    memory usage: 66.2+ KB
    __________________________________________________
    <class 'pandas.core.frame.DataFrame'>
    RangeIndex: 418 entries, 0 to 417
    Data columns (total 11 columns):
    PassengerId    418 non-null int64
    Pclass         418 non-null int64
    Name           418 non-null object
    Sex            418 non-null object
    Age            332 non-null float64
    SibSp          418 non-null int64
    Parch          418 non-null int64
    Ticket         418 non-null object
    Fare           417 non-null float64
    Cabin          91 non-null object
    Embarked       418 non-null object
    dtypes: float64(2), int64(4), object(5)

Conclusions:
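Missing data:
Training data: Cabin is missing for most rows (only 204 of 891 are present), Age is partly missing (714 present), and Embarked is missing in a few rows (889 present).
Test data: Cabin has more missing values than Age (91 vs. 332 of 418 present), and Fare is missing one value.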
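The counts above were read off the `info()` output by hand. As a minimal sketch, the same numbers can be obtained directly with `isnull().sum()`. The small `toy_df` frame below is made up for illustration and only stands in for `train_df`; its values are not taken from the real dataset.

```python
import pandas as pd

# Hypothetical stand-in for train_df; the values are illustrative only
toy_df = pd.DataFrame({
    'Age': [22.0, None, 26.0, 35.0],
    'Cabin': [None, 'C85', None, None],
    'Embarked': ['S', 'C', None, 'S'],
    'Fare': [7.25, 71.28, 7.92, 53.10],
})

# Number of missing values per column, largest first
missing_counts = toy_df.isnull().sum().sort_values(ascending=False)
print(missing_counts)
```

Running the same two lines on `train_df` and `test_df` should rank the columns by how much data they are missing.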
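Data types:
Training data: 7 numeric columns + 5 object (string) columns
Test data: 6 numeric columns + 5 object (string) columns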
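The numeric/object split can also be counted programmatically rather than from the `dtypes` summary line. This is a small sketch using `select_dtypes`; the frame `toy_df` and its three columns are hypothetical and chosen only to keep the example self-contained.

```python
import pandas as pd

# Hypothetical stand-in for train_df with two numeric columns and one string column
toy_df = pd.DataFrame({
    'PassengerId': [1, 2, 3],
    'Age': [22.0, None, 26.0],
    'Name': ['A', 'B', 'C'],
})

# Split the columns into numeric (int/float) and everything else
numeric_cols = toy_df.select_dtypes(include='number').columns.tolist()
other_cols = toy_df.select_dtypes(exclude='number').columns.tolist()
print(len(numeric_cols), '+', len(other_cols))
```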
Handling the missing data
Methods for handling missing data
Start with Embarked, which has the fewest missing values:
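The post does not show how Embarked was filled. One common option, given that only two training rows are missing, is to fill them with the most frequent port. The sketch below assumes that approach; `toy_train`, `toy_test`, `combine_toy`, and `most_common_port` are hypothetical names and the data is made up.

```python
import pandas as pd

# Hypothetical stand-ins for train_df and test_df
toy_train = pd.DataFrame({'Embarked': ['S', 'C', None, 'S', 'Q', None]})
toy_test = pd.DataFrame({'Embarked': ['S', 'Q', 'C']})
combine_toy = [toy_train, toy_test]

# Most frequent port in the training data (mode() ignores missing values)
most_common_port = toy_train['Embarked'].mode()[0]

# Fill the gaps in both datasets with that value
for dataset in combine_toy:
    dataset['Embarked'] = dataset['Embarked'].fillna(most_common_port)
```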
Age is filled in using mean imputation
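    # Mean age, computed here from the training set
    age_mean = train_df['Age'].mean()
    age_mean

    # Fill the missing ages in both datasets with that mean
    for dataset in combine:
        dataset['Age'] = dataset['Age'].fillna(age_mean)

    train_df.info()

    # Survival rate for each age value, highest first
    train_df[['Age', 'Survived']].groupby(['Age'], as_index=False).mean().sort_values(by='Survived', ascending=False)

Cabin can simply be dropped: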
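- Too many values are missing (only 204 of the 891 training rows have a Cabin value)
- The feature has little correlation with the survival rate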
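Dropping the column could look like the sketch below. The frames `toy_train` and `toy_test` are hypothetical stand-ins for `train_df` and `test_df`. Because `drop` returns a new DataFrame, the list of datasets is rebuilt afterwards so that it does not keep pointing at the old frames.

```python
import pandas as pd

# Hypothetical stand-ins for train_df and test_df
toy_train = pd.DataFrame({'Survived': [0, 1], 'Age': [22.0, 38.0], 'Cabin': [None, 'C85']})
toy_test = pd.DataFrame({'Age': [34.5], 'Cabin': [None]})

# Remove the Cabin column from both datasets
toy_train = toy_train.drop(columns=['Cabin'])
toy_test = toy_test.drop(columns=['Cabin'])

# Rebuild the list so later loops operate on the frames without Cabin
combine_toy = [toy_train, toy_test]
```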
Normalizing the data
For