[Machine Learning] Logistic Regression: Benign/Malignant Breast Cancer Tumor Prediction
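- Loss function and optimization of logistic regression
The principle is the same as in linear regression, but because this is a classification problem the loss function is different: logistic regression minimizes the log loss (cross-entropy) rather than the squared error. This loss has no closed-form minimizer (there is no counterpart of the normal equation), so it is solved iteratively, for example by gradient descent. scikit-learn's default solver, lbfgs, is a quasi-Newton method.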
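A minimal NumPy sketch of the idea above, assuming plain batch gradient descent on the average log loss J(w, b) = -(1/m) Σ [y·log p + (1 - y)·log(1 - p)], with p = sigmoid(Xw + b) and labels coded as 0/1. The function names, learning rate and iteration count are illustrative choices, not the author's code. Note that the gradient Xᵀ(p - y)/m has the same shape as in linear regression; only p changes.

```python
import numpy as np


def sigmoid(z):
    """Map any real number to a probability in (0, 1)."""
    return 1.0 / (1.0 + np.exp(-z))


def log_loss(w, b, X, y):
    """Average cross-entropy loss for labels y in {0, 1}."""
    p = sigmoid(X @ w + b)
    eps = 1e-12  # keeps log() away from log(0)
    return -np.mean(y * np.log(p + eps) + (1 - y) * np.log(1 - p + eps))


def fit_gradient_descent(X, y, lr=0.1, n_iter=1000):
    """Batch gradient descent on the log loss (no regularization term)."""
    m, n = X.shape
    w = np.zeros(n)
    b = 0.0
    for _ in range(n_iter):
        p = sigmoid(X @ w + b)
        error = p - y                  # same residual form as linear regression
        w -= lr * (X.T @ error) / m    # step along -dJ/dw
        b -= lr * error.mean()         # step along -dJ/db
    return w, b


# Made-up 1-D data: small x -> class 0, large x -> class 1
X = np.array([[-2.0], [-1.0], [-0.5], [0.5], [1.0], [2.0]])
y = np.array([0, 0, 0, 1, 1, 1])
w, b = fit_gradient_descent(X, y)
print("loss before:", log_loss(np.zeros(1), 0.0, X, y))
print("loss after: ", log_loss(w, b, X, y))
print("predictions:", (sigmoid(X @ w + b) >= 0.5).astype(int))
```

With all weights at zero every prediction is 0.5, so the starting loss is log 2; each iteration moves the weights against the gradient and the loss drops.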
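sklearn logistic regression API
sklearn.linear_model.LogisticRegression(penalty='l2', C=1.0)
Logistic regression classifier
penalty: type of regularization (L2 by default)
C: inverse of the regularization strength; smaller values mean stronger regularization
coef_: regression coefficients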
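A small sketch of the API on made-up data (the toy arrays are illustrative, not from the article). For a binary problem, coef_ has shape (1, n_features), one weight per feature, and intercept_ holds the bias. The sketch also compares a much smaller C to show the stronger L2 penalty shrinking the coefficients.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

# Hypothetical toy data: two features, two well-separated classes
X = np.array([[1.0, 2.0], [2.0, 1.0], [2.0, 3.0],
              [6.0, 5.0], [7.0, 8.0], [8.0, 6.0]])
y = np.array([0, 0, 0, 1, 1, 1])

clf = LogisticRegression(C=1.0)   # penalty defaults to 'l2'
clf.fit(X, y)

print(clf.coef_)                  # shape (1, 2): one coefficient per feature
print(clf.intercept_)             # shape (1,)
print(clf.predict([[1.5, 1.5], [7.5, 7.0]]))
print(clf.predict_proba([[1.5, 1.5]]))   # columns follow clf.classes_

# Smaller C = stronger regularization, so the coefficient vector shrinks
strong = LogisticRegression(C=0.01).fit(X, y)
print(np.linalg.norm(strong.coef_) < np.linalg.norm(clf.coef_))
```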
LogisticRegression case study: benign/malignant breast cancer tumor prediction
Benign/malignant breast tumor data
Download location of the raw data:
https://archive.ics.uci.edu/ml/machine-learning-databases/
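7. Attribute information
   #   Attribute                     Domain
   ------------------------------------------
   1.  Sample code number            ID number
   2.  Clump Thickness               1-10
   3.  Uniformity of Cell Size       1-10
   4.  Uniformity of Cell Shape      1-10
   5.  Marginal Adhesion             1-10
   6.  Single Epithelial Cell Size   1-10
   7.  Bare Nuclei                   1-10
   8.  Bland Chromatin               1-10
   9.  Normal Nucleoli               1-10
   10. Mitoses                       1-10
   11. Class                         (2 = benign, 4 = malignant)
8. Missing attribute values: 16
   16 instances in Groups 1 to 6 contain a single missing
   (i.e. unavailable) attribute value, now denoted by "?".
9. Class distribution:
   Benign: 458 (65.5%)
   Malignant: 241 (34.5%)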
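The figures in items 8 and 9 can be checked once the file is loaded. The sketch below assumes a DataFrame read with the column names used later in this post, with missing values still stored as the string '?'; the helper name summarize is hypothetical. It runs on a tiny made-up frame so that no download is needed; on the full file it should report the 16 incomplete rows and the 458/241 split. The last line checks the quoted percentages.

```python
import pandas as pd


def summarize(data):
    """Hypothetical helper: rows containing '?', class counts and class shares (%)."""
    rows_with_missing = int((data == "?").any(axis=1).sum())
    counts = data["Class"].value_counts().sort_index()   # 2 = benign, 4 = malignant
    shares = (counts / len(data) * 100).round(1)
    return rows_with_missing, counts.to_dict(), shares.to_dict()


# Tiny made-up frame using three of the real column names
toy = pd.DataFrame({
    "Clump Thickness": [5, 3, 8, 1],
    "Bare Nuclei": ["1", "?", "10", "1"],   # '?' forces this column to strings
    "Class": [2, 2, 4, 2],
})
print(summarize(toy))

# The percentages quoted for the real data set: 458 and 241 out of 699 rows
print(round(458 / 699 * 100, 1), round(241 / 699 * 100, 1))
```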
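Here malignant is the positive class and benign the negative class.
A common convention is to make the minority class the positive class, i.e. the class whose probability is being estimated. In scikit-learn, the columns of predict_proba follow the sorted labels in classes_, so with labels 2 and 4 the second column is the probability of class 4 (malignant), which here is also the minority class.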
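To see which column of predict_proba belongs to the positive class, here is a small sketch on made-up two-feature data that reuses this dataset's label coding (2 = benign, 4 = malignant). It looks up the column for label 4 through classes_ rather than hard-coding it, and asks for the recall of the positive class with pos_label=4.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import recall_score

# Made-up data using the dataset's label coding: 2 = benign, 4 = malignant
X = np.array([[1, 1], [2, 1], [1, 2], [2, 2], [8, 9], [9, 8]], dtype=float)
y = np.array([2, 2, 2, 2, 4, 4])   # malignant is the minority class here too

clf = LogisticRegression().fit(X, y)
print(clf.classes_)                 # labels in sorted order: [2 4]

proba = clf.predict_proba(X)
p_malignant = proba[:, list(clf.classes_).index(4)]   # column for label 4
print(np.round(p_malignant, 3))

# Recall of the positive (malignant) class: TP / (TP + FN)
print(recall_score(y, clf.predict(X), pos_label=4))
```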
pd.read_csv('', names=column_names)
column_names: the names to assign to the columns (the raw file has no header row):
['Sample code number', 'Clump Thickness', 'Uniformity of Cell Size', 'Uniformity of Cell Shape', 'Marginal Adhesion', 'Single Epithelial Cell Size', 'Bare Nuclei', 'Bland Chromatin', 'Normal Nucleoli', 'Mitoses', 'Class']
return: the data (a DataFrame)
replace(to_replace='', value=): returns the data with the matching values replaced
dropna(): returns the data with the rows that contain missing values removed
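A runnable sketch of these three pandas calls on an in-memory sample, assuming the raw file's format (comma-separated, no header row, '?' for a missing value). The three rows and their IDs are made up, and io.StringIO stands in for the download URL. The final astype(int) is an addition: a column that contained '?' is read as strings and stays that way after dropna.

```python
import io

import numpy as np
import pandas as pd

column_names = ['Sample code number', 'Clump Thickness', 'Uniformity of Cell Size',
                'Uniformity of Cell Shape', 'Marginal Adhesion',
                'Single Epithelial Cell Size', 'Bare Nuclei', 'Bland Chromatin',
                'Normal Nucleoli', 'Mitoses', 'Class']

# Three made-up rows in the raw file's format; the second one has a '?'
raw = io.StringIO(
    "1,5,1,1,1,2,1,3,1,1,2\n"
    "2,5,4,4,5,7,?,3,2,1,2\n"
    "3,8,10,10,8,7,10,9,7,1,4\n"
)

data = pd.read_csv(raw, names=column_names)        # no header row: supply the names
data = data.replace(to_replace='?', value=np.nan)  # '?' -> NaN
data = data.dropna()                               # drop the row that had '?'
data = data.astype(int)                            # 'Bare Nuclei' was read as strings
print(data.shape)
print(data['Class'].tolist())
```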
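- Benign/malignant breast tumor classification workflow
1. Get the data from the web (tool: pandas)
2. Handle missing values, split into training and test sets, and standardize the features
3. Train and evaluate the LogisticRegression estimator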
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler
from sklearn.metrics import classification_report
import pandas as pd
import numpy as np


def logistic():
    """Binary classification with logistic regression: predict cancer from cell attribute features.
    :return: None
    """
    column = ['Sample code number', 'Clump Thickness', 'Uniformity of Cell Size',
              'Uniformity of Cell Shape', 'Marginal Adhesion',
              'Single Epithelial Cell Size', 'Bare Nuclei', 'Bland Chromatin',
              'Normal Nucleoli', 'Mitoses', 'Class']

    # 1. Read the data; the raw file has no header row, so supply the column names
    data = pd.read_csv(
        "https://archive.ics.uci.edu/ml/machine-learning-databases/breast-cancer-wisconsin/breast-cancer-wisconsin.data",
        names=column)
    print(data)

    # 2. Missing values are marked '?': replace them with NaN and drop those rows
    data = data.replace(to_replace='?', value=np.nan)
    data = data.dropna()
    # A column that contained '?' is parsed as strings; convert everything back to integers
    data = data.astype(int)

    # Features are columns 1-9 (the sample ID is skipped); the label is 'Class'.
    # No random_state is set, so the split and the scores vary from run to run.
    x_train, x_test, y_train, y_test = train_test_split(
        data[column[1:10]], data[column[10]], test_size=0.25)

    # Standardize: fit the scaler on the training set, reuse it on the test set
    std = StandardScaler()
    x_train = std.fit_transform(x_train)
    x_test = std.transform(x_test)

    # 3. Fit the LogisticRegression estimator and evaluate it
    lg = LogisticRegression(C=1.0)
    lg.fit(x_train, y_train)
    print(lg.coef_)

    y_predict = lg.predict(x_test)
    print("Accuracy:", lg.score(x_test, y_test))
    # The report lists precision, recall, f1-score and support for each class
    print("Classification report:", classification_report(
        y_test, y_predict, labels=[2, 4], target_names=["Benign", "Malignant"]))
    return None


if __name__ == "__main__":
    logistic()
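The split/scale/fit steps above can also be chained in a scikit-learn Pipeline, which guarantees that the scaler only ever sees the training split. This is a sketch, not the author's code: the function name evaluate, the fixed random_state and the stratify option are additions for illustration, and the synthetic frame at the bottom (made-up integer values in the dataset's 1-10 range) only stands in for the downloaded, cleaned file so the example runs offline.

```python
import numpy as np
import pandas as pd
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import classification_report
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

FEATURES = ['Clump Thickness', 'Uniformity of Cell Size', 'Uniformity of Cell Shape',
            'Marginal Adhesion', 'Single Epithelial Cell Size', 'Bare Nuclei',
            'Bland Chromatin', 'Normal Nucleoli', 'Mitoses']


def evaluate(data, random_state=0):
    """Hypothetical helper: split, scale, fit and score a cleaned DataFrame (no '?')."""
    x_train, x_test, y_train, y_test = train_test_split(
        data[FEATURES], data['Class'], test_size=0.25,
        random_state=random_state,     # fixed split, reproducible scores
        stratify=data['Class'])        # keep the 2/4 ratio in both splits
    # The pipeline fits StandardScaler on x_train only and reuses it on x_test
    model = make_pipeline(StandardScaler(), LogisticRegression(C=1.0))
    model.fit(x_train, y_train)
    y_predict = model.predict(x_test)
    print(classification_report(y_test, y_predict, labels=[2, 4],
                                target_names=["Benign", "Malignant"]))
    return model.score(x_test, y_test)


# Synthetic stand-in: benign rows score 1-3, malignant rows 6-10 on every feature
rng = np.random.default_rng(0)
benign = rng.integers(1, 4, size=(60, len(FEATURES)))
malignant = rng.integers(6, 11, size=(30, len(FEATURES)))
data = pd.DataFrame(np.vstack([benign, malignant]), columns=FEATURES)
data['Class'] = [2] * 60 + [4] * 30
print("Accuracy:", evaluate(data))
```

On the real file, evaluate would be called with the DataFrame produced by the replace/dropna/astype steps; because the synthetic classes are perfectly separated, the toy run simply scores 1.0.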
Sample output (the post shows only the last line of the DataFrame printout):
[699 rows x 11 columns]
[[1.35467578 0.18001121 0.74721681 0.89447017 0.38691172 1.26415265
  0.95382046 0.53218847 0.51240579]]
Accuracy: 0.9707602339181286
Classification report:               precision    recall  f1-score   support

      Benign       0.97      0.98      0.98       112
   Malignant       0.97      0.95      0.96        59

    accuracy                           0.97       171
   macro avg       0.97      0.97      0.97       171
weighted avg       0.97      0.97      0.97       171
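Each row of the report follows from four confusion-matrix counts. As a worked check, the counts below were back-solved from the printed figures (171 test rows, accuracy 0.9707602339181286 = 166/171, supports 112 and 59). They reproduce the rounded report, but they are an inference, since the post does not print the confusion matrix. The test size itself follows from the data: 699 - 16 = 683 complete rows, and train_test_split rounds 0.25 × 683 = 170.75 up to 171.

```python
import math


def report_row(tp, fp, fn):
    """Precision, recall and F1 for one class, rounded like classification_report."""
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    f1 = 2 * precision * recall / (precision + recall)
    return round(precision, 2), round(recall, 2), round(f1, 2)


# Counts inferred from the printed report (malignant = positive class)
tp, fn = 56, 3     # malignant test rows: 56 caught, 3 missed   (support 59)
tn, fp = 110, 2    # benign test rows: 110 correct, 2 flagged   (support 112)

print("Malignant:", report_row(tp, fp, fn))
print("Benign:   ", report_row(tn, fn, fp))   # roles swap for the other class
print("Accuracy: ", (tp + tn) / (tp + tn + fp + fn))
print("Test rows:", math.ceil(0.25 * (699 - 16)))
```

Read this way, the malignant recall of 0.95 means roughly 3 of the 59 malignant test tumors were classified as benign, which is the error that matters most in this task.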