[sklearn Learning] Naive Bayes
Naive Bayes is a supervised learning algorithm that directly models the probabilistic relationship between labels and features.
Classification principle: starting from an object's prior probability, apply Bayes' theorem to compute its posterior probability, i.e. the probability that the object belongs to each class, and assign the object to the class with the largest posterior probability.
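In symbols, with class label $c$ and feature vector $x$, the rule just described is:

```latex
\hat{y} \;=\; \arg\max_{c}\; P(c \mid x)
       \;=\; \arg\max_{c}\; \frac{P(x \mid c)\,P(c)}{P(x)}
       \;=\; \arg\max_{c}\; P(x \mid c)\,P(c)
```

The evidence $P(x)$ is the same for every class, so it can be dropped from the maximization.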
```python
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt  # needed by the plotting helpers below
from sklearn.datasets import load_breast_cancer, load_wine
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import GaussianNB, MultinomialNB, BernoulliNB, ComplementNB

cancer = load_breast_cancer()
data_train, data_test, target_train, target_test = train_test_split(
    cancer.data, cancer.target, test_size=0.2, random_state=0)
```

Naive Bayes
Naive Bayes assumes that, given the class, the features used for classification are conditionally independent.
The parameters learned by naive Bayes are the class prior probabilities and the feature conditional probabilities; both are usually estimated by maximum likelihood.
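As a minimal sketch (with made-up labels), the maximum-likelihood estimate of a class prior is simply that class's relative frequency in the training labels:

```python
import numpy as np

# Toy label vector: the maximum-likelihood estimate of each class
# prior is that class's relative frequency among the training labels.
y = np.array([0, 0, 0, 1, 1, 0, 1, 0])
classes, counts = np.unique(y, return_counts=True)
priors = counts / counts.sum()
print(dict(zip(classes.tolist(), priors.tolist())))  # {0: 0.625, 1: 0.375}
```

This matches what the sklearn estimators compute when no explicit prior is supplied.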
Gaussian Naive Bayes
Assumes that the conditional distribution of each feature given the class is Gaussian.
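Concretely, GaussianNB models each feature $x_i$ within class $y$ as:

```latex
P(x_i \mid y) \;=\; \frac{1}{\sqrt{2\pi\sigma_y^2}}
\exp\!\left(-\frac{(x_i - \mu_y)^2}{2\sigma_y^2}\right)
```

where the mean $\mu_y$ and variance $\sigma_y^2$ are estimated from the training samples of class $y$.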
sklearn.naive_bayes.GaussianNB
class sklearn.naive_bayes.GaussianNB(*, priors=None, var_smoothing=1e-09)
priors: class prior probabilities; if not specified, they are estimated from the training data
var_smoothing: float, default 1e-9; this fraction of the largest feature variance is added to all variances for numerical stability
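A minimal sketch on made-up 1-D data: after fitting, the `theta_` attribute holds the per-class feature means that parameterize each class's Gaussian:

```python
import numpy as np
from sklearn.naive_bayes import GaussianNB

# Two well-separated 1-D classes; GaussianNB fits one Gaussian per
# (class, feature) pair and stores the estimated means in theta_.
X = np.array([[1.0], [2.0], [3.0], [10.0], [11.0], [12.0]])
y = np.array([0, 0, 0, 1, 1, 1])

model = GaussianNB()
model.fit(X, y)
print(model.theta_)            # per-class feature means: [[2.], [11.]]
print(model.predict([[4.0]]))  # 4.0 is far closer to class 0's Gaussian
```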
```python
GaussianNB_model = GaussianNB()
GaussianNB_model.fit(data_train, target_train)
train_score = GaussianNB_model.score(data_train, target_train)
print('train score:', train_score)
test_score = GaussianNB_model.score(data_test, target_test)
print('test score:', test_score)
```

Multinomial Naive Bayes
Assumes that the conditional distribution of the features given the class is multinomial.
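MultinomialNB estimates the probability of feature $i$ under class $y$ from counts, with additive smoothing controlled by the alpha parameter:

```latex
\hat{\theta}_{yi} \;=\; \frac{N_{yi} + \alpha}{N_y + \alpha n}
```

where $N_{yi}$ is the total count of feature $i$ in class $y$, $N_y = \sum_i N_{yi}$, and $n$ is the number of features.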
sklearn.naive_bayes.MultinomialNB
class sklearn.naive_bayes.MultinomialNB(*, alpha=1.0, fit_prior=True, class_prior=None)
alpha: float, the additive smoothing parameter α
fit_prior: bool; if True, learn class prior probabilities from the data; if False, a uniform prior is used
class_prior: array, prior probability of each class
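A minimal sketch with made-up word counts, showing how alpha keeps an unseen (class, feature) pair from receiving zero probability:

```python
import numpy as np
from sklearn.naive_bayes import MultinomialNB

# Toy count matrix; feature 2 never occurs in class 0.
X = np.array([[2, 1, 0],
              [3, 0, 0],
              [0, 1, 4],
              [1, 0, 3]])
y = np.array([0, 0, 1, 1])

model = MultinomialNB(alpha=1.0)
model.fit(X, y)

# Smoothed estimate for (class 0, feature 2): (0 + 1) / (6 + 1 * 3) = 1/9
probs = np.exp(model.feature_log_prob_)
print(probs[0])  # all entries strictly positive thanks to alpha
```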
```python
MultinomialNB_model = MultinomialNB()
MultinomialNB_model.fit(data_train, target_train)
train_score = MultinomialNB_model.score(data_train, target_train)
print('train score:', train_score)
test_score = MultinomialNB_model.score(data_test, target_test)
print('test score:', test_score)
```

Plot the curve of accuracy against alpha
```python
def test_MultinomialNB_alpha(*data):
    """Test how the alpha parameter affects MultinomialNB's accuracy.

    :param data: a tuple of (training set, test set, training labels, test labels)
    :return: None
    """
    X_train, X_test, y_train, y_test = data
    alphas = np.logspace(-2, 5, num=200)
    train_scores = []
    test_scores = []
    for alpha in alphas:
        cls = MultinomialNB(alpha=alpha)
        cls.fit(X_train, y_train)
        train_scores.append(cls.score(X_train, y_train))
        test_scores.append(cls.score(X_test, y_test))
    # plot
    fig = plt.figure()
    ax = fig.add_subplot(1, 1, 1)
    ax.plot(alphas, train_scores, label="Training Score")
    ax.plot(alphas, test_scores, label="Testing Score")
    ax.set_xlabel(r"$\alpha$")
    ax.set_ylabel("score")
    ax.set_ylim(0, 1.0)
    ax.set_title("MultinomialNB")
    ax.set_xscale("log")
    ax.legend(loc="best")
    plt.show()
```

Bernoulli Naive Bayes
Assumes that the conditional distribution of each feature given the class is Bernoulli (binary).
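Under this assumption, each binary feature $x_i \in \{0, 1\}$ contributes:

```latex
P(x_i \mid y) \;=\; P(i \mid y)\,x_i \;+\; \bigl(1 - P(i \mid y)\bigr)\,(1 - x_i)
```

so, unlike the multinomial model, the absence of a feature is modeled explicitly rather than ignored.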
sklearn.naive_bayes.BernoulliNB
class sklearn.naive_bayes.BernoulliNB(*, alpha=1.0, binarize=0.0, fit_prior=True, class_prior=None)
alpha: float, the smoothing parameter α, default 1.0
binarize: threshold for binarizing the features
Since BernoulliNB models binary features, the data must be binarized first; the binarize threshold does this automatically.
```python
BernoulliNB_model = BernoulliNB()
BernoulliNB_model.fit(data_train, target_train)
train_score = BernoulliNB_model.score(data_train, target_train)
print('train score:', train_score)
test_score = BernoulliNB_model.score(data_test, target_test)
print('test score:', test_score)
```

Test how BernoulliNB's accuracy varies with the binarize parameter
As a rule of thumb, binarize can be set to (minimum over all features + maximum over all features) / 2.
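A sketch of this rule of thumb on the breast-cancer split used throughout this post (self-contained here so it runs on its own):

```python
import numpy as np
from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import BernoulliNB

cancer = load_breast_cancer()
data_train, data_test, target_train, target_test = train_test_split(
    cancer.data, cancer.target, test_size=0.2, random_state=0)

# Rule-of-thumb threshold: midpoint of the overall feature range.
threshold = (data_train.min() + data_train.max()) / 2
model = BernoulliNB(binarize=threshold)
model.fit(data_train, target_train)
print('threshold:', threshold)
print('test score:', model.score(data_test, target_test))
```

Because the breast-cancer features have very different scales, a single global midpoint is crude; the sweep below over many binarize values gives a fuller picture.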
```python
def test_BernoulliNB_binarize(*data):
    """Test how the binarize parameter affects BernoulliNB's accuracy.

    :param data: a tuple of (training set, test set, training labels, test labels)
    :return: None
    """
    X_train, X_test, y_train, y_test = data
    min_x = min(np.min(X_train.ravel()), np.min(X_test.ravel())) - 0.1
    max_x = max(np.max(X_train.ravel()), np.max(X_test.ravel())) + 0.1
    binarizes = np.linspace(min_x, max_x, endpoint=True, num=100)
    train_scores = []
    test_scores = []
    for binarize in binarizes:
        cls = BernoulliNB(binarize=binarize)
        cls.fit(X_train, y_train)
        train_scores.append(cls.score(X_train, y_train))
        test_scores.append(cls.score(X_test, y_test))
    # plot
    fig = plt.figure()
    ax = fig.add_subplot(1, 1, 1)
    ax.plot(binarizes, train_scores, label="Training Score")
    ax.plot(binarizes, test_scores, label="Testing Score")
    ax.set_xlabel("binarize")
    ax.set_ylabel("score")
    ax.set_ylim(0, 1.0)
    ax.set_xlim(min_x - 1, max_x + 1)
    ax.set_title("BernoulliNB")
    ax.legend(loc="best")
    plt.show()

test_BernoulliNB_binarize(data_train, data_test, target_train, target_test)
```

Complement Naive Bayes
sklearn.naive_bayes.ComplementNB
class sklearn.naive_bayes.ComplementNB(*, alpha=1.0, fit_prior=True, class_prior=None, norm=False)
An improvement on multinomial naive Bayes designed for imbalanced data; it is better at capturing minority classes.
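A minimal sketch on made-up, heavily imbalanced count data (the class count profiles are invented for illustration):

```python
import numpy as np
from sklearn.naive_bayes import ComplementNB

# Imbalanced toy counts: 40 majority samples heavy on feature 0,
# only 5 minority samples heavy on feature 2.
X = np.array([[5, 1, 1]] * 40 + [[1, 1, 5]] * 5)
y = np.array([0] * 40 + [1] * 5)

# ComplementNB estimates each class's weights from the samples NOT in
# that class, which keeps the majority class from dominating the fit.
model = ComplementNB()
model.fit(X, y)
print(model.predict([[1, 1, 6]]))  # count profile matching the minority class -> [1]
```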
```python
ComplementNB_model = ComplementNB()
ComplementNB_model.fit(data_train, target_train)
train_score = ComplementNB_model.score(data_train, target_train)
print('train score:', train_score)
test_score = ComplementNB_model.score(data_test, target_test)
print('test score:', test_score)
```

Summary
sklearn provides four naive Bayes variants (GaussianNB, MultinomialNB, BernoulliNB, ComplementNB); choose the one whose assumed feature distribution best matches your data.