Binary Logistic Regression

Published: 2024/1/1

1. Principle

2. Python code

2.1 Dataset format

2.2 Code

3. Applicability

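1. Principle

Regression:

Suppose we have a set of data points. Fitting them with a straight line, a curve, or a polyline is called regression: we look for the functional relationship between the two axis variables of points in a plane, or between variables in some other coordinate system. In one sentence, regression fits a functional relationship to existing data points.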
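As a small illustration of fitting a functional relationship to existing points, here is a minimal sketch that fits a straight line with NumPy's least-squares `np.polyfit`. The data points and the new input `x_new` are made up for the example.

```python
import numpy as np

# Hypothetical data points that lie exactly on y = 2x + 1
x = np.array([0.0, 1.0, 2.0, 3.0, 4.0])
y = 2.0 * x + 1.0

# Fit a degree-1 polynomial (a straight line) by least squares
slope, intercept = np.polyfit(x, y, deg=1)

# Use the fitted relationship to predict y for a new x
x_new = 5.0
y_new = slope * x_new + intercept
print(slope, intercept, y_new)
```

With noisy real data the fitted slope and intercept would only approximate the underlying relationship.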

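Common kinds of regression include linear regression, nonlinear regression, and locally weighted regression.

Logistic regression: the target is a binary outcome (0 or 1), which makes it a common binary classification model. In essence it combines linear regression with the sigmoid activation function, which resembles how a neuron in the brain works, and it is a basic entry point to machine learning.

Application: for a concrete real-world problem we usually have some existing data. Logistic regression can learn from the features of that data so that the computer finds the functional relationship within it. When new data arrives, the learned relationship can be used to predict its outcome.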

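Theoretical background:

Mathematical implementation:

The linear part is a weighted sum of the input features, that is, several single-variable terms of the form y=ax+b added together (the combined input z computed in the code in section 2.2). Because the goal is classification, the sigmoid function maps this continuous linear result into the interval (0, 1), which is then split into the two states 0 and 1.

Sigmoid function: σ(z) = 1 / (1 + e^(-z))

The boundary is 0.5: outputs above 0.5 are classified as 1, the rest as 0.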
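The thresholding can be tried directly. This is a minimal sketch using the same sigmoid expression as the code in section 2.2; the input values in `z` are hypothetical.

```python
import numpy as np

def sigmoid(z):
    # Maps any real number into the open interval (0, 1)
    return 1.0 / (1.0 + np.exp(-z))

# Hypothetical linear outputs: negative, zero, positive
z = np.array([-2.0, 0.0, 3.0])
h = sigmoid(z)

# Strictly above 0.5 becomes class 1, everything else class 0
labels = (h > 0.5).astype(int)
print(h, labels)
```

Note that z = 0 gives exactly 0.5, which the strict comparison assigns to class 0.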

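Expressing how a neuron processes information in mathematical terms:

Premise: the final result is 0 or 1, representing the two classes. The weights w applied to the data are the parameters of the model.

Step 1: define the loss function L(w).

Step 2:

Find the w that minimizes L(w); the smaller the loss, the better the model agrees with the actual results.

Step 3:

By this point the symbols may already be making your head spin.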
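One way to write out these steps, assuming the loss is the standard summed cross-entropy (the quantity section 3 refers to) and using the shapes from the code in section 2.2: X is the N×D data matrix, y the N×1 label vector, h the N×1 vector of sigmoid outputs, and α the learning rate. The symbols x_i, y_i and h_i for single rows are introduced here for illustration.

```latex
\begin{align}
h_i &= \sigma(x_i^\top w) = \frac{1}{1 + e^{-x_i^\top w}} \\
L(w) &= -\sum_{i=1}^{N} \left[ y_i \log h_i + (1 - y_i) \log (1 - h_i) \right] \\
w^{*} &= \arg\min_{w} L(w) \\
\nabla_w L(w) &= -X^\top (y - h) \\
w &\leftarrow w - \alpha \nabla_w L(w) = w + \alpha X^\top (y - h)
\end{align}
```

The last line has the same form as the update `w = w + alpha * np.dot(datas.T, Error)` in `weight_update`, where `Error = labs - h`.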

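Step 4: implement it in a program

2. Python code

2.1 Dataset format

The file is named testset.txt and has N rows and 3 tab-separated columns. Judging from the loader in section 2.2, the first two columns are the features and the last column is the label (0 or 1). For example, the values -0.017612 and 14.053064 are separated by a single tab. The script in section 2.2 reads the path stored in its `file` variable, so that variable must point to this file.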
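To make the layout concrete, the sketch below parses a few rows in this format with `np.loadtxt`. Only the two values in the first row come from the text above; the labels and the remaining rows are invented, and an in-memory string stands in for the file.

```python
import io
import numpy as np

# Three tab-separated rows: feature 1, feature 2, label.
# Labels and rows 2-3 are hypothetical.
sample = "-0.017612\t14.053064\t0\n1.5\t3.0\t1\n-0.5\t9.25\t0\n"

raw = np.loadtxt(io.StringIO(sample), delimiter="\t")  # shape (N, 3)

features = raw[:, :-1]   # N x 2
labs = raw[:, -1:]       # N x 1, kept as a column vector

# Prepend a column of ones so the bias is learned as an ordinary weight
datas = np.c_[np.ones([raw.shape[0], 1]), features]   # N x 3
print(datas.shape, labs.shape)
```

The same call accepts a file path in place of the `io.StringIO` object.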

2.2 Code

import numpy as np
import matplotlib.pyplot as plt


# Sigmoid activation function
def sigmoid(z):
    return 1.0 / (1 + np.exp(-z))


# Shapes used below: datas is N x D, labs is N x 1, w is D x 1

# Weight update (one gradient step)
def weight_update(datas, labs, w, alpha=0.01):
    z = np.dot(datas, w)  # N x 1, combined input received by the neuron for each of the N samples
    h = sigmoid(z)  # N x 1, activation value between 0 and 1, used as the prediction
    Error = labs - h  # N x 1, actual value minus prediction (y - h)
    w = w + alpha * np.dot(datas.T, Error)
    return w


# Train and solve for the parameters with non-stochastic (full-batch) gradient descent
def train_LR(datas, labs, n_epoch=2, alpha=0.005):
    N, D = np.shape(datas)  # datas is N x D
    w = np.ones([D, 1])  # D x 1, every weight starts at 1
    # Run n_epoch iterations
    for i in range(n_epoch):
        w = weight_update(datas, labs, w, alpha)
        error_rate = test_accuracy(datas, labs, w)  # compute the error rate
        print("epoch %d error %.3f%%" % (i, error_rate * 100))
    return w


# Stochastic gradient descent with a batch size. It can reach the point where the
# derivative is 0 faster. alpha must not be too large, otherwise the weights
# oscillate around that point.
def train_LR_batch(datas, labs, batchsize, n_epoch=2, alpha=0.005):
    N, D = np.shape(datas)
    # Initialize the weights
    w = np.ones([D, 1])  # D x 1
    N_batch = N // batchsize
    for i in range(n_epoch):
        # Shuffle the data
        rand_index = np.random.permutation(N).tolist()
        # Update the weights once per batch
        for j in range(N_batch):
            # alpha = 4.0/(i+j+1) +0.01
            index = rand_index[j * batchsize:(j + 1) * batchsize]
            batch_datas = datas[index]
            batch_labs = labs[index]
            w = weight_update(batch_datas, batch_labs, w, alpha)
        error = test_accuracy(datas, labs, w)
        print("epoch %d error rate %.2f%%" % (i, error * 100))
    return w


# Reporting helper, independent of the model. Despite its name it returns the error rate.
def test_accuracy(datas, labs, w):
    N, D = np.shape(datas)
    z = np.dot(datas, w)  # N x 1
    h = sigmoid(z)  # N x 1
    lab_det = (h > 0.5).astype(float)  # np.float was removed from NumPy; the built-in float works
    error_rate = np.sum(np.abs(labs - lab_det)) / N
    return error_rate


# Plot the result
def draw_desion_line(datas, labs, w, name="0.jpg"):
    dic_colors = {0: (.8, 0, 0), 1: (0, .8, 0)}
    # Plot the data points
    for i in range(2):
        index = np.where(labs == i)[0]
        sub_datas = datas[index]
        plt.scatter(sub_datas[:, 1], sub_datas[:, 2], s=16., color=dic_colors[i])
    # Plot the decision line
    min_x = np.min(datas[:, 1])
    max_x = np.max(datas[:, 1])
    w = w[:, 0]
    x = np.arange(min_x, max_x, 0.01)
    y = -(x * w[1] + w[0]) / w[2]
    plt.plot(x, y)
    plt.savefig(name)


# Load a dataset for training, or load test data to check the model
def load_dataset(file):
    with open(file, "r", encoding="utf-8") as f:
        lines = f.read().splitlines()
    # Labels come from the last column, shape N x 1
    labs = [line.split("\t")[-1] for line in lines]
    labs = np.array(labs).astype(np.float32)
    labs = np.expand_dims(labs, axis=-1)  # N x 1
    # Features come from the remaining columns
    datas = [line.split("\t")[:-1] for line in lines]
    datas = np.array(datas).astype(np.float32)
    N, D = np.shape(datas)
    # Prepend a feature column that is all ones (the bias term)
    datas = np.c_[np.ones([N, 1]), datas]
    return datas, labs


if __name__ == "__main__":
    # Load the data; point this at the dataset file described in section 2.1
    file = "1.txt"
    datas, labs = load_dataset(file)
    weights = train_LR_batch(datas, labs, batchsize=2, alpha=0.001, n_epoch=800)
    # The column of ones comes first, so weights[0] is the bias b
    print('b is {}, w1 is {}, w2 is {}'.format(weights[0][0], weights[1][0], weights[2][0]))
    # The decision line is w1*x + w2*y + b = 0
    draw_desion_line(datas, labs, weights, name="test_1.jpg")  # save the plot as a jpg file
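Once the weights are trained, they can be applied to new points, which is the prediction use described in section 1. The sketch below assumes the weight order that `load_dataset` produces (bias first, then the two feature weights). The `predict` helper, the weight values and the points are all hypothetical, not output from a real training run.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def predict(points, w):
    """points: M x 2 raw features; w: 3 x 1 weights ordered [b, w1, w2]."""
    M = points.shape[0]
    # Prepend the column of ones, matching what load_dataset does
    datas = np.c_[np.ones([M, 1]), points]
    h = sigmoid(np.dot(datas, w))          # M x 1 values in (0, 1)
    return h, (h > 0.5).astype(int)        # values and 0/1 labels

# Hypothetical weights standing in for the result of train_LR_batch
w = np.array([[4.0], [0.5], [-0.6]])
new_points = np.array([[1.0, 2.0], [0.0, 12.0]])

probs, labels = predict(new_points, w)
print(probs.ravel(), labels.ravel())
```

With these made-up weights the first point falls on the class-1 side of the line and the second on the class-0 side.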

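3. Applicability

This approach is mostly used to classify point sets in a 2D plane. The author's observation is that when the dataset has too many dimensions, logistic regression does not perform well and the cross-entropy cannot be driven toward 0. In general, the cross-entropy can only approach 0 when the two classes are linearly separable.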
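Whether the cross-entropy approaches 0 can be checked by computing it. A minimal sketch, assuming the summed cross-entropy loss and a hypothetical XOR-style dataset that no straight line separates; the `cross_entropy` helper and its `eps` clipping parameter are not part of the original code.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def cross_entropy(datas, labs, w, eps=1e-12):
    h = sigmoid(np.dot(datas, w))
    h = np.clip(h, eps, 1 - eps)   # avoid log(0)
    return float(-np.sum(labs * np.log(h) + (1 - labs) * np.log(1 - h)))

# Hypothetical XOR-style points, with the column of ones first
datas = np.array([[1, 0, 0], [1, 0, 1], [1, 1, 0], [1, 1, 1]], dtype=float)
labs = np.array([[0], [1], [1], [0]], dtype=float)

# Plain gradient descent with the same update form as weight_update
w = np.zeros([3, 1])
for _ in range(2000):
    h = sigmoid(np.dot(datas, w))
    w = w + 0.1 * np.dot(datas.T, labs - h)

loss = cross_entropy(datas, labs, w)
print(loss)  # stays at 4 * ln(2), about 2.77
```

For this data the gradient is already zero at w = 0, so no amount of training lowers the loss below 4·ln 2. The limit here comes from the classes not being linearly separable, not from the number of dimensions.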
