
Understanding the KNN Algorithm: Python Implementation and Prediction

Published: 2023/12/10

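KNN consists of a training phase and a classification phase. Training simply stores the training set. During classification, each test image is compared with every image in the training set, and the training image with the smallest difference is selected.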
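To make the comparison step concrete, here is a minimal sketch on made-up 2x2 "images". It assumes the difference is measured as the sum of absolute pixel differences (the L1 distance, which is also what the implementation later in this post uses). The arrays and labels are illustrative only.

```python
import numpy as np

# Three tiny made-up training "images" (2x2 pixels) with made-up labels.
train_images = np.array([
    [[10, 20], [30, 40]],
    [[200, 210], [220, 230]],
    [[12, 18], [33, 41]],
], dtype=float)
train_labels = np.array([0, 1, 0])

test_image = np.array([[11, 19], [30, 45]], dtype=float)

# Sum of absolute pixel differences between the test image and each training image.
differences = np.abs(train_images - test_image).sum(axis=(1, 2))

# The training image with the smallest difference decides the label.
nearest = int(np.argmin(differences))
predicted_label = int(train_labels[nearest])
print(differences, nearest, predicted_label)
```

Here the first training image differs from the test image by 7 in total, the second by 755 and the third by 9, so the first image is the nearest and its label is used.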

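If the dataset is large, split the training set into two parts: a small part serves as the validation set (a stand-in for the test set) and the rest remains the training set (typically 70%-90% of the data; the exact share depends on how many hyperparameters need tuning, and the more there are, the larger the validation share should be). The validation set is used to tune hyperparameters. If the dataset is small, use cross-validation to tune them instead. Cross-validation is more expensive, however: in K-fold cross-validation a larger K generally gives a more reliable estimate, but the cost grows with K.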
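The hold-out split described above can be sketched as follows. The function name train_val_split, the 20% validation share, and the toy data are assumptions chosen for illustration and are not part of the original post.

```python
import numpy as np

def train_val_split(X, y, val_fraction=0.2, seed=0):
    """Shuffle the data and hold out val_fraction of it as a validation set."""
    rng = np.random.default_rng(seed)
    order = rng.permutation(X.shape[0])
    num_val = int(round(X.shape[0] * val_fraction))
    val_idx, train_idx = order[:num_val], order[num_val:]
    return X[train_idx], y[train_idx], X[val_idx], y[val_idx]

# Made-up data: 100 samples with 4 features each and 3 classes.
X = np.arange(400, dtype=float).reshape(100, 4)
y = np.arange(100) % 3

X_tr, y_tr, X_val, y_val = train_val_split(X, y, val_fraction=0.2)
print(X_tr.shape, X_val.shape)  # (80, 4) (20, 4)
```

Hyperparameters such as k would then be chosen by their accuracy on X_val, and the test set would be touched only once at the end.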

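Classification decision

Count how many of the K neighbors belong to each class and assign the test sample to the class with the highest count. In other words, the class of an input instance is decided by the majority class among its K nearest training instances.

Common decision rules:

Majority voting: this works like an everyday vote, where the minority yields to the majority. It is the most commonly used rule.

Weighted voting: in some cases the votes carry different weights, for example when a judge's vote counts more than an ordinary voter's. When the data points carry weights, weighted voting is generally used.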
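The two rules above can be sketched as follows. The post does not say how the weights are obtained, so this example assumes a common choice, weighting each neighbor by the inverse of its distance. The function names, labels, and distances are made up for illustration.

```python
import numpy as np

def majority_vote(labels):
    """Return the most frequent label; ties go to the smallest label."""
    values, counts = np.unique(labels, return_counts=True)
    return values[np.argmax(counts)]

def weighted_vote(labels, weights):
    """Return the label whose neighbors have the largest total weight."""
    values, inverse = np.unique(labels, return_inverse=True)
    totals = np.bincount(inverse, weights=weights)
    return values[np.argmax(totals)]

# Made-up labels and distances of the K = 5 nearest neighbors.
neighbor_labels = np.array([1, 1, 0, 0, 0])
neighbor_distances = np.array([0.5, 1.0, 4.0, 5.0, 10.0])

# Assumption: weight = 1 / distance (a small constant avoids division by zero).
weights = 1.0 / (neighbor_distances + 1e-12)

print(majority_vote(neighbor_labels))           # 0
print(weighted_vote(neighbor_labels, weights))  # 1
```

Class 0 wins the plain vote three to two, but the two class 1 neighbors are much closer, so their total weight (about 3.0 versus 0.55) wins the weighted vote.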

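Advantages:

The neighbors used for the decision are all samples whose classes are already known.

KNN itself is simple: the classifier does not need to be fitted to the training set, so the training cost is effectively zero (the data is only stored). The cost of classifying one sample is proportional to the number of samples in the training set.

For samples whose class regions intersect or overlap heavily, KNN is better suited than many other methods.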

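Disadvantages:

When the classes are imbalanced, correct classification is difficult, because the larger class tends to dominate the K neighbors.

The computational cost is high, because every prediction requires the distance from the test sample to every training sample.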
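The distance computation mentioned above can be moved out of the Python loop with NumPy broadcasting. The sketch below is an assumption about how one might organize it, not code from the original post: it computes the L1 distances in chunks, because broadcasting every test sample against every training sample at once can need far more memory than is available. The function name l1_distance_matrix and the chunk_size parameter are hypothetical.

```python
import numpy as np

def l1_distance_matrix(X_test, X_train, chunk_size=100):
    """Return D with D[i, j] = L1 distance between X_test[i] and X_train[j]."""
    num_test = X_test.shape[0]
    dists = np.empty((num_test, X_train.shape[0]), dtype=float)
    for start in range(0, num_test, chunk_size):
        stop = min(start + chunk_size, num_test)
        # Broadcasting builds a (chunk, num_train, num_features) array of differences.
        diff = X_test[start:stop, None, :] - X_train[None, :, :]
        dists[start:stop] = np.abs(diff).sum(axis=2)
    return dists

# Made-up data: 50 training samples and 7 test samples with 8 features each.
rng = np.random.default_rng(0)
X_train = rng.random((50, 8))
X_test = rng.random((7, 8))

D = l1_distance_matrix(X_test, X_train, chunk_size=3)
nearest = np.argmin(D, axis=1)  # index of the nearest training sample per test sample
print(D.shape, nearest.shape)
```

This does not reduce the number of arithmetic operations. It only lets NumPy do the work in compiled code, with chunk_size trading memory for fewer Python-level iterations.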

Python implementation:

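import numpy as np

class kNearestNeighbor:
    def __init__(self):
        pass

    def train(self, X, y):
        # "Training" only memorizes the data: X is (N, D), y is (N,).
        self.Xtr = X
        self.ytr = y

    def predict(self, X, k=1):
        num_test = X.shape[0]
        Ypred = np.zeros(num_test, dtype=self.ytr.dtype)
        for i in range(num_test):
            # L1 distance from test sample i to every stored training sample
            distances = np.sum(np.abs(self.Xtr - X[i, :]), axis=1)
            # Labels of the k nearest samples, taken from the stored labels
            # so the class also works on cross-validation subsets
            closest_y = self.ytr[np.argsort(distances)[:k]]
            # Majority vote among the k labels
            u, indices = np.unique(closest_y, return_inverse=True)
            Ypred[i] = u[np.argmax(np.bincount(indices))]
        return Ypred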


load_CIFAR_batch() and load_CIFAR10() load the CIFAR-10 dataset.

import pickle
import numpy as np

def load_CIFAR_batch(filename):
    """ load single batch of cifar """
    with open(filename, 'rb') as f:
        datadict = pickle.load(f, encoding='latin1')
        X = datadict['data']
        Y = datadict['labels']
        # Flat rows -> (N, 3, 32, 32), then channels last: (N, 32, 32, 3)
        X = X.reshape(10000, 3, 32, 32).transpose(0, 2, 3, 1).astype("float")
        Y = np.array(Y)
        return X, Y


import os

def load_CIFAR10(ROOT):
    """ load all of cifar """
    xs = []
    ys = []
    for b in range(1, 6):
        f = os.path.join(ROOT, 'data_batch_%d' % (b))
        X, Y = load_CIFAR_batch(f)
        xs.append(X)
        ys.append(Y)
    Xtr = np.concatenate(xs)  # stack the five batches along the first axis
    Ytr = np.concatenate(ys)
    del X, Y
    Xte, Yte = load_CIFAR_batch(os.path.join(ROOT, 'test_batch'))
    return Xtr, Ytr, Xte, Yte


Xtr, Ytr, Xte, Yte = load_CIFAR10('cifar10')
# Flatten each 32x32x3 image into one row of 3072 values
Xtr_rows = Xtr.reshape(Xtr.shape[0], 32 * 32 * 3)
Xte_rows = Xte.reshape(Xte.shape[0], 32 * 32 * 3)


# The dataset is fairly large and runs slowly on a personal computer,
# so use only 5000 training samples and 500 test samples
num_training = 5000
num_test = 500
x_train = Xtr_rows[:num_training, :]
y_train = Ytr[:num_training]
x_test = Xte_rows[:num_test, :]
y_test = Yte[:num_test]


knn = kNearestNeighbor()
knn.train(x_train, y_train)
y_predict = knn.predict(x_test, k=7)
acc = np.mean(y_predict == y_test)
print('accuracy : %f' % (acc))


accuracy : 0.302000


# Which value of k gives the best result? Cross-validation can answer this;
# 5-fold cross-validation is used here
num_folds = 5
k_choices = [1, 3, 5, 8, 10, 12, 15, 20, 50, 100]
x_train_folds = np.array_split(x_train, num_folds)
y_train_folds = np.array_split(y_train, num_folds)
k_to_accuracies = {}
for k_val in k_choices:
    print('k = ' + str(k_val))
    k_to_accuracies[k_val] = []
    for i in range(num_folds):
        # Train on every fold except fold i, validate on fold i
        x_train_cycle = np.concatenate([f for j, f in enumerate(x_train_folds) if j != i])
        y_train_cycle = np.concatenate([f for j, f in enumerate(y_train_folds) if j != i])
        x_val_cycle = x_train_folds[i]
        y_val_cycle = y_train_folds[i]
        knn = kNearestNeighbor()
        knn.train(x_train_cycle, y_train_cycle)
        y_val_pred = knn.predict(x_val_cycle, k_val)
        num_correct = np.sum(y_val_cycle == y_val_pred)
        k_to_accuracies[k_val].append(float(num_correct) / float(len(y_val_cycle)))


k = 1

k = 3

k = 5

k = 8

k = 10

k = 12

k = 15

k = 20

k = 50

k = 100


for k in sorted(k_to_accuracies):
    for accuracy in k_to_accuracies[k]:
        print('k = %d, accuracy = %f' % (int(k), accuracy))


k = 1, accuracy = 0.098000

k = 1, accuracy = 0.148000

k = 1, accuracy = 0.205000

k = 1, accuracy = 0.233000

k = 1, accuracy = 0.308000

k = 3, accuracy = 0.089000

k = 3, accuracy = 0.142000

k = 3, accuracy = 0.215000

k = 3, accuracy = 0.251000

k = 3, accuracy = 0.296000

k = 5, accuracy = 0.096000

k = 5, accuracy = 0.176000

k = 5, accuracy = 0.240000

k = 5, accuracy = 0.284000

k = 5, accuracy = 0.309000

k = 8, accuracy = 0.100000

k = 8, accuracy = 0.175000

k = 8, accuracy = 0.263000

k = 8, accuracy = 0.289000

k = 8, accuracy = 0.310000

k = 10, accuracy = 0.099000

k = 10, accuracy = 0.174000

k = 10, accuracy = 0.264000

k = 10, accuracy = 0.318000

k = 10, accuracy = 0.313000

k = 12, accuracy = 0.100000

k = 12, accuracy = 0.192000

k = 12, accuracy = 0.261000

k = 12, accuracy = 0.316000

k = 12, accuracy = 0.318000

k = 15, accuracy = 0.087000

k = 15, accuracy = 0.197000

k = 15, accuracy = 0.255000

k = 15, accuracy = 0.322000

k = 15, accuracy = 0.321000

k = 20, accuracy = 0.089000

k = 20, accuracy = 0.225000

k = 20, accuracy = 0.270000

k = 20, accuracy = 0.319000

k = 20, accuracy = 0.306000

k = 50, accuracy = 0.079000

k = 50, accuracy = 0.248000

k = 50, accuracy = 0.278000

k = 50, accuracy = 0.287000

k = 50, accuracy = 0.293000

k = 100, accuracy = 0.075000

k = 100, accuracy = 0.246000

k = 100, accuracy = 0.275000

k = 100, accuracy = 0.284000

k = 100, accuracy = 0.277000
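One way to turn these numbers into a choice of k is to average the per-fold accuracies and take the k with the highest mean. The sketch below uses the values exactly as printed above; picking the highest mean is a common convention, not something the post prescribes. The spread across folds is large, so the means should be read with caution.

```python
import numpy as np

# Per-fold accuracies copied from the printed output above.
k_to_accuracies = {
    1: [0.098, 0.148, 0.205, 0.233, 0.308],
    3: [0.089, 0.142, 0.215, 0.251, 0.296],
    5: [0.096, 0.176, 0.240, 0.284, 0.309],
    8: [0.100, 0.175, 0.263, 0.289, 0.310],
    10: [0.099, 0.174, 0.264, 0.318, 0.313],
    12: [0.100, 0.192, 0.261, 0.316, 0.318],
    15: [0.087, 0.197, 0.255, 0.322, 0.321],
    20: [0.089, 0.225, 0.270, 0.319, 0.306],
    50: [0.079, 0.248, 0.278, 0.287, 0.293],
    100: [0.075, 0.246, 0.275, 0.284, 0.277],
}

# Mean accuracy over the five folds for each k.
mean_accuracy = {k: float(np.mean(v)) for k, v in k_to_accuracies.items()}
best_k = max(mean_accuracy, key=mean_accuracy.get)

for k in sorted(mean_accuracy):
    print('k = %d, mean accuracy = %.4f' % (k, mean_accuracy[k]))
print('best k:', best_k)
```

With these numbers the highest mean is about 0.2418 at k = 20, with k = 12, 15 and 50 close behind.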


Visualizing the cross-validation results

import matplotlib.pyplot as plt
plt.rcParams['figure.figsize'] = (10.0, 8.0)
plt.rcParams['image.interpolation'] = 'nearest'
plt.rcParams['image.cmap'] = 'gray'


# One scatter point per fold for each k
for k in k_choices:
    accuracies = k_to_accuracies[k]
    plt.scatter([k] * len(accuracies), accuracies)

# Mean accuracy per k, with the standard deviation across folds as error bars
accuracies_mean = np.array([np.mean(v) for k, v in sorted(k_to_accuracies.items())])
accuracies_std = np.array([np.std(v) for k, v in sorted(k_to_accuracies.items())])
plt.errorbar(k_choices, accuracies_mean, yerr=accuracies_std)
plt.title('Cross-validation on k')
plt.xlabel('k')
plt.ylabel('Cross-validation accuracy')
plt.show()

