SVM (Support Vector Machines): A Study of sklearn

Published: 2025/4/16


Support vector machines (SVMs) are a set of supervised learning methods used for classification, regression and outlier detection.

Support vector machines have the following advantages:

Effective in high-dimensional spaces.
Still effective when the number of dimensions is greater than the number of samples.
Uses only a subset of the training points (called support vectors) in the decision function, so it is also memory efficient.
Versatile: different kernel functions can be specified for the decision function. Common kernels are provided, but it is also possible to specify custom kernels.
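
The custom-kernel point above can be sketched as follows. This is a minimal illustration, not a recommended setup: the function name linear_kernel and the toy data are made up for the example, and it assumes that SVC accepts a Python callable for its kernel parameter and calls it with two arrays to obtain a Gram matrix.

```python
import numpy as np
from sklearn import svm


def linear_kernel(A, B):
    """Hypothetical custom kernel: plain dot product.

    Returns the Gram matrix of shape (len(A), len(B)).
    """
    return np.dot(A, B.T)


# Toy data (assumption): two classes lying along the diagonal.
X = np.array([[0.0, 0.0], [1.0, 1.0], [2.0, 2.0], [3.0, 3.0]])
y = np.array([0, 0, 1, 1])

# Pass the callable instead of a built-in kernel name such as 'rbf'.
clf = svm.SVC(kernel=linear_kernel)
clf.fit(X, y)

pred = clf.predict(np.array([[0.5, 0.5], [2.5, 2.5]]))
print(pred)
```

Any function that returns a valid (positive semi-definite) Gram matrix could be swapped in for the dot product here.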

Support vector machines also have the following disadvantages:

If the number of features is much greater than the number of samples, choosing the kernel function and the regularization term carefully is crucial to avoid over-fitting.
SVMs do not directly provide probability estimates; these are calculated using an expensive five-fold cross-validation.
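
As a rough sketch of the probability point above: in the sklearn versions this article appears to target, probability estimates are requested with probability=True on SVC, which makes fitting slower because of the internal cross-validation. The synthetic two-cluster data below is an assumption made purely for illustration.

```python
import numpy as np
from sklearn import svm

# Synthetic data (assumption): two well-separated Gaussian clusters.
rng = np.random.RandomState(0)
X = np.vstack([rng.randn(20, 2) - 2, rng.randn(20, 2) + 2])
y = np.array([0] * 20 + [1] * 20)

# probability=True enables predict_proba at the cost of extra fitting time.
clf = svm.SVC(probability=True, random_state=0)
clf.fit(X, y)

# One row per sample, one column per class; each row sums to 1.
proba = clf.predict_proba([[-2.0, -2.0], [2.0, 2.0]])
print(proba)
```

Without probability=True, only predict and decision_function are available on the fitted model.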

The support vector machines in sklearn accept both dense and sparse sample data as input. However, to make predictions on sparse data, the model must also have been trained on sparse data.
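
A small sketch of the sparse case, assuming scipy is installed and using scipy.sparse.csr_matrix as the sparse container (the toy data is made up for the example):

```python
import numpy as np
from scipy import sparse
from sklearn import svm

# Toy data (assumption): class 0 lies on the second axis, class 1 on the first.
X_dense = np.array([[0.0, 1.0], [1.0, 0.0], [0.0, 2.0], [2.0, 0.0]])
y = np.array([0, 1, 0, 1])

# Train on a sparse matrix ...
X_sparse = sparse.csr_matrix(X_dense)
clf = svm.SVC(kernel='linear')
clf.fit(X_sparse, y)

# ... so that predicting on sparse input is consistent with training.
X_new = sparse.csr_matrix(np.array([[0.0, 3.0], [3.0, 0.0]]))
pred = clf.predict(X_new)
print(pred)
```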

Classification

SVC, NuSVC and LinearSVC are classes capable of performing multi-class classification on a dataset.

SVC and NuSVC are similar methods, but they accept slightly different sets of parameters and have different mathematical formulations. LinearSVC, on the other hand, is another implementation of support vector classification for the case of a linear kernel.

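Note: LinearSVC does not accept the keyword kernel, because the kernel is assumed to be linear. It also lacks some of the members that SVC and NuSVC have, such as the support_ attribute.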
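
The note above can be checked with a short sketch. The toy data and the variable names are assumptions for illustration; the check relies on LinearSVC raising a TypeError for an unknown constructor keyword.

```python
from sklearn import svm

# Toy data (assumption): two linearly separable classes.
X = [[0, 1], [1, 0], [0, 2], [2, 0]]
y = [0, 1, 0, 1]

svc = svm.SVC(kernel='linear').fit(X, y)
lin = svm.LinearSVC().fit(X, y)

# SVC exposes the indices of its support vectors; LinearSVC does not.
svc_has_support = hasattr(svc, 'support_')
lin_has_support = hasattr(lin, 'support_')
print(svc_has_support, lin_has_support)

# LinearSVC has no `kernel` parameter, so passing one is rejected.
try:
    svm.LinearSVC(kernel='linear')
    kernel_rejected = False
except TypeError:
    kernel_rejected = True
print(kernel_rejected)
```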

SVC, NuSVC and LinearSVC all take the same two inputs for training: X of shape [n_samples, n_features], which holds the sample features, and y of shape [n_samples], which holds the sample labels.

A simple test

>>> from sklearn import svm
>>> X = [[0, 1], [1, 0]]
>>> y = [0, 1]
>>> clf = svm.SVC(gamma='scale')
>>> clf.fit(X, y)
SVC(C=1.0, cache_size=200, class_weight=None, coef0=0.0,
    decision_function_shape='ovr', degree=3, gamma='scale', kernel='rbf',
    max_iter=-1, probability=False, random_state=None, shrinking=True,
    tol=0.001, verbose=False)

Predicting with the fitted model is just as simple:

>>> data = [[0 for i in range(2)] for j in range(2)]
>>> data
[[0, 0], [0, 0]]
>>> for i in range(2):
...     for j in range(2):
...         data[i][j] = clf.predict([[i, j]])[0]
...
>>> data
[[1, 0], [1, 1]]

Multi-class classification

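SVC and NuSVC implement the "one-against-one" approach to multi-class classification. If n_class is the number of classes, then n_class * (n_class - 1) / 2 classifiers are built, each one trained to separate a different pair of classes.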

To provide a consistent interface with other classifiers, the decision_function_shape option lets you aggregate the results of the "one-against-one" classifiers into a decision function of shape (n_samples, n_classes).

For example:

>>> X = [[0], [1], [2], [3]]
>>> Y = [0, 1, 2, 3]
>>> clf = svm.SVC(gamma='scale', decision_function_shape='ovo')
>>> clf.fit(X, Y)
SVC(C=1.0, cache_size=200, class_weight=None, coef0=0.0,
    decision_function_shape='ovo', degree=3, gamma='scale', kernel='rbf',
    max_iter=-1, probability=False, random_state=None, shrinking=True,
    tol=0.001, verbose=False)
>>> dec = clf.decision_function([[1]])
>>> dec.shape[1]  # 4 classes: 4 * 3 / 2 = 6 pairwise classifiers
6
>>> clf.decision_function_shape = 'ovr'
>>> dec = clf.decision_function([[1]])
>>> dec.shape[1]  # one column per class
4
>>> dec
array([[ 1.95120255,  3.5       ,  0.95120255, -0.4024051 ]])
>>> clf
SVC(C=1.0, cache_size=200, class_weight=None, coef0=0.0,
    decision_function_shape='ovr', degree=3, gamma='scale', kernel='rbf',
    max_iter=-1, probability=False, random_state=None, shrinking=True,
    tol=0.001, verbose=False)

On the other hand, LinearSVC implements the "one-vs-the-rest" multi-class strategy, and therefore trains n_class models. If there are only two classes, only one model is trained. (Note this special case: with two classes, "one" versus "the rest" describes the same split from either side, so a second model would be redundant.)
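
One way to see the number of models LinearSVC trains is to look at the shape of its coef_ attribute, which holds one row of weights per trained model. The sketch below reuses the four-point data from the SVC example; the label lists y_multi and y_binary are made up for the comparison.

```python
from sklearn import svm

X = [[0], [1], [2], [3]]
y_multi = [0, 1, 2, 3]   # four classes
y_binary = [0, 0, 1, 1]  # two classes

multi = svm.LinearSVC().fit(X, y_multi)
binary = svm.LinearSVC().fit(X, y_binary)

# Four classes: one weight row (one model) per class.
print(multi.coef_.shape)
print(multi.decision_function([[1]]).shape)

# Two classes: a single model is enough.
print(binary.coef_.shape)
print(binary.decision_function([[1]]).shape)
```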

Unbalanced problems

In problems where certain classes or certain individual samples should be given more importance, the keywords class_weight and sample_weight can be used.
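
A minimal sketch of both keywords on toy one-dimensional data (the data, the weight of 10 and the probe point 1.5 are assumptions chosen for illustration). Class 1 is weighted ten times more heavily, once through class_weight in the constructor and once through per-sample sample_weight in fit; the two settings describe the same per-sample penalties here, so the fitted models should agree.

```python
import numpy as np
from sklearn import svm

# Toy data (assumption): class 0 at x = 0, 1 and class 1 at x = 2, 3.
X = np.array([[0.0], [1.0], [2.0], [3.0]])
y = np.array([0, 0, 1, 1])

plain = svm.SVC(kernel='linear').fit(X, y)

# Penalize errors on class 1 ten times more than errors on class 0.
by_class = svm.SVC(kernel='linear', class_weight={1: 10}).fit(X, y)

# Same emphasis expressed sample by sample.
by_sample = svm.SVC(kernel='linear').fit(
    X, y, sample_weight=np.array([1.0, 1.0, 10.0, 10.0]))

# Decision value at the midpoint between the classes; a positive value
# means class 1, so a larger value means the boundary moved toward class 0.
probe = [[1.5]]
dec_plain = plain.decision_function(probe)[0]
dec_class = by_class.decision_function(probe)[0]
dec_sample = by_sample.decision_function(probe)[0]
print(dec_plain, dec_class, dec_sample)
```

With the heavier weight, the model gives up margin on class 0 before it accepts errors on class 1, which is why the midpoint ends up on the class 1 side.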
