
ML with sklearn: a detailed guide to the code and usage of make_pipeline, RobustScaler, KFold, and cross_val_score


Table of Contents

sklearn's make_pipeline: code explanation and usage
    Code explanation of make_pipeline
    How to use make_pipeline
        1. Using the Pipeline class to express a workflow that scales the data with MinMaxScaler and then trains an SVM
        2. Creating a pipeline with make_pipeline
sklearn's RobustScaler: code explanation and usage
    Code explanation of RobustScaler
    How to use RobustScaler
sklearn's KFold: code explanation and usage
    Code explanation of KFold
    How to use KFold
sklearn's cross_val_score: code explanation and usage
    Code explanation of cross_val_score
    Accepted values for the scoring parameter
    How to use cross_val_score
        1. Regression on the diabetes dataset
        2. Classification on the iris dataset



sklearn's make_pipeline: code explanation and usage

To simplify building chains of transformations and models, scikit-learn provides the Pipeline class, which merges multiple processing steps into a single scikit-learn estimator. Pipeline itself has fit, predict, and score methods and behaves like any other scikit-learn model.

Code explanation of make_pipeline

def make_pipeline(*steps, **kwargs):
    """Construct a Pipeline from the given estimators.

    This is a shorthand for the Pipeline constructor; it does not require,
    and does not permit, naming the estimators. Instead, their names will
    be set to the lowercase of their types automatically.

    Parameters
    ----------
    *steps : list of estimators

    memory : None, str or object with the joblib.Memory interface, optional
        Used to cache the fitted transformers of the pipeline. By default,
        no caching is performed. If a string is given, it is the path to
        the caching directory. Enabling caching triggers a clone of the
        transformers before fitting. Therefore, the transformer instance
        given to the pipeline cannot be inspected directly. Use the
        attribute ``named_steps`` or ``steps`` to inspect estimators within
        the pipeline. Caching the transformers is advantageous when fitting
        is time consuming.

In plain terms: make_pipeline constructs a Pipeline from the given estimators. It is a shorthand for the Pipeline constructor; it does not require (and does not permit) naming the estimators, whose names are set automatically to the lowercase of their class names. Parameters: *steps is the list of estimators; memory (None, a path string, or an object with the joblib.Memory interface) caches the fitted transformers of the pipeline. By default no caching is performed; if a string is given it is the path to the cache directory. Enabling caching clones the transformers before fitting, so the transformer instances passed to the pipeline cannot be inspected directly; use the named_steps or steps attribute to inspect the estimators inside the pipeline. Caching the transformers pays off when fitting is time-consuming.

    Examples
    --------
    >>> from sklearn.naive_bayes import GaussianNB
    >>> from sklearn.preprocessing import StandardScaler
    >>> make_pipeline(StandardScaler(), GaussianNB(priors=None))
    ...     # doctest: +NORMALIZE_WHITESPACE
    Pipeline(memory=None,
             steps=[('standardscaler',
                     StandardScaler(copy=True, with_mean=True, with_std=True)),
                    ('gaussiannb', GaussianNB(priors=None))])

    Returns
    -------
    p : Pipeline
    """
    memory = kwargs.pop('memory', None)
    if kwargs:
        raise TypeError('Unknown keyword arguments: "{}"'
                        .format(list(kwargs.keys())[0]))
    return Pipeline(_name_estimators(steps), memory=memory)
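The memory argument described above can be passed directly to make_pipeline. A minimal sketch (the temporary cache directory is just for illustration):

from tempfile import mkdtemp
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.naive_bayes import GaussianNB

cache_dir = mkdtemp()  # any writable directory works as the joblib cache location
pipe = make_pipeline(StandardScaler(), GaussianNB(), memory=cache_dir)
# Fitted transformers are cached under cache_dir; because caching clones the
# transformers, inspect the fitted steps through pipe.named_steps rather than
# through the StandardScaler() instance passed in above.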


How to use make_pipeline

>>> from sklearn.naive_bayes import GaussianNB
>>> from sklearn.preprocessing import StandardScaler
>>> make_pipeline(StandardScaler(), GaussianNB(priors=None))
...     # doctest: +NORMALIZE_WHITESPACE
Pipeline(memory=None,
         steps=[('standardscaler',
                 StandardScaler(copy=True, with_mean=True, with_std=True)),
                ('gaussiannb', GaussianNB(priors=None))])

The return value p is a Pipeline.


1. Using the Pipeline class to express a workflow that scales the data with MinMaxScaler and then trains an SVM

from sklearn.pipeline import Pipeline
from sklearn.preprocessing import MinMaxScaler
from sklearn.svm import SVC

pipe = Pipeline([("scaler", MinMaxScaler()), ("svm", SVC())])
pipe.fit(X_train, y_train)
pipe.score(X_test, y_test)


2. Creating a pipeline with make_pipeline

The Pipeline syntax is a bit verbose, and we usually do not need user-specified names for each step. In that case the pipeline can be created with make_pipeline, which builds the pipeline and automatically names each step after the class it belongs to.

from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import MinMaxScaler
from sklearn.svm import SVC

pipe = make_pipeline(MinMaxScaler(), SVC())
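To check the automatically generated names, a quick sketch continuing the example above (the printed names follow the lowercase-class-name rule from the docstring):

from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import MinMaxScaler
from sklearn.svm import SVC

pipe = make_pipeline(MinMaxScaler(), SVC())
# Step names are the lowercased class names:
print([name for name, _ in pipe.steps])   # ['minmaxscaler', 'svc']
# Individual steps are reachable through named_steps:
print(pipe.named_steps['svc'])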


References
《Python機器學習基礎教程》 (Introduction to Machine Learning with Python), the section on building pipelines with make_pipeline
Python sklearn.pipeline.make_pipeline() Examples


sklearn's RobustScaler: code explanation and usage

Code explanation of RobustScaler

class RobustScaler(BaseEstimator, TransformerMixin):
    """Scale features using statistics that are robust to outliers.

    This Scaler removes the median and scales the data according to
    the quantile range (defaults to IQR: Interquartile Range).
    The IQR is the range between the 1st quartile (25th quantile)
    and the 3rd quartile (75th quantile). Centering and scaling happen
    independently on each feature (or each sample, depending on the
    ``axis`` argument) by computing the relevant statistics on the
    samples in the training set. Median and interquartile range are
    then stored to be used on later data using the ``transform`` method.

    Standardization of a dataset is a common requirement for many
    machine learning estimators. Typically this is done by removing the
    mean and scaling to unit variance. However, outliers can often
    influence the sample mean / variance in a negative way. In such
    cases, the median and the interquartile range often give better
    results.

    .. versionadded:: 0.17

    Read more in the :ref:`User Guide <preprocessing_scaler>`.

    Parameters
    ----------
    with_centering : boolean, True by default
        If True, center the data before scaling. This will cause
        ``transform`` to raise an exception when attempted on sparse
        matrices, because centering them entails building a dense matrix
        which in common use cases is likely to be too large to fit in
        memory.

    with_scaling : boolean, True by default
        If True, scale the data to interquartile range.

    quantile_range : tuple (q_min, q_max), 0.0 < q_min < q_max < 100.0
        Default: (25.0, 75.0) = (1st quantile, 3rd quantile) = IQR
        Quantile range used to calculate ``scale_``.

        .. versionadded:: 0.18

    copy : boolean, optional, default is True
        If False, try to avoid a copy and do inplace scaling instead.
        This is not guaranteed to always work inplace; e.g. if the data
        is not a NumPy array or scipy.sparse CSR matrix, a copy may
        still be returned.

    Attributes
    ----------
    center_ : array of floats
        The median value for each feature in the training set.

    scale_ : array of floats
        The (scaled) interquartile range for each feature in the
        training set.

        .. versionadded:: 0.17
           *scale_* attribute.

    See also
    --------
    robust_scale: Equivalent function without the estimator API.

    :class:`sklearn.decomposition.PCA`
        Further removes the linear correlation across features with
        'whiten=True'.

    Notes
    -----
    For a comparison of the different scalers, transformers, and
    normalizers, see :ref:`examples/preprocessing/plot_all_scaling.py
    <sphx_glr_auto_examples_preprocessing_plot_all_scaling.py>`.

    https://en.wikipedia.org/wiki/Median_(statistics)
    https://en.wikipedia.org/wiki/Interquartile_range
    """


In short: RobustScaler scales features using statistics that are robust to outliers. It removes the median and scales the data by a quantile range, by default the IQR (the range between the 25th and 75th percentiles). Centering and scaling are computed independently per feature on the training set, and the stored median and quantile range are then reused by transform on later data. Standardization is usually done by subtracting the mean and scaling to unit variance, but outliers can distort the sample mean and variance badly; in such cases the median and interquartile range usually give better results.

Key arguments:
- with_centering (default True): center the data before scaling. This makes transform raise an exception on sparse matrices, because centering them would require building a dense matrix that is often too large to fit in memory.
- with_scaling (default True): scale the data to the quantile range.
- quantile_range (default (25.0, 75.0), i.e. the IQR): the quantiles used to compute scale_.
- copy (default True): if False, try to avoid a copy and scale in place; this is not guaranteed to always be possible.

Fitted attributes: center_ (the per-feature median of the training set) and scale_ (the per-feature, possibly adjusted, interquartile range). See also robust_scale, the equivalent function without the estimator API, and sklearn.decomposition.PCA with whiten=True, which further removes linear correlation across features.


    def __init__(self, with_centering=True, with_scaling=True,
                 quantile_range=(25.0, 75.0), copy=True):
        self.with_centering = with_centering
        self.with_scaling = with_scaling
        self.quantile_range = quantile_range
        self.copy = copy

    def _check_array(self, X, copy):
        """Makes sure centering is not enabled for sparse matrices."""
        X = check_array(X, accept_sparse=('csr', 'csc'), copy=self.copy,
                        estimator=self, dtype=FLOAT_DTYPES)

        if sparse.issparse(X):
            if self.with_centering:
                raise ValueError(
                    "Cannot center sparse matrices: use `with_centering=False`"
                    " instead. See docstring for motivation and alternatives.")
        return X

    def fit(self, X, y=None):
        """Compute the median and quantiles to be used for scaling.

        Parameters
        ----------
        X : array-like, shape [n_samples, n_features]
            The data used to compute the median and quantiles
            used for later scaling along the features axis.
        """
        if sparse.issparse(X):
            raise TypeError("RobustScaler cannot be fitted on sparse inputs")
        X = self._check_array(X, self.copy)
        if self.with_centering:
            self.center_ = np.median(X, axis=0)

        if self.with_scaling:
            q_min, q_max = self.quantile_range
            if not 0 <= q_min <= q_max <= 100:
                raise ValueError("Invalid quantile range: %s" %
                                 str(self.quantile_range))

            q = np.percentile(X, self.quantile_range, axis=0)
            self.scale_ = (q[1] - q[0])
            self.scale_ = _handle_zeros_in_scale(self.scale_, copy=False)
        return self

    def transform(self, X):
        """Center and scale the data.

        Can be called on sparse input, provided that ``RobustScaler`` has been
        fitted to dense input and ``with_centering=False``.

        Parameters
        ----------
        X : {array-like, sparse matrix}
            The data used to scale along the specified axis.
        """
        if self.with_centering:
            check_is_fitted(self, 'center_')
        if self.with_scaling:
            check_is_fitted(self, 'scale_')
        X = self._check_array(X, self.copy)

        if sparse.issparse(X):
            if self.with_scaling:
                inplace_column_scale(X, 1.0 / self.scale_)
        else:
            if self.with_centering:
                X -= self.center_
            if self.with_scaling:
                X /= self.scale_
        return X

    def inverse_transform(self, X):
        """Scale back the data to the original representation

        Parameters
        ----------
        X : array-like
            The data used to scale along the specified axis.
        """
        if self.with_centering:
            check_is_fitted(self, 'center_')
        if self.with_scaling:
            check_is_fitted(self, 'scale_')
        X = self._check_array(X, self.copy)

        if sparse.issparse(X):
            if self.with_scaling:
                inplace_column_scale(X, self.scale_)
        else:
            if self.with_scaling:
                X *= self.scale_
            if self.with_centering:
                X += self.center_
        return X


How to use RobustScaler

from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import RobustScaler
from sklearn.linear_model import Lasso, ElasticNet

lasso = make_pipeline(RobustScaler(), Lasso(alpha=0.5, random_state=1))
ENet = make_pipeline(RobustScaler(), ElasticNet(alpha=0.5, l1_ratio=.9, random_state=3))
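To see why the median/IQR statistics matter, here is a minimal sketch (the toy array is illustrative, not from the original article): a single extreme value barely affects center_ and scale_, whereas it would heavily distort a mean/variance-based scaler.

import numpy as np
from sklearn.preprocessing import RobustScaler

# One feature with an obvious outlier in the last row
X = np.array([[1.0], [2.0], [3.0], [4.0], [1000.0]])

scaler = RobustScaler().fit(X)
print(scaler.center_)   # per-feature median -> [3.]
print(scaler.scale_)    # per-feature IQR (75th - 25th percentile) -> [2.]

# transform computes (x - center_) / scale_; the regular points stay in a
# small range while the outlier remains clearly separated.
print(scaler.transform(X).ravel())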


sklearn's KFold: code explanation and usage

Code explanation of KFold

class KFold Found at: sklearn.model_selection._split

class KFold(_BaseKFold):
    """K-Folds cross-validator

    Provides train/test indices to split data in train/test sets. Split
    dataset into k consecutive folds (without shuffling by default).
    Each fold is then used once as a validation while the k - 1 remaining
    folds form the training set.

    Read more in the :ref:`User Guide <cross_validation>`.

    Parameters
    ----------
    n_splits : int, default=3
        Number of folds. Must be at least 2.

    shuffle : boolean, optional
        Whether to shuffle the data before splitting into batches.

    random_state : int, RandomState instance or None, optional, default=None
        If int, random_state is the seed used by the random number generator;
        If RandomState instance, random_state is the random number generator;
        If None, the random number generator is the RandomState instance used
        by `np.random`. Used when ``shuffle`` == True.

In plain terms: KFold, found in sklearn.model_selection._split, is the K-fold cross-validator. It provides train/test indices that split the data into k consecutive folds (no shuffling by default); each fold is used once as the validation set while the remaining k - 1 folds form the training set. Parameters: n_splits is the number of folds (default 3 in this version, must be at least 2); shuffle controls whether the data is shuffled before being split into batches; random_state is an int seed, a RandomState instance, or None (in which case np.random's RandomState is used), and only matters when shuffle=True.

    Examples
    --------
    >>> from sklearn.model_selection import KFold
    >>> X = np.array([[1, 2], [3, 4], [1, 2], [3, 4]])
    >>> y = np.array([1, 2, 3, 4])
    >>> kf = KFold(n_splits=2)
    >>> kf.get_n_splits(X)
    2
    >>> print(kf)  # doctest: +NORMALIZE_WHITESPACE
    KFold(n_splits=2, random_state=None, shuffle=False)
    >>> for train_index, test_index in kf.split(X):
    ...     print("TRAIN:", train_index, "TEST:", test_index)
    ...     X_train, X_test = X[train_index], X[test_index]
    ...     y_train, y_test = y[train_index], y[test_index]
    TRAIN: [2 3] TEST: [0 1]
    TRAIN: [0 1] TEST: [2 3]

    Notes
    -----
    The first ``n_samples % n_splits`` folds have size
    ``n_samples // n_splits + 1``, other folds have size
    ``n_samples // n_splits``, where ``n_samples`` is the number of samples.

    See also
    --------
    StratifiedKFold
        Takes group information into account to avoid building folds with
        imbalanced class distributions (for binary or multiclass
        classification tasks).
    GroupKFold: K-fold iterator variant with non-overlapping groups.
    RepeatedKFold: Repeats K-Fold n times.
    """
Related cross-validators: StratifiedKFold takes class information into account to avoid folds with imbalanced class distributions (for binary or multiclass classification tasks); GroupKFold is a K-fold variant with non-overlapping groups; RepeatedKFold repeats K-fold n times.
    def __init__(self, n_splits=3, shuffle=False,
                 random_state=None):
        super(KFold, self).__init__(n_splits, shuffle, random_state)

    def _iter_test_indices(self, X, y=None, groups=None):
        n_samples = _num_samples(X)
        indices = np.arange(n_samples)
        if self.shuffle:
            check_random_state(self.random_state).shuffle(indices)
        n_splits = self.n_splits
        fold_sizes = (n_samples // n_splits) * np.ones(n_splits, dtype=np.int)
        fold_sizes[:n_samples % n_splits] += 1
        current = 0
        for fold_size in fold_sizes:
            start, stop = current, current + fold_size
            yield indices[start:stop]
            current = stop


How to use KFold

>>> import numpy as np
>>> from sklearn.model_selection import KFold
>>> X = np.array([[1, 2], [3, 4], [1, 2], [3, 4]])
>>> y = np.array([1, 2, 3, 4])
>>> kf = KFold(n_splits=2)
>>> kf.get_n_splits(X)
2
>>> print(kf)  # doctest: +NORMALIZE_WHITESPACE
KFold(n_splits=2, random_state=None, shuffle=False)
>>> for train_index, test_index in kf.split(X):
...     print("TRAIN:", train_index, "TEST:", test_index)
...     X_train, X_test = X[train_index], X[test_index]
...     y_train, y_test = y[train_index], y[test_index]
TRAIN: [2 3] TEST: [0 1]
TRAIN: [0 1] TEST: [2 3]
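A small sketch of shuffle=True and of the fold-size rule from the Notes section (the 5x2 array below is illustrative):

import numpy as np
from sklearn.model_selection import KFold

X = np.arange(10).reshape(5, 2)   # 5 samples, 2 features

# shuffle=True randomizes which samples fall into each fold;
# random_state makes the shuffling reproducible.
kf = KFold(n_splits=2, shuffle=True, random_state=0)
for fold, (train_idx, test_idx) in enumerate(kf.split(X)):
    print("fold", fold, "TRAIN:", train_idx, "TEST:", test_idx)

# With n_samples=5 and n_splits=2, the first fold has 5 // 2 + 1 = 3 test
# samples and the second has 5 // 2 = 2, exactly as the Notes describe.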


sklearn's cross_val_score: code explanation and usage

Code explanation of cross_val_score

def cross_val_score Found at: sklearn.model_selection._validation

def cross_val_score(estimator, X, y=None, groups=None, scoring=None, cv=None,
                    n_jobs=1, verbose=0, fit_params=None,
                    pre_dispatch='2*n_jobs'):
    """Evaluate a score by cross-validation

    Read more in the :ref:`User Guide <cross_validation>`.

    Parameters
    ----------
    estimator : estimator object implementing 'fit'
        The object to use to fit the data.

    X : array-like
        The data to fit. Can be for example a list, or an array.

    y : array-like, optional, default: None
        The target variable to try to predict in the case of
        supervised learning.

    groups : array-like, with shape (n_samples,), optional
        Group labels for the samples used while splitting the dataset into
        train/test set.

    scoring : string, callable or None, optional, default: None
        A string (see model evaluation documentation) or a scorer callable
        object / function with signature ``scorer(estimator, X, y)``.

    cv : int, cross-validation generator or an iterable, optional
        Determines the cross-validation splitting strategy.
        Possible inputs for cv are:
        - None, to use the default 3-fold cross validation,
        - integer, to specify the number of folds in a `(Stratified)KFold`,
        - An object to be used as a cross-validation generator.
        - An iterable yielding train, test splits.
        For integer/None inputs, if the estimator is a classifier and ``y``
        is either binary or multiclass, :class:`StratifiedKFold` is used.
        In all other cases, :class:`KFold` is used.

        Refer :ref:`User Guide <cross_validation>` for the various
        cross-validation strategies that can be used here.

    n_jobs : integer, optional
        The number of CPUs to use to do the computation. -1 means 'all CPUs'.

    verbose : integer, optional
        The verbosity level.

    fit_params : dict, optional
        Parameters to pass to the fit method of the estimator.

    pre_dispatch : int, or string, optional
        Controls the number of jobs that get dispatched during parallel
        execution. Reducing this number can be useful to avoid an explosion
        of memory consumption when more jobs get dispatched than CPUs can
        process. This parameter can be:

        - None, in which case all the jobs are immediately created and
          spawned. Use this for lightweight and fast-running jobs, to avoid
          delays due to on-demand spawning of the jobs
        - An int, giving the exact number of total jobs that are spawned
        - A string, giving an expression as a function of n_jobs, as in
          '2*n_jobs'

    Returns
    -------
    scores : array of float, shape=(len(list(cv)),)
        Array of scores of the estimator for each run of the cross validation.

In plain terms, cross_val_score evaluates an estimator's score by cross-validation. Its main arguments:

- estimator: an object implementing 'fit', used to fit the data.
- X: the data to fit; can be a list or an array.
- y: the target variable to predict, for supervised learning (optional).
- groups: group labels for the samples, used when splitting the dataset into train/test sets (optional).
- scoring: a string (see the model evaluation documentation) or a scorer callable with signature scorer(estimator, X, y).
- cv: the cross-validation splitting strategy. It can be None (the default 3-fold cross-validation in this version), an integer (the number of folds in a (Stratified)KFold), an object usable as a cross-validation generator, or an iterable yielding train/test splits. For integer/None inputs, StratifiedKFold is used when the estimator is a classifier and y is binary or multiclass; in all other cases KFold is used.
- n_jobs: the number of CPUs used for the computation; -1 means all CPUs.
- verbose: the verbosity level.
- fit_params: a dict of parameters passed to the estimator's fit method.
- pre_dispatch: controls how many jobs are dispatched during parallel execution; reducing it avoids a blow-up in memory when more jobs are dispatched than the CPUs can process. It can be None (all jobs are created and spawned immediately, good for lightweight, fast-running jobs), an int (the exact number of jobs spawned), or a string expression in terms of n_jobs, such as '2*n_jobs'.

Returns: an array of floats of shape (len(list(cv)),), one score per cross-validation run.

    Examples
    --------
    >>> from sklearn import datasets, linear_model
    >>> from sklearn.model_selection import cross_val_score
    >>> diabetes = datasets.load_diabetes()
    >>> X = diabetes.data[:150]
    >>> y = diabetes.target[:150]
    >>> lasso = linear_model.Lasso()
    >>> print(cross_val_score(lasso, X, y))  # doctest: +ELLIPSIS
    [ 0.33150734  0.08022311  0.03531764]

    See Also
    ---------
    :func:`sklearn.model_selection.cross_validate`:
        To run cross-validation on multiple metrics and also to return
        train scores, fit times and score times.

    :func:`sklearn.metrics.make_scorer`:
        Make a scorer from a performance metric or loss function.

    """
    # To ensure multimetric format is not supported
    scorer = check_scoring(estimator, scoring=scoring)
    cv_results = cross_validate(estimator=estimator, X=X, y=y, groups=groups,
                                scoring={'score': scorer}, cv=cv,
                                return_train_score=False,
                                n_jobs=n_jobs, verbose=verbose,
                                fit_params=fit_params,
                                pre_dispatch=pre_dispatch)
    return cv_results['test_score']
Related functions: sklearn.model_selection.cross_validate runs cross-validation on multiple metrics and also returns train scores, fit times, and score times; sklearn.metrics.make_scorer builds a scorer from a performance metric or loss function.

Accepted values for the scoring parameter

https://scikit-learn.org/stable/modules/model_evaluation.html#scoring-parameter

Scoring                          | Function                               | Comment
---------------------------------|----------------------------------------|--------------------------------
Classification                   |                                        |
'accuracy'                       | metrics.accuracy_score                 |
'balanced_accuracy'              | metrics.balanced_accuracy_score        |
'average_precision'              | metrics.average_precision_score        |
'neg_brier_score'                | metrics.brier_score_loss               |
'f1'                             | metrics.f1_score                       | for binary targets
'f1_micro'                       | metrics.f1_score                       | micro-averaged
'f1_macro'                       | metrics.f1_score                       | macro-averaged
'f1_weighted'                    | metrics.f1_score                       | weighted average
'f1_samples'                     | metrics.f1_score                       | by multilabel sample
'neg_log_loss'                   | metrics.log_loss                       | requires predict_proba support
'precision' etc.                 | metrics.precision_score                | suffixes apply as with 'f1'
'recall' etc.                    | metrics.recall_score                   | suffixes apply as with 'f1'
'jaccard' etc.                   | metrics.jaccard_score                  | suffixes apply as with 'f1'
'roc_auc'                        | metrics.roc_auc_score                  |
'roc_auc_ovr'                    | metrics.roc_auc_score                  |
'roc_auc_ovo'                    | metrics.roc_auc_score                  |
'roc_auc_ovr_weighted'           | metrics.roc_auc_score                  |
'roc_auc_ovo_weighted'           | metrics.roc_auc_score                  |
Clustering                       |                                        |
'adjusted_mutual_info_score'     | metrics.adjusted_mutual_info_score     |
'adjusted_rand_score'            | metrics.adjusted_rand_score            |
'completeness_score'             | metrics.completeness_score             |
'fowlkes_mallows_score'          | metrics.fowlkes_mallows_score          |
'homogeneity_score'              | metrics.homogeneity_score              |
'mutual_info_score'              | metrics.mutual_info_score              |
'normalized_mutual_info_score'   | metrics.normalized_mutual_info_score   |
'v_measure_score'                | metrics.v_measure_score                |
Regression                       |                                        |
'explained_variance'             | metrics.explained_variance_score       |
'max_error'                      | metrics.max_error                      |
'neg_mean_absolute_error'        | metrics.mean_absolute_error            |
'neg_mean_squared_error'         | metrics.mean_squared_error             |
'neg_root_mean_squared_error'    | metrics.mean_squared_error             |
'neg_mean_squared_log_error'     | metrics.mean_squared_log_error         |
'neg_median_absolute_error'      | metrics.median_absolute_error          |
'r2'                             | metrics.r2_score                       |
'neg_mean_poisson_deviance'      | metrics.mean_poisson_deviance          |
'neg_mean_gamma_deviance'        | metrics.mean_gamma_deviance            |

How to use cross_val_score

1. Regression on the diabetes dataset

>>> from sklearn import datasets, linear_model
>>> from sklearn.model_selection import cross_val_score
>>> diabetes = datasets.load_diabetes()
>>> X = diabetes.data[:150]
>>> y = diabetes.target[:150]
>>> lasso = linear_model.Lasso()
>>> print(cross_val_score(lasso, X, y))  # doctest: +ELLIPSIS
[ 0.33150734  0.08022311  0.03531764]
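The same evaluation can be made explicit about the splitter and the metric; a minimal sketch using one of the regression scoring strings from the table above (the choice of 5 folds is illustrative):

from sklearn import datasets, linear_model
from sklearn.model_selection import cross_val_score

diabetes = datasets.load_diabetes()
X, y = diabetes.data[:150], diabetes.target[:150]
lasso = linear_model.Lasso()

# Negative mean squared error: higher (closer to zero) is better, so
# cross_val_score can always be treated as "larger is better" regardless of metric.
scores = cross_val_score(lasso, X, y, cv=5, scoring='neg_mean_squared_error')
print(scores)          # one score per fold
print(scores.mean())   # average across the folds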


2. Classification on the iris dataset

from sklearn import datasets                                          # built-in datasets
from sklearn.model_selection import train_test_split, cross_val_score  # data splitting and cross-validation
from sklearn.neighbors import KNeighborsClassifier                     # a simple model with a single hyperparameter K
import matplotlib.pyplot as plt

iris = datasets.load_iris()   # load sklearn's built-in iris dataset
X = iris.data                 # the features
y = iris.target               # the label of each sample

# Hold out one third of the data as the test set
train_X, test_X, train_y, test_y = train_test_split(X, y, test_size=1/3, random_state=3)

k_range = range(1, 31)
cv_scores = []                # holds the cross-validation result for each K
for n in k_range:
    # KNN has a single hyperparameter here; with several hyperparameters, use GridSearchCV instead
    knn = KNeighborsClassifier(n)
    # cv: number of folds; scoring='accuracy' is the evaluation metric (it can be omitted to use the default)
    scores = cross_val_score(knn, train_X, train_y, cv=10, scoring='accuracy')
    cv_scores.append(scores.mean())

plt.plot(k_range, cv_scores)
plt.xlabel('K')
plt.ylabel('Accuracy')        # pick the best K from the plot
plt.show()

best_knn = KNeighborsClassifier(n_neighbors=3)   # pass the best K (here 3) to the model
best_knn.fit(train_X, train_y)                   # train the model
print(best_knn.score(test_X, test_y))            # evaluate on the test set
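Putting the four functions of this article together, a minimal end-to-end sketch (the alpha value, the 5-fold setting, and the use of the diabetes data are illustrative choices, not from the original article):

from sklearn import datasets
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import RobustScaler
from sklearn.linear_model import Lasso
from sklearn.model_selection import KFold, cross_val_score

diabetes = datasets.load_diabetes()
X, y = diabetes.data, diabetes.target

# Because scaling lives inside the pipeline, each fold's scaler is fitted
# only on that fold's training portion, avoiding information leakage.
model = make_pipeline(RobustScaler(), Lasso(alpha=0.1, random_state=0))

cv = KFold(n_splits=5, shuffle=True, random_state=0)
scores = cross_val_score(model, X, y, cv=cv, scoring='r2')
print(scores)
print("mean R^2: %.3f (+/- %.3f)" % (scores.mean(), scores.std()))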

