

Python sklearn: An introduction to sklearn's RobustScaler function and a detailed guide to its usage

Published: 2025/3/21


Contents

An introduction to sklearn's RobustScaler function and how to use it


An introduction to sklearn's RobustScaler function and how to use it

The RobustScaler function scales features using statistics that are robust to outliers. This scaler removes the median and scales the data according to a quantile range (by default the IQR, the interquartile range). The IQR is the range between the 1st quartile (25th percentile) and the 3rd quartile (75th percentile). Centering and scaling are computed independently for each feature, from the relevant statistics of the samples in the training set. The median and the interquartile range are then stored and applied to later data via the transform method.

Standardizing a dataset is a common requirement for many machine learning estimators. It is typically done by removing the mean and scaling to unit variance. Outliers, however, can often skew the sample mean and variance; in such cases the median and the interquartile range usually give better results.
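As a minimal NumPy-only sketch (independent of sklearn, on made-up data) of why the median and IQR are preferable in the presence of outliers, compare the two scalings on a feature with one extreme value:

```python
import numpy as np

# One feature containing a single extreme outlier
x = np.array([1.0, 2.0, 3.0, 4.0, 1000.0])

# Mean/std scaling: the outlier inflates both statistics,
# squashing the inliers together
standard = (x - x.mean()) / x.std()

# Median/IQR scaling: the statistics ignore the outlier
median = np.median(x)                    # 3.0
q1, q3 = np.percentile(x, [25.0, 75.0])  # 2.0 and 4.0
robust = (x - median) / (q3 - q1)

# The four inliers land in [-1.0, 0.5]; only the outlier is far away
print(robust)
```

Note how the robust statistics (median 3.0, IQR 2.0) are exactly what they would be without the outlier, while the mean and standard deviation are dominated by it.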

class RobustScaler, found at: sklearn.preprocessing._data

class RobustScaler(TransformerMixin, BaseEstimator):
    """Scale features using statistics that are robust to outliers.

    This Scaler removes the median and scales the data according to the
    quantile range (defaults to IQR: Interquartile Range). The IQR is the
    range between the 1st quartile (25th quantile) and the 3rd quartile
    (75th quantile).

    Centering and scaling happen independently on each feature by computing
    the relevant statistics on the samples in the training set. Median and
    interquartile range are then stored to be used on later data using the
    ``transform`` method.

    Standardization of a dataset is a common requirement for many machine
    learning estimators. Typically this is done by removing the mean and
    scaling to unit variance. However, outliers can often influence the
    sample mean / variance in a negative way. In such cases, the median and
    the interquartile range often give better results.

    .. versionadded:: 0.17

    Read more in the :ref:`User Guide <preprocessing_scaler>`.


    Parameters
    ----------
    with_centering : boolean, True by default
        If True, center the data before scaling. This will cause
        ``transform`` to raise an exception when attempted on sparse
        matrices, because centering them entails building a dense matrix
        which in common use cases is likely to be too large to fit in
        memory.

    with_scaling : boolean, True by default
        If True, scale the data to interquartile range.

    quantile_range : tuple (q_min, q_max), 0.0 < q_min < q_max < 100.0
        Default: (25.0, 75.0) = (1st quantile, 3rd quantile) = IQR.
        Quantile range used to calculate ``scale_``.

        .. versionadded:: 0.18

    copy : boolean, optional, default is True
        If False, try to avoid a copy and do inplace scaling instead.
        This is not guaranteed to always work inplace; e.g. if the data is
        not a NumPy array or scipy.sparse CSR matrix, a copy may still be
        returned.

    Attributes
    ----------
    center_ : array of floats
        The median value for each feature in the training set.

    scale_ : array of floats
        The (scaled) interquartile range for each feature in the training
        set.


    .. versionadded:: 0.17
       *scale_* attribute.

    Examples
    --------
    >>> from sklearn.preprocessing import RobustScaler
    >>> X = [[ 1., -2.,  2.],
    ...      [-2.,  1.,  3.],
    ...      [ 4.,  1., -2.]]
    >>> transformer = RobustScaler().fit(X)
    >>> transformer
    RobustScaler()
    >>> transformer.transform(X)
    array([[ 0. , -2. ,  0. ],
           [-1. ,  0. ,  0.4],
           [ 1. ,  0. , -1.6]])

    See also
    --------
    robust_scale : Equivalent function without the estimator API.

    :class:`sklearn.decomposition.PCA`
        Further removes the linear correlation across features with
        ``whiten=True``.

    Notes
    -----
    For a comparison of the different scalers, transformers, and
    normalizers, see :ref:`examples/preprocessing/plot_all_scaling.py
    <sphx_glr_auto_examples_preprocessing_plot_all_scaling.py>`.

    https://en.wikipedia.org/wiki/Median
    https://en.wikipedia.org/wiki/Interquartile_range
    """

    @_deprecate_positional_args
    def __init__(self, *, with_centering=True, with_scaling=True,
                 quantile_range=(25.0, 75.0), copy=True):
        self.with_centering = with_centering
        self.with_scaling = with_scaling
        self.quantile_range = quantile_range
        self.copy = copy

    def fit(self, X, y=None):
        """Compute the median and quantiles to be used for scaling.

        Parameters
        ----------
        X : array-like, shape [n_samples, n_features]
            The data used to compute the median and quantiles used for
            later scaling along the features axis.
        """
        # At fit time, convert sparse matrices to CSC for optimized
        # computation of the quantiles
        X = self._validate_data(X, accept_sparse='csc', estimator=self,
                                dtype=FLOAT_DTYPES,
                                force_all_finite='allow-nan')

        q_min, q_max = self.quantile_range
        if not 0 <= q_min <= q_max <= 100:
            raise ValueError("Invalid quantile range: %s" %
                             str(self.quantile_range))

        if self.with_centering:
            if sparse.issparse(X):
                raise ValueError(
                    "Cannot center sparse matrices: use `with_centering=False`"
                    " instead. See docstring for motivation and alternatives.")
            self.center_ = np.nanmedian(X, axis=0)
        else:
            self.center_ = None

        if self.with_scaling:
            quantiles = []
            for feature_idx in range(X.shape[1]):
                if sparse.issparse(X):
                    # Only the stored (non-zero) values of the CSC column
                    # are copied; the rest of the buffer stays zero
                    column_nnz_data = X.data[X.indptr[feature_idx]:
                                             X.indptr[feature_idx + 1]]
                    column_data = np.zeros(shape=X.shape[0], dtype=X.dtype)
                    column_data[:len(column_nnz_data)] = column_nnz_data
                else:
                    column_data = X[:, feature_idx]
                quantiles.append(np.nanpercentile(column_data,
                                                  self.quantile_range))

            quantiles = np.transpose(quantiles)
            self.scale_ = quantiles[1] - quantiles[0]
            self.scale_ = _handle_zeros_in_scale(self.scale_, copy=False)
        else:
            self.scale_ = None
        return self
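The statistics that fit stores can be reproduced directly with NumPy. A quick sketch on a small made-up matrix, checking center_ against np.nanmedian and scale_ against the spread of np.nanpercentile:

```python
import numpy as np
from sklearn.preprocessing import RobustScaler

X = np.array([[1.0, 10.0],
              [2.0, 20.0],
              [3.0, 30.0],
              [4.0, 40.0],
              [5.0, 50.0]])

scaler = RobustScaler(quantile_range=(25.0, 75.0)).fit(X)

# fit stores the per-column median and the per-column quantile spread
q = np.nanpercentile(X, [25.0, 75.0], axis=0)
print(scaler.center_)  # [ 3. 30.]
print(scaler.scale_)   # [ 2. 20.]
print(q[1] - q[0])     # [ 2. 20.], matching scale_
```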

????

    def transform(self, X):
        """Center and scale the data.

        Parameters
        ----------
        X : {array-like, sparse matrix}
            The data used to scale along the specified axis.
        """
        check_is_fitted(self)
        X = check_array(X, accept_sparse=('csr', 'csc'), copy=self.copy,
                        estimator=self, dtype=FLOAT_DTYPES,
                        force_all_finite='allow-nan')

        if sparse.issparse(X):
            if self.with_scaling:
                inplace_column_scale(X, 1.0 / self.scale_)
        else:
            if self.with_centering:
                X -= self.center_
            if self.with_scaling:
                X /= self.scale_
        return X
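For dense input, transform is just the elementwise (X - center_) / scale_. A quick check of that equivalence, using the same matrix as the docstring example:

```python
import numpy as np
from sklearn.preprocessing import RobustScaler

X = np.array([[ 1., -2.,  2.],
              [-2.,  1.,  3.],
              [ 4.,  1., -2.]])

scaler = RobustScaler().fit(X)
Xt = scaler.transform(X)

# Identical to applying the stored statistics by hand
manual = (X - scaler.center_) / scaler.scale_
print(np.allclose(Xt, manual))  # True
```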

????

    def inverse_transform(self, X):
        """Scale back the data to the original representation.

        Parameters
        ----------
        X : array-like
            The data used to scale along the specified axis.
        """
        check_is_fitted(self)
        X = check_array(X, accept_sparse=('csr', 'csc'), copy=self.copy,
                        estimator=self, dtype=FLOAT_DTYPES,
                        force_all_finite='allow-nan')

        if sparse.issparse(X):
            if self.with_scaling:
                inplace_column_scale(X, self.scale_)
        else:
            if self.with_scaling:
                X *= self.scale_
            if self.with_centering:
                X += self.center_
        return X
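Since inverse_transform applies X * scale_ + center_ (the inverse operations in the reverse order), it undoes transform exactly on dense data. A round-trip sketch:

```python
import numpy as np
from sklearn.preprocessing import RobustScaler

X = np.array([[ 1.0, -2.0],
              [-2.0,  1.0],
              [ 4.0,  1.0]])

scaler = RobustScaler().fit(X)

# transform followed by inverse_transform recovers the original data
roundtrip = scaler.inverse_transform(scaler.transform(X))
print(np.allclose(roundtrip, X))  # True
```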

????

    def _more_tags(self):
        return {'allow_nan': True}
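Putting it together, the quantile_range parameter controls the spread used as the divisor. A sketch on made-up, evenly spaced data comparing the default IQR to a wider (10, 90) range:

```python
import numpy as np
from sklearn.preprocessing import RobustScaler

X = np.linspace(0.0, 100.0, 11).reshape(-1, 1)  # 0, 10, ..., 100

iqr_scaler = RobustScaler(quantile_range=(25.0, 75.0)).fit(X)
wide_scaler = RobustScaler(quantile_range=(10.0, 90.0)).fit(X)

# A wider quantile range yields a larger scale_, hence smaller
# scaled values after transform
print(iqr_scaler.scale_)   # [50.]
print(wide_scaler.scale_)  # [80.]
```

A wider range makes the scaling less robust (more of the tail influences scale_) but uses more of the data; the default IQR is a common middle ground.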

