

Huber loss (reposted)

Published: 2025/4/5

Original article: https://en.wikipedia.org/wiki/Huber_loss

In statistics, the Huber loss is a loss function used in robust regression that is less sensitive to outliers in data than the squared error loss. A variant for classification is also sometimes used.

Definition

[Figure: Huber loss (green, $\delta = 1$) and squared error loss (blue) as a function of $y - f(x)$.]

The Huber loss function describes the penalty incurred by an estimation procedure f. Huber (1964) defines the loss function piecewise by[1]

$$L_\delta(a) = \begin{cases} \frac{1}{2}a^2 & \text{for } |a| \le \delta, \\ \delta\left(|a| - \frac{1}{2}\delta\right) & \text{otherwise.} \end{cases}$$

This function is quadratic for small values of $a$ and linear for large values, with equal values and slopes of the two sections at the two points where $|a| = \delta$. The variable $a$ often refers to the residual, that is, the difference between the observed and predicted values, $a = y - f(x)$, so the former can be expanded to[2]

$$L_\delta(y, f(x)) = \begin{cases} \frac{1}{2}(y - f(x))^2 & \text{for } |y - f(x)| \le \delta, \\ \delta\,|y - f(x)| - \frac{1}{2}\delta^2 & \text{otherwise.} \end{cases}$$
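The piecewise definition translates directly into code. As an illustrative sketch (the function name and the default $\delta = 1$ are our own, not from the article):

```python
def huber_loss(y, y_pred, delta=1.0):
    """Huber loss for a single residual a = y - y_pred."""
    a = y - y_pred
    if abs(a) <= delta:
        return 0.5 * a ** 2                # quadratic region: (1/2) a^2
    return delta * (abs(a) - 0.5 * delta)  # linear region: delta (|a| - delta/2)
```

Note how the two branches agree in both value ($\delta^2/2$) and slope ($\pm\delta$) at $|a| = \delta$, as the text above states.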

Motivation

Two very commonly used loss functions are the squared loss, $L(a) = a^2$, and the absolute loss, $L(a) = |a|$. The squared loss function results in an arithmetic mean-unbiased estimator, and the absolute-value loss function results in a median-unbiased estimator (in the one-dimensional case, and a geometric median-unbiased estimator in the multi-dimensional case). The squared loss has the disadvantage that it tends to be dominated by outliers: when summing over a set of $a$'s (as in $\sum_{i=1}^{n} L(a_i)$), the sample mean is influenced too much by a few particularly large $a$-values when the distribution is heavy-tailed; in terms of estimation theory, the asymptotic relative efficiency of the mean is poor for heavy-tailed distributions.
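A small numeric sketch (ours, not from the article) makes the domination by outliers concrete: the minimizer of the summed squared loss is the sample mean, while the minimizer of the summed absolute loss is the sample median.

```python
data = [0.9, 1.0, 1.0, 1.1, 100.0]     # one gross outlier

mean = sum(data) / len(data)           # minimizes sum((x - c)**2) over c
median = sorted(data)[len(data) // 2]  # minimizes sum(abs(x - c)) for odd n

# The mean is dragged to 20.8 by the single outlier; the median stays at 1.0.
```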

As defined above, the Huber loss function is convex in a uniform neighborhood of its minimum $a = 0$; at the boundary of this neighborhood, it has a differentiable extension to an affine function at the points $a = -\delta$ and $a = \delta$. These properties allow it to combine much of the sensitivity of the mean-unbiased, minimum-variance estimator of the mean (using the quadratic loss function) with the robustness of the median-unbiased estimator (using the absolute-value function).

Pseudo-Huber loss function

The Pseudo-Huber loss function can be used as a smooth approximation of the Huber loss function and ensures that derivatives are continuous for all degrees. It is defined as[3][4]

$$L_\delta(a) = \delta^2\left(\sqrt{1 + (a/\delta)^2} - 1\right).$$

As such, this function approximates $a^2/2$ for small values of $a$ and approximates a straight line with slope $\delta$ for large values of $a$.
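A minimal sketch of the Pseudo-Huber loss (again with an assumed default $\delta = 1$; the function name is ours):

```python
import math

def pseudo_huber(a, delta=1.0):
    """Smooth approximation of the Huber loss; all derivatives are continuous."""
    return delta ** 2 * (math.sqrt(1.0 + (a / delta) ** 2) - 1.0)
```

For $|a| \ll \delta$ the square root expands to $1 + a^2/(2\delta^2) + \dots$, giving approximately $a^2/2$; for $|a| \gg \delta$ the loss grows like $\delta|a|$.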

While the above is the most common form, other smooth approximations of the Huber loss function also exist.[5]

Variant for classification

For classification purposes, a variant of the Huber loss called modified Huber is sometimes used. Given a prediction $f(x)$ (a real-valued classifier score) and a true binary class label $y \in \{+1, -1\}$, the modified Huber loss is defined as[6]

$$L(y, f(x)) = \begin{cases} \max(0, 1 - y\,f(x))^2 & \text{for } y\,f(x) \ge -1, \\ -4\,y\,f(x) & \text{otherwise.} \end{cases}$$

The term $\max(0, 1 - y\,f(x))$ is the hinge loss used by support vector machines; the quadratically smoothed hinge loss is a generalization of $L$.[6]
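A sketch of the modified Huber loss as defined above (the function name is ours):

```python
def modified_huber(y, score):
    """Modified Huber loss; y is the true label in {+1, -1}, score = f(x)."""
    margin = y * score
    if margin >= -1.0:
        return max(0.0, 1.0 - margin) ** 2  # quadratically smoothed hinge
    return -4.0 * margin                    # linear penalty for large mistakes
```

At margin $y\,f(x) = -1$ the two branches agree (both give 4), and their slopes match ($-4$), so the loss is differentiable there.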

Applications

The Huber loss function is used in robust statistics, M-estimation, and additive modelling.[7]

See also

  • Winsorizing
  • Robust regression
  • M-estimator
  • Visual comparison of different M-estimators

References

  • Huber, Peter J. (1964). "Robust Estimation of a Location Parameter". Annals of Mathematical Statistics 35 (1): 73–101. doi:10.1214/aoms/1177703732. JSTOR 2238020.
  • Hastie, Trevor; Tibshirani, Robert; Friedman, Jerome (2009). The Elements of Statistical Learning. p. 349. Compared to Hastie et al., the loss here is scaled by a factor of 1/2, to be consistent with Huber's original definition given earlier.
  • Charbonnier, P.; Blanc-Féraud, L.; Aubert, G.; Barlaud, M. (1997). "Deterministic edge-preserving regularization in computed imaging". IEEE Trans. Image Processing 6 (2): 298–311. doi:10.1109/83.551699.
  • Hartley, R.; Zisserman, A. (2003). Multiple View Geometry in Computer Vision (2nd ed.). Cambridge University Press. p. 619. ISBN 0-521-54051-8.
  • Lange, K. (1990). "Convergence of Image Reconstruction Algorithms with Gibbs Smoothing". IEEE Trans. Medical Imaging 9 (4): 439–446. doi:10.1109/42.61759.
  • Zhang, Tong (2004). Solving large scale linear prediction problems using stochastic gradient descent algorithms. ICML.
  • Friedman, J. H. (2001). "Greedy Function Approximation: A Gradient Boosting Machine". Annals of Statistics 29 (5): 1189–1232. doi:10.1214/aos/1013203451. JSTOR 2699986.

    Reposted from: https://www.cnblogs.com/davidwang456/articles/5586178.html
