Understanding F.binary_cross_entropy and Its weight Parameter in One Article

Published: 2023/12/16

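Many people who use PyTorch for deep learning know only that their model calls F.binary_cross_entropy or F.cross_entropy, and have never thought about the difference between the two. Even those who know when each one applies may never have looked at how PyTorch actually implements them. A separate article covers the implementation of F.cross_entropy(), so this one covers the implementation of F.binary_cross_entropy.
Once you understand how each is implemented in PyTorch, the differences between them and their use cases follow naturally.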

1. PyTorch's official description of BCELoss
Before implementing F.binary_cross_entropy ourselves, we should look at the official version. PyTorch describes the BCELoss class as follows:
Creates a criterion that measures the binary cross entropy between the target and the output. The unreduced loss (i.e. with the reduction attribute set to 'none') is:
$l_n = - w_n \left[ y_n \cdot \log x_n + (1 - y_n) \cdot \log (1 - x_n) \right]$
where $N$ is the batch size and $L = \{l_1, \dots, l_N\}^\top$ collects the per-element losses $l_n$. If reduction is not 'none' (the default is 'mean'), the result is:
$\ell(x, y) = \begin{cases} \operatorname{mean}(L), & \text{if reduction} = \text{'mean';}\\ \operatorname{sum}(L), & \text{if reduction} = \text{'sum'.} \end{cases}$
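To make the two formulas concrete, here is a minimal pure-Python sketch that evaluates them by hand. The helper names bce_elements and reduce_loss and the sample values are made up for illustration; this is not PyTorch's implementation, and it assumes every x_n lies strictly between 0 and 1.

```python
import math

def bce_elements(xs, ys, ws=None):
    """Per-element losses l_n = -w_n * (y_n*log(x_n) + (1 - y_n)*log(1 - x_n))."""
    if ws is None:
        ws = [1.0] * len(xs)  # no weight given corresponds to w_n = 1
    return [
        -w * (y * math.log(x) + (1 - y) * math.log(1 - x))
        for x, y, w in zip(xs, ys, ws)
    ]

def reduce_loss(losses, reduction="mean"):
    """Apply the 'none' / 'sum' / 'mean' reduction to the list L of losses."""
    if reduction == "none":
        return losses
    if reduction == "sum":
        return sum(losses)
    if reduction == "mean":
        return sum(losses) / len(losses)
    raise ValueError("unknown reduction: " + reduction)

# Illustrative predictions and 0/1 targets (assumed values)
xs = [0.9, 0.2, 0.6]
ys = [1.0, 0.0, 1.0]

losses = bce_elements(xs, ys)
print(reduce_loss(losses, "none"))  # the unreduced vector L
print(reduce_loss(losses, "sum"))   # sum(L)
print(reduce_loss(losses, "mean"))  # mean(L), the default reduction
```

With reduction 'mean' the sum is divided by the number of elements, so the 'sum' result is always N times the 'mean' result.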

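Note: the targets, i.e. $y$ in the formula, should be numbers between 0 and 1, and $x_n$ should not be exactly 0 or 1. If $x_n$ is 0 or 1, one of $\log(x_n)$ or $\log(1 - x_n)$ is undefined. PyTorch treats log(0) as follows, which matches the mathematical convention:
$\log (0) = -\infty$, since $\lim_{x\to 0} \log (x) = -\infty$
However, an infinite term in the loss function is undesirable for several reasons. For one, if $y_n = 0$ or $1 - y_n = 0$, we would be multiplying 0 by infinity. In addition, an infinite loss value would also give an infinite term in the gradient, since:
$\lim_{x\to 0} \frac{d}{dx} \log (x) = \infty$
This would also make BCELoss's backward method nonlinear with respect to $x_n$. PyTorch's solution to these problems is to clamp the output of the log function to be greater than or equal to -100, which gives a finite loss value and a linear backward method. The code below tests this clamping behavior: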

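    import numpy as np
    import torch
    import torch.nn.functional as F

    # Exact value of log(1e-50) in float64
    print(np.log(1e-50))

    # torch.tensor defaults to float32, where 1e-50 underflows to 0
    input = torch.tensor([1e-50])
    target = torch.tensor([1.0])
    print(F.binary_cross_entropy(input, target))
    print(torch.log(input))

    # Output:
    # -115.12925464970229
    # tensor(100.)
    # tensor([-inf])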

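First, we pick a number whose log is below -100. F.binary_cross_entropy returns 100, while torch.log() returns negative infinity, because PyTorch's F.binary_cross_entropy clamps the log output. (torch.log() prints -inf rather than -115.13 because 1e-50 underflows to 0 in the default float32 dtype.) Don't be puzzled that the result is 100 rather than -100: the loss formula has a leading minus sign.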
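The clamping can also be reproduced by hand. The sketch below assumes the clamp is equivalent to applying torch.clamp(..., min=-100) to each log term before multiplying by the target; clamped_bce is a hypothetical helper written for this comparison, not PyTorch's source code.

```python
import torch
import torch.nn.functional as F

def clamped_bce(input, target):
    # Hypothetical helper: clamp each log term at -100 before multiplying,
    # so that 0 * log(0) becomes 0 * (-100) = 0 instead of 0 * (-inf) = nan
    log_x = torch.clamp(torch.log(input), min=-100.0)
    log_1mx = torch.clamp(torch.log(1 - input), min=-100.0)
    return -(target * log_x + (1 - target) * log_1mx).mean()

# Two boundary inputs (0 and 1) plus one ordinary value
input = torch.tensor([0.0, 1.0, 0.3])
target = torch.tensor([1.0, 1.0, 0.0])

# Without clamping: the first element is inf, the second is nan (0 * -inf)
unclamped = -(target * torch.log(input) + (1 - target) * torch.log(1 - input))
print(unclamped)

# With clamping, the hand-written version agrees with the built-in function
manual = clamped_bce(input, target)
builtin = F.binary_cross_entropy(input, target)
print(manual, builtin)
```

The second element shows the "0 times infinity" case from the note above: the prediction is exactly 1 and the target is 1, so the true loss is 0, but the unclamped formula evaluates to nan.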

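2. PyTorch's official implementation
input has shape (N, *), where * means any number of additional dimensions. target must have the same shape as input. The key point is still the formula above: once you know it, binary cross-entropy is simple to implement.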

    import torch

    input = torch.rand(1, 3, 3)
    target = torch.rand(1, 3, 3).random_(2)  # random 0/1 labels
    print(input)
    print(target)

    input = torch.sigmoid(input)  # map the values into (0, 1)
    output = torch.nn.functional.binary_cross_entropy(input, target)
    print(output)

Output:

    # input
    tensor([[[0.7266, 0.9478, 0.3987],
             [0.4134, 0.1654, 0.0298],
             [0.1266, 0.1153, 0.0549]]])
    # target
    tensor([[[0., 1., 1.],
             [1., 0., 0.],
             [0., 0., 0.]]])
    # output
    tensor(0.6877)
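Because F.binary_cross_entropy expects probabilities, the example applies torch.sigmoid first. As a side note, PyTorch also provides F.binary_cross_entropy_with_logits, which takes the raw scores directly. The sketch below, using made-up random data and a fixed seed chosen only for repeatability, checks that the two routes give the same value for moderate inputs.

```python
import torch
import torch.nn.functional as F

torch.manual_seed(0)  # arbitrary seed, only to make the run repeatable

logits = torch.randn(1, 3, 3)                      # raw scores, any real value
target = torch.randint(0, 2, (1, 3, 3)).float()    # random 0/1 labels

# Route 1: squash to (0, 1) with sigmoid, then binary cross-entropy
loss_a = F.binary_cross_entropy(torch.sigmoid(logits), target)

# Route 2: let PyTorch combine the sigmoid and the loss in one call
loss_b = F.binary_cross_entropy_with_logits(logits, target)

print(loss_a, loss_b)
```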

3. Implementing it ourselves from the formula

    import torch

    class binary_ce_loss(torch.nn.Module):
        def __init__(self):
            super(binary_ce_loss, self).__init__()

        def forward(self, input, target):
            # Flatten everything except the batch dimension
            input = input.view(input.shape[0], -1)
            target = target.view(target.shape[0], -1)
            loss = 0.0
            for i in range(input.shape[0]):
                for j in range(input.shape[1]):
                    loss += -(target[i][j] * torch.log(input[i][j])
                              + (1 - target[i][j]) * torch.log(1 - input[i][j]))
            # Take the mean by default
            return loss / (input.shape[0] * input.shape[1])

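input and target must have the same shape. In the example above both have shape [1, 3, 3]: the 1 can be read as the batch size and 3*3 as the image size. We first reshape to [1, 3*3], then compute the loss of every element in each batch item with the formula, sum the results, and finally take the mean, which is PyTorch's default reduction.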
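The double loop follows the formula literally but is slow on large tensors. A vectorized sketch of the same computation is shown here; binary_ce_vectorized is a hypothetical name, and like the loop version it omits the -100 clamp, so it assumes input lies strictly between 0 and 1.

```python
import torch
import torch.nn.functional as F

def binary_ce_vectorized(input, target):
    # Element-wise formula over the whole tensor, then the default mean
    loss = -(target * torch.log(input) + (1 - target) * torch.log(1 - input))
    return loss.mean()

torch.manual_seed(0)  # arbitrary seed, only to make the run repeatable
input = torch.sigmoid(torch.rand(1, 3, 3))   # values strictly inside (0, 1)
target = torch.rand(1, 3, 3).random_(2)      # random 0/1 labels

mine = binary_ce_vectorized(input, target)
official = F.binary_cross_entropy(input, target)
print(mine, official)
```

No reshape is needed here because mean() already averages over every element regardless of shape.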
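4. What the weight parameter means
While writing code you will notice that F.binary_cross_entropy has another parameter, weight, which defaults to None. Many people are probably unsure how it works, so here is a brief analysis.
First, PyTorch's official explanation of weight: "if provided it's repeated to match input tensor shape". In other words, a given weight is expanded so that its shape matches the shape of input. Recall the formula:
$l_n = - w_n \left[ y_n \cdot \log x_n + (1 - y_n) \cdot \log (1 - x_n) \right]$
By default, i.e. when weight=None, $w_n = 1$ in the formula. When weight is not None, every element gets its own weight $w_n$, which is why weight ends up with the same shape as input.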
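The "repeated to match input tensor shape" behavior can be checked with a weight that has fewer dimensions than input. In this sketch the shapes and weight values are made up for illustration: a weight of shape (3,) is applied to an input of shape (2, 3), and the result is compared with an explicitly expanded weight and with a hand-weighted mean.

```python
import torch
import torch.nn.functional as F

torch.manual_seed(0)  # arbitrary seed, only to make the run repeatable
input = torch.rand(2, 3) * 0.98 + 0.01    # keep values away from 0 and 1
target = torch.rand(2, 3).random_(2)      # random 0/1 labels

col_weight = torch.tensor([0.2, 0.5, 1.0])  # one weight per column (assumed values)

# weight of shape (3,) is broadcast across the 2 rows of input
loss_broadcast = F.binary_cross_entropy(input, target, weight=col_weight)

# the same weight expanded by hand to the full (2, 3) shape
loss_full = F.binary_cross_entropy(input, target, weight=col_weight.expand(2, 3))

# weighting the unreduced losses manually, then averaging over all elements
per_elem = F.binary_cross_entropy(input, target, reduction="none")
loss_manual = (col_weight * per_elem).mean()

print(loss_broadcast, loss_full, loss_manual)
```

Note that with reduction 'mean' the weighted sum is divided by the number of elements, not by the sum of the weights.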
First, here is the result PyTorch gives when weight is applied:

    import torch
    import torch.nn.functional as F

    input = torch.rand(3, 3)
    target = torch.rand(3, 3).random_(2)
    print(input)
    print(target)

    w = [0.1, 0.9]  # weights for label 0 and label 1
    weight = torch.zeros(target.shape)  # weight matrix
    for i in range(target.shape[0]):
        for j in range(target.shape[1]):
            weight[i][j] = w[int(target[i][j])]
    print(weight)

    loss = F.binary_cross_entropy(input, target, weight=weight)
    print(loss)

    """
    # input
    tensor([[0.1531, 0.3302, 0.7537],
            [0.2200, 0.6875, 0.2268],
            [0.5109, 0.5873, 0.9275]])
    # target
    tensor([[1., 0., 0.],
            [0., 0., 1.],
            [0., 1., 0.]])
    # weight
    tensor([[0.9000, 0.1000, 0.1000],
            [0.1000, 0.1000, 0.9000],
            [0.1000, 0.9000, 0.1000]])
    # loss
    tensor(0.4621)
    """

The code below verifies once more how weight is applied: it simply weights each element individually.

    class binary_ce_loss(torch.nn.Module):
        def __init__(self):
            super(binary_ce_loss, self).__init__()

        def forward(self, input, target, weight=None):
            if weight is None:
                weight = torch.ones_like(input)  # w_n = 1 when no weight is given
            # Flatten everything except the first dimension
            input = input.view(input.shape[0], -1)
            target = target.view(target.shape[0], -1)
            weight = weight.view(weight.shape[0], -1)
            loss = 0.0
            for i in range(input.shape[0]):
                for j in range(input.shape[1]):
                    loss += -weight[i][j] * (target[i][j] * torch.log(input[i][j])
                                             + (1 - target[i][j]) * torch.log(1 - input[i][j]))
            # Take the mean by default
            return loss / (input.shape[0] * input.shape[1])

    myloss = binary_ce_loss()
    print(myloss(input, target, weight=weight))

    """
    # myloss
    tensor(0.4621)
    """

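The loss from PyTorch's official function matches the loss from our own implementation, which confirms that the weight argument of binary_cross_entropy is applied element by element, one weight per element.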
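As a possible shortcut, the per-label weight matrix can be built with tensor indexing instead of a double loop. The sketch below reuses the input and target values printed in the run above (rounded to four decimals, so the loss is only approximately reproduced); class_weights is a name introduced here for illustration.

```python
import torch
import torch.nn.functional as F

# Values copied from the printed run above (rounded to 4 decimals)
input = torch.tensor([[0.1531, 0.3302, 0.7537],
                      [0.2200, 0.6875, 0.2268],
                      [0.5109, 0.5873, 0.9275]])
target = torch.tensor([[1., 0., 0.],
                       [0., 0., 1.],
                       [0., 1., 0.]])

class_weights = torch.tensor([0.1, 0.9])  # weights for label 0 and label 1

# Use each 0/1 label as an index into class_weights; the result has target's shape
weight = class_weights[target.long()]
print(weight)

loss = F.binary_cross_entropy(input, target, weight=weight)
print(loss)  # approximately 0.4621, in line with the run above
```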
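5. Summary
Reading the source code is the most direct and effective approach. As a teaser, the next article covers balanced_cross_entropy, which addresses imbalance between samples.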

Note: if you find any errors, please point them out!
