Image Retrieval: How CNNs Upended the Hashing Family of Algorithms

Published: 2025/3/15

Reference paper: Liu H, Wang R, Shan S, et al. Deep Supervised Hashing for Fast Image Retrieval[C]. Computer Vision and Pattern Recognition, 2016: 2064-2072.

Venue: CVPR 2016

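Affiliation: Institute of Computing Technology, Chinese Academy of Sciences (the Institute of Computing Technology and the Institute of Automation are both legendary image processing groups, with CVPR papers of every kind)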

1. Introduction

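Ever since Siamese networks were picked up again (2014), the matching-related fields were destined for upheaval. Whether in image retrieval, stereo matching, or tracking based on best matching, Siamese networks and the branch networks derived from them are gradually overturning almost all of the classic algorithms. The deep supervised hashing algorithm reported by Haomiao Liu collected 134 citations in little more than a year, and most of the related work builds on it, so it is well worth studying.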

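A hash maps an input of arbitrary length to a fixed-length output, the hash value, by means of a hashing algorithm. The mapping is a compression: the space of hash values is usually much smaller than the space of inputs, so different inputs may hash to the same output, and the input cannot be uniquely recovered from the hash value. Put simply, a hash is a function that compresses a message of arbitrary length into a message digest of fixed length.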
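The fixed-length property described above can be illustrated with a conventional hash function. This is a minimal sketch using SHA-256 from Python's standard `hashlib`; the helper name `digest_hex` and the sample inputs are made up for illustration. Note that a cryptographic hash deliberately scatters similar inputs, which is the opposite of the similarity-preserving codes this post is about.

```python
import hashlib

def digest_hex(message: bytes) -> str:
    """Return the SHA-256 digest of `message` as a hex string."""
    return hashlib.sha256(message).hexdigest()

short_digest = digest_hex(b"cat")            # 3-byte input
long_digest = digest_hex(b"cat" * 100_000)   # 300,000-byte input

# Both digests have the same fixed length (64 hex characters = 256 bits),
# no matter how long the input is.
print(len(short_digest), len(long_digest))
```

Because the output space is fixed while the input space is unbounded, collisions must exist in principle, which is why the input cannot be recovered from the digest.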

2. Abstract and Goals

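To build efficient image retrieval on large-scale data, the authors propose a new hashing method for learning compact binary codes of images. In image retrieval, variations in image appearance pose a very large challenge, but using a CNN to learn a robust image representation offers hope for meeting it. This paper uses a CNN to learn compact, similarity-preserving binary codes, which the authors call Deep Supervised Hashing. In particular, the authors design a CNN that takes a pair of images as input and produces discriminative outputs. (This looks simple now, but in 2016 it really was not easy.) The authors carefully design the loss function to maximize discriminability.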

Our goal is to learn compact binary codes for images such that: (a) similar images should be encoded to similar binary codes in Hamming space, and vice versa; (b) the binary codes could be computed efficiently.

Our method first trains the CNN using image pairs and the corresponding similarity labels. (Here comes the innovation, the origin of this whole line of work...)

The next few sentences are truly artfully written...

While the complex image appearance variations still pose a great challenge to reliable retrieval, in light of the recent progress of Convolutional Neural Networks (CNNs) in learning robust image representation on various vision tasks, this paper proposes a novel Deep Supervised Hashing (DSH) method to learn compact similarity-preserving binary code for the huge body of image data.

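In other words: although complex variations in image appearance remain a huge challenge for image retrieval, the recent progress of convolutional neural networks in learning robust image representations across vision tasks motivates a new deep supervised hashing method for learning compact, similarity-preserving binary codes.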

To this end, a loss function is elaborately designed to maximize the discriminability of the output space by encoding the supervised information from the input image pairs, and simultaneously imposing regularization on the real-valued outputs to approximate the desired discrete values.

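In other words: to this end, a carefully designed loss function maximizes the discriminability of the output space by encoding the supervision learned from the image pairs, while a regularizer on the outputs pushes them toward the desired discrete values (the target binary codes).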

Extensive experiments on two large scale datasets CIFAR-10 and NUS-WIDE show the promising performance of our method compared with the state-of-the-arts.

Comment: I have noticed that CVPR reviewers seem to like the phrases "the promising performance" and "the state-of-the-arts"; papers that oversell or are too modest get rejected.

3. Method and Details

Figure 1. The network structure used in our method. The network consists of 3 convolution-pooling layers and 2 fully connected layers. The filters in convolution layers are of size 5 × 5 with stride 1 (32, 32, and 64 filters in the three convolution layers respectively), and pooling over 3 × 3 patches with stride 2. The first fully connected layer contains 500 nodes, and the second (output layer) has k (the code length) nodes. The loss function is designed to learn similarity-preserving binary-like codes by exploiting discriminability terms and a regularizer. Binary codes are obtained by quantizing the network outputs of images.

Figure 1. The network architecture used in the authors' method. The network contains 3 convolution-pooling layers and 2 fully connected layers.

3.1 Loss Function Design

The loss function is designed to pull the network outputs of similar images together and push the outputs of dissimilar ones far away, so that the learned Hamming space can well approximate the semantic structure of images.

The loss function drives the network to produce outputs that are close together for similar images and far apart for dissimilar ones.

To avoid optimizing the nondifferentiable loss function in Hamming space, the network outputs are relaxed to real values, while simultaneously a regularizer is imposed to encourage the real-valued outputs to approach the desired discrete values.

That is, to avoid optimizing a non-differentiable loss in Hamming space, the network outputs are relaxed to real values, and a regularizer pulls those real-valued outputs toward the discrete values {-1, +1}.
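To make the relation between real-valued outputs and binary codes concrete, here is a minimal NumPy sketch of quantization and Hamming-distance ranking. It assumes the codes are obtained by thresholding at zero (with exactly 0 mapped to +1); the function names, the sample outputs and the tiny database are all made up for illustration and are not the authors' implementation.

```python
import numpy as np

def quantize(outputs):
    """Threshold real-valued outputs at 0 to get codes in {-1, +1}.

    Assumption: sign thresholding, with exactly 0 mapped to +1.
    """
    return np.where(np.asarray(outputs) >= 0, 1, -1).astype(np.int8)

def hamming_distance(b1, b2):
    """Number of positions where two codes differ."""
    return int(np.count_nonzero(np.asarray(b1) != np.asarray(b2)))

def rank_by_hamming(query_code, database_codes):
    """Return database indices sorted nearest-first, plus the distances."""
    dists = np.count_nonzero(database_codes != query_code, axis=1)
    return np.argsort(dists, kind="stable"), dists

outputs = np.array([0.9, -1.1, 0.2, -0.05])  # made-up network outputs, k = 4
query = quantize(outputs)                     # one code per output node

database = np.array([[1, -1, -1, -1],
                     [-1, 1, -1, 1],
                     [1, -1, 1, -1]], dtype=np.int8)
order, dists = rank_by_hamming(query, database)
print(query, order, dists)
```

The quantization step is a hard threshold, which is why the loss cannot be optimized directly on the codes and the relaxation to real values is needed during training.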

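Comment: I used to wonder why the loss is non-differentiable in Hamming space; the authors' wording is actually very conservative. Put bluntly: with a K-bit Hamming binary code, how would you design the ground truth? How would you design the loss? How would you take partial derivatives and back-propagate the error? If you find yourself stuck on these questions, stop agonizing and simply turn the binary code into real values. In effect the network just ends in a fully connected layer whose real-valued outputs (k nodes, one per bit, per Figure 1) stand in for the bits, nothing more.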

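b1 and b2 are the binary codes of a pair of input images. y is the label: y = 0 means the pair is similar and y = 1 means it is dissimilar. Dh is the Hamming distance between the two binary code vectors, and m > 0 is a margin. The first term penalizes similar pairs whose codes are far apart; the second term penalizes dissimilar pairs whose codes are closer than the margin.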
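For reference, the pairwise loss that these symbols describe can be written as follows. This is a transcription of Eq. (1) of the cited paper as I recall it, with k denoting the code length; check it against the paper before relying on it.

```latex
L(b_1, b_2, y) = \frac{1}{2}(1 - y)\, D_h(b_1, b_2)
               + \frac{1}{2}\, y \, \max\bigl(m - D_h(b_1, b_2),\ 0\bigr),
\qquad \text{s.t. } b_j \in \{+1, -1\}^k,\ j \in \{1, 2\}
```

With y = 0 only the first term is active and it grows with the distance between the two codes; with y = 1 only the second term is active and it vanishes once the distance reaches the margin m.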

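To make back-propagation workable, the authors replace the Hamming distance with the L2 norm and add a regularizer as a third term.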
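For reference, the relaxed per-pair loss can be written as below. This is a transcription of Eq. (3) of the cited paper as I recall it, where α is the regularization weight, |·| is taken element-wise and **1** is the all-ones vector; verify it against the paper before relying on it.

```latex
L_r(b_1, b_2, y) = \frac{1}{2}(1 - y)\, \lVert b_1 - b_2 \rVert_2^2
                 + \frac{1}{2}\, y \, \max\bigl(m - \lVert b_1 - b_2 \rVert_2^2,\ 0\bigr)
                 + \alpha \bigl( \lVert\, |b_1| - \mathbf{1} \,\rVert_1
                               + \lVert\, |b_2| - \mathbf{1} \,\rVert_1 \bigr)
```

Here b_1 and b_2 are no longer constrained to {+1, -1}^k; they are the real-valued network outputs, and the third term is what keeps them close to binary.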

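The third term, the regularizer, deserves the most attention. Its main benefit is that it constrains the outputs to stay compact around the binary values: if an output is sparse (close to 0), this term's penalty is large.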
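The behaviour of the three terms can be checked numerically. The sketch below is a minimal NumPy version of the relaxed pairwise loss under my reading of the paper: squared Euclidean distance in place of Hamming distance, plus an L1 regularizer that pulls each output toward ±1. The function name `dsh_pair_loss`, the weight `alpha = 0.01` and the sample codes are assumptions for illustration, not the authors' code.

```python
import numpy as np

def dsh_pair_loss(b1, b2, y, m, alpha):
    """Relaxed pairwise loss for one image pair (illustrative sketch).

    b1, b2 : real-valued network outputs of length k
    y      : 0 for a similar pair, 1 for a dissimilar pair
    m      : margin (> 0)
    alpha  : weight of the regularizer (value chosen by the caller)
    """
    b1 = np.asarray(b1, dtype=float)
    b2 = np.asarray(b2, dtype=float)
    dist = np.sum((b1 - b2) ** 2)                    # squared L2 distance
    similar_term = 0.5 * (1 - y) * dist              # active when y == 0
    dissimilar_term = 0.5 * y * max(m - dist, 0.0)   # active when y == 1
    # Third term: penalize every output that is not at -1 or +1.
    reg = alpha * (np.abs(np.abs(b1) - 1).sum() + np.abs(np.abs(b2) - 1).sum())
    return float(similar_term + dissimilar_term + reg)

k = 4
m = 2 * k       # margin heuristic quoted in section 3.2
alpha = 0.01    # assumed value, for illustration only

code = np.array([1.0, -1.0, 1.0, -1.0])
same_similar = dsh_pair_loss(code, code, y=0, m=m, alpha=alpha)      # no penalty
same_dissimilar = dsh_pair_loss(code, code, y=1, m=m, alpha=alpha)   # m / 2
far_dissimilar = dsh_pair_loss(code, -code, y=1, m=m, alpha=alpha)   # beyond margin
all_zero = dsh_pair_loss(np.zeros(k), np.zeros(k), y=0, m=m, alpha=alpha)
print(same_similar, same_dissimilar, far_dissimilar, all_zero)
```

The last case shows the point made about the third term: two all-zero outputs agree perfectly, so the similarity term is zero, yet the regularizer still charges one unit per output node (times alpha) because none of the outputs is near ±1.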

With this relaxation, the partial derivatives follow without difficulty.
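As a worked version of that step, the (sub)gradients of the three terms for a single pair can be written as below, with j ∈ {1, 2} indexing the two outputs and δ applied element-wise. This is my own derivation from the relaxed loss (squared L2 distance plus an L1 pull toward ±1), so treat it as a sketch to check against the paper rather than a quotation.

```latex
\frac{\partial\, \mathrm{Term}_1}{\partial b_j} = (-1)^{j+1} (1 - y)\,(b_1 - b_2)

\frac{\partial\, \mathrm{Term}_2}{\partial b_j} =
\begin{cases}
(-1)^{j}\, y\,(b_1 - b_2), & \lVert b_1 - b_2 \rVert_2^2 < m \\
0, & \text{otherwise}
\end{cases}

\frac{\partial\, \mathrm{Regularizer}}{\partial b_j} = \alpha\, \delta(b_j),
\qquad
\delta(x) =
\begin{cases}
1, & -1 \le x \le 0 \ \text{or}\ x \ge 1 \\
-1, & \text{otherwise}
\end{cases}
```

The first term pulls b_1 and b_2 together for similar pairs, the second pushes them apart for dissimilar pairs until the margin is reached, and the third nudges each output toward the nearer of -1 and +1.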

3.2 Implementation Details

1. The authors apply ReLU after every convolutional layer.

2. During training, the batch size is set to 200, momentum to 0.9, and weight decay to 0.004. The initial learning rate is set to 10^-3 and decreases by 40% after every 20,000 iterations (150,000 iterations in total). The margin m in Eqn.(4) is heuristically set to m = 2k to encourage the codes of dissimilar images to differ in no less than k/2 bits.
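The two numeric settings in item 2 can be sanity-checked with a few lines of Python. This sketch assumes that "decreases by 40%" means multiplying the learning rate by 0.6 at each step boundary, and that the margin is compared against the squared Euclidean distance between ±1 codes; the function name `learning_rate` and the example code length k = 12 are made up for illustration.

```python
import numpy as np

def learning_rate(iteration, base_lr=1e-3, drop=0.4, step=20_000):
    """Step schedule: multiply the rate by (1 - drop) every `step` iterations.

    Assumption: "decreases by 40%" is read as a factor of 0.6 per step.
    """
    return base_lr * (1.0 - drop) ** (iteration // step)

lr_start = learning_rate(0)
lr_after_first_drop = learning_rate(20_000)
lr_final = learning_rate(149_999)   # last of the 150,000 iterations

# Why m = 2k corresponds to k/2 differing bits: for codes in {-1, +1},
# each differing bit contributes (1 - (-1))^2 = 4 to the squared distance.
k = 12
b1 = np.ones(k)
b2 = b1.copy()
b2[: k // 2] = -1.0                  # flip exactly k/2 bits
squared_distance = float(np.sum((b1 - b2) ** 2))
print(lr_start, lr_after_first_drop, lr_final, squared_distance, 2 * k)
```

Under these assumptions the rate falls by seven steps over the full run, and a dissimilar pair stops being penalized exactly when its codes differ in k/2 bits.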

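Comment: One point worth noting is that BatchNorm was not yet widely used when the authors published this paper. Later work made only small structural changes and obtained a very large performance gain.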

4. Results

We attribute the promising retrieval performance of DSH to three aspects: First, the coupling of non-linear feature learning and hash coding for extracting task-specific image representations; Second, the proposed regularizer for reducing the discrepancy between the real-valued network output space and the desired Hamming space; Third, the online generated dense pairwise supervision for well describing the desired Hamming space.

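Comment: I wrote this post mainly because the paper is a pioneering work. Looking at the authors' statement from today's perspective, the strong performance does not really come from how clever the loss function is or how good the online pair generation is. Convolution is simply that powerful! Later work replaced the loss function with a plain L2 norm and trimmed the fully connected layers, which made the network much lighter and the performance better. So in computer vision, the data-driven feature extractor is what really matters.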
