
BoW Image Retrieval in Practice with Python

Published: 2025/7/25 python 40 豆豆


June 16, 2015 | Image Retrieval | Bag-of-words model | Word count: 11854

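A few days ago, after updating the HABI hashing image retrieval toolkit to V2.0, I (小白菜) went back to building a BoW (bag-of-words) model in Python. This was partly to practice Python and partly to prepare material for the second session of the CBIR group's image retrieval talks. There is already plenty of good, thorough material on BoW online, and I previously wrote a post explaining how the bag of words is constructed, "Bag of Words模型", so this post concentrates on practice. Before getting to that, though, I want to summarize BoW once more in plainer terms, drawing on more than two years of thinking about and working with it.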

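Two examples help explain the BoW model. The first is the one most introductions use: treat an image as a document. Each image corresponds to one document, and the features extracted from it, such as SIFT keypoints, correspond to the words in that document. All SIFT descriptors extracted from the whole image database are pooled and clustered, which yields a set of representative cluster centers (the words). For each image, every SIFT descriptor is then assigned to its nearest cluster center (word), and the assignments are counted to give the term frequency (TF):

[Figure: term-frequency (TF) counting]

After the TF counting, an inverse document frequency (IDF) can be computed to reduce the interference from stop words; in other words, each TF entry is multiplied by a weight:

[Figure: weighting TF by IDF]

The word weight above is the inverse document frequency (IDF). It is obtained by counting how many documents contain each word and then applying a chosen logarithmic weighting formula:

[Formula: IDF weight]

For an uploaded query image, extract SIFT, compute the tf histogram, and multiply it by the idf above to obtain the tf-idf vector. Then apply L2 normalization and use the inner product as the similarity measure.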
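The weighting and similarity steps described above can be written out compactly. The IDF form below is the smoothed one used by the code later in this post (it adds 1 to both numerator and denominator); the symbols are notation introduced here for illustration: $N$ is the number of images in the database, $n_i$ the number of images containing word $i$, and $\mathrm{tf}_i$ the count of word $i$ in one image.

```latex
\mathrm{idf}_i = \log\frac{N + 1}{n_i + 1}, \qquad
v_i = \mathrm{tf}_i \cdot \mathrm{idf}_i, \qquad
\hat{v} = \frac{v}{\lVert v \rVert_2}, \qquad
s(q, d) = \hat{v}_q^{\top} \hat{v}_d
```

Because both vectors are L2-normalized, the inner product $s(q, d)$ is the cosine similarity between the query $q$ and the database image $d$.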

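For TF counting, the vocabulary is usually made large to get better results, often tens of thousands or even hundreds of thousands of words. For that reason, a K-D tree can be built over the cluster centers once clustering is done, which speeds up the computation of the word histograms during TF counting.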
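As a minimal sketch of that idea, the snippet below builds a K-D tree over the cluster centers once and reuses it to assign descriptors to words. It assumes SciPy's `cKDTree`, which the code in this post does not use (the post's script calls `vq`); the toy data and the helper name `bow_histogram` are made up for illustration.

```python
import numpy as np
from scipy.spatial import cKDTree


def bow_histogram(descriptors, tree, num_words):
    """Count how many descriptors are assigned to each visual word."""
    # query() returns (distances, indices) of the nearest center per row
    _, words = tree.query(descriptors)
    return np.bincount(words, minlength=num_words).astype(np.float32)


rng = np.random.default_rng(0)
centers = rng.random((50, 8))        # toy vocabulary: 50 words, 8-D
descriptors = rng.random((200, 8))   # toy descriptors from one image

tree = cKDTree(centers)              # built once, reused for every image
hist = bow_histogram(descriptors, tree, len(centers))
print(hist.sum())                    # total number of descriptors counted
```

Exact K-D trees tend to lose much of their advantage in high-dimensional spaces such as 128-D SIFT, so in practice approximate nearest-neighbor search is often used instead; treat this as an illustration of the structure rather than a tuned solution.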

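For someone meeting BoW for the first time, the example above may not be very intuitive, so here is a more intuitive one (it is not a perfect fit everywhere, but it does capture the essence of BoW). Take a US presidential election and suppose 10000 fairly influential people are running. These 10000 candidates play the role of the cluster centers: they are the most representative (what K-means does is find the specified number of most representative feature points). Each state corresponds to one image, and the ballots held by the people in that state correspond to SIFT keypoints. We can then tally the votes in each state into a 10000-dimensional count, which is the term-frequency vector from the first example. We can also count, for each candidate, how many states cast votes for them, which gives a 10000-dimensional per-candidate statistic over the states. Passing that result through a logarithm gives the so-called inverse document frequency.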
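The analogy can be turned into a few lines of plain Python. This is only a toy sketch: the ballots, the four candidates, and all variable names are invented for illustration, and the logarithmic weight uses the same smoothed form as the script later in this post.

```python
import math
from collections import Counter

# Hypothetical toy data: every entry is the candidate id on one ballot.
ballots = {
    "state_a": [0, 0, 1, 2],
    "state_b": [0, 3, 3],
    "state_c": [0, 1],
}
num_candidates = 4

# Per-state vote tally: the term-frequency vector of one "image".
vote_hist = {}
for state, votes in ballots.items():
    counts = Counter(votes)
    vote_hist[state] = [counts[c] for c in range(num_candidates)]

# For each candidate, the number of states that gave them at least one vote.
states_reached = [
    sum(1 for hist in vote_hist.values() if hist[c] > 0)
    for c in range(num_candidates)
]

# Smoothed logarithmic weight: candidates voted for everywhere get weight 0.
num_states = len(ballots)
idf = [math.log((num_states + 1.0) / (n + 1.0)) for n in states_reached]

for c in range(num_candidates):
    print("candidate %d: states=%d idf=%.3f" % (c, states_reached[c], idf[c]))
```

Candidate 0 receives votes in every state, so it behaves like a stop word and its weight drops to zero, while candidates that appear in only one state get the largest weight.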

The two examples above should make the BoW model clear. Now let's look at how it is implemented in Python.

    #!/usr/local/bin/python2.7
    # Usage: python findFeatures.py -t dataset/train/
    # Written for Python 2.7 and OpenCV 2.4.x (cv2.FeatureDetector_create
    # was removed in OpenCV 3).

    import argparse as ap
    import cv2
    import numpy as np
    import os
    from sklearn.externals import joblib
    from scipy.cluster.vq import *

    from sklearn import preprocessing
    from rootsift import RootSIFT
    import math

    # Get the path of the training set
    parser = ap.ArgumentParser()
    parser.add_argument("-t", "--trainingSet", help="Path to Training Set", required=True)
    args = vars(parser.parse_args())

    # List the image file names in the training directory
    train_path = args["trainingSet"]
    #train_path = "dataset/train/"

    training_names = os.listdir(train_path)

    numWords = 1000

    # Build the full path of every image and save the paths in a list
    image_paths = []
    for training_name in training_names:
        image_path = os.path.join(train_path, training_name)
        image_paths += [image_path]

    # Create feature extraction and keypoint detector objects
    fea_det = cv2.FeatureDetector_create("SIFT")
    des_ext = cv2.DescriptorExtractor_create("SIFT")

    # List where all the descriptors are stored
    des_list = []

    for i, image_path in enumerate(image_paths):
        im = cv2.imread(image_path)
        print "Extract SIFT of %s image, %d of %d images" %(training_names[i], i, len(image_paths))
        kpts = fea_det.detect(im)
        kpts, des = des_ext.compute(im, kpts)
        # rootsift
        #rs = RootSIFT()
        #des = rs.compute(kpts, des)
        des_list.append((image_path, des))

    # Stack all the descriptors vertically in a numpy array (with downsampling)
    #downsampling = 1
    #descriptors = des_list[0][1][::downsampling,:]
    #for image_path, descriptor in des_list[1:]:
    #    descriptors = np.vstack((descriptors, descriptor[::downsampling,:]))

    # Stack all the descriptors vertically in a numpy array
    descriptors = des_list[0][1]
    for image_path, descriptor in des_list[1:]:
        descriptors = np.vstack((descriptors, descriptor))

    # Perform k-means clustering
    print "Start k-means: %d words, %d key points" %(numWords, descriptors.shape[0])
    voc, variance = kmeans(descriptors, numWords, 1)

    # Calculate the histogram of features
    im_features = np.zeros((len(image_paths), numWords), "float32")
    for i in xrange(len(image_paths)):
        words, distance = vq(des_list[i][1], voc)
        for w in words:
            im_features[i][w] += 1

    # Perform Tf-Idf vectorization
    nbr_occurences = np.sum((im_features > 0) * 1, axis=0)
    idf = np.array(np.log((1.0*len(image_paths)+1) / (1.0*nbr_occurences + 1)), 'float32')

    # Apply the IDF weights, then perform L2 normalization
    im_features = im_features*idf
    im_features = preprocessing.normalize(im_features, norm='l2')

    # Save everything needed at query time
    joblib.dump((im_features, image_paths, idf, numWords, voc), "bof.pkl", compress=3)

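Save the file above as findFeatures.py. The first part uses argparse so that arguments can be passed in on the command line. The rest extracts SIFT features, clusters them, computes TF and IDF, and L2-normalizes the resulting word histograms. A single image usually yields a very large number of SIFT keypoints, and with a large image database the total becomes enormous, so clustering them directly is very difficult (memory runs out and computation is very slow). To work around this, the SIFT descriptors can be downsampled before clustering, at the cost of some retrieval accuracy. Finally, the variables needed for online querying are saved. For a given image database, the BoF can be generated from the command line with: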

python findFeatures.py -t dataset/train/
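The downsampling step mentioned above could look roughly like the sketch below. It is not the author's code: the commented-out lines in findFeatures.py take every n-th descriptor per image, whereas this version draws a random subset of the pooled descriptors. The function name `subsample_descriptors`, the `max_total` parameter, and the toy data are assumptions made for illustration.

```python
import numpy as np


def subsample_descriptors(des_list, max_total, seed=0):
    """Stack per-image descriptors and keep at most max_total random rows."""
    # Skip images for which no descriptors are available (stored as None)
    stacked = np.vstack([des for _, des in des_list if des is not None])
    if stacked.shape[0] <= max_total:
        return stacked
    rng = np.random.default_rng(seed)
    keep = rng.choice(stacked.shape[0], size=max_total, replace=False)
    return stacked[keep]


# Toy stand-in for des_list: (image_path, descriptors) pairs
rng = np.random.default_rng(1)
des_list = [
    ("img%d.jpg" % i, rng.random((300, 128)).astype(np.float32))
    for i in range(5)
]
sample = subsample_descriptors(des_list, max_total=500)
print(sample.shape)
```

Only the clustering step would use the reduced set; the word histograms are still computed from all descriptors of each image, as in the script.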

The online query stage is simpler than the above because there is no clustering step. The code is as follows:

    #!/usr/local/bin/python2.7
    # Usage: python search.py -i dataset/train/ukbench00000.jpg

    import argparse as ap
    import cv2
    import imutils
    import numpy as np
    import os
    from sklearn.externals import joblib
    from scipy.cluster.vq import *

    from sklearn import preprocessing

    from pylab import *
    from PIL import Image
    from rootsift import RootSIFT

    # Get the path of the query image
    parser = ap.ArgumentParser()
    parser.add_argument("-i", "--image", help="Path to query image", required=True)
    args = vars(parser.parse_args())

    # Get query image path
    image_path = args["image"]

    # Load the database features, image paths, IDF weights,
    # number of words and vocabulary
    im_features, image_paths, idf, numWords, voc = joblib.load("bof.pkl")

    # Create feature extraction and keypoint detector objects
    fea_det = cv2.FeatureDetector_create("SIFT")
    des_ext = cv2.DescriptorExtractor_create("SIFT")

    # List where all the descriptors are stored
    des_list = []

    im = cv2.imread(image_path)
    kpts = fea_det.detect(im)
    kpts, des = des_ext.compute(im, kpts)

    # rootsift
    #rs = RootSIFT()
    #des = rs.compute(kpts, des)

    des_list.append((image_path, des))

    # Descriptors of the query image
    descriptors = des_list[0][1]

    # Build the word histogram of the query image
    test_features = np.zeros((1, numWords), "float32")
    words, distance = vq(descriptors, voc)
    for w in words:
        test_features[0][w] += 1

    # Perform Tf-Idf vectorization and L2 normalization
    test_features = test_features*idf
    test_features = preprocessing.normalize(test_features, norm='l2')

    # Inner product with every database image, sorted in descending order
    score = np.dot(test_features, im_features.T)
    rank_ID = np.argsort(-score)

    # Visualize the results: query on top, 16 best matches below
    figure()
    gray()
    subplot(5,4,1)
    imshow(im[:,:,::-1])
    axis('off')
    for i, ID in enumerate(rank_ID[0][0:16]):
        img = Image.open(image_paths[ID])
        gray()
        subplot(5,4,i+5)
        imshow(img)
        axis('off')

    show()

Save the code above as search.py. To query with an image, just enter the following on the command line (the argument is the path to the query image):

python search.py -i dataset/train/ukbench00000.jpg
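The scoring part of search.py boils down to a weighted, normalized dot product followed by a sort. The sketch below isolates that step on a toy database so it can be run without images; the function name `rank_images`, the `top_k` parameter, and the numbers are illustrative assumptions, not part of the original scripts.

```python
import numpy as np


def rank_images(query_hist, im_features, idf, top_k=16):
    """Return indices and scores of the top_k most similar database images."""
    q = query_hist * idf                 # tf -> tf-idf
    norm = np.linalg.norm(q)
    if norm > 0:
        q = q / norm                     # L2 normalization
    scores = im_features.dot(q)          # inner product with each database row
    order = np.argsort(-scores)[:top_k]  # descending similarity
    return order, scores[order]


# Toy database: 3 images, 4 visual words
idf = np.array([0.1, 1.0, 1.0, 0.5], dtype=np.float32)
db_hists = np.array([[5, 1, 0, 0],
                     [0, 0, 4, 1],
                     [1, 3, 0, 2]], dtype=np.float32)
im_features = db_hists * idf
im_features /= np.linalg.norm(im_features, axis=1, keepdims=True)

# Query with the histogram of database image 1
order, scores = rank_images(db_hists[1], im_features, idf, top_k=2)
print(order, scores)
```

Querying with a histogram that is already in the database should rank that image first with a score close to 1, which is a quick sanity check for the indexing step.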

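In the code above you can see that the rootSIFT lines are commented out. You can uncomment them to use rootSIFT, but in my experiments here rootSIFT did not perform as well as SIFT. Finally, here are the retrieval results; the top image is the query and the ones below are the retrieved images:

[Figure: retrieval results]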
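For readers who do not have the `rootsift` module imported by the scripts, the transform usually referred to as RootSIFT is small enough to sketch: L1-normalize each descriptor, then take the element-wise square root. The version below is an assumption about what such a helper does, written with NumPy only; it is not the author's module, and `root_sift`, `eps`, and the fake descriptors are illustrative.

```python
import numpy as np


def root_sift(des, eps=1e-7):
    """L1-normalize each descriptor row, then take the element-wise sqrt."""
    des = np.asarray(des, dtype=np.float32)
    des = des / (des.sum(axis=1, keepdims=True) + eps)
    return np.sqrt(des)


# Fake non-negative 128-D descriptors standing in for SIFT output
rng = np.random.default_rng(0)
fake_sift = rng.integers(0, 256, size=(10, 128)).astype(np.float32)
rooted = root_sift(fake_sift)
print(np.linalg.norm(rooted, axis=1))
```

After the transform every row has (approximately) unit L2 norm, so the Euclidean distance between transformed descriptors corresponds to the Hellinger kernel on the originals. Whether that helps retrieval depends on the data, as the experiment reported above shows.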

The complete code for this walkthrough can be downloaded here: [download link].
