
Face Datasets: An Asian Face Dataset

Published: 2023/12/9


Building a Large-Scale Asian Face Dataset

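This large-scale Asian face dataset consists mainly of faces of Asian celebrities. I crawled roughly 200,000 Asian face images; more can be collected by raising the number of images crawled per celebrity. The process has the following steps: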

Getting the list of celebrity names

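(1) First, search for “明星” (celebrity) in the Baidu search box. Baidu displays a celebrity panel whose regions include mainland China, Hong Kong, Taiwan, South Korea and Japan, as shown in Figure 1.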

(2) Use a Python crawler to collect the names of these celebrities. The code is as follows:

import json

import requests


def getManyPages(pages):
    # One parameter set per result page: 'pn' advances by 12 and 'rn' is 12,
    # so each request asks for the next 12 names.
    params = []
    for i in range(0, 12 * pages, 12):
        params.append({
            'resource_id': 28266,
            'from_mid': 1,
            'format': 'json',
            'ie': 'utf-8',
            'oe': 'utf-8',
            'query': '臺灣明星',
            'sort_key': '',
            'sort_type': 1,
            'stat0': '',
            'stat1': '臺灣',
            'stat2': '',
            'stat3': '',
            'pn': i,
            'rn': 12,
        })
    url = 'https://sp0.baidu.com/8aQDcjqpAAV3otqbppnN2DJv/api.php'
    x = 0
    with open('starName.txt', 'w') as f:
        for param in params:
            try:
                res = requests.get(url, params=param, timeout=10)
                js = json.loads(res.text)
                results = js.get('data')[0].get('result')
            except (requests.RequestException, AttributeError,
                    IndexError, TypeError, ValueError) as e:
                # Skip pages that fail to download or have an unexpected shape.
                print(e)
                continue
            for result in results:
                img_name = result['ename']
                # img_url = result['pic_4n_78']  # image URL, not used here
                f.write(img_name + '\n')
            if x % 10 == 0:
                print('Page %d ...' % x)
            x += 1


if __name__ == '__main__':
    getManyPages(400)

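Note that in params, 'query': '臺灣明星' and 'stat1': '臺灣' select celebrities from Taiwan. Changing these two values returns celebrities from other regions such as mainland China ('內地'), Hong Kong ('香港') and South Korea ('韓國'). As Figure 1 shows, each page lists 12 celebrities, so getManyPages(400) fetches 400 pages of results, i.e. 12*400 = 4800 names. Change the page count to fetch more names. The names are saved to a text file that later steps use; saving them also means the list does not have to be crawled again if the program is interrupted.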
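The region switch described above can be factored into two small helpers. This is only a sketch: build_params and extract_names are hypothetical names, and building the query as the region name plus '明星' is an assumption generalized from the '臺灣明星' example; the other fields are copied from the code above.

```python
def build_params(region, page, per_page=12):
    """Query parameters for one page of one region's celebrity list.

    Assumption: 'query' is the region name followed by '明星', mirroring
    the '臺灣明星' / '臺灣' pair used in the article's code.
    """
    return {
        'resource_id': 28266,
        'from_mid': 1,
        'format': 'json',
        'ie': 'utf-8',
        'oe': 'utf-8',
        'query': region + '明星',
        'sort_key': '',
        'sort_type': 1,
        'stat0': '',
        'stat1': region,
        'stat2': '',
        'stat3': '',
        'pn': page * per_page,  # offset of the first name on this page
        'rn': per_page,         # names per page
    }


def extract_names(payload):
    """Return the 'ename' values from a decoded response.

    Returns an empty list when the payload does not have the
    data[0]['result'] shape the article's code expects.
    """
    try:
        results = payload['data'][0]['result']
        return [r['ename'] for r in results if 'ename' in r]
    except (KeyError, IndexError, TypeError):
        return []
```

With these, looping over a list such as ['內地', '香港', '臺灣', '韓國'] and calling requests.get(url, params=build_params(region, page)) would cover several regions in one run, provided the endpoint accepts those values as the article describes.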

Crawling celebrity images from the name list

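Here I use an open-source web crawler to download the celebrity images, which requires the icrawler library; readers can consult its documentation for details. Photos of lesser-known celebrities may not turn up on Baidu, and Baidu's image results are sometimes noisy, so I crawl the face images from Bing search instead. The code is as follows: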

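import os

from icrawler.builtin import BingImageCrawler

# Root directory for the downloaded images; one sub-folder per celebrity.
path = r'E:\weather_test\BingImage'

# Name list produced by the previous step, one name per line.
with open('KoreaStarName.txt', 'r') as f:
    lines = f.readlines()

for i, line in enumerate(lines):
    name = line.strip('\n')
    if not name:
        continue  # skip blank lines in the name list
    file_path = os.path.join(path, name)
    os.makedirs(file_path, exist_ok=True)
    bing_storage = {'root_dir': file_path}
    bing_crawler = BingImageCrawler(parser_threads=2, downloader_threads=4,
                                    storage=bing_storage)
    # max_num is the number of images to fetch for this celebrity.
    bing_crawler.crawl(keyword=name, max_num=10)
    print('Celebrity {}: {}'.format(i, name))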

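Here, path is the directory where the crawled celebrity images are saved, and the .txt file is the name list produced in the previous step. Change the value of max_num in bing_crawler.crawl() to control how many images are crawled for each celebrity.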
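Because the name list is built page by page, it may contain blank lines or repeated names, and each repeat would trigger another crawl into the same folder. A minimal sketch of tidying the list first and estimating the upper bound on downloads; clean_names and max_images are hypothetical helpers, not part of icrawler:

```python
def clean_names(lines):
    """Strip whitespace, drop blank lines and duplicates, keep first-seen order."""
    seen = set()
    names = []
    for line in lines:
        name = line.strip()
        if name and name not in seen:
            seen.add(name)
            names.append(name)
    return names


def max_images(names, max_num):
    """Upper bound on downloads: at most max_num images per unique name."""
    return len(names) * max_num
```

For example, passing the open name-list file to clean_names gives the list to iterate over, and max_images(names, 10) gives the most images a run with max_num=10 could download. The actual count is usually lower, since some searches return fewer results.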

Rough cleaning of the crawled images

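This step mainly removes non-face images. The crawled images may include pictures that do not show a person, and these need to be deleted; each image is run through a face detector to decide whether it contains a face.

The face_recognition library is used to detect whether an image contains a face. Installing it on Windows is somewhat troublesome, but detailed Windows installation guides are available online. The code also includes some error handling so that a single failure does not interrupt the run. path is the directory of crawled celebrity images, and new_path is the directory where the cleaned face images are saved. The code is as follows: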

import os
import threading

import face_recognition
from PIL import Image
from PIL import ImageFile

# Let PIL load images whose data is truncated instead of raising an error.
ImageFile.LOAD_TRUNCATED_IMAGES = True


def process_img(path, new_path):
    dirs = os.listdir(path)
    for pic_dir in dirs:
        print(pic_dir)
        dir_path = os.path.join(path, pic_dir)
        pics = os.listdir(dir_path)
        for pic in pics:
            pic_path = os.path.join(dir_path, pic)
            try:
                image = face_recognition.load_image_file(pic_path)
                face_locations = face_recognition.face_locations(image)
            except Exception:
                continue  # skip files that cannot be read as images
            if len(face_locations) == 0:
                continue  # no face detected: leave the image out
            img = Image.open(pic_path)
            new_pic_path = os.path.join(new_path, pic_dir)
            os.makedirs(new_pic_path, exist_ok=True)
            try:
                if len(img.split()) == 4:
                    # Use split and merge to go from four channels to three.
                    r, g, b, a = img.split()
                    toimg = Image.merge("RGB", (r, g, b))
                    toimg.save(os.path.join(new_pic_path, pic))
                else:
                    img.save(os.path.join(new_pic_path, pic))
            except Exception:
                continue
    print('Finish......!')


def lock_test(path, new_path):
    # Note: the lock is created inside each call, so every thread gets its
    # own lock and the threads are not serialized against each other.
    mu = threading.Lock()
    with mu:
        process_img(path, new_path)


if __name__ == '__main__':
    paths = [r'E:\weather_test\亞洲人臉4_1', r'E:\weather_test\亞洲人臉4_2',
             r'E:\weather_test\亞洲人臉4_3', r'E:\weather_test\亞洲人臉4_4',
             r'E:\weather_test\亞洲人臉4_5', r'E:\weather_test\亞洲人臉4_6']
    new_paths = [r'E:\weather_test\4_1', r'E:\weather_test\4_2',
                 r'E:\weather_test\4_3', r'E:\weather_test\4_4',
                 r'E:\weather_test\4_5', r'E:\weather_test\4_6']
    # One thread per source directory.
    for i in range(len(paths)):
        my_thread = threading.Thread(target=lock_test,
                                     args=(paths[i], new_paths[i]))
        my_thread.start()
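The cleaning step boils down to walking each celebrity's folder, asking a detector whether each image contains a face, and keeping the images that pass. The sketch below separates the folder walk from the detector so the walk can be tried without installing face_recognition. clean_tree and its has_face parameter are hypothetical names. Unlike the code above, it copies files unchanged rather than re-saving them as RGB, and it treats any detector error as "no face".

```python
import os
import shutil


def clean_tree(src_root, dst_root, has_face):
    """Copy images that pass has_face(path) from src_root/<person>/ to
    dst_root/<person>/. Returns a (kept, skipped) pair of counts."""
    kept = skipped = 0
    for person in sorted(os.listdir(src_root)):
        src_dir = os.path.join(src_root, person)
        if not os.path.isdir(src_dir):
            continue  # ignore stray files next to the person folders
        for pic in sorted(os.listdir(src_dir)):
            pic_path = os.path.join(src_dir, pic)
            try:
                ok = bool(has_face(pic_path))
            except Exception:
                ok = False  # unreadable or corrupt file: treat as no face
            if not ok:
                skipped += 1
                continue
            dst_dir = os.path.join(dst_root, person)
            os.makedirs(dst_dir, exist_ok=True)
            shutil.copy2(pic_path, os.path.join(dst_dir, pic))
            kept += 1
    return kept, skipped
```

In the setup described here, has_face could be a small function that loads the file with face_recognition.load_image_file and returns whether face_recognition.face_locations finds at least one face. The returned counts give a quick sense of how much of each crawl was discarded.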
