Scraping QQ Group Members with Python

Published: 2023/12/10 python 44 豆豆

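Yesterday I noticed a group that has a lot of members, about a thousand, yet hardly anyone ever talks. A friend in the group said that many of the accounts are dead accounts. Out of curiosity I took a look, and it was true: for many members the Qzone space shows 0 posts, 0 photos and 0 blog entries, with only a few hundred or even a few dozen visitors. I wanted to put the Python I had learned to some use.

The idea is to use Python + selenium to log in to Qzone through a real browser, use selenium's find_elements_by_class_name to collect the member URLs from the page, and store them in a list. Once the list is complete, the script visits each member's space and uses find_element_by_id to read the number of posts, photos and blog entries. Some spaces do not allow non-friends to visit, so nothing can be scraped; these are treated as live accounts by default. Other spaces use a custom layout that differs from the standard one, so their data cannot be scraped either; these are also treated as live accounts. Only accounts whose post, photo and blog counts are all 0 are considered dead.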
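The classification rule described above can be sketched as a small pure function. This is only an illustration: the name classify_account and the three labels are hypothetical and do not appear in the original script, which applies the same rule inline. It assumes the counters arrive as the strings read from the page, or as None when the space could not be read.

```python
from typing import Optional


def classify_account(photos: Optional[str], posts: Optional[str],
                     blogs: Optional[str]) -> str:
    """Classify one member from the three counters shown on the profile page.

    Returns "restricted" when any counter is missing (the article treats
    these as live accounts), "dead" when all three counters are "0", and
    "active" otherwise.
    """
    counts = (photos, posts, blogs)
    if any(c is None for c in counts):
        return "restricted"
    if all(c.strip() == "0" for c in counts):
        return "dead"
    return "active"


print(classify_account("0", "0", "0"))
print(classify_account("12", "0", "3"))
print(classify_account(None, None, None))
```

Keeping the rule in one function makes it easy to change the definition of a dead account later, for example by also looking at the visitor count.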

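So I wrote the Python code to check. The results were as follows:

success: 70 means the space could be opened and the post, photo and blog counts were not all 0; 70 members.

fail: 71 means access to the space was restricted; 71 members.

false: 853 means the space could be opened and the post, photo and blog counts were all 0; 853 members. In other words, this group has 853 dead accounts out of the 994 that were checked. Most of these QQ numbers are 10 digits long and start with 3, so they were probably bought.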
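As a quick check of what these three numbers mean, the share of dead accounts can be computed the same way the script does at the end of its run (dead divided by the total of all three counters). The function name dead_ratio is hypothetical and used here only for illustration.

```python
def dead_ratio(active: int, restricted: int, dead: int) -> float:
    """Return the fraction of checked members classified as dead."""
    total = active + restricted + dead
    if total == 0:
        raise ValueError("no members were counted")
    return dead / total


# Counters reported in the article: success 70, fail 71, false 853.
ratio = dead_ratio(active=70, restricted=71, dead=853)
print(f"{ratio:.1%}")  # prints 85.8%
```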

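First:

Install Python 3 (I started with the 32-bit build) from the official site, then pick an integrated development environment (IDE); I installed PyCharm. At this point you can already scrape static pages. (Keep in mind that Python 2's urllib2 was folded into urllib in Python 3, mainly as urllib.request.) I started by looking at the page source with the browser's developer tools (shortcut F12):

[Figure: screenshot of the page source in the developer tools]

Look for recognizable patterns in the source (Inspect Element is a great help here and saves a lot of searching). I found a few:

[Figure: screenshot of the member links in the inspected markup]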

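Each member's URL sits in an href attribute, and all of these links share the same class, which is something we can exploit. Does that mean the job is done? Unfortunately not, far from it. When I tried to fetch the page, the links were simply not there: the member list is loaded dynamically via AJAX. (Many pages are loaded dynamically nowadays, so urllib alone is less useful than it used to be; selenium can handle them.) What is selenium? It is a tool for testing web applications. Its tests run directly in the browser, just as a real user would operate it. In short, selenium is a browser automation and testing tool.

There are three ways to get the members' hrefs:

1. Copy the markup straight out of the developer tools into a text file, read the file in code, and match it with a regular expression [reg = r'href="http://user.qzone.qq.com/\d{8,11}" class=']. Write the matches to a file (realaddr = ''.join(realaddrlist) turns a list into a string; there are other ways too), then have Python load each member's space and decide whether the account is dead.

2. Capture the traffic with a packet-capture tool (Fiddler and the like). I have not tried this, because I do not know which request to capture.

3. Log in to Qzone from Python code, open the group space, and collect the member information with selenium (recommended). This is the best approach, because you cannot copy, paste and regex-match by hand every time you want to check a site.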
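The first method can be sketched roughly as follows. The pattern is the one quoted above, with the dots escaped so that they only match a literal dot and with a capture group around the URL. The function name extract_member_urls and the sample markup (including the QQ numbers in it) are made up for illustration; real input would be the text copied from the developer tools.

```python
import re

# Pattern from the article, with escaped dots and a group around the URL.
MEMBER_LINK = re.compile(
    r'href="(http://user\.qzone\.qq\.com/\d{8,11})" class='
)


def extract_member_urls(html: str) -> list:
    """Return the member space URLs found in html, in order, without repeats."""
    seen = set()
    urls = []
    for url in MEMBER_LINK.findall(html):
        if url not in seen:
            seen.add(url)
            urls.append(url)
    return urls


# Made-up sample markup; the class name avatar_50 is the one used later on.
sample = (
    '<a href="http://user.qzone.qq.com/12345678" class="avatar_50"></a>'
    '<a href="http://user.qzone.qq.com/3000000001" class="avatar_50"></a>'
    '<a href="http://user.qzone.qq.com/12345678" class="avatar_50"></a>'
)
for url in extract_member_urls(sample):
    print(url)
```

Dropping duplicates matters here because copied markup can contain the same member more than once, which would otherwise inflate the counters.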

Here is the code that logs in to Qzone:

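# Note: this code targets Selenium 3. Selenium 4 no longer provides
# find_element_by_*, switch_to_frame or switch_to_window; there you use
# driver.find_element(By.ID, ...), driver.switch_to.frame(...) and
# driver.switch_to.window(...) instead.
import time
import urllib.request  # getHtml() calls urllib.request.urlopen

import win32com.client
from selenium import webdriver

option = webdriver.ChromeOptions()
option.add_argument("--test-type")
# Set this to your own Chrome user data directory
# (open chrome://version/ in Chrome and look up the profile path).
option.add_argument(r'user-data-dir=******')

# chromedriver 2.9 was not compatible with my Chrome, so I used chromedriver 2.4.
driver = webdriver.Chrome(executable_path=r'c:\chromedriver\chromedriver.exe',
                          chrome_options=option)
# Logging in also works without the profile directory, i.e. with only the
# line below, but loading my own profile works better.
# driver = webdriver.Chrome(executable_path=r'c:\chromedriver\chromedriver.exe')

# Open the group member page in the web client and copy its URL here.
driver.get('http://qun.qzone.qq.com/group#!/******/member')
driver.switch_to_frame('login_frame')
driver.find_element_by_id('switcher_plogin').click()
driver.find_element_by_id('u').clear()
# Your QQ number
driver.find_element_by_id('u').send_keys('******')
driver.find_element_by_id('p').clear()
# Your password
driver.find_element_by_id('p').send_keys('*****')
driver.find_element_by_id('login_button').click()
time.sleep(10)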
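For readers on Selenium 4, where find_element_by_id and switch_to_frame are no longer available, the login steps might look roughly like the sketch below. The element IDs and the frame name are taken from the code above; the function name qzone_login and the BY_ID constant are hypothetical. In a real script you would normally write `from selenium.webdriver.common.by import By` and pass By.ID, which has the same string value; the plain string is used here only so the sketch has no import of its own.

```python
BY_ID = "id"  # same string value as selenium.webdriver.common.by.By.ID


def qzone_login(driver, account: str, password: str) -> None:
    """Fill in and submit the Qzone login form using Selenium 4 style calls.

    `driver` is expected to be a selenium WebDriver that has already
    opened the group member page.
    """
    # The login form lives inside an iframe named login_frame.
    driver.switch_to.frame("login_frame")
    # Switch from QR-code login to account/password login.
    driver.find_element(BY_ID, "switcher_plogin").click()

    user_box = driver.find_element(BY_ID, "u")
    user_box.clear()
    user_box.send_keys(account)

    password_box = driver.find_element(BY_ID, "p")
    password_box.clear()
    password_box.send_keys(password)

    driver.find_element(BY_ID, "login_button").click()
```

Passing the driver in as a parameter, rather than relying on a global, also makes the steps easy to exercise with a stand-in object when no browser is available.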

Remember that Chrome needs a chromedriver binary, which you have to find and download yourself. After logging in:

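# Method two: switch through all window handles so the driver ends up on the
# most recent window. Without this, the elements are read from the wrong page,
# so this has to be repeated whenever a new window opens.
for handle in driver.window_handles:
    driver.switch_to_window(handle)

# Get the elements for all members from the group space (a list).
qqslist = driver.find_elements_by_class_name("avatar_50")
print(driver.title)
numqq = 0
path = r'C:\Users\Desktop\qqmsg.txt'

# Walk through the elements and collect each member's space URL.
qqhreflist = list()
for qq in qqslist:
    qqhref = qq.get_attribute("href")
    numqq = numqq + 1
    print(qqhref)
    qqhreflist.append(qqhref)
    writeTxt(path, qqhref)
time.sleep(1)
print("getqqs success")
print("start process")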

At this point the hrefs of all members have been collected in the list qqhreflist. Next:

# Start WPS Spreadsheets through COM and create a new workbook.
wpsApp = win32com.client.Dispatch("KET.Application")
wpsApp.Visible = 1
xlBook = wpsApp.Workbooks.Add()
getSele(qqhreflist, xlBook)
try:
    xlBook.SaveAs(r"C:\Users\Desktop\qqmsg.xls")
finally:
    # Always release WPS and the browser, even if saving fails.
    xlBook.Close()
    wpsApp.Quit()
    del wpsApp
    driver.close()
    driver.quit()

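This part uses win32com (from the pywin32 package), which you need to install yourself. capicom.dll may also be missing; download it, open cmd as administrator and run: C:\Windows\SysWOW64\regsvr32.exe C:\Windows\SysWOW64\capicom.dll. Here I use WPS to receive the data and store it in a spreadsheet.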
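If installing WPS and win32com is not an option, one possible alternative, which is not what the original script does, is to write the same four columns to a CSV file with the standard library. The function name save_rows_csv, the English column names and the file name in the usage line are assumptions made for this sketch. The utf-8-sig encoding is chosen so that spreadsheet programs are more likely to detect the encoding of Chinese nicknames correctly.

```python
import csv

HEADER = ["nickname", "photos", "posts", "blogs"]


def save_rows_csv(path: str, rows) -> None:
    """Write the header plus one row per member to a UTF-8 CSV file."""
    # newline="" lets the csv module control line endings itself.
    with open(path, "w", newline="", encoding="utf-8-sig") as handle:
        writer = csv.writer(handle)
        writer.writerow(HEADER)
        writer.writerows(rows)


# Usage with made-up rows; "-1" marks a space that could not be read,
# matching the convention used by the script.
save_rows_csv("qqmsg_example.csv", [
    ("example member", "0", "0", "0"),
    ("restricted member", "-1", "-1", "-1"),
])
```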

Now for the main part: deciding whether an account is dead. This is done by the function getSele(); the code is as follows:

def getSele(hrefs, xlBook):
    # Visit every member URL, read the three counters and record them in the
    # spreadsheet. Uses the global `driver` created during login.
    sheet = xlBook.ActiveSheet
    sheet.Cells(1, 1).Value = "Nickname"
    sheet.Cells(1, 2).Value = "Photos"
    sheet.Cells(1, 3).Value = "Posts"
    sheet.Cells(1, 4).Value = "Blogs"
    path = r'C:\Users\煙魂\Desktop\data1.txt'
    numfalse = 0  # space readable, all three counters are 0 (dead account)
    numtrue = 0   # space readable, at least one counter is not 0
    numfail = 0   # counters not found (restricted or custom space)
    count = 0
    for data in hrefs:
        count = count + 1
        row = count + 1  # row 1 holds the header
        print("Current member no.: " + str(count))
        driver.get(data)
        time.sleep(0.1)
        try:
            # Probe for the three counters; a missing one raises an exception.
            driver.find_element_by_id("QM_Profile_Photo_Cnt")
            driver.find_element_by_id("QM_Profile_Mood_Cnt")
            driver.find_element_by_id("QM_Profile_Blog_Cnt")
        except Exception:
            print("fail")
            numfail = numfail + 1
            try:
                user_name = driver.find_elements_by_class_name("user_name")[0].text
                if user_name == "":
                    selfname = "No name, unknown error"
                    print(selfname)
                    sheet.Cells(row, 1).Value = selfname
                else:
                    print(user_name)
                    sheet.Cells(row, 1).Value = user_name
                # -1 marks counters that could not be read.
                sheet.Cells(row, 2).Value = "-1"
                sheet.Cells(row, 3).Value = "-1"
                sheet.Cells(row, 4).Value = "-1"
                print("-1")
                writeTxt(path, user_name)
            except Exception:
                print("none")
                selfname = "Custom layout or own account"
                print(selfname)
                sheet.Cells(row, 1).Value = selfname
                continue
            writeTxt(path, "fail")
            continue
        numphoto = driver.find_element_by_id("QM_Profile_Photo_Cnt").text
        nummood = driver.find_element_by_id("QM_Profile_Mood_Cnt").text
        numblog = driver.find_element_by_id("QM_Profile_Blog_Cnt").text
        if numblog == "0" and numphoto == "0" and nummood == "0":
            numfalse = numfalse + 1
        else:
            numtrue = numtrue + 1
        print(driver.title)
        print(numphoto)
        print(nummood)
        print(numblog)
        sheet.Cells(row, 1).Value = driver.title
        sheet.Cells(row, 2).Value = numphoto
        sheet.Cells(row, 3).Value = nummood
        sheet.Cells(row, 4).Value = numblog
        print("success:" + str(numtrue) + " fail:" + str(numfail) + " false:" + str(numfalse))
        data = " drivertitle:" + driver.title + " numphoto:" + numphoto + " nummood:" + nummood + " numblog:" + numblog
        writeTxt(path, data)
    datanum = "numtrue: " + str(numtrue) + " numfalse:" + str(numfalse) + " numfail:" + str(numfail)
    print(datanum)
    writeTxt(path, datanum)
    # Share of dead accounts; guard against an empty member list.
    total = numtrue + numfail + numfalse
    num = numfalse / total if total else 0.0
    datanums = "num: " + str(num)
    print(datanums)
    writeTxt(path, datanums)
    writeTxt(path, "success")

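Finally, a few helper functions (in the actual script they have to be defined before the code above calls them):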

def getHtml(url):
    # Fetch a static page and decode it as UTF-8.
    # Needs `import urllib.request` at the top of the script.
    with urllib.request.urlopen(url) as page:
        return page.read().decode('UTF-8')


def readTxt(path):
    # Return the whole content of a UTF-8 text file.
    with open(path, 'r', encoding='utf-8') as file_object:
        return file_object.read()


def writeTxt(path, data):
    # Append one record per line. Without the line break, all records
    # would run together in the output file.
    with open(path, 'a', encoding='utf-8') as file_object:
        file_object.write(data + "\n")

And that is it. Just let Python run on its own for a few dozen minutes.
