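Fetching proxy IPs from Xici Proxy (西刺代理)
Environment: Python 2.7, requests, bs4, re, plus lxml (used as the BeautifulSoup parser)
Data source: Xici Proxy (西刺代理), http://www.xicidaili.com/nn
The result is a list of proxy dicts, which you can then use to build your own proxy pool or save to a file.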
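For the "save to a file" option, one possible sketch is shown here. It assumes the proxies come as a list of dicts with 'http' and 'https' keys, which is the shape the scraper in this post returns. The function names, the proxies.json file name and the sample addresses are made up for illustration and are not part of the original code.

```python
import json
import os
import random
import tempfile


def save_proxies(proxies, path):
    """Write the proxy list to `path` as JSON."""
    with open(path, 'w') as f:
        json.dump(proxies, f, indent=2)


def load_proxies(path):
    """Read a proxy list back; return [] if the file does not exist."""
    if not os.path.exists(path):
        return []
    with open(path) as f:
        return json.load(f)


def pick_proxy(proxies):
    """Pick one proxy dict at random, or None if the pool is empty."""
    return random.choice(proxies) if proxies else None


# Placeholder addresses from a documentation-only IP range, not real proxies.
sample = [
    {'http': '192.0.2.10:8080', 'https': '192.0.2.10:8080'},
    {'http': '192.0.2.11:3128', 'https': '192.0.2.11:3128'},
]

# Hypothetical file location; any writable path would do.
path = os.path.join(tempfile.gettempdir(), 'proxies.json')
save_proxies(sample, path)
pool = load_proxies(path)
```

A dict returned by pick_proxy(pool) can be passed directly as the proxies argument of requests.get, which gives a very simple random-rotation pool.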
Code:
#coding=utf8
import requests
from bs4 import BeautifulSoup
import re
import os.path

user_agent = 'Mozilla/5.0 (Macintosh; Intel Mac OS X 10_11_5)'
headers = {'User-Agent': user_agent}

def getListProxies():
    session = requests.session()
    page = session.get("http://www.xicidaili.com/nn", headers=headers)
    soup = BeautifulSoup(page.text, 'lxml')
    proxyList = []
    # Data rows carry class "odd" or an empty class
    taglist = soup.find_all('tr', attrs={'class': re.compile("(odd)|()")})
    for trtag in taglist:
        tdlist = trtag.find_all('td')
        # Column 1 is the IP address, column 2 is the port
        proxy = {'http': tdlist[1].string + ':' + tdlist[2].string,
                 'https': tdlist[1].string + ':' + tdlist[2].string}
        # url = "http://ip.chinaz.com/getip.aspx"  # URL for testing whether the IP works (this address no longer seems to be available)
        # try:
        #     print('proxy is ', proxy)
        #     response = session.get(url, proxies=proxy, timeout=5)
        #     print(response)
        #     proxyList.append(proxy)
        #     if (len(proxyList) == 3):
        #         break
        # except Exception, e:
        #     continue
        proxyList.append(proxy)
        # Cap the number of proxy IPs collected
        if len(proxyList) >= 10:
            break
    return proxyList

res = getListProxies()
print(len(res))
print(res)
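The commented-out block in the code above hints at checking each proxy before keeping it. A minimal sketch of that idea follows, with the check separated from the filtering so that a different test can be swapped in. The names is_alive and filter_working are hypothetical, and the default test URL http://httpbin.org/ip is an assumption chosen for illustration, not something the original code uses.

```python
def is_alive(proxy, test_url='http://httpbin.org/ip', timeout=5):
    """Return True if a GET sent through `proxy` comes back with HTTP 200.

    `test_url` is an assumed default; pass whatever endpoint you trust.
    """
    # Imported here so filter_working() can be used with a custom check
    # even when requests is not installed.
    import requests
    try:
        response = requests.get(test_url, proxies=proxy, timeout=timeout)
        return response.status_code == 200
    except requests.RequestException:
        return False


def filter_working(proxies, check=is_alive, limit=None):
    """Keep the proxies for which check(proxy) is true, stopping at `limit`."""
    working = []
    for proxy in proxies:
        if check(proxy):
            working.append(proxy)
            if limit is not None and len(working) >= limit:
                break
    return working


# Demonstration without network access: a stand-in check that treats
# port 8080 as "alive". The addresses are placeholders, not real proxies.
def fake_check(proxy):
    return proxy['http'].endswith(':8080')


candidates = [
    {'http': '192.0.2.10:8080', 'https': '192.0.2.10:8080'},
    {'http': '192.0.2.11:3128', 'https': '192.0.2.11:3128'},
    {'http': '192.0.2.12:8080', 'https': '192.0.2.12:8080'},
]
alive = filter_working(candidates, check=fake_check)
```

With real data, something like filter_working(getListProxies(), limit=3) would mirror the limit of three used in the commented-out code. Free proxies tend to fail often, so a short timeout keeps the loop from stalling.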