Scraping Dianping User Reviews with bs4

Published: 2025/3/21

Scraping Dianping user reviews
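
The script in this post walks the paged ranking list at dpindex.dianping.com and sends a browser-like User-Agent with its requests. As a minimal sketch of just that step, the block below builds one request object per page without sending anything. The URL prefix and User-Agent string are copied from the script; the helper name build_page_requests is hypothetical and only for illustration.

```python
import urllib.request

# URL prefix and User-Agent string are taken from the script in this post.
BASE_URL = 'http://dpindex.dianping.com/dpindex?type=rank&p='
USER_AGENT = ('Mozilla/5.0 (Windows NT 6.1; WOW64) AppleWebKit/537.36 '
              '(KHTML, like Gecko) Chrome/45.0.2454.93 Safari/537.36')


def build_page_requests(first_page, last_page):
    """Hypothetical helper: build one Request per ranking page (nothing is sent)."""
    requests = []
    for page in range(first_page, last_page + 1):
        req = urllib.request.Request(BASE_URL + str(page),
                                     headers={'User-Agent': USER_AGENT})
        requests.append(req)
    return requests


if __name__ == '__main__':
    for req in build_page_requests(1, 3):
        print(req.full_url)
```

To actually fetch a page, each request can be passed to urllib.request.urlopen(req).read(). Whether the site still serves these URLs is not something this sketch checks.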

# -*- coding: utf-8 -*-
__author__ = 'Administrator'

import sys
import time
import urllib.request

from bs4 import BeautifulSoup

sys.path.append('./')
import sql  # the author's local MySQL helper module; it is not part of this post

# Browser-like User-Agent sent with the list-page and shop-page requests.
HEADERS = ('User-Agent', 'Mozilla/5.0 (Windows NT 6.1; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/45.0.2454.93 Safari/537.36')


# ------------------------------------------------------
def Mysqls():
    return sql.Mysql('127.0.0.1', 'root', '123456', 'test_msccms')


def read_with_headers(url):
    # Fetch a page with the User-Agent header above and return the raw bytes.
    opener = urllib.request.build_opener()
    opener.addheaders = [HEADERS]
    return opener.open(url).read()


# ------------------------------------------------------
class dianping:
    def __init__(self):
        self.names = ''
        self.cturl = []
        self.ctname = []
        self.ctaddr = []
        self.users = []
        self.datas = []
        self.tms = []

    def get_ct_url(self, htmlurl):
        # Parse one ranking page and collect shop URLs, names and addresses.
        self.htmlurl = htmlurl
        htmlline = read_with_headers(self.htmlurl)
        # soup = BeautifulSoup(htmlline, "html.parser", from_encoding="UTF-8")
        soup = BeautifulSoup(htmlline, "html.parser")
        self.names = soup.span.string
        print('\nShop name:', soup.span.string)
        # Get the restaurant names.
        for i in soup.find_all(attrs={"class": "field-name"}):
            try:
                # The print is required: without it the exception is not
                # raised here and the program stops with an error.
                print(i)
                psoup = BeautifulSoup(str(i), "html.parser")
                self.ctname.append(psoup.div.string)
            except Exception:
                self.ctname.append('')
        # Get the restaurant addresses.
        for i in soup.find_all(attrs={"class": "field-addr"}):
            psoup = BeautifulSoup(str(i), "html.parser")
            self.ctaddr.append(psoup.div.string)
        # Get the restaurant URLs (links that open in a new tab and have no text).
        for i in soup.find_all(attrs={"target": "_blank"}):
            psoup = BeautifulSoup(str(i), "html.parser")
            if psoup.a.string is None:
                self.cturl.append(psoup.a.attrs['href'])
        return self.cturl, self.ctname, self.ctaddr

    def get_ct_pinlun(self, htmlurl):
        # Parse a review page: user names, review times and review text.
        self.htmlurl = htmlurl
        page = urllib.request.urlopen(self.htmlurl)
        htmlline = page.read()
        soup = BeautifulSoup(htmlline, "html.parser")
        self.names = soup.span.string
        print('\nShop name:', soup.span.string)
        for i in soup.find_all(attrs={"class": "name", "rel": "nofollow"}):
            psoup = BeautifulSoup(str(i), "html.parser")
            self.users.append(psoup.a.string)
        for i in soup.find_all("span", {"class": "time"}):
            tmsoup = BeautifulSoup(str(i), "html.parser").span.string
            # Drop the double non-breaking spaces inside the timestamp.
            tmsjoin = ''.join(str(tmsoup).split('\xa0\xa0'))
            self.tms.append(tmsjoin)
        sps = soup.find_all("p", {"class": "desc"})
        for i in sps:
            strs = str(i).split()
            try:
                # Take the text between '>' and '<' in the second whitespace-separated token.
                dts = strs[1].split('>')[1:][0].split('<')[0]
                self.datas.append(dts)
            except Exception:
                continue
        return self.names, self.htmlurl, self.users, self.datas, self.tms

    def get_ct_info(self, htmlurl):
        # Parse a shop page and return (name, phone, address).
        self.htmlurl = htmlurl
        htmlline = read_with_headers(self.htmlurl)
        soup = BeautifulSoup(htmlline, "html.parser")
        # Restaurant name: the part of the page title before the word for
        # "phone". The separator must match the characters used in the title.
        names = soup.title.string.split('電話')[0]
        # Address.
        addrs = soup.find_all(attrs={"class": "item", "itemprop": "street-address"})
        ap = BeautifulSoup(str(addrs), "html.parser")
        addrs = ap.span.string.split()[0]
        # Phone number.
        phone = soup.find_all(attrs={"class": "item", "itemprop": "tel"})
        pp = BeautifulSoup(str(phone), "html.parser")
        phones = pp.span.string.split()[0]
        return names, phones, addrs

    def run(self, htmlurl):
        cturl, ctname, ctaddr = dianping().get_ct_url(htmlurl)
        # mysql = Mysqls()
        n = 1
        for u in ctname[1:]:
            try:
                print(htmlurl, cturl[n], u, ctaddr[n])
                # get_ct_info returns (name, phone, address), in that order.
                names, phones, addrs = dianping().get_ct_info(cturl[n])
                print(names, addrs, phones)
                # sqls = "insert into  tongji_user_pinglun (ctid,ctname,ctarea,source_url,username,content,cttms) values(%s,'%s','%s','%s','%s','%s','%s');"
                # mysql.cmd(sqls % (ctid, names, ctarea, htmlurl, u, datas[n], tms[n]))
                # mysql.commit()
            except Exception:
                print('F', u)
            n = n + 1
            time.sleep(1)
        # mysql.close()


# ======================================================
if __name__ == "__main__":
    url = 'http://dpindex.dianping.com/dpindex?type=rank&p='
    for i in range(1, 51):
        print(url + str(i))
        # Crawl ranking page i.
        dianping().run(url + str(i))
        # dianping().get_ct_info('http://www.dianping.com/shop/4708533')
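
The commented-out INSERT in run() builds its SQL with string formatting, which breaks as soon as a shop name or review contains a quote character. One possible alternative, sketched here under assumptions, is to pass the values as query parameters. The table and column names come from that commented-out statement; sqlite3 from the standard library stands in for the author's sql.Mysql helper (whose interface is not shown in the post), and save_reviews plus the sample row are hypothetical placeholders.

```python
import sqlite3

# Column names follow the commented-out INSERT statement in the script above.
COLUMNS = ('ctid', 'ctname', 'ctarea', 'source_url',
           'username', 'content', 'cttms')


def save_reviews(conn, rows):
    """Hypothetical helper: insert review rows using placeholders."""
    # Assumed column types; the real table definition is not given in the post.
    conn.execute(
        'CREATE TABLE IF NOT EXISTS tongji_user_pinglun ('
        'ctid INTEGER, ctname TEXT, ctarea TEXT, source_url TEXT, '
        'username TEXT, content TEXT, cttms TEXT)'
    )
    placeholders = ', '.join('?' for _ in COLUMNS)
    statement = 'INSERT INTO tongji_user_pinglun ({}) VALUES ({})'.format(
        ', '.join(COLUMNS), placeholders)
    # The driver escapes each value, so quotes in the text are safe.
    conn.executemany(statement, rows)
    conn.commit()


if __name__ == '__main__':
    conn = sqlite3.connect(':memory:')
    # Placeholder row for illustration only, not scraped data.
    sample = [(1, "Joe's Diner", 'area', 'http://example.com/shop/1',
               'user1', "It's good", '03-21')]
    save_reviews(conn, sample)
    print(conn.execute(
        'SELECT ctname, content FROM tongji_user_pinglun').fetchall())
```

sqlite3 uses ? as its placeholder. MySQL drivers such as pymysql use %s instead, still with the values passed separately rather than formatted into the string.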


Reposted from: https://my.oschina.net/jk409/blog/659108
