
Scraping the Bilibili manga support ranking (dynamically loaded) with Scrapy + Selenium

The ranking page at manga.bilibili.com is rendered dynamically by JavaScript, so the spider pairs Scrapy with a Selenium-driven Chrome browser: Selenium loads and renders the page, and Scrapy parses the resulting HTML into items.

Target data: for each entry on the ranking list, the spider collects the rank, the rank movement, the cover image URL, the manga page URL, the title, the author, the fan value, and the top three supporters.
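The item class the spider imports is not shown in this excerpt; a plausible items.py, reconstructed from the fields the spider assigns below, would look like this (the field names come from the spider, the comments are a sketch):

# items.py -- reconstructed from the fields the spider assigns; the
# original file is not shown, so treat this as a sketch.
import scrapy


class BilibiliYyItem(scrapy.Item):
    paiming = scrapy.Field()       # rank, as a zero-padded two-digit string
    pmqingkuang = scrapy.Field()   # rank movement: 保持/上升/下降 (hold/up/down)
    pic_link = scrapy.Field()      # cover image URL
    cartoon_link = scrapy.Field()  # manga detail page URL
    name = scrapy.Field()          # manga title
    author = scrapy.Field()        # author line
    fensizhi = scrapy.Field()      # fan value (in units of 10,000)
    zhugong1 = scrapy.Field()      # top supporter #1
    zhugong2 = scrapy.Field()      # top supporter #2
    zhugong3 = scrapy.Field()      # top supporter #3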

Spider code:

# -*- coding: utf-8 -*-
import re

import scrapy
from selenium import webdriver

from bilibili_yy.items import BilibiliYyItem


class BiliSpider(scrapy.Spider):
    name = 'bili'
    # allowed_domains = ['manga.bilibili.com']
    start_urls = ['https://manga.bilibili.com/ranking?from=manga_homepage#/ouenn/']

    def __init__(self, *args, **kwargs):
        super().__init__(*args, **kwargs)
        # The ranking list is filled in by JavaScript, so a real browser is
        # needed; the driver is used by a Selenium downloader middleware.
        self.driver = webdriver.Chrome()

    def parse(self, response):
        for data_s in response.xpath('//div[@class="rank-item dp-i-block border-box p-relative"]'):
            # One fresh item per ranking entry, so yielded items stay independent.
            item = BilibiliYyItem()

            # Rank movement (hold/up/down) is encoded in this div's class name.
            pmqingkuang = data_s.xpath(
                './/div[starts-with(@class,"rank-movement p-absolute bg-center bg-cover bg-no-repeat")]/@class'
            ).extract()[0]

            # The rank is drawn as one image per digit; the digit itself is the
            # trailing number in each span's "digit-N" class.
            digit_classes = data_s.xpath(
                './/span[starts-with(@class,"digit-item bg-center bg-contain bg-no-repeat dp-i-block digit-")]/@class'
            ).extract()
            if len(digit_classes) == 2:
                item['paiming'] = (re.findall(r"\d", digit_classes[0])[0]
                                   + re.findall(r"\d", digit_classes[1])[0])
            else:
                # Pad single-digit ranks to two characters, e.g. '3' -> '03'.
                item['paiming'] = re.findall(r"\d", digit_classes[0])[0].zfill(2)

            if 'hold' in pmqingkuang:
                item['pmqingkuang'] = '保持'  # hold
            elif 'up' in pmqingkuang:
                item['pmqingkuang'] = '上升'  # up
            else:
                item['pmqingkuang'] = '下降'  # down

            item['pic_link'] = data_s.xpath('.//div[starts-with(@class,"manga-cover bg-center bg-cover bg-no-repeat")]/@data-src').extract()[0]
            item['cartoon_link'] = 'https://manga.bilibili.com' + data_s.xpath('.//a[starts-with(@class,"dp-block manga-title")]/@href').extract()[0]
            item['name'] = data_s.xpath('.//a[starts-with(@class,"dp-block manga-title")]/text()').extract()[0]
            item['author'] = data_s.xpath('.//p[@class="fans-author-text t-over-hidden t-no-wrap"]/text()').extract()[0]
            # Keep only the number from e.g. '12.3 万 粉丝值' ("fan value").
            item['fensizhi'] = data_s.xpath('.//p[@class="fans-value"]/text()').extract()[0].replace(' 万 粉丝值', '')

            # Top three supporters; entries may have fewer than three, so fall
            # back to an empty string when a slot is missing.
            for i, field in enumerate(('zhugong1', 'zhugong2', 'zhugong3'), start=2):
                title = data_s.xpath('.//div[@class="award-user-ctnr p-absolute w-100"]/div[%d]/@title' % i)
                item[field] = title.extract()[0] if title else ''

            yield item

    def closed(self, reason):
        # Scrapy calls closed() automatically when the spider finishes.
        print('Closing the browser')
        self.driver.quit()
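The spider opens a Chrome driver, but parse() only ever sees an already-rendered response, so the actual rendering has to happen in a downloader middleware, which this excerpt does not include. Below is a minimal sketch of such a middleware, assuming it lives in middlewares.py and reuses the spider's driver attribute; the class name and the fixed sleep are assumptions, not the original project's code.

# middlewares.py -- hypothetical sketch; the article's real middleware is
# not shown, so the class name and wait strategy here are assumptions.
import time

from scrapy.http import HtmlResponse


class SeleniumMiddleware:
    def process_request(self, request, spider):
        # Load the page in Chrome and give the JavaScript time to fill in
        # the ranking list, then hand the rendered HTML back to Scrapy.
        spider.driver.get(request.url)
        time.sleep(3)  # crude wait; a WebDriverWait on the list would be sturdier
        return HtmlResponse(
            url=request.url,
            body=spider.driver.page_source,
            encoding='utf-8',
            request=request,
        )

It would then be switched on in settings.py, e.g. DOWNLOADER_MIDDLEWARES = {'bilibili_yy.middlewares.SeleniumMiddleware': 543}.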

Writing the items out to MongoDB:
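The pipeline itself is not included in this excerpt; here is a minimal sketch of an item pipeline that writes each scraped entry to MongoDB with pymongo. The connection string, database, and collection names are assumptions.

# pipelines.py -- hypothetical sketch; the database and collection names
# are assumptions, not taken from the original project.
import pymongo


class MongoPipeline:
    def open_spider(self, spider):
        self.client = pymongo.MongoClient('mongodb://localhost:27017')
        self.collection = self.client['bilibili']['manga_ranking']

    def process_item(self, item, spider):
        # dict(item) turns the Scrapy item into a plain MongoDB document.
        self.collection.insert_one(dict(item))
        return item

    def close_spider(self, spider):
        self.client.close()

Enable it with ITEM_PIPELINES = {'bilibili_yy.pipelines.MongoPipeline': 300} in settings.py; with the middleware and pipeline in place, the crawl runs with scrapy crawl bili.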

Full project download:

Summary

The rank number and its movement are only exposed through CSS class names on the page, so the spider decodes them with regular expressions after Selenium has rendered the dynamic content; the remaining fields are straightforward XPath extractions into a Scrapy item, which a pipeline then writes out to MongoDB.
