Scraping Links from a Webpage in Python

Published: 2023/12/1


Prerequisites:

  • Urllib3: a powerful, user-friendly HTTP client for Python, with features such as thread safety, client-side SSL/TLS verification, connection pooling, and file uploads with multipart encoding.

    Installing urllib3:

    $ pip install urllib3
  • BeautifulSoup: a Python library for scraping information from web pages and XML files, i.e. for pulling data out of HTML and XML.

    Installing BeautifulSoup:

    $ pip install beautifulsoup4
  • Commands used:

    html = urllib.request.urlopen(url).read(): opens the URL and reads the entire response body (newlines and all) into one large string of bytes.

    soup = BeautifulSoup(html, 'html.parser'): parses the string with BeautifulSoup. It takes the whole document, runs it through the HTML parser, and returns a BeautifulSoup object.

    tags = soup('a'): gets the list of all anchor (<a>) tags; calling the soup object like this is shorthand for soup.find_all('a').

    tag.get('href', None): extracts the value of the href attribute, returning None if the tag has no href.
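The commands above can be tried on an inline HTML snippet instead of a live URL, so the sketch below runs without any network access; the sample page and its links are made up for illustration.

```python
from bs4 import BeautifulSoup

# Hypothetical sample page, standing in for urllib.request.urlopen(url).read()
html = """
<html><body>
  <a href="https://example.com/page1">Page 1</a>
  <a href="/relative/page2">Page 2</a>
  <a name="no-link">No href here</a>
</body></html>
"""

soup = BeautifulSoup(html, 'html.parser')  # parse the HTML string
tags = soup('a')                           # all anchor tags (same as soup.find_all('a'))

# tag.get('href', None) yields None for the anchor with no href attribute
links = [tag.get('href', None) for tag in tags]
print(links)  # ['https://example.com/page1', '/relative/page2', None]
```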

    Python program to scrape links from a webpage

    # import statements
    import urllib.request, urllib.parse, urllib.error
    from bs4 import BeautifulSoup

    # URL of a webpage
    url = input("Enter URL: ")

    # Open the URL and read the whole page
    html = urllib.request.urlopen(url).read()

    # Parse the string
    soup = BeautifulSoup(html, 'html.parser')

    # Retrieve all of the anchor tags
    # Returns a list of all the links
    tags = soup('a')

    # Print all the links in the list tags
    for tag in tags:
        # Get the data from the href key
        print(tag.get('href', None))

    Output:

    Enter URL: https://www.google.com/
    https://www.google.com/imghp?hl=en&tab=wi
    https://maps.google.com/maps?hl=en&tab=wl
    https://play.google.com/?hl=en&tab=w8
    https://www.youtube.com/?gl=US&tab=w1
    https://news.google.com/nwshp?hl=en&tab=wn
    https://mail.google.com/mail/?tab=wm
    https://drive.google.com/?tab=wo
    https://www.google.com/intl/en/about/products?tab=wh
    http://www.google.com/history/optout?hl=en
    /preferences?hl=en
    https://accounts.google.com/ServiceLogin?hl=en&passive=true&continue=https://www.google.com/
    /advanced_search?hl=en&authuser=0
    /intl/en/ads/
    /services/
    /intl/en/about.html
    /intl/en/policies/privacy/
    /intl/en/policies/terms/
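Notice that some of the scraped hrefs are relative (e.g. /preferences?hl=en) rather than full URLs. The standard-library function urllib.parse.urljoin can resolve such links against the page's base URL; the sketch below uses a few hrefs from the sample output above.

```python
from urllib.parse import urljoin

base_url = "https://www.google.com/"
hrefs = [
    "https://maps.google.com/maps?hl=en&tab=wl",  # already absolute: returned unchanged
    "/preferences?hl=en",                         # relative to the site root
    "/intl/en/policies/privacy/",
]

# Resolve each href against the base URL of the page it was scraped from
for href in hrefs:
    print(urljoin(base_url, href))

# Prints:
# https://maps.google.com/maps?hl=en&tab=wl
# https://www.google.com/preferences?hl=en
# https://www.google.com/intl/en/policies/privacy/
```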

    Translated from: https://www.includehelp.com/python/scraping-links-from-a-webpage.aspx
