Traversing the Document Tree in a Python Crawler

Published: 2025/3/20

1. Direct children: the .contents and .children attributes

.contents

A Tag's .contents attribute returns the Tag's direct children as a list.

from bs4 import BeautifulSoup

html = """
<html><head><title>The Dormouse's story</title></head>
<body>
<p class="title" name="dromouse"><b>The Dormouse's story</b></p>
<p class="story">Once upon a time there were three little sisters; and their names were
<a href="http://example.com/elsie" class="sister" id="link1"><!-- Elsie --></a>,
<a href="http://example.com/lacie" class="sister" id="link2">Lacie</a> and
<a href="http://example.com/tillie" class="sister" id="link3">Tillie</a>;
and they lived at the bottom of a well.</p>
<p class="story">...</p>
"""

# Create the Beautiful Soup object, using the lxml parser
soup = BeautifulSoup(html, "lxml")

# .contents returns the direct children as a list
print(soup.head.contents)
print(soup.head.contents[0])

Output

[<title>The Dormouse's story</title>]
<title>The Dormouse's story</title>
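The list returned by .contents holds text nodes as well as tags, which is easy to miss when the example only looks at head. Here is a minimal sketch on a shortened, made-up fragment of the story paragraph. It uses Python's built-in "html.parser" instead of lxml so that nothing beyond bs4 needs to be installed; the fragment and the variable names are assumptions chosen for illustration.

```python
from bs4 import BeautifulSoup

# Shortened, illustrative fragment (not the article's full document)
html = '<p class="story">Once upon a time <a id="link1">Elsie</a> and <a id="link2">Lacie</a>.</p>'

# "html.parser" ships with Python; the article itself uses "lxml"
soup = BeautifulSoup(html, "html.parser")

children = soup.p.contents

# Record the class name of every direct child
kinds = [type(child).__name__ for child in children]

print(len(children))
print(kinds)
print(children[1]["id"])
```

The paragraph has five direct children: three pieces of text (NavigableString) and two a tags. Indexing works because .contents is an ordinary list.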

.children

.children does not return a list, but we can iterate over it to get every direct child.

from bs4 import BeautifulSoup

html = """
<html><head><title>The Dormouse's story</title></head>
<body>
<p class="title" name="dromouse"><b>The Dormouse's story</b></p>
<p class="story">Once upon a time there were three little sisters; and their names were
<a href="http://example.com/elsie" class="sister" id="link1"><!-- Elsie --></a>,
<a href="http://example.com/lacie" class="sister" id="link2">Lacie</a> and
<a href="http://example.com/tillie" class="sister" id="link3">Tillie</a>;
and they lived at the bottom of a well.</p>
<p class="story">...</p>
"""

# Create the Beautiful Soup object, using the lxml parser
soup = BeautifulSoup(html, "lxml")

# .children is an iterator object, not a list
print(soup.head.children)

# Iterate to get every direct child
for child in soup.head.children:
    print(child)

Output

<list_iterator object at 0x008FF950>
<title>The Dormouse's story</title>
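In practice the whitespace between tags also shows up as children, so a loop over .children often needs a type check. The following is a rough sketch of that pattern, assuming a made-up ul fragment and the built-in "html.parser"; the names tag_children and texts are illustrative only.

```python
from bs4 import BeautifulSoup
from bs4.element import Tag

# Illustrative fragment; the spaces between the <li> tags become text nodes
html = "<ul><li>one</li> <li>two</li> <li>three</li></ul>"
soup = BeautifulSoup(html, "html.parser")

# Materialize the iterator when a list is needed
all_children = list(soup.ul.children)

# Keep only the children that are tags, skipping the whitespace strings
tag_children = [child for child in soup.ul.children if isinstance(child, Tag)]
texts = [li.get_text() for li in tag_children]

print(len(all_children))
print(texts)
```

Wrapping .children in list() gives the same elements as .contents, so the iterator mainly matters when you only need to loop once.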

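2. All descendants: the .descendants attribute

The .contents and .children attributes described above include only a Tag's direct children. The .descendants attribute recursively walks all of a Tag's descendants: its children, their children, and so on. As with .children, we have to iterate over it to get at its contents.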

from bs4 import BeautifulSoup

html = """
<html><head><title>The Dormouse's story</title></head>
<body>
<p class="title" name="dromouse"><b>The Dormouse's story</b></p>
<p class="story">Once upon a time there were three little sisters; and their names were
<a href="http://example.com/elsie" class="sister" id="link1"><!-- Elsie --></a>,
<a href="http://example.com/lacie" class="sister" id="link2">Lacie</a> and
<a href="http://example.com/tillie" class="sister" id="link3">Tillie</a>;
and they lived at the bottom of a well.</p>
<p class="story">...</p>
"""

# Create the Beautiful Soup object, using the lxml parser
soup = BeautifulSoup(html, "lxml")

# .descendants is a generator object
print(soup.head.descendants)

# Iterate to get every descendant
for child in soup.head.descendants:
    print(child)

Output

<generator object descendants at 0x00519AB0>
<title>The Dormouse's story</title>
The Dormouse's story
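To make the difference between direct children and descendants concrete, the sketch below counts both for the same tag. The div fragment and the variable names are assumptions made for illustration, and the built-in "html.parser" is used in place of lxml.

```python
from bs4 import BeautifulSoup
from bs4.element import Tag

# Illustrative fragment with two levels of nesting
html = "<div><p><b>bold</b> text</p><span>end</span></div>"
soup = BeautifulSoup(html, "html.parser")
div = soup.div

# Direct children only: <p> and <span>
direct = list(div.children)

# Every node below <div>, in document order (tags and strings)
nested = list(div.descendants)

# Names of the descendants that are tags
tag_names = [node.name for node in nested if isinstance(node, Tag)]

print(len(direct))
print(len(nested))
print(tag_names)
```

The descendants are visited depth-first, so the b tag and its text appear before the span that follows the paragraph.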

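3. Node content: the .string attribute

If a Tag has exactly one child and that child is a NavigableString, .string returns that child. If a Tag has exactly one child and that child is itself a Tag, .string also works, and it returns the same result as .string on that only child.

Put simply: if a tag contains no other tags, .string returns the text inside it. If a tag contains exactly one other tag, .string returns the content of that inner tag as well. For example: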

from bs4 import BeautifulSoup

html = """
<html><head><title>The Dormouse's story</title></head>
<body>
<p class="title" name="dromouse"><b>The Dormouse's story</b></p>
<p class="story">Once upon a time there were three little sisters; and their names were
<a href="http://example.com/elsie" class="sister" id="link1"><!-- Elsie --></a>,
<a href="http://example.com/lacie" class="sister" id="link2">Lacie</a> and
<a href="http://example.com/tillie" class="sister" id="link3">Tillie</a>;
and they lived at the bottom of a well.</p>
<p class="story">...</p>
"""

# Create the Beautiful Soup object, using the lxml parser
soup = BeautifulSoup(html, "lxml")

# head has a single child (title), so both calls return the same text
print(soup.head.string)
print(soup.head.title.string)

Output

The Dormouse's story
The Dormouse's story
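The rules above only cover tags with a single child. When a tag has more than one child, .string returns None, and get_text() or .strings can be used to collect the text instead. A minimal sketch of both cases follows, assuming a made-up fragment and the built-in "html.parser".

```python
from bs4 import BeautifulSoup

# Illustrative fragment: the first <p> has one child, the second has two
html = "<div><p><b>only child</b></p><p>two <i>children</i></p></div>"
soup = BeautifulSoup(html, "html.parser")

first, second = soup.find_all("p")

# One child (<b>), which in turn has one string child
print(first.string)

# Two children (a string and <i>), so .string is None
print(second.string)

# get_text() joins all text below the tag; .strings yields each piece
print(second.get_text())
print(list(second.strings))
```

Checking for None before using .string avoids surprises on tags whose structure is not known in advance.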
