
Using XPath

Published: 2025/3/14


Using the XPath module
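The snippets in this article call selector.xpath(...) without showing how selector is created. The following is a minimal sketch, assuming lxml is installed and that selector is the element returned by etree.HTML. The sample HTML, its ids and class names, and the helper tags() are made up for illustration; they are not from the original article.

```python
from lxml import etree

# Hypothetical sample page: the ids and class names are invented so that the
# article's queries have something to match.
SAMPLE_HTML = """
<html><body>
  <div class="d1">
    <div>
      <p class="story" id="p1">Once upon a time
        <a class="sister" href="/elsie">Elsie</a>
        <a class="sister" href="/lacie">Lacie</a>
        <a href="/tillie">Tillie</a>
      </p>
    </div>
  </div>
  <div class="d3">footer</div>
</body></html>
"""

# etree.HTML parses the string and returns the root <html> element.
selector = etree.HTML(SAMPLE_HTML)


def tags(nodes):
    """Return the tag name of each element in a result list."""
    return [node.tag for node in nodes]


# Selecting nodes
all_divs = selector.xpath("//div")         # every div in the document
root_divs = selector.xpath("/div")         # empty: the root element is html
p1 = selector.xpath("//p[@id='p1']")
class_values = selector.xpath("//@class")  # attribute values as strings

# Predicates
second_link = selector.xpath("//p[@class='story']//a[2]/text()")
last_link = selector.xpath("//p[@class='story']//a[last()]/text()")

# Wildcard and union
children = selector.xpath("//p[@class='story']/*")
union = selector.xpath("//p[@class='story']/a[@class]|//div[@class='d3']")

# An axis and a function
after_lacie = selector.xpath("//a[@href='/lacie']/following-sibling::a/text()")
l_links = selector.xpath("//a[starts-with(@href, '/l')]/text()")

print(len(all_divs), root_divs, tags(union))
print(second_link, last_link, after_lacie, l_links)
```

Note that xpath() always returns a list for node-set queries, so a query that matches nothing (such as "/div" here) yields an empty list rather than raising an error.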

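1. Selecting nodes

Expression   Description and example
nodename     Selects all child nodes of the node named nodename. xpath('//div') selects all div nodes.
/            Selects from the root node. xpath('/div') selects div nodes directly under the root.
//           Selects matching nodes anywhere in the document, regardless of their position. xpath('//div') selects all div nodes.
.            Selects the current node. xpath('./div') selects div nodes that are children of the current node.
..           Selects the parent of the current node. xpath('..') moves up to the parent node.
@            Selects attributes. xpath('//@class') selects all class attributes.

'''
ret = selector.xpath("//div")
ret = selector.xpath("/div")
ret = selector.xpath("./div")
ret = selector.xpath("//p[@id='p1']")
ret = selector.xpath("//div[@class='d1']/div/p[@class='story']")
'''

2. Predicates

Expression                              Result
xpath('/body/div[1]')                   Selects the first div under body.
xpath('/body/div[last()]')              Selects the last div under body.
xpath('/body/div[last()-1]')            Selects the second-to-last div under body.
xpath('/body/div[position()<3]')        Selects the first two div nodes under body.
xpath('/body/div[@class]')              Selects div nodes under body that have a class attribute.
xpath('/body/div[@class="main"]')       Selects div nodes under body whose class attribute is main.
xpath('/body/div[@price>35.00]')        Selects div nodes under body whose price attribute is greater than 35.

'''
ret = selector.xpath("//p[@class='story']//a[2]")
ret = selector.xpath("//p[@class='story']//a[last()]")
'''

3. Wildcards

XPath wildcards select XML elements whose names are not known in advance.

Expression              Result
xpath('/div/*')         Selects all child elements of div.
xpath('/div[@*]')       Selects div nodes that have at least one attribute.

'''
ret = selector.xpath("//p[@class='story']/*")
ret = selector.xpath("//p[@class='story']/a[@class]")
'''

4. Selecting multiple paths

The "|" operator combines several path expressions into one result.

Expression                  Result
xpath('//div|//table')      Selects all div and table nodes.

'''
ret = selector.xpath("//p[@class='story']/a[@class]|//div[@class='d3']")
print(ret)
'''

5. XPath axes

An axis defines a node set relative to the current node.

Axis                 Expression                          Description
ancestor             xpath('./ancestor::*')              Selects all ancestors of the current node (parent, grandparent, and so on).
ancestor-or-self     xpath('./ancestor-or-self::*')      Selects all ancestors of the current node and the node itself.
attribute            xpath('./attribute::*')             Selects all attributes of the current node.
child                xpath('./child::*')                 Selects all children of the current node.
descendant           xpath('./descendant::*')            Selects all descendants of the current node (children, grandchildren, and so on).
following            xpath('./following::*')             Selects all nodes in the document after the current node's end tag.
following-sibling    xpath('./following-sibling::*')     Selects the siblings that come after the current node.
parent               xpath('./parent::*')                Selects the parent of the current node.
preceding            xpath('./preceding::*')             Selects all nodes in the document before the current node's start tag (ancestors excluded).
preceding-sibling    xpath('./preceding-sibling::*')     Selects the siblings that come before the current node.
self                 xpath('./self::*')                  Selects the current node.

6. Functions

Functions make partial (fuzzy) matching possible.

Function       Usage                                                          Explanation
starts-with    xpath('//div[starts-with(@id,"ma")]')                          Selects div nodes whose id starts with ma.
contains       xpath('//div[contains(@id,"ma")]')                             Selects div nodes whose id contains ma.
and            xpath('//div[contains(@id,"ma") and contains(@id,"in")]')      Selects div nodes whose id contains both ma and in.
text()         xpath('//div[contains(text(),"ma")]')                          Selects div nodes whose text contains ma.

The Element object

class xml.etree.ElementTree.Element(tag, attrib={}, **extra)

tag: string, the kind of data the element represents.
text: string, the text content of the element.
tail: string, the text that follows the element's end tag.
attrib: dictionary of the element's attributes.

# Operations on attributes
clear(): removes all subelements and attributes, and sets text and tail to None.
get(key, default=None): returns the value of the attribute key, or default if the attribute does not exist.
items(): returns the attributes as a list of (key, value) pairs.
keys(): returns a list of the element's attribute names.
set(key, value): sets the attribute key to value.

# Operations on descendants
append(subelement): adds a direct child element.
extend(subelements): adds a sequence of elements as children. (New in Python 2.7.)
find(match): returns the first matching subelement; match may be a tag name or a path.
findall(match): returns all matching subelements; match may be a tag name or a path.
findtext(match): returns the text of the first matching subelement; match may be a tag name or a path.
insert(index, element): inserts a child element at the given position.
iter(tag=None): returns an iterator over the element and all of its descendants, or only those with the given tag. (New in Python 2.7.)
iterfind(match): iterates over all subelements matching a tag name or path.
itertext(): iterates over the element and its descendants and yields their text.
remove(subelement): removes a child element.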
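The Element attributes and methods listed above can be tried with the standard library alone. This is a small sketch using xml.etree.ElementTree; the catalog, book, and note elements and their attribute values are invented for illustration.

```python
import xml.etree.ElementTree as ET

# text is the content before the first child; tail is the text that follows
# an element's end tag.
para = ET.fromstring("<p>Hello <b>big</b> world</p>")
bold = para.find("b")
text_parts = (para.text, bold.text, bold.tail)
joined = "".join(para.itertext())

# Build a small hypothetical tree by hand.
root = ET.Element("catalog")
book = ET.Element("book", {"id": "b1"}, price="39.5")  # extra keyword becomes an attribute
book.text = "XPath basics"
root.append(book)
root.extend([ET.Element("book", {"id": "b2"}), ET.Element("note")])
root.insert(0, ET.Element("title"))

# Operations on attributes
book.set("stock", "3")
price = book.get("price")
missing = book.get("isbn", "n/a")  # default is returned for a missing attribute
attr_names = sorted(book.keys())

# Operations on descendants
first_id = root.find("book").get("id")
all_ids = [b.get("id") for b in root.findall("book")]
iter_ids = [b.get("id") for b in root.iterfind("book")]
first_text = root.findtext("book")
all_tags = [e.tag for e in root.iter()]  # iter() starts with the element itself

root.remove(root.find("note"))
remaining = [child.tag for child in root]

# clear() drops attributes and children and resets text and tail.
book.clear()

print(text_parts, joined)
print(price, missing, attr_names, all_ids, all_tags, remaining)
```

find, findall, findtext, and iterfind in the standard library accept only a limited subset of XPath; for axes and functions such as contains(), lxml's xpath() method is the usual choice.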

Example: crawling Lianjia

import json
from concurrent.futures import ThreadPoolExecutor

import requests
from lxml import etree


def get_page(url):
    # Send a browser-like User-Agent so the request is not rejected outright.
    headers = {"User-Agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/71.0.3578.98 Safari/537.36"}
    response = requests.get(url, headers=headers)
    return response.text


def page_parse(response):
    tree = etree.HTML(response)
    title_list = tree.xpath('//div[@class="title"]/a/text()')
    detail_list = tree.xpath('//div[@class="houseInfo"]/text()')
    # string(.) joins the text of the div and all of its descendants.
    price_list = [price.xpath('string(.)') for price in tree.xpath('//div[@class="totalPrice"]')]
    msg_list = []
    # zip stops at the shortest list, so a missing field cannot raise IndexError.
    for title, detail, price in zip(title_list, detail_list, price_list):
        msg_list.append({"title": title, "detail": detail, "price": price})
    return msg_list


def save_data(argv):
    # Append one JSON object per line.
    with open("lianjie.txt", 'a', encoding='utf-8') as f:
        for i in argv:
            f.write(json.dumps(i, ensure_ascii=False) + "\n")


def mycrawler():
    urls = ["https://sz.lianjia.com/ershoufang/pg%srs南山區/" % i for i in range(1, 11)]
    with ThreadPoolExecutor(5) as p:
        # Submit every page first, then collect the results, so that the
        # downloads actually run concurrently.
        futures = [p.submit(get_page, url) for url in urls]
        pages = [future.result() for future in futures]
    for k in pages:
        msg_list = page_parse(k)
        save_data(msg_list)
    print("done")


def main():
    mycrawler()


if __name__ == '__main__':
    main()
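The crawler's page_parse uses both text() and string(.). The difference can be checked offline with a sketch like the one below, which assumes lxml is installed. The sample markup is hypothetical: it only reuses the class names from the crawler, and the li container per listing is an assumption, so the real Lianjia page may be structured differently. The function parse_listings is likewise an illustrative alternative and is not part of the original code.

```python
from lxml import etree

# Hypothetical markup; the second listing has no houseInfo div on purpose.
SAMPLE_HTML = """
<ul>
  <li>
    <div class="title"><a href="#">Flat A</a></div>
    <div class="houseInfo">2 rooms | 80 sqm</div>
    <div class="totalPrice"><span>500</span> wan</div>
  </li>
  <li>
    <div class="title"><a href="#">Flat B</a></div>
    <div class="totalPrice"><span>650</span> wan</div>
  </li>
</ul>
"""

tree = etree.HTML(SAMPLE_HTML)

# text() returns only the text nodes that are direct children of the div,
# so the number inside <span> is lost.
direct_text = tree.xpath('//div[@class="totalPrice"]/text()')

# string(.) concatenates the text of the div and all of its descendants.
full_text = [div.xpath('string(.)') for div in tree.xpath('//div[@class="totalPrice"]')]


def parse_listings(html):
    """Extract one dict per <li>, so fields from different listings never mix."""
    listings = []
    for item in etree.HTML(html).xpath('//li'):
        listings.append({
            # string() of an empty node-set is '', so a missing field is kept
            # as an empty string instead of shifting the later entries.
            "title": item.xpath('string(.//div[@class="title"]/a)'),
            "detail": item.xpath('string(.//div[@class="houseInfo"])'),
            "price": item.xpath('string(.//div[@class="totalPrice"])'),
        })
    return listings


listings = parse_listings(SAMPLE_HTML)
print(direct_text, full_text)
print(listings)
```

Querying relative to each listing element (note the leading "." in the paths) keeps title, detail, and price aligned even when one listing lacks a field, which page-wide lists cannot guarantee.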

Reposted from: https://www.cnblogs.com/st-st/p/10307739.html
