Parsing XML files in Python: the SAX approach

Published: 2024/1/17 · python · 豆豆

Environment

Python: 3.4.4

Prepare the XML file

First, create an XML file named countries.xml. Its content comes from the official Python documentation.

<?xml version="1.0"?>
<data>
    <country name="Liechtenstein">
        <rank>1</rank>
        <year>2008</year>
        <gdppc>141100</gdppc>
        <neighbor name="Austria" direction="E"/>
        <neighbor name="Switzerland" direction="W"/>
    </country>
    <country name="Singapore">
        <rank>4</rank>
        <year>2011</year>
        <gdppc>59900</gdppc>
        <neighbor name="Malaysia" direction="N"/>
    </country>
    <country name="Panama">
        <rank>68</rank>
        <year>2011</year>
        <gdppc>13600</gdppc>
        <neighbor name="Costa Rica" direction="W"/>
        <neighbor name="Colombia" direction="E"/>
    </country>
</data>

Prepare the Python file

Create test_SAX.py to parse the XML file.

#!/usr/bin/python
# -*- coding: UTF-8 -*-

import xml.sax


class CountryHandler(xml.sax.ContentHandler):
    def __init__(self):
        super().__init__()
        self.CurrentData = ""        # name of the element currently open
        self.CurrentAttributes = ""
        self.rank = ""
        self.year = ""
        self.gdppc = ""
        self.nei_name = ""
        self.nei_dire = ""

    # Called for every start tag
    def startElement(self, tag, attributes):
        self.CurrentData = tag
        self.CurrentAttributes = attributes
        if tag == "country":
            print("*****Country*****")
            name = attributes["name"]
            print("Name:", name)
        if tag == "neighbor":
            self.nei_name = attributes["name"]
            self.nei_dire = attributes["direction"]

    # Called for every end tag
    def endElement(self, tag):
        if self.CurrentData == "rank":
            print("Rank:", self.rank)
        elif self.CurrentData == "year":
            print("Year:", self.year)
        elif self.CurrentData == "gdppc":
            print("Gdppc:", self.gdppc)
        elif self.CurrentData == "neighbor":
            print("Neighbor:", self.nei_name, self.nei_dire)
        self.CurrentData = ""
        self.nei_name = ""
        self.nei_dire = ""

    # Called for character data inside an element
    def characters(self, content):
        if self.CurrentData == "rank":
            self.rank = content
        elif self.CurrentData == "year":
            self.year = content
        elif self.CurrentData == "gdppc":
            self.gdppc = content


if __name__ == "__main__":
    # Create an XMLReader
    parser = xml.sax.make_parser()
    # Turn off namespace mode
    parser.setFeature(xml.sax.handler.feature_namespaces, 0)

    # Install our ContentHandler and parse the file
    Handler = CountryHandler()
    parser.setContentHandler(Handler)
    parser.parse("countries.xml")

Output

>python test_SAX.py
*****Country*****
Name: Liechtenstein
Rank: 1
Year: 2008
Gdppc: 141100
Neighbor: Austria E
Neighbor: Switzerland W
*****Country*****
Name: Singapore
Rank: 4
Year: 2011
Gdppc: 59900
Neighbor: Malaysia N
*****Country*****
Name: Panama
Rank: 68
Year: 2011
Gdppc: 13600
Neighbor: Costa Rica W
Neighbor: Colombia E

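Notes

SAX is an event-driven API.

SAX involves three main kinds of objects: readers, handlers, and input sources, that is, the parser, the event handlers, and the input source.

The reader reads the input source, such as an XML document, and sends events to the handler, such as element-start and element-end events.

The handler responds to those events and processes the data in the XML document.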
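To make the reader/handler split concrete, here is a minimal sketch that records the events a reader sends while parsing a small in-memory document. The EventLogger class and the SAMPLE document are illustrative names, not part of the original script.

```python
import xml.sax


class EventLogger(xml.sax.ContentHandler):
    """Record every SAX event the reader sends, in document order."""

    def __init__(self):
        super().__init__()
        self.events = []

    def startElement(self, name, attrs):
        self.events.append(("start", name))

    def endElement(self, name):
        self.events.append(("end", name))

    def characters(self, content):
        # Skip whitespace-only chunks to keep the log short.
        if content.strip():
            self.events.append(("text", content))


# Hypothetical sample: a fragment in the style of countries.xml.
SAMPLE = b"<data><rank>1</rank><year>2008</year></data>"

logger = EventLogger()
xml.sax.parseString(SAMPLE, logger)
for event in logger.events:
    print(event)
```

The events arrive in the same order as the tags appear in the document, which is why the handler in test_SAX.py can track state with a single "current element" field.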

parser = xml.sax.make_parser()

Creates and returns a SAX XMLReader object.

See: https://docs.python.org/2/library/xml.sax.html

xml.sax.make_parser([parser_list]) Create and return a SAX XMLReader object. The first parser found will be used. If parser_list is provided, it must be a sequence of strings which name modules that have a function named create_parser(). Modules listed in parser_list will be used before modules in the default list of parsers.

parser.setFeature(xml.sax.handler.feature_namespaces, 0)

Sets xml.sax.handler.feature_namespaces to 0, which turns namespace mode off.

See: https://docs.python.org/2/library/xml.sax.reader.html

XMLReader.setFeature(featurename, value) Set the featurename to value. If the feature is not recognized, SAXNotRecognizedException is raised. If the feature or its setting is not supported by the parser, SAXNotSupportedException is raised.
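As a rough illustration of what this feature changes, the sketch below parses the same namespaced document twice. With the feature off, the reader calls startElement() with the raw name; with it on, it calls startElementNS() with a (uri, localname) pair instead. The NameRecorder class, the parse_with_namespaces() helper, and the example.com namespace URI are made up for this example.

```python
import io
import xml.sax
from xml.sax.handler import feature_namespaces

# Hypothetical document using a namespace prefix.
SAMPLE = b'<g:data xmlns:g="http://example.com/geo"><g:country/></g:data>'


class NameRecorder(xml.sax.ContentHandler):
    """Remember which start-element callback was used and with what name."""

    def __init__(self):
        super().__init__()
        self.plain = []        # names seen by startElement()
        self.namespaced = []   # names seen by startElementNS()

    def startElement(self, name, attrs):
        self.plain.append(name)

    def startElementNS(self, name, qname, attrs):
        self.namespaced.append(name)


def parse_with_namespaces(enabled):
    parser = xml.sax.make_parser()
    parser.setFeature(feature_namespaces, enabled)
    recorder = NameRecorder()
    parser.setContentHandler(recorder)
    parser.parse(io.BytesIO(SAMPLE))
    return recorder


off = parse_with_namespaces(False)
on = parse_with_namespaces(True)
print("off:", off.plain)
print("on: ", on.namespaced)
```

Since countries.xml uses no namespaces, turning the feature off lets the handler work with plain tag names such as "country".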

class CountryHandler( xml.sax.ContentHandler )

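The SAX API defines four kinds of handlers: content handlers, DTD handlers, error handlers, and entity resolvers.

A program only needs to implement the interfaces for the events it cares about. Here, for example, we implement only some of the methods of the ContentHandler interface.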

class xml.sax.handler.ContentHandler This is the main callback interface in SAX, and the one most important to applications. The order of events in this interface mirrors the order of the information in the document.
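The other handler types plug in the same way. As one example, this sketch installs a custom error handler and feeds the parser a deliberately malformed document. CollectingErrorHandler and BROKEN are illustrative names, and the handler re-raises the exception so that parsing still stops.

```python
import xml.sax
from xml.sax.handler import ContentHandler, ErrorHandler

# Deliberately malformed: <rank> is never closed.
BROKEN = b"<data><rank>1</data>"


class CollectingErrorHandler(ErrorHandler):
    """Record fatal parse errors before letting them propagate."""

    def __init__(self):
        self.messages = []

    def fatalError(self, exception):
        self.messages.append(
            "line %d: %s" % (exception.getLineNumber(), exception.getMessage())
        )
        raise exception


errors = CollectingErrorHandler()
failed = False
try:
    # A bare ContentHandler ignores all content events.
    xml.sax.parseString(BROKEN, ContentHandler(), errors)
except xml.sax.SAXParseException:
    failed = True

print(failed, errors.messages)
```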

ContentHandler has many methods. For details, see: https://docs.python.org/2/library/xml.sax.handler.html#contenthandler-objects

Here we first define a CountryHandler class that inherits from xml.sax.ContentHandler, and then implement its startElement(), endElement(), and characters() methods.

def startElement(self, tag, attributes)

Called when an XML start tag is encountered. tag is the element name, and attributes is a dictionary-like Attributes object holding the tag's attributes.

Signals the start of an element in non-namespace mode. The name parameter contains the raw XML 1.0 name of the element type as a string and the attrs parameter holds an object of the Attributes interface (see The Attributes Interface) containing the attributes of the element. The object passed as attrs may be re-used by the parser; holding on to a reference to it is not a reliable way to keep a copy of the attributes. To keep a copy of the attributes, use the copy() method of the attrs object.
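Following the advice in the quoted documentation, a handler that wants to keep attributes beyond the callback can store attrs.copy() rather than attrs itself. The sketch below assumes a hypothetical NeighborCollector handler and a single-country SAMPLE fragment.

```python
import xml.sax
from xml.sax.handler import ContentHandler

# Hypothetical fragment in the style of countries.xml.
SAMPLE = (
    b'<country name="Liechtenstein">'
    b'<neighbor name="Austria" direction="E"/>'
    b'<neighbor name="Switzerland" direction="W"/>'
    b'</country>'
)


class NeighborCollector(ContentHandler):
    """Keep the attributes of every <neighbor> element."""

    def __init__(self):
        super().__init__()
        self.neighbors = []

    def startElement(self, name, attrs):
        if name == "neighbor":
            # Store a copy: the parser is allowed to re-use attrs.
            self.neighbors.append(attrs.copy())


collector = NeighborCollector()
xml.sax.parseString(SAMPLE, collector)
pairs = [(n["name"], n["direction"]) for n in collector.neighbors]
print(pairs)
```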

def endElement(self, tag)

Called when an XML end tag is encountered. tag is the element name.

Signals the end of an element in non-namespace mode.
The name parameter contains the name of the element type, just as with the startElement() event.

def characters(self, content)

Called when character data inside an element is encountered. content is the text of that data.

Receive notification of character data. The Parser will call this method to report each chunk of character data. SAX parsers may return all contiguous character data in a single chunk, or they may split it into several chunks; however, all of the characters in any single event must come from the same external entity so that the Locator provides useful information. content may be a Unicode string or a byte string; the expat reader module produces always Unicode strings.
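Because the text of one element may be delivered in several chunks, assigning content directly (as in self.rank = content) can keep only the last chunk. A common alternative, sketched here with a hypothetical TextCollector class and sample document, is to append each chunk to a buffer and join the pieces in endElement().

```python
import xml.sax
from xml.sax.handler import ContentHandler

# Hypothetical sample; text containing an entity reference such as &amp;
# is a typical case where a parser may deliver the text in pieces.
SAMPLE = b"<country><name>Trinidad &amp; Tobago</name><rank>68</rank></country>"


class TextCollector(ContentHandler):
    """Collect the full text of each leaf element, however it is chunked."""

    def __init__(self):
        super().__init__()
        self._chunks = []
        self.values = {}

    def startElement(self, name, attrs):
        self._chunks = []              # start a fresh buffer per element

    def characters(self, content):
        self._chunks.append(content)   # may be called more than once

    def endElement(self, name):
        text = "".join(self._chunks).strip()
        if text:
            self.values[name] = text
        self._chunks = []


collector = TextCollector()
xml.sax.parseString(SAMPLE, collector)
print(collector.values)
```

The joined result is the same whether the parser delivers the text in one chunk or in many.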

parser.setContentHandler( Handler )

Sets the current ContentHandler to an instance of our own handler. If no ContentHandler is set, content events are discarded.

See: https://docs.python.org/2/library/xml.sax.reader.html

XMLReader.setContentHandler(handler) Set the current ContentHandler. If no ContentHandler is set, content events will be discarded.

parser.parse("countries.xml")

Starts parsing the XML file.

See: https://docs.python.org/2/library/xml.sax.reader.html

Process an input source, producing SAX events. The source object can be a system identifier (a string identifying the input source – typically a file name or an URL), a file-like object, or an InputSource object. When parse() returns, the input is completely processed, and the parser object can be discarded or reset. As a limitation, the current implementation only accepts byte streams; processing of character streams is for further study.
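As the quoted text says, parse() is not limited to file names. The sketch below passes a file-like object and then an InputSource wrapping a byte stream. The CountryNames handler, the country_names() helper, and the XML_BYTES data are assumptions made for the example.

```python
import io
import xml.sax
from xml.sax.handler import ContentHandler
from xml.sax.xmlreader import InputSource

# Hypothetical in-memory document in the style of countries.xml.
XML_BYTES = (
    b'<data>'
    b'<country name="Liechtenstein"/>'
    b'<country name="Singapore"/>'
    b'</data>'
)


class CountryNames(ContentHandler):
    """Collect the name attribute of every <country> element."""

    def __init__(self):
        super().__init__()
        self.names = []

    def startElement(self, name, attrs):
        if name == "country":
            self.names.append(attrs["name"])


def country_names(source):
    parser = xml.sax.make_parser()
    handler = CountryNames()
    parser.setContentHandler(handler)
    parser.parse(source)
    return handler.names


# 1. A file-like object that yields bytes.
from_stream = country_names(io.BytesIO(XML_BYTES))

# 2. An InputSource wrapping a byte stream.
source = InputSource()
source.setByteStream(io.BytesIO(XML_BYTES))
from_input_source = country_names(source)

print(from_stream, from_input_source)
```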


Reposted from: https://www.cnblogs.com/miniren/p/5091744.html
