xml.etree.ElementTree --- The ElementTree XML API¶
Source code: Lib/xml/etree/ElementTree.py
The xml.etree.ElementTree module implements a simple and efficient API
for parsing and creating XML data.
Changed in version 3.3: This module will use a fast implementation whenever available.
Deprecated since version 3.3: The xml.etree.cElementTree module is deprecated.
Note
If you need to parse untrusted or unauthenticated data, see XML security.
Tutorial¶
This is a short tutorial for using xml.etree.ElementTree (ET in
short). The goal is to demonstrate some of the building blocks and basic
concepts of the module.
XML tree and elements¶
XML is an inherently hierarchical data format, and the most natural way to
represent it is with a tree. ET has two classes for this purpose -
ElementTree represents the whole XML document as a tree, and
Element represents a single node in this tree. Interactions with
the whole document (reading and writing to/from files) are usually done
on the ElementTree level. Interactions with a single XML element
and its sub-elements are done on the Element level.
Parsing XML¶
We'll be using the fictive country_data.xml XML document as the sample data for this section:
<?xml version="1.0"?>
<data>
<country name="Liechtenstein">
<rank>1</rank>
<year>2008</year>
<gdppc>141100</gdppc>
<neighbor name="Austria" direction="E"/>
<neighbor name="Switzerland" direction="W"/>
</country>
<country name="Singapore">
<rank>4</rank>
<year>2011</year>
<gdppc>59900</gdppc>
<neighbor name="Malaysia" direction="N"/>
</country>
<country name="Panama">
<rank>68</rank>
<year>2011</year>
<gdppc>13600</gdppc>
<neighbor name="Costa Rica" direction="W"/>
<neighbor name="Colombia" direction="E"/>
</country>
</data>
We can import this data by reading from a file:
import xml.etree.ElementTree as ET
tree = ET.parse('country_data.xml')
root = tree.getroot()
Or directly from a string:
root = ET.fromstring(country_data_as_string)
fromstring() parses XML from a string directly into an Element,
which is the root element of the parsed tree. Other parsing functions may
create an ElementTree. Check the documentation to be sure.
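The difference in return types can be sketched as follows. This is a minimal illustration, assuming a small inline document (the `xml_text` name is made up here) and an in-memory file object standing in for a file on disk:

```python
import io
import xml.etree.ElementTree as ET

xml_text = "<data><country name='Panama'/></data>"  # illustrative sample

# fromstring() returns the root Element directly.
root_from_string = ET.fromstring(xml_text)

# parse() returns an ElementTree wrapper; getroot() reaches the root Element.
tree = ET.parse(io.StringIO(xml_text))
root_from_file = tree.getroot()

print(type(root_from_string).__name__)
print(type(tree).__name__)
```

Code that accepts the result of either function may want to normalise early, for example by calling getroot() whenever it receives an ElementTree.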
As an Element, root has a tag and a dictionary of attributes:
>>> root.tag
'data'
>>> root.attrib
{}
It also has children nodes over which we can iterate:
>>> for child in root:
...     print(child.tag, child.attrib)
...
country {'name': 'Liechtenstein'}
country {'name': 'Singapore'}
country {'name': 'Panama'}
Children are nested, and we can access specific child nodes by index:
>>> root[0][1].text
'2008'
Note
Not all elements of the XML input will end up as elements of the
parsed tree. Currently, this module skips over any XML comments,
processing instructions, and document type declarations in the
input. Nevertheless, trees built using this module's API rather
than parsing from XML text can have comments and processing
instructions in them; they will be included when generating XML
output. A document type declaration may be accessed by passing a
custom TreeBuilder instance to the XMLParser
constructor.
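As a rough sketch of the custom TreeBuilder mechanism mentioned in the note, the example below uses the `insert_comments` option of TreeBuilder (available from Python 3.8) to keep comments while parsing. The sample document is made up for illustration, and this shows only the comment case, not document type declarations:

```python
import xml.etree.ElementTree as ET

xml_text = "<root><!-- a comment --><item/></root>"  # illustrative sample

# Default parser: the comment is dropped, so only <item> remains.
plain_root = ET.fromstring(xml_text)

# A TreeBuilder created with insert_comments=True keeps comments in the tree.
parser = ET.XMLParser(target=ET.TreeBuilder(insert_comments=True))
root_with_comments = ET.fromstring(xml_text, parser=parser)

print(len(plain_root), len(root_with_comments))
```

Comment nodes can be recognised by comparing their tag with the ET.Comment factory function.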
Pull API for non-blocking parsing¶
Most parsing functions provided by this module require the whole document
to be read at once before returning any result. It is possible to use an
XMLParser and feed data into it incrementally, but it is a push API that
calls methods on a callback target, which is too low-level and inconvenient for
most needs. Sometimes what the user really wants is to be able to parse XML
incrementally, without blocking operations, while enjoying the convenience of
fully constructed Element objects.
The most powerful tool for doing this is XMLPullParser. It does not
require a blocking read to obtain the XML data, and is instead fed with data
incrementally with XMLPullParser.feed() calls. To get the parsed XML
elements, call XMLPullParser.read_events(). Here is an example:
>>> parser = ET.XMLPullParser(['start', 'end'])
>>> parser.feed('<mytag>sometext')
>>> list(parser.read_events())
[('start', <Element 'mytag' at 0x7fa66db2be58>)]
>>> parser.feed(' more text</mytag>')
>>> for event, elem in parser.read_events():
...     print(event)
...     print(elem.tag, 'text=', elem.text)
...
end
mytag text= sometext more text
The obvious use case is applications that operate in a non-blocking fashion where the XML data is being received from a socket or read incrementally from some storage device. In such cases, blocking reads are unacceptable.
Because it's so flexible, XMLPullParser can be inconvenient to use for
simpler use-cases. If you don't mind your application blocking on reading XML
data but would still like to have incremental parsing capabilities, take a look
at iterparse(). It can be useful when you're reading a large XML document
and don't want to hold it wholly in memory.
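A minimal sketch of that iterparse() pattern follows, assuming an in-memory document in place of a large file. Each country element is processed on its "end" event and then cleared so its contents can be released:

```python
import io
import xml.etree.ElementTree as ET

# Illustrative in-memory document; a real use case would pass a filename
# or an open file object for a large document.
source = io.BytesIO(
    b"<data>"
    b"<country name='Liechtenstein'><rank>1</rank></country>"
    b"<country name='Singapore'><rank>4</rank></country>"
    b"</data>"
)

names = []
for event, elem in ET.iterparse(source, events=("end",)):
    if elem.tag == "country":
        names.append(elem.get("name"))
        elem.clear()  # drop children, text and attributes once processed

print(names)
```

Note that clear() also removes attributes, so anything needed from the element has to be read before calling it.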
Where immediate feedback through events is wanted, calling method
XMLPullParser.flush() can help reduce delay;
please make sure to study the related security notes.
Finding interesting elements¶
Element has some useful methods that help iterate recursively over all
the sub-tree below it (its children, their children, and so on). For example,
Element.iter():
>>> for neighbor in root.iter('neighbor'):
...     print(neighbor.attrib)
...
{'name': 'Austria', 'direction': 'E'}
{'name': 'Switzerland', 'direction': 'W'}
{'name': 'Malaysia', 'direction': 'N'}
{'name': 'Costa Rica', 'direction': 'W'}
{'name': 'Colombia', 'direction': 'E'}
Element.findall() finds only elements with a tag which are direct
children of the current element. Element.find() finds the first child
with a particular tag, and Element.text accesses the element's text
content. Element.get() accesses the element's attributes:
>>> for country in root.findall('country'):
...     rank = country.find('rank').text
...     name = country.get('name')
...     print(name, rank)
...
Liechtenstein 1
Singapore 4
Panama 68
More sophisticated specification of which elements to look for is possible by using XPath.
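One detail worth keeping in mind is that Element.find() returns None when nothing matches. The sketch below, which uses a hypothetical entry without a rank child, shows Element.findtext() with a default value as one way to handle that case:

```python
import xml.etree.ElementTree as ET

root = ET.fromstring(
    "<data>"
    "<country name='Liechtenstein'><rank>1</rank></country>"
    "<country name='Atlantis'/>"  # hypothetical entry with no <rank> child
    "</data>"
)

ranks = {}
for country in root.findall("country"):
    # find('rank') would return None for the second entry, so .text on it
    # would raise AttributeError; findtext() falls back to the default.
    ranks[country.get("name")] = country.findtext("rank", default="n/a")

print(ranks)
```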
Modifying an XML File¶
ElementTree provides a simple way to build XML documents and write them to files.
The ElementTree.write() method serves this purpose.
Once created, an Element object may be manipulated by directly changing
its fields (such as Element.text), adding and modifying attributes
(Element.set() method), as well as adding new children (for example
with Element.append()).
Let's say we want to add one to each country's rank, and add an updated
attribute to the rank element:
>>> for rank in root.iter('rank'):
...     new_rank = int(rank.text) + 1
...     rank.text = str(new_rank)
...     rank.set('updated', 'yes')
...
>>> tree.write('output.xml')
Our XML now looks like this:
<?xml version="1.0"?>
<data>
<country name="Liechtenstein">
<rank updated="yes">2</rank>
<year>2008</year>
<gdppc>141100</gdppc>
<neighbor name="Austria" direction="E"/>
<neighbor name="Switzerland" direction="W"/>
</country>
<country name="Singapore">
<rank updated="yes">5</rank>
<year>2011</year>
<gdppc>59900</gdppc>
<neighbor name="Malaysia" direction="N"/>
</country>
<country name="Panama">
<rank updated="yes">69</rank>
<year>2011</year>
<gdppc>13600</gdppc>
<neighbor name="Costa Rica" direction="W"/>
<neighbor name="Colombia" direction="E"/>
</country>
</data>
We can remove elements using Element.remove(). Let's say we want to
remove all countries with a rank higher than 50:
>>> for country in root.findall('country'):
...     # using root.findall() to avoid removal during traversal
...     rank = int(country.find('rank').text)
...     if rank > 50:
...         root.remove(country)
...
>>> tree.write('output.xml')
Note that concurrent modification while iterating can lead to problems,
just like when iterating and modifying Python lists or dicts.
Therefore, the example first collects all matching elements with
root.findall(), and only then iterates over the list of matches.
Our XML now looks like this:
<?xml version="1.0"?>
<data>
<country name="Liechtenstein">
<rank updated="yes">2</rank>
<year>2008</year>
<gdppc>141100</gdppc>
<neighbor name="Austria" direction="E"/>
<neighbor name="Switzerland" direction="W"/>
</country>
<country name="Singapore">
<rank updated="yes">5</rank>
<year>2011</year>
<gdppc>59900</gdppc>
<neighbor name="Malaysia" direction="N"/>
</country>
</data>
Building XML documents¶
The SubElement() function also provides a convenient way to create new
sub-elements for a given element:
>>> a = ET.Element('a')
>>> b = ET.SubElement(a, 'b')
>>> c = ET.SubElement(a, 'c')
>>> d = ET.SubElement(c, 'd')
>>> ET.dump(a)
<a><b /><c><d /></c></a>
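Attributes and text can be supplied while building as well. The following sketch assembles a small document with made-up content, passing attributes both as keyword arguments and as a dictionary, and then serializes it to a string:

```python
import xml.etree.ElementTree as ET

root = ET.Element("data")
country = ET.SubElement(root, "country", name="Liechtenstein")
rank = ET.SubElement(country, "rank")
rank.text = "1"
# Attributes may come from a dict, from keyword arguments, or from both.
neighbor = ET.SubElement(country, "neighbor", {"name": "Austria"}, direction="E")

xml_string = ET.tostring(root, encoding="unicode")
print(xml_string)
```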
Parsing XML with Namespaces¶
If the XML input has namespaces, tags and attributes
with prefixes in the form prefix:sometag get expanded to
{uri}sometag where the prefix is replaced by the full URI.
Also, if there is a default namespace,
that full URI gets prepended to all of the non-prefixed tags.
Here is an XML example that incorporates two namespaces, one with the prefix "fictional" and the other serving as the default namespace:
<?xml version="1.0"?>
<actors xmlns:fictional="http://characters.example.com"
xmlns="http://people.example.com">
<actor>
<name>John Cleese</name>
<fictional:character>Lancelot</fictional:character>
<fictional:character>Archie Leach</fictional:character>
</actor>
<actor>
<name>Eric Idle</name>
<fictional:character>Sir Robin</fictional:character>
<fictional:character>Gunther</fictional:character>
<fictional:character>Commander Clement</fictional:character>
</actor>
</actors>
One way to search and explore this XML example is to manually add the
URI to every tag or attribute in the xpath of a
find() or findall():
root = ET.fromstring(xml_text)
for actor in root.findall('{http://people.example.com}actor'):
    name = actor.find('{http://people.example.com}name')
    print(name.text)
    for char in actor.findall('{http://characters.example.com}character'):
        print(' |-->', char.text)
A better way to search the namespaced XML example is to create a dictionary with your own prefixes and use those in the search functions:
ns = {'real_person': 'http://people.example.com',
'role': 'http://characters.example.com'}
for actor in root.findall('real_person:actor', ns):
    name = actor.find('real_person:name', ns)
    print(name.text)
    for char in actor.findall('role:character', ns):
        print(' |-->', char.text)
These two approaches both output:
John Cleese
|--> Lancelot
|--> Archie Leach
Eric Idle
|--> Sir Robin
|--> Gunther
|--> Commander Clement
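To see that both spellings address the same elements, here is a self-contained sketch using a shortened copy of the sample document. It prints the expanded root tag and compares the results of a prefixed search with an explicit-URI search:

```python
import xml.etree.ElementTree as ET

xml_text = """<?xml version="1.0"?>
<actors xmlns:fictional="http://characters.example.com"
        xmlns="http://people.example.com">
    <actor>
        <name>John Cleese</name>
        <fictional:character>Lancelot</fictional:character>
    </actor>
</actors>"""

root = ET.fromstring(xml_text)

ns = {"real_person": "http://people.example.com",
      "role": "http://characters.example.com"}

# Tags are stored in expanded {uri}name form.
print(root.tag)

# The prefix mapping and the explicit-URI spelling find the same elements.
by_prefix = root.findall("real_person:actor/role:character", ns)
by_uri = root.findall(
    "{http://people.example.com}actor/{http://characters.example.com}character"
)
print([c.text for c in by_prefix])
```

The prefixes in the dictionary are local to the search and do not need to match the prefixes used in the document.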
XPath support¶
This module provides limited support for XPath expressions for locating elements in a tree. The goal is to support a small subset of the abbreviated syntax; a full XPath engine is outside the scope of the module.
Example¶
Here's an example that demonstrates some of the XPath capabilities of the
module. We'll be using the countrydata XML document from the
Parsing XML section:
import xml.etree.ElementTree as ET
root = ET.fromstring(countrydata)
# Top-level elements
root.findall(".")
# All 'neighbor' grand-children of 'country' children of the top-level
# elements
root.findall("./country/neighbor")
# Nodes with name='Singapore' that have a 'year' child
root.findall(".//year/..[@name='Singapore']")
# 'year' nodes that are children of nodes with name='Singapore'
root.findall(".//*[@name='Singapore']/year")
# All 'neighbor' nodes that are the second child of their parent
root.findall(".//neighbor[2]")
For XML with namespaces, use the usual qualified {namespace}tag notation:
# All dublin-core "title" tags in the document
root.findall(".//{http://purl.org/dc/elements/1.1/}title")
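For readers who want to run some of these queries, the sketch below applies three of them to a shortened, illustrative copy of the country data and collects what each one returns:

```python
import xml.etree.ElementTree as ET

# Shortened, illustrative version of the country data.
countrydata = """
<data>
    <country name="Liechtenstein">
        <year>2008</year>
        <neighbor name="Austria" direction="E"/>
        <neighbor name="Switzerland" direction="W"/>
    </country>
    <country name="Singapore">
        <year>2011</year>
        <neighbor name="Malaysia" direction="N"/>
    </country>
</data>
"""
root = ET.fromstring(countrydata)

# Second <neighbor> of each parent (Singapore has only one, so it yields none).
second_neighbors = [n.get("name") for n in root.findall(".//neighbor[2]")]

# <year> children of nodes with name='Singapore'.
singapore_years = [y.text for y in root.findall(".//*[@name='Singapore']/year")]

# Parents of <year> nodes, filtered by their name attribute.
parents = [c.get("name") for c in root.findall(".//year/..[@name='Singapore']")]

print(second_neighbors, singapore_years, parents)
```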
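Supported XPath syntax¶
| Syntax | Meaning |
|---|---|
| tag | Selects all child elements with the given tag. For example, spam selects all child elements named spam, and spam/egg selects all grandchildren named egg in all children named spam. {namespace}* selects all tags in the given namespace, {*}spam selects tags named spam in any (or no) namespace, and {}* only selects tags that are not in a namespace. Changed in version 3.8: Support for star-wildcards was added. |
| * | Selects all child elements, including comments and processing instructions. For example, */egg selects all grandchildren named egg. |
| . | Selects the current node. This is mostly useful at the beginning of the path, to indicate that it's a relative path. |
| // | Selects all subelements, on all levels beneath the current element. For example, .//egg selects all egg elements in the entire tree. |
| .. | Selects the parent element. Returns None if the path attempts to reach the ancestors of the start element (the element find was called on). |
| [@attrib] | Selects all elements that have the given attribute. |
| [@attrib='value'] | Selects all elements for which the given attribute has the given value. The value cannot contain quotes. |
| [@attrib!='value'] | Selects all elements for which the given attribute does not have the given value. The value cannot contain quotes. Added in version 3.10. |
| [tag] | Selects all elements that have a child named tag. Only immediate children are supported. |
| [.='text'] | Selects all elements whose complete text content, including descendants, equals the given text. Added in version 3.7. |
| [.!='text'] | Selects all elements whose complete text content, including descendants, does not equal the given text. Added in version 3.10. |
| [tag='text'] | Selects all elements that have a child named tag whose complete text content, including descendants, equals the given text. |
| [tag!='text'] | Selects all elements that have a child named tag whose complete text content, including descendants, does not equal the given text. Added in version 3.10. |
| [position] | Selects all elements that are located at the given position. The position can be either an integer (1 is the first position), the expression last() (for the last position), or a position relative to the last position (e.g. last()-1). |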
Predicates (expressions within square brackets) must be preceded by a tag
name, an asterisk, or another predicate. position predicates must be
preceded by a tag name.
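A short sketch of these predicate rules, using a reduced and illustrative version of the country data: a text predicate, a position predicate preceded by a tag name, and two chained predicates.

```python
import xml.etree.ElementTree as ET

root = ET.fromstring(
    "<data>"
    "<country name='Liechtenstein'><rank>1</rank></country>"
    "<country name='Singapore'><rank>4</rank></country>"
    "<country name='Panama'><rank>68</rank></country>"
    "</data>"
)

# [tag='text'] predicate: countries whose <rank> child has the text "4".
rank_four = [c.get("name") for c in root.findall("./country[rank='4']")]

# Position predicate, preceded by a tag name as required.
last_country = root.find("./country[last()]").get("name")

# Chained predicates: countries with a name attribute and a <rank> child.
chained = [c.get("name") for c in root.findall("./country[@name][rank]")]

print(rank_four, last_country, chained)
```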
Reference¶
Functions¶
- xml.etree.ElementTree.canonicalize(xml_data=None, *, out=None, from_file=None, **options)¶
C14N 2.0 transformation function.
Canonicalization is a way to normalise XML output in a way that allows byte-by-byte comparisons and digital signatures. It reduces the freedom that XML serializers have and instead generates a more constrained XML representation. The main restrictions regard the placement of namespace declarations, the ordering of attributes, and ignorable whitespace.
This function takes an XML data string (xml_data) or a file path or file-like object (from_file) as input, converts it to the canonical form, and writes it out using the out file(-like) object, if provided, or returns it as a text string if not. The output file receives text, not bytes. It should therefore be opened in text mode with utf-8 encoding.
Typical uses:
xml_data = "<root>...</root>"
print(canonicalize(xml_data))

with open("c14n_output.xml", mode='w', encoding='utf-8') as out_file:
    canonicalize(xml_data, out=out_file)

with open("c14n_output.xml", mode='w', encoding='utf-8') as out_file:
    canonicalize(from_file="inputfile.xml", out=out_file)
The configuration options are as follows:
- with_comments: set to true to include comments (default: false)
- strip_text: set to true to strip whitespace before and after text content (default: false)
- rewrite_prefixes: set to true to replace namespace prefixes by "n{number}" (default: false)
- qname_aware_tags: a set of qname aware tag names in which prefixes should be replaced in text content (default: empty)
- qname_aware_attrs: a set of qname aware attribute names in which prefixes should be replaced in text content (default: empty)
- exclude_attrs: a set of attribute names that should not be serialised
- exclude_tags: a set of tag names that should not be serialised
In the option list above, "a set" refers to any collection or iterable of strings, no ordering is expected.
Added in version 3.8.
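As a small illustration of why canonicalization helps with comparisons, the sketch below canonicalizes two made-up spellings of the same document that differ in attribute order, quoting, whitespace between attributes, and empty-element syntax:

```python
from xml.etree.ElementTree import canonicalize

# Two spellings of the same document (illustrative input).
doc_a = '<root b="2" a="1"><child/></root>'
doc_b = "<root a='1'   b='2'><child></child></root>"

canonical_a = canonicalize(doc_a)
canonical_b = canonicalize(doc_b)

print(canonical_a)
print(canonical_a == canonical_b)
```

Comparing the raw strings would report a difference, while the canonical forms can be compared directly.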
- xml.etree.ElementTree.Comment(text=None)¶
Comment element factory. This factory function creates a special element that will be serialized as an XML comment by the standard serializer. The comment string can be either a bytestring or a Unicode string. text is a string containing the comment string. Returns an element instance representing a comment.
Note that XMLParser skips over comments in the input instead of creating comment objects for them. An ElementTree will only contain comment nodes if they have been inserted into the tree using one of the Element methods.
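A minimal sketch of inserting a comment node through the Element API and serializing it; the element names and comment text are made up for illustration:

```python
import xml.etree.ElementTree as ET

root = ET.Element("root")
root.append(ET.Comment("generated for illustration"))
ET.SubElement(root, "item")

serialized = ET.tostring(root, encoding="unicode")
print(serialized)
```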
- xml.etree.ElementTree.dump(elem)¶
Writes an element tree or element structure to sys.stdout. This function should be used for debugging only.
The exact output format is implementation dependent. In this version, it's written as an ordinary XML file.
elem is an element tree or an individual element.
Changed in version 3.8: The dump() function now preserves the attribute order specified by the user.
- xml.etree.ElementTree.fromstring(text, parser=None)¶
Parses an XML section from a string constant. Same as XML(). text is a string containing XML data. parser is an optional parser instance. If not given, the standard XMLParser parser is used. Returns an Element instance.
- xml.etree.ElementTree.fromstringlist(sequence, parser=None)¶
Parses an XML document from a sequence of string fragments. sequence is a list or other sequence containing XML data fragments. parser is an optional parser instance. If not given, the standard XMLParser parser is used. Returns an Element instance.
Added in version 3.2.
- xml.etree.ElementTree.indent(tree, space='  ', level=0)¶
Appends whitespace to the subtree to indent the tree visually. This can be used to generate pretty-printed XML output. tree can be an Element or ElementTree. space is the whitespace string that will be inserted for each indentation level, two space characters by default. For indenting partial subtrees inside of an already indented tree, pass the initial indentation level as level.
Added in version 3.9.
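A brief sketch of indent() applied to a small tree, using the default of two spaces per level. It assumes Python 3.9 or later:

```python
import xml.etree.ElementTree as ET

root = ET.fromstring("<a><b/><c><d/></c></a>")
ET.indent(root)  # modifies the tree in place
pretty = ET.tostring(root, encoding="unicode")
print(pretty)
```

Because the whitespace is stored in the text and tail attributes of the elements, the tree itself changes; code that later reads those attributes will see the added whitespace.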
- xml.etree.ElementTree.iselement(element)¶
Check if an object appears to be a valid element object. element is an element instance. Return True if this is an element object.
- xml.etree.ElementTree.iterparse(source, events=None, parser=None)¶
Parses an XML section into an element tree incrementally, and reports what's going on to the user. source is a filename or file object containing XML data. events is a sequence of events to report back. The supported events are the strings "start", "end", "comment", "pi", "start-ns" and "end-ns" (the "ns" events are used to get detailed namespace information). If events is omitted, only "end" events are reported. parser is an optional parser instance. If not given, the standard XMLParser parser is used. parser must be a subclass of XMLParser and can only use the default TreeBuilder as a target. Returns an iterator providing (event, elem) pairs; it has a root attribute that references the root element of the resulting XML tree once source is fully read. The iterator has a close() method that closes the internal file object if source is a filename.
Note that while iterparse() builds the tree incrementally, it issues blocking reads on source (or the file it names). As such, it's unsuitable for applications where blocking reads can't be made. For fully non-blocking parsing, see XMLPullParser.
Note
iterparse() only guarantees that it has seen the ">" character of a starting tag when it emits a "start" event, so the attributes are defined, but the contents of the text and tail attributes are undefined at that point. The same applies to the element children; they may or may not be present.
If you need a fully populated element, look for "end" events instead.
Deprecated since version 3.4: The parser argument.
Changed in version 3.8: The comment and pi events were added.
Changed in version 3.13: Added the close() method.
- xml.etree.ElementTree.parse(source, parser=None)¶
Parses an XML section into an element tree. source is a filename or file object containing XML data. parser is an optional parser instance. If not given, the standard XMLParser parser is used. Returns an ElementTree instance.
- xml.etree.ElementTree.ProcessingInstruction(target, text=None)¶
PI element factory. This factory function creates a special element that will be serialized as an XML processing instruction. target is a string containing the PI target. text is a string containing the PI contents, if given. Returns an element instance, representing a processing instruction.
Note that XMLParser skips over processing instructions in the input instead of creating PI objects for them. An ElementTree will only contain processing instruction nodes if they have been inserted into the tree using one of the Element methods.
- xml.etree.ElementTree.register_namespace(prefix, uri)¶
Registers a namespace prefix. The registry is global, and any existing mapping for either the given prefix or the namespace URI will be removed. prefix is a namespace prefix. uri is a namespace uri. Tags and attributes in this namespace will be serialized with the given prefix, if at all possible.
Added in version 3.2.
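A minimal sketch of the effect on serialization. The prefix `ex` and the URI are hypothetical values chosen only for this illustration:

```python
import xml.etree.ElementTree as ET

# Hypothetical prefix and URI, chosen only for this illustration.
ET.register_namespace("ex", "http://example.com/ns")

item = ET.Element("{http://example.com/ns}item")
serialized = ET.tostring(item, encoding="unicode")
print(serialized)
```

Since the registry is global, the registration affects every later serialization in the same process, not just this element.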
- xml.etree.ElementTree.SubElement(parent, tag, attrib={}, **extra)¶
Subelement factory. This function creates an element instance, and appends it to an existing element.
The element name, attribute names, and attribute values can be either bytestrings or Unicode strings. parent is the parent element. tag is the subelement name. attrib is an optional dictionary, containing element attributes. extra contains additional attributes, given as keyword arguments. Returns an element instance.
- xml.etree.ElementTree.tostring(element, encoding='us-ascii', method='xml', *, xml_declaration=None, default_namespace=None, short_empty_elements=True)¶
Generates a string representation of an XML element, including all subelements. element is an Element instance. encoding [1] is the output encoding (default is US-ASCII). Use encoding="unicode" to generate a Unicode string (otherwise, a bytestring is generated). method is either "xml", "html" or "text" (default is "xml"). xml_declaration, default_namespace and short_empty_elements have the same meaning as in ElementTree.write(). Returns an (optionally) encoded string containing the XML data.
Changed in version 3.4: Added the short_empty_elements parameter.
Changed in version 3.8: Added the xml_declaration and default_namespace parameters.
Changed in version 3.8: The tostring() function now preserves the attribute order specified by the user.
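The effect of the encoding argument on the return type can be sketched as follows, with a made-up element. The exact text of the XML declaration is not shown because it depends on the encoding in use:

```python
import xml.etree.ElementTree as ET

elem = ET.Element("greeting", lang="en")
elem.text = "hello"

as_bytes = ET.tostring(elem)                     # default: US-ASCII bytes
as_text = ET.tostring(elem, encoding="unicode")  # str, no XML declaration
with_decl = ET.tostring(elem, encoding="unicode", xml_declaration=True)

print(type(as_bytes).__name__, type(as_text).__name__)
print(with_decl.startswith("<?xml"))
```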
- xml.etree.ElementTree.tostringlist(element, encoding='us-ascii', method='xml', *, xml_declaration=None, default_namespace=None, short_empty_elements=True)¶
Generates a string representation of an XML element, including all subelements. element is an Element instance. encoding [1] is the output encoding (default is US-ASCII). Use encoding="unicode" to generate a Unicode string (otherwise, a bytestring is generated). method is either "xml", "html" or "text" (default is "xml"). xml_declaration, default_namespace and short_empty_elements have the same meaning as in ElementTree.write(). Returns a list of (optionally) encoded strings containing the XML data. It does not guarantee any specific sequence, except that b"".join(tostringlist(element)) == tostring(element).
Added in version 3.2.
Changed in version 3.4: Added the short_empty_elements parameter.
Changed in version 3.8: Added the xml_declaration and default_namespace parameters.
Changed in version 3.8: The tostringlist() function now preserves the attribute order specified by the user.
- xml.etree.ElementTree.XML(text, parser=None)¶
Parses an XML section from a string constant. This function can be used to embed "XML literals" in Python code. text is a string containing XML data. parser is an optional parser instance. If not given, the standard XMLParser parser is used. Returns an Element instance.
- xml.etree.ElementTree.XMLID(text, parser=None)¶
Parses an XML section from a string constant, and also returns a dictionary which maps from element id:s to elements. text is a string containing XML data. parser is an optional parser instance. If not given, the standard XMLParser parser is used. Returns a tuple containing an Element instance and a dictionary.
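A short sketch of XMLID() on a made-up document in which two elements carry an id attribute:

```python
import xml.etree.ElementTree as ET

root, ids = ET.XMLID(
    '<root><item id="a">first</item><item id="b">second</item></root>'
)
print(sorted(ids))
print(ids["b"].text)
```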
XInclude support¶
This module provides limited support for
XInclude directives, via the xml.etree.ElementInclude helper module. This module can be used to insert subtrees and text strings into element trees, based on information in the tree.
Example¶
Here's an example that demonstrates use of the XInclude module. To include an XML document in the current document, use the {http://www.w3.org/2001/XInclude}include element and set the parse attribute to "xml", and use the href attribute to specify the document to include.
<?xml version="1.0"?>
<document xmlns:xi="http://www.w3.org/2001/XInclude">
<xi:include href="source.xml" parse="xml" />
</document>
By default, the href attribute is treated as a file name. You can use custom loaders to override this behaviour. Also note that the standard helper does not support XPointer syntax.
To process this file, load it as usual, and pass the root element to the xml.etree.ElementInclude module:
from xml.etree import ElementTree, ElementInclude
tree = ElementTree.parse("document.xml")
root = tree.getroot()
ElementInclude.include(root)
The ElementInclude module replaces the {http://www.w3.org/2001/XInclude}include element with the root element from the source.xml document. The result might look something like this:
<document xmlns:xi="http://www.w3.org/2001/XInclude">
<para>This is a paragraph.</para>
</document>
If the parse attribute is omitted, it defaults to "xml". The href attribute is required.
To include a text document, use the {http://www.w3.org/2001/XInclude}include element, and set the parse attribute to "text":
<?xml version="1.0"?>
<document xmlns:xi="http://www.w3.org/2001/XInclude">
Copyright (c) <xi:include href="year.txt" parse="text" />.
</document>
The result might look something like:
<document xmlns:xi="http://www.w3.org/2001/XInclude">
Copyright (c) 2003.
</document>
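Both kinds of inclusion can be tried without files on disk by supplying a custom loader. The sketch below is only an illustration: `RESOURCES` and `dict_loader` are hypothetical names, and the loader simply looks up the href in a dictionary instead of reading a file:

```python
import xml.etree.ElementTree as ET
from xml.etree import ElementInclude

# Stand-ins for files on disk; the names and contents are illustrative.
RESOURCES = {
    "source.xml": "<para>This is a paragraph.</para>",
    "year.txt": "2003",
}

def dict_loader(href, parse, encoding=None):
    """Hypothetical loader that serves included resources from RESOURCES."""
    data = RESOURCES[href]
    return ET.fromstring(data) if parse == "xml" else data

root = ET.fromstring(
    '<document xmlns:xi="http://www.w3.org/2001/XInclude">'
    '<xi:include href="source.xml" parse="xml"/>'
    '<note>Copyright (c) <xi:include href="year.txt" parse="text"/>.</note>'
    '</document>'
)
ElementInclude.include(root, loader=dict_loader)
result = ET.tostring(root, encoding="unicode")
print(result)
```

The loader takes the same arguments as default_loader(), returning an Element for the "xml" parse mode and a string for "text".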
Reference¶
Functions¶
- xml.etree.ElementInclude.default_loader(href, parse, encoding=None)¶
Default loader. This default loader reads an included resource from disk. href is a URL. parse is the parse mode, either "xml" or "text". encoding is an optional text encoding. If not given, encoding is utf-8. Returns the expanded resource. If the parse mode is "xml", this is an Element instance. If the parse mode is "text", this is a string. If the loader fails, it can return None or raise an exception.
- xml.etree.ElementInclude.include(elem,