xml.etree.cElementTree --- ElementTree XML API

原始碼:Lib/xml/etree/ElementTree.py


The xml.etree.ElementTree module implements a simple and efficient API for parsing and creating XML data.

在 3.3 版的變更: This module will use a fast implementation whenever available.

在 3.3 版之後被棄用: xml.etree.cElementTree 模組已被棄用。

備註

如果你需要剖析不受信任或未經驗證的資料,請參閱 XML 安全性

教學

This is a short tutorial for using xml.etree.ElementTree (ET in short). The goal is to demonstrate some of the building blocks and basic concepts of the module.

XML tree and elements

XML is an inherently hierarchical data format, and the most natural way to represent it is with a tree. ET has two classes for this purpose - ElementTree represents the whole XML document as a tree, and Element represents a single node in this tree. Interactions with the whole document (reading and writing to/from files) are usually done on the ElementTree level. Interactions with a single XML element and its sub-elements are done on the Element level.

剖析 XML

We'll be using the fictive country_data.xml XML document as the sample data for this section:

<?xml version="1.0"?>
<data>
    <country name="Liechtenstein">
        <rank>1</rank>
        <year>2008</year>
        <gdppc>141100</gdppc>
        <neighbor name="Austria" direction="E"/>
        <neighbor name="Switzerland" direction="W"/>
    </country>
    <country name="Singapore">
        <rank>4</rank>
        <year>2011</year>
        <gdppc>59900</gdppc>
        <neighbor name="Malaysia" direction="N"/>
    </country>
    <country name="Panama">
        <rank>68</rank>
        <year>2011</year>
        <gdppc>13600</gdppc>
        <neighbor name="Costa Rica" direction="W"/>
        <neighbor name="Colombia" direction="E"/>
    </country>
</data>

We can import this data by reading from a file:

import xml.etree.ElementTree as ET
tree = ET.parse('country_data.xml')
root = tree.getroot()

Or directly from a string:

root = ET.fromstring(country_data_as_string)

fromstring() parses XML from a string directly into an Element, which is the root element of the parsed tree. Other parsing functions may create an ElementTree. Check the documentation to be sure.

As an Element, root has a tag and a dictionary of attributes:

>>> root.tag
'data'
>>> root.attrib
{}

It also has children nodes over which we can iterate:

>>> for child in root:
...     print(child.tag, child.attrib)
...
country {'name': 'Liechtenstein'}
country {'name': 'Singapore'}
country {'name': 'Panama'}

Children are nested, and we can access specific child nodes by index:

>>> root[0][1].text
'2008'

備註

Not all elements of the XML input will end up as elements of the parsed tree. Currently, this module skips over any XML comments, processing instructions, and document type declarations in the input. Nevertheless, trees built using this module's API rather than parsing from XML text can have comments and processing instructions in them; they will be included when generating XML output. A document type declaration may be accessed by passing a custom TreeBuilder instance to the XMLParser constructor.

Pull API for non-blocking parsing

Most parsing functions provided by this module require the whole document to be read at once before returning any result. It is possible to use an XMLParser and feed data into it incrementally, but it is a push API that calls methods on a callback target, which is too low-level and inconvenient for most needs. Sometimes what the user really wants is to be able to parse XML incrementally, without blocking operations, while enjoying the convenience of fully constructed Element objects.

The most powerful tool for doing this is XMLPullParser. It does not require a blocking read to obtain the XML data, and is instead fed with data incrementally with XMLPullParser.feed() calls. To get the parsed XML elements, call XMLPullParser.read_events(). Here is an example:

>>> parser = ET.XMLPullParser(['start', 'end'])
>>> parser.feed('<mytag>sometext')
>>> list(parser.read_events())
[('start', <Element 'mytag' at 0x7fa66db2be58>)]
>>> parser.feed(' more text</mytag>')
>>> for event, elem in parser.read_events():
...     print(event)
...     print(elem.tag, 'text=', elem.text)
...
end
mytag text= sometext more text

The obvious use case is applications that operate in a non-blocking fashion where the XML data is being received from a socket or read incrementally from some storage device. In such cases, blocking reads are unacceptable.

Because it's so flexible, XMLPullParser can be inconvenient to use for simpler use-cases. If you don't mind your application blocking on reading XML data but would still like to have incremental parsing capabilities, take a look at iterparse(). It can be useful when you're reading a large XML document and don't want to hold it wholly in memory.

Where immediate feedback through events is wanted, calling method XMLPullParser.flush() can help reduce delay; please make sure to study the related security notes.

Finding interesting elements

Element has some useful methods that help iterate recursively over all the sub-tree below it (its children, their children, and so on). For example, Element.iter():

>>> for neighbor in root.iter('neighbor'):
...     print(neighbor.attrib)
...
{'name': 'Austria', 'direction': 'E'}
{'name': 'Switzerland', 'direction': 'W'}
{'name': 'Malaysia', 'direction': 'N'}
{'name': 'Costa Rica', 'direction': 'W'}
{'name': 'Colombia', 'direction': 'E'}

Element.findall() finds only elements with a tag which are direct children of the current element. Element.find() finds the first child with a particular tag, and Element.text accesses the element's text content. Element.get() accesses the element's attributes:

>>> for country in root.findall('country'):
...     rank = country.find('rank').text
...     name = country.get('name')
...     print(name, rank)
...
Liechtenstein 1
Singapore 4
Panama 68

More sophisticated specification of which elements to look for is possible by using XPath.

改動 XML 檔案

ElementTree provides a simple way to build XML documents and write them to files. The ElementTree.write() method serves this purpose.

Once created, an Element object may be manipulated by directly changing its fields (such as Element.text), adding and modifying attributes (Element.set() method), as well as adding new children (for example with Element.append()).

Let's say we want to add one to each country's rank, and add an updated attribute to the rank element:

>>> for rank in root.iter('rank'):
...     new_rank = int(rank.text) + 1
...     rank.text = str(new_rank)
...     rank.set('updated', 'yes')
...
>>> tree.write('output.xml')

XML 現在看起來像這樣:

<?xml version="1.0"?>
<data>
    <country name="Liechtenstein">
        <rank updated="yes">2</rank>
        <year>2008</year>
        <gdppc>141100</gdppc>
        <neighbor name="Austria" direction="E"/>
        <neighbor name="Switzerland" direction="W"/>
    </country>
    <country name="Singapore">
        <rank updated="yes">5</rank>
        <year>2011</year>
        <gdppc>59900</gdppc>
        <neighbor name="Malaysia" direction="N"/>
    </country>
    <country name="Panama">
        <rank updated="yes">69</rank>
        <year>2011</year>
        <gdppc>13600</gdppc>
        <neighbor name="Costa Rica" direction="W"/>
        <neighbor name="Colombia" direction="E"/>
    </country>
</data>

We can remove elements using Element.remove(). Let's say we want to remove all countries with a rank higher than 50:

>>> for country in root.findall('country'):
...     # 使用 root.findall() 來避免在遍歷時移除
...     rank = int(country.find('rank').text)
...     if rank > 50:
...         root.remove(country)
...
>>> tree.write('output.xml')

Note that concurrent modification while iterating can lead to problems, just like when iterating and modifying Python lists or dicts. Therefore, the example first collects all matching elements with root.findall(), and only then iterates over the list of matches.

XML 現在看起來像這樣:

<?xml version="1.0"?>
<data>
    <country name="Liechtenstein">
        <rank updated="yes">2</rank>
        <year>2008</year>
        <gdppc>141100</gdppc>
        <neighbor name="Austria" direction="E"/>
        <neighbor name="Switzerland" direction="W"/>
    </country>
    <country name="Singapore">
        <rank updated="yes">5</rank>
        <year>2011</year>
        <gdppc>59900</gdppc>
        <neighbor name="Malaysia" direction="N"/>
    </country>
</data>

建立 XML 文件

The SubElement() function also provides a convenient way to create new sub-elements for a given element:

>>> a = ET.Element('a')
>>> b = ET.SubElement(a, 'b')
>>> c = ET.SubElement(a, 'c')
>>> d = ET.SubElement(c, 'd')
>>> ET.dump(a)
<a><b /><c><d /></c></a>

Parsing XML with Namespaces

If the XML input has namespaces, tags and attributes with prefixes in the form prefix:sometag get expanded to {uri}sometag where the prefix is replaced by the full URI. Also, if there is a default namespace, that full URI gets prepended to all of the non-prefixed tags.

Here is an XML example that incorporates two namespaces, one with the prefix "fictional" and the other serving as the default namespace:

<?xml version="1.0"?>
<actors xmlns:fictional="http://characters.example.com"
        xmlns="http://people.example.com">
    <actor>
        <name>John Cleese</name>
        <fictional:character>Lancelot</fictional:character>
        <fictional:character>Archie Leach</fictional:character>
    </actor>
    <actor>
        <name>Eric Idle</name>
        <fictional:character>Sir Robin</fictional:character>
        <fictional:character>Gunther</fictional:character>
        <fictional:character>Commander Clement</fictional:character>
    </actor>
</actors>

One way to search and explore this XML example is to manually add the URI to every tag or attribute in the xpath of a find() or findall():

root = fromstring(xml_text)
for actor in root.findall('{http://people.example.com}actor'):
    name = actor.find('{http://people.example.com}name')
    print(name.text)
    for char in actor.findall('{http://characters.example.com}character'):
        print(' |-->', char.text)

A better way to search the namespaced XML example is to create a dictionary with your own prefixes and use those in the search functions:

ns = {'real_person': 'http://people.example.com',
      'role': 'http://characters.example.com'}

for actor in root.findall('real_person:actor', ns):
    name = actor.find('real_person:name', ns)
    print(name.text)
    for char in actor.findall('role:character', ns):
        print(' |-->', char.text)

These two approaches both output:

John Cleese
 |--> Lancelot
 |--> Archie Leach
Eric Idle
 |--> Sir Robin
 |--> Gunther
 |--> Commander Clement

XPath 支援

This module provides limited support for XPath expressions for locating elements in a tree. The goal is to support a small subset of the abbreviated syntax; a full XPath engine is outside the scope of the module.

範例

Here's an example that demonstrates some of the XPath capabilities of the module. We'll be using the countrydata XML document from the Parsing XML section:

import xml.etree.ElementTree as ET

root = ET.fromstring(countrydata)

# Top-level elements
root.findall(".")

# All 'neighbor' grand-children of 'country' children of the top-level
# elements
root.findall("./country/neighbor")

# Nodes with name='Singapore' that have a 'year' child
root.findall(".//year/..[@name='Singapore']")

# 'year' nodes that are children of nodes with name='Singapore'
root.findall(".//*[@name='Singapore']/year")

# All 'neighbor' nodes that are the second child of their parent
root.findall(".//neighbor[2]")

For XML with namespaces, use the usual qualified {namespace}tag notation:

# All dublin-core "title" tags in the document
root.findall(".//{http://purl.org/dc/elements/1.1/}title")

支援的 XPath 語法

語法

意義

tag

Selects all child elements with the given tag. For example, spam selects all child elements named spam, and spam/egg selects all grandchildren named egg in all children named spam. {namespace}* selects all tags in the given namespace, {*}spam selects tags named spam in any (or no) namespace, and {}* only selects tags that are not in a namespace.

在 3.8 版的變更: 新增對星號萬用字元的支援。

*

Selects all child elements, including comments and processing instructions. For example, */egg selects all grandchildren named egg.

.

Selects the current node. This is mostly useful at the beginning of the path, to indicate that it's a relative path.

//

Selects all subelements, on all levels beneath the current element. For example, .//egg selects all egg elements in the entire tree.

..

Selects the parent element. Returns None if the path attempts to reach the ancestors of the start element (the element find was called on).

[@attrib]

選擇所有具有給定屬性的元素。

[@attrib='value']

Selects all elements for which the given attribute has the given value. The value cannot contain quotes.

[@attrib!='value']

Selects all elements for which the given attribute does not have the given value. The value cannot contain quotes.

在 3.10 版被加入.

[tag]

Selects all elements that have a child named tag. Only immediate children are supported.

[.='text']

Selects all elements whose complete text content, including descendants, equals the given text.

在 3.7 版被加入.

[.!='text']

Selects all elements whose complete text content, including descendants, does not equal the given text.

在 3.10 版被加入.

[tag='text']

Selects all elements that have a child named tag whose complete text content, including descendants, equals the given text.

[tag!='text']

Selects all elements that have a child named tag whose complete text content, including descendants, does not equal the given text.

在 3.10 版被加入.

[position]

Selects all elements that are located at the given position. The position can be either an integer (1 is the first position), the expression last() (for the last position), or a position relative to the last position (e.g. last()-1).

Predicates (expressions within square brackets) must be preceded by a tag name, an asterisk, or another predicate. position predicates must be preceded by a tag name.

Reference

函式

xml.etree.ElementTree.canonicalize(xml_data=None, *, out=None, from_file=None, **options)

C14N 2.0 transformation function.

Canonicalization is a way to normalise XML output in a way that allows byte-by-byte comparisons and digital signatures. It reduces the freedom that XML serializers have and instead generates a more constrained XML representation. The main restrictions regard the placement of namespace declarations, the ordering of attributes, and ignorable whitespace.

This function takes an XML data string (xml_data) or a file path or file-like object (from_file) as input, converts it to the canonical form, and writes it out using the out file(-like) object, if provided, or returns it as a text string if not. The output file receives text, not bytes. It should therefore be opened in text mode with utf-8 encoding.

Typical uses:

xml_data = "<root>...</root>"
print(canonicalize(xml_data))

with open("c14n_output.xml", mode='w', encoding='utf-8') as out_file:
    canonicalize(xml_data, out=out_file)

with open("c14n_output.xml", mode='w', encoding='utf-8') as out_file:
    canonicalize(from_file="inputfile.xml", out=out_file)

The configuration options are as follows:

  • with_comments: set to true to include comments (default: false)

  • strip_text: set to true to strip whitespace before and after text content

    (default: false)

  • rewrite_prefixes: set to true to replace namespace prefixes by "n{number}"

    (default: false)

  • qname_aware_tags: a set of qname aware tag names in which prefixes

    should be replaced in text content (default: empty)

  • qname_aware_attrs: a set of qname aware attribute names in which prefixes

    should be replaced in text content (default: empty)

  • exclude_attrs: a set of attribute names that should not be serialised

  • exclude_tags: a set of tag names that should not be serialised

In the option list above, "a set" refers to any collection or iterable of strings, no ordering is expected.

在 3.8 版被加入.

xml.etree.ElementTree.Comment(text=None)

Comment element factory. This factory function creates a special element that will be serialized as an XML comment by the standard serializer. The comment string can be either a bytestring or a Unicode string. text is a string containing the comment string. Returns an element instance representing a comment.

Note that XMLParser skips over comments in the input instead of creating comment objects for them. An ElementTree will only contain comment nodes if they have been inserted into to the tree using one of the Element methods.

xml.etree.ElementTree.dump(elem)

Writes an element tree or element structure to sys.stdout. This function should be used for debugging only.

The exact output format is implementation dependent. In this version, it's written as an ordinary XML file.

elem is an element tree or an individual element.

在 3.8 版的變更: The dump() function now preserves the attribute order specified by the user.

xml.etree.ElementTree.fromstring(text, parser=None)

Parses an XML section from a string constant. Same as XML(). text is a string containing XML data. parser is an optional parser instance. If not given, the standard XMLParser parser is used. Returns an Element instance.

xml.etree.ElementTree.fromstringlist(sequence, parser=None)

Parses an XML document from a sequence of string fragments. sequence is a list or other sequence containing XML data fragments. parser is an optional parser instance. If not given, the standard XMLParser parser is used. Returns an Element instance.

在 3.2 版被加入.

xml.etree.ElementTree.indent(tree, space='  ', level=0)

Appends whitespace to the subtree to indent the tree visually. This can be used to generate pretty-printed XML output. tree can be an Element or ElementTree. space is the whitespace string that will be inserted for each indentation level, two space characters by default. For indenting partial subtrees inside of an already indented tree, pass the initial indentation level as level.

在 3.9 版被加入.

xml.etree.ElementTree.iselement(element)

Check if an object appears to be a valid element object. element is an element instance. Return True if this is an element object.

xml.etree.ElementTree.iterparse(source, events=None, parser=None)

Parses an XML section into an element tree incrementally, and reports what's going on to the user. source is a filename or file object containing XML data. events is a sequence of events to report back. The supported events are the strings "start", "end", "comment", "pi", "start-ns" and "end-ns" (the "ns" events are used to get detailed namespace information). If events is omitted, only "end" events are reported. parser is an optional parser instance. If not given, the standard XMLParser parser is used. parser must be a subclass of XMLParser and can only use the default TreeBuilder as a target. Returns an iterator providing (event, elem) pairs; it has a root attribute that references the root element of the resulting XML tree once source is fully read. The iterator has the close() method that closes the internal file object if source is a filename.

Note that while iterparse() builds the tree incrementally, it issues blocking reads on source (or the file it names). As such, it's unsuitable for applications where blocking reads can't be made. For fully non-blocking parsing, see XMLPullParser.

備註

iterparse() only guarantees that it has seen the ">" character of a starting tag when it emits a "start" event, so the attributes are defined, but the contents of the text and tail attributes are undefined at that point. The same applies to the element children; they may or may not be present.

If you need a fully populated element, look for "end" events instead.

在 3.4 版之後被棄用: parser 引數。

在 3.8 版的變更: 新增 contextcheck_hostname 事件。

在 3.13 版的變更: 新增 close() 方法。

xml.etree.ElementTree.parse(source, parser=None)

Parses an XML section into an element tree. source is a filename or file object containing XML data. parser is an optional parser instance. If not given, the standard XMLParser parser is used. Returns an ElementTree instance.

xml.etree.ElementTree.ProcessingInstruction(target, text=None)

PI element factory. This factory function creates a special element that will be serialized as an XML processing instruction. target is a string containing the PI target. text is a string containing the PI contents, if given. Returns an element instance, representing a processing instruction.

Note that XMLParser skips over processing instructions in the input instead of creating PI objects for them. An ElementTree will only contain processing instruction nodes if they have been inserted into to the tree using one of the Element methods.

xml.etree.ElementTree.register_namespace(prefix, uri)

Registers a namespace prefix. The registry is global, and any existing mapping for either the given prefix or the namespace URI will be removed. prefix is a namespace prefix. uri is a namespace uri. Tags and attributes in this namespace will be serialized with the given prefix, if at all possible.

在 3.2 版被加入.

xml.etree.ElementTree.SubElement(parent, tag, attrib={}, **extra)

Subelement factory. This function creates an element instance, and appends it to an existing element.

The element name, attribute names, and attribute values can be either bytestrings or Unicode strings. parent is the parent element. tag is the subelement name. attrib is an optional dictionary, containing element attributes. extra contains additional attributes, given as keyword arguments. Returns an element instance.

xml.etree.ElementTree.tostring(element, encoding='us-ascii', method='xml', *, xml_declaration=None, default_namespace=None, short_empty_elements=True)

Generates a string representation of an XML element, including all subelements. element is an Element instance. encoding [1] is the output encoding (default is US-ASCII). Use encoding="unicode" to generate a Unicode string (otherwise, a bytestring is generated). method is either "xml", "html" or "text" (default is "xml"). xml_declaration, default_namespace and short_empty_elements has the same meaning as in ElementTree.write(). Returns an (optionally) encoded string containing the XML data.

在 3.4 版的變更: 新增 short_empty_elements 參數。

在 3.8 版的變更: 新增 xml_declarationdefault_namespace 參數。

在 3.8 版的變更: The tostring() function now preserves the attribute order specified by the user.

xml.etree.ElementTree.tostringlist(element, encoding='us-ascii', method='xml', *, xml_declaration=None, default_namespace=None, short_empty_elements=True)

Generates a string representation of an XML element, including all subelements. element is an Element instance. encoding [1] is the output encoding (default is US-ASCII). Use encoding="unicode" to generate a Unicode string (otherwise, a bytestring is generated). method is either "xml", "html" or "text" (default is "xml"). xml_declaration, default_namespace and short_empty_elements has the same meaning as in ElementTree.write(). Returns a list of (optionally) encoded strings containing the XML data. It does not guarantee any specific sequence, except that b"".join(tostringlist(element)) == tostring(element).

在 3.2 版被加入.

在 3.4 版的變更: 新增 short_empty_elements 參數。

在 3.8 版的變更: 新增 xml_declarationdefault_namespace 參數。

在 3.8 版的變更: The tostringlist() function now preserves the attribute order specified by the user.

xml.etree.ElementTree.XML(text, parser=None)

Parses an XML section from a string constant. This function can be used to embed "XML literals" in Python code. text is a string containing XML data. parser is an optional parser instance. If not given, the standard XMLParser parser is used. Returns an Element instance.

xml.etree.ElementTree.XMLID(text, parser=None)

Parses an XML section from a string constant, and also returns a dictionary which maps from element id:s to elements. text is a string containing XML data. parser is an optional parser instance. If not given, the standard XMLParser parser is used. Returns a tuple containing an Element instance and a dictionary.

XInclude support

This module provides limited support for XInclude directives, via the xml.etree.ElementInclude helper module. This module can be used to insert subtrees and text strings into element trees, based on information in the tree.

範例

Here's an example that demonstrates use of the XInclude module. To include an XML document in the current document, use the {http://www.w3.org/2001/XInclude}include element and set the parse attribute to "xml", and use the href attribute to specify the document to include.

<?xml version="1.0"?>
<document xmlns:xi="http://www.w3.org/2001/XInclude">
  <xi:include href="source.xml" parse="xml" />
</document>

By default, the href attribute is treated as a file name. You can use custom loaders to override this behaviour. Also note that the standard helper does not support XPointer syntax.

To process this file, load it as usual, and pass the root element to the xml.etree.ElementTree module:

from xml.etree import ElementTree, ElementInclude

tree = ElementTree.parse("document.xml")
root = tree.getroot()

ElementInclude.include(root)

The ElementInclude module replaces the {http://www.w3.org/2001/XInclude}include element with the root element from the source.xml document. The result might look something like this:

<document xmlns:xi="http://www.w3.org/2001/XInclude">
  <para>This is a paragraph.</para>
</document>

If the parse attribute is omitted, it defaults to "xml". The href attribute is required.

To include a text document, use the {http://www.w3.org/2001/XInclude}include element, and set the parse attribute to "text":

<?xml version="1.0"?>
<document xmlns:xi="http://www.w3.org/2001/XInclude">
  Copyright (c) <xi:include href="year.txt" parse="text" />.
</document>

The result might look something like:

<document xmlns:xi="http://www.w3.org/2001/XInclude">
  Copyright (c) 2003.
</document>

Reference

函式

xml.etree.ElementInclude.default_loader(href, parse, encoding=None)

Default loader. This default loader reads an included resource from disk. href is a URL. parse is for parse mode either "xml" or "text". encoding is an optional text encoding. If not given, encoding is utf-8. Returns the expanded resource. If the parse mode is "xml", this is an Element instance. If the parse mode is "text", this is a string. If the loader fails, it can return None or raise an exception.

xml.etree.ElementInclude.include(elem,