xml.parsers.expat --- Fast XML parsing using Expat
Note
If you need to parse untrusted or unauthenticated data, see XML security.
The xml.parsers.expat module is a Python interface to the Expat
non-validating XML parser. The module provides a single extension type,
xmlparser, that represents the current state of an XML parser. After
an xmlparser object has been created, various attributes of the object
can be set to handler functions. When an XML document is then fed to the
parser, the handler functions are called for the character data and markup in
the XML document.
This module uses the pyexpat module to provide access to the Expat
parser. Direct use of the pyexpat module is deprecated.
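The handler mechanism described above can be sketched as follows. This is a minimal illustration, not a complete application: the handler function names and the `events` list are chosen for this example, and the sample document is made up.

```python
import xml.parsers.expat

# Collected (kind, payload...) tuples, in the order the handlers fire.
events = []

def start_element(name, attrs):
    # attrs is a dict mapping attribute names to values by default.
    events.append(("start", name, attrs))

def end_element(name):
    events.append(("end", name))

def char_data(data):
    # Character data may arrive in more than one call for longer text.
    events.append(("text", data))

parser = xml.parsers.expat.ParserCreate()

# Handlers are installed by assigning functions to parser attributes.
parser.StartElementHandler = start_element
parser.EndElementHandler = end_element
parser.CharacterDataHandler = char_data

# Feed the whole document in one call; True marks it as the final chunk.
parser.Parse('<greeting lang="en">hello</greeting>', True)

for event in events:
    print(event)
```

Collecting events in a list rather than printing inside the handlers keeps the handlers small and makes the call order easy to inspect afterwards.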
This module provides one exception and one type object:
- exception xml.parsers.expat.ExpatError
The exception raised when Expat reports an error. See section ExpatError Exceptions for more information on interpreting Expat errors.
- exception xml.parsers.expat.error
Alias for ExpatError.
- xml.parsers.expat.XMLParserType
The type of the return values from the ParserCreate() function.
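As a small sketch of how these names fit together, the following catches ExpatError for a deliberately malformed document and checks the parser's type. The input string is an invented example. The `code`, `lineno`, and `offset` attributes are the ones documented for ExpatError, and ErrorString() is the module-level function described in this section.

```python
import xml.parsers.expat
from xml.parsers.expat import ExpatError, ErrorString

parser = xml.parsers.expat.ParserCreate()

# Parsers returned by ParserCreate() are instances of XMLParserType.
is_parser_type = isinstance(parser, xml.parsers.expat.XMLParserType)

caught = None
try:
    # <b> is never closed before </a>, so the document is not well-formed.
    parser.Parse("<a><b></a>", True)
except ExpatError as err:
    caught = err
    print("code:", err.code, "line:", err.lineno, "offset:", err.offset)
    # Translate the numeric code into Expat's explanatory message.
    print("message:", ErrorString(err.code))
```

Catching `xml.parsers.expat.error` instead would behave the same way, since it is an alias for ExpatError.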
The xml.parsers.expat module contains two functions:
- xml.parsers.expat.ErrorString(errno)
Returns an explanatory string for a given error number errno.
- xml.parsers.expat.ParserCreate(encoding=None, namespace_separator=None)
Creates and returns a new xmlparser object. encoding, if specified, must be a string naming the encoding used by the XML data. Expat doesn't support as many encodings as Python does, and its repertoire of encodings can't be extended; it supports UTF-8, UTF-16, ISO-8859-1 (Latin1), and ASCII. If encoding [1] is given it will override the implicit or explicit encoding of the document.
Parsers created through ParserCreate() are called "root" parsers, in the sense that they do not have any parent parser attached. Non-root parsers are created by parser.ExternalEntityParserCreate.
Expat can optionally do XML namespace processing for you, enabled by providing a value for namespace_separator. The value must be a one-character string; a ValueError will be raised if the string has an illegal length (None is considered the same as omission). When namespace processing is enabled, element type names and attribute names that belong to a namespace will be expanded. The element name passed to the element handlers StartElementHandler and EndElementHandler will be the concatenation of the namespace URI, the namespace separator character, and the local part of the name. If the namespace separator is a zero byte (chr(0)) then the namespace URI and the local part will be concatenated without any separator.
For example, if namespace_separator is set to a space character (' ') and the following document is parsed:
    <?xml version="1.0"?>
    <root xmlns    = "http://default-namespace.org/"
          xmlns:py = "http://www.python.org/ns/">
      <py:elem1 />
      <elem2 xmlns="" />
    </root>
StartElementHandler will receive the following strings for each element:
    http://default-namespace.org/ root
    http://www.python.org/ns/ elem1
    elem2
Due to limitations in the Expat library used by pyexpat, the xmlparser instance returned can only be used to parse a single XML document. Call ParserCreate for each document to provide unique parser instances.
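The namespace example above can be reproduced with a short script. This is a sketch: the `names` list and the lambda handler are illustrative choices, and the document is the one quoted in the description of ParserCreate().

```python
import xml.parsers.expat

DOCUMENT = """<?xml version="1.0"?>
<root xmlns    = "http://default-namespace.org/"
      xmlns:py = "http://www.python.org/ns/">
  <py:elem1 />
  <elem2 xmlns="" />
</root>
"""

names = []

# A one-character separator turns on namespace processing.
parser = xml.parsers.expat.ParserCreate(namespace_separator=" ")
parser.StartElementHandler = lambda name, attrs: names.append(name)
parser.Parse(DOCUMENT, True)

for name in names:
    print(name)

# A separator longer than one character is rejected up front.
separator_rejected = False
try:
    xml.parsers.expat.ParserCreate(namespace_separator="::")
except ValueError:
    separator_rejected = True
```

Because a parser instance handles only one document, a new ParserCreate() call would be needed before parsing anything else.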
See also
- The Expat XML Parser
Home page of the Expat project.
XMLParser Objects
xmlparser objects have the following methods:
- xmlparser.Parse(data[, isfinal])
Parses the contents of the string data, calling the appropriate handler functions to process the parsed data. isfinal must be true on the final call to this method; it allows the parsing of a single file in fragments, not the submission of multiple files. data can be the empty string at any time.
- xmlparser.ParseFile(file)
Parse XML data reading from the object file. file only needs to provide the read(nbytes) method, returning the empty string when there's no more data.
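The two ways of feeding data, Parse() in fragments and ParseFile() from a file-like object, can be compared side by side. The helper names (`collect_names`, `feed_in_chunks`, `feed_from_file`) are hypothetical, and an in-memory `io.BytesIO` stands in for a real file here.

```python
import io
import xml.parsers.expat

def collect_names(feed):
    """Create a fresh parser, let `feed` push data into it, return element names."""
    names = []
    parser = xml.parsers.expat.ParserCreate()
    parser.StartElementHandler = lambda name, attrs: names.append(name)
    feed(parser)
    return names

def feed_in_chunks(parser):
    # Chunk boundaries may fall anywhere, even inside a tag.
    for chunk in ("<doc><it", "em/><item", "/></doc>"):
        parser.Parse(chunk, False)
    # An empty final call tells Expat that the document is complete.
    parser.Parse("", True)

def feed_from_file(parser):
    # Any object with a read(nbytes) method will do.
    parser.ParseFile(io.BytesIO(b"<doc><item/><item/></doc>"))

chunked = collect_names(feed_in_chunks)
from_file = collect_names(feed_from_file)
print(chunked, from_file)
```

Each call to `collect_names` creates its own parser, since one parser instance can only process a single document.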
- xmlparser.SetBase(base)
Sets the base to be used for resolving relative URIs in system identifiers in declarations. Resolving relative identifiers is left to the application: this value will be passed through as the base argument to the ExternalEntityRefHandler(), NotationDeclHandler(), and UnparsedEntityDeclHandler() functions.
- xmlparser.GetBase()
Returns a string containing the base set by a previous call to SetBase(), or None if SetBase() hasn't been called.
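A brief sketch of SetBase() and GetBase(), including how the base is passed through to a declaration handler. The base URL, the notation declaration, and the variable names are invented for illustration. The handler uses the documented NotationDeclHandler argument order (notationName, base, systemId, publicId).

```python
import xml.parsers.expat

DOCUMENT = """<!DOCTYPE doc [
  <!NOTATION gif SYSTEM "viewer.exe">
]>
<doc/>
"""

notations = []

def notation_decl(notation_name, base, system_id, public_id):
    # Expat hands the base through unchanged; resolving is up to the caller.
    notations.append((notation_name, base, system_id, public_id))

parser = xml.parsers.expat.ParserCreate()
base_before = parser.GetBase()          # nothing set yet

parser.SetBase("http://example.org/docs/")
base_after = parser.GetBase()

parser.NotationDeclHandler = notation_decl
parser.Parse(DOCUMENT, True)

print(base_before, base_after)
print(notations)
```

An application that wants an absolute URI could combine the two values itself, for instance with `urllib.parse.urljoin(base, system_id)`.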
- xmlparser.GetInputContext()
Returns the input data that generated the current event as a string. The data is in the encoding of the entity which contains the text. When called while an event handler is not active, the return value is None.
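The difference between calling GetInputContext() inside and outside a handler can be sketched like this. The exact type and extent of the value returned inside a handler is treated as an assumption here: current CPython builds appear to return the raw input as a bytes object, and the value may be None if the underlying Expat library was built without input-context support.

```python
import xml.parsers.expat

parser = xml.parsers.expat.ParserCreate()
contexts = []

def start_element(name, attrs):
    # Inside a handler, the parser can report the input behind this event.
    contexts.append((name, parser.GetInputContext()))

parser.StartElementHandler = start_element

outside_before = parser.GetInputContext()   # no handler is active
parser.Parse("<doc><item/></doc>", True)
outside_after = parser.GetInputContext()    # no handler is active

for name, context in contexts:
    print(name, repr(context))
print(outside_before, outside_after)
```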
- xmlparser.ExternalEntityParserCreate(context[, encoding])
Create a "child" parser which can be used to parse an external parsed entity referred to by content parsed by the parent parser. The context parameter should be the string passed to the ExternalEntityRefHandler() handler function, described below. The child parser is created with the ordered_attributes and specified_attributes set to the values of this parser.
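A minimal sketch of a child parser in use, assuming the entity's replacement text is supplied from a fixed string rather than fetched from "ext.xml". The document, the `REPLACEMENT` constant, and the helper names are made up for this example. A real application would need to decide carefully whether and how to resolve external entities, especially for untrusted input.

```python
import xml.parsers.expat

DOCUMENT = '<!DOCTYPE doc [<!ENTITY ext SYSTEM "ext.xml">]><doc>&ext;</doc>'
REPLACEMENT = "<included/>"   # stand-in for the content of ext.xml

names = []
requested = []

def record_start(name, attrs):
    names.append(name)

parser = xml.parsers.expat.ParserCreate()

def external_entity_ref(context, base, system_id, public_id):
    requested.append(system_id)
    # The child parser continues in the parent's parsing context.
    child = parser.ExternalEntityParserCreate(context)
    child.StartElementHandler = record_start
    child.Parse(REPLACEMENT, True)
    return 1   # a true value tells Expat the entity was handled

parser.StartElementHandler = record_start
parser.ExternalEntityRefHandler = external_entity_ref
parser.Parse(DOCUMENT, True)

print(requested, names)
```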
- xmlparser.SetParamEntityParsing(flag)
Control parsing of parameter entities (including the external DTD subset). Possible flag values are XML_PARAM_ENTITY_PARSING_NEVER, XML_PARAM_ENTITY_PARSING_UNLESS_STANDALONE and XML_PARAM_ENTITY_PARSING_ALWAYS. Return true if setting the flag was successful.
- xmlparser.UseForeignDTD([flag])
Calling this with a true value for flag (the default) will cause Expat to call the ExternalEntityRefHandler with None for all arguments to allow an alternate DTD to be loaded. If the document does not contain a document type declaration, the ExternalEntityRefHandler will still be called, but the StartDoctypeDeclHandler and EndDoctypeDeclHandler will not be called.
Passing a false value for flag will cancel a previous call that passed a true value, but otherwise has no effect.
This method can only be called before the Parse() or ParseFile() methods are called; calling it after either of those have been called causes ExpatError to be raised with the code attribute set to errors.codes[errors.XML_ERROR_CANT_CHANGE_FEATURE_ONCE_PARSING].
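The interplay of UseForeignDTD(), SetParamEntityParsing(), and the ExternalEntityRefHandler can be sketched as below. The document and the `handler_calls` list are illustrative. The handler here only records its arguments and reports success; it does not actually load an alternate DTD.

```python
import xml.parsers.expat

handler_calls = []

def resolve_entity(context, base, system_id, public_id):
    # For the foreign-DTD request there is no identifier to resolve.
    handler_calls.append((system_id, public_id))
    return 1   # report the (empty) DTD as handled

parser = xml.parsers.expat.ParserCreate()

# Both settings must be applied before any data is parsed.
parser.UseForeignDTD(True)
param_parsing_ok = parser.SetParamEntityParsing(
    xml.parsers.expat.XML_PARAM_ENTITY_PARSING_ALWAYS
)
parser.ExternalEntityRefHandler = resolve_entity

# The document has no DOCTYPE, yet the handler is still invoked once.
parser.Parse("<?xml version='1.0'?><element/>", True)

print(param_parsing_ok, handler_calls)
```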
- xmlparser.SetReparseDeferralEnabled(enabled)
Warning
Calling SetReparseDeferralEnabled(False) has security implications, as detailed below; please make sure to understand these consequences prior to using the SetReparseDeferralEnabled method.
Expat 2.6.0 introduced a security mechanism called "reparse deferral" where instead of causing denial of service through quadratic runtime from reparsing large tokens, reparsing of unfinished tokens is now delayed by default until a sufficient amount of input is reached. Due to this delay, registered handlers may, depending on the sizing of input chunks pushed to Expat, no longer be called right after pushing new input to the parser. Where immediate feedback and taking over responsibility of protecting against denial of service from large tokens are both wanted, calling SetReparseDeferralEnabled(False) disables reparse deferral for the current Expat parser instance, temporarily or altogether. Calling SetReparseDeferralEnabled(True) allows re-enabling reparse deferral.
Note that SetReparseDeferralEnabled() has been backported to some prior releases of CPython as a security fix. Check for availability of