unicodedata --- Unicode Database


This module provides access to the Unicode Character Database (UCD), which defines character properties for all Unicode characters. The data contained in this database is compiled from the UCD version 16.0.0.

The module uses the same names and symbols as defined by Unicode Standard Annex #44, "Unicode Character Database". It defines the following functions:

See also

The Unicode HOWTO for more information about Unicode and how to use this module.

unicodedata.lookup(name)

Look up character by name. If a character with the given name is found, return the corresponding character. If not found, KeyError is raised. For example:

>>> unicodedata.lookup('LEFT CURLY BRACKET')
'{'

The characters returned by this function are the same as those produced by the \N{...} escape sequence in string literals. For example:

>>> unicodedata.lookup('MIDDLE DOT') == '\N{MIDDLE DOT}'
True

Changed in version 3.3: Support for name aliases [1] and named sequences [2] has been added.
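
Because lookup() raises KeyError for an unknown name, callers that accept names from user input often wrap it. The following is a minimal sketch; the helper name lookup_or_default and its default parameter are hypothetical and not part of the module:

```python
import unicodedata

def lookup_or_default(name, default=None):
    """Return the character called `name`, or `default` if the name is unknown.

    Hypothetical convenience wrapper; unicodedata.lookup() itself has no
    default argument and signals an unknown name with KeyError.
    """
    try:
        return unicodedata.lookup(name)
    except KeyError:
        return default

print(lookup_or_default('LEFT CURLY BRACKET'))
print(lookup_or_default('NO SUCH CHARACTER NAME', '?'))
```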

unicodedata.name(chr, default=None, /)

Returns the name assigned to the character chr as a string. If no name is defined, default is returned, or, if not given, ValueError is raised. For example:

>>> unicodedata.name('½')
'VULGAR FRACTION ONE HALF'
>>> unicodedata.name('\uFFFF', 'fallback')
'fallback'
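
As an illustration of the default argument, the sketch below lists the code point and name of every character in a string without stopping at unnamed characters such as control codes. The helper describe and the '<unnamed>' placeholder are assumptions made for this example:

```python
import unicodedata

def describe(text):
    """Return (character, code point, name) triples for each character.

    Characters without a name (for example control characters) are reported
    with the placeholder '<unnamed>' instead of raising ValueError.
    """
    return [
        (ch, f'U+{ord(ch):04X}', unicodedata.name(ch, '<unnamed>'))
        for ch in text
    ]

for row in describe('a½\n'):
    print(row)
```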

unicodedata.decimal(chr, default=None, /)

Returns the decimal value assigned to the character chr as an integer. If no such value is defined, default is returned, or, if not given, ValueError is raised. For example:

>>> unicodedata.decimal('\N{ARABIC-INDIC DIGIT NINE}')
9
>>> unicodedata.decimal('\N{SUPERSCRIPT NINE}', -1)
-1

unicodedata.digit(chr, default=None, /)

Returns the digit value assigned to the character chr as an integer. If no such value is defined, default is returned, or, if not given, ValueError is raised:

>>> unicodedata.digit('\N{SUPERSCRIPT NINE}')
9

unicodedata.numeric(chr, default=None, /)

Returns the numeric value assigned to the character chr as a float. If no such value is defined, default is returned, or, if not given, ValueError is raised:

>>> unicodedata.numeric('½')
0.5
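
The three numeric functions cover progressively wider sets of characters, which a side-by-side comparison may make clearer. In this sketch the helper numeric_properties is hypothetical; it passes None as the default so that a missing value shows up as None rather than as an exception:

```python
import unicodedata

def numeric_properties(ch):
    """Return (decimal, digit, numeric) for ch, with None where undefined."""
    return (
        unicodedata.decimal(ch, None),
        unicodedata.digit(ch, None),
        unicodedata.numeric(ch, None),
    )

for ch in ('9', '\N{SUPERSCRIPT NINE}', '½', 'A'):
    print(repr(ch), numeric_properties(ch))
```

In this sample an ordinary digit has all three values, a superscript digit has a digit and a numeric value but no decimal value, a fraction has only a numeric value, and a letter has none.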

unicodedata.category(chr)

Returns the general category assigned to the character chr as a string. General category names consist of two letters. See the General Category Values section of the Unicode Character Database documentation for a list of category codes. For example:

>>> unicodedata.category('A')  # 'L'etter, 'u'ppercase
'Lu'
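
One common use of category() is to summarize or filter text by character class. A minimal sketch, assuming a hypothetical helper category_counts:

```python
import unicodedata
from collections import Counter

def category_counts(text):
    """Count how many characters of text fall into each general category."""
    return Counter(unicodedata.category(ch) for ch in text)

counts = category_counts('Hello, World 42!')
for code, n in sorted(counts.items()):
    print(code, n)
```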

unicodedata.bidirectional(chr)

Returns the bidirectional class assigned to the character chr as a string. If no such value is defined, an empty string is returned. See the Bidirectional Class Values section of the Unicode Character Database documentation for a list of bidirectional codes. For example:

>>> unicodedata.bidirectional('\N{ARABIC-INDIC DIGIT SEVEN}') # 'A'rabic, 'N'umber
'AN'
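
The bidirectional class can serve as a rough signal that a string contains right-to-left text. The sketch below is only a heuristic and does not implement the Unicode Bidirectional Algorithm; the names RTL_CLASSES and contains_rtl are assumptions for this example:

```python
import unicodedata

# Classes of strongly right-to-left characters: 'R' (for example Hebrew
# letters) and 'AL' (Arabic letters). Digits such as 'AN' are not included.
RTL_CLASSES = {'R', 'AL'}

def contains_rtl(text):
    """Return True if any character has a strong right-to-left class."""
    return any(unicodedata.bidirectional(ch) in RTL_CLASSES for ch in text)

print(contains_rtl('abc'))
print(contains_rtl('abc \N{HEBREW LETTER ALEF}'))
```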

unicodedata.combining(chr)

Returns the canonical combining class assigned to the character chr as an integer. Returns 0 if no combining class is defined. See the Canonical Combining Class Values section of the Unicode Character Database for more information.
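
A nonzero combining class identifies combining marks such as accents. The sketch below uses that to remove marks from text; it relies on this module's normalize() function to decompose characters first, and the helper name strip_combining_marks is hypothetical:

```python
import unicodedata

def strip_combining_marks(text):
    """Remove combining marks after canonical decomposition (NFD).

    Characters whose canonical combining class is nonzero are dropped;
    base characters (class 0) are kept.
    """
    decomposed = unicodedata.normalize('NFD', text)
    return ''.join(ch for ch in decomposed if unicodedata.combining(ch) == 0)

print(unicodedata.combining('\N{COMBINING ACUTE ACCENT}'))
print(strip_combining_marks('café'))
```

Note that this changes the text and is not appropriate for every language; it is shown only to illustrate what the combining class distinguishes.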

unicodedata.east_asian_width(chr)

Returns the East Asian width assigned to the character chr as a string. For a list of widths and for more information, see the Unicode Standard Annex #11.
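
East Asian width values are often used to estimate how many terminal columns a string occupies. The following is a rough sketch under simplifying assumptions: wide ('W') and fullwidth ('F') characters count as two columns, combining marks as zero, and everything else (including ambiguous-width characters) as one. The helper display_width is hypothetical, and real terminals may differ:

```python
import unicodedata

def display_width(text):
    """Estimate the number of terminal columns needed to display text."""
    width = 0
    for ch in text:
        if unicodedata.combining(ch):
            # Combining marks are drawn over the preceding character.
            continue
        if unicodedata.east_asian_width(ch) in ('W', 'F'):
            width += 2
        else:
            width += 1
    return width

print(display_width('abc'))
print(display_width('日本語'))
```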

unicodedata.mirrored(chr)

Returns the mirrored property assigned to the character