Unicode Objects and Codecs¶
Unicode Objects¶
Since the implementation of PEP 393 in Python 3.3, Unicode objects internally use a variety of representations, in order to allow handling the complete range of Unicode characters while staying memory efficient. There are special cases for strings where all code points are below 128, 256, or 65536; otherwise, code points must be below 1114112 (which is the full Unicode range).
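The thresholds above can be illustrated with a small, self-contained sketch. The helper names `bytes_per_code_point` and `width_for_string` are hypothetical and not part of the CPython API. The sketch assumes that the "below 128" and "below 256" cases both store one byte per code point, and that the other two cases store two and four bytes respectively.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper (not a CPython function): number of bytes needed per
 * code point for a string whose largest code point is max_cp, following the
 * thresholds 128/256, 65536 and 1114112. Returns 0 if max_cp is outside the
 * Unicode range. */
int bytes_per_code_point(uint32_t max_cp)
{
    if (max_cp < 256) {
        return 1;   /* covers both the "< 128" and the "< 256" cases */
    }
    if (max_cp < 65536) {
        return 2;
    }
    if (max_cp < 1114112) {
        return 4;
    }
    return 0;       /* not a valid Unicode code point */
}

/* Hypothetical helper: scan a buffer of code points, find the largest one,
 * and report the per-code-point width the whole string would need. */
int width_for_string(const uint32_t *cps, size_t n)
{
    assert(cps != NULL || n == 0);

    uint32_t max_cp = 0;
    for (size_t i = 0; i < n; i++) {
        if (cps[i] > max_cp) {
            max_cp = cps[i];
        }
    }
    return bytes_per_code_point(max_cp);
}
```

Under these assumptions a single wide character decides the width for the whole string, which is why a mostly-ASCII string containing one emoji needs four bytes per code point.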
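The UTF-8 representation is created on demand and cached in the Unicode object.
Note
The Py_UNICODE representation has been removed since Python 3.12, together with the deprecated APIs that relied on it. See PEP 623 for more information.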
Unicode Type¶
These are the basic Unicode object types used for the Unicode implementation in Python:
-
PyTypeObject PyUnicode_Type¶
- Part of the Stable ABI.
This instance of PyTypeObject represents the Python Unicode type. It is exposed to Python code as str.
-
PyTypeObject PyUnicodeIter_Type¶
- Part of the Stable ABI.
This instance of PyTypeObject represents the Python Unicode iterator type. It is used to iterate over Unicode string objects.
-
type Py_UCS4¶
-
type Py_UCS2¶
-
type Py_UCS1¶
- Part of the Stable ABI.
These types are typedefs for unsigned integer types wide enough to contain characters of 32 bits, 16 bits and 8 bits, respectively. When dealing with single Unicode characters, use Py_UCS4.
Added in version 3.3.
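A minimal sketch of why a 32-bit type is the safe choice for a single character. The `demo_ucs1`, `demo_ucs2` and `demo_ucs4` typedefs and the `fits_in_*` helpers are stand-ins invented for this illustration; real code would use the `Py_UCS1`, `Py_UCS2` and `Py_UCS4` typedefs from `Python.h`.

```c
#include <assert.h>
#include <limits.h>
#include <stdint.h>

/* Illustrative stand-ins only; the real typedefs come from Python.h. */
typedef uint8_t  demo_ucs1;
typedef uint16_t demo_ucs2;
typedef uint32_t demo_ucs4;

/* Compile-time check that the stand-ins have the widths described above. */
static_assert(sizeof(demo_ucs1) * CHAR_BIT == 8,  "demo_ucs1 must be 8 bits");
static_assert(sizeof(demo_ucs2) * CHAR_BIT == 16, "demo_ucs2 must be 16 bits");
static_assert(sizeof(demo_ucs4) * CHAR_BIT == 32, "demo_ucs4 must be 32 bits");

/* Return 1 if ch survives a round trip through the 8-bit type, else 0. */
int fits_in_ucs1(demo_ucs4 ch)
{
    return (demo_ucs4)(demo_ucs1)ch == ch;
}

/* Return 1 if ch survives a round trip through the 16-bit type, else 0. */
int fits_in_ucs2(demo_ucs4 ch)
{
    return (demo_ucs4)(demo_ucs2)ch == ch;
}
```

A code point such as U+1F600 is silently truncated when stored in either narrower type, so a variable holding an arbitrary single character should be 32 bits wide.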
-
type PyASCIIObject¶
-
type PyCompactUnicodeObject¶
-
type PyUnicodeObject¶
These subtypes of PyObject represent a Python Unicode object. In almost all cases, they shouldn't be used directly, since all API functions that deal with Unicode objects take and return PyObject pointers.
Added in version 3.3.
The structure of a particular object can be determined using the following macros. The macros cannot fail; their behavior is undefined if their argument is not a Python Unicode object.
-
PyUnicode_IS_COMPACT(o)¶
True if o uses the PyCompactUnicodeObject structure.
Added in version 3.3.
-
PyUnicode_IS_COMPACT_ASCII(o)¶
True if o uses the PyASCIIObject structure.
Added in version 3.3.
The following APIs are C macros and static inlined functions for fast checks and access to internal read-only data of Unicode objects:
-
int PyUnicode_Check(PyObject *obj)¶
Return true if the object obj is a Unicode object or an instance of a Unicode subtype. This function always succeeds.
-
int PyUnicode_CheckExact(PyObject *obj)¶
Return true if the object obj is a Unicode object, but not an instance of a subtype. This function always succeeds.
-
Py_ssize_t PyUnicode_GET_LENGTH(PyObject *unicode)¶
Return the length of the Unicode string, in code points. unicode has to be a Unicode object in the "canonical" representation (not checked).
Added in version 3.3.
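Because PyUnicode_GET_LENGTH does not check its argument, callers typically guard it with PyUnicode_Check first. The following is a minimal sketch of that pattern, not a definitive implementation. The helper name `str_length_checked` is hypothetical, and the fragment assumes it is compiled against the CPython headers and called with an initialized interpreter.

```c
#define PY_SSIZE_T_CLEAN
#include <Python.h>

/* Hypothetical helper (not part of the C API): store the length of obj,
 * in code points, into *out.
 * Returns 0 on success, or -1 with TypeError set if obj is not a str
 * (or an instance of a str subtype). */
static int
str_length_checked(PyObject *obj, Py_ssize_t *out)
{
    if (!PyUnicode_Check(obj)) {
        PyErr_SetString(PyExc_TypeError, "expected a str object");
        return -1;
    }
    /* Safe here: obj is known to be a Unicode object. */
    *out = PyUnicode_GET_LENGTH(obj);
    return 0;
}
```

Using PyUnicode_CheckExact instead would also reject instances of str subclasses, which may or may not be what a given caller wants.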