Unicode Objects and Codecs

Unicode Objects

Since the implementation of PEP 393 in Python 3.3, Unicode objects internally use a variety of representations, in order to allow handling the complete range of Unicode characters while staying memory efficient. There are special cases for strings where all code points are below 128, 256, or 65536; otherwise, code points must be below 1114112 (which is the full Unicode range).

The UTF-8 representation is created on demand and cached in the Unicode object.

Note

Since Python 3.12, the Py_UNICODE representation has been removed, along with the deprecated APIs that used it. See PEP 623 for more information.

Unicode Type

These are the basic Unicode object types used for the Unicode implementation in Python:

PyTypeObject PyUnicode_Type
Part of the Stable ABI.

This instance of PyTypeObject represents the Python Unicode type. It is exposed to Python code as str.

PyTypeObject PyUnicodeIter_Type
Part of the Stable ABI.

This instance of PyTypeObject represents the Python Unicode iterator type. It is used to iterate over Unicode string objects.

type Py_UCS4
type Py_UCS2
type Py_UCS1