3. Data model

3.1. Objects, values and types

Objects are Python's abstraction for data. All data in a Python program is represented by objects or by relations between objects. Code is also represented by objects.

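Every object has an identity, a type and a value. An object's identity never changes once it has been created; you may think of it as the object's address in memory. The is operator compares the identity of two objects; the id() function returns an integer representing an object's identity.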

In CPython, id(x) is the memory address where x is stored.
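The difference between identity and value can be seen in a short sketch. The variable names below are illustrative only and do not come from the text.

```python
a = [1, 2, 3]
b = a            # a second name bound to the same object
c = list(a)      # a new object with an equal value

same_identity = a is b                        # True: one object, two names
equal_but_distinct = a == c and a is not c    # True: equal values, two objects
ids_match = id(a) == id(b)                    # id() agrees with the is operator

# Changing the value in place leaves the identity untouched.
before = id(a)
a.append(4)
identity_stable = id(a) == before
```

Because b names the same object as a, the appended item is visible through b as well, while c keeps its original value.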

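An object's type determines the operations that the object supports (e.g., "does it have a length?") and also defines the possible values for objects of that type. The type() function returns an object's type (which is itself an object). Like its identity, an object's type is also unchangeable. [1]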

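The value of some objects can change. Objects whose value can change are said to be mutable; objects whose value is unchangeable once they are created are called immutable. (The value of an immutable container object that contains a reference to a mutable object can change when the latter's value is changed; however, the container is still considered immutable, because the collection of objects it contains cannot be changed. So immutability is not strictly the same as having an unchangeable value; it is more subtle.) An object's mutability is determined by its type; for instance, numbers, strings and tuples are immutable, while dictionaries and lists are mutable.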

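Objects are never explicitly destroyed; however, when they become unreachable they may be garbage-collected. An implementation is allowed to postpone garbage collection or omit it altogether: how garbage collection is carried out is entirely up to the implementation, as long as no objects are collected that are still reachable.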

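CPython currently uses a reference-counting scheme with (optional) delayed detection of cyclically linked garbage, which collects most objects as soon as they become unreachable, but is not guaranteed to collect garbage containing circular references. See the documentation of the gc module for information on controlling the collection of cyclic garbage. Other implementations act differently and CPython may change, so do not depend on immediate finalization of objects when they become unreachable (that is, you should always close files explicitly).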

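Note that the use of an implementation's tracing or debugging facilities may keep objects alive that would normally be collectable. Also note that catching an exception with a try...except statement may keep objects alive.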

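Some objects contain references to "external" resources such as open files or windows. These resources are normally freed when the object is garbage-collected, but since garbage collection is not guaranteed to happen, such objects also provide an explicit way to release the external resource, usually a close() method. Programs are strongly recommended to explicitly close such objects. The try...finally statement and the with statement provide convenient ways to do this.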
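As a rough illustration of the two explicit-close idioms, the sketch below writes and reads a temporary file. The file name and the use of tempfile are assumptions made for the example.

```python
import os
import tempfile

# Hypothetical scratch file used only for this example.
path = os.path.join(tempfile.mkdtemp(), "example.txt")

# try...finally: close() runs even if the write raises.
f = open(path, "w", encoding="utf-8")
try:
    f.write("hello\n")
finally:
    f.close()

# with: the file is closed automatically when the block exits.
with open(path, encoding="utf-8") as g:
    text = g.read()

closed_flags = (f.closed, g.closed)
```

Neither form relies on the garbage collector to release the file handle.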

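Some objects contain references to other objects; these are called containers. Examples of containers are tuples, lists and dictionaries. The references are part of a container's value. In most cases, when we talk about the value of a container, we mean the values, not the identities, of the contained objects; however, when we talk about the mutability of a container, only the identities of the immediately contained objects are implied. So, if an immutable container (like a tuple) contains a reference to a mutable object, its value changes if that mutable object is changed.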
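A small sketch of an immutable container holding a mutable object follows; the names inner and t are illustrative.

```python
inner = [1, 2]
t = (inner, "x")          # an immutable tuple referring to a mutable list
original_id = id(t[0])

# The tuple's set of references cannot be changed...
try:
    t[0] = [9]
except TypeError:
    rebinding_rejected = True
else:
    rebinding_rejected = False

# ...but the referenced list can still be mutated, which changes the
# value of the tuple as seen through comparison.
inner.append(3)
value_changed = t == ([1, 2, 3], "x")
same_member = id(t[0]) == original_id
```

The tuple still refers to the very same list object afterwards; only that list's value changed.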

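Types affect almost all aspects of object behavior. Even the importance of object identity is affected in some sense: for immutable types, operations that compute new values may actually return a reference to an existing object with the same type and value, while for mutable types this is not allowed. For example, after a = 1; b = 1, a and b may or may not refer to the same object, depending on the implementation. This is because int is an immutable type, so the reference to 1 can be reused. This behaviour depends on the implementation used, so it should not be relied upon, but it is something to be aware of when making use of object identity tests. However, after c = []; d = [], c and d are guaranteed to refer to two different, unique, newly created empty lists. (Note that e = f = [] assigns the same object to both e and f.)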
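The guarantees described for lists can be checked directly. The sketch deliberately makes no claim about whether a is b holds for the integers, since that is implementation-dependent.

```python
a = 1
b = 1
ints_equal = a == b        # always True; whether "a is b" holds is not guaranteed

c = []
d = []
distinct_lists = c is not d    # guaranteed: two separate, newly created lists

e = f = []
shared_list = e is f           # guaranteed: one list bound to two names
e.append(1)                    # so the change is visible through f too
```

For value comparisons, == is the appropriate operator; is only answers whether two names refer to the same object.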

3.2. The standard type hierarchy

Below is a list of the types that are built into Python. Extension modules (written in C, Java, or other languages, depending on the implementation) can define additional types. Future versions of Python may add types to the type hierarchy (e.g., rational numbers, efficiently stored arrays of integers, etc.), although such additions will often be provided via the standard library instead.

Some of the type descriptions below contain a paragraph listing 'special attributes.' These are attributes that provide access to the implementation and are not intended for general use. Their definition may change in the future.

3.2.1. None

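This type has a single value. There is a single object with this value. This object is accessed through the built-in name None. It is used to signify the absence of a value in many situations; for example, it is returned from functions that don't explicitly return anything. Its truth value is false.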

3.2.2. NotImplemented

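This type has a single value. There is a single object with this value. This object is accessed through the built-in name NotImplemented. Numeric methods and rich comparison methods should return this value if they do not implement the operation for the operands provided. (The interpreter will then try the reflected operation, or some other fallback, depending on the operator.) It should not be evaluated in a boolean context.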

See Implementing the arithmetic operations for more details.

Changed in version 3.9: Evaluating NotImplemented in a boolean context was deprecated.

Changed in version 3.14: Evaluating NotImplemented in a boolean context now raises a TypeError. It previously evaluated to True and, since Python 3.9, emitted a DeprecationWarning.
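The following sketch shows one way a class might return NotImplemented from a numeric method and a rich comparison method. The Meters class is hypothetical and exists only for illustration.

```python
class Meters:
    """Hypothetical value type used only for this example."""

    def __init__(self, value):
        self.value = value

    def __eq__(self, other):
        if not isinstance(other, Meters):
            return NotImplemented      # let the interpreter try other options
        return self.value == other.value

    def __add__(self, other):
        if not isinstance(other, Meters):
            return NotImplemented
        return Meters(self.value + other.value)


total = Meters(2) + Meters(3)

# For ==, when both operands decline, the fallback is an identity comparison.
mixed_equal = Meters(2) == 2

# For +, when both operands decline, a TypeError is raised.
try:
    Meters(2) + 1
except TypeError:
    add_failed = True
else:
    add_failed = False
```

Returning NotImplemented, rather than raising an exception directly, gives the other operand a chance to handle the operation.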

3.2.3. Ellipsis

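This type has a single value. There is a single object with this value. This object is accessed through the literal ... or the built-in name Ellipsis. Its truth value is true.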

3.2.4. numbers.Number

These are created by numeric literals and returned as results by arithmetic operators and arithmetic built-in functions. Numeric objects are immutable; once created their value never changes. Python numbers are of course strongly related to mathematical numbers, but subject to the limitations of numerical representation in computers.

The string representations of the numeric classes, computed by __repr__() and __str__(), have the following properties:

  • They are valid numeric literals which, when passed to their class constructor, produce an object having the value of the original numeric.

  • The representation is in base 10, when possible.

  • Leading zeros, possibly excepting a single zero before a decimal point, are not shown.

  • Trailing zeros, possibly excepting a single zero after a decimal point, are not shown.

  • A sign is shown only when the number is negative.
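The properties listed above can be spot-checked with a few built-in numbers. The sample values are arbitrary choices for the example.

```python
samples = [42, -7, 1.5, 2.0, 0.25, 1 + 2j]

# Passing repr(x) back to the class constructor reproduces the value.
round_trips = all(type(x)(repr(x)) == x for x in samples)

# Trailing zeros are dropped except for a single zero after the decimal
# point, and a sign is shown only for negative numbers.
reprs = [repr(x) for x in (1.50, 2.0, -7, 0.25)]
# reprs == ['1.5', '2.0', '-7', '0.25']
```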

Python distinguishes between integers, floating-point numbers, and complex numbers:

3.2.4.1. numbers.Integral

These represent elements from the mathematical set of integers (positive and negative).

Note

The rules for integer representation are intended to give the most meaningful interpretation of shift and mask operations involving negative integers.

There are two types of integers:

Integers (int)

These represent numbers in an unlimited range, subject to available (virtual) memory only. For the purpose of shift and mask operations, a binary representation is assumed, and negative numbers are represented in a variant of 2's complement which gives the illusion of an infinite string of sign bits extending to the left.

Booleans (bool)

These represent the truth values False and True. The two objects representing the values False and True are the only Boolean objects. The Boolean type is a subtype of the integer type, and Boolean values behave like the values 0 and 1, respectively, in almost all contexts, the exception being that when converted to a string, the strings "False" or "True" are returned, respectively.
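A brief sketch of how Booleans behave like the integers 0 and 1 while keeping their own string form:

```python
is_subtype = issubclass(bool, int)     # True
as_number = True + True                # 2: True behaves like 1 in arithmetic
as_index = ["no", "yes"][True]         # "yes": True is usable as index 1
equal_to_int = (True == 1) and (False == 0)
as_text = (str(True), str(False))      # ("True", "False"), not ("1", "0")
```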

3.2.4.2. numbers.Real (float)

These represent machine-level double precision floating-point numbers. You are at the mercy of the underlying machine architecture (and C or Java implementation) for the accepted range and handling of overflow. Python does not support single-precision floating-point numbers; the savings in processor and memory usage that are usually the reason for using these are dwarfed by the overhead of using objects in Python, so there is no reason to complicate the language with two kinds of floating-point numbers.

3.2.4.3. numbers.Complex (complex)

These represent complex numbers as a pair of machine-level double precision floating-point numbers. The same caveats apply as for floating-point numbers. The real and imaginary parts of a complex number z can be retrieved through the read-only attributes z.real and z.imag.

3.2.5. Sequences

These represent finite ordered sets indexed by non-negative numbers. The built-in function len() returns the number of items of a sequence. When the length of a sequence is n, the index set contains the numbers 0, 1, ..., n-1. Item i of sequence a is selected by a[i]. Some sequences, including built-in sequences, interpret negative subscripts by adding the sequence length. For example, a[-2] equals a[n-2], the second to last item of sequence a with length n.

The resulting value must be a nonnegative integer less than the number of items in the sequence. If it is not, an IndexError is raised.

Sequences also support slicing: a[start:stop] selects all items with index k such that start <= k < stop. When used as an expression, a slice is a sequence of the same type. The comment above about negative subscripts also applies to negative slice positions. Note that no error is raised if a slice position is less than zero or larger than the length of the sequence.

If start is missing or None, slicing behaves as if start was zero. If stop is missing or None, slicing behaves as if stop was equal to the length of the sequence.

Some sequences also support "extended slicing" with a third "step" parameter: a[i:j:k] selects all items of a with index x where x = i + n*k, n >= 0 and i <= x < j.
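The indexing and slicing rules above can be illustrated with a list of ten integers. The list a is an arbitrary example.

```python
a = list(range(10))          # [0, 1, ..., 9], so n == 10

second_to_last = a[-2]       # same as a[10 - 2] -> 8
middle = a[2:5]              # indexes 2, 3, 4 -> [2, 3, 4]
head = a[:3]                 # missing start behaves like 0 -> [0, 1, 2]
tail = a[7:]                 # missing stop behaves like len(a) -> [7, 8, 9]
clamped = a[5:100]           # out-of-range slice positions raise no error
stepped = a[1:8:3]           # x = 1 + n*3 with 1 <= x < 8 -> [1, 4, 7]

# An out-of-range index, unlike an out-of-range slice, is an error.
try:
    a[10]
except IndexError:
    index_error_raised = True
else:
    index_error_raised = False
```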

Sequences are distinguished according to their mutability:

3.2.5.1. Immutable sequences

An object of an immutable sequence type cannot change once it is created. (If the object contains references to other objects, these other objects may be mutable and may be changed; however, the collection of objects directly referenced by an immutable object cannot change.)

The following types are immutable sequences:

Strings

A string (str) is a sequence of values that represent characters, or more formally, Unicode code points. All the code points in the range 0 to 0x10FFFF can be represented in a string.

Python doesn't have a dedicated character type. Instead, every code point in the string is represented as a string object with length 1.

The built-in function ord() converts a code point from its string form to an integer in the range 0 to 0x10FFFF; chr() converts an integer in the range 0 to 0x10FFFF to the corresponding length 1 string object. str.encode() can be used to convert a str to bytes using the given text encoding, and bytes.decode() can be used to achieve the opposite.
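A short sketch of the conversions mentioned above; the sample characters are arbitrary.

```python
code_point = ord("A")                  # 65
highest = chr(0x10FFFF)                # the largest code point, as a length-1 str
first = "abc"[0]                       # indexing yields a str of length 1

encoded = "\u00e9".encode("utf-8")     # b'\xc3\xa9' (LATIN SMALL LETTER E WITH ACUTE)
decoded = encoded.decode("utf-8")      # back to the original str
```

Note that one code point may occupy more than one byte once encoded, as the two-byte UTF-8 result shows.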

Tuples

The items of a tuple are arbitrary Python objects. Tuples of two or more items are formed by comma-separated lists of expressions. A tuple of one item (a 'singleton') can be formed by affixing a comma to an expression (an expression by itself does not create a tuple, since parentheses must be usable for grouping of expressions). An empty tuple can be formed by an empty pair of parentheses.
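The different ways of forming tuples can be compared side by side:

```python
pair = 1, 2            # the comma forms the tuple; parentheses are optional here
singleton = (1,)       # the trailing comma makes a one-item tuple
not_a_tuple = (1)      # parentheses alone only group: this is the int 1
empty = ()             # an empty pair of parentheses is the empty tuple
```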

Bytes

A bytes object is an immutable array. The items are 8-bit bytes, represented by integers in the range 0 <= x < 256. Bytes literals (like b'abc') and the built-in bytes() constructor can be used to create bytes objects. Also, bytes objects can be decoded to strings via the decode() method.

3.2.5.2. Mutable sequences

Mutable sequences can be changed after they are created. The subscription and slicing notations can be used as the target of assignment and del (delete) statements.

Note

The collections and array modules provide additional examples of mutable sequence types.

There are currently two intrinsic mutable sequence types:

Lists

The items of a list are arbitrary Python objects. Lists are formed by placing a comma-separated list of expressions in square brackets. (Note that there are no special cases needed to form lists of length 0 or 1.)

Byte arrays

A bytearray object is a mutable array. They are created by the built-in bytearray() constructor. Aside from being mutable (and hence unhashable), byte arrays otherwise provide the same interface and functionality as immutable bytes objects.
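As a sketch of what mutability means for the two intrinsic mutable sequence types, the example below uses subscriptions and slices as targets of assignment and del. The names nums and buf are illustrative.

```python
nums = [0, 1, 2, 3, 4]
nums[1:3] = [10, 20, 30]     # slice assignment may change the length
after_assign = list(nums)    # [0, 10, 20, 30, 3, 4]
del nums[0]                  # [10, 20, 30, 3, 4]
del nums[::2]                # removes indexes 0, 2, 4 -> [20, 3]

buf = bytearray(b"abc")
buf[0] = ord("A")            # items are integers in range(256)
frozen = bytes(buf)          # b"Abc": an immutable copy

# Being mutable, a bytearray is not hashable.
try:
    hash(buf)
except TypeError:
    bytearray_unhashable = True
else:
    bytearray_unhashable = False
```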

3.2.6. Set types

These represent unordered, finite sets of unique, immutable objects. As such, they cannot be indexed by any subscript. However, they can be iterated over, and the built-in function len() returns the number of items in a set. Common uses for sets are fast membership testing, removing duplicates from a sequence, and computing mathematical operations such as intersection, union, difference, and symmetric difference.

For set elements, the same immutability rules apply as for dictionary keys. Note that numeric types obey the normal rules for numeric comparison: if two numbers compare equal (e.g., 1 and 1.0), only one of them can be contained in a set.

There are currently two intrinsic set types:

Sets

These represent a mutable set. They are created by the built-in set() constructor and can be modified afterwards by several methods, such as add().

Frozen sets

These represent an immutable set. They are created by the built-in frozenset() constructor. As a frozenset is immutable and hashable, it can be used again as an element of another set, or as a dictionary key.
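The behaviour of the two set types can be sketched as follows; all names are illustrative.

```python
# Numbers that compare equal count as the same element.
numbers = {1, 1.0, True}
only_one = len(numbers) == 1

# Removing duplicates from a sequence.
unique = sorted(set([3, 1, 3, 2, 1]))      # [1, 2, 3]

s = set()
s.add("a")                                  # a set can be modified after creation

# A frozenset is hashable, so it can be a set element or a dictionary key.
fs = frozenset({1, 2})
nested = {fs, frozenset({3})}
lookup = {fs: "pair"}

# A mutable set is not hashable and cannot be nested this way.
try:
    {set([1, 2])}
except TypeError:
    set_unhashable = True
else:
    set_unhashable = False
```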

3.2.7. Mappings

These represent finite sets of objects indexed by arbitrary index sets. The subscript notation a[k] selects the item indexed by k from the mapping a; this can be used in expressions and as the target of assignments or del statements. The built-in function len() returns the number of items in a mapping.

There is currently a single intrinsic mapping type:

3.2.7.1. Dictionaries

These represent finite sets of objects indexed by nearly arbitrary values. The only types of values not acceptable as keys are values containing lists or dictionaries or other mutable types that are compared by value rather than by object identity, the reason being that the efficient implementation of dictionaries requires a key's hash value to remain constant. Numeric types used for keys obey the normal rules for numeric comparison: if two numbers compare equal (e.g., 1 and 1.0) then they can be used interchangeably to index the same dictionary entry.

Dictionaries preserve insertion order, meaning that iterating over a dictionary produces its keys in the order in which they were added. Replacing the value of an existing key does not change the order; however, removing a key and re-inserting it adds it to the end instead of keeping its old place.
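The key rules and ordering behaviour described above can be sketched like this; the dictionary contents are arbitrary.

```python
d = {"a": 1, "b": 2, "c": 3}

d["a"] = 10                     # replacing a value keeps the key's position
order_after_replace = list(d)   # ['a', 'b', 'c']

del d["a"]
d["a"] = 1                      # re-inserting moves the key to the end
order_after_reinsert = list(d)  # ['b', 'c', 'a']

# Keys that compare equal index the same entry.
n = {1: "one"}
same_entry = n[1.0]             # 'one'

# Mutable values compared by value, such as lists, are rejected as keys.
try:
    {[1, 2]: "x"}
except TypeError:
    list_key_rejected = True
else:
    list_key_rejected = False
```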

Dictionaries are mutable; they can be created by the {} notation (see section Dictionary displays).

The extension modules dbm.ndbm and dbm.gnu provide additional examples of mapping types, as does the collections module.

Changed in version 3.7: Dictionaries did not preserve insertion order in versions of Python before 3.6. In CPython 3.6, insertion order was preserved, but it was considered an implementation detail at that time rather than a language guarantee.

3.2.8. Callable types

These are the types to which the function call operation (see section Calls) can be applied:

3.2.8.1. User-defined functions

A user-defined function object is created by a function definition (see section Function definitions). It should be called with an argument list containing the same number of items as the function's formal parameter list.

3.2.8.1.1. Special read-only attributes

Attribute

Meaning

function.__builtins__

A reference to the dictionary that holds the function's builtins namespace.

Added in version 3.10.

function.__globals__

A reference to the dictionary that holds the function's global variables -- the global namespace of the module in which the function was defined.

function.__closure__

None or a tuple of cells that contain bindings for the names specified in the co_freevars attribute of the function's code object.

A cell object has the attribute cell_contents. This can be used to get the value of the cell, as well as set the value.
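A minimal sketch of inspecting these read-only attributes on a closure follows. The make_counter and inc functions are hypothetical examples.

```python
def make_counter():
    count = 0

    def inc():
        nonlocal count
        count += 1
        return count

    return inc


counter = make_counter()
counter()                                        # count is now 1

free_names = counter.__code__.co_freevars        # ('count',)
cell = counter.__closure__[0]                    # the cell bound to 'count'
seen = cell.cell_contents                        # 1

cell.cell_contents = 10                          # cells can also be set
after_set = counter()                            # 11

no_closure = make_counter.__closure__            # None: no free variables
same_globals = counter.__globals__ is globals()  # the defining module's namespace
```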

3.2.8.1.2. Special writable attributes

Most of these attributes check the type of the assigned value:

Attribute

Meaning

function.__doc__

The function's documentation string, or None if unavailable.

function.__name__

The function's name. See also: __name__ attributes.

function.__qualname__

The function's qualified name. See also: __qualname__ attributes.

Added in version 3.3.

function.__module__

The name of the module the function was defined in, or None if unavailable.

function.__defaults__

A tuple containing default parameter values for those parameters that have defaults, or None if no parameters have a default value.

function.__code__

The code object representing the compiled function body.

function.__dict__

The namespace supporting arbitrary function attributes. See also: __dict__ attributes.

function.__annotations__

A dictionary containing annotations of parameters. The keys of the dictionary are the parameter names, and 'return' for the return annotation, if provided. See also: object.__annotations__.

Changed in version 3.14: Annotations are now lazily evaluated. See PEP 649.

function.__annotate__

The annotate function for this function, or None if the function has no annotations. See object.__annotate__.

Added in version 3.14.

function.__kwdefaults__

A dictionary containing defaults for keyword-only parameters.

function.__type_params__

A tuple containing the type parameters of a generic function.

Added in version 3.12.

Function objects also support getting and setting arbitrary attributes, which can be used, for example, to attach metadata to functions. Regular attribute dot-notation is used to get and set such attributes.
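Several of the writable attributes, together with an arbitrary attribute, can be inspected on a small function. The greet function and the author attribute are hypothetical examples.

```python
def greet(name, greeting="hello", *, punctuation="!"):
    """Return a greeting."""
    return f"{greeting}, {name}{punctuation}"


name = greet.__name__               # 'greet'
doc = greet.__doc__                 # 'Return a greeting.'
defaults = greet.__defaults__       # ('hello',)
kwdefaults = greet.__kwdefaults__   # {'punctuation': '!'}

# Arbitrary attributes use ordinary dot notation and live in __dict__.
greet.author = "example"
extra = dict(greet.__dict__)        # {'author': 'example'}

# Writable attributes take effect immediately.
greet.__defaults__ = ("hi",)
changed = greet("Ann")              # 'hi, Ann!'
```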

CPython implementation detail: CPython's current implementation only supports function attributes on user-defined functions. Function attributes on built-in functions may be supported in the future.

Additional information about a function's definition can be retrieved from its code object (accessible via the __code__ attribute).

3.2.8.2. Instance methods

An instance method object combines a class, a class instance and any callable object (normally a user-defined function).

Special read-only attributes:

method.__self__

Refers to the class instance object to which the method is bound

method.__func__

Refers to the original function object

method.__doc__

The method's documentation (same as method.__func__.__doc__). A string if the original function had a docstring, else None.

method.__name__

The name of the method (same as method.__func__.__name__)

method.__module__

The name of the module the method was defined in, or None if unavailable.

Methods also support accessing (but not setting) the arbitrary function attributes on the underlying function object.

User-defined method objects may be created when getting an attribute of a class (perhaps via an instance of that class), if that attribute is a user-defined function object or a classmethod object.

When an instance method object is created by retrieving a user-defined function object from a class via one of its instances, its __self__ attribute is the instance, and the method object is said to be bound. The new method's __func__ attribute is the original function object.

When an instance method object is created by retrieving a classmethod object from a class or instance, its __self__ attribute is the class itself, and its __func__ attribute is the function object underlying the class method.

When an instance method object is called, the underlying function (__func__) is called, inserting the class instance (__self__) in front of the argument list. For instance, when C is a class which contains a definition for a function f(), and x is an instance of C, calling x.f(1) is equivalent to calling C.f(x, 1).

When an instance method object is derived from a classmethod object, the "class instance" stored in __self__ will actually be the class itself, so that calling either x.f(1) or C.f(1) is equivalent to calling f(C,1) where f is the underlying function.

It is important to note that user-defined functions which are attributes of a class instance are not converted to bound methods; this only happens when the function is an attribute of the class.
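The binding rules above can be sketched with a small class. C, f and x follow the names used in the text; g and h are hypothetical additions for the example.

```python
import types


class C:
    def f(self, n):
        return (self, n)

    @classmethod
    def g(cls, n):
        return (cls, n)


x = C()

# Retrieving f through an instance yields a bound method.
bound = x.f
bound_to_instance = bound.__self__ is x
wraps_function = bound.__func__ is C.__dict__["f"]
equivalent_call = x.f(1) == C.f(x, 1)

# For a classmethod, __self__ is the class itself.
class_bound = x.g.__self__ is C
same_result = x.g(1) == C.g(1) == (C, 1)

# A function stored on the instance is not converted to a bound method.
x.h = lambda n: n
plain_function = isinstance(x.h, types.FunctionType)
unbound_result = x.h(5)              # 5: no instance is inserted
```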

3.2.8.3. Generator functions

A function or method which uses the yield statement (see section The yield statement) is called a generator function. Such a function, when called, always returns an iterator object which can be used to execute the body of the function: calling the iterator's