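dis --- Disassembler for Python bytecode

Source code: Lib/dis.py


The dis module supports the analysis of CPython bytecode by disassembling it. The CPython bytecode which this module takes as an input is defined in the file Include/opcode.h and used by the compiler and the interpreter.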

CPython implementation detail: Bytecode is an implementation detail of the CPython interpreter. No guarantees are made that bytecode will not be added, removed, or changed between versions of Python. Use of this module should not be considered to work across Python VMs or Python releases.

Changed in version 3.6: Use 2 bytes for each instruction. Previously the number of bytes varied by instruction.

Changed in version 3.10: The argument of jump, exception handling and loop instructions is now the instruction offset rather than the byte offset.

Changed in version 3.11: Some instructions are accompanied by one or more inline cache entries, which take the form of CACHE instructions. These instructions are hidden by default, but can be shown by passing show_caches=True to any dis utility. Furthermore, the interpreter now adapts the bytecode to specialize it for different runtime conditions. The adaptive bytecode can be shown by passing adaptive=True.

Changed in version 3.12: The argument of a jump is the offset of the target instruction relative to the instruction that appears immediately after the jump instruction's CACHE entries.

As a consequence, the presence of the CACHE instructions is transparent for forward jumps but needs to be taken into account when reasoning about backward jumps.

Changed in version 3.13: The output shows logical labels rather than instruction offsets for jump targets and exception handlers. The -O command line option and the show_offsets argument were added.

Changed in version 3.14: The -P command-line option and the show_positions argument were added. The -S command-line option was added.

Example: Given the function myfunc():

def myfunc(alist):
    return len(alist)

the following command can be used to display the disassembly of myfunc():

>>> dis.dis(myfunc)
  2           RESUME                   0

  3           LOAD_GLOBAL              1 (len + NULL)
              LOAD_FAST_BORROW         0 (alist)
              CALL                     1
              RETURN_VALUE

(The "2" is a line number).

Command-line interface

The dis module can be invoked as a script from the command line:

python -m dis [-h] [-C] [-O] [-P] [-S] [infile]

The following options are accepted:

-h, --help

Display usage and exit.

-C, --show-caches

Show inline caches.

Added in version 3.13.

-O, --show-offsets

Show offsets of instructions.

Added in version 3.13.

-P, --show-positions

Show positions of instructions in the source code.

Added in version 3.14.

-S, --specialized

Show specialized bytecode.

Added in version 3.14.

If infile is specified, its disassembled code will be written to stdout. Otherwise, disassembly is performed on compiled source code received from stdin.
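As a rough sketch of the stdin mode described above, the module can be run as a script from Python itself. The one-line source string and the use of subprocess are illustrative assumptions, and the instructions printed will differ between Python releases.

```python
import subprocess
import sys

# Feed one line of source to "python -m dis" on stdin (no infile given).
source = "x = 1\n"
result = subprocess.run(
    [sys.executable, "-m", "dis"],
    input=source,
    capture_output=True,
    text=True,
)

# The disassembly of the compiled source arrives on stdout.
print(result.stdout)
```

Passing a file name as the last element of the argument list would disassemble that file instead of stdin.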

Bytecode analysis

Added in version 3.4.

The bytecode analysis API allows pieces of Python code to be wrapped in a Bytecode object that provides easy access to details of the compiled code.

class dis.Bytecode(x, *, first_line=None, current_offset=None, show_caches=False, adaptive=False, show_offsets=False, show_positions=False)

Analyse the bytecode corresponding to a function, generator, asynchronous generator, coroutine, method, string of source code, or a code object (as returned by compile()).

This is a convenience wrapper around many of the functions listed below, most notably get_instructions(), as iterating over a Bytecode instance yields the bytecode operations as Instruction instances.

If first_line is not None, it indicates the line number that should be reported for the first source line in the disassembled code. Otherwise, the source line information (if any) is taken directly from the disassembled code object.

If current_offset is not None, it refers to an instruction offset in the disassembled code. Setting this means dis() will display a "current instruction" marker against the specified opcode.

If show_caches is True, dis() will display inline cache entries used by the interpreter to specialize the bytecode.

If adaptive is True, dis() will display specialized bytecode that may be different from the original bytecode.

If show_offsets is True, dis() will include instruction offsets in the output.

If show_positions is True, dis() will include instruction source code positions in the output.

classmethod from_traceback(tb, *, show_caches=False)

Construct a Bytecode instance from the given traceback, setting current_offset to the instruction responsible for the exception.
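A minimal sketch of from_traceback(), assuming a hypothetical function failing() that raises ZeroDivisionError. The exact listing depends on the Python release, but the instruction that raised should carry the "current instruction" marker.

```python
import dis

def failing(a, b):
    return a / b

try:
    failing(1, 0)
except ZeroDivisionError as exc:
    # Build the Bytecode object from the traceback of the caught exception.
    bytecode = dis.Bytecode.from_traceback(exc.__traceback__)

# The listing marks the instruction that raised with "-->".
print(bytecode.dis())
```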

codeobj

The compiled code object.

first_line

The first source line of the code object (if available)

dis()

Return a formatted view of the bytecode operations (the same as printed by dis.dis(), but returned as a multi-line string).

info()

Return a formatted multi-line string with detailed information about the code object, like code_info().

Changed in version 3.7: This can now handle coroutine and asynchronous generator objects.

Changed in version 3.11: Added the show_caches and adaptive parameters.

Changed in version 3.13: Added the show_offsets parameter.

Changed in version 3.14: Added the show_positions parameter.

Example:

>>> bytecode = dis.Bytecode(myfunc)
>>> for instr in bytecode:
...     print(instr.opname)
...
RESUME
LOAD_GLOBAL
LOAD_FAST_BORROW
CALL
RETURN_VALUE
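The dis() and info() methods return strings rather than printing, which can be convenient for logging or testing. A small sketch, reusing the myfunc() example from above:

```python
import dis

def myfunc(alist):
    return len(alist)

bytecode = dis.Bytecode(myfunc)

# dis() returns the listing that dis.dis(myfunc) would print.
listing = bytecode.dis()

# info() returns the same summary as dis.code_info(myfunc).
summary = bytecode.info()

print(listing)
print(summary)
```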

Analysis functions

The dis module also defines the following analysis functions that convert the input directly to the desired output. They can be useful if only a single operation is being performed, so the intermediate analysis object isn't useful:

dis.code_info(x)

Return a formatted multi-line string with detailed code object information for the supplied function, generator, asynchronous generator, coroutine, method, source code string or code object.

Note that the exact contents of code info strings are highly implementation dependent and they may change arbitrarily across Python VMs or Python releases.

Added in version 3.2.

Changed in version 3.7: This can now handle coroutine and asynchronous generator objects.

dis.show_code(x, *, file=None)

Print detailed code object information for the supplied function, method, source code string or code object to file (or sys.stdout if file is not specified).

This is a convenient shorthand for print(code_info(x), file=file), intended for interactive exploration at the interpreter prompt.

Added in version 3.2.

Changed in version 3.4: Added the file parameter.
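The relationship between show_code() and code_info() can be checked with an in-memory text file. This is only a sketch; the use of io.StringIO as the file argument is an illustrative choice.

```python
import dis
import io

def myfunc(alist):
    return len(alist)

# Capture what show_code() prints by passing an in-memory text file.
buffer = io.StringIO()
dis.show_code(myfunc, file=buffer)
printed = buffer.getvalue()

# code_info() returns the same text without the trailing newline from print().
returned = dis.code_info(myfunc)
print(printed == returned + "\n")
```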

dis.dis(x=None, *, file=None, depth=None, show_caches=False, adaptive=False, show_offsets=False, show_positions=False)

Disassemble the x object. x can denote either a module, a class, a method, a function, a generator, an asynchronous generator, a coroutine, a code object, a string of source code or a byte sequence of raw bytecode. For a module, it disassembles all functions. For a class, it disassembles all methods (including class and static methods). For a code object or sequence of raw bytecode, it prints one line per bytecode instruction. It also recursively disassembles nested code objects. These can include generator expressions, nested functions, the bodies of nested classes, and the code objects used for annotation scopes. Strings are first compiled to code objects with the compile() built-in function before being disassembled. If no object is provided, this function disassembles the last traceback.

The disassembly is written as text to the supplied file argument if provided and to sys.stdout otherwise.

The maximal depth of recursion is limited by depth unless it is None. depth=0 means no recursion.

If show_caches is True, this function will display inline cache entries used by the interpreter to specialize the bytecode.

If adaptive is True, this function will display specialized bytecode that may be different from the original bytecode.

Changed in version 3.4: Added the file parameter.

Changed in version 3.7: Implemented recursive disassembling and added depth parameter.

Changed in version 3.7: This can now handle coroutine and asynchronous generator objects.

Changed in version 3.11: Added the show_caches and adaptive parameters.

Changed in version 3.13: Added the show_offsets parameter.

Changed in version 3.14: Added the show_positions parameter.
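A short sketch of the file and depth parameters of dis.dis(), assuming a hypothetical function outer() that contains a nested function. The helper disassemble_to_text() is not part of the module; it only collects the output in a string.

```python
import dis
import io

def outer():
    def inner():
        return 1
    return inner

def disassemble_to_text(obj, depth=None):
    """Return the text dis.dis() would otherwise print to sys.stdout."""
    buffer = io.StringIO()
    dis.dis(obj, file=buffer, depth=depth)
    return buffer.getvalue()

# depth=0 disables recursion, so only outer() itself is disassembled.
shallow = disassemble_to_text(outer, depth=0)

# The default depth=None also disassembles the nested code object for inner().
full = disassemble_to_text(outer)

print(full)
```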

dis.distb(tb=None, *, file=None, show_caches=False, adaptive=False, show_offsets=False, show_positions=False)

Disassemble the top-of-stack function of a traceback, using the last traceback if none was passed. The instruction causing the exception is indicated.

The disassembly is written as text to the supplied file argument if provided and to sys.stdout otherwise.

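Changed in version 3.4: Added the file parameter.

Changed in version 3.11: Added the show_caches and adaptive parameters.

Changed in version 3.13: Added the show_offsets parameter.

Changed in version 3.14: Added the show_positions parameter.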

dis.disassemble(code, lasti=-1, *, file=None, show_caches=False, adaptive=False, show_offsets=False, show_positions=False)
dis.disco(code, lasti=-1, *, file=None, show_caches=False, adaptive=False, show_offsets=False, show_positions=False)

Disassemble a code object, indicating the last instruction if lasti was provided. The output is divided into the following columns:

  1. the source code location of the instruction. Complete location information is shown if show_positions is true. Otherwise (the default) only the line number is displayed.

  2. the current instruction, indicated as -->,

  3. a labelled instruction, indicated with >>,

  4. the address of the instruction,

  5. the operation code name,

  6. operation parameters, and

  7. interpretation of the parameters in parentheses.

The parameter interpretation recognizes local and global variable names, constant values, branch targets, and compare operators.

The disassembly is written as text to the supplied file argument if provided and to sys.stdout otherwise.

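Changed in version 3.4: Added the file parameter.

Changed in version 3.11: Added the show_caches and adaptive parameters.

Changed in version 3.13: Added the show_offsets parameter.

Changed in version 3.14: Added the show_positions parameter.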
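To illustrate the lasti argument and the "-->" column, the sketch below marks the final instruction of the myfunc() example. Choosing the last instruction is an arbitrary assumption made for the demonstration.

```python
import dis
import io

def myfunc(alist):
    return len(alist)

# Take the offset of the final instruction and pass it as lasti.
last_instruction = list(dis.get_instructions(myfunc))[-1]

buffer = io.StringIO()
dis.disassemble(myfunc.__code__, lasti=last_instruction.offset, file=buffer)

# Only the line for that instruction carries the "-->" marker.
marked_lines = [line for line in buffer.getvalue().splitlines() if "-->" in line]
print(marked_lines)
```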

dis.get_instructions(x, *, first_line=None, show_caches=False, adaptive=False)

Return an iterator over the instructions in the supplied function, method, source code string or code object.

The iterator generates a series of Instruction named tuples giving the details of each operation in the supplied code.

If first_line is not None, it indicates the line number that should be reported for the first source line in the disassembled code. Otherwise, the source line information (if any) is taken directly from the disassembled code object.

The adaptive parameter works as it does in dis().

Added in version 3.4.

Changed in version 3.11: Added the show_caches and adaptive parameters.

Changed in version 3.13: The show_caches parameter is deprecated and has no effect. The iterator generates the Instruction instances with the cache_info field populated (regardless of the value of show_caches) and it no longer generates separate items for the cache entries.
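A minimal sketch of iterating over get_instructions(), again using the myfunc() example. The opcode names printed depend on the Python release, while the resolved argument values for the names used by the function should be stable.

```python
import dis

def myfunc(alist):
    return len(alist)

# Each item is an Instruction named tuple; print a few of its fields.
for instr in dis.get_instructions(myfunc):
    print(instr.offset, instr.opname, instr.argval)

# Collect the resolved argument values, e.g. the names the code refers to.
argvals = [instr.argval for instr in dis.get_instructions(myfunc)]
```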

dis.findlinestarts(code)

This generator function uses the co_lines() method of the code object code to find the offsets which are starts of lines in the source code. They are generated as (offset, lineno) pairs.

Changed in version 3.6: Line numbers can be decreasing. Before, they were always increasing.

Changed in version 3.10: The PEP 626 co_lines() method is used instead of the co_firstlineno and co_lnotab attributes of the code object.

Changed in version 3.13: Line numbers can be None for bytecode that does not map to source lines.
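The (offset, lineno) pairs from findlinestarts() can be inspected as follows. The function two_lines() is a hypothetical example; line numbers are compared relative to co_firstlineno so that the sketch does not depend on where the code is placed in a file.

```python
import dis

def two_lines(x):
    y = x + 1
    return y

code = two_lines.__code__
first_line = code.co_firstlineno

# Each pair maps a bytecode offset to the source line that starts there.
line_starts = list(dis.findlinestarts(code))
for offset, lineno in line_starts:
    print(offset, lineno)

line_numbers = {lineno for _, lineno in line_starts}
```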

dis.findlabels(code)

Detect all offsets in the raw compiled bytecode string code which are jump targets, and return a list of these offsets.
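As a sketch, findlabels() can be applied to the co_code bytes of a function that contains a conditional branch. The function branchy() is a made-up example, and the offsets reported vary between Python releases.

```python
import dis

def branchy(x):
    if x:
        return 1
    return 2

# findlabels() takes the raw bytecode, i.e. the co_code bytes of a code object.
labels = dis.findlabels(branchy.__code__.co_code)

# Every label is the offset of some instruction in the same code object.
offsets = {instr.offset for instr in dis.get_instructions(branchy)}
print(labels, set(labels) <= offsets)
```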

dis.stack_effect(opcode, oparg=None, *, jump=None)

Compute the stack effect of opcode with argument oparg.

If the code has a jump target and jump is True, stack_effect() will return the stack effect of jumping. If jump is False, it will return the stack effect of not jumping. And if jump is None (default), it will return the maximal stack effect of both cases.

Added in version 3.4.

Changed in version 3.8: Added the jump parameter.

Changed in version 3.13: If oparg is omitted (or None), the stack effect is now returned for oparg=0. Previously this was an error for opcodes that use their arg. It is also no longer an error to pass an integer oparg when the opcode does not use it; the oparg in this case is ignored.
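The following sketch queries stack_effect() for a few opcodes looked up by name in dis.opmap. The oparg value 0 is an arbitrary placeholder, and the individual values for FOR_ITER differ between Python releases, so only their relationship is of interest here.

```python
import dis

# POP_TOP removes one item, so its stack effect is -1.
pop_top_effect = dis.stack_effect(dis.opmap["POP_TOP"])

# LOAD_CONST pushes one item; oparg 0 is an arbitrary placeholder index.
load_const_effect = dis.stack_effect(dis.opmap["LOAD_CONST"], 0)

# For a jumping opcode, compare the three settings of the jump parameter.
for_iter = dis.opmap["FOR_ITER"]
if_jumping = dis.stack_effect(for_iter, 0, jump=True)
if_not_jumping = dis.stack_effect(for_iter, 0, jump=False)
maximal = dis.stack_effect(for_iter, 0)  # jump=None, the default

print(pop_top_effect, load_const_effect, if_jumping, if_not_jumping, maximal)
```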

Python Bytecode Instructions

The get_instructions() function and Bytecode class provide details of bytecode instructions as Instruction instances:

class dis.Instruction

Details for a bytecode operation

opcode

numeric code for operation, corresponding to the opcode values listed below and the bytecode values in the Opcode collections.

opname

human readable name for operation

baseopcode

numeric code for the base operation if operation is specialized; otherwise equal to opcode

baseopname

human readable name for the base operation if operation is specialized; otherwise equal to opname

arg

numeric argument to operation (if any), otherwise None

oparg

alias for arg

argval

resolved arg value (if any), otherwise None

argrepr

human readable description of operation argument (if any), otherwise an empty string.

offset

start index of operation within bytecode sequence

start_offset

start index of operation within bytecode sequence, including prefixed EXTENDED_ARG operations if present; otherwise equal to offset

cache_offset

start index of the cache entries following the operation

end_offset

end index of the cache entries following the operation

starts_line

True if this opcode starts a source line, otherwise False

line_number

source line number associated with this opcode (if any), otherwise None

is_jump_target

True if other code jumps to here, otherwise False

jump_target

bytecode index of the jump target if this is a jump operation, otherwise None

positions

dis.Positions object holding the start and end locations that are covered by this instruction.

cache_info

Information about the cache entries of this instruction, as triplets of the form (name, size, data), where the name and size describe the cache format and data is the contents of the cache. cache_info is None if the instruction does not have caches.

Added in version 3.4.

Changed in version 3.11: Field positions is added.

Changed in version 3.13: Changed field starts_line.

Added fields start_offset, cache_offset, end_offset, baseopname, baseopcode, jump_target, oparg, line_number and cache_info.
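Because Instruction is a named tuple, its fields can be read by name or converted to a dict. The sketch below restricts itself to fields that have existed since version 3.4 and checks them against the dis.opname and dis.opmap tables of the module.

```python
import dis

def myfunc(alist):
    return len(alist)

instructions = list(dis.get_instructions(myfunc))

# Instruction is a named tuple, so fields are available by name or as a dict.
first = instructions[0]
print(first.opname, first.opcode, first.offset)
print(first._asdict())

# opcode and opname correspond through the dis.opname and dis.opmap tables.
consistent = all(
    dis.opname[instr.opcode] == instr.opname
    and dis.opmap[instr.opname] == instr.opcode
    for instr in instructions
)
```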

class dis.Positions

In case the information is not available, some fields might be None.

lineno
end_lineno
col_offset
end_col_offset

Added in version 3.11.

The Python compiler currently generates the following bytecode instructions.

General instructions

In the following, we will refer to the interpreter stack as STACK and describe operations on it as if it were a Python list. The top of the stack corresponds to STACK[-1] in this language.

NOP

Do nothing code. Used as a placeholder by the bytecode optimizer, and to generate line tracing events.

NOT_TAKEN

Do nothing code. Used by the interpreter to record BRANCH_LEFT and BRANCH_RIGHT events for sys.monitoring.

Added in version 3.14.

POP_ITER

Removes the iterator from the top of the stack.

Added in version 3.14.

POP_TOP

Removes the top-of-stack item:

STACK.pop()

END_FOR

Removes the top-of-stack item. Equivalent to POP_TOP. Used to clean up at the end of loops, hence the name.

Added in version 3.12.

END_SEND

Implements del STACK[-2]. Used to clean up when a generator exits.

Added in version 3.12.

COPY(i)

Push the i-th item to the top of the stack without removing it from its original location:

assert i > 0
STACK.append(STACK[-i])

Added in version 3.11.

SWAP(i)

Swap the top of the stack with the i-th element:

STACK[-i], STACK[-1] = STACK[-1], STACK[-i]

Added in version 3.11.
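The list model used in this section can be exercised directly. The sketch below is only a model of POP_TOP, COPY and SWAP on an ordinary Python list, not the interpreter's implementation; the helper names copy() and swap() are made up for the illustration.

```python
def copy(stack, i):
    """Model of COPY(i): push the i-th item from the top without removing it."""
    assert i > 0
    stack.append(stack[-i])

def swap(stack, i):
    """Model of SWAP(i): exchange the top of the stack with the i-th item."""
    stack[-i], stack[-1] = stack[-1], stack[-i]

STACK = ["a", "b", "c"]   # "c" is the top of the stack, STACK[-1]
swap(STACK, 3)            # exchanges "a" and "c"
after_swap = list(STACK)
copy(STACK, 2)            # pushes a second reference to "b"
after_copy = list(STACK)
STACK.pop()               # POP_TOP discards the top item
print(after_swap, after_copy, STACK)
```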

CACHE

Rather than being an actual instruction, this opcode is used to mark extra space for the interpreter to cache useful data directly in the bytecode itself. It is automatically hidden by all dis utilities, but can be viewed with show_caches=True.

Logically, this space is part of the preceding instruction. Many opcodes expect to be followed by an exact number of caches, and will instruct the interpreter to skip over them at runtime.

Populated caches can look like arbitrary instructions, so great care should be taken when reading or modifying raw, adaptive bytecode containing quickened data.

Added in version 3.11.

Unary operations

Unary operations take the top of the stack, apply the operation, and push the result back on the stack.

UNARY_NEGATIVE

Implements STACK[-1] = -STACK[-1].

UNARY_NOT

Implements STACK[-1] = not STACK[-1].

Changed in version 3.13: This instruction now requires an exact bool operand.

UNARY_INVERT

Implements STACK[-1] = ~STACK[-1].

GET_ITER

Implements STACK[-1] = iter(STACK[-1]).

GET_YIELD_FROM_ITER

If STACK[-1] is a generator iterator or coroutine object it is left as is. Otherwise, implements STACK[-1] = iter(STACK[-1]).

Added in version 3.5.

TO_BOOL

Implements STACK[-1] = bool(STACK[-1]).

Added in version 3.13.
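One way to see which unary instruction the compiler emits is to disassemble small functions. The helper opnames() and the three functions below are assumptions made for this sketch; surrounding instructions (such as TO_BOOL before UNARY_NOT) depend on the Python release.

```python
import dis

def opnames(func):
    """Return the opcode names used by func, in order."""
    return [instr.opname for instr in dis.get_instructions(func)]

def negate(x):
    return -x

def invert(x):
    return ~x

def logical_not(x):
    return not x

for func in (negate, invert, logical_not):
    print(func.__name__, opnames(func))
```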

Binary and in-place operations

Binary operations remove the top two items from the stack (STACK[-1] and STACK[-2]). They perform the operation, then put the result back on the stack.

In-place operations are like binary operations, but the operation is done in-place when STACK[-2] supports it, and the resulting STACK[-1] may be (but does not have to be) the original STACK[-2].

BINARY_OP(op)

Implements the binary and in-place operators (depending on the value of op):

rhs = STACK.pop()
lhs = STACK.pop()
STACK.append(lhs op rhs)

Added in version 3.11.

Changed in version 3.14: With oparg NB_SUBSCR, implements binary subscript (replaces opcode BINARY_SUBSCR).

STORE_SUBSCR

Implements:

key = STACK.pop()
container = STACK.pop()
value = STACK.pop()
container[key] = value

DELETE_SUBSCR

Implements:

key = STACK.pop()
container = STACK.pop()
del container[key]

BINARY_SLICE

Implements:

end = STACK.pop()
start = STACK.pop()
container = STACK.pop()
STACK.append(container[start:end])

Added in version 3.12.

STORE_SLICE

Implements:

end = STACK.pop()
start = STACK.pop()
container = STACK.pop()
value = STACK.pop()
container[start:end] = value

Added in version 3.12.
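The order in which operands are popped matters for these instructions. The sketch below models BINARY_OP, STORE_SUBSCR and BINARY_SLICE on a plain Python list, following the pseudocode above; the helper functions are illustrative and are not how the interpreter is implemented.

```python
import operator

def binary_op(stack, op):
    """Model of BINARY_OP: the right operand is on top of the stack."""
    rhs = stack.pop()
    lhs = stack.pop()
    stack.append(op(lhs, rhs))

def store_subscr(stack):
    """Model of STORE_SUBSCR: key on top, then container, then value."""
    key = stack.pop()
    container = stack.pop()
    value = stack.pop()
    container[key] = value

def binary_slice(stack):
    """Model of BINARY_SLICE: end on top, then start, then container."""
    end = stack.pop()
    start = stack.pop()
    container = stack.pop()
    stack.append(container[start:end])

stack = [10, 3]
binary_op(stack, operator.sub)       # computes 10 - 3, not 3 - 10

table = {}
store_stack = [42, table, "answer"]  # value, container, key
store_subscr(store_stack)

slice_stack = [[0, 1, 2, 3], 1, 3]   # container, start, end
binary_slice(slice_stack)
print(stack, table, slice_stack)
```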

Coroutine opcodes

GET_AWAITABLE(where)

Implements STACK[-1] = get_awaitable(STACK[-1]), where get_awaitable(o) returns o if o is a coroutine object or a generator object with the CO_ITERABLE_COROUTINE flag, or resolves o.__await__.

If the where operand is nonzero, it indicates where the instruction occurs:

  • 1: After a call to __aenter__

  • 2: After a call to __aexit__

Added in version 3.5.

Changed in version 3.11: Previously, this instruction did not have an oparg.

GET_AITER

Implements STACK[-1] = STACK[-1].__aiter__().

Added in version 3.5.

Changed in version 3.7: Returning awaitable objects from __aiter__ is no longer supported.

GET_ANEXT

Implements STACK.append(get_awaitable(STACK[-1].__anext__())). See GET_AWAITABLE for details about get_awaitable.

Added in version 3.5.

END_ASYNC_FOR

Terminates an async for loop. Handles an exception raised when awaiting a next item. The stack contains the async iterable in STACK[-2] and the raised exception in STACK[-1]. Both are popped. If the exception is not StopAsyncIteration, it is re-raised.

Added in version 3.8.

Changed in version 3.11: Exception representation on the stack now consists of one, not three, items.

CLEANUP_THROW

Handles an exception raised during a throw() or close() call through the current frame. If STACK[-1] is an instance of StopIteration, pop three values from the stack and push its value member. Otherwise, re-raise STACK[-1].

Added in version 3.12.
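To see where the coroutine opcodes come from, an async function can be disassembled without running it. The function consume() is a hypothetical example, and the full instruction sequence depends on the Python release.

```python
import dis

async def consume(source):
    async for item in source:
        await item

# Compiling is enough to inspect the bytecode; the coroutine is never run.
used = {instr.opname for instr in dis.get_instructions(consume)}
coroutine_opcodes = sorted(
    used & {"GET_AITER", "GET_ANEXT", "GET_AWAITABLE", "END_ASYNC_FOR"}
)
print(coroutine_opcodes)
```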

Miscellaneous opcodes

SET_ADD(i)

Implements:

item = STACK.pop()
set.add(STACK[-i], item)

Used to implement set comprehensions.

LIST_APPEND(i)

Implements:

item = STACK.pop()
list.append(STACK[-i], item)

Used to implement list comprehensions.

MAP_ADD(i)

Implements:

value = STACK.pop()
key = STACK.pop()
dict.__setitem__(STACK[-i], key, value)

Used to implement dict comprehensions.

Added in version 3.1.

Changed in version 3.8: Map value is STACK[-1] and map key is STACK[-2]. Before, those were reversed.

For all of the SET_ADD, LIST_APPEND and MAP_ADD instructions, while the added value or key/value pair is popped off, the container object remains on the stack so that it is available for further iterations of the loop.
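The sketch below checks that each kind of comprehension compiles to the corresponding instruction. Depending on the Python release, a comprehension is either compiled inline or into a nested code object, so the hypothetical helper all_opnames() also searches the code objects found in co_consts.

```python
import dis

def all_opnames(code):
    """Collect opcode names from code and any nested code objects."""
    names = {instr.opname for instr in dis.get_instructions(code)}
    for const in code.co_consts:
        if hasattr(const, "co_code"):  # nested code object
            names |= all_opnames(const)
    return names

sources = {
    "LIST_APPEND": "[x for x in data]",
    "SET_ADD": "{x for x in data}",
    "MAP_ADD": "{k: v for k, v in pairs}",
}

# Map each opcode name to whether it appears in the compiled comprehension.
found = {}
for opcode, source in sources.items():
    code = compile(source, "<example>", "eval")
    found[opcode] = opcode in all_opnames(code)

print(found)
```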

RETURN_VALUE

Returns with STACK[-1] to the caller of the function.

YIELD_VALUE

Yields STACK.pop() from a generator.

Changed in version 3.11: oparg set to be the stack depth.

Changed in version 3.12: oparg set to be the exception block depth, for efficient closing of generators.

Changed in version 3.13: oparg is 1 if this instruction is part of a yield-from or await, and 0 otherwise.

SETUP_ANNOTATIONS

Checks whether __annotations__ is defined in locals(); if not, it is set to an empty dict. This opcode is only emitted if a class or module body contains variable annotations statically.

Added in version 3.6.

POP_EXCEPT

Pops a value from the stack, which is used to restore the exception state.

Changed in version 3.11: Exception representation on the stack now consists of one, not three, items.

RERAISE

Re-raises the exception currently on top of the stack. If oparg is non-zero, pops an additional value from the stack which is used to set