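dis --- Disassembler for Python bytecode¶
Source code: Lib/dis.py
The dis module supports the analysis of CPython bytecode by disassembling it. The CPython bytecode that this module takes as input is defined in the file Include/opcode.h and is used by the compiler and the interpreter.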
CPython implementation detail: Bytecode is an implementation detail of the CPython interpreter. No guarantees are made that bytecode will not be added, removed, or changed between versions of Python. Use of this module should not be considered to work across Python VMs or Python releases.
Changed in version 3.6: Use 2 bytes for each instruction. Previously the number of bytes varied by instruction.
Changed in version 3.10: The argument of jump, exception handling and loop instructions is now the instruction offset rather than the byte offset.
Changed in version 3.11: Some instructions are accompanied by one or more inline cache entries,
which take the form of CACHE instructions. These instructions
are hidden by default, but can be shown by passing show_caches=True to
any dis utility. Furthermore, the interpreter now adapts the
bytecode to specialize it for different runtime conditions. The
adaptive bytecode can be shown by passing adaptive=True.
Changed in version 3.12: The argument of a jump is the offset of the target instruction relative
to the instruction that appears immediately after the jump instruction's
CACHE entries.
As a consequence, the presence of the CACHE instructions is
transparent for forward jumps but needs to be taken into account when
reasoning about backward jumps.
Changed in version 3.13: The output shows logical labels rather than instruction offsets
for jump targets and exception handlers. The -O command line
option and the show_offsets argument were added.
Example: Given the function myfunc():
def myfunc(alist):
    return len(alist)
the following command can be used to display the disassembly of myfunc():
>>> dis.dis(myfunc)
  2           RESUME                   0

  3           LOAD_GLOBAL              1 (len + NULL)
              LOAD_FAST_BORROW         0 (alist)
              CALL                     1
              RETURN_VALUE
(The "2" is a line number).
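For use in a script rather than at the interactive prompt, the same example can be sketched as follows. It relies only on the file parameter of dis.dis() described later in this document. The names buffer and listing are illustrative, and the instructions that appear depend on the Python version running the code.

```python
import dis
import io


def myfunc(alist):
    return len(alist)


# Write the disassembly into an in-memory buffer instead of sys.stdout.
buffer = io.StringIO()
dis.dis(myfunc, file=buffer)
listing = buffer.getvalue()

# The exact opcodes differ between Python releases, so print whatever
# this interpreter produces rather than expecting a fixed listing.
print(listing)
```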
Command-line interface¶
The dis module can be invoked as a script from the command line:
python -m dis [-h] [-C] [-O] [-P] [-S] [infile]
The following options are accepted:
- -h, --help¶
Display usage and exit.
- -C, --show-caches¶
Show inline caches.
Added in version 3.13.
- -O, --show-offsets¶
Show offsets of instructions.
Added in version 3.13.
- -P, --show-positions¶
Show positions of instructions in the source code.
Added in version 3.14.
- -S, --specialized¶
Show specialized bytecode.
Added in version 3.14.
If infile is specified, its disassembled code will be written to stdout.
Otherwise, disassembly is performed on compiled source code received from stdin.
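The stdin behaviour can be tried from Python itself. The sketch below is only an illustration: it assumes the current interpreter (sys.executable) is the one to run, uses a made-up one-line source string, and passes no options, since several of them exist only in recent releases.

```python
import subprocess
import sys

# Hypothetical source text; any valid Python source would do.
source = "x = 1\n"

# Run "python -m dis" with no infile so that it reads the source from stdin.
result = subprocess.run(
    [sys.executable, "-m", "dis"],
    input=source,
    capture_output=True,
    text=True,
    check=True,
)

# The listing is version dependent, so it is printed rather than compared.
print(result.stdout)
```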
Bytecode analysis¶
Added in version 3.4.
The bytecode analysis API allows pieces of Python code to be wrapped in a
Bytecode object that provides easy access to details of the compiled
code.
- class dis.Bytecode(x, *, first_line=None, current_offset=None, show_caches=False, adaptive=False, show_offsets=False, show_positions=False)¶
Analyse the bytecode corresponding to a function, generator, asynchronous generator, coroutine, method, string of source code, or a code object (as returned by compile()).
This is a convenience wrapper around many of the functions listed below, most notably get_instructions(), as iterating over a Bytecode instance yields the bytecode operations as Instruction instances.
If first_line is not None, it indicates the line number that should be reported for the first source line in the disassembled code. Otherwise, the source line information (if any) is taken directly from the disassembled code object.
If current_offset is not None, it refers to an instruction offset in the disassembled code. Setting this means dis() will display a "current instruction" marker against the specified opcode.
If show_caches is True, dis() will display inline cache entries used by the interpreter to specialize the bytecode.
If adaptive is True, dis() will display specialized bytecode that may be different from the original bytecode.
If show_offsets is True, dis() will include instruction offsets in the output.
If show_positions is True, dis() will include instruction source code positions in the output.
- classmethod from_traceback(tb, *, show_caches=False)¶
Construct a Bytecode instance from the given traceback, setting current_offset to the instruction responsible for the exception.
- codeobj¶
The compiled code object.
- first_line¶
The first source line of the code object (if available)
- dis()¶
Return a formatted view of the bytecode operations (the same as printed by dis.dis(), but returned as a multi-line string).
- info()¶
Return a formatted multi-line string with detailed information about the code object, like code_info().
Changed in version 3.7: This can now handle coroutine and asynchronous generator objects.
Changed in version 3.11: Added the show_caches and adaptive parameters.
Changed in version 3.13: Added the show_offsets parameter.
Changed in version 3.14: Added the show_positions parameter.
Example:
>>> bytecode = dis.Bytecode(myfunc)
>>> for instr in bytecode:
... print(instr.opname)
...
RESUME
LOAD_GLOBAL
LOAD_FAST_BORROW
CALL
RETURN_VALUE
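The attributes and methods of Bytecode described above can be combined in a short script. This is a sketch rather than a canonical usage pattern: the variable names are illustrative, and the printed instruction names vary with the Python version.

```python
import dis


def myfunc(alist):
    return len(alist)


bytecode = dis.Bytecode(myfunc)

# The wrapped code object is the function's own code object.
print(bytecode.codeobj is myfunc.__code__)

# Iterating yields Instruction instances; collect the operation names.
opnames = [instr.opname for instr in bytecode]
print(opnames)

# dis() and info() return multi-line strings instead of printing.
listing = bytecode.dis()
summary = bytecode.info()
print(listing)
print(summary)
```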
Analysis functions¶
The dis module also defines the following analysis functions that convert
the input directly to the desired output. They can be useful if only a single
operation is being performed, so the intermediate analysis object isn't useful:
- dis.code_info(x)¶
Return a formatted multi-line string with detailed code object information for the supplied function, generator, asynchronous generator, coroutine, method, source code string or code object.
Note that the exact contents of code info strings are highly implementation dependent and they may change arbitrarily across Python VMs or Python releases.
Added in version 3.2.
Changed in version 3.7: This can now handle coroutine and asynchronous generator objects.
- dis.show_code(x, *, file=None)¶
Print detailed code object information for the supplied function, method, source code string or code object to file (or sys.stdout if file is not specified).
This is a convenient shorthand for print(code_info(x), file=file), intended for interactive exploration at the interpreter prompt.
Added in version 3.2.
Changed in version 3.4: Added the file parameter.
- dis.dis(x=None, *, file=None, depth=None, show_caches=False, adaptive=False, show_offsets=False, show_positions=False)¶
Disassemble the x object. x can denote either a module, a class, a method, a function, a generator, an asynchronous generator, a coroutine, a code object, a string of source code or a byte sequence of raw bytecode. For a module, it disassembles all functions. For a class, it disassembles all methods (including class and static methods). For a code object or sequence of raw bytecode, it prints one line per bytecode instruction. It also recursively disassembles nested code objects. These can include generator expressions, nested functions, the bodies of nested classes, and the code objects used for annotation scopes. Strings are first compiled to code objects with the compile() built-in function before being disassembled. If no object is provided, this function disassembles the last traceback.
The disassembly is written as text to the supplied file argument if provided and to sys.stdout otherwise.
The maximal depth of recursion is limited by depth unless it is None. depth=0 means no recursion.
If show_caches is True, this function will display inline cache entries used by the interpreter to specialize the bytecode.
If adaptive is True, this function will display specialized bytecode that may be different from the original bytecode.
Changed in version 3.4: Added the file parameter.
Changed in version 3.7: Implemented recursive disassembling and added the depth parameter.
Changed in version 3.7: This can now handle coroutine and asynchronous generator objects.
Changed in version 3.11: Added the show_caches and adaptive parameters.
Changed in version 3.13: Added the show_offsets parameter.
Changed in version 3.14: Added the show_positions parameter.
- dis.distb(tb=None, *, file=None, show_caches=False, adaptive=False, show_offsets=False, show_positions=False)¶
Disassemble the top-of-stack function of a traceback, using the last traceback if none was passed. The instruction causing the exception is indicated.
The disassembly is written as text to the supplied file argument if provided and to sys.stdout otherwise.
Changed in version 3.4: Added the file parameter.
Changed in version 3.11: Added the show_caches and adaptive parameters.
Changed in version 3.13: Added the show_offsets parameter.
Changed in version 3.14: Added the show_positions parameter.
- dis.disassemble(code, lasti=-1, *, file=None, show_caches=False, adaptive=False, show_offsets=False, show_positions=False)¶
- dis.disco(code, lasti=-1, *, file=None, show_caches=False, adaptive=False, show_offsets=False, show_positions=False)¶
Disassemble a code object, indicating the last instruction if lasti was provided. The output is divided in the following columns:
- the source code location of the instruction. Complete location information is shown if show_positions is true. Otherwise (the default) only the line number is displayed.
- the current instruction, indicated as -->,
- a labelled instruction, indicated with >>,
- the address of the instruction,
- the operation code name,
- operation parameters, and
- interpretation of the parameters in parentheses.
The parameter interpretation recognizes local and global variable names, constant values, branch targets, and compare operators.
The disassembly is written as text to the supplied file argument if provided and to sys.stdout otherwise.
Changed in version 3.4: Added the file parameter.
Changed in version 3.11: Added the show_caches and adaptive parameters.
Changed in version 3.13: Added the show_offsets parameter.
Changed in version 3.14: Added the show_positions parameter.
- dis.get_instructions(x, *, first_line=None, show_caches=False, adaptive=False)¶
Return an iterator over the instructions in the supplied function, method, source code string or code object.
The iterator generates a series of Instruction named tuples giving the details of each operation in the supplied code.
If first_line is not None, it indicates the line number that should be reported for the first source line in the disassembled code. Otherwise, the source line information (if any) is taken directly from the disassembled code object.
The adaptive parameter works as it does in dis().
Added in version 3.4.
Changed in version 3.11: Added the show_caches and adaptive parameters.
Changed in version 3.13: The show_caches parameter is deprecated and has no effect. The iterator generates the Instruction instances with the cache_info field populated (regardless of the value of show_caches) and it no longer generates separate items for the cache entries.
- dis.findlinestarts(code)¶
This generator function uses the co_lines() method of the code object code to find the offsets which are starts of lines in the source code. They are generated as (offset, lineno) pairs.
Changed in version 3.6: Line numbers can be decreasing. Before, they were always increasing.
Changed in version 3.10: The PEP 626 co_lines() method is used instead of the co_firstlineno and co_lnotab attributes of the code object.
Changed in version 3.13: Line numbers can be None for bytecode that does not map to source lines.
- dis.findlabels(code)¶
Detect all offsets in the raw compiled bytecode string code which are jump targets, and return a list of these offsets.
- dis.stack_effect(opcode, oparg=None, *, jump=None)¶
Compute the stack effect of opcode with argument oparg.
If the code has a jump target and jump is True, stack_effect() will return the stack effect of jumping. If jump is False, it will return the stack effect of not jumping. And if jump is None (default), it will return the maximal stack effect of both cases.
Added in version 3.4.
Changed in version 3.8: Added the jump parameter.
Changed in version 3.13: If oparg is omitted (or None), the stack effect is now returned for oparg=0. Previously this was an error for opcodes that use their arg. It is also no longer an error to pass an integer oparg when the opcode does not use it; the oparg in this case is ignored.
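Several of the analysis functions above return values instead of printing, which makes them usable from other code. The sketch below exercises findlinestarts(), code_info() and stack_effect() on a made-up two-line module. It assumes the dis.opmap mapping from operation names to numeric opcodes, which is part of the module but is not described in this section, and it prints the results because most of them are version dependent.

```python
import dis

# Hypothetical two-line module used only for illustration.
source = "a = 1\nb = a + 1\n"
code = compile(source, "<example>", "exec")

# findlinestarts() yields (offset, lineno) pairs.
line_starts = list(dis.findlinestarts(code))
print(line_starts)

# code_info() returns its report as a string rather than printing it.
report = dis.code_info(code)
print(report)

# stack_effect() takes a numeric opcode; dis.opmap maps names to opcodes.
pop_effect = dis.stack_effect(dis.opmap["POP_TOP"])
print(pop_effect)  # POP_TOP removes one item from the stack

# For an opcode that can jump, the jump argument selects which path to measure.
for_iter = dis.opmap["FOR_ITER"]
effects = {
    jump: dis.stack_effect(for_iter, 0, jump=jump)
    for jump in (True, False, None)
}
print(effects)
```

With jump=None the reported value should be the larger of the two path-specific values, matching the description of the default above.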
Python Bytecode Instructions¶
The get_instructions() function and Bytecode class provide
details of bytecode instructions as Instruction instances:
- class dis.Instruction¶
Details for a bytecode operation
- opcode¶
numeric code for operation, corresponding to the opcode values listed below and the bytecode values in the Opcode collections.
- opname¶
human readable name for operation
- baseopcode¶
numeric code for the base operation if operation is specialized; otherwise equal to opcode
- baseopname¶
human readable name for the base operation if operation is specialized; otherwise equal to opname
- arg¶
numeric argument to operation (if any), otherwise None
- argval¶
resolved arg value (if any), otherwise None
- argrepr¶
human readable description of operation argument (if any), otherwise an empty string.
- offset¶
start index of operation within bytecode sequence
- start_offset¶
start index of operation within bytecode sequence, including prefixed EXTENDED_ARG operations if present; otherwise equal to offset
- cache_offset¶
start index of the cache entries following the operation
- end_offset¶
end index of the cache entries following the operation
- starts_line¶
True if this opcode starts a source line, otherwise False
- line_number¶
source line number associated with this opcode (if any), otherwise None
- is_jump_target¶
True if other code jumps to here, otherwise False
- jump_target¶
bytecode index of the jump target if this is a jump operation, otherwise None
- positions¶
dis.Positions object holding the start and end locations that are covered by this instruction.
- cache_info¶
Information about the cache entries of this instruction, as triplets of the form (name, size, data), where the name and size describe the cache format and data is the contents of the cache. cache_info is None if the instruction does not have caches.
Added in version 3.4.
Changed in version 3.11: Field positions is added.
Changed in version 3.13: Changed field starts_line.
Added fields start_offset, cache_offset, end_offset, baseopname, baseopcode, jump_target, oparg, line_number and cache_info.
- class dis.Positions¶
In case the information is not available, some fields might be None.
- lineno¶
- end_lineno¶
- col_offset¶
- end_col_offset¶
Added in version 3.11.
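As a rough illustration of the Instruction fields listed above, the following sketch collects a few of them for each instruction of a small function. The rows list is an illustrative name, and positions is read through getattr() because that field exists only from 3.11 onward; the values printed depend on the Python version.

```python
import dis


def myfunc(alist):
    return len(alist)


rows = []
for instr in dis.get_instructions(myfunc):
    # positions was added in 3.11, so fall back to None on older releases.
    positions = getattr(instr, "positions", None)
    rows.append((instr.offset, instr.opname, instr.arg, instr.argval, positions))

for offset, opname, arg, argval, positions in rows:
    print(offset, opname, arg, repr(argval), positions)
```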
The Python compiler currently generates the following bytecode instructions.
General instructions
In the following, we will refer to the interpreter stack as STACK and describe
operations on it as if it was a Python list. The top of the stack corresponds to
STACK[-1] in this language.
- NOP¶
Do nothing code. Used as a placeholder by the bytecode optimizer, and to generate line tracing events.
- NOT_TAKEN¶
Do nothing code. Used by the interpreter to record BRANCH_LEFT and BRANCH_RIGHT events for sys.monitoring.
Added in version 3.14.
- POP_ITER¶
Removes the iterator from the top of the stack.
Added in version 3.14.
- POP_TOP¶
Removes the top-of-stack item:
STACK.pop()
- END_FOR¶
Removes the top-of-stack item. Equivalent to POP_TOP. Used to clean up at the end of loops, hence the name.
Added in version 3.12.
- END_SEND¶
Implements del STACK[-2]. Used to clean up when a generator exits.
Added in version 3.12.
- COPY(i)¶
Push the i-th item to the top of the stack without removing it from its original location:
assert i > 0
STACK.append(STACK[-i])
Added in version 3.11.
- SWAP(i)¶
Swap the top of the stack with the i-th element:
STACK[-i], STACK[-1] = STACK[-1], STACK[-i]
Added in version 3.11.
- CACHE¶
Rather than being an actual instruction, this opcode is used to mark extra space for the interpreter to cache useful data directly in the bytecode itself. It is automatically hidden by all dis utilities, but can be viewed with show_caches=True.
Logically, this space is part of the preceding instruction. Many opcodes expect to be followed by an exact number of caches, and will instruct the interpreter to skip over them at runtime.
Populated caches can look like arbitrary instructions, so great care should be taken when reading or modifying raw, adaptive bytecode containing quickened data.
Added in version 3.11.
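The stack-manipulation opcodes above (POP_TOP, COPY and SWAP) can be modelled on an ordinary Python list, following the convention that STACK[-1] is the top. The helper functions copy() and swap() below are hypothetical names for this illustration only; they are not part of the dis module and do not show how the interpreter is implemented.

```python
def copy(stack, i):
    """Model COPY(i): push the i-th item from the top without removing it."""
    assert i > 0
    stack.append(stack[-i])


def swap(stack, i):
    """Model SWAP(i): exchange the top of the stack with the i-th item."""
    stack[-i], stack[-1] = stack[-1], stack[-i]


stack = ["a", "b", "c"]   # "c" is STACK[-1], the top of the stack
copy(stack, 3)            # pushes a second reference to "a"
after_copy = list(stack)  # ['a', 'b', 'c', 'a']
swap(stack, 2)            # exchanges the top with the item just below it
after_swap = list(stack)  # ['a', 'b', 'a', 'c']
stack.pop()               # POP_TOP: discard the top item
print(after_copy, after_swap, stack)
```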
Unary operations
Unary operations take the top of the stack, apply the operation, and push the result back on the stack.
- UNARY_NEGATIVE¶
Implements STACK[-1] = -STACK[-1].
- UNARY_NOT¶
Implements STACK[-1] = not STACK[-1].
Changed in version 3.13: This instruction now requires an exact bool operand.
- UNARY_INVERT¶
Implements STACK[-1] = ~STACK[-1].
- GET_ITER¶
Implements STACK[-1] = iter(STACK[-1]).
- GET_YIELD_FROM_ITER¶
If STACK[-1] is a generator iterator or coroutine object it is left as is. Otherwise, implements STACK[-1] = iter(STACK[-1]).
Added in version 3.5.
- TO_BOOL¶
Implements STACK[-1] = bool(STACK[-1]).
Added in version 3.13.
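To see where some of the unary opcodes above come from, the sketch below disassembles a few small functions and lists their operation names. The opnames() helper and the sample functions are illustrative. Other instructions in each list (and whether TO_BOOL appears before UNARY_NOT) depend on the Python version, so the lists are printed rather than compared against a fixed result.

```python
import dis


def opnames(func):
    """Return the operation names of func as a list."""
    return [instr.opname for instr in dis.get_instructions(func)]


def loop(items):
    for _ in items:
        pass


# Arguments are used instead of literals so that the compiler cannot
# fold the expression into a constant.
negative = opnames(lambda x: -x)
invert = opnames(lambda x: ~x)
logical_not = opnames(lambda x: not x)
iteration = opnames(loop)

print(negative)
print(invert)
print(logical_not)
print(iteration)
```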
Binary and in-place operations
Binary operations remove the top two items from the stack (STACK[-1] and
STACK[-2]). They perform the operation, then put the result back on the stack.
In-place operations are like binary operations, but the operation is done in-place
when STACK[-2] supports it, and the resulting STACK[-1] may be (but does
not have to be) the original STACK[-2].
- BINARY_OP(op)¶
Implements the binary and in-place operators (depending on the value of op):
rhs = STACK.pop()
lhs = STACK.pop()
STACK.append(lhs op rhs)
Added in version 3.11.
Changed in version 3.14: With oparg NB_SUBSCR, implements binary subscript (replaces opcode BINARY_SUBSCR).
- STORE_SUBSCR¶
Implements:
key = STACK.pop()
container = STACK.pop()
value = STACK.pop()
container[key] = value
- DELETE_SUBSCR¶
Implements:
key = STACK.pop()
container = STACK.pop()
del container[key]
- BINARY_SLICE¶
Implements:
end = STACK.pop()
start = STACK.pop()
container = STACK.pop()
STACK.append(container[start:end])
Added in version 3.12.
- STORE_SLICE¶
Implements:
end = STACK.pop()
start = STACK.pop()
container = STACK.pop()
value = STACK.pop()
container[start:end] = value
Added in version 3.12.
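The pop order in the binary and subscript opcodes above is easy to get backwards, so a small list-based model may help. The functions below are hypothetical helpers written for this illustration. In particular, op is passed as a callable from the operator module, whereas the real BINARY_OP receives a numeric oparg.

```python
import operator


def binary_op(stack, op):
    """Model BINARY_OP: pop rhs, then lhs, and push op(lhs, rhs)."""
    rhs = stack.pop()
    lhs = stack.pop()
    stack.append(op(lhs, rhs))


def store_subscr(stack):
    """Model STORE_SUBSCR: container[key] = value, popping all three."""
    key = stack.pop()
    container = stack.pop()
    value = stack.pop()
    container[key] = value


def binary_slice(stack):
    """Model BINARY_SLICE: push container[start:end]."""
    end = stack.pop()
    start = stack.pop()
    container = stack.pop()
    stack.append(container[start:end])


stack = [7, 2]
binary_op(stack, operator.sub)      # lhs is 7, rhs is 2
difference = stack.pop()            # 5

items = [10, 20, 30, 40]
stack = ["x", items, 0]             # value, container, key
store_subscr(stack)                 # items[0] = "x"

stack = [items, 1, 3]               # container, start, end
binary_slice(stack)
middle = stack.pop()                # [20, 30]
print(difference, items, middle)
```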
Coroutine opcodes
- GET_AWAITABLE(where)¶
Implements STACK[-1] = get_awaitable(STACK[-1]), where get_awaitable(o) returns o if o is a coroutine object or a generator object with the CO_ITERABLE_COROUTINE flag, or resolves o.__await__.
If the where operand is nonzero, it indicates where the instruction occurs:
- 1: After a call to __aenter__
- 2: After a call to __aexit__
Added in version 3.5.
Changed in version 3.11: Previously, this instruction did not have an oparg.
- GET_AITER¶
Implements STACK[-1] = STACK[-1].__aiter__().
Added in version 3.5.
Changed in version 3.7: Returning awaitable objects from __aiter__ is no longer supported.
- GET_ANEXT¶
Implements STACK.append(get_awaitable(STACK[-1].__anext__())). See GET_AWAITABLE for details about get_awaitable.
Added in version 3.5.
- END_ASYNC_FOR¶
Terminates an async for loop. Handles an exception raised when awaiting a next item. The stack contains the async iterable in STACK[-2] and the raised exception in STACK[-1]. Both are popped. If the exception is not StopAsyncIteration, it is re-raised.
Added in version 3.8.
Changed in version 3.11: Exception representation on the stack now consists of one, not three, items.
- CLEANUP_THROW¶
Handles an exception raised during a throw() or close() call through the current frame. If STACK[-1] is an instance of StopIteration, pop three values from the stack and push its value member. Otherwise, re-raise STACK[-1].
Added in version 3.12.
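A quick way to observe one of the coroutine opcodes above is to disassemble a coroutine function that contains an await expression. The function fetch() below is a made-up example. It is never called, only inspected, and the surrounding instructions vary by Python version.

```python
import dis


async def fetch(awaitable):
    # Hypothetical coroutine function that awaits its argument.
    return await awaitable


opnames = [instr.opname for instr in dis.get_instructions(fetch)]
print(opnames)
```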
Miscellaneous opcodes
- SET_ADD(i)¶
Implements:
item = STACK.pop()
set.add(STACK[-i], item)
Used to implement set comprehensions.
- LIST_APPEND(i)¶
Implements:
item = STACK.pop()
list.append(STACK[-i], item)
Used to implement list comprehensions.
- MAP_ADD(i)¶
Implements:
value = STACK.pop()
key = STACK.pop()
dict.__setitem__(STACK[-i], key, value)
Used to implement dict comprehensions.
Added in version 3.1.
Changed in version 3.8: Map value is STACK[-1] and map key is STACK[-2]. Before, those were reversed.
For all of the SET_ADD, LIST_APPEND and MAP_ADD
instructions, while the added value or key/value pair is popped off, the
container object remains on the stack so that it is available for further
iterations of the loop.
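The way the container stays on the stack across iterations can be modelled with a plain list. In this sketch the helpers list_append() and map_add() are hypothetical stand-ins for LIST_APPEND and MAP_ADD, and the stack layout (container below the iterator) is an assumption chosen to mirror the description above, not a trace of real interpreter state.

```python
def list_append(stack, i):
    """Model LIST_APPEND(i): pop the item and append it to STACK[-i]."""
    item = stack.pop()
    list.append(stack[-i], item)


def map_add(stack, i):
    """Model MAP_ADD(i): pop value, then key, and store them in STACK[-i]."""
    value = stack.pop()
    key = stack.pop()
    dict.__setitem__(stack[-i], key, value)


# Model [n * n for n in (1, 2, 3)]: the result list sits below the iterator.
stack = [[], iter((1, 2, 3))]
for n in stack[-1]:
    stack.append(n * n)     # the computed element is pushed on top
    list_append(stack, 2)   # after the pop, STACK[-2] is the result list
squares = stack[0]

# Model {n: n * n for n in (1, 2, 3)} in the same way.
stack = [{}, iter((1, 2, 3))]
for n in stack[-1]:
    stack.append(n)         # key
    stack.append(n * n)     # value
    map_add(stack, 2)       # after both pops, STACK[-2] is the dict
square_map = stack[0]
print(squares, square_map)
```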
- RETURN_VALUE¶
Returns with STACK[-1] to the caller of the function.
- YIELD_VALUE¶
Yields STACK.pop() from a generator.
Changed in version 3.11: oparg set to be the stack depth.
Changed in version 3.12: oparg set to be the exception block depth, for efficient closing of generators.
Changed in version 3.13: oparg is 1 if this instruction is part of a yield-from or await, and 0 otherwise.
- SETUP_ANNOTATIONS¶
Checks whether __annotations__ is defined in locals(), if not it is set up to an empty dict. This opcode is only emitted if a class or module body contains variable annotations statically.
Added in version 3.6.
- POP_EXCEPT¶
Pops a value from the stack, which is used to restore the exception state.
Changed in version 3.11: Exception representation on the stack now consists of one, not three, items.
- RERAISE¶
Re-raises the exception currently on top of the stack. If oparg is non-zero, pops an additional value from the stack which is used to set