multiprocessing --- Process-based parallelism¶
Availability: not Android, not iOS, not WASI.
This module is not supported on mobile platforms or WebAssembly platforms.
Introduction¶
multiprocessing is a package that supports spawning processes using an
API similar to the threading module. The multiprocessing package
offers both local and remote concurrency, effectively side-stepping the
Global Interpreter Lock by using
subprocesses instead of threads. Due
to this, the multiprocessing module allows the programmer to fully
leverage multiple processors on a given machine. It runs on both POSIX and
Windows.
The multiprocessing module also introduces the
Pool object which offers a convenient means of
parallelizing the execution of a function across multiple input values,
distributing the input data across processes (data parallelism). The following
example demonstrates the common practice of defining such functions in a module
so that child processes can successfully import that module. This basic example
of data parallelism using Pool,
from multiprocessing import Pool

def f(x):
    return x*x

if __name__ == '__main__':
    with Pool(5) as p:
        print(p.map(f, [1, 2, 3]))
will print to standard output
[1, 4, 9]
The multiprocessing module also introduces APIs which do not have
analogs in the threading module, like the ability to terminate, interrupt or kill a running process.
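For example, a minimal sketch of forcibly ending a worker that would otherwise keep running (time.sleep stands in for real work here):

from multiprocessing import Process
import time

if __name__ == '__main__':
    p = Process(target=time.sleep, args=(60,))
    p.start()
    p.terminate()      # ask the OS to end the child process
    p.join()
    print(p.exitcode)  # a negative exit code indicates termination by a signal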
See also
concurrent.futures.ProcessPoolExecutor offers a higher level interface
to push tasks to a background process without blocking execution of the
calling process. Compared to using the Pool
interface directly, the concurrent.futures API more readily allows
the submission of work to the underlying process pool to be separated from
waiting for the results.
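As a rough illustration of that separation, submission returns a Future immediately and the result is collected later (the squaring function is just a placeholder):

from concurrent.futures import ProcessPoolExecutor

def f(x):
    return x * x

if __name__ == '__main__':
    with ProcessPoolExecutor(max_workers=2) as executor:
        # submit() returns immediately; other work could happen here
        # before the results are actually needed
        futures = [executor.submit(f, n) for n in range(5)]
        print([fut.result() for fut in futures])   # [0, 1, 4, 9, 16]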
The Process class¶
In multiprocessing, processes are spawned by creating a Process
object and then calling its start() method. Process
follows the API of threading.Thread. A trivial example of a
multiprocess program is
from multiprocessing import Process

def f(name):
    print('hello', name)

if __name__ == '__main__':
    p = Process(target=f, args=('bob',))
    p.start()
    p.join()
To show the individual process IDs involved, here is an expanded example:
from multiprocessing import Process
import os

def info(title):
    print(title)
    print('module name:', __name__)
    print('parent process:', os.getppid())
    print('process id:', os.getpid())

def f(name):
    info('function f')
    print('hello', name)

if __name__ == '__main__':
    info('main line')
    p = Process(target=f, args=('bob',))
    p.start()
    p.join()
For an explanation of why the if __name__ == '__main__' part is
necessary, see Programming guidelines.
The arguments to Process usually need to be picklable, so that they can be
unpickled from within the child process. If you tried typing the above example
directly into a REPL it could lead to an AttributeError in the child process
when it tries to locate the f function in the __main__ module.
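One common way around this, sketched below with a hypothetical worker.py module, is to keep the target function in an importable module so that the child process can unpickle the reference to it:

# worker.py (hypothetical helper module)
def f(name):
    print('hello', name)

# main script or REPL session
from multiprocessing import Process
import worker

if __name__ == '__main__':
    p = Process(target=worker.f, args=('bob',))
    p.start()
    p.join()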
Contexts and start methods¶
Depending on the platform, multiprocessing supports three ways
to start a process. These start methods are
- spawn
The parent process starts a fresh Python interpreter process. The child process will only inherit those resources necessary to run the process object's
run() method. In particular, unnecessary file descriptors and handles from the parent process will not be inherited. Starting a process using this method is rather slow compared to using fork or forkserver.
Available on POSIX and Windows platforms. The default on Windows and macOS.
- fork
The parent process uses
os.fork() to fork the Python interpreter. The child process, when it begins, is effectively identical to the parent process. All resources of the parent are inherited by the child process. Note that safely forking a multithreaded process is problematic.
Available on POSIX systems.
Changed in version 3.14: This is no longer the default start method on any platform. Code that requires fork must explicitly specify that via get_context() or set_start_method() (a brief sketch of doing so follows this list).
Changed in version 3.12: If Python is able to detect that your process has multiple threads, the os.fork() function that this start method calls internally will raise a DeprecationWarning. Use a different start method. See the os.fork() documentation for further explanation.
- forkserver
When the program starts and selects the forkserver start method, a server process is spawned. From then on, whenever a new process is needed, the parent process connects to the server and requests that it fork a new process. The fork server process is single threaded unless system libraries or preloaded imports spawn threads as a side-effect, so it is generally safe for it to use os.fork(). No unnecessary resources are inherited.
Available on POSIX platforms which support passing file descriptors over Unix pipes, such as Linux. The default on those platforms.
Changed in version 3.14: This became the default start method on POSIX platforms.
Changed in version 3.4: spawn added on all POSIX platforms, and forkserver added for some POSIX platforms. Child processes no longer inherit all of the parent's inheritable handles on Windows.
Changed in version 3.8: On macOS, the spawn start method is now the default. The fork start method should be considered unsafe as it can lead to crashes of the subprocess, as macOS system libraries may start threads. See bpo-33725.
Changed in version 3.14: On POSIX platforms the default start method was changed from fork to forkserver to retain the performance but avoid common multithreaded process incompatibilities. See gh-84559.
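For code that does rely on fork semantics, a minimal sketch of opting in explicitly (POSIX only) might look like:

import multiprocessing as mp

def child():
    print('running in a forked child')

if __name__ == '__main__':
    # fork is no longer the default anywhere as of Python 3.14,
    # so it has to be requested explicitly
    ctx = mp.get_context('fork')
    p = ctx.Process(target=child)
    p.start()
    p.join()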
On POSIX using the spawn or forkserver start methods will also
start a resource tracker process which tracks the unlinked named
system resources (such as named semaphores or
SharedMemory objects) created
by processes of the program. When all processes
have exited the resource tracker unlinks any remaining tracked object.
Usually there should be none, but if a process was killed by a signal
there may be some "leaked" resources. (Neither leaked semaphores nor shared
memory segments will be automatically unlinked until the next reboot. This is
problematic for both objects because the system allows only a limited number of
named semaphores, and shared memory segments occupy some space in the main
memory.)
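A SharedMemory block is one example of such a named resource; the sketch below unlinks it explicitly rather than relying on the resource tracker's end-of-program cleanup:

from multiprocessing import shared_memory

if __name__ == '__main__':
    shm = shared_memory.SharedMemory(create=True, size=128)
    try:
        shm.buf[:5] = b'hello'   # write into the shared block
    finally:
        shm.close()    # drop this process's view of the block
        shm.unlink()   # remove the underlying named system resource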
To select a start method, use set_start_method() in
the if __name__ == '__main__' clause of the main module. For
example:
import multiprocessing as mp

def foo(q):
    q.put('hello')

if __name__ == '__main__':
    mp.set_start_method('spawn')
    q = mp.Queue()
    p = mp.Process(target=foo, args=(q,))
    p.start()
    print(q.get())
    p.join()
set_start_method() should not be used more than once in the
program.
Alternatively, you can use get_context() to obtain a context
object. Context objects have the same API as the multiprocessing
module, and allow one to use multiple start methods in the same
program.
import multiprocessing as mp

def foo(q):
    q.put('hello')

if __name__ == '__main__':
    ctx = mp.get_context('spawn')
    q = ctx.Queue()
    p = ctx.Process(target=foo, args=(q,))
    p.start()
    print(q.get())
    p.join()
Note that objects related to one context may not be compatible with processes for a different context. In particular, locks created using the fork context cannot be passed to processes started using the spawn or forkserver start methods.
Libraries using multiprocessing or
ProcessPoolExecutor should be designed to allow
their users to provide their own multiprocessing context. Using a specific
context of your own within a library can lead to incompatibilities with the
rest of the library user's application. Always document if your library
requires a specific start method.
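A hypothetical sketch of such a library entry point (run_tasks is an illustrative name, not a real API) that lets callers supply their own context:

import multiprocessing as mp

def run_tasks(func, items, context=None):
    # fall back to the interpreter's default start method only if the
    # caller does not provide a context of their own
    ctx = context if context is not None else mp.get_context()
    with ctx.Pool() as pool:
        return pool.map(func, items)

A caller could then pass, for example, context=mp.get_context('spawn') to force a particular start method.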
Warning
The 'spawn' and 'forkserver' start methods generally cannot
be used with "frozen" executables (i.e., binaries produced by
packages like PyInstaller and cx_Freeze) on POSIX systems.
The 'fork' start method may work if code does not use threads.
Exchanging objects between processes¶
multiprocessing supports two types of communication channel between
processes:
Queues
The Queue class is a near clone of queue.Queue. For example:

from multiprocessing import Process, Queue

def f(q):
    q.put([42, None, 'hello'])

if __name__ == '__main__':
    q = Queue()
    p = Process(target=f, args=(q,))
    p.start()
    print(q.get())    # prints "[42, None, 'hello']"
    p.join()

Queues are thread and process safe. Any object put into a
multiprocessing queue will be serialized.
Pipes
The Pipe() function returns a pair of connection objects connected by a pipe which by default is duplex (two-way). For example:

from multiprocessing import Process, Pipe

def f(conn):
    conn.send([42, None, 'hello'])
    conn.close()

if __name__ == '__main__':
    parent_conn, child_conn = Pipe()
    p = Process(target=f, args=(child_conn,))
    p.start()
    print(parent_conn.recv())    # prints "[42, None, 'hello']"
    p.join()

The two connection objects returned by Pipe() represent the two ends of the pipe. Each connection object has send() and recv() methods (among others). Note that data in a pipe may become corrupted if two processes (or threads) try to read from or write to the same end of the pipe at the same time. Of course there is no risk of corruption from processes using different ends of the pipe at the same time.

The send() method serializes the object and recv() re-creates the object.
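Because the connection is duplex by default, both ends can send as well as receive; a minimal sketch, assuming the same Process/Pipe setup as above:

from multiprocessing import Process, Pipe

def f(conn):
    print(conn.recv())    # prints "ping" sent by the parent
    conn.send('pong')     # reply over the same connection
    conn.close()

if __name__ == '__main__':
    parent_conn, child_conn = Pipe()
    p = Process(target=f, args=(child_conn,))
    p.start()
    parent_conn.send('ping')
    print(parent_conn.recv())    # prints "pong"
    p.join()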
Synchronization between processes¶
multiprocessing contains equivalents of all the synchronization
primitives from threading. For instance one can use a lock to ensure
that only one process prints to standard output at a time:
from multiprocessing import Process, Lock

def f(l, i):
    l.acquire()
    try:
        print('hello world', i)
    finally:
        l.release()

if __name__ == '__main__':
    lock = Lock()
    for num in range(10):
        Process(target=f, args=(lock, num)).start()
Without using the lock, output from the different processes is liable to get all mixed up.
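Like threading.Lock, a multiprocessing lock can also be used as a context manager, which is equivalent to the explicit acquire/release above:

from multiprocessing import Process, Lock

def f(l, i):
    with l:   # acquires the lock and releases it on exit
        print('hello world', i)

if __name__ == '__main__':
    lock = Lock()
    for num in range(10):
        Process(target=f, args=(lock, num)).start()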
Using a pool of workers¶
The Pool class represents a pool of worker
processes. It has methods which allow tasks to be offloaded to the worker
processes in a few different ways.
For example:
from multiprocessing import Pool, TimeoutError
import time
import os

def f(x):
    return x*x

if __name__ == '__main__':
    # start 4 worker processes
    with Pool(processes=4) as pool:

        # print "[0, 1, 4,..., 81]"
        print(pool.map(f, range(10)))

        # print same numbers in arbitrary order
        for i in pool.imap_unordered(f, range(10)):
            print(i)

        # evaluate "f(20)" asynchronously
        res = pool.apply_async(f, (20,))      # runs in *only* one process
        print(res.get(timeout=1))             # prints "400"

        # evaluate "os.getpid()" asynchronously
        res = pool.apply_async(os.getpid, ()) # runs in *only* one process
        print(res.get(timeout=1))             # prints the PID of that process

        # launching multiple evaluations asynchronously *may* use more processes
        multiple_results = [pool.apply_async(os.getpid, ()) for i in range(4)]
        print([res.get(timeout=1) for res in multiple_results])

        # make a single worker sleep for 10 seconds
        res = pool.apply_async(time.sleep, (10,))
        try:
            print(res.get(timeout=1))
        except TimeoutError:
            print("We lacked patience and got a multiprocessing.TimeoutError")

        print("For the moment, the pool remains available for more work")

    # exiting the 'with'-block has stopped the pool
    print("Now the pool is closed and no longer available")
Note that the methods of a pool should only ever be used by the process which created it.
Note
Functionality within this package requires that the __main__ module be
importable by the children. This is covered in Programming guidelines;
however, it is worth pointing out here. This means that some examples, such
as the multiprocessing.pool.Pool examples, will not work in the
interactive interpreter. For example:
>>> from multiprocessing import Pool
>>> p = Pool(5)
>>> def f(x):
...     return x*x
...
>>> with p:
...     p.map(f, [1,2,3])
Process PoolWorker-1:
Process PoolWorker-2:
Process PoolWorker-3:
Traceback (most recent call last):
Traceback (most recent call last):
Traceback (most recent call last):
AttributeError: Can't get attribute 'f' on <module '__main__' (<class '_frozen_importlib.BuiltinImporter'>)>
AttributeError: Can't get attribute 'f' on <module '__main__' (<class '_frozen_importlib.BuiltinImporter'>)>
AttributeError: Can't get attribute 'f' on <module '__main__' (<class '_frozen_importlib.BuiltinImporter'>)>
(If you try this it will actually output three full tracebacks interleaved in a semi-random fashion, and then you may have to stop the parent process somehow.)
Reference¶
The multiprocessing package mostly replicates the API of the
threading module.
Global start method¶
Python supports several ways to create and initialize a process. The global start method sets the default mechanism for creating a process.
Several multiprocessing functions and methods, as well as the creation of certain objects, will implicitly set the global start method to the system default if it has not been set already. The global start method can only be set once. If you need a start method other than the system default, you must proactively set the global start method before calling those functions or methods, or creating those objects.
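As a sketch of that ordering (the allow_none check is just one way to avoid setting the method twice):

import multiprocessing as mp

if __name__ == '__main__':
    # set the global start method exactly once, before creating any
    # queues, pools or processes; creating them first would implicitly
    # lock in the system default
    if mp.get_start_method(allow_none=True) is None:
        mp.set_start_method('spawn')
    q = mp.Queue()   # created under the start method chosen above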
Process and exceptions¶
- class multiprocessing.Process(group=None, target=None, name=None, args=(), kwargs={},