PyMOTW: multiprocessing, part 1

multiprocessing Basics

Purpose: Provides an API for managing processes.
Python Version: 2.6

The multiprocessing module includes a relatively simple API for dividing work up between multiple processes. It is based on the API for threading, and in some cases is a drop-in replacement. Due to the similarity, the first few examples here are modified from the threading examples. Features provided by multiprocessing but not available in threading are covered later.

Process objects

The simplest way to spawn a second process is to instantiate a Process object with a
target function and call start() to let it begin working.

import multiprocessing

def worker():
    """worker function"""
    print 'Worker'
    return

if __name__ == '__main__':
    jobs = []
    for i in range(5):
        p = multiprocessing.Process(target=worker)
        jobs.append(p)
        p.start()

The output includes the word “Worker” printed five times, although it may not be entirely clean depending on the order of execution, since all of the processes write to the same stdout. A multiprocessing.Lock can be used to ensure that only one worker prints to stdout at a time.

$ python multiprocessing_simple.py
Worker
Worker
Worker
Worker
Worker
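
To keep the workers from interleaving their output, each one can hold a shared lock while it prints. The following is a minimal sketch, not part of the original examples: the start_workers() helper is a hypothetical name introduced here, and the code uses single-argument print() calls so that it should run under Python 2.6 as well as newer interpreters.

```python
import multiprocessing


def worker(lock, num):
    """Print one line while holding the shared lock."""
    lock.acquire()
    try:
        print('Worker %d' % num)
    finally:
        # Always release, even if printing fails.
        lock.release()


def start_workers(count):
    """Start `count` workers that share one lock (hypothetical helper)."""
    lock = multiprocessing.Lock()
    jobs = []
    for i in range(count):
        p = multiprocessing.Process(target=worker, args=(lock, i))
        jobs.append(p)
        p.start()
    return jobs


if __name__ == '__main__':
    for p in start_workers(5):
        p.join()
```

The lock only guarantees that the lines do not overlap. The order in which the workers run is still up to the operating system.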

It is usually more useful to be able to spawn a process with arguments to tell it what
work to do. Unlike with threading, arguments passed to a multiprocessing Process must be
picklable. As a simple example, the next program passes each worker a number so the
output is a little more interesting.

import multiprocessing

def worker(num):
    """worker function"""
    print 'Worker:', num
    return

if __name__ == '__main__':
    jobs = []
    for i in range(5):
        p = multiprocessing.Process(target=worker, args=(i,))
        jobs.append(p)
        p.start()

The integer argument is now included in the message printed by each worker:

$ python multiprocessing_simpleargs.py
Worker: 0
Worker: 1
Worker: 2
Worker: 3
Worker: 4
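
If you are not sure whether a value can be passed to a worker, one option is to try pickling it before creating the Process. This is only a sketch of such a pre-flight check. The is_picklable() helper is a name made up for this illustration and is not part of multiprocessing.

```python
import pickle


def is_picklable(obj):
    """Return True if pickle.dumps() accepts obj (hypothetical helper)."""
    try:
        pickle.dumps(obj)
    except (pickle.PicklingError, AttributeError, TypeError):
        return False
    return True


if __name__ == '__main__':
    # A tuple of plain values is fine as an args tuple.
    print(is_picklable((1, 'two', 3.0)))
    # A lambda cannot be pickled by the standard pickle module.
    print(is_picklable((lambda x: x,)))
```

Integers, strings, tuples, and functions defined at the top level of an importable module are typical arguments that pass the check.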

Importable Target Functions

One difference you will notice between the threading and multiprocessing examples is the extra protection for __main__ used here. Due to the way the new processes are started, the child process needs to be able to import the script containing the target function. In these examples I accomplish that by wrapping the main part of the application so it is not run recursively in each child. You could also import the target function from a separate script.

For example, this main program:

import multiprocessing
import multiprocessing_import_worker

if __name__ == '__main__':
    jobs = []
    for i in range(5):
        p = multiprocessing.Process(target=multiprocessing_import_worker.worker)
        jobs.append(p)
        p.start()

uses this worker function, defined in a separate module:

def worker():
    """worker function"""
    print 'Worker'
    return

and produces output like the first example above:

$ python multiprocessing_import_main.py
Worker
Worker
Worker
Worker
Worker

Determining the Current Process

Passing arguments to identify or name the process is cumbersome, and unnecessary.
Each Process instance has a name with a default value that you can change as
the process is created. Naming processes is useful if you have a server
with multiple service children handling different operations.

import multiprocessing
import time

def worker():
    name = multiprocessing.current_process().name
    print name, 'Starting'
    time.sleep(2)
    print name, 'Exiting'

def my_service():
    name = multiprocessing.current_process().name
    print name, 'Starting'
    time.sleep(3)
    print name, 'Exiting'

if __name__ == '__main__':
    service = multiprocessing.Process(name='my_service', target=my_service)
    worker_1 = multiprocessing.Process(name='worker 1', target=worker)
    worker_2 = multiprocessing.Process(target=worker) # use default name

    worker_1.start()
    worker_2.start()
    service.start()

The output includes the name of the current process on each line. The
lines beginning with “Process-3” correspond to the unnamed
process worker_2.

$ python multiprocessing_names.py
worker 1 Starting
worker 1 Exiting
Process-3 Starting
Process-3 Exiting
my_service Starting
my_service Exiting
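
The name is available on the Process object before it is started, so it can be inspected from the parent as well. Here is a small sketch. The build() helper is a hypothetical name used only for this illustration, and passing None as the name is assumed to fall back to the generated default.

```python
import multiprocessing


def worker():
    """Report the name of the process this function runs in."""
    print('%s running' % multiprocessing.current_process().name)


def build(names):
    """Create, but do not start, one Process per entry (hypothetical helper).

    A name of None leaves the generated default name in place.
    """
    return [multiprocessing.Process(name=n, target=worker) for n in names]


if __name__ == '__main__':
    for p in build(['my_service', None]):
        print('parent sees %s' % p.name)
        p.start()
        p.join()
```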


Daemon Processes

By default the main program will not exit until all of the children have exited. There are
times when you want to start a background process and let it run without blocking the main
program from exiting. Using daemon processes like this is useful for services where there may
not be an easy way to interrupt the worker or where letting it die in the middle of its work
does not lose or corrupt data (for example, a task that generates “heart beats” for a service
monitoring tool). To mark a process as a daemon, set its daemon attribute to a boolean
value. The default is for processes to not be daemons, so setting the attribute to True
turns daemon mode on.

import multiprocessing
import time
import sys

def daemon():
    print 'Starting:', multiprocessing.current_process().name
    time.sleep(2)
    print 'Exiting :', multiprocessing.current_process().name

def non_daemon():
    print 'Starting:', multiprocessing.current_process().name
    print 'Exiting :', multiprocessing.current_process().name

if __name__ == '__main__':
    d = multiprocessing.Process(name='daemon', target=daemon)
    d.daemon = True

    n = multiprocessing.Process(name='non-daemon', target=non_daemon)
    n.daemon = False

    d.start()
    time.sleep(1)
    n.start()

Notice that the output does not include the “Exiting” message from the daemon
process, since all of the non-daemon processes (including the main program) exit
before the daemon process wakes up from its 2 second sleep.

$ python multiprocessing_daemon.py
Starting: non-daemon
Exiting : non-daemon

The daemon process is terminated automatically before the main program exits, to avoid leaving orphaned processes running.
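
The “heart beat” case mentioned above might look something like the following sketch. The heartbeat() function and the make_daemon() helper are hypothetical names chosen for this illustration, and the timing values are arbitrary.

```python
import multiprocessing
import time


def heartbeat(interval, count):
    """Emit `count` heartbeat lines, sleeping `interval` seconds between them."""
    for i in range(count):
        print('heartbeat %d' % i)
        time.sleep(interval)


def make_daemon(target, args=()):
    """Build an unstarted Process with the daemon flag set (hypothetical helper)."""
    p = multiprocessing.Process(target=target, args=args)
    p.daemon = True  # the flag has to be set before start() is called
    return p


if __name__ == '__main__':
    d = make_daemon(heartbeat, (0.1, 100))
    d.start()
    time.sleep(0.35)
    # The main program ends here, long before 100 heartbeats have been
    # produced, and the daemon is terminated along with it.
```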


Waiting for Processes

To wait until a process has completed its work and exited, use the join() method.

import multiprocessing
import time
import sys

def daemon():
    print 'Starting:', multiprocessing.current_process().name
    time.sleep(2)
    print 'Exiting :', multiprocessing.current_process().name

def non_daemon():
    print 'Starting:', multiprocessing.current_process().name
    print 'Exiting :', multiprocessing.current_process().name

if __name__ == '__main__':
    d = multiprocessing.Process(name='daemon', target=daemon)
    d.daemon = True

    n = multiprocessing.Process(name='non-daemon', target=non_daemon)
    n.daemon = False

    d.start()
    time.sleep(1)
    n.start()

    d.join()
    n.join()

Since we wait for the daemon to exit using join(), we do see its
“Exiting” message.

$ python multiprocessing_daemon_join.py
Starting: non-daemon
Exiting : non-daemon
Starting: daemon
Exiting : daemon

By default, join() blocks indefinitely. It is also possible to pass a timeout
argument (a float representing the number of seconds to wait for the process to
become inactive). If the process does not complete within the timeout period,
join() returns anyway.

import multiprocessing
import time
import sys

def daemon():
    print 'Starting:', multiprocessing.current_process().name
    time.sleep(2)
    print 'Exiting :', multiprocessing.current_process().name

def non_daemon():
    print 'Starting:', multiprocessing.current_process().name
    print 'Exiting :', multiprocessing.current_process().name

if __name__ == '__main__':
    d = multiprocessing.Process(name='daemon', target=daemon)
    d.daemon = True

    n = multiprocessing.Process(name='non-daemon', target=non_daemon)
    n.daemon = False

    d.start()
    n.start()

    d.join(1)
    print 'd.is_alive()', d.is_alive()
    n.join()

Since the timeout passed is less than the amount of time the daemon
sleeps, the process is still “alive” after join() returns.

$ python multiprocessing_daemon_join_timeout.py
Starting: non-daemon
Exiting : non-daemon
d.is_alive() True
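
Because join() does not report whether it returned due to the timeout, the usual pattern is to follow it with is_alive(). The two calls can be wrapped in a small function, sketched here under the assumption that a boolean answer is all the caller needs. The finished_within() and slow_task() names are hypothetical.

```python
import multiprocessing
import time


def slow_task(seconds):
    """Stand-in for real work: just sleep."""
    time.sleep(seconds)


def finished_within(proc, timeout):
    """Wait up to `timeout` seconds; return True if proc has exited."""
    proc.join(timeout)
    return not proc.is_alive()


if __name__ == '__main__':
    p = multiprocessing.Process(target=slow_task, args=(0.5,))
    p.start()
    print('done after 0.1s: %s' % finished_within(p, 0.1))
    print('done after 5s: %s' % finished_within(p, 5))
```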


Terminating Processes

Although it is better to use the poison pill method of signaling to a process that it should exit, if a process appears hung or deadlocked it can be useful to be able to kill it forcibly. Calling terminate() on a process object kills the child process.

import multiprocessing
import time

def slow_worker():
    print 'Starting worker'
    time.sleep(0.1)
    print 'Finished worker'

if __name__ == '__main__':
    p = multiprocessing.Process(target=slow_worker)
    print 'BEFORE:', p, p.is_alive()

    p.start()
    print 'DURING:', p, p.is_alive()

    p.terminate()
    print 'TERMINATED:', p, p.is_alive()

    p.join()
    print 'JOINED:', p, p.is_alive()

Note

It is important to join() the process after terminating it in order to give the background machinery time to update the status of the object to reflect the termination.

$ python multiprocessing_terminate.py
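BEFORE: <Process(Process-1, initial)> False
DURING: <Process(Process-1, started)> True
TERMINATED: <Process(Process-1, started)> True
JOINED: <Process(Process-1, stopped[SIGTERM])> False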
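
For comparison, here is a rough sketch of the poison pill approach mentioned at the start of this section. It assumes the pill is a None value placed on a multiprocessing.Queue that the worker reads its tasks from. The doubling "work" and the queue names are invented for the example.

```python
import multiprocessing


def worker(tasks, results):
    """Handle items from `tasks` until the None sentinel (the poison pill) arrives."""
    while True:
        item = tasks.get()
        if item is None:
            # Poison pill received: leave the loop and exit cleanly.
            break
        results.put(item * 2)


if __name__ == '__main__':
    tasks = multiprocessing.Queue()
    results = multiprocessing.Queue()
    p = multiprocessing.Process(target=worker, args=(tasks, results))
    p.start()

    for n in (1, 2, 3):
        tasks.put(n)
    tasks.put(None)  # ask the worker to exit

    doubled = [results.get() for _ in range(3)]
    p.join()
    print(doubled)
    print(p.exitcode)
```

Since the worker returns from its target function normally, it gets a chance to finish the item it is working on, which terminate() does not allow.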

Process Exit Status

The status code produced when the process exits can be accessed via the exitcode attribute.

The exitcode values are interpreted as follows:

  • == 0 – no error was produced
  • > 0 – the process had an error, and exited with that code
  • < 0 – the process was killed with a signal of -1 * exitcode

import multiprocessing
import sys
import time

def exit_error():
    sys.exit(1)

def exit_ok():
    return

def return_value():
    return 1

def raises():
    raise RuntimeError('There was an error!')

def terminated():
    time.sleep(3)

if __name__ == '__main__':
    jobs = []
    for f in [exit_error, exit_ok, return_value, raises, terminated]:
        print 'Starting process for', f.func_name
        j = multiprocessing.Process(target=f, name=f.func_name)
        jobs.append(j)
        j.start()

    jobs[-1].terminate()

    for j in jobs:
        j.join()
        print '%s.exitcode = %s' % (j.name, j.exitcode)

Processes that raise an exception automatically get an exitcode of 1.

$ python multiprocessing_exitcode.py
Starting process for exit_error
Starting process for exit_ok
Starting process for return_value
Starting process for raises
Process raises:
Traceback (most recent call last):
  File "/Library/Frameworks/Python.framework/Versions/2.6/lib/python2.6/multiprocessing/process.py", line 231, in _bootstrap
    self.run()
  File "/Library/Frameworks/Python.framework/Versions/2.6/lib/python2.6/multiprocessing/process.py", line 88, in run
    self._target(*self._args, **self._kwargs)
  File "multiprocessing_exitcode.py", line 24, in raises
    raise RuntimeError('There was an error!')
RuntimeError: There was an error!
Starting process for terminated
exit_error.exitcode = 1
exit_ok.exitcode = 0
return_value.exitcode = 0
raises.exitcode = 1
terminated.exitcode = -15
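
The three cases listed above can be folded into a small helper for reporting. This is just a sketch: describe_exitcode() is a hypothetical name, and the extra None branch covers a process whose exitcode has not been set because it has not finished.

```python
import signal


def describe_exitcode(code):
    """Translate a Process.exitcode value into a short description."""
    if code is None:
        return 'still running or not started'
    if code == 0:
        return 'no error'
    if code > 0:
        return 'exited with error code %d' % code
    # Negative values carry the signal number with the sign flipped.
    return 'killed by signal %d' % -code


if __name__ == '__main__':
    for code in (None, 0, 1, -signal.SIGTERM):
        print('%r: %s' % (code, describe_exitcode(code)))
```

The -15 reported for the terminated process in the output above corresponds to SIGTERM, the signal sent by terminate() on that platform.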

Logging

When debugging concurrency issues, it can be useful to have access to the internals of the objects provided by multiprocessing. There is a convenient module-level function to enable logging called log_to_stderr(). It sets up a logger object using logging and adds a handler so that log messages are sent to the standard error channel.

import multiprocessing
import logging
import sys

def worker():
    print 'Doing some work'
    sys.stdout.flush()

if __name__ == '__main__':
    multiprocessing.log_to_stderr(logging.DEBUG)
    p = multiprocessing.Process(target=worker)
    p.start()
    p.join()

By default the logging level is set to NOTSET so no messages are produced. Pass a different level to initialize the logger to the level of detail you want.

$ python multiprocessing_log_to_stderr.py
[INFO/Process-1] child process calling self.run()
Doing some work
[INFO/Process-1] process shutting down
[DEBUG/Process-1] running all "atexit" finalizers with priority >= 0
[DEBUG/Process-1] running the remaining "atexit" finalizers
[INFO/Process-1] process exiting with exitcode 0
[INFO/MainProcess] process shutting down
[DEBUG/MainProcess] running all "atexit" finalizers with priority >= 0
[DEBUG/MainProcess] running the remaining "atexit" finalizers

To manipulate the logger directly (change its level setting or add handlers), use get_logger().

import multiprocessing
import logging
import sys

def worker():
    print 'Doing some work'
    sys.stdout.flush()

if __name__ == '__main__':
    multiprocessing.log_to_stderr()
    logger = multiprocessing.get_logger()
    logger.setLevel(logging.INFO)
    p = multiprocessing.Process(target=worker)
    p.start()
    p.join()

The logger can also be configured through the logging configuration file API, using the name multiprocessing.

$ python multiprocessing_get_logger.py
[INFO/Process-1] child process calling self.run()
Doing some work
[INFO/Process-1] process shutting down
[INFO/Process-1] process exiting with exitcode 0
[INFO/MainProcess] process shutting down
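
Adding your own handler works the same way as with any other logger. The sketch below attaches a StreamHandler with a custom format instead of calling log_to_stderr(). The configure_logger() helper and the format string are assumptions made for this illustration, and whether messages from the child reach the handler may depend on how the platform starts child processes.

```python
import logging
import multiprocessing
import sys


def configure_logger(stream, level):
    """Attach a custom-formatted handler to the multiprocessing logger."""
    logger = multiprocessing.get_logger()
    handler = logging.StreamHandler(stream)
    handler.setFormatter(logging.Formatter(
        '%(asctime)s %(processName)s %(levelname)s %(message)s'))
    logger.addHandler(handler)
    logger.setLevel(level)
    return logger, handler


def worker():
    print('Doing some work')
    sys.stdout.flush()


if __name__ == '__main__':
    configure_logger(sys.stderr, logging.INFO)
    p = multiprocessing.Process(target=worker)
    p.start()
    p.join()
```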

Subclassing Process

Although the simplest way to start a job in a separate process is to use Process and pass a target function, it is also possible to use a custom subclass. The derived class should override run() to do its work.

import multiprocessing

class Worker(multiprocessing.Process):

    def run(self):
        print 'In %s' % self.name
        return

if __name__ == '__main__':
    jobs = []
    for i in range(5):
        p = Worker()
        jobs.append(p)
        p.start()
    for j in jobs:
        j.join()

$ python multiprocessing_subclass.py
In Worker-1
In Worker-2
In Worker-3
In Worker-4
In Worker-5
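
A subclass can also accept its own constructor arguments, as long as it calls the base class __init__() before the process is started. The following sketch assumes each worker squares one number and reports the result through a multiprocessing.Queue. The SquareWorker class is invented for the example.

```python
import multiprocessing


class SquareWorker(multiprocessing.Process):
    """Process subclass that squares one number and reports the result."""

    def __init__(self, number, results):
        # Initialize the base class before storing our own state.
        multiprocessing.Process.__init__(self)
        self.number = number
        self.results = results

    def run(self):
        self.results.put((self.number, self.number ** 2))


if __name__ == '__main__':
    results = multiprocessing.Queue()
    jobs = [SquareWorker(n, results) for n in range(5)]
    for j in jobs:
        j.start()
    # Read the results before joining so no worker is left waiting
    # to flush its data to the queue.
    squares = dict(results.get() for _ in jobs)
    for j in jobs:
        j.join()
    print(sorted(squares.items()))
```

As with the target function arguments earlier, anything stored on the instance should be picklable so the object can be handed to the child process.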

See also

multiprocessing
The standard library documentation for this module.
threading
High-level API for working with threads.


  • Anonymous

    I think in the logging section that the effective logging level is WARNING as that is what is set by the logging package for the root logger. Any child loggers such as the multiprocessing logger have a NOTSET level which means use the parent’s level – which will normally be WARNING when the default configuration is used.

  • http://www.blogger.com/profile/01892352754222143463 Doug Hellmann

    A quick test without including the log level value shows that no output is printed. That may depend on whether any earlier configuration is done for the root logger, though. I didn’t do anything to set up logging in my sample script other than calling log_to_stderr().

  • Anonymous

    “A later example illustrates using a lock to ensure that only one worker can print to stdout at a time” Where is it? :)

  • http://www.blogger.com/profile/01892352754222143463 Doug Hellmann

    D’oh! You found a spot where I copied some of the text around the threading example into the new post. I’ll clean that up in the official version of this document on my site.

    As far as the specific problem, the logging module doesn’t ensure cross-process locking. It would be possible to use the locks provided by multiprocessing, but you’d have to do it yourself. It would be easier and safer to have the processes log to separate files or go through a log daemon such as syslogd.

  • http://www.blogger.com/profile/01178295953897193794 Vitaly Babiy

    Hey, can you explain this more? I am not sure I get what you mean in this section:
    “The daemon process is terminated before the main program exits, to avoid leaving orphaned processes running.” Is this done automatically?

  • http://www.blogger.com/profile/01892352754222143463 Doug Hellmann

    @Vitaly – Yes, multiprocessing takes care of killing your daemon subprocesses automatically.