PyMOTW: profile, cProfile, pstats

The profile and cProfile modules provide APIs for collecting and analyzing statistics about how Python source consumes processor resources.

Module: profile
Purpose: Performance analysis of Python programs
Python Version: 1.4 and later, these examples are for Python 2.5

run():

The most basic starting point in the profile module is run(). It takes a string statement as argument, and creates a report of the time spent executing different lines of code while running the statement.

import profile

def fib(n):
# from http://en.literateprograms.org/Fibonacci_numbers_(Python)
if n == 0:
return 0
elif n == 1:
return 1
else:
return fib(n-1) + fib(n-2)

def fib_seq(n):
seq = [ ]
if n > 0:
seq.extend(fib_seq(n-1))
seq.append(fib(n))
return seq

print 'RAW'
print '=' * 80
profile.run('print fib_seq(20); print')


This recursive version of a fibonacci sequence calculator is especially useful for demonstrating the profile because we can improve the performance so much. The standard report format shows a summary and then details for each function executed.


$ python profile_fibonacci_raw.py
RAW
================================================================================
[0, 1, 1, 2, 3, 5, 8, 13, 21, 34, 55, 89, 144, 233, 377, 610, 987, 1597, 2584, 4181, 6765]

57356 function calls (66 primitive calls) in 0.746 CPU seconds

Ordered by: standard name

ncalls tottime percall cumtime percall filename:lineno(function)
21 0.000 0.000 0.000 0.000 :0(append)
20 0.000 0.000 0.000 0.000 :0(extend)
1 0.001 0.001 0.001 0.001 :0(setprofile)
1 0.000 0.000 0.744 0.744 :1()
1 0.000 0.000 0.746 0.746 profile:0(print fib_seq(20); print)
0 0.000 0.000 profile:0(profiler)
57291/21 0.743 0.000 0.743 0.035 profile_fibonacci_raw.py:13(fib)
21/1 0.001 0.000 0.744 0.744 profile_fibonacci_raw.py:22(fib_seq)


As you can see, it takes 57356 separate function calls and 3/4 of a second to run. Since there are only 66 *primitive* calls, we know that the vast majority of those 57k calls were recursive. The details about where time was spent are broken out by function in the listing showing the number of calls, total time spent in the function, time per call (tottime/ncalls), cumulative time spent in a function, and the ratio of cumulative time to primitive calls.

Not surprisingly, most of the time here is spent calling fib() repeatedly. We can add a memoize decorator to reduce the number of recursive calls and have a big impact on the performance of this function.

import profile

class memoize:
# from http://avinashv.net/2008/04/python-decorators-syntactic-sugar/
def __init__(self, function):
self.function = function
self.memoized = {}

def __call__(self, *args):
try:
return self.memoized[args]
except KeyError:
self.memoized[args] = self.function(*args)
return self.memoized[args]

@memoize
def fib(n):
# from http://en.literateprograms.org/Fibonacci_numbers_(Python)
if n == 0:
return 0
elif n == 1:
return 1
else:
return fib(n-1) + fib(n-2)

def fib_seq(n):
seq = [ ]
if n > 0:
seq.extend(fib_seq(n-1))
seq.append(fib(n))
return seq

if __name__ == '__main__':
print 'MEMOIZED'
print '=' * 80
profile.run('print fib_seq(20); print')


By remembering the Fibonacci value at each level we can avoid most of the recursion and drop down to 145 calls that only take 0.003 seconds. Also notice that the ncalls count for fib() shows that it *never* recurses.


$ python profile_fibonacci_memoized.py
MEMOIZED
================================================================================
[0, 1, 1, 2, 3, 5, 8, 13, 21, 34, 55, 89, 144, 233, 377, 610, 987, 1597, 2584, 4181, 6765]

145 function calls (87 primitive calls) in 0.003 CPU seconds

Ordered by: standard name

ncalls tottime percall cumtime percall filename:lineno(function)
21 0.000 0.000 0.000 0.000 :0(append)
20 0.000 0.000 0.000 0.000 :0(extend)
1 0.001 0.001 0.001 0.001 :0(setprofile)
1 0.000 0.000 0.002 0.002 :1()
1 0.000 0.000 0.003 0.003 profile:0(print fib_seq(20); print)
0 0.000 0.000 profile:0(profiler)
59/21 0.001 0.000 0.001 0.000 profile_fibonacci_memoized.py:19(__call__)
21 0.000 0.000 0.001 0.000 profile_fibonacci_memoized.py:26(fib)
21/1 0.001 0.000 0.002 0.002 profile_fibonacci_memoized.py:36(fib_seq)


runctx():

Of course, it’s not always easy to construct the expression to pass to run(). Sometimes it is easier to build a simple expression and run it in a context with globals and locals, using runctx().

import profile
from profile_fibonacci_memoized import fib, fib_seq

if __name__ == '__main__':
profile.runctx('print fib_seq(n); print', globals(), {'n':20})


In this example, the value of “n” is passed through the local variable context instead of being embedded directly in the statement passed to runctx().


$ python profile_runctx.py
[0, 1, 1, 2, 3, 5, 8, 13, 21, 34, 55, 89, 144, 233, 377, 610, 987, 1597, 2584, 4181, 6765]

145 function calls (87 primitive calls) in 0.003 CPU seconds

Ordered by: standard name

ncalls tottime percall cumtime percall filename:lineno(function)
21 0.000 0.000 0.000 0.000 :0(append)
20 0.000 0.000 0.000 0.000 :0(extend)
1 0.001 0.001 0.001 0.001 :0(setprofile)
1 0.000 0.000 0.002 0.002 :1()
1 0.000 0.000 0.003 0.003 profile:0(print fib_seq(n); print)
0 0.000 0.000 profile:0(profiler)
59/21 0.001 0.000 0.001 0.000 profile_fibonacci_memoized.py:19(__call__)
21 0.000 0.000 0.001 0.000 profile_fibonacci_memoized.py:26(fib)
21/1 0.001 0.000 0.002 0.002 profile_fibonacci_memoized.py:36(fib_seq)


pstats: Saving and Working With Statistics:

If the standard report is not formatted the way you need it to be, both run() and runctx() take a filename argument to save the raw data to a file instead of printing the text report. The Stats class from the pstats module knows how to read the file and can be used to manipulate the data.

For example, to run several iterations of the same test and combine the results, you might do something like this:

import profile
import pstats
from profile_fibonacci_memoized import fib, fib_seq

# Create 5 set of stats
filenames = []
for i in range(5):
filename = 'profile_stats_%d.stats' % i
profile.run('print %d, fib_seq(20)' % i, filename)

# Read all 5 stats files into a single object
stats = pstats.Stats('profile_stats_0.stats')
for i in range(1, 5):
stats.add('profile_stats_%d.stats' % i)

# Clean up filenames for the report
stats.strip_dirs()

# Sort the statistics by the cumulative time spent in the function
stats.sort_stats('cumulative')

stats.print_stats()


The output report is sorted in descending order of cumulative time spent in the function and the directory names are removed from the printed filenames to conserve horizontal space.


$ python profile_stats.py
0 [0, 1, 1, 2, 3, 5, 8, 13, 21, 34, 55, 89, 144, 233, 377, 610, 987, 1597, 2584, 4181, 6765]
1 [0, 1, 1, 2, 3, 5, 8, 13, 21, 34, 55, 89, 144, 233, 377, 610, 987, 1597, 2584, 4181, 6765]
2 [0, 1, 1, 2, 3, 5, 8, 13, 21, 34, 55, 89, 144, 233, 377, 610, 987, 1597, 2584, 4181, 6765]
3 [0, 1, 1, 2, 3, 5, 8, 13, 21, 34, 55, 89, 144, 233, 377, 610, 987, 1597, 2584, 4181, 6765]
4 [0, 1, 1, 2, 3, 5, 8, 13, 21, 34, 55, 89, 144, 233, 377, 610, 987, 1597, 2584, 4181, 6765]
Sun Aug 31 11:29:36 2008 profile_stats_0.stats
Sun Aug 31 11:29:36 2008 profile_stats_1.stats
Sun Aug 31 11:29:36 2008 profile_stats_2.stats
Sun Aug 31 11:29:36 2008 profile_stats_3.stats
Sun Aug 31 11:29:36 2008 profile_stats_4.stats

489 function calls (351 primitive calls) in 0.008 CPU seconds

Ordered by: cumulative time

ncalls tottime percall cumtime percall filename:lineno(function)
5 0.000 0.000 0.007 0.001 :1()
105/5 0.004 0.000 0.007 0.001 profile_fibonacci_memoized.py:36(fib_seq)
1 0.000 0.000 0.003 0.003 profile:0(print 0, fib_seq(20))
143/105 0.001 0.000 0.002 0.000 profile_fibonacci_memoized.py:19(__call__)
1 0.000 0.000 0.001 0.001 profile:0(print 4, fib_seq(20))
1 0.000 0.000 0.001 0.001 profile:0(print 1, fib_seq(20))
1 0.000 0.000 0.001 0.001 profile:0(print 2, fib_seq(20))
1 0.000 0.000 0.001 0.001 profile:0(print 3, fib_seq(20))
21 0.000 0.000 0.001 0.000 profile_fibonacci_memoized.py:26(fib)
100 0.001 0.000 0.001 0.000 :0(extend)
105 0.001 0.000 0.001 0.000 :0(append)
5 0.001 0.000 0.001 0.000 :0(setprofile)
0 0.000 0.000 profile:0(profiler)


Limiting Report Contents:

Since we are studying the performance of fib() and fib_seq(), we can also restrict the output report to only include those functions using a regular expression to match the filename:lineno(function) values we want.

import profile
import pstats
from profile_fibonacci_memoized import fib, fib_seq

# Read all 5 stats files into a single object
stats = pstats.Stats('profile_stats_0.stats')
for i in range(1, 5):
stats.add('profile_stats_%d.stats' % i)
stats.strip_dirs()
stats.sort_stats('cumulative')

# limit output to lines with "(fib" in them
stats.print_stats('\(fib')


The regular expression includes a literal left paren (() to match against the function name portion of the location value.


$ python profile_stats_restricted.py
Sun Aug 31 11:29:36 2008 profile_stats_0.stats
Sun Aug 31 11:29:36 2008 profile_stats_1.stats
Sun Aug 31 11:29:36 2008 profile_stats_2.stats
Sun Aug 31 11:29:36 2008 profile_stats_3.stats
Sun Aug 31 11:29:36 2008 profile_stats_4.stats

489 function calls (351 primitive calls) in 0.008 CPU seconds

Ordered by: cumulative time
List reduced from 13 to 2 due to restriction

ncalls tottime percall cumtime percall filename:lineno(function)
105/5 0.004 0.000 0.007 0.001 profile_fibonacci_memoized.py:36(fib_seq)
21 0.000 0.000 0.001 0.000 profile_fibonacci_memoized.py:26(fib)


Caller / Callee Graphs:

Stats also includes methods for printing the callers and callees of functions.

import profile
import pstats
from profile_fibonacci_memoized import fib, fib_seq

# Read all 5 stats files into a single object
stats = pstats.Stats('profile_stats_0.stats')
for i in range(1, 5):
stats.add('profile_stats_%d.stats' % i)
stats.strip_dirs()
stats.sort_stats('cumulative')

print 'INCOMING CALLERS:'
stats.print_callers('\(fib')

print 'OUTGOING CALLEES:'
stats.print_callees('\(fib')


The arguments to print_callers() and print_callees() work the same as the restriction arguments to print_stats(). The output shows the caller, callee, and cumulative time.


$ python profile_stats_callers.py
INCOMING CALLERS:
Ordered by: cumulative time
List reduced from 13 to 2 due to restriction

Function was called by...
profile_fibonacci_memoized.py:36(fib_seq) :1()(5) 0.007
profile_fibonacci_memoized.py:36(fib_seq)(100) 0.007
profile_fibonacci_memoized.py:26(fib)

OUTGOING CALLEES:
Ordered by: cumulative time
List reduced from 13 to 2 due to restriction

Function called...
profile_fibonacci_memoized.py:36(fib_seq) -> :0(append)(105) 0.001
:0(extend)(100) 0.001
profile_fibonacci_memoized.py:19(__call__)(105) 0.002
profile_fibonacci_memoized.py:36(fib_seq)(100) 0.007
profile_fibonacci_memoized.py:26(fib) -> profile_fibonacci_memoized.py:19(__call__)(38) 0.002


References:

Python Module of the Week Home
Download Sample Code
profile and cProfile
pstats

The Fibonacci example implementation came from Fibonacci numbers (Python) – LiteratePrograms.

The memoize decorator came from Python Decorators: Syntactic Sugar | avinash.vora.

History:

Updated 1 September 2008 to add content of profile_runctx.py.


Technorati Tags:
,


  • http://www.blogger.com/profile/06984896416340857963 Chad

    I do lots of profiling in Python, and for any nontrivial application, I always found using the pstats module directly to be severely lacking. Therefore, I whipped up this simple tool to allow you to browse through the pstats from a web browser:

    http://imvu.svn.sourceforge.net/viewvc/imvu/imvu_open_source/tools/pstats_viewer.py?revision=2&view=markup

    Launch the program via “python pstats_viewer.py

    Hopefully someone finds it useful.

  • http://www.blogger.com/profile/05937623368605761118 ddaa

    A long time ago (Jan. 2006) I hacked a converter to feed the output of lsprof (now cProfile) into kcachegrind.

    This bit of code had its own life since then. The last I heard of it, it was merged into the pypy repository (appears to be broken at time of writing).

    Anyone interested is free to take it. Just be nice and drop me a note and a comment on the blog.

  • http://www.blogger.com/profile/01892352754222143463 Doug Hellmann

    @chad – Thanks, that looks really handy for browsing result sets.

    @ddaa – I had never heard of kcachegrind, but it looks like an excellent visualization tool. If someone fixes the converter, I hope they post it online.

  • http://www.blogger.com/profile/12703660821260306458 José Fonseca

    Yet another pstats visualizer is Gprof2Dot, which generates a dot graph out of many profiling data formats, including pstats.

    You can view and interact with the generated dot graph using the XDot viewer.

    Both Gprof2Dot and XDot are Python-powered single file scripts.

  • http://www.blogger.com/profile/12703660821260306458 José Fonseca

    BTW, something wrong happened with profile_runctx.py example. It is not being inlined.

    But besides that, it is an extensive and thorough tutorial on profiling with Python, excellent like the whole PyMOTW series, Doug.

  • http://www.blogger.com/profile/01892352754222143463 Doug Hellmann

    @José – Thanks, that was a copy-and-paste error. I’ve moved to writing these articles in reST first and then converting them to HTML, but I haven’t worked out a good tool-chain for automating that yet.

  • http://www.blogger.com/profile/01892352754222143463 Doug Hellmann

    @José – Oh, and thanks for the pointer to gprof2dot! I wish I had known about that the last time I had to chase down a call sequence at work.