Testing Tools for Python

Test Driven Development and Test Automation are all the rage, and
Python developers have no shortage of tools for testing their own
applications.

This month I am starting a series of columns on tools for developing
in Python. I intend to cover areas such as version control, utility
modules, build, and packaging tools. Eventually I may even work up
the courage to address the issue of editors and IDEs while, I hope,
avoiding a religious war. But let’s start out by looking at tools for
testing your libraries and applications. This isn’t a comprehensive
survey, but it should give you some idea of the kinds of tools
available.

Testing Frameworks

The standard library includes two separate testing frameworks for
writing automated tests: doctest and unittest. Each framework has
its own style that works better in certain circumstances. Each also
has proponents arguing in its favor. The discussion is not quite up
to the level of the vi vs. emacs argument, but it’s getting
there in some circles.

When run, doctest looks at the documentation strings for your modules,
classes, and methods and uses interpreter prompts (>>>) to
identify test cases. Almost anything you can do with the interactive
prompt from the interpreter can be saved as a test, making it easy to
create new tests as you experiment with and document your code. Since
doctest scans your documentation strings for tests, a nice benefit of
using it is that your documentation is tested along with your code. This
is especially handy for library developers who need to provide
examples of using their library modules.
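
For example, a small module might embed its tests directly in a
docstring; the function and values here are made up for illustration:

def average(values):
    """Return the arithmetic mean of a sequence of numbers.

    >>> average([1, 2, 3, 4])
    2.5
    >>> average([10])
    10.0
    """
    return float(sum(values)) / len(values)

if __name__ == '__main__':
    import doctest
    doctest.testmod()

Running the module directly executes each interpreter-style example and
reports any output that does not match what the docstring shows.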

For other situations, where you need more formalized tests or where
extra test fixtures are needed, the unittest module is a better
choice. The standard library version of unittest started out as part
of the PyUnit project.
The name PyUnit is an homage to its origin in the XUnit API, originally created by Kent Beck for use in
Smalltalk and now available for many other languages.

Tests are implemented using methods of classes derived from
unittest.TestCase. TestCase supports methods to configure
fixtures, run the tests, and then clean up after the test runs. The
separate methods for configuring and cleaning up the fixtures are
useful for extending your automated tests beyond the unit level, when
you might have several tests that depend on the same basic
configuration (such as having a few different objects which need to be
connected to one another, temporary files to create and clean up, or
even a database transaction to manage). Separating the fixture
management from the test clarifies the distinction between test code
and setup code and makes it easier to ensure that the fixtures are
cleaned up after a test failure.
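
As a sketch of how this looks in practice, the hypothetical test case
below uses setUp() and tearDown() to manage a temporary file shared by
two tests:

import os
import tempfile
import unittest

class TemporaryFileTest(unittest.TestCase):

    def setUp(self):
        # Create the fixture: a temporary file with known contents.
        handle, self.filename = tempfile.mkstemp()
        os.close(handle)
        f = open(self.filename, 'w')
        f.write('sample data')
        f.close()

    def tearDown(self):
        # Remove the fixture, even if the test that used it failed.
        if os.path.exists(self.filename):
            os.unlink(self.filename)

    def test_exists(self):
        self.assertTrue(os.path.exists(self.filename))

    def test_contents(self):
        f = open(self.filename)
        try:
            self.assertEqual(f.read(), 'sample data')
        finally:
            f.close()

if __name__ == '__main__':
    unittest.main()

The runner calls setUp() before each test method and tearDown() after
it, whether the test passes or fails, so neither test has to worry
about leftovers from the other.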

Testing the Python Standard Library

Both test frameworks provide a function to serve as a main program to
make it easy to run the tests in a module. Running your tests one
module at a time does not scale up well when you start trying to run
all of the tests for a project as it grows to encompass several
packages. The Python standard library is a prime example of this.

All of the automated tests for the Python standard library are part of
the source distribution and can be found in the test package. Each standard
library module has at least one associated test file named after the
original module and prefixed with test_. The actual test code is
a mixture of doctest and unittest tests, depending on the module and
the nature of the tests. Some tests are easier to express using
doctest, but unittest is the preferred framework for new tests, and
most of the doctest tests are being converted. In fact, quite a few
are being converted as part of the Google Highly Open Participation
(GHOP) contest, which I discussed last month.

The test package also includes a tool for running the tests, called
regrtest. Since the standard library includes some
platform-specific packages and several other modules for which tests
might require special resources (audio hardware devices, for example),
the test.regrtest framework needed to take those requirements into
account. The solution was to extend the unittest framework to allow
tests with special requirements to be enabled or disabled depending on
the resources actually available at the time the tests are run. In
order to avoid false negatives, tests requiring special resources are
disabled by default, and must be enabled explicitly.

To run the standard library tests, just run regrtest.py from the
command line, inside the test directory. You can specify a single
module to test as an argument, or run all of the tests by omitting the
argument.

$ pwd
/python-trunk/Lib/test
$ python regrtest.py test_re.py
test_re
1 test OK.

The output reports that a single test was run, but that is a little
misleading. In fact it indicates that one module was run. Using
the -v option with regrtest.py will show the individual unit
tests that have been run, and their status.

To run tests with special resources enabled, use the -u option to
regrtest.py. For a complete list of the resource types, see the
help output (-h). This example illustrates running the curses
module tests, with curses disabled (the default):

$ python regrtest.py  test_curses.py
test_curses
test_curses skipped -- Use of the `curses' resource not enabled
1 test skipped:
    test_curses
Those skips are all expected on darwin.

And now with curses enabled:

$ python regrtest.py -ucurses test_curses.py
[screen paints black and cursor moves around]
1 test OK.
$

Running Your Tests

The test.regrtest runner is not the only solution available for
managing large test suites. There are several alternative test
runners, including py.test, nose, and
Proctor.

The first, py.test, works by scanning your source files for functions
or methods that start with test_ and running any that it finds.
It depends on your tests using assert, and so does not require a
special test class hierarchy or file layout. It supports fixtures at
granularities ranging from the module level down to individual classes
and methods. It does not directly support the unittest
framework, and has only experimental support for doctest tests. One
unique feature is the ability for a test function to act as a
generator to produce additional tests. The example given on the
py.test web site illustrates this nicely:

def test_generative():
    for x in (42,17,49):
        yield check, x

def check(arg):
    assert arg % 7 == 0   # the second generated test (17) fails!

Using a generator like this can save repetitive code in your tests.

Nose is billed as a “discovery-based unittest extension”. It scans
your test files to identify test cases to run automatically by
detecting classes derived from unittest.TestCase or matching a
name pattern. This lets you avoid writing explicit test suites by
hand. Configuration is handled through command line flags or a config
file, which makes it easy to set verbosity and other options
consistently for all of your projects. Nose can be extended through a
plugin API, and there are several nose-related packages available for
producing output in different formats, filtering or capturing output,
and otherwise adapting it to work the way you do.
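
For instance, a file of plain functions following nose’s default naming
pattern is picked up automatically, with no suite definition or TestCase
subclass required (the file and function names here are invented):

# test_arithmetic.py
def test_addition():
    assert 1 + 1 == 2

def test_subtraction():
    assert 5 - 3 == 2

Running nosetests in the directory containing the file discovers and
runs both functions, and adding -v lists each test by name as it runs.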

Proctor is another tool that works by scanning your source files.
(Disclosure: I wrote Proctor.) It works closely with the unittest
framework, and has features for managing test suites spanning multiple
directories using tagging. I won’t go into Proctor too deeply here,
since I have an entire article dedicated to it
planned for the near future. I will say that it is less extensible
than nose because it lacks a plugin system, but it has been used in
production for several years now. It specializes in producing output
that is machine parsable, to make it easy to produce reports from test
logs with many tests (several thousand at a time). Although the
feature sets of nose and Proctor overlap, neither implements a clear
super-set of the other.

IDEs and GUIs

In addition to the command line test runners described here, any
number of editors and IDEs support running tests. There are too many
to name, but all have some form of quick access to the tests, usually
bound to a single keystroke. Feedback ranges from text output, to
HTML, to a progress bar style report.

I’ve recently started experimenting with TextMate as a development
editor, and I have to say I like the “PyMate” feature for running
scripts. The output is parsed and presented in an HTML pane, with
tracebacks linking back to the original source line in the editor.
This makes it easy to jump from tracebacks in the test output directly
to your source, making the edit-test-repeat cycle shorter. Other
editors with support for running test suites include Eclipse, WingIDE,
Eric, and Komodo – I’m sure that list could be a lot longer. Each
has its own particular take on finding, running, and presenting the
results of your tests. This subject deserves more in-depth discussion
than I can provide in a single month’s space, so look for more details
in a future column.

Domain-Specific Testing

The basic frameworks offer a strong foundation for testing, but at
times you will benefit from extensions for testing in a particular
problem domain. For example, both webunit and webtest add
web browser simulation features to the existing unittest framework.
Taking a different approach, twill
implements a domain-specific language for “navigating Web pages,
posting forms and asserting conditions”. It can be used from within a
Python module, so it can be embedded in your tests as well. Some of
the modern web application frameworks, such as Django and Zope,
include built-in support for testing your code within an environment
configured by the framework.
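
A minimal sketch of driving twill from Python might look like the
following; the URL and page text are made up, and the exact set of
commands available depends on your twill version:

from twill.commands import go, code, find

# Fetch a page, then make assertions about the response.
go('http://www.example.com/')
code(200)          # fail unless the HTTP status code is 200
find('Example')    # fail unless the page body matches the pattern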

Although web development is hot, not every application is necessarily
web-enabled. Testing GUI applications has traditionally required
special consideration. In the old days, support for driving GUI
toolkits programmatically was spotty and tests had to depend on image
captures to look for regression errors. Times are changing, though,
and modern toolkits expose accessibility APIs that let test frameworks
such as dogtail and ldtp drive GUIs directly, without
such limitations.

Code Coverage

Once you establish the habit of writing tests, the next thing you’ll
want to know about is whether you are testing all of your
application. To answer that question, you will need a code coverage
analysis library, and Python has two primary options: Ned Batchelder’s
excellent coverage.py and Titus
Brown’s newer figleaf. They take
different approaches to detecting potentially executable lines in your
source, resulting in different “lines of code” counts. Otherwise they
produce similar statistics about the execution of your program. You
can run the coverage tools yourself, or use them integrated with one
of the test runners described above. Both tools collect statistics as
your program (or test suite) runs and enable you to prepare reports
showing the percent coverage for modules, as well as a list of lines
never run. Both also support creating annotated versions of the
source, highlighting lines which are not run. With a report of source
lines never run, it is easy to identify dead branches in conditional
statements containing code that can be removed or areas that need more
unit tests.
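
As a rough sketch, recent releases of coverage.py can be driven from
Python like this; the class and method names have shifted between
versions, so treat them as approximate:

import coverage

cov = coverage.Coverage()   # spelled coverage.coverage() in older releases
cov.start()

import mymodule             # hypothetical module under test
mymodule.main()

cov.stop()
cov.report()                # print per-module coverage percentages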

Source Checkers

One class of testing tools not necessarily used all that often in Python
development is the source code checker, or linter. Many of you will
recognize lint as the anal-retentive C programmer’s friend. Python has
several similar offerings for examining your source and automatically
detecting error patterns such as creating variables which are never
used, overwriting built-in symbol names (like using type as a variable),
and other common issues that are easy to introduce and difficult to
ferret out. Pylint is an actively maintained successor to the original
PyChecker. In addition to looking for coding errors, Pylint will check
your code against a style guide, verify that your interfaces are
actually implemented, and flag a whole series of other common problems.
Think of it as an automated code review.
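
For example, a checker like Pylint will typically flag both problems in
this intentionally buggy snippet: the local name that shadows the
built-in type() and the variable that is assigned but never used:

def describe(value):
    type = value.__class__.__name__    # shadows the built-in name 'type'
    label = 'extra details'            # assigned but never used
    return 'value is a %s' % type

Running pylint over the module produces a report listing each warning
with its line number, alongside its style and documentation checks.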

Conclusion

I only had space this month for a quick overview of the many testing
tools available for use with Python. I encourage you to check out
pycheesecake.org’s excellent taxonomy page for a more complete list
(see Related Links). Most of the tools described here are just an
easy_install away, so download a few and start working on your own
code quality today.

As always, if there is something you would like for me to cover in
this column, send a note with the details to doug dot hellmann at
pythonmagazine dot com and let me know, or add the link to your
del.icio.us account with the tag pymagdifferent. I’m particularly
interested in hearing from you about the tools you consider essential
for developing with Python. Anything from your favorite editor or IDE
to that obscure module you find yourself using over and over –
anything you just can’t seem to live without. I’ll be sharing links
and tips for development tools over the course of the next several
issues as I continue this series.

Originally published in Python Magazine, Volume 2 Issue 1, January 2008