Testing Tools for Python

Originally posted Jan 1, 2008 · 11 min read

Test Driven Development and Test Automation are all the rage, and Python developers have no shortage of tools for testing their own applications.

This month I am starting a series of columns on tools for developing in Python. I intend to cover areas such as version control, utility modules, build, and packaging tools. Eventually I may even work up the courage to address the issue of editors and IDEs while, I hope, avoiding a religious war. But let’s start out by looking at tools for testing your libraries and applications. This isn’t a comprehensive survey, but it should give you some idea of the kinds of tools available.

Testing Frameworks

The standard library includes two separate testing frameworks for writing automated tests: doctest and unittest. Each framework has its own style that works better in certain circumstances. Each also has proponents arguing in its favor. The discussion is not quite up to the level of the vi vs. emacs argument, but it’s getting there in some circles.

When run, doctest looks at the documentation strings for your modules, classes, and methods and uses interpreter prompts (>>>) to identify test cases. Almost anything you can do with the interactive prompt from the interpreter can be saved as a test, making it easy to create new tests as you experiment with and document your code. Since doctest scans your documentation strings for tests, a nice benefit of using it is that your documentation is tested along with your code. This is especially handy for library developers who need to provide examples of using their library modules.
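As a minimal sketch of what such a test looks like, the function below carries its own examples in its docstring. The average() function is made up for illustration. The examples are run here through doctest's parser and runner classes so the outcome can be inspected directly.

```python
import doctest


def average(values):
    """Return the arithmetic mean of a non-empty sequence of numbers.

    >>> average([1, 2, 3, 4])
    2.5
    >>> average([10])
    10.0
    >>> average([])
    Traceback (most recent call last):
        ...
    ValueError: average() requires at least one value
    """
    if not values:
        raise ValueError("average() requires at least one value")
    return sum(values) / float(len(values))


# Pull the >>> examples out of the docstring and run them.
parser = doctest.DocTestParser()
test = parser.get_doctest(
    average.__doc__, {"average": average}, "average", None, 0
)
runner = doctest.DocTestRunner(verbose=False)
results = runner.run(test)
print(results)
```

In a real module you would more likely call doctest.testmod() under an if __name__ == "__main__" guard and let it find every docstring in the file.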

For other situations, where you need more formalized tests or where extra test fixtures are needed, the unittest module is a better choice. The standard library version of unittest started out as part of the PyUnit project. The name PyUnit is an homage to its origin in the XUnit API, originally created by Kent Beck for use in Smalltalk and now available for many other languages.

Tests are implemented using methods of classes derived from unittest.TestCase. TestCase supports methods to configure fixtures, run the tests, and then clean up after the test runs. The separate methods for configuring and cleaning up the fixtures are useful for extending your automated tests beyond the unit level, when you might have several tests that depend on the same basic configuration (such as having a few different objects which need to be connected to one another, temporary files to create and clean up, or even a database transaction to manage). Separating the fixture management from the test clarifies the distinction between test code and setup code and makes it easier to ensure that the fixtures are cleaned up after a test failure.
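The fixture methods described above can be sketched as follows. The class name, file name, and file contents are invented for the example. The point is only that setUp() and tearDown() wrap every test method, so each test starts from the same known state.

```python
import os
import shutil
import tempfile
import unittest


class ConfigFileTest(unittest.TestCase):
    """Each test gets a fresh scratch directory containing one file."""

    def setUp(self):
        # Fixture setup: runs before every test method.
        self.workdir = tempfile.mkdtemp()
        self.path = os.path.join(self.workdir, "settings.txt")
        with open(self.path, "w") as f:
            f.write("debug=1\n")

    def tearDown(self):
        # Fixture cleanup: runs after every test method, pass or fail.
        shutil.rmtree(self.workdir)

    def test_file_exists(self):
        self.assertTrue(os.path.exists(self.path))

    def test_contents(self):
        with open(self.path) as f:
            self.assertEqual(f.read(), "debug=1\n")


# Load and run the tests explicitly so the result object can be examined.
suite = unittest.TestLoader().loadTestsFromTestCase(ConfigFileTest)
result = unittest.TextTestRunner(verbosity=0).run(suite)
```

Neither test method mentions the temporary directory's creation or removal, which is exactly the separation of test code from setup code that the fixture methods are meant to give you.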

Testing the Python Standard Library

Both test frameworks provide a function to serve as a main program to make it easy to run the tests in a module. Running your tests one module at a time does not scale up well when you start trying to run all of the tests for a project as it grows to encompass several packages. The Python standard library is a prime example of this.
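To make the scaling problem concrete, here is a rough sketch of the manual alternative to running one module at a time: collecting tests from several places into a single suite. The two test classes and the build_suite() helper are hypothetical stand-ins for tests that would normally live in separate modules.

```python
import unittest


class ParserTests(unittest.TestCase):
    def test_split(self):
        self.assertEqual("a,b".split(","), ["a", "b"])


class FormatterTests(unittest.TestCase):
    def test_join(self):
        self.assertEqual("-".join(["a", "b"]), "a-b")

    def test_upper(self):
        self.assertEqual("ab".upper(), "AB")


def build_suite(test_classes):
    """Combine the tests from several TestCase classes into one suite."""
    loader = unittest.TestLoader()
    suite = unittest.TestSuite()
    for cls in test_classes:
        suite.addTests(loader.loadTestsFromTestCase(cls))
    return suite


all_tests = build_suite([ParserTests, FormatterTests])
total = all_tests.countTestCases()   # counted before the run
result = unittest.TextTestRunner(verbosity=0).run(all_tests)
```

Maintaining a list like the one passed to build_suite() by hand is the chore that grows with the project, and it is the chore the runners discussed in the rest of this column try to remove.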

All of the automated tests for the Python standard library are part of the source distribution and can be found in the test package. Each standard library module has at least one associated test file named after the original module and prefixed with test_. The actual test code is a mixture of doctest and unittest tests, depending on the module and the nature of the tests. Some tests are easier to express using doctest, but unittest is the preferred framework for new tests, and most of the doctest tests are being converted. In fact, quite a few are being converted as part of the Google Highly Open Participation (GHOP) contest, which I discussed last month.

The test package also includes a tool for running the tests, called regrtest. Since the standard library includes some platform-specific packages and several other modules for which tests might require special resources (audio hardware devices, for example), the test.regrtest framework needed to take those requirements into account. The solution was to extend the unittest framework to allow tests with special requirements to be enabled or disabled depending on the resources actually available at the time the tests are run. In order to avoid false negatives, tests requiring special resources are disabled by default, and must be enabled explicitly.
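The idea of resource-gated tests can be illustrated with a small sketch. This is not how regrtest itself is implemented. It assumes the unittest.skipUnless decorator found in later versions of unittest, and both the ENABLED_RESOURCES set and the requires_resource() helper are hypothetical names standing in for whatever the -u option collects.

```python
import unittest

# Hypothetical stand-in for the resources enabled on the command line.
ENABLED_RESOURCES = {"network"}


def requires_resource(name):
    """Skip the decorated test unless `name` was explicitly enabled."""
    return unittest.skipUnless(
        name in ENABLED_RESOURCES,
        "Use of the %r resource not enabled" % name,
    )


class ResourceTests(unittest.TestCase):
    @requires_resource("network")
    def test_network(self):
        self.assertTrue(True)

    @requires_resource("audio")
    def test_audio(self):
        self.fail("should not run: audio is not enabled")


suite = unittest.TestLoader().loadTestsFromTestCase(ResourceTests)
result = unittest.TextTestRunner(verbosity=0).run(suite)
for test, reason in result.skipped:
    print("%s skipped -- %s" % (test.id(), reason))
```

Because the audio test is skipped rather than failed, a machine without the resource still reports a clean run, which is the behavior the default settings are meant to guarantee.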

To run the standard library tests, just run regrtest.py from the command line, inside the test directory. You can specify a single module to test as an argument, or run all of the tests by omitting the argument.

$ pwd
/python-trunk/Lib/test
$ python regrtest.py test_re.py
test_re
1 test OK.

The output reports that a single test was run, but that is a little misleading. In fact it indicates that one module was run. Using the -v option with regrtest.py will show the individual unit tests that have been run, and their status.

To run tests with special resources enabled, use the -u option to regrtest.py. For a complete list of the resource types, see the help output (-h). This example illustrates running the curses module tests, with curses disabled (the default):

$ python regrtest.py test_curses.py
test_curses
test_curses skipped -- Use of the `curses' resource not enabled
1 test skipped:
    test_curses
Those skips are all expected on darwin.

And now with curses enabled:

$ python regrtest.py -ucurses test_curses.py
[screen paints black and cursor moves around]
1 test OK.
$

Running Your Tests

The test.regrtest runner is not the only solution available for managing large test suites. There are several alternative test runners, including py.test, nose, and Proctor.

The first, py.test, works by scanning your source files for functions or methods that start with test_ and running any that it finds. It depends on your tests using assert, and so does not require a special test class hierarchy or file layout. It supports fixtures at several levels of granularity: module, class, and even individual method. It does not directly support the unittest framework, and has only experimental support for doctest tests. One unique feature is the ability for a test function to act as a generator to produce additional tests. The example given on the py.test web site illustrates this nicely:

def test_generative():
    for x in (42,17,49):
        yield check, x

def check(arg):
    assert arg % 7 == 0   # second generated test fails!

Using a generator like this can save repetitive code in your tests.
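To see why the generator form works, it helps to imagine how a runner might consume it. The sketch below is not py.test's implementation. The run_generated() helper is a hypothetical name, and it simply treats each yielded tuple as a callable followed by its arguments.

```python
def check(arg):
    assert arg % 7 == 0


def test_generative():
    for x in (42, 17, 49):
        yield check, x


def run_generated(test_func):
    """Run each (callable, args...) tuple yielded by test_func.

    Returns a list of (args, passed) pairs. Hypothetical helper,
    not part of py.test.
    """
    outcomes = []
    for case in test_func():
        func, args = case[0], case[1:]
        try:
            func(*args)
        except AssertionError:
            outcomes.append((args, False))
        else:
            outcomes.append((args, True))
    return outcomes


for args, passed in run_generated(test_generative):
    print(args, "ok" if passed else "FAILED")
```

Each yielded tuple is reported as its own test, so the failure for 17 does not prevent the check for 49 from running.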

Nose is billed as a “discovery-based unittest extension”. It scans your test files to identify test cases to run automatically by detecting classes derived from unittest.TestCase or matching a name pattern. This lets you avoid writing explicit test suites by hand. Configuration is handled through command line flags or a config file, which makes it easy to set verbosity and other options consistently for all of your projects. Nose can be extended through a plugin API, and there are several nose-related packages available for producing output in different formats, filtering or capturing output, and otherwise adapting it to work the way you do.
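Discovery of this kind can be approximated in a few lines. The sketch below is only an illustration of the idea, not nose's code. The TEST_NAME pattern is an assumed stand-in rather than nose's actual default, and discover() is a hypothetical helper that takes a dictionary of names where a real runner would inspect imported modules.

```python
import re
import unittest

# Assumed naming pattern for this sketch only.
TEST_NAME = re.compile(r"^[Tt]est")


class TestMath(unittest.TestCase):
    def test_add(self):
        self.assertEqual(1 + 1, 2)


def test_string_upper():
    assert "abc".upper() == "ABC"


def helper():
    raise RuntimeError("helpers should not be collected")


def discover(namespace):
    """Build a suite from TestCase subclasses and test-named functions."""
    loader = unittest.TestLoader()
    suite = unittest.TestSuite()
    for name, obj in sorted(namespace.items()):
        if isinstance(obj, type) and issubclass(obj, unittest.TestCase):
            suite.addTests(loader.loadTestsFromTestCase(obj))
        elif callable(obj) and TEST_NAME.match(name):
            # Wrap a plain function so the unittest runner can execute it.
            suite.addTest(unittest.FunctionTestCase(obj))
    return suite


suite = discover({
    "TestMath": TestMath,
    "test_string_upper": test_string_upper,
    "helper": helper,
})
collected = suite.countTestCases()
result = unittest.TextTestRunner(verbosity=0).run(suite)
```

The helper() function is ignored because its name does not match the pattern, so nothing needs to be registered or excluded by hand.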

Proctor is another tool that works by scanning your source files. (Disclosure: I wrote Proctor.) It works closely with the unittest framework, and has features for managing test suites spanning multiple directories using tagging. I won’t go into Proctor too deeply here, since I have an entire article dedicated to it planned for the near future. I will say that it is less extensible than nose because it lacks a plugin system, but it has been used in production for several years now. It specializes in producing output that is machine parsable, to make it easy to produce reports from test logs with many tests (several thousand at a time). Although the feature sets of nose and Proctor overlap, neither implements a clear super-set of the other.

IDEs and GUIs

In addition to the command line test runners described here, any number of editors and IDEs support running tests. There are too many to name, but all have some form of quick access to the tests, usually bound to a single keystroke. Feedback ranges from text output, to HTML, to a progress bar style report.

I’ve recently started experimenting with TextMate as a development editor, and I have to say I like the “PyMate” feature for running scripts. The output is parsed and presented in an HTML pane, with tracebacks linking back to the original source line in the editor. This makes it easy to jump from tracebacks in the test output directly to your source, making the edit-test-repeat cycle shorter. Other editors with support for running test suites include Eclipse, WingIDE, Eric, and Komodo – I’m sure that list could be a lot longer. Each has its own particular take on finding, running, and presenting the results of your tests. This subject deserves more in-depth discussion than I can provide in a single month’s space, so look for more details in a future column.

Domain-Specific Testing

The basic frameworks offer a strong foundation for testing, but at times you will benefit from extensions for testing in a particular problem domain. For example, both webunit and webtest add web browser simulation features to the existing unittest framework. Taking a different approach, twill implements a domain-specific language for “navigating Web pages, posting forms and asserting conditions”. It can be used from within a Python module, so it can be embedded in your tests as well. Some of the modern web application frameworks, such as Django and Zope, include built-in support for testing of your code with an environment configured by the framework.


Although web development is hot, not every application is necessarily web-enabled. Testing GUI applications has traditionally required special consideration. In the old days, support for driving GUI toolkits programmatically was spotty and tests had to depend on image captures to look for regression errors. Times are changing, though, and modern toolkits with accessibility APIs support tools like dogtail and ldtp with test frameworks for driving GUIs without such limitations.

Code Coverage

Once you establish the habit of writing tests, the next thing you’ll want to know about is whether you are testing all of your application. To answer that question, you will need a code coverage analysis library, and Python has two primary options: Ned Batchelder’s excellent coverage.py and Titus Brown’s newer figleaf. They take different approaches to detecting potentially executable lines in your source, resulting in different “lines of code” counts. Otherwise they produce similar statistics about the execution of your program. You can run the coverage tools yourself, or use them integrated with one of the test runners described above. Both tools collect statistics as your program (or test suite) runs and enable you to prepare reports showing the percent coverage for modules, as well as a list of lines never run. Both also support creating annotated versions of the source, highlighting lines which are not run. With a report of source lines never run, it is easy to identify dead branches in conditional statements containing code that can be removed or areas that need more unit tests.
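The statistics gathering both tools perform rests on Python's tracing hook. The sketch below is far simpler than either coverage.py or figleaf and makes no attempt to work out which lines are executable. It only records which lines of one function actually ran, using sys.settrace. The classify() function and the trace_lines() helper are invented for the example.

```python
import sys


def classify(n):
    if n < 0:
        return "negative"      # never reached by the call below
    if n == 0:
        return "zero"
    return "positive"


def trace_lines(func, *args):
    """Call func(*args), returning (result, set of executed line numbers)."""
    executed = set()
    code = func.__code__

    def tracer(frame, event, arg):
        # Only follow frames that are executing func itself.
        if frame.f_code is not code:
            return None
        if event == "line":
            executed.add(frame.f_lineno)
        return tracer

    previous = sys.gettrace()
    sys.settrace(tracer)
    try:
        result = func(*args)
    finally:
        # Restore whatever trace function was active before.
        sys.settrace(previous)
    return result, executed


result, executed = trace_lines(classify, 5)
first = classify.__code__.co_firstlineno
# Report offsets relative to the "def" line so the numbers are stable.
offsets = sorted(line - first for line in executed)
print(result, offsets)
```

The two return statements missing from the recorded offsets are the sort of never-run lines a coverage report would flag, pointing at test cases for negative numbers and zero that have yet to be written.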

Source Checkers

One class of testing tool that is not used all that often in Python development is the source code checker, or linter. Many of you will recognize lint as the anal-retentive C programmer’s friend. Python has several similar offerings for examining your source and automatically detecting error patterns such as creating variables which are never used, over-writing builtin symbol names (like using type as a variable), and other such common issues that are easy to introduce and difficult to ferret out. Pylint is an actively maintained extension to the original PyChecker. In addition to looking for coding errors, Pylint will check your code against a style guide, verify that your interfaces are actually implemented, and look for a whole series of other common problems. Think of it as an automatic code-review.
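One of the checks mentioned above, assigning to a builtin name, is simple enough to sketch with the standard ast module. This is a toy, not how Pylint or PyChecker work internally. It is written for Python 3, where the builtin names live in the builtins module, and the SOURCE snippet and find_shadowed_builtins() helper are made up for the example.

```python
import ast
import builtins

# Sample code to check; note the assignment to "type".
SOURCE = '''
def load(path):
    type = "text"
    data = open(path).read()
    return data
'''


def find_shadowed_builtins(source):
    """Return sorted (line, name) pairs where a builtin name is assigned."""
    builtin_names = set(dir(builtins))
    problems = []
    for node in ast.walk(ast.parse(source)):
        # ast.Store marks a name that is being assigned to.
        if isinstance(node, ast.Name) and isinstance(node.ctx, ast.Store):
            if node.id in builtin_names:
                problems.append((node.lineno, node.id))
    return sorted(problems)


for lineno, name in find_shadowed_builtins(SOURCE):
    print("line %d: assignment shadows builtin %r" % (lineno, name))
```

The call to open() is not reported because it only reads the builtin name, while the assignment to type is, which is the distinction a reviewer would otherwise have to spot by eye.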

Conclusion

I only had space this month for a quick overview of the many testing tools available for use with Python. I encourage you to check out pycheesecake.org’s excellent taxonomy page for a more complete list (see Related Links). Most of the tools described here are just an easy_install away, so download a few and start working on your own code quality today.

As always, if there is something you would like for me to cover in this column, send a note with the details to doug dot hellmann at pythonmagazine dot com and let me know, or add the link to your del.icio.us account with the tag pymagdifferent. I’m particularly interested in hearing from you about the tools you consider essential for developing with Python. Anything from your favorite editor or IDE to that obscure module you find yourself using over and over – anything you just can’t seem to live without. I’ll be sharing links and tips for development tools over the course of the next several issues as I continue this series.