Book Review: Python Testing Cookbook

I received a review copy of Python Testing Cookbook by Greg
Turnquist
late last week. The timing was perfect, since we have been
talking a lot about increasing the amount of automation we use in our
testing at work. I scanned through the book quickly to get a sense for
whether I should give it to our QA team, and I liked what I found.

Quick Review

My review for the impatient reader:

Why I picked it up: The timing was perfect. We are applying more
test automation techniques as part of improving our QA process at
work, and I hoped to find some useful tools.

Why I finished it: It’s a fast read, especially when you skim over
the code. There are a lot of good tips, and it introduces a range of
tools, which is just what I needed.

I’d give it to: Our QA team. It will explain some of the tools the
development team is already using and introduce others that are well
suited for the needs of the QA team.

Review

I’m not normally a fan of “cookbook” style technology books. The ones
I have read were disorganized grab-bags of random suggestions only
useful to someone with a background in the topic they covered. This
book is an exception. The author did a good job of organizing Python
Testing Cookbook
to avoid the grab-bag feeling. Each chapter
introduces a new concept, and the sections and recipes build on each
other in a steady progression. Because it introduces basic concepts
such as the unittest and doctest frameworks before moving to
test runners like nose and advanced tools such as mock,
Pyccuracy, robotframework, lettuce, and Jenkins, the book
is useful for someone approaching test automation for the first time
who wants to read straight through, without sacrificing the
quick-reference goal of the cookbook format.

The book stands up well as a beginner- or intermediate-level text
because it takes the reader beyond a strict “how to” structure. In
addition to the step-by-step instructions you expect, each recipe
explains how the techniques employed work and gives advice about when
and why to use them. Some of the best tips in the book are included
in those sections.

I think our QA team will find the book especially helpful because it
covers more than green field development work. Chapter 9 in particular
talks about how to apply the techniques covered earlier with legacy
systems. Those tips encourage you to start testing, even if it is a
big project and you’re unlikely to reach 100% coverage in the
short-term.

“Don’t get caught up in the purity of total isolation or worry about
obscure test methods. First thing, start testing.”

Chapter 4 covers a few interesting techniques and tools for “Behavior
Driven Development”. Lettuce piqued my interest, as I hadn’t heard
of it before. The ability to describe tests independently of the code
that performs them will be useful for non-programmers (as long as they
have some help to bind the test descriptions to those checks, of
course).

I only have two negative comments to make. Most or all of the program
output is presented as screen shots. While I’m sure that made their
editing process easier (there’s no need to worry about whether a copy
editor broke the output formatting or “corrected” the language in an
error message), the white-on-black images don’t come out well in the
printing. In a few cases it looks like the opacity for the terminal
window was not set high enough to mask the text from other
windows. It’s distracting, but the important text is clearly legible,
so in the end it isn’t a big problem.

The other point is related to scope. A few of the more advanced tools
(Jenkins and TeamCity) are introduced without much depth. Basic
use patterns are presented to get you started, and reference links are
included for readers who want to find out more about the tools. I
think this is a limitation of the size and format of the book, rather
than an oversight, and so I can’t complain too much, but those
sections do stand out as thinner than the others.

Recommendation

I recommend this book. It exceeded my expectations (both of the
format, and of Packt’s reputation), and provides a clear introduction
to tools and techniques for testing your application from a variety of
perspectives.

Disclaimer: I received a free review copy of this book from the
publisher.

PyMOTW: doctest – Testing through documentation

doctest lets you test your code by running examples embedded in
the documentation and verifying that they produce the expected
results. It works by parsing the help text to find examples, running
them, then comparing the output text against the expected value. Many
developers find doctest easier than unittest because in
its simplest form, there is no API to learn before using it. However,
as the examples become more complex the lack of fixture management can
make writing doctest tests more cumbersome than using
unittest.
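
A minimal sketch shows the flavor; the function and file name here are
invented for illustration. The interactive examples in the docstring
serve as both documentation and tests.

#!/usr/bin/env python
# doctest_sketch.py - a hypothetical module illustrating doctest

def multiply(a, b):
    """Return the product of a and b.

    >>> multiply(2, 3)
    6
    >>> multiply('a', 3)
    'aaa'
    """
    return a * b

if __name__ == '__main__':
    import doctest
    doctest.testmod()

Running the file directly prints nothing when every example passes;
adding -v to the command line reports each example as it is checked.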

Read more at pymotw.com: doctest

Automated Testing with unittest and Proctor

Originally published in Python Magazine, Volume 2 Issue 3, March 2008

Automated testing is an important part of Agile development
methodologies, and the practice is seeing increasing adoption even in
environments where other Agile tools are not used. This article
discusses testing techniques for you to use with the open source tool
Proctor. By using Proctor, you will not only manage your automated
test suite more effectively, but you will also obtain better results
in the process.

What is Proctor?

Proctor is a tool for running automated tests in Python source
code. It scans input source files looking for classes based on the
TestCase class from the unittest module in the Python standard
library. You can use arbitrary organization schemes for tests defined
in separate source modules or directories by applying user defined
categories to test classes. Proctor constructs test suites dynamically
at run time based on your categories, making it easy to run a subset
of the tests even if they are not in the same location on the
filesystem. Proctor has been specifically designed to operate on a
large number of tests (more than 3500 at one site). Although it
depends on the unittest module, Proctor is also ideally suited for use
with integration or higher level tests, because it is easy to
configure it to run unattended.

Installation

Proctor uses the standard distutils module tools for installation
support. If you have previously installed easy_install, using it
is the simplest way to install packages such as Proctor that are
listed in the Python Package Index.

$ sudo easy_install Proctor

Running easy_install will download and install the most recent
version by default. If you do not have easy_install, download the
latest version of the Proctor source code from the home page (see the
references list for this article), then install it as you would any
other Python package:

$ tar zxvf Proctor-1.2.tar.gz
$ cd Proctor-1.2
$ sudo python setup.py install

Once Proctor is installed, you will find a command line program,
proctorbatch, in your shell execution path. Listing 1 shows the
command syntax for proctorbatch. I will examine the command line
options in more detail throughout the rest of this article using a few
simple tests.

Listing 1

proctorbatch



    Proctor is a tool for running unit tests.  It enhances the
    existing unittest module to provide the ability to find all tests
    in a set of code, categorize them, and run some or all of them.
    Test output may be generated in a variety of formats to support
    parsing by another tool or simple, nicely formatted, reports for
    human review.



SYNTAX:

    proctorbatch [<options>] [<directory name> ...]

        --category=categoryName
        --coverage-exclude=pattern
        --coverage-file=filename
        --debug
        --interleaved
        --list
        --list-categories
        --no-coverage
        --no-gc
        --no-run
        --parsable
        -q
        -v


OPTIONS:

    -h             Displays abbreviated help message.

    --help         Displays complete usage information.

    --category=categoryName
                   Run only the tests in the specified category.

                   Warning: If there are no tests in a category,
                   an error will not be produced.  The test suite
                   will appear to be empty.


    --coverage-exclude=pattern
                   Add a line exclude pattern
                   (can be a regular expression).


    --coverage-file=filename
                   Write coverage statistics to the specified file.


    --debug        Turn on debug mode to see tracebacks.


    --interleaved  Interleave error and failure messages
                   with the test list.


    --list         List tests.


    --list-categories
                   List test categories.


    --no-coverage  Disable coverage analysis.


    --no-gc        Disable garbage collection and leak reporting.


    --no-run       Do not run the tests


    --parsable     Format output to make it easier to parse.


    -q             Turn on quiet mode.


    -v             Increment the verbose level.
                   Higher levels are more verbose.
                   The default is 1.

Sample Tests and Standard unittest Features

The simplest sample set of test cases needs to include at least three
tests: one to pass, one to fail, and one to raise an exception
indicating an error. For this example, I have separated the tests into
three classes and provided two test methods on each class. Listing 2
shows the code to define the tests, including the standard
unittest boilerplate code for running them directly.

Listing 2

#!/usr/bin/env python
# Sample tests for exercising Proctor.

import unittest

class PassingTests(unittest.TestCase):

    def test1(self):
        return

    def test2(self):
        return

class FailingTests(unittest.TestCase):

    def test1(self):
        self.fail('Always fails 1')
        return

    def test2(self):
        self.fail('Always fails 2')
        return

class ErrorTests(unittest.TestCase):

    def test1(self):
        raise RuntimeError('test1 error')

    def test2(self):
        raise RuntimeError('test2 error')

if __name__ == '__main__': # pragma: no cover
    unittest.main()

When python Listing2.py is run, it invokes the unittest module’s
main() function. As main() runs, the standard test loader is
used to find tests in the current module, and all of the discovered
tests are executed one after the other. It is also possible to name
individual tests or test classes to be run using arguments on the
command line. For example, python Listing2.py PassingTests runs
both of the tests in the PassingTests class. This standard
behavior is provided by the unittest module and is useful if you know
where the tests you want to run are located in your code base.
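
For instance, running only the passing tests from Listing 2 produces
output along these lines (the elapsed time will vary):

$ python Listing2.py PassingTests
..
----------------------------------------------------------------------
Ran 2 tests in 0.000s

OK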

It is also possible to organize tests from different classes into
“suites”. You can create the suites using any criteria you like –
themes, feature areas, level of abstraction, specific bugs, etc. For
example, this code sets up a suite containing two test cases:

import unittest
from Listing2  import *

suite1 = unittest.TestSuite([PassingTests('test1'), FailingTests('test1')])
unittest.main(defaultTest='suite1')

When run, the above code would execute the tests
PassingTests.test1 and FailingTests.test1, since those are
explicitly included in the suite. The trouble with creating test
suites in this manner is that you have to maintain them by hand. Any
time new tests are created or obsolete tests are removed, the suite
definitions must be updated as well. This may not be a lot of work for
small projects, but as project size and test coverage increases, the
extra work can become unmanageable very quickly.

Proctor was developed to make working with tests across classes,
modules, and directories easier by eliminating the manual effort
involved in building suites of related tests. Over the course of
several years, the set of automated tests we have written for my
company’s product has grown to contain over 3500 individual tests.

Our code is organized around functional areas, with user interface and
back-end code separated into different modules and packages. In order
to run the automated tests for all aspects of a specific feature, a
developer may need to run tests in several modules from different
directories in their sandbox. By building on the standard library
features of unittest, Proctor makes it easy to manage all of the tests
no matter where they are located in the source tree.

Running Tests with Proctor

The first improvement Proctor makes over the standard unittest test
loader is that Proctor can scan multiple files to find all of the
tests, then run them in a single batch. Each Python module specified
on the command line is imported, one at a time. After a module is
loaded, it is scanned for classes derived from unittest.TestCase,
just as with the standard test loader. All of the tests are added to a
test suite, and when the scanner is finished loading the test modules,
all of the tests in the suite are run.

For example, to scan all Python files in the installed version of
proctorlib for tests you would run:

$ cd /usr/lib/python2.5/site-packages
$ proctorbatch proctorlib/*.py

Proctor also accepts directory names as arguments, so the command can
be written:

$ cd /usr/lib/python2.5/site-packages
$ proctorbatch proctorlib

Proctor will search recursively down through any directories given to
find all of the tests in any modules in subdirectories. The file or
directory names are converted to importable package names, so that
directory/file.py is imported as directory.file. If your code
is organized under a single Python package, and you wish to run all of
the tests in the package, you only need to specify the root directory
for that package.

Expanding Automated Testing Beyond Unit Tests

Tests are often categorized by developers based on the scope of the
functionality being tested as either “unit” or “integration” tests. A
unit test is usually a very low level test that requires few or no
external resources and verifies the functionality of an isolated
feature (such as a single method of a single class). An integration
test, by contrast, depends on the interaction of several classes or
instances, and is designed to ensure that the API between the objects
works as expected. For example, an integration test might verify that
an ORM library stores data in the expected tables, or that temporary
files are managed correctly when some filesystem actions are
performed.
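
As a sketch of the filesystem case (the class and method names are
invented for the example), an integration test might use setUp() and
tearDown() to create and remove a scratch directory around each test:

#!/usr/bin/env python
# Hypothetical integration test: setUp() and tearDown() manage a
# scratch directory used as an external resource by each test.

import os
import shutil
import tempfile
import unittest

class TempFileIntegrationTests(unittest.TestCase):

    def setUp(self):
        # Create the filesystem fixture before each test.
        self.scratch_dir = tempfile.mkdtemp()
        return

    def tearDown(self):
        # Remove the fixture, even if the test failed.
        shutil.rmtree(self.scratch_dir)
        return

    def testCreatesOutputFile(self):
        filename = os.path.join(self.scratch_dir, 'output.txt')
        f = open(filename, 'w')
        f.write('data')
        f.close()
        self.failUnless(os.path.exists(filename))
        return

if __name__ == '__main__':
    unittest.main()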

At the company where I work, we use the unittest framework found in
the Python standard library for all of our automated unit and
integration tests. It is convenient for us to use a single framework,
because it means the developers only have to manage one set of test
tools. Another benefit is the nightly batch job that runs the
integration tests also includes all of the unit tests at the same
time. By running the unit and integration tests automatically every
night, we can identify regression errors we might not have otherwise
detected until later in the testing cycle. The integration tests use
database and filesystem resources created as fixtures by the test
setUp and tearDown hooks. Our developers can run unit-level
tests directly with unittest.main(), or test entire source
packages with Proctor. The code for integration tests may be mingled
in the same modules with the unit tests or in separate modules,
depending on how the developer responsible for the area of code in
question has it organized.

Some of the tests we have written need to use hardware that may not
always be available in the test environment. We write automated tests
for all of the “device driver” modules that are used to integrate
infrastructure devices such as network switches, load balancers, and
storage arrays with our product. These tests typically require an
actual device for the test to run successfully, since the tests
reconfigure it and then verify the results match what is
expected. This situation poses a problem, since the test equipment is
not always present in every test environment. Sometimes we only have
one compatible device. At other times, a device was on loan and so may
have been returned to the original vendor after the driver was
finished. In both of these cases, the test setup code will not be able
to find the equipment required for testing. Under these circumstances,
the tests will produce an error every time they run, and it is useful
to be able to skip over them and thus avoid false alarms and wasted
test time.

Proctor solves this problem with a flag that causes it to ignore the
tests in a module. To tell Proctor to ignore a specific module, add
the module level variable __proctor_ignore_module__ in the
source. Listing 3 shows an example with this flag set. Proctor still
imports the module, but when it sees the flag set to True, it does
not scan the contents of the module for tests. When the resource
needed by the tests becomes available in our lab and it is time to
test a device driver, we simply run the test file directly instead of
using Proctor.

Listing 3

#!/usr/bin/env python
# The tests in this module are ignored by Proctor

import unittest

# Tell Proctor to ignore the tests in this module.
__proctor_ignore_module__ = True

class IgnoredTest(unittest.TestCase):

    def testShouldNotBeRun(self):
        self.fail('This test will not be run by Proctor')
        return

if __name__ == '__main__':
    # If this file is run directly, the tests are still executed.
    unittest.main()

Some of our other tests use resources available when the tests are run
on a developer workstation, but not when they are run as part of a
nightly batch job. For example, some portions of the graphical user
interface for our product have automated tests, but since it is an X
Windows application, those tests cannot be run without an X Windows
server, which is not present on the automated test server. Since all
of the GUI code is in one directory, it is easier to instruct Proctor
to ignore all of the modules in that directory instead of setting the
ignore flag separately for each file.

Proctor supports ignoring entire directories through a configuration
file named .proctor. The configuration file in each directory can
be used to specify modules or subdirectories to be ignored by Proctor
when scanning for tests. The files or directories specified in the
ignore variable are not imported at all, so if importing some
modules would fail without resources like an X Windows server
available, you can use a .proctor file as a more effective method
of ignoring them rather than setting the ignore flag inside the
source. All of the file or directory names in the ignore list are
relative to the directory containing the configuration file. For
example, to ignore the productname.gui package, create a file in
the productname directory containing ignore = ['gui'], like
this:

# Proctor instructions file ".proctor"

# Importing the gui module requires an X server,
# which is not available for the nightly test batch job.
ignore = [ 'gui' ]

The .proctor file uses Python syntax and can contain any legal
Python code. This means you can use modules such as os and
glob to build up the list of files to be ignored, following any
rules you want to establish. Here is a more sophisticated example
which only disables the GUI tests if it cannot find the X server they
require:

import os

ignore = []
if os.environ.get('DISPLAY') is None:
    ignore.append('gui')

Organizing Tests Beyond Classes and Modules

The usual way to organize related test functions is by placing them
together in the same class, and then by placing related classes
together in the same module. Such a neat organizational scheme is not
always possible, however, and related tests might be in different
modules or even in different directories. Sometimes, test modules grow
too large and need to be broken up so they are easier to maintain. In
other cases, when a feature is implemented, different aspects of the
code may be spread among files in multiple source directories,
reflecting the different layers of the application. Proctor can use
test categories to dynamically construct a test suite of related tests
without requiring the test authors to know about all of the tests in
advance or to update a test suite manually.

Proctor uses simple string identifiers as test categories, much like
the tags commonly found in a Web 2.0 application. It is easy to add
categories to your existing tests by setting the class attribute
PROCTOR_TEST_CATEGORIES to a sequence of strings; no special base
class is needed. Then tell proctorbatch to limit the test suite to
tests in a specific category using the --category option.

Using proctorbatch

Listing 4 shows some new test classes with categories that are useful
as examples to demonstrate how the command line options to
proctorbatch work. The first class, FeatureOneTests, is
categorized as being related to “feature1”. The tests in the second
class, FeatureOneAndTwoTests, are categorized as being related to
both “feature1” and “feature2”, representing a set of integration
level tests verifying the interface between the two features. The
UncategorizedTests class is not included in any category. Now that
the test classes are defined, I will show how to use proctorbatch
to work with them in a variety of ways.

Listing 4

#!/usr/bin/env python
# Categorized tests.

import unittest

class FeatureOneTests(unittest.TestCase):
    "Unit tests for feature1"

    PROCTOR_TEST_CATEGORIES = ( 'feature1',)

    def test(self):
        return

class FeatureOneAndTwoTests(unittest.TestCase):
    "Integration tests for feature1 and feature2"

    PROCTOR_TEST_CATEGORIES = ( 'feature1', 'feature2', )

    def test1(self):
        return

    def test2(self):
        return

class UncategorizedTests(unittest.TestCase):
    "Not in any category"

    def test(self):
        return

if __name__ == '__main__':
    unittest.main()

Proctor provides several command line options that are useful for
examining a test suite without actually running the tests. To print a
list of the categories for all tests in the module, use the
--list-categories option:

$ proctorbatch -q --list-categories Listing4.py
All
Unspecified
feature1
feature2

The output is an alphabetical listing of all of the test category
names for all of the tests found in the input files. Proctor creates
two categories automatically every time it is run. The category named
“All” contains every test discovered. The “Unspecified” category
includes any test that does not have a specific category, making it
easy to find uncategorized tests when a test set starts to become
unwieldy or more complex. When a test class does not have any
categories defined, its tests are run when no --category option is
specified on the command line to proctorbatch, or when the “All”
category is used (the default).

To examine the test set to see which tests are present, use the
--list option instead:

$ proctorbatch -q --list Listing4.py
test: Listing4.FeatureOneAndTwoTests.test1
test: Listing4.FeatureOneAndTwoTests.test2
test: Listing4.FeatureOneTests.test
test: Listing4.UncategorizedTests.test

And to see the tests only for a specific category, use the
--category and --list options together:

$ proctorbatch -q --list --category feature2 Listing4.py
test: Listing4.FeatureOneAndTwoTests.test1
test: Listing4.FeatureOneAndTwoTests.test2

To see the list of uncategorized tests, use the category
“Unspecified”:

$ proctorbatch -q --list --category Unspecified Listing4.py
test: Listing4.UncategorizedTests.test

After verifying that a category includes the right tests, to run the
tests in the category, use the --category option without the
--list option:

$ proctorbatch --category feature2 Listing4.py
Writing coverage output to .coverage
Scanning: .
test1 (test: Listing4.FeatureOneAndTwoTests) ... ok
test2 (test: Listing4.FeatureOneAndTwoTests) ... ok

---------------------------------------------------
Ran 2 tests in 0.002s

OK

Identifying Test Categories

While test category names can hold any meaning you want to give them,
over time I have found that using broad categories is more desirable
than using narrowly defined categories. When a category is too
narrowly focused, the tests are more likely to be in the same module
or directory anyway. In that case, there is not as much purpose to be
served by defining the category, since it is easy enough to just run
the tests in that file or directory.

When using a broad category, it is more likely that the tests involved
will span multiple directories. At that point, having a single
category to encompass them becomes a useful way to consolidate the
tests. Suppose, for example, there is an application that
authenticates a user before allowing an action. It has a User
class to manage users and verify their credentials. It also has a
command line interface that depends on the User class to perform
authentication. There are unit tests for methods of the User
class, and integration tests to ensure that authentication works
properly in the command line program. Since the command line program
is unlikely to be in the same section of the source tree as the
low-level module containing the User class, it would be beneficial
to define a test category for “authentication” tests so all of the
related tests can be run together.

These sorts of broad categories are also useful when a feature
involves many aspects of a system at the same level. For example, when
the user edits data through a web application, the user
authentication, session management, cookie handling, and database
aspects might all be involved at different points. A “login” category
could be applied to unit tests from each aspect, so the tests can be
run individually or as a group. Adding categories makes it immensely
easier to run the right tests to identify regression errors when
changes could affect multiple areas of a large application.
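
A brief sketch of how that might look (the module layout and class
names are hypothetical): the two classes would normally live in
different parts of the source tree, but because they share the same
broad category, proctorbatch --category authentication runs them all
together.

#!/usr/bin/env python
# Hypothetical example: test classes from different modules sharing
# one broad test category.

import unittest

class UserCredentialTests(unittest.TestCase):
    "Unit tests for the low-level User class."

    PROCTOR_TEST_CATEGORIES = ('authentication',)

    def testVerifyPassword(self):
        return

class CommandLineLoginTests(unittest.TestCase):
    "Integration tests for logging in through the command line program."

    PROCTOR_TEST_CATEGORIES = ('authentication',)

    def testLogin(self):
        return

if __name__ == '__main__':
    unittest.main()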

Monitoring Test Progress By Hand

Proctor accepts several command line options to control the format of
the output of your test run, depending on your preference or need.
The default output format uses the same style as the unittest test
runner. The verbosity level is set to 1 by default, so the full
names of all tests are printed along with the test outcome. To see
only the pass/fail status for the tests, reduce the verbosity level by
using the -q option. See Listing 5 for an example of the default
output.

Listing 5

$ proctorbatch  Listing2.py
Writing coverage output to .coverage
Scanning: .
test1 (test: Listing2.FailingTests) ... FAIL
test2 (test: Listing2.FailingTests) ... FAIL
test1 (test: Listing2.PassingTests) ... ok
test2 (test: Listing2.PassingTests) ... ok

======================================================================
FAIL: test1 (test: Listing2.FailingTests)
----------------------------------------------------------------------
Traceback (most recent call last):
  File "/Users/dhellmann/Documents/PythonMagazine/Articles/Proctor/trunk/Listing2.py", line 17, in test1
    self.fail('Always fails 1')
AssertionError: Always fails 1

======================================================================
FAIL: test2 (test: Listing2.FailingTests)
----------------------------------------------------------------------
Traceback (most recent call last):
  File "/Users/dhellmann/Documents/PythonMagazine/Articles/Proctor/trunk/Listing2.py", line 21, in test2
    self.fail('Always fails 2')
AssertionError: Always fails 2

----------------------------------------------------------------------
Ran 4 tests in 0.006s

FAILED (failures=2)

$ proctorbatch  -q Listing2.py
FF..
======================================================================
FAIL: test1 (test: Listing2.FailingTests)
----------------------------------------------------------------------
Traceback (most recent call last):
  File "/Users/dhellmann/Documents/PythonMagazine/Articles/Proctor/trunk/Listing2.py", line 17, in test1
    self.fail('Always fails 1')
AssertionError: Always fails 1

======================================================================
FAIL: test2 (test: Listing2.FailingTests)
----------------------------------------------------------------------
Traceback (most recent call last):
  File "/Users/dhellmann/Documents/PythonMagazine/Articles/Proctor/trunk/Listing2.py", line 21, in test2
    self.fail('Always fails 2')
AssertionError: Always fails 2

----------------------------------------------------------------------
Ran 4 tests in 0.007s

FAILED (failures=2)

When using the default format, Proctor does not print any failure or
error messages until all of the tests have run. If your test suite is
very large, or the integration tests require fixtures that take a lot
of time to configure, you may not want to wait for the tests to finish
before discovering which tests have not passed. When that is the case,
you can use the --interleaved option to show the test results
along with the name of the test as the test runs, as illustrated in
Listing 6.

Listing 6

$ proctorbatch  --interleaved --no-gc Listing2.py
Writing coverage output to .coverage
Scanning: .
  1/  6 test: Listing2.ErrorTests.test1 ...ERROR in test: Listing2.ErrorTests.test1
Traceback (most recent call last):
  File "Listing2.py", line 27, in test1
    raise RuntimeError('test1 error')
RuntimeError: test1 error

  2/  6 test: Listing2.ErrorTests.test2 ...ERROR in test: Listing2.ErrorTests.test2
Traceback (most recent call last):
  File "Listing2.py", line 30, in test2
    raise RuntimeError('test2 error')
RuntimeError: test2 error

  3/  6 test: Listing2.FailingTests.test1 ...FAIL in test: Listing2.FailingTests.test1
Traceback (most recent call last):
  File "Listing2.py", line 17, in test1
    self.fail('Always fails 1')
AssertionError: Always fails 1

  4/  6 test: Listing2.FailingTests.test2 ...FAIL in test: Listing2.FailingTests.test2
Traceback (most recent call last):
  File "Listing2.py", line 21, in test2
    self.fail('Always fails 2')
AssertionError: Always fails 2

  5/  6 test: Listing2.PassingTests.test1 ...ok
  6/  6 test: Listing2.PassingTests.test2 ...ok

Ran 6 tests in 0.013s

FAILED (failures=2, errors=2)

Automatic Test Output Processing

For especially large test runs, or if you are committed to more
complete test automation, you may not want to examine the test results
by hand at all. Proctor can also produce a simple parsable output
format suitable for automatic processing. The output format can be
processed by another program to summarize the results or even open
tickets in your defect tracking system. To have Proctor report the
test results in this format, pass the --parsable option to
proctorbatch on the command line. Listing 7 includes a sample of
the parsable output format.

Listing 7

$ proctorbatch  --parsable --no-gc Listing2.py
Writing coverage output to .coverage
Scanning: .
__PROCTOR__ Start run
__PROCTOR__ Start test
test: Listing2.ErrorTests.test1
Traceback (most recent call last):
  File "Listing2.py", line 27, in test1
    raise RuntimeError('test1 error')
RuntimeError: test1 error

__PROCTOR__ Start results
ERROR in test: Listing2.ErrorTests.test1
__PROCTOR__ End results
__PROCTOR__ End test
__PROCTOR__ Start progress
  1/  6
__PROCTOR__ End progress
__PROCTOR__ Start test
test: Listing2.ErrorTests.test2
Traceback (most recent call last):
  File "Listing2.py", line 30, in test2
    raise RuntimeError('test2 error')
RuntimeError: test2 error

__PROCTOR__ Start results
ERROR in test: Listing2.ErrorTests.test2
__PROCTOR__ End results
__PROCTOR__ End test
__PROCTOR__ Start progress
  2/  6
__PROCTOR__ End progress
__PROCTOR__ Start test
test: Listing2.FailingTests.test1
Traceback (most recent call last):
  File "Listing2.py", line 17, in test1
    self.fail('Always fails 1')
AssertionError: Always fails 1
__PROCTOR__ Start results
FAIL in test: Listing2.FailingTests.test1
__PROCTOR__ End results
__PROCTOR__ End test
__PROCTOR__ Start progress
  3/  6
__PROCTOR__ End progress
__PROCTOR__ Start test
test: Listing2.FailingTests.test2
Traceback (most recent call last):
  File "Listing2.py", line 21, in test2
    self.fail('Always fails 2')
AssertionError: Always fails 2
__PROCTOR__ Start results
FAIL in test: Listing2.FailingTests.test2
__PROCTOR__ End results
__PROCTOR__ End test
__PROCTOR__ Start progress
  4/  6
__PROCTOR__ End progress
__PROCTOR__ Start test
test: Listing2.PassingTests.test1
__PROCTOR__ Start results
ok
__PROCTOR__ End results
__PROCTOR__ End test
__PROCTOR__ Start progress
  5/  6
__PROCTOR__ End progress
__PROCTOR__ Start test
test: Listing2.PassingTests.test2
__PROCTOR__ Start results
ok
__PROCTOR__ End results
__PROCTOR__ End test
__PROCTOR__ Start progress
  6/  6
__PROCTOR__ End progress
__PROCTOR__ End run
__PROCTOR__ Start summary
Failures: 2
Errors: 2
Successes: 2
Tests: 6
Elapsed time (sec): 0.014
Status: FAILED
__PROCTOR__ End summary

Since the test results may be part of a larger log file that includes
other information such as build output and installation messages,
Proctor uses easily identifiable delimiters to separate the sections
in its output. Each delimiter appears on a line by itself, and begins
with __PROCTOR__ to make it less likely that the output of any
other program will be misinterpreted as test output.

Proctor assumes there is no need to automatically process the output
of the scanning phase, so the first delimiter (__PROCTOR__ Start run)
is printed at the beginning of the test execution phase. The
string __PROCTOR__ Start test appears at the beginning of each
test, followed on the next line by the name of the test. Any output
produced by the test appears beginning on the line immediately
following the name. The test output is followed by a traceback, if the
test does not pass.

The text between the __PROCTOR__ Start results and __PROCTOR__ End
results delimiters always begins with one of ok, ERROR,
or FAIL, depending on the outcome of the test. If the test did not
pass, the rest of the text in the results section consists of the full
name of the test. The string __PROCTOR__ End test follows each
test result. Between the results for each test, a progress section
shows the current test number and the total number of tests being run.

Proctor comes with proctorfilter, a simple command line program to
process a log file and print the names of tests with certain status
codes. It accepts three command line options, --ok, --error,
and --fail, to control which tests are listed in the output. For
example, to find the tests which failed in the sample output, run:

$ proctorfilter --fail Listing7.txt
test: Listing2.FailingTests.test1: FAIL
test: Listing2.FailingTests.test2: FAIL

The default behavior for proctorfilter, when no command line
options are given, is to print a list of tests that either had an
error or failed.

Building Your Own Results Parser

Using proctorfilter to summarize a set of test results is only one
way to automate results processing for your tests. Another way to
handle the test results is to open a new ticket in a bug tracking
system for each test that does not pass during the nightly test
run. When the ticket is opened, it should include all of the
information available, including output from the test and the
traceback from the failure or error. Although proctorfilter does
not include all of that information, the Proctor library also includes
a result module, with classes useful for building your own test
result processing program.

Listing 8 shows a sample program that recreates the default
functionality of proctorfilter using the proctorlib.result
module. The ResultFactory class parses input text passed to
feed() and creates TestResult instances. Each time a complete
test result has been fed in, a new TestResult is constructed and
passed as an argument to the callback given to the ResultFactory
constructor. In the sample program, the callback function
show_test_result() looks at the status code for the test before
deciding whether to print out the summary.

Listing 8

#!/usr/bin/env python
# Print a list of tests which did not pass.

import fileinput
from proctorlib.result import ResultFactory, TestResult

def show_test_result(test_result):
    "Called for each test result parsed from the input data."
    if not test_result.passed():
        print test_result
    return

# Set up the parser
parser = ResultFactory(show_test_result)

# Process data from stdin or files named via sys.argv
for line in fileinput.input():
    parser.feed(line)

A TestResult instance has several attributes of interest. The
name attribute uniquely identifies the test. The name includes the
full import path for the module, as well as the class and method name
of the test. The output attribute includes all of the text
appearing between the __PROCTOR__ Start test and __PROCTOR__ Start
results delimiters, including the traceback, if any. The result
attribute includes the full text from between __PROCTOR__ Start
results and __PROCTOR__ End results, while status
contains only the status code. The status will be the same as one of
TestResult.OK, TestResult.ERROR, or TestResult.FAIL. The
passed() method returns True if the test status is
TestResult.OK and False otherwise.
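
For example, a small variation on Listing 8 (a sketch only) could use
those attributes to print the captured output for each test that did
not pass, which is exactly the kind of detail a ticket-opening script
would need:

#!/usr/bin/env python
# Sketch: print the name, status, and captured output of each test
# that did not pass, using the TestResult attributes described above.

import fileinput
from proctorlib.result import ResultFactory, TestResult

def show_failure_details(test_result):
    "Called for each test result parsed from the input data."
    if test_result.passed():
        return
    print '%s: %s' % (test_result.name, test_result.status)
    # The output attribute holds everything the test printed,
    # including the traceback for a failure or error.
    print test_result.output
    return

parser = ResultFactory(show_failure_details)

for line in fileinput.input():
    parser.feed(line)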

Code Coverage

At the same time it is running the automated tests, Proctor uses Ned
Batchelder’s coverage module to collect information about which
statements in the source files are actually executed. The code
coverage statistics gathered by coverage can be used to identify areas
of the code that need to have more automated tests written.

By default, proctorbatch writes the code coverage statistics to
the file ./.coverage. Use the --coverage-file option to change
the filename used. To disable coverage statistics entirely, use the
--no-coverage option.

Statistics are normally collected for every line of the source being
run. Some lines should not be included in the statistics, though, if
the code includes debugging sections that are disabled while the tests
are running. In that case, use the --coverage-exclude option to
specify regular expressions to be compared against the source code.
If the source matches the pattern, the line is not included in the
statistics counts. To disable checking for lines that match the
pattern if DEBUG:, for example, add --coverage-exclude="if DEBUG:"
to the command line. The --coverage-exclude option can be repeated
for each pattern to be ignored.
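
As a minimal example, reusing Listing2.py from earlier, the full
command might look like this:

$ proctorbatch --coverage-exclude="if DEBUG:" Listing2.py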

Once the test run is complete, use coverage.py to produce a report
with information about the portions of the code that were not executed
and the percentage that was. For example, in the following listing the
return statements in the test methods of the FailingTests
class from Listing 2 are never executed. They were skipped because
both of the tests fail before reaching the end of the function.

$ coverage.py -r -m Listing2.py
Name       Stmts   Exec  Cover   Missing
----------------------------------------
Listing2      18     16    88%   18, 22

Refer to the documentation provided by coverage.py --help for more
information on how to print code coverage reports.

Garbage Collection

Proctor can also be used to help identify the source of memory
leaks. When using the interleaved or parsable output formats, Proctor
uses the gc module functions for garbage collection to report on
objects that have not been cleaned up.

Listing 9 defines a test that introduces a circular reference between
two lists, a and b, by appending each to the other. Normally,
when processing leaves a function’s scope, the local variables are
marked so they can be deleted and their memory reclaimed. In this
case, however, since both lists are still referenced from an object
that has not been deleted, the lists are not automatically cleaned up
when the test function returns. The gc standard library module
includes an interface to discover uncollected garbage objects like
these lists, and Proctor includes a garbage collection report in the
output for each test, as in Listing 10. The garbage collection
information can be used to determine which test was being run when the
memory leaked, and then to narrow down the source of the leak.

Listing 9

#!/usr/bin/env python
# Test code with circular reference to illustrate garbage collection

import unittest

class CircularReferenceTest(unittest.TestCase):

    def test(self):
        a = []
        b = []
        b.append(a)
        a.append(b)
        return

Listing 10

$  proctorbatch --interleaved Listing9.py
Writing coverage output to .coverage
Scanning: .
  0/  1 test: Listing9.CircularReferenceTest.test ...ok
GC: Collecting...
GC: Garbage objects:
<type 'list'>
  [[[...]]]
<type 'list'>
  [[[...]]]

Ran 1 tests in 0.180s

OK

Conclusion

Automated testing is perhaps one of the biggest productivity
enhancements to come out of the Agile development movement. Even if
you are not doing Test Driven Development, using automated testing to
identify regression errors can provide great peace of mind. The basic
tools provided in the Python standard library do support automated
testing, but they tend to be targeted at library or module developers
rather than large scale projects. I hope this introduction to Proctor
has suggested a few new ideas for expanding your own use of automated
tests, and for managing those tests as your project size and scope
grows.

I would like to offer a special thanks to Ned Batchelder for his help
with integrating coverage.py and Proctor, and Mrs. PyMOTW for her help
editing this article.

Testing Tools for Python

Test Driven Development and Test Automation are all the rage, and
Python developers have no shortage of tools for testing their own
applications.

This month I am starting a series of columns on tools for developing
in Python. I intend to cover areas such as version control, utility
modules, build, and packaging tools. Eventually I may even work up
the courage to address the issue of editors and IDEs while, I hope,
avoiding a religious war. But let’s start out by looking at tools for
testing your libraries and applications. This isn’t a comprehensive
survey, but it should give you some idea of the kinds of tools
available.

Testing Frameworks

The standard library includes two separate testing frameworks for
writing automated tests: doctest and unittest. Each framework has
its own style that works better in certain circumstances. Each also
has proponents arguing in its favor. The discussion is not quite up
to the level of the vi vs. emacs argument, but it’s getting
there in some circles.

When run, doctest looks at the documentation strings for your modules,
classes, and methods and uses interpreter prompts (>>>) to
identify test cases. Almost anything you can do with the interactive
prompt from the interpreter can be saved as a test, making it easy to
create new tests as you experiment with and document your code. Since
doctest scans your documentation strings for tests, a nice benefit of
using it is your documentation is tested along with your code. This
is especially handy for library developers who need to provide
examples of using their library modules.

For other situations, where you need more formalized tests or where
extra test fixtures are needed, the unittest module is a better
choice. The standard library version of unittest started out as part
of the PyUnit project.
The name PyUnit is an homage to its origin in the XUnit API, originally created by Kent Beck for use in
Smalltalk and now available for many other languages.

Tests are implemented using methods of classes derived from
unittest.TestCase. TestCase supports methods to configure
fixtures, run the tests, and then clean up after the test runs. The
separate methods for configuring and cleaning up the fixtures are
useful for extending your automated tests beyond the unit level, when
you might have several tests that depend on the same basic
configuration (such as having a few different objects which need to be
connected to one another, temporary files to create and clean up, or
even a database transaction to manage). Separating the fixture
management from the test clarifies the distinction between test code
and setup code and makes it easier to ensure that the fixtures are
cleaned up after a test failure.
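
Here is a brief sketch of that pattern, using an in-memory SQLite
database as the shared fixture; the table and test are invented for
the example.

#!/usr/bin/env python
# Sketch of fixture management: each test gets a fresh in-memory
# database created in setUp() and discarded in tearDown().

import sqlite3
import unittest

class AccountStorageTests(unittest.TestCase):

    def setUp(self):
        self.db = sqlite3.connect(':memory:')
        self.db.execute('CREATE TABLE account (name TEXT, balance INTEGER)')
        return

    def tearDown(self):
        self.db.close()
        return

    def testInsert(self):
        self.db.execute("INSERT INTO account VALUES ('alice', 100)")
        rows = self.db.execute('SELECT balance FROM account').fetchall()
        self.assertEqual(rows, [(100,)])
        return

if __name__ == '__main__':
    unittest.main()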

Testing the Python Standard Library

Both test frameworks provide a function to serve as a main program to
make it easy to run the tests in a module. Running your tests one
module at a time does not scale up well when you start trying to run
all of the tests for a project as it grows to encompass several
packages. The Python standard library is a prime example of this.

All of the automated tests for the Python standard library are part of
the source distribution and can be found in the test module. Each standard
library module has at least one associated test file named after the
original module and prefixed with test_. The actual test code is
a mixture of doctest and unittest tests, depending on the module and
the nature of the tests. Some tests are easier to express using
doctest, but unittest is the preferred framework for new tests, and
most of the doctest tests are being converted. In fact, quite a few
are being converted as part of the Google Highly Open Participation
(GHOP) contest, which I discussed last month.

The test package also includes a tool for running the tests, called
regrtest. Since the standard library includes some
platform-specific packages and several other modules for which tests
might require special resources (audio hardware devices, for example),
the test.regrtest framework needed to take those requirements into
account. The solution was to extend the unittest framework to allow
tests with special requirements to be enabled or disabled depending on
the resources actually available at the time the tests are run. In
order to avoid false negatives, tests requiring special resources are
disabled by default, and must be enabled explicitly.

To run the standard library tests, just run regrtest.py from the
command line, inside the test directory. You can specify a single
module to test as an argument, or run all of the tests by omitting the
argument.

$ pwd
/python-trunk/Lib/test
$ python regrtest.py test_re.py
test_re
1 test OK.

The output reports that a single test was run, but that is a little
misleading. In fact it indicates that one module was run. Using
the -v option with regrtest.py will show the individual unit
tests that have been run, and their status.

To run tests with special resources enabled, use the -u option to
regrtest.py. For a complete list of the resource types, see the
help output (-h). This example illustrates running the curses
module tests, with curses disabled (the default):

$ python regrtest.py  test_curses.py
test_curses
test_curses skipped -- Use of the `curses'
resource not enabled
1 test skipped:
    test_curses
Those skips are all expected on darwin.

And now with curses enabled:

$ python regrtest.py -ucurses test_curses.py
[screen paints black and cursor moves around]
1 test OK.
$

Running Your Tests

The test.regrtest runner is not the only solution available for
managing large test suites. There are several alternative test
runners, including py.test, nose, and
Proctor.

The first, py.test, works by scanning your source files for functions
or methods that start with test_ and running any that it finds.
It depends on your tests using assert, and so does not require a
special test class hierarchy or file layout. It has support for
fixtures at granularity ranging all the way from module to class, and
even the method level. It does not directly support the unittest
framework, and has only experimental support for doctest tests. One
unique feature is the ability for a test function to act as a
generator to produce additional tests. The example given on the
py.test web site illustrates this nicely:

def test_generative():
    for x in (42,17,49):
        yield check, x

def check(arg):
    assert arg % 7 == 0   # second generated test fails!

Using a generator like this can save repetitive code in your tests.

Nose is billed as a “discovery-based unittest extension”. It scans
your test files to identify test cases to run automatically by
detecting classes derived from unittest.TestCase or matching a
name pattern. This lets you avoid writing explicit test suites by
hand. Configuration is handled through command line flags or a config
file, which makes it easy to set verbosity and other options
consistently for all of your projects. Nose can be extended through a
plugin API, and there are several nose-related packages available for
producing output in different formats, filtering or capturing output,
and otherwise adapting it to work the way you do.
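
As a small illustration (the file and function names are invented),
nose will pick up a plain test function with no unittest boilerplate
at all:

# test_math.py - a hypothetical test module; nose discovers the
# function because the module and function names match its default
# "test" pattern.

def test_addition():
    assert 1 + 1 == 2

Running nosetests -v in the project directory finds and runs the
function, reporting it by name.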

Proctor is another tool that works by scanning your source files.
(Disclosure: I wrote Proctor.) It works closely with the unittest
framework, and has features for managing test suites spanning multiple
directories using tagging. I won’t go into Proctor too deeply here,
since I have an entire article dedicated to it
planned for the near future. I will say that it is less extensible
than nose because it lacks a plugin system, but it has been used in
production for several years now. It specializes in producing output
that is machine parsable, to make it easy to produce reports from test
logs with many tests (several thousand at a time). Although the
feature sets of nose and Proctor overlap, neither implements a clear
super-set of the other.

IDEs and GUIs

In addition to the command line test runners described here, any
number of editors and IDEs support running tests. There are too many
to name, but all have some form of quick access to the tests, usually
bound to a single keystroke. Feedback ranges from text output, to
HTML, to a progress bar style report.

I’ve recently started experimenting with TextMate as a development
editor, and I have to say I like the “PyMate” feature for running
scripts. The output is parsed and presented in an HTML pane, with
tracebacks linking back to the original source line in the editor.
This makes it easy to jump from tracebacks in the test output directly
to your source, making the edit-test-repeat cycle shorter. Other
editors with support for running test suites include Eclipse, WingIDE,
Eric, and Komodo – I’m sure that list could be a lot longer. Each
has its own particular take on finding, running, and presenting the
results of your tests. This subject deserves more in-depth discussion
than I can provide in a single month’s space, so look for more details
in a future column.

Domain-Specific Testing

The basic frameworks offer a strong foundation for testing, but at
times you will benefit from extensions for testing in a particular
problem domain. For example, both webunit and webtest add
web browser simulation features to the existing unittest framework.
Taking a different approach, twill
implements a domain-specific language for “navigating Web pages,
posting forms and asserting conditions”. It can be used from within a
Python module, so it can be embedded in your tests as well. Some of
the modern web application frameworks, such as Django and Zope,
include built-in support for testing of your code with an environment
configured by the framework.

Although web development is hot, not every application is necessarily
web-enabled. Testing GUI applications has traditionally required
special consideration. In the old days, support for driving GUI
toolkits programmatically was spotty and tests had to depend on image
captures to look for regression errors. Times are changing, though,
and modern toolkits with accessibility APIs support tools like
dogtail and ldtp with test frameworks for driving
GUIs without such limitations.

Code Coverage

Once you establish the habit of writing tests, the next thing you’ll
want to know about is whether you are testing all of your
application. To answer that question, you will need a code coverage
analysis library, and Python has two primary options: Ned Batchelder’s
excellent coverage.py and Titus
Brown’s newer figleaf. They take
different approaches to detecting potentially executable lines in your
source, resulting in different “lines of code” counts. Otherwise they
produce similar statistics about the execution of your program. You
can run the coverage tools yourself, or use them integrated with one
of the test runners described above. Both tools collect statistics as
your program (or test suite) runs and enable you to prepare reports
showing the percent coverage for modules, as well as a list of lines
never run. Both also support creating annotated versions of the
source, highlighting lines which are not run. With a report of source
lines never run, it is easy to identify dead branches in conditional
statements containing code that can be removed or areas that need more
unit tests.
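
A typical session with the older command line interface to
coverage.py looks something like this sketch (run_tests.py stands in
for whatever script drives your test suite; check coverage.py --help
for the exact flags supported by your version):

$ coverage.py -e                 # erase previously collected statistics
$ coverage.py -x run_tests.py    # run the suite, recording executed lines
$ coverage.py -r -m              # report coverage, listing missed lines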

Source Checkers

One class of testing tools not used all that often in Python
development is the source code checker, or linter. Many
of you will recognize lint as the anal-retentive C programmer’s
friend. Python has several similar offerings for examining your
source and automatically detecting error patterns such as creating
variables that are never used, overwriting built-in symbol names
(like using type as a variable), and other such common issues that
are easy to introduce and difficult to ferret out. Pylint is an actively maintained
extension to the original PyChecker. In addition to looking for
coding errors, Pylint will check your code against a style guide,
verify that your interfaces are actually implemented, and a whole
series of other common problems. Think of it as an automatic
code-review.
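
Running it is as simple as pointing it at a module or package (the
module name here is just a placeholder); Pylint prints its findings
grouped by type and finishes with an overall score for the code:

$ pylint mymodule.py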

Conclusion

I only had space this month for a quick overview of the many testing
tools available for use with Python. I encourage you to check out
pycheesecake.org’s excellent taxonomy page for a more complete list
(see Related Links). Most of the tools described here are just an
easy_install away, so download a few and start working on your own
code quality today.

As always, if there is something you would like for me to cover in
this column, send a note with the details to doug dot hellmann at
pythonmagazine dot com and let me know, or add the link to your
del.icio.us account with the tag pymagdifferent. I’m particularly
interested in hearing from you about the tools you consider essential
for developing with Python. Anything from your favorite editor or IDE
to that obscure module you find yourself using over and over –
anything you just can’t seem to live without. I’ll be sharing links
and tips for development tools over the course of the next several
issues as I continue this series.

Originally published in Python Magazine, Volume 2 Issue 1, January 2008

Racemi press on ZDNet

Dan Kusnetzky from ZDNet has posted this morning about Racemi
and our product, DynaCenter.

DynaCenter repurposes servers on-the-fly from the iron up, making it
easy to turn your disaster recovery assets into extra computing
resources. When the production data center goes offline, the DR site can
be brought online in little more than the amount of time it takes to
reboot the servers. Similarly, in a test/development lab setup you can
use DynaCenter to test an application under several operating systems on
the same hardware, without manually swapping drives or re-installing
anything. Point, click, reboot.

Kusnetzky is spot-on when he points out that our software does more
than we market it as doing, though. Since we control the server’s power
and boot process, DynaCenter can also be used to manage a utility or
grid computing environment and conserve power in a regular data center
by powering down idle servers until load on an application rises to a
point that they are actually needed.

In his conversation with our team, one little detail was not covered:
We’re writing the whole thing in Python.

[Updated: Python didn’t come up in their conversation, so it’s no
surprise it wasn’t mentioned in Dan’s post.]

PyMOTW: unittest

Python’s unittest module, sometimes referred to as PyUnit, is
based on the XUnit framework design by Kent Beck and Erich Gamma.
The same pattern is repeated in many other languages, including C, perl,
Java, and Smalltalk. The framework implemented by unittest supports
fixtures, test suites, and a test runner to enable automated testing for
your code.

Read more at pymotw.com: unittest

Unexpectedly broken, and fixed: svnbackup

Yesterday Pierre Lemay sent me one of the clearest bug reports I’ve
seen in quite a while, and a patch to fix the problem. He was having
trouble with svnbackup duplicating changesets in the dump files. It
turns out every changeset that appeared on a “boundary” (at the end of
one dump file and the beginning of the next) was included in both dumps.
Oops.

When I tested the script, I was able to recover the repository without
any trouble. I didn’t check 2 cases that Pierre encountered. First,
the changeset revision numbers did not stay consistent. When the
changeset was duplicated, that threw off all of the subsequent
changeset ids by 1. For each duplicate. That in itself is only
annoying. The more troubling problem is when a duplicate changeset
includes a delete operation. The second delete would fail while
restoring, which prevented Pierre from importing the rest of the
backup.

In his email describing all of that, he gave me great details about
how he had tested, the specific scenario that caused the problem, and
then provided the fix!

So, if you are using version 1.0, go on over and download version
1.1 with Pierre’s fixes.

PyMOTW: StringIO and cStringIO

The StringIO class provides a convenient means of working with text
in-memory using the file API (read, write, etc.). There are two separate
implementations. The cStringIO module is written in C for speed, while
the StringIO module is written in Python for portability. Using
cStringIO to build large strings can offer performance savings over some
other string concatenation techniques.
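
A small sketch of the file-like API: accumulate text in memory, then
retrieve the whole string with getvalue(), preferring the C
implementation when it is available.

try:
    from cStringIO import StringIO
except ImportError:
    from StringIO import StringIO

buf = StringIO()
for i in range(3):
    buf.write('line %d\n' % i)

print buf.getvalue()   # the accumulated text
buf.close()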

Read more at pymotw.com: StringIO

Proctor 1.0

I’ve moved Proctor development from sourceforge to my own server
and released version 1.0.

We have been using proctor successfully for several years now at work,
and it makes automating our nightly tests very easy. The build is
automatic, the software is installed automatically, and then proctor
runs the test suite. All 3000+ tests take several hours to run, mostly
because they aren’t all strictly “unit” tests.