Static Code Analyzers for Python

Old-school developers remember lint, the static code analysis
tool for C programs. There are several similar programs available
for Python, and they can all help you clean up your act.

This month we continue examining Python development tools you have
told me you can’t live without. A fair number of you have mentioned
that you use a static analysis tool such as PyChecker, pylint, or PyFlakes. I have to admit, I
was a bit skeptical of how useful any of them would be with Python.
In a past life, when I used to write a lot of C, I used lint
occasionally. Unfortunately, it offered so many false positives,
especially when the X11 or Motif headers were included, that the
output frequently was useless. Eventually the GNU compiler became
sophisticated (and prevalent) enough that I stopped using lint
altogether. But after looking into the code analysis tools available
for Python, I’m reconsidering that position.

The Test Program

A static analysis tool reads your source code without executing it
and looks for common mistakes. In C programs the types of things lint
found were usually bad pointer casts or array references. Since
Python is a dynamic language, there are different sorts of problems to
watch for. Common examples are redefining functions or methods,
overriding builtin names, and importing modules without using them.
Some of the tools even test your code against a style guide (such as
PEP 8, the official Python style guide). These
are the sorts of common problems that are difficult to find unless you
have a very comprehensive test suite.
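To make the idea concrete, here is a minimal sketch of one such check. It parses source text with Python's standard ast module, without ever executing it, and reports function parameters whose names shadow builtins. It is an illustration under Python 3 (it relies on ast.arguments.posonlyargs, added in 3.8), not code taken from any of the tools discussed below; the helper name shadowed_builtins is hypothetical.

```python
import ast
import builtins

# Every name defined in the builtins module (int, str, len, ...).
BUILTIN_NAMES = set(dir(builtins))


def shadowed_builtins(source, filename="<string>"):
    """Return (line, function, parameter) for each parameter named after a builtin."""
    tree = ast.parse(source, filename)  # parse only; nothing is imported or run
    problems = []
    for node in ast.walk(tree):
        if isinstance(node, (ast.FunctionDef, ast.AsyncFunctionDef)):
            args = node.args
            # Concatenation builds a new list, so the tree is not modified.
            params = args.posonlyargs + args.args + args.kwonlyargs
            for extra in (args.vararg, args.kwarg):
                if extra is not None:
                    params.append(extra)
            for arg in params:
                if arg.arg in BUILTIN_NAMES:
                    problems.append((node.lineno, node.name, arg.arg))
    return problems


sample = '''
def functionName(self, int):
    return 5 * 5

class Example(object):
    def method1(self, str):
        self.s = str
'''

for line, func, name in shadowed_builtins(sample):
    print("line %d: %s() parameter %r shadows a builtin" % (line, func, name))
```

Catching a parameter that shadows an imported module, such as string in the test program below, would need a second pass that collects the names bound by import statements.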

In order to compare the three tools I’ll be discussing this month, I
needed to write some sample code with known mistakes in it. I’m sure
I could have used some of my existing code, but I wanted to see how
the tools responded to pre-arranged situations. Listing 1 shows the
carefully crafted bad code I’ll be using for all of the tests. Take a
minute now to see how many of the problems you can spot yourself, then
compare your results with what the tools found.

Listing 1

 1  #!/usr/bin/env python
 2  # encoding: utf-8
 3  """
 4  """
 5
 6  import string
 7
 8  module_variable = 0
 9
10  def functionName(self, int):
11      local = 5 + 5
12      module_variable = 5*5
13      return module_variable
14
15  class my_class(object):
16
17      def __init__(self, arg1, string):
18          self.value = True
19          return
20
21      def method1(self, str):
22          self.s = str
23          return self.value
24
25      def method2(self):
26          return
27          print 'How did we get here?'
28
29      def method1(self):
30          return self.value + 1
31      method2 = method1
32
33  class my_subclass(my_class):
34
35      def __init__(self, arg1, string):
36          self.value = arg1
37          return

PyChecker

One of the oldest lint tools for Python is PyChecker, by Eric Newton,
John Shue, and Neal Norwitz. The last official release of PyChecker
was in 2006, but it looks like more recent work has been done in the
CVS repository (I tested version 0.8.17 for this article). Although
the project is only “lightly maintained”, many readers reported using
it, and the authors intend to provide a release with better Python 2.5
support soon.

I downloaded the tar.gz file from sourceforge.net manually, and I was
able to install the program by unpacking the tarball and running
python setup.py install (in a fresh virtualenv environment, of
course). Once I had it installed, I ran it with the default settings
to produce this output:

$ pychecker Listing1.py
Processing Listing1...

Warnings...

Listing1.py:6: Imported module (string) not used
Listing1.py:10: Parameter (int) not used
Listing1.py:10: self is argument in function
Listing1.py:11: Local variable (local) not used
Listing1.py:12: Local variable (module_variable) shadows global defined on line 8
Listing1.py:13: Local variable (module_variable) shadows global defined on line 8
Listing1.py:17: Parameter (arg1) not used
Listing1.py:17: Parameter (string) not used
Listing1.py:29: Redefining attribute (method1) original line (21)
Listing1.py:35: Parameter (string) not used

As you can see, it found quite a few of the creative problems I inserted
into the test code. It did not warn me, however, that I was
overriding the builtin int() with the argument to my function on
line 10, or the imported module string with the argument to
__init__() on line 17. It caught the redefinition of method1 on line 29
but not the reassignment of method2 on line 31.
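One way a checker could catch both cases is to treat def statements and plain assignments in a class body alike, as bindings of a name. The sketch below is a hypothetical helper built on Python 3's ast module, not a description of how PyChecker or pylint work internally. It records the first line on which each name is bound in a class body and reports any later rebinding. The sample is a trimmed copy of my_class from Listing 1, so its line numbers differ from the listing's.

```python
import ast


def redefined_in_class(source):
    """Report names bound more than once directly in a class body.

    Both 'def' statements and simple assignments count as bindings, so
    'method2 = method1' is reported when method2 was already defined.
    """
    tree = ast.parse(source)
    reports = []
    for cls in ast.walk(tree):
        if not isinstance(cls, ast.ClassDef):
            continue
        first_seen = {}  # name -> line of its first binding in this class
        for stmt in cls.body:
            if isinstance(stmt, (ast.FunctionDef, ast.AsyncFunctionDef)):
                names = [stmt.name]
            elif isinstance(stmt, ast.Assign):
                names = [t.id for t in stmt.targets if isinstance(t, ast.Name)]
            else:
                continue
            for name in names:
                if name in first_seen:
                    reports.append((cls.name, name, first_seen[name], stmt.lineno))
                else:
                    first_seen[name] = stmt.lineno
    return reports


sample = '''
class my_class(object):

    def method1(self, str):
        return str

    def method2(self):
        return

    def method1(self):
        return 1
    method2 = method1
'''

for cls, name, first, again in redefined_in_class(sample):
    print("%s.%s: bound on line %d, rebound on line %d" % (cls, name, first, again))
```

An alias to a brand-new name (say, alias = method1) is not reported, because only names that were already bound in the class count as redefinitions.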

The help text for the program (accessed with the usual -h option)
indicates that there are quite a few checks not enabled by default.
Some of these include tests for unused class member variables,
unreachable code, and missing docstrings. Adding the --var
option, for example, exposes the unused module-level variable on line
eight.

Specifying options on the command line can be a bit cumbersome,
however, so PyChecker supports three other ways to specify
preferences. First, you can include a __pychecker__ string in
your code to enable or disable the options you want to use. The
second way to pass options to PyChecker is to set the PYCHECKER
environment variable, using the same syntax as __pychecker__.

The third means of controlling the tests performed uses a
configuration file for site or project-wide parameters. By default
the file is $HOME/.pycheckerrc, and there is a command line option
to specify a separate file (if, for example, you want to include the
file in your version control system with your source code). The
.pycheckerrc config file uses Python syntax to set the options,
but the names may be different from the names used on the command line
(allVariablesUsed instead of var, for this example). The
--rcfile option prints out a complete set of the options given, in
a format that is easy to capture and save as your configuration file.
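A minimal sketch of such a configuration file, enabling the check for unused module-level variables, follows. It assumes boolean options can be written as 0 or 1; treat the --rcfile output from your own PyChecker installation as the authority on names and spelling.

```python
# $HOME/.pycheckerrc, or a per-project file named on the command line.
# PyChecker reads this file as Python, so each option is an assignment.
# allVariablesUsed is the config-file name for the --var command-line
# check, which reports unused module variables such as module_variable.
allVariablesUsed = 1
```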

pylint

The second program I looked at for this column is pylint, from a team
of developers organized through Logilab. The documentation for pylint
refers directly to PyChecker as a predecessor, but it additionally
claims to test code against a style guide or coding standard. pylint also
supports a plugin system for adding your own custom checks.

Version 0.14.0 of pylint depends on a few other libraries from
Logilab. All the links you need are available on the README page for
pylint. I tried installing the packages with easy_install, but
the resulting installation didn’t work, so I resorted to downloading the tarballs and
installing them manually. That did the trick, and I was able to
produce a nice report about my test code, the beginning of which
appears in Listing 2.

Listing 2

No config file found, using default configuration
************* Module Listing1
C:  1: Empty docstring
W:  6: Uses of a deprecated module 'string'
C:  8: Invalid name "module_variable" (should match (([A-Z_][A-Z1-9_]*)|(__.*__))$)
C: 10:functionName: Invalid name "functionName" (should match [a-z_][a-z0-9_]{2,30}$)
C: 10:functionName: Missing docstring
W: 10:functionName: Redefining built-in 'int'
W: 12:functionName: Redefining name 'module_variable' from outer scope (line 8)
W: 10:functionName: Unused argument 'int'
W: 10:functionName: Unused argument 'self'
W: 11:functionName: Unused variable 'local'
C: 15:my_class: Invalid name "my_class" (should match [A-Z_][a-zA-Z0-9]+$)
C: 15:my_class: Missing docstring
C: 22:my_class.method1: Invalid name "s" (should match [a-z_][a-z0-9_]{2,30}$)
W: 17:my_class.__init__: Redefining name 'string' from outer scope (line 6)
W: 17:my_class.__init__: Unused argument 'arg1'
W: 17:my_class.__init__: Unused argument 'string'
C: 21:my_class.method1: Missing docstring
W: 21:my_class.method1: Redefining built-in 'str'
C: 25:my_class.method2: Missing docstring
W: 27:my_class.method2: Unreachable code
R: 25:my_class.method2: Method could be a function
C: 29:my_class.method1: Missing docstring
E: 29:my_class.method1: method already defined line 21
W: 22:my_class.method1: Attribute 's' defined outside __init__
C: 33:my_subclass: Invalid name "my_subclass" (should match [A-Z_][a-zA-Z0-9]+$)
C: 33:my_subclass: Missing docstring
W: 35:my_subclass.__init__: Redefining name 'string' from outer scope (line 6)
W: 35:my_subclass.__init__: __init__ method from base class 'my_class' is not called
W: 35:my_subclass.__init__: Unused argument 'string'
W:  6: Unused import string

The first thing I noticed was the size and scope of the output
report produced. The full report was over 150 lines and included
several ASCII tables with statistics about the results (see Listing 3
for one example). Data for the previous and current runs are
available along with the difference, making it easy to track your
progress as you clean up your code. pylint identified almost all of
the same problems PyChecker did, and many that PyChecker did not. The
only warning PyChecker gave me that pylint did not was that my
module-level function takes an argument named self but is not a method.

Listing 3

Messages by category
--------------------

+-----------+-------+---------+-----------+
|type       |number |previous |difference |
+===========+=======+=========+===========+
|convention |12     |NC       |NC         |
+-----------+-------+---------+-----------+
|refactor   |1      |NC       |NC         |
+-----------+-------+---------+-----------+
|warning    |16     |NC       |NC         |
+-----------+-------+---------+-----------+
|error      |1      |NC       |NC         |
+-----------+-------+---------+-----------+

An especially nice feature of pylint is that each check has an
assigned identifier, so it is easy to enable or disable that
particular warning in a consistent manner, with no guessing about the
name to use in the config file. Simply specify --enable-msg or
--disable-msg and the message id. Given the sheer number of tests
performed by pylint, I can see that this consistency is going to be
key in setting it up in a meaningful way on any real project.
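For example, the unused-argument warnings in Listing 2 correspond to message W0613 in pylint's numbering. Below is a minimal sketch of suppressing just that message inside one function. It uses the # pylint: disable= comment accepted by current pylint releases; the 0.14 release reviewed here spelled the pragma disable-msg= instead. The function handle_request and its context argument are hypothetical.

```python
def handle_request(request, context):
    """Return the request unchanged; 'context' is part of a hypothetical API."""
    # pylint: disable=W0613
    # W0613 is the "unused argument" message (symbolic name:
    # unused-argument). On its own line inside the function, the pragma
    # applies to this function's block only, so the rest of the module
    # is still checked.
    return request
```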

Since I approached this review with a jump-right-in attitude, all of
these impressions were formed before I had spent any time reading the
documentation delivered with the program, so that was my next step.
The docs provided cover the complete list of warning types, how to
enable or disable each one, how to set other options and interpret the
report output, and all the other sorts of information you need to really
integrate the tool into your daily routine. There
are some holes, as you would expect from a work-in-progress, but the
basic information is there in more detail than for either of the other
projects.

In addition to enabling or disabling individual messages, there is a
wide range of command line options available for fine-grained control
over the tests performed. These range from regular
expressions to enforce naming conventions to various settings to watch
for “complexity” issues within classes and functions. It will be
interesting to see how those settings work out against some of my
older code.
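Because pylint's naming rules are ordinary regular expressions, it is easy to try one out before changing it. This sketch uses Python's re module with the three default patterns printed in the Listing 2 report (later pylint releases changed these defaults). It is an illustration, not pylint's own code, and the helper name_ok is hypothetical.

```python
import re

# Default naming patterns as printed in the pylint 0.14.0 report (Listing 2).
NAMING_RULES = {
    "function": r"[a-z_][a-z0-9_]{2,30}$",
    "class": r"[A-Z_][a-zA-Z0-9]+$",
    # pylint applied this pattern to module-level names, treating them
    # as constants.
    "constant": r"(([A-Z_][A-Z1-9_]*)|(__.*__))$",
}


def name_ok(kind, name):
    """Return True if name satisfies the pattern for kind."""
    # re.match anchors at the start of the string; each pattern ends in $.
    return re.match(NAMING_RULES[kind], name) is not None


# Names from Listing 1 next to conforming alternatives.
for kind, name in [
    ("function", "functionName"),
    ("function", "function_name"),
    ("class", "my_class"),
    ("class", "MyClass"),
    ("constant", "module_variable"),
    ("constant", "MODULE_VARIABLE"),
]:
    print("%-8s %-16s %s" % (kind, name, "ok" if name_ok(kind, name) else "invalid"))
```

The same patterns explain the complaint about the attribute s in Listing 2: a single character cannot satisfy the {2,30} repetition that follows the first character.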

As with PyChecker, pylint also supports setting options within your
source code and with a config file. Perhaps assuming a shared
development environment, it looks first in /etc/pylintrc and then
in $HOME/.pylintrc for settings. This lets a team install a
global configuration file on a development server, so everyone sees
the same options. To print the settings being used, run pylint with the
--generate-rcfile option. The output includes comments for each
option, so saving it to a file makes it easy for you to start customizing
it to create your own specialized config file.

PyFlakes

The last program I examined was PyFlakes from divmod.org. The
installation process for PyFlakes was the easiest of the three. After
a quick “easy_install PyFlakes”, I was up and running (yay!). The
experience after that was a bit of a letdown, though:

$ pyflakes Listing1.py
Listing1.py:6: 'string' imported but unused

It found almost none of the errors I was hoping it would identify.
The PyFlakes web site says there are two categories of errors
reported:

  • Names which are used but not defined or used before they are defined
  • Names which are redefined without having been used

In the case of this sample file, there are several unused names that
weren’t reported.

PyFlakes is much simpler than either pylint or PyChecker. There don’t
seem to be any command line options for controlling the tests that are
run. Running it with arguments that don’t refer to valid filenames
results in a traceback and error message.

The one feature of PyFlakes mentioned by the users who recommended it
is its speed. My test code is obviously too small to make any real
performance tests, but I have heard from several readers who use it in
conjunction with an IDE like PyDev to look for errors in the
background while they edit.

Conclusions

Unlike PyChecker, which imports the modules it checks, both pylint and
PyFlakes analyze your source without importing it. Everything they need
they derive from the parse tree.
This gives them an advantage in situations where importing the code
might have undesirable side effects. I used the same approach in
HappyDoc for extracting documentation from Zope-related source code.
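The parse-tree approach can be sketched in a few lines. The hypothetical helper below uses Python 3's ast module to find imports that are never referenced, the one problem PyFlakes reported for Listing 1. The sample source raises SystemExit at the top level; because the code is only parsed, that statement never runs.

```python
import ast


def unused_imports(source):
    """Return sorted (line, name) pairs for imports never referenced."""
    tree = ast.parse(source)  # builds the parse tree; nothing is executed
    imported = {}  # bound name -> line number of its import
    used = set()
    for node in ast.walk(tree):
        if isinstance(node, (ast.Import, ast.ImportFrom)):
            for alias in node.names:
                if alias.name == "*":
                    continue  # star imports cannot be tracked this simply
                # "import os.path" binds "os"; "import x as y" binds "y".
                bound = alias.asname or alias.name.split(".")[0]
                imported[bound] = node.lineno
        elif isinstance(node, ast.Name) and isinstance(node.ctx, ast.Load):
            # Only reads count as uses; os.environ contains a Name node
            # for "os" in Load context.
            used.add(node.id)
    return sorted((line, name) for name, line in imported.items()
                  if name not in used)


sample = '''
import string
import os

raise SystemExit("importing this module would stop the interpreter")

def home():
    return os.environ.get("HOME")
'''

print(unused_imports(sample))  # prints [(2, 'string')]
```

A real checker must also handle scoping, __all__, conditional imports, and names referenced only inside strings, all of which this sketch ignores.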

All of the tools I tested found some of the errors in the sample code,
but pylint was by far the most comprehensive. The PyChecker output
was more terse, and it doesn’t include the style checks that pylint
has, but that omission may itself constitute a feature for some users.

Of the three tools, only PyFlakes installed correctly with
easy_install. That is annoying, but not a show-stopper for using the
other tools, especially given how much more comprehensive their output
is. All of the tools worked correctly when installed via setup.py,
which is certainly better than having to install them entirely by
hand.

For my own projects, I intend to continue looking into pylint for now.
Its consistent configuration and exhaustive reporting are appealing
for larger code bases such as I encounter at my day job.

Configuring these tools in your code is useful for suppressing false
positives or warnings you know it is safe to ignore. Use a
configuration file to enable checks you want applied to all of your
code. It is probably best to use a separate configuration file for
each project, since different projects will have different coding
standards and styles.

Next month I will continue this series by introducing you to more
tools to enhance your programming productivity. I haven’t decided on
the topic yet, so if you have a tip to share, feedback on something
I’ve written, or if there is anything you would like me to cover
in this column, send a note with the details to doug dot hellmann at
pythonmagazine dot com and let me know. You can also add the link to
your del.icio.us account with the tag pymagdifferent, and I’ll see
it there.

Originally published in Python Magazine, Volume 2, Issue 3, March 2008