Static Code Analyzers for Python

Old-school developers remember lint, the static code analysis tool for C programs. There are several similar programs available for Python, and they can all help you clean up your act.

This month we continue examining Python development tools you have told me you can’t live without. A fair number of you have mentioned that you use a static analysis tool such as PyChecker, pylint, or PyFlakes. I have to admit, I was a bit skeptical of how useful any of them would be with Python. In a past life, when I used to write a lot of C, I used lint occasionally. Unfortunately, it produced so many false positives, especially when the X11 or Motif headers were included, that the output was frequently useless. Eventually the GNU compiler became sophisticated (and prevalent) enough that I stopped using lint altogether. But after looking into the code analysis tools available for Python, I’m reconsidering that position.

The Test Program

A static analysis tool reads your source code without executing it and looks for common mistakes. In C programs, the things lint found were usually bad pointer casts or array references. Since Python is a dynamic language, there are different sorts of problems to watch for. Common examples are redefining functions or methods, overriding builtin names, and importing modules without using them. Some of the tools even check your code against a style guide, such as the official Python style guide, PEP 8. These are the sorts of problems that are difficult to find unless you have a very comprehensive test suite.
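To make one of these problems concrete, here is a minimal sketch of why reusing a builtin name as an argument deserves a warning. The function `describe` and its arguments are invented for illustration and are not part of the test program used in this column. Inside the function, the builtin is no longer reachable under its usual name, and nothing complains until the code actually runs.

```python
def describe(value, str):
    """Hypothetical function whose second argument shadows the builtin str."""
    # Inside this function, "str" refers to the argument, not the builtin
    # type, so the call below tries to call whatever the caller passed in.
    return str(value)


try:
    describe(42, "label")
except TypeError as err:
    # A string is not callable, so the mistake only surfaces at run time.
    print("TypeError:", err)
```

A test suite would only catch this if it happened to exercise that exact call, which is the gap a static checker is meant to close.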

In order to compare the three tools I’ll be discussing this month, I needed to write some sample code with known mistakes in it. I’m sure I could have used some of my existing code, but I wanted to see how the tools responded to pre-arranged situations. Listing 1 shows the carefully crafted bad code I’ll be using for all of the tests. Take a minute now to see how many of the problems you can spot yourself, then compare your results with what the tools found.

Listing 1

#!/usr/bin/env python
# encoding: utf-8
"""
"""

import string

module_variable = 0

def functionName(self, int):
    local = 5 + 5
    module_variable = 5*5
    return module_variable

class my_class(object):
    
    def __init__(self, arg1, string):
        self.value = True
        return

    def method1(self, str):
        self.s = str
        return self.value

    def method2(self):
        return
        print 'How did we get here?'
    
    def method1(self):
        return self.value + 1
    method2 = method1
    
class my_subclass(my_class):
    
    def __init__(self, arg1, string):
        self.value = arg1
        return

PyChecker

One of the oldest lint tools for Python is PyChecker, by Eric Newton, John Shue, and Neal Norwitz. The last official release of PyChecker was in 2006, but it looks like more recent work has been done in the CVS repository (I tested version 0.8.17 for this article). Although the project is only “lightly maintained”, many readers reported using it, and the authors intend to provide a release with better Python 2.5 support soon.

I downloaded the tar.gz file from sourceforge.net manually, and I was able to install the program by unpacking the tarball and running python setup.py install (in a fresh virtualenv environment, of course). Once I had it installed, I ran it with the default settings to produce this output:

$ pychecker Listing1.py
Processing Listing1...

Warnings...

Listing1.py:6: Imported module (string) not used
Listing1.py:10: Parameter (int) not used
Listing1.py:10: self is argument in function
Listing1.py:11: Local variable (local) not used
Listing1.py:12: Local variable (module_variable) shadows global defined on line 8
Listing1.py:13: Local variable (module_variable) shadows global defined on line 8
Listing1.py:17: Parameter (arg1) not used
Listing1.py:17: Parameter (string) not used
Listing1.py:29: Redefining attribute (method1) original line (21)
Listing1.py:35: Parameter (string) not used

As you see, it found quite a few of the creative problems I inserted into the test code. It did not warn me, however, that I was overriding the builtin int() with the argument to my function on line 10, or the imported module string with the argument to __init__() on line 17. It caught one example of redefining a method but not the second.

The help text for the program (accessed with the usual -h option) indicates that there are quite a few checks not enabled by default. Some of these include tests for unused class member variables, unreachable code, and missing docstrings. Adding the --var option, for example, exposes the unused module-level variable on line eight.

Specifying options on the command line can be cumbersome, however, so PyChecker supports three other ways to set preferences. First, you can include a __pychecker__ string in your code to enable or disable the options you want. Second, you can set the PYCHECKER environment variable, which uses the same syntax as __pychecker__.

The third means of controlling the tests performed uses a configuration file for site or project-wide parameters. By default the file is $HOME/.pycheckerrc, and there is a command line option to specify a separate file (if, for example, you want to include the file in your version control system with your source code). The .pycheckerrc config file uses Python syntax to set the options, but the names may be different from the names used on the command line (allVariablesUsed instead of var, for this example). The --rcfile option prints out a complete set of the options given in a format easy to capture and save as your configuration file.
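Because the file is ordinary Python, a project-level configuration can be very small. The fragment below is only a sketch: `allVariablesUsed` is the name mentioned above, and the value 1 is my assumption for "enabled". Generating the full file from the tool is the safer way to confirm the spelling and defaults for your version.

```python
# Sketch of a minimal .pycheckerrc; the file is plain Python assignments.
# Assumed to be the config-file equivalent of the --var command-line option.
allVariablesUsed = 1
```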

pylint

The second program I looked at for this column is pylint, from a team of developers organized through Logilab. The documentation for pylint refers directly to PyChecker as a predecessor, but it claims to also test code against a style guide or coding standard. pylint also supports a plugin system for adding your own custom checks.

Version 0.14.0 of pylint depends on a few other libraries from Logilab. All the links you need are available on the README page for pylint. I tried installing the packages with easy_install, but the results didn’t work, so I resorted to downloading the tarballs and installing them manually. That did the trick, and I was able to produce a nice report about my test code, the beginning of which appears in Listing 2.

Listing 2

No config file found, using default configuration
************* Module Listing1
C:  1: Empty docstring
W:  6: Uses of a deprecated module 'string'
C:  8: Invalid name "module_variable" (should match (([A-Z_][A-Z1-9_]*)|(__.*__))$)
C: 10:functionName: Invalid name "functionName" (should match [a-z_][a-z0-9_]{2,30}$)
C: 10:functionName: Missing docstring
W: 10:functionName: Redefining built-in 'int'
W: 12:functionName: Redefining name 'module_variable' from outer scope (line 8)
W: 10:functionName: Unused argument 'int'
W: 10:functionName: Unused argument 'self'
W: 11:functionName: Unused variable 'local'
C: 15:my_class: Invalid name "my_class" (should match [A-Z_][a-zA-Z0-9]+$)
C: 15:my_class: Missing docstring
C: 22:my_class.method1: Invalid name "s" (should match [a-z_][a-z0-9_]{2,30}$)
W: 17:my_class.__init__: Redefining name 'string' from outer scope (line 6)
W: 17:my_class.__init__: Unused argument 'arg1'
W: 17:my_class.__init__: Unused argument 'string'
C: 21:my_class.method1: Missing docstring
W: 21:my_class.method1: Redefining built-in 'str'
C: 25:my_class.method2: Missing docstring
W: 27:my_class.method2: Unreachable code
R: 25:my_class.method2: Method could be a function
C: 29:my_class.method1: Missing docstring
E: 29:my_class.method1: method already defined line 21
W: 22:my_class.method1: Attribute 's' defined outside __init__
C: 33:my_subclass: Invalid name "my_subclass" (should match [A-Z_][a-zA-Z0-9]+$)
C: 33:my_subclass: Missing docstring
W: 35:my_subclass.__init__: Redefining name 'string' from outer scope (line 6)
W: 35:my_subclass.__init__: __init__ method from base class 'my_class' is not called
W: 35:my_subclass.__init__: Unused argument 'string'
W:  6: Unused import string

The first thing I noticed was the size and scope of the output report produced. The full report was over 150 lines and included several ASCII tables with statistics about the results (see Listing 3 for one example). Data for the previous and current runs are available along with the difference, making it easy to track your progress as you clean up your code. pylint identified almost all of the problems PyChecker did, and many that PyChecker missed. The one warning PyChecker gave me that pylint did not is that my module-level function takes an argument named self but is not a method.

Listing 3

Messages by category
--------------------

+-----------+-------+---------+-----------+
|type       |number |previous |difference |
+===========+=======+=========+===========+
|convention |12     |NC       |NC         |
+-----------+-------+---------+-----------+
|refactor   |1      |NC       |NC         |
+-----------+-------+---------+-----------+
|warning    |16     |NC       |NC         |
+-----------+-------+---------+-----------+
|error      |1      |NC       |NC         |
+-----------+-------+---------+-----------+

An especially nice feature of pylint is that each check has an assigned identifier, so it is easy to enable or disable that particular warning in a consistent manner – no guessing about the name to use in the config file. Simply specify --enable-msg or --disable-msg and the message id. Given the sheer number of tests performed by pylint, I can see that this consistency is going to be key in setting it up in a meaningful way on any real project.

Since I approached this review with a jump-right-in attitude, all of these impressions were formed before I had spent any time reading the documentation delivered with the program, so that was my next step. The docs provided cover a complete list of the different types of warnings produced, how to enable/disable them and set other options, interpret the report output, and all the other sorts of information you need to really integrate the tool into your daily routine. There are some holes, as you would expect from a work-in-progress, but the basic information is there in more detail than for either of the other projects.

In addition to enabling or disabling individual messages, there is a wide range of command line options available for fine-grained control over expectations for the tests performed. These range from regular expressions that enforce naming conventions to various settings that watch for “complexity” issues within classes and functions. It will be interesting to see how those settings work out against some of my older code.

As with PyChecker, pylint supports setting options within your source code and with a config file. Perhaps assuming a shared development environment, it looks first in /etc/pylintrc and then in $HOME/.pylintrc for settings. This lets a team install a global configuration file on a development server, so everyone sees the same options. To print the settings being used, use the --generate-rcfile option. The output includes comments for each option, so saving it to a file makes it easy for you to start customizing it to create your own specialized config file.
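For the in-source form, pylint reads specially formatted comments. The sketch below assumes the `disable-msg` spelling that matches the command-line option named earlier (later pylint releases shortened it to `disable`), and assumes W0613 is the identifier for the unused-argument warning. Confirm both against the message list for your version. The function itself is hypothetical.

```python
def on_signal(signum, frame):
    """Hypothetical signal handler; the signature is dictated by the caller."""
    # pylint: disable-msg=W0613
    # The comment above is intended to make pylint skip the unused-argument
    # check (assumed id W0613) within this function only.
    return "received"
```

Keeping the directive next to the code it excuses documents why the warning is safe to ignore, while project-wide choices stay in the config file.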

PyFlakes

The last program I examined was PyFlakes from divmod.org. The installation process for PyFlakes was the easiest of the three. After a quick “easy_install PyFlakes”, I was up and running (yay!). The experience after that was a bit of a letdown, though:

$ pyflakes Listing1.py
Listing1.py:6: 'string' imported but unused

It found almost none of the errors I was hoping it would identify. The PyFlakes web site says there are two categories of errors reported:

  • Names which are used but not defined or used before they are defined
  • Names which are redefined without having been used

In the case of this sample file, there are several unused names that weren’t reported.
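To make the two categories concrete, here is a small sketch of each, using function names invented for illustration. Neither mistake stops Python from loading the module, which is why a static check is useful: the first only fails when the function is finally called, and the second never fails at all.

```python
def total_price(quantity):
    """Category 1: 'unit_price' is used but never defined anywhere."""
    return quantity * unit_price  # NameError, but only when called


def greeting():
    return "hello"


def greeting():  # Category 2: redefined before the first version was used
    return "goodbye"


# The module loads without complaint; the later definition silently wins.
print(greeting())

try:
    total_price(3)
except NameError as err:
    print("NameError:", err)
```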

PyFlakes is much simpler than either pylint or PyChecker. There don’t seem to be any command line options for controlling the tests that are run. Running it with arguments that don’t refer to valid filenames results in a traceback and error message.

The one feature of PyFlakes mentioned by the users who recommended it is its speed. My test code is obviously too small to make any real performance tests, but I have heard from several readers who use it in conjunction with an IDE like PyDev to look for errors in the background while they edit.

Conclusions

Both pylint and PyFlakes analyze your source but do not actually import it. Everything they need they derive from the parse tree. This gives them an advantage in situations where importing the code might have undesirable side effects. I used the same approach in HappyDoc for extracting documentation from Zope-related source code.
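The parse-tree approach can be sketched with the standard library's `ast` module. This is not how pylint or PyFlakes is implemented; it is a deliberately small illustration, with a made-up helper named `find_unused_imports` and a made-up sample source, of finding unused imports without ever importing or running the code being checked.

```python
import ast

# Hypothetical sample to analyze; it is held as text and never executed.
SAMPLE_SOURCE = '''
import os
import string

def current_dir():
    return os.getcwd()
'''


def find_unused_imports(source):
    """Return sorted (name, line) pairs for imports never referenced by name."""
    tree = ast.parse(source)  # parses only; nothing in the sample runs
    imported = {}
    used = set()
    for node in ast.walk(tree):
        if isinstance(node, (ast.Import, ast.ImportFrom)):
            for alias in node.names:
                # "import a.b" binds "a"; "import a as b" binds "b"
                bound = alias.asname or alias.name.split(".")[0]
                imported[bound] = node.lineno
        elif isinstance(node, ast.Name):
            used.add(node.id)
    return sorted((name, line) for name, line in imported.items()
                  if name not in used)


print(find_unused_imports(SAMPLE_SOURCE))
```

For the sample above, the helper reports the `string` import and leaves `os` alone. A real tool has to handle much more, such as scoping rules, star imports, and names exported through `__all__`, which this sketch ignores.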

All of the tools I tested found some of the errors in the sample code, but pylint was by far the most comprehensive. The PyChecker output was more terse, and it doesn’t include the style checks that pylint has, but that omission may itself constitute a feature for some users.

Of the three tools, only PyFlakes installed correctly with easy_install. That is annoying, but not a show-stopper for using the other tools, especially given how much more comprehensive their output is. All of the tools worked correctly when installed via setup.py, which is certainly better than having to install them entirely by hand.

For my own projects, I intend to continue looking into pylint for now. Its consistent configuration and exhaustive reporting are appealing for larger code bases such as I encounter at my day job.

Configuring these tools in your code is useful for suppressing false positives or warnings you know it is safe to ignore. Use a configuration file to enable checks you want applied to all of your code. It is probably best to use a separate configuration file for each project, since different projects will have different coding standards and styles.

Next month I will continue this series by introducing you to more tools to enhance your programming productivity. I haven’t decided on the topic yet, so if you have a tip to share, feedback on something I’ve written, or if there is anything you would like for me to cover in this column, send a note with the details to doug dot hellmann at pythonmagazine dot com and let me know. You can also add the link to your del.icio.us account with the tag pymagdifferent, and I’ll see it there.

Originally published in Python Magazine Volume 2 Issue 3, March 2008