PyMOTW: csv

The csv module is very useful for working with data exported from
spreadsheets and databases into text files. There is no well-defined
standard, so the csv module uses “dialects” to support parsing using
different parameters. Along with a generic reader and writer, the module
includes a dialect for working with Microsoft Excel.
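
As a quick illustration, here is a minimal sketch of writing and then reading a file with the default (excel) dialect; the filename and columns are made up for the example:

import csv

# Write a few rows using the default (excel) dialect.
f = open('sample.csv', 'wb')
writer = csv.writer(f)
writer.writerow(('title', 'year'))
writer.writerow(('Sample Movie', 2007))
f.close()

# Read the rows back, one list of strings per row.
f = open('sample.csv', 'rb')
for row in csv.reader(f):
    print row
f.close()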

Read more at pymotw.com: csv

PyMOTW: getopt

The getopt module is the old-school command line option parser which
supports the conventions established by the Unix function getopt().
It parses an argument sequence, such as sys.argv, and returns a
sequence of (option, argument) pairs and a sequence of non-option
arguments.
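
For example, a minimal sketch of the basic call (the option names here are invented for illustration):

import getopt
import sys

# Accept -v/--verbose and -o FILE/--output FILE, plus any
# remaining non-option arguments.
try:
    opts, args = getopt.getopt(sys.argv[1:], 'vo:', ['verbose', 'output='])
except getopt.GetoptError, err:
    print str(err)
    sys.exit(1)

for option, value in opts:
    print 'option:', option, 'value:', value
print 'remaining arguments:', args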

Read more at pymotw.com: getopt

feedcache 1.0

I am happy with the API for feedcache, so I bumped the release to 1.0
and beta status. The package includes tests and 2 separate examples (one
with shelve and another using threads and shove).

CommandLineApp

Back when Python 1.5.4 was hot and new, I wrote a class to serve as a
basis for the many command line programs I was working on for myself and
my employer. This was long before the Option Parsing Wars that resulted
in the addition of optparse to the standard library. If optparse had
been around, I probably wouldn’t have written CommandLineApp, but
since all I had to work with at the time was getopt, and it operated at
such a low level, I hacked together a helper class.

The difference between CommandLineApp and optparse is that
CommandLineApp treats your application as an object, just like
everything else in the application. The application class is responsible
for option processing, although it collaborates with getopt to do the
parsing work.

To use it, you subclass CommandLineApp and define option handler
methods and a main(). To invoke the program, call run(). The option
handlers are identified by name, and used to build the list of supported
options. optionHandler_myopt() is called when --myopt is encountered.
If the method takes an argument, so does your option. The docstrings for
the callback methods are used to create the help output. Support for
short-form usage (via -h) and long-form help (via --help) is built into
the base class.
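
Based on that description, a subclass might look something like this sketch. The import path, the main() signature, and how run() hands over the non-option arguments are assumptions on my part, so treat it as an outline rather than working code:

from CommandLineApp import CommandLineApp  # import path assumed

class Greeter(CommandLineApp):
    "Print a greeting for each name given on the command line."

    greeting = 'Hello'

    def optionHandler_greeting(self, text):
        "Use a different greeting instead of the default."
        self.greeting = text

    def main(self, *names):
        # Signature assumed: non-option arguments passed positionally.
        for name in names:
            print '%s, %s!' % (self.greeting, name)

if __name__ == '__main__':
    Greeter().run()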

The old version (released as 1.0), which had not received a
substantial rewrite in many years (mostly because it still worked fine
and I had more important projects to work on), can run under Python 1.4
through 2.5. It was some of the earliest complex Python code I ever
wrote, and that is clear from the code quality (both style and
substance). The new version has been tested under Python 2.5. It feels
less hack-ish, since it uses inspect instead of scanning the class
hierarchy and method signatures directly.

The 2.0 rewrite works in essentially the same way as 1.0, but is much
more compact and (I think) the code is cleaner. I called it 2.0 because
the class API is different in a few important ways from the earlier
version. I still want to add argument validation (for non-option
arguments to the program), but that will take a little more time.

ORM envy

Jonathan LaCour presented an overview of SQLAlchemy at last night’s
PyATL meeting, and now I have ORM envy. It’s too bad I can’t afford the
effort that would be involved in replacing the in-house ORM we use at
work, but I’ll definitely consider using it for my own projects.

PyMOTW: shelve

The shelve module can be used as a simple persistent storage option
for Python objects when a relational database is overkill. The shelf is
accessed by keys, just as with a dictionary. The values are pickled and
written to a database created and managed by anydbm.
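
A minimal sketch of the dictionary-style access (the filename and key are arbitrary):

import shelve

# Open (and create, if needed) the shelf file.
db = shelve.open('example.shelve')
try:
    # Any picklable object can be stored under a string key.
    db['key'] = {'title': 'PyMOTW', 'entries': [1, 2, 3]}
    print db['key']
finally:
    db.close()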

Read more at pymotw.com: shelve

New project: feedcache

Back in mid-June I promised Jeremy Jones that I would clean up some
of the code I use for CastSampler.com to cache RSS and Atom feeds so
he could look at it for his podgrabber project. I finally found some
time to work on it this weekend (not quite 2 months later, sorry
Jeremy).

The result is feedcache, which I have released in “alpha” status,
for now. I don’t usually bother releasing my code in alpha state,
because that usually means I’m not actually using it anywhere with
enough regularity to ensure that it is robust. I am going ahead and
releasing feedcache early because I am hoping for some feedback on the
API. I realized that the way I cache feeds for CastSampler.com is not
the way all applications will want to cache them, so the design might be
biased.

The Design

There are two aspects to caching the feed data: high level code that
knows it is working with RSS or Atom feeds, and low level code that
saves the data with a timestamp. The high level Cache
class is responsible for fetching, updating, and expiring feed content.
The low level storage classes are responsible for saving and restoring
feed content.

Since the storage handling is separated from the cache management, it
is possible to adapt the Cache to whatever sort of storage option might
work best for you. So far, I have implemented two backend storage
options. MemoryStorage keeps everything in memory, and is mostly useful
for testing. The ShelveStorage option uses the shelve module to store all
of the feed data in one file using pickles. I hope that the API for the
backend storage manager is simple enough to make it easy for you to tie
in your own backend if neither of these options is appealing. Something
that uses memcached would be very interesting, for example.

The Cache class uses a fairly simple algorithm to decide if it needs
to update the stored data:

  1. If there is nothing stored for the URL, fetch the data.
  2. If there is something stored for the URL and its time-to-live has not
    passed, use that data. (This throttles repeated requests for the same
    feed content.)
  3. If the stored data has expired, use any available ETag and
    modification time header data to perform a conditional GET of the
    data. If new data is returned, update the stored data. If no new data
    is returned, update the time-to-live for the stored data and return
    what is stored.
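
In code, the decision looks roughly like this sketch. The storage calls, the function name, and the default time-to-live are placeholders I made up to show the flow; only the feedparser arguments (etag and modified) reflect a real API:

import time
import feedparser

def fetch(storage, url, time_to_live=300):
    # Hypothetical storage lookup: returns (timestamp, parsed_data) or None.
    record = storage.get(url)

    # 1. Nothing stored yet, so fetch the feed.
    if record is None:
        parsed = feedparser.parse(url)
        storage.set(url, (time.time(), parsed))
        return parsed

    stored_time, parsed = record

    # 2. The stored copy is still fresh, so use it without touching the server.
    if time.time() - stored_time < time_to_live:
        return parsed

    # 3. Expired, so issue a conditional GET using the saved ETag and
    #    modification time.
    response = feedparser.parse(url,
                                etag=parsed.get('etag'),
                                modified=parsed.get('modified'))
    if getattr(response, 'status', None) == 304:
        # Not modified: just refresh the timestamp on the stored copy.
        storage.set(url, (time.time(), parsed))
        return parsed

    # New data came back, so replace the stored copy.
    storage.set(url, (time.time(), response))
    return response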

The feed data is retrieved and parsed by Mark Pilgrim’s feedparser
module, so the Cache really does just manage the contents of the
backend storage.

Another benefit of separating the cache manager from the storage
handler is that only the storage handler needs to be thread-safe. The storage
handler is given to each Cache as an argument to the constructor. In a
multi-threaded app, each thread can have its own Cache (which does the
fetching, when needed) and share a single backend storage handler.
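
For example, a multi-threaded client might be structured like this sketch, with one Cache per thread wrapped around a shared storage handler. The import path is an assumption (it follows the class names used in the example below), and whether a particular backend is safe to share this way is up to the storage implementation:

import threading
from feedcache import cache, shelvestorage  # import path assumed

def fetch_titles(storage, url):
    # Each worker builds its own Cache around the shared storage handler.
    fc = cache.Cache(storage)
    parsed_data = fc[url]
    print parsed_data.feed.title

storage = shelvestorage.ShelveStorage('.feedcache')
storage.open()
try:
    threads = [threading.Thread(target=fetch_titles, args=(storage, url))
               for url in ['http://feeds.example.com/one',
                           'http://feeds.example.com/two']]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
finally:
    storage.close()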

Example

Here is a simple example program that uses a shelve file for storage.
The example does not use multiple threads, but should still illustrate
how to use the cache.

import sys

# The import path is assumed from the class names used below.
from feedcache import cache
from feedcache import shelvestorage


def main(urls=[]):
    print 'Saving feed data to ./.feedcache'
    storage = shelvestorage.ShelveStorage('.feedcache')
    storage.open()
    try:
        fc = cache.Cache(storage)
        for url in urls:
            parsed_data = fc[url]
            print parsed_data.feed.title
            for entry in parsed_data.entries:
                print '\t', entry.title
    finally:
        storage.close()
    return


if __name__ == '__main__':
    main(sys.argv[1:])

Additional Work

This project is still a work in progress, but I would appreciate any
feedback you have, good or bad. And of course, report bugs if you find
them!

PyMOTW: glob

Even though the glob API is very simple, the module packs a lot of
power. It is useful in any situation where your program needs to look
for a list of files on the filesystem with names matching a pattern. If
you need a list of filenames that all have a certain extension, prefix,
or any common string in the middle, use glob instead of writing code to
scan the directory contents yourself.
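
A quick sketch (the patterns refer to files that may or may not exist on your system):

import glob

# All of the Python source files in the current directory.
print glob.glob('*.py')

# Files with a common prefix, anywhere in a subdirectory.
print glob.glob('./subdir/data_*.txt')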

Read more at pymotw.com: glob

AstronomyPictureOfTheDay 2.0

There is a new release of AstronomyPictureOfTheDay available this
morning. Version 2.0 is a substantial rewrite of the 1.1 version, but
retains essentially the same functionality. The primary difference is
that you no longer have to run a script to “personalize” it during
installation. Now that it is distributed as an Application instead of an
Automator workflow, you can just drag it to your Applications folder.

The source is still created with Automator, but some of the Finder
actions for working with directories have been replaced by Shell Script
actions to perform the same operations. The Finder actions are
hard-coded to specific folder names, selected in the Automator UI when
the action is configured. With a Shell Script action, I can use
environment variables such as “$HOME” to make the action more flexible.
Using variables also avoids the problem of having everything in the
workflow tied to my own home directory, which meant the paths had to be
modified before the workflow was usable by anyone else.

I could, of course, have written the entire program as a shell script,
Python program, or whatever. But the point of building it with
Automator in the first place was that it was easy. I suppose I am
stretching the boundaries of where Automator is the easiest tool for
this particular job, but at least now the hack is hidden inside the
app instead of hanging out where everyone can see it in the
installation instructions.