CommandLineApp

Back when Python 1.5.4 was hot and new, I wrote a class to serve as a
basis for the many command line programs I was working on for myself and
my employer. This was long before the Option Parsing Wars that resulted
in the addition of optparse to the standard library. If optparse had
been around, I probably wouldn’t have written CommandLineApp, but
since all I had to work with at the time was getopt, and it operated at
such a low level, I hacked together a helper class.

The main difference between CommandLineApp and optparse is that
CommandLineApp treats your application as an object, just like
everything else in the program. The application class is responsible
for option processing, although it collaborates with getopt to do the
parsing work.

To use it, you subclass CommandLineApp and define option handler
methods and a main(). To invoke the program, call run(). The option
handlers are identified by name and used to build the list of supported
options: optionHandler_myopt() is called when --myopt is encountered.
If the method takes an argument, so does your option. The docstrings for
the callback methods are used to create the help output. Support for
short-form usage (via -h) and long-form help (via --help) is built into
the base class.
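To make the naming convention concrete, here is a minimal sketch of how a base class could discover optionHandler_ methods with inspect and hand them to getopt. MiniCommandLineApp, help_text() and the Greeter subclass are hypothetical names invented for illustration; this is not CommandLineApp's real API. It uses the Python 3 inspect.signature() call and only handles long options (no -h short form).

```python
import getopt
import inspect
import sys


class MiniCommandLineApp(object):
    """Hypothetical base class illustrating the optionHandler_ convention."""

    PREFIX = 'optionHandler_'

    def _handlers(self):
        # Map option name -> (bound method, whether it takes an argument).
        handlers = {}
        for name, method in inspect.getmembers(self, inspect.ismethod):
            if name.startswith(self.PREFIX):
                # Bound methods do not report 'self' as a parameter.
                takes_arg = len(inspect.signature(method).parameters) > 0
                handlers[name[len(self.PREFIX):]] = (method, takes_arg)
        return handlers

    def help_text(self):
        # Build the help output from the handler docstrings.
        lines = []
        for opt, (method, takes_arg) in sorted(self._handlers().items()):
            usage = '--%s=ARG' % opt if takes_arg else '--%s' % opt
            lines.append('%s  %s' % (usage, (method.__doc__ or '').strip()))
        return '\n'.join(lines)

    def run(self, argv=None):
        if argv is None:
            argv = sys.argv[1:]
        handlers = self._handlers()
        # getopt expects 'name=' for long options that take a value.
        long_opts = [opt + '=' if takes_arg else opt
                     for opt, (_, takes_arg) in handlers.items()]
        opts, args = getopt.getopt(argv, '', long_opts)
        for flag, value in opts:
            method, takes_arg = handlers[flag[2:]]  # strip leading '--'
            if takes_arg:
                method(value)
            else:
                method()
        return self.main(*args)

    def main(self, *args):
        raise NotImplementedError('subclasses must define main()')


class Greeter(MiniCommandLineApp):
    """Example subclass: greets someone, optionally loudly."""

    def __init__(self):
        self.loud = False
        self.name = 'world'

    def optionHandler_loud(self):
        """Shout the greeting."""
        self.loud = True

    def optionHandler_name(self, value):
        """Name to greet."""
        self.name = value

    def main(self, *args):
        greeting = 'hello, %s' % self.name
        return greeting.upper() if self.loud else greeting
```

With this sketch, Greeter().run(['--name=Doug', '--loud']) returns 'HELLO, DOUG': --name takes a value because its handler has a parameter, while --loud is a plain flag because its handler has none.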

The old version (released as 1.0), which had not received a
substantial rewrite in many years (mostly because it still worked fine
and I had more important projects to work on), can run under Python 1.4
through 2.5. It was some of the earliest complex Python code I ever
wrote, and that is clear from the code quality (both style and
substance). The new version has been tested under Python 2.5. It feels
less hack-ish, since it uses inspect instead of scanning the class
hierarchy and method signatures directly.

The 2.0 rewrite works in essentially the same way as 1.0, but is much
more compact and (I think) the code is cleaner. I called it 2.0 because
the class API is different in a few important ways from the earlier
version. I still want to add argument validation (for non-option
arguments to the program), but that will take a little more time.

ORM envy

Jonathan LaCour presented an overview of SQLAlchemy at last night’s
PyATL meeting, and now I have ORM envy. It’s too bad I can’t afford the
effort that would be involved in replacing the in-house ORM we use at
work, but I’ll definitely consider using it for my own projects.

PyMOTW: shelve

The shelve module can be used as a simple persistent storage option
for Python objects when a relational database is overkill. The shelf is
accessed by keys, just as with a dictionary. The values are pickled and
written to a database created and managed by anydbm.
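A minimal round trip looks like the sketch below. It is written for Python 3, where anydbm became the dbm module and shelve.open() can be used as a context manager (3.4 and later); the file name and stored values are made up for the example.

```python
import os
import shelve
import shutil
import tempfile

# A throwaway location; the underlying dbm module may add its own extensions.
tmpdir = tempfile.mkdtemp()
filename = os.path.join(tmpdir, 'example.shelf')

# Write: keys must be strings, values can be any picklable object.
with shelve.open(filename) as shelf:
    shelf['key1'] = {'int': 10, 'float': 9.5, 'string': 'Sample data'}

# Read it back, as a later run of the program might.
with shelve.open(filename) as shelf:
    restored = shelf['key1']

shutil.rmtree(tmpdir)
```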

Read more at pymotw.com: shelve

New project: feedcache

Back in mid-June I promised Jeremy Jones that I would clean up some
of the code I use for CastSampler.com to cache RSS and Atom feeds so
he could look at it for his podgrabber project. I finally found some
time to work on it this weekend (not quite 2 months later, sorry
Jeremy).

The result is feedcache, which I have released in “alpha” status
for now. I don’t normally bother releasing code in alpha state,
because that usually means I’m not using it anywhere often enough to
ensure that it is robust. I am releasing feedcache early because I am
hoping for some feedback on the API. The way I cache feeds for
CastSampler.com is not the way every application will want to cache
them, so the design may be biased toward my own use case.

The Design

There are two aspects to caching the feed data: high-level code that
knows it is working with RSS or Atom feeds, and low-level code that
saves the data with a timestamp. The high-level Cache class is
responsible for fetching, updating, and expiring feed content. The
low-level storage classes are responsible for saving and restoring
feed content.

Since the storage handling is separated from the cache management, it
is possible to adapt the Cache to whatever sort of storage option might
work best for you. So far, I have implemented two backend storage
options. MemoryStorage keeps everything in memory, and is mostly useful
for testing. ShelveStorage uses the shelve module to store all
of the feed data in one file using pickles. I hope that the API for the
backend storage manager is simple enough to make it easy for you to tie
in your own backend if neither of these options is appealing. Something
that uses memcached would be very interesting, for example.
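To show how small such a backend can be, here is a hypothetical in-memory storage class. The open/close/get/set method names are assumptions chosen for illustration, not feedcache's actual storage API; the point is only that a backend needs to save and restore data keyed by URL along with a timestamp.

```python
import time


class DictStorage(object):
    """Hypothetical minimal backend mapping url -> (timestamp, data).

    Illustrates the idea only; this is not feedcache's storage API.
    """

    def __init__(self):
        self._data = None

    def open(self):
        self._data = {}

    def close(self):
        self._data = None

    def get(self, url):
        # Returns (timestamp, data), or None if nothing is stored.
        return self._data.get(url)

    def set(self, url, data, timestamp=None):
        # Record when the data was saved so the cache can expire it later.
        if timestamp is None:
            timestamp = time.time()
        self._data[url] = (timestamp, data)
```

A memcached or database-backed class would implement the same handful of methods, and the cache logic would not need to change.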

The Cache class uses a fairly simple algorithm to decide if it needs
to update the stored data:

  1. If there is nothing stored for the URL, fetch the data.
  2. If there is something stored for the URL and its time-to-live has not
    passed, use that data. (This throttles repeated requests for the same
    feed content.)
  3. If the stored data has expired, use any available ETag and
    modification time header data to perform a conditional GET of the
    data. If new data is returned, update the stored data. If no new data
    is returned, update the time-to-live for the stored data and return
    what is stored.
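The three rules above can be sketched as follows. SketchCache is a hypothetical class, not feedcache's Cache, and the fetch callable is an assumed interface that returns a dict with a 'status' key (304 meaning "not modified"). With feedparser, fetch could wrap feedparser.parse(url, etag=etag, modified=modified) and inspect the status of the result; that wiring is left out so the sketch runs without a network.

```python
import time


class SketchCache(object):
    """Hypothetical illustration of the three update rules.

    storage: dict-like object mapping url -> (timestamp, response)
    fetch:   callable(url, etag, modified) returning a dict with a
             'status' key and, unless status is 304, 'etag',
             'modified' and 'data' keys (an assumed interface)
    """

    def __init__(self, storage, fetch, ttl=300, clock=time.time):
        self.storage = storage
        self.fetch = fetch
        self.ttl = ttl
        self.clock = clock

    def __getitem__(self, url):
        now = self.clock()
        cached = self.storage.get(url)

        # Rule 1: nothing stored yet, so fetch unconditionally.
        if cached is None:
            response = self.fetch(url, None, None)
            self.storage[url] = (now, response)
            return response['data']

        stored_at, response = cached

        # Rule 2: still within the time-to-live, so skip the network.
        if now - stored_at < self.ttl:
            return response['data']

        # Rule 3: expired, so do a conditional GET with the saved headers.
        new_response = self.fetch(url, response.get('etag'),
                                  response.get('modified'))
        if new_response['status'] == 304:
            # Not modified: keep the old data but restart its time-to-live.
            self.storage[url] = (now, response)
            return response['data']
        self.storage[url] = (now, new_response)
        return new_response['data']
```

Passing the clock in as a parameter is a design choice for the sketch: it lets the time-to-live behavior be exercised without actually waiting.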

The feed data is retrieved and parsed by Mark Pilgrim’s feedparser
module, so the Cache really just manages the contents of the backend
storage.

Another benefit of separating the cache manager from the storage
handler is that only the storage handler needs to be thread-safe. The storage
handler is given to each Cache as an argument to the constructor. In a
multi-threaded app, each thread can have its own Cache (which does the
fetching, when needed) and share a single backend storage handler.
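Here is a sketch of that arrangement: several threads, each with its own cache object, sharing one storage object guarded by a lock. LockedStorage, TinyCache and fake_fetch are hypothetical stand-ins (TinyCache only implements the "fetch if missing" rule), not feedcache classes.

```python
import threading
import time


class LockedStorage(object):
    """Hypothetical thread-safe backend: a dict guarded by one lock."""

    def __init__(self):
        self._data = {}
        self._lock = threading.Lock()

    def get(self, url):
        with self._lock:
            return self._data.get(url)

    def set(self, url, data):
        with self._lock:
            self._data[url] = (time.time(), data)


class TinyCache(object):
    """Hypothetical per-thread cache; only implements 'fetch if missing'."""

    def __init__(self, storage, fetch):
        self.storage = storage  # shared between threads
        self.fetch = fetch      # fetching happens in the calling thread

    def __getitem__(self, url):
        cached = self.storage.get(url)
        if cached is not None:
            return cached[1]
        data = self.fetch(url)
        self.storage.set(url, data)
        return data


def fake_fetch(url):
    # Stand-in for a real network fetch and parse.
    return 'parsed ' + url


shared = LockedStorage()
urls = ['http://example.com/feed%d' % i for i in range(5)]
results = {}
results_lock = threading.Lock()


def worker(name):
    cache = TinyCache(shared, fake_fetch)  # one cache per thread
    for url in urls:
        value = cache[url]
        with results_lock:
            results[(name, url)] = value


threads = [threading.Thread(target=worker, args=(n,)) for n in range(3)]
for t in threads:
    t.start()
for t in threads:
    t.join()
```

The lookup and the store are separate locked calls, so two threads may occasionally fetch the same URL at once; for a cache that only costs a redundant download, while the storage itself stays consistent.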

Example

Here is a simple example program that uses a shelve file for storage.
The example does not use multiple threads, but should still illustrate
how to use the cache.

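import sys

# Assumed import path; adjust to wherever the feedcache modules live.
from feedcache import cache, shelvestorage

def main(urls=()):
    print('Saving feed data to ./.feedcache')
    storage = shelvestorage.ShelveStorage('.feedcache')
    storage.open()
    try:
        fc = cache.Cache(storage)
        for url in urls:
            # Fetches the feed, or returns the stored copy if still fresh.
            parsed_data = fc[url]
            print(parsed_data.feed.title)
            for entry in parsed_data.entries:
                print('\t' + entry.title)
    finally:
        # Always close the shelf so the pickled data is written out.
        storage.close()

if __name__ == '__main__':
    main(sys.argv[1:])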

Additional Work

This project is still a work in progress, but I would appreciate any
feedback you have, good or bad. And of course, report bugs if you find
them!

PyMOTW: glob

Even though the glob API is very simple, the module packs a lot of
power. It is useful in any situation where your program needs to look
for a list of files on the filesystem with names matching a pattern. If
you need a list of filenames that all have a certain extension, prefix,
or any common string in the middle, use glob instead of writing code to
scan the directory contents yourself.
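The sketch below runs a few of those kinds of patterns against a scratch directory; the file names are made up for the example. glob.escape() (Python 3.4 and later) keeps the directory part of the path from being treated as a pattern, and the results are sorted because glob does not promise any particular order.

```python
import glob
import os
import shutil
import tempfile

# Build a scratch directory with a few sample files.
tmpdir = tempfile.mkdtemp()
for name in ['file1.txt', 'file2.txt', 'filea.txt', 'notes.md',
             'draft_notes.txt']:
    open(os.path.join(tmpdir, name), 'w').close()


def matches(pattern):
    # Escape the directory, append the pattern, and return sorted basenames.
    full_pattern = os.path.join(glob.escape(tmpdir), pattern)
    return sorted(os.path.basename(p) for p in glob.glob(full_pattern))


by_extension = matches('*.txt')        # common extension
by_prefix = matches('file*')           # common prefix
in_the_middle = matches('*notes*')     # common string in the middle
one_digit = matches('file[0-9].txt')   # single character from a range

shutil.rmtree(tmpdir)
```

Here '*.txt' picks up all four text files, 'file*' the three file-prefixed names, '*notes*' both notes files regardless of extension, and 'file[0-9].txt' only file1.txt and file2.txt.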

Read more at pymotw.com: glob

AstronomyPictureOfTheDay 2.0

There is a new release of AstronomyPictureOfTheDay available this
morning. Version 2.0 is a substantial rewrite of the 1.1 version, but
retains essentially the same functionality. The primary difference is
that you no longer have to run a script to “personalize” it during
installation. Now that it is distributed as an Application instead of an
Automator workflow, you can just drag it to your Applications folder.

The source is still created with Automator, but some of the Finder
actions for working with directories have been replaced by Shell Script
actions to perform the same operations. The Finder actions are
hard-coded to specific folder names, selected in the Automator UI when
the action is configured. With a Shell Script action, I can use
environment variables such as “$HOME” to make the action more flexible.
Using variables also avoids tying everything in the workflow to my own
home directory, which meant the paths had to be modified before anyone
else could use the workflow.

I could, of course, have written the entire program as a shell script,
Python program, or whatever. But the point of building it with
Automator in the first place was that it was easy. I suppose I am
stretching the boundaries of where Automator is the easiest tool for
this particular job, but at least now the hack is hidden inside the
app instead of hanging out where everyone can see it in the
installation instructions.

Converting podcasts to regular tracks in iTunes

I have spent the better part of the morning trying to work out how to
convert podcasts to “regular” tracks in iTunes, so they would show up in
shuffle, etc. Mostly this was for my collection of Jonathan Coulton
“Thing a Week” episodes, but it would be useful for anything you
wanted to move out of your podcast list into the main audio portion of
the library. I suppose the reason it took so long to find the solution
is that I started by searching for it on Google instead of just looking
through the iTunes menu options, though as you will see the solution
wasn’t immediately obvious even once I had found it.

I knew that iTunes would let me change settings like “Remember
playback position” and “Skip when shuffling” from the info dialog
(Cmd-I), so I started there. The only setting that even mentioned
podcast was the genre, and I already knew that was not what I needed to
change. Some of the tracks already had “real” genres and only some were
set to “Podcast”.

I also knew there is a separate podcast flag available for queries in
Smart Lists and AppleScript, since I used it to create my “Active Queue”
podcast playlist with selections of episodes from various podcast series
(it’s like creating your own mix tape, but for talk radio). I tried to
write a simple AppleScript to change the podcast flag of selected tracks
to true. It turns out the flag is a read-only attribute. No amount of
searching uncovered any way to change the setting using AppleScript.

The first useful-looking suggestion I ran into was to convert the
ID3 tag format to an older version, then convert it back. Doing that
erased most of the comments and other metadata associated with the
tracks, though, so I didn’t like the results.

Next I found a few forum and blog posts that talked about an
ITUNESPODCAST setting in the extended ID3 tags. They all mentioned a
Windows program for removing or changing the flag, though. I examined a
few of the files with Ned Batchelder’s python module id3reader, but
didn’t see anything that looked like “ITUNESPODCAST” in the output.

Going back to Google, I finally found a reference to converting the
files to AAC using an option in the Advanced menu. That seemed like
overkill, but at this point I was becoming fed up and just wanted to be
done with the whole thing. I could always re-encode as MP3, after all.
Well, iTunes didn’t have a menu option to “Encode as AAC”. It did have
“Convert selection to MP3”, which didn’t make much sense to me. As far
as I knew, the tracks were already MP3 files. But lo and behold,
selecting that menu option did enable them in the iTunes Music Library.
It made copies of all of the tracks as it converted them, so I could
even delete the podcast subscription.

So, if you want to add podcast episodes you have already downloaded to
your music library and turn off the podcast flag, select the track and
choose Advanced->Convert selection to MP3.

Unexpectedly broken, and fixed: svnbackup

Yesterday Pierre Lemay sent me one of the clearest bug reports I’ve
seen in quite a while, and a patch to fix the problem. He was having
trouble with svnbackup duplicating changesets in the dump files. It
turns out every changeset that appeared on a “boundary” (at the end of
one dump file and the beginning of the next) was included in both dumps.
Oops.

When I tested the script, I was able to recover the repository without
any trouble. I didn’t check two cases that Pierre encountered. First,
the changeset revision numbers did not stay consistent: each duplicated
changeset threw off all of the subsequent changeset ids by one. That in
itself is only annoying. The more troubling problem is when a duplicate
changeset
includes a delete operation. The second delete would fail while
restoring, which prevented Pierre from importing the rest of the
backup.
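One common way this kind of boundary overlap happens is an off-by-one when choosing revision ranges: if the next dump starts at the last revision already dumped instead of the one after it, that revision lands in both files. The helpers below are a hypothetical sketch of computing non-overlapping ranges, not svnbackup’s actual code; they only assume svnadmin’s documented dump options (-r LOWER:UPPER and --incremental).

```python
def dump_ranges(last_dumped, youngest, chunk_size):
    """Split revisions after last_dumped, up to youngest, into chunks.

    Each (start, end) pair is inclusive on both ends, like -r LOWER:UPPER.
    """
    ranges = []
    # Start one past the last revision already dumped, so the boundary
    # revision is never written to two dump files.
    start = last_dumped + 1
    while start <= youngest:
        end = min(start + chunk_size - 1, youngest)
        ranges.append((start, end))
        start = end + 1
    return ranges


def dump_command(repo_path, start, end):
    # --incremental makes the first revision in the range a diff against
    # the previous one, so each file holds only its own changes.
    return ['svnadmin', 'dump', repo_path, '-r', '%d:%d' % (start, end),
            '--incremental']


print(dump_ranges(0, 10, 4))  # [(1, 4), (5, 8), (9, 10)]
```

Because each range begins at end + 1 of the previous one, consecutive dump files share no revisions, and replaying them in order never applies the same delete twice.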

In his email describing all of that, he gave me great details about
how he had tested, the specific scenario that caused the problem, and
then provided the fix!

So, if you are using version 1.0, go on over and download version
1.1 with Pierre’s fixes.

Unexpectedly popular: svnbackup