New project: feedcache

Back in mid-June I promised Jeremy Jones that I would clean up some
of the code I use to cache RSS and Atom feeds so
he could look at it for his podgrabber project. I finally found some
time to work on it this weekend (not quite two months later, sorry!).

The result is feedcache, which I have released in “alpha” status,
for now. I don’t normally bother releasing code in alpha state,
because that usually means I’m not actually using it anywhere with
enough regularity to ensure that it is robust. I am going ahead and
releasing feedcache early because I am hoping for some feedback on the
API. I realized that the way I cache feeds in my own application is not
the way all applications will want to cache them, so the design may need to change.

The Design

There are two aspects to caching the feed data: high-level code that
knows it is working with RSS or Atom feeds, and low-level code that
saves the data with a timestamp. The high-level Cache class is
responsible for fetching, updating, and expiring feed content. The
low-level storage classes are responsible for saving and restoring
feed content.

Since the storage handling is separated from the cache management, it
is possible to adapt the Cache to whatever sort of storage option might
work best for you. So far, I have implemented two backend storage
options. MemoryStorage keeps everything in memory, and is mostly useful
for testing. The ShelveStorage option uses the shelve module to store all
of the feed data in one file using pickles. I hope that the API for the
backend storage manager is simple enough to make it easy for you to tie
in your own backend if neither of these options is appealing. Something
that uses memcached would be very interesting, for example.
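To make the idea of a pluggable backend more concrete, here is a minimal sketch of what such a storage class could look like. The class names, the `get()`/`set()` methods, and the `(timestamp, data)` layout are assumptions for illustration, not the actual feedcache storage API. The lock is what lets one instance be shared safely between threads.

```python
import shelve
import threading
import time


class LockedMemoryStorage:
    """Hypothetical in-memory backend mapping URL -> (timestamp, data)."""

    def __init__(self):
        self._data = {}
        self._lock = threading.Lock()

    def get(self, url):
        # Return the (timestamp, data) pair for url, or None if nothing is stored.
        with self._lock:
            return self._data.get(url)

    def set(self, url, data):
        # Save the data together with the time it was stored.
        with self._lock:
            self._data[url] = (time.time(), data)


class LockedShelveStorage(LockedMemoryStorage):
    """Same hypothetical interface, but entries are pickled into a shelve file."""

    def __init__(self, filename):
        self._data = shelve.open(filename)  # a Shelf behaves like a dict
        self._lock = threading.Lock()

    def close(self):
        with self._lock:
            self._data.close()
```

A memcached-backed class would only need to provide the same two methods.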

The Cache class uses a fairly simple algorithm to decide if it needs
to update the stored data:

  1. If there is nothing stored for the URL, fetch the data.
  2. If there is something stored for the URL and its time-to-live has not
    passed, use that data. (This throttles repeated requests for the same
    feed content.)
  3. If the stored data has expired, use any available ETag and
    modification time header data to perform a conditional GET of the
    data. If new data is returned, update the stored data. If no new data
    is returned, update the time-to-live for the stored data and return
    what is stored.
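The three steps above can be sketched roughly as follows. This hypothetical `SketchCache` is not the feedcache implementation. It assumes a dict-like storage mapping each URL to a `(timestamp, parsed)` pair, and a `fetch(url, etag, modified)` callable whose result has `status`, `etag`, and `modified` attributes, which is the shape of the object returned by `feedparser.parse(url, etag=..., modified=...)`.

```python
import time


class SketchCache:
    """Hypothetical cache manager following the three-step algorithm."""

    def __init__(self, storage, fetch, ttl=300):
        self.storage = storage  # assumed: dict-like, url -> (timestamp, parsed)
        self.fetch = fetch      # assumed: callable(url, etag, modified)
        self.ttl = ttl          # time-to-live in seconds

    def __getitem__(self, url):
        now = time.time()
        cached = self.storage.get(url)
        # 1. Nothing stored for this URL: fetch unconditionally.
        if cached is None:
            parsed = self.fetch(url, None, None)
            self.storage[url] = (now, parsed)
            return parsed
        saved_at, parsed = cached
        # 2. Time-to-live has not passed: reuse the stored data.
        if now - saved_at < self.ttl:
            return parsed
        # 3. Expired: conditional GET with the stored ETag / modification time.
        fresh = self.fetch(url,
                           getattr(parsed, 'etag', None),
                           getattr(parsed, 'modified', None))
        if getattr(fresh, 'status', None) == 304:
            # Not modified: keep the old data, but restart its time-to-live.
            self.storage[url] = (now, parsed)
            return parsed
        # New data came back: replace what was stored.
        self.storage[url] = (now, fresh)
        return fresh
```

Passing the fetch function in, rather than calling feedparser directly, keeps the sketch testable without network access. In real use it might be `lambda url, etag, modified: feedparser.parse(url, etag=etag, modified=modified)`.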

The feed data is retrieved and parsed by Mark Pilgrim’s feedparser,
so the Cache really does just manage the contents of the backend
storage.

Another benefit of separating the cache manager from the storage
handler is that only the storage handler needs to be thread-safe. The storage
handler is given to each Cache as an argument to the constructor. In a
multi-threaded app, each thread can have its own Cache (which does the
fetching, when needed) and share a single backend storage handler.
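Here is a rough sketch of that arrangement. The `SharedStorage` and `PerThreadCache` classes and the `fetch_all()` helper are hypothetical stand-ins, not feedcache classes. Only the storage takes a lock, and each worker thread builds its own lightweight cache object around the shared instance.

```python
import threading
import time


class SharedStorage:
    """Hypothetical thread-safe backend shared by every thread."""

    def __init__(self):
        self._lock = threading.Lock()
        self._data = {}

    def get(self, url):
        with self._lock:
            return self._data.get(url)

    def set(self, url, value):
        with self._lock:
            self._data[url] = (time.time(), value)


class PerThreadCache:
    """Hypothetical minimal cache with no locking of its own."""

    def __init__(self, storage, fetch):
        self.storage = storage
        self.fetch = fetch

    def __getitem__(self, url):
        cached = self.storage.get(url)
        if cached is not None:
            return cached[1]
        value = self.fetch(url)  # fetching happens in the calling thread
        self.storage.set(url, value)
        return value


def fetch_all(urls, fetch, num_threads=4):
    storage = SharedStorage()  # one backend for all threads
    results = [None] * num_threads

    def worker(i):
        fc = PerThreadCache(storage, fetch)  # one cache per thread
        results[i] = [fc[url] for url in urls]

    threads = [threading.Thread(target=worker, args=(i,))
               for i in range(num_threads)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return storage, results
```

Two threads can still miss on the same URL at the same moment and both fetch it. The lock keeps the shared data consistent, but it does not prevent that duplicated work.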


Here is a simple example program that uses a shelve file for storage.
The example does not use multiple threads, but should still illustrate
how to use the cache.

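from __future__ import print_function
import sys

from feedcache import cache, shelvestorage  # assumed package layout

def main(urls=()):
    print('Saving feed data to ./.feedcache')
    storage = shelvestorage.ShelveStorage('.feedcache')
    fc = cache.Cache(storage)
    for url in urls:
        parsed_data = fc[url]  # feedparser result, fetched only if needed
        print(parsed_data.feed.title)
        for entry in parsed_data.entries:
            print('\t', entry.title)

if __name__ == '__main__':
    main(sys.argv[1:])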

Additional Work

This project is still a work in progress, but I would appreciate any
feedback you have, good or bad. And of course, report bugs if you find
any.
PyMOTW: glob

Even though the glob API is very simple, the module packs a lot of
power. It is useful in any situation where your program needs to look
for a list of files on the filesystem with names matching a pattern. If
you need a list of filenames that all have a certain extension, prefix,
or any common string in the middle, use glob instead of writing code to
scan the directory contents yourself.
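As a small illustration, the helper below uses `glob.glob()` to match patterns such as `*.txt` (extension), `file*` (prefix), `*2007*` (a common string in the middle), `file?.txt` (any single character) and `file[0-9].txt` (a character range). The `find_files()` name is hypothetical and not part of the glob module.

```python
import glob
import os


def find_files(directory, pattern):
    """Return the sorted base names in directory that match a shell-style pattern."""
    # Escape the directory so only `pattern` is treated as wildcards,
    # then let glob do the scanning and matching.
    matches = glob.glob(os.path.join(glob.escape(directory), pattern))
    return sorted(os.path.basename(path) for path in matches)
```

For example, `find_files('.', '*.txt')` lists every `.txt` file in the current directory.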

Read more at glob

AstronomyPictureOfTheDay 2.0

There is a new release of AstronomyPictureOfTheDay available this
morning. Version 2.0 is a substantial rewrite of the 1.1 version, but
retains essentially the same functionality. The primary difference is
that you no longer have to run a script to “personalize” it during
installation. Now that it is distributed as an Application instead of an
Automator workflow, you can just drag it to your Applications folder.

The source is still created with Automator, but some of the Finder
actions for working with directories have been replaced by Shell Script
actions to perform the same operations. The Finder actions are
hard-coded to specific folder names, selected in the Automator UI when
the action is configured. With a Shell Script action, I can use
environment variables such as “$HOME” to make the action more flexible.
Using variables also avoids the problem of having everything in the
workflow tied to my own home directory, which meant the paths had to be
modified before anyone else could use the workflow.

I could, of course, have written the entire program as a shell script,
Python program, or whatever. But the point of building it with
Automator in the first place was that it was easy. I suppose I am
stretching the boundaries of where Automator is the easiest tool for
this particular job, but at least now the hack is hidden inside the
app instead of hanging out where everyone can see it in the
installation instructions.

Converting podcasts to regular tracks in iTunes

I have spent the better part of the morning trying to work out how to
convert podcasts to “regular” tracks in iTunes, so they would show up in
shuffle, etc. Mostly this was for my collection of Jonathan Coulton
“Thing a Week” episodes, but it would be useful for anything you
wanted to move out of your podcast list into the main audio portion of
the library. I suppose the reason it took so long to find the solution
is I started by searching for it on Google instead of just looking
through the iTunes menu options, though as you will see the solution
wasn’t immediately obvious even once I had found it.

I knew that iTunes would let me change settings like “Remember
playback position” and “Skip when shuffling” from the info dialog
(Cmd-I), so I started there. The only setting that even mentioned
podcast was the genre, and I already knew that was not what I needed to
change. Some of the tracks already had “real” genres and only some were
set to “Podcast”.

I also knew there is a separate podcast flag available for queries in
Smart Lists and AppleScript, since I used it to create my “Active Queue”
podcast playlist with selections of episodes from various podcast series
(it’s like creating your own mix tape, but for talk radio). I tried to
write a simple AppleScript to change the podcast flag of selected tracks
to true. It turns out the flag is a read-only attribute. No amount of
searching uncovered any way to change the setting using AppleScript.

The first useful-looking suggestions I ran into were to convert the
ID3 tag format to an older version, then convert it back. Doing that
erased most of the comments and other meta-data associated with the
tracks, though, so I didn’t like the results.

Next I found a few forum and blog posts that talked about an
ITUNESPODCAST setting in the extended ID3 tags. They all mentioned a
Windows program for removing or changing the flag, though. I examined a
few of the files with Ned Batchelder’s Python module id3reader, but
didn’t see anything that looked like “ITUNESPODCAST” in the output.

Going back to Google, I finally found a reference to converting the
files to AAC using an option in the Advanced menu. That seemed like
overkill, but at this point I was becoming fed up and just wanted to be
done with the whole thing. I could always re-encode as MP3, after all.
Well, iTunes didn’t have a menu option to “Encode as AAC”. It did have
“Convert selection to MP3”, which didn’t make much sense to me. As far
as I knew, the tracks were already MP3 files. But lo and behold,
selecting that menu option did enable them in the iTunes Music Library.
It made copies of all of the tracks as it converted them, so I could
even delete the podcast subscription.

So, if you want to add podcast episodes you have already downloaded to
your music library and turn off the podcast flag, select the track and
choose Advanced->Convert selection to MP3.

Unexpectedly broken, and fixed: svnbackup

Yesterday Pierre Lemay sent me one of the clearest bug reports I’ve
seen in quite a while, and a patch to fix the problem. He was having
trouble with svnbackup duplicating changesets in the dump files. It
turns out every changeset that appeared on a “boundary” (at the end of
one dump file and the beginning of the next) was included in both dumps.

When I tested the script, I was able to recover the repository without
any trouble. I didn’t check two cases that Pierre encountered. First,
the changeset revision numbers did not stay consistent: each duplicated
changeset shifted the ids of all subsequent changesets by one. That in
itself is only annoying. The more troubling problem is when a duplicate
changeset includes a delete operation. The second delete would fail
while restoring, which prevented Pierre from importing the rest of the
dump.
In his email describing all of that, he gave me great details about
how he had tested, the specific scenario that caused the problem, and
then provided the fix!

So, if you are using version 1.0, go on over and download version
1.1 with Pierre’s fixes.

Unexpectedly popular: svnbackup

PyMOTW: atexit

The atexit module provides a simple interface to register
functions to be called when a program closes down normally. The
sys module also provides a hook, sys.exitfunc, but only one
function can be registered there. The atexit registry can be used
by multiple modules and libraries simultaneously.
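Exit handlers only run when an interpreter shuts down, so the sketch below starts a separate Python process with `subprocess` to observe them. The handler names are made up for illustration. The point is that two independent registrations both run, in last-in, first-out order.

```python
import subprocess
import sys
import textwrap

# A tiny stand-alone program in which two unrelated pieces of code
# each register their own cleanup handler.
CHILD = textwrap.dedent("""
    import atexit

    def close_database():
        print('closing database')

    def flush_log():
        print('flushing log')

    atexit.register(close_database)
    atexit.register(flush_log)
    print('main program done')
""")


def run_child():
    # Run the program in a fresh interpreter so its normal shutdown
    # triggers the registered handlers, and capture what it prints.
    result = subprocess.run([sys.executable, '-c', CHILD],
                            capture_output=True, text=True, check=True)
    return result.stdout.splitlines()


if __name__ == '__main__':
    for line in run_child():
        print(line)
```

The child prints `main program done`, then `flushing log`, then `closing database`.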

Read more at atexit

PyMOTW: subprocess

The subprocess module provides a consistent interface for creating and
working with additional processes. It offers a higher-level interface
than some of the other available modules, and is intended to replace
functions such as os.system, os.spawn*, os.popen*, popen2.* and
commands.*. To make it easier to compare subprocess with those other
modules, this week I will re-create earlier examples using the functions
being replaced.
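For a quick taste of the replacements, here are three hypothetical helper functions that each pair an older call with a subprocess equivalent. The child commands are Python one-liners run through `sys.executable`, so the sketch does not depend on any particular shell utilities. `text=True` assumes Python 3.7 or later.

```python
import subprocess
import sys

# Child program that echoes its stdin back in upper case.
ECHO_UPPER = [sys.executable, '-c',
              'import sys; sys.stdout.write(sys.stdin.read().upper())']


def exit_status(args):
    # In place of os.system(): run the command and return its exit code.
    return subprocess.call(args)


def capture(args):
    # In place of os.popen(cmd).read() or commands.getoutput(): collect stdout.
    return subprocess.check_output(args, text=True)


def pipe_through(args, data):
    # In place of popen2.*: write to the child's stdin and read its stdout.
    proc = subprocess.Popen(args, stdin=subprocess.PIPE,
                            stdout=subprocess.PIPE, text=True)
    out, _ = proc.communicate(data)
    return out
```

Passing arguments as a list avoids shell quoting problems, which is one of the main reasons to prefer subprocess over the string-based calls it replaces.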

Read more at subprocess