Better blogger backups

I have enhanced the blog backup script I wrote a while back to
automatically find and include comments feeds, so comments are now
archived along with the original feed data. The way it recognizes
“comments” feeds may make the script work only with blogger.com,
since it depends on having “comments” in the URL. It does what I need
for now, though.
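
The check described above amounts to a substring test on each feed URL. Here is a minimal sketch of that idea, assuming the feed URLs have already been collected from the blog's pages. The function name and the example URLs are hypothetical and are not taken from the backup script itself.

```python
from urllib.parse import urlparse


def split_feeds(feed_urls):
    """Separate comments feeds from other feeds.

    A feed counts as a comments feed when 'comments' appears in the
    path of its URL, which is a blogger.com naming habit and may not
    hold for other hosts.
    """
    posts, comments = [], []
    for url in feed_urls:
        path = urlparse(url).path.lower()
        if "comments" in path:
            comments.append(url)
        else:
            posts.append(url)
    return posts, comments


# Hypothetical example URLs, for illustration only.
urls = [
    "http://example.blogspot.com/feeds/posts/default",
    "http://example.blogspot.com/feeds/comments/default",
]
posts, comments = split_feeds(urls)
```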

testing regular expressions

I discovered Christof Hoeke’s retest program today. This is a very
slick use of Python’s standard library HTTP server module to package an
AJAX app for interactively testing out regular expressions. I used to
have a Tkinter app that did something similar, but Christof’s is much
lighter weight.
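
The general shape of that approach can be sketched with the standard library alone. This is not Christof's code. The handler, its query parameters (`pattern` and `text`), the JSON response format, and the port number are all assumptions made for illustration.

```python
import json
import re
from http.server import BaseHTTPRequestHandler, HTTPServer
from urllib.parse import parse_qs, urlparse


def try_regex(pattern, text):
    """Return a JSON-friendly dict describing the first match, if any."""
    try:
        match = re.search(pattern, text)
    except re.error as err:
        return {"ok": False, "error": str(err)}
    if match is None:
        return {"ok": True, "matched": False}
    return {
        "ok": True,
        "matched": True,
        "span": list(match.span()),
        "groups": list(match.groups()),
    }


class RegexHandler(BaseHTTPRequestHandler):
    """Answer GET /?pattern=...&text=... with the match result as JSON."""

    def do_GET(self):
        query = parse_qs(urlparse(self.path).query)
        pattern = query.get("pattern", [""])[0]
        text = query.get("text", [""])[0]
        body = json.dumps(try_regex(pattern, text)).encode("utf-8")
        self.send_response(200)
        self.send_header("Content-Type", "application/json")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)


def serve(port=8087):
    """Run the server until interrupted (the port is arbitrary)."""
    HTTPServer(("localhost", port), RegexHandler).serve_forever()
```

A page served from the same process could call this URL from JavaScript each time the pattern or the sample text changes, which is all the "AJAX" part needs.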

Now I need to figure out how to package it to run as an app when I
double click on it in the Finder, instead of opening the .py file in an
editor.

Object-Relational Mappers

My friend Steve and I have spent some time discussing
object-relational mapping recently, partially initiated by his
comments on the ORM features in django.

For some reason I’ve never quite understood, there seems to be an
inherent fear of SQL in the web development community, and over the
years there have been many efforts to hide the SQL completely (or in
the case of Zope, encourage the use of a custom object database
instead of a relational database). Personally I’m wary of any form
of object relational mapping which works automatically. What I do
want is a nice abstraction layer (sometimes called the data access
object pattern), so that the code working with objects doesn’t know
that the objects are actually stored in a relational database.

I tend to agree. I’m confused by the intense need to create a new way
to express a relational database schema in Python, Ruby, or any other
language. The DDL is a perfectly legitimate way to express the schema
of the database. Why not just use it?

We use an ORM like that at work. The whole thing was written several
years ago before the SQLObject and SQLAlchemy ORMs were available, of
course, or we would be using one of them. The database connection layer
scans the database table definitions when it connects for the first
time. The base class for persistent objects uses that information to
discover the names and types of attributes for classes. We do it all at
runtime now, though we have discussed caching the information somehow
for production servers (maybe using a pickle or even writing out a
python module during the build by parsing the DDL itself). Scanning the
tables doesn’t take as long as you would think, though, so it hasn’t
become a hot-spot for performance tuning. Yet.
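
The table-scanning idea can be sketched with sqlite3 from the standard library. This is not the code from work, only a guess at the general shape. The `PersistentObject` class, the `load_schema` helper, and the `person` table are hypothetical names, and a different database would need a different catalog query than SQLite's `PRAGMA table_info`.

```python
import sqlite3


def load_schema(conn, table):
    """Return {column_name: declared_type} for a table, read from SQLite."""
    # PRAGMA table_info rows are (cid, name, type, notnull, dflt_value, pk).
    # The table name cannot be bound as a parameter, so pass trusted names only.
    rows = conn.execute(f"PRAGMA table_info({table})").fetchall()
    return {row[1]: row[2] for row in rows}


class PersistentObject:
    """Base class whose attributes are discovered from the table at runtime."""

    table = None      # set by each subclass
    _columns = None   # filled in on first use, once per class

    @classmethod
    def columns(cls, conn):
        # Look in the class's own __dict__ so subclasses do not share a cache.
        if cls.__dict__.get("_columns") is None:
            cls._columns = load_schema(conn, cls.table)
        return cls._columns

    def __init__(self, conn, **values):
        unknown = set(values) - set(self.columns(conn))
        if unknown:
            raise AttributeError(f"unknown columns: {sorted(unknown)}")
        for name in self.columns(conn):
            setattr(self, name, values.get(name))


class Person(PersistentObject):
    table = "person"


conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE person (id INTEGER PRIMARY KEY, name TEXT, age INTEGER)"
)
p = Person(conn, name="Alice", age=30)
```

The cached dictionary is the piece that could be pickled or written out as a module during the build instead of being rebuilt on every connection.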

Steve suggested a slightly different design. Use DDL to define the
schema, then convert the schema to base classes (one per table) with a
code generator. Then subclass from the auto-generated tables to add
“business logic”. I’m not sure how well that would work, but it sounds
like an interesting idea. If the generated code is going to support
querying for and returning related objects, how does it know to use the
subclass to create instances instead of the generated class?
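
One possible answer to that question is a registry that each hand-written subclass joins automatically. The sketch below is only an illustration of that idea, not something Steve proposed. Every class and method name in it is hypothetical, and it assumes at most one hand-written subclass per generated class.

```python
class Generated:
    """Hypothetical root for generator output; tracks hand-written subclasses."""

    _override = {}   # generated class -> subclass that should replace it

    def __init_subclass__(cls, **kwargs):
        super().__init_subclass__(**kwargs)
        parent = cls.__mro__[1]
        if parent is not Generated:
            # cls extends a generated table class, so prefer it from now on.
            Generated._override[parent] = cls

    @classmethod
    def concrete(cls):
        """Return the class to instantiate for this table."""
        return Generated._override.get(cls, cls)


# --- what a generator might emit from the DDL ---
class AuthorBase(Generated):
    def __init__(self, name):
        self.name = name


class BookBase(Generated):
    def __init__(self, title, author_name):
        self.title = title
        self.author_name = author_name

    def author(self):
        # Ask the registry which class to build instead of naming one.
        return AuthorBase.concrete()(self.author_name)


# --- hand-written business logic ---
class Author(AuthorBase):
    def display_name(self):
        return self.name.upper()


book = BookBase.concrete()("Example Title", "steve")
author = book.author()
```

Here `BookBase` has no subclass, so it is used directly, while the related object comes back as an `Author` because that subclass registered itself when it was defined.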

I do like the automatic handling of queries for related objects, and
the system used by django is particularly elegant. Two features I
especially like are:

  1. The QuerySet isn’t resolved and executed until you start indexing
    into it.
  2. Modifying a QuerySet by applying a filter actually creates a new
    QuerySet.

This means passing QuerySet instances around is inexpensive, and callers
do not have to worry about call-by-reference objects being modified
unexpectedly. I need to study SQLAlchemy again, to see how it handles
query result sets.
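
To make those two properties concrete, here is a toy class that behaves the same way. It is not Django's implementation and does not use Django at all. `LazyQuery`, `load_rows`, and the list of numbers standing in for a table are invented for the example.

```python
class LazyQuery:
    """Toy stand-in for a lazy query object whose filters return copies."""

    def __init__(self, source, predicates=()):
        self._source = source                 # callable that returns the rows
        self._predicates = tuple(predicates)
        self._cache = None                    # results, once they are fetched

    def filter(self, predicate):
        # Return a new object; the receiver is left untouched.
        return LazyQuery(self._source, self._predicates + (predicate,))

    def _fetch(self):
        if self._cache is None:
            rows = self._source()
            self._cache = [
                row for row in rows
                if all(pred(row) for pred in self._predicates)
            ]
        return self._cache

    def __getitem__(self, index):
        return self._fetch()[index]

    def __iter__(self):
        return iter(self._fetch())

    def __len__(self):
        return len(self._fetch())


calls = []


def load_rows():
    calls.append("query")        # record each trip to the "database"
    return [1, 2, 3, 4, 5, 6]


everything = LazyQuery(load_rows)
evens = everything.filter(lambda n: n % 2 == 0)   # no rows loaded yet
calls_before = list(calls)
first_even = evens[0]                             # indexing triggers the load
```

Building `evens` costs nothing until it is indexed, and `everything` still has no filters attached afterwards, so a caller holding it sees no change.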

Python Cheese Shop

It has been a while since I released a new open source project. The
last time I dealt with the Python project registry it required a highly
manual through-the-web registration process. The Cheese Shop is so
much nicer, and the integration with distutils makes it so easy to
register a project and release that there is no reason in the world not
to do it. There are just a few basic steps to getting started:

  1. Create a user at http://cheeseshop.python.org/pypi by clicking on the
    “Register” link and following the instructions.
  2. Create a setup.py file for your Python project. You’re doing this
    already, aren’t you, so your users can install your app or library
    with distutils?
  3. Type: python setup.py register

The CheeseShopTutorial has more details, but once you’ve registered
it really is just that simple. It turns out they will even host
downloads of the source releases, if you want. I don’t mind hosting my
own releases, and they will only host Python (so none of my AppleScript
projects could go there). But that’s a nice commitment on their part.

Featuritis

My project site is finally online, and I find myself falling into
precisely the trap I was hoping to avoid. I originally wanted to find
some existing software to host the site, so I could concentrate on the
myriad projects cluttering up the back of my brain. Since I opted to
build my own, I’ve found myself focusing on building more features into
the site management tool instead of those other projects.

In any event, today I added Atom feeds to track releases for each
project, as well as a global feed to track all releases from the site.
The feeds include download links to each released software bundle as
enclosures, because it was easy, not because it seems especially useful.
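
In Atom, an enclosure is just a link element with rel="enclosure". The sketch below builds one release entry with the standard library's ElementTree. It is not the code behind the site. The function name, the URLs, the file size, and the MIME type are all made up for illustration.

```python
import xml.etree.ElementTree as ET

ATOM_NS = "http://www.w3.org/2005/Atom"


def release_entry(title, page_url, download_url, updated, size_bytes,
                  mime_type="application/x-gzip"):
    """Build an Atom <entry> with the download attached as an enclosure."""
    entry = ET.Element(f"{{{ATOM_NS}}}entry")
    ET.SubElement(entry, f"{{{ATOM_NS}}}title").text = title
    ET.SubElement(entry, f"{{{ATOM_NS}}}id").text = page_url
    ET.SubElement(entry, f"{{{ATOM_NS}}}updated").text = updated
    # Link to the human-readable release page.
    ET.SubElement(entry, f"{{{ATOM_NS}}}link", rel="alternate", href=page_url)
    # Link to the downloadable bundle, marked as an enclosure.
    ET.SubElement(
        entry, f"{{{ATOM_NS}}}link",
        rel="enclosure", href=download_url,
        type=mime_type, length=str(size_bytes),
    )
    return entry


# Serialize with Atom as the default namespace instead of an ns0: prefix.
ET.register_namespace("", ATOM_NS)
entry = release_entry(
    "Example 1.0",
    "http://example.com/projects/example/1.0/",
    "http://example.com/downloads/example-1.0.tar.gz",
    "2007-01-01T00:00:00Z",
    12345,
)
xml_text = ET.tostring(entry, encoding="unicode")
```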

code hosting

I spent some time over the weekend building a rough tool with django to
host my code projects. It is online at http://www.doughellmann.com,
though that domain may not be available in your DNS cache yet. I’m
happy with the schema for the results, but will probably tweak the
colors and layout for a while.

Proctor 1.0

I’ve moved Proctor development from sourceforge to my own server
and released version 1.0.

We have been using proctor successfully for several years now at work,
and it makes automating our nightly tests very easy. The build is
automatic, the software is installed automatically, and then proctor
runs the test suite. All 3000+ tests take several hours to run, mostly
because they aren’t all strictly “unit” tests.

mailbox2ics

We have an Exchange-like mail server at work, but it doesn’t support
iCal subscriptions. Since I use a Mac, and don’t have any interest in
Outlook, that makes calendar access a pain.

After some poking around, I discovered that the server stores the
calendar information in IMAP folders, with each event in a separate ICS
file attached to a fake message. So I put together a small script to read
the IMAP messages and merge the ICS files into a single output file. By
writing the output file to a folder on the web server, it is easy to set
up a subscription in iCal.
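
The merge step can be sketched roughly as follows. This is not the mailbox2ics source. It uses only the standard library and plain text handling rather than a real iCalendar parser, it keeps VEVENT blocks only (time zone definitions are dropped), and the function names, the PRODID string, and the use of an SSL connection are assumptions.

```python
import email
import imaplib


def fetch_raw_messages(host, user, password, folder):
    """Yield the raw bytes of every message in an IMAP folder."""
    conn = imaplib.IMAP4_SSL(host)   # assumes the server accepts SSL
    try:
        conn.login(user, password)
        conn.select(folder, readonly=True)
        status, data = conn.search(None, "ALL")
        for num in data[0].split():
            status, parts = conn.fetch(num, "(RFC822)")
            yield parts[0][1]
    finally:
        conn.logout()


def extract_events(ics_text):
    """Return the BEGIN:VEVENT ... END:VEVENT blocks found in ICS text."""
    events, current = [], None
    for line in ics_text.splitlines():
        if line.strip() == "BEGIN:VEVENT":
            current = [line]
        elif current is not None:
            current.append(line)
            if line.strip() == "END:VEVENT":
                events.append("\r\n".join(current))
                current = None
    return events


def merge_messages(raw_messages):
    """Merge the calendar attachments of raw messages into one ICS string."""
    events = []
    for raw in raw_messages:
        msg = email.message_from_bytes(raw)
        for part in msg.walk():
            if part.get_content_type() == "text/calendar":
                payload = part.get_payload(decode=True)
                charset = part.get_content_charset() or "utf-8"
                text = payload.decode(charset, "replace")
                events.extend(extract_events(text))
    lines = [
        "BEGIN:VCALENDAR",
        "VERSION:2.0",
        "PRODID:-//example//merge-sketch//EN",   # placeholder identifier
    ]
    lines.extend(events)
    lines.append("END:VCALENDAR")
    return "\r\n".join(lines) + "\r\n"
```

Writing the string returned by `merge_messages(fetch_raw_messages(...))` to a file under the web server's document root would give iCal something to subscribe to.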

The result only works one way, of course, though it should be possible
to push fake messages into the IMAP server. I have not tried that,
because I just use the server’s web interface for adding new events to
the calendar.

The script depends on the icalendar package from codespeak.net, and
the Python standard library packages for IMAP and email parsing.

I have posted the script to my server: mailbox2ics.