Object-Relational Mappers

My friend Steve and I have spent some time discussing object-relational mapping recently, partially initiated by his comments on the ORM features in django.

For some reason I’ve never quite understood, there seems to be an inherent fear of SQL in the web development community, and over the years there have been many efforts to hide the SQL completely (or in the case of Zope, encourage the use of a custom object database instead of a relational database). Personally I’m wary of any form of object relational mapping which works automatically. What I do want is a nice abstraction layer (sometimes called the data access object pattern), so that the code working with objects doesn’t know that the objects are actually stored in a relational database.

I tend to agree. I’m confused by the intense need to create a new way to express a relational database schema in Python, Ruby, or any other language. The DDL is a perfectly legitimate way to express the schema of the database. Why not just use it?

We use an ORM like that at work. The whole thing was written several years ago before the SQLObject and SQLAlchemy ORMs were available, of course, or we would be using one of them. The database connection layer scans the database table definitions when it connects for the first time. The base class for persistent objects uses that information to discover the names and types of attributes for classes. We do it all at runtime now, though we have discussed caching the information somehow for production servers (maybe using a pickle or even writing out a python module during the build by parsing the DDL itself). Scanning the tables doesn’t take as long as you would think, though, so it hasn’t become a hot-spot for performance tuning. Yet.

Steve suggested a slightly different design. Use DDL to define the schema, then convert the schema to base classes (one per table) with a code generator. Then subclass from the auto-generated tables to add “business logic”. I’m not sure how well that would work, but it sounds like an interesting idea. If the generated code is going to support querying for and returning related objects, how does it know to use the subclass to create instances instead of the generated class?

I do like the automatic handling of queries for related objects, and the system used by django is particularly elegant. Two features I especially like are:

  • The QuerySet isn’t resolved and executed until you start indexing into it.
  • Modifying a QuerySet by applying a filter actually creates a new QuerySet.

This means passing QuerySet instances around is inexpensive, and callers do not have to worry about call-by-reference objects being modified unexpectedly. I need to study SQLAlchemy again, to see how it handles query result sets.