My friend Steve and I have spent some time discussing
object-relational mapping recently, partially initiated by his
comments on the ORM features in django.
For some reason I’ve never quite understood, there seems to be an
inherent fear of SQL in the web development community, and over the
years there have been many efforts to hide the SQL completely (or in
the case of Zope, encourage the use of a custom object database
instead of a relational database). Personally I’m wary of any form
of object relational mapping which works automatically. What I do
want is a nice abstraction layer (sometimes called the data access
object pattern), so that the code working with objects doesn’t know
that the objects are actually stored in a relational database.
I tend to agree. I’m confused by the intense need to create a new way
to express a relational database schema in Python, Ruby, or any other
language. The DDL is a perfectly legitimate way to express the schema
of the database. Why not just use it?
We use an ORM like that at work. The whole thing was written several
years ago before the SQLObject and SQLAlchemy ORMs were available, of
course, or we would be using one of them. The database connection layer
scans the database table definitions when it connects for the first
time. The base class for persistent objects uses that information to
discover the names and types of attributes for classes. We do it all at
runtime now, though we have discussed caching the information somehow
for production servers (maybe using a pickle or even writing out a
python module during the build by parsing the DDL itself). Scanning the
tables doesn’t take as long as you would think, though, so it hasn’t
become a hot-spot for performance tuning. Yet.
Steve suggested a slightly different design. Use DDL to define the
schema, then convert the schema to base classes (one per table) with a
code generator. Then subclass from the auto-generated tables to add
“business logic”. I’m not sure how well that would work, but it sounds
like an interesting idea. If the generated code is going to support
querying for and returning related objects, how does it know to use the
subclass to create instances instead of the generated class?
I do like the automatic handling of queries for related objects, and
the system used by django is particularly elegant. Two features I
especially like are:
- The QuerySet isn’t resolved and executed until you start indexing
- Modifying a QuerySet by applying a filter actually creates a new
This means passing QuerySet instances around is inexpensive, and callers
do not have to worry about call-by-reference objects being modified
unexpectedly. I need to study SQLAlchemy again, to see how it handles
query result sets.