Testing regular expressions

I discovered Christof Hoeke’s retest program today. This is a very
slick use of Python’s standard library HTTP server module to package an
AJAX app for interactively testing out regular expressions. I used to
have a Tkinter app that did something similar, but Christof’s is much
lighter weight.
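retest puts a browser UI on top of this, but the underlying check is just the standard re module. A throwaway script version, with a made-up pattern and sample text, looks like:

```python
import re

# Compile a pattern once, then inspect every match in the sample text.
# The pattern and sample here are invented for illustration.
pattern = re.compile(r"(?P<user>\w+)@(?P<host>[\w.]+)")
sample = "Contact doug@example.com or admin@test.org for details."

for match in pattern.finditer(sample):
    print(match.group("user"), "->", match.group("host"))
```

Of course, the point of a tool like retest is to skip the edit-save-run loop entirely and see the matches update as you type.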

Now I need to figure out how to package it to run as an app when I
double click on it in the Finder, instead of opening the .py file in an
editor.

CastSampler.com monitoring feeds

On the plane back from Phoenix this week, I implemented some changes
to the way CastSampler.com republishes feeds for the sites a user
subscribes to. The user page used to link directly to the original feed
so it would be easy to copy it to a regular RSS reader to keep up to
date on new shows. That link has been replaced with a “monitor” feed
which uses the original description and title for each item, but
replaces the link with a new URL that causes the show to be added to
your CastSampler queue. The user page still links to the original home
page for the feed, so I think I am doing enough as far as attribution
and advertisement. Any author information included in the original feed
is also passed through to the monitor feed. The OPML file generated for
a user’s feeds links to these “monitor” feeds instead of the original
source, too.
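I won’t post the real CastSampler code, but the link replacement could be sketched with the standard library’s ElementTree. The add-to-queue URL scheme below is made up for illustration:

```python
import xml.etree.ElementTree as ET

# Sketch of the "monitor" feed idea: keep each item's title and
# description, but replace the link with a (hypothetical) add-to-queue
# URL pointing back at CastSampler.
QUEUE_URL = "http://castsampler.com/queue/add/?feed={feed_id}&item={guid}"

def rewrite_feed(rss_xml, feed_id):
    root = ET.fromstring(rss_xml)
    for item in root.iter("item"):
        # Read the guid before touching the link, since the link may
        # serve as a fallback identifier.
        guid = item.findtext("guid") or item.findtext("link")
        link = item.find("link")
        if link is not None:
            link.text = QUEUE_URL.format(feed_id=feed_id, guid=guid)
    return ET.tostring(root, encoding="unicode")

rss = (
    "<rss><channel>"
    "<item><title>Show 1</title><guid>abc</guid>"
    "<link>http://example.com/ep1</link></item>"
    "</channel></rss>"
)
print(rewrite_feed(rss, 42))
```

The title and description pass through untouched; only the link element is rewritten.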

The goal of these changes is to make it easy to use a feed-reader such
as Bloglines or Google Reader to monitor podcasts from CastSampler. To
add an episode to your queue, just click the link in the monitor feed to
be directed to the appropriate CastSampler.com page.

By the way, how cool is it to be able to develop a web app on my
PowerBook while I’m on a plane? What an age to be alive.

Adium ChatMonitor

We recently set up our own Jabber server at work. For a short time we
had been using an IRC server, but decided for a variety of
non-technical reasons to switch to Jabber. The benefit is that now I
only have to run one chat client (Adium). The downside is that I miss a
feature of Colloquy: it had a special notification event for when I was
mentioned by name.

I searched for a while, but didn’t find any way to add such a
notification to Adium. I finally hacked something together using an
AppleScript triggered for every incoming message. I’m sure there must
be a better way to achieve the same results, but this
works. Eventually I should learn more about how to develop true OS X
apps using Objective-C, and then I can create a real plugin to do the
same thing.

Nothing new under the Sun

Or should I say IBM?

It turns out IBM Alphaworks already has a data visualization
project called Many Eyes that can render network diagrams as I
described in my earlier post. The demos look impressive.

Their UI for adding data requires you to upload from a separate
source, which makes the social aspect of my idea more difficult to
implement. Perhaps Many Eyes can be used as the visualization front-end
for a site that collects the data. Any data uploaded to Many Eyes
becomes publicly visible, but that’s not an issue since the original
site would have similar rules.

The network visualization from Many Eyes is more limited than what I
would want to see in a full featured tool, though. It could be very
useful to be able to see the types of relationships (using different
colors for edges, etc.). They also point out that since the rendering is
done in the browser, it may not be well suited for large data sets or
“for networks in which a lot of nodes have a large number of neighbors”.

Originally spotted on Boing Boing.

Visualizing People and Relationships

While I’m thinking about digraphs and visualization, I want to
describe another idea for a website I have been mulling over. It would
offer a way to see the relationships between people using a digraph
rendering engine.

There would be a central organizing theme for a given rendering. It
might be the current political scandal, an emergency response plan, a
corporate organizational chart, or any other theme by which people are
related to each other. Each theme would have a rendering of the current
members and their relationships, as a digraph. Users could add people
(nodes) and relationships (edges). Relationships could have supporting
documentation in the form of URLs (useful for scandal tracking).

The UI would not need to be very complicated. To add information, you
just need a simple form with two fields for the node names, a
description of the relationship, and optional URLs to supporting
documentation. You could get fancy with auto-completion of the node
names, but that’s just a detail. Editing a node/edge would use a
similarly simple form. Each theme page would also have an RSS feed of
changes, of course.

It would also be useful to be able to see the themes a node was
involved in, as an alternate view. So an individual lawmaker might show
up in a theme for a campaign and a general legislative topic.

As with any social site, suppressing malicious input might be tricky.
Using the Wikipedia model of allowing anyone to edit anything, flagging
content as suspicious, and blocking edits to prevent flame-type wars
might be enough.

All of the graphs should be available as image files. The question is,
are they rendered on the fly or on some regular basis? That would depend
on how expensive the rendering is. Obviously they only need to be
re-rendered after a change, so we want to cache the output files.
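If the stored data is just a list of edges, one way to drive a renderer would be to emit Graphviz DOT text and cache the resulting image. The theme and names below are invented:

```python
# Sketch: turn stored (person, person, relationship) edges into a
# Graphviz DOT digraph; a renderer (dot, neato, etc.) would turn the
# text into an image file that can be cached until the next edit.
def edges_to_dot(theme, edges):
    lines = ['digraph "%s" {' % theme]
    for source, target, label in edges:
        lines.append('  "%s" -> "%s" [label="%s"];' % (source, target, label))
    lines.append("}")
    return "\n".join(lines)

edges = [
    ("Smith", "Jones", "donated to"),
    ("Jones", "Acme Corp", "board member"),
]
print(edges_to_dot("scandal-tracker", edges))
```

Since the DOT text is derived entirely from the edge list, re-rendering only when an edge is added or edited falls out naturally.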

Spam Irony

In my spam research today, I came across this link to a blog post
discussing POPFile, a POP3 spam filtering tool. I’ve seen the tool
before, and I’m not even sure why I bothered to read the post, but I’m
glad I did. This bit from the end caught my eye:

    Steve Shaw is the developer of PopUpMaster Pro, which allows you to
    add unblockable popups to your web site quickly and easily,
    specifically designed to sign up subscribers to your list, and fast.

It’s good to see that the marketers are not immune to the problem.

PyUGraph

I am continuing to migrate my old project repositories from CVS to
svn. In the process, today, I found some old code I wrote in 2001 (or
earlier) to generate input files for daVinci, an old digraph
visualization package. It turns out that daVinci has been renamed to
uGraph, so when I released the code I updated the module name.

There are now other, possibly better, graph visualization tools
available. NetworkX looks very promising. It uses Graphviz, which
produces some really nice output. But, daVinci was the first tool I used
for doing relationship analysis. I used it to analyze calls between
functions in some nasty Perl code I was maintaining. I also used it to
analyze the module linkage dependencies in a large C toolkit library,
with the idea that we would split the big library up into several
smaller .so files for release. And there have been several one-off
projects along the way, too. Unfortunately, I seem to have lost most of
that code.
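The dependency analysis itself doesn’t need a rendering tool at all. With a hypothetical linkage map like the one below, the graphlib module in newer Python versions (3.9+) can order the libraries so that dependencies come first, which is exactly what you want when splitting a big library into smaller .so files:

```python
from graphlib import TopologicalSorter

# Invented module linkage map: each library lists the libraries it
# links against. A topological order gives a safe build/split order.
deps = {
    "libui": {"libcore", "libnet"},
    "libnet": {"libcore"},
    "libcore": set(),
}
order = list(TopologicalSorter(deps).static_order())
print(order)  # dependencies come before the libraries that use them
```

A cycle in the map raises graphlib.CycleError, which is itself useful information when untangling an old codebase.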

Object-Relational Mappers

My friend Steve and I have spent some time discussing
object-relational mapping recently, partially initiated by his
comments on the ORM features in django.

    For some reason I’ve never quite understood, there seems to be an
    inherent fear of SQL in the web development community, and over the
    years there have been many efforts to hide the SQL completely (or in
    the case of Zope, encourage the use of a custom object database
    instead of a relational database). Personally I’m wary of any form
    of object relational mapping which works automatically. What I do
    want is a nice abstraction layer (sometimes called the data access
    object pattern), so that the code working with objects doesn’t know
    that the objects are actually stored in a relational database.

I tend to agree. I’m confused by the intense need to create a new way
to express a relational database schema in Python, Ruby, or any other
language. The DDL is a perfectly legitimate way to express the schema
of the database. Why not just use it?

We use an ORM like that at work. The whole thing was written several
years ago before the SQLObject and SQLAlchemy ORMs were available, of
course, or we would be using one of them. The database connection layer
scans the database table definitions when it connects for the first
time. The base class for persistent objects uses that information to
discover the names and types of attributes for classes. We do it all at
runtime now, though we have discussed caching the information somehow
for production servers (maybe using a pickle or even writing out a
Python module during the build by parsing the DDL itself). Scanning the
tables doesn’t take as long as you would think, though, so it hasn’t
become a hot spot for performance tuning. Yet.
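Our system doesn’t use SQLite, but the scan-the-tables approach can be sketched with the sqlite3 module. The table and class names here are invented:

```python
import sqlite3

# Sketch of discovering attribute names from the table definition at
# connection time, instead of repeating the schema in Python code.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE podcast (id INTEGER PRIMARY KEY, title TEXT, feed_url TEXT)"
)

def columns(conn, table):
    # PRAGMA table_info rows are (cid, name, type, notnull, default, pk)
    return [row[1] for row in conn.execute("PRAGMA table_info(%s)" % table)]

class Persistent:
    table = None
    def __init__(self, conn, **values):
        # Attribute names come from the database, not from the class.
        for name in columns(conn, self.table):
            setattr(self, name, values.get(name))

class Podcast(Persistent):
    table = "podcast"

p = Podcast(conn, title="Example Show")
print(p.title, p.feed_url)
```

The DDL stays the single source of truth; adding a column to the table makes the attribute appear on the class with no Python changes.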

Steve suggested a slightly different design. Use DDL to define the
schema, then convert the schema to base classes (one per table) with a
code generator. Then subclass from the auto-generated tables to add
“business logic”. I’m not sure how well that would work, but it sounds
like an interesting idea. If the generated code is going to support
querying for and returning related objects, how does it know to use the
subclass to create instances instead of the generated class?
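One possible answer, sketched below with invented names: if the generated base class constructs row objects through a classmethod, then a query invoked via the subclass builds subclass instances automatically, because cls is bound to whichever class you called it on:

```python
# Hypothetical generated base class for a "podcast" table.
class GeneratedPodcastBase:
    @classmethod
    def from_row(cls, row):
        # cls is the class the query was invoked on, so subclasses get
        # instances of themselves without the generator knowing about them.
        obj = cls()
        obj.id, obj.title = row
        return obj

    @classmethod
    def query(cls, rows):
        # Stand-in for real SQL execution and cursor iteration.
        return [cls.from_row(r) for r in rows]

class Podcast(GeneratedPodcastBase):
    def display_title(self):  # "business logic" added in the subclass
        return self.title.upper()

shows = Podcast.query([(1, "Example Show")])
print(type(shows[0]).__name__, shows[0].display_title())
```

It only works if callers consistently query through the subclass, though, which may be the weak point of the design.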

I do like the automatic handling of queries for related objects, and
the system used by django is particularly elegant. Two features I
especially like are:

  1. The QuerySet isn’t resolved and executed until you start indexing
    into it.
  2. Modifying a QuerySet by applying a filter actually creates a new
    QuerySet.

This means passing QuerySet instances around is inexpensive, and callers
do not have to worry about a shared QuerySet being modified
unexpectedly. I need to study SQLAlchemy again, to see how it handles
query result sets.
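Those two properties are easy to illustrate with a toy class. This is not how django actually implements QuerySet, just the shape of the behavior:

```python
# Toy illustration: filter() returns a brand-new query object, and
# nothing "executes" until you index into the result.
class LazyQuery:
    def __init__(self, data, predicates=()):
        self._data = data
        self._predicates = tuple(predicates)

    def filter(self, predicate):
        # A new object is returned; the original query is untouched.
        return LazyQuery(self._data, self._predicates + (predicate,))

    def _run(self):
        rows = self._data
        for predicate in self._predicates:
            rows = [r for r in rows if predicate(r)]
        return rows

    def __getitem__(self, index):
        return self._run()[index]  # evaluation happens here

shows = LazyQuery(["intro", "interview", "outro"])
long_names = shows.filter(lambda s: len(s) > 5)
print(long_names[0])
```

The real implementation would build up SQL instead of predicate lists, but the cheap-to-copy, evaluate-on-demand structure is the same.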

Blog location change

I’ve decided to take advantage of the new Blogger feature “Custom
Domains” and move my blog under my own domain. This is a much more
attractive feature than the older FTP publishing, since Blogger still
hosts the content for me.

If all goes well, it should be transparent and all of the old URLs
should redirect to the new domain.