Nothing new under the Sun

Or should I say IBM?

It turns out IBM Alphaworks already has a data visualization
project called Many Eyes that can render network diagrams as I
described in my earlier post. The demos look impressive.

Their UI for adding data requires you to upload from a separate
source, which makes the social aspect of my idea more difficult to
implement. Perhaps Many Eyes can be used as the visualization front-end
for a site that collects the data. Any data uploaded to Many Eyes
becomes publicly visible, but that’s not an issue since the original
site would have similar rules.

The network visualization from Many Eyes is more limited than what I
would want to see in a full-featured tool, though. It could be very
useful to be able to see the types of relationships (using different
colors for edges, etc.). They also point out that since the rendering is
done in the browser, it may not be well suited for large data sets or
“for networks in which a lot of nodes have a large number of neighbors”.

Originally spotted on Boing Boing.

Visualizing People and Relationships

While I’m thinking about digraphs and visualization, I want to
describe another idea for a website I have been mulling over. It would
offer a way to see the relationships between people using a digraph
rendering engine.

There would be a central organizing theme for a given rendering. It
might be the current political scandal, an emergency response plan, a
corporate organizational chart, or any other theme by which people are
related to each other. Each theme would have a rendering of the current
members and their relationships, as a digraph. Users could add people
(nodes) and relationships (edges). Relationships could have supporting
documentation in the form of URLs (useful for scandal tracking).
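To make the data model a bit more concrete, here is a minimal sketch of a theme holding people (nodes) and relationships (edges) with optional supporting URLs. It assumes Graphviz's DOT language as the hand-off format to the digraph rendering engine; the class and method names (`Theme`, `Relationship`, `relate`, `to_dot`) are hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class Relationship:
    """A directed edge: source -> target, with optional supporting URLs."""
    source: str
    target: str
    description: str
    urls: list[str] = field(default_factory=list)


@dataclass
class Theme:
    """One organizing theme (a scandal, an org chart, ...) and its digraph."""
    name: str
    people: set[str] = field(default_factory=set)
    relationships: list[Relationship] = field(default_factory=list)

    def relate(self, source, target, description, urls=()):
        # Adding an edge implicitly adds both endpoints as nodes.
        self.people.update((source, target))
        self.relationships.append(
            Relationship(source, target, description, list(urls)))

    def to_dot(self):
        """Render the theme as Graphviz DOT text (assumed rendering format)."""
        def quote(text):
            # Escape backslashes and double quotes for a DOT quoted string.
            return '"' + text.replace('\\', '\\\\').replace('"', '\\"') + '"'

        lines = [f"digraph {quote(self.name)} {{"]
        for person in sorted(self.people):
            lines.append(f"  {quote(person)};")
        for rel in self.relationships:
            lines.append(f"  {quote(rel.source)} -> {quote(rel.target)}"
                         f" [label={quote(rel.description)}];")
        lines.append("}")
        return "\n".join(lines)
```

The URLs stay in the data model rather than the drawing, so a page built around the image can still link each edge to its documentation.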

The UI would not need to be very complicated. To add information, you
just need a simple form with two fields for node names, a description of
the relationship, and optional URLs to supporting documentation. You
could get fancy with auto-completion of the node names, but that’s just
a detail. Editing a node/edge uses a similarly simple form. Each theme
page would also have an RSS feed of changes, of course.
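The auto-completion detail can be surprisingly small. As a sketch, assuming the known node names fit in memory, a sorted list plus `bisect` gives prefix lookups, because every name sharing a prefix sits in one contiguous run of the sorted list. `NameIndex` and `complete` are hypothetical names, and the matching here is case-sensitive.

```python
import bisect


class NameIndex:
    """Sorted list of known node names supporting prefix lookups."""

    def __init__(self, names=()):
        self._names = sorted(set(names))

    def add(self, name):
        # Insert in sorted position, skipping duplicates.
        i = bisect.bisect_left(self._names, name)
        if i == len(self._names) or self._names[i] != name:
            self._names.insert(i, name)

    def complete(self, prefix, limit=10):
        """Return up to `limit` names starting with `prefix`, in sorted order."""
        matches = []
        # All names with this prefix start at the prefix's insertion point.
        i = bisect.bisect_left(self._names, prefix)
        while (i < len(self._names) and len(matches) < limit
               and self._names[i].startswith(prefix)):
            matches.append(self._names[i])
            i += 1
        return matches
```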

It would also be useful to be able to see the themes a node was
involved in, as an alternate view. For example, an individual lawmaker
might show up in a theme for a campaign and in another for a general
legislative topic.
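The alternate view is essentially an inverted index from people to themes. A minimal sketch, assuming each theme is stored as a list of (source, target) edges keyed by theme name; the function name and the sample data are hypothetical.

```python
from collections import defaultdict


def themes_by_person(themes):
    """Build an inverted index: person name -> sorted list of theme names.

    `themes` maps a theme name to a list of (source, target) edges.
    """
    index = defaultdict(set)
    for theme_name, edges in themes.items():
        for source, target in edges:
            index[source].add(theme_name)
            index[target].add(theme_name)
    return {person: sorted(names) for person, names in index.items()}


themes = {
    "Campaign": [("Lawmaker A", "Donor B")],
    "Energy bill": [("Lawmaker A", "Lawmaker C")],
}
index = themes_by_person(themes)
```

Here `index["Lawmaker A"]` lists both themes, which is exactly the cross-theme view described above.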

As with any social site, suppressing malicious input might be tricky.
Following the Wikipedia model of letting anyone edit anything, letting
users flag content as suspicious, and blocking edits to stop flame wars
might be enough.

All of the graphs should be available as image files. The question is,
are they rendered on the fly or on some regular basis? That would depend
on how expensive the rendering is. Obviously they only need to be
re-rendered after a change, so we want to cache the output files.
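One simple way to get "only re-render after a change" is to name each output file after a hash of the graph source: an unchanged graph maps to a file that already exists. This is a sketch under that assumption; `cached_render` is hypothetical, and `render_with_graphviz` assumes the Graphviz `dot` program is installed (it reads DOT from standard input).

```python
import hashlib
import subprocess
from pathlib import Path


def render_with_graphviz(dot_source, output_path):
    # Assumes the Graphviz `dot` program is installed and on the PATH.
    subprocess.run(["dot", "-Tpng", "-o", str(output_path)],
                   input=dot_source, text=True, check=True)


def cached_render(dot_source, cache_dir, render=render_with_graphviz,
                  suffix=".png"):
    """Return the path of a rendered image, rendering only if not cached."""
    digest = hashlib.sha256(dot_source.encode("utf-8")).hexdigest()
    cache_dir = Path(cache_dir)
    cache_dir.mkdir(parents=True, exist_ok=True)
    output = cache_dir / (digest + suffix)
    if not output.exists():
        # Identical graph text yields the identical file name, so an
        # unchanged graph never triggers a second (possibly expensive) render.
        render(dot_source, output)
    return output
```

Whether rendering then happens on the fly or in a periodic batch job, the cache check is the same.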

Spam Irony

In my spam research today, I came across this link to a blog post
discussing POPFile, a POP3 spam filtering tool. I’ve seen the tool
before, and I’m not even sure why I bothered to read the post, but I’m
glad I did. This bit from the end caught my eye:

Steve Shaw is the developer of PopUpMaster Pro, which allows you to
add unblockable popups to your web site quickly and easily,
specifically designed to sign up subscribers to your list, and fast.

It’s good to see that the marketers are not immune to the problem.

PyUGraph

I am continuing to migrate my old project repositories from CVS to
svn. In the process, today, I found some old code I wrote in 2001 (or
earlier) to generate input files for daVinci, an old digraph
visualization package. It turns out that daVinci has been renamed to
uGraph, so when I released the code I updated the module name.

There are now other, possibly better, graph visualization tools
available. NetworkX looks very promising. It uses Graphviz, which
produces some really nice output. But daVinci was the first tool I used
for doing relationship analysis. I used it to analyze calls between
functions in some nasty Perl code I was maintaining. I also used it to
analyze the module linkage dependencies in a large C toolkit library,
with the idea that we would split the big library up into several
smaller .so files for release. And there have been several one-off
projects along the way, too. Unfortunately, I seem to have lost most of
that code.
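That code is gone, so this is not it, but the library-splitting question translates to a standard graph problem: modules that depend on each other in a cycle (directly or indirectly) have to ship in the same shared library. Here is a hypothetical sketch using Tarjan's strongly connected components algorithm, where `graph` maps each module to the modules it links against.

```python
def strongly_connected_components(graph):
    """Tarjan's algorithm. `graph` maps each module to the modules it uses.

    Each returned set is a group of mutually dependent modules. Components
    come out dependencies-first, since a component is only emitted after
    every component reachable from it.
    """
    index, lowlink = {}, {}
    stack, on_stack = [], set()
    components = []
    counter = 0

    def visit(node):
        nonlocal counter
        index[node] = lowlink[node] = counter
        counter += 1
        stack.append(node)
        on_stack.add(node)
        for dep in graph.get(node, ()):
            if dep not in index:
                visit(dep)
                lowlink[node] = min(lowlink[node], lowlink[dep])
            elif dep in on_stack:
                lowlink[node] = min(lowlink[node], index[dep])
        if lowlink[node] == index[node]:
            # `node` is the root of a component; pop its members.
            component = set()
            while True:
                member = stack.pop()
                on_stack.discard(member)
                component.add(member)
                if member == node:
                    break
            components.append(component)

    nodes = set(graph) | {d for deps in graph.values() for d in deps}
    for node in sorted(nodes):
        if node not in index:
            visit(node)
    return components
```

The recursive version is fine for a library of modest size; a very deep dependency chain would need an iterative rewrite to stay under Python's recursion limit.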

Object-Relational Mappers

My friend Steve and I have spent some time discussing
object-relational mapping recently, partially initiated by his
comments on the ORM features in Django.

For some reason I’ve never quite understood, there seems to be an
inherent fear of SQL in the web development community, and over the
years there have been many efforts to hide the SQL completely (or in
the case of Zope, encourage the use of a custom object database
instead of a relational database). Personally, I’m wary of any form
of object-relational mapping that works automatically. What I do
want is a nice abstraction layer (sometimes called the data access
object pattern), so that the code working with objects doesn’t know
that the objects are actually stored in a relational database.
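A minimal sketch of that data access object idea, using the standard library's `sqlite3` as a stand-in database: the `Person` object is plain data, and only `PersonDAO` knows about tables and SQL. Both class names and the `person` table are hypothetical, and transaction handling (commits) is left to the caller.

```python
import sqlite3
from dataclasses import dataclass
from typing import Optional

# The schema is plain DDL; nothing in Python re-describes it.
SCHEMA = "CREATE TABLE person (id INTEGER PRIMARY KEY, name TEXT NOT NULL)"


@dataclass
class Person:
    """Plain domain object; knows nothing about SQL."""
    name: str
    id: Optional[int] = None


class PersonDAO:
    """The only place that knows a Person is stored in a relational table."""

    def __init__(self, conn):
        self.conn = conn

    def save(self, person):
        if person.id is None:
            cur = self.conn.execute(
                "INSERT INTO person (name) VALUES (?)", (person.name,))
            person.id = cur.lastrowid
        else:
            self.conn.execute(
                "UPDATE person SET name = ? WHERE id = ?",
                (person.name, person.id))
        return person

    def get(self, person_id):
        row = self.conn.execute(
            "SELECT id, name FROM person WHERE id = ?",
            (person_id,)).fetchone()
        return None if row is None else Person(id=row[0], name=row[1])
```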

I tend to agree. I’m confused by the intense need to create a new way
to express a relational database schema in Python, Ruby, or any other
language. The DDL is a perfectly legitimate way to express the schema
of the database. Why not just use it?

We use an ORM like that at work. The whole thing was written several
years ago before the SQLObject and SQLAlchemy ORMs were available, of
course, or we would be using one of them. The database connection layer
scans the database table definitions when it connects for the first
time. The base class for persistent objects uses that information to
discover the names and types of attributes for classes. We do it all at
runtime now, though we have discussed caching the information somehow
for production servers (maybe using a pickle or even writing out a
Python module during the build by parsing the DDL itself). Scanning the
tables doesn’t take as long as you would think, though, so it hasn’t
become a hot-spot for performance tuning. Yet.
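Our in-house code isn't reproduced here, but the general shape of "scan the table definitions, then let the base class derive attributes from them" looks something like this sketch. It uses SQLite's `PRAGMA table_info` as a stand-in for whatever catalog query the real database offers; `Persistent`, `bind`, and the `account` table are hypothetical.

```python
import sqlite3


class Persistent:
    """Base class whose attributes come from the table definition at runtime."""
    table = None      # subclasses set the table name
    _columns = None   # column name -> declared type, filled in by bind()

    @classmethod
    def bind(cls, conn):
        # PRAGMA table_info rows are (cid, name, type, notnull, dflt_value, pk).
        # The table name comes from code, never from user input.
        rows = conn.execute(f"PRAGMA table_info({cls.table})").fetchall()
        cls._columns = {row[1]: row[2] for row in rows}

    def __init__(self, **values):
        unknown = set(values) - set(self._columns)
        if unknown:
            raise AttributeError(f"unknown columns: {sorted(unknown)}")
        # Every column becomes an attribute; missing values default to None.
        for name in self._columns:
            setattr(self, name, values.get(name))


class Account(Persistent):
    table = "account"
```

Since `_columns` is an ordinary dict, caching it for production (with `pickle`, say) would be straightforward.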

Steve suggested a slightly different design. Use DDL to define the
schema, then convert the schema to base classes (one per table) with a
code generator. Then subclass from the auto-generated tables to add
“business logic”. I’m not sure how well that would work, but it sounds
like an interesting idea. If the generated code is going to support
querying for and returning related objects, how does it know to use the
subclass to create instances instead of the generated class?
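One possible answer (not Steve's design, just a sketch of an idea) is a registry: each generated class registers itself, a hand-written subclass replaces its generated parent in the registry, and generated relationship code always builds instances through the registry. This version uses Python's `__init_subclass__` hook; `Record`, `factory`, `PersonBase`, `OrderBase`, and `customer_id` are all hypothetical. If two hand-written classes extend the same generated class, the last one defined wins.

```python
class Record:
    """Root of the (hypothetical) generated table classes."""
    _registry = {}   # generated table class -> class used for new instances

    def __init_subclass__(cls, generated=False, **kwargs):
        super().__init_subclass__(**kwargs)
        if generated:
            Record._registry[cls] = cls
        else:
            # A hand-written subclass replaces its generated parent.
            for base in cls.__mro__[1:]:
                if base in Record._registry:
                    Record._registry[base] = cls
                    break

    @classmethod
    def factory(cls, **row):
        """Build an instance of whichever class is registered for this table."""
        for base in cls.__mro__:
            if base in Record._registry:
                return Record._registry[base](**row)
        raise TypeError(f"{cls.__name__} is not a generated table class")

    def __init__(self, **row):
        self.__dict__.update(row)


# --- output of the code generator ---
class PersonBase(Record, generated=True):
    pass


class OrderBase(Record, generated=True):
    def customer(self):
        # Generated relationship code goes through the registry.
        return PersonBase.factory(id=self.customer_id)


# --- hand-written business logic ---
class Person(PersonBase):
    def greeting(self):
        return f"Hello, customer {self.id}"
```

With this in place, `OrderBase(customer_id=7).customer()` returns a `Person`, even though the generated code only ever mentions `PersonBase`.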

I do like the automatic handling of queries for related objects, and
the system used by django is particularly elegant. Two features I
especially like are:

  1. The QuerySet isn’t resolved and executed until you start iterating
    over or indexing into it.
  2. Modifying a QuerySet by applying a filter actually creates a new
    QuerySet.

This means passing QuerySet instances around is inexpensive, and callers
do not have to worry about call-by-reference objects being modified
unexpectedly. I need to study SQLAlchemy again, to see how it handles
query result sets.
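The two properties are easy to imitate in a toy. This sketch is not how Django implements QuerySet; it filters an in-memory list with Python predicates instead of generating SQL, and `Query` and its `executions` counter are hypothetical. It shows that `filter()` returns a new object and that nothing runs until the results are actually needed.

```python
class Query:
    """Lazy, immutable query over an in-memory list (a stand-in for SQL)."""

    def __init__(self, source, filters=()):
        self._source = source            # where rows come from when executed
        self._filters = tuple(filters)   # predicates accumulated so far
        self._cache = None               # results, once executed
        self.executions = 0              # how many times we "hit the database"

    def filter(self, predicate):
        # Return a NEW query; this one is left untouched.
        return Query(self._source, self._filters + (predicate,))

    def _execute(self):
        if self._cache is None:
            self.executions += 1
            self._cache = [row for row in self._source
                           if all(f(row) for f in self._filters)]
        return self._cache

    def __getitem__(self, key):
        return self._execute()[key]

    def __iter__(self):
        return iter(self._execute())

    def __len__(self):
        return len(self._execute())
```

Because `filter()` never mutates its receiver, a caller can safely narrow a query it was handed without surprising whoever passed it in.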

Blog location change

I’ve decided to take advantage of the new Blogger feature “Custom
Domains” and move my blog under my own domain. This is a much more
attractive feature than the older FTP publishing, since Blogger still
hosts the content for me.

If all goes well, it should be transparent and all of the old URLs
should redirect to the new domain.

page rank

A few months ago when I googled myself, I came up with a variety of
random old posts to forums or mailing lists. Most of the information was
stale. After a couple of weeks of having this blog online, and just a
few days of having my personal site online, the blog and the site have hit the top of
the search results list for “doug hellmann”. Somehow that’s
satisfying.

Entrepreneurial Debt Waivers

The company I work for came out of the Advanced Technology
Development Center at Georgia Tech, which is an incubator for small
companies run by the university. Among other resources, ATDC provides
nice facilities with shared conference and break rooms but private
office and lab space. There were a lot of companies in the incubator, at
various levels of maturity. There were regular get-togethers and plenty
of opportunity to exchange ideas with people down the hall. Our company
has since graduated, but the time we spent there meant we didn’t have to
worry about a lot of little details that come up with a business.

Ed Kohler writes about an idea for VC firms over at Technology
Evangelist. The basic idea is to grab students as they graduate
(probably before) and set them up so they have no loan debt and a
reasonable salary in exchange for a stake in whatever idea they happen
to be working on. Kohler’s idea takes the normal incubator model, like
ATDC’s, a step (or more) further by suggesting paying off student loans
and covering housing rent as well as office space.

It seems natural to extend this even further and combine the two
systems. Why not buy an office/apartment building? Offer a variety of
apartment sizes to accommodate married and single people, etc. Provide
office space, a food court, the works. Some of the space could even be
rented to companies that are not part of the VC fund. The point is to
pull all of it together into one place to keep the energy level high and
make it an attractive place to be in addition to sharing whatever
resources can be shared between companies.

Maybe the whole thing could be done by leasing floors of a multi-use
building that someone else owns under a single lease, then subletting
the space (instead of buying the building outright). I tend to think in
terms of high-rises because I work in Atlanta. In other areas, you might
want a campus or office park. I’m sure there are a lot of ways to
structure it.

In the end what you get is a startup “factory” that churns out ideas
on a (hopefully) regular basis. You bring a new crop of people in each
semester when they graduate. Start small with each new company. Each
“startup” is owned by a holding company in the beginning. If it starts
to look promising, spin it off to its own company as needed. If an idea
isn’t panning out, kill it and either move the people to another project
or cut them loose and let them try it on their own.

Maybe multiple VC firms would work together to fund the thing. I
don’t know how well the politics of that would work, but I’m not a VC.

Hmm. This all sounds a lot like the old research labs from before
everyone wanted to be out on their own…

Coder’s Block

Logan Koester posted some tips for overcoming Coder’s Block.

I get blocked, once in a while, too. In those situations, it almost
always comes with the feeling that the problem I am trying to solve is
too big. That, in turn, usually stems from not having thought about the
problem enough, rather than the other way around.

The development staff at my company is pretty small, so we are all
involved in each new feature from “front to back”, as it were. I like to
start by thinking about the user interaction aspect of the problem. It
doesn’t make sense to start with the back-end design until you know what
the front-end is supposed to do, right? So I think about what operations
the user needs to perform, then what inputs are needed to handle them.
From there I can work out how many of those inputs should be stored for
re-use.

I like to draw diagrams, since I find they are easier to re-assimilate
when I come back to a problem after some time. So I may sketch out a few
UI screens, or draw a few boxes and arrows to understand the
relationships between objects (I use a sort of pidgin UML for that). I
also make lists of attributes I might need for classes, since those map
to the database schema.

There are plenty of good tools for making such sketches on the
computer, but I guess I’m Old School. I find that sitting down with a
pen and paper, away from the computer, helps clarify my thoughts. Since
I don’t have my text editor, the temptation to write code is reduced and
I can concentrate on the big picture. And once I have the big picture
worked out, the way forward is usually clear.