How NOT to Back Up a Blogger Blog

Over at the Google Operating System blog, they offer a way to
“backup” your blog. It is mostly a manual hack: load the entire blog
into one page in a web browser, then save the resulting HTML. A
similar technique is offered for saving the contents of your XML feed.

There are a few problems with this technique:

  1. It depends on knowing how many posts are in the blog, up front.
  2. The steps and tools given are manual.
  3. Comments are handled separately.

A backup needs to be automated. If I have to remember to do something
by hand, it isn’t going to be done on a regular basis. I want to add to
my blog without worrying about how many posts there are and tweaking
some backup procedure that depends on knowing all about the content of
the blog up front. I want comments saved automatically along with each
post, not in one big lump. And if I need to import the data into a
database, I want the backup format to support parsing the data easily.

What to do?

Enter BlogBackup, the unimaginatively named, fully automatic backup
software for your blog. Just point the command line tool at your
blog feed and a directory where the backup output should go. It will
automatically perform a full backup:

  1. Every blog post is saved to a separate file in an easily parsable
    format, including all of the meta-data provided by the feed
    (categories, tags, publish dates, author, etc.).
  2. Comments are saved in separate directories, organized around the post
    with which they are associated. Comments also include all of their
    meta-data.
  3. The content of blog posts and comments is copied to a separate text
    file for easy indexing by desktop search tools such as Spotlight.
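
To make the per-post layout concrete, here is a minimal sketch of the
idea in Python. It is not BlogBackup's actual code: the function names
(backup_feed, slugify), the directory naming scheme, and the file names
entry.xml and content.txt are all assumptions made for illustration.
It expects the text of an Atom feed and handles posts only; comments
would need the same treatment applied to each post's comment feed.

```python
import os
import re
import xml.etree.ElementTree as ET

ATOM_NS = "http://www.w3.org/2005/Atom"
ATOM = "{%s}" % ATOM_NS

# Keep Atom as the default namespace when entries are written back out.
ET.register_namespace("", ATOM_NS)


def slugify(text):
    """Turn a post title into a safe directory name (hypothetical helper)."""
    slug = re.sub(r"[^a-z0-9]+", "-", text.lower()).strip("-")
    return slug or "untitled"


def backup_feed(atom_xml, out_dir):
    """Write every <entry> of an Atom feed into its own directory.

    Returns the directories created, in feed order. The number of posts
    is discovered from the feed, so nothing has to be known up front.
    """
    root = ET.fromstring(atom_xml)
    created = []
    for entry in root.findall(ATOM + "entry"):
        title = entry.findtext(ATOM + "title", default="")
        published = entry.findtext(ATOM + "published", default="")
        day = published[:10] or "undated"
        post_dir = os.path.join(out_dir, "%s-%s" % (day, slugify(title)))
        os.makedirs(post_dir, exist_ok=True)

        # Full entry, meta-data included, still parsable as XML.
        ET.ElementTree(entry).write(
            os.path.join(post_dir, "entry.xml"),
            encoding="utf-8",
            xml_declaration=True,
        )

        # Body text alone, for desktop search tools to index.
        content = entry.findtext(ATOM + "content", default="")
        with open(os.path.join(post_dir, "content.txt"), "w",
                  encoding="utf-8") as handle:
            handle.write(content)

        created.append(post_dir)
    return created
```

Fetching the feed text is left out on purpose; any HTTP client will
do, and keeping it separate makes the function easy to test offline.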

Since the tool is a command line program, it is easy to automate with
cron or a similar scheduling tool. Since it is fully automatic and reads
the feed itself, you do not need to reconfigure it as your blog grows.
And the data is stored in a format that is easy to parse and load
into another database.
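
As a rough illustration of that last point, the sketch below loads
saved posts into SQLite using only the Python standard library. It
assumes, purely for the example, that each post was saved as a
standalone Atom <entry> document in a .xml file somewhere under the
backup directory; the function name load_posts and the table layout
are hypothetical and not part of BlogBackup.

```python
import glob
import os
import sqlite3
import xml.etree.ElementTree as ET

ATOM = "{http://www.w3.org/2005/Atom}"


def load_posts(backup_dir, db_path=":memory:"):
    """Load every saved Atom entry under backup_dir into a posts table."""
    conn = sqlite3.connect(db_path)
    conn.execute(
        "CREATE TABLE IF NOT EXISTS posts ("
        "id TEXT PRIMARY KEY, title TEXT, published TEXT, content TEXT)"
    )
    # "**" with recursive=True also matches files directly in backup_dir.
    pattern = os.path.join(backup_dir, "**", "*.xml")
    for path in sorted(glob.glob(pattern, recursive=True)):
        entry = ET.parse(path).getroot()
        if entry.tag != ATOM + "entry":
            continue  # skip XML files that are not saved posts
        row = (
            entry.findtext(ATOM + "id", default=path),
            entry.findtext(ATOM + "title", default=""),
            entry.findtext(ATOM + "published", default=""),
            entry.findtext(ATOM + "content", default=""),
        )
        # Re-running the import updates rows instead of duplicating them.
        conn.execute("INSERT OR REPLACE INTO posts VALUES (?, ?, ?, ?)", row)
    conn.commit()
    return conn
```

Because the post id is the primary key, the import can be repeated
after every backup run without creating duplicate rows.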

So, go forth and automate.