How NOT to Back Up a Blogger Blog

Over at the Google Operating System blog, they offer a way to “back up” your blog. It is mostly a manual hack: load the entire blog into one page in a web browser, then save the resulting HTML. A similar technique is offered for saving the contents of your XML feed.

There are a few problems with this technique:

  • It depends on knowing how many posts are in the blog, up front.
  • The steps and tools given are manual.
  • Comments are handled separately.

A backup needs to be automated. If I have to remember to do something by hand, it isn’t going to be done on a regular basis. I want to add to my blog without worrying about how many posts there are and tweaking some backup procedure that depends on knowing all about the content of the blog up front. I want comments saved automatically along with each post, not in one big lump. And if I need to import the data into a database, I want the backup format to support parsing the data easily.

What to do?

Enter BlogBackup, the unimaginatively named, fully automatic backup software for your blog. Just point the command line tool at your blog feed and give it a directory where the backup output should go. It will automatically perform a full backup:

  • Every blog post is saved to a separate file in an easily parsable format, including all of the meta-data provided by the feed (categories, tags, publish dates, author, etc.).
  • Comments are saved in separate directories, organized around the post with which they are associated. Comments also include all of their meta-data.
  • The content of blog posts and comments is copied to a separate text file for easy indexing by desktop search tools such as Spotlight.
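To make the one-file-per-post idea concrete, here is a minimal sketch in Python. It is not BlogBackup's actual code or output format; it assumes the feed is Atom, and the JSON layout, the file naming scheme, and the names `save_entries` and `slugify` are invented for illustration. Comments are left out to keep it short.

```python
import json
import re
import xml.etree.ElementTree as ET
from pathlib import Path

ATOM = "{http://www.w3.org/2005/Atom}"  # Atom XML namespace


def slugify(text):
    """Turn a title into a safe file name fragment (hypothetical helper)."""
    slug = re.sub(r"[^a-z0-9]+", "-", text.lower()).strip("-")
    return slug or "untitled"


def save_entries(atom_xml, out_dir):
    """Write each Atom <entry> to its own JSON file plus a plain-text copy.

    Returns the list of file name stems that were written. Two posts with
    the same date and title would overwrite each other in this sketch.
    """
    out_dir = Path(out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)
    written = []
    root = ET.fromstring(atom_xml)
    for entry in root.iter(ATOM + "entry"):
        title = entry.findtext(ATOM + "title", default="")
        published = entry.findtext(ATOM + "published", default="")
        content = entry.findtext(ATOM + "content", default="")
        record = {
            "title": title,
            "published": published,
            "author": entry.findtext(f"{ATOM}author/{ATOM}name", default=""),
            "categories": [c.get("term", "") for c in entry.findall(ATOM + "category")],
            "content": content,
        }
        # Assumed naming scheme: publish date plus a slug of the title.
        stem = f"{published[:10]}-{slugify(title)}"
        (out_dir / f"{stem}.json").write_text(json.dumps(record, indent=2), encoding="utf-8")
        # Plain-text copy of the body for desktop search tools to index.
        (out_dir / f"{stem}.txt").write_text(content, encoding="utf-8")
        written.append(stem)
    return written
```

Because the function walks whatever entries the feed contains, nothing in it needs to know the number of posts up front.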

Since the tool is a command line program, it is easy to automate with cron or a similar scheduling tool. Since it is fully automatic and reads the feed itself, you do not need to reconfigure it as your blog grows. And the data is stored in a format that is easy to parse and load into a database.
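As a rough illustration of that last point, the sketch below loads a backup directory into SQLite. It assumes a hypothetical layout of one JSON file per post with `title`, `published`, and `content` keys, which is not necessarily what BlogBackup writes; the function name `load_posts` and the table schema are likewise made up for the example.

```python
import json
import sqlite3
from pathlib import Path


def load_posts(backup_dir, db_path=":memory:"):
    """Load every *.json post file in backup_dir into a SQLite 'posts' table.

    Assumes each file holds one JSON object with 'title', 'published',
    and 'content' keys; missing keys are stored as empty strings.
    """
    conn = sqlite3.connect(db_path)
    conn.execute(
        "CREATE TABLE IF NOT EXISTS posts (title TEXT, published TEXT, content TEXT)"
    )
    for path in sorted(Path(backup_dir).glob("*.json")):
        record = json.loads(path.read_text(encoding="utf-8"))
        conn.execute(
            "INSERT INTO posts VALUES (?, ?, ?)",
            (
                record.get("title", ""),
                record.get("published", ""),
                record.get("content", ""),
            ),
        )
    conn.commit()
    return conn
```

With the posts in a table, a query such as `SELECT title FROM posts ORDER BY published` is enough to list the blog in order.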

So, go forth and automate.