Switching blogging platforms. Again.

Over the past week or so I have converted all of my blog content to
reStructuredText and replaced the WordPress instance I was using
with static files. It took me a little while to find the right
combination of tools, and I finally settled on using Tinkerer and
Sphinx with Python 3.

When I started blogging in 2006, I chose blogger.com to host my
site. I didn’t want to host my own server and manage the software, and
Blogger was run by Google so I expected it to be around and usable
forever. I also had a separate website where I posted some of the
longer articles, such as my Python Magazine columns and feature articles, book reviews, and other items
that used a lot of source code for examples. I managed that content
with Sphinx, and posted references to the articles on the blog so
they would go out through the RSS feed.

I eventually grew dissatisfied with the web-based editor provided by
Blogger, but found MarsEdit. When I found the light and started
writing posts in reStructuredText instead of HTML, I created
rst2marsedit so I could keep using the desktop client to preview and
publish my posts. And then when I was working as the Communications
Director for the PSF I created rst2blogger to make that work easier
for myself and the rest of the team, many of whom didn’t have
MarsEdit.

During all that time, Blogger happily served my content and provided
backing for the RSS feeds that are the heartbeat of a blog. At some
point, though, I noticed that the blog wasn’t looking very good on
mobile devices. I don’t even remember the specific details, but the
options for managing and editing the theme at the time made me decide
that continuing with Blogger and supporting mobile content with nicely
formatted source code embedded in blog posts was going to be a
hassle.

I looked around at static content generating tools at that point, but
didn’t find anything I really liked. I had the idea that I really
needed something that supported scheduled posts, which would let me
write content on the weekend and publish it on Monday morning. I had
used this feature of Blogger for a long time with the Python Module of
the Week posts, and had come to rely on it as part of my
workflow. None of the static site generators supported that, of
course, because they all just write HTML files. The best answer I
found was to write cron jobs to deploy new content at scheduled times.

Moving to WordPress

About two years ago I took a job at DreamHost, where one of our
primary services is hosting WordPress-based websites. It made sense to
me to give WordPress a try, and it seemed to meet all of my needs – a
fairly nice default theme that supported desktops and mobile devices,
scheduled posts, and per-tag RSS feeds for sending to aggregation
sites. I imported my Blogger content into WordPress, updated the
domain settings to point to the new server, and kept blogging. After
the initial work to set up the theme, nothing was really that
different. I still used rst2marsedit to post, so my day-to-day
interface was exactly the same. After a few minor customizations for
source code listings, the default worked well.

I downloaded the WordPress apps for my phone and tablet, in case I
wanted to blog at a conference. That turned out to be pointless,
though, because when I do write, I tend to write long-form posts, and
wasn’t comfortable writing that much on either mobile device. The apps
didn’t cause any problems, but they weren’t particularly useful for
me.

Trouble Begins

A month or two after I set up the new site, I received an automated
notice that my VPS had been rebooted because it was using too many
resources. I’m used to working with unmanaged cloud servers, but
hadn’t had this experience with a traditional managed hosting service
before. Basically, because the WordPress service was using up too much
CPU, it either crashed the VPS or the VPS was restarted to terminate
the process. I increased the size of the server a couple of times
before things stabilized.

I think the problem had to do with some search engine spiders hitting
the site all at the same time, but I’m not certain. It is entirely
likely that if I spent the time to figure out what was causing the
problem, I could have added caching or tuned some configuration
settings to make the site behave better. But I really didn’t want to
have to figure all of that out. Some people enjoy tweaking and tuning
and fiddling with services constantly, but that’s not for me. I want
to write, not run blogging software.

For a while I ran a larger VPS to handle the spikes in traffic and
just lived with the situation. I had other things on my mind, other
projects, and it was working well enough. But then some update or
other broke my custom style sheet, so all of the content on the site
looked terrible – it’s very difficult to make sense of unfamiliar
Python code when the indentation has been stripped out. Figuring out
what caused that, how to fix it, and how to prevent a recurrence was
going to be a lot of hassle – the same thing that pushed me off of
Blogger in the first place.

Other Options

So a few weeks ago I started looking around at static site building
tools again. I had a few basic requirements. I wanted something
written in Python, in case I needed to extend it. I wanted to write
content in reStructuredText, since I am comfortable with it and can
extend it if needed. And I need tag-specific RSS feeds, for
aggregation on Planet Python and Planet OpenStack. I stopped
worrying about scheduled posts. Although traffic patterns for my site
trend down on the weekend and up during the week, I don’t plan to let
that control my publishing schedule any more. For aspects like
supporting mobile devices, I planned to find or customize a theme.

When I asked for suggestions on Twitter, the most popular response was
Pelican, so that’s where I started. The documentation is clear and
extensive. I was able to set up a new blog instance on my laptop
fairly quickly. There is a tool for converting a WordPress blog
archive file to reStructuredText files to import into the new
blog. The results still needed a fair amount of cleanup, but after
the content had passed through two other blogging systems that wasn’t a
surprise.

There were a few hiccups, though. According to the Pelican docs,
activating syntax highlighting requires using a code-block tag,
and it wasn’t clear whether regular literal blocks would work. Since
the converter created plain literal blocks (marked with ::), that
would mean a lot of hand-editing to add syntax highlighting back. I
had trouble with some of the tags in the articles for which I had reST
source files, both blog entries and magazine columns. I had a few
cssclass directives, for example, and there were a few others. I
could update all of them, which would be annoying but not difficult,
but I wasn’t ready to stop looking and make that commitment yet.

Julien Danjou recommended Hyde, so I looked at that next. I wasn’t
able to find good documentation, and based on the output of the quick
start it looked like I would be writing in a combination of markdown
and HTML templates. That didn’t seem like the right direction for me,
especially since I already had a lot of reST content that I wanted to
import.

Fiddling with Tinkerer

I have extensive experience with Sphinx, so what I really wanted was
a tool that would let me use that knowledge and the tools I have
already built and add the pieces that are missing to make a blog. I
knew there was an extension in the sphinx-contrib repository for
creating an RSS feed, but I needed separate feeds for different tags
or categories. Jeff Forcier sent me a link to Tinkerer, and after
reading the docs I knew I would be able to make it do what I wanted.

The Pelican exporter had left all of the articles in one directory,
giving each a unique file name. Tinkerer wanted the input files
organized in a directory structure by year/month/day, so I needed to
move the files around. I wrote a simple bash script to iterate over
all of the files, extract the publication date from the metadata
Pelican had written inside, and then copy the file into the right
directory structure for Tinkerer.
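
The bash script itself is not shown here. As a rough illustration only, the same idea could look like this in Python. It assumes each exported file carries a reST field of the form ":date: YYYY-MM-DD ..." (the exact metadata the exporter wrote may differ), and the function names are hypothetical.

```python
import re
import shutil
from pathlib import Path

# Assumed metadata format: a reST field such as ":date: 2013-05-01 08:30".
DATE_FIELD = re.compile(r"^:date:\s*(\d{4})-(\d{2})-(\d{2})", re.MULTILINE)


def dated_path(text, filename, dest_root):
    """Return dest_root/YYYY/MM/DD/filename, or None if no date is found."""
    match = DATE_FIELD.search(text)
    if match is None:
        return None
    year, month, day = match.groups()
    return Path(dest_root) / year / month / day / filename


def organize(src_dir, dest_root):
    """Copy each .rst file in src_dir into a year/month/day tree."""
    copied = []
    for src in sorted(Path(src_dir).glob("*.rst")):
        text = src.read_text(encoding="utf-8")
        target = dated_path(text, src.name, dest_root)
        if target is None:
            continue  # leave undated files for manual review
        target.parent.mkdir(parents=True, exist_ok=True)
        shutil.copy2(src, target)
        copied.append(target)
    return copied
```

Copying rather than moving keeps the flat export intact, so the step can be repeated if the date pattern turns out to be wrong.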

Tinkerer’s tinker command line tool also manages the list of items
in the master table of contents, used to control the order of articles
shown on the site and in the RSS feed. Because I was importing my
articles from scratch, I had an empty file, but I was able to create
the initial list with find and sed. At that point I was able
to run tinker --build with the default theme, and work on cleaning
up the results.
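
For readers who would rather not use find and sed, here is a hedged Python sketch of building that initial list. It assumes the table of contents is a Sphinx toctree whose entries are document paths without the .rst extension, listed newest first; the function names are made up for this example.

```python
from pathlib import Path


def toctree_entries(root):
    """List posts as toctree document names, newest first."""
    root = Path(root)
    # Keep only files laid out as YYYY/MM/DD/name.rst.
    pattern = "[0-9][0-9][0-9][0-9]/[0-9][0-9]/[0-9][0-9]/*.rst"
    posts = [p.relative_to(root).with_suffix("") for p in root.glob(pattern)]
    # Zero-padded date directories sort chronologically as plain strings.
    return sorted((p.as_posix() for p in posts), reverse=True)


def render_toctree(entries, indent="   "):
    """Format entries as indented lines to paste under a toctree directive."""
    return "\n".join(indent + entry for entry in entries)
```

The rendered lines would then be pasted under the toctree directive in the master document by hand.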

Importing the Content

I edited the source files by hand and with Unix tools like sed to
correct formatting errors from the content exported from
WordPress. For some of the longer articles, I was glad to be able to
replace an exported file with the original source file from the old
version of the site, but a lot of the changes were the same (removing
certain directives and metadata added by the exporter) so plain text
files and a few standard Unix tools once again proved their utility.

After I cleaned up all of the files enough that the build worked, I
made another pass to adjust the formatting to be more consistent and
to remove artifacts left by the importer that didn’t break the build
(many, many raw HTML blocks). Then, I installed
sphinxcontrib-spelling and fixed all of the errors it
reported. Finally, I replaced the contents of the entries related to
Python Module of the Week with references to the appropriate pages on
PyMOTW.com, so I only have one copy to keep up to date. It took a
few mornings, but I finally had a clean set of around 500 source
files.

The Blogger and WordPress configurations I had been using included the
year and month in the URL to a post, but not the day. Tinkerer, and
especially the RSS feed generator plugin for Tinkerer, want the URLs
to include the day of the month as well. I made a list of all of the
HTML files and passed it through sed to generate the Apache
redirect rules to put in a .htaccess file at the root of the site.
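
The sed expression is not reproduced here, but the transformation can be sketched in Python. This is an illustration under assumptions: it supposes the old URLs had the shape /YYYY/MM/name.html (the real permalink format may have differed) and emits mod_alias Redirect lines; redirect_rule is a hypothetical helper.

```python
import re

# Matches built pages laid out as YYYY/MM/DD/name.html.
DATED_PAGE = re.compile(r"^(\d{4})/(\d{2})/(\d{2})/([^/]+\.html)$")


def redirect_rule(html_path):
    """Return a mod_alias Redirect line for one page, or None."""
    match = DATED_PAGE.match(html_path)
    if match is None:
        return None  # not a dated post, e.g. index.html
    year, month, day, name = match.groups()
    old_url = f"/{year}/{month}/{name}"  # assumed old URL shape
    new_url = f"/{year}/{month}/{day}/{name}"
    return f"Redirect permanent {old_url} {new_url}"


def htaccess_rules(html_paths):
    """Build the block of redirect lines for a list of built pages."""
    rules = (redirect_rule(path) for path in sorted(html_paths))
    return "\n".join(rule for rule in rules if rule is not None)
```

Feeding it the relative paths of the built HTML files yields one redirect line per post, ready to append to the .htaccess file.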

At this point I had all of the existing content building, so the next
step was to make the output look the way I wanted by updating the
theme.

Creating a Theme

I experimented with a few of the standard themes, but decided that
none quite fit what I wanted. Everyone has a theme based on Bootstrap,
so I didn’t want to use one of those. I did want something that would
scale to different sizes of screens, though, so I started with
the boilerplate theme from Tinkerer, but modified it heavily using the
Pure CSS tools from Yahoo to provide the layout I wanted.

Next, I spent some time looking for color schemes on colrd.com,
until I found some that I liked. I learned about font-awesome (used
in Bootstrap) and the WebSymbolsRegular font included with
Tinkerer’s themes – both useful for including icons in responsive
designs. I’m no CSS expert, but with this combination of tools I was
able to create a theme that I liked, that looked good with the content
I have and expect to add, and that worked on mobile devices. Not
everything was perfect, though.

Negatives

There were a few aspects of Tinkerer’s default behaviors that I didn’t
like. First, extensions aren’t really installed so much as “vendored”
into your site’s code tree. That means I won’t have problems if a new
version of an extension I use is released with a change that isn’t
backwards-compatible, but it also means if there are bug fixes I will
have to handle the update myself without tools like pip.

By default Tinkerer disables the per-heading permalink feature of
Sphinx, and I wanted that left on. Enabling it introduced ugly links
to my RSS feed (probably why it is turned off in the first place), so
I had to make some changes to the RSS feed generation code. I made
some other changes to allow me to limit the number of entries in the
feeds, to save build time and to save consumers of the content from
downloading the entire site over and over. I will be contributing
those changes upstream soon.

The documentation for the extensions in the tinkerer-contrib
repository is … sparse. Most of the extensions have a README file,
but if I did not understand Sphinx I’m not sure I would have known
what to do with most of them. I was able to make the tag-specific RSS
feed extension work, so that took care of one of my important
requirements.

Python 3.3

I still use Python 2 by default for a lot of things, but I am trying
to make more of an effort to consider Python 3 as well. I have several
libraries that work with both, now, and we are continuing the slow
process of porting libraries used in OpenStack as well. I decided to
go ahead and set up this blog to run with Python 3, so after I had
everything working correctly with Python 2, I built a new virtualenv
using Python 3.3 and installed all of the dependencies. My site built
cleanly with Python 3 the first time I tried. The only problems were
with the old Google sitemap generator script I have been using for
years with PyMOTW.com. I will run that script with Python 2 until I
have time to convert it.

Deploying

I used to use a Mercurial repository, and later a Git one, with a hook
script to update the old version of my static content. For PyMOTW.com I
simply rsync the content, although I do check the built version of
that site into version control (to make it easier for me to discover
changes in the output format). To keep things simple for now I am
running rsync from the same Makefile I use to control the rest of
the commands to build the site.

Conclusions

There is no clear winner among the available tools for every
situation. I chose something based on Sphinx because I am comfortable
with how that rendering system already works. For someone without that
experience, Pelican may be a better option. For people who don’t like
reStructuredText, Hyde or mynt may be more appealing – especially for
a new site, without existing content to be imported. But I’m happy
with Tinkerer, and I’m confident that I can smooth out any rough spots
if I need to.