Switching blogging platforms. Again.
Over the past week or so I have converted all of my blog content to reStructuredText and replaced the WordPress instance I was using with static files. It took me a little while to find the right combination of tools, and I finally settled on using Tinkerer and Sphinx with Python 3.
When I started blogging in 2006, I chose blogger.com to host my site. I didn’t want to host my own server and manage the software, and Blogger was run by Google so I expected it to be around and usable forever. I also had a separate website where I posted some of the longer articles, such as my Python Magazine columns and feature articles, book reviews, and other items that used a lot of source code for examples. I managed that content with Sphinx, and posted references to the articles on the blog so they would go out through the RSS feed.
I eventually grew dissatisfied with the web-based editor provided by Blogger, but found MarsEdit. When I found the light and started writing posts in reStructuredText instead of HTML, I created rst2marsedit so I could keep using the desktop client to preview and publish my posts. And then when I was working as the Communications Director for the PSF I created rst2blogger to make that work easier for myself and the rest of the team, many of whom didn’t have MarsEdit.
During all that time, Blogger happily served my content and provided backing for the RSS feeds that are the heartbeat of a blog. At some point, though, I noticed that the blog wasn’t looking very good on mobile devices. I don’t even remember the specific details, but the options for managing and editing the theme at the time made me decide that continuing with Blogger and supporting mobile content with nicely formatted source code embedded in blog posts was going to be a hassle.
I looked around at static content generating tools at that point, but didn’t find anything I really liked. I had the idea that I really needed something that supported scheduled posts, which would let me write content on the weekend and publish it on Monday morning. I had used this feature of Blogger for a long time with the Python Module of the Week posts, and had come to rely on it as part of my workflow. None of the static site generators supported that, of course, because they all just write HTML files. The best answer I found was to write cron jobs to deploy new content at scheduled times.
Moving to WordPress
About 2 years ago, I changed jobs to DreamHost, and one of our primary services is hosting WordPress-based websites. It made sense to me to give WordPress a try, and it seemed to meet all of my needs – a fairly nice default theme that supported desktops and mobile devices, scheduled posts, and per-tag RSS feeds for sending to aggregation sites. I imported my Blogger content into WordPress, updated the domain settings to point to the new server, and kept blogging. After the initial work to set up the theme, nothing was really that different. I still used rst2marsedit to post, so my day-to-day interface was exactly the same. After a few minor customizations for source code listings, the default worked well.
I downloaded the WordPress apps for my phone and tablet, in case I wanted to blog at a conference. That turned out to be pointless, though, because when I do write, I tend to write long-form posts, and wasn’t comfortable writing that much on either mobile device. The apps didn’t cause any problems, but they weren’t particularly useful for me.
A month or two after I set up the new site, I received an automated notice that my VPS had been rebooted because it was using too many resources. I’m used to working with unmanaged cloud servers, but hadn’t had this experience with a traditional managed hosting service before. Basically, because the WordPress service was using up too much CPU, it either crashed the VPS or the VPS was restarted to terminate the process. I increased the size of the server a couple of times before things stabilized.
I think the problem had to do with some search engine spiders hitting the site all at the same time, but I’m not certain. It is entirely likely that if I spent the time to figure out what was causing the problem, I could have added caching or tuned some configuration settings to make the site behave better. But I really didn’t want to have to figure all of that out. Some people enjoy tweaking and tuning and fiddling with services constantly, but that’s not for me. I want to write, not run blogging software.
For a while I ran a larger VPS to handle the spikes in traffic and just lived with the situation. I had other things on my mind, other projects, and it was working well enough. But then some update or other broke my custom style sheet, so all of the content on the site looked terrible – it’s very difficult to make sense of unfamiliar Python code when the indentation has been stripped out. Figuring out what caused that, how to fix it, and prevent a recurrence, was going to be a lot of hassle – the same thing that pushed me off of Blogger in the first place.
So a few weeks ago I started looking around at static site building tools again. I had a few basic requirements. I wanted something written in Python, in case I needed to extend it. I wanted to write content in reStructuredText, since I am comfortable with it and can extend it if needed. And I needed tag-specific RSS feeds, for aggregation on Planet Python and Planet OpenStack. I stopped worrying about scheduled posts. Although traffic patterns for my site trend down on the weekend and up during the week, I don’t plan to let that control my publishing schedule any more. For aspects like supporting mobile devices, I planned to find or customize a theme.
When I asked for suggestions on Twitter, the most popular response was Pelican, so that’s where I started. The documentation is clear and extensive. I was able to set up a new blog instance on my laptop fairly quickly. There is a tool for converting a WordPress blog archive file to reStructuredText files to import into the new blog. The results still needed a fair amount of cleanup, but after converting it from the other two blogging systems that wasn’t a surprise.
There were a few hiccups, though. According to the Pelican docs, activating syntax highlighting requires using a code-block directive, and it wasn’t clear whether regular literal blocks would work. Since the converter created plain literal blocks (marked with ::), that would mean a lot of hand-editing to add syntax highlighting back. I also had trouble with some of the directives in the articles for which I had reST source files, both blog entries and magazine columns. I had a few cssclass directives, for example, and there were a few others. I could update all of them, which would be annoying but not difficult, but I wasn’t ready to stop looking and make that commitment yet.
Julien Danjou recommended Hyde, so I looked at that next. I wasn’t able to find good documentation, and based on the output of the quick start it looked like I would be writing in a combination of markdown and HTML templates. That didn’t seem like the right direction for me, especially since I already had a lot of reST content that I wanted to import.
Fiddling with Tinkerer
I have extensive experience with Sphinx, so what I really wanted was a tool that would let me use that knowledge and the tools I have already built and add the pieces that are missing to make a blog. I knew there was an extension in the sphinx-contrib repository for creating an RSS feed, but I needed separate feeds for different tags or categories. Jeff Forcier sent me a link to Tinkerer, and after reading the docs I knew I would be able to make it do what I wanted.
The Pelican exporter had left all of the articles in one directory, giving each a unique file name. Tinkerer wanted the input files organized in a directory structure by year/month/day, so I needed to move the files around. I wrote a simple bash script to iterate over all of the files, extract the publication date from the metadata Pelican had written inside, and then copy the file into the right directory structure for Tinkerer.
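The bash script itself isn’t reproduced here. As a rough Python equivalent (not the original script), the sketch below assumes each exported file carries a Pelican-style `:date: YYYY-MM-DD` metadata field and that the target layout is `YYYY/MM/DD/filename`. The function names and the directory arguments are hypothetical.

```python
import re
import shutil
from pathlib import Path

# Pelican writes reST metadata as field-list lines such as
# ":date: 2013-05-12 09:30". Only the date part is needed here.
DATE_FIELD = re.compile(r"^:date:\s*(\d{4})-(\d{2})-(\d{2})", re.MULTILINE)


def dated_destination(text, filename, root):
    """Return root/YYYY/MM/DD/filename, or None if no :date: field is found."""
    match = DATE_FIELD.search(text)
    if match is None:
        return None
    year, month, day = match.groups()
    return Path(root) / year / month / day / filename


def organize(source_dir, root):
    """Copy each .rst file in source_dir into a year/month/day tree under root."""
    copied = []
    for path in sorted(Path(source_dir).glob("*.rst")):
        text = path.read_text(encoding="utf-8")
        dest = dated_destination(text, path.name, root)
        if dest is None:
            # Leave files without a date for manual handling.
            print(f"skipping {path}: no :date: field")
            continue
        dest.parent.mkdir(parents=True, exist_ok=True)
        shutil.copy2(path, dest)
        copied.append(dest)
    return copied
```

Copying rather than moving keeps the flat export around, which is handy if the date parsing turns out to be wrong for a few files.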
The tinker command line tool also manages the list of items in the master table of contents, which controls the order of articles shown on the site and in the RSS feed. Because I was importing my articles from scratch, I had an empty file, but I was able to create the initial list with sed. At that point I was able to run tinker --build with the default theme, and work on cleaning up the results.
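The sed command isn’t shown, but the idea can be sketched in Python. This assumes the master file is an ordinary Sphinx toctree whose entries are document names without the file extension, indented three spaces and listed newest first. The function name and the indent are illustrative, not taken from Tinkerer.

```python
from pathlib import PurePosixPath


def toctree_entries(paths, indent="   "):
    """Turn 'YYYY/MM/DD/name.rst' paths into toctree lines, newest first."""
    # Document names in a Sphinx toctree omit the file extension.
    docnames = [str(PurePosixPath(p).with_suffix("")) for p in paths]
    # Zero-padded date components sort correctly as plain strings.
    docnames.sort(reverse=True)
    return [indent + name for name in docnames]


if __name__ == "__main__":
    posts = ["2012/01/05/first_post.rst", "2013/05/12/newer_post.rst"]
    print("\n".join(toctree_entries(posts)))
```

The resulting lines would be pasted under the toctree directive in the master file. Whether entries belong in newest-first order is an assumption worth checking against a freshly created Tinkerer blog.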
Importing the Content
I edited the source files by hand and with Unix tools like sed to correct formatting errors in the content exported from WordPress. For some of the longer articles, I was glad to be able to replace an exported file with the original source file from the old version of the site. A lot of the changes were the same from file to file (removing certain directives and metadata added by the exporter), so plain text files and a few standard Unix tools once again proved their utility.
After I cleaned up all of the files enough that the build worked, I made another pass to adjust the formatting to be more consistent and to remove artifacts left by the importer that didn’t break the build (many, many raw HTML blocks). Then, I installed sphinxcontrib-spelling and fixed all of the errors it reported. Finally, I replaced the contents of the entries related to Python Module of the Week with references to the appropriate pages on PyMOTW.com, so I only have one copy to keep up to date. It took a few mornings, but I finally had a clean set of around 500 source files.
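For reference, enabling sphinxcontrib-spelling is a matter of a few lines in the Sphinx conf.py. The fragment below is a minimal sketch, not the configuration used for this site: the word list file name is an assumption, and a real conf.py would add the extension to its existing extensions list rather than replace it.

```python
# conf.py fragment (sketch). In a real project, add the extension to the
# existing ``extensions`` list instead of replacing the list as done here.
extensions = ["sphinxcontrib.spelling"]

# Dictionary language used for the spelling check.
spelling_lang = "en_US"

# Project-specific words (one per line) that should not be reported.
# The file name is an assumption for this example.
spelling_word_list_filename = "spelling_wordlist.txt"
```

With plain Sphinx the checker runs as its own builder, for example `sphinx-build -b spelling <sourcedir> <outdir>`. How that is best invoked in a Tinkerer project is left open here.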
The Blogger and WordPress configurations I had been using included the year and month in the URL to a post, but not the day. Tinkerer, and especially the RSS feed generator plugin for Tinkerer, want the URLs to include the day of the month as well. I made a list of all of the HTML files and passed it through sed to generate the Apache redirect rules to put in a .htaccess file at the root of the site.
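The sed expression isn’t included, so here is a Python sketch of the same transformation. It assumes the new pages live at `YYYY/MM/DD/name.html` and that each old URL was the same path without the day component. The real old URLs may have differed (WordPress permalinks often end in a slash rather than `.html`), so treat the pattern as a placeholder.

```python
import re

# Matches paths such as "2013/05/12/some_post.html".
DATED_PATH = re.compile(r"^(\d{4})/(\d{2})/(\d{2})/(.+)$")


def redirect_rule(new_path):
    """Build an Apache mod_alias rule mapping the day-less URL to the new one."""
    match = DATED_PATH.match(new_path)
    if match is None:
        # Not a dated post (e.g. index.html), so no redirect is needed.
        return None
    year, month, _day, rest = match.groups()
    return f"Redirect permanent /{year}/{month}/{rest} /{new_path}"


def htaccess_lines(paths):
    """Return redirect rules for every dated path, skipping everything else."""
    rules = (redirect_rule(p) for p in paths)
    return [rule for rule in rules if rule is not None]
```

Joining the result of `htaccess_lines()` with newlines gives text that could be written to the .htaccess file. Two posts published in the same month with the same file name would produce conflicting rules, so the output is worth a quick check for duplicates.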
At this point I had all of the existing content building, so the next step was to make the output look the way I wanted by updating the theme.
Creating a Theme
I experimented with a few of the standard themes, but decided that none quite fit what I wanted. Everyone has a theme based on Bootstrap, so I didn’t want to use one of those. I did want something that would scale to different sizes of screens, though, so I started with the boilerplate theme from Tinkerer, but modified it heavily using the Pure CSS tools from Yahoo to provide the layout I wanted.
Next, I spent some time looking for color schemes on colrd.com, until I found some that I liked. I learned about font-awesome (used in Bootstrap) and the WebSymbolsRegular font included with Tinkerer’s themes – both useful for including icons in responsive designs. I’m no CSS expert, but with this combination of tools I was able to create a theme that I liked, that looked good with the content I have and expect to add, and that worked on mobile devices. Not everything was perfect, though.
There were a few aspects of Tinkerer’s default behaviors that I didn’t like. First, extensions aren’t really installed so much as “vendored” into your site’s code tree. That means I won’t have problems if a new version of an extension I use is released with a change that isn’t backwards-compatible, but it also means if there are bug fixes I will have to handle the update myself without tools like pip.
By default Tinkerer disables the per-heading permalink feature of Sphinx, and I wanted that left on. Enabling it introduced ugly links to my RSS feed (probably why it is turned off in the first place), so I had to make some changes to the RSS feed generation code. I made some other changes to allow me to limit the number of entries in the feeds, to save build time and to save consumers of the content from downloading the entire site over and over. I will be contributing those changes upstream, soon.
The documentation for the extensions in the tinkerer-contrib repository is … sparse. Most of the extensions have a README file, but if I did not understand Sphinx I’m not sure I would have known what to do with most of them. I was able to make the tag-specific RSS feed extension work, so that took care of one of my important requirements.
I still use Python 2 by default for a lot of things, but I am trying to make more of an effort to consider Python 3 as well. I have several libraries that work with both, now, and we are continuing the slow process of porting libraries used in OpenStack as well. I decided to go ahead and set up this blog to run with Python 3, so after I had everything working correctly with Python 2, I built a new virtualenv using Python 3.3 and installed all of the dependencies. My site built cleanly with Python 3 the first time I tried. The only problems were with the old Google sitemap generator script I have been using for years with PyMOTW.com. I will run that script with Python 2 until I have time to convert it.
I used to use a Mercurial, and then Git, repository with a hook script to update the old version of my static content. For PyMOTW.com I simply rsync the content, although I do check the built version of that site into version control (to make it easier for me to discover changes in the output format). To keep things simple for now I am running rsync from the same Makefile I use to control the rest of the commands to build the site.
There is no clear winner among the available tools for every situation. I chose something based on Sphinx because I am comfortable with how that rendering system already works. For someone without that experience, Pelican may be a better option. For people who don’t like reStructuredText, Hyde or mynt may be more appealing – especially for a new site, without existing content to be imported. But I’m happy with Tinkerer, and I’m confident that I can smooth out any rough spots if I need to.