Cleaning up UTF-8 character entities when exporting from WordPress to Jekyll

I’ve been experimenting with converting this blog to Jekyll or another static blog generator. I’m sticking with Jekyll at the moment due to its ease of use and its plugin environment. The main idea is to reduce resource consumption and hopefully also speed up the delivery of the blog. In fact, there is a static version of the blog available right now, even though it’s kinda pre-alpha and not always up to date. The Jekyll version also doesn’t have comments set up yet, nor does it have a theme I like, so it’s still very much a work in slow progress.

To export the contents from WordPress to Jekyll, I use the surprisingly named WordPress to Jekyll exporter plugin. This plugin dumps all the WordPress data, including pictures, into a zip file in a format that is mostly Markdown that Jekyll groks. It doesn’t convert all the links to Markdown, so the generated files need some manual cleanup. One problem I keep running into is that the exporter writes certain UTF-8 characters out as entities containing their numerical code. Unfortunately, when Jekyll processes the data afterwards, those entities get turned into strings that are displayed as is rather than as the characters they stand for. Please note I’m not complaining about this functionality; I’d rather have this information preserved so I can rework it later on. To help with that cleanup, I wrote a script.
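The script itself isn’t shown in this excerpt, so what follows is only a minimal Python sketch of the kind of cleanup described above, not the actual script. It assumes the exporter emits HTML-style numeric character references such as &#8217; (decimal) or &#x2019; (hexadecimal); the function name decode_numeric_entities and the sample string are made up for illustration. Named entities like &amp; are deliberately left alone, since those may be intentional escapes.

```python
import html
import re

# Matches decimal (&#8217;) and hexadecimal (&#x2019;) numeric references only.
# Assumption: this is the form the exporter writes; adjust if yours differs.
NUMERIC_ENTITY = re.compile(r"&#(?:[0-9]+|[xX][0-9a-fA-F]+);")


def decode_numeric_entities(text: str) -> str:
    """Replace numeric character references with the characters they encode.

    Named entities (e.g. &amp;) are not touched because the pattern
    above never matches them.
    """
    return NUMERIC_ENTITY.sub(lambda m: html.unescape(m.group(0)), text)


if __name__ == "__main__":
    # Hypothetical line as it might appear in an exported post.
    sample = "It doesn&#8217;t convert all the links &amp; images"
    print(decode_numeric_entities(sample))
```

To apply something like this to a whole export, one could read each exported Markdown file as UTF-8, pass its contents through decode_numeric_entities, and write the result back, ideally on a copy of the export so the original dump stays available for comparison.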

Getting WordPress to work on FreeBSD 10 with PHP 7.0 and Jetpack

This blog is self-hosted, together with some other services, on a FreeBSD virtual server over at RootBSD. Yes, I’m one of those weirdos who hosts their own servers, even if they’re virtual, instead of just using free services or paying for hosted ones.

I recently had to migrate from the old server instance I’ve been using since 2010 to a new, shiny FreeBSD 10 server. That prompted a review of the various packages I use via the FreeBSD ports collection and, most importantly, resulted in a decision to upgrade from PHP 5.6 to PHP 7.0 “while we’re in there”.

Scheduling WordPress posts with org2blog

Another metablogging post, but this may come in handy for people who like to produce blog posts in bulk and schedule them for publication in WordPress at a later date.

In my case, my ability to find time to blog is directly correlated to my workload in my day job. That’s why you see regular gaps in my posting that may last for a few weeks to a month or two.

To counteract this, I try to write multiple blog posts in one sitting when I both have the time and am inspired to write, then schedule them so that WordPress pushes them out automatically over the next few days or weeks. My normal workflow for this was:

  • Write post in org2blog
  • Publish post to WordPress, adjust the publication date
  • Edit the post in org2blog again, push it, and then remember to tweak the publication date once more because org2blog overwrote it

The last two steps are, of course, unnecessary. See the #+DATE: entry on the first line of the screenshot?

[Screenshot: date section in an org2blog post during editing, with the date entry highlighted]

When you create a new blog post using org2blog/wp-new-entry, just change the date that org2blog automatically inserts to the date and time you want the post to go live, and the setting will carry over into WordPress. Easy if you know how.
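When writing several posts in one sitting, it can help to work out the staggered dates up front. The following is a small Python sketch for that, not part of org2blog. It assumes the #+DATE: value uses an Org-style timestamp of the form [YYYY-MM-DD Day HH:MM]; check the line org2blog inserts in your own setup and adjust the format string if it differs. The function name org_date_lines, the start date, and the two-day spacing are all made up for illustration.

```python
from datetime import datetime, timedelta


def org_date_lines(first: datetime, count: int, days_apart: int = 2) -> list[str]:
    """Return `count` #+DATE: lines, each `days_apart` days after the previous one."""
    lines = []
    for i in range(count):
        when = first + timedelta(days=i * days_apart)
        # Assumed timestamp layout: [YYYY-MM-DD Day HH:MM].
        # %a gives the abbreviated weekday name in the current locale.
        lines.append(when.strftime("#+DATE: [%Y-%m-%d %a %H:%M]"))
    return lines


if __name__ == "__main__":
    # Hypothetical schedule: three posts, two days apart, each at 09:00.
    for line in org_date_lines(datetime(2016, 3, 28, 9, 0), 3):
        print(line)
```

Each printed line can then be pasted over the date that org2blog inserted in the corresponding post before publishing it.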