Migrating blog posts to Jekyll

Dani Rodríguez • 11 Sep 2016

I’ve recently been working on recovering posts from old blogs that were stored in my local backup, where they had been rotting for many years. Those blog posts were written in WordPress and Blogger, and therefore had to be migrated to Jekyll.

The experience has not been as bad as I thought it would be. Of course, when I say not as bad I’m actually referring to the process of checking that the HTML is valid and rewriting HTML, or sometimes Markdown. During this time I have had to read some nonsensical blog posts that probably made sense to my 15-year-old self, but that make no sense at all today. Painful.

Back to the import process, though. There are migration tools for Jekyll. They are here. It is a shame I discovered them late, especially after moving the Blogger posts, because those were the hardest blog posts to bring back. I have two complaints about Blogger.

The HTML editor for Blogger is quirky and horrible. It doesn’t produce valid HTML. Instead, it expects the Blogspot engine to fix all the mistakes that it makes. For instance, the default behaviour is to not produce <p>paragraphs</p>. Instead, it converts every line break into a <br> tag. That’s not how you are supposed to write HTML. Plus, it often adds hidden HTML code such as classes, styles and things like that.
Google storage for Blogspot images is a total nightmare. They changed the URLs so many times, and they migrated their backends so badly from Picasa to Google+ to Google Photos, that many URLs used in older posts are gone and those images are lost. Which is a shame.
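
To illustrate the first complaint, here is a minimal sketch of the kind of cleanup involved. It is not the method I actually used, and it relies on two assumptions of mine: that a run of two or more <br> tags marks a paragraph break, and that every class and style attribute is unwanted. The name clean_blogger_html is hypothetical, and regular expressions are a fragile way to handle HTML, so treat it as a starting point only.

```python
import re

# Matches class="..." or style="..." attributes (single or double quoted).
ATTR_RE = re.compile(
    r'\s+(?:class|style)\s*=\s*(?:"[^"]*"|\'[^\']*\')', re.IGNORECASE
)
# Assumption: a run of two or more <br> tags stands for a paragraph break.
PARA_BREAK_RE = re.compile(r'(?:\s*<br\s*/?>\s*){2,}', re.IGNORECASE)


def clean_blogger_html(html):
    """Strip class/style attributes and wrap <br>-separated chunks in <p>."""
    html = ATTR_RE.sub('', html)
    chunks = PARA_BREAK_RE.split(html)
    paragraphs = [chunk.strip() for chunk in chunks if chunk.strip()]
    return '\n'.join('<p>{}</p>'.format(p) for p in paragraphs)
```

Single <br> tags are left alone, since they may be intentional line breaks inside a paragraph. The result still needs a manual read afterwards.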

Image: a web browser pointing to a URL hosted on Google servers that should contain a picture, but instead shows a warning icon. Caption: Oh, god, no

The blog posts coming from WordPress were much better. However, there were a lot of them, and the WordPress editor also likes (or used to like, since the HTML code for some blog posts was not written in this decade) to insert hidden HTML code, which was awful to remove. This has taken me a lot of time because of all the effort required to go through all those blog posts, rewriting things, adding the YAML front matter, and so on.
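
As a rough idea of what adding the YAML front matter looks like when scripted, here is a small sketch. The function names with_front_matter and post_filename are hypothetical, and the choice of fields (layout, title, date) is an assumption about a typical post rather than what my own posts use. The file name follows the year-month-day-title pattern that Jekyll expects for posts.

```python
import json
import re
from datetime import datetime


def with_front_matter(title, date, body, layout="post"):
    """Return the post body prefixed with a YAML front matter block."""
    lines = [
        "---",
        "layout: {}".format(layout),
        # JSON string quoting is also valid YAML double-quoted syntax,
        # so titles with colons or quotes stay safe.
        "title: {}".format(json.dumps(title, ensure_ascii=False)),
        "date: {}".format(date.strftime("%Y-%m-%d %H:%M:%S")),
        "---",
        "",
    ]
    return "\n".join(lines) + body


def post_filename(title, date, ext="html"):
    """Build a YEAR-MONTH-DAY-title.ext file name for the _posts folder."""
    # Assumption: an ASCII-only slug is good enough for old post titles.
    slug = re.sub(r"[^a-z0-9]+", "-", title.lower()).strip("-")
    return "{}-{}.{}".format(date.strftime("%Y-%m-%d"), slug, ext)
```

For example, post_filename("Hello, World!", datetime(2008, 3, 1)) gives 2008-03-01-hello-world.html.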

For the record: yes, this time I used the Jekyll Importer. I would have gone nuts if I had to do this entirely by hand. The process is nice: I can feed it the XML export files that you get from WordPress, and even some RSS feed snapshots (XML files too), and it extracts most of the content.
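
To give an idea of what such an extraction involves, here is a short sketch that pulls posts out of an RSS-style XML file. It is not how the Jekyll Importer works internally (that is a Ruby tool). It only assumes that each post is an item element with title, link and pubDate children, and that the full HTML lives in a content:encoded element, as is common in WordPress exports and feeds. The name extract_posts is hypothetical.

```python
import xml.etree.ElementTree as ET

# Namespace commonly used by RSS feeds for the full post body.
CONTENT_NS = {"content": "http://purl.org/rss/1.0/modules/content/"}


def extract_posts(xml_text):
    """Return a list of dicts, one per <item> found in the XML text."""
    root = ET.fromstring(xml_text)
    posts = []
    for item in root.iter("item"):
        posts.append({
            "title": item.findtext("title", default=""),
            "link": item.findtext("link", default=""),
            "pub_date": item.findtext("pubDate", default=""),
            # Missing bodies come back as an empty string.
            "html": item.findtext(
                "content:encoded", default="", namespaces=CONTENT_NS
            ),
        })
    return posts
```

Each dict can then be cleaned up and written out as a separate file in the _posts folder.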

I recovered up to 150 blog posts. I don’t expect to have recovered all of them, since I probably deleted a lot of them without keeping a copy in my local backups. The Wayback Machine from the Internet Archive has helped me a lot, as it holds some snapshots of my old blogs. Not all pages were archived, though, so some of the content is totally lost. Not a big deal.

Not all the content has been made available. I have just cherry-picked the things that I considered relevant enough to put back online. Most of the rest was outdated news and dead links; republishing it would be a waste of space.