Moving a blog from WordPress to Contao 3.5

Posted by Howard Richardson (comments: 2)

Occasionally I get called upon to rescue failing blogs. WordPress is decent enough as a blogging platform, but it suffers from bitrot a lot more than Contao, so given a decent development budget and a suitable project I might choose to move a site across to Contao instead. For this one I had an established blog with about 150 articles in five different categories that needed moving across to a new Contao 3.5 installation.

Each of the categories on the old blog would be turned into a separate news archive in Contao. In addition, I would try to implement tags using the common Contao tags extension.

First, after quite a bit of searching, I found this RSS import module for Contao:
https://github.com/fipps/contao_xRssImport3

It took a little while to get working, because when you unzip the extension into the system/modules directory you actually have to rename the folder for it to work correctly. The directory name should be "xRssImport3".

Once you have the extension working, it adds an option under each news archive to import and sync all the news there with an external RSS feed.

Going back to the WordPress blog, I used the built-in export to export each category individually to a separate XML file in RSS format. I renamed these category_X.rss and so on, and uploaded them to a temp directory on my Contao server. These would serve as my "external" input feeds for the importer. Using the address of the new Contao site, I then pointed the import extension at each RSS dump file in turn. There were a few options to consider.
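Before pointing the importer at a dump file, it can be worth checking that the file parses and holds the number of articles you expect. The following is only a minimal sketch using Python's standard library. It assumes the export is a normal RSS 2.0 document with <item> elements inside a <channel>; the sample feed and its titles are made up for illustration.

```python
import xml.etree.ElementTree as ET

def summarise_feed(xml_text):
    """Return (item_count, titles) for an RSS 2.0 document passed as a string."""
    root = ET.fromstring(xml_text)
    # RSS 2.0 nests every article as <item> inside a single <channel>.
    items = root.findall("./channel/item")
    titles = [item.findtext("title", default="").strip() for item in items]
    return len(items), titles

# Tiny stand-in for one exported category file (made-up content).
SAMPLE_FEED = """<rss version="2.0">
  <channel>
    <title>Example category</title>
    <item><title>First post</title><pubDate>Mon, 04 Jan 2016 10:00:00 +0000</pubDate></item>
    <item><title>Second post</title><pubDate>Tue, 05 Jan 2016 10:00:00 +0000</pubDate></item>
  </channel>
</rss>"""

count, titles = summarise_feed(SAMPLE_FEED)
print(count, titles)
```

For a real export you would pass in the contents of each category_X.rss file instead of the sample string, and compare the counts against the article totals shown in the old blog's admin area.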

You get nothing but bare plain text if you don't specify which HTML tags you want to allow, so I came up with a full list of the ones I wanted, which looked like this:
<p><div><strong><b><i><em><span><a><img><h1><h2><h3><h4><h5><h6><ul><ol><li><blockquote><figure><figcaption><br><object><pre><small><video><sup><sub><u><picture><section>
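As a rough illustration of what an allow-list like this does, here is a small Python sketch. It is not the extension's own code: it assumes the option behaves like a simple "keep these tags, drop all others but keep their text" filter, and the function names are hypothetical. It is also a regex-based toy, not a safe HTML sanitiser.

```python
import re

# Matches an opening or closing tag and captures the tag name.
TAG_RE = re.compile(r"</?([a-zA-Z][a-zA-Z0-9]*)\b[^>]*>")

def parse_allowed(allowed):
    """Turn a string such as '<p><em>' into the set {'p', 'em'}."""
    return set(re.findall(r"<([a-z0-9]+)>", allowed.lower()))

def strip_disallowed(html, allowed_names):
    """Remove every tag whose name is not allowed, keeping the text inside it."""
    def keep_or_drop(match):
        return match.group(0) if match.group(1).lower() in allowed_names else ""
    return TAG_RE.sub(keep_or_drop, html)

snippet = '<p>Hi <script>x</script><em>there</em></p>'
print(strip_disallowed(snippet, parse_allowed("<p><em>")))
print(strip_disallowed(snippet, parse_allowed("")))  # empty list: text only
```

With an empty allow-list every tag is removed and only the text remains, which matches the plain-text result described above.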

I also wanted to carry the tags over from WordPress into the Contao tags extension, so I used the "Field for Subtitles" option and set it to "category". This takes all the WordPress categories and tags and serialises them into a comma-separated list, which becomes the sub-headline in Contao, a field I wasn't otherwise using.
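To picture what that serialisation amounts to, here is a minimal Python sketch that collects the <category> elements of one RSS item and joins them into a single string. It assumes WordPress writes both categories and tags as <category> elements on each item; the ", " separator and the function name are assumptions for illustration, not taken from the import extension.

```python
import xml.etree.ElementTree as ET

def categories_as_subtitle(item):
    """Join the text of all <category> children of an <item> into one string."""
    names = []
    for cat in item.findall("category"):
        name = (cat.text or "").strip()
        # Skip empty entries and repeats, keeping first-seen order.
        if name and name not in names:
            names.append(name)
    return ", ".join(names)

# Made-up item resembling a WordPress export entry.
ITEM_XML = """<item>
  <title>Hello</title>
  <category domain="category" nicename="news">News</category>
  <category domain="post_tag" nicename="contao">contao</category>
  <category domain="post_tag" nicename="migration">migration</category>
</item>"""

item = ET.fromstring(ITEM_XML)
print(categories_as_subtitle(item))
```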

One final setting: change Redirect Target to default. Then I was ready to run. You click save just once and the server goes off, grabs all those articles and processes them immediately. Going back into the news archive afterwards, I found everything in place where I wanted it.

Once you've run the import for an archive, you can uncheck the Import option on that archive so that it doesn't keep trying to update itself daily from a static RSS file.

The last step was to get the tags from the subtitle field into the tags field. Although the two fields look similar on the front end, in the database the tags are actually broken down into individually numbered tag records, so a simple "copy field across" command in SQL wouldn't work.
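The mismatch is easier to see in code: one subtitle string has to become several separate records, one per tag. The sketch below shows only that splitting step in Python. The row layout (news ID, tag, source table) and the "tl_news" default are assumptions made for illustration, so check the real schema of the tags extension before writing anything to a database.

```python
def subtitle_to_tag_rows(news_id, subtitle, from_table="tl_news"):
    """Split a comma-separated subtitle into one (news_id, tag, from_table) row per tag."""
    rows = []
    seen = set()
    for raw in subtitle.split(","):
        tag = raw.strip()
        # Ignore blanks and case-insensitive duplicates.
        if tag and tag.lower() not in seen:
            seen.add(tag.lower())
            rows.append((news_id, tag, from_table))
    return rows

for row in subtitle_to_tag_rows(42, "News, contao, migration"):
    print(row)
```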

In the end I fixed it in a semi-automated way. Using the "edit multiple" option in Contao, I called up the subtitle and tag fields of each story all on one page together. Then using a repeating keyboard macro, I just went through each record one by one doing a "select all, copy (subtitle), tab back one field, paste (into tag field)" one-key command. It was a little slower than ideal, but still in 10-15 minutes I'd got through all 150 articles. Obviously if I'd had thousands of articles it'd be a different matter, but for this small number it seemed a decent enough solution. Sometimes you have to know when to kludge it!

Likewise, a quick "edit multiple" afterwards allowed me to wipe the subtitle field clean using "override", so the tags would no longer show up as sub-headlines.

And that's all there was to moving my articles over. The article thumbnail images didn't come through. The import extension does have an image import option built in, so perhaps the article images never made it into the RSS feed on export, or maybe something else went wrong along the data chain. I was planning to redo the thumbnails by hand anyway, so I didn't spend much time investigating, though I appreciate this may not be good enough for everyone.

I was impressed that a large data migration, which would have been a total pain to do by hand, worked out to be so easy in the end. Dates and titles of articles carried across perfectly. I assigned all the articles to one author in Contao, because they were all from one author on the original WordPress blog. There might be a bit more manual fixing to do if you have multiple authors. At any rate, as a proof of concept this worked amazingly well, and I'm putting it down here in brief so others can benefit from this easy migration idea.


Comment by Bjoern

Hey,

Great article. Exactly what I was searching for. I just wanted to ask whether you were able to import the comments for the articles with that method, too.

Bjoern

Comment by Howard

Hi Bjoern, I don't think the comments carried over. It wasn't a problem for me because there weren't any on the old blog I was working on. If you wanted this, I guess you would have to write a script to import them separately, but that would be almost as much work as writing a whole import routine.