HOWTO convert your Blogger/BlogSpot blog to ikiwiki

Until recently, my blog was hosted on Google's free Blogger service. When I decided to move to my own server and ikiwiki, I didn't want to lose all my old posts. But exporting your blog from Blogger is a bit tricky. In fact, the Blogger page that provides instructions on how to do so starts off by saying:

Blogger does not have an export or download function.

Regardless of what they say, you can cajole Blogger into exporting text for use in other blog systems. Instructions for moving from Blogger to various pieces of software are available out there on the Web, but to switch to ikiwiki I had to tweak them a bit. Here's the procedure I used:

  1. Extract your information from Blogger's clutches and add minimal ikiwiki tags.

    As in the official Blogger instructions, change your template temporarily. However, instead of using the page template they suggest, use the following template, which you can find as a text file here.

    <Blogger>
    [[meta author="<$BlogItemAuthor$>"]]
    [[meta date="<$BlogItemDateTime$>"]]
    [[meta title="<$BlogItemSubject$>"]]
    
    <$BlogItemBody$>
    </Blogger>
    
    <BlogItemComments>
    Comment by <$BlogCommentAuthor$> at <$BlogCommentDateTime$>:
    <$BlogCommentBody$>
    </BlogItemComments>
    

    I recommend saving the file that's produced by this step even after you process it in the next step, in case you discover that you made a mistake.

  2. Split the extracted file into pieces.

    Exporting your blog using the template above will get you all your blog posts in reasonable Markdown format, with all dates and titles preserved. But they'll all be scrunched into one file. Blogger also doesn't convert paragraph breaks into blank lines, and it inserts strange empty <div> tags all over the place. So the next step is to split this file into pieces and clean up the HTML. Since I had over a hundred posts, I wrote a Python script to do this postprocessing for me. You can find it here. Run it like this to write each blog post to a separate file in the current directory:

    /path/to/fixup-blogspot-export input-file
    

    This script makes a few assumptions about its input file:

    1. It assumes that the text [[meta author="NAME"]] does not appear at the start of a line anywhere except the first line of a blog post. If you write about ikiwiki syntax on your blog, you will run into problems.
    2. It makes a similar assumption about ikiwiki's date meta, [[meta date="DATE"]].
    3. It assumes that line breaks are expressed using a variant of <br/>, and that <br/> never appears in the text of your blog.
    4. It assumes that the text Comment by does not appear at the start of a line in any post.
    5. It assumes, in general, that no Markdown or Ikiwiki commands appear in the blog text. If any do, you'll have to manually escape them, either before or after running the conversion script.
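    To make these assumptions concrete, here is a minimal sketch of the kind of postprocessing described above. It is not the script linked from this post; the function names, the regular expressions, and the .mdwn file-naming scheme are all my own assumptions for illustration. It splits the export wherever a line starts with [[meta author=", removes empty <div> elements, and turns <br/> variants into newlines.

```python
import re

# Assumption: every post begins with a line starting with '[[meta author="',
# exactly as the export template writes it.
POST_START = re.compile(r'^\[\[meta author="', re.MULTILINE)
TITLE = re.compile(r'^\[\[meta title="(.*)"\]\]\s*$', re.MULTILINE)
# Assumption: line breaks are some variant of <br>, <br/> or <br />.
BR = re.compile(r"<br\s*/?>", re.IGNORECASE)
EMPTY_DIV = re.compile(r"<div[^>]*>\s*</div>", re.IGNORECASE)


def split_posts(export_text):
    """Return one string per post, cut at each '[[meta author=' marker."""
    starts = [m.start() for m in POST_START.finditer(export_text)]
    ends = starts[1:] + [len(export_text)]
    return [export_text[s:e].strip() + "\n" for s, e in zip(starts, ends)]


def clean_html(text):
    """Drop empty <div> elements and turn <br/> variants into newlines."""
    text = EMPTY_DIV.sub("", text)
    return BR.sub("\n", text)


def filename_for(post, index):
    """Build a hypothetical file name from the post title."""
    match = TITLE.search(post)
    title = match.group(1) if match else ""
    slug = re.sub(r"[^a-z0-9]+", "_", title.lower()).strip("_")
    # Fall back to a numbered name when the title is missing or empty.
    return (slug or "post_%d" % index) + ".mdwn"
```

    Because two consecutive <br/> tags become two newlines, a paragraph break in the original post ends up as the blank line Markdown expects. The sketch also shows why the first assumption matters: a post that quotes [[meta author=" at the start of a line would be cut in two.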

    All in all, it's probably a good idea to at least skim the output once.
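    If you have too many posts to skim comfortably, a small helper can point you at the lines most likely to need manual escaping. This is only a sketch under my own assumptions: the function name and the choice of patterns (any [[ outside the three meta header lines, plus leftover <br> or <div> tags) are illustrative and will not catch every Markdown construct.

```python
import re

# Header lines written by the export template; these are expected.
META = re.compile(r'^\[\[meta (author|date|title)="')
# Assumption: '[[' elsewhere, or a leftover <br>/<div> tag, deserves a look.
SUSPECT = re.compile(r"\[\[|</?(br|div)\b", re.IGNORECASE)


def suspicious_lines(post_text):
    """Yield (line_number, line) pairs that may need manual attention."""
    for number, line in enumerate(post_text.splitlines(), start=1):
        if META.match(line):
            continue  # skip the meta header produced by the template
        if SUSPECT.search(line):
            yield number, line
```

    Running it over each converted file and printing the results gives you a short list to check by hand, instead of reading every post in full.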

  3. Create a top-level Ikiwiki page for your blog.

    The ikiwiki site has instructions on doing this that are probably better than I could write. The most important thing is to use created_after in the feedpages parameter to limit your RSS feeds, so you don't spew your old posts all over the planet (pun intended) as if they were new. For instance:

     [[inline pages="./blog/entry/*"
              feedpages="created_after(last_page_of_the_old_blog)"]]