Development Seed Blog

Migration Tricks and Challenges

What's your approach?

This Wednesday we finished a migration from an old site running on OpenACS to a relatively complex Drupal site with organic groups, events, imagecache, attachments, books, and the whole spiel.

Having gone over budget on migrations before, this time we wanted to take a more systematic approach and make sure that we at least learned something if we wound up eating hours.

This is the approach we came up with:

1) Analyze, analyze, analyze

2) Write reusable code :)

3) Share our experience

1) Analyze, analyze, analyze

A problem in several previous migrations was that the real time hogs appeared late on the scene. Problematic webs of data slipped under our radar when we estimated projects but took all our attention late in the project. Another issue we had run into was that seemingly simple tasks turned out to be quite twisted and time-consuming.

For this migration we did an exhaustive analysis of the legacy system and, most importantly, we did a detailed breakdown of tasks with hour estimates next to them. This now lets us compare estimates with the actual work and draw conclusions that will improve our future estimates.
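To make the estimate-versus-actual comparison concrete, here is a minimal sketch in Python. The task names and hours are made-up placeholders, not numbers from this project, and the function names are just for illustration.

```python
# Hypothetical task breakdown: (task, estimated hours, actual hours).
# These figures are invented placeholders, not data from the real migration.
tasks = [
    ("Analyze legacy schema", 8.0, 12.0),
    ("Migrate users", 6.0, 5.0),
    ("Migrate events", 10.0, 16.0),
]


def overrun_report(tasks):
    """Return (task, estimate, actual, actual/estimate) rows, worst overrun first."""
    rows = [(name, est, act, act / est) for name, est, act in tasks]
    return sorted(rows, key=lambda row: row[3], reverse=True)


def total_ratio(tasks):
    """Overall actual/estimate ratio across all tasks."""
    total_actual = sum(act for _, _, act in tasks)
    total_estimate = sum(est for _, est, _ in tasks)
    return total_actual / total_estimate


for name, est, act, ratio in overrun_report(tasks):
    print(f"{name:<24} est {est:5.1f}h  actual {act:5.1f}h  x{ratio:.2f}")
print(f"overall x{total_ratio(tasks):.2f}")
```

Sorting by ratio rather than by absolute hours puts the tasks we misjudged the most at the top, which is where the next estimate has the most to gain.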

2) Write reusable code

During past projects, we accumulated a good stock of code pieces for migrations, but they weren't reusable out of the box. For every new migration project, functions needed to be copied and reconfigured. This meant that every time we introduced a couple of new bugs during the reconfiguration and lost some of our hard-earned experience with the quirks of programmatically creating content in Drupal.

With this migration, we wanted to write as much project-independent, reusable functionality as possible, casting solved problems into code for future migrations.

We whipped up a migration framework that does four things:

  • defines a simple callback-based API for implementing chunks of migrations
  • handles backups and rollbacks of migrations
  • provides an organized place for collecting reusable create and helper functions
  • offers a legacy -> new id mapper 

We call this framework MTK - Migration Toolkit - and here is a screenshot of it: 

It's somewhat similar to Migrator module. The reason we went with our own concoction is simply because we had so much to learn and discover as we went. If you're interested, check out the code for MTK in my sandbox.
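For readers who just want the gist of the four points listed above, here is a minimal sketch in Python. It is not MTK's actual code (MTK is Drupal code and lives in the sandbox). Every class and function name below is hypothetical, the "new site" is just an in-memory dict, and rollback is modeled as deleting whatever a step created. Backups are not modeled at all.

```python
from typing import Callable, Dict, List, Tuple


class IdMap:
    """Remembers which new id each legacy id was migrated to."""

    def __init__(self) -> None:
        self._map: Dict[Tuple[str, int], int] = {}

    def add(self, kind: str, legacy_id: int, new_id: int) -> None:
        self._map[(kind, legacy_id)] = new_id

    def lookup(self, kind: str, legacy_id: int) -> int:
        return self._map[(kind, legacy_id)]

    def forget(self, kind: str) -> List[int]:
        """Drop all mappings of one kind and return the new ids they pointed to."""
        keys = [key for key in self._map if key[0] == kind]
        return [self._map.pop(key) for key in keys]


class Toolkit:
    """Hypothetical stand-in for a migration framework, not the real MTK."""

    def __init__(self) -> None:
        self.ids = IdMap()
        self.store: Dict[int, dict] = {}  # stand-in for the new site's content
        self._steps: Dict[str, Callable] = {}
        self._next_id = 1

    def step(self, kind: str):
        """Decorator: register a callback that converts one legacy row."""
        def register(fn):
            self._steps[kind] = fn
            return fn
        return register

    def create(self, record: dict) -> int:
        """Reusable create helper: store a record and hand back its new id."""
        new_id = self._next_id
        self._next_id += 1
        self.store[new_id] = record
        return new_id

    def run(self, kind: str, rows: List[dict]) -> None:
        """Run one chunk of the migration and record legacy -> new ids."""
        for row in rows:
            new_id = self.create(self._steps[kind](row, self.ids))
            self.ids.add(kind, row["id"], new_id)

    def rollback(self, kind: str) -> None:
        """Undo one chunk by deleting everything it created."""
        for new_id in self.ids.forget(kind):
            del self.store[new_id]


mtk = Toolkit()


@mtk.step("user")
def migrate_user(row, ids):
    return {"type": "user", "name": row["login"]}


@mtk.step("post")
def migrate_post(row, ids):
    # The id mapper translates the legacy owner id into the new user id.
    return {"type": "post", "title": row["subject"],
            "author": ids.lookup("user", row["owner"])}


mtk.run("user", [{"id": 501, "login": "alex"}])
mtk.run("post", [{"id": 9001, "subject": "Hello", "owner": 501}])
mtk.rollback("post")  # posts are gone again, users stay
```

The point of the id mapper is that later chunks (posts) can refer to earlier ones (users) without knowing which ids the new site handed out, and the same mapping tells the rollback exactly what to delete.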

3) Share our experience

Well, this is what I'm doing right here :)  Have you been bitten by the migration time suck bug before? Have you found a remedy against it? I'm curious to know what your tricks to avoid it are. 

Now that we've put this migration behind us, I'd say that the above approach made the task less daunting and more rewarding, since we didn't just kill a nasty beast but also improved our toolkit in a sustainable way. Of course we had some hard nuts to crack, but as a result we've got pretty decent nutcrackers. We learned.

One thing we couldn't get past though was digging through the legacy database and doing the popular what-is-what game. In our office, Tim Cullen has coined the telling term 'data archeology' for this. Data archeology will stay with us for a while. The legacy site we migrated from was built in 2004. I wonder if the person that migrates from the 'new' site in 2012 will have a much better handle on migration than we have right now.

Comments
Cool

Good job =) Keep it cool.

Tools and Procedures

Great article. Tools and procedures are definitely paramount once projects reach a certain size and scope. Standardized and, as far as possible, automated procedures would save lots of headaches and overtime. There are several discussions going on about migration and staging with Drupal, and I'm looking forward to Drupalcon to see what the latest ideas and approaches on that matter are.

Let's get in touch.

We should have a BoF meeting on migration and deployment.

Anyone have any good "How I slew the mojibake?" stories?

A recent client had a lot of Arabic content to move over, generated from the Arabic version of Microsoft Word, and we were using our XSLT transform to process the HTML exported from MS Word to get it into a Drupal-body-friendly format.

Our resident indie hacker Tom MacWright nailed it, I believe, with a combination of regex + encoding conversion, but I know it was a headache to deal not only with the encoding issues but also with the various versions of Word the client had used over the years...
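For anyone wondering what a regex + encoding conversion pass can look like, here is a minimal sketch in Python. It is not necessarily what Tom did. It assumes the Word export declares its charset in the HTML, falls back to windows-1256 (a common encoding for Arabic on Windows) when it doesn't, and the function name and the list of Word debris it strips are purely illustrative.

```python
import re

# Finds a charset declaration such as: <meta ... content="text/html; charset=windows-1256">
CHARSET_RE = re.compile(rb"charset=[\"']?([\w-]+)", re.IGNORECASE)

# A small, illustrative list of Word-specific debris: conditional comments and <o:p> tags.
WORD_JUNK_RE = re.compile(
    r"<!--\[if.*?<!\[endif\]-->|</?o:p>", re.DOTALL | re.IGNORECASE
)


def word_html_to_text(raw: bytes, fallback: str = "windows-1256") -> str:
    """Decode Word-exported HTML bytes with the declared charset, then strip Word debris."""
    match = CHARSET_RE.search(raw)
    encoding = match.group(1).decode("ascii") if match else fallback
    try:
        text = raw.decode(encoding)
    except (LookupError, UnicodeDecodeError):
        # Unknown or wrong declaration: fall back rather than fail outright.
        text = raw.decode(fallback, errors="replace")
    return WORD_JUNK_RE.sub("", text)
```

Decoding from the raw bytes with the right charset, before any other processing touches the text, is what keeps the mojibake from appearing in the first place. Once the content is a proper Unicode string, it can be written out as UTF-8 for Drupal.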

Amen!

1. In solving the 'what-is-what' problem, it's quite important to work with the client to find stable definitions for the legacy 'what.'
In a previous life, I worked in the grocery industry building backend data warehousing tools. Migrating old data for our customers was usually a months-long affair, for this very reason. Multiple legacy systems with slightly different overlapping data sets -- that one was always fun.

On older sites -- especially ones where loosey-goosey "Paste HTML here" data schemas were in use -- it's really common for things to drift over time, and accounting for those shifts can consume a lot of time.

Unpacking "data archaeology"

Let me just add a few process points to what Alex said above about data archaeology.

1. In solving the 'what-is-what' problem, it's quite important to work with the client to find stable definitions for the legacy 'what.'

2. Relatedly, it's also very helpful to use pages on the legacy site that list content (the legacy site's analogue to Drupal views) as a means of determining which sets of content are to migrate. These pages often yield queries in their source code which can be easily cannibalized for migration.

3. If you are lucky enough to have access to legacy documentation, we suggest the Доверяй, но проверяй approach: trust, but verify. This is because we often found code containing hacks that made the site behave differently from what the documentation described.
