Development Seed Blog
Computer Aided Translation and Drupal
Linking Translation Tools and Drupal
Gábor Hojtsy is a core Drupal committer, a co-maintainer of Drupal 6 (still to be released), and a lead developer of expanded multilingual support. This is his second time guest blogging for us about multilingual support for open source platforms. You can read his first post here.
Having a great website management system like Drupal with built-in content translation tools is an achievement in itself. But content is not always born in Drupal, and it is very often not translated in Drupal either. This makes it necessary, particularly for multilingual websites, for Drupal to provide interfaces that link it with external translation tools and their workflows.
Computer Aided Translation (CAT) tools help translators reuse previously translated text in new work and archive their current work so it can be reused in the future. This makes them ideal for hooking into website management systems to help with content translation. Without going into great detail, the OASIS XML Localization Interchange File Format (XLIFF) enables interoperability between tools by defining a markup format and interchange language for localizable data. It’s also a well-known standard in the CAT industry.
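To make the format a little more concrete, here is a minimal Python sketch that wraps a few pieces of content in an XLIFF 1.2 document using only the standard library. The unit ids (`title`, `body`), the `original` value `node/1` and the `plaintext` datatype are illustrative assumptions; this is not the output of the XLIFF Tools module.

```python
import xml.etree.ElementTree as ET

# Namespace defined by the XLIFF 1.2 specification.
XLIFF_NS = "urn:oasis:names:tc:xliff:document:1.2"
# Serialize the XLIFF namespace as the default (unprefixed) namespace.
ET.register_namespace("", XLIFF_NS)


def q(tag):
    """Qualify a tag name with the XLIFF namespace."""
    return f"{{{XLIFF_NS}}}{tag}"


def build_xliff(original, source_lang, target_lang, segments):
    """Build a minimal XLIFF 1.2 document.

    segments maps a unit id (hypothetical names such as "title") to the
    source text that should be translated.
    """
    root = ET.Element(q("xliff"), {"version": "1.2"})
    file_el = ET.SubElement(root, q("file"), {
        "original": original,          # where the content came from
        "source-language": source_lang,
        "target-language": target_lang,
        "datatype": "plaintext",
    })
    body = ET.SubElement(file_el, q("body"))
    for unit_id, text in segments.items():
        unit = ET.SubElement(body, q("trans-unit"), {"id": unit_id})
        ET.SubElement(unit, q("source")).text = text
    return ET.tostring(root, encoding="unicode")


print(build_xliff("node/1", "en", "de", {"title": "Hello", "body": "World"}))
```

A CAT tool would then add a `<target>` element next to each `<source>` as the translator works.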
Most big players in the CAT industry support XLIFF, but until recently Drupal lacked even a basic ability to integrate into these workflows. Over the past few months, I’ve been searching for ways to fix this. I found Bryan Schnabel's XLIFF Roundtrip Tool, which handles HTML to/from XLIFF conversion and integrated nicely into a Drupal module. My XLIFF Tools module is more of a proof of concept than an industry-proven implementation (so far, at least), so I welcome anyone interested to take a look at it and test it with different CAT tools.
The philosophy behind CAT-based workflows is to extract resources from their native formats into a common, standard localization format that makes building tools easier. The Gettext format and system that Drupal uses for interface localization are ideal for text that lives in the application source code. Translating user-generated content, however, requires different translation tools and a common, reusable translation memory database. With XLIFF, the translated resources are merged back into their native format once the translation is complete, and the results are stored in a translation memory. Filters and specifications for converting to and from XLIFF have been developed for a number of file types.
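Merging translations back starts with reading them out of the returned file. This sketch, again assuming XLIFF 1.2 and hypothetical unit ids, collects the `<target>` elements a translator has filled in so they can be written back into the original content; units that have no target yet are simply skipped.

```python
import xml.etree.ElementTree as ET

NS = {"x": "urn:oasis:names:tc:xliff:document:1.2"}


def read_targets(xliff_text):
    """Return {unit id: target text} for every trans-unit with a target."""
    root = ET.fromstring(xliff_text)
    result = {}
    for unit in root.iterfind(".//x:trans-unit", NS):
        target = unit.find("x:target", NS)
        if target is not None and target.text:
            result[unit.get("id")] = target.text
    return result


# A translated file as it might come back from a CAT tool (sample data).
translated = """<xliff version="1.2" xmlns="urn:oasis:names:tc:xliff:document:1.2">
  <file original="node/1" source-language="en" target-language="de" datatype="plaintext">
    <body>
      <trans-unit id="title"><source>Hello</source><target>Hallo</target></trans-unit>
      <trans-unit id="body"><source>World</source></trans-unit>
    </body>
  </file>
</xliff>"""

print(read_targets(translated))  # {'title': 'Hallo'}
```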
There are two mapping methods to choose from, which the XLIFF standards refer to as the "minimalist" and the "maximalist" approach. Here is a look at the minimalist approach.
The major difference between the two is in how markup information is retained throughout the translation process. The minimalist approach requires a skeleton generated from the original document and only the translatable resources extracted to XLIFF (possibly with some inline markup). With the maximalist approach, however, all structural and inline markup is encoded in the XLIFF document, and no skeleton is used.
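The minimalist split between a skeleton and the extracted text can be illustrated with a deliberately simple Python sketch. It assumes well-formed HTML, uses a made-up `{{segN}}` placeholder syntax, and cuts a segment at every tag, whereas real XLIFF filters usually keep inline markup such as `<b>` inside a segment. It is not the XLIFF Roundtrip Tool's algorithm.

```python
import re

# Capturing group so re.split() keeps the tags in its output.
TAG = re.compile(r"(<[^>]+>)")
PLACEHOLDER = re.compile(r"\{\{(seg\d+)\}\}")


def extract(html):
    """Minimalist-style extraction: return (skeleton, segments).

    Markup stays in the skeleton; each non-blank text run is replaced by a
    placeholder and collected as a translatable segment.
    """
    skeleton_parts, segments = [], {}
    for part in TAG.split(html):
        if part.startswith("<") or not part.strip():
            skeleton_parts.append(part)  # markup or whitespace only
        else:
            seg_id = f"seg{len(segments) + 1}"
            segments[seg_id] = part
            skeleton_parts.append("{{" + seg_id + "}}")
    return "".join(skeleton_parts), segments


def merge(skeleton, translations):
    """Reverse conversion: put translated segments back into the skeleton."""
    return PLACEHOLDER.sub(lambda m: translations[m.group(1)], skeleton)


skeleton, segments = extract("<p>Hello <b>world</b></p>")
# skeleton: "<p>{{seg1}}<b>{{seg2}}</b></p>"
# segments: {"seg1": "Hello ", "seg2": "world"}
print(merge(skeleton, {"seg1": "Hallo ", "seg2": "Welt"}))  # <p>Hallo <b>Welt</b></p>
```

In the maximalist approach there would be no skeleton: the `<p>` and `<b>` structure itself would be encoded in the XLIFF document.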
In both approaches, the extracted text is first pre-translated from the translation memory collected so far, then reviewed and corrected by a human translator. The resulting translations are stored in the translation memory, and a reverse conversion generates the translated document (using a skeleton, if one is available). Here is a look at a Drupal XLIFF integration using the maximalist approach.
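The pre-translation step can be sketched as a lookup against a translation memory: exact matches are reused directly, and close matches are offered as fuzzy suggestions for the translator to review. In the workflow described here this happens in the external CAT tool, not in Drupal. The `difflib` similarity ratio and the 0.75 threshold are stand-ins chosen for illustration, not how any particular CAT tool scores matches.

```python
from difflib import SequenceMatcher


def pretranslate(segments, memory, threshold=0.75):
    """Suggest a translation for each segment from a translation memory.

    memory maps stored source sentences to their translations. Exact matches
    score 1.0; otherwise the most similar stored source (by difflib ratio) is
    suggested if it scores at least `threshold`.
    Returns {seg_id: (suggestion or None, score)}.
    """
    results = {}
    for seg_id, text in segments.items():
        if text in memory:
            results[seg_id] = (memory[text], 1.0)
            continue
        best, best_score = None, 0.0
        for src, tgt in memory.items():
            score = SequenceMatcher(None, text, src).ratio()
            if score > best_score:
                best, best_score = tgt, score
        if best_score >= threshold:
            results[seg_id] = (best, best_score)
        else:
            results[seg_id] = (None, best_score)
    return results


def store(memory, segments, reviewed):
    """After human review, add the confirmed translations to the memory."""
    for seg_id, translation in reviewed.items():
        memory[segments[seg_id]] = translation


memory = {"Read more": "Weiterlesen", "Add new comment": "Neuen Kommentar hinzufügen"}
segments = {"s1": "Read more", "s2": "Add a new comment", "s3": "Log in"}
suggestions = pretranslate(segments, memory)
# s1 is an exact match, s2 a fuzzy match, s3 has no usable suggestion
store(memory, segments, {"s3": "Anmelden"})  # the translator's reviewed work
```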
One of the best parts about interfacing with tools like these is that Drupal only needs to handle the conversion and the reverse conversion. The rest of the work is done outside the website with more specialized tools.
Comments
Finally :)
This looks fantastic. It seems that finally a CMS has realised what translators need.
Combine this handoff with Pootle - http://translate.sf.net/ http://pootle.wordforge.org - and you allow content to be pushed into a Translation Management System (TMS), have its translation managed correctly there, and then push it back into the CMS.
CMS developers, and in fact everyone who needs translation, often don't understand the needs of translators. We have good tools that let us manage terminology, run checks and, of course, use Translation Memory. All of that is lost when we're forced into another way of doing things.
In the above schematics you could in fact also use PO. But XLIFF is emerging as a good standard that lets translation move between tools without being tied to a single vendor's tool.
Pootle, which I mention above, can manage XLIFF and PO files. It allows people to manage various languages, download the files or translate online.
Any reason why this wouldn't work for Drupal 6.x?
Hello Gábor.
Thanks for this, a very clear explanation, great diagrams and an interesting module.
I have quite a few questions ...
Do you know if this would work for Drupal 6.x if simply converted? I know that multilingual features are better in D6, but have those changes broken this approach?
I am guessing they will not have broken it, but you are the expert.
Also, does this module conflict with other upload-type modules like filefield? I don't fully understand these yet, but I think some of them are replacements for the core upload.module.
... and ... would it be relatively easy to restrict a translator to a single language and specific nodes using existing D6 permissions and roles?
Thanks for any assistance.