Wiki translation now in motion

This week, I attended WikiSym in Montreal. This was completely unexpected. Given the heavy constraints on my schedule, I didn’t think I would be able to attend. For some reason, I ignored all constraints and went anyway. Those constraints catching up with me explain why it took me so long to post about this.

Those who have been following my blog for a while know how much interest I have in collaborative translation. The topic has been coming back year after year. Until now, there were only vague ideas. At WikiSym, it became a real project. Not only a development effort, but one with actual needs and real-life scenarios. A website is up to document the various projects related to wiki translation. At this time, there is very little information, but the amount will grow considerably by the end of April 2008. Yes, there is also a time frame. I will work on the project as part of the final project of my degree.

During WikiSym, there was quite a lot of attention focused on translation. While not everyone is interested, those who are see it as a critical problem in their use of wikis. There are many cases where content has to be translated and, depending on the context, different types of translation are required. During open spaces and various hallway discussions, these situations were discussed (some may be missing, my memory isn’t so great):

  • Documentation project: Almost every major open source project now uses a wiki for user documentation purposes. It’s easier for people to collaborate in and removes some burden from the developers. At some point in the project’s life, people start requesting documentation in other languages. In this situation, everything needs to be translated and the translations are likely to have a very similar content structure.

    Some projects are more structured than others and might have dedicated translators. If they do, they will probably opt for a master version replicated to the other versions. Since the documentation is a wiki, there is no way to prevent people from contributing in their language of choice. Changes by visitors will have to be replicated in all other languages, but since there is a dedicated staff, the change can simply be added to the master version and the other languages will replicate it.

    Smaller projects are likely to have unsynchronized versions. Due to the lack of coordination and resources, this is the kind of chaos we have to live with. In this situation, visual indications for the visitors are important. If the content is out of date, alternatives must be proposed. Indications could also be used to invite the visitors to participate in the translation.

  • Government: In bilingual countries or regions, government organizations are often required to translate all content. A similar situation can also occur in research facilities or large multinational companies with similar policies. In those cases, the content is developed by a single group of people and translated once the content is completed. This is a very typical master document case.

  • Marketing information: Some product marketing teams develop content using wikis. In this case, the translation may not reflect the original version. Content is likely to be localized to the target culture. If the information contains case studies, they will need to be adapted to the region. Some general information needs to be translated and kept in sync, but some is irrelevant to the translators. Not every change should trigger a translation process. Since the content will differ across translations, the structure of the content itself cannot be used to highlight the changes required.

Project objectives

In order to be truly successful, the project will have to accommodate all of the above. In a perfect world, it would also accommodate the situations we do not know of, but there is no way to verify them. It should also be possible to use the system The Wiki Way. It should be simple and have as little overhead as possible.

The main objective of the project will be to add the required mechanisms and interfaces in TikiWiki to support true synchronized multilingual content. As this is a collaborative project, a secondary objective is to document the effort to help other projects that would like to incorporate such features. Both successes and failures will have to be documented.

Wiki Translation is not only about synchronization of content and content management issues. It’s also about collaboration in building translation databases. While this aspect is not quite in the scope of my project, it will also affect the project. Creating a unified interface to access translation databases, dictionaries and automatic translation tools could also be required.

Synchronization technicalities

Right now, most wikis do not support translation at all. In the best cases, they can recognize pages in another language as their equivalent. Figure 1 shows how pages evolve independently in this situation, just as they do in Wikipedia. There is no way for visitors to see if the page is up to date and it’s up to the maintainers of the other versions to make sure new content is incorporated.

Figure 1 : No synchronization

Once you have identified the need for translation synchronization, the simplest way to perform it is to use a master version paradigm. In controlled environments, it’s very common, as it’s possible to ensure that all contributions to the content are made in the master version. At given milestones, the translations can be updated from the selected version. Figure 2 shows a simple representation of the model.

Figure 2 : Master version paradigm
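
To make the master version idea concrete, here is a minimal sketch in Python. It assumes each translation simply remembers which master revision it was last updated from; the class and field names are hypothetical and are not taken from TikiWiki or any other wiki engine.

```python
from dataclasses import dataclass


@dataclass
class Translation:
    """A translated page in the master version paradigm (hypothetical model)."""
    language: str
    synced_master_revision: int  # master revision this translation was last updated from


def is_out_of_date(translation: Translation, current_master_revision: int) -> bool:
    """A translation is stale when the master has moved past the revision it was synced to."""
    return translation.synced_master_revision < current_master_revision


# Illustrative data: the master page is at revision 5.
current_master_revision = 5
french = Translation("fr", synced_master_revision=5)
spanish = Translation("es", synced_master_revision=3)

stale = [t.language for t in (french, spanish)
         if is_out_of_date(t, current_master_revision)]
```

A single integer per translation is enough here only because every change goes through the master version.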

The primary flaw of the master version paradigm is that it does not apply at all in a collaborative environment. It’s not possible for people to contribute to the content in their language of choice.

Once you refuse to limit editing to a single master version, the first thing that comes to mind is to determine which revisions are equivalent across the different languages. The basic idea is to establish pairs of equivalent revisions between languages along the timeline. Figure 3 presents such a model.

Figure 3 : Equivalence model

The concept seems easy enough to represent, but in reality, it’s much harder to apply. In most cases, the translator would need to update both pages in the pair to fully merge the changes made in each before declaring the two pages equivalent. This requires the translator to be proficient in both languages and doubles the effort. It’s also wrong in another way. If the content is not meant to be identical, as in the marketing scenario, the indication that the pages are synchronized is misleading.

This brought me to a much simpler concept. Change integrations are directional. In fact, they are very much like branch merging in most revision control systems. The person merging changes from another language does not need to push their own changes back to the language they are merging from. The pages are not declared equivalent; a page can only be said to be at least as good as the revision it merged from. Figure 4 presents a representation of the interaction between different languages. Due to the large number of crossing lines, this model can be hard to read. An important part of the work required will be to analyze the data and expose something meaningful to the user.

Figure 4 : Branch merging

In the above image, the French and English versions have been exposed to the same changes. Both include all the changes made over their own histories and the information added in Spanish version 1. The Spanish version has a life of its own. It only includes the changes from French version 1 and English version 2.

An important concept to keep in mind is change propagation. Spanish translators do not need to understand French. As long as someone translating to English does, the changes from the French version will eventually get incorporated in the Spanish version. The change propagation must be tracked to make sure no pages are flagged as incomplete while they actually contain the information.
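
The propagation described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: every revision is identified by a hypothetical (language, number) pair, each revision stores the full set of revisions it includes, and a merge always integrates the latest revision of the source language. The class and method names are invented for the example.

```python
class TranslationSet:
    """Tracks which revisions each language version includes (hypothetical model)."""

    def __init__(self):
        # language -> list of sets; entry i holds every (language, number)
        # revision included in that language's revision i + 1
        self.history = {}

    def edit(self, lang):
        """Record a plain edit in `lang` and return the new revision id."""
        revisions = self.history.setdefault(lang, [])
        included = set(revisions[-1]) if revisions else set()
        rev = (lang, len(revisions) + 1)
        included.add(rev)
        revisions.append(included)
        return rev

    def merge(self, target, source):
        """Record that `target` integrated the latest revision of `source`.

        Directional: `source` learns nothing about `target`.
        Assumes `source` already has at least one revision.
        """
        source_included = self.history[source][-1]
        rev = self.edit(target)
        # Everything the source already contained comes along with the merge.
        self.history[target][-1] |= source_included
        return rev

    def missing(self, target, source):
        """Revisions included in the latest `source` but not in the latest `target`."""
        target_revisions = self.history.get(target)
        target_included = target_revisions[-1] if target_revisions else set()
        return self.history[source][-1] - target_included


pages = TranslationSet()
pages.edit("fr")            # French version 1
pages.edit("en")            # English version 1
pages.merge("en", "fr")     # English version 2 integrates French version 1
pages.edit("es")            # Spanish version 1
pages.merge("es", "en")     # Spanish version 2 integrates English version 2

# The Spanish page received the French change through English,
# so it is not flagged as missing anything from French.
spanish_lacks = pages.missing("es", "fr")   # empty set
french_lacks = pages.missing("fr", "es")    # Spanish and English revisions
```

In this simplified model a merge creates a revision of its own, so a revision that only integrates changes still shows up as missing elsewhere. Filtering those out is the kind of analysis needed before exposing the result to users.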

Due to the volunteer nature of some of the translation work, it might be necessary to support partial merges. If large changes were made, chances are that the volunteer translator won’t translate them all in a single effort. There is no real way to quantify how much of a change was incorporated, but recording the partial merge could help subsequent translators figure out what was done and what is left to be done.
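
One possible way to record partial merges, sketched in Python. It assumes a merge is stored with a completion flag and a free-form note from the translator; all names here are hypothetical and nothing is quantified, in line with the limitation mentioned above.

```python
from dataclasses import dataclass, field


@dataclass
class MergeRecord:
    source: tuple        # (language, revision number) being integrated
    translator: str
    complete: bool       # False for a partial merge
    note: str = ""       # what was done and what is left, in the translator's words


@dataclass
class PageTranslation:
    language: str
    merges: list = field(default_factory=list)

    def record_merge(self, source, translator, complete, note=""):
        self.merges.append(MergeRecord(source, translator, complete, note))

    def is_integrated(self, source):
        """Only a complete merge counts as having integrated the source revision."""
        return any(m.source == source and m.complete for m in self.merges)

    def pending_notes(self, source):
        """Notes left by partial merges of `source`, until a complete merge is recorded."""
        if self.is_integrated(source):
            return []
        return [m.note for m in self.merges
                if m.source == source and not m.complete]


spanish = PageTranslation("es")
spanish.record_merge(("en", 2), "first volunteer", complete=False,
                     note="Introduction translated; installation section remains.")
notes_before = spanish.pending_notes(("en", 2))
integrated_before = spanish.is_integrated(("en", 2))

spanish.record_merge(("en", 2), "second volunteer", complete=True)
notes_after = spanish.pending_notes(("en", 2))
```

The page stays flagged as incomplete after the first volunteer, but the next translator can read the note instead of comparing both versions from scratch.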

There are probably a few corner cases that cannot be taken care of, but I think the branch merging model can handle most situations.
