I18N Revisited
Two years ago, I discussed how translation could be handled in collaborative websites to ensure the different versions could be kept synchronized. After all this time, no working implementation is available yet. As far as I know, no effort has taken a specific direction. The topic keeps coming up, but each discussion only raises more problems. It was brought back to me last night, and someone quoted my old post; I didn’t even remember writing it.
The issues remain the same. Changes occur everywhere in the documentation, and there is no way to track them and make sure they are applied to all translations. From a user’s standpoint, there is no way to know whether the translation he is currently looking at is up to date. The only way to find out would be to look at the master version and see if it contains any additional content. If everyone could read the master version, we wouldn’t have to care so much about translating everything.
So far, all the solutions I have heard about are based on having a master version, which would ideally change from one language to another as progress is made. There seems to be a global agreement that it’s the only way it could be done. It sure is a solution, but it somehow removes the collaborative aspect we were trying to achieve in the first place. The structure you end up with is a sub-group working on a master version and improving the content, while the rest of the world translates. Until everything is translated, there is no way to improve the content unless you become a member of the current master sub-group.
That solution assumes that all the translations will be in sync at some point. If the website is translated into two languages, this is likely to happen with enough effort from the translating group. With three languages, it might still be possible, but beyond that, I seriously doubt it. The only way it could happen is if a complete content freeze were called until the translations were completed, but this really goes against the idea of collaborative development.
I think the solution is to accept that the versions will not be in sync and to refuse any form of master copy. Translations are a way to reach more people, but it’s not acceptable to slow down development just to wait for a minority, and reaching these people is not enough if you can’t allow them to contribute to the whole.
In my opinion, the way wiki applications currently handle revisions slows down this translation process. The initial version is 0, and the number increments with every modification. With each revision, the modifications should be applied to the translations, so that revision X in English matches revision Y in French. If a modification is made in the French version, it should still be possible to match the revisions of both translations. But if two concurrent modifications are made before changes are propagated, you’re busted. Revision numbers are far too linear to work in a collaborative environment.
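The mismatch can be illustrated with a minimal sketch. It assumes, purely for illustration, that each revision can be summarized by the set of changes its content reflects; the change labels and the `find_matching_revision` helper are hypothetical. A revision-to-revision mapping only works while the other language has a revision with exactly the same content, and two concurrent edits break that immediately.

```python
from typing import Optional

# Hypothetical histories: index = revision number, value = changes reflected.
english = [{"A"}, {"A", "B"}]  # English rev 1 adds related links (B)
french = [{"A"}, {"A", "C"}]   # French rev 1 corrects information (C), concurrently


def find_matching_revision(content: set, other: list) -> Optional[int]:
    """Return the revision number in `other` with exactly the same content, if any."""
    for number, other_content in enumerate(other):
        if other_content == content:
            return number
    return None


for number, content in enumerate(english):
    match = find_matching_revision(content, french)
    print(f"English rev {number} -> French rev {match}")
# English rev 0 -> French rev 0
# English rev 1 -> French rev None
```

Neither English revision 1 nor French revision 1 has a counterpart in the other language, even though both histories are perfectly valid.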
Wikis work well because of their simplicity, and the fact that they keep a history of all revisions is what makes them so effective in a collaborative environment. For users, being able to see exactly what changed since the last time they looked at a page is really useful. However, thinking that a page can be compared to another “equivalent” page the same way it can be compared with its own history is a big mistake.
Getting rid of the master version
To get rid of the master version, you need to accept that these revision numbers won’t match each other and rely on another technique to keep track of the changes that need to be applied. Each change has a purpose and an impact on the content. Let’s consider a situation where changes are made on concurrent revisions. For simplicity’s sake, assume the initial version was translated into all languages, so the content initially matches.
| Label | Description | English | French | Spanish | German |
|---|---|---|---|---|---|
| A | Initial content | X | X | X | X |
| B | Adding related links | X | | | |
| C | Correcting information | X | X | | |
| D | Adding information | | | | X |
In the situation above, modification B was made in the English version and will eventually need to be integrated into the three other versions. The error in the initial content (C) was noticed simultaneously by two different contributors, one in English and one in French. The correction will need to be applied to the two other versions, but a way to indicate that the two corrections are actually the same thing is also needed. Modification D is similar to B, but it was made in German, to emphasize that English is not a master version.
When looking at the English version, users should be told that the German version has a piece of additional content. On the other hand, users looking at the German version need to know that related links have been added and that a correction was made to the content. If a German user knows enough English to translate and decides to do so, he should be able to see the differences introduced by each specific change and apply them to his own translation. When saving the revision, he also needs a way to indicate that the change has been integrated. This way, the next time a German user accesses the website, no missing-information warning will be shown.
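These warnings can be derived from very little data. The sketch below assumes, hypothetically, that the system stores for each version the set of change labels it already reflects, taken from the table; `missing_changes` and `integrate` are illustrative names, not an existing wiki API. A version’s warning list is simply every change known in some other version but absent from its own set.

```python
# Hypothetical state from the table: version -> changes it currently reflects.
integrated = {
    "English": {"A", "B", "C"},
    "French": {"A", "C"},
    "Spanish": {"A"},
    "German": {"A", "D"},
}


def missing_changes(version: str, integrated: dict) -> set:
    """Changes present in at least one other version but not in `version`."""
    known_elsewhere = set()
    for other, changes in integrated.items():
        if other != version:
            known_elsewhere |= changes
    return known_elsewhere - integrated[version]


def integrate(version: str, changes: list, integrated: dict) -> None:
    """Record that a new revision of `version` integrated the given changes."""
    integrated[version] |= set(changes)


print(sorted(missing_changes("German", integrated)))   # ['B', 'C']
integrate("German", ["B", "C"], integrated)
print(sorted(missing_changes("German", integrated)))   # []
print(sorted(missing_changes("English", integrated)))  # ['D']
```

Once the German translator marks B and C as integrated, German users no longer get a warning, while English users still see that D exists elsewhere.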
After the changes, the revision history of the German version might look like this:
- Initial version translated from English [A]
- Correcting a typo
- Adding information [D]
- Applying correction and adding related links [B, C]
As this hypothetical revision history shows, multiple changes can be integrated in a single revision, and not all revisions create a change. As I mentioned two years ago, not all changes are relevant to other translations.
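Under the same assumption, the per-version set of changes does not even need to be stored separately: it can be rebuilt from the revision history, where each revision carries the labels of the changes it creates or integrates, possibly none. The `Revision` class and `integrated_changes` function are a hypothetical sketch.

```python
from dataclasses import dataclass, field


@dataclass
class Revision:
    message: str
    # Labels of changes this revision creates or integrates; may be empty.
    changes: list = field(default_factory=list)


german_history = [
    Revision("Initial version translated from English", ["A"]),
    Revision("Correcting a typo"),  # local fix, not relevant to other versions
    Revision("Adding information", ["D"]),
    Revision("Applying correction and adding related links", ["B", "C"]),
]


def integrated_changes(history: list) -> set:
    """Union of the change labels carried by every revision in the history."""
    result = set()
    for revision in history:
        result.update(revision.changes)
    return result


print(sorted(integrated_changes(german_history)))  # ['A', 'B', 'C', 'D']
```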
Using this technique, all versions can evolve independently and still give translators enough information to bring their version up to date with changes coming from other languages. As a side effect, translators who do not understand English can easily get a list of all the other translations that integrated a given change, and see the differences applied to perform it.
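That lookup amounts to a reverse index over all histories. In this sketch, each version’s history is a list of per-revision change labels indexed by revision number; the order in which English received B and C is an arbitrary assumption, and `where_integrated` is a hypothetical helper. Each hit points to a revision whose diff against the previous revision shows how that version applied the change.

```python
# Hypothetical histories: version -> per-revision change labels (index = revision).
histories = {
    "English": [["A"], ["B"], ["C"]],
    "French": [["A"], ["C"]],
    "Spanish": [["A"]],
    "German": [["A"], [], ["D"], ["B", "C"]],
}


def where_integrated(change: str, histories: dict) -> list:
    """Return (version, revision number) pairs whose revision carries `change`."""
    return [
        (version, number)
        for version, history in histories.items()
        for number, labels in enumerate(history)
        if change in labels
    ]


print(where_integrated("C", histories))
# [('English', 2), ('French', 1), ('German', 3)]
```

A Spanish translator who reads French could open French revision 1 instead of the English one. German revision 3 integrates both B and C, so its diff mixes the two changes together.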
In the case of the correction made in two different places, it’s not really possible to determine automatically that the two changes are actually the same. In a perfect world where the second contributor to perform the change understood the first change, he could simply mark that change as integrated. If he doesn’t, some translator will eventually come into play and have to sort out the confusion. At that point, there needs to be a way to indicate that both changes are the same, and two options are possible: create a relation between the two changes to indicate they are equivalent, or delete the later one (or mark it as obsolete) and link its revision to the prior change.
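The first option, recording that two changes are equivalent, can be sketched with a union-find structure. This is a standard technique swapped in for illustration, not something the proposal prescribes; the `ChangeRegistry` class and the `C-en`/`C-fr` labels for the two independent corrections are hypothetical. Since the prior change stays canonical, this also behaves like the second option of marking the later change obsolete.

```python
class ChangeRegistry:
    """Hypothetical registry recording which change labels are equivalent."""

    def __init__(self):
        self.parent = {}

    def add(self, label):
        self.parent.setdefault(label, label)

    def canonical(self, label):
        """Return the representative label of the equivalence class."""
        root = label
        while self.parent[root] != root:
            root = self.parent[root]
        # Path compression: point every label on the way directly at the root.
        while self.parent[label] != root:
            next_label = self.parent[label]
            self.parent[label] = root
            label = next_label
        return root

    def mark_equivalent(self, later, prior):
        """Record that `later` is the same change as `prior`, which stays canonical."""
        self.parent[self.canonical(later)] = self.canonical(prior)

    def canonical_set(self, labels):
        return {self.canonical(label) for label in labels}


registry = ChangeRegistry()
for label in ["A", "B", "C-en", "C-fr"]:
    registry.add(label)

english = {"A", "B", "C-en"}
french = {"A", "C-fr"}

print(sorted(registry.canonical_set(english) - registry.canonical_set(french)))
# ['B', 'C-en']
registry.mark_equivalent("C-fr", "C-en")
print(sorted(registry.canonical_set(english) - registry.canonical_set(french)))
# ['B']
```

Before the relation is recorded, the French version appears to be missing the English correction even though it already made the same fix; afterwards, only B remains.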
Categorizing changes
In the table above, I used a label and a description to identify each change. The label was a simple way to refer to a change from the text, and the description gave meaning to the example. What do we really need to know about a change in order to integrate it? The type of change (correction, improvement or new content) could be helpful, but on its own it’s not enough to help translators identify the change they are trying to integrate from a diff of two revisions, especially if multiple changes are made in the same revision. In the example above, I considered that multiple changes could be integrated in a single revision, but I see no reason why two different changes couldn’t also be created in a single revision. Differentiating those changes requires a powerful diff tool to split them and create a unique entity for each one. If multiple changes are made in a single revision, the one thing you don’t want is for translators to confuse them and mark the wrong change as integrated.
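As a rough stand-in for such a diff tool, Python’s standard `difflib.SequenceMatcher` can split the diff between two revisions into separate hunks, each a candidate change the contributor could label. This is only a sketch: it assumes one hunk per change, which real edits will not always respect, and `split_into_hunks` and the sample lines are hypothetical.

```python
import difflib

old = ["Install the package.", "The default port is 8080.", "Run the server."]
new = [
    "Install the package.",
    "The default port is 8000.",
    "Run the server.",
    "Related links: FAQ",
]


def split_into_hunks(old: list, new: list) -> list:
    """Return each non-equal region of the line diff as a candidate change."""
    matcher = difflib.SequenceMatcher(a=old, b=new)
    hunks = []
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        if tag != "equal":
            hunks.append((tag, old[i1:i2], new[j1:j2]))
    return hunks


hunks = split_into_hunks(old, new)
# Hypothetical: the contributor assigns one change label per hunk.
for label, (tag, removed, added) in zip(["C", "B"], hunks):
    print(label, tag, removed, added)
# C replace ['The default port is 8080.'] ['The default port is 8000.']
# B insert [] ['Related links: FAQ']
```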
It seems like a description of the change is required to do this, but that brings a kind of recursive problem: in which language do you write that description? I don’t think there is an easy solution to this one. It would be possible to disable the ability to create multiple changes per revision, but since this is a collaborative environment, there is no way to make sure contributors will follow the guidelines. You could end up with single changes that actually contain multiple other changes, which would be a mess to track and integrate.
Now, how do you deal with faulty contributions? What if someone adds information that is completely wrong? There are many ways to deal with this. In a small or highly hierarchical community, you could have a person or committee approve changes before they are made visible to translators, or you could set up a rating system where users or translators indicate whether a change is good or not. A bad change would eventually have to be removed from the version it was added in and be marked accordingly.
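Both mechanisms act as a gate on which changes reach translators. The sketch below combines an explicit approval flag with a net vote score, although a real project would probably pick one; `ChangeReview`, `VOTE_THRESHOLD` and its value of 3 are hypothetical. Rejected changes drop out of the warnings entirely, while the version they were added in still needs to remove them.

```python
from dataclasses import dataclass


@dataclass
class ChangeReview:
    label: str
    votes: int = 0          # net score from users or translators
    approved: bool = False  # set by a reviewer or committee
    rejected: bool = False  # bad change, to be removed from its origin version


VOTE_THRESHOLD = 3  # hypothetical: net votes needed without explicit approval


def visible_to_translators(review: ChangeReview) -> bool:
    """Include a change in missing-information warnings only once it is trusted."""
    if review.rejected:
        return False
    return review.approved or review.votes >= VOTE_THRESHOLD


reviews = [
    ChangeReview("B", votes=5),
    ChangeReview("C", approved=True),
    ChangeReview("D", votes=1),
    ChangeReview("E", votes=4, rejected=True),
]
print([r.label for r in reviews if visible_to_translators(r)])  # ['B', 'C']
```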
In two years, nothing really changed in this field. It will probably take a long time before a real solution is applied. The main problem is that there is probably no single solution that fits all cases. Different project sizes and cultures will require different solutions. What I propose here is a technological solution, a way of organizing the content. Just like wikis, it will depend on how people use it. The management of translations is based on meta attributes added to the content and revisions to allow relations to be made, but if these relations are not applied correctly, there is no way to get it to work.
Wikis work because they are extremely simple to modify. Adding a system to manage translations makes them less easy to edit. Two years ago, I proposed tagging changes based on their types to adjust the notification level. This new proposition asks a lot more from users. Not only do they have to tag their changes with a type, they also need to describe the purpose of their changes and indicate which changes they integrate. It could probably be done without describing the changes, but I doubt integration would be as easy without these messages.
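To make that burden concrete, here is a hypothetical sketch of the metadata a contributor would attach when saving a revision: zero or more new changes, each with a type and a description, plus the labels of existing changes being integrated. The type values come from the categories mentioned earlier (correction, improvement, new content); `Submission`, `validate` and the error messages are illustrative assumptions.

```python
from dataclasses import dataclass, field
from enum import Enum


class ChangeType(Enum):
    CORRECTION = "correction"
    IMPROVEMENT = "improvement"
    NEW_CONTENT = "new content"


@dataclass
class NewChange:
    change_type: ChangeType
    description: str


@dataclass
class Submission:
    """Metadata a contributor would attach when saving a revision (hypothetical)."""
    new_changes: list = field(default_factory=list)
    integrates: list = field(default_factory=list)  # labels of existing changes


def validate(submission: Submission, known_labels: set) -> list:
    """Return a list of problems; an empty list means the metadata is usable."""
    problems = []
    for change in submission.new_changes:
        if not change.description.strip():
            problems.append(f"{change.change_type.value} change has no description")
    for label in submission.integrates:
        if label not in known_labels:
            problems.append(f"unknown change {label!r}")
    return problems


submission = Submission(
    new_changes=[NewChange(ChangeType.CORRECTION, "")],
    integrates=["B", "Z"],
)
print(validate(submission, known_labels={"A", "B", "C", "D"}))
# ['correction change has no description', "unknown change 'Z'"]
```

A revision with no metadata at all, such as a typo fix, passes validation, which keeps the common case as cheap as it is in a plain wiki.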