Summer report

For some reason I never quite understood, I always tend to be extremely busy in the summer, when I would much rather enjoy the fresh air and take it slow, and less busy during the winter, when heading out is less attractive. This summer was no exception. After the traveling, I started a new mandate with a new client, and that brought my busyness to a whole new level.

In my last post, I mentioned a lot of wiki-related events happening over the summer and that I would attend them all. It turned out to be an exhausting stretch. Too many interesting people to meet, not enough time, even in days that never seem to end in Poland. As always, I was in a constant dilemma between attending sessions, joining the Open Space or just starting spontaneous hallway discussions. There was plenty of room for discussion. The city of Gdańsk being not so large, at least not the tourist area in which everyone stayed, entering just about any bar or restaurant, at any time of the day, meant ending up at a table with another group of conference attendees. WikiMania did not really end until the plane landed in Munich, which apparently was the connection city everyone used, at which point I had to run to catch my slightly tight connection to Barcelona.

I know, there are worse ways to spend part of the summer than having to go work in Barcelona.

I came to a few conclusions during WikiSym/WikiMania:

  • Sociotechnical is the word chosen by academics to discuss what the rest of us call the social web or Web 2.0.
  • Adding a graph does not make a presentation look any more researched. It most likely exposes the flaws.
  • Wikipedia is much larger than I knew, and they still have a lot of ambition.
  • Some people behind the scenes really enjoy office politics, which most likely creates a barrier with the rest of us.
  • One would think open source and academic research have close objectives, but collaboration remains hard.
  • The analyses researchers perform lead to fascinating results.
  • The community is very diverse, and Truth in Numbers is a very good demonstration of it for those who could not be there.

As I came back home, I had a few days to wrap up projects before getting to work for a new client, all of which had to happen while fighting jet lag. I still did not find the time to catch up with the people I met, but I plan to.

One of the very nice surprises I had a few days ago is the recent formation of Montréal Ouvert (the site is also partially available in English), which held its first meeting last week. The meeting looked like a success to me. I’m very bad at counting crowds, but attendance seemed to be somewhere between 40 and 50 people. Participants came from various professions and included some city representatives, which is very promising. However, the next steps are still a little fuzzy, and how one may get involved is unclear. Still, the organizers seemed to have matters well in hand. There will likely be some sort of hack fest in the coming weeks or months to build prototypes and make the case for open data. I don’t know how related this was to Make Web Not War a few months prior. It may just be one of those ideas whose time has come.

I also got to spend a little time in Ottawa to meet with the BigBlueButton team and discuss further integration with Tiki. At this time, the integration is minimal because very few features are fully exposed. Discussions were fruitful, and a lot more should be possible with version 0.8, now in development. Discussing the various use cases showed that we did not approach the integration using the same metaphor, partially because it is not quite explicit in the API. The integration in Tiki is based on the concept of rooms as permanent entities that you can reserve through other mechanisms, which maps quite closely to how meeting rooms work in physical spaces. Their intended integration was mostly built around the concept of meetings happening at a specific moment in time. Detailed documentation cannot always convey the larger picture.
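To illustrate the difference between the two metaphors, here is a minimal sketch. It assumes a purely hypothetical meeting-style backend; none of the names below come from the actual Tiki or BigBlueButton code.

```python
from dataclasses import dataclass

# Hypothetical sketch only: a persistent "room" (the metaphor used in Tiki)
# mapped onto a meeting-oriented backend (the metaphor implied by the API).
# None of these names come from the real Tiki or BigBlueButton code.

@dataclass
class Room:
    """A permanent entity that can be reserved, like a physical meeting room."""
    room_id: str
    name: str

class MeetingBackend:
    """Stand-in for a meeting-based API: a meeting exists for a specific moment in time."""
    def __init__(self):
        self.active = {}

    def create_meeting(self, meeting_id: str, name: str) -> None:
        # A meeting is created when it starts and disappears when everyone leaves.
        self.active[meeting_id] = name

    def is_running(self, meeting_id: str) -> bool:
        return meeting_id in self.active

def join_room(room: Room, backend: MeetingBackend) -> str:
    # The room abstraction hides the meeting lifecycle: joining a room that has
    # no running meeting transparently creates one, so the room behaves like a
    # permanent space even though the backend only knows about meetings.
    if not backend.is_running(room.room_id):
        backend.create_meeting(room.room_id, room.name)
    return f"joined the meeting for room '{room.name}'"
```

The point is simply that a permanent room can be emulated on top of a meeting-oriented API by creating the meeting lazily when someone shows up.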

Upcoming events

This summer, I will have my largest event line-up around a single theme, and none of it will be technical! It will begin on June 25th with RecentChangesCamp (RoCoCo, to give it a French flavor) in Montreal. I first attended that event the last time it was in Montreal, and again last year in Portland. It’s the gathering of wiki enthusiasts, developers, and pretty much anyone who cares to attend (it’s free). The entire event is based around the concept of Open Space, which means you cannot really know what to expect. Both times I attended, it had a strong local feel, even though the event moves around.

Next in line is WikiSym, which will be held in Gdańsk (Poland) on July 7-9. I have also attended it twice (Montreal in 2007, Porto in 2008), but missed last year’s edition in Orlando due to a schedule conflict. WikiSym is an ACM conference, making it the most expensive wiki conference in the world (still fair, by other standards). Unlike the other ones, which are more community-driven, this one comes from the academic world (you know it when they refer to you as a practitioner). Most of the presentations are actually paper presentations. Because of that, attending the presentations themselves is not so valuable, since the entire content is handed to you as you arrive. It’s much better to spend the time chatting with everyone in the now-traditional Open Space. It really is a once-a-year opportunity to meet people from all over the world who have spent years studying various topics around wikis. The local audience is almost absent, although the event tends to go to places where there is at least a small scientific wiki community.

The final stop will be WikiMania, at the exact same location as the previous one, running until July 11th. I really don’t know what to expect there; I have never attended the official Wikimedia conference. However, it has a fantastic website with tons of relevant information for attendees, which probably has something to do with it being an open wiki edited by Wikipedia contributors.

I will then head to Barcelona for a mandatory TikiFest. However, I don’t really consider this to be part of the line-up, as it’s mostly about meeting with friends.

That makes three events on wikis and collaboration. Wikis being the simplest database that could possibly work, what could require 8-9 days on a single topic? It turns out the technology does not really matter. Just as with software, the writing itself is not hard; getting many people to do it together is a much bigger challenge. Organizing the content to suit the needs of a community is a challenge of its own. Because the structure is so simple, it puts a lot of pressure on humans to link it all together, navigate the content and find the information they are looking for.

Wiki translation now in motion

This week, I attended WikiSym in Montreal. This was completely unexpected: due to the heavy constraints on my schedule, I didn’t think I would be able to attend. For some reason, I ignored all the constraints and went anyway. Those same constraints catching up with me explain why it took me so long to post about this.

Those who have been following my blog for a while know how much interest I have in collaborative translation. The topic has been coming back year after year. Until now, there were only vague ideas. At WikiSym, it became a real project: not only a development effort, but actual needs and real-life scenarios. A website is up to document the various projects related to wiki translation. At this time, there is very little information, but the amount will grow considerably by the end of April 2008. Yes, there is also a time frame: I will work on the project as part of the final project of my degree.

During WikiSym, there was quite a lot of attention focused on translation. While not everyone is interested, those who are see it as a critical problem in their use of wikis. There are many cases where content has to be translated, and depending on the context, different types of translation are required. During the open spaces and various other hallway discussions, the following situations came up (some may be missing; my memory isn’t so great):

  • Documentation project: Almost every major open source project now uses a wiki for user documentation purposes. It makes it easier for people to collaborate and removes some burden from the developers. At some point in the project’s life, people start requesting documentation in their own language. In this situation, everything needs to be translated, and the translations are likely to have a very similar content structure.

    Some projects are more structured than others and might have dedicated translators. If they do, they will probably opt for a master version replicated to the other versions. Since the documentation is a wiki, there is no way to prevent people from contributing in their language of choice. Changes by visitors will have to be replicated in all other languages, but since there is a dedicated staff, the change can simply be added to the master version and the others will follow.

    Smaller projects are likely to have unsynchronized versions. Due to the lack of coordination and resources, this is the kind of chaos we have to live with. In this situation, visual indications for visitors are important: if the content is out of date, alternatives must be proposed. Indications could also be used to invite visitors to participate in the translation.

  • Government: In bilingual countries or regions, government organizations are often bound to translate all content. A similar situation can also occur in research facilities or large multinational companies with similar policies. In those cases, the content is developed by a single group of people and translated once it is complete. This is a very typical master document case.

  • Marketing information: Some product marketing teams develop content using wikis. In this case, the translation may not reflect the original version: content is likely to be localized to the target culture. If the information contains case studies, those case studies will need to be adapted to the region. Some general information needs to be translated and kept in sync, but some of it is irrelevant to the translators, and not every change should trigger a translation process. Since the content will be different in every translation, the structure of the content itself cannot be used to highlight the changes required. The sketch after this list shows one way these differences could be expressed as policy choices.
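As a rough illustration, the scenarios above could be described as combinations of a few policy choices. This is just a thinking aid with invented names, not anything that exists in TikiWiki:

```python
from dataclasses import dataclass
from typing import Optional

# Invented names, for illustration only: each scenario discussed above becomes
# a combination of a few policy choices.

@dataclass
class TranslationPolicy:
    open_editing: bool              # can visitors contribute in any language?
    master_language: Optional[str]  # None means no single master version
    keep_identical: bool            # must every change eventually reach every language?

documentation_large = TranslationPolicy(open_editing=True,  master_language="en", keep_identical=True)
documentation_small = TranslationPolicy(open_editing=True,  master_language=None, keep_identical=True)
government          = TranslationPolicy(open_editing=False, master_language="en", keep_identical=True)
marketing           = TranslationPolicy(open_editing=True,  master_language=None, keep_identical=False)
```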

Project objectives

In order to be truly successful, the project will have to accommodate all of the above. In a perfect world, it would also accommodate the situations we do not know of, but there is no way to verify that. It should also be possible to use the system The Wiki Way: it should be simple and have as little overhead as possible.

The main objective of the project will be to add the required mechanisms and interfaces in TikiWiki to support true synchronized multilingual content. As a collaborative project, a secondary objective is to document the effort to help other projects that would like to incorporate such features. Both successes and failures will have to be documented.

Wiki translation is not only about synchronization of content and content management issues. It’s also about collaboration in building translation databases. While this aspect is not quite in the scope of my project, it will still affect it. Creating a unified interface to access translation databases, dictionaries and automatic translation tools could also be required.

Synchronization technicalities

Right now, most wikis do not support translation at all. In the best cases, they can recognize pages in another language as their equivalent. Figure 1 demonstrates how pages then evolve independently, just like on Wikipedia. There is no way for visitors to see if a page is up to date, and it’s up to the maintainers of the other versions to make sure new content is incorporated.

Figure 1 : No synchronization

Once you have identified the need for translation synchronization, the simplest way to achieve it is to use a master version paradigm. It is very common in controlled environments, where it’s possible to ensure that all contributions are made to the master version. At given milestones, the translations can be updated from a selected revision of the master. Figure 2 shows a simple representation of the model.

Figure 2 : Master version paradigm
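A minimal sketch of the idea, purely illustrative and not Tiki code: translations record which master revision they were last updated from, and a milestone update simply replaces their content and bumps that marker.

```python
# Illustrative sketch of the master version paradigm (not Tiki code).
# All contributions go to the master page; translations are refreshed
# from a chosen master revision at given milestones.

class MasterPage:
    def __init__(self, content: str):
        self.revisions = [content]           # revisions 0, 1, 2, ...

    def edit(self, new_content: str) -> int:
        self.revisions.append(new_content)
        return len(self.revisions) - 1       # revision number of the new edit

class Translation:
    def __init__(self, language: str):
        self.language = language
        self.content = ""
        self.translated_from = None          # master revision this was updated from

    def update_from_master(self, revision: int, translated_text: str) -> None:
        # At a milestone, a translator takes the selected master revision,
        # translates it, and records which revision the translation reflects.
        self.content = translated_text
        self.translated_from = revision

    def is_up_to_date(self, master: MasterPage) -> bool:
        return self.translated_from == len(master.revisions) - 1
```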

The primary flaw of the master version paradigm is that it does not apply at all in a collaborative environment: it’s not possible for people to contribute to the content in their language of choice.

Once you refuse to limit editing to a single master version, the first thing that comes to mind is to determine which revisions are equivalent across the different languages. The basic idea is to establish pairs of equivalent versions in the timeline. Figure 3 presents such a model.

Figure 3 : Equivalence model
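A sketch of the bookkeeping this implies, with invented names: equivalence is a symmetric link between one revision in each language.

```python
# Illustrative sketch of the equivalence model: pairs of revisions in different
# languages are declared equivalent at some point in the timeline.

equivalences = set()

def mark_equivalent(lang_a: str, rev_a: int, lang_b: str, rev_b: int) -> None:
    # The link is symmetric, so it is stored as an unordered pair. Declaring it
    # claims both pages carry exactly the same information at these revisions.
    equivalences.add(frozenset({(lang_a, rev_a), (lang_b, rev_b)}))

def are_equivalent(lang_a: str, rev_a: int, lang_b: str, rev_b: int) -> bool:
    return frozenset({(lang_a, rev_a), (lang_b, rev_b)}) in equivalences

# Example: the English page at revision 3 is declared equivalent
# to the French page at revision 5.
mark_equivalent("en", 3, "fr", 5)
```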

The concept seems easy enough to represent, but in reality, it’s much harder to apply. In most cases, the translator would need to update both pages in the pair to fully merge the changes made on each side before declaring the two pages equivalent. This requires the translator to be proficient in both languages and doubles the effort. It’s also wrong in another way: if the content is not meant to be identical, as in the marketing scenario, the indication that the pages are synchronized is misleading.

This brought me to a much simpler concept: change integrations are directional. In fact, they are very much like branch merging in most revision control systems. The person merging changes from another language does not need to push their own changes back to the language they are merging from. The pages are not equivalent; the target is simply at least as good as the source revision it merged from. Figure 4 presents a representation of the interactions between different languages. Due to the large number of line crossings, this model can be complex to understand, and an important part of the work required will be to analyze the data and expose something meaningful to the user.

Figure 4 : Branch merging

In the above image, the French and English versions have been exposed to the same changes. They both include all the changes made over their own history plus the information added in Spanish version 1. The Spanish version has a life of its own: it only includes the changes from French version 1 and English version 2.

An important concept to keep in mind is change propagation. Spanish translators do not need to understand French: as long as someone translating to English does, the changes from the French version will eventually get incorporated into the Spanish version. Change propagation must be tracked to make sure no pages are flagged as incomplete when they actually already contain the information.
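Here is a sketch of how such directional merges could be recorded, with hypothetical names rather than the actual TikiWiki implementation. Each merge notes that a target revision incorporates everything up to a given source revision, and the set of changes a page contains is obtained by following those records transitively, which is exactly what lets French changes reach Spanish through English.

```python
from collections import defaultdict

# Hypothetical sketch, not the actual TikiWiki implementation. A merge record
# states that a target revision (language, revision) incorporates everything
# up to a source revision (language, revision).

merges = defaultdict(list)   # (target_lang, target_rev) -> list of (source_lang, source_rev)

def record_merge(target, source):
    merges[target].append(source)

def incorporated(page):
    """All (language, revision) pairs whose changes this page contains,
    following merges transitively so that propagation is accounted for."""
    seen = set()
    stack = [page]
    while stack:
        lang, rev = stack.pop()
        # A revision contains everything from earlier revisions of the same page...
        for r in range(rev + 1):
            if (lang, r) not in seen:
                seen.add((lang, r))
                # ...plus everything merged into those revisions, recursively.
                stack.extend(merges[(lang, r)])
    return seen

# The propagation example from the text: English merges from French,
# then Spanish merges from English, so French changes reach Spanish indirectly.
record_merge(("en", 2), ("fr", 1))
record_merge(("es", 1), ("en", 2))
print(("fr", 1) in incorporated(("es", 1)))   # True: Spanish is not flagged as missing it
```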

Due to the volunteer nature of some of the translation work, it might be required to support partial merges. If large changes were made, chances are that a volunteer translator won’t translate them all in a single effort. There is no real way to quantify how much of a change was incorporated, but the partial merge could be used to help subsequent translators figure out what was done and what is left to do.
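One possible way to note a partial merge, again purely hypothetical: instead of the bare pairs used in the previous sketch, the merge record could carry a completeness flag and a free-form note for the next translator.

```python
from dataclasses import dataclass

# Hypothetical extension of the merge record for partial merges: a flag plus a
# free-form note telling the next translator what remains to be done.

@dataclass
class MergeRecord:
    source: tuple        # (language, revision) merged from
    target: tuple        # (language, revision) produced by the merge
    complete: bool       # False when only part of the changes were translated
    note: str = ""       # hint for the next translator

partial = MergeRecord(source=("fr", 3), target=("es", 4), complete=False,
                      note="introduction translated, examples still missing")
```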

There are probably a few corner cases that cannot be taken care of, but I think the branch merging model can handle most situations.