L-P Huberdeau


City of Collaboration

Posted in General by Louis-Philippe Huberdeau on the September 17th, 2008

As I write this post, I am in the airport, leaving Porto after a long, thought-provoking week. For a few seconds, I thought Porto airport was the best in the world. Large advertisements indicate free wireless access. Filtered water is provided. I figured it would be a good place to spend the time I have to wait, which is longer than it should be as I left for the airport with Alain, whose flight was a few hours before mine. Plug in, boot the computer. No wireless. Great. At least I have water, which still makes it better than any other airport I have seen.

My trip to Porto was divided into three parts of equal length: TikiFest, WikiSym and tourism. Three days each.

TikiFest

TikiFests seem to multiply themselves. Any occasion is good to meet with the other developers and it’s always a great time. This edition did not have any particular theme from the start. A lot of us were planning to come and some were going to make it a last minute decision. Rather than booking several hotel rooms, which was hard due to other events in the city at the same time, Marc Laporte took the initiative of renting a house. The WikiHouse. It was probably the best accommodation we could think of. It had plenty of space for all of us to sleep, a yard to enjoy the sun, a dining room to work in, a living room to hold more discussions and a barbecue.

Living together changed something about the relationships. Even though most of us had met before, being together 24 hours a day got us to learn many aspects usually hidden. Among others, that Xavi is an amazing cook. In developer events, we tend to eat lots of pizza and burgers. This time, we had fresh salmon and octopus.

I can’t say much work got done. For one, I did not reach any of my personal objectives, but no time was wasted. Most of the time was spent discussing various community aspects. Conveniently, Martin Cleaver was living with us and brought a camera. Multiple demos were filmed and, hopefully, they will help remote community members who cannot travel to TikiFests put faces to the names in IRC. Matthew did the post-editing and organized TikiTV in a brilliant and lovely fashion.

The entire meeting was mostly freeform. Just like in a wiki, everyone contributed a part to the house and made it fun to live in. Most would usually go to sleep very late. Those of us from eastern Canada did not even have to change our sleeping habits to match the time zone change. Discussions took place when they had to. People working could ask questions at any time. All household tasks would happen without anyone ever discussing them.

WikiSym

WikiSym is a special event in the sense that it gathers academics, vendors, open source project contributors and various consultants together. It really means that most discussions included people formally dressed and people wearing T-Shirts. It also mixes people from all over the world. According to the numbers I heard, just over 100 participants came from 21 different countries. The formal program is only half of the event. The rest of it is the open space where ideas can be shared and discussed. I have been torn between both for the whole duration.

The venue was terrific. I usually have serious doubts about conferences held in universities. It usually is very complicated to go from the meeting rooms to the place to get lunch. The different meeting rooms are usually spread around to accommodate the normal university schedule and most of it is usually sterile. This one was nothing like it. OK, we did have some trouble finding the meeting rooms, but the main meeting room was on ground level, accessible from outside. We could hold most of the discussions sitting outside in the grass. Certainly, it took a lot away from the formal aspects. Students walking around seemed confused to see “international experts” (the event was advertised in prominent spaces) being so open.

My primary motivation for visiting WikiSym was to present my paper on multilingual collaboration. This was actually the excuse to fly to Porto. In fact, I had felt like attending WikiSym again ever since it ended last year in Montreal. The presentation part went fairly well. I had a lot of people asking me for additional information after the presentation, so I guess that was good. I also participated in the BabelWiki workshop on similar topics. However, I missed part of it due to indigestion, food poisoning or whatever else caused my system to crash and prevented me from walking, talking and thinking altogether. That really was the only bad part of the whole trip. Even there, the organizers and the tiki community were so helpful that it almost made it a good moment.

For the rest of WikiSym, I attended a few open space sessions, some tutorials and some research paper presentations. I felt that this year did not bring as many ideas as the previous one did. I have the feeling that it’s mostly because last year brought so many ideas that all of us did our homework and came to present what we came up with. Many of the discussions were about the status quo rather than searching for the next step. It’s not really a bad thing. There were many discussions about how we can better work together to share those results. Especially in the area of data analysis, multiple tools were developed with very similar objectives and capabilities over the year. It’s sad to see so much effort being wasted.

The sub-field of application wikis seems to be the only one where significant activity is still going on in terms of technology. Most of WikiSym is about social sciences rather than software. However, application wiki as a term is so loosely defined that discussions around it are almost sterile. Some see it as a way to plug together components to display data. Some see it as a way to store semi-structured data. Both are true, but the camps are so far apart that it would take a lot more than two hours to bring the discussion to a creative position. Another aspect is that both need a significant amount of work to reach. Promoters of both sides have invested a lot in getting where they are.

My feeling is that they are both right. Using the wiki as a data store and using it as a mash-up front end both make sense. They can co-exist. Sure thing, if the data is not already managed by an external system, using the wiki to store it makes sense. Using components to mix and match external data and data from the wiki could create a compelling solution. Based on the new Runs Everything slogan of Tikiwiki, I guess we will go for both sides.

I will try to attend the 2009 edition in Orlando. I hope we will have the base definitions grounded by then to get better discussions.

Tourism

I had reserved the last three days of the trip for tourism. It turns out my definition of tourism is quite different from others. To me, it’s mostly about living in a different city. I spent my first day in the house. An unexpected visitor, Peter B. Meyer, joined us for a night after the conference because he couldn’t find a hotel to stay in. It turns out it created great discussions around open source economics, software licences and politics too. My original goal for that day was to wake up really late, stay in the house, and see the night life in Porto. The first one I did, not the second. On the second day, I woke up late again, and waited for amette to wake up (for some reason, he seems to live on the PST timezone), then we went out for some drinks.

On the third day, it was mostly quiet. Almost everyone had left. The remaining residents had gone on a tour to visit the regions outside the city. Not quite my thing, and they were gone by the time I woke up. Additionally, I figured I had done enough of the tourist stuff with the conference program. On Tuesday night, a tour bus picked up everyone from the university for a typical photo-taking tour around the city. Then we went for Port wine tasting, a boat ride on the river and a meal in a traditional restaurant. It was a good way to see the city, but I had had enough after that. I decided to do one of my favorite activities: walk around with my backpack, containing a pen, a pad of paper and a book. I went downtown and walked around the city, stopping for a coffee or some soft drinks whenever I felt tired or stormed by ideas. Very relaxing and a good way to appreciate the city.

One of the very important aspects of Porto is that it is built along the river and was most likely chosen as a location because it’s very easy to defend. The city contains very steep roads. Walking around can be exhausting. Luckily, there are plenty of cafés to rest in. The city looks very old and worn out when walking in the streets. Other than the orange-ish tiles on the roofs, the buildings do not seem to share a common look. Other than the obvious monuments, none of the buildings is much worth looking at in detail, especially when you move away from the tourist areas. However, when seen from Gaia, the city on the other side of the river, Porto seems to be a complete whole and is truly magnificent. The whole is much greater than the sum of the parts. Could we speak of accidental design? Uncoordinated collaboration?

Airport Adventures

Quite some time has gone by…

I knew there was no way two connections could go without problems. Newark airport was a mess. First, it took around 15 minutes before the plane could dock and let us out. My connection was already short. To help matters, the situation at the border was simply horrible. There were announcements being made, but with the noise, I couldn’t catch them all. Something about their system being down. Lines were all mixed up. Just as I got close to the booth to say “nothing to declare, just in transit”, the few people in front of me in line seemed to have a complex situation. Forms not filled out properly, if at all. Most of them got turned back after a few minutes. My connection was really close.

I thought I was good to go. All I had to do was change terminals, figure out the gate number and go through security. It turns out I had to pick up my luggage, get it through the border and send it back for transfer. At least they didn’t check me. Why does it have to be so complicated?

So I ended up switching terminals, finding the gate and heading for security. Worst line-up I had ever seen. It was circling around the duty-free stores. I was certain to miss my flight. With a little hope left, I made my way through the line. Ran a little. Got to the gate at 16:25. That could have been enough, but the plane had just left.

Got the ticket changed. Four marvellous hours to wait in an airport without wireless access. Best part was: my luggage never reached the plane anyway, so I would have had to wait 4 hours in Montreal to get it even if I had caught the plane.

Security in the wild

Posted in General, Programming by Louis-Philippe Huberdeau on the September 5th, 2008

Wikis are open in nature and it’s what brought them to success. Anyone can visit the page, edit it and see the changes live. The concept is really simple and became natural very fast. It’s all around. However, wiki applications evolved over time. The average wiki no longer is just text editable by anyone. They became the heart of complete content management systems with access rights and many other features. Wiki purists cringe when they hear of a wiki that is not editable by anyone. The corporate world does the same when they hear that their intranet could be modified by any employee.

In a standard wiki, the worst thing that can happen is that someone gets offended by false information (or simply offensive spam). Undo last change. The world goes on.

As wikis evolved, usage called for higher level functionality. Pages are no longer only textual information; they tend to become full-blown applications. They can generate dynamic lists and interact with external systems. This is done mostly through a syntax extension often called a plugin. The concept is very simple. A unified syntax contains a name and some arguments. When the parser runs into it, it calls a custom function and displays the result. In most cases, these will perform harmless operations and cannot cause any damage. All they do is display text, only text that is a little more complex.
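The mechanism described above can be sketched in a few lines of Python. This is only an illustration: the `{NAME(key=value)}body{NAME}` syntax, the `render` function and the two sample plugins are all made up for this example and are not Tikiwiki’s actual parser or plugin API.

```python
import html
import re

# Hypothetical syntax, for illustration only: {NAME(key=value, ...)}body{NAME}
PLUGIN_RE = re.compile(
    r"\{(?P<name>[A-Z]+)\((?P<args>[^)]*)\)\}(?P<body>.*?)\{(?P=name)\}",
    re.S,
)


def parse_args(raw):
    """Turn 'a=1, b=2' into {'a': '1', 'b': '2'}."""
    args = {}
    for pair in raw.split(","):
        if "=" in pair:
            key, value = pair.split("=", 1)
            args[key.strip()] = value.strip()
    return args


def plugin_upper(body, args):
    # Harmless plugin: only transforms text, and escapes it before output.
    return html.escape(body.upper())


def plugin_repeat(body, args):
    return html.escape(body) * int(args.get("times", "1"))


# The registry maps a plugin name to the custom function the parser calls.
REGISTRY = {"UPPER": plugin_upper, "REPEAT": plugin_repeat}


def render(text):
    """Replace every known plugin call in the text with the plugin's output."""
    def dispatch(match):
        handler = REGISTRY.get(match.group("name"))
        if handler is None:
            return match.group(0)  # unknown plugin: leave the source untouched
        return handler(match.group("body"), parse_args(match.group("args")))

    return PLUGIN_RE.sub(dispatch, text)
```

Calling `render("Hello {UPPER()}world{UPPER}!")` hands the body `world` to `plugin_upper` and splices the result back into the page text.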

The problem is that they can be used for a whole lot of things, and harmless really is context-dependent. Consider a situation where content must be displayed from another web application, probably a legacy intranet application. One way to do it is to get the server to fetch the HTML page, filter out some of the tags so it fits nicely in the page and display it. This technique is very vulnerable to content format changes and is quite hard to configure for normal people. An easier way would be to use an iframe and just load the page from wherever it is.

In a corporate setting, this probably works great because you can trust the people you work with not to screw up and load something they shouldn’t on the intranet’s home page.

If you want to use it on a public website where all edit rights are restricted, everything is fine. However, if you have a single page that allows public edits, you have just opened up a very wide security gap that could allow sub-script-kiddies (talking about the kind of people who “hack” pages on Wikipedia) to hijack sessions through XSS.

The main issue is that these extensions are either installed or not. You could use one at some point in a completely safe environment, stop using it, and then change the context which made it safe. The extension is still active and you forgot about it. It’s installed site-wide. There is no way to enable it just on specific pages that are controlled. Because the plugin instantiation is part of the page’s content, you can’t prevent anyone with edit rights on a page from using it.

In implementing remote plugins, this was a major issue. Not only is it a plugin that can potentially do harm, it also calls plugins I don’t even know about. I had this vague idea of requiring input validation on the remote plugins before letting them run, so that nothing could be called unless an administrator granted permission. All of it was fairly complicated because of implementation issues. During a discussion on IRC with sylvieg and ricks99, I realized that the problem existed beyond the remote service problem. Until then, I had simply considered that if the context wasn’t safe, some extensions should not be installed. Rick was asking if there was a way to let admins add a plugin, but not anyone else. This got me to realize that the only reason it was hard to implement is that I was approaching the problem at the wrong level. Applying the validation at the plugin-wide level made it much easier to deal with than if I did it specifically for the remote ones. It also added a whole lot more value.

The final implementation is very simple in the end. When an extension can be dangerous, it declares it as part of the definition by identifying which parts require validation (body, arguments or both). When the wiki parser encounters a plugin that requires validation, it generates a fingerprint of the plugin and verifies whether that fingerprint is known. If it is, it goes on; otherwise it displays controls on the page for authorized users to perform the audits (non-authorized ones get an error message). The fingerprint is nothing more than the name of the plugin, a hash of the serialized arguments, a hash of the body and the size of both inputs to avoid collisions. Some arguments marked as safe can be excluded from the hash to allow some flexibility.
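To make the fingerprint idea concrete, here is a rough Python sketch of the check described above. It is not the Tikiwiki code: the function and class names are invented for this example, and the choice of SHA-256, JSON serialization and a dash-separated string are assumptions, since the post does not say which hash or serialization the real implementation uses.

```python
import hashlib
import json


def fingerprint(name, args, body, validate="both", safe_args=()):
    """Build a fingerprint for one plugin call.

    validate:  which parts need approval: "body", "arguments" or "both".
    safe_args: argument names that are left out of the hash.
    """
    if validate not in ("body", "arguments", "both"):
        raise ValueError("validate must be 'body', 'arguments' or 'both'")

    checked_args = {}
    if validate in ("arguments", "both"):
        checked_args = {k: v for k, v in args.items() if k not in safe_args}
    checked_body = body if validate in ("body", "both") else ""

    # Assumption: JSON with sorted keys as serialization, SHA-256 as hash.
    args_bytes = json.dumps(checked_args, sort_keys=True).encode("utf-8")
    body_bytes = checked_body.encode("utf-8")
    return "-".join([
        name.lower(),
        hashlib.sha256(args_bytes).hexdigest(),
        hashlib.sha256(body_bytes).hexdigest(),
        str(len(args_bytes)),  # sizes are included alongside the hashes
        str(len(body_bytes)),
    ])


class PluginAudit:
    """Keeps the white list of approved fingerprints (in memory here)."""

    def __init__(self):
        self.approved = set()

    def approve(self, value):
        self.approved.add(value)

    def decide(self, value, can_audit):
        """Return what the parser should do with this plugin call."""
        if value in self.approved:
            return "run"
        return "show_audit_controls" if can_audit else "show_error"
```

With `width` marked as safe, an editor could resize an approved iframe without triggering a new audit, while any change to `src` would produce a different fingerprint and block the plugin until someone with the right to audit approves it.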

The end result is that any plugin can be enabled on any host in any context and the site’s users are still safe from XSS attacks. More capabilities for the public/open wikis. Of course, because of Tikiwiki policies, validation can be disabled, which is useful if you have one of those safe contexts.

It does have a downside though. Validation is required whenever changes are made to the plugin, which means the page is not fully enabled until an auditor has visited it, which may take some time. Notifications, tracking, … There are solutions, but viewing the changes as soon as you click save is no longer possible. The white list verification is a pessimistic approach to the problem, but it’s still better than letting a few identities be stolen until it’s caught.

The implementation is available in tikiwiki svn and will be released as part of 3.0 in April 2009.

Reaching for simplicity

Posted in Programming by Louis-Philippe Huberdeau on the September 1st, 2008

In designing a solution, it’s always a good thing to check out different options. In many cases, problems can be solved with a complete hack or complete gold plating. Both are terrible, but it’s important to visit those options and try to find a middle ground.

Where does your average solution fit in?

  1. 2 hours
  2. 2 days
  3. 2 weeks
  4. 2 months
  5. 2 years
  6. never made it to release

The time frames definitely depend on the technologies and application domains. I personally like the 2 days to 2 weeks range. If I can’t get a proof of concept and a base architecture in 2 days, the design is probably bad. If it can’t be completed in 2 weeks, it probably could be simplified even more.

Everything I worked on this year fits in this range. Short, high impact, high value, fun. I just hate wasting time on long projects. I may just be short sighted, but I like to see results fast.

There are probably cases where a more polished solution than what can be made in two weeks is required, but these should be exceptions. If you are to embark on a long project, make sure it’s for the right reasons. Make sure you explored the lightweight solutions from the lower scales first and that the benefits you get from the better solution are worth the 5x cost increase.

Is the only reason you feel like going up the scale that it would be fun to use new cool technologies? Go back to the academic architecture guidelines: what are your desired quality attributes? Do you need that much extensibility? Is performance so critical? Put on the executive hat: How much is it really worth? What could be sacrificed to fit the budget and bring the most value?

A little over a month ago, Nelson pointed me to Deki Extensions. The really nice thing about them is that they can be used to call webservices and really facilitate writing extensions to the wiki syntax. Tikiwiki already has plugins, which are somewhat the same concept as extensions, but they don’t allow webservice calls. The big advantage of such remote plugins is that they allow content from external systems to be integrated really nicely without having to modify the code base. As a use case, think of loading bug tracking information from BugZilla as part of a wiki page to complement the discussion.

There were really two opposite solutions to this one:

  • Write a webservice plug-in to do an HTTP request and dump the output on the page (2 hours)
  • Support the Deki Extensions altogether

Deki Extensions are amazing. The problem is that to support them, you basically need to support the DekiScript language that runs in the wiki page and emulate their environment. There may also be legal issues. Are we even allowed to support it? After implementation, we would always have to play catch-up as they evolved the specifications. Then would come incompatibilities, and we would have to make sure all extensions out there are supported. Implementation would be long and painful.

The webservice plugin would do the job, but it really isn’t elegant at all and it’s completely unsafe as far as XSS goes. Not really useful in the end unless you fully trust all potential contributors to a page. Did I mention this is to run in a wiki? This solution is completely useless.

Something decent has to be somewhere in the middle. Let’s break down Deki Extensions and see what they are all about:

  1. A way to embed special content in a page
  2. Remote execution through a custom exchange format
  3. Possibly structured data output to be manipulated by local, user-defined, execution
  4. A registry to map remote services to local “function” names

Broken down that way, it looks a lot simpler. We already have an architecture to run custom code in a page called plugins. There are multiple standardized exchange formats out there, like JSON and YAML; we don’t really need DekiXML. A language to manipulate output really looks like a template engine. There are quite a few of those out there that can provide the necessary sandbox. The registry is really not complicated.

It does seem like it can be brought down to my preferred project size range by using existing components, which also has the side effect of reducing risk manyfold. It also starts to shape up into a standard exchange format, doesn’t it?
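To show how small the pieces from that breakdown can be, here is a rough Python sketch that wires them together: a registry, a JSON exchange format and a template for the output. Everything in it is hypothetical: the `bug` plugin name, the `bugs.example.org` URL, the field names and the use of Python’s `string.Template` as a stand-in for a real sandboxed template engine are assumptions for illustration, not the Tikiwiki implementation or the BugZilla API.

```python
import html
import json
from string import Template
from urllib.parse import quote
from urllib.request import urlopen

# Hypothetical registry: local plugin name -> remote service and output template.
REGISTRY = {
    "bug": {
        "url": "https://bugs.example.org/service?id=$id",
        "template": "Bug $id: $summary ($status)",
    },
}


def http_fetch(url):
    """Default transport: a plain HTTP GET returning the response text."""
    with urlopen(url, timeout=5) as response:
        return response.read().decode("utf-8")


def run_remote_plugin(name, args, fetch=http_fetch):
    """Call the remote service registered under `name` and render its output."""
    entry = REGISTRY[name]

    # URL-encode the arguments before placing them in the service URL.
    encoded = {key: quote(str(value), safe="") for key, value in args.items()}
    url = Template(entry["url"]).substitute(encoded)

    # The exchange format is plain JSON; a flat object is expected here.
    data = json.loads(fetch(url))
    if not isinstance(data, dict):
        raise ValueError("remote plugin must return a JSON object")

    # Escape every value so remote content cannot inject markup in the page.
    escaped = {key: html.escape(str(value)) for key, value in data.items()}
    return Template(entry["template"]).safe_substitute(escaped)
```

The `fetch` parameter exists so the transport can be swapped out, for instance with a canned response when testing. A template that only substitutes values cannot run arbitrary code, which is the property the sandbox needs; a real template engine would add loops and conditions on top of that.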