L-P Huberdeau


Read ahead

Posted in Programming by Louis-Philippe Huberdeau on the November 20th, 2008

PHP 5.3 is going to be fun. I personally didn’t really follow the news on this one. I heard of lambda/closures and that pleased me. I just always hated defining functions for callbacks used only once. Not that it takes time, but I always felt forced to create a ridiculously long name for it to avoid any possible conflicts. Those times will be over… at least from the moment it will be deployed enough to justify forcing 5.3 as a requirement.

One fact that I find really interesting about the implementation is that it added functors to PHP. Basically, you can now create functions that accept callbacks as parameters, the callback being either an object or a plain function, without too much trouble. Just like the STL does.

<?php

class Foo {
	function __invoke() {
		return 42;
	}
}

$f = new Foo;

echo $f(); // prints 42

?>

However, this is just a side effect.
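To make the functor idea concrete, here is a minimal sketch. The names apply_to_all and Multiplier are made up for the example; the only language features assumed are closures and __invoke from 5.3. The same function accepts an inline closure or an object without caring which one it gets.

```php
<?php

// Hypothetical helper: applies any callback to each item of a list.
function apply_to_all( $callback, array $items ) {
	$out = array();
	foreach( $items as $item ) {
		// Works the same for a closure or an object defining __invoke().
		$out[] = $callback( $item );
	}
	return $out;
}

// A functor: an object that can be called like a function.
class Multiplier {
	private $factor;

	function __construct( $factor ) {
		$this->factor = $factor;
	}

	function __invoke( $value ) {
		return $value * $this->factor;
	}
}

// A closure defined inline, no throwaway function name required.
$increment = function( $value ) {
	return $value + 1;
};

$tripled = apply_to_all( new Multiplier( 3 ), array( 1, 2, 3 ) );
$incremented = apply_to_all( $increment, array( 1, 2, 3 ) );

echo implode( ', ', $tripled ), "\n";     // 3, 6, 9
echo implode( ', ', $incremented ), "\n"; // 2, 3, 4
```

The object version carries its own state (the factor), which is the part a plain function name could never do cleanly.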

One concept that was new to me was traits in Sebastian’s presentation. This one won’t be included in 5.3, so it will be for 5.4 or 6.0. The goal of these is to share code between classes without using inheritance, hence avoiding multiple inheritance. While I can see it’s a cleaner way to do it, I really don’t find the argument compelling.

The implementation details for these are not fixed yet, so don’t worry so much about the syntax.

The main argument against multiple inheritance is the diamond effect. Consider an interface A which is implemented by B and C. If a class D inherits both B and C, problems occur because of an ambiguity. There is no way to know which parent’s method should be called. Typically, this is resolved by redefining that method in class D and choosing either implementation, or both. Human decision is required.

I’m personally not really for multiple inheritance, nor against. I just don’t care so much about the issue. I don’t mind if a language allows you to shoot yourself in the foot. You should be smart enough to make the right decisions, or be able to learn from your mistakes. I’m mostly against people being against it for that kind of argument. There are solutions for it and it’s not that complex to resolve. The fact that multiple languages out there have different ways of handling it is no issue.

I think the problems you can have with a less complex structure than that diamond are worse. The diamond problem really occurs because both classes inherit from the same interface most of the time. It means both classes’ implementations really have the same purpose. If you take away that common interface, the conflicts that occur are purely name conflicts, based on a lack of namespaces. While it’s likely that they do something similar, it may not be the case. If A is taken out of the picture and some function foo() in B returns a collection and foo() in C returns an integer, the problem you have is a whole lot worse. You can no longer just define a wrapper that does either or both implementations.

Traits fully allow that, and they allow it in a much more subtle way. I’m not saying traits are bad. I think it’s a really nice way to share code between different classes, but the argument against multiple inheritance just does not fit.

<?php

trait B {
	function foo() {
		return array( 1, 2, 3 );
	}

	function bar() {
		return 2;
	}
}

trait C {
	function foo() {
		return 5;
	}

	function baz() {
		return 4;
	}
}

class D {
	use B, C;
}

?>

In this kind of situation, you get the exact same conflict. The same function name exists in the traits. Which one do you use? In fact, you can get a conflict with an even simpler model:

<?php
trait B {
	function foo() {
		return 5;
	}

	function bar() {
		return 2;
	}
}

class D {
	use B;

	function foo() {
		return array( 1, 2, 3 );
	}
}
?>

This is another kind of conflict. Logic would say that the class has precedence, but because the traits are really a compile time hack, it’s not that obvious. The solution to both kinds of conflicts is to have a syntax to alias the functions as they get copied to the new class. It works, but you can do that with multiple inheritance too by just redefining the functions and calling the appropriate parent.
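As an illustration of that aliasing, here is a sketch using the form that eventually shipped in PHP 5.4 (the insteadof and as keywords). That syntax was not settled at the time of writing, so take it as one possible outcome rather than what the presentation showed. The alias name fooFromC and the class E are made up for the example.

```php
<?php

trait B {
	function foo() {
		return array( 1, 2, 3 );
	}
}

trait C {
	function foo() {
		return 5;
	}
}

// First kind of conflict: two traits provide foo().
class D {
	use B, C {
		B::foo insteadof C; // D::foo() is the one copied from B
		C::foo as fooFromC; // C's version stays reachable under an alias
	}
}

// Second kind of conflict: the class defines foo() itself.
// In the shipped implementation, the class's own method wins.
class E {
	use C;

	function foo() {
		return 'from E';
	}
}

$d = new D;
$e = new E;

echo count( $d->foo() ), "\n"; // 3
echo $d->fooFromC(), "\n";     // 5
echo $e->foo(), "\n";          // from E
```

Without the insteadof line, class D does not compile at all, which is exactly the human decision the diamond argument complains about.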

Of course, those kinds of conflicts are not likely to occur at design time. When you make your design, everything is beautiful and works. Things break when they evolve. If you have control over all the code, it’s really easy to manage, because you have unit tests, don’t you? Well, as part of a language, these things will be used by frameworks. If the framework decides an additional method is required in the traits, it may break code all around. Oops. Not quite better than multiple inheritance.

I still think traits are a good thing, but you need the right argument for it. The reason why they make more sense is that they allow you to keep a coherent whole. Multiple inheritance is used to avoid copying code in multiple classes. It’s a matter of laziness. The relationships between the concepts don’t really have anything to do with inheritance. Traits allow you to represent that relationship for what it really is. But don’t tell me it’s a perfect solution.

From what I understood, there were also some open questions about how to handle state in traits. Basically, the trait is stateless, but if it can’t access the object’s data, there is no way you can do anything more useful than you would from a static function. One of the current proposals seemed to be to allow the trait to define abstract functions from the trait itself, which would mean that the trait would effectively be an implementation and an interface. An abstract trait just seems confusing.

<?php

trait X {
	function foo() {
		return $this->bar() * 2;
	}

	abstract function bar();
}

?>

I think it could be made a lot cleaner, and allow more reuse, if the trait instead had requirements:

<?php

interface Barable {
	function bar();
}

trait X requires Barable {
	function foo() {
		return $this->bar() * 2;
	}
}

class D implements Barable {
	use X;

	function bar() {
		return 5;
	}
}

?>

It might seem small, but it could avoid duplicating interfaces in traits, or needing to have traits to implement interfaces.

I think it’s really great to see PHP still evolving, and with the transition to PHP 5 getting faster every month, it means we will be truly able to use these features soon.

Saving a project

Posted in General by Louis-Philippe Huberdeau on the October 16th, 2008

Recently, I have been brought in a project to help it go forward. Until then, I had never really realized how lucky I have been to work in the conditions I have. I hadn’t seen anything so terrible in over 5 years. I almost forgot such projects existed.

To summarize the situation:

  • Like any good software project, it started from an idea
  • A package was purchased as a base to build on
  • Months of work had been spent to tweak the package to do what it had to

When I was called in, the project was almost done; it just needed a few fixes and had to be launched the next day. When I saw the project, my basic reaction was: not going to happen. The code base was in terrible shape. From a moderately clean software package (if you abstract from the fact that the only HTML tags they knew were those to build tables and had absolutely no knowledge of what a PHP notice is), it became a complete mess. Duplicate files with -old, -old-old, -old2, -test, -test2, -tst2. In very few cases was the base file even used.

Obviously, there was no version control. None. No version control damages code. Not using version control is professional negligence in the 21st century (it was already severely blamed in the 20th).

Well, setting up version control is not too hard, and once it’s done, clean-up can begin. Effects can be felt in a matter of hours. The code base was really small once you removed all the duplicate code. It’s not such a big deal.

I think the worst part was they had no idea what was on the critical path to launch the site. When I ask what is left to be done, I expect a list of things, not an endless discussion about technical details (which happened not to be on the critical path). At that point, I could fully understand why the project had been going on for months without any significant progress: there was no global direction. No strategy between the great idea and the results. Just blind hacking on code hoping it would one day do what it had to. Worst is, they actually thought calling someone to hack-along would make it go faster.

The project suffered from bad management. That’s all. What can be done?

  • Set up version control and issue tracking
  • Spend some time to explain to your consultant what your goals are
  • Identify your critical path
  • Enter the tasks and prioritize them
  • Start from the top of the list
  • Clean up as you go

Results? The project stopped going backwards, code quality improved and it all became predictable. In less than a week, the project went further than it had in months. It’s really sad to see months of work getting lost when it could have been completed in a week. Spending an hour to set up the environment and a few hours to clarify the vision is a really cheap investment.

The hardest part in getting this working is to kick start it. People hacking all night long hoping to get things done rarely accept that stopping for a few hours will save them months. Once the process started, it really takes a matter of hours for results to show up. However, you generally need a few weeks to get people to realize they are going nowhere.

If you find yourself making copies of files to test things out. If you can’t say in 30 seconds how what you’re working on fits in getting closer to success. If you have thought your project would be done within a week for over a month. Ask for help or kill the project. Keeping it up that way, the project will never finish.

None of this is rocket science. It’s not even close to advanced research. It’s so deep down in the foundations of software engineering that courses barely mention it.

Software development should be predictable. If it’s not, your strategy is not clear enough for you to work in the first place. For any task you pick up, it’s necessary to allocate a time budget for it. If you go overboard, it’s time to re-evaluate. If you can’t allocate a time budget, break it down into steps you can grasp. There are hundreds of ways to solve each problem. Getting the best solution does not matter. Picking any plan will be better than no plan at all.

City of Collaboration

Posted in General by Louis-Philippe Huberdeau on the September 17th, 2008

As I write this post, I am in the airport, leaving Porto after a long, thought-provoking week. For a few seconds, I thought Porto airport was the best in the world. Large advertisements indicate free wireless access. Filtered water is provided. I figured it would be a good place to spend the time I have to wait, which is longer than it should be as I left for the airport with Alain, whose flight was a few hours before mine. Plug in, boot the computer. No wireless. Great. At least I have water, which still makes it better than any other airport I have seen.

My trip to Porto was divided in three parts of equal length: TikiFest, WikiSym and tourism. Three days each.

TikiFest

TikiFests seem to multiply themselves. Any occasion is good to meet with the other developers and it’s always a great time. This edition did not have any particular theme from the start. A lot of us were planning to come and some were going to make it a last minute decision. Rather than booking several hotel rooms, which was hard due to other events in the city at the same time, Marc Laporte took the initiative of renting a house. The WikiHouse. It was probably the best accommodation we could think of. It had plenty of space for all of us to sleep, a yard to enjoy the sun, a dining room to work in, a living room to hold more discussions and a barbecue.

Living together changed something about the relationships. Even though most of us had met before, being together 24 hours a day got us to learn many aspects usually hidden. Among others, that Xavi is an amazing cook. In developer events, we tend to eat lots of pizza and burgers. This time, we had fresh salmon and octopus.

I can’t say so much work got done. For one, I did not reach any of my personal objectives, but no time was wasted. Most of the time was spent discussing various community aspects. Conveniently, Martin Cleaver was living with us and brought a camera. Multiple demos were filmed and hopefully, it will help remote community members who cannot travel to TikiFests to put faces on the names in IRC. Matthew did the postediting and organized TikiTV in a brilliant and lovely fashion.

The entire meeting was mostly freeform. Just like in a wiki, everyone contributed a part to the house and made it fun to live in. Most would usually go to sleep very late. Those of us from eastern Canada did not even have to change our sleeping habits to match the time zone change. Discussions took place when they had to. People working could ask questions at any time. All householding tasks would happen without anyone ever discussing it.

WikiSym

WikiSym is a special event in the sense that it gathers academics, vendors, open source project contributors and various consultants together. It really means that most discussions included people formally dressed and people wearing T-Shirts. It also mixes people from all over the world. According to the numbers I heard, just over 100 participants came from 21 different countries. The formal program is only half of the event. The rest of it is the open space where ideas can be shared and discussed. I have been torn between both for the whole duration.

The venue was terrific. I usually have serious doubts about conferences held in universities. It usually is very complicated to go from the meeting rooms to the place to get lunch. The different meeting rooms are usually spread around to accommodate the normal university schedule and most of it is usually sterile. This one was nothing like it. OK, we did have some trouble finding the meeting rooms, but the reunion room was on ground level, accessible from outside. We could hold most of the discussions sitting outside in the grass. Certainly, it removed a lot of the formal aspects. Students walking around seemed confused to see “international experts” (the event was advertised in prominent spaces) being so open.

My primary motivation for visiting WikiSym was to present my paper on multilingual collaboration. This was actually the excuse to fly to Porto. In fact, I felt like attending WikiSym again ever since it ended last year in Montreal. The presentation part went fairly well. I had a lot of people asking me for additional information after the presentation, so I guess that was good. I also participated in the BabelWiki workshop on similar topics. However, I missed part of it due to an indigestion, food poisoning or anything else that caused my system to crash and prevented me from walking, talking and thinking altogether. That really was the only bad part of the whole trip. Even there, the organizers and the tiki community have been so helpful, it almost made it a good moment.

For the rest of WikiSym, I attended a few open space sessions, some tutorials and some research paper presentations. I felt that this year did not bring as many ideas as the previous one did. I have the feeling that it’s mostly because last year brought so many ideas, all of us did our homework and came to present what we came up with. Many of the discussions were about discussing the status-quo rather than searching for the next step. It’s not really a bad thing. There were many discussions about how we can better work together to share those results. Especially in the area of data analysis, multiple tools were developed with very similar objectives and capabilities over the year. It’s sad to see so many efforts being wasted.

The sub-field of application wiki seems to be the only one where significant activity is still going on in terms of technology. Most of WikiSym is about social sciences rather than software. However, application wiki as a term is so loosely defined, discussions around it are almost sterile. Some see it as a way to plug together components to display data. Some see it as a way to store semi-structured data. Both are true, but the camps are so far off, it would take a lot more than two hours to bring the discussion to a creative position. Another aspect is that both need a significant amount of work to reach. Promoters of both sides have invested a lot in getting where they are.

My feeling is that they are both right. Both using the wiki as a data store and as a mash-up front end make sense. They can both co-exist. Sure thing, if the data is not already managed by an external system, using the wiki to store it makes sense. Using components to mix and match external data and data from the wiki could create a compelling solution. Based on the new Runs Everything slogan of Tikiwiki, I guess we will go for both sides.

I will try to attend the 2009 edition in Orlando. I hope we will have the base definitions grounded by then to get better discussions.

Tourism

I had reserved the last three days of the trip for tourism. It turns out my definition of tourism is quite different from others. To me, it’s mostly about living in a different city. I spent my first day in the house. An unexpected visitor, Peter B. Meyer, joined us for a night after the conference because he couldn’t find a hotel to stay in. It turns out it created great discussions around open source economics, software licences and politics too. My original goal for that day was to wake up really late, stay in the house, and see the night life in Porto. The first one I did, not the second. On the second day, I woke up late again, and waited for amette to wake up (for some reason, he seems to live on PST timezone), then we went out for some drinks.

On the third day, it was mostly quiet. Almost everyone had left. Remaining residents had gone on a tour to visit the regions outside the city. Not quite my thing, and they were gone by the time I woke up. Additionally, I figured I had done enough of the tourist stuff with the conference program. On Tuesday night, a tour bus picked up everyone from the university for a typical photo-taking tour around the city. Then we went for Port wine tasting, a boat ride on the river and then to a restaurant for a traditional meal. It was a good way to see the city, but I had enough after that. I decided to do one of my favorite activities: walk around with my backpack, containing a pen, a pad of paper and a book. I went downtown and walked around the city. Stopping for a coffee or some soft drinks whenever I felt tired or stormed by ideas. Very relaxing and a good way to appreciate the city.

One of the very important aspects of Porto is that it is built along the river and was most likely chosen as a location because it’s very easy to defend. The city contains very steep roads. Walking around can be exhausting. Luckily, there are plenty of cafés to rest in. The city looks very old and worn out when walking in the streets. Other than the orange-ish tiles on the roofs, none of the buildings seem to have a common look. Other than the obvious monuments, none of the buildings is much worth looking at in details, especially when you move away from the touristic areas. However, when seen from Gaia, the city on the other side of the river, Porto seems to be a complete whole and is truly magnificent. The whole is much greater than the sum of the parts. Could we speak of accidental design? Uncoordinated collaboration?

Airport Adventures

Quite some time has gone by…

I knew there was no way two connections could go without problems. Newark airport was a mess. First, it took around 15 minutes before the plane could dock and let us out. My connection was already short. To help the situation, the situation at the border was simply horrible. There were announcements being made, but with the noise, I couldn’t get it all. Something about their system being down. Lines were all mixed up. Just as I got close to the booth to say “nothing to declare, just in transit”, the few people in front of my line seemed to have a complex situation. Forms not filled out properly, if at all. Most of them got turned back after a few minutes. My connection was really close.

I thought I was good to go. All I had to do was change terminal, figure out the gate number and go through security. It turns out I had to pick up my luggage, get it through the border and send it back for transfer. At least they didn’t check me. Why does it have to be so complicated?

So I ended up switching terminal, finding the gate and headed for security. Worst line-up I had ever seen. It was circling around the duty-free stores. I was certain to miss my flight. With a little hope left, I made my way through the line. Ran a little. Got to the gate at 16:25. That could have made it, but the plane had just left.

Got the ticket changed. Four marvellous hours to wait in an airport without wireless access. Best part was: my luggage never reached the plane anyway, so I would have had to wait 4 hours in Montreal to get them even if I had caught the plane.

Security in the wild

Posted in General, Programming by Louis-Philippe Huberdeau on the September 5th, 2008

Wikis are open in nature and it’s what brought them to success. Anyone can visit the page, edit it and see the changes live. The concept is really simple and became natural very fast. It’s all around. However, wiki applications evolved over time. The average wiki no longer is just text editable by anyone. They became the heart of complete content management systems with access rights and many other features. Wiki purists cringe when they hear of a wiki that is not editable by anyone. The corporate world does the same when they hear that their intranet could be modified by any employee.

In a standard wiki, the worst thing that can happen is that someone gets offended by false information (or simply offensive spam). Undo last change. The world goes on.

As wikis evolved, usage called for higher level functionality. Pages are no longer only textual information, they tend to become full blown applications. They can generate dynamic lists and interact with external systems. This is done mostly through a syntax extension often called a plugin. The concept is very simple. A unified syntax contains a name and some arguments. When the parser runs into it, it calls a custom function and displays the result. In most cases, these will perform harmless operations and cannot cause any damage. All they do is display text, only text a little more complex.
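The plugin concept can be sketched in a few lines. Everything here is an assumption made for illustration: the {NAME(key=value)}body{NAME} syntax, the function names and the two toy plugins are not the actual Tikiwiki implementation.

```php
<?php

// Hypothetical registry: plugin name => function( $body, $args ) returning text.
$plugins = array(
	'UPPER' => function( $body, $args ) {
		return strtoupper( $body );
	},
	'REPEAT' => function( $body, $args ) {
		$times = isset( $args['times'] ) ? (int) $args['times'] : 1;
		return str_repeat( $body, $times );
	},
);

// Turns "a=1, b=2" into array( 'a' => '1', 'b' => '2' ).
function parse_plugin_arguments( $raw ) {
	$args = array();
	foreach( explode( ',', $raw ) as $pair ) {
		if( strpos( $pair, '=' ) !== false ) {
			list( $key, $value ) = explode( '=', $pair, 2 );
			$args[ trim( $key ) ] = trim( $value );
		}
	}
	return $args;
}

// Finds every plugin call in the text and replaces it with the plugin's output.
function render_plugins( $text, array $plugins ) {
	$pattern = '/\{([A-Z]+)\(([^)]*)\)\}(.*?)\{\1\}/s';
	return preg_replace_callback( $pattern, function( $m ) use( $plugins ) {
		$name = $m[1];
		if( ! isset( $plugins[$name] ) ) {
			return $m[0]; // unknown plugin: leave the text untouched
		}
		return call_user_func( $plugins[$name], $m[3], parse_plugin_arguments( $m[2] ) );
	}, $text );
}

echo render_plugins( 'Hello {UPPER()}world{UPPER} {REPEAT(times=3)}ab{REPEAT}', $plugins ), "\n";
// Hello WORLD ababab
```

Note that anyone who can edit the page text can call any registered plugin with any arguments, which is where the trouble starts.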

The problem is that they can be used for a whole lot of things, and harmless really is context dependent. Consider a situation where content must be displayed from another web application, probably a legacy intranet application. One way to do it is to get the server to fetch the HTML page, filter out some of the tags so it fits nicely in the page and display it. This technique is very vulnerable to content format changes and is quite hard to configure for normal people. An easier way would be to use an iframe and just load the page from wherever it is.

In a corporate setting, this probably works great because you can trust the people you work with not to screw up and load something they shouldn’t on the intranet’s home page.

If you want to use it on a public website where all edit rights are restricted, everything is fine. However, if you have a single page that allows public edit, you just opened up a very wide security gap that could allow sub-script-kiddy (talking about the kind of people who “hack” pages on wikipedia) to hijack sessions through XSS.

The main issue is that these extensions are installed or not. You could use it at some point in a completely safe environment, stop using it, and then change the context which made it safe. The extension is still active and you forgot about it. It’s installed site-wide. There is no way to enable it just on specific pages that are controlled. Because the plugin instantiation is part of the page’s content, you can’t prevent anyone with edit rights on a page from using it.

In implementing remote plugins, this was a major issue. Not only was it a plugin that could potentially do harm, it was about plugins I don’t even know about. I had this vague idea of requiring input validation on the remote plugins before letting them run, so not anything could be called unless an administrator granted permission. All of it was fairly complicated because of implementation issues. During a discussion on IRC with sylvieg and ricks99, I realized that the problem existed beyond the remote service problem. So far, I had really considered that if the context wasn’t safe, some extensions should not be installed. Rick was asking if there was a way to let admins add a plugin, but not anyone else. This got me to realize that the only reason it was hard to implement is that I was taking the problem from the wrong level. Applying the validation at the plugin-wide level made it much easier to deal with than if I did it specifically for the remote ones. It also added a whole lot more value.

The final implementation is very simple in the end. When an extension can be dangerous, it declares it as part of the definition by identifying which parts require validation (body, arguments or both). When the wiki parser encounters a plugin that requires validation, it generates a fingerprint of the plugin and verifies if that fingerprint is known. If it is, it goes on, otherwise it displays controls on the page for authorized users to perform the audits (non-authorized ones get an error message). The fingerprint is nothing more than the name of the plugin, a hash of the serialized arguments, a hash of the body and the size of both inputs to avoid collisions. Some arguments marked as safe can be excluded from the hash to allow some flexibility.
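A rough sketch of such a fingerprint follows. It is not the code from the Tikiwiki repository: the function names, the use of md5, the key sorting and the dash-separated layout are all assumptions picked to keep the example short.

```php
<?php

// Hypothetical sketch of the fingerprint described above; names are illustrative only.
function plugin_fingerprint( $name, array $args, $body, array $safeArgs = array() ) {
	// Arguments marked as safe do not take part in the hash.
	foreach( $safeArgs as $key ) {
		unset( $args[$key] );
	}
	// Assumption: sort by key so argument order does not change the fingerprint.
	ksort( $args );

	$serialized = serialize( $args );

	// Name, hash of arguments, hash of body, and the size of both inputs.
	return implode( '-', array(
		strtolower( $name ),
		md5( $serialized ),
		md5( $body ),
		strlen( $serialized ),
		strlen( $body ),
	) );
}

// Assumption: approved fingerprints live in a plain list; a real site would persist them.
function plugin_is_approved( $fingerprint, array $approved ) {
	return in_array( $fingerprint, $approved, true );
}

// An administrator approved this iframe once; 'width' is declared safe.
$approved = array(
	plugin_fingerprint( 'iframe', array( 'src' => 'http://intranet.example/', 'width' => '400' ), '', array( 'width' ) ),
);

$resized = plugin_fingerprint( 'iframe', array( 'width' => '800', 'src' => 'http://intranet.example/' ), '', array( 'width' ) );
$hijacked = plugin_fingerprint( 'iframe', array( 'src' => 'http://evil.example/' ), '', array( 'width' ) );

var_dump( plugin_is_approved( $resized, $approved ) );  // bool(true)
var_dump( plugin_is_approved( $hijacked, $approved ) ); // bool(false)
```

Changing the width keeps the approval, while pointing the iframe somewhere else sends the plugin back to the audit queue.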

The end result is that any plugin can be enabled on any host in any context and the site’s users are still safe from XSS attacks. More capabilities for the public/open wikis. Of course, because of Tikiwiki policies, validation can be disabled, which is useful if you have one of those safe contexts.

It does have a downside though. Validation is required when changes are made to the plugin, which means the page is not fully enabled until an auditor visited it, which may take some time. Notifications, tracking, … There are solutions, but viewing the changes is no longer possible as soon as you click save. The white list verification is a pessimistic approach to the problem, but it’s still better than letting a few identities be stolen until it’s caught.

The implementation is available in tikiwiki svn and will be released as part of 3.0 in April 2009.

Reaching for simplicity

Posted in Programming by Louis-Philippe Huberdeau on the September 1st, 2008

In designing a solution, it’s always a good thing to check out different options. In many cases, problems can be solved with a complete hack or complete gold plating. Both are terrible, but it’s important to visit those options and try to find a middle ground.

Where does your average solution fit in?

  1. 2 hours
  2. 2 days
  3. 2 weeks
  4. 2 months
  5. 2 years
  6. never made it to release

The time frames definitely depend on the technologies and application domains. I personally like the 2 days to 2 weeks range. If I can’t get a proof of concept and a base architecture in 2 days, the design is probably bad. If it can’t be completed in 2 weeks, it probably could be simplified even more.

Everything I worked on this year fits in this range. Short, high impact, high value, fun. I just hate wasting time on long projects. I may just be short sighted, but I like to see results fast.

There are probably cases where a more polished solution than what can be made in two weeks is required, but these should be exceptions. If you are to embark on a long project, make sure it’s for the right reasons. Make sure you explored the lightweight solutions from the lower scales before and that the benefits you get from the better solution are worth the 5x cost increase.

Is the only reason you feel like going up is that it would be fun to use new cool technologies? Go back to the academic architecture guidelines: what are your desired quality attributes? Do you need that much extensibility? Is performance so critical? Put on the executive hat: How much is it really worth? What could be sacrificed to fit the budget and bring the most value?

A little over a month ago, Nelson pointed me to Deki Extensions. The really nice thing about them is that they can be used to call webservices and really facilitate writing extensions to the wiki syntax. Tikiwiki already has plugins, which is somewhat the same concept as extensions, but they don’t allow webservice calls. The big advantage of such remote plugins is that they allow content from external systems to be integrated really nicely without having to modify the code base. As a use case, think of loading bug tracking information from Bugzilla as part of a wiki page to complement the discussion.

There were really two opposite solutions to this one:

  • Write a webservice plug-in to do an HTTP request and dump the output on the page (2 hours)
  • Support the Deki Extensions altogether

Deki Extensions are amazing. The problem is that to support them, you basically need to support the DekiScript language that runs in the wiki page and emulate their environment. There may also be legal issues. Are we even allowed to support it? After implementation, we would always have to play catch-up as they evolved the specifications. Then would come incompatibilities, and we would have to make sure all extensions out there are supported. Implementation would be long and painful.

The webservice plugin would do the job, but it really isn’t elegant and it’s completely unsafe as far as XSS goes. Not really useful in the end unless you fully trust all potential contributors to a page. Did I mention this is to run in a wiki? This solution is completely useless.

Something decent has to be somewhere in the middle. Let’s break down Deki Extensions and see what they are all about:

  1. A way to embed special content in a page
  2. Remote execution through a custom exchange format
  3. Possibly structured data output to be manipulated by local, user-defined, execution
  4. A registry to map remote services to local “function” names

Broken down that way, it looks a lot simpler. We already have an architecture to run custom code in a page called plugins. There are multiple standardized exchange formats out there, like JSON and YAML; we don’t really need DekiXML. A language to manipulate output really looks like a template engine. There are quite a few of those out there that can provide the necessary sandbox. The registry is really not complicated.
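Put together, the pieces could look something like the sketch below. All names are hypothetical, the service URL is a placeholder, and the {{key}} substitution only stands in for a real sandboxed template engine. The fetcher is injected so the example runs without a network.

```php
<?php

// Hypothetical registry mapping local plugin names to remote service URLs.
$registry = array(
	'bugs' => 'http://tracker.example/service/bugs',
);

// Calls the remote service registered under $name and decodes its JSON reply.
// $fetcher is any callback taking a URL and returning the response body;
// a real one would perform an HTTP request.
function call_remote_plugin( $name, array $args, array $registry, $fetcher ) {
	if( ! isset( $registry[$name] ) ) {
		return null;
	}
	$url = $registry[$name] . '?' . http_build_query( $args );
	$data = json_decode( call_user_func( $fetcher, $url ), true );
	return is_array( $data ) ? $data : null;
}

// Stand-in for a template engine: substitutes {{key}} with escaped values.
function render_template( $template, array $row ) {
	foreach( $row as $key => $value ) {
		$template = str_replace( '{{' . $key . '}}', htmlspecialchars( (string) $value ), $template );
	}
	return $template;
}

// Fake service reply used instead of a network call.
$fakeFetcher = function( $url ) {
	return '{"id": 42, "summary": "Crash on <save>"}';
};

$bug = call_remote_plugin( 'bugs', array( 'id' => 42 ), $registry, $fakeFetcher );
echo render_template( 'Bug #{{id}}: {{summary}}', $bug ), "\n";
// Bug #42: Crash on &lt;save&gt;
```

Because the remote side only returns data and the local template decides how it is displayed, the remote service never gets to inject markup into the page.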

It does seem like it can be brought down to my preferred project size range by using existing components, which also has the side effect of reducing risk severalfold. It also starts to shape up to a standard exchange format, doesn’t it?

The End of Design By Committee

Posted in General, Programming by Louis-Philippe Huberdeau on the August 25th, 2008

The W3C always had great intentions. The goal has always been to create great standards encompassing all possible situations and respecting all special needs. In the early days, great standards still living today were created, like HTML and CSS. Of course, there were problems. It took years for standards to be supported correctly because of ambiguities. Facing those problems, they decided not to make standards official until they were fully supported by enough implementations. XHTML 2, CSS 3 never saw the light.

Standards like RDF never became what they were meant to be, nearly 10 years after the recommended proposition. SOAP and WSDL are huge buzzwords in the SOA world, but they never quite work as well as they should. Implementations are still incompatible and subsets of the spec need to be used for communication to be handled properly. Not to mention there are still no traces of XLink, XPointer or XForms anywhere in the ecosystem. All these specifications appeared in the early 2000s or late 1990s. Who were they made for?

The big problem with all of them is that they are so abstract that no one outside the committee who designed them can understand their purpose, let alone think of trying to support them. The specifications are too large. Too complicated. Building around XML probably wasn’t the best idea ever. It really is unlike anything else and painful to work with, unless you use even more XML technologies. It does not map well to common programming paradigms. It was only ever similar to HTML and SGML. Maybe those should have been taken as exceptions rather than the rule.

I consider the best specification built around XML to be XPath, but only because it removes all the burden of managing XML and it’s not XML-based. CSS is great for formatting HTML and, again, is not XML-based. XSLT is not too bad because it plays nice with HTML, but I find some other techniques like Zope’s TAL to be a lot more elegant. It extends XML without adding to the tag soup.

By ignoring all the details, APIs like SimpleXML allow you to read XML seamlessly, but writing it is a completely different task. XML works everywhere, but it’s always alien to the environment.

Recently, I have noticed that the Web has started to regain its original nature. Standards are emerging rather than being cultivated. The days when companies assigned employees to a consortium in order to write a specification are over. The W3C is still working on their specifications, trying to get them out the door, but nothing new has started in a long while. During that period, we got to see great standards establishing themselves, not because they were supported by the industry, but because they were good.

Think about JSON and YAML. JSON is a subset of JavaScript that is well specified, easy to understand, and easy for a machine to read and write. YAML is a human-readable format that is formal enough to be parsed by a machine. What do they have in common? They map to programming concepts. All scripting languages out there can load them into their internal formats in a single function call and write them back out just as easily.
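As a small illustration of that single-call round trip, here is what it looks like in PHP with the built-in `json_decode` and `json_encode` functions. The sample document and its field names are made up for the example.

```php
<?php
// Made-up sample document; the field names are illustrative only.
$json = '{"title": "Read ahead", "tags": ["php", "json"], "published": true}';

// One call turns the document into native arrays, strings and booleans.
$post = json_decode($json, true);

// From here on it is manipulated like any other PHP value.
$post['tags'][] = 'yaml';

// One call writes it back out.
$out = json_encode($post);
echo $out, "\n";
```

There is no object model to learn in between: the decoded value is an ordinary associative array.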

In the end, the problem can be distilled down to a single preference. You can either write a complex specification, encompassing all possible cases, and spend months implementing it and making sure it’s compatible, then spend 5 minutes configuring it to perform all the magic. Or you can have a simple data exchange format, hook it to a scripting language and spend anywhere from 15 minutes to a few hours to do what you need. On the large scale, complex standards are worth it, but in most cases, they are a waste of time.

One of the great aspects of web development is that there are so many problems that there are thousands of people thinking about them. Over the years, a great ecosystem of tools and techniques was built. These days, all you need to do is piece together existing components. HTTP is a good environment to make requests and get responses. Data serialization is available. All you need to do is decide how to use them. Recently, Identi.ca/Laconi.ca wrote a small specification for open microblogging. oEmbed lets a site expose the location of its images and videos. When you look at those, the first thing that comes to your mind is: Why hasn’t anyone thought about it before?

It doesn’t have to be great. It doesn’t have to be so smart. We only need to agree on something, or do something and get others to follow. There is nothing religious about saying which field name you will use to contain the location or the size of an image. There is no need for namespaces and extensibility. It does not even deserve debates or discussions. Just decision making. It’s a simple problem and it deserves a simple solution.

The specifications fit on a few sheets of paper. They can be read and understood by anyone who cares without investing significant effort. Simple use cases can be illustrated. People get it.

There are so many ways in which different websites can’t talk to each other, which makes it painful to develop applications and forces people to re-implement the same things over and over again. In the new Web-SaaS-driven world, it’s a shame. Especially since the underlying protocol does not prevent anything. It’s just that no one took the time to write down the problem and write down the simplest solution that could work.

Sure, you could go out and write something generic that solves everything (it would probably end up looking like RDF). In the end, unless you know what you’re searching for, there is no way you will find it. Abstract tokens don’t help anyone.

I’m currently writing my own spec (more information soon). What are your problems in integrating with other applications?

Note to self: This post contains too many acronyms and references. I should look these up and link in the future.

Ease out transitions

Posted in Programming by Louis-Philippe Huberdeau on the July 24th, 2008

Most software design out there is a matter of personal taste. There are very few widely agreed upon rules. It happened to all of us. You get to read a particularly bad piece of code and think it requires a complete rewrite. In most cases, it wouldn’t be hard to get people to agree with you. Rewriting would make everything more beautiful and allow easier modification. However, it has a terrible cost. It will always take longer than you expected. Bad code has this ability to hide features inside. While it may happen that some portions of code are dead, most of the time, they serve a very specific purpose your great new design wouldn’t have considered.

Major backend rewrites also tend to leave the front-end part behind. It will break user experience. All the polishing that was made on the interface is likely to be gone because it was not re-implemented or simply left broken. It would probably take months to reach the same level of external quality (compared to days or weeks to improve the internal quality).

When you are at point A and think it’s not the best place to be, there is nothing wrong with trying to go to point B. However, teleportation does not exist in the world we live in, and your app won’t just appear at point B the next day. Even if you rewrite everything and get a perfect backend and a better front-end, all this legacy data won’t just transfer itself. Data conversions are a pain. One of the reasons the code was so bad in the first place is probably that the data model was messed up. Converting data for all the edge conditions is a very long process and is always prone to break, which leads to users complaining. Of course, this assumes that your regular upgrade process even has a way to handle data conversions.

Before thinking about your grand new design, think about the transition. It will probably take more effort than the development itself. After you have rewritten everything, will you be able to make it from A to B?

Recently, I started working on restructuring the wiki plugin API in Tikiwiki. The plugins are great. They let you find the different features in the wiki and create applicative wikis. The problem is that they are hard to use. The best ones have too many parameters and not all of them are really well documented in the UI (while sometimes documentation is better on the doc site). When too many parameters are used, the syntax just becomes unreadable. I decided to rework these during TikiFestStrasbourg after all of us learnt one new thing about plugin capabilities during discussions.

These were some of the issues:

  • Documentation in UI was a short blob of text containing HTML. It was not meaningful to users and a pain for translators to manage. Parameters were rarely detailed.
  • Plugin list was not filtered. All plugins were listed, even if related features were turned off. This created too much noise.
  • The syntax was hard to understand and a user interface would help a lot.
  • Caching does not behave nicely with some plugins.

To solve these issues, more meta-data is required about the plugins. The naive solution is to rewrite the plugin API entirely and make it good. After all, at this time, each plugin is stored in a separate file containing two functions that follow a naming convention. This is not a modern GoF-endorsed design!

Well, rewriting is a bad idea. Tikiwiki ships with around 75 plugins, plus a few in a separate folder to be enabled by sys-admins in controlled environments, plus an unknown number in mods, and all those custom plugins written for specific applications that we have no control over. If we only had to consider our own work, it would still take around 40 hours to convert them all, assuming no clean-up is made and documentation is only entered as-is without improvements. Rewriting the API would be a 3-4 hour job at most. The conversion is the uncertain part, and the result would break upgrades on all customized installations.

I’d much rather compromise a little bit on beauty to save some pain. That means adding an extra function to provide all the meta-data and making sure that the code using it acts conditionally, depending on whether the new way of doing things is available. I can keep working on improvements without having to convert everything at once. No functionality is broken.
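As a rough sketch of that compromise (not the actual Tikiwiki code), the idea could look like the following. The plugin names, the `_help` and `_info` suffixes, the `plugin_info` helper and the shape of the returned array are all hypothetical, chosen only to show the conditional fallback. It needs PHP 7 or later.

```php
<?php
// Legacy plugin: only the two original functions exist (all names are hypothetical).
function wikiplugin_old($data, $params) { return strtoupper($data); }
function wikiplugin_old_help() { return 'Upper-cases the body.'; }

// Converted plugin: the same two functions plus an optional meta-data function.
function wikiplugin_new($data, $params) { return str_repeat($data, (int) ($params['times'] ?? 1)); }
function wikiplugin_new_help() { return 'Repeats the body.'; }
function wikiplugin_new_info()
{
    return [
        'name' => 'New',
        'description' => 'Repeats the body a number of times.',
        'params' => ['times' => ['required' => false, 'description' => 'Repeat count']],
    ];
}

// Use the structured meta-data when available, otherwise fall back to the old help text.
function plugin_info($name)
{
    $info = "wikiplugin_{$name}_info";
    if (function_exists($info)) {
        return $info();
    }
    $help = "wikiplugin_{$name}_help";
    return [
        'name' => ucfirst($name),
        'description' => function_exists($help) ? $help() : '',
        'params' => [],
    ];
}

foreach (['old', 'new'] as $name) {
    $info = plugin_info($name);
    echo $info['name'], ': ', $info['description'], "\n";
}
```

Unconverted plugins keep working untouched, and each one can gain its extra function whenever someone gets around to it.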

Of course, the transition still has to be made, but at least this gives us more time to do it. No one will scream that their favorite plugin is broken. I will be able to merge back into trunk much faster and get help from the rest of the community.

There is more than just code involved.

TikiFestStrasbourg

Posted in General by Louis-Philippe Huberdeau on the July 19th, 2008

At this time, I think there is nothing more important in an Open Source project than to have face to face meetings. I spent the last week in Strasbourg for the TikiFest. I don’t think any of us really knew what was going to happen. I had this idea that it was a perfect moment to release the next major version of Tikiwiki. Some thought it was about security or other technical aspects. We were all wrong. During those days, we got to know each other a little more. For the last years, we have been discussing over IRC, exchanging an occasional email, but we really didn’t know each other. I think that this alone will change the project’s dynamics in the future.

Sitting together in the same room working on the project made it clear that we were all facing similar issues. We could discuss them and see how to improve the situation in the future. No mailing list or chat session could have done as much. We would get up early in the morning and not really see time fly by, alternating between discussions and code. Only our stomachs could bring us back to reality, and that was usually very late.

One of the most important changes that we made was to change the release schedule, or create one for that matter. It has been over 3 years since the last major release. Trunk changed so much, we barely know what has been added. The changelog is terrifying and certainly not useful to anyone because of its sheer length. We decided to release twice a year. The primary reason why we could never really release is that new stuff kept being added in. The reason it came to be that way was that everyone needed that other feature to be added before the release. Otherwise, it would be years before it would be part of a stable release. Well, it has been years since we had a major stable release.

Getting more frequent releases was an easy decision. The discussion was, in fact, very short. I think we had everything settled in the morning of the very first day. Later on, this allowed us to assign dates for some major changes, like fully dropping PHP 4 support. Time-based releases have huge advantages when it comes to open source projects. They let us observe how things are evolving, make decisions and still provide users with a decent amount of information. It’s not about saying that somewhere in the future we will drop PHP 4. We can provide a month and base it on the growth of PHP 5.

I really have the feeling I am now working on a new project, even if very little of the code changed during those last few days. Just like many other events where developers gather and discuss critical issues, I have some doubts about whether everything discussed will end up being done. However, I have a good feeling. The team is committed and deeply involved in the project. It’s a great thing to know we all share a common vision.

Enjoying a beer after a hard day of work

During the entire week, we didn’t have so many hot debates (except maybe one where I overreacted and burst out). Some things are just plain good for a project, and when a project has run without any major guidance for years, some basic, typical structures are welcomed by everyone. One of the only issues we couldn’t quite agree on was what to accept in stable branches. Tikiwiki has a long history of being a free-for-all, which was convenient for every major contributor. With the new model, stable branches are required. Not accepting new features is obvious. Some minor enhancements are acceptable, some not. UI enhancements are even harder to decide on. What is a bug anyway? Is it a major one? What is major? What to accept is an entire gray zone. It’s extremely subjective. I had to roll back commits. I hate doing that. Without people to discuss the issue in front of me, I don’t think I could have done it. In total, we reviewed around 8 commits made after RC1. It took us at least 2 hours to decide which ones we couldn’t accept.

Rejecting someone’s contribution is hard. At least I could motivate myself by the fact that this was only the stable branch. They could still apply their patch on trunk. However, with faster release cycles, we will have to keep trunk somewhat stable. We already have this idea of experimental branches (or feature branches) for new things to mature before being merged, but it really adds overhead. Subversion is quite hard to handle when it comes to branching and merging. If you are familiar with version control, it really all makes sense, but it’s a barrier to entry. I started writing scripts to do the hard work, but they still require people to use a shell.

Establishing a software process to ensure quality without breaking the team dynamics is hard. Doing it with a community of volunteers is even harder. Some well known best practices just don’t apply. Unit testing? Forget it. Iterative development? We don’t even know what people will be working on or when they will have time to do it. With at least 6 years of development, people got used to their way and, let’s face it, it worked so far. The only part that didn’t work is that the software could never quite be released. The decisions we took in Strasbourg were required to improve the releasability of Tikiwiki. I hope it will work out with the community.

For every decision other than the release cycle, we basically had to take accountability ourselves. We built some sort of roadmap, but it can’t really apply to anyone else as we can’t assign tasks to anyone. People will keep volunteering on what they want and we have no control over that. This leads to a strange phenomenon in the roadmap. The roadmap is more about dropping things than about adding things. Tikiwiki is huge. It has features for everything. Letting people add whatever they want has always been very important. It led to features no one could have imagined, through incremental changes and clever combinations of features. However, for all the great things that happened, some things were added and forgotten by their authors, and remained obscure, unmaintained and unused. For all we know, they might not work at all anymore, having rotted and broken. These things are a burden for the project administrators, so they will gradually be removed. Ideally, things like removing unmaintained features and dropping support for older technologies will reduce the weight we have to carry and allow us to build greater things.

He’s right, again

Posted in Books by Louis-Philippe Huberdeau on the June 30th, 2008

In some theoretical sense, developing the complete infrastructure before implementing any visible functionality might be efficient, but in a practical sense, managers, customers, and developers begin to get nervous when too much time goes by before they can actually see the software work. Infrastructure development has the potential to become a research project in creating a perfect theoretical framework[...]

Sounds like a familiar problem? Taken from Software Project Survival Guide. I can’t say it’s my favorite from McConnell, but it’s still right on.

Motivation Driven Development

Posted in General, Programming by Louis-Philippe Huberdeau on the June 24th, 2008

Today is just one of those days when I can look back and laugh at my own behavior. I have been working on a personal project for a while now (should go public soon). Of course, it started off with many great ideas and I could have fun just thinking about it. When the time came to actually code it, motivation dropped. The problem was really that, while I had a great goal, before getting close, I had to get all the ground work done.

What happened? Well, it froze. I stopped working on it for months, until recently. When I got back to this project, I came back because there was something specific I wanted to implement in it. When I took a good look at what I had in progress, I realized that the things I was focusing on were not getting me any closer to my goals. I just left everything as is. Tests were running. Some code was not used yet. No problem.

Instead of starting over from where I was, I mapped out the high level features I wanted it to achieve and wrote down a roadmap. It was not based on building good foundations, not based on a good architecture. It was based on what’s needed for the software to be of any use and what I felt like working on. What did it change?

  • Changes were visible on the final product
  • At every step, I would get closer to being able to use it, and find out different ways to use it
  • I got motivation to work on the project
  • The project evolved more in the last two weeks than ever before

So, why today? Well, that feature I had half started months back, I finished it today in just 30 minutes. Months of no progress to avoid 30 minutes of work. It’s not that it was long, and it was certainly not hard. It was a boring task. It was necessary, but on its own it did not do any good. Today, writing it enabled a very powerful feature. Even if it was boring, I was happy to do it because I would then see the whole thing in action. It’s not quite complete yet as it still misses a few critical features required for normal use, but I can already use it for my own needs, which is great.

Now, if I had done it a few months ago, I’m pretty sure it would have been more than 30 minutes of work. It just takes me more time to do work when I’m not motivated. It’s also likely that the feature would have been more complete. Rather than doing what it has to in order to be useful, it would have been what it should be not to be so boring to write. Gold plating? Scope creep?

The scary part is that I’m pretty certain it’s not the first time I’ve dropped a project just because I didn’t feel like doing a tiny little part.
