<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0"
	xmlns:content="http://purl.org/rss/1.0/modules/content/"
	xmlns:wfw="http://wellformedweb.org/CommentAPI/"
	xmlns:dc="http://purl.org/dc/elements/1.1/"
	xmlns:atom="http://www.w3.org/2005/Atom"
	xmlns:sy="http://purl.org/rss/1.0/modules/syndication/"
	xmlns:slash="http://purl.org/rss/1.0/modules/slash/"
	>

<channel>
	<title>L-P Huberdeau&#039;s blog</title>
	<atom:link href="http://blog.lphuberdeau.com/wordpress/feed/" rel="self" type="application/rss+xml" />
	<link>http://blog.lphuberdeau.com/wordpress</link>
	<description>Software engineering and anthropology, anecdotes, and more.</description>
	<lastBuildDate>Tue, 31 Aug 2010 22:44:03 +0000</lastBuildDate>
	<language>en</language>
	<sy:updatePeriod>hourly</sy:updatePeriod>
	<sy:updateFrequency>1</sy:updateFrequency>
	<generator>http://wordpress.org/?v=3.0.1</generator>
		<item>
		<title>Summer report</title>
		<link>http://blog.lphuberdeau.com/wordpress/2010/08/summer-report/</link>
		<comments>http://blog.lphuberdeau.com/wordpress/2010/08/summer-report/#comments</comments>
		<pubDate>Tue, 31 Aug 2010 22:44:03 +0000</pubDate>
		<dc:creator>Louis-Philippe Huberdeau</dc:creator>
				<category><![CDATA[General]]></category>
		<category><![CDATA[News]]></category>
		<category><![CDATA[Wiki]]></category>

		<guid isPermaLink="false">http://blog.lphuberdeau.com/wordpress/?p=384</guid>
		<description><![CDATA[For some reason I never quite understood, I always tend to be extremely busy in the summer when I would much rather enjoy the fresh air and take it slow, and be less busy during the winter when heading out is less attractive. This summer was no exception. After the traveling, I started a new [...]]]></description>
			<content:encoded><![CDATA[<p>For some reason I never quite understood, I always tend to be extremely busy in the summer when I would much rather enjoy the fresh air and take it slow, and be less busy during the winter when heading out is less attractive. This summer was no exception. After the traveling, I started a new mandate with a new client, and that brought my busyness to a whole new level.</p>
<p>In <a href="http://blog.lphuberdeau.com/wordpress/2010/06/upcoming-events/">my last post</a>, I mentioned a lot of wiki-related events happening over the summer and that I would attend them all. It turned out to be an exhausting stretch. Too many interesting people to meet, not enough time, even with days that never seem to end in Poland. As always, I was in a constant dilemma between attending sessions, joining the open space or just starting spontaneous hallway discussions. There was plenty of room for discussion. Gdansk is not that large, at least not the tourist area where everyone stayed, so entering just about any bar or restaurant, at any time of day, meant sitting down with another group of conference attendees. WikiMania did not end until the plane landed in Munich, which apparently was everyone&#8217;s connecting city, at which point I had to run to catch my tight connection to Barcelona.</p>
<p>I know, there are worse ways to spend part of the summer than having to go work in Barcelona.</p>
<p>I came to a few conclusions during WikiSym/WikiMania:</p>
<ul>
<li><em>Sociotechnical</em> is the word academics choose to discuss what the rest of us call the social web or web 2.0.</li>
<li>Adding a graph does not make a presentation look any more researched. It most likely exposes the flaws.</li>
<li>Wikipedia is much larger than I knew, and they still have a lot of ambitions.</li>
<li>Some people behind the scenes really enjoy office politics, which most likely creates a barrier with the rest of us.</li>
<li>One would think open source and academic research have close objectives, but collaboration remains hard.</li>
<li>The analyses being performed lead to fascinating results.</li>
<li>The community is very diverse, and <a href="http://wikidocumentary.org">Truth in Numbers</a> is a very good demonstration of it for those who could not be there.</li>
</ul>
<p>As I came back home, I had a few days to wrap up projects before starting work for a new client, all of this while fighting jet lag. I still have not found time to catch up with the people I met, but I still plan to.</p>
<p>One of the very nice surprises I had a few days ago is the recent formation of <a href="http://montrealouvert.net/">Montréal Ouvert</a> (the site is also partially available in English), which held its <a href="http://montrealouvert.net/2010/08/11/1e-reunion-de-montreal-ouvert-ouvert-a-tous/?lang=en">first meeting last week</a>. The meeting seemed like a success to me. I&#8217;m very bad at counting crowds, but there seemed to be somewhere between 40 and 50 people attending. Participants came from various professions and included some city representatives, which is very promising. However, the next steps are still a little fuzzy and how one may get involved is unclear, although the organizers seemed to have matters well in hand. There will likely be some sort of hack fest in the coming weeks or months to build prototypes and make the case for open data. I don&#8217;t know how related this was to <a href="http://port25.ca/archive/2010/06/29/war-is-over-if-you-want-it.aspx">Make Web Not War</a> a few <a href="http://blog.lphuberdeau.com/wordpress/2010/05/what-could-we-do/">months prior</a>. It may just be one of those ideas whose time has come.</p>
<p>I also got to spend a little time in Ottawa to meet with the <a href="http://bigbluebutton.org/">BigBlueButton</a> team and discuss further integration with <a href="http://tiki.org">Tiki</a>. At this time, the integration is minimal because very few features are fully exposed. Discussions were fruitful, and a lot more should be possible with version 0.8, now in development. Discussing the various use cases showed that we had not approached the integration using the same metaphor, partially because the metaphor is not quite explicit in the API. The integration in Tiki is based on the concept of rooms as permanent entities that you can reserve through separate mechanisms, which maps quite closely to how meeting rooms work in physical spaces. The intended integration was mostly built around the concept of meetings happening at a specific moment in time. Detailed documentation cannot always explain the larger picture.</p>
]]></content:encoded>
			<wfw:commentRss>http://blog.lphuberdeau.com/wordpress/2010/08/summer-report/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Upcoming events</title>
		<link>http://blog.lphuberdeau.com/wordpress/2010/06/upcoming-events/</link>
		<comments>http://blog.lphuberdeau.com/wordpress/2010/06/upcoming-events/#comments</comments>
		<pubDate>Wed, 23 Jun 2010 14:37:24 +0000</pubDate>
		<dc:creator>Louis-Philippe Huberdeau</dc:creator>
				<category><![CDATA[General]]></category>
		<category><![CDATA[Wiki]]></category>

		<guid isPermaLink="false">http://blog.lphuberdeau.com/wordpress/?p=376</guid>
		<description><![CDATA[This summer, I will have my largest event line-up around a single theme. None of which will be technical! It will begin on June 25th with RecentChangesCamp (RoCoCo, to give it a French flavor) in Montreal. I first attended that event the last time it was in Montreal and again last year in Portland. It&#8217;s [...]]]></description>
			<content:encoded><![CDATA[<p>This summer, I will have my largest event line-up around a single theme. None of which will be technical! It will begin on June 25th with <a href="http://rococo2010.org/wiki/Main_Page">RecentChangesCamp</a> (RoCoCo, to give it a French flavor) in Montreal. I first attended that event the last time it was in Montreal and again last year in Portland. It&#8217;s the gathering of wiki enthusiasts, developers, and pretty much anyone who cares to attend (it&#8217;s free). The entire event is based around the concept of <a href="http://en.wikipedia.org/wiki/Open-space_meeting">Open Space</a>, which means you cannot really know what to expect. Both times I attended, it had a strong local feel, even though the event moves around.</p>
<p>Next in line is <a href="http://www.wikisym.org">WikiSym</a>, which will be held in Gdańsk (Poland) on July 7-9th. I also attended it twice (Montreal in 2007, Porto in 2008). I missed last year&#8217;s in Orlando due to a schedule conflict. WikiSym is an ACM conference, making it the most expensive wiki conference in the world (still fair, by other standards). Unlike the others, which are more community-driven, this one comes from the academic world (you know it when they refer to you as a <em>practitioner</em>). Most of the presentations are actually paper presentations. Because of that, attending them is not that valuable, as the entire content is provided when you arrive. It&#8217;s much better to spend time chatting with everyone in the now-traditional Open Space. It really is a once-a-year opportunity to meet people from all over the world who have spent years studying various topics around wikis. The local audience is almost absent, although the event tends to go to places where there is a non-null scientific wiki community.</p>
<p>Final stop will be <a href="http://wikimania2010.wikimedia.org/wiki/Main_Page">WikiMania</a>, at the exact same location as the previous one, until July 11th. I really don&#8217;t know what to expect there. I have never attended the official Wikimedia conference. However, it has a fantastic website with tons of relevant information for attendees. It probably has something to do with it being an open wiki and being attended by Wikipedia contributors.</p>
<p>I will next head toward Barcelona for a mandatory <a href="http://tikiwiki.org/TikiFestBarcelona2">TikiFest</a>. However, I don&#8217;t really consider this to be in the line-up as it&#8217;s mostly about meeting with friends.</p>
<p>That is three events on wikis and collaboration. Wikis being the <a href="http://www.wiki.org/wiki.cgi?WhatIsWiki">simplest database that could possibly work</a>, what could require 8-9 days on a single topic? It turns out the technology does not really matter. Just like software, writing is not hard. Getting many people to do it together is a much bigger challenge. Organizing the content alone to suit the needs of a community is challenging. Because the structure is so simple, it puts a lot of pressure on humans to link it all together, navigate the content and find the information they are looking for.</p>
]]></content:encoded>
			<wfw:commentRss>http://blog.lphuberdeau.com/wordpress/2010/06/upcoming-events/feed/</wfw:commentRss>
		<slash:comments>2</slash:comments>
		</item>
		<item>
		<title>More information overload please</title>
		<link>http://blog.lphuberdeau.com/wordpress/2010/05/what-could-we-do/</link>
		<comments>http://blog.lphuberdeau.com/wordpress/2010/05/what-could-we-do/#comments</comments>
		<pubDate>Thu, 27 May 2010 16:12:04 +0000</pubDate>
		<dc:creator>Louis-Philippe Huberdeau</dc:creator>
				<category><![CDATA[General]]></category>

		<guid isPermaLink="false">http://blog.lphuberdeau.com/wordpress/?p=370</guid>
		<description><![CDATA[A few months back, Microsoft made a presentation on OData at PHP Quebec. While I found the format interesting at first, with the way you can easily navigate and explore the dataset, I must admit I was a bit skeptical. After all, public organizations handing out their data to Microsoft does sound like a terrible [...]]]></description>
			<content:encoded><![CDATA[<p>A few months back, Microsoft made a presentation on <a href="http://www.odata.org/">OData</a> at PHP Quebec. While I found the format interesting at first, with the way you can easily navigate and explore the dataset, I must admit I was a bit skeptical. After all, public organizations handing out their data to Microsoft does sound like a terrible idea. While a lot of that data will be hosted on Microsoft technologies, the format remains open, and it appears to be picking up.</p>
<p>I guess what was missing originally for me was a real use case for it. The example at the presentation used a sample database with products and inventory. Completely boring stuff. Today, at <a href="http://www.webnotwar.ca/">Make Web Not War</a>, I had a conversation with <a href="http://blog.syntaxc4.net">Cory Fowler</a> (with <a href="http://port25.ca">Jenna Hoffman</a> sitting close by) who has been promoting OData for the city of Guelph. I was convinced right away that this was the right way to go. Not necessarily OData as a format, but opening up information for citizens to explore and improve the city. If OData can emerge as a widespread standard to do it, it&#8217;s fine by me. The objective has little to do with technology. In fact, when I look at it, I barely see the technology. It&#8217;s about providing open access to information for anyone to use. How they will use it is up to them.</p>
<p>The conference had a competition attached to it. Two projects among the finalists were using OData. One of them created a driving game with checkpoints in the city of Vancouver. They simply used the map data to build the streets and position buildings. That is a fairly ludicrous use of publicly available information, but it is still impressive that a small team could build a reasonable game environment in a short amount of time. The other project used data from Edmonton to rank houses based on the availability of nearby services, basically helping people seeking new properties to evaluate the neighborhood without leaving their computer.</p>
<p>This is only the tip of the iceberg. The data made available at this time is mostly geographical. Cities expose the location of the various services they offer. The uses you can make of it are quite limited. I&#8217;ve seen other applications helping you locate nearby parks or libraries. Sure, knowing there is a police station nearby is good, but there could be so much more. What we need is simply more data: crime locations, incident reports, power usage, water consumption. Once relevant information is out there, small organizations and even businesses will be able to use it to find useful information and track it over time. At this time, a lot of the data is collected but only accessible to a few people. Effort is duplicated when others attempt to collect it. Waste. Decisions are made based on poor evidence.</p>
<p>So there is information out there for Vancouver, Edmonton, even Guelph. Nothing about Montreal. Nothing in the province of Quebec that I could find. I think this is just sad.</p>
<p>Actually, if there is anything out there, it is very hard to find. Even when great data sources are openly available, there is no central index to locate them at this time. Even if there were, the question would remain of what should go in it. Official sources? Collaborative sources? Not that there is anything like that yet, but consider people flagging potholes on streets with their mobile phones as they walk around. Of course, accuracy would vary, but it would serve as a great tool for city employees to figure out which areas should become a priority. There are so many opportunities and so many challenges related to open data access. I don&#8217;t think we are fully prepared for the shift yet.</p>
]]></content:encoded>
			<wfw:commentRss>http://blog.lphuberdeau.com/wordpress/2010/05/what-could-we-do/feed/</wfw:commentRss>
		<slash:comments>9</slash:comments>
		</item>
		<item>
		<title>Only the words change</title>
		<link>http://blog.lphuberdeau.com/wordpress/2010/05/only-the-words-change/</link>
		<comments>http://blog.lphuberdeau.com/wordpress/2010/05/only-the-words-change/#comments</comments>
		<pubDate>Sat, 15 May 2010 19:44:19 +0000</pubDate>
		<dc:creator>Louis-Philippe Huberdeau</dc:creator>
				<category><![CDATA[Uncategorized]]></category>

		<guid isPermaLink="false">http://blog.lphuberdeau.com/wordpress/?p=363</guid>
		<description><![CDATA[Amazon has brought me back to 1975 and the Mythical Man-Month. It had been on my reading list for quite a while, but at some point around two years ago, it became unavailable. After that, it sat on a shelf for a few months until the stack got down to it. I must say, skip [...]]]></description>
			<content:encoded><![CDATA[<p>Amazon has brought me back to 1975 and the Mythical Man-Month. It had been on my reading list for quite a while, but at some point around two years ago, it became unavailable. After that, it sat on a shelf for a few months until the stack got down to it. I must say, skip a few technical details and this book could very well have been written last year. After all, in 1975, structured programming (that is, conditions and loops) was a recent concept and not widely adopted. Surprisingly, Brooks knew a whole lot about software development, testing and management. I have the feeling we have learned <em>nothing</em> since it was written. Concepts were only refined, renamed and spread out. As far as I can tell, just a few paragraphs in chapter 13 lay out the founding concepts of TDD.</p>
<blockquote><p><strong>Build plenty of scaffolding.</strong> By scaffolding, I mean all programs and data built for debugging purposes but never intended to be in the final product. It is not unreasonable for there to be half as much code in scaffolding as there is in product.</p>
<p>One form of scaffolding is the <em>dummy component</em>, which consists only of interfaces and perhaps some faked data or some small test cases. For example, a system may include a sort program which isn&#8217;t finished yet. Its neighbors can be tested by using a dummy program that merely reads and tests the format of input data, and spews out a set of well-formatted meaningless but ordered data.</p>
<p>Another form is the <em>miniature file</em>. A very common form of system bug is misunderstanding of formats for tape and disk files. So it is worthwhile to build some little files that have only a few typical records, but all the descriptions, pointers, etc.</p>
<p>[...]</p>
<p>Yet another form of scaffolding are auxiliary programs. Generators for test data, special analysis printouts, cross-reference table analyzers, are all examples of the special-purpose jigs and fixtures one may want to build.</p>
<p>[...]</p>
<p><strong>Add one component at a time.</strong> This precept, too, is obvious, but optimism and laziness tempt us to violate it. To do it requires dummies and other scaffolding, and that takes work. And after all, perhaps all that work won&#8217;t be needed? Perhaps there are no bugs?</p>
<p>No! Resist the temptation! That is what systematic system testing is all about. One must assume that there will be lots of bugs, and plan an orderly procedure for snaking them out.</p>
<p>Note that one must have thorough test cases, testing the partial systems after each new piece is added. And the old ones, run successfully on the last partial sum, must be rerun on the new one to test for system regression.</p></blockquote>
<p>Does it sound familiar? I see test cases, test data, mock objects, fuzzing and quite a lot of things we hear about these days. Certainly, it was different. They had different constraints at the time, like having to schedule to get access to a batch-processing machine. There is some discussion about interactive programming and how it would speed up the code and test cycles.</p>
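<p>To make the parallel with today&#8217;s mock objects concrete, here is a minimal Python sketch of the <em>dummy component</em> Brooks describes. The record format and the names <code>check_record</code>, <code>dummy_sort</code> and <code>print_report</code> are hypothetical: the stand-in checks the format of its input the way the unfinished sort would, then emits well-formatted, ordered but meaningless records so its neighbors can be tested.</p>

```python
# Hypothetical record format: (key, payload) with an int key and a str payload.
Record = tuple[int, str]


def check_record(record: object) -> Record:
    """Validate one input record the way the real sort would expect it."""
    if not (isinstance(record, tuple) and len(record) == 2):
        raise ValueError(f"malformed record: {record!r}")
    key, payload = record
    if not isinstance(key, int) or not isinstance(payload, str):
        raise ValueError(f"bad field types in: {record!r}")
    return key, payload


def dummy_sort(records: list) -> list[Record]:
    """Stand-in for an unfinished sort program.

    It only checks the input format, then returns as many well-formatted
    but meaningless records as it received, already in ascending key order.
    """
    for record in records:
        check_record(record)
    return [(i, "placeholder") for i in range(len(records))]


def print_report(sorted_records: list[Record]) -> str:
    """A neighbor of the sort, testable before the real sort exists."""
    return "\n".join(f"{key}: {payload}" for key, payload in sorted_records)
```

<p>Swapping <code>dummy_sort</code> for the real implementation later should not require any change to <code>print_report</code> or its tests.</p>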
<p>I find it impressive given that they had so little to work with. I wasn&#8217;t even born when they figured that out.</p>
<p>Because the experience is based on system programming for an operating system to be run on a machine built in parallel, there is a strong emphasis on requirements and on top-down design, which Brooks calls <em>the most important new programming formalization of the decade</em>. To me, the word requirement is a scary one. I don&#8217;t do system programming, and for what I do, prototyping and communication do a much better job. However, I found the take interesting.</p>
<blockquote><p><strong>Designing the Bugs Out</strong></p>
<p><strong>Bug-proofing the definition.</strong> The most pernicious and subtle bugs are system bugs arising from mismatched assumptions made by the authors of various components. The approach to conceptual integrity discussed above in Chapters 4, 5 and 6 addresses these problems directly. In short, conceptual integrity of the product not only makes it easier to use, it also makes it easier to build and less subject to bugs.</p>
<p>So does the detailed, painstaking architectural effort implied by that approach. V. A. Vyssotsky, of Bell Telephone Laboratories&#8217; Safeguard Project, says, &#8220;The crucial task is to get the product defined. Many, many failures concern exactly those aspects that were never quite specified.&#8221; Careful function definition, careful specification, and the disciplined exorcism of frills of function and flights of technique all reduce the number of system bugs that have to be found.</p>
<p><strong>Testing the specification</strong></p>
<p>Long before any code exists, the specification must be handed to an outside testing group to be scrutinized for completeness and clarity. As Vyssotsky says, the developers themselves cannot do this: &#8220;They won&#8217;t tell you they don&#8217;t understand it; they will happily invent their way through the gaps and obscurities.&#8221;</p></blockquote>
<p>Beyond the punch line, this does call for very detailed specifications. It felt to me like those were comments made in retrospect. I don&#8217;t think the specifications were ever detailed enough for bugs to be driven out. If they had been, you would end up with an issue raised in the previous chapter: &#8220;There are those who would argue that the OS/360 six-foot shelf of manuals represents verbal diarrhea, that the very voluminosity introduces a new kind of incomprehensibility. And there is some truth in that.&#8221;</p>
<p>Very detailed specifications, just like exhaustive documentation, reach a point where they no longer bring value because no one can get through all of them in a reasonable time frame. Finding inconsistencies in a few pages of requirements is possible. Not in thousands of pages. The effort required is just surreal. Sadly, following the advice of strong and detailed requirements, the world of software development sank into waterfall in the years that followed. While we collectively came to appreciate most of the book&#8217;s great insights only decades later, this one became far too influential.</p>
<p>Imagine where software development would be today if, instead, the industry had taken as the book&#8217;s primary advice its call for smaller, more productive teams when possible, higher-level programming languages and extensive <em>scaffolding</em>.</p>
]]></content:encoded>
			<wfw:commentRss>http://blog.lphuberdeau.com/wordpress/2010/05/only-the-words-change/feed/</wfw:commentRss>
		<slash:comments>3</slash:comments>
		</item>
		<item>
		<title>Branching, the cost is still too high</title>
		<link>http://blog.lphuberdeau.com/wordpress/2010/04/branching-the-cost-is-still-too-high/</link>
		<comments>http://blog.lphuberdeau.com/wordpress/2010/04/branching-the-cost-is-still-too-high/#comments</comments>
		<pubDate>Tue, 06 Apr 2010 17:03:16 +0000</pubDate>
		<dc:creator>Louis-Philippe Huberdeau</dc:creator>
				<category><![CDATA[Programming]]></category>

		<guid isPermaLink="false">http://blog.lphuberdeau.com/wordpress/?p=349</guid>
		<description><![CDATA[Everyone&#8217;s motivation to move to distributed version control systems (DVCS) was that the cost of branching was too high with Subversion. Part of it is true, but even with DVCS, I find the cost of branching to be too high for my taste. I can create feature branches for features of a decent size, but [...]]]></description>
			<content:encoded><![CDATA[<p>Everyone&#8217;s motivation to move to distributed version control systems (DVCS) was that the cost of branching was too high with Subversion. Part of it is true, but even with DVCS, I find the cost of branching to be too high for my taste. I can create feature branches for features of a decent size, but I think traceability needs even more granularity.</p>
<p>Let&#8217;s begin by listing my typical process to handle feature branches these days.</p>
<ol>
<li>Branch trunk from the repository to my local copy</li>
<li>Copy configuration files from another branch</li>
<li>Make minor changes</li>
<li>Run scripts to initialize the environment.</li>
<li>&#8230;</li>
<li>Develop, commit, pull, merge &#8211; all of this is great</li>
<li>&#8230;</li>
<li>Push to trunk</li>
</ol>
<p>My problem is that dealing with those configuration files takes too much time and remains troublesome. However, there is no real way around it. The application needs to connect to MySQL, Gearman, Sphinx and Memcached. On development setups, they are all on the same machine. Still, because I am way too lazy to create new database instances and I often don&#8217;t change my prefixes as much as I should, I end up with multiple branches sitting there with only one really usable at any time. Of course, it would all be solved if I were more disciplined, but anything that annoys me keeps me from doing it right. Just having to do the configuration part encourages me to re-use branches.</p>
<p>The goal of fine-grained branches is to represent the decision-making process as part of the revision control. The way I see it, top level branches represent a goal. It could be implementing a new feature, enhancing a piece of the user interface or anything. However, to reach those top level objectives, it may be required to perform some refactoring or upgrade a library. If those changes are made atomically through a branch and merged as a single commit, there would be ways to look at the hierarchy of commits to understand the flow of intentions. Bazaar can generate graphs from forks and merges. I can imagine tools to help traceability if the decision making is organized in the branch structure.</p>
<p>Why traceability, you might ask? For many things that don&#8217;t seem to make sense in code, there is a good historical reason (unless it&#8217;s due to accidental complexity). Even in my own code written a few months prior, I find places that need refactoring. Most of the time, it&#8217;s simply because I was trying to look too far ahead at the time. I was anticipating the final shape of the software, but by the time it got there, new and better ways to achieve the same result had been implemented, leaving legacy behind. When this happens to be in my own code, I can think about the process that led to it, figure out what the original intention was and decide how the design should be adapted to the new reality. When the code is written by someone else, the original intention can only be guessed. I hope creating a hierarchy of branches can provide an outline of the thought process that would explain the decisions made.</p>
<p>My Subversion reflexes pointed me towards <em>bzr switch</em>. It changes the way I had gotten used to working with a DVCS. In my transition, I had replaced the concept of a working copy with branches, and check-outs simply seemed to have no use. I was wrong. They can actually fix my configuration burden. If I keep a single check-out of the code that is configured for my local environment, I can then switch it from one branch to another. Because we are in the distributed world, those other branches can be kept locally, just not in the working copy. The process then becomes:</p>
<ol>
<li>Create a new branch locally</li>
<li>Switch the check-out to the new branch</li>
<li>&#8230;</li>
<li>Develop, commit, pull, merge</li>
<li>&#8230;</li>
<li>Switch check-out to parent branch</li>
<li>Merge local branch</li>
</ol>
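<p>The switch-based process above could be scripted along these lines. This is a hedged sketch, not a tested tool: it assumes local branches live side by side in one directory and that a single check-out (for example one created once with <code>bzr checkout --lightweight</code>) holds the local configuration. The helper names and paths are hypothetical; the function only builds the <code>bzr branch</code>, <code>switch</code>, <code>merge</code> and <code>commit</code> command lines, and the develop steps in the middle are left to you.</p>

```python
import shlex
import subprocess
from pathlib import Path


def feature_branch_plan(repo_dir: str, parent: str,
                        feature: str) -> dict[str, list[list[str]]]:
    """Build the bzr commands for the switch-based feature branch workflow.

    repo_dir holds the local branches side by side (hypothetical layout);
    the commands are meant to run inside the configured check-out.
    """
    parent_path = str(Path(repo_dir) / parent)
    feature_path = str(Path(repo_dir) / feature)
    return {
        # Steps 1-2: create the branch locally, point the check-out at it.
        "start": [
            ["bzr", "branch", parent_path, feature_path],
            ["bzr", "switch", feature_path],
        ],
        # Steps 6-7: go back to the parent, merge, and record it as one commit.
        "finish": [
            ["bzr", "switch", parent_path],
            ["bzr", "merge", feature_path],
            ["bzr", "commit", "-m", f"Merge {feature}"],
        ],
    }


def run(commands: list[list[str]], checkout: str, dry_run: bool = True) -> None:
    """Print each command; execute it in the check-out unless dry_run is set."""
    for argv in commands:
        print(shlex.join(argv))
        if not dry_run:
            subprocess.run(argv, cwd=checkout, check=True)
```

<p>Committing the merge as a single revision on the parent branch is what would keep the per-branch intention visible in the history.</p>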
<p>Of course, if changes happen in the configuration files outside of what was locally configured or the schema changes, this has to be dealt with, but I expect this to be much less frequent.</p>
<p>The next step will be to rebuild my development environment in a smarter way. Right now, I have way too many services running locally. I want to move all of those to a virtual machine, which I will fire up when I need them. For this step, I am waiting for the final release of Ubuntu 10.04, and probably a few more weeks. In the past, I had terrible experiences with pre-release operating systems and learned to stay away, no matter how fun and attractive new features are. It also means reinstalling my entire machine, so I am not exactly looking forward to it. It should be easier now that almost everything is web-based, as long as I don&#8217;t lose those precious passwords.</p>
<p>Using virtual machines to keep your primary host clean of any excess is nothing new. I guess I did not do it before because I thought my disk space was more limited than it is. My laptop has a 64 GB SSD. It was a downgrade from my previous laptop&#8217;s drive, which was continuously getting full. Too many check-outs, database dumps, log files. They just keep piling up over the years. It turns out the overhead of having an extra operating system isn&#8217;t that bad after all.</p>
<p>The good thing about virtual machines is that they are completely disposable. You can build one with the software you need, take a snapshot and move on from there. Simply reverting to the snapshot will clean up all the mess created. Only one detail to keep in mind: no permanent data can be stored in there. I will keep my local branches on the main host and the check-out in the virtual machine. Having a shell on a virtual machine won&#8217;t be much different from having a shell locally.</p>
]]></content:encoded>
			<wfw:commentRss>http://blog.lphuberdeau.com/wordpress/2010/04/branching-the-cost-is-still-too-high/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Improving rendering speed</title>
		<link>http://blog.lphuberdeau.com/wordpress/2010/03/improving-rendering-speed/</link>
		<comments>http://blog.lphuberdeau.com/wordpress/2010/03/improving-rendering-speed/#comments</comments>
		<pubDate>Wed, 24 Mar 2010 15:26:57 +0000</pubDate>
		<dc:creator>Louis-Philippe Huberdeau</dc:creator>
				<category><![CDATA[Programming]]></category>

		<guid isPermaLink="false">http://blog.lphuberdeau.com/wordpress/?p=342</guid>
		<description><![CDATA[Speed is a matter of perception. We&#8217;d like to believe it&#8217;s all due to computational power or the execution speed of queries. There are barriers that should not be crossed, but in most cases, getting your application to behave correctly while the user is waiting will improve the perception. Improving the rendering speed is a [...]]]></description>
			<content:encoded><![CDATA[<p>Speed is a matter of perception. We&#8217;d like to believe it&#8217;s all due to computational power or the execution speed of queries. There are barriers that should not be crossed, but in most cases, getting your application to behave correctly while the user is waiting will improve the perception. Improving the rendering speed is a good step, and tweaking a few settings will improve perception more than trimming milliseconds off an SQL query.</p>
<p>A now classic example of the effects of perception is the one of <a href="http://www.chrisharrison.net/projects/progressbars/index.html">progress bars</a>. Progress bars moving forward at different rates give the impression of being shorter or longer, even though the total time remains the same.</p>
<p>Fiddling with HTTP headers is actually very simple and will help lower the load on your server too. A hit you don&#8217;t get is so much faster to serve. Both Yahoo! and Google turned this optimization pain into a game by providing scores to increase. If you are not familiar with them, consider installing <a href="http://developer.yahoo.com/yslow/">YSlow</a> and <a href="http://code.google.com/speed/page-speed/">Page Speed</a> right away. Now, if you&#8217;ve never used them before, chances are that running them on your own website will produce terrible scores. Actually, running them on most of the websites out there produces terrible scores.</p>
<p>Both of them will complain about a few items:</p>
<ul>
<li>Too many HTTP requests</li>
<li>Missing Expires headers</li>
<li>Uncompressed streams</li>
<li>Unminified CSS and JavaScript</li>
<li>No use of a content delivery network (CDN)</li>
</ul>
<h3>Fewer files</h3>
<p>The excess HTTP requests are likely caused by the multiple JavaScript and CSS files you include. The JavaScript part is very simple: concatenate the scripts in the appropriate order, minify them and deliver it all as a single file. There are good tools out there to do it, and depending on how you deploy the application, some may be better than others. I&#8217;ve used a <a href="http://code.google.com/p/minify/">PHP implementation</a> to do it just in time and cached the result as a static file, and I&#8217;ve used a <a href="http://developer.yahoo.com/yui/compressor/">Java implementation</a> as part of a build process. I find the latter to be the better option when it is possible.</p>
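<p>As a rough illustration of the build step, here is a minimal Python sketch (the tools mentioned above are PHP and Java, so the language, function names and file paths here are assumptions). It concatenates scripts in a fixed order into one bundle and reports the bundle as stale only when a source file is newer, which is the &#8220;build once, cache as a static file&#8221; idea. Minification is deliberately left to a real tool such as the YUI Compressor.</p>

```python
from pathlib import Path


def bundle_is_stale(sources, output):
    """True when the bundle is missing or older than any source file."""
    out = Path(output)
    if not out.exists():
        return True
    built_at = out.stat().st_mtime
    return any(Path(src).stat().st_mtime > built_at for src in sources)


def build_bundle(sources, output):
    """Concatenate scripts in the given order into a single file.

    Order matters: a library must come before the code that uses it.
    Minification would run on the result as a separate step.
    """
    parts = []
    for src in sources:
        text = Path(src).read_text(encoding="utf-8").rstrip()
        # A newline and ';' keep a file that ends without a semicolon (or
        # with a line comment) from merging with the next file's first line.
        parts.append(text + "\n;\n")
    Path(output).write_text("".join(parts), encoding="utf-8")
    return output
```

<p>A request handler would call <em>build_bundle()</em> only when <em>bundle_is_stale()</em> returns true, and otherwise serve the cached file as is.</p>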
<p>This is easy enough for production environments, but it really makes development a pain: debugging a minified script is not pleasant. In Tikiwiki, this simply became another option. In a typical Zend Framework environment, APPLICATION_ENV is a good binding point for the behavior. Basically, you need to know the individual files that have to be served. In a development environment, serve them individually. In a production or staging environment, serve the compiled file (or build it just in time if a build step is not an option).</p>
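<p>A minimal sketch of that switch, in Python rather than PHP. The environment names follow Zend Framework&#8217;s APPLICATION_ENV convention; the file list, bundle URL, function name and the <em>force_individual</em> flag (standing in for a Tiki-style option) are hypothetical.</p>

```python
DEVELOPMENT_ENVS = {"development"}


def script_urls(app_env, files, bundle_url, force_individual=False):
    """Return the script URLs a page should include.

    app_env mirrors the APPLICATION_ENV value; files is the ordered list of
    individual scripts and bundle_url the compiled file built from them.
    force_individual plays the role of an application-level debug option.
    """
    if force_individual or app_env in DEVELOPMENT_ENVS:
        # Individual, unminified files keep debugging pleasant.
        return list(files)
    # Production, staging (and anything unknown) get the compiled file.
    return [bundle_url]
```

<p>Since both branches derive from the same ordered list, only that list has to be maintained.</p>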
<p>Unless you live with an application that has been shielded from the real world for a decade, it&#8217;s very likely that most of the JavaScript you use was not written by you. It comes from a framework, and you can skip serving those files altogether. <a href="http://code.google.com/apis/ajaxlibs/">Google provides a content delivery network</a> (CDN) for them. Why is this faster? You don&#8217;t have to serve the files, and your users likely won&#8217;t have to download them: because the same files are referenced by many websites, it&#8217;s very likely that your users downloaded and cached them in the past. Google also serves the standard CSS files for jQuery UI (see the <a href="http://jqueryui.com/development">bottom right corner</a>), although that&#8217;s not as clearly indicated (you should be able to find the <a href="http://ajax.googleapis.com/ajax/libs/jqueryui/1.8/themes/base/jquery-ui.css">pattern</a>).</p>
<p>Both of the minify libraries mentioned above also handle CSS minification. However, CSS is trickier, as you need to worry about relative paths to images and about imports of other CSS files.</p>
<p>The final step is to make sure all the CSS is in the page head and all the JavaScript is at the bottom of the page.</p>
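<p>To illustrate the relative path problem: when a stylesheet is moved from its original directory into a combined file elsewhere, every relative <em>url()</em> must be re-expressed relative to the new location. This Python sketch assumes web paths such as <em>/styles/theme/main.css</em> and a bundle at <em>/min/all.css</em> (both hypothetical); it only rewrites <em>url()</em> references and does not inline <em>@import</em> rules.</p>

```python
import posixpath
import re

# url(...) with an optional single or double quote around the path.
URL_PATTERN = re.compile(r"""url\(\s*(['"]?)([^'")]+)\1\s*\)""")


def rewrite_css_urls(css, source_path, bundle_path):
    """Rewrite relative url() references from source_path so that they
    still resolve once the CSS is served from bundle_path."""
    source_dir = posixpath.dirname(source_path)
    bundle_dir = posixpath.dirname(bundle_path)

    def fix(match):
        quote, target = match.group(1), match.group(2).strip()
        # Absolute paths, full URLs and data: URIs do not depend on location.
        if target.startswith(("/", "data:", "http:", "https:")):
            return match.group(0)
        absolute = posixpath.normpath(posixpath.join(source_dir, target))
        relative = posixpath.relpath(absolute, bundle_dir)
        return "url(%s%s%s)" % (quote, relative, quote)

    return URL_PATTERN.sub(fix, css)
```

<p>Using absolute paths for images in the original stylesheets avoids the problem entirely, at the cost of tying the CSS to a URL layout.</p>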
<h3>Web server tuning</h3>
<p>Now that the number of files is reduced and your scores have already improved significantly, another class of issues takes over: compression, expiry dates and improper ETags. The easiest to set up is compression. You need to make sure mod_gzip or mod_deflate is installed in Apache; it almost always is. Everything is done transparently, and all you need to do is make sure the right content types are listed. This can be done in the .htaccess file. Here is an example for mod_deflate.<br />
<code>&lt;IfModule deflate_module&gt;<br />
AddOutputFilterByType DEFLATE text/html text/plain text/xml text/css application/javascript<br />
&lt;/IfModule&gt;</code><br />
Use Firebug to check the content type of every file YSlow still complains about, and add those types to the list.</p>
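<p>The effect of that filter can be approximated in a few lines of Python. This is a simplified sketch, not mod_deflate&#8217;s actual logic: the type list copies the directive above, the function names are hypothetical, and the Accept-Encoding check ignores quality values. Images such as PNG files are left out because they are already compressed.</p>

```python
import gzip

# Same types as the AddOutputFilterByType line above.
COMPRESSIBLE_TYPES = {
    "text/html", "text/plain", "text/xml", "text/css", "application/javascript",
}


def should_compress(content_type, accept_encoding):
    """Compress only listed types, and only for clients that accept gzip."""
    mime = content_type.split(";")[0].strip().lower()
    encodings = {part.split(";")[0].strip().lower()
                 for part in accept_encoding.split(",")}
    return mime in COMPRESSIBLE_TYPES and "gzip" in encodings


def encode_body(body, content_type, accept_encoding):
    """Return (body, extra_headers) the way a compressing server might."""
    if should_compress(content_type, accept_encoding):
        headers = {"Content-Encoding": "gzip", "Vary": "Accept-Encoding"}
        return gzip.compress(body), headers
    return body, {}
```

<p>Repetitive text such as CSS typically shrinks dramatically, which is why listing every text type your application serves pays off.</p>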
<p>Another easy target is the ETag declaration. In most installations, Apache generates an ETag for static files. ETags are a good idea: the browser remembers the last ETag it received for a given URI and sends it back on the next request, asking whether the resource changed. The server compares it and replies either with a 304 (Not Modified) or with the new version. The problem is that your server still gets a hit. You&#8217;re better off not having them at all.<br />
<code>&lt;FilesMatch "\.(js|png|gif|jpg|css)$"&gt;<br />
FileETag None<br />
&lt;/FilesMatch&gt;</code><br />
Expiry headers are trickier. When responses come from your scripts, you have to set them yourself. Setting an expiry date means accepting that your users might not see the most recent version of the content, because their browsers won&#8217;t query your server to check. These are not always easy decisions to make.</p>
<p>However, static files are much easier to handle. You will need mod_expires in Apache, which is not quite as common as its compression counterpart. The goal is simply to set an arbitrary date in the future. Page Speed likes dates more than a month away; YSlow seems to settle for 2 weeks. The documentation uses 10 years, which should be far enough.<br />
<code>&lt;FilesMatch "\.(js|png|gif|jpg|css|ico)$"&gt;<br />
ExpiresActive on<br />
ExpiresDefault "access plus 10 years"<br />
&lt;/FilesMatch&gt;</code></p>
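<p>The difference between the two strategies can be sketched in Python. With ETag revalidation, the client still makes a request and the server only saves the transfer by answering 304; with a far-future expiry, the browser does not ask at all until the date passes. The function names are hypothetical, the ETag here is a hash of the content (Apache derives its default ETag from file metadata instead), and 3650 days only approximates Apache&#8217;s &#8220;10 years&#8221;.</p>

```python
import hashlib
from datetime import datetime, timedelta, timezone
from email.utils import format_datetime

TEN_YEARS = timedelta(days=3650)


def etag_for(body):
    """Strong validator derived from the content itself."""
    return '"%s"' % hashlib.sha256(body).hexdigest()


def revalidate(body, if_none_match=None):
    """ETag flow: every request reaches the server, which answers
    304 with an empty body when the client's copy is still current."""
    tag = etag_for(body)
    if if_none_match == tag:
        return 304, {"ETag": tag}, b""
    return 200, {"ETag": tag}, body


def far_future_headers(now=None, lifetime=TEN_YEARS):
    """Expiry flow: the browser may reuse its copy without contacting
    the server until the Expires date (or max-age) has passed."""
    now = now or datetime.now(timezone.utc)
    return {
        "Expires": format_datetime(now + lifetime, usegmt=True),
        "Cache-Control": "max-age=%d" % int(lifetime.total_seconds()),
    }
```

<p>The catch with far-future dates is that a changed file needs a new URL (a version number in the name or query string, for example) for users to ever see it.</p>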
<h3>Cookies</h3>
<p>Your website most likely uses a cookie to track the session. Cookies are great for the PHP scripts that need them to track who&#8217;s visiting, but they are also sent with requests for static files, because the browser does not know they make no difference there. Cookies alter the request and confuse intermediate caches, and they cause trouble whenever they change, like when you <a href="http://ca.php.net/manual/en/function.session-regenerate-id.php">change the session id</a> to prevent session hijacking.</p>
<p>The easiest way to keep those cookies from being sent with static files is to serve the files from a different server. Luckily, browsers don&#8217;t know how things are organized on the other end, so simply using a different domain or subdomain pointing to the exact same application will do the trick. If you have more load, you might want to serve them with a different HTTP server altogether, but that requires more infrastructure. It should be easy to move JavaScript and CSS to the other domain. Reaching the images will depend on the structure of your application; you will thank those view helpers if you have any.</p>
<p>If you serve some semi-dynamic files through that domain, make sure PHP does not start the session; otherwise, all this is futile.</p>
<p>You can then configure YSlow&#8217;s CDN list to include that other domain and the Google CDN, and observe blazing scores. To modify the configuration, you need to edit the Firefox preferences: type <em>about:config</em> in the URL bar, promise you will be careful, search for <em>yslow</em> and set the cdnHostnames property to a comma-separated list of domains.</p>
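<p>Such a view helper can be as small as the following Python sketch. The host name <em>static.example.com</em>, the extension list and the function name are hypothetical. The approach also assumes the session cookie is scoped to the main host only, since a cookie set for the parent domain is sent to its subdomains as well.</p>

```python
from urllib.parse import urljoin, urlsplit

# Hypothetical cookie-less host pointing at the same application.
STATIC_HOST = "http://static.example.com/"
STATIC_EXTENSIONS = (".js", ".css", ".png", ".gif", ".jpg", ".ico")


def asset_url(path, static_host=STATIC_HOST):
    """Send static files to the cookie-less host; keep everything else
    (PHP scripts, semi-dynamic files) on the main domain."""
    if urlsplit(path).path.lower().endswith(STATIC_EXTENSIONS):
        return urljoin(static_host, path.lstrip("/"))
    return path
```

<p>Because every template goes through the helper, moving static files to yet another host later is a one-line change.</p>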
<h3>One more</h3>
<p>By default, PHP sends a ridiculous Cache-Control header when a session is started. It basically asks the browser to check for a new version of the page on every request. When your user presses Back, you get a new request, and the user will likely lose local modifications in the form. Not really nice, and one hit too many on your server. Setting the header to something like <em>max-age=3600, must-revalidate</em> will resolve that issue and make navigation on your site feel so much faster.</p>
<p>These items should cover the most frequent issues. Both tools will report a few minor issues, some of which may be easy to fix, some not so much. Make these verifications part of the release procedure: a new content type may be introduced in the application and cause less than optimal behavior due to a few missing headers. It may not be possible to get a perfect score on every page of a site, but if you cover the most important ones, your users may believe your site is fast, even though you use a heavy framework.</p>
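<p>For illustration only, here is that header sent from a minimal Python WSGI application; in PHP, the header can be sent with <em>header()</em>. The one-hour lifetime is the example value from the paragraph above, and the application itself is hypothetical.</p>

```python
CACHE_CONTROL = "max-age=3600, must-revalidate"


def app(environ, start_response):
    """Minimal WSGI application: the browser may reuse the page from its
    cache for an hour, then must check back with the server."""
    body = b"Edit form goes here"
    start_response("200 OK", [
        ("Content-Type", "text/html; charset=utf-8"),
        ("Cache-Control", CACHE_CONTROL),
        ("Content-Length", str(len(body))),
    ])
    return [body]
```

<p>It can be served locally with <em>wsgiref.simple_server.make_server()</em> from the standard library to inspect the headers in Firebug.</p>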
]]></content:encoded>
			<wfw:commentRss>http://blog.lphuberdeau.com/wordpress/2010/03/improving-rendering-speed/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>A bit too much</title>
		<link>http://blog.lphuberdeau.com/wordpress/2010/03/a-bit-too-much/</link>
		<comments>http://blog.lphuberdeau.com/wordpress/2010/03/a-bit-too-much/#comments</comments>
		<pubDate>Mon, 15 Mar 2010 15:55:16 +0000</pubDate>
		<dc:creator>Louis-Philippe Huberdeau</dc:creator>
				<category><![CDATA[General]]></category>

		<guid isPermaLink="false">http://blog.lphuberdeau.com/wordpress/?p=340</guid>
		<description><![CDATA[Well, Confoo is now over. That is quite a lot of stress off my shoulders. Overall, I think the conference was a large success and opened up nice opportunities for the future. Over the years, PHP Quebec had evolved to include more and more topics related to PHP and web development. This year was the [...]]]></description>
			<content:encoded><![CDATA[<p>Well, Confoo is now over. That is quite a lot of stress off my shoulders. Overall, I think the conference was a great success and opened up nice opportunities for the future. Over the years, PHP Quebec had evolved to include more and more topics related to PHP and web development. This year was the natural next step: shifting the focus away from PHP and towards the web, including other programming languages such as Python and sharing common tracks for web standards, testing, project management and security. Most of the conference was still centered around PHP, and that was made very clear on Thursday morning during Rasmus Lerdorf&#8217;s presentation (which had to be moved to the ballroom for its 250-300 attendees, while some other speakers faced an empty room), but hopefully, the other user groups will be able to grow in the next year.</p>
<p>Having 8 tracks in parallel was a bit too much. It made session selection hard, especially since I always keep some time for hallway sessions. I feel that the hallway &#8220;track&#8221; lost quite a lot of participants this year compared to previous ones.</p>
<p>For my own sessions, I learned a big lesson this year: I should not bite off more than I can chew. It turns out some topics are much, much harder to approach than others. A session on refactoring legacy software seemed like a great idea, until I actually had to piece together the content for it. I tried multiple ways to approach the topic and ended up with one that made some sense to me, but, it seems, very little to the audience. I spent so much time distilling and organizing the content that I had very little time left to prepare the actual presentation. What came out was mostly a terrible performance on my part, and I am truly sorry for that.</p>
<p>Lesson of the year: Never submit topics that involve abstract complexity.</p>
<p>The plan I ended up with was a little like this:</p>
<ul>
<li>Explain why rewriting the software from scratch is not an option, primarily because management will never accept it, but also because we don&#8217;t know in detail what the application does, and the maintenance effort won&#8217;t stop during the rewrite.</li>
<li>Bringing a codebase back to life requires a break from the past. Developers must sit down and determine long term objectives and directions to take, figure out what aspects of the software must be kept and those that must change, and find a few concrete steps to be taken.</li>
<li>The effort is futile if the same practices that caused the degradation are kept. Unit testing should be part of the strategy, and coding standards must be raised.</li>
<li>The rest of the presentation was meant to be more practical: how to gradually improve code quality by removing duplication, breaking dependencies on APIs, improving readability and reducing complexity by breaking very large functions down into more manageable units.</li>
</ul>
<p>As I was presenting, my feeling was that I was preaching, on one side, to converts who had done this before and knew it worked, and, on the other, to the rest of the crowd, who did not want to hear that it would take a while and thought I was an idiot (a feeling emphasized by my poor performance, which I was aware of).</p>
<p>Another factor in the mix was that I actually had two presentations, neither of which I had given before, so both had to be prepared. Luckily, the second one, on unit testing, was a much easier topic, and I feel it went better. It was in a smaller room with fewer people. Everyone was close by, so it was a lot closer to a conversation, and I accepted questions at any time. Surprisingly, they came in pretty much the same order in which I had prepared the content. The objective of this session was to bootstrap unit testing. My intuition told me that the main thing preventing people from writing unit tests was that they never know where to start. My plan was:</p>
<ul>
<li>Quickly explain how unit testing fits in the development cycle and why test-first is really more effective if you want to write tests. I went over it quickly because I know everyone has heard that sermon before. I placed the emphasis instead on getting started with easy problems first, as writing tests requires some habits. It&#8217;s perfectly fine to get started with new developments before going back to older code and testing it.</li>
<li>Jump into a VM and go through the installation process for PHPUnit, setting up phpunit.xml and the bootstrap script, and writing a first test to show that it runs and can generate code coverage. I did it using TDD, showing how you first write the test, see it fail, then do what&#8217;s required to get it to pass.</li>
<li>Keeping it hands-on, go through various assertions that help write more expressive tests, and use setUp, tearDown and data providers to shorten tests.</li>
<li>Move on to more advanced topics, such as testing code that uses a database or another external dependency. I ran out of time on this one, so I could not give any live example of it.</li>
</ul>
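<p>The session itself used PHPUnit. For readers who want the same concepts in another language, here is a rough Python <em>unittest</em> analogue; the <em>Cart</em> class is a hypothetical example. <em>setUp</em> and <em>tearDown</em> run around every test, and the <em>subTest</em> loop plays a role similar to a PHPUnit data provider, although it is not the same mechanism.</p>

```python
import unittest


class Cart:
    """Hypothetical class under test."""

    def __init__(self):
        self.items = {}

    def add(self, name, quantity=1):
        if quantity <= 0:
            raise ValueError("quantity must be positive")
        self.items[name] = self.items.get(name, 0) + quantity

    def count(self):
        return sum(self.items.values())


class CartTest(unittest.TestCase):
    def setUp(self):
        # Runs before each test: every test gets a fresh fixture.
        self.cart = Cart()

    def tearDown(self):
        # Runs after each test; release anything setUp acquired.
        self.cart = None

    def test_starts_empty(self):
        self.assertEqual(0, self.cart.count())

    def test_rejects_non_positive_quantity(self):
        with self.assertRaises(ValueError):
            self.cart.add("book", 0)

    def test_count_cases(self):
        # Table of inputs and expected results, similar in spirit to a
        # PHPUnit data provider.
        cases = [
            ([("book", 1)], 1),
            ([("book", 2), ("pen", 3)], 5),
            ([("book", 1), ("book", 1)], 2),
        ]
        for additions, expected in cases:
            with self.subTest(additions=additions):
                cart = Cart()
                for name, quantity in additions:
                    cart.add(name, quantity)
                self.assertEqual(expected, cart.count())
```

<p>Saved in a module, it runs with <em>python -m unittest</em> followed by the module name; a failing case inside the loop is reported with its own parameters.</p>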
<p>I was quite satisfied with the kind of interaction I had with the audience during the presentation, and the feedback was quite positive too. The small room was organized so that I was surrounded by the audience, close by, rather than standing in a long room barely seeing who I was speaking to. Although there were only 15 attendees, I am confident they got something they can work with.</p>
<p>I could have used a dry run right before the presentation. I had done one two weeks prior, but it wasn&#8217;t fresh in my mind, so the delivery was not quite fluid, although some of that was deliberate, to show where to find the information.</p>
<p>During the other sessions I attended, I made two nice discoveries: Doctrine 2, which came up with a very nice structure that I find very compatible with the PHP way, and MongoDB, a document-based database with a very nice way to manipulate data and performance characteristics that suit most web applications out there.</p>
]]></content:encoded>
			<wfw:commentRss>http://blog.lphuberdeau.com/wordpress/2010/03/a-bit-too-much/feed/</wfw:commentRss>
		<slash:comments>2</slash:comments>
		</item>
		<item>
		<title>Bad numbers</title>
		<link>http://blog.lphuberdeau.com/wordpress/2010/02/bad-numbers/</link>
		<comments>http://blog.lphuberdeau.com/wordpress/2010/02/bad-numbers/#comments</comments>
		<pubDate>Tue, 16 Feb 2010 00:12:46 +0000</pubDate>
		<dc:creator>Louis-Philippe Huberdeau</dc:creator>
				<category><![CDATA[Books]]></category>
		<category><![CDATA[General]]></category>

		<guid isPermaLink="false">http://blog.lphuberdeau.com/wordpress/?p=317</guid>
		<description><![CDATA[The most frequently quoted numbers in software engineering are probably those of the Standish Chaos report starting in 1995. To cut the story short, they are the ones that claim that only 16% of software projects succeed. Personally, I never really believed that. If it were that bad, I wouldn&#8217;t be making a living from [...]]]></description>
			<content:encoded><![CDATA[<p>The most frequently quoted numbers in software engineering are probably those of the Standish Chaos report, starting in 1995. To cut the story short, they are the ones claiming that only 16% of software projects succeed. Personally, I never really believed that. If it were that bad, I wouldn&#8217;t be making a living from writing software today, 15 years later. The industry would have been shut down long before they even published the report. As I was reading <a href="http://www.amazon.ca/Practical-Software-Estimation-Insourced-Outsourced/dp/0321439104">Practical Software Estimation</a>, which included the mandatory citation of the above, I came to ask myself why there was such a difference. I don&#8217;t have numbers on this, but I would be very surprised to hear any vendor claim less than a 90% success rate on projects. Seriously, with 16% odds of winning, you&#8217;re better off betting your company&#8217;s future on casino tables than on software projects.</p>
<p>The big question here is: what is a successful project? On what basis do they claim such a high failure rate?</p>
<p>Well, I probably don&#8217;t have the same definition of success and failure. I don&#8217;t think a project is a failure even if it&#8217;s a little behind schedule or a little over budget. From an economic standpoint, as long as it&#8217;s profitable and above the <a href="http://en.wikipedia.org/wiki/Opportunity_cost">opportunity cost</a>, it was worth doing. Sometimes, projects are canceled because of external factors. Even though we&#8217;d like to see the project development cycle as a black box into which you throw a fixed amount of money and get a product in the end, that&#8217;s not the way it is. The world keeps moving, and if something comes out and makes your project obsolete, canceling it and changing direction is the right thing to do. Starting the project was also the right thing to do based on the information available at the time. Sure, some money was lost, but if there are now cheaper ways to achieve the objective, you&#8217;re still winning. As Kent Beck reminds us, <a href="http://www.threeriversinstitute.org/blog/?p=438">sunk costs are a trap</a>.</p>
<p>When a project gets canceled, the executives might be really happy because they see the course change. The people who spent evenings on the project might not be. Perspective matters when evaluating success or failure. However, when after-the-fact numbers are your only measure of success, that may be forgotten. Looking back, it won&#8217;t matter if a hockey team put up a good fight in the third period. On the scoreboard, they lost, and the scoreboard is what everyone will look back at.</p>
<p>Luckily, I wasn&#8217;t the only one to question what Standish was smoking when they drew those alarming figures. In the latest issue of IEEE Software, I came across a very interesting title: <a href="http://www.computer.org/portal/web/csdl/doi/10.1109/MS.2009.154">The Rise and Fall of the Chaos Report Figures</a>. In the article, J. Laurenz Eveleens and Chris Verhoef explain how the analysis method used inevitably leads to this kind of number. The data used by Standish is not publicly available, so analyzing it correctly is not possible (how many times do we have to ask for open data?). However, they were able to tap into other sources of project data to perform the analysis and come up with a very clear explanation of the Standish results.</p>
<p>First off, my question was answered. The definition of success for Standish is pretty much delivering on time and within the budget of the project&#8217;s initial estimate, with all requirements met. Failure means being canceled. Everything else goes into the <em>Challenged</em> bucket, including projects completing slightly over budget and projects whose scope changed. Considering that this bucket contains half of the projects, questioning the numbers is fairly valid.</p>
<p>I remember a presentation by a QA director (not certain of the exact title) from Tata Consulting at the Montreal Software Process Improvement Network a few years ago, in which they explained their quality process. They presented a graph of data collected from projects during post-mortems, where teams were asked to explain the causes of slippage or other types of failure (it was a long time ago, and my memory is not that great), and there was a big column labeled <em>Miscellaneous</em>. At the time, I did not notice anything wrong with it. Every survey graph I had ever seen contained a big miscellaneous section. However, the presenter highlighted that this was unacceptable, as it provided them with absolutely no information. In the following editions of the project survey, they replaced <em>Miscellaneous</em> with <em>Oversight</em>, a word no manager in their right mind would use to describe the cause of their failures. It turns out the subsequent results were more accurate about the causes. When information is unclear, you can&#8217;t just accept that it is unclear. You need to dig deeper and ask why.</p>
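<p>To make the bucket boundaries concrete, here is a small Python sketch of that classification. It follows the summary above plus the on-time criterion that the Chaos definition of success also includes; the <em>Project</em> fields are hypothetical and the sample values in the checks are invented.</p>

```python
from dataclasses import dataclass


@dataclass
class Project:
    initial_budget: float
    actual_cost: float
    initial_days: int
    actual_days: int
    all_requirements_met: bool
    canceled: bool


def standish_resolution(p):
    """Classify a project in the Chaos report buckets: anything short of
    a perfect outcome that was not canceled counts as 'challenged'."""
    if p.canceled:
        return "failed"
    if (p.actual_cost <= p.initial_budget
            and p.actual_days <= p.initial_days
            and p.all_requirements_met):
        return "successful"
    return "challenged"
```

<p>Under this rule, a project finishing 1% over its initial budget lands in the same bucket as one whose scope changed along the way.</p>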
<p>The authors then explain how they used <a href="http://en.wikipedia.org/wiki/Barry_Boehm">Barry Boehm</a>&#8217;s long-understood <a href="http://www.construx.com/Page.aspx?hid=1648">Cone of Uncertainty</a> and <a href="http://en.wikipedia.org/wiki/Tom_DeMarco">Tom DeMarco</a>&#8217;s <a href="http://www.stickyminds.com/sitewide.asp?ObjectId=3392&amp;Function=DETAILBROWSE&amp;ObjectType=ART">Estimation Quality Factor</a> (published in 1982, long before the reports) to identify organizational bias in the estimates, and they show how aggregating data without considering that bias leads to nothing of value. As an example, they point out a graph plotting the forecast-over-actual ratio of hundreds of estimates made at various moments in the projects of a single organization. The graph is striking: nearly no dots lie above the 1.0 line (the few that do are very close to it), which means the company almost never overestimated a project. At the same time, the cone is clearly visible on the graph, with estimates converging toward the actual values as projects progress. The authors asked the right question: why? The company simply, and deliberately, used the lowest possible outcome as its estimate, leading to a 6% success rate under the Standish definition.</p>
<p>I would be interested to see numbers on how many companies go for this approach rather than providing realistic (50-50) estimates.</p>
<p>Now, imagine adding companies that pad their estimates to the same data collection. You get random correlations at best, because you&#8217;re comparing apples to oranges.</p>
<p>Actually, the article presents one of those cases of abusive (fraudulent?) padding. The company had used the Standish definition internally to judge performance. Its estimates were padded so badly that they barely reflected reality. Even at 80% completion, some of the estimates were off by two orders of magnitude. Yes, that means 10000%. How can any sane decision be made from those numbers? I have no idea. In fact, with such random numbers, you&#8217;re probably better off not wasting time on estimation at all. If anything, this is a great example of <a href="http://www.systemsguild.com/GuildSite/DandL/AustinForeword.html">dysfunction</a>.</p>
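<p>Both measures discussed in the article are easy to compute. The following Python sketch calculates the forecast-over-actual ratio of each estimate and DeMarco&#8217;s EQF, taken here as the area under the actual value divided by the area between the estimates and the actual value over the project&#8217;s duration. It assumes each estimate holds until the next one is made (a step function); the function names are hypothetical and the numbers in the checks are invented.</p>

```python
def forecast_ratios(estimates, actual):
    """Forecast-over-actual ratio for each (time, estimate) pair.

    Below 1.0 is an underestimate, above 1.0 an overestimate.
    """
    if actual <= 0:
        raise ValueError("actual must be positive")
    return [value / actual for _, value in estimates]


def eqf(estimates, actual, end):
    """DeMarco's Estimation Quality Factor.

    estimates: (time, value) pairs sorted by time; each estimate is assumed
    to hold until the next one is made. end: time the project completed.
    EQF = area under the actual value / area between estimates and actual.
    Higher is better; perfect estimates give infinity.
    """
    if actual <= 0:
        raise ValueError("actual must be positive")
    deviation_area = 0.0
    for i, (start, value) in enumerate(estimates):
        stop = estimates[i + 1][0] if i + 1 < len(estimates) else end
        deviation_area += abs(value - actual) * (stop - start)
    total_area = actual * (end - estimates[0][0])
    if deviation_area == 0:
        return float("inf")
    return total_area / deviation_area
```

<p>An organization that always picks the lowest possible outcome produces ratios that stay at or below 1.0, while an estimate padded by two orders of magnitude yields a ratio of 100 and an EQF close to zero.</p>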
<p>The article concludes like this:</p>
<blockquote><p>We communicated our findings to the Standish Group, and Chairman Johnson replied: &#8220;All data and information in the Chaos reports and all Standish reports should be considered Standish opinion and the reader bears all risk in the use of this opinion.&#8221;</p>
<p>We fully support this disclaimer, which to our knowledge was never stated in the Chaos reports.</p></blockquote>
<p>That sums up the general tone of the article. Beyond the entertainment value (yes, this is the first time I have ever associated entertainment with scientific reading) brought by tearing apart the Chaos report, what I found most interesting was the accessible explanation of how theory was used to analyze the data. I highly recommend reading it if you subscribe to the magazine or have access to the IEEE Digital Library. Unfortunately, access to scientific publications remains restricted.</p>
]]></content:encoded>
			<wfw:commentRss>http://blog.lphuberdeau.com/wordpress/2010/02/bad-numbers/feed/</wfw:commentRss>
		<slash:comments>1</slash:comments>
		</item>
		<item>
		<title>CUSEC 2010</title>
		<link>http://blog.lphuberdeau.com/wordpress/2010/01/cusec-2010/</link>
		<comments>http://blog.lphuberdeau.com/wordpress/2010/01/cusec-2010/#comments</comments>
		<pubDate>Sun, 24 Jan 2010 02:58:45 +0000</pubDate>
		<dc:creator>Louis-Philippe Huberdeau</dc:creator>
				<category><![CDATA[General]]></category>

		<guid isPermaLink="false">http://blog.lphuberdeau.com/wordpress/?p=314</guid>
		<description><![CDATA[This year, I attended CUSEC for the 4th time, twice as an organizer. Even though the target audience is students and I graduated what feels like a really long time ago now, I still eagerly await the event every year. The conference isn&#8217;t really ever technically in depth; I see it as [...]]]></description>
			<content:encoded><![CDATA[<p>This year, I attended <a href="http://cusec.net/">CUSEC</a> for the 4th time, twice as an organizer. Even though the target audience is students and I graduated what feels like a really long time ago now, I still eagerly await the event every year. The conference isn&#8217;t really ever technically in depth; I see it as an opportunity to spot trends. Every single time, it seems to have a perfect mix. It tends to hook onto the new technologies and hypes. After all, the program <em>is</em> made up by students.</p>
<p>This year, I think everyone would agree that the highlight was <a href="http://pyre.third-bit.com/blog/">Greg Wilson</a>, with a very strong invitation to raise the bar and demand higher standards from software research. Very few of the commonly quoted studies are actually statistically relevant in any way. I had seen the session before at Dev Days in Toronto (it was around 90% identical), and I would see it again. Perhaps there would be even fewer <em>FIXME</em> notes in the slides. Greg is currently putting together a book on evidence in software engineering practice, to be published as part of the O&#8217;Reilly <em>Beautiful *</em> series. The book does not have a name yet, so I can&#8217;t pre-order it on Amazon, and that&#8217;s truly disappointing.</p>
<p>One of the lower-visibility sessions I found interesting was IBM&#8217;s David Turek on Blue Gene and scientific processing. Many skipped the session because it was given by a VP. Now, I don&#8217;t really care about scientific calculations. I believe they are important, but that&#8217;s not where my interests lie, and I am almost certain I will never use Blue Gene. However, I found it interesting to see how they tackle extremely large problems. Basically, the objective is to have supercomputers with 1000 times the computational capacity we have today by the end of the decade. Using current technologies, you would need a nuclear power plant to power one.</p>
<p>Finally, Thomas Ptacek&#8217;s session on security was mostly entertaining. It was one of those 3-hour sessions compressed into one hour. I don&#8217;t think I caught everything, but he went over common developer flaws and how simple omissions can take down an entire security strategy. He concluded with a very useful decision-making process: if your encryption strategy involves anything other than GPG and SSL, refactor. That&#8217;s one of the problems I always had with cryptography APIs: there are too many options, many of which are plain wrong and irresponsible to use. On the other hand, he was quite pessimistic during the question period, saying there is no hope of creating secure software with the current tools and technologies. All software ever made eventually had flaws found in it.</p>
<p>One of the most troubling moments of the conference for me was seeing how disconnected some people can be. I actually came across a software engineering student (not a freshman) who did not know what Twitter was. Not only did he not know, he had <em>never</em> heard of it. How is that even possible? I don&#8217;t use Twitter; I use <a href="http://status.net/">an open alternative</a>, and I&#8217;m not that much into microblogging. However, I do believe it has more or less reached the mainstream: you can hear the word while watching the news on TV. I really need to lower my assumptions about what people know.</p>
<p>Next conference for me will be <a href="http://confoo.ca/en">Confoo.ca</a>, where I will be presenting two sessions and struggling to choose which of 8 sessions to attend every hour for 3 days.</p>
]]></content:encoded>
			<wfw:commentRss>http://blog.lphuberdeau.com/wordpress/2010/01/cusec-2010/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Decision making</title>
		<link>http://blog.lphuberdeau.com/wordpress/2010/01/decision-making/</link>
		<comments>http://blog.lphuberdeau.com/wordpress/2010/01/decision-making/#comments</comments>
		<pubDate>Mon, 11 Jan 2010 00:48:28 +0000</pubDate>
		<dc:creator>Louis-Philippe Huberdeau</dc:creator>
				<category><![CDATA[Programming]]></category>

		<guid isPermaLink="false">http://blog.lphuberdeau.com/wordpress/?p=309</guid>
		<description><![CDATA[As part of the day-to-day work of a software developer, decisions have to be made every single day. Some have a minor impact and can be reverted at nearly no cost. Others have a significant impact on the project, and reverting them would be a fundamental change. I have found that, in most [...]]]></description>
			<content:encoded><![CDATA[<p>As part of the day-to-day work of a software developer, decisions have to be made every single day. Some have a minor impact and can be reverted at nearly no cost. Others have a significant impact on the project, and reverting them would be a fundamental change. I have found that, in most cases, not making a decision at all is a much better solution. A lot of time is wasted evaluating technology. Out of all the options available out there, there is a natural tendency to do everything possible to pick the best of the crop, the one that will offer the most to the project and provide the largest amount of features for future developments. While the reasoning sounds valid, it&#8217;s an attempt to predict the future, and it will most likely be wrong.</p>
<p>Of course, the project you are working on is great, and you truly believe it will be revolutionary. However, you&#8217;re not alone. Every day, thousands of other teams work on their own projects. Chances are they are not competitors but complements to yours, and one of them may well make the package you spent so much time selecting completely obsolete before you&#8217;ve used all those advanced features.</p>
<p>Too often, I see a failure to classify the type of decision that has to be made in projects. They are not all equal, and some deserve more time. In the end, it&#8217;s all about managing risks and contingencies. The very first step is to identify the real need and where the boundaries with your system lie. No one needs Sphinx. People need to search for content in their system. Sphinx is one option. You could also use Lucene, or even external engines if your data is public. What matters when integrating in this case is how the data will be indexed and how searches will be performed. When trying out new technology, most people begin with a prototype, which then evolves into production code. At that point, a critical error has been made: your application became dependent on the API.</p>
<p>If you begin by making clear that the objective is to index the content in your system, you can design boundaries that isolate the engine-specific aspects and leave a cohesive &#8212; neutral &#8212; language in your application.</p>
<p>Effectively, having such a division frees you from choosing between Sphinx, Lucene or something else. You can implement the one that makes sense for you today and be certain that the changes required to move to something else will be localized. With your application logic for extracting the data to be indexed, and for fetching and displaying results, left independent, the decision-making step becomes irrelevant.</p>
<p>Certainly, there is some overhead. Rather than simply fetching what the API wants, you need to convert the data to a neutral format and then convert that to the format the API expects. Some look at the additional layer and see a performance loss. In most cases when integrating with other systems, the <a href="http://c2.com/cgi/wiki?OneMoreLevelOfIndirection">additional level of indirection</a> really does not matter. You are about to call a remote system performing a complex operation over a network stack. If it weren&#8217;t complex, you would have written it yourself.</p>
<p>A common pitfall is to create an abstraction that is bound too closely to the implementation rather than to the real needs of the system. The abstraction must speak your system&#8217;s language and completely hide the implementation; otherwise, the layer serves no purpose. When designing the abstraction, it&#8217;s a good idea to look at multiple packages and see how they work conceptually. While you&#8217;re not going to implement all of them, looking at other options gives a different perspective and helps you adjust the level of abstraction.</p>
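<p>One way to keep the boundary in the application&#8217;s language is to let the adapter translate in both directions: the neutral query goes in, the engine&#8217;s query syntax is built inside the adapter, and the engine&#8217;s result format is mapped back to the application&#8217;s own type. In this sketch, <code>FakeEngineClient</code> and its <code>tag:value</code> syntax are entirely hypothetical and do not reflect the actual APIs of Sphinx or Lucene.</p>

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class SearchQuery:
    """What the application asks for, in its own vocabulary."""
    text: str
    tag: Optional[str] = None


@dataclass(frozen=True)
class SearchResult:
    doc_id: str
    title: str


class FakeEngineClient:
    """Hypothetical third-party client with its own query syntax and result shape."""

    def __init__(self, rows):
        self._rows = rows

    def run(self, query_string):
        terms = query_string.split()
        words = [t for t in terms if ":" not in t]
        filters = dict(t.split(":", 1) for t in terms if ":" in t)
        matches = []
        for row in self._rows:
            fields = row["fields"]
            title_ok = all(w in fields["title"].lower() for w in words)
            filters_ok = all(fields.get(k) == v for k, v in filters.items())
            if title_ok and filters_ok:
                matches.append(row)
        return {"matches": matches}


class EngineSearchAdapter:
    """Translates both ways so no engine detail leaks into the application."""

    def __init__(self, client):
        self._client = client

    def search(self, query: SearchQuery) -> list:
        # Neutral query -> engine-specific query string (hypothetical syntax).
        parts = query.text.lower().split()
        if query.tag:
            parts.append("tag:" + query.tag)
        raw = self._client.run(" ".join(parts))
        # Engine-specific result shape -> application type.
        return [SearchResult(doc_id=str(m["id"]), title=m["fields"]["title"]) for m in raw["matches"]]


rows = [
    {"id": 1, "fields": {"title": "Zend Framework tips", "tag": "php"}},
    {"id": 2, "fields": {"title": "Django tips", "tag": "python"}},
]
adapter = EngineSearchAdapter(FakeEngineClient(rows))
print(adapter.search(SearchQuery("tips", tag="php")))  # [SearchResult(doc_id='1', title='Zend Framework tips')]
```

<p>Had <code>search</code> accepted a raw query string or returned the <code>matches</code> dictionary as-is, every caller would have learned the engine&#8217;s syntax, and the layer would no longer hide anything.</p>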
<p>Once the abstraction is in place, the integration problem is out of the critical path. You can implement the simplest solution, knowing that it won&#8217;t scale as far as you will eventually need, because it lets you focus on more important aspects until that limit is reached. When it is, you will be able to reassess the situation and select a better option, knowing that the changes will be localized to an adapter.</p>
<p>Abstracting away external systems is a good design practice, and it can be applied to almost any situation. It keeps your code clean and breaks dependencies on external systems that would otherwise make the environment hard to set up and reduce testability. Because the code is isolated, it leaves room for experimentation with a safety net. If the chosen technology proves to be unstable or a poor performer, you can always switch to something else.</p>
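<p>The testability point can be sketched with a test double. Because the hypothetical <code>ArticleService</code> depends only on the boundary, a unit test can hand it a recording fake instead of a running search server. All names are illustrative; the sketch uses only Python&#8217;s standard <code>unittest</code> module and <code>typing.Protocol</code>.</p>

```python
import unittest
from typing import Protocol


class SearchIndex(Protocol):
    def index(self, doc_id: str, text: str) -> None: ...
    def search(self, text: str) -> list: ...


class ArticleService:
    """Hypothetical application service; it depends only on the boundary."""

    def __init__(self, search_index: SearchIndex):
        self._search = search_index

    def publish(self, article_id: int, title: str, body: str) -> None:
        # Persisting the article is omitted; only the indexing call is shown.
        self._search.index(str(article_id), title + "\n" + body)


class RecordingIndex:
    """Test double: records calls instead of talking to a real engine."""

    def __init__(self):
        self.indexed = []

    def index(self, doc_id, text):
        self.indexed.append((doc_id, text))

    def search(self, text):
        return [doc_id for doc_id, body in self.indexed if text in body]


class PublishTest(unittest.TestCase):
    def test_published_article_is_indexed(self):
        fake = RecordingIndex()
        ArticleService(fake).publish(42, "Decision making", "Defer what you can.")
        self.assertEqual(fake.indexed, [("42", "Decision making\nDefer what you can.")])
        self.assertEqual(fake.search("Defer"), ["42"])
```

<p>Running it with <code>python -m unittest</code> pointed at the file needs no search engine installed or configured.</p>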
<p>While abstraction works in most cases, it certainly does not work for some fundamental decisions, like the implementation language, unless you plan on <a href="http://www.codinghorror.com/blog/archives/000679.html">writing your own language that compiles to other languages</a>. Some abstractions just don&#8217;t make sense.</p>
<p>When you can&#8217;t defer decision making, stick with what you know. Sure, you might want to try that new framework in the cool new language, but the core of your project, if you expect it to live, is no place to experiment. I have been using PHP for nearly a decade now, and I&#8217;ve learned all the subtleties of the language. It is a better choice for me. I&#8217;ve used the Zend Framework on a few projects and know my way around it well enough. It&#8217;s a good solution for me. Together, they are a much safer path than Python/Django or any alternative, no matter what Slashdot may say.</p>
<p>It might not sound like a good thing to say, but experimenting as part of projects is important. You can&#8217;t test a technology well enough unless you use it in a real project, and a project is unlikely to be real unless it&#8217;s part of your job. What matters is isolating experiments to less critical aspects. It&#8217;s the responsible thing to do.</p>
<p>It&#8217;s all about risk management. Make sure every decision you make is either irrelevant, because it can be reverted at a low cost, or relies on technologies you trust from past experience, and you will avoid bad surprises.</p>
]]></content:encoded>
			<wfw:commentRss>http://blog.lphuberdeau.com/wordpress/2010/01/decision-making/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
	</channel>
</rss>
