From junior to senior

It came to me as a realization in the last few days: there is a huge gap between juniors and seniors as programmers. It’s commonly said that there is a 1:20 productivity variation in our profession, but how much of it is really associated with experience? That alone can account for an order of magnitude. It’s not a matter of not being smart. Juniors just waste time and don’t realize it. There are multiple reasons for it, but most of them come down to not asking for help.

The specs are unclear, but they don’t ask questions. Specs are always unclear. Just discussing the issue with anyone can help clarify what has to be done. A junior will either sit idle and wait until he figures it out on his own, or pick a random track and waste hours on it. In most cases, there is no right answer. It’s just a judgment call that can only be made by someone who knows the application through and through, understands the domain and the use cases. This takes years to learn, so just ask someone who knows.

They don’t have the knowledge necessary to get over a hump. The natural response when they don’t understand a piece of code is to search Google for an entire day, even though someone a shout away could have provided the answer within 45 seconds. Sure, interruptions are bad and learning by yourself is a good skill to have, but wasting that amount of time is never a good deal. It takes a while to become comfortable with all the technologies involved in a non-trivial project.

They spend hours on a problem when they could have solved more important ones in less time. Prioritizing work is probably the most important aspect of being productive. Especially when you work on an old application that has countless problems, at the end of the day, you still need to get your objectives done. At some point as a programmer, you need to trust “management” or “technical direction” that the tasks that were assigned are probably the ones that bring the most value to the project, regardless of what you stumble across along the way.

All of this can be solved fairly easily. Before you begin a task, figure out what the real objectives are, how much time you think it’s going to take and how much time those who assigned it think it’s going to take. Unless they are technical, most managers have no clue how long something is going to take, but aligning expectations is key to successful projects. If you think it’s a one-week effort and they thought you would only need to spend 2 hours on it, it’s probably better to set the record straight and not begin at all.

Even when money is not involved, whether you are working for a governmental organization or volunteering on an open source project, time remains valuable. Not all bugs are equal. Not everything is worth spending time on, and what should really be judged is the impact of the effort.

Even though most HR departments turned the concept of a time sheet into a joke by forcing all employees to report 40 hours of work per week, a real, detailed time sheet with tasks and how long they took to perform is a great tool for developers to improve their efficiency. Was all that time really worth spending?

At the end of each day, it’s a good thing to look back at what you worked on and ask yourself if it was really what you set out to do.

Once you’re halfway through the allocated time, ask yourself if you’re still working on the real objectives. If you’re not, the solution is obvious: get back to it. If you’re still on the objective, but feel you are circling around, how about asking for help?

Once you’re past the allocated time, consider cutting your losses. Ask around. Maybe the request was just a nice-to-have. It’s really not worth spending too much time on. It may be more economical to assign it to someone else. Just report on progress and expectations; it allows the project direction to re-align and offer support. There is nothing wrong with admitting failure. It just saves time and money in most cases.

Finding the appropriate words

Programming is easy. Once you know the constructs and the patterns, it really ends up being a mechanical task. Add a little discipline to catch your own mistakes and you’re on the right track to deliver working software. I find that one of the hardest tasks is to find the appropriate words to use. Writing code is mostly about creating abstractions and metaphors. Using the right words leads to code that is easy to read, understand and predict. Using the wrong ones leads to confusion and a whole stack of problems.

I don’t usually start writing code until I have found suitable words to use. There is something magical about nailing the different concepts involved: responsibilities become clear and all the issues untangle themselves. Code is so much simpler and faster to write when you don’t have to deal with exceptional conditions caused by mixed logic. Good class names set expectations for what the class actually does.

The analogy brought by the word is very important. Words are not context-exclusive. The other concepts they relate to in a different context are just as important. They provide the analogy and the expectations. The vocabulary surrounding the concept will organize the software. Will it make for the most efficient software? It might not, but it will be understandable. Last time I checked, code was meant to be read by humans, or we’d still be writing assembly.

Think about database, knowledge base, data storage and data warehouse. They all pretty much serve the same purpose, except that I don’t expect the same operations to be performed on a database and on a data storage.
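
To make the point concrete, here is a tiny, purely hypothetical PHP illustration (the interface names are mine, not from any real project) of how the word alone changes what a reader expects to be able to call:

    <?php
    // A "storage" suggests little more than putting things in and getting them back.
    interface DataStorage
    {
        public function store(string $key, string $value): void;
        public function retrieve(string $key): ?string;
    }

    // A "database" sets richer expectations: querying, transactions, deletion.
    interface Database
    {
        public function query(string $sql, array $params = []): array;
        public function beginTransaction(): void;
        public function commit(): void;
        public function delete(string $table, array $criteria): int;
    }

Neither is better than the other; the point is that the name alone tells the reader which set of operations to expect.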

This has in fact been one of my pet peeves for a long time. Business people use the word database for everything. If their grocery list was in a Word document, it would probably be a database too. Talking to them is absurdly confusing. However, I figured out that the reason they keep using it is that it’s really all the same thing to them, and they understand each other around this word. Go figure how.

To software engineers, the word database has a much more precise meaning. The terminology surrounding data storage and manipulation is much richer. We simply don’t speak the same language. It may seem worse because we juggle a lot of concepts, but I learned that all professions have similar issues. Telling an oncologist you have cancer is probably just as vague and meaningless.

This brings us back to word selection and speaking to the appropriate audience. Basic object-oriented courses teach Domain-Driven Design: take the concepts in your domain and model your software around them. It makes a lot of sense at a high level and helps communication between the development team and the stakeholders. However, I have the feeling doing so restricts the software and prevents you from making generic solutions. If you focus too much on a single use case, you don’t see the generic aspects and probably add complexity to the software to push in concepts that just don’t fit.

I see nothing wrong with using different words internally and externally. Code is to be understood by programmers, the front-end by the users. If you build a generic solution internally with words understood by programmers, adapting the front-end to a particular context is mostly about changing strings. If your application is domain-specific all the way through, adapting to a different context means either rewriting, or asking someone to translate between two different domains that don’t match at all. I bet large consulting firms love domain-driven design because it allows them to rent a lot of staff for extended periods.

Anyway, good words don’t magically come up on a late evening when you need them in time. Good design requires time to draw itself, for the appropriate words to be found. If you aim for elegance in your software, you will very likely need to drop the deadlines, at least until all crucial blocks are in place. The only way I know to find the right words is to switch between contexts. Focusing too hard on finding the right word is a bit like chasing your own tail: you won’t get it, because the focus stays on what you already see. The right word is out there. I always keep books with me when I need to design software. They bring me back to a larger picture, and they contain many words too.

Too much focus on static views

When it comes to software design, university programs put a lot of emphasis on UML. UML itself was a great effort to standardize the notation and representation of software concepts at a time when it was much needed. If you pick up software design books from 1995 or earlier, you will notice instantly how needed unification was. Today, UML is omnipresent and this is a good thing. It’s a great way to document and explain software. Including all diagram types and special syntaxes, it’s a great tool for any software developer to master.

My largest problem with UML is not UML itself, it’s the people who swear by it and confuse diagrams with design. They build pretty class diagrams and believe they now have an architecture for their system. The belief is that all that needs to be done is to generate the stubs, throw them at any code monkey to fill in the loosely defined methods, compile the app, and you’re done. This is certainly better than the MDD belief that you can compile your diagram and the app will work. However, it’s still wrong.

From everything I have seen, this sort of Diagram Driven Design simply fails. The fact that the diagram is pretty, shows little coupling and high cohesion does not mean that the software does what it has to in the first place. A static view will never be enough to represent a program, because programs are not about their static aspects; their meaning comes from execution. The difference between a good design and a bad one is how it executes: how it initializes, binds with the problem to solve and interacts with the other components. Class diagrams can’t represent this. It has nothing to do with how it looks when printed on paper and how the boxes are arranged in a way that prevents lines from crossing.

Static views focus on conceptual purity. Methods get added because they make sense, not because they are needed. Or maybe they were needed in the head of the designer, but not in the head of the one implementing. Different people have different preferences in how they implement and get their code to work, and if the designer did not communicate his ideas fully through dynamic views, the static view is just a piece of junk. On the other hand, if he were to build sequence diagrams for everything, he might as well write the code himself. I have yet to see a diagram tool that is faster to use than vim.

There are better ways to use a system architect than to ask him to draw diagrams. Get him to write the concepts and general directions in the most concise manner possible and let the implementer handle the low level design the way he prefers it. Results will come in faster, less waste will be generated and your architect will be able to do more, maybe even write code himself to make sure he stays grounded.

From the moment the important concepts and responsibilities involved are identified, a test-first approach will produce much better code. In fact, tests are not that important. Any implementation approach that places the focus on the facade of the components will produce better results. It’s just that when you work this way, tests are almost free, so why not write them. A well-encapsulated component will be easy to integrate in a larger application, so developing it in isolation with tests is not much more trouble and there are additional benefits. Focus on how to use and encapsulate the code, not on how pretty it looks in a diagram. It will very likely be shorter, more efficient and contain less dead code.
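
As a rough sketch of what focusing on the facade can look like (the class names are invented for the example, and PHPUnit is just one way to do it), the test below is written before the implementation exists, which forces a decision about how the component will be used:

    <?php
    use PHPUnit\Framework\TestCase;

    // Written first, against the facade only: it says nothing about how the
    // report is built internally, only how callers will use the component.
    class ReportBuilderTest extends TestCase
    {
        public function testAnEmptyListOfEntriesGivesAnEmptyReport(): void
        {
            $builder = new ReportBuilder();

            $report = $builder->build([]);

            $this->assertSame([], $report->getLines());
            $this->assertSame(0, $report->getTotal());
        }
    }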

At the other end of the spectrum, some people focus too much on the low-level details. A little while back, I saw a presentation on meta-programming. I must say that from a geek point of view, writing programs to rewrite your programs is just cool. However, I can’t find a valuable use for it. I have serious doubts about the type of changes you can make from a low-level static representation of your application. Examples included removing dead code. Sure, it sounds like a good idea, but dead code has little to do with static aspects, especially in dynamic languages. Another use was rewriting code in a different language. Sure, it can work if what you’re writing is a sort algorithm, but if you’re merely calling one, it would be pure luck if it were called the same way in two different languages. This simply does not scale beyond the academic level. Code is rarely independent from the platform.

Sure, the static views remain important. I for one spend a lot of time formatting my code and reorganizing it to be more readable. This goes well beyond just following coding standards; it’s part of making the code self-documenting by making sure details don’t get tangled with business logic. I occasionally use diagrams to document my work (however I rarely — if ever — use class diagrams). In the end, it’s all about knowing who the artifact you’re building is aimed at. Code is always targeted at humans. If it was not, we’d still be writing assembly and be happy with it. Then there is a distinction between writing the front-facing interfaces of a component and its internals. A poor design can be recognized by the amount of internal details bleeding through the facade, the same way poor user documentation spends too much time explaining technical details.

Environment management

The saddest thing in the software industry is to see a project run flawlessly, then completely fail at deployment time. Of course, there is the worst-case scenario of a team that has never seen a well-run deployment and has no idea it can be done without a sleepless night or two. For a deployment or release to work well, it has to be considered from day 0 and be part of the development process.

Version control is now used by everyone serious in the software world. However, people are quick to check software configuration management (SCM) off the list of software engineering best practices as soon as they see a Subversion repository in place. SCM is not called version control for a reason: it goes well beyond it.

Deployment should be a simple task, but just putting the new production code in place is unlikely to do it on a non-academic project. The typical PHP application will connect to a database and require some directories to be created and assigned the appropriate permissions. New dependencies may have been introduced, whether they are shared libraries or extensions to the language. Covering all these bases is also part of SCM.

The big problem is that a programmer in the middle of a brand new feature will modify the database schema, create folders, change permissions and get everything working on their local copy. Then they lose track of it. In most cases, all developers will share a common database because it’s way too hard to handle those merges. When updating the code base, they will find the permission problem if they even bother using the new feature, fix it, then forget about it.

This kind of behavior is what causes deployments to be hard.

The environment in which the application runs has to be managed. The first step is to make the development process itself stricter. Force everyone onto their own copy of the database; soon enough, good database schema practices will be in place. Make sure there is a script containing all the changes to the environment. Ideally, updating the code base should be about running a single script, with all dependencies resolved without anyone having to know about them.
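
As an illustration of what such a script might look like (the directory names are examples, not taken from any actual project), here is a minimal PHP sketch:

    <?php
    // update.php -- a hypothetical single entry point, run after every code update.
    $directories = ['temp/cache', 'temp/templates_c', 'files'];

    foreach ($directories as $dir) {
        if (!is_dir($dir)) {
            mkdir($dir, 0775, true);    // create missing directories recursively
            echo "Created $dir\n";
        }
        if (!is_writable($dir)) {
            chmod($dir, 0775);          // fix the permissions someone forgot about
        }
    }

    // Database schema patches would be applied here as well (see the next section).
    echo "Environment is up to date.\n";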

If you can easily create, replicate and update development environments, you are on the right track for an easy deployment. When the time comes for a production upgrade, a sanity check can be made by restoring a production backup onto a similar (ideally identical) environment and trying out the update. Then you have enough confidence to do it in production within a few minutes, with everyone on the team keeping their hair and their smiles.

The rule is simple: never perform an action you could automate in a script. When it comes to deployment, you want to keep human intervention to a minimum. Humans make errors, machines rarely do. It’s just a matter of fine-tuning the scripts to handle all the exceptional cases. Over time, they become rock solid.

Database schema

I know two ways to handle database schema updates in a sane way and use both on different projects. The first one is simple and guaranteed to work in all cases, the second one is a little more complex and can potentially have problems, but can handle much more complex cases too.

The naive way is to simply version the schema. You need a place to store the current schema version in the database and write version increment scripts whenever you need a modification to the database schema. The update process is about looking at the version currently in the database, looking at the version used by the code, and running all the increment scripts in between. Because all updates are always executed in the same order, you know the final schema will always be the same.
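
A minimal sketch of that approach, assuming a single-row schema_version table and increment scripts indexed by the version they upgrade to (both are illustrative choices, not a prescription):

    <?php
    function update_schema(PDO $db, array $incrementScripts): void
    {
        // Current version stored in the database, e.g. 12.
        $current = (int) $db->query('SELECT version FROM schema_version')->fetchColumn();

        // Run every increment script between the database version and the code version.
        for ($version = $current + 1; isset($incrementScripts[$version]); $version++) {
            $db->exec($incrementScripts[$version]);
            $db->exec('UPDATE schema_version SET version = ' . $version);
        }
    }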

However, this strategy will cause a problem if your development cycle includes branches. Two branches cannot increment the version number at the same time or it will break. It’s unlikely to cause problems in production, as branches will have been merged and conflicts resolved, but it can waste time in development.

This is why there is a second way to handle database updates, which I read on a blog a long time ago and, with my excellent source management practices, promptly lost. The idea is to store the list of installed patches rather than a simple version number. The update process becomes fetching the list of installed patches, comparing it with those contained in the code base, and installing the missing ones. Of course, problems could be introduced if patches are not installed in the same order, like columns not ending up in the same order. I mitigate this risk by including a timestamp in the patch name and ordering the install queue by it, but with branching, the problem remains.
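
A sketch of the patch-list idea, assuming patches are plain SQL files with a timestamp prefix and a schema_patches table keeps track of what was installed (again, the names are only illustrative):

    <?php
    function apply_missing_patches(PDO $db, string $patchDir): void
    {
        $installed = $db->query('SELECT patch_name FROM schema_patches')
                        ->fetchAll(PDO::FETCH_COLUMN);

        // Patches are files such as 20090315_add_index_on_users.sql
        $available = array_map('basename', glob($patchDir . '/*.sql'));
        $missing = array_diff($available, $installed);
        sort($missing);    // the timestamp prefix keeps the install order stable

        foreach ($missing as $patch) {
            $db->exec(file_get_contents($patchDir . '/' . $patch));
            $db->prepare('INSERT INTO schema_patches (patch_name) VALUES (?)')
               ->execute([$patch]);
        }
    }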

I implemented this technique for TikiWiki at the end of last summer and it has worked great ever since. The code had to have a few peculiarities to handle legacy and respect previous environments, but it is available under the LGPL, so feel free to rip it out.

I always plan for arbitrary (non-SQL) code execution before and after each patch. This is mostly to handle theoretical cases where data conversion would be required and could not be done in plain SQL; the example I always have in mind is converting poorly implemented hierarchies into nested sets. Just having the option to safely refactor the database without worrying about data conversion after the patch is a nice thing.
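
One way to sketch those hooks, assuming an optional _pre.php and _post.php script next to each SQL patch (the naming convention is mine, not TikiWiki’s actual one):

    <?php
    function run_patch_with_hooks(PDO $db, string $sqlFile): void
    {
        $base = preg_replace('/\.sql$/', '', $sqlFile);

        if (file_exists($base . '_pre.php')) {
            include $base . '_pre.php';     // e.g. snapshot the old hierarchy table
        }

        $db->exec(file_get_contents($sqlFile));

        if (file_exists($base . '_post.php')) {
            include $base . '_post.php';    // e.g. rebuild the data as nested sets
        }
    }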

Some propose having uninstall patches to undo the change. While I can see the value in them, especially in testing when trying to identify where a problem was introduced using binary search on the repository history, I tend to think it’s just one more place you can introduce problems. If rebuilding the entire database from scratch for every test is not too expensive, I don’t think it’s needed. Of course, the rules of the game always change when data quantities increase.

External dependencies

The easiest way to resolve the external dependency problem for libraries is to bundle them. If they get updated with the code base, it’s one less problem to deal with. However, this may not always be possible due to licensing constraints. In this case, they have to be handled externally, which means higher privileges are likely to be required.

You can either write your script in a way that requires superuser privileges, running sudo and prompting for a password once in a while, or just refuse to install if the environment is incorrect: perform a set of checks before running any part of the update and list the steps to take before moving along with it. Both techniques are fine, but this is unlikely to be a technical consideration. It’s only a matter of how draconian your sysadmins are.
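
A sketch of the “refuse to install” variant, with checks that are only examples of what a project might verify:

    <?php
    $problems = [];

    if (!extension_loaded('gd')) {
        $problems[] = 'The PHP extension "gd" is required; install it before updating.';
    }
    if (!is_writable('temp')) {
        $problems[] = 'The temp/ directory must be writable by the web server.';
    }
    if (version_compare(PHP_VERSION, '5.2.0', '<')) {
        $problems[] = 'PHP 5.2 or later is required.';
    }

    if ($problems) {
        echo "Cannot update. Fix the following first:\n - "
            . implode("\n - ", $problems) . "\n";
        exit(1);
    }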

Physical environment

The two techniques used for database schema upgrades can certainly be applied to the management of the physical environment. Replace the version stored in the database with a file containing the version, or log all the installed patches in a file. The important part is that what you compare against lives on the same medium. If your database server crashes and you need to restore a backup dating from before the update, it should be able to update itself even if the environment is already up to date. On the other hand, if the entire server crashes, it should be able to rebuild the environment even if the database is up to date. If you store the environment version information in the database, that just won’t happen.
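
For the file-system side, a brief sketch of keeping the patch log on the server itself, so a restored machine can figure out on its own what it is missing (the helper passed in and the file name are hypothetical):

    <?php
    function apply_environment_patches(array $available, callable $apply, string $logFile): void
    {
        $installed = file_exists($logFile)
            ? file($logFile, FILE_IGNORE_NEW_LINES | FILE_SKIP_EMPTY_LINES)
            : [];

        foreach (array_diff($available, $installed) as $patch) {
            $apply($patch);                                        // create a folder, chmod, ...
            file_put_contents($logFile, $patch . "\n", FILE_APPEND);
        }
    }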

Generally, using the same mechanisms in development as those used for migrations is the key to successful deployment. Scripts that are rarely used tend to rot; if the development team uses them often, they will work. Consider having policies to get everyone to work from a fresh copy of the production database every week to make sure the updates always work. Sure, there are some privacy concerns, but anonymizing data can be automated too. If the data quantities are extremely large and loading a copy of the production database takes hours, better tools can probably be used, but it can probably still be done automatically over the weekend while the development servers are idle.

Convergence in the field

During the PHP Quebec Conference, I had one of those weird feelings, as if my profession had changed overnight, in a good way. It happened while I was talking to Eric David Benari close to the coffee machines, at a time when we both should have been sitting in another room listening to a talk. Earlier that day, he had given an introductory talk about project management. He was surprised by how smart the audience was after one of his barometer questions. The question went a little like this:

Jeff is a hero in the company. He often pulls all-nighters, works on weekends and always saves the day before releases. Rita works 9-to-5. Considering they both do the same amount of work, who should be getting the largest bonus at the end of the year?

I will leave it to you to determine what the answer was. Eric David’s surprise was that, for the first time, he saw the audience completely divided, instead of being skewed towards one end.

I myself saw a difference after my presentation on software estimation, which I got scheduled to do only a few hours before the conference to fill a canceled slot. I first gave that presentation in 2006. At the time, most people thought the idea was weird. While I hope some decided to try it out, I doubt that was the case for the entire audience. This time, I wanted to make it more interactive, and most figured out I wasn’t so prepared, but a few came to talk to me after the presentation to tell me how they handled empirical data and to get some advice on their problems. For the first time, I was not alone in the room using valuable estimation practices.

Sure, I have not been alone in this world doing it. I mostly learned from others’ experience. The surprising part is that we are no longer isolated. Have we reached a critical mass? Are there enough software engineering practitioners out there to really make a difference and apply the principles? For a long time, the field was founded and promoted by a few very bright individuals struggling to get their message through. Today, I see their message as finally spreading.

Every PHP conference I attend has a few sessions on either agile or best practices in some way. These sessions are not the least popular. Even technical crowds are moving their focus away from code. It really wasn’t the case back in 2003. Even some apparently technical talks are in fact process talks in disguise. Sebastian Bergmann‘s classics on PHPUnit are good examples.

How did it happen? Joel Spolsky mentioned that it was a bit useless to write about best practices because the programmers who need them are the least likely to read about them. It seems that, while they don’t go out to read books and articles, there is only so much that can be ignored. Blogs made the best practices omnipresent. Podcasts brought them to the lazy ones. They reformulated the tried and true techniques in a way others can understand. Even though not everyone is reading or listening, I have the feeling communication got to a point where every company out there can now have an evangelist.

Could time alone simply have made a difference? I’m part of a brand new generation of software developers. I was born after all the problems had been recognized. My training was focused on others’ mistakes so I don’t have to make them all over again. Every year, a few hundred more software engineers graduate with better training, knowing the best practices and development processes, not only the coding part of the equation.

Could it be the different crowd? Technical conferences attract younger people in general, people who are usually down in the trenches. Managers are rarely around. Most of what I read these days seems to have a common theme. The theme was present even in books written over 15 years ago, but it’s now getting louder. No matter what you try to do, only one thing will make a difference in the end: commitment to quality work. No methodology will work unless the team agrees with it and embraces it. No estimation technique will work unless those who estimate take it seriously. TDD will fail if developers barely attempt to reach code coverage standards. Robert D. Austin explained this phenomenon a long time ago. Developers know this instinctively because they can see the difference between real and fake. How long will it take until organizations realize it?

The good part is that if you have a good team, almost anything will work, even if you do it all wrong according to the books. There are tons of methodologies out there. Agile alone has a dozen, and there are even more unpublished variants. They all worked at some point; most of them probably failed to repeat the success with a different team. Formal methods are being laughed at these days, but I have no doubt they did work at the time, in the context they were created for. However, simply taking them as a set of rules and enforcing them on other people is bound to fail.

Pushing it out the door

Parkinson’s law says that work expands to fill the time allocated to it. Too often, this is abused by managers to squeeze developers into a short and unrealistic timeline for the project. While they often abuse it, the foundation is still right. Given too much time and no pressure, some developers will create great abstractions and attempt to solve all the problems of the universe. Objectives are always good to have. Allocating a limited, but reasonable, amount of time for a project is a good way to ensure no gold plating happens, while still allowing for sufficient quality in the context.

A reasonable amount of time may be hard to define. It requires a new skill for developers: estimation. Done on an individual basis, estimation can be used by developers as a personal objective to attain, but it can also be part of a greater plan towards self-improvement. The practice of up-front estimation has huge benefits in the long term, even if the estimates are far off target. When a task is completed with a huge variation, it triggers awareness. What went wrong? Constantly answering that question and making an effort to resolve the issues will lead to a better process, higher-quality estimates and less stress in accomplishing tasks.
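
As a toy illustration of the bookkeeping I mean (this is not Humphrey’s PSP, just the minimal habit of comparing estimates to actuals and calibrating against the history):

    <?php
    $history = [
        ['task' => 'Import filter', 'estimate' => 4.0, 'actual' => 7.5],
        ['task' => 'Login rework',  'estimate' => 2.0, 'actual' => 2.5],
        ['task' => 'Report export', 'estimate' => 6.0, 'actual' => 5.0],
    ];

    $ratios = [];
    foreach ($history as $task) {
        $ratio = $task['actual'] / $task['estimate'];
        $ratios[] = $ratio;
        if ($ratio > 1.5 || $ratio < 0.67) {
            // Large variations are the interesting ones: what went wrong (or right)?
            echo "{$task['task']}: actual was " . round($ratio, 2) . "x the estimate\n";
        }
    }

    // The average ratio becomes a correction factor for the next gut-feeling estimate.
    $correction = array_sum($ratios) / count($ratios);
    echo 'A raw estimate of 3h becomes ' . round(3 * $correction, 1) . "h\n";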

A long time ago, after reading Watts Humphrey’s Personal Software Process (PSP), I became convinced of the value of estimation as part of my work. In Dreaming in Code, Scott Rosenberg reflects on Humphrey’s technique:

Humphrey’s success stood on two principles: Plans were mandatory. And plans had to be realistic. They had to be “bottom-up”, derived from the experience and knowledge of the programmers who would commit to meeting them, rather than “top-down”, imposed by executive fiat or marketing wish.

A few initial attempts in 2006 gave me confidence that high-precision estimates were possible and not so hard to attain. However, when my work situation changed, I realized that the different projects I was working on did not have the same quality constraints. This led to splitting up my Excel sheets in multiple ways. The task of estimating became so tedious that I eventually dropped all tools. Not because I was not satisfied with the results I obtained, but because of the time it took me to get to them. I reverted to paper estimates and my gut feeling of scale. Still, the simple fact of performing analysis, design and rough measurements gave me significant precision. Not everything was lost.

[Screenshot: Estimation Interface]

However, one thing I did lose was traceability. Paper gets buried under more paper, or lost altogether. Personal notes are not always clear enough to be understood in the future. I no longer had access to my historical data. I wanted my spreadsheet back, but couldn’t bear having to organize it. Over a year ago, searching for a reason to try out new things, I started a personal project to build a tool that would satisfy my needs for organization and simplicity. A few features were crucial to me:

  1. It had to make it easy to filter data to find the parts relevant to the task at hand
  2. It had to be flexible enough to allow me to try out new estimation techniques
  3. It had to be quick and fun to use, otherwise it would just be another spreadsheet

I achieved a first usable version over the summer, working on it in my spare time, and gave it a test run in the following months. It was not good enough. Too linear. Too static. It did not accomplish what I needed and I found myself reverting to paper all over again. What a terrible failure.

[Screenshot: Spatial Editor]

A few months later, I figured I had to make it paper-like and gave it a little more effort. After a dozen hours sitting in airports over the last two weeks, I think I have finally documented my work well enough for others to understand. Sadly, even if the application is somewhat intuitive, the prerequisite skills required to perform estimation are not.

Today, I announce the first public beta release of TaskEstimation.com, a tool aimed at developers who want to estimate their work on a per-task basis and work towards self-improvement. Don’t be confused: this is not built for project management. While it probably is flexible enough for it, any project manager using it should have his own self-improvement in mind. Feedback is welcome on both the application and the documentation. I expect the latter to be lacking details, so feel free to ask questions.

Reuse… or not

When working on a project, there are hundreds of possibilities for reusing existing components. At first sight, some of them seem to do all we need, or to do it partially. However, we rarely know the details. Without looking at the code, there is no way to know how it does things or what the priorities were. In the best cases, good documentation with usage samples will be available (API documentation is useless in this case). We don’t know how flexible it is or what the limitations are. There is also the whole question of quality. There is nothing more annoying than reusing code only to realize it’s full of bugs, then go through the code and realize the documentation was wrong.

The question really is whether to reuse or to rewrite. Writing it ourselves is a very safe route without too many surprises, but it’s a long one. When you write it yourself, you know the details. The implementation is usually light and tailored to your needs. It’s easy to extend in the future, because you know the details and the vision. Choosing the reuse route is accepting a package deal. The implementation does what it does. Evaluation will tell you it does what you need. It will also come with a whole lot of elements you don’t really need. That may end up being good, because you may need them in the future, but in the short term, it clutters your vision. Reusable code weighs a lot more than custom-built code. It contains more validations and verifications to deal with all sorts of environments and to produce better errors for the implementor (if you produce a reusable library that is “lightweight” and does not contain all of this stuff, what you produced is a custom piece of code you decided to give out). All this code has a value. When reusing code, you have peace of mind. If you migrate to a different platform, or browser technology evolves, you won’t have to deal with the problems yourself. A simple update will do (never consider reusing an unmaintained component).

However, when you develop your software and need something that does not quite fit with what the component does, you’re in trouble. You can work around the problem, which is likely to be inefficient, or you can fork the library and deal with the trouble of updating it (this only works with Open Source, but you want to use Open Source to have this option). The latter is completely undesirable. The ideal solution would be to contact the authors and see if you can get your modifications, or at least the required hooks, to be part of the core. Good solution, but they might refuse, and you still need to go through their code and understand it.

Reuse works great as long as you don’t push the limits. Low initial cost, high cost to maintain if you go beyond what it does.

To summarize:

  • Writing a quick solution can be faster than reusing an existing component
  • The quick solution will be easier to extend
  • The quick solution will require a lot of maintenance if the environment changes
  • Reuse makes migration easy
  • Reuse will require more effort if you need to push the limits

The worst thing you can do is to decide to write it, publish it and maintain it. Down the road, you will have a clone of the solution you discarded for being “too heavy”.

So, if you want to reuse a component, the one thing you want to avoid is having to go beyond its limits, or you must expect those limits to be pushed further away by the time you need it. It basically means you know the library is maintained, will remain maintained in the future, and has a good vision of where it’s going. That is a lot of trust required. Why do I prefer Zend Framework over CakePHP or CodeIgniter? Mostly because I met with the project leaders early on in the project. I discussed with them and figured out the project could be trusted. I put my trust in the people behind it. Problem is, not everyone gets to travel around and meet project leaders (not that it’s hard, just that most don’t think they can). So if you can’t have a face-to-face discussion with the project leaders (which may give a positive bias anyway), what can you rely on? You want to make sure there is a community of people invested in the component and that the project will survive. One-man shows working on hobby projects don’t really qualify.

Reusing a component adds weight to your project management. Dependencies have to be maintained. You need to monitor releases and security advisories. They may call for an update that will require a few minutes or a few hours of work, and that call may not fit in your project’s schedule. You need to trust the project to be responsible about this too. If they reuse components themselves, do they follow the security advisories, or will you have to do that too?

One thing is for sure: when you decide to reuse a component, you need to abstract it away. There are two reasons for this. One is self-defense, so you can change to something else without too much effort if required. The other is more subtle and often forgotten: to preserve the aesthetics of your project. Each program is developed with a certain style. In a good program, you can guess how functions will be named and how objects need to be interacted with. The high-quality components you reuse are probably like that too, but it’s not the same style. Reusing without abstracting will create something that looks like patchwork, even when done by talented people.
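
As a minimal sketch of what that abstraction can look like (the interface follows the project’s own vocabulary, and the wrapped third-party cache is hypothetical, so the save()/load() calls stand in for whatever the real library exposes):

    <?php
    // The rest of the project only ever sees this interface, named in its own style.
    interface Project_Cache
    {
        public function remember(string $key, $value, int $minutes): void;
        public function recall(string $key);
    }

    // The adapter is the only place that knows which third-party library is used.
    class Project_Cache_ThirdParty implements Project_Cache
    {
        private $backend;

        public function __construct($thirdPartyCache)
        {
            $this->backend = $thirdPartyCache;
        }

        public function remember(string $key, $value, int $minutes): void
        {
            $this->backend->save($value, $key, $minutes * 60);   // translate the vocabulary
        }

        public function recall(string $key)
        {
            return $this->backend->load($key);
        }
    }

Swapping the library then means writing one new adapter, not chasing calls all over the code base.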

These are the rational aspects of code reuse. There are also the irrational ones, which mostly cause reuse to be rejected:

  • Misunderstanding
  • Wrong expectations
  • Focus on other aspects

I’ve seen people discard solutions because they had high hopes for them, and I probably did it in the past too. The laziness that makes great programmers can turn bad. When looking for a reusable solution to their problem, they hope that it will do everything they need, a bit like the VB expectation of RunTheProgramINeedForAssignmentNumber6(). Sometimes you need to accept that the library won’t do everything, and that making a feature request for it would be ridiculous. If you want to reuse a data storage layer, it may not do the remote synchronization you need, and it’s right not to do it, but you should not discard the storage layer because it does not do everything you need.

Quality has a price

Software quality is a bit like world peace. Everyone wants it, but when issues get close to you, there are always excuses. Schedule pressure, complexity, beliefs, blaming it on someone else, you name it. Small deviations in the course of action to handle exceptional situations may not seem to matter, but they do. In the end, you get what you pay for and you can’t really expect to have quality software without paying attention to quality.

The basic theory is really simple. Defects get injected into the software as part of the development effort. They are caused by human mistakes and oversights, and that won’t change any time soon. The goal is to remove them to make a “bug free” application. Every quality activity will remove a certain amount of these injected defects. If the amount injected was known, it would be really easy to keep looking until the count goes down to 0. The truth is, we never know how many were injected and it’s impossible to prove the software is completely defect free. All you can do is reach a certain confidence level. Sadly, raising the confidence level has exponential costs, and unless you’re working on code for a nuclear plant, you probably won’t be able to justify them.

Some people believe that the developers are responsible for code quality. It’s true in a certain way, but soldiers are not responsible for wars; leaders are. From an ethical point of view, soldiers should refuse to engage in conflicts that kill innocent people, but they have to follow orders. In the same way, developers should stand their ground and request the time to do things right, but many will fear losing their source of income.

Being meticulous and careful is one way to reduce the amount of defects injected, but you can’t hope that it will solve the quality issue on its own. A client asked me whether they should be using unit testing or checklists when performing QA. To me, the answer was obviously both, but it does not seem so obvious to everyone.

Let’s move on to a different analogy. If you cook a meal for guests, you might review the recipe, make sure you have all the ingredients, validate that their quality is good and be very meticulous while performing the procedure. The fact that you did everything well at the start certainly does not mean that you should not taste the resulting meal before serving it. On the other side, skipping the verification of the ingredients does not make sense either. If you taste at the end and realize the milk used was not fresh, you just made one hell of a waste. It might also be very difficult to understand, from the end result alone, why the food tastes so terrible. You would have to be one confident chef to serve a meal without ever tasting any of the components.

Being meticulous in development is like lining up the ingredients, verifying them and paying attention to every step you make. Unit testing is like tasting the sauce while it’s still cooking. QA is the final verification before you serve. Peer review does not really apply to cooking a meal, but having someone watch over you and make sure you don’t make mistakes isn’t a terrible idea in the first place. Prototyping is like accepting to waste some food to practice on a complex procedure.

A programmer cooking for himself, a mother cooking for her children, and someone preparing a Christmas dinner for the whole family all require different levels of quality. The time it takes is widely different: the three examples here would be around 20 minutes, an hour and a full day. Notice how exponential it is? What determines the quality level required is the cost of failure. When I cook for myself alone, I don’t really care if some details are not completely right. The consequence is basically not enjoying my meal as much, or a few dollars if I have to redo it. Think about the consequences for the two other scenarios. Is the time taken justified? It certainly is. It’s natural and everyone understands it.

The exact same thing is true with software. However, it’s abstract and invisible, so people don’t see it so quickly.

A lot of the techniques used to obtain higher confidence in software quality are about piling up redundant checks. Quality levels can generally be improved incrementally, but each check you add has a cost, and the cost is not only time. Final QA requires that you have some sort of specification or use cases. Peer review works a lot better if you have strong conventions and an agreement on what is OK and what’s not, which is really not that easy to get. Unit testing is the most controversial because it affects how the software is designed. Code has to be written in a way that supports testability. It changes the design. It may have performance impacts. It adds complexity to the design and forces you to prioritize the quality attributes that are required.

Design decisions are a sensitive issue, but I think any reasoned decision is better than no decision at all. You can’t have all the quality attributes in your application. You need to prioritize what you need and live with those decisions. Having worked on projects for long periods, I tend to put maintainability and testability really high. Others may prefer flexibility and performance. It’s very hard to obtain a consensus.

Consequences have to be evaluated at the beginning and the quality objectives have to be determined from the start. Decisions have to be made. Impacts have to be known. Recovering from insufficient quality is hard and expensive. Failing can be catastrophic. Developers are accountable for quality, but so is the rest of the organization, including marketing, sales and management.

Adding collaboration and durability to code reviews

The idea first came to me over the summer while I was in Strasbourg discussing the future changes to TikiWiki. It all started because we decided to release more often. A lot more often. As it was not something done frequently before, the tools to do it were hard and tedious to use. Some manual steps were long and annoying; packaging the releases was one of them. All those scripts were rewritten to be easier to use and to regroup multiple others. That part is good now. One of the other issues was the changelog. TikiWiki is a very active project. Having over 1000 commits per month is not uncommon. Many of them are really small, but it’s still 1000 entries in the changelog. Ideally, we’d like the changelog to be meaningful to people updating, but we just had no way to go through the whole changelog before the release.

One solution that came up was to use a wiki page to hold the changelog, get new items appended to it, and then people could go through it and translate the commit messages from developer English to user English, filter out the irrelevant ones and come up with something useful to someone. Well, we had no direct way to append the commit messages to the end of a page, so this was not done over the 6-month period and we’re now getting dangerously close to the next release. Looks like the next changelog will be in developer English.

Now, what does this have to do with code review? Not so much. I was reading “Best Kept Secrets of Peer Code Review” (which they mail you for free because it’s publicity for their software, but which is still worth reading because it contains valid information) and figured TikiWiki could be a good platform to do code reviews on, if only we could link it with Subversion in some way. After all, TikiWiki is already a good platform to collaborate over software development, as it was developed for its own internal purposes for so long. When you dogfood a CMS for a long time, it tends to become good for software development (it also tends to grow complex UIs that are only intuitive to developers, though). It contains all you need to write quality documentation, track issues, and much more, by just enabling features and setting them up.

Moreover, code review is currently done by the community over the Subversion mailing list. The only issue is that we don’t really know what was reviewed and what was not. I personally think a mail client is far from being the best environment to review changes in. Too often, I’d like to see just a couple more lines above to fully check the change, or verify something in another related file. The alternative at this time is to use Subversion commands afterwards and open up files in vi. I wish it required fewer steps.

Wiki pages are great because they allow you to do what you need and solve unanticipated issues on the spot. Specialized tools, on the other hand, are great at doing what they were meant to do, but they are often draconian in their ways, and forcing someone to do something they don’t want to never really works. I’ve made those errors in the past designing applications, and I felt Code Collaborator made the same ones. The book mentioned above contains a chapter on a very large case study where no code could be committed unless reviewed first. Result: a few pages were spent explaining how they had to filter out the cases where the code was not really reviewed and people only validated it within seconds. I’d rather know it was not reviewed than get false positives.

Anyway, I started thinking of the many ways TikiWiki could be used to perform code reviews. The simplest layout I could think of was this one:

  • One wiki page per commit, grouping all the changes together. Reviewers can then simply edit the page to add their comments (or use the page comments feature, but I think adding to the page is more flexible) and add links to other relevant information to help others in the review.
  • A custom plugin to create links to other files at a given revision, just to make it easier to type. This could actually be a plugin alias to something else. No need for additional code.
  • A custom plugin to display a diff, allow displaying the full diff instead, and link to the full file versions for left and right.
  • An on-commit plugin for Subversion to make the link (see the sketch after this list).
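
A rough sketch of that last piece, assuming Subversion’s standard post-commit hook arguments and a hypothetical data channel URL on the wiki side (svnlook is real; the endpoint and parameter names are made up for the example):

    #!/usr/bin/php
    <?php
    // post-commit hook: Subversion passes the repository path and the revision.
    list(, $repos, $rev) = $argv;

    $payload = [
        'revision' => $rev,
        'author'   => trim(shell_exec('svnlook author -r ' . (int) $rev . ' ' . escapeshellarg($repos))),
        'log'      => trim(shell_exec('svnlook log -r ' . (int) $rev . ' ' . escapeshellarg($repos))),
        'diff'     => shell_exec('svnlook diff -r ' . (int) $rev . ' ' . escapeshellarg($repos)),
    ];

    // Hand the commit over to the wiki; the data channel runs the profile that
    // creates the review page.
    $context = stream_context_create(['http' => [
        'method'  => 'POST',
        'header'  => 'Content-Type: application/x-www-form-urlencoded',
        'content' => http_build_query($payload),
    ]]);
    file_get_contents('http://wiki.example.org/tiki-channel.php?channel=commit_review', false, $context);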

With commits linked to other related commits, to related documentation and discussions, other side features like wiki mind map and semantic links will surely prove to be insightful.

Then I went a little further and figured trackers could be used to log issues and run statistics on them in the future. Checklists could be maintained in the wiki as well and displayed as modules on the side, always visible during the review. If issues are also maintained in a tracker, they could be closed as part of the commit process by analyzing the commit message. However, this is mostly an extra, as I feel there is enough value in just having the review information publicly available. The great part of using such a vast system is that all the features are already there. The solution can be adapted and improved as required without requiring entirely new developments.

Now, the only real showstopper was that there is no direct way of creating all this from a Subversion plugin. TikiWiki does not have a webservice-accessible API to create these things and is unlikely to have one any time soon. The script could load the internal libraries and call them if they were on the same server, but that’s unlikely to be the case. A custom script could be written to receive the call, but then it would not be generic, so it would be hard to include in the core. As we don’t like to maintain things outside the core (part of the fundamental philosophy making the project what it is), it’s not a good solution. With this and the changelog idea before it, I felt there was still a need for something like this. I’m a software engineer, so I think in terms of the tools I use, but I’m certain there are other cases out there that could use a back door to create various things.

To keep the story short, I started finding too many resemblances to the profiles feature. Basically, profiles are YAML files containing descriptions of items to create. They are used from the administration panel to install configurations from remote repositories, to configure the application faster based on common usage profiles. Most of the time, they are only used to create trackers, set preferences and such. However, they contain a few handles to create pages, tracker items and some other elements to create sample data and instruction pages. If they could be run multiple times, be called by non-administrator users, and contain a few more options to handle data a little better (like being able to append to pages), they could pretty much do anything that’s required here, and a lot more.

Another problem was that profile repositories are also TikiWiki instances. They require quite a few special configurations, like opening up some export features, using categories and such. I wouldn’t want all this stuff just to receive a commit notification, and I wouldn’t want to execute a remote configuration without supervision from the administrator. More changes were required to better handle access to local pages.

Still, those were minor changes. A few hours later, Data Channels were born. What are they? It’s so simple it’s almost stupid: a name, a profile URI and a list of user groups who can execute it from an HTTP call.

Next steps toward using TikiWiki as a code review tool:

  • Discuss this with the community
  • Write a profile to create the base setup and the base data channel profiles
  • Write a Subversion plugin to call the channels
  • Write the plugins to display code differences

Interested in something like this for your project or company? Just drop me a line.

Looking back 2008

Seems like this is the time for retrospectives.

  • In January, I was coming back to my normal life after my last internship at Autodesk, moderately busy organizing CUSEC and starting what could have been my last semester of school, if only the last courses I had left were not scheduled in the same time slot. To keep things light, there was also a CodeFest along the way.
  • February turned out to be a very important month. It’s the month in which I decided to stop thinking about multilingual wikis and just do it. Proof of concept ready within a day. A few more days to make it feature complete, and then I spent the remaining weeks documenting it and explaining how it works. Most still don’t get it, but I live OK with that.
  • At the beginning of March was the Blitzweekend, during which we made a first beta release of TikiWiki 1.10. I then spent time preparing my presentation (with the CS Games in between) for PHP Quebec that month, which turned out to be my best presentation ever. I spent most of the second half of the month writing reports for school, which was well worth it because I won a prize for them several months later. It might have been time better spent if I had actually worked for clients during that period, but it wouldn’t have been as fun. Closing out on 4-year-old ideas feels great.
  • Although I was mostly done writing for school, it was far from over. In April, I wrote the paper that would be presented in Porto with Alain Désilets and Sébastien Paquet. These academic conferences get you to work before you are even accepted; tech conferences are so much easier to handle. By the end, I figured my writing skills were good enough. It was also the month when I went on vacation for the first time in a very long while.
  • Coming back from Cuba in early May, I spent a few days not eating or doing anything because it required too much effort by my new standards. Luckily, starvation got me to start doing stuff again. Other than another CodeFest, May was mostly quiet.
  • In June, I mostly worked on Profiles, a configuration delivery system for TikiWiki. Other than that, usual work.
  • I went to Strasbourg in July for a TikiFest, during which 2.0 saw its first RC and major changes happened in the community. It was my first trip to Europe, so I spent some time visiting Strasbourg and Paris. While I was there, I had the confirmation that my paper was accepted, so I had to handle the reviews and submit the final version while I was away from home. I was really glad I had co-authors to handle most of it.
  • August was quiet, other than the fact that I officially completed my studies. We held a small TikiFest in Montreal. I had a lot to cover in TikiWiki after the meetings in France, so I spent quite a lot of time finalizing the 2.0 release (which was made that month).
  • I then went to Porto in September for WikiSym and a TikiFest, living in the WikiHouse for nearly two weeks with 9 other people. It was a lot of fun. Porto was a great city. It turns out the part where I presented my paper was only a minor aspect of the trip. Surprising. While I was there, I learned I was accepted for php|works in November. It seemed like every time I traveled, I would need to travel more.
  • In October, I really had to catch up on work, so other than a CodeFest and preparing my presentation, I just worked.
  • November was a rush. Right after php|works in Atlanta, I had to be back in Montreal because the largest TikiFest ever was happening at the same time with people from all over the world.
  • Nothing happened in December, except a huge pile of work.

It was a lot of fun.