L-P Huberdeau


Truth behind 40-hour weeks

Posted in General by Louis-Philippe Huberdeau on the June 25th, 2006

Measurement

Now that you know how to count the size of your application, it’s time to get some data to base estimates on. Yes, an estimate is nothing more than an extrapolation from historical data and size. The important thing to know is that the estimates will only be as good as you are. An estimate based on historical data reflects your own real performance. Of course, if you took an average of 1.5 hours per size element (whatever you end up counting with), there is no reason to believe your next project will take only 1.2. Beyond these plain facts, “only as good as you are” also relates to the quality of the data you produce. Bad size approximations or timesheets will only lead to bad data.

Size counting is subjective and, beyond basic rules, there is not much that can be written down. On the other hand, collecting time is a lot more deterministic. Seconds, minutes and hours are well defined. Problems arise when you speak of days and weeks. While a typical day of work is 8 hours and a week is 5 days, most can probably speak of days closer to 10 hours and weeks closer to 6 days. Writing down 40 hours per week on a project won’t tell you much.

In a perfect world where there is no overtime, 40 hours a week is a good measure of the time spent in the office, but it’s nowhere close to the amount of time actually worked. In a typical day of work, some time should be taken off for those discussions about the football game the night before, coffee breaks, extended lunch breaks, time watching the score of the football game, personal phone calls or whatever else employees do. This is not a bad thing: it improves social interactions and probably even helps in the long run. With all these things, the original 8 hours are probably down to 7 (a conservative value).

Now, from the time spent working, those 7 remaining hours, how much can really be spent on a single project? Once you’ve removed the time spent helping a co-worker resolve a problem, pointing users to the manual, debugging the previous project, attending staff meetings and all those other distractions from the real work to be performed, I wouldn’t be surprised if the original 7 hours went down to 4. The worst part is that those hours are probably not consecutive and won’t allow you to concentrate long enough.

Now, what should be entered on the timesheet and counted on the project? I count those remaining 4 hours. Knowing how much time is used on a project is necessary to build the historical data. How many hours a day are spent on a project is also important, since past values are probably going to reflect future values. If you think 4 hours a day is bad, Humphrey indicates that the average is closer to 12 or 15 hours per week.

Of course, you could decide to include the staff meeting time in the project. Again, as long as you remain consistent, it shouldn’t be a problem (that is, as long as the meeting remains on topic).

Once you have historical data, making an estimate is very easy. Multiply the counted size by the average time per size element and you get the number of work hours required. Divide the total work hours by the daily (or weekly) hours worked and you end up with a duration, and from there a date. Even a history of two projects can provide impressive precision. I had 5-10% precision on my first estimates, which were on 15-20 hour projects. The results obtained depend a lot on how regular you are in performing your work. If you are not very experienced with the language you are using, you are as likely to find a solution fast as to stay stuck on a problem for hours. Don’t expect to get good results in those situations.
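The arithmetic above can be sketched in a few lines of Python. This is only an illustration: the function names and the history figures are made up, and the average is assumed to be total hours divided by total size over past projects.

```python
def average_hours_per_unit(history):
    """history: list of (size, hours) pairs from past projects."""
    total_size = sum(size for size, _ in history)
    total_hours = sum(hours for _, hours in history)
    return total_hours / total_size


def estimate(size, history, project_hours_per_day):
    """Return (work hours, working days) for a project of the given size."""
    hours = size * average_hours_per_unit(history)
    days = hours / project_hours_per_day
    return hours, days


# Hypothetical history: two past projects, both at 1.5 hours per size element.
history = [(10, 15.0), (20, 30.0)]

# A new project of size 12, with 4 hours a day really spent on the project.
hours, days = estimate(12, history, project_hours_per_day=4)
print(f"{hours:.1f} work hours, about {days:.1f} working days")
```

The daily figure matters as much as the average: the same work hours spread over 4 project hours a day take twice as many days as they would over 8.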

As the history database grows, more analysis becomes possible. The average value is probably going to be replaced with linear regression. It will be possible to divide the data into smaller groups to obtain greater correlations. This entry discussed generic topics you could read about in almost any book. The next entry will be about more specific techniques I use to sort my data.
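As a rough idea of what replacing the average with a linear regression could look like, here is a minimal least-squares sketch in Python. The data points are invented for the example, and fitting hours against size with a straight line is an assumption, not a description of the spreadsheet I actually use.

```python
def fit_line(sizes, hours):
    """Ordinary least squares for hours = intercept + slope * size."""
    n = len(sizes)
    mean_x = sum(sizes) / n
    mean_y = sum(hours) / n
    sxx = sum((x - mean_x) ** 2 for x in sizes)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(sizes, hours))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


# Hypothetical history: size and actual hours for three past projects.
sizes = [10, 20, 30]
hours = [16.0, 29.0, 45.0]

slope, intercept = fit_line(sizes, hours)
predicted = intercept + slope * 25  # estimate for a new project of size 25
print(f"hours = {intercept:.2f} + {slope:.2f} * size, size 25 -> {predicted:.2f} h")
```

Unlike a plain average, the intercept can absorb a fixed overhead per project, which is one reason a regression may fit better once there are enough data points.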

I don’t buy it

Posted in General by Louis-Philippe Huberdeau on the June 21st, 2006

Microsoft

Reading the backlog of news and blog entries, I found a post by a Microsoft employee, originally posted on Slashdot. Among many things, he explains why Vista was delayed so often. Out of all those bold, meaningless statements, this one caught my attention.

We shouldn’t forget despite all this that Windows Vista remains the largest concerted software project in human history.

The fact that Vista was delayed by several years won’t kill it.

  • Microsoft is evil and so is Bill Gates.
  • Free Software is so much better.
  • Vista will take too much hardware resources.
  • Linux is free!
  • Windows is full of security holes.

Contrary to the comments you can read on Slashdot, none of these will cause Vista to fail.

The main issue is that the project was dead before it ever started. Going for the largest project in software history is quite a bold move. If it were to save mankind or perform something else heroic, it would probably make sense. They will have to face it someday: their largest-project-mankind-ever-started does nothing more than burn CPU cycles. Not that these are valuable; it’s simply that an operating system has no value by itself. It does not allow you to perform your job better, especially since Vista is not exactly close to a user interface revolution. Take off all the eye candy and the end user does not get much more than with XP.

Great! They wrote a 50-million-line operating system! What do I get? Are there any advantages? Does it actually do anything more? No. They can spend as many millions as they want, architect their system to hell and fix all those circular dependencies in their system. They can have the best component model and be allowed to modify their universe by changing a pointer. It won’t make their software more useful. In the end, when you install Vista, you can’t do anything unless you install more software on it. What people want is that other software, not the operating system. What is Microsoft trying to build?

Their next release is a technical advance for them. Security might be a feature for them, but for the rest of the world, it’s not a feature, it’s expected.

Think about it for a second. With all those resources deployed on a project, imagine what could have been achieved. They spent billions on software that has no use by itself. The worst part is that it might still be a commercial success. I don’t doubt they will make money out of this project, but that certainly won’t make it a success. It might be a success for shareholders and administrators, but for the software engineers behind it, it’s a complete failure since no goal was achieved. After years of development, the status quo was reached. Don’t the developers at Microsoft feel better when they are useful?

How big is your application?

Posted in General by Louis-Philippe Huberdeau on the June 17th, 2006

Measurement

Knowing the size of your application is the very first step to performing an estimate. To perform any calculation, the value has to be numeric. There are many options for the type of unit that can be used to determine the size of the application. For most developers, lines of code seem like the most obvious solution. You might actually find a lot of literature using lines of code as the unit of measurement. Lines of code are easy to count. The counting can even be automated.

The main problem with lines of code is that they are completely unpredictable. Lines of code are easy to count once the application is written, but guessing the number of lines of code to be written prior to any development does not make much sense. You can write the exact same thing using 10 lines or a hundred lines, depending on the style used. Counting the total number of lines of code is easy, and counting the increment is not so bad if you write down the number at the beginning. The real trouble starts when you try to count the number of modified lines.

The solution to avoid counting lines of code is to count the functional size of the application. The objective is to get a meaningful value you can track your progress against. When using lines of code, the fact that you wrote 50% of the planned code does not mean 50% of the application is written in any way. With functional size, 50% really is 50%. Some claim the functional size is abstract enough to be communicated to non-technical staff, but they say the same thing about UML, so I guess that depends on how technical your non-technical staff is.

There are many techniques to count functional size, such as IFPUG or COSMIC-FFP. Both methods are supported by international organizations and governed by a standard. The COSMIC-FFP method is very simple to understand. It consists of a few very simple steps:

  1. Identify the functional processes (e.g., create blog post)
  2. Identify the data groups (Author, Post, Category, Confirmation Message)
  3. For each process, compose the process using basic operations on data groups. The 4 basic operations are Entry, Exit (or Display), Read and Write
  4. Count 1 point for each basic operation.

Using this method, creating a blog post would have a functional size of around 7 (entry, read categories, display categories, read author, display post, write post, exit confirmation message). The size you obtain may be a little different depending on what you considered was required based on your fictional requirements. The result I gave might not be 100% accurate as I’m not really a COSMIC-FFP specialist. I only give this “state of the art” example as a reference. I personally think that’s a little excessive for a web application. Counting the read of a simple value, like your active author, makes no real sense, as it’s not much harder than reading a variable. This level of detail is probably good if you actually have to read from a sensor in a real-time application.
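To make the counting concrete, here is a small Python sketch of the blog post example. The representation (a list of movement and data group pairs) is my own, and attributing the initial entry to the Post data group is an assumption. As said above, this is an illustration and not an authoritative COSMIC-FFP count.

```python
# The four basic operations; "display" in the text is counted as an exit.
MOVEMENTS = {"entry", "exit", "read", "write"}

# One functional process, described as (movement, data group) pairs.
create_blog_post = [
    ("entry", "Post"),                  # the user submits the post (assumed group)
    ("read", "Category"),               # read categories
    ("exit", "Category"),               # display categories
    ("read", "Author"),                 # read author
    ("exit", "Post"),                   # display post
    ("write", "Post"),                  # write post
    ("exit", "Confirmation Message"),   # exit confirmation message
]


def functional_size(process):
    """Count 1 point for each basic operation in the process."""
    for movement, _group in process:
        if movement not in MOVEMENTS:
            raise ValueError(f"unknown movement: {movement}")
    return len(process)


print(functional_size(create_blog_post))
```

The size of the whole application is then the sum of the sizes of its functional processes.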

The technique I use is based on this COSMIC-FFP method, but it’s not as formal and I never wrote down the rules for it. I basically take out the entry and exit data movements, which are way too simple in a PHP application, and I pretty much group the reads and writes. To summarize all this, I count the number of affected data groups in a functional process. To make it even simpler, for database-driven applications you can replace data groups with database tables, excluding relational tables. I also tend to exclude the tables when all I do is a simple listing to fill in a drop-down list. In the end, that same blog post has a count of one, while the editing would have 2.
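Since I never wrote down the rules, the following Python sketch is only one possible reading of them. The table names and the three "kinds" are hypothetical, and the choice of which table falls in which kind is a judgment call made so that the result matches the count of one given above for creating a blog post.

```python
# Hypothetical classification of the tables touched by "create blog post".
# "data": a real data group affected by the process (counted)
# "relational": a pure link table (excluded)
# "lookup": only listed to fill a drop-down list (excluded)
create_blog_post_tables = {
    "post": "data",
    "category": "lookup",
    # The active author is treated as a plain variable, not a table access.
}


def simplified_size(tables):
    """Count the affected data groups, ignoring excluded kinds of tables."""
    return sum(1 for kind in tables.values() if kind == "data")


print(simplified_size(create_blog_post_tables))
```

Keeping this kind of list next to each count is what makes it possible to re-measure later if the rules change.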

OK, I largely simplified the COSMIC-FFP method. I did it because it fits my needs better. I don’t care if my measure cannot be ported across different technologies; I want measurement to be fast and accurate. The method you use to measure does not matter as long as you remain consistent with it. In fact, you can even change the way you measure along the way if you realize it no longer fits your needs, but there is a price to pay: you will have to re-measure all your past projects unless you want to stop using them in your estimates.

I use a spreadsheet to perform my calculations and perform most manipulations, but I keep my analysis data in a wiki, where I list my processes and the size attributed to each of them. I include as much information about it as I feel like including. Keeping the source data of your measures is what allows you to adapt your measurement techniques. If you don’t have your process list and your data groups listed anywhere, there is no way you will be able to adapt the count and make sure it’s consistent.

I think I mentioned that measuring an application is subjective. The only way to improve is to measure applications. The first few times, multiple questions will come up about what should be measured. As those questions get answered, go back to the previous projects and make sure they were answered the same way.

Measurement and estimation context

Posted in General by Louis-Philippe Huberdeau on the June 15th, 2006

Measure

Before getting into technical details, I’d like to clarify what the purpose of all this is and where it fits into the development process. It’s a vast topic and I won’t pretend to cover it all. I am restricted by the context I have used it in. I don’t plan on rewriting books in here.

Estimates can be made at multiple levels. The estimate you give your client is most likely not going to be the same as the one you use to track your progress, simply because the objectives are not the same. Your own estimates are meant to be honest and as accurate as possible. The end result is as likely to be above the target value as under it. The one you give to a client is usually supposed to be a ceiling type of estimate. Clients don’t like to see the price going up and you don’t like losing money for underestimating.

I don’t perform estimates for clients. I use them to track my own development and communicate. The process of estimating provides numbers about the development that can be used to track the progress of the project. If you know the total size of the project and know how much you have done, you can obtain a nice metric called earned value. Ever heard of a project at 90% completion for months? Tracking earned value is a good way to avoid this problem, as you can find out at any time what the exact completion percentage of the project is. Combined with the percentage of used time over the total estimate, it can indicate if your project will be early or late long before the deadline is reached.
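As a minimal sketch of this tracking in Python, assuming completion is measured as completed size over total size: the function names, the figures and the simple linear projection of total hours are all illustrative assumptions.

```python
def completion(completed_sizes, total_size):
    """Earned value as a fraction: size of finished work over total size."""
    return sum(completed_sizes) / total_size


def time_used(hours_spent, estimated_hours):
    """Fraction of the estimated hours already consumed."""
    return hours_spent / estimated_hours


def projected_total_hours(hours_spent, done_fraction):
    """Naive projection: assume the remaining work goes at the same pace."""
    return hours_spent / done_fraction


# Hypothetical project: size 20, estimated at 30 hours.
done = completion([3, 2, 5], total_size=20)          # finished processes
used = time_used(hours_spent=18, estimated_hours=30)
projection = projected_total_hours(18, done)

status = "late" if used > done else "on time or early"
print(f"{done:.0%} done, {used:.0%} of time used, "
      f"projected {projection:.0f} h: {status}")
```

Only finished processes count toward completion, which is what prevents the "90% done for months" effect.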

Managers have their own schedules. They make their estimates based on various criteria. In the best cases, they will ask the developers for their own estimates. Since all those MS Project files take long to update, being able to give a warning a few weeks in advance that the project will take longer or shorter than originally planned can be useful. As a developer, you might get the feeling something is going wrong and the project will end up being late. It’s possible to support this feeling with facts.

Even if there is no money involved, estimates help anticipate problems. Spending half an hour on an estimate before the project starts and on a few verifications during the development provides insights that can be very valuable.

The estimates can be performed on complete projects or smaller tasks. It’s probably a good idea to start training on smaller tasks. They provide data much faster and allow you to adjust the process to your specific needs. There is no ultimate way to do these things. If there were, it would probably be supported by all development environments and there would be no need to write about it. Measurement techniques have to be tailored, collected data has to be selected and observed metrics must be identified. The process might seem complex, but it’s actually very simple to perform, and this series of posts aims to give a practical aspect to all those academic terms.

Academics tend to generalise measurements and estimation processes to become organisation-wide. I have the feeling it makes no sense. In the end, each developer has to perform the work seriously. The results can be aggregated for work teams, but I doubt it can go far beyond that. It’s useless to measure something you don’t need. If the measurement requirements are organisation-wide, chances are you will have to measure for everyone’s needs, which can be a lot of measurement. Too much measurement is time consuming. If you don’t know what the idea behind it is, it’s hard to take it seriously. I will discuss the limitations of measurement once I have covered more specific aspects. For now, keep in mind that it’s better to start locally and focus on direct results.

The techniques I will present apply to the developer’s own development process. Now that the context is set, I can begin writing about some technical and practical aspects. In the next post of this series, I will cover the size measurement.

Estimation requires no arcane magic

Posted in General by Louis-Philippe Huberdeau on the June 13th, 2006

Measure

Estimating the duration of a software project is a fairly simple task. It’s a common misconception that it requires many years of experience and that only a few actually succeed. Many projects fail because they are underestimated. The reason is not that the estimate was wrong, it’s that there was probably no estimate at all. Staring at the ceiling for a few minutes and writing down a number is not an estimate, it’s simply a guess. Some are quite talented at guessing, but it doesn’t make their final number more valuable.

What makes a good estimate? It would be quite nice if two different estimators could reach a similar value. Even better, it would be good if those estimators could defend their estimates months later. It has to be possible to track the source of the values produced. This is possible. Estimation has nothing to do with arcane magic, it’s simply about following a few steps and taking them seriously. There are two things you need to know: how to count and how good you are. The first step is learning how to count. How do you know how big what you have to develop is? Once you know how to count, it’s only about collecting data and you will find out how good you are. It doesn’t take that much data to obtain accurate results. A simple average based on your last project will give you a value that’s closer to reality than a guess.

Where does the misconception about estimation come from in the developer communities? There are dozens of books written on the topic, quite a few of them best-sellers. If they are best-sellers, why don’t you ever hear about a developer systematically making accurate estimates? It probably has something to do with the fact that most of the developers you hear about blog, work with bleeding-edge technologies and never bothered reading a book unless it had Web 2.0 in the title. The software engineering knowledge about estimation comes from the military industry and gray-haired consultants. Needless to say, it doesn’t really reach web development.

I have been experimenting with some of the techniques recently, applied to web applications. I have to admit the results were impressive. I didn’t need to spend tons of hours filling spreadsheets or write a thousand pages of documentation. In the next few weeks, I will discuss certain aspects of my experimentation. Stay tuned.

Nested sets * Large amount of data = Need more RAM

Posted in Programming by Louis-Philippe Huberdeau on the June 2nd, 2006

MySQL

When I say more, I mean a lot more. A terabyte would probably do it. I got to play with nested sets quite a lot recently and they are really amazing. The relations you can pull out of them are simply incredible. For those who have no idea what a nested set is, I highly recommend reading Mike Hillyer’s article on the topic.

I was working with fairly large sets, a little over 100 nodes. Queries were so fast I could barely notice the page reloading, and those queries were using an average of 10 joins. The problems really came when I got this great idea of importing a truck’s (yes, it’s large) spare part list into the nested set. Going from hundreds to thousands of nodes really makes nested sets SLOW. A simple listing with depth value would take around 40 seconds, and that’s only a 2-table join.
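For reference, here is roughly what a listing with depth value looks like on a nested set. This is a sketch using Python’s built-in sqlite3 module as a stand-in for MySQL; the part table, its columns and the five sample parts are hypothetical, and the query follows the usual nested-set pattern of joining the table to itself on the lft/rgt bounds.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE part (id INTEGER PRIMARY KEY, name TEXT, lft INTEGER, rgt INTEGER)"
)
# A tiny hypothetical tree: each node encloses the lft/rgt range of its children.
conn.executemany(
    "INSERT INTO part (name, lft, rgt) VALUES (?, ?, ?)",
    [
        ("Truck", 1, 10),
        ("Engine", 2, 7),
        ("Piston", 3, 4),
        ("Crankshaft", 5, 6),
        ("Cabin", 8, 9),
    ],
)

# Self-join: every node is matched with itself and each of its ancestors,
# so the number of matches minus one is the depth of the node.
rows = conn.execute(
    """
    SELECT node.name, COUNT(parent.id) - 1 AS depth
    FROM part AS node
    JOIN part AS parent ON node.lft BETWEEN parent.lft AND parent.rgt
    GROUP BY node.id, node.name, node.lft
    ORDER BY node.lft
    """
).fetchall()

for name, depth in rows:
    print("  " * depth + name)
```

If the database ends up comparing every node with every other node to evaluate the range condition, the work grows with the square of the number of nodes, and each additional self-join multiplies it again.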

I made an attempt to run one of the more complex queries overnight to see how long it actually took to process. I never found out. When I got back to work this morning, the computer’s fans were still at full speed, leaving a loud noise in the room. The display on the monitor had gone crazy. The display problems were due to the connector being loose. My guess was that all those vibrations caused some trouble. I double-checked the indexes and ran EXPLAIN to verify that all the conditions were processed correctly. Everything was in order. I then made some quick calculations and found out that the joins generated a few billion rows, each containing quite a few columns. At that point, I realized I had to find another solution.

Luckily enough, I didn’t really need the hierarchical data about the components.