Archive

Archive for June, 2008

He's right, again

In some theoretical sense, developing the complete infrastructure before implementing any visible functionality might be efficient, but in a practical sense, managers, customers, and developers begin to get nervous when too much time goes by before they can actually see the software work. Infrastructure development has the potential to become a research project in creating a perfect theoretical framework[...]

Sounds like a familiar problem? Taken from Software Project Survival Guide. I can’t say it’s my favorite from McConnell, but it’s still right on.

Categories: Books Tags:

Motivation Driven Development

Today is just one of these days I can just look back and laugh at my own behavior. I have been working on a personal project for a while now (should go public soon). Of course, it started off with many great ideas and I could have fun just thinking about it. When came the time to actually code it, motivation dropped. The problem was really that, while I had a great goal, before getting close, I had to get all the ground work done.

What happened? Well, it froze. I stopped working on it for months, until recently. When I got back to this project, I came back because there was something specific I wanted to implement in it. When I took a good look at at what I had in progress, I realized that the things I was focusing on were not getting me any close to my goals. I just left everything as is. Tests were running. Some code was not used yet. No problem.

Instead of starting over from where I was, I mapped out the high level features I wanted it to achieve and wrote down a road map. It was not based on building good foundations, not based on a good architecture. It was based on what’s needed for the software to be any useful and what I felt like working on. What did it change?

  • Changes were visible on the final product
  • At every step, I would get closer to being able to use it, and find out different ways to use it
  • I got motivation to work on the project
  • The project evolved more in the last two weeks than ever before

So, why today? Well, that feature I had half started months back, I finished it today in just 30 minutes. Months of no progress to avoid 30 minutes of work. It’s not that it was long, certainly not hard. It was a boring task. It was necessary, but along it did not do any good. Today, writing it enabled a very powerful feature. Even if it was boring, I was happy to do it because I would then see the whole thing in action. It’s not quite complete yet as it still misses a few critical features required to normal use, but I can already use it for my own needs, which is great.

Now, if I had done it a few months ago, I’m pretty sure it would have been more than 30 minutes of work. It just takes me more time to do work when I’m not motivated. It’s also likely that the feature would have been more complete. Rather than doing what it has to in order to be useful, it would have been what it should be not to be so boring to write. Goldplating? Scopecreep?

The scary part is that I’m pretty certain it’s not the first time I’ve dropped a project just because I didn’t feel like doing a tiny little part.

Categories: General, Programming Tags:

Who reads code samples?

Recently I have been reading Programming Collective Intelligence by Toby Seragan. I love the subject. It’s all about handling large data sets and finding useful information out of it. Finally an algorithm book that covers useful algorithms. I don’t read code-centric books very often because I think they are boring, but this one has a great variety of examples that keep it interesting as the chapters advance. There are also real world examples using web services to fetch realistic data.

My only problem with the book is that there are way too many code samples. It may just be my training, but there are some situations where just writing the formula would have been a lot better. Code is good, but when there is a strong mathematical foundation to it, the formula should be provided. Unlike computer languages, mathematics as a language has been developed for hundreds of years and it provides a concise, unambiguous syntax. I like the author’s effort to write the code as a proof of concept, but I think it belongs in an appendix or on the web rather than between paragraphs.

Which one do you prefer?

def rbf(v1,v2,gamma=20):
    dv=[v1[i]-v2[i] for i in range(len(v1))]
    l=veclength(dv)
    return math.e**(-gamma*l)

or

For that kind of code, I vote #2 any time. I’m not a Python programmer. I can read it without any problem, but that vector substraction did seem a little arcane at first and it took me a few seconds to figure out, and I’m about certain that even a seasoned Python programmer would have stopped on that one. It’s not that it takes really long to figure it out, but it really keeps you away from what is really important about the function. What was important was that you want to score points that are far away from each other a lower value than those that are close by. Anyone who has done math could figure it out from the formula because it’s a common pattern. From the code, would you even bother to read it?

This is a very short code sample. In fact, it’s small enough that every single detail of it can fit into your short term memory. Here is an example that probably does not. In fact, I made your life a lot easier here because this code was scattered across 4 different pages in the book.

def euclidean(p,q):
    sumSq=0.0
    for i in range(len(p)):
        sumSq+=(p[i]-q[i])**2
    return (sumSq**0.5)

def getdistances(data,vec1):
    distancelist=[]
    for i in range(len(data)):
        vec2=data[i]['input']
        distancelist.append((euclidean(vec1,vec2),i)
    distancelist.sort()
    return distancelist

def gaussian(dist,sigma=10.0):
    exp=math.e**(-dist**2/(2*sigma**2))
    return (1/(sigma*(2*math.pi)**0.5))*exp

def weightedknn(data,vec1,k=5,weightf=gaussian):
    dlist=getdistances(data,vec1)
    avg=0.0
    totalweight=0.0

    for i in range(k):
        dist=dlist[i][0]
        idx=dlist[i][1]
        weight=weightf(dist)
        avg+=weight*data[idx]['result']
        totalweight+=weight

    avg=avg/totalweight
    return avg

or

The formula is insanely shorter, and the notation could certainly be improved. What’s the trick? It relies on well documented language features like vector operations and trims out all the python-specific code. I actually wrote more than I had to because gaussian itself is well defined in math. Because all operations used are well defined, whichever language you use will probably support them and you can use the best possible tool for your platform. The odds that I use Python when I get to use those algorithms is low, so why should I have to bother with the language specifics?

The author actually included the formula for some function in the appendix. I just think it should be the other way around.

Categories: Books, Programming Tags: