Branching, the cost is still too high

A common motivation for moving to distributed version control systems (DVCS) was that branching was too expensive with Subversion. That is partly true, but even with a DVCS, I still find branching too costly for my taste. I can create feature branches for features of a decent size, but I think traceability needs even more granularity.

Let’s begin with my typical process for handling feature branches these days.

  1. Branch trunk from the repository to my local copy
  2. Copy configuration files from another branch
  3. Make minor changes
  4. Run scripts to initialize the environment
  5. Develop, commit, pull, merge – all of this is great
  6. Push to trunk

My problem is that dealing with those configuration files takes too much time and is still troublesome. However, there is no real way around it: the application needs to connect to MySQL, Gearman, Sphinx and Memcached. On development setups, they all run on the same machine. Still, because I am way too lazy to create new database instances and I don’t change my prefixes as often as I should, I end up with multiple branches sitting there with only one really usable at any time. Of course, it would all be solved if I were more disciplined, but when a step annoys me, the annoyance is exactly what keeps me from doing it right. Having to do the configuration part at all is enough to push me towards re-using branches.
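One way to reduce that configuration burden is to derive every per-branch name from the branch name itself, so that several branches can share the same MySQL, Gearman, Sphinx and Memcached instances without colliding. This is a minimal sketch under assumptions: the `app_` database naming, the prefix formats and the `branch_settings` helper are hypothetical, and the ports are the usual defaults for each service rather than anything taken from the setup described above.

```python
import re

# Assumed service endpoints on a single development machine.
# Ports are the usual defaults for each service; adjust to the local setup.
SERVICES = {
    "mysql": ("localhost", 3306),
    "gearman": ("localhost", 4730),
    "sphinx": ("localhost", 9312),
    "memcached": ("localhost", 11211),
}


def slugify(branch_name: str) -> str:
    """Turn a branch name such as 'feature/Search-UI' into 'feature_search_ui'."""
    slug = re.sub(r"[^a-z0-9]+", "_", branch_name.lower()).strip("_")
    if not slug:
        raise ValueError(f"cannot derive a prefix from {branch_name!r}")
    return slug


def branch_settings(branch_name: str) -> dict:
    """Hypothetical helper: derive per-branch names so that several branches
    can share the same service instances without stepping on each other."""
    slug = slugify(branch_name)
    per_service = {
        # MySQL limits database names to 64 characters.
        "mysql": ("database", f"app_{slug}"[:64]),
        "gearman": ("function_prefix", f"{slug}_"),
        "sphinx": ("index_prefix", f"{slug}_"),
        "memcached": ("key_prefix", f"{slug}:"),
    }
    settings = {}
    for service, (host, port) in SERVICES.items():
        key, value = per_service[service]
        settings[service] = {"host": host, "port": port, key: value}
    return settings


if __name__ == "__main__":
    for service, values in branch_settings("feature/Search-UI").items():
        print(service, values)
```

With this in place, the "copy and tweak configuration" step becomes a function of the branch name, so a fresh branch never reuses another branch's database or cache keys by accident.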

The goal of fine-grained branches is to capture the decision-making process in revision control. The way I see it, top-level branches represent a goal. It could be implementing a new feature, enhancing a piece of the user interface or anything else. However, reaching those top-level objectives may require some refactoring or a library upgrade. If each of those changes were made in its own branch and merged back as a single commit, the hierarchy of commits could be inspected to understand the flow of intentions. Bazaar can generate graphs from forks and merges, and I can imagine tools that would help traceability if the decision-making were organized in the branch structure.
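The branch hierarchy described above can be modelled as a simple tree in which every branch records the goal it serves and the branch it was forked from. The Python sketch below is hypothetical and is not a Bazaar API: `Branch`, `fork`, `intention_chain` and `outline` are illustrative names, and the example branches are invented. Walking from a branch up to the root recovers the chain of intentions behind a change.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Branch:
    """A branch and the goal it was created to reach (hypothetical model)."""
    name: str
    goal: str
    # Excluded from repr/compare to avoid infinite recursion through children.
    parent: Optional["Branch"] = field(default=None, repr=False, compare=False)
    children: List["Branch"] = field(default_factory=list)

    def fork(self, name: str, goal: str) -> "Branch":
        """Create a sub-branch that serves a narrower intention."""
        child = Branch(name, goal, parent=self)
        self.children.append(child)
        return child


def intention_chain(branch: Branch) -> List[str]:
    """Goals from the top-level objective down to the given branch."""
    chain = []
    node: Optional[Branch] = branch
    while node is not None:
        chain.append(f"{node.name}: {node.goal}")
        node = node.parent
    return list(reversed(chain))


def outline(branch: Branch, depth: int = 0) -> List[str]:
    """Indented outline of the decision tree rooted at `branch`."""
    lines = ["  " * depth + f"{branch.name}: {branch.goal}"]
    for child in branch.children:
        lines.extend(outline(child, depth + 1))
    return lines


if __name__ == "__main__":
    trunk = Branch("trunk", "Production code")
    search = trunk.fork("search-ui", "Improve the search results page")
    upgrade = search.fork("sphinx-client", "Upgrade the Sphinx client library")
    print("\n".join(outline(trunk)))
    print(" -> ".join(intention_chain(upgrade)))
```

Here the library upgrade is not an isolated change: its chain shows it was done in service of the search page work, which is exactly the context that is lost when everything lands in one long-lived branch.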

Why traceability, you might ask? For many things that don’t seem to make sense in code, there is a good historical reason (unless it is due to accidental complexity). Even in my own code written a few months earlier, I find places that need refactoring. Most of the time, it is simply because I was trying to look too far ahead at the time. I was anticipating the final shape of the software, but by the time it got there, new and better ways to achieve the same result had been implemented, leaving legacy code behind. When this happens in my own code, I can think back to the process that led to it, figure out what the original intention was and decide how the design should be adapted to the new reality. When someone else wrote the code, the original intention can only be guessed. I hope a hierarchy of branches can provide an outline of the thought process that would explain the decisions made.

My Subversion reflexes pointed me towards bzr switch, which changes the way I had become used to working with a DVCS. When I made the transition, I replaced the concept of a working copy with branches, and check-outs simply seemed to have no use. I was wrong. They can actually fix my configuration burden. If I keep a single check-out of the code that is configured for my local environment, I can switch it from one branch to another. Because we are in the distributed world, those other branches can be kept locally, just not in the working copy. The process then changes:

  1. Create a new branch locally
  2. Switch the check-out to the new branch
  3. Develop, commit, pull, merge
  4. Switch the check-out back to the parent branch
  5. Merge the local branch
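The switch-based workflow can be sketched as a sequence of bzr commands. The Python below only builds and prints the plan by default (a dry run); the directory layout under `~/bzr`, the `setup_plan` and `feature_plan` helpers and the commit messages are assumptions for illustration. It relies on the standard `bzr branch`, `bzr checkout --lightweight`, `bzr switch`, `bzr commit` and `bzr merge` commands, and uses absolute paths so that the locations given to `bzr switch` are unambiguous.

```python
import shlex
import subprocess
from pathlib import Path
from typing import List, Optional, Tuple

# Assumed layout: local branches live side by side under BRANCHES and a
# single, already configured lightweight check-out lives at CHECKOUT.
BRANCHES = Path.home() / "bzr" / "branches"
CHECKOUT = Path.home() / "bzr" / "checkout"

Step = Tuple[Optional[str], List[str]]  # (working directory, argv)


def setup_plan(trunk_url: str) -> List[Step]:
    """One-time setup: a local trunk branch plus the configured check-out."""
    trunk = str(BRANCHES / "trunk")
    return [
        (None, ["bzr", "branch", trunk_url, trunk]),
        (None, ["bzr", "checkout", "--lightweight", trunk, str(CHECKOUT)]),
    ]


def feature_plan(parent: str, feature: str, message: str) -> List[Step]:
    """The five steps of the switch-based workflow as bzr commands."""
    parent_path = str(BRANCHES / parent)
    feature_path = str(BRANCHES / feature)
    checkout = str(CHECKOUT)
    return [
        # 1. Create the new branch locally, next to its parent.
        (None, ["bzr", "branch", parent_path, feature_path]),
        # 2. Point the configured check-out at the new branch.
        (checkout, ["bzr", "switch", feature_path]),
        # 3. Develop; commits go to the branch the check-out points at.
        (checkout, ["bzr", "commit", "-m", message]),
        # 4. Switch the check-out back to the parent branch.
        (checkout, ["bzr", "switch", parent_path]),
        # 5. Merge the local branch and record it as a single commit.
        (checkout, ["bzr", "merge", feature_path]),
        (checkout, ["bzr", "commit", "-m", f"Merge {feature}: {message}"]),
    ]


def run(plan: List[Step], dry_run: bool = True) -> None:
    """Print the plan, or execute it when dry_run is False."""
    if not dry_run:
        BRANCHES.mkdir(parents=True, exist_ok=True)
    for cwd, argv in plan:
        if dry_run:
            print(f"[{cwd or '.'}] {shlex.join(argv)}")
        else:
            subprocess.run(argv, cwd=cwd, check=True)


if __name__ == "__main__":
    run(feature_plan("trunk", "search-ui", "Add search filters"))
```

Step 3 stands in for a whole development session with several commits, pulls and merges, so in practice steps 1-2 and 4-5 would be run separately with the real work in between. The key point is that the check-out, and therefore its configuration, never moves; only the branch it points at changes.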

Of course, changes to the configuration files beyond what was configured locally, or changes to the schema, still have to be dealt with, but I expect those to be much less frequent.

The next step will be to rebuild my development environment in a smarter way. Right now, I have way too many services running locally. I want to move all of them to a virtual machine that I will fire up when I need them. For this step, I am waiting for the final release of Ubuntu 10.04, and probably a few more weeks after that. In the past, I had terrible experiences with pre-release operating systems and learned to stay away, no matter how fun and attractive the new features are. It also means re-installing my entire machine, which I am not really looking forward to. It should be easier now that almost everything is web-based, as long as I don’t lose those precious passwords.

Using virtual machines to keep your primary host clean of any excess is nothing new. I guess I did not do it before because I thought my disk space was more limited than it is. My laptop has a 64 GB SSD. That was a step down from my previous laptop’s drive, which was constantly getting full: too many check-outs, database dumps and log files. They just keep piling up over the years. It turns out the overhead of having an extra operating system isn’t that bad after all.

The good thing about virtual machines is that they are completely disposable. You can build one with the software you need, take a snapshot and move on from there. Simply reverting to the snapshot cleans up all the mess created since. There is only one detail to keep in mind: no permanent data can be stored in there. I will keep my local branches on the main host and the check-out in the virtual machine. Having a shell on a virtual machine won’t be much different from having one locally.
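Reverting to a clean snapshot is easy to script. The post does not say which virtualization software will be used, so the sketch below assumes VirtualBox and its `VBoxManage` command-line tool; the VM name `devbox` and snapshot name `clean-services` are hypothetical. Like the workflow plan above, it prints the commands by default and only executes them when `dry_run` is False.

```python
import shlex
import subprocess
from typing import List

# Assumption: VirtualBox is the hypervisor; the post does not name one.
VBOX = "VBoxManage"


def snapshot_plan(vm: str, snapshot: str) -> List[List[str]]:
    """Take a clean snapshot once the VM has all the needed services installed."""
    return [[VBOX, "snapshot", vm, "take", snapshot]]


def reset_plan(vm: str, snapshot: str) -> List[List[str]]:
    """Throw away everything done since the snapshot and boot the VM again."""
    return [
        # Restoring requires a stopped VM; this fails if it is already off.
        [VBOX, "controlvm", vm, "poweroff"],
        [VBOX, "snapshot", vm, "restore", snapshot],
        # Headless: only the services are needed, not a GUI window.
        [VBOX, "startvm", vm, "--type", "headless"],
    ]


def run(plan: List[List[str]], dry_run: bool = True) -> None:
    """Print the commands, or execute them when dry_run is False."""
    for argv in plan:
        if dry_run:
            print(shlex.join(argv))
        else:
            subprocess.run(argv, check=True)


if __name__ == "__main__":
    run(reset_plan("devbox", "clean-services"))
```

Anything written inside the VM after the snapshot, including a check-out's uncommitted changes, is discarded on restore, which is why the branches themselves stay on the host.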
