
Archive for May, 2010

More information overload please

A few months back, Microsoft made a presentation on OData at PHP Quebec. While I found the format interesting at first, with the way you can easily navigate and explore the dataset, I must admit I was a bit skeptical. After all, public organizations handing out their data to Microsoft does sound like a terrible idea. While a lot of that data will be hosted on Microsoft technologies, the format remains open, and it appears to be picking up.

I guess what was missing originally for me was a real use case for it. The example at the presentation used a sample database with products and inventory. Completely boring stuff. Today, at Make Web Not War, I had a conversation with Cory Fowler (with Jenna Hoffman sitting close by) who has been promoting OData for the city of Guelph. I was convinced right away that this was the right way to go. Not necessarily OData as a format, but opening up information for citizens to explore and improve the city. If OData can emerge as a widespread standard to do it, it’s fine by me. The objective has little to do with technology. In fact, when I look at it, I barely see the technology. It’s about providing open access to information for anyone to use. How they will use it is up to them.

The conference had a competition attached to it. Two projects among the finalists were using OData. One of them created a driving game with checkpoints in the city of Vancouver. They simply used the map data to build the streets and position buildings. That is a fairly ludicrous use of publicly available information, but still impressive that a small team could build a reasonable game environment in a short amount of time. The other project used data from Edmonton to rank houses based on the availability of nearby services, basically helping people seeking new properties to evaluate the neighborhood without actually getting away from their computer.
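To make the house-ranking idea concrete, here is a minimal sketch in Python. It is not how the competition entry worked: the scoring rule (count the services within a fixed radius), the function names, and every coordinate below are assumptions made up for illustration.

```python
from math import asin, cos, radians, sin, sqrt


def distance_km(a, b):
    """Great-circle (haversine) distance between two (lat, lon) points."""
    lat1, lon1, lat2, lon2 = map(radians, (a[0], a[1], b[0], b[1]))
    h = (sin((lat2 - lat1) / 2) ** 2
         + cos(lat1) * cos(lat2) * sin((lon2 - lon1) / 2) ** 2)
    return 2 * 6371.0 * asin(sqrt(h))


def score(house, services, radius_km=1.0):
    """Hypothetical score: how many services lie within radius_km."""
    return sum(1 for s in services if distance_km(house, s) <= radius_km)


def rank(houses, services, radius_km=1.0):
    """Return house names ordered from best served to worst served."""
    return sorted(
        houses,
        key=lambda name: score(houses[name], services, radius_km),
        reverse=True,
    )


# Made-up points, loosely placed around Edmonton; not real open data.
services = [(53.545, -113.490), (53.541, -113.495), (53.600, -113.400)]
houses = {"A": (53.543, -113.491), "B": (53.600, -113.401)}

print(rank(houses, services))
```

With real data, the list of services would come from the city's published feed rather than a hard-coded list, and the score would likely weight services by type.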

This is only the tip of the iceberg. The data made available at this time is mostly geographical. Cities expose the location of the various services they offer. The uses you can make out of it are quite limited. I’ve seen other applications helping you locate nearby parks or libraries. Sure, knowing there is a police station nearby is good, but there could be so much more. What we need is just more data: crime locations, incident reports, power usage, water consumption. Once relevant information is out there, small organizations and even businesses will be able to use it to find useful information and track it over time. At this time, a lot of the data is collected but only accessible by a few people. Effort duplication occurs when others attempt to collect it. Waste. Decisions are made based on poor evidence.

So there is information out there for Vancouver, Edmonton, even Guelph. Nothing about Montreal. Nothing in the province of Quebec that I could find. I think this is just sad.

Actually, if there is anything out there, it might be very hard to find. Even the great data sources that are openly available remain hard to discover, because there is no central index at this time. Even if there were, the question remains of what should go in it. Official sources? Collaborative sources? Not that there is anything like that, but consider people flagging potholes on streets with their mobile phones as they walk around. Of course, accuracy would vary, but it would serve as a great tool for the city employees to figure out which areas should become a priority. There are so many opportunities and so many challenges related to open data access. I don’t think we are fully prepared for the shift yet.


Only the words change

Amazon has brought me back to 1975 and the Mythical Man-Month. It had been on my reading list for quite a while, but at some point around two years ago, it became unavailable. After that, it sat on a shelf for a few months until the stack got down to it. I must say, skip a few technical details and this book could very well have been written last year. After all, in 1975, structured programming (that is, conditions and loops) was a recent concept and not widely adopted. Surprisingly, Brooks knew a whole lot about software development, testing and management. I have the feeling we have learned nothing since it was written. Concepts were only refined, renamed and spread out. As far as I can tell, just a few paragraphs in chapter 13 lay out the founding concepts of TDD.

Build plenty of scaffolding. By scaffolding, I mean all programs and data built for debugging purposes but never intended to be in the final product. It is not unreasonable for there to be half as much code in scaffolding as there is in product.

One form of scaffolding is the dummy component, which consists only of interfaces and perhaps some faked data or some small test cases. For example, a system may include a sort program which isn’t finished yet. Its neighbors can be tested by using a dummy program that merely reads and tests the format of input data, and spews out a set of well-formatted meaningless but ordered data.

Another form is the miniature file. A very common form of system bug is misunderstanding of formats for tape and disk files. So it is worthwhile to build some little files that have only a few typical records, but all the descriptions, pointers, etc.

[...]

Yet another form of scaffolding are auxiliary programs. Generators for test data, special analysis printouts, cross-reference table analyzers, are all examples of the special-purpose jigs and fixtures one may want to build.

[...]

Add one component at a time. This precept, too, is obvious, but optimism and laziness tempt us to violate it. To do it requires dummies and other scaffolding, and that takes work. And after all, perhaps all that work won’t be needed? Perhaps there are no bugs?

No! Resist the temptation! That is what systematic system testing is all about. One must assume that there will be lots of bugs, and plan an orderly procedure for snaking them out.

Note that one must have thorough test cases, testing the partial systems after each new piece is added. And the old ones, run successfully on the last partial sum, must be rerun on the new one to test for system regression.

Does it sound familiar? I see test cases, test data, mock objects, fuzzing and quite a lot of things we hear about these days. Certainly, it was different. They had different constraints at the time, like having to schedule to get access to a batch-processing machine. There is some discussion about interactive programming and how it would speed up the code and test cycles.

I find it impressive given that they had so little to work with. I wasn’t even born when they figured that out.
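To show how directly the quoted advice maps to current practice, here is a minimal sketch in Python of a dummy sort component and a regression run. It is an illustration only: the record layout and the names `dummy_sort`, `report`, `TEST_CASES` and `failing_cases` are assumptions, not anything taken from the book.

```python
def dummy_sort(records):
    """Stand-in for a sort program that is not finished yet.

    It only checks the format of the input, then returns well-formatted,
    ordered, but meaningless records.
    """
    for record in records:
        if not isinstance(record.get("key"), int):
            raise ValueError(f"malformed record: {record!r}")
    return [{"key": i, "payload": "dummy"} for i in range(len(records))]


def report(records, sort=dummy_sort):
    """The neighbor under test: formats whatever the sort step returns."""
    return [f"{r['key']:03d} {r['payload']}" for r in sort(records)]


# A miniature data set: a few typical records, each with its expected output.
TEST_CASES = [
    ("two records",
     [{"key": 7, "payload": "x"}, {"key": 2, "payload": "y"}],
     ["000 dummy", "001 dummy"]),
    ("empty input", [], []),
]


def failing_cases(sort=dummy_sort):
    """Rerun every old case; return the names of the ones that now fail."""
    return [name for name, given, expected in TEST_CASES
            if report(given, sort) != expected]


print(failing_cases())
```

Once the real sort exists, it replaces `dummy_sort`, the expected outputs are updated to real ordering, and the same cases are rerun after each new component is added.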

Because the experience is based on system programming for an operating system to be run on a machine built in parallel, there is a strong emphasis on requirements and on top-down design, which Brooks calls the most important new programming formalization of the decade. To me, the word requirement is a scary one. I don’t do system programming and for what I do, prototyping and communication do a much better job. However, I found the take interesting.

Designing the Bugs Out

Bug-proofing the definition. The most pernicious and subtle bugs are system bugs arising from mismatched assumptions made by the authors of various components. The approach to conceptual integrity discussed above in Chapters 4, 5 and 6 addresses these problems directly. In short, conceptual integrity of the product not only makes it easier to use, it also makes it easier to build and less subject to bugs.

So does the detailed, painstaking architectural effort implied by that approach. V. A. Vyssotsky, of Bell Telephone Laboratories’ Safeguard Project, says, “The crucial task is to get the product defined. Many, many failures concern exactly those aspects that were never quite specified.” Careful function definition, careful specification, and the disciplined exorcism of frills of function and flights of technique all reduce the number of system bugs that have to be found.

Testing the specification

Long before any code exists, the specification must be handed to an outside testing group to be scrutinized for completeness and clarity. As Vyssotsky says, the developers themselves cannot do this: “They won’t tell you they don’t understand it; they will happily invent their way through the gaps and obscurities.”

Beyond the punch line, this does call for very detailed specifications. It felt to me that those were in-retrospect comments. I don’t think the specifications were ever detailed enough for bugs to be driven out. If they had been, you would end up with an issue introduced in the previous chapter: “There are those who would argue that the OS/360 six-foot shelf of manuals represents verbal diarrhea, that the very voluminosity introduces a new kind of incomprehensibility. And there is some truth in that.”

Very detailed specifications, just like exhaustive documentation, reach a point where they no longer bring value because no one can get through all of it in a reasonable time frame. Attempting to find inconsistencies in a few pages of requirements is possible. Not in thousands of pages. The effort required is just surreal. Sadly, following the advice of strong and detailed requirements, the world of software development sank into waterfall in the years that followed. For all the great insight in the book that we collectively rediscovered only decades later, this one piece of advice got too influential.

Imagine where software development would be today if the industry had instead taken smaller, more productive teams when possible, higher-level programming languages and extensive scaffolding as the primary advice of the book.
