Formal proofs may be good after all
In the past few days, I spent a lot of time working on the Cross Lingual Wiki Engine Project. I finally found the solution to change tracking across multiple languages without restricting contributions. In fact, the core of it only took a few hours to write. Then I had to spend a few more to mine information out of the table. Most of the time I spent on the project recently was to write an article on how the thing works.
My primary purpose in writing the article in the first place was to explain the article to other people working with me on the project. I made a few attempts before writing it down and it didn’t work out so well. I figured I was better off trying to structure my mind before trying to communicate, and I know of no better way to do so than by writing. Since the article is part of an academic project, I figured I should give it an academic feel (and I don’t mean to make it boring). I started writing it in LaTeX using LyX. The tools are just amazing. Using them kicks me in flow in a matter of minutes. During my writing session at the Pub, I never noticed 5 hours had passed, sun had went down and the room became crowded.
Anyway, in giving it some academic feel, I went down to basic math concepts like sets and graphs. It’s quite nice because the theory maps really well to the problem. In fact, it was probably a strong influence. In all those years thinking about the problem, I followed multiple math courses and one of them was discreet maths, so I guess it oriented me towards the solution. It turned out just mapping the architecture to those concepts allowed me to explain even more details. Mathematics are a very expressive language and it has a set of solutions to many problems if you can express them correctly.
So for pages and pages, I go on explaining how it all works in math terms. Then comes the implementation section where I actually explain how it works using real technologies. Today, after the review of a first draft, I decided to add an explanation of an additional query, which happens to be quite central. I didn’t know that before. I went on to describe it and explain in which ways it is correct. That was until I realized it was wrong. Simply not accurate. False in all possible ways. I had the query rewritten twice already because I forgot some corner conditions the two first times. I was pretty certain it was good this time. Mapping it back to theory, I realized how wrong I was.
Then I went back to my code and made an attempt to catch that newly discovered corner condition. After expanding the query significantly (it had 4-5 levels of subqueries by that time), I realized that last level of nesting was really close to the real purpose. I went on and removed some code, and then more. The final expression is so simple. All I did before was run in the wrong direction.
I can’t be certain it’s right now, but at least it fits the model. My advice of the day: when working on a though data modeling problem, try to explain it in terms an academic would understand. Write the formulas and prove them. Draw diagrams using GraphViz or restrict your diagrams to 2 primitives. Write it in LaTeX to give it all an old-school academic look. The final document will be good, but nothing of value compared to what you will learn on the way.
I still need to make a few changes based on the last review, and will probably go for a second round of review, but that article will eventually be published somewhere. Stay tuned.