Dealing with data

I have often had to convert data from one format to another. Most of the time, the task is trivial and only requires a single script. The goal might be to migrate data from an old system to a new one, to allow different tools to perform analysis on it, or simply to import data from an external entity. In most cases, errors can have huge implications. Since conversions don’t usually run very often, it’s usually wise to spend a few CPU cycles making sure the incoming data is right and the output makes sense.
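As a rough illustration of spending those few cycles, here is a minimal sketch in Python. It assumes a CSV input with hypothetical `id` and `created` columns and a legacy DD/MM/YYYY date format. None of these names come from a real system, and `convert_row` and `convert` are made up for the example.

```python
import csv
import io
from datetime import datetime


def convert_row(row):
    """Validate one input record and return it in the new format."""
    raw_id = (row.get("id") or "").strip()
    if not raw_id.isdecimal():
        raise ValueError(f"id is not numeric: {raw_id!r}")
    raw_date = (row.get("created") or "").strip()
    # Assumed legacy format DD/MM/YYYY; strptime raises ValueError otherwise.
    created = datetime.strptime(raw_date, "%d/%m/%Y").date()
    return {"id": int(raw_id), "created": created.isoformat()}


def convert(source):
    """Convert every row, collecting errors instead of stopping at the first."""
    converted, errors, seen_ids = [], [], set()
    reader = csv.DictReader(source)
    for row in reader:
        try:
            record = convert_row(row)
            # Check the output too: the same id must not appear twice.
            if record["id"] in seen_ids:
                raise ValueError(f"duplicate id: {record['id']}")
        except ValueError as exc:
            # reader.line_num is the line of the source file just read.
            errors.append((reader.line_num, str(exc)))
            continue
        seen_ids.add(record["id"])
        converted.append(record)
    return converted, errors


sample = io.StringIO("id,created\n1,31/12/2009\nabc,01/01/2010\n")
converted, errors = convert(sample)
```

Collecting the errors with their line numbers, rather than aborting on the first bad row, makes it possible to hand the whole list back to whoever owns the source data in one pass.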

In a perfect world, you decide what the input format is and what the content is. In the real world, that’s not always possible because of limitations in the exporting application. In the worst case, information will be missing or inaccessible because of bad references. It’s not uncommon to see data scattered across multiple applications, and some tasks might require access to several sources. Accessing multiple databases or gathering information from multiple files should not be a technical problem. The problems come from inconsistencies in the data. I have seen systems where the same entry is not referred to by the same number in different systems, or where the numbering conventions simply changed over time and were never documented. In those cases, the task was actually to make the user’s work more efficient when accessing the various systems.
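When the same entry is numbered differently in two systems, one possible approach is to reduce every reference to a canonical form and only match the unambiguous ones. The sketch below is in Python and assumes purely hypothetical conventions (an optional `INV-` prefix, leading zeros, mixed case). Real conventions have to be discovered from the data or from the users.

```python
import re


def normalize_ref(raw):
    """Reduce a reference number to a canonical form, or None if unrecognized.

    Hypothetical conventions: optional 'INV-' prefix, leading zeros, mixed case.
    """
    digits = re.sub(r"^inv-?", "", raw.strip().lower())
    if not re.fullmatch(r"[0-9]+", digits):
        return None
    return str(int(digits))


def cross_reference(system_a, system_b):
    """Match references from system A to system B.

    Returns (matched, unresolved): matched maps an A reference to its single
    B counterpart; unresolved lists A references with zero or several
    candidates, which need a human decision.
    """
    index_b = {}
    for ref in system_b:
        key = normalize_ref(ref)
        if key is not None:
            index_b.setdefault(key, []).append(ref)

    matched, unresolved = {}, []
    for ref in system_a:
        candidates = index_b.get(normalize_ref(ref), [])
        if len(candidates) == 1:
            matched[ref] = candidates[0]
        else:
            unresolved.append(ref)
    return matched, unresolved
```

Anything that lands in the unresolved list is deliberately left alone: guessing at an ambiguous reference is how a conversion silently corrupts data.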

In those cases where the conversion can’t be 100% automated due to a lack of information, user input is required to perform parts of the conversion and resolve ambiguities. Users hate it, and users usually hate developers for it. The only way to actually make it better is to make the interface as friendly as possible. Reducing the number of clicks and the amount of thinking required is a must. Using color highlights, suggesting possibilities and setting the most likely option as the default value usually helps the process (in the case of a web interface, JavaScript can be added for validation or common tasks). Sitting down with an expert user for a few hours to figure out what the common rules are and coding them into the system can work miracles. When 100% automation can’t be reached, the goal should remain to get as close to it as possible.
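Suggesting possibilities and preselecting the most likely one can be sketched with Python’s standard `difflib` module. This is only one way to rank candidates, and the `suggest` helper and its cutoff value are assumptions for the example, not a recommendation for every data set.

```python
import difflib


def suggest(value, known_values, limit=3):
    """Return (default, suggestions) for an ambiguous input value.

    suggestions holds up to `limit` known values, best match first;
    default is the best match, or None when nothing is close enough.
    """
    # Compare case-insensitively but return the values as originally spelled.
    lookup = {known.lower(): known for known in known_values}
    close = difflib.get_close_matches(
        value.strip().lower(), list(lookup), n=limit, cutoff=0.6
    )
    suggestions = [lookup[match] for match in close]
    default = suggestions[0] if suggestions else None
    return default, suggestions


default, suggestions = suggest("Montral", ["Montreal", "Toronto", "Vancouver"])
```

In the interface, `default` would be the preselected option and `suggestions` the short list shown to the user, so the common case costs a single confirmation click.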

A good trick to prevent further problems is to write good user documentation (yet another task most developers hate). Documentation is the only protection a developer can have against user errors and fake bug reports. The documentation shouldn’t only explain the different interface details of the application, but also the scope of the application (what it can do, what it can’t do and why it exists). A simple RTFM will then solve most problems.
