In the day-to-day work of a software developer, decisions have to be made every single day. Some have a minor impact and can be reverted at nearly no cost. Others have a significant impact on the project, and reverting them would be a fundamental change. I have found that, in most cases, not making a decision at all is a much better solution. A lot of time is wasted evaluating technology. Out of all the options available, there is a natural tendency to do everything possible to pick the cream of the crop, the one that will offer the most to the project and provide the largest number of features for future development. While the reasoning sounds valid, it’s an attempt to predict the future and will most likely be wrong.
Of course, the project you are working on is great and you truly believe it will be revolutionary. However, you’re not alone. Every day, thousands of other teams work on their own projects. Chances are they are not competitors but complements to yours, and one of them will likely make the package you spent so much time selecting obsolete before you’ve used all those advanced features.
Too often, I see projects fail to classify the type of decision that has to be made. Decisions are not all equal. Some deserve more time. In the end, it’s all about managing risks and contingencies. The very first step is to identify the real need and where the boundaries of your system lie. No one needs Sphinx. People need to search for content in their system. Sphinx is one option. You could also use Lucene, or even an external engine if your data is public. What matters when integrating in this case is how the data will be indexed and how searches will be performed. When trying out new technology, most will begin with a prototype, which then evolves into production code. At that point, a critical error has been made: your application has become dependent on the engine’s API.
If you begin by making clear that the objective is to index the content in your system, you can design boundaries that isolate the engine-specific aspects and leave a cohesive — neutral — language in your application.
Effectively, such a division frees you from having to choose between Sphinx, Lucene, or something else. You can implement whichever makes sense for you today and be certain that the changes required to move to something else will be localized. With the application logic that extracts the data to be indexed, and the logic that fetches and displays results, left independent of the engine, the decision-making step becomes irrelevant.
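As a rough illustration of such a boundary, here is a minimal Python sketch. Every name in it (`Document`, `SearchIndex`, `publish_article`, `find_articles`) is hypothetical and chosen for this example; none of it comes from Sphinx or Lucene. The point is only that the application code speaks in terms of documents and queries and never mentions an engine.

```python
from __future__ import annotations

from abc import ABC, abstractmethod
from dataclasses import dataclass


@dataclass(frozen=True)
class Document:
    """Neutral representation of indexable content (hypothetical fields)."""
    doc_id: str
    title: str
    body: str


class SearchIndex(ABC):
    """What the application needs from any search engine, in its own terms."""

    @abstractmethod
    def index(self, document: Document) -> None:
        """Make the document searchable."""

    @abstractmethod
    def search(self, query: str, limit: int = 10) -> list[str]:
        """Return the ids of matching documents."""


def publish_article(index: SearchIndex, doc_id: str, title: str, body: str) -> Document:
    """Application logic: build the neutral document and hand it to the index."""
    document = Document(doc_id=doc_id, title=title.strip(), body=body.strip())
    index.index(document)
    return document


def find_articles(index: SearchIndex, query: str) -> list[str]:
    """Application logic: normalise whitespace in the query, then delegate."""
    cleaned = " ".join(query.split())
    return index.search(cleaned) if cleaned else []
```

An adapter for a given engine would subclass `SearchIndex` and translate `Document` into whatever that engine expects. Moving to another engine then means writing another subclass, while `publish_article` and `find_articles` stay as they are.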
Certainly, there is some overhead. You need to convert the data to a neutral format and then to the engine’s format, rather than simply fetching what the API wants. Some look at the additional layer and see a performance loss. In most cases when integrating with other systems, the additional level of indirection really does not matter. You are about to call a remote system that performs a complex operation over a network stack. If it weren’t complex, you would have written it yourself.
A common pitfall is to create an abstraction that is too closely bound to the implementation rather than to the real needs of the system. The abstraction must speak your system’s language and completely hide the implementation; otherwise, the layer serves no purpose. It’s a good idea to look at multiple packages and see how they work conceptually when designing the abstraction. While you’re not going to implement all of them, looking at other options gives a different perspective and helps in adjusting the level of abstraction.
Once the abstraction is in place, the integration problem is off the critical path. You can implement the simplest solution, knowing that it won’t scale to the appropriate level down the road, because the simplest solution now lets you focus on more important aspects until the limit is reached. When it is, you will be able to reassess the situation and select a better option, knowing that changes will be localized to an adapter.
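To make "the simplest solution" concrete, here is one possible sketch in Python: a naive in-memory index that scans every document on each query. The names `Document`, `SearchIndex`, `InMemorySearchIndex`, and `build_search_index` are hypothetical and defined here so the block runs on its own. This is not how any real engine works, and it will clearly stop scaling at some point, which is exactly the trade-off described above.

```python
from __future__ import annotations

from dataclasses import dataclass
from typing import Protocol


@dataclass(frozen=True)
class Document:
    """Neutral representation of indexable content (hypothetical fields)."""
    doc_id: str
    title: str
    body: str


class SearchIndex(Protocol):
    """The boundary the application depends on (hypothetical)."""
    def index(self, document: Document) -> None: ...
    def search(self, query: str, limit: int = 10) -> list[str]: ...


class InMemorySearchIndex:
    """Simplest thing that works: keep lowercased text in a dict, scan linearly."""

    def __init__(self) -> None:
        self._texts: dict[str, str] = {}

    def index(self, document: Document) -> None:
        self._texts[document.doc_id] = f"{document.title}\n{document.body}".lower()

    def search(self, query: str, limit: int = 10) -> list[str]:
        terms = query.lower().split()
        if not terms:
            return []
        # A document matches when every term appears somewhere in its text.
        matches = [doc_id for doc_id, text in self._texts.items()
                   if all(term in text for term in terms)]
        return matches[:limit]


def build_search_index() -> SearchIndex:
    """The single place that picks an implementation; change it here later."""
    return InMemorySearchIndex()
```

When the linear scan becomes the bottleneck, the reassessment amounts to writing a second class with the same two methods and returning it from `build_search_index`.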
Abstracting away is a good design practice and it can be applied to almost any situation. It keeps your code clean and breaks dependencies on external systems that would otherwise make the environment hard to set up and the code harder to test. Because the code is isolated, it leaves room for experimentation with a safety net. If the chosen technology proves to be unstable or a poor performer, you can always switch to something else.
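The testability benefit can be sketched as follows, assuming a hypothetical boundary with `index` and `search` methods and a hypothetical application function `reindex_published`. The fake below stands in for the real engine, so the test needs no server, no network, and no environment setup.

```python
import unittest
from dataclasses import dataclass


@dataclass(frozen=True)
class Document:
    """Neutral representation of indexable content (hypothetical fields)."""
    doc_id: str
    title: str
    body: str


class FakeSearchIndex:
    """Test double: records what was indexed instead of calling an engine."""

    def __init__(self):
        self.indexed = []

    def index(self, document):
        self.indexed.append(document)

    def search(self, query, limit=10):
        hits = [d.doc_id for d in self.indexed if query.lower() in d.body.lower()]
        return hits[:limit]


def reindex_published(index, articles):
    """Application logic under test: index only the published articles."""
    count = 0
    for article in articles:
        if article["published"]:
            index.index(Document(article["id"], article["title"], article["body"]))
            count += 1
    return count


class ReindexPublishedTest(unittest.TestCase):
    def test_skips_drafts(self):
        fake = FakeSearchIndex()
        articles = [
            {"id": "1", "title": "Live", "body": "Hello", "published": True},
            {"id": "2", "title": "Draft", "body": "Hello", "published": False},
        ]
        self.assertEqual(reindex_published(fake, articles), 1)
        self.assertEqual([d.doc_id for d in fake.indexed], ["1"])
        self.assertEqual(fake.search("hello"), ["1"])
```

Saved in a file, this can be run with `python -m unittest`. The same test keeps passing whichever engine ends up behind the boundary in production, since it only exercises the application logic.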
While this works in most cases, it certainly does not work for some fundamental decisions, like the implementation language, unless you plan on writing your own language that compiles to other languages. Some abstractions just don’t make sense.
When you can’t defer decision making, stick with what you know. Sure, you might want to try out this new framework in the cool new language. The core of your project, if you expect it to live, is no place to experiment. I have been using PHP for nearly a decade now. I’ve learned all the subtleties of the language. It is a better choice for me. I’ve used the Zend Framework on a few projects and know my way around it well enough. It’s a good solution for me. Both together are a much safer path than Python/Django or any alternative, no matter what Slashdot may say.
It might not sound like a good thing to say, but experimenting as part of projects is important. You can’t test a technology well enough unless you do it as part of a real project, and a project is unlikely to be real unless it’s part of your job. It’s just important to isolate experiments to less critical aspects. It’s the responsible thing to do.
It’s all about risk management. Make sure every decision you make is either irrelevant, because it can be reverted at a low cost, or relies on technologies you trust based on past experience, and you will avoid bad surprises.