Nested sets * Large amount of data = Need more RAM


When I say more, I mean a lot more. A terabyte would probably do it. I got to play with nested sets quite a lot recently and they are really amazing. The relations you can pull out of them are simply incredible. For those who have no idea what a nested set is, I highly recommand reading Mike Hillyer’s article on the topic.

I was working with faily large sets, little over 100 nodes. Queries were so fast I could barely notice the page reloading, and those queries were using an average of 10 joins. The problems really came when I got this great idea of importing a truck’s (yes, it’s large) spare part list into the nested set. Going from hundreds to thousands of nodes really makes nested sets SLOW. A simple listing with depth value would take around 40 seconds, and that’s only a 2 table join.

I made an attempt to run one of the more complex queries overnight to see how long it actually took to process. I never found out. When I got back to work this morning, the computer’s fans were still at full speed, leaving an loud noise in the room. The display on the monitor was gone crazy. The display problems were due to the connector being loose. My guess was that all those vibrations caused some trouble. I double checked the indexes, ran EXPLAIN to verify if all the conditions were processed correctly. Everything was in order. I then made some quick calculations to find out that the joins generated a few billion rows, each containing quite a few columns. At that point, I realized I had to find an other solution.

Luckly enough, I didn’t really need the hierarchical data about the components.

Leave a Reply

Your email address will not be published. Required fields are marked *