A good, well tested binary search is a pain in the ass to write...
1. Your sorted array MUST use exactly the same ordering as the search is expecting. We built and sorted the array on a PC, and ran the search on the device, so this was not guaranteed.
2. There can be NO DUPLICATE KEYS.. Duplicate keys will eventually send the search down a blind alley and off into hyperspace. Also devilishly difficult to find if the array has a lot of items, and you don't know what the problem is.
3. The above 2 problems can sneak in during production with new data.
In a well-implemented binary search duplicated keys are not a problem. You can even tune your code to return the first, the last, or a "random" one.
The point about the sort order matching the search order is more serious.
I've written all sorts of fundamental algorithms for embedded systems and I only really started to get good when I wrote the spec, wrote the proof, then derived the code. This was done informally, but a background in Pure Math let me do it without too much effort.
And it was worth it. Code developed that way tended to run first time with no surprises. After a while we increased the number of things we needed to prove about the code, and reliability was solid.
It seems the world of programming divides into several sub-classes, including (but not limited to):
+ Those who program web sites and databases
+ Those who program numerical systems and simulations
+ Those who program embedded devices and fundamental algorithms.
These are broad, and there are no doubt many more, but the skills required in "programming" seem to be dividing and become more differentiated.
Yeah, the duplicate key business came in because of other code hanging off the side of the search (a rather hairy algorithm that was searching for the next key with a different letter in a position -- given KLXX, increment the 2nd position until there's a key matching that criteria -- KLXX --> KMAA, or if there are no KM's, KN, KO, etc....
Yeah, I was just bitten the sort order one a few days ago; the key was 12octet-long unsigned integer, I combined 64bit comparison + 32bit comparison, and gotten the endian mixed up.
I used the same compare function to generate the sorted table, and also testing. So everything seemed to work fine in my unit, but once I took externally generated data...
My unit also involves b+tree and somewhat complicated merge sorting, so I first suspected they were broken and wasted several hours hunting down the wrong path. Indeed this article is good to remind me that I'm in 90% of folks.
A good, well tested binary search is a pain in the ass to write...
1. Your sorted array MUST use exactly the same ordering as the search is expecting. We built and sorted the array on a PC, and ran the search on the device, so this was not guaranteed.
2. There can be NO DUPLICATE KEYS.. Duplicate keys will eventually send the search down a blind alley and off into hyperspace. Also devilishly difficult to find if the array has a lot of items, and you don't know what the problem is.
3. The above 2 problems can sneak in during production with new data.