What is a binary search algorithm

14. Elementary search methods

was derived. The center of the interval is calculated by adding half the length of the interval to the left endpoint. The interpolation search simply boils down to replacing the value 1/2 in this formula with an estimate of where the key might be within the available values: 1/2 would be appropriate if v is in the middle of the interval of a [l] .key to a [r] .key would lie, but is x: = l + (v-a [l] .key) * (r-l)div(a [r] .key-a [l] .key) a better choice. Of course, this assumes an equal distribution of the key values.

Let's assume in our example that the i-th letter of the alphabet by number i is pictured. Then, when searching for M, the first position considered in table 9 would be, since 1 + (13 - 1) * (17 - 1) / (24 - 1) = 9.3 ... This search is completed in just three steps as shown in Figure 14.4. Other search keys are found even more efficiently: For example, the first and the last element are found in the first step. Figure 14.5 shows the interpolation search for the 95 element file of Figure 14.2; it only uses four comparisons, while binary search required seven.

Property 14.4Interpolation search requires less than lg lg N + 1 comparisons for both successful and unsuccessful searches in files with random keys.

Proof of this fact is well beyond the scope of this book. This function is a very slowly growing function that, for practical purposes, can be viewed as a constant: if N is one billion, is lg lg N <5. Thus, each data set can be found using only a few accesses (on average), which is a significant improvement over the conventional method of binary search.

The interpolation search, however, is very much dependent on the assumption that the keys are very well distributed over the interval. It can be "tricked" by keys with a very uneven distribution, which is common in practice. In addition, the method requires some calculations: For small N come like lg N cautious cost of simple binary search lg lg N quite close, so that the effort of interpolating should probably not be worthwhile. On the other hand, the interpolation search should certainly be considered for large files, for applications where comparisons are particularly costly, or for external methods where access costs are high.