Skip lists

A skip list is a data structure for fast access to an ordered set of data. It can be used as an alternative to a binary search tree, and in fact can be viewed as a (not necessarily binary) tree structure. It is actually a sequence of lists, with the initial list containing all the data in sorted order. Each successive list is defined as a copy of the previous with some elements deleted, so each provides a coarser partition of the data, until the empty list is reached. The heads of the lists are linked together from the shortest to the initial list. Each element that was not deleted is also linked to the same element in the previous list. If the lists are viewed as horizontal, then vertical links connect the heads of the list and all the copies of the same element, with all vertical links going from the shortest list to the longest.

The empty list corresponds to the root of the tree, and the vertical links go to the leftmost child. The horizontal links are used to find the remaining children (if any).

A randomized skip list can be defined by deleting each element independently with probability 1/2. This makes the lengths of the vertical chains geometric random variables, and for a given chain, the probability that its height is at least k is 1/2^k. the probability that the height of the tallest chain is at most t is thus bounded by n/2^t, so with high probability the height of the skip list is O( lg n ).

The Find operation starts at the root, and at each step it follows one vertical link and zero or more horizontal links until it either finds the element it's looking for or stops just before a larger element. The expected number of horizontal links that Find follows at each level is at most 2, since it stops at the point when an element was deleted in the next level, and each element is deleted with probability 1/2. Thus the expected time for Find is at most twice the expected height, and is also O( lg n ). Note: Calculating the joint expectation of the product of two random variables is like calculating a double sum or a double integral. Just as in these cases, it may not be equal to the product of the individual expectations. Here, we can get a uniform bound of 2 on the inner expectation so the multiplication by 2 is valid, as it would be if the inner sum or integral always evaluated to (or was bounded by) a constant. An Insert can be performed by letting the height be zero with probability 1/2, one with probability 1/4, two with probability 1/8, etc. as for the other elements. If the result is larger than the current height new levels are created. In any case, intervals in previous levels are split in the order they would be encountered in a Find operation. Similarly, Delete involves merging intervals in the order they would be encountered in a Find operation (imagine searching for a nonexistent element adjacent to the one being inserted or deleted). In either case, an argument similar to the one used for Find shows that all three operations take time O(lg n ).

References:

Willam Pugh, Skip lists: a probabilistic alternative to balanced trees, Communications of the ACM, vol 33, no. 6, June 1990, pp. 668-76.

Randomized Algorithms, by Rajeev Motwani and Prabhakar Raghavan, Cambridge University Press, 1995