Radix Sort Revealed

KEYs are integers consisting of N digits. For our purpose, we may need to pad small KEYs on the left with zeros. For example, if our keys have at most 3 digits, a small key such as 12 will be regarded as consisting of the 3 digits: 0, 1 and 2 (as if it were written 012).

Radix Sort proceeds in N steps. The input to Step k is a list sorted according to criterion C(k-1), and its output is a list sorted according to criterion C(k), where C(k) is a stronger criterion than C(k-1). The final criterion C(N) is the usual order relation <=.

What is this criterion C(k)? Simply this: it is <= applied only to the k least significant digits.

Initially, k = 0 and the list is trivially ordered according to criterion C(0) because, when using no digit, all keys look the same. They are all equal according to criterion C(0).

The technical trick used by Radix Sort requires that the list L be represented, not as one single long list, but rather as an array of R buckets, where each bucket B(i) is a list. L corresponds to the concatenation of the B(i)'s.

Each bucket is a list ordered from smallest to largest (according to C(k)) - we call this Condition 1 - and everything in B(i+1) is larger than everything in B(i) - we call this Condition 2. Therefore, L = JOIN(B(1),...,B(R)) is ordered from smallest to largest according to criterion C(k).

Step k takes as input an array B of buckets ordered according to criterion C(k-1), and produces as output a new array B' of buckets ordered according to C(k). It achieves this result by inserting one element at a time into the output array B' in such a way that Condition 1 and Condition 2 are always preserved.

Here is the method: Step k processes each element of L in increasing C(k-1) order. This is achieved by processing input buckets from B(1) to B(R), and each bucket from front to back.

For each element: it uses the k-th least significant digit as an index and adds the element at the end of the corresponding output bucket. E.g. if the digit has value i, the element ends up at the end of bucket B'(i).

Thus every key in output bucket B'(i) has i in least significant digit position k. Consequently everything that is put in B'(i+1) is greater, according to C(k), than everything that is put in B'(i) (because the most significant digit used in the comparison is the one in position k and they are respectively i+1 and i). Thus Condition 2 is preserved.

By processing elements in increasing order according to C(k-1) and always inserting at the end of output buckets, we maintain the invariant that each output bucket remains sorted according to C(k-1). Why? Because every element that is inserted later in the output array is thus guaranteed to be larger, according to C(k-1), than everything that is already in the output array.

Now, notice that every element in output bucket B'(i) has i in least significant digit position k. Therefore, within B'(i), comparing keys according to C(k) is the same as comparing them according to C(k-1). Therefore, B'(i) is also sorted according to C(k). Thus Condition 1 is preserved.

We initialize the process by putting the original list, which, as we remarked, is trivially sorted according to C(0), into B(0). Thus Conditions 1 and 2 are satisfied by the initial array.

Finally, we concatenate the buckets returned by step N to obtain the complete sorted list: JOIN(B(1),...,B(R)).

Note:

The above discussion is best understood by assuming that R = 10 and that each digit lies in the range 0-9; in other words, that we use the digits of the keys expressed in base 10; but this doesn't have to be the case. We could use an arbitrary base (or radix) R, in which case we would use arrays of R buckets (because digits in base R range from 0 to R-1).

In C, base 256 is often used because the digits are then just the bytes used to represent the key, and bytes are easy to extract by regarding the key as an array of unsigned char.