First, we shall consider a very widely used method for speeding up computations that are done repeatedly, like computing the size of a list. To begin with, whenever we compute the value we store it - in the header. Then when we are asked for the value we do not have to recompute it, we can just retrieve it from the header. This is called caching, and it is a very common, very useful technique for speeding up a computation that is repeatedly requested.
Ideally, every operation that affects the cached value updates it, so that the value in the header is always correct. This is how the size of a collection is `computed'.
But this ideal situation is not always possible. We've seen that the SPLIT operation ruins this strategy, because when a list is split there is no way of deducing the respective sizes of the resulting lists. All we can do is compute one of them from scratch - the other one can then be derived from this. We could do that inside the SPLIT procedure, but then SPLIT wouldn't be a constant time operation. Of course, computing the sizes from scratch really is our only option. What we are trying to decide is when should this recomputation should be done?
The obvious answer is: recompute the value as soon as its cached
value becomes invalid. In our example, this would mean recomputing
size
inside the SPLIT procedure. What is the defect in
this obvious, simple strategy?
Answer: It leads to wasted computation, possibly a great
deal of wasted computation. For example, suppose one SPLIT is
immediately followed by another, we would have wasted our time
computing the size. And in programs that recursively process lists, it
is extremely common to SPLIT the list repeatedly until all the parts
are size 1 or 2. This is N/2 SPLIT operations in a row - if each one
recomputed size
, an enormous amount of wasted computation
would have been done.
So, when should we recompute size
? The answer is: wait
until we absolutely need the value. That way there is no
wasted computation. We do what is necessary and nothing more. If we
never need to know size
, we'll never recompute it.
Our new strategy is to have a place in the header for the size, just like before. In addition, we have a Boolean variable that indicates whether or not the size is currently known.
size
is almost
constant time.
How do we know when we absolutely need to know the value
of size
? The answer is: any time the SIZE function itself
is called, it must return an answer. If we write our code so
that all access to the size information is done through the SIZE
function, then a call to SIZE is the ONLY time when we absolutely need
to know the size value.