Lecture 5 - Caching and Lazy Evaluation

2.1. Caching and Lazy Evaluation

First, we shall consider a very widely used method for speeding up computations that are done repeatedly, like computing the size of a list. To begin with, whenever we compute the value we store it - in the header. Then when we are asked for the value we do not have to recompute it, we can just retrieve it from the header. This is called caching, and it is a very common, very useful technique for speeding up a computation that is repeatedly requested.

Ideally, every operation that affects the cached value updates it, so that the value in the header is always correct. This is how the size of a collection is `computed'.

But this ideal situation is not always possible. We've seen that the SPLIT operation ruins this strategy, because when a list is split there is no way of deducing the respective sizes of the resulting lists. All we can do is compute one of them from scratch - the other one can then be derived from this. We could do that inside the SPLIT procedure, but then SPLIT wouldn't be a constant time operation. Of course, computing the sizes from scratch really is our only option. What we are trying to decide is when should this recomputation should be done?

The obvious answer is: recompute the value as soon as its cached value becomes invalid. In our example, this would mean recomputing size inside the SPLIT procedure. What is the defect in this obvious, simple strategy?

Answer: It leads to wasted computation, possibly a great deal of wasted computation. For example, suppose one SPLIT is immediately followed by another, we would have wasted our time computing the size. And in programs that recursively process lists, it is extremely common to SPLIT the list repeatedly until all the parts are size 1 or 2. This is N/2 SPLIT operations in a row - if each one recomputed size, an enormous amount of wasted computation would have been done.

So, when should we recompute size? The answer is: wait until we absolutely need the value. That way there is no wasted computation. We do what is necessary and nothing more. If we never need to know size, we'll never recompute it.

Our new strategy is to have a place in the header for the size, just like before. In addition, we have a Boolean variable that indicates whether or not the size is currently known.

: Show the type declarations used in the assignment.

If the size is currently known (Boolean = true), then we can use and update the stored value just like before. If the size is not currently known, then the operations cannot update it, and if we ever really need to know the value, we compute it from scratch, store it in the header, and change the Boolean to indicate that the size is now known. This strategy is called lazy evaluation - lazy because we only compute a value when we absolutely need to know it. The combination of lazy evaluation and caching is very useful - by using both, the computation of size is almost constant time.

How do we know when we absolutely need to know the value of size? The answer is: any time the SIZE function itself is called, it must return an answer. If we write our code so that all access to the size information is done through the SIZE function, then a call to SIZE is the ONLY time when we absolutely need to know the size value.