4. Node And Arc Diagrams

In designing dynamically allocated data structures and the algorithms that operate upon them, it is not a good idea to think directly in terms of code in a particular programming language. It is much better to use a pictorial representation, which I call node-and-arc diagrams.

In node-and-arc diagrams, there are three types of things:

circular nodes, which represent dynamically allocated memory;
arcs (or arrows), which represent pointers;
boxes, which represent statically allocated memory: these always have a symbolic name beside them.

For example, if you have a program with the declarations:

        float X ;    /* X is a real */
        float *P ;   /* P is a Pointer to a real */

The compiler allocates (statically) space for these variables. In my diagrams, this would produce two boxes, one for X and one for P.

The question marks indicate that these memory locations contain undefined values. Of course, we can assign values to these variables in the usual way:

        X = 3.9;   P = NULL;

We use the function GET_MEMORY to dynamically allocate memory: it allocates a memory cell and returns the cell's address, which we can store in P, e.g. P = GET_MEMORY(). In the pictures we could show the result like this:

But this does not highlight the intimate relation between P and the newly allocated memory. So instead we draw an arrow from P to the new memory, and don't bother with the exact address at all.

The distinction between P and (*P) is now very clear. P, as always, refers to the statically allocated box, *P to the memory the arrow points to. I think of * as meaning ``follow the arrow''. The `name' of the new memory is (*P); to set a value into the new memory we assign a value to this name.

        (*P) = 4.2;
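
In standard C, the steps so far might be sketched as below. GET_MEMORY is not a standard library function, so the version here is only a hypothetical stand-in built on malloc, assumed to hand out a cell big enough for one float. The function name allocate_and_store is likewise just for illustration.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical stand-in for GET_MEMORY (an assumption, not the notes'
   actual definition): allocates one float-sized cell, or returns NULL
   if no memory is available. */
float *GET_MEMORY(void)
{
    return malloc(sizeof(float));
}

/* Walks through the steps in the text and returns the value found in
   the new cell, or -1 if the allocation failed. */
float allocate_and_store(void)
{
    float *P;        /* a box named P; its contents are undefined here */
    float stored;

    P = NULL;        /* the box now holds a known value: no arrow */

    P = GET_MEMORY();    /* the box now holds an arrow to a new node */
    if (P == NULL)
        return -1.0f;

    (*P) = 4.2f;     /* follow the arrow and store into the node */
    stored = (*P);   /* follow the arrow again to read it back */

    free(P);         /* tidy up so the sketch itself loses nothing */
    return stored;
}
```

Note that P = NULL changes the box, while (*P) = 4.2f changes the node at the end of the arrow.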

If at this stage we do P = NULL, what is the result? Well, this means exactly what it says: store the value NULL in the memory location P.

Can anyone see a problem with this? The problem is that the dynamically allocated memory is `lost' or `dangling'. We have no way to access it; we cannot even return it to the global pool. The reason is that we have no name for it, no way to refer to it. *P was the only name we had for it, and now that P has been changed, that name is gone. In the diagram, a `lost' memory cell is very obvious - it has nothing pointing to it. Of course, P = NULL is not the only thing that would have caused this problem: any change to P would have done it.
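
One way to watch a cell get lost is to count the cells that are currently handed out. The sketch below assumes hypothetical GET_MEMORY and RETURN_MEMORY stand-ins built on malloc and free, plus a counter live_cells that is my own addition. The extra pointer named backup exists only so that the demonstration can clean up after itself; in the situation described above there is no such pointer, which is exactly the problem.

```c
#include <assert.h>
#include <stdlib.h>

int live_cells = 0;   /* cells handed out and not yet returned (illustrative) */

/* Hypothetical stand-ins for the notes' memory routines. */
float *GET_MEMORY(void)
{
    float *cell = malloc(sizeof(float));
    if (cell != NULL)
        live_cells++;
    return cell;
}

void RETURN_MEMORY(float *cell)
{
    if (cell != NULL) {
        free(cell);
        live_cells--;
    }
}

/* Returns how many cells are still allocated after P has been
   overwritten with NULL, or -1 if the allocation failed. */
int lose_a_cell(void)
{
    int before = live_cells;
    float *P = GET_MEMORY();
    float *backup = P;    /* demo-only: lets us return the cell later */
    int still_allocated;

    if (P == NULL)
        return -1;

    (*P) = 4.2f;
    P = NULL;             /* the arrow is gone, but the node is not */

    still_allocated = live_cells - before;   /* the cell still exists */

    RETURN_MEMORY(backup);   /* impossible without the backup pointer */
    return still_allocated;
}
```

The count is still 1 after P = NULL: the cell has not gone anywhere, we have merely thrown away our only name for it.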

The way we get rid of unwanted memory is with the procedure RETURN_MEMORY. If we were in the previous state:

calling RETURN_MEMORY with P as a parameter would produce just what we want:

Note that the value of P is now undefined. It is probably the case that P still contains the address of the cell it used to point to, e.g. 1908, but it would be an error to try to dereference this value.
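
A minimal sketch of returning memory, assuming RETURN_MEMORY behaves like the standard free (the definitions below are hypothetical stand-ins). Setting P to NULL straight afterwards is a common defensive habit of C programmers rather than anything these notes require; it simply puts a known value back in the box.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical stand-ins for the notes' memory routines. */
float *GET_MEMORY(void)
{
    return malloc(sizeof(float));
}

void RETURN_MEMORY(float *cell)
{
    free(cell);
}

/* Allocates a cell, uses it, and returns it to the pool.
   Returns 1 if P ends up holding NULL, or -1 if allocation failed. */
int release_cell(void)
{
    float *P = GET_MEMORY();
    if (P == NULL)
        return -1;

    (*P) = 4.2f;

    RETURN_MEMORY(P);   /* the node is gone; P's contents are now undefined */
    /* Writing (*P) = 1.0f; at this point would be an error. */

    P = NULL;           /* optional: give the box a known value again */
    return P == NULL;
}
```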

Suppose now that we are in the state:

There is no harm in creating a second pointer to the dynamically allocated memory. If we had declared float *Q; we'd have:

And we can perfectly legitimately say Q = P. What does this do? Is any new memory allocated? No! This copies the value that is stored in P into Q.
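
The claim that Q = P allocates nothing can be checked by counting allocations. This is only a sketch: GET_MEMORY and RETURN_MEMORY are hypothetical stand-ins built on malloc and free, and the counter cells_allocated is my own addition for the purpose of the check.

```c
#include <assert.h>
#include <stdlib.h>

int cells_allocated = 0;   /* how many cells GET_MEMORY has handed out */

/* Hypothetical stand-ins for the notes' memory routines. */
float *GET_MEMORY(void)
{
    float *cell = malloc(sizeof(float));
    if (cell != NULL)
        cells_allocated++;
    return cell;
}

void RETURN_MEMORY(float *cell)
{
    free(cell);
}

/* Returns 1 if, after Q = P, both boxes hold the same address and
   only one cell was allocated; 0 otherwise; -1 if allocation failed. */
int copy_pointer(void)
{
    int before = cells_allocated;
    float *P = GET_MEMORY();
    float *Q;
    int ok;

    if (P == NULL)
        return -1;
    (*P) = 4.2f;

    Q = P;   /* copies the address held in P; allocates nothing */

    ok = (Q == P) && (cells_allocated == before + 1);

    RETURN_MEMORY(P);   /* one cell, so one return; Q is now invalid too */
    return ok;
}
```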

Remember, the value stored in P is an address; a copy of that address is placed in Q. In the diagrams we use arrows to show addresses, so the result of Q = P is:

What we have now are two names for the same memory location: (*P) and (*Q) refer to exactly the same memory cell. Therefore, when we change that cell using one of the names, e.g. (*P) = 2.5; the result is:

You can see that the value `in' (*Q) has also changed. We can test for pointer equality in the usual way: P==Q asks whether P and Q are identical, i.e. whether the same address is stored in both boxes. In the diagrams this question means: are P and Q pointing at exactly the same place? The answer in the above picture is yes. What about in this picture:

P==Q is false. P and Q do not point at the same memory cell. The cells they point to happen to contain the same value, but that is not what P==Q tests. Because they point at different places, if we change one of these places, the other one is unaffected.

For example, if we say (*Q) = 3.7; we get:

How can we test if the memory cells they point to contain the same value? Well, the name for the cell P points to is (*P) and the name of the cell Q points to is (*Q), so we can just test (*P)==(*Q).
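
The two kinds of test can be put side by side. In this sketch P and Q each get their own cell, so the addresses always differ, while the contents may or may not agree. The struct comparison and the function compare_cells are names I have made up for illustration, and GET_MEMORY and RETURN_MEMORY are again hypothetical stand-ins built on malloc and free.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical stand-ins for the notes' memory routines. */
float *GET_MEMORY(void)
{
    return malloc(sizeof(float));
}

void RETURN_MEMORY(float *cell)
{
    free(cell);   /* free(NULL) is harmless */
}

struct comparison {
    int same_address;   /* result of P == Q */
    int same_value;     /* result of (*P) == (*Q) */
};

/* Puts p_value and q_value into two separate cells and reports both
   tests. Both fields are -1 if an allocation failed. */
struct comparison compare_cells(float p_value, float q_value)
{
    struct comparison r = { -1, -1 };
    float *P = GET_MEMORY();
    float *Q = GET_MEMORY();

    if (P != NULL && Q != NULL) {
        (*P) = p_value;
        (*Q) = q_value;
        r.same_address = (P == Q);         /* compares the two boxes */
        r.same_value = ((*P) == (*Q));     /* compares the two nodes */
    }

    RETURN_MEMORY(P);
    RETURN_MEMORY(Q);
    return r;
}
```

Calling compare_cells(2.5f, 2.5f) gives different addresses but equal values; calling compare_cells(2.5f, 3.7f) gives different addresses and different values.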

It is very important to always remember the difference between P, the box, and (*P), the node that P points to. For example, in the preceding picture what is the difference between P = Q and (*P) = (*Q)?

Answer: P = Q produces:

Note that we have some `lost' memory now. By contrast, (*P) = (*Q) produces:
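
The difference between the two assignments can also be checked in code. This is a sketch under the same assumptions as before: GET_MEMORY and RETURN_MEMORY are hypothetical stand-ins built on malloc and free, the starting values 2.5 and 3.7 are merely illustrative, and the pointer old_P exists only so that the demonstration can return the cell that P = Q would otherwise lose.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical stand-ins for the notes' memory routines. */
float *GET_MEMORY(void)
{
    return malloc(sizeof(float));
}

void RETURN_MEMORY(float *cell)
{
    free(cell);   /* free(NULL) is harmless */
}

/* P = Q moves P's arrow. Returns 1 if P and Q end up pointing at the
   same cell, or -1 if an allocation failed. */
int assign_pointer(void)
{
    float *P = GET_MEMORY();
    float *Q = GET_MEMORY();
    float *old_P = P;   /* demo-only: remembers the cell P is about to lose */
    int same = -1;

    if (P != NULL && Q != NULL) {
        (*P) = 2.5f;
        (*Q) = 3.7f;
        P = Q;              /* P's old cell now has nothing pointing at it */
        same = (P == Q);
    }

    RETURN_MEMORY(old_P);
    RETURN_MEMORY(Q);
    return same;
}

/* (*P) = (*Q) copies the contents and leaves both arrows alone.
   Returns 1 if the cells are still distinct but hold equal values,
   or -1 if an allocation failed. */
int assign_contents(void)
{
    float *P = GET_MEMORY();
    float *Q = GET_MEMORY();
    int ok = -1;

    if (P != NULL && Q != NULL) {
        (*P) = 2.5f;
        (*Q) = 3.7f;
        (*P) = (*Q);        /* only the node that P points to changes */
        ok = (P != Q) && ((*P) == (*Q));
    }

    RETURN_MEMORY(P);
    RETURN_MEMORY(Q);
    return ok;
}
```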