3.3. a few remarks on our implementation of Binary Trees

The decision about how to implement trees is extremely similar to the decision about how to implement lists. The main issues are:

efficiency:
Which operations are to be constant time?

memory sharing:
What is the effect of the assignment statements T1 = T2? Do you get a copy or do you unify the two things? If you subsequently change one of them, what happens to the other? Similarly for JOIN.

destroy:
If we destroy a tree/subtree what happens to the tree that contains it and the subtrees it contains?
We could use exactly the same implementation strategy we did for lists. In fact, to change our list implementation into a tree implementation we don't have to do much more than add a second `next' pointer to each node (we need one for `left' and one for `right').

However, in the interest of exploring alternative possibilities the actual implementation you'll be working with does differ in some fundamental ways from the list implementation you've been working with.

Efficiency

All operations will be constant time (except DESTROY which, as always, cannot avoid being linear time, since it must return the node memory one node at a time). In order for GET_TREE(T,up) and WHICH_SIDE(T) to be constant time, we need a back pointer from a child to its parent. The SIZE information that we were so concerned to compute in constant time for lists is of no concern for trees - we didn't even include a SIZE operation in the specification.

Memory Sharing

The specification has been carefully arranged so that we could have the same ``absolutely no sharing'' policy that we had with lists. Recall in lists that two lists only shared memory if they were absolutely identical. This was achieved by having a list header, so that L1 = L2 meant that L1 and L2 pointed to the same header ... i.e. all the memory that contained any of the information about the list was identical whether one used L1 or L2 to access it. We could use the same header structure for trees ... with virtually no change.

But we will not do the same thing we did for lists. In order for you to see what is good, and what is bad, about not having headers for data structures we will implement trees without headers.

In our implementation a tree is a pointer to its root node. An empty tree is represented by a NULL pointer. This has one significant drawback: we cannot `side-effect' an empty tree like we could side-effect an empty list.

Consider insert(l,e): l is a pointer to a header. If l is empty, its header's first field contains a NULL pointer; after insertion, it contains a pointer to the first node in the list and that node contains element e.

Now, if a list was implemented by a pointer to its first node, and an empty list by a NULL pointer, insert would not be able to operate on an empty list by means of side-effects. Consider the function definition:

        void insert (list l,element e) { ... }
And suppose it is called on an empty list:
        list l = (list) NULL;
        insert(l,e);
insert(NULL,e) allocates a new node to hold e, but what is it supposed to do with the pointer to that node? In order to modify the original list l, we could pass a pointer to l instead of l itself:
        void insert (list *l,element e) { ... }
but this complicates the calling sequence insert(&l,e) since we need to pass the address of l rather than l itself and has other undesirable consequences.

Instead, we will insist that the modified list be returned as the value of the call. Thus, a typical calling sequence would be: l = insert(l,e), and the function definition would be of the form:

        list insert (list l,element e) { ... }

Our implementation of trees will follow this latter approach. In particular, the JOIN operation will be required to return the result of joining its two argument trees.

Destroy

If we destroy a subtree we will first prune it from the tree that contains it. In this way, the subtree is safely deleted from the larger tree. Note that this is very different than what we did when we destroyed an window for lists: destroying a window had no effect on the list, or even the node the window was on. We could have done the same here, destroy a subtree could leave the larger tree alone. But we will have it do more.

Now what happens to the subtrees that are contained in the tree we destroy? Well, the memory for the nodes in our tree will be returned, so all the subtrees (which are really just pointers to these nodes) will become invalid. Without having headers, there is no way for us to access these subtrees and set them to NULL. We have no choice, without headers we can do nothing. The subtrees will just become invalid pointers.

The C data type definitions

        typedef enum {Left,Right,Up} direction;
        typedef int element;
        typedef struct node {
          element value;
          struct node *left,*right,*up;
        } node;
        typedef node *binary_tree;
As you can see each binary tree contains a root value as well as pointers to all its immediate `relatives', its left and right subtrees, and its parent tree. We have defined binary trees in general, and for the next couple of lectures we are going to look at special kinds of binary trees: binary trees that have specific properties that make certain kinds of processing very efficient. The first special kind of binary tree we will look at is Binary Search Trees (section 8.2). These are binary trees with specific properties that make it very efficient to search for a value in the tree.