The operations are very simple and for most I only have a few minor comments to add to the description in the handout. Note that all preconditions are checked by the user. Also look carefully for the C-primed (C') notation: if there is an input, C, that is changed by an operation, this will be indicated by having an output named C'.
The allocation of storage depends on low-level implementation details; it turns out that sometimes (often, in fact) dynamically-allocated storage is needed even for an empty collection.
This is the inverse of CREATE.
Extremely frequently used. As mentioned above, we have included this as a primitive operation in order to ensure that it is computed in constant time.
Of course, it is possible to write this function in terms of the SIZE function: IS_EMPTY(C) <==> (SIZE(C) = 0). However, there are often ways of computing IS_EMPTY that are more efficient than computing SIZE, and we therefore include this as a separate operation.
A collection in which you are permitted to have multiple copies of a value is sometimes called a bag, or a multiset. This is the kind of collection we are going to implement, primarily because we are thinking ahead to lists: we certainly want to allow multiple occurrences of an element in our lists. In a list different occurrences of a value can be distinguished by their position - e.g. we talk of the first occurrence of a value - in a collection there is no way to distinguish between the different occurrences.
If V does not occur in C, the operation does nothing. If there are several copies of V in C, just one is deleted. If possible, the storage associated with the deleted occurrence of V is returned to the global pool.
JOIN does just what the name suggests. If C1 = { A, B, C } and C2 = { D, E }, then JOIN(C1,C2) changes C1 to { A, B, C, D, E }. What about C2? The options were discussed above, in the context of lists. As the postconditions state, C2 is undefined after the JOIN operation.
This is an extremely powerful operation, even though it is very simple. It goes through the collection one element at a time, and calls the procedure with each of the elements.
Suppose, for example, that C = { 9, 3, 1 } and P(X
,D)
= printf("%d\n",X)
then MAP(C,P,D) would call P
three times: P(9,D), P(3,D), and P(1,D). This would result in
the values in the collection being written out onto the
terminal, one value on each line. We cannot be sure what order
the values will get written out: the postconditions do not say
in what order the elements are `processed'.
The purpose of the extra data parameter D will be illustrated
in the following. It is always a pointer. By convention
it is assumed to be of type char*
, and it is the
responsibility of procedure P
to cast it to the
appropriate pointer type.
C allows procedures to be passed as parameters. Assuming collection
elements are of type int
, the above example could be
written:
typedef int collection_element; void print_element (collection_element *x,char *ignored) { printf("%d\n",*x); } void print_collection (collection c) { MAP(c,print_element,NULL); }Notice that the call to
MAP
supplies a NULL
pointer for the data parameter and that
print_element
ignores this argument. Here is a way of
using MAP to compute the size of a collection: for each element, we
add 1 to a counter. We need to pass a procedure to MAP that adds 1 to
a counter. The extra data parameter will be a pointer to this counter.
void increment_counter (collection_element *x,char *n) { * ((int *) n) += 1; } int size (collection c) { int n = 0; MAP(c,increment_counter,&n); return n; }There is no magic here. MAP calls the function
increment_counter
once with each element in the given
collection. increment_counter
increments, i.e.
adds one to, the counter pointed to by its second argument. Notice
that this pointer is received as a char*
and must be cast
to int*
.
Notice in the specification of MAP that procedure P's first parameter is a pointer to an element. This means that the procedure can modify the element itself. Therefore MAP can be used to change the elements in a collection.
For example, suppose we wanted to double every element in a collection. This is easily done, by mapping across the collection a procedure that doubles the value it is given:
void double (collection_element *x,char *ignored) { *x += *x; } void double_everything (collection c) { MAP(c,double,NULL); }As a final example, how could we write a function that takes a value, V, and a collection C, and counts the number of times V occurs in C? We need to compare V with each element of C, and add 1 to a counter if V is equal to the element. We'll write a procedure to do this, and pass this procedure to MAP - the fact that we will process the elements of C one by one is what tells us we can use MAP. The data parameter to this procedure must contain both the value V and the counter. Therefore we will implement it by a pointer to a structure with two fields: a collection element and an integer;
typedef struct { collection_element v; int n; } occurrence_data; void increment_if_equal (collection_element *x,char *data) { occurrence_data * odata = (occurrence_data *) data; if (*x == odata->v) odata->n++; } int number_of_occurrences (collection c,collection_element v) { occurrence_data data; data.v = v; data.n = 0; MAP(c,increment_if_equal,(char *) &data); return data->n; }
Before moving on to the specification of lists, let me briefly explain a few details of our initial implementation of Collection, which you will be completing and using in the next few assignments.
The main things that need to be explained are the data type definitions:
typedef int collection_element; typedef struct collection_node { collection_element value; struct collection_node * next; } collection_node; typedef struct { int size; collection_node *first,*last; } collection_header; typedef collection_header *collection;For this assignment, each element in a collection will be an integer. Each element will be stored in a node. The memory needed for a node will be dynamically allocated... you are given procedures called
get_node_memory
and
return_node_memory
for allocating and releasing this
memory.
The nodes will be linked together by pointers to form a chain. The
link/pointer is stored inside the node in the field next
.
This, of course, imposes an order on the nodes, but this order is
completely arbitrary - it can be whatever you like (whatever is
easiest to implement and most efficient to execute). You get this
freedom because by definition, the elements in a collection are
unordered: The ordering on nodes is just an implementation trick
which the user has no say about (as far as he is concerned there is
no order).
A collection is just a pointer to a record called a header. Headers are a very useful idea - we'll discuss them further below. For now, all you need to know is that a collection is a pointer to a header, which is a record containing three pieces of information:
size
, the total number of elements currently in the
collection.
first
, a pointer to the first node in the chain of
nodes.
last
, a pointer to the last node in the chain of
nodes.
size
will be 0, and
first
and last
will be NULL
(i.e. the distinguished pointer to nothing). Note that every
operation that changes the size of a collection must update the
size
field in this record; otherwise the value stored
there will be wrong. The memory for headers is allocated dynamically -
you are given procedures called get_header_memory
and
return_header_memory
for allocating and releasing this
memory.
To give a concrete example: the collection C={ 9, 1, 5 } would be represented:
header_count
and
node_count
memory business.