The k-Colorable Graph Generator Manual

The hope is that these will aid in identifying features that can be used to compare and distinguish various coloring algorithms, as well as providing a means of studying those settings which make for particularly hard classes of graphs to color with a specified coloring.

The program is initiated by

generator filenameand output is to

The user is prompted for an input to specify which file type is desired.

c cheat 100 10 cx 10 11 1 1 0 1 3 2 6 3 cx 5 11 7 6 9 7 14 1 8 9 .........The cheat is a set of comment lines, the first using the keyword "cheat", followed by the number of vertices and the number of colors per line in the following portion. Each cx-line contains a sequence of colors from the range[0..k-1], which if applied to the vertices in order 1..n, will be a legal k-coloring. This may be used when present to verify that indeed a k-coloring exists. (The last cx-line may have fewer colors.)

We use this cheat in our coloring program to compare the coloring(s) we
find with the hidden coloring. See for example Lawrence Hubert and Phipps
Arabie, *Comparing Partitions*, Journal of Classification 2, 193-218
(1985).

The cheat option is the last requested input from the generator program.

As its second input, the program prompts for a random number seed.

1 No hidden coloring 2 Equi-partitioned 3 k-colorable 4 k-colorable(smooth) 5 k-colorable with delta variation Alternate graph: 6 Flatgraph

For classes tested so far, and using a suite of various algorithms,
it appears that these present the most difficulty in obtaining the specified
coloring, or good approximations to it.

(See
Exploring the k-colorable Landscape with Iterated Greedy)

For variability greater than 0,
the class is also called the *highly variable k-colorable graphs*.
For each vertex in turn, the generator
chooses an integer *h* at random from the range *[0,delta]*.
It then assigns the vertex to the partition number *i*
chosen at random from the range *[h,k-1]*.
(We use colors *[0...k-1]*).

Classes 3, 4 and 5 seem to be easier to approximately color the larger is the amount of the variation in size that is allowed.

After selecting the color class, the user will be prompted for the number of vertices in the graph, and the number of colors, and if one of the highly variable classes the variation control parameter.

The original Flat Graph Generator together with a description and discussion can be found here. The hidden coloring used in the flat graphs is equi-partite.

1 IID (independent random edge assignment) 2 Girth and Degree Inhibited 3 Geometric 4 Weight Biased Graph (encourages or inhibits cliques) 5 Clique driven 6 Cycle driven

The input *g* indicates that no
cycle is to be created in the graph of size less than *g*. To
this end, whenever an edge *(v,w)* is about to be added to the
graph, every pair of vertices *x,y* which will have distance less
than *g* after the addition is marked as ``blocked''.
*(x,y)* will then never be selected as a candidate edge. Our
algorithm for doing this is not very efficient, and so the cost of
generating these graphs goes up sharply with *g* and *n*.

*p* is the probability that a pair selected as a candidate edge
will be used as an edge. Since limiting the girth already drastically
limits the number of edges in a graph, most of our experiments so far
use *p=1.0*.

*Max delta* and *delta*
are limits set on the allowed variation in the degree
sequence. *Max delta* is a hard limit
on the allowed difference between
the average degree over all vertices and the maximum degree of any vertex.
*delta* is a soft limit that may be increased if the program detects
it is unable to add any more edges at the current setting.
The edge selection terminates if *delta > Max Delta *.

The graphs are generated using two queues of vertex pairs and a blocking matrix. The blocking matrix B is like an adjacency matrix except when B{ij} = 1 it means that the vertex pair (i,j) cannot be used as an edge in the graph. Initially the blocking matrix is assigned 0 everywhere, then 1's are assigned for every pair of vertices which occur in a same set in the specified coloring.

All pairs of vertices are assigned to one queue initially, and then
the queue is randomly shuffled. This is to reduce the biases that would
otherwise be present if we were to consider the vertices adjacent to another
vertex all at once, as would happen in usual traversal methods.
As pairs are removed from the queue, they are simply discarded
if the edge is blocked in B. Unblocked pairs are tested to see if adding
this edge would violate the soft degree constraint. If so they are
added to the second queue. Otherwise, with probability *p* the edge
is added, and if added the blocking graph is updated to prevent small
cycles.

When the queue is exhausted, the second queue is randomly shuffled and the roles of the two queues are switched. If a pass through the queue fails to add any edges to the graph, the soft limit is increased. The program terminates when either the soft limit exceeds the hard limit, or the queues become empty.

In our system, we generalize this class.
We allow the dimension to vary from D=1 upwards.
Then the user is asked for the distance *r* which determines whether an
edge is to be assigned.
As in Johnson *et al*, there are two complementary versions: In the first
a pair of vertices is joined by an edge if the D-dimensional Euclidean
distance is less than *r* and the vertices are
in distinct partition elements.
In the second, the vertices are joined by an edge if the distance is greater
than *r* (and the vertices are in distinct partition elements).

The program asks for a wrap parameter that can be set if desired. If wrap is set, then the definition of distance used is the minimum of the distance between the vertices and the distance that would result by identifying opposite facets of the hypercube. Otherwise we use the usual definition of distance. The reason for this parameter is to reduce the variance in degree that occurs because some vertices are near corners or edges of the space.

Finally, the program is capable of generating an output file readable by ``gnuplot'', if the dimension is D=2 or D=3. The user is prompted for this option.

The first parameter for this class is the edge probability *p*.
This is used to terminate the selection of edges if and when the number of
selected edges exceeds *pn(n-1)/2*.
For many settings this bound is never reached.

The next parameter is an integer weight W. This weight is assigned to all pairs of vertices which are not in the same hidden independent set. Pairs in the same set are assigned a weight of 0. Pairs are selected as edges with probability proportional to their weight.

The weights are adjusted under the control of two further parameters,
*alpha* and *gamma*.
When a new edge is selected by the program, *gamma* indicates the amount
of change applied to every pair of vertices incident to the new edge, if the
pair has not yet been selected as an edge. *Alpha* only applies to those
not yet selected pairs incident to the new edge which would form a triangle
with the new edge and some other previously selected edge.

The adjustment parameters,
*alpha* and *gamma*,
may either be applied multiplicatively or additively.
The method of application is requested before the parameter itself.

The reason for allowing multiplicative adjustments was to take the most
advantage of the proportional selection.
For example, suppose we set W=256 and *alpha=0.5*.
(Ignore *gamma* for now). Then when a new edge is added, every pair that
could form a triangle with this new edge and some other old edge would
have its weight cut in half. Thus, on the next selection these pairs
would have only half the likelyhood of being selected.
After a pair was decremented 9 times, its weight would be zero, (due to
truncation) and so would no longer be eligible as an edge.

Although this would allow cliques of size 10, and no larger (to see this consider the last edge added to a clique), in practice the maximum clique size would be far less than 10. This is because the probability is very high that many different incident pairs of edges will be set before a pair of vertices is selected.

If multiplicative controls greater than one are applied, then the generator will induce clustering and larger cliques. The reader is encouraged to imagine and then test what will happen if one control is set to increase the weights, while the other is set to decrease them.

In practice, using an additive weight of -1 for *alpha* and
additive zero for *gamma*(which means it has no effect)
the graphs are very flat with respect to degree variation.
These are the ones we have tested so far and found to be very difficult to
color well.

The edge selection process will be terminated if the total weight becomes zero.

The data structure used to store and select the edges
by weight is an implicit tree similar to
that used in binary heaps. This allows the updating of any pair's weight
and the selection of a pair proportional to its weight in
logarithmic time. This is important when one considers that the algorithm
performs O(n^3) updates and O(n^2) selections over *n choose 2* pairs.

** Weight biased graphs can require a significant amount of CPU time
to complete. **

The first required input is to indicate whether the color classes are to be selected in proportion to their sizes or not. The remainder of the input is a series of pairs of numbers, the first being a number of cliques to generate, and the second the size. An entry of 0,0 terminates the process.

The selection process is done by randomly selecting a vertex and deciding whether it is available for the current cycle. If not, another probe is made. The first input request asks the user to specify how many probes should be made before ending the attempt to extend the current cycle. This is used because it is possible to specify very long cycles which it may not be possible to complete, or the user may wish not to spend the CPU time for very intensive attempts, being satisfied with long paths.

The rest of the inputs are pairs of numbers as in the clique driven graphs, except the size now indicates the cycle length.

**
DISCLAIMER OF WARRANTY. Because the Software is a research
work, it is provided "AS IS" WITHOUT
WARRANTY OF ANY KIND AND WITHOUT ANY SUPPORT
SERVICES. **
This program
was developed on SUN SPARC workstations running UNIX using the Gnu C compiler.
It may or may not be portable to other systems.

Joseph Culberson

* June 15, 1995 *