CMPUT 690: Principles of Knowledge Discovery in Databases

Assignment 2 (Fall 1999)

I. Concept Hierarchies

Concept hierarchies are very important in data mining. Among other things, they allow knowledge discovery at different conceptual levels and interactive, progressive refinement.
In data warehousing, concept hierarchies are necessary for operations such as drill-down and roll-up. Concept hierarchies can be partial orders, lattices, or even graphs.
There are many ways to implement concept hierarchy data structures in main memory and on disk.
  1. Enumerate and describe as many concept hierarchy data structure representations as possible and explain their advantages and limitations.
  2. Indicate which concept hierarchy representation is, in your opinion, the most efficient in terms of space used, and which representation is the most appropriate for concept hierarchies that are frequently updated. Justify your answers.
  3. Suppose we choose to represent concept hierarchies with tables in a relational database.
    a) What are the advantages of such a choice?
    b) Explain how the generalization and specialization operations are performed. Use examples to better illustrate your ideas.
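
As a starting point for question 3, here is a minimal sketch of one possible relational encoding, using Python's built-in sqlite3 module. It is an illustration under stated assumptions, not a model answer: the table names (location_hierarchy, sales), the three levels (city, province, country), the sample rows, and the helper functions are all hypothetical. Each row of the hierarchy table stores one leaf value together with its ancestors, so generalization becomes a join followed by a GROUP BY on the chosen level, and specialization becomes a lookup of the values found under a given ancestor.

```python
import sqlite3

# Hypothetical levels of a location hierarchy, from most specific to most general.
LEVELS = ("city", "province", "country")


def build_demo():
    """Create an in-memory database with a hierarchy table and a fact table."""
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE location_hierarchy (
            city TEXT PRIMARY KEY, province TEXT, country TEXT);
        CREATE TABLE sales (city TEXT, amount INTEGER);
        INSERT INTO location_hierarchy VALUES
            ('Edmonton', 'Alberta', 'Canada'),
            ('Calgary', 'Alberta', 'Canada'),
            ('Vancouver', 'British Columbia', 'Canada');
        INSERT INTO sales VALUES
            ('Edmonton', 10), ('Calgary', 20), ('Vancouver', 5), ('Edmonton', 7);
    """)
    return conn


def generalize(conn, level):
    """Roll sales up to the given level: join with the hierarchy, then group."""
    if level not in LEVELS:  # whitelist, since the name is placed in the SQL text
        raise ValueError(f"unknown level: {level}")
    sql = (f"SELECT h.{level}, SUM(s.amount) FROM sales s "
           f"JOIN location_hierarchy h ON s.city = h.city "
           f"GROUP BY h.{level} ORDER BY h.{level}")
    return conn.execute(sql).fetchall()


def specialize(conn, parent_level, parent_value, child_level):
    """Drill down: list the child_level values found under one parent value."""
    if parent_level not in LEVELS or child_level not in LEVELS:
        raise ValueError("unknown level")
    sql = (f"SELECT DISTINCT {child_level} FROM location_hierarchy "
           f"WHERE {parent_level} = ? ORDER BY {child_level}")
    return [row[0] for row in conn.execute(sql, (parent_value,))]
```

With the sample rows above, generalize(conn, "province") returns [('Alberta', 37), ('British Columbia', 5)], and specialize(conn, "province", "Alberta", "city") returns ['Calgary', 'Edmonton']. Other encodings, such as one parent-child pair per row, are equally valid and trade query simplicity for easier updates.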

II. Data Cubes

A data cube is a data structure for representing multidimensional data. Although it is called a cube, this data structure often represents more than three dimensions. A cell in a data cube may contain one or more measurements associated with values in the dimensions (attributes) represented. It is common to see data cubes in which most cells are empty. Such cubes are called sparse data cubes.
  1. Explain why multidimensional data cubes are often sparse. Give examples to illustrate your arguments.
  2. Because data cubes are very large and most of their cells are empty (i.e., the cubes are sparse), it is wiser to avoid representing the empty cells when storing and manipulating data cubes or cuboids in memory, so as not to run out of memory.
    a) Design a representation for a multidimensional data cube that handles the sparsity of the cubes.
    b) Explain how the MOLAP operations drill-down and roll-up are performed on your data structure.
    c) Explain how a data cube represented with your data structure is updated when new measurement values are provided.
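
To make question 2 concrete, the sketch below shows one simple way to avoid storing empty cells: a Python dictionary keyed by coordinate tuples, so that only non-empty cells take up memory. It is a hedged illustration, not the expected design: the class name SparseCube, its methods, and the choice of a single summed measure per cell are all assumptions made for this example. Updating adds a new measurement into its cell, creating the cell if needed. Roll-up either aggregates a dimension away or replaces each value of a dimension with its parent taken from a caller-supplied mapping.

```python
class SparseCube:
    """Hypothetical sparse cube: stores only non-empty cells, one summed measure each."""

    def __init__(self, dimensions):
        self.dimensions = tuple(dimensions)
        self.cells = {}  # coordinate tuple -> aggregated measure

    def add(self, coords, value):
        """Update the cube with a new measurement; the cell is created on first use."""
        key = tuple(coords)
        if len(key) != len(self.dimensions):
            raise ValueError("one coordinate per dimension is required")
        self.cells[key] = self.cells.get(key, 0) + value

    def roll_up(self, dimension, mapping=None):
        """Return a coarser cube.

        With mapping=None the dimension is aggregated away entirely.
        Otherwise each value of the dimension is replaced by mapping[value],
        i.e. its parent in a concept hierarchy supplied by the caller.
        """
        i = self.dimensions.index(dimension)
        if mapping is None:
            result = SparseCube(self.dimensions[:i] + self.dimensions[i + 1:])
            for key, value in self.cells.items():
                result.add(key[:i] + key[i + 1:], value)
        else:
            result = SparseCube(self.dimensions)
            for key, value in self.cells.items():
                result.add(key[:i] + (mapping[key[i]],) + key[i + 1:], value)
        return result


cube = SparseCube(["city", "product", "month"])
cube.add(("Edmonton", "milk", "Jan"), 10)
cube.add(("Calgary", "milk", "Jan"), 20)
cube.add(("Edmonton", "milk", "Jan"), 5)   # same cell: measures are summed
cube.add(("Vancouver", "bread", "Feb"), 3)

by_province = cube.roll_up(
    "city", {"Edmonton": "Alberta", "Calgary": "Alberta", "Vancouver": "BC"})
without_month = cube.roll_up("month")
```

In this sketch, roll_up returns a new cube and leaves the finer one untouched, so drill-down amounts to going back to the finer cube (or recomputing it from the base data), because a summed cell cannot be split again. A full answer would also weigh alternatives such as chunked arrays or tree-based indexes.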

Due Date: October 29th 10:00 am


Maintained by: Osmar R. Zaïane <zaiane AT cs.ualberta.ca>
Last modified: Tue Oct 19 18:00 1999