UofA Computing Science, Semester 2008-1

Mining Health and Medical Data
(Independent Study)
Instructor: Osmar R. Zaïane


Hospitals collect considerable amounts of data about their patients, naturally for patient-record management, archival, and possibly decision support. Research centers also collect huge quantities of data during clinical trials and controlled experiments. Typical statistical data analysis is usually conducted on such data for evaluation and interpretation. However, classical data retrieval and analysis are insufficient to extract actionable hidden patterns. Many researchers have started applying data mining and machine learning techniques to analyse patient records and medical data, with the aim of providing better decision support and more efficient, effective and cheaper health care.

The data collected are typically heterogeneous, noisy, and multimedia, and come from different sources with different goals, including legacy systems. Data analysts therefore face many serious challenges. Yet techniques such as association rule mining, clustering, outlier detection, supervised classification, social network analysis, time series analysis, and text mining have been used with some degree of success. Data mining could indeed be applied to these heterogeneous and very large data to help from three different perspectives: (1) Patient record management to assist hospitals; (2) Decision support systems to aid medical practitioners; and (3) Medical research to facilitate discoveries of outbreaks, new diseases, causes or remedies.

The goal of this course is to survey the scientific literature on the application of data mining to medical and bio-medical data, on the application of knowledge discovery to decision support, data management and integration, and on the use of data mining to discover and extract knowledge from text and semi-structured data, such as patient records, for knowledge and ontology building.

The course will mainly consist of a series of discussions on topics relevant to data mining for health informatics, leading to the preparation of a survey paper.

Course Format:

The participants will meet once a week for one hour to one and a half hours to discuss specific research papers. Students will be asked to give presentations on their interpretation and personal critique of selected papers. Each student will be assigned one or more subtopics related to mining health care data and will prepare a term paper. Students will also be asked to implement prototypes of reported approaches and to seek public datasets on which to test them.


Annotated Bibliography (20%) [see web page]
Discussions (20%)
Implementation and testing (20%)
Final Term paper (40%)


Distributed: December, 2007