CMPUT 692
Information Extraction Meets Databases
Fall 2010
Meetings: MWF 11:00 - 12:00 at CSC B41
Instructor:
Davood Rafiei, ATH 436, 492-2374
Course Moodle page:
moodle.cs.ualberta.ca/course/view.php?id=??
Information extraction from unstructured text has much in common with
querying in database systems. Despite some differences in how data is
modeled or represented, the general goal remains the same: to retrieve
data or tag elements that satisfy some user-specified constraints.
In recent years, the two paradigms have moved much closer together, thanks
to the large volume of data on the Web and the need for more automated
search tools. In this course, we study the areas where information
extraction meets databases. In particular, we review the roots of the area
from a database perspective and some of the major related work that has emerged.
Topics to be covered (tentative)
- Conjunctive queries, negation and recursion
- Wrapper induction and maintenance
- Information extraction and named entity recognition
- Question answering
Course prerequisite
An introductory database systems course (CMPUT 291 or equivalent) is required.
Grading
- (35%) - Assignments: problem sets and research paper reviews
- (45%) - Term project (individual or in groups of 2, depending on the class
size)
- (15%) - Class presentation of a research paper
- (5%) - Participation in class discussions
Recommended books and resources
- Abiteboul, Hull and Vianu,
Foundations of Databases, Addison-Wesley, 1995
(relevant chapters will be made available)
- Relevant research papers (TBA)