CMPUT 692-A2
Modern Database Searches and Techniques
Fall 2022
Meetings: MW 11:00 - 12:20
Instructor: Davood Rafiei, ATH 436
Course eclass page
Text and natural language are pervasive and have become an integral part of modern database management ecosystems. As data grows in size and variety, standard models of text and queries are no longer effective or efficient. This shift has led to many interesting models and algorithms for search and data exploration in the past couple of years.
This course studies some of those models and algorithms with an in-depth analysis of the underlying principles that allow these models to be applied or scaled to large data collections and workloads.
Topics to be covered (tentative)
- Models of text and queries
- Top-k queries and indexes
- Similarity search
- Semantic models of text
- Example-based queries
- Natural language data and interfaces
- Probabilistic query semantics
- Querying knowledge graphs
Course prerequisite
Students are expected to have taken an introductory course in data management and/or information retrieval (e.g. CMPUT 291 or equivalent), and to have some knowledge of probability and statistics and proficiency in Linux and programming.
Grading (tentative)
- (35%) - Assignments: problem sets, programming exercises, and research paper reviews
- (45%) - Term project (individual or groups of 2, depending on the class size)
- (15%) - Class presentation of a research paper
- (5%) - Participation in class discussions
Recommended books and resources
- Leskovec, Rajaraman, Ullman, Mining of massive datasets, 3rd ed., Cambridge UP, 2014.
- Manning, Raghavan, Schütze, Introduction to information retrieval, Cambridge UP, 2009.
- Hogan et al. Knowledge graphs, Morgan & Claypool, 2021.
- Relevant research papers (tba)