NSERC Summer Research Projects

Web information retrieval and extraction

The general theme of our work over the past few summers has been building tools and techniques for Web information retrieval, automated fact and relationship extraction, and querying and visualization. This has been (and still is) an important topic because of the large volume of information stored in non-structured formats (such as free text, HTML, links, and tables) and the desire to ask questions about these facts and relationships and to analyze the content. For this summer (Summer 2019), I have a few projects (as listed here) and will need 3 undergrad students who can help me with building tools and/or running experiments. Here is a brief description of the specific projects I am offering this summer. For more details, please feel free to talk to me.

Applicants are expected to have a good CS background (preferably be in their 3rd or 4th year), possess good programming skills (in C/C++ and/or Python), and be motivated to work on these projects. This is a research scholarship, and as such the project is generally open-ended and there is lots of room for creativity. I would expect the candidates to enjoy writing (robust) code and building prototypes.

If interested, please

The NSERC undergrad summer scholarship is open to Canadian citizens and permanent residents of Canada.

Some Past Projects

Data Annotation Through Online Games

Facts and relationships that are extracted from the Web are often erroneous or inaccurate, and verifying them can be a tedious and sometimes boring task. What if we turn this task into an online game where, as the users play, the verification happens behind the scenes? This doesn't sound boring anymore. This is a project Eddie Santos and Stephen Romansky (two summer students) did over a summer. Here is a link to the game page. James Moore (another summer student) put together an Android app for the game.

Extracting Facts from the Web

Data extraction from Web pages can be a tedious job, but it can often be automated, and the extracted data can be easily queried once it is stored in a relational database. We are looking for a student to help us with building tools for data extraction and filtering. The extraction phase will involve building patterns or templates for extracting various pieces of text or filtering them. The storage part involves storing data in a database and indexing it. A candidate student is someone who
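To give a flavour of the extraction and storage steps described in this project, here is a minimal sketch in Python. It is only an illustration under assumptions, not the tooling used in the project: the "capital of" pattern, the table and index names, and the helper functions are all hypothetical, and SQLite simply stands in for whichever relational database is chosen.

```python
import re
import sqlite3

# Hypothetical extraction pattern (an assumption for illustration): matches
# text such as "Ottawa is the capital of Canada".
CAPITAL_PATTERN = re.compile(
    r"(?P<city>[A-Z][a-zA-Z]+) is the capital of (?P<country>[A-Z][a-zA-Z]+)"
)


def extract_capitals(text):
    """Extraction phase: return (city, country) pairs matched by the pattern."""
    return [(m.group("city"), m.group("country"))
            for m in CAPITAL_PATTERN.finditer(text)]


def store_facts(conn, facts):
    """Storage phase: save the facts in a relational table and index it."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS capital ("
        "city TEXT, country TEXT, UNIQUE(city, country))"
    )
    # The index lets lookups by country avoid a full table scan.
    conn.execute(
        "CREATE INDEX IF NOT EXISTS idx_capital_country ON capital(country)"
    )
    # INSERT OR IGNORE filters out facts that were already extracted.
    conn.executemany("INSERT OR IGNORE INTO capital VALUES (?, ?)", facts)
    conn.commit()


def capital_of(conn, country):
    """Query the stored facts; returns None when the country is unknown."""
    row = conn.execute(
        "SELECT city FROM capital WHERE country = ?", (country,)
    ).fetchone()
    return row[0] if row else None
```

A typical use would be to call extract_capitals on the text of each fetched page, pass the results to store_facts with a connection from sqlite3.connect, and then answer questions with plain SQL. Real pages would of course need many more patterns, plus filtering of noisy matches.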

Searching the Web Connectivity

The connectivity graph of the Web can be searched for various paths, leading to a better understanding of both the topology of the Web and the connectivity between specific pages. Online processing of such queries can be quite time-consuming. With the help of an undergrad student, we will be looking into collecting various synopses and indexing those synopses within a relational system to speed up the search process. A candidate student for this work is someone who
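As a rough picture of the kind of path query meant here, the following Python sketch stores a small link graph in a relational table, indexes it by source page, and runs a breadth-first search for a shortest path between two pages. Everything in it is an assumption made for illustration: the table name link, the function names, and the use of SQLite are hypothetical, and the sketch shows only the baseline online search, not the synopses the project aims to collect.

```python
import sqlite3
from collections import deque


def build_link_table(conn, edges):
    """Store (source page, destination page) links and index them by source."""
    conn.execute("CREATE TABLE IF NOT EXISTS link (src TEXT, dst TEXT)")
    # Without this index every hop of the search would scan the whole table.
    conn.execute("CREATE INDEX IF NOT EXISTS idx_link_src ON link(src)")
    conn.executemany("INSERT INTO link VALUES (?, ?)", edges)
    conn.commit()


def shortest_path(conn, start, goal):
    """Breadth-first search over the link table.

    Returns the list of pages on a shortest path, or None if goal is
    not reachable from start.
    """
    parent = {start: None}  # also serves as the visited set
    queue = deque([start])
    while queue:
        page = queue.popleft()
        if page == goal:
            # Walk the parent pointers back to the start page.
            path = []
            while page is not None:
                path.append(page)
                page = parent[page]
            return path[::-1]
        rows = conn.execute(
            "SELECT dst FROM link WHERE src = ? ORDER BY dst", (page,)
        )
        for (nxt,) in rows:
            if nxt not in parent:
                parent[nxt] = page
                queue.append(nxt)
    return None
```

Each hop issues one indexed query, so the cost grows with the number of pages visited. On a Web-scale graph that is exactly the expense that precomputed summaries are meant to reduce.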