Building tools and techniques for the automated extraction of facts and relationships from the Web is an important topic, given the large volume of information stored in unstructured formats (such as free text, HTML, links, and tables) and the desire to pose and answer questions about these facts and relationships. This summer, I am looking for 1-2 talented undergraduate students who can help us build some of these tools or experiment with different extraction algorithms. Below is a brief description of the projects for summer 2012 (this is a partial list). For more details, please feel free to talk to me.
The projects are exciting and can get you engaged in no time. This is a research scholarship, so the projects are generally open-ended, with lots of room for creativity. I expect candidates to enjoy writing robust code and building prototypes.
If interested, please
The NSERC undergrad summer scholarship is open to Canadian citizens and permanent residents of Canada.
Extracting data from Web pages can be tedious, but the job can often be automated, and once the extracted data is stored in a relational database, it can be queried easily. We are looking for a student to help us build tools for data extraction and filtering. The extraction phase will involve building patterns or templates for extracting or filtering various pieces of text. The storage phase involves storing the data in a database and indexing it. A candidate student is someone who
The connectivity graph of the Web can be searched for various paths, giving a better understanding of both the overall topology of the Web and the connectivity between specific pages. Processing such path queries online can be quite time-consuming. With the help of an undergraduate student, we will look into collecting various synopses of the graph and indexing them within a relational system to speed up the search. A candidate student for this work is someone who