I am interested in several subareas of natural language processing:

Parsing A natural language sentence consists of a sequence of words. The purpose of parsing is to uncover the relationships between the words. I developed a parser called Minipar. The key distinction between Minipar and most other parsers is that it is a principle-based parser. 
Acquisition of Lexical Knowledge All natural languages consist of tens of thousands of words. Knowledge about these words is called lexical knowledge. Many NLP systems critically depend on lexical knowledge to be functional. The acquisition of lexical knowledge presents a serious challenge due to the large number of words and the many-to-many correspondence between words and meanings. The goal of my research in this area is to develop programs to automatically or semi-automatically acquire lexical knowledge from text corpora.
Coreference The objective of coreference resolution is to determine which words or phrases in a discourse (a piece of text or a segment of conversation) refer to the same entity. For example, given the sentence "John told Peter that he saw him at a conference three years ago", a coreference resolver should be able to determine that "he" probably refers to John and "him" probably refers to Peter.
Question Answering Given a query, an information retrieval system returns a set of documents that may be relevant to the query. The goal of a question-answering system is to identify a phrase or a sentence in the document collection that is the answer to the query.

The Q&A track in TREC is a competitive evaluation of question-answering systems.

Word Sense Disambiguation Natural language words often have different meanings in different contexts. For example, the word 'bank' has different meanings in 'river bank' and 'bank account'. Word sense disambiguation (WSD) is the task of determining the meaning of a word in its context.
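
To make the 'bank' example concrete, here is a minimal sketch of one simple approach to WSD: pick the sense whose dictionary gloss shares the most words with the context (a simplified Lesk-style overlap). This is only an illustration and not necessarily the method used in my own work. The sense inventory `SENSES`, its glosses, the stopword list, and the function names are all hypothetical and written just for this example.

```python
# Toy sense inventory: glosses are made up for this illustration.
SENSES = {
    "bank": {
        "river_edge": "sloping land beside a river or stream",
        "financial_institution": "institution that holds money in an account and makes loans",
    }
}

# Small hand-picked stopword list (an assumption, not a standard resource).
STOPWORDS = {"a", "an", "the", "of", "in", "or", "and", "that", "to", "at", "on"}


def tokens(text):
    """Lowercase, split on whitespace, and drop stopwords."""
    return {w for w in text.lower().split() if w not in STOPWORDS}


def disambiguate(word, context):
    """Return the sense of `word` whose gloss overlaps most with `context`.

    Ties are broken in favor of the sense listed first.
    """
    context_words = tokens(context) - {word}
    best_sense, best_overlap = None, -1
    for sense, gloss in SENSES[word].items():
        overlap = len(context_words & tokens(gloss))
        if overlap > best_overlap:
            best_sense, best_overlap = sense, overlap
    return best_sense


print(disambiguate("bank", "river bank"))    # river_edge
print(disambiguate("bank", "bank account"))  # financial_institution
```

A realistic system would need a much larger sense inventory and far richer context evidence than raw gloss overlap, which is part of why acquiring lexical knowledge automatically matters.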