

Minipar Minipar is a principle-based, broad-coverage parser. The version that is downloadable from here contains 
HMM An HMM package in C++. Click here to download.
MaxEnt A MaxEnt Package in C++.
Dependency-based Thesaurus [90MB] This contains an automatically constructed thesaurus. For each word, the thesaurus lists up to the 200 most similar words and their similarities. The similar words are clustered (also automatically).
Proximity-based Thesaurus [200MB] Similar to the above, but word similarity is computed from the linear proximity relationship between words only, whereas the above thesaurus uses dependency relationships extracted from a parsed corpus.
WordCount [18MB] Word frequency counts from a 1.5B-word corpus (TREC disks 1-5 and the Reuters corpus). The words are normalized as follows: after downcasing, ALL-CAPS words are prefixed with a_ and Capitalized words are prefixed with c_. All digits are replaced with 0.
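To look a word up in the counts, it has to be normalized the same way. The normalization described above can be sketched as follows. This is a minimal illustration, not the code used to build the file: the function name `normalize` is hypothetical, and the treatment of edge cases (a single capital letter is treated as Capitalized rather than ALL-CAPS, and only ASCII digits are replaced) is an assumption that the description does not settle.

```python
import re


def normalize(word: str) -> str:
    """Normalize a token in the style described for the WordCount file.

    Assumptions (not confirmed by the source): a token counts as ALL-CAPS
    only if it has at least two letters and all of them are uppercase;
    any other token starting with an uppercase letter is Capitalized.
    """
    letters = [ch for ch in word if ch.isalpha()]
    if len(letters) > 1 and all(ch.isupper() for ch in letters):
        prefix = "a_"   # ALL-CAPS word
    elif word[:1].isupper():
        prefix = "c_"   # Capitalized word
    else:
        prefix = ""     # already lowercase, or starts with a non-letter
    # Downcase, then replace every ASCII digit with 0.
    return prefix + re.sub(r"[0-9]", "0", word.lower())


for token in ["IBM", "Reuters", "parser", "1995"]:
    print(token, "->", normalize(token))
```

Under these assumptions, `IBM` maps to `a_ibm`, `Reuters` to `c_reuters`, and `1995` to `0000`, so distinct numbers with the same number of digits share a single count.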
Dependency Triples [120MB]