||Minipar is a principle-based, broad-coverage parser. The
version that is downloadable from here contains
||An HMM Package in C++. Click here to download.
||A MaxEnt Package in C++.
||This contains an automatically constructed thesaurus. For
each word, the thesaurus lists up to the 200 most similar words and their
similarities. The similar words are also clustered automatically.
above. But the word similarity is computed based only on the linear
proximity relationship between words, whereas the above
thesaurus used dependency relationships extracted from a parsed
Word frequency counts from a 1.5B-word corpus (TREC disks 1-5 and the Reuters corpus). The words are normalized as follows: all-caps words are lowercased and prefixed with a_, capitalized words are lowercased and prefixed with c_, and every digit is replaced with 0.
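The normalization described above could be reproduced roughly as follows when looking up your own tokens in the frequency list. This is only a sketch, not the code used to build the counts: the function name `normalize` is hypothetical, and several details are assumptions (a one-letter uppercase token such as "I" is treated as capitalized rather than all-caps, only ASCII digits are replaced, and mixed-case tokens that start with a lowercase letter are left unchanged).

```python
import re


def normalize(token: str) -> str:
    """Normalize one token in the style of the frequency list (assumed details)."""
    # Replace every ASCII digit with 0.
    token = re.sub(r"[0-9]", "0", token)
    letters = [ch for ch in token if ch.isalpha()]
    # Assumption: "all caps" means at least two letters, all of them uppercase.
    if len(letters) > 1 and all(ch.isupper() for ch in letters):
        return "a_" + token.lower()
    # Assumption: "capitalized" means the first character is an uppercase letter.
    if token[:1].isupper():
        return "c_" + token.lower()
    return token


for word in ["TREC", "Reuters", "corpus", "1995"]:
    print(word, "->", normalize(word))
```

Under these assumptions, "TREC" maps to a_trec, "Reuters" to c_reuters, "corpus" stays as it is, and "1995" becomes 0000.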
|Dependency Triples [120MB]