| Minipar |
Minipar is a principle-based, broad-coverage parser. The
version that is downloadable from here contains |
| HMM |
An HMM Package in C++. Click here to download. |
| MaxEnt |
A MaxEnt Package in C++. |
| Dependency-based
Thesaurus [90MB] |
This is an automatically constructed thesaurus. For
each word, the thesaurus lists up to 200 of the most similar words and their
similarities. The similar words are also clustered automatically. |
| Proximity-based
Thesaurus [200MB] |
Similar to
the above, but the word similarity is computed based only on the linear
proximity relationship between words, whereas the above
thesaurus uses dependency relationships extracted from a parsed
corpus. |
| WordCount [18MB] |
Word frequency counts from a 1.5B-word corpus (TREC disks 1-5 and the Reuters corpus). The words are normalized as follows: ALL-CAPS words are prefixed with a_ and Capitalized words are prefixed with c_, after downcasing. Every digit is replaced with 0.
|
| Dependency Triples [120MB] |
|
|
|
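The WordCount normalization described in the table can be sketched in Python, which may help when looking up a word in the frequency file. This is only an illustration under stated assumptions, not the code used to build the file: the function name normalize is hypothetical, "ALL CAP" is taken to mean that every cased character is uppercase, "Capitalized" is taken to mean that the first character is uppercase, and digits are replaced before the case check. How the original handled mixed-case tokens such as "iPhone" is not stated, so the sketch leaves them unchanged.

```python
import re


def normalize(token: str) -> str:
    """Normalize one token roughly as the WordCount description states."""
    # Every ASCII digit becomes 0 (e.g. "1995" -> "0000").
    token = re.sub(r"[0-9]", "0", token)
    if token.isupper():
        # Assumption: all-caps means every cased character is uppercase.
        return "a_" + token.lower()
    if token[:1].isupper():
        # Assumption: capitalized means the first character is uppercase.
        return "c_" + token.lower()
    # Assumption: anything else is kept as-is.
    return token


if __name__ == "__main__":
    for word in ["IBM", "Reuters", "corpus", "1995"]:
        print(word, "->", normalize(word))
```

Under these assumptions, a query word should be passed through the same normalization before it is matched against the counts, so that "IBM" is looked up as a_ibm and "Reuters" as c_reuters.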