In principle, one can detect clusters of related documents by the words they have in common. To do this effectively:
- Remove stop words.
- Stem the remaining words.
- Band-pass filter: remove words with very high or very low frequency.
- Condensation cluster the documents on the words that remain.
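The four steps above can be sketched roughly as follows. This is a minimal illustration under loud assumptions, not the method from the cited paper: the stop list, the suffix-stripping `stem`, the document-frequency band, and every function name are hypothetical, and the final step substitutes a simple single-link grouping on Jaccard similarity for whatever "condensation clustering" precisely involves.

```python
from collections import Counter

# Assumed, deliberately tiny stop list; a real one would be much longer.
STOP_WORDS = {"the", "a", "an", "of", "and", "to", "in", "is", "on", "for"}


def stem(word):
    """Crude suffix stripper standing in for a real stemmer (e.g. Porter)."""
    for suffix in ("ing", "s"):
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[: -len(suffix)]
    return word


def tokenize(text):
    """Lowercase, strip punctuation, drop stop words, then stem."""
    words = (w.strip(".,;:!?()\"'").lower() for w in text.split())
    return [stem(w) for w in words if w and w not in STOP_WORDS]


def band_pass(doc_terms, low, high):
    """Keep only terms whose document frequency lies in [low, high]."""
    df = Counter(t for terms in doc_terms for t in set(terms))
    return [{t for t in terms if low <= df[t] <= high} for terms in doc_terms]


def jaccard(a, b):
    """Share of terms two documents have in common."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def cluster(term_sets, threshold):
    """Single-link grouping: join documents whose similarity >= threshold."""
    parent = list(range(len(term_sets)))

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path compression
            i = parent[i]
        return i

    for i in range(len(term_sets)):
        for j in range(i + 1, len(term_sets)):
            if jaccard(term_sets[i], term_sets[j]) >= threshold:
                parent[find(j)] = find(i)

    groups = {}
    for i in range(len(term_sets)):
        groups.setdefault(find(i), []).append(i)
    return sorted(groups.values())


def cluster_documents(docs, low=2, high=None, threshold=0.2):
    """Run all four steps; returns lists of document indices."""
    if high is None:
        high = max(low, len(docs) - 1)  # drop terms found in every document
    term_lists = [tokenize(d) for d in docs]
    return cluster(band_pass(term_lists, low, high), threshold)
```

Called on a list of page texts, `cluster_documents` returns groups of indices into that list. The band limits `low` and `high` and the similarity `threshold` are arbitrary defaults and would need tuning for a real corpus.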
Nouns tend to appear only in a limited set of topics, whereas modifiers like adjectives are spread more randomly across topics. Thus, your clusters will probably be based on noun forms.
See Wise, J. A. (1999). The ecological approach to text visualization. Journal of the American Society for Information Science, 50(13), 1224-1233.
CategoryGraphTheory