KeywordCondensationClustering

In principle, one can detect clusters of related documents by the words they have in common. To do this effectively:

  1. Remove stop words.
  2. Stem.
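  3. Band-pass filter. Remove very high and very low frequency words: the former occur almost everywhere, and the latter occur in too few documents to link anything.
  4. Condensation cluster.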
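
The four steps above can be sketched in Python roughly as follows. This is only an illustration under several assumptions that are not from the page: the stop list is a tiny made-up sample, `crude_stem` is a toy suffix stripper standing in for a real stemmer, "frequency" is taken to mean the number of documents a word appears in, and the final step uses a simple stand-in (merging any two documents whose Jaccard similarity reaches a threshold) rather than any specific condensation clustering algorithm. All function names and parameters are hypothetical.

```python
import re
from collections import Counter

# Tiny illustrative stop list (assumption); real lists are much longer.
STOP_WORDS = {"the", "a", "an", "of", "and", "in", "on", "to", "is", "are", "with"}


def crude_stem(word):
    """Toy suffix stripper standing in for a real stemmer."""
    for suffix in ("ing", "es", "s"):
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[: -len(suffix)]
    return word


def keywords(text):
    """Steps 1 and 2: drop stop words, then stem what remains."""
    tokens = re.findall(r"[a-z]+", text.lower())
    return {crude_stem(t) for t in tokens if t not in STOP_WORDS}


def band_pass(doc_terms, low, high):
    """Step 3: keep terms whose document frequency lies in [low, high]."""
    df = Counter(term for terms in doc_terms for term in terms)
    keep = {term for term, n in df.items() if low <= n <= high}
    return [terms & keep for terms in doc_terms]


def jaccard(a, b):
    """Share of terms two documents have in common (0.0 if both are empty)."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def cluster(doc_terms, threshold):
    """Step 4 (stand-in): merge documents linked by similarity >= threshold."""
    parent = list(range(len(doc_terms)))

    def find(i):
        # Follow parent links to the cluster representative, compressing the path.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    for i in range(len(doc_terms)):
        for j in range(i + 1, len(doc_terms)):
            if jaccard(doc_terms[i], doc_terms[j]) >= threshold:
                parent[find(j)] = find(i)

    groups = {}
    for i in range(len(doc_terms)):
        groups.setdefault(find(i), []).append(i)
    return sorted(groups.values())


docs = [
    "The cat chases the mouse",
    "Cats and mice in the barn",
    "Stock markets and bond prices",
    "The bond market prices in risk",
]
terms = band_pass([keywords(d) for d in docs], low=2, high=3)
clusters = cluster(terms, threshold=0.5)
print(clusters)  # [[0, 1], [2, 3]]
```

The `low`, `high`, and `threshold` values here are arbitrary picks for a four-document toy corpus; on real text they would need tuning, and a proper stemmer would catch cases this sketch misses (for example, "mouse" and "mice").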

Nouns tend to appear only in a limited range of topics, whereas modifiers such as adjectives are distributed more randomly. Thus, your clusters will probably be based on noun forms.

See Wise, J. A. (1999). The ecological approach to text visualization. Journal of the American Society for Information Science, 50(13), 1224-1233.

CategoryGraphTheory

