Of course, in building an index, you would probably want to apply two different techniques:
You would want to ignore certain words, e.g. "a" and "an" (which might loosely be considered external verbal clutter).
I think you should ignore every empty word --EmilioDavis
For that, you'd want to use a list of stopwords. --AristotlePagaltzis
Martin Porter, e.g., offers a list of stopwords:
To do a little better, you would also want to ignore (various?) prefixes and suffixes such as "s", "es", "ies", and "ing" (which might loosely be considered internal verbal clutter).
For that, the best solution might be PorterStemming--or maybe the earlier LovinsStemming.
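The two techniques above can be sketched roughly as follows. This is only an illustration, not anybody's actual implementation: the stopword set is a tiny made-up sample (not Martin Porter's list), and strip_suffix, index_terms, and build_index are hypothetical names. The suffix stripping is deliberately naive and handles only the four suffixes mentioned; it is not PorterStemming or LovinsStemming.

```python
import re
from collections import defaultdict

# Assumed, illustrative stopword sample; a real index would use a fuller list.
STOPWORDS = {"a", "an", "the", "of", "and", "in", "to", "is"}

# Suffixes named in the text, longest first so "ies" is tried before "es" and "s".
SUFFIXES = ("ies", "ing", "es", "s")


def strip_suffix(word, min_stem=3):
    """Naively remove at most one suffix (NOT the Porter or Lovins algorithm)."""
    for suffix in SUFFIXES:
        # Keep at least min_stem letters so short words like "sing" survive.
        if word.endswith(suffix) and len(word) - len(suffix) >= min_stem:
            return word[: -len(suffix)]
    return word


def index_terms(text):
    """Lowercase, split into words, drop stopwords, strip suffixes."""
    words = re.findall(r"[a-z]+", text.lower())
    return [strip_suffix(w) for w in words if w not in STOPWORDS]


def build_index(pages):
    """Map each normalized term to the set of page names containing it."""
    index = defaultdict(set)
    for name, text in pages.items():
        for term in index_terms(text):
            index[term].add(name)
    return index


if __name__ == "__main__":
    pages = {"PageOne": "Linking wikis", "PageTwo": "a wiki link"}
    print(dict(build_index(pages)))
```

With this sketch, "links" and "linking" both reduce to "link", so the two sample pages end up under the same terms. It also shows why a proper stemmer is preferable: "pages" is cut to "pag" while "page" is left alone, so those two forms would not match each other.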
See also Wiki:LikePages and MetaWiki, which implements LikePages for any indexed wiki (including this one).
See also MoinMoin:CategoryCategory?action=LikePages, originally implemented by RichardJones (mailto:richard@bizarsoftware.com.au), based on ideas from Wiki:LikePages.