MeatballWiki | RecentChanges | Random Page | Indices | Categories

Sometimes the meaning of a word depends on its context. For instance, in English, "gift" relates to parcels and presents; in German, "Gift" means poison. In ComputerScience, context-based definitions are supported with a NameSpace. The traditional hierarchical namespace demands that the parts of the context be ordered, but in practice any such ordering is often arbitrary and gets in the way.

Therefore, use a set for context. For instance, "tree as used in maths and computer science" has maths and computer science as its context, with no arbitrary ordering of the two. This creates a lattice space, since sets ordered by containment form a lattice, with intersection as the meet and union as the join.
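To make the lattice claim concrete, here is a minimal Python sketch, assuming contexts are modelled as `frozenset`s of labels (the names `meet` and `join` are illustrative, not taken from any existing system). Two sets with the same labels compare equal regardless of the order in which the labels were written, and intersection and union supply the lattice operations.

```python
from __future__ import annotations

# Hypothetical model: a context is a frozenset of labels.
maths_cs = frozenset({"maths", "cs"})
cs_maths = frozenset({"cs", "maths"})
assert maths_cs == cs_maths  # no arbitrary ordering of the two labels


def meet(x: frozenset[str], y: frozenset[str]) -> frozenset[str]:
    """Greatest lower bound: the largest context contained in both."""
    return x & y


def join(x: frozenset[str], y: frozenset[str]) -> frozenset[str]:
    """Least upper bound: the smallest context containing both."""
    return x | y


bio_cs = frozenset({"biology", "cs"})
print(sorted(meet(maths_cs, bio_cs)))  # ['cs']
print(sorted(join(maths_cs, bio_cs)))  # ['biology', 'cs', 'maths']
```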

When a reference must be disambiguated, the definition chosen should be the one with the smallest context that covers as much of the context of the reference as possible. If this does not distinguish between several choices, their common root definition should be used.

However, this kind of namespace is almost non-existent in computer systems to date, and its disadvantages are not yet known. Further, any adopter will have to educate their users; in contrast, every computer user is exposed to a hierarchical namespace on their desktop and can be expected to understand it without explanation. Fortunately, a LatticeSpace degrades to a flat NameSpace if no contextual definitions are ever made.

If C is the context of the reference and D is the set of all defined contexts, the disambiguated context is
C(D) = ∩ { X ∈ D : ∀ Y ∈ D, ¬((X ∩ C) ⊊ (Y ∩ C)) }


In words: the disambiguated context is the intersection of all defined contexts X that satisfy a single criterion: for every defined context Y, the intersection of X and C is not a strict subset of the intersection of Y and C. That is, X covers a maximal part of C, and no other defined context covers strictly more of it.

If there are no defined contexts, the original reference should be used unchanged.
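The formula translates almost directly into code. The following is a minimal Python sketch, not PeriPeri's implementation; the function `disambiguate` and the sample `TREE_CONTEXTS` are hypothetical. It returns the disambiguated context C(D) as a set of labels, or `None` when D is empty (meaning the reference is used unchanged). It does not check whether a definition actually exists for the resulting set.

```python
from __future__ import annotations

from functools import reduce
from typing import AbstractSet, Iterable, Optional


def disambiguate(
    ref_context: AbstractSet[str],
    defined: Iterable[AbstractSet[str]],
) -> Optional[frozenset[str]]:
    """Compute C(D) = ∩ { X ∈ D : ∀ Y ∈ D, not (X ∩ C ⊊ Y ∩ C) }.

    Returns None when no contexts are defined (D is empty).
    """
    c = frozenset(ref_context)
    d = [frozenset(x) for x in defined]
    if not d:
        return None  # no defined contexts: leave the reference unchanged
    # Keep each X whose overlap with C is maximal, i.e. not a strict subset
    # (frozenset '<') of any other defined context's overlap with C.
    maximal = [x for x in d if not any((x & c) < (y & c) for y in d)]
    # A finite non-empty D always has at least one maximal X, so reduce() is safe.
    return reduce(frozenset.intersection, maximal)


# Hypothetical definitions of "tree": the flat (empty-context) definition
# plus a few contextual ones.
TREE_CONTEXTS = [
    frozenset(),
    frozenset({"maths"}),
    frozenset({"cs"}),
    frozenset({"biology"}),
    frozenset({"maths", "cs"}),
]

print(sorted(disambiguate({"cs"}, TREE_CONTEXTS)))                # ['cs']
print(sorted(disambiguate({"maths", "cs"}, TREE_CONTEXTS)))       # ['cs', 'maths']
print(sorted(disambiguate({"biology", "maths"}, TREE_CONTEXTS)))  # []
```

A reference in context {cs} matches both {cs} and {maths, cs} equally well, and their intersection picks the smaller {cs}. A reference in context {biology, maths} is ambiguous between three equally good candidates, so their common root, the empty (flat) context, is chosen. If only the empty context is ever defined, every reference resolves to it, which is the degradation to a flat NameSpace.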

PeriPeri implements a LatticeSpace in a Wiki; however, the algorithm it currently uses is incorrect in some cases of ambiguity.

I'll take a chance by commenting, if only to break the "deafening silence"...

I certainly agree that Context is critical when a word may have more than one meaning, and I appreciate that set theory can be used to select a disambiguation method. However, I can't help but feel that the more familiar Key Word In Context algorithms I've come across before might be a bit easier to implement.

I'd be happy to try to say more, but I'm not at all sure where this is headed so I am reluctant to divert it. --HansWobbe.

Could you give references for the algorithms you're more familiar with? -- ChrisPurcell

Syntax. I needed to invent a syntax for context. After some experimentation, it seems best to add a context tag after the word, so one could write "gift~en equals Geschenk~de", meaning "the word 'gift' in English means the same as the word 'Geschenk' in German". One could also write about tree~el (common everyday language), tree~biology, tree~math, tree~cs or even tree~math~cs. Alternative syntactical forms like "math:tree" or "tree.math" turned out to have disadvantages. -- HelmutLeitner
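A token in this syntax splits into a word and a set of context tags. This minimal Python sketch (the function `parse_tilde` is hypothetical) assumes the tags after the word are unordered, in keeping with the set-based contexts above, so tree~math~cs and tree~cs~math denote the same context; the page does not say whether Helmut's implementation treats them that way.

```python
from __future__ import annotations


def parse_tilde(token: str) -> tuple[str, frozenset[str]]:
    """Split 'word~ctx1~ctx2' into the word and its set of context tags."""
    word, *tags = token.split("~")
    # Treat tags as an unordered set, and skip empty tags such as in 'tree~'.
    return word, frozenset(tag for tag in tags if tag)


assert parse_tilde("tree~math~cs") == parse_tilde("tree~cs~math")
assert parse_tilde("gift~en") == ("gift", frozenset({"en"}))
```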

My preference is "En:Gift equals De:Geschenk", "cs:tree", "cs+math:tree", et cetera. Did you consider this exact syntax (including the plus separator)? What disadvantages did it have? -- ChrisPurcell
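The prefix form can be parsed in much the same way. Another hypothetical sketch (`parse_prefix` is illustrative only), assuming '+' separates unordered context tags and the first ':' ends the prefix. Note that it would accept any Prefix:Page token, including ordinary InterWiki links.

```python
from __future__ import annotations


def parse_prefix(token: str) -> tuple[str, frozenset[str]]:
    """Split 'ctx1+ctx2:word' into the word and its set of context tags."""
    prefix, sep, word = token.partition(":")
    if not sep:
        return token, frozenset()  # no colon: a plain word with no context
    return word, frozenset(tag for tag in prefix.split("+") if tag)


assert parse_prefix("cs+math:tree") == parse_prefix("math+cs:tree")
assert parse_prefix("En:Gift") == ("Gift", frozenset({"En"}))
```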

(1) It collides with wiki namespace syntax. (2) It seems less readable. (3) The plus '+' is an especially bad character in the Browser/HTML/regex context. -- HelmutLeitner

I personally like the namespace overlap - see NearLink. Why is it bad in browser/HTML context? It's not a reserved character in HTML or URLs as far as I know. -- ChrisPurcell

I even have a wiki cluster in use where the namespace syntax is/was used for multilingual work. If you use it that way, it's fine. But if you don't have such corresponding namespaces, the namespace syntax will get in the way (collide with InterWiki names, costing performance looking up potential links, ...). Just try it in practice. The '+' character is used as a replacement for the space character in URLs. -- HelmutLeitner
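Both positions have some basis. In the generic URI syntax (RFC 3986), '+' is a reserved sub-delimiter that is allowed in paths, but in form-encoded data such as query strings it stands for a space, and in regular expressions it is a quantifier. A short illustration using Python's standard `urllib.parse` and `re` modules (the page title is made up):

```python
import re
from urllib.parse import quote, unquote, unquote_plus

title = "cs+math:tree"

# Form-style decoding (application/x-www-form-urlencoded) turns '+' into a space.
print(unquote_plus(title))    # cs math:tree
# Plain percent-decoding leaves '+' untouched.
print(unquote(title))         # cs+math:tree
# Percent-encoding '+' as %2B survives either kind of decoding.
encoded = quote(title, safe=":")
print(encoded)                # cs%2Bmath:tree
print(unquote_plus(encoded))  # cs+math:tree

# In a regular expression '+' means "one or more", so the raw title does not
# match itself when used as a pattern; it must be escaped first.
print(re.fullmatch(title, title))                         # None
print(re.fullmatch(re.escape(title), title) is not None)  # True
```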

I don't understand the first three sentences, sorry. What corresponding namespaces? Get in the way of what? -- ChrisPurcell

In this [experimental dictionary setting] namespaces exist for natural languages and for a semantic minimal language. You can see how they link, exactly in the syntax you suggest. The namespaces are implemented as separate wikis using InterWiki links, so the "context syntax" and the "automatic link generation" correspond. But if you use the syntax En:house or De:Haus here at Meatball or in most other wikis, no link is generated, yet the InterWiki regex pattern is still triggered to no effect, which costs performance. There is also an overlap between the "namespace of languages" and the "namespace of InterWiki abbreviations": if you need "MathCs:tree" as a language context, you couldn't have MathCs as an InterWiki abbreviation (that's what I call a potential collision). -- HelmutLeitner
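The two problems can be seen in a toy Python sketch. The regular expression and the `classify` helper are hypothetical simplifications, not the InterWiki code of any particular wiki engine: every Prefix:Page token triggers the pattern and a map lookup, and a context name that is also an InterWiki abbreviation gets treated as a link.

```python
from __future__ import annotations

import re

# Hypothetical, simplified InterWiki pattern: Prefix:PageName.
INTERWIKI_RE = re.compile(r"\b([A-Z][A-Za-z]*):(\w+)")


def classify(text: str, interwiki: dict[str, str]) -> list[tuple[str, str, str]]:
    """Label each Prefix:Page match as an InterWiki 'link' or a wasted 'miss'."""
    results = []
    for m in INTERWIKI_RE.finditer(text):
        prefix, page = m.group(1), m.group(2)
        kind = "link" if prefix in interwiki else "miss"
        results.append((kind, prefix, page))
    return results


# No language namespaces: the pattern fires, but every lookup misses.
print(classify("En:house equals De:Haus", {}))
# MathCs is an InterWiki abbreviation: the context is swallowed as a link.
print(classify("MathCs:tree", {"MathCs": "https://mathcs.example.org/"}))
```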

See also: Some IBM folks have come up with a faceted search, although I'm sure this is not a new idea. See [1] for a beta demonstration. Click on the 'Search' tab. You can narrow your search by predetermined categories.

CategoryWikiTechnology CategoryUnimplementedWikiTechnology CategoryLink FacetWiki

