MeatballWiki | RecentChanges | Random Page | Indices | Categories

Traditional markup languages are often created in such a way that they may be parsed using standard tools such as Lex and Yacc. These tools are well known, so I'll skip straight to the point: wiki languages as implemented by many wiki engines cannot be parsed correctly using Lex and Yacc, because the languages they define often fall outside the realm of LALR(1) context-free languages, and often contain ambiguities.

Furthermore, a key difference between lex/yacc consume/parse/render engines (e.g. a compiler) and match/transform engines (like wikis) is this: a consume/parse/render engine builds an explicit parse tree before producing any output, whereas a match/transform engine repeatedly rewrites the source text in place, rule by rule, and never builds a tree at all.
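A minimal sketch of the match/transform style, with hypothetical rules loosely modelled on common wiki markup. Each rule rewrites whatever text the previous rule left behind; no parse tree is ever built:

```python
import re

# A match/transform engine rewrites the raw text in place, rule by rule.
# These two rules are hypothetical, loosely modelled on common wiki markup.
def match_transform(text):
    text = re.sub(r"'''(.+?)'''", r"<b>\1</b>", text)  # bold rule first
    text = re.sub(r"''(.+?)''", r"<i>\1</i>", text)    # then the italic rule
    return text

# Each rule only ever sees a flat string - the output of the rule before it.
print(match_transform("''hello'' '''world'''"))
```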

This impacts a WikiInterchangeFormat in that what a WIF essentially needs to convey is meaning, which a second wiki is free to ignore. Traditionally the simplest way of doing this would be to exchange a parse tree - the abstract syntax tree for the document - and allow the second wiki to simplify content it doesn't understand down to plain text. In a consume/parse/render system such a model is relatively simple, since the AST will normally already exist and we just need to do a tree walk.
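The tree walk might look like the following sketch. The node shape - `(type, children)` tuples with strings as leaves - and the set of known node types are assumptions for illustration; the point is that unknown constructs degrade gracefully to their plain text:

```python
# Render an exchanged AST, collapsing any node type the receiving wiki
# doesn't understand down to the plain text of its children.
KNOWN = {"bold": ("<b>", "</b>"), "italic": ("<i>", "</i>")}

def render(node):
    if isinstance(node, str):
        return node                      # leaf: plain text
    kind, children = node
    inner = "".join(render(c) for c in children)
    if kind in KNOWN:
        open_tag, close_tag = KNOWN[kind]
        return open_tag + inner + close_tag
    return inner                         # unknown construct: keep only the text

ast = ("doc", ["Hello ", ("bold", ["world"]), " ", ("spoiler", ["secret"])])
print(render(ast))  # "doc" and "spoiler" are unknown here, so they flatten
```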

In the case of a match/transform system, we need to modify every match/transform pair in the system to help us build the AST. Unlike a standard AST, which is created alongside the original data source, in a match/transform system this decision has to be embedded in each transformation. The ability to "just dump the AST" doesn't apply - it never gets built.
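Embedding that decision might look like the sketch below: a single hypothetical rule rewritten to emit nodes rather than replacement text, so that an AST accumulates as the rules run. Every rule in the engine would need the same treatment:

```python
import re

# A match/transform rule for bold markup, rewritten to build AST nodes
# (plain strings and ("bold", inner) tuples) instead of replacement text.
# The rule and node shape are hypothetical.
def bold_rule(text):
    nodes, pos = [], 0
    for m in re.finditer(r"'''(.+?)'''", text):
        if m.start() > pos:
            nodes.append(text[pos:m.start()])  # untouched text before the match
        nodes.append(("bold", m.group(1)))     # structured node, not a string
        pos = m.end()
    if pos < len(text):
        nodes.append(text[pos:])               # trailing untouched text
    return nodes

print(bold_rule("say '''hi''' now"))
```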

On the languages defined: in a match/transform engine, the language accepted is determined not just by the individual rewrite rules, but by the order in which they are applied.

This means two wiki implementations can use precisely the same syntax but define completely different languages due to evaluation sequence - as a result, simply exchanging the original markup between the two can cause problems. (This also affects upgrades and rewrites of wikis.)
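This can be demonstrated with two engines sharing identical (hypothetical) rules but applying them in opposite order. The classic ambiguous case is five quote marks, which could mean bold-then-italic or italic-then-bold:

```python
import re

# Identical rules; only the evaluation order differs between the two "engines".
RULES = [
    (r"'''(.+?)'''", r"<b>\1</b>"),   # bold
    (r"''(.+?)''", r"<i>\1</i>"),     # italic
]

def apply_in_order(text, rules):
    for pattern, repl in rules:
        text = re.sub(pattern, repl, text)
    return text

src = "'''''both'''''"                 # five quotes: inherently ambiguous
print(apply_in_order(src, RULES))      # bold rule evaluated first
print(apply_in_order(src, RULES[::-1]))  # italic rule evaluated first
```

The two orderings produce different output for the same markup - same syntax, different language.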

An interesting side effect is that if you are able to extract the AST from a wiki through appropriate repeated match/transform rules (perhaps by interception), then you gain the ability to optimise the wiki rendering process: apply the context-sensitive match/transform parsing on the way to disk to serialise the AST, and on the way out from disk use a context-free consume/parse/render system to transform to a target format. Furthermore, such an approach future-proofs the content against changes to the parser. (Consider that changing the order of 10 markup rules in UseMod/TWiki would result in a different language, and hence different parsing results. Simply inserting an extra rewrite rule somewhere in the middle changes the language significantly.)
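The save-side/view-side split described above can be sketched as follows. The expensive, context-sensitive pass happens once on save and serialises an AST; page views then only pay for a cheap context-free tree walk. The JSON serialisation and `["type", [children]]` node shape are assumptions for illustration:

```python
import json

def serialise(ast):
    # Save side: the match/transform pass has already produced the AST;
    # we persist it, so future parser changes can't re-interpret the markup.
    return json.dumps(ast)

def render_ast(node):
    # View side: a simple context-free tree walk over the stored AST.
    if isinstance(node, str):
        return node
    kind, children = node
    inner = "".join(render_ast(c) for c in children)
    return inner if kind == "doc" else f"<{kind}>{inner}</{kind}>"

stored = serialise(["doc", ["Hello ", ["b", ["world"]]]])
print(render_ast(json.loads(stored)))
```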

I'm still not certain how far this impacts things like a WikiInterchangeFormat and other things, but it struck me as significant enough to document separately.

-- MichaelSamuels

Is the success of the Wookee engine relevant here? http://wiki.beyondunreal.com/wiki/Wookee --MatthewSimoneau

