ITMarkupEngine

The InfiniteTypewriter markup engine will have two features that are uncommon:

1. Javascript

The engine will be coded entirely in Javascript, giving unique mobility advantages:

* Push the renderer out to the client without needing to maintain two copies written in different languages. This allows live previews, and even lets busy servers push expensive calculations onto their clients.
* Pull a new renderer from another site. Since Javascript is a "safe" interpreted language, one can automatically download and install a new renderer on a Wiki engine without worrying about setting up a security sandbox. This allows a wiki to extend its list of markup styles dynamically, a feature proposed on WikiTextMimeType. However, server-side Javascript interpreters are rare, and the more powerful ones do allow unsafe operations, so on the server these benefits are less compelling than they first appear.
* Include the renderer in an XSLT stylesheet. This lets us use XSLT exclusively for the View part of InfiniteTypewriter.
* DevolvePower: allow the community to maintain the markup code via a FileReplacement mechanism, without risk of compromising server security.

2. Regular Expression–Recursive Descent Design

The engine consists of rules, each of which has a regular expression describing what it matches, and a list of children that should be applied in recursive descent to its contents.

The use of regular expressions allows great flexibility in matching a particular class of markup rules. MeatballWiki's TextFormattingRules fall largely within this class, with the exception of overlapping double- and triple-quote emphasis. Regular expression matching is also very quick on most Javascript engines.

Each rule typically corresponds to a single XHTML node type, and always wraps its contents in a balanced pair of tags; thus, well-formed XML output is guaranteed. Limiting each rule's children to a subset of what XHTML allows inside that element guarantees valid XHTML output.
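The rule structure described above can be sketched roughly as follows. This is only an illustration under stated assumptions: the rule shape (tag, pattern, inner, children), the render and escapeXml helpers, and the two sample rules are hypothetical names chosen for this sketch, not the engine's actual code.

```javascript
// Hypothetical rule shape: { tag, pattern, inner, children }.
//   tag      - XHTML element the rule emits
//   pattern  - non-global RegExp describing what the rule matches
//   inner    - picks the content to descend into from a match
//   children - rules that may be applied to that content

function escapeXml(text) {
  return text.replace(/&/g, "&amp;").replace(/</g, "&lt;").replace(/>/g, "&gt;");
}

function render(text, rules) {
  // Pick the rule whose match starts earliest; ties go to the earlier rule.
  let best = null;
  for (const rule of rules) {
    const m = rule.pattern.exec(text);
    if (m && m[0].length > 0 && (best === null || m.index < best.m.index)) {
      best = { rule, m };
    }
  }
  if (best === null) return escapeXml(text); // plain text: no rule applies

  const { rule, m } = best;
  const before = text.slice(0, m.index);
  const after = text.slice(m.index + m[0].length);
  // The open and close tags are emitted together, so they always balance.
  return escapeXml(before) +
    "<" + rule.tag + ">" + render(rule.inner(m), rule.children) + "</" + rule.tag + ">" +
    render(after, rules);
}

// Illustrative rules only; these are not the engine's real rule set.
const strong = { tag: "strong", pattern: /'''(.+?)'''/, inner: (m) => m[1], children: [] };
const paragraph = { tag: "p", pattern: /[^\n]+/, inner: (m) => m[0], children: [strong] };

console.log(render("Hello '''world'''\nA < B", [paragraph]));
// <p>Hello <strong>world</strong></p>
// <p>A &lt; B</p>
```

Because a rule only ever descends into its own children, the strong rule can appear inside a p element but never the other way round.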

As an example, the table rule corresponds to the table element, and matches groups of lines starting and ending with ||. More specifically, the regex is:

: (^|\n)(\|\|.*\|\|[ \t]*(\n|$))+
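To see what this expression captures, it can be tried against a small sample page. The sample text and variable names below are made up for illustration; the regular expression is the one quoted above.

```javascript
const tableRe = /(^|\n)(\|\|.*\|\|[ \t]*(\n|$))+/;

const page = "Intro text\n||a||b||\n||c||d||\nOutro text";
const match = tableRe.exec(page);

// The match covers both table lines, including the newline that precedes them.
console.log(JSON.stringify(match[0])); // "\n||a||b||\n||c||d||\n"

// Splitting on newlines and dropping empty pieces leaves one string per row.
const rowLines = match[0].split("\n").filter((line) => line !== "");
console.log(rowLines); // [ '||a||b||', '||c||d||' ]

// A line that does not end with || is not treated as a table row.
console.log(tableRe.test("||a||b")); // false
```

Note that the leading newline is part of the match, so whatever consumes the match has to tolerate or strip it.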

The only child of the table rule is the tr rule, which matches a single line, wraps it in a tr tag, and strips off the final ||. Finally, the only child of the tr rule is the td rule, which matches a single pipe-separated block.
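The table, tr, and td chain might look something like the sketch below. The table regex is the one given above; the tr and td patterns, the content functions, and the applyRule helper are assumptions made for this example (in particular, the td pattern assumes cells contain no pipe characters). Since each rule here has exactly one child, text between child matches, such as the newlines between rows, is simply dropped.

```javascript
function escapeXml(text) {
  return text.replace(/&/g, "&amp;").replace(/</g, "&lt;").replace(/>/g, "&gt;");
}

// Each hypothetical rule has a tag, a global pattern that finds its pieces,
// a content function that strips the rule's own markup, and a single child.
const td = {
  tag: "td",
  pattern: /\|\|[^|\n]*/g,                            // one pipe-separated block
  content: (piece) => piece.slice(2),                 // drop the leading ||
  child: null,
};
const tr = {
  tag: "tr",
  pattern: /\|\|.*\|\|[ \t]*/g,                       // a single table line
  content: (piece) => piece.replace(/\|\|[ \t]*$/, ""), // strip the final ||
  child: td,
};
const table = {
  tag: "table",
  pattern: /(^|\n)(\|\|.*\|\|[ \t]*(\n|$))+/g,        // the regex quoted above
  content: (piece) => piece,
  child: tr,
};

function applyRule(rule, piece) {
  const inner = rule.content(piece);
  const body = rule.child === null
    ? escapeXml(inner)                                // leaf: emit escaped text
    : (inner.match(rule.child.pattern) || [])
        .map((childPiece) => applyRule(rule.child, childPiece))
        .join("");
  return "<" + rule.tag + ">" + body + "</" + rule.tag + ">";
}

const samplePage = "Intro\n||a||b||\n||x<y||d||\nOutro";
const tableBlocks = samplePage.match(table.pattern) || [];
console.log(applyRule(table, tableBlocks[0]));
// <table><tr><td>a</td><td>b</td></tr><tr><td>x&lt;y</td><td>d</td></tr></table>
```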

This structure can thus be verified against a DTD, guaranteeing that all future output will be strict XHTML.
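A simplified version of that check is sketched below. It does not parse a real DTD: the allowedChildren map is a small hand-written excerpt of the XHTML 1.0 Strict content model, the td entry in particular is deliberately partial, and the rule objects and the checkRules function are hypothetical. It only checks which child elements are permitted, not their order or whether required children are present.

```javascript
// Hand-written excerpt of which elements may appear directly inside which.
const allowedChildren = new Map([
  ["table", ["caption", "col", "colgroup", "thead", "tfoot", "tbody", "tr"]],
  ["tr", ["th", "td"]],
  ["td", ["p", "strong", "em", "table"]], // partial list, for illustration
]);

// Walk the rule tree and report every child that its parent may not contain.
function checkRules(rule, allowed, seen = new Set()) {
  if (seen.has(rule)) return [];          // rules may be recursive (td -> table)
  seen.add(rule);
  const errors = [];
  for (const child of rule.children) {
    const permitted = allowed.get(rule.tag) || [];
    if (!permitted.includes(child.tag)) {
      errors.push("<" + child.tag + "> is not allowed inside <" + rule.tag + ">");
    }
    errors.push(...checkRules(child, allowed, seen));
  }
  return errors;
}

// The rule tree described in the text: table -> tr -> td.
const tdRule = { tag: "td", children: [] };
const trRule = { tag: "tr", children: [tdRule] };
const tableRule = { tag: "table", children: [trRule] };
console.log(checkRules(tableRule, allowedChildren)); // []

// A deliberately broken tree: td placed directly under table.
const brokenTable = { tag: "table", children: [tdRule] };
console.log(checkRules(brokenTable, allowedChildren));
// [ '<td> is not allowed inside <table>' ]
```

Running such a check once, when the rule set is loaded, covers all future output, because the renderer can only ever nest elements the way the rule tree nests rules.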

One disadvantage of this technique is that it scans over the same portion of the string several times. In table cells with complex formatting (something that UseModWiki does not support, but that would be useful), for instance, this might get expensive. However, most of that scanning is simple loop-and-compare work, which the underlying regular expression parser can do in a relatively tight loop, so it may be less expensive than a state machine that expends thousands of instructions per byte. That said, different JavaScript engines have grossly different speeds for string manipulation. Experimentation will determine whether the performance is acceptable.
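One rough way to start that experimentation is to time the table regex over a generated page, as in the sketch below. The timeTableScan function and the sample row are invented for this example, and Date.now() gives only millisecond resolution, so the numbers are indicative at best and will differ between engines.

```javascript
// Build a page of identical table rows and time one pass of the table regex.
function timeTableScan(rows) {
  const text = "||alpha||beta||gamma||\n".repeat(rows);
  const tableRe = /(^|\n)(\|\|.*\|\|[ \t]*(\n|$))+/;

  const start = Date.now();
  const m = tableRe.exec(text);
  const elapsedMs = Date.now() - start;

  return {
    bytes: text.length,                 // size of the scanned input
    matched: m ? m[0].length : 0,       // how much the regex consumed
    elapsedMs,                          // wall-clock time for the scan
  };
}

console.log(timeTableScan(2000));
```

Repeating the measurement for the tr and td passes, and on each target engine, would show how much the repeated scanning actually costs.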

While this technique is currently limited to XHTML nodes, it can be extended to include any balanced string, which will be important in the future.

CategoryMonkey
