The engine consists of rules, each of which has a regular expression describing what it matches and a list of child rules that are applied, by recursive descent, to the text it matches.
Each rule typically corresponds to a single XHTML element type and always wraps its contents in a balanced pair of tags, so the output is guaranteed to be well-formed XML. Because each rule's children are further restricted to elements that XHTML permits inside that rule's element, the output is also guaranteed to be valid XHTML.
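The rule structure described above can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the engine's actual code: the Rule class, the render function, the convention that a regex exposes the matched contents as a named group called body, and the ''...'' emphasis syntax in the usage lines are all hypothetical. Text that no rule matches is escaped, so every tag in the output is one that some rule emitted as a balanced pair.

```python
import re
from dataclasses import dataclass, field
from html import escape


@dataclass
class Rule:
    """Hypothetical rule: the XHTML tag it emits, what it matches, its child rules."""
    tag: str
    pattern: re.Pattern
    children: list["Rule"] = field(default_factory=list)


def render(text: str, rules: list["Rule"]) -> str:
    """Apply `rules` to `text` by recursive descent.

    Each match is wrapped in a balanced <tag>...</tag> pair and its contents
    are rendered with that rule's children; unmatched text is escaped.
    """
    out, pos = [], 0
    while True:
        # Pick the earliest non-empty match among the candidate rules;
        # on a tie, the rule listed first wins.
        best = None
        for rule in rules:
            m = rule.pattern.search(text, pos)
            if m and m.end() > m.start() and (best is None or m.start() < best[1].start()):
                best = (rule, m)
        if best is None:
            break
        rule, m = best
        out.append(escape(text[pos:m.start()]))
        # Assumed convention: the named group "body" holds the contents,
        # otherwise the whole match is the contents.
        if "body" in m.re.groupindex:
            body = m.group("body") or ""
        else:
            body = m.group(0)
        out.append(f"<{rule.tag}>{render(body, rule.children)}</{rule.tag}>")
        pos = m.end()
    out.append(escape(text[pos:]))
    return "".join(out)


# Hypothetical rules: a paragraph per line, with ''...'' emphasis inside it.
em = Rule("em", re.compile(r"''(?P<body>.+?)''"))
p = Rule("p", re.compile(r"(?P<body>[^\n]+)"), [em])
print(render("a ''b'' <c>", [p]))  # <p>a <em>b</em> &lt;c&gt;</p>
```

Escaping unmatched text is what keeps a stray "<c>" in the input from turning into an unbalanced tag in the output.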
For example, the table rule corresponds to the table element and matches runs of consecutive lines that start and end with ||. More specifically, the regex is:
The only child of the table rule is the tr rule, which matches a single line, wraps it in a tr element, and strips off the trailing ||. Finally, the only child of the tr rule is the td rule, which matches a single ||-separated cell.
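The table, tr, and td chain can be sketched as three nested rules. The regexes here are assumptions reconstructed from the prose, not the engine's actual patterns: a table is a run of lines that each start and end with ||; a row keeps its leading || and drops the final one; a cell is the text following each ||. Cells are assumed not to contain a pipe character, and the helper names (wrap_matches, render_table, render_tr, render_td) are hypothetical.

```python
import re
from html import escape

# Assumed patterns, reconstructed from the description rather than copied.
# table: a run of consecutive lines that each start and end with "||".
TABLE_RE = re.compile(r"^(?P<body>\|\|.*\|\|[ \t]*(?:\n\|\|.*\|\|[ \t]*)*)$", re.M)
# tr: one such line; the body keeps the leading "||" but drops the final one.
TR_RE = re.compile(r"^(?P<body>\|\|.*)\|\|[ \t]*$", re.M)
# td: one "||"-prefixed block; cells are assumed to contain no "|".
TD_RE = re.compile(r"\|\|(?P<body>[^|]*)")


def wrap_matches(pattern: re.Pattern, tag: str, text: str, inner) -> str:
    """Wrap each match in <tag>...</tag>, rendering its body with `inner`.

    Text outside the matches is escaped, so only these tags can appear.
    """
    out, pos = [], 0
    for m in pattern.finditer(text):
        out.append(escape(text[pos:m.start()]))
        out.append(f"<{tag}>{inner(m.group('body'))}</{tag}>")
        pos = m.end()
    out.append(escape(text[pos:]))
    return "".join(out)


def render_td(body: str) -> str:
    return escape(body.strip())  # leaf rule: no children


def render_tr(body: str) -> str:
    return wrap_matches(TD_RE, "td", body, render_td)  # only child: td


def render_table(body: str) -> str:
    return wrap_matches(TR_RE, "tr", body, render_tr)  # only child: tr


def render(text: str) -> str:
    return wrap_matches(TABLE_RE, "table", text, render_table)


print(render("||a||b||\n||c||d||"))
```

For the two-line input, this prints one table element holding two tr rows of two td cells each; lines that do not both start and end with || are passed through as escaped text.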
This rule structure can therefore be verified against the XHTML DTD, and once it passes, all output the engine produces is guaranteed to be strict XHTML.
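One way to read this check is as a property of the rule graph rather than of any particular output: if every parent and child pair of rules is allowed by the content model, every element nesting the engine can emit is allowed too. The sketch below assumes a hypothetical Rule type that keeps only a tag and its children, and uses a small hand-copied subset of the XHTML 1.0 Strict content model in place of parsing a real DTD. It checks only which child elements may appear, not their order, count, or character data.

```python
from dataclasses import dataclass, field

# Hand-copied subset of the XHTML 1.0 Strict content model: for each element,
# the child elements it may contain. Order and cardinality (e.g. "tr+") are ignored.
INLINE = {"em", "strong", "code"}
BLOCK = {"p", "ul", "table"}
FLOW = INLINE | BLOCK
ALLOWED = {
    "body": BLOCK,
    "p": INLINE,
    "em": INLINE,
    "strong": INLINE,
    "code": INLINE,
    "ul": {"li"},
    "li": FLOW,
    "table": {"caption", "col", "colgroup", "thead", "tfoot", "tbody", "tr"},
    "tr": {"th", "td"},
    "th": FLOW,
    "td": FLOW,
}


@dataclass(eq=False)
class Rule:
    """Hypothetical rule reduced to what matters here: its tag and child rules."""
    tag: str
    children: list["Rule"] = field(default_factory=list)


def violations(root: Rule, parent: str = "body") -> list[str]:
    """List every parent > child pair in the rule graph that ALLOWED forbids.

    The graph may be cyclic (a td rule can contain the table rule), so each
    rule is expanded only once.
    """
    errors = []
    if root.tag not in ALLOWED.get(parent, set()):
        errors.append(f"{parent} > {root.tag}")
    seen, stack = {id(root)}, [root]
    while stack:
        rule = stack.pop()
        for child in rule.children:
            if child.tag not in ALLOWED.get(rule.tag, set()):
                errors.append(f"{rule.tag} > {child.tag}")
            if id(child) not in seen:
                seen.add(id(child))
                stack.append(child)
    return errors


td = Rule("td")
tr = Rule("tr", [td])
table = Rule("table", [tr])
td.children.append(table)  # nested tables are allowed: td holds %Flow;
print(violations(table))  # []

bad = Rule("p", [Rule("ul")])  # p may hold inline content only
print(violations(bad))  # ['p > ul']
```

Because the check runs over rules rather than documents, it only needs to pass once, which is what lets it speak for all future output within the limits noted above.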
While this technique is currently limited to XHTML elements, it can be extended to any balanced string, a capability that will become important later.