MeatballWiki | RecentChanges | Random Page | Indices | Categories

Since there is no direct way to embed meta-information in an ASCII stream, it has become common to use meta characters: special characters or strings that the parser recognizes as having special meaning rather than as literal content. For instance, in HTML the < and > characters are meta characters. On this wiki, a pair of double quotes ("") breaks a LinkPattern.

However, it is very difficult to choose a meta syntax that won't conflict with valid content, especially in a free-form text medium like HTML or a wiki. In fact it is impossible in general, because you could easily start talking about the medium's syntax in the medium itself (meta-circular!), in which case you're toast.


Always provide a mechanism to EscapeMetaCharacters, and make the escape mechanism itself escapable: the escape character needs an escape of its own.

For instance, in HTML you can write &lt; and &gt; in place of < and >. That makes the ampersand (&) a meta character too, so it gets its own escape: &amp;.
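A minimal Python sketch of that rule, modeled on HTML's three entities. The function names are made up for illustration (Python's standard library already offers html.escape and html.unescape). The order of the replacements is the point: & must be escaped first and unescaped last, or the escape sequences themselves get mangled.

```python
def escape_meta(text: str) -> str:
    """Replace the HTML meta characters &, < and > with entities."""
    # '&' goes first: escaping it after '<' would turn '&lt;' into '&amp;lt;'.
    return text.replace("&", "&amp;").replace("<", "&lt;").replace(">", "&gt;")


def unescape_meta(text: str) -> str:
    """Invert escape_meta()."""
    # '&amp;' goes last: decoding it first would turn an escaped '&lt;'
    # (stored as '&amp;lt;') into '&lt;' and then, wrongly, into '<'.
    return text.replace("&lt;", "<").replace("&gt;", ">").replace("&amp;", "&")


# Talking about the syntax in the syntax itself round-trips cleanly.
sample = "In HTML, write &lt; for <"
assert unescape_meta(escape_meta(sample)) == sample
```

Because the escape character has its own escape (&amp;), every possible input, including text about the escapes, comes back unchanged.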

An alternative to EscapeMetaCharacters is to accept a narrower range of valid input. For instance, both the C2 wiki and UseModWiki define a "Field Separator" (FS) character which is used to delimit user-provided text. The FS character is *removed* from any user input. This means, among other things, that these wikis cannot store raw binary data (which may include the FS character).
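A sketch of the narrowing approach in Python. The separator value is a hypothetical choice (the ASCII "file separator" control code), not the value either wiki actually uses. No escaping is needed, because the separator simply cannot survive in a field.

```python
FS = "\x1c"  # hypothetical separator: the ASCII "file separator" control code


def clean(field: str) -> str:
    """Narrow the input: silently drop any FS characters the user supplied."""
    return field.replace(FS, "")


def pack(fields: list[str]) -> str:
    """Join fields into one record; safe because no field can contain FS."""
    return FS.join(clean(f) for f in fields)


def unpack(record: str) -> list[str]:
    """Split a record produced by pack() back into its fields."""
    return record.split(FS)
```

Ordinary text round-trips, but a field containing FS comes back altered, which is exactly why such a store cannot hold arbitrary binary data.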

Another approach is to limit the valid characters in certain meaningful parts of a document. For instance, the double-quote (") character is not allowed in a URL; it must be percent-encoded as %22. This makes parsing double-quote-delimited URLs much simpler, since the next double quote always ends the URL.
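A Python sketch of why the restriction helps. The helper read_quoted is hypothetical; quote and unquote are the standard urllib.parse functions. Since a literal " can never appear inside a URL, the parser needs no escape handling at all.

```python
from urllib.parse import quote, unquote


def read_quoted(text: str, start: int) -> tuple[str, int]:
    """Return the URL between the '"' at text[start] and the next '"',
    plus the index just past the closing quote."""
    # No escape handling: a literal '"' can never occur inside a URL,
    # so the very next '"' must be the closing delimiter.
    end = text.index('"', start + 1)
    return text[start + 1:end], end + 1


# A quote inside the URL is written as %22, so it cannot end the attribute early.
url = "http://example.com/search?q=" + quote('"exact phrase"')
markup = '<a href="' + url + '">link</a>'
found, _ = read_quoted(markup, markup.index('"'))
assert found == url
```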

The interaction of different escape sequences can create problems. The characters in "&amp;" are perfectly valid in a URL, and there they do *not* represent a single ampersand character. (In UseModWiki URLs, the "&amp;" text is "de-quoted" back to a single "&" character.) The official recommendation is to avoid using "&" to separate CGI parameters and to use the semicolon (";") instead. Properly written CGI scripts should accept ";" in place of "&" ( http://rdrop.com/~cary/html/html.html#semicolon ).
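A small Python sketch of the interaction. The parser parse_query is hypothetical (not UseModWiki's code) and accepts either separator. Fed the raw "&amp;" text without HTML decoding, it produces a spurious "amp" parameter; decoding with the standard html.unescape fixes that, and with ";" there is no ampersand to quote in the first place.

```python
import html
import re
from urllib.parse import unquote_plus


def parse_query(query: str) -> list[tuple[str, str]]:
    """Split a CGI query string on either '&' or ';' into (key, value) pairs."""
    pairs = []
    for part in re.split(r"[&;]", query):
        if part:  # skip empty pieces, e.g. from a trailing separator
            key, _, value = part.partition("=")
            pairs.append((unquote_plus(key), unquote_plus(value)))
    return pairs


# The query as it appears in HTML source, with '&' escaped as '&amp;'.
in_html = "action=edit&amp;id=FrontPage"

wrong = parse_query(in_html)                  # '&amp;' taken literally
right = parse_query(html.unescape(in_html))   # entity decoded first
semi = parse_query("action=edit;id=FrontPage")  # nothing to decode
```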

Classic wiki formatting has a lot of special sequences that don't always interact nicely. For instance, the single quote (') character is legal within a URL, but two adjacent single quotes might be interpreted as a formatting command (if there is a matching pair of single quotes). Wikis generally follow the JargonFile:DontDoThatThen strategy of dealing with these conflicts.
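A tiny Python illustration of such a conflict, assuming a hypothetical italics rule where text between two pairs of single quotes becomes italic. Applied blindly, the rule reaches inside a URL.

```python
import re

# Hypothetical wiki rule: text between '' and '' is italic.
ITALIC = re.compile(r"''(.+?)''")


def naive_italics(text: str) -> str:
    """Apply the italics rule everywhere, including inside URLs."""
    return ITALIC.sub(r"<i>\1</i>", text)


# The quotes are legal URL characters, but the rule fires anyway.
print(naive_italics("see http://example.com/it''s''ok"))
# → see http://example.com/it<i>s</i>ok
```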

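Often one can achieve most desired behaviors with separate layers of escapes. For instance, UseModWiki processes a page in a series of steps, and at several of them it sets aside ("stores") some text so that later steps cannot alter it.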

After the whole page is processed, all stored text is reinserted into the page. Text that is stored at earlier stages receives less processing than text stored later. (HTML sections are unprocessed, NOWIKI text has only HTML-quoting, PRE-formatted sections are done after joining backslashed lines, etc.)
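The store-and-reinsert idea can be sketched in Python. This is a simplified, hypothetical model, not UseModWiki's actual Perl code: the stage list, the regexes, and the \x00-delimited markers are assumptions for illustration. NOWIKI text is stored first and gets only HTML quoting; links are stored before the '' rule runs, so quotes inside a URL survive.

```python
import html
import re

URL = re.compile(r'https?://[^\s"<>\x00]+')  # stops at whitespace, quotes, markers
NOWIKI = re.compile(r"<nowiki>(.*?)</nowiki>", re.S)
ITALIC = re.compile(r"''(.+?)''")
MARKER = re.compile(r"\x00(\d+)\x00")


def render(text: str) -> str:
    stored: list[str] = []

    def store(fragment: str) -> str:
        """Set finished HTML aside and leave a numbered marker in its place."""
        stored.append(fragment)
        return f"\x00{len(stored) - 1}\x00"

    def link(m: re.Match) -> str:
        u = html.escape(m.group(0), quote=False)
        return store(f'<a href="{u}">{u}</a>')

    # The marker character must never come from the user (cf. the FS trick).
    text = text.replace("\x00", "")
    # Stage 1: NOWIKI text gets HTML quoting only, then is stored.
    text = NOWIKI.sub(lambda m: store(html.escape(m.group(1), quote=False)), text)
    # Stage 2: bare URLs become links, stored before any wiki markup runs.
    text = URL.sub(link, text)
    # Stage 3: quote whatever HTML remains in the ordinary text.
    text = html.escape(text, quote=False)
    # Stage 4: wiki formatting, which can no longer see stored text.
    text = ITALIC.sub(r"<i>\1</i>", text)
    # Finally, reinsert every stored fragment verbatim.
    return MARKER.sub(lambda m: stored[int(m.group(1))], text)
```

As with the FS character, the markers are safe only because \x00 is stripped from the input before anything else happens.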

We had an earlier discussion somewhere about replacing the # in REDIRECTs, because # is a valid character in ordinary text. Then again, ! is also a valid prefix character in some bizarre languages. -- SunirShah

The text #REDIRECT is only special at the start of a document. Wikis are full of position-sensitive formatting. People can generally live with the mild restrictions imposed by wikis.
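Position sensitivity costs the parser very little. In this Python sketch (the exact form of the directive, "#REDIRECT" plus whitespace plus a page name, is an assumption), re.match only tries a match at the beginning of the page, so a # anywhere else is ordinary text.

```python
import re
from typing import Optional

# Hypothetical syntax: "#REDIRECT", whitespace, then a page name.
REDIRECT = re.compile(r"#REDIRECT\s+(\S+)")


def redirect_target(page: str) -> Optional[str]:
    """Return the target page name if `page` is a redirect, else None."""
    m = REDIRECT.match(page)  # match() only looks at the start of the string
    return m.group(1) if m else None
```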

After thinking a bit, I think judicious use of <nowiki> tags would be enough to get around the problem of metacharacters in code. --ss

Actually, the <PRE> ... </PRE> tags are probably better for most code: a preformatted section will skip any link formatting or other wiki markup. The nowiki tags will not translate line breaks or paragraphs; they are meant for rare uses where one wants to turn off formatting within regular text. For instance, if I want the name of this page to appear without being a link, I can use the nowiki tags like EscapeMetaCharacters within an ordinary sentence. --CliffordAdams

