[Discussion extracted from MeatballWikiSuggestions]
I've done a bit of rethinking about the HTML tags, and I think I could offer a larger subset of tags safely. Here's a new plan:
- Define a list of generally-safe tags (almost everything except applet and script tags).
- More conservative ContentOverForm sites might eliminate tags like <font> from the list.
- Search for a generic tag expression (for the Perl-inspired, something like &lt;(\/)?(\w+)(\s+.+?)?&gt;, since tags are searched for *after* HTML-escaping). The three parenthesized sections (end-slash, tag-name, and tag-parameters) are passed to a "tag_verify" routine.
- I'm still considering whether to enforce tag pairs under such a scheme, at least for some subset of tags. (I'm leaning towards enforcement. Must be my HardSecurity background. ;-)
- I don't think any closing tags can have parameters--this would make pairing-enforcement easier. I'll check the specs carefully.
- The "tag_verify" routine either passes the tags as OK (and un-escapes the tag), or it rejects it (and leaves the tag safely escaped to appear as plain text).
- If the tag-name is not on the list of generally-safe tags, it is rejected.
- This also allows pseudo-HTML like <rant> to continue as normal (escaped) text.
- If on the list, and the tag-parameters are empty, accept/pass the tag.
- If the tag-parameters are not empty (they may be empty for simple tags like <br>):
- Copy the tag-parameters into $paramcopy.
- Apply a set of reducing-rules to $paramcopy, each of which removes a known-safe parameter from the copy.
- For instance, one rule might be s/(^|\s)align=\w+($|\s)/ /g. This removes parameters like align=center, replacing each with a space to preserve the spacing between the remaining parameters.
- This list should also become a configuration setting for the wiki. (One might allow a font tag, but not allow color= parameters.)
- If, after all the reducing rules are applied, there is only whitespace in the $paramcopy, the full tag is accepted. (Otherwise, it is rejected and passed through as plain text.)
- Eventually one could add rules to reject "nonsense" tags like <br align=conservative color=myfavorite>, but the first priority is simply to reject tags that may be harmful. Mere nonsense can be handled in the usual wiki way.
- The first implementation would not handle anchors or tags which contain wiki-sensitive text in the parameters (like URLs for links).
- Later versions might hide wiki-sensitive text in the parameters.
This idea probably won't take as long to implement as it did to describe. --CliffordAdams
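The plan above can be sketched roughly as follows. This is an illustration in Python rather than the wiki's Perl, not the actual implementation. The names SAFE_TAGS, REDUCING_RULES, TAG_RE and render are hypothetical, and the tag and parameter lists are placeholder assumptions standing in for site configuration. The reducing rules here use lookarounds instead of matching the neighbouring whitespace, so that adjacent safe parameters separated by a single space are all removed in one pass.

```python
import html
import re

# Hypothetical configuration: a real wiki would make both lists site settings.
SAFE_TAGS = {"b", "i", "tt", "br", "hr", "font", "p"}

# Each reducing rule deletes one known-safe parameter form from the copy.
# Lookarounds (rather than consuming the surrounding whitespace) let
# adjacent parameters such as "size=2 align=left" both be removed.
REDUCING_RULES = [
    re.compile(r"(?<!\S)align=\w+(?!\S)"),
    re.compile(r"(?<!\S)size=\d+(?!\S)"),
]

# Applied to text that has already been HTML-escaped, so "<" is "&lt;".
TAG_RE = re.compile(r"&lt;(/)?(\w+)(\s+.+?)?&gt;")


def tag_verify(match):
    """Return the un-escaped tag if it is safe, else the escaped text."""
    slash, name, params = match.group(1), match.group(2), match.group(3)
    escaped = match.group(0)
    if name.lower() not in SAFE_TAGS:
        return escaped  # e.g. <rant> stays visible as plain text
    if slash and params:
        return escaped  # assumption: closing tags never carry parameters
    if params:
        paramcopy = params
        for rule in REDUCING_RULES:
            paramcopy = rule.sub(" ", paramcopy)
        if paramcopy.strip():
            return escaped  # something not known to be safe remains
    return html.unescape(escaped)


def render(text):
    """Escape everything, then selectively un-escape verified tags."""
    return TAG_RE.sub(tag_verify, html.escape(text, quote=False))
```

With these placeholder lists, render("<font size=2 align=left>") passes the tag through unchanged, while render("<font onclick=evil()>") leaves it escaped as plain text. Pair enforcement is not attempted in this sketch.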
I'm not fond of allowing too much HTML, since it can raise the barrier to entry. Probably it should be optional.
- I'm not overly fond of "too much HTML" either. For one thing, it could make a wiki turn into little more than a publicly shared set of homepages. Of course, that's not necessarily a bad thing, just not a very wiki-like thing.
- On the other hand, I'm not sure that "wiki markup" is always a great idea. The core idea of the LinkPattern (or some simplified link) is essential, of course. The line-based markup for indentation and lists is also quite convenient. I'm less sure about the single-quote markup (although I plan to keep it around, at least as an option). I'd rather not introduce (much) more special markup. If one learns the basics of HTML one can use it in many contexts. If one learns one wiki's markup ... one has learned one wiki's markup. :-)
- The HTML support will be configurable without changing the main code. Indeed, turning off the tags will probably be a 1-character change. I'm considering some "levels" of HTML tags like:
- 0: No tags except a very few special ones (like nowiki and pre)
- 1: Very minimal (like Meatball now), adding tags like b, i, tt, and maybe a few like code, hr, br.
- 2: Minimal: add some more like h1...h6, big, small, sub, sup, p, blockquote
- 3: Intermediate: add lists (UL, OL, DL), fonts
- 4: More: tables, DIV, other layout
- 5: Extreme: any non-security-risk tag (could even include forms)
- Beyond these levels, one would have to use an <HTML>...</HTML> section. --CliffordAdams
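One way the levels might be expressed as configuration is a table of tag sets, where each level adds to the ones below it. This is only a sketch in Python. LEVEL_TAGS and allowed_tags are hypothetical names, and the list-item and table-cell tags (li, dt, dd, tr, td, th) are assumptions added to make the list and table levels usable. Level 5 ("any non-security-risk tag") is left out because the discussion does not enumerate it.

```python
# Hypothetical tag levels: each level adds tags to those of the levels below.
# Tag names follow the list above; the exact grouping is an assumption.
LEVEL_TAGS = {
    0: {"nowiki", "pre"},
    1: {"b", "i", "tt", "code", "hr", "br"},
    2: {"h1", "h2", "h3", "h4", "h5", "h6",
        "big", "small", "sub", "sup", "p", "blockquote"},
    3: {"ul", "ol", "dl", "li", "dt", "dd", "font"},
    4: {"table", "tr", "td", "th", "div"},
}


def allowed_tags(level):
    """Return the cumulative set of tags permitted at the given level."""
    tags = set()
    for lvl in sorted(LEVEL_TAGS):
        if lvl <= level:
            tags |= LEVEL_TAGS[lvl]
    return tags
```

Under this layout, switching a site from "Very minimal" to "no tags" is a change of one digit in whatever setting holds the level.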
Besides, I thought I already showed on http:mbtest.pl how slurping entire HTML tags without question is a bad idea. HTML just continues to extend itself in hairy ways. There is a proposal for scriptable CSS too.
The only safe way to include HTML would be to write an HTML parser and accept only what is safe (rejecting all else). This would require intimate knowledge of the HTML standard, which has a bad habit of changing and being totally unrelated to the RealWorld.
WikiSyntax steps around this by restricting the markup to something known to be safe. Moreover, by looking different from HTML, it doesn't encourage people to try to output <B STYLE="color:red">junk like this</B>. ;)
Anyway, if you want to remain safe, you must not allow HTML to be entered. HtmlIsAssembler. Safe environments don't provide assembler escapes.
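For comparison, the parser-based approach described above (accept only what is known to be safe, reject all else) might look roughly like this in Python, using the standard library's html.parser. ALLOWED, AllowlistFilter and sanitize are hypothetical names and the allowlist is a placeholder. This is a sketch of the idea, not a vetted sanitizer.

```python
import html
from html.parser import HTMLParser

# Hypothetical allowlist: tag name -> attribute names permitted on it.
ALLOWED = {"b": set(), "i": set(), "tt": set(), "font": {"size"}}


class AllowlistFilter(HTMLParser):
    """Re-emit only allowlisted tags and attributes; escape all else."""

    def __init__(self):
        super().__init__(convert_charrefs=True)
        self.out = []
        self.open_tags = []  # accepted start tags awaiting their close

    def handle_starttag(self, tag, attrs):
        names = {name for name, _ in attrs}
        if tag in ALLOWED and names <= ALLOWED[tag]:
            rebuilt = "".join(
                ' %s="%s"' % (name, html.escape(value or "", quote=True))
                for name, value in attrs
            )
            self.out.append("<%s%s>" % (tag, rebuilt))
            self.open_tags.append(tag)
        else:
            # Unknown tag or attribute: show the source text literally.
            self.out.append(html.escape(self.get_starttag_text()))

    def handle_endtag(self, tag):
        # Only close a tag that was accepted and is the innermost open one.
        if self.open_tags and self.open_tags[-1] == tag:
            self.open_tags.pop()
            self.out.append("</%s>" % tag)
        else:
            self.out.append(html.escape("</%s>" % tag))

    def handle_data(self, data):
        self.out.append(html.escape(data))

    def handle_comment(self, data):
        self.out.append(html.escape("<!--%s-->" % data))


def sanitize(text):
    parser = AllowlistFilter()
    parser.feed(text)
    parser.close()
    # Close anything the author left open, innermost first.
    closers = ["</%s>" % tag for tag in reversed(parser.open_tags)]
    return "".join(parser.out + closers)
```

Here sanitize('<B STYLE="color:red">junk</B>') comes back fully escaped, because STYLE is not on the allowlist for b. The maintenance burden raised above still applies: the allowlist has to track whatever the HTML standard and browsers actually do.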
- I'm not planning on providing full HTML. Style sheets are not going to be supported, at least until Netscape works well with them. (I'm not holding my breath.) (If you want CSS, I hear MoinMoin is pretty good. :-) While the standards bodies scream "stop using old-fashioned tags", I think CSS will be about as successful as SGML.
- For security, the plan is to allow a carefully-selected subset of parameters. For instance "(^|\s)color=#cc0066(\s|$)" is not a security risk, so it would be removed from the copy. After all known-safe parameters are removed, the tag will be rejected if there is even one non-whitespace character remaining. (I probably won't allow quoted parameters unless I'm sure I can match the quotes properly.)
- For community feel, Meatball will probably remain at the "Very Minimal" tag level. I have no desire to see H1 tags either. Other sites feel differently. One of the most active UseModWikis even allows raw HTML, and frequently uses tags for colors or other presentation. (It's a private wiki for a college class.) Of course, other tags might be allowed on Meatball if the local GodKings approve... --CliffordAdams