EmphasisPattern

MeatballWiki | RecentChanges | Random Page | Indices | Categories

The WikiSyntax used to format text as bold or italic, or whatever. Several schemes are in common use:

Triple-single-quote

This is used by WardsWiki and also by our current host, MeatballWiki. Italics use double-single-quotes and bold uses triple-single-quotes, like

 '''bold''' ''italic''
The main advantage is that it's fairly rare to get two or more consecutive single-quote marks in normal text, so people who don't know about syntax can almost ignore it. It's also quite quick to type.

The downside is that it is not especially obvious or standard outside of Wiki. If you weren't told, I don't think you'd ever guess it. Even when you know, it's not obvious which is bold and which is italic. Reading marked-up text isn't easy, either, especially if double-quotes are being used for quoting. You have to carefully count the tiny little dots.
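A minimal sketch of one common way to implement this scheme, assuming plain regular-expression substitution (the function and pattern names are illustrative and not taken from any particular wiki engine). Bold has to be handled before italics, and non-greedy matching pairs the quote marks strictly from left to right.

```python
import re

# Bold must be replaced before italics: otherwise the '' rule would grab
# the first two quotes of every ''' marker and leave a stray quote behind.
BOLD = re.compile(r"'''(.*?)'''")
ITALIC = re.compile(r"''(.*?)''")


def render_quotes(text: str) -> str:
    """Turn '''bold''' and ''italic'' wiki markup into HTML tags."""
    text = BOLD.sub(r"<b>\1</b>", text)
    return ITALIC.sub(r"<i>\1</i>", text)
```

Because each pair of '' is matched from left to right, a nested ''a ''b'' c'' comes out with its middle part in plain text: render_quotes("''a ''b'' c''") gives <i>a </i>b<i> c</i>. Ordinary apostrophes, as in "it's Bob's", are left alone because they never appear doubled.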

Asterisks

This is used quite widely on Usenet. Bold uses asterisks. Italics use either underlines or slashes (in which case underlines will underline). For example:

  *bold* _italic_
  *bold* /italic/ _underline_
The main advantage is that it is what people naturally type when all they have is ASCII. It's short and readable. With the second version it's also pretty obvious what each markup does.

The downside is that it is very ambiguous. Asterisks and slashes are fairly common in normal text, especially in program text (C uses them for comments, multiplication and dereferencing; filenames and URLs use slashes). You can try to disambiguate it (for example by insisting on whitespace to one side), but that gets hairy.
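One way to attempt that disambiguation, sketched under the assumption that a marker only opens after whitespace (or an opening bracket) and only closes before whitespace or punctuation. The rule set, tag choices and function name are hypothetical; they illustrate the idea rather than any particular wiki's syntax, and the sketch does not HTML-escape its input.

```python
import re

# Hypothetical rules: marker -> HTML tag.
RULES = [("*", "b"), ("/", "i"), ("_", "u")]


def _pattern(marker: str) -> re.Pattern:
    m = re.escape(marker)
    # Opening marker: at start of text or after whitespace / "(" / "[",
    #   and immediately followed by a non-space character.
    # Closing marker: immediately after a non-space character, and followed
    #   by end of text, whitespace, or trailing punctuation.
    return re.compile(
        rf"(?<![^\s(\[]){m}(?=\S)(.+?)(?<=\S){m}(?![^\s.,;:!?)\]])"
    )


def render_ascii_emphasis(text: str) -> str:
    """Apply *bold*, /italic/ and _underline_ with whitespace disambiguation."""
    for marker, tag in RULES:
        text = _pattern(marker).sub(rf"<{tag}>\1</{tag}>", text)
    return text
```

With these rules *bold* and /italic/ are recognized, while a*b*c, /usr/local/bin and car/boat/train pass through untouched. An input such as *p and *q* still gets bolded as a single span, which is where it starts to get hairy.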

XML/HTML

Used by, well, HTML. Tags in angle brackets, bold uses B, italic I. (STRONG and EM are sometimes used too.) E.g.:

   <B>bold</B>  <I>italic</I>
The main advantage is that a great many people know it from HTML or XML, and it is fairly clear what is going on even if you don't know HTML. It's reasonably unambiguous. It can form part of a consistent XML-based syntax (although you can also be more forgiving than strict XML; for example, you don't have to require that &lt; be used instead of < everywhere else).

It is quite fiddly to type (although at 7 characters it is only 1 character longer than triple-single-quote for bold) and looks quite techie. It may also encourage people to believe your Wiki supports more HTML than it does.


The techie problem won't go away if you're only using plaintext, because 7-bit ASCII (the Internet standard plaintext format) includes neither "bold" nor "italic". So you're stuck with metacharacters. The other option is to not use plaintext, but write a more WYSIWYG text editor in, say, Java or JavaScript. -- SunirShah


One reason why single quotes (or slashes) are superior to HTML tags is that they are easier to refactor. When I want to flip the emphasis style of a paragraph with quotes, I just surround it with two single quotes on either side (or a single slash on each side). Any italics on the inside of the paragraph automatically do the Right Thing and revert to normal text.

On the other hand, the <i></i> tags don't automatically flip their meaning. Instead, you will get something weird:

<i>Beginning of italics. <i>Embedded italics</i>. End of italics</i>
Beginning of italics. <i>Embedded italics. End of italics</i>

Compare that with:

''Beginning of italics. ''Embedded italics''. End of italics''
Beginning of italics. Embedded italics. End of italics

How much of that is due to Meatball bugs? The generated HTML does not have properly nested tags. Meatball has generated <i> explicitly so the browser doesn't have a chance. Admittedly I wouldn't expect the inner tag to flip its meaning but you should at least get the whole lot in italics.

It's not a bug; it's a misunderstood feature. :-) I'm not certain what the proper standard behavior is for this case, but I'm pretty sure I'm not going to support nested uses of HTML tags. What the wiki code is doing is:

...So what happens is that the wiki first sees <i>Beginning of italics. <i>Embedded italics</i>, which it replaces with an italic span whose text is "Beginning of italics. <i>Embedded italics" (the inner <i> stays as escaped, literal text). Arguably the replaced text should be scanned again, but it is not, so the overlapped tag is not recognized. Note that all <>& characters are first escaped, and only unescaped if a valid tag pair is detected.

I'm inclined to see nested identical tags as a non-supported HTML feature of the wiki pseudo-tags. If you really want that level of control, then the HTML tag (which wraps raw HTML) is the way to do it. Finally, short sequences of italicized text (like the above example) are easy to fix. Long sequences of italicized text should simply be avoided for readability. (Also, if one is using italics to separate authors, one could easily be confused and think the embedded (non-)italics are from a different author.) --CliffordAdams
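The behaviour described above can be reproduced with a short sketch. This is a hypothetical reconstruction, not Meatball's actual code: it assumes the escaped text is scanned once with a non-greedy pattern, and it only recognizes <b> and <i> pseudo-tags.

```python
import html
import re

# Matches an escaped tag pair such as &lt;i&gt;...&lt;/i&gt;. The non-greedy
# (.*?) pairs each opening tag with the FIRST matching closing tag after it,
# and the replaced text is not rescanned.
ESCAPED_PAIR = re.compile(r"&lt;(b|i)&gt;(.*?)&lt;/\1&gt;")


def render_pseudo_tags(text: str) -> str:
    """Escape <, > and &, then turn valid <b>/<i> pairs back into HTML."""
    escaped = html.escape(text, quote=False)
    return ESCAPED_PAIR.sub(r"<\1>\2</\1>", escaped)
```

Run on the nested example, it yields <i>Beginning of italics. &lt;i&gt;Embedded italics</i>. End of italics&lt;/i&gt;, which matches the output shown earlier on this page: the inner <i> and the final </i> survive as literal text.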

I don't much care what the Meatball code is doing. My point is that it can do something else. It can interpret nested tags to give whatever semantics are desired. If you don't want to rely on nested tag support in browsers, fine. You can turn wiki text like:

<i>Beginning of italics. <i>Embedded italics</i>. End of italics</i>

into html like:

<i>Beginning of italics. </i>Embedded italics<i>. End of italics</i>

to force the browser to behave as for double-single-quotes. You have a choice. The original comment, "One reason why single quotes are superior" was just talking about a specific implementation, not a limitation of tags generally. -- DaveHarris
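A minimal sketch of that transformation, assuming only <i> pseudo-tags and ignoring HTML escaping: track the nesting depth and italicize text wherever the depth is odd. The function name is hypothetical.

```python
import re

# Split on the pseudo-tags but keep them (capturing group) so we can count depth.
TOKEN = re.compile(r"(<i>|</i>)")


def flip_nested_italics(text: str) -> str:
    """Rewrite nested <i> pseudo-tags so each nesting level toggles italics."""
    out = []
    depth = 0
    for piece in TOKEN.split(text):
        if piece == "<i>":
            depth += 1
        elif piece == "</i>":
            depth = max(depth - 1, 0)  # ignore unmatched closing tags
        elif piece:
            # Odd depth means italic; even depth means back to normal text.
            out.append(f"<i>{piece}</i>" if depth % 2 == 1 else piece)
    return "".join(out)
```

On the example above it produces exactly <i>Beginning of italics. </i>Embedded italics<i>. End of italics</i>. Because every italic run is emitted with its own opening and closing tag, the output is always properly nested, even when the wiki text has unmatched tags.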

This is a mistake. You would be lying to the author. The author has explicitly stated that the Embedded italics will be italic, i.e. the behaviour of HTML. The best that the Meatball software should do is:

<i>Beginning of italics. <i>Embedded italics</i>. End of italics</i>

Remember that HtmlIsAssembler, even if you can preparse it and do magical things to it. People will expect that, if you give them font style tags like <I> and <B>, those tags will force the style just like HTML (assembler).

The whole problem is using a StyleTag instead of a SemanticTag like <EM> or <STRONG>. There is no correct way to transform the style information semantically. -- SunirShah

Sunir, can you expand upon this? I understand what you're saying but not why/what you think it should be instead. -- AdamShand

You don't have to copy HTML in your markup, although I agree it might be surprising if you don't. Notice the original section for this approach is titled "XML/HTML" and includes <EM>. -- DaveHarris


For some surveys detailing different emphasis patterns, see the bottom of CharacterFormattingRules.


Why are there such SillyTextFormattingRules?

Silly Rules: Using dozens of single-quotes for various levels of emphasis; using various numbers of underscores for levels of emphasis; ditto for asterisks.

The EmphasisPattern may have a good reason for the use of so many single-quote constructions in the original wiki, but with so many other approaches being taken by other wiki engines I don't see why it's a worthwhile convention to propagate. But what I really don't understand is why, with all the different, incompatible conventions, no one has tried using the standard text formatting conventions that have evolved in email communications. Everyone I know uses /words contained in slashes/ for emphasis (italics); *words surrounded by asterisks* for strong emphasis (bold); and _words surrounded by underscores_ for underlined text. What is wrong with using these conventions? Why does no one use them? (anon)

I don't agree with the use of *bold* /italic/ and _underline_ conventions. If your wiki is likely to contain source code, this will be a source of confusion. I tried it for my own wiki engine [1]. Currently I prefer to actually /leave/ these extra characters in there. It gives the text a special e-look which is just fine by me. -- AlexSchroeder

I also think it's OK to leave the extra characters in there. In particular, when I'm quoting a chunk of C code, it's very important that every character in the original be visible. I also think making the read-time view of the wiki page more like the edit-time view of the wiki page makes it easier to learn how to edit wiki pages. I disagree that using the *bold* /italic/ or _underline_ conventions will cause confusion when the wiki contains source code. Most source code (and ASCII art) has one or more spaces at the start of the line, and some wikis recognize this and disable most formatting so source code (and ASCII art) look fine. -- DavidCary

The _underline_ in hand-written text and pages from a typewriter is semantically identical to /italic/ in printed books. I think both should map to <em>emphasis</em> in HTML. (I thought I saw someone else make the same point at Wiki:RealMenDoNotUnderline, but now I think I must have read that somewhere else ... perhaps http://emich.edu/web_standards_guide/text_style.html ). I currently lean towards saying that *asterisks* should also map to <em>emphasis</em>, but it would be pretty easy to persuade me they should map to <strong>strong</strong> in HTML.

Note that the EmphasisPattern involves two prime numbers (2 and 3). Consider the mathematical benefits of this, such as inverting the emphasis on large blocks of text.

Alex, I think you've identified a big dividing line in wiki, i.e., that a wiki should put a high priority on editing and display of source code. For a wiki devoted to programmers this might be the case, but if most of the users of a wiki are non-programmers and/or they're predominantly typing natural language, it seems that the priority would be for the most natural typing convention, and provide some kind of "specialized" syntax for code (like maybe just that leading whitespace). If the leading argument against (anon)'s question is code, the priorities seem reversed. How often does one really come across code, in comparison to people communicating with bold and italic? This was one of the strangest things for me in writing on a wiki. I've been doing email and Usenet since about 1982, and I've always used *bold* and _underline_, as has most everyone I can think of. I have seldom (to my recollection) seen forward slashes used for italics, and I think that would be pretty dangerous, in that slashes are fairly common in normal text, such as forming alternate lists (e.g., "Jerry took the car/boat/train"), writing file paths, and various grammatical constructions. So perhaps for italics, two apostrophes would be fine. But the current syntax seems very odd in (seemingly) putting a higher priority on code than on normal communication. I can't tell my users (who are professional writers) to use three apostrophes to obtain bold. They'd think I was nuts. "Why not asterisks?" would be their reply. -- MurrayAltheim

It is true that using these symbols may mess up programming source code, but all source code should be displayed in a monospace font anyway. So we can either disallow these formatting commands in monospace-formatted text, or create a source code format where the symbols are ignored. - Si Dunford.
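A minimal sketch of the leading-whitespace convention mentioned in this discussion: lines starting with a space or tab are treated as preformatted source and never passed through the emphasis rules. The inline argument stands in for whatever emphasis renderer the wiki uses; the function name and the <pre> wrapping are assumptions.

```python
import html
from typing import Callable


def render_page(text: str, inline: Callable[[str], str]) -> str:
    """Apply `inline` emphasis rules to prose lines only.

    Lines that start with a space or tab are treated as preformatted source
    code: they are HTML-escaped and grouped into <pre> blocks, but never
    passed to `inline`, so *, / and _ inside code are left alone.
    """
    out: list[str] = []
    code: list[str] = []

    def flush_code() -> None:
        # Emit any pending run of code lines as a single <pre> block.
        if code:
            escaped = "\n".join(html.escape(line, quote=False) for line in code)
            out.append(f"<pre>{escaped}</pre>")
            code.clear()

    for line in text.splitlines():
        if line[:1] in (" ", "\t"):
            code.append(line)  # preformatted: no emphasis markup applied
        else:
            flush_code()
            out.append(inline(line))
    flush_code()
    return "\n".join(out)
```

For example, with a bold rule as inline, a prose line "Use *bold* here." is formatted while an indented line "  x = *p * *q;" is emitted verbatim inside <pre>.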

