WikiMarkupStandard

MeatballWiki | RecentChanges | Random Page | Indices | Categories

[Japanese translation] | comments removed to ShallowPage#tooBig
This page discusses ways to allow visitors from one wiki engine to edit pages on other wikis without having to learn their WikiSyntax. There are quite a lot of WikiEngines out there by now. Most have a similar, but not identical markup language.

This page tries to determine a potential convergence target for text markup rules, which we call the "BasicSet?": a set that most wiki engine authors would be willing to support.

At WikiSym 06 there was a workshop at which WikiCreole was agreed upon.

CategoryWikiStandard (proposal)

Contents

1. Creating the WikiMarkupStandard
1.1. Suggestion A
1.2. Suggestion B
1.3. General Guidelines for a Future Markup Standard
2. Collecting Existing Markup
2.1. Links
2.1.1. WikiWord Links
2.1.2. Nonwiki Links
2.1.3. Free Links
2.1.4. Titled Links
2.1.5. Links to Anchors
2.1.6. Interwiki Links
2.1.7. Avoiding Linking
2.1.8. Assorted
2.2. Text Formatting
2.2.1. Assorted
2.2.2. Citations
2.2.3. Emphasis (italics)
2.2.4. Bold
2.2.5. Inserted Text (underline)
2.2.6. Deleted Text (strikethrough)
2.2.7. Monospaced Text
2.2.8. Literal/Unprocessed Text
2.2.9. Superscript Text
2.2.10. Subscript Text
2.2.11. Large Text
2.2.12. Small Text
2.2.13. Centered Text
2.2.14. Left Aligned Text
2.2.15. Right Aligned Text
2.3. Line Breaks
2.4. Headings
2.5. Lists and Indentations
2.5.1. Known Bullet Characters
2.5.2. Unordered Lists
2.5.3. Ordered Lists
2.5.4. Indentations / Block Quotes
2.5.5. Definition Lists
2.6. Tables
2.6.1. Cell Attributes
2.7. Horizontal Rules/Separators
2.8. Meta-Wiki
2.8.1. Macros, Variables, Plugins and Extensions
2.8.2. Comments
2.8.3. Processing Instructions and Meta Data
2.9. Character Replacement
3. X/HTML Markup in Wikis
4. Suggested Basic Set
4.1. Basic Set A
4.2. Plan B
4.3. Basic Set B
4.4. TikiWiki RFC
4.5. Heilbronn University Proposal
5. General Discussion

CategoryTextMarkup

1. Creating the WikiMarkupStandard

1.1. Suggestion A

Goal

Process

  1. Collect ideas and existing markup.
  2. Find optimum markup (not conflicting internally).
  3. Define a convergence standard that could become an official standard.
  4. Implement parsers and converters for converting existing markup to target markup (the plan is to migrate to the new markup, not to support it additionally, see bloat.)

Problems

1.2. Suggestion B

Goal

Process

  1. Collect existing markup.
  2. Find useful common subset with few exceptions.
  3. Ask wiki engine authors to implement the remaining exceptions.

Problems

1.3. General Guidelines for a Future Markup Standard

2. Collecting Existing Markup

2.1. Links

2.1.1. WikiWord Links

2.1.2. Nonwiki Links

External link
http:// or ftp:// or gopher:// etc.
Email link
name@domain.org or mailto:name@domain.org
File attachment of a page
attachment:filename.doc
File from user upload area
Upload:UserName/filename.doc

2.1.3. Free Links

Inline

Block

What is the difference between an "inline free link" and a "block free link"? Are there *any* wikis that implement a "block free link"?

free links with square brackets

free links without square brackets

Discussion
Link thoughts: We may have to demand that a wiki supporting WikiMarkupStandard has to support free links - or we will have migration problems. CamelCase links could be emulated by free links, so that we may want them, but not need to demand them. (Alternatively, use WikiNameCanonicalization.)

Space as a word separator is very important for usability. At least one successful wiki uses underscores internally while displaying spaces.

There are only two theoretical types of free links: externally framed (LEFT_DELIMITER free text RIGHT_DELIMITER) and internally linked (HANGING_DELIMITER? free INTERNAL_LINKAGE text HANGING_DELIMITER?). External frames have the advantage of making phrases easy to link; internal links have the advantage of putting architectural pressure on authors to make link titles smaller.

2.1.4. Titled Links

Inline

Block

Discussion

May I cast a vote (put forth an argument) supporting a link format with the title first? Only so that, in an ordinary text editor, I can sort a list of titled (or untitled) links alphabetically with minimal jumping through hoops. I would point out that, if you want links ordered alphabetically, you want that done on the basis of the text that a reader sees, i.e., the title, not the actual link URL. -- RandyKramer

2.1.5. Links to Anchors

2.1.6. Interwiki Links

Inline

Block

2.1.7. Avoiding Linking

See also the section Literal/Unprocessed Text below.

Method 1: Enclosing Markup

Inline

Block

SGML/XML Markup

Method 2: Breaking the Link Pattern

Method 3: Escape Character

2.1.8. Assorted

WikiWord link to anchor/section
WikiWord#anchor
Link to other language
[languagecode:pagename]
Link to other language and namespace
[languagecode:Namespace:pagename]
Absolute link to subpage
WikiWord/SubPage
Relative link to subpage
/SubPage

Discussion
I think that parsing http://... links could be harmful.

Also enforcing short links makes them low quality, too:

-- KornelLesinski

2.2. Text Formatting

2.2.1. Assorted

Paragraphs
Blank lines separate paragraphs.
Strong emphasis
[+example text+]
Very strong emphasis
[++example text++]

Highlighted text
##example text##
Notes
[example text]
Reversed background color text
[rev example text]
Red text
{r}example text{/r} or <r>example text</r>
Green text
{g}example text{/g} or <g>example text</g>
Blue text
{b}example text{/b} or <b>example text</b>
Colored text
{#FFFFFF}example text{/#}
Justified text
<>( example )

2.2.2. Citations

2.2.3. Emphasis (italics)

Inline

Block

2.2.4. Bold

Inline

Block

2.2.5. Inserted Text (underline)

Inline

Block

2.2.6. Deleted Text (strikethrough)

Inline

Block

2.2.7. Monospaced Text

Inline

Block

2.2.8. Literal/Unprocessed Text

Inline

Block

SGML/XML Markup

2.2.9. Superscript Text

Inline

Block

2.2.10. Subscript Text

Inline

Block

2.2.11. Large Text

Inline

Block

2.2.12. Small Text

Inline

Block

2.2.13. Centered Text

Inline

Block

2.2.14. Left Aligned Text

Block

2.2.15. Right Aligned Text

Block

2.3. Line Breaks

2.4. Headings

Method 1

A sequence of heading characters at the beginning of a line indicates heading level.

= Heading 1
== Heading 2
=== Heading 3

Argument against
Less important titles stand out more.

! Heading 1
!! Heading 2
!!! Heading 3

Argument for
Intuitive. Exclamation point says: here's something important.
Argument against
Less important titles stand out more.

- Heading 1
-- Heading 2
--- Heading 3

Argument against
Less important titles stand out more.
Argument against
Double hyphens at the beginning of a line may also introduce a signature.

Method 2

A sequence of heading characters at the beginning and end of a line indicates heading level.
= Heading 1 =
== Heading 2 ==
=== Heading 3 ===

Argument for
Intuitive. Looks like a banner.
Argument against
Less important titles stand out more.

-= Heading 1 =-
-== Heading 2 ==-
-=== Heading 3 ===-

Argument against
Forces user to count the correct number of characters twice.
Argument against
Less important titles stand out more.

Method 3

Rule: a sequence of heading characters at the beginning of a line indicates heading level, and any heading characters after the title are ignored.

= Heading 1 =======================================
== Heading 2 ===================
=== Heading 3 ===

Basically the number of heading characters at the end is ignored, as long as there is at least one.

Argument against
While more important titles may stand out, they don't have to. It would be nice if this rule was enforced.
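
As an illustration only, the Method 3 rule can be sketched in a few lines of Python. The function name parse_heading and the choice of = as the heading character are assumptions made for this sketch; it is not the parser of any particular wiki engine.

```python
import re

# Hypothetical helper, not taken from any wiki engine.
# The leading run of '=' sets the level; the trailing run must contain
# at least one '=' and its length is otherwise ignored.
HEADING = re.compile(r"^(=+)\s*(.*?)\s*=+\s*$")


def parse_heading(line):
    """Return (level, title) for a Method 3 heading line, or None."""
    match = HEADING.match(line)
    if match is None or not match.group(2):
        return None
    return len(match.group(1)), match.group(2)
```

A line without any trailing heading character is rejected here, which follows the "at least one" wording above; an engine that also wants to accept Method 1 headings would relax that.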

Method 4

Rule: a line of text with all-capitalized words.

Any Line Of All Capitalized Words Becomes A Heading

Argument for
Clever.
Argument against
Actual titles don't have all capitalized words.
Argument against
Not feasible for most non-English languages.

Method 5

Rule: headings are underlined (or over-and underlined) with a printing nonalphanumeric character. The underline/overline must be at least as long as the title text.

=============
First Heading
=============
Second Heading
~~~~~~~~~~~~~~
Third Heading
-------------

Argument pro
Important titles stand out more.
Argument against
Hard to use with proportional fonts. A possible fix is to require only a minimum amount of underlining (e.g. four characters).
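
A minimal sketch of the Method 5 rule follows. The rule above does not say how the underline character maps to a heading level, so this sketch assumes that levels are assigned in order of first appearance of each underline character (the convention reStructuredText uses). The names is_underline and find_headings are hypothetical.

```python
import string


def is_underline(line, title):
    """True if line is a run of one punctuation char at least as long as title."""
    line = line.rstrip()
    return (
        len(line) > 0
        and line[0] in string.punctuation
        and line == line[0] * len(line)
        and len(line) >= len(title.rstrip())
    )


def find_headings(lines):
    """Yield (level, title) pairs for underlined titles."""
    seen = []  # underline characters in order of first appearance (assumption)
    for title, below in zip(lines, lines[1:]):
        if not title.strip() or is_underline(title, ""):
            continue  # blank lines and over/underlines are never titles
        if is_underline(below, title):
            char = below[0]
            if char not in seen:
                seen.append(char)
            yield seen.index(char) + 1, title.strip()
```

The length check is what makes this hard to use with proportional fonts; replacing it with a fixed minimum would implement the fix suggested above.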

Method 6

Rule: heading characters plus a number indicate heading level.

---+1 Heading 1
---+2 Heading 2
---+3 Heading 3

Method 7

Rule: heading characters plus additional sequence of characters to indicate heading level.

---+ Heading 1
---++ Heading 2
---+++ Heading 3

Argument against
Too much markup.
Argument pro
Less important titles stand out more.

Method 8

Rule: Change of bullet character indicates change of level

* Heading 1 *
+ Heading 2 +
* Heading 1 *
- Heading 2 -
+ Heading 3 +

Method 9

Rule: Number of heading characters indicates level importance. The highest number of heading characters is heading 1, the second highest is heading 2, etc.

==== Heading 1
=== Heading 2
== Heading 3
= Heading 4

Argument pro
Important titles stand out more.
Argument against
There must be a maximum number of levels.

Miscellaneous

= # enumerated heading text =

2.5. Lists and Indentations

2.5.1. Known Bullet Characters

2.5.2. Unordered Lists

Method 1

Rule: a sequence of bullet characters indicates level.

* level 1
** level 2
*** level 3
000 also aligned with third level, but no bullet
... also aligned with third level, but no bullet

Argument for
easier to parse.
Argument against
less intuitive.
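
To illustrate the "easier to parse" argument, here is a minimal sketch assuming * is the only bullet character. The function name bullet_level is hypothetical.

```python
def bullet_level(line, bullet="*"):
    """Return (level, text) for a Method 1 list item, or None otherwise."""
    stripped = line.lstrip(bullet)
    level = len(line) - len(stripped)  # number of leading bullet characters
    if level == 0:
        return None
    return level, stripped.strip()
```

Note that a line starting with bold markup such as **text** would be misread as a second-level item by this sketch, so a real parser needs an extra rule (for example, requiring a space after the bullets).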

Method 2

Rule: a sequence of spaces followed by a bullet character indicates level.

* level 1
 * level 2
  * level 3
    * level 5

Argument against
Counting spaces, like other invisible characters, is not user friendly.

Method 2

Rule: an indent followed by a bullet character indicates level.

* level 1

     * level 2
            * level 3
             * level 4

Method 3

Rule: A change of bullet character indicates level change. Indentation optional.

* level 1
- level 2
+ level 3
- level 2 again
@ level 3
+ level 4
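
The rule leaves open how levels are derived. One reading that is consistent with the example above: a bullet character that is not currently open starts a deeper level, and a bullet character that is already open returns to the level where it was introduced. A sketch under that assumption (the function name is hypothetical):

```python
def levels_by_bullet_change(lines):
    """Assign list levels from changes of bullet character (Method 3)."""
    stack = []   # bullet characters of the currently open levels
    result = []
    for line in lines:
        bullet, text = line.strip().split(" ", 1)
        if bullet in stack:
            # A known bullet returns to the level where it was introduced.
            del stack[stack.index(bullet) + 1:]
        else:
            # An unseen bullet opens a new, deeper level.
            stack.append(bullet)
        result.append((len(stack), text))
    return result
```

Under this reading the + in the last line of the example lands on level 4 because the earlier + level was closed when the list returned to the - level.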

2.5.3. Ordered Lists

Method 1
Rule: a sequence of ordered list characters indicates level.

# level 1
## level 2
### level 3
#3 restart numbering from 3

> level 1
>> level 2
>>> level 3

0 level 1
00 level 2
000 level 3

Method 2
Rule: a sequence of spaces followed by an enumerator indicates level.

1. level 1
 1. level 2
  1. level 3
1.#3 restart numbering from 3

1) level 1
 2) level 2
  3) level 3

Method 3
Rule: Each level has its own numeration.

1. level 1
1.1 level 2
1.1.1 level 3
1.2 level 2

2.5.4. Indentations / Block Quotes

Method 1
Rule: a sequence of spaces indicates indentation level.

outer
 indent 1
  indent 2

Method 2
Rule: a sequence of indentation characters indicates indentation level.

outer
: indent 1
:: indent 2

outer
> indent 1
>> indent 2

2.5.5. Definition Lists

Method 1
;Term: Definition

   $ Term: Definition

Method 2
Term:: Definition

Method 3
Term:
    Definition

2.6. Tables

Method 1: Sequence of Rows

Method 2

[| col 1, row 1 || col 2, row 1 ||
|| col 1, row 2 || col 2, row 2 |]

Method 3: Drawing Boxes

+------------+------------+-----------+
| Header 1   | Header 2   | Header 3  |
+============+============+===========+
| body row 1 | column 2   | column 3  |
+------------+------------+-----------+
| body row 2 | Cells may span columns.|
+------------+------------+-----------+
| body row 3 | Cells may  | - Cells   |
+------------+ span rows. | - contain |
| body row 4 |            | - blocks. |
+------------+------------+-----------+

Argument against
Very hard to parse and takes a lot of effort to type.

|---------------------|
| Header 1 | Header 2 |
|=====================|
| Column 1 | Column 2 |
|---------------------|

Method 4

=====  =====  ======
   Inputs     Output
------------  ------
  A      B    A or B
=====  =====  ======
False  False  False
True   False  True
False  True   True
True   True   True
=====  =====  ======

Method 5: Definition Tables

Term 1 |
   Definition 1 begins here.
   Term 1.1 |
      Definition 1.1
   Term 1.2 |
      Definition 1.2
   This is part of definition 1.
Term 2 |
   Here's definition 2.

Method 6: Wiki-pipe Syntax

{|
!heading 1 !! heading2
!heading 3
|-
|text 1a || text 2a || text 3a
|-
|text 1b
|text 2b
|text 3b
|}

Method 7: Relational

[[Table][Seperator=;]
[Columns=Person,Height,Weight]
Person=Person; Height=Height; Weight=Weight
Person=Peter; Height=180; Weight=84
Person=Martha; Weight=52; Height=167
]

Method 8: Double commas

(Here's how it's done in SdiDesk)

Without headings

a,, b,, c
1,, 2,, 3
4,, 5,, 6

With heading

a,, b,, c
____
1,, 2,, 3
4,, 5,, 6

Argument for
Very quick and easy to enter
Argument against
Can confuse people who think that double comma implies an empty cell
Argument against
gets confusing to edit a wide table. (But this is true of most alternatives)
Argument against
precludes double-commas for other purposes (like what?)
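
A minimal sketch of how the double-comma form could be parsed. It is based only on the two examples above, not on SdiDesk's actual code: it assumes that a line consisting solely of underscores marks the row before it as the heading. The function name is hypothetical.

```python
def parse_double_comma_table(text):
    """Split a ',,'-separated table into (header, rows); header may be None."""
    header = None
    rows = []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if set(line) == {"_"}:
            # A line of underscores marks the preceding row as the heading.
            if rows:
                header = rows.pop()
            continue
        rows.append([cell.strip() for cell in line.split(",,")])
    return header, rows
```

Splitting on the literal ",," is what makes entry quick, and also why an intended empty cell has to be written with a space between the separators.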

2.6.1. Cell Attributes

Cell Attribute Specification

Cell Attributes in Use
Top alignment
{t} or <^>
Bottom alignment
{b} or <v>
Column spanning
{w=number} or <-number>
Row spanning
<|number>
Border width
{Tb=number}
Cell class
{C=string}
Cell style
{s=string}
Cell width
<100%>
Background color
<#XXXXXX>

Miscellaneous

|<<END|<<END|
col1 text is here
END
col2 text is here
END

2.7. Horizontal Rules/Separators

Discussion

Having four as a minimum is totally arbitrary. Parsing is not ambiguous as a separator should begin and end with a newline. Thus the only possible ambiguity is for the reader in that a small separator could get "lost" in the document. -- IanBollinger?
One and two could not be the minimum for obvious reasons. Three cannot for I've seen at least one Wiki that uses --- for an em dash, and some word processors do as well. So four hyphens is the minimum number that can safely be chosen.
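
The four-hyphen minimum discussed here amounts to a one-line check. A sketch, with is_separator as a hypothetical name:

```python
import re

# Four or more hyphens alone on a line; trailing whitespace is tolerated.
RULE = re.compile(r"^-{4,}\s*$")


def is_separator(line):
    """True if the line is a horizontal rule under the four-hyphen minimum."""
    return RULE.match(line) is not None
```

Three hyphens are rejected, so a wiki that uses --- for an em dash is unaffected, and a signature line starting with two hyphens is never mistaken for a rule.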

2.8. Meta-Wiki

2.8.1. Macros, Variables, Plugins and Extensions

2.8.2. Comments

SGML/XML Markup

2.8.3. Processing Instructions and Meta Data

2.9. Character Replacement

 ** Argument against: some English abbreviations begin with single right quotes ("I said 'e would").

3. X/HTML Markup in Wikis

(Not including wikis that are too lazy to restrict the use of HTML at all, which is inherently insecure.)

4. Suggested Basic Set

4.1. Basic Set A

We are still in the idea collecting phase, so there is nothing here yet.

4.2. Plan B

The "original" Basic Set B is smaller than the one below.

You can find it on CommunityWiki:MarkupStandardPlanB.

4.3. Basic Set B

Internal CamelCase link
WikiWord
Internal free link
[[free link]]
External link
URL or [[URL][text]]
Paragraph
empty line separates paragraphs
Emphasis (usually italics)
''emphasized words''
Strong emphasis (usually bold)
'''strong emphasis'''
Headings
== Headline text ==, use more equal signs to get lower level headlines
Lists
Use number of asterisks, no leading space
Horizontal line (separator)
----
Indenting
:+<text>
Description lists
;hello: world
Line break
\\
Wiki escape
<nowiki>...</nowiki>

4.4. TikiWiki RFC

TikiWiki tried to publish their syntax as an IETF RFC; cf. http://tikiwiki.org/RFCWiki

4.5. Heilbronn University Proposal

The Heilbronn University is leading the Wiki Markup Standard Workshop at WikiSym 2006. More details: http://www.i3g.hs-heilbronn.de/Wiki.jsp?page=WikiMarkupStandard

Our recent discussion about WMS can be found at HeilbronnWMSDiscussion.

Discussion

This is like a minimal UseMod/OddMuse set, and doesn't use significant whitespace except for the empty line separating paragraphs. Note that working URL links mean that mailto:alex@emacswiki.org will be a valid link.

I suggest that backslash be the only escape character. Backslash as the first character of a line = ignore wiki markup on the whole line. Backslash immediately before markup = ignore that markup. Simple, short and standard (\ is the escape character in almost all programming languages).

\''no emphasis here\'', and ''thats emphasis with \'' in it''
\ this line is like in <pre>
normal line  without line-break


Um, No, either that first line would be whole-line escaped or the second would not. --Kevin D. Keck

Then perhaps it should read "backslash-space as first pair of characters of line = ignoring wiki markup"? -- ChrisPurcell
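
A sketch of the amended rule (backslash-space at the start of a line escapes the whole line; a backslash directly before markup escapes that markup). For brevity it handles only the '' emphasis markup and does no HTML escaping; the function name render_line and the <em> output are assumptions of this sketch.

```python
import re

# Either an escaped '' (kept literally) or a bare '' (toggles emphasis).
TOKEN = re.compile(r"\\''|''")


def render_line(line):
    """Render '' emphasis in one line, honouring backslash escapes."""
    if line.startswith("\\ "):
        # Backslash-space at line start: the rest of the line is literal.
        return line[2:]
    out = []
    open_em = False
    pos = 0
    for match in TOKEN.finditer(line):
        out.append(line[pos:match.start()])
        if match.group() == "''":
            out.append("</em>" if open_em else "<em>")
            open_em = not open_em
        else:
            out.append("''")  # drop the backslash, keep the quotes
        pos = match.end()
    out.append(line[pos:])
    return "".join(out)
```

With the backslash-space amendment, the first example line above is no longer treated as whole-line escaped, because its backslash is followed by a quote rather than a space.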


5. General Discussion

Most users don't even know wiki yet at all, so it is the task of wiki authors to agree on a standard soon.

It is unnatural if you have to speak 5 different wiki markup languages to discuss 5 topics on 5 different wiki engines, so it is less problematic to migrate to a common basic markup standard than to keep multiple markups.

Imagine the web without having a (mostly) common HTML markup language. -- Anonymous


Clearly having a common tongue (as English seems to serve for *some* of the web) for Wikis would make sense however, and make life a lot easier - especially if it gets promoted as an "official second language" on multiple wikis.

Ask yourself:

These are by definition the hardest aspects to deal with (all of which I've seen used in wikis) - and whilst clearly not suitable for being in the core of a minimalist markup, there is a need to be inclusive towards these desires rather than exclusive.

Personally, I'm led to two main conclusions:

  1. Any minimalist wiki markup should be very, very minimalistic and should also provide a method for extending the syntax and indicating this.
    1. If that would be the case, this would be completely hopeless. I hope most wiki authors could agree on such a basic syntax as that really should be in the basic set (not sure about underlining though as this might be confused with links).
  2. Wiki markup could become separated from storage, if we allow users to speak their local lingo to a wiki (which might be WikiStandardMarkup?, rather than Usemod, MoinMoin, TWiki Wiki, etc.). Like SunirShah, I think this implies at least a partially parser-based model rather than plain replacement.

These obviously aren't mutually exclusive - so I'm encouraged to see this page. (It's not the only page of this kind however - when I get a chance I'll dig up references to others). However the syntax presented is almost entirely different to the syntaxes I use at present!

To give a flavor of the problem however for just linking to content:

-- MichaelSparks?


So still nobody has agreed on a common markup. Personally I strongly share the opinion, that the Wiki idea will never get close to WorldDomination?, if there isn't even the most basic set of markup users could assume to work in every WikiWare?. But also I believe it is too arrogant to call it WikiMarkupStandard in respect to the multitude of existing implementations (and the authors of each probably had good reasons to choose a different markup set). Eventually this page should be renamed to StandardWikiMarkup? and document just that, instead of all available markup variations.

Then it would be possible to register a WikiTextMimeType (text/wiki) with IANA and the IETF. This finally gets us a bit closer to the InterWiki idea by providing the WikiWorld with a standard similar to the base of the WWW (namely text/html). -- MarioSalzer?


I've been thinking a lot on this topic as well, and the only insight that I've come up with so far is that maybe there should be in effect three different Wiki formats. The first is oriented toward ease of entry, and actually works with most Wiki variant TextFormattingRules all at the same time! Then the system parses that text into a standard text format (which I'm calling canonical text, or CanText?), which is also editable as text and very readable as is, but maybe not quite as easy to enter from scratch. One interesting principle is that if canonical text is run through the wiki text filter, the result will be exactly equal to the input. Finally, there is the final HTML markup. -- ChristopherAllen
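
The principle that canonical text passes through the filter unchanged is idempotence: filtering twice gives the same result as filtering once. A toy sketch follows; the three rewrite rules (** and __ for bold, ! for headings) are assumptions chosen for illustration, not a proposal for what CanText? should contain.

```python
import re


def canonicalize(text):
    """Rewrite a few entry variants to one canonical form (illustrative rules)."""
    # Bold written as **text** or __text__ becomes '''text'''.
    text = re.sub(r"\*\*(.+?)\*\*", r"'''\1'''", text)
    text = re.sub(r"__(.+?)__", r"'''\1'''", text)
    # Headings written as "! Title" become "= Title =".
    text = re.sub(r"^![ \t]*(.+?)[ \t]*$", r"= \1 =", text, flags=re.MULTILINE)
    return text
```

The property can be checked mechanically for any filter by asserting that canonicalize(canonicalize(t)) equals canonicalize(t) over a sample of pages.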


Speaking of which, my ideas are embedded in the InfiniteMonkey parser ([script]) where I break down syntax types by the functional forms of the parser. Blocks are the fundamental part, which are broken down into line & paragraph parsing, then aggregate blocks like lists and tables. Links are also special. cf. WikiParserModel. We will likely have a workable "standard" by the end of the year for those who wish to follow it. -- SunirShah


It seems impossible to define a one-size-fits-all wiki markup that everyone will agree on (still, of course, a standard should be made; authors may later agree with it).

I think every WikiEngine should be able to transform its markup to a standardized DocumentModel? - an HTML-like structure of paragraphs, lists, sections, headings, etc. - which could be stored as XML and loaded into another WikiEngine. This would allow relatively easy conversion between any flavor of Wiki markup.

-- KornelLesinski
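
A minimal sketch of such an exchange, assuming a flat document model of (kind, text) blocks. The element names document, heading and paragraph are invented for this example and do not come from any existing DocumentModel? proposal.

```python
import xml.etree.ElementTree as ET


def to_xml(blocks):
    """Serialize (kind, text) blocks into a small XML document."""
    root = ET.Element("document")
    for kind, text in blocks:
        ET.SubElement(root, kind).text = text
    return ET.tostring(root, encoding="unicode")


def from_xml(xml_text):
    """Load the blocks back, as another engine would before re-rendering."""
    return [(child.tag, child.text or "") for child in ET.fromstring(xml_text)]
```

Each engine would then need only two converters (its own markup to the model, and back) instead of one converter per other engine.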


Requiring arbitrary sequences of identical characters in markup seems like a very bad idea to me--more than three is probably too many. (How many letters are in this sequence: 'lll'? How many in this one: 'llllll?')

Also, some of the formatting options seem superfluous. Underline, for instance, is bad typography, and is confusing because only links should have underlines. The <b>, <i>, <tt>, <small> and <big> tags were removed from the XHTML 2 specification for good reason, as <u> and <s> were from the HTML 4 specification. -- IanBollinger?


I've been involved in standards work for the past decade, so that argument isn't anything I'd disagree with. But in the case of some of the things I've helped standardize, there was a ready audience that wanted it standardized. I'm not sure that the wiki community would actually want one standard that they were all to follow. Put it this way: if next month a wiki standard showed up, how many wikis would dump what they have now (both pages and supporting software) and go with the standard? I think a more viable "standard" would be one for interchange, which is a different tack and doesn't require anyone to alter the primary syntax (and software) they use, only suggests a method(ology) for interchanging wiki content. I do still advocate a wiki-wide "standard" for identifying wiki syntaxes (the !#wiki idea) because those who wish to self-identify need only make a very minor change. I'd not suggest anyone adopt an entirely new syntax. I don't think very many would do that. (Esperanto as an interchange language) -- MurrayAltheim


Who said they'd have to dump everything? There can be a transition phase when both standard and custom syntax can be used. You could automatically convert pages.
I found this page because I started to create my own wiki and I wanted to be compatible with something (and more familiar for users). Since there is no standard :(recommendation) I'm left alone with making yet another incompatible wiki derivative.
-- KornelLesinski


I get myself extremely frustrated by the differences in markup between different wikis, but I'm also involved in standards work (nntp, and usenet format), and it can be even more frustrating to create a standard that has a chance of being actually adopted (nntp is doing it, hell might freeze over before USEFOR does). From what I have seen of the wiki world, I concur with Murray that it's just very unlikely to happen, and think the effort would be best spent on interoperability, easy import, and conversion, parts of which are above referred to as "markup babel fish" or "markup skins". But if you want to proceed with this, your best chance is to do it via an [IETF] standards process to which you want to invite all those wiki authors who'll need to implement that standard. --AlixPiranha


I'd like to see people use some sort of CSS markup instead of inventing HTML2-style markup extensions for Wiki. -- MarioSalzer?


It might be useful to categorize the percentage of use of categories. Bold is probably something used on like, 90% of pages. Italics, maybe 70%, Underline, 10%. This is just off the top of my head based on intuition. Regardless, they're the same category of markup. Color and font type/size changes strike me as ancillary markups. They sometimes serve a purpose of emphasis or clarification, but aren't generally needed over and above the basics. Of course, you could suggest the reason people don't use these extended markups is because they aren't readily available. But I imagine the majority of users come from a paper writing/word processing background where one doesn't typically make use of these effects. So it seems to me that it would make sense to make those "harder to use" in the interest of keeping the standard cleaner and more consistent.
-- JerryHsu


Maybe I could clean this up and make a mock standards document out of it? The TWiki people have done the same and I'm not too much of a fan of that syntax.

What seems best is decide on one syntax for each operation, put it here, and then put the rest and discussion into a separate page. This would be much easier on us implementors :). -- RyanNorton

Can anyone provide a link to that TWiki (mock) standards document? --RandyKramer

Actually it's Tiki, http://tikiwiki.org/tiki-index.php?page=RFCWiki -- RyanNorton


Is the Tiki work Ryan refers to fairly well included in the material that has now accumulated here?

One aspect of a markup standard that appears to not yet have been considered is the specific intent of the use of the wiki technology. For example:

By recognizing that all wikis produce HTML and many can save it (rather than just displaying it), it becomes practical to use a wiki software's editing and display functions separately, at different times, which reduces my concerns about the use of different markups. In effect, I can use a Personal Wiki that makes it possible for me to choose my 'personal markup'. As long as the Personal Wiki produces standard HTML (pretty well assured for any wiki that expects a Browser to do its presentation), then all that may be needed is an HTML2MyMarkUp conversion utility.

-- HansWobbe


May I suggest another way to look at some of this—how about looking at what we (I?) would like to achieve with wiki markup? Here are some things I'd like to be able to do that have not been possible / easy in the wikis I've looked at or used:

Asides:

-- RandyKramer

Formation of an IETF WorkingGroup?

See also WikiMarkupStandardWorkingGroup for the mailing list supplementing this discussion.

Proposal for page refactoring

This is a solid BarnRaisingNomination. Start with low-hanging fruit until the job is done. -- SunirShah

Dissenting Opinion(s)

Moved to WikiMarkupStandardIsMisguided.

When did DisagreeByDeletion? become normal behaviour on MeatBall ? -- MichaelSamuels

What deletion are you referring to? If you are referring to discussion being moved to a separate page, then I did that as an effort to reduce the size of this page. Since this page's topic is on how best to implement a wiki markup standard and not why a wiki markup standard may not be the best idea, it made sense to move that discussion to its own page. Perhaps the title I chose for this page offended you and for that I apologize but I could not think of a better one at the time. -- IanBollinger?

I didn't understand the context of the edit (indeed I didn't see the original edit), and as a result it looked like DisagreeByDeletion? . I have no objection to shrinking things (refactoring GOOD :), I obviously disagree with the level of the change, but I'm not about to undo your work :). Sunir let me know that the discussion had picked up again recently (which is also good :) and I'm currently thinking of how best to put forward my thoughts on the current efforts in a positive way. Noticing the deletion just made me rather surprised, that's all. I probably could've phrased the question better - apologies!

As for the title, I think it's fairly accurate, though I would probably have said WikiMarkupStandardSyntaxIsMisguided? is more accurate. shrug :) -- MichaelSamuels

Practical Considerations for Wiki Programmers

I've implemented several WikiParsers? and also created several levels of complexity of WikiMarkup? for various CMS. I'm currently involved in creating a simplified, basic parser for the Wikipedia database.

Personally (from a programmer's/practical POV) I'd proceed as follows: create an ancillary markup of the final HTML output, such as enclosing the rendered HTML in <div id="Wikicontent">...</div>. This would be the first step of letting others extract the HTML result from a rendered WikiPage without choking on other page elements.
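
As an illustration of what the wrapper makes possible, here is a sketch that pulls the content out of a rendered page using only the Python standard library. For brevity it collects the text inside the wrapper rather than the inner HTML; the class and function names are hypothetical.

```python
from html.parser import HTMLParser


class WikiContentExtractor(HTMLParser):
    """Collect the text inside <div id="Wikicontent">...</div>."""

    def __init__(self):
        super().__init__()
        self.depth = 0      # 0 = outside the wrapper, else <div> nesting depth
        self.chunks = []

    def handle_starttag(self, tag, attrs):
        if self.depth:
            if tag == "div":
                self.depth += 1  # nested div inside the wrapper
        elif tag == "div" and ("id", "Wikicontent") in attrs:
            self.depth = 1       # entering the wrapper

    def handle_endtag(self, tag):
        if self.depth and tag == "div":
            self.depth -= 1

    def handle_data(self, data):
        if self.depth:
            self.chunks.append(data)


def extract_wiki_text(page_html):
    """Return the text found inside the Wikicontent wrapper of a page."""
    parser = WikiContentExtractor()
    parser.feed(page_html)
    parser.close()
    return "".join(parser.chunks)
```

Navigation bars, edit links and footers outside the wrapper are skipped without the extractor having to know anything about the engine's page layout.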

Secondly, create reverse parsers for each wiki that turn the HTML back into the respective wiki markup. At this stage you might find some irreversible or ambiguous markup that needs to be changed or disambiguated.

At this stage you will have a basic interchange format. After all, html exists as a well(?!) defined standard, and most wiki engines can already convert wiki markup to html.

Another advantage of using the html as interchange format is that any wiki markup that might be supported by one wiki and not by another will become portable.

Many wikis have some form of support for basic html syntax. Supporting the html versions of otherwise unsupported wiki markup is trivial. For example if a wiki supports underline syntax by turning "_underlined words_" into "<u>underlined words</u>" and the next wiki does not recognise the <u> tags, it will simply keep the html syntax and thus preserve the document.

At the same time, any security concerns, such as otherwise unsupported HTML entering the wiki's document space, can be allayed by treating any non-converted, remaining HTML the same as HTML entered by a user. This might, for example, strip the <u> tags from my example and just leave "underlined words" in the final wiki text on the target wiki.
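
The underline example can be sketched as two small functions, one for each side of the exchange. The regular expressions are deliberately crude and the allowed-tag list is an assumption; a real engine would reuse whatever sanitizer it already applies to user-entered HTML.

```python
import re


def export_underline(wiki_text):
    """Source wiki: render _underlined words_ as <u>...</u>."""
    return re.sub(r"_(.+?)_", r"<u>\1</u>", wiki_text)


def strip_unsupported(html_text, allowed=("b", "i")):
    """Target wiki: drop any tag not in the allowed list, keep its text."""
    def keep_or_drop(match):
        return match.group(0) if match.group(1).lower() in allowed else ""
    return re.sub(r"</?([A-Za-z][A-Za-z0-9]*)[^>]*>", keep_or_drop, html_text)
```

A target wiki that passes <u> through would simply add "u" to its allowed list and preserve the underline instead of dropping it.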

At this stage we can start investigating the spread and breadth of markup. For example using sample bodies of wiki pages and various engines it will be trivial to create statistics about the spread of particular markup and also to see which important markup is split into majorities among the compared wikis.
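
Such statistics could be gathered along the following lines. The three patterns are placeholders chosen for illustration; a real survey would need one pattern set per engine being compared.

```python
import re
from collections import Counter

# Hypothetical patterns for three kinds of markup; real engines differ.
PATTERNS = {
    "bold": re.compile(r"'''.+?'''"),
    "free link": re.compile(r"\[\[.+?\]\]"),
    "heading": re.compile(r"^=+ .+ =+$", re.MULTILINE),
}


def markup_usage(pages):
    """Return, per kind of markup, the fraction of pages that use it."""
    counts = Counter()
    for text in pages:
        for name, pattern in PATTERNS.items():
            if pattern.search(text):
                counts[name] += 1
    total = len(pages) or 1  # avoid dividing by zero on an empty sample
    return {name: counts[name] / total for name in PATTERNS}
```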

This will show a practical way forward regarding the basic wiki markup set. It can also serve as a guide to those looking to implement wiki parsers for wikis or CMS systems.

I think starting to discuss a standard markup set in isolation, without a study of the practical considerations and realities, is difficult. I've always seen wiki as a practical solution to a practical problem, rather than a theoretical ideal for a narrow purpose.

I have many times rebuilt a basic wiki syntax set according to the best of my memories of my initial experiences with C2, and have managed to create multiple related syntaxes that are incompatible with each other.

In addition, wikis may go a similar way to HTML. If we offer raw wiki text output, and a browser-side extension to process it (similar to RSS or FTP support), this may encourage more versatile parsers that, similar to HTML browsers, can interpret multiple wiki markups on a best-effort basis. This may not be desirable, but it might be necessary for the evolution of wiki markup. End of uncontrolled rant. -- Wiki:SvenNeumann

The WikiGateway library provides getPage and getPageHTML functions to retrieve the wiki markup and the HTML associated with the content of a given wiki page. Currently WikiGateway supports UseMod, MoinMoin, and OddMuse, but the plan is for it to eventually support a lot more -- perhaps it would be of use to you in extracting text and HTML from wiki pages.

Also, if you write Python screenscraping routines for MediaWiki, and would like to contribute them to WikiGateway, I'd be interested :)

-- BayleShanks


See also MetaWeb?:Wikitext_standard http://www.metaweb.com/wiki/wiki.phtml?title=Wikitext_standard .


Another critique/proposal: WikiCoreAstStandard


From WikiSym it seems that few developers are interested in the topic of a WikiMarkupStandard because almost everyone expects to move to WYSIWYG / exchange format, so that markups lose most of their importance. -- HelmutLeitner


Discussion
