The block-quoting you mention is exactly the point of such a common exchange vocabulary: you standardize the XML, and can have highly differing wiki markups. And XML allows you to ignore tags you do not know, further improving independent development.
Each wiki software would be able to export (and optionally import) pages in XML form. That would allow exchange between the divergent systems, and also allow alternative formatting (like putting the XML through XSLT and then into XSL-FO, resulting in a PDF wiki page). Once you enter the XML world, all sorts of things become possible (like XLinking wiki pages). -- JürgenHermann
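To make the "ignore tags you do not know" idea concrete, here is a minimal sketch of an importer. The element names (page, p, b, quote, blink) and the target wiki markup are invented for illustration; no such exchange vocabulary has been agreed on in this discussion.

```python
import xml.etree.ElementTree as ET

# Hypothetical exchange document; the element names are assumptions.
DOC = """<page title="SandBox">
  <p>Plain text with <b>bold</b> and <blink>unknown</blink> markup.</p>
  <quote>A block quote.</quote>
</page>"""

# How one imaginary wiki spells the elements it understands:
# tag -> (opening markup, closing markup).
KNOWN = {
    "p": ("", "\n"),
    "b": ("'''", "'''"),
    "quote": ("> ", "\n"),
}

def to_wiki(elem):
    """Render one element; unknown tags keep their text but lose their markup."""
    open_, close = KNOWN.get(elem.tag, ("", ""))
    parts = [open_, elem.text or ""]
    for child in elem:
        parts.append(to_wiki(child))
        parts.append(child.tail or "")
    parts.append(close)
    return "".join(parts)

def import_page(xml_text):
    """Return (title, wiki text) for an exported page."""
    root = ET.fromstring(xml_text)
    body = "".join(to_wiki(child) for child in root)
    return root.get("title"), body

title, body = import_page(DOC)
print(title)
print(body)
```

The importer never has to know about blink: it just passes the text through, so another wiki can add elements without breaking this one.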
By the way, just to be a party pooper, the MeatballWikiCopyright is written as such to preclude the exporting and transferring of material. Most wikis will be the same because of the default rights afforded to authors by law. Bummer. That being said, translating the wiki markup into something other tools can read in is a good thing. I'm definitely interested. -- SunirShah
By all means, if you want to experiment with it, go ahead! --ss
Since DTDs are crappy, it may be better to switch to XmlSchema.
I'm pondering whether future versions of UseMod should use XML to store wiki pages. Currently, the actual files stored on disk use an obscure file separator character. Granted, future versions of UseMod may use MySQL, but not requiring other apps is one of UseMod's strengths, I feel. - tarquin
[Aside] Why not use a modified RFC 822 format? That is easier and faster to parse than XML, plus it requires no extra XML modules (which may not be installed).
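A rough sketch of what that could look like, assuming a layout of header fields, a blank line, then the raw wiki text. The field names (Title, Revision, Author) are made up for illustration and are not UseMod's actual storage format. Python's standard email package already parses this style of file, so no extra modules are needed.

```python
from email import message_from_string

# Hypothetical on-disk page: header fields, a blank line, then the wiki text.
PAGE = """\
Title: SandBox
Revision: 3
Author: SomeAuthor

This is '''bold''' wiki text.
Second line of the page.
"""

def load_page(text):
    """Parse an RFC 822-style page file into a plain dict."""
    msg = message_from_string(text)
    return {
        "title": msg["Title"],
        "revision": int(msg["Revision"]),
        "author": msg["Author"],
        "body": msg.get_payload(),  # everything after the first blank line
    }

def dump_page(page):
    """Write the dict back out in the same header-plus-body layout."""
    headers = (
        f"Title: {page['title']}\n"
        f"Revision: {page['revision']}\n"
        f"Author: {page['author']}\n"
    )
    return headers + "\n" + page["body"]

page = load_page(PAGE)
print(page["title"], page["revision"])
```

The body needs no escaping at all, which is the main practical difference from an XML store.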
(People who know me probably expect this kind of thing from me. I've generally withheld my DTD rants over the past three or four years, as life goes on, but I thought I'd try out my lungs to see if I've gotten rusty.)
It's rather a popular thing to rant about XML and DTDs (and I've heard complaints about the latter for over a decade, so it's a longstanding fashion), but they both do exactly what they were designed to do. If they don't work for one's purposes, perhaps a different technology is more suitable. DTDs were designed to constrain the markup of an SGML or XML document, not its content. They were designed to be human readable and editable. They were also, by design, not written in instance markup. By comparison, XML Schema was not meant to be human readable, was designed to use instance markup, and was designed to constrain the content of document instances as well. XML Schema has a whole lot of features that DTDs don't have, but it also has a whole host of problems that DTDs don't have. If you're designing a markup language, DTDs are usually an appropriate schema language. XML Schema is better for constraining the content of fairly rigid documents, such as industrial or commercial transaction documents, e.g., those with a need to state that a specific element must contain a date or dollar value within a certain range, etc. This is wwaaayyyy overkill for design of a markup language, and it's also very difficult for people to read an XML Schema to understand the legitimate structures it allows. It all must be done with tools.
Now, for purposes of designing a markup language, hundreds of industry-quality markup languages have used DTDs. DTDs have a proven track record going back over 20 years. There just happen to be a number of influential staff members at the W3C who don't like DTDs and have done what they can to make them go away.
If you're planning to create an XML markup language, I can state from a great deal of experience (both my own and that of many people I've worked with) that it is a HUGE undertaking. I'd recommend reading (almost cover to cover) Eve Maler's book [Developing SGML DTDs], which talks in great detail about the processes that one must go through to develop a DTD. (These are processes that one would go through in an XML Schema too. Or not. But the world will bite your ass if you don't.)
Writing a DTD is the easiest part of the process. I've suggested before that it would be much better to use an existing markup language if at all possible, or modify one by removing or adding features as necessary. But from scratch, it's a terrifically difficult thing to do, or at least to do well. In a distributed community such as wiki, I don't quite follow how it could be done well. I've done it in committee, with people paid to spend time, and it's hard work.
As for using XML Schema, there's no good to come out of it for wiki, AFAIK. It's not really an appropriate technology for the purposes I see here. XML Schema documents are much slower to parse than DTDs (they're more verbose), they're harder to debug, and they can only reasonably be created with schema tools, not by hand. Know of any good XML Schema tools? I don't.
One thing that comes to mind when reading people's ideas on this subject (i.e., interwiki) is that the intermediate format could take a cue from nsgmls, the SGML parser from James Clark (who is considered about as good as it gets in the SGML/XML community; some people think he's a god). nsgmls is an SGML and XML parser with an [output format] called "Element Structure Information Set" (or ESIS) that is extremely efficient, very easy to parse, and could make a good intermediate format. If you're on a Linux machine, nsgmls is usually part of your distribution, so you can just type nsgmls and the filename of an HTML file, and if nsgmls can locate the DTD, it'll dump ESIS to standard output. Rather than dump a sample here, I've posted one [here].
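To give a feel for how little code a line-oriented format like this needs, here is a minimal reader sketch. It assumes the usual nsgmls conventions as I understand them: one record per line, with the first character as the command ("(" start element, ")" end element, "-" data, "A" attribute for the next start element), and backslash escapes in data. The sample input is hand-written for illustration, not captured nsgmls output, and only a subset of commands and escapes is handled.

```python
import re

# Hand-written ESIS-style sample (an assumption, not real nsgmls output).
SAMPLE = """\
(P
-Read
AHREF CDATA http://example.org/
(A
-this page
)A
-today.\\n
)P
C
"""

def unescape(data):
    """Decode the two escapes handled here: \\\\ (backslash) and \\n (newline)."""
    return re.sub(r"\\(\\|n)", lambda m: "\n" if m.group(1) == "n" else "\\", data)

def parse_esis(text):
    """Turn ESIS-style lines into a flat list of events."""
    events = []
    pending = {}  # attributes seen since the last start tag
    for line in text.splitlines():
        if not line:
            continue
        cmd, arg = line[0], line[1:]
        if cmd == "A":
            name, _, rest = arg.partition(" ")
            kind, _, value = rest.partition(" ")
            if kind != "IMPLIED":  # implied attributes carry no value
                pending[name] = value
        elif cmd == "(":
            events.append(("start", arg, pending))
            pending = {}
        elif cmd == ")":
            events.append(("end", arg))
        elif cmd == "-":
            events.append(("data", unescape(arg)))
        # any other command (e.g. the final "C") is ignored in this sketch
    return events

for event in parse_esis(SAMPLE):
    print(event)
```

A wiki-to-wiki converter would only need to walk this event list and emit the target syntax for each start, data, and end event.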
Now, I've already provided my own suggestion in the form of the [IWML] DTD, but if the XML route isn't the one for an intermediate format, I'd suggest something like ESIS (or a suitable variant at that primitive level, one that doesn't try to do anything except capture the lexical output of the parse for further processing at a higher level). So one just captures the parse output and translates that directly to whatever output format is necessary. I recently suggested on the InterWiki mailing list using Radeox's markup property files to define an input and output format as wiki text, translating between two wiki text syntaxes directly (with no intermediate format at all).
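The direct, no-intermediate-format approach can be sketched with two small rule tables, one describing the input syntax and one the output syntax. Both tables below are hypothetical and do not reflect the actual format of Radeox's property files; the target syntax is invented.

```python
import re

# Input side: token name -> pattern in the source wiki's syntax (assumed).
# Bold must come before italic so ''' is not consumed as '' plus '.
SOURCE_RULES = [
    ("bold", re.compile(r"'''(.+?)'''")),
    ("italic", re.compile(r"''(.+?)''")),
]

# Output side: token name -> replacement template in the target syntax (assumed).
TARGET_TEMPLATES = {
    "bold": r"*\1*",
    "italic": r"_\1_",
}

def translate(text):
    """Rewrite source markup to target markup, one token type at a time."""
    for name, pattern in SOURCE_RULES:
        template = TARGET_TEMPLATES.get(name)
        if template is not None:  # tokens the target cannot express are left alone
            text = pattern.sub(template, text)
    return text

print(translate("'''bold''' and ''italic'' text"))
```

This only captures the lexical level ("three apostrophes means bold"); it makes no attempt to handle nesting or block structure.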
I think it's a mistake to try to capture the "semantics" of wiki text because it's by nature unknowable, unless by "semantics" one is suggesting something like "By three apostrophes I mean bold". (IMO, "semantics" easily wins the award for most overused, misunderstood, and abused word of the decade. The Semantic Web? A good marketing term.)
No discussion of which schema language to use would be complete without mentioning [Relax NG]. [Here] is an article discussing some of the benefits of Relax NG over XML Schema. This should be seriously considered.
Yes, RELAX NG is really the state of the art in XML schema languages. It's easier to use than DTDs and XML Schema (particularly in the compact syntax), but more powerful. In addition, [Trang] offers easy conversion from RNG to both DTD and XML Schema. As far as I'm concerned, there's no reason to use DTDs today, much less to design with them.
-- Bruce D'Arcus