WikiXmlDtd

MeatballWiki | RecentChanges | Random Page | Indices | Categories

I'd like to have a DTD that contains things like

 <rule/> for ----
 <wikiname>WikiName</wikiname> for WikiName
Anyone interested? -- JürgenHermann
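
A minimal sketch of what an exporter for just these two elements might look like. The WikiName pattern (two or more capitalized ASCII word parts), the exact-four-dashes rule, and the function name wiki_to_xml are assumptions for illustration, not taken from any particular wiki engine.

```python
import re
from xml.sax.saxutils import escape

# Assumed WikiName pattern: two or more capitalized ASCII word parts.
WIKINAME = re.compile(r"\b(?:[A-Z][a-z]+){2,}\b")


def wiki_to_xml(line):
    """Translate one line of wiki text into the proposed XML vocabulary."""
    # Assumption: a horizontal rule is a line holding exactly four dashes.
    if line.strip() == "----":
        return "<rule/>"
    # Escape &, < and > first so the result stays well-formed XML.
    escaped = escape(line)
    # Wrap every WikiName in a <wikiname> element.
    return WIKINAME.sub(
        lambda m: "<wikiname>%s</wikiname>" % m.group(0), escaped
    )


print(wiki_to_xml("----"))
print(wiki_to_xml("See WikiName here"))
```

A real exporter would need one such rule per markup feature, which is where the differences between wikis start to show.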

Interesting. Would this just be for this wiki, or for the interchange of page data between wikis?

Potential problem for the latter: some wikis (well, many) don't support the block quoting feature we have here. For instance, Wiki:SimulatingQuoteBlocks explains how to, um, simulate them. Subsetting the mark up may destroy the renderings of some pages. Would we have to eventually "standardize" the markup? That would destroy the point of spawning things like CLiki, or even our own RFC links which are useful in our context. I think these concerns are what caused the death of Wiki:InterWiki. I think our InterWiki is much easier. ;) -- SunirShah

The block-quoting you mention is exactly the point of such a common exchange vocabulary: you standardize the XML, and can have highly differing wiki markups. And XML allows you to ignore tags you do not know, further improving independent development.

Each wiki engine would be able to export (and optionally import) pages in XML form. That would allow exchange between the divergent systems, and also allow alternative formatting (like feeding the XML through XSLT into XSL-FO to produce a PDF of a wiki page). Once you enter the XML world, all sorts of things become possible (like XLinking wiki pages). -- JürgenHermann
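
To illustrate the "ignore tags you do not know" point, here is a rough sketch of an importer that flattens an exported page back to text. The element names page, rule, wikiname and blockquote are hypothetical; the importer only knows about rule and simply keeps the text of everything else.

```python
import xml.etree.ElementTree as ET


def to_wiki_text(elem):
    """Flatten an element to wiki text, keeping the text of unknown elements."""
    # The only element this importer treats specially.
    if elem.tag == "rule":
        return "----"
    # Any other element, known or not, contributes its text content;
    # its markup is silently dropped.
    parts = [elem.text or ""]
    for child in elem:
        parts.append(to_wiki_text(child))
        parts.append(child.tail or "")
    return "".join(parts)


page = ET.fromstring(
    "<page>See <wikiname>WikiName</wikiname> and "
    "<blockquote>a quoted block</blockquote>.</page>"
)
print(to_wiki_text(page))  # See WikiName and a quoted block.
```

A wiki without block quoting loses the quote formatting here but not the words, which is roughly the trade-off being argued about above.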

I'm interested in this, but I have some concerns. The biggest problem I see is that the wiki links are strongly incompatible with each other. The major division is between the WikiName style (internal capital letters) and the [some link here] style (bracketed/delimited links which often include spaces and/or punctuation). Even within the WikiName-style wikis there is a lot of variation: JürgenHermann is not a link in many wikis (because of the ü character). Without working links, a wiki is not very useful.
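
The incompatibility can be sketched in a few lines. The patterns below are assumptions meant to stand in for "a typical ASCII-only WikiName rule" and "a typical bracketed-link rule"; real wikis each have their own variants.

```python
import re

# Assumed ASCII-only rule: two or more capitalized word parts.
ASCII_WIKINAME = re.compile(r"^(?:[A-Z][a-z]+){2,}$")
# Assumed bracketed rule: anything between a pair of square brackets.
BRACKET_LINK = re.compile(r"\[([^\[\]]+)\]")


def is_ascii_wikiname(word):
    """True if the word is a link under the ASCII-only rule."""
    return ASCII_WIKINAME.match(word) is not None


def is_unicode_wikiname(word):
    """Same rule, but using Unicode-aware upper/lower case tests."""
    humps = 0
    i = 0
    while i < len(word):
        if not word[i].isupper():
            return False
        i += 1
        start = i
        while i < len(word) and word[i].islower():
            i += 1
        if i == start:  # an uppercase letter with no lowercase run after it
            return False
        humps += 1
    return humps >= 2


def bracket_links(text):
    """Return the targets of all [bracketed links] in the text."""
    return BRACKET_LINK.findall(text)


print(is_ascii_wikiname("JürgenHermann"))    # False
print(is_unicode_wikiname("JürgenHermann"))  # True
print(bracket_links("see [some link here]")) # ['some link here']
```

The same word is a link under one rule and plain text under the other, so an exchange format has to record link targets explicitly rather than rely on each importer re-detecting them.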

Also, I only want to use "standard" Perl modules in the default UseModWiki package. CGI.pm is standard, but I'm not sure that useful XML modules are standard. I could write separate XML-import/export, but then I'd be duplicating code. (The UseModWiki code changes quickly, so this would be a problem.)

In any case, I'd be interested in your work. Even if XML doesn't help in exchanging pages it still might be useful within a wiki for things like advanced markup or searching. --CliffordAdams

By the way, just to be a party pooper, the MeatballWikiCopyright is written so as to preclude the exporting and transferring of material. Most wikis will be the same because of the default rights afforded to authors by law. Bummer. That being said, translating the wiki markup into something other tools can read in is a good thing. I'm definitely interested. -- SunirShah

Tools don't violate copyrights, people do. So much for that matter. :))

The different WikiName styles can be handled by the ex-/importer and by XML's ID/IDREF mechanism.

If we agree that a common XML DTD is a valuable thing to have, I will start to explore the matter in MoinMoin and you can see what you'll do. Clifford's problem with the limitations of Perl <grin> can be solved by having it optional, or in external tools. -- JürgenHermann

Check out the XML generated by OpenWiki. -- LaurensPit

By all means, if you want to experiment with it, go ahead! --ss


Since DTDs are crappy, it may be better to switch to XmlSchema.

XmlSchema is disgusting and unreadable, but yeah.


For wikis that use [alternate linking schemes] some form of munging could be done; ideally, any wiki clone would implement the InterWiki mechanism, and pages from the Foo Wiki would use the InterWiki linking scheme. --SteveWainstead


I'm pondering whether future versions of UseMod should use XML to store wiki pages. Currently, the actual files stored on disk use an obscure file separator character. Granted, future UseMods may use MySQL, but not requiring other apps is one of UseMod's strengths, I feel. - tarquin

[Aside] Why not use a modified RFC 822 format? That is easier and faster to parse than XML, plus it requires no extra XML modules (which may not be installed).
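
A sketch of what that could look like, assuming a page is stored as a few header fields, a blank line, and then the page text. The field names Title, Revision and Author are made up for the example. Python's standard email module already parses this layout, so no XML modules are needed.

```python
from email import message_from_string

# Hypothetical stored page: header fields, a blank line, then the page text.
RAW_PAGE = """\
Title: WikiXmlDtd
Revision: 12
Author: tarquin

Body text of the page goes here.
"""


def load_page(raw):
    """Split an RFC 822 style page into a header dict and the body text."""
    msg = message_from_string(raw)
    return dict(msg.items()), msg.get_payload()


headers, body = load_page(RAW_PAGE)
print(headers["Title"])  # WikiXmlDtd
print(body)
```

The modification would be in deciding which header fields a wiki page needs; the parsing itself comes for free.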

XML-sucks-but-you-have-to-use-it-anyway.


Counter Rant in Defense of XML DTDs

(People who know me probably expect this kind of thing from me. I've generally withheld my DTD rants over the past three or four years, as life goes on, but I thought I'd try out my lungs to see if I've gotten rusty.)

It's rather a popular thing to rant about XML and DTDs (and I've heard complaints about the latter for over a decade, so it's a longstanding fashion), but they both do exactly what they were designed to do. If they don't work for one's purposes, perhaps a different technology is more suitable. DTDs were designed to constrain the markup of an SGML or XML document, not its content. They were designed to be human readable and editable. They were by design also not instance markup. By comparison, XML Schema was not meant to be human readable, was by design to use instance markup, and also to constrain the content of document instances. XML Schema has a whole lot of features that DTDs don't have, but it also has a whole host of problems that DTDs don't have too. If you're designing a markup language, DTDs are usually an appropriate schema language. XML Schema is better for constraining the content of fairly rigid documents, such as industrial or commercial transaction documents, e.g., those with a need to state that a specific element must contain a date or dollar value within a certain range, etc. This is wwaaayyyy overkill for design of a markup language, and it's also very difficult for people to read the XML Schema to understand the legitimate structures allowed by that schema. It all must be done with tools.

Now, for purposes of designing a markup language, hundreds of industry-quality markup languages have used DTDs. DTDs have a proven track record going back over 20 years. There just happen to be a number of influential staff members at the W3C who don't like DTDs and have done what they can to make them go away.

If you're planning to create an XML markup language, I can state from a great deal of experience (both my own and that of many people I've worked with) that it is a HUGE undertaking. I'd recommend reading (almost cover to cover) Eve Maler's book [Developing SGML DTDs], which talks in great detail about the processes that one must go through to develop a DTD. (These are processes that one would go through in an XML Schema too. Or not. But the world will bite your ass if you don't.)

Writing a DTD is the easiest part of the process. I've suggested before that it would be much better to use an existing markup language if at all possible, or modify one by removing or adding features as necessary. But from scratch, it's a terrifically difficult thing to do, or at least to do well. In a distributed community such as wiki, I don't quite follow how it could be done well. I've done it in committee, with people paid to spend time, and it's hard work.

As for using XML Schema, there's no good to come out of it for wiki, AFAIK. It's not really an appropriate technology for the purposes I see here. It's much slower to parse an XML Schema document than it is a DTD (they're more verbose), they're harder to debug, and they're only (reasonably) created by schema tools, not by hand. Know of any good XML Schema tools? I don't.

ESIS

One thing that comes to mind when reading about people's ideas on this subject (i.e., interwiki) is that the intermediate format could take a cue from nsgmls, the SGML parser from James Clark (who is considered about as good as it gets in the SGML/XML community; some people think he's a god). nsgmls is an SGML and XML parser that has an [output format] called "Element Structure Information Set" (or ESIS) that is extremely efficient, very easy to parse, and could make a good intermediate format. If you're on a Linux machine, nsgmls is usually part of your distribution, so you can just type nsgmls and the filename of an HTML file, and if nsgmls can locate the DTD, it'll dump ESIS to standard output. Rather than dump a sample here, I've posted one [here].
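
To show how little it takes to read such a format, here is a rough sketch of a reader for the line-oriented output. It assumes the usual convention that the first character of each line is a command: "A" for an attribute of the next element, "(" and ")" for element start and end, and "-" for character data. Other commands are ignored and data escapes are left undecoded. The sample lines are hand-written for illustration, not actual nsgmls output.

```python
def parse_esis(lines):
    """Turn ESIS-style lines into a list of (event, ...) tuples."""
    events = []
    pending_attrs = {}  # attribute lines precede the element they belong to
    for line in lines:
        if not line:
            continue
        code, rest = line[0], line[1:]
        if code == "A":
            # Assumed layout: "Aname TYPE value", or "Aname IMPLIED".
            name, _, remainder = rest.partition(" ")
            kind, _, value = remainder.partition(" ")
            if kind != "IMPLIED":
                pending_attrs[name] = value
        elif code == "(":
            events.append(("start", rest, pending_attrs))
            pending_attrs = {}
        elif code == ")":
            events.append(("end", rest))
        elif code == "-":
            # Escapes such as \n are left as-is in this sketch.
            events.append(("data", rest))
        # Any other command character is skipped.
    return events


sample = [
    "AHREF CDATA http://example.org/",
    "(A",
    "-a link",
    ")A",
]
for event in parse_esis(sample):
    print(event)
```

A converter could walk this event list and emit any target markup directly, with no XML parser involved on the receiving side.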

Now, I've already provided my own suggestion in the form of the [IWML] DTD, but if the XML route isn't the one for an intermediate format, I'd suggest something like ESIS (or a suitable variant, at that level of primitive, not trying to do anything except capture the lexical output of the syntax for further processing at a higher level). So one just captures the parse output and translates that directly to whatever output format is necessary. I recently suggested on the InterWiki mailing list using Radeox' markup property files to define an input and output format as wiki text, translating between two wiki text syntaxes directly (with no intermediate format at all).

I think it's a mistake to try to capture the "semantics" of wiki text because it's by nature unknowable, unless by "semantics" one is suggesting something like "By three apostrophes I mean bold". (IMO, "semantics" easily wins the award for most overused, misunderstood, and abused word of the decade. The Semantic Web? A good marketing term.)

-- MurrayAltheim


No discussion involving the choice of schema language would be complete without mentioning [RELAX NG]. [Here] is an article discussing some of the benefits of RELAX NG over XML Schema. This should be seriously considered.

-- ToivoLainevool

Yes, RELAX NG is really the state of the art in XML schema languages. It's easier to use than DTDs and XML Schema (particularly in the compact syntax), but more powerful. In addition, [Trang] offers easy conversion from RNG to both DTD and XML Schema. As far as I'm concerned, there's no reason to use DTDs today, much less to design with them.

-- Bruce D'Arcus


See also:

[CategoryWikiTechnology] [CategoryUnimplementedWikiTechnology]


Discussion
