A generic WikiSyntax parser that is configurable at the data level. It ultimately allows translation from one syntax to another and, more importantly, produces a DocumentObjectModel that would make it more feasible to build a WysiwygWiki.
The primary goals of the Wikix parser are to
1. Describe WikiSyntaxes using a common WikixStyleSheet? language
2. Emit valid XHTML
The secondary goals of the Wikix parser are to
1. Parse WikiSyntax into a DocumentObjectModel that can be re-emitted based on another Wikix stylesheet, thus allowing WikiSyntax-to-WikiSyntax translation on the fly, and also giving editors the option to pick whichever WikiSyntax they prefer.
2. Store documents in a DocumentObjectModel format that could lend itself to a WysiwygWiki editor. The biggest benefit and problem of WikiSyntax is that it is enmeshed directly within the text, so it is both easy to create by keyboard and difficult to manage by GraphicalUserInterface?.
In this way, the Wikix parser may be an important step in migrating towards a RichTextEditor?.
Initially created by SunirShah for BibWiki and abandoned; may see the light of day while reviving MeatballWiki. Code will be appropriately OpenSource licensed (GPLv3).
The Wikix parser is based on understanding the most common patterns in the design of WikiSyntax, while still constraining the behaviour to be rationalized and consistent.
The core design goals are
- Separate specification of the WikiSyntax from the implementation; thus
- Flexibly add and change WikiSyntax without rewriting entire engines, and ideally dynamically user-configurable
- Allow XHTML->wiki syntax translation
The system should be a black box. It takes
- Input: a JSON WikixSheet? that specifies the syntax rules for the engine
- Input: a function to determine whether a page exists or not
- Input (optional): an InterWiki IntermapTxt? file
- Input: the text to transform
- Output: the transformed XHTML
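A minimal sketch of what that black-box interface might look like in Python. All names here are illustrative assumptions, not the actual Wikix API, and only a single toy rule (CamelCase links) is honoured:

```python
# Hypothetical black-box interface; names are assumptions, not the real API.
import json
import re

def wikix_transform(sheet_json, page_exists, text, intermap=None):
    """Transform wiki text to XHTML.

    sheet_json  -- JSON WikixSheet of syntax rules (toy version: only
                   the 'camelcase_links' flag is honoured)
    page_exists -- callable(name) -> bool; missing pages get an edit link
    text        -- the wiki text to transform
    intermap    -- optional InterWiki intermap text (ignored in this sketch)
    """
    rules = json.loads(sheet_json)

    def link(match):
        name = match.group(0)
        if page_exists(name):
            return '<a href="/wiki/%s">%s</a>' % (name, name)
        # dangling link: append the classic trailing "?" edit link
        return '%s<a href="/wiki/%s?edit">?</a>' % (name, name)

    out = []
    for line in text.splitlines():
        if rules.get("camelcase_links"):
            line = re.sub(r'\b[A-Z][a-z]+(?:[A-Z][a-z]+)+\b', link, line)
        out.append('<p>%s</p>' % line)
    return '\n'.join(out)

sheet = '{"camelcase_links": true}'
xhtml = wikix_transform(sheet, lambda name: name == "MeatballWiki",
                        "See MeatballWiki and WikiSyntax")
```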
Overall the parser algorithm is a RecursiveDescentParser?, operating on a stream of lines. Syntax rules are arranged in a hierarchy (technically a DirectedAcyclicGraph?). Each syntax rule captures a portion of the input, and then recursively runs its children's syntax rules against that captured input until the entire text is transformed.
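The capture-then-recurse idea can be illustrated in a few lines of Python. The rule structure below (pattern, open/close tags, children list) is a made-up simplification for demonstration, not the real Wikix rule format:

```python
# Toy illustration of the recursive descent: each rule captures a span,
# then its children (in descending priority) run against the capture.
import re

bold = {"pattern": r"'''(.*?)'''", "open": "<b>", "close": "</b>",
        "children": []}
italic = {"pattern": r"''(.*?)''", "open": "<i>", "close": "</i>",
          "children": [bold]}   # bold may nest inside italic

def apply_rules(rules, text):
    for rule in rules:                       # descending priority order
        def emit(m, rule=rule):
            # recursively run the children's rules on the captured input
            inner = apply_rules(rule["children"], m.group(1))
            return rule["open"] + inner + rule["close"]
        text = re.sub(rule["pattern"], emit, text)
    return text

result = apply_rules([bold, italic], "''italic with '''bold''' inside''")
```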
Syntax rules are grouped by type. Almost all known WikiSyntax rules belong to one of these types. These types are designed to operate within a stream of text lines, which allows individual syntax rules to focus on what they look like rather than having to handle problems such as end-of-lines, line wrapping, or inline modification of the input stream.
Because the input is modified in place, the system allows rules to move a portion of the emitted output stream into a store, so that recursive rules cannot collide by matching on previous output; the stored text is restored before the final output is returned.
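The store-and-restore trick can be sketched as follows. Emitted HTML is parked under an opaque placeholder token that no syntax rule can match, then swapped back at the end. The token format and function names are assumptions for illustration:

```python
# Sketch of the store/restore mechanism: hide emitted output behind
# placeholder tokens so later rules cannot re-match it.
import re

store = []

def stash(html):
    """Park emitted HTML; return an opaque placeholder token."""
    store.append(html)
    return "\x00%d\x00" % (len(store) - 1)

def restore(text):
    """Swap every placeholder token back to its stored HTML."""
    return re.sub(r"\x00(\d+)\x00", lambda m: store[int(m.group(1))], text)

text = "a ''b'' c"
text = re.sub(r"''(.*?)''", lambda m: stash("<i>%s</i>" % m.group(1)), text)
assert "''" not in text      # the emitted <i> tags are hidden from later rules
final = restore(text)
```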
Along the way, the system will also collect links if requested.
- Only paragraphs and lines can contain inline_styles
- Children are listed in descending order of priority
- Multilines can only have starts, not equals, ends, or optionallyEnds
- Blocks MUST have starts AND ends; never equals or optionallyEnds
- Compiler generates regex patterns for starts and ends
- Links require href and text
- Equals cannot have children
- Links cannot have children
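A hypothetical fragment of a JSON WikixSheet? that obeys the constraints above (a multiline with only starts, a block with both starts and ends, and a childless equals rule). The key names are illustrative guesses, not the real sheet format:

```json
{
  "paragraph": {
    "type": "multiline",
    "starts": ["^\\S"],
    "children": ["inline_styles"]
  },
  "pre": {
    "type": "block",
    "starts": ["^\\{\\{\\{$"],
    "ends": ["^\\}\\}\\}$"]
  },
  "hr": {
    "type": "line",
    "equals": ["^----$"]
  }
}
```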
Issues with the Python code
- GPLed for now. Could be changed to MIT
- The parser now handles almost all of MeatballWiki's TextFormattingRules
- except ISBN links
- except unicode in the WikiLink?s. Will likely need to switch to PyPi? regex module
- requirements.txt is a trainwreck
- code needs to be SelfDocumentingCode?
- explanation of the model
- XHTML -> Wiki transformer needs to be ported and upgraded
- I dislike the regex hack for CamelCase""s. It would be better to have a more BNF-like rule set. I suppose one could match CamelCase""\S+ with children CamelCase -> link, "" -> ''
- The introduction of collections.deque because Python lists are non-shiftable is a bit of a hack; could be cleaner
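For context on that deque note: Python lists have no cheap left-shift, so consuming a line stream with list.pop(0) is O(n) per line, while collections.deque shifts in O(1). A minimal illustration (the variable names are just for the example):

```python
# Why the line stream uses collections.deque rather than a plain list:
# popleft() and appendleft() are O(1), whereas list.pop(0) is O(n).
from collections import deque

lines = deque(["line 1", "line 2", "line 3"])
first = lines.popleft()           # consume the next line in O(1)
lines.appendleft("pushed back")   # a rule can push a line back cheaply
```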
- The String(str) class is totally the wrong architecture
- No table of contents (<toc> + == # heading ==)
- No numbered bracketed links like this 
- Many of these require some kind of lambdas to execute based on the syntax rule
I would greatly appreciate if someone with fresh eyes compared http://meatballsociety.org/wikix/TextFormattingRules.html and TextFormattingRules and identified any differences to suss out any bugs.
I know about the one with mixed lists; MeatballWiki is actually incorrect and I don't consider this normal behaviour i.e.