Contributors: AlexSchroeder, YonatSharon
The default charset is often ISO-8859-1 (Latin 1), but this depends on the browsers used and on the language environment of the users. Usually you would set the charset in the wiki engine. UseMod, for example, has the following config option:
$HttpCharset = ""; # Charset for pages, like "iso-8859-2"
What that means is clear: if you set iso-8859-1, then pages containing anything outside Latin 1 will require manual intervention (switching the encoding in the browser) to display correctly. Perhaps future browsers will be smart enough to autodetect encodings (some editors such as Emacs already do that for ordinary documents), or perhaps it would be best to stick to a Unicode encoding such as UTF-8 (as required by Java, for example).
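To illustrate where such a setting ends up -- a minimal sketch, not the actual UseMod code: the charset is appended to the Content-Type header that the script prints before the page.

 # Sketch: emit the configured charset in the HTTP header.
 # The variable name matches the UseMod option; the rest is illustrative.
 my $HttpCharset = "utf-8";
 my $type = "text/html";
 $type .= "; charset=$HttpCharset" if $HttpCharset ne "";
 print "Content-type: $type\n\n";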
Also note that when a user edits a page, the edits should be in the same coding system as the original page -- or use escaped Unicode entities (numeric character references such as &#252;).
"UTF-8" didn't work on UseMod 0.92 without some tweaks. See UseMod:SupportForUtf8. This has been fixed in later releases.
If you want to support audio browsers or built-in spell checking, you will need the lang attribute as soon as you mix several languages. Examples:
<body lang="en"> ...
or
<p lang="de"> ...
One way to determine the language might be a very simple heuristic that scans the page or the paragraph for typical words. See EmacsWiki:GuessBufferLanguage for an example.
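A toy version of such a heuristic (the word lists are arbitrary samples, not taken from GuessBufferLanguage): count hits for typical function words of each language and pick the winner.

 # Toy heuristic: the language whose typical words occur most often wins.
 my %typical = (
     en => [qw(the and with this that)],
     de => [qw(der die und nicht eine)],
     fr => [qw(le la les dans une)],
 );
 sub guess_language {
     my $text = lc shift;
     my ($best, $max) = ('en', 0);
     for my $lang (keys %typical) {
         my $hits = 0;
         for my $word (@{ $typical{$lang} }) {
             $hits++ while $text =~ /\b\Q$word\E\b/g;
         }
         ($best, $max) = ($lang, $hits) if $hits > $max;
     }
     return $best;
 }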
Browsers usually send their acceptable languages along with each request (the Accept-Language header), so that correctly configured websites can react accordingly.
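A rough sketch of exploiting this, simplified in that it trusts the header's order and ignores q-values:

 # Pick the first language from the Accept-Language header that the
 # wiki supports; entries look like "de-DE,de;q=0.9,en;q=0.8".
 sub preferred_language {
     my @supported = @_;    # e.g. ('en', 'de', 'fr')
     my $header = $ENV{HTTP_ACCEPT_LANGUAGE} || '';
     for my $entry (split /\s*,\s*/, $header) {
         my ($tag) = $entry =~ /^([A-Za-z]+)/;
         next unless defined $tag;
         $tag = lc $tag;
         return $tag if grep { $_ eq $tag } @supported;
     }
     return $supported[0];  # fall back to the site default
 }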
For present-day wikis it seems that they would do best to store the preferred language in the user preferences, if they store it at all.
For present-day wikis it seems that different languages could share the namespace of wiki pages with very little difficulty: CategoryWikiTechnology turns into KategorieWikiTechnologie, for example. But languages sometimes adopt foreign words, especially English ones, so the problem still needs to be solved for pages such as TourBus. Perhaps TourBusDe or TourBusGerman or TourBusDeutsch would be appropriate. If such a naming convention were followed rigorously, then perhaps the wiki engine could exploit it to list the translations of a page, as in the sketch below.
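A sketch of that idea; the PageExists helper and the suffix list are assumptions, not part of any existing engine:

 # Given a page name, find translations that follow the suffix
 # convention (TourBus -> TourBusDe, TourBusFr, ...).
 my @suffixes = qw(De Fr Es It);
 sub list_translations {
     my $page = shift;
     return grep { PageExists($_) } map { $page . $_ } @suffixes;
 }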
If different languages get stored onto different pages, the problem is that threads and contributions are divided artificially. This reduces the collaborative aspect of a wiki by constructing unnecessary barriers. (Assuming a multilingual audience.)
Another problem is the LinkPattern, because it relies on having both uppercase and lowercase letters. This works for languages written in the Latin alphabet (and other cased scripts such as Greek or Cyrillic), but not for scripts without letter case (Hebrew, Arabic, Persian, Urdu, Chinese, Japanese, Thai, ...).
Usually a wiki will have to limit the LinkPattern to pure ASCII characters and rely on FreeLinks for everything else. Locale information might let you distinguish uppercase from lowercase characters beyond ASCII, but that requires the web server running the wiki to use the same locale your page database is stored in -- which is not necessarily the case.
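For example, contrasting the two approaches (a modern, Unicode-aware Perl is assumed for the second pattern):

 # ASCII-only CamelCase pattern, safe on any system:
 my $ascii_link   = qr/\b(?:[A-Z][a-z]+){2,}\b/;
 # Unicode-aware variant; \p{Lu} and \p{Ll} match any cased script,
 # but still match nothing in caseless scripts such as Arabic or Chinese:
 my $unicode_link = qr/\b(?:\p{Lu}\p{Ll}+){2,}\b/;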
RecentChanges gets clogged with content in various languages. One solution would be to determine the languages used on every page and store this somewhere. Every user could then filter the output of RecentChanges according to the languages used.
This has been implemented in UseMod:WikiPatches/MultiLang.
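The idea in a nutshell -- a sketch, not the actual patch: once every change record carries a language, filtering is trivial.

 # Keep only changes in the languages the user asked for.
 sub filter_changes {
     my ($changes, @wanted) = @_;   # arrayref of {page=>..., lang=>...}
     my %ok = map { $_ => 1 } @wanted;
     return grep { $ok{ $_->{lang} } } @$changes;
 }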
TwinPages depends on pages having the same (canonical) name on different wikis. To implement twin pages from A to B, for example, A needs a list of all the pages on B; whenever a page is displayed on site A and the same page also exists on B, A automatically adds a link to B. What happens, however, when A and B are not in the same language? Then pages with related content will not share the same name on A and B.
This is important when -- instead of creating a true multilingual wiki -- the community is split along the language barrier, and every language group uses its own wiki. Now, in order to link ThisPage on A with CettePage on B and DieseSeite on C, you need to have some sort of translation.
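A sketch of the needed translation, here as a hand-maintained table using the example page names from above:

 # Map a local page name to its twins on the other-language wikis.
 my %twin = (
     'ThisPage' => { 'B' => 'CettePage', 'C' => 'DieseSeite' },
 );
 sub twin_links {
     my $page = shift;
     my $map  = $twin{$page} or return ();
     return map { "$_:$map->{$_}" } sort keys %$map;
 }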
The easiest solution for the problem of translated (or even just related) pages in other languages is to have community-maintained InterWiki links. Define the prefix Fr: for French pages, create a French wiki, and from the English wiki use Fr:FooBar to point to the translated or related page.
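With UseMod-style engines, defining the prefix boils down to one line in the intermap file (the URL here is made up):

 Fr http://fr.example.org/cgi-bin/wiki.pl?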
The drawback is that if you have n translated pages and add a new one, then you have to edit the existing n pages to add the interwiki link to the new translation. If n is small, or if one page is considered to be the "main" page, then that is not a problem.
If you have an idea that fits in one or two paragraphs, add it to this page. If your idea takes more space, or if it is described on another site, link to it from here with a short comment.
MandrakeClubImplementation -- every translation has a status of "in sync", "not in sync", "not in sync, older" and "not in sync, newer". This status is maintained automatically.
LinuxWiki:MultiLang -- (text in German) meta information records the various translations of a page on SisterSites. This solves the problem with the InterWiki solution mentioned above.
TagBasedTranslationMaintenance -- some ideas that came to mind while I was translating some pages on the EmacsWiki. Everything is based on a tag and, IMHO, fits wiki habits.
The main advantage of such an approach would be to have "one source" which collects improvements, even while the different translations lag behind.
I'm not sure whether this is the right place for this question. If there is a better one, please move it there. -- ThomasKalka
Well, CommunityWiki:[Multilingual experiment] has a lot of what you describe. But there are many further ideas here that I still have to understand. Let's keep working on this issue. I'm going a bit over the top on CommunityWiki:[Editable title]. I do so sometimes. -- MattisManzel