One strong correlate of ShotgunSpam is that it usually comes from China, Russia, or Taiwan, and typically consists of non-Latin characters. If a page suddenly has an explosion of non-ASCII (Unicode) characters, the post is probably spam. Also, if the post adds external links, you can load those links to see what character set the linked pages use. If it is Big5 or GB2312, for instance, you have probably been spammed.
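The character-count heuristic could be sketched roughly as follows. This is only an illustration, not how any particular wiki engine does it: the function names and the 0.3 threshold are assumptions made for the example, and the threshold would need tuning against real diffs.

```python
def non_ascii_fraction(text: str) -> float:
    """Return the share of non-whitespace characters that fall outside ASCII."""
    chars = [c for c in text if not c.isspace()]
    if not chars:
        return 0.0
    non_ascii = sum(1 for c in chars if ord(c) > 127)
    return non_ascii / len(chars)


def looks_like_shotgun_spam(added_text: str, threshold: float = 0.3) -> bool:
    """Flag an edit whose added text is mostly non-ASCII.

    The threshold is a guess (hypothetical), not a measured value.
    """
    return non_ascii_fraction(added_text) > threshold
```

You would run this on the text added by the diff rather than on the whole page, so that a page with a few legitimate accented characters is not penalised for every later edit.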

An alternative is to check the words in the diff against a dictionary of English words (or words in whatever language your wiki uses). If very few of the words are in the dictionary, the edit is probably spam.
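A minimal sketch of the dictionary check, assuming you already have a word list loaded into a set. The function name and the tiny sample dictionary are made up for illustration; a real word list would be much larger.

```python
import re


def dictionary_word_fraction(text: str, dictionary: set) -> float:
    """Return the share of words in `text` that appear in `dictionary`.

    A "word" here is any run of letters; digits, punctuation and URLs
    syntax are ignored. Text with no words at all returns 1.0 so that
    it is not flagged.
    """
    words = re.findall(r"[^\W\d_]+", text.lower())
    if not words:
        return 1.0
    known = sum(1 for w in words if w in dictionary)
    return known / len(words)


# Hypothetical, deliberately tiny dictionary for the example.
SAMPLE_DICTIONARY = {"the", "wiki", "is", "a", "page", "about", "spam"}
```

An edit would then be treated as suspicious when the fraction falls below some cutoff you choose. CamelCase page names and proper nouns will not be in an ordinary dictionary, so the cutoff has to be fairly forgiving.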

However, many e-mail spam ContentFilters used strategies like this for a while, and the only result was that spammers started pasting huge blobs of nonsense into their e-mail too. Also, if you ever do want to link to a page in one of the target languages, or include a quotation in one of those languages, you are out of luck.


I would certainly prefer a language filter to banning all of China.

I was talking with a friend who is currently working in China, teaching English. For many reasons, I would really like to introduce him to this site (in fact, I would whole-heartedly 'sponsor' him as a potential member). If we block all IP addresses that originate in China, I will obviously not be able to do this.

If it is possible to reliably detect Chinese characters based on their Unicode values, then this seems preferable to me (in spite of the fact that I use a great many special characters, albeit only in the 00-FF range, within my personal Wiki). -- HansWobbe
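For what it is worth, detecting Chinese characters by code point is straightforward in principle. The sketch below is an assumption-laden illustration: the function name is invented, and it checks only two Unicode blocks (CJK Unified Ideographs and Extension A), which cover common Chinese text but not every CJK block. Characters in the 00-FF range are never counted.

```python
# Code point ranges checked in this sketch (not an exhaustive list of CJK blocks).
CJK_RANGES = (
    (0x4E00, 0x9FFF),  # CJK Unified Ideographs
    (0x3400, 0x4DBF),  # CJK Unified Ideographs Extension A
)


def count_cjk_characters(text: str) -> int:
    """Count characters whose code point falls in one of CJK_RANGES."""
    return sum(
        1
        for c in text
        if any(low <= ord(c) <= high for low, high in CJK_RANGES)
    )
```

Note that these ideographs are shared with Japanese and partly with Korean writing, so this detects a script rather than a language.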

A LanguageFilter is not a real solution. It's not particularly relevant to Google what languages the spammers use, and a filter doesn't target the underlying economic advantage, which means that English-based (or other Latin-based) spammers are bound to appear. Besides, if I want to talk about Human Rights in China, it would be a problem for me. Nonetheless, as a stopgap solution, it's fine if it works and allows us to unban China again. But of course I am not opposed to RegionalBans. -- SunirShah


