CoolHostCase


On October 8th and 9th, we experienced a nightmare WikiSpam attack here on MeatballWiki. Here is how it neatly avoided almost all of the anti-energy weapons listed on CategorySpam:

HardBan. IPs were rotated, with no use of an OpenProxy, and almost no reuse of an IP. Even domains were rotated, with over a dozen used, limiting the effect of blanket bans.

ContentFilter/LanguageFilter. While the domain used was the same each time, nothing else could really have been used to identify the additions as spam. The posts were ASCII, and looked like acronyms. No telltale words.

SurgeProtector. Edits were spaced out over a long period of time, minimising the chances of catching them without crippling the site for regular users and preventing reversion of the spammed pages.

SpiderTrap. If a 'bot was used, it was well-trained, following a very limited number of links and completely avoiding MB's SelfBan traps.

The attack was consistent with human control, meaning neither HumanVerification nor EditMasking would have caught the posts. As an added insult, the attacker actually removed their own links, meaning the existing anti-motivation weapon (history pages are NotIndexed) completely failed to have any effect.

Spammers are growing immune to anti-energy weapons we haven't even implemented yet. While this adds weight to the thesis of MotivationEnergyAndCommunity - we must universally deploy anti-motivation weapons or risk losing our anti-energy ones - it also adds weight to the new thesis of GlobalRevert: we need energy weapons. We must fight fire with fire!

Given that the IPs in use are on residential networks, it's likely he or she is using zombies rather than OpenProxy machines. It is not likely I will be able to portscan for every security hole imaginable. -- SunirShah

Most of the IPs are from North American ISPs. In most attacks I've seen so far, the IPs were more widely distributed. Other wikis were also hit by the same spam during the last few weeks. Interestingly, for a small number of these wikis the main domain disney.com was used instead. Different LinkPatterns were used to make them work (partially) with different WikiEngines. -- MarkusLude

One of the domains has an OpenProxy on port 113. -- SunirShah

Most of the machines are Windows machines (evidenced by an open port 1025; HTTP response on port 5000, which is Universal Plug-n-Play). This again suggests zombies. -- SunirShah

I think I ought to switch MeatballWiki to InfiniteMonkey. InfiniteMonkey is much more cleanly a state machine than any UseMod derivative, including OddMuse, allowing operators to log, roll back, and replay each action to the site. GlobalRevert, for instance, would be a relatively simple method (10-30 lines) that would be much easier to guarantee as correct, and to scale, as the script grows in complexity. -- SunirShah
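To make the log, roll back, and replay idea concrete, here is a minimal sketch. It is not InfiniteMonkey's actual code: the names Edit, replay, and global_revert are hypothetical, and it assumes every edit is logged as a full page snapshot.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Edit:
    author: str  # IP address or username (assumed field)
    page: str    # page name
    text: str    # full page text after the edit


def replay(log):
    """Rebuild the site state by applying every logged edit in order."""
    pages = {}
    for edit in log:
        pages[edit.page] = edit.text
    return pages


def global_revert(log, is_spam):
    """Return a new log without the edits that is_spam() matches.

    Replaying the returned log yields the site as if those edits
    had never happened.
    """
    return [edit for edit in log if not is_spam(edit)]
```

For example, replay(global_revert(log, lambda e: e.author in bad_ips)) would rebuild the site without the attacker's edits, where bad_ips is a set the operator supplies. With full-text snapshots, a good-faith edit made on top of a spammed page still carries the spam, so a real implementation would probably need to log diffs instead.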

Didn't the attack use much the same <div>...</div> blocks?

Would a diff-fingerprint used to check for repeatedly adding the same content be able to block that?

Have you an explanation for the large number of IP-resolutions? Are they real or somehow faked?

: Sure, a diff-fingerprint could work. It would also stop us adding new categories. And, as suggested above, the human attacker would simply start posting different text each time.

: As Sunir suggested, the large numbers of IPs are probably zombies: machines controlled by the attacker without their owners' knowledge. If we knew which port they were using, we could ban them as an OpenProxy, but that's pretty implausible as they could simply rotate the port they use. -- ChrisPurcell

Then, what about:

anonymous edits go through a diff-fingerprint filter: say, 3 identical diffs within 1 hour => block
username/cookie edits do not

I don't think that anonymous users act very much in the role of wiki gnomes. -- HelmutLeitner
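The proposal above could be sketched roughly as follows. This is only an illustration: the class name DiffFingerprintFilter, the whitespace and case normalisation, and the use of SHA-256 are all assumptions; only the "1 hour, 3 identical diffs" numbers come from the suggestion itself.

```python
import hashlib
from collections import defaultdict, deque

WINDOW_SECONDS = 3600  # "maybe 1 hour"
MAX_IDENTICAL = 3      # "3 identical diffs => block"


class DiffFingerprintFilter:
    def __init__(self, window=WINDOW_SECONDS, limit=MAX_IDENTICAL):
        self.window = window
        self.limit = limit
        # fingerprint -> timestamps of recent anonymous edits adding that text
        self._seen = defaultdict(deque)

    @staticmethod
    def fingerprint(added_text):
        # Assumed normalisation: ignore case and whitespace differences.
        normalised = " ".join(added_text.lower().split())
        return hashlib.sha256(normalised.encode("utf-8")).hexdigest()

    def allow(self, added_text, now, logged_in):
        """Return False if this edit should be blocked."""
        if logged_in:
            return True  # username/cookie edits bypass the filter
        stamps = self._seen[self.fingerprint(added_text)]
        # Drop timestamps that have fallen out of the window.
        while stamps and now - stamps[0] > self.window:
            stamps.popleft()
        stamps.append(now)
        # The limit-th identical diff inside the window is refused.
        return len(stamps) < self.limit
```

Note that an anonymous visitor adding the same category line to three pages within the hour would be blocked by this sketch, and an attacker who varies the text on each post would pass straight through.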

: This attack shows us that any anti-energy weapon we think up will only be effective for a short while. The posts were nonsense, and could easily have been posted with different text every time. -- ChrisPurcell

Perhaps one must be flexible in defense and combine measures. I don't think that terms like "anti-energy" or "weapon" add anything to this. If hard defenses (like required username, formal registration, captchas, keywords, automatic reverts) are in place, one can always play around with attackers from a position of strength. Instead of diff-fingerprints one could go to diff-line-fingerprints (of lines above a certain threshold of characters). -- HelmutLeitner
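A diff-line-fingerprint might look something like this. It is a sketch under assumptions: the 40-character threshold is invented (no value is given above), and the function names are hypothetical.

```python
import hashlib

MIN_LINE_LENGTH = 40  # assumed threshold; the discussion names no value


def line_fingerprints(added_text, min_length=MIN_LINE_LENGTH):
    """Hash each sufficiently long added line, ignoring case and spacing."""
    prints = set()
    for line in added_text.splitlines():
        normalised = " ".join(line.lower().split())
        if len(normalised) >= min_length:
            prints.add(hashlib.sha256(normalised.encode("utf-8")).hexdigest())
    return prints


def shares_long_line(diff_a, diff_b, min_length=MIN_LINE_LENGTH):
    """True if the two diffs have at least one long line in common."""
    return bool(
        line_fingerprints(diff_a, min_length)
        & line_fingerprints(diff_b, min_length)
    )
```

Because short lines are skipped, a bare category tag or a signature would never match on its own; only a long line repeated between two edits would.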

: Sure, we could go to formal registration. I thought the idea was to keep this a wiki, though. If we allow anonymous contributions, even signed with a pseudonym, the attacker simply needs to re-register. The harder you make registration, the more you dissuade real contributors.

: Further, to repeat myself, the attackers could have used completely different text each time, such that only the domains matched. If you try and catch that, you'll end up blocking people who use "and" and sign. There were no keywords to match against. These filtration ideas would not have worked. -- ChrisPurcell

I wonder why wiki hosters try to reinvent the wheel on spam. Bayesian filtering works wonderfully on email or usenet spam, so why not train a filter for spam patterns in wiki editing diffs? A thing like this should be easy to implement using tools like CRM114, and would surely scare fewer contributors away than HardBan usage. -- AlanMcCoy?

: Read GlobalRevert. The lessons learned from email and usenet spam simply do not apply, because the attacker's aim is to increase PageRank, not sell a product. Hence there are no predictable textual patterns to be learned, because no information (apart from the domain) actually needs to be transferred (and domains are cheap and cannot be pattern-matched).

: Further, a Bayesian filter must be trained by a GodKing, or an attacker can simply mistrain it. This goes against the wiki grain, and is highly vulnerable to abuse by the GodKing to filter out disagreeable points of view instead of merely spam. Check out CategorySpam. -- ChrisPurcell

I have yet to see the CoolHostCase spam, but I fail to believe that there aren't any probabilistic patterns that can tell ham and spam apart. Sure, the spammers will become more sophisticated, too, but fighting against spam and spamming itself will always be an arms race.

I don't think it would take a GodKing to train. The diff histories should be enough training data, and we have yet to see if mistraining is that easy. Given the fact that in a well-used wiki the spam doesn't last long, the mistrained statistical data will most probably not last long, either. Perhaps training data can even be shared between wikis. Techniques like SharedAntiSpam don't have to stop on patterns extracted by humans, after all. This is our advantage compared to the spammers: they have to covertly design attack strategies, so there is always a limited number of brains involved. We have open communications, so good ideas can come from everywhere. -- AlanMcCoy?

Why "covertly"? There was nothing covert about any spam attack I ever saw, and there are places where spammers hang out and chat. Also, if the filter is trained by the public marking edits as spam, it will fail to stop spammers, because they can pretend to be an entire herd of honest citizens. Or it will be vulnerable to abuse to filter out non-spam.

Bayesian filters work for individual filtering, when the attacker can't see that their spam got caught. On a wiki, you have immediate feedback to the attacker. -- ChrisPurcell

The attack itself cannot be covert, but when it comes to unethical SEO methods, I have yet to see spammers discussing them in a really open manner. At the point of becoming visibly unethical, too much publicity is not wanted and they switch to private communication channels. I'm not claiming that the Bayesian filter cannot be outsmarted (in fact, any technique can be), but the experience gained from it could be beneficial. Also, in papers on Bayesian spam-fighting techniques you can find discussion of automated training methods, like spam traps. A smart attacker could also find out about and avoid them, but they are still useful, because the more investigation the spammer needs, the less economical he will be, so he will head for more error-prone techniques. -- AlanMcCoy?

The best thing to do is provide a feed of diffs from MeatballWiki so that you can develop a BayesianFilter. -- SunirShah
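For anyone wanting to experiment with such a feed, a toy naive Bayes classifier over diff text could look like the sketch below. It is not CRM114 and not any existing wiki's filter; the class name, the tokeniser, and the add-one smoothing are all assumptions chosen for brevity.

```python
import math
import re
from collections import Counter


class NaiveBayesDiffFilter:
    def __init__(self):
        self.counts = {"spam": Counter(), "ham": Counter()}
        self.docs = {"spam": 0, "ham": 0}

    @staticmethod
    def tokens(diff_text):
        # Assumed tokeniser: lower-case runs of letters, digits, dots, hyphens.
        return re.findall(r"[a-z0-9.\-]+", diff_text.lower())

    def train(self, diff_text, label):
        """Record one diff as 'spam' or 'ham'."""
        self.counts[label].update(self.tokens(diff_text))
        self.docs[label] += 1

    def spam_log_odds(self, diff_text):
        """Positive means the diff looks more like spam than ham."""
        vocab = len(set(self.counts["spam"]) | set(self.counts["ham"])) or 1
        # Prior from the number of training diffs, with add-one smoothing.
        score = math.log((self.docs["spam"] + 1) / (self.docs["ham"] + 1))
        for label, sign in (("spam", 1), ("ham", -1)):
            total = sum(self.counts[label].values())
            for tok in self.tokens(diff_text):
                likelihood = (self.counts[label][tok] + 1) / (total + vocab)
                score += sign * math.log(likelihood)
        return score
```

Who gets to call train() is exactly the question argued over above: if anyone can, an attacker can mistrain it, and if only one person can, that person decides what counts as spam.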

Chris, what if the system scheduled a weekly whois lookup on new external domains? If the whois has not yet been performed, or if the domain was newer than some predetermined period (say, a year), the rendering engine could emit the rel=nofollow attribute for the link.

Better yet, skip the whois query altogether and just have a PeerReview period for new domains. If a new domain link is introduced to the page database (even if it has been seen in the past, as long as it is not currently linked to), the system automatically puts the link on probation, making it rel=nofollow for two weeks.
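The probation idea could be sketched like this. The class LinkProbation and its methods are hypothetical, the sketch assumes the engine can hand over the full set of currently linked URLs after each save, and it treats each hostname as a "domain".

```python
import html
from urllib.parse import urlparse

PROBATION_SECONDS = 14 * 24 * 3600  # "two weeks"


class LinkProbation:
    def __init__(self, probation=PROBATION_SECONDS):
        self.probation = probation
        # domain -> time it most recently entered the page database
        self._linked_since = {}

    def update(self, linked_urls, now):
        """Sync with the set of URLs currently linked anywhere on the site."""
        current = {urlparse(u).hostname for u in linked_urls}
        current.discard(None)
        # Forget domains that are no longer linked, so a domain that
        # comes back later starts a fresh probation period.
        for domain in list(self._linked_since):
            if domain not in current:
                del self._linked_since[domain]
        for domain in current:
            self._linked_since.setdefault(domain, now)

    def render(self, url, now):
        """Emit the link, adding rel=nofollow while its domain is on probation."""
        since = self._linked_since.get(urlparse(url).hostname)
        on_probation = since is None or now - since < self.probation
        rel = ' rel="nofollow"' if on_probation else ""
        return f'<a href="{html.escape(url, quote=True)}"{rel}>{html.escape(url)}</a>'
```

A domain that was linked in the past but has since been removed is forgotten by update(), which matches the "even if it has been seen in the past" condition above.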

: They are both only anti-motivation weapons (see MotivationEnergyAndCommunity), and as the above shows, they won't stop spam because the attacker won't know they're there.

: I like the second, if combined with a system to ensure attackers know their effort is wasted. Something that makes life a bit harder when adding previously unseen links, whilst destroying an attacker's motivation to get around the new system. This is exactly what I argue for on MotivationEnergyAndCommunity. However, GlobalRevert is still essential IMO. -- ChrisPurcell

One way we could make it obvious is to not even link the URL in the first place. If you link to a new domain, then the URL won't actually become a link unless it remains unchallenged for two weeks. MeatballWiki puts up with this sort of hardship for new InterWiki links; it's not too much trouble to do so for new linked domains.

We could try it. I'm not convinced it will demotivate spammers. I think they spam first and ask questions later, if ever. -- SunirShah

CategorySpam
