The attack was consistent with human control, meaning neither HumanVerification nor EditMasking would have caught the posts. As an added insult, the attacker actually removed their own links, meaning the existing anti-motivation weapon (history pages are NotIndexed) completely failed to have any effect.
Spammers are growing immune to anti-energy weapons we haven't even implemented yet. While this adds weight to the thesis of MotivationEnergyAndCommunity - we must universally deploy anti-motivation weapons or risk losing our anti-energy ones - it also adds weight to the new thesis of GlobalRevert: we need energy weapons. We must fight fire with fire!
Given that the IPs in use are on residential networks, it's likely he or she is using zombies rather than OpenProxy machines. It is not likely I will be able to portscan for every security hole imaginable. -- SunirShah
Most of the IPs are from North American ISPs. In most attacks I've seen so far, the IPs were more widely distributed. Other wikis were hit by the same spam during the last few weeks. Interestingly, for a small number of these wikis the main domain disney.com was used instead. Different LinkPatterns were used to make them work (partially) with different WikiEngines. -- MarkusLude
Most of the machines are Windows machines (evidenced by an open port 1025; HTTP response on port 5000, which is Universal Plug-n-Play). This again suggests zombies. -- SunirShah
I think I ought to switch MeatballWiki to InfiniteMonkey. InfiniteMonkey is much more cleanly a state machine than any UseMod derivative including OddMuse, allowing operators to log, roll back, and replay each action to the site. GlobalRevert, for instance, would be a relatively simple method (10-30 lines) that would be much easier to guarantee as correct and to scale as the script grows in complexity. -- SunirShah
Didn't the attack use much the same <div>...</div> blocks?
Would a diff-fingerprint used to check for repeatedly adding the same content be able to block that?
Have you an explanation for the large number of IP-resolutions? Are they real or somehow faked?
Then, what about:
Perhaps one must be flexible in defense and combine measures. I don't think that terms like "anti-energy" or "weapon" add anything to this. If hard defenses (like required usernames, formal registration, captchas, keywords, automatic reverts) are in place, one can always play around with attackers from a position of strength. Instead of diff-fingerprints one could go to diff-line-fingerprints (of lines above a certain threshold of characters). -- HelmutLeitner
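The diff-line-fingerprint idea could be sketched roughly as follows. This is only an illustration in Python, not code from any WikiEngine: the names LineFingerprintFilter, MIN_LINE_LENGTH and MAX_REPEATS are hypothetical, the threshold values are guesses, and the "diff" is simply the set of lines that appear in the new text but not in the old one.

```python
import hashlib
from collections import Counter

MIN_LINE_LENGTH = 40  # assumed threshold: shorter lines are never fingerprinted
MAX_REPEATS = 3       # assumed limit: how often one line may be added site-wide


def added_lines(old_text, new_text):
    """Return the lines of new_text that do not occur in old_text."""
    old = set(old_text.splitlines())
    return [line for line in new_text.splitlines() if line not in old]


def fingerprint(line):
    """Hash a line after collapsing whitespace and lowercasing it."""
    normalized = " ".join(line.split()).lower()
    return hashlib.sha1(normalized.encode("utf-8")).hexdigest()


class LineFingerprintFilter:
    """Reject edits that keep adding the same long line across the site."""

    def __init__(self, min_length=MIN_LINE_LENGTH, max_repeats=MAX_REPEATS):
        self.min_length = min_length
        self.max_repeats = max_repeats
        self.seen = Counter()  # fingerprint -> number of accepted edits adding it

    def check(self, old_text, new_text):
        """Return True to allow the edit (and record it), False to block it."""
        prints = {
            fingerprint(line)
            for line in added_lines(old_text, new_text)
            if len(line.strip()) >= self.min_length
        }
        if any(self.seen[p] >= self.max_repeats for p in prints):
            return False
        self.seen.update(prints)
        return True
```

An attacker who varies every line slightly would slip past exact hashes like these, so at best this is one measure to combine with others.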
I wonder why wiki hosters try to reinvent the wheel on spam. Bayesian filtering works wonderfully on email or usenet spam, so why not train a filter for spam patterns in wiki editing diffs? A thing like this should be easy to implement using tools like CRM114, and would surely scare fewer contributors away than HardBan usage. -- AlanMcCoy?
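To make the proposal concrete, here is a minimal sketch of a naive Bayes classifier over the text added by an edit. It is written in plain Python and is not CRM114 or its API; the class DiffClassifier, the word tokenizer and the add-one smoothing are all assumptions chosen for brevity.

```python
import math
import re
from collections import Counter

TOKEN = re.compile(r"[a-z0-9]+")


def tokens(text):
    """Split text into lowercase alphanumeric tokens."""
    return TOKEN.findall(text.lower())


class DiffClassifier:
    """Naive Bayes over the tokens an edit adds (hypothetical sketch)."""

    def __init__(self):
        self.counts = {"spam": Counter(), "ham": Counter()}
        self.docs = {"spam": 0, "ham": 0}

    def train(self, added_text, label):
        """Record the added text of one edit under label 'spam' or 'ham'."""
        self.counts[label].update(tokens(added_text))
        self.docs[label] += 1

    def spam_log_odds(self, added_text):
        """Return log P(spam|text) - log P(ham|text); positive means spam-like."""
        vocab = len(set(self.counts["spam"]) | set(self.counts["ham"]))
        total = {label: sum(c.values()) for label, c in self.counts.items()}
        # Smoothed prior from the number of training edits per class.
        score = math.log((self.docs["spam"] + 1) / (self.docs["ham"] + 1))
        for t in tokens(added_text):
            # Add-one smoothing keeps unseen tokens from zeroing a class.
            p_spam = (self.counts["spam"][t] + 1) / (total["spam"] + vocab + 1)
            p_ham = (self.counts["ham"][t] + 1) / (total["ham"] + vocab + 1)
            score += math.log(p_spam / p_ham)
        return score
```

Training data could come from the diff histories: reverted spam edits as "spam", surviving edits as "ham". Who gets to label edits, and what the attacker learns from each rejection, are left open here.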
I have yet to see the CoolHostCase spam, but I fail to believe that there aren't any probabilistic patterns that can tell ham and spam apart. Sure, the spammers will become more sophisticated, too, but fighting against spam and spamming itself will always be an arms race.
I don't think it would take a GodKing to train. The diff histories should be enough training data, and we have yet to see if mistraining is that easy. Given that spam doesn't last long in a well-used wiki, the mistrained statistical data will most probably not last long, either. Perhaps training data can even be shared between wikis. Techniques like SharedAntiSpam don't have to stop at patterns extracted by humans, after all. This is our advantage compared to the spammers: they have to covertly design attack strategies, so there is always a limited number of brains involved. We have open communications, so good ideas can come from everywhere. -- AlanMcCoy?
Why "covertly"? There was nothing covert about any spam attack I ever saw, and there are places where spammers hang out and chat. Also, if the filter is trained by the public marking edits as spam, it will fail to stop spammers, because they can pretend to be an entire herd of honest citizens. Or it will be vulnerable to abuse to filter out non-spam.
Bayesian filters work for individual filtering, when the attacker can't see that their spam got caught. On a wiki, you have immediate feedback to the attacker. -- ChrisPurcell
The attack itself cannot be covert, but when it comes to unethical SEO methods, I have yet to see spammers who discuss them in a really open manner. At the point of becoming visibly unethical, too much publicity is not wanted and they switch to private communication channels. I'm not claiming that the Bayesian filter cannot be outsmarted (in fact any technique can be), but the experience gained from it could be beneficial. Also, in papers on Bayesian spam-fighting techniques you can find discussion of automated training methods, like spam traps. A smart attacker could also find out about and avoid them, but they are still useful, because the more investigation the spammer needs, the less economical he will be, so he will head for more error-prone techniques. -- AlanMcCoy?
Chris, what if the system scheduled a weekly whois lookup on new external domains? If the whois has not yet been performed, or if the domain was newer than some predetermined period (say, a year), the rendering engine could emit the rel=nofollow attribute for the link.
Better yet, skip the whois query altogether and just have a PeerReview period for new domains. If a new domain link is introduced to the page database (even if it has been seen in the past, as long as it is not currently linked to), the system automatically puts the link on probation, making it rel=nofollow for two weeks.
One way we could make it obvious is to not even link the URL in the first place. If you link to a new domain, then the URL won't actually become a link unless it remains unchallenged for two weeks. MeatballWiki puts up with this sort of hardship for new InterWiki links; it's not too much trouble to do so for new linked domains.
We could try it. I'm not convinced it will demotivate spammers. I think they spam first and ask questions later, if ever. -- SunirShah
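The probation idea for newly linked domains might look something like the sketch below. It is a Python illustration under stated assumptions, not part of any existing wiki script: DomainProbation and its method names are hypothetical, the two-week period is taken from the proposal above, and the PeerReview step (someone challenging the link) is not modeled.

```python
from datetime import datetime, timedelta
from html import escape
from urllib.parse import urlsplit

PROBATION = timedelta(weeks=2)  # period suggested in the discussion


class DomainProbation:
    """Track when each external domain entered the page database."""

    def __init__(self, probation=PROBATION):
        self.probation = probation
        self.first_linked = {}  # domain -> time it became linked

    def note_link(self, url, now):
        """Record a saved link; keeps the earliest time while still linked."""
        self.first_linked.setdefault(urlsplit(url).hostname, now)

    def forget_domain(self, domain):
        """Call when the last link to a domain is removed, so that a later
        reappearance starts a fresh probation."""
        self.first_linked.pop(domain, None)

    def on_probation(self, url, now):
        since = self.first_linked.get(urlsplit(url).hostname)
        return since is None or now - since < self.probation

    def render(self, url, now, link_during_probation=True):
        """Return HTML for the URL: a normal link after probation, otherwise
        a rel=nofollow link or, in the stricter variant, plain text."""
        safe = escape(url, quote=True)
        if not self.on_probation(url, now):
            return f'<a href="{safe}">{safe}</a>'
        if link_during_probation:
            return f'<a href="{safe}" rel="nofollow">{safe}</a>'
        return safe
```

Passing link_during_probation=False gives the stricter variant, where the URL stays plain text until the probation ends.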