The first thing I'd like to discuss is whether it is acceptable for Sunir and me to put our anti-spam ideas in place first, and check back with the community later. Obviously, I think yes in this case, because spam moves fast, and the longer we wait, the more work we have to do restoring things later. Changes will always be an OpenProcess, except for cases such as HoneyPot where the actual pagename is not publicised. -- ChrisPurcell
If nobody puts their name down, I think we should assume that tackling the problem as effectively as we can by ourselves is the consensus decision of the community. Admin tasks like this probably don't interest other MB regulars, as evidenced by past SilentAgreement on our decisions. If we want to DevolvePower, we'll need to be unilateral in it, and see who decides to join up then. (If somebody signs up here, of course, this conclusion no longer holds.) -- ChrisPurcell
Just to reiterate this: transparency != DelayAction. As I said, "changes will always be an OpenProcess". Secondly, how much more overt can one be than a freely-editable list of actively (rather than passively) interested people? Thirdly, I don't believe the situation on MB matches a "secret circle"; far from it, as we've always published exactly what we're doing and what we've done, and explicitly invited people to interact whenever we've done so. Finally, to my mind your remaining comment supports my position: if we want it to look like it's not "our project", we need to be unilateral in DevolvingPower, not sit around waiting for people to get interested in a dry discussion. That aside, if you want all decisions to be left open for a two-week period, rather than implemented straight away if the members signed up agree, then this isn't a ConsensusGroup and we shouldn't label it as such; it's a standard PeerReview system with a two-week veto. -- ChrisPurcell
Discussing things here rather than out-of-line is a smart move, and I agree. Now we only need some way of stopping you agreeing 200% with everything ;) -- ChrisPurcell
Any objections if I enable the Akismet anti-LinkSpam service for Meatball, and dump the BanList? I'm concerned with the number of people who are being blocked by our HardBans. -- SunirShah
Akismet is currently blocking a dozen or so spam attacks against the DesignBibliography a day. I'm quite impressed. Ultimately, I'd prefer if it were more informational than an ActiveDefense?, but one step at a time. -- SunirShah
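For reference, here is a rough sketch of what an Akismet comment-check call looks like. The endpoint form (API key as a subdomain) and field names follow my reading of Akismet's public API, so treat them as approximate; the real Meatball integration is Perl, and this Python is only illustrative.

  # Rough sketch of an Akismet comment-check request (illustrative Python).
  # API key, blog URL, and exact field names are assumptions, not Meatball's config.
  import urllib.request
  import urllib.parse

  def akismet_is_spam(api_key, blog_url, user_ip, user_agent, content):
      """Return True if Akismet judges the submission to be spam."""
      url = "https://%s.rest.akismet.com/1.1/comment-check" % api_key
      data = urllib.parse.urlencode({
          "blog": blog_url,            # the site being protected
          "user_ip": user_ip,          # IP address of the submitter
          "user_agent": user_agent,    # submitter's User-Agent header
          "comment_type": "comment",
          "comment_content": content,  # the text of the wiki edit
      }).encode("utf-8")
      with urllib.request.urlopen(urllib.request.Request(url, data)) as resp:
          return resp.read().strip() == b"true"   # Akismet answers "true" or "false"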
I've been reverting spam on Meatball lately. I value Meatball, so I want to help when I see this in the recent changes. Akismet is awesome. Although, I wonder, whither SharedAntiSpam? -- SamRose
Ok, I've enabled it. (Oct 18, 2006). -- SunirShah
I've removed the die. It will keep logging, but it will stop blocking the edit. -- SunirShah
Hmm. Thanks Chris for going through this. I'm not impressed with Akismet any longer. Above all else, it fails to be an OpenProcess, which makes it potentially disastrous. -- SunirShah
When blacklisting URLs, I've found in almost every case, it's better to ban the domain used, not the specific subdomain; for instance, example.com, not myspam.example.com. Subdomains are essentially free; domains require registration fees. The only exception is where the spammer has usurped a free service, such as an unmonitored wiki or upload site, to host their spam, and this free service is contained only in a specific sub-domain; here, it is sensible to only ban the compromised sub-domain. Universities are a common example of this recently. -- ChrisPurcell
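To make that concrete, a minimal sketch of the difference (illustrative Python regexes; the patterns are hypothetical, not entries from our actual BanList):

  # Banning the registered domain catches every subdomain; banning only the
  # subdomain does not. Hypothetical patterns, for illustration only.
  import re

  ban_domain    = re.compile(r"(^|\.)example\.com$")     # example.com and *.example.com
  ban_subdomain = re.compile(r"^myspam\.example\.com$")  # one free subdomain only

  for host in ["example.com", "myspam.example.com", "otherspam.example.com"]:
      print(host, bool(ban_domain.search(host)), bool(ban_subdomain.search(host)))
  # Only the domain-wide pattern also catches otherspam.example.com.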
I'm thinking of adding a couple of pages as new HoneyPots. They're fairly obvious targets: just look at [30 days of RC] and find the ones edited more than 80 times in constant spam-wars. What do people think? -- ChrisPurcell
Akismet seems like an attractive option vs. HardBan, especially given the fact that Meatball is, for whatever reason, given almost hourly attention by spammers. Although, if enough people are able to help, Chris's local HoneyPot is actually worth working on and seeing if it can be refined and improved. Knowledge about how to deal with these problems locally, without having to purchase third-party solutions, is presumably very valuable to wiki communities. I'd be willing to lend a hand helping with the HoneyPot solution, if help is needed. -- SamRose
Is there a possibility to remove an entry or get an entry removed? --MarkusLude
I'm changing my vote to a veto of the extra HoneyPots for now, since we seem to have found better solutions below. -- ChrisPurcell
It's come to my attention that the domain-blocker is case-sensitive, but URL domains aren't. I'd call that a pretty big hole in our defences; we got another wretched blogspot spam because some spammer was smart enough to work this out. I can fix this by simply making the filter case-insensitive. Do I have support? -- ChrisPurcell
Please, do it. -- SunirShah
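The fix amounts to one flag on the filter. A sketch of the hole and the repair (illustrative Python; the live filter is a Perl regexp in mb.pl):

  # URL hosts are case-insensitive, so the filter must be too.
  # The banned pattern below is a stand-in, not the real blacklist.
  import re

  banned = r"blogspot\.com"
  strict = re.compile(banned)                 # case-sensitive: misses "BlogSpot.COM"
  fixed  = re.compile(banned, re.IGNORECASE)  # case-insensitive: catches any casing

  for host in ["spam.blogspot.com", "spam.BlogSpot.COM"]:
      print(host, bool(strict.search(host)), bool(fixed.search(host)))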
Now that the OpenProxy defense and Akismet have been knocked down, our spam situation is out of control. I went digging through the source code to MeatballWiki last weekend to see if I could build yet another automated defense system. First, the obligatory statement must be made that, "Man, is the code terrible. It doesn't make any sense!" I'm really agitated by the state of the codebase. I'm also kind of bitter that I have to go back and fix the problem again. Like, holy crap, I really do not want to spend my life defending MeatballWiki from spam. At some point, it's cheaper just to shut the site down.
Anyway, I don't even know what to do, as it is clear that blocking open proxies would solve 95% of our spam problems. What was the problem with the detector? Was it just too slow, or was it blocking legitimate actors?
The other problem is that I don't want to learn the code base. I'd much rather spend time on the DesignBibliography. -- SunirShah
Later the same day. I spent some time today building an offline OpenProxy detector. It goes through the RecentChanges and scans recent contributors after the fact. This will tie into a more specialized RecentChanges later. -- SunirShah
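For the curious, a very rough sketch of how such an offline OpenProxy check can work: take each recent contributor's IP and see whether it will relay an HTTP request on a common proxy port. The ports, timeout, and probe URL here are assumptions; this is not Sunir's actual script, which is Perl.

  # Rough sketch: probe an IP on common proxy ports and see if it relays a request.
  # Ports, timeout, and the probe target are assumptions for illustration.
  import socket

  COMMON_PROXY_PORTS = [80, 3128, 8080, 8000]

  def looks_like_open_proxy(ip, timeout=5):
      """Return True if the IP relays an HTTP request on any common proxy port."""
      for port in COMMON_PROXY_PORTS:
          try:
              with socket.create_connection((ip, port), timeout=timeout) as s:
                  # Ask the remote host to proxy a request to a third-party site.
                  s.sendall(b"GET http://www.example.com/ HTTP/1.0\r\n"
                            b"Host: www.example.com\r\n\r\n")
                  reply = s.recv(64)
                  if reply.startswith(b"HTTP/") and b" 200 " in reply:
                      return True   # it happily proxied for us
          except OSError:
              continue              # closed port, timeout, connection refused, etc.
      return False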
Hi Sunir, instead of shutting it down, what about gracefully degrading this open wiki to a wiki for registered users only, at least for some time? I guess there is enough momentum among core users to continue this wiki, and as for new ones, I strongly suspect we will get enough friendly, constructive people via WikiBlogging and similar community tools. -- FridemarPache
Here is perhaps a supplementary set of actions to some of those suggested above: what if, when the spammers are recognized, they are diverted to a form that is a type of survey used to help create a psychology map? Once they fill out this form, they are then reciprocated with a blank page, separate from the main wiki, where they can deposit their spam. What if this psychology map data is then compared to geographic location data? What if some of these forms are modified over time, in order to fool a cross-section of the spamming people into revealing more about who they are, and why and how they do this?
So what is the point of creating an (incomplete) "psychology map", and of knowing the location, and any other information you can get about spammers as actual people? In my opinion, knowing their life/existential conditions, their local problems of existence, and their world views, in short knowing "why" they spam, is the key to creating the conditions of existence (online, anyway) that make spamming obsolete in general.
For instance, is the blog/guestbook/wiki spam really working for them? What if they could make more money, online, doing something that was constructive and contributive, instead of destructive? What if everyone who had to deal with spam online chipped in a couple of dollars a month to help create such a system, and created blog and wiki software modules that helped to inform spammers about the system? Perhaps also supplemented by people who go into the geo-locations of spammers and spread around information about the new way to make more money online? Maybe that is crazy, but I see spam as more of a social issue than a technical issue. Of course, in the meantime, there is still daily spam to deal with... -- SamRose
Chris is right. We couldn't pay the spammers. But I am 200% behind forwarding the spammers to a survey. I don't like treating people like insects; I think it would be valuable to see if we could learn something about them by engaging them. It couldn't hurt. Sam, if you could start brainstorming what was on your mind for the SpammerSurvey project, I'll put my shoulder into it. -- SunirShah
I also like a tangential idea of your proposal: send spammers to a TarPit?. The TarPit?, however, will not be indexed by search engines. I fully believe that spammers don't notice or care about such things as NoFollow, so this should fool them sufficiently. -- SunirShah
Resolved as the EditHash, but currently the back button is broken.
I'd be interested in seeing how spammers interact with the site. In particular, percentages of spammers that post (HTTP POST) spam directly, not having retrieved (HTTP GET) a page/form. -- JaredWilliams
If I can see when spam is being posted, and track back to see the previous actions, then yes, I would be interested. You can email me at Jared.Williams at ntlworld dot com. -- JaredWilliams
After having a quick look at it, one thing that immediately sticks out is the number of searches. There were 3,783 searches performed (/cgi-bin/mb.pl?action=search&) covering the period 3rd Oct - 26th Oct, but only a fraction, ~9%, actually seem legitimate. The other 91% appear to be spam robots searching with spam. So filtering searches with the same regexp filter the page submissions go through might save CPU time, if it's not doing that already. I wonder if the spam robots perform a search to find pages to target. -- JaredWilliams
One idea would be to return the HoneyPot in the search results if the search contains blacklisted URLs. So if the spam robot is targeting pages that way, it'll ban itself. Also, if the reason they're searching is to see if they need to spam (i.e. they don't find any pages with their spam), then that might prevent them too. -- JaredWilliams
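A sketch of both suggestions together: run the search string through the same spam regexp as page submissions, and if it trips, answer with the HoneyPot instead of real results. Function and page names are placeholders (the real HoneyPot name is deliberately unpublicised), and the real engine is Perl.

  # Sketch: reuse the edit-time spam filter on search queries, and feed matching
  # robots the HoneyPot page. All names here are placeholders for illustration.
  import re

  SPAM_FILTER = re.compile(r"\[url=|href=", re.IGNORECASE)  # stand-in for the real blacklist
  HONEYPOT_PAGE = "HoneyPot"   # hypothetical; the real page name is not publicised

  def handle_search(query, do_real_search):
      if SPAM_FILTER.search(query):
          # A robot searching with its own spam payload gets the trap page only.
          return [HONEYPOT_PAGE]
      return do_real_search(query)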
The one instance of spam posting I picked out was preceded by a search using spam, 4 hours previously, which got me thinking there may be a connection. I need to get the log into SQL so I can more easily see patterns.
It seems 12 were for a single plain URL.
Around 1,900 start with a sentence (about how good/excellent/very good the site is), and then have the spam payload.
3,153, or 83% of all searches, have the same URLs duplicated within different markup: as HTML <a href="...">...</a>, as BB Text markup [url=...]...[/url], and some also include them as plaintext. The shotgun approach to getting the spam URLs to render correctly.
Eg: <a href="hxxp://florida.quck.info/">florida</a> news [URL=hxxp://florida.quck.info/]florida[/URL] photo hxxp://florida.quck.info/ florida read hxxp://costa-rica.quck.info/ costarica doc <a href="hxxp://costa-rica.quck.info/">costarica</a> video [URL=hxxp://costa-rica.quck.info/]costarica[/URL] photo
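That duplication is itself a usable signature: legitimate text almost never carries the same URL in both HTML and BB-code markup. A heuristic sketch (illustrative Python, not part of the engine):

  # Heuristic sketch: flag a submission when the same URL appears in both
  # HTML <a href="..."> and BB-code [URL=...] form ("shotgun" spam markup).
  import re

  def shotgun_spam(text):
      html_urls = set(re.findall(r'<a\s+href="(http[^"]+)"', text, re.IGNORECASE))
      bb_urls   = set(re.findall(r'\[url=(http[^\]]+)\]', text, re.IGNORECASE))
      return bool(html_urls & bb_urls)   # same URL in two markup dialects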
My home IP is now blocked by the MB anti-spam device (I'm unfortunately a -not so- proud AOL user). I will use my office connection again as soon as possible. -- JeanChristopheCapelli
No. I disabled the OpenProxy detector, hence our explosion of spam recently. -- SunirShah
Here is a quick chart that I threw together for unique wiki spammer IP addresses per location from 10/27-11/01
If there is an easy way to get the data for a longer time period, I wouldn't mind putting together a chart that covers more than just a few days (like maybe once every quarter-year?), if the information is useful to anyone. I thought it might be useful data for SpammerSurvey, for instance. -- SamRose
This is what I am planning to build next. If the diff demonstrates you are adding an external link, the system will forward your post to a page that will ask you to enter the password. The password will be present on the same page. If you copy and paste the password properly, the post will continue onwards. This will block the robots and non-English speakers. Any objections? -- SunirShah
By the way, you can try it out on http://usemod.com/meatball2/mb.pl?action=edit&id=SandBox
The code may be a bit unstable as I just started refactoring it to use TemplateToolkit? 2. -- SunirShah
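For anyone following along, a minimal sketch of the flow described above: if the diff adds an external link, hold the post and demand the copy-and-paste secret. The secret, function names, and link test are placeholders; the live patch is Perl on mb.pl.

  # Minimal sketch of the "copy the magic secret" check for edits adding external links.
  # MAGIC_SECRET, the helper names, and the link test are placeholders for illustration.
  import re

  MAGIC_SECRET = "wiki"   # hypothetical; shown to the user on the very same page
  LINK_RE = re.compile(r"https?://", re.IGNORECASE)

  def adds_external_link(old_text, new_text):
      """True if the new revision contains more external links than the old one."""
      return len(LINK_RE.findall(new_text)) > len(LINK_RE.findall(old_text))

  def handle_post(old_text, new_text, supplied_secret, save_page, ask_for_secret):
      if adds_external_link(old_text, new_text) and supplied_secret != MAGIC_SECRET:
          return ask_for_secret()   # bounce to the page that displays the secret
      return save_page(new_text)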
Most pages on Meatball are in English or French. Maybe add a French translation of "Magic secret required" and the following sentence mentioning the secret. -- MarkusLude
The patch is now live. If someone writes a French translation, I'll put it on. I'll do the cookie thing next. -- SunirShah
Looks efficacious. Nice one! -- ChrisPurcell
I'm having trouble navigating this safeguard (I assume it's this one). Do I copy and paste only the italicized word? Can I just type it? What is the magic secret? -- CliffSmith
As a solution to the ThoughtStorms? 'bot, I suggest we insist that all future edits have a non-empty digest field. If the ThoughtStorms? editor is, as it looks, a botnet gone mad with power, that should stop it once and for all, since its author is not likely to change it to cope with MB when it's not even posting its actual payload to MB. We also get the nice side-effect of forcing regular authors to write digests in all cases. (The specific UI I'm thinking of is simply to bounce the user back to the edit page with "Please enter a short summary of your changes in the digest field" displayed.) -- ChrisPurcell
I'm partial to this, so I think we should try it and see how it feels. -- SunirShah
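A sketch of the check, with the bounce message Chris suggested (placeholder function names; the real change lives in the Perl engine):

  # Sketch: refuse any edit whose change digest (summary) is empty, and bounce
  # the author back to the edit form. Function names are placeholders.
  def accept_edit(page_text, digest, save_page, bounce_to_edit_form):
      if not digest.strip():
          return bounce_to_edit_form(
              "Please enter a short summary of your changes in the digest field.")
      return save_page(page_text, digest)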
Two things to do:
* Block edits where the page content equals the digest.
* If that doesn't work, fall back to a last-resort HumanVerification.
-- SunirShah
Maybe I am missing something here, but looking at RecentChanges, it looks like this has apparently dramatically cut down on WikiSpam. -- SamRose
Hmmm... that was a short-lived victory :(.
Block edits where page content equals digest, and if that doesn't work, a last-resort HumanVerification may definitely be worth a try. Although it seems that once they figure out the "block edits where page content equals digest" rule, they will be able to automate a way around it by having the bot enter randomized content in the digest.
Another idea that popped into my head while reading this is that it might be possible to create some type of bot of our own that reverts the changes of a spam bot like this. SoftSecurity seems to hit a threshold with spam bots. Creating our own bot and getting it to work successfully against only spam bots is of course a lot more complex than blocking edits where page content equals digest, or HumanVerification, but it gets me thinking about how bots can potentially counter other bots, anyway. -- SamRose
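A sketch of the first of those checks (illustrative only; the randomized-digest workaround Sam anticipates would indeed get past it):

  # Sketch: reject edits whose digest merely duplicates the page content,
  # which is how a bot can trivially satisfy a "non-empty digest" rule.
  def accept_edit(page_text, digest, save_page, reject):
      if digest.strip() and digest.strip() == page_text.strip():
          return reject("The digest must be a summary, not a copy of the page.")
      return save_page(page_text, digest)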
The posts log demonstrates there are no URLs in the GuestBook vandal attacks. Another interesting fact is that the edit request comes from a different IP than the POST, which has been discussed before. The UserAgent? is also only present on the POST half of the transaction, and missing on the GET. The UserAgent? is Opera/9.0 (Windows NT 5.1; U; en), which is unique to this attacker. -- SunirShah
I've implemented an EditHash to deal with the RotatingProxy attack. -- SunirShah
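For context, a minimal sketch of what an EditHash can look like: a keyed hash over details of the GET that served the edit form, verified again on the POST, so a RotatingProxy that fetches the form from one IP and posts from another fails the check. The secret, the choice of fields, and the one-hour lifetime are assumptions, not the actual implementation.

  # Minimal EditHash sketch: sign the edit form with an HMAC over the page name,
  # the client IP, and a timestamp; verify it when the edit is posted.
  # SECRET_KEY, the field choice, and the lifetime are assumptions.
  import hashlib, hmac, time

  SECRET_KEY = b"change-me"   # server-side secret, never sent to the client
  MAX_AGE = 3600              # seconds an issued edit form stays valid

  def make_edit_hash(page, ip, now=None):
      ts = int(now if now is not None else time.time())
      mac = hmac.new(SECRET_KEY, ("%s|%s|%d" % (page, ip, ts)).encode(), hashlib.sha1)
      return "%d:%s" % (ts, mac.hexdigest())   # embedded as a hidden form field

  def check_edit_hash(token, page, ip, now=None):
      try:
          ts, mac = token.split(":", 1)
          ts = int(ts)
      except ValueError:
          return False
      expected = make_edit_hash(page, ip, now=ts).split(":", 1)[1]
      fresh = (now if now is not None else time.time()) - ts <= MAX_AGE
      return fresh and hmac.compare_digest(mac, expected)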
The two new anti-spam patches seem to be remarkably efficacious against spam; checking the logs, it seems the regex filter is almost never triggered. Even when it is, the subsequent HumanVerification step could probably have been relied on to catch the spam. I suggest we now reset the filter, since several important sites (e.g. blogspot) are banned by it. (Thanks, Fridemar, for this suggestion.) Do I have consensus? -- ChrisPurcell
Okay, I've reset the spam filter, and disabled the HoneyPot to prevent it ever being compromised. If we need it, I'll re-enable it. -- ChrisPurcell
InterMapTxt is, irritatingly, a prime target for spammers because of the large quantity of links it contains: faux-spam attracts real-spam. This means those few human spammers not caught by HumanVerification are breaking the PeerReview mechanism. We could perhaps fix this problem by NOINDEXing the page, preventing spammers picking it up in standard search engines. Anyone think this is likely to work? -- ChrisPurcell
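For reference, a sketch of the mechanism: emit a robots meta tag in the HTML head of the pages we want kept out of search indexes. The meta tag itself is standard; the page set and helper name are placeholders.

  # Sketch: add a robots NOINDEX tag to the HTML head of selected pages.
  # The page set and helper name are placeholders for illustration.
  NOINDEX_PAGES = {"InterMapTxt"}

  def head_extra(page_name):
      if page_name in NOINDEX_PAGES:
          return '<meta name="robots" content="noindex">'
      return ""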