The first thing I'd like to discuss is whether it is acceptable for Sunir and me to put our anti-spam ideas in place first, and check back with the community later. Obviously, I think yes in this case, because spam moves fast, and the longer we wait, the more work we have to do restoring things later. Changes will always be an OpenProcess, except for cases such as HoneyPot where the actual pagename is not publicised. -- ChrisPurcell
If nobody puts their name down, I think we should assume that tackling the problem as effectively as we can by ourselves is the consensus decision of the community. Admin tasks like this probably don't interest other MB regulars, as evidenced by past SilentAgreement on our decisions. If we want to DevolvePower, we'll need to be unilateral in it, and see who decides to join up then. (If somebody signs up here, of course, this conclusion no longer holds.) -- ChrisPurcell
Just to reiterate this: transparency != DelayAction. As I said, "changes will always be an OpenProcess". Secondly, how much more overt can one be than a freely-editable list of actively (rather than passively) interested people? Thirdly, I don't believe the situation on MB matches a "secret circle"; far from it, as we've always published exactly what we're doing and what we've done, and explicitly invited people to interact whenever we've done so. Finally, to my mind your remaining comment supports my position: if we want it to look like it's not "our project", we need to be unilateral in DevolvingPower, not sit around waiting for people to get interested in a dry discussion. That aside, if you want all decisions to be left open for a two-week period, rather than implemented straight away if the members signed up agree, then this isn't a ConsensusGroup and we shouldn't label it as such; it's a standard PeerReview system with a two-week veto. -- ChrisPurcell
Discussing things here rather than out-of-line is a smart move, and I agree. Now we only need some way of stopping you agreeing 200% with everything ;) -- ChrisPurcell
Any objections if I enable the Akismet anti-LinkSpam service for Meatball, and dump the BanList? I'm concerned with the number of people who are being blocked by our HardBans. -- SunirShah
Akismet is currently blocking a dozen or so spam attacks against the DesignBibliography a day. I'm quite impressed. Ultimately, I'd prefer if it were more informational than an ActiveDefense?, but one step at a time. -- SunirShah
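For reference, here is a rough sketch of what an Akismet comment-check call looks like. The endpoint form (API key as a subdomain) and field names follow my reading of Akismet's public API, so treat them as approximate; the real Meatball integration is Perl, and this Python is only illustrative.

  # Rough sketch of an Akismet comment-check request (illustrative Python).
  # API key, blog URL, and exact field names are assumptions, not Meatball's config.
  import urllib.request
  import urllib.parse

  def akismet_is_spam(api_key, blog_url, user_ip, user_agent, content):
      """Return True if Akismet judges the submission to be spam."""
      url = "https://%s.rest.akismet.com/1.1/comment-check" % api_key
      data = urllib.parse.urlencode({
          "blog": blog_url,            # the site being protected
          "user_ip": user_ip,          # IP address of the submitter
          "user_agent": user_agent,    # submitter's User-Agent header
          "comment_type": "comment",
          "comment_content": content,  # the text of the wiki edit
      }).encode("utf-8")
      with urllib.request.urlopen(urllib.request.Request(url, data)) as resp:
          return resp.read().strip() == b"true"   # Akismet answers "true" or "false"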
I've been reverting spam on Meatball lately. I value Meatball, so I want to help when I see this in the recent changes. Akismet is awesome. Although, I wonder, whither SharedAntiSpam? -- SamRose
Ok, I've enabled it. (Oct 18, 2006). -- SunirShah
I've removed the die. It will keep logging, but it will stop blocking the edit. -- SunirShah
Hmm. Thanks Chris for going through this. I'm not impressed with Akismet any longer. Above all else, it fails to be an OpenProcess, which makes it potentially disastrous. -- SunirShah
When blacklisting URLs, I've found in almost every case, it's better to ban the domain used, not the specific subdomain; for instance, example.com, not myspam.example.com. Subdomains are essentially free; domains require registration fees. The only exception is where the spammer has usurped a free service, such as an unmonitored wiki or upload site, to host their spam, and this free service is contained only in a specific sub-domain; here, it is sensible to only ban the compromised sub-domain. Universities are a common example of this recently. -- ChrisPurcell
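To make that concrete, a minimal sketch of the difference (illustrative Python regexes; the patterns are hypothetical, not entries from our actual BanList):

  # Banning the registered domain catches every subdomain; banning only the
  # subdomain does not. Hypothetical patterns, for illustration only.
  import re

  ban_domain    = re.compile(r"(^|\.)example\.com$")     # example.com and *.example.com
  ban_subdomain = re.compile(r"^myspam\.example\.com$")  # one free subdomain only

  for host in ["example.com", "myspam.example.com", "otherspam.example.com"]:
      print(host, bool(ban_domain.search(host)), bool(ban_subdomain.search(host)))
  # Only the domain-wide pattern also catches otherspam.example.com.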
I'm thinking of adding a couple of pages as new HoneyPots. They're fairly obvious targets: just look at [30 days of RC] and find the ones edited more than 80 times in constant spam-wars. What do people think? -- ChrisPurcell
Akismet seems like an attractive option vs. HardBan, especially given the fact that Meatball is, for whatever reason, given almost hourly attention by spammers. Although, if enough people are able to help, Chris's local HoneyPot is actually worth working on and seeing if it can be refined and improved. Knowledge about how to deal with these problems locally, without having to purchase third-party solutions, is presumably very valuable to wiki communities. I'd be willing to lend a hand helping with the HoneyPot solution, if help is needed. -- SamRose
Is there a possibility to remove an entry or get an entry removed? --MarkusLude
I'm changing my vote to a veto of the extra HoneyPots for now, since we seem to have found better solutions below. -- ChrisPurcell
It's come to my attention that the domain-blocker is case-sensitive, but URL domains aren't. I'd call that a pretty big hole in our defences; we got another wretched blogspot spam because some spammer was smart enough to work this out. I can fix this by simply making the filter case-insensitive. Do I have support? -- ChrisPurcell
Please, do it. -- SunirShah
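The fix amounts to one flag on the filter. A sketch of the hole and the repair (illustrative Python; the live filter is a Perl regexp in mb.pl):

  # URL hosts are case-insensitive, so the filter must be too.
  # The banned pattern below is a stand-in, not the real blacklist.
  import re

  banned = r"blogspot\.com"
  strict = re.compile(banned)                 # case-sensitive: misses "BlogSpot.COM"
  fixed  = re.compile(banned, re.IGNORECASE)  # case-insensitive: catches any casing

  for host in ["spam.blogspot.com", "spam.BlogSpot.COM"]:
      print(host, bool(strict.search(host)), bool(fixed.search(host)))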
Now that the OpenProxy defense and Akismet have been knocked down, our spam situation is out of control. I went digging through the source code to MeatballWiki last weekend to see if I could build yet another automated defense system. First, the obligatory statement must be made that, "Man, is the code terrible. It doesn't make any sense!" I'm really agitated by the state of the codebase. I'm also kind of bitter that I have to go back and fix the problem again. Like, holy crap, I really do not want to spend my life defending MeatballWiki from spam. At some point, it's cheaper just to shut the site down.
Anyway, I don't even know what to do, as it is clear that blocking open proxies would solve 95% of our spam problems. What was the problem with the detector? Was it just too slow, or was it blocking legitimate actors?
The other problem is that I don't want to learn the code base. I'd much rather spend time on the DesignBibliography. -- SunirShah
Later the same day. I spent some time today building an offline OpenProxy detector. It goes through the RecentChanges and scans recent contributors after the fact. This will tie into a more specialized RecentChanges later. -- SunirShah
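For the curious, a very rough sketch of how such an offline OpenProxy check can work: take each recent contributor's IP and see whether it will relay an HTTP request on a common proxy port. The ports, timeout, and probe URL here are assumptions; this is not Sunir's actual script, which is Perl.

  # Rough sketch: probe an IP on common proxy ports and see if it relays a request.
  # Ports, timeout, and the probe target are assumptions for illustration.
  import socket

  COMMON_PROXY_PORTS = [80, 3128, 8080, 8000]

  def looks_like_open_proxy(ip, timeout=5):
      """Return True if the IP relays an HTTP request on any common proxy port."""
      for port in COMMON_PROXY_PORTS:
          try:
              with socket.create_connection((ip, port), timeout=timeout) as s:
                  # Ask the remote host to proxy a request to a third-party site.
                  s.sendall(b"GET http://www.example.com/ HTTP/1.0\r\n"
                            b"Host: www.example.com\r\n\r\n")
                  reply = s.recv(64)
                  if reply.startswith(b"HTTP/") and b" 200 " in reply:
                      return True   # it happily proxied for us
          except OSError:
              continue              # closed port, timeout, connection refused, etc.
      return False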
Hi Sunir, instead of shutting it down, what about gracefully degrading this open wiki to a wiki for registered users only, at least for some time? I guess there is enough momentum among core users to continue this wiki, and as for new ones, I strongly suspect we will get enough friendly, constructive people via WikiBlogging and similar community tools. -- FridemarPache
Here is perhaps a supplementary set of actions to some of those suggested above: what if, when the spammers are recognized, they are diverted to a form that is a type of survey used to help create a psychology map? Once they fill out this form, they are then reciprocated with a blank page, separate from the main wiki, where they can deposit their spam. What if this psychology map data is then compared to geographic location data? What if some of these forms are modified over time, in order to fool a cross-section of the spamming people into revealing more about who they are, and why and how they do this?
So what is the point of creating an (incomplete) "psychology map", and of knowing the location, and any other information you can get about spammers as actual people? In my opinion, knowing their life/existential conditions, their local problems of existence, and their world views, in short knowing "why" they spam, is the key to creating the conditions of existence (online, anyway) that make spamming obsolete in general.
For instance, is the blog/guestbook/wiki spam really working for them? What if they could make more money, online, doing something that was constructive and contributive, instead of destructive? What if everyone who had to deal with spam online chipped in a couple of dollars a month to help create such a system, and created blog and wiki software modules that helped to inform spammers about the system? Perhaps also supplemented by people who go into the geo-locations of spammers and spread around information about the new way to make more money online? Maybe that is crazy, but I see spam as more of a social issue than a technical issue. Of course, in the meantime, there is still daily spam to deal with... -- SamRose
Chris is right. We couldn't pay the spammers. But I am 200% behind forwarding the spammers to a survey. I don't like treating people like insects; I think it would be valuable to see if we could learn something about them by engaging them. It couldn't hurt. Sam, if you could start brainstorming what was on your mind for the SpammerSurvey project, I'll put my shoulder into it. -- SunirShah
I also like a tangential idea of your proposal: send spammers to a TarPit?. The TarPit?, however, will not be indexed by search engines. I fully believe that spammers don't notice or care about such things as NoFollow, so this should fool them sufficiently. -- SunirShah
Resolved as the EditHash, but currently the back button is broken.
I'd be interested in seeing how spammers interact with the site. In particular, percentages of spammers that post (HTTP POST) spam directly, not having retrieved (HTTP GET) a page/form. -- JaredWilliams
If I can see when spam is being posted, and track back to see the previous actions, then yes, I would be interested. You can email me at Jared.Williams at ntlworld dot com. -- JaredWilliams
After having a quick look at it, one thing that immediately sticks out is the number of searches. There were 3,783 searches performed (/cgi-bin/mb.pl?action=search&) covering the period 3rd Oct - 26th Oct, but only a fraction, ~9%, actually seem legitimate. The other 91% appear to be spam robots searching with spam. So filtering searches with the same regexp filter the page submissions go through might save CPU time, if it's not doing that already. I wonder if the spam robots perform a search to find pages to target. -- JaredWilliams
One idea would be to return the HoneyPot in the search results if the search contains blacklisted URLs. So if the spam robot is targeting pages that way, it'll ban itself. Also, if the reason they're searching is to see if they need to spam (i.e. they don't find any pages with their spam), then that might prevent them too. -- JaredWilliams
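A sketch of both suggestions together: run the search string through the same spam regexp as page submissions, and if it trips, answer with the HoneyPot instead of real results. Function and page names are placeholders (the real HoneyPot name is deliberately unpublicised), and the real engine is Perl.

  # Sketch: reuse the edit-time spam filter on search queries, and feed matching
  # robots the HoneyPot page. All names here are placeholders for illustration.
  import re

  SPAM_FILTER = re.compile(r"\[url=|href=", re.IGNORECASE)  # stand-in for the real blacklist
  HONEYPOT_PAGE = "HoneyPot"   # hypothetical; the real page name is not publicised

  def handle_search(query, do_real_search):
      if SPAM_FILTER.search(query):
          # A robot searching with its own spam payload gets the trap page only.
          return [HONEYPOT_PAGE]
      return do_real_search(query)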
The one instance of spam posting I picked out was preceded by a search using spam, 4 hours previously, which got me thinking there may be a connection. I need to get the log into SQL so I can more easily see patterns.
It seems 12 were for a single plain URL.
Around 1,900 start with a sentence (about how good/excellent/very good the site is), and then have the spam payload.
3,153, or 83% of all searches, have the same URLs duplicated within different markup: as HTML <a href="...">...</a>, as BB Text markup [url=...]...[/url], and some also include them as plaintext. The shotgun approach to getting the spam URLs to render correctly.
Eg: <a href="hxxp://florida.quck.info/">florida</a> news [URL=hxxp://florida.quck.info/]florida[/URL] photo hxxp://florida.quck.info/ florida read hxxp://costa-rica.quck.info/ costarica doc <a href="hxxp://costa-rica.quck.info/">costarica</a> video [URL=hxxp://costa-rica.quck.info/]costarica[/URL] photo
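That duplication is itself a usable signature: legitimate text almost never carries the same URL in both HTML and BB-code markup. A heuristic sketch (illustrative Python, not part of the engine):

  # Heuristic sketch: flag a submission when the same URL appears in both
  # HTML <a href="..."> and BB-code [URL=...] form ("shotgun" spam markup).
  import re

  def shotgun_spam(text):
      html_urls = set(re.findall(r'<a\s+href="(http[^"]+)"', text, re.IGNORECASE))
      bb_urls   = set(re.findall(r'\[url=(http[^\]]+)\]', text, re.IGNORECASE))
      return bool(html_urls & bb_urls)   # same URL in two markup dialects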
My home IP is now blocked by the MB anti-spam device (I'm unfortunately a -not so- proud AOL user). I will use my office connection again as soon as possible. -- JeanChristopheCapelli
No. I disabled the OpenProxy detector, hence our explosion of spam recently. -- SunirShah
Here is a quick chart that I threw together for unique wiki spammer IP addresses per location from 10/27-11/01
If there is an easy way to get the data for a longer time period, I wouldn't mind putting together a chart that covers more than just a few days (like maybe once every quarter-year?), if the information is useful to anyone. I thought it might be useful data for SpammerSurvey, for instance. -- SamRose
This is what I am planning to build next. If the diff demonstrates you are adding an external link, the system will forward your post to a page that will ask you to enter the password. The password will be present on the same page. If you copy and paste the password properly, the post will continue onwards. This will block the robots and non-English speakers. Any objections? -- SunirShah
By the way, you can try it out on http://usemod.com/meatball2/mb.pl?action=edit&id=SandBox
The code may be a bit unstable as I just started refactoring it to use TemplateToolkit? 2. -- SunirShah
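For anyone following along, a minimal sketch of the flow described above: if the diff adds an external link, hold the post and demand the copy-and-paste secret. The secret, function names, and link test are placeholders; the live patch is Perl on mb.pl.

  # Minimal sketch of the "copy the magic secret" check for edits adding external links.
  # MAGIC_SECRET, the helper names, and the link test are placeholders for illustration.
  import re

  MAGIC_SECRET = "wiki"   # hypothetical; shown to the user on the very same page
  LINK_RE = re.compile(r"https?://", re.IGNORECASE)

  def adds_external_link(old_text, new_text):
      """True if the new revision contains more external links than the old one."""
      return len(LINK_RE.findall(new_text)) > len(LINK_RE.findall(old_text))

  def handle_post(old_text, new_text, supplied_secret, save_page, ask_for_secret):
      if adds_external_link(old_text, new_text) and supplied_secret != MAGIC_SECRET:
          return ask_for_secret()   # bounce to the page that displays the secret
      return save_page(new_text)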
Most pages on Meatball are in English or French. Maybe add a French translation of "Magic secret required" and the following sentence mentioning the secret. -- MarkusLude
The patch is now live. If someone writes a French translation, I'll put it on. I'll do the cookie thing next. -- SunirShah
Looks efficacious. Nice one! -- ChrisPurcell
I'm having trouble navigating this safeguard (I assume it's this one). Do I copy and paste only the italicized word? Can I just type it? What is the magic secret? -- CliffSmith
As a solution to the ThoughtStorms? 'bot, I suggest we insist that all future edits have a non-empty digest field. If the ThoughtStorms? editor is, as it looks, a botnet gone mad with power, that should stop it once and for all, since its author is not likely to change it to cope with MB when it's not even posting its actual payload to MB. We also get the nice side-effect of forcing regular authors to write digests in all cases. (The specific UI I'm thinking of is simply to bounce the user back to the edit page with "Please enter a short summary of your changes in the digest field" displayed.) -- ChrisPurcell
I'm partial to this, so I think we should try it and see how it feels. -- SunirShah
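A sketch of the check, with the bounce message Chris suggested (placeholder function names; the real change lives in the Perl engine):

  # Sketch: refuse any edit whose change digest (summary) is empty, and bounce
  # the author back to the edit form. Function names are placeholders.
  def accept_edit(page_text, digest, save_page, bounce_to_edit_form):
      if not digest.strip():
          return bounce_to_edit_form(
              "Please enter a short summary of your changes in the digest field.")
      return save_page(page_text, digest)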
Two things to do:
* Block edits where the page content equals the digest.
* If that doesn't work, fall back to a last-resort HumanVerification.
-- SunirShah
Maybe I am missing something here, but looking at RecentChanges, it looks like this has apparently dramatically cut down on WikiSpam. -- SamRose
Hmmm... that was a short-lived victory :(.
Block edits where page content equals digest, and if that doesn't work, a last-resort HumanVerification may definitely be worth a try. Although it seems that once they figure out the "block edits where page content equals digest" rule, they will be able to automate a way around it by having the bot enter randomized content in the digest.
Another idea that popped into my head while reading this is that it might be possible to create some type of bot of our own that reverts the changes of a spam bot like this. SoftSecurity seems to hit a threshold with spam bots. Creating our own bot and getting it to work successfully against only spam bots is of course a lot more complex than blocking edits where page content equals digest, or HumanVerification, but it gets me thinking about how bots can potentially counter other bots, anyway. -- SamRose
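A sketch of the first of those checks (illustrative only; the randomized-digest workaround Sam anticipates would indeed get past it):

  # Sketch: reject edits whose digest merely duplicates the page content,
  # which is how a bot can trivially satisfy a "non-empty digest" rule.
  def accept_edit(page_text, digest, save_page, reject):
      if digest.strip() and digest.strip() == page_text.strip():
          return reject("The digest must be a summary, not a copy of the page.")
      return save_page(page_text, digest)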
The posts log demonstrates there are no URLs in the GuestBook vandal attacks. Another interesting fact is that the edit request comes from a different IP than the POST, which has been discussed before. The UserAgent? is also only present on the POST half of the transaction, and missing on the GET. The UserAgent? is Opera/9.0 (Windows NT 5.1; U; en), which is unique to this attacker. -- SunirShah
I've implemented an EditHash to deal with the RotatingProxy attack. -- SunirShah
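For context, a minimal sketch of what an EditHash can look like: a keyed hash over details of the GET that served the edit form, verified again on the POST, so a RotatingProxy that fetches the form from one IP and posts from another fails the check. The secret, the choice of fields, and the one-hour lifetime are assumptions, not the actual implementation.

  # Minimal EditHash sketch: sign the edit form with an HMAC over the page name,
  # the client IP, and a timestamp; verify it when the edit is posted.
  # SECRET_KEY, the field choice, and the lifetime are assumptions.
  import hashlib, hmac, time

  SECRET_KEY = b"change-me"   # server-side secret, never sent to the client
  MAX_AGE = 3600              # seconds an issued edit form stays valid

  def make_edit_hash(page, ip, now=None):
      ts = int(now if now is not None else time.time())
      mac = hmac.new(SECRET_KEY, ("%s|%s|%d" % (page, ip, ts)).encode(), hashlib.sha1)
      return "%d:%s" % (ts, mac.hexdigest())   # embedded as a hidden form field

  def check_edit_hash(token, page, ip, now=None):
      try:
          ts, mac = token.split(":", 1)
          ts = int(ts)
      except ValueError:
          return False
      expected = make_edit_hash(page, ip, now=ts).split(":", 1)[1]
      fresh = (now if now is not None else time.time()) - ts <= MAX_AGE
      return fresh and hmac.compare_digest(mac, expected)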
The two new anti-spam patches seem to be remarkably efficacious against spam; checking the logs, it seems the regex filter is almost never triggered. Even when it is, the subsequent HumanVerification step could probably have been relied on to catch the spam. I suggest we now reset the filter, since several important sites (e.g. blogspot) are banned by it. (Thanks, Fridemar, for this suggestion.) Do I have consensus? -- ChrisPurcell
Okay, I've reset the spam filter, and disabled the HoneyPot to prevent it ever being compromised. If we need it, I'll re-enable it. -- ChrisPurcell
InterMapTxt is, irritatingly, a prime target for spammers because of the large quantity of links it contains: faux-spam attracts real-spam. This means those few human spammers not caught by HumanVerification are breaking the PeerReview mechanism. We could perhaps fix this problem by NOINDEXing the page, preventing spammers picking it up in standard search engines. Anyone think this is likely to work? -- ChrisPurcell
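For reference, a sketch of the mechanism: emit a robots meta tag in the HTML head of the pages we want kept out of search indexes. The meta tag itself is standard; the page set and helper name are placeholders.

  # Sketch: add a robots NOINDEX tag to the HTML head of selected pages.
  # The page set and helper name are placeholders for illustration.
  NOINDEX_PAGES = {"InterMapTxt"}

  def head_extra(page_name):
      if page_name in NOINDEX_PAGES:
          return '<meta name="robots" content="noindex">'
      return ""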