As a result, the readability of the spam is not important, nor is it really important to camouflage the spam as real content. The goal is not to hit a particular wiki where TheAudience is still active, but rather to find GhostTowns where no one will bother to clean up the spam.
Thus, the basic strategy is to search Google for 'Edit text of this page' (cf. EditMask), paste in as many ExternalLinks as possible, and repeat this on as many pages as possible (like a ForestFire). One could easily write a robot to do this, although most spammers still do it manually.
However, this form of spam is very noticeable, akin to the loud blast of a shotgun and the wide, visible area of damage it leaves. As a problem type, this is very good news, since it should be easy to create automated defenses against this type of spam. Simply detect that a large number of ExternalLinks have been added. Even easier, spammers will hit multiple pages in a row with the same ExternalLink in a short duration, making this behaviour strongly detectable. Simply keep track of which links have been posted by whom and to what pages, and see if they have pasted the same link to multiple pages. Alternatively, LinkThrottling can restrain someone from posting a great many links to the wiki without making it impossible for legitimate contributors to post links at all.
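A minimal sketch of the 'same link on many pages' check described above, in the same spirit as the Perl filter further down; the subroutine name, the in-memory %recent_links store, and the thresholds are illustrative assumptions (a real CGI wiki would need to persist the counts between requests, e.g. in a dbm file):

    # Track which pages each (IP, URL) pair has recently touched and flag
    # visitors who drop the same link on many pages in a short window.
    # All names and thresholds here are illustrative.
    use strict;
    use warnings;

    my %recent_links;            # "ip url" => { page name => time last seen }
    my $MAX_PAGES_PER_LINK = 3;  # distinct pages one URL may hit per window
    my $WINDOW = 60 * 60;        # window length in seconds

    # Returns true if this save adds a URL that the same visitor has
    # already posted to several other pages within the window.
    sub looks_like_shotgun_spam {
        my ($ip, $page, $text, $now) = @_;
        $now ||= time();
        for my $url ($text =~ /(https?:\S+)/g) {
            my $seen = $recent_links{"$ip $url"} ||= {};
            $seen->{$page} = $now;
            # Forget sightings that have aged out of the window.
            delete $seen->{$_} for grep { $now - $seen->{$_} > $WINDOW } keys %$seen;
            return 1 if keys(%$seen) > $MAX_PAGES_PER_LINK;
        }
        return 0;
    }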
ShotgunSpam can benefit from PeerReview through RecentLinks.
Contrast SemanticSpam which is more insidious.
The current filter on MeatballWiki to deal with ShotgunSpam is the following:
    # Poor man's spam filter from MeatBall:ShotgunSpam
    my %http;
    $http{$1}++ while( $old =~ /(http:\S+)/sg );
    $http{$1}-- while( $string =~ /(http:\S+)/sg );
    my $diff = 0;
    for (keys %http) {
      $diff++ if $http{$_} < 0;
    }
    if( $diff > 5 ) {
      &AppendStringToFile( "$DataDir/spamlist", "$ENV{'REMOTE_ADDR'}\n" );
      &ReleaseLock();
      &ReBrowsePage($id, "", 1);
      return;
    }
It prevents three URLs from being added or removed at once. Originally it just did a simple count to see whether three more URLs had been added, but that allowed spammers to replace a page with spam while making it impossible for anyone to revert the replacement. This technique still allows spammers to drop up to three links at a time on a page, and it prevents people from pasting in entire essays or papers that have a lot of references. I also don't like the way it masks the fact that it has rejected your save, but UseModWiki is getting harder to hack over time. It also doesn't deal with wide-area spam attacks (a more effective technique than blasting only one page). So far it blocks about two to three spam attacks a day. -- SunirShah
Judging from the logs, the filter is working exceptionally well against SpamBots. It has blocked around 1800 spam edits by around 130 unique IPs on usemod.com since I installed it a couple of months ago. -- SunirShah
Are you aware of RichardP and his invaluable WikiMinion? (http://www.nooranch.com/synaesmedia/wiki/wiki.cgi?WikiMinion) -- PhilJones
Yes. My goal is to integrate such AntiSpamBots with the PeerToPeerBanList in order to make the cost of maintenance O(lg(N)) (?) rather than O(N). But ultimately, such solutions are like trying to move a pile of sand one grain at a time. What's needed is a solution that changes the force dynamic to make spam no longer economically valuable (or possible). Of course, as long as there are abandoned wikis that are being heavily spammed, this will not stop. -- SunirShah
At which line, or roughly where, is the code inserted? DanKoehl
In sub DoPost after this blob:
    # Consider extracting lock section into sub, and eval-wrap it?
    # (A few called routines can die, leaving locks.)
    &OpenPage($id);
    &OpenDefaultText();
    $old = $Text{'text'};
    $oldrev = $Section{'revision'};
    $pgtime = $Section{'ts'};

    # Poor man's spam filter ...
Thanks. DanKoehl
I am starting to hate this patch. It's blocked Meatball users around 10 times, in return for around 50 other unique attempts. With a roughly 15% false positive rate, it's not so hot. On the other hand, it's prevented about 200 pages worth of spam. -- SunirShah
If spammers do a Google search for 'Edit text of this page', that gives me the idea to change 'Edit text of this page' to something personalized, so that the spammers' searches don't find my wiki. Good idea? --mutante
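A hedged sketch of that idea for a UseModWiki config file, assuming the version in use routes interface strings through its T()/%Translate mechanism (if the config file does not expose %Translate, editing the literal string in wiki.pl works just as well):

    # Replace the well-known footer text with wording spammers are
    # unlikely to search for. The replacement phrase is arbitrary.
    %Translate = (
      'Edit text of this page' => 'Revise this page',
    );

Note that this only removes the search handle that Google indexes; it does nothing against bots that already know the wiki's URL or that probe for the edit form directly.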
There appears to be some software making the rounds that looks for unused links (WikiNames that don't point to pages) and sticks the same text (often containing just one link) into them 10 or 20 times with successive edit-and-saves. I have now been hit from Poland, Russia, Romania, Mexico... If they are trying to overload the history, this doesn't make much sense, as there was no history to begin with. Also, I now realize that the Poor Man's Spam Filter above may not help, as there is often only one link per page. However, they are hitting the same pages a good few times in the space of a few minutes, so a rate test might help... Comments? --pm
I'm surprised it took only a few weeks before spammers began circumventing the ShotgunSpam filter. Clearly, if concentrations of links in a single edit are denied, the workaround is a lot of small edits. I thought I had a few months. The best medium-term solution is something like CitizenArrest, but that feels awfully like giving every citizen a handgun, a solution that does not work effectively on a large scale. Another alternative is to count the number of URL changes a person makes over a period of time; if they exceed a limit, roll back all their changes. That would prevent people like myself from writing long essays on the wiki, though. And, of course, any form of authorization through authentication is vulnerable to IdentityTheft (LoginsAreEvil), so that won't help in the long term, but it may in the short term. -- SunirShah
My Perl is pretty primitive - could some kind soul combine Sunir's suggestion (counting the number of URL changes a person makes over a period of time) with testing for the presence of URLs? My wiki got hit today with two lines of text containing 2 URLs affecting one unused WikiPage, saved 78 times in the space of about 2 minutes. Even Sunir at his most prolific wouldn't show that pattern! By the way, when I revert (until this patch becomes available), is it better to delete the offending page, or to change it to something innocuous with a summary saying something like "reverting spam"? --pm
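A rough sketch of the rate test asked for above: count how many URLs a single IP address has added anywhere on the wiki within the last hour, and refuse the save once a limit is passed. The subroutine names, thresholds, and the flat-file log (something like "$DataDir/urlrate" in a UseModWiki installation) are illustrative assumptions, not part of UseModWiki; rolling back the IP's earlier edits, as Sunir suggests, is left out.

    use strict;
    use warnings;

    my $URL_LIMIT = 10;       # URLs one IP may add per window
    my $WINDOW    = 60 * 60;  # window length in seconds (one hour)

    # Count URLs that appear in the new text but not in the old text.
    sub count_new_urls {
        my ($old, $new) = @_;
        my %http;
        $http{$1}++ while ($old =~ /(https?:\S+)/sg);
        $http{$1}-- while ($new =~ /(https?:\S+)/sg);
        return scalar grep { $http{$_} < 0 } keys %http;
    }

    # Append this edit to a per-IP log, then return true if the IP has
    # added more than $URL_LIMIT URLs inside the window.
    sub over_url_rate_limit {
        my ($logfile, $ip, $old, $new, $now) = @_;
        $now ||= time();
        my $added = count_new_urls($old, $new);
        open my $fh, '>>', $logfile or die "cannot append to $logfile: $!";
        print $fh "$now $ip $added\n";
        close $fh;

        my $total = 0;
        open $fh, '<', $logfile or die "cannot read $logfile: $!";
        while (<$fh>) {
            my ($when, $who, $n) = split ' ';
            $total += $n if $who eq $ip && $now - $when <= $WINDOW;
        }
        close $fh;
        return $total > $URL_LIMIT;
    }

In DoPost this would be called next to the existing filter, with $ENV{'REMOTE_ADDR'}, $old and $string as the arguments; it would also catch the 78-saves-in-two-minutes pattern above, provided each save actually adds URL occurrences rather than resaving identical text.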
The vandalism that deleted EarleMartin's homepage pointed out a weakness with the filter: restoring a previous version of a page that happens to contain a lot of links is painfully slow with the filter in place. One obvious fix is to change the filter to compare the body text to a previous version, and not engage the filter if the text matches. One weakness to this fix is that it could allow spammers to simply restore their spam with impunity; on the other hand, if the shotgun filter doesn't allow the link-heavy spam in the first place, there's nothing to revert. --ChuckAdams
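One way to sketch that fix, assuming the wiki engine can hand the filter the texts of its kept revisions (UseModWiki stores them under the keep/ directory; the @old_revisions list here stands in for however they are retrieved):

    use strict;
    use warnings;

    # Return true if the submitted text is byte-for-byte identical to some
    # earlier revision, i.e. the save is a straight restoration.
    sub is_revert_to_old_revision {
        my ($new_text, @old_revisions) = @_;
        for my $old (@old_revisions) {
            return 1 if $new_text eq $old;
        }
        return 0;
    }

    # In the spam filter, only run the URL-count check when the save is
    # not a restoration of an earlier version:
    #   unless (is_revert_to_old_revision($string, @kept_texts)) { ... }

As noted above, the trade-off is that a spammer whose link-heavy edit got through once could then restore it freely.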
The fact that spammers can get around it in its current form is not a reason to leave the filter unfixed -- in fact, if it's useless against spammers, then it's only useful against vandals. StableCopy is a big redesign of the dynamics of wiki (and it's patent hubris to call it The Answer to spam/vandalism), whereas I am making a suggestion for a tweak. --ChuckAdams
I have considered hacking usemod to backend it with a SQL db such as SQLite, but every time I go into the wiki codebase, I'm overcome by the urge to rewrite it. I went down that winding road for a couple months last time before I came to my senses. --ChuckAdams
For SQLite, perhaps not, since it has fairly poor locking for the usage patterns of a wiki. With a real database like PostgreSQL, hell yes. There is all kinds of code dealing with locks and consistency that any decent DBMS will do for you. PageIndex becomes a trivial operation, and with an index on last-edited, RecentChanges as well -- currently these scan through every single file on the wiki, which is just insane. This is a solved problem though, and solving it again for an existing codebase just isn't all that challenging or interesting, as anyone wanting a DB backend has already chosen a wiki engine that provides one. --ChuckAdams
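A sketch of what that buys, assuming a hypothetical pages table in an SQL backend driven from Perl's DBI (the schema, connection string, and column names are illustrative, not part of any existing UseModWiki port):

    use strict;
    use warnings;
    use DBI;

    # Connection details are placeholders for a local PostgreSQL database.
    my $dbh = DBI->connect('dbi:Pg:dbname=wiki', 'wiki', '', { RaiseError => 1 });

    # One row per page, with an index on the edit timestamp.
    $dbh->do(q{
        CREATE TABLE IF NOT EXISTS pages (
            name        TEXT PRIMARY KEY,
            text        TEXT NOT NULL,
            last_edited TIMESTAMP NOT NULL
        )
    });
    $dbh->do('CREATE INDEX IF NOT EXISTS pages_last_edited ON pages (last_edited)');

    # PageIndex: a single query instead of a scan over every page file.
    my $all_pages = $dbh->selectcol_arrayref('SELECT name FROM pages ORDER BY name');

    # RecentChanges: the index on last_edited answers this directly.
    my $recent = $dbh->selectall_arrayref(
        q{SELECT name, last_edited FROM pages
          WHERE last_edited > now() - interval '3 days'
          ORDER BY last_edited DESC});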