ContentFilter

Rather than HardBanning IPs, which is a fruitless endeavour since IPs are not a good measure of PersonalIdentification (as described on NetworkDistance), it's better to ControlYourself. As long as we have ReversibleChange, the spammers cannot hurt the content of the site. The only violence they can do is to the motivation of the site's community: it's tiresome to continuously undo spam changes.

But we only really want to undo changes Wiki:OnceAndOnlyOnce, and preferably do some PreemptiveModeration before spam is HijackingRecentChanges. If we realize that a wiki also includes the script, we can control the information that comes onto the wiki by controlling the script. Further, if you conceive of spam as a world-wide ForestFire, the solution seems simple: can the worms.

Therefore, rather than just reverting spam, refactor it into a ContentFilter that can be used to block the spam preemptively. That is, create a list of banned keywords that the script will test all posts against. While it may be too late for your site, it will help other sites. By creating a PeerToPeerBanList, it is possible to greatly lower the cost of maintaining spam filters. Alternatively, use a HoneyPot to detect spammers automatically with zero effort.
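As a rough illustration of testing every post against a list of banned keywords, here is a minimal sketch in Python. The pattern list, the function names and the `.example` domains are all hypothetical; a real wiki script would hook a check like this into its own save routine and load the list from wherever the community maintains it.

```python
import re

# Hypothetical ban list; in practice this would be community-maintained
# and possibly merged from a shared PeerToPeerBanList.
BANNED_PATTERNS = [
    r"\bcheap-pills\.example\b",
    r"\bcasino-bonus\.example\b",
]

def compile_filter(patterns):
    """Combine all banned patterns into one case-insensitive regex.

    Returns None for an empty list, because an empty alternation
    would otherwise match every post.
    """
    if not patterns:
        return None
    return re.compile("|".join(f"(?:{p})" for p in patterns), re.IGNORECASE)

def is_blocked(text, content_filter):
    """Return True if the submitted text matches any banned pattern."""
    return content_filter is not None and content_filter.search(text) is not None

def merge_ban_lists(local, remote):
    """Union two ban lists, keeping the order and dropping duplicates."""
    return list(dict.fromkeys(local + remote))

content_filter = compile_filter(BANNED_PATTERNS)
```

The script would call `is_blocked(new_text, content_filter)` before saving an edit and reject the post when it returns True. Merging a peer's list is just a union here; deciding which peers to trust is the hard part and is not addressed by this sketch.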

But trying to filter out each and every phrasing is ridiculous. For LinkSpam, this is competing against the relatively low cost of creating new domains. ContentFilters will have to be continuously updated in order to stay relevant. While the PeerToPeerBanList will help react faster, it will not discourage spam, since it will always be cheaper to simply overwhelm the capacity of the humans who create the ContentFilter. Particularly problematic is that content filters are routinely abused for political or crass commercial reasons. Maintaining the integrity of an automatic self-censorship tool is crucial, yet very expensive, probably more expensive than reverting the spam itself.

To think of this another way, it is like trying to move a pile of sand with tweezers, one grain at a time. Because each filter entry targets only a single site, the effort of responding grows at best as O(N) in the number of links being spammed, even though the energy spent per link may drop. The balance of cost still lies in the spammer's favour (cf. MotivationEnergyAndCommunity).

For SemanticSpam, if you go there, this inevitably inconveniences normal discourse, as normal discussions routinely come up against banned keywords. A good alternative is to use a BayesianFilter, which is really another type of ContentFilter.
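To give a feel for how a BayesianFilter differs from a fixed keyword list, here is a minimal naive Bayes sketch in Python. It is not taken from any particular wiki engine; the class name, the tokenizer and the add-one smoothing are assumptions chosen to keep the example short.

```python
import math
import re
from collections import Counter

def tokenize(text):
    """Lowercase the text and split it into alphanumeric tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())

class NaiveBayesFilter:
    """Toy two-class naive Bayes classifier (labels: 'spam' and 'ham')."""

    def __init__(self):
        self.word_counts = {"spam": Counter(), "ham": Counter()}
        self.doc_counts = {"spam": 0, "ham": 0}

    def train(self, text, label):
        """Record the tokens of one example post under the given label."""
        self.word_counts[label].update(tokenize(text))
        self.doc_counts[label] += 1

    def _log_score(self, tokens, label, vocab_size):
        counts = self.word_counts[label]
        total_words = sum(counts.values())
        total_docs = sum(self.doc_counts.values())
        # Smoothed log prior, so an untrained class does not give log(0).
        score = math.log((self.doc_counts[label] + 1) / (total_docs + 2))
        for tok in tokens:
            # Add-one smoothing keeps unseen words from zeroing the score.
            score += math.log((counts[tok] + 1) / (total_words + vocab_size))
        return score

    def is_spam(self, text):
        """Return True if 'spam' scores strictly higher than 'ham'."""
        tokens = tokenize(text)
        vocab = set(self.word_counts["spam"]) | set(self.word_counts["ham"])
        vocab_size = max(len(vocab), 1)
        spam = self._log_score(tokens, "spam", vocab_size)
        ham = self._log_score(tokens, "ham", vocab_size)
        return spam > ham
```

Instead of banning a word outright, the filter weighs every word by how often it has appeared in reverted spam versus kept edits, so a normal discussion that happens to mention one suspicious word is less likely to be rejected. It still has to be trained, which means someone has to label the reverts.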

Caveat: it is critical to use very tight filtering. Rather than match sex.com, you need to match \bsex\.com\b lest you match middlesex.com.
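The difference between the loose and the tight pattern can be checked with Python's `re` module. The helper `tight_pattern` is a hypothetical convenience for building the word-bounded form from a literal domain; it assumes the domain starts and ends with a letter or digit.

```python
import re

# Loose: the unescaped dot matches any character and there are no
# word boundaries, so it matches inside longer names.
loose = re.compile(r"sex.com")

# Tight: escaped dot plus word boundaries on both sides.
tight = re.compile(r"\bsex\.com\b")

def tight_pattern(domain):
    """Build a word-bounded, fully escaped pattern for a literal domain."""
    return re.compile(r"\b" + re.escape(domain) + r"\b")

innocent = "http://middlesex.com/"
spammy = "http://www.sex.com/"

loose_hits_innocent = loose.search(innocent) is not None   # True: false positive
tight_hits_innocent = tight.search(innocent) is not None   # False
tight_hits_spammy = tight.search(spammy) is not None       # True
```

The `\b` before `sex` fails in `middlesex.com` because `e` and `s` are both word characters, so there is no boundary between them. In `www.sex.com` the preceding dot is not a word character, so the boundary is there and the pattern matches.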

CategorySpam
