Abstract
Wiki spam has begun to cripple the use of public wikis. The original WikiWikiWeb has thrown its 'shields up' in part due to spammers. The problem of defending against wiki spam while maintaining the traditional wiki value of openness is a daunting one, yet it goes to the very heart of what the Internet and the world are. MeatballWiki has been leading the discussion of how to defend sensibly against wiki spam. This workshop is intended as both a report on progress and a facilitated time to come together as a Greater Wiki Community to devise better solutions for this problem.
Long description
One of the most important values in wiki culture is openness. However, as wikis become better known in society, they are catching the attention of spammers who wish to exploit this openness. To make matters worse, the wiki information structure turns out to be optimal within Google's trademarked PageRank algorithm. Now that dead wikis number in the hundreds of thousands, spammers often don't pay attention to whether or not a wiki is active before hitting it. Active wikis are therefore frequently occupied with keeping spam at bay, a nuisance that puts many off wikis altogether and even challenges our value of openness.
MeatballWiki has been the centre of an ongoing discussion of how to deal with wiki spam [1]. MeatballWiki is also the home of the concept of soft security [2], which is the canonical articulation of how wikis defend themselves with social means. This workshop will begin with a very brief overview of the best theoretical description of wiki spam (i.e. an economic, ecological crime of opportunity), an outline of known techniques, and a catalogue of implemented or theorized defenses. An open discussion will then be facilitated to brainstorm either new solutions or new ways to encourage adoption.
Ideal defensive strategies will preserve the openness of wikidom while greatly reducing the time and effort put into defense by members of the wiki. The best strategies will also defend against other nuisance problems such as vandalism, trolls, sociopaths, and edit wars.
Bio
Sunir Shah is the Founder and Editor of MeatballWiki, the pre-eminent centre for community-wide wiki development on the Internet since April 2000. Shah is also the original articulator of the concept of soft security, and has made several important contributions to the defense of wiki culture in the face of new attacks. For the past two years, he has been facilitating the ongoing discussion around wiki spam on MeatballWiki as well as contributing important theoretical concepts and technical solutions, such as the peer-to-peer ban list and citizen arrest.
This text is more up-to-date on the page CategorySpam. It is preserved for historical archival reasons related to WikiSym 2005.
Basic definition. When we speak of spam, we usually refer to one of two types: SemanticSpam that encourages us to buy something; and LinkSpam that takes advantage of Google's PageRank algorithm. As we primarily fight LinkSpam on web-based SocialSoftware like wikis, we will primarily talk about LinkSpam here, although some techniques will apply against SemanticSpam as well. LinkSpam is the most common type of spam since it is more profitable to rise higher in the Google rankings and get thousands of potential readers than it is to get the dozens of readers on a wiki.
Methods. For the most part, LinkSpam is done manually, as labour is cheap in the spammier parts of the world. Some of the most sophisticated spammers use robots, often custom tailored to their target, but these people are few since the cost/benefit ratio is high. Most spammers will use an OpenProxy or an AnonymousProxy to avoid HardBans of their IP addresses. Some have been known to use ZombieMachines, exploiting a security flaw in Windows Remote Desktop.
Dimensions of analysis. Spam and our responses must be analyzed in terms of MotivationEnergyAndCommunity. Spam is primarily motivated by economic factors, whereas community is primarily motivated by less tangible, soft, emotional factors. Solutions pit the energy spammers are willing to expend against the energy produced by community goodwill. Mitigating factors often boil down to TechnologySolutions. In the HardSecurity manner, we can build better shields and better weapons; in the SoftSecurity manner, we develop better abilities to dodge, absorb, and deflect spam.
Communication. Because spam is not an attempt at communication, attempting to communicate with a spammer using words or ideas will fail. Spammers are merely interested in the act of posting links. Consequently, the only way information will transfer from us to a spammer is through actions. Think of it this way: at the point where a conflict has degraded to a fist fight, words are often useless. You must first create physical distance. The same goes for spammers, except they are not in a temporary foul mood, but in business, which means they will not go away unless the cost to them greatly increases or the benefit disappears.
Essential problem. More traditional methods of increasing their costs, like jail, are impractical short of banning most of the world from using the Internet (which is already happening). This mostly increases costs on their neighbours, who will hopefully take local action to control the problem. However, since China has a Great Firewall strategy, this is doubtful. Internet-centric ways include: downgrading their rankings in the SearchEngines; increasing their labour cost to the point they find cheaper ways of exploiting PageRank; and developing more efficient SearchEngineOptimization methods that do not depend on harassing others on the Internet. (Search relevance is really Google's problem.)
Because you control the underlying WikiEngine, you can control what content is posted. A theoretically ideal ContentFilter blocks all 'bad' content whilst leaving 'good' content unfettered, but this is impossible as the range of possible content (good or bad) is both infinite and undecidable--you need people to make decisions. Therefore, content filtering becomes a game to identify new 'bad' content as quickly as possible, as well as finding simple patterns that can be scalably exploited to block a wide range of content. In terms of the energy arms race, this is one-for-one in effort with the attacker.
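The pattern-matching side of a ContentFilter can be sketched roughly as follows. This is only a minimal illustration in Python, not the filter of any particular WikiEngine: the patterns, the `.example` domains, and the function names `find_banned_links` and `accept_edit` are all hypothetical, and a real list would need the human review and appeal process discussed elsewhere on this page.

```python
import re

# Hypothetical blocklist of URL patterns (assumption: not taken from any real
# ban list). A real wiki would maintain, share, and audit its own patterns.
BANNED_PATTERNS = [
    r"cheap-pills\.example",
    r"casino[-\w]*\.example",
]

# Rough URL matcher: stops at whitespace and at common wiki link delimiters.
_URL_RE = re.compile(r"https?://[^\s\]\)>]+", re.IGNORECASE)
_BANNED_RE = [re.compile(p, re.IGNORECASE) for p in BANNED_PATTERNS]


def find_banned_links(text):
    """Return the URLs in `text` that match any banned pattern."""
    hits = []
    for url in _URL_RE.findall(text):
        if any(rx.search(url) for rx in _BANNED_RE):
            hits.append(url)
    return hits


def accept_edit(text):
    """Accept the edit only if it contains no banned link."""
    return not find_banned_links(text)
```

One pattern such as `casino[-\w]*\.example` covers a whole family of hostnames, which is the "simple patterns that can be scalably exploited" idea; every new family still needs a person to notice it and add a pattern, which is why the effort stays roughly one-for-one with the attacker.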
Whereas cities defend borders against neighbours, on the Internet you defend ports against IP addresses. Defending your server against network attacks seems to be par for the course on the Internet. In terms of the energy arms race, aside from manually blocking IP addresses, this is an advantage in effort over the attacker.
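As a rough illustration of defending against IP addresses, a ban check might look like the sketch below. It uses Python's standard `ipaddress` module; the ban list, the function name `is_banned`, and the choice to refuse malformed addresses are assumptions made for the example, and the addresses come from ranges reserved for documentation.

```python
import ipaddress

# Hypothetical ban list (assumption). Both entries use address ranges that are
# reserved for documentation, so they match no real host.
BANNED_NETWORKS = [
    ipaddress.ip_network("192.0.2.0/24"),      # a whole range, e.g. a proxy farm
    ipaddress.ip_network("198.51.100.17/32"),  # a single address
]


def is_banned(remote_addr):
    """Return True if `remote_addr` falls inside any banned network."""
    try:
        addr = ipaddress.ip_address(remote_addr)
    except ValueError:
        # Assumption for this sketch: refuse requests whose address is malformed.
        return True
    return any(addr in net for net in BANNED_NETWORKS)
```

Banning a network rather than a single address is where the advantage in effort comes from: one entry can cover every address a spammer might rotate through, at the cost of possibly catching innocent neighbours.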
The most SoftSecurity approach is to eliminate any intrinsic interest the spammer has in attacking. The best way is to stop SearchEngines finding or valuing their links.
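One way to stop search engines valuing posted links, offered here only as an illustration and not as the approach this page settles on, is to mark every external link with the `rel="nofollow"` attribute when rendering a page. The sketch below assumes a hypothetical renderer function, `render_external_links`, that receives plain text with bare URLs; a real WikiEngine has its own link syntax and rendering pipeline.

```python
import html
import re

# Rough matcher for bare URLs in plain text (assumption: no wiki link syntax).
_URL_RE = re.compile(r'https?://[^\s<>"]+')


def render_external_links(text):
    """Escape `text` for HTML and turn bare URLs into rel="nofollow" links."""
    parts = []
    last = 0
    for match in _URL_RE.finditer(text):
        # Escape the ordinary text that precedes this URL.
        parts.append(html.escape(text[last:match.start()]))
        url = html.escape(match.group(0), quote=True)
        parts.append('<a href="%s" rel="nofollow">%s</a>' % (url, url))
        last = match.end()
    # Escape whatever follows the final URL.
    parts.append(html.escape(text[last:]))
    return "".join(parts)
```

The link still works for human readers, so nothing changes for the community; what changes is that a search engine honouring the attribute gives the spammer nothing for having posted it.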
Often the best solution is to empower the good guys. A strong CommunitySolution is more resilient, adaptive, and fair than any algorithm.
Some people, particularly the fine folks at http://www.chongqed.org, would like to take a more proactive stance towards spam. While this strategy may be mildly worrisome for those who remember how spammers stalked, harassed, and threatened the maintainers of the email RealTimeBlackholeLists in the 1990s, there are things that we can do that do not require putting our necks on the line.
You can also give up the basic wiki principle of open editing by all and concede that some jerks will spoil the fun for everybody. Strategies to adapt exist on a gradient, fortunately, so you can strike a happy medium.
I need help cleaning up, triaging, and organizing the material (aka total mess) on WikiSpam and CategorySpam and http://chongqed.org for presentation in the workshop. My goal is not to talk for 3 hours, but I do want to be able to summarize what's known and not known, and have some sort of plan for how we are going to use the 3 hours. Please please lend a hand. -- SunirShah
Immensely. Thank you so much! I'm sorry I haven't been more interactive. I have one more deliverable to do for OISE...er, and a wiki book review, and then I'm tearing into this! (I suck.) -- SunirShah
I'm 100% aware of chongqed's value. However, half the content on Chongqed refers to what's been written here on MeatballWiki, and I also have to push forward the SharedAntiSpam initiative which is centred here. That's why I'm using both as my reference materials. There's a lot of give and take in the wiki community. This should be celebrated. -- SunirShah
Halz, whenever Manni decides to open the wiki, please inform him that Chongqed:TarPit is incorrect. Typically, the reason why spammers don't have IPs that have reverse lookups is because they are using an OpenProxy. If you ban open proxies, then you block these spammers. Also, their view that Chongqed:ContentBanning has no negatives is unfortunately false. History has shown repeatedly that ContentFilters are almost always abused to censor political enemies. The problem we are having with the SharedAntiSpam initiative is developing a protocol to decide what is trustworthy and what isn't, and to appeal mistakes or malfeasance. There is a reason I put an AuditTrail into the PeerToPeerBanList. -- SunirShah
Typically a workshop means many people contributing. This means at least that participants should talk about how they perceive spam, what measures they take, how well the measures are working and what they would like to see. With ProWiki systems, there is the option that users have to save their preferences (have a cookie, maybe a username) and this keeps off almost all spammers. While this may keep off some inexperienced wiki newcomers too, I don't see this as a problem and won't add further technological measures that potentially add negative effects (like false hits or mb's delays). But this is just my personal preference. If some clever scheme turns up as a standard, I would of course support and implement it. Now, knowing that I'll be in San Diego on the 15th, I'd like to be invited to the workshop. As far as I know one can only participate based on an invitation. -- HelmutLeitner
I am actively fighting spam and am quite interested in this topic. I wish I could attend to get some more ideas about Anti Spam Bot strategies and algorithms. Unfortunately, the $450 fee to attend is prohibitive. I can be found on several wikis, among them [Chongqed.org] (see also [Anti Spam Bots on Chongqed]) and [NV|U wiki]. If there is anything I can do to assist, drop a reply [here] as I don't get back here too often.
sessions
Mentioned f2f re: SharedAntiSpam