EditThrottling


A special type of SurgeProtector. EditThrottling attempts to minimise the damage to a site or community from a flood of posts. Originally designed to stop automated spam 'bots flooding noisy data into the community, EditThrottling can also be used to protect the community from a psychotic (or precocious) human poster, especially trolls (see WhatIsaTroll).

An edit throttle can target individual IPs, cookies, individual pages, PageClusters, or site-wide activity. If just one IP is involved, the simplest approach is to limit the rate of posts that IP can make. If many IPs are involved over the whole site (e.g. a spam 'bot with a rotating IP number), all posts to the whole site will have to be throttled, perhaps with a ShieldsUp strategy. If an individual page is involved, but many IPs are at work (e.g. during a flame-war), the throttle can blow away the entire page (see PeriPeri:StabiliserThrottle).
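
To make the simplest case concrete, a per-IP throttle just keeps recent edit timestamps for each IP and refuses a save once a window limit is exceeded. A minimal sketch in Perl (the names and limits are invented for illustration, not any engine's actual code):

 use strict;
 my %edits_by_ip;           # IP => list of recent edit timestamps
 my $WINDOW    = 10 * 60;   # ten-minute window (assumed value)
 my $MAX_EDITS = 10;        # assumed per-IP limit within the window

 sub throttle_ok {
     my ($ip) = @_;
     my $now = time();
     # Drop timestamps that have fallen out of the window.
     my @recent = grep { $now - $_ < $WINDOW } @{ $edits_by_ip{$ip} || [] };
     return 0 if @recent >= $MAX_EDITS;   # over the limit: reject the save
     push @recent, $now;
     $edits_by_ip{$ip} = \@recent;
     return 1;                            # under the limit: allow the save
 }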

EditThrottling is there to help a site's community. The casual reader can be protected from spam by a StableCopy/LayeredWikiInterface scheme, but this assumes the community has the endurance to revert spam. By limiting the amount of damage that can be caused, a throttle allows a community to keep pace. A CommunitySolution alone doesn't work against 'bots or psychotics -- they are just too fast for the people in the community to keep up. However, a throttle alone will not suffice without a CommunitySolution.

A wiki with a per-IP edit throttle will demand community efforts to do the mass changes to BackLinks that, on other sites, one bored individual can perform. Changing the name of a page linked from 200 other pages will need to be done with co-ordination; similarly for spontaneously generating a large category of existing pages. Such a throttle will also limit the cleaning up that one individual can do in the wake of a rotating-IP attack. BarnRaising may become much more important. In contrast, a site-wide throttle does not engender such co-operation, and forces mass changes to happen slowly, if at all.

Throttling can force a large community to use tools and techniques they might otherwise ignore. Throttling the number of changes any individual cluster (including RecentChanges) can show will force the use of more clusters. Per-IP throttles demand calculated use of co-operation. Page throttles force mediation, surrender, or a rapid ForestFire, depending on the energy of the conflict. (All of this is hypothesis, to be verified or refuted by experimentation.)

LinkThrottling is a special case to deal with ShotgunSpam.

See: MetaWikiPedia:Edit_throttling; TrollDetectionFormula.

(The TrollDetectionFormula should be refactored into EditThrottling.)

As far as I know, there are only two people who can explain WikiWiki's edit throttle, and I'm not going to explain how it works and how it fails while we are still using it. -- SunirShah

CategorySoftSecurity

Related

An analogous mechanism has been [proposed by Matthew Williamson] to slow the spread of viruses in computer networks.

Solutions for spam seem to be applicable to protecting wikis from attacks. Thinking about how to protect against large-scale bot attacks, I can think of two other strategies (both of which were proposed for blocking spam):

-- BayleShanks

There were recent reports on SlashDot of using free pr0n sites to compromise Captchas. The Captcha image was relayed to an access-control page on a free pr0n site, where people happily solved it to gain access to the image library. While it was unclear whether this exploit was ever actually implemented, it certainly sounds plausible.

Note

Note that multiple PCs may look like a single IP (e.g. the PCs used by a school class) and may falsely qualify for throttling. The simplest solution is to add the cookie-ID to the IP and use this combined identifier instead.
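
A minimal sketch of that combined identifier (the helper name is hypothetical, not actual wiki code):

 # Throttle on IP plus cookie-ID so several PCs behind one IP
 # are not lumped together; fall back to the bare IP when no
 # cookie is presented.
 sub throttle_key {
     my ($ip, $cookie_id) = @_;
     return defined $cookie_id ? "$ip/$cookie_id" : $ip;
 }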

However, many spammers - and regular users - will have cookies turned off, and doing so is much easier than rotating IPs.

But this doesn't improve the situation for the spammer. I also doubt that Meatball regulars turn off their cookies, because this would stop preferences from working.

If the cookie mechanism is turned off, the cookie-ID will be new each time.

No, there will never be a cookie passed back to the server program. -- HelmutLeitner

Local code

MeatballWiki and UseModWiki employ new code that allows the wiki operator to add statements like the following (either in the script or the config file):

&AddRateLimit(10*60, 10, 'minutes', 200, 50, 80, 20);

The parameters are:

If the rate is exceeded, the page is not saved, and the user is returned to the editing form with an error message at the top. (The message says which limit for which time period was exceeded. It also dumps a set of statistics which may be useful for debugging.)

For MeatballWiki, we are using a conservative set of limits that should not unduly limit anyone, but which would greatly lessen the damage done by spam scripts.

Spam cleanup and regular editing should not be affected unless multiple hosts/IPs are spamming. (In that case the spammer may reach the different pages "all users" limit, preventing any edits to non-spammed pages. In rare cases the total edit limit could be reached. Cleanup should work because it will not be a "different" page.) The code ignores rate limits for editor/admin users, which may be useful for UseModWiki and most other wiki sites.

Discussion

A surge protector may cause a false alarm for volunteer refactorings. Folks who change 20 or 30 pages at a time are always problematic, but some do it day after day, and their changes usually seem to be improvements.

It's not really an improvement since it disrupts the practice of community building and CommunityLore. If someone makes 30 changes a day, then there is no chance in hell the community can PeerReview that much PageChurn. Have you ever worked with a precocious hacker on an XP project? Someone who generates ridiculous amounts of (good) code compared to the rest of the group? As one of those types of people, I can tell you it is counter-productive as it dissolves the team spirit.

But anyway, yes, I am concerned basic housekeeping like the construction of categories would be broken, but then again, it may result in requests for more co-operation as it would take many hands to accomplish something like that (BarnRaising). That would also mean such things would have to be negotiated rather than just done. This violates the 'Zen' of the site, but I have started to believe Zen is for people who really don't like other people. (e.g. Wiki:ZenSlap) -- SunirShah

If a person is incapable of regulating his or her own time on the site, a SurgeProtector will balance them out with the rest of the community. Thus, it will prevent a massive number of edits, thereby allowing each person's contributions to be outnumbered by TheCollective's ability to PeerReview. The consequence would be that everyone else's interaction with the site would also be time-limited, and thus it would be a reduction in everyone's freedoms for the sake of having any freedoms at all. -- SunirShah

Not recommended during SeedPosting phase?

I have a strategy whereby the SurgeProtector would by default be off, but it could be activated immediately if needed, say by creating the page ShieldsUp (*). The community could use the veto system to disable it, using DelayAction to wait through two weeks of inactivity, say by using PageDeletion to delete the page. So, while an attacker could just enable it in his or her first move, it would be self-defeating. (*) This is akin to ZWiki:ShieldsUp. -- SunirShah


I think it's time we started building an edit throttle for Meatball, since reverting spam and vandalism is boring. My spike is simple:

I chose 1.5 as the base because 2 grows too quickly, I think.

This will interact badly with a spate of category edits, for instance, but that could be a BarnRaisingNomination rather than a flurry by one actor. It will interact badly for people who save and edit the same page over and over again, like Helmut and myself, but there is Preview and I suppose I should get into the habit of using it. While I could fold edits to the same page, I don't want to, because I want EditWars to also be throttled.

An alternate solution is to throw an increasing number of error 404s up, as TomCoates [suggests].

Any comments, or can I just code this? From here, we can build better surge protectors, like my NetworkDistance model perhaps. -- SunirShah

Maybe activate the EditThrottling only for username-less users?

I wouldn't mind changing my working habits to save less frequently (although this may create serious problems with edit collisions). On the other hand, KeptPages would work more storage-efficiently if it archived only when the author/IP changes. -- HelmutLeitner

I like the idea. You can use an exponential backoff to achieve a similar effect without needing to keep a list of timestamps. Keep a single timestamp and "elevation" for each IP. For each new page save, set elevation = elevation * e^(-(time()-timestamp)/1024) + 1, or somesuch, then use your exponential sleep function.
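
A sketch of that decaying-elevation bookkeeping (the 1024 comes from the "or somesuch" formula above; everything else is illustrative):

 my %state;   # IP => [ last timestamp, elevation ]

 sub elevation_on_save {
     my ($ip) = @_;
     my $now = time();
     my ($then, $elev) = @{ $state{$ip} || [ $now, 0 ] };
     $elev = $elev * exp( -($now - $then) / 1024 ) + 1;   # decay, then bump
     $state{$ip} = [ $now, $elev ];
     return $elev;   # feed this into the exponential sleep function
 }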

Yeah, KeptPages is still a problem. I keep waffling over whether the KeptPages problem is sufficient warrant to move to OddMuse before having figured out whatever formatting nits we'll temporarily suffer. -- anon.

From what I have gathered so far, the KeptPages problem is a function of a memory shortage. I also noted Helmut's frustration that his work in editing a page was lost because of this. Can someone confirm this and provide a few specifics about the type of memory? Is this a relatively minor expense? If so, I'm prepared to donate enough to at least let us focus on the formatting "nits", since those are a larger problem for me. -- HansWobbe

re: UserName-less. That would work for the short term, but not for the long term, and it would still let EditWars happen. I'd rather use a cookie-based model for well-known users. But we could do that for the short term at any rate. It is a spike, after all.

re: KeptPages. Folding stores is not necessarily good. Consider the practice of keeping an OnlineDiary. Scott, we could just move to meatballsociety.org. Breaking the links is only a matter of leaving a redirect script, and the lowered PageRank might be a good thing for a while. -- SunirShah

The scheme I now prefer to use with ProWiki is: create a revision if (1) the author changes or (2) the last revision by the same author is more than 3 hours back or (3) the change creates a size difference of more than 1KB. This filters out small changes by the same author, makes backups regularly, and avoids accidentally losing more than a large paragraph. -- HelmutLeitner
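
That rule is compact enough to sketch directly (the field names are hypothetical, not ProWiki's actual code):

 sub keep_revision {
     my ($last, $new) = @_;   # hashrefs: { author, time, size }
     return 1 if $last->{author} ne $new->{author};          # (1) author changed
     return 1 if $new->{time} - $last->{time} > 3 * 3600;    # (2) more than 3 hours since last revision
     return 1 if abs($new->{size} - $last->{size}) > 1024;   # (3) size difference over 1KB
     return 0;   # otherwise fold the change into the current revision
 }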

[Aside] Hans, it has to do with stringent server limitations the ISP uses to protect itself against misbehaved CGI scripts. Sunir, I just tested meatballsociety and 256MB is ok but it craps out at 512MB. Oh well. ;) Maybe tomorrow I'll have the time to move us over (sticking with UseMod for the time being). Speak now or forever hold your peace. -- anon.

November 9, 2004. I've installed a tentative edit throttle that uses the algorithm "sleep for floor(e^(number of edits in the past hour / 4)) seconds". If the script would enforce a delay of more than 30 seconds, you will be automatically banned. This will probably not work well, so mailto:sunir@sunir.org if you want to be unbanned. -- SunirShah
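
For reference, a sketch of that algorithm as described (the surrounding bookkeeping is assumed, not the deployed script):

 use POSIX qw(floor);

 # Returns true if the client should be banned outright; otherwise
 # sleeps for the computed delay and returns false.
 sub throttle {
     my ($edits_in_past_hour) = @_;
     my $delay = floor( exp($edits_in_past_hour / 4) );
     return 1 if $delay > 30;    # a >30 second delay means an automatic ban
     sleep $delay if $delay > 0;
     return 0;
 }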

Next steps:


To crack down on spam, I should really be throttling external URLs. And not just volume / second, but really volume / IP (over a long period of time). The edit throttle as it stands only means that spammers will have to build niceness into their spambots, so that instead of shotgunning spam onto a site (and thus being visible), they sprinkle "rain" over a wide number of wikis, hoping that some will be lost in the noise of RecentChanges. If we can make external linking more obvious, it will be easier to revert them. Plus, devices like DayPop can be used, but in reverse. While DayPop tries to keep spam links out of their listings, we can use them conversely to detect spam. MetaWiki could be employed to this end. -- SunirShah

Maybe the kind of changes made could be taken into account. Dropping a whole lot of links but no English text on a page, from an IP address which hasn't been active otherwise, is a dead giveaway. Maybe someone could create a filter which would allow only a limited number of links per edit, in a good link/text ratio? Just my two cents. -- AnonymousContributor
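
A sketch of such a ratio filter (the thresholds are made up for illustration):

 sub looks_like_link_spam {
     my ($text) = @_;
     my $links = () = $text =~ m{https?://\S+}g;   # count external URLs in the edit
     return 0 if $links <= 2;                      # a few links are fine
     return $links > length($text) / 100;          # over 1 link per 100 characters: suspicious
 }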


Cookies could be used to allow regulars to moderate during a heavily-throttled spam attack, which would otherwise (assuming a worst-case attack) block them out. Registration grants a unique cookie; anyone can post cookie-less, but will be aggregated for the purposes of throttling. Exponential-cost throttling then blocks 'bots from: posting anonymously; posting under a given cookie; registering for a new cookie. Rotating-IP attacks cannot bypass this. The throttle for established users could be made generous on the assumption that a hacker cannot hijack their accounts. -- ChrisPurcell
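
A sketch of the tiering (the tiers and limits are invented for illustration):

 my %LIMITS = (
     registered => 60,   # edits per hour for an established cookie
     anonymous  => 10,   # shared by all cookie-less posters together
     register   =>  5,   # new registrations per hour, site-wide
 );

 # Map a request to its throttle limit and its bucket key: every
 # cookie-less poster lands in the one shared 'anonymous' bucket.
 sub limit_for {
     my ($cookie) = @_;
     return defined $cookie ? ( $LIMITS{registered}, "cookie:$cookie" )
                            : ( $LIMITS{anonymous},  'anonymous'      );
 }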

Indeed; this is what I am planning to write this Christmas. PasswordlessLogin will be the way to validate the cookie (hence no 'logins'). -- SunirShah

I have been playing around a bit with Certificates of late as part of an ongoing interest in Trust and Authentication. Last week, I was very pleasantly surprised to find that it was almost a trivial exercise to be granted a Certificate by Datafix, which is itself recognized by Verisign. I mention this only because it was so very easy; I am wondering if there are reasons that this common mechanism is not one we are considering. If the cookies alternative is the one that emerges, I would like to ask whether these could easily be 'redirected' somehow. This thought comes up as a result of Helmut's post earlier today about how easy it is to access sites from various other locations. I travel enough that this is always a consideration for me, with the result that I now take a USB drive with me at all times. I have found this the easiest way to personalize any computer that I happen to have access to while I am 'out and about'. If cookies or certificates could be read from such a device, it would effectively become a physical access 'token' that I could use, in conjunction with a logical password, to have 'recognized' access from wherever I might happen to be. -- HansWobbe

