EditHash


Motivation

In HTTP, it is permitted to make changes to the server-side model with a single request. For instance, you can POST a form or (in theory) PUT a file to the server, and the server can respond by saving the form (say, as a comment) or file. If a UserAgent? can accomplish this by sending only the POST or PUT request and no others (such as logging in beforehand), this is a one-request attack.

In many MachineInterfaces, this could be considered useful. A client service on another website may want to update a central database that is made available through a web service. On the other hand, it allows spammers to easily overwhelm many OnlineCommunities' comment systems and editable wiki pages. A spammer can simply write a bot to ShotgunSpam a target site, making POST after POST. Worse, the bot can easily be made generic: it only needs to load the posting form once, fill every editable text field with LinkSpam, and then POST over and over again.

One common but incorrect solution to the problem is to require UserAuthentication?, such as a login. However, a popular variation on this attack scenario is CrossSiteRequestForgery?. In this attack, an already authenticated user of the target site visits the attacker's site, which tricks the user into submitting a form (e.g. to submit a comment on a blog). The form is actually sent directly to the target site, where the user's UserAgent? unwittingly provides the user's authentication along with the form. The LinkSpam payload is carried in hidden fields of the form.

Another common but incorrect solution is to track a user across a transaction by IP address alone. Unfortunately, nothing guarantees that a user acting in good faith will keep a consistent IP address across the entire session. For instance, as AOL [answers] the question "Can I use the IP address of the request to track a member's access to my site?", a single user may appear to come from multiple proxies even during one session. While AOL has agreed to WikiMedia's [request] to add an X-Forwarded-For header to uniquely identify a given user, not all ISPs offer this nicety.

Solution

The correct solution is to make changes to the site require more than one stage:

  1. A GET to create a session and acquire a valid transaction ID.
  2. The POST (or PUT) within that session that carries the payload along with the transaction ID.

The transaction ID is a cryptographically strong number (i.e. one that cannot be reproduced by an attacker) that the server can use to associate a change request with the session created by the GET request. It can either be a NumberUsedOnce or a SaltedHash?. A SaltedHash? has the benefit that it doesn't require any server-side storage, but it may not be possible if there is no hashable information in the transaction.
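
A minimal sketch of the NumberUsedOnce approach, in the spirit of the BibWiki code further down. The %nonces hash and the IssueNonce and CheckNonce names are placeholders for whatever storage and handlers the site already has:

use Digest::MD5 qw(md5_hex);

# Hypothetical in-memory store; a real site would use its database or session files.
my %nonces;

# On GET: mint a transaction ID and remember it.
sub IssueNonce
{
    # rand()/time()/$$ is only a placeholder; use a real CSPRNG in production.
    my $nonce = md5_hex(rand() . time() . $$);
    $nonces{$nonce} = time();
    return $nonce;   # embed this in a hidden form field
}

# On POST: accept the change only if the nonce was issued, then burn it.
sub CheckNonce
{
    my ($nonce) = @_;
    return 0 unless defined $nonce && exists $nonces{$nonce};
    delete $nonces{$nonce};   # a NumberUsedOnce may only be used once
    return 1;
}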

For instance, consider editing a wiki page. When you generate the EditPage?, the form should contain a hidden field that contains an EditHash. A good hash would be h( page_id, current_revision, authors_ip_address, secret_salt ), where

h
The cryptographically strong HashFunction? (e.g. SHA1, MD5)

page_id
The name of the page. This prevents an attacker from acquiring an EditHash for one page and reusing it on multiple pages.

current_revision
The current page revision. This prevents an attacker from reusing an EditHash on the same page over time. Always use the newest revision of the page. (This will also catch EditConflicts.)

authors_ip_address
The author's current IP address. This prevents a RotatingProxy attack.

secret_salt
Some key phrase known only on the server. This prevents the attacker who knows the EditHash algorithm from generating the hash themselves.

When the edit page is POSTed, the hash will be POSTed along with it. Compare this EditHash against what the EditHash would be for the current version of the page. As a side effect, this will detect EditConflicts, which may be very useful. (The standard approach of including the revision separately can distinguish an EditConflict from an EditHash failure, if desired.) More on that later.
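
As a minimal sketch of both halves of the exchange, assuming a hypothetical CurrentRevision helper that returns the newest revision number; the BibWiki implementation further down is a fuller version that also copes with the back button:

use Digest::MD5 qw(md5_hex);

my $SecretSalt = 'change-me';   # known only to the server

# The hash ties together the page, its newest revision and the author's IP.
sub MakeEditHash
{
    my ($page_id, $revision, $ip) = @_;
    return md5_hex(join('/', $page_id, $revision, $ip, $SecretSalt));
}

# On GET: put MakeEditHash($page_id, CurrentRevision($page_id), $ENV{'REMOTE_ADDR'})
# into a hidden field of the edit form.

# On POST: recompute against the newest revision and compare.
sub CheckEditHash
{
    my ($page_id, $posted_hash) = @_;
    # CurrentRevision() is a hypothetical helper returning the newest revision number.
    my $expected = MakeEditHash($page_id, CurrentRevision($page_id), $ENV{'REMOTE_ADDR'});
    return $posted_hash eq $expected;   # failure: spam, a stale hash, or an EditConflict
}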

Defense benefits

This strategy makes attacks such as the one-request attack, CrossSiteRequestForgery?, and the RotatingProxy attack more difficult.

Further, it creates time in an attacker's workflow to intervene and block or deflect the attack. For instance, between the GET and the POST, you can scan the attacker's machine to see whether it is an OpenProxy.

Drawbacks

First, there is nothing preventing an attacker from making a request to GET the hash and then POSTing, so all this does is slow down a direct attack from a spammer. That sounds like nothing, but it is not: most spammers use OpenProxy networks they don't control, such as TheOnionRouter?, so every request comes from a different IP, and they have no choice in the matter. Tying the hash to the IP address removes a major set of resources from the attackers' hands and dramatically increases their costs.

This solution also fails for users whose IP addresses change mid-session. One way to soften the impact is to present the failure as an EditConflict and let the user resubmit, as described in the next section.

EditConflict as an error response

Because the EditHash serves as a method to detect EditConflicts, it is easy to always return an EditConflict error response to the user, even when the cause of the failure is a spam attack. This has the secondary benefit that spammers will have a harder time detecting that their spam algorithm has failed. Instead, it will simply look like the site is very busy, and they will hopefully not think to adapt. For added points, ShowSpamSucceeding? by previewing the submitted content on the page as well. In this way the LinkSpam appears, and the robots will not flag an error.

One obvious flaw with this plan is that when there is no real conflict, there are no actual conflicting differences to show. Presumably you can simply show the differences between the current revision's text and the submitted text as you normally would, and it will look superficially reasonable, if a bit mysterious what the conflict was. From a HumaneInterface point of view this will be absurd to a user, but hopefully not too absurd, and hopefully very rarely encountered. How often will a legitimate user trip the EditHash when it isn't a legitimate EditConflict?

Another flaw is the cost of preparing an EditConflict response instead of returning a cheap 403 error. Server CPU time and memory are not free, especially if you run on a shared server.
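
As a sketch of that response path, reusing the CheckEditHash sketch above and assuming hypothetical SavePage, PageText, RedirectToPage and RenderEditConflict helpers standing in for whatever the wiki engine already provides:

# Hypothetical POST handler fragment.
sub HandleEditPost
{
    my ($page_id, $posted_hash, $submitted_text) = @_;

    if (CheckEditHash($page_id, $posted_hash)) {
        SavePage($page_id, $submitted_text);
        return RedirectToPage($page_id);
    }

    # Spam, a stale hash, or a genuine EditConflict all get the same answer:
    # the EditConflict screen, previewing the submitted text so that robots
    # see their LinkSpam apparently succeed instead of a cheap 403.
    return RenderEditConflict($page_id, PageText($page_id), $submitted_text);
}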

How to avoid breaking the back button

One has to be very careful that the EditHash does not break the back button. A user often hits the back button to edit a page once again after seeing how it appears when saved. Because the EditHash encodes the current revision, and saving increases the revision, going back uses a hash value that is no longer valid. The simple solution is to check a POSTed EditHash against the EditHash of the most recent revision whose author is not the current POSTer, and all subsequent revisions (i.e. all those just made by the current POSTer); the BibWiki implementation below takes this approach.

CategorySpam


BibWiki implementation

The current implementation of the algorithm in BibWiki is as follows. The salt (cf. SaltedHash?) has been changed for obvious reasons. This algorithm depends on the fact that revision numbers grow by 1 per page with each edit, unlike, for example, the current MeatballWiki script, where there is only a global revision number.

use Digest::MD5 qw(md5_hex);

sub EditHash
{
    my ($id) = @_;
    my @revisions = Monkey::Page::Revisions($id);
    my $revision = 0;

    # Avoid breaking the back button by lumping together changes by the
    # same IP: hash against the revision just after the most recent edit
    # made from a different IP than the latest one.
    my @differentIP = grep { $_->{ip} ne $revisions[$#revisions]->{ip} } @revisions;
    $revision = $differentIP[$#differentIP]->{revision} + 1 if @differentIP;

    return md5_hex(
          Monkey::Page::CanonicalizeID($id)
        . $revision
        . $ENV{'REMOTE_ADDR'}
        . 'salt'
    );
}

Variation for forking wikis

If a wiki allows old revisions of a page to be edited and forked, you can prevent a spammer from reusing a single EditHash many times by adding a random salt to each EditHash.

h( page_id, current_revision, authors_ip_address, random_salt, secret_salt )

To avoid having to store the random salt generated by each GET, the value can be added, unencrypted, to the form. (There is no security risk in the spammer knowing this random salt.) When a POST is successful, store the random salt along with the page. When a subsequent POST comes in, check its random salt against all forks of the revision being edited, and abort if it matches any of them.

This prevents a spammer from creating hundreds of forks of the same revision of a page without at least doing a corresponding GET for each one.
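
A sketch of this variation, assuming hypothetical SaltsForForks and StoreForkSalt storage helpers; the random salt travels unencrypted in a hidden field next to the EditHash and is only useful in combination with the secret salt:

use Digest::MD5 qw(md5_hex);

# The random salt travels unencrypted in a hidden field next to the EditHash.
sub ForkingEditHash
{
    my ($page_id, $revision, $ip, $random_salt) = @_;
    return md5_hex(join('/', $page_id, $revision, $ip, $random_salt, 'secret_salt'));
}

# On POST: verify the hash, then refuse the edit if this random salt has
# already been used by any existing fork of the revision being edited.
sub CheckForkingPost
{
    my ($page_id, $revision, $posted_hash, $random_salt) = @_;

    my $expected = ForkingEditHash($page_id, $revision, $ENV{'REMOTE_ADDR'}, $random_salt);
    return 0 unless $posted_hash eq $expected;

    # SaltsForForks and StoreForkSalt are hypothetical storage helpers.
    return 0 if grep { $_ eq $random_salt } SaltsForForks($page_id, $revision);
    StoreForkSalt($page_id, $revision, $random_salt);
    return 1;
}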

I would note that it's the combination of blocking rotating proxies and HumanVerification that seems to stop spammers, not the uniqueness of the EditHash. Still, YMMV. -- ChrisPurcell

Discussion
