RecentChangesCache


If RecentChanges is generated on the fly, it is very expensive to compute. Yet it is the most frequently hit page on a typical day, so computing it for every request can drag down the server. At least for UseMod and its descendants, there are many RecentChangesOptions, but in practice most people presumably use roughly the same set of options, and RecentChangesJunkies in particular will use the same options over and over again. Add to this the RichSiteSummary feed, which web ChangeAggregators often poll once an hour or even once every half hour, even if no one is reading it. The RSS feed is even more expensive to compute.

However, the site changes less often than people request RecentChanges, which means the same output is computed many times over. It therefore seems reasonable to cache it.

If we can form a canonical stamp for the RecentChangesOptions (which is definitely possible), we can cache the bulk of the RecentChanges output (everything except timestamps, UserName-specific data, etc.) in a file wiki_root/RecentChanges/canonical_stamp. When RecentChanges is requested and that file exists, we just return it, perhaps decorated with user-specific and time-specific information. Whenever we amend the RecentChanges log, we clear out this directory.

Similarly, the RSS feed can be cached in wiki_root/RSS/canonical_stamp.
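The scheme can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the option names in CACHEABLE_OPTIONS and the helpers canonical_stamp, cached_page, and clear_cache are hypothetical, not taken from UseMod or any other engine. The stamp keeps only the options that change the shared body, sorts them, and URL-encodes them, so the same options given in a different order map to the same file.

```python
import os
import shutil
import tempfile
from urllib.parse import urlencode

# Hypothetical option names that change the shared body of the page.
# Per-user or per-request data (UserName, current time) is deliberately left out,
# so it can be decorated onto the cached body afterwards.
CACHEABLE_OPTIONS = ("days", "minor", "all", "newtop")


def canonical_stamp(options):
    """Stable file name for a set of options: known keys only, sorted, URL-encoded."""
    relevant = {k: str(options[k]) for k in CACHEABLE_OPTIONS if k in options}
    return urlencode(sorted(relevant.items())) or "default"


def cached_page(wiki_root, kind, options, render):
    """Return wiki_root/kind/<stamp>, calling render(options) and storing it on a miss."""
    directory = os.path.join(wiki_root, kind)  # kind is e.g. "RecentChanges" or "RSS"
    path = os.path.join(directory, canonical_stamp(options))
    try:
        with open(path, encoding="utf-8") as f:
            return f.read()  # cache hit: no rendering needed
    except FileNotFoundError:
        pass
    body = render(options)
    os.makedirs(directory, exist_ok=True)
    # Write to a temporary file and rename it, so a concurrent reader never sees half a page.
    fd, tmp = tempfile.mkstemp(dir=directory)
    with os.fdopen(fd, "w", encoding="utf-8") as f:
        f.write(body)
    os.replace(tmp, path)
    return body


def clear_cache(wiki_root, kind):
    """Call whenever the RecentChanges log is amended: every cached variant is now stale."""
    shutil.rmtree(os.path.join(wiki_root, kind), ignore_errors=True)
```

The caller would wrap the returned body with the user-specific and time-specific decoration, so two users with the same options share one cached file.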

The logic could be slightly more clever by differentiating between MinorEdits and major ones. For instance, the canonical_stamp of every view that includes minor edits might have a prefix minor/ (vs. nothing for views that show only major edits), so that all such views are in a subdirectory. When a major edit is made, both directories are cleared, but a minor edit would only clear the minor/ subdirectory, since it cannot change a view that hides minor edits. Alternatively, if the stamp records minor=1 for views that include minor edits, you can simply delete using a more specific pattern for minor changes, rm *minor=1*, vs. the general pattern for major ones, rm *.
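A minimal Python sketch of the subdirectory variant follows, assuming a hypothetical layout in which views that include minor edits are stored under kind/minor/ and all other views directly under kind/. The helper names cache_path and invalidate are illustrative only. The reasoning it encodes: a minor edit can only change views that show minor edits, while a major edit changes every view.

```python
import os
import shutil


def cache_path(wiki_root, kind, stamp, shows_minor):
    """Hypothetical layout: views that include minor edits live under kind/minor/."""
    if shows_minor:
        return os.path.join(wiki_root, kind, "minor", stamp)
    return os.path.join(wiki_root, kind, stamp)


def invalidate(wiki_root, kind, is_minor_edit):
    """Drop exactly the cached views that an edit can change."""
    base = os.path.join(wiki_root, kind)
    if is_minor_edit:
        # A minor edit cannot change a view that hides minor edits,
        # so only the minor/ subdirectory is cleared.
        shutil.rmtree(os.path.join(base, "minor"), ignore_errors=True)
    else:
        # A major edit shows up in every view: clear the whole tree, minor/ included.
        shutil.rmtree(base, ignore_errors=True)
```

Because minor/ sits inside the top-level directory, one recursive delete handles the major case, and the cheap minor case leaves the "hide minor edits" views warm.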

This is a simple technique to implement and therefore a simple one to test empirically. It is unknown whether it would actually help. Most wikis see about 20 to 50 times as many RecentChanges and RSS loads as changes, but these loads may come from many different users using many different options. However, we might presume that RecentChangesJunkies cluster their requests together, so there should be some degree of correlation. Further, the overhead of caching isn't very high, so the requests would have to be very heterogeneous for it to become a cost rather than a benefit. Nonetheless, it is important to try this experimentally to verify these assertions.
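One way to run that experiment before touching the wiki itself is to replay an existing access log against a simulated cache and count hits and misses. The sketch below assumes a hypothetical log already reduced to events of the form ("edit",) or ("view", stamp), where stamp is the canonical stamp of the requested options; producing that log from real server logs is left out.

```python
def replay(events):
    """Replay ('edit',) and ('view', stamp) events against an invalidate-on-edit cache.

    Returns (hits, misses); hits / (hits + misses) is the fraction of renders saved.
    """
    cached = set()
    hits = misses = 0
    for event in events:
        if event[0] == "edit":
            cached.clear()  # the RecentChanges log was amended: everything is stale
        elif event[1] in cached:
            hits += 1
        else:
            misses += 1
            cached.add(event[1])  # the first request after an edit renders and stores
    return hits, misses
```

Heterogeneous options show up as many distinct stamps between edits, and therefore as misses, which is exactly the case in which the cache would cost more than it saves.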

Implementations

OddMuse's CacheHTML strategy results in the caching of RecentChanges as well. It supports plain HTTP/1.1 caching by browsers and proxies. If a user requests a URL he has seen before, his browser (or proxy) adds headers telling the wiki to return data only if it has been modified since. If nothing was modified, the script returns a 304 Not Modified response instead of the data. The question, then, is how the browser decides whether a page in its cache is "fresh" or "stale". Usually, browsers just go by the URL. Therefore, subtle complications are possible:

1. View RecentChanges with certain URL parameters.
2. Change your preferences.
3. View RecentChanges with the same URL parameters.

If your preferences should have changed the output but the URL stayed the same, your browser will serve you the cached copy.
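The server side of such a conditional request can be sketched in Python. This is not OddMuse's actual code; the function respond and its arguments are hypothetical, and it assumes the wiki knows the time of its latest edit as a UTC datetime. It shows why the complication arises: the 304 decision compares only the If-Modified-Since date with the last edit, so a change of preferences never enters into it.

```python
from datetime import timezone
from email.utils import format_datetime, parsedate_to_datetime


def respond(request_headers, last_change, render):
    """Return (status, headers, body) for a RecentChanges request.

    last_change: datetime of the latest edit, with tzinfo=timezone.utc.
    render: callable producing the full page (it may depend on preferences).
    """
    # HTTP dates have one-second resolution, so compare at that resolution.
    last_change = last_change.replace(microsecond=0)
    since = request_headers.get("If-Modified-Since")
    if since is not None:
        try:
            if last_change <= parsedate_to_datetime(since):
                return 304, {}, b""  # Not Modified: the client keeps its cached copy
        except (TypeError, ValueError):
            pass  # malformed or zone-less date: fall through and send the full page
    headers = {"Last-Modified": format_datetime(last_change.astimezone(timezone.utc), usegmt=True)}
    return 200, headers, render()
```

Step 3 of the list above corresponds to the second call with the same If-Modified-Since header: the wiki has not changed, so the answer is 304, whatever the new preferences would have rendered.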

-- AlexSchroeder

CategoryWikiTechnology CategoryRecentChanges
