However, the site changes less often than people request RecentChanges, which means the same data is computed over and over. Therefore, it seems reasonable to cache this information.
If we can form a canonical stamp for the RecentChangesOptions (which is definitely possible), we can cache the output of the bulk of RecentChanges (except timestamps, UserName-specific data, etc.) in a file wiki_root/RecentChanges/canonical_stamp. When we request RecentChanges, if that file exists, we just return it, perhaps decorated with other user-specific and time-specific information. When we amend the RecentChanges log, we clear out this directory.
Similarly, the RSS output can be cached in wiki_root/RSS/canonical_stamp.
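As a minimal sketch of the idea in Python (the directory layout, the render callback, and the option handling here are illustrative assumptions, not any existing wiki engine's code):

    import glob
    import os

    CACHE_ROOT = "wiki_root"   # assumed wiki data directory

    def canonical_stamp(options):
        # Put the RecentChangesOptions into a fixed order so that
        # equivalent requests map onto the same file name.  Real code
        # would also escape characters that are unsafe in file names.
        return ",".join("%s=%s" % (k, options[k]) for k in sorted(options))

    def cached_recent_changes(options, render):
        """Return the bulk of RecentChanges, using the cache when possible."""
        path = os.path.join(CACHE_ROOT, "RecentChanges", canonical_stamp(options))
        if os.path.exists(path):
            with open(path) as f:
                return f.read()        # cache hit: skip the expensive rendering
        body = render(options)         # cache miss: compute the page body
        os.makedirs(os.path.dirname(path), exist_ok=True)
        with open(path, "w") as f:
            f.write(body)
        return body

    def invalidate_recent_changes():
        """Called whenever the RecentChanges log is amended."""
        for path in glob.glob(os.path.join(CACHE_ROOT, "RecentChanges", "*")):
            os.remove(path)

The caller would then decorate the returned body with the user-specific and time-specific pieces before sending it to the browser.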
The logic could be slightly more clever by differentiating between MinorEdits and major ones. For instance, the canonical_stamp might have a prefix minor/ for views that include minor edits (vs. nothing for views showing only major ones), so that all the minor-inclusive caches are in a subdirectory. A major edit appears in every view, so it clears both directories, but a minor edit only needs to clear the minor/ subdirectory. Alternatively, you can simply delete using a more specific pattern for minor changes, rm *minor=1*, vs. the general pattern for major ones, rm *.
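A rough sketch of that invalidation rule, assuming the canonical stamp records the show-minor-edits option as minor=0 or minor=1 (the names are again hypothetical):

    import glob
    import os

    RC_CACHE = "wiki_root/RecentChanges"   # assumed cache directory

    def invalidate(is_minor_edit):
        if is_minor_edit:
            # A minor edit only appears in views that include minor edits,
            # so only the minor-inclusive cache entries go stale.
            pattern = os.path.join(RC_CACHE, "*minor=1*")
        else:
            # A major edit appears in every view, so clear everything.
            pattern = os.path.join(RC_CACHE, "*")
        for path in glob.glob(pattern):
            os.remove(path)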
This is a simple technique to implement and therefore a simple one to test empirically. It is unknown whether it would actually help. There are about 20 to 50 times as many RecentChanges & RSS loads as changes for most wikis, and these may come from a variety of different users using a variety of options. However, we might presume that RecentChangeJunkies cluster their requests together, so there should be some degree of correlation. Further, the overhead of doing this isn't very high, so the requests would have to be very heterogeneous to make it a cost rather than a benefit. Nonetheless, it is important to try this experimentally to verify the assertions here.
OddMuse's CacheHTML strategy results in the caching of RecentChanges as well. It supports plain HTTP/1.1 caching by browsers and proxies: if a user requests a URL he has seen before, his browser (or proxy) adds headers telling the wiki to return data only if it has been modified. If nothing was modified, the script returns a 304 Not Modified response instead of the data. The question, then, is how the browser determines whether a page from its cache is "fresh" or "stale". Usually, browsers just depend on the URL. Therefore, subtle complications are possible: