Introduction
The InfiniteTypewriter database model fits the PageModel as an EventStream.
Schema
The database structure planned so far:
CREATE TABLE revisions (
revision serial PRIMARY KEY,
page text NOT NULL,
title text NOT NULL,
digest text NOT NULL,
timestamp datetime NOT NULL,
author text NOT NULL,
ip text NOT NULL,
host text NOT NULL,
text text, -- can be NULL
textHints text
);
Allowing <tt>revisions.text</tt> to be NULL makes ForgiveAndForget possible without losing the RC log. Deleting a page then simply means deleting the text associated with all of its revisions. <tt>revisions.textHints</tt> allows the FULLTEXT index to coexist with CamelCase.
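ForgiveAndForget and page deletion then become plain UPDATEs rather than DELETEs. A minimal sketch, assuming the placeholders bind a revision ID and a page title, and assuming <tt>textHints</tt> is cleared along with <tt>text</tt>:
-- ForgiveAndForget a single revision: drop the text, keep the RC metadata.
UPDATE revisions SET text = NULL, textHints = NULL WHERE revision = ?;
-- Delete a page: drop the text of every one of its revisions.
UPDATE revisions SET text = NULL, textHints = NULL WHERE page = ?;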
CREATE TABLE backlinks (
source int NOT NULL,
dest text NOT NULL,
INDEX ( source ),
INDEX ( dest(30) )
);
This table caches backlinks for each page, allowing rapid execution of certain queries that would otherwise be impossibly slow.
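Typical lookups, assuming <tt>source</tt> holds the ID of the linking revision and <tt>dest</tt> the linked title (these column semantics are inferred, not stated above):
-- What links to this page?
SELECT source FROM backlinks WHERE dest = ?;
-- What does this revision link to?
SELECT dest FROM backlinks WHERE source = ?;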
CREATE TABLE openProxyTests (
ip text NOT NULL,
timestamp timestamp NOT NULL,
INDEX ( ip(30) )
);
Used by the OpenProxy detector.
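A plausible lookup, assuming the detector wants to skip IPs tested recently:
-- When was this IP last tested?
SELECT timestamp FROM openProxyTests WHERE ip = ? ORDER BY timestamp DESC LIMIT 1;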
Some sample queries (bearing in mind these must work with MySQL 4.0, so no subqueries):
Load current revision
SELECT * FROM revisions WHERE page = ? AND text IS NOT NULL ORDER BY revision DESC LIMIT 1;
This query is optimized by the page-text index.
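That index is not declared in the schema above; under MySQL 4.0 it might be added roughly as follows (the index name and prefix lengths are assumptions, since TEXT columns need prefix indexes):
-- Assumed "page-text" index; NULLs are stored in the index, so it can also serve the IS NOT NULL filter.
ALTER TABLE revisions ADD INDEX page_text ( page(100), text(1) );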
Load old revision
SELECT * FROM revisions WHERE revision = ? AND page = ?;
Note that the <tt>page = ?</tt> here is just a (redundant) verification, as revision IDs are unique across the whole db.
AllPages
SELECT page, max(revision) FROM revisions WHERE text IS NOT NULL GROUP BY page;
Could this be usefully optimized by adding a tiny index on <tt>text</tt> to find the non-NULL revisions faster? Is the cost of maintaining the index paid for by the savings, given that AllPages is very rarely executed?
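Such a tiny index might be a one-character prefix on <tt>text</tt>, just enough to tell NULL from non-NULL (the name and length are illustrative):
ALTER TABLE revisions ADD INDEX text_null ( text(1) );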
RecentChanges
SELECT * FROM revisions WHERE timestamp >= ? ORDER BY timestamp;
This query is optimized by the timestamp index.
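Like the page-text index, this one is not declared in the schema above; it might be added as (the name is illustrative):
ALTER TABLE revisions ADD INDEX by_timestamp ( timestamp );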
Accessible historical revisions
SELECT revision FROM revisions WHERE page = ? AND text IS NOT NULL ORDER BY revision DESC;
This query is optimized by the page-text index.
Maintenance
It appears to be possible to handle post-PeerReview maintenance quickly, without taking a lock or entering a transaction (i.e. without isolation).
- Deleted pages are special backlinks of DeletedPage. Backlinks can be determined fast.
- Deleted pages consist of revisions older than the PeerReview period. Thus, deleting only such revisions obviates the need for isolation.
- Replacement pages are special backlinks of ReplaceFile. Backlinks can be determined fast.
- Replacement files require the whole PeerReview period to change content. Thus, replacement can be done without isolation provided the operation does not take two weeks.
- Old revisions targeted for expiration can be determined with a single query (a sketch follows this list).
- The "to be expired" property is monotonic; once an old revision is found, it can be safely expired at any point in the future.