[Home]InfiniteTypewriterDatabase

MeatballWiki | RecentChanges | Random Page | Indices | Categories

1. Introduction
2. Schema
2.1. Load current revision
2.2. Load old revision
2.3. AllPages
2.4. RecentChanges
2.5. Accessible historical revisions
2.6. Maintenance

CategoryMonkey


1. Introduction

The InfiniteTypewriter database model fits the PageModel as an EventStream?.

2. Schema

The database structure planned so far:

  CREATE TABLE revisions (
    revision  serial   PRIMARY KEY,
    page      text     NOT NULL,
    title     text     NOT NULL,
    digest    text     NOT NULL,
    timestamp datetime NOT NULL,
    author    text     NOT NULL,
    ip        text     NOT NULL,
    host      text     NOT NULL,
    text      text,  -- can be NULL
    textHints text,    
  );

Allowing revisions.text to be NULL allows ForgiveAndForget without losing the RC log. Deleting a page means simply deleting all text associated with its revisions. revisions.textHints allows the FULLTEXT index to coexist with CamelCase.

  CREATE TABLE backlinks (
    source int NOT NULL,
    dest   text NOT NULL,
    INDEX ( source ),
    INDEX ( dest(30) )
  );

This table caches backlinks for each page, allowing rapid execution of certain queries that would otherwise be impossibly slow.

  CREATE TABLE openProxyTests (
    ip        text NOT NULL,
    timestamp timestamp NOT NULL,
    INDEX ( ip(30) )
  );

Used by the OpenProxy detector.

Some sample queries (bearing in mind these must work with MySQL? 4.0, so no subqueries):

2.1. Load current revision

SELECT * FROM revisions WHERE page = ? AND text IS NOT NULL ORDER BY revision DESC LIMIT 1;

This query is optimized by the page-text index.

2.2. Load old revision

SELECT * FROM revisions WHERE revision = ? AND page = ?;

Note that the page = ? here is just a (redundant) verification, as revision IDs are unique across the whole db.

2.3. AllPages

SELECT page, max(revision) FROM revisions WHERE text IS NOT NULL GROUP BY page;

Could this be usefully optimized by adding a tiny index to text to filter out the non-NULL pages faster? Is the cost of maintaining the index paid for by the savings, given that AllPages is very rarely executed?

2.4. RecentChanges

SELECT * FROM revisions WHERE timestamp >= ? ORDER BY timestamp;

This query is optimized by the timestamp index.

2.5. Accessible historical revisions

SELECT revision FROM revisions WHERE page = ? AND text IS NOT NULL ORDER BY revision DESC;

This query is optimized by the page-text index.

2.6. Maintenance

It appears to be possible to handle post-PeerReview maintenance, fast, and without taking a lock or entering a transaction (isolation).


Discussion

MeatballWiki | RecentChanges | Random Page | Indices | Categories
Edit text of this page | View other revisions
Search: