As network software becomes more complex, it often accumulates multiple network interfaces. For example, one might have a normal Web interface, a MachineInterface, a MobileInterface, a CommandLine interface for administrators, and so on. This means there are multiple entry points into the software, which makes logging a challenge: there are multiple logs to reconcile (one per server daemon, such as Apache, handling incoming requests).
A simple solution to the logging problem is to funnel all entry points into one event dispatcher. That dispatcher can then be used to log accurately what has happened by simply making a sequential record of all events and their parameters.
Basically, all entry points are considered events. An event has a name that identifies its type, and it is described by a set of named parameters (parameter1=value1; parameter2=value2; etc.). The software itself is organized as event handlers. Event handlers' names correspond to the events' names (in some way). An event handler takes (in some way) a bag of the named parameters that describe the event.
So, for instance, take an entry point like this XML-RPC call; a Web form POST or a command-line invocation carrying the same parameters would be equivalent:
 <?xml version="1.0"?>
 <methodCall>
   <methodName>edit</methodName>
   <struct>
     <member>
       <name>id</name>
       <value><string>PageName</string></value>
     </member>
     <member>
       <name>revision</name>
       <value><i4>42</i4></value>
     </member>
   </struct>
 </methodCall>
For each entry point type, you would write a very thin parser that canonicalizes the request as { action => 'edit', id => 'PageName', revision => 42 }. You would then pass this bag to the event dispatcher.
The event dispatcher finds the event handler for edit and passes the bag on to it.
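A minimal sketch of this in Perl: a thin parser for one entry-point type (here a CGI-style query string) and a dispatcher keyed on the event name. The names (parse_query, dispatch, %handlers, edit_handler) are invented for illustration, not taken from any particular engine.

 #!/usr/bin/perl
 use strict;
 use warnings;

 # Hypothetical handler table: one handler per event name.
 my %handlers = (
     edit => \&edit_handler,
     # view => \&view_handler, save => \&save_handler, ...
 );

 # A very thin parser for one entry-point type (a CGI-style query
 # string). Equivalent parsers would exist for XML-RPC, the command
 # line, etc.; all of them produce the same canonical bag.
 sub parse_query {
     my ($query) = @_;
     my %bag;
     for my $pair (split /&/, $query) {
         my ($key, $value) = split /=/, $pair, 2;
         $bag{$key} = $value;
     }
     return \%bag;
 }

 # The single dispatcher: every entry point funnels through here, so
 # this is also the one place where logging needs to happen.
 sub dispatch {
     my ($bag) = @_;
     my $handler = $handlers{ $bag->{action} }
         or die "no handler for action '$bag->{action}'";
     # log_event($bag);   # see the logging sketch below
     return $handler->($bag);
 }

 sub edit_handler {
     my ($bag) = @_;
     print "editing $bag->{id} at revision $bag->{revision}\n";
 }

 dispatch(parse_query('action=edit&id=PageName&revision=42'));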
The log, then, is a sequential list of all inputs to the dispatcher. Write it to a simple file with a format like
 (blank line)
 action edit
 id PageName
 revision 42
Writing the blank line first is essential: it marks the start of each entry, so additional lines can be appended to an entry at run-time without losing track of where one record ends and the next begins. Much of the interesting information is not available from the event itself, for instance whether the LDAP authentication failed. You could write this in subsequent lines as something like
 ldap_authentication failed
Also, while it is theoretically unnecessary, you could log some state information that would be useful for later analysis. Consider the case of real bandwidth. If we charged users for bandwidth, it would be unfair to charge them for the bits of application code sent over the wire. Instead, we would charge them only for the page data and attachments they upload and download. Apache doesn't log this figure, so we have to calculate it from within the application, and we can best do that after we have loaded some data. So you could write this into the log as well
 real_bandwidth 42024
Finally, you would want to autoflush the log so that entries are not lost in the event of a power failure.
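Putting the log format, the run-time additions, and the autoflush together, one sketch of the simple scheme looks like this (the file name event.log and the sub names log_event and log_extra are invented; the locking cost of this approach is the subject of the next paragraph):

 use strict;
 use warnings;
 use Fcntl qw(:flock);
 use IO::Handle;

 # Open the log once per request and hold an exclusive lock until the
 # script exits, so that this entry's lines stay contiguous. Autoflush
 # means a power failure loses at most the line being written.
 open my $log, '>>', 'event.log' or die "cannot open log: $!";
 flock $log, LOCK_EX or die "cannot lock log: $!";
 $log->autoflush(1);

 # Start the entry: a blank line, then one "key value" line per
 # event parameter.
 sub log_event {
     my ($bag) = @_;
     print {$log} "\n";
     print {$log} "$_ $bag->{$_}\n" for sort keys %$bag;
 }

 # Append anything that is only discovered while handling the event.
 sub log_extra {
     my ($key, $value) = @_;
     print {$log} "$key $value\n";
 }

 log_event({ action => 'edit', id => 'PageName', revision => 42 });
 log_extra('ldap_authentication', 'failed');
 log_extra('real_bandwidth', 42024);
 # The lock is released when the script finishes and $log is closed.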
This design has problems with synchronization. A lock on the log file would have to be held for the entire duration of the script, which could be a long time, and that can seriously degrade performance on very active wikis. A trade-off is to build the log entry in memory and only write it at the end of the script's life. This means the log entries will no longer be in order, making replay more complex. A simple solution to that problem is to briefly lock the log file before entering the event handler, record the current size of the log file, and then release the lock. When you write your log entry, grab the lock again and write the earlier size of the log file as part of the entry. Since the log file grows strictly monotonically, these recorded sizes give a total order over the entries.
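A sketch of that trade-off, again with invented names (the started_at key is my own convention): the lock is held only long enough to read the file size, and then again briefly at the end to append the buffered entry, tagged with the size recorded at the start.

 use strict;
 use warnings;
 use Fcntl qw(:flock);

 my $logfile = 'event.log';

 # Before entering the handler: note how large the log already is.
 # The lock is held only long enough to read the size.
 sub note_log_position {
     open my $log, '>>', $logfile or die "cannot open log: $!";
     flock $log, LOCK_EX or die "cannot lock log: $!";
     my $position = -s $log;
     close $log;                  # releases the lock
     return $position;
 }

 # At the end of the script: append the whole entry in one go, tagged
 # with the position noted earlier. Sorting entries by that value
 # recovers the order in which the requests began.
 sub write_entry {
     my ($position, @lines) = @_;
     open my $log, '>>', $logfile or die "cannot open log: $!";
     flock $log, LOCK_EX or die "cannot lock log: $!";
     print {$log} "\n", "started_at $position\n", map { "$_\n" } @lines;
     close $log;
 }

 my $position = note_log_position();
 # ... run the event handler, accumulating log lines in memory ...
 write_entry($position, 'action edit', 'id PageName', 'revision 42',
             'real_bandwidth 42024');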
If you empty the log file, say during nightly backups, you will need to disable operation of the script during this period of time (which should be a few milliseconds). You will also lose logged data from crashed scripts, which will hinder debugging as well as data recovery from power failures.
An even more robust solution would be to write each log entry in two halves. Before entering the handler, grab the lock, record the log size, write the event parameters plus the log file size, release the lock, and close the log file. After exiting the handler, grab the lock, write the additional lines plus the log file size, release the lock, and close the log file. Collating the halves after a failure, or for analysis, would be more difficult, but a totally ordered combined log file can likely be computed with a reasonably small amount of scratch memory.
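As a sketch of the two-half scheme, with an invented begin/end marker convention to make the halves collatable (the description above leaves the exact linking scheme open; here the size recorded before the handler doubles as the entry's identifier):

 use strict;
 use warnings;
 use Fcntl qw(:flock);

 my $logfile = 'event.log';

 # First half, written before the handler runs: lock, note the current
 # log size (used here as the entry's identifier), write the event
 # parameters tagged with it, then release the lock and close the file.
 sub write_first_half {
     my (@lines) = @_;
     open my $log, '>>', $logfile or die "cannot open log: $!";
     flock $log, LOCK_EX or die "cannot lock log: $!";
     my $id = -s $log;
     print {$log} "\nbegin $id\n", map { "$_\n" } @lines;
     close $log;                   # releases the lock
     return $id;
 }

 # Second half, written after the handler: the same identifier links it
 # back to the first half when the log is later collated.
 sub write_second_half {
     my ($id, @lines) = @_;
     open my $log, '>>', $logfile or die "cannot open log: $!";
     flock $log, LOCK_EX or die "cannot lock log: $!";
     print {$log} "\nend $id\n", map { "$_\n" } @lines;
     close $log;
 }

 my $id = write_first_half('action edit', 'id PageName', 'revision 42');
 # ... event handler runs here ...
 write_second_half($id, 'ldap_authentication failed', 'real_bandwidth 42024');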
This kind of design is very useful for doing research, particularly because it gives you the power to replay the whole history of the software. If you take a clean installation and simply reapply the log, you will end up with the same state. Consider the utility of this for transaction logging during database failures, system shutdowns, or hurricanes.
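A replay loop could be as simple as the following, assuming the dispatch() sub from the first sketch above and the blank-line-separated log format described earlier:

 use strict;
 use warnings;

 # Replay the whole history: read the log back as blank-line-separated
 # entries and feed each reconstructed bag to the same dispatch() sub
 # the live site uses (see the first sketch above). Applied to a clean
 # installation, this reproduces the final state.
 open my $log, '<', 'event.log' or die "cannot open log: $!";

 local $/ = '';                    # paragraph mode: one entry per read
 while (my $entry = <$log>) {
     my %bag;
     for my $line (split /\n/, $entry) {
         next unless $line =~ /\S/;
         my ($key, $value) = split / /, $line, 2;
         $bag{$key} = $value;
     }
     dispatch(\%bag);              # re-run the original event
 }
 close $log;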
The detail in the log is precisely and exactly what is needed to fully describe the behaviour of the software (in a theoretical sense). If there are new questions to answer, it is only a matter of slicing and dicing the information in the log in new ways, because the information is already there. One need not worry that something is not represented.
Of course, that isn't 100% true. Things like real_bandwidth would be expensive to recompute without a dedicated log entry: you would have to replay the history up to each point of interest, which is O(N^2) over the whole log. But expensive doesn't mean impossible.
Of course, another take on this is that you're simply implementing the Durability part of database transactions' WikiPedia:ACID properties. Better to use a database than create your own. -- ChrisPurcell
A durable database ensures committed updates are never forgotten, keeping its own log if necessary. All you'd need to do is a bit of begin/end timestamping (you don't even need a single log table) if you want a replay for debugging purposes. Check out the Wikipedia page. -- ChrisPurcell
Fact: synchronization means running more slowly. If you force your read-only hits to the site to update an ordered log, you introduce unnecessary contention and limit how well your engine scales.
Fact: storing full logging info eats up storage space fast. We smashed the usemod.com server like that.
If you're on a poorly-provisioned machine like ours, which doesn't have multiple processors dedicated to the site, the first problem isn't important; but if you're writing a wiki engine that might be used in that situation, baking in bad scalability may be a problem. If SingleDispatch logging is optional, the second problem goes away; if you engineer your system right, both problems go away when logging is off. Great care is needed to ensure this is a debugging tool, not a performance and resources killer. -- ChrisPurcell