SingleDispatch

Challenge

As network software becomes more complex, it often accumulates multiple network interfaces. For example, one might support a normal Web interface, a MachineInterface, a MobileInterface?, a CommandLine? interface for administrators, and so on. This means there are multiple entry points into the software, which makes logging a challenge: events end up scattered across multiple logs (depending on the number of server daemons, like Apache, handling incoming requests).

Solution

A simple solution to the logging problem is to funnel all entry points into one event dispatcher. The dispatcher can then log exactly what has happened simply by keeping a sequential record of all events and their parameters.

Benefits

Dispatcher implementation

Basically, all entry points are treated as events. An event has a name that identifies its type, and it is described by a set of named parameters (parameter1=value1; parameter2=value2; etc.). The software itself is organized as event handlers, whose names correspond (by some convention) to the events' names. An event handler takes a bag of the named parameters that describe the event.

So, for instance, take these entry points, which are all equivalent. An XML-RPC call to the MachineInterface might look like:

<?xml version="1.0"?>
<methodCall>
    <methodName>edit</methodName>
    <params>
        <param>
            <value>
                <struct>
                    <member>
                        <name>id</name>
                        <value><string>PageName</string></value>
                    </member>
                    <member>
                        <name>revision</name>
                        <value><i4>42</i4></value>
                    </member>
                </struct>
            </value>
        </param>
    </params>
</methodCall>
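
The same event arriving through the Web interface would be just a query string (the script name here is hypothetical):

 wiki.pl?action=edit&id=PageName&revision=42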

For each entry point type, you would write a very thin parser that canonicalizes the request as { action => 'edit', id => 'PageName', revision => 42 }. You would then pass this bag to the event dispatcher.
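
As a sketch, such a thin parser for the XML-RPC form above might look like this in Perl (XML::LibXML is just one convenient choice of parser; the bag keys follow the example above):

 # Thin parser: canonicalize the XML-RPC call into a bag of named parameters.
 use strict;
 use warnings;
 use XML::LibXML;

 sub parse_xmlrpc {
     my ($xml) = @_;
     my $doc = XML::LibXML->load_xml(string => $xml);
     # The method name becomes the event name.
     my %bag = (action => $doc->findvalue('/methodCall/methodName'));
     # Each struct member becomes a named parameter.
     for my $member ($doc->findnodes('//struct/member')) {
         $bag{ $member->findvalue('name') } = $member->findvalue('value');
     }
     return %bag;    # (action => 'edit', id => 'PageName', revision => 42)
 }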

The event dispatcher finds the event handler for edit and passes the bag on to it.
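
A minimal dispatcher sketch in Perl, assuming a dispatch table of handler subroutines (the handler names here are hypothetical):

 # Dispatch table: event names map to handler subroutines.
 use strict;
 use warnings;

 my %handlers = (
     edit => \&handle_edit,
     view => \&handle_view,
 );

 sub dispatch {
     my (%event) = @_;
     my $handler = $handlers{ $event{action} }
         or die "No handler for action '$event{action}'\n";
     return $handler->(%event);
 }

 sub handle_edit {
     my (%event) = @_;
     # ... edit page $event{id} at revision $event{revision} ...
 }

 sub handle_view {
     my (%event) = @_;
     # ... render page $event{id} ...
 }

 # Any entry point, once canonicalized, funnels through the same call:
 dispatch(action => 'edit', id => 'PageName', revision => 42);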

Logging

The log is then a sequential record of all inputs to the dispatcher. Write a simple file with a format like

 blank line
 action edit
 id PageName
 revision 42

Writing the blank line first is essential: it marks the start of a new entry and leaves room for additional lines to be appended during run-time. Consider that much information is not available from the event itself, for instance whether LDAP authentication failed. You could record this in subsequent lines as something like

 ldap_authentication failed

Also, while it is theoretically unnecessary, you could log some state information that would be useful for analysis later. Consider the case of real bandwidth. If we charged users for bandwidth, it would be unfair to charge them for the bits of application code sent over; instead, we would charge them for the page data and attachments they upload and download. Apache doesn't log this information, so we have to calculate it from within the application, and we can best do that after we have loaded some data. So you can write this into the log as well:

 real_bandwidth 42024

Finally, you would want to autoflush the log so that entries survive a crash or power failure.
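
Putting the logging scheme together, a sketch in Perl (the file name event.log and the helper names are made up for illustration):

 # The naive scheme: the log stays open and locked for the whole request,
 # so later lines (ldap_authentication, real_bandwidth, ...) join the entry.
 use strict;
 use warnings;
 use IO::Handle;
 use Fcntl qw(:flock);

 my $log;

 sub open_log_entry {
     my (%event) = @_;
     open $log, '>>', 'event.log' or die "Cannot open log: $!";
     flock $log, LOCK_EX or die "Cannot lock log: $!";
     $log->autoflush(1);             # flush every line, in case of power failure
     print $log "\n";                # blank line first: separates entries
     print $log "$_ $event{$_}\n" for sort keys %event;
 }

 sub log_line {
     my ($key, $value) = @_;
     print $log "$key $value\n";     # e.g. log_line(real_bandwidth => 42024)
 }

 sub close_log_entry { close $log }  # also releases the lock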

Optimization

This design has problems with synchronization. A lock on the log file would have to be held for the entire duration of the script, which could be very long, and this can seriously degrade performance for very active wikis. A trade-off is to build the log entry in memory and only write it at the end of the script's life. This means that the log entries will no longer be in order, making replay more complex. A simple solution to that problem is to quickly lock the log file before entering the event handler, record the current size of the log file, and then release the lock. When you write your log entry, grab the lock again and write the earlier size of the log file as part of the entry. Since the log file grows strictly monotonically, these recorded sizes put the entries into a total order.
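
A sketch of this deferred-write variant; the log_offset key and the helper names are inventions for illustration:

 # Deferred write: note the log size under a brief lock at the start, buffer
 # the entry in memory, and append it at the end. A replay tool can sort
 # entries by their recorded log_offset to recover arrival order.
 use strict;
 use warnings;
 use Fcntl qw(:flock);

 my $entry;

 sub begin_entry {
     my (%event) = @_;
     open my $log, '>>', 'event.log' or die $!;
     flock $log, LOCK_EX or die $!;
     my $size = -s $log;            # size taken while locked
     close $log;                    # releases the lock immediately
     $entry = "\nlog_offset $size\n";
     $entry .= "$_ $event{$_}\n" for sort keys %event;
 }

 sub add_line { my ($key, $value) = @_; $entry .= "$key $value\n" }

 sub end_entry {
     open my $log, '>>', 'event.log' or die $!;
     flock $log, LOCK_EX or die $!;
     print $log $entry;
     close $log;                    # unlock and flush
 }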

If you empty the log file, say during nightly backups, you will need to disable operation of the script for that period of time (which should be a few milliseconds). With the deferred write, you will also lose the log entries of scripts that crash before finishing, which will hinder debugging as well as data recovery from power failures.

An even more robust solution would be to write each log entry in two halves. Before entering the handler: grab the lock, record the log size, write the event parameters plus the log file size, release the lock, and close the log file. After exiting the handler: grab the lock, write the additional lines plus the log file size, release the lock, and close the log file. Collating this in the event of failure or for analysis would be more difficult, but a totally ordered combined log can likely be computed with a reasonably small amount of scratch memory.
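
For instance, each half could be its own short append, with the first half's offset written into the second so the two can be paired up during collation (the log_offset and pair keys are, again, inventions for illustration):

 # Each half is a self-contained append; the second half names the first.
 use strict;
 use warnings;
 use Fcntl qw(:flock);

 sub append_half {
     my @lines = @_;                # list of "key value" strings
     open my $log, '>>', 'event.log' or die $!;
     flock $log, LOCK_EX or die $!;
     my $offset = -s $log;
     print $log "\n", "log_offset $offset\n", map { "$_\n" } @lines;
     close $log;
     return $offset;
 }

 my $first = append_half('action edit', 'id PageName', 'revision 42');
 # ... handler runs here ...
 append_half("pair $first", 'real_bandwidth 42024');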

Utility

This kind of design is very useful for doing research, particularly because it gives you the power of replaying the whole history of the software. If you take a clean installation and just replay the log, you will get the same state at the end. Consider the utility of this for transaction logging during database failures, system shutdowns, or hurricanes.
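
Replay is then just a matter of reading the log back and re-dispatching each entry; a sketch, assuming the simple one-piece log format and the dispatch function sketched above:

 # Replay: feed every logged entry back through the normal dispatcher.
 use strict;
 use warnings;

 open my $log, '<', 'event.log' or die "Cannot open log: $!";
 local $/ = '';                         # paragraph mode: split on blank lines
 while (my $record = <$log>) {
     my %event = map { split ' ', $_, 2 } split /\n/, $record;
     dispatch(%event);                  # same handlers as live operation
 }
 close $log;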

The detail of the log is precisely and exactly what's needed to fully describe the behaviour of the software (in a theoretical sense). If there are new questions to answer, it is only a matter of slicing and dicing the information in the log in new ways, because the information is already there. One need not worry that something is not represented.

Of course, that isn't 100% true. Things like real_bandwidth would be expensive to recompute without a special log entry: deriving it for each of the N entries in the log means replaying the history up to that point, which is O(N^2) overall. But expensive doesn't mean impossible.

Implementations

Discussion

Of course, another take on this is that you're simply implementing the Durability property of database transactions (WikiPedia:ACID). Better to use a database than to create your own. -- ChrisPurcell

Even if you had a database that could handle very large blobs of unstructured data and do record-level rather than table-level locking, the only part of the discussion above that would change is the part about writing to a file. Instead, you would write to the database. This discussion is mostly about the architecture of the script rather than the log. Perhaps I should move the logging bits out to SingleDispatchLog? ?

However, that being said, it is very unlikely that people installing the script will have access to the type of DBMS necessary. I'm not sure if MySQL? has record-level locking yet, but suppose it didn't: table-level locking would be equivalent to locking the entire logfile, and thus not very efficient. Also, the ability to add any key->value pair is critical. Finally, the ability to store the very large page content as it is saved is very important for reconstructing pages. -- SunirShah

A durable database ensures committed updates are never forgotten, keeping its own log if necessary. All you'd need to do is a bit of begin/end timestamping (you don't even need a single log table) if you want a replay for debugging purposes. Check out the Wikipedia page. -- ChrisPurcell

Ah, I see what you mean. That still would not be enough. I'm talking about logging every single access to the script in a very rich way, not simply rebuilding the database, although that is a nice side benefit. Also, the script must be structured in a way to make replay possible. Also, synchronization. Also, simplified code. -- SunirShah

Fact: synchronization means running more slowly. If you force your read-only hits to the site to update an ordered log, you introduce unnecessary contention and limit how well your engine scales.

Fact: storing full logging info eats up storage space fast. We smashed the usemod.com server like that.

If you're on a poorly-provisioned machine like ours, which doesn't have multiple processors dedicated to the site, the first problem isn't important; but if you're writing a wiki engine that might be used in that situation, baking in bad scalability may be a problem. If SingleDispatch logging is optional, the second problem goes away; if you engineer your system right, both problems go away when logging is off. Great care is needed to ensure this is a debugging tool, not a performance and resource killer. -- ChrisPurcell

