BrokenLink


A hypermedium is, in some sense, a graph. Some implementations are directed graphs (like the WorldWideWeb); others are undirected (like WikiWikiWeb, which has BackLinks). In a directed graph, it is possible that node A can reach node B while node B cannot reach node A, as in the very simple graph [A]->[B].

When the address of B no longer exists (say B has been deleted or moved), the link from A becomes a "dangling" (or "broken") link: it points to nothing. Because B is unaware of the link to it from A, it cannot inform A that the link is about to become invalid.
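As a minimal sketch of the [A]->[B] case, assume a hypermedium is held in memory as a mapping from page name to the set of pages it links to. The names `find_broken_links` and `build_backlinks` are illustrative, not part of any wiki engine. The second function shows the BackLinks idea: with a reverse index, B could know who points at it.

```python
def find_broken_links(pages):
    """Return sorted (source, target) pairs whose target no longer exists."""
    return sorted(
        (source, target)
        for source, targets in pages.items()
        for target in targets
        if target not in pages
    )


def build_backlinks(pages):
    """Invert the link graph: map each target to the set of pages linking to it."""
    backlinks = {name: set() for name in pages}
    for source, targets in pages.items():
        for target in targets:
            backlinks.setdefault(target, set()).add(source)
    return backlinks


# The very simple graph [A]->[B].
pages = {"A": {"B"}, "B": set()}
print(find_broken_links(pages))  # []

# B is deleted; A is never told, so its link now dangles.
del pages["B"]
print(find_broken_links(pages))  # [('A', 'B')]
```

Note that the check only works here because one process can see every node. On the open web no such global view exists, which is the problem the rest of this page deals with.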

This is commonly seen on the WorldWideWeb as [Error 404].

This problem can also occur with undirected graphs (bidirectional links) if there is no mechanism to notify adjacent nodes of changes. Usually, however, the method used to maintain the link structure is the same one that ensures link bidirectionality, so in practice this isn't a problem.

So it would seem the solution is to enforce bidirectional linking. However, this is infeasible in a distributed, open system. You cannot force foreign nodes to add information, because they are beyond your control. Admittedly, you only care about friendly nodes anyway (malicious foreign nodes are only hurting themselves), but even among friendly nodes, if the network fails while you are sending update notifications, the link structure will fall out of sync.

Instead, you have to maintain your links yourself by continuously testing them. To save the hassle of testing links manually, a popular solution is to run a script that checks them periodically. However, this is sensitive to server failures: to a script, what is the difference between a server that is temporarily down and content that has been erased? True, a more complex algorithm could be employed, such as a grace period before scavenging a link, but all in all it still depends on luck. Still, running LinkBot on the site and posting a public listing of broken links would definitely be useful, if server-intensive.
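The grace period idea can be sketched as follows. This is an assumption-laden illustration, not how LinkBot works: `LinkChecker`, `grace_failures`, and `http_probe` are hypothetical names, and the grace period is counted in consecutive failed runs rather than in elapsed time. The probe is passed in as a function so the policy can be tried without touching the network.

```python
import urllib.request
from dataclasses import dataclass, field
from typing import Callable, Dict


def http_probe(url: str, timeout: float = 10.0) -> bool:
    """Hypothetical probe: True if a HEAD request succeeds without an error status."""
    request = urllib.request.Request(url, method="HEAD")
    try:
        with urllib.request.urlopen(request, timeout=timeout):
            return True
    except (OSError, ValueError):
        # URLError and HTTPError (e.g. 404) are OSError subclasses.
        return False


@dataclass
class LinkChecker:
    probe: Callable[[str], bool]
    grace_failures: int = 3  # consecutive failures before a link counts as broken
    failures: Dict[str, int] = field(default_factory=dict)

    def check(self, url: str) -> str:
        """Run one periodic check and return 'ok', 'suspect', or 'broken'."""
        if self.probe(url):
            self.failures.pop(url, None)  # any success resets the grace period
            return "ok"
        count = self.failures.get(url, 0) + 1
        self.failures[url] = count
        return "broken" if count >= self.grace_failures else "suspect"
```

A link is only reported as broken after `grace_failures` consecutive failed runs, so a brief outage shows up as "suspect" and then clears. A server that stays down longer than the grace period is still indistinguishable from erased content, which is the residual luck mentioned above.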

Another possibility is to replace broken links with cached versions from an Internet archive such as the Google cache (e.g. Cache:http://www.usemod.com/cgi-bin/mb.pl?BrokenLink). If all clicks are proxied through the server, this has the advantage of being dynamic and adaptive, as well as transparent to the user. Unlike the other solutions, links that come back (say, after a server outage) will automatically become live again. Unfortunately, it's not clear whether copyright makes caches illegal.
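A rough sketch of the proxied-click approach, under stated assumptions: `resolve_click` is a hypothetical name, `is_live` stands in for whatever liveness test the server uses, and `https://cache.example/` is a placeholder address, not a real archive service.

```python
from urllib.parse import quote

# Placeholder cache address; a real deployment would substitute an actual archive.
CACHE_PREFIX = "https://cache.example/?url="


def cache_url_for(url):
    """Build the (hypothetical) cache address for a target URL."""
    return CACHE_PREFIX + quote(url, safe="")


def resolve_click(url, is_live):
    """Decide where a proxied click should be redirected.

    The liveness test runs on every click, so a link that was down during
    an outage goes back to the original target as soon as it recovers.
    """
    if is_live(url):
        return url
    return cache_url_for(url)
```

Because nothing is rewritten in the page itself, the stored link always remains the original address and the cache is only ever a fallback.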

So, from an information-theoretic standpoint, it is impossible to prevent broken links entirely. We just want to make them improbable, rare, or even auto-correcting. For auto-correction, a cache may be useful when we don't control the link targets, but then the law becomes a problem.


Discussion