 PairedPages

MeatballWiki | RecentChanges | Random Page | Indices | Categories

A common pattern of wiki writing is to split pages into different halves (or thirds or fourths) when they get too big. These pages often link to each other to maintain consistency. An interesting thing to determine is the set of all subgraphs that are fully-connected. That is, all dyads that link to each other, all triads that link to each other, etc.

Further, listing the relative sizes of these pages would help refactorers locate pages that were split unnecessarily and thus should be unified.

Finally, it would also be interesting to know if these subgraphs were knots (cf. KnotDetection) or orphans (cf. OrphanedPages). One simply has to measure the number of incoming links and the number of outgoing links across the whole subgraph.

Not to be confused with TwinPages.

Algorithm

To find such subgraphs note that tetrads are made of triads are made of dyads. First find all pairs of pages that link to each other, an O(E) operation. Store these in sorted order, so each pair is alphabetically orderd (A,B), and all pairs are sorted (A,B), (A,C), ..., (B,C), (B,D), ... then, for each (a,b), if (b,c) ^ (a,c), then (a,b,c) forms a triad. For each (a,b,c), if (a,b,d) ^ (c,d), then (a,b,c,d) is a tetrad. If (a,b,c,e) ^ (d,e), then (a,b,c,d,e) is a pentad. Of course, finding all such pairings is a bit more complicated. However, all one really needs then is a function that returns the Hamiltonian distance between two sets of the same rank. If it is exactly two, then they are comparable. So (a,b,c,d) and (a,b,d,e) are comparable, and one need only check (c,e). The algorithm to determine the Hamiltonian distance is ||MERGE-SORT-WITHOUT-DUPLICATES(X,Y)||-||X||+1 where X and Y are two vectors of the same length to be compared.

Discussion

MeatballWiki | RecentChanges | Random Page | Indices | Categories