Further, listing the relative sizes of these pages would help refactorers locate pages that were split unnecessarily and thus should be unified.
Finally, it would also be interesting to know if these subgraphs were knots (cf. KnotDetection) or orphans (cf. OrphanedPages). One simply has to measure the number of incoming links and the number of outgoing links across the whole subgraph.
Not to be confused with TwinPages.
To find such subgraphs note that tetrads are made of triads are made of dyads. First find all pairs of pages that link to each other, an O(E) operation. Store these in sorted order, so each pair is alphabetically orderd (A,B), and all pairs are sorted (A,B), (A,C), ..., (B,C), (B,D), ... then, for each (a,b), if (b,c) ^ (a,c), then (a,b,c) forms a triad. For each (a,b,c), if (a,b,d) ^ (c,d), then (a,b,c,d) is a tetrad. If (a,b,c,e) ^ (d,e), then (a,b,c,d,e) is a pentad. Of course, finding all such pairings is a bit more complicated. However, all one really needs then is a function that returns the Hamiltonian distance between two sets of the same rank. If it is exactly two, then they are comparable. So (a,b,c,d) and (a,b,d,e) are comparable, and one need only check (c,e). The algorithm to determine the Hamiltonian distance is ||MERGE-SORT-WITHOUT-DUPLICATES(X,Y)||-||X||+1 where X and Y are two vectors of the same length to be compared.
CategoryIndexingScheme CategoryGraphTheory