Natural portals can be found by looking for pages that link to many other pages, perhaps with an AccumulatedRandomWalk or a DepthSearch with a depth of maybe 3. The usefulness of a portal decreases if it refers to too many pages, so AllPages does not qualify as a WikiPortal.
Ideally, a list of WikiPortals would be generated that together allows the entire Wiki to be reached:
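As a rough sketch of the DepthSearch idea, the following counts how many pages each page reaches within a fixed number of link hops. It assumes DepthSearch means a depth-limited breadth-first walk over the link graph, which is only one reading of the name; the function reachable_within and the toy link table are hypothetical and not part of any wiki engine.

```python
from collections import deque

def reachable_within(links, start, max_depth=3):
    """Return the pages reachable from start in at most max_depth link hops.

    links maps a page name to the list of pages it links to (assumed shape).
    """
    seen = {start}
    frontier = deque([(start, 0)])
    while frontier:
        page, depth = frontier.popleft()
        if depth == max_depth:
            continue  # do not follow links beyond the depth limit
        for target in links.get(page, ()):
            if target not in seen:
                seen.add(target)
                frontier.append((target, depth + 1))
    seen.discard(start)  # a page does not count as reaching itself
    return seen

# Hypothetical toy wiki, for illustration only.
links = {
    "FrontPage": ["WikiPortal", "AllPages"],
    "WikiPortal": ["NaturalPortal", "ShortestPathPages"],
    "NaturalPortal": ["AccumulatedRandomWalk"],
    "AllPages": ["FrontPage", "WikiPortal", "NaturalPortal"],
}

# Number of pages each page reaches within two hops.
reach = {page: len(reachable_within(links, page, max_depth=2)) for page in links}
```

Pages with a high count are portal candidates; pages whose count approaches the size of the whole wiki behave like AllPages and would be excluded.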
Start with a list of AllPages and an empty list of WikiPortals.
Among the pages in the list, find a NaturalPortal, that is, the page that allows you to reach the most pages in the Wiki in a few steps, but skip those pages that have IndexNature (defined as reaching more than 25% of the Wiki, maybe). Add the NaturalPortal to the list of WikiPortals, and remove from the list of pages all nodes that are reachable from this portal in a few steps.
Repeat until the number of pages reached by the "NaturalPortal" in a few steps falls below a threshold. Maybe create an artificial portal for the pages remaining in the list.
--PeterSchaefer
I could change ShortestPathPages to be iterative.
    while there are pages left
        centre = mostCentralPage()
        add centre to portal list
        remove all pages within N steps of centre
    end while
    emit portal list
where mostCentralPage() returns the number one entry on ShortestPathPages. -- SunirShah
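The loop above, combined with the IndexNature cut-off and the stopping threshold from the earlier steps, might be sketched as follows. This is only an illustration under several assumptions: "most central" is approximated by "covers the most remaining pages within max_depth hops" rather than by the actual ShortestPathPages ranking, ties are broken alphabetically, and the names choose_portals, index_fraction and min_gain are hypothetical.

```python
from collections import deque

def reachable_within(links, start, max_depth):
    """Pages within max_depth link hops of start, including start itself."""
    seen = {start}
    frontier = deque([(start, 0)])
    while frontier:
        page, depth = frontier.popleft()
        if depth == max_depth:
            continue
        for target in links.get(page, ()):
            if target not in seen:
                seen.add(target)
                frontier.append((target, depth + 1))
    return seen

def choose_portals(links, max_depth=3, index_fraction=0.25, min_gain=2):
    """Greedily pick portals; return (portals, pages left uncovered)."""
    all_pages = set(links) | {t for targets in links.values() for t in targets}
    reach = {p: reachable_within(links, p, max_depth) for p in all_pages}
    index_limit = index_fraction * len(all_pages)
    remaining = set(all_pages)
    portals = []
    while remaining:
        best, best_cover = None, set()
        for page in sorted(remaining):  # sorted for a deterministic tie-break
            if len(reach[page]) - 1 > index_limit:
                continue  # IndexNature: reaches too much of the wiki
            cover = reach[page] & remaining
            if len(cover) > len(best_cover):
                best, best_cover = page, cover
        if best is None or len(best_cover) < min_gain:
            break  # the best candidate falls below the threshold
        portals.append(best)
        remaining -= best_cover
    return portals, remaining

# Hypothetical toy wiki, for illustration only.
links = {
    "AllPages": ["HubA", "HubB", "a1", "a2", "a3", "b1", "b2", "Lonely"],
    "HubA": ["a1", "a2", "a3"],
    "HubB": ["b1", "b2"],
}
portals, leftovers = choose_portals(links, max_depth=1, index_fraction=0.5)
```

On this toy graph the index-like AllPages is skipped, the two hubs are picked in turn, and whatever is left over would be the candidates for an artificial portal.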