PagesLikeThis

Background

In AnnoWiki I implemented a function (accessed by the name LikePages) that, instead of comparing the names of pages (i.e. finding NameSpace neighbors), looks for pages that link to the same pages the current one does (i.e. ontologic cousins). The pages that share the most links with the current page are likely to be the most similar to it.

Purpose

to suggest to the reader other pages related to the current page, but not directly linked to by the current page
for wikis with participants who usually sign contributions, to list authors with similar interests

Perhaps both of these benefit the "new reader" of the wiki during a discovery period. I can't think of many reasons established readers and authors would use it though.

Current Implementation and Variations

Currently, this function is done by comparing forward links from pages, and declaring like pages to be the ones with the highest number of common links (PagesLikeThis).
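
A minimal sketch of the common-forward-link count described above, not the AnnoWiki code itself. The `links` dictionary, its sample link targets, and the function name `pages_like_this` are assumptions made up for illustration.

```python
from collections import Counter

# Hypothetical link data: page name -> set of pages it links to.
links = {
    "SoftSecurity": {"HardSecurity", "PeerReview", "ForgiveAndForget"},
    "UniversityWiki": {"HardSecurity", "PeerReview", "WikiLifeCycle"},
    "CopyLeft": {"PublicDomain", "PeerReview"},
}


def pages_like_this(page, links, top=5):
    """Rank other pages by how many forward links they share with `page`."""
    own = links.get(page, set())
    scores = Counter()
    for other, targets in links.items():
        if other == page:
            continue
        common = len(own & targets)  # size of the set intersection
        if common:
            scores[other] = common
    return scores.most_common(top)


print(pages_like_this("SoftSecurity", links))
# [('UniversityWiki', 2), ('CopyLeft', 1)]
```

This compares every page against every other page, which should be fine for a wiki of a few thousand pages but would need an index for anything much larger.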

The second variation: a list of pages that rank the current one highly among their own like pages (PagesThisIsLike?).

The third variation: the list of pages with the highest number of common back links (PagesMentionedWithThisOne?).
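
The back-link variation could be sketched by inverting the forward-link map first and then counting shared referrers. This is only an illustration; the page names in `links` and the helper names `invert` and `pages_mentioned_with` are hypothetical.

```python
from collections import Counter, defaultdict

# Hypothetical link data: page name -> set of pages it links to.
links = {
    "PageA": {"SoftSecurity", "HardSecurity"},
    "PageB": {"SoftSecurity", "HardSecurity", "CopyLeft"},
    "PageC": {"CopyLeft"},
}


def invert(links):
    """Turn forward links into back links: target -> set of referring pages."""
    back = defaultdict(set)
    for source, targets in links.items():
        for target in targets:
            back[target].add(source)
    return dict(back)


def pages_mentioned_with(page, links, top=5):
    """Rank other pages by how many referrers they share with `page`."""
    back = invert(links)
    own = back.get(page, set())
    scores = Counter()
    for other, sources in back.items():
        if other == page:
            continue
        common = len(own & sources)
        if common:
            scores[other] = common
    return scores.most_common(top)


print(pages_mentioned_with("SoftSecurity", links))
# [('HardSecurity', 2), ('CopyLeft', 1)]
```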

Reality Check/Discussion

Now, I don't have much data to use...I don't know how well this works in practice. Has anybody else tried this? Is this worth pursuing? --DaveParker

: Sounds interesting. Perhaps you should put up a page for it - though you'll need a new name, or we'll get confused. -- ChrisPurcell

If you need some raw link data, you can extract it from UseMod wikis such as Meatball in HTML form [1]. Perhaps simpler to parse is the plain text format available from OddMuse wikis such as EmacsWiki [2]. -- AlexSchroeder

: Thanks for the input (gulp). Yes, the HTML would take some parsing to glean out the links; I couldn't get the raw=1 part to work for emacswiki. I'll see what I can do with the HTML version. --DaveParker

Prelim results with Meatball link data:

To get these results, I had to remove InterMapRejections, as it links to so many things.

AlexSchroeder is like all other pages that link to CategoryHomePage
ChrisPurcell is like BayleShanks, DigestedChanges, MeatballMissionDiscussion, and StephenGilbert (6 common links)
CopyLeft is like MartinHarper (8 common links)
DigestedChanges is like SunirsMessageBox?, Meatball Mission Discussion (21 matches)
FeatureKarma is like LinkingBetweenNamespaces, MeatballWikiSuggestions, OnWikisAndSecurity (6 common links)
HardSecurity is like UniversityWiki (10 common links)
HiddenPages is like DigestedChanges (11 common links)
IndexingScheme is like MeatballWikiSuggestions (11 common links)
MeatballMissionDiscussion is like OpenMeatballWiki (29 common links)
SpaghettiWiki is also like MeatballMissionDiscussion (23 common links)
SoftSecurity is like UniversityWiki (25 common links)
SunirsDiaryOne? is like SpaghettiWiki (20 links)
SunirsMessageBox? is like DigestedChanges (21 common links)
SunirShah is like MeatballMissionDiscussion (17 common links)
UseRealNamesDiscussion is like UseRealNamesRefactored (43 common links)
- TarQuin is like both of those (25 and 23 common links)
TechnologySolution is like MeatballMissionDiscussion (9 common links)
UniversityWiki is like SoftSecurity (25 common links)
WikiPedia is like EnglishWikipedia and BiggestWiki (9 common links)

You may want to elide CategoryHomePage from the discussion, although I always suspected that Chris was Bayle's SockPuppet. The bastard(s). ;) -- SunirShah

: I removed it, but it didn't seem to make much of a difference.
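
For anyone repeating the experiment, the two clean-up steps mentioned in this thread (dropping a page that links to nearly everything, and ignoring a category target) might look like the sketch below. The threshold `max_out`, the function name `drop_hubs`, and the sample data are assumptions, not values used for the results above.

```python
def drop_hubs(links, max_out=50, ignore_targets=()):
    """Return a copy of `links` without hub pages or ignored targets.

    A page whose number of forward links exceeds `max_out` is dropped
    entirely; targets named in `ignore_targets` are removed from every
    remaining page.
    """
    ignore = set(ignore_targets)
    return {
        page: targets - ignore
        for page, targets in links.items()
        if len(targets) <= max_out
    }


# Hypothetical data: one hub page and one ordinary home page.
links = {
    "InterMapRejections": {f"Page{i}" for i in range(100)},
    "SomeHomePage": {"CategoryHomePage", "SoftSecurity"},
}

cleaned = drop_hubs(links, max_out=50, ignore_targets={"CategoryHomePage"})
print(cleaned)
# {'SomeHomePage': {'SoftSecurity'}}
```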

I can put some of this online this weekend (Feb 21/22) if there is interest. --DaveParker

Interested. :) I'm curious whether "pages like this" or the inverse "pages this is like" is the more useful metric. -- anon.

Yes, you could also try using back links instead of forward links (or vice versa). -- SunirShah

: Is that what you meant? Or the slightly different "the list of pages where the target is not unlike the candidate"? Currently, the "ranking" is the number of common links. This could be changed to be a percentage of all links from the page, so that the presence of not-common links gets you a lower ranking.(?) --dp
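
One possible reading of the percentage idea, sketched under assumptions: divide the common-link count by the number of links on one of the two pages. Which page supplies the denominator is not settled in this thread, so `overlap_fraction` below picks the candidate's links arbitrarily; `jaccard` is a swapped-in symmetric alternative (common links over all distinct links on either page). Both names and the sample data are hypothetical.

```python
def overlap_fraction(this_page, candidate, links):
    """Share of the candidate's forward links that this page also has.

    Asymmetric: swapping the two arguments can change the result.
    """
    candidate_links = links.get(candidate, set())
    if not candidate_links:
        return 0.0
    return len(links.get(this_page, set()) & candidate_links) / len(candidate_links)


def jaccard(a, b, links):
    """Common links divided by all distinct links on either page (symmetric)."""
    la, lb = links.get(a, set()), links.get(b, set())
    union = la | lb
    return len(la & lb) / len(union) if union else 0.0


# Hypothetical data: a short page and a long page sharing two links.
links = {
    "Small": {"X", "Y"},
    "Big": {"X", "Y", "Z", "W"},
}

print(overlap_fraction("Small", "Big", links))  # 0.5
print(overlap_fraction("Big", "Small", links))  # 1.0
print(jaccard("Small", "Big", links))           # 0.5
```

With the raw count both directions score 2; with the fraction, the long page's extra links pull its score down in one direction only.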

No, I meant something different from Sunir. I can't easily parse your "not unlike" proposal; it sounds like it reduces to your original proposal. Here's what we've got so far:

1. {x | similar_fwdlinks(this_page, x) }
2. {x | similar_fwdlinks(x, this_page) }
3. {x | similar_bwdlinks(this_page, x) }

#1 is your original; #2 is mine (and is identical with #1 if your relation is symmetric, though it doesn't sound like it is); #3 is Sunir's. "not unlike" reduces to either #1 or #2, depending on how you read it. -- anon.

#2 sample output

The output of "pages this is like" would be something like:

For this page, the PagesThisIsLike? results:

highest ranking
- pagex (1st out of 20 pages related to pagex)
- pagey (1st out of 5 pages related to pagey)

third ranking
- pagea (3rd out of 4 pages related to pagea)
- pageb (3rd out of 100 pages related to pageb)
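
A rough sketch of how such a "pages this is like" listing could be computed: build every other page's like-list, then report where the current page falls in each. The common-link count as the score, the alphabetical tie-break, and all names and data here are assumptions for illustration.

```python
def ranked_like(page, links):
    """Pages sharing at least one forward link with `page`, best match first."""
    own = links.get(page, set())
    scored = [
        (len(own & targets), other)
        for other, targets in links.items()
        if other != page and own & targets
    ]
    # Highest count first; ties broken alphabetically (an arbitrary choice).
    scored.sort(key=lambda s: (-s[0], s[1]))
    return [other for _, other in scored]


def pages_this_is_like(page, links):
    """For each other page, the rank of `page` within that page's like-list.

    Returns (rank, list_length, other_page) tuples, best rank first.
    """
    results = []
    for other in links:
        if other == page:
            continue
        ranking = ranked_like(other, links)
        if page in ranking:
            results.append((ranking.index(page) + 1, len(ranking), other))
    return sorted(results)


# Hypothetical link data.
links = {
    "This": {"a", "b"},
    "PageX": {"a", "b", "c"},
    "PageY": {"b", "c", "d"},
    "PageZ": {"c", "d"},
}

for rank, total, other in pages_this_is_like("This", links):
    print(f"- {other} (rank {rank} out of {total} pages related to {other})")
# - PageX (rank 2 out of 3 pages related to PageX)
# - PageY (rank 3 out of 3 pages related to PageY)
```

Even though the raw common-link count is symmetric, the ranks are not, since each page's list is ordered relative to its own other matches.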