WikiMass

MeatballWiki | RecentChanges | Random Page | Indices | Categories

Perhaps large wikis acquire a "mass" which makes it hard for subcommunities to break off. Quantitatively, this may be an instance of PreferentialAttachment.

A very simple model for WikiMass

On a large wiki, people may be more likely to stumble upon a page that provokes a comment on a given topic than they would be while browsing a smaller wiki.

Let's make a quantitative model. Let's start by assuming that every page on the web has an equal probability of being viewed by each user at any given time; i.e. when users go on the web, they pick a page to look at uniformly at random from all available pages.

Some of these pages belong to wikis. Two of these wikis are BigWiki and BaseballWiki. Let's say that each wiki has a topical distribution: the proportion of its pages that belongs to each of some set of disjoint topics. For instance, say that the available topics are "pattern languages" and "baseball". 95% of BigWiki's pages are about pattern languages and 5% are about baseball. 100% of BaseballWiki's pages are about baseball, and 0% are about pattern languages.

Let's assume that users each have some reservoir of "potential comments" in their heads, and that these potential comments are triggered, with some probability, when they read related material. So, if someone has a potential comment about baseball, maybe there is a 1% chance that they will post it whenever they see a wiki page about baseball. (This is where PreferentialAttachment comes in: the chance of a new comment being added to a wiki is now proportional to its current size. Strictly, that holds exactly only if all wikis have the same topical distribution.)
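A minimal simulation sketch of this model, assuming that only baseball pages matter for where a baseball comment lands: because the 1% trigger chance applies equally on both wikis, it cancels out, and each new comment goes to a wiki with probability proportional to that wiki's current number of baseball pages. The function name `simulate_comments` and the starting counts are hypothetical.

```python
import random


def simulate_comments(baseball_pages, steps, seed=0):
    """Hypothetical sketch: distribute `steps` new baseball comments.

    Each comment is provoked by a baseball page chosen uniformly at random
    across all wikis, so a wiki's chance of receiving it is proportional to
    its current number of baseball pages. The new comment then becomes one
    more baseball page on that wiki (preferential attachment).
    """
    rng = random.Random(seed)          # seeded for reproducible runs
    pages = dict(baseball_pages)       # copy so the caller's dict is untouched
    names = list(pages)
    for _ in range(steps):
        weights = [pages[name] for name in names]
        winner = rng.choices(names, weights=weights)[0]
        pages[winner] += 1
    return pages


# BigWiki: 1000 pages, 5% about baseball; BaseballWiki: 40 baseball pages.
start = {"BigWiki": 50, "BaseballWiki": 40}
print(simulate_comments(start, steps=1000, seed=1))
```

Individual runs vary with the seed (this is a Pólya-urn-style process), but each wiki's expected share of baseball pages stays at its starting share, so the wiki that starts with more baseball pages can expect to receive more of the new baseball comments.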

Now, under this model, if BigWiki is more than 20 times (20 = 100%/5%) the size of BaseballWiki, it will have more baseball pages than BaseballWiki, so a new baseball comment is more likely to be posted to BigWiki than to BaseballWiki.
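The threshold can be checked by counting baseball pages directly. A small sketch using exact fractions (the function name `likelier_destination` is hypothetical): BigWiki has `size * 5%` baseball pages and BaseballWiki has `size * 100%`, so BigWiki overtakes BaseballWiki once it is more than 100%/5% = 20 times as large.

```python
from fractions import Fraction

BIG_BASEBALL_SHARE = Fraction(5, 100)      # 5% of BigWiki is about baseball
NICHE_BASEBALL_SHARE = Fraction(100, 100)  # 100% of BaseballWiki is


def likelier_destination(big_size, niche_size):
    """Return the wiki more likely to receive a new baseball comment,
    assuming every page is equally likely to be read."""
    big_pages = big_size * BIG_BASEBALL_SHARE
    niche_pages = niche_size * NICHE_BASEBALL_SHARE
    if big_pages == niche_pages:
        return "tie"
    return "BigWiki" if big_pages > niche_pages else "BaseballWiki"


threshold = NICHE_BASEBALL_SHARE / BIG_BASEBALL_SHARE
print(threshold)                          # 20
print(likelier_destination(2100, 100))    # BigWiki (105 vs 100 baseball pages)
print(likelier_destination(1900, 100))    # BaseballWiki (95 vs 100)
```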

Hence, it will be hard for the baseball-loving subcommunity to break away from BigWiki. If they can't keep BaseballWiki at least 1/20th the size of BigWiki, they will be marginalized.

Thoughts about a more detailed model

At first glance, most changes you might want to make to the model seem to strengthen the WikiMass tendency. Most glaringly, not all web pages have an equal chance of being read; heavily linked pages are read far more often. BigWiki will be more linked to than BaseballWiki, hence baseball pages on BigWiki have a greater chance of being read than those on BaseballWiki. (Perhaps one could counter this by modulating "reader interest" on BigWiki according to its topical distribution, i.e. by saying that most BigWiki readers ignore the baseball pages there, but I feel that this would have a small effect that would not really counteract the PowerLaw-sized effect of BigWiki's popularity.)
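A sketch of how link popularity changes the comparison, assuming (hypothetically) that each wiki's pages carry a relative `read_weight`, for instance proportional to how often the wiki is linked to. Expected baseball-page views are then baseball pages times read weight, so the 20x size threshold shrinks by the ratio of the two weights.

```python
from fractions import Fraction


def baseball_views(size, baseball_share, read_weight=1):
    """Relative rate at which a wiki's baseball pages get read:
    (number of baseball pages) x (hypothetical per-page read weight)."""
    return size * baseball_share * read_weight


share_big, share_niche = Fraction(1, 20), Fraction(1)

# BigWiki only 10x the size of BaseballWiki: below the 20x threshold.
uniform_big = baseball_views(1000, share_big)                # 50
uniform_niche = baseball_views(100, share_niche)             # 100

# Same sizes, but BigWiki's pages are assumed to be read 3x as often.
linked_big = baseball_views(1000, share_big, read_weight=3)  # 150

print(uniform_big < uniform_niche)   # True: BaseballWiki wins without the links
print(linked_big > uniform_niche)    # True: BigWiki wins with them
# Effective size threshold = (share_niche / share_big) / weight ratio = 20 / 3
```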

In addition, authors of new comments might have a greater incentive to post them on BigWiki rather than BaseballWiki, because the comments are more likely to be read there.

Finally, if the sizes of wikis do indeed follow a PowerLaw distribution, then it would be very common for the size of large wikis to far exceed the size of niche wikis.

However, the Achilles heel of the theory may be the topical distribution. One might expect the topical distribution itself to follow a PowerLaw distribution. In that case, the probability of stumbling on a page about baseball on a wiki about pattern languages might be much lower than 1%.
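To get a feel for the numbers, here is a sketch assuming the topical distribution is a Zipf-style power law, in which the topic of rank k gets a share proportional to k^-s. The topic count (1000) and the exponents are hypothetical. With s = 1, the 100th topic gets only about 0.13% of the pages, well under 1%; a steeper exponent shrinks it further.

```python
def zipf_shares(n_topics, s=1.0):
    """Share of pages for topic ranks 1..n_topics under a power law
    with exponent s (the shares sum to 1)."""
    weights = [rank ** -s for rank in range(1, n_topics + 1)]
    total = sum(weights)
    return [w / total for w in weights]


flat = zipf_shares(1000, s=1.0)
steep = zipf_shares(1000, s=2.0)
print(f"rank 1:   {flat[0]:.2%}")         # about 13.36%
print(f"rank 100: {flat[99]:.2%}")        # about 0.13%
print(f"rank 100, s=2: {steep[99]:.4%}")  # smaller still
```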

Further research is required to see whether the topical distributions on large wikis follow a PowerLaw distribution. Further modeling is required to evaluate the effect of this on a WikiMass model. Unfortunately, I don't have time for either today.

-- BayleShanks

Well, OK, here's one more thought. Let's say BaseballWiki is the 100th biggest wiki, and baseball is the 100th most popular topic on BigWiki. If the two PowerLaw distributions have exactly the same shape and scale, then the ratio of BaseballWiki's size to BigWiki's size is the same as the ratio of the popularity of baseball to that of pattern languages on BigWiki. The fraction of all conversation on BigWiki that is about baseball is smaller still, since it is measured against every topic rather than just pattern languages; hence BaseballWiki wins. (I.e. it's a situation where BigWiki has 95 pages and BaseballWiki has 5. BigWiki then has .05*95 = 4.75 pages on baseball, so baseball comments are more likely to go to BaseballWiki.)
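The arithmetic in the parenthetical, redone with exact fractions as a sketch (variable names are illustrative): BigWiki has 95 pages, BaseballWiki has 5, and baseball's popularity relative to pattern languages on BigWiki mirrors the 5 : 95 size ratio, so baseball takes 5/(95 + 5) of BigWiki.

```python
from fractions import Fraction

big_size, niche_size = 95, 5

# Baseball : pattern languages on BigWiki mirrors 5 : 95, so baseball's
# share of all BigWiki conversation is 5 / (95 + 5) = 1/20.
baseball_share = Fraction(niche_size, big_size + niche_size)

big_baseball_pages = big_size * baseball_share   # 19/4, i.e. 4.75
print(float(big_baseball_pages))                 # 4.75
print(big_baseball_pages < niche_size)           # True: BaseballWiki wins
```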

I don't know what the effect of more realistic parameters for the distributions would be. Perhaps the distribution of topics within a wiki is steeper than the distribution of wiki sizes, in which case it becomes even easier for the niche wiki to win.

However, making the chance of a page being read depend on the popularity of the site it's on may make niche sites less tenable.

An interesting possibility is that there may be some critical rank such that topics ranked above it on the biggest wikis migrate to the biggest wikis, whereas topics ranked below it split off. I.e. maybe the top 100 topics on BigWiki can't break away but the others can: a topical "event horizon" for a large wiki. In real-world terms, maybe anything having to do with computers would stay on WardsWiki, for example, but baseball would be able to break away.

One could model this as a quasi-gravitational relationship in "similarity space", in which the effect of a wiki's "mass" drops off sharply for distant (dissimilar) topics.
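The "event horizon" idea can be sketched under hypothetical assumptions: BigWiki has 10000 pages spread over 1000 topics by a Zipf power law (s = 1), and each topic's would-be niche wiki has a fixed 50 pages. BigWiki then holds the larger pool of pages for the top 26 topics, and the niche wiki wins below that rank. This sketch ignores any falloff in similarity space and does not give niche wikis power-law sizes.

```python
def zipf_shares(n_topics, s=1.0):
    """Share of pages for topic ranks 1..n_topics under a power law."""
    weights = [rank ** -s for rank in range(1, n_topics + 1)]
    total = sum(weights)
    return [w / total for w in weights]


def event_horizon(big_size, shares, niche_size):
    """Largest topic rank at which the big wiki has more pages on the topic
    than a niche wiki of `niche_size` pages (0 if it wins nowhere).
    Assumes shares are sorted from most to least popular topic."""
    horizon = 0
    for rank, share in enumerate(shares, start=1):
        if big_size * share > niche_size:
            horizon = rank
        else:
            break  # shares only decrease, so no later topic can win
    return horizon


shares = zipf_shares(1000, s=1.0)
print(event_horizon(10_000, shares, niche_size=50))  # 26
```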

Also, note that the ranking of topics on the biggest wiki need not correspond to the popularity of topics in the population at large. I doubt that the most popular topics on WardsWiki (pattern languages and wikis, for example) are really the most popular topics on the internet. This helps the niche wikis further. Perhaps another model would predict that the topical distribution on a large wiki converges to the general distribution for the internet, though; this seems to be happening at WardsWiki.

I'd be interested in following up on this, but I don't have the free time to collect the necessary data to support these models. Oh well.

-- BayleShanks


Other "topic" wikis have died off from the gravitational pull back to WikiWiki. However, you can use the weight of WikiWiki to push people back here too by linking here from there. In theory. -- SunirShah

I suspect that having two wikis very close together (so the namespace is effectively communal, but the RecentChanges pages are separate) might help. If all the links still work verbatim, moving (rampantly) off-topic material will become easier. The two sites may never separate their namespaces, with links like spaghetti joining the two sites, but those who love baseball will no longer annoy those who don't. Again, in theory. This would probably work best if keeping off-topic material off-site were maintained as a principle from early on, as at MeatBall, since the active community would then help those doing the move rather than blindly perpetuating the problems. -- ChrisPurcell


I absolutely don't understand the reasoning on this page, which assumes that page count is the only thing that counts. I would argue quite differently, based on user count:

Let's assume that Bigwiki has 10000 pages, 5% = 500 of them about baseball. Baseballwiki has less than 1/20 of that, let's say 400 pages.

Now I estimate the number of active users of a system as "users = 1 + sqrt(npages - 100)". You can use a different formula if you like. This means that Bigwiki has about 100 active users (5% = 5 interested in baseball), while Baseballwiki has about 18 users, all of them (100%) interested in baseball.

Typically the baseball fans will know about Baseballwiki, because it is mentioned on about 1/10 of Bigwiki's baseball pages (50 links). If they want to talk about baseball they are much better off in Baseballwiki, because they reach a larger group of baseball fans (18 instead of 5) while annoying no one (0 instead of 95), and Baseballwiki will grow into the center of the baseball wiki world.
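HelmutLeitner's estimate, worked through as a sketch. The formula is his rough guess (only meaningful at 100 pages or more), and the function name `active_users` is illustrative.

```python
import math


def active_users(npages):
    """Rough estimate from the comment above: users = 1 + sqrt(npages - 100)."""
    if npages < 100:
        raise ValueError("estimate only defined for npages >= 100")
    return 1 + math.sqrt(npages - 100)


big_users = active_users(10_000)        # about 100.5
niche_users = active_users(400)         # about 18.3

big_fans = 0.05 * big_users             # baseball fans on Bigwiki
big_annoyed = big_users - big_fans      # everyone else on Bigwiki
links_to_niche = 500 // 10              # 1/10 of Bigwiki's 500 baseball pages

print(round(big_users), round(niche_users))  # 100 18
print(round(big_fans), round(big_annoyed))   # 5 95
print(links_to_niche)                        # 50
```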

-- HelmutLeitner

I agree that that's where baseball fans want to be. But they just might not hear about Baseballwiki, whereas they might stumble onto the baseball pages on Bigwiki.

-- BayleShanks

That's right. My alternative is speculation too. I think we just don't have enough empirical knowledge to predict how systems or users will behave. Perhaps some day social scientists will study this and tell us.

But my feeling is that Bigwiki will put a lot of limits on baseball content: they surely wouldn't like discussions about yesterday's games, or all of the 1500(?) rules the game has, and all the jargon (or pages for players). At some point the community will tell its members to restrict baseball content, and then the more specialized wiki will have the advantage.

-- HelmutLeitner

Yeah, I know, I'd love to do a study and collect the data but of course I'm not in a position to choose to do that.

As for limits, yes, if Bigwiki is committed to keeping a certain degree of topical focus. I think WardsWiki started out limiting itself to discussing Wiki:PeopleProjectsAndPatterns, but the topics expanded. For instance, despite lots of posts saying that people would prefer "wiki on wiki" discussion to go elsewhere, it seems that they discuss wikis a lot there too.

-- BayleShanks

Once upon a time I knew WardsWiki pretty well, but now I only visit it about once a week. I haven't seen any substantial wiki discussions there recently.

-- HelmutLeitner

I never knew it well. But there are 330 pages in Wiki:CategoryWiki, 151 pages in Wiki:CategoryWikiImplementation, 55 pages in Wiki:CategoryWikiForum, and 36 pages in Wiki:CategoryWikiEngineReview. So the information there about wikis and wiki implementations is much more complete than the information here at MeatballWiki, even though the MeatballMission is closer to this topic than Wiki:PeopleProjectsAndPatterns.

-- BayleShanks

And what are you comparing? Wiki:CategoryWikiForum (55 pages) is roughly equivalent to CategoryOnlineCommunity (132 pages). Most of the 36 pages in Wiki:CategoryWikiEngineReview were created by a single person, who neither completed the work nor found the things that count (it's useless). The 500 pages about wikis in total on WardsWiki compare to about 2000 pages here. Why do you think that WardsWiki is more complete?

-- HelmutLeitner


CategoryInterCommunity
