FairnessOfKuroshinCommentRating


Moved from KuroshinRatingIssues...

Measuring "value."

Ultimately, in my opinion, truth, goodness, accuracy, interest, humor, etc., are emergent within a metric called "value". -- KarstenSelf

The goal in traditional moderating systems is to somehow measure the value of material based on its average critical value to the populace as a whole. However, this is infeasible. First, most moderation schemes, including Kuro5hin's, reduce the entire feel of a comment to a scalar. But there is more than one dimension to a comment's value, e.g. spelling, logic, readability, viability (do I agree with the opinion?), topicality, etc.

While it is true that people can loosely scalarize their "feel" for a post, that ability wears out with practice. The more you think about it, the harder it becomes. Try moderating 100 posts in an hour. -- SunirShah

In aggregate this works out -- it's the law of large numbers. Though typically on Scoop, "large" is on the order of 3-10 moderations. I've seen very heavily moderated comments with 40 or more moderations. But aggregating, now across comments, you're still talking about large-sample statistics, in which case the system works. Remember that it's a loose fit, not a perfect one. -- KarstenSelf 8 April 2001

If some people rate according to agreement/disagreement, and some rate according to "literary quality", and some others according to helpfulness or informativeness, then, if they all rate one comment, you end up with a pretty good measure of the intersection of those things. I think it's healthy to have people rating on different qualities, and ought to get more accurate ratings overall. -- RustyFoster

K5 moderation is an amalgam of many things. Both personal opinion, and personal response to differing opinion, are going to be part of it. I will moderate down posts I feel are just plain dumb, while I'll give credit to a thoughtful response to views similar to mine. Depends on context, mood, etc. -- KarstenSelf

[ed: Relatedly, the following reactions to short posts were made independently of the above and of each other... --ss]

: I think things may have been edging rather in the "quantity vs. quality" direction lately. The assumption seems to be that short == uninteresting -- RustyFoster

: One-liners. If they're really good, and I get the context, I may vote them up. Others often disagree. Too many, though, just get tedious. In a Wiki, short contributions are very likely to be refactored either beyond recognition or out of existence. This isn't terribly different from moderation in ultimate effect. -- KarstenSelf

I think a better approach would not be some average of people's opinions but the ability to select whose opinion you trust. See WebLogDigests for more. -- SunirShah

Very much so. Automating this _is_ the core idea behind CollaborativeFiltering -- the idea is to be able to identify other raters whose preferences tend to correlate with your own. A Scoop site should be a very rich database for providing this information. The existing moderation system is a rough cut, and essentially makes the naive assumption that everyone is interested in the same thing. As a first approximation, this is true, but it's ultimately only a very rough approximation. -- KarstenSelf 8 April 2001
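A minimal sketch of that core step, assuming each rater's history is a dict mapping a comment id to a numeric rating (the data layout and the function names `pearson` and `most_similar` are hypothetical, not Scoop's): compute the Pearson correlation over the comments two raters have both rated, then rank other raters by it.

```python
from math import sqrt

def pearson(a, b):
    """Pearson correlation of two raters over the comments both have rated.

    `a` and `b` map comment id -> rating (hypothetical layout). Returns None
    if they share fewer than two comments or either side has no variance.
    """
    shared = set(a) & set(b)
    if len(shared) < 2:
        return None
    xs = [a[c] for c in shared]
    ys = [b[c] for c in shared]
    mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    if vx == 0 or vy == 0:
        return None
    return cov / sqrt(vx * vy)

def most_similar(me, others, k=3):
    """Names of the k raters whose ratings correlate most strongly with mine."""
    scored = []
    for name, ratings in others.items():
        r = pearson(me, ratings)
        if r is not None:
            scored.append((r, name))
    scored.sort(reverse=True)  # highest correlation first
    return [name for _r, name in scored[:k]]

me = {"c1": 5, "c2": 1, "c3": 4}
others = {
    "alice": {"c1": 4, "c2": 2, "c3": 5},  # similar taste
    "bob": {"c1": 1, "c2": 5, "c3": 2},    # opposite taste
}
print(most_similar(me, others, k=1))  # -> ['alice']
```

Requiring at least two shared comments is the bare minimum for a correlation to be defined; in practice a much larger overlap would be needed before trusting it.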

Comment sorting and filtering.

A WebLog needs some means for identifying and cultivating useful content. For a small enough board, eyeball evaluation is sufficient -- the volume isn't so large that a user can't identify useful content herself. With size, other methods are required. If what is being measured is some likelihood or probability of interest, then a scalar value is precisely what you're looking for: any one user is going to have some ideal likelihood of interest in reading a particular post. The question being answered is: on a scale of 0 to 5, how likely am I to be interested in reading this post?

I think moderation is better than no moderation -- and if it isn't, the reader is welcome to ignore moderation, excepting hidden comments. Also, Scoop's moderation system has several key advantages over the method implemented at Slashdot. Scoop's moderation score is one of several metrics which may provide a proxy for the ideal interest value a reader may have for a particular post. It's a general measure. As currently implemented, it's a global measure -- all readers see the same score on a comment. However, both these characteristics could (theoretically) be modified. -- KarstenSelf

: See WebLogDigests.

As it stands right now, comment ratings are deeply inadequate for the task of fine-grained filtering and sorting, both on the rater side, and on the reader side. More needs to be done before this problem can be considered "solved." -- RustyFoster

The issue of "fairness" in ratings is such a sticky one because it's such a personal assessment. I don't think it's even possible to implement (in code) a procedural system to force "fairness" in order to completely blot out bias within the rating system, otherwise voting fraud, totalitarian rule, world hunger and student hazing would all be problems of the past. One obviously can't create an algorithm which satisfies every measure of "fairness" among all community members.

However, one reason that ratings lack a consensus measure of fairness is that the FAQ doesn't specify anything beyond "Please note that 1 is not 'bad.' It just 'not as good as this other comment, which I'd rate five.' All comments are assumed to contribute in some way.", which forces each user to form their own subjective opinion of what constitutes an appropriate rating scheme. Since Rusty purposely doesn't tell the readership whether ratings should reflect "agreement with poster", "contribution to site" or "proper written English", we get an amalgam of different rating styles.

He says this is good because if enough people rate a comment using their various criteria, we'll get a rational rating from the average. Unfortunately, this doesn't deal with the common situation of only a single rating being attached to a comment. In that case there isn't enough of a sample to form any opinion (or rating) whatsoever.

: With moderation you could argue there are three cases: none, one, and many. No moderations means there's no consensus yet. With one (or a low number of) moderations, the final value is still relatively ambiguous. However, as more moderations are added, the value tends to converge, and its outcome is more certain. So, with many moderations, the consensus value is pretty well settled on. In this sense, K5/Scoop moderation works quite well. --KarstenSelf
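The convergence can be illustrated with the standard error of the mean, which shrinks roughly as 1/sqrt(n). This sketch assumes ratings are independent draws around some underlying value, which real moderators are not; `summarize` is a hypothetical helper that separates the none/one/many cases above.

```python
from math import sqrt
from statistics import mean, stdev

def summarize(ratings):
    """Return (estimate, standard_error) for one comment's ratings.

    Assumes ratings are independent draws around some underlying value.
    """
    n = len(ratings)
    if n == 0:
        return None, None          # none: no consensus yet
    if n == 1:
        return ratings[0], None    # one: a value, but no way to judge its spread
    return mean(ratings), stdev(ratings) / sqrt(n)  # many: error shrinks ~ 1/sqrt(n)

few = summarize([3, 4, 5])
many = summarize([3, 4, 5] * 4)
print(few, many)
```

The same spread of opinion spread over 12 ratings instead of 3 gives a standard error less than half as large, which is the "settling" described above.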

One could reasonably argue that given Rusty's lack of direction to the K5 community, and his rationale for doing so, that it makes sense to at least hold back on showing ratings until after each comment reaches a threshold number of ratings. Presumably the nextgen rating system will deal with this by showing either who rated what, or at least how many ratings a comment has received so readers can judge this for themselves. --MaynardGelinas
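A sketch of that display rule, assuming a hypothetical `display_rating` helper and an arbitrary threshold of three ratings (the page doesn't name a number): the average is withheld until enough ratings arrive, and the count is always shown so readers can judge for themselves.

```python
def display_rating(ratings, threshold=3):
    """Hide the average until `threshold` ratings exist; always show the count.

    `threshold=3` is an arbitrary, hypothetical default.
    """
    n = len(ratings)
    if n < threshold:
        return f"unrated ({n} of {threshold} ratings needed)"
    return f"{sum(ratings) / n:.2f} ({n} ratings)"

print(display_rating([5]))        # -> unrated (1 of 3 ratings needed)
print(display_rating([5, 4, 3]))  # -> 4.00 (3 ratings)
```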

I think the only way to embed "fairness" in the code is to make WebLogDigests, where you can choose your "AffinityGroup" and give preference to their assessment of things. Making it "fair" means making it more personal. --RustyFoster

Typically, an utterly incorrect statement will generate a counterclaim, often with supporting evidence (usually as Web links). As new readers (or those who've already moderated -- you can change your vote) mull the evidence, the truth is taken into account. Ultimately, it would be nice to have recognized experts in a field, but then you'd need to categorize comments by field of appropriateness, which weighs unfavorably on the accuracy v. complexity scale. -- KarstenSelf

This breaks down, though, when you have high churn of content. To reach a fair and accurate rating, comments must be read and re-evaluated over time. This doesn't happen enough on a reasonably active Scoop-type system, which tends to diminish the accuracy and fairness of ratings. --RustyFoster

Reflection of Consensus.

In one sense, rating comments simply provides an ordering mechanism which presents the community consensus about which comments readers should read first. Imagine an ideal "casual" reader, stopping by the site for the first time. We want that reader to have a good experience reading K5, and hopefully to join and contribute themselves. So we rate comments in order to highlight the "best" of the discussion for that reader. That is not to say that the dissenting view should be suppressed, especially a good expression of it, merely that one facet of rating can, and I think should, highlight the view of the "community as a whole".

Expressing the consensus opinion clearly and coherently is not, in and of itself, a bad thing. In a lot of ways, it's a very useful thing, whether the poster really believes the view or not. It can serve the purpose of clarifying what a lot of people would repeat individually, and providing a good "hook" off which to hang arguments against the common view. -- RustyFoster

I disagree with Rusty that ratings should be a measure of personal agreement. I think the FAQ should specifically state that a comment rating should be independent of personal agreement (which is impossible to enforce, I know), and that users should strive for an impersonal assessment of a comment's contribution to a story or thread only. It should be stated upfront to users that the CommunityExpectation is to promote discussion and friendly debate, not a convergence of agreement among the K5 community.

: It's a feature of human nature: moderation can't help but reflect, among other things, personal agreement. As with other things which I describe as "feature, not bug", what I mean is that this is an emergent property of the system that you can't design or harangue around. You have to work with, not against it. This doesn't mean you have to accept bias, but you have to work with the bias to circumvent it. -- KarstenSelf

My primary goal in discussing the rating system is to prevent popular views and people from completely controlling content on the system. It is my opinion that only through deviation of opinion do we have any real content whatsoever. That is, if everyone simply parrots everyone else -- and gets rated up in the process -- then we have no discussion forum to speak of; we'll have monologues written under different account names. Then the only comments that will get rated up will be those with which the majority agrees, and isn't that exactly what folks complain about in Slashdot moderation? Don't you think it makes better sense to rate a comment based on how well it promotes healthy discussion, how well it's written, and how much of a contribution that comment makes to a particular story? Because otherwise, we've simply got a popularity contest on our hands. Do we want K5 to be a club or a means for discussion? That's the issue.

So, for the rating system to be successful we need to somehow enshrine in spirit the notion that iconoclastic views are perfectly OK, as long as they're logically formed, well written, and preferably referenced in the factual record. They may even be wrong to the majority, as long as they are polite. I've seen far too many well written posts rated down to below a two simply because they expound views which aren't accepted by a subset of community members. This is particularly the case down a thread, where usually the only members who rate a comment are those who are directly involved in the debate. -- MaynardGelinas

You should be careful about fostering consensus. Consensus is often bad. It led to the decline of IBM and Eaton's from the 70's through the 80's, as the management teams at both companies had become homogeneous. No new ideas came in, and the ones that were coming in weren't adaptive. After all, there was no peer competition to encourage new ideas; it was safer and easier just to say the same thing over and over again. Eaton's eventually died. IBM shuffled the executive and survived.

Consensus is also unnatural. Usually it has to be forced in some way. In this case, you are forcing it by prioritizing comments based on their agreement value. Thus, time-limited readers will only get through the most agreed upon views. In this case, consensus is rotten.

Indeed, to shirk off GroupThink, HealthyConflict should be fostered, not consensus. I'd also recommend taking a glance at CollectiveIntelligence for the ultimate consensus-feedback society. -- SunirShah

Consider the view that the point of a WebLog is not to achieve consensus. Wikis seek to refine data until it becomes knowledge. WebLogs seek to provide information and promote discussion. Synthesizing knowledge is left up to the reader. More to come in WikiLog. -- anon.

Exactly. Promoting consensus views, then, is lying to your reader. For example, note that we don't have consensus about consensus, so presenting "consensus is valueless" as the agreed-upon view would itself be false.

Note that on a wiki, however, trivial comments often get the same weight as important comments. Even detracting statements that are rare cases seem to be as important as the major case. I think that is a problem. It has led to the "everything is wrong" attitude on WikiWiki. -- SunirShah

Malicious Collusion

I think the worst potential weakness of the trust system right now is the "sid=trolltalk scenario". If you take a look at TrollTalk, you'll note that it is standard policy to rate all posts there "5", regardless of content. This is actually rooted in practicality--some people don't filter by "unrated first" or "newest first" normally, and they wanted a way to easily pop newer comments to the top. The effect, though, is a "web of mojo" sort of thing, as nearly all the trolltalk regulars are trusted.

In itself, this isn't a bad thing. They're nice people, and have been extremely supportive of K5, and most of them contribute to the rest of the site quite a lot as well (pb, spiralx, streetlawyer, and others). It's not this particular instance, but the fact that an attack based on this practice is quite possible, if it were undertaken by a group who wanted to disrupt the site. This is a problem. The root of it is that sid=trolltalk is not widely read, so if all you ever do is post there, it's no sweat to get trusted. PeerReview of ratings breaks down in this instance.

This is not a trivial thing to stop either, especially with diaries. Make two accounts, post ten comments in a diary on one account, rate each one to 5 with the other. There, you're trusted. The stakes aren't too high at the moment, but a determined attacker with an axe to grind could cause some hassle for people. I'm concerned about this, and not at all sure how to prevent it, or make it more difficult. Possibly requiring a comment to have been rated from, say, four different IPs before being counted toward mojo would help. At least that would make it more time-consuming to do this kind of thing, if still not impossible. -- RustyFoster
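A minimal sketch of the distinct-IP rule floated above, assuming each rating is recorded as a (rater account, rater IP, value) tuple; the record layout and the function name are hypothetical. It counts addresses, not accounts, so two accounts rating from one machine count once.

```python
def counts_toward_mojo(ratings, min_ips=4):
    """Return True if a comment's ratings come from at least `min_ips` distinct IPs.

    `ratings` is a list of (rater_account, rater_ip, value) tuples, a
    hypothetical record layout. Accounts are ignored; only addresses count.
    """
    distinct_ips = {ip for _account, ip, _value in ratings}
    return len(distinct_ips) >= min_ips

# Two accounts on one machine rating a diary comment: does not count.
sockpuppet = [("alice", "10.0.0.1", 5), ("alice2", "10.0.0.1", 5)]
# Four readers on four different addresses: counts.
organic = [("a", "10.0.0.1", 4), ("b", "10.0.0.2", 3),
           ("c", "10.0.0.3", 5), ("d", "10.0.0.4", 4)]
print(counts_toward_mojo(sockpuppet), counts_toward_mojo(organic))  # -> False True
```

The trade-off cuts both ways: several legitimate readers behind one shared address (an office, say) would also count only once.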

I really think the system is badly broken by allowing the corruption of collusion and multiple accounts, while never promoting playing by the rules. One user with one account will always be beaten by this system, so individual users with iconoclastic or unrepresented views are simply pushed away without consideration given to their contribution or involvement. Slashdot avoids this by tightening the reins around moderation through meta-moderation with the result that a different view, as long as it's reasonably well written and not offensive, will usually get through meta-moderation. K5 has no secondary mechanism to handle those who misuse ratings to push their own personal gripes, agenda, or who collude with others. That's a serious problem.

What we have here is a prisoner's dilemma paradox whereby the only way the system can work is if every user behaves responsibly, but any user can defect and gain advantage by manipulating ratings through multiple accounts and collusion with others. Given the rules of this game the only way out is not to play. However, Scoop could be encoded to show transparency to the ratings process, so that it would be impossible to rate another without everyone knowing. This won't prevent multiple account holders from manipulating the system, but it will hold those who use a single account to a consistent standard of conduct.

I think just about every regular writer has been bitten by this, so it's now reaching a tipping point where the general user community is getting annoyed by how comment ratings are being mis-used. That's why there are so many repeat submissions on kuro5hin on this issue: each one is a personal gripe by a user who mustered up enough courage to buck the system and expose its flaws. And don't think that this goes without risk. Rusty likes to say that "there is no K5 cabal," but there most certainly is; and they have a vindictive willingness to punish with glee those they do not like. In the end, will such behavior promote or discourage intelligent and creative commentary? Do we want a K5 club, or a discussion forum which represents a diverse set of views? That's the issue. -- MaynardGelinas

My response to this is that, yes, it's a weakness of the system, but the design of Scoop, moderation, and mojo should limit any attack based on malicious collusion (or "Web of Mojo" as it's been called elsewhere). Because mojo is weighted both toward recent activity and toward intensity of moderation (more heavily moderated posts count more than less heavily moderated ones), an attack based on mojo accumulated in a TrollTalk forum will be overcome both by overwhelming moderation (from the more public forums) and by the more recent status of this moderation. Mojo avoids the pitfalls SlashdotKarma illustrated. Not perfect, but tending toward the right results. -- KarstenSelf 8 April 2001
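Scoop's actual mojo formula isn't given on this page, so the following is only a sketch of the two properties described above, under assumed parameters: each comment's average rating is weighted by how many ratings it received and by an exponential recency decay with a hypothetical 30-day half-life.

```python
def mojo(comments, now, half_life_days=30.0):
    """Weighted average of a user's comment ratings (illustrative sketch only).

    `comments` is a list of (posted_day, average_rating, rating_count).
    Each comment's weight = rating_count * 0.5 ** (age / half_life_days),
    so heavily moderated and recent comments count more.
    """
    num = den = 0.0
    for posted_day, avg_rating, n_ratings in comments:
        recency = 0.5 ** ((now - posted_day) / half_life_days)
        weight = n_ratings * recency
        num += weight * avg_rating
        den += weight
    return num / den if den else None

# Ten old, lightly rated "5"s from a quiet forum, then five recent,
# heavily moderated "2"s from the public front page.
history = [(0, 5.0, 1)] * 10 + [(90, 2.0, 20)] * 5
print(round(mojo(history, now=100), 2))
```

Here the result lands just above 2: the recent, heavily moderated comments carry roughly eighty times the total weight of the old, lightly rated ones.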

People forming "tribes" or bands is almost inevitable as it's instinctual for humans to form social groups. The best way to avoid this isn't to try and break them up totally but to try and keep a balance of power between several competing groups, as this would allow individuals to not have to contend with overwhelming groupthink.

So far I can identify at least two tribes: [#kuroshin] and TrollTalk. I suspect there is a "Nicey Nice" group based around a couple of people's diary entries complaining about criticism in editorial comments. I'm pretty sure there's more. -- DanielThomas

If you LimitTemptation, then malicious collusion disappears.

Authentication

Note, K5 users cannot be authenticated -- there's no way of preventing me from creating multiple accounts, or sharing a single account with multiple people. While there are systems which deal with the issue of strong authentication (e.g.: online voting schemes), the authentication step is assumed.

There is no technical fix which is going to change these facts; they are features of the system. Collaborative filtering and positive incentive mechanisms must take them into account as givens. -- KarstenSelf

I disagree with the capitulation of accepting abuse by members. You argue that this is an authentication problem which is unsolvable, though I think that while it may not be possible to prevent users from opening multiple accounts, it should be possible with a database of all comment ratings and users to ferret out gross abuse. Whether such a system of cross references is too computationally intensive to be possible given cost constraints over membership scaling -- well, that I can't answer to. -- MaynardGelinas
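One crude cross-reference of the kind suggested above, sketched under hypothetical thresholds: flag any (rater, author) pair where the rater has given that author many top ratings and those make up most of the rater's activity, which is the two-account pattern described under Malicious Collusion. The names and cutoffs are illustrative assumptions, and collusion spread more thinly would slip under them.

```python
from collections import defaultdict

def suspicious_pairs(ratings, min_count=10, min_share=0.8, top=5):
    """Flag (rater, author) pairs that look like one account propping up another.

    `ratings` is an iterable of (rater, author, value). A pair is flagged when the
    rater gave that author at least `min_count` top ratings and those make up at
    least `min_share` of everything the rater has rated. Thresholds are hypothetical.
    """
    total = defaultdict(int)   # ratings given per rater
    tops = defaultdict(int)    # top ratings given per (rater, author)
    for rater, author, value in ratings:
        total[rater] += 1
        if value == top:
            tops[(rater, author)] += 1
    return sorted(
        pair for pair, n in tops.items()
        if n >= min_count and n / total[pair[0]] >= min_share
    )

log = ([("sock", "main", 5)] * 10
       + [("honest", "main", 5)] * 3
       + [("honest", "other", 2)] * 5)
print(suspicious_pairs(log))  # -> [('sock', 'main')]
```

This is a single pass over the ratings table, so its cost grows linearly with the number of ratings; more thorough checks (clusters of accounts rating each other) would be considerably more expensive.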

It's not capitulation, it's recognizing the features of the system. In particular, you can't be certain of the uniqueness associated with any one account. It could be many to one, with many users sharing a single account. This happened to one user whose account was being accessed by others in the office she temped at, using session cookies saved on various computers. Or a user could have access to multiple accounts. Actually, I'm guilty of this, having both 'kmself' and 'cypherpunks' at K5. And, of course, 'cypherpunks' is another shared account.

The point being that any system which assumes that there's a consistency of use of any one account, or that one account equals one user, is flawed, though this may be a useful generalization. One way to shortcut the problem would be to allow selection (either manually, through a selected group of raters, or an automated CollaborativeFiltering process, or a combination of both) of users whose ratings are considered meaningful. Note that this doesn't have to be agreement. See RustyFoster's AffinityGroups comment above. Perfectly bad taste (consistently negative correlation) is also a useful predictor; you just have to flip the sign bit. -- KarstenSelf 8 April 2001
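A sketch of "flipping the sign bit", using the common user-based collaborative filtering prediction (a mean-centered, correlation-weighted average); the function name and tuple layout are assumptions for illustration. A neighbor with correlation -1 who rates a comment above their own average pushes the prediction below yours.

```python
def predict(my_mean, neighbors):
    """Predict my rating of one comment from other raters' ratings of it.

    `neighbors` is a list of (correlation_with_me, their_rating, their_mean).
    Each neighbor's deviation from their own mean is weighted by the correlation,
    so a negatively correlated rater's deviation counts with its sign flipped.
    """
    num = sum(r * (rating - mean) for r, rating, mean in neighbors)
    den = sum(abs(r) for r, _rating, _mean in neighbors)
    if den == 0:
        return my_mean          # no informative neighbors: fall back to my average
    return my_mean + num / den

# Someone with perfectly opposite taste loved this comment (2 above their mean).
print(predict(3.0, [(-1.0, 5.0, 3.0)]))  # -> 1.0
```

The result isn't clamped to the rating scale; a real system would likely need to do that.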

Purposefully bad ratings

The biggest problem I see with ratings is that people really use them to push ideological views and personal biases at the expense of those with marginal views. Since not enough readers rate comments, a few poor ratings can destroy an honest contributor's "Trusted" status.

On the other hand, in one of my posts there I personally rated the post down using multiple accounts just to see what would happen. Turns out that over time the post was slowly rated back up to a 3.5... which really does go to show that given enough of a sample the rating system can work. -- MaynardGelinas

Only trusted users can cause real harm with bad ratings, and while it is very hard to become trusted and stay trusted, there are enough trusted users that one rogue can't do much harm. Currently about 1% of readers are trusted (~80 out of 8000). So, basically, if some comments are misrated, don't sweat it.

Our system is not like slashdot's "cumulative" system, where your record is simply added to as you go along. It adjusts to current conditions very quickly, and will tend to factor out the "unusual" ratings, the ones that are either much higher or much lower than your norm. -- RustyFoster
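The page doesn't say how unusual ratings are factored out. One simple way to illustrate the idea, assuming a rater's recent ratings are available as a list, is to drop any rating more than k standard deviations from that rater's own mean; k = 2 and the function name are arbitrary, hypothetical choices.

```python
from statistics import mean, pstdev

def trim_unusual(history, k=2.0):
    """Drop ratings more than `k` standard deviations from this rater's own mean.

    `history` is one rater's recent ratings; k = 2.0 is an arbitrary choice.
    """
    if len(history) < 2:
        return list(history)
    m, s = mean(history), pstdev(history)
    if s == 0:
        return list(history)    # every rating identical: nothing is unusual
    return [r for r in history if abs(r - m) <= k * s]

print(trim_unusual([4] * 9 + [1]))  # the lone 1 is dropped
```

Feeding in only a recent window of ratings is what would make such a filter track "current conditions" rather than a cumulative record.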

CategoryKuroshin
