I have noticed some problems with how Debian is using the popularity-contest data.

  • popcon units are unknown

    Using the popcon score of a package to measure its use is like using the bleeple score of a trip to measure its distance. Neither score has sensible units attached, though each may be loosely derived from a unit value. Is a trip with a bleeple score of 99 a long trip? Is a package with a popcon score of 99 a rarely used package?

    The only way to resolve this ambiguity at all is to compare ratios of values, so the problematic units cancel out. A flight from NYC to AMS with a bleeple score of 99 is about 50 times as bleepie as my drive home, which scores 2.

    So, any statement like "low popcon score" is basically so lacking in context as to be meaningless. Such statements are deprecated, and should be ignored.

  • not all popcon scores are comparable

    The above example is intentionally bad. Plane flights and car trips are not very comparable when you don't know what units (time / CO2 / distance / number of people sharing a confined space / security theater points) are being used.

    Similarly, comparing a high popcon package like gnome-terminal with a relatively low popcon package like udhcpc is very deceptive. The former is installed by default in the desktop task, but plenty of desktop users would not miss it. The latter is installed only on embedded systems, which can exist in absurd numbers, and none of which will tend to report to popcon.

    So, any attempt to compare popcon scores should include a rationale about why the two scores are comparable. For example, gnome-terminal and rxvt are somewhat comparable since they are both terminal emulators. But only the vote scores, not the inst scores, should be compared, since gnome-terminal is installed by default. dhcp3-client and udhcpc are not comparable despite being similar packages.

  • popcon scores do not measure long tail effects

    A strength of Debian is that not only commonly used, but also uncommon and niche software is packaged. Popcon does not measure the benefit of some little-used piece of software being there, packaged and ready to use when a user needs it.

    For six years I kept satutils in Debian, despite it probably having no users. It has a very specific use case, to control a motorized internet satellite dish typically installed on an RV. I did that because it was essentially no work (the package was approximately bug free, and had required no changes since 2007), and because of the possible payoff if someone needed this thing and there it was, in Debian. The value of Debian on that occasion would spike to a value that, while not directly comparable with a popcon score, would be pretty epic, for that one user, as they pushed arrow keys to move a satellite dish around.

    (It also had the best WITHOUT WARRANTY statement I've had the pleasure to write: "If you break your dish off your vehicle using this software, you get to keep both pieces.")

    Every removal of a package for "low popcon score" runs the risk of silently degrading this overall value of Debian.

  • who wants to be popular?

    Part of the problem is that popcon has been around long enough that the connotations of its name, "popularity contest" have been dulled by repetition (and abbreviation). Popularity contests are not pleasant things. They rarely reach the best result. They embody the tyranny of the majority. The name was originally, to the best of my knowledge, chosen exactly to imply all these failings, to say that hey, popularity-contest is deeply flawed, but is better than nothing for this one specific use case (ordering packages to place on CD sets). We no longer think of popcon with these caveats. That is a regression in your brain. Fix it.

    By removing packages that appear unpopular, we run the risk of Debian becoming bland and homogeneous.
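
As a rough illustration of the ratio argument in the first two points, here is a minimal Python sketch. The PopconEntry record, the vote_ratio helper, and every number in it are invented for the example; none of them come from popcon itself. It compares only vote counts, and only for two packages that are plausibly comparable (both terminal emulators).

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class PopconEntry:
    """Hypothetical record for one package; not a real popcon data structure."""
    name: str
    inst: int  # systems reporting the package as installed
    vote: int  # systems reporting the package as regularly used


def vote_ratio(a: PopconEntry, b: PopconEntry) -> float:
    """Return how many times more votes `a` has than `b`.

    Dividing one score by the other cancels the unknown unit, which is
    the only reading of a popcon score that means anything on its own.
    """
    if b.vote == 0:
        raise ValueError(f"{b.name} has no votes; the ratio is undefined")
    return a.vote / b.vote


# Invented numbers, for illustration only.
gnome_terminal = PopconEntry("gnome-terminal", inst=60000, vote=30000)
rxvt = PopconEntry("rxvt", inst=3000, vote=1500)

# inst is deliberately ignored: gnome-terminal is installed by default,
# so its inst count says little about whether anyone actually uses it.
print(f"gnome-terminal has {vote_ratio(gnome_terminal, rxvt):.1f}x the votes of rxvt")
```

Even a ratio like this says nothing about packages whose users never report to popcon at all, such as the embedded systems mentioned above.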

Brilliant comment about the long tail

Joey,

Your comment hit the nail on the head so hard that I just had to post a comment (something that I rarely do): your "long tail" argument is something that I had in my mind but that I just lacked the words to describe.

Thank you very much for this most insightful post. I hope that other people stop by and read this. I also hope that they have, at one time or another, been in the minority (e.g., like Linux users being treated like 2nd or 3rd class citizens, anyone?) so that they can appreciate the relief that it is to have something that works, even when you have nowhere to run to.

Comment by rbrito
Great

Thanks for this very nice summary. Almost everything that I do in Debian deals with packages that have popcon < 100. "low-popcon" sounds bad (we even have a dedicated QA tag for that), but at least in my case there are quite a few very happy people (including me) behind these scores that just love Debian, because of its unprecedented diversity. Debian does a great job as a mainstream distribution, but (maybe more importantly) it excels in so many fields that others haven't even thought of supporting.

Comment by Michael
popcon flaws

Just to complement with a tangential issue: popcon scores are "leaky". Major derived distributions simply divert popcon submissions to their own servers, instead of submitting to both Debian's and their own, which is easily possible [1]. This makes it difficult to adequately assess the popularity of a package, especially a niche one which could be used by 50% of a specialized derived distribution, while used by less than 0.1% of native Debian users.

[1] http://wiki.debian.org/Derivatives/Guidelines#Popularity_Contest

Comment by site-myopenid
Great post

The work that you maintainers do is simply fantastic. The thing I love most about Debian is how I can find software using apt for nearly any task I want. It's awesome, and I think the spirit of your post is what lets users have this sort of comfort: that their needs will almost always be met by some rarely-used program that is packaged because someone cared.

Comment by roshan-george
who wants to be popular?

"Part of the problem is that popcon has been around long enough that the connotations of its name, "popularity contest" have been dulled by repetition (and abbreviation)."

Interesting. Thanks for reminding us new contributors :-P

Comment by Chealer
inst / vote ratio

Normalisation is indeed an issue - against the number of reports of the popularity-contest package or just against some package that is dominating a particular field, say some IRC library that many different tools share. I had suggested this to upstream and was given instructions on where to send patches and ... I just did not get around to it.

When I read Joey's post, I instantly thought about measuring the importance that a tool may have for everyday life. Here I thought about the vote-to-inst ratio. With people like me not setting the atime this is difficult, but one gets an idea. Icedove and Iceweasel, for instance, I use daily. Some sequence alignment tool - not. Having such a ranking may shed some new light on some tools ... much along the line, not perfect though, of Joey's posting as I perceived it.

Comment by Steffen
How to choose a decent MP3 player

Imagine how I do this. (In Ubuntu, but that does not much matter.) I select all players and sort them in ascending order of popularity, then look at the top 5-10 of them. Players of low popularity are the best. Usually they are like 'Aqualung' or AlsaPlayer. Simple, understandable, working.

Comment by Ilya