The number we’d like to calculate is questions per user. Libscore gives good numbers for the denominator, and it turns out there are at least a couple of ways to get useful numbers for the numerator as well.
Number of questions on Stack Overflow
We can see how many questions are listed under the library’s tag on Stack Overflow. This gives the exact number of questions, but it undercounts the number of people who had a question: Stack Overflow discourages duplicate questions, so many people with the same question are counted only once.
Number of searches on Google
By digging into the data behind Google Trends we can get a number that is proportional to the number of searches for a particular search term on Google. The theory is that most searches involving a library indicate a developer with a question.
There is an undocumented API that gives us the raw data behind the Google Trends charts. It gives search counts normalized to a scale from 0 to 100, such that 100 is the maximum in the given dataset. As long as we’re careful about comparing everything to a common reference, and scaling the numbers appropriately where needed, we can get a set of numbers that are proportional to the real search counts.
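The rescaling step can be sketched as follows. This is a minimal illustration with made-up normalized values, using jQuery as a hypothetical common reference term included in every request:

```python
# Rescale Google Trends values onto one common scale. Each batch of terms is
# normalized to 0-100 within its own request, so we include a shared reference
# term in every batch and scale each batch until the reference matches.
# All numbers below are made up for illustration.

batch_a = {"jquery": 100, "backbone js": 4}   # jquery peaks at 100 in this batch
batch_b = {"jquery": 50, "ember js": 3}       # jquery only reaches 50 here

def rescale(batch, reference):
    """Scale a batch so the reference term has the same value in every batch."""
    factor = 100 / batch[reference]
    return {term: value * factor for term, value in batch.items()}

combined = {}
for batch in (batch_a, batch_b):
    combined.update(rescale(batch, "jquery"))

print(combined)  # all values are now proportional on a single scale
```

With the reference pinned to a common value, the remaining numbers can be compared across batches even though each request was normalized separately.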
While this fixes the problem of counting multiple people with the same question only once, it has problems of its own. One is that we’ll end up undercounting certain search terms. Ember, React, Backbone etc. are all ordinary words with meanings of their own. To disambiguate, we can add the “js” suffix to the search term. This helps, but since many people search without the suffix, it means we’re still undercounting searches for these libraries. It helps that this problem occurs for most of the libraries, so at least the bias is spread out somewhat evenly.
I gathered the relevant data for 11 JS libraries and frameworks of various types: jQuery, jQuery UI, Backbone, React, Knockout, AngularJS, Ember, Meteor, Modernizr, Underscore and Lo-Dash. Below, Underscore and Lo-Dash are counted as one, since their counts are conflated on Libscore. The data was collected in mid-December.
| Library | Libscore users | Searches (relative numbers) | Stack Overflow questions |
| --- | --- | --- | --- |
The differences here are pretty dramatic. Using the Stack Overflow measure, Angular has a 6.5x higher WTF factor than Backbone, while Meteor has a 21x higher WTF factor than Angular does. Meteor has a staggering 49143x higher WTF factor than Modernizr.
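The ratios themselves are simple divisions. A sketch with placeholder counts (not the data behind the table — the numbers are chosen only so the Angular/Backbone ratio reproduces the 6.5x mentioned above):

```python
# WTF factor = questions per user. The counts below are placeholders,
# not the actual Libscore / Stack Overflow data.
users     = {"Backbone": 40000, "AngularJS": 8000, "Meteor": 300}
questions = {"Backbone": 12000, "AngularJS": 15600, "Meteor": 9000}

wtf = {lib: questions[lib] / users[lib] for lib in users}

# Relative WTF factor between two libraries:
print(wtf["AngularJS"] / wtf["Backbone"])  # 6.5 with these placeholder numbers
```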
What is bad about this metric?
There are a few ways in which the numbers used here may be biased in one way or the other.
Problems with the Google based metric:
- Undercounting searches. The JS library names are words with meanings of their own, so we have to add the “js” suffix when counting them, which misses searches made without the suffix.
- Round-off errors. Because Google Trends only produces integers from 0 to 100, the numbers have low precision.
Problems with the Stack Overflow metric:
- Since SO discourages duplicate questions, this will undercount the number of people having a question.
Problems with both metrics:
- Undercounting sites: Sites that have been taken offline are not counted, while the searches and SO questions made during their development are counted. This produces some bias against the older libraries.
- Sites that are unpublished are not counted, but the searches made during their development are. This produces some bias against the newer libraries.
- Biased site sample: Only the top million sites are counted. This produces a bias against any library that is used mostly on smaller/less-popular sites.
What is good about this metric?
There are certain things that indicate that this approach is doing what it’s trying to do.
One is that the rankings produced by the Stack Overflow WTF and the Google WTF largely agree with each other. Since the two are based on independent data sources, neither measure can be entirely wrong unless both are wrong in the same way.
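One way to quantify that agreement is a rank correlation between the two orderings. A sketch with hypothetical ranks (1 = lowest WTF factor; these are not the article’s actual orderings):

```python
# Spearman rank correlation between two rankings, computed by hand.
# The ranks below are hypothetical, chosen only to illustrate the formula.
google_rank = {"Modernizr": 1, "jQuery": 2, "Backbone": 3, "AngularJS": 4, "Meteor": 5}
so_rank     = {"Modernizr": 1, "jQuery": 3, "Backbone": 2, "AngularJS": 4, "Meteor": 5}

n = len(google_rank)
d_squared = sum((google_rank[lib] - so_rank[lib]) ** 2 for lib in google_rank)
rho = 1 - (6 * d_squared) / (n * (n ** 2 - 1))

print(rho)  # a value close to 1 means the two metrics largely agree
```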
A low WTF factor does not mean a framework is better; it just means you will have fewer questions about it. One would expect a project’s WTF factor to correlate with its scope and ambition. One would expect frameworks to have a higher WTF factor than libraries, because they affect more parts of the development process. In this view, the fact that Meteor produces the most questions makes sense, given that it is the only full-stack framework under consideration: it involves the client side, the server side and even the database. Choosing such an all-encompassing framework comes at a cost: more questions. It is good to be able to quantify just how big that cost is.