Tuesday, August 2, 2016
The other day I remarked on what should have been obvious, namely, that Google Scholar rankings of law reviews by impact are nonsense: they provide prospective authors with no meaningful information about the relative impact of publishing an article in comparable law reviews. (Did you know that, for impact, it's better to publish in the Fordham Law Review than in the Duke Law Journal?) The reason is simple: the Google Scholar rankings do not adjust for volume of output. Law reviews that turn out more issues and articles each year will rank higher than otherwise comparable law reviews (with comparable actual impact) simply because they publish more.
When Google Scholar rankings of philosophy journals first came out, a journal called Synthese ranked #1. Synthese is a good journal, but the idea that the average article there had more impact than articles in any of the actual top journals in philosophy was obviously nonsense. The key fact about Synthese is that it publishes five to ten times as many articles per year as the top philosophy journals. When another philosopher adjusted the Google Scholar results for volume of publication, Synthese dropped from #1 to #24.
Alas, various law professors have dug in their heels, insisting that this nonsense Google Scholar ranking of law reviews is not, in fact, affected by volume of output. I was initially astonished, but now see that many naïve enthusiasts apparently do not understand the metrics and do not realize how sloppy Google Scholar is about what it picks up.
Let's start with the formula Google Scholar uses in its journal rankings:
The h-index of a publication is the largest number h such that at least h articles in that publication were cited at least h times each. For example, a publication with five articles cited by, respectively, 17, 9, 6, 3, and 2, has the h-index of 3.
The h-core of a publication is a set of top cited h articles from the publication. These are the articles that the h-index is based on. For example, the publication above has the h-core with three articles, those cited by 17, 9, and 6.
The h-median of a publication is the median of the citation counts in its h-core. For example, the h-median of the publication above is 9. The h-median is a measure of the distribution of citations to the articles in the h-core.
Finally, the h5-index, h5-core, and h5-median of a publication are, respectively, the h-index, h-core, and h-median of only those of its articles that were published in the last five complete calendar years.
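Google Scholar does not publish code for these metrics, but the definitions above are simple enough to sketch. The following is a minimal Python illustration, assuming the per-article citation counts are already in hand; the function names are hypothetical. How Google Scholar treats an h-core with an even number of articles is an assumption here (the two middle values are averaged, as Python's `statistics.median` does).

```python
from statistics import median

def h_index(citations):
    """Largest h such that at least h articles have at least h citations each."""
    ranked = sorted(citations, reverse=True)
    h = 0
    for i, c in enumerate(ranked, start=1):
        if c < i:          # the i-th most-cited article has fewer than i cites
            break
        h = i
    return h

def h_core(citations):
    """Citation counts of the h most-cited articles, highest first."""
    ranked = sorted(citations, reverse=True)
    return ranked[:h_index(citations)]

def h_median(citations):
    """Median citation count within the h-core (0 if the core is empty; an assumption)."""
    core = h_core(citations)
    return median(core) if core else 0

def h5_metrics(articles, last_complete_year):
    """articles: hypothetical (publication_year, citation_count) pairs.
    Keeps only articles from the last five complete calendar years."""
    window = range(last_complete_year - 4, last_complete_year + 1)
    recent = [c for year, c in articles if year in window]
    return h_index(recent), h_core(recent), h_median(recent)

# The worked example from the definitions: citations 17, 9, 6, 3, 2.
example = [17, 9, 6, 3, 2]
print(h_index(example), h_core(example), h_median(example))
```

With the example counts this reproduces the figures in the definitions: an h-index of 3, an h-core of the articles cited 17, 9, and 6 times, and an h-median of 9.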
Obviously, any journal that publishes more articles per year has more chances of publishing highly cited articles, which tends to raise both the h5-index (the size of the h5-core) and the h5-median. That problem is real and obvious enough, but it is only part of the problem. The much more serious problem is that Google Scholar picks up a lot of "noise," i.e., citations that aren't really citations. For example, Google Scholar records as a citation any reference to the contents of the law review in an index of legal periodicals, and a journal that publishes more issues will obviously appear more often in such indices. Google Scholar picks up a journal's own references to the articles it has published in a given year. It even picks up SSRN "working paper series" postings in which all the other articles by members of a faculty are also listed at the end as coming from that school. (Google Scholar gradually purges some of these fake cites, but it takes a long time.) Volume of publication inflates a journal's "impact" ranking because Google Scholar is not as discerning as some law professors think.
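The volume effect can be illustrated with a toy example. The sketch below assumes two hypothetical journals whose articles have exactly the same mix of citation counts, except that journal B publishes three times as many articles in the five-year window. The citation numbers are invented for illustration, not real law review data, and the sketch models only volume, not the spurious "noise" citations described above.

```python
from statistics import mean, median

def h_index(citations):
    """Largest h such that at least h articles have at least h citations each."""
    ranked = sorted(citations, reverse=True)
    h = 0
    for i, c in enumerate(ranked, start=1):
        if c < i:
            break
        h = i
    return h

def h_median(citations):
    """Median citation count within the h-core (the h most-cited articles)."""
    ranked = sorted(citations, reverse=True)
    core = ranked[:h_index(citations)]
    return median(core) if core else 0

# Hypothetical citation counts for one "typical" batch of 12 articles.
typical_batch = [40, 25, 18, 12, 9, 7, 5, 4, 3, 2, 1, 0]

journal_a = typical_batch        # 12 articles in the five-year window
journal_b = typical_batch * 3    # same mix of articles, three times as many

for name, cites in [("A", journal_a), ("B", journal_b)]:
    print(name, "articles:", len(cites),
          "mean cites/article:", mean(cites),
          "h5-index:", h_index(cites),
          "h5-median:", h_median(cites))
```

Run as written, journal B gets twice the h5-index (12 versus 6) and a higher h5-median (21.5 versus 15.0), even though the average article in both journals is cited exactly as often (10.5 times). The mean citations per article is used here only as one simple volume-neutral comparison; it is not necessarily the adjustment used in the Synthese example.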