Monday, September 24, 2012
Mr. Phillips writes, in reply to the criticisms noted last week:
Prof. Leiter’s colleague’s concerns are about false positives, which would inflate scores. We find this the lesser evil, and thus diverged from the Leiter method by not using his sampling technique (which we failed to make clear in our methods section, and have since corrected) because we find the technique problematic from a sampling methodology and measurement theory perspective. Leiter looks at the first and last ten citations, counts up the number of “legitimate” ones, and multiplies that percentage by the total number of cites to get his initial raw value. Thus, someone with 1000 “cites” in Westlaw’s JLR, who had 16 legitimate cites among the first and last ten, would have a raw value of 800. This has three major problems. First, Leiter is using a non-random sample to represent the underlying population. That is a statistical no-no unless there is some kind of sophisticated statistical “correction.” Second, even if the sample were randomly drawn, it is too small to make useful inferences. The hypothetical professor we listed above (the average number of cites a professor had in our study was 976), with a random sample of 20 (with 16 legitimate), and 1000 total cites, would have a 95% confidence interval of 626-974, meaning the “true” number of legitimate cites is most likely somewhere in that range—which is not very useful. Finally, the Leiter method makes it more difficult to compare scholars since some professors’ scores will be biased high and some biased low due to the non-random nature of the sampling, negating the value of the Leiter scores as a comparative metric, which is the only real value such scores have.
Our methodology just counts everything in the JLR database, biasing the scores higher than the “truth”, but treating everyone the same—equality of inflation—so that comparisons can be more easily made. Our method is also very easily reproduced, as Prof. Leiter’s colleague demonstrated. And we are not claiming our method (or any citation-based measure) is a measure of quality, but of relevance (and given that many citations are put in by student editors, citation studies are a long way from perfect). As to Prof. Strandburg, her situation is so rare—having highly cited works in an unrelated field, then completely shifting career trajectory and turning to the law—that the one or two people like her can be easily corrected when brought to our attention (as we did with her score). That is a lesser evil than completely excluding relevant work in peer-reviewed journals, in our opinion. And as for Prof. Cohen, while we have received much feedback wondering who he is and why he is included, we have also received feedback that “he should be included [because] he is well known in the IP field by those who read economics as well as law journals…[and] has done path breaking empirical research in IP for many years.” We appreciate the extensive feedback we have been receiving as we seek to refine our measure and paper.
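The confidence-interval arithmetic in Mr. Phillips's reply can be checked with a short sketch. This is a simple Wald interval for a binomial proportion, scaled to the total hit count; the figures (a 20-item sample with 16 legitimate cites out of 1000 total hits) come from the passage itself, and the slightly different endpoints quoted there (626-974) presumably reflect a different interval formula.

```python
import math

# Hypothetical figures from the passage: 1000 total Westlaw JLR hits,
# of which 16 of a 20-item sample are legitimate citations.
total_hits = 1000
sample_n = 20
sample_legit = 16

p_hat = sample_legit / sample_n                  # 0.80 estimated legitimacy rate
se = math.sqrt(p_hat * (1 - p_hat) / sample_n)   # standard error of the proportion
z = 1.96                                         # ~95% normal critical value

low = (p_hat - z * se) * total_hits
high = (p_hat + z * se) * total_hits
print(f"Point estimate of legitimate cites: {p_hat * total_hits:.0f}")
print(f"Approximate 95% CI: {low:.0f} to {high:.0f}")
```

The interval spans roughly 350 citations either way of being useful or not, which is the nub of the small-sample objection.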
I have to say this strikes me as unpersuasive. A few quick points: (1) ideally, random sampling for false positives would have been best, but in all the years of doing it non-randomly, no one has ever come forward with a single case where this method distorted the results; (2) by contrast, it is both a "statistical" and intellectual "no-no" to fail to correct for huge rates of false positives, since such rates are not evenly distributed across all names for the obvious reasons (e.g., someone with the last name "Judge"), and several cases of large false positives have now been identified; (3) in any case, it's an empirical, not statistical, question which method yields the most reliable outcomes, but I'm betting on the approach that I and now Sisk have used for quite some time; (4) using Web of Science was a good addition to the mix, but there clearly need to be sensible protocols in place to screen out citations utterly irrelevant to legal scholarship, and also more sensible protocols about who counts as a member of a law faculty (tenure-stream status in law was our criterion, which would eliminate a lot of the strange inclusions in the Phillips & Yoo lists). James Heckman and Gary Becker are now cross-appointed to the law faculty at Chicago, and they crush Cohen (and almost everyone else!) on Web of Science, but it would be bizarre to think that should be decisive in a ranking of law faculties!
Thoughts from readers about all this? Full name and valid e-mail address required.
Wednesday, September 19, 2012
The proliferation of JD/PhDs over the past generation has resulted in many junior faculty candidates facing the question: should I seek a "joint" appointment between the Law School and the cognate PhD discipline?
"Joint" appointments come in various forms, of which the two main ones are: (1) tenure-track status in two units, with two separate tenure reviews, and two separate tenure decisions; and (2) a "courtesy" or "secondary" appointment in the cognate department, with the tenure home residing in the Law School. The former is, I suppose, a fully "joint" appointment, but it is also to be avoided (perhaps even after tenure, since it is likely to increase your administrative burdens [committee work, faculty meetings etc.]). Although it's still easier, alas, to get tenure in a law school than in most academic departments, the bottom line is having two different tenure masters is a bad position to be in. (There are cases of faculty who didn't get tenure in the non-law department, but did get it in the law school, and those situations are unhappy ones all around.) On the other hand, (2) can have benefits for the faculty member (perhaps teaching in the cognate department, involvement with PhD students and the like) without any of the costs.
But a JD/PhD on the rookie law market should be careful about raising the question of courtesy appointments. Law schools understand full well that they offer better terms of employment (in teaching load, salary, and research support) than almost every academic department in the humanities and social sciences, and so a key question for them in hiring JD/PhDs is: why do you want to be in the Law School rather than in the cognate field? The answer had better turn on intellectual and pedagogical considerations. After a JD/PhD has an offer, you can raise the question of courtesy appointments (assuming they exist; not all schools have them), but if you're hired by a Law School, do understand that your primary obligations reside there.
Saturday, September 15, 2012
A colleague elsewhere writes:
The results looked odd to me, and I checked a few of their reported results, which appear to be very sloppy.
You mention on your webpage that when you generated your citation statistics, you searched the JLR database using the string “first /2 last” and then audited a subsample for false positives. I believe Yoo and Phillips failed to perform this audit. Their appendix lists Kathryn Judge as having 122 citations in her first year. If you search for “Kathryn /2 Judge” you get 124 hits in the JLR database, but only about 35 are true citations. Their results for Michelle Wilde Anderson and Michael Gilbert appear to have come from searching for “Michelle /2 Anderson” and “Michael /2 Gilbert,” which generate mostly false positives in both cases. Oskar Liivak is #3 on the list because Web of Science lists over 400 citations from physics articles he wrote before he went to law school. Even if one thinks that physics citations might be relevant for assessing the quality of a law professor, it certainly doesn’t make sense to divide his total citation count by the number of years he has been a *law* professor.
This also explains why Katherine Strandburg is #10 on the list of most cited professors. She has a total of 389 hits in JLR (not all of which are citations) and almost 1700 citations from physics publications she wrote before she went to law school. This total is once again divided by the number of years she has been a law professor.
Obviously, I don’t mean to disparage these particular professors. The fact that Yoo and Phillips inflated their citation measures doesn’t say anything about the actual quality of their work. But these errors are enough to convince me that Yoo and Phillips aren’t even measuring citations correctly, let alone quality.
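The per-year inflation the correspondent describes can be made concrete with a toy calculation. The citation counts are the ones quoted above for Prof. Strandburg; the number of years on a law faculty is a made-up placeholder, since the text does not give the actual figure.

```python
# Illustrative only: shows how pre-law-career citations inflate a
# score computed as (all citations) / (years as a law professor).
physics_cites = 1700   # citations to pre-law-school physics papers (quoted above)
jlr_hits = 389         # Westlaw JLR hits, not all true citations (quoted above)
law_years = 6          # hypothetical placeholder; the real figure is not given

inflated_per_year = (physics_cites + jlr_hits) / law_years
law_only_per_year = jlr_hits / law_years   # still an overcount (false positives)
print(f"Mixed-field score: {inflated_per_year:.1f} cites/year")
print(f"JLR-only score:    {law_only_per_year:.1f} cites/year")
```

Whatever denominator one picks, dividing citations accumulated over a physics career by years spent on a law faculty produces a number that measures neither body of work.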
We had noted earlier the risk that Web of Science cites would not necessarily pick up citations that reflect impact on legal scholarship, but these are even more extreme cases than I had imagined. The use of Web of Science also explains how economist Wesley Cohen at Duke (who isn't even a member of the core law faculty there!) fares so well in the Phillips & Yoo study, even though, I imagine, most law faculty have never heard of him. If they really didn't correct for false positives, that is also a rather serious error. Hopefully they will correct for these and other mistakes before long. I still think there are virtues to this approach, but it does need to be carried out correctly!
UPDATE: Katherine Strandburg (NYU) writes:
I've been traveling without consistent Internet access and the Phillips-Yoo citation paper just came to my attention because it was pointed out to me by a colleague. I just sent the authors an email pointing out that, based on a quick look at the paper, I believe their methodology is fishy. As I told them, "the problem is that when you count all publications, in my case that includes my physics publications. Cites to those are probably not too relevant to my relevance as a legal scholar. I don't know how many such cites there are, but those papers have been around for awhile. I'm also not sure how you figure "per year". In fact, I can't actually think of any sensible way to do it in my case. It wouldn't make sense to count only my years as a law professor, since my physics papers have been collecting citations (presumably -- I don't really know whether anyone still cites them) since long before then. But it also doesn't seem to make much sense to count all the years since my first physics publication, since there were about ten years while I was going to law school and practicing law when I didn't do any research at all. All in all, unless I am misunderstanding something, the method doesn't seem to make much sense for someone in my situation (which, admittedly, is a rather weird situation)."
I now see on your blog that someone else has made a similar critique. Just wanted to say that I agree (though it's nice to see how many cites my physics papers have received).
Friday, September 14, 2012
Gregory Sisk (St. Thomas/Minnesota) writes with a useful set of reflections on the Phillips-Yoo citation study discussed last week:
Believing as I do both that U.S. News rankings are flawed (and thus should be supplemented by multiple other ranking approaches) and that scholarly impact or quality is multi-dimensional (and thus also benefits from a diversity of approaches), I too welcome any thoughtful new attempt to evaluate the meaning of citations to legal scholarship. James Phillips and John Yoo have certainly added a thoughtful contribution to scholarly rankings. At the same time, I think Brian Leiter’s conclusion is right – the Phillips-Yoo approach is not better than the Leiter Scholarly Impact Score method, but rather is different.
However, one aspect of the Phillips-Yoo method strikes me as mistaken or at least mistakenly characterized. As a central critique of Scholarly Impact Scores, Phillips and Yoo complain that the method refined by Brian and that we at the University of St. Thomas applied this year are “bias[ed] against younger scholars.” Isn’t it odd to describe as “bias” the natural tendency of older, more experienced, and well-published scholars to draw greater attention from other scholars? If Scholarly Impact Scores were calculated over a lengthy time frame, then tired and semi-retired older scholars – what Brian has aptly described as “once-productive dinosaurs” – would gain misleadingly high scores. But limiting citations to a five-year period – as we do with Scholarly Impact citation ranking – quite properly minimizes the impact of no-longer productive scholars, because citation levels naturally fall over time without anything new being contributed. In general, the fact that an experienced and still active older scholar draws greater attention based on the larger portfolio of work available to be cited in the past five years is hardly a bad or irrelevant thing.
Along the same lines, I don’t know about the wisdom of taking the actual and objective data of current citations and then recalculating scores on the basis of longevity among a particular law faculty. Unless one is careful to explain that this longevity depreciation factor is being used to separate out and identify up-and-coming young scholars (or to rank schools that have more promising younger scholars than other schools), one could characterize this method as genuinely biased in the opposite direction, that is, against older scholars.
But the better point is not that one or the other is biased so much as that they are doing different things – prediction of the future versus description of the present. As I see it, Phillips and Yoo seek to devise a method of predicting the likely future scholarly impact of younger scholars (which is commendable and intriguing). But their introduction of a longevity depreciation factor should not be understood as an improvement on our measurement of current scholarly impact (which it is not).
Let me explain what I mean by an example. Suppose that Professor A, a recently tenured scholar, has published only 3 articles, each of which has been cited 100 times over the past five years, for a total of 300 citations. Professor B has an additional ten years of experience as a tenured faculty member, has published 12 articles, each of which has been cited 50 times over the past five years, for a total of 600 citations. If I understand them correctly, Phillips and Yoo apparently would conclude that the scholar with the greater impact is Professor A, because each article individually drew more citations and because the number of years in teaching is fewer. But, if we are measuring which scholar today has a greater scholarly impact, doesn’t the reality remain that it is Professor B? The authors of 600 articles saw Professor B’s body of work as worthy of citation, while the authors of half as many articles reached that conclusion with respect to Professor A’s work.
Now Phillips and Yoo may be on to something important in predicting that Professor A is more likely to be the more prominent scholar in the future. Their description of scholars like Professor A as more “relevant” may be shorthand for “making a prediction of the future.” Of course, it is possible that Professor A will not live up to the prediction, because he fails to remain productive, because his three articles prove to have exhausted his abilities and nothing afterward has the same scholarly luster, or because his work remains of the same high quality but he has saturated the scholarly interest in his particular scholarly message and thus he experiences diminishing returns in citations to his future articles along the same vein. In fairness to Phillips and Yoo, however, those disappointing possibilities simply reflect that, by focusing on longevity and citations per article, they are attempting to predict the future, and any prediction includes an element of uncertainty.
By contrast, the 600 citations garnered by hypothetical Professor B over the past five years are not a prediction but a present reality. Whether by exceptionally prolific writing or diligence in promoting a point of view through a series of articles or something else, she has succeeded in drawing the attention of the authors of 600 articles. To dismiss or dilute that accomplishment by constructing a depreciation formula that incorporates number of years in teaching is to ignore the reality of the current impact. Again, if our purpose is predictive, we might prognosticate that Professor B’s influence will decline or at least be surpassed in the future by Professor A. But as a description of the present scholarly impact, haven’t the authors of 600 articles in my hypothetical already reached a definite conclusion?
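Prof. Sisk's hypothetical reduces to simple arithmetic, and the sketch below tabulates both readings of it: total five-year citations (his measure of current impact) versus citations per article (a stand-in for a Phillips-Yoo-style adjustment; the actual Phillips-Yoo formula is not reproduced here).

```python
# Sisk's hypothetical: Professor A (3 articles, 100 cites each over five
# years) versus Professor B (12 articles, 50 cites each over five years).
professors = {
    "A": {"articles": 3, "cites_per_article": 100},
    "B": {"articles": 12, "cites_per_article": 50},
}

scores = {}
for name, p in professors.items():
    total = p["articles"] * p["cites_per_article"]   # Sisk: current impact
    scores[name] = {"total": total, "per_article": p["cites_per_article"]}

# B leads on total current citations; A leads per article.
for name, s in scores.items():
    print(name, s)
```

The two metrics rank the professors in opposite orders, which is exactly Sisk's point: they answer different questions (present impact versus projected trajectory), and neither is simply a corrected version of the other.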
As a final note, the Phillips-Yoo study appears mostly to provide confirmation of the Scholarly Impact Score method, as the changes in rankings among the 16 schools studied are mostly modest. Moreover, because I expect that, however defined, All-Stars and Super-Stars would make up decreasingly smaller percentages of law faculties as one moves down through the ranking, the Phillips-Yoo method is likely to have decreasing significance as one moves from the 16 schools they chose to study to include the larger sets of 96 law faculties we studied in 2012 (and the full 200 law faculties studied in