Tuesday, January 24, 2023

ChatGPT goes to law school at the University of Minnesota

Oy veh:

How well can AI models write law school exams without human assistance? To find out, we used the widely publicized AI model ChatGPT to generate answers on four real exams at the University of Minnesota Law School. We then blindly graded these exams as part of our regular grading processes for each class. Over 95 multiple choice questions and 12 essay questions, ChatGPT performed on average at the level of a C+ student, achieving a low but passing grade in all four courses.

Thoughts from readers?  Submit comments only once; they may take a while to appear.

UPDATE:  See esp. Derek Muller's comments, below.




And that should pretty much be the end of take-home tests. In-class, internet-off exams seem to be the only solution.

Posted by: Anon | Jan 24, 2023 10:10:16 AM

I recently played around with ChatGPT. Exam essay answers were coherent but overly generic. I asked it to write my bio, and it gave me a clerkship with Judge Guido Calabresi and a PhD from Berkeley. How can I complain when I get great new credentials?

Over time, it will grow more accurate and produce better answers. Right now, it cleverly writes in a very general way because when it delves into details, it seems to falter on facts. But it will keep improving with further use and development, and any take-home tests or papers will be a challenge to police.

The only silver lining is that maybe one day it can write exams and grade exams, not just take exams. Then we can leave the entire exam process to AI, and professors might even be happier than the students . . .

Posted by: Daniel J Solove | Jan 24, 2023 6:08:45 PM

Should IRBs review ChatGPT experiments that use students as a control group? On the one hand, this article merely observes exam performance, and the list of human subjects is not public knowledge. On the other hand, the professors and course titles are public, so many law students would know which of their classmates were in the control group. Employers reviewing transcripts of a control group student could also identify the student as a study participant and compare the student's grade to ChatGPT's. Some law students might not consent to allow employers to compare their grades to ChatGPT's.

Posted by: Anon2 | Jan 25, 2023 8:01:35 AM

I think this says perhaps as much about contemporary grading policies in law schools as it does about ChatGPT. At Minnesota, the paper explains, there is "no requirement to award grades below a B." And, "Instructors at Minnesota Law rarely give D or F grades."

"ChatGPT received a B in Constitutional Law (36th out of 40 students), a B- in Employee Benefits (18th out of 19 students), a C- in Taxation (66th out of 67 students), and a C- in Torts (75th out of 75 students)." In a different era of legal education, it seems possible that this "student" would have failed most of these courses. (Including placing at the bottom, or second from the bottom, in three of the four courses.)

This is not to say that there are not very important questions for instructors to ask about open-book exams, etc. in light of ChatGPT. But I think it is in many ways a demonstration that it is nearly impossible to achieve a failing grade as long as one includes, as Professor Solove notes, "coherent" even if "generic" answers.

Posted by: Derek Muller | Jan 25, 2023 11:26:40 AM

Seconding Derek - the headline here seems to be “law school curve protects ChatGPT from academic exclusion.” Or maybe “law professors are right: attending class will help your grade on the final.”

Posted by: Enrique Armijo | Jan 25, 2023 2:12:30 PM

I definitely agree that the curve plays an important role in explaining why ChatGPT ended up passing all of the exams in our study. But I think that both the curve and the rarity of D or F grades at the University of Minnesota are in substantial part a function of the excellent student body we have. 99% of our students pass the bar exam, and even our students at the 25th percentile of the entering class had a 3.6 GPA and 162 LSAT score. Consistent with these facts, even the worst exams in a law school class that are produced by law students are often reasonably good on various fronts. By contrast, certainly a blank or extremely bad exam would receive a D or F.

Posted by: Daniel Schwarcz | Jan 26, 2023 7:52:36 AM

Third to Derek’s comment. I had ChatGPT write an answer to one of my exams and it was generic pap that failed to address the issues; indeed, it was typical of minimally coherent answers that spit back rote material rather than think.

As to grading, the C- or D+ I’d give that exam is the functional equivalent of failing, because consistent grades at that level require automatic dismissal without right of appeal. Even the Minnesota grades average to just above a 2.0, which would put the student just above dismissal and certainly on academic warning.

Posted by: Jeff Lipshaw | Jan 27, 2023 4:49:32 AM

To echo Dan Schwarcz and further respond to comments about Minnesota’s grading practices and curve, Ds and Fs are rare at Minnesota because, as a general rule, our students perform well enough that they should pass. A law school’s curve should reflect the students it has, and we shouldn’t fail students who perform acceptably (albeit less well than their peers) simply because an earlier era with different admissions standards and applicant pools had different expectations.

Meanwhile, a student with a C+ average would not just pass blithely through to graduation at Minnesota. The student would be on academic probation and subject to extra supervision. Our Director of Academic and Bar Success works closely with struggling students to identify why they are struggling and help them improve. We had a 99% bar exam pass rate last year in no small part because of these efforts.

ChatGPT did poorly in important ways, such that I am skeptical it could ever truly do well on a final exam by itself. But like many students (and especially first-semester 1Ls) who struggle with different types of exam questions, ChatGPT didn’t do so poorly that a failing grade was appropriate. Regardless, the salient point is that a clever student could use ChatGPT to supplement and improve their performance on an exam.

I hope the lesson people will draw from our study is not that contemporary grading practices are too lax if ChatGPT could pass. Instead, I hope that people will take from the study that we need to make adjustments to our teaching methods as well as our examination and assessment practices in a world with ChatGPT, including figuring out how to leverage ChatGPT’s strengths as a pedagogical tool while warning our students about its limitations.

Posted by: Kristin Hickman | Jan 27, 2023 8:58:15 AM

At Howard, where I am obligated to give Ds in my 1L classes, ChatGPT would have earned a passing grade but only barely. Its answer was coherent, as Prof. Solove noted, but lacked analytical depth. Here's its performance on one of the questions: https://twitter.com/Prof_Bruckner/status/1608135994984783874?s=20&t=zaEoEW9BHPoflQABvXbBDA.

I'm not concerned about students using ChatGPT3 to answer their exams at the moment.

Posted by: Matthew Bruckner | Feb 8, 2023 1:58:22 PM
