The (red) pen is mightier: ChatGPT's threat isn't to education

[Image: a hand grading a paper with a red pen. The paper has an F at the top.]

I spent 25 years grading community college students' papers. My weapon of choice was originally the red pen, although when COVID forced me to switch to giving feedback by email, I ended up liking that better. In this blog post, I'll argue that the (literal or metaphorical) red pen is the perfect response to chatbots in education. Like the light sabre, it's an elegant weapon from a more civilized era. All that's needed is the determination to wield it.

Testing shows that ChatGPT is incapable of logic, can't stay on topic, gets most facts wrong,1 is unable to assess the reliability of different sources of information, and fabricates 100% of its sources.2 That is, when it comes to the substance of writing, it takes the human frailties of our students and amplifies them to superhuman awfulness.

And unlike most of our students, ChatGPT will never, ever get any better. That's because it isn't what computer scientists refer to loosely as artificial general intelligence, and it isn't even an expert system or an inference engine. It's a "stochastic parrot,"3 meaning that it simply imitates the kind of talk that is most frequently found in its training data. If the most frequent kind of talk on a certain topic is a folk tale or a common misconception, then that's what it will spew back: that being transgender has something to do with sexual orientation, or that dogs are pack animals that submit to an alpha. This is also why it can't cite real sources. By design, it will ignore any single source of correct information. It's looking for the consensus, which by its nature is groupthink rather than a specific, reliable source.
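To make the "stochastic parrot" idea concrete, here is a minimal toy sketch of the underlying principle. It is my own illustration, not how ChatGPT actually works (real large language models are neural networks trained on tokens, and they sample from a probability distribution rather than always taking the single most frequent continuation), but it shows why a purely frequency-driven text generator can only echo back the consensus of its training data, true or not.

```python
# Toy "stochastic parrot": a bigram model that, given a word, emits
# whichever word most often followed it in the training text.
# It has no notion of truth, only of frequency.
from collections import Counter, defaultdict


def train(corpus: str) -> defaultdict:
    """Count, for each word, how often every other word follows it."""
    successors = defaultdict(Counter)
    words = corpus.lower().split()
    for current, following in zip(words, words[1:]):
        successors[current][following] += 1
    return successors


def parrot(successors: defaultdict, seed: str, length: int = 4) -> str:
    """Greedily emit the most frequent continuation, one word at a time."""
    out = [seed]
    for _ in range(length):
        followers = successors.get(out[-1])
        if not followers:
            break
        # Majority vote: the most common next word wins, even if it
        # encodes a folk tale or a common misconception.
        out.append(followers.most_common(1)[0][0])
    return " ".join(out)


# The misconception outnumbers the correction in the training data,
# so the parrot repeats the misconception.
corpus = ("dogs are pack animals . dogs are pack animals . "
          "dogs are loyal companions .")
print(parrot(train(corpus), "dogs"))  # -> dogs are pack animals .
```

Notice that no single reliable source in the training data can override the majority: that is the sense in which this kind of model is looking for groupthink rather than truth.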

What is truly impressive about ChatGPT is its surface-level fluency. It writes in paragraphs. Oh, God, how many times have I inserted a ¶ symbol in my students' writing? Over and over I've written the same plea in a narrow margin: "Don't write walls of text. Use paragraph breaks." Yes, ChatGPT does better at such things. But extruded jive is all it does well. If all we cared about was surface fluency, then where was the fear and outrage about Grammarly?

When I look at English professor Anna Mills' transcript4 of a ChatGPT session in which she used it to compose an essay on a social topic, what strikes me is how bad the system's raw output was, and how much work she had to do to fix it. When it wanders off topic, she has to tell it so. When its logic lapses, she has to catch it. When it fails to cite sources, she has to demand them. In every case, she is forced to use the higher-level reasoning abilities of a college professor to realize that there is a problem. The whole exercise looks like a lot of work, and the final result still isn't ready to turn in, because the fabricated sources have to be replaced with real ones. We teachers often wonder whether we spend more time correcting a student's work than the student spent writing it. Mills seems to have spent more time ameliorating ChatGPT's bullshit than it would have taken a middling student to produce a similar level of barely-passing pablum. Others who have tested the system have come to similar conclusions.5

I will admit that there is one good reason for teachers to be afraid of ChatGPT. They should be afraid if it was never their practice to engage critically with what their students were actually writing, and if their habit was to assign writing topics that could be answered with platitudinous drivel. If the student is writing about a novelist, then yes, the instructor needs to make sure the paper supports its claims by citing specific examples from the author's writing -- something that ChatGPT can't do. If the student is writing about science, then the teacher should be checking that the writing doesn't repeat commonly expressed fallacious beliefs -- something that ChatGPT, by design, will do.

The democratization of higher education has led to a state of affairs in which employers view virtue signaling as the main purpose of a sheepskin. They want good little worker bees who will submit quarterly reports on time, and they can tell who those job candidates are because they're the ones who went to college, attended class, and faithfully turned in C-minus papers that they'd come up with the night before while stoned. Most of my teaching colleagues were never on board with this, although the pressure of tenure and student evaluations may in many cases have caused them to give a C-minus to a paper that should have been a D or an F. The fix for the intrusion of ChatGPT into education is simply to keep on doing what those of us who cared had been doing all along: pay attention to the facts and logic in our students' work.

ChatGPT's threat to our society is not that it's a threat to education. Its threat is that it can be so easily used to generate ten thousand social-media antivax posts in ten minutes.

If you have comments on this piece, please print it out, mark it up with a red pen, and mail it to me. Either that, or get in touch through Mastodon.

Thanks for reading!

Postscript

I had some interesting conversations with people who read this post. Brad Wyble appropriately called me out for conflating K-8, high school, and college. My teaching experience is all at the college level. I would love to hear from K-8 and high school teachers about their thoughts and experiences on this. As a rough take, I would say that in K-8, grades don't matter, so the whole thing is a non-issue. In high school, especially around 9th grade, there may be a lot of English teachers whose classes are basically about the mechanics of writing, which is actually something that ChatGPT does well. My tentative suggestion, not based on any experience teaching this type of class, would be that they could simply do enough in-class essay exams so that students would need to demonstrate what they can actually do without a machine.

Perhaps there is a problem with college freshman classes that are actually 9th grade English in disguise. Remediation in California community colleges has long been a disaster, which is why the state is lurching toward abolishing it. To the extent that these are actually high school classes, it seems to me that the solution would be the same as in high school.

Returning to college-level issues, I want to push back hard on a Slate piece by Chris Gilliard and Pete Rorabaugh which, although generally on the right track, seems IMO to get something fundamentally very wrong by claiming that concerns about technologically aided cheating are nothing more than a "panic." This has not been my experience as a community college teacher during the decade leading up to my retirement in 2021. I taught physics, and my craft changed drastically because of the one-two punch of MasteringPhysics (the textbook publisher's homework-checking app) and Chegg (a $20/month cheating service that employs Indian grad students to write up solutions to homework, and incidentally engages in wholesale copyright violation). MasteringPhysics allowed many of my less conscientious colleagues to stop having any contact at all with their students' work. Because students' work was no longer being read by a human, using Chegg to get answers became the automatic way for a student to get an A without actually learning physics. This has real consequences. My advice would be not to go to any pharmacist who took freshman physics after 2017, because they may not be able to convert from grams to milligrams without getting it wrong half the time.

Josh Nudell wrote on Mastodon: "This has been my position, as well, from a seat in the field of history. However, I also think that there needs to be a renewed commitment to actually teaching writing across disciplines. Anecdotal, but I had a senior in a general education survey course last year tell me that I had him write as many essays in the semester (5, total, 4 of which are short) as the rest of his college career to that point combined." Right on, Josh! People just don't want to read students' writing because it's tedious work.

Ben Crowell, 2023 Feb. 1


This post is CC-BY-SA licensed.

References


  1. See my earlier blog post, Testing ChatGPT. A Washington Post reporter got similar results in a test that required writing workplace emails (Danielle Abril, "Can ChatGPT help me at the office? We put the AI chatbot to the test," Washington Post, February 2, 2023).

  2. Anna Mills has compiled 13 transcripts of ChatGPT sessions in which the human tries to coax it into helping them write an essay. https://docs.google.com/spreadsheets/d/1KbQIDPP2JIWu7JqXm7r7-zIcQ0PKzSEbDacT3Jaktog/. In the example "Transgender in Turkey," she found that of the eight references it came up with, 100% were fabricated. See discussion at https://mastodon.oeru.org/@amills/109775836022560167.

  3. Bender et al., "On the Dangers of Stochastic Parrots: Can Language Models Be Too Big?," FAccT '21: Proceedings of the 2021 ACM Conference on Fairness, Accountability, and Transparency, March 2021, pp. 610-623, https://doi.org/10.1145/3442188.3445922, open access at https://dl.acm.org/doi/10.1145/3442188.3445922.

  4. Mills, op. cit. (transcript of "Transgender in Turkey").

  5. Abril, cited above, writes: "It helped, but sometimes its errors caused more work than doing the task manually. ChatGPT served as a great starting point in most cases, providing a helpful verbiage and initial ideas. But it also produced responses with errors, factually incorrect information, excess words, plagiarism and miscommunication."