Professoring: There's No Such Thing as "Objective" Grading
Everything comes down to judgment
I’m grading this week. I’m often reminded of the sign on a colleague’s door at a former institution: “I work for free. They pay me to handle emails.” I feel similarly, except that for me, it’s the grading they pay me to handle.
Grading sucks. There’s no way around it. It’s required and it can be useful, but ultimately it’s a second-best (at best) solution to many stakeholders’ problems. For employers and other schools, it serves as a means of comparing students and inferring preparedness. For students, it serves as a means of signaling quality and receiving feedback, and even as a means of committing oneself to work when intrinsic motivation fails amid all the other temptations.
These competing purposes place burdens on grading systems that those systems simply can’t bear. Students, for example, want a mark that conveys to future audiences how well they performed in a course. But they should also want a mark that conveys to them how well they have advanced in their understanding of the material. The same grade can’t perform both tasks. We’ve all likely been in situations in which we did well in a course but learned little, and in situations in which we learned a lot but scored poorly. The incentives for students and instructors produced by a system that favors one interpretation of a grade over the other would differ wildly.
These problems are separate from the widely acknowledged if not always well understood problem of grade inflation, and from the less widely acknowledged but arguably just as profound problem that a grade must be assessed at a moment in time: a grade can’t measure performance or understanding six months down the road, much less twenty years.
Grading also presents a practical problem: how are we to assign students to different ranks of performance? There are “objective” methods (which, in practice, mostly means easily assessed tests and similar assignments) and “subjective” methods (essays and other assignments that are more open-ended). This objective/subjective division is probably the one most widely felt, and it is one that student evaluations often record (“How objective were the assessments?”, for instance).
Sometimes, difficulties of implementation are just practical problems to be overcome through grit or inventiveness or both. At other times, practical problems signal the presence of a deeper theoretical issue. Grading is a case of the latter: it is hard even if we consider grade inflation, student evaluations, and all the rest to be solvable, because it represents an attempt to impose objective standards where such standards can’t exist.
To lawmakers and novices, the multiple-choice test and similar closed-ended formats likely look more objective: you got the answer right or you got it wrong; what’s not to like? By contrast, students often appear to believe that essay marking (in particular) is subjective and reflects instructors’ tastes.
A little experience and a bit of thought dispel both impressions. The experience comes from the process of grading itself. For any given instructor, marking an essay can become almost unbelievably reliable, particularly in lower-division courses (which is where these dynamics are most tense anyhow). Once you have graded several hundred or a thousand essays, you are pretty well “burned in” on what a good paper looks like and how it fits into a range of badnesses. There’s still a margin of error (how consistently would one mark the same paper now and again later?), but that margin tends to be pretty small if one approaches the task with any care at all. Other tools, like rubrics, can reduce within-grader error still further.
It’s true that achieving reliability between graders is a little more difficult, and here the subjectivity is clearest: different graders have different standards and weight different goals differently. Even this, however, is a training problem, and it reduces to a question of time and effort. The issue, then, is not one of subjectivity but one of resources and expertise.
Multiple-choice exams (and similarly closed-ended approaches, like short-answer test questions) seem to avoid these problems. And it’s true that their error rate is much lower than in the open-ended cases: the machine will, more or less, mark correct answers correct and wrong answers wrong without error (although Enlil knows there’s always the chance that you’ve misprogrammed the right answers; there’s always another way for error to creep in).
Yet ultimately these are no less subjective, despite their cloak of objectivity. What gets tested is always a matter of subjective preference. Ask three instructors to produce sets of multiple-choice questions measuring core course objectives and you will receive four sets of possibilities. Ask them to produce the sets of course objectives themselves, the goals that should determine what the questions measure, and you will receive five sets of standards. Even the question of how difficult any given assessment should be, or what a passing score should be, is ultimately a subjective one. (Is there an objective reason why an A should require 93% of the possible points in the class? No; it is arbitrary, but so would any other cutoff be.)
George W. Bush’s classic “Rarely is the question asked: Is our children learning?” misses the point, but not because of the grammatical error. The toughest thing is not measuring progress but deciding what progress to measure in the first place. Anyone who has ever served on a curriculum planning committee knows there is no “objective” answer to this question, and any attempt to provide one (through external certification, say) ultimately just shifts the difficulty to an outside authority. That may have real advantages: local coalitions can stifle progress, and broader or higher authorities may set better goals. Ultimately, though, it is still a question of whose taste will be reified.
Recognizing that standards come from somewhere is not a basis for the rejection of standards. Rather, it supplies a basis for thinking not about “objective” or “subjective” grading but about “reliable” or “unreliable” grading processes, about processes that are “productive” of student learning or “unproductive” of it, about means of offering feedback that are “arbitrary” or “standardized”.
A multiple-choice test could be crafted to be a reliable index of what students are capable of performing—or it might not be. A final project to create a podcast might incentivize students to produce their best work and learn new skills—or it might be trash. An instructor might have the same high standards for everyone—or they might play favorites or indulge bigotries. None of these map onto the “objective” or “subjective” distinction, but each of them taps into different dimensions of what we want to accomplish with grading.