As another exam marking season comes to an end, many academics will be breathing a sigh of relief – even as they brace themselves for the ensuing wave of appeals from students disappointed by their marks.
But those who regard appeals as the resort of spoiled children should reconsider their faith in their own infallibility as markers.
In Germany, there was a minor scandal a few years ago when a student at the University of Freiburg submitted a law essay twice. His intention was merely to make sure it got to the lecturer in time, but both scripts ended up being marked – and the verdicts were very different. One got a “satisfactory” score (9 out of 18), while the other received a marginal pass (5 out of 18).
When the disparity was highlighted in the media, it caused an uproar among students: a kind of grading #MeToo movement. Under an article about the incident in Der Spiegel, for instance, one student said he had once handed in a paper that was marked so low that he was asked to rewrite it. Instead, he handed it in again entirely unchanged – and, this time, the same professor graded the work as “good”.
Discrepancies are inevitable. Certain kinds of subjects and questions lend themselves to model answers, but even in these cases an element of subjectivity remains. There is also a question of motivation. Reading large numbers of papers is about as intrinsically demotivating a task as I can imagine. And I’m not alone. A young accounting lecturer in New Zealand once told me that he had been asked to mark one question 800 times; the misery on his face is still fresh in my mind. After I described marking to another colleague in New Zealand as “pure torture”, he remarked that we should report it to Amnesty International! Another colleague suggested that he and I stay up all night in a desperate, caffeine-fuelled bid to get our grading over and done with.
In such fraught circumstances, consistency of grading is always going to be an issue. My experience is that when I am overloaded with marking, I end up sorting essays into three to five broad categories. My brain is unable to differentiate any more subtly than this.
Consistency is likely to be all the harder if marking is divided up among grading assistants, as in the Freiburg case (in which just under 400 scripts needed to be marked). A representative of the Freiburg examinations office told the magazine Fudder that grading assistants who devote the most time to each script naturally earn less for the time they put in – and that what marking assistants earn “would not make you rich”. In other words, people are tempted to cut corners – especially when they feel underpaid.
There are ways to mitigate this, of course. Before I moved away from lecturing, I used to grade several model scripts and then explain and discuss them with my grading assistants. I also checked random scripts that they had marked to ensure they were keeping to the guidelines. This proved reliable enough and I never had any problems.
Perhaps a better answer ultimately lies in computer-aided grading – especially if essays and exams are done digitally, as is likely to remain the case in the wake of the pandemic. I still have a ring binder in my cellar containing some outstanding multiple-choice economics tests that were graded on early mainframe computers. They were an accurate, methodologically sound and efficient test of undergraduate knowledge for the massive first-year groups. Setting the questions was highly demanding, but the grading was not only painless and rapid; it also produced wonderful descriptive statistics.
However, while phrase and concept recognition is now state of the art, the jury is still out on whether machines will ever be able to mark essays effectively. So, for the moment, human sweat will still be required to accurately assess students: you can’t do everything with multiple choice.
Marking variability isn’t only an issue in exams, of course. It also applies to master’s and doctoral dissertations – perhaps even more so. Assessing these longer works is irreducibly subjective and I have heard of extreme divergences. One colleague of mine was awarded a PhD for a suspiciously short and methodologically dubious dissertation; a professor commented some time later that he would not have passed it as a master’s. Meanwhile, an American colleague tried unsuccessfully to sue his university in New Zealand after it failed his doctoral thesis, which he then resubmitted to great acclaim back in the US.
In the Freiburg case, the student’s grade was raised to an 8 (just below the higher of the two grades), and the student was satisfied. The university’s examinations office told Fudder that between 5 and 10 per cent of students complain about their grades, and between 30 and 50 per cent of those achieve a change. Presumably the change is upwards – although I have heard of grades being reduced as well.
The most important point is that, given the inevitable vagaries of grading, universities and graders need to be transparent and accountable. While all those student appeals can cast a cloud over the academic summer, they are vital to ensuring that justice is ultimately done.
Brian Bloch is a journalist, academic editor and lecturer in English for academic research at the University of Münster. He has taught a wide range of economic and business-related subjects, including cross-cultural management.