Staff are seeking exams that will reduce workload. Gareth Holsgrove looks at the pros and cons of multiple-choice questions
Multiple-choice questions (MCQs) have existed as an examination method for more than half a century. In that time, different styles have been tried, tested and, in most cases, discarded. Their attraction is that they allow examiners to test a lot of material in a short time and papers can be marked quickly and accurately using optical mark reading equipment.
There are two main types in general use in the United Kingdom: multiple true/false and one best answer. Both have a similar basic structure: an opening statement and a question stem, followed by a number of options, known as branches. A third, even better multiple-choice format, called extended matching, has recently been developed from one-best-answer MCQs.
Multiple true/false MCQs often have short question stems, sometimes two or three words, typically followed by five branches. Candidates are required to indicate whether the information in each branch is true or false. Such questions seem straightforward, but they are beset with problems. Although they are still widely used in medical education, their days are surely numbered.
The second and better format is one best answer. Properly written, such questions consist of a detailed question stem, usually two or three sentences long, and a list of short branches from which the candidate must select the single most appropriate answer.
Examiners have two main criticisms of MCQs: they are difficult to write and candidates can gain marks by guessing. The paradox is that students are expected to prepare for exams, yet their examiners may not have been taught how to set or mark them properly.
Guessing is a real problem. Many exams have set out to prevent or correct for guessing by deducting marks for incorrect answers, known as negative marking or penalty scoring. But penalty scoring introduces a new, uncontrollable variable - each candidate's confidence in venturing an answer that they are less than 100 per cent sure about. This raises the question of where penalty scoring comes into operation on the continuum that runs from absolute certainty to total ignorance.
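The trade-off between confidence and penalty scoring can be made concrete with a little arithmetic. The sketch below is an illustration only (the +1/-1 scheme and the probabilities are assumed, not taken from any particular examination): it computes the expected mark for attempting a single question given a candidate's probability of answering it correctly.

```python
# Illustrative arithmetic (assumed +1/-1 penalty-scoring scheme, not a
# real exam's): expected marks from attempting one question, given an
# assumed probability p_correct of getting it right.

def expected_mark(p_correct: float, reward: float = 1.0, penalty: float = 1.0) -> float:
    """Expected score for attempting one question under penalty scoring."""
    return p_correct * reward - (1 - p_correct) * penalty

# A blind guess on a true/false branch (p = 0.5) breaks even on average.
blind_true_false = expected_mark(0.5)

# A blind guess on a five-option one-best-answer item (p = 0.2) loses
# roughly 0.6 marks on average, so a purely rational candidate omits it.
blind_five_way = expected_mark(0.2)

# Partial knowledge (say p = 0.6) makes attempting worthwhile on average,
# yet an apprehensive candidate may still choose to omit.
partial_knowledge = expected_mark(0.6)

print(blind_true_false, blind_five_way, partial_knowledge)
```

The break-even point depends entirely on where each candidate sits on the certainty continuum, which is precisely the uncontrollable variable the article describes.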
It can be argued that if students do not trust their knowledge enough to answer, they do not functionally possess it. However, negative marking reveals another issue - some students may possess the knowledge and be able to demonstrate it in real life, but are so apprehensive about the prospect of losing marks in an exam that they adopt a cautious strategy.
A revealing survey was conducted among more than 200 candidates taking a high-stakes postgraduate medical examination that used multiple true/false MCQs, with one mark awarded for each correct answer and one deducted for each incorrect response.
The candidates were asked to indicate, on a ten-point rating scale, the point at which they would commit themselves to answering a question. The scale was anchored at one end (one) with "I would make a completely blind guess" and at the other (ten) with "I would answer only if absolutely certain". Responses covered all ten points on the scale, and both the mean (4.4) and the modal value (3.0) fell on the "guess" side of its mid-point (5.5).
Negative marking throws up two other problems. Imagine two candidates who achieve the same final score in a negatively marked exam. One candidate answers only part of the paper, losing a few marks for wrong answers. The other answers almost all of the paper, getting far more correct but also getting several wrong and having marks deducted. Which one knows more?
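The first problem can be illustrated with invented numbers (the candidates and figures below are hypothetical, under an assumed +1/-1 scheme on a 100-question paper):

```python
# Hypothetical illustration: two candidates reach the same net score on a
# 100-question paper marked +1 per correct answer, -1 per wrong answer.

def net_score(correct: int, wrong: int) -> int:
    """Net marks under a simple +1 / -1 negative-marking scheme."""
    return correct - wrong

# Cautious candidate: attempts 65 questions, omits 35.
cautious = net_score(correct=60, wrong=5)    # 60 - 5 = 55

# Bold candidate: attempts 85 questions, omits 15.
bold = net_score(correct=70, wrong=15)       # 70 - 15 = 55

# Identical final scores, yet the bold candidate answered ten more
# questions correctly - the mark alone cannot distinguish them.
print(cautious, bold)
```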
The second problem is that the use of negative marking extends the theoretical range of marks for the exam. For example, if one mark is awarded for every correct answer and one deducted for each wrong answer, the theoretical range of the exam is -100 to +100 per cent.
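The extended range is easy to verify with a quick calculation (again under the assumed +1/-1 scheme; the candidate figures are invented), and it also shows how a final mark below zero can arise:

```python
# Illustrative arithmetic: with one mark added per correct answer and one
# deducted per wrong answer, a 100-question paper spans -100 to +100 per cent.

def percentage_mark(correct: int, wrong: int, n_questions: int = 100) -> float:
    """Net mark expressed as a percentage of the number of questions."""
    return 100 * (correct - wrong) / n_questions

print(percentage_mark(100, 0))   # everything right:  +100.0
print(percentage_mark(0, 100))   # everything wrong:  -100.0
print(percentage_mark(30, 40))   # a final mark below zero: -10.0
```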
However, this is not usually taken into account when performing the statistical analysis of the exam and, therefore, the performance indicators may be wrong. It also raises the question of what the exam has been testing in the case of a candidate whose final mark is less than zero.
Instead of continuing the quest for a negative-marking system that works, MCQ exams can be improved by following the advice in Robert Wood's Assessment and Testing (Cambridge University Press, 1991) and "contain the damage by reducing omission (the real culprit in all this) to a point where confidence differences have no real distorting effect on the estimation of achievement or ability".
There is evidence that this approach works. A postgraduate royal college abandoned negative marking several years ago and the reliability of its examination improved immediately. Another medical royal college is planning to replace negatively marked true/false MCQs with one-best-answer questions without negative marking. A third, this time overseas, is well advanced with similar developments in all its medical and surgical fellowship examinations.
These developments should generate enough data to confirm what common sense has indicated for years - multiple true/false MCQs and negative marking have too many adverse characteristics. We can do better.
Gareth Holsgrove is proprietor of Cambridge Medical Education Consultants, which specialises in curriculum and examination development, quality assurance, organisational management and faculty development. See: www.cambridge-medical.bigstep.com