Nancy Rothwell

April 30, 2004

Objective quality assessment can be a minefield. Is good old peer review still the best option or is it time to go 'metrics'?

Judging quality is an integral part of academic life. Marking student essays, reviewing funding applications or manuscripts, and appointing and promoting staff all require time and judgement. But it is difficult to ensure objectivity, accountability and reproducibility.

For undergraduate work, the process is relatively straightforward. We should more or less "know the answer" - after all, we taught them. Anonymised scripts, double-blind marking and model answers help to ensure objectivity - but such efforts seem to be taking on a life of their own, with assessment often requiring more effort than teaching. It would be interesting to see an analysis of the benefits of double or even triple marking, compared with the effort expended.

Assessing research is a much trickier business. Objective and reproducible assessment of new pieces of research, funding proposals and the quality of individuals' work presents challenges. This is difficult within fields, but the "apples and pears" comparisons across disciplines or at the interfaces (interdisciplinary research being a hot topic) can be a minefield. We normally rely on the tried-and-tested system of peer review. At its best, peer review depends on a group of respected experts taking considerable time and effort to make objective and disinterested decisions for the good of their discipline. But reviewers are human, and no one is perfect (particularly if they reject your paper or grant). Sometimes they miss the point, haven't read the paperwork in detail, have a biased view or are simply misinformed. Yet, in general, we rely on this system as the best available.

In some cases, though, the quality of the work is not the only factor. The top journals consider the general impact of a manuscript and its interest to the breadth of their readership (or even its newsworthiness), as well as the quality of its research. Funding bodies have strategic goals and are required to balance research across disciplines, while the judgements of individuals bring in a whole range of factors, from how we assess applied outputs to social factors such as family leave. Views on these issues vary.

Added to this is the fact that the peer-review system is creaking under the weight of work. How many manuscripts, grants, CVs and so on can we assess well - and still find time to double mark several hundred student essays?

The biggest assessment exercise of all in the UK is the research assessment exercise. This is a huge effort for those submitting and assessing documents, and it has a major impact on funding. It seems that the assessment processes varied considerably between RAE panels in 2001; some considered the unit as a whole, others looked at groups, while others read many or most of the individual outputs of each staff member returned. Some panels paid great attention to the data on grants, students and research associates; for others these were less relevant. Changes in the assessment process have been proposed for the next RAE. Many of these are welcomed, not least the inclusion of "continuous" assessment, which avoids the "precipice" effect on the edges of grades - but, of course, the real issue will be the final funding model.

The overwhelming majority of responses to the Higher Education Funding Council for England's consultation on the next RAE favoured the classical peer-review system. But some respondents suggested alternative review systems, not least for those units that were likely to see thousands of staff returned and where peer review becomes difficult, if not impossible.

An argument was made for assessment based on numerical factors ("metrics") such as funding, citations, impact factors of journals and numbers of PhD students. These have been shown to be effective in some disciplines, largely in the sciences, though in the humanities it is probably much more difficult to identify reliable metrics.

A lengthy academic debate could be held on the reliability and choice of such factors. There are problems with metric analysis of groupings across disciplines (even within a given panel), in the timescale of assessing impact and citations and in many aspects of applied research. It seems the use of metrics is premature, even given the huge effort in time and money spent on peer review. But we have an excellent opportunity to test the hypothesis that it could be useful in some disciplines. It does not seem too onerous a task to include in the next RAE a comparison of various metric analyses against the agreed peer-review system. This would not only answer the peer review versus metrics debate across and within subjects, but might also tell us which metrics we should pay attention to - and the resulting publication would be likely to have a significant impact.
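
To give a concrete flavour of what such a comparison might involve, here is a minimal sketch in Python. It is purely illustrative: the units, the metric (citations per member of staff) and the peer-review grades below are hypothetical and are not drawn from any RAE data or methodology. It simply computes a Spearman rank correlation between a metric-based ranking of units and the grades awarded by peer review; a high correlation for a given discipline would suggest that the metric tracks peer judgement there, a low one that it does not.

```python
# Illustrative sketch only: hypothetical units, metric values and peer-review grades.
# Computes a Spearman rank correlation between a single metric and peer-review grades.

def average_ranks(values):
    """Return 1-based ranks, averaging ranks over tied values."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1  # average of the 1-based positions i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks

def spearman(xs, ys):
    """Spearman rank correlation: Pearson correlation of the rank vectors."""
    rx, ry = average_ranks(xs), average_ranks(ys)
    n = len(rx)
    mx, my = sum(rx) / n, sum(ry) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    sx = sum((a - mx) ** 2 for a in rx) ** 0.5
    sy = sum((b - my) ** 2 for b in ry) ** 0.5
    return cov / (sx * sy)

# Hypothetical example: five units, citations per staff member vs. peer-review grade.
citations_per_staff = [4.2, 1.1, 7.8, 3.0, 5.5]
peer_review_grade = [4, 2, 5, 3, 5]
print(f"Spearman correlation: {spearman(citations_per_staff, peer_review_grade):.2f}")
```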

Nancy Rothwell is MRC research professor in the faculty of life sciences at Manchester University.
