Every research-active academic is familiar with the process of peer review. Certainly, there are differences between disciplines, and debates over double-blind, single-blind and open (in all its different forms) continue to rage. But, fundamentally, most academics with whom I speak hold up peer review as the “gold standard” to which we should subject work.
Yet there is much about peer review of which most researchers remain unaware.
Consider, for instance, a simple logical point. In my discipline, we usually have two double-blind reviewers; if they disagree, we commission a third to arbitrate. Why, though, should that third reviewer be any more or less reliable than the other two experts? How is it that, when two experts disagree, we resolve the dissensus by asking a third? If the two preceding experts have violently disagreed, surely we might as well flip a coin?
Indeed, the predictive power of peer review is frequently overrated. Several studies, for instance, have documented cases of Nobel-prizewinning work being rejected. In a famous experiment, Peters and Ceci resubmitted, in disguised form, papers that had previously been accepted at journals. Only 8 per cent were detected as resubmissions, yet 90 per cent of the submissions were ultimately rejected.
One might also consider the fact that more than half of rejected papers go on to be published elsewhere anyway: a huge redundancy of labour in re-reviewing work in order to maintain a hierarchy of journal exclusivity. This is why our research team has previously been so concerned about the discourse of “excellence”. It turns out that not only are we poor at defining excellence, we are also poor at spotting it in advance.
Why are we so unaware of how well peer review works? Well, for one thing, it’s usually quite difficult to study, despite the fact that the programme at the Peer Review Congress appears as healthy as ever. Layers of anonymity combine with corporate interest and personal copyright to make it very difficult to obtain datasets of reader reports on which one can work. Furthermore, to question peer review as a researcher is in some ways to put one’s reputation on the line: “is s/he only attacking peer review because his/her work isn’t good enough?” is the type of question that others might ask.
This is why a new research project, of which I am principal investigator and which is funded by a grant from the Andrew W. Mellon Foundation, will be working with Plos One to investigate its review process. Plos has always had a clause that allows its dataset of reader reports to be used for research purposes, and Veronique Kiermer, executive editor for Plos Journals, will be on the team.
Under conditions of strict confidentiality and report anonymity, our project seeks to describe the anatomies/structures of peer-review reports at Plos One: what do these documents look like when read at scale? We will also be examining aspects of sentiment and stylometric measurement.
For instance, we’d like to know how well reviewer sentiment measures can act as a proxy for overall acceptance. Furthermore, which stylometric indicators, if any, correlate with acceptance, rejection or high-impact articles? Can we train an artificial neural network to recognise which parts of a paper are being described by a reviewer and to attach a sentiment score to this? The latter work could certainly go on to have useful impact for the publishing industry.
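To make the idea of sentiment as a proxy concrete, here is a minimal sketch of one naive approach: lexicon-based scoring of the sentences in a reader report. The word lists and the scoring function are invented purely for illustration; they do not describe the project's actual methods or data.

```python
# Illustrative sketch only: a toy lexicon-based sentiment score for
# sentences in a peer-review report. The word lists below are invented
# for this example and are not the project's real lexicon.

POSITIVE = {"clear", "rigorous", "novel", "sound", "convincing"}
NEGATIVE = {"unclear", "flawed", "weak", "unsound", "unconvincing"}

def sentiment_score(sentence: str) -> float:
    """Return a crude score in [-1, 1]: (pos - neg) / words matched."""
    words = [w.strip(".,;:!?").lower() for w in sentence.split()]
    pos = sum(w in POSITIVE for w in words)
    neg = sum(w in NEGATIVE for w in words)
    matched = pos + neg
    return 0.0 if matched == 0 else (pos - neg) / matched

# A hypothetical two-sentence report, scored sentence by sentence;
# averaging the scores gives one report-level number that could then
# be compared against the editorial decision.
report = [
    "The methodology is rigorous and the argument convincing.",
    "However, the statistical analysis is weak and unclear.",
]
print([sentiment_score(s) for s in report])  # prints [1.0, -1.0]
```

In practice, of course, any serious attempt would use far more robust natural-language-processing tools than a hand-built word list; the point of the sketch is only to show how a per-sentence score might aggregate into a proxy that can be tested against acceptance outcomes.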
Yet, in some ways, the questions we can ask here are niche and specific. We do not have a comparison dataset, so we will be working solely on Plos One’s reviews. This comes with some challenges. Plos One’s peer-review criterion of “technical soundness” is certainly different from that of other venues. Yet it also remains the only space in which we are currently able to conduct this work, although an extension to examine the Wellcome Trust’s Wellcome Open Research open reviews would be a promising area for future exploration.
It also means that, since the criteria are so different, we will be able to ask questions about how well reviewers adapt to this new set-up. Is it the case that reviewers disregard novelty, or do they, in fact, revert to what they know and comment on novelty/significance in their reports?
We hope, overall, that the project will both bring some much-needed scrutiny to peer review and give us an initial insight into the types of questions that we can ask of such datasets. If we can show enough value from the project’s outputs, we hope that this might encourage other organisations to ensure that their own reports can be used for future research in a safe way. Finally, all outputs from the project will be open access, ensuring the broadest dissemination and reach. Provided, of course, the work passes the rigorous standards…of peer review.
Martin Paul Eve is chair of literature, technology and publishing at Birkbeck, University of London.