Pandemic response shines spotlight on coding in science

Debate about computer programs underlying epidemiological modelling leads to wider calls for more openness

June 16, 2020
Visitors pass a giant video screen with moving letters symbolising security codes at the computer fair Cebit in Hanover
Source: Reuters

The Covid-19 pandemic has brought many scientific issues to wide public attention, but even in these extraordinary times, the way computer coding is used in research is not a topic many would have predicted for mainstream discourse.

Nonetheless, the subject has burst into the open, mainly because of scrutiny of the code used in epidemiological modelling, in particular the code behind the highly influential Imperial College London paper, led by Neil Ferguson, published just as the UK started going into lockdown.

The code underlying the modelling came in for criticism after it was posted to the public repository for programming, GitHub, although, according to a report in Nature this month, scientists who have tested the code have found that its results can be reproduced.

Bill Mitchell, director of policy at the British Computer Society (BCS), said that although it agreed that there was “no credible evidence” of major problems with the Imperial code, the episode had shone a light on the issue of how programming was performed and reviewed in academia.

The BCS released a position paper last month in which it said “the quality of the software implementations of scientific models appear to rely too much on the individual coding practices of the scientists” and called for professional software engineering standards to be used where scientific code formed the basis of policy.

Dr Mitchell, a former computing lecturer at the universities of Manchester and Surrey, said there were “lots of very, very standard things that you would expect in the software world” that were not always being done in science.

These included code being readily shared on public repositories such as GitHub; being written in such a way that it can be easily understood and tested by others; and tests being published so reviewers can easily try to replicate the results.

“It goes to the heart of doing science. You tell people what experiments you’ve done; you allow them to look at your working,” he said.

Dr Mitchell said his “very personal” view was that scientists might sometimes view coding as just a “mechanical way of generating data” and might not fully appreciate “just how much innovation and ingenuity and cleverness is embedded in their own code and how valuable that is to other people”.

Changing this culture, especially given the “intense” publish-or-perish pressures in academia, might require incentives similar to those seen in the open access movement, he said.

The “simplest thing” would be to say that all scientific software developed with public money must be made openly available. “I think suddenly when people realise that, ‘Oh my gosh, people are going to be looking at my code’, the standard will instantly improve,” Dr Mitchell said.

Others said the direction of travel was towards more openness, but that there was a debate to be had about how to speed up progress.

“In my field, there has been a movement towards transparency for quite a number of years, and it is becoming more and more common for journals, reviewers and the community to require code to be made available with papers,” said Rosalind Eggo, assistant professor in infectious disease modelling at the London School of Hygiene and Tropical Medicine.

She added that one longer-term solution would be to invest more in employing research software engineers “who are experts in writing and translating scientific code and making it more efficient, shareable and, ultimately, more useful”.

“Making sure we have the resources that allow the hiring and long-term funding of software specialists would improve the quality of scientific code and hopefully make it easier to build efficient analysis, and to reuse and repurpose code,” she said.

Konrad Hinsen, a biophysicist at France’s National Centre for Scientific Research (CNRS) and an expert in scientific computing who often blogs on the issue, suggested that employing more research software engineers was a good idea.

However, he added, using them to help write code might be difficult for “small, exploratory projects that are done in informal collaborations”.

“You can’t just add a software expert with a very different working style to such a team. But you can still do after-the-fact code review before accepting results for publication,” he said.

This is where research software engineers could have a key role more generally, including through the traditional publishing process, he said, pointing out that some “pioneering journals” were already including code review as an “integral part” of the peer review process.

More broadly, Dr Hinsen added, the issue was one of “training enough people, and then employing them in appropriate jobs”. However, he was somewhat sceptical about whether progress could be sped up across all disciplines in science.

“Much scientific code is long-lived, and habits are even more subject to inertia. Faster improvement is not possible for scientific code in general, though it is in specific, well-defined subjects where motivation is high. Epidemiology might be in that situation right now,” he said.

simon.baker@timeshighereducation.com

POSTSCRIPT:

Print headline: Pandemic models spark calls to reveal more code

Reader's comments (3)

Somewhat surprised to see no mention of Software Carpentry https://software-carpentry.org in this article.
The article glosses over the fact that the attack on Prof Ferguson's coding style was highly suspicious in its motivations, as illustrated by a 'well-known' blog, which I will not dignify with a reference, whose conclusion was to 'defund all epidemiology research'. It seems that some commentators have just discovered that past 'scientific' software was mostly about coding numerical recipes and that its algorithmic content may have been much less sophisticated than, say, a GNU Prolog compiler. Disclaimer: obviously my username gives away which style of programming I adhere to, although this bears no relevance to this comment.
Worth mentioning that these concerns are known and have given rise to, among others, the Software Sustainability Institute and the Society of Research Software Engineering.
