To make research more inclusive, we must rethink citation ‘relevance’

Bibliographic databases’ default ‘sort by relevance’ listings perpetuate bias towards white, Western men, says Katy Jordan

Published on April 19, 2023
Last updated April 19, 2023

Twitter: @katy_jordan

It may seem like a trivial detail, but by making a small change in our online search habits, academics could help to address some well-known problems with under-representation in education and research.

I’m referring to that setting in the corner of most bibliographic databases marked “sort by relevance”. In all likelihood, the last time you trawled a scholarly database, that was your default setting – and it probably made sense to use it. But there are good reasons to think again.

Typically, this function risks perpetuating biases in academic publishing that over-represent scholars in high-income countries. The beneficiaries tend to be researchers who are white, Western and male, while other contributions are overlooked.

Many of us are aware of this as a wider problem; after all, the evidence has been around for years. A 2013 study, for example, found that articles with a female first author tended to receive significantly fewer citations than those with a male first author. The world’s most-cited research still comes disproportionately from Europe and the US. According to one analysis, in any given “international” peer-reviewed journal, chances are that just 1 per cent of the content will come from sub-Saharan Africa.

It is easy to forget that when scholarly databases sort by “relevance”, the algorithms pre-selecting content are built on this uneven ground.

Take, for example, Google Scholar, which is by far the most popular literature search platform. Its publicly available explanation of how it ranks content tells us that it does this “the way researchers do”, taking into account factors such as the content, the journal of publication, the author and the recency of citations “in other scholarly literature”.

Put simply, the first few pages of returns will probably represent the greatest hits of established researchers in whichever field you’re searching. Based on what we know about pre-existing biases, work by women, scholars of colour, early career researchers or those from the Global South is much more likely to be buried.

To what extent are academics aware of this – and what, if anything, are they doing about it? In a recent study, a colleague and I surveyed 100 academics about how they use search platforms and the assumptions they make about them. We also analysed how “relevance” is defined by 14 of the largest academic bibliographic databases, including Academia.edu, JSTOR, PubMed, Scopus and Semantic Scholar.

Encouragingly, most researchers were wary of Google Scholar. They were frequently uncertain how it determined relevance and often described this as a sort of “algorithmic magic”. As one participant put it: “It’s a total black box”.

Most researchers, however, told us that their main strategy in response to this opacity was to use some of the other, more specialised databases that we also looked at in the study. When we asked about how these sort by relevance, algorithmic magic never came up. This is a problem because in reality their algorithms are just as opaque.

In fact, of the 14 databases we looked at, “sort by relevance” was the default setting in all but two – and seven provided no information about how this was determined. The remainder offered sketchy details. In many cases, they appeared to rely heavily, once again, on citations and reputation metrics. Well-meaning academics who look to these sources to avoid biases may, therefore, be inadvertently reproducing them.

What can be done to fix this? For a start, the database providers should be more transparent. A brief definition of relevance should be a bare minimum; it is shocking that in some cases no definition is provided at all. Going deeper, developers ought to consider a radical rethink of their ranking algorithms, given what we know about pre-existing biases in citation practices especially.

In recent times, various resources and guidelines for “positive citation practices” have been developed to help researchers ensure that the literature they cite draws on an appropriately wide range of potential sources. These are only being used on a piecemeal basis right now, but they could become standard practice. Universities could set their use as an expectation for research staff, and journals could adopt them as a submission requirement.

It’s also up to us, however, to make ourselves and others aware of the risks of citation bias. Staff development programmes, especially for new researchers, could easily incorporate information highlighting the problematic nature of ranking by relevance.

And the simplest measures we can take? Diversify our searches and tweak our search settings. Most platforms allow users to customise how they receive information, so the next time you do a literature search, switch “sort by relevance” off. You might be surprised by the results. And they will almost certainly be fairer for everyone.

Katy Jordan is senior research associate in the Faculty of Education at the University of Cambridge.
