We’re investing in data quality to strengthen our rankings

The use of generative AI and a new engine drawing on data from government agencies are among the ways in which we’re doubling down on quality, says David Watkins

February 24, 2025
[Image: data validation quality control. Source: iStock / amgun]

At Times Higher Education, we have a duty to higher education worldwide to uphold rigorous standards in the creation and production of university rankings. We know that students, parents, governments, industry, staff and university leaders look to these rankings to understand the performance of institutions, areas of excellence and opportunities for improvement. 

Last year Phil Baty, THE’s chief global affairs officer, wrote an eight-point guide to ranking responsibly, with one of the points being: “Invest properly in data collection, validation and quality assurance.” We at THE take data quality very seriously, and we have invested significantly in people, processes and technology to continuously improve the quality of all our data sources. 

Over the next few weeks, I will write a regular blog to examine each of our key data sources and discuss how we are ensuring the quality of the data. First, a brief overview. 

University data 

Universities across the world kindly provide us with their institutional data on an annual basis, and every year about 15 per cent more universities submit data to us. We really appreciate this time and effort from universities, and we have a team dedicated to answering questions from institutions. In 2024, we invested in building a new data quality engine that allows us to more accurately detect potential issues in those submissions and fine-tune our queries back to universities. This engine uses data from more than 70 major government and education agencies around the world for verification, as well as deep statistical analysis to detect anomalies. 
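To make the verification idea concrete, here is a minimal sketch of the kind of cross-check such an engine might perform: comparing a submitted figure against an external reference figure and flagging large relative deviations. The metric names, the 20 per cent tolerance and the example figures are illustrative assumptions, not THE’s actual rules or thresholds.

```python
def flag_anomalies(submitted, reference, tolerance=0.2):
    """Flag metrics whose submitted value deviates from an external
    reference figure by more than the given relative tolerance.

    Illustrative sketch only: the tolerance and metric names are
    assumptions, not the actual checks described in the article.
    """
    flags = []
    for metric, value in submitted.items():
        ref = reference.get(metric)
        if ref is None or ref == 0:
            continue  # no external figure to check against
        deviation = abs(value - ref) / ref
        if deviation > tolerance:
            flags.append((metric, value, ref, round(deviation, 2)))
    return flags

# Hypothetical example: student numbers roughly match the agency
# figure, but reported research income is far above it.
submitted = {"students": 21_000, "research_income_gbp": 90_000_000}
reference = {"students": 20_400, "research_income_gbp": 55_000_000}
print(flag_anomalies(submitted, reference))
# → [('research_income_gbp', 90000000, 55000000, 0.64)]
```

A real engine would also weigh statistical context, such as an institution’s own historical submissions, before raising a query back to the university.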

Evidence data 

To compile our Impact Rankings, which measure universities’ contributions to the United Nations’ Sustainable Development Goals, we collect and analyse more than 250,000 evidence documents annually. Since the creation of those rankings in 2019, we have analysed those documents manually, but last year, to increase efficiency, scalability, accuracy and consistency, we started using generative AI to help assess those documents. Currently one-third of documents are analysed this way, and we are working towards sending far more documents through this automated process. 
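One practical question in a partial rollout like this is how to split documents between automated and manual review. A deterministic, reproducible split — sketched below by hashing the document ID — is one common approach; the article does not say how THE routes documents, so the scheme, function names and one-third share applied per document are assumptions for illustration.

```python
import hashlib

def route_document(doc_id, automated_share=1 / 3):
    """Deterministically route a document to automated (AI-assisted)
    or manual review. Hashing the ID makes the split reproducible
    across runs. Illustrative only: the routing scheme is an
    assumption, not THE's described process.
    """
    digest = hashlib.sha256(doc_id.encode()).digest()
    bucket = digest[0] / 255  # map the first hash byte to [0, 1]
    return "automated" if bucket < automated_share else "manual"

routes = [route_document(f"doc-{i}") for i in range(3000)]
print(routes.count("automated") / len(routes))  # roughly one-third
```

Because the hash is stable, the same document always takes the same route, which makes spot-checking the AI-assisted path against manual review straightforward.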

Reputation data 

Last year, more than 55,000 cited academics from around the world participated in our global Academic Reputation Survey, a fivefold increase in participation from our 2021 survey, which has improved the signal-to-noise ratio. During that period, we have restricted self-votes, ensured voting patterns are sufficiently diverse and acted on voting syndicates. We are strengthening our data quality checks this year to ensure ever more rigour. 
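Two of the checks mentioned above — removing self-votes and requiring sufficiently diverse voting patterns — can be sketched as simple filters over survey ballots. The record shape, field names and diversity threshold below are assumptions for the sketch, not THE’s actual survey rules.

```python
def filter_votes(votes, min_diversity=2):
    """Drop self-votes (a voter naming their own institution) and
    discard ballots from voters who name too few distinct
    institutions. Threshold and record shape are illustrative
    assumptions.
    """
    kept = [v for v in votes if v["choice"] != v["home"]]
    targets_by_voter = {}
    for v in kept:
        targets_by_voter.setdefault(v["voter"], set()).add(v["choice"])
    return [v for v in kept
            if len(targets_by_voter[v["voter"]]) >= min_diversity]

ballots = [
    {"voter": "a", "home": "Uni X", "choice": "Uni X"},  # self-vote: dropped
    {"voter": "a", "home": "Uni X", "choice": "Uni Y"},
    {"voter": "a", "home": "Uni X", "choice": "Uni Z"},
    {"voter": "b", "home": "Uni Y", "choice": "Uni Z"},  # names only one
    {"voter": "b", "home": "Uni Y", "choice": "Uni Z"},  # institution
]
print(filter_votes(ballots))  # only voter a's two outward votes survive
```

Detecting voting syndicates would require a further step, such as looking for clusters of voters who reciprocally name one another’s institutions.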

Citations data 

We utilise bibliometric data from our partners at Elsevier to understand research quality at universities. Two years ago, we introduced a new suite of research quality metrics to help address anomalies, and universities have particularly praised one metric: research influence. This metric significantly strengthens our rankings by determining the relevance of citations. We are looking at further bolstering the effect and use of the research influence metric in our rankings and external analyses. 

David Watkins is managing director of data at Times Higher Education.   

