We need a social science of data

As data becomes ever more central to policy and commerce, its creation demands closer scrutiny, say Cristina Alaimo and Jannis Kallinikos

June 11, 2024

The developments that have made the internet a widespread channel of transaction and communication have also made data a pervasive component of personal life and a ubiquitous medium that organisations use to structure and conduct their operations. The techniques for handling data, originally developed to manage administrative and analytical tasks, have since consolidated into a new body of knowledge known as data science.

As a scientific field, data science represents a mix of statistical methods and computer programming. It is increasingly called upon to predict and manage such diverse things as interaction patterns on social media, city traffic flow, crime detection rates, insurance risks, health care demand and consumer behaviour. And over the past two decades, many higher education institutions have introduced data science programmes that enrol increasing numbers of students.

Impressive as data science is, it treats data as technical elements that can unproblematically be piled up and computed. This overlooks the interests, specific purposes, attitudes and presuppositions that drive data generation and use. While data may appear to be unquestionable carriers of facts, they are nonetheless human inventions and inevitably encode particular interests, purposes, perspectives on the world and unspoken biases. Data-making always involves arbitrary decisions on what to record and why.

For instance, the listening habits of individuals on streaming platforms are rendered into data only through a series of assumptions and rules about what qualifies as a listening or viewing event. Must a track be listened to all the way through, or is just a part of it enough? Cultural conventions and categories (such as artist names or genres) also inform how listening events are classified and related to one another: these are sociocultural processes, not facts.

The measurement controversies that surrounded the Covid pandemic are another good example. The gathering and interpretation of data was crucial to determining who was infected and how quickly, how infection propagated, who died from Covid or other complications, and the efficiency of vaccination. But none of it was uncontroversial. It was a vivid reminder of the ambiguities, predilections or biases that infect data-making, collection and use even in areas where impartial expertise is expected to reign.

Data science can only marginally address this sociocultural embeddedness of data. Its predominant focus is on the efficient computation of standard measures once data have been produced and standardised. Hence, we also need a social science of data, a body of knowledge that can unpack the assumptions, sociocultural presuppositions and methods through which data are generated and made to matter.

Are, for instance, the data produced in healthcare institutions enough to address patient welfare, or do we also need systematic data on daily habits and lifestyles? This is a professional, political and community matter, not an issue of computational suitability or efficiency. The practical and technical knowledge of data science must be complemented by a scientific field that can respond to these challenges and trace their implications for social practice and institutions.

Determining how such a field will look is not the job of two people but, rather, that of a whole scientific and social discourse that we as a society have the obligation to develop and maintain. Students and data users must know the power and subtlety of the artefacts they study and employ.

Such a scientific field should also provide the basis for analysing the social relations and economic dynamics of data generation and use, which is closely associated with several social groups, professions, communities and firms. Healthcare, again, is a good example. Diverse actors – medical staff, patient groups, hospitals, diagnostic centres, insurance firms, pharmaceutical companies, state agencies and others – are connected in large data ecosystems, but their interests and goals are not always aligned. In that sense, the data produced in that ecosystem are the objects of negotiation.

While laying down the foundations for the semiotic, cognitive and communicative analysis of data, a social science of data should also provide the conceptual tools for charting data’s novel social and institutional dynamics.

As data gets ever bigger and more central to policy and commerce alike, the case for examining its provenance much more closely than we currently do only gets stronger.

Cristina Alaimo is assistant professor (research) of digital economy and society and Jannis Kallinikos is full professor of organization studies and CISCO chair in digital transformation and data-driven innovation at LUISS University, Rome. They are co-authors of Data Rules: Reinventing the Market Economy (MIT Press), published this month.
