An academic ChatGPT needs a better schooling

AI agents are what they ingest. Rather than scraping the internet, better to confine their diets to books and encyclopedias, says Sorin Adam Matei.

Published on November 28, 2023. Last updated November 29, 2023.

Sorin Adam Matei


If you know how ChatGPT works, you won’t be surprised to learn that AI detection filters consider it highly likely that the chatbot had a large hand in writing the US Constitution and the Book of Genesis. Nor will you be surprised that ChatGPT is biased towards the latest intellectual ideas, skewing liberal.

AI agents are prediction engines using the web as their memory. They do no more than predict which words are most likely to follow any other word or group of words in a given language. When you ask ChatGPT a question, it parses it into words and their sequence, then builds its answer by repeatedly predicting which word should come next. It might sound like a simple trick, and it is; the secret sauce is the sheer size of the body of text the AIs are trained on.

Of the very heterogeneous mix of content used to train ChatGPT, 60 per cent was a hotchpotch of information culled from websites, blogs or social media. Another 20 per cent was content shared on Reddit and evaluated relatively highly by the users. The rest was books typically found in the public domain (mostly older and general purpose), with a bit of Wikipedia (3 per cent) mixed in for good measure.

AIs store for each word the probability that any other word will follow it. The quality and value of these predictions depend heavily on how often, and in how many contexts, the software encounters any two (or more) words in proximity, how long sentences run, and which sentence is likely to follow another. Taken together, these predictions favour the most influential texts of a given culture, the ones that shaped generations upon generations of English-language teachers and the students they educated.
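
To make the word-following idea concrete, here is a toy next-word predictor. It is a minimal sketch of the simplified description above, not of how ChatGPT is actually built: production models are neural networks that work on sub-word tokens and condition on long stretches of preceding text, whereas this one only counts which word immediately follows which (a "bigram" model) in one sentence from Genesis. The function names `train_bigrams` and `generate` are illustrative, not from any real library.

```python
from collections import Counter, defaultdict


def train_bigrams(text: str) -> dict[str, dict[str, float]]:
    """Return P(next word | word) estimated from raw pair counts in `text`."""
    words = text.lower().split()
    pair_counts: dict[str, Counter] = defaultdict(Counter)
    # Count how often each word is immediately followed by each other word.
    for current, following in zip(words, words[1:]):
        pair_counts[current][following] += 1
    # Normalise each word's counts into probabilities that sum to 1.
    model = {}
    for current, counts in pair_counts.items():
        total = sum(counts.values())
        model[current] = {w: c / total for w, c in counts.items()}
    return model


def generate(model: dict[str, dict[str, float]], start: str, length: int) -> list[str]:
    """Greedily extend `start` by always picking the most probable next word."""
    output = [start]
    for _ in range(length):
        options = model.get(output[-1])
        if not options:
            break  # this word was never followed by anything in the corpus
        # max() returns the first maximal key, so ties go to the earliest-seen word.
        output.append(max(options, key=options.get))
    return output


corpus = "In the beginning God created the heaven and the earth And the earth was without form"
model = train_bigrams(corpus)
print(model["the"])  # {'beginning': 0.25, 'heaven': 0.25, 'earth': 0.5}
print(" ".join(generate(model, "in", 5)))  # in the earth and the earth
```

Because "earth" follows "the" twice in this tiny corpus, the greedy predictor keeps choosing it and soon loops ("the earth and the earth"): it reproduces the most frequent pattern in what it was fed, which is the article's point about diet in miniature.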

Raised on the incantations of Shakespeare and the literature that grew out of the King James Bible, AIs trained on this traditional English thought pattern could not but regenerate the Bible or the Constitution as if they were common knowledge. Yet when asked questions about everyday issues, AI agents will be more likely to use a liberal-secular tone because this perspective dominates web conversations.

Frequently, AI content mixes heavenly and earthly perspectives. For example, when you tempt ChatGPT with the prompt “Continue the story: In the beginning there was…” it will promptly deliver a Genesis-style Feynman physics lecture, “In the beginning, there was a profound stillness that seemed to stretch for eternity. Within this void, a single point of unimaginable density and energy existed. This singularity held within it the potential for all that would come to be. Then, in an instant that defied the very concept of time, the singularity erupted in a cataclysmic explosion known as the Big Bang.” (Try it, although your answer might vary.)

The overlap of old and new in ChatGPT-generated texts is not the cause but the result of the ongoing cultural strife of the American mind with itself. This tension should not lead to finger-pointing. But we do need a healthy conversation about the origins and uses of ChatGPT and its siblings, such as Google’s Bard, Meta’s LLaMA or Anthropic’s Claude.

First, is such training, jumping from green energy and trans rights to sermons and pro-life arguments in one click, appropriate for a tool used in the academy? Suppose we raised AI models on a diet of 80 per cent books and 20 per cent information from curated encyclopedias, including Britannica. In that case, they would be less focused on the vagaries of the present and more concerned with the age-old dilemmas and established certainties of academic knowledge.

Creating AI agents that cater to academic needs could be an expensive proposition, of course. However, given the enormous resources of the leading US and European universities, this could be a stimulating problem for a large consortium of higher education institutions, such as the Association of American Universities (AAU) or the European University Association, to solve. GPT-4 cost “merely” $100 million (£79 million) to train. The AAU universities, a group of 69 large state and private universities, received $31 billion in funding in 2021.

Second, ChatGPT was created with a “just in case” mentality. It was meant to answer all questions for all purposes. This leads to tentative, “he said, she said” answers – even to questions whose answers we should be sure of, such as whether vaccines save lives or whether Communism is as genocidal as Nazism. When trained on specialised information, it should express more confidence about matters that truly matter.

Third, ChatGPT speaks like a parrot because it does not automatically adjust its delivery to the request at hand. More research and engineering are needed to calibrate the tool to each request’s real-life intentions and consequences. In academic learning, these situations should be the pre- and post-stages of the research process: finding arguments and packaging them for public consumption.

The in-between, the moment of discovery, should be reimagined in future pedagogies to scaffold around rather than fall back on AI agents. Assignments must connect to specific competencies demonstrated across written, multimedia and oral presentations. A return to in-class written or oral exams (horribile dictu) should not be out of the question.

In their current forms, ChatGPT and its siblings are like those three-year-olds who can recite entire stories read to them only once. But turning a three-year-old into a learned person takes 20 years of strenuous, structured education. It is time to stop reading AI agents stories and send them to a real school.

Sorin Adam Matei is associate dean of research and graduate education at Purdue University’s College of Liberal Arts.
