Inside the post-ChatGPT scramble to create AI essay detectors

Edtech giants and plucky start-ups are vying to create potentially lucrative tools to combat the use of AI in assessments, but will they cause more problems than they solve?

February 6, 2023
Montage of metal detectorists on beach with newsprint. To illustrate the scramble to create AI essay detectors.
Source: Alamy/Getty montage

The case of a “desperate” student wrongly accused of plagiarism lives long in the memory of academic integrity expert Tomáš Foltýnek.

Writing about why Apple had proven to be such a successful technology business, the female undergraduate had been horrified to find that her university had flagged the essay as cheating because 30 per cent of it matched with other sources, according to the ubiquitous plagiarism detection software developed by edtech giant Turnitin.

“I looked at the Turnitin report and saw just random matches – a couple of words here, half a sentence there – with other student essays on a similar topic,” said Dr Foltýnek, a computer science lecturer at Masaryk University in the Czech Republic.

He said the problem was that Turnitin’s database contained thousands of such essays and the student’s teacher had “blindly” followed the plagiarism score and initiated disciplinary procedures. On this occasion, the mistake was easy to rectify by comparing the essay to the work that was said to have been plagiarised and judging whether the accusation was warranted.




But trying to detect academic writing generated by artificial intelligence (AI) poses an altogether different challenge, according to Dr Foltýnek.

“There is no source document to verify,” he explained. “The teacher cannot prove anything, and the student cannot defend themselves. The only thing the teacher knows is that this particular sentence or passage looks similar to what AI would generate.”

The emergence of ChatGPT in late 2022 – and the global attention it has gained – has accelerated a race to create a potentially lucrative tool that could be used by teachers worldwide to detect when AI might have been used in assessments.

While few universities – Paris’ Sciences Po being an early exception – have implemented outright bans on the chatbot made by OpenAI, the clamour to understand when it has and has not been used by students has led to the development of a raft of new apps, ranging from those designed by entrepreneurial undergraduates during their winter break to Turnitin’s own version, due in the first half of this year.

Dr Foltýnek feared that such tools would lead to more students being “routinely” accused of misconduct without any way of defending themselves, since it is hard for a student to prove that they did not use ChatGPT.

Jesse Stommel, assistant professor in the writing programme at the University of Denver, agreed that plagiarism detection tools had been “plagued” by false positives and said there was no reason to think the same would not be true of AI detectors.

This, he said, neglected the fact that “when students cheat, it’s usually unintentional or non-malicious” and such initiatives will only fuel “a culture of suspicion in education…driven all too much by corporate profit”.

AI detectors work by looking for “statistical variations or surprises in the text”, explained Mike Sharples, emeritus professor of educational technology at The Open University. “The idea is humans tend to vary their text, they don’t just write in a predictable way. Whereas AI tools have been trained in a way that is more predictable.

“They work to a certain extent. If you give them an essay written by ChatGPT, sometimes they can detect with a high confidence that it has been written by AI, but sometimes they can’t.”
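The idea Professor Sharples describes can be illustrated with a toy sketch. The Python below is not how GPTZero, AICheatCheck or Turnitin work; it is a minimal example that assumes a simple word-frequency model stands in for the large language model a real detector would use. The function names (`tokenize`, `sentence_surprisals`, `variation`) and the sample texts are hypothetical. It scores how surprising each sentence is and then measures how much that surprise varies across the text.

```python
import math
import re
from collections import Counter
from statistics import mean, pstdev


def tokenize(text):
    """Lowercase the text and split it into word tokens."""
    return re.findall(r"[a-z']+", text.lower())


def sentence_surprisals(text, reference_text):
    """Return the average per-word surprisal (in bits) of each sentence.

    Word probabilities come from a toy unigram model fitted on
    reference_text with add-one smoothing. This is an assumption made
    for illustration; a real detector would use a language model.
    """
    counts = Counter(tokenize(reference_text))
    total = sum(counts.values())
    vocab = len(counts) + 1  # one extra slot for unseen words

    scores = []
    for sentence in re.split(r"(?<=[.!?])\s+", text.strip()):
        words = tokenize(sentence)
        if not words:
            continue
        bits = [-math.log2((counts[w] + 1) / (total + vocab)) for w in words]
        scores.append(mean(bits))
    return scores


def variation(scores):
    """Spread of surprisal across sentences (population standard deviation)."""
    return pstdev(scores) if len(scores) > 1 else 0.0


if __name__ == "__main__":
    # Hypothetical sample texts, chosen only to show the contrast.
    reference = "the cat sat on the mat. the dog sat on the rug."
    predictable = "The cat sat. The cat sat."
    varied = "The cat sat. Quantum zebras juggle."
    print(variation(sentence_surprisals(predictable, reference)))
    print(variation(sentence_surprisals(varied, reference)))
```

In this sketch, a text whose sentences are all equally predictable gets a variation of zero, while one that mixes predictable and unexpected sentences scores higher. Any cut-off for calling a text “AI-written” would be a further assumption, which is one reason such scores can mislead.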

In Australia, University of Technology Sydney graduate Aaron Shikhule has developed AICheatCheck, which provides a score for a piece of work, showing what percentage it thinks was written by AI as well as an indication of whether the essay is of a high school or college standard.

Mr Shikhule said the tool combats AI with AI, scanning words and sentences for patterns in much the same way that the bots themselves generate a piece of writing.

He and his co-founder, David Cyrus, had already been working on an app when ChatGPT exploded on to the scene, and they expedited its release to capitalise on the interest.

The response has been “through the roof”, said Mr Shikhule, who said he was exploring ways to license a new version of the software to universities, as well as planning for how to strengthen the model so it can deal with the release of GPT-4, due later this year.

“We created the tool because we believe in academic responsibility,” he said. “There is nothing wrong with AI, but it is important there are mechanisms to protect from people abusing it.”

Another of the apps that has been making headlines is GPTZero, developed by a Princeton University senior, Edward Tian, during his winter break while finishing his thesis on AI detection for his computer science major.

A basic version is already available online, and Mr Tian has recruited a lot of help and interest to develop something more sophisticated, which he has pledged to launch soon.

But Professor Sharples said that when he put into this system an essay that he knew was generated by AI, the software said it was “most likely human” and flagged only a few sentences as being potentially AI-written with low confidence. The prompt Professor Sharples had given ChatGPT was to write a high-quality essay with academic references. ChatGPT is also capable of handling more sophisticated commands, such as being asked to vary its wording so that the text is less likely to be detected.

OpenAI itself has developed a tool for detecting AI-generated text. But it admits that it is “not fully reliable” and is likely to incorrectly label human writing as AI-written 9 per cent of the time.
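A short worked example shows why a 9 per cent false positive rate matters. The sketch below is illustrative arithmetic only: the 9 per cent figure is the one quoted above, while the cohort size, the share of essays actually written by AI and the detector's hit rate are hypothetical assumptions, and `flag_breakdown` is a made-up helper name.

```python
def flag_breakdown(n_essays, ai_share, true_positive_rate, false_positive_rate):
    """Estimate how many flagged essays are human-written.

    Returns (false_flags, true_flags, share_of_flags_that_are_wrong).
    """
    n_ai = n_essays * ai_share
    n_human = n_essays - n_ai
    true_flags = n_ai * true_positive_rate        # AI essays correctly flagged
    false_flags = n_human * false_positive_rate   # human essays wrongly flagged
    total_flags = true_flags + false_flags
    share_wrong = false_flags / total_flags if total_flags else 0.0
    return false_flags, true_flags, share_wrong


if __name__ == "__main__":
    false_flags, true_flags, share_wrong = flag_breakdown(
        n_essays=1000,             # hypothetical cohort size
        ai_share=0.10,             # hypothetical: 1 in 10 essays written by AI
        true_positive_rate=0.50,   # hypothetical hit rate
        false_positive_rate=0.09,  # the rate quoted in the article
    )
    print(round(false_flags), round(true_flags), round(share_wrong, 2))
```

Under these assumed inputs, 81 of the 131 flagged essays, about 62 per cent, would have been written by students rather than by AI. Different assumptions would give different numbers, but the pattern holds whenever most essays are honest work.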

Dr Foltýnek said academics should be very wary anyway about using “random apps on the internet” to check student essays because it could violate privacy laws.

Although there was no suggestion that the new ChatGPT apps had been set up for nefarious reasons, previously an online “plagiarism detector” had been found to have been storing uploaded essays and later selling them on via an affiliated essay mill, he cautioned.

Universities may be more likely to stick to what they know, and whatever is developed by Turnitin is likely to be in as much, if not more, demand as its plagiarism checker – which received 232 million submissions in 2021.

The company’s chief product officer, Annie Chechitelli, said an AI detector was already in development and engineers were now working at speed to get something out to customers.

“There are different ways to roll something like this out to market,” she said. “One is you wait until it is pretty robust and well tested, and you have a certain amount of data. Or else you put something out that we know is incomplete feature-wise but shows the direction we are moving in.

“We asked our community, and they overwhelmingly told us that perfect was the enemy of the good and the sooner they could get some basic detection, the better. Using the detection itself is a deterrent. Being able to say Turnitin has this will reduce the misuse of it.”

A prototype of the software in development has already been shared by the company. It analyses a text to show how many of the sentences were probably written by ChatGPT – and to what degree of certainty.

Although certainly a challenge, Ms Chechitelli said she was confident that the tool would work with a high level of accuracy. What made it more complicated, however, was the different demands of users.

Some might be happy for students to use ChatGPT for certain assessments, she said, and others less so. Many want a tool that can check references – which ChatGPT has been shown to make up – or allow for additional checks on academic integrity, for example getting students to submit videos alongside their work or an essay at the beginning of the course that can act as a baseline against which to compare.

Ms Chechitelli predicted a raft of different approaches even within institutions, and Turnitin’s tool has to accommodate all this and quickly provide the information needed in an easy-to-understand format.

For some, such efforts are an “arms race” that will never end, given that future AI writing tools will be trained to produce less detectable content.

Professor Sharples said anti-cheating tools that use pattern detection are likely to be useful only temporarily, given that they will soon be overtaken by new text generators that mimic human variation in language.

“If you start penalising students based on the response from one AI system pitted against another AI system, it is a recipe for doom,” he said. “Students are going to challenge this, and it may well get into legal battles. If universities are relying on AI detectors, it is going to be very difficult for them to defend, particularly as we know these are not foolproof.”

He said rethinking assessment and developing a clear set of guidelines on where such tools can and cannot be used would reduce the need for detectors.

Educators must “raise an eyebrow at any technology”, agreed Dr Stommel, and be as vigilant about detectors as they may be about ChatGPT itself.

“We need to ask what pedagogies are embedded in these tools, how they are monetised, how they remove or enable student or teacher agency. The work of teaching is never easy. Institutions need to start by trusting teachers and draw them into conversations about how technology changes education.”

tom.williams@timeshighereducation.com


Reader's comments (6)

How about we give an IT programme the following essay title "Time Flies Like an Arrow; Fruit Flies Like a banana; discuss the attributes of fruit aerodynamics as perceived by temporally-distressed angry insects".
The easiest way to check would be to sit down with the student and say "without looking at a copy of it, tell me about your essay"
I'd love to have a chat with each of my 406 first year undergraduate students about their 'Ethics for Computer Science' work... but it would take a looooong time!
Oral vivas would be the best way to check if the student wrote the essay. If they wrote the work, they will be able to explain every aspect of it.
How about a combination of the two? Don't viva every student, just a sample. Every student *might* get a mini-viva and they know this. Detectors are useful in flagging those essays that might be worth a quick chat with the student - but it's up to the educator. Equally, human judgement when marking is useful to identify those that might be worth a chat. We could viva borderline or extreme marks (or just sample randomly) too, so that a viva doesn't necessarily mean that your work is suspect. Academic integrity training and honour codes are also vital.
I recall several years ago, in response to pressure from universities, Turnitin saying they were scrapping their $30 student version, with which students could test their dissertation for its plagiarism score and then edit it (I think three edits were included) until it came up as satisfactory. It appears such a service is still available: powered by Turnitin; Authorised partner Turnitin; your writing stays private - no other plagiarism checker will see your text. https://www.scribbr.com/plagiarism-checker/ Universities should boycott Turnitin until it stops this cynical money-grabbing activity.