More regulation could make it easier to detect whether academic writing has been generated by artificial intelligence, amid concerns that tools created for this purpose suffer from low accuracy rates and inbuilt biases.
Universities worldwide have embraced AI detectors to address concerns that tools such as ChatGPT and its successor GPT-4 can help students cheat on assignments, although many remain wary as a growing body of evidence shows that the detectors struggle in real-world scenarios.
In a paper published in June, researchers based across European universities concluded that “the available detection tools are neither accurate nor reliable and have a main bias towards classifying the output as human-written rather than detecting AI-generated text”. This followed another paper that showed that students whose second language was English were being disproportionately penalised because their vocabularies were more limited than native English speakers’.
A third study from academics at the University of Maryland confirmed inaccuracy concerns and found that detectors could be easily outwitted by students using paraphrasing tools to rewrite text initially generated by large language models (LLMs).
One of that study’s authors, Soheil Feizi, assistant professor of computer science, said the flaws in the tools had already had a “real-world impact”, with many cases of students suffering “trauma” after being falsely accused of misconduct.
“The issue is that the ‘AI detection camp’ is quite powerful and is successful in muddying the water: they often evaluate their detection accuracy under unrealistic or very specific scenarios and don’t report the full spectrum of false positive and detection rates,” he added.
One of the detectors Dr Feizi tested was the model created by OpenAI, the company behind ChatGPT, which was recently shelved in a move that many viewed as evidence that detection could not be done.
Turnitin – whose detector generally scored higher than most in the studies but did not prove infallible – recently revealed that its tool has already been used 65 million times.
Annie Chechitelli, the company’s chief product officer, said the product was helping maintain “fairness and consistency in classrooms” but was also still “evolving” and the next step was to help educators better understand the numbers the detector produces and what this might indicate.
Swansea University was not yet using Turnitin's detection tool, according to Michael Draper, a professor of legal education who also serves as the university's academic integrity director.
He said he had “mixed feelings” about detection. “If you use a detection tool as a primary means of evidence when accusing a student of committing misconduct, then you are on a hiding to nothing,” he said.
“But I think using it as a first step is legitimate. You can then have an exploratory conversation with a student in relation to their submission. Some may volunteer they have used AI, or it will become clear they can’t adequately explain how they have arrived at their answer.”
Professor Draper said universities should consider asking students to submit a “research trail” alongside their final draft to show their workings out, which could form part of the assessment.
“These things can also be fabricated, but it is still a useful extra step in detection,” he said. “Anyway, it would be beneficial for students to develop this skill.”
AI detection was not going to go away, however, according to Professor Draper, who pointed to a recent voluntary commitment made in the US by many of the major companies creating LLMs to develop “robust technical mechanisms to ensure that users know when content is AI generated, such as a watermarking system”.
This, he said, would likely be followed by regulation if adequate detection methods were not produced voluntarily, in a “turning of the tide” against companies that “have a vested commercial interest in not having detection”.
“There is increasing recognition that we need to have the ability to differentiate between AI- and human-written text for a number of ethical and legal reasons. It is in everyone’s interest long term to know if something is AI generated or not,” Professor Draper said.
“Some people say detection will never keep up. That’s true when it’s an independent company trying to second-guess what will happen next, but when you have a commitment from the AI companies themselves to create a means of detection, you are on a much stronger wicket.”
Savvy and determined students would find ways around watermarking, however, and another issue was the blurring of the line between AI and human writing as chatbots become embedded in everyday programs, according to Mike Sharples, emeritus professor at the Open University's Institute of Educational Technology.
For example, “Copilot” – Microsoft’s soon-to-launch AI assistant – promises to be able to “shorten, rewrite or give feedback” on a user’s written work.
“Rather than generating an entire essay with AI, students will just press the ‘continue’ button or equivalent when they get stuck,” said Professor Sharples.
“Or use it to rewrite a section, or to suggest references. AI will become part of the workflow. It will become increasingly difficult for AI detectors to call out these ‘AI-assisted’ student assignments.”