The Oxbridge Editing Blog 24th September 2023

What Do We Know About the Reliability of AI Detection Tools?


Artificial intelligence is progressively infiltrating our daily routines, and academia is no exception. With tools such as ChatGPT, you can now generate and edit all sorts of texts, including academic papers, in a matter of seconds. This has caused considerable concern at educational institutions, and professors are now constantly on the lookout for students who use AI. With the rise of AI detection software, such as GPTZero, OpenAI's AI Text Classifier, and Turnitin, professors now have a seemingly reliable method for detecting AI use. However, this software comes with its own set of problems. Read on to learn more.

AI detection is far from accurate

Professors interested in upholding academic integrity are drawn to AI detection tools. These tools analyse academic papers sentence by sentence and assign scores that estimate how much of the text was written by AI. They seem beneficial because adopting them at universities can deter students from using AI. However, their accuracy is far from ideal.

The most important problem with AI detection tools is their high false positive rate: they may identify human-written text as AI-generated even when no AI was used. Some AI detection companies, such as Turnitin, claim a false positive rate of only 4%. Although this figure sounds low, it adds up quickly. If a university checks 3,000 human-written papers, a 4% false positive rate means that about 120 of them will be labelled as AI-generated even though they are not. That is not a small number at all.
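To make the arithmetic above concrete, here is a minimal Python sketch. It assumes, as the example does, that every checked paper is human-written and that the claimed 4% rate applies per paper; the function name `expected_false_positives` is hypothetical and not part of any detection tool.

```python
def expected_false_positives(papers_checked: int, false_positive_rate: float) -> float:
    """Expected number of human-written papers wrongly flagged as AI-generated.

    Assumes every paper checked is human-written and that the false positive
    rate applies per paper (an illustrative assumption, not a vendor spec).
    """
    if papers_checked < 0:
        raise ValueError("papers_checked must be non-negative")
    if not 0.0 <= false_positive_rate <= 1.0:
        raise ValueError("false_positive_rate must be between 0 and 1")
    # Expected count = number of papers x probability each one is wrongly flagged.
    return papers_checked * false_positive_rate


if __name__ == "__main__":
    # The figures used in this post: 3,000 papers and a claimed 4% rate.
    count = expected_false_positives(3000, 0.04)
    print(f"{count:.0f} papers wrongly flagged")  # prints: 120 papers wrongly flagged
```

The result is an expected value, so in practice the actual number of wrongly flagged papers would vary somewhat around 120.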

Inaccurate AI detection can have dire consequences

When an AI detection tool produces a false positive, professors may wrongly accuse a student of cheating. This may lead to disciplinary action and even exclusion. In the last few months, many students have been accused of using AI even though they wrote and edited their papers themselves, which has caused them great stress and anxiety (Fowler, 2023). Non-native English speakers are at a particular disadvantage, as their texts are more likely to be falsely flagged as AI-generated (Sample, 2023). These students, like many others who did not actually use AI, are in a difficult position, since proving that a paper was not written or edited by AI can be challenging.

A key message

This blog post does not suggest that you can use AI freely just because AI detection tools are inaccurate. Even though these tools are unreliable, professors still have other valid ways of detecting whether a paper was written or edited by AI. Instead, the key message of this post is that you should not worry too much if you run your paper through AI detection software and learn that it was flagged as "AI-generated". False positives can and almost certainly will occur sooner or later. In our next blog post, we will explore how to prepare yourself and your academic work in advance so that you do not fall victim to a false positive and the disciplinary action that can follow.
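Why a false positive becomes likely over time can be sketched with the complement rule. This minimal Python example assumes, purely for illustration, that each check is independent and carries the same 4% per-paper false positive rate discussed above; real tools may behave differently, and the function name is hypothetical.

```python
def chance_of_at_least_one_false_flag(submissions: int, false_positive_rate: float) -> float:
    """Probability that at least one of `submissions` human-written papers is flagged.

    Assumes each check is independent and shares the same per-paper false
    positive rate (a simplifying assumption for illustration only).
    """
    if submissions < 0:
        raise ValueError("submissions must be non-negative")
    if not 0.0 <= false_positive_rate <= 1.0:
        raise ValueError("false_positive_rate must be between 0 and 1")
    # Complement rule: P(at least one flag) = 1 - P(no flag on any submission).
    return 1.0 - (1.0 - false_positive_rate) ** submissions


if __name__ == "__main__":
    # Show how the risk grows as a student submits more papers.
    for n in (1, 5, 10, 20):
        p = chance_of_at_least_one_false_flag(n, 0.04)
        print(f"{n:>2} papers: {p:.1%} chance of at least one false flag")
```

Under these assumptions, a student who submits twenty human-written papers has a better than even chance of being falsely flagged at least once.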


Fowler, G. A. (2023, April 3). We tested a new ChatGPT-detector for teachers. It flagged an innocent student. Retrieved September 11, 2023, from

Sample, I. (2023, July 10). Programs to detect AI discriminate against non-native English speakers, shows study. Retrieved September 11, 2023, from