AI Cheating Went Undetectable — and the Detectors Were Never Trustworthy Anyway

Opinion · AI Detection

By Vlad Ivanov, who runs Detection Drama, stress-testing AI detectors and humanizers, and publishes Words At Scale to 26,000+ subscribers.

The New York Times has decided that student AI cheating is now “impossible to detect.” I track AI detectors for a living, and my reaction isn’t panic. It’s relief that the rest of the world is finally saying out loud what the numbers showed years ago: the detectors were never a reliable judge in the first place.

The Times’ framing is that a wave of new apps lets students slip machine-written work past teachers. That’s true. But the sharper detail in the same reporting is the one schools should sit with: in some cases, the companies selling detection tools are also shipping the apps that beat them.

That matters because for three years a detector score has been treated as evidence — the number that fails a student, triggers a misconduct hearing, and follows them onto a transcript. If detection has “lost,” it didn’t lose this week. It was always losing. The accusation machine was simply louder than the accuracy underneath it.

Independent testing has never backed the confidence. A Stanford study found that detectors flagged, on average, 61% of TOEFL essays written by non-native English speakers as AI-written. An earlier test of 14 detectors found more than half of lightly edited AI text slipped through. Those failures predate this month’s headline by years.

The people who pay for that false confidence aren’t the cheaters — they’re the students flagged by mistake. In May, a Palo Alto high schooler’s family filed a civil-rights suit after an AI accusation. That’s what a coin-flip tool with a permanent record attached actually costs.

The obvious objection: if some cheating now gets through, isn’t an imperfect detector better than nothing? No — not when the tool produces confident false positives and the penalty is a transcript notation. Courts are already saying so. A New York court recently annulled a misconduct finding partly because the university ignored the student’s contradictory detection results.

The fix was never a better scanner. It’s process: evidence visible in the work itself, an AI rule actually written into the syllabus, and a clean line between a grading decision and a misconduct charge. Detection was the shortcut schools took to avoid that harder work, and the shortcut just ran out.

If you’ve been flagged, the percentage is not proof. Detection Drama keeps a running guide to what to do in the first 24 hours after an AI accusation — and a tally of the universities already walking away from detectors.

The detectors didn’t just fail to keep up with the cheating apps. They failed the students they flagged by mistake, and that was true long before the Times ran the headline. The schools still treating a percentage as proof are the ones that will keep ending up in court.

Vlad Ivanov

Vlad runs Detection Drama, where he stress-tests AI detectors and humanizers against real student writing, and publishes the Words At Scale newsletter to more than 26,000 subscribers. He has spent three years tracking how AI-detection accuracy holds up in the wild.