
Bronze. Cleveland Museum of Art. CC0
Originally conceived as part of The Gates of Hell, Rodin’s The Thinker was not merely a passive figure lost in thought, but a representation of Dante himself, contemplating the fates of souls below. Cast in tension and muscle, he embodies the labor of intellect—the weight of reflection, the cost of authorship, and the solitary burden of making meaning in a world of mechanized shortcuts. A fitting emblem for the human writer mistaken for a machine.
Preface: A Writer Mistaken for a Machine
The main essay that follows this preface was generated wholly by ChatGPT’s “Deep Research” feature, produced at my request after a recent experience that was equal parts amusing and unsettling.
In a recent essay I had written—carefully and thoughtfully—I found myself admiring a few turns of phrase that seemed, perhaps, too polished. Seeking to determine whether I had unconsciously absorbed and repeated something from my recent reading, I turned to a site I had used before—one that aggregates reviews of AI and plagiarism detectors commonly employed by educators. From there, I selected not one, but three highly rated tools to review my essay and determine whether I had inadvertently borrowed a phrase from Blake, Eckhart, Pseudo-Dionysius, or anyone else I had recently been reading.
The results were, to put it mildly, contradictory, though not for the issue I had set out to explore. The first site was no longer operational, citing the unreliability of AI detection in view of the accelerating complexity of AI language model algorithms. The second tool confidently declared that my essay was entirely free of both plagiarism and AI-generated content. The third, by contrast, just as confidently pronounced that my essay was likely 100 percent AI-generated, both in style and content, based on the presence of twenty phrases—unhelpfully left unidentified—that appeared more frequently in AI-generated material. The site explained that those mysterious phrases had been used in training language models and thus their use in my writing rendered it suspect. It passed no judgment on whether I had plagiarized any statements, only that the content bore resemblance to machine-generated text.
My immediate reaction, I confess, was to teeter between horror and bemusement. The accusation—if an algorithm’s pronouncement may be called that—felt surreal. After all, I knew the truth: I had written every word of the essay, agonized over phrasing, amended lines multiple times, and left the final version still slightly flawed in its characteristic manner—overwritten in places, a bit repetitive, and too fond of “dollar words” when “nickel words” might have sufficed. In other words, it bore the unmistakable hallmark of my own inimitable style and vocabulary—a style and vocabulary that had been mine long before AI and computers were available to assist writers.
My suspicion is that some AI detectors struggle with refined style and elevated or scholarly vocabulary, not because the language itself is artificial, but because such prose deviates from what the detectors expect. Many of these tools appear to assume that typical writing samples—particularly from Americans—will reflect a sixth- to eighth-grade reading and writing level, which is often cited as the norm in American education. As a result, writing that demonstrates syntactic complexity, lexical richness, or familiarity with classical or theological sources may be flagged as anomalous—if not by design, then by statistical accident.
But perhaps this is not so much a matter of cynicism as it is a reflection of changing cultural baselines. It may be that AI detectors are most often trained and tested on writing submitted by individuals who, through no fault of their own, have received a relatively standard education—one that is no longer grounded in the Western canon, rhetorical tradition, or literary cultivation. Meanwhile, the language models themselves were trained on vast bodies of material that included precisely such literary and scholarly writings. The result is a curious inversion: those whose writing reflects a more literary or humanistic sensibility may appear “too AI-like” because the models were trained on the very texts that once defined erudition. We have, in a sense, taught the machines what good writing looks like—and then turned around and accused anyone who writes well of being a machine.
Once the bemusement passed, I turned to curiosity. How could this happen? What is the current scholarly consensus on these tools? Are they reliable? Ethical? Legally defensible? And what risks do they pose—to students, educators, or professionals whose authentic work is misjudged by algorithm? The essay that follows is the product of those inquiries: an AI-assisted deep research essay on AI detection tools, their promises and pitfalls, their technical limits, and their unintended consequences.
To be clear, I do use AI tools—but not to draft my writing. I use them as an editor and as a very well-informed assistant. Tasks assigned to AI include reviewing essays for spelling and grammatical errors, formatting footnotes and endnotes, formatting essays for publication on my website, converting material into HTML, creating SEO-friendly titles and tags, checking poetic meter, and serving as a thesaurus when a word feels off. AI assists at the margins. It does not craft essays, as writing is my work.
Anyone still in doubt need only glance at my desk—or my nightstand or dining room table. There, amid scattered books, notebooks, half-drafted pages, and layers of revisions, is the reality of my writing process. It is rarely clean, often circuitous, and always human.
Writing is a laborious but enjoyable process. Many essays and poems take months to write, others take weeks, a few only days. Now and then, an essay or poem does arrive nearly whole, a rare gift, as if sprung from the brow of Zeus. But more often, it is a time-consuming process, coming line by line, revision by revision.
So, with that somewhat overwrought introduction, I offer the following AI-generated essay on AI detection tools—an essay which, in my professional and legal opinion, should dissuade any reasonable educator or institution from ever using AI detectors to determine authorship. AI plagiarism detection may still serve a purpose. But AI authorship detectors? Never. Do not be tempted.
And if I may offer some unsolicited advice in their place, grounded not in machine logic but in the lived practice of teaching and learning: when I taught history, reading, and religion to seventh and eighth graders at St. Edward Catholic School in Youngstown, Ohio, I insisted that all assignments be written in ink. “If one is to err, one should err boldly, in ink,” I told my students, and I refused to accept work written in pencil. This approach taught them not only to commit to their words but, more importantly, to reflect on them before committing anything to paper. It encouraged thought and contemplation—qualities essential to authentic writing and learning—rather than the careless drafting and endless erasing that pencils with erasers, and now mechanical tools, permit. That ethic, I believe, translates well to our current moment.
Educators should begin by becoming familiar with the voice, habits, and capabilities of the writers whose work they are assessing. Ask for drafts, notes, outlines, or written reflections that reveal the student’s thinking process. Structure assignments so that substantial components are completed in class, or are grounded in personal experience or classroom dialogue—subjects that AI cannot credibly fabricate. Make clear whether AI tools may be used, and if so, how. Explain why certain shortcuts, especially in formative stages, may undermine the very skills students are meant to acquire.
For developing writers especially, I am inclined to believe it is best to eschew AI altogether—and perhaps even computers and, dare I say, typewriters, should any still have access to them—in the early stages of learning. Write by hand, with ink. Let not an algorithm be found in the process.
Scholarly and Critical Perspectives on AI Content and Detection Tools
A CHATGPT ESSAY
Introduction
AI content detection tools – such as Copyleaks, Turnitin’s AI-writing detector, GPTZero, and others – have emerged to help educators and publishers identify text that might have been generated by AI. These detectors typically analyze text for telltale patterns or “low perplexity” that could signal machine-written prose. However, as these tools proliferate in classrooms and journals, many academics, educators, and legal experts are raising alarms about their reliability, transparency, and potential harms. Recent studies and critiques suggest that current AI detectors often fall short of their promises and may even produce unintended negative consequences (theguardian.com; vanderbilt.edu). This report provides an up-to-date overview of how academic, educational, and legal communities view AI content detectors, focusing on concerns over accuracy, fairness, and the risk of false accusations.
Accuracy and Reliability Issues
Detectors’ claims vs. reality: AI detector companies often tout extremely high accuracy rates – some advertise 98–99% accuracy for identifying AI-generated text (citl.news.niu.edu). For example, Copyleaks has claimed 99.12% accuracy and GPTZero about 99% (citl.news.niu.edu). In practice, independent evaluations have found such claims “misleading at best” (theguardian.com). OpenAI’s own attempt at an AI-written text classifier was quietly discontinued in mid-2023 due to its “low rate of accuracy” (insidehighered.com; businessinsider.com). Even Turnitin, which integrated an AI-writing indicator into its plagiarism platform, acknowledged that real-world use revealed a higher false positive rate than initially estimated (more on false positives below; insidehighered.com). In short, a consensus is growing that no tool can infallibly distinguish human from AI text, especially as AI models evolve.
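To see why a headline accuracy figure can mislead, it helps to work through the base-rate arithmetic. The sketch below is purely illustrative (the prevalence and error rates are assumptions, not figures from any study cited here), but it shows how even a detector with the advertised 99% accuracy can be wrong in a large share of the essays it actually flags.

```python
# Illustrative arithmetic only: the rates below are assumptions, not vendor data.
def positive_predictive_value(sensitivity: float, specificity: float, base_rate: float) -> float:
    """Probability that a flagged essay really is AI-generated (Bayes' rule)."""
    true_pos = sensitivity * base_rate
    false_pos = (1.0 - specificity) * (1.0 - base_rate)
    return true_pos / (true_pos + false_pos)

# Suppose 5% of submitted essays are actually AI-written, and the detector
# catches 99% of them while falsely flagging 1% of human-written work.
print(f"{positive_predictive_value(0.99, 0.99, 0.05):.0%}")  # ~84% of flags are correct
# At a 1% base rate, roughly half of all flags point at honest work.
print(f"{positive_predictive_value(0.99, 0.99, 0.01):.0%}")  # ~50%
```

The lower the true prevalence of AI-written submissions, the larger the share of flags that fall on honest writers, even when the advertised accuracy is taken at face value.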
False negatives and AI evolution: Critics note that detectors struggle to keep up with the rapid progress of large language models. Many detectors were trained on older models (like GPT-2 or early GPT-3), making them prone to “overfitting” on those patterns while missing the more human-like writing produced by newer models such as GPT-4 (bibek-poudel.medium.com). A recent U.K. study underscores this gap: when researchers secretly inserted AI-generated essays into real university exams, 94% of the AI-written answers went undetected by graders (bibek-poudel.medium.com; reading.ac.uk). In fact, those AI-generated answers often received higher scores than human students’ work (bibek-poudel.medium.com), highlighting that advanced AI can blend in undetected. This high false-negative rate suggests detectors (and even human examiners) can be easily fooled as AI-generated writing grows more sophisticated. It also reinforces that educators cannot rely on detectors alone – as one analyst put it, trying to catch AI in writing is “like trying to catch smoke with your bare hands” (bibek-poudel.medium.com).
Transparency and Methodological Concerns
Many in academia criticize AI detection tools as “black boxes” that lack transparency. Turnitin’s AI detector, for instance, was rolled out in early 2023 with almost no public information on how it worked. Vanderbilt University – which initially enabled Turnitin’s AI checks – reported “no insight into how it [the AI detector] works” and noted that Turnitin provided “no detailed information as to how it determines if a piece of writing is AI-generated or not” (vanderbilt.edu). Instead, instructors were told only that the tool looks for unspecified patterns common in AI writing. This opacity makes it difficult for educators and students to trust the results or to challenge them. If a student is flagged, neither the instructor nor the student can see what specific feature triggered the detector’s suspicion. Such lack of transparency runs counter to academic values of evidence and explanation, as decisions about academic integrity are being outsourced to an algorithm that operates in secrecy.
Lack of peer review or independent validation: Unlike plagiarism checkers (which match text against known sources), AI detectors use proprietary algorithms and often haven’t been rigorously peer-reviewed in public. Experts point out that “AI detectors are themselves a type of artificial intelligence” with all the attendant opaqueness and unpredictability (citl.news.niu.edu). This raises concerns about due process: should a student face consequences from a tool whose inner workings are not open to scrutiny? Legal commentators note that relying on an unproven algorithm for high-stakes decisions is risky – any “evidence” from an AI detector is inherently probabilistic and not easily explainable in plain terms (cedarlawpllc.com). Some universities have therefore erred on the side of caution. For example, the University of Minnesota explicitly “does not recommend instructors use AI detection software because of its known issues” (mprnews.org), and advises that if used at all, it be treated as an “imperfect last resort.”
Privacy concerns: Another transparency issue involves data privacy and consent. Using third-party AI detectors means student submissions (which can include personal reflections or sensitive content) are sent to an external service. Vanderbilt’s review concluded that “even if [an AI detector] claimed higher accuracy… there are real privacy concerns about taking student data and entering it into a detector managed by a separate company with unknown data usage policies” (vanderbilt.edu). Educators worry that student work could be stored or reused by these companies without students’ knowledge. This lack of clarity about data handling adds yet another layer of concern, leading some institutions to opt out of detector services on privacy grounds alone.
False Positives and Bias Against Certain Writers
Perhaps the most pressing criticism of AI content detectors is their propensity for false positives – flagging authentic human work as AI-generated. Researchers and educators have documented numerous cases of sophisticated or even simplistic human writing being mistaken for machine output. A dramatic illustration comes from feeding well-known texts into detectors: when analysts ran the U.S. Constitution through several AI detectors, the document was flagged as likely written by AI (senseient.com). The reason is rooted in how these tools work. Many detectors measure “perplexity,” essentially how predictable a text is to the language model underlying the detector (senseient.com). Paradoxically, a text like the Constitution or certain Bible verses, which use common words and structures, appears too predictable and yields a low perplexity score – causing the detector to misjudge it as AI-produced. As one expert quipped, detectors can incorrectly label even America’s most important legal document as machine-made (senseient.com). This highlights a fundamental flaw: well-written or formulaic human prose can trip the alarms because AI models are trained on vast amounts of such text and can mimic it.
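For readers curious about the mechanics, here is a minimal sketch of a perplexity-based check, using the open-source GPT-2 model from the Hugging Face transformers library as a stand-in. Commercial detectors rely on their own proprietary models and scoring, so the threshold below is purely hypothetical.

```python
# Minimal sketch of a perplexity-based check. GPT-2 stands in for whatever
# model a commercial detector actually uses; the threshold is hypothetical.
import torch
from transformers import GPT2LMHeadModel, GPT2TokenizerFast

tokenizer = GPT2TokenizerFast.from_pretrained("gpt2")
model = GPT2LMHeadModel.from_pretrained("gpt2")
model.eval()

def perplexity(text: str) -> float:
    """Exponentiated average 'surprise' of the model at each token."""
    ids = tokenizer(text, return_tensors="pt").input_ids
    with torch.no_grad():
        # Passing labels=input_ids makes the model return its mean cross-entropy loss.
        loss = model(ids, labels=ids).loss
    return torch.exp(loss).item()

THRESHOLD = 30.0  # hypothetical cutoff: lower perplexity = "too predictable" = flagged
sample = "We the People of the United States, in Order to form a more perfect Union..."
score = perplexity(sample)
verdict = "flagged as AI-like" if score < THRESHOLD else "judged human-like"
print(f"perplexity = {score:.1f} -> {verdict}")
```

The point of the sketch is not the particular numbers but the logic: any text the model finds highly predictable, whether machine-generated or simply conventional human prose, ends up on the “AI” side of the cutoff.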
Bias against non-native English writers: A growing body of scholarship reveals that AI detectors may disproportionately flag work by certain groups of human writers. A 2023 Stanford study by Liang et al. found that over half of essays written by non-native English speakers were wrongly flagged as AI-generated by popular detectors (theguardian.com). By contrast, the same detectors judged over 90% of essays by native English-speaking middle-schoolers to be human-written (theguardian.com). The disparity stems from linguistic style: non-native writers, or those with more basic vocabulary and simpler grammar, inadvertently write in a way that the detectors identify as “low perplexity” (too predictable) (theguardian.com). Detectors, trained on AI outputs that tend to be straightforward, end up penalizing writers who use simpler phrasing or formulaic structures, even if their work is entirely original (theguardian.com). The Stanford team bluntly concluded that “the design of many GPT detectors inherently discriminates against non-native authors” (themarkup.org). This bias can have serious implications in academia and hiring: an ESL student’s college essay or a non-native job applicant’s cover letter might be unfairly flagged, potentially “marginalizing non-native English speakers on the internet,” as one report warned (theguardian.com).
Beyond language background, other kinds of “atypical” writing styles trigger false positives. People with autism or other neurodivergent conditions, who might write in a repetitive or highly structured way, have been snared by AI detectors. Bloomberg reported the case of a college student with autism who wrote in a very formal, patterned style – a detector misidentified her work, leading to a failing grade and a traumatic accusation of cheating (gigazine.net). She described the experience as feeling “like I was punched in the stomach” upon learning the software tagged her essay as AI-written (gigazine.net). Likewise, younger students or those with limited vocabulary (through no fault of their own) could be at higher risk. In tests on pre-ChatGPT student essays, researchers found detectors disproportionately flagged papers with “straightforward” sentences or repetitive word choices (mprnews.org; gigazine.net). These examples underline a key point from critics: AI content detectors exhibit systemic biases – they are more likely to falsely accuse certain human writers (non-native English writers, neurodivergent students, etc.), raising equity and ethical red flags.
Real-World Consequences: False Accusations and Student Harm
For students and educators, a false positive isn’t an abstract statistical problem – it can derail a person’s education or career. Recent incidents show the tangible harm caused by over-reliance on AI detectors. At Johns Hopkins University, lecturer Taylor Hahn discovered multiple instances where Turnitin’s AI checker flagged student papers as 90% AI-written, even though the students had written them honestly (themarkup.org). In one case, a student was able to produce drafts and notes to prove her work was her own, leading Hahn to conclude that the “tool had made a mistake” (themarkup.org). He and others have since grown wary of trusting such software. Unfortunately, not all students get the benefit of the doubt initially. In Texas, a professor infamously failed an entire class after an AI tool (reportedly ChatGPT itself) “detected” cheating, only for it to emerge that the students hadn’t cheated – the detector was simply not a valid source of evidence (businessinsider.com). Incidents like this have fueled professors’ concerns that blind faith in detectors could lead to wrongful punishments of innocent students.
The psychological and academic toll of false accusations is significant. Students report experiencing stress, anxiety, and a damaged sense of trust when their authentic work is misjudged by an algorithm (citl.news.niu.edu). For international students, the stakes can be even higher. As one Vietnamese student explained, if an AI detector wrongly flags his paper, it “represents a threat to his grades, and therefore his merit scholarship” – even raising fears about visa status if academic standing is lost (themarkup.org). In the U.S., where academic misconduct can lead to expulsion, an unfounded cheating charge could put an international student at risk of deportation (themarkup.org). These scenarios illustrate why students like those at the University of Minnesota say they “live in fear of AI detection software,” knowing one false flag could be “the difference between a degree and going home” (mprnews.org).
Unsurprisingly, some students and faculty have fought back. In early 2025, a Ph.D. student at the University of Minnesota filed a lawsuit alleging he was unfairly expelled based on an AI cheating accusation (mprnews.org). He maintains he did not use AI on an exam, and objects that professors relied on unvalidated detection software as evidence (mprnews.org). The case, which garnered national attention, underscores the legal minefield institutions enter if they treat AI detector output as proof of misconduct. Similarly, a community college student in Washington state had his failing grade and discipline overturned after lawyers demonstrated to the school’s administration how unreliable the detection program was – notably, the college vice-president admitted that even her own email reply was flagged as 66% AI-generated by the tool (cedarlawpllc.com). In voiding the penalty, the college effectively acknowledged that the detector’s result was not trustworthy evidence (cedarlawpllc.com). These cases highlight a common refrain: without corroborating evidence, an AI detector’s output alone is too flimsy to justify accusing someone of academic dishonesty (cedarlawpllc.com).
Responses from Educators and Institutions
The educational community’s response to AI detectors has rapidly evolved from initial curiosity to growing skepticism. Many instructors, while concerned about AI-assisted cheating, have concluded that current detector tools are “a flawed solution to a nuanced challenge.” As one critic put it, “they promise certainty in an area where certainty doesn’t exist” (bibek-poudel.medium.com). Instead of fostering integrity, heavy-handed use of detectors can create an adversarial classroom environment and chill student creativity (medium.com). For these reasons, a number of teaching and learning centers at universities have published guides essentially making the case against AI detectors. For instance, the University of Iowa’s pedagogy center bluntly advises faculty to “refrain from using AI detectors on student work due to the inherent inaccuracies” and to seek alternative ways to uphold integrity (teach.its.uiowa.edu). Northern Illinois University’s academic technology office labeled detectors an “ethical minefield,” arguing their drawbacks (false accusations, bias, stress on students) “often outweigh any perceived benefits” (citl.news.niu.edu). Their guidance encourages faculty to prioritize fair assessments and student trust over any quick technological fix (citl.news.niu.edu).
Importantly, some universities have instituted policy decisions to limit or reject the use of AI detection tools. In August 2023, after internal tests and consultations, Vanderbilt University decided to disable Turnitin’s AI detector campus-wide (vanderbilt.edu). Vanderbilt’s announcement cited multiple concerns: uncertain reliability, lack of transparency, the risk of ~1% false positives (potentially hundreds of students falsely flagged each year), and evidence of bias against non-native English writers (vanderbilt.edu). Northwestern University likewise turned off Turnitin’s AI detection in fall 2023 and “did not recommend using it to check students’ work” (businessinsider.com). The University of Texas at Austin also halted use, with a vice-provost stating that until the tools are accurate enough, “we don’t want to create a situation where students are falsely accused” (businessinsider.com). Even Turnitin’s own guidance to educators now stresses caution, advising that its AI findings “should not be used as the sole basis for academic misconduct allegations” and should be combined with human judgment (turnitin.com). In practice, many colleges have shifted focus to preventative and pedagogical strategies – designing assignments that are harder for AI to complete (personal reflections, oral exams, in-class writing), educating students about acceptable AI use, and improving assessment design (mprnews.org). This approach seeks to address AI-related cheating without leaning on fallible detection software.
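The scale problem behind a seemingly modest false positive rate is easy to see with back-of-the-envelope arithmetic; the submission volume below is an assumption chosen only for illustration, not any institution’s actual figure.

```python
# Hypothetical volume for a large university running every paper through a detector.
submissions_per_year = 50_000       # assumed, for illustration only
false_positive_rate = 0.01          # the ~1% figure discussed above

falsely_flagged = submissions_per_year * false_positive_rate
print(f"Honest papers flagged as AI each year: {falsely_flagged:.0f}")  # 500
```

Even if every flag were reviewed by a human, that would mean hundreds of integrity conversations each year triggered by work that was written honestly.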
On a broader policy level, OpenAI itself has cautioned educators against over-reliance on detectors. In a back-to-school guide for fall 2023, OpenAI explicitly warned that AI content detectors are not reliable for distinguishing GPT-written text (businessinsider.com). The company even confirmed what independent studies found: detectors tend to mislabel writing by non-English authors as AI-generated, and thus should be used, if at all, with extreme care (businessinsider.com). As a result, many institutions are rethinking how to maintain academic integrity in the AI era. The emerging consensus in education is that no AI detection tool today offers a magic bullet, and using them blindly can cause more harm than good. Instead, instructors are encouraged to discuss AI use openly with students, set clear policies, and consider assessments that integrate AI as a learning tool rather than treat it as a forbidden trick (mprnews.org).
Legal and Ethical Considerations
The controversies around AI writing detectors also raise legal and ethical questions. From an ethical standpoint, deploying a tool known to produce errors that can jeopardize students’ academic standing is highly problematic. Scholars of educational ethics argue that the potential for “unfounded accusations” and damage to student well-being means the costs of using such detectors may outweigh the benefits (themarkup.org). There is an implicit breach of trust when a student’s honest work is deemed guilty until proven innocent by an algorithm. This reverses the usual academic principle of assuming student honesty and has been compared to using an unreliable litmus test that forces students to “prove their innocence” after a machine accuses them (themarkup.org). Such an approach can poison the student-teacher relationship and create a climate of suspicion in the classroom.
Legally, if a student is disciplined or loses opportunities due to a false AI detection, institutions could face challenges. Education lawyers note that students might have grounds for appeal or even litigation if they can show that an accusation rested on junk science. The defamation lawsuit at Minnesota (mentioned above) may set an important precedent on whether sole reliance on AI detectors can be considered negligent or unjust by a university (mprnews.org). Additionally, since studies have demonstrated bias against non-native English speakers, one could argue that using these detectors in high-stakes decisions could inadvertently violate anti-discrimination policies or laws, if international or ESL students are disproportionately harmed. Universities are aware of these risks. As the Cedar Law case in Washington illustrated, once informed of the detector’s fallibility, administrators reversed the sanction to avoid unfairly tarnishing a student’s record (cedarlawpllc.com). The takeaway for many is that any evidence from an AI detector must be corroborated and cannot be treated as conclusive. As one legal commentary put it, the “lesson from these cases is that colleges must be extremely conscientious given the present lack of reliable AI-detection tools, and must evaluate all evidence carefully to reach a just result” (cedarlawpllc.com).
Finally, there are broader implications for academic freedom and assessment. If instructors were to let fear of AI cheating drive them to use opaque tools, they might also chill legitimate student expression or push students toward homogenized writing. Some ethicists argue that the very concept of AI text detection may be a “technological dead end” – because writing is too variable and AI is explicitly designed to mimic human style, trying to perfectly separate the two may be futile (bibek-poudel.medium.com). A more ethical response, they suggest, is to teach students how to use AI responsibly and adapt educational practices, rather than leaning on surveillance technology that cannot guarantee fairness.
Conclusion
Current scholarly and critical perspectives converge on a clear message: today’s AI content detectors are not fully reliable or equitable tools, and their unchecked use can do more harm than good. While the idea of an “AI lie detector” is appealing in theory, in practice these programs struggle with both false negatives (missing AI-written text) and false positives (flagging innocent writing) to a degree that undermines their utility. The lack of transparency and independent validation further erodes confidence, as does evidence of bias against certain writers. Across academia, educators and researchers are warning that an over-reliance on AI detectors could lead to wrongful accusations, damaged student-teacher trust, and even legal repercussions. Instead of providing a quick fix to AI-facilitated cheating, these tools have become an object of controversy and caution.
In the educational community, a shift is underway – away from automated detection and toward pedagogy and policy solutions. Many universities have scaled back use of detectors, opting to train faculty in better assessment design, set clear guidelines for AI use, and foster open dialogue with students about the role of AI in learning (mprnews.org). Researchers are continuing to study detection methods, but most acknowledge that as AI writing gets more advanced, the detection arms race will only intensify (edintegrity.biomedcentral.com). In the meantime, the consensus is that any use of AI content detectors must be coupled with human judgment, skepticism of the results, and a recognition of the tools’ limits (edintegrity.biomedcentral.com). The overarching lesson from the past two years is one of caution: integrity in education is best upheld by informed teaching practices and fair processes, not by uncritically trusting in artificial intelligence to police itself (cedarlawpllc.com; citl.news.niu.edu).


