AI-Generated Research Papers: 2026 Statistics on Retractions, Peer Review, and Journal Policies
papers retracted from Wiley’s Hindawi portfolio between 2022 and 2024 — the largest single retraction event in academic publishing history, fueled by AI-assisted paper mills.
Key Takeaways
- 22% of computer science papers analyzed in 2024 show signs of LLM-generated content — the highest rate of any field (Science.org)
- 15.8% of peer reviews at ICLR 2024 were written with the help of an LLM, across 4,428 of 28,028 reviews (arXiv 2405.02150)
- 49.4% of ICLR 2024 paper submissions received at least one AI-assisted review (Stanford / Liang et al.)
- 57% of scientists in Nature’s 2025 survey admitted to using AI for writing help in the past two years (Nature)
- 15,000+ papers flagged by the Problematic Paper Screener for tortured AI paraphrases like “nucleic corrosive” for “nucleic acid” (Cabanac et al.)
- 139 GPT-fabricated papers identified on Google Scholar — two-thirds via undisclosed ChatGPT use (HKS Misinformation Review)
- $35-40M in revenue Wiley lost in a single fiscal year tied directly to the Hindawi AI paper-mill scandal (Dark Daily)
- 5 major publishers (Elsevier, Springer Nature, Wiley, Taylor & Francis, SAGE) explicitly ban AI authorship while permitting disclosed AI assistance (SciPub+)
The Retraction Tsunami: 11,300 Papers and Counting
The 2024 Wiley/Hindawi event is the largest retraction wave in the history of academic publishing — and AI is implicated at every step. According to reporting in The Register, paper mills used large language models to mass-produce manuscripts with fabricated data, plagiarised text, and hallucinated citations. Wiley shut down 19 scholarly journals as a result, and disclosed $35-40 million in lost annual revenue. The same dynamic that drives growth in the AI detection industry is also driving its inverse: industrial-scale generation of fake scholarship that detection cannot keep pace with.
| Retraction Event | Volume | Year(s) | Primary Driver |
|---|---|---|---|
| Wiley / Hindawi portfolio | 11,300+ | 2022-2024 | AI-assisted paper mills |
| Total fraudulent withdrawals (global) | 10,000+ | 2023 | Fake peer review + AI generation |
| Fake peer review retractions (cumulative) | 6,400+ | 2024-2025 | Compromised review pipelines |
| Retraction Watch total corpus | 55,000 | through Aug 2025 | All causes |
| AI-related retractions (peak year) | 667 | 2023 | Frontiers systematic review |
| Saveetha University authors | 80+ | 2024 | Mass paper-mill output |
Wiley’s disclosed annual revenue loss directly attributable to the Hindawi AI paper-mill scandal. The financial damage to publishers now rivals what individual universities spend defending against false-positive AI accusations.
Source: Dark Daily
Frontiers in Research Metrics published a systematic review showing AI-related retractions peaked at 667 in 2023, with the curve continuing to climb through 2024. Retraction Watch has documented at least one journal — Neurosurgical Review — that paused accepting commentaries after being overwhelmed by LLM-generated submissions, while Saveetha University authors saw at least 80 retractions in 2024 alone. The retraction infrastructure built for occasional misconduct cannot scale to the volume that paper mills now produce. This mirrors what we documented in the AI humanizer industry report: every defensive system in the integrity ecosystem is being outpaced by the generation side.
How Much AI Is Actually In Published Papers
Two measurement methods produce wildly different numbers, and both matter. The first is statistical word-frequency analysis: comparing the language of post-ChatGPT papers to pre-2022 baselines reveals shifts in token distribution that signal LLM input. Stanford’s Liang et al. analysis using this method found 17.5% of computer science papers contained at least some AI-drafted content. Science.org’s reporting on related work put the figure as high as 22% for CS — the most AI-saturated field by a wide margin.
The second method is self-report: surveys ask researchers directly whether they used AI. Nature’s 2023 survey found 30% of scientists had used generative AI to help write papers. By the time Nature’s 2025 follow-up ran, that figure climbed to 57% in the past two years and 72% in the next two. The gap between detected AI text in published papers (1-3% in some conservative analyses) and self-reported AI assistance (57%) tells the most important story: most researchers use AI as an editor or co-pilot, not as a ghost-writer, and that quiet middle ground is essentially undetectable. Detection Drama’s prior reporting on what makes writing sound AI-generated to humans reinforces this: when AI is used to polish rather than draft, the linguistic signal disappears.
of computer science papers analyzed in 2024 contained probable LLM input — the highest documented infiltration rate for any scientific discipline. CS is both the easiest field for AI to write convincingly and the field whose researchers are most enthusiastic about using it.
Source: Science.org analysis
AI in Peer Review: Half of ICLR Submissions Hit a Bot
If AI in written papers is the headline, AI in peer review is the buried lead. The AI Review Lottery study analyzed all 28,028 reviews submitted to the International Conference on Learning Representations (ICLR) in 2024 and classified 4,428 of them — 15.8% — as crafted with LLM assistance. Crucially, 49.4% of submissions received at least one AI-assisted review, meaning roughly half of all ICLR authors had their work judged in part by a language model rather than a human reviewer. The same researchers also found that AI-assisted reviews boosted paper scores and acceptance rates, suggesting a systematic distortion of which research enters the citation graph.
A parallel Stanford HAI analysis of 50,000 CS conference peer reviews from 2023-2024 estimated up to 17% of all review sentences were likely written by an LLM. The implication is that the integrity layer most readers assume protects published research — disinterested human expert review — is itself being delegated to AI. This is happening in parallel to the issue we documented in our professors using ChatGPT report, where instructors increasingly use AI for the same grading and feedback tasks they penalize students for automating.
| Venue / Study | Reviews Analyzed | AI Footprint | Source |
|---|---|---|---|
| ICLR 2024 (Liang et al.) | 28,028 | 15.8% LLM-assisted | arXiv 2405.02150 |
| ICLR 2024 submissions with ≥1 AI review | 49.4% | ~half of papers | arXiv 2405.02150 |
| CS Conference reviews (Stanford) | 50,000 | Up to 17% of sentences | Stanford HAI |
| ICLR 2024 sentences modified by ChatGPT | 10.6% | substantially modified | arXiv 2403.07183 |
of ICLR 2024 paper submissions received at least one AI-assisted peer review. Roughly half of all reviewed papers had their fate partially decided by a language model — and AI-assisted reviews systematically inflated paper scores.
Source: arXiv 2405.02150 (Liang et al.)
Detection Signals: What Actually Works on Published Papers
Investigators tracking AI-generated papers rely on three signal types, none of which resemble what commercial AI detectors do. The first is leaked LLM phrases: a Wiley Learned Publishing analysis documented how queries like “as of my last knowledge update” and “certainly, here is” surface thousands of papers on Google Scholar that authors forgot to redact. The phrase “regenerate response” has appeared verbatim in dozens of indexed manuscripts. These are not detection-tool outputs — they are forensic search queries any reader can run.
The second signal is tortured phrases. Guillaume Cabanac and collaborators built the Problematic Paper Screener, which scans 130 million scientific publications weekly using nine detectors. The flagship detector catches machine-paraphrased text in which terminology has been rewritten by synonym substitution — “nucleic corrosive” instead of “nucleic acid”, “counterfeit conscience” instead of “artificial intelligence”. The system has flagged over 15,000 papers, providing the most reliable corpus of confirmed-suspicious literature in the field.
The third method — statistical word-frequency shift analysis — is how Stanford and Nature produced their 17.5% and 22% headline figures. None of these methods resemble the commercial AI-detection tools used in classrooms, which our false positive statistics report documents as performing erratically on academic prose. As covered in our ESL bias research, the same detectors that flag Charles Dickens at 95% AI are useless on real journal submissions — which is precisely why publishers built their own forensic pipelines instead.

Publisher Policies in 2026: The Five-Publisher Consensus
By mid-2024, the five largest academic publishers had converged on near-identical AI policies. A SciPub+ comparison documents the consensus: AI tools cannot be listed as authors because authorship implies a responsibility for the work that no algorithm can take on. ChatGPT cannot be sued, cannot be reprimanded, cannot retract a co-authored claim. Authorship without accountability is impossible.
| Publisher | AI as Author? | Disclosure Required? | Special Note |
|---|---|---|---|
| Elsevier | Prohibited | Yes, in dedicated section | AI-generated images banned in articles |
| Springer Nature | Prohibited | Yes, in methods or acknowledgments | Explicitly bans AI-image generation in scientific manuscripts |
| Wiley | Prohibited | Yes | Built proprietary paper-mill detection tool post-Hindawi |
| Taylor & Francis | Prohibited | Yes | Authors retain full responsibility for AI-assisted content |
| SAGE | Prohibited | Yes | Disclosure must specify which sections used AI |
The disclosure requirement is where enforcement falls apart. There is no mechanism for journals to verify whether disclosure is truthful, and Stanford’s data suggests the disclosure rate is dramatically below the actual usage rate — researchers admitted in surveys to using AI without acknowledging it in the corresponding manuscripts. The result is a policy regime that publicly forbids what 22-57% of researchers privately do anyway. The same enforcement gap appears in classroom contexts, as documented in our AI detection lawsuit tracker and our analysis of AI cheating consequences at universities.
When ChatGPT was asked 30 times each to evaluate 217 retracted papers, not one of the 6,510 outputs flagged the retraction. The same LLMs being used to write papers are blind to the integrity record of the literature they cite.
Source: Retraction Watch (2025)
📊 Field-by-Field AI Infiltration Calculator
Click any column header to sort. Estimated AI prevalence is drawn from the studies cited in the methodology section. Hover a row for source.
| Field / Venue | AI Footprint (%) | Sample Size | Year | Source |
|---|---|---|---|---|
| Computer Science papers | 22.0 | large corpus | 2024 | Science.org |
| Computer Science (Stanford method) | 17.5 | large corpus | 2024 | Stanford HAI |
| ICLR 2024 peer reviews | 15.8 | 28,028 | 2024 | arXiv 2405.02150 |
| CS conference review sentences | 17.0 | 50,000 | 2024 | Stanford HAI |
| ICLR 2024 sentences modified | 10.6 | 28,028 | 2024 | arXiv 2403.07183 |
| Scientific introductions (avg.) | 3.0 | varied | 2023-24 | NCBI / PMC |
| Self-reported (Nature 2023) | 30.0 | 1,600 | 2023 | Nature |
| Self-reported (Nature 2025) | 57.0 | survey | 2025 | Nature / Engineering |
| Self-reported (next two years) | 72.0 | survey | 2025 | Nature |
| Detected fabricated (Google Scholar) | 0.001 | 139 papers | 2024 | HKS Misinfo Review |

Methodology & Inclusion Criteria
Statistics in this report were drawn from peer-reviewed studies (arXiv preprints noted as such), publisher disclosures, and recognized integrity databases (Retraction Watch, the Problematic Paper Screener, the HKS Misinformation Review). Where multiple methodologies produced different headline figures — for instance Stanford’s 17.5% and Science.org’s 22% for AI infiltration of CS papers — both are reported with their sources. Self-report survey numbers (Nature 2023, Nature 2025) are listed separately from corpus-detection numbers because the two measure different phenomena: stated usage versus detectable usage. Retraction totals are current as of August 2025 per Retraction Watch; ICLR and peer-review figures are from 2024 conference cycles. The fact that detected GPT-fabrication (139 papers on Google Scholar) is orders of magnitude smaller than self-reported AI assistance (57%) reflects detection difficulty, not absence — and is itself one of the most important findings in this report.
Frequently Asked Questions
How many research papers have been retracted because of AI use?
Wiley alone retracted more than 11,300 papers from its Hindawi portfolio between 2022 and 2024 in the largest retraction event in publishing history, and at least 10,000 fraudulent articles were withdrawn across scientific journals in 2023. AI-assisted paper mills are a primary driver, with annual AI-related retractions peaking at 667 in 2023 per a Frontiers systematic review.
What percentage of research papers are written with ChatGPT or other AI?
Estimates vary by field and methodology. Stanford researchers found 17.5% of computer science papers contain AI-drafted content, and a Science.org analysis put the figure as high as 22% for CS. In self-report surveys, 30% of scientists in Nature’s 2023 survey and 57% in the 2025 follow-up admitted using AI for paper writing.
Can AI detectors identify AI-generated academic papers?
Detection is unreliable, as our broader false positive rates report documents. The Problematic Paper Screener developed by Guillaume Cabanac scans 130 million papers weekly using tortured-phrase matching and other heuristics, flagging over 15,000 suspicious papers, but it still requires human expert review to confirm misconduct. Commercial AI detectors perform poorly on academic prose.
Do journals allow ChatGPT to be a co-author on research papers?
No. Every major publisher — Elsevier, Springer Nature, Wiley, Taylor & Francis, and SAGE — explicitly prohibits AI tools from being listed as authors because authorship requires accountability that AI cannot provide. However, all five publishers permit disclosed AI assistance in a dedicated methods or acknowledgments section.
How often is AI used to write peer reviews?
A 2024 study of all 28,028 reviews submitted to the ICLR conference found 15.8% were written with LLM assistance, and 49.4% of paper submissions received at least one AI-assisted review. A Nature analysis of 50,000 CS conference reviews estimated up to 17% of all review sentences were LLM-generated.
How do investigators find AI-written papers in the wild?
Three signals dominate. First, leaked ChatGPT phrases like “as of my last knowledge update” and “certainly, here is” are searchable on Google Scholar. Second, tortured paraphrases like “nucleic corrosive” for “nucleic acid” are caught by the Problematic Paper Screener. Third, statistical word-frequency analyses compare post-ChatGPT papers to pre-2022 baselines — the method behind Stanford’s 17.5% figure.
Sources & References
- The Register — Wiley shuts 19 scholarly journals amid AI paper mill problem
- The Epoch Times — Wiley Shuts Down 19 Journals Amid Research Fraud Scandal
- ULiège Library — 10,000 fraudulent articles withdrawn from scientific journals in 2023
- Retraction Watch — Springer Nature journal clears AI papers
- Frontiers — Artificial intelligence in the retraction spotlight (systematic review)
- Dark Daily — Wiley Launches Paper Mill Detection Tool
- Science (AAAS) — One-fifth of computer science papers may include AI content
- Stanford HAI — How Much Research Is Being Written by Large Language Models
- Stanford HAI — AI’s Growing Role as Scientific Peer Reviewer
- arXiv 2405.02150 — The AI Review Lottery (Liang et al.)
- arXiv 2403.07183 — Monitoring AI-Modified Content at Scale
- HKS Misinformation Review — GPT-fabricated scientific papers on Google Scholar
- The Conversation — Problematic Paper Screener
- Wiley Learned Publishing — “As of my last knowledge update”: ChatGPT content in premier journals
- SciPub+ — Elsevier vs. Springer Nature: Comparing AI Policies
- Engineering — Scientists Increasingly Using AI to Help Write Papers (Nature 2025 follow-up analysis)
- Retraction Watch — AI Unreliable in Identifying Retracted Research Papers
- Litmaps — ChatGPT for Research: Do’s and Don’ts (Nature 2023 survey reference)
