AI-Generated Research Papers: 2026 Statistics on Retractions, Peer Review, and Journal Policies


By Detection Drama Research Team · Updated May 26, 2026 · 11 min read
11,300+
papers retracted from Wiley’s Hindawi portfolio between 2022 and 2024 — the largest single retraction event in academic publishing history, fueled by AI-assisted paper mills.

Source: The Register · Wiley publisher disclosures

Key Takeaways

  • 22% of computer science papers analyzed in 2024 show signs of LLM-generated content — the highest rate of any field (Science.org)
  • 15.8% of peer reviews at ICLR 2024 were written with the help of an LLM, across 4,428 of 28,028 reviews (arXiv 2405.02150)
  • 49.4% of ICLR 2024 paper submissions received at least one AI-assisted review (arXiv 2405.02150, Latona et al.)
  • 57% of scientists in Nature’s 2025 survey admitted to using AI for writing help in the past two years (Nature)
  • 15,000+ papers flagged by the Problematic Paper Screener for tortured AI paraphrases like “nucleic corrosive” for “nucleic acid” (Cabanac et al.)
  • 139 GPT-fabricated papers identified on Google Scholar — two-thirds via undisclosed ChatGPT use (HKS Misinformation Review)
  • $35-40M in revenue Wiley lost in a single fiscal year tied directly to the Hindawi AI paper-mill scandal (Dark Daily)
  • 5 major publishers (Elsevier, Springer Nature, Wiley, Taylor & Francis, SAGE) explicitly ban AI authorship while permitting disclosed AI assistance (SciPub+)
Section 1

The Retraction Tsunami: 11,300 Papers and Counting

Wiley retracted more than 11,300 papers from its Hindawi portfolio between 2022 and 2024, shuttered 19 journals, and lost $35-40 million in revenue. The underlying mechanism: AI-assisted paper mills industrializing what used to be one-off scientific fraud.

The Wiley/Hindawi retractions of 2022-2024 form the largest retraction wave in the history of academic publishing, and AI is implicated at every step. According to reporting in The Register, paper mills used large language models to mass-produce manuscripts with fabricated data, plagiarized text, and hallucinated citations. Wiley shut down 19 scholarly journals as a result and disclosed $35-40 million in lost annual revenue. The same dynamic that drives growth in the AI detection industry is also driving its inverse: industrial-scale generation of fake scholarship that detection cannot keep pace with.

| Retraction Event | Volume | Year(s) | Primary Driver |
| --- | --- | --- | --- |
| Wiley / Hindawi portfolio | 11,300+ | 2022-2024 | AI-assisted paper mills |
| Total fraudulent withdrawals (global) | 10,000+ | 2023 | Fake peer review + AI generation |
| Fake peer review retractions (cumulative) | 6,400+ | 2024-2025 | Compromised review pipelines |
| Retraction Watch total corpus | 55,000 | through Aug 2025 | All causes |
| AI-related retractions (peak year) | 667 | 2023 | Frontiers systematic review |
| Saveetha University authors | 80+ | 2024 | Mass paper-mill output |

$35-40M
Wiley’s disclosed annual revenue loss directly attributable to the Hindawi AI paper-mill scandal. The financial damage to publishers now rivals what individual universities spend defending against false-positive AI accusations.
Source: Dark Daily

Frontiers in Research Metrics published a systematic review showing AI-related retractions peaked at 667 in 2023, with the curve continuing to climb through 2024. Retraction Watch has documented at least one journal — Neurosurgical Review — that paused accepting commentaries after being overwhelmed by LLM-generated submissions, while Saveetha University authors saw at least 80 retractions in 2024 alone. The retraction infrastructure built for occasional misconduct cannot scale to the volume that paper mills now produce. This mirrors what we documented in the AI humanizer industry report: every defensive system in the integrity ecosystem is being outpaced by the generation side.

Section 2

How Much AI Is Actually In Published Papers

Estimates vary by methodology. Stanford put computer science at 17.5% AI-drafted; Science.org went as high as 22% for CS. Self-reported usage is dramatically higher: 30% of scientists in 2023, jumping to 57% in Nature’s 2025 follow-up.

Two measurement methods produce wildly different numbers, and both matter. The first is statistical word-frequency analysis: comparing the language of post-ChatGPT papers to pre-2022 baselines reveals shifts in token distribution that signal LLM input. Stanford’s Liang et al. analysis using this method found 17.5% of computer science papers contained at least some AI-drafted content. Science.org’s reporting on related work put the figure as high as 22% for CS — the most AI-saturated field by a wide margin.
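
As a rough illustration of the word-frequency idea, the sketch below compares how often each word appears per thousand tokens in a baseline corpus and in a later corpus, then ranks words by the ratio. It is not the estimator used in the Stanford analysis; the two toy corpora, the function names, and the smoothing constant are all assumptions made up for this example.

```python
from collections import Counter
import re


def rates_per_thousand(texts):
    """Return each word's frequency per 1,000 tokens across a list of texts."""
    tokens = [w for t in texts for w in re.findall(r"[a-z']+", t.lower())]
    counts = Counter(tokens)
    total = len(tokens) or 1
    return {w: 1000 * c / total for w, c in counts.items()}


def frequency_shift(baseline_texts, recent_texts, smoothing=0.5):
    """Rank words by how much more frequent they are in recent_texts.

    smoothing is a hypothetical constant added to both rates so that words
    missing from one corpus do not cause a division by zero.
    """
    base = rates_per_thousand(baseline_texts)
    recent = rates_per_thousand(recent_texts)
    vocab = set(base) | set(recent)
    ratios = {
        w: (recent.get(w, 0.0) + smoothing) / (base.get(w, 0.0) + smoothing)
        for w in vocab
    }
    return sorted(ratios.items(), key=lambda kv: kv[1], reverse=True)


# Toy corpora invented for this example; a real analysis needs thousands of documents.
baseline = ["we study the model and report the results of the experiment"]
recent = ["we delve into the model and showcase the intricate results"]

shifts = frequency_shift(baseline, recent)
print(shifts[:3])  # words that appear only in the recent corpus rank first
```

A ratio well above 1 marks a word that became more common after the baseline period. On its own that proves nothing about any single paper, which is why corpus-level estimates like these are reported as population shares rather than per-paper verdicts.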

The second method is self-report: surveys ask researchers directly whether they used AI. Nature’s 2023 survey found 30% of scientists had used generative AI to help write papers. In Nature’s 2025 follow-up, 57% reported using AI for writing help in the past two years, and 72% expected to do so in the next two. The gap between detected AI text in published papers (1-3% in some conservative analyses) and self-reported AI assistance (57%) tells the most important story: most researchers use AI as an editor or co-pilot, not as a ghost-writer, and that quiet middle ground is essentially undetectable. Detection Drama’s prior reporting on what makes writing sound AI-generated to humans reinforces this: when AI is used to polish rather than draft, the linguistic signal disappears.

AI Footprint in Academic Papers by Measurement Method

  • Self-reported (Nature 2025): 57%
  • Self-reported (Nature 2023): 30%
  • CS papers (Science.org): 22%
  • CS papers (Stanford): 17.5%
  • CS conference peer review sentences (Stanford HAI): up to 17%
  • Detected fabricated (Google Scholar): <1%

22%
of computer science papers analyzed in 2024 contained probable LLM input — the highest documented infiltration rate for any scientific discipline. CS is both the easiest field for AI to write convincingly and the field whose researchers are most enthusiastic about using it.
Source: Science.org analysis
Section 3

AI in Peer Review: Half of ICLR Submissions Hit a Bot

A 2024 study of all 28,028 reviews submitted to ICLR found 15.8% were written with LLM assistance, and 49.4% of paper submissions received at least one AI-assisted review. A separate Stanford HAI analysis of 50,000 CS conference reviews put per-sentence LLM authorship at up to 17%.

If AI in written papers is the headline, AI in peer review is the buried lede. The AI Review Lottery study analyzed all 28,028 reviews submitted to the International Conference on Learning Representations (ICLR) in 2024 and classified 4,428 of them (15.8%) as crafted with LLM assistance. Crucially, 49.4% of submissions received at least one AI-assisted review, meaning roughly half of all ICLR authors had their work judged in part with the help of a language model rather than by human reviewers alone. The same researchers also found that AI-assisted reviews boosted paper scores and acceptance rates, suggesting a systematic distortion of which research enters the citation graph.
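
The headline share follows directly from the two review counts quoted above. A quick check, using only those figures:

```python
# Counts quoted from the AI Review Lottery study as reported above.
total_reviews = 28_028
llm_assisted_reviews = 4_428

share = llm_assisted_reviews / total_reviews
print(f"{share:.1%}")  # 15.8%
```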

A parallel Stanford HAI analysis of 50,000 CS conference peer reviews from 2023-2024 estimated up to 17% of all review sentences were likely written by an LLM. The implication is that the integrity layer most readers assume protects published research — disinterested human expert review — is itself being delegated to AI. This is happening in parallel to the issue we documented in our professors using ChatGPT report, where instructors increasingly use AI for the same grading and feedback tasks they penalize students for automating.

| Venue / Study | Reviews Analyzed | AI Footprint | Source |
| --- | --- | --- | --- |
| ICLR 2024 reviews (Latona et al.) | 28,028 | 15.8% LLM-assisted | arXiv 2405.02150 |
| ICLR 2024 submissions with ≥1 AI review | 28,028 | 49.4% (~half of papers) | arXiv 2405.02150 |
| CS conference reviews (Stanford) | 50,000 | Up to 17% of sentences | Stanford HAI |
| ICLR 2024 sentences modified by ChatGPT | not stated | 10.6% substantially modified | arXiv 2403.07183 |

49.4%
of ICLR 2024 paper submissions received at least one AI-assisted peer review. Roughly half of all reviewed papers had their fate partially decided by a language model — and AI-assisted reviews systematically inflated paper scores.
Source: arXiv 2405.02150 (Latona et al.)
Section 4

Detection Signals: What Actually Works on Published Papers

Three signals dominate verified detection: leaked ChatGPT phrases, tortured paraphrases, and statistical word-frequency shifts. The Problematic Paper Screener scans 130 million papers weekly and has flagged over 15,000. Commercial AI detectors, by contrast, perform poorly on academic prose.

Investigators tracking AI-generated papers rely on three signal types, none of which resemble what commercial AI detectors do. The first is leaked LLM phrases: a Wiley Learned Publishing analysis documented how queries like “as of my last knowledge update” and “certainly, here is” surface thousands of papers on Google Scholar that authors forgot to redact. The phrase “regenerate response” has appeared verbatim in dozens of indexed manuscripts. These are not detection-tool outputs — they are forensic search queries any reader can run.
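
The leaked-phrase check is simple enough to sketch in a few lines. The example below scans a block of text for the three phrases quoted above; the function name and the sample snippet are hypothetical, and a real screen would run a longer curated list against full-text search results.

```python
import re

# Phrases quoted in the text above; a real screen would use a longer, curated list.
LEAKED_PHRASES = [
    "as of my last knowledge update",
    "certainly, here is",
    "regenerate response",
]


def find_leaked_phrases(text):
    """Return the leaked chatbot phrases found in text, ignoring case and line breaks."""
    normalized = re.sub(r"\s+", " ", text.lower())
    return [phrase for phrase in LEAKED_PHRASES if phrase in normalized]


# Hypothetical manuscript snippet, written for this example.
sample = "Certainly, here is a possible introduction for your topic:\nLithium-metal batteries..."
print(find_leaked_phrases(sample))  # ['certainly, here is']
```

A match is a lead for a human to follow up, not proof of misconduct: a paper that studies chatbots may quote these phrases legitimately.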

The second signal is tortured phrases. Guillaume Cabanac and collaborators built the Problematic Paper Screener, which scans 130 million scientific publications weekly using nine detectors. The flagship detector catches machine-paraphrased text in which terminology has been rewritten by synonym substitution — “nucleic corrosive” instead of “nucleic acid”, “counterfeit conscience” instead of “artificial intelligence”. The system has flagged over 15,000 papers, providing the most reliable corpus of confirmed-suspicious literature in the field.
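
A minimal sketch of tortured-phrase matching, assuming a simple lookup table from each tortured phrase to the established term it replaces. The table holds only the two pairs quoted above; the Problematic Paper Screener maintains its own far larger list, and the function name and sample sentence here are hypothetical.

```python
import re

# The two pairs quoted in the text above. This mapping is only an illustration,
# not the Problematic Paper Screener's actual list.
TORTURED_PHRASES = {
    "nucleic corrosive": "nucleic acid",
    "counterfeit conscience": "artificial intelligence",
}


def flag_tortured_phrases(text):
    """Return (tortured phrase, expected term, count) for each known phrase found in text."""
    lowered = text.lower()
    hits = []
    for phrase, expected in TORTURED_PHRASES.items():
        pattern = r"\b" + re.escape(phrase) + r"\b"
        count = len(re.findall(pattern, lowered))
        if count:
            hits.append((phrase, expected, count))
    return hits


# Hypothetical sentence written for this example.
sample = "Counterfeit conscience methods were applied to nucleic corrosive sequencing data."
for phrase, expected, count in flag_tortured_phrases(sample):
    print(f"{phrase!r} (expected {expected!r}): {count} occurrence(s)")
```

Exact matching like this keeps false positives low because the flagged phrases almost never occur in legitimate writing, but it only catches substitutions that someone has already catalogued.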

The third method, statistical word-frequency shift analysis, is how the Stanford and Science.org analyses produced their 17.5% and 22% headline figures. None of these methods resembles the commercial AI-detection tools used in classrooms, which our false positive statistics report documents as performing erratically on academic prose. As covered in our ESL bias research, the same detectors that flag Charles Dickens at 95% AI are useless on real journal submissions, which is precisely why publishers built their own forensic pipelines instead.

Six headline statistics on AI infiltration of academic publishing in 2026.
Section 5

Publisher Policies in 2026: The Five-Publisher Consensus

Every major publisher — Elsevier, Springer Nature, Wiley, Taylor & Francis, and SAGE — explicitly prohibits AI tools from being listed as authors but requires disclosure of any AI use during writing. The reasoning is identical across all five: authorship requires accountability that AI cannot provide.

By mid-2024, the five largest academic publishers had converged on near-identical AI policies. A SciPub+ comparison documents the consensus: AI tools cannot be listed as authors because authorship implies a responsibility for the work that no algorithm can take on. ChatGPT cannot be sued, cannot be reprimanded, cannot retract a co-authored claim. Authorship without accountability is impossible.

| Publisher | AI as Author? | Disclosure Required? | Special Note |
| --- | --- | --- | --- |
| Elsevier | Prohibited | Yes, in dedicated section | AI-generated images banned in articles |
| Springer Nature | Prohibited | Yes, in methods or acknowledgments | Explicitly bans AI-image generation in scientific manuscripts |
| Wiley | Prohibited | Yes | Built proprietary paper-mill detection tool post-Hindawi |
| Taylor & Francis | Prohibited | Yes | Authors retain full responsibility for AI-assisted content |
| SAGE | Prohibited | Yes | Disclosure must specify which sections used AI |

The disclosure requirement is where enforcement falls apart. There is no mechanism for journals to verify whether disclosure is truthful, and the survey data suggest the disclosure rate is far below the actual usage rate: researchers admit in surveys to using AI without acknowledging it in the corresponding manuscripts. The result is a policy regime that demands disclosure of something an estimated 22-57% of researchers do, often without saying so. The same enforcement gap appears in classroom contexts, as documented in our AI detection lawsuit tracker and our analysis of AI cheating consequences at universities.

0 / 6,510
When ChatGPT was asked 30 times each to evaluate 217 retracted papers, not one of the 6,510 outputs flagged the retraction. The same LLMs being used to write papers are blind to the integrity record of the literature they cite.
Source: Retraction Watch (2025)

📊 Field-by-Field AI Infiltration Calculator

Estimated AI prevalence is drawn from the studies cited in the methodology section.

| Field / Venue | AI Footprint (%) | Sample Size | Year | Source |
| --- | --- | --- | --- | --- |
| Computer Science papers | 22.0 | large corpus | 2024 | Science.org |
| Computer Science (Stanford method) | 17.5 | large corpus | 2024 | Stanford HAI |
| ICLR 2024 peer reviews | 15.8 | 28,028 | 2024 | arXiv 2405.02150 |
| CS conference review sentences | 17.0 | 50,000 | 2024 | Stanford HAI |
| ICLR 2024 sentences modified | 10.6 | 28,028 | 2024 | arXiv 2403.07183 |
| Scientific introductions (avg.) | 3.0 | varied | 2023-24 | NCBI / PMC |
| Self-reported (Nature 2023) | 30.0 | 1,600 | 2023 | Nature |
| Self-reported (Nature 2025) | 57.0 | survey | 2025 | Nature / Engineering |
| Self-reported (next two years) | 72.0 | survey | 2025 | Nature |
| Detected fabricated (Google Scholar) | 0.001 | 139 papers | 2024 | HKS Misinfo Review |

Self-reporting dwarfs forensic detection — the gap is where the actual AI use lives.

Methodology & Inclusion Criteria

Statistics in this report were drawn from peer-reviewed studies (arXiv preprints noted as such), publisher disclosures, and recognized integrity databases (Retraction Watch, the Problematic Paper Screener, the HKS Misinformation Review). Where multiple methodologies produced different headline figures — for instance Stanford’s 17.5% and Science.org’s 22% for AI infiltration of CS papers — both are reported with their sources. Self-report survey numbers (Nature 2023, Nature 2025) are listed separately from corpus-detection numbers because the two measure different phenomena: stated usage versus detectable usage. Retraction totals are current as of August 2025 per Retraction Watch; ICLR and peer-review figures are from 2024 conference cycles. The fact that detected GPT-fabrication (139 papers on Google Scholar) is orders of magnitude smaller than self-reported AI assistance (57%) reflects detection difficulty, not absence — and is itself one of the most important findings in this report.

Frequently Asked Questions

How many research papers have been retracted because of AI use?

Wiley alone retracted more than 11,300 papers from its Hindawi portfolio between 2022 and 2024 in the largest retraction event in publishing history, and at least 10,000 fraudulent articles were withdrawn across scientific journals in 2023. AI-assisted paper mills are a primary driver, with annual AI-related retractions peaking at 667 in 2023 per a Frontiers systematic review.

What percentage of research papers are written with ChatGPT or other AI?

Estimates vary by field and methodology. Stanford researchers found 17.5% of computer science papers contain AI-drafted content, and a Science.org analysis put the figure as high as 22% for CS. In self-report surveys, 30% of scientists in Nature’s 2023 survey and 57% in the 2025 follow-up admitted using AI for paper writing.

Can AI detectors identify AI-generated academic papers?

Detection is unreliable, as our broader false positive rates report documents. The Problematic Paper Screener developed by Guillaume Cabanac scans 130 million papers weekly using tortured-phrase matching and other heuristics, flagging over 15,000 suspicious papers, but it still requires human expert review to confirm misconduct. Commercial AI detectors perform poorly on academic prose.

Do journals allow ChatGPT to be a co-author on research papers?

No. Every major publisher — Elsevier, Springer Nature, Wiley, Taylor & Francis, and SAGE — explicitly prohibits AI tools from being listed as authors because authorship requires accountability that AI cannot provide. However, all five publishers permit disclosed AI assistance in a dedicated methods or acknowledgments section.

How often is AI used to write peer reviews?

A 2024 study of all 28,028 reviews submitted to the ICLR conference found 15.8% were written with LLM assistance, and 49.4% of paper submissions received at least one AI-assisted review. A Stanford HAI analysis of 50,000 CS conference reviews estimated up to 17% of all review sentences were LLM-generated.

How do investigators find AI-written papers in the wild?

Three signals dominate. First, leaked ChatGPT phrases like “as of my last knowledge update” and “certainly, here is” are searchable on Google Scholar. Second, tortured paraphrases like “nucleic corrosive” for “nucleic acid” are caught by the Problematic Paper Screener. Third, statistical word-frequency analyses compare post-ChatGPT papers to pre-2022 baselines — the method behind Stanford’s 17.5% figure.

Sources & References

  1. The Register — Wiley shuts 19 scholarly journals amid AI paper mill problem
  2. The Epoch Times — Wiley Shuts Down 19 Journals Amid Research Fraud Scandal
  3. ULiège Library — 10,000 fraudulent articles withdrawn from scientific journals in 2023
  4. Retraction Watch — Springer Nature journal clears AI papers
  5. Frontiers — Artificial intelligence in the retraction spotlight (systematic review)
  6. Dark Daily — Wiley Launches Paper Mill Detection Tool
  7. Science (AAAS) — One-fifth of computer science papers may include AI content
  8. Stanford HAI — How Much Research Is Being Written by Large Language Models
  9. Stanford HAI — AI’s Growing Role as Scientific Peer Reviewer
  10. arXiv 2405.02150 — The AI Review Lottery (Latona et al.)
  11. arXiv 2403.07183 — Monitoring AI-Modified Content at Scale
  12. HKS Misinformation Review — GPT-fabricated scientific papers on Google Scholar
  13. The Conversation — Problematic Paper Screener
  14. Wiley Learned Publishing — “As of my last knowledge update”: ChatGPT content in premier journals
  15. SciPub+ — Elsevier vs. Springer Nature: Comparing AI Policies
  16. Engineering — Scientists Increasingly Using AI to Help Write Papers (Nature 2025 follow-up analysis)
  17. Retraction Watch — AI Unreliable in Identifying Retracted Research Papers
  18. Litmaps — ChatGPT for Research: Do’s and Don’ts (Nature 2023 survey reference)