AI-Generated Research Papers: 2026 Statistics on Retractions, Peer Review, and Journal Policies

Published:

May 25, 2026

Updated:

June 27, 2026

By Detection Drama Research Team · Updated May 26, 2026 · 11 min read

11,300+ papers retracted from Wiley’s Hindawi portfolio between 2022 and 2024 — the largest single retraction event in academic publishing history, fueled by AI-assisted paper mills.

Source: The Register · Wiley publisher disclosures

Key Takeaways

22% of computer science papers analyzed in 2024 show signs of LLM-generated content — the highest rate of any field (Science.org)
15.8% of peer reviews at ICLR 2024 were written with the help of an LLM, across 4,428 of 28,028 reviews (arXiv 2405.02150)
49.4% of ICLR 2024 paper submissions received at least one AI-assisted review (arXiv 2405.02150)
57% of scientists in Nature’s 2025 survey admitted to using AI for writing help in the past two years (Nature)
15,000+ papers flagged by the Problematic Paper Screener for tortured AI paraphrases like “nucleic corrosive” for “nucleic acid” (Cabanac et al.)
139 GPT-fabricated papers identified on Google Scholar — two-thirds via undisclosed ChatGPT use (HKS Misinformation Review)
$35-40M in revenue Wiley lost in a single fiscal year tied directly to the Hindawi AI paper-mill scandal (Dark Daily)
5 major publishers (Elsevier, Springer Nature, Wiley, Taylor & Francis, SAGE) explicitly ban AI authorship while permitting disclosed AI assistance (SciPub+)

What’s In This Report

The Retraction Tsunami
How Much AI Is In Papers
AI in Peer Review
Detection Signals That Work
Publisher Policies (2026)
Field-by-Field Calculator
Methodology
FAQ

Section 1

The Retraction Tsunami: 11,300 Papers and Counting

Wiley retracted more than 11,300 papers from its Hindawi portfolio between 2022 and 2024, shuttered 19 journals, and lost $35-40 million in revenue. The underlying mechanism: AI-assisted paper mills industrializing what used to be one-off scientific fraud.

The 2022-2024 Wiley/Hindawi event is the largest retraction wave in the history of academic publishing, and AI is implicated at every step. According to reporting in The Register, paper mills used large language models to mass-produce manuscripts with fabricated data, plagiarised text, and hallucinated citations. Wiley shut down 19 scholarly journals as a result and disclosed $35-40 million in lost annual revenue. The same dynamic that drives growth in the AI detection industry is also driving its inverse: industrial-scale generation of fake scholarship that detection cannot keep pace with.

Retraction Event	Volume	Year(s)	Primary Driver / Source
Wiley / Hindawi portfolio	11,300+	2022-2024	AI-assisted paper mills
Total fraudulent withdrawals (global)	10,000+	2023	Fake peer review + AI generation
Fake peer review retractions (cumulative)	6,400+	2024-2025	Compromised review pipelines
Retraction Watch total corpus	55,000	through Aug 2025	All causes
AI-related retractions (peak year)	667	2023	Frontiers systematic review
Saveetha University authors	80+	2024	Mass paper-mill output

$35-40M Wiley’s disclosed annual revenue loss directly attributable to the Hindawi AI paper-mill scandal. The financial damage to publishers now rivals what individual universities spend defending against false-positive AI accusations. Source: Dark Daily

Frontiers in Research Metrics published a systematic review showing AI-related retractions peaked at 667 in 2023, with the curve continuing to climb through 2024. Retraction Watch has documented at least one journal — Neurosurgical Review — that paused accepting commentaries after being overwhelmed by LLM-generated submissions, while Saveetha University authors saw at least 80 retractions in 2024 alone. The retraction infrastructure built for occasional misconduct cannot scale to the volume that paper mills now produce. This mirrors what we documented in the AI humanizer industry report: every defensive system in the integrity ecosystem is being outpaced by the generation side.

Section 2

How Much AI Is Actually In Published Papers

Estimates vary by methodology. Stanford put computer science at 17.5% AI-drafted; Science.org went as high as 22% for CS. Self-reported usage is dramatically higher: 30% of scientists in 2023, jumping to 57% in Nature’s 2025 follow-up.

Two measurement methods produce wildly different numbers, and both matter. The first is statistical word-frequency analysis: comparing the language of post-ChatGPT papers to pre-2022 baselines reveals shifts in token distribution that signal LLM input. Stanford’s Liang et al. analysis using this method found 17.5% of computer science papers contained at least some AI-drafted content. Science.org’s reporting on related work put the figure as high as 22% for CS — the most AI-saturated field by a wide margin.
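To make the word-frequency idea concrete, here is a minimal sketch of the comparison. It is not the Stanford method, which is a more elaborate statistical estimate. It only illustrates the underlying signal, that some words become far more common after ChatGPT than in a pre-2022 baseline. The toy corpora, the function names, the half-count smoothing for unseen words, and the `min_ratio` threshold are all assumptions made for illustration.

```python
import re
from collections import Counter

def word_counts(docs):
    """Count lowercase word tokens across a list of documents."""
    counts = Counter()
    for doc in docs:
        counts.update(re.findall(r"[a-z']+", doc.lower()))
    return counts

def frequency_shift(baseline_docs, recent_docs, min_ratio=2.0):
    """Return (word, ratio) pairs whose relative frequency rose by at least min_ratio."""
    base, recent = word_counts(baseline_docs), word_counts(recent_docs)
    base_total, recent_total = sum(base.values()), sum(recent.values())
    shifts = []
    for word, count in recent.items():
        # Assumption: treat a word unseen in the baseline as half an occurrence.
        base_count = base.get(word, 0) or 0.5
        ratio = (count / recent_total) / (base_count / base_total)
        if ratio >= min_ratio:
            shifts.append((word, ratio))
    return sorted(shifts, key=lambda item: item[1], reverse=True)

# Toy corpora, invented for illustration only.
baseline = [
    "we study the model and report results",
    "the results show the method works",
]
recent = [
    "we delve into the model and report intricate results",
    "we delve into the intricate method",
]
shifted = frequency_shift(baseline, recent)
for word, ratio in shifted:
    print(f"{word}: {ratio:.2f}x baseline frequency")
```

On real corpora the same comparison would need far larger samples and a control for topic drift, which is one reason different teams land on different headline percentages.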

The second method is self-report: surveys ask researchers directly whether they used AI. Nature’s 2023 survey found 30% of scientists had used generative AI to help write papers. In Nature’s 2025 follow-up, 57% reported using AI for writing help in the past two years, and 72% expected to use it in the next two. The gap between detected AI text in published papers (1-3% in some conservative analyses) and self-reported AI assistance (57%) tells the most important story: most researchers use AI as an editor or co-pilot, not as a ghost-writer, and that quiet middle ground is essentially undetectable. Detection Drama’s prior reporting on what makes writing sound AI-generated to humans reinforces this: when AI is used to polish rather than draft, the linguistic signal disappears.

AI Footprint in Academic Papers by Measurement Method

Measurement	AI Footprint
Self-reported (Nature 2025)	57%
Self-reported (Nature 2023)	30%
CS papers (Science.org)	22%
CS papers (Stanford)	17.5%
ICLR peer review sentences	17%
Detected fabricated (Google Scholar)	<1%

22% of computer science papers analyzed in 2024 contained probable LLM input — the highest documented infiltration rate for any scientific discipline. CS is both the easiest field for AI to write convincingly and the field whose researchers are most enthusiastic about using it. Source: Science.org analysis

Section 3

AI in Peer Review: Half of ICLR Submissions Hit a Bot

A 2024 study of all 28,028 reviews submitted to ICLR found 15.8% were written with LLM assistance, and 49.4% of paper submissions received at least one AI-assisted review. A separate Stanford HAI analysis of 50,000 CS conference reviews put per-sentence LLM authorship at up to 17%.

If AI in written papers is the headline, AI in peer review is the buried lede. The AI Review Lottery study analyzed all 28,028 reviews submitted to the International Conference on Learning Representations (ICLR) in 2024 and classified 4,428 of them (15.8%) as crafted with LLM assistance. Crucially, 49.4% of submissions received at least one AI-assisted review, meaning roughly half of all ICLR authors had their work judged in part by a language model rather than a human reviewer. The same researchers also found that AI-assisted reviews boosted paper scores and acceptance rates, suggesting a systematic distortion of which research enters the citation graph.
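The two ICLR figures can look inconsistent at first: how does 15.8% of reviews turn into roughly half of submissions? A rough sketch of the arithmetic follows. The first part only reproduces the 4,428 of 28,028 share. The second part is a back-of-the-envelope model, not something the study reports: it assumes a hypothetical four reviews per paper and treats each review as independently AI-assisted with probability 0.158.

```python
def share(part, whole):
    """Percentage of whole represented by part, rounded to one decimal."""
    return round(100 * part / whole, 1)

def at_least_one(p, k):
    """Probability that at least one of k independent reviews is AI-assisted."""
    return 1 - (1 - p) ** k

# Figures quoted in the text above.
llm_assisted_reviews = 4_428
total_reviews = 28_028
review_share = share(llm_assisted_reviews, total_reviews)

# Assumption for illustration only: four reviews per paper, independent of each other.
reviews_per_paper = 4
paper_share = at_least_one(review_share / 100, reviews_per_paper)

print(f"Share of reviews flagged: {review_share}%")
print(f"Modeled share of papers with >=1 flagged review: {paper_share:.1%}")
```

Under those assumptions the modeled share lands near the reported 49.4%, which suggests the two numbers are compatible even though real reviewer assignment is not independent.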

A parallel Stanford HAI analysis of 50,000 CS conference peer reviews from 2023-2024 estimated up to 17% of all review sentences were likely written by an LLM. The implication is that the integrity layer most readers assume protects published research — disinterested human expert review — is itself being delegated to AI. This is happening in parallel to the issue we documented in our professors using ChatGPT report, where instructors increasingly use AI for the same grading and feedback tasks they penalize students for automating.

Venue / Study	Reviews Analyzed	AI Footprint	Source
ICLR 2024 (AI Review Lottery study)	28,028	15.8% LLM-assisted	arXiv 2405.02150
ICLR 2024 submissions with ≥1 AI review	28,028	49.4% of submissions (~half of papers)	arXiv 2405.02150
CS Conference reviews (Stanford)	50,000	Up to 17% of sentences	Stanford HAI
ICLR 2024 sentences modified by ChatGPT	not stated	10.6% substantially modified	arXiv 2403.07183

49.4% of ICLR 2024 paper submissions received at least one AI-assisted peer review. Roughly half of all reviewed papers had their fate partially decided by a language model, and AI-assisted reviews systematically inflated paper scores. Source: arXiv 2405.02150 (AI Review Lottery study)

Section 4

Detection Signals: What Actually Works on Published Papers

Three signals dominate verified detection: leaked ChatGPT phrases, tortured paraphrases, and statistical word-frequency shifts. The Problematic Paper Screener scans 130 million papers weekly and has flagged over 15,000. Commercial AI detectors, by contrast, perform poorly on academic prose.

Investigators tracking AI-generated papers rely on three signal types, none of which resemble what commercial AI detectors do. The first is leaked LLM phrases: a Wiley Learned Publishing analysis documented how queries like “as of my last knowledge update” and “certainly, here is” surface thousands of papers on Google Scholar that authors forgot to redact. The phrase “regenerate response” has appeared verbatim in dozens of indexed manuscripts. These are not detection-tool outputs — they are forensic search queries any reader can run.
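The leaked-phrase check is simple enough to sketch in a few lines. The example below scans a block of text for the three phrases quoted above. The function name and the sample text are hypothetical, and a real investigation would run these as search queries against an index such as Google Scholar rather than against a local string.

```python
import re

# Phrases quoted in the paragraph above.
LEAKED_PHRASES = [
    "as of my last knowledge update",
    "certainly, here is",
    "regenerate response",
]

def find_leaked_phrases(text):
    """Return the leaked phrases found in text, ignoring case and extra whitespace."""
    normalized = re.sub(r"\s+", " ", text.lower())
    return [phrase for phrase in LEAKED_PHRASES if phrase in normalized]

# Hypothetical manuscript excerpt, invented for illustration.
sample = "Certainly, here is a possible introduction for your topic.\nRegenerate   response"
hits = find_leaked_phrases(sample)
print(hits)
```

A match is a lead, not a verdict: a paper that studies ChatGPT may quote these phrases legitimately, so each hit still needs a human read.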

The second signal is tortured phrases. Guillaume Cabanac and collaborators built the Problematic Paper Screener, which scans 130 million scientific publications weekly using nine detectors. The flagship detector catches machine-paraphrased text in which terminology has been rewritten by synonym substitution — “nucleic corrosive” instead of “nucleic acid”, “counterfeit conscience” instead of “artificial intelligence”. The system has flagged over 15,000 papers, providing the most reliable corpus of confirmed-suspicious literature in the field.
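Tortured-phrase matching can be sketched as a dictionary lookup. This is not the Problematic Paper Screener's implementation, whose internals are not described here. It is a minimal illustration that uses only the two phrase pairs quoted above, with a hypothetical function name and sample sentence.

```python
# Tortured phrase -> expected term, taken from the examples in the text.
TORTURED_PHRASES = {
    "nucleic corrosive": "nucleic acid",
    "counterfeit conscience": "artificial intelligence",
}

def flag_tortured_phrases(text):
    """Return (tortured, expected) pairs for every known tortured phrase found in text."""
    lowered = text.lower()
    return [(bad, good) for bad, good in TORTURED_PHRASES.items() if bad in lowered]

# Hypothetical sentence, invented for illustration.
sample = "Counterfeit conscience methods were used to classify nucleic corrosive sequences."
flags = flag_tortured_phrases(sample)
for bad, good in flags:
    print(f'Found "{bad}" (expected "{good}")')
```

The lookup only catches phrases already in the list, so its coverage depends entirely on how many known substitutions have been catalogued.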

The third method, statistical word-frequency shift analysis, is how Stanford and Science.org’s reporting arrived at their 17.5% and 22% headline figures. None of these methods resemble the commercial AI-detection tools used in classrooms, which our false positive statistics report documents as performing erratically on academic prose. As covered in our ESL bias research, the same detectors that flag Charles Dickens at 95% AI are useless on real journal submissions, which is precisely why publishers built their own forensic pipelines instead.

Six headline statistics on AI infiltration of academic publishing in 2026.

Section 5

Publisher Policies in 2026: The Five-Publisher Consensus

Every major publisher — Elsevier, Springer Nature, Wiley, Taylor & Francis, and SAGE — explicitly prohibits AI tools from being listed as authors but requires disclosure of any AI use during writing. The reasoning is identical across all five: authorship requires accountability that AI cannot provide.

By mid-2024, the five largest academic publishers had converged on near-identical AI policies. A SciPub+ comparison documents the consensus: AI tools cannot be listed as authors because authorship implies a responsibility for the work that no algorithm can take on. ChatGPT cannot be sued, cannot be reprimanded, cannot retract a co-authored claim. Authorship without accountability is impossible.

Publisher	AI as Author?	Disclosure Required?	Special Note
Elsevier	Prohibited	Yes, in dedicated section	AI-generated images banned in articles
Springer Nature	Prohibited	Yes, in methods or acknowledgments	Explicitly bans AI-image generation in scientific manuscripts
Wiley	Prohibited	Yes	Built proprietary paper-mill detection tool post-Hindawi
Taylor & Francis	Prohibited	Yes	Authors retain full responsibility for AI-assisted content
SAGE	Prohibited	Yes	Disclosure must specify which sections used AI

The disclosure requirement is where enforcement falls apart. Journals have no mechanism to verify whether a disclosure is truthful, and survey data suggests the disclosure rate is dramatically below the actual usage rate: researchers admit in surveys to using AI without acknowledging it in the corresponding manuscripts. The result is a policy regime that publicly forbids undisclosed AI use while the estimates above (22% of CS papers showing LLM input, 57% of surveyed scientists reporting AI writing help) suggest it is routine. The same enforcement gap appears in classroom contexts, as documented in our AI detection lawsuit tracker and our analysis of AI cheating consequences at universities.

0 / 6,510 When ChatGPT was asked 30 times each to evaluate 217 retracted papers, not one of the 6,510 outputs flagged the retraction. The same LLMs being used to write papers are blind to the integrity record of the literature they cite. Source: Retraction Watch (2025)

📊 Field-by-Field AI Infiltration Calculator

Estimated AI prevalence is drawn from the studies cited in the methodology section.

Field / Venue	AI Footprint (%)	Sample Size	Year	Source
Computer Science papers	22.0	large corpus	2024	Science.org
Computer Science (Stanford method)	17.5	large corpus	2024	Stanford HAI
ICLR 2024 peer reviews	15.8	28,028	2024	arXiv 2405.02150
CS conference review sentences	17.0	50,000	2024	Stanford HAI
ICLR 2024 sentences modified	10.6	28,028	2024	arXiv 2403.07183
Scientific introductions (avg.)	3.0	varied	2023-24	NCBI / PMC
Self-reported (Nature 2023)	30.0	1,600	2023	Nature
Self-reported (Nature 2025)	57.0	survey	2025	Nature / Engineering
Self-reported (next two years)	72.0	survey	2025	Nature
Detected fabricated (Google Scholar)	0.001	139 papers	2024	HKS Misinfo Review
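For readers working from a static copy of this table, the ranking can be reproduced with a short sort. The sketch below copies the Field / Venue and AI Footprint (%) columns from the rows above into a list and orders them from highest to lowest. The variable names are illustrative only.

```python
# (field or venue, AI footprint in percent), copied from the table above.
rows = [
    ("Computer Science papers", 22.0),
    ("Computer Science (Stanford method)", 17.5),
    ("ICLR 2024 peer reviews", 15.8),
    ("CS conference review sentences", 17.0),
    ("ICLR 2024 sentences modified", 10.6),
    ("Scientific introductions (avg.)", 3.0),
    ("Self-reported (Nature 2023)", 30.0),
    ("Self-reported (Nature 2025)", 57.0),
    ("Self-reported (next two years)", 72.0),
    ("Detected fabricated (Google Scholar)", 0.001),
]

# Sort by footprint, highest first.
by_footprint = sorted(rows, key=lambda row: row[1], reverse=True)

for name, pct in by_footprint:
    print(f"{pct:>7}  {name}")
```

Sorted this way, the three self-report rows sit at the top and the forensic detection figure sits at the bottom, which is the gap the report keeps returning to.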

Self-reporting dwarfs forensic detection — the gap is where the actual AI use lives.

Methodology & Inclusion Criteria

Statistics in this report were drawn from peer-reviewed studies (arXiv preprints noted as such), publisher disclosures, and recognized integrity databases (Retraction Watch, the Problematic Paper Screener, the HKS Misinformation Review). Where multiple methodologies produced different headline figures — for instance Stanford’s 17.5% and Science.org’s 22% for AI infiltration of CS papers — both are reported with their sources. Self-report survey numbers (Nature 2023, Nature 2025) are listed separately from corpus-detection numbers because the two measure different phenomena: stated usage versus detectable usage. Retraction totals are current as of August 2025 per Retraction Watch; ICLR and peer-review figures are from 2024 conference cycles. The fact that detected GPT-fabrication (139 papers on Google Scholar) is orders of magnitude smaller than self-reported AI assistance (57%) reflects detection difficulty, not absence — and is itself one of the most important findings in this report.

Frequently Asked Questions

How many research papers have been retracted because of AI use?

Wiley alone retracted more than 11,300 papers from its Hindawi portfolio between 2022 and 2024 in the largest retraction event in publishing history, and at least 10,000 fraudulent articles were withdrawn across scientific journals in 2023. AI-assisted paper mills are a primary driver, with annual AI-related retractions peaking at 667 in 2023 per a Frontiers systematic review.

What percentage of research papers are written with ChatGPT or other AI?

Estimates vary by field and methodology. Stanford researchers found 17.5% of computer science papers contain AI-drafted content, and a Science.org analysis put the figure as high as 22% for CS. In self-report surveys, 30% of scientists in Nature’s 2023 survey and 57% in the 2025 follow-up admitted using AI for paper writing.

Can AI detectors identify AI-generated academic papers?

Detection is unreliable, as our broader false positive rates report documents. The Problematic Paper Screener developed by Guillaume Cabanac scans 130 million papers weekly using tortured-phrase matching and other heuristics, flagging over 15,000 suspicious papers, but it still requires human expert review to confirm misconduct. Commercial AI detectors perform poorly on academic prose.

Do journals allow ChatGPT to be a co-author on research papers?

No. Every major publisher — Elsevier, Springer Nature, Wiley, Taylor & Francis, and SAGE — explicitly prohibits AI tools from being listed as authors because authorship requires accountability that AI cannot provide. However, all five publishers permit disclosed AI assistance in a dedicated methods or acknowledgments section.

How often is AI used to write peer reviews?

A 2024 study of all 28,028 reviews submitted to the ICLR conference found 15.8% were written with LLM assistance, and 49.4% of paper submissions received at least one AI-assisted review. A Stanford HAI analysis of 50,000 CS conference reviews estimated up to 17% of all review sentences were LLM-generated.

How do investigators find AI-written papers in the wild?

Three signals dominate. First, leaked ChatGPT phrases like “as of my last knowledge update” and “certainly, here is” are searchable on Google Scholar. Second, tortured paraphrases like “nucleic corrosive” for “nucleic acid” are caught by the Problematic Paper Screener. Third, statistical word-frequency analyses compare post-ChatGPT papers to pre-2022 baselines — the method behind Stanford’s 17.5% figure.

About the author

Written by

Detection Drama Staff
