ukiyo journal - 日本と世界をつなぐ新しいニュースメディア

The Era of "Mass-Produced" Papers with AI: Productivity Has Increased, But Can We Discern Quality?


December 23, 2025, 12:45

Is "More Papers with AI" Good News?

In December 2025, a study capturing the large-scale changes that generative AI (large language models, or LLMs) has brought to academia became a hot topic. According to Phys.org, scientists who use LLMs (e.g., ChatGPT) are visibly out-producing those who do not. Moreover, the growth is uneven across fields and regions, providing a particularly strong tailwind for researchers whose native language is not English. (Phys.org)


This news resonates because it simultaneously reveals the "structural bottleneck" that academic publishing has always faced and the "new bottleneck" created by generative AI. For researchers, less writing time is good news. But for reviewers, editors, and the society that uses research outcomes, the burden of reading and vetting skyrockets. The efficiency gains come back as costs elsewhere.


How Did the Study Measure "AI Usage"?

The foundation of this analysis is preprints (papers published before peer review). Researchers from Cornell University and UC Berkeley analyzed over 2 million abstracts submitted to large preprint servers such as arXiv, bioRxiv, and SSRN (January 2018 to June 2024). (Phys.org)


The key is how they estimated who used LLMs. According to Phys.org, the research team built an algorithm: they had GPT-3.5 (GPT-3.5 Turbo-0125) generate "AI-like abstracts" from pre-2023 abstracts, learned the differences in phrasing and statistical quirks relative to human-written text, and then detected whether those "fingerprints" appeared in later abstracts. (Phys.org)
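The general shape of such a fingerprint detector can be illustrated with a minimal, self-contained sketch. This is not the study's actual algorithm; the toy abstracts and the simple log-odds word-weighting below are illustrative assumptions, standing in for the real statistical model trained on millions of abstracts.

```python
# Sketch of the "fingerprint" idea (NOT the authors' code): learn which
# words are over-represented in LLM-generated abstracts relative to
# human-written ones, then score new abstracts by those weights.
# The tiny corpora below are made-up, illustrative examples.
from collections import Counter
import math

human = [
    "we measured the effect of fertilizer on yield in a small field trial",
    "our experiment shows mixed results and several open questions remain",
]
llm = [
    "this study delves into the intricate interplay of fertilizer and yield",
    "we provide a comprehensive overview of a multifaceted and evolving landscape",
]

def word_counts(texts):
    c = Counter()
    for t in texts:
        c.update(t.split())
    return c

h, l = word_counts(human), word_counts(llm)
vocab = set(h) | set(l)
h_total, l_total = sum(h.values()), sum(l.values())

# Log-odds weight per word: positive = LLM-flavored, negative = human-flavored.
# Add-one smoothing keeps unseen counts from producing log(0).
weights = {
    w: math.log((l[w] + 1) / (l_total + len(vocab)))
       - math.log((h[w] + 1) / (h_total + len(vocab)))
    for w in vocab
}

def fingerprint_score(text):
    """Sum of log-odds weights; > 0 suggests LLM-like phrasing."""
    return sum(weights.get(w, 0.0) for w in text.split())

print(fingerprint_score("a comprehensive overview of the intricate interplay"))
```

At scale, the same logic (a classifier over stylistic features, calibrated on known LLM rewrites) lets the team estimate AI involvement without any self-reporting.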


Furthermore, as introduced by EurekAlert!, they used a difference-in-differences approach, comparing how adopters' productivity changed after LLM adoption relative to non-adopters over the same period. (EurekAlert!)
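The difference-in-differences logic can be shown with toy numbers (hypothetical values, not the study's data): each group's own pre/post change removes stable author-level differences, and subtracting the control group's change removes the common time trend.

```python
# Toy difference-in-differences calculation (hypothetical numbers,
# NOT the study's data): mean preprints per author per year.
adopters_pre, adopters_post = 2.0, 3.1        # adopted an LLM
nonadopters_pre, nonadopters_post = 2.1, 2.3  # never adopted

# Within-group change removes stable author-level differences...
change_adopters = adopters_post - adopters_pre           # 1.1
change_nonadopters = nonadopters_post - nonadopters_pre  # 0.2

# ...and subtracting the control change removes the common time trend,
# leaving the productivity shift attributable to LLM adoption.
did_estimate = change_adopters - change_nonadopters
print(round(did_estimate, 2))  # 0.9 extra papers per author per year
```

The real study fits this comparison over millions of author-year observations, but the estimand is the same subtraction.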


In other words, they did not rely on self-reported AI usage but on estimation from textual features. This has limitations (style templates and heavy proofreading tools may be swept up in the count), but the large scale makes it easier to grasp overall trends that are hard to see in individual cases.


Result 1: Paper "Productivity" Really Increased

The numbers are quite strong. Phys.org reports the increase by field: 59.8% in the social sciences and humanities, 52.9% in the biological and life sciences, and 36.2% in physics and mathematics. (Phys.org)


The Cornell Chronicle aligns with this, citing roughly a one-third increase (≈36%) on arXiv and over 50% on bioRxiv and SSRN. (Cornell Chronicle)


A common misconception is that the increase comes from "AI generating research ideas." The effect seems to lie more in reducing the friction of writing, organizing, and preparing for submission. Even if the same time goes into experiments and analysis, simply speeding up the writing stage raises the number of papers that can be submitted. LLMs excel especially at abstracts, introductions, organizing related work, and polishing phrasing.


Result 2: "Equalizing Disparities" for Non-English Speaking Regions

The most symbolic aspect of this study is the regional differences. Phys.org reports "up to an 89% increase among Asian researchers," touching on the history whereby top journals' insistence on English has disadvantaged non-English-speaking regions. (Phys.org)


The Cornell Chronicle goes further: submissions from researchers affiliated with Asian institutions increased by 43.0% to 89.3%, while growth for authors at English-speaking institutions or with "Caucasian" names was more modest, at 23.7% to 46.2%. (Cornell Chronicle)


EurekAlert! likewise summarizes the productivity boost as ranging from 23.7% to 89.3%, with greater benefits for those facing higher language barriers. (EurekAlert!)


This suggests the long-standing "premium on English fluency" in academia may be shrinking. If the ability to polish English prose has been overvalued relative to the substance of the research, this change does correct that distortion.


On the other hand, this also changes the rules of competition. Those who previously stood out through English proficiency may see that advantage shrink, and evaluation may shift to other signals (affiliation, advisors, brand). Phys.org warns that as judging by writing becomes unreliable, evaluators may fall back on status indicators such as an author's background and affiliation. (Phys.org)


Result 3: While Writing Becomes "Sophisticated," Quality Indicators Reverse

This is the scariest point of the study. Traditionally, the ability to write readable and professionally coherent texts correlated with research quality. Reviewers and editors have relied on the completeness of writing as a "quick check" amidst their busy schedules.


However, according to the Cornell Chronicle, while human-written papers showed a tendency for more complex writing to be more likely accepted, papers likely involving LLMs showed the reverse: more complex writing was less likely to be accepted. (Cornell Chronicle)


TechXplore (republishing from The Conversation) similarly summarizes that, with AI assistance, "complex language" can be used to cover up weak content. (TechXplore)


Simply put, the shortcut "good writing = good research" has broken. More precisely, good writing has become cheap to obtain, so the odds of spotting a good paper just by looking at the writing have dropped.


Result 4: An Unexpected Plus in the "Diversification" of Citations and Exploration

It's not all gloomy. The Cornell Chronicle notes that AI-assisted literature search may also be changing what gets read: AI search (e.g., Bing Chat) makes newer literature and books easier to reach, relatively weakening the bias toward old, highly cited classics that conventional search tends to surface. (Cornell Chronicle)


EurekAlert! also summarizes that LLM adopters tend to read and cite more diverse literature (books, newer research, less-cited works). (EurekAlert!)


While this suggests a potential broadening of research horizons, it raises other concerns. As citations increase and diversify, the burden of verifying references rises too. If generative AI's hallucinations (plausible-sounding errors) slip in, they could seed chains of erroneous citations.


"Paper Flood" and the Future of Peer Review: What Gets Bottlenecked?

Summarized as a single chain of effects:

  • Researchers: Shortened writing process → Increased number of submissions

  • Journals/Reviewers: Increased number of papers to read + Harder to "sift" by writing quality

  • Research Evaluation (Hiring/Promotion/Grants): Further distortion of quantity indicators (too much volume)

  • Society: Increased cost of identifying reliable knowledge


Phys.org suggests "deeper checks" and "AI-based peer-review support agents" as countermeasures. (Phys.org)


TechXplore (The Conversation) suggests that, given the rise in submissions and editorial burden, using AI review support may become unavoidable in places, like "fighting fire with fire." (TechXplore)


However, the structure of judging AI with AI also carries new risks. There could be a "cat-and-mouse game" with detector biases, misjudgments, and tool competition (optimizing text to avoid detection). The important shift should be from "cracking down on AI-like features" to "verifying the robustness of research."


Reactions on Social Media (Trends): Points of Division

※ The following is not a comprehensive collection of individual posts but an organization of recurring points in articles, reposts, and discussions (i.e., "reaction types" that easily spread on social media).


1) "Hope for Non-English Speaking Regions" Group: Expectation for Democratization of Research

The figures of a 43.0% to 89.3% increase among Asian researchers are striking and connect easily with the lived experience that the English barrier has distorted research competition. (Cornell Chronicle)


Framings like "you can compete on substance even if your English is weak" and "cutting proofreading costs is an unalloyed win" make it easy to receive the news positively.


2) "Beware of AI Slop" Group: Polished Writing Increases "Noise"

On the other hand, as TechXplore discussed under the blunt term "AI slop" (a flood of low-quality output), a common reaction is that an avalanche of polished-looking submissions is dangerous when peer review and editing are already at their limits. (TechXplore)


The Cornell Chronicle also conveys the on-the-ground feeling of editors lamenting an influx of "well-written but scientifically low-value" manuscripts. (Cornell Chronicle)


3) "Evaluation System Breakdown" Group: Fear of Accelerating Quantity Incentives

The more papers multiply, the more precarious "evaluation by quantity" becomes. As Phys.org points out, if writing skill no longer serves as a signal, there could be an ironic return to affiliation and author brand. (Phys.org)


© Copyright ukiyo journal - 日本と世界をつなぐ新しいニュースメディア All rights reserved.