ukiyo journal - A New News Media Connecting Japan and the World

The Trap of Generative AI Tutors: The Reality of "Declining Academic Performance" Indicated by a 15% Error Detection Rate


November 22, 2025, 12:45

How Trustworthy is "AI Sensei"?

—The "Promises" and "Pitfalls" of Generative AI Chatbot Education

In both universities and corporate training, "If you don't know something, just ask ChatGPT" has become a part of everyday life.
Amidst this, there is a global acceleration in efforts to seriously incorporate "AI tutors" into classes.


However, a recent study by a team from Stevens Institute of Technology has presented a rather harsh reality to this dream-like "AI Sensei."


"Learners correctly identified, on average, only about 15% of the chatbot's incorrect answers." Moreover, being influenced by these errors significantly lowered their test scores. (Phys.org)



Overview of the Study:

"AI Sensei that Deliberately Makes Mistakes" and 177 Learners

The research team prepared a chatbot to teach introductory statistics and recreated an environment similar to an actual online learning platform. (Phys.org)

  • Participants: University students and adult learners, totaling 177

  • Learning Content: Basics of Statistics

  • Environment:

    • Free to ask the chatbot questions

    • Simultaneous use of online textbooks and search engines allowed (open book state)

    • A "Report Problem" button was placed under the bot's answers, and a small reward was given for reporting errors


However, this bot contained "deliberately incorrect answers" pre-programmed by the research team.
For example, slightly misstated statistical definitions or intentionally skewed calculation results: information directly related to the learning content but definitely incorrect. (Phys.org)
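The paper does not list the exact errors that were planted. As a purely hypothetical illustration of the kind of subtle statistical slip described, consider the classic mix-up between sample and population standard deviation, which produces a plausible-looking but wrong number:

```python
import statistics

data = [4.0, 7.0, 9.0, 12.0]

# Correct for a sample: divides by n - 1 (Bessel's correction)
correct = statistics.stdev(data)    # ~3.37

# Subtly wrong for a sample: population formula, divides by n.
# The calculation looks legitimate, so a beginner can easily accept it.
wrong = statistics.pstdev(data)     # ~2.92

print(f"sample stdev:     {correct:.2f}")
print(f"population stdev: {wrong:.2f}")
```

An error of this shape contradicts nothing a beginner already knows, which is exactly why a 15% detection rate is plausible.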


The role of the participants was simple.

"Solve problems using AI Sensei, and report if you find anything strange in the answers."

Even with these conditions in place, the results were quite severe.



Shocking Numbers:

An Error Detection Rate of 15%, and Scores Cut Nearly in Half

1. The probability of noticing errors was only about 15%

"If you report something that seems off, you'll get paid."
You can freely use textbooks and search engines.
Even so, the learners correctly reported only about 15% of the chatbot's incorrect answers on average.


In other words, they believed the incorrect answers 8 to 9 times out of 10.


2. The Impact on Performance is at a "Severe" Level

Even more serious is the impact on the learning results themselves.

  • Group that encountered incorrect answers
    → Correct answer rate for practice problems: 25-30%

  • Group that used an error-free bot (control)
    → Correct answer rate for the same problems: 60-66%

The misinformed group scored barely half as well on the test.


"If it's just a little wrong, it should still be useful overall, right?"
Such optimistic expectations look quite precarious in light of these experimental results.



Who are the More "Vulnerable Learners"?

The research team also analyzed which learners are more susceptible to chatbot errors.


Those particularly affected were:

  • People with little prior knowledge of the subject area

  • People with little experience using chatbots

  • Non-native English speakers

  • Female participants (who showed a relatively larger drop in performance)

Moreover, how much they trusted the bot was also important.
Participants who did not trust the chatbot much had a higher probability of correctly reporting errors.


Conversely, the more they thought,

"AI must be smarter than me,"
"It's explaining so fluently, it must be correct,"

the more likely they were to overlook errors, an ironic outcome.



How is it Being Perceived on Social Media?

This story is still relatively new, and the official releases from Phys.org and Stevens Institute of Technology are only beginning to circulate on X (formerly Twitter) and Facebook. (Facebook)


Judging from past online discussions of similar research and news about "AI and education," reactions to this result generally fall into three patterns. (dsgains.pitt.edu)


1. "I Knew It" from the Educational Field

Among teachers and educational researchers,

  • "That's why 'AI-dependent classes' are dangerous"

  • "A 15% error detection rate is convincing from a field perspective"

  • "Without AI literacy education first, a full-scale rollout could backfire"

Such comments are prominent.
Teachers who already use ChatGPT for assignment support are more likely to have experienced "AI errors directly reflected in students' answers," and thus see this research as "quantitative backing."


2. "That's Why Usage is Key" from AI Advocates

On the other hand, from engineers and the EdTech community who are positive about AI utilization,

  • "Every tool has errors. What's important is 'design assuming errors'"

  • "Even human lectures have errors. It's unfair to single out AI"

Such voices are also seen.


People in this position are more interested in discussing "how to use it, with what restrictions, and in which situations" than in "whether to ban it or fully adopt it."
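As a minimal sketch of what "design assuming errors" could mean in practice (the function, data, and flow here are illustrative, not taken from the study), a tutor system could recompute any numeric claim with a trusted routine before showing it to the learner:

```python
import statistics

def check_mean_claim(data, claimed_mean, tol=1e-9):
    """Recompute the mean independently and report whether the
    tutor bot's claimed value agrees with the trusted result."""
    recomputed = statistics.fmean(data)
    return abs(recomputed - claimed_mean) <= tol, recomputed

# Hypothetical case: the bot claims the mean of [2, 4, 6, 8] is 5.5
ok, value = check_mean_claim([2, 4, 6, 8], claimed_mean=5.5)
if not ok:
    # Instead of silently showing the bot's answer, flag the mismatch
    print(f"Flag for the learner: a trusted check gives {value}, not 5.5")
```

The point of such a design is that the system, not the beginner, carries the burden of verification for the claims that can be checked mechanically.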


3. Learners' Honest Thoughts: "How Much Can I Trust It?"

From students and adult learners,

  • "Ultimately, someone needs to set a standard for how much to trust AI"

  • "It's faster than Googling, so I'll use it even if there's some risk"

Such "wavering honest thoughts" are apparent.
For those using AI to confirm their understanding,
the possibility that the AI itself is mistaken without their noticing is quite unsettling.



What Makes It So Difficult:

"The Ability to Check" and "AI's Rhetoric"

Reading this research and related papers, the root of the problem can be broadly divided into two areas. (sundaram.cs.illinois.edu)


1. "Verification" Itself is Difficult for Beginners

For a beginner with almost no knowledge of the field, judging whether the AI's fluently written answers on specialized content are actually "correct" is inherently a high hurdle.

Many participants judged based on vague impressions such as:

  • "It doesn't contradict what I know, so it's probably OK"

  • "It uses proper formulas and technical terms, so it should be fine"


However, beginners' "own knowledge" itself is often shaky.
As a result, they readily accept incorrect explanations.


2. AI Tends to Speak in a "Weirdly Persuasive Manner"

Large language models are very good at producing "human-like" writing and logic.
This simultaneously creates the following illusions.

  • Smooth explanations → "Seems like an expert"

  • Confident tone → "Must be correct"

This study also showed that those with high trust in chatbots were more likely to overlook errors.


Ironically, those "familiar with using AI" are at greater risk of becoming too accustomed to its rhetoric and failing to question it. (PMC)



How Should We Engage:

Suggestions for Teachers, Learners, and Developers

1. Implications for Teachers and Educational Institutions

Li and colleagues suggest that attention should be paid to the timing of chatbot introduction.

  • Situations to avoid:

    • The stage of learning concepts from scratch
