ukiyo journal - A New News Media Connecting Japan and the World

The Trap of Generative AI Tutors: The Reality of "Declining Academic Performance" Indicated by a 15% Error Detection Rate


November 22, 2025, 12:45

How Trustworthy is "AI Sensei"?

—The "Promises" and "Pitfalls" of Generative AI Chatbot Education

In both universities and corporate training, "If you don't know something, just ask ChatGPT" has become a part of everyday life.
Amidst this, there is a global acceleration in efforts to seriously incorporate "AI tutors" into classes.


However, a recent study by a team from Stevens Institute of Technology has presented a rather harsh reality to this dream-like "AI Sensei."


"Learners correctly identified, on average, only about 15% of the chatbot's incorrect answers." Moreover, being influenced by these errors significantly lowered their test scores. (Phys.org)



Overview of the Study:

"AI Sensei that Deliberately Makes Mistakes" and 177 Learners

The research team prepared a chatbot to teach introductory statistics and recreated an environment similar to an actual online learning platform. (Phys.org)

  • Participants: University students and adult learners, totaling 177

  • Learning Content: Basics of Statistics

  • Environment:

    • Free to ask the chatbot questions

    • Simultaneous use of online textbooks and search engines allowed (open book state)

    • A "Report Problem" button was placed under the bot's answers, and a small reward was given for reporting errors


However, this bot contained "deliberately incorrect answers" pre-programmed by the research team.
For example, slightly misstated statistical definitions or intentionally skewed calculation results: information directly related to the learning content but definitely incorrect. (Phys.org)
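The paper does not list the exact errors that were planted. As a purely hypothetical illustration of the kind of subtle statistical slip described, consider the classic mix-up between sample and population standard deviation, which produces a plausible-looking but wrong number:

```python
import statistics

data = [4.0, 7.0, 9.0, 12.0]

# Correct for a sample: divides by n - 1 (Bessel's correction)
correct = statistics.stdev(data)    # ~3.37

# Subtly wrong for a sample: population formula, divides by n.
# The calculation looks legitimate, so a beginner can easily accept it.
wrong = statistics.pstdev(data)     # ~2.92

print(f"sample stdev:     {correct:.2f}")
print(f"population stdev: {wrong:.2f}")
```

An error of this shape contradicts nothing a beginner already knows, which is exactly why a 15% detection rate is plausible.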


The role of the participants was simple.

"Solve problems using AI Sensei, and report if you find anything strange in the answers."

Even with these conditions in place, the results were quite severe.



Shocking Numbers:

An Error Detection Rate of 15%, and Scores Cut Nearly in Half

1. The probability of noticing errors was only about 15%

"If you report something that seems off, you'll get paid."
You can freely use textbooks and search engines.
Even so, the learners correctly reported only about 15% of the chatbot's incorrect answers on average.


In other words, they believed the incorrect answers 8 to 9 times out of 10.


2. The Impact on Performance is at a "Severe" Level

Even more serious is the impact on the learning results themselves.

  • Group that encountered incorrect answers
    → Correct answer rate for practice problems: 25-30%

  • Group that used an error-free bot (control)
    → Correct answer rate for the same problems: 60-66%

The misinformed group scored barely half as well on the test.


"If it's just a little wrong, it should still be useful overall, right?"
Such optimistic expectations look quite precarious in light of these experimental results.



Who are the More "Vulnerable Learners"?

The research team also analyzed which learners are more susceptible to chatbot errors.


Those particularly affected were:

  • People with little prior knowledge of the subject area

  • People with little experience using chatbots

  • Non-native English speakers

  • Female participants (who showed a relatively larger drop in performance)

Moreover, how much they trusted the bot was also important.
Participants who did not trust the chatbot much had a higher probability of correctly reporting errors.


Conversely, the more they thought,

"AI must be smarter than me,"
"It's explaining so fluently, it must be correct,"

the more likely they were to overlook errors, an ironic outcome.



How is it Being Perceived on Social Media?

This story is still relatively new, and the official releases from Phys.org and Stevens Institute of Technology are only beginning to circulate on X (formerly Twitter) and Facebook. (Facebook)


Judging from past online discussions of similar research and news about "AI and education," reactions to this result generally fall into three patterns. (dsgains.pitt.edu)


1. "I Knew It" from the Educational Field

Among teachers and educational researchers,

  • "That's why 'AI-dependent classes' are dangerous"

  • "A 15% error detection rate is convincing from a field perspective"

  • "Without AI literacy education first, a full-scale rollout could backfire"

Such comments are prominent.
Teachers who already use ChatGPT for assignment support are more likely to have experienced "AI errors directly reflected in students' answers," and thus see this research as "quantitative backing."


2. "That's Why Usage is Key" from AI Advocates

On the other hand, from engineers and the EdTech community who are positive about AI utilization,

  • "Every tool has errors. What's important is 'design assuming errors'"

  • "Even human lectures have errors. It's unfair to single out AI"

Such voices are also seen.


People in this position are more interested in discussing "how to use it, with what restrictions, and in which situations" than in "whether to ban it or fully adopt it."
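As a minimal sketch of what "design assuming errors" could mean in practice (the function, data, and flow here are illustrative, not taken from the study), a tutor system could recompute any numeric claim with a trusted routine before showing it to the learner:

```python
import statistics

def check_mean_claim(data, claimed_mean, tol=1e-9):
    """Recompute the mean independently and report whether the
    tutor bot's claimed value agrees with the trusted result."""
    recomputed = statistics.fmean(data)
    return abs(recomputed - claimed_mean) <= tol, recomputed

# Hypothetical case: the bot claims the mean of [2, 4, 6, 8] is 5.5
ok, value = check_mean_claim([2, 4, 6, 8], claimed_mean=5.5)
if not ok:
    # Instead of silently showing the bot's answer, flag the mismatch
    print(f"Flag for the learner: a trusted check gives {value}, not 5.5")
```

The point of such a design is that the system, not the beginner, carries the burden of verification for the claims that can be checked mechanically.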


3. Learners' Honest Thoughts: "How Much Can I Trust It?"

From students and adult learners,

  • "Ultimately, someone needs to set a standard for how much to trust AI"

  • "It's faster than Googling, so I'll use it even if there's some risk"

Such "wavering honest thoughts" are apparent.
For those using AI to confirm their understanding,
the possibility that the AI itself is mistaken without their noticing is quite unsettling.



What Makes It So Difficult:

"The Ability to Check" and "AI's Rhetoric"

Reading this research and related papers, the root of the problem can be broadly divided into two areas. (sundaram.cs.illinois.edu)


1. "Verification" Itself is Difficult for Beginners

For a beginner with almost no knowledge of the field, judging whether the AI's fluently written answers on specialized content are actually "correct" is inherently a high hurdle.

Many participants judged based on vague impressions such as:

  • "It doesn't contradict what I know, so it's probably OK"

  • "It uses proper formulas and technical terms, so it should be fine"


However, beginners' "own knowledge" itself is often shaky.
As a result, they readily accept incorrect explanations.


2. AI Tends to Speak in a "Weirdly Persuasive Manner"

Large language models are very good at producing "human-like" writing and logic.
This simultaneously creates the following illusions.

  • Smooth explanations → "Seems like an expert"

  • Confident tone → "Must be correct"

This study also showed that those with high trust in chatbots were more likely to overlook errors.


Ironically, those "familiar with using AI" are at greater risk of becoming too accustomed to its rhetoric and failing to question it. (PMC)



How Should We Engage:

Suggestions for Teachers, Learners, and Developers

1. Implications for Teachers and Educational Institutions

Li and colleagues suggest that attention should be paid to the timing of chatbot introduction.

  • Situations to avoid:

    • The stage of learning concepts from scratch
