
The Illusion of "Thinking AI" ─ Where Does AI Hit a Wall? The Core Issues Highlighted by the Apple Paper and CNBC Report

June 27, 2025, 00:50

1. Introduction: Growing Unease Behind the Enthusiasm

The keyword heralding the second act of the generative AI boom was "reasoning." When ChatGPT and Google Gemini introduced chain-of-thought output that seemed to "think out loud," product launches were met with applause and claims that the models had moved closer to human-like thinking. On June 26, however, a question posed by CNBC poured cold water on the celebratory mood: have models that show their thinking actually become smarter? It took almost no time for that question to stir up a storm.


2. The Framework of the CNBC Article: The Critical Point of Accuracy Collapse

According to CNBC's digital edition and its TV program TechCheck, multiple independent studies have reproduced the same phenomenon: once step-by-step reasoning exceeds a certain level of complexity, accuracy plummets. The emblem of this finding is Apple's paper The Illusion of Thinking. The research team compared more than 20 state-of-the-art LLMs across three domains: mathematics, programming, and previously unseen concepts, and presented a graph showing that models which generate thought traces suffer a vertical drop in performance beyond a threshold of "critical complexity." The models "think but miss," producing lengthy reasoning yet incorrect final answers (theverge.com).
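
To make the idea of a "critical complexity" threshold concrete, here is a minimal sketch of how such a collapse point could be detected in evaluation results. The function, the drop criterion, and the scores are all illustrative assumptions; they do not reproduce the Apple paper's data or method.

```python
def critical_complexity(accuracy_by_level: dict[int, float],
                        drop_fraction: float = 0.5) -> int | None:
    """Return the first complexity level whose accuracy falls below
    drop_fraction * the model's peak accuracy, or None if none does."""
    peak = max(accuracy_by_level.values())
    for level in sorted(accuracy_by_level):
        if accuracy_by_level[level] < drop_fraction * peak:
            return level
    return None

# Hypothetical evaluation results: accuracy holds up, then collapses.
scores = {1: 0.95, 2: 0.93, 3: 0.90, 4: 0.88, 5: 0.41, 6: 0.07}
print(critical_complexity(scores))  # -> 5
```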


CNBC named this behavior "accuracy collapse," reporting that investors have begun questioning whether it is worth incorporating costly reasoning models.


3. What Are Reasoning Models?—Convenient "Long Monologues"

While the standard mode of an LLM is next-token prediction, reasoning models are marketed on "displaying their thoughts step by step." They lay out formulas, call functions, and enumerate facts, leaving a human-readable process that has been praised for improving debuggability and accountability. Apple's paper, however, asserts that what is visible is not thought but merely the trace of a "self-association game." However logical the output appears, once complexity exceeds a threshold the models abruptly shorten their thinking, sometimes producing a shortcut equivalent to "I don't know" (itpro.com).
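
As a rough illustration of the difference described above, the sketch below builds a plain "answer directly" prompt and a chain-of-thought prompt for the same question. The wording and the stubbed call_model helper are hypothetical; no particular vendor API is assumed.

```python
QUESTION = "A train travels 120 km in 1.5 hours. What is its average speed?"

def direct_prompt(question: str) -> str:
    # Standard usage: ask for the answer directly (pure next-token prediction).
    return f"{question}\nAnswer with a single number in km/h."

def chain_of_thought_prompt(question: str) -> str:
    # "Reasoning" usage: ask the model to spell out its intermediate steps.
    return (f"{question}\n"
            "Think step by step, showing each calculation, "
            "then give the final answer in km/h on the last line.")

def call_model(prompt: str) -> str:
    # Placeholder: send the prompt to whichever LLM and SDK you actually use.
    raise NotImplementedError

print(direct_prompt(QUESTION))
print(chain_of_thought_prompt(QUESTION))
```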


4. Storm of Rebuttals: Anthropic and Meta's Counterattack

In response to Apple's provocative conclusion, Anthropic immediately countered that "the benchmark is wrong" and conducted a re-examination in collaboration with Open Philanthropy, claiming the results showed the losses stemmed merely from format specifications and timeouts and emphasizing that "thinking models are alive and well." Meta, amid reports of a failed attempt to acquire Safe Superintelligence, explained the delay of its in-house model Behemoth as being "for accuracy adjustments," while commenting in a follow-up CNBC interview that "thinking types are essential in the long run" (rcrwireless.com).


5. Social Media Reaction: The #ReasoningGate Scandal

On X (formerly Twitter), #ReasoningGate became a trending topic within 24 hours of the article's publication. Comments such as "Models are not 'thinking out loud' but 'guessing out loud'" and "Were we fooled by AI's monologues?" spread widely. Venture capitalist @AIThesis posted "Transparency does not necessarily mean correctness. Investments should discern the substance," garnering 23,000 likes, and CNBC's official account @CNBC posted a short video headlined "Why 'thinking' models may not actually be smarter," which surpassed 145,000 views.

Meanwhile, on Reddit's /r/ArtificialIntelligence, the thread "Are current AI models really reasoning, or just predicting the next token?" resurfaced, fueling a heated debate between "it's just an elongated autocomplete" and "association is also a form of thinking." The thread drew more than 1,200 comments within 48 hours (reddit.com).


6. Technical Issues: Overthinking and Critical Complexity

An analysis in IEEE Spectrum reports that the more reasoning models "overthink," the lower their success rate becomes: adding thought steps does not necessarily improve accuracy and can instead invite an "overthinking penalty." Research from NUS and Johns Hopkins further points out that, lacking a human-like working-memory mechanism, the models cannot retain intermediate paths and end up contradicting themselves (spectrum.ieee.org, arxiv.org).
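
A toy calculation hints at why longer chains can hurt. If each reasoning step were independently correct with probability p and any wrong step spoiled the final answer, end-to-end success would decay as p to the power of the step count. This independence assumption is deliberately simplistic and is not how real LLMs fail, but it captures the intuition behind the overthinking penalty.

```python
def chain_success_probability(p_step: float, n_steps: int) -> float:
    # Probability that every step in an n-step chain is correct,
    # under the (unrealistic) independence assumption described above.
    return p_step ** n_steps

for n in (1, 5, 10, 20, 40):
    print(f"{n:>3} steps: {chain_success_probability(0.97, n):.2f}")
# With 97% per-step accuracy, a 40-step chain succeeds only about 30% of the time.
```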


7. Industry Impact: Reorganization of Funding and Roadmaps

On the investor side, the additional compute required for reasoning chains is estimated to add 30-50% to the effective per-token price, undermining price competitiveness. Some VCs have begun reorganizing their portfolios, arguing that connecting existing LLMs at the API level is more resilient to risk than blindly multiplying models. Low-cost, high-efficiency models such as DeepSeek are gaining recognition, and the massive GPU investments by Google and Nvidia are once again being questioned (reuters.com).
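
A back-of-the-envelope sketch of that cost pressure, using placeholder prices (not actual vendor rates) and an assumed trace length, shows how a unit-price premium and a long thought trace compound on a per-query basis.

```python
BASE_PRICE_PER_1K_TOKENS = 0.01  # hypothetical USD rate for a non-reasoning model
REASONING_PREMIUM = 1.40         # assumed +40% unit price, mid-range of the quoted +30-50%

def cost(tokens: int, price_per_1k: float) -> float:
    return tokens / 1000 * price_per_1k

answer_tokens = 50   # short final answer
trace_tokens = 600   # assumed length of the thought trace

plain = cost(answer_tokens, BASE_PRICE_PER_1K_TOKENS)
reasoning = cost(answer_tokens + trace_tokens,
                 BASE_PRICE_PER_1K_TOKENS * REASONING_PREMIUM)
print(f"plain model:     ${plain:.4f} per query")
print(f"reasoning model: ${reasoning:.4f} per query (~{reasoning / plain:.0f}x)")
```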


8. Alternative Routes: Hybrid and Neural-Symbolic

Emerging as possible keys to the problem are neural-symbolic AI and modular agents, approaches that combine the strengths of symbolic rules with deep learning and clearly delineate the "thinking" part. Advocated by Gary Marcus and others, the approach is said to "guarantee causal reasoning at the circuit level," and related startups have seen a surge in funding since #ReasoningGate (en.wikipedia.org).
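
The division of labor can be illustrated with a minimal propose-and-verify pattern: a neural component (stubbed here) proposes candidates and an explicit symbolic rule accepts only the ones it can verify. This is a toy sketch of the general idea, not any specific system advocated by Marcus or others.

```python
def neural_propose(question: str) -> list[int]:
    # Stand-in for a neural model: returns candidate answers, some of them wrong.
    return [41, 45, 42]

def symbolic_verify(candidate: int) -> bool:
    # Explicit, auditable rule: the answer must be divisible by both 2 and 3.
    return candidate % 2 == 0 and candidate % 3 == 0

def answer(question: str) -> int | None:
    # The neural part proposes; the symbolic part decides what counts as valid.
    for candidate in neural_propose(question):
        if symbolic_verify(candidate):
            return candidate
    return None

print(answer("Name a number divisible by both 2 and 3."))  # -> 42
```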


9. What is "Thinking"?—A Philosophical Perspective

As early as 2020, John Mark Bishop asserted in a paper titled "AI is stupid and causal reasoning won't fix it" that "computation is not understanding." The current uproar updates that warning for 2025: the gap between human understanding of meaning and an LLM's statistical association remains unbridged (arxiv.org).


10. Future Outlook: Beyond the Dichotomy of Transparency vs. Validity

In the short term, three tasks are urgent: (1) redesigning benchmarks, (2) building automated verification tools for thought traces, and (3) optimizing hardware. In the longer term, the design philosophy itself is likely to be reconsidered around ideas such as giving AI a working memory, implementing causal modeling natively, and correcting errors through multi-agent collaboration.
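
As one concrete shape the second item could take, the sketch below scans a thought trace for simple arithmetic claims and flags any that do not hold. It is only an assumed example of what a trace verifier might check; real tools would need far richer parsing and logic.

```python
import re

# Matches simple claims of the form "a + b = c", "a - b = c", or "a * b = c".
STEP_PATTERN = re.compile(r"(\d+)\s*([+\-*])\s*(\d+)\s*=\s*(\d+)")
OPS = {"+": lambda a, b: a + b, "-": lambda a, b: a - b, "*": lambda a, b: a * b}

def check_trace(trace: str) -> list[str]:
    """Return the arithmetic steps in the trace whose claimed result is wrong."""
    errors = []
    for a, op, b, claimed in STEP_PATTERN.findall(trace):
        if OPS[op](int(a), int(b)) != int(claimed):
            errors.append(f"{a} {op} {b} = {claimed}")
    return errors

trace = "First, 12 * 3 = 36. Then 36 + 7 = 44, so the answer is 44."
print(check_trace(trace))  # -> ['36 + 7 = 44']  (the correct sum is 43)
```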


Reference Articles

The AI Reasoning Problem—Why "Thinking" Models May Not Actually Be Smarter
Source: https://www.cnbc.com/2025/06/26/ai-reasoning-models-problem.html
