Skip to main content
ukiyo journal - 日本と世界をつなぐ新しいニュースメディア Logo
  • All Articles
  • 🗒️ Register
  • 🔑 Login
    • 日本語
    • 中文
    • Español
    • Français
    • 한국어
    • Deutsch
    • ภาษาไทย
    • हिंदी
Cookie Usage

We use cookies to improve our services and optimize user experience. Privacy Policy and Cookie Policy for more information.

Cookie Settings

You can configure detailed settings for cookie usage.

Essential Cookies

Cookies necessary for basic site functionality. These cannot be disabled.

Analytics Cookies

Cookies used to analyze site usage and improve our services.

Marketing Cookies

Cookies used to display personalized advertisements.

Functional Cookies

Cookies that provide functionality such as user settings and language selection.

Is ChatGPT-5 Really Disappointing? ── A Comprehensive Review of What the Previous Model Couldn't Do and Where Other Companies Still Excel

Is ChatGPT-5 Really Disappointing? ── A Comprehensive Review of What the Previous Model Couldn't Do and Where Other Companies Still Excel

2025年08月12日 23:11

1. First, Organizing the "Disappointment" Argument

  • From the moment of its announcement, there was a mix of opinions on social media and in the media. Opinions such as "users had overly high expectations" and "practicality has improved but it's not revolutionary" were prevalent. Axios pointed out reports of errors in math and geography, dissatisfaction with delays, and the gap with the "Ph.D. level" statement. Axios

  • This atmosphere is also due to the fact that OpenAI's shift towards model integration and a focus on safety and practicality was out of sync with the audience expecting a spectacular "leap." OpenAI



2. What GPT-5 Can Do That Previous Models Couldn't

2-1. Integrated System: Automatically Optimizing "Amount of Thinking" and Pathways

GPT-5 is an integration of a lightweight response model + deep reasoning model (GPT-5 Thinking/Pro) + real-time router. Depending on user instructions and the difficulty of the task, it variably switches from fast responses to deep thinking. This makes the experience of "quick tasks are fast, difficult problems are deeply thought out" the default. OpenAI



2-2. Significant Enhancements in Coding and Agent-like Tasks

In the developer announcement, SWE-bench Verified 74.9% and strength in long tool chains (including parallel) were demonstrated, and new parameters such as verbosity and reasoning_effort, which control the length and amount of thought in responses, and **"custom tools callable in plain text"** were also added. The ability to "finish the job" in practical work has improved. OpenAI



2-3. Chat Experience: Personality Presets and Voice Evolution

Fortune introduced customizable "personality" presets such as Cynic, Robot, Listener, Nerd and enhanced voice experience. Tone adjustments have become easier, making it easier to switch conversation styles according to the purpose. Fortune



2-4. Expansion in Practical Areas (Enterprise Use)

OpenAI emphasizes improvements in accuracy, speed, and reasoning in major tasks such as writing, creating, and researching. With automation and collaboration in mind for enterprise workflows, they are advocating for a **"new era of work."** OpenAI



3. Why the Discontent? ── The Gap in Initial Reactions

  • The point that it seemed like a **"large minor update"** compared to the expectations of dramatic leaps

    .
  • Initial confusion and reports of issues regarding router behavior and some inaccuracies..

  • A user base sensitive to differences in **"temperature" and "empathy"** compared to older models (like 4o)..
    .
    .Axios



4. Points Where Other Companies Are Still Superior (By Application)

4-1. Deep Thinking Handled by Users: Anthropic Claude

  • Extended Thinking can be turned ON/OFF, and developers can also set a **"thinking budget."..Anthropic+1

  • Furthermore, memory updates allowing cross-search and reference of past dialogues are progressing (prioritized for Max/Team/Enterprise). Convenient for resuming long-term projects. The Verge

How to Differentiate Use:

  • In situations like math, science, and design reviews, where you want to intentionally increase "thinking time" to pursue accuracy.

  • Teams prioritizing safety and policy compliance..Anthropic



4-2. Research, Integration, Long Context: Google Gemini

  • 2.0 Pro/Flash/Flash-Lite clearly differentiates between speed, cost, and capability.......blog.google

  • Deep Research and Canvas (workspace with code generation and preview), and the "thinking" enhancement of **2.5 Pro (experimental)** are also being developed...Geminiblog.google

How to Differentiate Use:

  • Research, planning, and documentation utilizing Google app integration

    (YouTube/Maps/Drive, etc.).
  • Mass document analysis and long-term project management.



4-3. Self-Hosting/Custom Freedom: Meta Llama (Open Systems)

  • Llama 3.1 (up to 405B) is reported as "the most promising in open systems," and subsequently, Llama 3.2 expands vision support and edge optimization...The VergeAI Meta+1

How to Differentiate Use:

  • Operation in on-premises/specific regulatory environments, focusing on fine adjustments and optimization of inference costs.

  • Real-time processing on mobile/edge.



5. Conclusion ── "Disappointment" or "Steady Evolution"

  • GPT-5 enhances "smoothness in practical work" with integrated intelligence operation (amount of thinking, routing). The foundational strength in coding, agents, and instruction following has certainly grown. OpenAI+1

  • However, expecting a "dramatic leap" might lead to disappointment.......AnthropicGeminiThe Verge

  • Conclusion: GPT-5 has matured into a "tool to reliably advance daily work" as a large minor update.....



6. Quick Reference Guide for Differentiation (Key Points)

  • Enhancing Accuracy with Deep Thinking: Claude (Extended Thinking, Thinking Budget) Anthropic

← Back to Article List

Contact |  Terms of Service |  Privacy Policy |  Cookie Policy |  Cookie Settings

© Copyright ukiyo journal - 日本と世界をつなぐ新しいニュースメディア All rights reserved.