Sam Altman Discusses Challenges of GPT-5 and the Revival of 4o: Top Benchmarks, but an Incomplete Experience? "Chart Crime" and the Cost of Trust at the GPT-5 Launch Event

August 10, 2025, 11:45

1. What Happened? A 48-Hour Timeline

On August 7 (US time), OpenAI announced its new flagship model, GPT-5. The company shifted to an "integrated model" built around a real-time router that switches between quick responses and a "thinking" mode in the background. The next day, CEO Sam Altman and others held an AMA on Reddit. They explained that on launch day a major router failure (a "sev") took the automatic switching down for part of the time, which led to the perception that "GPT-5 became dumb." They promised to double rate limits for Plus users and to make it clearer which model is responding. They also said they would consider continuing to offer 4o to Plus users, in response to strong demand for its return.

2. The Aim of GPT-5—Eliminating the "Burden of Model Selection"

In its official announcement, OpenAI described GPT-5 as "a single integrated system." A router automatically decides whether to use a lightweight, smart model for everyday tasks or the "thinking" model (GPT-5 Thinking/Pro) for harder problems. OpenAI also aimed to improve instruction following, factuality, and benchmark results in coding, creative writing, and health. In short, the core UX is designed so users don't have to think about which model to choose. (OpenAI)
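To make the routing idea concrete, here is a toy sketch. OpenAI has not published how its router decides, so the keyword list, the length threshold, and the tier labels "fast" and "thinking" below are hypothetical placeholders. The sketch only shows the general shape: pick a tier per request, allow the user to force the thinking tier, and record a reason that could be shown to the user.

```python
import re
from dataclasses import dataclass

# Hypothetical signals that a request needs deeper reasoning.
# These hints and the threshold are illustrative assumptions,
# not OpenAI's actual routing criteria.
REASONING_HINTS = ("prove", "step by step", "debug", "optimize", "think hard")
LONG_PROMPT_CHARS = 2000


@dataclass
class RouteDecision:
    model: str   # hypothetical tier label: "fast" or "thinking"
    reason: str  # human-readable explanation, e.g. for showing which model answered


def route(prompt: str, force_thinking: bool = False) -> RouteDecision:
    """Pick a model tier for a prompt using simple, illustrative heuristics."""
    if force_thinking:
        return RouteDecision("thinking", "user explicitly requested thinking mode")
    lowered = prompt.lower()
    for hint in REASONING_HINTS:
        # Whole-word match, so "improve" does not trigger the "prove" hint.
        if re.search(rf"\b{re.escape(hint)}\b", lowered):
            return RouteDecision("thinking", f"prompt contains '{hint}'")
    if len(prompt) > LONG_PROMPT_CHARS:
        return RouteDecision("thinking", "prompt is long")
    return RouteDecision("fast", "default path for routine requests")


if __name__ == "__main__":
    print(route("What's the capital of France?"))
    print(route("Can you debug this function?"))
```

A real router presumably uses learned signals rather than keywords; the `force_thinking` flag mirrors the practical advice of explicitly requesting thinking mode for critical work instead of relying on automatic switching.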

Most users are likely to welcome this design. However, heavy users who have come to rely on the distinct "personalities" of individual models may see it as a loss of choice. The recent uproar brought this gap into view.

3. The "Bring Back 4o" Phenomenon—Experiences on Social Media

Immediately after the announcement, Reddit's r/ChatGPT and other platforms were flooded with posts such as "GPT-5's writing is stiff," "the creative vibe is gone," and "I miss 4o's voice and conversational feel." Calls to "Bring back 4o," the hashtag "#keep4o," and petition sites circulated widely. Others responded that "5 is faster and more practical" and "the coding has improved." By August 9-10, threads reporting that 4o had reappeared for Plus users under a "legacy" display option and was rolling out gradually gained traction, with multiple users confirming they could switch back. (TechRadar, Reddit)

Media outlets also reported that 4o is coming back, a sign of OpenAI's swift response to user feedback. However, user reports suggest the change is reaching different regions, plans, and apps at different times. (Tom's Guide)

4. The Blow of "Chart Crime"—Trust in Visualization

During the launch event, charts in which the numbers did not match the bar heights sparked talk of "chart crime" (statistically egregious charts). Altman acknowledged it on X as a "huge blunder," and OpenAI staff apologized. Although the charts in the official blog post were correct, the errors in the live presentation were damaging for a product that prides itself on factuality. The Verge and others examined the specific inconsistencies, fueling sarcasm online. (The Verge)

The incident carries two implications. First, how an AI vendor presents its numbers directly affects trust in the product itself. Second, however smart the model, the human process of visualizing and verifying results is a separate step, and humans are ultimately responsible for safeguarding it.
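The kind of mismatch described above, where a bar's height does not match its number, can be caught mechanically before a chart goes on screen. The following is a minimal sketch, assuming you can extract each bar's reported value and drawn height from the chart source and that the axis starts at zero. The function name `find_disproportionate_bars` and the sample figures are hypothetical and are not taken from the GPT-5 presentation.

```python
from statistics import median


def find_disproportionate_bars(values, heights, rel_tol=0.02):
    """Flag bars whose drawn height is not proportional to the number they show.

    values:  dict of label -> reported number (must be positive)
    heights: dict of label -> drawn bar height (pixels, points, any unit)
    Assumes a zero baseline, so height / value should be (nearly) constant.
    Returns a list of (label, expected_height, actual_height) tuples.
    """
    if set(values) != set(heights):
        raise ValueError("values and heights must have the same labels")
    if any(v <= 0 for v in values.values()):
        raise ValueError("this simple check assumes positive values")
    # Use the median height-per-unit ratio as the reference scale,
    # so a single badly drawn bar cannot skew the baseline for the others.
    scale = median(heights[k] / values[k] for k in values)
    flagged = []
    for label, value in values.items():
        expected = value * scale
        actual = heights[label]
        if abs(expected - actual) > rel_tol * expected:
            flagged.append((label, expected, actual))
    return flagged


if __name__ == "__main__":
    # Made-up example: "Model C" is drawn as tall as "Model B" despite a smaller value.
    vals = {"Model A": 50.0, "Model B": 70.0, "Model C": 30.0}
    hts = {"Model A": 100, "Model B": 140, "Model C": 140}
    print(find_disproportionate_bars(vals, hts))
```

This only covers proportionality; units, axis labels, and truncated baselines still need a human look.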

5. Is the "Content" Still Evolving?

From the perspective of developers and researchers, GPT-5 is noted for improved instruction following, more stable code generation, and better factuality during long reasoning. However, reviewer Simon Willison shared an example of a weakness: when he relied on the model to align a table automatically, the result contained errors, and he redid the task in Python. The model is not "omnipotent," so practical know-how, including how to avoid failures and compensate for them, becomes crucial. (OpenAI, Simon Willison’s Weblog)
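The practical lesson is to let the model draft and let deterministic code handle the mechanical part. As a minimal sketch of that division of labor (not a reconstruction of Willison's actual script, which the source does not show), the following sorts rows and pads columns in plain Python, so alignment is computed rather than generated. The function name `align_table` is hypothetical.

```python
def align_table(header, rows, sort_col=None, reverse=False):
    """Render rows as a plain-text table whose columns line up exactly.

    header:   list of column names
    rows:     list of rows, each a list with one value per column
    sort_col: optional column index to sort rows by before rendering
    """
    if any(len(row) != len(header) for row in rows):
        raise ValueError("every row must have one value per column")
    if sort_col is not None:
        # Sort on the raw values, so numbers sort numerically, not as strings.
        rows = sorted(rows, key=lambda row: row[sort_col], reverse=reverse)
    cells = [[str(c) for c in header]] + [[str(c) for c in row] for row in rows]
    # Each column is exactly as wide as its longest cell.
    widths = [max(len(row[i]) for row in cells) for i in range(len(header))]
    lines = []
    for n, row in enumerate(cells):
        lines.append(" | ".join(cell.ljust(w) for cell, w in zip(row, widths)).rstrip())
        if n == 0:
            # Separator line under the header, same width as the " | " joins.
            lines.append("-+-".join("-" * w for w in widths))
    return "\n".join(lines)


if __name__ == "__main__":
    print(align_table(["name", "score"], [["b", 3], ["aa", 10], ["c", 7]], sort_col=1, reverse=True))
```

The model can still produce the row data; only the layout and ordering, where it stumbled, are moved into code that behaves the same way every time.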

6. OpenAI's "Moves" and Future Focus

During the AMA, OpenAI indicated plans to double Plus rate limits, make it more transparent which model is in use, and consider continuing to offer 4o. The router failure at launch made for a rough first impression, but if the switching algorithm improves, the benefits of the integrated UX should come to the fore. The challenge is how to reconcile diverse preferences with the simplicity of integration. Legacy model selection and fine-grained customization of style and personality will likely be key.

7. Practical Tips for Different Users

Individual Creators: If you prioritize the "temperature" of stories and conversations, compare 4o (if available) and GPT-5 for the time being. Tell GPT-5 explicitly what style you want, such as "in this tone" or "more emotional." When the model fails, fall back on tools, for example "create the table, then sort it with Python." (Reddit, Simon Willison’s Weblog)
Developers: Do not rely solely on the router; explicitly request thinking mode for critical tasks and build in verification (type checks, tests, linting). Do not leave visualizations entirely to auto-generation; double-check units, axes, and ratios. (The Verge)
Enterprise Adoption: Expect the experience to fluctuate at first, and monitor against internal SLOs rather than SLAs. Formalize a model selection policy, and design the UI to show which model is currently in use, along with escalation paths (forced long reasoning, human verification).
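Monitoring against an SLO means tracking an internal target and how much of the allowed failure (the "error budget") has been spent, rather than waiting for a contractual SLA breach. The following is a minimal sketch, assuming your team already labels each response as good or bad by its own criteria (tests passing, human verification, latency). The names `evaluate_slo` and `SloReport` and the 99% target in the usage line are illustrative assumptions, not recommendations.

```python
from dataclasses import dataclass


@dataclass
class SloReport:
    success_ratio: float     # fraction of responses judged good in the window
    target: float            # SLO target, e.g. 0.99
    budget_remaining: float  # fraction of the error budget left (negative = overspent)

    @property
    def met(self) -> bool:
        return self.success_ratio >= self.target


def evaluate_slo(good: int, total: int, target: float) -> SloReport:
    """Compare observed good/total against an SLO target over one window."""
    if total <= 0:
        raise ValueError("need at least one request in the window")
    if not 0 < target < 1:
        raise ValueError("target must be between 0 and 1 (exclusive)")
    ratio = good / total
    allowed_bad = (1 - target) * total  # error budget expressed in requests
    actual_bad = total - good
    return SloReport(ratio, target, 1 - actual_bad / allowed_bad)


if __name__ == "__main__":
    print(evaluate_slo(good=995, total=1000, target=0.99))
```

With 995 good responses out of 1,000 against a 99% target, roughly half the budget remains; a negative `budget_remaining` could be the trigger for the escalation paths above, such as forcing long reasoning or routing to human review.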

8. Conclusion—Even in the Age of Integration, the Freedom to Choose is a Strength

GPT-5 clearly sets a direction toward making AI easier for more people to use. However, the AI experience depends as much on preference as on performance. The recent uproar was not just about malfunctions or visualization errors. It vividly exposed an unavoidable design challenge in bringing AI to the mainstream: how to balance integration and individuality.

Reference Articles

Sam Altman Addresses the "Bumpy" GPT-5 Rollout, Bringing Back 4o, and the "Chart Crime"
Source: https://techcrunch.com/2025/08/08/sam-altman-addresses-bumpy-gpt-5-rollout-bringing-4o-back-and-the-chart-crime/
