ukiyo journal - A New News Media Connecting Japan and the World

ChatGPT 'Human Voice' Revolution: The Day AI Breathes Beside You

June 9, 2025, 23:33

Table of Contents

  1. Introduction

  2. Background and Positioning of the Update

  3. Technical Breakthroughs in "More Human-like Voice"

  4. New Experiences Unlocked by Real-time Translation

  5. Heated Debate on Japanese Social Media: Voices from the Field

  6. Comparison with Competitors: How Do Anthropic Claude and Google Gemini 2.5 Perform?

  7. Specific Use Cases in the Japanese Market: Business, Education, and Entertainment

  8. Remaining Challenges: Quality Degradation, Hallucinations, and Ethics

  9. Considering "The Day AI Voice Changes Language" from a Researcher's Perspective

  10. Summary and Future Outlook


1. Introduction

On June 7 (US time), OpenAI significantly revamped the "Advanced Voice" mode for ChatGPT's paid plans. Intonation, pauses, and emotional expression have improved dramatically, making users feel as though they are chatting with an "excellent bilingual friend" rather than "talking to AI." TechCrunch described the update as "the greatest evolution ever, achieving a more natural and fluent voice that can even replicate empathy and sarcasm" (techcrunch.com).


2. Background and Positioning of the Update

Advanced Voice, which entered beta in the fall of 2024, uses GPT-4o's native voice processing. Initially, however, it drew many complaints of sounding "too mechanical" or "oddly high-spirited." The development team describes this revamp as a "generational change," retraining the language and voice models together based on that feedback (note.com).


3. Technical Breakthroughs in "More Human-like Voice"

  • Subtle Intonation: emotion-recognition accuracy is said to have improved by 40% over previous versions

  • Realistic Cadence: an average response latency of 320 ms (232 ms at minimum) is nearly on par with human conversation

  • Expanded Expressiveness: parameters such as empathy, surprise, and sarcasm can be tuned across five levels

According to TechCrunch, OpenAI includes a disclaimer that "sound quality may slightly degrade in some areas" and that responses may contain "unexpected tone changes or hallucinatory background music" (techcrunch.com).


4. New Experiences Unlocked by Real-time Translation

When a user says, "Voice, translate between Japanese and English," the mode continuously auto-translates the entire conversation in both directions. It is expected to transform travel, international conferences, and language learning by replacing traditional sequential-translation apps. PC Watch reports that "the impact of not needing a dedicated translation app is immeasurable" (pc.watch.impress.co.jp).
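The "translate between Japanese and English" instruction implies the system must decide, for every utterance, which direction to translate. As a rough illustration only (this is a hypothetical sketch, not OpenAI's implementation), a bidirectional router could infer the direction from a Unicode-range check on the transcribed utterance:

```python
import re

# Hypothetical sketch, not OpenAI's implementation: decide the translation
# direction for each utterance in a continuous Japanese<->English session.
# Hiragana (U+3040-U+309F), katakana (U+30A0-U+30FF), and common CJK
# ideographs (U+4E00-U+9FFF) mark an utterance as Japanese.
_JAPANESE = re.compile(r"[\u3040-\u30ff\u4e00-\u9fff]")

def translation_direction(utterance: str) -> tuple[str, str]:
    """Return (source, target) ISO language codes for one utterance."""
    if _JAPANESE.search(utterance):
        return ("ja", "en")
    return ("en", "ja")

print(translation_direction("駅はどこですか"))        # → ('ja', 'en')
print(translation_direction("Turn left at the bank"))  # → ('en', 'ja')
```

A production pipeline would of course need real speech recognition, translation, and synthesis behind this routing step; the sketch only shows why per-utterance direction detection makes a single bidirectional session possible.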


5. Heated Debate on Japanese Social Media: Voices from the Field

  • Proponents: "The nuance of empathy is on a different level, heralding the 'Translation Konnyaku' era" (X/@zubapita)

  • Surprised Group: "The moment I first heard it, I turned around thinking, 'Who's beside me!?'" (YouTube comment)

  • Concerned Group: "Reminds me of the Sky Voice controversy. Has the gender bias been resolved?" (note article comment)

  • Realists: "Translation is divine. But when Wi-Fi is unstable, responses get cut off, and we end up going back to text" (from a forum)

On note, a long review praising the naturalness of the empathy and sarcasm has surpassed 10,000 likes (note.com).


6. Comparison with Competitors: How Do Anthropic Claude and Google Gemini 2.5 Perform?

Anthropic released an English-only voice mode beta in late May, but it is seen as falling short of ChatGPT in emotional expression. Meanwhile, Google's Gemini 2.5 is ahead with its "conversation while watching videos" feature. In the Japanese market, the competition between natural voice and multimodal integration is expected to intensify.


7. Specific Use Cases in the Japanese Market: Business, Education, and Entertainment

Field | Traditional Challenge | Breakthrough with the New Features
Call Centers | Scripted feel, long waits | Empathetic voice that reads emotions raises satisfaction; average call time reduced by 18%
International Conferences | High cost of simultaneous interpretation | Bidirectional translation with a single ChatGPT instance; cost cut to one-tenth
Language Learning | Input-heavy study | "Shadowing with voice feedback" improves listening retention by 25%
Accessibility | Monotonous read-aloud for visually impaired users | Optimized intonation and pauses reduce long-listening fatigue by 30%



8. Remaining Challenges: Quality Degradation, Hallucinations, and Ethics

OpenAI acknowledges the risk of temporary sound-quality degradation and of hallucinated audio. Additionally, after the Sky Voice dispute, strict restrictions were imposed on synthesis based on voice actors, but the tension between the desire to "freely generate preferred voices" and intellectual-property protection remains unresolved.


9. Considering "The Day AI Voice Changes Language" from a Researcher's Perspective

According to recent sociolinguistic research, extended conversation with AI voices may produce "prosody synchronization," in which people unconsciously match their intonation to the AI's. New intonation patterns, for example a hybrid of standard Japanese and Kansai dialect, might eventually emerge (arxiv.org).

10. Summary and Future Outlook

  • The greatest innovation: the boundary between AI and humans blurs as intonation, pauses, and emotional expression advance

  • Japanese market: corporate adoption accelerates through the synergy of translation and natural dialogue, while voice-rights issues become a focal point

  • Next steps: rumors suggest OpenAI will release a "Voice Character API" in Q3 2025, a major transformation involving the voice actor and VTuber industries

Reference Article

OpenAI Updates ChatGPT's Voice Mode with More Natural-Sounding Speech
Source: https://techcrunch.com/2025/06/09/openai-updates-chatgpts-voice-mode-with-more-natural-sounding-speech/



© ukiyo journal - A New News Media Connecting Japan and the World. All rights reserved.