ukiyo journal - a new news outlet connecting Japan and the world

X to Prohibit Use of Its Content for AI Training! What Impact Will This Have on Japan?

June 6, 2025, 20:15

X Completely Bans Use of Its Content for AI Model Training

――Shock and Reorganization Scenarios in Japan's Generative AI Ecosystem――





1. Introduction──The Sudden "Closing of Doors"

On June 5, 2025, social network X (formerly Twitter) revised its developer terms, completely banning third parties from using posts on X or data obtained via its API for "foundation/frontier model training or fine-tuning." TechCrunch was the first to report the change, and The Verge followed, causing a stir in the global AI developer community.



2. Understanding the Changes──"Reverse Engineering and other Restrictions"

The new clause was added as a single line under "Reverse Engineering and other Restrictions," but its impact is significant. Use of the X API, crawling, and scraping are all covered, and the clause carves out no exception for research or non-profit purposes. The previously open "API culture" that ensured data portability turned into a blockade overnight.



3. Background──Acquisition by xAI and "Grok's" Own Learning Needs

In March 2025, xAI, led by Elon Musk, acquired X for about $33 billion and put its own LLM, "Grok," front and center. X continues to use platform data to train its own models while pivoting to a "walled garden" strategy that shuts out other companies. This fits the broader trend, also seen at Reddit and The New York Times, of treating data as a proprietary "resource" and seeking revenue by licensing it at high prices.



4. Global Trend──Reddit Lawsuit and the Rise of the "License Business"

In June 2025, Reddit sued Anthropic, alleging "over 100,000 crawl attempts." While monetizing its data through a license agreement with Google on the order of $200 million, Reddit has taken a hard line against unauthorized use. X's move accelerates this global trend of "content enclosure."



5. Severe "Data Famine" Facing Japanese AI Development Companies

The performance of large language models (LLMs) depends on the volume and diversity of their training data. Japanese social media data, rich in slang, dialects, and domestic topics, is essential for training Japanese language models. However, major domestic social networks are also revising their terms of use to prohibit AI training, so the cost and legal risk of obtaining data will soar. As a result, Japanese developers face a triple burden:


  • training costs will be higher than those of large overseas companies

  • model performance may fall behind

  • opportunities for innovation by startups will shrink



6. Alternative Sources of Data──Public Corpora and In-house Data

Practical ways to work within the restrictions include: ① public corpora from the National Institute for Japanese Language and Linguistics, ② paid contracts with newspapers and broadcasters, ③ refining proprietary data such as in-house chat logs and FAQs, and ④ generating synthetic data. However, public corpora come under a variety of licenses, and even where use falls under **Article 30-4 of the Copyright Act (provisions for information analysis)**, the secondary-use clauses of each corpus still need to be confirmed individually.
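The corpus-by-corpus confirmation described above can be tracked with a simple inventory. The following is only a minimal sketch: the `Corpus` record, its fields, and the example entries are hypothetical, and whether a clause actually permits training is a legal judgment that the code merely records.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Corpus:
    """One candidate training source (all fields are illustrative assumptions)."""
    name: str
    source_type: str  # "public", "licensed", "in_house", or "synthetic"
    secondary_use_confirmed: bool  # True once someone has checked the clause


def triage(corpora):
    """Return (cleared, needs_review) lists of corpus names."""
    cleared, needs_review = [], []
    for corpus in corpora:
        if corpus.secondary_use_confirmed:
            cleared.append(corpus.name)
        else:
            needs_review.append(corpus.name)
    return cleared, needs_review


# Hypothetical inventory covering the four source types listed above.
inventory = [
    Corpus("public_corpus_a", "public", secondary_use_confirmed=False),
    Corpus("newspaper_contract", "licensed", secondary_use_confirmed=True),
    Corpus("support_chat_logs", "in_house", secondary_use_confirmed=True),
    Corpus("synthetic_dialogues", "synthetic", secondary_use_confirmed=True),
]

cleared, needs_review = triage(inventory)
print("cleared:", cleared)
print("needs review:", needs_review)
```

Anything left in the review list would be held out of training until its secondary-use clause has been checked.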



7. Current Legal Landscape──The Boundary Between Copyright Law and robots.txt

In Japan, the 2018 amendment to the Copyright Act introduced a limitation on rights covering "reproduction for information analysis purposes," but whether commercial LLM training falls under it remains a gray area. Additionally, the Japan Newspaper Publishers & Editors Association issued a statement on June 4, 2025, asserting that "the intention to refuse AI training indicated by robots.txt should be respected," and clearly stated that ignoring such indications is unjust.
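For readers unfamiliar with how a robots.txt refusal is expressed and checked, here is a minimal sketch using Python's standard `urllib.robotparser`. The crawler name `ExampleAIBot`, the URLs, and the file contents are hypothetical. The code only shows the technical check and says nothing about whether honoring it is legally required, which is the open question above.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt: the site refuses one AI crawler and allows all others.
ROBOTS_TXT = """\
User-agent: ExampleAIBot
Disallow: /

User-agent: *
Allow: /
"""


def may_fetch(robots_txt: str, user_agent: str, url: str) -> bool:
    """Check a URL against robots.txt rules supplied as a string."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())  # parse in memory, no network access
    return parser.can_fetch(user_agent, url)


ai_allowed = may_fetch(ROBOTS_TXT, "ExampleAIBot", "https://example.com/news/1")
other_allowed = may_fetch(ROBOTS_TXT, "OtherAgent", "https://example.com/news/1")
print("ExampleAIBot allowed:", ai_allowed)
print("OtherAgent allowed:", other_allowed)
```

A crawler that respects the publisher's stated intention would run a check like this before every request and skip any URL for which it returns `False`.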



8. Are Individual Posts Protected by "Opt-out"?

X provides an opt-out option in user settings to "refuse training by Grok," but the new terms are a blanket ban aimed at "third parties," and it should be noted that posts are still used for X's own model training.



9. Strategic Responses of Companies and Research Institutions

  1. Early Initiation of Data License Negotiations

  2. Inventory of Legal Risks for Contracted Datasets

  3. Implementation of Transparency in Generative AI (Source Traceability)

  4. Synthetic Data and High-quality, Small-scale Training (a "Small Data Strategy")



These are short-term measures. In the long term, a cross-industry foundation for collaboratively maintaining Japanese open data is required.
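Source traceability (item 3 above) can start with something as small as a fingerprint index over training samples. The sketch below is one possible approach under stated assumptions, not an established standard: `ProvenanceRecord`, `ProvenanceIndex`, and the example source and license labels are all hypothetical. It traces exact copies of registered samples only, not text generated by a model.

```python
import hashlib
from dataclasses import dataclass
from typing import Dict, Optional


@dataclass(frozen=True)
class ProvenanceRecord:
    sha256: str      # fingerprint of the exact sample text
    source: str      # where the sample came from (illustrative label)
    license_id: str  # contract or license it was obtained under (illustrative)


def fingerprint(text: str) -> str:
    """SHA-256 hex digest of the UTF-8 encoded sample."""
    return hashlib.sha256(text.encode("utf-8")).hexdigest()


class ProvenanceIndex:
    """Maps sample fingerprints back to their source and license."""

    def __init__(self) -> None:
        self._records: Dict[str, ProvenanceRecord] = {}

    def register(self, text: str, source: str, license_id: str) -> ProvenanceRecord:
        record = ProvenanceRecord(fingerprint(text), source, license_id)
        self._records[record.sha256] = record
        return record

    def trace(self, text: str) -> Optional[ProvenanceRecord]:
        """Return the record for an exact sample, or None if it is unknown."""
        return self._records.get(fingerprint(text))


index = ProvenanceIndex()
index.register("営業時間は9時から18時です。", "in-house FAQ", "internal-use")
known = index.trace("営業時間は9時から18時です。")
unknown = index.trace("This sample was never registered.")
print(known)
print(unknown)
```

An index like this also supports item 2: when a contract changes, every sample registered under that `license_id` can be found and removed.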



10. Impact on Startups──Changes in Funding and Evaluation

VCs have traditionally equated technological superiority with model performance, but going forward, **"how much legitimately licensed data a company holds"** will be key to corporate value. Japanese startups need to build a data strategy into their pitches early and revise their business plans to account for rising capital costs.



11. Dilemma in Academic Research──Open Science and Intellectual Property Protection

Universities and public research institutions are fundamentally expected to disclose their findings, but when models are trained on corporate data, there is a risk that disclosing model parameters could violate licenses. It is essential to sign an MOU with data-providing companies and clearly define the rules separating "publicly available parts" from "non-public parts."



12. The Temperature Gap with Overseas Platforms──"Open vs. Closed"

Meta extensively uses CC-licensed web data for Llama 3, while YouTube has yet to clearly state AI training restrictions. In the U.S., the concept of **"fair use"** serves as a shield to some extent, whereas in the EU, the AI Act is set to be enforced in 2026, imposing transparency obligations. The closure of X symbolizes the arrival of an era in which "even in the U.S., data is not free," and the cross-border war over data governance is intensifying.



13. The Japanese Government's Position and Policy Recommendations

The Ministry of Economy, Trade and Industry (METI) includes "respect for data providers' intentions" in its draft "Guidelines for the Utilization of Generative AI," while also aiming to secure AI industry competitiveness. Moving forward,


  • machine-readability and free secondary use of public data

  • development of shared clouds/data lakes by universities and public research institutions

  • subsidies for data acquisition for SMEs and startups



are the three key points.



14. "Unique Data" as a Competitive Advantage──A New Value Chain

As platform operators enclose their data, the value of "undiscovered data" hidden within companies, such as operational logs, supply chain data, and customer chats, skyrockets. Japanese companies have an opportunity to refine data that overseas entities find hard to access because of language and business-practice barriers, and to differentiate themselves globally through "niche but deep expertise."



15. Conclusion──"Data Quality and Access" Determines AI Competitiveness

The revision of X's terms of use may look like a mere policy change at first glance, but it marks the beginning of a new chapter in the "data acquisition war," one that fundamentally shakes the balance of power in the generative AI industry. Japanese AI developers, companies, and policymakers must urgently establish three pillars:


  1. diversify data procurement and manage legal risks

  2. jointly build open data infrastructure

  3. differentiate through the creation of unique data



Otherwise, competitiveness in the global market may be lost. Conversely, companies that overcome this crisis and achieve **"high-quality unique data × highly efficient models"** will be the winners in the next era of generative AI.




  • TechCrunch

  • The Verge

  • Reuters

  • Japan Newspaper Publishers & Editors Association

  • Digital Agency

Reference Article

X changes its terms to bar the training of AI models using its content
Source: https://techcrunch.com/2025/06/05/x-changes-its-terms-to-bar-training-of-ai-models-using-its-content/
