
Is AI "Learning" or "Stealing"? — Publishers Join Lawsuit Against Google, Shifting the Frontlines of Copyright

January 17, 2026, 16:20

The moment publishers "joined the fray," the battle over AI training entered the next phase

Copyright lawsuits surrounding generative AI have often been framed as "individual creators vs. tech companies." On January 15, 2026 (local time), however, a development emerged that shook this framing: major publisher Hachette Book Group and educational content giant Cengage Group filed a motion with the court to intervene in a class-action lawsuit over Google's AI training.


The aim of the intervention is clear: to elevate the lawsuit from "complaints by image and text creators" to a fight over "the business foundation of the publishing industry," increasing both its scope and its weight. If the intervention is granted, the risk, including damages, would grow, and the outcome would also shape how AI companies design data acquisition for future training.



What's at issue: the handling of "copies for training"

The core of the publishers' argument is that Google "copied works without permission and used them to build AI capabilities." In other words, before even asking whether the text a generative AI outputs resembles the originals, the act of reproducing copyrighted works at scale for training is itself, they argue, a rights violation.


The filing claims that Hachette's trade books and Cengage's textbooks were used without permission to train Google's large language model (LLM), Gemini. The citation of multiple authors' works as specific examples has further fueled the debate. This brings to the forefront not only the grievances of individual creators but also the industry-wide stakes for publishers and educational content providers.



Why publishers, and why now: three practical reasons

At least three practical realities explain why publishers have stepped forward at this particular moment.


1) Generative AI has begun to engulf distribution through "search and summary"
As AI increasingly summarizes content and presents answers directly, the risk grows that readers never reach the original sources. For publishers, the concern is not just lost sales but the entire flow of returns to authors and editorial investment.


2) The stage is set for precedent on whether "training is fair use"
Multiple lawsuits over AI training are proceeding in parallel, and as the points of contention get sorted out, the side that moves first tends to gain an advantage in shaping the rules. Publishers want to avoid having precedents set without them at the table.


3) Educational content is especially prone to "substitution"
Textbooks and learning materials collide head-on with AI's strengths in summarization, example generation, and explanation. For a company like Cengage, leaving the training data question unaddressed could jeopardize its medium- to long-term business.



How will Google respond? What "no comment" implies

At the time of reporting, Google had not commented on the publishers' intervention request. Silence, however, does not necessarily signal weakness. The more multifaceted the issues, the more likely a company is to organize its position in court first as part of its litigation strategy.


The main points of contention are likely to be as follows:

  • Does reproduction for training constitute copyright infringement?

  • Even if it does, can defenses such as fair use succeed?

  • How should the burden of proof over which data was used be allocated?

  • How should damages be assessed (lost profits, statutory damages, unjust enrichment, etc.)?


The publishers' participation brings these questions into much sharper focus. From here on, the case is a battle of evidence and procedure rather than a clash of sentiments.



Another reference point: The shadow cast by Anthropic's massive settlement

No discussion of this dispute can ignore the settlement of Anthropic's copyright lawsuit, which made headlines in 2025. The reported size of that settlement strengthened rights holders' perception that AI training is a winnable issue, and it is one factor that made a "winning strategy" feel realistic to publishers.


A settlement, however, is only a settlement; it does not establish that other companies acted illegally. Its psychological impact on the market is nonetheless significant: the mere possibility of having to pay can change how fundraising, contracts, and data acquisition are designed.



Reactions on social media: A clash of justice and realism among "three camps"

Reactions to the news on social media fall roughly into three camps (organized here mainly from reactions observed on social news and comment communities).


① "Pay for it" camp: A "natural demand" to protect the circulation of creation

This camp runs the hottest emotionally. Prominent claims include:
"If a product that generates profit cannot exist without someone else's work, isn't it a derivative?"
"It's illegal if an individual does it, but 'innovation' if a company does it. That's absurd."


The point is not that they reject AI; they oppose "growth that relies on externalizing costs." If copyrighted works are the fuel, then pay for the fuel: a thoroughly market-oriented demand.


② "Learning is fair" camp: Like human learning, banning it is a societal loss

On the opposite side is strong resistance to the blanket label "training = illegal."
"Market substitution is limited. AI users want something different from the original."
"The problem is not training itself but the acquisition of pirated copies or the theft of outputs."
This camp believes that uniformly restricting training would stall research and industrial development.


This camp also has a strong practical streak: "permission is ideal, but negotiating every case is unrealistic." They emphasize the overall benefit to society and argue that rules should be built on balance rather than prohibition.


③ "In the end, only big companies benefit" camp: Distrust and cynicism

The third camp is harsh on both parties.
"Copyright tends to get used conveniently by large corporations."
"Have publishers adequately compensated authors?"
"In the end it becomes a battle of law, lobbying, and capital."
From this cynical vantage point, they treat the whole debate as a structural issue.


This camp rejects the simple framing of publishers as good and tech as evil. The discussion therefore tends to shift toward institutional design: how can a system be built in which authors, editors, readers, and researchers all coexist?



Future focus: The compromise, more than the victory, will shape the industry

In lawsuits of this type, the eventual compromise shapes the market more than the verdict does. Three realistic scenarios stand out:

  1. Expansion of the licensing market: Publishers and rights management organizations strike comprehensive agreements with AI companies, establishing a system for distributing compensation.

  2. Standardization of "legitimate procurement" of training data: The provenance, permission status, and deletion handling of training data become baseline product requirements.

  3. Field-specific delineation: In high-precision fields such as education, healthcare, and law, permission and verification requirements become more stringent.


Whether the court grants the publishers' intervention could itself be a turning point. If it does, it will signal that the battle over AI training has moved beyond individual creators into industry-level negotiation.



Reference URLs

  • Reuters (original article): Primary reporting on Hachette and Cengage's intervention request, Judge Eumi Lee's decision, and the claims about Gemini training.

  • Publishers Weekly: Supplementary context from the publishing industry's side (the significance of the publishers' move).

  • Reuters (separate lawsuit over AI search summaries): Background on publishers' parallel dispute over Google's AI features (search and summary).

  • Reuters (Anthropic's $1.5B settlement): Reference point for the monetary impact of AI training lawsuits and the resulting market psychology.

  • AP News (supplementary coverage of the Anthropic settlement): General-audience summary of the settlement terms (distribution design, the court's view, etc.).

  • Hacker News thread: Raw expressions of the pros and cons of AI training and copyright.

  • Search Engine Roundtable comments: Fragments of reaction from another venue where the same news was shared (surrounding context).


Reference Articles

Publishers seek to join the lawsuit against Google over AI training
Source: https://www.investing.com/news/stock-market-news/publishers-seek-to-joinlawsuit-against-google-over-ai-training-4450416
