Will Cultural Evolution Halt in the AI Era? The "Homogenization" Progressing Behind Convenience — A Future Where Only AI-Friendly Expressions Remain

As generative AI spreads, the concern that "the internet will be filled with AI-generated content, until AI learns from AI and degrades" has been voiced repeatedly. The latest research introduced by Phys.org, however, brings that "future worry" forward by a step. Its conclusion: cultural homogenization can already occur without waiting for retraining. And the cause lies less in how smart the model is than in the sheer repetition of the operation itself.


1) The experiment was a "visual telephone game" played between AIs

What the researchers conducted was not a bizarre prompt battle. The setup is simple, and that's why it's scary.
They connected an AI that creates images from text with an AI that converts images into text (captions), and repeated the following cycle:

  • Text (prompt) → Image generation

  • Generated image → Captioning (description)

  • That description → Becomes the next image generation prompt

  • This is repeated dozens or hundreds of times


The point here is that the AI was not additionally trained. This is not a story of feeding in new data and watching the quality of learning deteriorate; simply by repeating the usage, the output is pulled towards a similar direction.
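
To make the structure concrete, here is a minimal sketch of the loop in Python. This is my illustration, not the paper's code: `generate_image` and `caption_image` are hypothetical stand-ins for the text-to-image model (SDXL in the study) and the captioner (LLaVA in the study).

```python
# Minimal sketch of the "visual telephone" loop (illustration only, not
# the paper's code). generate_image / caption_image are hypothetical
# stand-ins for the study's SDXL and LLaVA calls. Note that no model is
# ever retrained here; the loop only repeats inference.

def generate_image(prompt: str):
    """Hypothetical text-to-image call (stands in for an SDXL pipeline)."""
    raise NotImplementedError  # plug in a real text-to-image model

def caption_image(image) -> str:
    """Hypothetical image-to-text call (stands in for a LLaVA captioner)."""
    raise NotImplementedError  # plug in a real captioning model

def telephone_loop(seed_prompt: str, iterations: int = 100) -> list[str]:
    """Run the closed text -> image -> caption -> text cycle."""
    prompt = seed_prompt
    trajectory = [prompt]
    for _ in range(iterations):
        image = generate_image(prompt)  # text -> image
        prompt = caption_image(image)   # image -> text (caption)
        trajectory.append(prompt)       # the caption is the next prompt
    return trajectory
```

Per the Patterns paper linked below, the researchers ran 700 such trajectories for 100 iterations each, varying the seed prompts and the sampling temperature.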


In reality, no matter how diverse the starting prompts were, or how the randomness settings of generation were varied, the final output converged into a very limited set of "safe appearances." The name the researchers gave this endpoint is apt: "visual elevator music." Well-arranged but lacking substance, comfortable yet leaving nothing behind: the epitome of "seemingness."


2) Why does it converge? — AI only retains "what's easy to explain"

This phenomenon cannot be dismissed as "AI just has low creativity." It is more structural.

In the round trip from image to text and back to image, elements that are easy to verbalize survive, while elements that are hard to verbalize tend to be trimmed away.

For example, the tension of a political drama, irony, the scent of the historical background, the "gaps" in character relationships, the meaning of glances, the atmosphere of the scene. These are "hard to explain" at the captioning stage, so they inevitably get replaced by general nouns and safe adjectives.

Then, in the next generation, typical images that match those "easy-to-explain word groups" are selected. The more this is repeated, the more the original intent that was supposed to sit at the core of the work evaporates, leaving only an average scene that resonates with no one.

Interestingly, this occurs before the so-called "mode collapse" or "training-data contamination." In other words, even before future AI learns from AI-generated content, the "generation → summarization → regeneration" cultural pipeline naturally works to dilute meaning.
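
The mechanism can be imitated with a deliberately crude toy model (my illustration, not the paper's method): treat captioning as a lossy filter that keeps only words from a small "easy to verbalize" vocabulary, and watch any starting description collapse to a fixed point within a round trip or two.

```python
# Toy model of the dilution mechanism (illustration only, not the
# paper's method): captioning as a lossy map that keeps only "easy to
# verbalize" words and drops everything nuanced.

EASY_VOCAB = {"a", "person", "in", "room", "soft", "light", "window"}

def lossy_caption(description: str) -> str:
    """Keep only easy-to-verbalize words; everything else is trimmed."""
    kept = [w for w in description.lower().replace(",", "").split()
            if w in EASY_VOCAB]
    return " ".join(kept) if kept else "a person"

desc = "a wary diplomat in soft light, irony in her glance, history in the room"
for step in range(4):
    desc = lossy_caption(desc)
    print(step, "->", desc)
# The nuance-bearing words ("wary", "diplomat", "irony", "glance",
# "history") vanish on the first pass; from then on the description
# never changes -- a fixed point, the textual analogue of the study's
# "visual elevator music".
```

Real captioners are far richer than a word filter, but the direction of the pressure is the same: whatever the image-to-text step cannot express does not survive into the next generation.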

3) Is the research "extreme"? — In reality, such loops are already everyday

You might think, "Such an infinite loop between AIs wouldn't happen in reality." But around us, loops in different forms have become commonplace.

  • Images are summarized into short descriptions (captions/alt text)

  • Articles are summarized into bullet points

  • Summaries are repackaged into thumbnails or short videos

  • "Strong types" remain in the rankings, and similar types get mimicked

  • Humans increasingly "just choose" from AI-generated candidates

What is being repeated here is passing the value of a work through multi-layered "translation" and "selection." Translation and selection are necessary, but the more they are repeated, the more likely it is that only the typical, the explainable, and the commercially safe remain. The research can be said to have miniaturized that pressure.
4) Response to "Cultural stagnation is moral panic"


There have always been voices saying "culture will be destroyed" with the advent of new technologies. It's true that photography didn't kill painting, movies didn't kill theater, and digital gave birth to new expressions.

But the focus this time is not on "a new tool emerging." It is on the point that culture is being reprocessed, repeatedly and on a global scale, through similar devices of generation, summarization, recommendation, and ranking.


Past technologies increased the means of expression. Now, not only do the means of expression increase; meaning is also compressed and regenerated multiple times "in the middle of distribution." Moreover, that compression tends, unintentionally, to head towards the average. This is therefore not merely nostalgic anxiety but a design issue of the modern information ecosystem.

5) "Mass production = innovation" is not the case


Generative AI can produce variations in large numbers. What the research points out, however, is that an explosion of quantity does not equal an expansion of the exploration space. No matter how many items are produced, if they differ only slightly within the same "narrow valley," the cultural landscape doesn't widen. Rather, the average valley only deepens.
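
One rough way to see the distinction in numbers (my illustration, not a metric from the paper) is to measure spread as the average pairwise distance among outputs: a thousand samples drawn from a narrow region still score far lower than ten genuinely spread-out ones.

```python
# Illustration (not the paper's metric): sample count vs. spread.
# Average pairwise distance as a crude proxy for "exploration space".
import itertools
import random

def mean_pairwise_distance(points: list[float]) -> float:
    n = len(points)
    total = sum(abs(a - b) for a, b in itertools.combinations(points, 2))
    return total / (n * (n - 1) / 2)

random.seed(0)
narrow_many = [random.gauss(0.0, 0.05) for _ in range(1000)]  # many, near-identical
wide_few    = [random.gauss(0.0, 1.00) for _ in range(10)]    # few, diverse

print(f"1000 narrow samples: {mean_pairwise_distance(narrow_many):.3f}")
print(f"  10 wide samples:   {mean_pairwise_distance(wide_few):.3f}")
# The second number is roughly 20x the first: producing more items
# inside the same narrow valley does not widen the landscape.
```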

6) So what to do? — Design against the "gravity towards average"


There's no need to end in pessimism. What the research shows is that "if left alone, it will converge," not "it will inevitably stagnate." The direction for countermeasures is visible.

  • Reward deviation: Introduce an evaluation axis that prioritizes unusual compositions, vocabulary, and materials (a "counter-safe score," so to speak)

  • Distribution that doesn't discard context: Attach production intent, references, and background notes to works, to supplement the information lost in summarization

  • Design the selection process: Define "what to break / what to avoid" before choosing from generated candidates

  • Diversity constraints in recommendations: Impose a lower limit on diversity in exposure so that the same types don't keep lining up (see the sketch after this list)

In other words, the more conveniently AI is run as an "average generator," the more the average is reinforced. Conversely, we need to build circuits in which deviating from the average becomes beneficial.
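
As a concrete sketch of that last item (my assumption, not a mechanism described in the article), here is a greedy re-ranker in the spirit of maximal marginal relevance (MMR): each pick trades raw score against similarity to what has already been shown, which puts a floor under the diversity of what gets exposed.

```python
# Sketch of a diversity-constrained selection (my assumption, not a
# mechanism from the article): greedy re-ranking in the spirit of
# maximal marginal relevance (MMR).

def select_diverse(candidates, similarity, k, lam=0.6):
    """candidates: list of (item, score); similarity(a, b) -> [0, 1].
    lam = 1.0 ranks purely by score; lowering lam rewards deviation."""
    chosen, pool = [], list(candidates)
    while pool and len(chosen) < k:
        def mmr(entry):
            item, score = entry
            nearest = max((similarity(item, c) for c, _ in chosen), default=0.0)
            return lam * score - (1 - lam) * nearest  # penalize sameness
        best = max(pool, key=mmr)
        chosen.append(best)
        pool.remove(best)
    return [item for item, _ in chosen]

# Toy usage: items as tag sets, similarity as Jaccard overlap.
def jaccard(a, b):
    return len(a & b) / len(a | b)

items = [({"portrait", "soft", "light"}, 0.90),
         ({"portrait", "soft", "warm"}, 0.88),  # near-duplicate of the top hit
         ({"street", "night", "rain"}, 0.70),
         ({"abstract", "collage"},     0.60)]
print(select_diverse(items, jaccard, k=3))
```

With lam=0.6, the near-duplicate of the top hit is penalized and the street and abstract items surface; with lam=1.0, the two near-identical portraits would occupy the first two slots.
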
7) Reactions on SNS — Divided into "I get it," "It's all about data," and "Humans are needed"

When this topic spread on SNS, reactions were broadly divided into three temperature zones.

(1) The "I get it, from experience" group


Many posts showed strong agreement with the metaphor "visual elevator music." The more people have interacted with generative AI, the more they describe that "thinness" in phrases like "it's well-arranged, but I feel somehow ignored" or "it leaves an eerie déjà vu." It was a moment when the experimental results connected with users' hands-on experience.

(2) The "It's all about data and bias" group

There was also a strong reaction that sees the convergence to 12 motifs shown by the research as reflecting not only "the nature of the model" but also "the bias in photos taken by humans (what tends to get photographed)." In other words, the view that AI is not creating from scratch but amplifying human society's "photography habits" and "best-sellers." This includes the reflexive question: wasn't human cultural consumption already fairly homogeneous, even before we blame AI?

(3) The "Humans are needed" group

Some perceive that the more autonomously AI operates, the more the value of human editing, aesthetic judgment, and contextualization increases. The discussion tends to progress not towards "it's fine because humans choose in the end," but towards "a design is needed so that meaning doesn't disappear along the way, leaving a real final choice."


Overall, the atmosphere on SNS seemed to converge not on AI praise or AI rejection, but on the question of where humans take the helm, given the "force pulling everything back to the average."

8) Conclusion — Flattening culture is about "how it's run," not performance

The scariest point of this research is that no malice or conspiracy is needed. Even without waiting for contamination through retraining, meaning is diluted simply by convenient, repeated use.

Therefore, the question should be not only "Is AI creative?" but also whether we are building distribution circuits that erode creativity in the name of convenience.

"Seemingness" becomes an anesthetic. Comfort lulls the power to doubt. The reason the term "visual elevator music" resonates is that our current timelines sometimes flow like background music. To prevent culture from becoming average, what might be needed is a design philosophy that re-transports meaning rather than a speed race in generation.

Source URLs

Phys.org (the article this piece is based on): the positioning of the research (homogenization can occur even before retraining; AI-mediated pipelines pull culture towards "familiarity").
https://phys.org/news/2026-01-ai-cultural-stagnation-longer-speculation.html

Patterns (ScienceDirect article page): primary information such as the experimental setup and main results (the SDXL and LLaVA loop, 700 trajectories, temperature settings, 100 iterations, 12 motifs, the term "visual elevator music," implications).
https://www.sciencedirect.com/science/article/pii/S2666389925002995

Tech Xplore (Cell Press introduction article): news-oriented treatment of the research (the "visual telephone" experiment, 12 themes, example prompts, long-term loop behavior, researcher comments).
https://techxplore.com/news/2025-12-generative-ais-game-visual.html

Gizmodo (general explanation of the research): a summary for the general public, including the 100 rounds, 12 motifs, and metaphors like hotel paintings.
https://gizmodo.com/ai-image-generators-default-to-the-same-12-photo-styles-study-finds-2000702012

Reddit (r/MediaSynthesis thread): referenced as an example of SNS reactions (the one-word comment "Pinterest before the AI filter").
https://www.reddit.com/r/MediaSynthesis/comments/1pya6n3/autonomous_languageimage_generation_loops/

X (examples of reaction posts): how the study was received on SNS (empathy with the metaphors, discomfort as a tactile experience, questions about model differences, etc.). Note: X post text may not be retrievable, so these are referenced via search-result snippets.
https://x.com/smek2/status/2010402156407623871
https://x.com/h_okumura/status/2002884726273843286
https://x.com/Kyukimasa/status/2002549485294268789
https://x.com/DanielPAldrich/status/2003652984832667839