Have We Discovered Why Zoom is Exhausting? Conversations Aren't Just "Exchanging Words" — They Are a Collaborative Dance of the Brain and Body

Conversations Are Not Made of "Words" Alone

Have you ever been telling a friend about recent events and found yourself rephrasing, adding an explanation, or jumping to the conclusion partway through? The other person's eyebrow rises slightly. Their gaze shifts for a moment. They lean in. By picking up on such subtle changes, we unconsciously reshape the conversation as it unfolds.


This tendency to "change midway" is neither a flaw nor a bonus of conversation. Rather, it is the essence of conversation itself. Researchers in psycholinguistics have compiled a comprehensive review built on this perspective. Its claim is clear: conversation is not a linear exchange of words but a "cooperative activity" in which two (or more) people move simultaneously, read each other, and continuously adjust.


"Speaking" is a Full-Body Activity: Gaze, Facial Expressions, and Posture Convey Meaning

In face-to-face conversations, the voice carries only part of the information. Gestures, pointing, eye movements, facial expressions, posture, and brief vocal interjections (like "uh-huh" or "really") flow constantly as "signals" that make the conversation work.


The important point is that these are not mere embellishments but actually influence the progression of the conversation itself. The listener is not passive. Through the speed and amount of nodding, the way they make eye contact, and the timing of their interjections, they convey to the speaker whether to "continue as is," "elaborate on that," or "that might be incorrect." The speaker then changes their phrasing, inserts analogies, or conversely, wraps up the conversation.


In other words, understanding and correction happen during the conversation itself, not after the speaker has finished. When a conversation flows smoothly, a very dense feedback loop is at work.


It's Not Your Fault That Online Meetings Are Exhausting

From this angle, the "difficulty of online conversations" that many people experience looks different. There is lag. It is hard to make eye contact. Interjections overlap. It is difficult to read the mouth or subtle facial expressions. Audio-only meetings make all of this worse.


If face-to-face conversation depends on "immediate feedback," then conversation naturally becomes awkward in an environment where that circuit is degraded. With fewer clues to gauge the other person's level of understanding, the speaker adds unnecessary explanations, the listener misses the moment to interject, and as a result, everyone ends up "trying harder." Online fatigue is a structural problem, not one that willpower or personality can easily fix.


"One System" Rather Than "Two Brains"

In traditional psycholinguistics, speaking (production) and listening (comprehension) have often been studied as separate tasks. That separation has certainly yielded results. However, in real conversations, production and comprehension are intertwined, proceeding simultaneously while each side anticipates the other's moves.


The speaker anticipates the listener's reactions. The listener prepares a response even before the speaker finishes talking. They don't "process in turns" but run in parallel, finding and correcting discrepancies as they go. The keyword here is "joint management." Meaning doesn't reside as a finished product within words but emerges through interaction.


From this perspective, conversation is more akin to an ensemble or a dance than to information transmission. Tempo, timing, breathing, and reading the other person's next move: these elements reach all the way into how linguistic meaning is formed.


The Concept of "Multimodal Language" Changes Research Design

The review emphasizes viewing conversation as "multimodal" (multiple modes). It broadens the narrow definition of language as sound or text to include body movements and visual information as important components of language processing.


This perspective also affects research methods. For example, tasks like showing words in a lab to measure reaction time or playing recorded audio to measure comprehension are easy to control but cannot fully replicate the "prediction and adjustment" of face-to-face conversation. The message is that future research design should be based on situations where multiple people are actually interacting and multiple signals are flying simultaneously.


Implications for Conversational AI and Assistive Technologies: Conversation Is More Than "Turns"

In recent years, conversational AI has become increasingly natural. However, many dialogue systems are still designed around "turn-taking" as the basic unit. The user speaks, and the AI responds. This is the basic operation.


But if human conversation is a "collaborative work that is continuously fine-tuned," then truly natural conversational support has to reach beyond turns: handling interjections and silences, responding to non-verbal feedback such as eye contact and nodding, allowing interruptions, and managing multi-person conversations.


For example, simply returning a signal of "I understand" while the user is still explaining might reduce the conversational load. Conversely, rather than halting to ask a formal "question" once something has not been understood, a system could pick up early signs of confusion and offer a prompt. Viewing conversation as a dance also affects the evaluation criteria for conversational AI. Not only accuracy and grammatical naturalness but also "whether it was cooperative" will come into question.
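As a rough illustration of that idea, the Python sketch below reacts to signals that arrive inside a turn rather than waiting for the turn to end. Everything in it is an assumption made for illustration: the `Signal` type, its labels, the thresholds, and the canned replies are hypothetical and come from neither the review nor any real dialogue system.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Signal:
    """One observation made during an ongoing turn (hypothetical structure)."""
    kind: str           # "speech", "pause", or "confusion" (illustrative labels)
    value: float = 0.0  # pause length in seconds, or a confusion score from 0 to 1


def mid_turn_response(
    signal: Signal,
    pause_threshold: float = 0.5,      # assumed value, not taken from research
    confusion_threshold: float = 0.6,  # assumed value, not taken from research
) -> Optional[str]:
    """Decide whether to react right now instead of waiting for the turn to end."""
    # Early sign of confusion: offer help before the exchange breaks down.
    if signal.kind == "confusion" and signal.value >= confusion_threshold:
        return "Should I go over that part again?"
    # A long enough pause: a short acknowledgement that does not take the floor.
    if signal.kind == "pause" and signal.value >= pause_threshold:
        return "mm-hm"
    # Otherwise stay quiet and keep listening.
    return None


def run_stream(signals: List[Signal]) -> List[Optional[str]]:
    """Apply the policy to each signal in order and collect the reactions."""
    return [mid_turn_response(s) for s in signals]


if __name__ == "__main__":
    stream = [
        Signal("speech"),
        Signal("pause", 0.7),
        Signal("pause", 0.2),
        Signal("confusion", 0.9),
    ]
    for signal, reaction in zip(stream, run_stream(stream)):
        print(signal.kind, signal.value, "->", reaction)
```

The point of the sketch is only the shape of the loop: the unit of decision is a signal within a turn, not a completed turn. Detecting pauses or confusion reliably is a separate and much harder problem that this toy example does not attempt.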


Effective in Clinical, Educational, and Everyday Settings: "Aligning" Rather Than "Conveying"

If conversation is a cooperative action, conversational skills are not determined solely by "language ability." Observational skills, adjustment skills, sense of timing, and the ability to estimate the other person's load are significantly involved. This can also lead to support for people who have difficulties with communication.


For example, difficulty in reading changes in the other person's facial expressions or gaze, difficulty in timing interjections, and challenges in handling interruptions or silences—such discrepancies are often dismissed as "personality" or "lack of effort." However, if conversation is inherently a "multi-channel synchronization" task, it is natural for difficulties to arise there. The focus of support might shift from "correct phrasing" to increasing ways of "synchronizing."


This is practical in daily life too. When a conversation stalls, instead of desperately searching for the right words, you can often rescue it by slowing the tempo while watching the other person's reaction, stating the main points first, supplementing with gestures, or making your interjections clearer. The goal is to be "good at aligning" rather than "good at speaking." If conversation is a dance, skill is less about solo prowess than about the ability to "make it work as a pair."


Reactions on Social Media: "Exactly How I Feel," "Useful for AI Design," "Want to Read"

This topic has been shared within the research community, and several types of reactions have been observed on social media.


First, there is a lot of agreement with the metaphor of "conversation as a dance." People sympathize with the point that the "flow" and "timing" felt in everyday life have been put back at the center of language processing research. Those who have felt discomfort or fatigue in online meetings, in particular, tend to describe it as a renewed recognition of "how significant face-to-face fine-tuning was."


Next, there are reactions to the researchers' own posts. Positive responses such as "I want to reflect the view of language as a collaborative work in future psycholinguistic experiment designs" and "I want to discuss it" appear alongside short notes of agreement in the comments, like "added to my reading list" and "sounds good!" The overall impression is of people wanting to read and discuss the work rather than debate the "merits of the content" of the research.


Third, there are associations with conversational AI and interface design. Many people are beginning to feel the limitations of reducing conversation to turn-based responses. How a system handles non-verbal feedback, timing, interruptions, and coordination among multiple participants will be the differentiator in future dialogue technology. The message of the review is received as a push in that direction.


Conclusion: Conversation Is a Movement to Create "Meaning" Together with Others

When viewing conversation as "transporting words," the reasons for failure seem to be vocabulary or explanatory skills. However, when viewing conversation as "cooperation," the nature of the problem becomes more three-dimensional. Are you picking up the other person's reactions? Is the tempo aligned? Are the predictions off? Are the signal channels sufficient?


Conversation is not the task of handing over completed meaning. Meaning "emerges" each time, as the other person's body and yours, the other person's predictions and yours, clash, align, diverge, and are adjusted again. That's why conversation is similar to dance. When we dance well, we might feel that we "created a space together" rather than just "talked."


