Workshop “Embodied Interaction & Embodied Intelligence”, Essen (Germany), August 21-22, 2025

Co-organized by Akiko Yamazaki (Tokyo University of Technology), Keiichi Yamazaki (Saitama University), and Karola Pitsch (University of Duisburg-Essen)

Reviewed by Damien Rudaz (University of Copenhagen)

This workshop aimed to reassess a number of theoretical and empirical debates in conversation analysis in light of recent advances in robotics and conversational AI. With an emphasis on video-based studies, it brought together computer scientists, roboticists, and a majority of conversation analysis practitioners—some of whom occupied multiple roles. The event took place over two days, during which eleven presentations and two roundtables were held.

This initiative combined (1) a descriptive endeavor, namely to document whether and how new technologies shape social interaction; and (2) an “applied” endeavor, that is, to outline how conversation analysis might inform the prototyping of new technological systems. During these two days, the guiding thread of presentations and discussions rested on the tension between these two orientations. It confronted the thorny problem of moving from investigating embodied interaction to designing robots, voice-agents, or, more broadly, any technological artifact. That is, can the description of participants’ endogenous organization within a specific setting—as a situated and continually renewed accomplishment—be of use in the design of technologies intended to be embedded within “similar” environments (or categorized as such)?

Theme 1: What robots do to Conversation Analysis

A central theme in this workshop’s discussions was whether situations involving “robots” warrant a distinct analytic treatment. Indeed, robots offer a peculiar possibility compared to human conversationalists: any researcher can “look under the hood” and access a robot’s discretization of its environment. For instance, it is possible to determine whether, in its code, an ongoing gesture from a robot is labeled as randomly “raising its hand” or as “producing a greeting gesture in response to a human”. It is also possible to access, in real time, the objects or humans a robot recognizes and discretizes within the data stream from its cameras. In short, a robot’s code provides additional forms of accounts of what this robot is “doing”.

However, these accounts are typically not available to a robot’s interlocutors during interaction. Hence, some of the workshop debates revolved around the question of “what should be done with that information?”. Should we limit ourselves to what is publicly available to participants in interaction—i.e., if a robot raises its arm and this gesture is treated as a greeting wave by co-present humans, then the robot is producing a greeting, for all practical purposes? Or should we use this additional information to flesh out the analysis—for example, by indicating that, according to its data streams, the robot had not detected any human, and merely raised its arm as part of a randomly generated animation?

Theme 2: Using CA concepts to account for “human”-“robot” “interaction”

A second debate revolved around whether existing CA concepts and categories can adequately account for situations pre-categorized as “human–robot interaction” (HRI). Should CA rely on analytic tools developed for and from human interaction to analyze situations involving robots and conversational agents? Do concepts or labels such as pre-closings or greetings capture what is demonstrably relevant in human–robot interactions, or do they function only as heuristic, good-enough “placeholders”?

Theme 3: Embodiment and “AI”

In line with the workshop’s title, participants discussed the relevance of embodiment and multisensoriality in interactions across different physical environments (museum exhibits, grocery stores, etc.), with a priori different categories of participants (here pre-labelled as “robots”, “voice-based conversational agents”, and “humans”), and various forms of broadly defined activities (doing remote collaborative work, cooking together, adjusting patients’ medication in psychiatric medical teams, etc.). Based on these presentations, the ensuing discussions addressed how various forms of technological devices are embodied in their respective settings—as an accomplishment continuously renewed and produced in and through participants’ conduct. The discussions focused particularly on the material features of these devices that emerged as relevant parameters for participants in situ, and for researchers a posteriori when designing such systems.

Indeed, many presentations assessed the practical insights that might be drawn from their analytic results. An instance of such a transition from empirical analysis to design was illustrated by Akiko Yamazaki, Keiichi Yamazaki, and Haiyin Chen. They began by providing a moment-by-moment analysis of how a museum’s material configuration (such as the presence of protective glass in front of an exhibit) is managed by groups of participants through their spatial positioning, talk, and overall practices of examination. Then, they offered a set of design proposals as to how a guide robot might similarly adapt its positioning in relation to a group’s unfolding activity, whether in a museum setting or in other environments. Put differently, these presenters examined “space” as an emergent property of social interaction, and used this analysis to inform how a robot might appropriately position itself in relation to human participants.

Workshop “Transforming Interaction: A Workshop on Augmentative and Alternative Communication, Social Robotics, and Conversational AI”, Groningen (Netherlands), April 2-4, 2025.

Lead organizers: Jeffery Higginbotham (University at Buffalo), Francesco Possemato (University of Groningen), and Jenna Bizovi (University at Buffalo)

Reviewed by Damien Rudaz (University of Copenhagen)

Without claiming a different justification from that advanced by numerous workshops in recent years, this event was motivated by recent developments in deep learning and large language models. It aimed to evaluate whether recent conversational technologies can be used to improve the design of (1) socially interactive robots, and (2) augmentative and alternative communication (AAC) systems, that is, methods or technologies that support or replace spoken language for individuals with speech or communication impairments.

Given the scope of this inquiry, the workshop brought together participants from diverse backgrounds in order to take stock of shared issues. It offered a cross-disciplinary exchange among practitioners in conversation analysis (CA), augmentative and alternative communication, design, and social robotics, together with specialists in machine learning and language models. More straightforwardly, the stated objective of many participants was to assess whether “anyone in the room knows where they are going”, in the midst of the rapidly evolving landscape of AI.

The workshop was structured into ten sessions over three days. These sessions addressed the potential consequences of recent developments in AI across a wide variety of topics, covering both social robotics and AAC-mediated interactions. Those topics included (but were not limited to) turn-taking and multimodal communication in AAC-mediated interaction and social robotics; the context dependency and adaptation of dialog systems’ outputs during talk-in-interaction; and the role of ethnomethodological and conversation-analytic approaches in the design of AAC technologies or social robots. Among these presentations and the extensive debates that followed them, two major themes can be outlined:

Theme 1: Multiple temporal orders and misalignment

A first theme broadly concerned conflicting temporalities in interaction. For instance, as illustrated by Jeffery Higginbotham’s presentation, a difference of a few milliseconds before the production of a turn-at-talk can change what this turn is understood to index, and what action it performs. This ordinary feature of conversational timing, however, often constrains AAC users. For example, users with cerebral palsy may use their gaze to select letters on a virtual keyboard in order to form sentences. In doing so, they may produce responses more slowly, within a different temporal order, than the fast-paced timing of typical conversationalists.

Alongside Higginbotham’s work, several presentations directly or indirectly addressed the interactional consequences of such differences in response timing. They detailed the competence of AAC users in repairing or pre-empting trouble as they design their turns at a pace different from that of their interlocutors. By contrast, it was shown that, to date, social robots and state-of-the-art voice-based conversational agents do not produce such practices when the timing of their turns-at-talk conflicts with the temporal order of typical conversationalists.

Theme 2: LLM-powered AAC interfaces and Goffmanian “authorship”

A second major theme concerned the promise of large language models (LLMs) pretrained on a user’s biographical data to generate responses aligned with that user’s “personality” or typical conversational practices. In such cases, the user—particularly an AAC user—might choose among AI-generated responses rather than typing them out in full. In light of this prospect, some discussion centered around the relevance of the Goffmanian notion of “authorship” to account for these situations: namely, “who is speaking” when a language model generates candidate responses that are, then, selected by a user? Can such situations be described as a triadic interaction in which the generative AI constitutes a participant in its own right? Does the Goffmanian participation framework (animator, author, principal) adequately account for the role and agency of the AI in such interactions?

By way of conclusion: Generative AI and Conversation Analysis, new practical problems, long-standing theoretical debates

Although both workshops brought together practitioners from different research fields, they were permeated by the same hopes, anxieties, and practical difficulties. These concerns appear to mirror widespread preoccupations in academic research on robotic and conversational technologies, whether regarding their development or the study of their use.

First, as was mentioned repeatedly during these workshops, it is currently difficult to shake the feeling that fundamental technological changes are being driven primarily by large companies, in relative independence from ongoing research in academia. Technologies (robots, dialog systems used in AAC-mediated interaction, turn-taking models, etc.) that are carefully and painstakingly prototyped in a research laboratory may suddenly be rendered obsolete after being produced at scale—or claimed to be—with far greater resources, by a major industrial firm. Laboratories and small research teams are, more than ever, dependent on software and hardware they are not directly able to modify and whose evolution they do not control. They are, so to speak, riding a wave whose direction can suddenly shift, devaluing years of research.

Second, these workshops’ discussions also expressed a lingering concern about the relevance of “applied CA” (or any empirical approach to interaction as a situated phenomenon) for current conversational technologies. Put bluntly, is it still useful to discretize the stream of human practical action, to “put into words” what humans are witnessably doing, in an era of language models directly trained on massive (and sometimes non-annotated) datasets? As conversational agents shift from a reliance on basic computational rules—directly specified by a designer or programmer—to complex probabilistic modes of operation, can CA’s findings still inform the design of such agents?

Indeed, programming and scripting practices are progressively evolving from writing deterministic rules (through which, for instance, a robot will trigger a greeting animation upon detecting a human face) to providing natural-language instructions to a robot or voice agent (e.g., “be polite and concise in your greetings”). Both workshops documented whether this shift transforms the kinds of empirical insights that an ethnomethodological and conversation-analytic approach might offer to the designers of such technologies, and whether this change is also consequential for the concepts, analytic categories, or methodological tools used to account for interactions (labeled as “social” or not) involving these technologies.
