Transcript preparation for multi-lingual data sessions

A data session report by Stamatina Katsiveli-Siachou

Data sessions are a central feature of research in ethnomethodology (EM) and especially conversation analysis (CA). Although data session practices vary across institutions, fields, or even linguistic communities, the core idea remains the same; one or more participants makes a set of empirical data available to a group of people with shared research interests and they all contribute ideas, or alternative perspectives. 

Even though the idea is always more or less the same, there are important differences in how people run data sessions, as well as challenges to be overcome. The Data session report will focus on the mechanics of data sessions across various institutional or linguistic contexts. Some of the questions we will be discussing are the following: 


Data sessions all over the world foster communities of like-minded researchers and Data session report seeks to bring those communities together, creating room for discussions on how to make the most out of this amazing research practice. We invite you to share your data session experiences, tips or formats, or talk about specific challenges you have faced.

  • How many times do people play a recording?
  • How do they take turns?
  • How are people invited?
  • How open is a session and how are people who are not acculturated into data sessions encouraged to participate?
  • How do people prepare transcripts?
  • How do they deal with varied linguistic capacities in the room? 
  • How do presenters prepare transcripts for multi-lingual sessions? What kind of transcription conventions do they use? Are 3 line transcripts always effective? Etc.

Please email your contribution to

Among all the advantages of data sharing, what I personally appreciate the most about data sessions is the wide range of participants that will sit at the table ready to exchange ideas. You will always find people with different levels of expertise, scholars with various research interests, or even researchers working on a given phenomenon in another language. The ‘magic’ of a data session lies at the shared goal everyone entering the room has; to discuss, brainstorm, suggest and learn. Heterogeneity and collaboration can lead to engaging discussions, interesting ideas and – most importantly – great inspiration.

As a Greek CA researcher working with Greek data at a university in the UK, I have faced many challenges when preparing transcripts for data sessions. Translation can be tricky in many ways and decisions are always consequential. Importantly, CA’s emic approach foregrounds the dataset over researcher’s assumptions. This means that every detail in how people design their turn can be consequential for what is going on, but might also not be treated as such; it all depends on interlocutors’ orientation. This highlights the importance of turn design but also the risk for overinterpretation. When the language of analysis is the same as the language of the dataset, there is no particular problem; what has happened is already there, accurately represented thanks to the widely used Jeffersonian Transcription System. The researcher(s) can then focus on participants’ understanding, receipt of turns and negotiation of meaning, in order to make sense of what is going on in a given interaction. But what happens when the analyst faces the need to make this interaction available to a multi-lingual audience? Immediately, there appear three different analytical levels to be considered.

  • What a speaker is saying/doing and how they are saying/doing it
  • How the researcher presents what is being said/done, which is reflected in decisions regarding translation
  • What the researcher thinks is important to be translated and what they leave untranslated

The dilemma is then obvious. Should one stick to a speaker’s turn, trying to minimize their intervention and, consequently, overloading the extract with all the possible meta-linguistic comments regardless of their analytical significance? Or is it that they should pick up what they think is of importance and translate this instead? The first option entails the danger for the extract to become inaccessible and the data session to become unsuccessful. The second alternative risks to be misleading, driven by the analyst’s understanding or research interest, which again threatens the success of the session. 

The so called interlinear gloss text (IGT) attempts to give a solution to this problem by providing a literal, word-for-word translation of the source text, before the final translation in the target language. However, there is still a couple of challenges to be considered. First of all, how systematic can IGT be? Is it the case that we always use the same way of glossing regardless of our analytical interests? Secondly, is IGT able to give stylistic information, an unquestionably important part of turn design? 

I constantly face similar challenges every time I prepare a transcript for a non-Greek speaking audience. A typical case is the use of Greek pronouns. In my PhD I am looking at how Greek LGBT experience the intersectionality between their sexuality and (Greek) national belonging. I focus on identity construction in talk-in-interaction and, among other things, I look at instances of self-reference. Greek is a pro-drop, null-subject language, which means that the inflection of the verb already indexes information about the person. So the use of the pronoun (I, me, you etc.) in Greek is not necessary and when it occurs it does something (e.g. gives emphasis). Obviously, analysis of self-reference in Greek talk-in-interaction has to be sensitive in the occurrence of pronouns and the translation of the respective instances will take this information into consideration. In this case, my research questions are consequential for the way I choose to translate an extract for the data session, given that I always mark the occurrence of the pronoun with a [^me]/[^us] etc. before translating the verb itself. However, this piece of information doesn’t necessarily appear in other Greek IGTs, which raises questions about neutrality; by marking the pronoun, am I representing more accurately what is being done in the interaction? Or is it that I am switching the audience’s attention to my own interpretation and interests? And to reverse the question; if pronouns are not coded (e.g. in a study with a different analytical focus), is the final translation misleading, since it withholds information? Or is it effective, since it just focuses on what one needs to know to understand the extract and nothing more?

The following lines are drawn from a focus group I conducted for my PhD:

In the example presented, Urania states that she hasn’t understood a question I asked. Maria takes the floor in line 2, to give a potential interpretation (the question is probably about sexuality in the Greek context). Although the 1st person singular is indexed in the verb’s inflection (‘pistevo’), she also uses the pronoun (‘ego’), probably taking distance from Urania’s lack of understanding. This means that I definitely need to mark the occurrence of the pronoun when translating the extract. However, similar constructions (e.g. ego pistevo [[^me] I believe] and pistevo [I believe], are commonly translated in the same way (e.g. I believe) in other situations.

Undoubtedly, it is impossible for all the grammatical and other linguistic information to be included in an IGT and, even if this were possible, the outcome would be a dense extract, inaccessible to a multi-lingual audience. For instance, if I decide to include comments on every particularity of Greek language, I would also need to analyse and account for all the Greek particles, some of which are not directly translatable in English. In the example above, Maria’s turn also includes two particles (‘ts’, ‘e’) which are left untranslated. Normally, ‘ts’ (click) introduces problematic turns and ‘e’ calls upon shared knowledge or sensibility. However, their meaning/use is contextually dependent so that analysis is needed every time they occur in a different environment. An accurate representation of Maria’s turn would need to include this type of information as well. But, I wonder, would an extract full of such particles and their respective explanation be still accessible to a multi-lingual audience for a discussion on self-reference? 

What I have learned so far is that preparing an extract for multi-lingual sessions always entails important decisions regarding IGT and translation, and that every decision intensifies researcher’s responsibility. In any case, honesty is the key to a successful session:

  • Give a short introduction to the linguistic particularities included in the transcription system
  • Open the floor for relevant questions before listening to the recording
  • Be ready to explain phenomena that appear in the extract even if they are not encoded.

I guarantee you’ll enjoy all the ideas suggested by your multi-lingual audience’s different perspective. 

Additional links: Saul Albert’s blogpost pulls together some of the benefits data sessions can offer to those participating, as well as some tips for everyone to get the most out of them.

0 0 vote
Article Rating
Notify of
1 Comment
Inline Feedbacks
View all comments
3 months ago

Since Italian is also a pro-drop language, I face the exact same problem with subject pronouns when I translate my data into English. Working with Italian video recorded data, I also have to deal with translating gestures accompanying talk: how can I transcribe the gesture without imposing my interpretation of it but, at the same time, conveying what seems to be the culturally shared meaning of the gesture?

If transcription is a form of translation itself, translating transcriptions is a very challenging, and therefore interesting, experience.