Mastering disaster: Semi-Automatic Interview Transcription

Written by Benedict Lang, MSc Responsibility in Science, Engineering and Technology (RESET), Lab Intern and Junior Researcher

The series Mastering Disaster is a research diary that accompanies the process of writing my master’s thesis. It contains rants, tools, experiences, and more, written in the hope that someone may find it inspiring or helpful.

Transcribing interviews is not the most beloved task among scientists. While some say that transcription is when they start to engage closely with their material, it remains a huge amount of work. For my master’s thesis project, I had a bit more than 5 hours of interviews in total to transcribe. Being part of ETHOS Lab and having access to the lab’s resources meant that I could use an online tool to semi-automate my transcription process.


What is the process?

1) Upload your audio and let the magic happen: After an easy signup process and the creation of a project that will later contain the transcripts, you can upload your audio or video file. After some magic, the tool provides a simple text editor in which the recognized text is displayed.

2) Check the automatic transcript: It is advisable to quickly skim the transcript to check that everything from beginning to end was uploaded and recognized properly by the tool.

3) Correct errors: Now comes the “semi” part of semi-automatic transcription. Double-clicking a single word in the text editor that holds your transcript lets you jump to that point in the interview. Pressing /ESC/ then plays the audio from that position, so you can correct words or parts of sentences that were not transcribed correctly.


How much time do I save?

The answer obviously depends very much on your situation. It depends on how fast you usually are at manual transcription, and it depends a lot on your audio files. The better the sound quality, the easier it is for the tool to transcribe things correctly in the first place. Also, if you use a lot of domain-specific words that are difficult for the AI to recognize, you may end up with many manual corrections. So some paragraphs may be completely right from the start, while others need so many corrections that it can be quicker to transcribe them manually.


What should I pay attention to?
  • Privacy: Make sure that there is no contradiction between the informed consent form that your participants signed and the privacy terms of the transcription provider. Also, delete your interviews from the server after the transcription process so that no leftovers of your data remain once the research project is finished.
  • Level of transcription: What do you want to transcribe? Do you only want to get the words right or do you also need to transcribe emotions and touches of laughter? AI is not so good at labeling ironic comments – at least not yet.
  • Make sure to save your progress: Although the online tool used in this process has an auto-save feature, it lost parts of my progress twice during the transcription. So save or export your work in addition to relying on the cloud provider’s auto-save function, as you might lose data if, for example, the internet connection is not 100% stable.
  • Think about transcription when you record your audio. You could, for example, ask your interviewee to speak as clearly as possible and not too fast, so that words do not overlap. But make sure that this does not keep the interviewee from behaving “naturally”, so you still get the data you would otherwise get in the interview.
  • Semi-Automatic is not Automatic: You will still need to put a considerable amount of effort into the transcription process.
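The advice about saving your progress can be partly automated on your own machine. The following is only a minimal sketch, assuming that you regularly export the transcript from the tool as a file; the function name backup_transcript and the folder name transcript_backups are made up for illustration and have nothing to do with the tool itself.

```python
import shutil
from datetime import datetime
from pathlib import Path


def backup_transcript(export_path, backup_dir="transcript_backups"):
    """Copy an exported transcript into backup_dir under a timestamped name.

    Hypothetical helper: it only copies a file that you have already
    exported by hand; it does not talk to the transcription tool.
    """
    source = Path(export_path)
    target_dir = Path(backup_dir)
    # Create the backup folder if it does not exist yet.
    target_dir.mkdir(parents=True, exist_ok=True)
    # A timestamp in the name keeps earlier backups from being overwritten
    # (as long as backups are made at least one second apart).
    stamp = datetime.now().strftime("%Y%m%d-%H%M%S")
    target = target_dir / f"{source.stem}_{stamp}{source.suffix}"
    # copy2 copies the content and tries to preserve file metadata.
    shutil.copy2(source, target)
    return target
```

After each export you would call, for example, backup_transcript("interview_01.txt") and get back the path of the new copy. Keep in mind that these local copies contain interview data too, so they fall under the same privacy considerations as the files on the server.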

Would you recommend it?

After transcribing one interview without any help and three interviews with the semi-automatic process, I would definitely recommend using a tool like this. Manually correcting a transcript lets you interact with the interview differently than typing out every sentence: because you need less capacity for writing up the text, you have more capacity to think about the content and can make sense of the interview in another way. And if you have interviews where the audio quality is high and the speaker is recognized properly by the automatic transcription, this really saves you valuable time within your research project.

The tool I used was You can try it out for free with a 30-minute file. Scrintal did not pay for this review.