Can Speech Recognition Contribute to Danish Municipalities?

This is a blog post by Silja Vase and Alona Vibe on their research on Speech Recognition Technology in the context of Danish municipalities. Alona and Silja are third semester DIM students and Junior Researchers at ETHOS Lab for Spring/Fall 2017.

Danish municipalities continue to implement speech recognition despite challenges that affect both employees and citizens, which raises concerns about the choice of software.

When we ask municipalities why they implement recently introduced software (new to public organizations, at least), the answer we often get is: ‘because we are a modern municipality.’ This leaves us wondering what ‘modern’ includes and how rational the decision-making behind software implementation is. Since the Agency for Digitization states that Danish municipalities are "first movers" in the use of speech recognition technology (SRT), one might understand ‘modern’ as something ‘new’, and perhaps not always pragmatic, or at least not paired with a mature technology. To understand this ‘first mover’ implementation of technology and the decision-making behind it, we use SRT as an example and study the process in a Danish Municipality that initiated a pilot project in fall 2016, using SRT as part of its daily workflow. In 2016, a study from Stanford University found SRT to be up to 3 times faster than typing, so it makes sense for the municipality to purchase the software in order to reduce the time spent on paperwork.

As Junior Researchers at ETHOS Lab at the IT University of Copenhagen, one might expect us to already know the basics of organizational change in relation to SRT or similar software. Unfortunately, this is not the case. This limited knowledge is what motivates the project, which aims to contribute findings that can lead to a more evidence-based implementation of SRT in the future. We focus on identifying and characterizing the social and technological challenges that appear in the process of implementing SRT in a Danish Municipality.

Technical Info-box

The Danish Municipality initiates the pilot project within the Center for Family, Social and Employment (CFSE) and estimates that the implementation process will run over a period of two years. The pilot project uses the Danish SRT software developed by Mirsk Technologies. At the beginning of the project, Mirsk assures us that SRT is working as intended, and that SRT can lower the number of hours needed for journals and reports by one hour a week for each employee. We will come back to the earned hour later in this blog post. First, we consider how SRT is intended to work in the Municipality.

SRT converts speech into text on a screen. An analog-to-digital converter on a digital device, such as a smartphone, digitizes the speech, and the data is then examined on a server through a model containing a language library tailored to CFSE. The library is based on thousands of files sent to Mirsk before the pilot project starts. The software determines what the user has said, and the data is sent back to the computer as text. Because the data is transmitted to and from the server, the quality of the connection affects the quality of the SRT output. The software does not rely on machine learning, which is why it can be used in sensitive cases. According to Mirsk, a new user's recognition rate is 95.9%, increasing to 98% after 6 months of training. Since the pilot project is expected to last two years, it should be possible to reach the highest rate.
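To get a feeling for what these recognition rates mean in practice, here is a small back-of-the-envelope sketch. It assumes that the rates quoted by Mirsk are per-word rates (the source does not say how they are measured) and it uses a hypothetical report length of 1,000 words, so it should be read as an illustration rather than a measurement.

```python
def expected_errors(words: int, recognition_rate: float) -> float:
    """Expected number of misrecognized words.

    Assumption: the recognition rate is a per-word rate.
    """
    return words * (1.0 - recognition_rate)


REPORT_WORDS = 1000  # hypothetical report length, not taken from the pilot project

# The two rates below are the ones quoted by Mirsk.
for label, rate in [("new user", 0.959), ("after 6 months", 0.98)]:
    errors = expected_errors(REPORT_WORDS, rate)
    print(f"{label}: about {errors:.0f} errors per {REPORT_WORDS} words")
```

Under these assumptions, even the highest quoted rate still leaves a number of errors in every report that someone has to find and correct by hand.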

So… How does it work?

We asked two mentors (the employees who meet with citizens and use SRT to write the reports on the development of their cases) how SRT has affected their daily workflow, and asked them to demonstrate how they use the technology. Although Mirsk states that the software is working as intended, it is hard to define what ‘intended’ encompasses once we realize that, according to the mentors, SRT challenges the daily workflow. It turns out that the recognition rate is poor, so writing reports with the software takes more time than typing. This takes valuable time away from the citizens instead of freeing up time from paperwork. Since the reason for the implementation is to lower the time spent on documentation, this causes frustration among the mentors in the municipality.
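The mentors' experience can be illustrated with a rough sketch of when dictation stops paying off. Only the 3x speed-up and the 95.9% rate come from the sources cited above. The typing speed, the time per correction, the report length and the "poor" recognition rate of 80% are hypothetical values that we picked for illustration, and the per-word error model is an assumption.

```python
def typing_minutes(words: int, typing_wpm: float) -> float:
    """Minutes needed to type a report at a given words-per-minute speed."""
    return words / typing_wpm


def dictation_minutes(words: int, typing_wpm: float, speedup: float,
                      recognition_rate: float, seconds_per_fix: float) -> float:
    """Minutes needed to dictate a report and then correct the errors.

    Assumption: every misrecognized word costs a fixed correction time.
    """
    speaking = words / (typing_wpm * speedup)
    fixing = words * (1.0 - recognition_rate) * seconds_per_fix / 60.0
    return speaking + fixing


WORDS = 1000           # hypothetical report length
TYPING_WPM = 40.0      # hypothetical typing speed
SPEEDUP = 3.0          # "up to 3 times faster than typing"
SECONDS_PER_FIX = 10.0 # hypothetical time to spot and correct one error

baseline = typing_minutes(WORDS, TYPING_WPM)
for rate in (0.959, 0.80):  # quoted rate for new users vs. a hypothetical poor rate
    total = dictation_minutes(WORDS, TYPING_WPM, SPEEDUP, rate, SECONDS_PER_FIX)
    verdict = "faster" if total < baseline else "slower"
    print(f"rate {rate:.1%}: {total:.1f} min dictating vs {baseline:.1f} min typing ({verdict})")
```

With these made-up numbers, dictation beats typing at the quoted rate but loses once the recognition rate drops far enough, which matches what the mentors describe.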

According to CFSE management, several factors could be causing this unfortunate start of the pilot project, such as the quality of the connection and the mentors' ability to dictate clearly. However, the mentors are not given time to learn the optimal use of SRT. This frustrates the employees and leads them either to spend their own time on it out of interest or to simply give up on the software because they see no purpose in it. The accuracy of the reports is crucial, since the reports determine the citizens’ status and future. We conclude that the current use of SRT lowers the accuracy of the reports, with harmful consequences for the care given. CFSE has numerous mentors who are excited about the possibility of decreasing the time spent on documentation and spending this precious time on the citizens. Throughout our study, however, we found a lack of common ground. Common ground is established through interaction, and the mentors are deprioritizing the software. Furthermore, the employees are not certain of the specific goal of SRT and fear that the time saved on documentation will lead to fewer employees.

A Common Ground

The use of SRT involves an interaction between human and machine, and both must understand the goal or share intentions in order to produce a result of a quality good enough to depend on. Professor Lucy A. Suchman explains this human-machine communication as actions that are situated in social and physical circumstances, where the situation is crucial to understanding the actions. Interpreting human-machine communication therefore means considering all the factors that shape expectations and thereby influence the understanding of common ground. In our case with the Municipality, the understanding is further affected because mentors are obliged to take part in a procedure for improving the software by reporting errors, without being informed about the different phases the procedure runs through. One example of such a phase is that a certain number of errors must be reported before they are corrected in the system. While this number is being collected, nothing changes in terms of improving the system. The lack of insight into this phase, combined with external technical influences, leads the mentors to experience the same errors repeatedly, which in turn creates resistance towards the use of SRT.

According to Professor Marc Berg, implementing IT in public organizations is a complex process that needs common ground among all parties contributing to it. Since CFSE struggles to make the software work as intended and is unclear about a common direction and goal for its use, the pilot project struggles to become a successful implementation of SRT.

When the time is right

Because of the strain on accuracy shown in the analysis, different success criteria and meaningful actions are applied. Perhaps the software is mature but challenged by the specific professional conditions within the municipality, which counteract a possible successful use of SRT. The challenges connected to time, seen as care, as well as the interconnection between time and accuracy, and therefore also between accuracy and care, form a continuous problem loop. Perhaps this is why the municipality decided to relaunch the software a few months after the pilot project was initiated and the study conducted. Perhaps the municipality was too eager to be ‘modern’.

PA Consulting Group points out that nearly a third of the Danish municipalities have experience with SRT, and estimates a great potential for mentors who visit citizens in their private homes as part of their work practice. By making SRT a part of the daily workflow within social practices in the public sphere, it could become a technological frame and an investment that frees up time and resources.

What we found most interesting was that, if SRT is successfully implemented, it could allow mentors to use the software while interacting with citizens, without disrupting their current work practices. If the mentors work with the software together with the citizens, it could create a better understanding of the language in the citizen reports, and the software could act as a medium that creates a new connection between citizens and government.

Bibliography:

Berg, Marc (2003) Health Information Management - Integrating information technology in healthcare work. London.

MIRSK (2016) Talegenkendelse – Brug, Introduktion til MIRSK Talegenkendelse Version 0.2, Slutbruger træning.

Suchman, Lucy A. (2007) Human-Machine Reconfigurations: Plans and Situated Actions, 2nd Edition. Cambridge University Press, UK.

PA Consulting Group (2014) Digitaliseringsstyrelsen - Forudsætninger og barrierer for effektiv anvendelse af talegenkendelse i kommunerne.
