Blog post written by Anette Chelina Møller Petersen, Junior Researcher in ETHOS Lab (Spring semester, 2017).

As part of my Master in Digital Innovation and Management at ITU, I recently did a project on conversational alignment in human-computer interaction by studying how two-way interactions between people and the Google Home are taking shape in practice. My project was done as part of an elective during my third semester and as Junior Researcher in the ETHOS Lab at ITU. In this blog post, I will introduce the background, key findings, and limitations of my study, as well as key reflections I made during the process.


Google wants to be your conversational partner… but can it live up to its promise?

Voice recognition is taking off

Voice recognition has finally started to bear fruit! By identifying recordings of the human voice and converting them into a machine-readable format to recognize patterns and learn from them, the algorithms have become better at understanding people and providing answers to our commands [1]. When Google released its “Home” device in late 2016, it helped move these technologies into the home and changed the way we think about voice-controlled assistants.
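To make that recording-to-text step concrete, here is a minimal sketch in Python using the third-party speech_recognition library (which itself calls out to Google’s recognition service). The library and its calls are real, but the file name command.wav is just an illustrative placeholder, and none of this code was part of my study:

```python
import speech_recognition as sr

recognizer = sr.Recognizer()

# Load a recorded voice command from an audio file
# ("command.wav" is a hypothetical example file).
with sr.AudioFile("command.wav") as source:
    audio = recognizer.record(source)

# Send the audio to Google's speech recognition service,
# which returns its best-guess transcription as text.
try:
    text = recognizer.recognize_google(audio)
    print("Recognized command:", text)
except sr.UnknownValueError:
    # The service could not find a plausible transcription.
    print("Speech was unintelligible")
except sr.RequestError as error:
    print("Could not reach the recognition service:", error)
```

Recognizing the words is only the first half of the problem, though; as the rest of this post shows, responding to them in a conversation-like way is another matter.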

Google Home has been positively received by most. The media calls it “intelligent and attractive”, with a “great understanding” that can “answer almost any question” [2], and together with its competitor Amazon Echo, Google Home has been predicted to sell more than 24 million units combined in 2017 [3]. It is easy to assume that Google raises users’ expectations when it claims that the device is capable of “conversational actions” and can fulfill user requests through a “two-way dialogue” [4]. And it is not only Google’s and Amazon’s technologies that are said to be capable of natural conversation. Not long ago, researchers from Microsoft announced that their voice recognition software has become as accurate as human transcribers [5].
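To make the “two-way dialogue” claim concrete: in Google’s developer documentation [4], a Conversation Action signals whether it expects the user to answer back. The sketch below is my own simplified, illustrative construction of such a response payload in Python; the expectUserResponse flag is documented by Google, but the helper function and the pared-down field layout around it are assumptions for illustration, not Google’s actual API:

```python
import json

def conversation_reply(speech: str, keep_listening: bool) -> str:
    """Illustrative helper (not Google's API): wrap a spoken reply in a
    simplified conversation-style payload. 'expectUserResponse' is the
    documented flag that keeps the microphone open for a two-way dialogue;
    real Conversation API payloads carry considerably more structure."""
    payload = {
        "expectUserResponse": keep_listening,
        "prompt": {"textToSpeech": speech},  # field layout simplified/assumed
    }
    return json.dumps(payload)

# A follow-up question keeps the dialogue going...
print(conversation_reply("Which part of France interests you?", True))
# ...while a final answer closes it.
print(conversation_reply("Nice is popular in the summer.", False))
```

In other words, “dialogue” here is a turn-taking protocol the developer scripts in advance, which is worth keeping in mind when reading the marketing claims against my findings below.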

…but it’s not taking over

Despite widespread adoption and predictions of success for Google Home and other voice-controlled technologies, previous research [6] argues that these systems cannot yet have a natural conversation with people, partly due to a limited focus on understanding how users address the systems and on how people communicate. No matter how sophisticated they have become, they do not understand the actual meaning of a language [1]. Interaction is widely known to be a complex and tentative process, where people naturally align themselves to accommodate their conversational partner [7,1]. In line with findings from previous studies, my initial analysis showed me that to understand how Google is experienced as a “conversational” partner, we need first to understand the ways in which users align to the device.

“To understand how Google is experienced as a “conversational” partner, we need first to understand the ways in which users align to the device.”

How well do these voice-controlled technologies actually understand us on our terms, and how successful are they in responding to our commands and keeping an interaction going? Have we really come so far that we can have an actual conversation with them? If we believe the companies behind some of the recent technologies, Google Home is ready to talk!

Experiment: Use the help of Google Home to solve a task

Using a systematic approach to Grounded Theory (where theory is inductively derived from the phenomenon it represents and fits the empirically collected data), I set up four experiments to answer my research question and find out why and in what ways alignment occurs in an interaction with Google Home, and what it means for the user’s experience of the interaction. The experiments took place at the IT University of Copenhagen on the 5th of April 2017, and each experiment lasted 45–60 minutes.

The four participants who volunteered for the experiment were individually seated in a room with the Google Home device located on a table next to them. I provided them all with instructions and made them fully aware of the experimental setup and their role in it. This was important, as I wanted them to know what I was looking to achieve from the experiment. I then introduced them to three scenarios I had prepared, each including one task which they needed the help of Google Home to solve:


Question 1: “Imagine you want to take a trip to France, but you’re still not sure where in France you want to spend your holiday. Use Google Home to help you make the decision.”

Question 2: “Imagine you had a day off and wanted to spend your free-time exploring the city of Copenhagen. Use Google Home to help you plan out your day of activities.”

Question 3: “Imagine you have a friend coming over for dinner in a few hours and you need help to figure out what to cook. Use Google Home to plan your meal.”


The scenarios were introduced one by one, and after each introduction the actual experiment took place. It was completely up to the participants to choose what questions to ask of Google Home and what answers they needed from it. They were also in control of deciding when a task was complete or when they believed there was nothing more to gain from the interaction.

How can it help?

Once the experiments had finished, I asked the participants to evaluate their experience during an interview. This gave me a better understanding of, and familiarity with, their expectations of, experiences with, and reflections on their interaction with Google Home. Findings from my analysis showed that while Google Home is branded as a “conversational partner”, none of the participants experienced it as something they could have a conversation with.

From conversation to “Googling”

Another finding from my study showed how alignment is displayed and adjusted not only in response to an utterance, but also before an interaction takes place. The way in which participants approached Google Home was based on their expectations of what it is capable of, and the gap between their prior expectations and their actual experience was substantial. The participants rated their expectations up to 10 times higher than their actual experience. These high expectations lowered the initial degree of alignment and led the participants to try to do more with Google Home than what it proved to be capable of.

“The participants rated their expectations up to 10 times higher than their actual experiences.”

Among other things, and in line with the branding of the device, most of them thought Google Home could fulfill the needs of a dialogue (such as by remembering what was said earlier in the interaction). However, as soon as they found out that Google Home was not able to respond to their questions, they began adapting the way they spoke. For instance, one participant said that she “needed to formulate my questions like I would do a Google Search” and “that’s not how you would normally speak”. After some failed attempts at interacting with the device, something similar happened to another participant, who started to approach Google Home by asking it to “search” for information.

Ok, Google. Let’s have it your way!

As mentioned, alignment was required from the first time Google Home responded to a request. In many ways, this undermined the participants’ ability to have a “natural” conversation with the device. Among other things, I noticed how the participants readjusted to Google Home as they experienced difficulties, by using shorter sentences and more basic words. They were willing to go a long way to solve the tasks I gave them and achieve some kind of success in the interaction, by adjusting to the way Google Home spoke.

The constant battle of finding the “right” questions made them adapt to the device to the extent that they changed the subject to what Google Home “preferred” to talk about. In relation to this, one participant explained how she ended up doing research on Paris as part of task 1 (use Google Home to help you plan a trip to France) instead of where she actually wanted to go: “because that was the only thing she (Google Home) understood”.

Who to blame?

The participants’ desire to achieve interaction success raised the question of who is responsible when the interactions go wrong or off track. Based on the experiments, I found that a question only seemed to count as “right” when Google Home was able to match it with a satisfactory answer. Moreover, some of the participants blamed themselves for Google Home’s inability to understand them. One participant reflected on this by saying: “I think it’s my pronunciation, how I pronounce things or how I talk to it. I think you have to learn to talk like how you would write”, and another mentioned: “It’s weird. Sometimes it gets what you’re saying and sometimes it doesn’t. But maybe I’m not speaking very clear. You never know.” In many ways, the participants treated the unresponsiveness of Google Home as if it was evidence of trouble in their own performance. As a consequence, and as with all the other examples included in this blog post, it had an impact on the interaction, as their level of alignment increased.

Reflections

A significant finding from my study, which extended previous research, was the substantial difference between participants’ prior expectations of Google Home and their actual experience with the device. It was clear that the branding of the device, as well as the media and fictional movies, had a great influence on the participants’ view of voice-controlled technologies. For the same reason, they were all very disappointed with the interaction they had with Google Home and felt that they had to align with the device to a large degree. The unrealistic image that Google has gained from media coverage influenced the level of alignment and had a negative impact on the experience, as the gap between expectations and reality was so big. I therefore argue that the branding of Google Home as a “conversational partner” creates unrealistic expectations about its capability, as none of the participants experienced it as something they could have a conversation with.

“The branding of Google Home as a “conversational partner” creates unrealistic expectations about its capability.”

Google Home is trained on previously gathered data [1], and the more information it gets, the better it becomes at knowing you. And while I recognize that Google Home can provide a more personalized experience by being connected to other devices, it requires both willingness and trust for people to want to share their private information with Google. There are consequences involved in giving Google Home access to your personal calendar, contacts, private email and so on. For future research, it could therefore be interesting to ask how far we are willing to go in giving companies like Google access to our private information in order to personalize the experience we have with our voice-controlled devices.

Limitations

Although Google Home is capable of fulfilling the tasks created for this study, it can also be used for many other purposes. My study only examined Google Home’s capability when detached from the user, as I reset it to its default settings prior to the experiments and only configured its language and location. Because the device had no prior knowledge of the participants and could not perform any tasks that required it to be connected to other devices, the tasks and scenarios in my study were restricted and not necessarily representative of real-life use. Using its full capability might have a different impact on alignment and improve the experience of the interaction as seen from the perspective of the user.

The artificial setting of my experiments may also have altered the behaviors or responses of participants. Even though three out of four agreed that they would use Google Home for the same purposes, involving the same questions, as in my experiments, one of the participants expressed that he would only use Google Home for action-based tasks (such as adding appointments to a calendar, playing music or turning on the TV).

It can also be argued that the experiments were “pushing the limits” in terms of what Google Home was supposed to do or what the participants expected it to be able to do, and that they were clearly influenced by my own assumptions about what I expected it to be able to do. However, as this research attempted to understand the experience of Google Home through the meanings that people assign to it in a given context, I believe that prejudgment is both an important and necessary part of our understanding.

There is still plenty of work to be done, and as our understanding of how users approach voice-controlled technologies increases, so will our ability to improve the systems and develop better devices.

 

I would like to thank all the participants who volunteered in my study, as well as my project supervisor and everyone at the ETHOS Lab, for sharing their thoughts and ideas and guiding me through the process.


[1] Turner, J. H. (2013). Theoretical Sociology: 1830 to the Present. SAGE Publications, Inc.; The Economist. (2017, January 05). Technology Quarterly: Finding a Voice. Retrieved January 25, 2017 from The Economist: http://www.economist.com/technology-quarterly/2017-05-01/language#section-2

[2] Gibbs, S. (2017, May 10). Google Home review: the smart speaker that answers almost any question. Retrieved May 12, 2017 from The Guardian: https://www.theguardian.com/technology/2017/may/10/google-home-smart-speaker-review-voice-controlled

[3] Marchick, A. (2017, January 15). The 2017 Voice Report by VoiceLabs. Retrieved May 10, 2017 from http://voicelabs.co/2017/01/15/the-2017-voice-report/

[4] Google (2017, January 27). Conversation Actions. Retrieved April 25, 2017 from Actions on Google: https://developers.google.com/actions/develop/conversation

[5] Xiong, W., Droppo, J., Huang, X., Seide, F., Seltzer, M., Stolcke, A., et al. (2016, October 17). Achieving Human Parity in Conversational Speech Recognition. Microsoft Research, 1-12.

[6] Koulouri, T., Lauria, S., & Macredie, R. D. (2016). Do (and Say) as I Say: Linguistic Adaptation in Human–Computer Dialogs. Human-Computer Interaction , 31, 59-95.

[7] Gregory, Jr., S. W. (1986). Social Psychological Implications of Voice Frequency Correlations: Analyzing Conversation Partner Adaptation by Computer. Social Psychology Quarterly , 49 (3), 237-246.