Minna has carried out a project on the optimisation of data visualisations. Using questionnaires and interviews with a range of stakeholders, both laymen and experts, her research sheds light on differing views of data visualisation. Below, you can read about her choice of methods, the process of her work, and what she discovered during her research.
Making Data Visualizations Understandable for the Average Citizen
These days the word data is commonly used in everyday conversations throughout the Western world. “They collect my data”, “Do you have any more mobile data?”, “The terms and services are asking to collect your data, did you know that?” and “It’s based on data” are all sentences that you might hear during an average day. Most ordinary people feel they can use the word data and know what they are talking about, but is this really the case? Data are, after all, not just one thing. Data are everything and everywhere, which makes them extremely important to everyone. So how do we go about making this concept, and the content, the actual data, understandable for the average person? My motivation for asking this question comes from experiencing the countless computer programs, hardware and data visualizations created by those in the know: the software developers, data scientists and tech moguls who without a doubt have a very different set of preconditions for understanding and using their creations than the average person has. With data becoming an ever more essential part of our everyday lives, it also becomes important to make sure that people educated in data science are not the only ones who can understand, read and use the data. In order to investigate whether this is a real problem and, if so, how we can change it, I have conducted a study that draws on empirical research of a case as well as interviews with specialists within the field of data.
The case on which this research is built is a real-time data visualization of CO2 emissions in Europe called the electricity map. Climate change and environmental issues are subjects that rely on a lot of data in order to explain the situation to citizens all over the world. Furthermore, these causes seem to have had some success in getting their data into the mainstream media and gaining the attention of average people. Building on this, my motivation for choosing this data visualization as a case study was that most people would have some idea of what the subject was about, thereby ensuring that the only thing that might confuse them was the visualization of the data itself, which was to be the focus of this study.
In the following few pages I explain my methodological approach to examining how data visualizations become understandable or confusing for average people without prior knowledge of data science. I present my findings as well as what the process of doing this short piece of research has taught me. Finally, I suggest ideas for further investigation of this subject.
This study took a mixed-methods empirical approach to researching the relationship between data visualizations and the understanding of the data behind them. The methodology can be divided into two separate steps.
- Case study of the electricity map (source) using a quantitative survey
- Qualitative expert interviews
The case study was done in collaboration with another junior researcher at ETHOS Lab at the IT University of Copenhagen, Adam Pantkowski.
As an initial part of the study, the focus was on understanding the data visualization. This was done by spending time in the discussion forums of the developers of the electricity map to learn which problems they faced, how and why changes were made to the map, and what they perceived the purpose of the map to be.
The creator described the purpose of the data visualization as follows: “The idea was just to visualize some data that I found”. This initial understanding was used to create a quantitative survey. With the realization that the electricity map did not have a specific purpose beyond being a visualization, the survey questions were focused on the different physical elements of the visualization instead of on its purpose, as seen below:
- When first looking at the electricity map, did you understand what it was trying to display?
- On a scale from 1 to 5, how structured did you find the website displaying the map?
- Rate the elements of the website according to their relevance for your understanding of the electricity map
- What parts of the map do you think need improvement?
The survey was distributed by being embedded on the website displaying the electricity map, so that recipients would have seen the visualization before answering. As the website is not widely known, the link was shared in Facebook groups relevant to environmental issues and climate change as well as in various network groups of university students. Only 42 people answered the survey, even though it was open for submissions for two weeks. As a result, I chose to do further empirical research, focusing on qualitative interviews instead of the previous quantitative approach.
Three interviews were conducted as expert interviews, meaning that the participants had great knowledge of the interview subject. The interview questions were created on the basis of the answers from the survey, in which I found that the more elements were added to the visualization, the less understandable it became to people. With this in mind, the three experts were asked about the complexity of data and how to get around it. While two of the interviewees were experts within the field of data collection and storage, the last was an expert within the field of environmental issues.
All interviews were semi-structured, thereby allowing the conversation to change a bit depending on the answers from the participants.
In the following section I explain my findings from the above-mentioned interviews and survey.
The quantitative survey, conducted over two weeks, did not return the expected number of submissions; however, the answers were still used to identify patterns and raise further questions. The biggest finding was the inconsistency between the answers to two questions: an early question asking “When first looking at the electricity map, did you understand what it was trying to display?”, to which the majority answered yes (see figure 1), and the question asking participants to rate the elements of the visualization according to their relevance for understanding the map as a whole. In the latter question, the more elements the participants were made aware of, the more they tended to answer along the lines of “It’s hard to understand” (see figure 2), despite the positive answers to the earlier question. This might indicate that viewers of the electricity map have a perception of understanding what they are seeing, but that the more they are made aware of what the map actually shows, the less correct their original understanding turns out to be. This hypothesis, and the two survey questions leading to it, became the foundation for the expert interviews. In other words, the interview questions were designed to explore this inconsistency between the two parts of the survey further.
Figure 1 (Survey question asking whether or not the electricity map was understandable)
Figure 2 (Survey question asking to rate elements of the electricity map)
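The comparison between the two survey questions described above can be thought of as a simple cross-tabulation. The following is only a minimal sketch of how such a tabulation might look, not the analysis actually carried out in this study. The field names `understood_at_first` and `hard_elements`, the helper functions, and all the response values are hypothetical and invented purely for illustration; they are not the survey's real answers.

```python
from collections import Counter

# HYPOTHETICAL responses, invented purely to illustrate the tabulation.
# They are NOT the answers collected in the survey.
responses = [
    {"understood_at_first": "yes", "hard_elements": 3},
    {"understood_at_first": "yes", "hard_elements": 0},
    {"understood_at_first": "no", "hard_elements": 2},
    {"understood_at_first": "yes", "hard_elements": 1},
]


def cross_tabulate(rows):
    """Count respondents by (first-impression answer, rated any element as hard)."""
    table = Counter()
    for row in rows:
        found_something_hard = row["hard_elements"] > 0
        table[(row["understood_at_first"], found_something_hard)] += 1
    return table


def inconsistent_share(rows):
    """Share of 'yes' respondents who still rated at least one element as hard."""
    table = cross_tabulate(rows)
    said_yes = table[("yes", True)] + table[("yes", False)]
    return table[("yes", True)] / said_yes if said_yes else 0.0


table = cross_tabulate(responses)
print("Said yes, but found an element hard:", table[("yes", True)])
print("Share of yes-respondents:", round(inconsistent_share(responses), 2))
```

A tabulation along these lines would make the gap between perceived and actual understanding explicit per respondent, rather than only visible when the two figures are compared side by side.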
While the interviews were focused on highlighting why data visualizations are not always understandable, and how this can be improved, the results diverged from this and moved into other territories. In all, four major themes were present in the three interviews, as indicated here with quotes as examples.
Ethics:
Throughout the interviews, all three interviewees raised the question of whether or not it is ethical to share data with the public, or in other words, whether we should even try to make the data understandable using visualizations. One participant stated:
“We’re collecting all of this data for a reason, and some people agree and some don’t. Either way it doesn’t do anyone any good to be constantly aware of how much data is collected about them, which they will be if we for instance make visualizations showing everything. Imagine if this map [the electricity map] showed the exact city, no… Actually… the exact street where a lot of energy was being used and helping to turn the country into one of the energy eaters… I’m sure that is possible, but should anyone do that? I don’t think that would help anything except freaking people out. We have a responsibility when we understand this creepy thing called data, and not everyone needs to have this on their shoulders.” (Quote, interview with specialist in collecting public data)
Knowledge is power:
While the theme of ethics pointed to the experts thinking about the citizens and how data visualizations might not be in their best interest, this theme was different. The idea that knowledge is power came up during two interviews, where participants highlighted that if everyone had access to data, it would be useless for competitive purposes, whether between governments or between businesses. A database manager who works with sorting and storing large amounts of data stated:
“So if we make these visualizations for everyone to understand… why would we even collect data then? Let’s be honest… all this data is collected about us to know stuff that no one else does.” (Quote, interview with database manager)
Nobody knows enough yet:
This theme showed that two of the participants were not against easily understandable data visualizations but rather thought that it might be too early to know just how to make data completely understandable. An example from the interviews is this quote:
“It hasn’t been very long that we have been talking about data like this you know. And yes, people talk about it at the dinner table sometimes, but those of us who work with it still don’t know enough about how to use it and how not to use it. We shouldn’t make all data available to everybody before we understand what exactly that would mean.” (Quote, Interview with specialist in collecting public data)
Data visualizations are already great:
The last theme was only prominent in one of the interviews, but it shaped the whole conversation. This interviewee stated that there was no real need to improve data visualizations, because they are already understandable to most people:
“Obviously visualizations are all that stuff we just talked about… scary and creepy, but I don’t think they are hard to understand for most people. Look at this map [the electricity map]… Germany is black… everyone is going to understand that is bad. Really, it’s quite simple, and most visualizations like this are.” (Quote, interview with environmental specialist)
Overall, the interviewees were much more concerned with the question of whether or not we should make data visualizations understandable than with how we would do so.
Reflections and further research
When looking at the themes that arose during the three interviews, it is clear that the question I intended to answer in this research might not be the right one. This is not to say that the question is unimportant; however, it would seem that other questions need answering first, such as: should we even make data understandable to everyone? It is, however, important to keep in mind that this research used expert interviews, meaning that the interviewees’ opinions might be very different from those of an average citizen. As in most fields, experts tend to want to stay experts, and it could therefore be interesting to do the same type of research using another approach: interviewing people without existing knowledge of the world of data. As this research is still in its early stages, and the empirical data could still be analyzed in more depth, the next step might be to include further empirical data in order to take a comparative route. I would suggest that further research move away from the specifics of data visualizations and instead focus on how experts and ordinary people differ in their understandings of, and expectations for, the future of data. That way the research would gain insight into whether or not people actually feel a need for understandable data. I assumed this was the case, but looking at my empirical data, I hypothesize that this assumption might not be correct after all.