Keeping computation open for interpretation
Ethnographers, step right in, please.
When is data science?
We recently held a workshop at ETHOS Lab and the Data as Relation project at ITU Copenhagen, as part of Stuart Geiger’s seminar talk on “Computational Ethnography and the Ethnography of Computation: The Case for Context” on the 26th of March 2018. Tapping into his valuable experience and his position as a staff ethnographer at the Berkeley Institute for Data Science, we wanted to think together about the role that computational methods could play in ethnographic and interpretivist research. Over the past decade, computational methods have exploded in popularity across academia, including in the humanities and interpretive social sciences. Stuart’s talk made an argument for a broad, collaborative, and pluralistic approach to the intersection of computation and ethnography, arguing that ethnography has many roles to play in what is often called “data science.”
Based on Stuart’s talk the previous day, we began the workshop with three different distinctions about how ethnographers can work with computation and computational data: First, the “ethnography of computation” is using traditional qualitative methods to study the social, organizational, and epistemic life of computation in a particular context: how do people build, produce, work with, and relate to systems of computation in their everyday life and work? Ethnographers have been doing such ethnographies of computation for some time, and many frameworks — from actor-network theory (Callon 1986, Law 1992) to “technography” (Jansen and Vellema 2011, Bucher 2012) — have been useful to think about how to put computation at the center of these research projects.
Second, “computational ethnography” involves extending the traditional qualitative toolkit of methods to include the computational analysis of data from a fieldsite, particularly when working with trace or archival data that ethnographers have not generated themselves. Computational ethnography is not replacing methods like interviews and participant-observation with such methods, but supplementing them. Frameworks like “trace ethnography” (Geiger and Ribes 2010) and “computational grounded theory” (Nelson 2017) have been useful ways of thinking about how to integrate these new methods alongside traditional qualitative methods, while upholding the particular epistemological commitments that make ethnography a rich, holistic, situated, iterative, and inductive method. Stuart walked through a few Jupyter notebooks from a recent paper (Geiger and Halfaker, 2017) in which they replicated and extended a previously published study about bots in Wikipedia. In this project, they found computational methods quite useful in identifying cases for qualitative inquiry, and they also used ethnographic methods to inform a set of computational analyses in ways that were more specific to Wikipedians’ local understandings of conflict and cooperation than previous research.
Finally, the “computation of ethnography” (thanks to Mace for this phrasing) involves applying computational methods to the qualitative data that ethnographers generate themselves, like interview transcripts or typed fieldnotes. Qualitative researchers have long used software tools like NVivo, ATLAS.ti, or MAXQDA to assist in the storage and analysis of data, but what are the possibilities and pitfalls of storing and analyzing our qualitative data in various computational ways? Even ethnographers who use more standard word processing tools like Google Docs or Scrivener for fieldnotes and interviews can use computational methods to organize, index, tag, annotate, aggregate, and analyze their data. From topic modeling of text data to semantic tagging of concepts to network analyses of people and objects mentioned, there are many possibilities. As multi-sited and collaborative ethnography are also growing, what tools let us collect, store, and analyze data from multiple ethnographers around the world? Finally, how should ethnographers deal with the documents and software code that circulate in their fieldsites, which often need to be linked to their interviews, fieldnotes, memos, and manuscripts?
These are not hard-and-fast distinctions, but instead should be seen as sensitizing concepts that draw our attention to different aspects of the computation / ethnography intersection. In many cases, we spoke about doing all three (or wanting to do all three) in our own projects. Like all definitions, they blur as we look closer at them, but this does not mean we should abandon the distinctions. For example, computation of ethnography can also strongly overlap with computational ethnography, particularly when thinking about how to analyze unstructured qualitative data, as in Nelson’s computational grounded theory. Yet it was productive to have different terms to refer to particular scopings: our discussion of using topic modeling of interview transcripts to help identify common themes was different from our discussion of analyzing activity logs to see how prevalent a particular phenomenon is, which was in turn different from our discussion of a situated investigation of the invisible work of code and data maintenance.
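To make the “computation of ethnography” a little more concrete, here is a minimal sketch of what indexing tagged fieldnotes and counting co-mentions of people might look like. It is only an illustration under our own assumptions: the notes, the names, the hashtag convention, and the function names `build_tag_index` and `count_co_mentions` are all made up for this example and do not come from any of our fieldsites or from any particular tool.

```python
from collections import Counter, defaultdict
from itertools import combinations
import re

# Made-up fieldnote snippets, for illustration only (not real data).
FIELDNOTES = {
    "note-01": "Met #dataquality team. Anna and Bo discussed the dashboard.",
    "note-02": "Bo showed Carla the dashboard logs. #maintenance #dataquality",
    "note-03": "Anna and Carla talked about #maintenance of the archive.",
}

# Hypothetical list of names we want to track across notes.
PEOPLE = ["Anna", "Bo", "Carla"]

# Assumed convention: the ethnographer marks themes inline as #hashtags.
TAG_PATTERN = re.compile(r"#(\w+)")


def build_tag_index(notes):
    """Map each hashtag to the sorted list of note ids that contain it."""
    index = defaultdict(set)
    for note_id, text in notes.items():
        for tag in TAG_PATTERN.findall(text):
            index[tag.lower()].add(note_id)
    return {tag: sorted(ids) for tag, ids in index.items()}


def count_co_mentions(notes, people):
    """Count how often each pair of people is mentioned in the same note."""
    pairs = Counter()
    for text in notes.values():
        words = set(re.findall(r"\w+", text))
        present = sorted(p for p in people if p in words)
        pairs.update(combinations(present, 2))
    return pairs


if __name__ == "__main__":
    for tag, ids in sorted(build_tag_index(FIELDNOTES).items()):
        print(tag, ids)
    for pair, n in count_co_mentions(FIELDNOTES, PEOPLE).most_common():
        print(pair, n)
```

Even a toy like this shows where interpretation enters: someone has to decide what counts as a tag, whose names go on the list, and whether appearing in the same note means anything at all.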
We then worked through these issues in the specific context of two cases from ETHOS Lab and the Data as Relation project, where Bastian and Michael are both studying public sector organizations in Denmark that work with vast quantities and qualities of data and are often seeking to become more “data-driven.” In the Danish tax administration (SKAT) and the Municipality of Copenhagen’s Department of Cultural and Recreational Activities, there are many projects that are attempting to leverage data further in various ways. For Michael, the challenge is to be able to trace how method assemblages and sociotechnical imaginaries of data travel from private organisations and sites to public organisations, and how they influence the way data is worked with and what possibilities data are associated with. Michael, who is doing participant-observation, suggested that a “computation of ethnography” approach might make it easier to trace connections between disparate sites and actors.
The ethnographer enters the perfect information organization
In one group, we explored the idea of the Perfect Information Organisation, or PIO, in which there are traces available of all workplace activity. This nightmarish panopticon construction would include video and audio surveillance of every meeting and interaction, detailed traces of every activity online, and detailed minutes on meetings and decisions. All of this would be available for the ethnographer, as she went about her work.
The PIO is of course a thought experiment designed to provoke the common desire or fantasy for more data. This is something we all often feel in our fieldwork, but we felt this raised many implicit risks if one combined and extended the three types of ethnography detailed earlier on. By thinking about the PIO, ludicrous though it might be, we would challenge ourselves to look at what sort of questions we could and should ask in such a situation. We came up with the following questions, although there are bound to be many more:
- What do members know about the data being collected?
- Does it change their behaviour?
- What takes place outside of the “surveilled” space? I.e. what happens at the bar after work?
- What spills out of the organisation, like when members of the organization visit other sites as part of their work?
- How can such a system be slowed down and/or “disconcerted” (a concept from Helen Verran that we have found useful in thinking about data in context)?
- How can such a system even exist as an assemblage of many surveillance technologies, and would not the weight of the labour sustaining it outstrip its ability to function?
What the list shows is that although the PIO may come off as a wet-dream of the data-obsessed or fetishistic researcher, even it has limits as a hypothetical thought experiment. Information is always situated in a context, often defined in relation to where and what information is not available. Yet as we often see in our own fieldwork (and constantly in the public sphere), the fantasies of total or perfect information persist for powerful reasons. Our suggestion was that such a thought experiment would be a good initial exercise for the researcher about to embark on a mixed-methods/ANT/trace ethnography inspired research approach in a site heavily infused with many data sources. Deciding what topics and questions to pursue in ethnography is always as difficult as deciding what kind of data to work with, even if we put computational methods and trace data aside. We brought up many tradeoffs in our own fieldwork, such as when getting access to archival data means that the ethnographer is not spending as much time in interviews or participant observation.
This also touches on some of the central questions which the workshop provoked but didn’t answer: what is the phenomenon we are studying, in any given situation? Is it the social life in an organisation, that life as distributed over a platform and “real life” social interactions, or the platform’s affordances and traces themselves? While there is always a risk of making problematic methodological trade-offs in trying to get both digital and more classic ethnographic traces, there is also, perhaps, a methodological necessity in paying attention to the many different types of traces available when the phenomenon we are interested in takes place online, at the bar, and elsewhere. We concluded that ethnography’s intentionally iterative, inductive, and flexible approach to research applies to these methodological tradeoffs as well: as you get access to new data (either through traditional fieldwork or digitized data), ask what you are not focusing on as you see something new.
In the end, these reflections bear a distinct risk of indulging in fantasy: the belief that we can ever achieve a full view (the view from nowhere), or a holistic or even total view of social life in all its myriad forms, whether digital or analog. The principles of ethnography are most certainly not about exhausting the phenomenon, so we do well to remain wary of this fantasy. Today, ethnography is often theorized as documentation of an encounter between an ethnographer and people in a particular context, with the partial perspectives to be embraced. However, we do believe that it is productive to think through the PIO and to not write off in advance traces which do not correspond with an orthodox view of what ethnography might consider proper material or data.
The perfect total information ethnographers
In the second group, the conversation originated from the wish of an ethnographer to gain access to a document sharing platform used by the organization in which the ethnographer is doing fieldwork. Of course, it is not just one platform, but a loose collection of platforms in various stages of construction, adoption, and acceptance. As we know, ethnographers are careful not only about the wishes of others but also about their own wishes: how would it change their ethnography if they had access to countless internal documents, records, archives, and logs? So rather than “just doing (something)”, the ethnographer took a step back and became puzzled over wanting such a strange thing in the first place.
The imaginaries of access to data
In the group, we speculated about what would happen if the ethnographer got their wish to get access to as much data as possible from the field. Would a “Google Street View” recorded from head-mounted 360° cameras into the site be too much? Probably. On highly mediated sites (Wikipedia served as an example during the workshop), plenty of traces are publicly left by design. Such archival completeness is a property of some media in some organizations, but not others. In ethnographies of computation, the wish for total access brings some particular problems (or opportunities), as a plenitude of traces and documents are being shared on digital platforms. We talked about three potential problems, the first and most obvious being that the ethnographer drowns in the available data. A second problem is that the ethnographer may believe that getting more access will provide them with a more “whole” or full picture of the situation. The final problem we discussed was whether the ethnographer would end up replicating the problem of the people in the organization they are studying, namely working out how to deal with a multitude of heterogeneous data in their work.
Besides discussing these problems, we also asked why the ethnographer would want access to the many documents and traces in the first place. What ideas of ethnography and epistemology does such a desire imply? Would the ethnographer want to “power up” their analysis by mimicking the rhetoric of “the more data the better”? Would the ethnographer add their own data (in the form of field notes and pictures) and, through visualisations, show a different perspective on the situation? Even though we reject the notion of a panoptic view on various grounds, we are still left with the question of how much data we need or should want as ethnographers. Imagine that we are puzzled by a particular discussion: would we benefit from having access to a large pile of documents or logs that we could computationally search through for further information? Or would more traditional ethnographic methods like interviews actually be better for the goals of ethnography?
Bringing data home
“Bringing data home” is an idea and phrase that originates from the fieldsite and captures something about the intentions that are playing out. One must wonder what is implied by that idea, and what the idea does. A straightforward reading would be that it describes a strategic and managerial struggle to cut off a particular data intermediary (a middleman) and restore a more direct data-relationship between the agency and the actors using the data they provide. A product/design struggle, so to say. Pushing the speculations further, what might that homecoming, that completion of the re-redesign of data products, be like? As ethnographers, and participants in the events we write about, when do we say “come home, data”, or “go home, data”? What ethnography or computation will be left to do when data has arrived home? In all, we found a common theme in ethnographic fieldwork: that our own positionalities and situations often reflect those of the people in our fieldsites.
Concluding thoughts – why this was interesting/a good idea
It is interesting that our two groups did not explicitly coordinate our topics: we split up and independently arrived at very similar thought experiments and provocations. We reflected that this is likely because all of us attending the workshop were in similar kinds of situations, as we are all struggling with the dual problem of studying computation as an object and working with computation as a method. We found that these kinds of speculative thought experiments were useful in helping us define what we mean by ethnography. What are the principles, practices, and procedures that we mean when we use this term, as opposed to any number of others that we could also use to describe this kind of work? We did not want to do too much boundary work or policing of what is and isn’t “real” ethnography, but we did want to reflect on how our positionality as ethnographers is different from that of researchers in, say, digital humanities or computational social science.
We left with no single, simple answers but with more questions, as is probably appropriate. Where do contributions of ethnography of computation, computational ethnography, or computation of ethnography go in the future? Instead of answers, we offer a few next steps:
Of all the various fields and disciplines that have taken up ethnography in a computational context, what are their various theories, methods, approaches, commitments, and tools? For example, how is work that has more of a home in STS different from that in CSCW or anthropology? Should ethnographies of computation, computational ethnography, and computation of ethnography look the same across fields and disciplines, or different?
Of all the various ethnographies of computation taking place in different contexts, what are we finding about the ways in which people relate to computation? Ethnography is good at coming up with case studies, but we often struggle (or hesitate) to generalize across cases. Our workshop brought together a diverse group of people who were studying different kinds of topics, cases, sites, peoples, and doing so from different disciplines, methods, and epistemologies. Not everyone at the workshop primarily identified as an ethnographer, which was also productive. We found this mixed group was a great way to force us to make our assumptions explicit, in ways we often get away with when we work closer to home.
Of computational ethnography, can we propose some new, operationalizable mathematical approaches to working with trace data in context? How much should the analysis of trace data depend on the ethnographer’s personal intuition about how to collect and analyze data? How much should computational ethnography involve the integration of interviews and fieldnotes alongside computational analyses?
Of computation of ethnography, what does “tooling up” involve? What do our current tools do well, and what do we struggle to do with them? How do their affordances shape the expectations and epistemologies we have of ethnography? How can we decouple the interfaces from their data, such as exporting the back-end database used by a more standard QDA program and analyzing it programmatically using text analysis packages, and find useful cuts to intervene in, in an ethnographic fashion, without engineering everything from some set of first principles? What skills would be useful in doing so?
This text was co-authored the following day by Stuart Geiger (Ethnographer at Berkeley Institute for Data Science), Bastian Jørgensen (PhD fellow at Technologies in Practice, ITU), Michael Hockenhull (PhD fellow at Technologies in Practice, ITU), and Mace Ojala (Research Assistant at Technologies in Practice, ITU).