What is a data sprint?

An inquiry into data sprints in practice in Copenhagen

By Cæcilie Laursen

Do you know what a data sprint is? Venturini et al. describe data sprints as “intensive research and coding workshops where participants coming from different academic and non-academic background convene physically to work together on a set of data and research questions” (Venturini et al. forthcoming).

The term data sprint originates from the Digital Methods Initiative (DMI), which proposes data sprints as an alternative to Big Data that provides a different infrastructure and approach to data and research (Digital Methods Initiative 2015). While a quick Google search turns up only a few articles about the term,[i] it has recently been taken up across a number of initiatives in the Copenhagen research community, including the ETHOS Lab.

Having participated in a DMI summer school[ii] and in different data sprints hosted by ETHOS and other labs in Copenhagen, I was motivated to take a step back, approach the phenomenon empirically and investigate what a data sprint is in practice. Together with another ITU student, Benjamin Hervit, I have investigated how different labs in Copenhagen define and practice data sprints by interviewing directors/managers of four labs in Copenhagen that work with data and digital methods. The labs are ETHOS Lab, Tant-lab, HUMLab, and Digital Social Science Lab (DSSL)[iii], chosen because they are the labs in Copenhagen that occupy themselves with data, digital tools and digital methods, which was the decided scope of the research[iv]. The interviews[v] touched thematically upon the definition and purpose of a data sprint, the actors involved, and the structure of a data sprint.

To think with the tools of the labs, I have experimented with visualising the interview data using digital tools, and these visualisations support my analysis. In the following sections I compare the labs’ descriptions of data sprints, describe the course of a data sprint and the actors involved, and reflect on the implications of my findings.

Data sprint in practice

The visualisation shows that all four labs use the words collaboration and pedagogical/learning when they talk about data sprints. Other words that are shared by three of the labs are explorative, novelty (doing something new), and community building. The graph above reveals the specific words the different labs use when talking about data sprints. It gives an overview of how the labs are connected through their discourse about data sprints and how their definitions differ. It combines keywords the labs themselves identified as characteristic when defining data sprints with words identified during analysis of the interviews. From this I have made a network graph that shows the labs and the terms they connect to.[vi]

Based on my experience, I will argue that if we were to have a collective discussion with all the labs, they would agree on many more words. For instance, all the labs work with some kind of digital methods and digital tools during a data sprint. They investigate different kinds of data and they produce some kind of visualisation to illustrate findings. You could also argue that some of the words have similar meanings, for example issue focused and issue solving, or experimental and explorative. Even though the labs use different language when talking about data sprints, they might still share meanings. However, the visualisation still shows the initial focus and particular wordings of the labs when interviewed about data sprints.

So, what can we learn from the visualisation? And what is meant by the different words?

In the following sections I will elaborate on the definition and purpose of data sprints, the pace of a sprint and actors involved.

Data sprint defined

When asked to define data sprints, the labs reply:

“An intense period of working with data where you clean, explore, analyse, and visualise data. You could also add that you tell stories with data and that it is an iterative process.” (Michael Hockenhull, ETHOS Lab).

“A data sprint is people with the right competencies to do a data project who meet for a week to do things first and think about them later” (Anders Munk, Tant-lab).

“It is that you can work interdisciplinary and that you can learn something new; both technical but also about a subject that is not necessarily technical” (Lars Kjær, HUMLab)

“It is about creating an open space and a partial structure for a process of development, where you put some people within a framework and unleash the process and then see what happens” (Mads Korsgaard, DSSL)

The quotes vary in specificity and focus on very different aspects. ETHOS Lab’s is the most specific description, naming different phases or actions you take during a data sprint. HUMLab attaches importance to interdisciplinarity. DSSL focuses on creating a space that allows for exploration. Tant-lab’s definition also entails an explorative aspect, given its focus on doing instead of thinking, and it emphasises that participants need to have the right competencies.

A data sprint is an experiment where you explore and try new things, which means you might not end up with the outcome you expected. The course of a data sprint is described as a learning curve that gives you ideas for how you could do things differently. That data sprints have a pedagogical aspect is something all the interviews touch upon. Students and other participants will, for instance, learn a new digital tool or how to work with digital methods. “Data sprints in a learning perspective is a way of creating a new system where skills and competencies can travel” (Mads Korsgaard, DSSL). HUMLab expresses that for them data sprints are a way to introduce students from arts and humanities to digital methods and to teach people the value of working across disciplines. However, Tant-lab stresses that it is people with the right competencies who participate in a data sprint, which indicates some kind of limitation on who can participate.

Several of the interviewees mention that data sprints function as a community building exercise. It is a way of building a community around the labs and engaging people, both students and people outside the universities.

“a side effect that we have been aware of has been that people have learned by doing in terms of data analysis and methods and so on. It has actually been a pretty essential part of it, and a kind of community building exercise for us in the lab”. (Michael Hockenhull, ETHOS Lab).

“We have deliberately adopted data sprints as a way to create lab culture. … When you start a new lab it is a very convenient way to get people to commit for some time. It is a convenient way to engage users on the outside, which is part of the reason why we have a lab. It is to make techno-anthropology relevant for potential employers and society around us. It is also a way in which we can involve students. There is a pedagogical point to data sprints in the Techno-anthropology lab” (Anders Munk, Tant-lab).

Data sprints are a way to set up a framework within the lab that engages people and inspires a learning environment. The data sprint gets people together to collaborate on a topic, and in doing so they share competencies and learn from each other. It becomes a community for learning. At the same time, as the quote from Tant-lab implies, it is a way for a lab to establish itself in society by demonstrating skills.

This also seems to be a purpose of data sprints for HUMLab: “It can contribute to influencing the subject knowledge of arts and humanities. It can open people’s eyes to the possibilities that lie within digital tools” (Lars Kjær, HUMLab). HUMLab argues that the humanities are not being appreciated in the current labour market, and my interpretation is that they believe humanistic scholars with knowledge of digital methods would be in higher demand. The purpose is also to demonstrate the strength of combining humanistic and technical skills. DSSL and HUMLab share the terms interdisciplinary and new competencies, which in my opinion makes sense, since they are affiliated with social sciences and arts and humanities at KU. Data sprints are a way for their students to work across disciplines and to gain new competencies within digital methods. DSSL perceives data sprints as a way to overcome the silo structure of academia by encouraging interdisciplinarity, which DSSL believes the current structures of academia do not.

Do you sprint at a data sprint?

The word sprint calls to mind doing something fast and at a certain pace. The data sprint has arguably grown out of what you could call the maker space. It resembles hackathons, book sprints, and design sprints (Digital Methods Initiative 2015, Berry et al. 2015, Direkova et al. 2015). They are all short-form methods that force participants to work fast because of the limited timeframe, which is also the case for data sprints.

“Data sprints provide fertile ground for many things to grow in – that is perhaps more valuable than the final product. The fact you’re doing a final product is just the means to get you there. … Doing a final product is the motivation, that is what will make people work hard. That is what will make them sprint” (Anders Munk, Tant-lab)

Where the other short-form methods deliver a product in the end, the process is just as much the product of a data sprint as the end product, as described in the quote above. The limited timeframe means that the results of a data sprint might not be finished or well defined, but it is the process and learning outcome that matter more to the labs. The timespan of the data sprints in the four labs differs. The data sprints in Tant-lab last around five days. They have modelled their data sprints with inspiration from DMI and the Sciences Po médialab in Paris. ETHOS Lab has done data sprints that last one day for 7-8 hours and also ones that last for 3-4 days but are spread over several months, similar to the data sprint series they co-coordinate with HUMLab. The one DSSL has held lasted one day for 8 hours. Regardless of the timeframe, all the labs go through some similar phases, which are illustrated in the figure below.

There is a pre-phase prior to the data sprint where data are gathered and people are invited to participate. If the data sprint is in collaboration with an institution or company, the facilitators have an initial meeting to align expectations. The data sprint itself opens with a welcome presentation that introduces the issue in focus and the already gathered data, along with a brainstorm on other data sources, and group formation. Afterwards the ‘sprint’ begins and the participants work with and explore the data, create visualisations and tell stories with the data, as described in the definition quote from ETHOS Lab. Midway through the sprint you can have a meeting to get an overview of what people are working on, whether some of the groups have issues, and whether there is synergy between the groups. The sprint then continues until a set time limit marked by a final presentation, where people present the visualisations they have crafted and what they have learned, which is the product of the sprint. After the sprint there may be a phase where you work on publishing the results or evaluate the outcome of the sprint.

Tant-lab compares data sprints with the prevalent way of doing research: “you get things done in days that would normally take you four months or maybe even a year” (Anders Munk, Tant-lab). On the other hand, it is described as being harder to publish the research after a data sprint.

Data sprints foreground the data

“Data sprints foreground the data. …[they] are about the data and not about the hack. It is about exploring data, and about telling stories with data, and analysing data, and not about hacking your way to some instantaneous solution. […] Data sprints are not about fast solutions, but more [about] providing intense and rapid insights, explorations, and the ability to question things” (Michael Hockenhull, ETHOS Lab).

“Within academia almost all faculties are used to working with the concept of data because it is linked with empirical research, where the word hack can scare many people away. … I think the word data paves the way for a wider group of participants. … We use data everywhere so data sprints in a way indicate the distribution and perhaps also a sort of democratisation of these [digital] methods. … Conceptually, I believe data sprints as an expression is more inclusive compared to hackathons”. (Mads Korsgaard, DSSL)

The quotes describe the importance of data in data sprints. It would not be a data sprint without data, as data sprints foreground the data, and data has agency that allows for many stories to be told. DSSL also suggests that the focus on data makes a data sprint more inclusive than a hackathon, because data is familiar to most faculties, whereas hackathons usually involve hacking something, which requires some technical skills.

The data used at data sprints is typically digital. It can be digitised data such as text and pictures, social media data, data from other domains, data from companies or public institutions, and so forth. The data in HUMLab tends to be historical data that has been digitised, such as maps and photographs from the Danish West Indies, which relates to the lab’s affiliation with Arts and Humanities.

I will argue it is important to note that, whereas the epistemology of digital methods contributes new insights and ways of obtaining knowledge of societal trends, it also leaves other aspects invisible, in part because not all data is digital or digitised.

Data sprint competencies

The visualisation shows the different roles/skills the labs deem relevant during the course of a data sprint. It gives an estimate of how the labs weigh the importance of the different roles[vii].

People involved in data sprints have various competencies. Instead of merging, for instance, the role of data designer and the skill of design, I have kept the words expressed by the labs. Tant-lab has a clear idea of which roles need to be present during a data sprint, and they often invite people from the outside to fulfil the different roles. They for example source a data designer from outside of the lab. ETHOS Lab is primarily composed of students that either volunteer or take part in the community, and it is mostly students that fulfil the different roles. This means that the level of design and technical expertise depends on the skills of the students participating.

All the labs mention the role of an issue expert: a person with knowledge about the topic being investigated. Tant-lab describes why that is:

“Sometimes we make wild and crazy claims in digital methods, which needs to be checked and balanced against someone who is actually in that field. This is basic STS sensibility – that we should make knowledge claims in the presence of those we make knowledge claims about. I think that is a more robust way [of doing research]” (Anders Munk, Tant-lab).

An actor with knowledge of the phenomenon investigated can help to make sense of the findings in a data sprint and ground them. This also suggests that data sprints need thick data, to draw on the point from Wang (2013), in order to put the data in context and to enrich the data.

Concluding remarks and reflections

So, what is a data sprint? On the basis of my analysis, a possible answer is that a data sprint is an explorative and collaborative method involving heterogeneous actors, where people with different competencies try something new and work together towards a common goal for an intense period of time in order to create valuable insights with data. However, the purpose of this paper is not to prescribe definitively what a data sprint is or what it should be. On the contrary, the aim of this research is to provide insights into what a data sprint can be, and how it is practiced in digital methods labs in Copenhagen.
I wonder, what do data sprints offer that has caused the method to spread and that makes it attractive? The labs mention that the format provides an alternative learning space and that it is a way of building a community around the labs as well as gaining value in society. Is the data sprint practice then a response to a world where humanistic inquiry has been devalued? What role do the method and the digital tools play in the intensification and instrumentalisation of research and teaching? I recently discovered that Dagbladet Information has hosted data sprints. This indicates that the practice resonates both inside and outside of academia. Does the attraction lie in the blurring of the distinction between inside/outside academia?

The strength of the data sprint is that the method is not stable. It is experimental and can be used in various contexts. The data sprint is multiply produced because the method changes depending on the context and actors involved. Different skills, different tools, different data make up very different data sprints, but it is this constellation of heterogeneous actors that allows the experimentation to take place. In a sense the data sprint constitutes a new kind of laboratory for experimenting with data and digital tools. It is up to the researchers organising a data sprint to define how they will do it, what they will investigate and what the purpose and outcome will be. Something to think about would then be, who is the data sprint for and who is invited to participate?[viii]

In ETHOS Lab data sprints have often been a way of getting people together to have fun and experiment with data, but sometimes a sprint evolves into something more and becomes more research oriented. Then perhaps, when we engage with the method, some ethical reflections are called for. Law and Urry (2005) argue that methods and practices are performative: “… they have effects; they make differences; they enact realities; and they can help to bring into being what they also discover” (Law and Urry 2005: 393). This suggests that performing a data sprint enacts the method, while simultaneously producing realities about the issue investigated. It leaves us with the question: what realities do we want to help make?
In ETHOS Lab we plan to unpack this question and keep tinkering with the data sprint as a method.

I wish to extend a huge thanks to Anders Munk, Lars Kjær, Mads Korsgaard and Michael Hockenhull for taking the time to talk with Benjamin and me about their views on data sprints.
This paper is the product of a Junior Research project in ETHOS Lab. It is an edited version of an exam submission in the course ‘Innovation and Technology in Society’ on the MSc programme in Digital Innovation and Management.

Notes

[i] A quick Google search shows that not many articles have been written about the topic. I have been able to locate three articles on data sprints, two of which will be officially published in 2017. The articles are:
Berry, D. M., Borra, E., Helmond, A., Plantin, J. C., & Walker Rettberg, J. (2015). The Data Sprint Approach: Exploring the field of Digital Humanities through Amazon’s Application Programming Interface. Digital Humanities Quarterly, 9(3).
Venturini, T., Munk, A., & Meunier, A. (forthcoming). Data-Sprint: a Public Approach to Digital Research. (C. Lury, P. Clough, M. Michael, R. Fensham, S. Lammes, A. Last, & E. Uprichard, Eds.) Interdisciplinary Research Methods (forthcoming).
Munk, A. K., Meunier, A., & Venturini, T. (forthcoming). Data Sprints: A Collaborative Format in Digital Controversy Mapping. Revised version will appear in the Digital STS Handbook, Princeton University Press, 2017.
The three articles identify the Digital Methods Initiative (DMI) and its summer and winter schools as the origin of the data sprint. The DMI is an internet studies research group affiliated with the University of Amsterdam. The data sprint method appears to have spread in the academic environment, and it has also reached Denmark. Whereas the efforts within digital methods appear to have centred around DMI in Amsterdam and the Sciences Po médialab in Paris, in Denmark, or at least in Copenhagen, the work seems to be spread out across a network of labs.

[ii] Every year DMI hosts a summer and winter school, where students participate in data sprints and learn to do research with digital methods (Rogers 2016).

[iii] ETHOS Lab is located at the IT University. Tant-lab is short for Techno-anthropology lab, and the lab is connected to the MSc in Techno-anthropology at Aalborg University in Copenhagen. HUMLab is connected to the Faculty Library of Humanities. Digital Social Science Lab (DSSL) is connected to the Faculty Library of Social Sciences. I interviewed Michael Hockenhull, former Lab Manager in ETHOS Lab, and Anders Munk, Lab Director in Tant-lab. Benjamin interviewed Lars Kjær from HUMLab and Mads Korsgaard, Lab Director of DSSL. To avoid confusion, I will refer to the labs and not the lab directors when I include quotations. Tant-lab and ETHOS Lab opened in 2015 and HUMLab and DSSL in 2016. Both ETHOS Lab and Tant-lab have hosted several data sprints, and Anders Munk from Tant-lab is the co-author of two of the articles about data sprints (see Venturini et al. forthcoming, and Munk et al. forthcoming). DSSL has hosted one data sprint and HUMLab has hosted a series of three data sprints called ‘Representing History through Data’ in collaboration with ETHOS and the Royal Library.

[iv] After performing our interviews it was brought to our attention that a data lab is opening at KUB Nord. However, to our knowledge they have not been involved in any data sprints (http://kub.kb.dk/datalab). I am also aware that similar labs exist in other cities in Denmark, e.g. DIGHUMLab in Århus (http://dighumlab.com/).

[v] The empirical data for this paper is primarily generated by semi-structured interviews. We created an interview guide prior to the interviews, but the semi-structured nature of the interviews allowed the questions to vary a bit depending on the interviewer and interviewee. I have adopted an exploratory approach and coded the interviews with inspiration from grounded theory, first developed by Glaser and Strauss in 1967 (Bryman 2012). From the coded interviews I have made visualisations to supplement the analysis and to explore the data from the interviews in a different way.

[vi] The graph file is made using Table2Net and is rendered in the open-source visualisation tool Gephi. It is spatialised using the layout algorithm ForceAtlas2 (see Jacomy et al. 2012) along with the Label Adjust algorithm, to keep the labels from overlapping. The nodes are sized according to degree (the number of edges). I have coloured the labs and the connected nodes in individual colours, thus the terms connected with more than one lab will be a mixture of those colours.
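The structure behind such a lab-term graph can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the Table2Net/Gephi workflow itself: the edge list contains only the lab-term links that the text states explicitly, the label "learning" stands in for pedagogical/learning, and the function names and the Source/Target column headers are hypothetical choices for the sketch.

```python
import csv
import io
from collections import defaultdict

# Lab-term pairs stated explicitly in the text. The full interview coding
# behind the original graph is NOT reproduced here (illustrative subset).
LABS = ["ETHOS Lab", "Tant-lab", "HUMLab", "DSSL"]

# All four labs use "collaboration" and "pedagogical/learning".
edges = [(lab, term) for lab in LABS for term in ("collaboration", "learning")]
# DSSL and HUMLab share "interdisciplinary" and "new competencies".
edges += [(lab, term)
          for lab in ("HUMLab", "DSSL")
          for term in ("interdisciplinary", "new competencies")]


def degrees(edge_list):
    """Count the edges touching each node (used for node size in the graph)."""
    deg = defaultdict(int)
    for lab, term in edge_list:
        deg[lab] += 1
        deg[term] += 1
    return dict(deg)


def shared_terms(edge_list, min_labs=2):
    """Return the terms connected to at least `min_labs` labs."""
    labs_by_term = defaultdict(set)
    for lab, term in edge_list:
        labs_by_term[term].add(lab)
    return {term: sorted(labs)
            for term, labs in labs_by_term.items()
            if len(labs) >= min_labs}


def edge_csv(edge_list):
    """Serialise the edges as a two-column CSV edge list (hypothetical headers)."""
    buffer = io.StringIO()
    writer = csv.writer(buffer)
    writer.writerow(["Source", "Target"])
    writer.writerows(edge_list)
    return buffer.getvalue()


if __name__ == "__main__":
    for node, degree in sorted(degrees(edges).items(), key=lambda item: -item[1]):
        print(f"{node}: {degree}")
    print(edge_csv(edges))
```

Terms shared by three labs (explorative, novelty, community building) are left out on purpose, because the text does not say which three labs use them.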

[vii] The visualisation is created with Raw. On the left are the names of the labs, each with a unique colour. The lines connect the labs with the roles/skills they mention in the interviews. The thickness of each line is an estimate, on the basis of the interviews, of how important the lab deems that role (the thicker the line, the higher the importance).

[viii] People outside of academia are often invited to partake in data sprints, both to learn the value of data sprints and digital methods, and also to raise an issue they need help with and to provide extended knowledge about that topic. Working on an issue raised by people outside of academia is arguably a way of producing knowledge that people can use. It can be a way of giving them the tools that enable them to have a voice and raise their concerns, which they might not have had before, because they did not have the data or did not know how to use the data. The data sprint methodology has the potential for the more normative-activist stance that Woodhouse et al. (2002) argue for. Some of the data sprints in the Copenhagen-based labs have engaged with societal issues such as the student housing situation and obesity. Nonetheless, assisting often forgotten voices (Woodhouse et al. 2002: 309) and working on more activist topics are not always the agenda of a data sprint. Participants can also be a company interested in crunching data in order to improve a business aspect. This would be to work on behalf of those already privileged (Woodhouse et al. 2002).

References

Berry, D. M., Borra, E., Helmond, A., Plantin, J. C., & Walker Rettberg, J. (2015): “The Data Sprint Approach: Exploring the field of Digital Humanities through Amazon’s Application Programming Interface” in Digital Humanities Quarterly, 9(3).

Bryman, Alan (2012): “Qualitative data analysis”, in Social Research Methods, 4th edition, Oxford University Press, Oxford; New York

Digital Methods Initiative (2015): “Data Sprint: The New Logistics of Short-form Method”, accessed 06/12/2016, https://wiki.digitalmethods.net/Dmi/WinterSchool2013 , Revision 17 Dec 2015 by UnknownUser

Direkova, Nadya and ‘The Google sprint Masters’ (2015): “Design sprint methods”, Accessed 06/12/2016, https://developers.google.com/design-sprint/downloads/DesignSprintMethods.pdf , Mountain View, March 2015

Jacomy, M., Heymann, S., Venturini, T., Bastian, M. (2012): “ForceAtlas2, A Continuous Graph Layout Algorithm for Handy Network Visualization”, DRAFT, Accessed 16/01/2017 http://medialab.sciences-po.fr/publications/Jacomy_Heymann_Venturini-Force_Atlas2.pdf

Law, John and Urry, John (2005): “Enacting the Social”, Economy and Society, 33(3): 390-410

Munk, Anders K., Meunier, Axel, Venturini, Tommaso (forthcoming): “Data Sprints: A Collaborative Format in Digital Controversy Mapping”. Revised version will appear in the Digital STS Handbook, Princeton University Press, 2017

Rogers, Richard (2016): “The Digital Methods Initiative – About Us”, accessed 14/12/2016, https://wiki.digitalmethods.net/Dmi/DmiAbout

Venturini, T., Munk, A., & Meunier, A. (forthcoming): “Data-Sprint: a Public Approach to Digital Research” (C. Lury, P. Clough, M. Michael, R. Fensham, S. Lammes, A. Last, & E. Uprichard, Eds.) Interdisciplinary Research Methods (forthcoming)

Wang, Tricia (2013): “Big Data needs thick data”, Ethnography Matters – May 13. 2013, http://ethnographymatters.net/blog/2013/05/13/big-data-needs-thick-data/

Woodhouse, E., Hess, D., Breyman, S., and Martin, B. (2002): “Science Studies and Activism: Possibilities and Problems for Reconstructivist Agendas” in Social Studies of Science 32 (2): 297-319.
