What a difference a dataset makes? Data journalism and/as data activism

How and when might data journalism be viewed as a form of “data activism”? Data activism has been conceptualised as a set of practices which “interrogate the fundamental paradigm shift brought about by datafication”, including through resisting surveillance and mobilising data to denounce injustice and advocate for change (Milan and van der Velden, 2016). In this chapter we examine how data journalist practices might serve not just to reinforce and reify dominant regimes of datafication, but also to interrogate them and make space for public involvement and intervention around data infrastructures. We examine three ways in which such practices may be construed as interventions around datafication: (1) assembling data publics; (2) making data differently; and (3) investigating datafication. Attending to such practices may contribute to the understanding of emerging sites, objects and action repertoires of data activism.


Introduction
How and when might data journalism be viewed as a form of "data activism"? Data activism has been conceptualised as a set of practices which "interrogate the fundamental paradigm shift brought about by datafication", including through resisting surveillance and mobilising data to denounce injustice and advocate for change (Milan and van der Velden, 2016). In this chapter we examine three ways in which data journalism can serve not just to reinforce and reify dominant regimes of datafication, or ways of rendering life into data (van Dijck, 2014), but also to interrogate them and make space for public involvement and intervention around data infrastructures.
Researchers and practitioners contend that the boundaries between journalism and activism may become porous, particularly when it comes to emerging digital technologies and media practices (Russell, 2017). A case in point is the Panama Papers, which has been described both as the "biggest leak in the history of data journalism" (Snowden, 2016), winning a top prize at the international "Data Journalism Awards", and as the "coming-of-age of leaktivism", an emerging form of "social protest" (White, 2016). This is not the first time that data journalism has been associated with mega-leaks. Wikileaks was said to play a key role in obtaining broader recognition for data journalism as an emerging class of news work (Rogers, 2011, 2012). Other studies indicate the entanglements between data journalism and fields such as civic hacking and open data advocacy (Baack, 2017).
Data journalism has been broadly defined as "journalism done with data", incorporating a variety of different practices (Gray et al., 2012). Researchers and practitioners alike have discussed the relationship between data journalism and the promotion of facts, scientific norms and cultures of objectivity (Bounegru, 2012; Anderson, 2015; Gray et al., 2016). Many data journalism projects may be argued to embody a form of what Desrosières calls "proof in use realism", or an attitude that "'reality' is nothing more than the database to which they [...]

Assembling data publics
[...] features of the Guardian Datablog was its prominent links to "download the data" or "get the data", along with questions such as "What can you do with it?" and a recurring open invitation to "Please post your visualisations and mash-ups on our Flickr group" (Guardian, 2011). Rather than focusing on a single story or visualisation, data is thus considered the basis for different ways of seeing and knowing which readers are encouraged to explore.
Databases are also viewed as journalistic outputs in themselves, which can be used not only to provide information, but also to mobilise action and facilitate interaction. ProPublica's "Dollars for Docs" project provides a database of payments from pharmaceutical companies to doctors and hospitals (ProPublica, 2016). The project invites visitors to print out and ask their doctors about payments they have received. In a similar vein, the ICIJ have released a dataset from the Panama Papers so that "regulators and ordinary citizens from around the globe [can] probe the newly available data and find new connections that may have escaped reporters" (ICIJ, 2016). Like the petition or the hashtag, databases can thus be considered as devices to assemble and mobilise publics around issues.
Databases can be used as crowdsourcing mechanisms to enrich existing data or generate new data. They may thus serve as devices to structure interaction and participation in specific ways in order to advance data journalistic reporting. The Guardian's MP's Expenses project encouraged readers to classify and comment on documents about UK parliamentary expense claims (Gray et al., 2012: 36, 137-139). La Nacion in Argentina used their VozData collaboration platform to review 40,000 leaked audio recordings related to the death of a government lawyer (La Nacion, 2017). In such cases interaction and participation are highly conventionalised in order to enable distributed collaboration at larger scales.

Data journalists also use various tools to open up their data work and collaborate with others. Many data journalists use the GitHub platform to develop, collaborate around and publish datasets, analysis and code associated with their work (Bounegru, 2015). An investigation from BuzzFeed News into racial divisions in St. Louis County, US, used GitHub to publish data as well as an open-source notebook showing the code for their analysis (BuzzFeed News, 2014).

Making data differently
When data journalists are not able to identify data around the issues and objects that they would like to report on, they may attempt to change how it is made or make it for themselves.
As alluded to above, one way of collecting data is through crowdsourcing mechanisms. When other avenues have failed, data journalists have used such mechanisms to request input from users, such as crowdsourcing data on water bills in France in order to investigate unfair pricing practices (Gray et al., 2012: 106-107).
Many data journalists have sought to create structured databases on the basis of official documents which can be scattered in different locations. A network of journalists associated with the FarmSubsidy project sought to create a single database of EU farm subsidy data by extracting information provided through freedom of information requests and PDF documents, so that they could look at how much large beneficiaries received across multiple countries (Gray et al., 2012: 121-122). A similar approach was used by the Financial Times to investigate the EU's structural funds (Gray et al., 2012: 64-66) as well as by The Bureau of Investigative Journalism to map and report on local funding cuts (Mair et al., 2017).
Data journalists may also assemble data themselves using their own methods, techniques and devices. In the "Migrants' Files" project a network of European journalists gathered data about deaths of migrants en route to Europe through a combination of Non-Governmental Organisation data, lists from journalists and media monitoring (Gray, Lämmerhirt, & Bounegru, 2016). The Guardian's "The Counted" project used a similar combination of sources and strategies, as well as reader submissions, social media monitoring, Google Alerts and community-building efforts in order to compile a database of police killings in the US (Gray, Lämmerhirt, & Bounegru, 2016). The Bureau of Investigative Journalism's Dying Homeless project sought to "count those that die homeless on UK streets", organising a collaborative investigation using a combination of online forms, chat channels and the #makethemcount hashtag on Twitter (TBIJ, 2018b). Beyond the screen, data journalists have also sought to gather structured data using sensors, drones and other devices (Pitt, 2014).
In many of these cases journalists were responding to the lack of data from other sources.
When infrastructures involved in the creation of data do not address or align with their interests or concerns, they may inventively repurpose information from other sources or create their own infrastructures for making data (Gray et al., 2018). This is not simply a case of "filling the gaps" in existing regimes of quantisation or datafication, but rather of rendering different aspects of collective life into data through fieldwork (whether through onsite reporting, sensors or drones), screen work or collaborations with others.
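To give a rough sense of what assembling a single database from scattered documents can involve in practice, the following is a minimal sketch in Python. It is not drawn from any of the projects discussed above: the two source extracts, their column names, the figures and the helper functions are all hypothetical, and real workflows would involve considerably more cleaning and verification. It shows two differently formatted extracts being normalised and combined so that totals per beneficiary can be compared across sources.

```python
import csv
import io
from collections import defaultdict

# Hypothetical extracts from two sources with different layouts.
# All names and amounts are invented for illustration.
SOURCE_A = """recipient,amount_eur
Example Farms Ltd,1200.50
Green Fields Co-op,300.00
"""

SOURCE_B = """Beneficiary;Total
example farms ltd ;800,25
Valley Dairy;150,00
"""


def normalise_name(name):
    """Collapse whitespace and ignore case so spelling variants match."""
    return " ".join(name.split()).casefold()


def read_source_a(text):
    """Comma-separated source using a decimal point."""
    for row in csv.DictReader(io.StringIO(text)):
        yield normalise_name(row["recipient"]), float(row["amount_eur"])


def read_source_b(text):
    """Semicolon-separated source using a decimal comma."""
    for row in csv.DictReader(io.StringIO(text), delimiter=";"):
        amount = float(row["Total"].replace(",", "."))
        yield normalise_name(row["Beneficiary"]), amount


def combine(*sources):
    """Sum the amounts per normalised beneficiary name across sources."""
    totals = defaultdict(float)
    for source in sources:
        for name, amount in source:
            totals[name] += amount
    return dict(totals)


totals = combine(read_source_a(SOURCE_A), read_source_b(SOURCE_B))
for name, amount in sorted(totals.items()):
    print(f"{name}: {amount:.2f}")
```

Matching beneficiaries on a normalised name is a deliberately crude choice here; decisions about when two records refer to the same entity are themselves part of how the resulting data is made.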

Investigating datafication
In some circumstances, data may become not just a "matter of fact", but a "matter of concern" for journalists (Latour, 2004), leading them to consider it an object of investigation: how and by whom it was generated, how it is used, what it shows and does not show, how it may be manipulated, and the different kinds of biases, inequalities and injustices that it may give rise to. As such journalists may not just take data for granted as "given", but also may consider its "scenography" (Latour, 2008), the conditions and settings in which it is created, used and shared.
Accepted manuscript version of: Gray, J. & Bounegru, L. (2019) "What a Difference a Dataset Makes? Data Journalism And/As Data Activism". In Data in Society: Challenging Statistics in an Age of Globalisation, J. Evans, S. Ruane and H. Southall (eds). Bristol: The Policy Press.
A huge amount of digital data is generated as a result of interactions with online platforms, apps and digital devices. While journalists can often tell stories with such digital data, more or less unproblematically, they may also tell stories about the production of digital data through investigations into platforms, algorithms and digital datafication. In ProPublica's "Machine Bias" series, reporters investigate "algorithmic injustice and the formulas that influence our lives", including price discrimination by online platforms and insurance companies; bias in criminal risk scores; platforms allowing advertisers to exclude users by race; and how artificial intelligence engines are trained to be racist (ProPublica, 2018).
Techniques for these investigations include scraping data from platforms, obtaining data from advertising programmes and comparing predictions to outcomes. Sam Lavigne's "Infinite Campaign" for The New Inquiry playfully explores "the bizarre rubrics Twitter uses to render its users legible" by scraping data from Twitter's ad creation page and creating "a taxonomy of human beings according to Twitter and its data brokers" (New Inquiry, 2017). This is then used as the basis for video sequences combining phrases from this taxonomy with stock footage to create an endlessly scrolling array of clips displaying "the fantasies by which Twitter understands us" (Figure 1).
In these cases data becomes problematised for journalists. Rather than being taken as a resource to be straightforwardly utilised, data becomes an object to be interrogated. As with the previous section on "making data differently", in these cases the social and technical conditions of creation become a "matter of concern". The aspects of data journalism discussed above can hence be mobilised in the service of what has been called "algorithmic accountability" (Diakopoulos, 2015), providing investigative reports on the operations of platforms, algorithms and other agents of digital datafication. While algorithmic accountability reporting often emphasises analytical operations and decision-making processes such as "prioritization, classification, association, and filtering" (Diakopoulos, 2015: 400), it is worth noting that journalists may also attend to ways in which data is shaped through platforms and infrastructures, which make these algorithmic operations possible (Gray et al., 2018).
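One of the techniques mentioned above, comparing predictions to outcomes, can be illustrated with a minimal sketch in Python. This is not a reconstruction of the method used in any of the investigations cited: the records, the group labels and the function name are hypothetical, and the measure chosen (the share of people flagged as high risk among those for whom the predicted outcome did not occur, computed per group) is only one of several ways such a comparison might be framed.

```python
from collections import defaultdict

# Hypothetical records: (group, predicted_high_risk, outcome_occurred).
# The values are invented purely for illustration.
RECORDS = [
    ("A", True, False),
    ("A", True, True),
    ("A", False, False),
    ("A", False, False),
    ("B", True, False),
    ("B", True, False),
    ("B", False, False),
    ("B", True, True),
]


def false_positive_rates(records):
    """Per group: flagged as high risk among those with no outcome."""
    flagged = defaultdict(int)
    negatives = defaultdict(int)
    for group, predicted, outcome in records:
        if not outcome:
            negatives[group] += 1
            if predicted:
                flagged[group] += 1
    return {group: flagged[group] / negatives[group] for group in negatives}


rates = false_positive_rates(RECORDS)
for group, rate in sorted(rates.items()):
    print(f"group {group}: false positive rate {rate:.2f}")
```

A gap between groups on a measure like this does not settle questions of bias by itself, since the choice of measure, the definition of the outcome and the way the underlying records were produced all shape the result.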

Conclusion
The cases we have examined suggest ways in which data journalists may facilitate broader public engagement and debate around datafication, rendering visible what is involved in making and using data. Although the question of when data journalism can fruitfully be considered as data activism requires situational analysis (looking at settings and relations, rather than features of projects), we hope to have illustrated how the study of data journalism practices may enrich and complement research on the repertoires of data activism. Attending to such practices may also inform experimentation around how data journalism projects may serve not just to reproduce and communicate established facts and dominant forms of datafication, but also to constitute a site of collective inquiry, intervention and involvement in cultures and politics of data.
Others use online services such as Google Docs and Sheets to [...] designers, innovators, auditors, witnesses and activists. In such cases datasets may not only provide factful representations of the world, but also facilitate the gathering of publics and the coordination and conventionalisation of data work, collective inquiry and other forms of social action.