The bird in hand: Humanities research data in the age of Open Data

O'Donnell, Daniel Paul

Traditionally, humanities scholars have resisted describing their raw material as “data.”

Instead, they speak of “sources” and “readings.” “Primary sources” are the texts, objects, and artifacts they study; “secondary sources” are the works of other commentators used in their analyses; “readings” can be either the arguments that represent the end product of their research or the extracts and quotations they use for support.

This chapter explores the nature of Humanities Data, discussing how it is both similar to and different from scientific data.

The advent of the Digital Humanities allows for new types of research and improves the efficacy of some traditional approaches. But it also raises existential questions about longstanding practices. Traditionally, humanities researchers have tended to work with details from a limited corpus to make larger arguments: “close readings” of selected passages in a given text to produce larger interpretations of the work as a whole; or of passages from a few selected works to support arguments about larger events, movements or schools.

In the age of open data, it is tempting to see this as being, in essence, a small-sample analysis lacking in statistical power. But such data-centric criticism of traditional humanities arguments can be a form of category error. Humanities research is as a rule more about interpretation than solution. It is about why you understand something the way you do rather than why something is the way it is. It treats its sources as examples to support an argument rather than phenomena to be observed in the service of a solution.

The real challenge for the humanities in the age of digital open data is recognizing the value of both types of sources: the material we can now generate algorithmically at previously unimaginable scales and the continuing value of the exemplary source or passage. As the raw material of humanities research begins to acquire formal qualities associated with data in other fields, the danger is going to be that we forget that our research requires us to be sensitive to both object and observation, datum and captum, finch and note. In asking ourselves what we can do with a million books, we need to remember that we remain interested in the meaning of individual titles and passages.

