Preprint Open Access
This article presents a proposal for collecting data from textual resources in history and the social sciences. The data model and data-collection practice we propose are based on a detailed yet flexible semantic encoding of the original natural-language syntactic structure and wording: texts are translated line by line into structured data while preserving all of their vagaries, complexities, conflicting testimonies and the like. Our use case is the study of medieval Christian dissent and inquisition, based on heresy trial records. We propose a thorough way of modelling the sources so as to make them accessible to all manner of quantitative and computational analyses. We frame our approach as "serial and scalable reading". As a new variety of "serial history", it allows us to understand and model texts as never before, and helps bridge the gap between quantitative and qualitative research in consequential ways.
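The abstract describes the encoding only in general terms. As a minimal sketch, assuming a hypothetical schema (the `Statement` class, its field names, the `conflicting_claims` helper and the example records are all invented for illustration and are not the authors' data model), one line of a deposition could be encoded as a record that keeps the verbatim wording next to its structured form and stores denials and hedges instead of discarding them, so that conflicting testimonies remain side by side and queryable:

```python
from collections import Counter, defaultdict
from dataclasses import dataclass, field


@dataclass
class Statement:
    """One assertion encoded from a single line of a trial record.

    Hypothetical schema for illustration only; all field names are assumptions.
    """
    source_line: str     # locator of the line in the edition or manuscript
    original_text: str   # verbatim wording, kept next to the encoding
    speaker: str         # who makes the assertion (e.g. the deponent)
    subject: str         # agent of the reported act
    predicate: str       # normalised action label
    obj: str             # object or participant of the act
    negated: bool = False                           # denials are encoded, not dropped
    qualifiers: dict = field(default_factory=dict)  # place, date, hedges such as "as I believe"


def conflicting_claims(statements):
    """Map each (subject, predicate, obj) claim that some speakers affirm and
    others deny to (sorted affirming speakers, sorted denying speakers)."""
    stance = defaultdict(lambda: {True: set(), False: set()})
    for s in statements:
        # True bucket = affirmed, False bucket = denied
        stance[(s.subject, s.predicate, s.obj)][not s.negated].add(s.speaker)
    return {
        claim: (sorted(sides[True]), sorted(sides[False]))
        for claim, sides in stance.items()
        if sides[True] and sides[False]
    }


# Invented example data: two deponents disagree about the same event.
records = [
    Statement("dep. 1, l. 3", "I saw X with the heretics in the house of Y",
              "Witness A", "X", "met", "heretics", qualifiers={"place": "house of Y"}),
    Statement("dep. 2, l. 7", "I never saw X with the heretics",
              "Witness B", "X", "met", "heretics", negated=True),
    Statement("dep. 2, l. 9", "I heard Z preach, as I believe",
              "Witness B", "Witness B", "heard_preach", "Z",
              qualifiers={"hedge": "as I believe"}),
]

conflicts = conflicting_claims(records)
# A serial view over the same records: how often each action is affirmed.
action_counts = Counter(s.predicate for s in records if not s.negated)
print(conflicts)      # {('X', 'met', 'heretics'): (['Witness A'], ['Witness B'])}
print(action_counts)  # Counter({'met': 1, 'heard_preach': 1})
```

Because affirmations and denials are stored together rather than reconciled at encoding time, the contradiction itself becomes data that a serial, quantitative query can count, while `original_text` and `source_line` keep every record traceable to the wording a close reader would check.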
Model the source first! Towards source modelling and source criticism 2.0