Project deliverable Open Access

CLS INFRA D5.1. Review of the Data Landscape

Mrugalski, Michał; Odebrecht, Carolin; Charvat, Vera; Börner, Ingo; Durco, Matej

This landscape review focuses on intellectual access, i.e. on providing guidance for finding and sharing literary data. D6.1 approaches the task from a more technological side: it collects and analyzes literary corpora, available formats, tools, and metadata in order to create an exploratory catalogue or inventory of literary corpora and to provide a transformation matrix or toolbox for solving common issues. Because we coordinate our efforts, beginning with the compilation of the table of literary collections, the two deliverables can be regarded as two sides of the same coin. The review’s point of departure is the abundance of existing data and their heterogeneity as regards corpus design and underlying concepts: the definition of text (is it a source, an edition, a data set? see chapter 3); the purpose of a corpus (e.g. general, reference, or monitoring corpora, special purpose corpora; see chapter 4); and the central criteria for constructing a corpus (sampling, balancing, representativeness, annotation model(s), data format(s); see likewise chapter 4). We ask: How can we assist literary scholars in searching for and finding existing data that are relevant to their own research questions? How can they obtain data without transgressing ethical or legal boundaries (see chapter 5)? And what kinds of research questions are relevant in light of the present-day state of the data landscape and of literariness and textuality?

Files (5.1 MB)
Mrugalski et al. Review of the Data Landscape2.pdf (5.1 MB)