Published July 27, 2022 | Version v1
Project deliverable | Open

CLS INFRA D6.1 Inventory of existing data sources and formats

  • 1. Austrian Academy of Sciences, Austrian Centre for Digital Humanities and Cultural Heritage
  • 2. University of Potsdam
  • 3. Humboldt University of Berlin

Description

This deliverable summarises the work done to compile a comprehensive overview of the landscape of literary corpora and sources currently available.
It describes the working group's methodological approach and analyses the challenges encountered in collecting information about these resources and consolidating it into a structured form.
Based on an initial inventory of 86 corpora or corpus sets, the report illustrates how widely they vary in structure, context and purpose and, consequently, in how they are provisioned.
It also proposes a technological path towards making this information searchable through a central discovery catalogue, and discusses the principal design decisions regarding the data model and the technology stack that such a catalogue requires.

Files

D6.1_Inventory_of_existing_data_sources_and_formats.pdf (2.3 MB)

Additional details

Funding

European Commission
CLS INFRA - Computational Literary Studies Infrastructure (101004984)