Published August 4, 2025 | Version v1
Conference paper Open

Making the Repository Programmable: The TextGrid Repository as a multi-layered Research Environment

  • 1. Göttingen State and University Library

Contributors

  • 1. Nationale Forschungsdateninfrastruktur (NFDI) e.V.
  • 2. University of Amsterdam

Description

The TextGrid Repository (TGR) is a dedicated research-data repository for the humanities and cultural studies that specializes in XML/TEI-encoded texts. Developed in the DFG-funded TextGrid project from 2006 to 2015, TGR was a pioneering infrastructure because it embraced the TEI format, a de facto standard in digital humanities and an essential foundation for computational philology. Initially, TGR offered basic storage, download, archiving, and structured metadata for literary texts. Over time, it has evolved into a sophisticated research environment that goes beyond conventional archival functions. To support advanced scholarly workflows, TGR integrates tools for automated text analysis and annotation, such as Voyant Tools [1], the Language Resource Switchboard [2], and the Annotation Sandbox [3], with direct export capabilities, thereby lowering technical barriers and streamlining complex analyses. Its incorporation into the NFDI consortium Text+ ushers in a new era of modernization, component upgrades, and enhanced user engagement, opening TGR to emerging generations of researchers.

Contemporary literary and linguistic scholars demand virtual research environments that differ markedly from those of earlier years. Text-editing projects now emphasize rich presentation layers and custom transformations for reading and highlighting annotated data. Computational literary studies require straightforward access to plain text, programmatic interfaces, and libraries. Library-driven initiatives prioritize authority data integration. Some digital humanities inquiries hinge on author attributes, such as gender, while corpus linguistics projects center on detailed linguistic annotations.

TGR addresses these diverse requirements by unifying multiple services and access modalities.
From the end user's vantage point, data can be retrieved via direct reading links, faceted search in the portal's graphical interface, persistent identifiers (PIDs), or programmable interfaces, including the Python client library tg_client [5]. Prospective data publishers receive expert guidance on metadata quality. In the Text+ context, TGR now offers new services: Notebook Actions [6], which provide a graphical import interface in Jupyter Notebooks, and tg_model [7], which generates the metadata documents required for data ingestion. These complement the established tools tg-crud [8] and tg_admin [9], which handle repository maintenance and document management. Collectively, these enhancements simplify and accelerate data import and publication workflows.

A clear indicator of TGR's transformation is the surge in new projects over recent years, which has greatly enriched the repository's content. Whereas TGR once catered primarily to German studies, it now houses materials in over one hundred languages and multiple script systems (including Coptic, Cyrillic, Arabic, Hebrew, Amharic, Chinese, Japanese, Korean, and Armenian), reflecting the needs of a broad spectrum of disciplines.

In our presentation, we will demonstrate key new functionalities and illustrate how TGR's current multi-layered research environment departs from its original archival role. Special emphasis will be placed on the latest automated processes, which not only facilitate but actively promote computer-assisted analyses while ensuring the highest standards of metadata quality.
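
The demand for straightforward access to plain text from TEI-encoded documents can be illustrated with a minimal sketch. It uses only the Python standard library and does not rely on tg_client or any TGR service; the function name `tei_body_text` and the sample document are hypothetical and chosen purely for illustration. The only assumption about the data is that it uses the standard TEI namespace.

```python
import xml.etree.ElementTree as ET

# Standard TEI P5 namespace.
TEI_NS = "http://www.tei-c.org/ns/1.0"


def tei_body_text(tei_xml: str) -> str:
    """Return the whitespace-normalized text of the TEI <body> element.

    Hypothetical helper for illustration; not part of any TGR library.
    """
    root = ET.fromstring(tei_xml)
    body = root.find(f".//{{{TEI_NS}}}body")
    if body is None:
        raise ValueError("no TEI <body> element found")
    # itertext() yields all descendant text nodes in document order,
    # so inline markup such as <hi> is dropped while its text is kept.
    return " ".join("".join(body.itertext()).split())


# Invented sample document, not taken from the repository.
SAMPLE = """<TEI xmlns="http://www.tei-c.org/ns/1.0">
  <teiHeader><fileDesc><titleStmt><title>Sample</title></titleStmt></fileDesc></teiHeader>
  <text><body>
    <p>Ein <hi rend="italic">kurzer</hi> Satz.</p>
    <p>Noch einer.</p>
  </body></text>
</TEI>"""

if __name__ == "__main__":
    print(tei_body_text(SAMPLE))
```

The sketch discards the header and all markup, which suits simple word-frequency or stylometric work; analyses that depend on annotations (for example speaker labels or editorial notes) would need to keep selected elements instead.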

Files

CoRDI_2025_paper_220.pdf (398.9 kB)
md5:8c84d40689d49cb17f4428a510055603