Published August 4, 2025 | Version v1
Conference paper
Open
Making the Repository Programmable: The TextGrid Repository as a multi-layered Research Environment
Authors/Creators
- 1. Göttingen State and University Library
Contributors
Editor (2):
- 1. Nationale Forschungsdateninfrastruktur (NFDI) e.V.
- 2. University of Amsterdam
Description
The TextGrid Repository (TGR) is a dedicated research-data repository for the humanities and cultural studies that specializes in XML/TEI-encoded texts. Developed in the DFG-funded TextGrid project from 2006 to 2015, TGR was a pioneering infrastructure because it embraced the TEI format, a de facto standard in digital humanities and an essential foundation for computational philology. Initially, TGR offered basic storage, download, archiving, and structured metadata for literary texts. Over time, it has evolved into a sophisticated research environment that goes beyond conventional archival functions. To support advanced scholarly workflows, TGR integrates tools for automated text analysis and annotation, such as Voyant Tools [1], the Language Resource Switchboard [2], and the Annotation Sandbox [3], with direct export capabilities, thereby lowering technical barriers and streamlining complex analyses. Its incorporation into the NFDI consortium Text+ brings modernization, component upgrades, and enhanced user engagement, opening TGR to emerging generations of researchers.
Contemporary literary and linguistic scholars demand virtual research environments that differ markedly from those of earlier years. Text-editing projects now emphasize rich presentation layers and custom transformations for reading and highlighting annotated data. Computational literary studies require straightforward access to plain text, programmatic interfaces, and libraries. Library-driven initiatives prioritize authority data integration. Some digital humanities inquiries hinge on author attributes, such as gender, while corpus linguistics projects center on detailed linguistic annotations.
TGR addresses these diverse requirements by unifying multiple services and access modalities.
From the end user's vantage point, data can be retrieved via direct reading links, faceted search in the portal's graphical interface, persistent identifiers (PIDs), or programmable interfaces, including the Python client library tg_client [5]. Prospective data publishers receive expert guidance on metadata quality. In the Text+ context, TGR now offers two new services: Notebook Actions [6], which provide a graphical import interface in Jupyter Notebooks, and tg_model [7], which generates the metadata documents required for data ingestion. These complement the established tools tg-crud [8] and tg_admin [9], which handle repository maintenance and document management. Collectively, these enhancements simplify and accelerate data import and publication workflows.
A clear indicator of TGR's transformation is the surge in new projects over recent years, which has greatly enriched the repository's content. Whereas TGR once catered primarily to German studies, it now houses materials in over one hundred languages and multiple script systems (including Coptic, Cyrillic, Arabic, Hebrew, Amharic, Chinese, Japanese, Korean, and Armenian), reflecting the needs of a broad spectrum of disciplines.
In our presentation, we will demonstrate key new functionalities and illustrate how TGR's current multi-layered research environment departs from its original archival role. Special emphasis will be placed on the latest automated processes, which not only facilitate but actively promote computer-assisted analyses while maintaining high standards of metadata quality.
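As an illustration of the plain-text access that computational literary studies rely on, the following minimal sketch extracts running text from a TEI document using only the Python standard library. It is not the tg_client API and does not contact the repository: the function name `tei_to_plain_text` and the inline sample document are hypothetical, and the sketch assumes the input uses the standard TEI namespace with a `text/body` element.

```python
import xml.etree.ElementTree as ET

# Standard TEI P5 namespace; the "tei" prefix is only a local alias.
TEI_NS = {"tei": "http://www.tei-c.org/ns/1.0"}


def tei_to_plain_text(tei_xml: str) -> str:
    """Return the whitespace-normalised text of the TEI <body>.

    Hypothetical helper for illustration; header metadata is skipped
    and all inline markup inside the body is flattened.
    """
    root = ET.fromstring(tei_xml)
    body = root.find(".//tei:text/tei:body", TEI_NS)
    if body is None:
        return ""
    # itertext() yields every text node below <body> in document order.
    raw = "".join(body.itertext())
    # Collapse runs of whitespace (indentation, line breaks) to single spaces.
    return " ".join(raw.split())


# Made-up sample document, not taken from the repository.
SAMPLE = """<TEI xmlns="http://www.tei-c.org/ns/1.0">
  <teiHeader><fileDesc><titleStmt><title>Sample</title></titleStmt></fileDesc></teiHeader>
  <text><body>
    <p>Ein <hi rend="italic">kurzer</hi> Beispieltext.</p>
    <p>Zweiter Absatz.</p>
  </body></text>
</TEI>"""

if __name__ == "__main__":
    print(tei_to_plain_text(SAMPLE))
```

Flattening the body this way discards annotations and structural markup, which suits frequency-based analyses but not editions that depend on the encoding. A real workflow would obtain the TEI source through one of the repository's access routes rather than an inline string.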
Files
| Name | Size |
|---|---|
| CoRDI_2025_paper_220.pdf (md5:8c84d40689d49cb17f4428a510055603) | 398.9 kB |