Finding Long-Term Solutions for GRETIL, a Large Indologist Corpus
Description
Many digital pioneers in the humanities who started in the 1990s and 2000s are now struggling to keep up with the current digital world. Not only are expectations increasing, but many projects are finding it difficult to maintain their original functionalities. Twenty years or more after their inception, the difficulties are not only technological: funding is scarce, enthusiasm has diminished, and the original leaders are no longer active or, in some cases, have passed away.
One example of this is GRETIL, a collection of digital texts developed between 2001 and 2020. This resource is the largest repository of machine-readable Sanskrit texts and includes texts in other Indian languages. The corpus remains popular with scholars for quick reference and text mining, and has been incorporated into several ground-breaking digital humanities projects in Indology.
Although GRETIL relied on TEI to encode its texts in the final phase of the project, it found ad-hoc solutions for many other needs: its own website, its own conversion system to HTML and plain text, its own collection of secondary literature in PDF, and even its own OPAC. The project was developed early, at a time when most suitable e-texts were not encoded in Unicode. Not least for this reason, a major technological update became inevitable after its founder Reinhold Grünendahl retired in 2016.
In 2022, the Text+ consortium was launched as part of the German National Research Data Infrastructure (NFDI) initiative. The main objectives of the consortia in this initiative are to ensure the long-term accessibility of research data, to integrate existing solutions and, in general, to improve the FAIR status of the resources. A user story by Buchholz suggested the integration of GRETIL into the Text+ portfolio. As part of the new developments of the TextGrid repository and the integration of existing corpora, we decided to publish the already converted TEI documents in this repository. We are also working on transforming the remaining HTML documents into TEI and on improving the quality of the metadata and thus its FAIR status, e.g. by using terms from the Integrated Authority File (Gemeinsame Normdatei, GND), the authority control system of the German-speaking world. Other components of GRETIL will be published in other repositories (eDocs and DARIAH-DE Repository).
In its new environment in TextGrid, GRETIL will offer convenient new ways to search and compare all texts of the collection. The now standardized keywords and categories (language, genre, religious affiliation) will be available as individual filters for the whole corpus, allowing more flexible filtering and querying.
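As a rough illustration of what filtering by standardized categories amounts to, the following Python sketch selects records that match every requested category value. The `TextRecord` fields, the placeholder sample records and the `filter_records` helper are hypothetical; they do not reflect the TextGrid API or the actual GRETIL metadata schema.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TextRecord:
    # Hypothetical metadata fields, modelled on the categories named above.
    title: str
    language: str
    genre: str
    religious_affiliation: str


def filter_records(records, **facets):
    """Return the records whose fields equal every given facet value."""
    return [
        record
        for record in records
        if all(getattr(record, name) == value for name, value in facets.items())
    ]


# Placeholder records for illustration only, not actual GRETIL entries.
records = [
    TextRecord("Text A", "Sanskrit", "epic", "Hindu"),
    TextRecord("Text B", "Sanskrit", "philosophy", "Buddhist"),
    TextRecord("Text C", "Prakrit", "narrative", "Jaina"),
]

sanskrit_texts = filter_records(records, language="Sanskrit")
buddhist_sanskrit = filter_records(
    records, language="Sanskrit", religious_affiliation="Buddhist"
)

print([r.title for r in sanskrit_texts])
print([r.title for r in buddhist_sanskrit])
```

Because every category holds a controlled value, filters combine by simple equality checks. With free-text keywords, the same query would need fuzzy matching or manual normalisation first.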
Some aspects of GRETIL will remain as they currently are. This means that the imbalances GRETIL exhibits in certain areas, e.g. the ratio of Sanskrit to Prakrit or Tibetan texts, will carry over. However, the new environment will make it easier for new projects to expand or enrich the text material in the future, thus affording the opportunity for further revitalisation of this text corpus.
Files
| Name | Size | Checksum |
|---|---|---|
| 20250605_GRETIL-1.pdf | 452.8 kB | md5:9188fda8c444c78ed2370d283191874f |
Additional details
Dates
- Accepted: 2025-06-01