Published June 29, 2023 | Version v1
Presentation Open

The GOLEM: an ontology and knowledge graph for fiction and reader response

Authors/Creators

  • 1. University of Groningen

Description

This paper presents the first release of a graph database of online fiction corpora taken from various online sources in five different languages (English, Spanish, Italian, Indonesian, Korean). The goal is to describe texts using “derived data” (OECD, 2005) – or “mesodata” (Boot, 2009) – referring to various textual features, so that documents can be compared without accessing their full text. The idea is similar to that of the HathiTrust Extracted Features dataset (Jett et al., 2020), but the features encoded in the GOLEM project (“Graphs and Ontologies for Literary Evolution Models”) are much richer and also cover narrative and stylistic elements and reader response data (e.g. characters, relationships, topics, readability, sentiment of comments received by the story, etc.) (cf. Schöch et al., 2022; Pfeffer & Roth, 2019). During this presentation, I will show the challenges faced and the decisions taken with respect to the following aspects:

  • developing an ontology for stories and reader response, taking into account the perspectives of both researchers and the communities of online readers, as well as cultural and linguistic differences;
  • extracting structured information about narrative features from the full text of the stories;
  • linking information derived from the stories with information extracted from Wikidata and from fan wikis (e.g. fandom.com);
  • legal and ethical issues related to copyright and personal data, including the licensing of the database for reuse by third parties;
  • possible use cases of the knowledge graph to study changes in fiction over time (Pianzola et al., 2020).
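
As a rough illustration of the derived-data idea described above, the following minimal Python sketch stores a few story features as subject-predicate-object triples and compares two stories without ever touching their full text. It is not the GOLEM data model: the identifiers (`story:1`, `char:A`), the property names (`hasCharacter`, `hasTopic`), the values, and the choice of Jaccard overlap as a similarity measure are all hypothetical placeholders chosen for this example.

```python
from collections import defaultdict

# Hypothetical derived-data records: every identifier, property name and value
# below is a made-up placeholder, not taken from the GOLEM database.
TRIPLES = [
    ("story:1", "language", "en"),
    ("story:1", "hasCharacter", "char:A"),
    ("story:1", "hasCharacter", "char:B"),
    ("story:1", "hasTopic", "friendship"),
    ("story:2", "language", "it"),
    ("story:2", "hasCharacter", "char:A"),
    ("story:2", "hasTopic", "friendship"),
    ("story:2", "hasTopic", "travel"),
]


def build_index(triples):
    """Group objects by (subject, predicate) so features can be looked up."""
    index = defaultdict(set)
    for subject, predicate, obj in triples:
        index[(subject, predicate)].add(obj)
    return index


def jaccard(a, b):
    """Jaccard overlap of two sets; defined here as 0.0 when both are empty."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def compare(index, s1, s2, predicates=("hasCharacter", "hasTopic")):
    """Compare two stories feature by feature, using only derived data."""
    return {p: jaccard(index[(s1, p)], index[(s2, p)]) for p in predicates}


index = build_index(TRIPLES)
similarity = compare(index, "story:1", "story:2")
print(similarity)
```

In this toy setting the two stories are in different languages, yet they can still be compared on shared characters and topics, because the comparison runs on the extracted features alone.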


References

Boot, P. (2009). Mesotext: Digitised Emblems, Modelled Annotations and Humanities Scholarship. Amsterdam University Press.

Jett, J., Capitanu, B., Kudeki, D., Cole, T., Hu, Y., Organisciak, P., Underwood, T., Dickson Koehl, E., Dubnicek, R., & Downie, J. S. (2020). The HathiTrust Research Center Extracted Features Dataset (2.0) [Data set]. HathiTrust Research Center. https://doi.org/10.13012/R2TE-C227

OECD. (2005). Derived data element. In OECD Glossary of Statistical Terms. https://stats.oecd.org/glossary/detail.asp?ID=5130

Pfeffer, M., & Roth, M. (2019). Japanese Visual Media Graph: Providing researchers with data from enthusiast communities. Proc. Int’l Conf. on Dublin Core and Metadata Applications, 136–141.

Pianzola, F., Acerbi, A., & Rebora, S. (2020). Cultural accumulation and improvement in online fan fiction. CHR 2020: Workshop on Computational Humanities Research, November 18–20, 2020, Amsterdam, The Netherlands, 2723, 2–11. http://ceur-ws.org/Vol-2723/short8.pdf

Schöch, C., Hinzmann, M., Röttgermann, J., Dietz, K., & Klee, A. (2022). Smart Modelling for Literary History. International Journal of Humanities and Arts Computing, 16(1), 78–93. https://doi.org/10.3366/ijhac.2022.0278

Files

GOLEM_ACH_20230629.pdf (1.8 MB)
md5:8cd351ae44d51e29575907faa472d0f2

Additional details

Funding

European Commission
GOLEM - Graphs and Ontologies for Literary Evolution Models (grant 101040938)