The GOLEM: an ontology and knowledge graph for fiction and reader response
Description
This paper presents the first release of a graph database of fiction corpora collected from various online sources in five languages (English, Spanish, Italian, Indonesian, Korean). The goal is to describe texts using “derived data” (OECD, 2005), or “mesodata” (Boot, 2009), that capture various textual features, so that documents can be compared without access to their full text. The idea is similar to that of the HathiTrust Extracted Features dataset (Jett et al., 2020), but the features encoded in the GOLEM project (“Graphs and Ontologies for Literary Evolution Models”) are much richer: they also cover narrative and stylistic elements and reader response data, e.g. characters, relationships, topics, readability, and the sentiment of comments received by a story (cf. Schöch et al., 2022; Pfeffer & Roth, 2019). During this presentation, I will show the challenges faced and the decisions taken with respect to the following aspects:
- developing an ontology for stories and reader response, taking into account the perspectives of both researchers and the communities of online readers, as well as cultural and linguistic differences;
- extracting structured information about narrative features from the full text of the stories;
- linking information derived from the stories with information extracted from Wikidata and fan wikis (e.g. fandom.com);
- legal and ethical issues related to copyright and personal data, including the licensing of the database for reuse by third parties;
- possible use cases of the knowledge graph to study changes in fiction over time (Pianzola et al., 2020).
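To make the idea of comparing documents through derived data more concrete, the following is a minimal sketch in Python, not the GOLEM data model. All identifiers and property names (`story:1`, `ex:hasCharacter`, `ex:hasTopic`, and so on) are hypothetical placeholders invented for illustration, and Jaccard overlap is only one of many possible similarity measures.

```python
from typing import NamedTuple


class Triple(NamedTuple):
    """A subject-predicate-object statement, as in a knowledge graph."""
    subject: str
    predicate: str
    obj: str


# Hypothetical derived data for two stories. The property names are
# assumptions for this sketch, not terms from the GOLEM ontology.
triples = [
    Triple("story:1", "ex:language", "en"),
    Triple("story:1", "ex:hasCharacter", "char:A"),
    Triple("story:1", "ex:hasCharacter", "char:B"),
    Triple("story:1", "ex:hasTopic", "topic:friendship"),
    Triple("story:2", "ex:language", "it"),
    Triple("story:2", "ex:hasCharacter", "char:A"),
    Triple("story:2", "ex:hasTopic", "topic:friendship"),
    Triple("story:2", "ex:hasTopic", "topic:travel"),
]


def features(story: str, graph: list[Triple]) -> set[tuple[str, str]]:
    """Collect the (predicate, object) pairs that describe one story."""
    return {(t.predicate, t.obj) for t in graph if t.subject == story}


def jaccard(a: set, b: set) -> float:
    """Share of features two stories have in common (0.0 to 1.0)."""
    if not a and not b:
        return 0.0
    return len(a & b) / len(a | b)


# The comparison uses only the derived features, never the full text.
similarity = jaccard(features("story:1", triples), features("story:2", triples))
print(f"Feature overlap between story:1 and story:2: {similarity:.2f}")
```

In this toy graph the two stories share one character and one topic out of six distinct features, so the overlap is one third. A real knowledge graph would typically be stored as RDF and queried with SPARQL rather than held in a Python list.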
References
Boot, P. (2009). Mesotext: Digitised Emblems, Modelled Annotations and Humanities Scholarship. Amsterdam University Press.
Jett, J., Capitanu, B., Kudeki, D., Cole, T., Hu, Y., Organisciak, P., Underwood, T., Dickson Koehl, E., Dubnicek, R., & Downie, J. S. (2020). The HathiTrust Research Center Extracted Features Dataset (2.0) [Data set]. HathiTrust Research Center. https://doi.org/10.13012/R2TE-C227
OECD. (2005). Derived data element. In OECD Glossary of Statistical Terms. https://stats.oecd.org/glossary/detail.asp?ID=5130
Pfeffer, M., & Roth, M. (2019). Japanese Visual Media Graph: Providing researchers with data from enthusiast communities. Proceedings of the International Conference on Dublin Core and Metadata Applications, 136–141.
Pianzola, F., Acerbi, A., & Rebora, S. (2020). Cultural accumulation and improvement in online fan fiction. CHR 2020: Workshop on Computational Humanities Research, November 18–20, 2020, Amsterdam, The Netherlands, 2723, 2–11. http://ceur-ws.org/Vol-2723/short8.pdf
Schöch, C., Hinzmann, M., Röttgermann, J., Dietz, K., & Klee, A. (2022). Smart Modelling for Literary History. International Journal of Humanities and Arts Computing, 16(1), 78–93. https://doi.org/10.3366/ijhac.2022.0278
Files
GOLEM_ACH_20230629.pdf (1.8 MB, md5:8cd351ae44d51e29575907faa472d0f2)