Published July 25, 2023 | Version v1
Presentation Open

Comparing Feature Engineering Techniques for Time Period Categorization of Novels

  • 1. University of Borås, Sweden

Description

Categorizing literary works by time period, that is, by when the story takes place (for example, the Middle Ages or World War I), can improve item accessibility in online catalogues. Unfortunately, very few literary works are categorized by time period, which limits their accessibility, and assigning time periods manually is time-consuming for catalogers. There is therefore a need to investigate machine learning techniques for categorizing works by time period. This paper investigates and evaluates the accuracy of three feature engineering techniques (Latent Dirichlet Allocation, TF-IDF, and word embeddings using SBERT) for categorizing literary works by historical period. The data consist of 35 works of historical fiction written in Swedish, taken from Litteraturbanken. The techniques were compared in quasi-experiments, and their accuracy was evaluated using the F1-score. The evaluations show that TF-IDF outperforms both Latent Dirichlet Allocation and word embeddings.
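
As a rough illustration of the kind of pipeline the description refers to, the sketch below computes TF-IDF weights for tokenized texts, assigns each text the label of its most similar neighbour, and scores the result with a macro-averaged F1. It is not the presentation's method: the description does not state which TF-IDF variant, classifier, or F1 averaging was used, so the smoothed IDF formula, the leave-one-out nearest-neighbour step, the macro averaging, the function names, and the four toy English "documents" are all assumptions made for illustration only.

```python
import math
from collections import Counter


def tfidf_vectors(docs):
    """Return one {term: weight} dict per tokenized document.

    Assumed variant: relative term frequency times a smoothed IDF,
    log((1 + N) / (1 + df)) + 1.
    """
    n = len(docs)
    df = Counter(term for doc in docs for term in set(doc))
    vectors = []
    for doc in docs:
        tf = Counter(doc)
        vectors.append({
            term: (count / len(doc)) * (math.log((1 + n) / (1 + df[term])) + 1.0)
            for term, count in tf.items()
        })
    return vectors


def cosine(a, b):
    """Cosine similarity between two sparse {term: weight} vectors."""
    dot = sum(w * b.get(t, 0.0) for t, w in a.items())
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def macro_f1(y_true, y_pred):
    """Unweighted mean of the per-label F1 scores."""
    labels = sorted(set(y_true) | set(y_pred))
    scores = []
    for label in labels:
        tp = sum(t == label and p == label for t, p in zip(y_true, y_pred))
        fp = sum(t != label and p == label for t, p in zip(y_true, y_pred))
        fn = sum(t == label and p != label for t, p in zip(y_true, y_pred))
        denom = 2 * tp + fp + fn
        scores.append(2 * tp / denom if denom else 0.0)
    return sum(scores) / len(scores)


# Hypothetical toy data, NOT the 35 Swedish novels from Litteraturbanken.
docs = [
    "knight castle sword king".split(),
    "castle knight plague monk".split(),
    "trench rifle soldier gas".split(),
    "soldier trench artillery front".split(),
]
labels = ["middle_ages", "middle_ages", "ww1", "ww1"]

vectors = tfidf_vectors(docs)

# Leave-one-out 1-nearest-neighbour: each text takes the label of the
# most similar other text (an assumed, deliberately simple classifier).
predicted = []
for i, vec in enumerate(vectors):
    best = max((j for j in range(len(vectors)) if j != i),
               key=lambda j: cosine(vec, vectors[j]))
    predicted.append(labels[best])

score = macro_f1(labels, predicted)
```

On real data, the same evaluation function could be applied unchanged to predictions built from LDA topic distributions or SBERT embeddings, which is what makes a side-by-side comparison of the three feature representations possible.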

Files

2023-07-25-slides-ISKOUK-Conference-FWestin.pdf

md5:3a2e912a120cabb93c85c0101b6bdc6d
602.5 kB