Published July 25, 2023 | Version v1
Presentation Open

Comparing Feature Engineering Techniques for Time Period Categorization of Novels

  • 1. University of Borås, Sweden


Categorizing literary works by time period, that is, by when the story takes place (for example, the Middle Ages or World War I), can improve item accessibility in online catalogues. Unfortunately, very few literary works are categorized this way, which limits accessibility, and assigning time periods manually is time-consuming for catalogers. There is therefore a need to investigate machine learning techniques for time period categorization. This paper investigates and evaluates the accuracy of three machine learning techniques (Latent Dirichlet Allocation, TF-IDF, and word embeddings using SBERT) for categorizing literary works by historical period. The data consist of 35 works of historical fiction written in Swedish, taken from Litteraturbanken. The techniques were analyzed using quasi-experiments, and their accuracy was evaluated using the F1-score. The results show that TF-IDF outperforms both Latent Dirichlet Allocation and word embeddings.
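To make the evaluation setup concrete, the following is a minimal sketch of TF-IDF features scored with a macro-averaged F1-score. It is not the authors' pipeline: the toy documents, the period labels, the nearest-neighbour classifier, the smoothed IDF formula, and every function name here are assumptions chosen for illustration, and the real corpus is Swedish rather than English.

```python
import math
from collections import Counter

def fit_idf(docs):
    """Smoothed inverse document frequency per term (one common variant)."""
    n = len(docs)
    df = Counter(term for doc in docs for term in set(doc))
    return {t: math.log((1 + n) / (1 + c)) + 1 for t, c in df.items()}

def tfidf(doc, idf):
    """TF-IDF weights for one tokenized document; unseen terms are ignored."""
    counts = Counter(t for t in doc if t in idf)
    length = max(len(doc), 1)
    return {t: (c / length) * idf[t] for t, c in counts.items()}

def cosine(a, b):
    """Cosine similarity between two sparse {term: weight} vectors."""
    dot = sum(w * b.get(t, 0.0) for t, w in a.items())
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0

def macro_f1(y_true, y_pred):
    """Unweighted mean of the per-label F1 scores."""
    labels = sorted(set(y_true) | set(y_pred))
    scores = []
    for label in labels:
        pairs = list(zip(y_true, y_pred))
        tp = sum(t == label and p == label for t, p in pairs)
        fp = sum(t != label and p == label for t, p in pairs)
        fn = sum(t == label and p != label for t, p in pairs)
        denom = 2 * tp + fp + fn
        scores.append(2 * tp / denom if denom else 0.0)
    return sum(scores) / len(scores)

# Hypothetical toy data: (tokens, time period label).
train = [
    (["knight", "castle", "sword", "king"], "middle_ages"),
    (["monk", "castle", "plague", "knight"], "middle_ages"),
    (["trench", "rifle", "soldier", "gas"], "world_war_i"),
    (["soldier", "artillery", "trench", "front"], "world_war_i"),
]
test = [
    (["knight", "sword", "tournament"], "middle_ages"),
    (["rifle", "front", "soldier"], "world_war_i"),
]

idf = fit_idf([tokens for tokens, _ in train])
train_vectors = [(tfidf(tokens, idf), label) for tokens, label in train]

def predict(tokens):
    """Assign the label of the most similar training document (1-NN)."""
    vector = tfidf(tokens, idf)
    return max(train_vectors, key=lambda pair: cosine(vector, pair[0]))[1]

predictions = [predict(tokens) for tokens, _ in test]
score = macro_f1([label for _, label in test], predictions)
print(predictions, score)
```

Swapping `tfidf` for another feature function (topic proportions from a topic model, or sentence embeddings) while keeping `macro_f1` fixed is one way such techniques could be compared on equal terms.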


