Comparing Feature Engineering Techniques for Time Period Categorization of Novels
Description
Categorizing literary works by time period, that is, by when the story takes place (for example, the Middle Ages or World War I), can improve item accessibility in online catalogues. Unfortunately, very few literary works are categorized this way, which limits their accessibility, and assigning time periods manually is time-consuming for catalogers. There is therefore a need to investigate machine learning techniques for time period categorization. This paper investigates and evaluates the accuracy of three techniques (Latent Dirichlet Allocation, TF-IDF, and word embeddings using SBERT) for categorizing literary works by historical period. The data consists of 35 works of historical fiction written in Swedish, taken from Litteraturbanken. The techniques were analyzed using quasi-experiments, and their accuracy was evaluated using the F1-score. The results show that TF-IDF outperforms both Latent Dirichlet Allocation and word embeddings.
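To make the TF-IDF and F1-score part of the evaluation concrete, here is a minimal pure-Python sketch. It is not the pipeline used in the paper: the toy English token lists, the period labels, the nearest-neighbour classifier, the smoothed IDF formula, and the macro averaging of F1 are all assumptions chosen for illustration, since the description does not specify them.

```python
import math
from collections import Counter

def fit_idf(docs):
    """Smoothed inverse document frequency for every term in the training docs."""
    n = len(docs)
    df = Counter(term for doc in docs for term in set(doc))
    return {t: math.log((1 + n) / (1 + c)) + 1.0 for t, c in df.items()}

def tfidf(doc, idf):
    """Sparse TF-IDF vector; terms unseen during fitting are ignored."""
    tf = Counter(t for t in doc if t in idf)
    total = sum(tf.values())
    return {t: (f / total) * idf[t] for t, f in tf.items()} if total else {}

def cosine(a, b):
    dot = sum(w * b.get(t, 0.0) for t, w in a.items())
    na = math.sqrt(sum(w * w for w in a.values()))
    nb = math.sqrt(sum(w * w for w in b.values()))
    return dot / (na * nb) if na and nb else 0.0

def macro_f1(y_true, y_pred):
    """Unweighted mean of the per-label F1 scores."""
    scores = []
    for label in sorted(set(y_true)):
        tp = sum(t == label and p == label for t, p in zip(y_true, y_pred))
        fp = sum(t != label and p == label for t, p in zip(y_true, y_pred))
        fn = sum(t == label and p != label for t, p in zip(y_true, y_pred))
        scores.append(2 * tp / (2 * tp + fp + fn) if tp else 0.0)
    return sum(scores) / len(scores)

# Hypothetical toy data: (tokens, time period). Not taken from Litteraturbanken.
train = [
    ("knight castle sword king siege".split(), "middle_ages"),
    ("monk abbey plague knight pilgrim".split(), "middle_ages"),
    ("trench artillery soldier gas front".split(), "world_war_1"),
    ("soldier telegram trench armistice front".split(), "world_war_1"),
]
test = [
    ("the knight rode to the castle".split(), "middle_ages"),
    ("soldier waited in the trench".split(), "world_war_1"),
]

idf = fit_idf([tokens for tokens, _ in train])
train_vecs = [(tfidf(tokens, idf), label) for tokens, label in train]

def predict(tokens):
    """Label of the most similar training document (1-nearest neighbour)."""
    vec = tfidf(tokens, idf)
    return max(train_vecs, key=lambda pair: cosine(vec, pair[0]))[1]

y_true = [label for _, label in test]
y_pred = [predict(tokens) for tokens, _ in test]
print(y_pred, macro_f1(y_true, y_pred))
```

On real novels the token lists would come from a Swedish tokenizer, and the nearest-neighbour step could be swapped for any classifier; the F1 computation stays the same.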
Files
Name | Size |
---|---|
2023-07-25-slides-ISKOUK-Conference-FWestin.pdf (md5:3a2e912a120cabb93c85c0101b6bdc6d) | 602.5 kB |