Published June 30, 2023 | Version v1
Conference paper | Open access

Piloting A Machine Learning Approach to Identify English-Language Fiction in the HathiTrust Digital Library

Creator affiliations:
  • 1. HathiTrust Research Center, Information Sciences, University of Illinois, United States of America
  • 2. English and Information Sciences, University of Illinois, United States of America

Contributor affiliations:
  • 1. University of Graz
  • 2. Belgrade Center for Digital Humanities
  • 3. Le Mans Université
  • 4. Digital Humanities im deutschsprachigen Raum

Description

In large digital libraries such as the HathiTrust Digital Library, metadata alone is often insufficient to identify items of interest. Metadata records are frequently incomplete, and the problem is especially acute for fiction, where the relevant metadata categories, when present at all, are too broad to be useful. This project builds a machine learning pipeline for fiction classification using the HTRC Extracted Features Dataset, drawing on previous work by Underwood et al. We detail the methodology, early results, and planned future work toward generating a dataset of English-language fiction.

Files

DUBNICEK_Ryan_Christopher_Piloting_A_Machine_Learning_Approa.pdf

Additional details

Related works

Is part of
Book: 10.5281/zenodo.7961822 (DOI)