Published June 30, 2023
Version v1
Conference paper
Open access
Piloting A Machine Learning Approach to Identify English-Language Fiction in the HathiTrust Digital Library
Creators
- 1. HathiTrust Research Center, Information Sciences, University of Illinois, United States of America
- 2. English and Information Sciences, University of Illinois, United States of America
Contributors
Data managers:
Hosting institution:
- 1. University of Graz
- 2. Belgrade Center for Digital Humanities
- 3. Le Mans Université
- 4. Digital Humanities im deutschsprachigen Raum
Description
In large digital libraries such as the HathiTrust Digital Library, metadata alone is often insufficient to identify items of interest. Metadata records are frequently incomplete, and the problem is especially acute for fiction, where genre categories, when present at all, are too broad. This project constructs a machine learning pipeline for fiction classification using the HTRC Extracted Features Dataset, drawing on previous work by Underwood et al. We will detail the methodology, early results, and planned future work in generating the resulting dataset.
Files (119.8 kB)
DUBNICEK_Ryan_Christopher_Piloting_A_Machine_Learning_Approa.pdf
Checksum | Size
---|---
md5:d94879c7d16f6724723824057fe9a0c3 | 93.3 kB
md5:806a4bdfc9955e2fb3376b7956089460 | 26.6 kB
Additional details
Related works
- Is part of: Book, 10.5281/zenodo.7961822 (DOI)