Piloting A Machine Learning Approach to Identify English-Language Fiction in the HathiTrust Digital Library

doi:10.5281/zenodo.8107513

Published June 30, 2023 | Version v1

Conference paper Open

1. HathiTrust Research Center, Information Sciences, University of Illinois, United States of America
2. English and Information Sciences, University of Illinois, United States of America

Hosting institution:

Centre for Information Modelling¹

1. University of Graz
2. Belgrade Center for Digital Humanities
3. Le Mans Université
4. Digital Humanities im deutschsprachigen Raum

In large digital libraries such as HathiTrust, metadata alone is often insufficient to identify items of interest. Metadata records are frequently incomplete, and fiction is especially challenging because the relevant metadata categories, when present, are too broad. This project constructs a machine learning pipeline for fiction classification using the HTRC Extracted Features Dataset, building on previous work by Underwood et al. We detail the methodology, early results, and planned future work in generating the resulting fiction dataset.
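As a rough illustration of what classifying fiction from extracted features can look like, the sketch below trains a small logistic regression on relative token frequencies. It is not the authors' pipeline: the toy volumes, the vocabulary, the function names, and the choice of logistic regression are all assumptions made for the example, and the hand-written token counts only stand in for the kind of per-volume counts that could be derived from the HTRC Extracted Features Dataset.

```python
import math

# Hypothetical toy data: (token counts for one volume, label), 1 = fiction, 0 = not fiction.
VOLUMES = [
    ({"said": 40, "she": 30, "the": 100, "table": 1}, 1),
    ({"said": 25, "he": 35, "the": 90}, 1),
    ({"table": 20, "data": 30, "the": 110, "said": 1}, 0),
    ({"data": 15, "figure": 25, "the": 95}, 0),
]
VOCAB = sorted({word for counts, _ in VOLUMES for word in counts})


def relative_freqs(token_counts, vocab):
    """Convert raw token counts into relative frequencies over a fixed vocabulary."""
    total = sum(token_counts.values()) or 1
    return [token_counts.get(word, 0) / total for word in vocab]


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def train_logreg(X, y, lr=1.0, epochs=2000, l2=1e-4):
    """Fit L2-regularized logistic regression with batch gradient descent."""
    n, d = len(X), len(X[0])
    w, b = [0.0] * d, 0.0
    for _ in range(epochs):
        grad_w, grad_b = [0.0] * d, 0.0
        for xi, yi in zip(X, y):
            err = sigmoid(sum(wj * xj for wj, xj in zip(w, xi)) + b) - yi
            for j in range(d):
                grad_w[j] += err * xi[j]
            grad_b += err
        w = [wj - lr * (gw / n + l2 * wj) for wj, gw in zip(w, grad_w)]
        b -= lr * grad_b / n
    return w, b


def predict_proba(w, b, x):
    """Return the estimated probability that a feature vector is fiction."""
    return sigmoid(sum(wj * xj for wj, xj in zip(w, x)) + b)


X = [relative_freqs(counts, VOCAB) for counts, _ in VOLUMES]
y = [label for _, label in VOLUMES]
w, b = train_logreg(X, y)

# Score two unseen, hypothetical volumes.
novel_like = relative_freqs({"said": 30, "she": 20, "he": 20, "the": 80}, VOCAB)
report_like = relative_freqs({"data": 25, "table": 15, "figure": 10, "the": 100}, VOCAB)
print(predict_proba(w, b, novel_like), predict_proba(w, b, report_like))
```

A real pipeline would need far more training volumes, a much larger vocabulary, and held-out evaluation; the point here is only the shape of the data flow from token counts to a fiction probability.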

Files (119.8 kB)

Name	MD5 checksum	Size
DUBNICEK_Ryan_Christopher_Piloting_A_Machine_Learning_Approa.pdf	d94879c7d16f6724723824057fe9a0c3	93.3 kB
DUBNICEK_Ryan_Christopher_Piloting_A_Machine_Learning_Approa.xml	806a4bdfc9955e2fb3376b7956089460	26.6 kB

Additional details

Is part of: Book: 10.5281/zenodo.7961822 (DOI)

	All versions	This version
Views	46	45
Downloads	40	39
Data volume	3.3 MB	3.2 MB
