There is a newer version of the record available.

Published February 6, 2026 | Version v2
Dataset Open

A Dataset of American Poetry by Poets from Historically Underrepresented Groups in the HathiTrust Digital Library

  • 1. Indiana University
  • 2. University of Illinois Urbana-Champaign

Description

This dataset provides poem-level page boundaries for American poetry in selected collections from the HathiTrust Digital Library. It encompasses 9,321 poems from 113 collections by American poets from historically underrepresented groups, including African Americans, Asian Americans, Pacific Islanders, Latin Americans, and Native Americans. Each CSV file represents one poetry collection, and each poem is identified by its start and end page numbers in HathiTrust. The dataset can support a range of computational analyses, including word frequency, topic modeling, word embeddings, and comparative analysis across poems from diverse communities.
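As a rough starting point for working with the per-collection CSV files, the sketch below opens the archive and reports the header and row count of each CSV. It assumes, without confirmation from this record, that every CSV has a header row and one row per poem; the function name `summarize_collections` is hypothetical, and no column names are assumed, so the actual headers should be read from the output.

```python
import csv
import io
import zipfile


def summarize_collections(zip_source):
    """Return {csv_name: (column_names, row_count)} for each CSV in the archive.

    Assumption (not stated in the record): each CSV has a header row
    followed by one row per poem.
    """
    summary = {}
    with zipfile.ZipFile(zip_source) as archive:
        for name in sorted(archive.namelist()):
            # Skip directories and any non-CSV members.
            if not name.lower().endswith(".csv"):
                continue
            with archive.open(name) as raw:
                # utf-8-sig tolerates an optional byte-order mark.
                text = io.TextIOWrapper(raw, encoding="utf-8-sig", newline="")
                reader = csv.DictReader(text)
                rows = list(reader)
                summary[name] = (reader.fieldnames or [], len(rows))
    return summary
```

Calling `summarize_collections("htrc_sections_updated.zip")` on a local copy of the archive would show which columns hold the start and end page numbers before any further analysis is attempted.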

Files

htrc_sections_updated.zip

Files (128.0 kB)

Name: htrc_sections_updated.zip
Size: 128.0 kB
md5:d27d49a5c6c8c2c5255dc055cc2fdba6
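The MD5 checksum listed for the archive can be used to check a downloaded copy. The following is a minimal sketch using Python's standard `hashlib`; the helper names `md5_of_file` and `matches_record` are hypothetical, and the local file path is whatever the download was saved as.

```python
import hashlib

# Checksum as listed for htrc_sections_updated.zip in this record.
EXPECTED_MD5 = "d27d49a5c6c8c2c5255dc055cc2fdba6"


def md5_of_file(path, chunk_size=65536):
    """Compute the hex MD5 digest of a file, reading it in chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_record(path, expected=EXPECTED_MD5):
    """Return True if the file's MD5 digest equals the expected value."""
    return md5_of_file(path) == expected
```

For example, `matches_record("htrc_sections_updated.zip")` should return `True` for an intact download of this version; a `False` result suggests a corrupted file or a different version of the record.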

Additional details

Funding

Institute of Museum and Library Services
Laura Bush 21st Century Librarian Program RE-252382-OLS-22