Published May 3, 2025 | Version 0
Dataset Open

Dataset for paper "Sameness entices, but novelty enchants in fanfiction online"

Description

This dataset accompanies the paper "Sameness entices, but novelty enchants in fanfiction online". It covers fanfiction collected from the platform Archive of Our Own (AO3). Please refer to the paper for dataset details.

As authors on AO3 retain all rights to their works, we do not redistribute the fanfiction texts outside of AO3. Instead, for each fandom we analyze, we share the URLs of the works we collected in that fandom. Users can refer to this snippet to download content from AO3 using the URLs. The URL dataset contains the following field:

'URL'

We also share a derived dataset containing metadata for each work, the LDA topic distribution obtained by fitting an LDA model to the fanfiction texts, and the Jensen-Shannon divergence between each work and the center (see the paper for details). These fields can be used to reproduce the results in the paper. The derived dataset contains the following fields:

'AdditionalTags'

'ArchiveWarnings'

'Author'

'Category'

'Chapters'

'Characters'

'Fandoms'

'Kudos'

'Language'

'Rating'

'Relationship'

'Title'

'Words'

'PublishDate'

'UpdateDate'

'CompleteDate'

'Comments'

'Hits'

'Bookmarks'

'URL'

'Dist': LDA topic distribution

'JSD': Jensen-Shannon divergence value
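As an illustration of how a value like 'JSD' relates to 'Dist', here is a minimal sketch of the Jensen-Shannon divergence between two topic distributions. It is not the authors' code: the logarithm base, the toy distributions, and the use of a simple average as the "center" are all assumptions made for illustration. The paper defines the center actually used.

```python
import math


def kl_divergence(p, q):
    """Kullback-Leibler divergence KL(p || q) in nats; terms with p_i == 0 contribute 0."""
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)


def jensen_shannon_divergence(p, q, base=2.0):
    """Jensen-Shannon divergence between two discrete distributions.

    With base=2 (an assumption, not taken from the paper) the value lies in [0, 1].
    """
    if len(p) != len(q):
        raise ValueError("distributions must have the same length")
    # Normalize so that each input sums to 1.
    p_total, q_total = sum(p), sum(q)
    p = [x / p_total for x in p]
    q = [x / q_total for x in q]
    # Mixture distribution m = (p + q) / 2.
    m = [0.5 * (a + b) for a, b in zip(p, q)]
    return 0.5 * (kl_divergence(p, m) + kl_divergence(q, m)) / math.log(base)


# Toy topic distributions (hypothetical values, not taken from the dataset).
toy_dists = [
    [0.7, 0.2, 0.1],
    [0.1, 0.8, 0.1],
    [0.3, 0.3, 0.4],
]

# Assumption: the "center" is the element-wise mean of the distributions.
n = len(toy_dists)
center = [sum(column) / n for column in zip(*toy_dists)]

toy_jsd = [jensen_shannon_divergence(d, center) for d in toy_dists]
```

Under these assumptions, a work whose topic distribution sits close to the center gets a value near 0, and a work far from it gets a larger value.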

Additional code for analysis can be found in this GitHub repo.

Please direct questions about the dataset to the corresponding author of the paper.

Files

dataset_URLs.zip

Files (206.2 MB)

md5:ced179a914c5c7f8a5ab5f19f856e095, 203.0 MB
md5:5afae95249db7e00b57a4b85bc362c54, 3.1 MB
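
The MD5 checksums listed above can be used to check that a download completed intact. Below is a minimal sketch using only the Python standard library. The function names are hypothetical, the local path in the usage comment is an assumption, and this listing does not state which checksum belongs to which file.

```python
import hashlib


def md5_of_file(path, chunk_size=1 << 20):
    """Return the hex MD5 digest of a file, reading it in chunks to limit memory use."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_checksum(path, expected):
    """Compare a file's MD5 digest with an expected value, with or without an 'md5:' prefix."""
    expected = expected.strip().lower()
    if expected.startswith("md5:"):
        expected = expected[len("md5:"):]
    return md5_of_file(path) == expected


# Usage (hypothetical local path; substitute the checksum shown for the file you downloaded):
# matches_checksum("dataset_URLs.zip", "md5:<checksum from the file list>")
```

A mismatch usually means the download was truncated or corrupted and should be repeated.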

Additional details

Related works

Is supplement to
Journal article: arXiv:1904.07741 (arXiv)

Software

Repository URL
https://github.com/yzjing/ao3/
Programming language
Python