Dataset for paper "Sameness entices, but novelty enchants in fanfiction online"
Description
This dataset accompanies the paper "Sameness entices, but novelty enchants in fanfiction online". It consists of fanfiction collected from the platform Archive of Our Own (AO3). Please refer to the paper for dataset details.
As authors on AO3 retain all rights to their works, we do not redistribute the fanfiction texts outside of AO3. Instead, for each fandom we analyze, we share the URLs of the works we collected in that fandom. Users can refer to this snippet for downloading content from AO3 using the URLs. The URL dataset contains the following field:
'URL'
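As a rough illustration of how the shared URLs might be used, the following minimal sketch reads a 'URL' column and fetches each page with a pause between requests. It is not the snippet referred to above. It assumes the URL files are CSV files with a 'URL' header; the function names, the User-Agent string, and the five-second delay are hypothetical choices made for this example.

```python
import csv
import io
import time
import urllib.request


def load_urls(csv_text, column="URL"):
    """Return the non-empty values of `column` from CSV text (assumed layout)."""
    reader = csv.DictReader(io.StringIO(csv_text))
    urls = []
    for row in reader:
        value = (row.get(column) or "").strip()
        if value:
            urls.append(value)
    return urls


def fetch_html(url, timeout=30):
    """Download one page and return its HTML as text."""
    # The User-Agent value is a placeholder chosen for this sketch.
    request = urllib.request.Request(url, headers={"User-Agent": "research-script"})
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return response.read().decode("utf-8", errors="replace")


def download_all(urls, delay_seconds=5.0, fetch=fetch_html):
    """Fetch each URL in turn, pausing between requests to limit server load."""
    pages = {}
    for index, url in enumerate(urls):
        pages[url] = fetch(url)
        if index < len(urls) - 1:
            time.sleep(delay_seconds)
    return pages
```

Passing `fetch` as a parameter keeps the loop testable without network access. Anyone downloading at scale should check AO3's terms of service and adjust the delay accordingly.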
We also share a derived dataset containing the metadata of each work, the topic distribution obtained by fitting an LDA model to the fanfiction texts, and the Jensen-Shannon divergence between each work and the center (see the paper for details). These fields can be used to reproduce the results in the paper. The derived dataset contains the following fields:
'AdditionalTags'
'ArchiveWarnings'
'Author'
'Category'
'Chapters'
'Characters'
'Fandoms'
'Kudos'
'Language'
'Rating'
'Relationship'
'Title'
'Words'
'PublishDate'
'UpdateDate'
'CompleteDate'
'Comments'
'Hits'
'Bookmarks'
'URL'
'Dist': LDA topic distribution
'JSD': Jensen-Shannon divergence value
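To make the relationship between 'Dist' and 'JSD' concrete, here is a minimal sketch of the Jensen-Shannon divergence between one topic distribution and a center. It rests on two assumptions that are not confirmed by this description: the center is taken to be the element-wise mean of a set of topic distributions, and logarithms are base 2 (so values lie in [0, 1]). The paper defines the actual procedure; the function names here are illustrative only.

```python
import math


def kl_divergence(p, q):
    """Kullback-Leibler divergence KL(p || q) in bits; terms with p_i = 0 contribute 0."""
    return sum(pi * math.log2(pi / qi) for pi, qi in zip(p, q) if pi > 0)


def jensen_shannon_divergence(p, q):
    """JSD(p, q) = 0.5 * KL(p || m) + 0.5 * KL(q || m), with m = (p + q) / 2."""
    m = [(pi + qi) / 2 for pi, qi in zip(p, q)]
    return 0.5 * kl_divergence(p, m) + 0.5 * kl_divergence(q, m)


def center(distributions):
    """Element-wise mean of several distributions (an assumed definition of the center)."""
    n = len(distributions)
    return [sum(column) / n for column in zip(*distributions)]


# Toy topic distributions over three topics (made-up numbers, not from the dataset).
works = [[0.7, 0.2, 0.1], [0.1, 0.8, 0.1], [0.2, 0.2, 0.6]]
c = center(works)
divergences = [jensen_shannon_divergence(w, c) for w in works]
```

A work whose topic mix matches the center gets a divergence of 0, and larger values indicate a work further from the center under these assumptions.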
Additional code for analysis can be found in this GitHub repo.
Please direct questions about the dataset to the corresponding author of the paper.
Files
dataset_URLs.zip
Total size: 206.2 MB
MD5 checksum | Size |
---|---|
md5:ced179a914c5c7f8a5ab5f19f856e095 | 203.0 MB |
md5:5afae95249db7e00b57a4b85bc362c54 | 3.1 MB |
Additional details
Related works
- Is supplement to
- Journal article: arXiv:1904.07741 (arXiv)
Software
- Repository URL
- https://github.com/yzjing/ao3/
- Programming language
- Python