Published May 29, 2023 | Version 2
Dataset Open

The Authorship of Stephen King's Books Written Under the Pseudonym "Richard Bachman": A Stylometric Analysis (data)

  • 1. Karel De Grote University
  • 2. University of Antwerp

Description

This data accompanies a paper for the 2nd Annual Conference for Computational Literary Studies: "The Authorship of Stephen King’s Books Written Under the Pseudonym 'Richard Bachman': A Stylometric Analysis".

Abstract:

Between 1977 and 1984, Stephen King published five novels under the pseudonym “Richard Bachman”. Reviewers noted similarities between King’s and Bachman’s writing styles when Thinner (1984) was published, ultimately leading to King’s unmasking. We investigate, using the Juola protocol, whether computational techniques can correctly identify King as the author of the Bachman books out of a selection of contemporary candidate authors – Dean Koontz, Peter Straub, and Thomas Harris. We also perform a post-hoc analysis of the use of pop-culture references and brand names in Bachman, King, Koontz, Straub, and Harris novels, based on comments in reviews of Bachman and King novels. The references extracted from the Bachman books occurred significantly more often in King’s texts than in the others’, showing that attentive readers could have “heard King’s voice” in the Bachman books through what a reviewer denigratingly called King’s “compulsion to list brand-name products and his affinity for pop-cult teenage junk”. These results contribute to the vexed issue of explainability, which is a recurrent challenge in author identification for literary texts.

 

Below is a description of each file in this repository:

bachman_segments_features_array_1000token_segments.csv, bachman_segments_features_array_5000token_segments.csv, and bachman_segments_features_array_10000token_segments.csv contain the feature spaces created by vectorizing 1,000-, 5,000-, and 10,000-token segments of Bachman, King, Koontz, and Straub books. Each row of the CSV files contains the vectorized segment, the segment's author, the book the segment was drawn from, the book's publication date, and the segment number.

 

bachman_segments_author_candidate_cosine_distances_1000token_segments.csv, bachman_segments_author_candidate_cosine_distances_5000token_segments.csv, and bachman_segments_author_candidate_cosine_distances_10000token_segments.csv contain the Bachman segment number, the bootstrap iteration number (from 0 to 9,999), the distractor author of the randomly sampled segment, and the cosine distance between the Bachman segment vector and that distractor author's randomly sampled segment vector. The distances were calculated using the data stored in the bachman_segments_features_array_1000token_segments.csv, bachman_segments_features_array_5000token_segments.csv, and bachman_segments_features_array_10000token_segments.csv files.
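The distance step described above can be sketched as follows. This is a minimal illustration, not the authors' code: the function names, the toy vectors, and the choice to draw exactly one segment per candidate author in each iteration are assumptions made for the example.

```python
import math
import random


def cosine_distance(u, v):
    """Return 1 minus the cosine similarity of two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0 or norm_v == 0:
        raise ValueError("cosine distance is undefined for a zero vector")
    return 1.0 - dot / (norm_u * norm_v)


def sample_distances(bachman_vec, candidate_segments, rng):
    """One bootstrap iteration (assumed design): pick one random segment
    per candidate author and measure its distance to the Bachman segment."""
    return {
        author: cosine_distance(bachman_vec, rng.choice(segments))
        for author, segments in candidate_segments.items()
    }


# Toy feature vectors, invented for illustration only.
bachman = [3.0, 1.0, 0.0]
candidates = {
    "King": [[3.0, 1.0, 0.0], [6.0, 2.0, 0.0]],  # parallel to the Bachman vector
    "Koontz": [[0.0, 0.0, 5.0]],                 # orthogonal to the Bachman vector
}
distances = sample_distances(bachman, candidates, random.Random(0))
```

Cosine distance ignores vector length, so both toy King segments sit at distance 0 from the Bachman vector, while the orthogonal Koontz segment sits at distance 1.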

 

bachman_segments_author_candidate_ranks_1000token_segments.csv, bachman_segments_author_candidate_ranks_5000token_segments.csv, and bachman_segments_author_candidate_ranks_10000token_segments.csv contain the same columns as the three files described in the previous paragraph, but the cosine distance between the Bachman segment and the distractor author segment is converted to a ranking. For each bootstrap iteration there are four rows (one for each candidate author), each containing the distance ranking between the Bachman segment and a candidate author segment. In a particular bootstrap iteration, if a King segment had the smallest cosine distance to a Bachman segment, King has the ranking "1"; if a Koontz segment had the second smallest distance, Koontz has the ranking "2"; and so on.
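The conversion from distances to rankings within a single bootstrap iteration can be sketched like this. The function name and the distance values are hypothetical, and ties are broken by input order here, which is an assumption since the description does not say how ties were handled.

```python
def rank_candidates(distances):
    """Map each author to a rank, where 1 is the smallest cosine distance.

    Ties keep their input order because Python's sort is stable
    (an assumption of this sketch, not a documented rule of the dataset).
    """
    ordered = sorted(distances, key=distances.get)
    return {author: rank for rank, author in enumerate(ordered, start=1)}


# Invented distances for one Bachman segment in one bootstrap iteration.
example = {"Koontz": 0.34, "Harris": 0.52, "King": 0.21, "Straub": 0.40}
ranks = rank_candidates(example)
```

With these invented values King receives rank 1 and Harris rank 4, mirroring the four rows per iteration described above.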

 

predicted_author_candidate_raw_counts_1000token_segments.csv, predicted_author_candidate_raw_counts_5000token_segments.csv, and predicted_author_candidate_raw_counts_10000token_segments.csv contain the total number of times King, Straub, Harris, and Koontz segments received each distance ranking in the files described in the previous paragraph.
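Tallying how often each author received each ranking could look like the sketch below. The row layout, a list of (author, rank) pairs, is an assumed simplification of the rank files, and the rows themselves are invented.

```python
from collections import Counter


def count_ranks(rows):
    """Count how many times each (author, rank) pair occurs."""
    return Counter(rows)


# Invented (author, rank) rows covering three bootstrap iterations.
rows = [
    ("King", 1), ("Koontz", 2),
    ("King", 1), ("Koontz", 2),
    ("King", 2), ("Koontz", 1),
]
counts = count_ranks(rows)
```

A `Counter` returns 0 for pairs that never occur, so an author who is absent from the rows simply has a count of zero for every rank.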

 

predicted_author_candidate_proportions_1000token_segments.csv, predicted_author_candidate_proportions_5000token_segments.csv, and predicted_author_candidate_proportions_10000token_segments.csv contain a Bachman book title and, for each candidate author, the percentage of that book's segments that received each distance ranking (1-4). For example, in predicted_author_candidate_proportions_10000token_segments.csv, The Long Walk's segments were ranked as most similar (rank "1") to King segments in 73.3% of bootstrap iterations.
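Turning rank counts into per-book percentages can be sketched as follows. The (book, author, rank) row layout and the function name are assumptions, and the toy rows are invented rather than taken from the dataset.

```python
from collections import defaultdict


def rank_proportions(rows):
    """Return {book: {(author, rank): percentage}}.

    Each percentage is the share of that author's rows for the book
    that received the given rank.
    """
    counts = defaultdict(lambda: defaultdict(int))
    totals = defaultdict(lambda: defaultdict(int))
    for book, author, rank in rows:
        counts[book][(author, rank)] += 1
        totals[book][author] += 1
    return {
        book: {
            (author, rank): 100.0 * n / totals[book][author]
            for (author, rank), n in per_book.items()
        }
        for book, per_book in counts.items()
    }


# Invented rows: King is ranked first in 3 of 4 iterations for one book.
rows = [
    ("The Long Walk", "King", 1),
    ("The Long Walk", "King", 1),
    ("The Long Walk", "King", 1),
    ("The Long Walk", "King", 2),
]
proportions = rank_proportions(rows)
```

In this toy input King holds rank 1 in three of four rows, so the rank-1 percentage is 75.0 and the rank-2 percentage is 25.0.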

 

pop_culture_refs_counts_books_10000token_segments.csv contains, for each randomly sampled 10,000-token segment, the author and title of the book it was drawn from, the iteration number (from 0 to 99), and the number of pop-culture references found in the segment that match those extracted from the Bachman books.
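Counting matches of a reference list in a segment might be done as in the sketch below. The matching rule (case-sensitive, whole-word or whole-phrase) is an assumption, since the description does not specify how matches were detected, and the reference list and sentence are invented for illustration.

```python
import re


def count_references(segment_text, references):
    """Count whole-phrase, case-sensitive occurrences of each reference."""
    total = 0
    for ref in references:
        # Lookarounds keep a reference from matching inside a longer word.
        pattern = r"(?<!\w)" + re.escape(ref) + r"(?!\w)"
        total += len(re.findall(pattern, segment_text))
    return total


# Hypothetical references and text, not taken from the dataset.
refs = ["Pepsi", "Big Mac", "Coke"]
text = "He bought a Pepsi and a Big Mac, then another Pepsi."
n = count_references(text, refs)
```

The word-boundary lookarounds mean that a reference such as "Coke" would not be counted inside an unrelated longer word.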

 

 

 

Notes

Updated the results of the pop-culture reference counting experiment to exclude brand names extracted from Bachman books that are also names of characters in other books, e.g., "Sears" (a character in Peter Straub's novel Ghost Story) and "Mace" (a character in Dean Koontz's Warlock).

Files

Files (8.8 GB)

Checksum Size
md5:35621b57f90143f2519f39918266a22f 94.9 MB
md5:c399a74b3648b03e35e719ce17424a7f 1.1 GB
md5:13486cb8a6eda6bc9b791d717edb097c 198.7 MB
md5:fa13cc6c95cf5fe5b7ab5c1bd5053a5f 14.1 MB
md5:767c604b3a8f1faf3a6a306d06b9925d 155.0 MB
md5:09643e6e569b9eef33262a470a83ef44 29.4 MB
md5:58b8a1948593b87ad07c24b8be4d6f20 783.5 MB
md5:5e3866cc8700241e16217ae92a146f19 5.1 GB
md5:c242fab7fa067f8a54b5b7be999f4a80 1.4 GB
md5:59311a193cf2e329c198ac6ef8c73f82 1.1 kB
md5:3b0b0e482da6b9e9b21e2e292acb4768 1.1 kB
md5:93fbfb8fb770c22862d78a2ffffa3319 1.1 kB
md5:a88dece82972fb3fc5cd3e6434e10c50 150 Bytes
md5:982717ab5c82029e02cfcef01cb8db9d 170 Bytes
md5:cb16cb6b27a1928c2d1e909b8d46576a 154 Bytes
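
Because several of these files are large, it can be worth checking a download against its listed md5 value. The following is a minimal sketch using Python's standard hashlib module; the function names are hypothetical, and the demonstration runs on a small temporary file rather than on a file from this repository.

```python
import hashlib
import os
import tempfile


def md5_of_file(path, chunk_size=1 << 20):
    """Compute a file's MD5 hex digest, reading it in chunks so that
    multi-gigabyte files do not have to fit in memory."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_checksum(path, listed):
    """Compare a file against a listed value such as 'md5:<hex digest>'."""
    expected = listed.split(":", 1)[-1].strip().lower()
    return md5_of_file(path) == expected


# Demonstration on a temporary file containing the bytes b"hello".
with tempfile.NamedTemporaryFile(delete=False) as tmp:
    tmp.write(b"hello")
    demo_path = tmp.name
ok = matches_checksum(demo_path, "md5:5d41402abc4b2a76b9719d911017c592")
os.remove(demo_path)
```

To check a real download, pass the local file path and the matching md5: string from the listing to `matches_checksum`.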