The Authorship of Stephen King's Books Written Under the Pseudonym "Richard Bachman": A Stylometric Analysis (data)
- 1. Karel De Grote University
- 2. University of Antwerp
Description
This data accompanies a paper for the 2nd Annual Conference for Computational Literary Studies: "The Authorship of Stephen King’s Books Written Under the Pseudonym 'Richard Bachman': A Stylometric Analysis".
Abstract:
Between 1977 and 1984, Stephen King published five novels under the pseudonym “Richard Bachman”. Reviewers noted similarities between King’s and Bachman’s writing styles when Thinner (1984) was published, ultimately leading to King’s unmasking. We investigate, using the Juola protocol, whether computational techniques can correctly identify King as the author of the Bachman books out of a selection of contemporary candidate authors – Dean Koontz, Peter Straub, and Thomas Harris. We also perform a post-hoc analysis of the use of pop-culture references and brand names in Bachman, King, Koontz, Straub, and Harris novels, based on comments in reviews of Bachman and King novels. The references extracted from the Bachman books occurred significantly more often in King’s texts than in the others’, showing that attentive readers could have “heard King’s voice” in the Bachman books through what a reviewer denigratingly called King’s “compulsion to list brand-name products and his affinity for pop-cult teenage junk”. These results contribute to the vexed issue of explainability, which is a recurrent challenge in author identification for literary texts.
Below is a description of each file in this repository:
bachman_segments_features_array_1000token_segments.csv, bachman_segments_features_array_5000token_segments.csv, and bachman_segments_features_array_10000token_segments.csv contain the feature spaces created by vectorizing 1,000-, 5,000-, and 10,000-token segments of Bachman, King, Koontz, Straub, and Harris books. Each row of the CSV files contains the vectorized segment, the segment's author, the book the segment was drawn from, the book's publication date, and the segment number.
bachman_segments_author_candidate_cosine_distances_1000token_segments.csv, bachman_segments_author_candidate_cosine_distances_5000token_segments.csv, and bachman_segments_author_candidate_cosine_distances_10000token_segments.csv contain the Bachman segment number, the bootstrap iteration number (from 0 to 9,999), the distractor author of the randomly sampled segment, and the cosine distance between the Bachman segment vector and that author's randomly sampled segment vector. The distances are calculated from the data stored in the bachman_segments_features_array_1000token_segments.csv, bachman_segments_features_array_5000token_segments.csv, and bachman_segments_features_array_10000token_segments.csv files.
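The bootstrap comparison described above can be sketched as follows. This is a minimal illustration, not the code used to produce the files: the function names, the toy vectors, and the choice to draw one segment per author per iteration with a seeded random generator are all assumptions.

```python
import math
import random


def cosine_distance(u, v):
    """Return 1 minus the cosine similarity of two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0 or norm_v == 0:
        raise ValueError("cosine distance is undefined for a zero vector")
    return 1.0 - dot / (norm_u * norm_v)


def bootstrap_distances(bachman_vec, candidate_segments, n_iter, seed=0):
    """Return a list of (iteration, author, distance) rows.

    candidate_segments maps an author name to a list of segment vectors.
    In each iteration one segment per author is sampled at random
    (an assumption about the sampling scheme).
    """
    rng = random.Random(seed)
    rows = []
    for iteration in range(n_iter):
        for author, segments in candidate_segments.items():
            sampled = rng.choice(segments)
            rows.append((iteration, author, cosine_distance(bachman_vec, sampled)))
    return rows


# Toy vectors, invented for illustration only.
toy_candidates = {
    "King": [[0.9, 0.1, 0.0], [0.8, 0.2, 0.1]],
    "Koontz": [[0.1, 0.9, 0.2], [0.2, 0.7, 0.3]],
}
toy_rows = bootstrap_distances([1.0, 0.1, 0.0], toy_candidates, n_iter=3)
```

Each returned row corresponds loosely to one row of the cosine-distance files, minus the Bachman segment number.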
bachman_segments_author_candidate_ranks_1000token_segments.csv, bachman_segments_author_candidate_ranks_5000token_segments.csv, and bachman_segments_author_candidate_ranks_10000token_segments.csv contain the same columns as the three cosine-distance files described in the previous paragraph, except that the cosine distance between the Bachman segment and the distractor author segment is converted to a ranking. Each bootstrap iteration has 4 rows, one for each candidate author, containing the distance ranking between the Bachman segment and that candidate author's segment. In a particular bootstrap iteration, if a King segment had the smallest cosine distance to a Bachman segment, King has the ranking "1"; if a Koontz segment had the second smallest distance, Koontz has the ranking "2"; and so on.
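The conversion from distances to rankings can be sketched like this. The row layout, the function name, and the tie-breaking rule (alphabetical by author when two distances are equal) are assumptions, and the distances in the example are invented.

```python
from collections import defaultdict


def rank_candidates(rows):
    """Convert distances to ranks within each (segment, iteration) group.

    rows: iterable of (bachman_segment, iteration, author, distance).
    Returns a list of (bachman_segment, iteration, author, rank),
    where rank 1 is the smallest distance.
    """
    groups = defaultdict(list)
    for segment, iteration, author, distance in rows:
        groups[(segment, iteration)].append((distance, author))

    ranked = []
    for (segment, iteration), pairs in groups.items():
        # Sorting the (distance, author) tuples orders by distance first;
        # ties fall back to author name (an assumption).
        for rank, (_, author) in enumerate(sorted(pairs), start=1):
            ranked.append((segment, iteration, author, rank))
    return ranked


# Invented distances for one Bachman segment in one bootstrap iteration.
example = [
    (0, 0, "Koontz", 0.25),
    (0, 0, "King", 0.10),
    (0, 0, "Harris", 0.55),
    (0, 0, "Straub", 0.40),
]
ranks = {author: rank for _, _, author, rank in rank_candidates(example)}
```

With these invented values, King receives rank 1 and Harris rank 4.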
predicted_author_candidate_raw_counts_1000token_segments.csv, predicted_author_candidate_raw_counts_5000token_segments.csv, and predicted_author_candidate_raw_counts_10000token_segments.csv contain the total number of times King, Straub, Harris, and Koontz segments received a certain distance ranking in the files described in the previous paragraph.
predicted_author_candidate_proportions_1000token_segments.csv, predicted_author_candidate_proportions_5000token_segments.csv, and predicted_author_candidate_proportions_10000token_segments.csv contain a Bachman book title and, for each candidate author, the percentage of bootstrap iterations over that book's segments in which the author received each of the distance rankings 1-4. For example, in predicted_author_candidate_proportions_10000token_segments.csv, The Long Walk's segments were ranked as most similar (rank = "1") to King segments in 73.3% of bootstrap iterations.
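As a rough sketch of how ranked rows could be aggregated into raw counts and then into per-book percentages, assuming a simplified row layout of (book, author, rank); the function name and the toy rows are hypothetical and do not reproduce the published figures.

```python
from collections import Counter


def rank_proportions(ranked_rows):
    """Return raw counts and percentages of ranks per (book, author).

    ranked_rows: list of (book, author, rank) tuples, one per
    Bachman segment and bootstrap iteration.
    """
    counts = Counter(ranked_rows)
    totals = Counter((book, author) for book, author, _ in ranked_rows)
    percentages = {
        (book, author, rank): 100.0 * n / totals[(book, author)]
        for (book, author, rank), n in counts.items()
    }
    return counts, percentages


# Invented rows: four iterations for one hypothetical book.
toy_rows = [
    ("Book A", "King", 1),
    ("Book A", "King", 1),
    ("Book A", "King", 1),
    ("Book A", "King", 2),
]
toy_counts, toy_percentages = rank_proportions(toy_rows)
```

The denominator here is the number of rows for a given book and author, which equals the number of segments times the number of iterations when every author is ranked in every iteration.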
pop_culture_refs_counts_books_10000token_segments.csv contains, for each randomly sampled 10,000-token segment, the segment's author and book title, the iteration (from 0 to 99), and the number of pop culture references found in the segment that match those extracted from the Bachman books.
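One simple way to count matching references in a segment is sketched below. The matching rule (case-insensitive, whole-phrase matches) is an assumption, since the description does not specify how matches were determined, and the reference list and text are invented.

```python
import re


def count_references(segment_text, references):
    """Count case-insensitive whole-phrase occurrences of each reference."""
    total = 0
    for reference in references:
        # Lookarounds prevent matches inside longer words (an assumption
        # about what should count as a match).
        pattern = r"(?<!\w)" + re.escape(reference) + r"(?!\w)"
        total += len(re.findall(pattern, segment_text, flags=re.IGNORECASE))
    return total


# Invented reference list and text, for illustration only.
toy_references = ["Pepsi", "Rolling Stones"]
toy_text = "He drank a Pepsi, then another pepsi, while the Rolling Stones played."
toy_count = count_references(toy_text, toy_references)
```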
Files
(8.8 GB)
Checksum | Size |
---|---|
md5:35621b57f90143f2519f39918266a22f | 94.9 MB |
md5:c399a74b3648b03e35e719ce17424a7f | 1.1 GB |
md5:13486cb8a6eda6bc9b791d717edb097c | 198.7 MB |
md5:fa13cc6c95cf5fe5b7ab5c1bd5053a5f | 14.1 MB |
md5:767c604b3a8f1faf3a6a306d06b9925d | 155.0 MB |
md5:09643e6e569b9eef33262a470a83ef44 | 29.4 MB |
md5:58b8a1948593b87ad07c24b8be4d6f20 | 783.5 MB |
md5:5e3866cc8700241e16217ae92a146f19 | 5.1 GB |
md5:c242fab7fa067f8a54b5b7be999f4a80 | 1.4 GB |
md5:59311a193cf2e329c198ac6ef8c73f82 | 1.1 kB |
md5:3b0b0e482da6b9e9b21e2e292acb4768 | 1.1 kB |
md5:93fbfb8fb770c22862d78a2ffffa3319 | 1.1 kB |
md5:a88dece82972fb3fc5cd3e6434e10c50 | 150 Bytes |
md5:982717ab5c82029e02cfcef01cb8db9d | 170 Bytes |
md5:cb16cb6b27a1928c2d1e909b8d46576a | 154 Bytes |
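Because several of the files are large, a downloaded copy can be checked against its listed MD5 checksum by hashing it in chunks rather than loading it into memory. The sketch below uses Python's standard hashlib module; the function name and chunk size are arbitrary choices.

```python
import hashlib


def md5_of_file(path, chunk_size=1 << 20):
    """Return the hexadecimal MD5 digest of a file, read in chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        # iter() with a sentinel keeps reading until read() returns b"".
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()
```

The returned string can be compared with the part of the listed checksum that follows the "md5:" prefix.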