Fast Stylometry
Description
Fast Stylometry is a Python library for calculating Burrows' Delta, an algorithm for comparing the similarity of the writing styles of documents. This kind of analysis is known as forensic stylometry.
The library can also calculate the probability that two books were written by the same author.
I wrote this library to improve my understanding of the algorithm, and also because the existing libraries I could find focused on generating graphs but did not go as far as calculating probabilities.
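One common way to turn a Burrows' Delta score (explained in the next section) into a probability is to fit a logistic regression. The regression maps the Delta values of text pairs with known authorship to a 0/1 "same author" label. This is an assumption for illustration, not necessarily how Fast Stylometry computes its probabilities. The functions `fit_logistic` and `probability_same_author` are hypothetical names, and the training values below are invented toy numbers, not real measurements.

```python
import math


def sigmoid(x: float) -> float:
    """Numerically stable logistic function."""
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    z = math.exp(x)
    return z / (1.0 + z)


def fit_logistic(deltas, same_author, lr=0.5, epochs=5000):
    """Fit P(same author | delta) = sigmoid(w * delta + b) by batch gradient descent.

    deltas: Burrows' Delta values for pairs of texts with known authorship.
    same_author: 1 if the pair shares an author, otherwise 0.
    """
    w, b = 0.0, 0.0
    n = len(deltas)
    for _ in range(epochs):
        grad_w = grad_b = 0.0
        for d, y in zip(deltas, same_author):
            error = sigmoid(w * d + b) - y  # gradient of the log-loss
            grad_w += error * d
            grad_b += error
        w -= lr * grad_w / n
        b -= lr * grad_b / n
    return w, b


def probability_same_author(delta: float, w: float, b: float) -> float:
    """Probability that a pair with this Delta shares an author."""
    return sigmoid(w * delta + b)


# Usage with invented toy data (hypothetical, for illustration only):
deltas = [0.3, 0.5, 0.6, 0.8, 1.4, 1.7, 2.0, 2.5]
labels = [1, 1, 1, 1, 0, 0, 0, 0]
w, b = fit_logistic(deltas, labels)
print(probability_same_author(0.2, w, b), probability_same_author(3.0, w, b))
```

With real data, the pairs would come from texts whose authors are known. The fitted curve then gives a probability that falls as Delta rises.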
Burrows' Delta algorithm
Burrows' Delta is a statistic that expresses the distance between the writing styles of two texts. A high value such as 3 implies that the two styles are very dissimilar, whereas a low value such as 0.2 implies that the two texts are very likely to be by the same author.
Burrows' Delta is calculated by comparing the relative frequencies of function words, such as “inside” and “and”, in the two texts, taking into account how much the frequency of each word naturally varies between authors.
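To make the notion of "distance" concrete, the standard formulation of Burrows' Delta can be written as follows. Here $f_i(D)$ is the relative frequency of the $i$-th of $n$ chosen words in document $D$, and $\mu_i$ and $\sigma_i$ are the mean and standard deviation of that word's relative frequency across a reference corpus. The exact word list and corpus that Fast Stylometry uses are not specified here.

```latex
z_i(D) = \frac{f_i(D) - \mu_i}{\sigma_i}, \qquad
\Delta(A, B) = \frac{1}{n} \sum_{i=1}^{n} \left| z_i(A) - z_i(B) \right|
```

Identical word profiles give $\Delta = 0$. Each word contributes in units of its own corpus-wide standard deviation, so rare and common words are weighted comparably.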
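The calculation described above can be sketched in plain Python. Each feature word's relative frequency is z-scored against a reference corpus, which captures the word's natural variation between authors. Delta is then the mean absolute difference of the two texts' z-scores. This is a minimal sketch under stated assumptions, not the library's API. The names `burrows_delta` and `FUNCTION_WORDS` are hypothetical, and the short word list is an illustrative assumption (real analyses typically use many more words).

```python
import re
import statistics
from collections import Counter

# Hypothetical, deliberately short list of function words (an assumption).
FUNCTION_WORDS = ["the", "and", "of", "to", "a", "in", "inside", "that", "it", "was"]


def tokenise(text: str) -> list[str]:
    """Lower-case the text and split it into alphabetic word tokens."""
    return re.findall(r"[a-z]+", text.lower())


def relative_frequencies(text: str, words: list[str]) -> list[float]:
    """Relative frequency of each feature word within one text."""
    tokens = tokenise(text)
    counts = Counter(tokens)
    total = len(tokens) or 1  # avoid division by zero for empty texts
    return [counts[w] / total for w in words]


def burrows_delta(text_a: str, text_b: str, corpus: list[str],
                  words: list[str] = FUNCTION_WORDS) -> float:
    """Mean absolute difference of z-scored word frequencies.

    `corpus` is a list of reference texts used to estimate each word's
    mean and standard deviation across authors.
    """
    if len(corpus) < 2:
        raise ValueError("need at least two reference texts")
    corpus_freqs = [relative_frequencies(t, words) for t in corpus]
    freqs_a = relative_frequencies(text_a, words)
    freqs_b = relative_frequencies(text_b, words)

    total, used = 0.0, 0
    for i in range(len(words)):
        column = [f[i] for f in corpus_freqs]
        mean = statistics.mean(column)
        std = statistics.pstdev(column)
        if std == 0:
            continue  # word never varies in the corpus; its z-score is undefined
        z_a = (freqs_a[i] - mean) / std
        z_b = (freqs_b[i] - mean) / std
        total += abs(z_a - z_b)
        used += 1
    return total / used if used else 0.0
```

A text compared with itself scores 0, and the score is symmetric in the two texts. Larger values mean the texts use these words in more dissimilar proportions relative to the corpus.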
Files
Name | Size | Checksum
---|---|---
faststylometry-main (1).zip | 10.8 MB | md5:dcf194870cd8d84304b43019841b1e13
Additional details
Software
- Repository URL: https://github.com/fastdatascience/faststylometry
- Programming language: Python
- Development status: Active