Published July 3, 2018 | Version v1
Report Open

Identifying Different Writing Styles in a Document Intrinsically using Stylometric Analysis

  • 1. National University of Computer and Emerging Sciences.

Description

Abstract:

In this project, we developed an AI system that takes a document and identifies the different writing styles within it using stylometric techniques. First, the document is divided into chunks of a standard size, where each chunk consists of a fixed number of sentences. Then, for each chunk, a vector of stylometric features is computed. Finally, the chunks are clustered on their feature vectors with K-Means, an unsupervised machine learning method, so that chunks with similar styles end up in the same cluster. The number of clusters corresponds to the number of different writing styles in the document. To demonstrate the validity of the approach, we ran an experiment in which a document containing text written by two authors was given to the system as input. The system correctly detected two writing styles by forming two distinct clusters of chunks. The system can also be applied in related areas such as authorship attribution and plagiarism detection.
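
The pipeline described in the abstract (chunking, per-chunk feature vectors, K-Means clustering) can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the project's actual code: the four features, the regex-based sentence splitter, the function names, and the pure-Python K-Means with a deterministic farthest-point initialization are all hypothetical choices made to keep the example self-contained. The number of clusters `k` is passed in by the caller here; the abstract does not say how the project selects it.

```python
import math
import re


def split_sentences(text):
    """Naive splitter (assumption): break after ., ! or ? followed by whitespace."""
    parts = re.split(r"(?<=[.!?])\s+", text.strip())
    return [p for p in parts if p]


def make_chunks(sentences, chunk_size):
    """Group consecutive sentences into chunks of chunk_size sentences each."""
    return [sentences[i:i + chunk_size] for i in range(0, len(sentences), chunk_size)]


def features(chunk):
    """Illustrative stylometric vector for one chunk (a list of sentences)."""
    joined = " ".join(chunk)
    words = re.findall(r"[A-Za-z']+", joined)
    n_words = max(len(words), 1)
    avg_word_len = sum(len(w) for w in words) / n_words
    avg_sent_len = len(words) / len(chunk)
    commas_per_word = joined.count(",") / n_words
    type_token_ratio = len({w.lower() for w in words}) / n_words
    return [avg_word_len, avg_sent_len, commas_per_word, type_token_ratio]


def standardize(vectors):
    """Z-score each feature column so no single feature dominates the distances."""
    cols = list(zip(*vectors))
    means = [sum(c) / len(c) for c in cols]
    stds = [
        math.sqrt(sum((x - m) ** 2 for x in c) / len(c)) or 1.0
        for c, m in zip(cols, means)
    ]
    return [[(x - m) / s for x, m, s in zip(v, means, stds)] for v in vectors]


def kmeans(points, k, iterations=50):
    """Lloyd's algorithm; centroids start at the first point, then farthest points."""
    centroids = [points[0]]
    while len(centroids) < k:
        centroids.append(
            max(points, key=lambda p: min(math.dist(p, c) for c in centroids))
        )
    labels = [0] * len(points)
    for _ in range(iterations):
        # Assignment step: each point goes to its nearest centroid.
        labels = [
            min(range(k), key=lambda j: math.dist(p, centroids[j])) for p in points
        ]
        # Update step: each centroid moves to the mean of its members.
        new_centroids = []
        for j in range(k):
            members = [p for p, label in zip(points, labels) if label == j]
            if members:
                new_centroids.append([sum(col) / len(members) for col in zip(*members)])
            else:
                new_centroids.append(centroids[j])  # keep an empty cluster's centroid
        if new_centroids == centroids:
            break
        centroids = new_centroids
    return labels


def cluster_styles(text, chunk_size=3, k=2):
    """Return one cluster label per chunk of the (non-empty) input text."""
    chunks = make_chunks(split_sentences(text), chunk_size)
    vectors = standardize([features(c) for c in chunks])
    return kmeans(vectors, k)
```

Calling `cluster_styles(text, chunk_size=3, k=2)` returns a list with one integer label per chunk, in document order; chunks that share a label were grouped as having a similar style. A real system would likely use a richer feature set and a library implementation of K-Means.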

Notes

The complete code and detailed documentation are available on GitHub: https://github.com/harismuneer/Writing-Styles-Classification-Using-Stylometric-Analysis

Files

Code.zip

Files (2.6 MB)

Checksum Size
md5:5cd000ed2fc445465749dbb86b703af8 59.2 kB
md5:81a1910284b21b9b058c993abe56b404 335.4 kB
md5:a3733c8b5985a217dece4cfd12804405 1.1 kB
md5:fbe9b0dc99d6901f37f49eda54a66eeb 2.2 MB
md5:db8ab4d5c9cc03861f6c12eab4e0e03b 223 Bytes

Additional details