Identifying Different Writing Styles in a Document Intrinsically Using Stylometric Analysis
Description
Abstract:
In this project, we developed an Artificial Intelligence (AI) system that takes a document and identifies the different writing styles within it using stylometric techniques. First, the document is divided into chunks of a standard size, where each chunk consists of a fixed number of sentences. Next, a vector of stylometric features is computed for each chunk. The chunks are then clustered by their stylometric feature vectors using unsupervised machine learning: K-Means clustering groups chunks with similar styles together, and the number of clusters formed corresponds to the number of distinct writing styles in the document. To validate the approach, we ran an experiment in which a document containing text written by two authors was given to the system as input. The system successfully determined that the document contained two different writing styles by forming two distinct clusters of chunks. The system can also be used in other applied areas, such as authorship attribution and plagiarism detection.
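The chunk, featurize, and cluster pipeline described in the abstract can be sketched as follows. This is a minimal illustration under stated assumptions, not the code shipped in Code.zip: the naive sentence splitter, the five features, the `FUNCTION_WORDS` list, and all function names (`split_sentences`, `make_chunks`, `stylometric_features`, `standardize`, `kmeans`, `cluster_styles`) are hypothetical. K-Means needs the number of clusters `k` as input, and the abstract does not say how it is chosen, so here `k` is simply passed in; in practice it might be picked by comparing a cluster-quality score such as the silhouette coefficient across candidate values (also an assumption).

```python
import math
import random
import re
from statistics import mean, pstdev

# Hypothetical function-word list; the project's actual feature set is not specified.
FUNCTION_WORDS = {"the", "of", "and", "a", "to", "in", "is", "that", "it", "was"}


def split_sentences(text):
    """Naive splitter: break after '.', '!' or '?' followed by whitespace."""
    return [s for s in re.split(r"(?<=[.!?])\s+", text.strip()) if s]


def make_chunks(sentences, chunk_size):
    """Group consecutive sentences into chunks of exactly `chunk_size` sentences.
    A short trailing remainder is dropped (an assumption)."""
    return [
        " ".join(sentences[i:i + chunk_size])
        for i in range(0, len(sentences) - chunk_size + 1, chunk_size)
    ]


def stylometric_features(chunk):
    """Return a small, illustrative stylometric feature vector for one chunk."""
    sentences = split_sentences(chunk)
    words = re.findall(r"[A-Za-z']+", chunk.lower())
    n_words = max(len(words), 1)
    return [
        n_words / max(len(sentences), 1),                   # mean sentence length (words)
        sum(len(w) for w in words) / n_words,               # mean word length (characters)
        len(set(words)) / n_words,                          # type-token ratio
        sum(w in FUNCTION_WORDS for w in words) / n_words,  # function-word rate
        sum(c in ",;:" for c in chunk) / n_words,           # punctuation marks per word
    ]


def standardize(vectors):
    """Z-score each feature so no single feature dominates Euclidean distance.
    Constant features (zero spread) are divided by 1 instead of 0."""
    cols = list(zip(*vectors))
    mus = [mean(c) for c in cols]
    sds = [pstdev(c) or 1.0 for c in cols]
    return [[(x - m) / s for x, m, s in zip(v, mus, sds)] for v in vectors]


def kmeans(points, k, iters=100, seed=0):
    """Lloyd's K-Means: initial centroids are k points sampled from the data;
    stops when assignments no longer change or after `iters` rounds."""
    rng = random.Random(seed)
    centroids = [list(p) for p in rng.sample(points, k)]
    labels = None
    for _ in range(iters):
        # Assignment step: each point joins its nearest centroid.
        new_labels = [
            min(range(k), key=lambda j: math.dist(p, centroids[j])) for p in points
        ]
        if new_labels == labels:
            break
        labels = new_labels
        # Update step: move each centroid to the mean of its members.
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:
                centroids[j] = [mean(col) for col in zip(*members)]
    return labels


def cluster_styles(text, chunk_size=5, k=2):
    """Chunk the text, compute one feature vector per chunk, and cluster the chunks."""
    chunks = make_chunks(split_sentences(text), chunk_size)
    features = standardize([stylometric_features(c) for c in chunks])
    return kmeans(features, k)


if __name__ == "__main__":
    # Synthetic two-"author" document: plain short words vs. long, comma-heavy prose.
    a = "The dog ran to the park and it was fun."
    b = ("Meanwhile, considerable administrative deliberation, unfortunately, "
         "postponed every significant infrastructural modification.")
    doc = " ".join([a] * 10 + [b] * 10)
    print(cluster_styles(doc, chunk_size=5, k=2))  # one cluster label per chunk
```

Standardizing before clustering is a deliberate choice in this sketch: raw features live on very different scales (words per sentence versus a rate between 0 and 1), and without z-scoring the largest-scale feature would dominate the K-Means distance.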
Files (2.6 MB)

Code.zip
| Checksum (MD5) | Size |
|---|---|
| md5:5cd000ed2fc445465749dbb86b703af8 | 59.2 kB |
| md5:81a1910284b21b9b058c993abe56b404 | 335.4 kB |
| md5:a3733c8b5985a217dece4cfd12804405 | 1.1 kB |
| md5:fbe9b0dc99d6901f37f49eda54a66eeb | 2.2 MB |
| md5:db8ab4d5c9cc03861f6c12eab4e0e03b | 223 bytes |