Published March 21, 2022 | Version v1
Dataset Open

Set of obfuscated spam dataset by using LeetSpeak transformations

  • 1. Mondragon Unibertsitatea
  • 2. Instituto Universitário de Lisboa (ISCTE-IUL)
  • 3. University of Vigo

Description

Spammers often use LeetSpeak and other text-hiding tricks when distributing unsolicited content. To evaluate deobfuscation techniques and their impact on spam content classification, we preprocessed several popular public datasets to partially obfuscate the text. The datasets transformed are:

Files

corpora.zip (42.5 MB)
md5:e343d92e9cb2deebf2ffc795cfc3c8d0
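
The md5 value listed for corpora.zip can be used to check that a download is complete and uncorrupted. The following is a minimal sketch in Python, assuming the archive has been saved as corpora.zip in the current working directory; the helper names md5_of_file and verify_download are illustrative and not part of the dataset.

```python
import hashlib
from pathlib import Path

# Checksum published on this record for corpora.zip.
EXPECTED_MD5 = "e343d92e9cb2deebf2ffc795cfc3c8d0"


def md5_of_file(path, chunk_size=1 << 20):
    """Return the hex MD5 digest of a file, reading it in chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        # Read fixed-size chunks so large archives are not loaded into memory at once.
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_download(path, expected=EXPECTED_MD5):
    """Return True if the file's MD5 digest matches the expected value."""
    return md5_of_file(path) == expected.lower()


if __name__ == "__main__":
    archive = Path("corpora.zip")  # assumed local path of the downloaded archive
    if archive.exists():
        print("checksum OK" if verify_download(archive) else "checksum MISMATCH")
    else:
        print(f"{archive} not found; download it first")
```

A mismatch usually means the download was truncated or altered and should be repeated before the corpora are used.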