Published March 21, 2022 | Version v2
Dataset Open

Image dataset to train a deep learning model to decode Leetspeak obfuscated characters

  • 1. Mondragon Unibertsitatea
  • 2. Instituto Universitário de Lisboa (ISCTE-IUL)
  • 3. University of Vigo

Description

The dataset is an image database (18,981 images) for training a deep learning model to recognize characters. We have used it to build a model that identifies characters encoded in Leetspeak. The original dataset is available in the Mondragon Unibertsitatea repository: https://gitlab.danz.eus/datasharing/ski4spam

The training dataset consists of:

- Alphabetic letters (a-z) written using different fonts and styles (regular, cursive, bold, cursive+bold)

- Handwritten letters: English handwriting from the Chars74k dataset [2], available at http://www.ee.surrey.ac.uk/CVSSP/demos/chars74k/.
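
The record does not describe how images.zip is organized internally, so a first step after downloading is usually to inspect it. The following is a minimal sketch that counts image files per top-level folder, so the total can be compared with the 18,981 images stated above. The function name `count_images`, the set of image extensions, and the idea that images are grouped into top-level folders are all assumptions made for illustration, not facts taken from the dataset.

```python
import zipfile
from collections import Counter
from pathlib import PurePosixPath

# Assumed set of image extensions; the record does not state the image format.
IMAGE_EXTENSIONS = {".png", ".jpg", ".jpeg", ".bmp", ".gif"}


def count_images(archive_path):
    """Count image files per top-level folder inside a zip archive.

    Files stored at the archive root are counted under the key ".".
    """
    counts = Counter()
    with zipfile.ZipFile(archive_path) as archive:
        for info in archive.infolist():
            if info.is_dir():
                continue  # skip directory entries
            path = PurePosixPath(info.filename)
            if path.suffix.lower() in IMAGE_EXTENSIONS:
                folder = path.parts[0] if len(path.parts) > 1 else "."
                counts[folder] += 1
    return counts
```

Calling `sum(count_images("images.zip").values())` on a local copy gives the number of image files found; if it differs from 18,981, the assumed extension list is the first thing to check.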

Files

images.zip (21.5 MB)
md5:c5a5c59e2a6a59f27932b0c6bff51618
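
The MD5 checksum listed for images.zip can be used to confirm that a download is complete and uncorrupted. The following is a minimal sketch using Python's standard `hashlib` module. The helper names `md5_of_file` and `verify_download` and the local path `images.zip` are assumptions for illustration; only the checksum value comes from this record.

```python
import hashlib
from pathlib import Path

# Checksum published on this record for images.zip.
EXPECTED_MD5 = "c5a5c59e2a6a59f27932b0c6bff51618"


def md5_of_file(path, chunk_size=1 << 20):
    """Return the hex MD5 digest of a file, reading it in chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_download(path, expected=EXPECTED_MD5):
    """Return True if the file's MD5 digest matches the expected value."""
    return md5_of_file(path) == expected.lower()


if __name__ == "__main__":
    archive = Path("images.zip")  # assumed local download location
    if archive.exists():
        print("checksum OK" if verify_download(archive) else "checksum MISMATCH")
    else:
        print(f"{archive} not found; download it first")
```

Reading in chunks keeps memory use constant regardless of archive size. MD5 is adequate here for detecting transfer errors, though it is not a defense against deliberate tampering.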