Image dataset to train a deep learning model to decode Leetspeak obfuscated characters

Iñaki Velez de Mendizabal; Xabier Vidriales; Vitor Basto Fernandes; Enaitz Ezpeleta; José Ramón Méndez; Urko Zurutuza

doi:10.5281/zenodo.6373558

Published March 21, 2022 | Version v2

Dataset Open

1. Mondragon Unibertsitatea
2. Instituto Universitário de Lisboa (ISCTE-IUL)
3. University of Vigo

The dataset contains 18,981 character images that can be used to train a deep learning model for character recognition. We have successfully used it to train a model that identifies characters encoded in Leetspeak. The original dataset is available in the Mondragon Unibertsitatea repository: https://gitlab.danz.eus/datasharing/ski4spam

The training dataset consists of:

- Alphabetic letters (a-z) written using different fonts and styles (regular, cursive, bold, cursive+bold)

- Handwritten letters: English handwriting samples from the Chars74k dataset [2], available at http://www.ee.surrey.ac.uk/CVSSP/demos/chars74k/
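Before training, it can help to check what the archive actually contains. The record does not describe the folder layout or image format inside images.zip, so the following is only a minimal sketch under stated assumptions: the function name `summarize_archive` and the list of image suffixes are hypothetical, and the grouping by parent folder is a guess at how the images might be organized.

```python
import zipfile
from collections import Counter
from pathlib import PurePosixPath

# Assumed image formats; the record does not state which ones are used.
IMAGE_SUFFIXES = {".png", ".jpg", ".jpeg", ".bmp", ".gif"}


def summarize_archive(zip_path):
    """Count image entries per parent folder inside a zip archive."""
    counts = Counter()
    with zipfile.ZipFile(zip_path) as archive:
        for info in archive.infolist():
            if info.is_dir():
                continue  # skip directory entries
            entry = PurePosixPath(info.filename)
            if entry.suffix.lower() in IMAGE_SUFFIXES:
                counts[str(entry.parent)] += 1
    return counts


if __name__ == "__main__":
    # "images.zip" is assumed to be in the current working directory.
    per_folder = summarize_archive("images.zip")
    for folder, n in sorted(per_folder.items()):
        print(f"{folder}: {n}")
    print("total images:", sum(per_folder.values()))
```

If the printed total matches the 18,981 images stated in the description, the suffix list probably covers the formats in the archive; if not, the list may need adjusting.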

Files

Name: images.zip
Size: 21.5 MB
MD5: c5a5c59e2a6a59f27932b0c6bff51618
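The MD5 checksum listed for images.zip can be used to confirm that a download is complete and unmodified. The sketch below uses Python's standard `hashlib`. The helper names `md5_of_file` and `verify_download` are illustrative, and the local path `images.zip` is an assumption about where the file was saved.

```python
import hashlib
from pathlib import Path

# Checksum published on the record page for images.zip.
EXPECTED_MD5 = "c5a5c59e2a6a59f27932b0c6bff51618"


def md5_of_file(path, chunk_size=1 << 20):
    """Return the hex MD5 digest of a file, reading it in chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_download(path, expected=EXPECTED_MD5):
    """Return True when the file's MD5 matches the expected hex digest."""
    return md5_of_file(path) == expected.lower()


if __name__ == "__main__":
    archive = Path("images.zip")  # assumed local download location
    if archive.exists():
        print("checksum OK" if verify_download(archive) else "checksum MISMATCH")
    else:
        print("images.zip not found")
```

MD5 is adequate here for detecting a corrupted or truncated download. It is not a defense against deliberate tampering.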

Resource type: Dataset
Publisher: Zenodo

License: Creative Commons Attribution 4.0 International

The Creative Commons Attribution license allows re-distribution and re-use of a licensed work on the condition that the creator is appropriately credited.

Technical metadata

Created: March 21, 2022
Modified: March 22, 2022
