Image dataset to train a deep learning model to decode Leetspeak obfuscated characters

Iñaki Velez de Mendizabal; Xabier Vidriales; Vitor Basto Fernandes; Enaitz Ezpeleta; José Ramón Méndez; Urko Zurutuza

doi:10.5281/zenodo.6373423

There is a newer version of the record available.

Published March 21, 2022 | Version v1

Dataset Open

1. Mondragon Unibertsitatea
2. Instituto Universitário de Lisboa (ISCTE-IUL)
3. University of Vigo

The dataset is an image database (18,981 images) that can be used to train a deep learning model to detect characters accurately. We have used it successfully to build a model that identifies characters encoded in Leetspeak. The original dataset is available in the Mondragon Unibertsitatea repository: https://gitlab.danz.eus/datasharing/ski4spam

The training dataset consists of:

- Alphabetic letters (a-z) written using different fonts and styles (regular, cursive, bold, cursive+bold)

- Handwritten letters: English handwriting from the Chars74k dataset [2], available at http://www.ee.surrey.ac.uk/CVSSP/demos/chars74k/.
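
As a rough illustration of how the two sources above might be gathered into one labelled training set, the following sketch indexes image files by letter. It assumes that images.zip has been extracted so that each letter (a-z) has its own subdirectory; this layout is an assumption, not something stated in the record, and the function name `index_images` and the list of file suffixes are hypothetical.

```python
from collections import defaultdict
from pathlib import Path
import string

# Assumption: the archive was extracted so that each letter has its own
# subdirectory (e.g. extracted/a/0001.png). The real layout may differ.
IMAGE_SUFFIXES = {".png", ".jpg", ".jpeg", ".bmp"}
LABELS = list(string.ascii_lowercase)  # the 26 classes a-z named in the description


def index_images(root):
    """Map each letter label to the sorted image paths whose parent folder is that letter."""
    index = defaultdict(list)
    for path in sorted(Path(root).rglob("*")):
        label = path.parent.name.lower()
        # Keep only image files that sit directly inside a single-letter folder.
        if path.is_file() and path.suffix.lower() in IMAGE_SUFFIXES and label in LABELS:
            index[label].append(path)
    return dict(index)
```

The resulting dictionary can be passed to whichever training framework is in use; if the archive turns out to be organised differently, only the label lookup needs to change.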

Files (22.1 MB)

Name	Size
images.zip md5:fb400473248296b9a0497a2f072b77a9	22.1 MB
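
The MD5 checksum listed above can be used to confirm that a downloaded copy of images.zip is intact. The sketch below uses only Python's standard `hashlib` module; the function names `md5_of_file` and `verify_download` are illustrative and not part of the record.

```python
import hashlib

# MD5 listed for images.zip on this record.
EXPECTED_MD5 = "fb400473248296b9a0497a2f072b77a9"


def md5_of_file(path, chunk_size=1 << 20):
    """Return the hex MD5 digest of a file, reading it in chunks to limit memory use."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_download(path, expected=EXPECTED_MD5):
    """Return True if the file at `path` matches the expected checksum."""
    return md5_of_file(path) == expected.lower()
```

Calling `verify_download("images.zip")` after downloading should return `True` for an uncorrupted copy of this version of the archive.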

	All versions	This version
Views	560	196
Downloads	55	26
Data volume	1.3 GB	597.4 MB

DOI: 10.5281/zenodo.6373423
Resource type: Dataset
Publisher: Zenodo

License: Creative Commons Attribution 4.0 International

The Creative Commons Attribution license allows re-distribution and re-use of a licensed work on the condition that the creator is appropriately credited.

Technical metadata

Created: March 21, 2022
Modified: March 22, 2022
