Published September 18, 2011 | Version v1
Dataset Open

ICDAR 2011 - French Handwriting Recognition Competition - Line snippets

  • 1. DGA (at the time of the publication)
  • 2. Institute for Communications Technology (IfN), Technische Universitaet Braunschweig (at the time of the publication)
  • 3. Mitek Systems, Inc.

Description

This record contains the dataset used by the French handwriting recognition competition held at ICDAR 2011. It contains the line snippets for the second task (see below).

It is a subset of the RIMES database (Reconnaissance et Indexation de données Manuscrites et de fac similÉS / Recognition and Indexing of handwritten documents and faxes). RIMES comprises handwritten letters, in French, “sent” by individuals to companies or administrations; all correspondence is fictitious and the records contain no personally identifiable information (PII). Each correspondence consists of two or three items: a letter, a questionnaire, and an optional fax.

The main body of each of the first 1000 letters was segmented into lines and then into isolated words for use in the competition.

This competition featured two tasks:

1. recognizing isolated snippets of words with the help of a given dictionary

2. recognizing blocks of words segmented into lines.

 

For text lines, the dataset contains 12111 text line images and 12107 transcriptions, which together account for more than 87k word instances.

Target transcriptions are provided as a separate text file encoded in UTF-8. Images are JPEG-encoded, in grayscale.
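
Because the transcriptions are UTF-8 text, it is safest to pass the encoding explicitly when reading them. The Python sketch below counts whitespace-separated tokens in such a file. It is a minimal illustration only: the layout of the transcription file is not described in this record, so the function simply counts every token it finds (including any identifiers a line might carry), and the name `count_word_instances` is hypothetical.

```python
def count_word_instances(path):
    """Count whitespace-separated tokens in a UTF-8 text file.

    Assumption: words are separated by whitespace. The actual file layout
    is not documented here, so every token on every line is counted.
    """
    total = 0
    # Pass the encoding explicitly: the platform default is not always UTF-8.
    with open(path, encoding="utf-8") as handle:
        for line in handle:
            total += len(line.split())
    return total
```

It would be called with the path of the extracted transcription file, whatever that file is named in the archive.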

Standard training (10188 images), validation (1138 images) and test (778 images) splits are provided.
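
As a quick sanity check on these figures, the short Python sketch below sums the three split sizes and prints each split's share. The counts are copied from this record; the variable names are illustrative only.

```python
# Split sizes as stated in this record (number of text line images).
splits = {"training": 10188, "validation": 1138, "test": 778}

total = sum(splits.values())

# Share of each split, as a fraction of the summed splits.
shares = {name: count / total for name, count in splits.items()}

for name, count in splits.items():
    print(f"{name:<10} {count:>6} {shares[name]:6.1%}")
print(f"{'total':<10} {total:>6}")
```

The splits sum to 12104 images, roughly 84% training, 9% validation and 6% test. This is slightly below the 12111 images and 12107 transcriptions quoted above; the record does not say how the difference is accounted for.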

 

The RIMES database was originally collected and prepared in 2007 by the following partners: DGA/CTA/DT/GIP - CEP Arcueil; TSP – ARTEMIS Télécom SudParis; and A2iA SA, as part of the Techno-Vision project. This project was funded by the French ministries for Research and Defense (Ministère de la Recherche and Ministère de la Défense). After the acquisition of A2iA SA in September 2018, Mitek Systems, Inc. became a legal owner of the dataset and decided to release it publicly under a permissive license in 2024, to encourage open science. Public release was one of the objectives of the project after its conclusion.

 

Files

RIMES-2011-Lines.zip (382.5 MB)
md5:42983681a1d5b5b778481c54f09419f9
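
After downloading, the archive can be checked against the MD5 digest listed for it on this record. The Python sketch below is one way to do that with the standard library; the helper names `md5_of_file` and `matches_published_md5` are illustrative, and the path passed in is wherever the archive was saved locally.

```python
import hashlib

# MD5 digest listed on this record for the archive.
EXPECTED_MD5 = "42983681a1d5b5b778481c54f09419f9"


def md5_of_file(path, chunk_size=1 << 20):
    """Return the hex MD5 digest of a file, reading it in 1 MiB chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_published_md5(path):
    """True if the file at `path` has the digest listed on this record."""
    return md5_of_file(path) == EXPECTED_MD5
```

For example, `matches_published_md5("RIMES-2011-Lines.zip")` should return `True` for an intact download, assuming the file was saved under that name in the current directory. Reading in chunks avoids loading the whole 382.5 MB archive into memory.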
