
Published July 16, 2003 | Version 0.0
Resource type: Dataset | Access: Open

The ICDAR 2003 Informal Competition for the Recognition of On-line Words: The Unipen-ICROW-03 benchmark set - Version 0.0

Description

Proposal for an informal benchmark on word recognition. See also the related ImUnipen collection
of word images derived from on-line (vector) handwriting data: https://zenodo.org/record/1195059

At the time (ICDAR 2003), there was little interest, so the project was not pursued.

Lambert Schomaker - February 2023

_______________________________________________________________________________

The ICDAR 2003 Informal Competition for the Recognition of On-line Words:
               The Unipen-ICROW-03 benchmark set 
               Version 0.0

Lambert Schomaker / International Unipen Foundation

The ICROW suite is a set of test files for the recognition of isolated
on-line, free-style words (handprint, mixed and cursive). The data
were recorded on different tablets, by writers of different
nationalities, in different languages. Word labels use only the
ASCII character set.

The set contains:

   13119 written words
     884 unique lexical word entries
      72 writers 

Languages: Dutch, English, Italian
Nationalities: Dutch, Irish, Italian, and mixed
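The set statistics above can be re-derived from the unpacked files. The sketch below assumes that each *.dat file holds the data of one writer and that word labels appear on UNIPEN lines of the form `.SEGMENT WORD <delineation> <quality> "<label>"`; both are assumptions about the file layout, not guarantees given by this README. It also counts labels that are not pure ASCII.

```python
import re
from pathlib import Path

# Assumed UNIPEN convention for word labels:
#   .SEGMENT WORD <delineation> <quality> "<label>"
SEGMENT_WORD = re.compile(r'^\.SEGMENT\s+WORD\b.*"([^"]*)"\s*$')


def word_labels(text: str) -> list[str]:
    """Return the labels of all .SEGMENT WORD lines in one UNIPEN file."""
    labels = []
    for line in text.splitlines():
        match = SEGMENT_WORD.match(line.strip())
        if match:
            labels.append(match.group(1))
    return labels


def set_statistics(files: dict[str, str]) -> dict[str, int]:
    """Count words, unique labels, writers and non-ASCII labels.

    `files` maps a writer id (assumed: one *.dat file per writer) to file text.
    """
    labels = [w for text in files.values() for w in word_labels(text)]
    return {
        "words": len(labels),
        "unique_labels": len(set(labels)),
        "writers": len(files),
        "non_ascii_labels": sum(1 for w in labels if not w.isascii()),
    }


if __name__ == "__main__":
    # Run from the directory holding the unpacked *.dat files.
    # latin-1 never fails to decode, so stray non-ASCII bytes stay visible.
    texts = {
        p.stem: p.read_text(encoding="latin-1")
        for p in sorted(Path(".").glob("*.dat"))
    }
    print(set_statistics(texts))
```

If the layout assumptions hold, the printed counts should match the figures listed above (13119 words, 884 unique entries, 72 writers).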

The benchmark test is a good estimator of "walk-up" recognition
performance, i.e., performance for a user whose handwriting the
recognizer has not been trained or adapted on.

[Note: some of the writers (the NIC-Pc95*.dat set) also appear in the
UNIPEN R01/V07 distribution, but the words in this set have not been
released outside the International Unipen Foundation.]

Please note the copyright notice in the
accompanying file 'Copyright'.

Wed Jul 16 21:20:10 CEST 2003

Lambert Schomaker

---------------------------------------------------------------------------

Instructions for the ICDAR 2003 informal competition for
the recognition of on-line words.

1 - Unpack the .tgz file.
2 - Use the UNIPEN (*.dat) files as input for your recognizer.
3 - For each writer, report the results in a file <writer-id>.res

  Example: do-my-recognizer < NIC-Hi93b-marc.dat > NIC-Hi93b-marc.res

Format of the .res file.

No XML for the moment: simplicity first.

We assume that the recognizer can produce a top-10 list of likely
words, sorted from most likely to least likely. The output for each
input word occupies a single line: the correct target word comes
first, followed by the ten hypotheses, separated by whitespace.

<targetword 1> <best word hyp.> <2nd-best word hyp.> ... <10th-best word hyp>
<targetword 2> <best word hyp.> <2nd-best word hyp.> ... <10th-best word hyp>

Example with two words:

summertime   slumbertime slipknot summertime somatome spumante simulative semitone schoolmate sermonette semimature
Aberdeen     Adamson Aberdeen Addison Armageddon Abyssinian Araban Albanian Alabamian Abraham Adelaide
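A .res line can be written and read back with a few lines of code. The sketch below assumes whitespace-separated columns, as in the example, and words that contain no spaces. The function `top_k_rate` (a hypothetical helper) computes one plausible score, the fraction of words whose target appears among the first k hypotheses, using exact, case-sensitive matching; this README does not specify how submissions were to be scored.

```python
from typing import Iterable

TOP_N = 10  # the instructions ask for a top-10 list per word


def format_res_line(target: str, hypotheses: list[str]) -> str:
    """Build one .res line: target word first, then hypotheses best-first."""
    if len(hypotheses) > TOP_N:
        raise ValueError(f"at most {TOP_N} hypotheses per word")
    if any(not w or any(c.isspace() for c in w) for w in [target, *hypotheses]):
        raise ValueError("words must be non-empty and contain no whitespace")
    return " ".join([target, *hypotheses])


def parse_res_line(line: str) -> tuple[str, list[str]]:
    """Split a .res line into (target, ranked hypotheses)."""
    fields = line.split()
    if not fields:
        raise ValueError("empty .res line")
    return fields[0], fields[1:]


def top_k_rate(lines: Iterable[str], k: int) -> float:
    """Fraction of words whose target is among the first k hypotheses."""
    total = hits = 0
    for line in lines:
        if not line.strip():
            continue  # tolerate blank lines
        target, hyps = parse_res_line(line)
        total += 1
        hits += int(target in hyps[:k])
    return hits / total if total else 0.0


if __name__ == "__main__":
    example = [
        "summertime   slumbertime slipknot summertime somatome spumante "
        "simulative semitone schoolmate sermonette semimature",
        "Aberdeen     Adamson Aberdeen Addison Armageddon Abyssinian Araban "
        "Albanian Alabamian Abraham Adelaide",
    ]
    # prints top-1: 0.00, top-2: 0.50, top-3: 1.00, top-10: 1.00 (one per line)
    for k in (1, 2, 3, 10):
        print(f"top-{k}: {top_k_rate(example, k):.2f}")
```

On the two-word example, neither target is the best hypothesis, "Aberdeen" is second and "summertime" is third, so the rate rises from 0.00 at k=1 to 1.00 at k=3.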


4 - Pack the *.res files into a .tgz or .zip file and send them
    to schomaker@ai.rug.nl.
    All *.dat files need to be processed.
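Steps 2 to 4 can be automated with a small driver. In this sketch, `RECOGNIZER` is a hypothetical command standing in for "do-my-recognizer" from step 3, assumed to read one UNIPEN file on stdin and write .res lines on stdout; `run_all` and `pack` are illustrative helper names, not part of any distributed tooling.

```python
import subprocess
import tarfile
from collections.abc import Sequence
from pathlib import Path

# Hypothetical recognizer command (replace with your own executable).
RECOGNIZER = ("do-my-recognizer",)


def run_all(data_dir: Path, out_dir: Path,
            command: Sequence[str] = RECOGNIZER) -> list[Path]:
    """Run the recognizer once per writer: <writer-id>.dat -> <writer-id>.res."""
    out_dir.mkdir(parents=True, exist_ok=True)
    produced = []
    for dat in sorted(data_dir.glob("*.dat")):
        res = out_dir / (dat.stem + ".res")
        with dat.open("rb") as src, res.open("wb") as dst:
            # Equivalent to: command < dat > res ; raises if the command fails.
            subprocess.run(command, stdin=src, stdout=dst, check=True)
        produced.append(res)
    if not produced:
        # All *.dat files must be processed, so an empty run is an error.
        raise FileNotFoundError(f"no *.dat files found in {data_dir}")
    return produced


def pack(res_files: list[Path], archive: Path) -> None:
    """Bundle the .res files, without directory prefixes, into a .tgz archive."""
    with tarfile.open(archive, "w:gz") as tar:
        for res in res_files:
            tar.add(res, arcname=res.name)
```

For example, `pack(run_all(Path("."), Path("results")), Path("results.tgz"))`, run in the unpacked directory, would write one .res file per .dat file and a results.tgz ready to send. A .zip archive, also accepted, could be built the same way with `zipfile.ZipFile`.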

LS.
 

 

Files (32.1 MB total)

  Size       MD5 checksum
  2.2 kB     0fbc16c60e4b760decec6ae98d3fce4e
  2.0 kB     04d218d6d018f23b94f4af6ea9ac1048
  440.4 kB   f751a14a4322b4f42ee15f157bb31177
  688.3 kB   cfe6b108b873c14ea7c6a5ff71146fb4
  31.0 MB    fd39fd6531b4a20cd2c9edc526af6311