The ICDAR 2003 Informal Competition for the Recognition of On-line Words: The Unipen-ICROW-03 benchmark set - Version 0.0

Lambert Schomaker

doi:10.5281/zenodo.7631142

Published July 16, 2003 | Version 0.0

Dataset Open

Proposal for an informal benchmark on word recognition. For the related ImUnipen collection
of word images derived from on-line vectorial handwriting data, see: https://zenodo.org/record/1195059

At the time (ICDAR 2003) there was little interest, so the project was not pursued.

Lambert Schomaker - February 2023

_______________________________________________________________________________

The ICDAR 2003 Informal Competition for the Recognition of On-line Words:
The Unipen-ICROW-03 benchmark set
Version 0.0

Lambert Schomaker / International Unipen Foundation

The ICROW suite of test files for the recognition of isolated on-line
free-style (handprint, mixed and cursive) words has been
composed. Different tablets, nationalities and languages
are involved. Only the ASCII set is used within word labels.

The set contains:

13119 written words
884 unique lexical word entries
72 writers

Language: Dutch, English, Italian.
Nationalities: Dutch, Irish, Italian, + mixed

The benchmark test is a good estimator of
"walk-up" recognition performance, that is, performance
for a writer to whom the recognizer has not been adapted.

[Note: some of the writers (NIC-Pc95*.dat set) are present in the
UNIPEN R01/V07 distribution, but the actual words are unseen
outside of the Int. Unipen Foundation.]

Please note the Copyright notice in the
accompanying file 'Copyright'

Wed Jul 16 21:20:10 CEST 2003

Lambert Schomaker

---------------------------------------------------------------------------

Instructions for the ICDAR 2003 informal competition for
the recognition of on-line words.

1 - unpack the .tgz file
2 - use the UNIPEN files as input for your recognizer.
3 - produce, for each writer, a result file <writer-id>.res

Example: do-my-recognizer < NIC-Hi93b-marc.dat > NIC-Hi93b-marc.res

Format of the .res file.

No XML for the moment: simplicity does it.

We assume that the recognizer is able to produce a top-10 list
of likely words, sorted from most likely to least likely.
The output for each written word is on a single line. The first
column holds the correct target word; the ten candidates follow
it, separated by spaces.

Example with two words:

summertime slumbertime slipknot summertime somatome spumante simulative semitone schoolmate sermonette semimature
Aberdeen Adamson Aberdeen Addison Armageddon Abyssinian Araban Albanian Alabamian Abraham Adelaide

4 - pack the *.res files in a .tgz or .zip file and send them
to schomaker@ai.rug.nl
All *.dat files need to be processed.

LS.
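
The .res format described above can be checked mechanically before
submission. The following is a minimal Python sketch, not part of the
original distribution. It assumes that fields are separated by
whitespace, that word labels contain no spaces, and that matching is
exact and case-sensitive. The function names (format_res_line,
parse_res_line, target_rank, top_n_rate) are hypothetical.

```python
from typing import List, Tuple


def format_res_line(target: str, candidates: List[str]) -> str:
    """Build one .res line: target word first, then at most ten candidates."""
    return " ".join([target] + candidates[:10])


def parse_res_line(line: str) -> Tuple[str, List[str]]:
    """Split a .res line into (target word, ranked candidate list)."""
    fields = line.split()
    if not fields:
        raise ValueError("empty .res line")
    return fields[0], fields[1:]


def target_rank(line: str) -> int:
    """Return the 1-based rank of the target among the candidates, 0 if absent."""
    target, candidates = parse_res_line(line)
    # Assumption: exact, case-sensitive string comparison.
    return candidates.index(target) + 1 if target in candidates else 0


def top_n_rate(lines: List[str], n: int) -> float:
    """Fraction of non-blank lines whose target is within the first n candidates."""
    ranks = [target_rank(line) for line in lines if line.strip()]
    hits = sum(1 for rank in ranks if 1 <= rank <= n)
    return hits / len(ranks) if ranks else 0.0


# The two example lines from the instructions above.
EXAMPLE_LINES = [
    "summertime slumbertime slipknot summertime somatome spumante "
    "simulative semitone schoolmate sermonette semimature",
    "Aberdeen Adamson Aberdeen Addison Armageddon Abyssinian Araban "
    "Albanian Alabamian Abraham Adelaide",
]

if __name__ == "__main__":
    for line in EXAMPLE_LINES:
        print(parse_res_line(line)[0], "rank:", target_rank(line))
    print("top-1 rate:", top_n_rate(EXAMPLE_LINES, 1))
    print("top-10 rate:", top_n_rate(EXAMPLE_LINES, 10))
```

For the two example lines the target word is found at rank 3 and
rank 2 respectively, so the top-1 rate is 0.0 and the top-10 rate
is 1.0.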

Files (32.1 MB)

Name                     Size       MD5
_README.txt              2.2 kB     0fbc16c60e4b760decec6ae98d3fce4e
Copyright.txt            2.0 kB     04d218d6d018f23b94f4af6ea9ac1048
ICROW-2003-rev.odp       440.4 kB   f751a14a4322b4f42ee15f157bb31177
ICROW-2003-rev.pdf       688.3 kB   cfe6b108b873c14ea7c6a5ff71146fb4
unipen-ICROW-2003.tgz    31.0 MB    fd39fd6531b4a20cd2c9edc526af6311
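
The MD5 checksums listed with the files can be used to check a
download for corruption. The following is a minimal Python sketch
using only the standard library. The file names and checksums are
copied from the listing; the function names (md5_of_file,
verify_downloads) and the assumption that all files sit in a single
directory are illustrative only.

```python
import hashlib
from pathlib import Path

# Checksums as given in the file listing.
EXPECTED_MD5 = {
    "_README.txt": "0fbc16c60e4b760decec6ae98d3fce4e",
    "Copyright.txt": "04d218d6d018f23b94f4af6ea9ac1048",
    "ICROW-2003-rev.odp": "f751a14a4322b4f42ee15f157bb31177",
    "ICROW-2003-rev.pdf": "cfe6b108b873c14ea7c6a5ff71146fb4",
    "unipen-ICROW-2003.tgz": "fd39fd6531b4a20cd2c9edc526af6311",
}


def md5_of_file(path, chunk_size=1 << 20):
    """Return the hexadecimal MD5 digest of a file, read in chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_downloads(directory="."):
    """Map each listed file name to True if it exists and its MD5 matches."""
    results = {}
    for name, expected in EXPECTED_MD5.items():
        path = Path(directory) / name
        results[name] = path.is_file() and md5_of_file(path) == expected
    return results


if __name__ == "__main__":
    for name, ok in verify_downloads().items():
        print(name, "OK" if ok else "MISSING OR MISMATCH")
```

A file that is absent from the directory is reported in the same way
as a file whose checksum does not match.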
