Published June 7, 2024 | Version 1.0
Dataset Open

Muharaf-public

Description

Manuscripts of Handwritten Arabic dataset (Muharaf) for cursive text recognition. The following files are present in this repository:

  1. public_data_files.zip: Contains the public part of the Muharaf dataset: the images and their corresponding annotation files in JSON and XML formats.
  2. public_line_images.zip: Contains the line images and their corresponding transcriptions.
  3. public_summary_and_keywords.zip: Contains the summary and keywords extracted from the ground truth transcriptions of each image.
  4. sfr_files.zip: Contains the preprocessed files for training the start_follow_read_arabic system on the public part of the Muharaf dataset.
  5. public_1100_untrained.zip: Contains an initialized trial folder with 3 different random splits of (train, validation, test) to reproduce the experiments reported in the paper on Muharaf-public.
  6. public_1100_trained.zip: Contains the results and model weights after training on Muharaf-public. It has the results of three different random splits of (train, validation, test) sets.
  7. trial_15_untrained.zip: Contains an initialized trial folder with 3 different random splits of (train, validation, test) to reproduce the experiments reported in the paper on training with all the files of the Muharaf dataset (1500 training images).
  8. trial_15.zip: Contains the results and model weights after training on Muharaf. It has the results of three different random splits of (train, validation, test) sets.
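
Because the archives mix images with annotation and transcription files, a quick inventory by file extension can help confirm what a downloaded archive holds before unpacking it. The following is a minimal sketch, assuming the archive has already been saved locally; `summarize_archive` is a hypothetical helper name, and no particular folder layout inside the archives is assumed.

```python
import zipfile
from collections import Counter
from pathlib import PurePosixPath


def summarize_archive(zip_path):
    """Count the files in a zip archive by lowercase extension.

    Hypothetical helper: it only reads the archive's table of contents
    and makes no assumption about the folder layout inside it.
    """
    counts = Counter()
    with zipfile.ZipFile(zip_path) as archive:
        for info in archive.infolist():
            if info.is_dir():
                continue  # skip directory entries
            suffix = PurePosixPath(info.filename).suffix.lower()
            counts[suffix or "(no extension)"] += 1
    return counts
```

Calling `summarize_archive("public_data_files.zip")` on a local copy returns a `Counter` keyed by extension, which makes it easy to check, for example, whether the number of JSON files matches the number of XML files.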

Files

Files (10.6 GB)

Checksum                                Size
md5:923f1f87e924da6694c6b145ca44011f    742.3 MB
md5:6b54aeada83897392038f31cd15a45ce    96.5 kB
md5:27322f0d2424a99a83ce69fdd2abba90    3.1 GB
md5:cd697cded99887b89a83620e83cd9a32    1.3 GB
md5:d899652881c25cdcced00a16dd0b40e7    791.5 kB
md5:c1386676775c33318a6517978eca8c16    4.6 GB
md5:1fdcd7a5a5fd3f258e16e27061dccd24    739.3 MB
md5:f7bff5ca281385a57e4d4b3ff8dc2317    76.0 kB
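
The MD5 values listed here can be used to check that a multi-gigabyte download completed intact. The following is a minimal sketch using Python's standard `hashlib`; the helper names `md5_of_file` and `matches_listed_checksum` are hypothetical, and which archive a given checksum belongs to has to be read from the record page itself.

```python
import hashlib


def md5_of_file(path, chunk_size=1 << 20):
    """Return the hex MD5 digest of a file, reading it in 1 MiB chunks
    so that large archives do not have to fit in memory."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_listed_checksum(path, listed):
    """Compare a file's MD5 with a listed value such as 'md5:923f...'.

    The optional 'md5:' prefix is stripped before comparing.
    """
    expected = listed.strip().lower()
    if expected.startswith("md5:"):
        expected = expected[len("md5:"):]
    return md5_of_file(path) == expected
```

A mismatch usually means a truncated or corrupted download, in which case fetching the archive again is the simplest remedy.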

Additional details

Funding

A More Complete History of America: Developing Arabic OCR to tell the story of Arabs in America ZPA-283823-22
National Endowment for the Humanities