Published October 24, 2024 | Version v1
Dataset Open

Portuguese Handwriting 16th-19th c.

  • 1. Leopold Franzens Universität für Innsbruck
  • 2. Universitätsarchiv Greifswald

Description

All data were imported from the Transkribus platform, on which the AI model for automatic transcription “Portuguese Handwriting 16th-19th c.” was last trained in July 2023 with the recognition engine PyLaia. The model can now be used.

The data are divided into ten folders: one for each of the nine trainings, from the initial to the definitive one, plus one set for final validation. The eight trainings that preceded the definitive one were carried out between June 2022 and May 2023. The history of all trainings can be read on e-Inquisition. Each of these folders corresponds to one collection in the platform; every collection contains a number of documents, and every document a number of images, or pages, as indicated below.

The ten uploaded folders (zip) are distributed as follows:

—nine Training Sets (TS) (ca 92% of the whole data; status of the transcriptions from the TS: Ground Truth);

—the final Validation Set (VS) (ca 8% of the whole data; status of the transcriptions from the VS: Ground Truth).

Each TS folder contains only the new data added for the corresponding training; every training therefore used its own folder together with the data of all preceding folders.
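Because the TS folders are incremental, reconstructing the data behind a given training means combining that folder with all earlier ones. A minimal sketch of that selection follows. The file-name pattern is an assumption extrapolated from the single archive name visible in this record (Training_Set_Model1.zip), and folders_for_training is a hypothetical helper, not part of the dataset.

```python
from typing import List

NUM_TRAININGS = 9  # nine Training Sets are listed in this record


def folders_for_training(n: int) -> List[str]:
    """Return the archives whose data were available to training number n."""
    if not 1 <= n <= NUM_TRAININGS:
        raise ValueError(f"training number must be between 1 and {NUM_TRAININGS}")
    # ASSUMPTION: every archive follows the pattern Training_Set_Model<k>.zip;
    # only the name for k = 1 is confirmed by the record.
    return [f"Training_Set_Model{k}.zip" for k in range(1, n + 1)]


print(folders_for_training(3))
```

Under this reading, the definitive training corresponds to folders_for_training(9), that is, all nine archives.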

Only the last VS, which is complete (505 p.), is provided.

One document = images / transcribed pages. Ground Truth: transcription made by the members of the TraPrInq project (Transcrever os processos da Inquisição portuguesa, 1536-1821 | Transcribing the court records of the Portuguese Inquisition, 1536-1821), which lasted from January 2023 to July 2024.

The majority of the documents are titled as follows: IL_number, i.e. a document extracted from a trial record (processo) of the Inquisition of Lisbon, followed by the number of the processo. Other title prefixes: IC_ = Inquisition of Coimbra; IE_ = Inquisition of Évora.
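The title convention can be read mechanically. The sketch below is a hedged illustration: it assumes the processo number consists only of digits directly after the underscore, which the record does not state, and parse_title and the sample title "IL_1234" are hypothetical. Titles that do not follow the convention return None, since only the majority of documents are titled this way.

```python
import re
from typing import Optional, Tuple

# Prefixes as given in the description.
TRIBUNALS = {
    "IL": "Inquisition of Lisbon",
    "IC": "Inquisition of Coimbra",
    "IE": "Inquisition of Évora",
}

# ASSUMPTION: the processo number is a run of digits right after the underscore.
_TITLE = re.compile(r"^(IL|IC|IE)_(\d+)")


def parse_title(title: str) -> Optional[Tuple[str, int]]:
    """Return (tribunal, processo number), or None for other title styles."""
    match = _TITLE.match(title.strip())
    if match is None:
        return None
    return TRIBUNALS[match.group(1)], int(match.group(2))


print(parse_title("IL_1234"))  # hypothetical title, for illustration only
```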

Total of transcribed pages across the nine Training Sets: 6,417.

The quality of the images in the data (jpg) is equal to that of the images used for automatic transcription.
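To check a downloaded archive against the page counts given in this record, one can count its jpg entries. This is a sketch only: the internal layout of the zip files is not described here, so the code assumes nothing beyond each page being stored as one file with a .jpg or .jpeg extension, and count_jpg_pages is a hypothetical helper.

```python
import zipfile


def count_jpg_pages(archive) -> int:
    """Count the jpg entries in a zip archive (path or binary file object)."""
    with zipfile.ZipFile(archive) as zf:
        # ASSUMPTION: one jpg entry per page; other files (e.g. transcriptions)
        # are ignored, whatever folder they sit in.
        return sum(
            1
            for name in zf.namelist()
            if name.lower().endswith((".jpg", ".jpeg"))
        )
```

If the assumption holds, the result for each training archive should equal the number of pages/images listed for that set.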

All digitized images can be found on the catalog of the Portuguese National Archives (Arquivo Nacional da Torre do Tombo, ANTT).

Available data:

1-ten zip files (total size 6.7 GB):

Training Set1: 698 pages/images

Training Set2: 984 pages/images

Training Set3: 869 pages/images

Training Set4: 926 pages/images

Training Set5: 631 pages/images

Training Set6: 665 pages/images

Training Set7: 564 pages/images

Training Set8: 549 pages/images

Training Set9: 531 pages/images

Validation Set_Final: 505 pages/images
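The per-set counts can be checked against the total of 6,417 pages stated in the description, and used to derive how many pages were available at each training. The sketch below takes the counts from the list above; treating the Validation Set as disjoint from the Training Sets is an assumption, since the record does not say so explicitly.

```python
from itertools import accumulate

# Pages newly added for each training, as listed in this record.
training_pages = [698, 984, 869, 926, 631, 665, 564, 549, 531]
validation_pages = 505

# Training n used its own folder plus all earlier ones.
cumulative = list(accumulate(training_pages))
total_training = cumulative[-1]

# ASSUMPTION: validation pages are not part of any Training Set.
total_all = total_training + validation_pages

for n, pages in enumerate(cumulative, start=1):
    print(f"training {n}: {pages} pages available")
print(f"training share:   {total_training / total_all:.1%}")
print(f"validation share: {validation_pages / total_all:.1%}")
```

The nine Training Sets sum to 6,417 pages. Under the disjointness assumption the whole dataset has 6,922 pages, split roughly 92.7% to 7.3%, which is close to the ca 92% / ca 8% given in the description.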

2-one pdf file:

Paleographical criteria used by the team for the transcription of the documents; list of characters (in Portuguese).

Files

Training_Set_Model1.zip

Files (6.7 GB)

Checksum Size
md5:814257a9af3cded6c7c97e8a52cbc60b 1.0 GB
md5:922de6553f4339acf0109965c6b3a28a 1.1 GB
md5:895f5cf3fdc16a0b5e962753ae2b0fb2 473.4 MB
md5:139c6d512779e3b9c48d20db68715e21 911.6 MB
md5:e668598c38c491fae92fa91efe8476d1 631.8 MB
md5:ee908fc78affe6ae1c83989e3b4bd152 625.3 MB
md5:fd0bab87bedfa191b70d5fabbff2c108 527.3 MB
md5:747fbc7f76fd5305a558209d49700584 493.3 MB
md5:2a65bb0d9f40e12e482dd3330f9c6402 452.6 MB
md5:842164722de5da610d057b28935506c5 627.6 kB
md5:e220fef22d0923b0e1d97b109ca6d7f7 502.4 MB
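The md5 checksums above allow a downloaded file to be verified. Because the file names belonging to each checksum are not preserved in this listing, the sketch below only tests whether a file's digest is one of the eleven published values; md5_of and is_published are hypothetical helper names, and the path passed in is whatever local file the reader has downloaded.

```python
import hashlib

# The eleven md5 values published in this record.
PUBLISHED_MD5 = {
    "814257a9af3cded6c7c97e8a52cbc60b",
    "922de6553f4339acf0109965c6b3a28a",
    "895f5cf3fdc16a0b5e962753ae2b0fb2",
    "139c6d512779e3b9c48d20db68715e21",
    "e668598c38c491fae92fa91efe8476d1",
    "ee908fc78affe6ae1c83989e3b4bd152",
    "fd0bab87bedfa191b70d5fabbff2c108",
    "747fbc7f76fd5305a558209d49700584",
    "2a65bb0d9f40e12e482dd3330f9c6402",
    "842164722de5da610d057b28935506c5",
    "e220fef22d0923b0e1d97b109ca6d7f7",
}


def md5_of(path, chunk_size: int = 1 << 20) -> str:
    """Compute the md5 hex digest of a file, reading it in chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def is_published(path) -> bool:
    """True if the file's md5 matches any checksum listed in the record."""
    return md5_of(path) in PUBLISHED_MD5
```

Reading in chunks keeps memory use low for archives of around 1 GB. A False result suggests an incomplete or corrupted download.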

Additional details

Additional titles

Subtitle (Portuguese)
AI Model training data

Funding

EXPL/HAR-HIS/0499/2021 – Transcribing the court records of the Portuguese Inquisition 1536-1821
Fundação para a Ciência e Tecnologia

Dates

Other
2022
Other
2023