Portuguese Handwriting 16th-19th c.
Creators
- Baudry, Hervé (Project leader) [1]
- Pedro, Susana Tavares (Project leader) [2]
- de Campos, Marize Helena (Project member) [3]
- Soares Fatela, Mário (Project member) [2]
- Garcia, Leonor Dias (Project member) [4]
- Paulo, Jorge Ferreira (Project member) [5]
- Pereira, Maria Olinda Alves (Project member) [6]
- Salvador, Natalia Casagrande (Project member) [1]
- Severs, Suzana Maria de Sousa Santos (Project member) [7]
- Dias da Silva, Ana Margarida (Project member) [8]
- 1. Universidade Nova de Lisboa
- 2. Universidade de Lisboa
- 3. Universidade Federal do Maranhão
- 4. Universidade de Évora
- 5. Ministério da Educação
- 6. Arquivo Distrital de Viana do Castelo
- 7. Universidade do Estado da Bahia
- 8. Universidade de Coimbra
Contributors
Others:
- 1. Leopold-Franzens-Universität Innsbruck
- 2. Universitätsarchiv Greifswald
Description
All data were exported from the Transkribus platform, on which the AI model for automatic transcription “Portuguese Handwriting 16th-19th c.” was last trained in July 2023 with the PyLaia recognition engine. The model is now available for use.
The data are divided into ten folders: one for each of the nine trainings, from the first to the final one, plus one set for final validation. The eight earlier trainings were carried out between June 2022 and May 2023. The history of all trainings is documented on e-Inquisition. Each folder corresponds to one collection on the platform; each collection contains a number of documents, and each document contains a number of images (pages), as listed below.
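To compare a downloaded archive with the page counts listed below, the images can be counted per document directly from the zip. This is a minimal sketch assuming that each page is a .jpg file stored directly inside a folder named after its document; that layout and the function name `pages_per_document` are assumptions for illustration, not part of this record.

```python
import zipfile
from collections import Counter
from pathlib import PurePosixPath

# Assumption: page images are .jpg/.jpeg files stored directly inside a
# folder named after their document (e.g. "<collection>/IL_1234/0001.jpg").
IMAGE_SUFFIXES = {".jpg", ".jpeg"}


def pages_per_document(zip_file):
    """Count page images per document folder in a zip archive.

    `zip_file` may be a path or a binary file-like object.
    """
    counts = Counter()
    with zipfile.ZipFile(zip_file) as zf:
        for name in zf.namelist():
            path = PurePosixPath(name)  # zip member names use "/" as separator
            if path.suffix.lower() in IMAGE_SUFFIXES:
                # Group each image under the folder that directly contains it.
                counts[path.parent.name] += 1
    return counts
```

For example, `sum(pages_per_document("Training_Set1.zip").values())` (the file name is hypothetical) would give the number of images in one set, to be compared with the figure listed for it.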
The ten uploaded folders (zip) are distributed as follows:
- nine Training Sets (TS) (ca. 92% of the whole data; status of the transcriptions in the TS: Ground Truth);
- the final Validation Set (VS) (ca. 8% of the whole data; status of the transcriptions in the VS: Ground Truth).
Each TS folder contains only the data newly added for the corresponding training; each training used these new data together with all the data of the previous sets.
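Since each Training Set only adds to the previous ones, the amount of data seen by each training is a running total. The sketch below computes it from the page counts listed in this record; it assumes, following the description, that training n used TS1 through TSn together.

```python
from itertools import accumulate

# Pages newly added in Training Sets 1-9, as listed in this record.
NEW_PAGES = [698, 984, 869, 926, 631, 665, 564, 549, 531]

# Assumption based on the description: training n used TS1..TSn together,
# so its training data is the running total of the newly added pages.
CUMULATIVE = list(accumulate(NEW_PAGES))

for n, (new, total) in enumerate(zip(NEW_PAGES, CUMULATIVE), start=1):
    print(f"Training {n}: {new} new pages, {total} pages in total")
```

The last line printed is `Training 9: 531 new pages, 6417 pages in total`.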
Only the final VS, which is complete (505 pages), is provided.
Each document consists of images and their transcribed pages (Ground Truth: transcriptions made by the members of the TraPrInq project, Transcrever os processos da Inquisição portuguesa, 1536-1821 | Transcribing the court records of the Portuguese Inquisition, 1536-1821, which ran from January 2023 to July 2024).
Most documents are titled IL_number: a document extracted from a trial record (processo) of the Inquisition of Lisbon, where the number is that of the processo. Other prefixes: IC_ = Inquisition of Coimbra; IE_ = Inquisition of Évora.
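The title convention can be read mechanically to group documents by tribunal. This is a minimal sketch assuming titles of the form prefix, underscore, processo number (e.g. `IL_1234`); the regular expression and the function name `parse_title` are hypothetical, and titles that do not follow the convention simply return `None`.

```python
import re

# Title prefixes and tribunals as described in this record.
TRIBUNALS = {
    "IL": "Inquisition of Lisbon",
    "IC": "Inquisition of Coimbra",
    "IE": "Inquisition of Évora",
}

# Hypothetical pattern: a prefix, an underscore, then the processo number.
# Anything after the number (if present) is ignored.
TITLE_RE = re.compile(r"^(IL|IC|IE)_(\d+)")


def parse_title(title):
    """Return (tribunal, processo number) for a document title, or None.

    The number is kept as a string so that any leading zeros survive.
    """
    match = TITLE_RE.match(title)
    if match is None:
        return None
    prefix, number = match.groups()
    return TRIBUNALS[prefix], number
```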
Total of transcribed pages in the nine Training Sets: 6,417 (6,922 including the 505-page Validation Set).
The quality of the images in the data (jpg) is equal to that of the images used for automatic transcription.
All digitized images are available in the online catalog of the Portuguese National Archives (Arquivo Nacional da Torre do Tombo, ANTT).
1. Available data (10 zip files, total size 6.7 GB):
Training Set1: 698 pages/images
Training Set2: 984 pages/images
Training Set3: 869 pages/images
Training Set4: 926 pages/images
Training Set5: 631 pages/images
Training Set6: 665 pages/images
Training Set7: 564 pages/images
Training Set8: 549 pages/images
Training Set9: 531 pages/images
Validation Set_Final: 505 pages/images
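The page counts above can be summed to check the totals and the split between training and validation data. This arithmetic sketch uses only the numbers listed in this record: 6,417 is the sum of the nine Training Sets, and the split of the 6,922 pages comes to about 92.7% and 7.3%, consistent with the approximate shares given in the description.

```python
# Page counts per set, as listed in this record.
TRAINING_SETS = [698, 984, 869, 926, 631, 665, 564, 549, 531]
VALIDATION_SET = 505

train_total = sum(TRAINING_SETS)            # 6417
grand_total = train_total + VALIDATION_SET  # 6922

print(f"Training pages:   {train_total} ({train_total / grand_total:.1%})")
print(f"Validation pages: {VALIDATION_SET} ({VALIDATION_SET / grand_total:.1%})")
```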
2. One PDF file: the paleographical criteria used by the team to transcribe the documents, and a list of characters (in Portuguese).
Files (6.7 GB)
md5 checksum | Size |
---|---|
814257a9af3cded6c7c97e8a52cbc60b | 1.0 GB |
922de6553f4339acf0109965c6b3a28a | 1.1 GB |
895f5cf3fdc16a0b5e962753ae2b0fb2 | 473.4 MB |
139c6d512779e3b9c48d20db68715e21 | 911.6 MB |
e668598c38c491fae92fa91efe8476d1 | 631.8 MB |
ee908fc78affe6ae1c83989e3b4bd152 | 625.3 MB |
fd0bab87bedfa191b70d5fabbff2c108 | 527.3 MB |
747fbc7f76fd5305a558209d49700584 | 493.3 MB |
2a65bb0d9f40e12e482dd3330f9c6402 | 452.6 MB |
842164722de5da610d057b28935506c5 | 627.6 kB |
e220fef22d0923b0e1d97b109ca6d7f7 | 502.4 MB |
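The table lists an md5 checksum for each file but not its name, so a downloaded file can be verified by checking whether its digest appears anywhere in the list. This sketch uses Python's standard `hashlib`; the function names `md5_of_file` and `matches_known_checksum` are hypothetical.

```python
import hashlib

# md5 checksums listed in the table above (file names are not shown there).
KNOWN_MD5 = {
    "814257a9af3cded6c7c97e8a52cbc60b",
    "922de6553f4339acf0109965c6b3a28a",
    "895f5cf3fdc16a0b5e962753ae2b0fb2",
    "139c6d512779e3b9c48d20db68715e21",
    "e668598c38c491fae92fa91efe8476d1",
    "ee908fc78affe6ae1c83989e3b4bd152",
    "fd0bab87bedfa191b70d5fabbff2c108",
    "747fbc7f76fd5305a558209d49700584",
    "2a65bb0d9f40e12e482dd3330f9c6402",
    "842164722de5da610d057b28935506c5",
    "e220fef22d0923b0e1d97b109ca6d7f7",
}


def md5_of_file(path, chunk_size=1 << 20):
    """Return the MD5 hex digest of a file, read in 1 MiB chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as f:
        # Read in chunks so multi-GB archives are never loaded into memory at once.
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_known_checksum(path):
    """True if the file's MD5 digest is one of the checksums listed above."""
    return md5_of_file(path) in KNOWN_MD5
```

A `False` result for a completed download suggests a corrupted or incomplete transfer.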
Additional details
Additional titles
- Subtitle (Portuguese): AI Model training data
Related works
- Is source of: Model, https://readcoop.eu/model/portuguese-handwriting-16th-19th-century
Funding
Dates
- Other: 2022
- Other: 2023