{"authors": [{"name": "Chagu\u00e9, Alix", "affiliation": "ALMAnaCH, Inria"}], "summary": "LECTAUREP Contemporary French Model (Administration)", "description": "The model was trained on the ground truth produced by the LECTAUREP Project (Inria & Archives Nationales) between 2019 and 2022. The training dataset contained many handwriting examples taken from French administrative documents produced between 1742 and 1928. The training and testing data were collected from LECTAUREP's ground truth repositories (lectaurep-bronod v0.0.1: https://github.com/HTR-United/lectaurep-bronod/releases/tag/v0.0.1, lectaurep-mariages-et-divorces v1.0: https://github.com/HTR-United/lectaurep-mariages-et-divorces/releases/tag/v1.0, lectaurep-repertoires v2.0: https://github.com/HTR-United/lectaurep-repertoires/releases/tag/v2.0). 12 pages were kept aside to create a test set. The training dataset contained 308 files, 19 364 lines and 329 270 characters. The test dataset contained 12 files, 962 lines and 15 243 characters. The transcriptions were created with eScriptorium. The guidelines called for transcribing what is written (abbreviations are not expanded, capitalization follows 19th-century practices); superscripted portions of text are signaled by `^` and many signatures are transcribed as \u00a5. The model was trained using NFD Unicode normalization. It was trained by Alix Chagu\u00e9 using data created by Aur\u00e9lia Rostaing, Fran\u00e7oise Limon-Bonnet, Nathalie Denis and Marc Durand.
More information on the model can be found at https://github.com/lectaurep/lectaurep_base_model", "accuracy": 90.9, "license": "CC-BY-4.0", "script": ["Latn"], "name": "lectaurep_base.mlmodel ", "graphemes": [" ", "\"", "%", "&", "'", "(", ")", "+", ",", "-", ".", "/", "0", "1", "2", "3", "4", "5", "6", "7", "8", "9", ":", ";", "=", "?", "A", "B", "C", "D", "E", "F", "G", "H", "I", "J", "K", "L", "M", "N", "O", "P", "Q", "R", "S", "T", "U", "V", "W", "X", "Y", "Z", "[", "]", "^", "_", "`", "a", "b", "c", "d", "e", "f", "g", "h", "i", "j", "k", "l", "m", "n", "o", "p", "q", "r", "s", "t", "u", "v", "w", "x", "y", "z", "|", "~", "\u00a5", "\u00a8", "\u00b0", "\u00bd", "\u00e6", "\u0153", "\u023c", "\u0300", "\u0301", "\u0302", "\u0303", "\u0308", "\u0327", "\u20ac", "\u2191", "\u221f"]}