[automatically generated by youtube]

Hi, my name is Tobias Hodel, and I will introduce you to Transkribus and how you can use it for recognizing pre-modern documents.

Transkribus has been developed within the READ project. READ stands for Recognition and Enrichment of Archival Documents. The goal was to make archival documents more accessible, especially handwritten ones. The result was the research infrastructure Transkribus, which you can download freely on transkribus.eu. The infrastructure is maintained by the READ cooperative.

Transkribus is based on artificial intelligence. Neural networks are used to identify lines and recognize the writing. Neural networks are also used to identify layout structures. The layout analysis finds all lines on the page. Even for difficult layouts the correct lines are being identified; only minor reprocessing has to be done manually.

It is possible to train your own recognition models. For this you need to align text with the layout. With only about 50 pages of a document from, say, the pre-modern era, it's possible to train your first model. You have the possibility to choose between two training engines. The first one is provided by the University of Rostock and CITlab. The other one, called PyLaia, is provided by the Technical University of Valencia. Both engines rely on neural networks, but their architecture is completely different.

Let's have a look at some examples. If you train on a Carolingian minuscule, here from the library of St. Gallen, with about 5,000 words, about 1,000 lines, we get a character error rate of 7 percent, so 7 out of 100 characters are going to be recognized incorrectly. In this example the problem is the expansion of abbreviations, which we decided to do automatically.

In this second example we try to recognize the letters of [name unclear in the recording: "contest renburg"]. 48,000 words, 6,450 lines, gives us a recognition model that recognizes over 97 percent of all the characters correctly. This is about the best you will get, result-wise, from the engines that are currently in Transkribus.

In this third example the goal was to train not only one single hand, as in the examples before, but to include several hands and try to build something like a generic model to recognize charters of the 14th and 15th century. With 77,000 words and 3,500 lines we reach a CER, that is, a rate of incorrectly recognized characters, of about five percent.

In Transkribus you will also find a variety of already prepared models for different scripts and also early prints, for example for Roman type, but also for administrative hands of the 16th century, for charter scripts of the 13th to the 15th century, or for Latin prints of the 16th. If you browse through the models, you may well find something for your needs. Users are also encouraged to share their own models in order to build a larger variety of available models and to speed up the process of recognition. For some models you can have a look at the validation set in order to make sure that the script is similar to what you are working on.

It is also possible to search directly in the recently recognized documents. Try out the keyword spotting, where you can also search through variants.

[Music]

For more information and introductions go to transkribus.eu. Thank you for your interest, and keep transcribing.
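
The character error rate (CER) quoted in the three examples can be made concrete with a short sketch. The talk only says that a CER of 7 percent means 7 out of 100 characters are recognized incorrectly. The sketch below assumes the common definition, edit distance between the recognized text and a manually corrected reference divided by the length of the reference. That definition, the function names, and the sample strings are assumptions for illustration, not a description of how Transkribus computes the figure internally.

```python
def edit_distance(reference: str, hypothesis: str) -> int:
    """Levenshtein distance: minimum number of character insertions,
    deletions, and substitutions that turn `reference` into `hypothesis`."""
    # previous[j] = distance between the reference prefix seen so far
    # and the first j characters of the hypothesis.
    previous = list(range(len(hypothesis) + 1))
    for i, ref_char in enumerate(reference, start=1):
        current = [i]
        for j, hyp_char in enumerate(hypothesis, start=1):
            substitution_cost = 0 if ref_char == hyp_char else 1
            current.append(min(
                previous[j] + 1,                      # deletion
                current[j - 1] + 1,                   # insertion
                previous[j - 1] + substitution_cost,  # match or substitution
            ))
        previous = current
    return previous[-1]


def character_error_rate(reference: str, hypothesis: str) -> float:
    """CER as edit distance divided by reference length (assumed definition)."""
    if not reference:
        raise ValueError("reference text must not be empty")
    return edit_distance(reference, hypothesis) / len(reference)


if __name__ == "__main__":
    # Hypothetical sample: a corrected line and a recognition result
    # in which "m" was misread as "rn".
    corrected = "in nomine domini amen"
    recognized = "in nomine dornini amen"
    print(f"CER: {character_error_rate(corrected, recognized):.1%}")
```

Under this definition, a model that recognizes over 97 percent of characters correctly has a CER below 3 percent, which matches how the second example is described.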