Documentation of metagrapho-tropy
client.py
Client class.
- class metagrapho_tropy.client.Client(user=None, password=None, api=None, processing_data=None)
Standalone client.
- Parameters:
user (Optional[str]) – User’s Transkribus email, defaults to None
password (Optional[str]) – User’s Transkribus password, defaults to None
api (Optional[TranskribusProcessingAPI]) – Transkribus metagrapho API wrapper instance, defaults to None
processing_data (Optional[list]) – map of item IDs to Transkribus metagrapho API processing IDs, defaults to None
- static _load_mapping(mapping_file_path)
Load mapping as dictionary with Tropy item ID as key and list of image index, processing ID as value.
- Return type:
dict
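The shape of the returned dictionary can be illustrated with a minimal sketch. The CSV layout is not documented here, so the three-column order (item ID, image index, processing ID) and the header row are assumptions, and `load_mapping_sketch` is a hypothetical name, not the library's implementation.

```python
import csv
from collections import defaultdict


def load_mapping_sketch(mapping_file_path: str) -> dict:
    """Group CSV rows by item ID (assumed column order, see lead-in)."""
    mapping = defaultdict(list)
    with open(mapping_file_path, newline="", encoding="utf-8") as handle:
        reader = csv.reader(handle)
        next(reader, None)  # skip the assumed header row
        for item_id, image_index, processing_id in reader:
            # One [image index, processing ID] pair per processed image.
            mapping[item_id].append([int(image_index), int(processing_id)])
    return dict(mapping)
```

An item with several processed images ends up with several pairs under the same key.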
- _process_image(item, item_image_index, line_model_id=None, atr_model_id=None, lowest_common_dir=None)
Process a single image.
- Parameters:
item (Item) – a Tropy item
item_image_index (int) – the selected item’s index
line_model_id (Optional[int]) – the Transkribus line model ID, defaults to None
atr_model_id (Optional[int]) – the Transkribus ATR model ID, defaults to None
lowest_common_dir (Optional[str]) – lowest common directory, defaults to None
- Return type:
None
- static _repath(image_path, lowest_common_dir)
Refactor a Tropy image path from machine X to be used on machine Y given the lowest common directory of X and Y.
- Parameters:
image_path (str) – the Tropy image path
lowest_common_dir (str) – the lowest common directory
- Return type:
str
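One plausible reading of this re-rooting step is sketched below. It assumes that `lowest_common_dir` is the path on the current machine ending in a directory whose name also appears in the Tropy image path. The actual behaviour of `_repath` may differ, and `repath_sketch` is a hypothetical name.

```python
from pathlib import PurePosixPath


def repath_sketch(image_path: str, lowest_common_dir: str) -> str:
    """Re-root image_path under lowest_common_dir (assumed semantics)."""
    # Normalize Windows separators so both path styles split the same way.
    parts = image_path.replace("\\", "/").split("/")
    common = PurePosixPath(lowest_common_dir.replace("\\", "/"))
    anchor = common.name  # name of the directory shared by both machines
    if anchor not in parts:
        raise ValueError(f"{anchor!r} does not occur in {image_path!r}")
    # Use the last occurrence of the shared directory in the image path.
    index = len(parts) - 1 - parts[::-1].index(anchor)
    # Keep everything below the shared directory and attach it to the new root.
    return str(common.joinpath(*parts[index + 1:]))
```

With these assumptions, `repath_sketch("C:\\Users\\alice\\project\\images\\a.jpg", "/home/bob/project")` yields `/home/bob/project/images/a.jpg`.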
- static _validate(tropy_file_path, tropy_save_path=None, mapping_file_path=None, mapping_save_path=None, item_type=None, item_tag=None, item_image_index=None, line_model_id=None, atr_model_id=None, lowest_common_dir=None)
Validate user input and initialize Tropy instance.
- Parameters:
tropy_file_path (str) – complete path to Tropy export file including file extension
tropy_save_path (Optional[str]) – complete path to updated Tropy save file including file extension, defaults to None
mapping_save_path (Optional[str]) – complete path to CSV mapping save file including file extension, defaults to None
item_type (Optional[str]) – the item type, defaults to None
item_tag (Optional[str]) – the item tag, defaults to None
item_image_index (Optional[int]) – the selected item’s index, defaults to None
line_model_id (Optional[int]) – the Transkribus line model ID, defaults to None
atr_model_id (Optional[int]) – the Transkribus ATR model ID, defaults to None
lowest_common_dir (Optional[str]) – lowest common directory, defaults to None
- Return type:
- download(mapping_file_path, download_save_path=None)
Download image to text transcriptions for Tropy items from the Transkribus Processing API initialized with the Client.process_tropy method.
- Parameters:
mapping_file_path (str) – complete path to CSV mapping file including file extension
download_save_path (Optional[str]) – complete path to download JSON save file including file extension, defaults to None
- Return type:
None
- enrich_tropy(tropy_file_path, download_file_path, tropy_save_path=None, lines=False)
Enrich items in a Tropy export JSON-LD with transcriptions.
The transcriptions must be provided in a separate file generated by running Client.process_tropy and Client.download first.
- Parameters:
tropy_file_path (str) – complete path to Tropy export file including file extension
download_file_path (str) – complete path to JSON download file including file extension
tropy_save_path (Optional[str]) – complete path to enriched Tropy save file including file extension, defaults to None
lines (bool) – toggle line by line transcription as selection elements, defaults to False
- Return type:
None
- process_tropy(tropy_file_path, tropy_save_path=None, mapping_save_path=None, item_type=None, item_tag=None, item_image_index=None, line_model_id=49272, atr_model_id=39995, lowest_common_dir=None)
Process selected Tropy items to yield image to text transcriptions.
Provide a Tropy export JSON-LD file. Items are selected via type and tag (optional and conjunctive). If no selection is made, all items are enriched. Images are selected via their index. If no specific image is selected, image to text is applied to all images. Processed items get the tag “atr_processed” and are saved to an updated JSON-LD file; in addition, there is a CSV file mapping items to processing IDs. The Transkribus Processing API generates the transcription based on a layout detection model and an ATR model, both customizable via their IDs. If the Tropy image paths do not correspond to the image paths on the machine running this module, provide the lowest common directory shared by both paths. Use the Client.download method to download the transcription from the Transkribus Processing API (do this within at most 24 hours).
- Parameters:
tropy_file_path (str) – complete path to Tropy export file including file extension
tropy_save_path (Optional[str]) – complete path to updated Tropy save file including file extension, defaults to None
mapping_save_path (Optional[str]) – complete path to CSV mapping save file including file extension, defaults to None
item_type (Optional[str]) – the item type, defaults to None
item_tag (Optional[str]) – the item tag, defaults to None
item_image_index (Optional[int]) – the selected item’s index, defaults to None
line_model_id (int) – the Transkribus line model ID, defaults to 49272 (= Mixed Text Line Orientation)
atr_model_id (int) – the Transkribus ATR model ID, defaults to 39995 (= Transkribus Print M1)
lowest_common_dir (Optional[str]) – the lowest common directory, defaults to None
- Return type:
None
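The three public Client methods are meant to be run in sequence: process_tropy, then download, then enrich_tropy. The following sketch shows how the file paths could be threaded from one step to the next. `run_workflow`, the output file names, and the choice to enrich the updated export rather than the original one are assumptions; `client` is any object that behaves like metagrapho_tropy.client.Client.

```python
from pathlib import Path


def run_workflow(client, tropy_file_path: str, work_dir: str) -> dict:
    """Call the documented Client methods in order (hypothetical helper)."""
    out = Path(work_dir)
    # Hypothetical file names; only the roles of the files come from the docs.
    paths = {
        "tropy_processed": str(out / "tropy_processed.json"),
        "mapping": str(out / "mapping.csv"),
        "download": str(out / "download.json"),
        "tropy_enriched": str(out / "tropy_enriched.json"),
    }
    # Step 1: submit images; saves the updated export and the CSV mapping.
    client.process_tropy(
        tropy_file_path=tropy_file_path,
        tropy_save_path=paths["tropy_processed"],
        mapping_save_path=paths["mapping"],
    )
    # Step 2: fetch the transcriptions for the processing IDs in the mapping.
    client.download(
        mapping_file_path=paths["mapping"],
        download_save_path=paths["download"],
    )
    # Step 3: write the transcriptions into a Tropy export (assumed: the updated one).
    client.enrich_tropy(
        tropy_file_path=paths["tropy_processed"],
        download_file_path=paths["download"],
        tropy_save_path=paths["tropy_enriched"],
        lines=False,
    )
    return paths
```

In practice the client would be created with something like `Client(user=..., password=...)`. Since processing happens on the Transkribus side, steps 1 and 2 may need to be run separately, with the download following within the 24-hour window.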
processing_api.py
TranskribusProcessingAPI class.
- class metagrapho_tropy.api.TranskribusProcessingAPI(user, password)
Wrapper class of the Transkribus Processing API (Transkribus metagrapho API).
Swagger documentation of the API at https://transkribus.eu/processing/swagger/.
- Parameters:
user (str) – Transkribus username
password (str) – Transkribus password
- static authenticate(user, password)
Wrapper of oAuth2AuthCode.
- Parameters:
user (str) – the username
password (str) – the password
- Return type:
Response
- get_result(process_id)
Wrapper of https://transkribus.eu/processing/swagger/#/Retrieve%20processing%20status%20and%20result.
- Parameters:
process_id (int) – the Transkribus Processing API “processId” parameter
- get_user()
Wrapper of https://transkribus.eu/processing/swagger/#/User%20Account/getUserInfo.
- Return type:
Response
- post_processes(line_model_id, atr_model_id, image)
Wrapper of https://transkribus.eu/processing/swagger/#/Submit%20data%20for%20processing.
- Parameters:
line_model_id (int) – the Transkribus layout detection model ID
atr_model_id (int) – the Transkribus ATR model ID
image (str) – an image encoded to Base64
- Return type:
Response
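Since post_processes expects the image as a Base64 string, a file has to be encoded before submission. A minimal sketch using only the standard library is shown below. `encode_image_to_base64` is a hypothetical helper, not part of the documented API.

```python
import base64


def encode_image_to_base64(image_path: str) -> str:
    """Read an image file and return its content as a Base64 text string."""
    with open(image_path, "rb") as handle:
        # b64encode returns bytes; decode to str for use in a JSON payload.
        return base64.b64encode(handle.read()).decode("ascii")
```

The resulting string could then be passed as the `image` argument, for example `api.post_processes(49272, 39995, encode_image_to_base64(path))`, where the two IDs are the defaults documented for process_tropy.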
items.py
Item classes.
- class metagrapho_tropy.item.Item(template='https://tropy.org/v1/templates/id#iTbU0YBP', LocationShown=None, LocationCreated=None, PersonInImage=None, PersonInImageWDetails=None, title=None, creator=None, dcterms_creator=None, date=None, dcterms_date=None, type=None, source=None, collection=None, box=None, folder=None, object=None, identifier=None, rights=None, hasPart=None, isPartOf=None, isRelatedTo=None, photo=None, list=None, tag=None, note=None)
A representation of a Tropy item.
The schema used (https://github.com/RISE-UNIBAS/bildersammlung-buddhismus-public/blob/main/indexing/AneignungBuddhismus.ttp) is an extension of the generic Tropy item.
Parameter naming prioritizes Tropy naming conventions for fields over PEP 8.
- Parameters:
template (str) – http://purl.org/dc/elements/1.1/type
title (Optional[str]) – http://purl.org/dc/elements/1.1/title
LocationShown (Optional[str]) – http://iptc.org/std/Iptc4xmpExt/2008-02-29/LocationShown
LocationCreated (Optional[str]) – http://iptc.org/std/Iptc4xmpExt/2008-02-29/LocationCreated
PersonInImage (Optional[str]) – http://iptc.org/std/Iptc4xmpExt/2008-02-29/PersonInImage
PersonInImageWDetails (Optional[str]) – http://iptc.org/std/Iptc4xmpExt/2008-02-29/PersonInImageWDetails
creator (Optional[str]) – http://purl.org/dc/elements/1.1/creator
dcterms_creator (Optional[str]) – http://purl.org/dc/terms/creator
date (Optional[str]) – http://purl.org/dc/elements/1.1/date
dcterms_date (Optional[str]) – http://purl.org/dc/terms/date
type (Optional[str]) – http://purl.org/dc/elements/1.1/type
source (Optional[str]) – http://purl.org/dc/elements/1.1/source
collection (Optional[str]) – https://tropy.org/v1/tropy#collection
box (Optional[str]) – https://tropy.org/v1/tropy#box
folder (Optional[str]) – https://tropy.org/v1/tropy#folder
object (Optional[str]) – http://www.europeana.eu/schemas/edm/object
identifier (Optional[str]) – http://purl.org/dc/elements/1.1/identifier
rights (Optional[str]) – http://purl.org/dc/elements/1.1/rights
hasPart (Optional[str]) – http://purl.org/dc/terms/hasPart
isPartOf (Optional[str]) – http://purl.org/dc/terms/isPartOf
isRelatedTo (Optional[str]) – http://www.europeana.eu/schemas/edm/isRelatedTo
photo (Optional[List[Dict]]) – https://tropy.org/v1/tropy#photo
list (Optional[List[str]]) – Tropy list
tag (Optional[list]) – Tropy tag
note (Optional[list]) – Tropy note
- add_note_element(text, photo_index, language='de')
Add a note element to a photo.
- Parameters:
text (str) – the note element’s text
photo_index (int) – the photo to which the note will attach
language (str) – the note’s language, defaults to ‘de’
- Return type:
None
- add_selection_element(text, photo_index, coords, language='de')
Add a selection element with a line transcription to a photo.
- Parameters:
text (str) – the note element’s text
photo_index (int) – the photo to which the note will attach
coords (str) – Transkribus coordinates
language (str) – the note’s language, defaults to ‘de’
- copy_metadata_from_dict(dictionary)
Copy metadata from dictionary.
- Return type:
None
- copy_metadata_from_item(item, *args)
Copy metadata from item.
- Parameters:
item (Item) – the item from which metadata is copied
args – deselected attributes (values not copied)
- Return type:
None
- static get_inscribed_map()
Get mapping of field to inscribed field.
- Return type:
dict
- static get_normalized_tropy_field_names()
Get normalized keys.
- Return type:
dict
- serialize()
Serialize item as dictionary.
- Return type:
dict
- static transform_coordinates(coordinates)
Transform Transkribus coordinates points to Tropy coordinates.
Sample Transkribus coordinates points: ‘192,458 192,514 332,514 332,458’. Read the tuple ‘192, 458’ as ‘x, y’ where ‘0, 0’ is the top left corner of an image. Note that the y-axis is inverted (going down is positive).
- Parameters:
coordinates (str) – value of Transkribus ‘coords’ key
- Return type:
list[int]
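How such a points string could be turned into a rectangle is sketched below. The output layout `[x, y, width, height]` is an assumption based on how Tropy selections are commonly described; the library's actual return order is not stated here, and `transform_coordinates_sketch` is a hypothetical name.

```python
def transform_coordinates_sketch(coordinates: str) -> list[int]:
    """Convert 'x,y x,y ...' points to [x, y, width, height] (assumed layout)."""
    # Split the string into points, then each point into integer x and y.
    points = [tuple(int(value) for value in pair.split(","))
              for pair in coordinates.split()]
    xs = [x for x, _ in points]
    ys = [y for _, y in points]
    # The top left corner is the minimum of both axes because y grows downward.
    return [min(xs), min(ys), max(xs) - min(xs), max(ys) - min(ys)]
```

For the sample string '192,458 192,514 332,514 332,458' this gives `[192, 458, 140, 56]`: a box starting at (192, 458) that is 140 pixels wide and 56 pixels high.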
tropy.py
Tropy class.
- class metagrapho_tropy.tropy.Tropy(json_export)
A representation of a Tropy export.
- Parameters:
json_export (
dict
) – loaded Tropy JSON export file
- get_types()
Get deduplicated values of the items’ type fields.
- Return type:
set
- save(file_path)
Save Tropy export to file path.
- Parameters:
file_path – complete path to file including filename and extension
- Return type:
None
utility.py
Utility class.
- class metagrapho_tropy.utility.Utility
A collection of utility functions.
- static load_csv(file_path)
Load CSV from file as list.
- Parameters:
file_path (str) – complete path to file including filename and extension
- Return type:
list
- static load_json(file_path)
Load a JSON object from file.
- Parameters:
file_path (str) – complete path to file including filename and extension
- Return type:
dict
- static save_csv(header, data, file_path)
Save data as CSV file.
- Parameters:
header (List) – the header
data (List) – the data to be saved
file_path (str) – complete path to file including filename and extension
- Return type:
None
- static save_json(data, file_path)
Save data as JSON file.
- Parameters:
data (Union[List, Dict]) – the data to be saved
file_path (str) – complete path to file including filename and extension
- Return type:
None