Documentation of metagrapho-tropy

client.py

Client class.

class metagrapho_tropy.client.Client(user=None, password=None, api=None, processing_data=None)

Standalone client.

Parameters:
  • user (Optional[str]) – User’s Transkribus email, defaults to None

  • password (Optional[str]) – User’s Transkribus password, defaults to None

  • api (Optional[TranskribusProcessingAPI]) – Transkribus metagrapho API wrapper instance, defaults to None

  • processing_data (Optional[list]) – map of item IDs to Transkribus metagrapho API processing IDs, defaults to None

static _load_mapping(mapping_file_path)

Load the mapping as a dictionary with the Tropy item ID as key and a list of image index and processing ID pairs as value.

Return type:

dict
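
The dictionary shape described above can be illustrated with a minimal sketch. It assumes the CSV mapping has a header row and three columns (item ID, image index, processing ID) in that order; the column layout and the name load_mapping_sketch are assumptions for illustration, not the module's actual implementation.

```python
import csv
import io
from collections import defaultdict


def load_mapping_sketch(csv_text: str) -> dict:
    """Group [image index, processing ID] pairs by Tropy item ID."""
    reader = csv.reader(io.StringIO(csv_text))
    next(reader)  # skip the header row (assumed to be present)
    mapping = defaultdict(list)
    for item_id, image_index, processing_id in reader:
        mapping[item_id].append([int(image_index), int(processing_id)])
    return dict(mapping)


# Hypothetical mapping file content, inlined so the sketch runs without files.
sample = (
    "item_id,image_index,processing_id\n"
    "item-1,0,1001\n"
    "item-1,1,1002\n"
    "item-2,0,1003\n"
)
mapping = load_mapping_sketch(sample)
```

An item with several processed images ends up with several pairs under the same key.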

_process_image(item, item_image_index, line_model_id=None, atr_model_id=None, lowest_common_dir=None)

Process a single image.

Parameters:
  • item (Item) – a Tropy item

  • item_image_index (int) – index of the selected image within the item

  • line_model_id (Optional[int]) – the Transkribus line model ID, defaults to None

  • atr_model_id (Optional[int]) – the Transkribus ATR model ID, defaults to None

  • lowest_common_dir (Optional[str]) – lowest common directory, defaults to None

Return type:

None

static _repath(image_path, lowest_common_dir)

Rewrite a Tropy image path from machine X so that it can be used on machine Y, given the lowest common directory of X and Y.

Parameters:
  • image_path (str) – the Tropy image path

  • lowest_common_dir (str) – the lowest common directory

Return type:

str
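
One way to read this is sketched below. It assumes that lowest_common_dir is the path on machine Y to a directory whose name also occurs in the path from machine X, and that everything after that directory is identical on both machines. The function name repath_sketch and the sample paths are hypothetical; the module's own logic may differ.

```python
def repath_sketch(image_path: str, lowest_common_dir: str) -> str:
    """Re-root image_path under lowest_common_dir (assumed semantics)."""
    # Normalize Windows separators so both inputs can be split the same way.
    parts = image_path.replace("\\", "/").split("/")
    common = lowest_common_dir.replace("\\", "/").rstrip("/")
    anchor = common.split("/")[-1]  # name of the shared directory
    if anchor not in parts:
        raise ValueError(f"{anchor!r} does not occur in {image_path!r}")
    # Keep only the part of the original path below the shared directory.
    tail = parts[parts.index(anchor) + 1:]
    return "/".join([common, *tail])


# Hypothetical paths: the export was created on Windows and is processed on Linux.
local_path = repath_sketch(
    "C:\\Users\\alice\\project\\images\\a.jpg", "/home/bob/project"
)
```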

static _validate(tropy_file_path, tropy_save_path=None, mapping_file_path=None, mapping_save_path=None, item_type=None, item_tag=None, item_image_index=None, line_model_id=None, atr_model_id=None, lowest_common_dir=None)

Validate user input and initialize Tropy instance.

Parameters:
  • tropy_file_path (str) – complete path to Tropy export file including file extension

  • tropy_save_path (Optional[str]) – complete path to updated Tropy save file including file extension, defaults to None

  • mapping_file_path (Optional[str]) – complete path to CSV mapping file including file extension, defaults to None

  • mapping_save_path (Optional[str]) – complete path to CSV mapping save file including file extension, defaults to None

  • item_type (Optional[str]) – the item type, defaults to None

  • item_tag (Optional[str]) – the item tag, defaults to None

  • item_image_index (Optional[int]) – index of the selected image within the item, defaults to None

  • line_model_id (Optional[int]) – the Transkribus line model ID, defaults to None

  • atr_model_id (Optional[int]) – the Transkribus ATR model ID, defaults to None

  • lowest_common_dir (Optional[str]) – lowest common directory, defaults to None

Return type:

Tropy

download(mapping_file_path, download_save_path=None)

Download image-to-text transcriptions for Tropy items from the Transkribus Processing API. The processing must have been started beforehand with the Client.process_tropy method.

Parameters:
  • mapping_file_path (str) – complete path to CSV mapping file including file extension

  • download_save_path (Optional[str]) – complete path to download JSON save file including file extension, defaults to None

Return type:

None

enrich_tropy(tropy_file_path, download_file_path, tropy_save_path=None, lines=False)

Enrich items in a Tropy export JSON-LD with transcriptions.

The transcriptions must be provided in a separate file generated by running Client.process_tropy and Client.download first.

Parameters:
  • tropy_file_path (str) – complete path to Tropy export file including file extension

  • download_file_path (str) – complete path to JSON download file including file extension

  • tropy_save_path (Optional[str]) – complete path to enriched Tropy save file including file extension, defaults to None

  • lines (bool) – if True, add line-by-line transcriptions as selection elements, defaults to False

Return type:

None
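
The required call order (process_tropy, then download, then enrich_tropy) can be sketched as follows. The sketch only uses the method signatures documented here and runs against a stand-in object so that it works offline; in real use, client would be a metagrapho_tropy.client.Client created with valid credentials. All file names are hypothetical, and passing the updated Tropy file (rather than the original export) to enrich_tropy is an assumption.

```python
class RecordingClient:
    """Stand-in that only records method calls, so the sketch runs offline."""

    def __init__(self):
        self.calls = []

    def process_tropy(self, tropy_file_path, **kwargs):
        self.calls.append(("process_tropy", tropy_file_path, kwargs))

    def download(self, mapping_file_path, **kwargs):
        self.calls.append(("download", mapping_file_path, kwargs))

    def enrich_tropy(self, tropy_file_path, download_file_path, **kwargs):
        self.calls.append(("enrich_tropy", tropy_file_path, kwargs))


def run_pipeline(client, tropy_file_path: str, work_dir: str) -> dict:
    """Call the three documented methods in the order the docs require."""
    paths = {
        "processed": f"{work_dir}/tropy_processed.json",
        "mapping": f"{work_dir}/mapping.csv",
        "download": f"{work_dir}/download.json",
        "enriched": f"{work_dir}/tropy_enriched.json",
    }
    # Step 1: submit images and write the item-to-processing-ID mapping.
    client.process_tropy(
        tropy_file_path,
        tropy_save_path=paths["processed"],
        mapping_save_path=paths["mapping"],
    )
    # Step 2: fetch the finished transcriptions using the mapping.
    client.download(paths["mapping"], download_save_path=paths["download"])
    # Step 3: write the transcriptions back into a Tropy JSON-LD file.
    client.enrich_tropy(
        paths["processed"],  # assumption: enrich the updated file
        paths["download"],
        tropy_save_path=paths["enriched"],
        lines=False,
    )
    return paths


client = RecordingClient()
paths = run_pipeline(client, "export.json", "out")
```

With the real client, a pause is likely needed between the first and second step, since the transcriptions are produced asynchronously by the Transkribus Processing API.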

process_tropy(tropy_file_path, tropy_save_path=None, mapping_save_path=None, item_type=None, item_tag=None, item_image_index=None, line_model_id=49272, atr_model_id=39995, lowest_common_dir=None)

Process selected Tropy items to yield image-to-text transcriptions.

Provide a Tropy export JSON-LD file. Items are selected via type and tag; both filters are optional and, if both are given, combined conjunctively. If no selection is made, all items are processed. Images are selected via their index. If no specific image is selected, image-to-text is applied to all images.

Processed items get the tag “atr_processed” and are saved to an updated JSON-LD file; in addition, a CSV file maps items to processing IDs. The Transkribus Processing API generates the transcription based on a layout detection model and an ATR model, both customizable via their IDs.

If the Tropy image paths do not correspond to the image paths on the machine running this module, provide the lowest common directory shared by both paths. Use the Client.download method to download the transcription from the Transkribus Processing API within 24 hours of processing.

Parameters:
  • tropy_file_path (str) – complete path to Tropy export file including file extension

  • tropy_save_path (Optional[str]) – complete path to updated Tropy save file including file extension, defaults to None

  • mapping_save_path (Optional[str]) – complete path to CSV mapping save file including file extension, defaults to None

  • item_type (Optional[str]) – the item type, defaults to None

  • item_tag (Optional[str]) – the item tag, defaults to None

  • item_image_index (Optional[int]) – index of the selected image within the item, defaults to None

  • line_model_id (int) – the Transkribus line model ID, defaults to 49272 (= Mixed Text Line Orientation)

  • atr_model_id (int) – the Transkribus ATR model ID, defaults to 39995 (= Transkribus Print M1)

  • lowest_common_dir (Optional[str]) – the lowest common directory, defaults to None

Return type:

None
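
The item selection rule (optional filters, combined conjunctively, no filter selects everything) can be sketched as below. Items are represented as plain dictionaries with a "type" string and a "tag" list; this representation and the name select_items are assumptions for illustration, not the module's data model.

```python
from typing import Optional


def select_items(
    items: list,
    item_type: Optional[str] = None,
    item_tag: Optional[str] = None,
) -> list:
    """Return the items matching every filter that is not None."""
    selected = []
    for item in items:
        if item_type is not None and item.get("type") != item_type:
            continue  # type filter given but not matched
        if item_tag is not None and item_tag not in item.get("tag", []):
            continue  # tag filter given but not matched
        selected.append(item)
    return selected


# Hypothetical items.
items = [
    {"id": 1, "type": "Letter", "tag": ["todo"]},
    {"id": 2, "type": "Letter", "tag": []},
    {"id": 3, "type": "Photo", "tag": ["todo"]},
]
letters_todo = select_items(items, item_type="Letter", item_tag="todo")
everything = select_items(items)
```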

processing_api.py

TranskribusProcessingAPI class.

class metagrapho_tropy.api.TranskribusProcessingAPI(user, password)

Wrapper class of the Transkribus Processing API (Transkribus metagrapho API).

Swagger documentation of the API at https://transkribus.eu/processing/swagger/.

Parameters:
  • user (str) – Transkribus username

  • password (str) – Transkribus password

static authenticate(user, password)

Wrapper of oAuth2AuthCode.

Parameters:
  • user (str) – the username

  • password (str) – the password

Return type:

Response

get_result(process_id)

Wrapper of https://transkribus.eu/processing/swagger/#/Retrieve%20processing%20status%20and%20result.

Parameters:

process_id (int) – the Transkribus Processing API “processId” parameter

get_user()

Wrapper of https://transkribus.eu/processing/swagger/#/User%20Account/getUserInfo.

Return type:

Response

post_processes(line_model_id, atr_model_id, image)

Wrapper of https://transkribus.eu/processing/swagger/#/Submit%20data%20for%20processing.

Parameters:
  • line_model_id (int) – the Transkribus layout detection model ID

  • atr_model_id (int) – the Transkribus ATR model ID

  • image (str) – an image encoded to Base64

Return type:

Response
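
The image parameter expects a Base64 string rather than raw bytes. A minimal sketch of that encoding step using only the standard library is shown below; the helper name encode_image is hypothetical, and reading the bytes from an actual image file is left to the caller.

```python
import base64


def encode_image(image_bytes: bytes) -> str:
    """Encode raw image bytes as an ASCII Base64 string."""
    return base64.b64encode(image_bytes).decode("ascii")


# The eight-byte PNG signature stands in for real image content here.
png_signature = b"\x89PNG\r\n\x1a\n"
encoded = encode_image(png_signature)
```

For a file on disk, the bytes would typically come from open(path, "rb").read().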

items.py

Item classes.

class metagrapho_tropy.item.Item(template='https://tropy.org/v1/templates/id#iTbU0YBP', LocationShown=None, LocationCreated=None, PersonInImage=None, PersonInImageWDetails=None, title=None, creator=None, dcterms_creator=None, date=None, dcterms_date=None, type=None, source=None, collection=None, box=None, folder=None, object=None, identifier=None, rights=None, hasPart=None, isPartOf=None, isRelatedTo=None, photo=None, list=None, tag=None, note=None)

A representation of a Tropy item.

The schema used (https://github.com/RISE-UNIBAS/bildersammlung-buddhismus-public/blob/main/indexing/AneignungBuddhismus.ttp) is an extension of the generic Tropy item.

Parameter naming prioritizes Tropy naming conventions for fields over PEP.

add_note_element(text, photo_index, language='de')

Add a note element to a photo.

Parameters:
  • text (str) – the note element’s text

  • photo_index (int) – index of the photo to which the note is attached

  • language (str) – the note’s language, defaults to ‘de’

Return type:

None

add_selection_element(text, photo_index, coords, language='de')

Add a selection element with a line transcription to a photo.

Parameters:
  • text (str) – the note element’s text

  • photo_index (int) – index of the photo to which the note is attached

  • coords (str) – Transkribus coordinates

  • language (str) – the note’s language, defaults to ‘de’

copy_metadata_from_dict(dictionary)

Copy metadata from dictionary.

Return type:

None

copy_metadata_from_item(item, *args)

Copy metadata from item.

Parameters:
  • item (Item) – the item from which metadata is copied

  • args – deselected attributes (values not copied)

Return type:

None

static get_inscribed_map()

Get mapping of field to inscribed field.

Return type:

dict

static get_normalized_tropy_field_names()

Get normalized keys.

Return type:

dict

serialize()

Serialize item as dictionary.

Return type:

dict

static transform_coordinates(coordinates)

Transform Transkribus coordinate points to Tropy coordinates.

Sample Transkribus coordinate points: ‘192,458 192,514 332,514 332,458’. Read the tuple ‘192, 458’ as ‘x, y’ where ‘0, 0’ is the top left corner of an image. Note that the y-axis is inverted (going down is positive).

Parameters:

coordinates (str) – value of Transkribus ‘coords’ key

Return type:

list[int]
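
The parsing of the sample string can be sketched as follows. The sketch assumes that the Tropy coordinates are a bounding box given as [x, y, width, height] with the origin at the top left corner; the documentation above only states the return type, so this output layout and the name to_bounding_box are assumptions.

```python
def to_bounding_box(coordinates: str) -> list[int]:
    """Turn 'x1,y1 x2,y2 ...' into [x, y, width, height] (assumed layout)."""
    # Each whitespace-separated token is one 'x,y' point.
    points = [tuple(int(v) for v in pair.split(",")) for pair in coordinates.split()]
    xs = [x for x, _ in points]
    ys = [y for _, y in points]
    # The smallest x and y give the top left corner because y grows downwards.
    return [min(xs), min(ys), max(xs) - min(xs), max(ys) - min(ys)]


box = to_bounding_box("192,458 192,514 332,514 332,458")
```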

tropy.py

Tropy class.

class metagrapho_tropy.tropy.Tropy(json_export)

A representation of a Tropy export.

Parameters:

json_export (dict) – loaded Tropy JSON export file

get_types()

Get deduplicated values of the items’ type fields.

Return type:

set
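
Deduplication via a set can be sketched as below. The sketch assumes that the loaded export keeps its items in a list under the "@graph" key and that each item stores its type under a "type" key; both key names and the function name get_types_sketch are assumptions about the export layout.

```python
def get_types_sketch(json_export: dict) -> set:
    """Collect the distinct values of the items' type fields."""
    items = json_export.get("@graph", [])  # assumed location of the items
    return {item["type"] for item in items if "type" in item}


# Hypothetical, heavily reduced export.
export = {
    "@graph": [
        {"title": "A", "type": "Letter"},
        {"title": "B", "type": "Photo"},
        {"title": "C", "type": "Letter"},
        {"title": "D"},  # item without a type field
    ]
}
types = get_types_sketch(export)
```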

save(file_path)

Save Tropy export to file path.

Parameters:

file_path – complete path to file including filename and extension

Return type:

None

utility.py

Utility class.

class metagrapho_tropy.utility.Utility

A collection of utility functions.

static load_csv(file_path)

Load CSV from file as list.

Parameters:

file_path (str) – complete path to file including filename and extension

Return type:

list

static load_json(file_path)

Load a JSON object from file.

Parameters:

file_path (str) – complete path to file including filename and extension

Return type:

dict

static save_csv(header, data, file_path)

Save data as CSV file.

Parameters:
  • header (List) – the header

  • data (List) – the data to be saved

  • file_path (str) – complete path to file including filename and extension

Return type:

None
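
A round trip through the two CSV helpers could look like the sketch below, written with the standard csv module. Whether the real load_csv returns the header row as part of the list is not stated above, so this sketch (with the hypothetical names save_csv_sketch and load_csv_sketch) simply returns every row.

```python
import csv
import os
import tempfile


def save_csv_sketch(header: list, data: list, file_path: str) -> None:
    """Write the header followed by the data rows."""
    with open(file_path, "w", newline="", encoding="utf-8") as file:
        writer = csv.writer(file)
        writer.writerow(header)
        writer.writerows(data)


def load_csv_sketch(file_path: str) -> list:
    """Read all rows, including the header, as lists of strings."""
    with open(file_path, newline="", encoding="utf-8") as file:
        return list(csv.reader(file))


with tempfile.TemporaryDirectory() as tmp_dir:
    path = os.path.join(tmp_dir, "mapping.csv")
    save_csv_sketch(["item_id", "processing_id"], [["item-1", 1001]], path)
    rows = load_csv_sketch(path)
```

Note that CSV carries no type information, so numbers come back as strings.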

static save_json(data, file_path)

Save data as JSON file.

Parameters:
  • data (Union[List, Dict]) – the data to be saved

  • file_path (str) – complete path to file including filename and extension

Return type:

None
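
The JSON helpers can be sketched in the same way with the standard json module. The names save_json_sketch and load_json_sketch, the indentation, and the UTF-8 encoding are assumptions, not the module's confirmed settings.

```python
import json
import os
import tempfile
from typing import Union


def save_json_sketch(data: Union[list, dict], file_path: str) -> None:
    """Write data as human-readable UTF-8 JSON."""
    with open(file_path, "w", encoding="utf-8") as file:
        json.dump(data, file, ensure_ascii=False, indent=2)


def load_json_sketch(file_path: str) -> dict:
    """Read a JSON object from a file."""
    with open(file_path, encoding="utf-8") as file:
        return json.load(file)


with tempfile.TemporaryDirectory() as tmp_dir:
    path = os.path.join(tmp_dir, "download.json")
    save_json_sketch({"item-1": ["Zürich, 1920"]}, path)
    loaded = load_json_sketch(path)
```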
