Published June 2, 2024 | Version v1
Presentation Open

Fuzzy Name Matching for Untangling Provenance of Colonial Heritage

  • 1. Vrije Universiteit Amsterdam

Description

This study addresses the challenges museums face in managing collections, particularly in determining rightful ownership amid ethical and legal complexities. It focuses on enriching museum datasets with collector background information and on linking person names across heritage institutions. Key contributions include the application of string matching techniques: five methods were explored (including Exact String Matching, Initial + Surname Matching, Surname Matching, and Fuzzy String Matching), with their strengths and limitations in balancing precision and recall described in detail. The research adapted DeezyMatch, an open-source Python library, for person name matching using the multilingual JRC-Names dataset, and fine-tuned the model with person instances from the National Museum of World Cultures (NMVW) and additional data points to improve performance. Evaluations through ground truth assessments and approximate name matching tasks showed that DeezyMatch achieved high precision, but also that human intervention is needed to improve accuracy in complex cases. Using datasets from the NMVW and Museum Bronbeek, the study demonstrated the practical application and limitations of string matching approaches in real-world scenarios. The implications for future research emphasize the importance of human-in-the-loop approaches for disambiguation in provenance research and lay a foundation for using collector background information to improve the integration and accuracy of museum datasets. The study contributes to the field by providing a comprehensive analysis of string matching techniques and their application to provenance research on museum collections.
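To make the precision and recall trade-off between the named matching strategies concrete, the following is a minimal sketch in Python using only the standard library. It is not the study's implementation and does not use DeezyMatch. The function names, the normalization rules, the assumption that the surname is the last token of a name, and the 0.85 similarity threshold are all illustrative assumptions.

```python
from difflib import SequenceMatcher


def tokens(name: str) -> list[str]:
    """Lowercase a name, drop periods, and split on whitespace (assumed normalization)."""
    return name.lower().replace(".", " ").split()


def exact_match(a: str, b: str) -> bool:
    """Strictest rule: all normalized tokens must be identical."""
    return tokens(a) == tokens(b)


def initial_surname_match(a: str, b: str) -> bool:
    """First initial and surname must agree.

    Assumption: the surname is the last token of the name.
    """
    ta, tb = tokens(a), tokens(b)
    if not ta or not tb:
        return False
    return ta[0][0] == tb[0][0] and ta[-1] == tb[-1]


def surname_match(a: str, b: str) -> bool:
    """Loosest rule: only the (assumed last-token) surname must agree."""
    ta, tb = tokens(a), tokens(b)
    if not ta or not tb:
        return False
    return ta[-1] == tb[-1]


def fuzzy_match(a: str, b: str, threshold: float = 0.85) -> bool:
    """Character-level similarity at or above a hypothetical threshold."""
    sa, sb = " ".join(tokens(a)), " ".join(tokens(b))
    return SequenceMatcher(None, sa, sb).ratio() >= threshold


if __name__ == "__main__":
    # Made-up example names, not taken from the museum datasets.
    pairs = [
        ("Jan de Vries", "J. de Vries"),
        ("Jan de Vries", "Pieter de Vries"),
        ("Jan de Vries", "Jan de Vreis"),
    ]
    for a, b in pairs:
        print(a, "|", b, "|",
              exact_match(a, b), initial_surname_match(a, b),
              surname_match(a, b), fuzzy_match(a, b))
```

In this sketch, surname-only matching links the most records but also pairs different people who share a family name, while exact matching misses abbreviated or misspelled variants. Candidate pairs in that middle ground are the kind of case where review by a person is useful.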

Files

DH_BeNeLux_2024.pdf

Files (2.9 MB)

md5:be4eeb5ed3151e49b8cc20efe84119b6 (2.8 MB)
md5:ed46ce9187a227e4b55f9b74892e76a9 (170.5 kB)

Additional details

Dates

Accepted
2024-06-02