Published March 17, 2025 | Version v1
Computational notebook Open

GLOBALISE Word2Vec Experiment

  • 1. Universiteit van Amsterdam
  • 2. Huygens Institute for History and Culture of the Netherlands

Description

Searching the VOC Archives (https://www.nationaalarchief.nl/onderzoeken/archief/1.04.02) can be challenging due to the numerous spelling variations and obscure terms in the documents. In GLOBALISE (https://globalise.huygens.knaw.nl/), we've trained a Word2Vec model that helps by identifying spelling variants, synonyms, and other semantic relationships for any word in the GLOBALISE corpus.
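
As a rough illustration of how a word-embedding model can surface spelling variants, the sketch below ranks words by cosine similarity to a query word. It is not the notebook's code: the vectors are invented three-dimensional toy values, the word list is chosen only for illustration, and the names `TOY_VECTORS`, `cosine_similarity`, and `most_similar` are hypothetical. With a trained gensim model, the comparable lookup would typically be `model.wv.most_similar(word, topn=n)`.

```python
import math

# Toy 3-dimensional vectors, invented for illustration only.
# They are NOT taken from the GLOBALISE Word2Vec model.
TOY_VECTORS = {
    "schip": [0.90, 0.10, 0.05],
    "schep": [0.88, 0.12, 0.07],
    "scheep": [0.85, 0.15, 0.10],
    "peper": [0.10, 0.90, 0.20],
    "pepper": [0.12, 0.88, 0.22],
    "gouverneur": [0.05, 0.20, 0.95],
}


def cosine_similarity(a, b):
    """Return the cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def most_similar(word, vectors, topn=3):
    """Rank all other words by cosine similarity to `word`, best first."""
    if word not in vectors:
        raise KeyError(f"{word!r} is not in the vocabulary")
    target = vectors[word]
    scores = [
        (other, cosine_similarity(target, vec))
        for other, vec in vectors.items()
        if other != word
    ]
    scores.sort(key=lambda pair: pair[1], reverse=True)
    return scores[:topn]


if __name__ == "__main__":
    for neighbour, score in most_similar("schip", TOY_VECTORS, topn=2):
        print(f"{neighbour}\t{score:.3f}")
```

In this toy setup, words whose vectors point in nearly the same direction as the query rank first, which is the property that lets a model trained on the corpus group spelling variants and related terms.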

This dataset contains the Jupyter notebook for interacting with the model. Instructions for obtaining the Word2Vec model are in the notebook. If the model is no longer available, the notebook also contains instructions for recreating it from the GLOBALISE VOC transcriptions v2 (https://hdl.handle.net/10622/LVXSBW).

This notebook is made available on the GLOBALISE Lab website: https://lab.globalise.huygens.knaw.nl/experiments/GLOBALISE_Word2Vec_Lab/

Files

GLOBALISE_Word2Vec_Lab.ipynb (30.3 kB)
md5:475d3983d735260df2186a15f141ff43

Additional details

Funding

Dutch Research Council
General Letters Ontology Based AccessibiLity InfraStructure (GLOBALISE) 37465

Software

Programming language
Python