GLOBALISE Word2Vec Experiment

van Wissen, Leon; GLOBALISE

doi:10.5281/zenodo.15038313

Published March 17, 2025 | Version v1

Computational notebook | Open access

1. Universiteit van Amsterdam
2. Huygens Institute for History and Culture of the Netherlands

Searching the VOC Archives (https://www.nationaalarchief.nl/onderzoeken/archief/1.04.02) can be challenging because the documents contain numerous spelling variations and obscure terms. In GLOBALISE (https://globalise.huygens.knaw.nl/), we have trained a Word2Vec model that helps by identifying spelling variants, synonyms, and other semantically related words for any word in the GLOBALISE corpus.

This dataset contains the Jupyter notebook that can be used to interact with the model. Instructions for obtaining the Word2Vec model are in the notebook. If the model is no longer available, the notebook also contains instructions for recreating it from the GLOBALISE VOC transcriptions v2 (https://hdl.handle.net/10622/LVXSBW).

This notebook is made available on the GLOBALISE Lab website: https://lab.globalise.huygens.knaw.nl/experiments/GLOBALISE_Word2Vec_Lab/
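
As a rough illustration of how a word-embedding model can surface spelling variants, the sketch below ranks words by the cosine similarity of their vectors. It does not use the GLOBALISE model: the words, the three-dimensional vectors, and the helper names `cosine_similarity` and `most_similar` are invented for this example, and a trained Word2Vec model would supply much longer vectors learned from the corpus.

```python
import math

# Hypothetical toy vectors, made up for illustration only. They are NOT taken
# from the GLOBALISE model, whose vectors are learned from the transcriptions.
TOY_VECTORS = {
    "schip": [0.90, 0.10, 0.05],
    "schep": [0.88, 0.12, 0.07],
    "scheepen": [0.80, 0.20, 0.10],
    "peper": [0.05, 0.90, 0.30],
    "peeper": [0.07, 0.88, 0.32],
    "gouverneur": [0.10, 0.20, 0.95],
}


def cosine_similarity(a, b):
    """Return the cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def most_similar(word, vectors, topn=3):
    """Rank every other word in `vectors` by similarity to `word`."""
    if word not in vectors:
        raise KeyError(f"'{word}' is not in the vocabulary")
    target = vectors[word]
    scored = [
        (other, cosine_similarity(target, vec))
        for other, vec in vectors.items()
        if other != word
    ]
    # Highest similarity first.
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:topn]


if __name__ == "__main__":
    for neighbour, score in most_similar("schip", TOY_VECTORS, topn=2):
        print(f"{neighbour}\t{score:.3f}")
```

Words that occur in similar contexts end up with vectors pointing in similar directions, which is why variant spellings of the same term tend to rank near each other in this kind of lookup.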

Files

Files (30.3 kB)

Name	Size
GLOBALISE_Word2Vec_Lab.ipynb (md5:475d3983d735260df2186a15f141ff43)	30.3 kB
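
The MD5 checksum listed above can be used to check that a downloaded copy of the notebook is intact. The following is a minimal sketch using only the Python standard library; the local path and the helper names `md5_of_file` and `matches_checksum` are assumptions made for this example.

```python
import hashlib
from pathlib import Path

# Checksum as listed for GLOBALISE_Word2Vec_Lab.ipynb in the file table.
EXPECTED_MD5 = "475d3983d735260df2186a15f141ff43"


def md5_of_file(path, chunk_size=65536):
    """Compute the MD5 hex digest of a file, reading it in chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_checksum(path, expected=EXPECTED_MD5):
    """Return True if the file's MD5 digest equals the expected value."""
    return md5_of_file(path) == expected.lower()


if __name__ == "__main__":
    # Assumed download location; adjust to wherever the file was saved.
    notebook = Path("GLOBALISE_Word2Vec_Lab.ipynb")
    if notebook.exists():
        print("checksum OK" if matches_checksum(notebook) else "checksum MISMATCH")
    else:
        print(f"{notebook} not found")
```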

Additional details

Funding

Dutch Research Council
General Letters Ontology Based AccessibiLity InfraStructure (GLOBALISE) 37465

Software

Programming language: Python

	All versions	This version
Views	174	174
Downloads	48	48
Data volume	5.1 MB	5.1 MB
