GLOBALISE Word2Vec Experiment
Description
Searching the VOC Archives (https://www.nationaalarchief.nl/onderzoeken/archief/1.04.02) can be challenging due to the numerous spelling variations and obscure terms in the documents. In GLOBALISE (https://globalise.huygens.knaw.nl/), we've trained a Word2Vec model that helps by identifying spelling variants, synonyms, and other semantic relationships for any word in the GLOBALISE corpus.
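As a rough illustration of how such a lookup works, the sketch below ranks words by the cosine similarity of their vectors, which is the usual way a Word2Vec model is queried for related terms. The three-dimensional vectors and the word forms are invented for this example and are not taken from the GLOBALISE model or corpus; the helper names `cosine` and `most_similar` are likewise hypothetical.

```python
import math

# Toy vectors, invented purely for illustration. A trained Word2Vec model
# learns vectors with many more dimensions from the corpus itself.
TOY_VECTORS = {
    "schip": [0.9, 0.1, 0.0],
    "schep": [0.8, 0.2, 0.1],
    "scheepen": [0.85, 0.15, 0.05],
    "peper": [0.1, 0.9, 0.2],
    "peeper": [0.15, 0.85, 0.25],
}


def cosine(u, v):
    """Return the cosine similarity of two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def most_similar(word, vectors, topn=3):
    """Return the topn (word, similarity) pairs closest to `word`."""
    target = vectors[word]
    scored = [
        (other, cosine(target, vec))
        for other, vec in vectors.items()
        if other != word
    ]
    # Highest similarity first.
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:topn]


if __name__ == "__main__":
    for neighbour, score in most_similar("peper", TOY_VECTORS, topn=2):
        print(f"{neighbour}\t{score:.3f}")
```

In a real model, words that occur in similar contexts end up with similar vectors, which is why spelling variants of the same term tend to appear near the top of such a ranking.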
This dataset contains the Jupyter notebook used to interact with the model. Instructions for obtaining the Word2Vec model are in the notebook. If the model is no longer available, the notebook also contains instructions for recreating it from the GLOBALISE VOC transcriptions v2 (https://hdl.handle.net/10622/LVXSBW).
This notebook is made available on the GLOBALISE Lab website: https://lab.globalise.huygens.knaw.nl/experiments/GLOBALISE_Word2Vec_Lab/
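For readers who need to rebuild the model from the transcriptions, a minimal sketch of such a training step is given below. It assumes the gensim library and a plain-text file with one passage per line; the tokenisation, the hyperparameter values, and the function and file names are all placeholders chosen for illustration, not the settings used in the notebook, which remains the authoritative source.

```python
import re


def tokenize(line):
    """Lowercase a line and split it into word tokens (a simplistic assumption)."""
    return re.findall(r"\w+", line.lower())


def read_sentences(path):
    """Yield one token list per non-empty line of a UTF-8 plain-text file."""
    with open(path, encoding="utf-8") as handle:
        for line in handle:
            tokens = tokenize(line)
            if tokens:
                yield tokens


def train_word2vec(text_path, model_path):
    """Train a Word2Vec model on text_path and save it to model_path."""
    # Imported here so the helpers above can be used without gensim installed.
    from gensim.models import Word2Vec

    sentences = list(read_sentences(text_path))
    model = Word2Vec(
        sentences=sentences,
        vector_size=100,  # placeholder value, not the notebook's setting
        window=5,         # placeholder value
        min_count=5,      # ignore words seen fewer than 5 times
        workers=4,
    )
    model.save(model_path)
    return model
```

Once trained, a model of this kind can be queried with `model.wv.most_similar("peper", topn=10)`, which returns the ten words whose vectors lie closest to that of the query word.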
Files
| Name | Size |
|---|---|
| GLOBALISE_Word2Vec_Lab.ipynb (md5:475d3983d735260df2186a15f141ff43) | 30.3 kB |
Additional details
Funding
- Dutch Research Council: General Letters Ontology Based AccessibiLity InfraStructure (GLOBALISE) 37465
Software
- Programming language: Python