Published March 22, 2018 | Version 2.1
Dataset Open

JeSemE models for lexical semantic change

  • 1. Friedrich-Schiller-Universität Jena, Germany


Models for diachronic lexical semantics used by the Jena Semantic Explorer (JeSemE) website, which is described in our COLING 2018 paper "JeSemE: A Website for Exploring Diachronic Changes in Word Meaning and Emotion".

The models are also described and applied in Johannes Hellrich's Ph.D. thesis "Word Embeddings: Reliability & Semantic Change". Hellrich was funded by the Deutsche Forschungsgemeinschaft (DFG) within the graduate school "The Romantic Model" (GRK 2041/1).

One ZIP file per corpus, each containing several CSV files:

  • CHI.csv with χ² word association values (structure: word-id, word-id, time, value)
  • EMBEDDING.csv with SVD-PPMI word embeddings (aligned; structure: word-id, time, values)
  • EMOTION.csv with VAD word emotion values (structure: word-id, time, values)
  • FREQUENCY.csv with relative word frequency values (structure: word-id, time, value)
  • PPMI.csv with PPMI word association values (structure: word-id, word-id, time, value)
  • SIMILARITY.csv with word similarity values derived from the word embeddings (structure: word-id, word-id, time, value)
  • WORDIDS.csv mapping words to their corpus-specific IDs
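
The file structures listed above can be read with a few lines of Python. The following is a minimal sketch, not part of the dataset itself. It assumes that the CSV files are comma-separated without a header row, that WORDIDS.csv rows have the form word, id, and that the time column holds an integer such as a year. None of this is stated in the description, so check it against the actual files. The function names load_word_ids and load_pair_series are hypothetical.

```python
import csv
import io


def load_word_ids(handle):
    """Map each word to its corpus-specific ID.

    Assumption: rows look like `word,id` (the column order is not documented).
    """
    return {row[0]: int(row[1]) for row in csv.reader(handle) if row}


def load_pair_series(handle, word_id_a, word_id_b):
    """Collect (time, value) pairs for one word pair.

    Works for files structured as word-id, word-id, time, value
    (CHI.csv, PPMI.csv, SIMILARITY.csv). The pair is matched in either order.
    Assumption: the time column is an integer, e.g. a year.
    """
    wanted = {word_id_a, word_id_b}
    series = []
    for row in csv.reader(handle):
        if not row:
            continue
        first, second, time, value = row
        if {int(first), int(second)} == wanted:
            series.append((int(time), float(value)))
    return sorted(series)


if __name__ == "__main__":
    # Tiny in-memory stand-ins for WORDIDS.csv and SIMILARITY.csv (made-up values).
    word_ids = load_word_ids(io.StringIO("heart,0\nlove,1\n"))
    similarity = io.StringIO("0,1,1900,0.4\n1,0,1850,0.3\n")
    print(load_pair_series(similarity, word_ids["heart"], word_ids["love"]))
```

With a downloaded archive, the handle could instead come from zipfile.ZipFile(...).open(...) wrapped in io.TextIOWrapper, so that the large CSV files are streamed rather than unpacked to disk first.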

Corpora are:

  • coha: Corpus of Historical American English
  • dta: Deutsches Textarchiv 'German Text Archive'
  • google_fiction: Google Books N-Gram corpus, English fiction subcorpus
  • google_german: Google Books N-Gram corpus, German subcorpus
  • rsc: Royal Society Corpus


Files (5.1 GB)

Size
1.7 GB
426.7 MB
2.0 GB
810.8 MB
111.9 MB

Additional details